Patents Examined by Douglas Godbold
-
Patent number: 11797782
Abstract: A cross-lingual voice conversion system and method comprises a voice feature extractor configured to receive a first voice audio segment in a first language and a second voice audio segment in a second language, and extract, respectively, audio features comprising first-voice, speaker-dependent acoustic features and second-voice, speaker-independent linguistic features. One or more generators are configured to receive extracted features, and produce therefrom a third voice candidate keeping the first-voice, speaker-dependent acoustic features and the second-voice, speaker-independent linguistic features, wherein the third voice candidate speaks the second language. One or more discriminators are configured to compare the third voice candidate with the ground truth data, and provide results of the comparison back to the generator for refining the third voice candidate.
Type: Grant
Filed: December 30, 2020
Date of Patent: October 24, 2023
Assignee: TMRW Foundation IP S. À R.L.
Inventor: Cevat Yerli
-
Patent number: 11798566
Abstract: The present disclosure discloses a data transmission method performed by a computer device and a non-transitory computer-readable storage medium. According to the present disclosure, voice criticality analysis is performed on a to-be-transmitted audio to obtain a criticality level of each to-be-transmitted audio frame in the to-be-transmitted audio, and a corrected redundancy multiple of each to-be-transmitted audio frame is obtained according to a current redundancy multiple and a redundant transmission factor corresponding to the criticality level of each to-be-transmitted audio frame. Each to-be-transmitted audio frame is then duplicated according to its corrected redundancy multiple to obtain at least one redundancy data packet, and the at least one redundancy data packet is transmitted to a target terminal, which improves resilience to network packet loss without causing network congestion.
Type: Grant
Filed: October 28, 2021
Date of Patent: October 24, 2023
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventor: Junbin Liang
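As a rough illustration of the redundancy arithmetic this abstract describes, the sketch below scales a current redundancy multiple by a per-criticality-level transmission factor and duplicates each frame accordingly. The level names and factor values are invented for the example; the patent names the mechanism, not these constants.

```python
import math

# Hypothetical criticality levels and redundant-transmission factors;
# the abstract does not specify concrete values.
REDUNDANCY_FACTOR = {"low": 0.5, "medium": 1.0, "high": 2.0}

def corrected_redundancy(current_multiple: float, criticality: str) -> int:
    """Scale the current redundancy multiple by this frame's factor."""
    factor = REDUNDANCY_FACTOR[criticality]
    # Round up so a fractional result still yields at least one copy.
    return max(1, math.ceil(current_multiple * factor))

def duplicate_frames(frames, criticalities, current_multiple=2.0):
    """Emit each frame N times, N being its corrected redundancy multiple."""
    packets = []
    for frame, level in zip(frames, criticalities):
        packets.extend([frame] * corrected_redundancy(current_multiple, level))
    return packets
```

The point of the scheme: highly critical frames get extra duplicates (so a lost packet likely has a surviving copy) while low-criticality frames get fewer, keeping total bandwidth in check.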
-
Patent number: 11798547
Abstract: A voice activated device for interaction with a digital assistant is provided. The device comprises a housing, one or more processors, and memory, the memory coupled to the one or more processors and comprising instructions for automatically identifying and connecting to a digital assistant server. The device further comprises a power supply, a wireless network module, and a human-machine interface. The human-machine interface consists essentially of: at least one speaker, at least one microphone, an ADC coupled to the microphone, a DAC coupled to the at least one speaker, and zero or more additional components selected from the set consisting of: a touch-sensitive surface, one or more cameras, and one or more LEDs. The device is configured to act as an interface for speech communications between the user and a digital assistant of the user on the digital assistant server.
Type: Grant
Filed: August 6, 2020
Date of Patent: October 24, 2023
Assignee: Apple Inc.
Inventor: Kevin Milden
-
Patent number: 11797773
Abstract: Navigating text using an extended discourse tree. In an example, a method accesses an extended discourse tree that includes a first discourse tree for a first document and a second discourse tree for a second document. The method determines a first elementary discourse unit that is responsive to a query from a user device and a corresponding first position. The method further determines a set of navigation options including a first rhetorical relationship between the first elementary discourse unit and a second elementary discourse unit of the first discourse tree and a second rhetorical relationship between the first elementary discourse unit and a third elementary discourse unit of the second discourse tree. The method presents the rhetorical relationships to the user device. Responsive to receiving, from the user device, a selection of a rhetorical relationship, the method presents a corresponding elementary discourse unit to the user device.
Type: Grant
Filed: February 24, 2022
Date of Patent: October 24, 2023
Assignee: Oracle International Corporation
Inventor: Boris Galitsky
-
Patent number: 11790933
Abstract: Systems and methods are disclosed for displaying electronic multimedia content to a user. One computer-implemented method for manipulating electronic multimedia content includes generating, using a processor, a speech model and at least one speaker model of an individual speaker. The method further includes receiving electronic media content over a network; extracting an audio track from the electronic media content; and detecting speech segments within the electronic media content based on the speech model. The method further includes detecting a speaker segment within the electronic media content and calculating a probability of the detected speaker segment involving the individual speaker based on the at least one speaker model.
Type: Grant
Filed: March 31, 2020
Date of Patent: October 17, 2023
Assignee: Verizon Patent and Licensing Inc.
Inventors: Peter F. Kocks, Guoning Hu, Ping-Hao Wu
-
Patent number: 11790171
Abstract: A natural language understanding method begins with a radiological report text containing clinical findings. Errors in the text are corrected by analyzing character-level optical transformation costs weighted by a frequency analysis over a corpus corresponding to the report text. For each word within the report text, a word embedding is obtained, character-level embeddings are determined, and the word and character-level embeddings are concatenated to a neural network which generates a plurality of NER tagged spans for the report text. A set of linked relationships are calculated for the NER tagged spans by generating masked text sequences based on the report text and determined pairs of potentially linked NER spans. A dense adjacency matrix is calculated based on attention weights obtained from providing the one or more masked text sequences to a Transformer deep learning network, and graph convolutions are then performed over the calculated dense adjacency matrix.
Type: Grant
Filed: April 15, 2020
Date of Patent: October 17, 2023
Assignee: Covera Health
Inventors: Ron Vianu, W. Nathaniel Brown, Gregory Allen Dubbin, Daniel Robert Elgort, Benjamin L. Odry, Benjamin Sellman Suutari, Jefferson Chen
-
Patent number: 11783841
Abstract: A method and system for secure speaker authentication between a caller device and a first device using an authentication server are provided. The method comprises extracting features into a feature matrix from an incoming audio call; generating a partial i-vector, wherein the partial i-vector includes a first low-order statistic; sending the partial i-vector to the authentication server; and receiving from the authentication server a match score generated based on a full i-vector and another i-vector stored on the authentication server, wherein the full i-vector is generated from the partial i-vector.
Type: Grant
Filed: March 15, 2021
Date of Patent: October 10, 2023
Assignee: ILLUMA LABS INC.
Inventor: Milind Borkar
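In i-vector systems generally, the "low-order statistics" sent ahead of full extraction are the zeroth- and first-order Baum-Welch statistics computed against a background GMM. The toy sketch below (unit-variance components, made-up parameters, not the ILLUMA LABS implementation) shows what those statistics are:

```python
import numpy as np

def gmm_posteriors(X, means, weights):
    """Per-frame component responsibilities for a toy unit-variance GMM."""
    # Squared distance of every frame to every component mean.
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    logp = np.log(weights)[None, :] - 0.5 * d2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

def low_order_stats(X, means, weights):
    """Zeroth-order (N_c) and first-order (F_c) statistics per component."""
    gamma = gmm_posteriors(X, means, weights)   # (frames, components)
    N = gamma.sum(axis=0)                       # zeroth order: soft counts
    F = gamma.T @ X                             # first order: weighted sums
    return N, F
```

A "partial i-vector" built from such statistics lets the client ship a compact summary of the call audio while the server finishes the heavier full-i-vector computation and scoring.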
-
Patent number: 11775778
Abstract: Embodiments of the disclosed technologies incorporate taxonomy information into a cross-lingual entity graph and input the taxonomy-informed cross-lingual entity graph into a graph neural network. The graph neural network computes semantic alignment scores for node pairs. The semantic alignment scores are used to determine whether a node pair represents a valid machine translation.
Type: Grant
Filed: November 5, 2020
Date of Patent: October 3, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhuliu Li, Xiao Yan, Yiming Wang, Jaewon Yang
-
Patent number: 11776549
Abstract: Techniques are described herein for multi-factor audio watermarking. A method includes: receiving audio data; processing the audio data to generate predicted output that indicates a probability of one or more hotwords being present in the audio data; determining that the predicted output satisfies a threshold that is indicative of the one or more hotwords being present in the audio data; in response to determining that the predicted output satisfies the threshold, processing the audio data using automatic speech recognition to generate a speech transcription feature; detecting a watermark that is embedded in the audio data; and in response to detecting the watermark: determining that the speech transcription feature corresponds to one of a plurality of stored speech transcription features; and in response to determining that the speech transcription feature corresponds to one of the plurality of stored speech transcription features, suppressing processing of a query included in the audio data.
Type: Grant
Filed: December 7, 2020
Date of Patent: October 3, 2023
Assignee: GOOGLE LLC
Inventors: Aleks Kracun, Matthew Sharifi
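The claimed method is essentially a gating control flow: only when the hotword probability clears its threshold, a watermark is present, and the transcription matches a stored one does the device suppress the query. A minimal sketch of that flow, where `detect_watermark`, `transcribe`, and the stored-transcription set are hypothetical stand-ins (not Google APIs):

```python
def should_suppress(audio, hotword_prob, threshold,
                    detect_watermark, transcribe, stored_transcriptions):
    """Return True when the query in `audio` should be ignored."""
    if hotword_prob < threshold:
        return False              # no hotword detected: nothing to gate
    transcription = transcribe(audio)
    if not detect_watermark(audio):
        return False              # unwatermarked audio is processed normally
    # Watermarked audio is suppressed only if its transcription matches a
    # stored transcription (e.g. a known broadcast containing the hotword).
    return transcription in stored_transcriptions
```

The two factors (watermark plus transcription match) guard against both false triggers from broadcasts and over-suppression of live speech that merely coincides with a watermark.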
-
Patent number: 11775773
Abstract: A virtual assistant server determines at least one user intent based on an analysis of a received conversational user input. One or more of a plurality of views is identified based on the at least one user intent. Further, the virtual assistant server retrieves content based on the at least one user intent or the identified one or more views. The virtual assistant server determines one of a plurality of graphical user interface layers to display for each of one or more parts of the content and the identified one or more views based at least on one or more factors related to the content. Subsequently, the virtual assistant server outputs instructions based on the determined one of the graphical user interface layers in response to the received conversational user input.
Type: Grant
Filed: December 15, 2020
Date of Patent: October 3, 2023
Assignee: KORE.AI, INC.
Inventors: Rajkumar Koneru, Prasanna Kumar Arikala Gunalan
-
Patent number: 11769480
Abstract: The present disclosure discloses a method and apparatus for training a model, a method and apparatus for synthesizing a speech, a device and a storage medium, and relates to the field of natural language processing and deep learning technology. The method for training a model may include: determining a phoneme feature and a prosodic word boundary feature of sample text data; inserting a pause character into the phoneme feature according to the prosodic word boundary feature to obtain a combined feature of the sample text data; and training an initial speech synthesis model according to the combined feature of the sample text data, to obtain a target speech synthesis model.
Type: Grant
Filed: December 3, 2020
Date of Patent: September 26, 2023
Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
Inventors: Zhengkun Gao, Junteng Zhang, Wenfu Wang, Tao Sun
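The "combined feature" step above is concrete enough to illustrate: a pause token is inserted into the phoneme sequence wherever the prosodic boundary feature marks a word boundary. The token name `<pau>` and the boolean boundary encoding are assumptions for the sketch, not details from the patent.

```python
PAUSE = "<pau>"  # assumed pause-character token

def combine(phonemes, boundary_after):
    """Insert a pause token after each phoneme that ends a prosodic word.

    boundary_after[i] is True when a prosodic word boundary follows
    phoneme i, per the prosodic word boundary feature.
    """
    combined = []
    for ph, is_boundary in zip(phonemes, boundary_after):
        combined.append(ph)
        if is_boundary:
            combined.append(PAUSE)
    return combined
```

Training the synthesis model on such a sequence lets it learn pause placement jointly with phoneme pronunciation instead of from a separate prosody predictor.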
-
Patent number: 11755849
Abstract: The present disclosure provides an information switching method. The method includes: obtaining tilting information after a tilt direction of a device changes; searching for a pre-set tilt direction matching the tilting information and determining pre-set information corresponding to the matched pre-set tilt direction; and switching first input information of the device to second input information, where the second input information is determined based on the pre-set information matching the pre-set tilt direction.
Type: Grant
Filed: November 27, 2019
Date of Patent: September 12, 2023
Assignee: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO., LTD.
Inventor: Hailei Ma
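Stripped of patent phrasing, the method is a lookup: match the measured tilt against pre-set directions and swap the active input mode accordingly. The direction names and input modes below are invented for illustration.

```python
# Hypothetical pre-set tilt directions and the input modes they select.
PRESETS = {
    "left": "emoji_keyboard",
    "right": "voice_input",
    "forward": "handwriting",
}

def switch_input(current_input, tilt_direction):
    """Return the input mode pre-set for this tilt, else keep the current one."""
    return PRESETS.get(tilt_direction, current_input)
```

If no pre-set direction matches the tilt, the device simply stays in its current input mode.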
-
Patent number: 11749297
Abstract: A voice quality estimation apparatus includes: a packet sequence creation unit configured to create a first sequence by applying a first characteristic indicating that quality degradation caused by packet loss is perceived by a user all at once, to a sequence consisting of elements each indicating whether or not a packet of a voice call has been lost; a smoothing unit configured to create a second sequence from the first sequence; a degradation amount emphasis unit configured to create a third sequence from the second sequence; a packet loss tolerance characteristics reflection unit configured to create a fourth sequence from the third sequence; a degradation amount calculation unit configured to calculate a degradation amount from the fourth sequence; and a listening quality estimation unit configured to estimate voice quality that is to be experienced by the user, from the degradation amount.
Type: Grant
Filed: February 13, 2020
Date of Patent: September 5, 2023
Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Hitoshi Aoki, Atsuko Kurashima, Ginga Kawaguchi
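The apparatus is a pipeline of sequence transforms over a per-packet loss indicator. The sketch below shows the first two stages and the final reduction; the concrete operators (neighbourhood radius, moving-average window, the final mean) are assumptions, since the abstract names the stages but not their formulas.

```python
def spread(loss_seq, radius=1):
    """Stage 1: a lost packet's degradation is perceived over its
    neighbourhood all at once (radius is an assumed parameter)."""
    n = len(loss_seq)
    out = [0.0] * n
    for i, lost in enumerate(loss_seq):
        if lost:
            for j in range(max(0, i - radius), min(n, i + radius + 1)):
                out[j] = 1.0
    return out

def smooth(seq, window=3):
    """Stage 2: trailing moving-average smoothing of the degradation."""
    return [sum(seq[max(0, i - window + 1):i + 1]) / (i - max(0, i - window + 1) + 1)
            for i in range(len(seq))]

def degradation_amount(seq):
    """Final stage: collapse the per-packet sequence into one scalar."""
    return sum(seq) / len(seq)
```

The emphasis and loss-tolerance stages would be further element-wise transforms between `smooth` and `degradation_amount`; the scalar is then mapped to an estimated listening quality score.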
-
Patent number: 11749275
Abstract: Systems and processes for application integration with a digital assistant are provided. In accordance with one example, a method includes, at an electronic device having one or more processors and memory, receiving a natural-language user input; identifying, with the one or more processors, an intent object of a set of intent objects and a parameter associated with the intent, where the intent object and the parameter are derived from the natural-language user input. The method further includes identifying a software application associated with the intent object of the set of intent objects; and providing the intent object and the parameter to the software application.
Type: Grant
Filed: October 8, 2021
Date of Patent: September 5, 2023
Assignee: Apple Inc.
Inventors: Robert A. Walker, II, Brandon J. Newendorp, Rohit Dasari, Richard D. Giuli, Thomas R. Gruber, Carey E. Radebaugh, Ashish Garg, Vineet Khosla, Jonathan H. Russell, Corey Peterson
-
Patent number: 11741955
Abstract: A method to select a response in a multi-turn conversation between a user and a conversational bot. The conversation is composed of a set of events, wherein an event is a linear sequence of observations that are user speech or physical actions. Queries are processed against a set of conversations that are organized as a set of inter-related data tables, with events and observations stored in distinct tables. As the multi-turn conversation proceeds, a data model comprising an observation history, together with a hierarchy of events determined to represent the conversation up to at least one turn, is persisted. When a new input (speech or physical action) is received, it is classified using a statistical model to generate a result. The result is then mapped to an observation in the data model. Using the mapped observation, a look-up is performed into the data tables to retrieve a possible response.
Type: Grant
Filed: February 22, 2021
Date of Patent: August 29, 2023
Assignee: Drift.com, Inc.
Inventors: Jeffrey D. Orkin, Christopher M. Ward
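The retrieval step at the end of the abstract (classify the input, map the classifier result to an observation, look the observation up in the conversation tables) can be outlined as below. The classifier is passed in as a function, and both mapping tables are toy stand-ins for the inter-related data tables the patent describes.

```python
# Hypothetical mapping tables; in the patent these are inter-related
# database tables for observations, events, and responses.
OBSERVATIONS = {"greeting": "obs_hello", "pricing_question": "obs_pricing"}
RESPONSES = {
    "obs_hello": "Hi! How can I help?",
    "obs_pricing": "Happy to walk you through our plans.",
}

def respond(user_input, classify):
    """Classify input, map to an observation, then look up a response."""
    label = classify(user_input)            # statistical model result
    observation = OBSERVATIONS.get(label)   # map result to an observation
    if observation is None:
        return None                         # unrecognized input: no response
    return RESPONSES.get(observation)       # table look-up for a response
```

Keeping the statistical model separate from the lookup tables means conversation designers can edit responses without retraining the classifier.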
-
Patent number: 11735183
Abstract: This disclosure relates generally to optically switchable devices, and more particularly, to methods for controlling optically switchable devices. In various embodiments, one or more optically switchable devices may be controlled via voice control and/or gesture control.
Type: Grant
Filed: February 22, 2021
Date of Patent: August 22, 2023
Assignee: View, Inc.
Inventors: Dhairya Shrivastava, Mark D. Mendenhall
-
Patent number: 11735196
Abstract: Described are an encoder for coding speech-like content and/or general audio content, wherein the encoder is configured to embed, at least in some frames, parameters in a bitstream, which parameters enhance a concealment in case an original frame is lost, corrupted or delayed, and a decoder for decoding speech-like content and/or general audio content, wherein the decoder is configured to use parameters which are sent later in time to enhance a concealment in case an original frame is lost, corrupted or delayed, as well as a method for encoding and a method for decoding.
Type: Grant
Filed: December 18, 2020
Date of Patent: August 22, 2023
Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors: Jérémie Lecomte, Benjamin Schubert, Michael Schnabel, Martin Dietz
-
Patent number: 11721318
Abstract: A method, computer program, and computer system is provided for converting a first singing voice associated with a first speaker to a second singing voice associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.
Type: Grant
Filed: October 14, 2021
Date of Patent: August 8, 2023
Assignee: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
-
Patent number: 11710482
Abstract: Systems and processes for operating a virtual assistant to provide natural assistant interaction are provided. In accordance with one or more examples, a method includes, at an electronic device with one or more processors and memory: receiving a first audio stream including one or more utterances; determining whether the first audio stream includes a lexical trigger; generating one or more candidate text representations of the one or more utterances; determining whether at least one candidate text representation of the one or more candidate text representations is to be disregarded by the virtual assistant. If at least one candidate text representation is to be disregarded, one or more candidate intents are generated based on candidate text representations of the one or more candidate text representations other than the to be disregarded at least one candidate text representation.
Type: Grant
Filed: October 8, 2020
Date of Patent: July 25, 2023
Assignee: Apple Inc.
Inventors: Juan Carlos Garcia, Paul S. McCarthy, Kurt Piersol
-
Patent number: 11706568
Abstract: Devices, systems and processes for providing an adaptive audio environment are disclosed. For an embodiment, a system may include a wearable device and a hub. The hub may include an interface module configured to communicatively couple the wearable device and the hub, and a processor configured to execute non-transient computer executable instructions for a machine learning engine configured to apply a first machine learning process to at least one data packet received from the wearable device and output an action-reaction data set, and for a sounds engine configured to apply a sound adapting process to the action-reaction data set and provide audio output data to the wearable device via the interface module.
Type: Grant
Filed: November 1, 2021
Date of Patent: July 18, 2023
Assignee: DISH Network L.L.C.
Inventors: Rima Shah, Nicholas Brandon Newell