Patents Examined by Shreyans A Patel

Direction based end-pointing for speech recognition

Patent number: 11978478

Abstract: A speech recognition system utilizing automatic speech recognition techniques such as end-pointing techniques in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.

Type: Grant

Filed: March 13, 2023

Date of Patent: May 7, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Kenneth John Basye, Jeffrey Penrod Adams
Classification of documents

Patent number: 11977841

Abstract: An apparatus includes a display device that displays an input document in a user interface and at least one processor configured to receive a command to determine a document type of the input document and classify the input document to assign at least one document type and a respective confidence score. The processor assigns a significance score to each word of the input document that is indicative of a degree of influence the word has in deciding that the input document is of the at least one document type. The processor determines a level of visual emphasis to be placed on each word of the input document based on the significance score of the word and displays the input document on the display device with each word of the input document visually emphasized in accordance with the determined level of visual emphasis of the word.

Type: Grant

Filed: December 22, 2021

Date of Patent: May 7, 2024

Assignee: Bank of America Corporation

Inventors: Jeremy A. Geiman, Kongkuo Lu, Ron Papka
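
As a rough illustration of the emphasis step described in the abstract for patent 11977841, the sketch below maps per-word significance scores to discrete emphasis levels. The function name `emphasis_levels`, the three-level scale, and the threshold values are assumptions made for illustration; the abstract does not say how the level is computed.

```python
def emphasis_levels(significance, thresholds=(0.33, 0.66)):
    """Map each word's significance score to an emphasis level.

    Scores are assumed to lie in [0, 1]. The level is the number of
    thresholds the score meets or exceeds (0 = no emphasis).
    """
    levels = {}
    for word, score in significance.items():
        levels[word] = sum(score >= t for t in thresholds)
    return levels


# Hypothetical scores for three words of an input document.
scores = {"invoice": 0.9, "total": 0.5, "the": 0.05}
print(emphasis_levels(scores))
```

A renderer could then translate level 0, 1, and 2 into, for example, plain, bold, and highlighted text.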
Automated patent language generation

Patent number: 11972225

Abstract: Methods, systems, and architectures for drafting a patent application are presented. The method comprises acquiring at least one input, where the input is an image corresponding to a class of patent documents; encoding the image input via at least one first network; generating a set of vectors via the at least one first network, where the set of vectors corresponds to a partial representation of the image, derived from the at least one first network; decoding the set of vectors, based on a predetermined text corpus that corresponds to the class of patent documents, via at least one second network; and obtaining a claim set corresponding to the image via the at least one second network.

Type: Grant

Filed: October 1, 2020

Date of Patent: April 30, 2024

Inventors: Shrey Pathak, Xin Gao
Contrastive Siamese network for semi-supervised speech recognition

Patent number: 11961515

Abstract: A method includes receiving a plurality of unlabeled audio samples corresponding to spoken utterances not paired with corresponding transcriptions. At a target branch of a contrastive Siamese network, the method also includes generating a sequence of encoder outputs for the plurality of unlabeled audio samples and modifying time characteristics of the encoder outputs to generate a sequence of target branch outputs. At an augmentation branch of a contrastive Siamese network, the method also includes performing augmentation on the unlabeled audio samples, generating a sequence of augmented encoder outputs for the augmented unlabeled audio samples, and generating predictions of the sequence of target branch outputs generated at the target branch. The method also includes determining an unsupervised loss term based on target branch outputs and predictions of the sequence of target branch outputs. The method also includes updating parameters of the audio encoder based on the unsupervised loss term.

Type: Grant

Filed: December 14, 2021

Date of Patent: April 16, 2024

Assignee: Google LLC

Inventors: Jaeyoung Kim, Soheil Khorram, Hasim Sak, Anshuman Tripathi, Han Lu, Qian Zhang
Fulfillment of actionable requests ahead of a user selecting a particular autocomplete suggestion for completing a current user input

Patent number: 11960837

Abstract: Implementations set forth herein relate to providing selectable autofill suggestions, which correspond to application actions that are at least partially fulfilled using server command data—prior to a user selecting a particular selectable autofill suggestion. Proactively fulfilling command data in this way mitigates latency between user selection of a suggestion and fulfillment of a particular action. Initially, a partial input can be processed to generate autofill suggestions, which can be communicated to a server device for further processing. The autofill suggestions can also be rendered for selection at a touch display interface, thereby allowing a user to select one of the autofill suggestions. As command fulfillment data is provided by the server, the command fulfillment data can be available to a corresponding application(s) in order that any corresponding actions can be at least partially fulfilled prior to user selection.

Type: Grant

Filed: January 13, 2023

Date of Patent: April 16, 2024

Assignee: GOOGLE LLC

Inventor: Keun Soo Yim
Method and apparatus for performing entity linking

Patent number: 11961010

Abstract: Provided is a method for performing entity linking between a surface entity mention in a surface text and entities of a knowledge graph, including supplying the surface text to a contextual text representation model, pooling contextual representations of the tokens of a surface entity mention in the surface text with contextual representations of the other tokens within the surface text to provide a contextual entity representation vector representing the surface entity mention; supplying an identifier of a candidate knowledge graph entity to a knowledge graph embedding model, to provide an entity node embedding vector and combining the contextual entity representation vector with the entity node embedding vector to generate an input vector applied to a fully connected layer which provides an unnormalized output transformed by a softmax function into a normalized output processed to classify whether the surface entity mention corresponds to the candidate knowledge graph entity.

Type: Grant

Filed: June 21, 2021

Date of Patent: April 16, 2024

Assignee: SIEMENS AKTIENGESELLSCHAFT

Inventors: Rakebul Muff Hasan, Ulugbek Peter Kodirov
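
The classification step in the abstract for patent 11961010 can be sketched as follows: two vectors are combined, passed through one fully connected layer, and normalized with a softmax. Combining by concatenation, the two-output layer, and the names `link_probability`, `weights`, and `bias` are assumptions for illustration, not details taken from the patent.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def link_probability(mention_vec, entity_vec, weights, bias):
    """Return the softmax probability that the mention matches the entity.

    weights has one row per output class (assumed: 0 = no match, 1 = match),
    and each row has one entry per element of the combined input vector.
    """
    x = list(mention_vec) + list(entity_vec)  # assumed: combine by concatenation
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)[1]


# Toy vectors and hand-picked weights, purely hypothetical.
p = link_probability([1.0, 0.0], [0.0, 1.0],
                     weights=[[0, 0, 0, 0], [1, 0, 0, 1]], bias=[0, 0])
print(round(p, 3))
```

In a real system the two input vectors would come from the contextual text model and the knowledge graph embedding model, and the layer weights would be learned.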
Information processing device, information processing method, and program

Patent number: 11940896

Abstract: Erroneous ignitions of a process of a device caused by characteristics of speeches/behaviors of a user are efficiently prevented. Provided is an information processing device having: a notification control unit configured to notify a user of information about a candidate speech/behavior estimated to be suitable as a trigger for executing a predetermined process among a plurality of speeches/behaviors extractable from a behavior log of the user, wherein the notification control unit further notifies the user of an inquiry whether or not execution of the candidate speech/behavior estimated from the behavior log is to be applied as the trigger, and the candidate speech/behavior is estimated based on a number of times by which the speech/behavior is extracted from the behavior log.

Type: Grant

Filed: August 10, 2018

Date of Patent: March 26, 2024

Assignee: Sony Group Corporation

Inventors: Hideo Nagasaka, Kei Takahashi, Junichi Shimizu
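
The count-based estimation in the abstract for patent 11940896 can be illustrated with a short sketch that picks the most frequently logged speech/behavior as the candidate trigger. The function name `candidate_trigger` and the `min_count` cutoff are assumptions; the abstract only says the estimate is based on how many times the speech/behavior is extracted from the log.

```python
from collections import Counter


def candidate_trigger(behavior_log, min_count=3):
    """Return the most frequent logged behavior, or None if it is too rare."""
    counts = Counter(behavior_log)
    if not counts:
        return None
    behavior, count = counts.most_common(1)[0]
    return behavior if count >= min_count else None


# Hypothetical behavior log entries.
log = ["say good night", "turn off lights", "say good night", "say good night"]
print(candidate_trigger(log))  # the user would then be asked to confirm this trigger
```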
Generating a synthetic voice using neural networks

Patent number: 11935515

Abstract: A method of generating a synthetic voice by capturing audio data, cutting it into discrete phoneme and pitch segments, forming superior phoneme and pitch segments by averaging segments having similar phoneme, pitch, and other sound qualities, and training neural networks to correctly concatenate the segments.

Type: Grant

Filed: December 27, 2021

Date of Patent: March 19, 2024

Inventor: Claude Polonov
Systems and methods for adapting human speaker embeddings in speech synthesis

Patent number: 11929058

Abstract: Novel methods and systems for adapting a voice cloning synthesizer for a new speaker using real speech data are disclosed. Utterances from one or more target speakers are parameterized and are used to initialize an embedding vector for use with a voice synthesizer, by means of clustering the utterance data and determining the centroid of the data, using a speaker identification neural network, and/or by finding the closest stored embedded vector to the utterance data.

Type: Grant

Filed: August 18, 2020

Date of Patent: March 12, 2024

Assignee: DOLBY LABORATORIES LICENSING CORPORATION

Inventors: Cong Zhou, Xiaoyu Liu, Michael Getty Horgan, Vivek Kumar
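
Two of the initialization options named in the abstract for patent 11929058 (the centroid of the utterance data, and the closest stored embedding) can be sketched with plain lists. The helper names `centroid` and `closest_stored`, the use of Euclidean distance, and the toy vectors are assumptions for illustration.

```python
import math


def centroid(vectors):
    """Element-wise mean of equal-length parameter vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def closest_stored(query, stored):
    """Return the key of the stored embedding nearest to query (Euclidean)."""
    return min(stored, key=lambda name: math.dist(query, stored[name]))


# Hypothetical parameterized utterances from one target speaker.
utterances = [[0.0, 0.0], [2.0, 4.0]]
init = centroid(utterances)

# Hypothetical table of previously learned speaker embeddings.
stored = {"speaker_a": [1.0, 2.5], "speaker_b": [5.0, 5.0]}
print(init, closest_stored(init, stored))
```

Either result could serve as the starting embedding before further adaptation on the new speaker's data.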
Fulfillment of actionable requests ahead of a user selecting a particular autocomplete suggestion for completing a current user input

Patent number: 11922119

Abstract: Implementations set forth herein relate to providing selectable autofill suggestions, which correspond to application actions that are at least partially fulfilled using server command data—prior to a user selecting a particular selectable autofill suggestion. Proactively fulfilling command data in this way mitigates latency between user selection of a suggestion and fulfillment of a particular action. Initially, a partial input can be processed to generate autofill suggestions, which can be communicated to a server device for further processing. The autofill suggestions can also be rendered for selection at a touch display interface, thereby allowing a user to select one of the autofill suggestions. As command fulfillment data is provided by the server, the command fulfillment data can be available to a corresponding application(s) in order that any corresponding actions can be at least partially fulfilled prior to user selection.

Type: Grant

Filed: January 13, 2023

Date of Patent: March 5, 2024

Assignee: GOOGLE LLC

Inventor: Keun Soo Yim
Access to multiple virtual assistants

Patent number: 11922938

Abstract: A multi-assistant speech-processing system that centrally determines multiple execution plans to respond to a user input. A central component determines whether a particular input should be processed using a requested assistant or a different assistant or should be terminated. Assistant handoff may be determined based on system policies as well as user input-specific data. A ranked list of execution options may be supplemented by augmented data corresponding to messages to a user. The system may attempt to execute plans in the ranked order until a plan succeeds.

Type: Grant

Filed: November 22, 2021

Date of Patent: March 5, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Yaser Khan, Piyush Kandpal, Ritesh Patel, Mark Lawrence, Srinivas Palla, Ashish Rangole, Jason Wang
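
The last sentence of the abstract for patent 11922938 describes trying plans in ranked order until one succeeds. A minimal sketch of that loop is shown below; the tuple layout, the plan names, and the convention that a plan returns `True` on success are all assumptions.

```python
def run_in_ranked_order(plans):
    """Try each plan from highest to lowest score; return the first that succeeds.

    plans is a list of (score, name, action) tuples, where action is a
    zero-argument callable returning True on success.
    """
    for _score, name, action in sorted(plans, key=lambda p: p[0], reverse=True):
        if action():
            return name
    return None


# Hypothetical plans: the requested assistant fails, so the handoff plan runs.
plans = [
    (0.2, "fallback_message", lambda: True),
    (0.9, "requested_assistant", lambda: False),
    (0.5, "handoff_to_other_assistant", lambda: True),
]
print(run_in_ranked_order(plans))
```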
Multilingual neural text-to-speech synthesis

Patent number: 11922924

Abstract: Method and apparatus for generating speech through multilingual neural text-to-speech (TTS) synthesis are provided in the present disclosure. A text input in at least a first language may be received. Speaker latent space information of a target speaker may be provided through a speaker encoder. Language latent space information of a second language may be provided through a language encoder. At least one acoustic feature may be generated, through an acoustic feature predictor, based on the text input, the speaker latent space information and the language latent space information of the second language. A speech waveform corresponding to the text input may be generated, through a neural vocoder, based on the at least one acoustic feature.

Type: Grant

Filed: May 21, 2020

Date of Patent: March 5, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jingzhou Yang, Lei He
Outcome-oriented dialogs on a speech recognition platform

Patent number: 11915707

Abstract: A speech recognition platform configured to receive an audio signal that includes speech from a user and perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech based on the ASR results and based on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform multiple actions corresponding to this intent. The platform may select a target action to perform, and may engage in a back-and-forth dialog to obtain information for completing the target action. The action may include streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user or launching an application for the user.

Type: Grant

Filed: June 14, 2021

Date of Patent: February 27, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Jeff Bradley Beal, Kevin Robert Charter, Ajay Gopalakrishnan, Sumedha Arvind Kshirsagar, Nishant Kumar
Voice adaptation using synthetic speech processing

Patent number: 11915683

Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.

Type: Grant

Filed: February 14, 2022

Date of Patent: February 27, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Adam Marek Gabrys, Jaime Lorenzo Trueba, Goeric Sydney Huybrechts
Speech synthesis utilizing audio waveform difference signal(s)

Patent number: 11915682

Abstract: Techniques are disclosed that enable generation of an audio waveform representing synthesized speech based on a difference signal determined using an autoregressive model. Various implementations include using a distribution of the difference signal values to represent sounds found in human speech with a higher level of granularity than sounds not frequently found in human speech. Additional or alternative implementations include using one or more speakers of a client device to render the generated audio waveform.

Type: Grant

Filed: May 20, 2019

Date of Patent: February 27, 2024

Assignee: DeepMind Technologies Limited

Inventors: Luis Carlos Cobo Rus, Nal Kalchbrenner, Erich Elsen, Chenjie Gu
Automatic speaker identification using speech recognition features

Patent number: 11900948

Abstract: Features are disclosed for automatically identifying a speaker. Artifacts of automatic speech recognition (“ASR”) and/or other automatically determined information may be processed against individual user profiles or models. Scores may be determined reflecting the likelihood that individual users made an utterance. The scores can be based on, e.g., individual components of Gaussian mixture models (“GMMs”) that score best for frames of audio data of an utterance. A user associated with the highest likelihood score for a particular utterance can be identified as the speaker of the utterance. Information regarding the identified user can be provided to components of a spoken language processing system, separate applications, etc.

Type: Grant

Filed: January 7, 2022

Date of Patent: February 13, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Hugh Evan Secker-Walker, Baiyang Liu, Frederick Victor Weber
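
The scoring idea in the abstract for patent 11900948 can be sketched as follows: for every frame, take the best-scoring component of each user's mixture model, sum those scores over the utterance, and pick the user with the highest total. One-dimensional frames, the `(weight, mean, variance)` component layout, and all names and numbers are assumptions made to keep the example small.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def score_user(frames, components):
    """Sum, over frames, the best weighted component log-likelihood."""
    return sum(
        max(math.log(w) + log_gauss(x, m, v) for w, m, v in components)
        for x in frames
    )


def identify_speaker(frames, profiles):
    """Return the user whose profile gives the highest utterance score."""
    scores = {user: score_user(frames, comps) for user, comps in profiles.items()}
    return max(scores, key=scores.get)


# Hypothetical user profiles, each a two-component mixture.
profiles = {
    "alice": [(0.5, 0.0, 1.0), (0.5, 1.0, 1.0)],
    "bob": [(0.5, 5.0, 1.0), (0.5, 6.0, 1.0)],
}
print(identify_speaker([0.1, 0.9, 0.4], profiles))
```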
Compressor target curve to avoid boosting noise

Patent number: 11894006

Abstract: The processing of audio signals during playback is provided, so that audio signals that fall below a specified threshold loudness level are processed to avoid making unwanted background noise audible. N-channel audio is received from a playback volume controller/leveler (101). The level of the audio is compared with a threshold level. If the level is greater than the threshold level, the audio is processed with a first amount of gain in accordance with a first dynamic range control (DRC) compression curve that is tuned for professionally produced audio. If the level is less than or equal to the threshold level, the audio is processed with a second amount of gain in accordance with a second DRC compression curve that is designed to avoid boosting unwanted background noise. After applying the gain to the audio, the audio is sent to a downstream device.

Type: Grant

Filed: July 18, 2019

Date of Patent: February 6, 2024

Assignee: DOLBY LABORATORIES LICENSING CORPORATION

Inventors: Zhongjin Wang, Andrew Peter Reilly, Michael William Mason
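
The threshold test in the abstract for patent 11894006 amounts to choosing between two gain curves. The sketch below shows that selection only; the shapes of `main_curve` and `quiet_curve`, the target level, and the ratio are hypothetical placeholders, since the abstract does not define the curves themselves.

```python
def main_curve(level_db, target_db=-20.0, ratio=2.0):
    """Hypothetical first DRC curve: move the level partway toward a target."""
    return (target_db - level_db) * (1.0 - 1.0 / ratio)


def quiet_curve(level_db):
    """Hypothetical second DRC curve: no boost, so background noise stays quiet."""
    return 0.0


def select_gain_db(level_db, threshold_db):
    """Use the first curve above the threshold, the second at or below it."""
    if level_db > threshold_db:
        return main_curve(level_db)
    return quiet_curve(level_db)


print(select_gain_db(-30.0, threshold_db=-50.0))  # above threshold: first curve
print(select_gain_db(-60.0, threshold_db=-50.0))  # below threshold: second curve
```

Note that a level exactly equal to the threshold takes the second curve, matching the "less than or equal to" wording of the abstract.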
Media playback system with concurrent voice assistance

Patent number: 11893308

Abstract: Example techniques involve invoking voice assistance for a media playback system. In some embodiments, a NMD stores in memory a set of command information comprising a listing of playback commands and associated command criteria. The NMD captures a voice input and detects inclusion, within the voice input, of one or more particular playback commands from among the playback commands in the listing. In response, the NMD selects a local voice assistant that supports (a) one or more additional playback commands relative to a cloud-based VAS and (b) fewer non-playback commands relative to the cloud-based VAS, determines, via the local voice assistant, an intent in the captured voice input, and performs a response to the determined intent. The NMD foregoes selection of the cloud-based VAS when the local voice assistant is selected.

Type: Grant

Filed: March 28, 2022

Date of Patent: February 6, 2024

Assignee: Sonos, Inc.

Inventors: Dayn Wilberding, John Tolomei
Biasing voice correction suggestions

Patent number: 11881207

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for natural language processing. One of the methods includes receiving a voice input from a user device; generating a recognition output; receiving a user selection of one or more terms in the recognition output; receiving a user input of one or more letters replacing the user selected one or more terms; determining suggested correction candidates based in part on the user input and the voice input; and providing one or more suggested correction candidates to the user device as suggested corrected recognition outputs.

Type: Grant

Filed: March 23, 2022

Date of Patent: January 23, 2024

Assignee: Google LLC

Inventors: Evgeny A. Cherepanov, Jakob Nicolaus Foerster, Vikram Sridar, Ishai Rabinovitz, Omer Tabach
Input device activation noise suppression

Patent number: 11875811

Abstract: A method includes receiving sound input features representative of sound received during an electronic conference, the sound including voice and input device activation sound, receiving an input event feature indicative of the input device activation, and processing the received sound input features and input event feature via a trained model to identify a stored spectral file to be subtracted from the received sound to suppress the input device activation sound.

Type: Grant

Filed: December 9, 2021

Date of Patent: January 16, 2024

Assignee: Lenovo (United States) Inc.

Inventors: Scott Wentao Li, Robert J. Kapinos, Robert James Norton, Jr., Russell Speight Vanblon
