Patents Examined by Shreyans A Patel
  • Patent number: 11978478
    Abstract: A speech recognition system utilizing automatic speech recognition techniques such as end-pointing techniques in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.
    Type: Grant
    Filed: March 13, 2023
    Date of Patent: May 7, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Kenneth John Basye, Jeffrey Penrod Adams
  • Patent number: 11977841
    Abstract: An apparatus includes a display device that displays an input document in a user interface and at least one processor configured to receive a command to determine a document type of the input document and classify the input document to assign at least one document type and a respective confidence score. The processor assigns a significance score to each word of the input document that is indicative of a degree of influence the word has in deciding that the input document is of the at least one document type. The processor determines a level of visual emphasis to be placed on each word of the input document based on the significance score of the word and displays the input document on the display device with each word of the input document visually emphasized in accordance with the determined level of visual emphasis of the word.
    Type: Grant
    Filed: December 22, 2021
    Date of Patent: May 7, 2024
    Assignee: Bank of America Corporation
    Inventors: Jeremy A. Geiman, Kongkuo Lu, Ron Papka
  • Patent number: 11972225
    Abstract: Methods, systems, and architectures for drafting a patent application are presented. The method comprises acquiring at least one input, where the input is an image corresponding to a class of patent documents; encoding the image input via at least one first network; generating a set of vectors via the at least one first network, where the set of vectors corresponds to a partial representation of the image derived from the at least one first network; decoding the set of vectors, based on a predetermined text corpus that corresponds to the class of patent documents, via at least one second network; and obtaining the claim set corresponding to the image via the at least one second network.
    Type: Grant
    Filed: October 1, 2020
    Date of Patent: April 30, 2024
    Inventors: Shrey Pathak, Xin Gao
  • Patent number: 11961515
    Abstract: A method includes receiving a plurality of unlabeled audio samples corresponding to spoken utterances not paired with corresponding transcriptions. At a target branch of a contrastive Siamese network, the method also includes generating a sequence of encoder outputs for the plurality of unlabeled audio samples and modifying time characteristics of the encoder outputs to generate a sequence of target branch outputs. At an augmentation branch of a contrastive Siamese network, the method also includes performing augmentation on the unlabeled audio samples, generating a sequence of augmented encoder outputs for the augmented unlabeled audio samples, and generating predictions of the sequence of target branch outputs generated at the target branch. The method also includes determining an unsupervised loss term based on target branch outputs and predictions of the sequence of target branch outputs. The method also includes updating parameters of the audio encoder based on the unsupervised loss term.
    Type: Grant
    Filed: December 14, 2021
    Date of Patent: April 16, 2024
    Assignee: Google LLC
    Inventors: Jaeyoung Kim, Soheil Khorram, Hasim Sak, Anshuman Tripathi, Han Lu, Qian Zhang
  • Patent number: 11960837
    Abstract: Implementations set forth herein relate to providing selectable autofill suggestions, which correspond to application actions that are at least partially fulfilled using server command data—prior to a user selecting a particular selectable autofill suggestion. Proactively fulfilling command data in this way mitigates latency between user selection of a suggestion and fulfillment of a particular action. Initially, a partial input can be processed to generate autofill suggestions, which can be communicated to a server device for further processing. The autofill suggestions can also be rendered for selection at a touch display interface, thereby allowing a user to select one of the autofill suggestions. As command fulfillment data is provided by the server, the command fulfillment data can be available to a corresponding application(s) in order that any corresponding actions can be at least partially fulfilled prior to user selection.
    Type: Grant
    Filed: January 13, 2023
    Date of Patent: April 16, 2024
    Assignee: Google LLC
    Inventor: Keun Soo Yim
  • Patent number: 11961010
    Abstract: Provided is a method for performing entity linking between a surface entity mention in a surface text and entities of a knowledge graph, including supplying the surface text to a contextual text representation model, pooling contextual representations of the tokens of a surface entity mention in the surface text with contextual representations of the other tokens within the surface text to provide a contextual entity representation vector representing the surface entity mention; supplying an identifier of a candidate knowledge graph entity to a knowledge graph embedding model, to provide an entity node embedding vector and combining the contextual entity representation vector with the entity node embedding vector to generate an input vector applied to a fully connected layer which provides an unnormalized output transformed by a softmax function into a normalized output processed to classify whether the surface entity mention corresponds to the candidate knowledge graph entity.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: April 16, 2024
    Assignee: SIEMENS AKTIENGESELLSCHAFT
    Inventors: Rakebul Muff Hasan, Ulugbek Peter Kodirov
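The classification step described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function names, the tiny hand-written weights, and the use of concatenation as the way of "combining" the two vectors are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def link_probabilities(mention_vec, entity_vec, weights, biases):
    """Return [P(no match), P(match)] for one mention/candidate pair.

    Assumption: the two vectors are combined by simple concatenation.
    """
    x = list(mention_vec) + list(entity_vec)
    # Fully connected layer: one unnormalized output per class.
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

# Hypothetical 2-d vectors and weights, chosen only for illustration.
probs = link_probabilities(
    [1.0, 0.0],                      # contextual entity representation
    [1.0, 0.0],                      # knowledge graph node embedding
    weights=[[0.0, 0.0, 0.0, 0.0],   # "no match" class
             [1.0, 0.0, 1.0, 0.0]],  # "match" class
    biases=[0.0, 0.0],
)
is_match = probs[1] > probs[0]
```

In a real system the two input vectors would come from a contextual text model and a knowledge graph embedding model, and the layer weights would be learned rather than written by hand.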
  • Patent number: 11940896
    Abstract: Erroneous triggering of a device process caused by characteristics of a user's speech or behavior is efficiently prevented. Provided is an information processing device having a notification control unit configured to notify a user of information about a candidate speech/behavior estimated to be suitable as a trigger for executing a predetermined process, from among a plurality of speeches/behaviors extractable from a behavior log of the user. The notification control unit further asks the user whether the candidate speech/behavior estimated from the behavior log is to be applied as the trigger, and the candidate speech/behavior is estimated based on the number of times the speech/behavior is extracted from the behavior log.
    Type: Grant
    Filed: August 10, 2018
    Date of Patent: March 26, 2024
    Assignee: Sony Group Corporation
    Inventors: Hideo Nagasaka, Kei Takahashi, Junichi Shimizu
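The count-based estimation mentioned in the abstract above might look something like the sketch below. The function name `suggest_trigger`, the `min_count` threshold, and the sample log entries are hypothetical; the abstract only states that the candidate is estimated from how often a speech/behavior appears in the log.

```python
from collections import Counter

def suggest_trigger(behavior_log, min_count=3):
    """Return the most frequent logged behavior if seen at least min_count times."""
    counts = Counter(behavior_log)
    if not counts:
        return None
    behavior, count = counts.most_common(1)[0]
    return behavior if count >= min_count else None

# Hypothetical behavior log extracted from a user's history.
log = ["clap", "say 'lights on'", "clap", "wave", "clap"]
candidate = suggest_trigger(log)
```

A device following the abstract would then ask the user whether `candidate` should actually be applied as the trigger, rather than applying it silently.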
  • Patent number: 11935515
    Abstract: A method of generating a synthetic voice by capturing audio data, cutting it into discrete phoneme and pitch segments, forming superior phoneme and pitch segments by averaging segments having similar phoneme, pitch, and other sound qualities, and training neural networks to correctly concatenate the segments.
    Type: Grant
    Filed: December 27, 2021
    Date of Patent: March 19, 2024
    Inventor: Claude Polonov
  • Patent number: 11929058
    Abstract: Novel methods and systems for adapting a voice cloning synthesizer for a new speaker using real speech data are disclosed. Utterances from one or more target speakers are parameterized and are used to initialize an embedding vector for use with a voice synthesizer, by means of clustering the utterance data and determining the centroid of the data, using a speaker identification neural network, and/or by finding the closest stored embedded vector to the utterance data.
    Type: Grant
    Filed: August 18, 2020
    Date of Patent: March 12, 2024
    Assignee: DOLBY LABORATORIES LICENSING CORPORATION
    Inventors: Cong Zhou, Xiaoyu Liu, Michael Getty Horgan, Vivek Kumar
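Two of the initialization options named in the abstract above, taking the centroid of the utterance data and finding the closest stored embedding, can be sketched as below. This is an illustrative assumption, not the patented procedure: it presumes the utterances have already been parameterized as equal-length vectors, and the helper names and the Euclidean distance metric are choices made for the example.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def closest_stored(query, stored):
    """Name of the stored embedding nearest to query (Euclidean distance)."""
    return min(stored, key=lambda name: math.dist(query, stored[name]))

# Hypothetical parameterized utterances from a target speaker.
utterances = [[0.0, 2.0], [2.0, 4.0]]
initial_embedding = centroid(utterances)

# Hypothetical table of previously stored speaker embeddings.
stored = {"speaker_a": [1.0, 2.5], "speaker_b": [5.0, 5.0]}
nearest = closest_stored(initial_embedding, stored)
```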
  • Patent number: 11922119
    Abstract: Implementations set forth herein relate to providing selectable autofill suggestions, which correspond to application actions that are at least partially fulfilled using server command data—prior to a user selecting a particular selectable autofill suggestion. Proactively fulfilling command data in this way mitigates latency between user selection of a suggestion and fulfillment of a particular action. Initially, a partial input can be processed to generate autofill suggestions, which can be communicated to a server device for further processing. The autofill suggestions can also be rendered for selection at a touch display interface, thereby allowing a user to select one of the autofill suggestions. As command fulfillment data is provided by the server, the command fulfillment data can be available to a corresponding application(s) in order that any corresponding actions can be at least partially fulfilled prior to user selection.
    Type: Grant
    Filed: January 13, 2023
    Date of Patent: March 5, 2024
    Assignee: Google LLC
    Inventor: Keun Soo Yim
  • Patent number: 11922938
    Abstract: A multi-assistant speech-processing system that centrally determines multiple execution plans to respond to a user input. A central component determines whether a particular input should be processed using a requested assistant or a different assistant or should be terminated. Assistant handoff may be determined based on system policies as well as user input-specific data. A ranked list of execution options may be supplemented by augmented data corresponding to messages to a user. The system may attempt to execute plans in the ranked order until a plan succeeds.
    Type: Grant
    Filed: November 22, 2021
    Date of Patent: March 5, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Yaser Khan, Piyush Kandpal, Ritesh Patel, Mark Lawrence, Srinivas Palla, Ashish Rangole, Jason Wang
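The final sentence of the abstract above (attempting plans in ranked order until one succeeds) can be illustrated with a short sketch. The plan names, scores, and the convention that a failed plan raises an exception are assumptions for this example only.

```python
def run_ranked_plans(plans):
    """plans: list of (rank_score, name, action). Higher scores are tried first."""
    attempted = []
    for _, name, action in sorted(plans, key=lambda p: p[0], reverse=True):
        attempted.append(name)
        try:
            return name, action(), attempted
        except Exception:
            # Assumption: a failed plan raises; move on to the next option.
            continue
    return None, None, attempted

def failing_plan():
    raise RuntimeError("assistant unavailable")

def working_plan():
    return "played music"

winner, result, attempted = run_ranked_plans([
    (0.9, "requested_assistant", failing_plan),
    (0.6, "fallback_assistant", working_plan),
    (0.2, "terminate", lambda: "ended session"),
])
```

Here the top-ranked plan fails, so the second plan runs and the lowest-ranked option is never attempted.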
  • Patent number: 11922924
    Abstract: Method and apparatus for generating speech through multilingual neural text-to-speech (TTS) synthesis are provided in the present disclosure. A text input in at least a first language may be received. Speaker latent space information of a target speaker may be provided through a speaker encoder. Language latent space information of a second language may be provided through a language encoder. At least one acoustic feature may be generated, through an acoustic feature predictor, based on the text input, the speaker latent space information and the language latent space information of the second language. A speech waveform corresponding to the text input may be generated, through a neural vocoder, based on the at least one acoustic feature.
    Type: Grant
    Filed: May 21, 2020
    Date of Patent: March 5, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jingzhou Yang, Lei He
  • Patent number: 11915707
    Abstract: A speech recognition platform configured to receive an audio signal that includes speech from a user and perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech based on the ASR results and based on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform multiple actions corresponding to this intent. The platform may select a target action to perform, and may engage in a back-and-forth dialog to obtain information for completing the target action. The action may include streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user or launching an application for the user.
    Type: Grant
    Filed: June 14, 2021
    Date of Patent: February 27, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeff Bradley Beal, Kevin Robert Charter, Ajay Gopalakrishnan, Sumedha Arvind Kshirsagar, Nishant Kumar
  • Patent number: 11915683
    Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.
    Type: Grant
    Filed: February 14, 2022
    Date of Patent: February 27, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Adam Marek Gabrys, Jaime Lorenzo Trueba, Goeric Sydney Huybrechts
  • Patent number: 11915682
    Abstract: Techniques are disclosed that enable generation of an audio waveform representing synthesized speech based on a difference signal determined using an autoregressive model. Various implementations include using a distribution of the difference signal values to represent sounds found in human speech with a higher level of granularity than sounds not frequently found in human speech. Additional or alternative implementations include using one or more speakers of a client device to render the generated audio waveform.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: February 27, 2024
    Assignee: DeepMind Technologies Limited
    Inventors: Luis Carlos Cobo Rus, Nal Kalchbrenner, Erich Elsen, Chenjie Gu
  • Patent number: 11900948
    Abstract: Features are disclosed for automatically identifying a speaker. Artifacts of automatic speech recognition (“ASR”) and/or other automatically determined information may be processed against individual user profiles or models. Scores may be determined reflecting the likelihood that individual users made an utterance. The scores can be based on, e.g., individual components of Gaussian mixture models (“GMMs”) that score best for frames of audio data of an utterance. A user associated with the highest likelihood score for a particular utterance can be identified as the speaker of the utterance. Information regarding the identified user can be provided to components of a spoken language processing system, separate applications, etc.
    Type: Grant
    Filed: January 7, 2022
    Date of Patent: February 13, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Hugh Evan Secker-Walker, Baiyang Liu, Frederick Victor Weber
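As a rough sketch of the scoring idea in the abstract above, each user profile below is a small one-dimensional Gaussian mixture, each frame is scored by its best-scoring component, and the user with the highest total is picked. The one-dimensional features, the profile values, and the function names are assumptions for illustration; a real system would use multi-dimensional acoustic features and trained models.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-d Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def score_user(frames, gmm):
    """Sum, over frames, of the best-scoring weighted component.

    gmm is a list of (weight, mean, variance) tuples.
    """
    return sum(
        max(math.log(w) + log_gauss(x, m, v) for w, m, v in gmm)
        for x in frames
    )

def identify_speaker(frames, profiles):
    """Return (best_user, scores) for the given frames."""
    scores = {user: score_user(frames, gmm) for user, gmm in profiles.items()}
    return max(scores, key=scores.get), scores

# Hypothetical per-user models and utterance frames.
profiles = {
    "alice": [(0.5, 0.0, 1.0), (0.5, 1.0, 1.0)],
    "bob": [(0.5, 5.0, 1.0), (0.5, 6.0, 1.0)],
}
speaker, scores = identify_speaker([0.1, 0.9], profiles)
```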
  • Patent number: 11894006
    Abstract: The processing of audio signals during playback is provided, so that audio signals that fall below a specified threshold loudness level are processed to avoid making unwanted background noise audible. N-channel audio is received from a playback volume controller/leveler (101). The level of the audio is compared with a threshold level. If the level is greater than the threshold level, the audio is processed with a first amount of gain in accordance with a first dynamic range control (DRC) compression curve that is tuned for professionally produced audio. If the level is less than or equal to the threshold level, the audio is processed with a second amount of gain in accordance with a second DRC compression curve that is designed to avoid boosting unwanted background noise. After applying the gain to the audio, the audio is sent to a downstream device.
    Type: Grant
    Filed: July 18, 2019
    Date of Patent: February 6, 2024
    Assignee: DOLBY LABORATORIES LICENSING CORPORATION
    Inventors: Zhongjin Wang, Andrew Peter Reilly, Michael William Mason
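The threshold comparison in the abstract above can be sketched as follows. The two "curves" here are deliberately simplistic stand-ins: the threshold, target level, and ratio are hypothetical values, and the abstract does not specify the actual shape of either DRC compression curve.

```python
def drc_gain_db(level_db, threshold_db=-50.0, target_db=-20.0, ratio=2.0):
    """Choose a gain (in dB) from one of two hypothetical DRC curves."""
    if level_db > threshold_db:
        # First curve (assumed): move the level part of the way to a target.
        return (target_db - level_db) * (1.0 - 1.0 / ratio)
    # Second curve (assumed): apply no boost at or below the threshold,
    # so background noise is not made audible.
    return 0.0

def apply_gain(samples, gain_db):
    """Scale linear samples by a gain expressed in decibels."""
    scale = 10 ** (gain_db / 20)
    return [s * scale for s in samples]

quiet_program_gain = drc_gain_db(-40.0)   # above threshold: boosted
noise_floor_gain = drc_gain_db(-60.0)     # below threshold: left alone
```

With these made-up settings, audio at -40 dB receives 10 dB of gain while audio at -60 dB receives none.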
  • Patent number: 11893308
    Abstract: Example techniques involve invoking voice assistance for a media playback system. In some embodiments, a network microphone device (NMD) stores in memory a set of command information comprising a listing of playback commands and associated command criteria. The NMD captures a voice input and detects inclusion, within the voice input, of one or more particular playback commands from among the playback commands in the listing. In response, the NMD selects a local voice assistant that supports (a) one or more additional playback commands relative to a cloud-based voice assistant service (VAS) and (b) fewer non-playback commands relative to the cloud-based VAS, determines, via the local voice assistant, an intent in the captured voice input, and performs a response to the determined intent. The NMD forgoes selection of the cloud-based VAS when the local voice assistant is selected.
    Type: Grant
    Filed: March 28, 2022
    Date of Patent: February 6, 2024
    Assignee: Sonos, Inc.
    Inventors: Dayn Wilberding, John Tolomei
  • Patent number: 11881207
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for natural language processing. One of the methods includes receiving a voice input from a user device; generating a recognition output; receiving a user selection of one or more terms in the recognition output; receiving a user input of one or more letters replacing the user selected one or more terms; determining suggested correction candidates based in part on the user input and the voice input; and providing one or more suggested correction candidates to the user device as suggested corrected recognition outputs.
    Type: Grant
    Filed: March 23, 2022
    Date of Patent: January 23, 2024
    Assignee: Google LLC
    Inventors: Evgeny A. Cherepanov, Jakob Nicolaus Foerster, Vikram Sridar, Ishai Rabinovitz, Omer Tabach
  • Patent number: 11875811
    Abstract: A method includes receiving sound input features representative of sound received during an electronic conference, the sound including voice and input device activation sound, receiving an input event feature indicative of the input device activation, and processing the received sound input features and input event feature via a trained model to identify a stored spectral file to be subtracted from the received sound to suppress the input device activation sound.
    Type: Grant
    Filed: December 9, 2021
    Date of Patent: January 16, 2024
    Assignee: Lenovo (United States) Inc.
    Inventors: Scott Wentao Li, Robert J. Kapinos, Robert James Norton, Jr., Russell Speight Vanblon
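The subtraction step in the abstract above can be illustrated with a basic magnitude spectral subtraction. Note the loud simplification: where the abstract uses a trained model to identify the stored spectral file, this sketch substitutes a plain dictionary lookup keyed by a hypothetical event name, and the spectra are made-up four-bin magnitude lists.

```python
# Hypothetical stored magnitude spectra for input device sounds.
STORED_SPECTRA = {
    "key_click": [0.0, 0.25, 0.5, 0.5],
}

def subtract_spectrum(sound_mag, noise_mag, floor=0.0):
    """Subtract noise magnitudes bin by bin, never going below floor."""
    return [max(s - n, floor) for s, n in zip(sound_mag, noise_mag)]

def suppress_input_sound(sound_mag, event):
    """Stand-in for the trained model: pick a stored spectrum by event name."""
    noise_mag = STORED_SPECTRA.get(event)
    if noise_mag is None:
        return list(sound_mag)  # unknown event: leave the sound untouched
    return subtract_spectrum(sound_mag, noise_mag)

# Hypothetical magnitude spectrum of voice plus a key click.
cleaned = suppress_input_sound([1.0, 0.75, 0.5, 0.25], "key_click")
```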