Patents Examined by Leonard Saint-Cyr
  • Patent number: 10789427
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for training a machine learning model to perform multiple machine learning tasks from multiple machine learning domains. One system includes a machine learning model that includes multiple input modality neural networks corresponding to respective different modalities and being configured to map received data inputs of the corresponding modality to mapped data inputs from a unified representation space; an encoder neural network configured to process mapped data inputs from the unified representation space to generate respective encoder data outputs; a decoder neural network configured to process encoder data outputs to generate respective decoder data outputs from the unified representation space; and multiple output modality neural networks corresponding to respective different modalities and being configured to map decoder data outputs to data outputs of the corresponding modality.
    Type: Grant
    Filed: November 19, 2019
    Date of Patent: September 29, 2020
    Assignee: Google LLC
    Inventors: Noam M. Shazeer, Aidan Nicholas Gomez, Lukasz Mieczyslaw Kaiser, Jakob D. Uszkoreit, Llion Owen Jones, Niki J. Parmar, Ashish Teku Vaswani
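The architecture this abstract describes (modality-specific input/output networks around a shared encoder–decoder operating on a unified representation space) can be sketched in miniature. This is an illustrative toy, not the patented implementation: the class, the "text" modality mapping, and the placeholder scaling encoder/decoder are all invented for the example.

```python
class MultiModalModel:
    """Toy multi-modal model: per-modality nets around a shared core."""

    def __init__(self, dim):
        self.dim = dim
        self.input_nets = {}   # modality name -> fn(raw input) -> vector
        self.output_nets = {}  # modality name -> fn(vector) -> output

    def add_modality(self, name, to_unified, from_unified):
        self.input_nets[name] = to_unified
        self.output_nets[name] = from_unified

    def encode(self, vec):
        # shared encoder over the unified space (placeholder: scaling)
        return [2.0 * x for x in vec]

    def decode(self, vec):
        # shared decoder, output again in the unified space
        return [0.5 * x for x in vec]

    def run(self, in_modality, data, out_modality):
        unified = self.input_nets[in_modality](data)
        assert len(unified) == self.dim   # all modalities share one space
        return self.output_nets[out_modality](self.decode(self.encode(unified)))

model = MultiModalModel(dim=3)
# illustrative "text" modality: characters <-> vector of character codes
model.add_modality("text",
                   lambda s: [float(ord(c)) for c in s],
                   lambda v: "".join(chr(int(x)) for x in v))
out = model.run("text", "abc", "text")
```

The point of the structure is that adding a new modality only requires new input/output mappings; the encoder and decoder are shared across all of them.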
  • Patent number: 10777213
    Abstract: A device includes a receiver configured to receive an audio frame of an audio stream. The audio frame includes information that indicates a coded bandwidth of the audio frame. The device also includes a decoder configured to generate first decoded speech associated with the audio frame and to determine an output mode of the decoder based at least in part on the information that indicates the coded bandwidth. A bandwidth mode indicated by the output mode of the decoder is different than a bandwidth mode indicated by the information that indicates the coded bandwidth. The decoder is further configured to output second decoded speech based on the first decoded speech. The second decoded speech is generated according to the output mode of the decoder.
    Type: Grant
    Filed: August 3, 2018
    Date of Patent: September 15, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Venkatraman S. Atti, Venkata Subrahmanyam Chandra Sekhar Chebiyyam, Vivek Rajendran
  • Patent number: 10769385
    Abstract: A text string with a first and a second portion is provided. A domain of the text string is determined by applying a first word-matching process to the first portion of the text string. It is then determined whether the second portion of the text string matches a word of a set of words associated with the domain by applying a second word-matching process to the second portion of the text string. Upon determining that the second portion of the text string matches the word of the set of words, a user intent is determined from the text string based at least in part on the domain and the word of the set of words.
    Type: Grant
    Filed: November 29, 2018
    Date of Patent: September 8, 2020
    Assignee: APPLE INC.
    Inventor: Gunnar Evermann
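The two-stage matching flow above can be sketched as follows. Everything concrete here is invented for illustration (the domain keyword table, per-domain vocabularies, and intent rules); the abstract only specifies the two-pass structure.

```python
# First pass maps the first portion of the text to a domain; second
# pass checks the second portion against that domain's vocabulary.
DOMAIN_KEYWORDS = {"play": "media", "call": "phone"}
DOMAIN_VOCAB = {"media": {"music", "video"}, "phone": {"mom", "office"}}
INTENTS = {("media", "music"): "play_music", ("phone", "mom"): "call_contact"}

def infer_intent(text):
    first, _, second = text.partition(" ")
    domain = DOMAIN_KEYWORDS.get(first.lower())   # first word-matching pass
    if domain is None:
        return None
    word = second.lower()
    if word not in DOMAIN_VOCAB[domain]:          # second word-matching pass
        return None
    return INTENTS.get((domain, word))

result = infer_intent("play music")
```

The benefit of staging is that the second, domain-specific match only runs over the (much smaller) vocabulary selected by the first pass.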
  • Patent number: 10769369
    Abstract: Linguistic analysis based correlation of distinct events is provided. In examples, troubleshooting tickets that include one or more comments may be received over a time period. A linguistic analysis may be performed on one or more portions of the one or more comments using a linguistic model, and a similarity score may be computed for one or more keywords within the one or more portions of the one or more comments based on criteria associated with each of the keywords. The similarity score for each of the keywords may be compared to a validation threshold, and if the similarity score for a subset of the keywords within a troubleshooting ticket exceeds the validation threshold, the troubleshooting ticket may be validated as associated with an incident. If the number of troubleshooting tickets validated as being associated with the incident exceeds a service outage threshold, an alert may be issued for the service outage.
    Type: Grant
    Filed: April 24, 2018
    Date of Patent: September 8, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Pamela Bhattacharya
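The two-threshold scheme in this abstract (a validation threshold per ticket, then a service-outage threshold over validated tickets) can be sketched directly. The incident terms, the toy similarity function, and both threshold values are placeholders; the patent's linguistic model is far richer than word overlap.

```python
INCIDENT_TERMS = {"email", "outage", "login"}

def keyword_score(keyword):
    # toy similarity stand-in: 1.0 if the keyword names the incident
    return 1.0 if keyword in INCIDENT_TERMS else 0.0

def validate_ticket(comment, validation_threshold=0.5):
    # ticket is validated if any keyword's score clears the threshold
    scores = [keyword_score(w) for w in comment.lower().split()]
    return any(s > validation_threshold for s in scores)

def check_outage(tickets, outage_threshold=2):
    validated = sum(validate_ticket(t) for t in tickets)
    return validated > outage_threshold   # True -> issue an alert

alert = check_outage([
    "cannot login to email",
    "email outage since 9am",
    "printer jammed again",
    "login page down",
])
```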
  • Patent number: 10770085
    Abstract: An encoding method, a decoding method, an encoding apparatus, a decoding apparatus, a transmitter, a receiver, and a communications system, where the encoding method includes dividing a to-be-encoded time-domain signal into a low band signal and a high band signal, performing encoding on the low band signal to obtain a low frequency encoding parameter, performing encoding on the high band signal to obtain a high frequency encoding parameter, obtaining a synthesized high band signal, performing short-time post-filtering processing on the synthesized high band signal to obtain a short-time filtering signal, and calculating a high frequency gain based on the high band signal and the short-time filtering signal.
    Type: Grant
    Filed: January 3, 2019
    Date of Patent: September 8, 2020
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Bin Wang, Zexin Liu, Lei Miao
  • Patent number: 10762906
    Abstract: Automatically identifying speakers in real-time through media processing with dialog understanding. A plurality of audio streams may be received, an audio stream representing a speech of a participant speaking during an online meeting. A voice characteristic of a voice corresponding to the speech of the participant in the audio stream may be determined. The plurality of audio streams may be converted into text and a natural language processing may be performed to determine content context of the dialog. The natural language processing infers a name to associate with the voice in the audio stream based on the determined content context. A data structure linking the name with the voice may be created and stored in a knowledge base. A user interface associated with the online meeting application is triggered to present the name or identity of the speaker.
    Type: Grant
    Filed: May 1, 2018
    Date of Patent: September 1, 2020
    Assignee: International Business Machines Corporation
    Inventors: Marcio Ferreira Moreno, Helon Vicente Hultmann Ayala, Daniel Salles Chevitarese, Rafael R. de Mello Brandao, Renato Fontoura de Gusmao Cerqueira
  • Patent number: 10741172
    Abstract: A conference system includes an utterance indication processing unit configured to display text information representing utterance content of each speaker on a display unit of each of one or more terminals, and a notification unit configured to notify a speaker of a request to slow down a speech rate of the speaker.
    Type: Grant
    Filed: March 23, 2018
    Date of Patent: August 11, 2020
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Takashi Kawachi, Kazuhiro Nakadai, Tomoyuki Sahata, Syota Mori, Yuki Uezono, Kyosuke Hineno, Kazuya Maura
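The conference-system behavior above (display transcribed text per speaker, and notify a speaker who talks too fast) reduces to a small rate check. The words-per-second estimate and the limit are illustrative assumptions; the patent does not specify how speech rate is measured.

```python
MAX_WORDS_PER_SECOND = 3.0   # illustrative rate limit

def process_utterance(speaker, text, duration_s):
    """Return the display string for the terminals and a slow-down flag."""
    rate = len(text.split()) / duration_s
    display = f"{speaker}: {text}"          # text shown on each terminal
    notify = rate > MAX_WORDS_PER_SECOND    # request to slow the speech rate
    return display, notify

display, notify = process_utterance(
    "Mori", "please review the agenda items now", 1.5)
```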
  • Patent number: 10742577
    Abstract: A method and system are disclosed for evaluating a chat message sent between users of an online environment. The method may include receiving a chat message and parsing the message into words. The method determines the acceptability of the message by matching the message to a plurality of acceptable messages stored in a data structure. Upon determining the message does not match any acceptable messages, the method replaces each word in the message with grammatical metadata. The method may use templates to determine if the message has acceptable word combinations based on the metadata. The method may also compare the metadata to rules, wherein the rules determine if the message has unacceptable word combinations based on the metadata. The method may send the message to a user upon determining that words in the message do not match any word in a list of unacceptable words.
    Type: Grant
    Filed: October 21, 2013
    Date of Patent: August 11, 2020
    Assignee: Disney Enterprises, Inc.
    Inventors: Sean O'Dell, Paul Pak, Drew Beechum, Vita Markman, Marc Silbey
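The layered checks this abstract describes (whitelist match first, then grammatical-metadata templates, then rules over word combinations) can be sketched as below. The lexicon, tag names, template, and banned combination are all invented for the example.

```python
ACCEPTABLE = {"hi there"}   # exact-match whitelist of acceptable messages
POS = {"hi": "GREET", "there": "ADV",
       "trade": "VERB", "me": "PRON", "items": "NOUN"}
TEMPLATES = {("GREET", "ADV"), ("VERB", "PRON", "NOUN")}
BANNED_COMBOS = {("VERB", "PRON")}   # rules flagging unacceptable pairs

def evaluate(message):
    if message in ACCEPTABLE:
        return True
    # replace each word with grammatical metadata (a POS tag here)
    tags = tuple(POS.get(w, "UNK") for w in message.split())
    if tags not in TEMPLATES:          # template check on tag sequence
        return False
    for pair in zip(tags, tags[1:]):   # rule check on adjacent tag pairs
        if pair in BANNED_COMBOS:
            return False
    return True

ok = evaluate("hi there")
blocked = evaluate("trade me items")
```

Matching on metadata rather than raw words lets one template or rule cover whole families of messages.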
  • Patent number: 10740566
    Abstract: An agent automation system includes a memory configured to store a corpus of utterances and a semantic mining framework and a processor configured to execute instructions of the semantic mining framework to cause the agent automation system to perform actions, wherein the actions include: detecting intents within the corpus of utterances; producing intent vectors for the intents within the corpus; calculating distances between the intent vectors; generating meaning clusters of intent vectors based on the distances; detecting stable ranges of cluster radius values for the meaning clusters; and generating an intent/entity model from the meaning clusters and the stable ranges of cluster radius values, wherein the agent automation system is configured to use the intent/entity model to classify intents in received natural language requests.
    Type: Grant
    Filed: November 2, 2018
    Date of Patent: August 11, 2020
    Assignee: ServiceNow, Inc.
    Inventors: Edwin Sapugay, Anil Kumar Madamala, Maxim Naboka, Srinivas SatyaSai Sunkara, Lewis Savio Landry Santos, Murali B. Subbarao
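The mining pipeline above (intent vectors, pairwise distances, radius-based clustering, and a scan for radius values where the cluster count is stable) can be sketched with a small union-find. The vectors, radii, and the particular clustering rule are illustrative; the patent does not pin down these details.

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cluster_count(vectors, radius):
    # union-find: merge any two intent vectors closer than the radius
    parent = list(range(len(vectors)))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if dist(vectors[i], vectors[j]) < radius:
                parent[find(j)] = find(i)
    return len({find(i) for i in range(len(vectors))})

intent_vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
radii = [0.2, 0.5, 1.0, 2.0, 8.0]
counts = [cluster_count(intent_vectors, r) for r in radii]
# a "stable range" is a run of radius values with an unchanged count
stable = [r for r, c in zip(radii, counts) if c == 2]
```

Here the cluster count holds at 2 over a wide band of radii before collapsing to 1, which is the kind of stability the model-building step looks for.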
  • Patent number: 10733984
    Abstract: Systems and methods of the present technical solution enable a multi-modal interface for voice-based devices, such as digital assistants. The solution can enable a user to interact with video and other content through a touch interface and through voice commands. In addition to inputs such as stop and play, the present solution can also automatically generate annotations for displayed video files. From the annotations, the solution can identify one or more break points that are associated with different scenes, video portions, or how-to steps in the video. The digital assistant can receive an input audio signal and parse the input audio signal to identify semantic entities within the input audio signal. The digital assistant can map the identified semantic entities to the annotations to select a portion of the video that corresponds to the user's request in the input audio signal.
    Type: Grant
    Filed: May 7, 2018
    Date of Patent: August 4, 2020
    Assignee: Google LLC
    Inventors: Masoud Loghmani, Anshul Kothari, Ananth Devulapalli
  • Patent number: 10734010
    Abstract: Embodiments relate to an audio processing unit that includes a buffer, bitstream payload deformatter, and a decoding subsystem. The buffer stores at least one block of an encoded audio bitstream. The block includes a fill element that begins with an identifier followed by fill data. The fill data includes at least one flag identifying whether enhanced spectral band replication (eSBR) processing is to be performed on audio content of the block. A corresponding method for decoding an encoded audio bitstream is also provided.
    Type: Grant
    Filed: September 12, 2019
    Date of Patent: August 4, 2020
    Assignee: Dolby International AB
    Inventors: Lars Villemoes, Heiko Purnhagen, Per Ekstrand
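The bitstream structure this abstract describes (a block containing a fill element that begins with an identifier, followed by fill data carrying an eSBR flag) can be sketched as a byte scan. The identifier value, the byte layout, and the flag position below are invented for illustration; they are not the actual AAC/USAC fill-element syntax.

```python
FILL_ELEMENT_ID = 0x06   # illustrative identifier byte

def esbr_flag_from_block(block):
    """Scan a block (bytes) for a fill element and return its eSBR flag."""
    i = 0
    while i < len(block) - 1:
        if block[i] == FILL_ELEMENT_ID:
            fill_data = block[i + 1]           # byte after the identifier
            return bool(fill_data & 0x01)      # lowest bit: eSBR on/off
        i += 1
    return False   # no fill element found -> no eSBR processing

block = bytes([0x21, 0x06, 0x01, 0x44])   # fill element with flag set
use_esbr = esbr_flag_from_block(block)
```

The decoder would consult this flag before deciding whether to run enhanced spectral band replication on the block's audio content.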
  • Patent number: 10720149
    Abstract: Techniques to dynamically customize a menu system presented to a user by a voice interaction system are provided. Audio data from a user that includes the speech of a user can be received. Features can be extracted from the received audio data, including a vocabulary of the speech of the user. The extracted features can be compared to features associated with a plurality of user group models. A user group model to assign to the user from the plurality of user group models can be determined based on the comparison. The user group models can cluster users together based on estimated characteristics of the users and can specify customized menu systems for each different user group. Audio data can then be generated and provided to the user in response to the received audio data based on the determined user group model assigned to the user.
    Type: Grant
    Filed: October 23, 2018
    Date of Patent: July 21, 2020
    Assignee: Capital One Services, LLC
    Inventors: Reza Farivar, Jeremy Edward Goodsitt, Fardin Abdi Taghi Abad, Austin Grant Walters
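The grouping step above (extract features from a caller's speech, compare them to per-group profiles, assign the nearest group, and select that group's customized menu) can be sketched as a nearest-profile lookup. The groups, the single vocabulary-richness feature, and the menus are all invented for the example.

```python
GROUP_PROFILES = {
    "novice": {"vocab_richness": 0.2},
    "expert": {"vocab_richness": 0.8},
}
GROUP_MENUS = {
    "novice": ["balance", "help"],
    "expert": ["balance", "transfer", "wire"],
}

def extract_features(words):
    # toy feature: fraction of distinct words in the utterance
    return {"vocab_richness": len(set(words)) / len(words)}

def assign_group(features):
    # nearest group profile by feature distance
    return min(GROUP_PROFILES,
               key=lambda g: abs(GROUP_PROFILES[g]["vocab_richness"]
                                 - features["vocab_richness"]))

words = ["wire", "transfer", "to", "brokerage", "account", "today"]
group = assign_group(extract_features(words))
menu = GROUP_MENUS[group]
```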
  • Patent number: 10714103
    Abstract: Provided is an encoding apparatus for integrally encoding and decoding a speech signal and an audio signal, which may include: an input signal analyzer to analyze a characteristic of an input signal; a stereo encoder to down mix the input signal to a mono signal when the input signal is a stereo signal, and to extract stereo sound image information; a frequency band expander to expand a frequency band of the input signal; a sampling rate converter to convert a sampling rate; a speech signal encoder to encode the input signal using a speech encoding module when the input signal is a speech characteristic signal; an audio signal encoder to encode the input signal using an audio encoding module when the input signal is an audio characteristic signal; and a bitstream generator to generate a bitstream.
    Type: Grant
    Filed: August 30, 2019
    Date of Patent: July 14, 2020
    Assignees: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, KWANGWOON UNIVERSITY INDUSTRY-ACADEMIC COLLABORATION FOUNDATION
    Inventors: Tae Jin Lee, Seung-Kwon Baek, Min Je Kim, Dae Young Jang, Jeongil Seo, Kyeongok Kang, Jin-Woo Hong, Hochong Park, Young-Cheol Park
  • Patent number: 10706866
    Abstract: An audio signal encoding method and a mobile phone, where the audio signal encoding method includes obtaining a digital audio signal in time domain; transforming the digital audio signal in time domain to an audio signal in frequency domain, where the audio signal in frequency domain comprises a current frame that comprises a plurality of subbands; obtaining reference parameters of the plurality of subbands; encoding, using an HQ algorithm, the current frame to obtain an encoded audio signal when the reference parameters meet a preset parameter condition; and transmitting the encoded audio signal via a network. The audio signal encoding method and the mobile phone help improve encoding quality or encoding efficiency in audio signal encoding.
    Type: Grant
    Filed: October 30, 2019
    Date of Patent: July 7, 2020
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zexin Liu, Lei Miao
  • Patent number: 10706844
    Abstract: The present disclosure relates to an information processing apparatus, an information processing method, and a program that are capable of providing better user experience. An information processing apparatus includes an activation word setting unit that sets, on the basis of a detection result of detecting a user operation, a word used as an activation word for activating a predetermined function, the activation word being uttered by a user, the number of activation words being increased or decreased by the setting; and an activation word recognition unit that performs speech recognition on speech uttered by the user and recognizes that the word set by the activation word setting unit to be used as the activation word is uttered. The present technology is applicable to, for example, a wearable terminal provided with a speech recognition function.
    Type: Grant
    Filed: May 6, 2016
    Date of Patent: July 7, 2020
    Assignee: SONY CORPORATION
    Inventor: Hiroaki Ogawa
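The activation-word mechanism above (a setter that grows or shrinks the set of activation words based on a detected user operation, plus a recognizer that only fires on words currently in the set) can be sketched as below. The operation names and the alias words are illustrative assumptions.

```python
class ActivationWords:
    def __init__(self, base):
        self.words = set(base)   # current set of activation words

    def on_user_operation(self, operation):
        # e.g. raising a wearable to the mouth enables short aliases,
        # increasing the number of activation words; lowering removes them
        if operation == "raise_to_mouth":
            self.words |= {"go", "ok"}
        elif operation == "lower":
            self.words -= {"go", "ok"}

    def recognizes(self, utterance):
        # activation-word recognition over the current set
        return utterance.lower() in self.words

aw = ActivationWords(["hello agent"])
before = aw.recognizes("go")
aw.on_user_operation("raise_to_mouth")
after = aw.recognizes("go")
```

Shrinking the set when the device is unlikely to be addressed reduces false activations; growing it when the user clearly intends to speak makes activation easier.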
  • Patent number: 10706857
    Abstract: An apparatus including a multi time-frequency resolution convolution neural network module; a two dimensional convolution neural network layers module; and a discriminative fully-connected classifier layers module; wherein the multi time-frequency resolution convolution neural network module receives a raw speech signal from a human speaker and processes the raw speech signal to provide a first processed output in the form of multiple multi time-frequency resolution spectrographic feature maps; wherein the two dimensional convolution neural network layers module processes the first processed output to provide a second processed output; and wherein the discriminative fully-connected classifier layers module processes the second processed output to provide a third processed output, wherein the third processed output provides an indication of an identify of a human speaker or provides an indication of verification of the identify of a human speaker.
    Type: Grant
    Filed: April 20, 2020
    Date of Patent: July 7, 2020
    Assignee: KAIZEN SECURE VOIZ, INC.
    Inventors: Viswanathan Ramasubramanian, Sunderrajan Kumar
  • Patent number: 10706842
    Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. Various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
    Type: Grant
    Filed: January 14, 2019
    Date of Patent: July 7, 2020
    Assignee: GOOGLE LLC
    Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
  • Patent number: 10699713
    Abstract: A server receives a user audio stream, the stream comprising multiple utterances. A query-processing module of the server continuously listens to and processes the utterances. The processing includes parsing successive utterances and recognizing corresponding queries, taking appropriate actions while the utterances are being received. In some embodiments, a query may be parsed and executed before the previous query's execution is complete.
    Type: Grant
    Filed: April 18, 2019
    Date of Patent: June 30, 2020
    Assignee: SoundHound, Inc.
    Inventors: Scott Halstvedt, Bernard Mont-Reynaud, Kazi Asif Wadud
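The continuous-listening loop above (parse successive utterances from the stream and act on each recognized query without waiting for the stream to end) can be sketched as follows. The keyword parser and action table are placeholders for the patent's query-processing module.

```python
ACTIONS = {"weather": "fetch_forecast", "timer": "start_timer"}

def parse_query(utterance):
    # placeholder parser: recognize a query by keyword
    for keyword, action in ACTIONS.items():
        if keyword in utterance.lower():
            return action
    return None

def process_stream(utterances):
    executed = []
    for utt in utterances:            # processed while still "listening"
        action = parse_query(utt)
        if action:
            executed.append(action)   # act before later utterances arrive
    return executed

log = process_stream(["what's the weather", "set a timer", "thanks"])
```

In a real system the loop body would dispatch each action asynchronously, which is what allows one query to execute while the next is still being parsed.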
  • Patent number: 10692489
    Abstract: A system and method for incorporating motion into a speech processing system. A wearable device that is capable of both capturing spoken utterances and capturing motion data may be used to interact with a speech processing system. In certain circumstances, such as when voice communication is unreliable (due to noise) or when controlling the system by motion is desired, motion of a device may be used to provide input to a speech processing system. For example, sensor data or gesture data resulting from movement of a device may be processed and input into a natural language system as representative of a spoken command portion or other input. The motion information may be interpreted to provide prompts to the system (e.g., "yes," "no," etc.), to perform certain commands (skip, forward, back, cancel) or to otherwise control the system.
    Type: Grant
    Filed: December 23, 2016
    Date of Patent: June 23, 2020
    Assignee: Amazon Technologies, Inc.
    Inventor: Travis Grizzel
  • Patent number: 10692485
    Abstract: A system and method for associating motion data with utterance audio data for use with a speech processing system. A device, such as a wearable device, may be capable of capturing utterance audio data and sending it to a remote server for speech processing, for example for execution of a command represented in the utterance. The device may also capture motion data using motion sensors of the device. The motion data may correspond to gestures, such as head gestures, that may be interpreted by the speech processing system to determine and execute commands. The device may associate the motion data with the audio data so the remote server knows what motion data corresponds to what portion of audio data for purposes of interpreting and executing commands. Metadata sent with the audio data and/or motion data may include association data such as timestamps, session identifiers, message identifiers, etc.
    Type: Grant
    Filed: December 23, 2016
    Date of Patent: June 23, 2020
    Assignee: Amazon Technologies, Inc.
    Inventor: Travis Grizzel
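The association step above (metadata such as timestamps and session identifiers sent with the audio and motion data, so the server knows which motion corresponds to which portion of audio) can be sketched as a time-window join. The record shapes and gesture names are invented for illustration.

```python
def associate(audio_chunks, motion_events):
    """Map each audio chunk id to the gestures that fall in its window."""
    pairs = {c["id"]: [] for c in audio_chunks}
    for m in motion_events:
        for c in audio_chunks:
            # same session, and the motion timestamp inside the chunk window
            if (m["session"] == c["session"]
                    and c["start"] <= m["ts"] < c["end"]):
                pairs[c["id"]].append(m["gesture"])
    return pairs

audio_chunks = [
    {"id": "a1", "session": "s1", "start": 0.0, "end": 2.0},
    {"id": "a2", "session": "s1", "start": 2.0, "end": 4.0},
]
motion_events = [
    {"session": "s1", "ts": 0.5, "gesture": "nod"},        # e.g. "yes"
    {"session": "s1", "ts": 3.1, "gesture": "head_shake"}, # e.g. "no"
]
pairs = associate(audio_chunks, motion_events)
```

With this join in place, the speech processor can interpret a nod captured during a confirmation prompt as "yes" for exactly that portion of the dialog.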