Patents Examined by Matthew Baker

Speech/audio signal processing method and coding apparatus

Patent number: 10056090

Abstract: The present disclosure provides a speech/audio signal processing method based on wideband switching and a coding apparatus. The method includes: if a first wideband speech/audio signal is a harmonic signal, adjusting a determining condition for determining that a second wideband speech/audio signal is a harmonic signal, to obtain a first determining condition, where the first wideband speech/audio signal is a signal before wideband switching, and the second wideband speech/audio signal is a signal after the wideband switching; and determining, according to the first determining condition, whether the second wideband speech/audio signal is a harmonic signal. In the case of wideband switching, signal types of speech/audio signals remain as consistent as possible before and after the switching, so that continuity of the speech/audio signal decoded by a decoder device is ensured as much as possible, further improving speech communication service quality.

Type: Grant

Filed: December 5, 2014

Date of Patent: August 21, 2018

Assignee: Huawei Technologies Co., Ltd.

Inventors: Chen Hu, Zexin Liu, Lei Miao
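
The abstract for patent 10056090 above describes adjusting the harmonic-signal decision after a bandwidth switch when the signal before the switch was harmonic. A minimal sketch of that idea follows. The abstract does not say what the determining condition is or how it is adjusted, so the peak-to-average ratio feature, the threshold values, and the function names here are all hypothetical.

```python
# Illustrative only: the feature, thresholds, and names are assumptions,
# not taken from the patent.

def is_harmonic(peak_to_average_ratio, threshold):
    """Apply a (hypothetical) determining condition to one signal."""
    return peak_to_average_ratio >= threshold


def classify_after_switch(prev_was_harmonic, ratio,
                          base_threshold=2.0, relax=0.5):
    """Decide the signal type after switching.

    If the signal before the switch was harmonic, the condition is
    relaxed (assumed direction of adjustment) so the decision tends to
    stay consistent across the switch.
    """
    if prev_was_harmonic:
        threshold = base_threshold - relax  # adjusted ("first") condition
    else:
        threshold = base_threshold          # unadjusted condition
    return is_harmonic(ratio, threshold)
```

With these assumed values, a borderline ratio such as 1.7 is classified as harmonic only when the signal before the switch was also harmonic.
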
Systems and methods for manipulating electronic content based on speech recognition

Patent number: 10032465

Abstract: Systems and methods are disclosed for displaying electronic multimedia content to a user. One computer-implemented method for manipulating electronic multimedia content includes generating, using a processor, a speech model and at least one speaker model of an individual speaker. The method further includes receiving electronic media content over a network; extracting an audio track from the electronic media content; and detecting speech segments within the electronic media content based on the speech model. The method further includes detecting a speaker segment within the electronic media content and calculating a probability of the detected speaker segment involving the individual speaker based on the at least one speaker model.

Type: Grant

Filed: March 1, 2016

Date of Patent: July 24, 2018

Assignee: OATH INC.

Inventors: Peter F. Kocks, Guoning Hu, Ping-Hao Wu
Automated receiver message sentiment analysis, classification and prioritization

Patent number: 10007661

Abstract: Techniques are provided for performing automated operations to analyze and prioritize incoming user messages. An indication of a message sent to a recipient user is received. Based at least in part on configuration information associated with the recipient user, the received message is analyzed. Analyzing the received message includes at least one of determining sentiments associated with the received message, determining intentions associated with the received message, determining document classes associated with the received message, and generating summary information corresponding to the received message. Based at least in part on the analyzing of the received message, a prioritized listing of multiple messages associated with the recipient user, including the received message, is displayed to the recipient user.

Type: Grant

Filed: September 26, 2016

Date of Patent: June 26, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Gregg M. Arquero, Eli M. Dow, Syed F. Hossain, Joshua Schaeffer, Yunli Tang
Method and apparatus for performing speaker recognition

Patent number: 10008208

Abstract: Embodiments of the present invention perform speaker identification and verification by first prompting a user to speak a phrase that includes a common phrase component and a personal identifier. Then, the embodiments decompose the spoken phrase to locate the personal identifier. Finally, the embodiments identify and verify the user based on the results of the decomposing.

Type: Grant

Filed: September 18, 2014

Date of Patent: June 26, 2018

Assignee: Nuance Communications, Inc.

Inventors: Almog Aley-Raz, Kevin R. Farrell, Oshrit Yaron, Luca Scarpato
Recognizing address and point of interest speech received at a vehicle

Patent number: 10006777

Abstract: A system and method of recognizing speech received at a vehicle includes: receiving speech from a vehicle occupant via a microphone; determining whether the speech relates to a point of interest (POI) or an address without receiving a POI command prompt or an address command prompt in the speech from the vehicle occupant; selecting a POI function or an address function based on the determination; and processing the received speech to identify a POI or an address.

Type: Grant

Filed: October 2, 2015

Date of Patent: June 26, 2018

Assignee: GM Global Technology Operations LLC

Inventors: Gaurav Talwar, Xufang Zhao
Speech recognition with attention-based recurrent neural networks

Patent number: 9990918

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.

Type: Grant

Filed: October 19, 2017

Date of Patent: June 5, 2018

Assignee: Google LLC

Inventors: William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Noam M. Shazeer
Latency reduction for content playback

Patent number: 9990176

Abstract: Methods and devices for determining whether a local version of content is stored on an electronic device associated with a user account on a backend system are described herein. In a non-limiting embodiment, the backend system may track and monitor the content stored on the electronic device using the associated user account. If an individual speaks an utterance requesting a particular content item, the backend system may determine, prior to sending the content to the electronic device, whether a local version is stored within the electronic device's memory. If so, the backend system may instruct the electronic device to output the local version, thereby reducing the amount of bandwidth consumed. The backend system may further be capable of predictively generating and then caching certain audio data to the electronic device.

Type: Grant

Filed: June 28, 2016

Date of Patent: June 5, 2018

Assignee: Amazon Technologies, Inc.

Inventor: Timothy Thomas Gray
Method and system for constructed response grading

Patent number: 9984585

Abstract: A method and system for constructed response grading for spoken language is disclosed. The method and system are computer implemented and involve a crowdsourcing step to derive evaluation features. The method includes steps for: posting a speech test through an automated speech assessment tool; receiving candidate responses from candidates for the speech test; delivering the candidate responses to crowdsource volunteers; receiving crowdsourced responses from crowdsource volunteers, where the crowdsourced responses comprise a transcription of the speech test; deriving features from the transcription; and deriving individual scores based on the features, where the individual scores are representative of pronunciation score, fluency score, content organization score and grammar score of the spoken language for each candidate.

Type: Grant

Filed: September 18, 2014

Date of Patent: May 29, 2018

Inventors: Varun Aggarwal, Vinay Shashidhar
System and method for rapid customization of speech recognition models

Patent number: 9978363

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.

Type: Grant

Filed: June 12, 2017

Date of Patent: May 22, 2018

Assignee: NUANCE COMMUNICATIONS, INC.

Inventors: Srinivas Bangalore, Robert Bell, Diamantino Antonio Caseiro, Mazin Gilbert, Patrick Haffner
Deep tagging background noises

Patent number: 9972340

Abstract: In a computer system for navigating to a location in recorded content, a computer receives a descriptive term or phrase associated with a searchable tag. The searchable tag corresponds to a point-in-time at which a non-speech sound occurred during the recording of recorded content of a communication between a plurality of participants. The recorded content includes speech from one or more of the plurality of participants, the descriptive term includes an automatically generated phonetic translation of the non-speech sound, and the non-speech sound was transmitted to the plurality of participants during the recording. The computer navigates to a location in the recorded content corresponding to the point-in-time at which the non-speech sound occurred.

Type: Grant

Filed: July 27, 2016

Date of Patent: May 15, 2018

Assignee: International Business Machines Corporation

Inventors: Denise A. Bell, Lisa Seacat Deluca, Jana H. Jenkins, Jeffrey A. Kusnitz
Method for judgment of drinking using differential energy in time domain, recording medium and device for performing the method

Patent number: 9943260

Abstract: An alcohol consumption determination method includes detecting a plurality of effective frames of an input voice signal; detecting a difference signal of the original signal of each of the effective frames; detecting average energy of the original signal and average energy of the difference signal for each of the effective frames; and determining whether alcohol has been consumed based on a difference between the average energy of the original signal and the average energy of the difference signal for each effective frame. Accordingly, it is also possible to determine, from a remote location, whether a driver or an operator has consumed alcohol and the degree of inebriation by using a difference signal energy method based on a voice signal, thus preventing accidents caused by operating vehicles and machines under the influence of alcohol.

Type: Grant

Filed: April 2, 2014

Date of Patent: April 17, 2018

Assignee: FOUNDATION OF SOONGSIL UNIVERSITY—INDUSTRY COOPERATION

Inventors: Myung Jin Bae, Sang Gil Lee, Seong Geon Bae
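
The time-domain steps in the abstract for patent 9943260 above can be sketched as follows. This is a minimal illustration, assuming the effective frames have already been selected and that the difference signal is a first-order sample difference. The abstract does not give a decision rule, so the sketch stops at the per-frame energy gap, and all function names are hypothetical.

```python
# Illustrative only: first-order differencing and the function names are
# assumptions; the abstract does not specify them.

def difference_signal(frame):
    """First-order difference of a frame: x[n] - x[n-1]."""
    return [frame[i] - frame[i - 1] for i in range(1, len(frame))]


def average_energy(samples):
    """Mean of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)


def energy_gap(frame):
    """Average energy of the original frame minus that of its difference."""
    return average_energy(frame) - average_energy(difference_signal(frame))


def mean_energy_gap(frames):
    """Average the per-frame gap over all effective frames."""
    gaps = [energy_gap(f) for f in frames]
    return sum(gaps) / len(gaps)
```

A caller would compare `mean_energy_gap(frames)` against a calibrated threshold; both the threshold and the direction of the comparison would have to come from the patent's full description or from measured data.
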
Method and system for text-to-speech synthesis

Patent number: 9916825

Abstract: There are disclosed methods and systems for text-to-speech synthesis for outputting a synthetic speech having a selected speech attribute. First, an acoustic space model is trained based on a set of training data of speech attributes, using a deep neural network (DNN) to determine interdependency factors between the speech attributes in the training data, the DNN generating a single, continuous acoustic space model based on the interdependency factors, the acoustic space model thereby taking into account a plurality of interdependent speech attributes and allowing for modelling of a continuous spectrum of the interdependent speech attributes. Next, a text is received; a selection of one or more speech attributes is received, each speech attribute having a selected attribute weight; the text is converted into synthetic speech using the acoustic space model, the synthetic speech having the selected speech attribute; and the synthetic speech is outputted as audio having the selected speech attribute.

Type: Grant

Filed: September 13, 2016

Date of Patent: March 13, 2018

Assignee: YANDEX EUROPE AG

Inventor: Ilya Vladimirovich Edrenkin
On-line voice translation method and device

Patent number: 9910851

Abstract: Disclosed are an on-line voice translation method and device. The method comprises: conducting voice recognition on first voice information input by a first user, so as to obtain first recognition information; prompting the first user to confirm the first recognition information; translating the confirmed first recognition information to obtain and output first translation information; extracting, according to second information which is fed back by a second user, associated information corresponding to the second information; and correcting the first translation information according to the associated information and outputting the corrected translation information. By means of the on-line voice translation method and device, smooth communication can be ensured in cross-language exchanges.

Type: Grant

Filed: November 12, 2014

Date of Patent: March 6, 2018

Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.

Inventors: Haifeng Wang, Hua Wu
Method for judgment of drinking using differential frequency energy, recording medium and device for performing the method

Patent number: 9907509

Abstract: An alcohol consumption determination method includes: detecting an effective frame of an input voice signal; detecting a difference signal of an original signal of the effective frame; performing a fast Fourier transform on the original signal and the difference signal; and determining, in the frequency domain, whether alcohol has been consumed based on a slope difference between the fast-Fourier-transformed original signal and the fast-Fourier-transformed difference signal. Accordingly, it is also possible to determine, from a remote location, whether a driver or an operator has consumed alcohol and the degree of consumption, thus preventing an accident caused by an individual operating a vehicle under the influence.

Type: Grant

Filed: April 2, 2014

Date of Patent: March 6, 2018

Assignee: FOUNDATION OF SOONGSIL UNIVERSITY—INDUSTRY COOPERATION

Inventors: Myung Jin Bae, Sang Gil Lee, Seong Geon Bae
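
The frequency-domain comparison in the abstract for patent 9907509 above can be sketched roughly as follows. The sketch assumes a first-order sample difference and measures "slope" as the least-squares slope of the magnitude spectrum against the bin index. Neither choice is stated in the abstract, and the function names are hypothetical. A direct DFT stands in for the FFT to keep the example dependency-free; it yields the same spectrum, only more slowly.

```python
import cmath
import math

# Illustrative only: the slope definition and names are assumptions.


def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0..N/2, computed directly (O(N^2))."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def slope(values):
    """Least-squares slope of values against their index (needs >= 2 values)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def slope_difference(frame):
    """Spectral slope of the frame minus that of its difference signal."""
    diff = [frame[i] - frame[i - 1] for i in range(1, len(frame))]
    return slope(magnitude_spectrum(frame)) - slope(magnitude_spectrum(diff))
```

As with the time-domain variant, turning `slope_difference` into a yes/no decision needs a threshold that the abstract does not provide.
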
System and method for using semantic and syntactic graphs for utterance classification

Patent number: 9905223

Abstract: Disclosed herein are a system, method and computer readable medium storing instructions related to semantic and syntactic information in a language understanding system. The method embodiment of the invention is a method for classifying utterances during a natural language dialog between a human and a computing device. The method comprises receiving a user utterance; generating a semantic and syntactic graph associated with the received utterance; extracting all n-grams as features from the generated semantic and syntactic graph; and classifying the utterance. Classifying the utterance may be performed in any number of ways, such as using the extracted n-grams, semantic and syntactic graphs, or writing rules.

Type: Grant

Filed: December 9, 2015

Date of Patent: February 27, 2018

Assignee: Nuance Communications, Inc.

Inventors: Ananlada Chotimongkol, Dilek Z. Hakkani-Tur, Gokhan Tur
Generating topic-specific language models

Patent number: 9892730

Abstract: Speech recognition may be improved by generating and using a topic specific language model. A topic specific language model may be created by performing an initial pass on an audio signal using a generic or basis language model. A speech recognition device may then determine topics relating to the audio signal based on the words identified in the initial pass and retrieve a corpus of text relating to those topics. Using the retrieved corpus of text, the speech recognition device may create a topic specific language model. In one example, the speech recognition device may adapt or otherwise modify the generic language model based on the retrieved corpus of text.

Type: Grant

Filed: July 1, 2009

Date of Patent: February 13, 2018

Assignee: Comcast Interactive Media, LLC

Inventors: David F. Houghton, Seth Michael Murray, Sibley Verbeck Simon
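
The adaptation step in the abstract for patent 9892730 above, modifying a generic language model with a topic corpus, can be illustrated with a toy example. The abstract does not say how the adaptation is done. This sketch uses linear interpolation of unigram probabilities, which is one common approach, and the function names and the interpolation weight are hypothetical.

```python
from collections import Counter

# Illustrative only: unigram models and linear interpolation are
# assumptions, not the method claimed in the patent.


def unigram_model(text):
    """Relative word frequencies of a whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def adapt(generic, topic, weight=0.5):
    """Interpolate a generic model with a topic model.

    weight is the share given to the topic model (0 = generic only).
    """
    vocab = set(generic) | set(topic)
    return {
        w: (1 - weight) * generic.get(w, 0.0) + weight * topic.get(w, 0.0)
        for w in vocab
    }
```

Here `topic` would be built from the corpus retrieved for the topics found in the initial recognition pass, and the adapted model gives topic words a higher probability than the generic model alone.
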
Method and system for generating a definition of a word from multiple sources

Patent number: 9875232

Abstract: There is provided a method of performing an on-line definition of a first word, the first word received from a user of an electronic device via a communication network. The method can be executed at a server. The method comprises: obtaining a first definition set from a first source, the first definition set being based on the first word; obtaining a second definition set from a second source, the second definition set being based on the first word; parsing the first definition set to obtain individual first set words; parsing the second definition set to obtain individual second set words; organizing the individual first set words into at least one definition cluster; causing the electronic device to display to the user at least the first cluster.

Type: Grant

Filed: October 22, 2014

Date of Patent: January 23, 2018

Assignee: YANDEX EUROPE AG

Inventors: Andrey Nikolaevich Mikheev, Andrei Igorevich Shevchenko
Subject estimation system for estimating subject of dialog

Patent number: 9870768

Abstract: A subject estimation system includes a convolutional neural network to estimate a subject label of a dialog. The convolutional neural network includes: one or more topic-dependent convolutional layers and one topic-independent convolutional layer, each of the one or more topic-dependent convolutional layers performing, on an input of a word-string vector sequence corresponding to dialog text transcribed from a dialog, a convolution operation dependent on a topic, and the topic-independent convolutional layer performing, on the input of the word-string vector sequence, a convolution operation not dependent on the topic; a pooling layer performing a pooling process on outputs of the convolutional layers; and a fully connected layer performing a full connection process on outputs of the pooling layer.

Type: Grant

Filed: September 12, 2016

Date of Patent: January 16, 2018

Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.

Inventors: Hongjie Shi, Takashi Ushio, Mitsuru Endo, Katsuyoshi Yamagami
Pulse encoding and decoding method and pulse codec

Patent number: 9858938

Abstract: In a pulse encoding and decoding method and a pulse codec, more than two tracks are jointly encoded, so that free codebook space in the situation of single track encoding can be combined during joint encoding to become code bits that may be saved. Furthermore, a pulse that is on each track and required to be encoded is combined according to positions, and the number of positions having pulses, distribution of the positions that have pulses on the track, and the number of pulses on each position that has a pulse are encoded separately, so as to avoid separate encoding performed on multiple pulses of a same position, thereby further saving code bits.

Type: Grant

Filed: October 28, 2016

Date of Patent: January 2, 2018

Assignee: Huawei Technologies Co., Ltd.

Inventors: Fuwei Ma, Dejun Zhang
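
The position-combining step in the abstract for patent 9858938 above can be sketched as follows: pulses on a track are grouped by position, and three quantities are extracted for separate encoding. The sketch covers only this grouping. The index coding itself and the joint encoding across tracks are not shown, and the function name is hypothetical.

```python
from collections import Counter

# Illustrative only: shows the grouping of pulses by position,
# not the actual codebook or bit-level encoding.


def summarize_track(pulse_positions):
    """Return (number of occupied positions,
               sorted occupied positions,
               pulse count at each occupied position)."""
    counts = Counter(pulse_positions)
    positions = sorted(counts)
    return len(positions), positions, [counts[p] for p in positions]
```

For a track with pulses at positions `[3, 7, 3, 12, 7, 3]`, the three quantities are 3 occupied positions, the positions `[3, 7, 12]`, and the per-position counts `[3, 2, 1]`, so the repeated pulses at a position are described once by a count instead of being encoded individually.
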
Voice recognition grammar selection based on context

Patent number: 9858921

Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user.

Type: Grant

Filed: August 2, 2013

Date of Patent: January 2, 2018

Assignee: Google Inc.

Inventors: David P. Singleton, Debajit Ghosh
