Dynamic Time Warping Patents (Class 704/241)
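The patents below are classified under dynamic time warping (DTW), the classic algorithm for aligning two sequences that may proceed at different rates. As background, here is a minimal DTW sketch (an illustration of the technique, not taken from any listed patent):

```python
def dtw_distance(a, b):
    """Return the cumulative DTW cost of aligning sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeated
                                 cost[i][j - 1],      # b[j-1] repeated
                                 cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]
```

Because the warping path may stretch either sequence, two series with the same shape but different lengths can still align at zero cost.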
  • Patent number: 11902690
    Abstract: Techniques performed by a data processing system for a machine learning driven teleprompter include displaying a teleprompter transcript associated with a presentation on a display of a computing device associated with a presenter; receiving audio content of the presentation including speech of the presenter in which the presenter is reading the teleprompter transcript; analyzing the audio content of the presentation using a first machine learning model to obtain a real-time textual representation of the audio content, the first machine learning model being a natural language processing model trained to receive audio content including speech and to translate the audio content into a textual representation of the speech; analyzing the real-time textual representation and the teleprompter transcript with a second machine learning model to obtain transcript position information; and automatically scrolling the teleprompter transcript on the display of the computing device based on the transcript position information.
    Type: Grant
    Filed: January 19, 2022
    Date of Patent: February 13, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Chakkaradeep Chinnakonda Chandran, Stephanie Lorraine Horn, Michael Jay Gilmore, Tarun Malik, Sarah Zaki, Tiffany Michelle Smith, Shivani Gupta, Pranjal Saxena, Ridhima Gupta
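The second model's task of locating the spoken words within the transcript can be approximated with a simple sliding-window word match. A hypothetical sketch, not Microsoft's actual model:

```python
def transcript_position(recognized_words, transcript_words, window=5):
    """Return the index in transcript_words just past the span that best
    matches the last `window` recognized words: a crude scroll target."""
    tail = recognized_words[-window:]
    best_idx, best_score = 0, -1
    for i in range(len(transcript_words) - len(tail) + 1):
        # count exact word matches between the tail and this window
        score = sum(1 for a, b in zip(tail, transcript_words[i:i + len(tail)])
                    if a == b)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx + len(tail)
```

A production system would tolerate recognition errors (e.g. with edit distance) rather than require exact word matches.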
  • Patent number: 11849193
    Abstract: Methods, systems, and apparatuses are described to implement voice search in media content: requesting media content of a video clip of a scene contained in the media content streamed to the client device; capturing the voice request for the media content of the video clip to display at the client device, wherein the streamed media content is a selected video streamed from a video source; applying an NLP solution to convert the voice request to text for matching to a set of one or more words contained in at least closed caption text of the selected video; associating matched words of the closed caption text with a start index and an end index of the video clip contained in the selected video; and streaming the video clip to the client device based on the start index and the end index associated with the matched closed caption text.
    Type: Grant
    Filed: October 19, 2022
    Date of Patent: December 19, 2023
    Inventor: Mayank Verma
  • Patent number: 11830473
    Abstract: A system for synthesising expressive speech includes: an interface configured to receive an input text for conversion to speech; a memory; and at least one processor coupled to the memory. The processor is configured to generate, using an expressivity characterisation module, a plurality of expression vectors, wherein each expression vector is a representation of prosodic information in a reference audio style file, and synthesise expressive speech from the input text, using an expressive acoustic model comprising a deep convolutional neural network that is conditioned by at least one of the plurality of expression vectors.
    Type: Grant
    Filed: September 29, 2020
    Date of Patent: November 28, 2023
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jesus Monge Alvarez, Holly Francois, Hosang Sung, Seungdo Choi, Kihyun Choo, Sangjun Park
  • Patent number: 11785278
    Abstract: Alignment between closed caption and audio/video content may be improved by determining text associated with a portion of the audio or a portion of the video and comparing the determined text to a portion of closed caption text. Based on the comparison, a delay may be determined and the audio/video content may be buffered based on the determined delay.
    Type: Grant
    Filed: March 18, 2022
    Date of Patent: October 10, 2023
    Assignee: Comcast Cable Communications, LLC
    Inventor: Christopher Stone
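The delay between captions and audio can be estimated by time-stamping words recognized from the audio and comparing them against caption cue times. A minimal sketch under assumed `(word, time)` inputs, not Comcast's implementation:

```python
def estimate_caption_delay(asr_words, caption_words):
    """asr_words / caption_words: lists of (word, time_seconds).
    Returns the median offset between matching words, i.e. how far
    the captions lag (positive) or lead (negative) the audio."""
    caption_times = {}
    for word, t in caption_words:
        caption_times.setdefault(word, []).append(t)
    offsets = []
    for word, t in asr_words:
        if caption_times.get(word):
            offsets.append(caption_times[word].pop(0) - t)
    offsets.sort()
    return offsets[len(offsets) // 2] if offsets else 0.0
```

The median makes the estimate robust to a few misrecognized words; the result can then drive the buffering the abstract describes.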
  • Patent number: 11736773
    Abstract: Systems and methods for generating audible pronunciation of a closed captioning word in a content item. For example, a system generates for output on a first device a content item comprising dialogue. The system generates for display on the first device a closed captioning word corresponding to the dialogue where the closed captioning word is selectable via a user interface of the first device. The system receives a selection of the closed captioning word via the user interface of the first device. In response to receiving the selection of the closed captioning word, the system generates for playback on the first device at least a portion of the dialogue corresponding to the selected closed captioning word.
    Type: Grant
    Filed: October 15, 2021
    Date of Patent: August 22, 2023
    Assignee: Rovi Guides, Inc.
    Inventor: Serhad Doken
  • Patent number: 11683558
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to determine the speed-up of media programs using speech recognition. An example apparatus disclosed herein is to perform speech recognition on a first audio clip collected by a media meter to recognize a first text string associated with the first audio clip, compare the first text string to a plurality of reference text strings associated with a corresponding plurality of reference audio clips to identify a matched one of the reference text strings, and estimate a presentation rate of the first audio clip based on a first time associated with the first audio clip and a second time associated with a first one of the reference audio clips corresponding to the matched one of the reference text strings.
    Type: Grant
    Filed: December 29, 2021
    Date of Patent: June 20, 2023
    Assignee: THE NIELSEN COMPANY (US), LLC
    Inventor: Morris Lee
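Once the recognized text is matched to a reference clip, the speed-up falls out of the ratio of the two durations. A hypothetical sketch with exact-text matching standing in for the string comparison:

```python
def estimate_speedup(clip_text, clip_duration, references):
    """references: list of (text, duration_seconds) for reference clips.
    Find the reference whose text matches the recognized clip text and
    return the presentation-rate factor (>1.0 means sped-up playback)."""
    for ref_text, ref_duration in references:
        if ref_text == clip_text:
            return ref_duration / clip_duration
    return None  # no matching reference clip
```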
  • Patent number: 11615800
    Abstract: A speaker recognition system for assessing the identity of a speaker through a speech signal based on speech uttered by said speaker is provided. The system includes a framing module that subdivides the speech signal over time into a set of frames, and a filtering module that analyzes the frames of the set to discard frames affected by noise and frames which do not comprise a speech, based on a spectral analysis of the frames. A feature extraction module extracts audio features from frames which have not been discarded, and a classification module processes the audio features extracted from the frames which have not been discarded for assessing the identity of the speaker.
    Type: Grant
    Filed: April 18, 2018
    Date of Patent: March 28, 2023
    Assignee: TELECOM ITALIA S.p.A.
    Inventors: Igor Bisio, Cristina Fra', Chiara Garibotto, Fabio Lavagetto, Andrea Sciarrone, Massimo Valla
  • Patent number: 11509969
    Abstract: Methods, Systems, and Apparatuses are described to implement voice search in media content for requesting media content of a video clip of a scene contained in the media content streamed to the client device; for capturing the voice request for the media content of the video clip to display at the client device wherein the streamed media content is a selected video streamed from a video source; for applying a NLP solution to convert the voice request to text for matching to a set of one or more words contained in at least close caption text of the selected video; for associating matched words to close caption text with a start index and an end index of the video clip contained in the selected video; and for streaming the video clip to the client device based on the start index and the end index associated with matched closed caption text.
    Type: Grant
    Filed: May 25, 2021
    Date of Patent: November 22, 2022
    Inventor: Mayank Verma
  • Patent number: 11445266
    Abstract: Audiovisual content in the form of video clip files, whether streamed or broadcast, may also contain subtitles. Such subtitles carry timing information so that each subtitle is displayed synchronously with the spoken words. At times, however, this synchronization with the audio portion of the audiovisual content has a timing offset, which becomes bothersome when it exceeds a predetermined threshold. The system and method determine time spans in which a human speaks and attempt to synchronize those time spans with the subtitle content. An indication is provided when an incurable synchronization error exists, as well as when the subtitles and audio are well synchronized. When an offset exists, the system further determines the type of offset (constant or dynamic) and provides the adjustment information needed to correct the provided subtitle timing and resolve the synchronization deficiency.
    Type: Grant
    Filed: March 12, 2021
    Date of Patent: September 13, 2022
    Assignee: IChannel.IO Ltd.
    Inventor: Oren Jack Maurice
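The constant-versus-dynamic distinction can be made from per-span offsets (speech-span start minus subtitle start): a constant offset is fixable with a single shift, a drifting one needs a time-varying correction. A simplified sketch with a hypothetical tolerance parameter:

```python
def classify_offset(offsets, tol=0.1):
    """offsets: per-span (speech start - subtitle start) values in seconds.
    Returns 'synchronized', 'constant', or 'dynamic'."""
    mean = sum(offsets) / len(offsets)
    spread = max(offsets) - min(offsets)
    if abs(mean) <= tol and spread <= tol:
        return "synchronized"
    if spread <= tol:
        return "constant"   # one global shift repairs the timing
    return "dynamic"        # offset drifts; needs per-span correction
```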
  • Patent number: 11270123
    Abstract: Embodiments described herein provide a system for localized contextual video annotation. During operation, the system can segment a video into a plurality of segments based on a segmentation unit and parse a respective segment for generating multiple input modalities for the segment. A respective input modality can indicate a form of content in the segment. The system can then classify the segment into a set of semantic classes based on the input modalities and determine an annotation for the segment based on the set of semantic classes.
    Type: Grant
    Filed: September 21, 2020
    Date of Patent: March 8, 2022
    Assignee: Palo Alto Research Center Incorporated
    Inventors: Karunakaran Sureshkumar, Raja Bala
  • Patent number: 11223878
    Abstract: An electronic device is disclosed. The electronic device comprises: a microphone for receiving voice; a memory for storing a plurality of text sets; and a processor for converting the voice, received via the microphone, into text, searching for words common to the converted text with respect to each of the plurality of text sets, and determining at least one text set of the plurality of text sets on the basis of the ratio of the searched common words.
    Type: Grant
    Filed: October 25, 2018
    Date of Patent: January 11, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Jae Hyun Bae
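The selection step, picking the stored text set with the highest ratio of words in common with the converted speech, can be sketched as follows (an illustration of the idea, not Samsung's implementation):

```python
def best_text_set(spoken_text, text_sets):
    """Return the stored text set sharing the highest ratio of common
    words with the recognized utterance."""
    spoken = set(spoken_text.lower().split())

    def ratio(text_set):
        words = set(text_set.lower().split())
        return len(spoken & words) / len(words) if words else 0.0

    return max(text_sets, key=ratio)
```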
  • Patent number: 11205419
    Abstract: Low energy deep-learning networks for generating auditory features such as mel frequency cepstral coefficients in audio processing pipelines are provided. In various embodiments, a first neural network is trained to output auditory features such as mel-frequency cepstral coefficients, linear predictive coding coefficients, perceptual linear predictive coefficients, spectral coefficients, filter bank coefficients, and/or spectro-temporal receptive fields based on input audio samples. A second neural network is trained to output a classification based on input auditory features such as mel-frequency cepstral coefficients. An input audio sample is provided to the first neural network. Auditory features such as mel-frequency cepstral coefficients are received from the first neural network. The auditory features such as mel-frequency cepstral coefficients are provided to the second neural network. A classification of the input audio sample is received from the second neural network.
    Type: Grant
    Filed: August 28, 2018
    Date of Patent: December 21, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Davis Barch, Andrew S. Cassidy, Myron D. Flickner
  • Patent number: 11195527
    Abstract: Apparatus and method for processing speech recognition may include a speech recognition module that recognizes a voice uttered from a user, and a processing module that calls a user DB where information associated with the user is registered when a voice command of the user is input by the speech recognition module, verifies setting information related to a domain corresponding to the voice command, and processes the voice command through a content provider linked to the associated domain.
    Type: Grant
    Filed: August 1, 2019
    Date of Patent: December 7, 2021
    Assignees: Hyundai Motor Company, Kia Motors Corporation
    Inventor: Jae Min Joh
  • Patent number: 11178463
    Abstract: An electronic device is disclosed. The electronic device comprises: a microphone for receiving voice; a memory for storing a plurality of text sets; and a processor for converting the voice, received via the microphone, into text, searching for words common to the converted text with respect to each of the plurality of text sets, and determining at least one text set of the plurality of text sets on the basis of the ratio of the searched common words.
    Type: Grant
    Filed: October 25, 2018
    Date of Patent: November 16, 2021
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Jae Hyun Bae
  • Patent number: 11032620
    Abstract: Methods, systems, and apparatuses are described to implement voice search in media content: requesting media content of a video clip of a scene contained in the media content streamed to the client device; capturing the voice request for the media content of the video clip to display at the client device, wherein the streamed media content is a selected video streamed from a video source; applying an NLP solution to convert the voice request to text for matching to a set of one or more words contained in at least closed caption text of the selected video; associating matched words of the closed caption text with a start index and an end index of the video clip contained in the selected video; and streaming the video clip to the client device based on the start index and the end index associated with the matched closed caption text.
    Type: Grant
    Filed: February 14, 2020
    Date of Patent: June 8, 2021
    Assignee: SLING MEDIA PVT LTD
    Inventor: Mayank Verma
  • Patent number: 11024318
    Abstract: A method of speaker verification comprises: comparing a test input against a model of a user's speech obtained during a process of enrolling the user; obtaining a first score from comparing the test input against the model of the user's speech; comparing the test input against a first plurality of models of speech obtained from a first plurality of other speakers respectively; obtaining a plurality of cohort scores from comparing the test input against the plurality of models of speech obtained from a plurality of other speakers; obtaining statistics describing the plurality of cohort scores; modifying said statistics to obtain adjusted statistics; normalising the first score using the adjusted statistics to obtain a normalised score; and using the normalised score for speaker verification.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: June 1, 2021
    Assignee: Cirrus Logic, Inc.
    Inventors: John Paul Lesso, Gordon Richard McLeod
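Cohort-based score normalization of this kind is commonly a z-norm: standardize the raw trial score by statistics of the cohort scores, here with hypothetical `shift`/`scale` parameters standing in for the patent's "adjusted statistics":

```python
import statistics

def normalized_score(raw_score, cohort_scores, shift=0.0, scale=1.0):
    """Normalize a speaker-verification score by cohort statistics.

    The cohort mean/stdev are optionally adjusted (shift, scale) before
    standardizing, so a single threshold works across speakers."""
    mu = statistics.mean(cohort_scores) + shift       # adjusted mean
    sigma = statistics.stdev(cohort_scores) * scale   # adjusted stdev
    return (raw_score - mu) / sigma
```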
  • Patent number: 11011177
    Abstract: A voice identification method comprises: obtaining audio data, and extracting an audio feature of the audio data; determining whether a voice identification feature having a similarity with the audio feature above a preset matching threshold exists in an associated feature library; and in response to determining that the voice identification feature exists in the associated feature library, updating, by using the audio feature, the voice identification feature obtained through matching.
    Type: Grant
    Filed: June 14, 2018
    Date of Patent: May 18, 2021
    Assignee: ALIBABA GROUP HOLDING LIMITED
    Inventors: Gang Liu, Qingen Zhao, Guangxing Liu
  • Patent number: 11010645
    Abstract: A method and system for an AI-based communication training system for individuals and organizations is disclosed. A video analyzer is used to convert a video signal into a plurality of human morphology features with an accompanying audio analyzer converting an audio signal into a plurality of human speech features. A transformation module transforms the morphology features and the speech features into a current multi-dimensional performance vector and combinatorial logic generates an integration of the current multi-dimensional performance vector and one or more prior multi-dimensional performance vectors to generate a multi-session rubric. Backpropagation logic applies a current multi-dimensional performance vector from the combinatorial logic to the video analyzer and the audio analyzer.
    Type: Grant
    Filed: August 26, 2019
    Date of Patent: May 18, 2021
    Assignee: TalkMeUp
    Inventors: JiaoJiao Xu, Yi Xu, Chenchen Zhu, Matthew Thomas Spettel
  • Patent number: 11005620
    Abstract: Various aspects described herein relate to techniques for uplink reference signal sequence design in wireless communications systems. A method, a computer-readable medium, and an apparatus are provided. In an aspect, the method includes identifying a set of sequences to include at least a base sequence, a reverse order sequence of the base sequence, a complex conjugate sequence of the base sequence, or a reverse order complex conjugate sequence of the base sequence, and transmitting an uplink reference signal based on at least one of the sequences in the set. The techniques described herein may apply to different communications technologies, including 5th Generation (5G) New Radio (NR) communications technology.
    Type: Grant
    Filed: June 14, 2018
    Date of Patent: May 11, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Seyong Park, Renqiu Wang, Yi Huang, Hao Xu, Peter Gaal
  • Patent number: 10884096
    Abstract: An object of the present invention is to facilitate recognition of a voice command of a user in a situation where multiple devices including microphones are connected through a sensor network. A relative location of each device is determined, and the location and direction of the user are tracked through the time differences at which the voice command arrives. The command is interpreted based on the location and the direction of the user. Such a method may be used for sensor networks, Machine to Machine (M2M), Machine Type Communication (MTC), and Internet of Things (IoT) applications, including intelligent services (smart home, smart building, etc.), digital education, security and safety related services, and the like.
    Type: Grant
    Filed: February 13, 2018
    Date of Patent: January 5, 2021
    Assignee: LUXROBO CO., LTD.
    Inventors: Seungmin Baek, Seungbae Son
  • Patent number: 10747957
    Abstract: In some applications, it may be desired to process a message to determine an intent of the message, where the intent indicates the meaning of the message. An intent classifier may be used to determine the meaning of a message by processing the message to compute a message embedding vector that represents the message in a vector space. Each possible intent may be represented by a prototype vector, and the intent of the message may be determined by comparing the message embedding to one or more prototype vectors, such as by selecting an intent whose prototype vector is closest to the message embedding. An intent classifier may be used, for example, (i) to implement an automated communications system with states where each state is associated with a subset of the possible intents or (ii) for processing usage data of a communications system to update the intents of the communications system.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: August 18, 2020
    Assignee: ASAPP, INC.
    Inventor: Jeremy Elliot Azriel Wohlwend
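The prototype-vector comparison reduces to nearest-neighbor classification in the embedding space. A minimal sketch, assuming precomputed embeddings and Euclidean distance (the patent leaves the metric open):

```python
def classify_intent(message_vec, prototypes):
    """prototypes: dict mapping intent name -> prototype vector.
    Return the intent whose prototype is closest to the message
    embedding (Euclidean distance)."""
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(message_vec, p)) ** 0.5

    return min(prototypes, key=lambda name: dist(prototypes[name]))
```

Restricting `prototypes` to the subset of intents allowed in the current dialog state gives the state-dependent behavior the abstract describes.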
  • Patent number: 10540991
    Abstract: In various example embodiments, a system and method for determining a crowd response for a crowd are presented. One method is disclosed that includes receiving an audio signal that includes concurrent responses from two or more respondents, determining the concurrent responses from the audio signal without regard to the identity of the respondents, and generating a crowd response based on the concurrent responses.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: January 21, 2020
    Assignee: eBay Inc.
    Inventor: Sergio Pinzon Gonzales, Jr.
  • Patent number: 10482893
    Abstract: A sound processing method includes a step of applying a nonlinear filter to a temporal sequence of spectral envelope of an acoustic signal, wherein the nonlinear filter smooths a fine temporal perturbation of the spectral envelope without smoothing out a large temporal change. A sound processing apparatus includes a smoothing processor configured to apply a nonlinear filter to a temporal sequence of spectral envelope of an acoustic signal, wherein the nonlinear filter smooths a fine temporal perturbation of the spectral envelope without smoothing out a large temporal change.
    Type: Grant
    Filed: November 1, 2017
    Date of Patent: November 19, 2019
    Assignee: YAMAHA CORPORATION
    Inventors: Ryunosuke Daido, Hiraku Kayama
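A classic nonlinear filter with exactly this property is the running median: unlike a moving average, it flattens brief frame-to-frame jitter while leaving genuine step changes sharp. A sketch of the general technique (the patent does not disclose its specific filter here):

```python
def median_smooth(seq, width=3):
    """Running median over a parameter track (e.g. one spectral-envelope
    coefficient per frame). Removes isolated spikes; preserves steps."""
    half = width // 2
    out = []
    for i in range(len(seq)):
        window = sorted(seq[max(0, i - half):i + half + 1])
        out.append(window[len(window) // 2])
    return out
```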
  • Patent number: 10451710
    Abstract: The present disclosure relates to a user identification method applicable to a vehicle, the vehicle including at least two microphone arrays, the respective microphone arrays being disposed at different positions of the vehicle, respectively. The user identification method includes: receiving a voice of a user within the vehicle through the at least two microphone arrays; determining directions from the user to the microphone arrays, respectively, according to the voice; calculating an angle between any two of the directions; and identifying a type of the user based at least on the angle.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: October 22, 2019
    Assignee: BOE TECHNOLOGY GROUP CO., LTD.
    Inventors: Hongyang Li, Xin Li, Xiangdong Yang
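Given the bearing from the user to each microphone array, the angle between the two bearings follows from the dot product. A hypothetical sketch with 2-D direction vectors (the patent works with in-cabin directions derived from the arrays):

```python
import math

def bearing_angle(dir1, dir2):
    """Angle in degrees between two direction vectors, e.g. the bearings
    from the user to two microphone arrays; usable to discriminate
    seating positions such as driver vs. rear passenger."""
    dot = sum(a * b for a, b in zip(dir1, dir2))
    n1 = math.hypot(*dir1)
    n2 = math.hypot(*dir2)
    return math.degrees(math.acos(dot / (n1 * n2)))
```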
  • Patent number: 10418028
    Abstract: Technologies for detecting an end of a sentence in automatic speech recognition are disclosed. An automatic speech recognition device may acquire speech data, and identify phonemes and words of the speech data. The automatic speech recognition device may perform a syntactic parse based on the recognized words, and determine an end of a sentence based on the syntactic parse. For example, if the syntactic parse indicates that a certain set of consecutive recognized words form a syntactically complete and correct sentence, the automatic speech recognition device may determine that there is an end of a sentence at the end of that set of words.
    Type: Grant
    Filed: November 15, 2017
    Date of Patent: September 17, 2019
    Assignee: Intel Corporation
    Inventors: Oren Shamir, Oren Pereg, Moshe Wasserblat, Jonathan Mamou, Michel Assayag
  • Patent number: 10402924
    Abstract: A system and method are presented for relationship management workflow processes. At least one embodiment may apply to process automation to health care. More specifically, the system and method may be applied to patient management of healthcare, such as the management of Diabetes or other medical conditions. Other embodiments may apply to process automation in other areas utilizing management workflow software.
    Type: Grant
    Filed: February 25, 2014
    Date of Patent: September 3, 2019
    Inventors: Zachary Hinkle, Jason Andrew Loucks, Logan H. Weilenman, Ryan Collins
  • Patent number: 10403267
    Abstract: A method of updating speech recognition data including a language model used for speech recognition, the method including obtaining language data including at least one word; detecting a word that does not exist in the language model from among the at least one word; obtaining at least one phoneme sequence regarding the detected word; obtaining components constituting the at least one phoneme sequence by dividing the at least one phoneme sequence into predetermined unit components; determining information regarding probabilities that the respective components constituting each of the at least one phoneme sequence appear during speech recognition; and updating the language model based on the determined probability information.
    Type: Grant
    Filed: January 16, 2015
    Date of Patent: September 3, 2019
    Assignee: Samsung Electronics Co., Ltd
    Inventors: Chi-youn Park, Il-hwan Kim, Kyung-min Lee, Nam-hoon Kim, Jae-won Lee
  • Patent number: 10360215
    Abstract: Pattern queries are evaluated in parallel over large N-dimensional datasets to identify features of interest.
    Type: Grant
    Filed: March 30, 2015
    Date of Patent: July 23, 2019
    Assignee: EMC Corporation
    Inventors: Angelo E. M. Ciarlini, Fabio A. M. Porto, Amir H. K. Moghadam, Jonas F. Bias, Paulo de Figueiredo Pires, Fabio A. Perosi, Alex L. Bordignon, Bruno Carlos da Cunha Costa, Wagner dos Santos Vieira
  • Patent number: 10311863
    Abstract: There is provided a system including a microphone configured to receive an input speech, an analog to digital (A/D) converter configured to convert the input speech to a digital form and generate a digitized speech including a plurality of segments having acoustic features, a memory storing an executable code, and a processor executing the executable code to extract a plurality of acoustic feature vectors from a first segment of the digitized speech, determine, based on the plurality of acoustic feature vectors, a plurality of probability distribution vectors corresponding to the probabilities that the first segment includes each of a first keyword, a second keyword, both the first keyword and the second keyword, a background, and a social speech, and assign a first classification label to the first segment based on an analysis of the plurality of probability distribution vectors of one or more segments preceding the first segment and the probability distribution vectors of the first segment.
    Type: Grant
    Filed: September 2, 2016
    Date of Patent: June 4, 2019
    Assignee: Disney Enterprises, Inc.
    Inventors: Jill Fain Lehman, Nikolas Wolfe, Andre Pereira
  • Patent number: 10217460
    Abstract: A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors.
    Type: Grant
    Filed: December 28, 2016
    Date of Patent: February 26, 2019
    Assignee: ZENTIAN LIMITED
    Inventor: Mark Catchpole
  • Patent number: 10036818
    Abstract: Method for reducing computational time in inversion of geophysical data to infer a physical property model (91), especially advantageous in full wavefield inversion of seismic data. An approximate Hessian is pre-calculated by computing the product of the exact Hessian and a sampling vector composed of isolated point diffractors (82), and the approximate Hessian is stored in computer hard disk or memory (83). The approximate Hessian is then retrieved when needed (99) for computing its product with the gradient (93) of an objective function or other vector. Since the approximate Hessian is very sparse (diagonally dominant), its product with a vector may therefore be approximated very efficiently with good accuracy. Once the approximate Hessian is computed and stored, computing its product with a vector requires no simulator calls (wavefield propagations) at all. The pre-calculated approximate Hessian can also be reused in the subsequent steps whenever necessary.
    Type: Grant
    Filed: July 14, 2014
    Date of Patent: July 31, 2018
    Assignee: ExxonMobil Upstream Research Company
    Inventors: Yaxun Tang, Sunwoong Lee
  • Patent number: 9886954
    Abstract: One or more context aware processing parameters and an ambient audio stream are received. One or more sound characteristics associated with the ambient audio stream are identified using a machine learning model. One or more actions to perform are determined using the machine learning model and based on the one or more context aware processing parameters and the identified one or more sound characteristics. The one or more actions are performed.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: February 6, 2018
    Assignee: Doppler Labs, Inc.
    Inventors: Jacob Meacham, Matthew Sills, Richard Fritz Lanman, III, Jeffrey Baker
  • Patent number: 9881604
    Abstract: A system and method for identifying special information is provided. Endpoints are defined within a voice recording. One or more of the endpoints are identified within the voice recording and the voice recording is partitioned into segments based on the identified endpoints. Elements of text are identified by applying speech recognition to each of the segments and a list of prompt list candidates are applied to the text elements. The segments with text elements that match one or more prompt list candidates are identified. Portions of the voice recording following the prompt list candidates that include special information are identified and the special information is rendered unintelligible within the voice recording.
    Type: Grant
    Filed: February 9, 2015
    Date of Patent: January 30, 2018
    Assignee: Intellisist, Inc.
    Inventors: Howard M. Lee, Steven Lutz, Gilad Odinak
  • Patent number: 9858922
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for caching speech recognition scores. In some implementations, one or more values comprising data about an utterance are received. An index value is determined for the one or more values. An acoustic model score for the one or more received values is selected, from a cache of acoustic model scores that were computed before receiving the one or more values, based on the index value. A transcription for the utterance is determined using the selected acoustic model score.
    Type: Grant
    Filed: June 23, 2014
    Date of Patent: January 2, 2018
    Assignee: Google Inc.
    Inventors: Eugene Weinstein, Sanjiv Kumar, Ignacio L. Moreno, Andrew W. Senior, Nikhil Prasad Bhat
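The cache lookup hinges on mapping nearby feature values to the same index so precomputed scores can be reused. A simplified sketch with a hypothetical quantization step standing in for the patent's index-value computation:

```python
def cached_score(features, cache, compute_score, quantize=1.0):
    """Look up an acoustic-model score by a quantized index of the
    feature values, computing and caching it on a miss."""
    index = tuple(round(f / quantize) for f in features)
    if index not in cache:
        cache[index] = compute_score(features)  # cache miss: compute once
    return cache[index]
```

Nearby feature vectors hash to the same index, so repeated similar frames skip the (expensive) acoustic-model evaluation.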
  • Patent number: 9837069
    Abstract: Technologies for detecting an end of a sentence in automatic speech recognition are disclosed. An automatic speech recognition device may acquire speech data, and identify phonemes and words of the speech data. The automatic speech recognition device may perform a syntactic parse based on the recognized words, and determine an end of a sentence based on the syntactic parse. For example, if the syntactic parse indicates that a certain set of consecutive recognized words form a syntactically complete and correct sentence, the automatic speech recognition device may determine that there is an end of a sentence at the end of that set of words.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: December 5, 2017
    Assignee: Intel Corporation
    Inventors: Oren Shamir, Oren Pereg, Moshe Wasserblat, Jonathan Mamou, Michel Assayag
  • Patent number: 9805111
    Abstract: A pattern analysing device (27) in a pattern processing node (21) of a data collection system (10) comprises a pattern updating unit equipped with a pattern collecting element configured to obtain an existing pattern of historical data according to at least one existing data model, where the existing pattern relates to an entity (11) associated with the data collection system, and obtain a further pattern of newer data according to a further data model, where the further pattern also relates to the entity, and a pattern updating element configured to compare the patterns with each other, determine if the existing data model can be mapped on the further data model, and update the existing pattern with the further pattern in relation to the historical data if the existing data model can be mapped on the further data model.
    Type: Grant
    Filed: October 4, 2010
    Date of Patent: October 31, 2017
    Assignee: TELEFONAKTIEBOLAGET L M ERICSSON
    Inventors: Johan Hjelm, Mattias Lidstrom, Mona Matti
  • Patent number: 9703350
    Abstract: The invention relates to an electronic device that includes a wake-up system that operates at a substantially low power level and is applied to wake up the electronic device from a sleep mode. The wake-up system comprises a sound transducer that converts a received sound signal to an electrical signal and a keyword detection logic that preliminarily identifies a speech energy profile that corresponds to at least one of a plurality of keywords in a part of the electrical signal. In some embodiments, a keyword finder is further activated to identify with an enhanced accuracy whether the at least one keyword exists in the part of the electrical signal, and generates a wake-up control to activate a host of the electronic device from its sleep mode.
    Type: Grant
    Filed: June 24, 2013
    Date of Patent: July 11, 2017
    Assignee: Maxim Integrated Products, Inc.
    Inventors: Vivek Nigam, Yadong Wang, Anthony Stephen Doy, Todd D. Moore
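    The two-stage structure in the abstract above (a cheap always-on gate, then a more accurate detector) can be sketched as follows. The energy-profile matching is a stand-in for a real keyword recognizer, and all thresholds and names are assumptions for the example.

    ```python
    def frame_energy(frames):
        return [sum(s * s for s in f) / len(f) for f in frames]

    def cheap_gate(frames, threshold=0.1):
        """Stage 1: low-power check -- does the energy profile look like speech?"""
        return any(e > threshold for e in frame_energy(frames))

    def accurate_keyword(frames, keyword_profile, tol=0.05):
        """Stage 2: runs only if stage 1 fires; compares the energy envelope
        against a stored keyword profile (stand-in for a real recognizer)."""
        env = frame_energy(frames)
        if len(env) != len(keyword_profile):
            return False
        return all(abs(a - b) <= tol for a, b in zip(env, keyword_profile))

    def wake_up(frames, keyword_profile):
        return cheap_gate(frames) and accurate_keyword(frames, keyword_profile)

    silence = [[0.0] * 4] * 3
    speechy = [[0.5, -0.5, 0.5, -0.5]] * 3       # energy 0.25 per frame
    profile = [0.25, 0.25, 0.25]
    print(wake_up(silence, profile), wake_up(speechy, profile))  # False True
    ```

    The design point is that `accurate_keyword` never runs on silence, so the host can stay asleep at minimal power until the gate fires.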
  • Patent number: 9684872
    Abstract: A method and an apparatus for generating data in a missing segment of a target time data sequence are disclosed. The method includes: determining whether there is a breakpoint in the missing segment; determining candidate values of the data in the missing segment; and generating values of the data in the missing segment by selectively using the candidate values of the data in the missing segment, according to whether there is the breakpoint in the missing segment. With the method and the apparatus, the data in the missing segment of the target time data sequence can be generated more accurately.
    Type: Grant
    Filed: June 4, 2015
    Date of Patent: June 20, 2017
    Assignee: International Business Machines Corporation
    Inventors: Wei S. Dong, Wen Q. Huang, Chang S. Li, Yu Wang, Junchi Yan, Chao Zhang, Xin Zhang, Xiu F. Zhu
  • Patent number: 9454976
    Abstract: A method is disclosed for discriminating voiced and unvoiced sounds in speech. The method detects characteristic waveform features of voiced and unvoiced sounds by applying integral and differential functions to the digitized sound signal in the time domain. Laboratory tests demonstrate extremely high reliability in separating voiced and unvoiced sounds. The method is very fast and computationally efficient. The method enables voice activation in resource-limited and battery-limited devices, including mobile devices, wearable devices, and embedded controllers. The method also enables reliable command identification in applications that recognize only predetermined commands. The method is suitable as a pre-processor for natural language speech interpretation, improving recognition and responsiveness. The method enables real-time coding or compression of speech according to the sound type, improving transmission efficiency.
    Type: Grant
    Filed: April 15, 2014
    Date of Patent: September 27, 2016
    Inventor: David Edward Newman
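    A common textbook way to separate voiced from unvoiced sounds in the time domain uses short-time energy and zero-crossing rate; it is shown below as an illustration of the voiced/unvoiced distinction, not as the integral/differential method the patent claims. Thresholds and signal shapes are assumptions for the example.

    ```python
    import math

    def zero_crossing_rate(frame):
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
        return crossings / (len(frame) - 1)

    def short_time_energy(frame):
        return sum(s * s for s in frame) / len(frame)

    def classify(frame, zcr_max=0.3, energy_min=0.01):
        """Voiced sounds: high energy, low ZCR. Unvoiced: high ZCR."""
        if short_time_energy(frame) < energy_min:
            return "silence"
        return "voiced" if zero_crossing_rate(frame) < zcr_max else "unvoiced"

    voiced = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]  # slow wave
    unvoiced = [(-1) ** t * 0.2 for t in range(100)]                    # rapid flips
    print(classify(voiced), classify(unvoiced))   # voiced unvoiced
    ```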
  • Patent number: 9202520
    Abstract: This disclosure relates to systems and methods for determining when a user likes a piece of content based, at least in part, on analyzing user responses to the content. In one embodiment, the user's response may be monitored by audio and motion detection devices to determine when the user's vocals or movements are emulating the content. When the user's emulation exceeds a threshold amount the content may be designated as “liked.” In certain instances, a similar piece of content may be selected to play when the current content is finished.
    Type: Grant
    Filed: October 17, 2012
    Date of Patent: December 1, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Joshua K. Tang
  • Patent number: 9190063
    Abstract: A speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user. A number of different speech models for different natural languages are used to support and detect a natural language spoken by a user. In some implementations an interactive electronic agent responds in the user's native language to facilitate a real-time, human-like dialog.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: November 17, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Ian Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj
  • Patent number: 9165555
    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Grant
    Filed: November 26, 2014
    Date of Patent: October 20, 2015
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
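    The iterative warping-factor search described above amounts to a grid search: warp the spectrum with each candidate factor, score it against the speech model, and keep the best. The nearest-neighbour warp and squared-error distance below are crude stand-ins chosen for the example.

    ```python
    def warp_spectrum(spectrum, alpha):
        """Linear frequency warp: bin i of the output reads from bin i/alpha
        (nearest neighbour), a crude stand-in for VTLN-style warping."""
        n = len(spectrum)
        return [spectrum[min(n - 1, int(round(i / alpha)))] for i in range(n)]

    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def best_warping_factor(spectrum, model_spectrum, factors):
        """Grid search: keep the factor whose warped spectrum best matches
        the speech model, as in the training loop described above."""
        best, best_d = None, float("inf")
        for alpha in factors:
            d = distance(warp_spectrum(spectrum, alpha), model_spectrum)
            if d < best_d:
                best, best_d = alpha, d
        return best

    model = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    # Speaker's spectrum: same peak shifted up to bin 2; compressing the
    # frequency axis with alpha = 0.5 maps it back onto the model.
    speaker = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(best_warping_factor(speaker, model, [0.5, 1.0, 2.0]))  # -> 0.5
    ```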
  • Patent number: 9123350
    Abstract: A method and system for extracting audio features from an encoded bitstream for audio classification. The method comprises partially decoding the encoded bitstream; obtaining uniform window block size spectral coefficients of the encoded bitstream; and extracting audio features based on the uniform window block size spectral coefficients.
    Type: Grant
    Filed: December 14, 2005
    Date of Patent: September 1, 2015
    Assignee: Panasonic Intellectual Property Management Co., Ltd.
    Inventor: Ying Zhao
  • Patent number: 9076448
    Abstract: A real-time system incorporating speech recognition and linguistic processing for recognizing a spoken query by a user and distributed between client and server, is disclosed. The system accepts user's queries in the form of speech at the client where minimal processing extracts a sufficient number of acoustic speech vectors representing the utterance. These vectors are sent via a communications channel to the server where additional acoustic vectors are derived. Using Hidden Markov Models (HMMs), and appropriate grammars and dictionaries conditioned by the selections made by the user, the speech representing the user's query is fully decoded into text (or some other suitable form) at the server. This text corresponding to the user's query is then simultaneously sent to a natural language engine and a database processor where optimized SQL statements are constructed for a full-text search from a database for a recordset of several stored questions that best matches the user's query.
    Type: Grant
    Filed: October 10, 2003
    Date of Patent: July 7, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Ian M. Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj
  • Patent number: 9066112
    Abstract: A method of designing a code book for super resolution encoding. The method includes, for example, via a processor, creating a first group of entries in the code book that includes a plurality of gray font values for encoding data; via the processor, creating a second group of entries in the code book that includes a set of values for each of the gray font values for decoding data; via the processor, creating a third group of entries in the code book that includes a pattern corresponding to each of the plurality of gray font values; and storing the code book in a database in communication with the processor.
    Type: Grant
    Filed: August 2, 2012
    Date of Patent: June 23, 2015
    Assignee: XEROX CORPORATION
    Inventors: Guo-Yau Lin, Farzin Blurfrushan
  • Patent number: 9025777
    Abstract: An audio signal decoder for providing a decoded multi-channel audio signal representation on the basis of an encoded multi-channel audio signal representation has a time warp decoder configured to selectively use individual audio channel specific time warp contours or a joint multi-channel time warp contour for a reconstruction of a plurality of audio channels represented by the encoded multi-channel audio signal representation.
    Type: Grant
    Filed: July 1, 2009
    Date of Patent: May 5, 2015
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Stefan Bayer, Sascha Disch, Ralf Geiger, Guillaume Fuchs, Max Neuendorf, Gerald Schuller, Bernd Edler
  • Patent number: 8996380
    Abstract: Systems and methods of synchronizing media are provided. A client device may be used to capture a sample of a media stream being rendered by a media rendering source. The client device sends the sample to a position identification module to determine a time offset indicating a position in the media stream corresponding to the sampling time of the sample, and optionally a timescale ratio indicating a speed at which the media stream is being rendered by the media rendering source based on a reference speed of the media stream. The client device calculates a real-time offset using a present time, a timestamp of the media sample, the time offset, and optionally the timescale ratio. The client device then renders a second media stream at a position corresponding to the real-time offset to be in synchrony to the media stream being rendered by the media rendering source.
    Type: Grant
    Filed: May 4, 2011
    Date of Patent: March 31, 2015
    Assignee: Shazam Entertainment Ltd.
    Inventors: Avery Li-Chun Wang, Rahul Powar, William Michael Mills, Christopher Jacques Penrose Barton, Philip Georges Inghelbrecht, Dheeraj Shankar Mukherjee
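    The real-time offset calculation described in the abstract combines the matched position, the elapsed wall-clock time since the sample, and the rendering speed. A minimal sketch, with illustrative variable names and numbers:

    ```python
    def real_time_offset(present_time, sample_timestamp, time_offset,
                         timescale_ratio=1.0):
        """Estimated current position in the reference media stream:
        the position at sampling time plus elapsed wall-clock time
        scaled by the rendering speed."""
        elapsed = present_time - sample_timestamp
        return time_offset + elapsed * timescale_ratio

    # A sample taken at t=100 s matched 42.0 s into the stream; 5 s later,
    # with the source playing at 1.02x speed, the stream should be near:
    pos = real_time_offset(present_time=105.0, sample_timestamp=100.0,
                           time_offset=42.0, timescale_ratio=1.02)
    print(pos)  # ~47.1
    ```

    A second device can then start its own copy of the stream at `pos` to render in synchrony with the source.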
  • Patent number: 8909518
    Abstract: A warping factor estimation system comprises a label information generation unit that outputs voice/non-voice label information, a warp model storage unit in which a probability model representing voice and non-voice occurrence probabilities is stored, and a warp estimation unit that calculates a warping factor in the frequency axis direction using the probability model representing voice and non-voice occurrence probabilities, voice and non-voice labels, and a cepstrum.
    Type: Grant
    Filed: September 22, 2008
    Date of Patent: December 9, 2014
    Assignee: NEC Corporation
    Inventor: Tadashi Emori
  • Patent number: 8909527
    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Grant
    Filed: June 24, 2009
    Date of Patent: December 9, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
  • Patent number: 8909538
    Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
    Type: Grant
    Filed: November 11, 2013
    Date of Patent: December 9, 2014
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: James Mark Kondziela