Subportions Patents (Class 704/254)
  • Patent number: 10528318
    Abstract: For a mobile computing device, enhanced functionality may be provided by associating actions with combined speech and touch gestures. A touch gesture is received into the device and simultaneously or near-simultaneously speech is received into the device. The touch gesture and speech are processed to determine a result and the device performs an action based on the result. In particular embodiments, commands for a mapping application may be based on spoken search terms and geographic areas marked by touch gestures.
    Type: Grant
    Filed: November 2, 2016
    Date of Patent: January 7, 2020
    Assignee: OPEN INNOVATION NETWORK LLC
    Inventor: David Gerard Ledet
  • Patent number: 10522135
    Abstract: A system and method for segmenting an audio file. The method includes analyzing an audio file, wherein the analyzing includes identifying speech recognition features within the audio file; generating metadata based on the audio file, wherein the metadata includes transcription characteristics of the audio file; and determining a segmenting interval for the audio file based on the speech recognition features and the metadata.
    Type: Grant
    Filed: December 31, 2017
    Date of Patent: December 31, 2019
    Assignee: Verbit Software Ltd.
    Inventors: Tom Livne, Kobi Ben Tzvi, Eric Shellef
  • Patent number: 10515307
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long-short term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.
    Type: Grant
    Filed: June 3, 2016
    Date of Patent: December 24, 2019
    Assignee: Google LLC
    Inventors: Tara N. Sainath, Vikas Sindhwani
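    A minimal sketch (not from the patent) of the compressed-gating idea above: instead of storing a full gate parameter matrix, the gate multiplies by a low-rank pair, a compressed parameter matrix times a projection matrix. The sizes and the NumPy implementation are illustrative assumptions; the patent also covers structured (e.g. Toeplitz-like) gate matrices.
    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def compressed_gate(gate_input, compressed, projection, bias):
        """One LSTM-style gate whose parameter matrix W is never materialized:
        W = compressed @ projection (low rank), so the multiply costs
        O(d*r) twice instead of O(d*d)."""
        # Equivalent to sigmoid(gate_input @ (compressed @ projection).T + bias)
        return sigmoid((gate_input @ projection.T) @ compressed.T + bias)

    rng = np.random.default_rng(0)
    d_in, d_out, rank = 512, 512, 64           # hypothetical sizes
    x = rng.standard_normal(d_in)              # concatenated [input, state, output]
    U = rng.standard_normal((d_out, rank))     # compressed parameter matrix
    V = rng.standard_normal((rank, d_in))      # projection matrix
    b = np.zeros(d_out)

    print(compressed_gate(x, U, V, b).shape)   # (512,)
    ```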
  • Patent number: 10510350
    Abstract: One embodiment provides a method, including receiving, at an audio capture device, a customized activation cue; identifying, using a processor, contextual information associated with a user; analyzing, using the contextual information, characteristics of the customized activation cue; identifying, based on the analysis, a uniqueness associated with the customized activation cue; and responsive to said identifying, notifying a user that the customized activation cue has inadequate uniqueness. Other aspects are described and claimed.
    Type: Grant
    Filed: March 30, 2016
    Date of Patent: December 17, 2019
    Assignee: Lenovo (Singapore) Pte. Ltd.
    Inventors: Aaron Michael Stewart, Rod D. Waltermann, Russell Speight VanBlon
  • Patent number: 10445360
    Abstract: Methods and apparatus for improving speech recognition accuracy in media content searches are described. An advertisement for a media content item is analyzed to identify keywords that may describe the media content item. The identified keywords are associated with the media content item for use during a voice search to locate the media content item. A user may speak one or more of the keywords as a search input and be provided with the media content item as a result of the search.
    Type: Grant
    Filed: November 24, 2015
    Date of Patent: October 15, 2019
    Assignee: Comcast Cable Communications, LLC
    Inventor: George Thomas Des Jardins
  • Patent number: 10431205
    Abstract: A dialog device comprises a natural language interfacing device (chat interface or a telephonic device), and a natural language output device (the chat interface, a display device, or a speech synthesizer outputting to the telephonic device). A computer stores natural language dialog conducted via the interfacing device and constructs a current utterance word-by-word. Each word is chosen by applying a plurality of language models to a context comprising concatenation of the stored dialog and the current utterance thus far. Each language model outputs a distribution over the words of a vocabulary. A recurrent neural network (RNN) is applied to the distributions to generate a mixture distribution. The next word is chosen using the mixture distribution. The output device outputs the current natural language utterance after it has been constructed by the computer.
    Type: Grant
    Filed: April 27, 2016
    Date of Patent: October 1, 2019
    Assignee: CONDUENT BUSINESS SERVICES, LLC
    Inventors: Phong Le, Marc Dymetman, Jean-Michel Renders
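    A minimal sketch (not from the patent) of the mixture step above: each language model contributes a distribution over the vocabulary, and context-dependent mixture weights blend them into one distribution from which the next word is chosen. The toy vocabulary, the fixed per-model distributions, and the single linear layer standing in for the patent's RNN are illustrative assumptions.
    ```python
    import numpy as np

    VOCAB = ["hello", "how", "can", "i", "help", "you", "today", "</s>"]

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def mixture_next_word(context_vec, model_dists, mixer_weights):
        """Blend per-model word distributions into one mixture distribution;
        the mixing coefficients are computed from the dialog context."""
        alphas = softmax(mixer_weights @ context_vec)   # one weight per language model
        mixture = alphas @ model_dists                  # (n_models,) @ (n_models, |V|)
        return VOCAB[int(np.argmax(mixture))], mixture

    rng = np.random.default_rng(1)
    n_models, dim = 3, 16
    context = rng.standard_normal(dim)                  # encoding of the dialog so far
    dists = np.stack([softmax(rng.standard_normal(len(VOCAB))) for _ in range(n_models)])
    W_mix = rng.standard_normal((n_models, dim))

    word, dist = mixture_next_word(context, dists, W_mix)
    print(word, round(float(dist.sum()), 6))            # chosen next word, 1.0
    ```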
  • Patent number: 10417351
    Abstract: Various embodiments described herein facilitate multi-lingual communications. The systems and methods of some embodiments may enable multi-lingual communications through different modes of communications including, for example, Internet-based chat, e-mail, text-based mobile phone communications, postings to online forums, postings to online social media services, and the like. Certain embodiments may implement communications systems and methods that translate text between two or more languages (e.g., spoken), while handling/accommodating for one or more of the following in the text: specialized/domain-related jargon, abbreviations, acronyms, proper nouns, common nouns, diminutives, colloquial words or phrases, and profane words or phrases.
    Type: Grant
    Filed: October 18, 2018
    Date of Patent: September 17, 2019
    Assignee: MZ IP Holdings, LLC
    Inventors: Gabriel Leydon, Francois Orsini, Nikhil Bojja, Shailen Karur
  • Patent number: 10404806
    Abstract: A method and a system are provided for segmenting a multimedia content. The method estimates a count of a plurality of multimedia segments in the multimedia content, and a duration of each of the plurality of multimedia segments in the multimedia content. The method determines a cost function associated with a multimedia segment from the plurality of multimedia segments, based on the count of the plurality of multimedia segments, and the duration of each of the plurality of multimedia segments. The method further determines an updated count of the plurality of multimedia segments, and an updated duration of each of the plurality of multimedia segments until the cost function satisfies a pre-defined criterion. Based on the updated count of the plurality of multimedia segments, and the updated duration of each of the plurality of multimedia segments, the method segments the multimedia content into the plurality of multimedia segments.
    Type: Grant
    Filed: September 1, 2015
    Date of Patent: September 3, 2019
    Inventors: Arijit Biswas, Ankit Gandhi, Ranjeet Kumar, Om D Deshmukh
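    A toy sketch (not from the patent) of the update loop above: the segment count and per-segment duration are alternately revised until a cost function is satisfied. The cost used here, the gap between the implied segment length and a target length, is an illustrative assumption, since the abstract does not specify the cost function.
    ```python
    def segment_media(total_duration, target_seg_len, tolerance=1.0, max_iters=50):
        """Iteratively refine the segment count until the cost (distance of the
        implied duration from a target duration) meets a pre-defined criterion,
        then emit the resulting segment boundaries."""
        count = max(1, round(total_duration / target_seg_len))
        for _ in range(max_iters):
            duration = total_duration / count
            if abs(duration - target_seg_len) <= tolerance:   # criterion satisfied
                break
            count = max(1, count + (1 if duration > target_seg_len else -1))
        duration = total_duration / count
        return [(i * duration, (i + 1) * duration) for i in range(count)]

    print(segment_media(total_duration=605.0, target_seg_len=60.0))  # 10 segments of 60.5 s
    ```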
  • Patent number: 10403289
    Abstract: A voice processing device includes: a processor; and a memory which stores a plurality of instructions, which when executed by the processor, cause the processor to execute: detecting a first utterance section included in a first voice and a second utterance section included in a second voice; specifying an overlapping section within which the first utterance section and the second utterance section overlap with each other; calculating a first utterance continuation section from a start point of the overlapping section to an end point of the first utterance section; and evaluating an impression regarding the first voice at least on the basis of information relating to a length of the first utterance continuation section.
    Type: Grant
    Filed: November 25, 2015
    Date of Patent: September 3, 2019
    Assignee: FUJITSU LIMITED
    Inventors: Taro Togawa, Chisato Shioda, Sayuri Kohmura, Takeshi Otani
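    A small sketch (not from the patent) of the two interval computations above: the overlapping section of the two utterance sections, and the first-utterance continuation section running from the start of the overlap to the end of the first utterance. Representing utterance sections as (start, end) times in seconds is an illustrative assumption.
    ```python
    def overlap_and_continuation(first, second):
        """first/second: (start, end) utterance sections in seconds.
        Returns (overlap_length, first_utterance_continuation_length)."""
        ov_start = max(first[0], second[0])
        ov_end = min(first[1], second[1])
        if ov_end <= ov_start:
            return 0.0, 0.0                 # the sections do not overlap
        return ov_end - ov_start, first[1] - ov_start

    # Speaker B interjects at 3.0 s; speaker A keeps talking until 9.0 s.
    print(overlap_and_continuation((1.0, 9.0), (3.0, 4.0)))   # (1.0, 6.0)
    ```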
  • Patent number: 10394852
    Abstract: Provided is a technique for matching different user representations of a person in a plurality of computer systems. The technique includes collecting information sets about user representations from a plurality of computer systems; normalizing the information sets to a unified format; grouping the information sets in the unified format into indexing buckets based on a user name using a non-phonetic algorithm; determining a similarity score for each pair of information sets in each of the indexing buckets; classifying each information set pair into a set of classes based on the similarity scores, wherein the set of classes comprise at least matches and non-matches; and using a data structure for merging information of information set pairs classified as matches.
    Type: Grant
    Filed: March 11, 2016
    Date of Patent: August 27, 2019
    Assignee: International Business Machines Corporation
    Inventors: Lars Bremer, Thomas A. P. Hampp-Bahnmueller, Markus Lorch, Pavlo Petrenko, Sebastian B. Schmid
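    A compact sketch (not from the patent) of the pipeline above: normalize records to a unified format, group them into indexing buckets with a non-phonetic key, score each pair inside a bucket, and classify pairs as matches or non-matches. The first-letter bucketing key, the Jaccard-style similarity, and the fixed threshold are illustrative assumptions.
    ```python
    from collections import defaultdict
    from itertools import combinations

    def normalize(record):
        """Bring records from different systems into one unified format."""
        return {"name": record.get("name", "").strip().lower(),
                "email": record.get("email", "").strip().lower()}

    def bucket_key(record):
        """Non-phonetic indexing key: first letter of the user name."""
        return record["name"][:1]

    def similarity(a, b):
        """Token-level Jaccard similarity over name plus email."""
        ta = set(a["name"].split()) | {a["email"]}
        tb = set(b["name"].split()) | {b["email"]}
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def match_users(raw_records, threshold=0.5):
        records = [normalize(r) for r in raw_records]
        buckets = defaultdict(list)
        for r in records:
            buckets[bucket_key(r)].append(r)
        matches, non_matches = [], []
        for a, b in ((a, b) for group in buckets.values()
                     for a, b in combinations(group, 2)):
            (matches if similarity(a, b) >= threshold else non_matches).append((a, b))
        return matches, non_matches

    crm   = {"name": "Ada Lovelace ", "email": "ada@example.com"}
    hr    = {"name": "ada lovelace",  "email": "ada@example.com"}
    other = {"name": "Alan Turing",   "email": "alan@example.com"}
    m, n = match_users([crm, hr, other])
    print(len(m), len(n))   # 1 match (the two Ada records), 2 non-matches
    ```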
  • Patent number: 10381004
    Abstract: A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees high recognition rate among user commands defined by a user.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: August 13, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Nam-yeong Kwon, Kyung-mi Park
  • Patent number: 10354643
    Abstract: An electronic device is provided including at least one microphone, a communication circuit, a processor, and a memory. The memory stores at least one application program or software program that executes a voice instruction triggered in response to a voice input. The memory also stores instructions that allow the processor to sequentially receive a plurality of utterances including a first speech element from a first user through the at least one microphone, generate a voice recognition model of the first user on the basis of at least some of the plurality of utterances, store the generated voice recognition model in the memory, and transmit the generated voice recognition model of the first user to the outside through the communication circuit so that a first external device uses the generated voice recognition model of the first user.
    Type: Grant
    Filed: October 13, 2016
    Date of Patent: July 16, 2019
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Subhojit Chakladar, Junhui Kim
  • Patent number: 10347251
    Abstract: A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees high recognition rate among user commands defined by a user.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: July 9, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Nam-yeong Kwon, Kyung-mi Park
  • Patent number: 10331967
    Abstract: Methods of facilitating machine learning via a 2-D symbol are disclosed. Features of an object are received in a first computing system having a 2-D symbol creation application module installed thereon. A multi-layer 2-D symbol is formed from the features according to a set of symbol creation rules. The 2-D symbol is a matrix of N×N pixels partitioned into a number of sub-matrices with each sub-matrix containing one feature, where N is a positive integer. The meaning of the combined features in the 2-D symbol is learned in a second computing system by using an image processing technique to classify the 2-D symbol transmitted from the first computing system. The symbol creation rules determine the importance order, size and location of sub-matrices in the 2-D symbol.
    Type: Grant
    Filed: December 5, 2018
    Date of Patent: June 25, 2019
    Assignee: Gyrfalcon Technology Inc.
    Inventors: Lin Yang, Baohua Sun
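    A minimal sketch (not from the patent) of building the 2-D symbol above: an N×N pixel matrix split into equally sized sub-matrices, filled row-major in importance order with one feature tile per sub-matrix. The grid size, tile encoding, and ordering rule are illustrative assumptions.
    ```python
    import numpy as np

    def make_symbol(feature_tiles, n_pixels=224, grid=4):
        """Build an N x N 2-D symbol: split the image into grid x grid
        sub-matrices and place feature tiles row-major, most important first."""
        sub = n_pixels // grid
        symbol = np.zeros((n_pixels, n_pixels), dtype=np.float32)
        for idx, tile in enumerate(feature_tiles[: grid * grid]):
            r, c = divmod(idx, grid)
            patch = np.resize(np.asarray(tile, dtype=np.float32), (sub, sub))
            symbol[r * sub:(r + 1) * sub, c * sub:(c + 1) * sub] = patch
        return symbol

    features = [np.full((8, 8), v) for v in (1.0, 0.5, 0.25)]   # importance-ordered tiles
    sym = make_symbol(features)
    print(sym.shape, sym[:56, :56].mean(), sym[:56, 56:112].mean())  # (224, 224) 1.0 0.5
    ```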
  • Patent number: 10311876
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.
    Type: Grant
    Filed: February 14, 2017
    Date of Patent: June 4, 2019
    Assignee: Google LLC
    Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar
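    A minimal sketch (not from the patent) of the two-stage decision above: a lenient on-device threshold gates whether audio is sent to the server, and a stricter server-side threshold decides whether the key phrase was really spoken. The scalar confidence scores standing in for the device and server hotword models are illustrative assumptions.
    ```python
    def on_device_check(audio, device_score_fn, first_threshold=0.4):
        """Stage 1: cheap local check; only ship the audio signal to the
        server if the utterance plausibly contains the key phrase."""
        return device_score_fn(audio) >= first_threshold

    def server_check(audio, server_score_fn, second_threshold=0.8):
        """Stage 2: more restrictive server-side check; tagged text for the
        utterances is returned only when the stricter threshold is met."""
        if server_score_fn(audio) >= second_threshold:
            return {"hotword": True, "transcript": "<tagged text of the utterance>"}
        return {"hotword": False}

    audio = b"..."                                  # placeholder audio signal
    device_score = lambda a: 0.55                   # stand-in hotword model scores
    server_score = lambda a: 0.91

    if on_device_check(audio, device_score):
        print(server_check(audio, server_score))    # {'hotword': True, ...}
    ```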
  • Patent number: 10303731
    Abstract: In one embodiment, a method includes, receiving, from a client system of a user, a search query including n-grams. The method includes associating each n-gram with verticals based on an analysis of the n-grams by language models. The method includes determining, for each n-gram, if a bloom filter for a vertical associated with the n-gram indicates, based on sub-bloom filters of the bloom filter, the n-gram does exist or does not exist in a set of object names associated with the vertical. Each sub-bloom filter is associated with a subset of the set of object names and indicates the n-gram does exist or does not exist in its subset of object names. The method includes sending, to the client system, an indication that an n-gram of the n-grams is misspelled if a bloom filter indicates the n-gram does not exist in the set of object names associated with the vertical.
    Type: Grant
    Filed: May 1, 2017
    Date of Patent: May 28, 2019
    Assignee: Facebook, Inc.
    Inventors: Ian Douglas Hegerty, Daniel Bernhardt, Feng Liang, Agnieszka Anna Podsiadlo
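    A small sketch (not from the patent) of the membership test above: each vertical's bloom filter is composed of sub-bloom filters, each built over a subset of the vertical's object names, and an n-gram is flagged as misspelled when no sub-bloom filter may contain it. The hash construction, bit-array sizes, and subset split are illustrative assumptions.
    ```python
    import hashlib

    class SubBloom:
        """Bloom filter over one subset of a vertical's object names."""
        def __init__(self, names, m=1024, k=3):
            self.m, self.k, self.bits = m, k, 0
            for name in names:
                for pos in self._positions(name):
                    self.bits |= 1 << pos

        def _positions(self, s):
            for i in range(self.k):
                digest = hashlib.sha256(f"{i}:{s}".encode()).hexdigest()
                yield int(digest, 16) % self.m

        def maybe_contains(self, s):
            return all(self.bits >> p & 1 for p in self._positions(s))

    class VerticalBloom:
        """Vertical-level filter composed of sub-bloom filters."""
        def __init__(self, name_subsets):
            self.subs = [SubBloom(subset) for subset in name_subsets]

        def maybe_contains(self, ngram):
            # exists iff some sub-bloom filter says it may exist in its subset
            return any(sub.maybe_contains(ngram) for sub in self.subs)

    pages = VerticalBloom([["acme corp", "acme labs"], ["zenith media"]])
    for query_ngram in ("acme labs", "acme lbas"):
        if not pages.maybe_contains(query_ngram):
            print(f"'{query_ngram}' does not exist in this vertical; flag as misspelled")
    ```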
  • Patent number: 10289912
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying videos using neural networks. One of the methods includes obtaining a temporal sequence of video frames, wherein the temporal sequence comprises a respective video frame from a particular video at each of a plurality of time steps; for each time step of the plurality of time steps: processing the video frame at the time step using a convolutional neural network to generate features of the video frame; and processing the features of the video frame using an LSTM neural network to generate a set of label scores for the time step and classifying the video as relating to one or more of the topics represented by labels in the set of labels from the label scores for each of the plurality of time steps.
    Type: Grant
    Filed: April 29, 2016
    Date of Patent: May 14, 2019
    Assignee: Google LLC
    Inventors: Sudheendra Vijayanarasimhan, George Dan Toderici, Yue Hei Ng, Matthew John Hausknecht, Oriol Vinyals, Rajat Monga
  • Patent number: 10282084
    Abstract: Methods and apparatus are provided for executing a function of a mobile terminal by recognizing a writing gesture. The writing gesture that is inputted on a touchscreen of the mobile terminal is detected. At least one target item to which the writing gesture applies is determined. A preset writing gesture of the at least one target item is compared with the detected writing gesture to determine whether the preset writing gesture is at least similar to the detected writing gesture. An execution command corresponding to the preset writing gesture is extracted, when it is determined that the detected writing gesture is at least similar to the preset writing gesture. The function of the at least one target item is executed by the execution command.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: May 7, 2019
    Assignee: Samsung Electronics Co., Ltd
    Inventor: Musik Kwon
  • Patent number: 10249318
    Abstract: A speech-signal-processing-circuit configured to receive a time-frequency-domain-reference-speech-signal and a time-frequency-domain-degraded-speech-signal. The time-frequency-domain-reference-speech-signal comprises: an upper-band-reference-component with frequencies that are greater than a frequency-threshold-value; and a lower-band-reference-component with frequencies that are less than the frequency-threshold-value. The time-frequency-domain-degraded-speech-signal comprises: an upper-band-degraded-component with frequencies that are greater than the frequency-threshold-value; and a lower-band-degraded-component with frequencies that are less than the frequency-threshold-value.
    Type: Grant
    Filed: March 20, 2017
    Date of Patent: April 2, 2019
    Assignee: NXP B.V.
    Inventors: Magdalena Kaniewska, Wouter Joos Tirry, Cyril Guillaumé, Johannes Abel, Tim Fingscheidt
  • Patent number: 10241684
    Abstract: A method and apparatus are provided. The method includes configuring a plurality of long short term memory (LSTM) networks, wherein each of the plurality of LSTM networks is at a different network layer, configuring a plurality of memory cells in a spatial domain of the plurality of LSTM networks, configuring the plurality of memory cells in a temporal domain of the plurality of LSTM networks, controlling an output of each of the plurality of LSTM networks based on highway connections to outputs from at least one previous layer and at least one previous time of the plurality of LSTM networks, and controlling the plurality of memory cells based on highway connections to memory cells from the at least one previous time.
    Type: Grant
    Filed: April 5, 2017
    Date of Patent: March 26, 2019
    Assignee: Samsung Electronics Co., Ltd
    Inventors: Jaeyoung Kim, Inyup Kang, Mostafa El-Khamy, Jungwon Lee
  • Patent number: 10235353
    Abstract: Embodiments for translating an input message into a device specific command for a network interface device, by: receiving the input message as a generalized language message at an input interface; separating the input message into its language parts to identify keyword elements; identifying keyword actions, targets, and variables used to indicate corresponding device specific commands; classifying the keyword elements against a learned language map to identify a best match action; utilizing the best match action to access a playlist data set for the device specific commands of the target device for execution; and providing a feedback path to a learning mechanism for adding new message and language semantics into the learned language map when identification of the best match action is unclear.
    Type: Grant
    Filed: September 15, 2017
    Date of Patent: March 19, 2019
    Assignee: Dell Products LP
    Inventors: Mark S Sanders, Gavin R Cato
  • Patent number: 10235364
    Abstract: An interpretation distributing device includes: an interpreted voice acquiring unit that acquires at least one piece of interpreted voice data of two or more pieces of interpreted voice data which are voice data obtained by interpreting voice in a first language into voice in two or more different languages; and an interpreted voice transmitting unit that transmits at least one piece of the interpreted voice data acquired by the interpreted voice acquiring unit to one or more terminal devices.
    Type: Grant
    Filed: April 11, 2016
    Date of Patent: March 19, 2019
    Assignee: SHIN TRADING CO., LTD.
    Inventor: Jungbum Shin
  • Patent number: 10192556
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
    Type: Grant
    Filed: November 13, 2017
    Date of Patent: January 29, 2019
    Assignee: Google LLC
    Inventors: Hasim Sak, Andrew W. Senior
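    A minimal NumPy sketch (not from the patent) of the preprocessing above: neighboring acoustic frames are stacked into modified frames, which are then subsampled before the recurrent acoustic model. The stack size of 3 and subsampling factor of 3 are illustrative assumptions.
    ```python
    import numpy as np

    def stack_and_subsample(frames, stack=3, step=3):
        """frames: (T, D) acoustic features. Concatenate each frame with its
        (stack - 1) successors, then keep every `step`-th stacked frame,
        shortening the sequence the acoustic modeling network must process."""
        T, _ = frames.shape
        usable = T - stack + 1
        stacked = np.stack([frames[t:t + stack].reshape(-1) for t in range(usable)])
        return stacked[::step]

    feats = np.random.default_rng(2).standard_normal((100, 40))   # 100 frames, 40-dim
    out = stack_and_subsample(feats)
    print(feats.shape, "->", out.shape)                           # (100, 40) -> (33, 120)
    ```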
  • Patent number: 10186268
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, for each of multiple words or sub-words, audio data corresponding to multiple users speaking the word or sub-word; training, for each of the multiple words or sub-words, a pre-computed hotword model for the word or sub-word based on the audio data for the word or sub-word; receiving a candidate hotword from a computing device; identifying one or more pre-computed hotword models that correspond to the candidate hotword; and providing the identified, pre-computed hotword models to the computing device.
    Type: Grant
    Filed: January 19, 2018
    Date of Patent: January 22, 2019
    Assignee: Google LLC
    Inventor: Matthew Sharifi
  • Patent number: 10157610
    Abstract: A system and method are presented for acoustic data selection of a particular quality for training the parameters of an acoustic model, such as a Hidden Markov Model and Gaussian Mixture Model, for example, in automatic speech recognition systems in the speech analytics field. A raw acoustic model may be trained using a given speech corpus and maximum likelihood criteria. A series of operations are performed, such as a forced Viterbi-alignment, calculations of likelihood scores, and phoneme recognition, for example, to form a subset corpus of training data. During the process, audio files of a quality that does not meet a criterion, such as poor quality audio files, may be automatically rejected from the corpus. The subset may then be used to train a new acoustic model.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: December 18, 2018
    Inventors: Vivek Tyagi, Aravind Ganapathiraju, Felix Immanuel Wyss
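    A toy sketch (not from the patent) of the selection step above: audio files whose quality criterion, here an average alignment log-likelihood, falls below a threshold are rejected, and the survivors form the subset corpus used to train the new acoustic model. The per-file scores and the threshold value are illustrative assumptions.
    ```python
    def select_training_subset(corpus_scores, min_avg_loglik=-8.0):
        """corpus_scores: {audio_file: average per-frame log-likelihood from a
        forced Viterbi alignment}. Files below the criterion are rejected."""
        kept = {f: s for f, s in corpus_scores.items() if s >= min_avg_loglik}
        rejected = sorted(set(corpus_scores) - set(kept))
        return kept, rejected

    scores = {"call_001.wav": -5.2, "call_002.wav": -11.7, "call_003.wav": -6.9}
    subset, dropped = select_training_subset(scores)
    print(sorted(subset), dropped)   # ['call_001.wav', 'call_003.wav'] ['call_002.wav']
    ```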
  • Patent number: 10146773
    Abstract: Various embodiments described herein facilitate multi-lingual communications. The systems and methods of some embodiments may enable multi-lingual communications through different modes of communications including, for example, Internet-based chat, e-mail, text-based mobile phone communications, postings to online forums, postings to online social media services, and the like. Certain embodiments may implement communications systems and methods that translate text between two or more languages (e.g., spoken), while handling/accommodating for one or more of the following in the text: specialized/domain-related jargon, abbreviations, acronyms, proper nouns, common nouns, diminutives, colloquial words or phrases, and profane words or phrases.
    Type: Grant
    Filed: November 6, 2017
    Date of Patent: December 4, 2018
    Assignee: MZ IP Holdings, LLC
    Inventors: Gabriel Leydon, Francois Orsini, Nikhil Bojja, Shailen Karur
  • Patent number: 10134390
    Abstract: An electronic device includes a memory configured to store a user pronunciation lexicon, a voice input unit configured to receive a user's uttered voice, and a processor configured to extract a user pronunciation pattern from the received uttered voice and to update the user pronunciation lexicon according to a pronunciation pattern rule generated based on the extracted pronunciation pattern.
    Type: Grant
    Filed: August 3, 2016
    Date of Patent: November 20, 2018
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Sung-hwan Shin
  • Patent number: 10133735
    Abstract: Systems and methods are disclosed herein for training a model to accurately determine whether two phrases are conversationally connected. A media guidance application may detect a first phrase and a second phrase, translate each phrase to a string of word types, append each string to the back of a prior string to create a combined string, determine a degree to which any of the individual strings matches any singleton template, and determine a degree to which the combined string matches any conversational template. Based on the degrees to which the individual and combination strings match the singleton and conversational templates, respectively, strengths of association are correspondingly updated.
    Type: Grant
    Filed: February 29, 2016
    Date of Patent: November 20, 2018
    Assignee: Rovi Guides, Inc.
    Inventors: Sashikumar Venkataraman, Ahmed Nizam Mohaideen P, Manik Malhotra
  • Patent number: 10089982
    Abstract: Methods, systems, and apparatus for determining that a software application installed on a user device is compatible with a new voice action, wherein the new voice action is specified by an application developer of the software application. One or more trigger terms for triggering the software application to perform the new voice action are identified. An automatic speech recognizer is biased to prefer the identified trigger terms of the new voice action over trigger terms of other voice actions. A transcription of an utterance generated by the biased automatic speech recognizer is obtained. The transcription of the utterance generated by the biased automatic speech recognizer is determined to include a particular trigger term included in the identified trigger terms. Based at least on determining that the transcription of the utterance generated by the biased automatic speech recognizer includes the particular trigger term, execution of the new voice action is triggered.
    Type: Grant
    Filed: June 8, 2017
    Date of Patent: October 2, 2018
    Assignee: GOOGLE LLC
    Inventors: Bo Wang, Sunil Vemuri, Barnaby John James, Pravir Kumar Gupta, Scott B. Huffman
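    A toy sketch (not from the patent) of the biasing idea above, recast as n-best rescoring: hypotheses containing the new voice action's trigger terms receive a bonus so they win against acoustically similar alternatives. In the patent the biasing happens inside the recognizer itself; the post-hoc rescoring, boost value, and example hypotheses are illustrative assumptions.
    ```python
    def bias_hypotheses(nbest, trigger_terms, boost=2.0):
        """nbest: list of (transcript, score). Add a bonus per trigger term
        present and return the highest-scoring hypothesis."""
        rescored = []
        for text, score in nbest:
            words = set(text.lower().split())
            bonus = boost * sum(term in words for term in trigger_terms)
            rescored.append((text, score + bonus))
        return max(rescored, key=lambda pair: pair[1])

    nbest = [("play my playlist", 10.1), ("pay my play list", 10.3)]
    best = bias_hypotheses(nbest, trigger_terms={"play", "playlist"})
    print(best)                                     # ('play my playlist', 14.1)
    if {"play", "playlist"} <= set(best[0].split()):
        print("trigger terms present -> trigger execution of the new voice action")
    ```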
  • Patent number: 10049105
    Abstract: [Object] An object is to provide an apparatus for attaining highly precise word alignment. [Solution] The apparatus includes: selecting means receiving a bilingual sentence pair and a word alignment for the bilingual sentence pair, for successively selecting words f_j of a sentence in a first language in a prescribed order; and a recurrent neural network (RNN) computing, for all words of the sentence in the first language, a score representing a probability that the word pair consisting of the word f_j and the word e_{a_j} aligned with f_j by a word alignment a_j in the second language of the bilingual sentence pair is a correct word pair, and, based on this score, computing a score of the word alignment a_j. When computing the score of the word pair (f_j, e_{a_j}), the RNN computes it based on all word alignments a_1^{j-1} selected by the selecting means prior to the word f_j, by means of a recurrent connection.
    Type: Grant
    Filed: February 12, 2015
    Date of Patent: August 14, 2018
    Assignee: National Institute of Information and Communications Technology
    Inventors: Akihiro Tamura, Taro Watanabe, Eiichiro Sumita
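    One hedged reading (not quoted from the patent) of the scoring above, writing the alignment score as a pair-by-pair decomposition in which the recurrent connection carries the alignment history forward:
    ```latex
    % Score of a whole word alignment, accumulated over source positions j;
    % each pair score is conditioned on the previously selected alignments.
    \[
      \mathrm{score}\bigl(a_1^{J} \mid f_1^{J}, e_1^{I}\bigr)
        \;=\; \prod_{j=1}^{J} \mathrm{score}\bigl(f_j,\, e_{a_j} \;\big|\; a_1^{j-1}\bigr)
    \]
    ```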
  • Patent number: 10013485
    Abstract: Methods and arrangements for effecting a cloud representation of audio content. An audio cloud is created and rendered, and user interaction with at least a clip portion of the audio cloud is afforded.
    Type: Grant
    Filed: August 31, 2012
    Date of Patent: July 3, 2018
    Assignee: International Business Machines Corporation
    Inventors: Jitendra Ajmera, Om Dadaji Deshmukh, Anupam Jain, Amit Anil Nanavati, Nitendra Rajput
  • Patent number: 10007724
    Abstract: Methods and arrangements for effecting a cloud representation of audio content. An audio cloud is created and rendered, and user interaction with at least a clip portion of the audio cloud is afforded.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: June 26, 2018
    Assignee: International Business Machines Corporation
    Inventors: Jitendra Ajmera, Om Dadaji Deshmukh, Anupam Jain, Amit Anil Nanavati, Nitendra Rajput
  • Patent number: 9984700
    Abstract: A method of morphing speech from an original speaker into the speech of a second, target speaker without decomposing either speech into source and filter, and without the need to determine the formant positions by warping spectral envelopes.
    Type: Grant
    Filed: November 9, 2012
    Date of Patent: May 29, 2018
    Assignee: SPEECH MORPHING SYSTEMS, INC.
    Inventor: Jordan Cohen
  • Patent number: 9978364
    Abstract: A reading accuracy-improving system includes: a reading conversion unit for retrieving a plurality of candidate word strings from speech recognition results to determine the reading of each candidate word string; a reading score calculating unit for determining the speech recognition score for each of one or more candidate word strings with the same reading to determine a reading score; and a candidate word string selection unit for selecting a candidate to output from the plurality of candidate word strings on the basis of the reading score and speech recognition score corresponding to each candidate word string.
    Type: Grant
    Filed: March 28, 2016
    Date of Patent: May 22, 2018
    Assignee: International Business Machines Corporation
    Inventors: Gakuto Kurata, Masafumi Nishimura, Ryuki Tachibana
  • Patent number: 9972306
    Abstract: A system and method are presented for acoustic data selection of a particular quality for training the parameters of an acoustic model, such as a Hidden Markov Model and Gaussian Mixture Model, for example, in automatic speech recognition systems in the speech analytics field. A raw acoustic model may be trained using a given speech corpus and maximum likelihood criteria. A series of operations are performed, such as a forced Viterbi-alignment, calculations of likelihood scores, and phoneme recognition, for example, to form a subset corpus of training data. During the process, audio files of a quality that does not meet a criterion, such as poor quality audio files, may be automatically rejected from the corpus. The subset may then be used to train a new acoustic model.
    Type: Grant
    Filed: August 5, 2013
    Date of Patent: May 15, 2018
    Assignee: Interactive Intelligence Group, Inc.
    Inventors: Vivek Tyagi, Aravind Ganapathiraju, Felix Immanuel Wyss
  • Patent number: 9946699
    Abstract: Methods, systems and articles of manufacture for location-based speech recognition for preparation of an electronic tax return.
    Type: Grant
    Filed: August 29, 2012
    Date of Patent: April 17, 2018
    Assignee: INTUIT INC.
    Inventors: Christopher M. Dye, Azhar M. Zuberi, Richard E. McVickar
  • Patent number: 9940324
    Abstract: In an approach for evaluating performance of machine translation, a processor receives a first document in a source language. A processor translates the first document in the source language to a second document in a target language, based, at least in part, on a first quantity of information. A processor evaluates the second document in the target language, based, at least, on one or more aspects of the translation. A processor determines, based, at least in part, on the evaluation, the second document in the target language meets a predetermined threshold.
    Type: Grant
    Filed: August 13, 2015
    Date of Patent: April 10, 2018
    Assignee: International Business Machines Corporation
    Inventors: Mohamed A. Bahgat, Ossama Emam, Ayman S Hanafy, Sara A. Noeman
  • Patent number: 9940933
    Abstract: A speech recognition method includes receiving a sentence generated through speech recognition, calculating a degree of suitability for each word in the sentence based on a relationship of each word with other words in the sentence, detecting a target word to be corrected among the words in the sentence based on the degree of suitability for each word, and replacing the target word with any one of candidate words corresponding to the target word.
    Type: Grant
    Filed: September 30, 2015
    Date of Patent: April 10, 2018
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Heeyoul Choi, Hoshik Lee
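    A toy sketch (not from the patent) of the correction loop above: each word is scored by how well it fits its neighbors, the least suitable word is chosen as the target, and the candidate that most improves the sentence replaces it. The bigram-count "relationship" model and the tiny count table are illustrative assumptions.
    ```python
    BIGRAM_COUNTS = {("turn", "on"): 50, ("on", "the"): 80, ("the", "lights"): 60}

    def suitability(sentence):
        """Per-word score: how strongly the word co-occurs with its neighbors."""
        words = sentence.split()
        scores = {w: 0 for w in words}
        for a, b in zip(words, words[1:]):
            c = BIGRAM_COUNTS.get((a, b), 0)
            scores[a] += c
            scores[b] += c
        return scores

    def correct(sentence, candidates):
        scores = suitability(sentence)
        target = min(scores, key=scores.get)          # least suitable word
        best = max(candidates.get(target, [target]),
                   key=lambda c: sum(suitability(sentence.replace(target, c)).values()))
        return sentence.replace(target, best)

    print(correct("turn on the lice", {"lice": ["lights", "ice"]}))  # turn on the lights
    ```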
  • Patent number: 9928531
    Abstract: Systems, methods, devices, and non-transitory processor readable media of the various embodiments enable in store voice picking systems. In various embodiments, an end-to-end voice ordering and fulfillment system for a retail store may enable a customer to place an order over the phone, a personal shopper to be directed to fill the order via voice commands, and the order to be made available for pickup at the retail store by the customer or delivered to the customer from the retail store.
    Type: Grant
    Filed: February 20, 2015
    Date of Patent: March 27, 2018
    Assignee: Intelligrated Headquarters LLC
    Inventor: Michael Donovan McCarthy
  • Patent number: 9899021
    Abstract: Features are disclosed for modeling user interaction with a detection system using a stochastic dynamical model in order to determine or adjust detection thresholds. The model may incorporate numerous features, such as the probability of false rejection and false acceptance of a user utterance and the cost associated with each potential action. The model may determine or adjust detection thresholds so as to minimize the occurrence of false acceptances and false rejections while preserving other desirable characteristics. The model may further incorporate background and speaker statistics. Adjustments to the model or other operation parameters can be implemented based on the model, user statistics, and/or additional data.
    Type: Grant
    Filed: December 20, 2013
    Date of Patent: February 20, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister, Rohit Prasad
  • Patent number: 9898536
    Abstract: Methods and systems to perform textual queries on voice communications. The system has an index service for storing audio content data sets for voice communications. The audio content data sets include at least three audio content data sets for each voice communication. The three audio content data sets include a first audio content data set generated using a speech-to-text conversion technique, a second audio content data set generated using a phoneme lattice technique, and a third audio content data set generated using a keyword identification technique. The system includes a search engine configured to: receive search criteria from a user, the search criteria having at least one keyword; search each of the first, second and third audio content data sets for at least a portion of the plurality of voice communications to identify voice communications matching the search criteria; and combine the voice communications identified by each search to produce a combined list of identified voice communications.
    Type: Grant
    Filed: June 27, 2013
    Date of Patent: February 20, 2018
    Assignees: JAJAH LTD., Telefonica, S.A.
    Inventors: Diego Urdiales Delgado, John Eugene Neystadt
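    A minimal sketch (not from the patent) of the combination step above: the keyword is looked up in each of the three differently generated indexes and the hits are merged into one combined list, tagged with which data sets matched. Representing each index as a dict from term to a set of call IDs is an illustrative assumption.
    ```python
    def search_voice_communications(keyword, stt_index, phoneme_index, keyword_index):
        """Union the hits from the speech-to-text, phoneme-lattice, and keyword
        identification data sets, remembering which source(s) matched each call."""
        sources = {"stt": stt_index, "phoneme": phoneme_index, "keyword": keyword_index}
        combined = {}
        for name, index in sources.items():
            for call_id in index.get(keyword, set()):
                combined.setdefault(call_id, []).append(name)
        return combined

    stt     = {"refund": {"call-17", "call-42"}}
    phoneme = {"refund": {"call-42", "call-88"}}
    spotted = {"refund": {"call-88"}}
    print(search_voice_communications("refund", stt, phoneme, spotted))
    # e.g. {'call-17': ['stt'], 'call-42': ['stt', 'phoneme'], 'call-88': ['phoneme', 'keyword']}
    ```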
  • Patent number: 9836459
    Abstract: Various embodiments described herein facilitate multi-lingual communications. The systems and methods of some embodiments may enable multi-lingual communications through different modes of communications including, for example, Internet-based chat, e-mail, text-based mobile phone communications, postings to online forums, postings to online social media services, and the like. Certain embodiments may implement communications systems and methods that translate text between two or more languages (e.g., spoken), while handling/accommodating for one or more of the following in the text: specialized/domain-related jargon, abbreviations, acronyms, proper nouns, common nouns, diminutives, colloquial words or phrases, and profane words or phrases.
    Type: Grant
    Filed: February 15, 2017
    Date of Patent: December 5, 2017
    Assignee: Machine Zone, Inc.
    Inventors: Gabriel Leydon, Francois Orsini, Nikhil Bojja, Shailen Karur
  • Patent number: 9830911
    Abstract: Apparatuses and methods related to an electronic apparatus and a voice processing method thereof are provided. More particularly, the apparatuses and methods relate to an electronic apparatus capable of recognizing a user's voice and a voice processing method thereof. An electronic apparatus includes: a voice recognizer configured to recognize a user's voice; a storage configured to have previously stored instructions; a function executor which performs a predetermined function; and a controller configured to control the function executor to execute the function corresponding to an instruction when a user's voice corresponding to that instruction is input, and to control the function executor to execute the function in accordance with the results of an external server analyzing the user's voice when a preset dialog selection signal and a dialog voice for executing the function are input by the user.
    Type: Grant
    Filed: November 6, 2013
    Date of Patent: November 28, 2017
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Joo-yeong Lee, Sang-shin Park
  • Patent number: 9812123
    Abstract: Implementations relate to techniques for providing context-dependent search results. A computer-implemented method includes receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio, separating the audio stream into a first substream that includes the user speech data and a second substream that includes the background audio, identifying concepts related to the background audio, generating a set of terms related to the identified concepts, influencing a speech recognizer based on at least one of the terms related to the background audio, and obtaining a recognized version of the user speech data using the speech recognizer.
    Type: Grant
    Filed: August 13, 2015
    Date of Patent: November 7, 2017
    Assignee: Google Inc.
    Inventors: Jason Sanders, Gabriel Taubman, John J. Lee
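    A high-level sketch (not from the patent) of the plumbing above: split the stream into a speech substream and a background substream, tag concepts in the background, derive related terms, and pass them to the recognizer as a bias. All three components are stand-in callables; the example concept and terms are illustrative assumptions.
    ```python
    def context_dependent_recognize(audio_stream, separate, tag_concepts, recognize):
        """separate() -> (speech_substream, background_substream);
        tag_concepts() -> concepts identified in the background audio;
        recognize(speech, bias_terms) -> transcript influenced by those terms."""
        speech, background = separate(audio_stream)
        concepts = tag_concepts(background)
        bias_terms = [term for concept in concepts for term in concept["related_terms"]]
        return recognize(speech, bias_terms)

    # Stand-in components so the sketch runs end to end.
    separate = lambda stream: (stream["speech"], stream["background"])
    tag_concepts = lambda bg: [{"name": "baseball broadcast",
                                "related_terms": ["score", "inning", "pitcher"]}]
    recognize = lambda speech, terms: f"recognized {speech!r} (biased by {terms})"

    stream = {"speech": "what's the score", "background": "<crowd and commentary>"}
    print(context_dependent_recognize(stream, separate, tag_concepts, recognize))
    ```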
  • Patent number: 9779088
    Abstract: An interactive electronic translation and communications process is provided for use in a translation station that provides a mobile or stationary fixed interactive facility for interviews or interrogations to be carried out between two persons speaking in different languages. The process can be assisted by animated virtual characters (avatars) realistically created and displayed on a computer screen to represent ethnic looks from around the globe. The avatars can be lip synchronized to deliver messages to the interviewee in the interviewee's languages and can guide the users and interviewee through a series of questions and answers. Biometric conditions of the interviewee and electronic identification of the interviewee can also be readily accomplished by the novel process. The process is particularly useful for hospitals, law enforcement, military, airport security, transportation terminals, financial institutions, and government agencies.
    Type: Grant
    Filed: February 12, 2016
    Date of Patent: October 3, 2017
    Inventor: David Lynton Jephcott
  • Patent number: 9762963
    Abstract: Apparatus and methods conforming to the present invention comprise a method of controlling playback of an audio signal through analysis of a corresponding closed caption signal in conjunction with analysis of the corresponding audio signal. Objectionable text or other specified text in the closed caption signal is identified through comparison with user-identified objectionable text. Upon identification of the objectionable text, the audio signal is analyzed to identify the audio portion corresponding to the objectionable text. Upon identification of the audio portion, the audio signal may be controlled to mute the audible objectionable text.
    Type: Grant
    Filed: June 12, 2015
    Date of Patent: September 12, 2017
    Assignee: ClearPlay, Inc.
    Inventors: Matthew T. Jarman, William S. Meisel
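    A simplified sketch (not from the patent) of the muting step above: caption cues whose text contains a user-identified objectionable word have the corresponding span of the audio signal zeroed. Assuming the caption cues already carry start/end times, and muting by zeroing samples, are illustrative simplifications of the audio-alignment analysis the patent describes.
    ```python
    import numpy as np

    def mute_objectionable(audio, sample_rate, caption_cues, objectionable):
        """caption_cues: list of (start_sec, end_sec, text). Zero out the audio
        samples for any cue containing an objectionable word."""
        cleaned = audio.copy()
        blocked = {w.lower() for w in objectionable}
        for start, end, text in caption_cues:
            if blocked & set(text.lower().split()):
                lo, hi = int(start * sample_rate), int(end * sample_rate)
                cleaned[lo:hi] = 0
        return cleaned

    sr = 16000
    audio = np.ones(sr * 5, dtype=np.float32)                # 5 s of dummy audio
    cues = [(0.0, 1.5, "what a lovely day"), (2.0, 2.6, "darn it")]
    out = mute_objectionable(audio, sr, cues, objectionable=["darn"])
    print(out[int(2.2 * sr)], out[int(1.0 * sr)])            # 0.0 1.0
    ```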
  • Patent number: 9747897
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, including selecting terms; obtaining an expected phonetic transcription of an idealized native speaker of a natural language speaking the terms; receiving audio data corresponding to a particular user speaking the terms in the natural language; obtaining, based on the audio data, an actual phonetic transcription of the particular user speaking the terms in the natural language; aligning the expected phonetic transcription of the idealized native speaker of the natural language with the actual phonetic transcription of the particular user; identifying, based on the aligning, a portion of the expected phonetic transcription that is different than a corresponding portion of the actual phonetic transcription; and based on identifying the portion of the expected phonetic transcription, designating the expected phonetic transcription as a substitute pronunciation for the corresponding portion of the actual phonetic transcription.
    Type: Grant
    Filed: December 17, 2013
    Date of Patent: August 29, 2017
    Assignee: Google Inc.
    Inventors: Fuchun Peng, Francoise Beaufays, Pedro J. Moreno Mengibar, Brian Patrick Strope
  • Patent number: 9741346
    Abstract: A method for estimating the reliability of a result of a speaker recognition system concerning a testing audio and a speaker model, which is based on one, two, three or more model audios, the method using a Bayesian Network to estimate whether the result is reliable. In estimating the reliability of the result of the speaker recognition system, one, two, three, four or more than four quality measures of the testing audio and one, two, three, four or more than four quality measures of the model audio(s) are used.
    Type: Grant
    Filed: April 23, 2014
    Date of Patent: August 22, 2017
    Assignee: AGNITIO, S.L.
    Inventors: Carlos Vaquero Avilés-Casco, Luis Buera Rodriguez, Jesús Antonio Villalba López
  • Patent number: 9712666
    Abstract: The invention relates to a communication system and a method of maintaining audio communication in a congested communication channel currently bearing the transmission of speech in audio communication between a sender side and a receiver side, the communication channel having at least one signaling channel and at least one payload channel having a quality of service. During the audio communication the quality of service of the payload channel is monitored. If the quality of service of the payload channel is below a threshold the speech at the respective sender side is converted to text; and transmitted over the retained communication channel to the respective receiver side. The text may be converted back to speech at the receiver side.
    Type: Grant
    Filed: August 29, 2013
    Date of Patent: July 18, 2017
    Assignee: Unify GmbH & Co. KG
    Inventors: Bizhan Karimi-Cherkandi, Farrokh Mohammadzadeh Kouchri, Schah Walli Ali
  • Patent number: 9685174
    Abstract: A system that monitors and assesses the moods of subjects with neurological disorders, like bipolar disorder, by analyzing normal conversational speech to identify speech data that is then analyzed through an automated speech data classifier. The classifier may be based on a vector, separator, hyperplane, decision boundary, or other set of rules to classify one or more mood states of a subject. The system classifier is used to assess current mood state, predicted instability, and/or a change in future mood state, in particular for subjects with bipolar disorder.
    Type: Grant
    Filed: May 1, 2015
    Date of Patent: June 20, 2017
    Assignee: THE REGENTS OF THE UNIVERSITY OF MICHIGAN
    Inventors: Zahi N. Karam, Satinder Singh Baveja, Melvin Mcinnis, Emily Mower Provost