Subportions Patents (Class 704/254)
  • Patent number: 10528318
    Abstract: For a mobile computing device, enhanced functionality may be provided by associating actions with combined speech and touch gestures. A touch gesture is received into the device and simultaneously or near-simultaneously speech is received into the device. The touch gesture and speech are processed to determine a result and the device performs an action based on the result. In particular embodiments, commands for a mapping application may be based on spoken search terms and geographic areas marked by touch gestures.
    Type: Grant
    Filed: November 2, 2016
    Date of Patent: January 7, 2020
    Assignee: OPEN INNOVATION NETWORK LLC
    Inventor: David Gerard Ledet
  • Patent number: 10522135
    Abstract: A system and method for segmenting an audio file. The method includes analyzing an audio file, wherein the analyzing includes identifying speech recognition features within the audio file; generating metadata based on the audio file, wherein the metadata includes transcription characteristics of the audio file; and determining a segmenting interval for the audio file based on the speech recognition features and the metadata.
    Type: Grant
    Filed: December 31, 2017
    Date of Patent: December 31, 2019
    Assignee: Verbit Software Ltd.
    Inventors: Tom Livne, Kobi Ben Tzvi, Eric Shellef
  • Patent number: 10515307
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long-short term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.
    Type: Grant
    Filed: June 3, 2016
    Date of Patent: December 24, 2019
    Assignee: Google LLC
    Inventors: Tara N. Sainath, Vikas Sindhwani
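    A minimal sketch (not from the patent) of the compressed-gating idea above: instead of storing a full gate parameter matrix, the gate multiplies by a low-rank pair, a compressed parameter matrix times a projection matrix. The sizes and the NumPy implementation are illustrative assumptions; the patent also covers structured (e.g. Toeplitz-like) gate matrices.
    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def compressed_gate(gate_input, compressed, projection, bias):
        """One LSTM-style gate whose parameter matrix W is never materialized:
        W = compressed @ projection (low rank), so the multiply costs
        O(d*r) twice instead of O(d*d)."""
        # Equivalent to sigmoid(gate_input @ (compressed @ projection).T + bias)
        return sigmoid((gate_input @ projection.T) @ compressed.T + bias)

    rng = np.random.default_rng(0)
    d_in, d_out, rank = 512, 512, 64           # hypothetical sizes
    x = rng.standard_normal(d_in)              # concatenated [input, state, output]
    U = rng.standard_normal((d_out, rank))     # compressed parameter matrix
    V = rng.standard_normal((rank, d_in))      # projection matrix
    b = np.zeros(d_out)

    print(compressed_gate(x, U, V, b).shape)   # (512,)
    ```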
  • Patent number: 10510350
    Abstract: One embodiment provides a method, including receiving, at an audio capture device, a customized activation cue; identifying, using a processor, contextual information associated with a user; analyzing, using the contextual information, characteristics of the customized activation cue; identifying, based on the analysis, a uniqueness associated with the customized activation cue; and responsive to said identifying, notifying a user that the customized activation cue has inadequate uniqueness. Other aspects are described and claimed.
    Type: Grant
    Filed: March 30, 2016
    Date of Patent: December 17, 2019
    Assignee: Lenovo (Singapore) Pte. Ltd.
    Inventors: Aaron Michael Stewart, Rod D. Waltermann, Russell Speight VanBlon
  • Patent number: 10445360
    Abstract: Methods and apparatus for improving speech recognition accuracy in media content searches are described. An advertisement for a media content item is analyzed to identify keywords that may describe the media content item. The identified keywords are associated with the media content item for use during a voice search to locate the media content item. A user may speak one or more of the keywords as a search input and be provided with the media content item as a result of the search.
    Type: Grant
    Filed: November 24, 2015
    Date of Patent: October 15, 2019
    Assignee: Comcast Cable Communications, LLC
    Inventor: George Thomas Des Jardins
  • Patent number: 10431205
    Abstract: A dialog device comprises a natural language interfacing device (chat interface or a telephonic device), and a natural language output device (the chat interface, a display device, or a speech synthesizer outputting to the telephonic device). A computer stores natural language dialog conducted via the interfacing device and constructs a current utterance word-by-word. Each word is chosen by applying a plurality of language models to a context comprising concatenation of the stored dialog and the current utterance thus far. Each language model outputs a distribution over the words of a vocabulary. A recurrent neural network (RNN) is applied to the distributions to generate a mixture distribution. The next word is chosen using the mixture distribution. The output device outputs the current natural language utterance after it has been constructed by the computer.
    Type: Grant
    Filed: April 27, 2016
    Date of Patent: October 1, 2019
    Assignee: CONDUENT BUSINESS SERVICES, LLC
    Inventors: Phong Le, Marc Dymetman, Jean-Michel Renders
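    A minimal sketch (not from the patent) of the mixture step above: each language model contributes a distribution over the vocabulary, and context-dependent mixture weights blend them into one distribution from which the next word is chosen. The toy vocabulary, the fixed per-model distributions, and the single linear layer standing in for the patent's RNN are illustrative assumptions.
    ```python
    import numpy as np

    VOCAB = ["hello", "how", "can", "i", "help", "you", "today", "</s>"]

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def mixture_next_word(context_vec, model_dists, mixer_weights):
        """Blend per-model word distributions into one mixture distribution;
        the mixing coefficients are computed from the dialog context."""
        alphas = softmax(mixer_weights @ context_vec)   # one weight per language model
        mixture = alphas @ model_dists                  # (n_models,) @ (n_models, |V|)
        return VOCAB[int(np.argmax(mixture))], mixture

    rng = np.random.default_rng(1)
    n_models, dim = 3, 16
    context = rng.standard_normal(dim)                  # encoding of the dialog so far
    dists = np.stack([softmax(rng.standard_normal(len(VOCAB))) for _ in range(n_models)])
    W_mix = rng.standard_normal((n_models, dim))

    word, dist = mixture_next_word(context, dists, W_mix)
    print(word, round(float(dist.sum()), 6))            # chosen next word, 1.0
    ```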
  • Patent number: 10417351
    Abstract: Various embodiments described herein facilitate multi-lingual communications. The systems and methods of some embodiments may enable multi-lingual communications through different modes of communications including, for example, Internet-based chat, e-mail, text-based mobile phone communications, postings to online forums, postings to online social media services, and the like. Certain embodiments may implement communications systems and methods that translate text between two or more languages (e.g., spoken), while handling/accommodating for one or more of the following in the text: specialized/domain-related jargon, abbreviations, acronyms, proper nouns, common nouns, diminutives, colloquial words or phrases, and profane words or phrases.
    Type: Grant
    Filed: October 18, 2018
    Date of Patent: September 17, 2019
    Assignee: MZ IP Holdings, LLC
    Inventors: Gabriel Leydon, Francois Orsini, Nikhil Bojja, Shailen Karur
  • Patent number: 10404806
    Abstract: A method and a system are provided for segmenting a multimedia content. The method estimates a count of a plurality of multimedia segments in the multimedia content, and a duration of each of the plurality of multimedia segments in the multimedia content. The method determines a cost function associated with a multimedia segment from the plurality of multimedia segments, based on the count of the plurality of multimedia segments, and the duration of each of the plurality of multimedia segments. The method further determines an updated count of the plurality of multimedia segments, and an updated duration of each of the plurality of multimedia segments until the cost function satisfies a pre-defined criterion. Based on the updated count of the plurality of multimedia segments, and the updated duration of each of the plurality of multimedia segments, the method segments the multimedia content into the plurality of multimedia segments.
    Type: Grant
    Filed: September 1, 2015
    Date of Patent: September 3, 2019
    Inventors: Arijit Biswas, Ankit Gandhi, Ranjeet Kumar, Om D Deshmukh
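    A toy sketch (not from the patent) of the update loop above: the segment count and per-segment duration are alternately revised until a cost function is satisfied. The cost used here, the gap between the implied segment length and a target length, is an illustrative assumption, since the abstract does not specify the cost function.
    ```python
    def segment_media(total_duration, target_seg_len, tolerance=1.0, max_iters=50):
        """Iteratively refine the segment count until the cost (distance of the
        implied duration from a target duration) meets a pre-defined criterion,
        then emit the resulting segment boundaries."""
        count = max(1, round(total_duration / target_seg_len))
        for _ in range(max_iters):
            duration = total_duration / count
            if abs(duration - target_seg_len) <= tolerance:   # criterion satisfied
                break
            count = max(1, count + (1 if duration > target_seg_len else -1))
        duration = total_duration / count
        return [(i * duration, (i + 1) * duration) for i in range(count)]

    print(segment_media(total_duration=605.0, target_seg_len=60.0))  # 10 segments of 60.5 s
    ```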
  • Patent number: 10403289
    Abstract: A voice processing device includes: a processor; and a memory which stores a plurality of instructions, which when executed by the processor, cause the processor to execute: detecting a first utterance section included in a first voice and a second utterance section included in a second voice; specifying an overlapping section within which the first utterance section and the second utterance section overlap with each other; calculating a first utterance continuation section from a start point of the overlapping section to an end point of the first utterance section; and evaluating an impression regarding the first voice at least on the basis of information relating to a length of the first utterance continuation section.
    Type: Grant
    Filed: November 25, 2015
    Date of Patent: September 3, 2019
    Assignee: FUJITSU LIMITED
    Inventors: Taro Togawa, Chisato Shioda, Sayuri Kohmura, Takeshi Otani
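    A small sketch (not from the patent) of the two interval computations above: the overlapping section of the two utterance sections, and the first-utterance continuation section running from the start of the overlap to the end of the first utterance. Representing utterance sections as (start, end) times in seconds is an illustrative assumption.
    ```python
    def overlap_and_continuation(first, second):
        """first/second: (start, end) utterance sections in seconds.
        Returns (overlap_length, first_utterance_continuation_length)."""
        ov_start = max(first[0], second[0])
        ov_end = min(first[1], second[1])
        if ov_end <= ov_start:
            return 0.0, 0.0                 # the sections do not overlap
        return ov_end - ov_start, first[1] - ov_start

    # Speaker B interjects at 3.0 s; speaker A keeps talking until 9.0 s.
    print(overlap_and_continuation((1.0, 9.0), (3.0, 4.0)))   # (1.0, 6.0)
    ```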
  • Patent number: 10394852
    Abstract: Provided is a technique for matching different user representations of a person in a plurality of computer systems. The technique includes collecting information sets about user representations from a plurality of computer systems; normalizing the information sets to a unified format; grouping the information sets in the unified format into indexing buckets based on a user name using a non-phonetic algorithm; determining a similarity score for each pair of information sets in each of the indexing buckets; classifying each information set pair into a set of classes based on the similarity scores, wherein the set of classes comprise at least matches and non-matches; and using a data structure for merging information of information set pairs classified as matches.
    Type: Grant
    Filed: March 11, 2016
    Date of Patent: August 27, 2019
    Assignee: International Business Machines Corporation
    Inventors: Lars Bremer, Thomas A. P. Hampp-Bahnmueller, Markus Lorch, Pavlo Petrenko, Sebastian B. Schmid
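    A compact sketch (not from the patent) of the pipeline above: normalize records to a unified format, group them into indexing buckets with a non-phonetic key, score each pair inside a bucket, and classify pairs as matches or non-matches. The first-letter bucketing key, the Jaccard-style similarity, and the fixed threshold are illustrative assumptions.
    ```python
    from collections import defaultdict
    from itertools import combinations

    def normalize(record):
        """Bring records from different systems into one unified format."""
        return {"name": record.get("name", "").strip().lower(),
                "email": record.get("email", "").strip().lower()}

    def bucket_key(record):
        """Non-phonetic indexing key: first letter of the user name."""
        return record["name"][:1]

    def similarity(a, b):
        """Token-level Jaccard similarity over name plus email."""
        ta = set(a["name"].split()) | {a["email"]}
        tb = set(b["name"].split()) | {b["email"]}
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def match_users(raw_records, threshold=0.5):
        records = [normalize(r) for r in raw_records]
        buckets = defaultdict(list)
        for r in records:
            buckets[bucket_key(r)].append(r)
        matches, non_matches = [], []
        for a, b in ((a, b) for group in buckets.values()
                     for a, b in combinations(group, 2)):
            (matches if similarity(a, b) >= threshold else non_matches).append((a, b))
        return matches, non_matches

    crm   = {"name": "Ada Lovelace ", "email": "ada@example.com"}
    hr    = {"name": "ada lovelace",  "email": "ada@example.com"}
    other = {"name": "Alan Turing",   "email": "alan@example.com"}
    m, n = match_users([crm, hr, other])
    print(len(m), len(n))   # 1 match (the two Ada records), 2 non-matches
    ```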
  • Patent number: 10381004
    Abstract: A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees high recognition rate among user commands defined by a user.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: August 13, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Nam-yeong Kwon, Kyung-mi Park
  • Patent number: 10354643
    Abstract: An electronic device is provided including at least one microphone, a communication circuit, a processor, and a memory. The memory stores at least one application program or software program that executes a voice instruction triggered in response to a voice input. The memory also stores instructions that allow the processor to sequentially receive a plurality of utterances including a first speech element from a first user through the at least one microphone, generate a voice recognition model of the first user on the basis of at least some of the plurality of utterances, store the generated voice recognition model in the memory, and transmit the generated voice recognition model of the first user to the outside through the communication circuit so that a first external device uses the generated voice recognition model of the first user.
    Type: Grant
    Filed: October 13, 2016
    Date of Patent: July 16, 2019
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Subhojit Chakladar, Junhui Kim
  • Patent number: 10347251
    Abstract: A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees high recognition rate among user commands defined by a user.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: July 9, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Nam-yeong Kwon, Kyung-mi Park
  • Patent number: 10331967
    Abstract: Methods of facilitating machine learning via a 2-D symbol are disclosed. Features of an object are received in a first computing system having a 2-D symbol creation application module installed thereon. A multi-layer 2-D symbol is formed from the features according to a set of symbol creation rules. The 2-D symbol is a matrix of N×N pixels partitioned into a number of sub-matrices with each sub-matrix containing one feature, where N is a positive integer. The meaning of the combined features in the 2-D symbol is learned in a second computing system by using an image processing technique to classify the 2-D symbol transmitted from the first computing system. The symbol creation rules determine the importance order, size and location of sub-matrices in the 2-D symbol.
    Type: Grant
    Filed: December 5, 2018
    Date of Patent: June 25, 2019
    Assignee: Gyrfalcon Technology Inc.
    Inventors: Lin Yang, Baohua Sun
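    A minimal sketch (not from the patent) of building the 2-D symbol above: an N×N pixel matrix split into equally sized sub-matrices, filled row-major in importance order with one feature tile per sub-matrix. The grid size, tile encoding, and ordering rule are illustrative assumptions.
    ```python
    import numpy as np

    def make_symbol(feature_tiles, n_pixels=224, grid=4):
        """Build an N x N 2-D symbol: split the image into grid x grid
        sub-matrices and place feature tiles row-major, most important first."""
        sub = n_pixels // grid
        symbol = np.zeros((n_pixels, n_pixels), dtype=np.float32)
        for idx, tile in enumerate(feature_tiles[: grid * grid]):
            r, c = divmod(idx, grid)
            patch = np.resize(np.asarray(tile, dtype=np.float32), (sub, sub))
            symbol[r * sub:(r + 1) * sub, c * sub:(c + 1) * sub] = patch
        return symbol

    features = [np.full((8, 8), v) for v in (1.0, 0.5, 0.25)]   # importance-ordered tiles
    sym = make_symbol(features)
    print(sym.shape, sym[:56, :56].mean(), sym[:56, 56:112].mean())  # (224, 224) 1.0 0.5
    ```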
  • Patent number: 10311876
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.
    Type: Grant
    Filed: February 14, 2017
    Date of Patent: June 4, 2019
    Assignee: Google LLC
    Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar
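    A minimal sketch (not from the patent) of the two-stage decision above: a lenient on-device threshold gates whether audio is sent to the server, and a stricter server-side threshold decides whether the key phrase was really spoken. The scalar confidence scores standing in for the device and server hotword models are illustrative assumptions.
    ```python
    def on_device_check(audio, device_score_fn, first_threshold=0.4):
        """Stage 1: cheap local check; only ship the audio signal to the
        server if the utterance plausibly contains the key phrase."""
        return device_score_fn(audio) >= first_threshold

    def server_check(audio, server_score_fn, second_threshold=0.8):
        """Stage 2: more restrictive server-side check; tagged text for the
        utterances is returned only when the stricter threshold is met."""
        if server_score_fn(audio) >= second_threshold:
            return {"hotword": True, "transcript": "<tagged text of the utterance>"}
        return {"hotword": False}

    audio = b"..."                                  # placeholder audio signal
    device_score = lambda a: 0.55                   # stand-in hotword model scores
    server_score = lambda a: 0.91

    if on_device_check(audio, device_score):
        print(server_check(audio, server_score))    # {'hotword': True, ...}
    ```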
  • Patent number: 10303731
    Abstract: In one embodiment, a method includes, receiving, from a client system of a user, a search query including n-grams. The method includes associating each n-gram with verticals based on an analysis of the n-grams by language models. The method includes determining, for each n-gram, if a bloom filter for a vertical associated with the n-gram indicates, based on sub-bloom filters of the bloom filter, the n-gram does exist or does not exist in a set of object names associated with the vertical. Each sub-bloom filter is associated with a subset of the set of object names and indicates the n-gram does exist or does not exist in its subset of object names. The method includes sending, to the client system, an indication that an n-gram of the n-grams is misspelled if a bloom filter indicates the n-gram does not exist in the set of object names associated with the vertical.
    Type: Grant
    Filed: May 1, 2017
    Date of Patent: May 28, 2019
    Assignee: Facebook, Inc.
    Inventors: Ian Douglas Hegerty, Daniel Bernhardt, Feng Liang, Agnieszka Anna Podsiadlo
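    A small sketch (not from the patent) of the membership test above: each vertical's bloom filter is composed of sub-bloom filters, each built over a subset of the vertical's object names, and an n-gram is flagged as misspelled when no sub-bloom filter may contain it. The hash construction, bit-array sizes, and subset split are illustrative assumptions.
    ```python
    import hashlib

    class SubBloom:
        """Bloom filter over one subset of a vertical's object names."""
        def __init__(self, names, m=1024, k=3):
            self.m, self.k, self.bits = m, k, 0
            for name in names:
                for pos in self._positions(name):
                    self.bits |= 1 << pos

        def _positions(self, s):
            for i in range(self.k):
                digest = hashlib.sha256(f"{i}:{s}".encode()).hexdigest()
                yield int(digest, 16) % self.m

        def maybe_contains(self, s):
            return all(self.bits >> p & 1 for p in self._positions(s))

    class VerticalBloom:
        """Vertical-level filter composed of sub-bloom filters."""
        def __init__(self, name_subsets):
            self.subs = [SubBloom(subset) for subset in name_subsets]

        def maybe_contains(self, ngram):
            # exists iff some sub-bloom filter says it may exist in its subset
            return any(sub.maybe_contains(ngram) for sub in self.subs)

    pages = VerticalBloom([["acme corp", "acme labs"], ["zenith media"]])
    for query_ngram in ("acme labs", "acme lbas"):
        if not pages.maybe_contains(query_ngram):
            print(f"'{query_ngram}' does not exist in this vertical; flag as misspelled")
    ```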
  • Patent number: 10289912
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying videos using neural networks. One of the methods includes obtaining a temporal sequence of video frames, wherein the temporal sequence comprises a respective video frame from a particular video at each of a plurality of time steps; for each time step of the plurality of time steps: processing the video frame at the time step using a convolutional neural network to generate features of the video frame; and processing the features of the video frame using an LSTM neural network to generate a set of label scores for the time step and classifying the video as relating to one or more of the topics represented by labels in the set of labels from the label scores for each of the plurality of time steps.
    Type: Grant
    Filed: April 29, 2016
    Date of Patent: May 14, 2019
    Assignee: Google LLC
    Inventors: Sudheendra Vijayanarasimhan, George Dan Toderici, Yue Hei Ng, Matthew John Hausknecht, Oriol Vinyals, Rajat Monga
  • Patent number: 10282084
    Abstract: Methods and apparatus are provided for executing a function of a mobile terminal by recognizing a writing gesture. The writing gesture that is inputted on a touchscreen of the mobile terminal is detected. At least one target item to which the writing gesture applies is determined. A preset writing gesture of the at least one target item is compared with the detected writing gesture to determine whether the preset writing gesture is at least similar to the detected writing gesture. An execution command corresponding to the preset writing gesture is extracted, when it is determined that the detected writing gesture is at least similar to the preset writing gesture. The function of the at least one target item is executed by the execution command.
    Type: Grant
    Filed: August 20, 2013
    Date of Patent: May 7, 2019
    Assignee: Samsung Electronics Co., Ltd
    Inventor: Musik Kwon
  • Patent number: 10249318
    Abstract: A speech-signal-processing-circuit configured to receive a time-frequency-domain-reference-speech-signal and a time-frequency-domain-degraded-speech-signal. The time-frequency-domain-reference-speech-signal comprises: an upper-band-reference-component with frequencies that are greater than a frequency-threshold-value; and a lower-band-reference-component with frequencies that are less than the frequency-threshold-value. The time-frequency-domain-degraded-speech-signal comprises: an upper-band-degraded-component with frequencies that are greater than the frequency-threshold-value; and a lower-band-degraded-component with frequencies that are less than the frequency-threshold-value.
    Type: Grant
    Filed: March 20, 2017
    Date of Patent: April 2, 2019
    Assignee: NXP B.V.
    Inventors: Magdalena Kaniewska, Wouter Joos Tirry, Cyril Guillaumé, Johannes Abel, Tim Fingscheidt
  • Patent number: 10241684
    Abstract: A method and apparatus are provided. The method includes configuring a plurality of long short term memory (LSTM) networks, wherein each of the plurality of LSTM networks is at a different network layer, configuring a plurality of memory cells in a spatial domain of the plurality of LSTM networks, configuring the plurality of memory cells in a temporal domain of the plurality of LSTM networks, controlling an output of each of the plurality of LSTM networks based on highway connections to outputs from at least one previous layer and at least one previous time of the plurality of LSTM networks, and controlling the plurality of memory cells based on highway connections to memory cells from the at least one previous time.
    Type: Grant
    Filed: April 5, 2017
    Date of Patent: March 26, 2019
    Assignee: Samsung Electronics Co., Ltd
    Inventors: Jaeyoung Kim, Inyup Kang, Mostafa El-Khamy, Jungwon Lee
  • Patent number: 10235353
    Abstract: Embodiments for translating an input message into a device specific command for a network interface device, by: receiving the input message as a generalized language message at an input interface; separating the input message into its language parts to identify keyword elements; identifying keyword actions, targets, and variables used to indicate corresponding device specific commands; classifying the keyword elements against a learned language map to identify a best match action; utilizing the best match action to access a playlist data set for the device specific commands of the target device for execution; and providing a feedback path to a learning mechanism for adding new message and language semantics into the learned language map when identification of the best match action is unclear.
    Type: Grant
    Filed: September 15, 2017
    Date of Patent: March 19, 2019
    Assignee: Dell Products LP
    Inventors: Mark S Sanders, Gavin R Cato
  • Patent number: 10235364
    Abstract: An interpretation distributing device includes: an interpreted voice acquiring unit that acquires at least one piece of interpreted voice data of two or more pieces of interpreted voice data which are voice data obtained by interpreting voice in a first language into voice in two or more different languages; and an interpreted voice transmitting unit that transmits at least one piece of the interpreted voice data acquired by the interpreted voice acquiring unit to one or more terminal devices.
    Type: Grant
    Filed: April 11, 2016
    Date of Patent: March 19, 2019
    Assignee: SHIN TRADING CO., LTD.
    Inventor: Jungbum Shin
  • Patent number: 10192556
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
    Type: Grant
    Filed: November 13, 2017
    Date of Patent: January 29, 2019
    Assignee: Google LLC
    Inventors: Hasim Sak, Andrew W. Senior
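    A minimal NumPy sketch (not from the patent) of the preprocessing above: neighboring acoustic frames are stacked into modified frames, which are then subsampled before the recurrent acoustic model. The stack size of 3 and subsampling factor of 3 are illustrative assumptions.
    ```python
    import numpy as np

    def stack_and_subsample(frames, stack=3, step=3):
        """frames: (T, D) acoustic features. Concatenate each frame with its
        (stack - 1) successors, then keep every `step`-th stacked frame,
        shortening the sequence the acoustic modeling network must process."""
        T, _ = frames.shape
        usable = T - stack + 1
        stacked = np.stack([frames[t:t + stack].reshape(-1) for t in range(usable)])
        return stacked[::step]

    feats = np.random.default_rng(2).standard_normal((100, 40))   # 100 frames, 40-dim
    out = stack_and_subsample(feats)
    print(feats.shape, "->", out.shape)                           # (100, 40) -> (33, 120)
    ```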
  • Patent number: 10186268
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, for each of multiple words or sub-words, audio data corresponding to multiple users speaking the word or sub-word; training, for each of the multiple words or sub-words, a pre-computed hotword model for the word or sub-word based on the audio data for the word or sub-word; receiving a candidate hotword from a computing device; identifying one or more pre-computed hotword models that correspond to the candidate hotword; and providing the identified, pre-computed hotword models to the computing device.
    Type: Grant
    Filed: January 19, 2018
    Date of Patent: January 22, 2019
    Assignee: Google LLC
    Inventor: Matthew Sharifi
  • Patent number: 10157610
    Abstract: A system and method are presented for acoustic data selection of a particular quality for training the parameters of an acoustic model, such as a Hidden Markov Model and Gaussian Mixture Model, for example, in automatic speech recognition systems in the speech analytics field. A raw acoustic model may be trained using a given speech corpus and maximum likelihood criteria. A series of operations are performed, such as a forced Viterbi-alignment, calculations of likelihood scores, and phoneme recognition, for example, to form a subset corpus of training data. During the process, audio files of a quality that does not meet a criterion, such as poor quality audio files, may be automatically rejected from the corpus. The subset may then be used to train a new acoustic model.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: December 18, 2018
    Inventors: Vivek Tyagi, Aravind Ganapathiraju, Felix Immanuel Wyss
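    A toy sketch (not from the patent) of the selection step above: audio files whose quality criterion, here an average alignment log-likelihood, falls below a threshold are rejected, and the survivors form the subset corpus used to train the new acoustic model. The per-file scores and the threshold value are illustrative assumptions.
    ```python
    def select_training_subset(corpus_scores, min_avg_loglik=-8.0):
        """corpus_scores: {audio_file: average per-frame log-likelihood from a
        forced Viterbi alignment}. Files below the criterion are rejected."""
        kept = {f: s for f, s in corpus_scores.items() if s >= min_avg_loglik}
        rejected = sorted(set(corpus_scores) - set(kept))
        return kept, rejected

    scores = {"call_001.wav": -5.2, "call_002.wav": -11.7, "call_003.wav": -6.9}
    subset, dropped = select_training_subset(scores)
    print(sorted(subset), dropped)   # ['call_001.wav', 'call_003.wav'] ['call_002.wav']
    ```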
  • Patent number: 10146773
    Abstract: Various embodiments described herein facilitate multi-lingual communications. The systems and methods of some embodiments may enable multi-lingual communications through different modes of communications including, for example, Internet-based chat, e-mail, text-based mobile phone communications, postings to online forums, postings to online social media services, and the like. Certain embodiments may implement communications systems and methods that translate text between two or more languages (e.g., spoken), while handling/accommodating for one or more of the following in the text: specialized/domain-related jargon, abbreviations, acronyms, proper nouns, common nouns, diminutives, colloquial words or phrases, and profane words or phrases.
    Type: Grant
    Filed: November 6, 2017
    Date of Patent: December 4, 2018
    Assignee: MZ IP Holdings, LLC
    Inventors: Gabriel Leydon, Francois Orsini, Nikhil Bojja, Shailen Karur
  • Patent number: 10134390
    Abstract: An electronic device includes a memory configured to store a user pronunciation lexicon, a voice input unit configured to receive a user's uttered voice, and a processor configured to extract a user pronunciation pattern from the received uttered voice and to update the user pronunciation lexicon according to a pronunciation pattern rule generated based on the extracted pronunciation pattern.
    Type: Grant
    Filed: August 3, 2016
    Date of Patent: November 20, 2018
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Sung-hwan Shin
  • Patent number: 10133735
    Abstract: Systems and methods are disclosed herein for training a model to accurately determine whether two phrases are conversationally connected. A media guidance application may detect a first phrase and a second phrase, translate each phrase to a string of word types, append each string to the back of a prior string to create a combined string, determine a degree to which any of the individual strings matches any singleton template, and determine a degree to which the combined string matches any conversational template. Based on the degrees to which the individual and combination strings match the singleton and conversational templates, respectively, strengths of association are correspondingly updated.
    Type: Grant
    Filed: February 29, 2016
    Date of Patent: November 20, 2018
    Assignee: Rovi Guides, Inc.
    Inventors: Sashikumar Venkataraman, Ahmed Nizam Mohaideen P, Manik Malhotra
  • Patent number: 10089982
    Abstract: Methods, systems, and apparatus for determining that a software application installed on a user device is compatible with a new voice action, wherein the new voice action is specified by an application developer of the software application. One or more trigger terms for triggering the software application to perform the new voice action are identified. An automatic speech recognizer is biased to prefer the identified trigger terms of the new voice action over trigger terms of other voice actions. A transcription of an utterance generated by the biased automatic speech recognizer is obtained. The transcription of the utterance generated by the biased automatic speech recognizer is determined to include a particular trigger term included in the identified trigger terms. Based at least on determining that the transcription of the utterance generated by the biased automatic speech recognizer includes the particular trigger term, execution of the new voice action is triggered.
    Type: Grant
    Filed: June 8, 2017
    Date of Patent: October 2, 2018
    Assignee: GOOGLE LLC
    Inventors: Bo Wang, Sunil Vemuri, Barnaby John James, Pravir Kumar Gupta, Scott B. Huffman
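    A toy sketch (not from the patent) of the biasing idea above, recast as n-best rescoring: hypotheses containing the new voice action's trigger terms receive a bonus so they win against acoustically similar alternatives. In the patent the biasing happens inside the recognizer itself; the post-hoc rescoring, boost value, and example hypotheses are illustrative assumptions.
    ```python
    def bias_hypotheses(nbest, trigger_terms, boost=2.0):
        """nbest: list of (transcript, score). Add a bonus per trigger term
        present and return the highest-scoring hypothesis."""
        rescored = []
        for text, score in nbest:
            words = set(text.lower().split())
            bonus = boost * sum(term in words for term in trigger_terms)
            rescored.append((text, score + bonus))
        return max(rescored, key=lambda pair: pair[1])

    nbest = [("play my playlist", 10.1), ("pay my play list", 10.3)]
    best = bias_hypotheses(nbest, trigger_terms={"play", "playlist"})
    print(best)                                     # ('play my playlist', 14.1)
    if {"play", "playlist"} <= set(best[0].split()):
        print("trigger terms present -> trigger execution of the new voice action")
    ```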
  • Patent number: 10049105
    Abstract: [Object] An object is to provide an apparatus for attaining highly precise word alignment. [Solution] The apparatus includes: selecting means receiving a bilingual sentence pair and a word alignment for the bilingual sentence pair, for successively selecting words f_j of a sentence in a first language in a prescribed order; and a recurrent neural network (RNN) computing, for all words of the sentence in the first language, a score representing a probability that the word pair consisting of the word f_j and the word e_{a_j} aligned with f_j by a word alignment a_j in the second language of the bilingual sentence pair is a correct word pair, and, based on this score, computing a score of the word alignment a_j. When computing the score of the word pair (f_j, e_{a_j}), the RNN computes it based on all word alignments a_1^{j-1} selected by the selecting means prior to the word f_j, by means of a recurrent connection.
    Type: Grant
    Filed: February 12, 2015
    Date of Patent: August 14, 2018
    Assignee: National Institute of Information and Communications Technology
    Inventors: Akihiro Tamura, Taro Watanabe, Eiichiro Sumita
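    One hedged reading (not quoted from the patent) of the scoring above, writing the alignment score as a pair-by-pair decomposition in which the recurrent connection carries the alignment history forward:
    ```latex
    % Score of a whole word alignment, accumulated over source positions j;
    % each pair score is conditioned on the previously selected alignments.
    \[
      \mathrm{score}\bigl(a_1^{J} \mid f_1^{J}, e_1^{I}\bigr)
        \;=\; \prod_{j=1}^{J} \mathrm{score}\bigl(f_j,\, e_{a_j} \;\big|\; a_1^{j-1}\bigr)
    \]
    ```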
  • Patent number: 10013485
    Abstract: Methods and arrangements for effecting a cloud representation of audio content. An audio cloud is created and rendered, and user interaction with at least a clip portion of the audio cloud is afforded.
    Type: Grant
    Filed: August 31, 2012
    Date of Patent: July 3, 2018
    Assignee: International Business Machines Corporation
    Inventors: Jitendra Ajmera, Om Dadaji Deshmukh, Anupam Jain, Amit Anil Nanavati, Nitendra Rajput
  • Patent number: 10007724
    Abstract: Methods and arrangements for effecting a cloud representation of audio content. An audio cloud is created and rendered, and user interaction with at least a clip portion of the audio cloud is afforded.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: June 26, 2018
    Assignee: International Business Machines Corporation
    Inventors: Jitendra Ajmera, Om Dadaji Deshmukh, Anupam Jain, Amit Anil Nanavati, Nitendra Rajput
  • Patent number: 9984700
    Abstract: A method of morphing speech from an original speaker into the speech of a second, target speaker without decomposing either speech into source and filter, and without the need to determine the formant positions by warping spectral envelopes.
    Type: Grant
    Filed: November 9, 2012
    Date of Patent: May 29, 2018
    Assignee: SPEECH MORPHING SYSTEMS, INC.
    Inventor: Jordan Cohen
  • Patent number: 9978364
    Abstract: A reading accuracy-improving system includes: a reading conversion unit for retrieving a plurality of candidate word strings from speech recognition results to determine the reading of each candidate word string; a reading score calculating unit for determining the speech recognition score for each of one or more candidate word strings with the same reading to determine a reading score; and a candidate word string selection unit for selecting a candidate to output from the plurality of candidate word strings on the basis of the reading score and speech recognition score corresponding to each candidate word string.
    Type: Grant
    Filed: March 28, 2016
    Date of Patent: May 22, 2018
    Assignee: International Business Machines Corporation
    Inventors: Gakuto Kurata, Masafumi Nishimura, Ryuki Tachibana
  • Patent number: 9972306
    Abstract: A system and method are presented for acoustic data selection of a particular quality for training the parameters of an acoustic model, such as a Hidden Markov Model and Gaussian Mixture Model, for example, in automatic speech recognition systems in the speech analytics field. A raw acoustic model may be trained using a given speech corpus and maximum likelihood criteria. A series of operations are performed, such as a forced Viterbi-alignment, calculations of likelihood scores, and phoneme recognition, for example, to form a subset corpus of training data. During the process, audio files of a quality that does not meet a criterion, such as poor quality audio files, may be automatically rejected from the corpus. The subset may then be used to train a new acoustic model.
    Type: Grant
    Filed: August 5, 2013
    Date of Patent: May 15, 2018
    Assignee: Interactive Intelligence Group, Inc.
    Inventors: Vivek Tyagi, Aravind Ganapathiraju, Felix Immanuel Wyss
  • Patent number: 9946699
    Abstract: Methods, systems and articles of manufacture for location-based speech recognition for preparation of an electronic tax return.
    Type: Grant
    Filed: August 29, 2012
    Date of Patent: April 17, 2018
    Assignee: INTUIT INC.
    Inventors: Christopher M. Dye, Azhar M. Zuberi, Richard E. McVickar
  • Patent number: 9940324
    Abstract: In an approach for evaluating performance of machine translation, a processor receives a first document in a source language. A processor translates the first document in the source language to a second document in a target language, based, at least in part, on a first quantity of information. A processor evaluates the second document in the target language, based, at least, on one or more aspects of the translation. A processor determines, based, at least in part, on the evaluation, the second document in the target language meets a predetermined threshold.
    Type: Grant
    Filed: August 13, 2015
    Date of Patent: April 10, 2018
    Assignee: International Business Machines Corporation
    Inventors: Mohamed A. Bahgat, Ossama Emam, Ayman S Hanafy, Sara A. Noeman
  • Patent number: 9940933
    Abstract: A speech recognition method includes receiving a sentence generated through speech recognition, calculating a degree of suitability for each word in the sentence based on a relationship of each word with other words in the sentence, detecting a target word to be corrected among the words in the sentence based on the degree of suitability for each word, and replacing the target word with any one of candidate words corresponding to the target word.
    Type: Grant
    Filed: September 30, 2015
    Date of Patent: April 10, 2018
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Heeyoul Choi, Hoshik Lee
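    A toy sketch (not from the patent) of the correction loop above: each word is scored by how well it fits its neighbors, the least suitable word is chosen as the target, and the candidate that most improves the sentence replaces it. The bigram-count "relationship" model and the tiny count table are illustrative assumptions.
    ```python
    BIGRAM_COUNTS = {("turn", "on"): 50, ("on", "the"): 80, ("the", "lights"): 60}

    def suitability(sentence):
        """Per-word score: how strongly the word co-occurs with its neighbors."""
        words = sentence.split()
        scores = {w: 0 for w in words}
        for a, b in zip(words, words[1:]):
            c = BIGRAM_COUNTS.get((a, b), 0)
            scores[a] += c
            scores[b] += c
        return scores

    def correct(sentence, candidates):
        scores = suitability(sentence)
        target = min(scores, key=scores.get)          # least suitable word
        best = max(candidates.get(target, [target]),
                   key=lambda c: sum(suitability(sentence.replace(target, c)).values()))
        return sentence.replace(target, best)

    print(correct("turn on the lice", {"lice": ["lights", "ice"]}))  # turn on the lights
    ```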
  • Patent number: 9928531
    Abstract: Systems, methods, devices, and non-transitory processor readable media of the various embodiments enable in store voice picking systems. In various embodiments, an end-to-end voice ordering and fulfillment system for a retail store may enable a customer to place an order over the phone, a personal shopper to be directed to fill the order via voice commands, and the order to be made available for pickup at the retail store by the customer or delivered to the customer from the retail store.
    Type: Grant
    Filed: February 20, 2015
    Date of Patent: March 27, 2018
    Assignee: Intelligrated Headquarters LLC
    Inventor: Michael Donovan McCarthy
  • Patent number: 9899021
    Abstract: Features are disclosed for modeling user interaction with a detection system using a stochastic dynamical model in order to determine or adjust detection thresholds. The model may incorporate numerous features, such as the probability of false rejection and false acceptance of a user utterance and the cost associated with each potential action. The model may determine or adjust detection thresholds so as to minimize the occurrence of false acceptances and false rejections while preserving other desirable characteristics. The model may further incorporate background and speaker statistics. Adjustments to the model or other operation parameters can be implemented based on the model, user statistics, and/or additional data.
    Type: Grant
    Filed: December 20, 2013
    Date of Patent: February 20, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister, Rohit Prasad
  • Patent number: 9898536
    Abstract: Methods and systems to perform textual queries on voice communications. The system has an index service for storing audio content data sets for voice communications. The audio content data sets include at least three audio content data sets for each voice communication. The three audio content data sets include a first audio content data set generated using a speech-to-text conversion technique, a second audio content data set generated using a phoneme lattice technique, and a third audio content data set generated using a keyword identification technique. The system includes a search engine configured to: receive search criteria from a user, the search criteria having at least one keyword; search each of the first, second and third audio content data sets for at least a portion of the plurality of voice communications to identify voice communications matching the search criteria; and combine the voice communications identified by each search to produce a combined list of identified voice communications.
    Type: Grant
    Filed: June 27, 2013
    Date of Patent: February 20, 2018
    Assignees: JAJAH LTD., Telefonica, S.A.
    Inventors: Diego Urdiales Delgado, John Eugene Neystadt
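    A minimal sketch (not from the patent) of the combination step above: the keyword is looked up in each of the three differently generated indexes and the hits are merged into one combined list, tagged with which data sets matched. Representing each index as a dict from term to a set of call IDs is an illustrative assumption.
    ```python
    def search_voice_communications(keyword, stt_index, phoneme_index, keyword_index):
        """Union the hits from the speech-to-text, phoneme-lattice, and keyword
        identification data sets, remembering which source(s) matched each call."""
        sources = {"stt": stt_index, "phoneme": phoneme_index, "keyword": keyword_index}
        combined = {}
        for name, index in sources.items():
            for call_id in index.get(keyword, set()):
                combined.setdefault(call_id, []).append(name)
        return combined

    stt     = {"refund": {"call-17", "call-42"}}
    phoneme = {"refund": {"call-42", "call-88"}}
    spotted = {"refund": {"call-88"}}
    print(search_voice_communications("refund", stt, phoneme, spotted))
    # e.g. {'call-17': ['stt'], 'call-42': ['stt', 'phoneme'], 'call-88': ['phoneme', 'keyword']}
    ```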
  • Patent number: 9836459
    Abstract: Various embodiments described herein facilitate multi-lingual communications. The systems and methods of some embodiments may enable multi-lingual communications through different modes of communications including, for example, Internet-based chat, e-mail, text-based mobile phone communications, postings to online forums, postings to online social media services, and the like. Certain embodiments may implement communications systems and methods that translate text between two or more languages (e.g., spoken), while handling/accommodating for one or more of the following in the text: specialized/domain-related jargon, abbreviations, acronyms, proper nouns, common nouns, diminutives, colloquial words or phrases, and profane words or phrases.
    Type: Grant
    Filed: February 15, 2017
    Date of Patent: December 5, 2017
    Assignee: Machine Zone, Inc.
    Inventors: Gabriel Leydon, Francois Orsini, Nikhil Bojja, Shailen Karur
  • Patent number: 9830911
    Abstract: Apparatuses and methods related to an electronic apparatus and a voice processing method thereof are provided. More particularly, the apparatuses and methods relate to an electronic apparatus capable of recognizing a user's voice and a voice processing method thereof. An electronic apparatus includes: a voice recognizer configured to recognize a user's voice; a storage configured to have previously stored instructions; a function executor which performs a predetermined function; and a controller configured to control the function executor to execute the function corresponding to an instruction when a user's voice corresponding to that instruction is input, and to control the function executor to execute the function in accordance with the results of an external server analyzing the user's voice when a preset dialog selection signal and a dialog voice for executing the function are input by the user.
    Type: Grant
    Filed: November 6, 2013
    Date of Patent: November 28, 2017
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Joo-yeong Lee, Sang-shin Park
  • Patent number: 9812123
    Abstract: Implementations relate to techniques for providing context-dependent search results. A computer-implemented method includes receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio, separating the audio stream into a first substream that includes the user speech data and a second substream that includes the background audio, identifying concepts related to the background audio, generating a set of terms related to the identified concepts, influencing a speech recognizer based on at least one of the terms related to the background audio, and obtaining a recognized version of the user speech data using the speech recognizer.
    Type: Grant
    Filed: August 13, 2015
    Date of Patent: November 7, 2017
    Assignee: Google Inc.
    Inventors: Jason Sanders, Gabriel Taubman, John J. Lee
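    A high-level sketch (not from the patent) of the plumbing above: split the stream into a speech substream and a background substream, tag concepts in the background, derive related terms, and pass them to the recognizer as a bias. All three components are stand-in callables; the example concept and terms are illustrative assumptions.
    ```python
    def context_dependent_recognize(audio_stream, separate, tag_concepts, recognize):
        """separate() -> (speech_substream, background_substream);
        tag_concepts() -> concepts identified in the background audio;
        recognize(speech, bias_terms) -> transcript influenced by those terms."""
        speech, background = separate(audio_stream)
        concepts = tag_concepts(background)
        bias_terms = [term for concept in concepts for term in concept["related_terms"]]
        return recognize(speech, bias_terms)

    # Stand-in components so the sketch runs end to end.
    separate = lambda stream: (stream["speech"], stream["background"])
    tag_concepts = lambda bg: [{"name": "baseball broadcast",
                                "related_terms": ["score", "inning", "pitcher"]}]
    recognize = lambda speech, terms: f"recognized {speech!r} (biased by {terms})"

    stream = {"speech": "what's the score", "background": "<crowd and commentary>"}
    print(context_dependent_recognize(stream, separate, tag_concepts, recognize))
    ```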
  • Patent number: 9779088
    Abstract: An interactive electronic translation and communications process is provided for use in a translation station that provides a mobile or stationary fixed interactive facility for interviews or interrogations to be carried out between two persons speaking in different languages. The process can be assisted by animated virtual characters (avatars) realistically created and displayed on a computer screen to represent ethnic looks from around the globe. The avatars can be lip synchronized to deliver messages to the interviewee in the interviewee's languages and can guide the users and interviewee through a series of questions and answers. Biometric conditions of the interviewee and electronic identification of the interviewee can also be readily accomplished by the novel process. The process is particularly useful for hospitals, law enforcement, military, airport security, transportation terminals, financial institutions, and government agencies.
    Type: Grant
    Filed: February 12, 2016
    Date of Patent: October 3, 2017
    Inventor: David Lynton Jephcott
  • Patent number: 9762963
    Abstract: Apparatus and methods conforming to the present invention comprise a method of controlling playback of an audio signal through analysis of a corresponding closed caption signal in conjunction with analysis of the corresponding audio signal. Objectionable text or other specified text in the closed caption signal is identified through comparison with user-identified objectionable text. Upon identification of the objectionable text, the audio signal is analyzed to identify the audio portion corresponding to the objectionable text. Upon identification of the audio portion, the audio signal may be controlled to mute the audible objectionable text.
    Type: Grant
    Filed: June 12, 2015
    Date of Patent: September 12, 2017
    Assignee: ClearPlay, Inc.
    Inventors: Matthew T. Jarman, William S. Meisel
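    A simplified sketch (not from the patent) of the muting step above: caption cues whose text contains a user-identified objectionable word have the corresponding span of the audio signal zeroed. Assuming the caption cues already carry start/end times, and muting by zeroing samples, are illustrative simplifications of the audio-alignment analysis the patent describes.
    ```python
    import numpy as np

    def mute_objectionable(audio, sample_rate, caption_cues, objectionable):
        """caption_cues: list of (start_sec, end_sec, text). Zero out the audio
        samples for any cue containing an objectionable word."""
        cleaned = audio.copy()
        blocked = {w.lower() for w in objectionable}
        for start, end, text in caption_cues:
            if blocked & set(text.lower().split()):
                lo, hi = int(start * sample_rate), int(end * sample_rate)
                cleaned[lo:hi] = 0
        return cleaned

    sr = 16000
    audio = np.ones(sr * 5, dtype=np.float32)                # 5 s of dummy audio
    cues = [(0.0, 1.5, "what a lovely day"), (2.0, 2.6, "darn it")]
    out = mute_objectionable(audio, sr, cues, objectionable=["darn"])
    print(out[int(2.2 * sr)], out[int(1.0 * sr)])            # 0.0 1.0
    ```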
  • Patent number: 9747897
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, including selecting terms; obtaining an expected phonetic transcription of an idealized native speaker of a natural language speaking the terms; receiving audio data corresponding to a particular user speaking the terms in the natural language; obtaining, based on the audio data, an actual phonetic transcription of the particular user speaking the terms in the natural language; aligning the expected phonetic transcription of the idealized native speaker of the natural language with the actual phonetic transcription of the particular user; identifying, based on the aligning, a portion of the expected phonetic transcription that is different than a corresponding portion of the actual phonetic transcription; and based on identifying the portion of the expected phonetic transcription, designating the expected phonetic transcription as a substitute pronunciation for the corresponding portion of the actual phonetic transcription.
    Type: Grant
    Filed: December 17, 2013
    Date of Patent: August 29, 2017
    Assignee: Google Inc.
    Inventors: Fuchun Peng, Francoise Beaufays, Pedro J. Moreno Mengibar, Brian Patrick Strope
  • Patent number: 9741346
    Abstract: A method for estimating the reliability of a result of a speaker recognition system concerning a testing audio and a speaker model, which is based on one, two, three or more model audios, the method using a Bayesian Network to estimate whether the result is reliable. In estimating the reliability of the result of the speaker recognition system, one, two, three, four or more than four quality measures of the testing audio and one, two, three, four or more than four quality measures of the model audio(s) are used.
    Type: Grant
    Filed: April 23, 2014
    Date of Patent: August 22, 2017
    Assignee: AGNITIO, S.L.
    Inventors: Carlos Vaquero Avilés-Casco, Luis Buera Rodriguez, Jesús Antonio Villalba López
  • Patent number: 9712666
    Abstract: The invention relates to a communication system and a method of maintaining audio communication in a congested communication channel currently bearing the transmission of speech in audio communication between a sender side and a receiver side, the communication channel having at least one signaling channel and at least one payload channel having a quality of service. During the audio communication the quality of service of the payload channel is monitored. If the quality of service of the payload channel is below a threshold the speech at the respective sender side is converted to text; and transmitted over the retained communication channel to the respective receiver side. The text may be converted back to speech at the receiver side.
    Type: Grant
    Filed: August 29, 2013
    Date of Patent: July 18, 2017
    Assignee: Unify GmbH & Co. KG
    Inventors: Bizhan Karimi-Cherkandi, Farrokh Mohammadzadeh Kouchri, Schah Walli Ali
  • Patent number: 9685174
    Abstract: A system that monitors and assesses the moods of subjects with neurological disorders, like bipolar disorder, by analyzing normal conversational speech to identify speech data that is then analyzed through an automated speech data classifier. The classifier may be based on a vector, separator, hyperplane, decision boundary, or other set of rules to classify one or more mood states of a subject. The system classifier is used to assess current mood state, predicted instability, and/or a change in future mood state, in particular for subjects with bipolar disorder.
    Type: Grant
    Filed: May 1, 2015
    Date of Patent: June 20, 2017
    Assignee: THE REGENTS OF THE UNIVERSITY OF MICHIGAN
    Inventors: Zahi N. Karam, Satinder Singh Baveja, Melvin Mcinnis, Emily Mower Provost