Subportions Patents (Class 704/254)
-
Patent number: 10528318
Abstract: For a mobile computing device, enhanced functionality may be provided by associating actions with combined speech and touch gestures. A touch gesture is received into the device and simultaneously or near-simultaneously speech is received into the device. The touch gesture and speech are processed to determine a result, and the device performs an action based on the result. In particular embodiments, commands for a mapping application may be based on spoken search terms and geographic areas marked by touch gestures.
Type: Grant
Filed: November 2, 2016
Date of Patent: January 7, 2020
Assignee: OPEN INNOVATION NETWORK LLC
Inventor: David Gerard Ledet
-
Patent number: 10522135
Abstract: A system and method for segmenting an audio file. The method includes analyzing an audio file, wherein the analyzing includes identifying speech recognition features within the audio file; generating metadata based on the audio file, wherein the metadata includes transcription characteristics of the audio file; and determining a segmenting interval for the audio file based on the speech recognition features and the metadata.
Type: Grant
Filed: December 31, 2017
Date of Patent: December 31, 2019
Assignee: Verbit Software Ltd.
Inventors: Tom Livne, Kobi Ben Tzvi, Eric Shellef
-
Patent number: 10515307
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long short-term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of a plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.
Type: Grant
Filed: June 3, 2016
Date of Patent: December 24, 2019
Assignee: Google LLC
Inventors: Tara N. Sainath, Vikas Sindhwani
-
Patent number: 10510350
Abstract: One embodiment provides a method, including receiving, at an audio capture device, a customized activation cue; identifying, using a processor, contextual information associated with a user; analyzing, using the contextual information, characteristics of the customized activation cue; identifying, based on the analysis, a uniqueness associated with the customized activation cue; and responsive to said identifying, notifying the user that the customized activation cue has inadequate uniqueness. Other aspects are described and claimed.
Type: Grant
Filed: March 30, 2016
Date of Patent: December 17, 2019
Assignee: Lenovo (Singapore) Pte. Ltd.
Inventors: Aaron Michael Stewart, Rod D. Waltermann, Russell Speight VanBlon
-
Patent number: 10445360
Abstract: Methods and apparatus for improving speech recognition accuracy in media content searches are described. An advertisement for a media content item is analyzed to identify keywords that may describe the media content item. The identified keywords are associated with the media content item for use during a voice search to locate the media content item. A user may speak one or more of the keywords as a search input and be provided with the media content item as a result of the search.
Type: Grant
Filed: November 24, 2015
Date of Patent: October 15, 2019
Assignee: Comcast Cable Communications, LLC
Inventor: George Thomas Des Jardins
-
Patent number: 10431205
Abstract: A dialog device comprises a natural language interfacing device (a chat interface or a telephonic device), and a natural language output device (the chat interface, a display device, or a speech synthesizer outputting to the telephonic device). A computer stores natural language dialog conducted via the interfacing device and constructs a current utterance word-by-word. Each word is chosen by applying a plurality of language models to a context comprising the concatenation of the stored dialog and the current utterance thus far. Each language model outputs a distribution over the words of a vocabulary. A recurrent neural network (RNN) is applied to the distributions to generate a mixture distribution. The next word is chosen using the mixture distribution. The output device outputs the current natural language utterance after it has been constructed by the computer.
Type: Grant
Filed: April 27, 2016
Date of Patent: October 1, 2019
Assignee: CONDUENT BUSINESS SERVICES, LLC
Inventors: Phong Le, Marc Dymetman, Jean-Michel Renders
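The mixture step described above can be sketched as follows. This is a toy illustration, not the patented system: the softmax over fixed logits stands in for the RNN's per-model mixture weights, and all names and numbers are assumptions for the example.

```python
import numpy as np

def mixture_next_word(distributions, mixture_logits, vocab):
    """Blend per-model word distributions into one mixture distribution
    and pick the next word from it (greedily, for simplicity)."""
    # Softmax the raw mixture logits (a stand-in for the RNN's output).
    w = np.exp(mixture_logits - mixture_logits.max())
    w = w / w.sum()
    mixture = sum(wi * d for wi, d in zip(w, distributions))
    return vocab[int(np.argmax(mixture))], mixture

vocab = ["yes", "no", "maybe"]
d1 = np.array([0.7, 0.2, 0.1])   # e.g. a domain-specific language model
d2 = np.array([0.1, 0.8, 0.1])   # e.g. a generic language model
word, mix = mixture_next_word([d1, d2], np.array([2.0, 0.0]), vocab)
print(word)
```

Because the weights and the component distributions each sum to one, the mixture is itself a valid distribution over the vocabulary.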
-
Patent number: 10417351
Abstract: Various embodiments described herein facilitate multi-lingual communications. The systems and methods of some embodiments may enable multi-lingual communications through different modes of communications including, for example, Internet-based chat, e-mail, text-based mobile phone communications, postings to online forums, postings to online social media services, and the like. Certain embodiments may implement communications systems and methods that translate text between two or more languages (e.g., spoken), while handling/accommodating for one or more of the following in the text: specialized/domain-related jargon, abbreviations, acronyms, proper nouns, common nouns, diminutives, colloquial words or phrases, and profane words or phrases.
Type: Grant
Filed: October 18, 2018
Date of Patent: September 17, 2019
Assignee: MZ IP Holdings, LLC
Inventors: Gabriel Leydon, Francois Orsini, Nikhil Bojja, Shailen Karur
-
Patent number: 10404806
Abstract: A method and a system are provided for segmenting a multimedia content. The method estimates a count of a plurality of multimedia segments in the multimedia content, and a duration of each of the plurality of multimedia segments in the multimedia content. The method determines a cost function associated with a multimedia segment from the plurality of multimedia segments, based on the count of the plurality of multimedia segments, and the duration of each of the plurality of multimedia segments. The method further determines an updated count of the plurality of multimedia segments, and an updated duration of each of the plurality of multimedia segments, until the cost function satisfies a pre-defined criterion. Based on the updated count of the plurality of multimedia segments, and the updated duration of each of the plurality of multimedia segments, the method segments the multimedia content into the plurality of multimedia segments.
Type: Grant
Filed: September 1, 2015
Date of Patent: September 3, 2019
Inventors: Arijit Biswas, Ankit Gandhi, Ranjeet Kumar, Om D Deshmukh
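The estimate-then-refine loop described above can be sketched in a few lines. The patent does not specify its cost function, so the gap between the current segment duration and a preferred length is used here purely as a stand-in; the function name and parameters are invented for the example.

```python
def segment_count_and_duration(total_duration, target_segment_len,
                               max_iters=50, tol=1.0):
    """Iteratively update the segment count and duration until a
    simple stand-in cost function satisfies a pre-defined criterion."""
    # Initial estimate of the count of segments.
    count = max(1, round(total_duration / target_segment_len))
    for _ in range(max_iters):
        duration = total_duration / count
        cost = abs(duration - target_segment_len)   # stand-in cost function
        if cost <= tol:                             # pre-defined criterion
            break
        # Nudge the count toward the preferred segment duration.
        count = max(1, count + (1 if duration > target_segment_len else -1))
    boundaries = [i * total_duration / count for i in range(count + 1)]
    return count, total_duration / count, boundaries

# A one-hour file with a preferred ~5-minute segment length.
count, dur, bounds = segment_count_and_duration(3600.0, 300.0)
print(count, dur)
```

For this clean case the initial estimate already satisfies the criterion (12 segments of 300 s); in general the loop walks the count up or down until the cost drops under the tolerance or the iteration budget runs out.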
-
Patent number: 10403289
Abstract: A voice processing device includes: a processor; and a memory which stores a plurality of instructions, which when executed by the processor, cause the processor to execute: detecting a first utterance section included in a first voice and a second utterance section included in a second voice; specifying an overlapping section within which the first utterance section and the second utterance section overlap with each other; calculating a first utterance continuation section from a start point of the overlapping section to an end point of the first utterance section; and evaluating an impression regarding the first voice at least on the basis of information relating to a length of the first utterance continuation section.
Type: Grant
Filed: November 25, 2015
Date of Patent: September 3, 2019
Assignee: FUJITSU LIMITED
Inventors: Taro Togawa, Chisato Shioda, Sayuri Kohmura, Takeshi Otani
-
Patent number: 10394852
Abstract: A technique is provided for matching different user representations of a person across a plurality of computer systems. The technique includes collecting information sets about user representations from a plurality of computer systems; normalizing the information sets to a unified format; grouping the information sets in the unified format into indexing buckets based on a user name using a non-phonetic algorithm; determining a similarity score for each pair of information sets in each of the indexing buckets; classifying each information set pair into a set of classes based on the similarity scores, wherein the set of classes comprises at least matches and non-matches; and using a data structure for merging information of information set pairs classified as matches.
Type: Grant
Filed: March 11, 2016
Date of Patent: August 27, 2019
Assignee: International Business Machines Corporation
Inventors: Lars Bremer, Thomas A. P. Hampp-Bahnmueller, Markus Lorch, Pavlo Petrenko, Sebastian B. Schmid
-
Patent number: 10381004
Abstract: A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees a high recognition rate among user commands defined by a user.
Type: Grant
Filed: October 17, 2017
Date of Patent: August 13, 2019
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Nam-yeong Kwon, Kyung-mi Park
-
Patent number: 10354643
Abstract: An electronic device is provided including at least one microphone, a communication circuit, a processor, and a memory. The memory stores at least one application program or software program that executes a voice instruction triggered in response to a voice input. The memory stores instructions to allow the processor to sequentially receive a plurality of utterances including a first speech element from a first user through the at least one microphone, generate a voice recognition model of the first user on the basis of at least some of the plurality of utterances, store the generated voice recognition model in the memory, and transmit the generated voice recognition model of the first user to the outside through the communication circuit so that a first external device uses the generated voice recognition model of the first user.
Type: Grant
Filed: October 13, 2016
Date of Patent: July 16, 2019
Assignee: Samsung Electronics Co., Ltd.
Inventors: Subhojit Chakladar, Junhui Kim
-
Patent number: 10347251
Abstract: A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees a high recognition rate among user commands defined by a user.
Type: Grant
Filed: October 17, 2017
Date of Patent: July 9, 2019
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Nam-yeong Kwon, Kyung-mi Park
-
Patent number: 10331967
Abstract: Methods of facilitating machine learning via a 2-D symbol are disclosed. Features of an object are received in a first computing system having a 2-D symbol creation application module installed thereon. A multi-layer 2-D symbol is formed from the features according to a set of symbol creation rules. The 2-D symbol is a matrix of N×N pixels partitioned into a number of sub-matrices, with each sub-matrix containing one feature, where N is a positive integer. The meaning of the combined features in the 2-D symbol is learned in a second computing system by using an image processing technique to classify the 2-D symbol transmitted from the first computing system. The symbol creation rules determine the importance order, size, and location of sub-matrices in the 2-D symbol.
Type: Grant
Filed: December 5, 2018
Date of Patent: June 25, 2019
Assignee: Gyrfalcon Technology Inc.
Inventors: Lin Yang, Baohua Sun
-
Patent number: 10311876
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.
Type: Grant
Filed: February 14, 2017
Date of Patent: June 4, 2019
Assignee: Google LLC
Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar
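The two-threshold flow above can be sketched as a small decision function. This is a schematic illustration only, with invented names and threshold values: a permissive on-device check gates a stricter server-side check, so most non-hotword audio never leaves the device.

```python
def hotword_decision(device_score, server_score,
                     device_threshold=0.4, server_threshold=0.8):
    """Two-stage hotword check: the first (looser) threshold runs on the
    device; only audio that passes it is sent to the server, which applies
    the second, more restrictive threshold."""
    if device_score < device_threshold:
        return "rejected on device"   # audio is never sent to the server
    if server_score < server_threshold:
        return "rejected by server"
    return "accepted"                 # server would also return tagged text

print(hotword_decision(0.2, 0.0))    # background noise: filtered locally
print(hotword_decision(0.5, 0.6))    # borderline audio: server disagrees
print(hotword_decision(0.9, 0.95))   # clear hotword
```

The asymmetry is deliberate: the cheap on-device model can afford false positives (the server catches them), while the server threshold being "more restrictive" keeps the overall false-accept rate low.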
-
Patent number: 10303731
Abstract: In one embodiment, a method includes receiving, from a client system of a user, a search query including n-grams. The method includes associating each n-gram with verticals based on an analysis of the n-grams by language models. The method includes determining, for each n-gram, if a bloom filter for a vertical associated with the n-gram indicates, based on sub-bloom filters of the bloom filter, the n-gram does exist or does not exist in a set of object names associated with the vertical. Each sub-bloom filter is associated with a subset of the set of object names and indicates the n-gram does exist or does not exist in its subset of object names. The method includes sending, to the client system, an indication that an n-gram of the n-grams is misspelled if a bloom filter indicates the n-gram does not exist in the set of object names associated with the vertical.
Type: Grant
Filed: May 1, 2017
Date of Patent: May 28, 2019
Assignee: Facebook, Inc.
Inventors: Ian Douglas Hegerty, Daniel Bernhardt, Feng Liang, Agnieszka Anna Podsiadlo
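The sub-bloom-filter arrangement above can be sketched with a minimal bloom filter. This is an illustrative toy (class names, sizes, and hash scheme are all invented for the example), not Facebook's implementation; it shows the key property that an n-gram absent from every sub-filter can safely be flagged as a likely misspelling, since bloom filters have no false negatives.

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

class VerticalFilter:
    """Bloom filter for one vertical, composed of sub-bloom filters,
    each covering a subset of the vertical's object names."""
    def __init__(self, name_subsets):
        self.subs = []
        for subset in name_subsets:
            bf = BloomFilter()
            for name in subset:
                bf.add(name)
            self.subs.append(bf)

    def might_contain(self, ngram):
        # The n-gram "exists" if any sub-filter's subset may contain it.
        return any(sub.might_contain(ngram) for sub in self.subs)

pages = VerticalFilter([{"acme corp", "acme labs"}, {"globex"}])
print(pages.might_contain("globex"))   # True: no false negatives
print(pages.might_contain("globx"))    # almost certainly False: flag as misspelled
```

A `True` answer is only probabilistic (false positives are possible), but a `False` answer is definitive, which is exactly what the misspelling indication relies on.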
-
Patent number: 10289912
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying videos using neural networks. One of the methods includes obtaining a temporal sequence of video frames, wherein the temporal sequence comprises a respective video frame from a particular video at each of a plurality of time steps; for each time step of the plurality of time steps: processing the video frame at the time step using a convolutional neural network to generate features of the video frame; and processing the features of the video frame using an LSTM neural network to generate a set of label scores for the time step; and classifying the video as relating to one or more of the topics represented by labels in the set of labels, based on the label scores for each of the plurality of time steps.
Type: Grant
Filed: April 29, 2016
Date of Patent: May 14, 2019
Assignee: Google LLC
Inventors: Sudheendra Vijayanarasimhan, George Dan Toderici, Yue Hei Ng, Matthew John Hausknecht, Oriol Vinyals, Rajat Monga
-
Patent number: 10282084
Abstract: Methods and apparatus are provided for executing a function of a mobile terminal by recognizing a writing gesture. The writing gesture that is inputted on a touchscreen of the mobile terminal is detected. At least one target item to which the writing gesture applies is determined. A preset writing gesture of the at least one target item is compared with the detected writing gesture to determine whether the preset writing gesture is at least similar to the detected writing gesture. An execution command corresponding to the preset writing gesture is extracted, when it is determined that the detected writing gesture is at least similar to the preset writing gesture. The function of the at least one target item is executed by the execution command.
Type: Grant
Filed: August 20, 2013
Date of Patent: May 7, 2019
Assignee: Samsung Electronics Co., Ltd
Inventor: Musik Kwon
-
Patent number: 10249318
Abstract: A speech-signal-processing-circuit configured to receive a time-frequency-domain-reference-speech-signal and a time-frequency-domain-degraded-speech-signal. The time-frequency-domain-reference-speech-signal comprises: an upper-band-reference-component with frequencies that are greater than a frequency-threshold-value; and a lower-band-reference-component with frequencies that are less than the frequency-threshold-value. The time-frequency-domain-degraded-speech-signal comprises: an upper-band-degraded-component with frequencies that are greater than the frequency-threshold-value; and a lower-band-degraded-component with frequencies that are less than the frequency-threshold-value.
Type: Grant
Filed: March 20, 2017
Date of Patent: April 2, 2019
Assignee: NXP B.V.
Inventors: Magdalena Kaniewska, Wouter Joos Tirry, Cyril Guillaumé, Johannes Abel, Tim Fingscheidt
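The band split described above is a simple masking operation on time-frequency bins. The sketch below is illustrative only (bin frequencies and the treatment of the bin exactly at the threshold are assumptions, since the abstract only defines the strictly-above and strictly-below components):

```python
import numpy as np

def split_bands(frame, freqs, f_threshold):
    """Split one time-frequency frame into its lower-band and upper-band
    components at a frequency threshold value.
    Bins exactly at the threshold are assigned to the upper band here."""
    lower = frame[freqs < f_threshold]
    upper = frame[freqs >= f_threshold]
    return lower, upper

freqs = np.linspace(0.0, 8000.0, 9)        # bin centre frequencies in Hz
frame = np.arange(9, dtype=float)          # magnitudes of one STFT frame
low, high = split_bands(frame, freqs, 4000.0)
print(low.size, high.size)
```

The same split is applied to both the reference and the degraded signal, so that band-specific quality measures can then compare the corresponding components.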
-
Patent number: 10241684
Abstract: A method and apparatus are provided. The method includes configuring a plurality of long short term memory (LSTM) networks, wherein each of the plurality of LSTM networks is at a different network layer, configuring a plurality of memory cells in a spatial domain of the plurality of LSTM networks, configuring the plurality of memory cells in a temporal domain of the plurality of LSTM networks, controlling an output of each of the plurality of LSTM networks based on highway connections to outputs from at least one previous layer and at least one previous time of the plurality of LSTM networks, and controlling the plurality of memory cells based on highway connections to memory cells from the at least one previous time.
Type: Grant
Filed: April 5, 2017
Date of Patent: March 26, 2019
Assignee: Samsung Electronics Co., Ltd
Inventors: Jaeyoung Kim, Inyup Kang, Mostafa El-Khamy, Jungwon Lee
-
Patent number: 10235353
Abstract: Embodiments for translating an input message into a device specific command for a network interface device, by: receiving the input message as a generalized language message at an input interface; separating the input message into its language parts to identify keyword elements; identifying keyword actions, targets, and variables used to indicate corresponding device specific commands; classifying the keyword elements against a learned language map to identify a best match action; utilizing the best match action to access a playlist data set for the device specific commands of the target device for execution; and providing a feedback path to a learning mechanism for adding new message and language semantics into the learned language map when identification of the best match action is unclear.
Type: Grant
Filed: September 15, 2017
Date of Patent: March 19, 2019
Assignee: Dell Products LP
Inventors: Mark S Sanders, Gavin R Cato
-
Patent number: 10235364
Abstract: An interpretation distributing device includes: an interpreted voice acquiring unit that acquires at least one piece of interpreted voice data of two or more pieces of interpreted voice data, which are voice data obtained by interpreting voice in a first language into voice in two or more different languages; and an interpreted voice transmitting unit that transmits at least one piece of the interpreted voice data acquired by the interpreted voice acquiring unit to one or more terminal devices.
Type: Grant
Filed: April 11, 2016
Date of Patent: March 19, 2019
Assignee: SHIN TRADING CO., LTD.
Inventor: Jungbum Shin
-
Patent number: 10192556
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; and processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
Type: Grant
Filed: November 13, 2017
Date of Patent: January 29, 2019
Assignee: Google LLC
Inventors: Hasim Sak, Andrew W. Senior
-
Patent number: 10186268
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, for each of multiple words or sub-words, audio data corresponding to multiple users speaking the word or sub-word; training, for each of the multiple words or sub-words, a pre-computed hotword model for the word or sub-word based on the audio data for the word or sub-word; receiving a candidate hotword from a computing device; identifying one or more pre-computed hotword models that correspond to the candidate hotword; and providing the identified, pre-computed hotword models to the computing device.
Type: Grant
Filed: January 19, 2018
Date of Patent: January 22, 2019
Assignee: Google LLC
Inventor: Matthew Sharifi
-
Patent number: 10157610
Abstract: A system and method are presented for acoustic data selection of a particular quality for training the parameters of an acoustic model, such as a Hidden Markov Model and Gaussian Mixture Model, for example, in automatic speech recognition systems in the speech analytics field. A raw acoustic model may be trained using a given speech corpus and maximum likelihood criteria. A series of operations are performed, such as a forced Viterbi-alignment, calculations of likelihood scores, and phoneme recognition, for example, to form a subset corpus of training data. During the process, audio files of a quality that does not meet a criterion, such as poor quality audio files, may be automatically rejected from the corpus. The subset may then be used to train a new acoustic model.
Type: Grant
Filed: December 21, 2017
Date of Patent: December 18, 2018
Inventors: Vivek Tyagi, Aravind Ganapathiraju, Felix Immanuel Wyss
-
Patent number: 10146773
Abstract: Various embodiments described herein facilitate multi-lingual communications. The systems and methods of some embodiments may enable multi-lingual communications through different modes of communications including, for example, Internet-based chat, e-mail, text-based mobile phone communications, postings to online forums, postings to online social media services, and the like. Certain embodiments may implement communications systems and methods that translate text between two or more languages (e.g., spoken), while handling/accommodating for one or more of the following in the text: specialized/domain-related jargon, abbreviations, acronyms, proper nouns, common nouns, diminutives, colloquial words or phrases, and profane words or phrases.
Type: Grant
Filed: November 6, 2017
Date of Patent: December 4, 2018
Assignee: MZ IP Holdings, LLC
Inventors: Gabriel Leydon, Francois Orsini, Nikhil Bojja, Shailen Karur
-
Patent number: 10134390
Abstract: An electronic device includes a memory configured to store a user pronunciation lexicon, a voice input unit configured to receive a user's uttered voice, and a processor configured to extract a user pronunciation pattern from the received uttered voice and to update the user pronunciation lexicon according to a pronunciation pattern rule generated based on the extracted pronunciation pattern.
Type: Grant
Filed: August 3, 2016
Date of Patent: November 20, 2018
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Sung-hwan Shin
-
Patent number: 10133735
Abstract: Systems and methods are disclosed herein for training a model to accurately determine whether two phrases are conversationally connected. A media guidance application may detect a first phrase and a second phrase, translate each phrase to a string of word types, append each string to the back of a prior string to create a combined string, determine a degree to which any of the individual strings matches any singleton template, and determine a degree to which the combined string matches any conversational template. Based on the degrees to which the individual and combination strings match the singleton and conversational templates, respectively, strengths of association are correspondingly updated.
Type: Grant
Filed: February 29, 2016
Date of Patent: November 20, 2018
Assignee: Rovi Guides, Inc.
Inventors: Sashikumar Venkataraman, Ahmed Nizam Mohaideen P, Manik Malhotra
-
Patent number: 10089982
Abstract: Methods, systems, and apparatus for determining that a software application installed on a user device is compatible with a new voice action, wherein the new voice action is specified by an application developer of the software application. One or more trigger terms for triggering the software application to perform the new voice action are identified. An automatic speech recognizer is biased to prefer the identified trigger terms of the new voice action over trigger terms of other voice actions. A transcription of an utterance generated by the biased automatic speech recognizer is obtained. The transcription of the utterance generated by the biased automatic speech recognizer is determined to include a particular trigger term included in the identified trigger terms. Based at least on determining that the transcription of the utterance generated by the biased automatic speech recognizer includes the particular trigger term, execution of the new voice action is triggered.
Type: Grant
Filed: June 8, 2017
Date of Patent: October 2, 2018
Assignee: GOOGLE LLC
Inventors: Bo Wang, Sunil Vemuri, Barnaby John James, Pravir Kumar Gupta, Scott B. Huffman
-
Patent number: 10049105
Abstract: [Object] An object is to provide an apparatus for attaining highly precise word alignment. [Solution] The apparatus includes: selecting means receiving a bilingual sentence pair and a word alignment for the bilingual sentence pair, for successively selecting words f_j of a sentence in a first language in a prescribed order; and a recurrent neural network (RNN) 100 computing, for all words of the sentence in the first language, a score 102 representing a probability that a word pair consisting of the word f_j and a word e_{a_j} aligned with the word f_j by a word alignment a_j in a second language of the bilingual sentence pair is a correct word pair, and based on this score, computing a score of the word alignment a_j. When computing a score of the word pair (f_j, e_{a_j}), RNN 100 computes the score 102 of the word pair (f_j, e_{a_j}) based on all word alignments a_1^(j-1) selected by the selecting means prior to the word f_j of the word pair (f_j, e_{a_j}), by means of a recurrent connection 118.
Type: Grant
Filed: February 12, 2015
Date of Patent: August 14, 2018
Assignee: National Institute of Information and Communications Technology
Inventors: Akihiro Tamura, Taro Watanabe, Eiichiro Sumita
-
Patent number: 10013485
Abstract: Methods and arrangements for effecting a cloud representation of audio content. An audio cloud is created and rendered, and user interaction with at least a clip portion of the audio cloud is afforded.
Type: Grant
Filed: August 31, 2012
Date of Patent: July 3, 2018
Assignee: International Business Machines Corporation
Inventors: Jitendra Ajmera, Om Dadaji Deshmukh, Anupam Jain, Amit Anil Nanavati, Nitendra Rajput
-
Patent number: 10007724
Abstract: Methods and arrangements for effecting a cloud representation of audio content. An audio cloud is created and rendered, and user interaction with at least a clip portion of the audio cloud is afforded.
Type: Grant
Filed: June 29, 2012
Date of Patent: June 26, 2018
Assignee: International Business Machines Corporation
Inventors: Jitendra Ajmera, Om Dadaji Deshmukh, Anupam Jain, Amit Anil Nanavati, Nitendra Rajput
-
Patent number: 9984700
Abstract: A method of morphing speech from an original speaker into the speech of a second, target speaker, without decomposing either speech into source and filter, and without the need to determine the formant positions, by warping spectral envelopes.
Type: Grant
Filed: November 9, 2012
Date of Patent: May 29, 2018
Assignee: SPEECH MORPHING SYSTEMS, INC.
Inventor: Jordan Cohen
-
Patent number: 9978364
Abstract: A reading accuracy-improving system includes: a reading conversion unit for retrieving a plurality of candidate word strings from speech recognition results to determine the reading of each candidate word string; a reading score calculating unit for determining the speech recognition score for each of one or more candidate word strings with the same reading to determine a reading score; and a candidate word string selection unit for selecting a candidate to output from the plurality of candidate word strings on the basis of the reading score and speech recognition score corresponding to each candidate word string.
Type: Grant
Filed: March 28, 2016
Date of Patent: May 22, 2018
Assignee: International Business Machines Corporation
Inventors: Gakuto Kurata, Masafumi Nishimura, Ryuki Tachibana
-
Patent number: 9972306
Abstract: A system and method are presented for acoustic data selection of a particular quality for training the parameters of an acoustic model, such as a Hidden Markov Model and Gaussian Mixture Model, for example, in automatic speech recognition systems in the speech analytics field. A raw acoustic model may be trained using a given speech corpus and maximum likelihood criteria. A series of operations are performed, such as a forced Viterbi-alignment, calculations of likelihood scores, and phoneme recognition, for example, to form a subset corpus of training data. During the process, audio files of a quality that does not meet a criterion, such as poor quality audio files, may be automatically rejected from the corpus. The subset may then be used to train a new acoustic model.
Type: Grant
Filed: August 5, 2013
Date of Patent: May 15, 2018
Assignee: Interactive Intelligence Group, Inc.
Inventors: Vivek Tyagi, Aravind Ganapathiraju, Felix Immanuel Wyss
-
Patent number: 9946699
Abstract: Methods, systems and articles of manufacture for location-based speech recognition for preparation of an electronic tax return.
Type: Grant
Filed: August 29, 2012
Date of Patent: April 17, 2018
Assignee: INTUIT INC.
Inventors: Christopher M. Dye, Azhar M. Zuberi, Richard E. McVickar
-
Patent number: 9940324
Abstract: In an approach for evaluating performance of machine translation, a processor receives a first document in a source language. A processor translates the first document in the source language to a second document in a target language, based, at least in part, on a first quantity of information. A processor evaluates the second document in the target language, based, at least, on one or more aspects of the translation. A processor determines, based, at least in part, on the evaluation, that the second document in the target language meets a predetermined threshold.
Type: Grant
Filed: August 13, 2015
Date of Patent: April 10, 2018
Assignee: International Business Machines Corporation
Inventors: Mohamed A. Bahgat, Ossama Emam, Ayman S Hanafy, Sara A. Noeman
-
Patent number: 9940933
Abstract: A speech recognition method includes receiving a sentence generated through speech recognition, calculating a degree of suitability for each word in the sentence based on a relationship of each word with other words in the sentence, detecting a target word to be corrected among the words in the sentence based on the degree of suitability for each word, and replacing the target word with any one of candidate words corresponding to the target word.
Type: Grant
Filed: September 30, 2015
Date of Patent: April 10, 2018
Assignee: Samsung Electronics Co., Ltd.
Inventors: Heeyoul Choi, Hoshik Lee
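The detect-then-replace loop above can be sketched with a toy suitability score. Here a hand-written bigram table stands in for the learned word-relationship model the abstract describes; the table, sentence, and candidate list are all invented for illustration.

```python
# Toy suitability-based correction: score each word by its fit with
# neighbouring words, pick the worst-fitting word as the target, and
# substitute the candidate that fits best.

BIGRAM = {
    ("book", "a"): 0.9, ("a", "flight"): 0.8, ("flight", "to"): 0.9,
    ("to", "paris"): 0.9, ("a", "fright"): 0.05, ("fright", "to"): 0.05,
    ("a", "freight"): 0.2, ("freight", "to"): 0.3,
}

def suitability(words, i):
    """Average bigram score of word i with its left/right neighbours."""
    scores = []
    if i > 0:
        scores.append(BIGRAM.get((words[i - 1], words[i]), 0.0))
    if i < len(words) - 1:
        scores.append(BIGRAM.get((words[i], words[i + 1]), 0.0))
    return sum(scores) / len(scores)

def correct(words, candidates):
    # Detect the least suitable word in the recognized sentence.
    target = min(range(len(words)), key=lambda i: suitability(words, i))
    # Replace it with the candidate that maximises suitability in context.
    best = max(candidates.get(words[target], [words[target]]),
               key=lambda c: suitability(words[:target] + [c] + words[target + 1:], target))
    return words[:target] + [best] + words[target + 1:]

fixed = correct(["book", "a", "fright", "to", "paris"],
                {"fright": ["flight", "freight"]})
# ['book', 'a', 'flight', 'to', 'paris']
```

In the patent's setting the suitability score would come from a trained model rather than a lookup table, but the detect/replace structure is the same.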
-
Patent number: 9928531
Abstract: Systems, methods, devices, and non-transitory processor readable media of the various embodiments enable in-store voice picking systems. In various embodiments, an end-to-end voice ordering and fulfillment system for a retail store may enable a customer to place an order over the phone, a personal shopper to be directed to fill the order via voice commands, and the order to be made available for pickup at the retail store by the customer or delivered to the customer from the retail store.
Type: Grant
Filed: February 20, 2015
Date of Patent: March 27, 2018
Assignee: Intelligrated Headquarters LLC
Inventor: Michael Donovan McCarthy
-
Patent number: 9899021
Abstract: Features are disclosed for modeling user interaction with a detection system using a stochastic dynamical model in order to determine or adjust detection thresholds. The model may incorporate numerous features, such as the probability of false rejection and false acceptance of a user utterance and the cost associated with each potential action. The model may determine or adjust detection thresholds so as to minimize the occurrence of false acceptances and false rejections while preserving other desirable characteristics. The model may further incorporate background and speaker statistics. Adjustments to the model or other operation parameters can be implemented based on the model, user statistics, and/or additional data.
Type: Grant
Filed: December 20, 2013
Date of Patent: February 20, 2018
Assignee: Amazon Technologies, Inc.
Inventors: Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister, Rohit Prasad
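The cost trade-off the abstract describes can be sketched as a simple expected-cost minimisation over candidate thresholds. This is a deliberately simplified stand-in for the stochastic dynamical model; the score values, candidate thresholds, and cost weights are invented.

```python
# Pick the detection threshold minimising the expected cost of false
# rejects (target score below threshold) and false accepts (impostor
# score at or above threshold).

def expected_cost(threshold, target_scores, impostor_scores,
                  c_fr=1.0, c_fa=10.0):
    p_fr = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return c_fr * p_fr + c_fa * p_fa

def best_threshold(target_scores, impostor_scores, candidates):
    return min(candidates,
               key=lambda t: expected_cost(t, target_scores, impostor_scores))

targets = [0.7, 0.8, 0.9, 0.95]    # scores for true user utterances
impostors = [0.1, 0.2, 0.3, 0.6]   # scores for non-user audio
th = best_threshold(targets, impostors, [0.25, 0.5, 0.65, 0.85])
# th == 0.65: it separates the two score sets, so expected cost is 0
```

Weighting false accepts more heavily (`c_fa=10.0` here) pushes the chosen threshold upward, which is the kind of cost-aware adjustment the abstract is about.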
-
Patent number: 9898536
Abstract: Methods and systems to perform textual queries on voice communications. The system has an index service for storing audio content data sets for voice communications. The audio content data sets include at least three audio content data sets for each voice communication. The three audio content data sets include a first audio content data set generated using a speech-to-text conversion technique, a second audio content data set generated using a phoneme lattice technique, and a third audio content data set generated using a keyword identification technique. The system includes a search engine configured to: receive search criteria from a user, the search criteria having at least one keyword; search each of the first, second and third audio content data sets for at least a portion of the plurality of voice communications to identify voice communications matching the search criteria; and combine the voice communications identified by each search to produce a combined list of identified voice communications.
Type: Grant
Filed: June 27, 2013
Date of Patent: February 20, 2018
Assignees: JAJAH LTD., Telefonica, S.A.
Inventors: Diego Urdiales Delgado, John Eugene Neystadt
-
Patent number: 9836459
Abstract: Various embodiments described herein facilitate multi-lingual communications. The systems and methods of some embodiments may enable multi-lingual communications through different modes of communications including, for example, Internet-based chat, e-mail, text-based mobile phone communications, postings to online forums, postings to online social media services, and the like. Certain embodiments may implement communications systems and methods that translate text between two or more languages (e.g., spoken), while handling/accommodating for one or more of the following in the text: specialized/domain-related jargon, abbreviations, acronyms, proper nouns, common nouns, diminutives, colloquial words or phrases, and profane words or phrases.
Type: Grant
Filed: February 15, 2017
Date of Patent: December 5, 2017
Assignee: Machine Zone, Inc.
Inventors: Gabriel Leydon, Francois Orsini, Nikhil Bojja, Shailen Karur
-
Patent number: 9830911
Abstract: Apparatuses and methods related to an electronic apparatus and a voice processing method thereof are provided. More particularly, the apparatuses and methods relate to an electronic apparatus capable of recognizing a user's voice and a voice processing method thereof. An electronic apparatus includes: a voice recognizer configured to recognize a user's voice; a storage configured to hold previously stored instructions; a function executor which performs a predetermined function; and a controller configured to control the function executor to execute the function corresponding to an instruction when a user's voice matching that instruction is input, and to control the function executor to execute the function in accordance with the results of an external server's analysis of the user's voice when a preset dialog selection signal and a dialog voice for executing the function are input by the user.
Type: Grant
Filed: November 6, 2013
Date of Patent: November 28, 2017
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Joo-yeong Lee, Sang-shin Park
-
Patent number: 9812123
Abstract: Implementations relate to techniques for providing context-dependent search results. A computer-implemented method includes receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio, separating the audio stream into a first substream that includes the user speech data and a second substream that includes the background audio, identifying concepts related to the background audio, generating a set of terms related to the identified concepts, influencing a speech recognizer based on at least one of the terms related to the background audio, and obtaining a recognized version of the user speech data using the speech recognizer.
Type: Grant
Filed: August 13, 2015
Date of Patent: November 7, 2017
Assignee: Google Inc.
Inventors: Jason Sanders, Gabriel Taubman, John J. Lee
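One common way to "influence" a recognizer with context terms is to rescore its N-best hypotheses. The sketch below assumes the recognizer exposes scored hypotheses; the boost scheme, values, and example data are illustrative, not the patented implementation.

```python
# Context-biased rescoring: boost each hypothesis by the number of
# context terms (derived from the background audio) it contains, then
# return the best-scoring transcript.

def rescore(nbest, context_terms, boost=0.5):
    def biased(hyp):
        words = set(hyp["text"].split())
        return hyp["score"] + boost * len(words & context_terms)
    return max(nbest, key=biased)["text"]

# Suppose the background audio was identified as a baseball broadcast,
# yielding related terms that tip the recognizer toward the right words.
nbest = [
    {"text": "who won the word series", "score": 1.0},
    {"text": "who won the world series", "score": 0.8},
]
best = rescore(nbest, {"baseball", "world", "series"})
# 'who won the world series'  (0.8 + 2*0.5 beats 1.0 + 1*0.5)
```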
-
Patent number: 9779088
Abstract: An interactive electronic translation and communications process is provided for use in a translation station that provides a mobile or stationary fixed interactive facility for interviews or interrogations to be carried out between two persons speaking in different languages. The process can be assisted by animated virtual characters (avatars) realistically created and displayed on a computer screen to represent ethnic looks from around the globe. The avatars can be lip synchronized to deliver messages to the interviewee in the interviewee's languages and can guide the users and interviewee through a series of questions and answers. Biometric conditions of the interviewee and electronic identification of the interviewee can also be readily accomplished by the novel process. The process is particularly useful for hospitals, law enforcement, military, airport security, transportation terminals, financial institutions, and government agencies.
Type: Grant
Filed: February 12, 2016
Date of Patent: October 3, 2017
Inventor: David Lynton Jephcott
-
Patent number: 9762963
Abstract: Apparatus and methods conforming to the present invention comprise a method of controlling playback of an audio signal through analysis of a corresponding closed caption signal in conjunction with analysis of the corresponding audio signal. Objectionable text or other specified text in the closed caption signal is identified through comparison with user-identified objectionable text. Upon identification of the objectionable text, the audio signal is analyzed to identify the audio portion corresponding to the objectionable text. Upon identification of the audio portion, the audio signal may be controlled to mute the audible objectionable text.
Type: Grant
Filed: June 12, 2015
Date of Patent: September 12, 2017
Assignee: ClearPlay, Inc.
Inventors: Matthew T. Jarman, William S. Meisel
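The caption-to-audio mapping step can be sketched as follows, assuming word-level caption timing is available. The padding value and caption data are invented; in the patent, the padded window is where the audio analysis would localize the exact word before muting.

```python
# Caption-driven muting: find user-specified objectionable words in the
# closed captions and return padded (start, end) intervals of the audio
# signal to mute.

def mute_intervals(captions, objectionable, pad=0.25):
    return [(max(0.0, c["start"] - pad), c["end"] + pad)
            for c in captions if c["word"].lower() in objectionable]

captions = [
    {"word": "this", "start": 0.2, "end": 0.6},
    {"word": "darn", "start": 1.0, "end": 1.5},
    {"word": "movie", "start": 1.6, "end": 2.0},
]
spans = mute_intervals(captions, {"darn"})
# [(0.75, 1.75)]
```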
-
Patent number: 9747897
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, including selecting terms; obtaining an expected phonetic transcription of an idealized native speaker of a natural language speaking the terms; receiving audio data corresponding to a particular user speaking the terms in the natural language; obtaining, based on the audio data, an actual phonetic transcription of the particular user speaking the terms in the natural language; aligning the expected phonetic transcription of the idealized native speaker of the natural language with the actual phonetic transcription of the particular user; identifying, based on the aligning, a portion of the expected phonetic transcription that is different than a corresponding portion of the actual phonetic transcription; and based on identifying the portion of the expected phonetic transcription, designating the expected phonetic transcription as a substitute pronunciation for the corresponding portion of the actual phonetic transcription.
Type: Grant
Filed: December 17, 2013
Date of Patent: August 29, 2017
Assignee: Google Inc.
Inventors: Fuchun Peng, Francoise Beaufays, Pedro J. Moreno Mengibar, Brian Patrick Strope
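The alignment-and-diff step can be illustrated with a standard sequence matcher. This is a minimal sketch using `difflib` as a stand-in for a proper phonetic aligner; the ARPAbet-like phoneme strings are illustrative, not taken from the patent.

```python
# Align expected vs. actual phoneme sequences and return the spans
# where they differ, as (expected_phones, actual_phones) pairs.
import difflib

def find_substitutions(expected, actual):
    sm = difflib.SequenceMatcher(a=expected, b=actual)
    return [(expected[i1:i2], actual[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag == "replace"]

# "tomato": idealized /t ah m ey t ow/ vs. a speaker's /t ah m aa t ow/
subs = find_substitutions(["t", "ah", "m", "ey", "t", "ow"],
                          ["t", "ah", "m", "aa", "t", "ow"])
# [(['ey'], ['aa'])]  -> /ey/ is a substitute pronunciation for /aa/
```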
-
Patent number: 9741346
Abstract: A method for estimating the reliability of a result of a speaker recognition system concerning a testing audio and a speaker model, which is based on one, two, three or more model audios, the method using a Bayesian Network to estimate whether the result is reliable. In estimating the reliability of the result of the speaker recognition system one, two, three, four or more than four quality measures of the testing audio and one, two, three, four or more than four quality measures of the model audio(s) are used.
Type: Grant
Filed: April 23, 2014
Date of Patent: August 22, 2017
Assignee: AGNITIO, S.L.
Inventors: Carlos Vaquero Avilés-Casco, Luis Buera Rodriguez, Jesús Antonio Villalba López
-
Patent number: 9712666
Abstract: The invention relates to a communication system and a method of maintaining audio communication in a congested communication channel currently bearing the transmission of speech in audio communication between a sender side and a receiver side, the communication channel having at least one signaling channel and at least one payload channel having a quality of service. During the audio communication the quality of service of the payload channel is monitored. If the quality of service of the payload channel is below a threshold, the speech at the respective sender side is converted to text and transmitted over the retained communication channel to the respective receiver side. The text may be converted back to speech at the receiver side.
Type: Grant
Filed: August 29, 2013
Date of Patent: July 18, 2017
Assignee: Unify GmbH & Co. KG
Inventors: Bizhan Karimi-Cherkandi, Farrokh Mohammadzadeh Kouchri, Schah Walli Ali
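The QoS-triggered fallback can be sketched as a small dispatch routine. The threshold value, channel callbacks, and speech-to-text converter are all hypothetical placeholders for the components the abstract names.

```python
# Fallback logic: send audio while the payload channel is healthy;
# when its quality of service drops below a threshold, transcribe the
# speech and send the text over the remaining (signaling) channel.

QOS_THRESHOLD = 0.5  # illustrative value

def transmit(payload_qos, speech_frame, speech_to_text,
             send_audio, send_text):
    if payload_qos >= QOS_THRESHOLD:
        send_audio(speech_frame)
        return "audio"
    send_text(speech_to_text(speech_frame))
    return "text"
```

For example, with a congested channel (`payload_qos=0.3`) the speech frame is transcribed and sent as text; the receiver side could then synthesize it back to speech.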
-
Patent number: 9685174
Abstract: A system that monitors and assesses the moods of subjects with neurological disorders, like bipolar disorder, by analyzing normal conversational speech to identify speech data that is then analyzed through an automated speech data classifier. The classifier may be based on a vector, separator, hyperplane, decision boundary, or other set of rules to classify one or more mood states of a subject. The system classifier is used to assess current mood state, predicted instability, and/or a change in future mood state, in particular for subjects with bipolar disorder.
Type: Grant
Filed: May 1, 2015
Date of Patent: June 20, 2017
Assignee: THE REGENTS OF THE UNIVERSITY OF MICHIGAN
Inventors: Zahi N. Karam, Satinder Singh Baveja, Melvin Mcinnis, Emily Mower Provost