Phonemic Context, E.g., Pronunciation Rules, Phonotactical Constraints, Phoneme N-grams, Etc. (epo) Patents (Class 704/E15.02)
  • Patent number: 11978455
    Abstract: The present disclosure provides various embodiments of methods for intelligent active speaker identification and information handling systems (IHSs) utilizing such methods. In general, the methods disclosed herein may be used to accurately identify an active speaker in a communication session with an application or an IHS, regardless of whether the active speaker is alone, in a group environment, or using someone else's system or login to participate in the communication session. The methods disclosed herein may use voice processing technology and one or more voice identification databases (VIDs) to identify the active speaker in a communication session. In some embodiments, the disclosed methods may display the identity of the active speaker to other users or participants in the same communication session. In other embodiments, the disclosed methods may dynamically switch between user profiles or accounts during the communication session based on the identity of the active speaker.
    Type: Grant
    Filed: March 7, 2022
    Date of Patent: May 7, 2024
    Inventors: Douglas J. Peeler, Srinivas Kamepalli
  • Patent number: 11935425
    Abstract: Pronunciation learning processing is performed, in which evaluation scores on pronunciation for respective words are acquired from a pronunciation test that uses multiple words, the acquired evaluation scores are summed for each combination of consecutive pronunciation components in the words, and learning information based on the result of the summation is output.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: March 19, 2024
    Assignee: CASIO COMPUTER CO., LTD.
    Inventor: Manato Ono
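The summation step described in the abstract above can be sketched as follows. This is a minimal illustration with assumed data and names (the toy lexicon, score values, and function name are not from the patent): per-word pronunciation scores are attributed to each consecutive phoneme pair in the word, then aggregated so that weak pairs can be reported as learning information.

```python
from collections import defaultdict

def summate_by_phoneme_pair(word_phonemes, word_scores):
    """word_phonemes: word -> list of phonemes; word_scores: word -> score.
    Returns the average score per consecutive phoneme pair across all words."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for word, phonemes in word_phonemes.items():
        score = word_scores[word]
        for pair in zip(phonemes, phonemes[1:]):
            totals[pair] += score
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}

phonemes = {"cat": ["k", "ae", "t"], "bat": ["b", "ae", "t"]}
scores = {"cat": 80.0, "bat": 60.0}
averages = summate_by_phoneme_pair(phonemes, scores)
# ("ae", "t") occurs in both words, so its average is (80 + 60) / 2 = 70
```

Pairs with low averages would then be the natural candidates for the output learning information.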
  • Patent number: 11922943
    Abstract: In general, this disclosure describes techniques for generating and evaluating automatic transcripts of audio recordings containing human speech. In some examples, a computing system is configured to: generate transcripts of a plurality of audio recordings; determine an error rate for each transcript by comparing the transcript to a reference transcript of the audio recording; receive, for each transcript, a subjective ranking selected from a plurality of subjective rank categories; determine, based on the error rates and subjective rankings, objective rank categories defined by error-rate ranges; and assign an objective ranking to a new machine-generated transcript of a new audio recording, based on the objective rank categories and an error rate of the new machine-generated transcript.
    Type: Grant
    Filed: January 26, 2021
    Date of Patent: March 5, 2024
    Assignee: Wells Fargo Bank, N.A.
    Inventors: Yong Yi Bay, Yang Angelina Yang, Menglin Cao
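The final assignment step above can be sketched in a few lines. This is a hypothetical illustration, not the patented method: the error-rate boundaries and rank labels shown are invented placeholders standing in for the ranges the system would derive from the subjective rankings.

```python
import bisect

def word_error_rate(hypothesis, reference):
    """Word error rate via edit distance over word tokens."""
    h, r = hypothesis.split(), reference.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(r)][len(h)] / max(len(r), 1)

def assign_rank(error_rate, boundaries=(0.05, 0.15, 0.30),
                labels=("excellent", "good", "fair", "poor")):
    """Map an error rate into an objective rank category defined by ranges."""
    return labels[bisect.bisect_right(boundaries, error_rate)]

wer = word_error_rate("the cat sat", "the cat sat on the mat")  # 3 deletions / 6 words = 0.5
rank = assign_rank(wer)
```

In the patented system the boundaries would be fitted so that the objective categories agree with the human subjective rank categories.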
  • Patent number: 11893983
    Abstract: An approach for improving speech recognition is provided. A processor receives a new word to add to a prefix tree. A processor determines a bonus score for a first transition from a first node to a second node in a prefix tree on condition that the first transition is included in a path of at least one transition representing the new word. A processor determines a hypothesis score for a hypothesis that corresponds to a speech sequence based on the prefix tree, where the hypothesis score adds the bonus score to an initial hypothesis score to determine the hypothesis score. In response to a determination that the hypothesis score exceeds a threshold value, a processor generates an output text sequence for the speech sequence based on the hypothesis.
    Type: Grant
    Filed: June 23, 2021
    Date of Patent: February 6, 2024
    Assignee: International Business Machines Corporation
    Inventors: Masayuki Suzuki, Gakuto Kurata
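The bonus-score idea above can be sketched with a toy trie. This is an assumed illustration (the class shape, bonus value, and scoring interface are not from the patent): transitions along the path spelling a newly added word carry a bonus, and a hypothesis score is boosted by the bonuses collected while walking the hypothesis text through the tree.

```python
class PrefixTree:
    def __init__(self):
        self.root = {}
        self.bonus = {}  # (id(node), char) -> bonus for that transition

    def add_word(self, word, bonus=1.0):
        """Add a new word; every transition on its path gets the bonus."""
        node = self.root
        for ch in word:
            self.bonus[(id(node), ch)] = bonus
            node = node.setdefault(ch, {})

    def score(self, text, initial_score=0.0):
        """Walk `text` through the tree, adding the bonus of each transition
        taken to the initial hypothesis score."""
        node, score = self.root, initial_score
        for ch in text:
            score += self.bonus.get((id(node), ch), 0.0)
            if ch not in node:
                break
            node = node[ch]
        return score

tree = PrefixTree()
tree.add_word("kafka", bonus=0.5)
boosted = tree.score("kafka", initial_score=2.0)  # 2.0 + 5 transitions * 0.5
```

A hypothesis whose boosted score exceeds the decoding threshold would then survive to produce the output text sequence.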
  • Patent number: 11777957
    Abstract: Disclosed is a method for detecting a malicious attack based on deep learning in a transportation cyber-physical system (TCPS), comprising: extracting original feature data of a malicious data flow and a normal data flow from a TCPS; cleaning and coding the original feature data; selecting key features from the feature data; cleaning and coding the key features to establish a deep learning model; and finally, inputting unknown behavior data to be identified into the deep learning model to identify whether the data is malicious, thereby detecting a malicious attack. The present invention uses a deep learning method to extract and learn the behavior of a program in a TCPS, detects the malicious attack according to the deep learning result, and effectively identifies malicious attacks in the TCPS.
    Type: Grant
    Filed: December 4, 2019
    Date of Patent: October 3, 2023
    Assignee: HANGZHOU DIANZI UNIVERSITY
    Inventors: Yuanfang Chen, Ting Wu, Hengli Yue, Chengnan Hu
  • Patent number: 11748559
    Abstract: A conversational interface generation method, system, and computer program product that includes determining a conversational artifact for a computer program from a specification of the computer program and generating a conversational interface for the computer program based on the conversational artifact for the computer program included in the specification.
    Type: Grant
    Filed: March 24, 2021
    Date of Patent: September 5, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yara Rizk, Vatche Isahagian, Yasaman Khazaeni, Scott Boag, Falk Pollok
  • Patent number: 11676575
    Abstract: A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as remote NLU data. The response data from the remote speech processing system may be compared to local NLU data to improve a speech processing model on the device. Thus, the device may perform supervised on-device learning based on the remote NLU data. The device may determine differences between the updated speech processing model and an original speech processing model received from the remote system and may send data indicating these differences to the remote system. The remote system may aggregate data received from a plurality of devices and may generate an improved speech processing model.
    Type: Grant
    Filed: July 27, 2021
    Date of Patent: June 13, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Rohit Prasad, Nikko Strom
  • Patent number: 11526671
    Abstract: An example method for identifying a reading location in a text source as a user reads the text source aloud includes determining phoneme data of the text source, the text source comprising a sequence of words; receiving audio data comprising a spoken word associated with the text source; comparing, by a processing device, the phoneme data of the text source and phoneme data of the audio data; and identifying a location in the sequence of words based on the comparison of the phoneme data.
    Type: Grant
    Filed: September 4, 2018
    Date of Patent: December 13, 2022
    Assignee: Google LLC
    Inventors: Chaitanya Gharpure, Evan Fisher, Eric Liu, Peng Yang, Emily Hou, Victoria Fang
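The matching step above can be sketched as follows. This is an illustrative reduction with assumed names and a toy grapheme-to-phoneme table (a real system would use a pronunciation lexicon or G2P model): phonemize the text source word by word, then return the index of the word whose phonemes match those decoded from the audio.

```python
LEXICON = {  # toy grapheme-to-phoneme table, assumed for illustration
    "the": ["DH", "AH"], "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"], "down": ["D", "AW", "N"],
}

def phonemes(word):
    return LEXICON.get(word, [])

def find_reading_location(text_words, spoken_phonemes, start=0):
    """Return the index of the first word at or after `start` whose phoneme
    sequence matches the phonemes decoded from the audio, or -1."""
    for i in range(start, len(text_words)):
        if phonemes(text_words[i]) == spoken_phonemes:
            return i
    return -1

loc = find_reading_location(["the", "cat", "sat", "down"], ["S", "AE", "T"])
# "sat" is at index 2
```

Tracking the location from the previous match (the `start` argument) keeps the search anchored as the reader progresses through the text.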
  • Publication number: 20120004901
    Abstract: Various embodiments of phonetic keys for the Japanese language are described herein. A Kana rule set is applied to Kana characters provided by a user. The Kana characters are defined in an alphabetic language based on the sound of the Kana characters. A full phonetic key is then generated based on the defined Kana characters. A replaced-vowel phonetic key is generated by replacing a vowel in the full phonetic key and a no-vowel phonetic key is generated by removing the vowel in the full phonetic key. Kana records in a database are then processed to determine a relevant Kana record that has a phonetic key identical to at least one of the full phonetic key, the replaced-vowel phonetic key, and the no-vowel phonetic key. The relevant Kana records are then presented to the user.
    Type: Application
    Filed: June 30, 2010
    Publication date: January 5, 2012
    Inventor: Hozumi Nakano
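The three key variants described above can be sketched once the Kana have been romanized (the Kana rule set itself is omitted here). This is a minimal sketch with assumed conventions; in particular the `@` placeholder for replaced vowels is an invented choice, not taken from the publication.

```python
VOWELS = set("aeiou")

def phonetic_keys(romanized):
    """Return (full, replaced-vowel, no-vowel) phonetic keys for a
    romanized Kana string."""
    full = romanized.lower()
    replaced = "".join("@" if c in VOWELS else c for c in full)
    no_vowel = "".join(c for c in full if c not in VOWELS)
    return full, replaced, no_vowel

keys = phonetic_keys("tanaka")
# ('tanaka', 't@n@k@', 'tnk')
```

Matching a database record against any of the three keys lets the lookup tolerate progressively larger vowel variations in the user's input.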
  • Publication number: 20110119051
    Abstract: A phonetic variation model building apparatus, having a phoneme database for recording at least a standard phonetic model of a language and a plurality of non-standardized phonemes of the language is provided. A phonetic variation identifier identifies a plurality of phonetic variations between the non-standardized phonemes and the standard phonetic model. A phonetic transformation calculator calculates a plurality of coefficients of a phonetic transformation function based on the phonetic variations and the phonetic transformation function. A phonetic variation model generator generates at least a phonetic variation model based on the standard phonetic model, the phonetic transformation function and the coefficients thereof.
    Type: Application
    Filed: December 15, 2009
    Publication date: May 19, 2011
    Applicant: INSTITUTE FOR INFORMATION INDUSTRY
    Inventors: Huan-Chung Li, Chung-Hsien Wu, Han-Ping Shen, Chun-Kai Wang, Chia-Hsin Hsieh
  • Publication number: 20110040774
    Abstract: According to one embodiment, searching media includes receiving a search query comprising search terms. At least one search term is expanded to yield a set of conceptually equivalent terms. The set of conceptually equivalent terms is converted to a set of search phonemes. Files that record phonemes are searched according to the set of search phonemes. A file that includes a phoneme that matches at least one search phoneme is selected and output to a client.
    Type: Application
    Filed: August 14, 2009
    Publication date: February 17, 2011
    Applicant: Raytheon Company
    Inventors: Bruce E. Peoples, Michael R. Johnson, Kristopher D. Barr
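The pipeline above can be sketched end to end. This is a hypothetical illustration with assumed tables and names (the synonym expansion and grapheme-to-phoneme mappings are toy stand-ins): expand each search term to conceptually equivalent terms, convert the expanded set to phoneme sequences, and select files whose recorded phonemes contain any search sequence.

```python
SYNONYMS = {"car": {"car", "automobile"}}           # assumed expansion table
G2P = {"car": ["K", "AA", "R"],                     # assumed toy G2P mapping
       "automobile": ["AO", "T", "AH", "M", "OW", "B", "IY", "L"]}

def contains(seq, sub):
    """True if phoneme list `sub` occurs contiguously within `seq`."""
    return any(seq[i:i + len(sub)] == sub for i in range(len(seq) - len(sub) + 1))

def search(files, query_terms):
    """files: name -> recorded phoneme list. Returns matching file names."""
    expanded = set().union(*(SYNONYMS.get(t, {t}) for t in query_terms))
    phoneme_sets = [G2P[t] for t in expanded if t in G2P]
    return [name for name, recorded in files.items()
            if any(contains(recorded, p) for p in phoneme_sets)]

files = {"clip1": ["DH", "AH", "K", "AA", "R"], "clip2": ["S", "AE", "T"]}
hits = search(files, ["car"])  # clip1 matches via the phonemes of "car"
```

Searching at the phoneme level means a file can match even when its transcript never contained the literal query word.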
  • Patent number: 7844457
    Abstract: Methods are disclosed for automatic accent labeling without manually labeled data. The methods are designed to exploit accent distribution between function and content words.
    Type: Grant
    Filed: February 20, 2007
    Date of Patent: November 30, 2010
    Assignee: Microsoft Corporation
    Inventors: YiNing Chen, Frank Kao-ping Soong, Min Chu
  • Patent number: 7831911
    Abstract: A spell checking system includes a letter spelling engine. The letter spelling engine is configured to select a plurality of candidate letter target strings that closely match a misspelled source string. The spell checking system includes a phoneme spelling engine. The phoneme spelling engine is configured to select a plurality of candidate phoneme target strings that closely match the misspelled source string. A ranker module is configured to combine the candidate letter target strings and the candidate phoneme target strings into a combined list of candidate target strings. The ranker module is also configured to rank the list of candidate target strings to provide a list of best candidate target strings for the misspelled source string.
    Type: Grant
    Filed: March 8, 2006
    Date of Patent: November 9, 2010
    Assignee: Microsoft Corporation
    Inventor: William D. Ramsey
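The ranker step above can be sketched with standard-library tools. This is an assumed illustration, not the patented ranker: `difflib` stands in for the letter spelling engine, a precomputed sound-alike table stands in for the phoneme spelling engine, and the ranking criterion (string similarity to the source string) is an invented placeholder.

```python
import difflib

def letter_candidates(misspelled, vocabulary):
    """Letter-engine stand-in: closest vocabulary words by edit similarity."""
    return difflib.get_close_matches(misspelled, vocabulary, n=3, cutoff=0.6)

def phoneme_candidates(misspelled, sound_alikes):
    """Phoneme-engine stand-in: precomputed sound-alike table lookup."""
    return sound_alikes.get(misspelled, [])

def rank_candidates(misspelled, vocabulary, sound_alikes):
    """Merge both candidate lists, dedupe, and rank by similarity."""
    combined = dict.fromkeys(letter_candidates(misspelled, vocabulary))
    combined.update(dict.fromkeys(phoneme_candidates(misspelled, sound_alikes)))
    return sorted(combined, key=lambda w: -difflib.SequenceMatcher(
        None, misspelled, w).ratio())

vocab = ["phone", "phoneme", "foam", "fame"]
sound_alikes = {"foneme": ["phoneme"]}
best = rank_candidates("foneme", vocab, sound_alikes)
```

The point of the merge is that a candidate missed by one engine (here, a sound-alike with a very different spelling) can still be surfaced by the other.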
  • Patent number: 7761297
    Abstract: A system for multi-lingual speech recognition. The inventive system includes a speech modeling engine, a speech search engine, and a decision reaction engine. The speech modeling engine receives a mixed multi-lingual speech signal and transforms it into speech features. The speech search engine locates and compares candidate data sets. The decision reaction engine selects resulting speech models from the candidate speech models and generates a speech command.
    Type: Grant
    Filed: February 18, 2004
    Date of Patent: July 20, 2010
    Assignee: Delta Electronics, Inc.
    Inventor: Yun-Wen Lee
  • Publication number: 20100121643
    Abstract: The technology disclosed relates to a system and method for fast, accurate and parallelizable speech search, called Crystal Decoder. It is particularly useful for search applications, as opposed to dictation. It can achieve both speed and accuracy, without sacrificing one for the other. It can search different variations of records in the reference database without a significant increase in elapsed processing time. Even the main decoding part can be parallelized as the number of words increases, to maintain a fast response time.
    Type: Application
    Filed: November 2, 2009
    Publication date: May 13, 2010
    Applicant: Melodis Corporation
    Inventors: Keyvan Mohajer, Seyed Majid Emami, Jon Grossman, Joe Kyaw Soe Aung, Sina Sohangir
  • Publication number: 20090204401
    Abstract: Provided is a speech translation system for receiving an input of the original speech in a first language, translating an input content into a second language, and outputting a result of the translating as a speech, including: an input processing part for receiving the input of the original speech, and generating, from the original speech, an original language text and the prosodic information of the original speech; a translation part for generating a translated sentence by translating the first language into the second language; prosodic feature transform information including associated prosodic information between the first language and the second language; a prosodic feature transform part for transforming the prosodic information of the original speech into prosodic information of the speech to be output; and a speech synthesis part for outputting the translated sentence as a speech synthesized based on the prosodic information of the speech to be output.
    Type: Application
    Filed: November 13, 2008
    Publication date: August 13, 2009
    Inventor: Shehui Bu
  • Publication number: 20090150153
    Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
    Type: Application
    Filed: December 7, 2007
    Publication date: June 11, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
  • Publication number: 20080162129
    Abstract: One provides (101) a plurality of frames of sampled audio content and then processes (102) that plurality of frames using a speech recognition search process that comprises, at least in part, searching for at least two of state boundaries, subword boundaries, and word boundaries using different search resolutions.
    Type: Application
    Filed: December 29, 2006
    Publication date: July 3, 2008
    Applicant: MOTOROLA, INC.
    Inventor: Yan Ming Cheng
  • Publication number: 20080120091
    Abstract: A real-time open domain speech translation system for simultaneous translation of a spoken presentation that is a spoken monologue comprising one of a lecture, a speech, a presentation, a colloquium, and a seminar. The system includes an automatic speech recognition unit configured for accepting sound comprising the spoken presentation in a first language and for continuously creating word hypotheses, and a machine translation unit that receives the hypotheses, wherein the machine translation unit outputs a translation, into a second language, from the spoken presentation.
    Type: Application
    Filed: October 26, 2007
    Publication date: May 22, 2008
    Inventors: Alexander Waibel, Christian Fuegen
  • Patent number: RE40458
    Abstract: Parsing routines extract from a conventional pronunciation dictionary an entry, which includes a dictionary word and dictionary phonemes representing the pronunciation of the dictionary word. A correspondence table is used to compress the pronunciation dictionary. The correspondence table includes correspondence sets for a particular language, each set having a correspondence text entry, a correspondence phoneme entry representing the pronunciation of the correspondence text entry and a unique correspondence set identifying symbol. A matching system compares a dictionary entry with the correspondence sets, and replaces the dictionary entry with the symbols representing the best matches. In the absence of a match, symbols representing silent text or unmatched phonemes can be used. The correspondence symbols representing the best matches provide compressed pronunciation dictionary entries. The matching system also generates decoder code sets for subsequently translating the symbol sets.
    Type: Grant
    Filed: January 13, 2003
    Date of Patent: August 12, 2008
    Assignee: Apple Inc.
    Inventor: Timothy Fredenburg
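The compression idea above can be sketched with a toy correspondence table. This is an illustrative reduction with assumed data and names (the table entries, symbol values, and greedy left-to-right matching are simplifications of the patented matching system): each dictionary entry is rewritten as the symbols of correspondence sets whose text/phoneme pairs jointly cover the word and its pronunciation.

```python
CORRESPONDENCE = [  # (text, phonemes, symbol) - a toy table for one language
    ("ph", "f", 1), ("o", "oU", 2), ("n", "n", 3), ("e", "", 4),
]

def compress(word, pronunciation):
    """Greedy left-to-right match of correspondence sets; None if unmatched.
    A real system would fall back to silent-text/unmatched-phoneme symbols."""
    symbols, w, p = [], word, pronunciation
    while w:
        for text, phon, sym in CORRESPONDENCE:
            if w.startswith(text) and p.startswith(phon):
                symbols.append(sym)
                w, p = w[len(text):], p[len(phon):]
                break
        else:
            return None
    return symbols if not p else None

def decompress(symbols):
    """Decoder code set: recover the word and pronunciation from symbols."""
    decode = {sym: (text, phon) for text, phon, sym in CORRESPONDENCE}
    word = "".join(decode[s][0] for s in symbols)
    pronunciation = "".join(decode[s][1] for s in symbols)
    return word, pronunciation

codes = compress("phone", "foUn")  # [1, 2, 3, 4]
```

Because each symbol encodes both a text fragment and its phonemes, a single symbol sequence replaces both halves of the original dictionary entry.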