Phonemic Context, E.g., Pronunciation Rules, Phonotactical Constraints, Phoneme N-grams, Etc. (epo) Patents (Class 704/E15.02)
-
Patent number: 12254866
Abstract: A method of determining an alignment sequence between a reference sequence of symbols and a hypothesis sequence of symbols includes loading a reference sequence of symbols to a computing system and creating a reference finite state automaton for the reference sequence of symbols. The method further includes loading a hypothesis sequence of symbols to the computing system and creating a hypothesis finite state automaton for the hypothesis sequence of symbols. The method further includes traversing the reference finite state automaton, adding new reference arcs and new reference transforming properties arcs, and traversing the hypothesis finite state automaton, adding new hypothesis arcs and new hypothesis transforming properties arcs. The method further includes composing the hypothesis finite state automaton with the reference finite state automaton, creating alternative paths to form a composed finite state automaton, and tracking a number of the alternative paths created.
Type: Grant
Filed: October 13, 2020
Date of Patent: March 18, 2025
Assignee: Rev.com, Inc.
Inventors: Jean-Philippe Robichaud, Miguel Jette, Joshua Ian Dong, Quinten McNamara, Nishchal Bhandari, Michelle Kai Yu Huang
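The alignment the abstract describes is the problem classical word-error-rate scoring solves. As a rough illustration only, not the patented method, the sketch below recovers one alignment sequence with a plain Levenshtein dynamic program instead of automaton composition; all names and data are hypothetical.

```python
# A rough sketch, not the patented FSA method: recover one alignment
# sequence between a reference and a hypothesis with the classic
# Levenshtein dynamic program. None marks a gap (insertion/deletion).

def align(reference, hypothesis):
    m, n = len(reference), len(hypothesis)
    # cost[i][j] = edit distance between reference[:i] and hypothesis[:j]
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        cost[i][0] = i
    for j in range(n + 1):
        cost[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = cost[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back one optimal path; ties are where alternative paths arise.
    pairs, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (
                reference[i - 1] != hypothesis[j - 1]):
            pairs.append((reference[i - 1], hypothesis[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((reference[i - 1], None))   # deletion
            i -= 1
        else:
            pairs.append((None, hypothesis[j - 1]))  # insertion
            j -= 1
    return list(reversed(pairs))

print(align("the cat sat".split(), "the cat has sat".split()))
```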
-
Patent number: 12062375
Abstract: Disclosed herein are systems and methods for processing an audio file to perform audio segmentation and Speaker Role Identification (SRID) by training low-level classifier and high-level clustering components to separate and identify audio from different sources in an audio file by unifying audio separation and automatic speech recognition (ASR) techniques in a single system. Segmentation and SRID can include separating audio in an audio file into one or more segments, based on a determination of the identity of the speaker, category of the speaker, or source of audio in the segment. In one or more examples, the disclosed systems and methods use machine learning and artificial intelligence technology to determine the source of segments of audio using a combination of acoustic and language information. In some examples, the acoustic and language information is used to classify audio in each frame and cluster the audio into segments.
Type: Grant
Filed: December 8, 2021
Date of Patent: August 13, 2024
Assignee: The MITRE Corporation
Inventor: Yuan-Jun Wei
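The final sentence implies a two-stage flow: label each frame, then cluster frames into segments. A minimal sketch of the second stage only, assuming a low-level classifier has already produced one role label per frame; the 10 ms frame hop and the role labels are illustrative assumptions.

```python
# Cluster consecutive frames with the same label into timed segments.
from itertools import groupby

FRAME_SEC = 0.01  # assumed frame hop in seconds

def frames_to_segments(frame_labels):
    """Collapse frame-level labels into (start_sec, end_sec, label) segments."""
    segments, t = [], 0
    for label, run in groupby(frame_labels):
        n = len(list(run))
        segments.append((round(t * FRAME_SEC, 2), round((t + n) * FRAME_SEC, 2), label))
        t += n
    return segments

labels = ["agent"] * 120 + ["caller"] * 300 + ["agent"] * 80
print(frames_to_segments(labels))
# [(0.0, 1.2, 'agent'), (1.2, 4.2, 'caller'), (4.2, 5.0, 'agent')]
```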
-
Patent number: 12014730
Abstract: A voice processing method includes: collecting a voice signal by a microphone of an electronic device, and signal-processing the collected voice signal to obtain a first voice frame segment; performing voice recognition on the first voice frame segment to obtain a first recognition result; in response to the first recognition result not matching a target content and a plurality of tokens in the first recognition result meeting a preset condition, performing frame compensation on the first voice frame segment to obtain a second voice frame segment; and performing voice recognition on the second voice frame segment to obtain a second recognition result. A matching degree between the second recognition result and the target content is greater than a matching degree between the first recognition result and the target content.
Type: Grant
Filed: May 17, 2021
Date of Patent: June 18, 2024
Assignee: BEIJING XIAOMI MOBILE SOFTWARE CO., LTD.
Inventor: Xiangyan Xu
-
Patent number: 11978455
Abstract: The present disclosure provides various embodiments of methods for intelligent active speaker identification and information handling systems (IHSs) utilizing such methods. In general, the methods disclosed herein may be used to accurately identify an active speaker in a communication session with an application or an IHS, regardless of whether the active speaker is alone, in a group environment, or using someone else's system or login to participate in the communication session. The methods disclosed herein may use voice processing technology and one or more voice identification databases (VIDs) to identify the active speaker in a communication session. In some embodiments, the disclosed methods may display the identity of the active speaker to other users or participants in the same communication session. In other embodiments, the disclosed methods may dynamically switch between user profiles or accounts during the communication session based on the identity of the active speaker.
Type: Grant
Filed: March 7, 2022
Date of Patent: May 7, 2024
Inventors: Douglas J. Peeler, Srinivas Kamepalli
-
Patent number: 11935425
Abstract: Pronunciation learning processing is performed, in which evaluation scores on pronunciation for respective words are acquired from a pronunciation test that uses multiple words, the acquired evaluation scores are summed for each combination of consecutive pronunciation components in the words, and learning information based on the result of the summation is output.
Type: Grant
Filed: August 31, 2020
Date of Patent: March 19, 2024
Assignee: CASIO COMPUTER CO., LTD.
Inventor: Manato Ono
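A minimal sketch under stated assumptions: "combinations of consecutive pronunciation components" is read here as phoneme bigrams, per-word test scores are summed for every bigram, and the weakest combinations are output as learning information. The phonemes and scores are made up.

```python
# Aggregate per-word pronunciation scores by phoneme bigram.
from collections import defaultdict

def summarize(test_results):
    """test_results: list of (phoneme_sequence, evaluation_score) per word."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for phonemes, score in test_results:
        for bigram in zip(phonemes, phonemes[1:]):
            totals[bigram] += score
            counts[bigram] += 1
    averages = {bg: totals[bg] / counts[bg] for bg in totals}
    return sorted(averages.items(), key=lambda kv: kv[1])  # weakest first

results = [(("TH", "IH", "NG", "K"), 62.0), (("S", "IH", "NG"), 88.0)]
for bigram, avg in summarize(results):
    print(bigram, round(avg, 1))
```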
-
Patent number: 11922943
Abstract: In general, this disclosure describes techniques for generating and evaluating automatic transcripts of audio recordings containing human speech. In some examples, a computing system is configured to: generate transcripts of a plurality of audio recordings; determine an error rate for each transcript by comparing the transcript to a reference transcript of the audio recording; receive, for each transcript, a subjective ranking selected from a plurality of subjective rank categories; determine, based on the error rates and subjective rankings, objective rank categories defined by error-rate ranges; and assign an objective ranking to a new machine-generated transcript of a new audio recording, based on the objective rank categories and an error rate of the new machine-generated transcript.
Type: Grant
Filed: January 26, 2021
Date of Patent: March 5, 2024
Assignee: Wells Fargo Bank, N.A.
Inventors: Yong Yi Bay, Yang Angelina Yang, Menglin Cao
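A minimal sketch of the ranking step: derive error-rate ranges for objective categories from transcripts that already carry subjective rankings, then rank a new transcript by its error rate alone. The boundary rule (midpoints between per-category mean error rates) is an assumption, not the patent's method.

```python
# Derive error-rate ranges from subjectively ranked transcripts.
from statistics import mean

def build_categories(rated):
    """rated: list of (error_rate, subjective_rank) pairs."""
    by_rank = {}
    for err, rank in rated:
        by_rank.setdefault(rank, []).append(err)
    ranks = sorted(by_rank, key=lambda r: mean(by_rank[r]))
    means = [mean(by_rank[r]) for r in ranks]
    bounds = [(a + b) / 2 for a, b in zip(means, means[1:])]
    return ranks, bounds

def assign_rank(error_rate, ranks, bounds):
    for rank, upper in zip(ranks, bounds):
        if error_rate <= upper:
            return rank
    return ranks[-1]

ranks, bounds = build_categories([(0.05, "good"), (0.12, "fair"), (0.30, "poor")])
print(assign_rank(0.09, ranks, bounds))  # -> "fair"
```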
-
Patent number: 11893983
Abstract: An approach for improving speech recognition is provided. A processor receives a new word to add to a prefix tree. A processor determines a bonus score for a first transition from a first node to a second node in the prefix tree on condition that the first transition is included in a path of at least one transition representing the new word. A processor determines a hypothesis score for a hypothesis that corresponds to a speech sequence based on the prefix tree, where the hypothesis score adds the bonus score to an initial hypothesis score. In response to a determination that the hypothesis score exceeds a threshold value, a processor generates an output text sequence for the speech sequence based on the hypothesis.
Type: Grant
Filed: June 23, 2021
Date of Patent: February 6, 2024
Assignee: International Business Machines Corporation
Inventors: Masayuki Suzuki, Gakuto Kurata
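A minimal sketch, not IBM's implementation: a character prefix tree in which every transition along a newly added word's path carries a bonus, and a hypothesis adds the bonuses it collects to its initial score. The 0.5 bonus, the scores, and the threshold are illustrative assumptions.

```python
class Node:
    def __init__(self):
        self.children = {}  # character -> Node
        self.bonus = 0.0    # bonus for the transition into this node

class PrefixTree:
    def __init__(self):
        self.root = Node()

    def add_word(self, word, bonus=0.5):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, Node())
            node.bonus = max(node.bonus, bonus)  # transition lies on the new word's path

    def bonus_for(self, text):
        total, node = 0.0, self.root
        for ch in text:
            if ch not in node.children:
                break
            node = node.children[ch]
            total += node.bonus
        return total

tree = PrefixTree()
tree.add_word("kurata")                 # new word to favor during decoding
initial_score = -4.2                    # assumed initial hypothesis score
hypothesis_score = initial_score + tree.bonus_for("kurata")
print(hypothesis_score > -2.0)          # emit the text only above a threshold
```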
-
Patent number: 11777957
Abstract: Disclosed is a method for detecting a malicious attack based on deep learning in a transportation cyber-physical system (TCPS), comprising: extracting original feature data of a malicious data flow and a normal data flow from a TCPS; cleaning and coding the original feature data; selecting key features from the feature data; cleaning and coding the key features to establish a deep learning model; and finally, inputting unknown behavior data to be identified into the deep learning model to identify whether the data is malicious data, thereby detecting a malicious attack. The present invention uses a deep learning method to extract and learn the behavior of a program in a TCPS, detects malicious attacks according to the deep learning result, and effectively identifies malicious attacks in the TCPS.
Type: Grant
Filed: December 4, 2019
Date of Patent: October 3, 2023
Assignee: HANGZHOU DIANZI UNIVERSITY
Inventors: Yuanfang Chen, Ting Wu, Hengli Yue, Chengnan Hu
-
Patent number: 11748559
Abstract: A conversational interface generation method, system, and computer program product that includes determining a conversational artifact for a computer program from a specification of the computer program, and generating a conversational interface for the computer program based on the conversational artifact included in the specification.
Type: Grant
Filed: March 24, 2021
Date of Patent: September 5, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Yara Rizk, Vatche Isahagian, Yasaman Khazaeni, Scott Boag, Falk Pollok
-
Patent number: 11676575
Abstract: A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as remote NLU data. The response data from the remote speech processing system may be compared to local NLU data to improve a speech processing model on the device. Thus, the device may perform supervised on-device learning based on the remote NLU data. The device may determine differences between the updated speech processing model and an original speech processing model received from the remote system, and may send data indicating these differences to the remote system. The remote system may aggregate data received from a plurality of devices and may generate an improved speech processing model.
Type: Grant
Filed: July 27, 2021
Date of Patent: June 13, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Ariya Rastrow, Rohit Prasad, Nikko Strom
-
Patent number: 11526671
Abstract: An example method for identifying a reading location in a text source as a user reads the text source aloud includes determining phoneme data of the text source, the text source comprising a sequence of words; receiving audio data comprising a spoken word associated with the text source; comparing, by a processing device, the phoneme data of the text source and phoneme data of the audio data; and identifying a location in the sequence of words based on the comparison of the phoneme data.
Type: Grant
Filed: September 4, 2018
Date of Patent: December 13, 2022
Assignee: Google LLC
Inventors: Chaitanya Gharpure, Evan Fisher, Eric Liu, Peng Yang, Emily Hou, Victoria Fang
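A minimal sketch of one way the comparison could work: slide the phonemes decoded from the audio over the phonemes of the text source and return the word whose span matches best. The toy lexicon stands in for a real grapheme-to-phoneme step; none of this is Google's implementation.

```python
LEXICON = {"the": ["DH", "AH"], "cat": ["K", "AE", "T"], "sat": ["S", "AE", "T"]}

def locate(words, audio_phonemes):
    flat, owner = [], []               # text phonemes and their word indexes
    for i, word in enumerate(words):
        for ph in LEXICON[word]:
            flat.append(ph)
            owner.append(i)
    n = len(audio_phonemes)
    best_start, best_hits = 0, -1
    for start in range(len(flat) - n + 1):
        hits = sum(a == b for a, b in zip(flat[start:start + n], audio_phonemes))
        if hits > best_hits:
            best_start, best_hits = start, hits
    return owner[best_start + n - 1]   # index of the word being read

print(locate(["the", "cat", "sat"], ["K", "AE", "T"]))  # -> 1 ("cat")
```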
-
Publication number: 20120004901
Abstract: Various embodiments of phonetic keys for the Japanese language are described herein. A Kana rule set is applied to Kana characters provided by a user. The Kana characters are defined in an alphabetic language based on the sound of the Kana characters. A full phonetic key is then generated based on the defined Kana characters. A replaced-vowel phonetic key is generated by replacing a vowel in the full phonetic key, and a no-vowel phonetic key is generated by removing the vowel in the full phonetic key. Kana records in a database are then processed to determine a relevant Kana record that has a phonetic key identical to at least one of the full phonetic key, the replaced-vowel phonetic key, and the no-vowel phonetic key. The relevant Kana records are then presented to the user.
Type: Application
Filed: June 30, 2010
Publication date: January 5, 2012
Inventor: Hozumi Nakano
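A minimal sketch of the three-key scheme, assuming the Kana rule set has already romanized the input; "@" is an arbitrary vowel placeholder. A record is relevant if it shares any of the three keys with the query.

```python
VOWELS = set("aeiou")

def phonetic_keys(romaji):
    full = romaji.lower()
    replaced = "".join("@" if c in VOWELS else c for c in full)   # replaced-vowel key
    no_vowel = "".join(c for c in full if c not in VOWELS)        # no-vowel key
    return {full, replaced, no_vowel}

def find_relevant(query, records):  # records: name -> precomputed key set
    keys = phonetic_keys(query)
    return [name for name, record_keys in records.items() if keys & record_keys]

records = {"Nakano": phonetic_keys("nakano"), "Nakeno": phonetic_keys("nakeno")}
print(find_relevant("nakuno", records))  # vowel confusion still matches both
```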
-
Publication number: 20110119051
Abstract: A phonetic variation model building apparatus, having a phoneme database for recording at least a standard phonetic model of a language and a plurality of non-standardized phonemes of the language, is provided. A phonetic variation identifier identifies a plurality of phonetic variations between the non-standardized phonemes and the standard phonetic model. A phonetic transformation calculator calculates a plurality of coefficients of a phonetic transformation function based on the phonetic variations and the phonetic transformation function. A phonetic variation model generator generates at least a phonetic variation model based on the standard phonetic model, the phonetic transformation function, and the coefficients thereof.
Type: Application
Filed: December 15, 2009
Publication date: May 19, 2011
Applicant: INSTITUTE FOR INFORMATION INDUSTRY
Inventors: Huan-Chung Li, Chung-Hsien Wu, Han-Ping Shen, Chun-Kai Wang, Chia-Hsin Hsieh
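A minimal sketch, assuming the phonetic transformation function is affine (y = a·x + b) over scalar phoneme features, with the coefficients fit by least squares from observed variations. The real transform function and features are richer than this illustration.

```python
import numpy as np

standard = np.array([1.0, 2.0, 3.0, 4.0])  # standard phonetic model features
variant = np.array([1.9, 4.1, 5.8, 8.2])   # observed non-standardized phonemes

# Fit the coefficients of the assumed transformation y = a*x + b.
A = np.column_stack([standard, np.ones_like(standard)])
(a, b), *_ = np.linalg.lstsq(A, variant, rcond=None)
print(f"transform coefficients: a={a:.2f}, b={b:.2f}")

# Generate a phonetic variation model from the standard model.
variation_model = a * standard + b
print(variation_model)
```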
-
Publication number: 20110040774
Abstract: According to one embodiment, searching media includes receiving a search query comprising search terms. At least one search term is expanded to yield a set of conceptually equivalent terms. The set of conceptually equivalent terms is converted to a set of search phonemes. Files that record phonemes are searched according to the set of search phonemes. A file that includes a phoneme that matches at least one search phoneme is selected and output to a client.
Type: Application
Filed: August 14, 2009
Publication date: February 17, 2011
Applicant: Raytheon Company
Inventors: Bruce E. Peoples, Michael R. Johnson, Kristopher D. Barr
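A minimal sketch of the pipeline: expand a term to conceptually equivalent terms, convert each to phonemes, and select files whose recorded phoneme streams contain any search phoneme sequence. The thesaurus and lexicon are made-up stand-ins.

```python
THESAURUS = {"car": ["car", "automobile"]}
LEXICON = {"car": ["K", "AA", "R"],
           "automobile": ["AO", "T", "AH", "M", "OW", "B", "IY", "L"]}

def contains(stream, needle):
    return any(stream[i:i + len(needle)] == needle
               for i in range(len(stream) - len(needle) + 1))

def search(term, files):  # files: name -> recorded phoneme list
    needles = [LEXICON[t] for t in THESAURUS.get(term, [term])]
    return [name for name, stream in files.items()
            if any(contains(stream, n) for n in needles)]

files = {"clip1.wav": ["DH", "AH", "K", "AA", "R"], "clip2.wav": ["HH", "AY"]}
print(search("car", files))  # -> ['clip1.wav']
```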
-
Patent number: 7844457
Abstract: Methods are disclosed for automatic accent labeling without manually labeled data. The methods are designed to exploit accent distribution between function and content words.
Type: Grant
Filed: February 20, 2007
Date of Patent: November 30, 2010
Assignee: Microsoft Corporation
Inventors: YiNing Chen, Frank Kao-ping Soong, Min Chu
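A minimal sketch of the idea the abstract hints at: function words are usually unaccented and content words accented, so a function-word list alone yields rough accent labels with no manual annotation. The word list here is a tiny illustrative subset.

```python
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "is", "it"}

def label_accents(words):
    return [(w, "unaccented" if w.lower() in FUNCTION_WORDS else "accented")
            for w in words]

print(label_accents("the cat sat on a mat".split()))
```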
-
Patent number: 7831911
Abstract: A spell checking system includes a letter spelling engine. The letter spelling engine is configured to select a plurality of candidate letter target strings that closely match a misspelled source string. The spell checking system includes a phoneme spelling engine. The phoneme spelling engine is configured to select a plurality of candidate phoneme target strings that closely match the misspelled source string. A ranker module is configured to combine the candidate letter target strings and the candidate phoneme target strings into a combined list of candidate target strings. The ranker module is also configured to rank the list of candidate target strings to provide a list of best candidate target strings for the misspelled source string.
Type: Grant
Filed: March 8, 2006
Date of Patent: November 9, 2010
Assignee: Microsoft Corporation
Inventor: William D. Ramsey
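A minimal sketch of the ranker module: merge candidates from a letter engine (string similarity here) and a phoneme engine (a toy sound-alike table), then rank by a weighted combined score. The engines, weights, and vocabulary are all illustrative stand-ins.

```python
from difflib import SequenceMatcher

def letter_candidates(source, vocab):
    return {w: SequenceMatcher(None, source, w).ratio() for w in vocab}

def phoneme_candidates(source, sound_alike):
    return {w: 1.0 for w in sound_alike.get(source, [])}

def rank_candidates(source, vocab, sound_alike, w_letter=0.6, w_phoneme=0.4):
    letters = letter_candidates(source, vocab)
    phonemes = phoneme_candidates(source, sound_alike)
    combined = {w: w_letter * letters.get(w, 0.0) + w_phoneme * phonemes.get(w, 0.0)
                for w in set(letters) | set(phonemes)}
    return sorted(combined, key=combined.get, reverse=True)

vocab = ["physics", "physique", "fizzy"]
sound_alike = {"fisiks": ["physics"]}
print(rank_candidates("fisiks", vocab, sound_alike))
```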
-
Patent number: 7761297
Abstract: A system for multi-lingual speech recognition. The inventive system includes a speech modeling engine, a speech search engine, and a decision reaction engine. The speech modeling engine receives a mixed multi-lingual speech signal and transforms it into speech features. The speech search engine locates and compares candidate data sets. The decision reaction engine selects resulting speech models from the candidate speech models and generates a speech command.
Type: Grant
Filed: February 18, 2004
Date of Patent: July 20, 2010
Assignee: Delta Electronics, Inc.
Inventor: Yun-Wen Lee
-
Publication number: 20100121643
Abstract: The technology disclosed relates to a system and method for fast, accurate, and parallelizable speech search, called Crystal Decoder. It is particularly useful for search applications, as opposed to dictation. It can achieve both speed and accuracy, without sacrificing one for the other. It can search different variations of records in the reference database without a significant increase in elapsed processing time. Even the main decoding part can be parallelized as the number of words increases, to maintain a fast response time.
Type: Application
Filed: November 2, 2009
Publication date: May 13, 2010
Applicant: Melodis Corporation
Inventors: Keyvan Mohajer, Seyed Majid Emami, Jon Grossman, Joe Kyaw Soe Aung, Sina Sohangir
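A minimal sketch of why this kind of record search parallelizes: each reference record is scored independently against the decoded query, so records can be partitioned across workers. String similarity stands in for the actual decoder scoring; all data here is illustrative.

```python
from concurrent.futures import ProcessPoolExecutor
from difflib import SequenceMatcher
from functools import partial

def score(query, record):
    return record, SequenceMatcher(None, query, record).ratio()

def search(query, records, workers=4):
    # Records are independent, so scoring fans out across processes.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(partial(score, query), records))
    return max(scored, key=lambda pair: pair[1])

if __name__ == "__main__":
    refs = ["hotel california", "california dreaming", "hey jude"]
    print(search("california dreamin", refs))
```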
-
Publication number: 20090204401
Abstract: Provided is a speech translation system for receiving an input of original speech in a first language, translating the input content into a second language, and outputting the result of the translation as speech, including: an input processing part for receiving the input of the original speech and generating, from the original speech, an original language text and prosodic information of the original speech; a translation part for generating a translated sentence by translating the first language into the second language; prosodic feature transform information including associated prosodic information between the first language and the second language; a prosodic feature transform part for transforming the prosodic information of the original speech into prosodic information of the speech to be output; and a speech synthesis part for outputting the translated sentence as a speech synthesized based on the prosodic information of the speech to be output.
Type: Application
Filed: November 13, 2008
Publication date: August 13, 2009
Inventor: Shehui Bu
-
Publication number: 20090150153
Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences, and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
Type: Application
Filed: December 7, 2007
Publication date: June 11, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
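A minimal sketch of the confidence filter applied before retraining: keep only recognized (audio, grapheme-label) pairs whose confidence clears a threshold. The record fields and the 0.8 cutoff are assumptions.

```python
def filter_training_pairs(recognitions, threshold=0.8):
    """recognitions: dicts with 'audio', 'graphemes', and 'confidence' keys."""
    return [(r["audio"], r["graphemes"]) for r in recognitions
            if r["confidence"] >= threshold]

recognitions = [
    {"audio": "utt1.wav", "graphemes": "smith", "confidence": 0.93},
    {"audio": "utt2.wav", "graphemes": "smyth", "confidence": 0.41},
]
print(filter_training_pairs(recognitions))  # low-confidence sample is dropped
```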
-
Publication number: 20080162129
Abstract: One provides (101) a plurality of frames of sampled audio content and then processes (102) that plurality of frames using a speech recognition search process that comprises, at least in part, searching for at least two of state boundaries, subword boundaries, and word boundaries using different search resolutions.
Type: Application
Filed: December 29, 2006
Publication date: July 3, 2008
Applicant: MOTOROLA, INC.
Inventor: Yan Ming Cheng
-
Publication number: 20080120091
Abstract: A real-time open-domain speech translation system for simultaneous translation of a spoken presentation that is a spoken monologue comprising one of a lecture, a speech, a presentation, a colloquium, and a seminar. The system includes an automatic speech recognition unit configured for accepting sound comprising the spoken presentation in a first language and for continuously creating word hypotheses, and a machine translation unit that receives the hypotheses, wherein the machine translation unit outputs a translation of the spoken presentation into a second language.
Type: Application
Filed: October 26, 2007
Publication date: May 22, 2008
Inventors: Alexander Waibel, Christian Fuegen
-
Patent number: RE40458
Abstract: Parsing routines extract from a conventional pronunciation dictionary an entry, which includes a dictionary word and dictionary phonemes representing the pronunciation of the dictionary word. A correspondence table is used to compress the pronunciation dictionary. The correspondence table includes correspondence sets for a particular language, each set having a correspondence text entry, a correspondence phoneme entry representing the pronunciation of the correspondence text entry, and a unique correspondence set identifying symbol. A matching system compares a dictionary entry with the correspondence sets and replaces the dictionary entry with the symbols representing the best matches. In the absence of a match, symbols representing silent text or unmatched phonemes can be used. The correspondence symbols representing the best matches provide compressed pronunciation dictionary entries. The matching system also generates decoder code sets for subsequently translating the symbol sets.
Type: Grant
Filed: January 13, 2003
Date of Patent: August 12, 2008
Assignee: Apple Inc.
Inventor: Timothy Fredenburg
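A minimal sketch of the compression step: greedily match (text, phoneme) correspondence sets against a dictionary entry from the left and emit each set's identifying symbol. The tiny correspondence table is illustrative; the patented table, silent-text symbols, and unmatched-phoneme symbols are richer than this.

```python
# symbol -> (correspondence text entry, correspondence phoneme entry)
CORRESPONDENCE = {"1": ("ph", "F"), "2": ("o", "OW"), "3": ("ne", "N"), "4": ("to", "T AH")}

def compress(word, phonemes):
    symbols, w, p = [], word, phonemes
    while w:
        # Try longer text fragments first (best match wins).
        for sym, (text, phon) in sorted(CORRESPONDENCE.items(),
                                        key=lambda kv: -len(kv[1][0])):
            if w.startswith(text) and (p == phon or p.startswith(phon + " ")):
                symbols.append(sym)
                w, p = w[len(text):], p[len(phon):].lstrip()
                break
        else:
            raise ValueError(f"no correspondence set matches {w!r}")
    return "".join(symbols)

print(compress("phone", "F OW N"))  # -> "123"
```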