Phonemic Context, E.g., Pronunciation Rules, Phonotactical Constraints, Phoneme N-grams, Etc. (epo) Patents (Class 704/E15.02)
-
Patent number: 12254866
Abstract: A method of determining an alignment sequence between a reference sequence of symbols and a hypothesis sequence of symbols includes loading a reference sequence of symbols to a computing system and creating a reference finite state automaton for the reference sequence of symbols. The method further includes loading a hypothesis sequence of symbols to the computing system and creating a hypothesis finite state automaton for the hypothesis sequence of symbols. The method further includes traversing the reference finite state automaton, adding new reference arcs and new reference transforming properties arcs, and traversing the hypothesis finite state automaton, adding new hypothesis arcs and new hypothesis transforming properties arcs. The method further includes composing the hypothesis finite state automaton with the reference finite state automaton, creating alternative paths to form a composed finite state automaton, and tracking a number of the alternative paths created.
Type: Grant
Filed: October 13, 2020
Date of Patent: March 18, 2025
Assignee: Rev.com, Inc.
Inventors: Jean-Philippe Robichaud, Miguel Jette, Joshua Ian Dong, Quinten McNamara, Nishchal Bhandari, Michelle Kai Yu Huang
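The alignment the abstract describes is the problem classical word-error-rate scoring solves. As a rough illustration only, not the patented method, the sketch below recovers one alignment sequence with a plain Levenshtein dynamic program instead of automaton composition; all names and data are hypothetical.

```python
# A rough sketch, not the patented FSA method: recover one alignment
# sequence between a reference and a hypothesis with the classic
# Levenshtein dynamic program. None marks a gap (insertion/deletion).

def align(reference, hypothesis):
    m, n = len(reference), len(hypothesis)
    # cost[i][j] = edit distance between reference[:i] and hypothesis[:j]
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        cost[i][0] = i
    for j in range(n + 1):
        cost[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = cost[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back one optimal path; ties are where alternative paths arise.
    pairs, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (
                reference[i - 1] != hypothesis[j - 1]):
            pairs.append((reference[i - 1], hypothesis[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((reference[i - 1], None))   # deletion
            i -= 1
        else:
            pairs.append((None, hypothesis[j - 1]))  # insertion
            j -= 1
    return list(reversed(pairs))

print(align("the cat sat".split(), "the cat has sat".split()))
```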
-
Patent number: 12062375
Abstract: Disclosed herein are systems and methods for processing an audio file to perform audio segmentation and Speaker Role Identification (SRID) by training low-level classifier and high-level clustering components to separate and identify audio from different sources in an audio file by unifying audio separation and automatic speech recognition (ASR) techniques in a single system. Segmentation and SRID can include separating audio in an audio file into one or more segments, based on a determination of the identity of the speaker, category of the speaker, or source of audio in the segment. In one or more examples, the disclosed systems and methods use machine learning and artificial intelligence technology to determine the source of segments of audio using a combination of acoustic and language information. In some examples, the acoustic and language information is used to classify audio in each frame and cluster the audio into segments.
Type: Grant
Filed: December 8, 2021
Date of Patent: August 13, 2024
Assignee: The MITRE Corporation
Inventor: Yuan-Jun Wei
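The final sentence implies a two-stage flow: label each frame, then cluster frames into segments. A minimal sketch of the second stage only, assuming a low-level classifier has already produced one role label per frame; the 10 ms frame hop and the role labels are illustrative assumptions.

```python
# Cluster consecutive frames with the same label into timed segments.
from itertools import groupby

FRAME_SEC = 0.01  # assumed frame hop in seconds

def frames_to_segments(frame_labels):
    """Collapse frame-level labels into (start_sec, end_sec, label) segments."""
    segments, t = [], 0
    for label, run in groupby(frame_labels):
        n = len(list(run))
        segments.append((round(t * FRAME_SEC, 2), round((t + n) * FRAME_SEC, 2), label))
        t += n
    return segments

labels = ["agent"] * 120 + ["caller"] * 300 + ["agent"] * 80
print(frames_to_segments(labels))
# [(0.0, 1.2, 'agent'), (1.2, 4.2, 'caller'), (4.2, 5.0, 'agent')]
```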
-
Patent number: 12014730
Abstract: A voice processing method includes: collecting a voice signal by a microphone of an electronic device, and signal-processing the collected voice signal to obtain a first voice frame segment; performing voice recognition on the first voice frame segment to obtain a first recognition result; in response to the first recognition result not matching a target content and a plurality of tokens in the first recognition result meeting a preset condition, performing frame compensation on the first voice frame segment to obtain a second voice frame segment; and performing voice recognition on the second voice frame segment to obtain a second recognition result. A matching degree between the second recognition result and the target content is greater than a matching degree between the first recognition result and the target content.
Type: Grant
Filed: May 17, 2021
Date of Patent: June 18, 2024
Assignee: BEIJING XIAOMI MOBILE SOFTWARE CO., LTD.
Inventor: Xiangyan Xu
-
Patent number: 11978455
Abstract: The present disclosure provides various embodiments of methods for intelligent active speaker identification and information handling systems (IHSs) utilizing such methods. In general, the methods disclosed herein may be used to accurately identify an active speaker in a communication session with an application or an IHS, regardless of whether the active speaker is alone, in a group environment, or using someone else's system or login to participate in the communication session. The methods disclosed herein may use voice processing technology and one or more voice identification databases (VIDs) to identify the active speaker in a communication session. In some embodiments, the disclosed methods may display the identity of the active speaker to other users or participants in the same communication session. In other embodiments, the disclosed methods may dynamically switch between user profiles or accounts during the communication session based on the identity of the active speaker.
Type: Grant
Filed: March 7, 2022
Date of Patent: May 7, 2024
Inventors: Douglas J. Peeler, Srinivas Kamepalli
-
Patent number: 11935425
Abstract: Pronunciation learning processing is performed, in which evaluation scores on pronunciation for respective words are acquired from a pronunciation test that uses multiple words, the acquired evaluation scores are summed for each combination of consecutive pronunciation components in the words, and learning information based on the result of the summation is output.
Type: Grant
Filed: August 31, 2020
Date of Patent: March 19, 2024
Assignee: CASIO COMPUTER CO., LTD.
Inventor: Manato Ono
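A minimal sketch under stated assumptions: "combinations of consecutive pronunciation components" is read here as phoneme bigrams, per-word test scores are summed for every bigram, and the weakest combinations are output as learning information. The phonemes and scores are made up.

```python
# Aggregate per-word pronunciation scores by phoneme bigram.
from collections import defaultdict

def summarize(test_results):
    """test_results: list of (phoneme_sequence, evaluation_score) per word."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for phonemes, score in test_results:
        for bigram in zip(phonemes, phonemes[1:]):
            totals[bigram] += score
            counts[bigram] += 1
    averages = {bg: totals[bg] / counts[bg] for bg in totals}
    return sorted(averages.items(), key=lambda kv: kv[1])  # weakest first

results = [(("TH", "IH", "NG", "K"), 62.0), (("S", "IH", "NG"), 88.0)]
for bigram, avg in summarize(results):
    print(bigram, round(avg, 1))
```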
-
Patent number: 11922943
Abstract: In general, this disclosure describes techniques for generating and evaluating automatic transcripts of audio recordings containing human speech. In some examples, a computing system is configured to: generate transcripts of a plurality of audio recordings; determine an error rate for each transcript by comparing the transcript to a reference transcript of the audio recording; receive, for each transcript, a subjective ranking selected from a plurality of subjective rank categories; determine, based on the error rates and subjective rankings, objective rank categories defined by error-rate ranges; and assign an objective ranking to a new machine-generated transcript of a new audio recording, based on the objective rank categories and an error rate of the new machine-generated transcript.
Type: Grant
Filed: January 26, 2021
Date of Patent: March 5, 2024
Assignee: Wells Fargo Bank, N.A.
Inventors: Yong Yi Bay, Yang Angelina Yang, Menglin Cao
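A minimal sketch of the ranking step: derive error-rate ranges for objective categories from transcripts that already carry subjective rankings, then rank a new transcript by its error rate alone. The boundary rule (midpoints between per-category mean error rates) is an assumption, not the patent's method.

```python
# Derive error-rate ranges from subjectively ranked transcripts.
from statistics import mean

def build_categories(rated):
    """rated: list of (error_rate, subjective_rank) pairs."""
    by_rank = {}
    for err, rank in rated:
        by_rank.setdefault(rank, []).append(err)
    ranks = sorted(by_rank, key=lambda r: mean(by_rank[r]))
    means = [mean(by_rank[r]) for r in ranks]
    bounds = [(a + b) / 2 for a, b in zip(means, means[1:])]
    return ranks, bounds

def assign_rank(error_rate, ranks, bounds):
    for rank, upper in zip(ranks, bounds):
        if error_rate <= upper:
            return rank
    return ranks[-1]

ranks, bounds = build_categories([(0.05, "good"), (0.12, "fair"), (0.30, "poor")])
print(assign_rank(0.09, ranks, bounds))  # -> "fair"
```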
-
Patent number: 11893983
Abstract: An approach for improving speech recognition is provided. A processor receives a new word to add to a prefix tree. A processor determines a bonus score for a first transition from a first node to a second node in the prefix tree on condition that the first transition is included in a path of at least one transition representing the new word. A processor determines a hypothesis score for a hypothesis that corresponds to a speech sequence based on the prefix tree, where the hypothesis score adds the bonus score to an initial hypothesis score. In response to a determination that the hypothesis score exceeds a threshold value, a processor generates an output text sequence for the speech sequence based on the hypothesis.
Type: Grant
Filed: June 23, 2021
Date of Patent: February 6, 2024
Assignee: International Business Machines Corporation
Inventors: Masayuki Suzuki, Gakuto Kurata
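A minimal sketch, not IBM's implementation: a character prefix tree in which every transition along a newly added word's path carries a bonus, and a hypothesis adds the bonuses it collects to its initial score. The 0.5 bonus, the scores, and the threshold are illustrative assumptions.

```python
class Node:
    def __init__(self):
        self.children = {}  # character -> Node
        self.bonus = 0.0    # bonus for the transition into this node

class PrefixTree:
    def __init__(self):
        self.root = Node()

    def add_word(self, word, bonus=0.5):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, Node())
            node.bonus = max(node.bonus, bonus)  # transition lies on the new word's path

    def bonus_for(self, text):
        total, node = 0.0, self.root
        for ch in text:
            if ch not in node.children:
                break
            node = node.children[ch]
            total += node.bonus
        return total

tree = PrefixTree()
tree.add_word("kurata")                 # new word to favor during decoding
initial_score = -4.2                    # assumed initial hypothesis score
hypothesis_score = initial_score + tree.bonus_for("kurata")
print(hypothesis_score > -2.0)          # emit the text only above a threshold
```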
-
Patent number: 11777957
Abstract: Disclosed is a method for detecting a malicious attack based on deep learning in a transportation cyber-physical system (TCPS), comprising: extracting original feature data of a malicious data flow and a normal data flow from a TCPS; cleaning and coding the original feature data; selecting key features from the feature data; cleaning and coding the key features to establish a deep learning model; and finally, inputting unknown behavior data to be identified into the deep learning model to identify whether the data is malicious data, thereby detecting a malicious attack. The present invention uses a deep learning method to extract and learn the behavior of a program in a TCPS, detects malicious attacks according to the deep learning result, and effectively identifies malicious attacks in the TCPS.
Type: Grant
Filed: December 4, 2019
Date of Patent: October 3, 2023
Assignee: HANGZHOU DIANZI UNIVERSITY
Inventors: Yuanfang Chen, Ting Wu, Hengli Yue, Chengnan Hu
-
Patent number: 11748559
Abstract: A conversational interface generation method, system, and computer program product that includes determining a conversational artifact for a computer program from a specification of the computer program, and generating a conversational interface for the computer program based on the conversational artifact included in the specification.
Type: Grant
Filed: March 24, 2021
Date of Patent: September 5, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Yara Rizk, Vatche Isahagian, Yasaman Khazaeni, Scott Boag, Falk Pollok
-
Patent number: 11676575
Abstract: A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as remote NLU data. The response data from the remote speech processing system may be compared to local NLU data to improve a speech processing model on the device. Thus, the device may perform supervised on-device learning based on the remote NLU data. The device may determine differences between the updated speech processing model and an original speech processing model received from the remote system, and may send data indicating these differences to the remote system. The remote system may aggregate data received from a plurality of devices and may generate an improved speech processing model.
Type: Grant
Filed: July 27, 2021
Date of Patent: June 13, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Ariya Rastrow, Rohit Prasad, Nikko Strom
-
Patent number: 11526671
Abstract: An example method for identifying a reading location in a text source as a user reads the text source aloud includes determining phoneme data of the text source, the text source comprising a sequence of words; receiving audio data comprising a spoken word associated with the text source; comparing, by a processing device, the phoneme data of the text source and phoneme data of the audio data; and identifying a location in the sequence of words based on the comparison of the phoneme data.
Type: Grant
Filed: September 4, 2018
Date of Patent: December 13, 2022
Assignee: Google LLC
Inventors: Chaitanya Gharpure, Evan Fisher, Eric Liu, Peng Yang, Emily Hou, Victoria Fang
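A minimal sketch of one way the comparison could work: slide the phonemes decoded from the audio over the phonemes of the text source and return the word whose span matches best. The toy lexicon stands in for a real grapheme-to-phoneme step; none of this is Google's implementation.

```python
LEXICON = {"the": ["DH", "AH"], "cat": ["K", "AE", "T"], "sat": ["S", "AE", "T"]}

def locate(words, audio_phonemes):
    flat, owner = [], []               # text phonemes and their word indexes
    for i, word in enumerate(words):
        for ph in LEXICON[word]:
            flat.append(ph)
            owner.append(i)
    n = len(audio_phonemes)
    best_start, best_hits = 0, -1
    for start in range(len(flat) - n + 1):
        hits = sum(a == b for a, b in zip(flat[start:start + n], audio_phonemes))
        if hits > best_hits:
            best_start, best_hits = start, hits
    return owner[best_start + n - 1]   # index of the word being read

print(locate(["the", "cat", "sat"], ["K", "AE", "T"]))  # -> 1 ("cat")
```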
-
Publication number: 20120004901
Abstract: Various embodiments of phonetic keys for the Japanese language are described herein. A Kana rule set is applied to Kana characters provided by a user. The Kana characters are defined in an alphabetic language based on the sound of the Kana characters. A full phonetic key is then generated based on the defined Kana characters. A replaced-vowel phonetic key is generated by replacing a vowel in the full phonetic key, and a no-vowel phonetic key is generated by removing the vowel in the full phonetic key. Kana records in a database are then processed to determine a relevant Kana record that has a phonetic key identical to at least one of the full phonetic key, the replaced-vowel phonetic key, and the no-vowel phonetic key. The relevant Kana records are then presented to the user.
Type: Application
Filed: June 30, 2010
Publication date: January 5, 2012
Inventor: Hozumi Nakano
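A minimal sketch of the three-key scheme, assuming the Kana rule set has already romanized the input; "@" is an arbitrary vowel placeholder. A record is relevant if it shares any of the three keys with the query.

```python
VOWELS = set("aeiou")

def phonetic_keys(romaji):
    full = romaji.lower()
    replaced = "".join("@" if c in VOWELS else c for c in full)   # replaced-vowel key
    no_vowel = "".join(c for c in full if c not in VOWELS)        # no-vowel key
    return {full, replaced, no_vowel}

def find_relevant(query, records):  # records: name -> precomputed key set
    keys = phonetic_keys(query)
    return [name for name, record_keys in records.items() if keys & record_keys]

records = {"Nakano": phonetic_keys("nakano"), "Nakeno": phonetic_keys("nakeno")}
print(find_relevant("nakuno", records))  # vowel confusion still matches both
```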
-
Publication number: 20110119051
Abstract: A phonetic variation model building apparatus, having a phoneme database for recording at least a standard phonetic model of a language and a plurality of non-standardized phonemes of the language, is provided. A phonetic variation identifier identifies a plurality of phonetic variations between the non-standardized phonemes and the standard phonetic model. A phonetic transformation calculator calculates a plurality of coefficients of a phonetic transformation function based on the phonetic variations and the phonetic transformation function. A phonetic variation model generator generates at least a phonetic variation model based on the standard phonetic model, the phonetic transformation function, and the coefficients thereof.
Type: Application
Filed: December 15, 2009
Publication date: May 19, 2011
Applicant: INSTITUTE FOR INFORMATION INDUSTRY
Inventors: Huan-Chung Li, Chung-Hsien Wu, Han-Ping Shen, Chun-Kai Wang, Chia-Hsin Hsieh
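A minimal sketch, assuming the phonetic transformation function is affine (y = a·x + b) over scalar phoneme features, with the coefficients fit by least squares from observed variations. The real transform function and features are richer than this illustration.

```python
import numpy as np

standard = np.array([1.0, 2.0, 3.0, 4.0])  # standard phonetic model features
variant = np.array([1.9, 4.1, 5.8, 8.2])   # observed non-standardized phonemes

# Fit the coefficients of the assumed transformation y = a*x + b.
A = np.column_stack([standard, np.ones_like(standard)])
(a, b), *_ = np.linalg.lstsq(A, variant, rcond=None)
print(f"transform coefficients: a={a:.2f}, b={b:.2f}")

# Generate a phonetic variation model from the standard model.
variation_model = a * standard + b
print(variation_model)
```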
-
Publication number: 20110040774
Abstract: According to one embodiment, searching media includes receiving a search query comprising search terms. At least one search term is expanded to yield a set of conceptually equivalent terms. The set of conceptually equivalent terms is converted to a set of search phonemes. Files that record phonemes are searched according to the set of search phonemes. A file that includes a phoneme that matches at least one search phoneme is selected and output to a client.
Type: Application
Filed: August 14, 2009
Publication date: February 17, 2011
Applicant: Raytheon Company
Inventors: Bruce E. Peoples, Michael R. Johnson, Kristopher D. Barr
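A minimal sketch of the pipeline: expand a term to conceptually equivalent terms, convert each to phonemes, and select files whose recorded phoneme streams contain any search phoneme sequence. The thesaurus and lexicon are made-up stand-ins.

```python
THESAURUS = {"car": ["car", "automobile"]}
LEXICON = {"car": ["K", "AA", "R"],
           "automobile": ["AO", "T", "AH", "M", "OW", "B", "IY", "L"]}

def contains(stream, needle):
    return any(stream[i:i + len(needle)] == needle
               for i in range(len(stream) - len(needle) + 1))

def search(term, files):  # files: name -> recorded phoneme list
    needles = [LEXICON[t] for t in THESAURUS.get(term, [term])]
    return [name for name, stream in files.items()
            if any(contains(stream, n) for n in needles)]

files = {"clip1.wav": ["DH", "AH", "K", "AA", "R"], "clip2.wav": ["HH", "AY"]}
print(search("car", files))  # -> ['clip1.wav']
```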
-
Patent number: 7844457
Abstract: Methods are disclosed for automatic accent labeling without manually labeled data. The methods are designed to exploit accent distribution between function and content words.
Type: Grant
Filed: February 20, 2007
Date of Patent: November 30, 2010
Assignee: Microsoft Corporation
Inventors: YiNing Chen, Frank Kao-ping Soong, Min Chu
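A minimal sketch of the idea the abstract hints at: function words are usually unaccented and content words accented, so a function-word list alone yields rough accent labels with no manual annotation. The word list here is a tiny illustrative subset.

```python
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "is", "it"}

def label_accents(words):
    return [(w, "unaccented" if w.lower() in FUNCTION_WORDS else "accented")
            for w in words]

print(label_accents("the cat sat on a mat".split()))
```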
-
Patent number: 7831911
Abstract: A spell checking system includes a letter spelling engine. The letter spelling engine is configured to select a plurality of candidate letter target strings that closely match a misspelled source string. The spell checking system includes a phoneme spelling engine. The phoneme spelling engine is configured to select a plurality of candidate phoneme target strings that closely match the misspelled source string. A ranker module is configured to combine the candidate letter target strings and the candidate phoneme target strings into a combined list of candidate target strings. The ranker module is also configured to rank the list of candidate target strings to provide a list of best candidate target strings for the misspelled source string.
Type: Grant
Filed: March 8, 2006
Date of Patent: November 9, 2010
Assignee: Microsoft Corporation
Inventor: William D. Ramsey
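A minimal sketch of the ranker module: merge candidates from a letter engine (string similarity here) and a phoneme engine (a toy sound-alike table), then rank by a weighted combined score. The engines, weights, and vocabulary are all illustrative stand-ins.

```python
from difflib import SequenceMatcher

def letter_candidates(source, vocab):
    return {w: SequenceMatcher(None, source, w).ratio() for w in vocab}

def phoneme_candidates(source, sound_alike):
    return {w: 1.0 for w in sound_alike.get(source, [])}

def rank_candidates(source, vocab, sound_alike, w_letter=0.6, w_phoneme=0.4):
    letters = letter_candidates(source, vocab)
    phonemes = phoneme_candidates(source, sound_alike)
    combined = {w: w_letter * letters.get(w, 0.0) + w_phoneme * phonemes.get(w, 0.0)
                for w in set(letters) | set(phonemes)}
    return sorted(combined, key=combined.get, reverse=True)

vocab = ["physics", "physique", "fizzy"]
sound_alike = {"fisiks": ["physics"]}
print(rank_candidates("fisiks", vocab, sound_alike))
```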
-
Patent number: 7761297
Abstract: A system for multi-lingual speech recognition. The inventive system includes a speech modeling engine, a speech search engine, and a decision reaction engine. The speech modeling engine receives a mixed multi-lingual speech signal and transforms it into speech features. The speech search engine locates and compares candidate data sets. The decision reaction engine selects resulting speech models from the candidate speech models and generates a speech command.
Type: Grant
Filed: February 18, 2004
Date of Patent: July 20, 2010
Assignee: Delta Electronics, Inc.
Inventor: Yun-Wen Lee
-
Publication number: 20100121643
Abstract: The technology disclosed relates to a system and method for fast, accurate, and parallelizable speech search, called Crystal Decoder. It is particularly useful for search applications, as opposed to dictation. It can achieve both speed and accuracy, without sacrificing one for the other. It can search different variations of records in the reference database without a significant increase in elapsed processing time. Even the main decoding part can be parallelized as the number of words increases, to maintain a fast response time.
Type: Application
Filed: November 2, 2009
Publication date: May 13, 2010
Applicant: Melodis Corporation
Inventors: Keyvan Mohajer, Seyed Majid Emami, Jon Grossman, Joe Kyaw Soe Aung, Sina Sohangir
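A minimal sketch of why this kind of record search parallelizes: each reference record is scored independently against the decoded query, so records can be partitioned across workers. String similarity stands in for the actual decoder scoring; all data here is illustrative.

```python
from concurrent.futures import ProcessPoolExecutor
from difflib import SequenceMatcher
from functools import partial

def score(query, record):
    return record, SequenceMatcher(None, query, record).ratio()

def search(query, records, workers=4):
    # Records are independent, so scoring fans out across processes.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(partial(score, query), records))
    return max(scored, key=lambda pair: pair[1])

if __name__ == "__main__":
    refs = ["hotel california", "california dreaming", "hey jude"]
    print(search("california dreamin", refs))
```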
-
Publication number: 20090204401
Abstract: Provided is a speech translation system for receiving an input of original speech in a first language, translating the input content into a second language, and outputting the result of the translation as speech, including: an input processing part for receiving the input of the original speech and generating, from the original speech, an original language text and prosodic information of the original speech; a translation part for generating a translated sentence by translating the first language into the second language; prosodic feature transform information including associated prosodic information between the first language and the second language; a prosodic feature transform part for transforming the prosodic information of the original speech into prosodic information of the speech to be output; and a speech synthesis part for outputting the translated sentence as a speech synthesized based on the prosodic information of the speech to be output.
Type: Application
Filed: November 13, 2008
Publication date: August 13, 2009
Inventor: Shehui Bu
-
Publication number: 20090150153
Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences, and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
Type: Application
Filed: December 7, 2007
Publication date: June 11, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
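A minimal sketch of the confidence filter applied before retraining: keep only recognized (audio, grapheme-label) pairs whose confidence clears a threshold. The record fields and the 0.8 cutoff are assumptions.

```python
def filter_training_pairs(recognitions, threshold=0.8):
    """recognitions: dicts with 'audio', 'graphemes', and 'confidence' keys."""
    return [(r["audio"], r["graphemes"]) for r in recognitions
            if r["confidence"] >= threshold]

recognitions = [
    {"audio": "utt1.wav", "graphemes": "smith", "confidence": 0.93},
    {"audio": "utt2.wav", "graphemes": "smyth", "confidence": 0.41},
]
print(filter_training_pairs(recognitions))  # low-confidence sample is dropped
```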
-
Publication number: 20080162129
Abstract: One provides (101) a plurality of frames of sampled audio content and then processes (102) that plurality of frames using a speech recognition search process that comprises, at least in part, searching for at least two of state boundaries, subword boundaries, and word boundaries using different search resolutions.
Type: Application
Filed: December 29, 2006
Publication date: July 3, 2008
Applicant: MOTOROLA, INC.
Inventor: Yan Ming Cheng
-
Publication number: 20080120091
Abstract: A real-time open-domain speech translation system for simultaneous translation of a spoken presentation that is a spoken monologue comprising one of a lecture, a speech, a presentation, a colloquium, and a seminar. The system includes an automatic speech recognition unit configured for accepting sound comprising the spoken presentation in a first language and for continuously creating word hypotheses, and a machine translation unit that receives the hypotheses, wherein the machine translation unit outputs a translation of the spoken presentation into a second language.
Type: Application
Filed: October 26, 2007
Publication date: May 22, 2008
Inventors: Alexander Waibel, Christian Fuegen
-
Patent number: RE40458
Abstract: Parsing routines extract from a conventional pronunciation dictionary an entry, which includes a dictionary word and dictionary phonemes representing the pronunciation of the dictionary word. A correspondence table is used to compress the pronunciation dictionary. The correspondence table includes correspondence sets for a particular language, each set having a correspondence text entry, a correspondence phoneme entry representing the pronunciation of the correspondence text entry, and a unique correspondence set identifying symbol. A matching system compares a dictionary entry with the correspondence sets and replaces the dictionary entry with the symbols representing the best matches. In the absence of a match, symbols representing silent text or unmatched phonemes can be used. The correspondence symbols representing the best matches provide compressed pronunciation dictionary entries. The matching system also generates decoder code sets for subsequently translating the symbol sets.
Type: Grant
Filed: January 13, 2003
Date of Patent: August 12, 2008
Assignee: Apple Inc.
Inventor: Timothy Fredenburg
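A minimal sketch of the compression step: greedily match (text, phoneme) correspondence sets against a dictionary entry from the left and emit each set's identifying symbol. The tiny correspondence table is illustrative; the patented table, silent-text symbols, and unmatched-phoneme symbols are richer than this.

```python
# symbol -> (correspondence text entry, correspondence phoneme entry)
CORRESPONDENCE = {"1": ("ph", "F"), "2": ("o", "OW"), "3": ("ne", "N"), "4": ("to", "T AH")}

def compress(word, phonemes):
    symbols, w, p = [], word, phonemes
    while w:
        # Try longer text fragments first (best match wins).
        for sym, (text, phon) in sorted(CORRESPONDENCE.items(),
                                        key=lambda kv: -len(kv[1][0])):
            if w.startswith(text) and (p == phon or p.startswith(phon + " ")):
                symbols.append(sym)
                w, p = w[len(text):], p[len(phon):].lstrip()
                break
        else:
            raise ValueError(f"no correspondence set matches {w!r}")
    return "".join(symbols)

print(compress("phone", "F OW N"))  # -> "123"
```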