Patents Examined by Paras D Shah
  • Patent number: 11600273
    Abstract: The speech processing apparatus 100 includes an air microphone speech recognition unit 101 which recognizes speech from an air microphone 200 acquiring speech through air, a wearable microphone speech recognition unit 102 which recognizes speech from a wearable microphone 300, a sensing unit 103 which measures environmental conditions, a weight decision unit 104 which calculates the weights for recognition results of the air microphone speech recognition unit 101 and the wearable microphone speech recognition unit 102 on the basis of the environmental conditions, and a combination unit 105 which combines the recognition results outputted from the air microphone speech recognition unit 101 and the wearable microphone speech recognition unit 102, using the weights.
    Type: Grant
    Filed: February 14, 2018
    Date of Patent: March 7, 2023
    Assignee: NEC CORPORATION
    Inventors: Qiongqiong Wang, Takafumi Koshinaka
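The environment-weighted combination described in this abstract can be sketched in a few lines. Here a hypothetical SNR reading stands in for the sensing unit's environmental measurement, and the linear weight rule and the (text, confidence) hypothesis format are invented for illustration, not taken from the patent:

```python
def decide_weight(snr_db, lo=0.0, hi=30.0):
    """Map ambient SNR to the air-microphone weight in [0, 1].

    High SNR (quiet room) -> trust the air microphone; low SNR -> trust
    the wearable microphone, which is less affected by ambient noise.
    """
    w = (snr_db - lo) / (hi - lo)
    return max(0.0, min(1.0, w))

def combine(air_hyps, wear_hyps, snr_db):
    """Combine two lists of (text, confidence) hypotheses with SNR-based weights."""
    w_air = decide_weight(snr_db)
    scores = {}
    for text, conf in air_hyps:
        scores[text] = scores.get(text, 0.0) + w_air * conf
    for text, conf in wear_hyps:
        scores[text] = scores.get(text, 0.0) + (1.0 - w_air) * conf
    return max(scores, key=scores.get)

# In a noisy environment (low SNR) the wearable microphone's result wins.
best = combine([("turn off", 0.9)], [("turn on", 0.8)], snr_db=3.0)
```

A production system would weight at the word or lattice level rather than over whole-utterance hypotheses; the scalar weighting here only illustrates the decision rule.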
  • Patent number: 11587551
    Abstract: An illustrative embodiment includes a method for training an end-to-end (E2E) spoken language understanding (SLU) system. The method includes receiving a training corpus comprising a set of text classified using one or more sets of semantic labels but unpaired with speech and using the set of unpaired text to train the E2E SLU system to classify speech using at least one of the one or more sets of semantic labels. The method may include training a text-to-intent model using the set of unpaired text; and training a speech-to-intent model using the text-to-intent model. Alternatively or additionally, the method may include using a text-to-speech (TTS) system to generate synthetic speech from the unpaired text; and training the E2E SLU system using the synthetic speech.
    Type: Grant
    Filed: April 7, 2020
    Date of Patent: February 21, 2023
    Assignee: International Business Machines Corporation
    Inventors: Hong-Kwang Jeff Kuo, Yinghui Huang, Samuel Thomas, Kartik Audhkhasi, Michael Alan Picheny
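The unpaired-text idea above can be sketched with a stub TTS: synthetic "speech" paired with each text's semantic label yields training pairs for a speech-to-intent model with no real recordings. The TTS stand-in and the keyword-lookup "model" are illustrative placeholders, not the patent's networks:

```python
def tts(text):
    """Stand-in for a text-to-speech system; returns fake 'speech features'."""
    return f"<audio:{text}>"

def train_speech_to_intent(unpaired_text):
    """unpaired_text: list of (text, intent) with no paired recordings.

    Synthetic speech generated from each text is paired with the text's
    label, giving supervised (speech, intent) training data.
    """
    model = {}
    for text, intent in unpaired_text:
        model[tts(text)] = intent
    return model

corpus = [("play some jazz", "PlayMusic"), ("what's the weather", "GetWeather")]
model = train_speech_to_intent(corpus)
intent = model[tts("play some jazz")]
```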
  • Patent number: 11580301
    Abstract: A hybrid entity recognition system and accompanying method identify composite entities based on machine learning. An input sentence is received and is preprocessed to remove extraneous information, perform spelling correction, and perform grammar correction to generate a cleaned input sentence. A POS tagger tags parts of speech of the cleaned input sentence. A rules based entity recognizer module identifies first level entities in the cleaned input sentence. The cleaned input sentence is converted and translated into numeric vectors. Basic and composite entities are extracted from the cleaned input sentence using the numeric vectors.
    Type: Grant
    Filed: December 19, 2019
    Date of Patent: February 14, 2023
    Assignee: Genpact Luxembourg S.à r.l. II
    Inventors: Ravi Narayan, Sunil Kumar Khokhar, Vikas Mehta, Chirag Srivastava
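The two-stage pipeline in this abstract, rule-based first-level entities followed by composite extraction, can be illustrated with toy rules. The regexes, entity types, and the proximity rule for building composites are invented for this sketch:

```python
import re

RULES = {
    "AMOUNT": re.compile(r"\$\d+(?:\.\d{2})?"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def first_level_entities(sentence):
    """Rule-based recognizer: (label, text, char_offset) per match."""
    found = []
    for label, pattern in RULES.items():
        for m in pattern.finditer(sentence):
            found.append((label, m.group(), m.start()))
    return sorted(found, key=lambda e: e[2])

def composite_entities(entities, max_gap=20):
    """Pair an AMOUNT with a nearby DATE into a composite PAYMENT entity."""
    composites = []
    for i, (la, va, pa) in enumerate(entities):
        for lb, vb, pb in entities[i + 1:]:
            if {la, lb} == {"AMOUNT", "DATE"} and abs(pb - pa) <= max_gap:
                composites.append(("PAYMENT", (va, vb)))
    return composites

ents = first_level_entities("Paid $42.50 on 2023-02-14 by card.")
comps = composite_entities(ents)
```

The patent's system uses POS tagging and numeric vectors learned by a model for the second stage; the proximity heuristic here only shows the basic-to-composite shape of the pipeline.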
  • Patent number: 11580975
    Abstract: Embodiments described herein provide a dynamic topic tracking mechanism that tracks how conversation topics change from one utterance to another and uses the tracking information to rank candidate responses. A pre-trained language model may be used for response selection in multi-party conversations in two steps: (1) topic-based pre-training to embed topic information into the language model with self-supervised learning, and (2) multi-task learning on the pre-trained model that jointly trains response selection with dynamic topic prediction and disentanglement tasks.
    Type: Grant
    Filed: September 8, 2020
    Date of Patent: February 14, 2023
    Assignee: salesforce.com, inc.
    Inventors: Weishi Wang, Shafiq Rayhan Joty, Chu Hong Hoi

  • Patent number: 11580979
    Abstract: In some embodiments, methods and systems for pushing audiovisual playlists based on a text-attentional convolutional neural network include a local voice interactive terminal, a dialog system server and a playlist recommendation engine, where the dialog system server and the playlist recommendation engine are respectively connected to the local voice interactive terminal. In some embodiments, the local voice interactive terminal includes a microphone array, a host computer connected to the microphone array, and a voice synthesis chip board connected to the microphone array. In some embodiments, the playlist recommendation engine obtains rating data based on a rating predictor constructed by the neural network; the host computer parses the data into recommended playlist information; and the voice terminal synthesizes the results and pushes them to a user in the form of voice.
    Type: Grant
    Filed: January 5, 2021
    Date of Patent: February 14, 2023
    Assignee: Chongqing University
    Inventors: Yongduan Song, Junfeng Lai, Xin Zhou
  • Patent number: 11574628
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: February 7, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister
  • Patent number: 11568885
    Abstract: A speech-based system utilizes a speech interface device located in the home of a user. The system may interact with different users based on different user profiles. The system may include messaging services that generate and/or provide messages to the user through the speech interface device. The speech interface device may have indicators that are capable of being illuminated in different colors. To notify a user regarding the currently active user profile, each user profile is associated with a different color and the color of the active profile is displayed on the speech interface device when the user is interacting with the system. To notify the user regarding awaiting messages, different types of messages are associated with different colors and the colors of the message types of waiting messages are displayed on the speech interface whenever the user is not interacting with the system.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: January 31, 2023
    Assignee: Amazon Technologies, Inc.
    Inventor: Ty Loren Carlson
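The indicator behaviour described above reduces to a small lookup: during interaction the light shows the active profile's colour; when idle it shows the colours of the waiting message types. The profiles, message types, and colour assignments below are made-up examples:

```python
PROFILE_COLORS = {"alice": "blue", "bob": "green"}
MESSAGE_COLORS = {"voicemail": "yellow", "reminder": "orange"}

def indicator_colors(active_profile, waiting_message_types, interacting):
    """Colours the speech interface device should display right now."""
    if interacting:
        # Interaction in progress: show the active profile's colour.
        return [PROFILE_COLORS[active_profile]]
    # Idle: show one colour per distinct waiting message type.
    return sorted({MESSAGE_COLORS[t] for t in waiting_message_types})

during = indicator_colors("alice", ["voicemail"], interacting=True)
idle = indicator_colors("alice", ["voicemail", "reminder"], interacting=False)
```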
  • Patent number: 11562819
    Abstract: The present invention provides methods and systems for determining the behavioural index of a user. The present invention involves analyzing the user's behaviour and linguistic parameters using a smart wearable device. Based on this analysis, the user's behavioural index is determined, updated, and communicated to the user.
    Type: Grant
    Filed: March 1, 2019
    Date of Patent: January 24, 2023
    Assignee: KAHA PTE. LTD.
    Inventor: Sudheendra Shantharam
  • Patent number: 11545152
    Abstract: In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least components of: a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines; where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and the accuracy of automatic speech transcription and gen
    Type: Grant
    Filed: January 20, 2021
    Date of Patent: January 3, 2023
    Assignee: GREEN KEY TECHNOLOGIES, INC.
    Inventors: Tejas Shastry, Matthew Goldey, Svyat Vergun
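The routing-and-accept logic in this abstract can be sketched as follows. The "domain" tag used for routing, the stand-in engine callables, and the confidence threshold are all illustrative assumptions, not the patent's recognition model specification:

```python
def route(segment):
    """Pick the primary engine for a segment; keyed on a toy 'domain' tag."""
    return segment.get("domain", "general")

def transcribe(segments, engines, accept_threshold=0.85):
    transcript = []
    for seg in segments:
        primary = route(seg)
        text, conf = engines[primary](seg)
        if conf < accept_threshold:
            # Only if the primary hypothesis is weak do we pay for other
            # engines; accepting early is what saves compute.
            for name, engine in engines.items():
                if name == primary:
                    continue
                alt_text, alt_conf = engine(seg)
                if alt_conf > conf:
                    text, conf = alt_text, alt_conf
        transcript.append(text)
    return " ".join(transcript)

engines = {
    "general": lambda s: (s["audio"].lower(), 0.9),
    "finance": lambda s: (s["audio"].upper(), 0.95),
}
out = transcribe([{"audio": "Buy", "domain": "finance"}, {"audio": "NOW"}], engines)
```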
  • Patent number: 11538489
    Abstract: In general, techniques are described by which to correlate scene-based audio data for psychoacoustic audio coding. A device comprising a memory and one or more processors may be configured to perform the techniques. The memory may store a bitstream including a plurality of encoded correlated components of a soundfield represented by scene-based audio data. The one or more processors may perform psychoacoustic audio decoding with respect to one or more of the plurality of encoded correlated components to obtain a plurality of correlated components, and obtain, from the bitstream, an indication representative of how the one or more of the plurality of correlated components were reordered in the bitstream. The one or more processors may reorder, based on the indication, the plurality of correlated components to obtain a plurality of reordered components, and reconstruct, based on the plurality of reordered components, the scene-based audio data.
    Type: Grant
    Filed: June 22, 2020
    Date of Patent: December 27, 2022
    Assignee: Qualcomm Incorporated
    Inventors: Ferdinando Olivieri, Taher Shahbazi Mirzahasanloo, Nils Günther Peters
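The reorder indication described here can be shown with a minimal encoder/decoder pair: the encoder reorders components (by energy, say) and writes the permutation into the bitstream; the decoder inverts it. The field names and the energy criterion are illustrative, not Qualcomm's format:

```python
def encode(components):
    """Reorder components by descending 'energy' and record the permutation."""
    order = sorted(range(len(components)), key=lambda i: -components[i]["energy"])
    payload = [components[i] for i in order]
    return {"indication": order, "components": payload}

def decode(bitstream):
    """Restore the original component order using the indication."""
    restored = [None] * len(bitstream["components"])
    for pos, original_index in enumerate(bitstream["indication"]):
        restored[original_index] = bitstream["components"][pos]
    return restored

scene = [{"id": 0, "energy": 0.2}, {"id": 1, "energy": 0.9}, {"id": 2, "energy": 0.5}]
restored = decode(encode(scene))
```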
  • Patent number: 11538464
    Abstract: The disclosure includes using dilation of speech content from a separated audio input for speech recognition. An audio input from a speaker and predicted changes for the audio input based on an external noise are received at a CNN (Convolutional Neural Network). In the CNN, diarization is applied to the audio input to predict how a dilation of speech content from the speaker changes the audio input to generate a CNN output. A resulting dilation is determined from the CNN output. A word error rate is determined for the dilated CNN output to determine an accuracy for speech to text outputs. An adjustment parameter is set to change a range of the dilation based on the word error rate, and the resulting dilation of the CNN output is adjusted based on the adjustment parameter to reduce the word error rate.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: December 27, 2022
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Corey B. Shelton, Stephen C. Hammer, Shikhar Kwatra
  • Patent number: 11538467
    Abstract: Devices and techniques are generally described for calibrating noise for natural language data modification. In various examples, first data representing a natural language input may be identified. A first vector representation of a first word of the first data may be determined. Sensitivity data may be determined for the first vector representation based at least in part on a first density of one or more vector representations adjacent to the first vector representation in an embedding space. In some examples, a first noise vector may be determined based at least in part on the sensitivity data. A first modified vector representation may be generated by adding the first noise vector to the first vector representation. A second word may be determined based at least in part on the first modified vector representation. Modified first data may be generated by replacing the first word with the second word.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: December 27, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Oluwaseyi Oluwafemi Feyisetan
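The density-calibrated noise idea can be sketched in a toy 2-D embedding space: words in dense neighbourhoods need less noise to land on a different word, so the noise scale shrinks with local density. The vocabulary, radius, and the particular scale rule below are invented for illustration, not the patent's calibration:

```python
import math
import random

VOCAB = {
    "good": (0.0, 0.0),
    "great": (0.2, 0.1),
    "fine": (0.1, -0.2),
    "terrible": (5.0, 5.0),
}

def density(vec, radius=1.0):
    """Count vocabulary vectors within `radius` of vec (excluding vec itself)."""
    return sum(0 < math.dist(vec, other) <= radius for other in VOCAB.values())

def perturb(word, rng, base_scale=1.0):
    vec = VOCAB[word]
    sensitivity = base_scale / (1 + density(vec))   # denser -> smaller noise
    noisy = (vec[0] + rng.gauss(0, sensitivity), vec[1] + rng.gauss(0, sensitivity))
    # Replace the word with the vocabulary item nearest the noisy vector.
    return min(VOCAB, key=lambda w: math.dist(VOCAB[w], noisy))

rng = random.Random(0)
replacement = perturb("good", rng)
```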
  • Patent number: 11538456
    Abstract: An audio file processing method is provided for an electronic device. The method includes extracting at least one audio segment from a first audio file, recognizing at least one to-be-replaced audio segment representing a target role from the at least one audio segment, and determining time frame information of each to-be-replaced audio segment in the first audio file. The method also includes obtaining to-be-dubbed audio data for each to-be-replaced audio segment, and replacing data in the to-be-replaced audio segment with the to-be-dubbed audio data according to the time frame information, to obtain a second audio file. The at least one to-be-replaced audio segment is divided from the at least one audio segment based on a structure and a word count in a sentence corresponding to each to-be-replaced audio segment.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: December 27, 2022
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventor: Chunjiang Lai
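The replacement step in this abstract is essentially a splice by time frame. In the toy sketch below, audio is a list of samples and each to-be-replaced segment is a (start, end) index span plus the dubbed samples that go in its place; real audio handling would decode and re-encode a container format:

```python
def replace_segments(audio, replacements):
    """Build a new audio sequence with each (start, end) span replaced.

    `replacements` is a list of (start, end, dub_samples), assumed
    non-overlapping and sorted by start time.
    """
    out = []
    cursor = 0
    for start, end, dub in replacements:
        out.extend(audio[cursor:start])  # keep untouched audio
        out.extend(dub)                  # splice in the dubbed take
        cursor = end
    out.extend(audio[cursor:])
    return out

original = [0, 1, 2, 3, 4, 5, 6, 7]
dubbed = replace_segments(original, [(2, 5, [9, 9])])
```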
  • Patent number: 11526804
    Abstract: A document analysis device that includes an artificial intelligence (AI) processing engine configured to receive a set of input sentences, to select a first sentence from the set of input sentences, and to compare the first sentence to previously classified sentences. The AI processing engine is further configured to compute similarity scores between the first sentence and the previously classified sentences, to identify a second sentence from the previously classified sentences with a similarity score greater than or equal to a similarity score threshold value, to identify a sentence type that is associated with the second sentence, and to associate the first sentence with the sentence type. The AI processing engine is further configured to add the first sentence to the set of training data for the machine learning model and to train the machine learning model using the set of training data.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: December 13, 2022
    Assignee: Bank of America Corporation
    Inventors: Kishore Gopalan, Satish Chaduvu, Thomas J. Kuchcicki
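The nearest-match classification loop above can be sketched with a bag-of-words Jaccard similarity standing in for the patent's (unspecified) similarity score. If the best match clears the threshold, the new sentence inherits its type and joins the training data, mirroring the abstract's final step:

```python
def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def classify(sentence, labeled, threshold=0.5):
    """labeled: list of (sentence, sentence_type). Returns a type or None."""
    best_type, best_score = None, 0.0
    for ref, ref_type in labeled:
        score = jaccard(sentence, ref)
        if score > best_score:
            best_type, best_score = ref_type, score
    if best_score >= threshold:
        labeled.append((sentence, best_type))  # grow the training data
        return best_type
    return None

training = [("the loan is approved", "DECISION"), ("please send the form", "REQUEST")]
label = classify("the loan is approved today", training)
```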
  • Patent number: 11508366
    Abstract: A method, an apparatus and a device for converting a whispered speech, and a readable storage medium are provided. The method is implemented based on the whispered speech converting model. The whispered speech converting model is trained in advance by using recognition results and whispered speech training acoustic features of whispered speech training data as samples and using normal speech acoustic features of normal speech data parallel to the whispered speech training data as sample labels. A whispered speech acoustic feature and a preliminary recognition result of whispered speech data are acquired, then the whispered speech acoustic feature and the preliminary recognition result are inputted into a preset whispered speech converting model to acquire a normal speech acoustic feature outputted by the model. In this way, the whispered speech can be converted to a normal speech.
    Type: Grant
    Filed: June 15, 2018
    Date of Patent: November 22, 2022
    Assignee: IFLYTEK CO., LTD.
    Inventors: Jia Pan, Cong Liu, Haikun Wang, Zhiguo Wang, Guoping Hu
  • Patent number: 11487945
    Abstract: Present embodiments include an agent automation framework having a similarity scoring subsystem that performs meaning representation similarity scoring to facilitate extraction of artifacts to address an utterance. The similarity scoring subsystem identifies a CCG form of an utterance-based meaning representation and queries a database to retrieve a comparison function list that enables quantifications of similarities between the meaning representation and candidates within a search space. The comparison functions enable the similarity scoring subsystem to perform computationally-cheapest and/or most efficient comparisons before other comparisons. The similarity scoring subsystem may determine an initial similarity score between the particular meaning representation and the candidates of the search space, then prune non-similar candidates from the search space.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: November 1, 2022
    Assignee: ServiceNow, Inc.
    Inventors: Edwin Sapugay, Jonggun Park, Anne Katharine Heaton-Dunlap
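The cost-ordered comparison idea, run cheap comparisons over all candidates first and spend the expensive comparison only on survivors, can be sketched like this. Both comparison functions and the pruning floor are invented stand-ins, not ServiceNow's CCG-form comparisons:

```python
def cheap_overlap(query, candidate):
    """Cheap first-pass score: fraction of query words found in the candidate."""
    qa, ca = set(query.split()), set(candidate.split())
    return len(qa & ca) / max(len(qa), 1)

def expensive_match(query, candidate):
    """Stand-in for a heavyweight structural comparison."""
    return 1.0 if query == candidate else 0.8 * cheap_overlap(query, candidate)

def score_candidates(query, candidates, floor=0.3):
    # Phase 1: the cheapest comparison prunes non-similar candidates.
    survivors = [c for c in candidates if cheap_overlap(query, c) >= floor]
    # Phase 2: the expensive comparison runs only on survivors.
    return sorted(((expensive_match(query, c), c) for c in survivors), reverse=True)

ranked = score_candidates(
    "reset my password",
    ["reset my password", "order a pizza", "change my password"],
)
```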
  • Patent number: 11488608
    Abstract: A computer-implemented technique is presented for profiling an unknown speaker. A DNN-based frame selection allows the system to select the relevant frames necessary to provide a reliable speaker characteristic estimation. A frame selection module selects those frames that contain relevant information for estimating a given speaker characteristic and thereby contributes to the accuracy and the low latency of the system. Real-time speaker characteristics estimation allows the system to estimate the speaker characteristics from a speech segment of accumulated selected frames at any given time. The frame level processing contributes to the low latency as it is not necessary to wait for the whole speech utterance to predict a speaker characteristic but rather a speaker characteristic is estimated from only a few reliable frames. Different stopping criteria also contribute to the accuracy and the low latency of the system.
    Type: Grant
    Filed: December 16, 2019
    Date of Patent: November 1, 2022
    Assignee: SIGMA TECHNOLOGIES GLOBAL LLC
    Inventors: Juan Manuel Perero-Codosero, Fernando Espinoza-Cuadros
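The reliability-gated frame accumulation with an early-stopping criterion can be sketched as below. The per-frame reliability scores (which the patent obtains from a DNN-based selector), the floor, and the "three good frames" criterion are illustrative assumptions:

```python
def estimate_characteristic(frames, reliability_floor=0.7, needed=3):
    """frames: iterable of (value, reliability). Returns (estimate, n_used).

    Frames are consumed in stream order; an estimate is emitted as soon as
    enough reliable frames accumulate, without waiting for the utterance end.
    """
    selected = []
    for value, reliability in frames:
        if reliability >= reliability_floor:
            selected.append(value)
        if len(selected) >= needed:          # early-stopping criterion
            break
    if len(selected) < needed:
        return None, len(selected)
    return sum(selected) / len(selected), len(selected)

# Toy age-like values per frame; the noisy frame (99, 0.2) is skipped and
# the estimate is ready after three reliable frames.
stream = [(30, 0.9), (99, 0.2), (32, 0.8), (31, 0.95), (50, 0.9)]
estimate, used = estimate_characteristic(stream)
```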
  • Patent number: 11475905
    Abstract: An apparatus for processing an audio signal includes a configurable first audio signal processor for processing the audio signal in accordance with different configuration settings to obtain a processed audio signal, wherein the apparatus is adapted so that different configuration settings result in different sampling rates of the processed audio signal. The apparatus furthermore includes an analysis filter bank having a first number of analysis filter bank channels, a synthesis filter bank having a second number of synthesis filter bank channels, a second audio processor being adapted to receive and process an audio signal having a predetermined sampling rate, and a controller for controlling the first number of analysis filter bank channels or the second number of synthesis filter bank channels in accordance with a configuration setting.
    Type: Grant
    Filed: May 5, 2022
    Date of Patent: October 18, 2022
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Markus Lohwasser, Manuel Jander, Max Neuendorf, Ralf Geiger, Markus Schnell, Matthias Hildenbrand, Tobias Chalupka
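The controller's job can be shown with a toy rule: the configuration fixes the first processor's output sampling rate, and the controller picks channel counts so the resampled signal matches the second processor's fixed rate. The rates and the integer-ratio rule below are illustrative, not the patent's actual scheme:

```python
FIXED_PROCESSOR_RATE = 48_000  # rate the second audio processor expects

def channel_counts(config_rate, base_channels=64):
    """Scale analysis channels with the configured rate; synthesis stays fixed.

    Returns (analysis_channels, synthesis_channels).
    """
    if FIXED_PROCESSOR_RATE % config_rate:
        raise ValueError("configured rate must divide the fixed rate")
    ratio = FIXED_PROCESSOR_RATE // config_rate
    return base_channels // ratio, base_channels

analysis, synthesis = channel_counts(24_000)
```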
  • Patent number: 11475881
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
    Type: Grant
    Filed: July 17, 2020
    Date of Patent: October 18, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
  • Patent number: 11475906
    Abstract: An apparatus for processing an audio signal includes a configurable first audio signal processor for processing the audio signal in accordance with different configuration settings to obtain a processed audio signal, wherein the apparatus is adapted so that different configuration settings result in different sampling rates of the processed audio signal. The apparatus furthermore includes an analysis filter bank having a first number of analysis filter bank channels, a synthesis filter bank having a second number of synthesis filter bank channels, a second audio processor being adapted to receive and process an audio signal having a predetermined sampling rate, and a controller for controlling the first number of analysis filter bank channels or the second number of synthesis filter bank channels in accordance with a configuration setting.
    Type: Grant
    Filed: May 5, 2022
    Date of Patent: October 18, 2022
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Markus Lohwasser, Manuel Jander, Max Neuendorf, Ralf Geiger, Markus Schnell, Matthias Hildenbrand, Tobias Chalupka