Patents Examined by Paras D Shah
  • Patent number: 11600273
    Abstract: The speech processing apparatus 100 includes an air microphone speech recognition unit 101 which recognizes speech from an air microphone 200 acquiring speech through air, a wearable microphone speech recognition unit 102 which recognizes speech from a wearable microphone 300, a sensing unit 103 which measures environmental conditions, a weight decision unit 104 which calculates the weights for recognition results of the air microphone speech recognition unit 101 and the wearable microphone speech recognition unit 102 on the basis of the environmental conditions, and a combination unit 105 which combines the recognition results outputted from the air microphone speech recognition unit 101 and the wearable microphone speech recognition unit 102, using the weights.
    Type: Grant
    Filed: February 14, 2018
    Date of Patent: March 7, 2023
    Assignee: NEC CORPORATION
    Inventors: Qiongqiong Wang, Takafumi Koshinaka
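The environment-weighted combination described in this abstract can be sketched in a few lines. Here a hypothetical SNR reading stands in for the sensing unit's environmental measurement, and the linear weight rule and the (text, confidence) hypothesis format are invented for illustration, not taken from the patent:

```python
def decide_weight(snr_db, lo=0.0, hi=30.0):
    """Map ambient SNR to the air-microphone weight in [0, 1].

    High SNR (quiet room) -> trust the air microphone; low SNR -> trust
    the wearable microphone, which is less affected by ambient noise.
    """
    w = (snr_db - lo) / (hi - lo)
    return max(0.0, min(1.0, w))

def combine(air_hyps, wear_hyps, snr_db):
    """Combine two lists of (text, confidence) hypotheses with SNR-based weights."""
    w_air = decide_weight(snr_db)
    scores = {}
    for text, conf in air_hyps:
        scores[text] = scores.get(text, 0.0) + w_air * conf
    for text, conf in wear_hyps:
        scores[text] = scores.get(text, 0.0) + (1.0 - w_air) * conf
    return max(scores, key=scores.get)

# In a noisy environment (low SNR) the wearable microphone's result wins.
best = combine([("turn off", 0.9)], [("turn on", 0.8)], snr_db=3.0)
```

A production system would weight at the word or lattice level rather than over whole-utterance hypotheses; the scalar weighting here only illustrates the decision rule.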
  • Patent number: 11587551
    Abstract: An illustrative embodiment includes a method for training an end-to-end (E2E) spoken language understanding (SLU) system. The method includes receiving a training corpus comprising a set of text classified using one or more sets of semantic labels but unpaired with speech and using the set of unpaired text to train the E2E SLU system to classify speech using at least one of the one or more sets of semantic labels. The method may include training a text-to-intent model using the set of unpaired text; and training a speech-to-intent model using the text-to-intent model. Alternatively or additionally, the method may include using a text-to-speech (TTS) system to generate synthetic speech from the unpaired text; and training the E2E SLU system using the synthetic speech.
    Type: Grant
    Filed: April 7, 2020
    Date of Patent: February 21, 2023
    Assignee: International Business Machines Corporation
    Inventors: Hong-Kwang Jeff Kuo, Yinghui Huang, Samuel Thomas, Kartik Audhkhasi, Michael Alan Picheny
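The unpaired-text idea above can be sketched with a stub TTS: synthetic "speech" paired with each text's semantic label yields training pairs for a speech-to-intent model with no real recordings. The TTS stand-in and the keyword-lookup "model" are illustrative placeholders, not the patent's networks:

```python
def tts(text):
    """Stand-in for a text-to-speech system; returns fake 'speech features'."""
    return f"<audio:{text}>"

def train_speech_to_intent(unpaired_text):
    """unpaired_text: list of (text, intent) with no paired recordings.

    Synthetic speech generated from each text is paired with the text's
    label, giving supervised (speech, intent) training data.
    """
    model = {}
    for text, intent in unpaired_text:
        model[tts(text)] = intent
    return model

corpus = [("play some jazz", "PlayMusic"), ("what's the weather", "GetWeather")]
model = train_speech_to_intent(corpus)
intent = model[tts("play some jazz")]
```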
  • Patent number: 11580301
    Abstract: A hybrid entity recognition system and accompanying method identify composite entities based on machine learning. An input sentence is received and is preprocessed to remove extraneous information, perform spelling correction, and perform grammar correction to generate a cleaned input sentence. A POS tagger tags parts of speech of the cleaned input sentence. A rules based entity recognizer module identifies first level entities in the cleaned input sentence. The cleaned input sentence is converted and translated into numeric vectors. Basic and composite entities are extracted from the cleaned input sentence using the numeric vectors.
    Type: Grant
    Filed: December 19, 2019
    Date of Patent: February 14, 2023
    Assignee: Genpact Luxembourg S.à r.l. II
    Inventors: Ravi Narayan, Sunil Kumar Khokhar, Vikas Mehta, Chirag Srivastava
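The two-stage pipeline in this abstract, rule-based first-level entities followed by composite extraction, can be illustrated with toy rules. The regexes, entity types, and the proximity rule for building composites are invented for this sketch:

```python
import re

RULES = {
    "AMOUNT": re.compile(r"\$\d+(?:\.\d{2})?"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def first_level_entities(sentence):
    """Rule-based recognizer: (label, text, char_offset) per match."""
    found = []
    for label, pattern in RULES.items():
        for m in pattern.finditer(sentence):
            found.append((label, m.group(), m.start()))
    return sorted(found, key=lambda e: e[2])

def composite_entities(entities, max_gap=20):
    """Pair an AMOUNT with a nearby DATE into a composite PAYMENT entity."""
    composites = []
    for i, (la, va, pa) in enumerate(entities):
        for lb, vb, pb in entities[i + 1:]:
            if {la, lb} == {"AMOUNT", "DATE"} and abs(pb - pa) <= max_gap:
                composites.append(("PAYMENT", (va, vb)))
    return composites

ents = first_level_entities("Paid $42.50 on 2023-02-14 by card.")
comps = composite_entities(ents)
```

The patent's system uses POS tagging and numeric vectors learned by a model for the second stage; the proximity heuristic here only shows the basic-to-composite shape of the pipeline.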
  • Patent number: 11580975
    Abstract: Embodiments described herein provide a dynamic topic tracking mechanism that tracks how conversation topics change from one utterance to another and uses the tracking information to rank candidate responses. A pre-trained language model may be used for response selection in multi-party conversations in two steps: (1) topic-based pre-training to embed topic information into the language model with self-supervised learning, and (2) multi-task learning on the pre-trained model that jointly trains response selection with dynamic topic prediction and disentanglement tasks.
    Type: Grant
    Filed: September 8, 2020
    Date of Patent: February 14, 2023
    Assignee: salesforce.com, inc.
    Inventors: Weishi Wang, Shafiq Rayhan Joty, Chu Hong Hoi

  • Patent number: 11580979
    Abstract: In some embodiments, methods and systems for pushing audiovisual playlists based on a text-attentional convolutional neural network include a local voice interactive terminal, a dialog system server and a playlist recommendation engine, where the dialog system server and the playlist recommendation engine are respectively connected to the local voice interactive terminal. In some embodiments, the local voice interactive terminal includes a microphone array, a host computer connected to the microphone array, and a voice synthesis chip board connected to the microphone array. In some embodiments, the playlist recommendation engine obtains rating data based on a rating predictor constructed by the neural network; the host computer parses the data into recommended playlist information; and the voice terminal synthesizes the results and pushes them to a user in the form of voice.
    Type: Grant
    Filed: January 5, 2021
    Date of Patent: February 14, 2023
    Assignee: Chongqing University
    Inventors: Yongduan Song, Junfeng Lai, Xin Zhou
  • Patent number: 11574628
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: February 7, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister
  • Patent number: 11568885
    Abstract: A speech-based system utilizes a speech interface device located in the home of a user. The system may interact with different users based on different user profiles. The system may include messaging services that generate and/or provide messages to the user through the speech interface device. The speech interface device may have indicators that are capable of being illuminated in different colors. To notify a user regarding the currently active user profile, each user profile is associated with a different color and the color of the active profile is displayed on the speech interface device when the user is interacting with the system. To notify the user regarding awaiting messages, different types of messages are associated with different colors and the colors of the message types of waiting messages are displayed on the speech interface whenever the user is not interacting with the system.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: January 31, 2023
    Assignee: Amazon Technologies, Inc.
    Inventor: Ty Loren Carlson
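The indicator behaviour described above reduces to a small lookup: during interaction the light shows the active profile's colour; when idle it shows the colours of the waiting message types. The profiles, message types, and colour assignments below are made-up examples:

```python
PROFILE_COLORS = {"alice": "blue", "bob": "green"}
MESSAGE_COLORS = {"voicemail": "yellow", "reminder": "orange"}

def indicator_colors(active_profile, waiting_message_types, interacting):
    """Colours the speech interface device should display right now."""
    if interacting:
        # Interaction in progress: show the active profile's colour.
        return [PROFILE_COLORS[active_profile]]
    # Idle: show one colour per distinct waiting message type.
    return sorted({MESSAGE_COLORS[t] for t in waiting_message_types})

during = indicator_colors("alice", ["voicemail"], interacting=True)
idle = indicator_colors("alice", ["voicemail", "reminder"], interacting=False)
```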
  • Patent number: 11562819
    Abstract: The present invention provides methods and systems for determining the behavioural index of a user. The present invention involves analyzing the user's behaviour and linguistic parameters using a smart wearable device. Based on this analysis, the user's behavioural index is determined, updated, and communicated to the user.
    Type: Grant
    Filed: March 1, 2019
    Date of Patent: January 24, 2023
    Assignee: KAHA PTE. LTD.
    Inventor: Sudheendra Shantharam
  • Patent number: 11545152
    Abstract: In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least components of: a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines; where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and the accuracy of automatic speech transcription and gen
    Type: Grant
    Filed: January 20, 2021
    Date of Patent: January 3, 2023
    Assignee: GREEN KEY TECHNOLOGIES, INC.
    Inventors: Tejas Shastry, Matthew Goldey, Svyat Vergun
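The routing-and-accept logic in this abstract can be sketched as follows. The "domain" tag used for routing, the stand-in engine callables, and the confidence threshold are all illustrative assumptions, not the patent's recognition model specification:

```python
def route(segment):
    """Pick the primary engine for a segment; keyed on a toy 'domain' tag."""
    return segment.get("domain", "general")

def transcribe(segments, engines, accept_threshold=0.85):
    transcript = []
    for seg in segments:
        primary = route(seg)
        text, conf = engines[primary](seg)
        if conf < accept_threshold:
            # Only if the primary hypothesis is weak do we pay for other
            # engines; accepting early is what saves compute.
            for name, engine in engines.items():
                if name == primary:
                    continue
                alt_text, alt_conf = engine(seg)
                if alt_conf > conf:
                    text, conf = alt_text, alt_conf
        transcript.append(text)
    return " ".join(transcript)

engines = {
    "general": lambda s: (s["audio"].lower(), 0.9),
    "finance": lambda s: (s["audio"].upper(), 0.95),
}
out = transcribe([{"audio": "Buy", "domain": "finance"}, {"audio": "NOW"}], engines)
```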
  • Patent number: 11538489
    Abstract: In general, techniques are described by which to correlate scene-based audio data for psychoacoustic audio coding. A device comprising a memory and one or more processors may be configured to perform the techniques. The memory may store a bitstream including a plurality of encoded correlated components of a soundfield represented by scene-based audio data. The one or more processors may perform psychoacoustic audio decoding with respect to one or more of the plurality of encoded correlated components to obtain a plurality of correlated components, and obtain, from the bitstream, an indication representative of how the one or more of the plurality of correlated components were reordered in the bitstream. The one or more processors may reorder, based on the indication, the plurality of correlated components to obtain a plurality of reordered components, and reconstruct, based on the plurality of reordered components, the scene-based audio data.
    Type: Grant
    Filed: June 22, 2020
    Date of Patent: December 27, 2022
    Assignee: Qualcomm Incorporated
    Inventors: Ferdinando Olivieri, Taher Shahbazi Mirzahasanloo, Nils Günther Peters
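The reorder indication described here can be shown with a minimal encoder/decoder pair: the encoder reorders components (by energy, say) and writes the permutation into the bitstream; the decoder inverts it. The field names and the energy criterion are illustrative, not Qualcomm's format:

```python
def encode(components):
    """Reorder components by descending 'energy' and record the permutation."""
    order = sorted(range(len(components)), key=lambda i: -components[i]["energy"])
    payload = [components[i] for i in order]
    return {"indication": order, "components": payload}

def decode(bitstream):
    """Restore the original component order using the indication."""
    restored = [None] * len(bitstream["components"])
    for pos, original_index in enumerate(bitstream["indication"]):
        restored[original_index] = bitstream["components"][pos]
    return restored

scene = [{"id": 0, "energy": 0.2}, {"id": 1, "energy": 0.9}, {"id": 2, "energy": 0.5}]
restored = decode(encode(scene))
```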
  • Patent number: 11538464
    Abstract: The disclosure includes using dilation of speech content from a separated audio input for speech recognition. An audio input from a speaker and predicted changes for the audio input based on an external noise are received at a CNN (Convolutional Neural Network). In the CNN, diarization is applied to the audio input to predict how a dilation of speech content from the speaker changes the audio input to generate a CNN output. A resulting dilation is determined from the CNN output. A word error rate is determined for the dilated CNN output to determine an accuracy for speech to text outputs. An adjustment parameter is set to change a range of the dilation based on the word error rate, and the resulting dilation of the CNN output is adjusted based on the adjustment parameter to reduce the word error rate.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: December 27, 2022
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Corey B. Shelton, Stephen C. Hammer, Shikhar Kwatra
  • Patent number: 11538467
    Abstract: Devices and techniques are generally described for calibrating noise for natural language data modification. In various examples, first data representing a natural language input may be identified. A first vector representation of a first word of the first data may be determined. Sensitivity data may be determined for the first vector representation based at least in part on a first density of one or more vector representations adjacent to the first vector representation in an embedding space. In some examples, a first noise vector may be determined based at least in part on the sensitivity data. A first modified vector representation may be generated by adding the first noise vector to the first vector representation. A second word may be determined based at least in part on the first modified vector representation. Modified first data may be generated by replacing the first word with the second word.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: December 27, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Oluwaseyi Oluwafemi Feyisetan
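The density-calibrated noise idea can be sketched in a toy 2-D embedding space: words in dense neighbourhoods need less noise to land on a different word, so the noise scale shrinks with local density. The vocabulary, radius, and the particular scale rule below are invented for illustration, not the patent's calibration:

```python
import math
import random

VOCAB = {
    "good": (0.0, 0.0),
    "great": (0.2, 0.1),
    "fine": (0.1, -0.2),
    "terrible": (5.0, 5.0),
}

def density(vec, radius=1.0):
    """Count vocabulary vectors within `radius` of vec (excluding vec itself)."""
    return sum(0 < math.dist(vec, other) <= radius for other in VOCAB.values())

def perturb(word, rng, base_scale=1.0):
    vec = VOCAB[word]
    sensitivity = base_scale / (1 + density(vec))   # denser -> smaller noise
    noisy = (vec[0] + rng.gauss(0, sensitivity), vec[1] + rng.gauss(0, sensitivity))
    # Replace the word with the vocabulary item nearest the noisy vector.
    return min(VOCAB, key=lambda w: math.dist(VOCAB[w], noisy))

rng = random.Random(0)
replacement = perturb("good", rng)
```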
  • Patent number: 11538456
    Abstract: An audio file processing method is provided for an electronic device. The method includes extracting at least one audio segment from a first audio file, recognizing at least one to-be-replaced audio segment representing a target role from the at least one audio segment, and determining time frame information of each to-be-replaced audio segment in the first audio file. The method also includes obtaining to-be-dubbed audio data for each to-be-replaced audio segment, and replacing data in the to-be-replaced audio segment with the to-be-dubbed audio data according to the time frame information, to obtain a second audio file. The at least one to-be-replaced audio segment is divided from the at least one audio segment based on a structure and a word count in a sentence corresponding to each to-be-replaced audio segment.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: December 27, 2022
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventor: Chunjiang Lai
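The replacement step in this abstract is essentially a splice by time frame. In the toy sketch below, audio is a list of samples and each to-be-replaced segment is a (start, end) index span plus the dubbed samples that go in its place; real audio handling would decode and re-encode a container format:

```python
def replace_segments(audio, replacements):
    """Build a new audio sequence with each (start, end) span replaced.

    `replacements` is a list of (start, end, dub_samples), assumed
    non-overlapping and sorted by start time.
    """
    out = []
    cursor = 0
    for start, end, dub in replacements:
        out.extend(audio[cursor:start])  # keep untouched audio
        out.extend(dub)                  # splice in the dubbed take
        cursor = end
    out.extend(audio[cursor:])
    return out

original = [0, 1, 2, 3, 4, 5, 6, 7]
dubbed = replace_segments(original, [(2, 5, [9, 9])])
```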
  • Patent number: 11526804
    Abstract: A document analysis device that includes an artificial intelligence (AI) processing engine configured to receive a set of input sentences, to select a first sentence from the set of input sentences, and to compare the first sentence to previously classified sentences. The AI processing engine is further configured to compute similarity scores between the first sentence and the previously classified sentences, to identify a second sentence from the previously classified sentences with a similarity score greater than or equal to a similarity score threshold value, to identify a sentence type that is associated with the second sentence, and to associate the first sentence with the sentence type. The AI processing engine is further configured to add the first sentence to the set of training data for the machine learning model and to train the machine learning model using the set of training data.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: December 13, 2022
    Assignee: Bank of America Corporation
    Inventors: Kishore Gopalan, Satish Chaduvu, Thomas J. Kuchcicki
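The nearest-match classification loop above can be sketched with a bag-of-words Jaccard similarity standing in for the patent's (unspecified) similarity score. If the best match clears the threshold, the new sentence inherits its type and joins the training data, mirroring the abstract's final step:

```python
def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def classify(sentence, labeled, threshold=0.5):
    """labeled: list of (sentence, sentence_type). Returns a type or None."""
    best_type, best_score = None, 0.0
    for ref, ref_type in labeled:
        score = jaccard(sentence, ref)
        if score > best_score:
            best_type, best_score = ref_type, score
    if best_score >= threshold:
        labeled.append((sentence, best_type))  # grow the training data
        return best_type
    return None

training = [("the loan is approved", "DECISION"), ("please send the form", "REQUEST")]
label = classify("the loan is approved today", training)
```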
  • Patent number: 11508366
    Abstract: A method, an apparatus and a device for converting a whispered speech, and a readable storage medium are provided. The method is implemented based on the whispered speech converting model. The whispered speech converting model is trained in advance by using recognition results and whispered speech training acoustic features of whispered speech training data as samples and using normal speech acoustic features of normal speech data parallel to the whispered speech training data as sample labels. A whispered speech acoustic feature and a preliminary recognition result of whispered speech data are acquired, then the whispered speech acoustic feature and the preliminary recognition result are inputted into a preset whispered speech converting model to acquire a normal speech acoustic feature outputted by the model. In this way, the whispered speech can be converted to a normal speech.
    Type: Grant
    Filed: June 15, 2018
    Date of Patent: November 22, 2022
    Assignee: IFLYTEK CO., LTD.
    Inventors: Jia Pan, Cong Liu, Haikun Wang, Zhiguo Wang, Guoping Hu
  • Patent number: 11487945
    Abstract: Present embodiments include an agent automation framework having a similarity scoring subsystem that performs meaning representation similarity scoring to facilitate extraction of artifacts to address an utterance. The similarity scoring subsystem identifies a CCG form of an utterance-based meaning representation and queries a database to retrieve a comparison function list that enables quantifications of similarities between the meaning representation and candidates within a search space. The comparison functions enable the similarity scoring subsystem to perform computationally-cheapest and/or most efficient comparisons before other comparisons. The similarity scoring subsystem may determine an initial similarity score between the particular meaning representation and the candidates of the search space, then prune non-similar candidates from the search space.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: November 1, 2022
    Assignee: ServiceNow, Inc.
    Inventors: Edwin Sapugay, Jonggun Park, Anne Katharine Heaton-Dunlap
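The cost-ordered comparison idea, run cheap comparisons over all candidates first and spend the expensive comparison only on survivors, can be sketched like this. Both comparison functions and the pruning floor are invented stand-ins, not ServiceNow's CCG-form comparisons:

```python
def cheap_overlap(query, candidate):
    """Cheap first-pass score: fraction of query words found in the candidate."""
    qa, ca = set(query.split()), set(candidate.split())
    return len(qa & ca) / max(len(qa), 1)

def expensive_match(query, candidate):
    """Stand-in for a heavyweight structural comparison."""
    return 1.0 if query == candidate else 0.8 * cheap_overlap(query, candidate)

def score_candidates(query, candidates, floor=0.3):
    # Phase 1: the cheapest comparison prunes non-similar candidates.
    survivors = [c for c in candidates if cheap_overlap(query, c) >= floor]
    # Phase 2: the expensive comparison runs only on survivors.
    return sorted(((expensive_match(query, c), c) for c in survivors), reverse=True)

ranked = score_candidates(
    "reset my password",
    ["reset my password", "order a pizza", "change my password"],
)
```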
  • Patent number: 11488608
    Abstract: A computer-implemented technique is presented for profiling an unknown speaker. A DNN-based frame selection allows the system to select the relevant frames necessary to provide a reliable speaker characteristic estimation. A frame selection module selects those frames that contain relevant information for estimating a given speaker characteristic and thereby contributes to the accuracy and the low latency of the system. Real-time speaker characteristics estimation allows the system to estimate the speaker characteristics from a speech segment of accumulated selected frames at any given time. The frame level processing contributes to the low latency as it is not necessary to wait for the whole speech utterance to predict a speaker characteristic but rather a speaker characteristic is estimated from only a few reliable frames. Different stopping criteria also contribute to the accuracy and the low latency of the system.
    Type: Grant
    Filed: December 16, 2019
    Date of Patent: November 1, 2022
    Assignee: SIGMA TECHNOLOGIES GLOBAL LLC
    Inventors: Juan Manuel Perero-Codosero, Fernando Espinoza-Cuadros
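The reliability-gated frame accumulation with an early-stopping criterion can be sketched as below. The per-frame reliability scores (which the patent obtains from a DNN-based selector), the floor, and the "three good frames" criterion are illustrative assumptions:

```python
def estimate_characteristic(frames, reliability_floor=0.7, needed=3):
    """frames: iterable of (value, reliability). Returns (estimate, n_used).

    Frames are consumed in stream order; an estimate is emitted as soon as
    enough reliable frames accumulate, without waiting for the utterance end.
    """
    selected = []
    for value, reliability in frames:
        if reliability >= reliability_floor:
            selected.append(value)
        if len(selected) >= needed:          # early-stopping criterion
            break
    if len(selected) < needed:
        return None, len(selected)
    return sum(selected) / len(selected), len(selected)

# Toy age-like values per frame; the noisy frame (99, 0.2) is skipped and
# the estimate is ready after three reliable frames.
stream = [(30, 0.9), (99, 0.2), (32, 0.8), (31, 0.95), (50, 0.9)]
estimate, used = estimate_characteristic(stream)
```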
  • Patent number: 11475905
    Abstract: An apparatus for processing an audio signal includes a configurable first audio signal processor for processing the audio signal in accordance with different configuration settings to obtain a processed audio signal, wherein the apparatus is adapted so that different configuration settings result in different sampling rates of the processed audio signal. The apparatus furthermore includes an analysis filter bank having a first number of analysis filter bank channels, a synthesis filter bank having a second number of synthesis filter bank channels, a second audio processor being adapted to receive and process an audio signal having a predetermined sampling rate, and a controller for controlling the first number of analysis filter bank channels or the second number of synthesis filter bank channels in accordance with a configuration setting.
    Type: Grant
    Filed: May 5, 2022
    Date of Patent: October 18, 2022
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Markus Lohwasser, Manuel Jander, Max Neuendorf, Ralf Geiger, Markus Schnell, Matthias Hildenbrand, Tobias Chalupka
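The controller's job can be shown with a toy rule: the configuration fixes the first processor's output sampling rate, and the controller picks channel counts so the resampled signal matches the second processor's fixed rate. The rates and the integer-ratio rule below are illustrative, not the patent's actual scheme:

```python
FIXED_PROCESSOR_RATE = 48_000  # rate the second audio processor expects

def channel_counts(config_rate, base_channels=64):
    """Scale analysis channels with the configured rate; synthesis stays fixed.

    Returns (analysis_channels, synthesis_channels).
    """
    if FIXED_PROCESSOR_RATE % config_rate:
        raise ValueError("configured rate must divide the fixed rate")
    ratio = FIXED_PROCESSOR_RATE // config_rate
    return base_channels // ratio, base_channels

analysis, synthesis = channel_counts(24_000)
```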
  • Patent number: 11475881
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
    Type: Grant
    Filed: July 17, 2020
    Date of Patent: October 18, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
  • Patent number: 11475906
    Abstract: An apparatus for processing an audio signal includes a configurable first audio signal processor for processing the audio signal in accordance with different configuration settings to obtain a processed audio signal, wherein the apparatus is adapted so that different configuration settings result in different sampling rates of the processed audio signal. The apparatus furthermore includes an analysis filter bank having a first number of analysis filter bank channels, a synthesis filter bank having a second number of synthesis filter bank channels, a second audio processor being adapted to receive and process an audio signal having a predetermined sampling rate, and a controller for controlling the first number of analysis filter bank channels or the second number of synthesis filter bank channels in accordance with a configuration setting.
    Type: Grant
    Filed: May 5, 2022
    Date of Patent: October 18, 2022
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Markus Lohwasser, Manuel Jander, Max Neuendorf, Ralf Geiger, Markus Schnell, Matthias Hildenbrand, Tobias Chalupka