Neural Network Patents (Class 704/232)

Method and system for intent-based action recommendations and/or fulfillment in a messaging platform

Patent number: 12292912

Abstract: A system for intent-based action recommendations and/or fulfillment in a messaging platform, preferably including and/or interfacing with: a set of user interfaces; a set of models; and/or a messaging platform. A method for intent-based action recommendations and/or fulfillment in a messaging platform, preferably including any or all of: receiving a set of information associated with a request; producing and sending a set of intent options; receiving a selected intent; generating a message based on the selected intent and/or the set of information; and providing the message.

Type: Grant

Filed: November 26, 2024

Date of Patent: May 6, 2025

Assignee: OrangeDot, Inc.

Inventor: William Kearns

Parallel cross validation in collaborative machine learning

Patent number: 12288145

Abstract: A computer-implemented method, a computer program product, and a computer system for parallel cross validation in collaborative machine learning. A server groups local models into groups. In each group, each local device uses its local data to validate accuracies of the local models and sends a validation result to a group leader or the server. The group leader or the server selects groups whose variances of the accuracies are not below a predetermined variance threshold. In each selected group, the group leader or the server compares an accuracy of each local model with an average value of the accuracies and randomly selects one or more local models whose accuracies do not exceed a predetermined accuracy threshold. The server obtains weight parameters of selected local models and updates the global model based on the weight parameters.

Type: Grant

Filed: April 27, 2021

Date of Patent: April 29, 2025

Assignee: International Business Machines Corporation

Inventors: Kenichi Takasaki, Shoichiro Watanabe, Mari Abe Fukuda, Sanehiro Furuichi, Yasutaka Nishimura
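
The selection step in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the function name `select_models`, the parameter `k` (how many models to draw per group), and the input layout are hypothetical. The abstract does not spell out how the comparison with the group average feeds the selection, so the sketch applies the accuracy threshold to the raw accuracy and leaves the average out.

```python
import random
from statistics import pvariance


def select_models(groups, variance_threshold, accuracy_threshold, k=1, seed=0):
    """Pick local models to contribute to the global update.

    groups maps a group id to {model id: validation accuracy}.
    Assumption: the accuracy threshold is applied to the raw accuracy.
    """
    rng = random.Random(seed)
    selected = []
    for accuracies in groups.values():
        # Keep only groups whose accuracy variance is not below the threshold.
        if pvariance(list(accuracies.values())) < variance_threshold:
            continue
        # Candidates are models whose accuracy does not exceed the threshold.
        candidates = sorted(
            model for model, acc in accuracies.items() if acc <= accuracy_threshold
        )
        # Randomly draw up to k of them.
        selected.extend(rng.sample(candidates, min(k, len(candidates))))
    return selected


groups = {
    "g1": {"a": 0.9, "b": 0.5, "c": 0.7},  # high variance: kept
    "g2": {"d": 0.8, "e": 0.8},            # zero variance: skipped
}
chosen = select_models(groups, variance_threshold=0.01, accuracy_threshold=0.75, k=2)
```

The server would then collect weight parameters from the chosen models and fold them into the global model; that aggregation step is not shown here.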

Guardrails for efficient processing and error prevention in generating suggested messages

Patent number: 12282731

Abstract: Systems and methods for using a generative artificial intelligence (AI) model to generate a suggested draft reply to a selected message. A message generation system and method are described that use guardrails that prevent unnecessary AI model processing and accidental sending of an AI model-generated draft. In some examples, draft reply-generation is limited to a subset of messages (e.g., focused, non-confidential) and triggering of the draft reply generation is performed only after user interaction criteria are satisfied. In some examples, a confirmation message is presented when the draft reply is attempted to be sent with no changes or quickly after the draft is generated. For instance, the guardrails limit the number of times the AI model is invoked to generate suggested replies and further prevent users from accidentally sending drafts generated from the AI model.

Type: Grant

Filed: March 3, 2023

Date of Patent: April 22, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Caleb Whitmore, Susan Marie Grimshaw, Poonam Ganesh Hattangady
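
The send-time guardrail described in this abstract can be illustrated with a small check. This is a sketch under stated assumptions: the function name `needs_confirmation` and the 5-second default for `min_review_seconds` are hypothetical, since the abstract only says the confirmation appears when the draft is sent unchanged or "quickly" after generation.

```python
def needs_confirmation(draft, outgoing, seconds_since_draft, min_review_seconds=5.0):
    """Return True when a confirmation prompt should be shown before sending.

    Assumption: "quickly" means sooner than min_review_seconds after the
    draft was generated; the real criterion is not given in the abstract.
    """
    unchanged = outgoing.strip() == draft.strip()
    too_quick = seconds_since_draft < min_review_seconds
    return unchanged or too_quick


draft = "Thanks, I will review the document today."
edited = "Thanks, I will review the document by Friday."
```

A caller would run this check when the user presses send and show the confirmation message only if it returns `True`.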

Progressive contrastive learning framework for self-supervised speaker verification

Patent number: 12277939

Abstract: A method includes receiving, by a first encoder, an original speech segment, receiving, by a second encoder, an augmented speech segment of the original speech segment, generating, by the first encoder, a first speaker representation based on the original speech segment, generating, by the second encoder, a second speaker representation based on the augmented speech segment, and generating a contrastive loss based on the first speaker representation and the second speaker representation.

Type: Grant

Filed: May 2, 2022

Date of Patent: April 15, 2025

Assignee: TENCENT AMERICA LLC

Inventors: Chunlei Zhang, Dong Yu
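
A contrastive loss over paired speaker representations could look like the sketch below. The abstract does not state the form of the loss, so this uses an InfoNCE-style formulation as an assumption; the function names and the `temperature` value are hypothetical, and the "progressive" part of the framework is not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(first_reps, second_reps, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not taken from the patent).

    first_reps[i] comes from an original segment and second_reps[i] from
    its augmented copy; all other pairings in the batch act as negatives.
    """
    total = 0.0
    for i, anchor in enumerate(first_reps):
        logits = [cosine(anchor, other) / temperature for other in second_reps]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(first_reps)


original = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]      # augmented views that agree
mismatched = [[0.0, 1.0], [1.0, 0.0]]   # augmented views that disagree
```

The loss is small when each original segment is closest to its own augmented view and grows when the views are mismatched.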

Method and apparatus for audio processing, electronic device and storage medium

Patent number: 12266373

Abstract: A method and apparatus for audio processing, an electronic device and a storage medium are provided. The method includes: obtaining an audio encoding result, wherein each element in the audio encoding result has a coordinate in an audio frame number dimension and a coordinate in a text label sequence dimension; in response to an output result of an ith frame in a decoding path being a non-null character, respectively increasing the coordinate in the audio frame number dimension and the coordinate in the text label sequence dimension corresponding to an output position of the ith frame by 1 to obtain an output position of a (i+1)th frame in the decoding path; and determining an output result corresponding to the output position of the (i+1)th frame according to the output result of the ith frame in the decoding path and an element of the (i+1)th frame in the audio encoding result.

Type: Grant

Filed: December 9, 2022

Date of Patent: April 1, 2025

Assignee: BEIJING XIAOMI MOBILE SOFTWARE CO., LTD.

Inventors: Mingshuang Luo, Fangjun Kuang, Liyong Guo, Long Lin, Wei Kang, Zengwei Yao, Povey Daniel

Multi-speaker neural text-to-speech synthesis

Patent number: 12266342

Abstract: A method for generating speech through multi-speaker neural text-to-speech (TTS) synthesis is provided. A text input may be received (1410). Speaker latent space information of a target speaker may be provided through at least one speaker model (1420). At least one acoustic feature may be predicted through an acoustic feature predictor based on the text input and the speaker latent space information (1430). A speech waveform corresponding to the text input may be generated through a neural vocoder based on the at least one acoustic feature and the speaker latent space information (1440).

Type: Grant

Filed: December 11, 2018

Date of Patent: April 1, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yan Deng, Lei He

Method and system for providing assistance for cognitively impaired users by utilizing artificial intelligence

Patent number: 12260863

Abstract: In an embodiment, the disclosure relates to a device for assisting a respondent in a conversation. The device includes a microphone configured to detect a voice input, and a transmitter communicatively coupled to a server and configured to transmit the voice input to the server. The server is to generate vectors associated with the voice input, feed the vectors associated with the voice input to an Artificial Intelligence utilizing a trained Machine Learning (ML) model, and obtain, from the trained ML model, an output corresponding to the vectors. The device further includes a receiver communicatively coupled to the server, and configured to receive from the server, the output generated by the ML model. A speaker is communicatively coupled with the receiver and is configured to generate a voice-based response based on the output, for assisting the respondent in responding to the conversation.

Type: Grant

Filed: May 6, 2024

Date of Patent: March 25, 2025

Inventor: Leigh M. Rothschild

Proxy servers for managing queries to large language models

Patent number: 12261827

Abstract: Systems, methods, and apparatus, including computer programs encoded on a computer storage medium for managing network traffic to and from a server configured to: (i) receive, from a client device, a query in a natural language, and (ii) generate a response to the query in the natural language. In one aspect, a method includes: receiving, from the client device via a network connection, a network message including a new query for the server; processing the new query, using a text encoder, to generate an embedding vector of the new query; identifying, from amongst multiple entries of a vector database, a particular entry based on a similarity metric between: (i) the embedding vector of the new query, and (ii) an embedding vector of a particular query stored in the particular entry; and determining whether the similarity metric is greater than a threshold similarity value.

Type: Grant

Filed: January 19, 2024

Date of Patent: March 25, 2025

Assignee: Auradine, Inc.

Inventors: Tao Xu, Barun Kar
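
The lookup described in this abstract (embed the new query, find the most similar stored query, compare against a threshold) can be sketched as a small in-memory cache. Everything here is illustrative: `SemanticCache`, its methods, and the 0.9 threshold are hypothetical, the text encoder is assumed to exist elsewhere so embeddings are passed in directly, and returning a cached response on a hit is an assumption because the abstract stops at the threshold comparison.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


class SemanticCache:
    """Toy stand-in for a vector database of past queries (hypothetical)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (query embedding, stored response)

    def add(self, embedding, response):
        self.entries.append((list(embedding), response))

    def lookup(self, embedding):
        """Return the response of the closest entry if it clears the threshold."""
        best_score, best_response = -1.0, None
        for stored, response in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        # Assumption: a hit means the proxy can answer without the server.
        return best_response if best_score > self.threshold else None


cache = SemanticCache(threshold=0.9)
cache.add([1.0, 0.0], "cached answer")
hit = cache.lookup([0.99, 0.1])   # nearly the same direction as the stored query
miss = cache.lookup([0.0, 1.0])   # orthogonal to the stored query
```

A linear scan is used for clarity; a real vector database would use an approximate nearest-neighbor index.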

Detecting and handling failures in other assistants

Patent number: 12254885

Abstract: Techniques are described herein for detecting and handling failures in other automated assistants. A method includes: executing a first automated assistant in an inactive state at least in part on a computing device operated by a user; while in the inactive state, determining, by the first automated assistant, that a second automated assistant failed to fulfill a request of the user; in response to determining that the second automated assistant failed to fulfill the request of the user, the first automated assistant processing cached audio data that captures a spoken utterance of the user comprising the request that the second automated assistant failed to fulfill, or features of the cached audio data, to determine a response that fulfills the request of the user; and providing, by the first automated assistant to the user, the response that fulfills the request of the user.

Type: Grant

Filed: January 13, 2023

Date of Patent: March 18, 2025

Assignee: GOOGLE LLC

Inventors: Victor Carbune, Matthew Sharifi

Method and system of neural network dynamic noise suppression for audio processing

Patent number: 12243545

Abstract: A method and system of neural network dynamic noise suppression is provided for audio processing.

Type: Grant

Filed: December 24, 2021

Date of Patent: March 4, 2025

Assignee: Intel Corporation

Inventors: Adam Kupryjanow, Lukasz Pindor

Processing image data

Patent number: 12244792

Abstract: A method of processing, prior to encoding using an external encoder, image data using an artificial neural network is provided. The external encoder is operable in a plurality of encoding modes. At the neural network, image data representing one or more images is received. The image data is processed using the neural network to generate output data indicative of an encoding mode selected from the plurality of encoding modes of the external encoder. The neural network is trained to select, using image data, an encoding mode of the plurality of encoding modes of the external encoder using one or more differentiable functions configured to emulate an encoding process. The generated output data is outputted from the neural network to the external encoder to enable the external encoder to encode the image data using the selected encoding mode.

Type: Grant

Filed: June 16, 2021

Date of Patent: March 4, 2025

Assignee: Sony Interactive Entertainment Europe Limited

Inventors: Aaron Chadha, Ioannis Andreopoulos

Fault signal locating and identifying method of industrial equipment based on microphone array

Patent number: 12228476

Abstract: Provided is a fault signal locating and identifying method of industrial equipment based on a microphone array. The method includes the steps of: acquiring sound signals and dividing the acquired signals into a training set, a verifying set and a test set; performing feature extraction on the sound signals in the training set, and extracting a phase spectrogram and an amplitude spectrogram of a spectrogram; sending an output of a feature extraction module, as an input, to a CNN, and in each layer of the CNN, learning a translation invariance in the spectrogram by using a 2D CNN; in between the layers of the CNN, normalizing the output by using a batch normalization, and reducing a dimension by using a maximum pooling layer along a frequency axis; sending an output from the layers of the CNN to layers of RNN; using a linear activation function; and inputting an output of a full connection layer to two parallel full connection layer branches for fault identification and fault location, respectively.

Type: Grant

Filed: July 29, 2021

Date of Patent: February 18, 2025

Assignee: NORTHEASTERN UNIVERSITY

Inventors: Feng Luan, Xu Li, Ziming Zhang, Yan Wu, Yuejiao Han, Dianhua Zhang

End-to-end automatic speech recognition system for both conversational and command-and-control speech

Patent number: 12223953

Abstract: A contextual end-to-end automatic speech recognition (ASR) system includes: an audio encoder configured to process input audio signal to produce as output encoded audio signal; a bias encoder configured to produce as output at least one bias entry corresponding to a word to bias for recognition by the ASR system; a transcription token probability prediction network configured to produce as output a probability of a selected transcription token, based at least in part on the output of the bias encoder and the output of the audio encoder; a first attention mechanism configured to receive the at least one bias entry and determine whether the at least one bias entry is suitable to be transcribed at a specific moment of an ongoing transcription; and a second attention mechanism configured to produce prefix penalties for restricting the first attention mechanism to only entries fitting a current transcription context.

Type: Grant

Filed: May 5, 2022

Date of Patent: February 11, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Alejandro Coucheiro Limeres, Junho Park

Front-end clipping using visual cues

Patent number: 12225317

Abstract: According to one embodiment, a method, computer system, and computer program product for front-end clipping reduction is provided. The embodiment may include capturing input, including at least one visual input and at least one audio input. The embodiment may also include modeling data regarding visual cues based on a visual input from the at least one visual input. The embodiment may further include marking one or more timestamps which, in light of the modeled data, correspond to speech in the at least one audio input. The embodiment may also include transmitting an audio input from within the at least one audio input corresponding to the one or more marked timestamps.

Type: Grant

Filed: March 3, 2022

Date of Patent: February 11, 2025

Assignee: International Business Machines Corporation

Inventors: Joseph Sayer, Andrew David Lyell, Benjamin David Cox

Fault tolerant artificial neural network computation in deep learning accelerator having integrated random access memory

Patent number: 12217159

Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory (RAM) to store parameters of an artificial neural network (ANN). The device can generate random bit errors to simulate compromised or corrupted memory cells in a portion of the RAM accessed during computations of a first ANN output. A second ANN output is generated with the random bit errors applied to the data retrieved from the portion of the RAM. Based on a difference between the first and second ANN outputs, the device may adjust the ANN computation to reduce sensitivity to compromised or corrupted memory cells in the portion of the RAM. For example, the sensitivity reduction may be performed through ANN training using machine learning.

Type: Grant

Filed: August 6, 2020

Date of Patent: February 4, 2025

Assignee: Micron Technology, Inc.

Inventor: Poorna Kale
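
The idea of comparing a clean output with an output computed from corrupted parameters can be sketched in software. This is an illustration only: the patent describes an integrated circuit device, whereas the sketch below flips bits in the float32 encoding of weights held in a Python list. The function names, the single dot-product "network", and the restriction to mantissa bits (chosen so corrupted values stay finite) are all assumptions.

```python
import random
import struct


def flip_mantissa_bit(value, rng):
    """Flip one random mantissa bit in the float32 encoding of value."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    bits ^= 1 << rng.randrange(23)  # bits 0..22 hold the float32 mantissa
    (corrupted,) = struct.unpack("<f", struct.pack("<I", bits))
    return corrupted


def dot(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))


def sensitivity(weights, inputs, error_rate=0.1, trials=100, seed=0):
    """Mean absolute output change when weights suffer random bit errors."""
    rng = random.Random(seed)
    clean = dot(weights, inputs)  # first output: uncorrupted parameters
    total = 0.0
    for _ in range(trials):
        noisy = [
            flip_mantissa_bit(w, rng) if rng.random() < error_rate else w
            for w in weights
        ]
        total += abs(dot(noisy, inputs) - clean)  # second output vs. first
    return total / trials


weights = [0.5, -1.25, 2.0]  # exactly representable in float32
inputs = [1.0, 2.0, 3.0]
```

A measured sensitivity like this could then drive retraining or another adjustment, which is the step the abstract attributes to the device and which is not shown here.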

Classifying feedback from transcripts

Patent number: 12217012

Abstract: A method classifies feedback from transcripts. The method includes receiving an utterance from a transcript from a communication session and processing the utterance with a classifier model to identify a topic label for the utterance. The classifier model is trained to identify topic labels for training utterances. The topic labels correspond to topics of clusters of the training utterances. The training utterances are selected using attention values for the training utterances and clustered using encoder values for the utterances. The method further includes routing the communication session using the topic label for the utterance.

Type: Grant

Filed: July 31, 2023

Date of Patent: February 4, 2025

Assignee: Intuit Inc.

Inventors: Nitzan Gado, Adi Shalev, Talia Tron, Noa Haas, Oren Dar, Rami Cohen

Electronic apparatus for speech recognition, and controlling method thereof

Patent number: 12205576

Abstract: An electronic apparatus includes a memory storing a speech recognition model and first recognition information corresponding to a first user voice obtained through the speech recognition model, the speech recognition model including a first network, a second network, and a third network; and a processor configured to: obtain a first vector by inputting voice data corresponding to a second user voice to the first network, obtain a second vector by inputting the first recognition information to the second network which generates a vector based on first weight information, and obtain second recognition information corresponding to the second user voice by inputting the first vector and the second vector to the third network which generates recognition information based on second weight information, wherein at least a part of the second weight information is the same as the first weight information.

Type: Grant

Filed: October 18, 2022

Date of Patent: January 21, 2025

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Jinhwan Park, Sungsoo Kim, Sichen Jin, Junmo Park, Dhairya Sandhyana, Changwoo Han

Optimizing inference performance for conformer

Patent number: 12190869

Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.

Type: Grant

Filed: September 29, 2022

Date of Patent: January 7, 2025

Assignee: Google LLC

Inventors: Tara N. Sainath, Rami Botros, Anmol Gulati, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

Device arbitration for speech processing

Patent number: 12190877

Abstract: Devices and techniques are generally described for nearest device arbitration. In various examples, a first device may receive first audio data representing a wakeword spoken by a first speaker at a first time. In some examples, a second device may receive second audio data representing the wakeword spoken by the first speaker at the first time. In some cases, the first device may generate first feature data representing the first audio data and the second device may generate second feature data representing the second audio data. In various examples, a machine learning model may use the first feature data and the second feature data to generate first prediction data representing a prediction that the first device is closer to the first speaker than the second device.

Type: Grant

Filed: March 2, 2022

Date of Patent: January 7, 2025

Assignee: AMAZON TECHNOLOGIES, INC.

Inventors: Jarred Barber, Tao Zhang, Yifeng Fan

Natural language processing techniques using hybrid reason code prediction machine learning frameworks

Patent number: 12190062

Abstract: Various embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing natural language processing operations using a hybrid reason code prediction machine learning framework. Certain embodiments of the present invention utilize systems, methods, and computer program products that perform natural language processing using a hybrid reason code prediction machine learning framework that comprises one or more of the following: (i) a hierarchical transformer machine learning model, (ii) an utterance prediction machine learning model, (iii) an attention distribution generation machine learning model, (iv) an utterance-code pair prediction machine learning model, and (v) a hybrid prediction machine learning model.

Type: Grant

Filed: April 28, 2022

Date of Patent: January 7, 2025

Assignee: Optum, Inc.

Inventors: Suman Roy, Thomas G. Sullivan, Vijay Varma Malladi, Matthew J. Stewart, Abraham Gebru Tesfay, Gaurav Ranjan

Generating audio waveforms using encoder and decoder neural networks

Patent number: 12190896

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.

Type: Grant

Filed: July 1, 2022

Date of Patent: January 7, 2025

Assignee: Google LLC

Inventors: Yunpeng Li, Marco Tagliasacchi, Dominik Roblek, Félix de Chaumont Quitry, Beat Gfeller, Hannah Raphaelle Muckenhirn, Victor Ungureanu, Oleg Rybakov, Karolis Misiunas, Zalán Borsos

Trajectory prediction on top-down scenes and associated model

Patent number: 12183204

Abstract: Techniques are discussed for determining prediction probabilities of an object based on a top-down representation of an environment. Data representing objects in an environment can be captured. Aspects of the environment can be represented as map data. A multi-channel image representing a top-down view of object(s) in the environment can be generated based on the data representing the objects and map data. The multi-channel image can be used to train a machine learned model by minimizing an error between predictions from the machine learned model and a captured trajectory associated with the object. Once trained, the machine learned model can be used to generate prediction probabilities of objects in an environment, and the vehicle can be controlled based on such prediction probabilities.

Type: Grant

Filed: December 6, 2021

Date of Patent: December 31, 2024

Assignee: Zoox, Inc.

Inventors: Xi Joey Hong, Benjamin John Sapp, James William Vaisey Philbin, Kai Zhenyu Wang

Enhanced attention mechanisms

Patent number: 12175202

Abstract: A method includes receiving a sequence of audio features characterizing an utterance and processing, using an encoder neural network, the sequence of audio features to generate a sequence of encodings. At each of a plurality of output steps, the method also includes determining a corresponding hard monotonic attention output to select an encoding from the sequence of encodings, identifying a proper subset of the sequence of encodings based on a position of the selected encoding in the sequence of encodings, and performing soft attention over the proper subset of the sequence of encodings to generate a context vector at the corresponding output step. The method also includes processing, using a decoder neural network, the context vector generated at the corresponding output step to predict a probability distribution over possible output labels at the corresponding output step.

Type: Grant

Filed: November 30, 2021

Date of Patent: December 24, 2024

Assignee: Google LLC

Inventors: Chung-Cheng Chiu, Colin Abraham Raffel
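
The second half of the attention step in this abstract, soft attention restricted to a subset of encodings around a hard-selected position, can be sketched as below. The hard monotonic selection is taken as given (`selected_index`). Choosing the subset as a fixed-size window ending at the selected position and scoring with a plain dot product are assumptions; the abstract only says the subset depends on the position of the selected encoding.

```python
import math


def chunkwise_context(encodings, query, selected_index, chunk_size=3):
    """Soft attention over a window of encodings ending at selected_index.

    Assumptions: the window holds the chunk_size encodings up to and
    including the hard-selected one, and scores are dot products.
    """
    start = max(0, selected_index - chunk_size + 1)
    chunk = encodings[start:selected_index + 1]
    scores = [sum(q * e for q, e in zip(query, enc)) for enc in chunk]
    # Softmax over the window only, not the full sequence.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Context vector: weighted sum of the encodings in the window.
    context = [
        sum(w * enc[d] for w, enc in zip(weights, chunk))
        for d in range(len(query))
    ]
    return context, weights


encodings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
context, weights = chunkwise_context(encodings, query=[1.0, 0.0],
                                     selected_index=2, chunk_size=2)
```

In this example the window covers positions 1 and 2 only, so the encodings at positions 0 and 3 cannot influence the context vector. A decoder would consume `context` to predict the output label distribution for the step.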

Parameter-efficient multi-task and transfer learning

Patent number: 12169779

Abstract: The present disclosure provides systems and methods that enable parameter-efficient transfer learning, multi-task learning, and/or other forms of model re-purposing such as model personalization or domain adaptation. In particular, as one example, a computing system can obtain a machine-learned model that has been previously trained on a first training dataset to perform a first task. The machine-learned model can include a first set of learnable parameters. The computing system can modify the machine-learned model to include a model patch, where the model patch includes a second set of learnable parameters. The computing system can train the machine-learned model on a second training dataset to perform a second task that is different from the first task, which may include learning new values for the second set of learnable parameters included in the model patch while keeping at least some (e.g., all) of the first set of parameters fixed.

Type: Grant

Filed: May 2, 2023

Date of Patent: December 17, 2024

Assignee: GOOGLE LLC

Inventors: Mark Sandler, Andrew Gerald Howard, Andrey Zhmoginov, Pramod Kaushik Mudrakarta
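
The model-patch idea in this abstract can be shown on a deliberately tiny model. In the sketch below the "previously trained model" is a single weight in `y = base_weight * x`, and the patch is one added bias term; both choices, along with the function name and hyperparameters, are hypothetical. Only the patch parameter is updated while the original weight stays fixed.

```python
def train_patch(base_weight, data, steps=200, lr=0.1):
    """Fit only patch_bias in y = base_weight * x + patch_bias.

    base_weight plays the role of the first (frozen) parameter set;
    patch_bias is the second set introduced by the model patch.
    """
    patch_bias = 0.0
    for _ in range(steps):
        # Gradient of mean squared error with respect to patch_bias only.
        grad = sum(
            2.0 * (base_weight * x + patch_bias - y) for x, y in data
        ) / len(data)
        patch_bias -= lr * grad
        # base_weight is never updated: the first parameter set stays fixed.
    return patch_bias


# Second task: the same slope as the first task, shifted by an offset of 3.
second_task = [(0.0, 3.0), (1.0, 5.0), (2.0, 7.0)]
learned_bias = train_patch(base_weight=2.0, data=second_task)
```

Because only one scalar is trained, adapting to the second task costs one new parameter rather than a full copy of the model, which is the parameter-efficiency argument in miniature.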

Intelligent training and education bot

Patent number: 12165633

Abstract: The present disclosure relates to Communicational and Conversational Artificial Intelligence, Machine Perception, Perceptual-User-Interface, and a professional training method. A chatbot may comprise at least one skills module. The chatbot engages with trainee(s) on communicational training on a subject matter provided by the skills module. A trainer may create, remove, or update a skills module with interaction skills and training materials through an onboarding module. A trainee can upload recorded interactions to a skills module for evaluation or for role playing an interaction without a trainer or partner. An administrator may monitor a trainee's performance, and correlate with the organization's metrics. Based on the evaluation, the trainer or chatbot may provide the trainee with feedback and recommended improvement plans. The chatbot may be implemented in an Internet-of-Things or any device. The subject matters may extend to cover different industries/markets.

Type: Grant

Filed: May 10, 2022

Date of Patent: December 10, 2024

Assignee: AskWisy, Inc.

Inventors: Patrick Pak Tak Leong, Kwok-Cheung Ellis Hung

System and method for providing voice assistant service regarding text including anaphora

Patent number: 12159111

Abstract: A system and method for providing a voice assistant service for text including an anaphor are provided. A method, performed by an electronic device, of providing a voice assistant service includes: obtaining first text generated from a first input, detecting a target word within the first text and generating common information related to the detected target word, using a first natural language understanding (NLU) model, obtaining second text generated from a second input, inputting the common information and the second text to a second NLU model, detecting an anaphor included in the second text and outputting an intent and a parameter, based on common information corresponding to the detected anaphor, using the second NLU model, and generating response information related to the intent and the parameter.

Type: Grant

Filed: November 29, 2021

Date of Patent: December 3, 2024

Assignee: Samsung Electronics Co., Ltd.

Inventors: Yeonho Lee, Munjo Kim, Sangwook Park, Youngbin Shin, Kookjin Yeo

Electronic device including personalized text to speech module and method for controlling the same

Patent number: 12159619

Abstract: According to an embodiment, an electronic device comprises: a memory and at least one processor operatively connected with the memory. The at least one processor is configured to: in response to a voice assistant application being executed, identify a pronunciation variant for which an amount of sound source data stored in the memory is less than a specified value among a plurality of pronunciation variants, identify a subject based on the identified pronunciation variant, obtain a question text corresponding to a word including the identified pronunciation variant among a plurality of words included in the subject, output a question speech corresponding to the question text, and receive an utterance after outputting the question speech.

Type: Grant

Filed: March 15, 2022

Date of Patent: December 3, 2024

Assignee: Samsung Electronics Co., Ltd.

Inventors: Cheol Ryu, Kwanghoon Kim, Junesig Sung

Inducing variation in user experience parameters based on outcomes that promote rider safety in intelligent transportation systems

Patent number: 12153423

Abstract: A system for transportation includes a vehicle interface for gathering hormonal state data of a rider in the vehicle. The system further includes an artificial intelligence-based circuit that is trained on a set of outcomes related to rider in-vehicle experience and that induces, responsive to the sensed rider hormonal state data, variation in one or more of the user experience parameters to achieve at least one desired outcome in the set of outcomes. The set of outcomes includes at least one outcome that promotes rider safety. The inducing variation includes control of timing and extent of the variation.

Type: Grant

Filed: October 31, 2022

Date of Patent: November 26, 2024

Assignee: Strong Force TP Portfolio 2022, LLC

Inventor: Charles Howard Cella

Method and system for acoustic model conditioning on non-phoneme information features

Patent number: 12154546

Abstract: A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.

Type: Grant

Filed: July 6, 2023

Date of Patent: November 26, 2024

Assignee: SoundHound AI IP, LLC.

Inventors: Zizu Gowayyed, Keyvan Mohajer

Sequence labeling apparatus, sequence labeling method, and program

Patent number: 12142258

Abstract: Without dividing speech into a unit such as a word or a character, text corresponding to the speech is labeled. A speech distributed representation sequence converting unit 11 converts an acoustic feature sequence into a speech distributed representation. A symbol distributed representation converting unit 12 converts each symbol included in the symbol sequence corresponding to the acoustic feature sequence into a symbol distributed representation. A label estimation unit 13 estimates a label corresponding to the symbol from the fixed-length vector of the symbol generated using the speech distributed representation, the symbol distributed representation, and fixed-length vectors of previous and next symbols.

Type: Grant

Filed: January 10, 2020

Date of Patent: November 12, 2024

Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Inventors: Tomohiro Tanaka, Ryo Masumura, Takanobu Oba

Methods and apparatus to segment audio and determine audio segment similarities

Patent number: 12125472

Abstract: Methods, apparatus, and systems are disclosed to segment audio and determine audio segment similarities. An example apparatus includes at least one memory storing instructions and processor circuitry to execute instructions to at least select an anchor index beat of digital audio, identify a first segment of the digital audio based on the anchor index beat to analyze, the first segment having at least two beats and a respective center beat, concatenate time-frequency data of the at least two beats and the respective center beat to form a matrix of the first segment, generate a first deep feature based on the first segment, the first deep feature indicative of a descriptor of the digital audio, and train internal coefficients to classify the first deep feature as similar to a second deep feature based on the descriptor of the first deep feature and a descriptor of a second deep feature.

Type: Grant

Filed: April 10, 2023

Date of Patent: October 22, 2024

Assignee: Gracenote, Inc.

Inventor: Matthew McCallum
Speech recognition with sequence-to-sequence models

Patent number: 12106749

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

Type: Grant

Filed: September 20, 2021

Date of Patent: October 1, 2024

Assignee: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
Machine learning model with depth processing units

Patent number: 12086704

Abstract: Representative embodiments disclose machine learning classifiers used in scenarios such as speech recognition, image captioning, machine translation, or other sequence-to-sequence embodiments. The machine learning classifiers have a plurality of time layers, each layer having a time processing block and a depth processing block. The time processing block is a recurrent neural network such as a Long Short-Term Memory (LSTM) network. The depth processing blocks can be an LSTM network, a gated Deep Neural Network (DNN) or a maxout DNN. The depth processing blocks account for the hidden states of each time layer and use summarized layer information for final input signal feature classification. An attention layer can also be used between the top depth processing block and the output layer.

Type: Grant

Filed: November 3, 2021

Date of Patent: September 10, 2024

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Jinyu Li, Liang Lu, Changliang Liu, Yifan Gong
Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

Patent number: 12087306

Abstract: In one embodiment, a method includes receiving a user's utterance comprising a word in a custom vocabulary list of the user, generating a previous token to represent a previous audio portion of the utterance, and generating a current token to represent a current audio portion of the utterance by generating a bias embedding by using the previous token to query a trie of wordpieces representing the custom vocabulary list, generating first probabilities of respective first candidate tokens likely uttered in the current audio portion based on the bias embedding and the current audio portion, generating second probabilities of respective second candidate tokens likely uttered after the previous token based on the previous token and the bias embedding, and generating the current token to represent the current audio portion of the utterance based on the first probabilities of the first candidate tokens and the second probabilities of the second candidate tokens.

Type: Grant

Filed: November 24, 2021

Date of Patent: September 10, 2024

Assignee: Meta Platforms, Inc.

Inventors: Duc Hoang Le, FNU Mahaveer, Gil Keren, Christian Fuegen, Yatharth Saraf
Method and apparatus for performing speaker diarization on mixed-bandwidth speech signals

Patent number: 12087307

Abstract: An apparatus for processing speech data may include a processor configured to: separate an input speech into speech signals; identify a bandwidth of each of the speech signals; extract speaker embeddings from the speech signals based on the bandwidth of each of the speech signals, using at least one neural network configured to receive the speech signals and output the speaker embeddings; and cluster the speaker embeddings into one or more speaker clusters, each speaker cluster corresponding to a speaker identity.

Type: Grant

Filed: November 30, 2021

Date of Patent: September 10, 2024

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Myungjong Kim, Vijendra Raj Apsingekar, Aviral Anshu, Taeyeon Ki
Animation generation and interpolation with RNN-based variational autoencoders

Patent number: 12079913

Abstract: This specification relates to the generation of animation data using recurrent neural networks. According to a first aspect of this specification, there is described a computer implemented method comprising: sampling an initial hidden state of a recurrent neural network (RNN) from a distribution; generating, using the RNN, a sequence of frames of animation from the initial state of the RNN and an initial set of animation data comprising a known initial frame of animation, the generating comprising, for each generated frame of animation in the sequence of frames of animation: inputting, into the RNN, a respective set of animation data comprising the previous frame of animation data in the sequence of frames of animation; generating, using the RNN and based on a current hidden state of the RNN, the frame of animation data; and updating the hidden state of the RNN based on the input respective set of animation data.

Type: Grant

Filed: March 31, 2022

Date of Patent: September 3, 2024

Assignee: ELECTRONIC ARTS INC.

Inventor: Elaheh Akhoundi
Method and apparatus for target exaggeration for deep learning-based speech enhancement

Patent number: 12073843

Abstract: The present disclosure relates to a speech enhancement apparatus, and specifically, to a method and apparatus for a target exaggeration for deep learning-based speech enhancement. According to an embodiment of the present disclosure, the apparatus for a target exaggeration for deep learning-based speech enhancement can preserve a speech signal from a noisy speech signal and can perform speech enhancement for removing a noise signal.

Type: Grant

Filed: October 26, 2021

Date of Patent: August 27, 2024

Assignee: Gwangju Institute of Science and Technology

Inventors: Jong Won Shin, Han Sol Kim
Voice command detection and prediction

Patent number: 12067975

Abstract: Methods, systems, and apparatuses for predicting an end of a command in a voice recognition input are described herein. The system may receive a signal comprising a voice input. The system may detect, in the voice input, data that is associated with a first portion of a command. The system may predict, based on the first portion and while the voice input is being received, a second portion of the command. The prediction may be generated by a machine learning algorithm that is trained based at least in part on historical data comprising user input data. The system may cause execution of the command, based on the first portion and the predicted second portion, prior to an end of the voice input.

Type: Grant

Filed: April 18, 2023

Date of Patent: August 20, 2024

Assignee: Comcast Cable Communications, LLC

Inventors: Rui Min, Hongcheng Wang
Tied and reduced RNN-T

Patent number: 12062363

Abstract: A recurrent neural network-transducer (RNN-T) model improves speech recognition by processing sequential non-blank symbols at each time step after an initial one. The model's prediction network receives a sequence of symbols from a final Softmax layer and employs a shared embedding matrix to create and map embeddings to each symbol, associating them with unique position vectors. These embeddings are weighted according to their similarity to their matching position vector. Subsequently, a joint network of the RNN-T model uses these weighted embeddings to output a probability distribution for potential speech recognition hypotheses at each time step, enabling more accurate transcriptions of spoken language.

Type: Grant

Filed: July 6, 2023

Date of Patent: August 13, 2024

Assignee: Google LLC

Inventors: Rami Botros, Tara Sainath
System and method for enhanced trust

Patent number: 12057128

Abstract: A system and method for publishing encoded identity data that uses at least biometric information as well as non-biometric identity and/or authentication data is disclosed. The system and method can be used for verifying a user's identity against the published encoded identity data on a distributed system, such as a distributed ledger or blockchain. Using this system, a user's identity can be verified efficiently by multiple parties, in sequence, or in parallel, as a user need only enroll in the verification process a single time. The system further includes a biometric enrollment sub-system that allows for a highly secure method of verifying a user based on unique biometric signals, such as features extracted from an audio voice signal.

Type: Grant

Filed: August 27, 2021

Date of Patent: August 6, 2024

Assignee: United Services Automobile Association (USAA)

Inventors: Vijay Jayapalan, Jeffrey David Calusinski
Influencer watch party

Patent number: 12034555

Abstract: Systems and methods for facilitating a watch party are provided. In one example, a method includes: initiating a watch party session for a host user; presenting content selected by the host user on a first user device during the watch party session; initiating a chat session concurrent with the watch party session; receiving a participation request by a guest user sent from a second user device for participating in the chat session; in response to the participation request, authenticating the guest user; presenting the content selected by the host user on the second user device; synchronizing the presentation of the content on the first user device with the presentation of the content on the second user device; and facilitating communication between the host user and the guest user during the chat session.

Type: Grant

Filed: May 10, 2023

Date of Patent: July 9, 2024

Assignee: DISH Network Technologies India Private Limited

Inventors: Melvin P. Perinchery, Preetham Kumar
Noise floor estimation and noise reduction

Patent number: 12033649

Abstract: Embodiments are disclosed for noise floor estimation and noise reduction. In an embodiment, a method comprises: obtaining an audio signal; dividing the audio signal into a plurality of buffers; determining time-frequency samples for each buffer of the audio signal; for each buffer and for each frequency, determining a median (or mean) and a measure of an amount of variation of energy based on the samples in the buffer and samples in neighboring buffers that together span a specified time range of the audio signal; combining the median (or mean) and the measure of the amount of variation of energy into a cost function; for each frequency: determining a signal energy of a particular buffer of the audio signal that corresponds to a minimum value of the cost function; selecting the signal energy as the estimated noise floor of the audio signal; and reducing, using the estimated noise floor, noise in the audio signal.

Type: Grant

Filed: January 18, 2021

Date of Patent: July 9, 2024

Assignee: DOLBY INTERNATIONAL AB

Inventors: Giulio Cengarle, Antonio Mateos Sole, Davide Scaini
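
The steps in the noise floor abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the one-energy-value-per-buffer-per-bin input layout, the additive cost (median plus a weighted population standard deviation), and the subtraction-based reduction step are all assumptions made for illustration.

```python
import statistics


def estimate_noise_floor(energy, span=1, weight=1.0):
    """Estimate one noise-floor value per frequency bin.

    energy[b][f] is the energy of buffer b at frequency bin f
    (assumed layout: one value per buffer and bin).
    span is the number of neighboring buffers used on each side.
    """
    n_buffers = len(energy)
    n_bins = len(energy[0])
    floor = []
    for f in range(n_bins):
        best_cost = None
        best_energy = None
        for b in range(n_buffers):
            lo = max(0, b - span)
            hi = min(n_buffers, b + span + 1)
            window = [energy[i][f] for i in range(lo, hi)]
            # Assumed cost: low and steady energy suggests noise only.
            cost = statistics.median(window) + weight * statistics.pstdev(window)
            if best_cost is None or cost < best_cost:
                best_cost = cost
                best_energy = energy[b][f]
        # The energy of the minimum-cost buffer is taken as the floor.
        floor.append(best_energy)
    return floor


def reduce_noise(energy, floor):
    """Subtract the floor per bin, clamping at zero (assumed reduction rule)."""
    return [[max(e - floor[f], 0.0) for f, e in enumerate(row)] for row in energy]
```

Calling `estimate_noise_floor` on a list of per-buffer energy rows returns one estimate per frequency bin, which `reduce_noise` then subtracts from every buffer. A larger `weight` favors buffers whose neighborhood is steady over buffers whose neighborhood is merely quiet.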
Convolutional neural network optimization mechanism

Patent number: 12020135

Abstract: A library of machine learning primitives is provided to optimize a machine learning model to improve the efficiency of inference operations. In one embodiment, a trained convolutional neural network (CNN) model is processed into an optimized CNN model via pruning, convolution window optimization, and quantization.

Type: Grant

Filed: August 26, 2021

Date of Patent: June 25, 2024

Assignee: Intel Corporation

Inventors: Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu
Methods and apparatus for automatically providing personalized search results

Patent number: 12019639

Abstract: This application relates to apparatus and methods for generating preference profiles that may be used to rank search results. In some examples, a computing device obtains browsing session data and determines items that were engaged, such as items that were viewed or clicked. The computing device obtains item property data, such as product descriptions, for the items, and applies a dependency parser to the item property data to identify portions that include certain words, such as nouns or adjectives, which are then identified as attributes. The computing device generates attribute data identifying portions of the item property data as item attributes. In some examples, the computing device applies one or more machine learning algorithms to the session data and/or search query to identify item attributes. The computing device may generate a profile that includes the item attributes, and may rank search results based on the attribute data, among other uses.

Type: Grant

Filed: January 25, 2023

Date of Patent: June 25, 2024

Assignee: Walmart Apollo, LLC

Inventors: Rahul Iyer, Soumya Wadhwa, Stephen Dean Guo, Kannan Achan
Systems and methods for fast filtering of audio keyword search

Patent number: 12020697

Abstract: An audio keyword searcher arranged to identify a voice segment of a received audio signal; identify, by an automatic speech recognition engine, one or more phonemes included in the voice segment; output, from the automatic speech recognition engine, the one or more phonemes to a keyword filter to detect whether the voice segment includes any of the one or more first keywords of the first keyword list and, if detected, output the one or more phonemes included in the voice segment to a decoder but, if not detected, not output the one or more phonemes included in the voice segment to the decoder. If the one or more phonemes are output to the decoder: generate a word lattice associated with the voice segment; search the word lattice for one or more second keywords, and determine whether the voice segment includes the one or more second keywords.

Type: Grant

Filed: July 15, 2020

Date of Patent: June 25, 2024

Assignee: Raytheon Applied Signal Technology, Inc.

Inventor: Jonathan C. Wintrode
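
The two-stage flow in the keyword search abstract above (a cheap phoneme-level filter that gates a more expensive decoder) can be sketched as follows. This is only an illustration under stated assumptions: the names `contains_subsequence` and `filter_then_decode`, the representation of keywords as phoneme lists, the exact contiguous-match rule, and the `decode` callback are all hypothetical and not taken from the patent.

```python
def contains_subsequence(phonemes, pattern):
    """Return True if pattern occurs as a contiguous run inside phonemes."""
    phonemes = list(phonemes)
    pattern = list(pattern)
    n, m = len(phonemes), len(pattern)
    return any(phonemes[i:i + m] == pattern for i in range(n - m + 1))


def filter_then_decode(phonemes, first_keywords, decode):
    """Run decode only when the phoneme filter finds a first-list keyword.

    first_keywords maps a keyword to its phoneme sequence (assumed format).
    decode stands in for the lattice-building decoder and is called only
    for voice segments that pass the filter.
    """
    hit = any(
        contains_subsequence(phonemes, pattern)
        for pattern in first_keywords.values()
    )
    if not hit:
        return None  # segment is dropped; the decoder never sees it
    return decode(phonemes)
```

The point of the design is that most segments fail the filter, so the cost of building and searching a word lattice is paid only for the few segments that might contain a keyword.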
Task agnostic open-set prototypes for few-shot open-set recognition

Patent number: 12019641

Abstract: Systems and techniques are provided for processing one or more data samples. For example, a neural network classifier can be trained to perform few-shot open-set recognition (FSOSR) based on a task-agnostic open-set prototype. A process can include determining one or more prototype representations for each class included in a plurality of support samples. A task-agnostic open-set prototype representation can be determined, in a same learned metric space as the one or more prototype representations. One or more distance metrics can be determined for each query sample of one or more query samples, based on the one or more prototype representations and the task-agnostic open-set prototype representation. Based on the one or more distance metrics, each query sample can be classified into one of the classes associated with the one or more prototype representations or an open-set class associated with the task-agnostic open-set prototype representation.

Type: Grant

Filed: January 12, 2023

Date of Patent: June 25, 2024

Assignee: QUALCOMM Incorporated

Inventors: Byeonggeun Kim, Juntae Lee, Simyung Chang
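
The classification step in the open-set recognition abstract above can be sketched as nearest-prototype assignment. This is a simplified illustration, not the patented method: it assumes embeddings are already computed, uses the mean of support embeddings as each class prototype, uses Euclidean distance, and takes the open-set prototype as a fixed input vector rather than a learned one. The names `OPEN_SET`, `mean_vector`, and `classify` are hypothetical.

```python
import math

OPEN_SET = "open-set"  # label for queries that match no known class


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(query, support, open_prototype):
    """Assign query to the nearest class prototype or to the open set.

    support maps a class label to its list of support embeddings.
    open_prototype lives in the same space as the class prototypes
    (assumed to be given here; in the abstract it is task-agnostic).
    """
    prototypes = {label: mean_vector(vecs) for label, vecs in support.items()}
    prototypes[OPEN_SET] = open_prototype
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))
```

Because the open-set prototype competes with the class prototypes under the same distance, rejecting an unknown sample needs no separate threshold.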
Multi-task machine learning architectures and training procedures

Patent number: 12008459

Abstract: This document relates to architectures and training procedures for multi-task machine learning models, such as neural networks. One example method involves providing a multi-task machine learning model having one or more shared layers and two or more task-specific layers. The method can also involve performing a pretraining stage on the one or more shared layers using one or more unsupervised prediction tasks. The method can also involve performing a tuning stage on the one or more shared layers and the two or more task-specific layers using respective task-specific objectives.

Type: Grant

Filed: June 17, 2019

Date of Patent: June 11, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Weizhu Chen, Pengcheng He, Xiaodong Liu, Jianfeng Gao
Dialogue system, vehicle, and method of controlling dialogue system

Patent number: 11996099

Abstract: An embodiment dialogue system includes a speech recognizer configured to convert an utterance of a user into an utterance text, a natural language understanding module configured to identify an intention of the user based on the utterance text, and a controller configured to generate a first control signal for performing control corresponding to the intention of the user, identify whether an additional control item related to the control corresponding to the intention of the user exists, and in response to the additional control item existing, generate a second control signal for displaying information about the additional control item on a display.

Type: Grant

Filed: November 18, 2021

Date of Patent: May 28, 2024

Assignees: Hyundai Motor Company, Kia Corporation

Inventors: Sungwang Kim, Donghyeon Lee, Minjae Park
Systems and methods for video and language pre-training

Patent number: 11989941

Abstract: Embodiments describe a method of video-text pre-learning to effectively learn cross-modal representations from sparse video frames and text. Specifically, an align and prompt framework provides a video and language pre-training framework that encodes the frames and text independently using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes prompting entity modeling, which enables the model to capture fine-grained region-entity alignment.

Type: Grant

Filed: December 30, 2021

Date of Patent: May 21, 2024

Assignee: Salesforce, Inc.

Inventors: Dongxu Li, Junnan Li, Chu Hong Hoi
Particular-sound detector and method, and program

Patent number: 11990151

Abstract: The present technology relates to a particular-sound detector and method, and a program that make it possible to improve the performance of detecting particular sounds. The particular-sound detector includes a particular-sound detecting section that detects a particular sound on a basis of a plurality of audio signals obtained by collecting sounds by a plurality of microphones provided to a wearable device. In addition, the plurality of the microphones includes two microphones that are equidistant at least from a sound source of the particular sound, and one microphone arranged at a predetermined position. The present technology can be applied to headphones.

Type: Grant

Filed: December 12, 2019

Date of Patent: May 21, 2024

Assignee: Sony Group Corporation

Inventors: Yuki Yamamoto, Yuji Tokozume, Toru Chinen
