Patents Examined by Hai Phan

Generating goal-oriented dialogues from documents

Patent number: 12682177

Abstract: Provided is a computer-implemented method, system, and computer program product for generating a goal-oriented dialogue from a grounding document. A processor may analyze a corpus of text. The processor may identify, based on the analyzing, one or more semantic structures that can be used to simulate a dialogue. The processor may generate, based on the identifying, a simulated dialogue, the simulated dialogue including one or more utterances from a simulated agent and one or more utterances from a simulated user to form a dialogue flow.

Type: Grant

Filed: June 24, 2022

Date of Patent: July 14, 2026

Assignee: International Business Machines Corporation

Inventors: Song Feng, Chulaka Gunasekara, Hui Wan, Jatin Ganhotra, Siva Sankalp Patel, Sachindra Joshi
Systems and methods for providing user interfaces to converse with a corpus of electronic documents via a large language model

Patent number: 12675510

Abstract: Systems and methods for providing user interfaces to converse with a corpus of electronic documents via a large language model are disclosed. Exemplary implementations may: present a user interface configured to obtain entry of user input from a user to select one or more documents to be provided as input to a large language model for an individual conversation; responsive to selection of the individual conversation, provide an individual query as a prompt to the large language model; obtain and present an individual reply from the large language model; determine an individual document from the one or more documents that is relevant to the individual reply; present the individual document in a particular portion of the user interface; and/or perform other steps.

Type: Grant

Filed: May 15, 2023

Date of Patent: July 7, 2026

Assignee: Instabase, Inc.

Inventors: Alagu Chockalingam, Aayush Dutt, Varun Jain, Timothy Serkes, Hariharan Thirugnanam, Subash Chandran Thirumaran
Interactive decoding of words from phoneme score distributions

Patent number: 12658181

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for interactive decoding of a word sequence.

Type: Grant

Filed: April 7, 2022

Date of Patent: June 16, 2026

Assignee: GDM Holding LLC

Inventors: Ioannis Alexandros Assael, Brendan Shillingford, Misha Man Ray Denil
Method and apparatus for processing audio for scene classification

Patent number: 12640161

Abstract: An audio processing method includes obtaining a first audio signal corresponding to a first frame; extracting a first feature vector by inputting the first audio signal to a first neural network; obtaining a temporal correlation vector representing a similarity between the first feature vector and at least one second feature vector extracted from at least one second audio signal corresponding to at least one second frame that is temporally before the first frame; and classifying a scene of the first audio signal by inputting the first feature vector, the at least one second feature vector, and the temporal correlation vector to a second neural network.

Type: Grant

Filed: May 9, 2023

Date of Patent: May 26, 2026

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Kyungrae Kim, Woohyun Nam
System and methods for key-phrase extraction

Patent number: 12632658

Abstract: Systems and methods for key-phrase extraction are described. The systems and methods include receiving a transcript including a text paragraph and generating key-phrase data for the text paragraph using a key-phrase extraction network. The key-phrase extraction network is trained to identify domain-relevant key-phrase data based on domain data obtained using a domain discriminator network. The systems and methods further include generating meta-data for the transcript based on the key-phrase data.

Type: Grant

Filed: February 14, 2022

Date of Patent: May 19, 2026

Assignee: ADOBE INC.

Inventors: Amir Pouran Ben Veyseh, Franck Dernoncourt, Walter W. Chang, Trung Huu Bui, Hanieh Deilamsalehy, Seunghyun Yoon, Rajiv Bhawanji Jain, Quan Hung Tran, Varun Manjunatha
Locating a moving acoustic source

Patent number: 12621623

Abstract: Processing sound signals acquired by at least one microphone, to locate a sound source emitting from a plurality of discrete positions at respective discrete points in time, in a space comprising at least one planar reflective surface. The method includes: obtaining: a first vector u_0(k) determining a direction of a first acoustic path, direct between the source and the microphone, a second vector u_n(k) representing a second acoustic path resulting from a specular reflection and arriving at the microphone, and a delay τ_n(k) of second path at the microphone, compared to the direct path; exploiting a property of the specular reflection according to which a Euclidean distance between two positions of the source at two discrete points in time is equal to a Euclidean distance between two respective positions of images of the source and derived from one or more same reflections, respectively at said two discrete points in time.

Type: Grant

Filed: February 13, 2023

Date of Patent: May 5, 2026

Assignee: ORANGE

Inventors: Srdan Kitic, Jérôme Daniel
AI-driven system and methods for personalized virtual medical and spiritual advisor avatars with adaptive therapeutic audio, biometric monitoring, and immersive AR/VR healthcare interfaces

Patent number: 12609104

Abstract: A computer-implemented system personalizes virtual advisors for immersive healthcare by creating virtual medical and spiritual avatars that resemble trusted authority figures using deepfake technology and multimodal deep neural networks. The virtual medical advisor tailors guidance by analyzing unstructured electronic health record data with natural language processing and BERT-based techniques while adapting its communication based on real-time physiological data from sensors like EEG and photoplethysmography. Concurrently, the virtual spiritual advisor offers faith-based counseling by factoring in user-declared spiritual preferences and sacred text analysis weighted for doctrinal considerations. Additional features include gamification with cryptocurrency tokens or NFTs for health activities, blockchain-based audit trails for HIPAA compliance, and federated learning with differential privacy.

Type: Grant

Filed: May 26, 2025

Date of Patent: April 21, 2026

Inventor: Michael P. Tabibian
Neutralizing distortion in audio data

Patent number: 12609127

Abstract: A system and process for pre-distorting TV shows and/or movie media enables digital transmission of the media via MPEG4/AC3 (or AAC) or MPEG4/AC4 codec for broadcast or streaming over the Internet with enhanced speech intelligibility. Processing of the entire media file is performed using pre-distortion techniques and algorithms including NN models (which includes DNN, RNN, CNN, and similar NN models) that are trained on perceptual codec induced noise, quantization noise, dynamic power level adjustment, frequency response adjustment, pitch and glottal impulse response adjustment, and other techniques. The pre-distortion process is iterative, and all combinations of pre-distortions to combat perceptual codec noise are attempted, and the result scored by an automatic speech recognition engine. The best speech recognition results and highest intelligibility scores are considered to indicate the best pre-distortion to be applied.

Type: Grant

Filed: November 1, 2023

Date of Patent: April 21, 2026

Inventors: Merrill Solomon, Glenn Bernard
Speech translation method, device, and storage medium

Patent number: 12602553

Abstract: Provided are a speech translation method, a device, and a storage medium. The method includes: extracting, through an encoder of an end-to-end speech translation model, the semantic feature of a to-be-processed speech; decoding, through a decoder of the end-to-end speech translation model, a source language text corresponding to the semantic feature from the semantic feature; decoding, through the decoder of the end-to-end speech translation model, the semantic feature according to the source language text to obtain a text sequence corresponding to the semantic feature; and splitting the text sequence to obtain a target language text corresponding to the to-be-processed speech.

Type: Grant

Filed: September 2, 2021

Date of Patent: April 14, 2026

Assignee: BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.

Inventors: Lei Li, Mingxuan Wang, Qianqian Dong, Chengqi Zhao
Voice recognition using accelerometers for sensing bone conduction

Patent number: 12603087

Abstract: Voice command recognition and natural language recognition are carried out using an accelerometer that senses signals from the vibrations of one or more bones of a user and receives no audio input. Since word recognition is made possible using solely the signal from the accelerometer from a person's bone conduction as they speak, an acoustic microphone is not needed and thus not used to collect data for word recognition. According to one embodiment, a housing contains an accelerometer and a processor, both within the same housing. The accelerometer is preferably a MEMS accelerometer which is capable of sensing the vibrations that are present in the bone of a user as the user is speaking words. A machine learning algorithm is applied to the collected data to correctly recognize words spoken by a person with significant difficulties in creating audible language.

Type: Grant

Filed: August 4, 2022

Date of Patent: April 14, 2026

Assignee: STMICROELECTRONICS S.R.L.

Inventors: Enrico Rosario Alessi, Fabio Passaniti, Nunziata Ivana Guarneri
Binaural rendering

Patent number: 12604152

Abstract: An aspect of the present disclosure relates to processing audio comprising decoding a first bitstream (b1) to obtain decoded immersive audio content (A), decoding a second bitstream (bp) to obtain pose information (P, V, V?) associated with a user of a lightweight processing device, determining a first head-pose (P?) based on the pose information, providing a downmix representation (Dmx) of the immersive audio content (A) corresponding to the first head pose (P?), rendering a set of binaural representations (BINn) of the immersive audio content (A), wherein the binaural representations correspond to a second set of head poses (Pn), computing reconstruction metadata (M) to enable reconstruction of the set of binaural representations from the downmix representation (Dmx), the metadata (M) including the first head pose (P?), and encoding the downmix representation (Dmx) and the reconstruction metadata (M) in a third bitstream (b2).

Type: Grant

Filed: February 7, 2024

Date of Patent: April 14, 2026

Assignees: Dolby Laboratories Licensing Corporation, DOLBY INTERNATIONAL AB

Inventors: Rishabh Tyagi, Stefan Bruhn, Juan Felix Torres
Detecting unintended memorization in language-model-fused ASR systems

Patent number: 12579975

Abstract: A method includes inserting a set of canary text samples into a corpus of training text samples and training an external language model on the corpus of training text samples and the set of canary text samples inserted into the corpus of training text samples. For each canary text sample, the method also includes generating a corresponding synthetic speech utterance and generating an initial transcription for the corresponding synthetic speech utterance. The method also includes rescoring the initial transcription generated for each corresponding synthetic speech utterance using the external language model. The method also includes determining a word error rate (WER) of the external language model based on the rescored initial transcriptions and the canary text samples and detecting memorization of the canary text samples by the external language model based on the WER of the external language model.

Type: Grant

Filed: April 19, 2023

Date of Patent: March 17, 2026

Assignee: Google LLC

Inventors: Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews
Combining domain-specific ontologies for language processing

Patent number: 12562244

Abstract: Methods and systems for performing a natural language processing task include identifying hypernym/hyponym relations in a depth-wise ontology and identifying synonymy relations in a breadth-wise ontology. The depth-wise ontology and the breadth-wise ontology are combined into a combined ontology using the identified hypernym/hyponym relations and the identified synonymy relations. Enhanced hypernym/hyponym relations are embedded using the combined ontology. A natural language processing task is performed using the enhanced hypernym/hyponym relations and the combined ontology.

Type: Grant

Filed: March 1, 2021

Date of Patent: February 24, 2026

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kenneth Lee Clarkson, Sanjana Sahayaraj
Mitigating false positives and/or false negatives in hot word free adaptation of automated assistant

Patent number: 12548558

Abstract: Hot word free adaptation, of one or more function(s) of an automated assistant, responsive to determining, based on gaze measure(s) and/or active speech measure(s), that a user is engaging with the automated assistant. Implementations relate to various techniques for mitigating false positive occurrences of and/or false negative occurrences, of hot word free adaptation, through utilization of personalized parameter(s) for at least some user(s) of an assistant device. The personalized parameter(s) are utilized in determining whether condition(s) are satisfied, where those condition(s), if satisfied, indicate that the user is engaging in hot word free interaction with the automated assistant and result in adaptation of function(s) of the automated assistant.

Type: Grant

Filed: January 19, 2022

Date of Patent: February 10, 2026

Assignee: GOOGLE LLC

Inventors: Tuan Nguyen, Gabriel Leblanc, Tzu-Chan Chuang, Qiong Huang, William A. Truong, Yixing Cai, Alexey Galata, Yuan Yuan
Mixture-of-expert approach to reinforcement learning-based dialogue management

Patent number: 12530536

Abstract: Systems and methods for dialogue response prediction can leverage a plurality of machine-learned language models to generate a plurality of candidate outputs, which can be processed by a dialogue management model to determine a predicted dialogue response. The plurality of machine-learned language models can include a plurality of experts trained on different intents, emotions, and/or tasks. The particular candidate output selected may be selected by the dialogue management model based on semantics determined based on a language representation. The language representation can be a representation generated by processing the conversation history of a conversation to determine conversation semantics.

Type: Grant

Filed: February 23, 2023

Date of Patent: January 20, 2026

Assignee: GOOGLE LLC

Inventors: Yinlam Chow, Ofir Nachum, Azamat Tulepbergenov
Noise suppression for speech enhancement

Patent number: 12531078

Abstract: A noise suppression method includes transforming a time-domain input signal into an input spectrum that is the spectrum of the input signal, the input signal comprising speech components and noise components, and the input spectrum comprising a speech spectrum that is the spectrum of the speech components and a noise spectrum that is the spectrum of the noise components, smoothing magnitudes of the input spectrum to provide a smoothed-magnitude input spectrum, and estimating basic suppression filter coefficients from the input spectrum and the smoothed input spectrum. The method further includes determining noise suppression filter coefficients from the estimated basic suppression filter coefficients and a spectral correlation factor, the spectral correlation factor indicating whether speech is present in the input signal or not, filtering the input spectrum based on the noise suppression filter coefficients to generate an output spectrum; and transforming the output spectrum into a time-domain output signal.

Type: Grant

Filed: March 30, 2020

Date of Patent: January 20, 2026

Assignee: Harman Becker Automotive Systems GmbH

Inventor: Vasudev Kandade Rajan
Spontaneous text to speech (TTS) synthesis

Patent number: 12505825

Abstract: The present disclosure provides methods and apparatuses for spontaneous text-to-speech (TTS) synthesis. A target text may be obtained. A fluency reference factor may be determined based at least on the target text. An acoustic feature corresponding to the target text may be generated with the fluency reference factor. A speech waveform corresponding to the target text may be generated based on the acoustic feature.

Type: Grant

Filed: April 22, 2021

Date of Patent: December 23, 2025

Assignee: Microsoft Technology Licensing, LLC

Inventors: Ran Zhang, Jian Luan, Yahuan Cong
Voice interaction method and electronic device

Patent number: 12494199

Abstract: This application provides a voice interaction method and an electronic device, and relates to the field of artificial intelligence (AI) technologies and the field of voice processing technologies. An example solution includes: An electronic device receiving first voice information sent by a second user, and the electronic device recognizing the first voice information in response to receiving the first voice information. The first voice information is used to request a voice conversation with a first user. The electronic device may have, on a basis that the electronic device recognizes that the first voice information is voice information of the second user, a voice conversation with the second user by imitating a voice of the first user and in a mode in which the first user has a voice conversation with the second user.

Type: Grant

Filed: September 26, 2022

Date of Patent: December 9, 2025

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Weiguo Li, Li Qian, Xin Jiang
Multi-scale speaker diarization for conversational AI systems and applications

Patent number: 12482487

Abstract: Disclosed are apparatuses, systems, and techniques that may use machine learning for implementing speaker diarization. The techniques include obtaining a speaker embedding for various reference times of a speech and for various differently-sized time intervals, identifying a plurality of clusters, each cluster associated with a different speaker of the speech. The techniques further include computing, using the speaker embeddings, a set of embedding weights for various differently-sized time intervals, and identifying, using the computed set of the embedding weights, one or more speakers speaking at a respective reference time.

Type: Grant

Filed: November 3, 2022

Date of Patent: November 25, 2025

Assignee: NVIDIA Corporation

Inventors: Taejin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg
Foreign language phrases learning system based on basic sentence pattern unit decomposition

Patent number: 12475312

Abstract: Disclosed is a foreign language phrases learning system based on basic sentence pattern unit decomposition, and implemented in a computing device including at least one processor and at least one memory for storing instructions executable by the processor, which includes: a sentence decomposition unit, when a natural language composed of a foreign language is input from a user, for decomposing a compound sentence corresponding to the input natural language into a plurality of basic sentences; a sentence pattern determination unit for checking one of morphemes or words contained in each of the decomposed basic sentences when the compound sentence is completely decomposed by the sentence decomposition unit, thereby determining a sentence pattern for each of the basic sentences; an additional information designation unit, when the sentence pattern for each of the basic sentences is completely determined by the sentence pattern determination unit, for designating some of the morphemes or the words contained in ea

Type: Grant

Filed: November 25, 2021

Date of Patent: November 18, 2025

Assignee: Dr SONG CO., LTD.

Inventors: Hwan Goo Song, Hyun Ji Yoon, Su Hyun Yoon, Hyun Suk Dan, Ki Ho Kim
