Patents Examined by Hai Phan
  • Patent number: 12640161
    Abstract: An audio processing method includes obtaining a first audio signal corresponding to a first frame; extracting a first feature vector by inputting the first audio signal to a first neural network; obtaining a temporal correlation vector representing a similarity between the first feature vector and at least one second feature vector extracted from at least one second audio signal corresponding to at least one second frame that is temporally before the first frame; and classifying a scene of the first audio signal by inputting the first feature vector, the at least one second feature vector, and the temporal correlation vector to a second neural network.
    Type: Grant
    Filed: May 9, 2023
    Date of Patent: May 26, 2026
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Kyungrae Kim, Woohyun Nam
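    Note: The temporal correlation vector in the abstract above can be pictured with a small sketch. The abstract does not say which similarity measure is used, so cosine similarity is an assumption here, and the function names are hypothetical rather than taken from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def temporal_correlation_vector(current, previous_frames):
    """One similarity score per earlier frame, in the order the frames are given.

    Assumption: similarity is cosine similarity; the abstract only says
    "a similarity" between the current and earlier feature vectors.
    """
    return [cosine_similarity(current, prev) for prev in previous_frames]


if __name__ == "__main__":
    current = [1.0, 0.0]
    earlier = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(temporal_correlation_vector(current, earlier))
```

    In the claimed method this vector would be passed to the second neural network together with the feature vectors themselves; the networks are outside the scope of this sketch.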
  • Patent number: 12632658
    Abstract: Systems and methods for key-phrase extraction are described. The systems and methods include receiving a transcript including a text paragraph and generating key-phrase data for the text paragraph using a key-phrase extraction network. The key-phrase extraction network is trained to identify domain-relevant key-phrase data based on domain data obtained using a domain discriminator network. The systems and methods further include generating meta-data for the transcript based on the key-phrase data.
    Type: Grant
    Filed: February 14, 2022
    Date of Patent: May 19, 2026
    Assignee: ADOBE INC.
    Inventors: Amir Pouran Ben Veyseh, Franck Dernoncourt, Walter W. Chang, Trung Huu Bui, Hanieh Deilamsalehy, Seunghyun Yoon, Rajiv Bhawanji Jain, Quan Hung Tran, Varun Manjunatha
  • Patent number: 12621623
    Abstract: Processing sound signals acquired by at least one microphone, to locate a sound source emitting from a plurality of discrete positions at respective discrete points in time, in a space comprising at least one planar reflective surface. The method includes: obtaining: a first vector u_0(k) determining a direction of a first acoustic path, direct between the source and the microphone, a second vector u_n(k) representing a second acoustic path resulting from a specular reflection and arriving at the microphone, and a delay τ_n(k) of second path at the microphone, compared to the direct path; exploiting a property of the specular reflection according to which a Euclidean distance between two positions of the source at two discrete points in time is equal to a Euclidean distance between two respective positions of images of the source and derived from one or more same reflections, respectively at said two discrete points in time.
    Type: Grant
    Filed: February 13, 2023
    Date of Patent: May 5, 2026
    Assignee: ORANGE
    Inventors: Srdan Kitic, Jérôme Daniel
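    Note: The specular-reflection property cited in the abstract above follows from the fact that mirroring across a plane preserves distances. The following is a minimal numeric check of that geometric fact only, not the patented localization method; the helper names and the example plane are illustrative assumptions.

```python
import math


def reflect(point, plane_point, unit_normal):
    """Mirror image of `point` across the plane through `plane_point`
    whose normal is `unit_normal` (assumed to have length 1)."""
    # Signed distance from the point to the plane.
    d = sum((p - q) * n for p, q, n in zip(point, plane_point, unit_normal))
    return tuple(p - 2.0 * d * n for p, n in zip(point, unit_normal))


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Hypothetical reflective surface: the floor z = 0.
    floor_point, floor_normal = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    # Two hypothetical source positions at two points in time.
    s1, s2 = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
    i1 = reflect(s1, floor_point, floor_normal)
    i2 = reflect(s2, floor_point, floor_normal)
    print(distance(s1, s2), distance(i1, i2))  # the two values agree
```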
  • Patent number: 12609104
    Abstract: A computer-implemented system personalizes virtual advisors for immersive healthcare by creating virtual medical and spiritual avatars that resemble trusted authority figures using deepfake technology and multimodal deep neural networks. The virtual medical advisor tailors guidance by analyzing unstructured electronic health record data with natural language processing and BERT-based techniques while adapting its communication based on real-time physiological data from sensors like EEG and photoplethysmography. Concurrently, the virtual spiritual advisor offers faith-based counseling by factoring in user-declared spiritual preferences and sacred text analysis weighted for doctrinal considerations. Additional features include gamification with cryptocurrency tokens or NFTs for health activities, blockchain-based audit trails for HIPAA compliance, and federated learning with differential privacy.
    Type: Grant
    Filed: May 26, 2025
    Date of Patent: April 21, 2026
    Inventor: Michael P. Tabibian
  • Patent number: 12609127
    Abstract: A system and process for pre-distorting TV shows and/or movie media enables digital transmission of the media via MPEG4/AC3 (or AAC) or MPEG4/AC4 codec for broadcast or streaming over the Internet with enhanced speech intelligibility. Processing of the entire media file is performed using pre-distortion techniques and algorithms including NN models (which includes DNN, RNN, CNN, and similar NN models) that are trained on perceptual codec induced noise, quantization noise, dynamic power level adjustment, frequency response adjustment, pitch and glottal impulse response adjustment, and other techniques. The pre-distortion process is iterative, and all combinations of pre-distortions to combat perceptual codec noise are attempted, and the result scored by an automatic speech recognition engine. The best speech recognition results and highest intelligibility scores are considered to indicate the best pre-distortion to be applied.
    Type: Grant
    Filed: November 1, 2023
    Date of Patent: April 21, 2026
    Inventors: Merrill Solomon, Glenn Bernard
  • Patent number: 12602553
    Abstract: Provided are a speech translation method, a device, and a storage medium. The method includes: extracting, through an encoder of an end-to-end speech translation model, the semantic feature of a to-be-processed speech; decoding, through a decoder of the end-to-end speech translation model, a source language text corresponding to the semantic feature from the semantic feature; decoding, through the decoder of the end-to-end speech translation model, the semantic feature according to the source language text to obtain a text sequence corresponding to the semantic feature; and splitting the text sequence to obtain a target language text corresponding to the to-be-processed speech.
    Type: Grant
    Filed: September 2, 2021
    Date of Patent: April 14, 2026
    Assignee: BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.
    Inventors: Lei Li, Mingxuan Wang, Qianqian Dong, Chengqi Zhao
  • Patent number: 12603087
    Abstract: Voice command recognition and natural language recognition are carried out using an accelerometer that senses signals from the vibrations of one or more bones of a user and receives no audio input. Since word recognition is made possible using solely the signal from the accelerometer from a person's bone conduction as they speak, an acoustic microphone is not needed and thus not used to collect data for word recognition. According to one embodiment, a housing contains an accelerometer and a processor, both within the same housing. The accelerometer is preferably a MEMS accelerometer which is capable of sensing the vibrations that are present in the bone of a user as the user is speaking words. A machine learning algorithm is applied to the collected data to correctly recognize words spoken by a person with significant difficulties in creating audible language.
    Type: Grant
    Filed: August 4, 2022
    Date of Patent: April 14, 2026
    Assignee: STMICROELECTRONICS S.R.L.
    Inventors: Enrico Rosario Alessi, Fabio Passaniti, Nunziata Ivana Guarneri
  • Patent number: 12604152
    Abstract: An aspect of the present disclosure relates to processing audio comprising decoding a first bitstream (b1) to obtain decoded immersive audio content (A), decoding a second bitstream (bp) to obtain pose information (P, V, V′) associated with a user of a lightweight processing device, determining a first head-pose (P′) based on the pose information, providing a downmix representation (Dmx) of the immersive audio content (A) corresponding to the first head pose (P′), rendering a set of binaural representations (BINn) of the immersive audio content (A), wherein the binaural representations correspond to a second set of head poses (Pn), computing reconstruction metadata (M) to enable reconstruction of the set of binaural representations from the downmix representation (Dmx), the metadata (M) including the first head pose (P′), and encoding the downmix representation (Dmx) and the reconstruction metadata (M) in a third bitstream (b2).
    Type: Grant
    Filed: February 7, 2024
    Date of Patent: April 14, 2026
    Assignees: Dolby Laboratories Licensing Corporation, DOLBY INTERNATIONAL AB
    Inventors: Rishabh Tyagi, Stefan Bruhn, Juan Felix Torres
  • Patent number: 12579975
    Abstract: A method includes inserting a set of canary text samples into a corpus of training text samples and training an external language model on the corpus of training text samples and the set of canary text samples inserted into the corpus of training text samples. For each canary text sample, the method also includes generating a corresponding synthetic speech utterance and generating an initial transcription for the corresponding synthetic speech utterance. The method also includes rescoring the initial transcription generated for each corresponding synthetic speech utterance using the external language model. The method also includes determining a word error rate (WER) of the external language model based on the rescored initial transcriptions and the canary text samples and detecting memorization of the canary text samples by the external language model based on the WER of the external language model.
    Type: Grant
    Filed: April 19, 2023
    Date of Patent: March 17, 2026
    Assignee: Google LLC
    Inventors: Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews
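    Note: The word error rate (WER) mentioned in the abstract above is conventionally the word-level edit distance between a reference and a transcription, divided by the number of reference words. The sketch below shows that standard definition only; the function name is hypothetical, and the abstract does not state how the WER is turned into a memorization decision.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count.

    Computed with a standard Levenshtein distance over whitespace-split words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical canary sentence and a rescored transcription of it.
    canary = "the cat sat on the mat"
    transcript = "the cat sit on mat"
    print(word_error_rate(canary, transcript))
```

    In the claimed setup the reference would be a canary text sample and the hypothesis the rescored transcription of its synthetic utterance.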
  • Patent number: 12562244
    Abstract: Methods and systems for performing a natural language processing task include identifying hypernym/hyponym relations in a depth-wise ontology and identifying synonymy relations in a breadth-wise ontology. The depth-wise ontology and the breadth-wise ontology are combined into a combined ontology using the identified hypernym/hyponym relations and the identified synonymy relations. Enhanced hypernym/hyponym relations are embedded using the combined ontology. A natural language processing task is performed using the enhanced hypernym/hyponym relations and the combined ontology.
    Type: Grant
    Filed: March 1, 2021
    Date of Patent: February 24, 2026
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth Lee Clarkson, Sanjana Sahayaraj
  • Patent number: 12548558
    Abstract: Hot word free adaptation, of one or more function(s) of an automated assistant, responsive to determining, based on gaze measure(s) and/or active speech measure(s), that a user is engaging with the automated assistant. Implementations relate to various techniques for mitigating false positive occurrences of and/or false negative occurrences, of hot word free adaptation, through utilization of personalized parameter(s) for at least some user(s) of an assistant device. The personalized parameter(s) are utilized in determining whether condition(s) are satisfied, where those condition(s), if satisfied, indicate that the user is engaging in hot word free interaction with the automated assistant and result in adaptation of function(s) of the automated assistant.
    Type: Grant
    Filed: January 19, 2022
    Date of Patent: February 10, 2026
    Assignee: GOOGLE LLC
    Inventors: Tuan Nguyen, Gabriel Leblanc, Tzu-Chan Chuang, Qiong Huang, William A. Truong, Yixing Cai, Alexey Galata, Yuan Yuan
  • Patent number: 12530536
    Abstract: Systems and methods for dialogue response prediction can leverage a plurality of machine-learned language models to generate a plurality of candidate outputs, which can be processed by a dialogue management model to determine a predicted dialogue response. The plurality of machine-learned language models can include a plurality of experts trained on different intents, emotions, and/or tasks. The particular candidate output selected may be selected by the dialogue management model based on semantics determined based on a language representation. The language representation can be a representation generated by processing the conversation history of a conversation to determine conversation semantics.
    Type: Grant
    Filed: February 23, 2023
    Date of Patent: January 20, 2026
    Assignee: GOOGLE LLC
    Inventors: Yinlam Chow, Ofir Nachum, Azamat Tulepbergenov
  • Patent number: 12531078
    Abstract: A noise suppression method includes transforming a time-domain input signal into an input spectrum that is the spectrum of the input signal, the input signal comprising speech components and noise components, and the input spectrum comprising a speech spectrum that is the spectrum of the speech components and a noise spectrum that is the spectrum of the noise components, smoothing magnitudes of the input spectrum to provide a smoothed-magnitude input spectrum, and estimating basic suppression filter coefficients from the input spectrum and the smoothed input spectrum. The method further includes determining noise suppression filter coefficients from the estimated basic suppression filter coefficients and a spectral correlation factor, the spectral correlation factor indicating whether speech is present in the input signal or not, filtering the input spectrum based on the noise suppression filter coefficients to generate an output spectrum; and transforming the output spectrum into a time-domain output signal.
    Type: Grant
    Filed: March 30, 2020
    Date of Patent: January 20, 2026
    Assignee: Harman Becker Automotive Systems GmbH
    Inventor: Vasudev Kandade Rajan
  • Patent number: 12505825
    Abstract: The present disclosure provides methods and apparatuses for spontaneous text-to-speech (TTS) synthesis. A target text may be obtained. A fluency reference factor may be determined based at least on the target text. An acoustic feature corresponding to the target text may be generated with the fluency reference factor. A speech waveform corresponding to the target text may be generated based on the acoustic feature.
    Type: Grant
    Filed: April 22, 2021
    Date of Patent: December 23, 2025
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Ran Zhang, Jian Luan, Yahuan Cong
  • Patent number: 12494199
    Abstract: This application provides a voice interaction method and an electronic device, and relates to the field of artificial intelligence (AI) technologies and the field of voice processing technologies. An example solution includes: An electronic device receiving first voice information sent by a second user, and the electronic device recognizing the first voice information in response to receiving the first voice information. The first voice information is used to request a voice conversation with a first user. The electronic device may have, on a basis that the electronic device recognizes that the first voice information is voice information of the second user, a voice conversation with the second user by imitating a voice of the first user and in a mode in which the first user has a voice conversation with the second user.
    Type: Grant
    Filed: September 26, 2022
    Date of Patent: December 9, 2025
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Weiguo Li, Li Qian, Xin Jiang
  • Patent number: 12482487
    Abstract: Disclosed are apparatuses, systems, and techniques that may use machine learning for implementing speaker diarization. The techniques include obtaining a speaker embedding for various reference times of a speech and for various differently-sized time intervals, identifying a plurality of clusters, each cluster associated with a different speaker of the speech. The techniques further include computing, using the speaker embeddings, a set of embedding weights for various differently-sized time intervals, and identifying, using the computed set of the embedding weights, one or more speakers speaking at a respective reference time.
    Type: Grant
    Filed: November 3, 2022
    Date of Patent: November 25, 2025
    Assignee: NVIDIA Corporation
    Inventors: Taejin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg
  • Patent number: 12475312
    Abstract: Disclosed is a foreign language phrases learning system based on basic sentence pattern unit decomposition, and implemented in a computing device including at least one processor and at least one memory for storing instructions executable by the processor, which includes: a sentence decomposition unit, when a natural language composed of a foreign language is input from a user, for decomposing a compound sentence corresponding to the input natural language into a plurality of basic sentences; a sentence pattern determination unit for checking one of morphemes or words contained in each of the decomposed basic sentences when the compound sentence is completely decomposed by the sentence decomposition unit, thereby determining a sentence pattern for each of the basic sentences; an additional information designation unit, when the sentence pattern for each of the basic sentences is completely determined by the sentence pattern determination unit, for designating some of the morphemes or the words contained in ea
    Type: Grant
    Filed: November 25, 2021
    Date of Patent: November 18, 2025
    Assignee: Dr SONG CO., LTD.
    Inventors: Hwan Goo Song, Hyun Ji Yoon, Su Hyun Yoon, Hyun Suk Dan, Ki Ho Kim
  • Patent number: 12456457
    Abstract: A method, computer program, and computer system is provided for automated speech recognition. Audio data corresponding to one or more speakers is received. Covariance matrices of target speech and noise associated with the received audio data are estimated based on a gated recurrent unit-based network. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated by a minimum variance distortionless response function based on the estimated covariance matrices.
    Type: Grant
    Filed: May 23, 2022
    Date of Patent: October 28, 2025
    Assignee: TENCENT AMERICA LLC
    Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu
  • Patent number: 12451142
    Abstract: Implementations relate to an automated assistant that is responsive, without requiring an invocation phrase or other invocation input(s), to certain spoken utterances when certain display content is being accessed by a user. The display content can be processed to identify certain inputs and/or other intents and parameters that are associated with assistant operations and are relevant to the display content. Thereafter, the automated assistant can determine whether any spoken utterances from the user correspond to those certain inputs, intents, and/or parameters. In response to receiving such a spoken utterance, the automated assistant can initialize performance of the relevant operation without necessitating that the user provides a preceding invocation phrase or other invocation input(s). When other display content is being accessed, the automated assistant can repeat the process for other inputs and operations.
    Type: Grant
    Filed: July 28, 2022
    Date of Patent: October 21, 2025
    Assignee: GOOGLE LLC
    Inventors: Pu-sen Chao, Alex Fandrianto, Muhammad Umair
  • Patent number: 12443805
    Abstract: A method for generating a first case dataset in a first language. The method includes receiving adverse event data. The method further includes determining case data including general case data and regional case data and providing the case data to a translator computing device to enable display on a user interface including multiple duolingual text fields with a first language text field including at least a portion of the text data in the first language and a second language text field adjacent the first language text field. The method further includes receiving the text data in the second language from a translator computing device. The text data in the second language is received via the second language text fields of the plurality of duolingual text fields. The method further includes generating and outputting the first case dataset including the text data in the first language.
    Type: Grant
    Filed: December 8, 2022
    Date of Patent: October 14, 2025
    Assignee: Veeva Systems Inc.
    Inventors: Marius K. Mortensen, Asaf Roll, Raagi Pandya, Ying Zhuo Wang, Florian Emmanuel Bernard Gilbert Letourneux, Zhen Tan, Piotr Kuchnio, Yangyang Xu