Patents Examined by Michael Colucci
  • Patent number: 11282516
    Abstract: Embodiments of the present disclosure provide a human-machine interaction processing method, an apparatus thereof, a user terminal, a processing server and a system. On the user terminal side, the method includes: receiving an interaction request voice input by a user, and collecting video data of the user while the interaction request voice is input; obtaining an interaction response voice corresponding to the interaction request voice, where the interaction response voice is obtained according to expression information of the user that is included in the video data captured while the interaction request voice is input; and outputting the interaction response voice to the user. The method imbues the interaction response voice with an emotional tone that matches the current emotion of the user, so that the human-machine interaction process is no longer monotonous, greatly enhancing the user experience.
    Type: Grant
    Filed: February 18, 2019
    Date of Patent: March 22, 2022
    Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Shuangshuang Qiao, Kun Liu, Yang Liang, Xiangyue Lin, Chao Han, Mingfa Zhu, Jiangliang Guo, Xu Li, Jun Liu, Shuo Li, Shiming Yin
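The emotion-conditioned response described above can be sketched as a mapping from a detected emotion label to prosody parameters for the synthesized response. This is a minimal illustration, not the patented method: the emotion labels, parameter names, and values are all hypothetical, and the video-based expression classifier that would produce the label is assumed, not shown.

```python
# Hypothetical prosody table: emotion label -> synthesis parameters.
EMOTION_TO_TONE = {
    "happy": {"pitch_shift": 2.0, "rate": 1.1},
    "sad": {"pitch_shift": -1.5, "rate": 0.9},
    "neutral": {"pitch_shift": 0.0, "rate": 1.0},
}

def select_response_style(emotion: str) -> dict:
    """Map the user's detected emotion to prosody parameters for the
    interaction response voice, falling back to a neutral tone."""
    return EMOTION_TO_TONE.get(emotion, EMOTION_TO_TONE["neutral"])
```

An unrecognized emotion simply yields the neutral style, so the response voice degrades gracefully rather than failing.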
  • Patent number: 11276392
    Abstract: A method may include obtaining audio originating at a remote device during a communication session conducted between a first device and the remote device and obtaining a transcription of the audio. The method may also include processing the audio to generate processed audio. In some embodiments, the audio may be processed by a neural network that is trained with respect to an analog voice network and the processed audio may be formatted with respect to communication over the analog voice network. The method may further include processing the transcription to generate a processed transcription that is formatted with respect to communication over the analog voice network and multiplexing the processed audio with the processed transcription to obtain combined data. The method may also include communicating, to the first device during the communication session, the combined data over a same communication channel of the analog voice network.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: March 15, 2022
    Inventor: David Thomson
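The multiplexing step above interleaves processed audio with a processed transcription into one stream for a single channel. A minimal sketch, assuming a simple tag-length-payload framing (the actual framing for an analog voice network is not specified in the abstract):

```python
def multiplex(audio: bytes, transcript: bytes, chunk: int = 64) -> bytes:
    """Interleave tagged audio (0x01) and text (0x02) frames into one
    byte stream. Frame layout (assumed): 1-byte tag, 1-byte length,
    payload; chunk must therefore be at most 255."""
    out = bytearray()
    streams = [(0x01, audio), (0x02, transcript)]
    offsets = [0, 0]
    while any(off < len(data) for off, (_, data) in zip(offsets, streams)):
        for i, (tag, data) in enumerate(streams):
            part = data[offsets[i]:offsets[i] + chunk]
            if part:  # skip exhausted streams
                out += bytes([tag, len(part)]) + part
                offsets[i] += chunk
    return bytes(out)

def demultiplex(blob: bytes):
    """Recover the two original streams from the combined data."""
    audio, text = bytearray(), bytearray()
    i = 0
    while i < len(blob):
        tag, length = blob[i], blob[i + 1]
        (audio if tag == 0x01 else text).extend(blob[i + 2:i + 2 + length])
        i += 2 + length
    return bytes(audio), bytes(text)
```

Because frames are tagged, the receiver can split the combined data back into audio and transcription without a separate control channel.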
  • Patent number: 11264008
    Abstract: A method and an electronic device are provided for translating a speech signal between a first language and a second language with minimized translation delay, by translating fewer than all words of the speech signal according to a level of understanding of the second language by the user who receives the translation.
    Type: Grant
    Filed: October 18, 2018
    Date of Patent: March 1, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ji-sang Yu, Sang-ha Kim, Jong-youb Ryu, Yoon-jung Choi, Eun-kyoung Kim, Jae-won Lee
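The selective-translation idea above can be illustrated with a filter that translates only words assumed to be outside the user's second-language vocabulary level. The vocabulary table and level scheme here are hypothetical stand-ins for however the device models the user's understanding:

```python
# Hypothetical per-level vocabularies of words the user already knows.
KNOWN_AT_LEVEL = {
    1: {"hello", "thank", "you"},
    2: {"hello", "thank", "you", "station", "ticket"},
}

def translate_partially(words, level, translate):
    """Pass through words the user understands at this level; translate
    the rest. Skipping known words is what reduces translation delay."""
    known = KNOWN_AT_LEVEL.get(level, set())
    return [w if w in known else translate(w) for w in words]
```

With a higher level, more words pass through untranslated, so the translation workload shrinks as the user's understanding grows.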
  • Patent number: 11264009
    Abstract: A computer-implemented method for training a dialogue response generation system and the dialogue response generation system are provided. The method includes arranging a first multimodal encoder-decoder for the dialogue response generation or video description having a first input and a first output, wherein the first multimodal encoder-decoder has been pretrained by training audio-video datasets with training video description sentences, arranging a second multimodal encoder-decoder for dialog response generation having a second input and a second output, providing first audio-visual datasets with first corresponding video description sentences to the first input of the first multimodal encoder-decoder, wherein the first encoder-decoder generates first output values based on the first audio-visual datasets with the first corresponding description sentences, providing the first audio-visual datasets excluding the first corresponding video description sentences to the second multimodal encoder-decoder.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: March 1, 2022
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Anoop Cherian, Tim Marks, Takaaki Hori
  • Patent number: 11257496
    Abstract: A method and apparatus for facilitating persona-based agent interactions with online visitors are disclosed. A plurality of persona-related attributes is extracted from a textual transcript of each interaction between an agent of an enterprise and an online visitor. A feature vector data representation is generated based on the plurality of persona-related attributes extracted from each interaction to configure a plurality of feature vector data representations. The plurality of feature vector data representations is classified based on a plurality of persona-based clusters, which enables classification of the plurality of online visitors into the plurality of persona-based clusters. A learning model is trained for each persona-based cluster using utterances of online visitors classified into a respective persona-based cluster. The learning model is trained to mimic a visitor persona representative of the respective persona-based cluster.
    Type: Grant
    Filed: September 26, 2019
    Date of Patent: February 22, 2022
    Assignee: [24]7.ai, Inc.
    Inventor: Abir Chakraborty
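The classification step above assigns each visitor's feature vector to one of the persona-based clusters. A minimal nearest-centroid sketch, assuming the cluster centroids already exist from a prior clustering step and that feature vectors are nonzero:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def classify_visitor(feature_vec, persona_centroids):
    """Assign a visitor's feature vector to the persona-based cluster
    whose centroid is most similar (hypothetical cluster names)."""
    return max(persona_centroids,
               key=lambda p: cosine(feature_vec, persona_centroids[p]))
```

Each cluster's visitors would then supply the utterances used to train that cluster's persona-mimicking model.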
  • Patent number: 11250838
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a video speech recognition model having a plurality of model parameters on a set of unlabeled video-audio data using a trained audio speech recognition model. During the training, the parameter values of the trained audio speech recognition model are generally held fixed and only the parameter values of the video speech recognition model are adjusted. Once trained, the video speech recognition model can be used to recognize speech from video when corresponding audio is not available.
    Type: Grant
    Filed: November 18, 2019
    Date of Patent: February 15, 2022
    Assignee: DeepMind Technologies Limited
    Inventors: Brendan Shillingford, Ioannis Alexandros Assael, Joao Ferdinando Gomes de Freitas
  • Patent number: 11238885
    Abstract: A computer-implemented technique for animating a visual representation of a face based on spoken words of a speaker is described herein. A computing device receives an audio sequence comprising content features reflective of spoken words uttered by a speaker. The computing device generates latent content variables and latent style variables based upon the audio sequence. The latent content variables are used to synchronize movement of lips on the visual representation to the spoken words uttered by the speaker. The latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words and are used to synchronize movement of the full facial features of the visual representation to the spoken words uttered by the speaker. The computing device causes the visual representation of the face to be animated on a display based upon the latent content variables and the latent style variables.
    Type: Grant
    Filed: October 29, 2018
    Date of Patent: February 1, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Gaurav Mittal, Baoyuan Wang
  • Patent number: 11233775
    Abstract: A method and system for protecting user privacy in audio content is disclosed. An audio content including private information related to at least one user is received. The audio content is segmented to generate a plurality of audio blocks. Each audio block is associated with a sequence number based on a respective chronological position in the audio content. A random key of predefined length is generated for each audio block. The plurality of audio blocks are randomly distributed to a plurality of agents for audio-to-text transcription. The random distribution is configured to scramble a data context for protecting the user privacy of the at least one user during the audio-to-text transcription. A textual transcript corresponding to the audio content is generated based on the audio-to-text transcription, the sequence number and the random key generated for each audio block.
    Type: Grant
    Filed: June 11, 2021
    Date of Patent: January 25, 2022
    Assignee: ZOI MEET B.V.
    Inventors: Neng Ming Yap, Kaarmuhilan Kalaiyarasan, Kevin Oranje
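The scrambling scheme above can be sketched directly: each audio block gets a sequence number and a random key, the blocks are shuffled before distribution to agents, and the transcript is rebuilt by sorting on sequence numbers. This is an illustrative sketch of the data flow only; the key's role in verification/lookup is assumed, and the helper names are hypothetical:

```python
import random
import secrets

def split_and_scramble(audio_blocks):
    """Tag each block with its chronological sequence number and a
    random key, then shuffle the distribution order so no single agent
    sees a coherent data context."""
    tagged = [
        {"seq": i, "key": secrets.token_hex(8), "block": b}
        for i, b in enumerate(audio_blocks)
    ]
    shuffled = tagged[:]
    random.shuffle(shuffled)
    return tagged, shuffled

def reassemble(transcribed):
    """Rebuild the full transcript by sorting agent results on the
    original sequence numbers."""
    return " ".join(t["text"] for t in sorted(transcribed, key=lambda t: t["seq"]))
```

Because ordering information lives only in the sequence numbers held by the reassembling party, an individual agent transcribing a shuffled block learns little about the surrounding conversation.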
  • Patent number: 11232782
    Abstract: Embodiments are associated with a speaker-independent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-independent attention-based encoder-decoder model associated with a first output distribution, a speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-dependent attention-based encoder-decoder model associated with a second output distribution, training of the second attention-based encoder-decoder model to classify output tokens based on input speech frames of a target speaker and simultaneously training the speaker-dependent attention-based encoder-decoder model to maintain a similarity between the first output distribution and the second output distribution, and performing automatic speech recognition on speech frames of the target speaker using the trained speaker-dependent attention-based encoder-decoder model.
    Type: Grant
    Filed: November 6, 2019
    Date of Patent: January 25, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong
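The joint objective above, adapting a speaker-dependent model while keeping its output distribution close to the speaker-independent one, is commonly realized as cross-entropy plus a KL-divergence term. A minimal single-frame sketch under that assumption (the interpolation weight `rho` and the exact loss form are illustrative, not taken from the patent):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def adaptation_loss(sd_logits, si_logits, target, rho=0.5):
    """Cross-entropy on the target speaker's token plus a KL term that
    keeps the speaker-dependent (SD) distribution close to the
    speaker-independent (SI) one."""
    p_sd = softmax(sd_logits)
    p_si = softmax(si_logits)
    ce = -math.log(p_sd[target])
    kl = sum(q * math.log(q / p) for q, p in zip(p_si, p_sd))
    return (1 - rho) * ce + rho * kl
```

When the two models agree exactly, the KL term vanishes and only the speaker-specific cross-entropy drives the update.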
  • Patent number: 11227688
    Abstract: A method for automatically generating a note summarizing a conversation between a patient and a healthcare provider is disclosed. A workstation is provided with a tool for rendering an audio recording of the conversation and a display for displaying a transcript of the audio recording obtained from a speech-to-text engine. The display of the workstation includes a first transcript region for display of the transcript and a second note region for simultaneous display of elements of a note summarizing the conversation. Words or phrases in the transcript related to medical topics relating to the patient are extracted with the aid of a trained machine learning model. The extracted words or phrases are highlighted in the transcript and displayed in the note region.
    Type: Grant
    Filed: May 24, 2018
    Date of Patent: January 18, 2022
    Assignee: Google LLC
    Inventors: Melissa Strader, William Ito, Christopher Co, Katherine Chou, Alvin Rajkomar, Rebecca Rolfe
  • Patent number: 11222060
    Abstract: In an example, an apparatus having a voice assistant application that generates a graphical image response is provided. The apparatus includes a microphone and a processor in communication with the microphone. The microphone receives a secure voice assistant mode activation command and a voice command. The processor is to execute a voice assistant application, wherein the voice assistant application is to generate a graphical image response in response to the secure voice assistant mode activation command and the voice command, to change a privacy setting in the apparatus in response to the secure voice assistant mode activation command, to transmit the voice command from the microphone to the voice assistant application, and to transmit the graphical image response to a display.
    Type: Grant
    Filed: June 16, 2017
    Date of Patent: January 11, 2022
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Alexander Wayne Clark, Kent E. Biggs, Henry Wang
  • Patent number: 11217246
    Abstract: Disclosed are a communication robot and a method for operating the same, capable of smoothly processing speech recognition by executing an artificial intelligence (AI) algorithm and/or a machine learning algorithm in a 5G environment connected for the Internet of Things. A method for operating a communication robot according to an embodiment of the present disclosure may include collecting speech uttered by two or more utterers approaching within a predetermined distance from the communication robot, collecting photographed images of the two or more utterers, determining whether the case where the utterers of a wake-up word and a continuous word included in the uttered speech are the same is a first case, or the case where the utterers of the wake-up word and the continuous word included in the uttered speech are different is a second case, and determining a voice reception enhancement direction according to the first case or the second case.
    Type: Grant
    Filed: September 26, 2019
    Date of Patent: January 4, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Jae Pil Seo, Hyeon Sik Choi
  • Patent number: 11217225
    Abstract: The application discloses a multi-type acoustic feature integration method and system based on deep neural networks. The method and system include using a labeled speech data set to train and build a multi-type acoustic feature integration model based on deep neural networks, to determine or update the network parameters of the multi-type acoustic feature integration model; the method and system include inputting the multiple types of acoustic features extracted from the testing speech into the trained multi-type acoustic feature integration model, and extracting the deep integrated feature vectors at frame level or segment level. The solution supports integrated feature extraction for multiple types of acoustic features in different kinds of speech tasks, such as speech recognition, speech wake-up, spoken language recognition, speaker recognition, and anti-spoofing.
    Type: Grant
    Filed: January 21, 2021
    Date of Patent: January 4, 2022
    Assignee: XIAMEN UNIVERSITY
    Inventors: Lin Li, Zheng Li, Qingyang Hong
  • Patent number: 11217245
    Abstract: A wake-up word for a digital assistant may be specified by a user to trigger the digital assistant to respond to the wake-up word, with the user providing one or more initial pronunciations of the wake-up word. The wake-up word may be unique, or at least not determined beforehand by a device manufacturer or developer of the digital assistant. The initial pronunciation(s) of the wake-up word may then be augmented with other potential pronunciations of the wake-up word that might be provided in the future, and those other potential pronunciations may then be pruned down to a threshold number of other potential pronunciations. One or more recordings of the initial pronunciation(s) of the wake-up word may then be used to train a phoneme recognizer model to better recognize future instances of the wake-up word being spoken by the user or another person using the initial pronunciation or other potential pronunciations.
    Type: Grant
    Filed: August 29, 2019
    Date of Patent: January 4, 2022
    Assignee: Sony Interactive Entertainment Inc.
    Inventors: Lakshmish Kaushik, Zhenhao Ge
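The augment-then-prune step above can be sketched as generating alternative phoneme sequences by applying plausible substitutions, then keeping only a threshold number of variants closest to the original pronunciation. The substitution table and the edit-count ranking are hypothetical illustrations, not the patented augmentation method:

```python
# Hypothetical phoneme substitutions that model common mispronunciations.
SUBSTITUTIONS = {"t": ["d"], "ih": ["iy"], "ah": ["aa"]}

def variants(phonemes):
    """Generate single-substitution pronunciation variants."""
    out = set()
    for i, p in enumerate(phonemes):
        for alt in SUBSTITUTIONS.get(p, []):
            out.add(tuple(phonemes[:i] + [alt] + phonemes[i + 1:]))
    return out

def prune(candidates, original, threshold):
    """Keep at most `threshold` variants, preferring those that differ
    from the initial pronunciation in the fewest positions."""
    def edits(v):
        return sum(a != b for a, b in zip(v, original))
    return sorted(candidates, key=edits)[:threshold]
```

The surviving variants, together with recordings of the initial pronunciation, would then feed the phoneme recognizer's training.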
  • Patent number: 11211051
    Abstract: A method and an apparatus for processing audio data are provided. The method includes: acquiring a first piece of audio data; and processing the first piece of audio data based on an antialias filter, to generate a second piece of audio data, a sampling rate of the second piece of audio data being smaller than a sampling rate of the first piece of audio data; the antialias filter being generated by: inputting training voice data in a training sample into an initial antialias filter; inputting the output of the initial antialias filter into a training speech recognition model, and generating a training speech recognition result; and adjusting the initial antialias filter based on the training speech recognition result and a target speech recognition result of the training voice data in the training sample.
    Type: Grant
    Filed: February 28, 2020
    Date of Patent: December 28, 2021
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventor: Chao Tian
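The filter-then-downsample pipeline above can be illustrated with a fixed low-pass filter in place of the learned antialias filter: smooth the signal first, then keep every Nth sample. The moving-average filter here is a deliberately simple stand-in; the patent's filter is instead tuned by backpropagating a speech-recognition loss:

```python
def downsample(signal, factor, taps=4):
    """Apply a simple moving-average low-pass filter (a stand-in for
    the learned antialias filter), then decimate by `factor` so the
    output sampling rate is 1/factor of the input rate."""
    filtered = []
    for i in range(len(signal)):
        window = signal[max(0, i - taps + 1):i + 1]
        filtered.append(sum(window) / len(window))
    return filtered[::factor]
```

Without the low-pass step, frequencies above half the new sampling rate would alias into the downsampled signal and degrade downstream recognition.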
  • Patent number: 11211049
    Abstract: One embodiment provides a method including receiving authoring conversational training data. A machine learning based conversational agent is trained with the conversational training data. The training includes: creating and storing example transcripts of user utterances, creating and storing example transcripts of agent utterances, sequencing utterance transcripts using the example transcripts of user utterances and the example transcripts of agent utterances, forming a corpus from the sequenced utterance transcripts, marking speech patterns that represent social actions from tagging the sequenced utterance transcripts, and forming a patterned corpus from the marked speech patterns.
    Type: Grant
    Filed: July 3, 2019
    Date of Patent: December 28, 2021
    Assignee: International Business Machines Corporation
    Inventors: Robert J. Moore, Pawan Chowdhary, Divyesh Jadav, Lei Huang, Sunhwan Lee, Eric Young Liu, Saurabh Mishra
  • Patent number: 11211047
    Abstract: An artificial intelligence device for learning a de-identified speech signal includes a memory configured to store a speech recognition model, a microphone configured to acquire an original speech signal, and a processor configured to perform de-identification with respect to the acquired original speech signal and perform speech recognition with respect to the de-identified speech signal through the speech recognition model.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: December 28, 2021
    Assignee: LG ELECTRONICS INC.
    Inventors: Wonho Shin, Jichan Maeng
  • Patent number: 11205445
    Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic automated voice activity detection. Example methods may include determining an audio file associated with video content and generating a plurality of audio segments from the audio file, the plurality of audio segments including a first segment and a second segment, where the first segment and the second segment are consecutive segments. Example methods may further include determining, using a Gated Recurrent Unit neural network, that the first segment includes first voice activity, determining, using the Gated Recurrent Unit neural network, that the second segment includes second voice activity, and determining that voice activity is present between a first timestamp associated with the first segment and a second timestamp associated with the second segment.
    Type: Grant
    Filed: June 10, 2019
    Date of Patent: December 21, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Mayank Sharma, Sandeep Joshi, Muhammad Raffay Hamid
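The final step above, concluding that voice activity spans consecutive voiced segments, amounts to merging per-segment decisions into timestamped intervals. A minimal sketch, assuming the per-segment voiced/unvoiced labels already exist (in the patent they come from the GRU network):

```python
def voice_intervals(segments):
    """Merge consecutive voiced segments into (start, end) intervals.
    Each segment is (start_ts, end_ts, is_voiced); segments are assumed
    ordered and contiguous."""
    intervals = []
    current = None
    for start, end, voiced in segments:
        if voiced:
            if current is None:
                current = [start, end]   # open a new interval
            else:
                current[1] = end         # extend the open interval
        elif current is not None:
            intervals.append(tuple(current))
            current = None
    if current is not None:
        intervals.append(tuple(current))
    return intervals
```

Two back-to-back voiced segments thus collapse into one interval from the first segment's start timestamp to the second segment's end timestamp.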
  • Patent number: 11196582
    Abstract: Implementations herein relate to information describing one or more internal states of a technical system. Implementations herein are provided for characterizing the reliability of various third party servers, at least when reporting third party device statuses, as well as adapting protocols for device ecosystems affected by such reliability. Latency can affect the accuracy of device states represented by assistant devices. Certain servers can be characterized as especially delayed when reporting an updated device state in response to a user request, and, as a result, the third party server can be correlated to a metric that characterizes the relative latency of the third party server. When the metric fails to satisfy a particular threshold, a server and/or client associated with the "ecosystem" of third party devices can affirmatively operate to retrieve device state updates, rather than passively await updates from a corresponding third party server.
    Type: Grant
    Filed: February 8, 2019
    Date of Patent: December 7, 2021
    Assignee: GOOGLE LLC
    Inventor: Yuzhao Ni
  • Patent number: 11189287
    Abstract: Provided are an optimization method, apparatus, and device for a wake-up model, and a storage medium, which allow for: acquiring a training set and a verification set; performing an iterative training on the wake-up model according to the training set and the verification set; during the iterative training, periodically updating the training set and the verification set according to the wake-up model and a preset corpus database, and continuing the iterative training on the wake-up model according to the updated training set and verification set; and outputting the wake-up model when a preset termination condition is reached. By periodically updating the training set and the verification set according to the wake-up model and the preset corpus database during an iteration, the embodiments of the present disclosure may improve the optimization efficiency and effects of the wake-up model, thereby improving the stability and adaptability of the wake-up model and avoiding overfitting.
    Type: Grant
    Filed: December 4, 2019
    Date of Patent: November 30, 2021
    Assignees: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD., SHANGHAI XIAODU TECHNOLOGY CO. LTD.
    Inventor: Yongchao Zhang