Patents Examined by Michael Colucci
  • Patent number: 11282516
    Abstract: Embodiments of the present disclosure provide a human-machine interaction processing method, an apparatus thereof, a user terminal, a processing server and a system. On the user terminal side, the method includes: receiving an interaction request voice input by a user, and collecting video data of the user while the interaction request voice is input; obtaining an interaction response voice corresponding to the interaction request voice, where the interaction response voice is obtained according to expression information of the user that is included in the video data captured while the interaction request voice is input; and outputting the interaction response voice to the user. The method imbues the interaction response voice with an emotional tone that matches the current emotion of the user, so that the human-machine interaction process is no longer monotonous, greatly enhancing the user experience.
    Type: Grant
    Filed: February 18, 2019
    Date of Patent: March 22, 2022
    Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Shuangshuang Qiao, Kun Liu, Yang Liang, Xiangyue Lin, Chao Han, Mingfa Zhu, Jiangliang Guo, Xu Li, Jun Liu, Shuo Li, Shiming Yin
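The emotion-conditioned response described above can be sketched as a mapping from a detected emotion label to prosody parameters for the synthesized response. This is a minimal illustration, not the patented method: the emotion labels, parameter names, and values are all hypothetical, and the video-based expression classifier that would produce the label is assumed, not shown.

```python
# Hypothetical prosody table: emotion label -> synthesis parameters.
EMOTION_TO_TONE = {
    "happy": {"pitch_shift": 2.0, "rate": 1.1},
    "sad": {"pitch_shift": -1.5, "rate": 0.9},
    "neutral": {"pitch_shift": 0.0, "rate": 1.0},
}

def select_response_style(emotion: str) -> dict:
    """Map the user's detected emotion to prosody parameters for the
    interaction response voice, falling back to a neutral tone."""
    return EMOTION_TO_TONE.get(emotion, EMOTION_TO_TONE["neutral"])
```

An unrecognized emotion simply yields the neutral style, so the response voice degrades gracefully rather than failing.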
  • Patent number: 11276392
    Abstract: A method may include obtaining audio originating at a remote device during a communication session conducted between a first device and the remote device and obtaining a transcription of the audio. The method may also include processing the audio to generate processed audio. In some embodiments, the audio may be processed by a neural network that is trained with respect to an analog voice network and the processed audio may be formatted with respect to communication over the analog voice network. The method may further include processing the transcription to generate a processed transcription that is formatted with respect to communication over the analog voice network and multiplexing the processed audio with the processed transcription to obtain combined data. The method may also include communicating, to the first device during the communication session, the combined data over a same communication channel of the analog voice network.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: March 15, 2022
    Inventor: David Thomson
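The multiplexing step above interleaves processed audio with a processed transcription into one stream for a single channel. A minimal sketch, assuming a simple tag-length-payload framing (the actual framing for an analog voice network is not specified in the abstract):

```python
def multiplex(audio: bytes, transcript: bytes, chunk: int = 64) -> bytes:
    """Interleave tagged audio (0x01) and text (0x02) frames into one
    byte stream. Frame layout (assumed): 1-byte tag, 1-byte length,
    payload; chunk must therefore be at most 255."""
    out = bytearray()
    streams = [(0x01, audio), (0x02, transcript)]
    offsets = [0, 0]
    while any(off < len(data) for off, (_, data) in zip(offsets, streams)):
        for i, (tag, data) in enumerate(streams):
            part = data[offsets[i]:offsets[i] + chunk]
            if part:  # skip exhausted streams
                out += bytes([tag, len(part)]) + part
                offsets[i] += chunk
    return bytes(out)

def demultiplex(blob: bytes):
    """Recover the two original streams from the combined data."""
    audio, text = bytearray(), bytearray()
    i = 0
    while i < len(blob):
        tag, length = blob[i], blob[i + 1]
        (audio if tag == 0x01 else text).extend(blob[i + 2:i + 2 + length])
        i += 2 + length
    return bytes(audio), bytes(text)
```

Because frames are tagged, the receiver can split the combined data back into audio and transcription without a separate control channel.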
  • Patent number: 11264008
    Abstract: A method and an electronic device are provided for translating a speech signal between a first language and a second language with minimized translation delay, by translating fewer than all words of the speech signal according to a level of understanding of the second language by the user who receives the translation.
    Type: Grant
    Filed: October 18, 2018
    Date of Patent: March 1, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ji-sang Yu, Sang-ha Kim, Jong-youb Ryu, Yoon-jung Choi, Eun-kyoung Kim, Jae-won Lee
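The selective-translation idea above can be illustrated with a filter that translates only words assumed to be outside the user's second-language vocabulary level. The vocabulary table and level scheme here are hypothetical stand-ins for however the device models the user's understanding:

```python
# Hypothetical per-level vocabularies of words the user already knows.
KNOWN_AT_LEVEL = {
    1: {"hello", "thank", "you"},
    2: {"hello", "thank", "you", "station", "ticket"},
}

def translate_partially(words, level, translate):
    """Pass through words the user understands at this level; translate
    the rest. Skipping known words is what reduces translation delay."""
    known = KNOWN_AT_LEVEL.get(level, set())
    return [w if w in known else translate(w) for w in words]
```

With a higher level, more words pass through untranslated, so the translation workload shrinks as the user's understanding grows.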
  • Patent number: 11264009
    Abstract: A computer-implemented method for training a dialogue response generation system and the dialogue response generation system are provided. The method includes arranging a first multimodal encoder-decoder for the dialogue response generation or video description having a first input and a first output, wherein the first multimodal encoder-decoder has been pretrained by training audio-video datasets with training video description sentences, arranging a second multimodal encoder-decoder for dialog response generation having a second input and a second output, providing first audio-visual datasets with first corresponding video description sentences to the first input of the first multimodal encoder-decoder, wherein the first encoder-decoder generates first output values based on the first audio-visual datasets with the first corresponding description sentences, providing the first audio-visual datasets excluding the first corresponding video description sentences to the second multimodal encoder-decoder.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: March 1, 2022
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Anoop Cherian, Tim Marks, Takaaki Hori
  • Patent number: 11257496
    Abstract: A method and apparatus for facilitating persona-based agent interactions with online visitors are disclosed. A plurality of persona-related attributes is extracted from a textual transcript of each interaction between an agent of an enterprise and an online visitor. A feature vector data representation is generated based on the plurality of persona-related attributes extracted from each interaction to configure a plurality of feature vector data representations. The plurality of feature vector data representations is classified based on a plurality of persona-based clusters, which enables classification of the plurality of online visitors into the plurality of persona-based clusters. A learning model is trained for each persona-based cluster using utterances of online visitors classified into a respective persona-based cluster. The learning model is trained to mimic a visitor persona representative of the respective persona-based cluster.
    Type: Grant
    Filed: September 26, 2019
    Date of Patent: February 22, 2022
    Assignee: [24]7.ai, Inc.
    Inventor: Abir Chakraborty
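The classification step above assigns each visitor's feature vector to one of the persona-based clusters. A minimal nearest-centroid sketch, assuming the cluster centroids already exist from a prior clustering step and that feature vectors are nonzero:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def classify_visitor(feature_vec, persona_centroids):
    """Assign a visitor's feature vector to the persona-based cluster
    whose centroid is most similar (hypothetical cluster names)."""
    return max(persona_centroids,
               key=lambda p: cosine(feature_vec, persona_centroids[p]))
```

Each cluster's visitors would then supply the utterances used to train that cluster's persona-mimicking model.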
  • Patent number: 11250838
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a video speech recognition model having a plurality of model parameters on a set of unlabeled video-audio data using a trained audio speech recognition model. During the training, the parameter values of the trained audio speech recognition model are generally held fixed and only the parameter values of the video speech recognition model are adjusted. Once trained, the video speech recognition model can be used to recognize speech from video when corresponding audio is not available.
    Type: Grant
    Filed: November 18, 2019
    Date of Patent: February 15, 2022
    Assignee: DeepMind Technologies Limited
    Inventors: Brendan Shillingford, Ioannis Alexandros Assael, Joao Ferdinando Gomes de Freitas
  • Patent number: 11238885
    Abstract: A computer-implemented technique for animating a visual representation of a face based on spoken words of a speaker is described herein. A computing device receives an audio sequence comprising content features reflective of spoken words uttered by a speaker. The computing device generates latent content variables and latent style variables based upon the audio sequence. The latent content variables are used to synchronize movement of lips on the visual representation to the spoken words uttered by the speaker. The latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words and are used to synchronize movement of the full facial features of the visual representation to the spoken words uttered by the speaker. The computing device causes the visual representation of the face to be animated on a display based upon the latent content variables and the latent style variables.
    Type: Grant
    Filed: October 29, 2018
    Date of Patent: February 1, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Gaurav Mittal, Baoyuan Wang
  • Patent number: 11233775
    Abstract: A method and system for protecting user privacy in audio content is disclosed. An audio content including private information related to at least one user is received. The audio content is segmented to generate a plurality of audio blocks. Each audio block is associated with a sequence number based on a respective chronological position in the audio content. A random key of predefined length is generated for each audio block. The plurality of audio blocks are randomly distributed to a plurality of agents for audio-to-text transcription. The random distribution is configured to scramble a data context for protecting the user privacy of the at least one user during the audio-to-text transcription. A textual transcript corresponding to the audio content is generated based on the audio-to-text transcription, the sequence number and the random key generated for each audio block.
    Type: Grant
    Filed: June 11, 2021
    Date of Patent: January 25, 2022
    Assignee: ZOI MEET B.V.
    Inventors: Neng Ming Yap, Kaarmuhilan Kalaiyarasan, Kevin Oranje
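The scrambling scheme above can be sketched directly: each audio block gets a sequence number and a random key, the blocks are shuffled before distribution to agents, and the transcript is rebuilt by sorting on sequence numbers. This is an illustrative sketch of the data flow only; the key's role in verification/lookup is assumed, and the helper names are hypothetical:

```python
import random
import secrets

def split_and_scramble(audio_blocks):
    """Tag each block with its chronological sequence number and a
    random key, then shuffle the distribution order so no single agent
    sees a coherent data context."""
    tagged = [
        {"seq": i, "key": secrets.token_hex(8), "block": b}
        for i, b in enumerate(audio_blocks)
    ]
    shuffled = tagged[:]
    random.shuffle(shuffled)
    return tagged, shuffled

def reassemble(transcribed):
    """Rebuild the full transcript by sorting agent results on the
    original sequence numbers."""
    return " ".join(t["text"] for t in sorted(transcribed, key=lambda t: t["seq"]))
```

Because ordering information lives only in the sequence numbers held by the reassembling party, an individual agent transcribing a shuffled block learns little about the surrounding conversation.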
  • Patent number: 11232782
    Abstract: Embodiments are associated with a speaker-independent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-independent attention-based encoder-decoder model associated with a first output distribution, a speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-dependent attention-based encoder-decoder model associated with a second output distribution, training of the second attention-based encoder-decoder model to classify output tokens based on input speech frames of a target speaker and simultaneously training the speaker-dependent attention-based encoder-decoder model to maintain a similarity between the first output distribution and the second output distribution, and performing automatic speech recognition on speech frames of the target speaker using the trained speaker-dependent attention-based encoder-decoder model.
    Type: Grant
    Filed: November 6, 2019
    Date of Patent: January 25, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong
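The joint objective above, adapting a speaker-dependent model while keeping its output distribution close to the speaker-independent one, is commonly realized as cross-entropy plus a KL-divergence term. A minimal single-frame sketch under that assumption (the interpolation weight `rho` and the exact loss form are illustrative, not taken from the patent):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def adaptation_loss(sd_logits, si_logits, target, rho=0.5):
    """Cross-entropy on the target speaker's token plus a KL term that
    keeps the speaker-dependent (SD) distribution close to the
    speaker-independent (SI) one."""
    p_sd = softmax(sd_logits)
    p_si = softmax(si_logits)
    ce = -math.log(p_sd[target])
    kl = sum(q * math.log(q / p) for q, p in zip(p_si, p_sd))
    return (1 - rho) * ce + rho * kl
```

When the two models agree exactly, the KL term vanishes and only the speaker-specific cross-entropy drives the update.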
  • Patent number: 11227688
    Abstract: A method for automatically generating a note summarizing a conversation between a patient and a healthcare provider is disclosed. A workstation is provided with a tool for rendering an audio recording of the conversation and a display for displaying a transcript of the audio recording obtained from a speech-to-text engine. The display of the workstation includes a first transcript region for display of the transcript and a second note region for simultaneous display of elements of a note summarizing the conversation. Words or phrases in the transcript related to medical topics relating to the patient are extracted with the aid of a trained machine learning model. The extracted words or phrases are highlighted in the transcript and displayed in the note region.
    Type: Grant
    Filed: May 24, 2018
    Date of Patent: January 18, 2022
    Assignee: Google LLC
    Inventors: Melissa Strader, William Ito, Christopher Co, Katherine Chou, Alvin Rajkomar, Rebecca Rolfe
  • Patent number: 11222060
    Abstract: In an example, an apparatus having a voice assistant application that generates a graphical image response is provided. The apparatus includes a microphone and a processor in communication with the microphone. The microphone receives a secure voice assistant mode activation command and a voice command. The processor is to execute a voice assistant application, wherein the voice assistant application is to generate a graphical image response in response to the secure voice assistant mode activation command and the voice command, to change a privacy setting in the apparatus in response to the secure voice assistant mode activation command, to transmit the voice command from the microphone to the voice assistant application, and to transmit the graphical image response to a display.
    Type: Grant
    Filed: June 16, 2017
    Date of Patent: January 11, 2022
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Alexander Wayne Clark, Kent E. Biggs, Henry Wang
  • Patent number: 11217246
    Abstract: Disclosed are a communication robot and a method for operating the same, capable of smoothly processing speech recognition by executing an artificial intelligence (AI) algorithm and/or a machine learning algorithm in a 5G environment connected for the Internet of Things. A method for operating a communication robot according to an embodiment of the present disclosure may include collecting speech uttered by two or more utterers approaching within a predetermined distance from the communication robot, collecting photographed images of the two or more utterers, determining whether the case where the utterers of a wake-up word and a continuous word included in the uttered speech are the same is a first case, or the case where the utterers of the wake-up word and the continuous word included in the uttered speech are different is a second case, and determining a voice reception enhancement direction according to the first case or the second case.
    Type: Grant
    Filed: September 26, 2019
    Date of Patent: January 4, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Jae Pil Seo, Hyeon Sik Choi
  • Patent number: 11217225
    Abstract: The application discloses a multi-type acoustic feature integration method and system based on deep neural networks. The method and system include using a labeled speech data set to train and build a multi-type acoustic feature integration model based on deep neural networks, to determine or update the network parameters of the multi-type acoustic feature integration model; the method and system include inputting the multiple types of acoustic features extracted from the testing speech into the trained multi-type acoustic feature integration model, and extracting the deep integrated feature vectors at frame level or segment level. The solution supports integrated feature extraction for multiple types of acoustic features in different kinds of speech tasks, such as speech recognition, speech wake-up, spoken language recognition, speaker recognition, and anti-spoofing.
    Type: Grant
    Filed: January 21, 2021
    Date of Patent: January 4, 2022
    Assignee: XIAMEN UNIVERSITY
    Inventors: Lin Li, Zheng Li, Qingyang Hong
  • Patent number: 11217245
    Abstract: A wake-up word for a digital assistant may be specified by a user to trigger the digital assistant to respond to the wake-up word, with the user providing one or more initial pronunciations of the wake-up word. The wake-up word may be unique, or at least not determined beforehand by a device manufacturer or developer of the digital assistant. The initial pronunciation(s) of the wake-up word may then be augmented with other potential pronunciations of the wake-up word that might be provided in the future, and those other potential pronunciations may then be pruned down to a threshold number of other potential pronunciations. One or more recordings of the initial pronunciation(s) of the wake-up word may then be used to train a phoneme recognizer model to better recognize future instances of the wake-up word being spoken by the user or another person using the initial pronunciation or other potential pronunciations.
    Type: Grant
    Filed: August 29, 2019
    Date of Patent: January 4, 2022
    Assignee: Sony Interactive Entertainment Inc.
    Inventors: Lakshmish Kaushik, Zhenhao Ge
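The augment-then-prune step above can be sketched as generating alternative phoneme sequences by applying plausible substitutions, then keeping only a threshold number of variants closest to the original pronunciation. The substitution table and the edit-count ranking are hypothetical illustrations, not the patented augmentation method:

```python
# Hypothetical phoneme substitutions that model common mispronunciations.
SUBSTITUTIONS = {"t": ["d"], "ih": ["iy"], "ah": ["aa"]}

def variants(phonemes):
    """Generate single-substitution pronunciation variants."""
    out = set()
    for i, p in enumerate(phonemes):
        for alt in SUBSTITUTIONS.get(p, []):
            out.add(tuple(phonemes[:i] + [alt] + phonemes[i + 1:]))
    return out

def prune(candidates, original, threshold):
    """Keep at most `threshold` variants, preferring those that differ
    from the initial pronunciation in the fewest positions."""
    def edits(v):
        return sum(a != b for a, b in zip(v, original))
    return sorted(candidates, key=edits)[:threshold]
```

The surviving variants, together with recordings of the initial pronunciation, would then feed the phoneme recognizer's training.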
  • Patent number: 11211051
    Abstract: A method and an apparatus for processing audio data are provided. The method includes: acquiring a first piece of audio data; and processing the first piece of audio data based on an antialias filter, to generate a second piece of audio data, a sampling rate of the second piece of audio data being smaller than a sampling rate of the first piece of audio data; the antialias filter being generated by: inputting training voice data in a training sample into an initial antialias filter; inputting the output of the initial antialias filter into a training speech recognition model, and generating a training speech recognition result; and adjusting the initial antialias filter based on the training speech recognition result and a target speech recognition result of the training voice data in the training sample.
    Type: Grant
    Filed: February 28, 2020
    Date of Patent: December 28, 2021
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventor: Chao Tian
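The filter-then-downsample pipeline above can be illustrated with a fixed low-pass filter in place of the learned antialias filter: smooth the signal first, then keep every Nth sample. The moving-average filter here is a deliberately simple stand-in; the patent's filter is instead tuned by backpropagating a speech-recognition loss:

```python
def downsample(signal, factor, taps=4):
    """Apply a simple moving-average low-pass filter (a stand-in for
    the learned antialias filter), then decimate by `factor` so the
    output sampling rate is 1/factor of the input rate."""
    filtered = []
    for i in range(len(signal)):
        window = signal[max(0, i - taps + 1):i + 1]
        filtered.append(sum(window) / len(window))
    return filtered[::factor]
```

Without the low-pass step, frequencies above half the new sampling rate would alias into the downsampled signal and degrade downstream recognition.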
  • Patent number: 11211049
    Abstract: One embodiment provides a method including receiving authoring conversational training data. A machine learning based conversational agent is trained with the conversational training data. The training includes: creating and storing example transcripts of user utterances, creating and storing example transcripts of agent utterances, sequencing utterance transcripts using the example transcripts of user utterances and the example transcripts of agent utterances, forming a corpus from the sequenced utterance transcripts, marking speech patterns that represent social actions from tagging the sequenced utterance transcripts, and forming a patterned corpus from the marked speech patterns.
    Type: Grant
    Filed: July 3, 2019
    Date of Patent: December 28, 2021
    Assignee: International Business Machines Corporation
    Inventors: Robert J. Moore, Pawan Chowdhary, Divyesh Jadav, Lei Huang, Sunhwan Lee, Eric Young Liu, Saurabh Mishra
  • Patent number: 11211047
    Abstract: An artificial intelligence device for learning a de-identified speech signal includes a memory configured to store a speech recognition model, a microphone configured to acquire an original speech signal, and a processor configured to perform de-identification with respect to the acquired original speech signal and perform speech recognition with respect to the de-identified speech signal through the speech recognition model.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: December 28, 2021
    Assignee: LG ELECTRONICS INC.
    Inventors: Wonho Shin, Jichan Maeng
  • Patent number: 11205445
    Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic automated voice activity detection. Example methods may include determining an audio file associated with video content and generating a plurality of audio segments from the audio file, the plurality of audio segments including a first segment and a second segment, where the first segment and the second segment are consecutive segments. Example methods may further include determining, using a Gated Recurrent Unit neural network, that the first segment includes first voice activity, determining, using the Gated Recurrent Unit neural network, that the second segment includes second voice activity, and determining that voice activity is present between a first timestamp associated with the first segment and a second timestamp associated with the second segment.
    Type: Grant
    Filed: June 10, 2019
    Date of Patent: December 21, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Mayank Sharma, Sandeep Joshi, Muhammad Raffay Hamid
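The final step above, concluding that voice activity spans consecutive voiced segments, amounts to merging per-segment decisions into timestamped intervals. A minimal sketch, assuming the per-segment voiced/unvoiced labels already exist (in the patent they come from the GRU network):

```python
def voice_intervals(segments):
    """Merge consecutive voiced segments into (start, end) intervals.
    Each segment is (start_ts, end_ts, is_voiced); segments are assumed
    ordered and contiguous."""
    intervals = []
    current = None
    for start, end, voiced in segments:
        if voiced:
            if current is None:
                current = [start, end]   # open a new interval
            else:
                current[1] = end         # extend the open interval
        elif current is not None:
            intervals.append(tuple(current))
            current = None
    if current is not None:
        intervals.append(tuple(current))
    return intervals
```

Two back-to-back voiced segments thus collapse into one interval from the first segment's start timestamp to the second segment's end timestamp.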
  • Patent number: 11196582
    Abstract: Implementations herein relate to information describing one or more internal states of a technical system. Implementations herein are provided for characterizing the reliability of various third party servers, at least when reporting third party device statuses, as well as adapting protocols for device ecosystems affected by such reliability. Latency can affect the accuracy of device states represented by assistant devices. Certain servers can be characterized as especially delayed when reporting an updated device state in response to a user request, and, as a result, the third party server can be correlated to a metric that characterizes the relative latency of the third party server. When the metric fails to satisfy a particular threshold, a server and/or client associated with the "ecosystem" of third party devices can affirmatively operate to retrieve device state updates, rather than passively await updates from a corresponding third party server.
    Type: Grant
    Filed: February 8, 2019
    Date of Patent: December 7, 2021
    Assignee: GOOGLE LLC
    Inventor: Yuzhao Ni
  • Patent number: 11189287
    Abstract: Provided are an optimization method, apparatus, and device for a wake-up model, and a storage medium, which allow for: acquiring a training set and a verification set; performing an iterative training on the wake-up model according to the training set and the verification set; during the iterative training, periodically updating the training set and the verification set according to the wake-up model and a preset corpus database, and continuing the iterative training on the wake-up model according to the updated training set and verification set; and outputting the wake-up model when a preset termination condition is reached. By periodically updating the training set and the verification set according to the wake-up model and the preset corpus database during an iteration, the embodiments of the present disclosure may improve the optimization efficiency and effects of the wake-up model, thereby improving the stability and adaptability of the wake-up model and avoiding overfitting.
    Type: Grant
    Filed: December 4, 2019
    Date of Patent: November 30, 2021
    Assignees: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD., SHANGHAI XIAODU TECHNOLOGY CO. LTD.
    Inventor: Yongchao Zhang