Recognition Patents (Class 704/231)
  • Patent number: 11423348
    Abstract: Logistical operations (e.g., warehouses) may use a voice-enabled workflow to facilitate the work tasks of a staff (i.e., population) of workers. Typically, it is necessary for a worker to travel from location-to-location to complete assigned work tasks. As such, a worker's time spent travelling often correlates with the worker's overall work performance. Understanding the worker's travel performance is highly desirable, but computing a fair and accurate travel-performance metric is difficult. One reason for this is that the distance a worker travels is often unknown. The present invention embraces a system and method for accurately and fairly assessing a worker's travel performance by analyzing the worker's voice dialog.
    Type: Grant
    Filed: January 11, 2016
    Date of Patent: August 23, 2022
    Assignee: Hand Held Products, Inc.
    Inventors: Kwong Wing Au, Christopher L. Lofty, Steven Thomas, John Pecorari, James Geisler
  • Patent number: 11423911
    Abstract: Computer-implemented method and system for processing and broadcasting one or more moment-associating elements. For example, the computer-implemented method includes granting subscription permission to one or more subscribers; receiving the one or more moment-associating elements; transforming the one or more moment-associating elements into one or more pieces of moment-associating information; and transmitting at least one piece of the one or more pieces of moment-associating information to the one or more subscribers.
    Type: Grant
    Filed: October 10, 2019
    Date of Patent: August 23, 2022
    Assignee: Otter.ai, Inc.
    Inventors: Yun Fu, Tao Xing, Kaisuke Nakajima, Brian Francis Williams, James Mason Altreuter, Xiaoke Huang, Simon Lau, Sam Song Liang, Kean Kheong Chin, Wen Sun, Julius Cheng, Hitesh Anand Gupta
  • Patent number: 11417324
    Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: August 16, 2022
    Assignee: GOOGLE LLC
    Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
  • Patent number: 11417354
    Abstract: In accordance with an example embodiment of the present invention, disclosed is a method and an apparatus for voice activity detection (VAD). The VAD comprises creating a signal indicative of a primary VAD decision and determining hangover addition. The determination on hangover addition is made in dependence of a short term activity measure and/or a long term activity measure. A signal indicative of a final VAD decision is then created.
    Type: Grant
    Filed: February 18, 2020
    Date of Patent: August 16, 2022
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventor: Martin Sehlstedt
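    A minimal sketch, not taken from the patent, of how a hangover mechanism of the kind described in the entry above might extend a primary VAD decision; the window lengths, thresholds, and maximum hangover count are illustrative assumptions.

      def final_vad_decisions(primary_decisions, short_win=5, long_win=50,
                              short_thresh=0.6, long_thresh=0.3, max_hangover=8):
          """Extend active regions of a primary VAD decision stream with hangover frames."""
          history, final = [], []
          hangover = 0
          for active in primary_decisions:
              history.append(1 if active else 0)
              short_term = sum(history[-short_win:]) / min(len(history), short_win)
              long_term = sum(history[-long_win:]) / min(len(history), long_win)
              if active:
                  # Re-arm the hangover counter when recent activity is high enough.
                  if short_term >= short_thresh or long_term >= long_thresh:
                      hangover = max_hangover
                  final.append(True)
              elif hangover > 0:
                  hangover -= 1
                  final.append(True)   # hangover frame: keep the final decision active
              else:
                  final.append(False)
          return final

      # A burst of speech followed by silence keeps a few hangover frames active.
      print(final_vad_decisions([True] * 6 + [False] * 12))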
  • Patent number: 11418233
    Abstract: Methods and apparatus to monitor a media presentation are disclosed. An example system includes a monitoring device to monitor a media presentation and generate research data identifying the media. A bridge device includes a housing dimensioned to receive the monitoring device, a receiver carried by the housing to receive a first audio signal via a wireless data connection from an audio source device using a wireless communication protocol, the first audio signal associated with the media. The bridge device includes an audio emitter to emit the audio signal for receipt by the monitoring device, and a transmitter to transmit the audio signal to an audio receiver device using the wireless communication protocol.
    Type: Grant
    Filed: June 15, 2020
    Date of Patent: August 16, 2022
    Assignee: THE NIELSEN COMPANY (US), LLC
    Inventors: William K. Krug, James Zhang
  • Patent number: 11410637
    Abstract: A voice synthesis method according to an embodiment includes altering a series of synthesis spectra in a partial period of a synthesis voice based on a series of amplitude spectrum envelope contours of a voice expression to obtain a series of altered spectra to which the voice expression has been imparted, and synthesizing a series of voice samples to which the voice expression has been imparted, based on the series of altered spectra.
    Type: Grant
    Filed: April 26, 2019
    Date of Patent: August 9, 2022
    Assignee: YAMAHA CORPORATION
    Inventors: Jordi Bonada, Merlijn Blaauw, Keijiro Saino, Ryunosuke Daido, Michael Wilson, Yuji Hisaminato
  • Patent number: 11403466
    Abstract: In one embodiment, a method includes receiving, from a client system associated with a first user, a first audio input. The method includes generating multiple transcriptions corresponding to the first audio input based on multiple automatic speech recognition (ASR) engines. Each ASR engine is associated with a respective domain out of multiple domains. The method includes determining, for each transcription, a combination of one or more intents and one or more slots to be associated with the transcription. The method includes selecting, by a meta-speech engine, one or more combinations of intents and slots from the multiple combinations to be associated with the first user input. The method includes generating a response to the first audio input based on the selected combinations and sending, to the client system, instructions for presenting the response to the first audio input.
    Type: Grant
    Filed: January 13, 2020
    Date of Patent: August 2, 2022
    Assignee: Facebook Technologies, LLC
    Inventors: Fuchun Peng, Jihang Li, Jinsong Yu
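    A hypothetical sketch of how a meta-speech engine might pick among per-domain interpretations of the kind the entry above describes; the Hypothesis fields and the confidence-product scoring rule are assumptions, not the patent's method.

      from dataclasses import dataclass

      @dataclass
      class Hypothesis:
          domain: str
          transcription: str
          asr_confidence: float
          intents: list
          slots: dict
          nlu_confidence: float

      def select_interpretations(hypotheses, top_k=1):
          """Rank per-domain (intent, slot) combinations and keep the best ones."""
          # Simple product of ASR and NLU confidences; a deployed system could learn this.
          return sorted(hypotheses,
                        key=lambda h: h.asr_confidence * h.nlu_confidence,
                        reverse=True)[:top_k]

      hyps = [
          Hypothesis("music", "play yesterday", 0.92, ["play_music"], {"song": "yesterday"}, 0.88),
          Hypothesis("calendar", "play yesterday", 0.92, ["show_events"], {"date": "yesterday"}, 0.41),
      ]
      best = select_interpretations(hyps)[0]
      print(best.domain, best.intents, best.slots)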
  • Patent number: 11404048
    Abstract: An embodiment of the inventive concept includes a communication device, a voice input receiving unit, a memory, and a processor. In addition, various embodiments recognized through the specification are possible.
    Type: Grant
    Filed: January 29, 2019
    Date of Patent: August 2, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Kyeonga Han, Soungmin Yoo
  • Patent number: 11398224
    Abstract: A communication system includes a first terminal device, a second terminal device, and an advice providing device. The first terminal device is operated by an operator. The second terminal device is operated by a guest. The second terminal device communicates with the first terminal device through a network. The advice providing device includes circuitry that determines advice for the operator based on voice data including first voice data that is related to the operator and transmitted from the first terminal device and second voice data that is related to the guest and transmitted from the second terminal device. The circuitry of the advice providing device further transmits the advice to the first terminal device. The first terminal device receives the advice and displays, on a display, the advice.
    Type: Grant
    Filed: September 17, 2020
    Date of Patent: July 26, 2022
    Assignee: RICOH COMPANY, LTD.
    Inventors: Mayu Hakata, Takashi Hasegawa
  • Patent number: 11388294
    Abstract: An image forming apparatus may include: a storage that stores information in which a job type is associated with speech patterns for processings related to the job type; and a hardware processor that may: receive an input speech; acquire a job type; use the speech patterns associated with the job type that is acquired by the hardware processor and is being executed to analyze a speech inputted during execution of the job; and execute the processings based on an analysis result by the hardware processor.
    Type: Grant
    Filed: June 16, 2020
    Date of Patent: July 12, 2022
    Assignee: Konica Minolta, Inc.
    Inventor: Takahisa Matsunaga
  • Patent number: 11380316
    Abstract: The present invention discloses a speech interaction method and apparatus, and pertains to the field of speech processing technologies. The method includes: acquiring speech data of a user; performing user attribute recognition on the speech data to obtain a first user attribute recognition result; performing content recognition on the speech data to obtain a content recognition result of the speech data; and performing a corresponding operation according to at least the first user attribute recognition result and the content recognition result, so as to respond to the speech data. According to the present invention, after speech data is acquired, user attribute recognition and content recognition are separately performed on the speech data to obtain a first user attribute recognition result and a content recognition result, and a corresponding operation is performed according to at least the first user attribute recognition result and the content recognition result.
    Type: Grant
    Filed: October 10, 2019
    Date of Patent: July 5, 2022
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Hongbo Jin, Zhuolin Jiang
  • Patent number: 11380301
    Abstract: A learning apparatus comprises a learning part that learns an error correction model by a set of a speech recognition result candidate and a correct text of speech recognition for given audio data, wherein the speech recognition result candidate includes a speech recognition result candidate which is different from the correct text, and the error correction model is a model that receives a word sequence of the speech recognition result candidate as input and outputs an error correction score indicating likelihood of the word sequence of the speech recognition result candidate in consideration of a speech recognition error.
    Type: Grant
    Filed: February 18, 2019
    Date of Patent: July 5, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Tomohiro Tanaka, Ryo Masumura
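    An illustrative rescoring step in the spirit of the abstract above: an error-correction model scores each ASR candidate word sequence, and the candidates are re-ranked by combining that score with the original ASR score. The correction_score function and the interpolation weight are stand-ins, not the trained model.

      def rerank_candidates(candidates, correction_score, weight=0.5):
          """candidates: list of (word_sequence, asr_score); returns word sequences re-ranked."""
          rescored = []
          for words, asr_score in candidates:
              combined = (1 - weight) * asr_score + weight * correction_score(words)
              rescored.append((combined, words))
          return [words for _, words in sorted(rescored, reverse=True)]

      # Stand-in correction model: penalize a sequence containing an implausible word.
      toy_model = lambda words: 0.2 if "beach" in words else 0.9
      print(rerank_candidates(
          [(["wreck", "a", "nice", "beach"], 0.60), (["recognize", "speech"], 0.55)],
          correction_score=toy_model))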
  • Patent number: 11379135
    Abstract: One implementation is for a device. The device includes a controller having a memory protection unit, a legally relevant memory portion capable of interacting with the controller, a legally non-relevant memory portion capable of interacting with the controller, an interface in the memory protection unit that allows a privileged application to access the legally relevant memory portion and disallows an unprivileged application from accessing the legally relevant portion, and an interrupt system, wherein when the unprivileged application makes an attempt to interact with the legally relevant memory portion, the memory protection unit takes an action associated with the unprivileged application.
    Type: Grant
    Filed: August 4, 2020
    Date of Patent: July 5, 2022
    Assignee: Honeywell International Inc.
    Inventors: Ralf Peter Thor, François Germain Vincent, Roland Domke, Ionut Nicorescu
  • Patent number: 11373045
    Abstract: A system for determining context and intent in a conversation using machine learning (ML) based artificial intelligence (AI) in omnichannel data communications is disclosed. The system may comprise a data store to store and manage data within a network, a server to facilitate operations using information from the one or more data stores, and a ML-based AI subsystem to communicate with the server and the data store in the network. The ML-based AI subsystem may comprise a data access interface to receive data associated with a conversation with a user via a communication channel. The ML-based AI subsystem may comprise a processor to provide a proactive, adaptive, and intelligent conversation by applying hierarchical multi-intent data labeling framework, training at least one model with training data, and generating and deploying a production-ready model based on the trained and retained at least one model.
    Type: Grant
    Filed: September 24, 2019
    Date of Patent: June 28, 2022
    Assignee: CONTACTENGINE LIMITED
    Inventors: Dominic Bealby-Wright, Cosmin Dragos Davidescu
  • Patent number: 11373544
    Abstract: A method includes displaying a first set of text content characterized by a first difficulty level. The method includes obtaining speech data associated with the first set of text content. The method includes determining linguistic feature(s) within the speech data. The method includes in response to completion of the speech data, determining a reading proficiency value associated with the first set of text content and based on the linguistic feature(s). The method includes in accordance with determining the reading proficiency value satisfies change criteria, changing a difficulty level for a second set of text content. After changing the difficulty level, the second set of text content corresponds to a second difficulty level different from the first difficulty level. The method includes in accordance with determining the reading proficiency value does not satisfy the change criteria, maintaining the second set of text content at the first difficulty level.
    Type: Grant
    Filed: February 24, 2020
    Date of Patent: June 28, 2022
    Assignee: APPLE INC.
    Inventors: Barry-John Theobald, Russell Y. Webb, Nicholas Elia Apostoloff
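    A minimal sketch of the difficulty-adaptation logic described in the entry above; the proficiency formula, feature names, and thresholds are invented for illustration.

      def next_difficulty(current_level, linguistic_features,
                          raise_threshold=0.8, lower_threshold=0.4):
          """Return the difficulty level to use for the second set of text content."""
          # Toy reading proficiency value: word accuracy penalized by hesitations.
          proficiency = linguistic_features["accuracy"] - 0.1 * linguistic_features["hesitations"]
          if proficiency >= raise_threshold:
              return current_level + 1            # change criteria satisfied: harder text
          if proficiency < lower_threshold:
              return max(1, current_level - 1)    # change criteria satisfied: easier text
          return current_level                    # criteria not satisfied: keep the level

      print(next_difficulty(3, {"accuracy": 0.90, "hesitations": 0}))   # -> 4
      print(next_difficulty(3, {"accuracy": 0.55, "hesitations": 1}))   # -> 3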
  • Patent number: 11375293
    Abstract: Accommodation for color or visual impairments may be implemented by selective color substitution. A color accommodation module receives an image frame from a host system and generates a color-adapted version of the image frame. The color accommodation module may include a rule based filter that substitutes one or more colors within the image frame with one or more corresponding alternative colors.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: June 28, 2022
    Assignee: Sony Interactive Entertainment Inc.
    Inventors: Naveen Kumar, Justice Adams, Arindam Jati, Masanori Omote
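    A hypothetical rule-based color substitution over an RGB frame, in the spirit of the selective substitution described above; the rule table and tolerance are assumptions.

      import numpy as np

      def substitute_colors(frame, rules, tolerance=30):
          """Replace colors listed in rules ({source_rgb: target_rgb}) within a tolerance."""
          adapted = frame.copy()
          for src, dst in rules.items():
              # Select pixels whose RGB distance to the source color is within tolerance.
              mask = np.linalg.norm(frame.astype(int) - np.array(src), axis=-1) <= tolerance
              adapted[mask] = dst
          return adapted

      frame = np.zeros((2, 2, 3), dtype=np.uint8)
      frame[0, 0] = (255, 0, 0)                                  # a pure red pixel
      adapted = substitute_colors(frame, {(255, 0, 0): (255, 128, 0)})
      print(adapted[0, 0])                                       # -> [255 128   0]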
  • Patent number: 11367447
    Abstract: Aspects of the subject disclosure may include, for example, obtaining a natural language instruction, interpreting the instruction to obtain a machine interpretation, and analyzing the machine interpretation to obtain an intent of the natural language instruction. An actionable command adapted to cause a digital manipulation tool to digitally manipulate a content item is determined according to the intent of the natural language instruction. The actionable command is provided to the digital manipulation tool to obtain the manipulated content item. Other embodiments are disclosed.
    Type: Grant
    Filed: June 9, 2020
    Date of Patent: June 21, 2022
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jackson Jarrell Pair
  • Patent number: 11367448
    Abstract: A method of providing a platform for configuring device-specific speech recognition is provided. The method includes providing a user interface for developers to select a set of at least two acoustic models appropriate for a specific type of a device, receiving, from a developer, a selection of the set of the at least two acoustic models, and configuring a speech recognition system to perform device-specific speech recognition by using one acoustic model selected from the at least two acoustic models of the set.
    Type: Grant
    Filed: April 21, 2021
    Date of Patent: June 21, 2022
    Assignee: SOUNDHOUND, INC.
    Inventors: Keyvan Mohajer, Mehul Patel
  • Patent number: 11368423
    Abstract: A dense passage retrieval machine learning model having a first encoder for resources and a second encoder for messages can automatically match relevant resources to computers or sessions based on analysis of a series of messages of an online chat conversation. Continuous re-training is supported based on feedback from a moderator computer and/or user computers.
    Type: Grant
    Filed: December 29, 2021
    Date of Patent: June 21, 2022
    Assignee: SUPPORTIV INC.
    Inventors: Helena Plater-Zyberk, Pouria Mojabi
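    A sketch of the dual-encoder matching idea from the entry above: resources and the recent message window are embedded separately and compared by inner product. The hashed bag-of-words encoder below is a stand-in for the trained encoders the abstract describes.

      import numpy as np

      def embed(text, dim=64):
          """Toy encoder: hashed bag-of-words vector, L2-normalized."""
          v = np.zeros(dim)
          for token in text.lower().split():
              v[hash(token) % dim] += 1.0
          norm = np.linalg.norm(v)
          return v / norm if norm else v

      def rank_resources(messages, resources, top_k=2):
          query_vec = embed(" ".join(messages[-5:]))     # recent chat window as the query
          scored = [(float(query_vec @ embed(r)), r) for r in resources]
          return sorted(scored, reverse=True)[:top_k]

      chat = ["I can't sleep at night", "work stress keeps me up"]
      docs = ["Guide to sleep hygiene and stress", "Budgeting tips for students"]
      print(rank_resources(chat, docs))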
  • Patent number: 11366971
    Abstract: In one embodiment, a method includes receiving, from a client system associated with a first user, a first audio input. The method includes generating multiple transcriptions corresponding to the first audio input based on multiple automatic speech recognition (ASR) engines. Each ASR engine is associated with a respective domain out of multiple domains. The method includes determining, for each transcription, a combination of one or more intents and one or more slots to be associated with the transcription. The method includes selecting, by a meta-speech engine, one or more combinations of intents and slots from the multiple combinations to be associated with the first user input. The method includes generating a response to the first audio input based on the selected combinations and sending, to the client system, instructions for presenting the response to the first audio input.
    Type: Grant
    Filed: January 13, 2020
    Date of Patent: June 21, 2022
    Assignee: Facebook Technologies, LLC
    Inventors: Fuchun Peng, Jihang Li, Jinsong Yu
  • Patent number: 11367437
    Abstract: There is provided a speech dialog system that includes a first microphone, a second microphone, a processor and a memory. The first microphone captures first audio from a first spatial zone, and produces a first audio signal. The second microphone captures second audio from a second spatial zone, and produces a second audio signal. The processor receives the first audio signal and the second audio signal, and the memory contains instructions that control the processor to perform operations of a speech enhancement module, an automatic speech recognition module, and a speech dialog module that performs a zone-dedicated speech dialog.
    Type: Grant
    Filed: May 30, 2019
    Date of Patent: June 21, 2022
    Assignee: NUANCE COMMUNICATIONS, INC.
    Inventors: Timo Matheja, Markus Buck, Andreas Kirbach, Martin Roessler, Tim Haulick, Julien Premont, Josef Anastasiadis, Rudi Vuerinckx, Christophe Ris, Stijn Verschaeren, Hakan Ari, Dieter Ranz
  • Patent number: 11360737
    Abstract: The present disclosure discloses a method and apparatus for providing a speech service. A specific embodiment of the method comprises: receiving request information sent by a device, the request information comprising first event information and speech information, the first event information used for indicating a first event occurring on the device when the device sends the request information, wherein the first event information comprises speech input event information used for instructing a user to input the speech information; generating response information comprising an operation instruction for a targeted device on the basis of the first event information and the speech information; and sending the response information to the targeted device for the targeted device to perform an operation indicated by the operation instruction. The embodiment improves the efficiency of providing a speech service.
    Type: Grant
    Filed: July 5, 2018
    Date of Patent: June 14, 2022
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD
    Inventors: Jianliang Zhou, Guanghao Shen, Ruisheng Wu
  • Patent number: 11361765
    Abstract: Disclosed is a multi-device control method including: performing a voice recognition operation on a voice command generated from a sound source; identifying distances between each of the plurality of devices and the sound source; assigning response rankings to the devices by combining a context-specific correction score of each device corresponding to the voice command and the distances; and selecting a device to respond to the voice command from among the devices according to the response rankings.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: June 14, 2022
    Assignee: LG ELECTRONICS INC.
    Inventor: Jisoo Park
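    An illustrative ranking rule combining a context-specific correction score with distance to the sound source, as the abstract above describes; the linear weighting is an assumption.

      def choose_responding_device(devices, distance_weight=0.5):
          """devices: list of dicts with 'name', 'distance_m', and 'context_score' in [0, 1]."""
          def rank(dev):
              # Closer devices and devices better suited to the command rank higher.
              return dev["context_score"] - distance_weight * dev["distance_m"]
          ranked = sorted(devices, key=rank, reverse=True)
          return ranked[0]["name"], ranked

      devices = [
          {"name": "kitchen_speaker", "distance_m": 1.2, "context_score": 0.4},
          {"name": "living_room_tv",  "distance_m": 3.5, "context_score": 0.9},
      ]
      print(choose_responding_device(devices)[0])        # -> kitchen_speaker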
  • Patent number: 11355109
    Abstract: Embodiments of the present disclosure provide a method and apparatus for man-machine conversation, and an electronic device. The method includes: outputting question information to a user based on a first task of a first conversation scenario; judging, in response to receiving reply information returned by the user, whether to trigger a second conversation scenario based on the reply information; generating, in response to determining the second conversation scenario being triggered based on the reply information, response information corresponding to the reply information based on the second conversation scenario; and outputting the response information to the user.
    Type: Grant
    Filed: December 10, 2019
    Date of Patent: June 7, 2022
    Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
    Inventors: Xuefeng Lou, Qingwei Huang, Weiwei Wang, Cheng Peng, Xiaojun Zhao
  • Patent number: 11348253
    Abstract: Methods and systems are provided for implementing source separation techniques, and more specifically performing source separation on mixed source single-channel and multi-channel audio signals enhanced by inputting lip motion information from captured image data, including selecting a target speaker facial image from a plurality of facial images captured over a period of interest; computing a motion vector based on facial features of the target speaker facial image; and separating, based on at least the motion vector, audio corresponding to a constituent source from a mixed source audio signal captured over the period of interest. The mixed source audio signal may be captured from single-channel or multi-channel audio capture devices. Separating audio from the audio signal may be performed by a fusion learning model comprising a plurality of learning sub-models. Separating the audio from the audio signal may be performed by a blind source separation (“BSS”) learning model.
    Type: Grant
    Filed: January 9, 2020
    Date of Patent: May 31, 2022
    Assignee: Alibaba Group Holding Limited
    Inventor: Yun Li
  • Patent number: 11347928
    Abstract: Aspects of the invention include detecting and processing sections spanning processed document partitions by caching a document partition. The document partition includes metadata indicating that the document partition is a portion of a whole document. Aspects also include pairing a candidate paragraph from the document partition with a cached paragraph segment and determining, using a coherence model, a probability that the candidate paragraph and the cached paragraph segment constitute a semantically coherent paragraph. Aspects further include discarding the cached paragraph segment and processing the candidate paragraph and the cached paragraph segment separately based on a determination that the probability is less than a threshold level and processing the candidate paragraph and the cached paragraph segment together as a cross-partition paragraph based on a determination that the probability is greater than the threshold level.
    Type: Grant
    Filed: July 27, 2020
    Date of Patent: May 31, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Andrew J Lavery, Igor S. Ramos, Paul Joseph Hake, Scott Carrier
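    A minimal sketch of the cross-partition decision described above: the cached trailing paragraph segment is paired with the first paragraph of the next partition, and a coherence probability decides whether the pieces are processed as one paragraph. The vocabulary-overlap model below is only a placeholder for the coherence model.

      def merge_if_coherent(cached_segment, candidate_paragraph, coherence_model, threshold=0.5):
          """Return one merged paragraph or the two pieces to be processed separately."""
          probability = coherence_model(cached_segment, candidate_paragraph)
          if probability > threshold:
              return [cached_segment + " " + candidate_paragraph]   # cross-partition paragraph
          return [cached_segment, candidate_paragraph]              # process separately

      def toy_coherence(a, b):
          wa, wb = set(a.lower().split()), set(b.lower().split())
          return len(wa & wb) / max(1, len(wa | wb))

      print(merge_if_coherent("the renewal terms of the contract",
                              "the contract and the renewal terms", toy_coherence))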
  • Patent number: 11348571
    Abstract: The present disclosure provides methods, computing devices, and storage media for generating a training corpus. The method includes: mining out pieces of data from user behavior logs associated with a target application, each piece of data including a first behavior log and a second behavior log, the first behavior log including a user speech and a corresponding speech recognition result, the second behavior log belonging to the same user as the first behavior log and time-dependent with the first behavior log; and determining the user speech and the corresponding speech recognition result in each piece of data as a positive feedback sample or a negative feedback sample, based on the first behavior log and the second behavior log.
    Type: Grant
    Filed: March 5, 2020
    Date of Patent: May 31, 2022
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Shiqiang Ding, Jizhou Huang, Zhongwei Jiang, Wentao Ma
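    A hypothetical labeling rule in the spirit of the abstract above: if the user's time-dependent follow-up log suggests the recognition result was acted on, the pair becomes a positive feedback sample; a quick typed correction marks it negative. Field names and the retry window are assumptions.

      def label_feedback(first_log, second_log, retry_window_s=30):
          """first_log: {'speech_id', 'asr_result', 'timestamp'}; second_log: a later log from the same user."""
          sample = {"speech_id": first_log["speech_id"], "text": first_log["asr_result"]}
          elapsed = second_log["timestamp"] - first_log["timestamp"]
          if second_log.get("action") == "clicked_result":
              sample["label"] = "positive"
          elif second_log.get("action") == "typed_query" and elapsed < retry_window_s:
              sample["label"] = "negative"     # user corrected the query by typing soon after
          else:
              sample["label"] = "unlabeled"
          return sample

      print(label_feedback(
          {"speech_id": 1, "asr_result": "coffee near me", "timestamp": 100},
          {"timestamp": 110, "action": "typed_query"}))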
  • Patent number: 11328730
    Abstract: A system and method are disclosed for generating a teleconference space for two or more communication devices using a computer coupled with a database and comprising a processor and memory. The computer generates a teleconference space and transmits requests to join the teleconference space to the two or more communication devices. The computer stores in memory identification information, and audiovisual data associated with one or more users, for each of the two or more communication devices. The computer stores audio transcription data, transmitted to the computer by each of the two or more communication devices and associated with one or more communication device users, in the computer memory. The computer merges the audio transcription data from each of the two or more communication devices into a master audio transcript, and transmits the master audio transcript to each of the two or more communication devices.
    Type: Grant
    Filed: April 29, 2020
    Date of Patent: May 10, 2022
    Assignee: Nextiva, Inc.
    Inventors: Tomas Gorny, Jean-Baptiste Martinoli, Tracy Conrad, Lukas Gorny
  • Patent number: 11328719
    Abstract: An electronic device and a method for controlling the electronic device are provided. The electronic device includes a microphone, a memory configured to include at least one instruction, and a processor configured to execute the at least one instruction. The processor is configured to control the electronic device to perform voice recognition for an inquiry based on receiving input of a user inquiry through the microphone, and acquire a text for the inquiry, generate a plurality of inquiries for acquiring response data for the inquiry from a plurality of databases using a relation graph indicating a relation between the acquired text and data stored in the plurality of databases, acquire response data corresponding to each of the plurality of inquiries from each of the plurality of databases, and generate a response for the inquiry based on the response data acquired from each of the plurality of databases and output the response.
    Type: Grant
    Filed: January 23, 2020
    Date of Patent: May 10, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jaehun Lee, Yunsu Lee, Taeho Hwang, Seungsoo Kang, Jiyoung Kang, Sejin Kwak
  • Patent number: 11328017
    Abstract: The present teaching relates to generating a conversational agent. In one example, a plurality of input utterances may be received from a developer. A paraphrase model is obtained. The paraphrase model is generated based on machine translation. For each of the plurality of input utterances, one or more paraphrases of the input utterance are generated based on the paraphrase model. For each of the plurality of input utterances, at least one of the one or more paraphrases is selected based on an instruction from the developer to generate selected paraphrases. The conversational agent is generated based on the plurality of input utterances and the selected paraphrases.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: May 10, 2022
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Ankur Gupta, Timothy Daly, Tularam Ban
  • Patent number: 11327652
    Abstract: A keyboard is described that determines, using a first decoder and based on a selection of keys of a graphical keyboard, text. Responsive to determining that a characteristic of the text satisfies a threshold, a model of the keyboard identifies the target language of the text, and determines whether the target language is different than a language associated with the first decoder. If the target language of the text is not different than the language associated with the first decoder, the keyboard outputs, for display, an indication of first candidate words determined by the first decoder from the text. If the target language of the text is different: the keyboard enables a second decoder, where a language associated with the second decoder matches the target language of the text, and outputs, for display, an indication of second candidate words determined by the second decoder from the text.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: May 10, 2022
    Assignee: Google LLC
    Inventors: Ouais Alsharif, Peter Ciccotto, Francoise Beaufays, Dragan Zivkovic
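    An illustrative flow for the decoder-switching behavior described above; the language identifier, the decoders, and the length threshold are placeholders rather than the keyboard's actual components.

      from dataclasses import dataclass
      from typing import Callable, Dict, List

      @dataclass
      class Decoder:
          language: str
          candidates: Callable[[str], List[str]]

      def decode_with_language_switch(text, active, decoders: Dict[str, Decoder],
                                      identify_language, min_chars=12):
          """Return the (possibly switched) decoder and its candidate words for the text."""
          if len(text) >= min_chars:                   # characteristic of the text satisfies a threshold
              target = identify_language(text)
              if target != active.language and target in decoders:
                  active = decoders[target]            # enable the decoder matching the target language
          return active, active.candidates(text)

      decoders = {
          "en": Decoder("en", lambda t: [t + " (en suggestion)"]),
          "es": Decoder("es", lambda t: [t + " (sugerencia es)"]),
      }
      active, words = decode_with_language_switch(
          "donde esta la biblioteca", decoders["en"], decoders,
          identify_language=lambda t: "es" if "donde" in t else "en")
      print(active.language, words)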
  • Patent number: 11302330
    Abstract: A method of disambiguating user queries in a multi-turn dialogue including a set of user utterances includes using a predefined language model to recognize an ambiguous entity in an unresolved user utterance from the multi-turn dialogue. The method further includes outputting a clarifying question about the ambiguous entity, and receiving a clarifying user utterance. The method further includes identifying a disambiguating entity in the clarifying user utterance. The method further includes rewriting the unresolved user utterance as a rewritten utterance that replaces the ambiguous entity with the disambiguating entity, and outputting the rewritten utterance to one or more query answering machines.
    Type: Grant
    Filed: June 3, 2019
    Date of Patent: April 12, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jiayin Ge, Zicheng Huang, Guihong Cao
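    A minimal sketch of the rewrite step from the entry above: once the clarifying utterance supplies a disambiguating entity, the unresolved utterance is rewritten before being sent to the query answering machines. Entity detection is assumed to have already happened.

      def rewrite_utterance(unresolved, ambiguous_entity, disambiguating_entity):
          """Replace the ambiguous mention with the entity taken from the clarifying answer."""
          return unresolved.replace(ambiguous_entity, disambiguating_entity)

      # Dialogue: "When does it open?" -> "Which place do you mean?" -> "The library."
      print(rewrite_utterance("When does it open?", "it", "the library"))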
  • Patent number: 11295735
    Abstract: This disclosure describes, in part, techniques implemented by a speech-processing system for providing an extensible skill-interface component to facilitate voice-control of third-party developer devices. The speech-processing system may provide the skill-interface component to third-party device developers using a web-based portal through which the skill interfaces may be created to voice-enable third-party devices having unique capabilities. For instance, a skill interface may define events, such as voice commands of a user, which map to directives configured to cause the third-party devices to perform an operation that is responsive to the event. In this way, the speech-processing system may receive audio data representing a voice command of a user in an environment of a third-party device, and return a directive to cause the third-party device to perform an operation responsive to the voice command.
    Type: Grant
    Filed: December 13, 2017
    Date of Patent: April 5, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Raphael Adam Anuar, Zoe Adams, Shah Samir Pravinchandra, Idris Abbas Saylawala
  • Patent number: 11288038
    Abstract: A system and method for dictation using a peripheral device includes a voice recognition mouse. The voice recognition mouse includes a microphone, a first button, a processor coupled to the microphone and the first button, and a memory coupled to the processor. The memory stores instructions that, when executed by the processor, cause the processor to detect actuation of the first button and in response to detecting actuation of the first button, invoke the microphone for capturing audio speech from a user. The captured audio speech is streamed to a first module. The first module is configured to invoke a second module for converting the captured audio speech into text and forward the text to the first module for providing to an application expecting the text, the application being configured to display the text on a display device.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: March 29, 2022
    Inventor: John Holst, III
  • Patent number: 11289095
    Abstract: A method and system are provided for translating speech to text, the speech having been received by a client device. A user utterance corresponding to the speech is received. A first predicted text corresponding to the user utterance and a first confidence score corresponding to the first predicted text are determined using a local graph. The user utterance is transmitted to a server. A second predicted text corresponding to the user utterance and a second confidence score corresponding to the second predicted text are received from the server. If the first confidence score is higher than the second confidence score, the first predicted text is output.
    Type: Grant
    Filed: September 28, 2020
    Date of Patent: March 29, 2022
    Assignee: YANDEX EUROPE AG
    Inventor: Pavel Aleksandrovich Zelenko
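    A minimal sketch of the selection rule in the abstract above: prefer the on-device prediction when its confidence beats the server's. The function and its example values are illustrative.

      def choose_transcription(local_text, local_conf, server_text, server_conf):
          """Return the predicted text to output and where it came from."""
          if local_conf > server_conf:
              return local_text, "local"
          return server_text, "server"

      print(choose_transcription("turn on the lights", 0.87, "turn off the lights", 0.62))
      # -> ('turn on the lights', 'local')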
  • Patent number: 11281707
    Abstract: A system outputs information generated by summarizing contents of voices and images as texts. A CPU of the system performs, according to a program stored in a memory, recording voice data and captured image data, generating first text information by performing speech recognition on the acquired voice data, generating second text information by performing character recognition on the acquired image data, and generating summary text information smaller in the number of characters than the first text information and the second text information, based on the first text information and the second text information, according to a predetermined criterion.
    Type: Grant
    Filed: November 26, 2018
    Date of Patent: March 22, 2022
    Assignee: CANON KABUSHIKI KAISHA
    Inventor: Motoki Ikeda
  • Patent number: 11270686
    Abstract: A model-pair is selected to recognize spoken words in a speech signal generated from a speech, which includes an acoustic model and a language model. A degree of disjointedness between the acoustic model and the language model is computed relative to the speech by comparing a first recognition output produced from the acoustic model and a second recognition output produced from the language model. When the acoustic model incorrectly recognizes a portion of the speech signal as a first word and the language model correctly recognizes the portion of the speech signal as a second word, a textual representation of the second word is determined and associated with a set of sound descriptors to generate a training speech pattern. Using the training speech pattern, the acoustic model is trained to recognize the portion of the speech signal as the second word.
    Type: Grant
    Filed: March 28, 2017
    Date of Patent: March 8, 2022
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, John M. Ganci, Jr., Stephen C. Hammer, Craig M. Trim
  • Patent number: 11270712
    Abstract: A system and method for decorrelating audio data. A method includes determining a plurality of propagation vectors for each of a plurality of sound sources based on audio data captured by a plurality of sound capturing devices and a location of each of the plurality of sound sources, wherein the plurality of sound sources and the plurality of sound capturing devices are deployed in a space, wherein the audio data is captured by the plurality of sound capturing devices based on sounds emitted by the plurality of sound sources in the space; determining a plurality of beam former outputs, wherein each beam former output is determined for one of the plurality of sound sources; determining a decoupling matrix based on the plurality of beam former outputs and the propagation vectors; and decorrelating audio data captured by the plurality of sound capturing devices based on the decoupling matrix.
    Type: Grant
    Filed: August 26, 2020
    Date of Patent: March 8, 2022
    Assignee: Insoundz Ltd.
    Inventors: Ron Ziv, Tomer Goshen, Emil Winebrand, Yadin Aharoni
  • Patent number: 11270145
    Abstract: Approaches for interpretable counting for visual question answering include a digital image processor, a language processor, a scorer, and a counter. The digital image processor identifies objects in an image, maps the identified objects into an embedding space, generates bounding boxes for each of the identified objects, and outputs the embedded objects paired with their bounding boxes. The language processor embeds a question into the embedding space. The scorer determines scores for the identified objects. Each respective score determines how well a corresponding one of the identified objects is responsive to the question. The counter determines a count of the objects in the digital image that are responsive to the question based on the scores. The count and a corresponding bounding box for each object included in the count are output. In some embodiments, the counter determines the count interactively based on interactions between counted and uncounted objects.
    Type: Grant
    Filed: February 4, 2020
    Date of Patent: March 8, 2022
    Assignee: salesforce.com, inc.
    Inventors: Alexander Richard Trott, Caiming Xiong, Richard Socher
  • Patent number: 11264034
    Abstract: A voice identification method, device, apparatus, and a storage medium are provided. The method includes: receiving voice data; and performing a voice identification on the voice data, to obtain first text data associated with the voice data; determining common text data in a preset fixed data table, wherein a similarity between a pronunciation of the determined common text data and a pronunciation of the first text data meets a preset condition, wherein the determined common text data is a voice identification result with an occurrence number larger than a first preset threshold; and replacing the first text data with the determined common text data.
    Type: Grant
    Filed: February 26, 2020
    Date of Patent: March 1, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd
    Inventors: Ye Song, Long Zhang, Pengpeng Jie
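    A hypothetical post-processing step matching the abstract above: the first text data is replaced by a frequent ("common") result whose pronunciation is close enough. Character-level similarity stands in for the pronunciation comparison, and the occurrence threshold is an assumed value.

      from difflib import SequenceMatcher

      def correct_with_common_text(first_text, common_table, occurrence_threshold=100,
                                   similarity_threshold=0.8):
          """common_table maps common text data to its occurrence count."""
          best, best_sim = None, 0.0
          for common_text, count in common_table.items():
              if count < occurrence_threshold:          # skip results that are not common enough
                  continue
              sim = SequenceMatcher(None, first_text, common_text).ratio()
              if sim > best_sim:
                  best, best_sim = common_text, sim
          return best if best is not None and best_sim >= similarity_threshold else first_text

      table = {"navigate to the airport": 540, "play some jazz": 220}
      print(correct_with_common_text("navigate to the air port", table))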
  • Patent number: 11257493
    Abstract: Systems and methods for processing speech are described. In certain examples, image data is used to generate visual feature tensors and audio data is used to generate audio feature tensors. The visual feature tensors and the audio feature tensors are used by a linguistic model to determine linguistic features that are usable to parse an utterance of a user. The generation of the feature tensors may be jointly configured with the linguistic model. Systems may be provided in a client-server architecture.
    Type: Grant
    Filed: July 11, 2019
    Date of Patent: February 22, 2022
    Assignee: SoundHound, Inc.
    Inventors: Cristina Vasconcelos, Zili Li
  • Patent number: 11257041
    Abstract: A processing device is to: identify, using digital interview data of interviewees captured during interviews, a subset of the interviewees that have a disability; label a first group of the interviewees as disabled and a second group of the interviewees as not disabled with reference to the disability; identify features from the digital interview data for the first group that correlate with the disability; formulate a digital fingerprint of the features that identifies how the first group differs from the second group with reference to the disability; map the digital fingerprint of the features onto a dataset of an interviewee belonging to the second group of the interviewees, to generate a mapped dataset; and determine, from the mapped dataset, effects of the digital fingerprint on a job performance score for the interviewee.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: February 22, 2022
    Assignee: HireVue, Inc.
    Inventors: Loren Larsen, Keith Warnick, Lindsey Zuloaga, Caleb Rottman
  • Patent number: 11250860
    Abstract: This speech processing device is provided with: a contribution degree estimation means which calculates a contribution degree representing a quality of a segment of the speech signal; and a speaker feature calculation means which calculates a feature from the speech signal, for recognizing attribute information of the speech signal, using the contribution degree as a weight of the segment of the speech signal.
    Type: Grant
    Filed: March 7, 2017
    Date of Patent: February 15, 2022
    Assignee: NEC CORPORATION
    Inventors: Hitoshi Yamamoto, Takafumi Koshinaka
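    A sketch of the weighting idea in the entry above: per-segment embeddings are pooled into one speaker feature with weights given by the estimated contribution degree (segment quality). The embeddings and quality scores below are stand-ins for the model outputs.

      import numpy as np

      def weighted_speaker_feature(segment_embeddings, contribution_degrees):
          """Pool per-segment embeddings into one utterance-level speaker feature."""
          return np.average(np.asarray(segment_embeddings), axis=0,
                            weights=np.asarray(contribution_degrees, dtype=float))

      segments = [np.array([0.2, 0.9]), np.array([0.8, 0.1]), np.array([0.5, 0.5])]
      quality = [0.9, 0.1, 0.5]          # the noisy middle segment contributes little
      print(weighted_speaker_feature(segments, quality))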
  • Patent number: 11250855
    Abstract: A method, computer program product, and computing system for monitoring a plurality of conversations within a monitored space to generate a conversation data set; processing the conversation data set using machine learning to: define a system-directed command for an ACI system, and associate one or more conversational contexts with the system-directed command; detecting the occurrence of a specific conversational context within the monitored space, wherein the specific conversational context is included in the one or more conversational contexts associated with the system-directed command; and executing, in whole or in part, functionality associated with the system-directed command in response to detecting the occurrence of the specific conversational context without requiring the utterance of the system-directed command and/or a wake-up word/phrase.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: February 15, 2022
    Assignee: NUANCE COMMUNICATIONS, INC.
    Inventors: Paul Joseph Vozila, Neal Snider
  • Patent number: 11250872
    Abstract: Method, apparatus, and computer program product are provided for customizing an automatic closed captioning system. In some embodiments, at a data use (DU) location, an automatic closed captioning system that includes a base model is provided, search criteria are defined for requesting data from one or more data collection (DC) locations, a search request based on the search criteria is sent to the one or more DC locations, relevant closed caption data from the one or more DC locations are received responsive to the search request, the received relevant closed caption data are processed by computing a confidence score for each of a plurality of data sub-sets of the received relevant closed caption data and selecting one or more of the data sub-sets based on the confidence scores, and the automatic closed captioning system is customized by using the selected one or more data sub-sets to train the base model.
    Type: Grant
    Filed: December 14, 2019
    Date of Patent: February 15, 2022
    Assignee: International Business Machines Corporation
    Inventors: Samuel Thomas, Yinghui Huang, Masayuki Suzuki, Zoltan Tueske, Laurence P. Sansone, Michael A. Picheny
  • Patent number: 11238884
    Abstract: A system, method and non-transitory computer readable medium for providing call quality driven communication management wherein an audio data stream of a communication session having one or more utterances is processed to generate a transcript of the communication session. The generated transcript is analyzed to determine a quality of the audio data stream, and one or more quality improvement measures are determined when one or more audio artifacts are determined to be present in the audio data stream.
    Type: Grant
    Filed: October 4, 2019
    Date of Patent: February 1, 2022
    Assignee: RED BOX RECORDERS LIMITED
    Inventors: Simon Jolly, Tony Commander, Kyrylo Zotkin
  • Patent number: 11238851
    Abstract: Technology of the disclosure may facilitate user discovery of various voice-based action queries that can be spoken to initiate computer-based actions, such as voice-based action queries that can be provided as spoken input to a computing device to initiate computer-based actions that are particularized to content being viewed or otherwise consumed by the user on the computing device. Some implementations are generally directed to determining, in view of content recently viewed by a user on a computing device, at least one suggested voice-based action query for presentation via the computing device. Some implementations are additionally or alternatively generally directed to receiving at least one suggested voice-based action query at a computing device and providing the suggested voice-based action query as a suggestion in response to input to initiate providing of a voice-based query via the computing device.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: February 1, 2022
    Assignee: Google LLC
    Inventors: Vikram Aggarwal, Pravir Kumar Gupta
  • Patent number: 11227597
    Abstract: An electronic device for performing voice recognition and a method for controlling the electronic device are provided.
    Type: Grant
    Filed: January 21, 2020
    Date of Patent: January 18, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hyunkook Cho, Taeyoung Kim, Sunghee Cho
  • Patent number: 11217234
    Abstract: Disclosed herein is a method for intelligently recognizing voice by a voice recognizing apparatus in various noise environments. The method includes acquiring a first noise level for an environment in which the voice recognizing apparatus is located, inputting the first noise level into a previously learned noise-sensitivity model to acquire a first optimum sensitivity, and recognizing a user's voice based on the first optimum sensitivity. The noise-sensitivity model is learned in a plurality of noise environments acquiring different noise levels, so that it is possible to accurately acquire an optimum sensitivity corresponding to a noise level depending on an operating state when an IoT device (voice recognizing apparatus) is in operation.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: January 4, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Jaewoong Jeong, Youngman Kim, Sangjun Oh, Kyuho Lee, Seunghyun Hwang
  • Patent number: 11217255
    Abstract: Systems and processes for operating an intelligent automated assistant to provide extension of digital assistant services are provided. An example method includes, at an electronic device having one or more processors, receiving, from a first user, a first speech input representing a user request. The method further includes obtaining an identity of the first user; and in accordance with the user identity, providing a representation of the user request to at least one of a second electronic device or a third electronic device. The method further includes receiving, based on a determination of whether the second electronic device or the third electronic device, or both, is to provide the response to the first electronic device, the response to the user request from the second electronic device or the third electronic device. The method further includes providing a representation of the response to the first user.
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: January 4, 2022
    Assignee: Apple Inc.
    Inventors: Yoon Kim, Charles Srisuwananukorn, David A. Carson, Thomas R. Gruber, Justin G. Binder