Recognition Patents (Class 704/231)
  • Patent number: 11423348
    Abstract: Logistical operations (e.g., warehouses) may use a voice-enabled workflow to facilitate the work tasks of a staff (i.e., population) of workers. Typically, it is necessary for a worker to travel from location-to-location to complete assigned work tasks. As such, a worker's time spent travelling often correlates with the worker's overall work performance. Understanding the worker's travel performance is highly desirable, but computing a fair and accurate travel-performance metric is difficult. One reason for this is that the distance a worker travels is often unknown. The present invention embraces a system and method for accurately and fairly assessing a worker's travel performance by analyzing the worker's voice dialog.
    Type: Grant
    Filed: January 11, 2016
    Date of Patent: August 23, 2022
    Assignee: Hand Held Products, Inc.
    Inventors: Kwong Wing Au, Christopher L. Lofty, Steven Thomas, John Pecorari, James Geisler
  • Patent number: 11423911
    Abstract: Computer-implemented method and system for processing and broadcasting one or more moment-associating elements. For example, the computer-implemented method includes granting subscription permission to one or more subscribers; receiving the one or more moment-associating elements; transforming the one or more moment-associating elements into one or more pieces of moment-associating information; and transmitting at least one piece of the one or more pieces of moment-associating information to the one or more subscribers.
    Type: Grant
    Filed: October 10, 2019
    Date of Patent: August 23, 2022
    Assignee: Otter.ai, Inc.
    Inventors: Yun Fu, Tao Xing, Kaisuke Nakajima, Brian Francis Williams, James Mason Altreuter, Xiaoke Huang, Simon Lau, Sam Song Liang, Kean Kheong Chin, Wen Sun, Julius Cheng, Hitesh Anand Gupta
  • Patent number: 11417324
    Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: August 16, 2022
    Assignee: GOOGLE LLC
    Inventors: Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
  • Patent number: 11417354
    Abstract: In accordance with an example embodiment of the present invention, disclosed is a method and an apparatus for voice activity detection (VAD). The VAD comprises creating a signal indicative of a primary VAD decision and determining hangover addition. The determination on hangover addition is made in dependence of a short term activity measure and/or a long term activity measure. A signal indicative of a final VAD decision is then created.
    Type: Grant
    Filed: February 18, 2020
    Date of Patent: August 16, 2022
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventor: Martin Sehlstedt
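    A minimal sketch, not taken from the patent, of how a hangover mechanism of the kind described in the entry above might extend a primary VAD decision; the window lengths, thresholds, and maximum hangover count are illustrative assumptions.

      def final_vad_decisions(primary_decisions, short_win=5, long_win=50,
                              short_thresh=0.6, long_thresh=0.3, max_hangover=8):
          """Extend active regions of a primary VAD decision stream with hangover frames."""
          history, final = [], []
          hangover = 0
          for active in primary_decisions:
              history.append(1 if active else 0)
              short_term = sum(history[-short_win:]) / min(len(history), short_win)
              long_term = sum(history[-long_win:]) / min(len(history), long_win)
              if active:
                  # Re-arm the hangover counter when recent activity is high enough.
                  if short_term >= short_thresh or long_term >= long_thresh:
                      hangover = max_hangover
                  final.append(True)
              elif hangover > 0:
                  hangover -= 1
                  final.append(True)   # hangover frame: keep the final decision active
              else:
                  final.append(False)
          return final

      # A burst of speech followed by silence keeps a few hangover frames active.
      print(final_vad_decisions([True] * 6 + [False] * 12))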
  • Patent number: 11418233
    Abstract: Methods and apparatus to monitor a media presentation are disclosed. An example system includes a monitoring device to monitor a media presentation and generate research data identifying the media. A bridge device includes a housing dimensioned to receive the monitoring device, a receiver carried by the housing to receive a first audio signal via a wireless data connection from an audio source device using a wireless communication protocol, the first audio signal associated with the media. The bridge device includes an audio emitter to emit the audio signal for receipt by the monitoring device, and a transmitter to transmit the audio signal to an audio receiver device using the wireless communication protocol.
    Type: Grant
    Filed: June 15, 2020
    Date of Patent: August 16, 2022
    Assignee: THE NIELSEN COMPANY (US), LLC
    Inventors: William K. Krug, James Zhang
  • Patent number: 11410637
    Abstract: A voice synthesis method according to an embodiment includes altering a series of synthesis spectra in a partial period of a synthesis voice based on a series of amplitude spectrum envelope contours of a voice expression to obtain a series of altered spectra to which the voice expression has been imparted, and synthesizing a series of voice samples to which the voice expression has been imparted, based on the series of altered spectra.
    Type: Grant
    Filed: April 26, 2019
    Date of Patent: August 9, 2022
    Assignee: YAMAHA CORPORATION
    Inventors: Jordi Bonada, Merlijn Blaauw, Keijiro Saino, Ryunosuke Daido, Michael Wilson, Yuji Hisaminato
  • Patent number: 11403466
    Abstract: In one embodiment, a method includes receiving, from a client system associated with a first user, a first audio input. The method includes generating multiple transcriptions corresponding to the first audio input based on multiple automatic speech recognition (ASR) engines. Each ASR engine is associated with a respective domain out of multiple domains. The method includes determining, for each transcription, a combination of one or more intents and one or more slots to be associated with the transcription. The method includes selecting, by a meta-speech engine, one or more combinations of intents and slots from the multiple combinations to be associated with the first user input. The method includes generating a response to the first audio input based on the selected combinations and sending, to the client system, instructions for presenting the response to the first audio input.
    Type: Grant
    Filed: January 13, 2020
    Date of Patent: August 2, 2022
    Assignee: Facebook Technologies, LLC
    Inventors: Fuchun Peng, Jihang Li, Jinsong Yu
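    A hypothetical sketch of how a meta-speech engine might pick among per-domain interpretations of the kind the entry above describes; the Hypothesis fields and the confidence-product scoring rule are assumptions, not the patent's method.

      from dataclasses import dataclass

      @dataclass
      class Hypothesis:
          domain: str
          transcription: str
          asr_confidence: float
          intents: list
          slots: dict
          nlu_confidence: float

      def select_interpretations(hypotheses, top_k=1):
          """Rank per-domain (intent, slot) combinations and keep the best ones."""
          # Simple product of ASR and NLU confidences; a deployed system could learn this.
          return sorted(hypotheses,
                        key=lambda h: h.asr_confidence * h.nlu_confidence,
                        reverse=True)[:top_k]

      hyps = [
          Hypothesis("music", "play yesterday", 0.92, ["play_music"], {"song": "yesterday"}, 0.88),
          Hypothesis("calendar", "play yesterday", 0.92, ["show_events"], {"date": "yesterday"}, 0.41),
      ]
      best = select_interpretations(hyps)[0]
      print(best.domain, best.intents, best.slots)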
  • Patent number: 11404048
    Abstract: An embodiment of the inventive concept includes a communication device, a voice input receiving unit, a memory, and a processor. In addition, various embodiments recognized through the specification are possible.
    Type: Grant
    Filed: January 29, 2019
    Date of Patent: August 2, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Kyeonga Han, Soungmin Yoo
  • Patent number: 11398224
    Abstract: A communication system includes a first terminal device, a second terminal device, and an advice providing device. The first terminal device is operated by an operator. The second terminal device is operated by a guest. The second terminal device communicates with the first terminal device through a network. The advice providing device includes circuitry that determines advice for the operator based on voice data including first voice data that is related to the operator and transmitted from the first terminal device and second voice data that is related to the guest and transmitted from the second terminal device. The circuitry of the advice providing device further transmits the advice to the first terminal device. The first terminal device receives the advice and displays, on a display, the advice.
    Type: Grant
    Filed: September 17, 2020
    Date of Patent: July 26, 2022
    Assignee: RICOH COMPANY, LTD.
    Inventors: Mayu Hakata, Takashi Hasegawa
  • Patent number: 11388294
    Abstract: An image forming apparatus may include: a storage that stores information in which a job type is associated with speech patterns for processings related to the job type; and a hardware processor that may: receive an input speech; acquire a job type; use the speech patterns associated with the job type that is acquired by the hardware processor and is being executed to analyze a speech inputted during execution of the job; and execute the processings based on an analysis result by the hardware processor.
    Type: Grant
    Filed: June 16, 2020
    Date of Patent: July 12, 2022
    Assignee: Konica Minolta, Inc.
    Inventor: Takahisa Matsunaga
  • Patent number: 11380316
    Abstract: The present invention discloses a speech interaction method and apparatus, and pertains to the field of speech processing technologies. The method includes: acquiring speech data of a user; performing user attribute recognition on the speech data to obtain a first user attribute recognition result; performing content recognition on the speech data to obtain a content recognition result of the speech data; and performing a corresponding operation according to at least the first user attribute recognition result and the content recognition result, so as to respond to the speech data. According to the present invention, after speech data is acquired, user attribute recognition and content recognition are separately performed on the speech data to obtain a first user attribute recognition result and a content recognition result, and a corresponding operation is performed according to at least the first user attribute recognition result and the content recognition result.
    Type: Grant
    Filed: October 10, 2019
    Date of Patent: July 5, 2022
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Hongbo Jin, Zhuolin Jiang
  • Patent number: 11380301
    Abstract: A learning apparatus comprises a learning part that learns an error correction model by a set of a speech recognition result candidate and a correct text of speech recognition for given audio data, wherein the speech recognition result candidate includes a speech recognition result candidate which is different from the correct text, and the error correction model is a model that receives a word sequence of the speech recognition result candidate as input and outputs an error correction score indicating likelihood of the word sequence of the speech recognition result candidate in consideration of a speech recognition error.
    Type: Grant
    Filed: February 18, 2019
    Date of Patent: July 5, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Tomohiro Tanaka, Ryo Masumura
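    An illustrative rescoring step in the spirit of the abstract above: an error-correction model scores each ASR candidate word sequence, and the candidates are re-ranked by combining that score with the original ASR score. The correction_score function and the interpolation weight are stand-ins, not the trained model.

      def rerank_candidates(candidates, correction_score, weight=0.5):
          """candidates: list of (word_sequence, asr_score); returns word sequences re-ranked."""
          rescored = []
          for words, asr_score in candidates:
              combined = (1 - weight) * asr_score + weight * correction_score(words)
              rescored.append((combined, words))
          return [words for _, words in sorted(rescored, reverse=True)]

      # Stand-in correction model: penalize a sequence containing an implausible word.
      toy_model = lambda words: 0.2 if "beach" in words else 0.9
      print(rerank_candidates(
          [(["wreck", "a", "nice", "beach"], 0.60), (["recognize", "speech"], 0.55)],
          correction_score=toy_model))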
  • Patent number: 11379135
    Abstract: One implementation is for a device. The device includes a controller having a memory protection unit, a legally relevant memory portion capable of interacting with the controller, a legally non-relevant memory portion capable of interacting with the controller, an interface in the memory protection unit that allows a privileged application to access the legally relevant memory portion and disallows an unprivileged application from accessing the legally relevant portion, and an interrupt system, wherein when the unprivileged application makes an attempt to interact with the legally relevant memory portion, the memory protection unit takes an action associated with the unprivileged application.
    Type: Grant
    Filed: August 4, 2020
    Date of Patent: July 5, 2022
    Assignee: Honeywell International Inc.
    Inventors: Ralf Peter Thor, François Germain Vincent, Roland Domke, Ionut Nicorescu
  • Patent number: 11373045
    Abstract: A system for determining context and intent in a conversation using machine learning (ML) based artificial intelligence (AI) in omnichannel data communications is disclosed. The system may comprise a data store to store and manage data within a network, a server to facilitate operations using information from the one or more data stores, and a ML-based AI subsystem to communicate with the server and the data store in the network. The ML-based AI subsystem may comprise a data access interface to receive data associated with a conversation with a user via a communication channel. The ML-based AI subsystem may comprise a processor to provide a proactive, adaptive, and intelligent conversation by applying hierarchical multi-intent data labeling framework, training at least one model with training data, and generating and deploying a production-ready model based on the trained and retained at least one model.
    Type: Grant
    Filed: September 24, 2019
    Date of Patent: June 28, 2022
    Assignee: CONTACTENGINE LIMITED
    Inventors: Dominic Bealby-Wright, Cosmin Dragos Davidescu
  • Patent number: 11373544
    Abstract: A method includes displaying a first set of text content characterized by a first difficulty level. The method includes obtaining speech data associated with the first set of text content. The method includes determining linguistic feature(s) within the speech data. The method includes in response to completion of the speech data, determining a reading proficiency value associated with the first set of text content and based on the linguistic feature(s). The method includes in accordance with determining the reading proficiency value satisfies change criteria, changing a difficulty level for a second set of text content. After changing the difficulty level, the second set of text content corresponds to a second difficulty level different from the first difficulty level. The method includes in accordance with determining the reading proficiency value does not satisfy the change criteria, maintaining the second set of text content at the first difficulty level.
    Type: Grant
    Filed: February 24, 2020
    Date of Patent: June 28, 2022
    Assignee: APPLE INC.
    Inventors: Barry-John Theobald, Russell Y. Webb, Nicholas Elia Apostoloff
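    A minimal sketch of the difficulty-adaptation logic described in the entry above; the proficiency formula, feature names, and thresholds are invented for illustration.

      def next_difficulty(current_level, linguistic_features,
                          raise_threshold=0.8, lower_threshold=0.4):
          """Return the difficulty level to use for the second set of text content."""
          # Toy reading proficiency value: word accuracy penalized by hesitations.
          proficiency = linguistic_features["accuracy"] - 0.1 * linguistic_features["hesitations"]
          if proficiency >= raise_threshold:
              return current_level + 1            # change criteria satisfied: harder text
          if proficiency < lower_threshold:
              return max(1, current_level - 1)    # change criteria satisfied: easier text
          return current_level                    # criteria not satisfied: keep the level

      print(next_difficulty(3, {"accuracy": 0.90, "hesitations": 0}))   # -> 4
      print(next_difficulty(3, {"accuracy": 0.55, "hesitations": 1}))   # -> 3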
  • Patent number: 11375293
    Abstract: Accommodation for color or visual impairments may be implemented by selective color substitution. A color accommodation module receives an image frame from a host system and generates a color-adapted version of the image frame. The color accommodation module may include a rule based filter that substitutes one or more colors within the image frame with one or more corresponding alternative colors.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: June 28, 2022
    Assignee: Sony Interactive Entertainment Inc.
    Inventors: Naveen Kumar, Justice Adams, Arindam Jati, Masanori Omote
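    A hypothetical rule-based color substitution over an RGB frame, in the spirit of the selective substitution described above; the rule table and tolerance are assumptions.

      import numpy as np

      def substitute_colors(frame, rules, tolerance=30):
          """Replace colors listed in rules ({source_rgb: target_rgb}) within a tolerance."""
          adapted = frame.copy()
          for src, dst in rules.items():
              # Select pixels whose RGB distance to the source color is within tolerance.
              mask = np.linalg.norm(frame.astype(int) - np.array(src), axis=-1) <= tolerance
              adapted[mask] = dst
          return adapted

      frame = np.zeros((2, 2, 3), dtype=np.uint8)
      frame[0, 0] = (255, 0, 0)                                  # a pure red pixel
      adapted = substitute_colors(frame, {(255, 0, 0): (255, 128, 0)})
      print(adapted[0, 0])                                       # -> [255 128   0]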
  • Patent number: 11367447
    Abstract: Aspects of the subject disclosure may include, for example, obtaining a natural language instruction, interpreting the instruction to obtain a machine interpretation, and analyzing the machine interpretation to obtain an intent of the natural language instruction. An actionable command adapted to cause a digital manipulation tool to digitally manipulate a content item is determined according to the intent of the natural language instruction. The actionable command is provided to the digital manipulation tool to obtain the manipulated content item. Other embodiments are disclosed.
    Type: Grant
    Filed: June 9, 2020
    Date of Patent: June 21, 2022
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jackson Jarrell Pair
  • Patent number: 11367448
    Abstract: A method of providing a platform for configuring device-specific speech recognition is provided. The method includes providing a user interface for developers to select a set of at least two acoustic models appropriate for a specific type of a device, receiving, from a developer, a selection of the set of the at least two acoustic models, and configuring a speech recognition system to perform device-specific speech recognition by using one acoustic model selected from the at least two acoustic models of the set.
    Type: Grant
    Filed: April 21, 2021
    Date of Patent: June 21, 2022
    Assignee: SOUNDHOUND, INC.
    Inventors: Keyvan Mohajer, Mehul Patel
  • Patent number: 11368423
    Abstract: A dense passage retrieval machine learning model having a first encoder for resources and a second encoder for messages can automatically match relevant resources to computers or sessions based on analysis of a series of messages of an online chat conversation. Continuous re-training is supported based on feedback from a moderator computer and/or user computers.
    Type: Grant
    Filed: December 29, 2021
    Date of Patent: June 21, 2022
    Assignee: SUPPORTIV INC.
    Inventors: Helena Plater-Zyberk, Pouria Mojabi
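    A sketch of the dual-encoder matching idea from the entry above: resources and the recent message window are embedded separately and compared by inner product. The hashed bag-of-words encoder below is a stand-in for the trained encoders the abstract describes.

      import numpy as np

      def embed(text, dim=64):
          """Toy encoder: hashed bag-of-words vector, L2-normalized."""
          v = np.zeros(dim)
          for token in text.lower().split():
              v[hash(token) % dim] += 1.0
          norm = np.linalg.norm(v)
          return v / norm if norm else v

      def rank_resources(messages, resources, top_k=2):
          query_vec = embed(" ".join(messages[-5:]))     # recent chat window as the query
          scored = [(float(query_vec @ embed(r)), r) for r in resources]
          return sorted(scored, reverse=True)[:top_k]

      chat = ["I can't sleep at night", "work stress keeps me up"]
      docs = ["Guide to sleep hygiene and stress", "Budgeting tips for students"]
      print(rank_resources(chat, docs))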
  • Patent number: 11366971
    Abstract: In one embodiment, a method includes receiving, from a client system associated with a first user, a first audio input. The method includes generating multiple transcriptions corresponding to the first audio input based on multiple automatic speech recognition (ASR) engines. Each ASR engine is associated with a respective domain out of multiple domains. The method includes determining, for each transcription, a combination of one or more intents and one or more slots to be associated with the transcription. The method includes selecting, by a meta-speech engine, one or more combinations of intents and slots from the multiple combinations to be associated with the first user input. The method includes generating a response to the first audio input based on the selected combinations and sending, to the client system, instructions for presenting the response to the first audio input.
    Type: Grant
    Filed: January 13, 2020
    Date of Patent: June 21, 2022
    Assignee: Facebook Technologies, LLC
    Inventors: Fuchun Peng, Jihang Li, Jinsong Yu
  • Patent number: 11367437
    Abstract: There is provided a speech dialog system that includes a first microphone, a second microphone, a processor and a memory. The first microphone captures first audio from a first spatial zone, and produces a first audio signal. The second microphone captures second audio from a second spatial zone, and produces a second audio signal. The processor receives the first audio signal and the second audio signal, and the memory contains instructions that control the processor to perform operations of a speech enhancement module, an automatic speech recognition module, and a speech dialog module that performs a zone-dedicated speech dialog.
    Type: Grant
    Filed: May 30, 2019
    Date of Patent: June 21, 2022
    Assignee: NUANCE COMMUNICATIONS, INC.
    Inventors: Timo Matheja, Markus Buck, Andreas Kirbach, Martin Roessler, Tim Haulick, Julien Premont, Josef Anastasiadis, Rudi Vuerinckx, Christophe Ris, Stijn Verschaeren, Hakan Ari, Dieter Ranz
  • Patent number: 11360737
    Abstract: The present disclosure discloses a method and apparatus for providing a speech service. A specific embodiment of the method comprises: receiving request information sent by a device, the request information comprising first event information and speech information, the first event information used for indicating a first event occurring on the device when the device sends the request information, wherein the first event information comprises speech input event information used for instructing a user to input the speech information; generating response information comprising an operation instruction for a targeted device on the basis of the first event information and the speech information; and sending the response information to the targeted device for the targeted device to perform an operation indicated by the operation instruction. The embodiment improves the efficiency of providing a speech service.
    Type: Grant
    Filed: July 5, 2018
    Date of Patent: June 14, 2022
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD
    Inventors: Jianliang Zhou, Guanghao Shen, Ruisheng Wu
  • Patent number: 11361765
    Abstract: Disclosed is a multi-device control method including: performing a voice recognition operation on a voice command generated from a sound source; identifying distances between each of the plurality of devices and the sound source; assigning response rankings to the devices by combining a context-specific correction score of each device corresponding to the voice command and the distances; and selecting a device to respond to the voice command from among the devices according to the response rankings.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: June 14, 2022
    Assignee: LG ELECTRONICS INC.
    Inventor: Jisoo Park
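    An illustrative ranking rule combining a context-specific correction score with distance to the sound source, as the abstract above describes; the linear weighting is an assumption.

      def choose_responding_device(devices, distance_weight=0.5):
          """devices: list of dicts with 'name', 'distance_m', and 'context_score' in [0, 1]."""
          def rank(dev):
              # Closer devices and devices better suited to the command rank higher.
              return dev["context_score"] - distance_weight * dev["distance_m"]
          ranked = sorted(devices, key=rank, reverse=True)
          return ranked[0]["name"], ranked

      devices = [
          {"name": "kitchen_speaker", "distance_m": 1.2, "context_score": 0.4},
          {"name": "living_room_tv",  "distance_m": 3.5, "context_score": 0.9},
      ]
      print(choose_responding_device(devices)[0])        # -> kitchen_speaker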
  • Patent number: 11355109
    Abstract: Embodiments of the present disclosure provide a method and apparatus for man-machine conversation, and an electronic device. The method includes: outputting question information to a user based on a first task of a first conversation scenario; judging, in response to receiving reply information returned by the user, whether to trigger a second conversation scenario based on the reply information; generating, in response to determining the second conversation scenario being triggered based on the reply information, response information corresponding to the reply information based on the second conversation scenario; and outputting the response information to the user.
    Type: Grant
    Filed: December 10, 2019
    Date of Patent: June 7, 2022
    Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
    Inventors: Xuefeng Lou, Qingwei Huang, Weiwei Wang, Cheng Peng, Xiaojun Zhao
  • Patent number: 11348253
    Abstract: Methods and systems are provided for implementing source separation techniques, and more specifically performing source separation on mixed source single-channel and multi-channel audio signals enhanced by inputting lip motion information from captured image data, including selecting a target speaker facial image from a plurality of facial images captured over a period of interest; computing a motion vector based on facial features of the target speaker facial image; and separating, based on at least the motion vector, audio corresponding to a constituent source from a mixed source audio signal captured over the period of interest. The mixed source audio signal may be captured from single-channel or multi-channel audio capture devices. Separating audio from the audio signal may be performed by a fusion learning model comprising a plurality of learning sub-models. Separating the audio from the audio signal may be performed by a blind source separation (“BSS”) learning model.
    Type: Grant
    Filed: January 9, 2020
    Date of Patent: May 31, 2022
    Assignee: Alibaba Group Holding Limited
    Inventor: Yun Li
  • Patent number: 11347928
    Abstract: Aspects of the invention include detecting and processing sections spanning processed document partitions by caching a document partition. The document partition includes metadata indicating that the document partition is a portion of a whole document. Aspects also include pairing a candidate paragraph from the document partition with a cached paragraph segment and determining, using a coherence model, a probability that the candidate paragraph and the cached paragraph segment constitute a semantically coherent paragraph. Aspects further include discarding the cached paragraph segment and processing the candidate paragraph and the cached paragraph segment separately based on a determination that the probability is less than a threshold level and processing the candidate paragraph and the cached paragraph segment together as a cross-partition paragraph based on a determination that the probability is greater than the threshold level.
    Type: Grant
    Filed: July 27, 2020
    Date of Patent: May 31, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Andrew J Lavery, Igor S. Ramos, Paul Joseph Hake, Scott Carrier
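    A minimal sketch of the cross-partition decision described above: the cached trailing paragraph segment is paired with the first paragraph of the next partition, and a coherence probability decides whether the pieces are processed as one paragraph. The vocabulary-overlap model below is only a placeholder for the coherence model.

      def merge_if_coherent(cached_segment, candidate_paragraph, coherence_model, threshold=0.5):
          """Return one merged paragraph or the two pieces to be processed separately."""
          probability = coherence_model(cached_segment, candidate_paragraph)
          if probability > threshold:
              return [cached_segment + " " + candidate_paragraph]   # cross-partition paragraph
          return [cached_segment, candidate_paragraph]              # process separately

      def toy_coherence(a, b):
          wa, wb = set(a.lower().split()), set(b.lower().split())
          return len(wa & wb) / max(1, len(wa | wb))

      print(merge_if_coherent("the renewal terms of the contract",
                              "the contract and the renewal terms", toy_coherence))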
  • Patent number: 11348571
    Abstract: The present disclosure provides methods, computing devices, and storage media for generating a training corpus. The method includes: mining out pieces of data from user behavior logs associated with a target application, each piece of data including a first behavior log and a second behavior log, the first behavior log including a user speech and a corresponding speech recognition result, the second behavior log belonging to the same user as the first behavior log and time-dependent with the first behavior log; and determining the user speech and the corresponding speech recognition result in each piece of data as a positive feedback sample or a negative feedback sample, based on the first behavior log and the second behavior log.
    Type: Grant
    Filed: March 5, 2020
    Date of Patent: May 31, 2022
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Shiqiang Ding, Jizhou Huang, Zhongwei Jiang, Wentao Ma
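    A hypothetical labeling rule in the spirit of the abstract above: if the user's time-dependent follow-up log suggests the recognition result was acted on, the pair becomes a positive feedback sample; a quick typed correction marks it negative. Field names and the retry window are assumptions.

      def label_feedback(first_log, second_log, retry_window_s=30):
          """first_log: {'speech_id', 'asr_result', 'timestamp'}; second_log: a later log from the same user."""
          sample = {"speech_id": first_log["speech_id"], "text": first_log["asr_result"]}
          elapsed = second_log["timestamp"] - first_log["timestamp"]
          if second_log.get("action") == "clicked_result":
              sample["label"] = "positive"
          elif second_log.get("action") == "typed_query" and elapsed < retry_window_s:
              sample["label"] = "negative"     # user corrected the query by typing soon after
          else:
              sample["label"] = "unlabeled"
          return sample

      print(label_feedback(
          {"speech_id": 1, "asr_result": "coffee near me", "timestamp": 100},
          {"timestamp": 110, "action": "typed_query"}))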
  • Patent number: 11328730
    Abstract: A system and method are disclosed for generating a teleconference space for two or more communication devices using a computer coupled with a database and comprising a processor and memory. The computer generates a teleconference space and transmits requests to join the teleconference space to the two or more communication devices. The computer stores in memory identification information, and audiovisual data associated with one or more users, for each of the two or more communication devices. The computer stores audio transcription data, transmitted to the computer by each of the two or more communication devices and associated with one or more communication device users, in the computer memory. The computer merges the audio transcription data from each of the two or more communication devices into a master audio transcript, and transmits the master audio transcript to each of the two or more communication devices.
    Type: Grant
    Filed: April 29, 2020
    Date of Patent: May 10, 2022
    Assignee: Nextiva, Inc.
    Inventors: Tomas Gorny, Jean-Baptiste Martinoli, Tracy Conrad, Lukas Gorny
  • Patent number: 11328719
    Abstract: An electronic device and a method for controlling the electronic device are provided. The electronic device includes a microphone, a memory configured to include at least one instruction, and a processor configured to execute the at least one instruction. The processor is configured to control the electronic device to perform voice recognition for an inquiry based on receiving input of a user inquiry through the microphone, and acquire a text for the inquiry, generate a plurality of inquiries for acquiring response data for the inquiry from a plurality of databases using a relation graph indicating a relation between the acquired text and data stored in the plurality of databases, acquire response data corresponding to each of the plurality of inquiries from each of the plurality of databases, and generate a response for the inquiry based on the response data acquired from each of the plurality of databases and output the response.
    Type: Grant
    Filed: January 23, 2020
    Date of Patent: May 10, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jaehun Lee, Yunsu Lee, Taeho Hwang, Seungsoo Kang, Jiyoung Kang, Sejin Kwak
  • Patent number: 11328017
    Abstract: The present teaching relates to generating a conversational agent. In one example, a plurality of input utterances may be received from a developer. A paraphrase model is obtained. The paraphrase model is generated based on machine translation. For each of the plurality of input utterances, one or more paraphrases of the input utterance are generated based on the paraphrase model. For each of the plurality of input utterances, at least one of the one or more paraphrases is selected based on an instruction from the developer to generate selected paraphrases. The conversational agent is generated based on the plurality of input utterances and the selected paraphrases.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: May 10, 2022
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Ankur Gupta, Timothy Daly, Tularam Ban
  • Patent number: 11327652
    Abstract: A keyboard is described that determines, using a first decoder and based on a selection of keys of a graphical keyboard, text. Responsive to determining that a characteristic of the text satisfies a threshold, a model of the keyboard identifies the target language of the text, and determines whether the target language is different than a language associated with the first decoder. If the target language of the text is not different than the language associated with the first decoder, the keyboard outputs, for display, an indication of first candidate words determined by the first decoder from the text. If the target language of the text is different: the keyboard enables a second decoder, where a language associated with the second decoder matches the target language of the text, and outputs, for display, an indication of second candidate words determined by the second decoder from the text.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: May 10, 2022
    Assignee: Google LLC
    Inventors: Ouais Alsharif, Peter Ciccotto, Francoise Beaufays, Dragan Zivkovic
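    An illustrative flow for the decoder-switching behavior described above; the language identifier, the decoders, and the length threshold are placeholders rather than the keyboard's actual components.

      from dataclasses import dataclass
      from typing import Callable, Dict, List

      @dataclass
      class Decoder:
          language: str
          candidates: Callable[[str], List[str]]

      def decode_with_language_switch(text, active, decoders: Dict[str, Decoder],
                                      identify_language, min_chars=12):
          """Return the (possibly switched) decoder and its candidate words for the text."""
          if len(text) >= min_chars:                   # characteristic of the text satisfies a threshold
              target = identify_language(text)
              if target != active.language and target in decoders:
                  active = decoders[target]            # enable the decoder matching the target language
          return active, active.candidates(text)

      decoders = {
          "en": Decoder("en", lambda t: [t + " (en suggestion)"]),
          "es": Decoder("es", lambda t: [t + " (sugerencia es)"]),
      }
      active, words = decode_with_language_switch(
          "donde esta la biblioteca", decoders["en"], decoders,
          identify_language=lambda t: "es" if "donde" in t else "en")
      print(active.language, words)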
  • Patent number: 11302330
    Abstract: A method of disambiguating user queries in a multi-turn dialogue including a set of user utterances includes using a predefined language model to recognize an ambiguous entity in an unresolved user utterance from the multi-turn dialogue. The method further includes outputting a clarifying question about the ambiguous entity, and receiving a clarifying user utterance. The method further includes identifying a disambiguating entity in the clarifying user utterance. The method further includes rewriting the unresolved user utterance as a rewritten utterance that replaces the ambiguous entity with the disambiguating entity, and outputting the rewritten utterance to one or more query answering machines.
    Type: Grant
    Filed: June 3, 2019
    Date of Patent: April 12, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jiayin Ge, Zicheng Huang, Guihong Cao
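    A minimal sketch of the rewrite step from the entry above: once the clarifying utterance supplies a disambiguating entity, the unresolved utterance is rewritten before being sent to the query answering machines. Entity detection is assumed to have already happened.

      def rewrite_utterance(unresolved, ambiguous_entity, disambiguating_entity):
          """Replace the ambiguous mention with the entity taken from the clarifying answer."""
          return unresolved.replace(ambiguous_entity, disambiguating_entity)

      # Dialogue: "When does it open?" -> "Which place do you mean?" -> "The library."
      print(rewrite_utterance("When does it open?", "it", "the library"))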
  • Patent number: 11295735
    Abstract: This disclosure describes, in part, techniques implemented by a speech-processing system for providing an extensible skill-interface component to facilitate voice-control of third-party developer devices. The speech-processing system may provide the skill-interface component to third-party device developers using a web-based portal through which the skill interfaces may be created to voice-enable third-party devices having unique capabilities. For instance, a skill interface may define events, such as voice commands of a user, which map to directives configured to cause the third-party devices to perform an operation that is responsive to the event. In this way, the speech-processing system may receive audio data representing a voice command of a user in an environment of a third-party device, and return a directive to cause the third-party device to perform an operation responsive to the voice command.
    Type: Grant
    Filed: December 13, 2017
    Date of Patent: April 5, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Raphael Adam Anuar, Zoe Adams, Shah Samir Pravinchandra, Idris Abbas Saylawala
  • Patent number: 11288038
    Abstract: A system and method for dictation using a peripheral device includes a voice recognition mouse. The voice recognition mouse includes a microphone, a first button, a processor coupled to the microphone and the first button, and a memory coupled to the processor. The memory stores instructions that, when executed by the processor, cause the processor to detect actuation of the first button and in response to detecting actuation of the first button, invoke the microphone for capturing audio speech from a user. The captured audio speech is streamed to a first module. The first module is configured to invoke a second module for converting the captured audio speech into text and forward the text to the first module for providing to an application expecting the text, the application being configured to display the text on a display device.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: March 29, 2022
    Inventor: John Holst, III
  • Patent number: 11289095
    Abstract: A method and system are provided for translating speech to text, the speech having been received by a client device. A user utterance corresponding to the speech is received. A first predicted text corresponding to the user utterance and a first confidence score corresponding to the first predicted text are determined using a local graph. The user utterance is transmitted to a server. A second predicted text corresponding to the user utterance and a second confidence score corresponding to the second predicted text are received from the server. If the first confidence score is higher than the second confidence score, the first predicted text is output.
    Type: Grant
    Filed: September 28, 2020
    Date of Patent: March 29, 2022
    Assignee: YANDEX EUROPE AG
    Inventor: Pavel Aleksandrovich Zelenko
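    A minimal sketch of the selection rule in the abstract above: prefer the on-device prediction when its confidence beats the server's. The function and its example values are illustrative.

      def choose_transcription(local_text, local_conf, server_text, server_conf):
          """Return the predicted text to output and where it came from."""
          if local_conf > server_conf:
              return local_text, "local"
          return server_text, "server"

      print(choose_transcription("turn on the lights", 0.87, "turn off the lights", 0.62))
      # -> ('turn on the lights', 'local')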
  • Patent number: 11281707
    Abstract: A system outputs information generated by summarizing contents of voices and images as texts. A CPU of the system performs, according to a program stored in a memory, recording voice data and captured image data, generating first text information by performing speech recognition on the acquired voice data, generating second text information by performing character recognition on the acquired image data, and generating summary text information smaller in the number of characters than the first text information and the second text information, based on the first text information and the second text information, according to a predetermined criterion.
    Type: Grant
    Filed: November 26, 2018
    Date of Patent: March 22, 2022
    Assignee: CANON KABUSHIKI KAISHA
    Inventor: Motoki Ikeda
  • Patent number: 11270686
    Abstract: A model-pair is selected to recognize spoken words in a speech signal generated from a speech, which includes an acoustic model and a language model. A degree of disjointedness between the acoustic model and the language model is computed relative to the speech by comparing a first recognition output produced from the acoustic model and a second recognition output produced from the language model. When the acoustic model incorrectly recognizes a portion of the speech signal as a first word and the language model correctly recognizes the portion of the speech signal as a second word, a textual representation of the second word is determined and associated with a set of sound descriptors to generate a training speech pattern. Using the training speech pattern, the acoustic model is trained to recognize the portion of the speech signal as the second word.
    Type: Grant
    Filed: March 28, 2017
    Date of Patent: March 8, 2022
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, John M. Ganci, Jr., Stephen C. Hammer, Craig M. Trim
  • Patent number: 11270712
    Abstract: A system and method for decorrelating audio data. A method includes determining a plurality of propagation vectors for each of a plurality of sound sources based on audio data captured by a plurality of sound capturing devices and a location of each of the plurality of sound sources, wherein the plurality of sound sources and the plurality of sound capturing devices are deployed in a space, wherein the audio data is captured by the plurality of sound capturing devices based on sounds emitted by the plurality of sound sources in the space; determining a plurality of beam former outputs, wherein each beam former output is determined for one of the plurality of sound sources; determining a decoupling matrix based on the plurality of beam former outputs and the propagation vectors; and decorrelating audio data captured by the plurality of sound capturing devices based on the decoupling matrix.
    Type: Grant
    Filed: August 26, 2020
    Date of Patent: March 8, 2022
    Assignee: Insoundz Ltd.
    Inventors: Ron Ziv, Tomer Goshen, Emil Winebrand, Yadin Aharoni
  • Patent number: 11270145
    Abstract: Approaches for interpretable counting for visual question answering include a digital image processor, a language processor, a scorer, and a counter. The digital image processor identifies objects in an image, maps the identified objects into an embedding space, generates bounding boxes for each of the identified objects, and outputs the embedded objects paired with their bounding boxes. The language processor embeds a question into the embedding space. The scorer determines scores for the identified objects. Each respective score determines how well a corresponding one of the identified objects is responsive to the question. The counter determines a count of the objects in the digital image that are responsive to the question based on the scores. The count and a corresponding bounding box for each object included in the count are output. In some embodiments, the counter determines the count interactively based on interactions between counted and uncounted objects.
    Type: Grant
    Filed: February 4, 2020
    Date of Patent: March 8, 2022
    Assignee: salesforce.com, inc.
    Inventors: Alexander Richard Trott, Caiming Xiong, Richard Socher
  • Patent number: 11264034
    Abstract: A voice identification method, device, apparatus, and a storage medium are provided. The method includes: receiving voice data; and performing a voice identification on the voice data, to obtain first text data associated with the voice data; determining common text data in a preset fixed data table, wherein a similarity between a pronunciation of the determined common text data and a pronunciation of the first text data meets a preset condition, wherein the determined common text data is a voice identification result with an occurrence number larger than a first preset threshold; and replacing the first text data with the determined common text data.
    Type: Grant
    Filed: February 26, 2020
    Date of Patent: March 1, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd
    Inventors: Ye Song, Long Zhang, Pengpeng Jie
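    A hypothetical post-processing step matching the abstract above: the first text data is replaced by a frequent ("common") result whose pronunciation is close enough. Character-level similarity stands in for the pronunciation comparison, and the occurrence threshold is an assumed value.

      from difflib import SequenceMatcher

      def correct_with_common_text(first_text, common_table, occurrence_threshold=100,
                                   similarity_threshold=0.8):
          """common_table maps common text data to its occurrence count."""
          best, best_sim = None, 0.0
          for common_text, count in common_table.items():
              if count < occurrence_threshold:          # skip results that are not common enough
                  continue
              sim = SequenceMatcher(None, first_text, common_text).ratio()
              if sim > best_sim:
                  best, best_sim = common_text, sim
          return best if best is not None and best_sim >= similarity_threshold else first_text

      table = {"navigate to the airport": 540, "play some jazz": 220}
      print(correct_with_common_text("navigate to the air port", table))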
  • Patent number: 11257493
    Abstract: Systems and methods for processing speech are described. In certain examples, image data is used to generate visual feature tensors and audio data is used to generate audio feature tensors. The visual feature tensors and the audio feature tensors are used by a linguistic model to determine linguistic features that are usable to parse an utterance of a user. The generation of the feature tensors may be jointly configured with the linguistic model. Systems may be provided in a client-server architecture.
    Type: Grant
    Filed: July 11, 2019
    Date of Patent: February 22, 2022
    Assignee: SoundHound, Inc.
    Inventors: Cristina Vasconcelos, Zili Li
  • Patent number: 11257041
    Abstract: A processing device is to: identify, using digital interview data of interviewees captured during interviews, a subset of the interviewees that have a disability; label a first group of the interviewees as disabled and a second group of the interviewees as not disabled with reference to the disability; identify features from the digital interview data for the first group that correlate with the disability; formulate a digital fingerprint of the features that identifies how the first group differs from the second group with reference to the disability; map the digital fingerprint of the features onto a dataset of an interviewee belonging to the second group of the interviewees, to generate a mapped dataset; and determine, from the mapped dataset, effects of the digital fingerprint on a job performance score for the interviewee.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: February 22, 2022
    Assignee: HireVue, Inc.
    Inventors: Loren Larsen, Keith Warnick, Lindsey Zuloaga, Caleb Rottman
  • Patent number: 11250860
    Abstract: This speech processing device is provided with: a contribution degree estimation means which calculates a contribution degree representing a quality of a segment of the speech signal; and a speaker feature calculation means which calculates a feature from the speech signal, for recognizing attribute information of the speech signal, using the contribution degree as a weight of the segment of the speech signal.
    Type: Grant
    Filed: March 7, 2017
    Date of Patent: February 15, 2022
    Assignee: NEC CORPORATION
    Inventors: Hitoshi Yamamoto, Takafumi Koshinaka
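    A sketch of the weighting idea in the entry above: per-segment embeddings are pooled into one speaker feature with weights given by the estimated contribution degree (segment quality). The embeddings and quality scores below are stand-ins for the model outputs.

      import numpy as np

      def weighted_speaker_feature(segment_embeddings, contribution_degrees):
          """Pool per-segment embeddings into one utterance-level speaker feature."""
          return np.average(np.asarray(segment_embeddings), axis=0,
                            weights=np.asarray(contribution_degrees, dtype=float))

      segments = [np.array([0.2, 0.9]), np.array([0.8, 0.1]), np.array([0.5, 0.5])]
      quality = [0.9, 0.1, 0.5]          # the noisy middle segment contributes little
      print(weighted_speaker_feature(segments, quality))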
  • Patent number: 11250855
    Abstract: A method, computer program product, and computing system for monitoring a plurality of conversations within a monitored space to generate a conversation data set; processing the conversation data set using machine learning to: define a system-directed command for an ACI system, and associate one or more conversational contexts with the system-directed command; detecting the occurrence of a specific conversational context within the monitored space, wherein the specific conversational context is included in the one or more conversational contexts associated with the system-directed command; and executing, in whole or in part, functionality associated with the system-directed command in response to detecting the occurrence of the specific conversational context without requiring the utterance of the system-directed command and/or a wake-up word/phrase.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: February 15, 2022
    Assignee: NUANCE COMMUNICATIONS, INC.
    Inventors: Paul Joseph Vozila, Neal Snider
  • Patent number: 11250872
    Abstract: Method, apparatus, and computer program product are provided for customizing an automatic closed captioning system. In some embodiments, at a data use (DU) location, an automatic closed captioning system that includes a base model is provided, search criteria are defined for requesting data from one or more data collection (DC) locations, a search request based on the search criteria is sent to the one or more DC locations, relevant closed caption data from the one or more DC locations are received responsive to the search request, the received relevant closed caption data are processed by computing a confidence score for each of a plurality of data sub-sets of the received relevant closed caption data and selecting one or more of the data sub-sets based on the confidence scores, and the automatic closed captioning system is customized by using the selected one or more data sub-sets to train the base model.
    Type: Grant
    Filed: December 14, 2019
    Date of Patent: February 15, 2022
    Assignee: International Business Machines Corporation
    Inventors: Samuel Thomas, Yinghui Huang, Masayuki Suzuki, Zoltan Tueske, Laurence P. Sansone, Michael A. Picheny
  • Patent number: 11238884
    Abstract: A system, method and non-transitory computer readable medium for providing call quality driven communication management wherein an audio data stream of a communication session having one or more utterances is processed to generate a transcript of the communication session. The generated transcript is analyzed to determine a quality of the audio data stream, and one or more quality improvement measures are determined when one or more audio artifacts are determined to be present in the audio data stream.
    Type: Grant
    Filed: October 4, 2019
    Date of Patent: February 1, 2022
    Assignee: RED BOX RECORDERS LIMITED
    Inventors: Simon Jolly, Tony Commander, Kyrylo Zotkin
  • Patent number: 11238851
    Abstract: Technology of the disclosure may facilitate user discovery of various voice-based action queries that can be spoken to initiate computer-based actions, such as voice-based action queries that can be provided as spoken input to a computing device to initiate computer-based actions that are particularized to content being viewed or otherwise consumed by the user on the computing device. Some implementations are generally directed to determining, in view of content recently viewed by a user on a computing device, at least one suggested voice-based action query for presentation via the computing device. Some implementations are additionally or alternatively generally directed to receiving at least one suggested voice-based action query at a computing device and providing the suggested voice-based action query as a suggestion in response to input to initiate providing of a voice-based query via the computing device.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: February 1, 2022
    Assignee: Google LLC
    Inventors: Vikram Aggarwal, Pravir Kumar Gupta
  • Patent number: 11227597
    Abstract: An electronic device for performing voice recognition and a method for controlling the electronic device are provided.
    Type: Grant
    Filed: January 21, 2020
    Date of Patent: January 18, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hyunkook Cho, Taeyoung Kim, Sunghee Cho
  • Patent number: 11217234
    Abstract: Disclosed herein is a method for intelligently recognizing voice by a voice recognizing apparatus in various noise environments. The method includes acquiring a first noise level for an environment in which the voice recognizing apparatus is located, inputting the first noise level into a previously learned noise-sensitivity model to acquire a first optimum sensitivity, and recognizing a user's voice based on the first optimum sensitivity. The noise-sensitivity model is learned in a plurality of noise environments acquiring different noise levels, so that it is possible to accurately acquire an optimum sensitivity corresponding to a noise level depending on an operating state when an IoT device (voice recognizing apparatus) is in operation.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: January 4, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Jaewoong Jeong, Youngman Kim, Sangjun Oh, Kyuho Lee, Seunghyun Hwang
  • Patent number: 11217255
    Abstract: Systems and processes for operating an intelligent automated assistant to provide extension of digital assistant services are provided. An example method includes, at an electronic device having one or more processors, receiving, from a first user, a first speech input representing a user request. The method further includes obtaining an identity of the first user; and in accordance with the user identity, providing a representation of the user request to at least one of a second electronic device or a third electronic device. The method further includes receiving, based on a determination of whether the second electronic device or the third electronic device, or both, is to provide the response to the first electronic device, the response to the user request from the second electronic device or the third electronic device. The method further includes providing a representation of the response to the first user.
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: January 4, 2022
    Assignee: Apple Inc.
    Inventors: Yoon Kim, Charles Srisuwananukorn, David A. Carson, Thomas R. Gruber, Justin G. Binder