Speech Recognition Using Nonacoustical Features, E.g., Position Of The Lips, Etc. (epo) Patents (Class 704/E15.041)
  • Patent number: 11601750
    Abstract: According to examples, an apparatus may include a processor and a non-transitory computer-readable medium storing instructions that the processor may execute to access an audio signal of a user's speech captured by a microphone while the microphone is in a muted state. The processor may also execute the instructions to analyze the spectral or frequency content of the accessed audio signal to determine whether the user was facing the microphone while the user spoke. In addition, based on a determination that the user was facing the microphone while the user spoke, the processor may execute the instructions to unmute the microphone.
    Type: Grant
    Filed: December 17, 2018
    Date of Patent: March 7, 2023
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Srikanth Kuthuru, Sunil Bharitkar
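    The facing-direction idea in the abstract above rests on a simple acoustic fact: speech captured off-axis loses high-frequency energy. A toy sketch of that check follows; it is not the patented method, and the first-difference filter (a crude high-pass) and threshold values are illustrative assumptions only.

    ```python
    import math

    def high_band_ratio(samples):
        """Crude spectral-tilt measure: energy of the first difference (a
        simple high-pass filter) relative to total energy. Off-axis speech
        is low-pass filtered, so a low ratio suggests the talker faced away."""
        total = sum(s * s for s in samples)
        if total == 0.0:
            return 0.0
        diff = sum((samples[i] - samples[i - 1]) ** 2 for i in range(1, len(samples)))
        return diff / total

    def likely_facing_mic(samples, threshold=0.5):
        # threshold is a hypothetical tuning value, not from the patent
        return high_band_ratio(samples) >= threshold

    # toy signals at a 16 kHz sample rate: a high-frequency-rich tone vs. a low tone
    facing = [math.sin(2 * math.pi * 6000 * n / 16000) for n in range(1600)]
    away = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
    ```

    A real implementation would compare band energies against a calibrated speech model rather than a fixed ratio.
    
    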
  • Patent number: 11561760
    Abstract: An electronic device for changing a voice of a personal assistant function, and a method therefor, are provided. The electronic device includes a display, a transceiver, a processor, and a memory for storing commands executable by the processor. The processor is configured to: based on receiving a user command requesting acquisition of a voice data feature of a person included in media content displayed on the display, control the display to display information on the person; based on receiving a user input selecting one item of the displayed person information, acquire voice data corresponding to an utterance of the person related to the selected information; acquire a voice data feature from the acquired voice data; and control the transceiver to transmit the acquired voice data feature to a server.
    Type: Grant
    Filed: June 10, 2019
    Date of Patent: January 24, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jaehong Kim, Sangkyung Lee, Jihak Jung
  • Patent number: 11551681
    Abstract: Devices and techniques are generally described for a speech processing routing architecture. In various examples, first data comprising a first feature definition is received. The first feature definition may include a first indication of first source data and first instructions for generating feature data using the first source data. In various examples, the feature data may be generated according to the first feature definition. In some examples, a speech processing system may receive a first request to process a first utterance. The feature data may be retrieved from a non-transitory computer-readable memory. The speech processing system may determine a first skill for processing the first utterance based at least in part on the feature data.
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: January 10, 2023
    Inventors: Rajesh Kumar Pandey, Ruhi Sarikaya, Shubham Katiyar, Arun Kumar Thenappan, Isaac Joseph Madwed, Jihwan Lee, David Thomas, Julia Kennedy Nemer, Mohamed Farouk AbdelHady, Joe Pemberton, Young-Bum Kim, Arima Vu Ram Thayumanavar, Wangyao Ge
  • Patent number: 11516570
    Abstract: Implementations of the subject matter described herein provide a silent voice input solution that goes unnoticed by people nearby. Compared with conventional voice input solutions based on normal speech or whispering, the proposed "silent" voice input method is performed using ingressive voice during the user's breathing-in process. By placing the apparatus very close to the user's mouth, with an ultra-small gap formed between the microphone and the apparatus, the proposed silent voice input solution realizes very small voice leakage, thereby allowing the user to use ultra-low-voice speech input in public and mobile situations without disturbing surrounding people.
    Type: Grant
    Filed: July 1, 2021
    Date of Patent: November 29, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Masaaki Fukumoto
  • Patent number: 11488596
    Abstract: A method for recording audio content in a group conversation among a plurality of members includes: controlling an image capturing device to continuously capture images of the members; executing an image processing procedure on the images of the members to determine whether a specific gesture is detected; when the determination is affirmative, controlling an audio recording device to activate and perform directional audio collection with respect to a direction that is associated with the specific gesture to record audio data; and controlling a data storage to store the audio data and a time stamp associated with the audio data as an entry of conversation record.
    Type: Grant
    Filed: April 27, 2020
    Date of Patent: November 1, 2022
    Inventor: Hsiao-Han Chen
  • Patent number: 10878226
    Abstract: In an approach, a computer determines, based at least in part on a video of an attendee of a video conference, a first sentiment of the attendee, wherein the first sentiment includes at least a sentiment from a sentiment analysis of one or more facial expressions of the attendee and a sentiment from a sentiment analysis of a plurality of the attendee's spoken words. The approach includes the computer receiving an indication of attendee activity in at least a first application on computing devices accessed by the attendee, and determining whether the first sentiment of the attendee is related to the video conference based, in part, on the attendee activity in at least the first application. Responsive to determining that the first sentiment of the attendee is not related to the video conference, the computer discards the first sentiment that is unrelated to the video conference.
    Type: Grant
    Filed: March 8, 2017
    Date of Patent: December 29, 2020
    Assignee: International Business Machines Corporation
    Inventors: Hernan A. Cunico, Asima Silva
  • Patent number: 9898170
    Abstract: An approach is provided for automatically generating user-specific interaction modes for processing question and answers at the information handling system by receiving a question from a user, extracting user context parameters identifying a usage scenario for the user, identifying first input and output presentation modes for the user based on the extracted user context parameters, monitoring user interaction with the system in relation to the question, and adjusting the first input and output presentation modes based on the extracted user context parameters and detected user interaction with the system.
    Type: Grant
    Filed: December 10, 2014
    Date of Patent: February 20, 2018
    Assignee: International Business Machines Corporation
    Inventors: John P. Bufe, Donna K. Byron, Mary D. Swift, Timothy Winkler
  • Patent number: 8635066
    Abstract: Methods, systems, and articles are described herein for receiving an audio input and a facial image sequence for a period of time, in which the audio input includes speech input from multiple speakers. Based on the received facial image sequence, the speech input of a particular speaker is extracted from the audio input.
    Type: Grant
    Filed: April 14, 2010
    Date of Patent: January 21, 2014
    Assignee: T-Mobile USA, Inc.
    Inventor: Andrew R. Morrison
  • Publication number: 20110099013
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance, and identifying words in the captured text that are relevant to the recorded utterance. The method further includes adding the identified words to a dynamic language model, and recognizing the recorded utterance using the dynamic language model. The recorded utterance can be a spoken dialog. A time stamp can be assigned to each identified word. The method can include adding identified words to and/or removing identified words from the dynamic language model based on their respective time stamps. A screen scraper can capture text from the device display associated with the recorded utterance. The device display can contain customer service data.
    Type: Application
    Filed: October 23, 2009
    Publication date: April 28, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Dan MELAMED, Srinivas Bangalore, Michael Johnston
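    The bookkeeping described in the abstract above (add screen-scraped words to a dynamic language model with a time stamp, and remove them when stale) can be sketched as a small vocabulary store. The class name, time-to-live value, and sample words are illustrative assumptions, not the disclosed implementation.

    ```python
    import time

    class DynamicLexicon:
        """Toy dynamic-vocabulary store: words captured from the display are
        added with a time stamp and expire after a time-to-live, so the
        recognizer is only biased toward recent on-screen context."""
        def __init__(self, ttl_seconds=300.0):
            self.ttl = ttl_seconds
            self._words = {}  # word -> last-seen timestamp

        def add(self, words, now=None):
            now = time.time() if now is None else now
            for w in words:
                self._words[w.lower()] = now

        def active_words(self, now=None):
            # prune stale entries, then return the words still in scope
            now = time.time() if now is None else now
            self._words = {w: t for w, t in self._words.items() if now - t <= self.ttl}
            return set(self._words)

    lex = DynamicLexicon(ttl_seconds=300)
    lex.add(["invoice", "refund"], now=0.0)
    lex.add(["shipping"], now=200.0)
    ```

    In a real system the surviving words would be compiled into the recognizer's language model rather than merely returned as a set.
    
    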
  • Publication number: 20100250250
    Abstract: A hybrid text generator is disclosed that generates a hybrid text string from multiple text strings that are produced from an audio input by multiple automated speech recognition systems. The hybrid text generator receives metadata that describes a time-location that each word from the multiple text strings is located in the audio input. The hybrid text generator matches words between the multiple text strings using the metadata and generates a hybrid text string that includes the matched words. The hybrid text generator utilizes confidence scores associated with words that do not match between the multiple text strings to determine whether to add an unmatched word to the hybrid text string.
    Type: Application
    Filed: March 29, 2010
    Publication date: September 30, 2010
    Inventor: Jonathan Wiggs
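    The time-based word matching and confidence fallback described in the abstract above can be sketched roughly as follows. The tuple format, time tolerance, and greedy alignment are simplifying assumptions, not the disclosed implementation.

    ```python
    def merge_hypotheses(hyp_a, hyp_b, tolerance=0.2):
        """Toy merge of two recognizer outputs. Each hypothesis is a list of
        (word, start_time_sec, confidence). Time-aligned words that agree are
        kept; disagreements fall back to the higher confidence score."""
        if not hyp_b:
            return [word for word, _, _ in hyp_a]
        merged, j = [], 0
        for word_a, t_a, conf_a in hyp_a:
            # advance to the b entry closest in time to this a entry
            while j + 1 < len(hyp_b) and abs(hyp_b[j + 1][1] - t_a) < abs(hyp_b[j][1] - t_a):
                j += 1
            word_b, t_b, conf_b = hyp_b[j]
            if abs(t_b - t_a) <= tolerance:
                if word_a == word_b:
                    merged.append(word_a)                       # recognizers agree
                else:
                    merged.append(word_a if conf_a >= conf_b else word_b)
            else:
                merged.append(word_a)                           # no time-aligned rival
        return merged

    a = [("play", 0.0, 0.9), ("some", 0.5, 0.4), ("jazz", 1.0, 0.8)]
    b = [("play", 0.0, 0.8), ("sum", 0.5, 0.7), ("jazz", 1.0, 0.9)]
    ```

    Here the two recognizers agree on the first and last words, and the middle word is resolved in favor of the higher-confidence "sum".
    
    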
  • Publication number: 20100185447
    Abstract: Embodiments are provided for selecting and utilizing multiple recognizers to process an utterance based on a markup language document. The markup language document and an utterance are received in a computing device. One or more recognizers are selected from among the multiple recognizers for returning a results set for the utterance based on markup language in the markup language document. The results set is received from the one or more selected recognizers in a format determined by a processing method specified in the markup language document. An event is then executed on the computing device in response to receiving the results set.
    Type: Application
    Filed: January 22, 2009
    Publication date: July 22, 2010
    Applicant: Microsoft Corporation
    Inventors: Andrew K. Krumel, Pierre-Alexandre F. Masse, Joseph A. Ruff
  • Publication number: 20090326941
    Abstract: A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors.
    Type: Application
    Filed: September 4, 2009
    Publication date: December 31, 2009
    Inventor: Mark Catchpole
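    The shared-prefix structure behind the lexical trees in the abstract above can be illustrated with a character-level trie. This is only an analogy: the patent models words over phoneme components and scores them with dedicated hardware processors, whereas this sketch just matches characters.

    ```python
    def build_lexical_tree(words):
        """Words with a common prefix share the same initial nodes, so each
        subtree can be searched independently (and hence in parallel)."""
        root = {}
        for word in words:
            node = root
            for ch in word:
                node = node.setdefault(ch, {})
            node["$"] = word  # word-end marker stores the completed word
        return root

    def words_with_prefix(tree, prefix):
        node = tree
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        # collect every completed word below this node
        found, stack = [], [node]
        while stack:
            n = stack.pop()
            for key, child in n.items():
                if key == "$":
                    found.append(child)
                else:
                    stack.append(child)
        return sorted(found)

    tree = build_lexical_tree(["card", "care", "car", "dog"])
    ```

    "car", "card", and "care" share the nodes c-a-r, which is the storage saving that motivates lexical trees in recognizers.
    
    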
  • Publication number: 20090171662
    Abstract: The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with larger domain size, scarce training data, and noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method which combines linguistic and statistical information to represent the information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical.
    Type: Application
    Filed: December 27, 2007
    Publication date: July 2, 2009
    Applicant: SEHDA, INC.
    Inventors: Jun Huang, Yookyung Kim, Youssef Billawala, Farzad Ehsani, Demitrios Master
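    The classification step in the abstract above (mapping noisy recognized text to a semantically similar group) can be caricatured with a keyword-overlap scorer. Real predictive features would be syntactic, semantic, and statistical; the group names and keywords here are invented for illustration.

    ```python
    def classify(text, groups):
        """Assign a (possibly noisy) input to the semantic group whose
        keyword set it overlaps most. A stand-in for the abstract's
        feature-based text classifiers."""
        tokens = set(text.lower().split())
        return max(groups, key=lambda g: len(tokens & groups[g]))

    # hypothetical semantic groups for a travel-assistant domain
    groups = {
        "flight_booking": {"book", "flight", "ticket", "depart"},
        "weather": {"weather", "rain", "forecast", "temperature"},
    }
    ```

    Even with a misrecognized word or two, enough surviving keywords usually still pick out the right group, which is the robustness the abstract is after.
    
    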
  • Publication number: 20090048838
    Abstract: Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected both by an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network, allowing the user to save a recording on a server of a service provider. This recording is then retrieved, stored, and used on the server to build a voice database using text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and to customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed.
    Type: Application
    Filed: May 29, 2008
    Publication date: February 19, 2009
    Inventors: Craig F. Campbell, Kevin A. Lenzo, Alexandre D. Cox
  • Publication number: 20090006093
    Abstract: A speaker recognition system generates a codebook store with codebooks representing voice samples of speakers, referred to as trainers. The speaker recognition system may use multiple classifiers and generate a codebook store for each classifier. Each classifier uses a different set of features of a voice sample as its features. A classifier inputs a voice sample of a person and tries to authenticate or identify the person. A classifier generates a sequence of feature vectors for the input voice sample and then a code vector for that sequence. The classifier uses its codebook store to recognize the person. The speaker recognition system then combines the scores of the classifiers to generate an overall score. If the score satisfies a recognition criterion, the speaker recognition system indicates that the voice sample is from that speaker.
    Type: Application
    Filed: June 29, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventor: Amitava Das
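    The codebook scoring in the abstract above can be sketched with a tiny vector-quantization distortion measure. The 2-D feature vectors and speaker names are invented; a real system would use cepstral features, train the codebooks by clustering, and combine scores across several classifiers.

    ```python
    import math

    def vq_distortion(frames, codebook):
        """Average distance from each feature vector to its nearest code
        vector; lower distortion means the voice better matches that
        speaker's codebook."""
        return sum(min(math.dist(f, c) for c in codebook) for f in frames) / len(frames)

    def identify_speaker(frames, codebooks):
        # pick the enrolled speaker whose codebook yields the lowest distortion
        return min(codebooks, key=lambda name: vq_distortion(frames, codebooks[name]))

    # hypothetical enrolled codebooks (real code vectors would be learned)
    codebooks = {
        "alice": [(0.0, 0.0), (1.0, 1.0)],
        "bob": [(5.0, 5.0), (6.0, 6.0)],
    }
    sample = [(0.2, 0.1), (0.9, 1.1)]
    ```

    Combining such per-classifier scores into an overall score, as the abstract describes, would simply weight and sum the distortions from each feature set.
    
    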
  • Publication number: 20080300880
    Abstract: This application discloses a multi-lingual output device for output of transactional information for a given customer. The device includes a database for determining what transaction information needs to be outputted, the local language in which the information is to be outputted, and the preferred language of the customer in which the information is to be outputted; and a local transaction subsystem in communication with said database, wherein said local transaction subsystem includes input device receiving means for accepting an input device and output generating means for generating a signal to an output device.
    Type: Application
    Filed: January 22, 2008
    Publication date: December 4, 2008
    Inventor: Lawrence Stephen Gelbman
  • Publication number: 20080103758
    Abstract: A method for language translation of a toolkit menu is provided, which includes receiving, by a Subscriber Identity Module (SIM) toolkit module, the toolkit menu from a SIM card module, determining, by the SIM toolkit module, whether a language of the toolkit menu matches a user-defined language, and translating, by the SIM toolkit module, the language of the toolkit menu into the user-defined language, if the language of the toolkit menu is different from the user-defined language.
    Type: Application
    Filed: October 25, 2007
    Publication date: May 1, 2008
    Inventor: Suraparaju VENKATESWARLU
  • Patent number: RE42868
    Abstract: A method and apparatus accesses a database where entries are linked to at least two sets of patterns. One or more patterns of a first set of patterns are recognized within a received signal. The recognized patterns are used to identify entries and compile a list of patterns in a second set of patterns to which those entries are also linked. The list is then used to recognize a second received signal. The received signals may, for example, be voice signals or signals indicating the origin or destination of the received signals.
    Type: Grant
    Filed: October 25, 1995
    Date of Patent: October 25, 2011
    Assignee: Cisco Technology, Inc.
    Inventors: David J. Attwater, Steven J. Whittaker, Francis J. Scahill, Alison D. Simons
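    The two-set lookup in the reissued patent above (recognize a first-set pattern, then compile the second-set patterns linked to matching entries as the vocabulary for the next recognition pass) can be sketched with a simple linked-entry table. The locality/name example is an assumption drawn loosely from the abstract's origin/destination hint.

    ```python
    def second_pass_vocabulary(recognized, entries):
        """Entries link a first-set pattern (e.g. a recognized locality) to
        second-set patterns (e.g. directory names). Recognizing the first
        narrows the vocabulary used to recognize the second signal."""
        return sorted({name for locality, name in entries if locality == recognized})

    # hypothetical directory entries: (first-set pattern, second-set pattern)
    entries = [("boston", "alice"), ("boston", "bob"), ("austin", "carol")]
    ```

    Shrinking the second-pass vocabulary this way is what makes the follow-up recognition both faster and more accurate.
    
    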