Speech Recognition Using Nonacoustical Features (e.g., Position of the Lips) (EPO) Patents (Class 704/E15.041)
-
Patent number: 11601750. Abstract: According to examples, an apparatus may include a processor and a non-transitory computer-readable medium storing instructions that the processor may execute to access an audio signal of a user's speech captured by a microphone while the microphone is in a muted state. The processor may also execute the instructions to analyze the spectral or frequency content of the accessed audio signal to determine whether the user was facing the microphone while speaking. In addition, based on a determination that the user was facing the microphone while speaking, the processor may execute the instructions to unmute the microphone. Type: Grant. Filed: December 17, 2018. Date of Patent: March 7, 2023. Assignee: Hewlett-Packard Development Company, L.P. Inventors: Srikanth Kuthuru, Sunil Bharitkar
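One plausible reading of this abstract is that the facing decision rests on how much high-frequency energy the microphone picks up, since high-frequency speech is more directional than low-frequency speech. The sketch below illustrates that idea only; the cutoff, threshold, and function names are invented, not taken from the patent.

```python
import numpy as np

def facing_score(signal, sample_rate, cutoff_hz=2000.0):
    """Ratio of spectral energy above cutoff_hz to total energy.

    High-frequency speech energy is directional, so a low ratio
    suggests the speaker was not facing the microphone.
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = spectrum.sum()
    if total == 0:
        return 0.0
    return spectrum[freqs >= cutoff_hz].sum() / total

def should_unmute(signal, sample_rate, threshold=0.2):
    return facing_score(signal, sample_rate) >= threshold

# Synthetic check: a 3 kHz tone has nearly all energy above the cutoff,
# a 200 Hz tone nearly none.
sr = 16000
t = np.arange(sr) / sr
high = np.sin(2 * np.pi * 3000 * t)
low = np.sin(2 * np.pi * 200 * t)
```

A real implementation would work frame by frame on live audio and likely compare the spectrum against a facing/non-facing model rather than a fixed cutoff.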
-
Patent number: 11561760. Abstract: An electronic device for changing a voice of a personal assistant function, and a method therefor, are provided. The electronic device includes a display, a transceiver, a processor, and a memory for storing commands executable by the processor. The processor is configured to: based on receiving a user command requesting acquisition of a voice data feature of a person included in media content displayed on the display, control the display to display information about the person; based on receiving a user input selecting the information about the person, acquire voice data corresponding to an utterance of the person related to the selected information and acquire a voice data feature from the acquired voice data; and control the transceiver to transmit the acquired voice data feature to a server. Type: Grant. Filed: June 10, 2019. Date of Patent: January 24, 2023. Assignee: Samsung Electronics Co., Ltd. Inventors: Jaehong Kim, Sangkyung Lee, Jihak Jung
-
Patent number: 11551681. Abstract: Devices and techniques are generally described for a speech processing routing architecture. In various examples, first data comprising a first feature definition is received. The first feature definition may include a first indication of first source data and first instructions for generating feature data using the first source data. In various examples, the feature data may be generated according to the first feature definition. In some examples, a speech processing system may receive a first request to process a first utterance. The feature data may be retrieved from a non-transitory computer-readable memory. The speech processing system may determine a first skill for processing the first utterance based at least in part on the feature data. Type: Grant. Filed: December 13, 2019. Date of Patent: January 10, 2023. Assignee: Amazon Technologies, Inc. Inventors: Rajesh Kumar Pandey, Ruhi Sarikaya, Shubham Katiyar, Arun Kumar Thenappan, Isaac Joseph Madwed, Jihwan Lee, David Thomas, Julia Kennedy Nemer, Mohamed Farouk AbdelHady, Joe Pemberton, Young-Bum Kim, Arima Vu Ram Thayumanavar, Wangyao Ge
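The flow in this abstract, a feature definition naming its source data and a transform, feature data precomputed and cached, then read at routing time to pick a skill, can be sketched as follows. All class, field, and function names here are invented for illustration and are not taken from the patent or from any Amazon API.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class FeatureDefinition:
    name: str
    source_key: str                   # indication of which source data to read
    transform: Callable[[Any], Any]   # instructions for generating the feature

class FeatureStore:
    """Computes feature data per definition and caches it, standing in for
    the persisted feature memory the abstract describes."""
    def __init__(self):
        self.cache = {}

    def register(self, definition, source_data):
        raw = source_data[definition.source_key]
        self.cache[definition.name] = definition.transform(raw)

    def get(self, name):
        return self.cache[name]

def route_utterance(utterance, store, skill_rules):
    """Pick the first skill whose rule accepts the cached feature data."""
    for skill, rule in skill_rules:
        if rule(utterance, store):
            return skill
    return "fallback_skill"

# Example feature: how often this user has invoked the music skill.
source = {"usage_log": ["music", "weather", "music"]}
store = FeatureStore()
store.register(FeatureDefinition(
    name="music_invocations",
    source_key="usage_log",
    transform=lambda log: log.count("music"),
), source)

skill = route_utterance(
    "play something",
    store,
    [("music_skill",
      lambda u, s: "play" in u and s.get("music_invocations") > 0)],
)
```

The point of the indirection is that new routing features can be declared as data (source plus transform) without changing the routing code itself.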
-
Patent number: 11516570. Abstract: Implementations of the subject matter described herein provide a silent voice input solution that goes unnoticed by people nearby. Compared with conventional voice input solutions based on normal speech or whispering, the proposed "silent" voice input method is performed using ingressive voice during the user's breathing-in process. By placing the apparatus very close to the user's mouth, with an ultra-small gap formed between the mouth and the microphone, the proposed silent voice input solution keeps voice leakage very small, thereby allowing the user to use ultra-low-voice speech input in public and mobile situations without disturbing surrounding people. Type: Grant. Filed: July 1, 2021. Date of Patent: November 29, 2022. Assignee: Microsoft Technology Licensing, LLC. Inventor: Masaaki Fukumoto
-
Patent number: 11488596. Abstract: A method for recording audio content in a group conversation among a plurality of members includes: controlling an image capturing device to continuously capture images of the members; executing an image processing procedure on the images of the members to determine whether a specific gesture is detected; when the determination is affirmative, controlling an audio recording device to activate and perform directional audio collection with respect to a direction that is associated with the specific gesture to record audio data; and controlling a data storage to store the audio data and a time stamp associated with the audio data as an entry of a conversation record. Type: Grant. Filed: April 27, 2020. Date of Patent: November 1, 2022. Inventor: Hsiao-Han Chen
-
Patent number: 10878226. Abstract: In an approach, a computer determines, based at least in part on a video of an attendee of a video conference, a first sentiment of the attendee, wherein the first sentiment includes at least a sentiment from a sentiment analysis of one or more facial expressions of the attendee and a sentiment from a sentiment analysis of a plurality of the attendee's spoken words. The approach includes the computer receiving an indication of attendee activity in at least a first application on computing devices accessed by the attendee, and determining whether the first sentiment of the attendee is related to the video conference based, in part, on the attendee activity in at least the first application. Responsive to determining that the first sentiment of the attendee is not related to the video conference, the computer discards that first sentiment. Type: Grant. Filed: March 8, 2017. Date of Patent: December 29, 2020. Assignee: International Business Machines Corporation. Inventors: Hernan A. Cunico, Asima Silva
-
Patent number: 9898170. Abstract: An approach is provided for automatically generating user-specific interaction modes for processing questions and answers at an information handling system by receiving a question from a user, extracting user context parameters identifying a usage scenario for the user, identifying first input and output presentation modes for the user based on the extracted user context parameters, monitoring user interaction with the system in relation to the question, and adjusting the first input and output presentation modes based on the extracted user context parameters and the detected user interaction with the system. Type: Grant. Filed: December 10, 2014. Date of Patent: February 20, 2018. Assignee: International Business Machines Corporation. Inventors: John P. Bufe, Donna K. Byron, Mary D. Swift, Timothy Winkler
-
Patent number: 8635066. Abstract: Methods, systems, and articles are described herein for receiving an audio input and a facial image sequence for a period of time, in which the audio input includes speech input from multiple speakers. The audio input is processed based on the received facial image sequence to extract the speech input of a particular speaker. Type: Grant. Filed: April 14, 2010. Date of Patent: January 21, 2014. Assignee: T-Mobile USA, Inc. Inventor: Andrew R. Morrison
-
Publication number: 20110099013. Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the recorded utterance and viewed by one party to it, and identifying words in the captured text that are relevant to the recorded utterance. The method further includes adding the identified words to a dynamic language model and recognizing the recorded utterance using the dynamic language model. The recorded utterance can be a spoken dialog. A time stamp can be assigned to each identified word. The method can include adding identified words to, and/or removing identified words from, the dynamic language model based on their respective time stamps. A screen scraper can capture text from the device display associated with the recorded utterance. The device display can contain customer service data. Type: Application. Filed: October 23, 2009. Publication date: April 28, 2011. Applicant: AT&T Intellectual Property I, L.P. Inventors: Dan Melamed, Srinivas Bangalore, Michael Johnston
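The time-stamp bookkeeping described above, screen-captured words enter a dynamic vocabulary and expire as they age, can be sketched minimally as below. The class and method names are invented for illustration; a real system would adjust language-model probabilities rather than do a simple membership test.

```python
class DynamicVocabulary:
    """Words scraped from the display, each tagged with the time it was seen;
    old words are removed based on their time stamps."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self.words = {}          # word -> most recent time stamp

    def add_words(self, words, now):
        for word in words:
            self.words[word] = now

    def prune(self, now):
        self.words = {w: t for w, t in self.words.items()
                      if now - t <= self.max_age}

    def boost(self, word):
        # A real recognizer would raise this word's language-model
        # probability; here we just report whether it is still current.
        return word in self.words

vocab = DynamicVocabulary(max_age_seconds=60)
vocab.add_words(["invoice", "refund"], now=0)
vocab.add_words(["tracking"], now=50)
vocab.prune(now=70)   # "invoice" and "refund" are now older than 60 s
```

Expiring by time stamp keeps the language model tied to what the agent's screen currently shows, which is the point of the customer-service example in the abstract.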
-
Publication number: 20100250250. Abstract: A hybrid text generator is disclosed that generates a hybrid text string from multiple text strings produced from an audio input by multiple automated speech recognition systems. The hybrid text generator receives metadata describing the time-location at which each word from the multiple text strings is located in the audio input. The hybrid text generator matches words between the multiple text strings using the metadata and generates a hybrid text string that includes the matched words. The hybrid text generator uses confidence scores associated with words that do not match between the multiple text strings to determine whether to add an unmatched word to the hybrid text string. Type: Application. Filed: March 29, 2010. Publication date: September 30, 2010. Inventor: Jonathan Wiggs
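The combination step can be sketched as follows, under the simplifying assumption (mine, not the publication's) that the two recognizers' outputs are already aligned into the same time slots: matched words pass through, and where the recognizers disagree, the word with the higher confidence score wins.

```python
def hybrid_text(hyp_a, hyp_b):
    """Each hypothesis: list of (word, start_time, confidence) tuples
    with pre-aligned time slots."""
    merged = []
    for (word_a, t_a, conf_a), (word_b, t_b, conf_b) in zip(hyp_a, hyp_b):
        assert t_a == t_b, "sketch assumes pre-aligned time slots"
        if word_a == word_b:
            merged.append(word_a)        # matched word goes straight in
        elif conf_a >= conf_b:
            merged.append(word_a)        # unmatched: keep higher confidence
        else:
            merged.append(word_b)
    return " ".join(merged)

# Two toy hypotheses for the same audio; they disagree on the middle word.
a = [("please", 0.0, 0.9), ("hold", 0.4, 0.6), ("on", 0.7, 0.8)]
b = [("please", 0.0, 0.8), ("fold", 0.4, 0.9), ("on", 0.7, 0.7)]
```

Real systems face overlapping and unequal-length word sequences, so the alignment itself (via the time-location metadata) is the harder part this sketch skips.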
-
Publication number: 20100185447. Abstract: Embodiments are provided for selecting and utilizing multiple recognizers to process an utterance based on a markup language document. The markup language document and an utterance are received in a computing device. One or more recognizers are selected from among the multiple recognizers for returning a results set for the utterance based on markup language in the markup language document. The results set is received from the one or more selected recognizers in a format determined by a processing method specified in the markup language document. An event is then executed on the computing device in response to receiving the results set. Type: Application. Filed: January 22, 2009. Publication date: July 22, 2010. Applicant: Microsoft Corporation. Inventors: Andrew K. Krumel, Pierre-Alexandre F. Masse, Joseph A. Ruff
-
Publication number: 20090326941. Abstract: A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors. Type: Application. Filed: September 4, 2009. Publication date: December 31, 2009. Inventor: Mark Catchpole
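The lexical tree structure itself is essentially a trie: words sharing a prefix share nodes, and because each tree's initial component is unique, separate trees can be handed to separate processors. A software-only sketch of that data structure (function names mine; the patent describes hardware, not this code):

```python
def build_lexical_trees(words):
    """One tree per unique initial component; words sharing a prefix
    share nodes within a tree."""
    trees = {}
    for word in words:
        node = trees.setdefault(word[0], {})
        for ch in word[1:]:
            node = node.setdefault(ch, {})
        node["<end>"] = word      # mark a complete word at this node

    return trees

def lookup(trees, word):
    """Walk the tree for the word's initial component; None if absent."""
    node = trees.get(word[0], {})
    for ch in word[1:]:
        if ch not in node:
            return None
        node = node[ch]
    return node.get("<end>")

trees = build_lexical_trees(["cat", "car", "cart", "dog"])
```

Here "cat", "car", and "cart" all live in the tree rooted at "c", so the shared prefix "ca" is evaluated once; in the circuit, each such tree would be scored against the speech parameters by its own lexical tree processor.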
-
Publication number: 20090171662. Abstract: The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with larger domain sizes, scarce training data, and noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method that combines linguistic and statistical information to represent the information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical. Type: Application. Filed: December 27, 2007. Publication date: July 2, 2009. Applicant: SEHDA, INC. Inventors: Jun Huang, Yookyung Kim, Youssef Billawala, Farzad Ehsani, Demitrios Master
-
Publication number: 20090048838. Abstract: Provided is a system and method for building and managing a customized voice of an end user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected both by an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a recording on a server of a service provider. This recording is then retrieved and stored on the server, where it is used to build a voice database using text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and to customize parameter and configuration settings, thereby forming a customized voice database that can be deployed or accessed. Type: Application. Filed: May 29, 2008. Publication date: February 19, 2009. Inventors: Craig F. Campbell, Kevin A. Lenzo, Alexandre D. Cox
-
Publication number: 20090006093. Abstract: A speaker recognition system generates a codebook store with codebooks representing voice samples of speakers, referred to as trainers. The speaker recognition system may use multiple classifiers and generate a codebook store for each classifier. Each classifier uses a different set of features of a voice sample. A classifier inputs a voice sample of a person and tries to authenticate or identify the person. A classifier generates a sequence of feature vectors for the input voice sample and then a code vector for that sequence. The classifier uses its codebook store to recognize the person. The speaker recognition system then combines the scores of the classifiers to generate an overall score. If the score satisfies a recognition criterion, the speaker recognition system indicates that the voice sample is from that speaker. Type: Application. Filed: June 29, 2007. Publication date: January 1, 2009. Applicant: Microsoft Corporation. Inventor: Amitava Das
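Codebook-based scoring is classic vector quantization: a speaker's codebook is a set of trained centroid vectors, and a sample scores well if its feature vectors sit close to those centroids. A toy sketch of that scoring and of combining multiple classifiers' scores (the 2-D points and weights are invented; real systems quantize trained acoustic features such as MFCCs):

```python
import math

def nearest_distance(vector, codebook):
    return min(math.dist(vector, code) for code in codebook)

def speaker_score(frames, codebook):
    """Average distance of each feature vector to its nearest code vector;
    lower means the sample matches this speaker's codebook better."""
    return sum(nearest_distance(f, codebook) for f in frames) / len(frames)

def combined_score(frames_per_classifier, codebooks_per_classifier, weights):
    """Weighted combination of per-classifier scores, as when several
    classifiers each use a different feature set of the voice sample."""
    return sum(w * speaker_score(f, cb)
               for f, cb, w in zip(frames_per_classifier,
                                   codebooks_per_classifier, weights))

alice_codebook = [(0.0, 0.0), (1.0, 1.0)]
bob_codebook = [(5.0, 5.0), (6.0, 6.0)]
sample = [(0.1, 0.0), (0.9, 1.1)]     # frames close to Alice's centroids
```

The recognition criterion in the abstract would then be a threshold on the combined score, or a comparison across all enrolled speakers' codebooks.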
-
Publication number: 20080300880. Abstract: This application discloses a multi-lingual output device for output of transactional information for a given customer. The device includes a database for determining what transaction information needs to be output, the local language in which the information is to be output, and the customer's preferred language for the output; and a local transaction subsystem in communication with said database, wherein said local transaction subsystem includes input device receiving means for accepting an input device and output generating means for generating a signal to an output device. Type: Application. Filed: January 22, 2008. Publication date: December 4, 2008. Inventor: Lawrence Stephen Gelbman
-
Publication number: 20080103758. Abstract: A method for language translation of a toolkit menu is provided, which includes receiving, by a Subscriber Identity Module (SIM) toolkit module, the toolkit menu from a SIM card module; determining, by the SIM toolkit module, whether the language of the toolkit menu matches a user-defined language; and translating, by the SIM toolkit module, the language of the toolkit menu into the user-defined language if the language of the toolkit menu is different from the user-defined language. Type: Application. Filed: October 25, 2007. Publication date: May 1, 2008. Applicant: SAMSUNG ELECTRONICS CO., LTD. Inventor: Suraparaju Venkateswarlu
-
Patent number: RE42868. Abstract: A method and apparatus accesses a database where entries are linked to at least two sets of patterns. One or more patterns of a first set of patterns are recognized within a received signal. The recognized patterns are used to identify entries and compile a list of patterns in a second set of patterns to which those entries are also linked. The list is then used to recognize a second received signal. The received signals may, for example, be voice signals or signals indicating the origin or destination of the received signals. Type: Grant. Filed: October 25, 1995. Date of Patent: October 25, 2011. Assignee: Cisco Technology, Inc. Inventors: David J. Attwater, Steven J. Whittaker, Francis J. Scahill, Alison D. Simons
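The two-stage idea can be sketched with a directory-assistance example of my own (surnames as the first pattern set, given names as the second): recognizing a surname selects entries, and the given names linked to those entries become the constrained vocabulary for the second recognition pass.

```python
# Toy database: each entry links a first-set pattern (surname)
# to a second-set pattern (given name).
entries = [
    {"surname": "smith", "given": "anna"},
    {"surname": "smith", "given": "john"},
    {"surname": "jones", "given": "john"},
]

def second_set_vocabulary(recognized_surname):
    """Compile the list of second-set patterns linked to the matched entries."""
    return {e["given"] for e in entries if e["surname"] == recognized_surname}

def recognize_constrained(signal_word, vocabulary):
    # Stand-in for a real recognizer: only words licensed by the
    # compiled vocabulary can be returned as a result.
    return signal_word if signal_word in vocabulary else None

vocab = second_set_vocabulary("smith")
```

Shrinking the second pass's vocabulary to only the names consistent with the first recognition is what makes the second recognition both faster and more accurate.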