Speech Recognition (epo) Patents (Class 704/E15.001)

  • Publication number: 20120232893
    Abstract: A multi-layered speech recognition apparatus and method. The apparatus includes a client checking whether the client recognizes the speech using a characteristic of speech to be recognized and recognizing the speech or transmitting the characteristic of the speech according to a checked result; and first through N-th servers, wherein the first server checks whether the first server recognizes the speech using the characteristic of the speech transmitted from the client, and recognizes the speech or transmits the characteristic according to a checked result, and wherein an n-th (2≤n≤N) server checks whether the n-th server recognizes the speech using the characteristic of the speech transmitted from an (n−1)-th server, and recognizes the speech or transmits the characteristic according to a checked result.
    Type: Application
    Filed: May 23, 2012
    Publication date: September 13, 2012
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Jaewon Lee, Jeongmi Cho
  • Publication number: 20120232885
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy, wherein the visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.
    Type: Application
    Filed: March 8, 2011
    Publication date: September 13, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Luciano De Andrade BARBOSA, Srinivas BANGALORE
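    The perplexity-driven crawling policy in the abstract above can be illustrated with a minimal sketch: score each candidate document's perplexity under the current language model and prioritize the highest-scoring (most novel) documents. This is an illustrative unigram model with add-alpha smoothing, not the patented implementation; the function name and smoothing scheme are assumptions.

    ```python
    import math
    from collections import Counter

    def perplexity(tokens, lm_counts, vocab_size, alpha=1.0):
        """Per-token perplexity of a document under a unigram language
        model given as word counts, with add-alpha smoothing so that
        out-of-vocabulary words get a small floor probability."""
        total = sum(lm_counts.values())
        log_sum = 0.0
        for tok in tokens:
            p = (lm_counts.get(tok, 0) + alpha) / (total + alpha * vocab_size)
            log_sum += math.log(p)
        return math.exp(-log_sum / len(tokens))

    # A crawler would rank candidate documents by descending perplexity,
    # fetching the high-perplexity ("novelty region") documents first.
    ```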
  • Publication number: 20120226498
    Abstract: Motion-based voice activity detection may be provided. A data stream may be received and a determination may be made whether at least one non-audio element associated with the data stream indicates that the data stream comprises speech. In response to determining that the at least one non-audio element associated with the data stream indicates that the data stream comprises speech, a speech to text conversion may be performed on at least one audio element associated with the data stream.
    Type: Application
    Filed: March 2, 2011
    Publication date: September 6, 2012
    Applicant: MICROSOFT CORPORATION
    Inventor: Remi Ken-Sho Kwan
  • Publication number: 20120223899
    Abstract: A computerized information system useful for providing directions and other information to a user. In one embodiment, the apparatus comprises a processor and network interface and computer readable medium having at least one computer program disposed thereon, the at least one program being configured to receive inputs from the user regarding locations or entities, and provide directions and/or advertising related content. At least a portion of the information is obtained via the network interface from a remote server.
    Type: Application
    Filed: February 24, 2012
    Publication date: September 6, 2012
    Inventor: Robert F. Gazdzinski
  • Publication number: 20120226497
    Abstract: A method for generating an anti-model of a sound class is disclosed. A plurality of candidate sound data is provided for generating the anti-model. A plurality of similarity values between the plurality of candidate sound data and a reference sound model of a sound class is determined. An anti-model of the sound class is generated based on at least one candidate sound data having the similarity value within a similarity threshold range.
    Type: Application
    Filed: February 13, 2012
    Publication date: September 6, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Kisun You, Kyu Woong Hwang, Taesu Kim
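    The anti-model construction in the abstract above can be sketched as a similarity filter followed by pooling: keep candidate sounds whose similarity to the reference model falls inside a threshold band, then combine them. This sketch uses cosine similarity over plain feature vectors and an averaging step; the thresholds, representation, and function names are assumptions, not the patented method.

    ```python
    import math

    def cosine(a, b):
        """Cosine similarity between two feature vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    def build_anti_model(candidates, reference, lo=0.3, hi=0.8):
        """Keep candidates whose similarity to the reference falls inside
        the threshold band (similar enough to be confusable, dissimilar
        enough to be 'anti'), then average them into one anti-model."""
        kept = [c for c in candidates if lo <= cosine(c, reference) <= hi]
        if not kept:
            return None
        dim = len(reference)
        return [sum(c[i] for c in kept) / len(kept) for i in range(dim)]
    ```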
  • Publication number: 20120221412
    Abstract: A network apparatus useful for providing directions and other information to a user of a client device in wireless communication therewith. In one embodiment, the apparatus includes one or more wireless interfaces and a network interface for communication with a server. User speech inputs in the form of digitized representations are received by the apparatus and used by the server as the basis for retrieving information including graphical representations of location or entities that the user wishes to find.
    Type: Application
    Filed: March 1, 2012
    Publication date: August 30, 2012
    Inventor: Robert F. Gazdzinski
  • Publication number: 20120221334
    Abstract: A security system and method includes setting operation steps having a preset sequence, a trigger signal and a testing parameter for each of the operation steps, and a range for each of the testing parameters. The method further confirms a current operation step when a testing device starts a test process. If an output signal received from a sensing device is not identical to the trigger signal of the current operation step, a voice content file of the current operation step is sent to the voice output device. If a value of the output parameter read from the sensing device is not within the range of the testing parameter, a voice prompt file of the testing parameter is sent to the voice output device. After sending the voice content file or the voice prompt file, an abnormality processing command of the current operation step is sent to the testing device to stop the test process.
    Type: Application
    Filed: August 2, 2011
    Publication date: August 30, 2012
    Applicants: HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (ShenZhen) CO., LTD.
    Inventor: HONG-RU ZHU
  • Publication number: 20120221330
    Abstract: A voice activity detection (VAD) module analyzes a media file, such as an audio file or a video file, to determine whether one or more frames of the media file include speech. A speech recognizer generates feedback relating to an accuracy of the VAD determination. The VAD module leverages the feedback to improve subsequent VAD determinations. The VAD module also utilizes a look-ahead window associated with the media file to adjust estimated probabilities or VAD decisions for previously processed frames.
    Type: Application
    Filed: February 25, 2011
    Publication date: August 30, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Albert Joseph Kishan Thambiratnam, Weiwu Zhu, Frank Torsten Bernd Seide
  • Publication number: 20120215535
    Abstract: A method and apparatus for multi-channel categorization, comprising capturing a vocal interaction and a non-vocal interaction, using logging or capturing devices; retrieving a first word from the vocal interaction and a second word from the non-vocal interaction; assigning the vocal interaction into a first category using the first word; assigning the non-vocal interaction into a second category using the second word; and associating the first category and the second category into a multi-channel category, thus aggregating the vocal interaction and the non-vocal interaction.
    Type: Application
    Filed: February 23, 2011
    Publication date: August 23, 2012
    Applicant: Nice Systems Ltd.
    Inventors: Moshe WASSERBLAT, Omer Gazit
  • Publication number: 20120215539
    Abstract: A recipient computing device can receive a speech utterance to be processed by speech recognition and segment the speech utterance into two or more speech utterance segments, each of which can be assigned to one of a plurality of available speech recognizers. A first one of the plurality of available speech recognizers can be implemented on a separate computing device accessible via a data network. A first segment can be processed by the first recognizer and the results of the processing returned to the recipient computing device, and a second segment can be processed by a second recognizer implemented at the recipient computing device.
    Type: Application
    Filed: February 22, 2012
    Publication date: August 23, 2012
    Inventor: Ajay Juneja
  • Publication number: 20120215544
    Abstract: A computerized information apparatus useful for providing directions and other information to a user. In one embodiment, the apparatus comprises a processor and network interface and computer readable medium having at least one computer program disposed thereon, the at least one program being configured to receive a speech input from the user regarding an organization or entity, and provide a graphic or visual representation of the organization or entity to aid the user in finding the organization or entity. At least a portion of the information is obtained via the network interface from a remote server.
    Type: Application
    Filed: February 24, 2012
    Publication date: August 23, 2012
    Inventor: Robert F. Gazdzinski
  • Publication number: 20120215543
    Abstract: At design time of a graphical user interface (GUI), a software component (VUIcontroller) is added to the GUI. At run time of the GUI, the VUIcontroller analyzes the GUI from within a process that executes the GUI. From this analysis, the VUIcontroller automatically generates a voice command set, such as a speech-recognition grammar, that corresponds to controls of the GUI. The generated voice command set is made available to a speech recognition engine, thereby speech-enabling the GUI. Optionally, a GUI designer may add properties to ones of the GUI controls at GUI design time, without necessarily writing a voice command set. These properties, if specified, are then used at GUI run time to control or influence the analysis of the GUI and the automatic generation of the voice command set.
    Type: Application
    Filed: April 26, 2011
    Publication date: August 23, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Mert Öz, Herwig Häle, Attila Muszta, Andreas Neubacher, Peter Kozma
  • Publication number: 20120215531
    Abstract: A multi-modal user interface with increased responsiveness is described. A graphical user interface (GUI) supports multiple different user input modalities including low delay inputs which respond to user inputs without significant delay, and high latency inputs which have a significant response latency after receiving a user input before providing a corresponding completed response. The GUI accepts user inputs in a sequence of mixed input modalities independently of response latencies without waiting for responses to high latency inputs. The GUI also provides interim indication during response latencies of pending responses at a position in the GUI where the completed response will be presented.
    Type: Application
    Filed: February 18, 2011
    Publication date: August 23, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Gerhard Grobauer, Andreas Neubacher, Miklós Pápai
  • Publication number: 20120209606
    Abstract: Obtaining information from audio interactions associated with an organization. The information may comprise entities, relations or events. The method comprises: receiving a corpus comprising audio interactions; performing audio analysis on audio interactions of the corpus to obtain text documents; performing linguistic analysis of the text documents; matching the text documents with one or more rules to obtain one or more matches; and unifying or filtering the matches.
    Type: Application
    Filed: February 14, 2011
    Publication date: August 16, 2012
    Applicant: Nice Systems Ltd.
    Inventors: Maya Gorodetsky, Ezra Daya, Oren Pereg
  • Publication number: 20120209603
    Abstract: Techniques for acoustic voice activity detection (AVAD) are described, including detecting a signal associated with a subband from a microphone, performing an operation on data associated with the signal, the operation generating a value associated with the subband, and determining whether the value distinguishes the signal from noise by using the value to determine a signal-to-noise ratio and comparing the value to a threshold.
    Type: Application
    Filed: January 9, 2012
    Publication date: August 16, 2012
    Inventor: Zhinian Jing
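    The subband SNR check in the AVAD abstract above can be sketched as follows: compute per-subband energy for a frame, compare each subband against an estimated noise floor, and flag the frame as speech when any subband clears an SNR threshold. An illustrative sketch only; the subband count, threshold value, and function names are assumptions.

    ```python
    def subband_energy(frame, n_subbands):
        """Split a frame of spectral magnitudes into equal subbands and
        return the mean energy of each subband."""
        width = len(frame) // n_subbands
        return [sum(x * x for x in frame[i * width:(i + 1) * width]) / width
                for i in range(n_subbands)]

    def is_speech(frame, noise_floor, n_subbands=4, snr_threshold=2.0):
        """Flag the frame as speech if any subband's energy exceeds the
        corresponding noise-floor energy by the SNR threshold."""
        energies = subband_energy(frame, n_subbands)
        return any(e / max(nf, 1e-12) > snr_threshold
                   for e, nf in zip(energies, noise_floor))
    ```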
  • Publication number: 20120203558
    Abstract: A voice-operated control circuit and method for using the voice-operated control circuit in connection with a toy vehicle. The voice-operated control circuit contains an audio detector, such as a microphone, to detect audible sound signals, and an integrated circuit that determines the duration of the audible sound signals received by the audio detector. At a user-defined time and based on the audible sound signals received, the integrated circuit determines and controls the duration of operation of various components of the toy vehicle, such as a motor, lights and/or sounds.
    Type: Application
    Filed: February 4, 2011
    Publication date: August 9, 2012
    Inventors: Ryohei Tanaka, Christy Marie Torres
  • Publication number: 20120197643
    Abstract: A speech signal processing system and method which uses the following steps: (a) receiving an utterance from a user via a microphone that converts the utterance into a speech signal; and (b) pre-processing the speech signal using a processor. The pre-processing step includes extracting acoustic data from the received speech signal, determining from the acoustic data whether the utterance includes one or more obstruents, estimating speech energy from higher frequencies associated with the identified obstruents, and mapping the estimated speech energy to lower frequencies.
    Type: Application
    Filed: January 27, 2011
    Publication date: August 2, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
  • Publication number: 20120197644
    Abstract: An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
    Type: Application
    Filed: January 30, 2012
    Publication date: August 2, 2012
    Applicant: International Business Machines Corporation
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
  • Publication number: 20120197641
    Abstract: A signal portion is extracted from an input signal for each frame having a specific duration to generate a per-frame input signal. The per-frame input signal in a time domain is converted into a per-frame input signal in a frequency domain, thereby generating a spectral pattern. Subband average energy is derived in each of subbands adjacent one another in the spectral pattern. The subband average energy is compared in at least one subband pair of a first subband and a second subband that is a higher frequency band than the first subband, the first and second subbands being consecutive subbands in the spectral pattern. It is determined that the per-frame input signal includes a consonant segment if the subband average energy of the second subband is higher than the subband average energy of the first subband.
    Type: Application
    Filed: February 1, 2012
    Publication date: August 2, 2012
    Applicant: JVC KENWOOD Corporation
    Inventors: Akiko Akechi, Takaaki Yamabe
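    The consonant-segment test in the abstract above is concrete enough to sketch: compute average energy per subband of a frame's spectrum, then flag a consonant when a higher-frequency subband out-energizes the consecutive subband below it. An illustrative sketch under assumed parameters (subband count, naming), not the patented implementation.

    ```python
    def subband_avg_energy(spectrum, n_subbands):
        """Average energy in each of n_subbands equal slices of the
        frame's frequency-domain magnitudes."""
        width = len(spectrum) // n_subbands
        return [sum(abs(s) ** 2 for s in spectrum[i * width:(i + 1) * width]) / width
                for i in range(n_subbands)]

    def has_consonant(spectrum, n_subbands=8):
        """Consonants concentrate energy at higher frequencies, so flag
        the frame if any subband has higher average energy than the
        consecutive subband just below it."""
        e = subband_avg_energy(spectrum, n_subbands)
        return any(e[i + 1] > e[i] for i in range(n_subbands - 1))
    ```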
  • Publication number: 20120191453
    Abstract: A system and methods for matching at least one word of an utterance against a set of template hierarchies to select the best matching template or set of templates corresponding to the utterance. The system and methods determine at least one exact, inexact, and partial match between the at least one word of the utterance and at least one term within the template hierarchy to select and populate a template or set of templates corresponding to the utterance. The populated template or set of templates may then be used to generate a narrative template or a report template.
    Type: Application
    Filed: March 30, 2012
    Publication date: July 26, 2012
    Applicant: Cyberpulse L.L.C.
    Inventors: James ROBERGE, Jeffrey Soble
  • Publication number: 20120191455
    Abstract: According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.
    Type: Application
    Filed: April 3, 2012
    Publication date: July 26, 2012
    Applicant: WIAV SOLUTIONS LLC
    Inventors: Sahar E. Bou-Ghazale, Ayman O. Asadi, Khaled Assaleh
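    The endpointing decision in the abstract above combines two comparisons: the frame's energy against the background energy, and the frame's cepstral-feature distance against the background features. The sketch below illustrates that two-test structure with assumed thresholds and function names; it is not the patented endpointer.

    ```python
    import math

    def energy(frame):
        """Mean-square energy of a frame of samples."""
        return sum(x * x for x in frame) / len(frame)

    def cepstral_distance(c1, c2):
        """Euclidean distance between two cepstral feature vectors."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

    def classify(frame, feats, bg_energy, bg_feats, e_ratio=2.0, d_thresh=1.0):
        """Label a frame 'speech' only if its energy exceeds the
        background energy by a ratio AND its features have drifted far
        from the background features; otherwise 'non-speech'."""
        loud = energy(frame) > e_ratio * bg_energy
        far = cepstral_distance(feats, bg_feats) > d_thresh
        return "speech" if loud and far else "non-speech"
    ```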
  • Publication number: 20120191449
    Abstract: Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for performing speech recognition using dock context. In one aspect, a method includes accessing audio data that includes encoded speech. Information that indicates a docking context of a client device is accessed, the docking context being associated with the audio data. A plurality of language models is identified. At least one of the plurality of language models is selected based on the docking context. Speech recognition is performed on the audio data using the selected language model to identify a transcription for a portion of the audio data.
    Type: Application
    Filed: September 30, 2011
    Publication date: July 26, 2012
    Applicant: GOOGLE INC.
    Inventors: Matthew I. LLOYD, Pankaj RISBOOD
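    The dock-context selection in the abstract above reduces to choosing among several language models based on a context signal. A minimal sketch, with hypothetical context keys and model names (none of these identifiers appear in the patent):

    ```python
    # Hypothetical registry mapping a docking context to a language model
    # biased toward the vocabulary users tend to speak in that context.
    LANGUAGE_MODELS = {
        "car_dock": "navigation_lm",
        "desk_dock": "dictation_lm",
        "media_dock": "media_control_lm",
    }

    def select_language_model(dock_context, default="general_lm"):
        """Pick the language model for the given docking context,
        falling back to a general-purpose model when undocked or
        the context is unrecognized."""
        return LANGUAGE_MODELS.get(dock_context, default)
    ```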
  • Publication number: 20120191450
    Abstract: A system and method for processing a speech signal delivered in a noisy channel or with ambient noise that focuses on a subset of harmonics that are least corrupted by noise, that disregards the signal harmonics with low signal-to-noise ratio(s), and that disregards amplitude modulations inconsistent with speech.
    Type: Application
    Filed: July 27, 2010
    Publication date: July 26, 2012
    Inventor: Mark Pinson
  • Publication number: 20120188164
    Abstract: Presented is a method and system for processing a gesture performed by a user of a first input device. The method comprises detecting the gesture and detecting a user-provided parameter for disambiguating the gesture. A user command is then determined based on the detected gesture and the detected parameter.
    Type: Application
    Filed: October 16, 2009
    Publication date: July 26, 2012
    Inventors: Prasenjit Dey, Sriganesh Madhvanath, Ramadevi Vennelakanti, Rahul Ajmera
  • Publication number: 20120185251
    Abstract: A method and system for candidate matching, such as used in match-making services, assesses narrative responses to measure candidate qualities. A candidate database includes self-assessment data and narrative data. Narrative data concerning a defined topic is analyzed to determine candidate qualities separate from topical information. Candidate qualities thus determined are included in candidate profiles and used to identify desirable candidates.
    Type: Application
    Filed: March 26, 2012
    Publication date: July 19, 2012
    Applicant: HOSHIKO LLC
    Inventor: Gary Stephen Shuster
  • Patent number: 8224395
    Abstract: A method may include connecting to another user device, identifying a geographic location of the other user device, identifying a geographic location of the user device, mapping a sound source associated with the other user device, based on the geographic location of the other user device with respect to the geographic location of the user device, to a location of an auditory space associated with a user of the user device, placing the sound source in the location of the auditory space, and emitting, based on the placing, the sound source so that the sound source is capable of being perceived by the user in the location of the auditory space.
    Type: Grant
    Filed: April 24, 2009
    Date of Patent: July 17, 2012
    Assignee: Sony Mobile Communications AB
    Inventors: Ted Moller, Ian Rattigan
  • Publication number: 20120179467
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention.
    Type: Application
    Filed: March 20, 2012
    Publication date: July 12, 2012
    Applicant: AT&T Intellectual Property I, L. P.
    Inventor: Jason Williams
  • Publication number: 20120179470
    Abstract: Systems and methods for providing a simultaneous voice and data user interface for secure catalog orders and in particular for providing a system and method for providing a distributed voice user interface for a remote device having a limited visual user interface simultaneously with a data stream for facilitating secure automated catalog orders for simultaneous electronic fulfillment applied to that device are described.
    Type: Application
    Filed: March 19, 2012
    Publication date: July 12, 2012
    Applicant: Pitney Bowes Inc.
    Inventors: Jeffrey D. Pierce, G. Jonathan Wolfman, Luu T. Pham, Thomas J. Foth, George M. Macdonald
  • Publication number: 20120179463
    Abstract: Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
    Type: Application
    Filed: January 6, 2012
    Publication date: July 12, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Michael Newman, Anthony Gillet, David Mark Krowitz, Michael D. Edgington
  • Publication number: 20120179464
    Abstract: Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
    Type: Application
    Filed: January 6, 2012
    Publication date: July 12, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Michael Newman, Anthony Gillet, David Mark Krowitz, Michael D. Edgington
  • Publication number: 20120173244
    Abstract: Provided are a voice command recognition apparatus and method capable of figuring out the intention of a voice command input through a voice dialog interface, by combining a rule-based dialog model and a statistical dialog model. The voice command recognition apparatus includes a command intention determining unit configured to correct an error in recognizing a voice command of a user, and an application processing unit configured to check whether the final command intention determined in the command intention determining unit comprises the input factors for execution of an application.
    Type: Application
    Filed: September 26, 2011
    Publication date: July 5, 2012
    Inventors: Byung-Kwan Kwak, Chi-Youn Park, Jeong-Su Kim, Jeong-Mi Cho
  • Publication number: 20120173233
    Abstract: A method and apparatus for communicating through a phone having a voice recognition function are provided. The method of performing communication using a phone having a voice recognition function includes converting to an incoming call notification and voice recognition mode when a phone call is received; converting to a communication connection and speakerphone mode when voice information related to a communication connection instruction is recognized; performing communication using a speakerphone; and ending communication when voice information related to a communication end instruction is recognized during communication using the speakerphone. Therefore, when a phone call is received, a mode of a phone is converted to a speakerphone mode with a voice instruction using a voice recognition function, and thus communication can be performed without using a hand.
    Type: Application
    Filed: March 2, 2012
    Publication date: July 5, 2012
    Inventor: Joung-Min Seo
  • Publication number: 20120173238
    Abstract: One embodiment may take the form of a voice control system. The system may include a first apparatus with a processing unit configured to execute a voice recognition module and one or more executable commands, and a receiver coupled to the processing unit and configured to receive a first audio file from a remote control device. The first audio file may include at least one voice command. The first apparatus may further include a communication component coupled to the processing unit and configured to receive programming content, and one or more storage media storing the voice recognition module. The voice recognition module may be configured to convert voice commands into text.
    Type: Application
    Filed: January 28, 2011
    Publication date: July 5, 2012
    Applicant: EchoStar Technologies L.L.C.
    Inventors: Jeremy Mickelsen, Nathan A. Hale, Benjamin Mauser, David A. Innes, Brad Bylund
  • Publication number: 20120166176
    Abstract: A conventional speech recognition dictionary, translation dictionary and speech synthesis dictionary used in speech translation have inconsistencies.
    Type: Application
    Filed: March 3, 2010
    Publication date: June 28, 2012
    Inventors: Satoshi Nakamura, Eiichiro Sumita, Yutaka Ashikari, Noriyuki Kimura, Chiori Hori
  • Publication number: 20120162540
    Abstract: An embodiment of an apparatus for speech recognition includes a speech input unit configured to acquire an acoustic signal, a first recognition unit configured to recognize the acoustic signal, a communication unit configured to communicate with an external server, a second recognition unit configured to recognize the acoustic signal by utilizing the external server via the communication unit, a remote signal input unit configured to acquire a control signal from a remote controller, and a switching unit. The switching unit is configured to switch between the first recognition unit and the second recognition unit for recognizing the acoustic signal in response to a start trigger. The switching unit selects the second recognition unit when the start trigger is detected from the control signal, and the switching unit selects the first recognition unit when the start trigger is detected from the acoustic signal.
    Type: Application
    Filed: December 21, 2011
    Publication date: June 28, 2012
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Kazushige Ouchi, Miwako Doi
  • Publication number: 20120166184
    Abstract: Systems and methods that provide for voice command devices that receive sound but do not transfer the voice data beyond the system unless certain voice-filtering criteria have been met are described herein. In addition, embodiments provide devices that support voice command operation while external voice data transmission is in mute operation mode. As such, devices according to embodiments may process voice data locally responsive to the voice data matching voice-filtering criteria. Furthermore, systems and methods are described herein involving voice command devices that capture sound and analyze it in real-time on a word-by-word basis and decide whether to handle the voice data locally, transmit it externally, or both.
    Type: Application
    Filed: December 23, 2010
    Publication date: June 28, 2012
    Applicant: Lenovo (Singapore) Pte. Ltd.
    Inventors: Howard Locker, Daryl Cromer, Scott Edwards Kelso, Aaron Michael Stewart
  • Publication number: 20120158399
    Abstract: Techniques for grouping a plurality of samples automatically transcribed from a plurality of utterances. The method comprises forming clusters from the plurality of samples, wherein the clusters include two or more of the plurality of samples. One or more samples are selected from a cluster and manually-processed data samples for the one or more samples are obtained. A weighting factor may be assigned to the data samples based, at least in part, on the number of samples in the cluster associated with the selected data sample.
    Type: Application
    Filed: December 21, 2010
    Publication date: June 21, 2012
    Inventors: Real Tremblay, Jerome Tremblay, Alina Andreevskaia
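    The grouping-and-weighting scheme in the abstract above can be sketched in two steps: cluster the automatically transcribed samples, then pick one representative per cluster for manual processing and weight it by the cluster's size. This sketch clusters by exact transcript text as a stand-in for whatever similarity measure the patent contemplates; function names are assumptions.

    ```python
    from collections import defaultdict

    def cluster_by_transcript(samples):
        """Group (sample_id, transcript) pairs by transcript text
        (a simple stand-in for similarity-based clustering)."""
        clusters = defaultdict(list)
        for sample_id, transcript in samples:
            clusters[transcript].append(sample_id)
        return clusters

    def weighted_representatives(clusters):
        """Select one sample per cluster for manual review, weighted by
        the number of samples it stands in for."""
        return {members[0]: len(members) for members in clusters.values()}
    ```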
  • Publication number: 20120150539
    Abstract: The method of the present invention may include receiving a speech feature vector converted from a speech signal; performing a first search by applying a first language model to the received speech feature vector and outputting a word lattice and a first acoustic score of the word lattice as a continuous speech recognition result; outputting a second acoustic score as a phoneme recognition result by applying an acoustic model to the speech feature vector; comparing the first acoustic score of the continuous speech recognition result with the second acoustic score of the phoneme recognition result; outputting a first language model weight when the first acoustic score of the continuous speech recognition result is better than the second acoustic score of the phoneme recognition result; and performing a second search by applying a second language model weight, which is the same as the output first language model weight, to the word lattice.
    Type: Application
    Filed: December 13, 2011
    Publication date: June 14, 2012
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Hyung Bae Jeon, Yun Keun Lee, Eui Sok Chung, Jong Jin Kim, Hoon Chung, Jeon Gue Park, Ho Young Jung, Byung Ok Kang, Ki Young Park, Sung Joo Lee, Jeom Ja Kang, Hwa Jeon Song
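The score comparison and weight selection described in this abstract can be sketched as follows. The score conventions, weight values, and lattice representation are illustrative assumptions, not taken from the filing:

```python
def choose_lm_weight(word_acoustic_score, phoneme_acoustic_score,
                     strong_weight=1.0, weak_weight=0.5):
    """Pick the language-model weight for the second search pass by
    comparing the first-pass word-lattice acoustic score with the
    free-phoneme recognition score.  Scores are assumed to be
    log-likelihoods (higher is better); the weight values are
    illustrative."""
    if word_acoustic_score >= phoneme_acoustic_score:
        # The word-level hypothesis explains the audio well,
        # so trust the language model during rescoring.
        return strong_weight
    # Otherwise the utterance may contain out-of-vocabulary speech,
    # so rely less on the language model.
    return weak_weight

def rescore_lattice(lattice, lm_scores, lm_weight):
    """Second pass: rescore each lattice path by combining its
    acoustic score with its LM score scaled by the chosen weight."""
    return max(lattice,
               key=lambda p: p["acoustic"] + lm_weight * lm_scores[p["words"]])

lattice = [{"words": "recognize speech", "acoustic": -10.0},
           {"words": "wreck a nice beach", "acoustic": -9.5}]
lm_scores = {"recognize speech": -1.0, "wreck a nice beach": -6.0}
weight = choose_lm_weight(-9.5, -14.0)
best = rescore_lattice(lattice, lm_scores, weight)
```

With the stronger weight selected, the language model overrules the slightly better acoustic score of the implausible path.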
  • Publication number: 20120150540
    Abstract: A service for searching for unsolicited communications is provided. For example, the service may inspect e-mail messages, instant messaging messages, facsimile transmissions, voice communications, and video telephony, and analyze these communications to determine whether an intended communication is unsolicited. In connection with voice and video telephony, a voice sample may be obtained from the caller and voice recognition may be performed on the sample to determine an identity of the person or the voice. The voice sample may also be used to determine the type of voice—i.e., if the voice is live, machine generated, or prerecorded. Where the call is a video telephony call, image recognition may be used to inspect an image of the person. The information obtained from voice recognition, voice type recognition, and image recognition may be used to detect whether the message is from a known source of unsolicited communications.
    Type: Application
    Filed: February 20, 2012
    Publication date: June 14, 2012
    Applicant: ROCKSTAR BIDCO, LP
    Inventors: Samir Srivastava, Francois Audet, Vibhu Vivek
  • Publication number: 20120150536
    Abstract: Access is obtained to a large reference acoustic model for automatic speech recognition. The large reference acoustic model has L states modeled by L mixture models, and the large reference acoustic model has N components. A desired number of components Nc, less than N, to be used in a restructured acoustic model derived from the reference acoustic model, is identified. The desired number of components Nc is selected based on a computing environment in which the restructured acoustic model is to be deployed. The restructured acoustic model also has L states. For each given one of the L mixture models in the reference acoustic model, a merge sequence is built which records, for a given cost function, sequential mergers of pairs of the components associated with the given one of the mixture models. A portion of the Nc components is assigned to each of the L states in the restructured acoustic model.
    Type: Application
    Filed: December 9, 2010
    Publication date: June 14, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pierre Dognin, Vaibhava Goel, John R. Hershey, Peder A. Olsen
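The merge-sequence construction can be sketched for 1-D Gaussian mixture components as follows. The moment-matched merge is standard; the cost function here is an illustrative stand-in, since the abstract does not specify one:

```python
import math

def merge_pair(g1, g2):
    """Moment-matched merge of two (weight, mean, variance) 1-D
    Gaussian components into a single component."""
    w = g1[0] + g2[0]
    mean = (g1[0] * g1[1] + g2[0] * g2[1]) / w
    var = (g1[0] * (g1[2] + (g1[1] - mean) ** 2)
           + g2[0] * (g2[2] + (g2[1] - mean) ** 2)) / w
    return (w, mean, var)

def merge_cost(g1, g2):
    """Illustrative cost: weighted log-variance increase of the merged
    component over its parents (a stand-in for the filing's cost
    function, which the abstract does not specify)."""
    m = merge_pair(g1, g2)
    return (m[0] * math.log(m[2])
            - g1[0] * math.log(g1[2]) - g2[0] * math.log(g2[2]))

def build_merge_sequence(components, target):
    """Greedily record the cheapest pairwise merges until only
    `target` components remain, returning both the restructured
    mixture and the recorded merge sequence."""
    comps = list(components)
    sequence = []
    while len(comps) > target:
        i, j = min(((a, b) for a in range(len(comps))
                    for b in range(a + 1, len(comps))),
                   key=lambda ab: merge_cost(comps[ab[0]], comps[ab[1]]))
        sequence.append((comps[i], comps[j]))
        merged = merge_pair(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps, sequence

restructured, merges = build_merge_sequence(
    [(0.5, 0.0, 1.0), (0.3, 0.1, 1.0), (0.2, 5.0, 1.0)], target=2)
```

The two nearly identical components are merged first, while the distant one survives, which is the behavior a sensible cost function should produce.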
  • Publication number: 20120140979
    Abstract: To improve recognition accuracy even when a plurality of words including a matched portion are selected as candidates for recognition, a word recognition apparatus according to the present invention includes input means for inputting a word image representing a plurality of characters; word candidate selection means for recognizing the word image input by the input means and selecting a first word candidate and a second word candidate based on a plurality of words registered in a word dictionary; and verification means for comparing the first word candidate and the second word candidate character by character and verifying a likelihood of the first word candidate based on an evaluation value obtained, when the word image is recognized, for characters determined to be unmatched.
    Type: Application
    Filed: June 14, 2010
    Publication date: June 7, 2012
    Applicant: NEC Corporation
    Inventor: Daisuke Nishiwaki
  • Publication number: 20120136870
    Abstract: Systems and methods provide for indexing audio content by fusing the indexes derived from a keyword stream and a large vocabulary stream search. For example, systems and methods provide for two-stream searching of Spoken Web VoiceSites, wherein metadata is extracted from the VoiceSite and is used to determine a set of keywords for a high-precision search, while a traditional standard vocabulary set is used to perform a high-recall, low-precision search. The results of the keyword search and the standard vocabulary search are fused together to form a comprehensive, ranked list of results.
    Type: Application
    Filed: November 30, 2010
    Publication date: May 31, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Anupam Joshi, Sougata Mukherjea, Nitendra Rajput
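One plausible way to fuse the two result lists is reciprocal-rank fusion; RRF is our choice of fusion rule, since the abstract only states that the lists are fused into a ranked list:

```python
def fuse_results(keyword_hits, vocab_hits, k=60):
    """Reciprocal-rank fusion of a high-precision keyword result list
    and a broader large-vocabulary result list.  Each document scores
    1/(k + rank) in every list it appears in; documents found by both
    searches therefore rise to the top."""
    scores = {}
    for hits in (keyword_hits, vocab_hits):
        for rank, doc in enumerate(hits, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = fuse_results(["call_log.wav", "weather.wav"],
                     ["weather.wav", "news.wav"])
```

A document returned by both the keyword and the vocabulary search outranks documents found by only one of them, which matches the intent of combining a precise and a broad search.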
  • Publication number: 20120136659
    Abstract: Disclosed herein are an apparatus and method for preprocessing speech signals to perform speech recognition. The apparatus includes a voiced sound interval detection unit, a preprocessing method determination unit, and a clipping signal processing unit. The voiced sound interval detection unit detects a voiced sound interval including a voiced sound signal in a voice interval. The preprocessing method determination unit detects a clipping signal present in the voiced sound interval. The clipping signal processing unit extracts signal samples adjacent to the clipping signal, and performs interpolation on the clipping signal using the adjacent signal samples.
    Type: Application
    Filed: November 22, 2011
    Publication date: May 31, 2012
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Byung-Ok Kang, Hwa-Jeon Song, Ho-Young Jung, Sung-Joo Lee, Jeon-Gue Park, Yun-Keun Lee
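The clipping repair this abstract describes can be sketched as follows; linear interpolation between the nearest unclipped neighbours is an illustrative choice, since the abstract does not fix the interpolation method:

```python
def repair_clipping(signal, clip_level):
    """Detect samples at or beyond the clip level and replace each
    clipped run by linear interpolation between the nearest
    unclipped neighbouring samples."""
    out = list(signal)
    n = len(out)
    i = 0
    while i < n:
        if abs(out[i]) >= clip_level:
            start = i
            while i < n and abs(out[i]) >= clip_level:
                i += 1
            # Nearest unclipped neighbours (fall back at the edges).
            left = out[start - 1] if start > 0 else (out[i] if i < n else 0.0)
            right = out[i] if i < n else left
            run = i - start
            for k in range(run):
                t = (k + 1) / (run + 1)
                out[start + k] = left + t * (right - left)
        else:
            i += 1
    return out

clipped = [0.1, 0.5, 1.0, 1.0, 0.4]
repaired = repair_clipping(clipped, clip_level=1.0)
```

The flat-topped run is replaced by a smooth ramp between its neighbours, removing the harmonics that clipping introduces before the features reach the recognizer.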
  • Publication number: 20120130710
    Abstract: Noise and channel distortion parameters in the vectorized logarithmic or the cepstral domain for an utterance may be estimated, and subsequently the distorted speech parameters in the same domain may be updated using an unscented transformation framework during online automatic speech recognition. An utterance, including speech generated from a transmission source for delivery to a receiver, may be received by a computing device. The computing device may execute instructions for applying the unscented transformation framework to speech feature vectors, representative of the speech, in order to estimate, in a sequential or online manner, static noise and channel distortion parameters and dynamic noise distortion parameters in the unscented transformation framework. The static and dynamic parameters for the distorted speech in the utterance may then be updated from clean speech parameters and the noise and channel distortion parameters using non-linear mapping.
    Type: Application
    Filed: November 18, 2010
    Publication date: May 24, 2012
    Applicant: Microsoft Corporation
    Inventors: Deng Li, Jinyu Li, Dong Yu, Yifan Gong
  • Publication number: 20120130711
    Abstract: A signal portion is extracted per frame from an input signal, thus generating a per-frame signal. The per-frame signal in the time domain is converted into a per-frame signal in the frequency domain, thereby generating a spectral pattern of spectra. It is determined whether an energy ratio is higher than a threshold level, the energy ratio being a ratio of each spectral energy to the subband energy of the subband that contains the spectrum, the subband being one of a plurality of subbands into which a frequency band is divided with a specific bandwidth. Based on the result of this determination, it is determined whether the per-frame signal is a speech segment. Average energy is derived in the frequency direction for the spectra of the spectral pattern in each subband, and the subband energy is derived per subband by averaging the average energy in the time domain.
    Type: Application
    Filed: November 22, 2011
    Publication date: May 24, 2012
    Applicant: JVC KENWOOD Corporation, a corporation of Japan
    Inventor: Takaaki YAMABE
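A single-frame sketch of the energy-ratio test is below. The subband width, thresholds, and vote-counting decision rule are illustrative assumptions, and the filing's time-domain averaging of subband energy across frames is omitted:

```python
import math

def frame_spectrum(frame):
    """Magnitude-squared DFT of one frame (naive DFT for clarity)."""
    n = len(frame)
    spec = []
    for k in range(n // 2):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spec.append(re * re + im * im)
    return spec

def is_speech_frame(spec, subband_width=4, ratio_thresh=2.0, count_thresh=2):
    """Flag a frame as speech when enough spectral bins carry energy
    well above their subband's average energy (thresholds are
    illustrative)."""
    total = sum(spec)
    if total == 0.0:
        return False
    avg = total / len(spec)
    votes = 0
    for start in range(0, len(spec), subband_width):
        band = spec[start:start + subband_width]
        band_energy = sum(band) / len(band)
        if band_energy <= 0.01 * avg:
            continue  # ignore near-empty subbands
        votes += sum(1 for e in band if e / band_energy > ratio_thresh)
    return votes >= count_thresh

# A two-tone frame has sharp spectral peaks relative to its subbands,
# while a constant (DC-only) frame does not spread peaks across bands.
tone = [math.sin(2 * math.pi * 2 * t / 16) +
        math.sin(2 * math.pi * 6 * t / 16) for t in range(16)]
speech_like = is_speech_frame(frame_spectrum(tone))
flat = is_speech_frame(frame_spectrum([1.0] * 16))
```

Peaky spectra, characteristic of voiced speech, produce bins whose energy far exceeds their subband average, while broadband noise keeps the ratio near one.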
  • Publication number: 20120130713
    Abstract: Systems, methods, apparatus, and machine-readable media for voice activity detection in a single-channel or multichannel audio signal are disclosed.
    Type: Application
    Filed: October 24, 2011
    Publication date: May 24, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Jongwon Shin, Erik Visser, Ian Ernan Liu
  • Publication number: 20120130712
    Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
    Type: Application
    Filed: January 31, 2012
    Publication date: May 24, 2012
    Inventors: Jong-Ho SHIN, Jae-Do Kwak, Jong-Keun Youn
  • Publication number: 20120129576
    Abstract: A method for operating a mobile terminal, and which includes performing voice recognition on call content to produce recognized call content, converting the recognized call content into one or more units of character information, registering the one or more units of character information to one or more particular functions of the mobile terminal based on a type of the character information or a field of the character information, inputting a search parameter, searching one of a plurality of file types and identifying a file related to both the search parameter and the one or more registered units of character information, and displaying or automatically executing the identified file.
    Type: Application
    Filed: February 2, 2012
    Publication date: May 24, 2012
    Inventors: In-Jik LEE, Sun-Hwa CHA, Jae-Do KWAK
  • Publication number: 20120123783
    Abstract: Systems and associated methods for editing telecom web applications through a voice interface are described. The systems and methods provide for editing telecom web applications over a voice connection, accessed for example via a standard phone, using speech and/or DTMF inputs. The voice-based editing includes exposing an editing interface to a user for a telecom web application that is editable, dynamically generating a voice-based interface for a given user for accomplishing editing tasks, and modifying the telecom web application to reflect the editing commands entered by the user.
    Type: Application
    Filed: November 17, 2010
    Publication date: May 17, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sheetal K. Agarwal, Arun Kumar, Priyanka Manwani
  • Patent number: H2269
    Abstract: The present invention is an “Automated Speech Translation System using Human Brain Language Areas Comprehension Capabilities”. It discloses a method to address one of the most common barriers in the world: the communication gap between people of different ethnicities. Imagine a world where we can communicate in our natural language with everyone, without the need for human translators, interpreters, hand-held devices, or language translation books. To facilitate language translation, the present invention recognizes speech as voice pitches, collects the language comprehension information from each recipient's brain language areas within the audible range, and sends it to a “voice processing center” for analysis. It then translates the collected voice pitches of the speech into the natural language of the recipient(s) using a language dictionaries database.
    Type: Grant
    Filed: November 20, 2009
    Date of Patent: June 5, 2012
    Inventor: Johnson Manuel-Devadoss (Johnson Smith)