Speech Recognition (epo) Patents (Class 704/E15.001)

  • Publication number: 20120232893
    Abstract: A multi-layered speech recognition apparatus and method. The apparatus includes a client checking whether the client recognizes the speech using a characteristic of speech to be recognized and recognizing the speech or transmitting the characteristic of the speech according to a checked result; and first through N-th servers, wherein the first server checks whether the first server recognizes the speech using the characteristic of the speech transmitted from the client, and recognizes the speech or transmits the characteristic according to a checked result, and wherein an n-th (2≤n≤N) server checks whether the n-th server recognizes the speech using the characteristic of the speech transmitted from an (n−1)-th server, and recognizes the speech or transmits the characteristic according to a checked result.
    Type: Application
    Filed: May 23, 2012
    Publication date: September 13, 2012
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Jaewon Lee, Jeongmi Cho
  • Publication number: 20120232885
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy, wherein the visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.
    Type: Application
    Filed: March 8, 2011
    Publication date: September 13, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Luciano De Andrade BARBOSA, Srinivas BANGALORE
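    The perplexity-driven crawling policy in the abstract above can be illustrated with a minimal sketch: score each candidate document's perplexity under the current language model and prioritize the highest-scoring (most novel) documents. This is an illustrative unigram model with add-alpha smoothing, not the patented implementation; the function name and smoothing scheme are assumptions.

    ```python
    import math
    from collections import Counter

    def perplexity(tokens, lm_counts, vocab_size, alpha=1.0):
        """Per-token perplexity of a document under a unigram language
        model given as word counts, with add-alpha smoothing so that
        out-of-vocabulary words get a small floor probability."""
        total = sum(lm_counts.values())
        log_sum = 0.0
        for tok in tokens:
            p = (lm_counts.get(tok, 0) + alpha) / (total + alpha * vocab_size)
            log_sum += math.log(p)
        return math.exp(-log_sum / len(tokens))

    # A crawler would rank candidate documents by descending perplexity,
    # fetching the high-perplexity ("novelty region") documents first.
    ```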
  • Publication number: 20120226498
    Abstract: Motion-based voice activity detection may be provided. A data stream may be received and a determination may be made whether at least one non-audio element associated with the data stream indicates that the data stream comprises speech. In response to determining that the at least one non-audio element associated with the data stream indicates that the data stream comprises speech, a speech to text conversion may be performed on at least one audio element associated with the data stream.
    Type: Application
    Filed: March 2, 2011
    Publication date: September 6, 2012
    Applicant: MICROSOFT CORPORATION
    Inventor: Remi Ken-Sho Kwan
  • Publication number: 20120223899
    Abstract: A computerized information system useful for providing directions and other information to a user. In one embodiment, the apparatus comprises a processor and network interface and computer readable medium having at least one computer program disposed thereon, the at least one program being configured to receive inputs from the user regarding locations or entities, and provide directions and/or advertising related content. At least a portion of the information is obtained via the network interface from a remote server.
    Type: Application
    Filed: February 24, 2012
    Publication date: September 6, 2012
    Inventor: Robert F. Gazdzinski
  • Publication number: 20120226497
    Abstract: A method for generating an anti-model of a sound class is disclosed. A plurality of candidate sound data is provided for generating the anti-model. A plurality of similarity values between the plurality of candidate sound data and a reference sound model of a sound class is determined. An anti-model of the sound class is generated based on at least one candidate sound data having the similarity value within a similarity threshold range.
    Type: Application
    Filed: February 13, 2012
    Publication date: September 6, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Kisun You, Kyu Woong Hwang, Taesu Kim
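    The anti-model construction in the abstract above can be sketched as a similarity filter followed by pooling: keep candidate sounds whose similarity to the reference model falls inside a threshold band, then combine them. This sketch uses cosine similarity over plain feature vectors and an averaging step; the thresholds, representation, and function names are assumptions, not the patented method.

    ```python
    import math

    def cosine(a, b):
        """Cosine similarity between two feature vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    def build_anti_model(candidates, reference, lo=0.3, hi=0.8):
        """Keep candidates whose similarity to the reference falls inside
        the threshold band (similar enough to be confusable, dissimilar
        enough to be 'anti'), then average them into one anti-model."""
        kept = [c for c in candidates if lo <= cosine(c, reference) <= hi]
        if not kept:
            return None
        dim = len(reference)
        return [sum(c[i] for c in kept) / len(kept) for i in range(dim)]
    ```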
  • Publication number: 20120221412
    Abstract: A network apparatus useful for providing directions and other information to a user of a client device in wireless communication therewith. In one embodiment, the apparatus includes one or more wireless interfaces and a network interface for communication with a server. User speech inputs in the form of digitized representations are received by the apparatus and used by the server as the basis for retrieving information including graphical representations of location or entities that the user wishes to find.
    Type: Application
    Filed: March 1, 2012
    Publication date: August 30, 2012
    Inventor: Robert F. Gazdzinski
  • Publication number: 20120221334
    Abstract: A security system and method includes setting operation steps having a preset sequence, a trigger signal and a testing parameter for each of the operation steps, and a range for each of the testing parameters. The method further confirms a current operation step when a testing device starts a test process. If an output signal received from a sensing device is not identical to the trigger signal of the current operation step, a voice content file of the current operation step is sent to the voice output device. If a value of the output parameter read from the sensing device is not within the range of the testing parameter, a voice prompt file of the testing parameter is sent to the voice output device. After sending the voice content file or the voice prompt file, an abnormality processing command of the current operation step is sent to the testing device to stop the test process.
    Type: Application
    Filed: August 2, 2011
    Publication date: August 30, 2012
    Applicants: HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (ShenZhen) CO., LTD.
    Inventor: HONG-RU ZHU
  • Publication number: 20120221330
    Abstract: A voice activity detection (VAD) module analyzes a media file, such as an audio file or a video file, to determine whether one or more frames of the media file include speech. A speech recognizer generates feedback relating to an accuracy of the VAD determination. The VAD module leverages the feedback to improve subsequent VAD determinations. The VAD module also utilizes a look-ahead window associated with the media file to adjust estimated probabilities or VAD decisions for previously processed frames.
    Type: Application
    Filed: February 25, 2011
    Publication date: August 30, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Albert Joseph Kishan Thambiratnam, Weiwu Zhu, Frank Torsten Bernd Seide
  • Publication number: 20120215535
    Abstract: A method and apparatus for multi-channel categorization, comprising capturing a vocal interaction and a non-vocal interaction, using logging or capturing devices; retrieving a first word from the vocal interaction and a second word from the non-vocal interaction; assigning the vocal interaction into a first category using the first word; assigning the non-vocal interaction into a second category using the second word; and associating the first category and the second category into a multi-channel category, thus aggregating the vocal interaction and the non-vocal interaction.
    Type: Application
    Filed: February 23, 2011
    Publication date: August 23, 2012
    Applicant: Nice Systems Ltd.
    Inventors: Moshe WASSERBLAT, Omer Gazit
  • Publication number: 20120215539
    Abstract: A recipient computing device can receive a speech utterance to be processed by speech recognition and segment the speech utterance into two or more speech utterance segments, each of which can be assigned to one of a plurality of available speech recognizers. A first one of the plurality of available speech recognizers can be implemented on a separate computing device accessible via a data network. A first segment can be processed by the first recognizer and the results of the processing returned to the recipient computing device, and a second segment can be processed by a second recognizer implemented at the recipient computing device.
    Type: Application
    Filed: February 22, 2012
    Publication date: August 23, 2012
    Inventor: Ajay Juneja
  • Publication number: 20120215544
    Abstract: A computerized information apparatus useful for providing directions and other information to a user. In one embodiment, the apparatus comprises a processor and network interface and computer readable medium having at least one computer program disposed thereon, the at least one program being configured to receive a speech input from the user regarding an organization or entity, and provide a graphic or visual representation of the organization or entity to aid the user in finding the organization or entity. At least a portion of the information is obtained via the network interface from a remote server.
    Type: Application
    Filed: February 24, 2012
    Publication date: August 23, 2012
    Inventor: Robert F. Gazdzinski
  • Publication number: 20120215543
    Abstract: At design time of a graphical user interface (GUI), a software component (VUIcontroller) is added to the GUI. At run time of the GUI, the VUIcontroller analyzes the GUI from within a process that executes the GUI. From this analysis, the VUIcontroller automatically generates a voice command set, such as a speech-recognition grammar, that corresponds to controls of the GUI. The generated voice command set is made available to a speech recognition engine, thereby speech-enabling the GUI. Optionally, a GUI designer may add properties to ones of the GUI controls at GUI design time, without necessarily writing a voice command set. These properties, if specified, are then used at GUI run time to control or influence the analysis of the GUI and the automatic generation of the voice command set.
    Type: Application
    Filed: April 26, 2011
    Publication date: August 23, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Mert Öz, Herwig Häle, Attila Muszta, Andreas Neubacher, Peter Kozma
  • Publication number: 20120215531
    Abstract: A multi-modal user interface with increased responsiveness is described. A graphical user interface (GUI) supports multiple different user input modalities including low delay inputs which respond to user inputs without significant delay, and high latency inputs which have a significant response latency after receiving a user input before providing a corresponding completed response. The GUI accepts user inputs in a sequence of mixed input modalities independently of response latencies without waiting for responses to high latency inputs. The GUI also provides interim indication during response latencies of pending responses at a position in the GUI where the completed response will be presented.
    Type: Application
    Filed: February 18, 2011
    Publication date: August 23, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Gerhard Grobauer, Andreas Neubacher, Miklós Pápai
  • Publication number: 20120209606
    Abstract: Obtaining information from audio interactions associated with an organization. The information may comprise entities, relations or events. The method comprises: receiving a corpus comprising audio interactions; performing audio analysis on audio interactions of the corpus to obtain text documents; performing linguistic analysis of the text documents; matching the text documents with one or more rules to obtain one or more matches; and unifying or filtering the matches.
    Type: Application
    Filed: February 14, 2011
    Publication date: August 16, 2012
    Applicant: Nice Systems Ltd.
    Inventors: Maya Gorodetsky, Ezra Daya, Oren Pereg
  • Publication number: 20120209603
    Abstract: Techniques for acoustic voice activity detection (AVAD) are described, including detecting a signal associated with a subband from a microphone, performing an operation on data associated with the signal, the operation generating a value associated with the subband, and determining whether the value distinguishes the signal from noise by using the value to determine a signal-to-noise ratio and comparing the value to a threshold.
    Type: Application
    Filed: January 9, 2012
    Publication date: August 16, 2012
    Inventor: Zhinian Jing
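    The subband SNR check in the AVAD abstract above can be sketched as follows: compute per-subband energy for a frame, compare each subband against an estimated noise floor, and flag the frame as speech when any subband clears an SNR threshold. An illustrative sketch only; the subband count, threshold value, and function names are assumptions.

    ```python
    def subband_energy(frame, n_subbands):
        """Split a frame of spectral magnitudes into equal subbands and
        return the mean energy of each subband."""
        width = len(frame) // n_subbands
        return [sum(x * x for x in frame[i * width:(i + 1) * width]) / width
                for i in range(n_subbands)]

    def is_speech(frame, noise_floor, n_subbands=4, snr_threshold=2.0):
        """Flag the frame as speech if any subband's energy exceeds the
        corresponding noise-floor energy by the SNR threshold."""
        energies = subband_energy(frame, n_subbands)
        return any(e / max(nf, 1e-12) > snr_threshold
                   for e, nf in zip(energies, noise_floor))
    ```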
  • Publication number: 20120203558
    Abstract: A voice-operated control circuit and method for using the voice-operated control circuit in connection with a toy vehicle. The voice-operated control circuit contains an audio detector, such as a microphone, to detect audible sound signals, and an integrated circuit that determines the duration of the audible sound signals received by the audio detector. At a user-defined time and based on the audible sound signals received, the integrated circuit determines and controls the duration of operation of various components of the toy vehicle, such as a motor, lights and/or sounds.
    Type: Application
    Filed: February 4, 2011
    Publication date: August 9, 2012
    Inventors: Ryohei Tanaka, Christy Marie Torres
  • Publication number: 20120197643
    Abstract: A speech signal processing system and method which uses the following steps: (a) receiving an utterance from a user via a microphone that converts the utterance into a speech signal; and (b) pre-processing the speech signal using a processor. The pre-processing step includes extracting acoustic data from the received speech signal, determining from the acoustic data whether the utterance includes one or more obstruents, estimating speech energy from higher frequencies associated with the identified obstruents, and mapping the estimated speech energy to lower frequencies.
    Type: Application
    Filed: January 27, 2011
    Publication date: August 2, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
  • Publication number: 20120197644
    Abstract: An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
    Type: Application
    Filed: January 30, 2012
    Publication date: August 2, 2012
    Applicant: International Business Machines Corporation
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
  • Publication number: 20120197641
    Abstract: A signal portion is extracted from an input signal for each frame having a specific duration to generate a per-frame input signal. The per-frame input signal in a time domain is converted into a per-frame input signal in a frequency domain, thereby generating a spectral pattern. Subband average energy is derived in each of subbands adjacent one another in the spectral pattern. The subband average energy is compared in at least one subband pair of a first subband and a second subband that is a higher frequency band than the first subband, the first and second subbands being consecutive subbands in the spectral pattern. It is determined that the per-frame input signal includes a consonant segment if the subband average energy of the second subband is higher than the subband average energy of the first subband.
    Type: Application
    Filed: February 1, 2012
    Publication date: August 2, 2012
    Applicant: JVC KENWOOD Corporation
    Inventors: Akiko Akechi, Takaaki Yamabe
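    The consonant-segment test in the abstract above is concrete enough to sketch: compute average energy per subband of a frame's spectrum, then flag a consonant when a higher-frequency subband out-energizes the consecutive subband below it. An illustrative sketch under assumed parameters (subband count, naming), not the patented implementation.

    ```python
    def subband_avg_energy(spectrum, n_subbands):
        """Average energy in each of n_subbands equal slices of the
        frame's frequency-domain magnitudes."""
        width = len(spectrum) // n_subbands
        return [sum(abs(s) ** 2 for s in spectrum[i * width:(i + 1) * width]) / width
                for i in range(n_subbands)]

    def has_consonant(spectrum, n_subbands=8):
        """Consonants concentrate energy at higher frequencies, so flag
        the frame if any subband has higher average energy than the
        consecutive subband just below it."""
        e = subband_avg_energy(spectrum, n_subbands)
        return any(e[i + 1] > e[i] for i in range(n_subbands - 1))
    ```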
  • Publication number: 20120191453
    Abstract: A system and methods for matching at least one word of an utterance against a set of template hierarchies to select the best matching template or set of templates corresponding to the utterance. The system and methods determine at least one exact, inexact, and partial match between the at least one word of the utterance and at least one term within the template hierarchy to select and populate a template or set of templates corresponding to the utterance. The populated template or set of templates may then be used to generate a narrative template or a report template.
    Type: Application
    Filed: March 30, 2012
    Publication date: July 26, 2012
    Applicant: Cyberpulse L.L.C.
    Inventors: James ROBERGE, Jeffrey Soble
  • Publication number: 20120191455
    Abstract: According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.
    Type: Application
    Filed: April 3, 2012
    Publication date: July 26, 2012
    Applicant: WIAV SOLUTIONS LLC
    Inventors: Sahar E. Bou-Ghazale, Ayman O. Asadi, Khaled Assaleh
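    The endpointing decision in the abstract above combines two comparisons: the frame's energy against the background energy, and the frame's cepstral-feature distance against the background features. The sketch below illustrates that two-test structure with assumed thresholds and function names; it is not the patented endpointer.

    ```python
    import math

    def energy(frame):
        """Mean-square energy of a frame of samples."""
        return sum(x * x for x in frame) / len(frame)

    def cepstral_distance(c1, c2):
        """Euclidean distance between two cepstral feature vectors."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

    def classify(frame, feats, bg_energy, bg_feats, e_ratio=2.0, d_thresh=1.0):
        """Label a frame 'speech' only if its energy exceeds the
        background energy by a ratio AND its features have drifted far
        from the background features; otherwise 'non-speech'."""
        loud = energy(frame) > e_ratio * bg_energy
        far = cepstral_distance(feats, bg_feats) > d_thresh
        return "speech" if loud and far else "non-speech"
    ```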
  • Publication number: 20120191449
    Abstract: Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for performing speech recognition using dock context. In one aspect, a method includes accessing audio data that includes encoded speech. Information that indicates a docking context of a client device is accessed, the docking context being associated with the audio data. A plurality of language models is identified. At least one of the plurality of language models is selected based on the docking context. Speech recognition is performed on the audio data using the selected language model to identify a transcription for a portion of the audio data.
    Type: Application
    Filed: September 30, 2011
    Publication date: July 26, 2012
    Applicant: GOOGLE INC.
    Inventors: Matthew I. LLOYD, Pankaj RISBOOD
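    The dock-context selection in the abstract above reduces to choosing among several language models based on a context signal. A minimal sketch, with hypothetical context keys and model names (none of these identifiers appear in the patent):

    ```python
    # Hypothetical registry mapping a docking context to a language model
    # biased toward the vocabulary users tend to speak in that context.
    LANGUAGE_MODELS = {
        "car_dock": "navigation_lm",
        "desk_dock": "dictation_lm",
        "media_dock": "media_control_lm",
    }

    def select_language_model(dock_context, default="general_lm"):
        """Pick the language model for the given docking context,
        falling back to a general-purpose model when undocked or
        the context is unrecognized."""
        return LANGUAGE_MODELS.get(dock_context, default)
    ```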
  • Publication number: 20120191450
    Abstract: A system and method for processing a speech signal delivered in a noisy channel or with ambient noise that focuses on a subset of harmonics that are least corrupted by noise, that disregards the signal harmonics with low signal-to-noise ratio(s), and that disregards amplitude modulations inconsistent with speech.
    Type: Application
    Filed: July 27, 2010
    Publication date: July 26, 2012
    Inventor: Mark Pinson
  • Publication number: 20120188164
    Abstract: Presented is a method and system for processing a gesture performed by a user of a first input device. The method comprises detecting the gesture and detecting a user-provided parameter for disambiguating the gesture. A user command is then determined based on the detected gesture and the detected parameter.
    Type: Application
    Filed: October 16, 2009
    Publication date: July 26, 2012
    Inventors: Prasenjit Dey, Sriganesh Madhvanath, Ramadevi Vennelakanti, Rahul Ajmera
  • Publication number: 20120185251
    Abstract: A method and system for candidate matching, such as used in match-making services, assesses narrative responses to measure candidate qualities. A candidate database includes self-assessment data and narrative data. Narrative data concerning a defined topic is analyzed to determine candidate qualities separate from topical information. Candidate qualities thus determined are included in candidate profiles and used to identify desirable candidates.
    Type: Application
    Filed: March 26, 2012
    Publication date: July 19, 2012
    Applicant: HOSHIKO LLC
    Inventor: Gary Stephen Shuster
  • Patent number: 8224395
    Abstract: A method may include connecting to another user device, identifying a geographic location of the other user device, identifying a geographic location of the user device, mapping a sound source associated with the other user device, based on the geographic location of the other user device with respect to the geographic location of the user device, to a location of an auditory space associated with a user of the user device, placing the sound source in the location of the auditory space, and emitting, based on the placing, the sound source so that the sound source is capable of being perceived by the user in the location of the auditory space.
    Type: Grant
    Filed: April 24, 2009
    Date of Patent: July 17, 2012
    Assignee: Sony Mobile Communications AB
    Inventors: Ted Moller, Ian Rattigan
  • Publication number: 20120179467
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention.
    Type: Application
    Filed: March 20, 2012
    Publication date: July 12, 2012
    Applicant: AT&T Intellectual Property I, L. P.
    Inventor: Jason Williams
  • Publication number: 20120179470
    Abstract: Systems and methods for providing a simultaneous voice and data user interface for secure catalog orders and in particular for providing a system and method for providing a distributed voice user interface for a remote device having a limited visual user interface simultaneously with a data stream for facilitating secure automated catalog orders for simultaneous electronic fulfillment applied to that device are described.
    Type: Application
    Filed: March 19, 2012
    Publication date: July 12, 2012
    Applicant: Pitney Bowes Inc.
    Inventors: Jeffrey D. Pierce, G. Jonathan Wolfman, Luu T. Pham, Thomas J. Foth, George M. Macdonald
  • Publication number: 20120179463
    Abstract: Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
    Type: Application
    Filed: January 6, 2012
    Publication date: July 12, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Michael Newman, Anthony Gillet, David Mark Krowitz, Michael D. Edgington
  • Publication number: 20120179464
    Abstract: Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
    Type: Application
    Filed: January 6, 2012
    Publication date: July 12, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Michael Newman, Anthony Gillet, David Mark Krowitz, Michael D. Edgington
  • Publication number: 20120173244
    Abstract: Provided are a voice command recognition apparatus and method capable of figuring out the intention of a voice command input through a voice dialog interface, by combining a rule-based dialog model and a statistical dialog model. The voice command recognition apparatus includes a command intention determining unit configured to correct an error in recognizing a voice command of a user, and an application processing unit configured to check whether the final command intention determined in the command intention determining unit comprises the input factors for execution of an application.
    Type: Application
    Filed: September 26, 2011
    Publication date: July 5, 2012
    Inventors: Byung-Kwan Kwak, Chi-Youn Park, Jeong-Su Kim, Jeong-Mi Cho
  • Publication number: 20120173233
    Abstract: A method and apparatus for communicating through a phone having a voice recognition function are provided. The method of performing communication using a phone having a voice recognition function includes converting to an incoming call notification and voice recognition mode when a phone call is received; converting to a communication connection and speakerphone mode when voice information related to a communication connection instruction is recognized; performing communication using a speakerphone; and ending communication when voice information related to a communication end instruction is recognized during communication using the speakerphone. Therefore, when a phone call is received, a mode of a phone is converted to a speakerphone mode with a voice instruction using a voice recognition function, and thus communication can be performed without using a hand.
    Type: Application
    Filed: March 2, 2012
    Publication date: July 5, 2012
    Inventor: Joung-Min Seo
  • Publication number: 20120173238
    Abstract: One embodiment may take the form of a voice control system. The system may include a first apparatus with a processing unit configured to execute a voice recognition module and one or more executable commands, and a receiver coupled to the processing unit and configured to receive a first audio file from a remote control device. The first audio file may include at least one voice command. The first apparatus may further include a communication component coupled to the processing unit and configured to receive programming content, and one or more storage media storing the voice recognition module. The voice recognition module may be configured to convert voice commands into text.
    Type: Application
    Filed: January 28, 2011
    Publication date: July 5, 2012
    Applicant: EchoStar Technologies L.L.C.
    Inventors: Jeremy Mickelsen, Nathan A. Hale, Benjamin Mauser, David A. Innes, Brad Bylund
  • Publication number: 20120166176
    Abstract: A conventional speech recognition dictionary, translation dictionary and speech synthesis dictionary used in speech translation have inconsistencies.
    Type: Application
    Filed: March 3, 2010
    Publication date: June 28, 2012
    Inventors: Satoshi Nakamura, Eiichiro Sumita, Yutaka Ashikari, Noriyuki Kimura, Chiori Hori
  • Publication number: 20120162540
    Abstract: An embodiment of an apparatus for speech recognition includes a speech input unit configured to acquire an acoustic signal, a first recognition unit configured to recognize the acoustic signal, a communication unit configured to communicate with an external server, a second recognition unit configured to recognize the acoustic signal by utilizing the external server via the communication unit, a remote signal input unit configured to acquire a control signal from a remote controller, and a switching unit. The switching unit is configured to switch between the first recognition unit and the second recognition unit for recognizing the acoustic signal in response to a start trigger. The switching unit selects the second recognition unit when the start trigger is detected from the control signal, and the switching unit selects the first recognition unit when the start trigger is detected from the acoustic signal.
    Type: Application
    Filed: December 21, 2011
    Publication date: June 28, 2012
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Kazushige Ouchi, Miwako Doi
  • Publication number: 20120166184
    Abstract: Systems and methods that provide for voice command devices that receive sound but do not transfer the voice data beyond the system unless certain voice-filtering criteria have been met are described herein. In addition, embodiments provide devices that support voice command operation while external voice data transmission is in mute operation mode. As such, devices according to embodiments may process voice data locally responsive to the voice data matching voice-filtering criteria. Furthermore, systems and methods are described herein involving voice command devices that capture sound and analyze it in real-time on a word-by-word basis and decide whether to handle the voice data locally, transmit it externally, or both.
    Type: Application
    Filed: December 23, 2010
    Publication date: June 28, 2012
    Applicant: Lenovo (Singapore) Pte. Ltd.
    Inventors: Howard Locker, Daryl Cromer, Scott Edwards Kelso, Aaron Michael Stewart
  • Publication number: 20120158399
    Abstract: Techniques for grouping a plurality of samples automatically transcribed from a plurality of utterances. The method comprises forming clusters from the plurality of samples, wherein the clusters include two or more of the plurality of samples. One or more samples are selected from a cluster and manually-processed data samples for the one or more samples are obtained. A weighting factor may be assigned to the data samples based, at least in part, on the number of samples in the cluster associated with the selected data sample.
    Type: Application
    Filed: December 21, 2010
    Publication date: June 21, 2012
    Inventors: Real Tremblay, Jerome Tremblay, Alina Andreevskaia
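    The grouping-and-weighting scheme in the abstract above can be sketched in two steps: cluster the automatically transcribed samples, then pick one representative per cluster for manual processing and weight it by the cluster's size. This sketch clusters by exact transcript text as a stand-in for whatever similarity measure the patent contemplates; function names are assumptions.

    ```python
    from collections import defaultdict

    def cluster_by_transcript(samples):
        """Group (sample_id, transcript) pairs by transcript text
        (a simple stand-in for similarity-based clustering)."""
        clusters = defaultdict(list)
        for sample_id, transcript in samples:
            clusters[transcript].append(sample_id)
        return clusters

    def weighted_representatives(clusters):
        """Select one sample per cluster for manual review, weighted by
        the number of samples it stands in for."""
        return {members[0]: len(members) for members in clusters.values()}
    ```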
  • Publication number: 20120150539
    Abstract: The method of the present invention may include receiving a speech feature vector converted from a speech signal; performing a first search by applying a first language model to the received speech feature vector and outputting a word lattice and a first acoustic score of the word lattice as a continuous speech recognition result; outputting a second acoustic score as a phoneme recognition result by applying an acoustic model to the speech feature vector; comparing the first acoustic score of the continuous speech recognition result with the second acoustic score of the phoneme recognition result; outputting a first language model weight when the first acoustic score of the continuous speech recognition result is better than the second acoustic score of the phoneme recognition result; and performing a second search by applying a second language model weight, which is the same as the output first language model weight, to the word lattice.
    Type: Application
    Filed: December 13, 2011
    Publication date: June 14, 2012
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Hyung Bae Jeon, Yun Keun Lee, Eui Sok Chung, Jong Jin Kim, Hoon Chung, Jeon Gue Park, Ho Young Jung, Byung Ok Kang, Ki Young Park, Sung Joo Lee, Jeom Ja Kang, Hwa Jeon Song
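The score comparison and weight selection described in this abstract can be sketched as follows. The score conventions, weight values, and lattice representation are illustrative assumptions, not taken from the filing:

```python
def choose_lm_weight(word_acoustic_score, phoneme_acoustic_score,
                     strong_weight=1.0, weak_weight=0.5):
    """Pick the language-model weight for the second search pass by
    comparing the first-pass word-lattice acoustic score with the
    free-phoneme recognition score.  Scores are assumed to be
    log-likelihoods (higher is better); the weight values are
    illustrative."""
    if word_acoustic_score >= phoneme_acoustic_score:
        # The word-level hypothesis explains the audio well,
        # so trust the language model during rescoring.
        return strong_weight
    # Otherwise the utterance may contain out-of-vocabulary speech,
    # so rely less on the language model.
    return weak_weight

def rescore_lattice(lattice, lm_scores, lm_weight):
    """Second pass: rescore each lattice path by combining its
    acoustic score with its LM score scaled by the chosen weight."""
    return max(lattice,
               key=lambda p: p["acoustic"] + lm_weight * lm_scores[p["words"]])

lattice = [{"words": "recognize speech", "acoustic": -10.0},
           {"words": "wreck a nice beach", "acoustic": -9.5}]
lm_scores = {"recognize speech": -1.0, "wreck a nice beach": -6.0}
weight = choose_lm_weight(-9.5, -14.0)
best = rescore_lattice(lattice, lm_scores, weight)
```

With the stronger weight selected, the language model overrules the slightly better acoustic score of the implausible path.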
  • Publication number: 20120150540
    Abstract: A service for searching for unsolicited communications is provided. For example, the service may inspect e-mail messages, instant messaging messages, facsimile transmissions, voice communications, and video telephony, and analyze these communications to determine whether an intended communication is unsolicited. In connection with voice and video telephony, a voice sample may be obtained from the caller and voice recognition may be performed on the sample to determine an identity of the person or the voice. The voice sample may also be used to determine the type of voice—i.e., if the voice is live, machine generated, or prerecorded. Where the call is a video telephony call, image recognition may be used to inspect an image of the person. The information obtained from voice recognition, voice type recognition, and image recognition may be used to detect whether the message is from a known source of unsolicited communications.
    Type: Application
    Filed: February 20, 2012
    Publication date: June 14, 2012
    Applicant: ROCKSTAR BIDCO, LP
    Inventors: Samir Srivastava, Francois Audet, Vibhu Vivek
  • Publication number: 20120150536
    Abstract: Access is obtained to a large reference acoustic model for automatic speech recognition. The large reference acoustic model has L states modeled by L mixture models, and the large reference acoustic model has N components. A desired number of components Nc, less than N, to be used in a restructured acoustic model derived from the reference acoustic model, is identified. The desired number of components Nc is selected based on a computing environment in which the restructured acoustic model is to be deployed. The restructured acoustic model also has L states. For each given one of the L mixture models in the reference acoustic model, a merge sequence is built which records, for a given cost function, sequential mergers of pairs of the components associated with the given one of the mixture models. A portion of the Nc components is assigned to each of the L states in the restructured acoustic model.
    Type: Application
    Filed: December 9, 2010
    Publication date: June 14, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pierre Dognin, Vaibhava Goel, John R. Hershey, Peder A. Olsen
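The merge-sequence construction can be sketched for 1-D Gaussian mixture components as follows. The moment-matched merge is standard; the cost function here is an illustrative stand-in, since the abstract does not specify one:

```python
import math

def merge_pair(g1, g2):
    """Moment-matched merge of two (weight, mean, variance) 1-D
    Gaussian components into a single component."""
    w = g1[0] + g2[0]
    mean = (g1[0] * g1[1] + g2[0] * g2[1]) / w
    var = (g1[0] * (g1[2] + (g1[1] - mean) ** 2)
           + g2[0] * (g2[2] + (g2[1] - mean) ** 2)) / w
    return (w, mean, var)

def merge_cost(g1, g2):
    """Illustrative cost: weighted log-variance increase of the merged
    component over its parents (a stand-in for the filing's cost
    function, which the abstract does not specify)."""
    m = merge_pair(g1, g2)
    return (m[0] * math.log(m[2])
            - g1[0] * math.log(g1[2]) - g2[0] * math.log(g2[2]))

def build_merge_sequence(components, target):
    """Greedily record the cheapest pairwise merges until only
    `target` components remain, returning both the restructured
    mixture and the recorded merge sequence."""
    comps = list(components)
    sequence = []
    while len(comps) > target:
        i, j = min(((a, b) for a in range(len(comps))
                    for b in range(a + 1, len(comps))),
                   key=lambda ab: merge_cost(comps[ab[0]], comps[ab[1]]))
        sequence.append((comps[i], comps[j]))
        merged = merge_pair(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps, sequence

restructured, merges = build_merge_sequence(
    [(0.5, 0.0, 1.0), (0.3, 0.1, 1.0), (0.2, 5.0, 1.0)], target=2)
```

The two nearly identical components are merged first, while the distant one survives, which is the behavior a sensible cost function should produce.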
  • Publication number: 20120140979
    Abstract: To improve recognition accuracy even when a plurality of words including a matched portion are selected as candidates for recognition, a word recognition apparatus according to the present invention includes input means for inputting a word image representing a plurality of characters; word candidate selection means for recognizing the word image input by the input means and selecting a first word candidate and a second word candidate based on a plurality of words registered in a word dictionary; and verification means for comparing the first word candidate and the second word candidate character by character and verifying a likelihood of the first word candidate based on an evaluation value obtained, when the word image is recognized, for characters determined to be unmatched.
    Type: Application
    Filed: June 14, 2010
    Publication date: June 7, 2012
    Applicant: NEC Corporation
    Inventor: Daisuke Nishiwaki
  • Publication number: 20120136870
    Abstract: Systems and methods provide for indexing audio content by fusing the indexes derived from a keyword stream and a large vocabulary stream search. For example, systems and methods provide for two-stream searching of Spoken Web VoiceSites, wherein metadata is extracted from the VoiceSite and is used to determine a set of keywords for a high-precision search, while a traditional standard vocabulary set is used to perform a high-recall, low-precision search. The results of the keyword search and the standard vocabulary search are fused together to form a comprehensive, ranked list of results.
    Type: Application
    Filed: November 30, 2010
    Publication date: May 31, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Anupam Joshi, Sougata Mukherjea, Nitendra Rajput
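One plausible way to fuse the two result lists is reciprocal-rank fusion; RRF is our choice of fusion rule, since the abstract only states that the lists are fused into a ranked list:

```python
def fuse_results(keyword_hits, vocab_hits, k=60):
    """Reciprocal-rank fusion of a high-precision keyword result list
    and a broader large-vocabulary result list.  Each document scores
    1/(k + rank) in every list it appears in; documents found by both
    searches therefore rise to the top."""
    scores = {}
    for hits in (keyword_hits, vocab_hits):
        for rank, doc in enumerate(hits, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = fuse_results(["call_log.wav", "weather.wav"],
                     ["weather.wav", "news.wav"])
```

A document returned by both the keyword and the vocabulary search outranks documents found by only one of them, which matches the intent of combining a precise and a broad search.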
  • Publication number: 20120136659
    Abstract: Disclosed herein are an apparatus and method for preprocessing speech signals to perform speech recognition. The apparatus includes a voiced sound interval detection unit, a preprocessing method determination unit, and a clipping signal processing unit. The voiced sound interval detection unit detects a voiced sound interval including a voiced sound signal in a voice interval. The preprocessing method determination unit detects a clipping signal present in the voiced sound interval. The clipping signal processing unit extracts signal samples adjacent to the clipping signal, and performs interpolation on the clipping signal using the adjacent signal samples.
    Type: Application
    Filed: November 22, 2011
    Publication date: May 31, 2012
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Byung-Ok Kang, Hwa-Jeon Song, Ho-Young Jung, Sung-Joo Lee, Jeon-Gue Park, Yun-Keun Lee
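The clipping repair this abstract describes can be sketched as follows; linear interpolation between the nearest unclipped neighbours is an illustrative choice, since the abstract does not fix the interpolation method:

```python
def repair_clipping(signal, clip_level):
    """Detect samples at or beyond the clip level and replace each
    clipped run by linear interpolation between the nearest
    unclipped neighbouring samples."""
    out = list(signal)
    n = len(out)
    i = 0
    while i < n:
        if abs(out[i]) >= clip_level:
            start = i
            while i < n and abs(out[i]) >= clip_level:
                i += 1
            # Nearest unclipped neighbours (fall back at the edges).
            left = out[start - 1] if start > 0 else (out[i] if i < n else 0.0)
            right = out[i] if i < n else left
            run = i - start
            for k in range(run):
                t = (k + 1) / (run + 1)
                out[start + k] = left + t * (right - left)
        else:
            i += 1
    return out

clipped = [0.1, 0.5, 1.0, 1.0, 0.4]
repaired = repair_clipping(clipped, clip_level=1.0)
```

The flat-topped run is replaced by a smooth ramp between its neighbours, removing the harmonics that clipping introduces before the features reach the recognizer.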
  • Publication number: 20120130710
    Abstract: Noise and channel distortion parameters in the vectorized logarithmic or the cepstral domain for an utterance may be estimated, and subsequently the distorted speech parameters in the same domain may be updated using an unscented transformation framework during online automatic speech recognition. An utterance, including speech generated from a transmission source for delivery to a receiver, may be received by a computing device. The computing device may execute instructions for applying the unscented transformation framework to speech feature vectors, representative of the speech, in order to estimate, in a sequential or online manner, static noise and channel distortion parameters and dynamic noise distortion parameters in the unscented transformation framework. The static and dynamic parameters for the distorted speech in the utterance may then be updated from clean speech parameters and the noise and channel distortion parameters using non-linear mapping.
    Type: Application
    Filed: November 18, 2010
    Publication date: May 24, 2012
    Applicant: Microsoft Corporation
    Inventors: Deng Li, Jinyu Li, Dong Yu, Yifan Gong
  • Publication number: 20120130711
    Abstract: A signal portion is extracted per frame from an input signal, thus generating a per-frame signal. The per-frame signal in the time domain is converted into a per-frame signal in the frequency domain, thereby generating a spectral pattern of spectra. It is determined whether an energy ratio is higher than a threshold level, the energy ratio being a ratio of each spectral energy to the subband energy of the subband that contains the spectrum, the subband being one of a plurality of subbands into which a frequency band is divided with a specific bandwidth. Based on the result of this determination, it is determined whether the per-frame signal is a speech segment. Average energy is derived in the frequency direction for the spectra of the spectral pattern in each subband, and the subband energy is derived per subband by averaging the average energy in the time domain.
    Type: Application
    Filed: November 22, 2011
    Publication date: May 24, 2012
    Applicant: JVC KENWOOD Corporation, a corporation of Japan
    Inventor: Takaaki YAMABE
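A single-frame sketch of the energy-ratio test is below. The subband width, thresholds, and vote-counting decision rule are illustrative assumptions, and the filing's time-domain averaging of subband energy across frames is omitted:

```python
import math

def frame_spectrum(frame):
    """Magnitude-squared DFT of one frame (naive DFT for clarity)."""
    n = len(frame)
    spec = []
    for k in range(n // 2):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spec.append(re * re + im * im)
    return spec

def is_speech_frame(spec, subband_width=4, ratio_thresh=2.0, count_thresh=2):
    """Flag a frame as speech when enough spectral bins carry energy
    well above their subband's average energy (thresholds are
    illustrative)."""
    total = sum(spec)
    if total == 0.0:
        return False
    avg = total / len(spec)
    votes = 0
    for start in range(0, len(spec), subband_width):
        band = spec[start:start + subband_width]
        band_energy = sum(band) / len(band)
        if band_energy <= 0.01 * avg:
            continue  # ignore near-empty subbands
        votes += sum(1 for e in band if e / band_energy > ratio_thresh)
    return votes >= count_thresh

# A two-tone frame has sharp spectral peaks relative to its subbands,
# while a constant (DC-only) frame does not spread peaks across bands.
tone = [math.sin(2 * math.pi * 2 * t / 16) +
        math.sin(2 * math.pi * 6 * t / 16) for t in range(16)]
speech_like = is_speech_frame(frame_spectrum(tone))
flat = is_speech_frame(frame_spectrum([1.0] * 16))
```

Peaky spectra, characteristic of voiced speech, produce bins whose energy far exceeds their subband average, while broadband noise keeps the ratio near one.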
  • Publication number: 20120130713
    Abstract: Systems, methods, apparatus, and machine-readable media for voice activity detection in a single-channel or multichannel audio signal are disclosed.
    Type: Application
    Filed: October 24, 2011
    Publication date: May 24, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Jongwon Shin, Erik Visser, Ian Ernan Liu
  • Publication number: 20120130712
    Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
    Type: Application
    Filed: January 31, 2012
    Publication date: May 24, 2012
    Inventors: Jong-Ho SHIN, Jae-Do Kwak, Jong-Keun Youn
  • Publication number: 20120129576
    Abstract: A method for operating a mobile terminal, and which includes performing voice recognition on call content to produce recognized call content, converting the recognized call content into one or more units of character information, registering the one or more units of character information to one or more particular functions of the mobile terminal based on a type of the character information or a field of the character information, inputting a search parameter, searching one of a plurality of file types and identifying a file related to both the search parameter and the one or more registered units of character information, and displaying or automatically executing the identified file.
    Type: Application
    Filed: February 2, 2012
    Publication date: May 24, 2012
    Inventors: In-Jik LEE, Sun-Hwa CHA, Jae-Do KWAK
  • Publication number: 20120123783
    Abstract: Systems and associated methods for editing telecom web applications through a voice interface are described. The systems and methods provide for editing telecom web applications over a voice connection, accessed for example via a standard phone, using speech and/or DTMF inputs. The voice-based editing includes exposing an editing interface to a user for a telecom web application that is editable, dynamically generating a voice-based interface for a given user for accomplishing editing tasks, and modifying the telecom web application to reflect the editing commands entered by the user.
    Type: Application
    Filed: November 17, 2010
    Publication date: May 17, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sheetal K. Agarwal, Arun Kumar, Priyanka Manwani
  • Patent number: H2269
    Abstract: The present invention is an “Automated Speech Translation System using Human Brain Language Areas Comprehension Capabilities”. It discloses a method to address one of the most common barriers in the world: the communication gap between people of different ethnicities. Imagine a world where we can communicate in our natural language with everyone, without the need for human translators, interpreters, hand-held devices, or language translation books. To facilitate language translation, the present invention recognizes speech as voice pitches, collects the language comprehension information from each recipient's brain language areas within the audible range, and sends it to a “voice processing center” for analysis. It then translates the collected voice pitches of the speech into the natural language of the recipient(s) using a language dictionaries database.
    Type: Grant
    Filed: November 20, 2009
    Date of Patent: June 5, 2012
    Inventor: Johnson Manuel-Devadoss (Johnson Smith)