Segmentation Or Word Limit Detection (epo) Patents (Class 704/E15.005)
  • Publication number: 20120116768
    Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more of the other modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
    Type: Application
    Filed: November 8, 2011
    Publication date: May 10, 2012
    Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
    Inventors: Srinivas Bangalore, Michael J. Johnston
  • Publication number: 20120101823
    Abstract: Embodiments of a dialog system that utilizes contextual information to perform recognition of proper names are described. Unlike present name recognition methods for large name lists, which generally focus strictly on the static aspect of the names, embodiments of the present system take into account the temporal, recency, and context effects when names are used, and formulate new questions to further constrain the search space or grammar for recognition of past and current utterances.
    Type: Application
    Filed: December 28, 2011
    Publication date: April 26, 2012
    Applicant: Robert Bosch GmbH
    Inventors: Fuliang Weng, Zhongnan Shen, Zhe Feng
  • Publication number: 20120095765
    Abstract: A method for alleviating ambiguity issues of new user-defined speech commands. An original command for a user-defined speech command can be received. It can then be determined if the original command is likely to be confused with a set of existing speech commands. When confusion is unlikely, the original command can be automatically stored. When confusion is likely, a substitute command that is unlikely to be confused with existing commands can be automatically determined. The substitute can be presented as an alternative to the original command and can be selectively stored as the user-defined speech command.
    Type: Application
    Filed: December 22, 2011
    Publication date: April 19, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: William K. Bodin, James R. Lewis, Leslie R. Wilson
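    A minimal sketch of the ambiguity check described in the abstract above. The confusability metric, threshold, command set, and function names are all assumptions; the abstract specifies none of them, and plain string similarity (difflib) stands in for an acoustic or phonetic comparison:
    ```python
    # Hypothetical sketch for 20120095765: accept a new user-defined command
    # only if it is unlikely to be confused with the existing command set.
    from difflib import SequenceMatcher

    existing_commands = {"call home", "play music", "read mail"}
    CONFUSION_THRESHOLD = 0.8  # assumed value; the abstract gives no number

    def is_confusable(candidate, existing):
        """True if the candidate is too similar to any stored command."""
        return any(SequenceMatcher(None, candidate, cmd).ratio() >= CONFUSION_THRESHOLD
                   for cmd in existing)

    def register_command(original, substitutes):
        """Store the original command, or fall back to the first safe substitute."""
        if not is_confusable(original, existing_commands):
            existing_commands.add(original)
            return original
        for alt in substitutes:
            if not is_confusable(alt, existing_commands):
                # The abstract presents the substitute to the user for approval;
                # it is auto-accepted here for brevity.
                existing_commands.add(alt)
                return alt
        raise ValueError("no unambiguous command available")

    print(register_command("play musics", ["start playback"]))  # -> start playback
    ```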
  • Publication number: 20120089396
    Abstract: A system that incorporates teachings of the present disclosure may include, for example, an interface for receiving an utterance of speech and converting the utterance into a speech signal, such as a digital representation including a waveform and/or spectrum; and a processor for dividing the speech signal into segments and detecting emotional information from the speech. The system identifies the emotion or emotions by comparing the speech segments to a baseline, using the suprasegmental (i.e., paralinguistic) information in speech, wherein the baseline is determined from acoustic characteristics of a plurality of emotion categories. Other embodiments are disclosed.
    Type: Application
    Filed: June 16, 2010
    Publication date: April 12, 2012
    Applicant: University of Florida Research Foundation, Inc.
    Inventors: Sona Patel, Rahul Shrivastav
  • Publication number: 20120078634
    Abstract: A voice dialogue system executing an operation by a voice dialogue with a user includes a history storage unit storing an operation name of the operation executed by the voice dialogue system and an operation history corresponding to a number of execution times of the executed operation; a voice storage unit storing voice data corresponding to the operation name; a detection unit detecting a voice skip signal indicating skipping a user's voice input; an acquisition unit acquiring the operation name of the operation having a high priority based on the number of execution times from said history storage unit, when said detection unit detects the voice skip signal; and a generation unit reading the voice data corresponding to the acquired operation name from said voice storage unit, and generating a voice signal corresponding to the read voice data.
    Type: Application
    Filed: March 15, 2011
    Publication date: March 29, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Masahide Ariu
  • Publication number: 20120078631
    Abstract: Target word recognition includes: obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set; performing segmentation of the characteristic computation data to generate a plurality of text segments; combining the plurality of text segments to form a text data combination set; determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations; determining a plurality of designated characteristic values for the plurality of text data combinations; based at least in part on the plurality of designated characteristic values and according to at least a criterion, recognizing among the plurality of text data combinations target words whose characteristic values fulfill the criterion.
    Type: Application
    Filed: September 22, 2011
    Publication date: March 29, 2012
    Applicant: ALIBABA GROUP HOLDING LIMITED
    Inventors: Haibo Sun, Yang Yang, Yining Chen
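    A rough sketch of the pipeline in the abstract above, assuming whitespace tokenization (real segmentation, e.g. of Chinese text, is far more involved) and raw frequency as the designated characteristic value; all names and thresholds are illustrative:
    ```python
    # Hypothetical sketch for 20120078631: segment the characteristic
    # computation data, combine adjacent segments, intersect the combinations
    # with the candidate word set, and keep combinations meeting a criterion.
    from collections import Counter

    def combinations_of_segments(tokens, max_n=3):
        """Combine adjacent text segments into candidate word combinations."""
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                yield " ".join(tokens[i:i + n])

    def find_target_words(candidate_set, corpus_lines, min_count=2):
        counts = Counter()
        for line in corpus_lines:
            counts.update(combinations_of_segments(line.split()))
        intersection = candidate_set & set(counts)  # candidates ∩ combinations
        return {w for w in intersection if counts[w] >= min_count}

    corpus = ["cheap red shoes", "red shoes on sale", "cheap red shoes today"]
    print(find_target_words({"red shoes", "blue hat"}, corpus))  # {'red shoes'}
    ```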
  • Publication number: 20120078629
    Abstract: According to one embodiment, a meeting support apparatus includes a storage unit, a determination unit, a generation unit. The storage unit is configured to store storage information for each of words, the storage information indicating a word of the words, pronunciation information on the word, and pronunciation recognition frequency. The determination unit is configured to generate emphasis determination information including an emphasis level that represents whether a first word should be highlighted and represents a degree of highlighting determined in accordance with a pronunciation recognition frequency of a second word when the first word is highlighted, based on whether the storage information includes second set corresponding to first set and based on the pronunciation recognition frequency of the second word when the second set is included. The generation unit is configured to generate an emphasis character string based on the emphasis determination information when the first word is highlighted.
    Type: Application
    Filed: March 25, 2011
    Publication date: March 29, 2012
    Inventors: Tomoo Ikeda, Nobuhiro Shimogori, Kouji Ueno
  • Publication number: 20120072219
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information.
    Type: Application
    Filed: September 22, 2010
    Publication date: March 22, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Michael JOHNSTON, Srinivas Bangalore, Junlan Feng, Taniya Mishra
  • Publication number: 20120072221
    Abstract: A distributed voice user interface system includes a local device which receives speech input issued from a user. Such speech input may specify a command or a request by the user. The local device performs preliminary processing of the speech input and determines whether it is able to respond to the command or request by itself. If not, the local device initiates communication with a remote system for further processing of the speech input.
    Type: Application
    Filed: November 23, 2011
    Publication date: March 22, 2012
    Applicant: Ben Franklin Patent Holding, LLC
    Inventors: George M. WHITE, James J. Buteau, Glen E. Shires, Kevin J. Surace, Steven Markman
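    The local/remote split lends itself to a short sketch; the command set and function names below are assumptions, with a stub standing in for the network call to the remote system:
    ```python
    # Hypothetical sketch for 20120072221: the local device performs
    # preliminary processing and defers anything it cannot handle itself.
    LOCAL_COMMANDS = {"volume up", "volume down", "stop"}

    def handle_locally(utterance):
        """Preliminary processing: only a small fixed command set is understood."""
        text = utterance.strip().lower()
        return f"local: executed '{text}'" if text in LOCAL_COMMANDS else None

    def send_to_remote(utterance):
        """Stand-in for initiating communication with the remote system."""
        return f"remote: full recognition of '{utterance}'"

    def handle(utterance):
        return handle_locally(utterance) or send_to_remote(utterance)

    print(handle("Volume up"))                    # handled on the device
    print(handle("what is the weather in Oslo"))  # escalated to the remote system
    ```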
  • Publication number: 20120059658
    Abstract: Embodiments of the present invention relate to searching for content on the Internet. A user may supply a search query to a device, and the device may issue the search query to a plurality of search engines, including at least one general purpose search engine and at least one site-specific search engine. In this way, the user need not separately issue search queries to each of the plurality of search engines.
    Type: Application
    Filed: September 8, 2010
    Publication date: March 8, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Vladimir Sejnoha, Gary B. Clayton, Victor S. Chen, Steven Hatch, William F. Ganong, III, Gunnar Evermann, Marc W. Regan, Stephen W. Laverty, Paul J. Vozila, Nathan M. Bodenstab, Yik-Cheung Tam
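    The single-query fan-out can be sketched as concurrent dispatch to several engines; every engine here is a stub and all names are assumptions:
    ```python
    # Hypothetical sketch for 20120059658: one query goes to a general-purpose
    # engine and several site-specific engines, so the user issues it only once.
    from concurrent.futures import ThreadPoolExecutor

    def general_engine(q):   return [f"web result for '{q}'"]
    def movies_engine(q):    return [f"movie listing for '{q}'"]
    def shopping_engine(q):  return [f"product match for '{q}'"]

    ENGINES = [general_engine, movies_engine, shopping_engine]

    def search_all(query):
        """Issue one query to every engine concurrently and collect results."""
        with ThreadPoolExecutor(max_workers=len(ENGINES)) as pool:
            futures = {e.__name__: pool.submit(e, query) for e in ENGINES}
            return {name: f.result() for name, f in futures.items()}

    for engine, results in search_all("casablanca").items():
        print(engine, results)
    ```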
  • Publication number: 20120059657
    Abstract: A method for detecting and recognizing speech is provided that remotely detects body motions from a speaker during vocalization with one or more radar sensors. Specifically, the radar sensors include a transmit aperture that transmits one or more waveforms towards the speaker, and each of the waveforms has a distinct wavelength. A receiver aperture is configured to receive the scattered radio frequency energy from the speaker. Doppler signals correlated with the speaker vocalization are extracted with a receiver. Digital signal processors are configured to develop feature vectors utilizing the vocalization Doppler signals, and words associated with the feature vectors are recognized with a word classifier.
    Type: Application
    Filed: June 7, 2011
    Publication date: March 8, 2012
    Inventors: Jefferson M. Willey, Todd Stephenson, Hugh Faust, James P. Hansen, George J. Linde, Carol Chang, Justin Nevitt, James A. Ballas, Thomas Herne Crystal, Vincent Michael Stanford, Jean W. de Graaf
  • Publication number: 20120053943
    Abstract: A voice dialing method includes the steps of receiving an utterance from a user, decoding the utterance to identify a recognition result for the utterance, and communicating to the user the recognition result. If an indication is received from the user that the communicated recognition result is incorrect, then it is added to a rejection reference. Then, when the user repeats the misunderstood utterance, the rejection reference can be used to eliminate the incorrect recognition result as a potential subsequent recognition result. The method can be used for single or multiple digits or digit strings.
    Type: Application
    Filed: November 7, 2011
    Publication date: March 1, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: Jason W. Clark, Rathinavelu Chengalvarayan, Timothy J. Grost, Dana B. Fecher, Jeremy M. Spaulding
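    A minimal sketch of the rejection reference, assuming canned n-best lists in place of a live decoder; names are illustrative:
    ```python
    # Hypothetical sketch for 20120053943: results the user flags as wrong are
    # stored and excluded from later recognition of the repeated utterance.
    rejection_reference = set()

    def recognize(n_best):
        """Return the best hypothesis not already rejected by the user."""
        for hypothesis in n_best:
            if hypothesis not in rejection_reference:
                return hypothesis
        return None  # nothing acceptable; re-prompt the caller

    def user_says_wrong(result):
        rejection_reference.add(result)

    first = recognize(["555-0186", "555-0189"])  # decoder's n-best list
    print(first)                                 # 555-0186
    user_says_wrong(first)                       # caller indicates it is wrong
    print(recognize(["555-0186", "555-0189"]))   # retry skips it: 555-0189
    ```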
  • Publication number: 20120035930
    Abstract: A conferencing system is disclosed in which a participant in a conference call can program the embodiment to listen for one or more “keywords” in the conference call. The keywords might be a participant's name, words associated with him or her, or words associated with his or her area of knowledge. The embodiment uses speech recognition technology to listen for those words. When the embodiment detects that those words have been spoken, it alerts the participant—using audible, visual, and/or tactile signals—that the participant's attention to the call is warranted. When the keywords are chosen wisely, the benefit can be great.
    Type: Application
    Filed: September 26, 2011
    Publication date: February 9, 2012
    Applicant: AVAYA INC.
    Inventors: Ezra Raphael Gilbert, Vipul Kishore Lalka, Venkat R. Gilakattula
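    The keyword-spotting alert reduces to a small loop; a text transcript stream stands in for live speech recognition output, and all names are assumptions:
    ```python
    # Hypothetical sketch for 20120035930: alert a participant when one of
    # their registered keywords is spoken on the call.
    def monitor(transcript_stream, keywords, alert):
        """Scan recognized utterances and alert when any keyword is spoken."""
        watched = {k.lower() for k in keywords}
        for utterance in transcript_stream:
            hits = watched & set(utterance.lower().split())
            if hits:
                alert(utterance, hits)

    def beep(utterance, hits):
        # The abstract also contemplates visual and tactile alerts.
        print(f"ALERT {sorted(hits)}: \"{utterance}\"")

    call = ["moving on to the budget", "let's ask dana about the q3 roadmap"]
    monitor(call, keywords={"Dana", "roadmap"}, alert=beep)
    ```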
  • Publication number: 20120035931
    Abstract: In one implementation, a computer-implemented method includes detecting a current context associated with a mobile computing device and determining, based on the current context, whether to switch the mobile computing device from a current mode of operation to a second mode of operation during which the mobile computing device monitors ambient sounds for voice input that indicates a request to perform an operation. The method can further include, in response to determining whether to switch to the second mode of operation, activating one or more microphones and a speech analysis subsystem associated with the mobile computing device so that the mobile computing device receives a stream of audio data. The method can also include providing output on the mobile computing device that is responsive to voice input that is detected in the stream of audio data and that indicates a request to perform an operation.
    Type: Application
    Filed: September 29, 2011
    Publication date: February 9, 2012
    Inventors: Michael J. LeBeau, John Nicholas Jitkoff, Dave Burke
  • Publication number: 20120029919
    Abstract: One embodiment of the present invention provides a system for placing linguistically-aware variables in computer-generated text. During operation, the system receives a sentence at a computer system, wherein the sentence comprises two or more words. Next, the system analyzes the sentence to identify a first variable, wherein the first variable is a place-holder for a first word. The system then receives the first word. After that, the system automatically determines a gender of the first word. Next, the system analyzes the sentence to identify a first dependent word that is dependent on the first word, wherein a spelling of the first dependent word is dependent on the gender of the first word. The system then determines the spelling of the first dependent word that corresponds to the gender of the first word. Next, the system replaces the first variable in the sentence with the first word.
    Type: Application
    Filed: July 29, 2010
    Publication date: February 2, 2012
    Applicant: INTUIT INC.
    Inventor: Peter J. Harris
  • Publication number: 20110320202
    Abstract: A system using sound templates is presented that may receive a first template for an audio signal and compare it to templates from different sound sources to determine a correlation between them. A location history database is created that assists in identifying the location of a user in response to audio templates generated by the user over time and at different locations. Comparisons can be made using templates of different richness to achieve confidence levels, and confidence levels may be represented based on the results of the comparisons. Queries may be run against the database to track users by templates generated from their voices. In addition, background information may be filtered out of the voice signal and separately compared against the database to assist in identifying a location based on the background noise.
    Type: Application
    Filed: June 22, 2011
    Publication date: December 29, 2011
    Inventor: John D. KAUFMAN
  • Publication number: 20110320201
    Abstract: An audio signal verification system is presented for verifying that a sound is from a predetermined source. Various methods for analyzing the sound are presented, and the methods may be combined to varying degrees to determine an appropriate correlation with a predefined pattern. Moreover, a confidence level or other indication may be used to indicate that the determination was successful. The sound may be reduced to templates with varying degrees of richness. Different templates may also be created using the same sound source, and different sounds from the same source may be aggregated to form a single template. Comparisons may be made between a sound, or a template derived from that sound, and stored sounds or templates derived from those stored sounds. Moreover, comparisons can be made using templates of different richness to achieve confidence levels, and confidence levels may be represented based on the results of the comparisons.
    Type: Application
    Filed: June 2, 2011
    Publication date: December 29, 2011
    Inventor: John D. KAUFMAN
  • Patent number: 8083587
    Abstract: A slot machine 1 of the present invention performs control so as to: sequentially store the number of game values consumed per unit game; sequentially store the number of game values given per unit game; calculate a difference between the total number of game values given and the total number of game values consumed as a self game value difference; transmit the self game value difference to the outside; receive someone's game value difference from the outside; when the self game value difference and the received game value difference are in a predetermined relationship, voice-output, by the conversation controller 91, an answer at a volume corresponding to the predetermined relationship from the speaker 23 in response to a voice input through the microphone 90; and delete the stored numbers of game values given and consumed under a predetermined condition.
    Type: Grant
    Filed: January 21, 2009
    Date of Patent: December 27, 2011
    Assignee: Aruze Gaming America, Inc.
    Inventor: Kazuo Okada
  • Publication number: 20110307257
    Abstract: A method and system for indicating in real time that an interaction is associated with a problem or issue, comprising: receiving a segment of an interaction in which a representative of the organization participates; extracting a feature from the segment; extracting a global feature associated with the interaction; aggregating the feature and the global feature; and classifying the segment or the interaction in association with the problem or issue by applying a model to the feature and the global feature. The method and system may also use features extracted from earlier segments within the interaction. The method and system can also evaluate the model based on features extracted from training interactions and manual tagging assigned to the interactions or segments thereof.
    Type: Application
    Filed: June 10, 2010
    Publication date: December 15, 2011
    Applicant: Nice Systems Ltd.
    Inventors: Oren PEREG, Moshe WASSERBLAT, Yuval LUBOWICH, Ronen LAPERDON, Dori SHAPIRA, Vladislav FEIGIN, Oz FOX-KAHANA
  • Publication number: 20110301955
    Abstract: Predicting and learning users' intended actions on an electronic device based on free-form speech input. Users' actions can be monitored to develop a list of carrier phrases having one or more actions that correspond to the carrier phrases. A user can speak a command into a device to initiate an action. The spoken command can be parsed and compared to the list of carrier phrases. If the spoken command matches one of the known carrier phrases, the corresponding action(s) can be presented to the user for selection. If the spoken command does not match one of the known carrier phrases, search results (e.g., Internet search results) corresponding to the spoken command can be presented to the user. The actions of the user in response to the presented action(s) and/or the search results can be monitored to update the list of carrier phrases.
    Type: Application
    Filed: June 7, 2010
    Publication date: December 8, 2011
    Applicant: GOOGLE INC.
    Inventors: William J. Byrne, Alexander H. Gruenstein, Douglas Beeferman
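    A compact sketch of the carrier-phrase dispatch, assuming a hand-written phrase table and prefix matching; the fallback to web search follows the abstract:
    ```python
    # Hypothetical sketch for 20110301955: match a spoken command against
    # known carrier phrases, else fall through to a web search.
    CARRIER_PHRASES = {
        "call": "place_phone_call",
        "navigate to": "start_navigation",
        "play": "play_media",
    }

    def interpret(spoken):
        """Map a spoken command to an action, or fall through to web search."""
        text = spoken.lower().strip()
        for phrase, action in CARRIER_PHRASES.items():
            if text.startswith(phrase):
                return action, text[len(phrase):].strip()
        return "web_search", text  # no carrier phrase matched

    print(interpret("Navigate to 1600 Amphitheatre Parkway"))
    print(interpret("pictures of kittens"))  # -> ('web_search', ...)
    ```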
  • Publication number: 20110282650
    Abstract: A very common problem is that when people speak a language other than the one to which they are accustomed, syllables can be spoken for longer or shorter than the listener would regard as appropriate. An example of this can be observed when people who have a heavy Japanese accent speak English. Since Japanese words end with vowels, there is a tendency for native Japanese speakers to add a vowel sound to the end of English words that should end with a consonant. Illustratively, native Japanese speakers often pronounce “orange” as “orenji.” An aspect provides an automatic speech-correcting process that would not necessarily need to know that fruit is being discussed; the system would only need to know that the speaker is accustomed to Japanese, that the listener is accustomed to English, that “orenji” is not a word in English, and that “orenji” is a typical Japanese mispronunciation of the English word “orange.”
    Type: Application
    Filed: May 17, 2010
    Publication date: November 17, 2011
    Applicant: AVAYA INC.
    Inventors: Terry Jennings, Paul Roller Michaelis
  • Publication number: 20110282667
    Abstract: A plurality of statements are received from within a grammar structure. Each of the statements is formed by a number of word sets. A number of alignment regions across the statements are identified by aligning the statements on a word set basis. Each aligned word set represents an alignment region. A number of potential confusion zones are identified across the statements. Each potential confusion zone is defined by words from two or more of the statements at corresponding positions outside the alignment regions. For each of the identified potential confusion zones, phonetic pronunciations of the words within the potential confusion zone are analyzed to determine a measure of confusion probability between the words when audibly processed by a speech recognition system during the computing event. An identity of the potential confusion zones across the statements and their corresponding measure of confusion probability are reported to facilitate grammar structure improvement.
    Type: Application
    Filed: May 14, 2010
    Publication date: November 17, 2011
    Applicant: Sony Computer Entertainment Inc.
    Inventor: Gustavo A. Hernandez-Abrego
  • Publication number: 20110270612
    Abstract: Systems and methods are provided for scoring non-native, spontaneous speech. A spontaneous speech sample is received, where the sample is of spontaneous speech spoken by a non-native speaker. Automatic speech recognition is performed on the sample using an automatic speech recognition system to generate a transcript of the sample, where a speech recognizer metric is determined by the automatic speech recognition system. A word accuracy rate estimate is determined for the transcript of the sample generated by the automatic speech recognition system based on the speech recognizer metric. The spontaneous speech sample is scored using a preferred scoring model when the word accuracy rate estimate satisfies a threshold, and the spontaneous speech sample is scored using an alternate scoring model when the word accuracy rate estimate fails to satisfy the threshold.
    Type: Application
    Filed: April 28, 2011
    Publication date: November 3, 2011
    Inventors: Su-Youn Yoon, Lei Chen, Klaus Zechner
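    The model-switching rule is a one-line decision once the word accuracy rate estimate exists; both scorers below are stubs and the threshold is an assumed value:
    ```python
    # Hypothetical sketch for 20110270612: score with the preferred model only
    # when the estimated word accuracy rate clears a threshold.
    WORD_ACCURACY_THRESHOLD = 0.85  # assumed value, not taken from the filing

    def preferred_scorer(transcript):
        """Stand-in for the richer, transcript-dependent scoring model."""
        return 4.0

    def alternate_scorer(transcript):
        """Stand-in for a model that is robust to noisy transcripts."""
        return 3.0

    def score_sample(transcript, estimated_word_accuracy):
        scorer = (preferred_scorer
                  if estimated_word_accuracy >= WORD_ACCURACY_THRESHOLD
                  else alternate_scorer)
        return scorer(transcript)

    print(score_sample("i am agree with this opinion", 0.91))  # preferred model
    print(score_sample("garbled hypothesis", 0.40))            # fallback model
    ```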
  • Publication number: 20110218802
    Abstract: A computerized method for continuous speech recognition using a speech recognition engine and a phoneme model. The computerized method inputs a speech signal into the speech recognition engine. Based on the phoneme model, the speech signal is indexed by scoring for the phonemes of the phoneme model and a time-ordered list of phoneme candidates and respective scores resulting from the scoring are produced. The phoneme candidates are input with the scores from the time-ordered list. Word transcription candidates are typically input from a dictionary and words are built by selecting from the word transcription candidates based on the scores. A stream of transcriptions is outputted corresponding to the input speech signal. The stream of transcriptions is re-scored by searching for and detecting anomalous word transcriptions in the stream of transcriptions to produce second scores.
    Type: Application
    Filed: March 8, 2010
    Publication date: September 8, 2011
    Inventors: Shlomi Hai Bouganim, Boris Levant
  • Publication number: 20110218807
    Abstract: The invention relates to a method for sentence planning (120) in a task classification system that interacts with a user. The method may include recognizing symbols in the user's input communication and determining whether the user's input communication can be understood. If the user's communication can be understood, understanding data may be generated (220). The method may further include generating communicative goals (3010) based on the recognized symbols and understanding data. The generated communicative goals (3010) may be related to information that needs to be obtained from the user. The method may also include automatically planning one or more sentences (3020) based on the generated communicative goals and outputting at least one of the sentence plans to the user (3080).
    Type: Application
    Filed: May 18, 2011
    Publication date: September 8, 2011
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Marilyn A. WALKER, Owen Christopher RAMBOW, Monica ROGATI
  • Publication number: 20110210822
    Abstract: A refrigerator is provided. The refrigerator includes a voice recognition unit for recognizing a voice of a name of food, a memory for storing location information of the food received in a storage chamber, a controller for determining the voice recognized by the voice recognition unit and searching a storage location of the food voice-recognized in accordance with the recognized voice, and a voice output unit for outputting a voice message on the storage location information of the food searched by the controller.
    Type: Application
    Filed: September 11, 2008
    Publication date: September 1, 2011
    Applicant: LG Electronics Inc.
    Inventors: Sung-Ae Lee, Min-Kyeong Kim
  • Publication number: 20110202341
    Abstract: A device receives a voice recognition statistic from a voice recognition application and applies a grammar improvement rule based on the voice recognition statistic. The device also automatically adjusts a weight of the voice recognition statistic based on the grammar improvement rule, and outputs the weight adjusted voice recognition statistic for use in the voice recognition application.
    Type: Application
    Filed: April 29, 2011
    Publication date: August 18, 2011
    Applicant: VERIZON PATENT AND LICENSING INC.
    Inventor: Kevin W. BROWN
  • Publication number: 20110191106
    Abstract: One-to-many comparisons of callers' words and/or voice prints with known words and/or voice prints to identify any substantial matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract different words, such as words of anger. The system may also segment at least a portion of the customer's voice to create a tone profile, and it formats the segmented words and tone profiles for network transmission to a server. The server compares the customer's words and/or tone profiles with multiple known words and/or tone profiles stored on a database to determine any substantial matches. The identification of any matches may be used for a variety of purposes, such as providing representative feedback or customer follow-up.
    Type: Application
    Filed: April 12, 2011
    Publication date: August 4, 2011
    Applicant: American Express Travel Related Services Company, Inc.
    Inventors: Chin H. Khor, Marcel Leyva, Vernon Marshall
  • Publication number: 20110184735
    Abstract: Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational information related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor, and the confidence data is adjusted depending on this determination.
    Type: Application
    Filed: January 22, 2010
    Publication date: July 28, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Jason Flaks, Dax Hawkins, Christian Klein, Mitchell Stephen Dernis, Tommer Leyvand, Ali M. Vassigh, Duncan McKay
  • Publication number: 20110166860
    Abstract: Systems and methods are disclosed to operate a mobile device by capturing user input; transmitting the user input over a wireless channel to an engine; analyzing, at the engine, a music clip or video in a multimedia data stream; and sending the analysis wirelessly to the mobile device.
    Type: Application
    Filed: July 12, 2010
    Publication date: July 7, 2011
    Inventor: Bao Q. Tran
  • Publication number: 20110144992
    Abstract: Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: Microsoft Corporation
    Inventors: Kristina N. Toutanova, Colin Andrew Cherry, Hoifung Poon
  • Publication number: 20110144973
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for a speech recognition application for directory assistance that is based on a user's spoken search query. The spoken search query is received by a portable device, and the portable device then determines its present location. Upon determining the location of the portable device, that information is incorporated into a local language model that is used to process the search query. Finally, the portable device outputs the results of the search query based on the local language model.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Enrico Bocchieri, Diamantino Antonio Caseiro
  • Publication number: 20110144993
    Abstract: A disfluent-utterance tracking system includes a speech transducer; one or more targeted-disfluent-utterance records stored in a memory; a real-time speech recording mechanism operatively connected with the speech transducer for recording a real-time utterance; and an analyzer operatively coupled with the targeted-disfluent-utterance record and with the real-time speech recording mechanism, the analyzer configured to compare one or more real-time snippets of the recorded speech with the targeted-disfluent-utterance record to determine and indicate to a user a level of correlation therebetween.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Inventor: David Ruby
  • Publication number: 20110131045
    Abstract: Systems and methods are provided for receiving speech and non-speech communications of natural language questions and/or commands, transcribing the speech and non-speech communications to textual messages, and executing the questions and/or commands. The invention applies context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users presenting questions or commands across multiple domains. The systems and methods create, store, and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech and non-speech communications and presenting the expected results for a particular question or command.
    Type: Application
    Filed: February 2, 2011
    Publication date: June 2, 2011
    Applicant: VoiceBox Technologies, Inc.
    Inventors: Philippe Di Cristo, Min Ke, Robert A. Kennewick, Lynn Elise Armstrong
  • Publication number: 20110131043
    Abstract: The present invention enables high-speed recognition even when the grammar includes a large amount of garbage. The first voice recognition processing unit executes a voice recognition process based on a first grammar on a voice feature amount of the input voice and generates a recognition hypothesis graph indicating a structure of hypotheses derived according to the first grammar, together with a score associated with each connection of a recognition unit. The second voice recognition processing unit executes a voice recognition process according to a second grammar, which is specified to accept sections of the input voice other than keywords as garbage sections, acquires the structure and score of each garbage section from the recognition hypothesis graph, and outputs the recognition result from the total score of a hypothesis derived according to the second grammar.
    Type: Application
    Filed: December 22, 2008
    Publication date: June 2, 2011
    Inventors: Fumihiro Adachi, Ryosuke Isotani, Ken Hanazawa
  • Publication number: 20110093268
    Abstract: An apparatus, a method, and a machine-readable medium are provided for characterizing differences between two language models. A group of utterances from each of a group of time domains is examined. One of a significant word change or a significant word class change within the plurality of utterances is determined. A first cluster of utterances including a word or a word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances. A second cluster of utterances not including the word or the word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances.
    Type: Application
    Filed: September 14, 2010
    Publication date: April 21, 2011
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, John Grothendieck, Jeremy Huntley Greet Wright
  • Publication number: 20110093269
    Abstract: A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system comprises adjusting the rejection threshold when speech input matches the predetermined expected response.
    Type: Application
    Filed: December 30, 2010
    Publication date: April 21, 2011
    Inventors: Keith Braho, Amro El-Jaroudi, Jeffrey Pike
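    A sketch of the threshold adjustment, with assumed numeric values (the abstract gives none); the check-digit example reflects a typical expected-response scenario:
    ```python
    # Hypothetical sketch for 20110093269: relax the rejection threshold when
    # the decoder's hypothesis matches the response the dialog expects.
    BASE_THRESHOLD = 0.70      # assumed values
    RELAXED_THRESHOLD = 0.50   # used only when the hypothesis was expected

    def accept(hypothesis, confidence, expected=None):
        """Accept the hypothesis, requiring less acoustic confidence when it
        matches the predetermined expected response."""
        threshold = RELAXED_THRESHOLD if hypothesis == expected else BASE_THRESHOLD
        return confidence >= threshold

    # e.g. the dialog expects the check digits "42" to be read back:
    print(accept("42", confidence=0.55, expected="42"))  # True: expected answer
    print(accept("47", confidence=0.55, expected="42"))  # False: unexpected
    ```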
  • Publication number: 20110054901
    Abstract: A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text and aligning the target text and the reference text at the word level based on phoneme similarity. The method can be applied to automatically archiving a multimedia resource and to automatically searching a multimedia resource.
    Type: Application
    Filed: August 27, 2010
    Publication date: March 3, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yong Qin, Qin Shi, Zhiwei Shuang, Shi Lei Zhang, Jie Zhou
  • Publication number: 20110044438
    Abstract: During voice communication between multiple telecommunications devices, a shareable application facilitates concurrent sharing of data and processes between the devices. The application may be configured to monitor the voice communication and execute a predetermined function upon detecting a predetermined condition in the voice communication. The application may further facilitate sharing of functionality and user interface displays during the voice communication. In some implementations, a server computing device on a communications network may facilitate functions of shareable applications on one or more telecommunications devices.
    Type: Application
    Filed: August 20, 2009
    Publication date: February 24, 2011
    Applicant: T-Mobile USA, Inc.
    Inventors: Winston Wang, Adam Holt, Jean-Luc Bouthemy, Michael Kemery
  • Publication number: 20110029313
    Abstract: Methods are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. A method for model adaptation for a speech recognition system includes determining an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The method may further include adjusting an adaptation, of the model for the word or various models for the various words, based on the error rate. Apparatus are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. An apparatus for model adaptation for a speech recognition system includes a processor adapted to estimate an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system.
    Type: Application
    Filed: October 11, 2010
    Publication date: February 3, 2011
    Applicant: VOCOLLECT, INC.
    Inventors: Keith P. Braho, Jeffrey P. Pike, Lori A. Pike
  • Publication number: 20110010175
    Abstract: Provided is a text data processing apparatus, method, and program for adding a symbol at an appropriate position. The apparatus according to this embodiment is a text data processing apparatus that edits symbols in input text, the apparatus including symbol edit determination means 52 that determines whether symbol editing is necessary based on a frequency of symbol insertion in a block consisting of a plurality of divided texts; and symbol edit position calculation means 53 that calculates a likelihood of the symbol edit based on a likelihood of symbol insertion for a word and a distance between symbols, and calculates a symbol edit position in the block in accordance with the likelihood of the symbol edit or a word in the block when the symbol edit determination means determines that the symbol edit is necessary.
    Type: Application
    Filed: February 13, 2009
    Publication date: January 13, 2011
    Inventors: Tasuku Kitade, Takafumi Koshinaka
  • Publication number: 20110004462
    Abstract: Speech recognition may be improved by generating and using a topic specific language model. A topic specific language model may be created by performing an initial pass on an audio signal using a generic or basis language model. A speech recognition device may then determine topics relating to the audio signal based on the words identified in the initial pass and retrieve a corpus of text relating to those topics. Using the retrieved corpus of text, the speech recognition device may create a topic specific language model. In one example, the speech recognition device may adapt or otherwise modify the generic language model based on the retrieved corpus of text.
    Type: Application
    Filed: July 1, 2009
    Publication date: January 6, 2011
    Applicant: COMCAST INTERACTIVE MEDIA, LLC
    Inventors: David F. Houghton, Seth Michael Murray, Sibley Verbeck Simon
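    The two-pass adaptation can be sketched as unigram interpolation, one plausible reading of "adapt or otherwise modify the generic language model"; the decoder, corpus service, and interpolation weight below are stubs and assumptions:
    ```python
    # Hypothetical sketch for 20110004462: a first pass with a generic LM
    # suggests topics; text for those topics is fetched and the generic
    # unigram model is interpolated toward the topic counts for a second pass.
    from collections import Counter

    def first_pass(audio):
        """Stub: words recognized with the generic language model."""
        return ["game", "score", "season"]

    def fetch_topic_corpus(words):
        """Stub: text retrieved for the topics suggested by the words."""
        return "the final score of the season opener game".split()

    def adapt_unigrams(generic, topic_text, lam=0.5):
        """Interpolate generic unigram probabilities toward the topic corpus."""
        topic = Counter(topic_text)
        g_total, t_total = sum(generic.values()), sum(topic.values())
        vocab = set(generic) | set(topic)
        return {w: lam * generic[w] / g_total + (1 - lam) * topic[w] / t_total
                for w in vocab}

    generic_counts = Counter({"the": 50, "game": 2, "score": 1, "cat": 5})
    adapted = adapt_unigrams(generic_counts, fetch_topic_corpus(first_pass(None)))
    print(sorted(adapted.items(), key=lambda kv: -kv[1])[:3])
    ```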
  • Publication number: 20100332230
    Abstract: Phonetic distances are empirically measured as a function of speech recognition engine recognition error rates. The error rates are determined by comparing a recognized speech file with a reference file. The phonetic distances can be normalized to earlier measurements. The phonetic distances/error rates can also be used to improve speech recognition engine grammar selection, as an aid in language training and evaluation, and in other applications.
    Type: Application
    Filed: June 25, 2009
    Publication date: December 30, 2010
    Applicant: ADACEL SYSTEMS, INC.
    Inventor: Chang-Qing Shu
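    A toy version of the measurement, assuming element-wise alignment of reference and recognized phoneme strings (a real system would align by edit distance first):
    ```python
    # Hypothetical sketch for 20100332230: the substitution rate between each
    # phoneme pair serves as an empirical phonetic distance.
    from collections import Counter

    def phonetic_distances(reference, recognized):
        """Substitution counts normalized per reference phoneme."""
        totals, substitutions = Counter(), Counter()
        for ref, hyp in zip(reference, recognized):
            totals[ref] += 1
            if ref != hyp:
                substitutions[(ref, hyp)] += 1
        return {pair: n / totals[pair[0]] for pair, n in substitutions.items()}

    ref = ["b", "ae", "t", "b", "ae", "t"]
    hyp = ["p", "ae", "t", "b", "ae", "d"]  # /b/->/p/ and /t/->/d/ confusions
    print(phonetic_distances(ref, hyp))     # {('b','p'): 0.5, ('t','d'): 0.5}
    ```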
  • Publication number: 20100332231
    Abstract: A lexical acquisition apparatus includes: a phoneme recognition section 2 for preparing a phoneme sequence candidate from an inputted speech; a word matching section 3 for preparing a plurality of word sequences based on the phoneme sequence candidate; a discrimination section 4 for selecting, from among the plurality of word sequences, a word sequence having a high likelihood in a recognition result; an acquisition section 5 for acquiring a new word based on the word sequence selected by the discrimination section 4; a teaching word list 4A used to teach a name; and a probability model 4B of the teaching word and an unknown word, wherein the discrimination section 4 calculates, for each word sequence, a first evaluation value showing how well the words in the word sequence correspond to teaching words in the list 4A and a second evaluation value showing the probability with which the words in the word sequence are adjacent to one another, and selects the word sequence for which the sum of the first evaluation value and the second evaluation value is highest.
    Type: Application
    Filed: June 1, 2010
    Publication date: December 30, 2010
    Applicants: Honda Motor Co., Ltd., Advanced Telecommunications Research Institute International
    Inventors: Mikio Nakano, Takashi Nose, Ryo Taguchi, Kotaro Funakoshi, Naoto Iwahashi
  • Publication number: 20100328066
    Abstract: Methods, systems, and articles of manufacture are provided for administering sobriety tests to online gamblers, as well as for determining whether, when, and to whom to administer a sobriety test. Various mediation events to be initiated upon certain results of such sobriety tests are also disclosed.
    Type: Application
    Filed: June 24, 2010
    Publication date: December 30, 2010
    Inventors: Jay S. Walker, Zachary T. Smith, Magdalena M. Fincham
  • Publication number: 20100324900
    Abstract: A computerized method of detecting a target word in a speech signal. A speech recognition engine and a previously constructed phoneme model are provided. The speech signal is input into the speech recognition engine. Based on the phoneme model, the input speech signal is indexed. A time-ordered list is stored representing the n-best phoneme candidates of the input speech signal and the phonemes of the input speech signal in multiple phoneme frames. The target word is transcribed into a transcription of target phonemes. The time-ordered list of n-best phoneme candidates is searched for a locus of said target phonemes. While searching, scoring is based on the ranking of the phoneme candidates among the n-best phoneme candidates and on the number of the target phonemes found. A composite score of the probability of an occurrence of the target word is produced. When the composite score is higher than a threshold, start and finish times are output which bound the locus.
    Type: Application
    Filed: June 19, 2009
    Publication date: December 23, 2010
    Inventors: Ronen Faifkov, Rabin Cohen-Tov, Adam Simone
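    A sketch of the spotting score over time-ordered n-best phoneme frames; the rank weighting and threshold are assumptions, since the abstract only says scoring depends on candidate rank and on how many target phonemes are found:
    ```python
    # Hypothetical sketch for 20100324900: a target word scores higher the
    # more of its phonemes appear in the frames and the better their ranks.
    def spot(frames, target, threshold=0.5):
        score, idx = 0.0, 0
        for frame in frames:                     # time-ordered n-best frames
            if idx < len(target) and target[idx] in frame:
                rank = frame.index(target[idx])  # 0 = best candidate
                score += 1.0 / (1 + rank)        # reward high-ranked matches
                idx += 1
        return score / len(target) >= threshold  # normalize by target length

    # Frames of 3-best phoneme candidates; target word "cat" = /k ae t/:
    frames = [["k", "g", "t"], ["ae", "eh", "aa"], ["d", "t", "p"]]
    print(spot(frames, ["k", "ae", "t"]))  # True: all phonemes found, good ranks
    ```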
  • Publication number: 20100318356
    Abstract: Textual transcription of speech is generated and formatted according to user-specified transformation and behavior requirements for a speech recognition system having input grammars and transformations. An apparatus may include a speech recognition platform configured to receive a user-specified transformation requirement, recognize speech in speech data into recognized speech according to a set of recognition grammars; and apply transformations to the recognized speech according to the user-specified transformation requirement. The apparatus may further be configured to receive a user-specified behavior requirement and transform the recognized speech according to the behavior requirement. Other embodiments are described and claimed.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Jonathan E. Hamaker, Keith C. Herold
  • Publication number: 20100292989
    Abstract: The invention enables symbol insertion evaluation that takes into account differences in speaking style features between speakers. For a word sequence transcribing voice information, the symbol insertion likelihood calculation means 113 obtains a symbol insertion likelihood for each of a plurality of symbol insertion models supplied for different speaking style features. The speaking style feature similarity calculation means 112 obtains a similarity between the speaking style feature of the word sequence and the plurality of speaking style feature models. The symbol insertion evaluation means 114 weights the symbol insertion likelihood obtained for the word sequence by each of the plurality of symbol insertion models according to the similarity between the speaking style feature of the word sequence and the plurality of speaking style feature models and the relevance between the symbol insertion model and the speaking style feature model, and performs symbol insertion evaluation on the word sequence.
    Type: Application
    Filed: January 19, 2009
    Publication date: November 18, 2010
    Inventors: Tasuku Kitade, Takafumi Koshinaka
  • Publication number: 20100286984
    Abstract: A method for the voice recognition of a spoken expression comprising a plurality of expression parts to be recognized. Partial voice recognition takes place on a first selected expression part, and depending on a selection of hits for the first expression part detected by the partial voice recognition, voice recognition on the first and further expression parts is executed.
    Type: Application
    Filed: June 18, 2008
    Publication date: November 11, 2010
    Inventors: Michael Wandinger, Jesus Fernando Guitarte Perez, Bernhard Littel
  • Publication number: 20100280828
    Abstract: Techniques are described that generally relate to systems, methods, and devices designed to selectively filter offensive communications in accordance with a user's intentions. Example methods may be designed to filter (such as by deleting, blocking, replacing, and/or modifying) various offensive words, phrases, and/or sounds that have been identified as having offensive meanings.
    Type: Application
    Filed: April 30, 2009
    Publication date: November 4, 2010
    Inventors: Gene Fein, Edward Merritt