Segmentation Or Word Limit Detection (epo) Patents (Class 704/E15.005)
-
Publication number: 20120116768
Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more of the other modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
Type: Application
Filed: November 8, 2011
Publication date: May 10, 2012
Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
Inventors: Srinivas Bangalore, Michael J. Johnston
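The gesture-constrains-speech flow described in this abstract can be illustrated with a toy rescoring step. This is a minimal sketch, not the patented method: the gesture vocabulary, hypothesis scores, and boost weight below are all invented for illustration, and a real system would operate on recognition lattices rather than flat n-best lists.

```python
# Toy illustration: a recognized gesture biases the scoring of competing
# speech hypotheses (all data and weights here are hypothetical).
def gesture_biased_best(speech_hypotheses, gesture_label, gesture_vocab, boost=2.0):
    """Return the best speech hypothesis after boosting words tied to the gesture."""
    related = gesture_vocab.get(gesture_label, set())
    def score(hyp):
        text, base = hyp
        return base + sum(boost for word in text.split() if word in related)
    return max(speech_hypotheses, key=score)[0]

# A circling gesture over a map makes place-related words more plausible,
# so the speech recognizer prefers "restaurants" over the acoustic near-miss.
gesture_vocab = {"circle_area": {"restaurants", "hotels", "zoom"}}
hyps = [("show restaurants here", 4.0), ("show rest rooms hear", 4.5)]
best = gesture_biased_best(hyps, "circle_area", gesture_vocab)  # "show restaurants here"
```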
-
Publication number: 20120101823
Abstract: Embodiments of a dialog system that utilizes contextual information to perform recognition of proper names are described. Unlike present name recognition methods for large name lists, which generally focus strictly on the static aspect of the names, embodiments of the present system take into account the temporal, recency, and context effects that arise when names are used, and formulate new questions to further constrain the search space or grammar for recognition of past and current utterances.
Type: Application
Filed: December 28, 2011
Publication date: April 26, 2012
Applicant: Robert Bosch GmbH
Inventors: Fuliang Weng, Zhongnan Shen, Zhe Feng
-
Publication number: 20120095765
Abstract: A method for alleviating ambiguity issues of new user-defined speech commands. An original command for a user-defined speech command can be received. It can then be determined if the original command is likely to be confused with a set of existing speech commands. When confusion is unlikely, the original command can be automatically stored. When confusion is likely, a substitute command that is unlikely to be confused with existing commands can be automatically determined. The substitute can be presented as an alternative to the original command and can be selectively stored as the user-defined speech command.
Type: Application
Filed: December 22, 2011
Publication date: April 19, 2012
Applicant: Nuance Communications, Inc.
Inventors: William K. Bodin, James R. Lewis, Leslie R. Wilson
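One way to approximate the confusability check described above is a similarity comparison of the new command against the existing command set. This is a hedged sketch: the patent does not specify the measure, a production recognizer would compare phoneme sequences rather than spellings, and the threshold below is arbitrary.

```python
import difflib

def is_confusable(candidate, existing, threshold=0.75):
    """Flag a new command whose spelling is very close to an existing one.
    String similarity stands in for the acoustic/phonetic comparison a
    real recognizer would use."""
    return any(
        difflib.SequenceMatcher(None, candidate, cmd).ratio() >= threshold
        for cmd in existing
    )

commands = {"call home", "open mail"}
is_confusable("call homes", commands)  # True: nearly identical strings
is_confusable("play music", commands)  # False: clearly distinct
```

When the check fires, the system would propose a substitute command and store it only after the user accepts it.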
-
Publication number: 20120089396
Abstract: A system that incorporates teachings of the present disclosure may include, for example, an interface for receiving an utterance of speech and converting the utterance into a speech signal, such as a digital representation including a waveform and/or spectrum; and a processor for dividing the speech signal into segments and detecting the emotional information in the speech. The system is designed by comparing the speech segments to a baseline to identify the emotion or emotions from the suprasegmental information (i.e., paralinguistic information) in speech, wherein the baseline is determined from acoustic characteristics of a plurality of emotion categories. Other embodiments are disclosed.
Type: Application
Filed: June 16, 2010
Publication date: April 12, 2012
Applicant: University of Florida Research Foundation, Inc.
Inventors: Sona Patel, Rahul Shrivastav
-
Publication number: 20120078634
Abstract: A voice dialogue system executing an operation through a voice dialogue with a user includes a history storage unit storing the operation name of each operation executed by the voice dialogue system and an operation history recording the number of times each operation has been executed; a voice storage unit storing voice data corresponding to each operation name; a detection unit detecting a voice skip signal indicating that the user's voice input is to be skipped; an acquisition unit acquiring, when the detection unit detects the voice skip signal, the operation name of the operation having a high priority based on the number of execution times from said history storage unit; and a generation unit reading the voice data corresponding to the acquired operation name from said voice storage unit and generating a voice signal corresponding to the read voice data.
Type: Application
Filed: March 15, 2011
Publication date: March 29, 2012
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventor: Masahide Ariu
-
Publication number: 20120078631
Abstract: Target word recognition includes: obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set; performing segmentation of the characteristic computation data to generate a plurality of text segments; combining the plurality of text segments to form a text data combination set; determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations; determining a plurality of designated characteristic values for the plurality of text data combinations; based at least in part on the plurality of designated characteristic values and according to at least a criterion, recognizing among the plurality of text data combinations target words whose characteristic values fulfill the criterion.
Type: Application
Filed: September 22, 2011
Publication date: March 29, 2012
Applicant: ALIBABA GROUP HOLDING LIMITED
Inventors: Haibo Sun, Yang Yang, Yining Chen
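The segment → combine → intersect → filter pipeline in this abstract can be sketched concretely. This is only an illustrative reading: token frequency stands in for the "designated characteristic value", the minimum-count criterion is invented, and a real implementation would use richer segmentation and features.

```python
def find_target_words(candidates, text, min_count=2):
    """Segment the characteristic computation data, combine adjacent
    segments, intersect with the candidate word set, and keep combinations
    whose frequency (a stand-in for the designated characteristic value)
    meets the criterion."""
    tokens = text.split()
    combos = []
    for n in (1, 2):  # unigram and bigram combinations of segments
        combos += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    hits = set(combos) & set(candidates)          # the intersection step
    return {w for w in hits if combos.count(w) >= min_count}

find_target_words({"red shoe", "bag"}, "red shoe and red shoe and bag")
# {'red shoe'}: it occurs twice, while 'bag' occurs only once
```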
-
Publication number: 20120078629
Abstract: According to one embodiment, a meeting support apparatus includes a storage unit, a determination unit, and a generation unit. The storage unit is configured to store storage information for each of a set of words, the storage information indicating a word, pronunciation information on the word, and a pronunciation recognition frequency. The determination unit is configured to generate emphasis determination information including an emphasis level that represents whether a first word should be highlighted and the degree of highlighting, determined in accordance with the pronunciation recognition frequency of a second word when the first word is highlighted, based on whether the storage information includes a second set corresponding to a first set and, when the second set is included, based on the pronunciation recognition frequency of the second word. The generation unit is configured to generate an emphasis character string based on the emphasis determination information when the first word is highlighted.
Type: Application
Filed: March 25, 2011
Publication date: March 29, 2012
Inventors: Tomoo Ikeda, Nobuhiro Shimogori, Kouji Ueno
-
Publication number: 20120072219
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information.
Type: Application
Filed: September 22, 2010
Publication date: March 22, 2012
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Michael JOHNSTON, Srinivas Bangalore, Junlan Feng, Taniya Mishra
-
Publication number: 20120072221
Abstract: A distributed voice user interface system includes a local device which receives speech input issued from a user. Such speech input may specify a command or a request by the user. The local device performs preliminary processing of the speech input and determines whether it is able to respond to the command or request by itself. If not, the local device initiates communication with a remote system for further processing of the speech input.
Type: Application
Filed: November 23, 2011
Publication date: March 22, 2012
Applicant: Ben Franklin Patent Holding, LLC
Inventors: George M. WHITE, James J. Buteau, Glen E. Shires, Kevin J. Surace, Steven Markman
-
Publication number: 20120059658
Abstract: Embodiments of the present invention relate to searching for content on the Internet. A user may supply a search query to a device, and the device may issue the search query to a plurality of search engines, including at least one general purpose search engine and at least one site-specific search engine. In this way, the user need not separately issue search queries to each of the plurality of search engines.
Type: Application
Filed: September 8, 2010
Publication date: March 8, 2012
Applicant: Nuance Communications, Inc.
Inventors: Vladimir Sejnoha, Gary B. Clayton, Victor S. Chen, Steven Hatch, William F. Ganong, III, Gunnar Evermann, Marc W. Regan, Stephen W. Laverty, Paul J. Vozila, Nathan M. Bodenstab, Yik-Cheung Tam
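The fan-out described above (one query, many engines) reduces to a simple dispatch loop. This sketch uses stub engine functions in place of real general-purpose and site-specific backends; the engine names and result strings are illustrative only.

```python
def federated_search(query, engines):
    """Send one query to several engines (general-purpose and site-specific)
    and collect results per engine, so the user issues the query only once."""
    return {name: search(query) for name, search in engines.items()}

# Stub engines standing in for real backends.
engines = {
    "general": lambda q: [f"web result for {q}"],
    "video_site": lambda q: [f"video result for {q}"],
}
results = federated_search("jazz concerts", engines)
```

A production version would issue the requests concurrently and merge or rank the per-engine result lists before display.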
-
Publication number: 20120059657
Abstract: A method for detecting and recognizing speech is provided that remotely detects body motions from a speaker during vocalization with one or more radar sensors. Specifically, the radar sensors include a transmit aperture that transmits one or more waveforms towards the speaker, and each of the waveforms has a distinct wavelength. A receiver aperture is configured to receive the scattered radio frequency energy from the speaker. Doppler signals correlated with the speaker vocalization are extracted with a receiver. Digital signal processors are configured to develop feature vectors utilizing the vocalization Doppler signals, and words associated with the feature vectors are recognized with a word classifier.
Type: Application
Filed: June 7, 2011
Publication date: March 8, 2012
Inventors: Jefferson M. Willey, Todd Stephenson, Hugh Faust, James P. Hansen, George J. Linde, Carol Chang, Justin Nevitt, James A. Ballas, Thomas Herne Crystal, Vincent Michael Stanford, Jean W. de Graaf
-
Publication number: 20120053943
Abstract: A voice dialing method includes the steps of receiving an utterance from a user, decoding the utterance to identify a recognition result for the utterance, and communicating the recognition result to the user. If an indication is received from the user that the communicated recognition result is incorrect, then the incorrect result is added to a rejection reference. Then, when the user repeats the misunderstood utterance, the rejection reference can be used to eliminate the incorrect recognition result as a potential subsequent recognition result. The method can be used for single or multiple digits or digit strings.
Type: Application
Filed: November 7, 2011
Publication date: March 1, 2012
Applicant: GENERAL MOTORS LLC
Inventors: Jason W. Clark, Rathinavelu Chengalvarayan, Timothy J. Grost, Dana B. Fecher, Jeremy M. Spaulding
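The rejection-reference loop above can be sketched in a few lines: on a retry, any hypothesis the user already rejected is excluded before picking the best result. The digit strings and scores are hypothetical, and a real dialer would work with the recognizer's full n-best list.

```python
def recognize_with_rejection(hypotheses, rejected):
    """Return the best-scoring (digits, score) hypothesis not already in the
    rejection reference; None if every hypothesis has been rejected."""
    allowed = [h for h in hypotheses if h[0] not in rejected]
    return max(allowed, key=lambda h: h[1])[0] if allowed else None

rejected = set()
hyps = [("5551234", 0.9), ("5551284", 0.8)]
first = recognize_with_rejection(hyps, rejected)   # "5551234"
rejected.add(first)                                # user marks it incorrect
second = recognize_with_rejection(hyps, rejected)  # "5551284" on the retry
```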
-
Publication number: 20120035930
Abstract: A conferencing system is disclosed in which a participant in a conference call can program the system to listen for one or more "keywords" in the call. The keywords might be the participant's name, words associated with him or her, or words associated with his or her area of knowledge. The system uses speech recognition technology to listen for those words. When the system detects that those words have been spoken, it alerts the participant, using audible, visual, and/or tactile signals, that the participant's attention to the call is warranted. When the keywords are chosen wisely, the benefit can be great.
Type: Application
Filed: September 26, 2011
Publication date: February 9, 2012
Applicant: AVAYA INC.
Inventors: Ezra Raphael Gilbert, Vipul Kishore Lalka, Venkat R. Gilakattula
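The alerting logic reduces to keyword spotting over the recognized call audio. In this sketch the transcript is already text (a real system would run ASR on the live audio), and the participants and watchlists are invented for illustration.

```python
def keyword_alerts(transcript_words, watchlists):
    """Map each participant to the registered keywords that were spoken,
    i.e. the participants who should be alerted and why."""
    spoken = {w.lower() for w in transcript_words}
    return {who: sorted(spoken & kws)
            for who, kws in watchlists.items() if spoken & kws}

watch = {"alice": {"budget", "alice"}, "bob": {"deploy"}}
alerts = keyword_alerts("When Alice returns we will discuss the budget".split(), watch)
# {'alice': ['alice', 'budget']}: only Alice's keywords were spoken
```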
-
Publication number: 20120035931
Abstract: In one implementation, a computer-implemented method includes detecting a current context associated with a mobile computing device and determining, based on the current context, whether to switch the mobile computing device from a current mode of operation to a second mode of operation during which the mobile computing device monitors ambient sounds for voice input that indicates a request to perform an operation. The method can further include, in response to determining whether to switch to the second mode of operation, activating one or more microphones and a speech analysis subsystem associated with the mobile computing device so that the mobile computing device receives a stream of audio data. The method can also include providing output on the mobile computing device that is responsive to voice input that is detected in the stream of audio data and that indicates a request to perform an operation.
Type: Application
Filed: September 29, 2011
Publication date: February 9, 2012
Inventors: Michael J. LeBeau, John Nicholas Jitkoff, Dave Burke
-
Publication number: 20120029919
Abstract: One embodiment of the present invention provides a system for placing linguistically-aware variables in computer-generated text. During operation, the system receives a sentence at a computer system, wherein the sentence comprises two or more words. Next, the system analyzes the sentence to identify a first variable, wherein the first variable is a place-holder for a first word. The system then receives the first word. After that, the system automatically determines a gender of the first word. Next, the system analyzes the sentence to identify a first dependent word that is dependent on the first word, wherein a spelling of the first dependent word is dependent on the gender of the first word. The system then determines the spelling of the first dependent word that corresponds to the gender of the first word. Next, the system replaces the first variable in the sentence with the first word.
Type: Application
Filed: July 29, 2010
Publication date: February 2, 2012
Applicant: INTUIT INC.
Inventor: Peter J. Harris
-
Publication number: 20110320202
Abstract: A system using sound templates is presented that may receive a first template for an audio signal and compare it to templates from different sound sources to determine a correlation between them. A location history database is created that assists in identifying the location of a user in response to audio templates generated by the user over time and at different locations. Comparisons can be made using templates of different richness to achieve confidence levels, and confidence levels may be represented based on the results of the comparisons. Queries may be run against the database to track users by templates generated from their voices. In addition, background information may be filtered out of the voice signal and separately compared against the database to assist in identifying a location based on the background noise.
Type: Application
Filed: June 22, 2011
Publication date: December 29, 2011
Inventor: John D. KAUFMAN
-
Publication number: 20110320201
Abstract: An audio signal verification system is presented for verifying that a sound is from a predetermined source. Various methods for analyzing the sound are presented, and these methods may be combined to varying degrees to determine an appropriate correlation with a predefined pattern. Moreover, a confidence level or other indication may be used to indicate that the determination was successful. The sound may be reduced to templates of varying degrees of richness. Different templates may also be created using the same sound source, and different sounds from the same source may be aggregated to form a single template. Comparisons may be made between a sound, or a template derived from that sound, and stored sounds or templates derived from those stored sounds. Moreover, comparisons can be made using templates of different richness to achieve confidence levels, and confidence levels may be represented based on the results of the comparisons.
Type: Application
Filed: June 2, 2011
Publication date: December 29, 2011
Inventor: John D. KAUFMAN
-
Gaming machine with dialog output method according to victory or defeat of game, and control method thereof
Patent number: 8083587
Abstract: A slot machine 1 of the present invention performs control so as to: sequentially store the number of game values consumed per unit game; sequentially store the number of game values given per unit game; calculate a difference between the total number of game values given and the total number of game values consumed as a self game value difference; transmit the self game value difference to the outside; receive another machine's game value difference from the outside; when the self game value difference and the other game value difference are in a predetermined relationship, voice-output, by the conversation controller 91, an answer at a volume corresponding to the predetermined relationship from the speaker 23 in response to a voice input through the microphone 90; and delete the stored numbers of game values given and consumed under a predetermined condition.
Type: Grant
Filed: January 21, 2009
Date of Patent: December 27, 2011
Assignee: Aruze Gaming America, Inc.
Inventor: Kazuo Okada
-
Publication number: 20110307257
Abstract: A method and system for indicating in real time that an interaction is associated with a problem or issue, comprising: receiving a segment of an interaction in which a representative of the organization participates; extracting a feature from the segment; extracting a global feature associated with the interaction; aggregating the feature and the global feature; and classifying the segment or the interaction in association with the problem or issue by applying a model to the feature and the global feature. The method and system may also use features extracted from earlier segments within the interaction. The method and system can also evaluate the model based on features extracted from training interactions and manual tagging assigned to the interactions or segments thereof.
Type: Application
Filed: June 10, 2010
Publication date: December 15, 2011
Applicant: Nice Systems Ltd.
Inventors: Oren PEREG, Moshe WASSERBLAT, Yuval LUBOWICH, Ronen LAPERDON, Dori SHAPIRA, Vladislav FEIGIN, Oz FOX-KAHANA
-
Publication number: 20110301955
Abstract: Predicting and learning users' intended actions on an electronic device based on free-form speech input. Users' actions can be monitored to develop a list of carrier phrases having one or more actions that correspond to the carrier phrases. A user can speak a command into a device to initiate an action. The spoken command can be parsed and compared to the list of carrier phrases. If the spoken command matches one of the known carrier phrases, the corresponding action(s) can be presented to the user for selection. If the spoken command does not match one of the known carrier phrases, search results (e.g., Internet search results) corresponding to the spoken command can be presented to the user. The actions of the user in response to the presented action(s) and/or the search results can be monitored to update the list of carrier phrases.
Type: Application
Filed: June 7, 2010
Publication date: December 8, 2011
Applicant: GOOGLE INC.
Inventors: William J. Byrne, Alexander H. Gruenstein, Douglas Beeferman
-
Publication number: 20110282650
Abstract: A very common problem arises when people speak a language other than the one to which they are accustomed: syllables can be spoken for longer or shorter than the listener would regard as appropriate. An example of this can be observed when people who have a heavy Japanese accent speak English. Since Japanese words end with vowels, there is a tendency for native Japanese speakers to add a vowel sound to the end of English words that should end with a consonant. Illustratively, native Japanese speakers often pronounce "orange" as "orenji." An aspect provides an automatic speech-correcting process that would not necessarily need to know that fruit is being discussed; the system would only need to know that the speaker is accustomed to Japanese, that the listener is accustomed to English, that "orenji" is not a word in English, and that "orenji" is a typical Japanese mispronunciation of the English word "orange."
Type: Application
Filed: May 17, 2010
Publication date: November 17, 2011
Applicant: AVAYA INC.
Inventors: Terry Jennings, Paul Roller Michaelis
-
Publication number: 20110282667
Abstract: A plurality of statements are received from within a grammar structure. Each of the statements is formed by a number of word sets. A number of alignment regions across the statements are identified by aligning the statements on a word set basis. Each aligned word set represents an alignment region. A number of potential confusion zones are identified across the statements. Each potential confusion zone is defined by words from two or more of the statements at corresponding positions outside the alignment regions. For each of the identified potential confusion zones, phonetic pronunciations of the words within the potential confusion zone are analyzed to determine a measure of confusion probability between the words when audibly processed by a speech recognition system during the computing event. An identity of the potential confusion zones across the statements and their corresponding measure of confusion probability are reported to facilitate grammar structure improvement.
Type: Application
Filed: May 14, 2010
Publication date: November 17, 2011
Applicant: Sony Computer Entertainment Inc.
Inventor: Gustavo A. Hernandez-Abrego
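The alignment-region / confusion-zone idea can be shown on a toy grammar. This sketch assumes equal-length statements aligned word by word, and uses string similarity as a crude stand-in for the phonetic confusability analysis the abstract describes; the example statements and the rounding are illustrative.

```python
import difflib
from itertools import combinations

def confusion_zones(statements):
    """Align equal-length statements word by word. Positions where all
    statements agree are alignment regions; the remaining positions are
    potential confusion zones, scored by the highest pairwise string
    similarity among the differing words."""
    columns = zip(*(s.split() for s in statements))
    zones = []
    for pos, col in enumerate(columns):
        words = sorted(set(col))
        if len(words) > 1:  # the statements disagree at this position
            score = max(difflib.SequenceMatcher(None, a, b).ratio()
                        for a, b in combinations(words, 2))
            zones.append((pos, words, round(score, 2)))
    return zones

confusion_zones(["press the red button", "press the bed button"])
# [(2, ['bed', 'red'], 0.67)]: the statements differ only at position 2
```

A high score in a zone suggests the grammar author should reword one of the statements to keep the recognizer from confusing them.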
-
Publication number: 20110270612
Abstract: Systems and methods are provided for scoring non-native, spontaneous speech. A spontaneous speech sample is received, where the sample is of spontaneous speech spoken by a non-native speaker. Automatic speech recognition is performed on the sample using an automatic speech recognition system to generate a transcript of the sample, where a speech recognizer metric is determined by the automatic speech recognition system. A word accuracy rate estimate is determined for the transcript of the sample generated by the automatic speech recognition system based on the speech recognizer metric. The spontaneous speech sample is scored using a preferred scoring model when the word accuracy rate estimate satisfies a threshold, and the spontaneous speech sample is scored using an alternate scoring model when the word accuracy rate estimate fails to satisfy the threshold.
Type: Application
Filed: April 28, 2011
Publication date: November 3, 2011
Inventors: Su-Youn Yoon, Lei Chen, Klaus Zechner
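The model-routing step described above is a simple threshold branch. In this sketch the two scoring models, their feature weights, and the threshold are all invented placeholders; the only part taken from the abstract is the idea of switching models on an estimated word accuracy rate.

```python
def score_speech_sample(features, war_estimate, threshold=0.5):
    """Route scoring on the estimated word accuracy rate (WAR): when the
    ASR transcript looks reliable, use the preferred (transcript-based)
    model; otherwise fall back to an alternate model that uses only
    transcript-independent features. Both models are illustrative stubs."""
    preferred = lambda f: 0.6 * f["fluency"] + 0.4 * f["content"]
    alternate = lambda f: f["fluency"]
    model = preferred if war_estimate >= threshold else alternate
    return round(model(features), 2)

feats = {"fluency": 3.0, "content": 4.0}
score_speech_sample(feats, war_estimate=0.8)  # 3.4 via the preferred model
score_speech_sample(feats, war_estimate=0.3)  # 3.0 via the alternate model
```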
-
Publication number: 20110218802
Abstract: A computerized method for continuous speech recognition using a speech recognition engine and a phoneme model. The computerized method inputs a speech signal into the speech recognition engine. Based on the phoneme model, the speech signal is indexed by scoring for the phonemes of the phoneme model, and a time-ordered list of phoneme candidates and respective scores resulting from the scoring are produced. The phoneme candidates are input with the scores from the time-ordered list. Word transcription candidates are typically input from a dictionary, and words are built by selecting from the word transcription candidates based on the scores. A stream of transcriptions is outputted corresponding to the input speech signal. The stream of transcriptions is re-scored by searching for and detecting anomalous word transcriptions in the stream of transcriptions to produce second scores.
Type: Application
Filed: March 8, 2010
Publication date: September 8, 2011
Inventors: Shlomi Hai Bouganim, Boris Levant
-
Publication number: 20110218807
Abstract: The invention relates to a method for sentence planning (120) in a task classification system that interacts with a user. The method may include recognizing symbols in the user's input communication and determining whether the user's input communication can be understood. If the user's communication can be understood, understanding data may be generated (220). The method may further include generating communicative goals (3010) based on the recognized symbols and understanding data. The generated communicative goals (3010) may be related to information needed to be obtained from the user. The method may also include automatically planning one or more sentences (3020) based on the generated communicative goals and outputting at least one of the sentence plans to the user (3080).
Type: Application
Filed: May 18, 2011
Publication date: September 8, 2011
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Marilyn A. WALKER, Owen Christopher RAMBOW, Monica ROGATI
-
Publication number: 20110210822
Abstract: A refrigerator is provided. The refrigerator includes a voice recognition unit for recognizing a voice of a name of food, a memory for storing location information of the food received in a storage chamber, a controller for determining the voice recognized by the voice recognition unit and searching a storage location of the food voice-recognized in accordance with the recognized voice, and a voice output unit for outputting a voice message on the storage location information of the food searched by the controller.
Type: Application
Filed: September 11, 2008
Publication date: September 1, 2011
Applicant: LG Electronics Inc.
Inventors: Sung-Ae Lee, Min-Kyeong Kim
-
Publication number: 20110202341
Abstract: A device receives a voice recognition statistic from a voice recognition application and applies a grammar improvement rule based on the voice recognition statistic. The device also automatically adjusts a weight of the voice recognition statistic based on the grammar improvement rule, and outputs the weight adjusted voice recognition statistic for use in the voice recognition application.
Type: Application
Filed: April 29, 2011
Publication date: August 18, 2011
Applicant: VERIZON PATENT AND LICENSING INC.
Inventor: Kevin W. BROWN
-
Publication number: 20110191106
Abstract: One-to-many comparisons of callers' words and/or voice prints with known words and/or voice prints to identify any substantial matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract different words, such as words of anger. The system may also segment at least a portion of the customer's voice to create a tone profile, and it formats the segmented words and tone profiles for network transmission to a server. The server compares the customer's words and/or tone profiles with multiple known words and/or tone profiles stored on a database to determine any substantial matches. The identification of any matches may be used for a variety of purposes, such as providing representative feedback or customer follow-up.
Type: Application
Filed: April 12, 2011
Publication date: August 4, 2011
Applicant: American Express Travel Related Services Company, Inc.
Inventors: Chin H. Khor, Marcel Leyva, Vernon Marshall
-
Publication number: 20110184735
Abstract: Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational information related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor, and the confidence data is adjusted depending on this determination.
Type: Application
Filed: January 22, 2010
Publication date: July 28, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Jason Flaks, Dax Hawkins, Christian Klein, Mitchell Stephen Dernis, Tommer Leyvand, Ali M. Vassigh, Duncan McKay
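The acoustic-versus-visual comparison above amounts to checking whether the sound's direction of arrival lines up with a person the camera sees, then moving the confidence value accordingly. This is a hedged sketch: the angle tolerance, the boost and penalty factors, and the rounding are invented for illustration.

```python
def adjust_confidence(confidence, sound_angle, person_angles, tolerance=10.0):
    """Boost recognition confidence when the sound's direction of arrival
    (from the microphone array) matches the bearing of a person the camera
    sees; penalize it otherwise. Angles are in degrees; the 1.2x boost and
    0.5x penalty are illustrative values, capped at 1.0."""
    matched = any(abs(sound_angle - a) <= tolerance for a in person_angles)
    return round(min(confidence * (1.2 if matched else 0.5), 1.0), 2)

adjust_confidence(0.7, sound_angle=32.0, person_angles=[30.0, 75.0])   # 0.84
adjust_confidence(0.7, sound_angle=-40.0, person_angles=[30.0, 75.0])  # 0.35
```

Dropping the confidence when no person is visible in the sound's direction is what suppresses false positives from televisions, echoes, or off-screen speech.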
-
Publication number: 20110166860
Abstract: Systems and methods are disclosed to operate a mobile device by capturing user input; transmitting the user input over a wireless channel to an engine; analyzing, at the engine, a music clip or video in a multimedia data stream; and sending the analysis wirelessly to the mobile device.
Type: Application
Filed: July 12, 2010
Publication date: July 7, 2011
Inventor: Bao Q. Tran
-
Publication number: 20110144992
Abstract: Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.
Type: Application
Filed: December 15, 2009
Publication date: June 16, 2011
Applicant: Microsoft Corporation
Inventors: Kristina N. Toutanova, Colin Andrew Cherry, Hoifung Poon
-
Publication number: 20110144973
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for a speech recognition application for directory assistance that is based on a user's spoken search query. The spoken search query is received by a portable device, and the portable device then determines its present location. Upon determining the location of the portable device, that information is incorporated into a local language model that is used to process the search query. Finally, the portable device outputs the results of the search query based on the local language model.
Type: Application
Filed: December 15, 2009
Publication date: June 16, 2011
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Enrico Bocchieri, Diamantino Antonio Caseiro
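The effect of folding location into the language model can be shown with a toy rescoring pass: hypotheses naming a business listed near the device get a score boost. The listings, scores, and boost weight below are hypothetical; a real local language model would adjust n-gram probabilities rather than rescore an n-best list.

```python
def local_rescore(hypotheses, device_location, listings, boost=1.5):
    """Prefer recognition hypotheses that match a business listed near the
    device's present location."""
    nearby = set(listings.get(device_location, []))
    return max(hypotheses,
               key=lambda h: h[1] + (boost if h[0] in nearby else 0.0))[0]

# Two acoustically similar business names; location breaks the tie.
listings = {"Austin": ["Joe's Coffee"], "Boston": ["Joes Copies"]}
hyps = [("Joes Copies", 2.0), ("Joe's Coffee", 1.8)]
local_rescore(hyps, "Austin", listings)  # "Joe's Coffee" wins near Austin
```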
-
Publication number: 20110144993
Abstract: A disfluent-utterance tracking system includes a speech transducer; one or more targeted-disfluent-utterance records stored in a memory; a real-time speech recording mechanism operatively connected with the speech transducer for recording a real-time utterance; and an analyzer operatively coupled with the targeted-disfluent-utterance record and with the real-time speech recording mechanism, the analyzer configured to compare one or more real-time snippets of the recorded speech with the targeted-disfluent-utterance record to determine and indicate to a user a level of correlation therebetween.
Type: Application
Filed: December 15, 2009
Publication date: June 16, 2011
Inventor: David Ruby
-
Publication number: 20110131045
Abstract: Systems and methods are provided for receiving speech and non-speech communications of natural language questions and/or commands, transcribing the speech and non-speech communications to textual messages, and executing the questions and/or commands. The invention applies context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users presenting questions or commands across multiple domains. The systems and methods create, store, and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech and non-speech communications and presenting the expected results for a particular question or command.
Type: Application
Filed: February 2, 2011
Publication date: June 2, 2011
Applicant: VoiceBox Technologies, Inc.
Inventors: Philippe Di Cristo, Min Ke, Robert A. Kennewick, Lynn Elise Armstrong
-
Publication number: 20110131043
Abstract: The present invention enables high-speed recognition even when the grammar includes a large amount of garbage. A first voice recognition processing unit executes a voice recognition process on a voice feature amount of the input voice based on a first grammar, generating a recognition hypothesis graph that indicates the structure of the hypotheses derived according to the first grammar, together with a score associated with each connection of a recognition unit. A second voice recognition processing unit executes a voice recognition process according to a second grammar, which is specified to accept the sections of the input voice other than keywords as garbage sections; it acquires the structure and scores of the garbage sections from the recognition hypothesis graph and outputs the recognition result from the total score of a hypothesis derived according to the second grammar.
Type: Application
Filed: December 22, 2008
Publication date: June 2, 2011
Inventors: Fumihiro Adachi, Ryosuke Isotani, Ken Hanazawa
-
Publication number: 20110093268
Abstract: An apparatus, a method, and a machine-readable medium are provided for characterizing differences between two language models. A group of utterances from each of a group of time domains is examined. One of a significant word change or a significant word class change within the utterances is determined. A first cluster of utterances including a word or a word class corresponding to the significant word change or the significant word class change is generated from the utterances. A second cluster of utterances not including that word or word class is generated from the utterances.
Type: Application
Filed: September 14, 2010
Publication date: April 21, 2011
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Allen Louis Gorin, John Grothendieck, Jeremy Huntley Greet Wright
-
Publication number: 20110093269
Abstract: A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system comprises adjusting the rejection threshold when speech input matches the predetermined expected response.
Type: Application
Filed: December 30, 2010
Publication date: April 21, 2011
Inventors: Keith Braho, Amro El-Jaroudi, Jeffrey Pike
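The threshold-adjustment behavior described above is easy to sketch. The two threshold constants are illustrative, not values from the patent:

```python
# Sketch: relax the rejection threshold when the top hypothesis matches
# the response the dialog currently expects.

BASE_THRESHOLD = 0.70     # illustrative default rejection threshold
RELAXED_THRESHOLD = 0.50  # illustrative relaxed threshold

def accept(hypothesis, confidence, expected=None):
    """Accept a recognition result, using a lower bar for expected responses."""
    if expected is not None and hypothesis == expected:
        threshold = RELAXED_THRESHOLD
    else:
        threshold = BASE_THRESHOLD
    return confidence >= threshold
```

A mid-confidence "yes" that would normally be rejected is accepted when the dialog is expecting "yes".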
-
Publication number: 20110054901
Abstract: A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text and aligning the target text and the reference text at the word level based on phoneme similarity. The method can be applied to automatically archiving a multimedia resource and to automatically searching a multimedia resource.
Type: Application
Filed: August 27, 2010
Publication date: March 3, 2011
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Yong Qin, Qin Shi, Zhiwei Shuang, Shi Lei Zhang, Jie Zhou
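Word-level alignment by phoneme similarity can be sketched with a word-pair cost built from phoneme edit distance. The tiny `PHONEMES` lexicon below is a made-up stand-in for a real grapheme-to-phoneme step, and the cost function is only one plausible realization of the abstract's idea:

```python
# Toy word-similarity cost based on phoneme edit distance.

PHONEMES = {  # placeholder ARPAbet-style entries
    "write": "R AY T", "right": "R AY T",
    "red": "R EH D", "read": "R EH D", "cat": "K AE T",
}

def edit_distance(a, b):
    """Classic Levenshtein distance over two sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]

def word_cost(w1, w2):
    """Alignment cost: edit distance between the words' phoneme strings."""
    p1 = PHONEMES.get(w1, w1).split()
    p2 = PHONEMES.get(w2, w2).split()
    return edit_distance(p1, p2)
```

Note that homophones like "write"/"right" get cost 0, which is exactly what makes phoneme-based alignment robust to transcription spelling differences.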
-
Publication number: 20110044438
Abstract: During voice communication between multiple telecommunications devices, a shareable application facilitates concurrent sharing of data and processes between the devices. The application may be configured to monitor the voice communication and execute a predetermined function upon detecting a predetermined condition in the voice communication. The application may further facilitate sharing of functionality and user interface displays during the voice communication. In some implementations, a server computing device on a communications network may facilitate functions of shareable applications on one or more telecommunications devices.
Type: Application
Filed: August 20, 2009
Publication date: February 24, 2011
Applicant: T-Mobile USA, Inc.
Inventors: Winston Wang, Adam Holt, Jean-Luc Bouthemy, Michael Kemery
-
Publication number: 20110029313
Abstract: Methods are disclosed for identifying possible errors made by a speech recognition system without using a transcript of the words input to the system. A method for model adaptation for a speech recognition system includes determining an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of the words input to the system. The method may further include adjusting an adaptation of the model for the word, or of the various models for the various words, based on the error rate. An apparatus for model adaptation is also disclosed, including a processor adapted to estimate such an error rate without using a transcript of the words input to the system.
Type: Application
Filed: October 11, 2010
Publication date: February 3, 2011
Applicant: VOCOLLECT, INC.
Inventors: Keith P. Braho, Jeffrey P. Pike, Lori A. Pike
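One way to estimate an error rate without a transcript is to treat low-confidence recognitions as likely errors. The abstract does not commit to this exact heuristic, so the sketch below is an assumption, with illustrative threshold values:

```python
# Sketch: transcript-free error-rate estimate for one vocabulary word,
# using recognition confidence as a proxy for correctness.

def estimate_error_rate(recognitions, confidence_floor=0.5):
    """recognitions: list of (word, confidence) pairs for one word."""
    if not recognitions:
        return 0.0
    suspect = sum(1 for _, conf in recognitions if conf < confidence_floor)
    return suspect / len(recognitions)

def should_adapt(recognitions, error_threshold=0.2):
    """Trigger model adaptation only when the estimated error rate is high."""
    return estimate_error_rate(recognitions) > error_threshold
```

Gating adaptation on the estimated error rate, as the abstract describes, avoids needlessly re-training models for words the system already recognizes well.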
-
Publication number: 20110010175
Abstract: Provided are a text data processing apparatus, method, and program for adding a symbol at an appropriate position. The apparatus according to this embodiment executes editing of symbols in input text and includes symbol edit determination means 52, which determines whether a symbol edit is necessary based on the frequency of symbol insertion in a block consisting of a plurality of divided texts, and symbol edit position calculation means 53, which calculates the likelihood of the symbol edit based on the likelihood of symbol insertion for a word and the distance between symbols, and which calculates a symbol edit position in the block in accordance with the likelihood of the symbol edit or a word in the block when the symbol edit determination means determines that the symbol edit is necessary.
Type: Application
Filed: February 13, 2009
Publication date: January 13, 2011
Inventors: Tasuku Kitade, Takafumi Koshinaka
-
Publication number: 20110004462
Abstract: Speech recognition may be improved by generating and using a topic-specific language model. A topic-specific language model may be created by performing an initial pass on an audio signal using a generic or basis language model. A speech recognition device may then determine topics relating to the audio signal based on the words identified in the initial pass and retrieve a corpus of text relating to those topics. Using the retrieved corpus of text, the speech recognition device may create a topic-specific language model. In one example, the speech recognition device may adapt or otherwise modify the generic language model based on the retrieved corpus of text.
Type: Application
Filed: July 1, 2009
Publication date: January 6, 2011
Applicant: COMCAST INTERACTIVE MEDIA, LLC
Inventors: David F. Houghton, Seth Michael Murray, Sibley Verbeck Simon
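The topic-detection step of the two-pass scheme above can be sketched by counting salient words from the first-pass output. The stopword list and "top-N most frequent content words" rule are illustrative assumptions, not the patent's actual topic detector:

```python
# Sketch: pick topics from first-pass recognition output by frequency,
# ignoring stopwords; the topics would then drive corpus retrieval and
# language-model adaptation for the second pass.
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "is"}  # illustrative list

def detect_topics(first_pass_words, top_n=2):
    """Return the top-N most frequent non-stopword words as topics."""
    counts = Counter(w for w in first_pass_words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]
```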
-
Publication number: 20100332230
Abstract: Phonetic distances are empirically measured as a function of speech recognition engine recognition error rates. The error rates are determined by comparing a recognized speech file with a reference file. The phonetic distances can be normalized to earlier measurements. The phonetic distances/error rates can also be used to improve speech recognition engine grammar selection, as an aid in language training and evaluation, and in other applications.
Type: Application
Filed: June 25, 2009
Publication date: December 30, 2010
Applicant: ADACEL SYSTEMS, INC.
Inventor: Chang-Qing Shu
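The core idea, that phonemes confused often by the recognizer are phonetically "close", can be sketched from a confusion count table. The counts and the specific distance formula below are fabricated for illustration:

```python
# Sketch: empirical phonetic distance from recognition confusion counts.
# confusions[(x, y)] = number of times reference phoneme x was
# recognized as phoneme y when comparing output against a reference file.

def phonetic_distance(confusions, a, b):
    """1.0 means never confused; lower means more often confused."""
    total = sum(n for (x, _), n in confusions.items() if x == a)
    if total == 0:
        return 1.0  # no data for phoneme a; treat as maximally distant
    confused = confusions.get((a, b), 0)
    return 1.0 - confused / total
```

If "P" is recognized as "B" 30% of the time, its distance to "B" is 0.7, while a never-confused pair stays at 1.0.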
-
Publication number: 20100332231
Abstract: A lexical acquisition apparatus includes: a phoneme recognition section 2 for preparing a phoneme sequence candidate from an inputted speech; a word matching section 3 for preparing a plurality of word sequences based on the phoneme sequence candidate; a discrimination section 4 for selecting, from among the plurality of word sequences, a word sequence having a high likelihood in a recognition result; an acquisition section 5 for acquiring a new word based on the word sequence selected by the discrimination section 4; a teaching word list 4A used to teach a name; and a probability model 4B of the teaching word and an unknown word, wherein the discrimination section 4 calculates, for each word sequence, a first evaluation value showing how well the words in the word sequence correspond to teaching words in the list 4A and a second evaluation value showing the probability at which the words in the word sequence are adjacent to one another, and selects a word sequence for which a sum of the first evaluation value and the …
Type: Application
Filed: June 1, 2010
Publication date: December 30, 2010
Applicants: Honda Motor Co., Ltd., Advanced Telecommunications Research Institute International
Inventors: Mikio Nakano, Takashi Nose, Ryo Taguchi, Kotaro Funakoshi, Naoto Iwahashi
-
Publication number: 20100328066
Abstract: Methods, systems, and articles of manufacture are provided for administering sobriety tests to online gamblers, as well as for determining whether, when, and to whom to administer a sobriety test. Various mediation events to be initiated upon certain results of such sobriety tests are also disclosed.
Type: Application
Filed: June 24, 2010
Publication date: December 30, 2010
Inventors: Jay S. Walker, Zachary T. Smith, Magdalena M. Fincham
-
Publication number: 20100324900
Abstract: A computerized method of detecting a target word in a speech signal. A speech recognition engine and a previously constructed phoneme model are provided. The speech signal is input into the speech recognition engine and, based on the phoneme model, is indexed. A time-ordered list is stored representing the n-best phoneme candidates of the input speech signal and the phonemes of the input speech signal in multiple phoneme frames. The target word is transcribed into a transcription of target phonemes. The time-ordered list of n-best phoneme candidates is searched for a locus of said target phonemes. While searching, scoring is based on the ranking of the phoneme candidates among the n-best phoneme candidates and on the number of the target phonemes found. A composite score of the probability of an occurrence of the target word is produced. When the composite score is higher than a threshold, start and finish times that bound the locus are output.
Type: Application
Filed: June 19, 2009
Publication date: December 23, 2010
Inventors: Ronen Faifkov, Rabin Cohen-Tov, Adam Simone
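The rank-based composite scoring described above can be sketched as follows. The rank-to-credit mapping (1/rank) and the threshold are illustrative choices; the patent's actual scoring is more involved:

```python
# Toy keyword spotting over per-frame n-best phoneme lists: each target
# phoneme earns credit inversely proportional to its rank in that
# frame's candidate list, and the average is compared to a threshold.

def composite_score(target_phonemes, nbest_frames):
    """nbest_frames: one ranked phoneme candidate list per frame."""
    score = 0.0
    for phone, frame in zip(target_phonemes, nbest_frames):
        if phone in frame:
            score += 1.0 / (frame.index(phone) + 1)  # rank-based credit
    return score / max(len(target_phonemes), 1)

def detect(target_phonemes, nbest_frames, threshold=0.5):
    """True when the composite score clears the detection threshold."""
    return composite_score(target_phonemes, nbest_frames) >= threshold
```

For the target "K AE T", two first-rank hits plus one second-rank hit give a score of about 0.83, well above the 0.5 threshold.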
-
Publication number: 20100318356
Abstract: Textual transcription of speech is generated and formatted according to user-specified transformation and behavior requirements for a speech recognition system having input grammars and transformations. An apparatus may include a speech recognition platform configured to receive a user-specified transformation requirement, recognize speech in speech data into recognized speech according to a set of recognition grammars, and apply transformations to the recognized speech according to the user-specified transformation requirement. The apparatus may further be configured to receive a user-specified behavior requirement and transform the recognized speech according to the behavior requirement. Other embodiments are described and claimed.
Type: Application
Filed: June 12, 2009
Publication date: December 16, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Jonathan E. Hamaker, Keith C. Herold
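Applying user-specified transformations to recognized text could look like the sketch below, where each requirement is represented as an ordered regex rewrite rule. The rule format is an assumption; the patent's actual requirement format is not given in the abstract:

```python
# Sketch: apply an ordered list of user-specified rewrite rules
# (pattern, replacement) to recognized speech text.
import re

def apply_transformations(text, rules):
    """Run each regex rule over the text in order and return the result."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text
```

For example, a user might require spoken numbers and currency words to be formatted as symbols in the transcript.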
-
Publication number: 20100292989
Abstract: The invention enables symbol insertion evaluation in consideration of differences in speaking style features between speakers. For a word sequence transcribing voice information, the symbol insertion likelihood calculation means 113 obtains a symbol insertion likelihood for each of a plurality of symbol insertion models supplied for different speaking style features. The speaking style feature similarity calculation means 112 obtains a similarity between the speaking style feature of the word sequence and the plurality of speaking style feature models. The symbol insertion evaluation means 114 weights the symbol insertion likelihood obtained for the word sequence by each of the plurality of symbol insertion models according to that similarity and the relevance between the symbol insertion model and the speaking style feature model, and performs symbol insertion evaluation on the word sequence.
Type: Application
Filed: January 19, 2009
Publication date: November 18, 2010
Inventors: Tasuku Kitade, Takafumi Koshinaka
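The weighting step can be sketched as a similarity-weighted average of the style-specific models' likelihoods. The normalization scheme and all numbers are illustrative assumptions:

```python
# Sketch: combine per-style symbol-insertion likelihoods, weighting each
# model by how similar the speaker's style is to that model's style.

def weighted_insertion_likelihood(model_likelihoods, style_similarities):
    """Both arguments are dicts keyed by style-model name."""
    total_sim = sum(style_similarities.values())
    if total_sim == 0:
        return 0.0
    return sum(model_likelihoods[m] * style_similarities[m] / total_sim
               for m in model_likelihoods)
```

A speaker whose style closely matches the "formal" model is thus scored mostly by the formal model's insertion likelihood.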
-
Publication number: 20100286984
Abstract: A method for the voice recognition of a spoken expression comprising a plurality of expression parts to be recognized. Partial voice recognition takes place on a first selected expression part and, depending on a selection of hits for the first expression part detected by the partial voice recognition, voice recognition on the first and further expression parts is executed.
Type: Application
Filed: June 18, 2008
Publication date: November 11, 2010
Inventors: Michael Wandinger, Jesus Fernando Guitarte Perez, Bernhard Littel
-
Publication number: 20100280828
Abstract: Techniques are described that generally relate to systems, methods, and devices designed to selectively filter offensive communications in accordance with a user's intentions. Example methods may be designed to filter (such as by deleting, blocking, replacing, and/or modifying) various offensive words, phrases, and/or sounds that have been identified as having offensive meanings.
Type: Application
Filed: April 30, 2009
Publication date: November 4, 2010
Inventors: Gene Fein, Edward Merritt
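The delete/block/replace modes described above can be sketched directly. The blocklist entry is a mild placeholder, and the three mode names are illustrative labels, not the patent's terminology:

```python
# Sketch: selectively filter flagged words per the user's chosen policy.

BLOCKLIST = {"darn": "dang"}  # placeholder word -> replacement entries

def filter_text(text, mode="mask"):
    """mode: 'delete' drops the word, 'mask' stars it out,
    anything else substitutes the replacement word."""
    words = []
    for w in text.split():
        if w.lower() in BLOCKLIST:
            if mode == "delete":
                continue
            words.append("*" * len(w) if mode == "mask"
                         else BLOCKLIST[w.lower()])
        else:
            words.append(w)
    return " ".join(words)
```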