Creating Patterns For Matching Patents (Class 704/243)
-
Patent number: 8914283
Abstract: A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data.
Type: Grant
Filed: August 5, 2013
Date of Patent: December 16, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Zeynep Hakkani-Tur, Giuseppe Riccardi
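The combination of active and unsupervised learning this abstract describes is commonly realized by partitioning recognizer output on confidence: low-confidence utterances go to human transcribers, high-confidence ones are kept with their automatic transcripts. The sketch below is a hypothetical illustration only; the `split_for_training` helper and the 0.8 threshold are assumptions, not the patented method.

```python
def split_for_training(utterances, threshold=0.8):
    """Partition ASR hypotheses by confidence: low-confidence utterances
    are routed to human transcription (active learning); high-confidence
    ones keep their automatic transcripts (unsupervised learning)."""
    to_transcribe, auto_labeled = [], []
    for text, confidence in utterances:
        if confidence < threshold:
            to_transcribe.append(text)
        else:
            auto_labeled.append(text)
    return to_transcribe, auto_labeled
```

In practice the threshold trades transcription cost against label noise in the automatically labeled set.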
-
Patent number: 8914290
Abstract: Method and apparatus that dynamically adjusts operational parameters of a text-to-speech engine in a speech-based system. A voice engine or other application of a device provides a mechanism to alter the adjustable operational parameters of the text-to-speech engine. In response to one or more environmental conditions, the adjustable operational parameters of the text-to-speech engine are modified to increase the intelligibility of synthesized speech.
Type: Grant
Filed: May 18, 2012
Date of Patent: December 16, 2014
Assignee: Vocollect, Inc.
Inventors: James Hendrickson, Debra Drylie Scott, Duane Littleton, John Pecorari, Arkadiusz Slusarczyk
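A minimal sketch of the kind of environment-driven adjustment this abstract describes: as ambient noise rises, synthesized speech is made louder and slower. The `adapt_tts_parameters` function, its decibel thresholds, and the volume/rate deltas are all invented for illustration, not taken from the patent.

```python
def adapt_tts_parameters(noise_db, base_volume=0.5, base_rate=1.0):
    """Map a measured ambient noise level (dB) to TTS engine settings.
    Hypothetical mapping: louder, slower speech as noise increases."""
    if noise_db > 80:
        return {"volume": min(1.0, base_volume + 0.4), "rate": base_rate * 0.8}
    if noise_db > 60:
        return {"volume": min(1.0, base_volume + 0.2), "rate": base_rate * 0.9}
    return {"volume": base_volume, "rate": base_rate}
```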
-
Patent number: 8914278
Abstract: A computer-assisted language correction system including spelling correction functionality, misused word correction functionality, grammar correction functionality and vocabulary enhancement functionality utilizing contextual feature-sequence functionality employing an internet corpus.
Type: Grant
Filed: July 31, 2008
Date of Patent: December 16, 2014
Assignee: Ginger Software, Inc.
Inventors: Yael Karov Zangvil, Avner Zangvil
-
Patent number: 8909534
Abstract: A method may include selecting, by a computing device, sets of two or more text candidates from a plurality of text candidates corresponding to vocal input. The method may further include, for each set, providing, by the computing device, representations of each of the respective two or more text candidates in the set to users, wherein the representations are provided as audio. The method may further include receiving a selection from each of the users of one text candidate from the set, wherein the selection is based on satisfying a criterion. The method may further include determining that a text candidate included in the plurality of text candidates has a highest probability out of the plurality of text candidates of being a correct textual transcription of the vocal input based at least in part on selections from the users.
Type: Grant
Filed: March 9, 2012
Date of Patent: December 9, 2014
Assignee: Google Inc.
Inventor: Taliver Heath
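Determining the most probable transcription from user selections can be sketched as a simple majority vote over the candidates users picked. This `best_transcription` helper is an illustrative assumption; the patent's actual probability criterion may be more elaborate.

```python
from collections import Counter

def best_transcription(selections):
    """Return the candidate chosen most often by users; the winner is
    treated as the most probable correct transcription of the vocal input."""
    return Counter(selections).most_common(1)[0][0]
```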
-
Patent number: 8909528
Abstract: A method (and system) of determining confusable list items and resolving this confusion in a spoken dialog system includes receiving user input, processing the user input and determining if a list of items needs to be played back to the user, retrieving the list to be played back to the user, identifying acoustic confusions between items on the list, changing the items on the list as necessary to remove the acoustic confusions, and playing unambiguous list items back to the user.
Type: Grant
Filed: May 9, 2007
Date of Patent: December 9, 2014
Assignee: Nuance Communications, Inc.
Inventors: Ellen Marie Eide, Vaibhava Goel, Ramesh Gopinath, Osamuyimen T. Stewart
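The "identify acoustic confusions, then change items" loop above can be approximated with a string-similarity proxy for acoustic distance: pairs of item names that are too similar get a distinguishing suffix. This is a sketch under stated assumptions; real systems would compare phone sequences, not spellings, and the `disambiguate` name and 0.8 threshold are invented here.

```python
from difflib import SequenceMatcher

def disambiguate(items, threshold=0.8):
    """Append a distinguishing suffix to list items whose names are
    acoustically close (approximated here by surface-string similarity)."""
    out = list(items)
    for i in range(len(out)):
        for j in range(i + 1, len(out)):
            if SequenceMatcher(None, out[i], out[j]).ratio() >= threshold:
                out[j] = f"{out[j]} (option {j + 1})"
    return out
```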
-
Patent number: 8909529
Abstract: The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.
Type: Grant
Filed: November 15, 2013
Date of Patent: December 9, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Giuseppe Riccardi
-
Publication number: 20140358539
Abstract: A method includes: acquiring data samples; performing categorized sentence mining in the acquired data samples to obtain categorized training samples for multiple categories; building a text classifier based on the categorized training samples; classifying the data samples using the text classifier to obtain a class vocabulary and a corpus for each category; mining the corpus for each category according to the class vocabulary for the category to obtain a respective set of high-frequency language templates; training on the templates for each category to obtain a template-based language model for the category; training on the corpus for each category to obtain a class-based language model for the category; training on the class vocabulary for each category to obtain a lexicon-based language model for the category; building a speech decoder according to an acoustic model, the class-based language model and the lexicon-based language model for any given field, and the data samples.
Type: Application
Filed: February 14, 2014
Publication date: December 4, 2014
Applicant: Tencent Technology (Shenzhen) Company Limited
Inventors: Feng Rao, Li Lu, Bo Chen, Xiang Zhang, Shuai Yue, Lu Li
-
Publication number: 20140358538
Abstract: Methods and systems are provided for shaping speech dialog of a speech system. In one embodiment, a method includes: receiving data related to a first utterance from a user of the speech system; processing the data based on at least one attribute processing technique that determines at least one attribute of the first utterance; determining a shaping pattern based on the at least one attribute; and generating a speech prompt based on the shaping pattern.
Type: Application
Filed: May 28, 2013
Publication date: December 4, 2014
Applicant: GM Global Technology Operations LLC
Inventors: Ron M. Hecht, Eli Tzirkel-Hancock, Omer Tsimhoni, Ute Winter
-
Publication number: 20140350931
Abstract: A Statistical Machine Translation (SMT) model is trained using pairs of sentences that include content obtained from one or more content sources (e.g. feed(s)) with corresponding queries that have been used to access the content. A query click graph may be used to assist in determining candidate pairs for the SMT training data. All or a portion of the candidate pairs may be used to train the SMT model. After training the SMT model using the SMT training data, the SMT model is applied to content to determine predicted queries that may be used to search for the content. The predicted queries are used to train a language model, such as a query language model. The query language model may be interpolated with other language models, such as a background language model, as well as a feed language model trained using the content used in determining the predicted queries.
Type: Application
Filed: May 24, 2013
Publication date: November 27, 2014
Applicant: Microsoft Corporation
Inventors: Michael Levit, Dilek Hakkani-Tur, Gokhan Tur
-
Patent number: 8892437
Abstract: Example embodiments of the present invention may include a method that provides transcribing spoken utterances occurring during a call and assigning each of the spoken utterances with a corresponding set of first classifications. The method may also include determining a confidence rating associated with each of the spoken utterances and the assigned set of first classifications, and performing at least one of reclassifying the spoken utterances with new classifications based on at least one additional classification operation, and adding the assigned first classifications and the corresponding plurality of spoken utterances to a training data set.
Type: Grant
Filed: November 13, 2013
Date of Patent: November 18, 2014
Assignee: West Corporation
Inventor: Silke Witt-Ehsani
-
Patent number: 8892436
Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting the frames of the first speech into the frames of the second speech by reflecting at least one frame from among the frames that are previously positioned with respect to a frame of the first speech.
Type: Grant
Filed: October 19, 2011
Date of Patent: November 18, 2014
Assignees: Samsung Electronics Co., Ltd., Seoul National University Industry Foundation
Inventors: Ki-wan Eom, Chang-woo Han, Tae-gyoon Kang, Nam-soo Kim, Doo-hwa Hong, Jae-won Lee, Hyung-joon Lim
-
Publication number: 20140337026
Abstract: A method and system for generating training data for a target domain using speech data of a source domain. The training data generation method including: reading out a Gaussian mixture model (GMM) of a target domain trained with a clean speech data set of the target domain; mapping, by referring to the GMM of the target domain, a set of source domain speech data received as an input to the set of target domain speech data on a basis of a channel characteristic of the target domain speech data; and adding a noise of the target domain to the mapped set of source domain speech data to output a set of pseudo target domain speech data.
Type: Application
Filed: April 14, 2014
Publication date: November 13, 2014
Applicant: International Business Machines Corporation
Inventors: Osamu Ichikawa, Steven J. Rennie
-
Patent number: 8886535
Abstract: A method of optimizing the calculation of matching scores between phone states and acoustic frames across a matrix of an expected progression of phone states aligned with an observed progression of acoustic frames within an utterance is provided. The matrix has a plurality of cells associated with a characteristic acoustic frame and a characteristic phone state. A first set and second set of cells that meet a threshold probability of matching a first phone state or a second phone state, respectively, are determined. The phone states are stored on a local cache of a first core and a second core, respectively. The first and second sets of cells are also provided to the first core and second core, respectively. Further, matching scores of each characteristic state and characteristic observation of each cell of the first set of cells and of the second set of cells are calculated.
Type: Grant
Filed: January 23, 2014
Date of Patent: November 11, 2014
Assignee: Accumente, LLC
Inventors: Jike Chong, Ian Richard Lane, Senaka Wimal Buthpitiya
-
Patent number: 8886533
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
Type: Grant
Filed: October 25, 2011
Date of Patent: November 11, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
-
Patent number: 8886534
Abstract: A speech recognition apparatus includes a speech input unit that receives input speech, a phoneme recognition unit that recognizes phonemes of the input speech and generates a first phoneme sequence representing corrected speech, a matching unit that matches the first phoneme sequence with a second phoneme sequence representing original speech, and a phoneme correcting unit that corrects phonemes of the second phoneme sequence based on the matching result.
Type: Grant
Filed: January 27, 2011
Date of Patent: November 11, 2014
Assignee: Honda Motor Co., Ltd.
Inventors: Mikio Nakano, Naoto Iwahashi, Kotaro Funakoshi, Taisuke Sumii
-
Patent number: 8886543
Abstract: System and methods for characterizing interest points within a fingerprint are disclosed herein. The systems include generating a set of interest points and an anchor point related to an audio sample. A quantized absolute frequency of an anchor point can be calculated and used to calculate a set of quantized ratios. A fingerprint can then be generated based upon the set of quantized ratios and used in comparison to reference fingerprints to identify the audio sample. The disclosed systems and methods provide for an audio matching system robust to pitch-shift distortion by using quantized ratios within fingerprints rather than solely using absolute frequencies of interest points. Thus, the disclosed system and methods result in more accurate audio identification.
Type: Grant
Filed: November 15, 2011
Date of Patent: November 11, 2014
Assignee: Google Inc.
Inventors: Matthew Sharifi, George Tzanetakis, Annie Chen, Dominik Roblek
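The pitch-shift robustness claimed above rests on a simple observation: a uniform pitch shift scales every frequency by the same factor, so ratios between interest-point frequencies and an anchor frequency are unchanged. A minimal sketch, with the `fingerprint` function, bin count, and ratio cap all assumed for illustration:

```python
def fingerprint(anchor_freq, interest_freqs, num_bins=32, max_ratio=4.0):
    """Quantize each interest point's frequency ratio to the anchor
    instead of its absolute frequency; the resulting codes survive a
    uniform pitch shift because both numerator and denominator scale."""
    codes = []
    for f in interest_freqs:
        ratio = min(f / anchor_freq, max_ratio)
        codes.append(int(ratio / max_ratio * (num_bins - 1)))
    return tuple(codes)
```

A 1.5x pitch-shifted copy of a sample produces the same codes, so it still matches the reference fingerprint.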
-
Patent number: 8880402
Abstract: A speech recognition method includes receiving input speech from a user, processing the input speech to obtain at least one parameter value, and determining an experience level of the user using the parameter value(s). The method can also include prompting the user based upon the determined experience level of the user to assist the user in delivering speech commands.
Type: Grant
Filed: October 28, 2006
Date of Patent: November 4, 2014
Assignee: General Motors LLC
Inventors: Ryan J. Wasson, John P. Weiss, Jason W. Clark
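One plausible reading of "determining an experience level ... using the parameter value(s)" is a threshold heuristic over interaction statistics. The `experience_level` function below, its inputs (average pause length, command error count), and its cutoffs are all hypothetical, offered only to make the idea concrete:

```python
def experience_level(avg_pause_s, command_errors):
    """Hypothetical heuristic: long pauses and frequent misrecognized
    commands suggest a novice who should receive fuller prompts."""
    if avg_pause_s > 1.5 or command_errors > 3:
        return "novice"
    if avg_pause_s > 0.7 or command_errors > 1:
        return "intermediate"
    return "expert"
```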
-
Patent number: 8880397
Abstract: Exemplary embodiments provide systems, devices and methods that allow creation and management of lists of items in an integrated manner on an interactive graphical user interface. A user may speak a plurality of list items in a natural unbroken manner to provide an audio input stream into an audio input device. Exemplary embodiments may automatically process the audio input stream to convert the stream into a text output, and may process the text output into one or more n-grams that may be used as list items to populate a list on a user interface.
Type: Grant
Filed: October 21, 2011
Date of Patent: November 4, 2014
Assignee: Wal-Mart Stores, Inc.
Inventors: Dion Almaer, Bernard Paul Cousineau, Ben Galbraith
-
Publication number: 20140324427
Abstract: A dialog manager and spoken dialog service having a dialog manager generated according to a method comprising selecting a top level flow controller based on application type, selecting available reusable subdialogs for each application part, developing a subdialog for each application part not having an available subdialog, and testing and deploying the spoken dialog service using the selected top level flow controller, selected reusable subdialogs and developed subdialogs. The dialog manager is capable of handling context shifts in a spoken dialog with a user. Application dependencies are established in the top level flow controller, thus enabling the subdialogs to be reusable and to be capable of managing context shifts and mixed initiative dialogs.
Type: Application
Filed: April 25, 2014
Publication date: October 30, 2014
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Giuseppe Di Fabbrizio, Charles Alfred Lewis
-
Patent number: 8868423
Abstract: Systems and methods for controlling access to resources using spoken Completely Automatic Public Turing Tests To Tell Humans And Computers Apart (CAPTCHA) tests are disclosed. In these systems and methods, entities seeking access to resources are required to produce an input utterance that contains at least some audio. That utterance is compared with voice reference data for human and machine entities, and a determination is made as to whether the entity requesting access is a human or a machine. Access is then permitted or refused based on that determination.
Type: Grant
Filed: July 11, 2013
Date of Patent: October 21, 2014
Assignee: John Nicholas and Kristin Gross Trust
Inventor: John Nicholas Gross
-
Patent number: 8868410
Abstract: The invention provides a dialogue-based learning apparatus through dialogue with users comprising: a speech input unit (10) for inputting speeches; a speech recognition unit (20) for recognizing the input speech; and a behavior and dialogue controller (30) for controlling behaviors and dialogues according to speech recognition results, wherein the behavior and dialogue controller (30) has a topic recognition expert (34) to memorise contents of utterances and to retrieve the topic that best matches the speech recognition results, and a mode switching expert (35) to control mode switching in accordance with a user utterance, wherein the mode switching expert switches modes in accordance with a user utterance, wherein the topic recognition expert registers a plurality of words in the utterance as topics in a first mode, performs searches from among the registered topics, and selects the maximum likelihood topic in a second mode.
Type: Grant
Filed: August 29, 2008
Date of Patent: October 21, 2014
Assignees: National Institute of Information and Communications Technology, Honda Motor Co., Ltd.
Inventors: Naoto Iwahashi, Noriyuki Kimura, Mikio Nakano, Kotaro Funakoshi
-
Patent number: 8862468
Abstract: A system and method of refining context-free grammars (CFGs). The method includes deriving back-off grammar (BOG) rules from an initially developed CFG and utilizing the initial CFG and the derived BOG rules to recognize user utterances. Based on a response of the initial CFG and the derived BOG rules to the user utterances, at least a portion of the derived BOG rules are utilized to modify the initial CFG and thereby produce a refined CFG. The above method can be carried out iteratively, with each new iteration utilizing a refined CFG from preceding iterations.
Type: Grant
Filed: December 22, 2011
Date of Patent: October 14, 2014
Assignee: Microsoft Corporation
Inventors: Timothy Paek, Max Chickering, Eric Badger
-
Patent number: 8856005
Abstract: A method for receiving processed information at a remote device is described. The method includes transmitting from the remote device a verbal request to a first information provider and receiving a digital message from the first information provider in response to the transmitted verbal request. The digital message includes a symbolic representation indicator associated with a symbolic representation of the verbal request and data used to control an application. The method also includes transmitting, using the application, the symbolic representation indicator to a second information provider for generating results to be displayed on the remote device.
Type: Grant
Filed: January 8, 2014
Date of Patent: October 7, 2014
Assignee: Google Inc.
Inventors: Gudmundur Hafsteinsson, Michael J. LeBeau, Natalia Marmasse, Sumit Agarwal, Dipchad Nishar
-
Patent number: 8856002
Abstract: A universal pattern processing system receives input data and produces output patterns that are best associated with said data. The system uses input means receiving and processing input data, a universal pattern decoder means transforming models using the input data and associating output patterns with original models that are changed least during transforming, and output means outputting best associated patterns chosen by a pattern decoder means.
Type: Grant
Filed: April 11, 2008
Date of Patent: October 7, 2014
Assignee: International Business Machines Corporation
Inventors: Dimitri Kanevsky, David Nahamoo, Tara N. Sainath
-
Patent number: 8849662
Abstract: A method and a system for segmenting phonemes from voice signals. A method for accurately segmenting phonemes, in which a histogram showing a peak distribution corresponding to an order is formed by using a high order concept, and a boundary indicating a starting point and an ending point of each phoneme is determined by calculating a peak statistic based on the histogram. The phoneme segmentation method can remarkably reduce an amount of calculation, and has an advantage of being applied to sound signal systems which perform sound coding, sound recognition, sound synthesizing, sound reinforcement, etc.
Type: Grant
Filed: December 28, 2006
Date of Patent: September 30, 2014
Assignee: Samsung Electronics Co., Ltd.
Inventor: Hyun-Soo Kim
-
Patent number: 8849660
Abstract: Systems and methods for training voice activation control of electronic equipment are disclosed. One example method includes receiving a selection corresponding to at least one command used to control the electronic equipment. The method further includes instructing a user to speak, and responsive to the instruction, receiving a digitized speech stream. The method further includes segmenting the speech stream into speech segments, storing at least one of the speech segments as an entry in a dictionary, and associating the dictionary entry with the selected command.
Type: Grant
Filed: December 14, 2007
Date of Patent: September 30, 2014
Inventors: Arturo A. Rodriguez, David A. Sedacca, Albert Garcia
-
Patent number: 8843371
Abstract: The instant application includes computationally-implemented systems and methods that include managing adaptation data, the adaptation data is at least partly based on at least one speech interaction of a particular party, facilitating transmission of the adaptation data to a target device when there is an indication of a speech-facilitated transaction between the target device and the particular party, such that the adaptation data is to be applied to the target device to assist in execution of the speech-facilitated transaction, and facilitating acquisition of adaptation result data that is based on at least one aspect of the speech-facilitated transaction and to be used in determining whether to modify the adaptation data. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
Type: Grant
Filed: August 1, 2012
Date of Patent: September 23, 2014
Assignee: Elwha LLC
Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud
-
Patent number: 8843370
Abstract: Adjusting model parameters is described for a speech recognition system that combines recognition outputs from multiple speech recognition processes. Discriminative adjustments are made to model parameters of at least one acoustic model based on a joint discriminative criterion over multiple complementary acoustic models to lower recognition word error rate in the system.
Type: Grant
Filed: November 26, 2007
Date of Patent: September 23, 2014
Assignee: Nuance Communications, Inc.
Inventors: Daniel Willett, Chuang He
-
Publication number: 20140278413
Abstract: An electronic device with one or more processors and memory includes a procedure for training a digital assistant. In some embodiments, the device detects an impasse in a dialogue between the digital assistant and a user including a speech input. During a learning session, the device utilizes a subsequent clarification input from the user to adjust intent inference or task execution associated with the speech input to produce a satisfactory response. In some embodiments, the device identifies a pattern of success or failure associated with an aspect previously used to complete a task and generates a hypothesis regarding a parameter used in speech recognition, intent inference or task execution as a cause for the pattern. Then, the device tests the hypothesis by altering the parameter for a subsequent completion of the task and adopts or rejects the hypothesis based on feedback information collected from the subsequent completion.
Type: Application
Filed: March 14, 2014
Publication date: September 18, 2014
Applicant: Apple Inc.
Inventors: Donald W. Pitschel, Adam J. Cheyer, Christopher D. Brigham, Thomas R. Gruber
-
Publication number: 20140278395
Abstract: A method and apparatus for determining a motion environment profile to adapt voice recognition processing includes a device receiving an acoustic signal including a speech signal, which is provided to a voice recognition module. The method also includes determining a motion profile for the device, determining a temperature profile for the device, and determining a noise profile for the acoustic signal. The method further includes determining, from the motion, temperature, and noise profiles, a motion environment profile for the device and adapting voice recognition processing for the speech signal based on the motion environment profile.
Type: Application
Filed: July 31, 2013
Publication date: September 18, 2014
Applicant: Motorola Mobility LLC
Inventors: Robert A. Zurek, Kevin J. Bastyr, Giles T. Davis, Plamen A. Ivanov, Adrian M. Schuster
-
Publication number: 20140257810
Abstract: According to an embodiment, a pattern classifier device includes a decision unit, an execution unit, a calculator, and a determination unit. The decision unit is configured to decide a subclass to which the input pattern is to belong, based on attribute information of the input pattern. The execution unit is configured to determine whether the input pattern belongs to a class that is divided into subclasses, using a weak classifier allocated to the decided subclass, and output a result of the determination and a reliability of the weak classifier. The calculator is configured to calculate an integrated value obtained by integrating an evaluation value based on the determination result and the reliability. The determination unit is configured to repeat the determination processing when a termination condition of the determination processing is not satisfied, and terminate the determination processing and output the integrated value when the termination condition has been satisfied.
Type: Application
Filed: January 24, 2014
Publication date: September 11, 2014
Applicant: Kabushiki Kaisha Toshiba
Inventors: Hiroshi Fujimura, Takashi Masuko
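The weak-classifier loop with a termination condition described above resembles a boosting-style cascade: reliability-weighted votes accumulate into an integrated value, and evaluation stops early once the score clears a margin. The `cascade_classify` sketch below is an assumed illustration of that pattern, not the publication's exact procedure:

```python
def cascade_classify(x, weak_classifiers, margin=2.0):
    """Accumulate reliability-weighted votes from weak classifiers and
    stop early once the integrated value clears a confidence margin
    (a simple form of the termination condition)."""
    score = 0.0
    for clf, reliability in weak_classifiers:
        score += reliability * (1.0 if clf(x) else -1.0)
        if abs(score) >= margin:
            break  # termination condition satisfied; skip remaining classifiers
    return score
```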
-
Patent number: 8831763
Abstract: System and methods for intelligently pruning interest points are disclosed herein. The systems include generating a plurality of distorted audio samples and associated distorted interest points based upon a clean audio sample. Interest points that are common to sets of distorted interest points are retained, with interest points not robust to distortion discarded. The disclosed systems and methods therefore can provide for a scalable audio matching solution by eliminating interest points in reference sample fingerprints. The set of pruned interest points are robust to distortion, and the benefits of both scalability and accuracy can be had.
Type: Grant
Filed: October 18, 2011
Date of Patent: September 9, 2014
Assignee: Google Inc.
Inventors: Matthew Sharifi, Gheorghe Postelnicu, George Tzanetakis, Dominik Roblek
-
Patent number: 8831940
Abstract: A dictation system that allows using trainable code phrases is provided. The dictation system operates by receiving audio and recognizing the audio as text. The text/audio may contain code phrases that are identified by a comparator that matches the text/audio and replaces the code phrase with a standard clause that is associated with the code phrase. The database or memory containing the code phrases is loaded with matched standard clauses that may be identified to provide a hierarchal system such that certain code phrases may have multiple meanings depending on the user.
Type: Grant
Filed: March 21, 2011
Date of Patent: September 9, 2014
Assignee: NVOQ Incorporated
Inventors: Charles Corfield, Brian Marquette, David Mondragon, Rebecca Heins
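The code-phrase mechanism above is essentially a lookup table with per-user overrides layered on a global table, which gives the hierarchical behaviour where one phrase can mean different things for different users. A minimal sketch (the `expand_code_phrases` helper and the sample clauses are invented for illustration):

```python
def expand_code_phrases(text, global_phrases, user_phrases=None):
    """Replace trainable code phrases in recognized text with their
    standard clauses. User-specific entries override global ones."""
    table = dict(global_phrases)
    table.update(user_phrases or {})
    for phrase, clause in table.items():
        text = text.replace(phrase, clause)
    return text
```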
-
Patent number: 8831957
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speech recognition using models that are based on where, within a building, a speaker makes an utterance are disclosed. The methods, systems, and apparatus include actions of receiving data corresponding to an utterance, and obtaining location indicia for an area within a building where the utterance was spoken. Further actions include selecting one or more models for speech recognition based on the location indicia, wherein each of the selected one or more models is associated with a weight based on the location indicia. Additionally, the actions include generating a composite model using the selected one or more models and the respective weights of the selected one or more models. And the actions also include generating a transcription of the utterance using the composite model.
Type: Grant
Filed: October 15, 2012
Date of Patent: September 9, 2014
Assignee: Google Inc.
Inventors: Gabriel Taubman, Brian Strope
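The composite model described above can be sketched as a weighted interpolation of location-specific models, with weights derived from the location indicia. Here each model is simplified to a word-probability dictionary; the `composite_score` function and that representation are assumptions for illustration only:

```python
def composite_score(word, models, weights):
    """Interpolate a word's probability across location-specific models,
    weighted by how strongly the location indicia match each area."""
    total = sum(weights)
    return sum(w / total * m.get(word, 0.0) for m, w in zip(models, weights))
```

With a strong "kitchen" signal, kitchen vocabulary dominates the composite probability.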
-
Patent number: 8825481
Abstract: Techniques are described for training a speech recognition model for accented speech. A subword parse table is employed that models mispronunciations at multiple subword levels, such as the syllable, position-specific cluster, and/or phone levels. Mispronunciation probability data is then generated at each level based on inputted training data, such as phone-level annotated transcripts of accented speech. Data from different levels of the subword parse table may then be combined to determine the accented speech model. Mispronunciation probability data at each subword level is based at least in part on context at that level. In some embodiments, phone-level annotated transcripts are generated using a semi-supervised method.
Type: Grant
Filed: January 20, 2012
Date of Patent: September 2, 2014
Assignee: Microsoft Corporation
Inventors: Albert Joseph Kishan Thambiratnam, Timo Pascal Mertens, Frank Torsten Bernd Seide
-
Patent number: 8825478
Abstract: Audio content is converted to text using speech recognition software. The text is then associated with a distinct voice or a generic placeholder label if no distinction can be made. From the text and voice information, a word cloud is generated based on key words and key speakers. A visualization of the cloud displays as it is being created. Words grow in size in relation to their dominance. When it is determined that the predominant words or speakers have changed, the word cloud is complete. That word cloud continues to be displayed statically and a new word cloud display begins based upon a new set of predominant words or a new predominant speaker or set of speakers. This process may continue until the meeting is concluded. At the end of the meeting, the completed visualization may be saved to a storage device, sent to selected individuals, removed, or any combination of the preceding.
Type: Grant
Filed: January 10, 2011
Date of Patent: September 2, 2014
Assignee: Nuance Communications, Inc.
Inventors: Susan Marie Cox, Janani Janakiraman, Fang Lu, Loulwa F. Salem
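"Words grow in size in relation to their dominance" is typically implemented by scaling each word's display size by its frequency share. The `cloud_sizes` helper below, with its point-size range, is a hypothetical sketch of that scaling step:

```python
from collections import Counter

def cloud_sizes(words, min_size=10, max_size=48):
    """Scale each word's display size by its frequency relative to the
    most frequent word, so dominant words grow as the cloud is built."""
    counts = Counter(words)
    top = max(counts.values())
    return {w: min_size + int((c / top) * (max_size - min_size))
            for w, c in counts.items()}
```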
-
Publication number: 20140244254
Abstract: A development system is described for facilitating the development of a spoken natural language (SNL) interface. The development system receives seed templates from a developer, each of which provides a command phrasing that can be used to invoke a function, when spoken by an end user. The development system then uses one or more development resources, such as a crowdsourcing system and a paraphrasing system, to provide additional templates. This yields an extended set of templates. A generation system then generates one or more models based on the extended set of templates. A user device may install the model(s) for use in interpreting commands spoken by an end user. When the user device recognizes a command, it may automatically invoke a function associated with that command. Overall, the development system provides an easy-to-use tool for producing an SNL interface.
Type: Application
Filed: February 25, 2013
Publication date: August 28, 2014
Applicant: Microsoft Corporation
Inventors: Yun-Cheng Ju, Matthai Philipose, Seungyeop Han
-
Patent number: 8818808
Abstract: Utterance data that includes at least a small amount of manually transcribed data is provided. Automatic speech recognition is performed on ones of the utterance data not having a corresponding manual transcription to produce automatically transcribed utterances. A model is trained using all of the manually transcribed data and the automatically transcribed utterances. A predetermined number of utterances not having a corresponding manual transcription are intelligently selected and manually transcribed. Ones of the automatically transcribed data as well as ones having a corresponding manual transcription are labeled. In another aspect of the invention, audio data is mined from at least one source, and a language model is trained for call classification from the mined audio data to produce a language model.
Type: Grant
Filed: February 23, 2005
Date of Patent: August 26, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Giuseppe Riccardi, Gokhan Tur
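The "intelligently selected" step in this abstract is the classic active-learning move: pick the automatically transcribed utterances the recognizer was least confident about and send those for manual transcription. A minimal sketch, with hypothetical names (the patent does not specify the selection criterion):

```python
def select_for_transcription(auto_transcripts, k):
    """auto_transcripts: list of (utterance_id, recognizer_confidence) pairs.
    Return the k utterance ids with the lowest confidence, i.e. those
    whose manual transcription should help the model most."""
    ranked = sorted(auto_transcripts, key=lambda t: t[1])
    return [uid for uid, _ in ranked[:k]]

picked = select_for_transcription([("a", 0.9), ("b", 0.2), ("c", 0.5)], 2)
```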
-
Patent number: 8818809
Abstract: Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability.
Type: Grant
Filed: June 20, 2013
Date of Patent: August 26, 2014
Assignee: Google Inc.
Inventors: Craig L. Reding, Suzi Levas
-
Patent number: 8817952
Abstract: Methods, apparatus, and systems are provided such that a Public Safety Answering Point (PSAP) may utilize a new model to handle Open Line emergency calls, including audio optimization, automation, analysis, and presentation. Embodiments of the present disclosure assist with the difficult task of identifying background noise while trying to listen and talk to a caller, and give the best possible audio from the caller to the emergency call-taker or dispatcher. More particularly, an audio stream is split into at least two instances, with a first instance being optimized for speech intelligibility and provided to a call-taker or dispatcher and a second instance being provided for background sound analysis. Accordingly, the new PSAP Open Line model may allow for significantly more efficient emergency assessment, location, and management of resources.
Type: Grant
Filed: March 15, 2013
Date of Patent: August 26, 2014
Assignee: Avaya Inc.
Inventors: Jon Bentley, Mark Fletcher, Joseph L. Hall, Avram Levi, Paul Roller Michaelis, Heinz Teutsch
-
Patent number: 8812315
Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
Type: Grant
Filed: October 1, 2013
Date of Patent: August 19, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
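The core operation here, replacing each dictionary phoneme's model with a weighted sum of the native-speech models of the plausible phonemes from the speaker's lattice, can be sketched in a few lines. Reducing an acoustic model to a single mean vector, and all names below, are simplifying assumptions for illustration:

```python
def restructure_phoneme_model(native_models, lattice_weights):
    """native_models: {phoneme: mean vector of its native acoustic model}.
    lattice_weights: {phoneme: weight of that phoneme in the new speaker's
    lattice}. Returns the weighted-sum model; the pronouncing dictionary
    itself is untouched, only the acoustic model is restructured."""
    total = sum(lattice_weights.values())
    dim = len(next(iter(native_models.values())))
    mixed = [0.0] * dim
    for ph, w in lattice_weights.items():
        for i, x in enumerate(native_models[ph]):
            mixed[i] += (w / total) * x
    return mixed

# A dictionary /ae/ that the new speaker realizes mostly as /ae/ but
# sometimes as /eh/ becomes a 3:1 blend of the two native models.
blended = restructure_phoneme_model(
    {"ae": [1.0, 0.0], "eh": [0.0, 1.0]}, {"ae": 3, "eh": 1}
)
```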
-
Publication number: 20140229178
Abstract: A method for real-time data-pattern analysis. The method includes receiving and queuing at least one data-pattern analysis request by a data-pattern analysis unit controller. At least one data stream portion is also received and stored by the data-pattern analysis unit controller, each data stream portion corresponding to a received data-pattern analysis request. Next, a received data-pattern analysis request is selected by the data-pattern analysis unit controller along with a corresponding data stream portion. A data-pattern analysis is performed based on the selected data-pattern analysis request and the corresponding data stream portion, wherein the data-pattern analysis is performed by one of a plurality of data-pattern analysis units.
Type: Application
Filed: April 22, 2014
Publication date: August 14, 2014
Applicant: Spansion LLC
Inventors: Richard FASTOW, Qamrul Hasan
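The controller flow in this abstract — queue each analysis request with its corresponding data-stream portion, then dispatch the pair to one of several analysis units — can be sketched as follows. The round-robin dispatch policy and all names are assumptions; the publication does not specify how a unit is chosen:

```python
from collections import deque
from itertools import cycle

class PatternAnalysisController:
    """Queues (request, stream portion) pairs and hands each to one of
    a plurality of analysis units, round-robin."""

    def __init__(self, units):
        self.queue = deque()
        self.units = cycle(units)  # assumption: simple round-robin

    def submit(self, request, stream_portion):
        # Each stored stream portion corresponds to a received request.
        self.queue.append((request, stream_portion))

    def dispatch_one(self):
        request, portion = self.queue.popleft()
        unit = next(self.units)
        return unit(request, portion)

# Toy "analysis units": check whether the requested pattern occurs.
units = [lambda r, p: ("unit1", r in p), lambda r, p: ("unit2", r in p)]
ctrl = PatternAnalysisController(units)
ctrl.submit("abc", "xxabcxx")
result = ctrl.dispatch_one()
```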
-
Patent number: 8805684
Abstract: Automatic speech recognition (ASR) may be performed on received utterances. The ASR may be performed by an ASR module of a computing device (e.g., a client device). The ASR may include: generating feature vectors based on the utterances, updating the feature vectors based on feature-space speaker adaptation parameters, transcribing the utterances to text strings, and updating the feature-space speaker adaptation parameters based on the feature vectors. The transcriptions may be based, at least in part, on an acoustic model and the updated feature vectors. Updated speaker adaptation parameters may be received from another computing device and incorporated into the ASR module.
Type: Grant
Filed: October 17, 2012
Date of Patent: August 12, 2014
Assignee: Google Inc.
Inventors: Petar Aleksic, Xin Lei
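The feature-space loop described above (shift each feature vector by the speaker-adaptation parameters, then update those parameters from the vectors) can be illustrated with a deliberately simple bias-correction scheme. The running-bias update, the learning rate, and all names are assumptions; the patent leaves the adaptation method unspecified:

```python
def adapt_and_update(feature_vectors, bias, rate=0.1):
    """Apply a per-dimension bias to each feature vector, then nudge the
    bias toward the observed vectors (a toy stand-in for feature-space
    speaker adaptation)."""
    adapted = []
    for vec in feature_vectors:
        shifted = [x - b for x, b in zip(vec, bias)]
        adapted.append(shifted)
        bias = [b + rate * x for b, x in zip(bias, shifted)]
    return adapted, bias

adapted, new_bias = adapt_and_update([[1.0, 1.0]], [0.0, 0.0])
```

The abstract's final step — receiving updated parameters from another device — would simply replace `bias` with the received values before the next call.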
-
Patent number: 8805685
Abstract: Disclosed herein are systems, methods, and tangible computer-readable media for detecting synthetic speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the plurality of speech samples demonstrate little variance over time or are the same, and verifying the plurality of speech samples if the plurality of speech samples demonstrates sufficient variance over time. One embodiment further adds that each of the plurality of speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received.
Type: Grant
Filed: August 5, 2013
Date of Patent: August 12, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Horst J. Schroeter
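The variance check at the heart of this abstract — identical or near-identical repetitions of the same phrase suggest playback or synthesis, so verification is denied when variation falls below a threshold — can be sketched directly. The feature representation (equal-length vectors) and threshold value are assumptions:

```python
def verify_speaker(samples, min_variance=1e-3):
    """samples: equal-length feature vectors of the same word or phrase.
    Returns True (verify) only if the samples show sufficient variance;
    suspiciously uniform repetitions are denied."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(dim)]
    variance = sum(
        (s[i] - means[i]) ** 2 for s in samples for i in range(dim)
    ) / (n * dim)
    return variance >= min_variance

identical = verify_speaker([[1.0, 2.0], [1.0, 2.0]])   # replayed audio
natural = verify_speaker([[1.0, 2.0], [1.5, 2.5]])     # live repetitions
```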
-
Patent number: 8805560
Abstract: Systems and methods for noise based interest point density pruning are disclosed herein. The systems include determining an amount of noise in an audio sample and adjusting the amount of interest points within an audio sample fingerprint based on the amount of noise. Samples containing high amounts of noise correspondingly generate fingerprints with more interest points. The disclosed systems and methods allow reference fingerprints to be reduced in size while increasing the size of sample fingerprints. The benefits in scalability do not compromise the accuracy of an audio matching system using noise based interest point density pruning.
Type: Grant
Filed: October 18, 2011
Date of Patent: August 12, 2014
Assignee: Google Inc.
Inventors: George Tzanetakis, Dominik Roblek, Matthew Sharifi
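The density rule above — noisier samples keep more interest points in their fingerprints — can be sketched as a noise-dependent budget applied when pruning. The linear mapping, the bounds, and all names are illustrative assumptions:

```python
def interest_point_budget(noise_level, base=20, extra=60):
    """noise_level in [0, 1]: estimated fraction of the sample's energy
    that is noise. Noisier samples get a larger interest-point budget."""
    noise_level = max(0.0, min(1.0, noise_level))
    return base + round(extra * noise_level)

def prune_interest_points(points, noise_level):
    """points: [(time, strength), ...]. Keep the strongest points, with
    the count determined by the noise estimate."""
    budget = interest_point_budget(noise_level)
    return sorted(points, key=lambda p: p[1], reverse=True)[:budget]
```

A clean sample would be pruned down toward `base` points, while a noisy one retains up to `base + extra`, matching the abstract's trade of smaller reference fingerprints for larger sample fingerprints.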
-
Publication number: 20140222426
Abstract: The invention relates to a system and method for gathering data for use in a spoken dialog system. An aspect of the invention is generally referred to as an automated hidden human that performs data collection automatically at the beginning of a conversation with a user in a spoken dialog system. The method comprises presenting an initial prompt to a user, recognizing a received user utterance using an automatic speech recognition engine and classifying the recognized user utterance using a spoken language understanding module. If the recognized user utterance is not understood or classifiable to a predetermined acceptance threshold, then the method re-prompts the user. If the recognized user utterance is not classifiable to a predetermined rejection threshold, then the method transfers the user to a human as this may imply a task-specific utterance. The received and classified user utterance is then used for training the spoken dialog system.
Type: Application
Filed: April 7, 2014
Publication date: August 7, 2014
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Giuseppe Di Fabbrizio, Dilek Z. Hakkani-Tur, Mazin G. Rahim, Bernard S. Renger, Gokhan Tur
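The two-threshold routing in this abstract — accept a confidently classified utterance, re-prompt when confidence falls short of the acceptance threshold, and hand off to a (hidden) human when it falls below the rejection threshold — reduces to a small decision function. Threshold values and names are assumptions for illustration:

```python
def route_utterance(confidence, accept_threshold=0.8, reject_threshold=0.3):
    """Route a classified utterance by SLU classifier confidence."""
    if confidence >= accept_threshold:
        return "accept"             # usable for training the dialog system
    if confidence < reject_threshold:
        return "transfer_to_human"  # likely task-specific; hidden human takes over
    return "reprompt"               # ambiguous: ask the user again
```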
-
Publication number: 20140222425
Abstract: Provided are a speech recognition learning method using 3D geometric information and a speech recognition method by using 3D geometric information. The method performs learning by using 3D geometric information for learning or information derived from the 3D geometric information to generate a recognizer, and the speech recognition method performs speech recognition by applying 3D geometric information on a physical object correlated to or dependent on voice or information derived from the 3D geometric information to the recognizer.
Type: Application
Filed: February 7, 2014
Publication date: August 7, 2014
Applicant: SOGANG UNIVERSITY RESEARCH FOUNDATION
Inventors: Hyung-Min PARK, Changsoo JE, Bi Ho KIM, Min Wook KIM
-
Patent number: 8798990
Abstract: Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.
Type: Grant
Filed: April 30, 2013
Date of Patent: August 5, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Srinivas Bangalore, Mazin Gilbert, Narendra K. Gupta
-
Patent number: 8798994
Abstract: The present invention discloses a solution for conserving computing resources when implementing transformation based adaptation techniques. The disclosed solution limits the amount of speech data used by real-time adaptation algorithms to compute a transformation, which results in substantial computational savings. Appreciably, application of a transform is a relatively low memory and computationally cheap process compared to memory and resource requirements for computing the transform to be applied.
Type: Grant
Filed: February 6, 2008
Date of Patent: August 5, 2014
Assignee: International Business Machines Corporation
Inventors: John W. Eckhart, Michael Florio, Radek Hampl, Pavel Krbec, Jonathan Palgon
-
Publication number: 20140214422
Abstract: The application provides a method and system for determinism in non-linear systems for speech processing, particularly automatic speech segmentation for building speech recognition systems. More particularly, the application enables a method and system for detecting boundary of coarticulated units from isolated speech using recurrence plot.
Type: Application
Filed: July 18, 2012
Publication date: July 31, 2014
Applicant: Tata Consultancy Services Limited
Inventors: Mohd Bilal Arif Syed, Arijit Sinharay, Tanushyam Chattopadhyay