Constructional Details Of Speech Recognition Systems (epo) Patents (Class 704/E15.046)
  • Patent number: 11935516
    Abstract: A speech recognition method and apparatus are disclosed. The speech recognition method includes determining a first score of candidate texts based on an input speech, determining a weight for an output of a language model based on the input speech, applying the weight to a second score of the candidate texts output from the language model to obtain a weighted second score, selecting a target candidate text from among the candidate texts based on the first score and the weighted second score corresponding to the target candidate text, and determining the target candidate text to correspond to a portion of the input speech.
    Type: Grant
    Filed: July 20, 2021
    Date of Patent: March 19, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Jihyun Lee
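The weighted rescoring this abstract describes can be sketched in a few lines. This is an illustrative toy, not the patented method: the score dictionaries, additive log-score combination, and weight values are assumptions.

```python
def select_candidate(acoustic_scores, lm_scores, weight):
    """Pick the candidate text maximizing the first (acoustic) score
    plus the weighted second (language-model) score."""
    combined = {cand: acoustic_scores[cand] + weight * lm_scores[cand]
                for cand in acoustic_scores}
    return max(combined, key=combined.get)
```

With a small weight the acoustic score dominates; an input-dependent weight lets the recognizer lean on the language model only when the audio itself is ambiguous.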
  • Patent number: 11914099
    Abstract: Systems and methods include a method for predicting geological formation tops. First well log data associated with a key master well is received. Formation data identifying tops of formations confirmed in the key master well is received. Merged key master well and formation data is generated in a dynamic time warping (DTW)-readable format by merging the first well log data with the formation data. Second well log data associated with a training well located in geographic proximity to the key master well is received. The second well log data is formatted into the DTW-readable format. A DTW function is executed to generate indices associated with the formation tops. The DTW function uses the merged key master well and formation data and the formatted second well log data as DTW function inputs. Geological formation tops for the training well are then predicted using the generated indices.
    Type: Grant
    Filed: November 1, 2019
    Date of Patent: February 27, 2024
    Assignee: Saudi Arabian Oil Company
    Inventors: Matter J. Alshammery, Nazih F. Najjar
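The DTW function at the core of this method aligns two sequences by minimizing cumulative distance; the indices of the warping path are what map formation tops from the key master well onto the training well. A minimal textbook DTW over 1-D log values (not the patent's implementation) looks like:

```python
import math

def dtw(a, b):
    """Classic dynamic time warping between two 1-D sequences.
    Returns the total alignment cost and the warping path as
    (index_in_a, index_in_b) pairs."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    # Backtrack from the corner to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, (i, j) = min((cost[i - 1][j - 1], (i - 1, j - 1)),
                        (cost[i - 1][j], (i - 1, j)),
                        (cost[i][j - 1], (i, j - 1)))
    return cost[n][m], path[::-1]
```

A formation-top index in the key master well can then be read off the path: the training-well depth paired with that index is the predicted top.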
  • Patent number: 11508370
    Abstract: An on-board agent system includes: a plurality of agent functional units, each of the plurality of agent functional units being configured to provide a service including outputting a response using voice to an output unit according to an utterance of an occupant of a vehicle; and a common operator configured to be shared by the plurality of agent functional units and provided in the vehicle, wherein, when an operation is executed on the common operator with an operation pattern set to correspond to each of the plurality of agent functional units, an agent functional unit corresponding to the operation pattern of the executed operation is activated.
    Type: Grant
    Filed: March 3, 2020
    Date of Patent: November 22, 2022
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Sawako Furuya, Yoshifumi Wagatsuma, Hiroki Nakayama, Kengo Naiki, Yusuke Oi
  • Publication number: 20140025379
    Abstract: A system and method are presented for real-time speech analytics in the speech analytics field. Real-time audio is fed, along with a keyword model, into a recognition engine. The recognition engine computes the probability of the audio stream data matching keywords in the keyword model. The probability is compared to a threshold to determine whether the keyword has been spotted. Empirical metrics are computed and any false alarms are identified and rejected. The keyword may be reported as found when it passes the detection threshold and is deemed not to be a false alarm.
    Type: Application
    Filed: July 20, 2012
    Publication date: January 23, 2014
    Applicant: INTERACTIVE INTELLIGENCE, INC.
    Inventors: Aravind Ganapathiraju, Ananth Nagaraja Iyer
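The threshold comparison plus false-alarm rejection can be illustrated with a toy detector. The per-frame probabilities and the minimum-run heuristic are assumptions for illustration; the patent's empirical metrics are not specified here.

```python
def spot_keyword(frame_probs, threshold, min_consecutive=3):
    """Flag a keyword as spotted when its per-frame match probability
    stays above the threshold for min_consecutive frames; a single
    above-threshold frame is rejected as a likely false alarm."""
    run = 0
    for p in frame_probs:
        run = run + 1 if p >= threshold else 0
        if run >= min_consecutive:
            return True
    return False
```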
  • Publication number: 20130124209
    Abstract: An information processing apparatus includes: a plurality of information input units; an event detection unit that generates event information including estimated position information and estimated identification information of users present in the real space based on analysis of the information from the information input unit; and an information integration processing unit that inputs the event information, and generates target information including a position of each user and user identification information based on the input event information, and signal information representing a probability value of the event generation source, wherein the information integration processing unit includes an utterance source probability calculation unit, and wherein the utterance source probability calculation unit performs a process of calculating an utterance source score as an index value representing an utterance source probability of each target by multiplying weights based on utterance situations by a plurality of d
    Type: Application
    Filed: November 6, 2012
    Publication date: May 16, 2013
    Applicant: Sony Corporation
    Inventor: Sony Corporation
  • Publication number: 20120315957
    Abstract: There is provided an electronic device that can execute a function using characters inputted during a telephone call, and a control method and a control program thereof. An application control unit inputs a character as an input character using an input control unit in a state where a call with a predetermined communication counterpart is continuing using a communication unit. When a predetermined function is selected after the input character is inputted with the input control unit, the application control unit executes a predetermined function in a state where the input character is inputted.
    Type: Application
    Filed: June 8, 2012
    Publication date: December 13, 2012
    Applicant: KYOCERA Corporation
    Inventor: Hiroshi KAMIKUBO
  • Publication number: 20120245942
    Abstract: Systems and methods are provided for scoring speech. A speech sample is received, where the speech sample is associated with a script. The speech sample is aligned with the script. An event recognition metric of the speech sample is extracted, and locations of prosodic events are detected in the speech sample based on the event recognition metric. The locations of the detected prosodic events are compared with locations of model prosodic events, where the locations of model prosodic events identify expected locations of prosodic events of a fluent, native speaker speaking the script. A prosodic event metric is calculated based on the comparison, and the speech sample is scored using a scoring model based upon the prosodic event metric.
    Type: Application
    Filed: March 20, 2012
    Publication date: September 27, 2012
    Inventors: Klaus Zechner, Xiaoming Xi
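The comparison of detected prosodic event locations against the fluent-speaker model might be reduced to a recall-style metric like the following sketch (the tolerance window and the scoring formula are illustrative assumptions):

```python
def prosodic_event_score(detected, model, tolerance=1):
    """Fraction of model (fluent-speaker) prosodic event locations
    that have a detected event within `tolerance` positions."""
    if not model:
        return 1.0
    hits = sum(1 for m in model
               if any(abs(m - d) <= tolerance for d in detected))
    return hits / len(model)
```

A scoring model would then combine this prosodic event metric with other features to produce the final speech score.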
  • Publication number: 20120232894
    Abstract: The invention provides a portable device for recognizing and/or reconstructing speech by ultrasound probing of the vocal apparatus, the device including at least one ultrasound transducer (20) for generating an ultrasound wave and for receiving a wave reflected by the user's vocal apparatus, and analysis means for analyzing a signal generated by the ultrasound transducer, wherein the device includes locating means (21, 23) for determining the position of the ultrasound transducer relative to the skull of the user.
    Type: Application
    Filed: September 15, 2010
    Publication date: September 13, 2012
    Applicants: CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE, UNIVERSITE PIERRE ET MARIE CURIE (PARIS 6)
    Inventors: Thomas Hueber, Bruce Denby, Gérard Dreyfus, Rémi Dubois, Pierre Roussel
  • Publication number: 20120221337
    Abstract: The invention comprises a method and apparatus for predicting word accuracy. Specifically, the method comprises obtaining an utterance in speech data where the utterance comprises an actual word string, processing the utterance for generating an interpretation of the actual word string, processing the utterance to identify at least one utterance frame, and predicting a word accuracy associated with the interpretation according to at least one stationary signal-to-noise ratio and at least one non-stationary signal to noise ratio, wherein the at least one stationary signal-to-noise ratio and the at least one non-stationary signal to noise ratio are determined according to a frame energy associated with each of the at least one utterance frame.
    Type: Application
    Filed: May 7, 2012
    Publication date: August 30, 2012
    Inventors: Mazin Gilbert, Hong Kook Kim
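Both SNR quantities are derived from per-frame energies. One plausible reading (the quantile-based noise floor is an assumption, not the patent's estimator): the stationary SNR uses a single global noise estimate, while the non-stationary SNR averages frame-by-frame ratios and so reflects energy fluctuation.

```python
import math

def snr_estimates(frame_energies, noise_quantile=0.2):
    """Rough stationary and non-stationary SNR estimates (in dB) from
    frame energies: the noise floor is the mean of the lowest-energy
    quantile of frames."""
    e = sorted(frame_energies)
    k = max(1, int(len(e) * noise_quantile))
    noise = sum(e[:k]) / k
    signal = sum(frame_energies) / len(frame_energies)
    stationary = 10 * math.log10(signal / noise)
    nonstationary = sum(10 * math.log10(f / noise)
                        for f in frame_energies) / len(frame_energies)
    return stationary, nonstationary
```

A word-accuracy predictor would take both values as features: clean stationary noise hurts recognition far less than bursty, non-stationary noise at the same average level.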
  • Publication number: 20120191456
    Abstract: A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
    Type: Application
    Filed: February 1, 2012
    Publication date: July 26, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Peng Liu, Yu Shi, Frank Kao-ping Soong
  • Publication number: 20120095766
    Abstract: A speech recognition apparatus is provided. The speech recognition apparatus includes a primary speech recognition unit configured to perform speech recognition on input speech and thus to generate word lattice information, a word string generation unit configured to generate one or more word strings based on the word lattice information, a language model score calculation unit configured to calculate bidirectional language model scores of the generated word strings selectively using forward and backward language models for each of words in each of the generated word strings, and a sentence output unit configured to output one or more of the generated word strings with high scores as results of the speech recognition of the input speech based on the calculated bidirectional language model scores.
    Type: Application
    Filed: May 24, 2011
    Publication date: April 19, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ick-Sang Han, Chi-Youn Park, Jeong-Su Kim, Jeong-Mi Cho
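The bidirectional scoring idea, combining a left-to-right and a right-to-left language model over the same word string, can be sketched with toy bigram tables (the dict-based LMs, floor value, and interpolation weight are illustrative assumptions):

```python
def bidirectional_score(words, forward_lm, backward_lm, alpha=0.5):
    """Interpolate forward and backward bigram log-probabilities for a
    word string. Each LM is a dict mapping (history, word) -> log-prob;
    unseen pairs fall back to a floor value."""
    FLOOR = -10.0
    fwd = sum(forward_lm.get((h, w), FLOOR)
              for h, w in zip(["<s>"] + words[:-1], words))
    rev = list(reversed(words))
    bwd = sum(backward_lm.get((h, w), FLOOR)
              for h, w in zip(["</s>"] + rev[:-1], rev))
    return alpha * fwd + (1 - alpha) * bwd
```

Word strings built from the lattice would be ranked by this combined score, and the highest-scoring strings output as recognition results.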
  • Publication number: 20120041764
    Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic model and the language model.
    Type: Application
    Filed: August 10, 2011
    Publication date: February 16, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Haitian XU, Kean Kheong Chin, Mark John Francis Gales
  • Publication number: 20120016672
    Abstract: Computer-implemented systems and methods are provided for assessing non-native speech proficiency. A non-native speech sample is processed to identify a plurality of vowel sound boundaries in the non-native speech sample. Portions of the non-native speech sample are analyzed within the vowel sound boundaries to extract vowel characteristics. The vowel characteristics are used to identify a plurality of vowel space metrics for the non-native speech sample, and the vowel space metrics are used to determine a non-native speech proficiency score for the non-native speech sample.
    Type: Application
    Filed: July 14, 2011
    Publication date: January 19, 2012
    Inventors: Lei Chen, Keelan Evanini, Xie Sun
  • Publication number: 20110144987
    Abstract: A method of automated speech recognition in a vehicle. The method includes receiving audio in the vehicle, pre-processing the received audio to generate acoustic feature vectors, decoding the generated acoustic feature vectors to produce at least one speech hypothesis, and post-processing the at least one speech hypothesis using pitch to improve speech recognition accuracy. The speech hypothesis can be accepted as recognized speech during post-processing if pitch is present in the received audio. Alternatively, a pitch count for the received audio can be determined, N-best speech hypotheses can be post-processed by comparing the pitch count to syllable counts associated with the speech hypotheses, and the speech hypothesis having a syllable count equal to the pitch count can be accepted as recognized speech.
    Type: Application
    Filed: December 10, 2009
    Publication date: June 16, 2011
    Applicant: GENERAL MOTORS LLC
    Inventors: Xufang Zhao, Uma Arun
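The N-best post-processing step in the second alternative reduces to a simple filter: keep the hypothesis whose syllable count agrees with the pitch count detected in the audio. A sketch under that reading (the fallback to the top hypothesis is my assumption):

```python
def pick_by_pitch_count(hypotheses, syllable_counts, pitch_count):
    """From N-best hypotheses (best first), accept the first whose
    syllable count equals the pitch count detected in the audio;
    fall back to the top hypothesis when none matches."""
    for hyp in hypotheses:
        if syllable_counts[hyp] == pitch_count:
            return hyp
    return hypotheses[0]
```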
  • Publication number: 20100235168
    Abstract: A communication system comprises a terminal configured for being able to communicate with a computer and to operate according to at least one operational parameter. A peripheral device for use with the terminal has a characterizing parameter associated therewith. The terminal is operable for reading the characterizing parameter from the peripheral device when the device is coupled to the terminal. The terminal is further operable for configuring itself to operate according to an operational parameter associated with the characterizing parameter of the peripheral device.
    Type: Application
    Filed: May 21, 2010
    Publication date: September 16, 2010
    Inventors: Mark David Murawski, Ryan Anthony Zoschg, James Randall Logan, Roger Graham Byford, Lawrence R. Sweeney, Douglas Mark Zatezalo
  • Publication number: 20100185447
    Abstract: Embodiments are provided for selecting and utilizing multiple recognizers to process an utterance based on a markup language document. The markup language document and an utterance are received in a computing device. One or more recognizers are selected from among the multiple recognizers for returning a results set for the utterance based on markup language in the markup language document. The results set is received from the one or more selected recognizers in a format determined by a processing method specified in the markup language document. An event is then executed on the computing device in response to receiving the results set.
    Type: Application
    Filed: January 22, 2009
    Publication date: July 22, 2010
    Applicant: Microsoft Corporation
    Inventors: Andrew K. Krumel, Pierre-Alexandre F. Masse, Joseph A. Ruff
  • Publication number: 20100179812
    Abstract: Provided are an apparatus and method for recognizing voice commands, the apparatus including: a voice command recognition unit which recognizes an input voice command; a voice command recognition learning unit which learns a recognition-targeted voice command; and a controller which controls the voice command recognition unit to recognize the recognition-targeted voice command from an input voice command, controls the voice command recognition learning unit to learn the input voice command if the voice command recognition is unsuccessful, and performs a particular operation corresponding to the recognized voice command if the voice command recognition is successful.
    Type: Application
    Filed: September 2, 2009
    Publication date: July 15, 2010
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Jong-hyuk Jang, Seung-kwon Park, Jong-ho Lea
  • Publication number: 20100076763
    Abstract: A voice recognition search apparatus includes: a dictionary create unit creating a first voice recognition dictionary from a search subject data; a voice acquisition unit acquiring first and second voices; a voice recognition unit creating first and second text data by recognizing the first and second voices using the first and second voice recognition dictionaries; a first search unit searching the search subject data by the first text data; and a second search unit searching a search result of the first search unit by the second text data.
    Type: Application
    Filed: September 15, 2009
    Publication date: March 25, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kazushige Ouchi, Miwako Doi
  • Publication number: 20100070278
    Abstract: A transformation can be derived which would represent that processing required to convert a male speech model to a female speech model. That transformation is subjected to a predetermined modification, and the modified transformation is applied to a female speech model to produce a synthetic children's speech model. The male and female models can be expressed in terms of a vector representing key values defining each speech model, and the derived transformation can be in the form of a matrix that would transform the vector of the male model to the vector of the female model. The modification to the derived matrix comprises applying an exponent p which has a value greater than zero and less than 1.
    Type: Application
    Filed: September 12, 2008
    Publication date: March 18, 2010
    Inventors: Andreas Hagen, Bryan Pellom, Kadri Hacioglu
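The exponent trick can be shown with a toy per-dimension stand-in for the full matrix transform: derive the scaling that maps the male model vector to the female one, soften it with an exponent 0 < p < 1, and apply the softened scaling to the female vector. This diagonal simplification is my assumption; the patent operates on a full transformation matrix.

```python
def modified_transform(male_vec, female_vec, p=0.5):
    """Apply a softened male->female scaling to the female model
    vector to approximate a synthetic children's model vector."""
    scale = [f / m for m, f in zip(male_vec, female_vec)]  # per-dim transform
    return [f * s ** p for f, s in zip(female_vec, scale)]  # exponent p in (0, 1)
```

The intuition: children's vocal tracts continue the male-to-female shortening trend, so applying a fraction of that shift again pushes the female model toward a child-like one.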
  • Publication number: 20090222265
    Abstract: A voice recognition apparatus 10 includes a voice recognition means 12 for performing voice recognition, and a control means for controlling receipt of a voice input to the voice recognition means, and for performing recognition according to a result of the voice recognition acquired by the voice recognition means. In this voice recognition apparatus, the control means controls the receipt of a voice according to a timeout time which defines the end of the receipt of a voice. The voice recognition apparatus further includes an environmental condition detecting means 18 for detecting an environmental condition, and a timeout time control means 16 for changing the timeout time according to the environmental condition detected by the environmental condition detection means.
    Type: Application
    Filed: September 13, 2006
    Publication date: September 3, 2009
    Inventors: Ryo Iwamiya, Reiko Okada
  • Publication number: 20090216528
    Abstract: A method of adapting a neural network of an automatic speech recognition device, includes the steps of: providing a neural network including an input stage, an intermediate stage and an output stage, the output stage outputting phoneme probabilities; providing a linear stage in the neural network; and training the linear stage by means of an adaptation set; wherein the step of providing the linear stage includes the step of providing the linear stage after the intermediate stage.
    Type: Application
    Filed: June 1, 2005
    Publication date: August 27, 2009
    Inventors: Roberto Gemello, Franco Mana
  • Publication number: 20090177472
    Abstract: A node initializing unit generates a root node including inputted phonemic models. A candidate generating unit generates candidates of a pair of child sets by partitioning a set of phonemic models included in a node having no child node into two. A candidate deleting unit deletes candidates each including only phonemic models attached with determination information indicating that at least one of the child sets has a small amount of speech data for training. A similarity calculating unit calculates a sum of similarities among the phonemic models included in the child sets. A candidate selecting unit selects one of the candidates having a largest sum. A node generating unit generates two nodes including the two child sets included in the selected candidate, respectively. A clustering unit clusters the phonemic models in units of phonemic model sets each included in a node.
    Type: Application
    Filed: September 22, 2008
    Publication date: July 9, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Masaru Sakai
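The candidate generation, deletion, and selection steps can be sketched as one exhaustive search over two-way partitions (feasible only for small phoneme sets; the similarity callback and the "data-poor" set are illustrative stand-ins for the patent's determination information):

```python
from itertools import combinations

def best_split(phonemes, similarity, data_poor):
    """Enumerate two-way partitions of a phoneme set, delete candidates
    where either side consists only of data-poor phonemes, and select
    the split maximizing the sum of within-set similarities."""
    best, best_score = None, float("-inf")
    phonemes = list(phonemes)
    for r in range(1, len(phonemes)):
        for left in combinations(phonemes, r):
            right = tuple(p for p in phonemes if p not in left)
            if all(p in data_poor for p in left) or \
               all(p in data_poor for p in right):
                continue  # candidate deleted: a child set lacks training data
            score = sum(similarity(a, b) for side in (left, right)
                        for a, b in combinations(side, 2))
            if score > best_score:
                best, best_score = (left, right), score
    return best
```

The selected pair becomes two child nodes, and the process recurses to build the clustering tree.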
  • Publication number: 20090157403
    Abstract: A speech recognition apparatus generates a feature vector series corresponding to a speech signal, and recognizes a phoneme series corresponding to the feature vector series using sounds corresponding to phonemes and a phoneme language model. In addition, the speech recognition apparatus recognizes vocabulary that corresponds to the recognized phoneme series. At this time, the phoneme language model represents connection relationships between the phonemes, and is modeled according to time-variant characteristics of the phonemes.
    Type: Application
    Filed: December 12, 2008
    Publication date: June 18, 2009
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Hoon CHUNG, Yunkeun Lee
  • Publication number: 20090132233
    Abstract: A translation graph is created using a plurality of reference sources that include translations between a plurality of different languages. Each entry in a source is used to create a wordsense entry, and each new word in a source is used to create a wordnode entry. A pair of wordnode and wordsense entries corresponds to a translation. In addition, a probability is determined for each wordsense entry and is decreased for each translation entry that includes more than a predefined number of translations into the same language. Bilingual translation entries are removed if subsumed by a multilingual translation entry. Triangulation is employed to identify pairs of common wordsense translations between a first, second, and third language. Translations not found in reference sources can also be inferred from the data comprising the translation graph. The translation graph can then be used for searches of a data collection in different languages.
    Type: Application
    Filed: November 21, 2007
    Publication date: May 21, 2009
    Applicant: University of Washington
    Inventors: Oren Etzioni, Kobi Reiter, Marcus Sammer, Michael Schmitz, Stephen Soderland
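The triangulation step, inferring an A-to-C translation whenever A-to-B and B-to-C pairs share a pivot word, can be sketched as a join over translation pairs (a minimal illustration; the patent's probability weighting is omitted):

```python
def triangulate(pairs_ab, pairs_bc):
    """Infer A->C translation pairs not found directly: if (a, b) and
    (b, c) exist, propose (a, c). Inputs are sets of (source, target)."""
    by_pivot = {}
    for a, b in pairs_ab:
        by_pivot.setdefault(b, set()).add(a)
    inferred = set()
    for b, c in pairs_bc:
        for a in by_pivot.get(b, ()):
            inferred.add((a, c))
    return inferred
```

In the full graph, multiple independent pivots supporting the same (a, c) pair would raise confidence in the inferred translation.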
  • Publication number: 20090112590
    Abstract: Disclosed are systems and methods for dynamically interacting with a user through a spoken dialogue system. A method includes the steps of (1) receiving a user utterance, (2) analyzing the user utterance for a threshold determination of dialect, (3) generating a response that reflects an incremental implementation of the dialect, (4) further varying the perceived implementation of the dialect in subsequent responses by a process of: (a) receiving a subsequent user utterance, (b) determining a modified level of confidence in the dialect based at least in part from the subsequent utterance, (c) generating a subsequent response that implements an incremental variation according to the modified level of confidence.
    Type: Application
    Filed: October 30, 2007
    Publication date: April 30, 2009
    Applicant: AT&T Corp.
    Inventors: Gregory Pulz, Harry E. Blanchard, Steven H. Lewis, Lan Zhang
  • Publication number: 20090043573
    Abstract: A method and apparatus for identifying a speaker within a captured audio signal from a collection of known speakers. The method and apparatus receive or generate voice representations for each known speaker and tag the representations according to meta data related to the known speaker or to the voice. The representations are grouped into one or more groups according to the tags. When a voice to be recognized is introduced, characteristics are determined according to which the groups are prioritized, so that the representations participating in only part of the groups are matched against the voice to be identified, thus reducing identification time and improving the statistical significance.
    Type: Application
    Filed: August 9, 2007
    Publication date: February 12, 2009
    Applicant: NICE SYSTEMS LTD.
    Inventors: Adam WEINBERG, Irit OPHER, Eyal BENAROYA, Renan GUTMAN
  • Publication number: 20080319750
    Abstract: Monitoring a spoken-word audio stream for a relevant concept is disclosed. A speech recognition engine may recognize a plurality of words from the audio stream. Function words that do not indicate content may be removed from the plurality of words. A concept may be determined from at least one word recognized from the audio stream. The concept may be determined via a morphological normalization of the plurality of words. The concept may be associated with a time related to when the at least one word was spoken. A relevance metric may be computed for the concept. Computing the relevance metric may include assessing the temporal frequency of the concept within the audio stream. The relevance metric for the concept may be based on respective confidence scores of the at least one word. The concept, time, and relevance metric may be displayed in a graphical display.
    Type: Application
    Filed: June 20, 2007
    Publication date: December 25, 2008
    Applicant: Microsoft Corporation
    Inventors: Stephen Frederick Potter, Tal Saraf, David Gareth Ollason, Steve Sung-Nam Chang
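A relevance metric based on temporal frequency and word confidences might look like the following toy: count recent mentions of each concept and scale by mean confidence. The window semantics and the count-times-confidence formula are assumptions, not the patented computation.

```python
from collections import defaultdict

def concept_relevance(mentions, window):
    """mentions: list of (concept, time, confidence) tuples. Relevance
    of a concept = number of mentions within `window` seconds of its
    last mention, scaled by their mean confidence."""
    by_concept = defaultdict(list)
    for concept, t, conf in mentions:
        by_concept[concept].append((t, conf))
    scores = {}
    for concept, hits in by_concept.items():
        last = max(t for t, _ in hits)
        recent = [(t, c) for t, c in hits if t >= last - window]
        mean_conf = sum(c for _, c in recent) / len(recent)
        scores[concept] = len(recent) * mean_conf
    return scores
```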
  • Publication number: 20080300881
    Abstract: [Object] To provide recognition of natural speech for a speech application using a grammar method with little effort and cost.
    Type: Application
    Filed: June 20, 2008
    Publication date: December 4, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Hiroaki Kashima, Yoshinori Tahara, Daisuke Tomoda
  • Publication number: 20080249778
    Abstract: Communications between users of different modalities are enabled by a single integrated platform that allows both the input of voice (from a telephone, for example) to be realized as text (such as an interactive text message) and allows the input of text (from the interactive text messaging application, for example) to be realized as voice (on the telephone). Real-time communication may be enabled between any permutation of any number of text devices (desktop, PDA, mobile telephone) and voice devices (mobile telephone, regular telephone, etc.). A call to a text device user may be initiated by a voice device user or vice versa.
    Type: Application
    Filed: April 3, 2007
    Publication date: October 9, 2008
    Applicant: Microsoft Corporation
    Inventors: William F. Barton, Francisco M. Galanes, Lawrence M. Ockene, Anand Ramakrishna, Tal Saraf
  • Publication number: 20080215329
    Abstract: Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.
    Type: Application
    Filed: March 28, 2008
    Publication date: September 4, 2008
    Applicant: International Business Machines Corporation
    Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
  • Publication number: 20080126094
    Abstract: A method, system, and computer program for generating a recognition model set. A technique is described that uses the log likelihood of real data to compute cross entropy, measuring the mismatch between training data and a model derived from that training data, and compares such mismatches between class-dependent models and a class-independent model as evidence for model replacement. By using the change in cross entropies when deciding whether to add class-independent Gaussian Mixture Models (GMMs), the good performance of class-dependent models is largely retained while decreasing the size and complexity of the model.
    Type: Application
    Filed: July 10, 2007
    Publication date: May 29, 2008
    Inventors: Eric W. Janke, Bin Jia
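The cross-entropy comparison driving the replacement decision can be illustrated with discrete distributions (the margin parameter and dict-based models are simplifying assumptions; the patent works with Gaussian mixture models):

```python
import math

def cross_entropy(data, model_probs):
    """Average negative log-likelihood of the data under a model given
    as a symbol -> probability dict."""
    return -sum(math.log(model_probs[x]) for x in data) / len(data)

def prefer_independent(data, dep_probs, indep_probs, margin=0.0):
    """Replace the class-dependent model with the class-independent one
    when the increase in cross entropy stays within `margin` nats."""
    return (cross_entropy(data, indep_probs)
            - cross_entropy(data, dep_probs)) <= margin
```

A small cross-entropy increase signals that the class-independent model fits nearly as well, so swapping it in shrinks the model set at little accuracy cost.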