Markov Patents (Class 704/256)
  • Patent number: 10341785
    Abstract: The application relates to a hearing device comprising a) an input unit for delivering a time varying electric input signal representing an audio signal comprising at least two sound sources, b) a cyclic analysis buffer unit of length A adapted for storing the last A audio samples, c) a cyclic synthesis buffer unit of length L, where L is smaller than A, adapted for storing the last L audio samples, which are intended to be separated into individual sound sources, d) a database having stored recorded sound examples from said at least two sound sources, each entry in the database being termed an atom, the atoms originating from audio samples from first and second buffers corresponding in size to said synthesis and analysis buffer units, where for each atom, the audio samples from the first buffer overlap with the audio samples from the second buffer, and where atoms originating from the first buffer constitute a reconstruction dictionary, and where atoms originating from the second buffer constitute an analysis dictionary.
    Type: Grant
    Filed: October 5, 2015
    Date of Patent: July 2, 2019
    Assignee: OTICON A/S
    Inventors: Thomas Barker, Tuomas Virtanen, Niels Henrik Pontoppidan
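As a rough illustration of the dictionary-based separation in the entry above: nonnegative atom activations are fit on the longer analysis buffer, and the paired reconstruction atoms of a single source rebuild the shorter synthesis buffer. The NMF-style multiplicative solver, the shapes, and all names below are assumptions; the abstract does not specify a solver.

```python
import numpy as np

def separate_source(x, analysis_atoms, recon_atoms, source_ids, target, iters=200):
    """Fit nonnegative activations w so analysis_atoms @ w approximates the
    analysis buffer, then rebuild the synthesis buffer from the reconstruction
    atoms of one source only (illustrative sketch, not the patented method)."""
    A = np.abs(analysis_atoms)                    # (A_len, n_atoms)
    w = np.full(A.shape[1], 0.1)
    for _ in range(iters):                        # multiplicative update for ||x - A w||, w >= 0
        w *= (A.T @ np.abs(x)) / (A.T @ (A @ w) + 1e-12)
    mask = (source_ids == target)                 # atoms recorded from this source
    return recon_atoms[:, mask] @ w[mask]         # (L_len,) estimate of one source

A_len, L_len, n_atoms = 1024, 256, 40             # assumed buffer/dictionary sizes
atoms_a = np.abs(np.random.randn(A_len, n_atoms))
atoms_r = np.abs(np.random.randn(L_len, n_atoms))
ids = np.array([0, 1] * (n_atoms // 2))           # which source each atom came from
mix = np.abs(np.random.randn(A_len))
est = separate_source(mix, atoms_a, atoms_r, ids, target=0)
```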
  • Patent number: 10249290
    Abstract: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.
    Type: Grant
    Filed: June 11, 2018
    Date of Patent: April 2, 2019
    Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: Alistair D. Conkie, Ladan Golipour, Ann K. Syrdal
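A minimal sketch of the prosody matching described above (the same abstract appears under patent 9997154 below): a unit's f0 contour is renormalized toward the desired curve's mean and spread. The helper name and the mean/variance matching rule are illustrative assumptions, not the patented modification.

```python
import numpy as np

def match_prosody(unit_f0, desired_f0):
    """Rescale a speech unit's f0 contour so its mean and range follow a
    desired prosodic curve (hypothetical helper, not the patent's formula)."""
    # Resample the desired curve to the unit's length.
    xs = np.linspace(0, 1, len(unit_f0))
    target = np.interp(xs, np.linspace(0, 1, len(desired_f0)), desired_f0)
    # Normalize the actual contour, then re-impose the target statistics.
    normalized = (unit_f0 - unit_f0.mean()) / (unit_f0.std() + 1e-9)
    return normalized * target.std() + target.mean()

# Example: a flat ~120 Hz unit pulled toward a rising 100->180 Hz curve.
unit = np.full(50, 120.0) + np.random.randn(50)
desired = np.linspace(100.0, 180.0, 80)
modified = match_prosody(unit, desired)
```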
  • Patent number: 10176819
    Abstract: A method for converting speech using phonetic posteriorgrams (PPGs). A target speech is obtained and a PPG is generated based on acoustic features of the target speech. Generating the PPG may include using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers. The PPG includes a set of values corresponding to a range of times and a range of phonetic classes, the phonetic classes corresponding to senones. A mapping between the PPG and one or more segments of the target speech is generated. A source speech is obtained, and the source speech is converted into a converted speech based on the PPG and the mapping.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: January 8, 2019
    Assignee: The Chinese University of Hong Kong
    Inventors: Lifa Sun, Kun Li, Hao Wang, Shiyin Kang, Mei Ling Helen Meng
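To make the PPG-based conversion above concrete, here is a deliberately crude sketch: each source-frame posteriorgram is mapped to the spectral frame of the target speech whose PPG is nearest. A 1-nearest-neighbour lookup stands in for the patent's learned PPG-to-speech mapping; all shapes are assumptions.

```python
import numpy as np

def convert_with_ppg(source_ppg, target_ppg, target_spectra):
    """For each source PPG frame, pick the target speaker's spectral frame
    whose PPG is closest (crude stand-in for a trained mapping)."""
    converted = []
    for frame in source_ppg:
        distances = np.linalg.norm(target_ppg - frame, axis=1)
        converted.append(target_spectra[np.argmin(distances)])
    return np.stack(converted)

# Assumed sizes: 200 source frames, 500 target frames,
# 128 senone classes, 40-dim spectral features.
src = np.random.dirichlet(np.ones(128), size=200)
tgt = np.random.dirichlet(np.ones(128), size=500)
spec = np.random.randn(500, 40)
out = convert_with_ppg(src, tgt, spec)   # shape (200, 40)
```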
  • Patent number: 10140973
    Abstract: Systems, methods, and devices for generating text-to-speech output using previously captured speech are described. Spoken audio is obtained and undergoes speech processing to create text. The resulting text is stored with the spoken audio, with both the text and the spoken audio being associated with the individual that spoke the audio. Various spoken audio and corresponding text are stored over time to create a library of speech units. When the individual sends a text message to a recipient, the text message is processed to determine portions of text, and the portions of text are compared to the library of text associated with the individual. When text in the library is identified, the system selects the spoken audio units associated with the identified stored text. The selected spoken audio units are then used to generate output audio data corresponding to the original text message, with the output audio data being sent to a device of the message recipient.
    Type: Grant
    Filed: September 15, 2016
    Date of Patent: November 27, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Manish Kumar Dalmia, Rafal Kuklinski
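A toy sketch of the library lookup described above: portions of a text message are greedily matched against stored spoken-audio units and concatenated. The `speech_library` contents and the longest-match strategy are assumptions for illustration.

```python
import numpy as np

# Hypothetical library: text portions previously spoken by the user,
# mapped to the matching recorded audio (1-D sample arrays).
speech_library = {
    "see you": np.random.randn(8000),
    "at noon": np.random.randn(8000),
}

def synthesize_message(message, library):
    """Greedily match portions of a text message against the user's
    speech library and concatenate the stored audio units."""
    words, units = message.lower().split(), []
    i = 0
    while i < len(words):
        # Try the longest stored portion starting at word i.
        for j in range(len(words), i, -1):
            portion = " ".join(words[i:j])
            if portion in library:
                units.append(library[portion])
                i = j
                break
        else:
            i += 1  # no recording for this word; a real system would fall back to TTS
    return np.concatenate(units) if units else np.array([])

audio = synthesize_message("See you at noon", speech_library)
```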
  • Patent number: 10109271
    Abstract: A frame erasure concealment technique for bitstream-based feature extractors in a speech recognition system, particularly suited for use in a wireless communication system, operates to “delete” each frame in which an erasure is declared. The deletions reduce the length of the observation sequence, but have been found to preserve sufficient information for speech recognition, based on both single-word and “string” tests of the deletion technique.
    Type: Grant
    Filed: May 19, 2014
    Date of Patent: October 23, 2018
    Assignee: NUANCE COMMUNICATIONS, INC.
    Inventors: Richard Vandervoort Cox, Hong Kook Kim
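The deletion strategy above reduces to a one-liner; a hedged sketch:

```python
def conceal_erasures(feature_frames, erasure_flags):
    """Drop every frame in which an erasure was declared, shortening the
    observation sequence passed to the recognizer."""
    return [f for f, erased in zip(feature_frames, erasure_flags) if not erased]

frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
flags = [False, True, False]                     # frame 1 arrived corrupted
observations = conceal_erasures(frames, flags)   # two frames survive
```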
  • Patent number: 10102849
    Abstract: Techniques for generating language models. The techniques include: obtaining language data comprising training data and associated values for one or more metadata attributes, the language data comprising a plurality of instances of language data, an instance of language data comprising an instance of training data and one or more metadata attribute values associated with the instance of training data; identifying, by processing the language data using at least one processor, a set of one or more of the metadata attributes to use for clustering the instances of training data into a plurality of clusters; clustering the training data instances based on their respective values for the identified set of metadata attributes into the plurality of clusters; and generating a language model for each of the plurality of clusters.
    Type: Grant
    Filed: March 24, 2017
    Date of Patent: October 16, 2018
    Assignee: Nuance Communications, Inc.
    Inventors: Paul J. Vozila, Wilson Tam, Nils Lenke
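A small sketch of the clustering-then-modeling pipeline above, with unigram counts standing in for full language models; the metadata fields are invented for illustration.

```python
from collections import Counter, defaultdict

def train_cluster_lms(instances, cluster_attrs):
    """Group training instances by the selected metadata attributes and
    build a unigram count model per cluster (a stand-in for the patent's
    per-cluster language models)."""
    clusters = defaultdict(list)
    for inst in instances:
        key = tuple(inst["metadata"][a] for a in cluster_attrs)
        clusters[key].append(inst["text"])
    return {key: Counter(w for t in texts for w in t.split())
            for key, texts in clusters.items()}

# Hypothetical data with "domain" and "locale" metadata attributes.
data = [
    {"text": "play some jazz", "metadata": {"domain": "music", "locale": "en-US"}},
    {"text": "play the news", "metadata": {"domain": "news", "locale": "en-US"}},
]
lms = train_cluster_lms(data, cluster_attrs=["domain"])
```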
  • Patent number: 10074384
    Abstract: A state estimating apparatus includes: a spectrum calculating unit which calculates a power spectrum for each of a plurality of frequencies on a frame-by-frame basis from a voice signal containing voice of a first speaker and voice of a second speaker transmitted over a telephone line; a band power calculating unit which calculates power of a non-transmission band on a frame-by-frame basis, based on the power spectra of frequencies contained in the non-transmission band among the plurality of frequencies; a transmitted-voice judging unit which determines that any frame whose power in the non-transmission band is greater than a threshold value indicating the presence of voice carries the voice of the first speaker; and a state judging unit which judges whether the state of mind of the first speaker is normal or abnormal, based on the frame judged to carry the voice of the first speaker.
    Type: Grant
    Filed: February 12, 2014
    Date of Patent: September 11, 2018
    Assignee: FUJITSU LIMITED
    Inventor: Shoji Hayakawa
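A sketch of the band-power test described above, assuming a 300-3400 Hz telephone transmission band and an arbitrary power threshold:

```python
import numpy as np

def frames_with_first_speaker(signal, sr=16000, frame=512, thresh=1e-3):
    """Flag frames whose power outside the telephone band (roughly
    300-3400 Hz) exceeds a threshold; per the abstract, such frames
    carry the first (near-end) speaker's voice."""
    flags = []
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    non_transmission = (freqs < 300) | (freqs > 3400)
    for start in range(0, len(signal) - frame + 1, frame):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame])) ** 2
        flags.append(spectrum[non_transmission].mean() > thresh)
    return flags

voice = np.random.randn(16000) * 0.01   # stand-in for a recorded mixture
flags = frames_with_first_speaker(voice)
```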
  • Patent number: 10056076
    Abstract: According to some embodiments of the present invention there is provided a computerized method for speech processing using a Gaussian Mixture Model. The method comprises the action of receiving, by hardware processor(s), two or more covariance values representing relationships between distributions of speech coefficient values that represent two or more audible input speech signals recorded by a microphone. The method comprises the action of computing two or more eigenvectors and eigenvalues using a principal component analysis of the covariance values, transforming the speech coefficient values using the eigenvectors, and computing two or more second covariance values from the transformed speech coefficient values. The method comprises the action of modifying some of the second covariance values according to the eigenvalues, the covariance values, and two or more indices of the speech coefficient values. The second covariance values are provided to the speech processor comprising the Gaussian Mixture Model.
    Type: Grant
    Filed: September 6, 2015
    Date of Patent: August 21, 2018
    Assignee: International Business Machines Corporation
    Inventor: Hagai Aronowitz
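The PCA step in the abstract above can be sketched as follows; this is a generic eigendecomposition-and-projection, not IBM's exact modification rule for the second covariance values.

```python
import numpy as np

def decorrelate_features(features):
    """Eigendecompose the global covariance of the speech coefficients,
    project the features onto the eigenvectors, and return the transformed
    features together with their new (near-diagonal) covariance."""
    cov = np.cov(features, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # principal component analysis
    transformed = (features - features.mean(axis=0)) @ eigvecs
    second_cov = np.cov(transformed, rowvar=False)
    return transformed, eigvals, second_cov

feats = np.random.randn(1000, 13)                 # e.g. 13 cepstral coefficients
transformed, eigvals, second_cov = decorrelate_features(feats)
```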
  • Patent number: 9997154
    Abstract: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.
    Type: Grant
    Filed: May 12, 2014
    Date of Patent: June 12, 2018
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Ladan Golipour, Ann K. Syrdal
  • Patent number: 9959341
    Abstract: In some embodiments, a system is provided comprising at least one processor programmed to process an input text to identify a plurality of semantic patterns that match the input text, wherein, for at least one semantic pattern of the plurality of semantic patterns: the at least one semantic pattern comprises a plurality of semantic entities identified from the at least one input text, and the plurality of semantic entities occur in a common context within the at least one input text. The at least one processor may be further programmed to use statistical information derived from training data to associate a respective weight with each semantic pattern of the plurality of semantic patterns.
    Type: Grant
    Filed: June 11, 2015
    Date of Patent: May 1, 2018
    Assignee: Nuance Communications, Inc.
    Inventor: Jan Curin
  • Patent number: 9911412
    Abstract: Methods, devices, and computer program products for recognizing and responding to natural language input are described herein. Natural language input is received at a natural language input interface of a computing device and transformed into computer-usable text. A natural language input recognizer obtains evidence from one or more evidence source and generates an evidence graph based on the evidence obtained. Evidence may be obtained asynchronously, and the natural language input recognizer may update the evidence graph upon receipt of additional evidence. The natural language input recognizer generates a set of recognition hypotheses based on the evidence graph and selects one of the recognition hypotheses as a recognition result for the natural language input. Semantic models, evidence models, and response models may be employed to generate the evidence graph and respond to the recognition result selected for the natural language input.
    Type: Grant
    Filed: March 6, 2015
    Date of Patent: March 6, 2018
    Assignee: Nuance Communications, Inc.
    Inventors: Martin Labský, Ladislav Kunc, Jan Kleindienst, Tomás Macek, Bart D'hoore
  • Patent number: 9906414
    Abstract: A network reliability evaluation method for an acyclic network and a system thereof are provided. The network reliability evaluation method searches all minimal paths from a source node to a sink node based on nodes, according to the properties of the acyclic network. In conventional network reliability evaluation methods, searching all minimal paths is based on arcs. Therefore, the network reliability evaluation method and the system thereof can effectively reduce the complexity of calculation.
    Type: Grant
    Filed: September 8, 2015
    Date of Patent: February 27, 2018
    Assignee: NATIONAL TSING HUA UNIVERSITY
    Inventor: Wei-Chang Yeh
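A compact sketch of node-based enumeration of all minimal (simple) paths in an acyclic network, the operation the abstract above contrasts with arc-based searching:

```python
def all_minimal_paths(adj, source, sink):
    """Enumerate every simple path from source to sink by depth-first
    search over nodes of an acyclic network."""
    paths, stack = [], [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for nxt in adj.get(node, []):
            if nxt not in path:               # keep the path simple/minimal
                stack.append((nxt, path + [nxt]))
    return paths

# Small acyclic network as an adjacency map.
network = {"s": ["a", "b"], "a": ["t"], "b": ["a", "t"]}
print(all_minimal_paths(network, "s", "t"))
# [['s', 'b', 't'], ['s', 'b', 'a', 't'], ['s', 'a', 't']]
```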
  • Patent number: 9865257
    Abstract: A controller for a dialog manager, the dialog manager being configured to receive a representation of an input utterance from a user and control the flow of conversation with the user. The controller is configured to implement a parameterized policy for defining the behavior of a dialog manager. The parameterized policy is configured to operate with an ontology-independent parameter as an input. The controller has a processor for parameterizing an end user ontology such that the parameterized policy can define the behavior of a dialog manager for the end user ontology. The processor is configured to define a slot of the end user ontology in terms of at least one of the ontology-independent parameters such that it is suitable for being used as an input to the parameterized policy.
    Type: Grant
    Filed: April 29, 2016
    Date of Patent: January 9, 2018
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Zhuoran Wang, Yannis Stylianou
  • Patent number: 9842102
    Abstract: A method of generating ontologies for a Virtual Assistant across different languages may include extracting a plurality of tokens in a first language from a plurality of web resources in a web domain that includes the Virtual Assistant. The web resources may be made available in a first language and a second language. The method may also include determining a first part-of-speech (POS) for each of the plurality of tokens, where the first POS may be specific to the first language. The method may additionally include mapping the first POS to a second POS from a standardized set of POS's that are general across the first language and the second language, and generating a plurality of lemmas from the plurality of tokens. The method may further include displaying a network representing the ontology.
    Type: Grant
    Filed: July 7, 2015
    Date of Patent: December 12, 2017
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Margaret Salome, Fabrice Nauze, Christian Kissig, Maria Begona Villada-Moiron
  • Patent number: 9711148
    Abstract: A processing system receives an audio signal encoding an utterance and determines that a first portion of the audio signal corresponds to a predefined phrase. The processing system accesses one or more text-dependent models associated with the predefined phrase and determines a first confidence based on the one or more text-dependent models associated with the predefined phrase, the first confidence corresponding to a first likelihood that a particular speaker spoke the utterance. The processing system determines a second confidence for a second portion of the audio signal using one or more text-independent models, the second confidence corresponding to a second likelihood that the particular speaker spoke the utterance. The processing system then determines that the particular speaker spoke the utterance based at least in part on the first confidence and the second confidence.
    Type: Grant
    Filed: July 18, 2013
    Date of Patent: July 18, 2017
    Assignee: Google Inc.
    Inventors: Matthew Sharifi, Dominik Roblek
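The two-confidence fusion above might look like the following; the linear weighting and threshold are assumptions, since the abstract only says the decision is "based at least in part" on both scores.

```python
def speaker_verified(td_score, ti_score, td_weight=0.6, threshold=0.5):
    """Fuse the text-dependent confidence (from the predefined-phrase
    portion) with the text-independent confidence (from the rest of the
    utterance); weights and threshold are illustrative assumptions."""
    combined = td_weight * td_score + (1.0 - td_weight) * ti_score
    return combined >= threshold

# e.g. strong hotword match, weaker match on the free-form remainder
print(speaker_verified(td_score=0.9, ti_score=0.4))   # True
```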
  • Patent number: 9659568
    Abstract: A method of decoding an audio signal, and which includes extracting spectral data and a loss signal compensation parameter from an audio signal bitstream; detecting a loss signal based on the spectral data; generating first compensation data corresponding to the loss signal using a random signal based on the loss signal compensation parameter; generating a scale factor by adding a scale factor difference value to a scale factor reference value if the scale factor corresponds to a band quantized to zero, and the scale factor reference value is included in the loss signal compensation parameter; and generating second compensation data by applying the scale factor to the first compensation data.
    Type: Grant
    Filed: December 31, 2008
    Date of Patent: May 23, 2017
    Assignee: LG Electronics Inc.
    Inventors: Jae Hyun Lim, Dong Soo Kim, Hyun Kook Lee, Sung Yong Yoon, Hee Suk Pang
  • Patent number: 9626970
    Abstract: Embodiments of the present invention relate to speaker identification using spatial information. A method of speaker identification for audio content being of a format based on multiple channels is disclosed. The method comprises extracting, from a first audio clip in the format, a plurality of spatial acoustic features across the multiple channels and location information, the first audio clip containing voices from a speaker, and constructing a first model for the speaker based on the spatial acoustic features and the location information, the first model indicating a characteristic of the voices from the speaker. The method further comprises identifying whether the audio content contains voices from the speaker based on the first model. Corresponding system and computer program product are also disclosed.
    Type: Grant
    Filed: December 16, 2015
    Date of Patent: April 18, 2017
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Shen Huang, Xuejing Sun
  • Patent number: 9613638
    Abstract: Systems and methods are provided for generating an intelligibility score for speech of a non-native speaker. Words in a speech recording are identified using an automated speech recognizer, where the automated speech recognizer provides a string of words identified in the speech recording, and where the automated speech recognizer further provides an acoustic model likelihood score for each word in the string of words. For a particular word in the string of words, a context metric value is determined based upon a usage of the particular word within the string of words. An acoustic score for the particular word is determined based on the acoustic model likelihood score for the particular word from the automated speech recognizer. An intelligibility score is determined for the particular word based on the acoustic score for the particular word and the context metric value for the particular word.
    Type: Grant
    Filed: February 26, 2015
    Date of Patent: April 4, 2017
    Assignee: Educational Testing Service
    Inventors: Anastassia Loukina, Keelan Evanini
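A hedged sketch of combining the two per-word signals named above; the log-interpolation and the weight alpha are illustrative, not ETS's scoring formula.

```python
import math

def intelligibility(acoustic_loglik, context_prob, alpha=0.5):
    """Combine a word's acoustic-model log-likelihood with a context metric
    (here an n-gram probability of the word given its neighbours) into one
    intelligibility score."""
    return alpha * acoustic_loglik + (1 - alpha) * math.log(max(context_prob, 1e-12))

# A word the recognizer matched well, in a plausible context:
print(intelligibility(acoustic_loglik=-2.0, context_prob=0.3))
```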
  • Patent number: 9570056
    Abstract: An audio data synthesis method is provided in which the times of a plurality of audio data are adjusted without using a device that can acquire the standard time. Specifically, audio data is obtained based on synchronized recording by the first and second recorders without using standard time. A time difference is calculated between an own terminal and another terminal, based on the time at which output of a first sound from the audio output module is started, the time at which input of a sound corresponding to the audio data to the audio input module is started, the time indicated by the first information, and the time indicated by the second information. The second and third audio data are synthesized after the time difference between them, based on the third sound input to the audio input module, is adjusted using the calculated time difference.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: February 14, 2017
    Assignee: OLYMPUS CORPORATION
    Inventor: Ryuichi Kiyoshige
  • Patent number: 9560991
    Abstract: A system for establishing a hearing ability model of a hearing ability of a person, includes a data storage configured to store a representation of a distribution of a hearing ability of a population of individuals, and a processor configured to establish a hearing ability model representing a hearing ability of the person based at least in part on (i) information regarding a person's response to a stimulus of a hearing evaluation event, and (ii) the representation of the distribution of the hearing ability of the population.
    Type: Grant
    Filed: April 24, 2009
    Date of Patent: February 7, 2017
    Assignee: GN HEARING A/S
    Inventors: Aalbert de Vries, Svante Sten Johan Stadler, Arne Leijon, Tjeerd Maarten Hein Dijkstra, Alexander Ypma
  • Patent number: 9542927
    Abstract: A method and system are disclosed for building a speech database for a text-to-speech (TTS) synthesis system from multiple speakers recorded under diverse conditions. For a plurality of utterances of a reference speaker, a set of reference-speaker vectors may be extracted, and for each of a plurality of utterances of a colloquial speaker, a respective set of colloquial-speaker vectors may be extracted. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each colloquial-speaker vector to a reference-speaker vector. The colloquial-speaker vector may be replaced with the matched reference-speaker vector. The matching-and-replacing can be carried out separately for each set of colloquial-speaker vectors. A conditioned set of speaker vectors can then be constructed by aggregating all the replaced speaker vectors. The conditioned set of speaker vectors can be used to train the TTS system.
    Type: Grant
    Filed: November 13, 2014
    Date of Patent: January 10, 2017
    Assignee: Google Inc.
    Inventors: Ioannis Agiomyrgiannakis, Alexander Gutkin
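The matching-and-replacing procedure above, minus the speaker-compensating transform (assumed to be identity here), reduces to a nearest-neighbour sweep:

```python
import numpy as np

def condition_speaker_vectors(colloquial, reference):
    """Replace each colloquial-speaker vector with its nearest
    reference-speaker vector; a real system would first apply a
    speaker-compensating transform."""
    conditioned = []
    for vec in colloquial:
        distances = np.linalg.norm(reference - vec, axis=1)
        conditioned.append(reference[np.argmin(distances)])
    return np.stack(conditioned)   # aggregate of replaced vectors

ref = np.random.randn(300, 24)     # reference-speaker vectors
col = np.random.randn(80, 24)      # one colloquial utterance's vectors
training_set = condition_speaker_vectors(col, ref)
```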
  • Patent number: 9530412
    Abstract: Systems, methods, and computer-readable storage devices are disclosed for an event-driven multi-agent architecture that improves via a semi-hierarchical multi-agent reinforcement learning approach. A system receives a user input during a speech dialog between a user and the system. The system then processes the user input, identifying an importance of the user input to the speech dialog based on a user classification and identifying a variable-strength turn-taking signal inferred from the user input. An utterance selection agent selects an utterance for replying to the user input based on the importance of the user input, and a turn-taking agent determines whether to output the utterance based on the utterance and the variable-strength turn-taking signal. When the turn-taking agent indicates the utterance should be output, the system selects when to output the utterance.
    Type: Grant
    Filed: August 29, 2014
    Date of Patent: December 27, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Ethan Selfridge
  • Patent number: 9530116
    Abstract: Systems, methods, devices, and non-transitory storage media provide a web-based graphical user interface to allow a user to control a softphone of a user device. A context handler operates on the user device and manages a page context of the web-based graphical user interface. A call manager operates as a RESTful web service on the user device. The context handler and the call manager communicate using JavaScript Object Notation with Padding (JSONP) messages. The call manager includes a finite state machine that indicates a current state of the softphone. The call manager generates application programming interface calls to the softphone in response to receiving JSONP messages. The user interface is updated to reflect the current state of the softphone based on execution of user selections by the softphone. The context handler chunks data into multiple JSONP messages if data exceeds a data size limitation of the JSONP.
    Type: Grant
    Filed: May 28, 2013
    Date of Patent: December 27, 2016
    Assignee: VERIZON PATENT AND LICENSING INC.
    Inventors: Manah M. Khalil, Mohammad Y. Salman
  • Patent number: 9524730
    Abstract: A system receives monaural sound which includes speech and background noises. The received sound is divided by frequency and time into time-frequency units (TFUs). Each TFU is classified as speech or non-speech by a processing unit. The processing unit for each frequency range includes at least one of a deep neural network (DNN) or a linear support vector machine (LSVM). The DNN extracts and classifies the features of the TFU and includes a pre-trained stack of Restricted Boltzmann Machines (RBM), and each RBM includes a visible and a hidden layer. The LSVM classifies each TFU based on extracted features from the DNN, including those from the visible layer of the first RBM and those from the hidden layer of the last RBM in the stack. The LSVM and DNN are trained with a plurality of training noises. Each TFU classified as speech is output.
    Type: Grant
    Filed: March 29, 2013
    Date of Patent: December 20, 2016
    Assignee: OHIO STATE INNOVATION FOUNDATION
    Inventors: DeLiang Wang, Yuxuan Wang
  • Patent number: 9520123
    Abstract: A system and method for concatenative speech synthesis is provided. Embodiments may include accessing, using one or more computing devices, a plurality of speech synthesis units from a speech database and determining a similarity between the plurality of speech synthesis units. Embodiments may further include retrieving two or more speech synthesis units having the similarity and pruning at least one of the two or more speech synthesis units based upon, at least in part, the similarity.
    Type: Grant
    Filed: March 19, 2015
    Date of Patent: December 13, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Heng Lu, Xu Shao, Wei Zhang, Wenhui Lei, Andrew Breen
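A sketch of similarity-based pruning as described above; cosine similarity and the 0.95 threshold are assumptions.

```python
import numpy as np

def prune_similar_units(units, threshold=0.95):
    """Retrieve pairs of speech synthesis units whose cosine similarity
    exceeds a threshold and prune one unit of each pair."""
    normed = units / np.linalg.norm(units, axis=1, keepdims=True)
    keep = np.ones(len(units), dtype=bool)
    for i in range(len(units)):
        if not keep[i]:
            continue
        sims = normed[i + 1:] @ normed[i]
        keep[i + 1:] &= sims < threshold   # drop near-duplicates of unit i
    return units[keep]

db = np.random.randn(500, 16)              # unit feature vectors
pruned = prune_similar_units(db)
```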
  • Patent number: 9484032
    Abstract: The disclosed embodiments illustrate methods and systems for processing multimedia content. The method includes extracting one or more words from an audio stream associated with multimedia content. Each word has one or more associated timestamps indicative of temporal occurrences of said word in said multimedia content. The method further includes creating a word cloud of said one or more words in said multimedia content based on a measure of emphasis laid on each word in said multimedia content and said one or more timestamps associated with said one or more words. The method further includes presenting one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud. Each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.
    Type: Grant
    Filed: October 27, 2014
    Date of Patent: November 1, 2016
    Assignee: Xerox Corporation
    Inventors: Kuldeep Yadav, Kundan Shrivastava, Om D Deshmukh
  • Patent number: 9449598
    Abstract: Features are disclosed for performing speech recognition on utterances using a grammar and a statistical language model, such as an n-gram model. States of the grammar may correspond to states of the statistical language model. Speech recognition may be initiated using the grammar. At a given state of the grammar, speech recognition may continue at a corresponding state of the statistical language model. Speech recognition may continue using the grammar in parallel with the statistical language model, or it may continue using the statistical language model exclusively. Scores associated with the correspondences between states (e.g., backoff arcs) may be determined heuristically or based on test data.
    Type: Grant
    Filed: September 26, 2013
    Date of Patent: September 20, 2016
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Bjorn Hoffmeister, Sri Venkata Surya Siva Rama Krishna Garimella, Rohit Krishna Prasad
  • Patent number: 9393412
    Abstract: An audio-cochlear implant processor device for a hearing impaired listener is described. An input signal decoder decodes an audio input data signal into a corresponding multi-channel audio signal representing multiple audio objects. An audio processor adjusts the multi-channel audio output based on user-specific hearing impairment characteristics to produce a processed audio output to a cochlear implant audio processor with auditory scene analysis (ASA) cues for the hearing impaired listener.
    Type: Grant
    Filed: December 20, 2013
    Date of Patent: July 19, 2016
    Assignee: MED-EL Elektromedizinische Geraete GmbH
    Inventors: Stefan Strahl, Cornelia Falch
  • Patent number: 9244901
    Abstract: A method, a computer-readable medium, and a system for tagging utterances in a natural language application. In the method, utterances are analyzed using one or more rules, and a tag is assigned to the analyzed utterances based on the one or more rules. The analysis of the utterances may include determining a frequency of the utterances, and grouping the utterances by their determined frequency and their assigned tag. The frequency may represent a number of occurrences in natural language where the utterances share semantic meanings, for example. Further, the one or more rules may be prioritized, and each of the utterances may be analyzed using the prioritized rules. In this manner, meaning may be assigned to utterances such that groups of utterances may be tagged simultaneously.
    Type: Grant
    Filed: April 22, 2008
    Date of Patent: January 26, 2016
    Assignee: WEST CORPORATION
    Inventors: Kiyomi Murata, Steven John Schanbacher, Aaron Scott Fisher
  • Patent number: 9224386
    Abstract: Features are disclosed for discriminative training of speech recognition language models. A confusion matrix can be generated from acoustic model training data for use in discriminative training. The confusion matrix can include probabilities for the substitution, insertion, and/or deletion of some or all subword units of a language. Probabilities can be calculated based on the presence or absence of subword units in a processed acoustic model training data audio recording when compared to a correct transcription of the recording. The probabilities can be used to generate erroneous transcriptions in language model training corpora, and the language model can be trained to distinguish the erroneous transcriptions from the correct transcriptions.
    Type: Grant
    Filed: June 22, 2012
    Date of Patent: December 29, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Frederick V. Weber
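The confusion matrix above can be built by aligning recognizer hypotheses to correct transcriptions; a minimal edit-distance version over phone strings (standard dynamic-programming alignment, not Amazon's implementation):

```python
from collections import Counter

def confusion_counts(reference, hypothesis):
    """Align a hypothesis to the correct transcription with edit-distance
    backtracking and count substitutions, insertions, and deletions of
    subword units; normalized counts give confusion probabilities."""
    R, H = len(reference), len(hypothesis)
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        for j in range(H + 1):
            if i == 0 or j == 0:
                d[i][j] = i + j
            else:
                d[i][j] = min(d[i-1][j-1] + (reference[i-1] != hypothesis[j-1]),
                              d[i-1][j] + 1, d[i][j-1] + 1)
    counts, i, j = Counter(), R, H
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i-1][j-1] + (reference[i-1] != hypothesis[j-1]):
            if reference[i-1] != hypothesis[j-1]:
                counts[("sub", reference[i-1], hypothesis[j-1])] += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i-1][j] + 1:
            counts[("del", reference[i-1])] += 1
            i -= 1
        else:
            counts[("ins", hypothesis[j-1])] += 1
            j -= 1
    return counts

print(confusion_counts(list("kæt"), list("kɛt")))   # one æ -> ɛ substitution
```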
  • Patent number: 9158652
    Abstract: An airborne, gas, or liquid particle sensor with one or more intelligent modules either within the instrument or attached to the instrument. These modules comprise sub-systems with local controllers or memory.
    Type: Grant
    Filed: March 15, 2014
    Date of Patent: October 13, 2015
    Assignee: Particles Plus, Inc.
    Inventor: David Pariseau
  • Patent number: 9159319
    Abstract: Keyword spotting may be improved by using a competitor model. In some embodiments, audio data is received by a device. At least a portion of the audio data may be compared with a keyword model to obtain a first score. The keyword model may model a keyword. The portion of the audio data may also be compared with a competitor model to obtain a second score. The competitor model may model a competitor word, which may be a word that is similar to the keyword. The device may compare the first score and the second score to determine if a keyword is spoken.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: October 13, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Bjorn Hoffmeister
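The keyword-versus-competitor decision above, sketched with stand-in scorers; the margin and the callable model interface are assumptions.

```python
def keyword_detected(audio_features, keyword_model, competitor_model, margin=0.0):
    """Score the audio against the keyword model and a competitor model
    (a similar-sounding word) and accept only when the keyword wins by
    some margin; the models are assumed to return log-likelihood scores."""
    first = keyword_model(audio_features)
    second = competitor_model(audio_features)
    return first - second > margin

# Hypothetical stand-in scorers for a keyword vs. a confusable word.
detected = keyword_detected([0.2, 0.7],
                            keyword_model=lambda f: sum(f),
                            competitor_model=lambda f: 0.5 * sum(f))
```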
  • Patent number: 9142209
    Abstract: A method and apparatus receive multiple data pattern analysis requests from a controller and substantially simultaneously perform, with multiple data pattern analysis units, multiple data pattern analyses on one or more portions of a data stream.
    Type: Grant
    Filed: April 22, 2014
    Date of Patent: September 22, 2015
    Assignee: CYPRESS SEMICONDUCTOR CORPORATION
    Inventors: Richard Fastow, Qamrul Hasan
  • Patent number: 9135248
    Abstract: Systems, methods, and devices for determining contexts and determining associated demographic profiles using information received from multiple demographic-sensor-enabled electronic devices are disclosed. Contexts can be defined by a description of spatial and/or temporal components. Such contexts can be arbitrarily defined using semantically meaningful and absolute descriptions of time and location. Demographic sensor data is associated with or includes context data that describes the circumstances under which the data was determined. The demographic sensor data can include demographic sensor readings that are implicit indications of a demographic for the context. The sensor data can also include user-reported data with explicit descriptions of a demographic for the context. The demographic sensor data can be filtered by context data according to a selected context.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: September 15, 2015
    Assignee: ARRIS Technology, Inc.
    Inventors: Jianguo Li, Mir F. Ali, Paul C. Davis, Dale W. Russell, Di You
  • Patent number: 9105053
    Abstract: An approach is provided for determining a user age range. An age estimator causes, at least in part, acquisition of voice data. Next, the age estimator calculates a first set of probability values, wherein each of the probability values represents a probability that the voice data is in a respective one of a plurality of predefined age ranges, and the predefined age ranges are segments of a lifespan. Then, the age estimator derives a second set of probability values by applying a correlation matrix to the first set of probability values, wherein the correlation matrix associates the first set of probability values with probabilities of the voice data matching individual ages over the lifespan. Then, the age estimator, for each of the predefined age ranges, calculates a sum of the probabilities in the second set of probability values corresponding to the individual ages within the respective predefined age ranges.
    Type: Grant
    Filed: March 23, 2010
    Date of Patent: August 11, 2015
    Assignee: Nokia Technologies Oy
    Inventors: Yang Cao, Feng Ding, Jilei Tian
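A numeric sketch of the two-step refinement above: per-range probabilities are spread over individual ages by a correlation matrix, then re-summed per range. The uniform matrix below is a placeholder for the learned one.

```python
import numpy as np

LIFESPAN = 100                       # model individual ages 0..99
RANGES = [(0, 18), (18, 40), (40, 65), (65, 100)]

def refine_age_ranges(range_probs, correlation):
    """Map per-range probabilities to per-age probabilities with a
    correlation matrix, then sum the per-age values inside each
    predefined range, as the abstract describes."""
    per_age = correlation @ range_probs          # second set of values
    return [per_age[lo:hi].sum() for lo, hi in RANGES]

# Illustrative correlation matrix: spread each range uniformly over
# its own ages (a real matrix would be learned, not uniform).
corr = np.zeros((LIFESPAN, len(RANGES)))
for r, (lo, hi) in enumerate(RANGES):
    corr[lo:hi, r] = 1.0 / (hi - lo)

print(refine_age_ranges(np.array([0.1, 0.6, 0.2, 0.1]), corr))
```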
  • Patent number: 9092757
    Abstract: The disclosure herein relates to a system and method for personalizing the online experience of a user based on the user's attitude. Attitude models that can infer or determine a user's attitudes are constructed from user activity data. Based on the attitudes derived from applying the attitude models, attitude profiles are created for the users. As users engage in various types of online interactions, the attitude profiles associated with the users can be utilized to personalize their online experience.
    Type: Grant
    Filed: May 9, 2012
    Date of Patent: July 28, 2015
    Assignee: Yahoo! Inc.
    Inventors: Judd Antin, David Ayman Shamma, Elizabeth Churchill
  • Patent number: 9082401
    Abstract: The present disclosure describes example systems, methods, and devices for generating a synthetic speech signal. An example method may include determining a phonemic representation of text. The example method may also include identifying one or more finite-state machines (“FSMs”) corresponding to one or more phonemes included in the phonemic representation of the text. A given FSM may be a compressed unit of recorded speech that simulates a Hidden Markov Model. The example method may further include determining a selected sequence of models that minimizes a cost function that represents a likelihood that a possible sequence of models substantially matches a phonemic representation of text. Each possible sequence of models may include at least one FSM. The method may additionally include generating a synthetic speech signal based on the selected sequence that includes one or more spectral features generated from at least one FSM included in the selected sequence.
    Type: Grant
    Filed: January 9, 2013
    Date of Patent: July 14, 2015
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin
  • Patent number: 9077522
    Abstract: A two-party approximation protocol is transformed into a private approximation protocol. A first input x ∈ {0, 1, …, M}^n and a second input y ∈ {0, 1, …, M}^n of a two-party approximation protocol approximating a function of the form f(x, y) = Σ_{j=1}^{n} g(x_j, y_j) is received. Variable B is set as a public upper bound on f(x, y). Variable l is set to l = O*(1). The following is performed until Σ_{j=1}^{l} z_j ≥ l/t or B < 1, where t is an arbitrary number: (1) a private importance sampling protocol with the first input x, the second input y, and a third input 1^k is executed independently for j ∈ [l], where k is a security parameter, and an output of the private importance sampling protocol is shares of I_j ∈ [n] ∪ {⊥}; (2) l coin tosses z_1, …, z_l are independently generated, where z_j = 1 iff I_j ≠ ⊥; and (3) B is divided by 2 if Σ_{j=1}^{l} z_j ≥ l/t or B < 1 is not satisfied.
    Type: Grant
    Filed: August 8, 2012
    Date of Patent: July 7, 2015
    Assignee: International Business Machines Corporation
    Inventor: David Paul Woodruff
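Stripped of all cryptography, the halving loop in the abstract above behaves like the simulation below, where each private importance-sampling call is modelled as a coin landing 1 with probability f(x, y)/B. This is an intuition aid only, not the private protocol; parameter values are illustrative.

```python
import random

def approximate(f_value, B, l=64, t=4):
    """Simulate the halving loop: sample l coins with success probability
    f/B; if too few land 1, the upper bound B was too loose, so halve it.
    B * (sum of coins) / l has expectation f, giving a crude estimate."""
    while True:
        z = [1 if random.random() < min(1.0, f_value / B) else 0 for _ in range(l)]
        if sum(z) >= l / t or B < 1:
            return B * sum(z) / l
        B /= 2

random.seed(0)
print(approximate(f_value=37.0, B=1024.0))   # roughly 37 in expectation
```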
  • Patent number: 9037464
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing numeric representations of words. One of the methods includes obtaining a set of training data, wherein the set of training data comprises sequences of words; training a classifier and an embedding function on the set of training data, wherein training the embedding function comprises obtaining trained values of the embedding function parameters; processing each word in the vocabulary using the embedding function in accordance with the trained values of the embedding function parameters to generate a respective numeric representation of each word in the vocabulary in the high-dimensional space; and associating each word in the vocabulary with the respective numeric representation of the word in the high-dimensional space.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: May 19, 2015
    Assignee: Google Inc.
    Inventors: Tomas Mikolov, Kai Chen, Gregory S. Corrado, Jeffrey A. Dean
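The abstract and inventor list match the word2vec approach; gensim's openly available `Word2Vec` trains comparable embeddings and can stand in for a quick demonstration (it is not Google's patented implementation).

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus; real training data would be large text corpora.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["cats", "and", "dogs"]]
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, epochs=50)

vector = model.wv["cat"]                        # numeric representation in the vector space
similar = model.wv.most_similar("cat", topn=2)  # nearby words in the embedding space
```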
  • Patent number: 9031844
    Abstract: A method includes an act of causing a processor to access a deep-structured model retained in a computer-readable medium, the deep-structured model includes a plurality of layers with respective weights assigned to the plurality of layers, transition probabilities between states, and language model scores. The method further includes the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using the optimization criterion based on a sequence rather than a set of unrelated frames.
    Type: Grant
    Filed: September 21, 2010
    Date of Patent: May 12, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Li Deng, Abdel-rahman Samir Abdel-rahman Mohamed
  • Patent number: 9026442
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: May 5, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 9020820
    Abstract: A state detecting apparatus includes: a processor to execute acquiring utterance data related to uttered speech, computing a plurality of statistical quantities for feature parameters regarding features of the utterance data, creating, on the basis of the plurality of statistical quantities regarding the utterance data and another plurality of statistical quantities regarding reference utterance data based on other uttered speech, pseudo-utterance data having at least one statistical quantity equal to a statistical quantity in the other plurality of statistical quantities, computing a plurality of statistical quantities for synthetic utterance data synthesized on the basis of the pseudo-utterance data and the utterance data, and determining, on the basis of a comparison between statistical quantities of the synthetic utterance data and statistical quantities of the reference utterance data, whether the speaker who produced the uttered speech is in a first state or a second state; and a memory.
    Type: Grant
    Filed: April 13, 2012
    Date of Patent: April 28, 2015
    Assignee: Fujitsu Limited
    Inventors: Shoji Hayakawa, Naoshi Matsuo
  • Patent number: 9015044
    Abstract: Implementations of systems, methods and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine-readable formant-based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: April 21, 2015
    Assignee: Malaspina Labs (Barbados) Inc.
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S. H. Chu, Shawn E. Stevenson
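A sketch of the admit-or-update decision above; the distance threshold and blending rate are assumptions, since the abstract only requires "a sufficient amount of new information".

```python
import numpy as np

def consider_tuple(codebook, candidate, novelty_threshold=1.0, rate=0.1):
    """Add the candidate formant tuple if it is far from every existing
    tuple (sufficiently new information); otherwise nudge its nearest
    existing tuple toward it."""
    if not codebook:
        codebook.append(candidate)
        return codebook
    dists = [np.linalg.norm(candidate - t) for t in codebook]
    nearest = int(np.argmin(dists))
    if dists[nearest] > novelty_threshold:
        codebook.append(candidate)                           # sufficiently new
    else:
        codebook[nearest] += rate * (candidate - codebook[nearest])
    return codebook

book = []
for tup in np.random.randn(20, 3) * 2:   # fake (F1, F2, F3) formant tuples
    book = consider_tuple(book, tup)
```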
  • Patent number: 9009039
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: April 14, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Patent number: 8972260
    Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
    Type: Grant
    Filed: April 19, 2012
    Date of Patent: March 3, 2015
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
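The frequency split above in a few lines (the threshold value is an assumption):

```python
from collections import Counter

def split_training_data(utterances, freq_threshold=5):
    """Partition utterances by frequency: frequent ones train the
    grammar-based model, rare ones train the statistical model."""
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c >= freq_threshold]
    low = [u for u, c in counts.items() if c < freq_threshold]
    return high, low

log = ["turn on the radio"] * 9 + ["play something obscure"] * 2
grammar_data, statistical_data = split_training_data(log)
```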
  • Patent number: 8949125
    Abstract: Systems and methods are provided to select the most typical pronunciation of a location name on a map from a plurality of user pronunciations. A server generates a reference speech model based on the user pronunciations, compares the user pronunciations with the speech model, and selects a pronunciation based on the comparison. Alternatively, the server compares the distance between each user pronunciation and every other user pronunciation and selects a pronunciation based on the comparison. The server then annotates the map with the selected pronunciation and provides the audio output of the location name to a user device upon a user's request.
    Type: Grant
    Filed: June 16, 2010
    Date of Patent: February 3, 2015
    Assignee: Google Inc.
    Inventor: Gal Chechik
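A sketch of the pairwise-comparison variant above, with Euclidean distance on fixed-length feature vectors standing in for a real pronunciation distance such as DTW:

```python
import numpy as np

def most_typical(pronunciations):
    """Pick the user pronunciation with the smallest total distance to
    all other pronunciations."""
    total = [sum(np.linalg.norm(p - q) for q in pronunciations)
             for p in pronunciations]
    return int(np.argmin(total))

# Five users' feature vectors for one location name; one is an outlier.
prons = [np.array([1.0, 2.0]), np.array([1.1, 2.1]),
         np.array([0.9, 1.9]), np.array([1.0, 2.2]), np.array([5.0, 9.0])]
print(most_typical(prons))   # index of the most central pronunciation
```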
  • Patent number: 8942975
    Abstract: Techniques are described herein that suppress noise in a Mel-filtered spectral domain. For example, a window may be applied to a representation of a speech signal in a time domain. The windowed representation in the time domain may be converted to a subsequent representation of the speech signal in the Mel-filtered spectral domain. A noise suppression operation may be performed with respect to the subsequent representation to provide noise-suppressed Mel coefficients.
    Type: Grant
    Filed: March 22, 2011
    Date of Patent: January 27, 2015
    Assignee: Broadcom Corporation
    Inventor: Jonas Borgstrom
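A sketch of the pipeline above: window, FFT, Mel filterbank, then a noise-suppression operation in the Mel-filtered spectral domain. Spectral subtraction is one plausible choice for that operation; the abstract leaves it open. The filterbank uses a standard triangular construction with an assumed 300 Hz lower edge.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filterbank (standard construction)."""
    mel = np.linspace(2595 * np.log10(1 + 300 / 700),
                      2595 * np.log10(1 + (sr / 2) / 700), n_filters + 2)
    hz = 700 * (10 ** (mel / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        fb[i - 1, bins[i - 1]:bins[i]] = np.linspace(0, 1, bins[i] - bins[i - 1], endpoint=False)
        fb[i - 1, bins[i]:bins[i + 1]] = np.linspace(1, 0, bins[i + 1] - bins[i], endpoint=False)
    return fb

def suppress(frame, noise_mel, fb, floor=0.01):
    """Window a frame, convert it to the Mel-filtered spectral domain,
    and spectrally subtract a noise estimate there, yielding
    noise-suppressed Mel coefficients."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    mel_coeffs = fb @ power
    return np.maximum(mel_coeffs - noise_mel, floor * mel_coeffs)

fb = mel_filterbank(26, 512, 16000)
frame = np.random.randn(512)
clean_mel = suppress(frame, noise_mel=np.full(26, 0.5), fb=fb)
```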
  • Patent number: 8935170
    Abstract: A speech recognition system, according to an example embodiment, includes a data storage to store speech training data. A training engine determines consecutive breakout periods in the speech training data, calculates forward and backward probabilities for the breakout periods, and generates a speech recognition Hidden Markov Model (HMM) from the forward and backward probabilities calculated for the breakout periods.
    Type: Grant
    Filed: November 27, 2012
    Date of Patent: January 13, 2015
    Assignee: Longsand Limited
    Inventor: Maha Kadirkamanathan
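The forward and backward probabilities named above are the standard Baum-Welch quantities; for one breakout period of training data they can be computed as follows (a generic textbook implementation, not Longsand's):

```python
import numpy as np

def forward_backward(obs_lik, trans, init):
    """Forward and backward probabilities for an observation sequence,
    where obs_lik[t, s] is P(o_t | state s), trans is the state
    transition matrix, and init is the initial state distribution."""
    T, S = obs_lik.shape
    fwd, bwd = np.zeros((T, S)), np.zeros((T, S))
    fwd[0] = init * obs_lik[0]
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ trans) * obs_lik[t]
    bwd[-1] = 1.0
    for t in range(T - 2, -1, -1):
        bwd[t] = trans @ (obs_lik[t + 1] * bwd[t + 1])
    return fwd, bwd

trans = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = np.random.rand(10, 2)                 # one breakout period's likelihoods
fwd, bwd = forward_backward(obs, trans, init=np.array([0.5, 0.5]))
```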
  • Patent number: 8930183
    Abstract: A method of converting speech from the characteristics of a first voice to the characteristics of a second voice, the method comprising: receiving a speech input from a first voice, dividing said speech input into a plurality of frames; mapping the speech from the first voice to a second voice; and outputting the speech in the second voice, wherein mapping the speech from the first voice to the second voice comprises, deriving kernels demonstrating the similarity between speech features derived from the frames of the speech input from the first voice and stored frames of training data for said first voice, the training data corresponding to different text to that of the speech input and wherein the mapping step uses a plurality of kernels derived for each frame of input speech with a plurality of stored frames of training data of the first voice.
    Type: Grant
    Filed: August 25, 2011
    Date of Patent: January 6, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Byung Ha Chun, Mark John Francis Gales
  • Patent number: 8924214
    Abstract: A method for detecting and recognizing speech is provided that remotely detects body motions from a speaker during vocalization with one or more radar sensors. Specifically, the radar sensors include a transmit aperture that transmits one or more waveforms towards the speaker, and each of the waveforms has a distinct wavelength. A receiver aperture is configured to receive the scattered radio frequency energy from the speaker. Doppler signals correlated with the speaker vocalization are extracted with a receiver. Digital signal processors are configured to develop feature vectors utilizing the vocalization Doppler signals, and words associated with the feature vectors are recognized with a word classifier.
    Type: Grant
    Filed: June 7, 2011
    Date of Patent: December 30, 2014
    Assignee: The United States of America, as represented by the Secretary of the Navy
    Inventors: Jefferson M Willey, Todd Stephenson, Hugh Faust, James P. Hansen, George J Linde, Carol Chang, Justin Nevitt, James A Ballas, Thomas Herne Crystal, Vincent Michael Stanford, Jean W. De Graaf