Specialized Model Patents (Class 704/266)
  • Patent number: 12008921
    Abstract: Systems and methods are described for grapheme-phoneme correspondence learning. In an example, a display of a device is caused to output a grapheme graphical user interface (GUI) that includes a grapheme. Audio data representative of a sound made by the human user is received based on the grapheme shown on the display. A grapheme-phoneme model can determine whether the sound made by the human corresponds to a phoneme for the displayed grapheme based on the audio data. The grapheme-phoneme model is trained based on augmented spectrogram data. A speaker is caused to output a sound representative of the phoneme for the grapheme to provide the human with a correct pronunciation of the grapheme in response to the grapheme-phoneme model determining that the sound made by the human does not correspond to the phoneme for the grapheme.
    Type: Grant
    Filed: January 10, 2023
    Date of Patent: June 11, 2024
    Assignee: 617 Education Inc.
    Inventor: Tom Dillon
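    The abstract leaves the augmentation method open; a minimal sketch, assuming SpecAugment-style frequency and time masking, one common way to augment spectrogram training data (all names and constants here are illustrative):

      import numpy as np

      def augment_spectrogram(spec, n_freq_masks=1, n_time_masks=1,
                              max_freq_width=8, max_time_width=20, rng=None):
          """Zero out random frequency bands and time spans of a
          (freq_bins x time_steps) spectrogram."""
          rng = rng if rng is not None else np.random.default_rng()
          out = spec.copy()
          n_freq, n_time = out.shape
          for _ in range(n_freq_masks):
              w = int(rng.integers(0, max_freq_width + 1))
              f0 = int(rng.integers(0, max(1, n_freq - w)))
              out[f0:f0 + w, :] = 0.0
          for _ in range(n_time_masks):
              w = int(rng.integers(0, max_time_width + 1))
              t0 = int(rng.integers(0, max(1, n_time - w)))
              out[:, t0:t0 + w] = 0.0
          return out

      spec = np.abs(np.random.default_rng(0).normal(size=(80, 120)))
      print(augment_spectrogram(spec).shape)   # (80, 120)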
  • Patent number: 11929060
    Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs each including a non-synthetic speech representation and a synthetic speech representation of a same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model.
    Type: Grant
    Filed: February 8, 2021
    Date of Patent: March 12, 2024
    Assignee: Google LLC
    Inventors: Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Jose Moreno Mengibar
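    The abstract does not name the divergence behind the consistent loss term; a minimal numpy sketch, assuming a symmetric KL between the hypothesis distributions produced for the real and synthetic audio of the same utterance:

      import numpy as np

      def kl(p, q, eps=1e-9):
          return float(np.sum(p * np.log((p + eps) / (q + eps))))

      def consistency_loss(p_real, p_synth):
          # Penalize the recognizer for scoring hypotheses differently
          # depending on whether the input was real or TTS audio.
          return 0.5 * (kl(p_real, p_synth) + kl(p_synth, p_real))

      p_real  = np.array([0.7, 0.2, 0.1])   # P(hypothesis | non-synthetic speech)
      p_synth = np.array([0.6, 0.3, 0.1])   # P(hypothesis | synthetic speech)
      print(consistency_loss(p_real, p_synth))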
  • Patent number: 11856369
    Abstract: A hearing aid system presents a hearing impaired user with customized enhanced intelligibility sound in a preferred language. The system includes a model trained with a set of source speech data representing sampling from a speech population relevant to the user. The model is also trained with a set of corresponding alternative articulation of source data, pre-defined or algorithmically constructed during an interactive session with the user. The model creates a set of selected target speech training data from the set of alternative articulation data that is preferred by the user as being satisfactorily intelligible and clear. The system includes a machine learning model, trained to shift incoming source speech data to a preferred variant of the target data that the hearing aid system presents to the user.
    Type: Grant
    Filed: May 2, 2021
    Date of Patent: December 26, 2023
    Inventor: Abbas Rafii
  • Patent number: 11847419
    Abstract: Disclosed herein are system, method, and computer program product embodiments for recognizing a human emotion in a message. An embodiment operates by receiving a message from a user. The embodiment labels each word of the message with a part of speech (POS) thereby creating a POS set. The embodiment determines an incongruity score for a combination of words in the POS set using a knowledgebase. The embodiment determines a preliminary emotion detection score for an emotion for the message based on the POS set. Finally, the embodiment calculates a final emotion detection score for the emotion for the message based on the preliminary emotion detection score and the incongruity score.
    Type: Grant
    Filed: October 1, 2021
    Date of Patent: December 19, 2023
    Assignee: VIRTUAL EMOTION RESOURCE NETWORK, LLC
    Inventors: Craig Tucker, Bryan Novak
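    A toy sketch of the scoring flow, with a hypothetical KNOWLEDGEBASE of incongruity scores and a simple weighted blend standing in for the patent's unspecified combination rule:

      # Toy knowledge base scoring how incongruous an (adjective, noun) pair is.
      KNOWLEDGEBASE = {("great", "traffic"): 0.9, ("great", "news"): 0.1}

      def incongruity_score(words_with_pos):
          adjs  = [w for w, pos in words_with_pos if pos == "ADJ"]
          nouns = [w for w, pos in words_with_pos if pos == "NOUN"]
          scores = [KNOWLEDGEBASE.get((a, n), 0.0) for a in adjs for n in nouns]
          return max(scores, default=0.0)

      def final_emotion_score(preliminary, incongruity, weight=0.5):
          # Weighted blend; the abstract says only that the final score is
          # based on both quantities.
          return (1 - weight) * preliminary + weight * incongruity

      msg = [("great", "ADJ"), ("traffic", "NOUN")]   # "Great, traffic again"
      print(final_emotion_score(0.3, incongruity_score(msg)))   # 0.6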
  • Patent number: 11830474
    Abstract: A method for predicting parametric vocoder parameters includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification.
    Type: Grant
    Filed: January 6, 2022
    Date of Patent: November 28, 2023
    Assignee: Google LLC
    Inventors: Rakesh Iyer, Vincent Wan
  • Patent number: 11721319
    Abstract: An artificial intelligence device includes a memory and a processor. The memory is configured to store audio data having a predetermined speech style. The processor is configured to generate a condition vector relating to a condition for determining the speech style of the audio data, reduce a dimension of the condition vector to a predetermined reduction dimension, acquire a sparse code vector based on a dictionary vector acquired through sparse dictionary coding with respect to the condition vector having the predetermined reduction dimension, and change a vector element value included in the sparse code vector.
    Type: Grant
    Filed: February 27, 2020
    Date of Patent: August 8, 2023
    Assignee: LG ELECTRONICS INC.
    Inventors: Minook Kim, Yongchul Park, Sungmin Han, Siyoung Yang, Sangki Kim, Juyeong Jang
  • Patent number: 11676496
    Abstract: Systems and methods identify a query parameter in an incoming flight voice or data communication in order to respond to a request. A processing system is configured to: in response to receipt of a clearance message, decode the clearance message to determine whether it contains a command instruction or clearance data for a flight, and either present the command instruction to the pilot as notice to execute it or, if available, obtain at least one query parameter from the clearance data to configure a query operation presented in response to a pilot question about the command instruction. In response to receipt of the voice or data communication, the system further determines the intent of the voiced question or instruction by applying an acoustic model that tags identified parts of the voiced question or instruction with query parameters in response to the pilot.
    Type: Grant
    Filed: June 2, 2020
    Date of Patent: June 13, 2023
    Assignee: HONEYWELL INTERNATIONAL INC.
    Inventors: Hariharan Saptharishi, Gobinathan Baladhandapani, Mahesh Kumar Sampath, Sivakumar Kanagarajan
  • Patent number: 11657104
    Abstract: Methods, computer program products, and systems are presented. The methods include, for instance: obtaining an utterance input from a user agent and collecting context data of the utterance input. A context tag is generated based on the context data, and one or more ground truths, each having an utterance semantically identical to the utterance input, are selected. The semantic relationship between the context tag and the intent of each selected ground truth is examined, and the selected ground truth is updated with the context tag.
    Type: Grant
    Filed: October 21, 2019
    Date of Patent: May 23, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Faheem Altaf, Lisa Seacat Deluca, Raghuram Srinivas
  • Patent number: 11521616
    Abstract: The method S200 can include: at an aircraft, receiving an audio utterance from air traffic control S210, converting the audio utterance to text, determining commands from the text using a question-and-answer model S240, and optionally controlling the aircraft based on the commands S250. The method functions to automatically interpret flight commands from the air traffic control (ATC) stream.
    Type: Grant
    Filed: April 13, 2022
    Date of Patent: December 6, 2022
    Assignee: Merlin Labs, Inc.
    Inventors: Michael Pust, Joseph Bondaryk, Matthew George
  • Patent number: 11514888
    Abstract: A method includes receiving an input text utterance to be synthesized into expressive speech having an intended prosody and a target voice and generating, using a first text-to-speech (TTS) model, an intermediate synthesized speech representation for the input text utterance. The intermediate synthesized speech representation possesses the intended prosody. The method also includes providing the intermediate synthesized speech representation to a second TTS model that includes an encoder portion and a decoder portion. The encoder portion is configured to encode the intermediate synthesized speech representation into an utterance embedding that specifies the intended prosody. The decoder portion is configured to process the input text utterance and the utterance embedding to generate an output audio signal of expressive speech that has the intended prosody specified by the utterance embedding and speaker characteristics of the target voice.
    Type: Grant
    Filed: August 13, 2020
    Date of Patent: November 29, 2022
    Assignee: Google LLC
    Inventors: Lev Finkelstein, Chun-An Chan, Byungha Chun, Ye Jia, Yu Zhang, Robert Andrew James Clark, Vincent Wan
  • Patent number: 11508380
    Abstract: Systems and processes for operating an intelligent automated assistant are provided. In one example, a plurality of speech inputs is received from a first user. A voice model is obtained based on the plurality of speech inputs. A user input is received from the first user, the user input corresponding to a request to provide access to the voice model. The voice model is provided to a second electronic device.
    Type: Grant
    Filed: May 26, 2020
    Date of Patent: November 22, 2022
    Assignee: Apple Inc.
    Inventors: Qiong Hu, Jiangchuan Li, David A. Winarsky
  • Patent number: 11430438
    Abstract: An electronic device includes a microphone, a communication circuit, and a processor configured to obtain a user's utterance through the microphone, transmit first information about the utterance through the communication circuit to an external server for at least partially automatic speech recognition (ASR) or natural language understanding (NLU), obtain a second text from the external server through the communication circuit, the second text being a text resulting from modifying at least part of a first text included in a neutral response to the utterance based on parameters corresponding to the user's conversation style and emotion identified based on the first information, and provide a voice corresponding to the second text or a message including the second text in response to the utterance.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: August 30, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Piotr Andruszkiewicz, Tomasz Latkowski, Kamil Herba, Maciej Pienkosz, Iryna Orlova, Jakub Staniszewski, Krystian Koziel
  • Patent number: 11417341
    Abstract: Techniques for processing comment information are disclosed herein. The disclosed techniques include collecting first voice information from a user in response to receiving a request for inputting voice information while the user is watching a video comprising a plurality of segments; obtaining a timestamp corresponding to a segment among the plurality of segments of the video; processing the first voice information and obtaining second voice information; and generating bullet screen information based at least in part on the timestamp and the second voice information.
    Type: Grant
    Filed: February 25, 2020
    Date of Patent: August 16, 2022
    Assignee: SHANGHAI BILIBILI TECHNOLOGY CO., LTD.
    Inventor: Yue Zhu
  • Patent number: 11404045
    Abstract: A speech synthesis method performed by an electronic apparatus to synthesize speech from text includes: obtaining text input to the electronic apparatus; obtaining a text representation by encoding the text using a text encoder of the electronic apparatus; obtaining an audio representation of a first audio frame set from an audio encoder of the electronic apparatus, based on the text representation; obtaining an audio representation of a second audio frame set based on the text representation and the audio representation of the first audio frame set; obtaining an audio feature of the second audio frame set by decoding the audio representation of the second audio frame set; and synthesizing speech based on an audio feature of the first audio frame set and the audio feature of the second audio frame set.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: August 2, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Seungdo Choi, Kyoungbo Min, Sangjun Park, Kihyun Choo
  • Patent number: 11363377
    Abstract: An audio processing method comprises: for each given input digital audio signal of a set of two or more input digital audio signals, detecting a correlation between the given input digital audio signal and others of the input digital audio signals; generating a gain adjustment for application to the given input digital audio signal in dependence upon the detected correlation; applying the gain adjustment to the given input digital audio signal to generate a respective gain-adjusted input digital audio signal; and combining the set of gain-adjusted input digital audio signals to generate an output digital audio signal.
    Type: Grant
    Filed: October 12, 2018
    Date of Patent: June 14, 2022
    Assignee: SONY EUROPE B.V.
    Inventors: Emmanuel Deruty, Stéphane Rivaud
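    A numpy sketch of the mixdown, assuming a hypothetical correlation-to-gain mapping (the abstract requires only that each gain depend on the detected correlation):

      import numpy as np

      def mix_with_correlation_gains(signals, alpha=0.5):
          # Attenuate each input in proportion to its mean absolute
          # correlation with the other inputs, then sum.
          gains = []
          for i, s in enumerate(signals):
              corrs = [abs(np.corrcoef(s, o)[0, 1])
                       for j, o in enumerate(signals) if j != i]
              c = float(np.mean(corrs)) if corrs else 0.0
              gains.append(1.0 / (1.0 + alpha * c))
          return sum(g * s for g, s in zip(gains, signals)), gains

      rng = np.random.default_rng(0)
      a = rng.normal(size=1000)
      b = a + 0.1 * rng.normal(size=1000)   # nearly a duplicate of a
      c = rng.normal(size=1000)             # independent
      mixed, gains = mix_with_correlation_gains([a, b, c])
      print([round(g, 3) for g in gains])   # correlated inputs get lower gains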
  • Patent number: 11355103
    Abstract: Embodiments described herein provide for a computer that detects one or more keywords of interest using acoustic features, to detect or query commonalities across multiple fraud calls. Embodiments described herein may implement unsupervised keyword spotting (UKWS) or unsupervised word discovery (UWD) in order to identify commonalities across a set of calls, where both UKWS and UWD employ Gaussian Mixture Models (GMM) and one or more dynamic time-warping algorithms. A user may indicate a training exemplar or occurrence of call-specific information, referred to herein as “a named entity,” such as a person's name, an account number, account balance, or order number. The computer may perform a redaction process that computationally nullifies the import of the named entity in the modeling processes described herein.
    Type: Grant
    Filed: January 28, 2020
    Date of Patent: June 7, 2022
    Assignee: PINDROP SECURITY, INC.
    Inventor: Hrishikesh Rao
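    The abstract names dynamic time warping as the alignment primitive; a minimal numpy implementation (the GMM modeling and named-entity redaction are omitted):

      import numpy as np

      def dtw_distance(x, y):
          # Classic DTW between two feature sequences of shape (frames, dims).
          n, m = len(x), len(y)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = np.linalg.norm(x[i - 1] - y[j - 1])
                  D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
          return D[n, m]

      rng = np.random.default_rng(1)
      template = rng.normal(size=(40, 13))        # e.g. MFCCs of a keyword
      slower = np.repeat(template, 2, axis=0)     # same keyword, half speed
      other = rng.normal(size=(80, 13))           # unrelated speech
      print(dtw_distance(template, slower) < dtw_distance(template, other))  # True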
  • Patent number: 11270487
    Abstract: The disclosed computer-implemented method may include identifying a muscle group engaged by a user to execute a predefined body action by (1) capturing a set of images of the user while the user executes the predefined body action, and (2) associating a feature of a body of the user with the muscle group based on the predefined body action and the set of images. The method may also include determining, based on the set of images, a set of parameters associated with the user, the muscle group, and the predefined body action. The method may also include directing a computer-generated avatar that represents the body of the user to produce the predefined body action in accordance with the set of parameters. Various other methods, systems, and computer-readable media are also disclosed.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: March 8, 2022
    Assignee: Facebook Technologies, LLC
    Inventors: William Arthur Hugh Steptoe, Michael Andrew Howard, Melinda Ozel, Giovanni F. Nakpil, Timothy Naylor
  • Patent number: 10979762
    Abstract: Systems and methods are described herein for a media guidance application that can cause a specific portion of a media asset to be stored based on a user command. For example, if the user requests the closing scene from a given movie, the media guidance application may detect the command, determine that it comprises an instruction to store a portion of a media asset, identify a source of the portion of the media asset, and cause the portion of the media asset to be stored. The media guidance application may also cause the entirety of the media asset to be stored and initiate playback at the start of the requested portion. This may allow users to store and watch portions of particular interest without requiring that the users seek through the entire media asset on their own.
    Type: Grant
    Filed: July 22, 2019
    Date of Patent: April 13, 2021
    Assignee: Rovi Guides, Inc.
    Inventors: Paul Maltar, Milan Patel, Yong Gong
  • Patent number: 10978067
    Abstract: A home appliance including an audio input unit including at least one microphone and into which a voice command formed of natural language is inputted; a communication unit which transmits the voice command to a voice recognition system as voice data, and receives a response signal from the voice recognition system; a controller which sets an operation corresponding to the response signal, outputs an operation state, and outputs a guidance announcement, according to a result of voice recognition of the voice recognition system; and an audio output unit which outputs the guidance announcement corresponding to the operation, wherein the controller determines whether it is possible to set the operation or to support a function for the operation in response to the response signal and an operation state of the home appliance, and performs the operation.
    Type: Grant
    Filed: January 8, 2019
    Date of Patent: April 13, 2021
    Assignee: LG ELECTRONICS INC.
    Inventors: Junhee An, Hyojeong Kang, Jaehoon Lee, Heungkyu Lee
  • Patent number: 10699695
    Abstract: During text-to-speech processing, audio data corresponding to a word part, word, or group of words is generated using a trained model and used by a unit selection engine to create output audio. The audio data is generated at least when an input word is unrecognized or when a cost of a unit selection is too high.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: June 30, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Adam Franciszek Nadolski, Daniel Korzekwa, Thomas Edward Merritt, Marco Nicolis, Bartosz Putrycz, Roberto Barra Chicote, Rafal Kuklinski, Wiktor Dolecki
  • Patent number: 10546062
    Abstract: A token is extracted from a Natural Language input. A phonetic pattern is computed corresponding to the token, the phonetic pattern including a sound pattern that represents a part of the token when the token is spoken. New data is created from data of the phonetic pattern, the new data including a syllable sequence corresponding to the phonetic pattern. A state of a data storage device is changed by storing the new data in a matrix of syllable sequences corresponding to the token. An option is selected that corresponds to the token by executing a fuzzy matching algorithm using a processor and a memory, the selecting of the option is based on a syllable sequence in the matrix.
    Type: Grant
    Filed: November 15, 2017
    Date of Patent: January 28, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sean M. Fuoco, John M. Ganci, Jr., Craig M. Trim, Jie Zeng
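    An illustrative sketch using a Soundex-like code as the phonetic pattern and difflib for the fuzzy match; the patent's syllable-sequence matrix is a richer representation than this single code:

      import difflib

      SOUND_CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
                     **dict.fromkeys("dt", "3"), "l": "4",
                     **dict.fromkeys("mn", "5"), "r": "6"}

      def phonetic_pattern(token):
          # Keep the first letter, then consonant-class digits, collapsing runs.
          t = token.lower()
          out = t[0].upper()
          for ch in t[1:]:
              digit = SOUND_CODES.get(ch, "")
              if digit and out[-1] != digit:
                  out += digit
          return out

      def select_option(token, options):
          # Fuzzy-match the token's sound pattern against each option's.
          target = phonetic_pattern(token)
          return max(options, key=lambda o: difflib.SequenceMatcher(
              None, target, phonetic_pattern(o)).ratio())

      # A misheard "Catharine" still selects the intended name.
      print(select_option("Catharine", ["Katherine", "Jonathan", "Margaret"]))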
  • Patent number: 10467333
    Abstract: Methods, apparatuses, and computer program products are described herein that are configured to enable updating of an output text. In some example embodiments, a method is provided that comprises generating a new message for each updateable data element based on a predetermined indication. The method of this embodiment may also include determining a classification for each new message by comparing each new message with a corresponding message that describes the updateable data element. The method of this embodiment may also include generating an additional document plan tree that contains at least a portion of the new messages. The method of this embodiment may also include combining the additional document plan tree with an original document plan tree.
    Type: Grant
    Filed: April 7, 2016
    Date of Patent: November 5, 2019
    Assignee: ARRIA DATA2TEXT LIMITED
    Inventors: Alasdair James Logan, Ehud Baruch Reiter
  • Patent number: 10339465
    Abstract: During a training phase of a machine learning model, representations of at least some nodes of a decision tree are generated and stored on persistent storage in depth-first order. A respective predictive utility metric (PUM) value is determined for one or more nodes, indicating expected contributions of the nodes to a prediction of the model. A particular node is selected for removal from the tree based at least partly on its PUM value. A modified version of the tree, with the particular node removed, is stored for obtaining a prediction.
    Type: Grant
    Filed: August 19, 2014
    Date of Patent: July 2, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Robert Matthias Steele, Tarun Agarwal, Leo Parker Dirac, Jun Qian
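    A sketch of PUM-based pruning on a toy tree; the PUM values are placeholders, since the abstract leaves the exact metric open:

      from dataclasses import dataclass
      from typing import Optional

      @dataclass
      class Node:
          prediction: float
          pum: float = 0.0                  # predictive utility metric
          left: Optional["Node"] = None
          right: Optional["Node"] = None

      def prune_lowest_pum(root):
          # Depth-first scan (matching the storage order), then collapse the
          # non-root internal node with the lowest PUM into a leaf.
          best, stack = None, [root]
          while stack:
              node = stack.pop()
              if node.left or node.right:
                  if node is not root and (best is None or node.pum < best.pum):
                      best = node
                  stack.extend(c for c in (node.left, node.right) if c)
          if best is not None:
              best.left = best.right = None
          return root

      tree = Node(0.5, pum=0.9,
                  left=Node(0.3, pum=0.05, left=Node(0.1), right=Node(0.4)),
                  right=Node(0.8, pum=0.70, left=Node(0.7), right=Node(0.9)))
      prune_lowest_pum(tree)
      print(tree.left.left, tree.right.left.prediction)   # None 0.7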
  • Patent number: 10249289
    Abstract: Methods, systems, and computer-readable media for text-to-speech synthesis using an autoencoder. In some implementations, data indicating a text for text-to-speech synthesis is obtained. Data indicating a linguistic unit of the text is provided as input to an encoder. The encoder is configured to output speech unit representations indicative of acoustic characteristics based on linguistic information. A speech unit representation that the encoder outputs is received. A speech unit is selected to represent the linguistic unit, the speech unit being selected from among a collection of speech units based on the speech unit representation output by the encoder. Audio data for a synthesized utterance of the text that includes the selected speech unit is provided.
    Type: Grant
    Filed: July 13, 2017
    Date of Patent: April 2, 2019
    Assignee: Google LLC
    Inventors: Byung Ha Chun, Javier Gonzalvo, Chun-an Chan, Ioannis Agiomyrgiannakis, Vincent Ping Leung Wan, Robert Andrew James Clark, Jakub Vit
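    A sketch of the selection step, assuming cosine similarity between the encoder's output and the stored unit embeddings (the abstract says only that selection is based on the speech unit representation):

      import numpy as np

      def select_unit(query_embedding, unit_embeddings, unit_audio):
          # Return the recorded unit whose embedding best matches the
          # encoder's output for the current linguistic unit.
          q = query_embedding / np.linalg.norm(query_embedding)
          U = unit_embeddings / np.linalg.norm(unit_embeddings, axis=1, keepdims=True)
          return unit_audio[int(np.argmax(U @ q))]

      rng = np.random.default_rng(0)
      corpus_embeddings = rng.normal(size=(500, 64))   # one row per speech unit
      corpus_audio = [f"unit_{i}.wav" for i in range(500)]
      query = corpus_embeddings[123] + 0.01 * rng.normal(size=64)
      print(select_unit(query, corpus_embeddings, corpus_audio))   # unit_123.wav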
  • Patent number: 9756446
    Abstract: An audio receiver that performs crosstalk cancellation using a speaker array is described. The audio receiver detects the location of a listener in a room and processes a piece of sound program content to be output through the speaker array using one or more beam pattern matrices. The beam pattern matrices are generated according to one or more constraints. The constraints may include increasing a right channel and decreasing a left channel at the right ear of the listener, increasing a left channel and decreasing a right channel at the left ear of the listener, and decreasing sound in all other areas of the room. These constraints cause the audio receiver to beam sound primarily towards the listener and not in other areas of the room such that crosstalk cancellation is achieved with minimal effects due to changes to the frequency response of the room. Other embodiments are also described.
    Type: Grant
    Filed: March 13, 2014
    Date of Patent: September 5, 2017
    Assignee: Apple Inc.
    Inventors: Martin E. Johnson, Ronald N. Isaac
  • Patent number: 9436891
    Abstract: A method for identifying synonymous expressions includes determining synonymous expression candidates for a target expression. A plurality of target images related to the target expression and a plurality of candidate images related to each of the synonymous expression candidates are identified. Features extracted from the plurality of target images are compared with features extracted from the plurality of candidate images using a processor to identify a synonymous expression of the target expression.
    Type: Grant
    Filed: July 30, 2013
    Date of Patent: September 6, 2016
    Assignee: GlobalFoundries, Inc.
    Inventors: Chiaki Oishi, Tetsuya Nasukawa, Shoko Suzuki
  • Patent number: 9275633
    Abstract: Technologies are described herein for providing validated text-to-speech correction hints from aggregated pronunciation corrections received from text-to-speech applications. A number of pronunciation corrections are received by a Web service. The pronunciation corrections may be provided by users of text-to-speech applications executing on a variety of user computer systems. Each of the plurality of pronunciation corrections includes a specification of a word or phrase and a suggested pronunciation provided by the user. The pronunciation corrections are analyzed to generate validated correction hints, and the validated correction hints are provided back to the text-to-speech applications to be used to correct pronunciation of words and phrases in the text-to-speech applications.
    Type: Grant
    Filed: January 9, 2012
    Date of Patent: March 1, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Edward Cath, Timothy Edwin Harris, James Oliver Tisdale, III
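    A stdlib sketch of the aggregation-and-validation step, with illustrative report-count and agreement thresholds (the abstract does not fix a validation rule):

      from collections import Counter, defaultdict

      corrections = defaultdict(Counter)    # word -> suggested pronunciations

      def submit_correction(word, pronunciation):
          corrections[word.lower()][pronunciation] += 1

      def validated_hints(min_reports=3, min_agreement=0.6):
          # Promote a suggestion once enough independent reports agree on it.
          hints = {}
          for word, counter in corrections.items():
              total = sum(counter.values())
              pron, votes = counter.most_common(1)[0]
              if total >= min_reports and votes / total >= min_agreement:
                  hints[word] = pron
          return hints

      for p in ["nuh-VAH-duh", "nuh-VAH-duh", "nuh-VAH-duh", "nuh-VAY-duh"]:
          submit_correction("Nevada", p)
      print(validated_hints())   # {'nevada': 'nuh-VAH-duh'}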
  • Patent number: 9195312
    Abstract: An information processing apparatus includes a voice acquiring unit configured to acquire voice information remarked by a user; an image information acquiring unit configured to acquire image information of the user; a user specifying unit configured to specify the user having remarked the voice information based on the image information when the voice information is acquired; a word extracting unit configured to extract a word from the voice information; a priority calculating unit configured to increase priority of the user having remarked the word for operation of a display device when the extracted word matches any one of keywords; a gesture detecting unit configured to detect a gesture made by the user based on the image information; and an operation permission determining unit configured to permit the user to operate the display device when the priority of the user having made the gesture is highest.
    Type: Grant
    Filed: June 4, 2013
    Date of Patent: November 24, 2015
    Assignee: RICOH COMPANY, LIMITED
    Inventors: Akinori Itoh, Hidekuni Annaka
  • Patent number: 9154880
    Abstract: A first filter coefficient and a second filter coefficient are calculated by using spatial transfer characteristics from a first speaker and a second speaker to a first control point and a second control point, and a first sound increase ratio na at the first control point and a second sound increase ratio nb at the second control point, so that, when the first filter coefficient is a through characteristic, a first composite sound pressure from the first speaker and the second speaker to the first control point is na times a first sound pressure from the first speaker to the first control point, and a second composite sound pressure from the first speaker and the second speaker to the second control point is nb times a second sound pressure from the first speaker to the second control point.
    Type: Grant
    Filed: June 19, 2013
    Date of Patent: October 6, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takahiro Hiruma, Akihiko Enamito, Osamu Nishimura
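    A numpy sketch at a single frequency bin: with the first filter fixed to a through characteristic, the two ratio conditions are solved in least squares for the second coefficient (the values of H, na, and nb are made up, and an exact solution exists only when the two conditions are compatible):

      import numpy as np

      # H[i, j]: spatial transfer characteristic from speaker j to control
      # point i at one frequency (complex gains; illustrative values).
      H = np.array([[1.0 + 0.0j, 0.6 - 0.2j],
                    [0.5 + 0.1j, 0.9 + 0.0j]])
      na, nb = 1.4, 0.3       # desired sound increase ratios at points A and B

      # Conditions with w1 = 1 (through):  H[i, 0] + H[i, 1]*w2 = n_i * H[i, 0]
      A = H[:, 1].reshape(-1, 1)
      b = np.array([(na - 1) * H[0, 0], (nb - 1) * H[1, 0]])
      w2 = np.linalg.lstsq(A, b, rcond=None)[0][0]

      composite = H @ np.array([1.0, w2])   # composite pressure at each point
      direct = H[:, 0]                      # pressure from speaker 1 alone
      print(np.abs(composite / direct))     # achieved ratios vs. (na, nb)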
  • Patent number: 9087507
    Abstract: Computer-based skimming and scrolling of aurally presented information is described. Different levels of skimming are achieved in aural presentations by allowing a user to navigate an aural presentation according to significant points identified within an information source. The significant points are identified using various indicia that suggest logical arrangements for the information contained within the source, such as semantics, syntax, typography, formatting, named entities, and markup tags. The identified significant points signal changes in playback mode for the audio presentation, such as different tones, pitches, volumes, or voices. Similar indicia may be used to generate identifying markers from the information source that can be aurally presented in lieu of the information source itself to allow for aural scrolling of the information.
    Type: Grant
    Filed: November 15, 2006
    Date of Patent: July 21, 2015
    Assignee: Yahoo! Inc.
    Inventor: Srinivasan H. Sengamedu
  • Patent number: 9043213
    Abstract: A speech recognition method including the steps of receiving a speech input from a known speaker of a sequence of observations and determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model. The acoustic model has a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation and has been trained using first training data and adapted using second training data to said speaker. The speech recognition method also determines the likelihood of a sequence of observations occurring in a given language using a language model and combines the likelihoods determined by the acoustic model and the language model and outputs a sequence of words identified from said speech input signal. The acoustic model is context based for the speaker, the context based information being contained in the model using a plurality of decision trees and the structure of the decision trees is based on second training data.
    Type: Grant
    Filed: January 26, 2011
    Date of Patent: May 26, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Byung Ha Chun
  • Publication number: 20150127350
    Abstract: A method and system are disclosed for non-parametric speech conversion. A text-to-speech (TTS) synthesis system may include hidden Markov model (HMM) based speech modeling for synthesizing output speech. A converted HMM may be initially set to a source HMM trained with a voice of a source speaker. A parametric representation of speech may be extracted from speech of a target speaker to generate a set of target-speaker vectors. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the source HMM to a target-speaker vector. The HMM states of the converted HMM may be replaced with the matched target-speaker vectors. Transforms may be applied to further adapt the converted HMM to the voice of the target speaker. The converted HMM may be used to synthesize speech with voice characteristics of the target speaker.
    Type: Application
    Filed: November 1, 2013
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventor: Ioannis Agiomyrgiannakis
  • Publication number: 20150127349
    Abstract: A method and system are disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to a HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis.
    Type: Application
    Filed: November 1, 2013
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventor: Ioannis Agiomyrgiannakis
  • Patent number: 9026445
    Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
    Type: Grant
    Filed: March 20, 2013
    Date of Patent: May 5, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Terry Wade Niemeyer, Liliana Orozco
  • Patent number: 9026438
    Abstract: A method for detecting barge-in in a speech dialog system comprising determining whether a speech prompt is output by the speech dialog system, and detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and/or based on speaker information, where the sensitivity threshold is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined. If speech activity is detected in the input signal, the speech prompt may be interrupted or faded out. A speech dialog system configured to detect barge-in is also disclosed.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: May 5, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Markus Buck, Franz Gerl, Tim Haulick, Tobias Herbig, Gerhard Uwe Schmidt, Matthias Schulz
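    A minimal sketch of the time-varying sensitivity threshold, using frame energy as the activity measure (the constants are illustrative; the patent also allows speaker information to drive the threshold):

      def speech_detected(frame_energy, prompt_active,
                          base_threshold=0.02, prompt_boost=3.0):
          # Raise the detection threshold while the system's own prompt is
          # playing, so prompt echo is not mistaken for a barge-in.
          threshold = base_threshold * (prompt_boost if prompt_active else 1.0)
          return frame_energy > threshold

      # The same frame is ignored during the prompt, accepted afterwards.
      print(speech_detected(0.04, prompt_active=True))    # False
      print(speech_detected(0.04, prompt_active=False))   # True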
  • Patent number: 9020812
    Abstract: Disclosed is an audio signal processing method comprising the steps of: receiving an audio signal containing current frame data; generating a first temporary output signal for the current frame when an error occurs in the current frame data, by carrying out frame error concealment with respect to the current frame data using a random codebook; generating a parameter by carrying out one or more of short-term prediction, long-term prediction and a fixed codebook search based on the first temporary output signal; and updating a memory with the parameter for the next frame; wherein the parameter comprises one or more of pitch gain, pitch delay, fixed codebook gain and a fixed codebook.
    Type: Grant
    Filed: November 24, 2010
    Date of Patent: April 28, 2015
    Assignees: LG Electronics Inc., Industry-Academic Cooperation Foundation, Yonsei University
    Inventors: Hye Jeong Jeon, Dae Hwan Kim, Hong Goo Kang, Min Ki Lee, Byung Suk Lee, Gyu Hyeok Jeong
  • Patent number: 9002711
    Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
    Type: Grant
    Filed: December 16, 2010
    Date of Patent: April 7, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Ryo Morinaka, Takehiko Kagoshima
  • Publication number: 20150095035
    Abstract: A method for speech parameterization and coding of a continuous speech signal. The method comprises dividing said speech signal into a plurality of speech frames, and for each one of the plurality of speech frames, modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameters, reconstructing an estimated frame signal from the plurality of harmonic model parameters, subtracting the estimated frame signal from the speech frame to produce a harmonic model residual, performing at least one second harmonic modeling analysis on the first harmonic model residual to determine at least one set of second harmonic model components, removing the at least one set of second harmonic model components from the first harmonic model residual to produce a harmonically-filtered residual signal, and processing the harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains.
    Type: Application
    Filed: September 30, 2013
    Publication date: April 2, 2015
    Applicant: International Business Machines Corporation
    Inventor: Slava Shechtman
  • Patent number: 8996384
    Abstract: Embodiments of the invention address the deficiencies of the prior art by providing a method, apparatus, and program product for converting components of a web page to voice prompts for a user. In some embodiments, the method comprises selectively determining at least one HTML component from a plurality of HTML components of a web page to transform into a voice prompt for a mobile system based upon a voice attribute file associated with the web page. The method further comprises transforming the at least one HTML component into parameterized data suitable for use by the mobile system based upon at least a portion of the voice attribute file associated with the at least one HTML component and transmitting the parameterized data to the mobile system.
    Type: Grant
    Filed: October 30, 2009
    Date of Patent: March 31, 2015
    Assignee: Vocollect, Inc.
    Inventors: Paul M. Funyak, Norman J. Connors, Paul E. Kolonay, Matthew Aaron Nichols
  • Patent number: 8990087
    Abstract: A method for providing text to speech from digital content in an electronic device is described. Digital content including a plurality of words and a pronunciation database is received. Pronunciation instructions are determined for the word using the digital content. Audio or speech is played for the word using the pronunciation instructions. As a result, the method provides text to speech on the electronic device based on the digital content.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: March 24, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: John Lattyak, John T. Kim, Robert Wai-Chi Chu, Laurent An Minh Nguyen
  • Patent number: 8977552
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: May 28, 2014
    Date of Patent: March 10, 2015
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 8954328
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for providing a plurality of characters, at least some of which have multiple associated moods, for use in document narration.
    Type: Grant
    Filed: January 14, 2010
    Date of Patent: February 10, 2015
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Patent number: 8898062
    Abstract: A strained-rough-voice conversion unit (10) is included in a voice conversion device that can generate a “strained rough” voice produced in a part of a speech when speaking forcefully with excitement, nervousness, anger, or emphasis and thereby richly express vocal expression such as anger, excitement, or an animated or lively way of speaking, using voice quality change. The strained-rough-voice conversion unit (10) includes: a strained phoneme position designation unit (11) designating a phoneme to be uttered as a “strained rough” voice in a speech; and an amplitude modulation unit (14) performing modulation including periodic amplitude fluctuation on a speech waveform.
    Type: Grant
    Filed: January 22, 2008
    Date of Patent: November 25, 2014
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventors: Yumiko Kato, Takahiro Kamai
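    A numpy sketch of the amplitude-modulation step; the ~80 Hz rate and 40% depth are assumptions, since the abstract calls only for modulation with periodic amplitude fluctuation:

      import numpy as np

      def strained_rough(wave, sr, mod_hz=80.0, depth=0.4):
          # Impose a periodic amplitude fluctuation on the designated span.
          t = np.arange(len(wave)) / sr
          envelope = 1.0 + depth * np.sin(2 * np.pi * mod_hz * t)
          return wave * envelope

      sr = 16000
      t = np.arange(sr) / sr
      vowel = np.sin(2 * np.pi * 150 * t)     # toy 150 Hz voiced segment
      print(strained_rough(vowel, sr).max())  # peaks now fluctuate up to ~1.4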
  • Patent number: 8888494
    Abstract: One or more embodiments present a script to a user in an interactive script environment. A digital representation of a manuscript is analyzed. This digital representation includes a set of roles and a set of information associated with each role in the set of roles. An active role in the set of roles that is associated with a given user is identified based on the analyzing. At least a portion of the manuscript is presented to the given user via a user interface. The portion includes at least a subset of information in the set of information. Information within the set of information that is associated with the active role is presented in a visually different manner than information within the set of information that is associated with a non-active role, which is a role that is associated with a user other than the given user.
    Type: Grant
    Filed: June 27, 2011
    Date of Patent: November 18, 2014
    Inventor: Randall Lee Threewits
  • Patent number: 8868431
    Abstract: A recognition dictionary creation device identifies the language of a reading of an inputted text that is a target to be registered and adds a reading with phonemes in the identified language to the target text. It also converts the reading of the target text from the phonemes in the identified language to phonemes in the language handled by voice recognition, creating a recognition dictionary in which the converted reading of the target text is registered.
    Type: Grant
    Filed: February 5, 2010
    Date of Patent: October 21, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
  • Patent number: 8868418
    Abstract: Embodiments of the invention provide a communication device and methods for enhancing audio signals. A first audio signal buffer and a second audio signal buffer are acquired. Thereafter, the magnitude spectrum calculated from the Fast Fourier Transform (FFT) of the second audio signal is processed based on the Linear Predictive Coding (LPC) spectrum of the first audio signal to generate an enhanced second audio signal.
    Type: Grant
    Filed: November 20, 2010
    Date of Patent: October 21, 2014
    Inventors: Alon Konchitsky, Sandeep Kulakcherla, Alberto D. Berstein
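    A numpy sketch, assuming the enhancement multiplies the second buffer's magnitude spectrum by the LPC spectral envelope of the first (the abstract says only that the magnitude spectrum is processed based on the first signal's LPC spectrum):

      import numpy as np

      def lpc(signal, order=12):
          # Autocorrelation-method LPC via the normal equations.
          r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
          R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
          a = np.linalg.solve(R, r[1:order + 1])
          return np.concatenate(([1.0], -a))      # A(z) = 1 - sum a_k z^-k

      def enhance(first, second, n_fft=512):
          envelope = 1.0 / np.abs(np.fft.rfft(lpc(first), n_fft))
          S = np.fft.rfft(second, n_fft)
          shaped = envelope * np.abs(S) * np.exp(1j * np.angle(S))
          return np.fft.irfft(shaped, n_fft)      # keep second buffer's phase

      rng = np.random.default_rng(0)
      buf1 = np.sin(2 * np.pi * 200 * np.arange(512) / 8000) + 0.01 * rng.normal(size=512)
      buf2 = rng.normal(size=512)
      print(enhance(buf1, buf2).shape)            # (512,)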
  • Patent number: 8868423
    Abstract: Systems and methods for controlling access to resources using spoken Completely Automatic Public Turing Tests To Tell Humans And Computers Apart (CAPTCHA) tests are disclosed. In these systems and methods, entities seeking access to resources are required to produce an input utterance that contains at least some audio. That utterance is compared with voice reference data for human and machine entities, and a determination is made as to whether the entity requesting access is a human or a machine. Access is then permitted or refused based on that determination.
    Type: Grant
    Filed: July 11, 2013
    Date of Patent: October 21, 2014
    Assignee: John Nicholas and Kristin Gross Trust
    Inventor: John Nicholas Gross
  • Patent number: 8856008
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: September 18, 2013
    Date of Patent: October 7, 2014
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8849669
    Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.
    Type: Grant
    Filed: April 3, 2013
    Date of Patent: September 30, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Raimo Bakis, Ellen Marie Eide, Roberto Pieraccini, Maria E. Smith, Jie Z. Zeng
  • Patent number: 8831950
    Abstract: Embodiments of the present invention provide a method, system and computer program product for the automated voice enablement of a Web page. In an embodiment of the invention, a method for voice enabling a Web page can include selecting an input field of a Web page for speech input, generating a speech grammar for the input field based upon terms in a core attribute of the input field, receiving speech input for the input field, posting the received speech input and the grammar to an automatic speech recognition (ASR) engine and inserting a textual equivalent to the speech input provided by the ASR engine into a document object model (DOM) for the Web page.
    Type: Grant
    Filed: April 7, 2008
    Date of Patent: September 9, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Victor S. Moore, Wendi L. Nusbickel