Specialized Model Patents (Class 704/266)

Personalized voices for text messaging

Patent number: 12170089

Abstract: Systems and processes for operating an intelligent automated assistant are provided. In one example, a plurality of speech inputs is received from a first user. A voice model is obtained based on the plurality of speech inputs. A user input is received from the first user, the user input corresponding to a request to provide access to the voice model. The voice model is provided to a second electronic device.

Type: Grant

Filed: October 25, 2022

Date of Patent: December 17, 2024

Assignee: Apple Inc.

Inventors: Qiong Hu, Jiangchuan Li, David A. Winarsky
Predicting parametric vocoder parameters from prosodic features

Patent number: 12125469

Abstract: A method for predicting parametric vocoder parameter includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification.

Type: Grant

Filed: October 17, 2023

Date of Patent: October 22, 2024

Assignee: Google LLC

Inventors: Rakesh Iyer, Vincent Wan
Systems and methods for grapheme-phoneme correspondence learning

Patent number: 12008921

Abstract: Systems and methods are described for grapheme-phoneme correspondence learning. In an example, a display of a device is caused to output a grapheme graphical user interface (GUI) that includes a grapheme. Audio data representative of a sound made by the human user is received based on the grapheme shown on the display. A grapheme-phoneme model can determine whether the sound made by the human corresponds to a phoneme for the displayed grapheme based on the audio data. The grapheme-phoneme model is trained based on augmented spectrogram data. A speaker is caused to output a sound representative of the phoneme for the grapheme to provide the human with a correct pronunciation of the grapheme in response to the grapheme-phoneme model determining that the sound made by the human does not correspond to the phoneme for the grapheme.

Type: Grant

Filed: January 10, 2023

Date of Patent: June 11, 2024

Assignee: 617 Education Inc.

Inventor: Tom Dillon
Consistency prediction on streaming sequence models

Patent number: 11929060

Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs each including a non-synthetic speech representation and a synthetic speech representation of a same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model.

Type: Grant

Filed: February 8, 2021

Date of Patent: March 12, 2024

Assignee: Google LLC

Inventors: Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Jose Moreno Mengibar
Methods and systems implementing phonologically-trained computer-assisted hearing aids

Patent number: 11856369

Abstract: A hearing aid system presents a hearing impaired user with customized enhanced intelligibility sound in a preferred language. The system includes a model trained with a set of source speech data representing sampling from a speech population relevant to the user. The model is also trained with a set of corresponding alternative articulation of source data, pre-defined or algorithmically constructed during an interactive session with the user. The model creates a set of selected target speech training data from the set of alternative articulation data that is preferred by the user as being satisfactorily intelligible and clear. The system includes a machine learning model, trained to shift incoming source speech data to a preferred variant of the target data that the hearing aid system presents to the user.

Type: Grant

Filed: May 2, 2021

Date of Patent: December 26, 2023

Inventor: Abbas Rafii
Human emotion detection

Patent number: 11847419

Abstract: Disclosed herein are system, method, and computer program product embodiments for recognizing a human emotion in a message. An embodiment operates by receiving a message from a user. The embodiment labels each word of the message with a part of speech (POS) thereby creating a POS set. The embodiment determines an incongruity score for a combination of words in the POS set using a knowledgebase. The embodiment determines a preliminary emotion detection score for an emotion for the message based on the POS set. Finally, the embodiment calculates a final emotion detection score for the emotion for the message based on the preliminary emotion detection score and the incongruity score.

Type: Grant

Filed: October 1, 2021

Date of Patent: December 19, 2023

Assignee: VIRTUAL EMOTION RESOURCE NETWORK, LLC

Inventors: Craig Tucker, Bryan Novak
Predicting parametric vocoder parameters from prosodic features

Patent number: 11830474

Abstract: A method for predicting parametric vocoder parameter includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification.

Type: Grant

Filed: January 6, 2022

Date of Patent: November 28, 2023

Assignee: Google LLC

Inventors: Rakesh Iyer, Vincent Wan
Artificial intelligence device and method for generating speech having a different speech style

Patent number: 11721319

Abstract: An artificial intelligence device includes a memory and a processor. The memory is configured to store audio data having a predetermined speech style. The processor is configured to generate a condition vector relating to a condition for determining the speech style of the audio data, reduce a dimension of the condition vector to a predetermined reduction dimension, acquire a sparse code vector based on a dictionary vector acquired through sparse dictionary coding with respect to the condition vector having the predetermined reduction dimension, and change a vector element value included in the sparse code vector.

Type: Grant

Filed: February 27, 2020

Date of Patent: August 8, 2023

Assignee: LG ELECTRONICS INC.

Inventors: Minook Kim, Yongchul Park, Sungmin Han, Siyoung Yang, Sangki Kim, Juyeong Jang
Methods and systems for querying for parameter retrieval

Patent number: 11676496

Abstract: Systems and methods to identify a query parameter in an incoming flight voice or data communication to respond to a request. A processing system configured to: in response to receipt of a clearance message, decode the clearance message to determine whether the clearance message contains a command instruction or clearance data for a flight, and to present the command instruction to a pilot as notice to execute the command instruction or if available, obtain at least one query parameter from the clearance data to configure in a query operation to present in response to a pilot question about the command instruction. In response to receipt of the voice or data communication, determine further an intent within the voice or data communication of a question or instruction voiced by applying an acoustic model for tagging identified parts about the question or instruction voiced with query parameters in response to the pilot.

Type: Grant

Filed: June 2, 2020

Date of Patent: June 13, 2023

Assignee: HONEYWELL INTERNATIONAL INC.

Inventors: Hariharan Saptharishi, Gobinathan Baladhandapani, Mahesh Kumar Sampath, Sivakumar Kanagarajan
Scalable ground truth disambiguation

Patent number: 11657104

Abstract: Methods, computer program products, and systems are presented. The methods include, for instance: obtaining an utterance input from a user agent, and collecting context data of the utterance input. A context tag is generated based on the context data, and one or more ground truth having respective utterance semantically identical to the utterance input is selected. Semantical relationship between the context tag and an intent of the selected ground truth is examined and the selected ground truth is updated with the context tag.

Type: Grant

Filed: October 21, 2019

Date of Patent: May 23, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Faheem Altaf, Lisa Seacat Deluca, Raghuram Srinivas
System and/or method for semantic parsing of air traffic control audio

Patent number: 11521616

Abstract: The method S200 can include: at an aircraft, receiving an audio utterance from air traffic control S210, converting the audio utterance to text, determining commands from the text using a question-and-answer model S240, and optionally controlling the aircraft based on the commands S250. The method functions to automatically interpret flight commands from the air traffic control (ATC) stream.

Type: Grant

Filed: April 13, 2022

Date of Patent: December 6, 2022

Assignee: Merlin Labs, Inc.

Inventors: Michael Pust, Joseph Bondaryk, Matthew George
Two-level speech prosody transfer

Patent number: 11514888

Abstract: A method includes receiving an input text utterance to be synthesized into expressive speech having an intended prosody and a target voice and generating, using a first text-to-speech (TTS) model, an intermediate synthesized speech representation tor the input text utterance. The intermediate synthesized speech representation possesses the intended prosody. The method also includes providing the intermediate synthesized speech representation to a second TTS model that includes an encoder portion and a decoder portion. The encoder portion is configured to encode the intermediate synthesized speech representation into an utterance embedding that specifies the intended prosody. The decoder portion is configured to process the input text utterance and the utterance embedding to generate an output audio signal of expressive speech that has the intended prosody specified by the utterance embedding and speaker characteristics of the target voice.

Type: Grant

Filed: August 13, 2020

Date of Patent: November 29, 2022

Assignee: Google LLC

Inventors: Lev Finkelstein, Chun-An Chan, Byungha Chun, Ye Jia, Yu Zhang, Robert Andrew James Clark, Vincent Wan
Personalized voices for text messaging

Patent number: 11508380

Abstract: Systems and processes for operating an intelligent automated assistant are provided. In one example, a plurality of speech inputs is received from a first user. A voice model is obtained based on the plurality of speech inputs. A user input is received from the first user, the user input corresponding to a request to provide access to the voice model. The voice model is provided to a second electronic device.

Type: Grant

Filed: May 26, 2020

Date of Patent: November 22, 2022

Assignee: Apple Inc.

Inventors: Qiong Hu, Jiangchuan Li, David A. Winarsky
Electronic device providing response corresponding to user conversation style and emotion and method of operating same

Patent number: 11430438

Abstract: An electronic device includes a microphone, a communication circuit, and a processor configured to obtain a user's utterance through the microphone, transmit first information about the utterance through the communication circuit to an external server for at least partially automatic speech recognition (ASR) or natural language understanding (NLU), obtain a second text from the external server through the communication circuit, the second text being a text resulting from modifying at least part of a first text included in a neutral response to the utterance based on parameters corresponding to the user's conversation style and emotion identified based on the first information, and provide a voice corresponding to the second text or a message including the second text in response to the utterance.

Type: Grant

Filed: March 11, 2020

Date of Patent: August 30, 2022

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Piotr Andruszkiewicz, Tomasz Latkowski, Kamil Herba, Maciej Pienkosz, Iryna Orlova, Jakub Staniszewski, Krystian Koziel
Method and system for processing comment information

Patent number: 11417341

Abstract: Techniques for processing comment information are disclosed herein. The disclosed techniques include collecting first voice information from a user in response to receiving a request for inputting voice information while the user is watching a video comprising a plurality of segments; obtaining a timestamp corresponding to a segment among the plurality of segments of the video; processing the first voice information and obtaining second voice information; and generating bullet screen information based at least in part on the timestamp and the second voice information.

Type: Grant

Filed: February 25, 2020

Date of Patent: August 16, 2022

Assignee: SHANGHAI BILIBILI TECHNOLOGY CO., LTD.

Inventor: Yue Zhu
Speech synthesis method and apparatus

Patent number: 11404045

Abstract: A speech synthesis method performed by an electronic apparatus to synthesize speech from text and includes: obtaining text input to the electronic apparatus; obtaining a text representation by encoding the text using a text encoder of the electronic apparatus; obtaining an audio representation of a first audio frame set from an audio encoder of the electronic apparatus, based on the text representation; obtaining an audio representation of a second audio frame set based on the text representation and the audio representation of the first audio frame set; obtaining an audio feature of the second audio frame set by decoding the audio representation of the second audio frame set; and synthesizing speech based on an audio feature of the first audio frame set and the audio feature of the second audio frame set.

Type: Grant

Filed: August 31, 2020

Date of Patent: August 2, 2022

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Seungdo Choi, Kyoungbo Min, Sangjun Park, Kihyun Choo
Audio processing

Patent number: 11363377

Abstract: An audio processing method comprises: for each given input digital audio signal of a set of two or more input digital audio signals, detecting a correlation between the given input digital audio signal and others of the input digital audio signals; generating a gain adjustment for application to the given input digital audio signal in dependence upon the detected correlation; applying the gain adjustment to the given input digital audio signal to generate a respective gain-adjusted input digital audio signal; and combining the set of gain-adjusted input digital audio signals to generate an output digital audio signal.

Type: Grant

Filed: October 12, 2018

Date of Patent: June 14, 2022

Assignee: SONY EUROPE B.V.

Inventors: Emmanuel Deruty, Stéphane Rivaud
Unsupervised keyword spotting and word discovery for fraud analytics

Patent number: 11355103

Abstract: Embodiments described herein provide for a computer that detects one or more keywords of interest using acoustic features, to detect or query commonalities across multiple fraud calls. Embodiments described herein may implement unsupervised keyword spotting (UKWS) or unsupervised word discovery (UWD) in order to identify commonalities across a set of calls, where both UKWS and UWD employ Gaussian Mixture Models (GMM) and one or more dynamic time-warping algorithms. A user may indicate a training exemplar or occurrence of call-specific information, referred to herein as “a named entity,” such as a person's name, an account number, account balance, or order number. The computer may perform a redaction process that computationally nullifies the import of the named entity in the modeling processes described herein.

Type: Grant

Filed: January 28, 2020

Date of Patent: June 7, 2022

Assignee: PINDROP SECURITY, INC.

Inventor: Hrishikesh Rao
Systems and methods for improving animation of computer-generated avatars

Patent number: 11270487

Abstract: The disclosed computer-implemented method may include identifying a muscle group engaged by a user to execute a predefined body action by (1) capturing a set of images of the user while the user executes the predefined body action, and (2) associating a feature of a body of the user with the muscle group based on the predefined body action and the set of images. The method may also include determining, based on the set of images, a set of parameters associated with the user, the muscle group, and the predefined body action. The method may also include directing a computer-generated avatar that represents the body of the user to produce the predefined body action in accordance with the set of parameters. Various other methods, systems, and computer-readable media are also disclosed.

Type: Grant

Filed: September 17, 2019

Date of Patent: March 8, 2022

Assignee: Facebook Technologies, LLC

Inventors: William Arthur Hugh Steptoe, Michael Andrew Howard, Melinda Ozel, Giovanni F. Nakpil, Timothy Naylor
Home appliance, control system by voice recognition and operating method of the same

Patent number: 10978067

Abstract: A home appliance including an audio input unit including at least one microphone and into which a voice command formed of natural language is inputted; a communication unit which transmits the voice command to a voice recognition system as voice data, and receives a response signal from the voice recognition system; a controller which sets an operation corresponding to the response signal, outputs an operation state, and outputs a guidance announcement, according to a result of voice recognition of the voice recognition system; and an audio output unit which outputs the guidance announcement corresponding to the operation, wherein the controller determines whether it is possible to set the operation or to support a function for the operation in response to the response signal and an operation state of the home appliance, and performs the operation.

Type: Grant

Filed: January 8, 2019

Date of Patent: April 13, 2021

Assignee: LG ELECTRONICS INC.

Inventors: Junhee An, Hyojeong Kang, Jaehoon Lee, Heungkyu Lee
Systems and methods for identifying and storing a portion of a media asset

Patent number: 10979762

Abstract: Systems and methods are described herein for a media guidance application that can cause a specific portion of a media asset to be stored based on a user command. For example, if the user requests the closing scene from a given movie, the media guidance application may detect the command, determine that it comprises an instruction to store a portion of a media asset, identify a source of the portion of the media asset, and cause the portion of the media asset to be stored. The media guidance application may also cause the entirety of the media asset to be stored and initiate playback at the start of the requested portion. This may allow users to store and watch portions of particular interest without requiring that the users seek through the entire media asset on their own.

Type: Grant

Filed: July 22, 2019

Date of Patent: April 13, 2021

Assignee: Rovi Guides, Inc.

Inventors: Paul Maltar, Milan Patel, Yong Gong
Text-to-speech (TTS) processing

Patent number: 10699695

Abstract: During text-to-speech processing, audio data corresponding to a word part, word, or group of words is generated using a trained model and used by a unit selection engine to create output audio. The audio data is generated at least when an input word is unrecognized or when a cost of a unit selection is too high.

Type: Grant

Filed: June 29, 2018

Date of Patent: June 30, 2020

Assignee: Amazon Washington, Inc.

Inventors: Adam Franciszek Nadolski, Daniel Korzekwa, Thomas Edward Merritt, Marco Nicolis, Bartosz Putrycz, Roberto Barra Chicote, Rafal Kuklinski, Wiktor Dolecki
Phonetic patterns for fuzzy matching in natural language processing

Patent number: 10546062

Abstract: A token is extracted from a Natural Language input. A phonetic pattern is computed corresponding to the token, the phonetic pattern including a sound pattern that represents a part of the token when the token is spoken. New data is created from data of the phonetic pattern, the new data including a syllable sequence corresponding to the phonetic pattern. A state of a data storage device is changed by storing the new data in a matrix of syllable sequences corresponding to the token. An option is selected that corresponds to the token by executing a fuzzy matching algorithm using a processor and a memory, the selecting of the option is based on a syllable sequence in the matrix.

Type: Grant

Filed: November 15, 2017

Date of Patent: January 28, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Sean M. Fuoco, John M. Ganci, Jr., Craig M. Trim, Jie Zeng
Method and apparatus for updating a previously generated text

Patent number: 10467333

Abstract: Methods, apparatuses, and computer program products are described herein that are configured to enable updating of an output text. In some example embodiments, a method is provided that comprises generating a new message for each updateable data element based on a predetermined indication. The method of this embodiment may also include determining a classification for each new message by comparing each new message with a corresponding message that describes the updateable data element. The method of this embodiment may also include generating an additional document plan tree that contains at least a portion of the new messages. The method of this embodiment may also include combining the additional document plan tree with an original document plan tree.

Type: Grant

Filed: April 7, 2016

Date of Patent: November 5, 2019

Assignee: ARRIA DATA2TEXT LIMITED

Inventors: Alasdair James Logan, Ehud Baruch Reiter
Optimized decision tree based models

Patent number: 10339465

Abstract: During a training phase of a machine learning model, representations of at least some nodes of a decision tree are generated and stored on persistent storage in depth-first order. A respective predictive utility metric (PUM) value is determined for one or more nodes, indicating expected contributions of the nodes to a prediction of the model. A particular node is selected for removal from the tree based at least partly on its PUM value. A modified version of the tree, with the particular node removed, is stored for obtaining a prediction.

Type: Grant

Filed: August 19, 2014

Date of Patent: July 2, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Robert Matthias Steele, Tarun Agarwal, Leo Parker Dirac, Jun Qian
Text-to-speech synthesis using an autoencoder

Patent number: 10249289

Abstract: Methods, systems, and computer-readable media for text-to-speech synthesis using an autoencoder. In some implementations, data indicating a text for text-to-speech synthesis is obtained. Data indicating a linguistic unit of the text is provided as input to an encoder. The encoder is configured to output speech unit representations indicative of acoustic characteristics based on linguistic information. A speech unit representation that the encoder outputs is received. A speech unit is selected to represent the linguistic unit, the speech unit being selected from among a collection of speech units based on the speech unit representation output by the encoder. Audio data for a synthesized utterance of the text that includes the selected speech unit is provided.

Type: Grant

Filed: July 13, 2017

Date of Patent: April 2, 2019

Assignee: Google LLC

Inventors: Byung Ha Chun, Javier Gonzalvo, Chun-an Chan, Ioannis Agiomyrgiannakis, Vincent Ping Leung Wan, Robert Andrew James Clark, Jakub Vit
Robust crosstalk cancellation using a speaker array

Patent number: 9756446

Abstract: An audio receiver that performs crosstalk cancellation using a speaker array is described. The audio receiver detects the location of a listener in a room and processes a piece of sound program content to be output through the speaker array using one or more beam pattern matrices. The beam pattern matrices are generated according to one or more constraints. The constraints may include increasing a right channel and decreasing a left channel at the right ear of the listener, increasing a left channel and decreasing a right channel at the left ear of the listener, and decreasing sound in all other areas of the room. These constraints cause the audio receiver to beam sound primarily towards the listener and not in other areas of the room such that crosstalk cancellation is achieved with minimal effects due to changes to the frequency response of the room. Other embodiments are also described.

Type: Grant

Filed: March 13, 2014

Date of Patent: September 5, 2017

Assignee: Apple Inc.

Inventors: Martin E. Johnson, Ronald N. Isaac
Discriminating synonymous expressions using images

Patent number: 9436891

Abstract: A method for identifying synonymous expressions includes determining synonymous expression candidates for a target expression. A plurality of target images related to the target expression and a plurality of candidate images related to each of the synonymous expression candidates are identified. Features extracted from the plurality of target images are compared with features extracted from the plurality of candidate images using a processor to identify a synonymous expression of the target expression.

Type: Grant

Filed: July 30, 2013

Date of Patent: September 6, 2016

Assignee: GlobalFoundries, Inc.

Inventors: Chiaki Oishi, Tetsuya Nasukawa, Shoko Suzuki
Crowd-sourcing pronunciation corrections in text-to-speech engines

Patent number: 9275633

Abstract: Technologies are described herein for providing validated text-to-speech correction hints from aggregated pronunciation corrections received from text-to-speech applications. A number of pronunciation corrections are received by a Web service. The pronunciation corrections may be provided by users of text-to-speech applications executing on a variety of user computer systems. Each of the plurality of pronunciation corrections includes a specification of a word or phrase and a suggested pronunciation provided by the user. The pronunciation corrections are analyzed to generate validated correction hints, and the validated correction hints are provided back to the text-to-speech applications to be used to correct pronunciation of words and phrases in the text-to-speech applications.

Type: Grant

Filed: January 9, 2012

Date of Patent: March 1, 2016

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jeremy Edward Cath, Timothy Edwin Harris, James Oliver Tisdale, III
Information processing apparatus, conference system, and information processing method

Patent number: 9195312

Abstract: An information processing apparatus includes a voice acquiring unit configured to acquire voice information remarked by a user; an image information acquiring unit configured to acquire image information of the user; a user specifying unit configured to specify the user having remarked the voice information based on the image information when the voice information is acquired; a word extracting unit configured to extract a word from the voice information; a priority calculating unit configured to increase priority of the user having remarked the word for operation of a display device when the extracted word matches any one of keywords; a gesture detecting unit configured to detect a gesture made by the user based on the image information; and an operation permission determining unit configured to permit the user to operate the display device when the priority of the user having made the gesture is highest.

Type: Grant

Filed: June 4, 2013

Date of Patent: November 24, 2015

Assignee: RICOH COMPANY, LIMITED

Inventors: Akinori Itoh, Hidekuni Annaka
Apparatus and a method for controlling a sound field

Patent number: 9154880

Abstract: A first filter coefficient and a second filter coefficient are calculated by using spatial transfer characteristics from a first speaker and a second speaker to a first control point and a second control point, and a first sound increase ratio na at the first control point and a second sound increase ratio nb at the second control point, so that, when the first filter coefficient is a through characteristic, a first composite sound pressure from the first speaker and the second speaker to the first control point is na times a first sound pressure from the first speaker to the first control point, and a second composite sound pressure from the first speaker and the second speaker to the second control point is nb times a second sound pressure from the first speaker to the second control point.

Type: Grant

Filed: June 19, 2013

Date of Patent: October 6, 2015

Assignee: Kabushiki Kaisha Toshiba

Inventors: Takahiro Hiruma, Akihiko Enamito, Osamu Nishimura
Aural skimming and scrolling

Patent number: 9087507

Abstract: Computer-based skimming and scrolling of aurally presented information is described. Different levels of skimming are achieved in aural presentations with allowing a user to navigate an aural presentation according to significant points identified within an information source. The significant points are identified using various indicia that suggest logical arrangements for the information contained within the source, such as semantics, syntax, typography, formatting, named entities, and markup tags. The identified significant points signal changes in playback mode for the audio presentation, such as different tones, pitches, volumes, or voices. Similar indicia may be used to generate identifying markers from the information source that can be aurally presented in lieu of the information source itself to allow for aural scrolling of the information.

Type: Grant

Filed: November 15, 2006

Date of Patent: July 21, 2015

Assignee: Yahoo! Inc.

Inventor: Srinivasan H. Sengamedu
Speech recognition and synthesis utilizing context dependent acoustic models containing decision trees

Patent number: 9043213

Abstract: A speech recognition method including the steps of receiving a speech input from a known speaker of a sequence of observations and determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model. The acoustic model has a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation and has been trained using first training data and adapted using second training data to said speaker. The speech recognition method also determines the likelihood of a sequence of observations occurring in a given language using a language model and combines the likelihoods determined by the acoustic model and the language model and outputs a sequence of words identified from said speech input signal. The acoustic model is context based for the speaker, the context based information being contained in the model using a plurality of decision trees and the structure of the decision trees is based on second training data.

Type: Grant

Filed: January 26, 2011

Date of Patent: May 26, 2015

Assignee: Kabushiki Kaisha Toshiba

Inventor: Byung Ha Chun
Method and System for Non-Parametric Voice Conversion

Publication number: 20150127350

Abstract: A method and system is disclosed for non-parametric speech conversion. A text-to-speech (TTS) synthesis system may include hidden Markov model (HMM) HMM based speech modeling for both synthesizing output speech. A converted HMM may be initially set to a source HMM trained with a voice of a source speaker. A parametric representation of speech may be extract from speech of a target speaker to generate a set of target-speaker vectors. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the source HMM to a target-speaker vector. The HMM states of the converted HMM may be replaced with the matched target-speaker vectors. Transforms may be applied to further adapt the converted HMM to the voice of target speaker. The converted HMM may be used to synthesize speech with voice characteristics of the target speaker.

Type: Application

Filed: November 1, 2013

Publication date: May 7, 2015

Applicant: Google Inc.

Inventor: Ioannis Agiomyrgiannakis
Method and System for Cross-Lingual Voice Conversion

Publication number: 20150127349

Abstract: A method and system for is disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) HMM based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to a HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis.

Type: Application

Filed: November 1, 2013

Publication date: May 7, 2015

Applicant: Google Inc.

Inventor: Ioannis Agiomyrgiannakis
Detecting barge-in in a speech dialogue system

Patent number: 9026438

Abstract: A method for detecting barge-in in a speech dialog system comprising determining whether a speech prompt is output by the speech dialog system, and detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and/or based on speaker information, where the sensitivity threshold is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined. If speech activity is detected in the input signal, the speech prompt may be interrupted or faded out. A speech dialog system configured to detect barge-in is also disclosed.

Type: Grant

Filed: March 31, 2009

Date of Patent: May 5, 2015

Assignee: Nuance Communications, Inc.

Inventors: Markus Buck, Franz Gerl, Tim Haulick, Tobias Herbig, Gerhard Uwe Schmidt, Matthias Schulz
Text-to-speech user's voice cooperative server for instant messaging clients

Patent number: 9026445

Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.

Type: Grant

Filed: March 20, 2013

Date of Patent: May 5, 2015

Assignee: Nuance Communications, Inc.

Inventors: Terry Wade Niemeyer, Liliana Orozco
Audio signal processing method and device

Patent number: 9020812

Abstract: Disclosed is an audio signal processing method comprising the steps of: receiving an audio signal containing current frame data; generating a first temporary output signal for the current frame when an error occurs in the current frame data, by carrying out frame error concealment with respect to the current frame data a random codebook; generating a parameter by carrying out one or more of short-term prediction, long-term prediction and a fixed codebook search based on the first temporary output signal; and memory updating the parameter for the next frame; wherein the parameter comprises one or more of pitch gain, pitch delay, fixed codebook gain and a fixed codebook.

Type: Grant

Filed: November 24, 2010

Date of Patent: April 28, 2015

Assignees: LG Electronics Inc., Industry-Academic Cooperation Foundation, Yonsei University

Inventors: Hye Jeong Jeon, Dae Hwan Kim, Hong Goo Kang, Min Ki Lee, Byung Suk Lee, Gyu Hyeok Jeong
Speech synthesis apparatus and method

Patent number: 9002711

Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.

Type: Grant

Filed: December 16, 2010

Date of Patent: April 7, 2015

Assignee: Kabushiki Kaisha Toshiba

Inventors: Ryo Morinaka, Takehiko Kagoshima
WIDEBAND SPEECH PARAMETERIZATION FOR HIGH QUALITY SYNTHESIS, TRANSFORMATION AND QUANTIZATION

Publication number: 20150095035

Abstract: A method for speech parameterization and coding of a continuous speech signal. The method comprises dividing said speech signal into a plurality of speech frames, and for each one of the plurality of speech frames, modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameters, reconstructing an estimated frame signal from the plurality of harmonic model parameters, subtracting the estimated frame signal from the speech frame to produce a harmonic model residual, performing at least one second harmonic modeling analysis on the first harmonic model residual to determine at least one set of second harmonic model components, removing the at least one set of second harmonic model components from the first harmonic model residual to produce a harmonically-filtered residual signal, and processing the harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains.

Type: Application

Filed: September 30, 2013

Publication date: April 2, 2015

Applicant: International Business Machines Corporation

Inventor: Slava Shechtman
Transforming components of a web page to voice prompts

Patent number: 8996384

Abstract: Embodiments of the invention address the deficiencies of the prior art by providing a method, apparatus, and program product to of converting components of a web page to voice prompts for a user. In some embodiments, the method comprises selectively determining at least one HTML component from a plurality of HTML components of a web page to transform into a voice prompt for a mobile system based upon a voice attribute file associated with the web page. The method further comprises transforming the at least one HTML component into parameterized data suitable for use by the mobile system based upon at least a portion of the voice attribute file associated with the at least one HTML component and transmitting the parameterized data to the mobile system.

Type: Grant

Filed: October 30, 2009

Date of Patent: March 31, 2015

Assignee: Vocollect, Inc.

Inventors: Paul M. Funyak, Norman J. Connors, Paul E. Kolonay, Matthew Aaron Nichols
Providing text to speech from digital content on an electronic device

Patent number: 8990087

Abstract: A method for providing text to speech from digital content in an electronic device is described. Digital content including a plurality of words and a pronunciation database is received. Pronunciation instructions are determined for the word using the digital content. Audio or speech is played for the word using the pronunciation instructions. As a result, the method provides text to speech on the electronic device based on the digital content.

Type: Grant

Filed: September 30, 2008

Date of Patent: March 24, 2015

Assignee: Amazon Technologies, Inc.

Inventors: John Lattyak, John T. Kim, Robert Wai-Chi Chu, Laurent An Minh Nguyen
Method and system for enhancing a speech database

Patent number: 8977552

Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.

Type: Grant

Filed: May 28, 2014

Date of Patent: March 10, 2015

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Alistair D. Conkie, Ann K. Syrdal
Systems and methods for document narration with multiple characters having multiple moods

Patent number: 8954328

Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for providing a plurality of characters at least some of the characters having multiple associated moods for use in document narration.

Type: Grant

Filed: January 14, 2010

Date of Patent: February 10, 2015

Assignee: K-NFB Reading Technology, Inc.

Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
Strained-rough-voice conversion device, voice conversion device, voice synthesis device, voice conversion method, voice synthesis method, and program

Patent number: 8898062

Abstract: A strained-rough-voice conversion unit (10) is included in a voice conversion device that can generate a “strained rough” voice produced in a part of a speech when speaking forcefully with excitement, nervousness, anger, or emphasis and thereby richly express vocal expression such as anger, excitement, or an animated or lively way of speaking, using voice quality change. The strained-rough-voice conversion unit (10) includes: a strained phoneme position designation unit (11) designating a phoneme to be uttered as a “strained rough” voice in a speech; and an amplitude modulation unit (14) performing modulation including periodic amplitude fluctuation on a speech waveform.

Type: Grant

Filed: January 22, 2008

Date of Patent: November 25, 2014

Assignee: Panasonic Intellectual Property Corporation of America

Inventors: Yumiko Kato, Takahiro Kamai
Interactive environment for performing arts scripts

Patent number: 8888494

Abstract: One or more embodiments present a script to a user in an interactive script environment. A digital representation of a manuscript is analyzed. This digital representation includes a set of roles and a set of information associated with each role in the set of roles. An active role in the set of roles that is associated with a given user is identified based on the analyzing. At least a portion of the manuscript is presented to the given user via a user interface. The portion includes at least a subset of information in the set of information. Information within the set of information that is associated with the active role is presented in a visually different manner than information within the set of information that is associated with a non-active role, which is a role that is associated with a user other than the given user.

Type: Grant

Filed: June 27, 2011

Date of Patent: November 18, 2014

Inventor: Randall Lee Threewits
Recognition dictionary creation device and voice recognition device

Patent number: 8868431

Abstract: A recognition dictionary creation device identifies the language of a reading of an inputted text which is a target to be registered and adds a reading with phonemes in the language identified thereby to the target text to be registered, and also converts the reading of the target text to be registered from the phonemes in the language identified thereby to phonemes in a language to be recognized which is handled in voice recognition to create a recognition dictionary in which the converted reading of the target text to be registered is registered.

Type: Grant

Filed: February 5, 2010

Date of Patent: October 21, 2014

Assignee: Mitsubishi Electric Corporation

Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
Receiver intelligibility enhancement system

Patent number: 8868418

Abstract: Embodiments of the invention provide a communication device and methods for enhancing audio signals. A first audio signal buffer and a second audio signal buffer are acquired. Thereafter, the magnitude spectrum calculated from the Fast Fourier Transform (FFT) of the second audio signal is processed based on the Linear Predictive Coding (LPC) spectrum of the first audio signal to generate an enhanced second audio signal.

Type: Grant

Filed: November 20, 2010

Date of Patent: October 21, 2014

Inventors: Alon Konchitsky, Sandeep Kulakcherla, Alberto D. Berstein
System and method for controlling access to resources with a spoken CAPTCHA test

Patent number: 8868423

Abstract: Systems and methods for controlling access to resources using spoken Completely Automatic Public Turing Tests To Tell Humans And Computers Apart (CAPTCHA) tests are disclosed. In these systems and methods, entities seeking access to resources are required to produce an input utterance that contains at least some audio. That utterance is compared with voice reference data for human and machine entities, and a determination is made as to whether the entity requesting access is a human or a machine. Access is then permitted or refused based on that determination.

Type: Grant

Filed: July 11, 2013

Date of Patent: October 21, 2014

Assignee: John Nicholas and Kristin Gross Trust

Inventor: John Nicholas Gross
Training and applying prosody models

Patent number: 8856008

Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.

Type: Grant

Filed: September 18, 2013

Date of Patent: October 7, 2014

Assignee: Morphism LLC

Inventor: James H. Stephens, Jr.

1 2 3 4 5 … next