Synthesis Patents (Class 704/258)
  • Patent number: 9792917
    Abstract: The present invention relates to an audio processing device and method, and an electro-acoustic converting device and method. The audio processing device comprises: a first receiving module, configured to receive a first audio signal; a second receiving module, configured to receive a second audio signal; an audio synthesizing module, configured to synthesize the first audio signal and the second audio signal to obtain a third audio signal; and an audio outputting module, configured to output the third audio signal. According to the present invention, when enjoying songs or music or learning languages by using an audio processing device or an audio playing device, a user is capable of hearing his or her own voice while singing or reading, which greatly improves the effects of self-entertainment and language learning.
    Type: Grant
    Filed: May 8, 2014
    Date of Patent: October 17, 2017
    Assignee: KT MICRO, INC.
    Inventors: Dianyu Chen, Yihai Xiang, Haiqing Lin, Pan Mu, Yanqing Wu, Hekai Kang, Dongfeng Zhou, Yuqiang Yuan, Wenhui Yuan, Jinfeng Wu
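The core mixing step this abstract describes (summing a received playback signal with the user's microphone signal to produce a third output signal) can be sketched in a few lines. The gain values and hard clipping below are illustrative assumptions, not details from the patent:

```python
# Minimal sketch of the synthesis step: combine a playback (music) signal with
# a microphone (voice) signal into one output signal. Gains and the clipping
# strategy are assumptions for illustration.

def mix_signals(first, second, gain1=0.7, gain2=0.7):
    """Sum two equal-length sample sequences (floats in [-1, 1]),
    clamping the result to the valid range."""
    if len(first) != len(second):
        raise ValueError("signals must be the same length")
    return [max(-1.0, min(1.0, gain1 * a + gain2 * b))
            for a, b in zip(first, second)]

music = [0.5, -0.5, 0.9]
voice = [0.5, 0.5, 0.9]
mixed = mix_signals(music, voice)  # third sample clips at 1.0
```

A real implementation would operate on audio buffers at the sample rate of the converter, but the signal flow is the same.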
  • Patent number: 9785441
    Abstract: A computer processor that operates on distinct first and second instruction streams that have a predefined timed semantic relationship. At least one of the first and second instruction streams includes variable-length instructions having a header and associated bundle bounded by a head end and a tail end. An alignment hole within the bundle encodes information representing at least one nop operation. The computer processor includes first and second multi-stage instruction processing components configured to process in parallel the first and second instruction streams. At least one of the first and second multi-stage instruction processing components includes an instruction buffer operably coupled to a decode stage. The decode stage is configured to process a variable-length instruction by isolating and interpreting the alignment hole of the variable length instruction in order to initiate zero or more nop operations that follow the timed semantic relationship between the first and second instruction streams.
    Type: Grant
    Filed: May 29, 2014
    Date of Patent: October 10, 2017
    Assignee: Mill Computing, Inc.
    Inventors: Roger Rawson Godard, Arthur David Kahlich, David Arthur Yost
  • Patent number: 9747892
    Abstract: Speech is modeled as a cognitively-driven sensory-motor activity where the form of speech is the result of categorization processes that any given subject recreates by focusing on creating sound patterns that are represented by syllables. These syllables are then combined in characteristic patterns to form words, which are in turn, combined in characteristic patterns to form utterances. A speech recognition process first identifies syllables in an electronic waveform representing ongoing speech. The pattern of syllables is then deconstructed into a standard form that is used to identify words. The words are then concatenated to identify an utterance. Similarly, a speech synthesis process converts written words into patterns of syllables. The pattern of syllables is then processed to produce the characteristic rhythmic sound of naturally spoken words. The words are then assembled into an utterance which is also processed to produce a natural sounding speech.
    Type: Grant
    Filed: October 3, 2016
    Date of Patent: August 29, 2017
    Inventor: Boris Fridman-Mintz
  • Patent number: 9741012
    Abstract: Embodiments include a computer-implemented management platform for securely generating tracking codes, and for verifiably imprinting those tracking codes onto physical articles. In an embodiment, one or more hardware processors generate tracking codes and send them to an automated, computer-controlled production line, which physically imprints each tracking code onto a corresponding article and physically verifies the imprinting. If a tracking code was correctly imprinted on its corresponding article, one or more records are recorded in a durable storage medium indicating that the tracking code was imprinted on the article. If a tracking code was incorrectly imprinted on its corresponding article, the factory line physically rejects the corresponding article. Embodiments also include the computer-implemented management platform securely managing those physical articles throughout their lifecycle, based on the securely generated and verifiably imprinted tracking codes.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: August 22, 2017
    Assignee: HURU SYSTEMS LTD.
    Inventors: Ian A. Nazzari, Paul Eipper
  • Patent number: 9734817
    Abstract: To prioritize the processing of text-to-speech (TTS) tasks, a TTS system may determine, for each task, the amount of time before the task reaches underrun, that is, the time before the synthesized speech output to a user catches up with the speech generated since the TTS task originated. The TTS system may also prioritize tasks to reduce the amount of time between when a user submits a TTS request and when results are delivered to the user. When prioritizing tasks, such as when allocating resources to existing tasks or accepting new tasks, the TTS system may prioritize tasks with the lowest amount of time prior to underrun and/or tasks with the longest time prior to delivery of first results.
    Type: Grant
    Filed: March 21, 2014
    Date of Patent: August 15, 2017
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventor: Bartosz Putrycz
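The scheduling rule in this abstract reduces to a simple computation: for each task, the audio synthesized so far minus the audio already played back is the margin before underrun, and the task with the smallest margin gets resources first. The field names and scoring below are assumptions for illustration:

```python
# Rough illustration of underrun-based TTS task prioritization. A task records
# when its playback started and how many seconds of audio have been
# synthesized; the task closest to running dry is served first.

def time_to_underrun(task, now):
    """Seconds of synthesized audio remaining ahead of playback."""
    played = now - task["playback_start"]
    return task["audio_ready"] - played

def next_task(tasks, now):
    """Allocate resources to the task closest to underrun."""
    return min(tasks, key=lambda t: time_to_underrun(t, now))

tasks = [
    {"id": "a", "playback_start": 0.0, "audio_ready": 5.0},
    {"id": "b", "playback_start": 1.0, "audio_ready": 3.5},
]
urgent = next_task(tasks, now=3.0)  # "b" has 1.5 s of margin vs 2.0 s for "a"
```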
  • Patent number: 9721563
    Abstract: A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc. The extended phonetic dictionary can be updated in response to changes in the contacts database, and the set of pronunciation guessers can include pronunciation guessers for a plurality of locales, each locale having its own pronunciation guesser.
    Type: Grant
    Filed: June 8, 2012
    Date of Patent: August 1, 2017
    Assignee: Apple Inc.
    Inventor: Devang K. Naik
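The extended-dictionary idea can be sketched as running contact names through per-locale pronunciation guessers and merging the results with a base phonetic dictionary. The guesser rules below are invented stand-ins (real guessers map letters to locale-specific phonemes), and the function and locale names are assumptions, not the patent's API:

```python
# Toy extended phonetic dictionary: each locale contributes a guessed
# pronunciation for every contact name. The letter-by-letter "phonemes" here
# are placeholders for real grapheme-to-phoneme rules.

def guess_en(word):
    return " ".join(word.lower())

def guess_fr(word):
    return " ".join(word.lower()) + " (fr)"

GUESSERS = {"en_US": guess_en, "fr_FR": guess_fr}

def extend_dictionary(base, contacts, locales):
    """Merge guessed pronunciations for contacts into a copy of the base."""
    extended = dict(base)
    for name in contacts:
        for loc in locales:
            extended.setdefault(name, []).append(GUESSERS[loc](name))
    return extended

ext = extend_dictionary({"call": ["k ao l"]}, ["Anna"], ["en_US", "fr_FR"])
```

Re-running `extend_dictionary` after the contacts database changes mirrors the update behavior the abstract describes.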
  • Patent number: 9697818
    Abstract: A method and apparatus that dynamically adjust operational parameters of a text-to-speech engine in a speech-based system are disclosed. A voice engine or other application of a device provides a mechanism to alter the adjustable operational parameters of the text-to-speech engine. In response to one or more environmental conditions, the adjustable operational parameters of the text-to-speech engine are modified to increase the intelligibility of synthesized speech.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: July 4, 2017
    Assignee: Vocollect, Inc.
    Inventors: James Hendrickson, Debra Drylie Stiffey, Duane Littleton, John Pecorari, Arkadiusz Slusarczyk
  • Patent number: 9697820
    Abstract: Systems and processes for performing unit-selection text-to-speech synthesis are provided. In one example process, a sequence of target units can represent a spoken pronunciation of text. A set of predicted acoustic model parameters of a second target unit can be determined using a set of acoustic features of a first candidate speech segment of a first target unit and a set of linguistic features of the second target unit. A likelihood score of the second candidate speech segment with respect to the first candidate speech segment can be determined using the set of predicted acoustic model parameters of the second target unit and a set of acoustic features of the second candidate speech segment of the second target unit. The second candidate speech segment can be selected for speech synthesis based on the determined likelihood score. Speech corresponding to the received text can be generated using the selected second candidate speech segment.
    Type: Grant
    Filed: December 7, 2015
    Date of Patent: July 4, 2017
    Assignee: Apple Inc.
    Inventor: Woojay Jeon
  • Patent number: 9697851
    Abstract: A note-taker terminal (200) and an information delivery device (100) are used. The information delivery device (100) includes a breathing detection unit (104) that specifies breathing sections from silent sections of uttered speech, a data processing unit (105) that determines, for every allocated time period of a note-taker, whether a breathing section exists in a range based on an end point of the allocated time period, and generates, if a breathing section exists, speech data of the utterance from a start point of the allocated time period until the breathing section, and, if a breathing section does not exist, speech data of the utterance from the start point until the end point of the allocated time period, and a data transmission unit (106) that transmits the speech data to the note-taker terminal (200). The note-taker terminal (200) receives the speech data, and transmits input text data to a user terminal (300) of a note-taking user.
    Type: Grant
    Filed: February 20, 2014
    Date of Patent: July 4, 2017
    Assignee: NEC Solution Innovators, Ltd.
    Inventor: Tomonari Nishimura
  • Patent number: 9639359
    Abstract: Embodiments are described for a method for compiling instruction code for execution in a processor having a number of functional units by determining a thermal constraint of the processor, and defining instruction words comprising both real instructions and one or more no operation (NOP) instructions to be executed by the functional units within a single clock cycle, wherein a number of NOP instructions executed over a number of consecutive clock cycles is configured to prevent exceeding the thermal constraint during execution of the instruction code.
    Type: Grant
    Filed: May 21, 2013
    Date of Patent: May 2, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Yuan Xie, Junli Gu
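The compile-time strategy in this abstract amounts to bounding the duty cycle of heat-producing instructions per window of clock cycles. A toy version pads each real instruction with enough NOPs to stay under a budget; the window model and parameters below are invented for illustration:

```python
# Toy thermal-aware NOP padding: keep the fraction of "active" slots in the
# instruction stream at or below max_duty by interleaving NOPs. The duty-cycle
# budget and the uniform padding scheme are illustrative assumptions.

def pad_with_nops(instructions, max_duty=0.5):
    """Emit (1/max_duty - 1) NOPs after each real instruction."""
    nops_per_instr = int(round(1.0 / max_duty)) - 1
    out = []
    for instr in instructions:
        out.append(instr)
        out.extend(["NOP"] * nops_per_instr)
    return out

schedule = pad_with_nops(["ADD", "MUL"], max_duty=0.5)
```

A real compiler would place the NOPs per functional unit and per cycle, but the constraint it enforces is the same ratio.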
  • Patent number: 9641481
    Abstract: The disclosure proposes a smart conversation method and an electronic device using the same method. According to one of the exemplary embodiments, an electronic device may receive via a receiver a first communication in a first communication type and determining a recipient status. The electronic device may determine a second communication type as an optimal communication type based on the recipient status. The electronic device may convert the first communication into a second communication that is suitable for the second communication type. The electronic device may transmit via a transmitter the second communication in the second communication type.
    Type: Grant
    Filed: January 30, 2015
    Date of Patent: May 2, 2017
    Assignee: HTC Corporation
    Inventor: Wen-Ping Ying
  • Patent number: 9628603
    Abstract: For voice mail transcription, a method is disclosed that includes detecting a communication device communicating an audio signal from the communication device to a voicemail system and transmitting data selected from the group consisting of text message data generated from the audio signal and voice training data to the voicemail system.
    Type: Grant
    Filed: July 23, 2014
    Date of Patent: April 18, 2017
    Assignee: Lenovo (Singapore) PTE. LTD.
    Inventors: Nathan J. Peterson, John Carl Mese, Russell Speight VanBlon, Rod D. Waltermann
  • Patent number: 9626955
    Abstract: Techniques for improved text-to-speech processing are disclosed. The improved text-to-speech processing can convert text from an electronic document into an audio output that includes speech associated with the text as well as audio contextual cues. One aspect provides audio contextual cues to the listener when outputting speech (spoken text) pertaining to a document. The audio contextual cues can be based on an analysis of a document prior to a text-to-speech conversion. Another aspect can produce an audio summary for a file. The audio summary for a document can thereafter be presented to a user so that the user can hear a summary of the document without having to process the document to produce its spoken text via text-to-speech conversion.
    Type: Grant
    Filed: April 4, 2016
    Date of Patent: April 18, 2017
    Assignee: Apple Inc.
    Inventors: Christopher Brian Fleizach, Reginald Dean Hudson
  • Patent number: 9570066
    Abstract: A method of speech synthesis including receiving a text input sent by a sender, processing the text input responsive to at least one distinguishing characteristic of the sender to produce synthesized speech that is representative of a voice of the sender, and communicating the synthesized speech to a recipient user of the system.
    Type: Grant
    Filed: July 16, 2012
    Date of Patent: February 14, 2017
    Assignee: General Motors LLC
    Inventors: Gaurav Talwar, Xufang Zhao, Ron M. Hecht
  • Patent number: 9570055
    Abstract: A method for converting textual messages to musical messages comprising receiving a text input and receiving a musical input selection. The method includes analyzing the text input to determine text characteristics and analyzing a musical input corresponding to the musical input selection to determine musical characteristics. Based on the text characteristics and the musical characteristics, the method includes correlating the text input with the musical input to generate a synthesizer input, and sending the synthesizer input to a voice synthesizer. The method includes receiving a vocal rendering of the text input from the voice synthesizer, generating a musical message from the vocal rendering and the musical input, and outputting the musical message.
    Type: Grant
    Filed: August 24, 2015
    Date of Patent: February 14, 2017
    Assignee: ZYA, INC.
    Inventors: Matthew Michael Serletic, II, Bo Bazylevsky, James Mitchell, Ricky Kovac, Patrick Woodward, Thomas Webb, Ryan Groves
  • Patent number: 9564121
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset, can include a word boundary feature, can include a cluster feature where initial consonant clusters and some word boundaries are marked with diacritics, can include a function word feature which marks units as originating from a function word or a content word, and/or can include a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporate the supplemental phoneset as an extra feature.
    Type: Grant
    Filed: August 7, 2014
    Date of Patent: February 7, 2017
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Mark Beutnagel, Yeon-Jun Kim, Ann K. Syrdal
  • Patent number: 9558201
    Abstract: Techniques are provided for de-duplication of data. In one embodiment, a system comprises de-duplication logic that is coupled to a de-duplication repository. The de-duplication logic is operable to receive, from a client device over a network, a request to store a file in the de-duplicated repository using a single storage encoding. The request includes a file identifier and a set of signatures that identify a set of chunks from the file. The de-duplication logic determines whether any chunks in the set are missing from the de-duplicated repository and requests the missing chunks from the client device. Then, for each missing chunk, the de-duplication logic stores in the de-duplicated repository that chunk and a signature representing that chunk. The de-duplication logic also stores, in the de-duplicated repository, a file entry that represents the file and that associates the set of signatures with the file identifier.
    Type: Grant
    Filed: January 7, 2014
    Date of Patent: January 31, 2017
    Assignee: VMware, Inc.
    Inventors: Israel Zvi Ben-Shaul, Leonid Vasetsky
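The client/server exchange this abstract describes can be sketched compactly: the client offers chunk signatures, the server replies with the ones it lacks, and only those chunks are uploaded along with a file entry mapping the identifier to its signature list. The hash choice and data layout below are assumptions:

```python
# Simplified de-duplicated store: chunks are addressed by SHA-256 signature,
# and a file entry records the ordered signatures that reconstruct the file.
import hashlib

repository = {}    # signature -> chunk bytes (the de-duplicated store)
file_entries = {}  # file identifier -> ordered list of signatures

def signature(chunk):
    return hashlib.sha256(chunk).hexdigest()

def missing_signatures(sigs):
    """Server side: report which offered signatures are not yet stored."""
    return {s for s in sigs if s not in repository}

def store_file(file_id, chunks):
    """Client side: send signatures first, then upload only missing chunks."""
    sigs = [signature(c) for c in chunks]
    missing = missing_signatures(sigs)
    uploaded = 0
    for s, c in zip(sigs, chunks):
        if s in missing:
            repository[s] = c
            missing.discard(s)  # each missing chunk is uploaded once
            uploaded += 1
    file_entries[file_id] = sigs
    return uploaded

first = store_file("a.txt", [b"hello", b"world"])
second = store_file("b.txt", [b"hello", b"again"])  # "hello" is deduplicated
```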
  • Patent number: 9542925
    Abstract: The invention relates to a method for generating sound for a rotating machine, including a step (E1) of determining the frequencies and amplitudes of n partials and/or harmonics (i) pertaining to the sound of a rotating machine, characterized in that the method includes a step (E2) of determining values (ai) and a step (E7-E8) of calculating a synthetic sound for the rotating machine, said synthetic sound being composed from the n partials and/or harmonics (i), while the frequency thereof is entirely or partially shifted by the values (ai).
    Type: Grant
    Filed: April 4, 2012
    Date of Patent: January 10, 2017
    Assignees: RENAULT SAS, GENESIS
    Inventors: Nathalie Le-Hir, Gaël Guyader, Patrick Boussard, Benoît Gauduin, Florent Jaillet
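The synthesis step this abstract describes is essentially additive synthesis: sum n sinusoidal partials, each shifted in frequency by its value ai. A minimal sketch follows; the sample rate, amplitudes, and shift values are illustrative assumptions:

```python
# Additive synthesis of a machine-like sound from n partials, each shifted
# in frequency by a per-partial offset a_i.
import math

def synthesize(partials, shifts, duration=0.01, rate=8000):
    """partials: list of (freq_hz, amplitude); shifts: per-partial offsets a_i.
    Returns duration*rate samples of the summed, shifted partials."""
    n = int(duration * rate)
    samples = []
    for k in range(n):
        t = k / rate
        s = sum(a * math.sin(2 * math.pi * (f + d) * t)
                for (f, a), d in zip(partials, shifts))
        samples.append(s)
    return samples

tone = synthesize([(440.0, 1.0), (880.0, 0.5)], shifts=[5.0, 10.0])
```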
  • Patent number: 9532156
    Abstract: A non-transitory computer readable storage medium contains instructions, executable by a processor, that identify a center component, a side component, and an ambient component within the right and left channels of a digital audio input signal. A spatial ratio is determined from the center component and side component. The digital audio input signal is adjusted based upon the spatial ratio to form a pre-processed signal. Recursive crosstalk cancellation processing is performed on the pre-processed signal to form a crosstalk-cancelled signal. The center component of the crosstalk-cancelled signal is realigned to create the final digital audio output.
    Type: Grant
    Filed: December 12, 2014
    Date of Patent: December 27, 2016
    Assignee: AMBIDIO, INC.
    Inventor: Tsai-Yi Wu
  • Patent number: 9483728
    Abstract: A method for training a deep neural network (DNN), comprises receiving and formatting speech data for the training, performing Hessian-free sequence training (HFST) on a first subset of a plurality of subsets of the speech data, and iteratively performing the HFST on successive subsets of the plurality of subsets of the speech data, wherein iteratively performing the HFST comprises reusing information from at least one previous iteration.
    Type: Grant
    Filed: October 30, 2014
    Date of Patent: November 1, 2016
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel
  • Patent number: 9483953
    Abstract: A voice learning apparatus includes a learning-material voice storage unit that stores learning material voice data including example sentence voice data; a learning text storage unit that stores a learning material text including an example sentence text; a learning-material text display controller that displays the learning material text; a learning-material voice output controller that performs voice output based on the learning material voice data; an example sentence specifying unit that specifies the example sentence text during the voice output; an example-sentence voice output controller that performs voice output based on the example sentence voice data associated with the specified example sentence text; and a learning-material voice output restart unit that restarts the voice output from a position where the voice output is stopped last time, after the voice output is performed based on the example sentence voice data.
    Type: Grant
    Filed: August 7, 2012
    Date of Patent: November 1, 2016
    Assignee: CASIO COMPUTER CO., LTD.
    Inventor: Daisuke Nakajima
  • Patent number: 9471901
    Abstract: Mechanisms are provided for representing white space in a graphical representation of a data model. These mechanisms involve analyzing output data that is to be output to a user via an output device, to identify white spaces in the output data. White spaces comprise portions of a range of metrics of output data values where the output data does not have data objects representing those portions of the range of metrics of output data. For each identified white space, a white space data object is created. The white space data objects are provided to an application which performs an operation on the white space data objects to output the white space data objects in a manner that identifies the white space data objects differently from non-white space data objects in the output data.
    Type: Grant
    Filed: September 12, 2011
    Date of Patent: October 18, 2016
    Assignee: International Business Machines Corporation
    Inventors: Brian J. Cragun, Mary J. Mueller, James S. Taylor
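The white-space detection this abstract describes can be reduced to a small sketch: scan a range of a metric and emit a white-space object for every point no data object covers. Bucketing the range into integer points is an illustrative simplification, not the patent's method:

```python
# Toy white-space detection: given the metric values that data objects cover,
# emit a distinct object for each uncovered point in [lo, hi] so an
# application can render gaps differently from data.

def find_white_space(values, lo, hi):
    """Return white-space objects for the integer points with no data."""
    covered = set(values)
    return [{"white_space": v} for v in range(lo, hi + 1) if v not in covered]

gaps = find_white_space([1, 2, 5, 6], lo=1, hi=6)  # points 3 and 4 are empty
```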
  • Patent number: 9466285
    Abstract: A method of deriving speech synthesis parameters from an input speech audio signal, wherein the audio signal is segmented on the basis of estimated positions of glottal closure incidents and the resulting segments are processed to obtain the complex cepstrum used to derive a synthesis filter. A reconstructed speech signal is produced by passing a pulsed excitation signal derived from the position of the glottal closure incidents through the synthesis filter, and compared with the input speech audio signal. The pulse excitation signal and the complex cepstrum are then iteratively modified to minimize the difference between the reconstructed speech signal and the input speech audio signal, by optimizing the position of the pulses in the excitation signal to reduce the mean squared error between the reconstructed speech signal and the input speech audio signal, and recalculating the complex cepstrum using the optimized pulse positions.
    Type: Grant
    Filed: November 26, 2013
    Date of Patent: October 11, 2016
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Ranniery Maia
  • Patent number: 9460708
    Abstract: The described implementations relate to automated data cleanup. One system includes a language model generated from language model seed text and a dictionary of possible data substitutions. This system also includes a transducer configured to cleanse a corpus utilizing the language model and the dictionary. The transducer can process speech recognition data in some cases by substituting a second word for a first word which shares pronunciation with the first word but is spelled differently. In some cases, this can be accomplished by establishing corresponding probabilities of the first word and second word based on a third word that appears in sequence with the first word.
    Type: Grant
    Filed: September 17, 2009
    Date of Patent: October 4, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Geoffrey Zweig, Yun-Cheng Ju
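The substitution this abstract describes can be illustrated with a minimal language-model decision: when two words share a pronunciation, keep the spelling that the model scores higher given a neighboring word. The bigram counts and word lists below are made-up stand-ins for a trained model and dictionary:

```python
# Toy homophone cleanup: choose between same-pronunciation spellings by
# comparing bigram counts with the preceding word. Counts are invented.
bigram_counts = {("over", "there"): 40, ("over", "their"): 2}
homophones = {"their": ["their", "there"], "there": ["their", "there"]}

def cleanse(prev_word, word):
    """Substitute the homophone most likely to follow prev_word."""
    options = homophones.get(word, [word])
    return max(options, key=lambda w: bigram_counts.get((prev_word, w), 0))

fixed = cleanse("over", "their")  # "there" outscores "their" after "over"
```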
  • Patent number: 9454597
    Abstract: A document management and retrieval system is configured to: store, for each word in a set of words, the appearance positions of that word in a set of documents as a word index; store, for each tag in a set of tags attached to words, the set of words that appear to the right and left of that tag, and also store, as a tag LR index, the appearance positions of each tag in a set of documents, keyed by the combination of the tag and a word appearing to its right, or the combination of the tag and a word appearing to its left; and, in a tag search where a query phrase contains words and a tag next to each other, refer to the index keyed by the tag and the word to its right or left, thereby reducing the size of the document list to be read without needing a tag name as a secondary key. A tag is updated by updating just two places in the tag LR index.
    Type: Grant
    Filed: November 6, 2008
    Date of Patent: September 27, 2016
    Assignee: NEC Corporation
    Inventors: Yukitaka Kusumura, Toshiyuki Kamiya
  • Patent number: 9456273
    Abstract: An audio mixing method, apparatus, and system that can ensure sound quality after audio mixing and reduce consumption of computing resources. The method includes: receiving an audio stream of each site, and analyzing the audio stream of each site to obtain a sound characteristic value of a sound source object; selecting, according to a descending sequence of sound characteristic values of sound source objects, a predetermined number of sound source objects from the sound source objects to serve as main sound source objects; determining, according to a relationship between a target site and the sites where the main sound source objects are located, audio streams that require audio mixing for the target site; and performing audio mixing on the audio streams that require audio mixing for the target site and sending the audio streams after the audio mixing to the target site.
    Type: Grant
    Filed: March 26, 2014
    Date of Patent: September 27, 2016
    Assignee: Huawei Device Co., Ltd.
    Inventors: Dongqi Wang, Wuzhou Zhan
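The selection step in this abstract can be sketched as ranking sites by a sound characteristic value and mixing only the top N streams, never echoing a site's own audio back to it. The field names and the use of loudness as the characteristic value are assumptions:

```python
# Toy main-sound-source selection for conference audio mixing: for a given
# target site, pick the n sites with the highest characteristic value,
# excluding the target itself.

def streams_to_mix(sites, target, n=3):
    """Return up to n source site IDs to mix for the target site."""
    candidates = [s for s in sites if s["site"] != target]
    ranked = sorted(candidates, key=lambda s: s["level"], reverse=True)
    return [s["site"] for s in ranked[:n]]

sites = [
    {"site": "A", "level": 0.9},
    {"site": "B", "level": 0.2},
    {"site": "C", "level": 0.7},
    {"site": "D", "level": 0.5},
]
mix_for_b = streams_to_mix(sites, target="B", n=2)  # the two loudest others
```

Mixing only a few dominant sources is what keeps the computing cost bounded regardless of the number of sites.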
  • Patent number: 9454958
    Abstract: Technologies pertaining to training a deep neural network (DNN) for use in a recognition system are described herein. The DNN is trained using heterogeneous data, the heterogeneous data including narrowband signals and wideband signals. The DNN, subsequent to being trained, receives an input signal that can be either a wideband signal or narrowband signal. The DNN estimates the class posterior probability of the input signal regardless of whether the input signal is the wideband signal or the narrowband signal.
    Type: Grant
    Filed: March 7, 2013
    Date of Patent: September 27, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jinyu Li, Dong Yu, Yifan Gong
  • Patent number: 9449523
    Abstract: A narration session between a plurality of participants can be set up to allow participants to collaboratively narrate an electronic book. Information can be transmitted to each participant so that the views of the participants remain in sync. Visual cues can also be transmitted to notify a participant of text that is to be read aloud, and audio snippets of the read text are collected to form a narration file. Participants without access rights to the electronic book can be granted temporary rights.
    Type: Grant
    Filed: June 27, 2012
    Date of Patent: September 20, 2016
    Assignee: Apple Inc.
    Inventors: Casey Maureen Dougherty, Gregory Robbin, Melissa Breglio Hajj
  • Patent number: 9418642
    Abstract: Systems, including methods and apparatus, for generating audio effects based on accompaniment audio produced by live or pre-recorded accompaniment instruments, in combination with melody audio produced by a singer. Audible broadcast of the accompaniment audio may be delayed by a predetermined time, such as the time required to determine chord information contained in the accompaniment signal. As a result, audio effects that require the chord information may be substantially synchronized with the audible broadcast of the accompaniment audio. The present teachings may be especially suitable for use in karaoke systems, to correct and add sound effects to a singer's voice that sings along with a pre-recorded accompaniment track.
    Type: Grant
    Filed: July 31, 2015
    Date of Patent: August 16, 2016
    Assignee: Sing Trix LLC
    Inventors: David Kenneth Hilderman, John Devecka
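The timing trick in this abstract is a fixed delay line: hold the accompaniment long enough to finish chord detection, so effects derived from the chords land in sync with what the listener hears. A schematic delay buffer is below; the delay length stands in for an assumed analysis latency:

```python
# Schematic delay line: each pushed sample emerges delay_samples pushes later,
# giving the chord analyzer that much lead time over the audible broadcast.
from collections import deque

class DelayLine:
    def __init__(self, delay_samples):
        self.buf = deque([0.0] * delay_samples)

    def push(self, sample):
        """Feed one input sample; return the sample delayed by the buffer."""
        self.buf.append(sample)
        return self.buf.popleft()

line = DelayLine(delay_samples=3)
out = [line.push(s) for s in [1.0, 2.0, 3.0, 4.0]]
# the first three outputs are the buffer's initial zeros
```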
  • Patent number: 9412358
    Abstract: Systems, methods, and computer-readable storage devices for generating speech using a presentation style specific to a user, and in particular the user's social group. Systems configured according to this disclosure can then use the resulting, personalized, text and/or speech in a spoken dialogue or presentation system to communicate with the user. For example, a system practicing the disclosed method can receive speech from a user, identify the user, and respond to the received speech by applying a personalized natural language generation model. The personalized natural language generation model provides communications which can be specific to the identified user.
    Type: Grant
    Filed: May 13, 2014
    Date of Patent: August 9, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Taniya Mishra, Alistair D. Conkie, Svetlana Stoyanchev
  • Patent number: 9405742
    Abstract: A method for phonetizing a data list having text-containing list entries, each list entry in the data list being subdivided into at least two data fields for provision to a voice-controlled user interface, includes: converting a list entry from a text representation into phonetics; storing the phonetics as phonemes in a phonetized data list; inserting a separating character into the text of a list entry between the respective data fields of the list entry, concomitantly converting the inserted separating character into phonetics and concomitantly storing the converted separating character as a phoneme symbol; and storing the phonemes in a phonetic database, the phonetized data list being produced from the phonemes stored in the phonetic database.
    Type: Grant
    Filed: February 11, 2013
    Date of Patent: August 2, 2016
    Assignee: Continental Automotive GmbH
    Inventor: Jens Walther
  • Patent number: 9392390
    Abstract: A method of applying a combined control strategy for the reproduction of multichannel audio signals in two or more sound zones, the method comprising deriving a first cost function for controlling the acoustic potential energy, such as on the basis of the Acoustic Contrast Control method and/or the Energy Difference Maximation method, in the zones to obtain acoustic separation between the zones in terms of sound pressure, deriving a second cost function, such as the Pressure Matching method, controlling the phase of the sound provided in the zones, and where a weight is obtained for determining a combination of the first and second cost functions in a combined optimization.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: July 12, 2016
    Assignee: Bang & Olufsen A/S
    Inventors: Martin Olsen, Martin Bo Møller
  • Patent number: 9390728
    Abstract: A speech analysis apparatus is provided. An F0 extraction part extracts a pitch value from speech information. A spectrum extraction part extracts spectrum information from the speech information. An MVF extraction part extracts a maximum voiced frequency, allowing boundary information for respectively filtering a harmonic component and a non-harmonic component to be obtained. According to the speech analysis apparatus, speech synthesis apparatus, and speech analysis synthesis system of the present invention, speech that is closer to the original voice and is more natural may be synthesized. Also, speech may be represented with less data capacity.
    Type: Grant
    Filed: March 27, 2013
    Date of Patent: July 12, 2016
    Assignee: GWANGJU INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Hong-Kook Kim, Kwang-Myung Jeon
  • Patent number: 9342506
    Abstract: Methods, systems and program product are disclosed for determining matching level of a text lookup segment with a plurality of source texts in a translation memory in terms of context. The invention determines exact matches for the lookup segment in the plurality of source texts, and determines, in the case that at least one exact match is determined, that a respective exact match is an in-context exact match for the lookup segment in the case that a context of the lookup segment matches that of the respective exact match. Degree of context matching required can be predetermined, and results prioritized. The invention also includes methods, systems and program products for storing a translation pair of source text and target text in a translation memory including context, and the translation memory so formed. The invention ensures that content is translated the same as previously translated content and reduces translator intervention.
    Type: Grant
    Filed: October 20, 2014
    Date of Patent: May 17, 2016
    Assignee: SDL Inc.
    Inventors: Russell G. Ross, Kevin Gillespie
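    The in-context matching idea can be sketched with a toy translation memory. The entry fields (`prev`, `next`) and the matching rule below are assumptions made for illustration, not SDL's actual data model.

    ```python
    # Toy translation memory: each entry stores its source segment plus the
    # segments before and after it as "context".
    tm = [
        {"source": "Press OK.", "target": "Appuyez sur OK.",
         "prev": "Open the menu.", "next": "The dialog closes."},
        {"source": "Press OK.", "target": "Cliquez sur OK.",
         "prev": "A warning appears.", "next": "Saving starts."},
    ]

    def lookup(segment, prev_segment, next_segment):
        # First find plain exact matches on the source text.
        exact = [e for e in tm if e["source"] == segment]
        # Then keep only those whose surrounding segments also match.
        in_context = [e for e in exact
                      if e["prev"] == prev_segment and e["next"] == next_segment]
        # Prioritise in-context exact matches over plain exact matches.
        return in_context or exact

    hits = lookup("Press OK.", "A warning appears.", "Saving starts.")
    ```

    With context supplied, the ambiguous segment resolves to the single entry whose neighbours also match.
    
    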
  • Patent number: 9336782
    Abstract: Voice data may be collected by a plurality of voice donors and stored in a voice bank. A voice donor may authenticate to a voice collection system to start a session to provide voice data. During the voice collection session, the voice donor may be presented with a sequence of prompts to speak and voice data may be transferred to a server. The received voice data may be processed to determine the speech units spoken by the voice donor and a count of speech units received from the voice donor may be updated. Feedback may be provided to the voice donor indicating, for example, a progress of the voice collection, a quality level of the voice data, or information about speech unit counts. The voice bank may be used to create TTS voices for voice recipients, create a model of voice aging, or for other applications.
    Type: Grant
    Filed: June 29, 2015
    Date of Patent: May 10, 2016
    Assignee: VOCALID, INC.
    Inventor: Rupal Patel
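    The speech-unit bookkeeping and progress feedback described above can be sketched as follows; the unit inventory and target counts are invented for the example, and a real voice bank would likely track richer units (e.g. diphones) and per-unit quality.

    ```python
    from collections import Counter

    # Hypothetical target inventory: how many instances of each unit we want.
    TARGET = {"AA": 20, "IY": 20, "S": 20, "T": 20}

    def update_counts(counts, units_in_recording):
        """Add the units recognized in one recording and return overall
        collection progress, capped per unit at its target count."""
        counts.update(units_in_recording)
        covered = sum(min(counts[u], need) for u, need in TARGET.items())
        total = sum(TARGET.values())
        return covered / total  # progress feedback shown to the donor

    counts = Counter()
    progress = update_counts(counts, ["AA", "AA", "S", "T", "IY"] * 4)
    ```
    
    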
  • Patent number: 9306687
    Abstract: Methods and systems according to the disclosure obtain information for a music track on a radio broadcast and may include storing track information received from an information service in a data storage module. When an intermediary server receives a request from an end-user device for information for a particular music track playing on a particular radio broadcast signal, the server first checks the data storage module to determine if it has music track information for the particular music track. If so, the server provides that music track information to the end-user device. The intermediary server may be configured to automatically and preemptively request information from the music information service each time a new music track is played on a radio broadcast signal. The intermediary server may be configured to request information from the music information service the first time it receives a request for that particular music track.
    Type: Grant
    Filed: September 27, 2013
    Date of Patent: April 5, 2016
    Assignee: Imagination Technologies Limited
    Inventors: Nicholas H. Jurascheck, Ian Robert Knowles
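    The cache-first flow of the intermediary server might look like this in miniature; `fetch_from_service` is a stand-in for the external music information service and is invented for this example.

    ```python
    class TrackInfoServer:
        """Intermediary that answers from local storage when it can and
        falls through to the information service only on a cache miss."""

        def __init__(self, fetch_from_service):
            self.cache = {}
            self.fetch = fetch_from_service

        def get_info(self, station, track_id):
            key = (station, track_id)
            if key not in self.cache:          # first request: go to the service
                self.cache[key] = self.fetch(station, track_id)
            return self.cache[key]             # later requests: served locally

    calls = []
    def fake_service(station, track_id):
        calls.append(track_id)
        return {"title": f"Track {track_id}"}

    server = TrackInfoServer(fake_service)
    first = server.get_info("FM1", 42)
    second = server.get_info("FM1", 42)
    ```

    The second request for the same track never reaches the service, which is the point of the intermediary.
    
    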
  • Patent number: 9305543
    Abstract: Techniques for improved text-to-speech processing are disclosed. The improved text-to-speech processing can convert text from an electronic document into an audio output that includes speech associated with the text as well as audio contextual cues. One aspect provides audio contextual cues to the listener when outputting speech (spoken text) pertaining to a document. The audio contextual cues can be based on an analysis of a document prior to a text-to-speech conversion. Another aspect can produce an audio summary for a file. The audio summary for a document can thereafter be presented to a user so that the user can hear a summary of the document without having to process the document to produce its spoken text via text-to-speech conversion.
    Type: Grant
    Filed: February 25, 2015
    Date of Patent: April 5, 2016
    Assignee: Apple Inc.
    Inventors: Christopher Brian Fleizach, Reginald Dean Hudson
  • Patent number: 9293150
    Abstract: A portion of an audio signal is identified corresponding to a spoken word and its phonemes. A set of alternate spoken words satisfying phonetic similarity criteria to the spoken word is generated. A subset of the set of alternate spoken words is also identified; each member of the subset shares the same phoneme in a similar temporal position as the spoken word. A significance factor is then calculated for the phoneme based on the number of alternates in the subset and on the total number of alternates. The calculated significance factor may then be used to lengthen or shorten the temporal duration of the phoneme in the audio signal according to its significance in the spoken word.
    Type: Grant
    Filed: September 12, 2013
    Date of Patent: March 22, 2016
    Assignee: International Business Machines Corporation
    Inventors: Flemming Boegelund, Lav R. Varshney
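    The abstract does not give the significance formula. As one plausible reading, a phoneme shared by fewer of the phonetically similar alternates is more discriminative, and more significant phonemes are lengthened; both the formula and the stretch rule below are illustrative assumptions.

    ```python
    def significance(n_sharing, n_alternates):
        """Assumed rule: significance falls as more alternate words share
        the phoneme in a similar temporal position."""
        if n_alternates == 0:
            return 1.0  # no confusable alternates: fully significant
        return 1.0 - n_sharing / n_alternates

    def adjust_duration(duration_ms, sig, max_stretch=0.3):
        # Lengthen significant phonemes, shorten insignificant ones.
        return duration_ms * (1.0 + max_stretch * (2.0 * sig - 1.0))

    # "cat" vs. alternates {"bat", "hat", "cap"}: /k/ is shared by 0 of 3,
    # so it is maximally significant and gets stretched.
    sig_k = significance(n_sharing=0, n_alternates=3)
    dur_k = adjust_duration(80.0, sig_k)
    ```
    
    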
  • Patent number: 9286913
    Abstract: Disclosed is an information display system provided with: a signal analyzing unit which analyzes the audio signals obtained from a predetermined location and which generates ambient sound information regarding the sound generated at the predetermined location; and an ambient expression selection unit which, on the basis of the ambient sound information, selects an ambient expression conveying what a person would feel from the sound generated at the predetermined location.
    Type: Grant
    Filed: March 28, 2011
    Date of Patent: March 15, 2016
    Assignee: NEC CORPORATION
    Inventors: Toshiyuki Nomura, Yuzo Senda, Kyota Higa, Takayuki Arakawa, Yasuyuki Mitsui
  • Patent number: 9269347
    Abstract: A text-to-speech method configured to output speech having a selected speaker voice and a selected speaker attribute, including: inputting text; dividing the inputted text into a sequence of acoustic units; selecting a speaker for the inputted text; selecting a speaker attribute for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model; and outputting the sequence of speech vectors as audio with the selected speaker voice and a selected speaker attribute. The acoustic model includes a first set of parameters relating to speaker voice and a second set of parameters relating to speaker attributes, which parameters do not overlap. The selecting a speaker voice includes selecting parameters from the first set of parameters and the selecting the speaker attribute includes selecting the parameters from the second set of parameters.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: February 23, 2016
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Javier Latorre-Martinez, Vincent Ping Leung Wan, Kean Kheong Chin, Mark John Francis Gales, Katherine Mary Knill, Masami Akamine
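    The disjoint parameter sets can be illustrated with a toy model in which speaker parameters and attribute parameters live in separate tables and are combined at synthesis time, so voice and attribute can be chosen independently. The parameter names and values below are made up.

    ```python
    # Non-overlapping parameter sets: one keyed by speaker, one by attribute.
    SPEAKER_PARAMS = {"alice": {"f0_mean": 210.0}, "bob": {"f0_mean": 120.0}}
    ATTRIBUTE_PARAMS = {"neutral": {"f0_scale": 1.0}, "excited": {"f0_scale": 1.2}}

    def select(speaker, attribute):
        """Combine a speaker's parameters with an attribute's parameters;
        the two dictionaries share no keys, so neither overwrites the other."""
        params = dict(SPEAKER_PARAMS[speaker])
        params.update(ATTRIBUTE_PARAMS[attribute])
        return params

    def frame_f0(params):
        return params["f0_mean"] * params["f0_scale"]

    # The same "excited" attribute applied to two different voices.
    excited_alice = frame_f0(select("alice", "excited"))
    excited_bob = frame_f0(select("bob", "excited"))
    ```
    
    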
  • Patent number: 9261952
    Abstract: The shifting and recharging of an emotional state with word sequencing is disclosed. A selection of a first word sequence set is received from the user. The word sequence set is defined by a mood recharging characteristic value, and includes a plurality of words each with at least one corresponding definition. A first one of the plurality of words in the selected first word sequence set is displayed. Then, a first one of the at least one corresponding definition of the first one of the plurality of words in the first word sequence set is displayed while the first one of the plurality of words remains displayed. The definition remains displayed for a time duration corresponding to a predefined cadence rate value. The user is prompted with a question related to the mood recharging characteristic value and associated with the first word sequence set.
    Type: Grant
    Filed: February 5, 2013
    Date of Patent: February 16, 2016
    Assignee: Spectrum Alliance, LLC
    Inventors: Pamela Gail Greene, David L. Greene, Mary Anne Thomas
  • Patent number: 9240178
    Abstract: A text-to-speech (TTS) system is configured with multiple voice corpuses used to synthesize speech. An incoming TTS request may be processed with a first, smaller voice corpus to quickly return results to the user. The text of the request may be stored by the TTS system and then processed in the background using a second, larger voice corpus. The second corpus takes longer to process but returns higher quality results. Future incoming TTS requests may be compared against the text of the first TTS request. If the text, or portions thereof, match, the system may return the stored results from the processing by the second corpus, thus returning high quality speech results in a shorter time.
    Type: Grant
    Filed: June 26, 2014
    Date of Patent: January 19, 2016
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Adam Franciszek Nadolski, Michal Krzysztof Kiedrowicz
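    A minimal version of the two-corpus flow, with stand-in synthesizers for the small and large corpuses; the background processing the patent describes is done inline here for simplicity.

    ```python
    class TwoCorpusTTS:
        def __init__(self, synth_fast, synth_quality):
            self.synth_fast = synth_fast        # small corpus: quick, lower quality
            self.synth_quality = synth_quality  # large corpus: slow, higher quality
            self.high_quality = {}              # text -> stored second-pass result

        def request(self, text):
            if text in self.high_quality:       # repeat request: reuse stored result
                return self.high_quality[text]
            audio = self.synth_fast(text)       # first answer comes back quickly
            # The patent runs this in the background; inline here for brevity.
            self.high_quality[text] = self.synth_quality(text)
            return audio

    tts = TwoCorpusTTS(lambda t: ("fast", t), lambda t: ("hq", t))
    first = tts.request("hello")
    second = tts.request("hello")
    ```

    The first request returns the fast result; the repeated request gets the stored high-quality result.
    
    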
  • Patent number: 9230536
    Abstract: A candidate voice segment sequence generator 1 generates candidate voice segment sequences 102 for an input language information sequence 101 by using DB voice segments 105 in a voice segment database 4. An output voice segment sequence determinator 2 calculates a degree of match between the input language information sequence 101 and each of the candidate voice segment sequences 102 by using a parameter 107 showing a value according to a cooccurrence criterion 106 for cooccurrence between the input language information sequence 101 and a sound parameter showing the attribute of each of a plurality of candidate voice segments in each of the candidate voice segment sequences 102, and determines an output voice segment sequence 103 on the basis of the degree of match.
    Type: Grant
    Filed: February 21, 2014
    Date of Patent: January 5, 2016
    Assignee: Mitsubishi Electric Corporation
    Inventors: Takahiro Otsuka, Keigo Kawashima, Satoru Furuta, Tadashi Yamaura
  • Patent number: 9195656
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
    Type: Grant
    Filed: December 30, 2013
    Date of Patent: November 24, 2015
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
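    Feeding linguistic features together with a language code into one network can be sketched with an untrained toy model; the feature vector, layer sizes, language list, and two-value output are placeholders, not the patent's architecture.

    ```python
    import numpy as np

    LANGUAGES = ["en", "es", "ja"]

    def prosody_net(features, language, w_hidden, w_out):
        """One shared network: linguistic features concatenated with a
        one-hot language code, as the abstract describes feeding both."""
        lang_onehot = np.eye(len(LANGUAGES))[LANGUAGES.index(language)]
        x = np.concatenate([features, lang_onehot])
        hidden = np.tanh(w_hidden @ x)
        return w_out @ hidden  # e.g. a (duration, f0) prediction per unit

    rng = np.random.default_rng(0)
    w_hidden = rng.normal(size=(8, 4 + len(LANGUAGES)))  # untrained weights
    w_out = rng.normal(size=(2, 8))
    features = np.array([1.0, 0.0, 0.5, 0.2])  # made-up linguistic features
    out_en = prosody_net(features, "en", w_hidden, w_out)
    out_es = prosody_net(features, "es", w_hidden, w_out)
    ```

    The same features produce different prosody predictions per language, because the language code is part of the network input.
    
    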
  • Patent number: 9190051
    Abstract: A Chinese speech recognition system and method are disclosed. Firstly, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and word arcs of the word lattice are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model, so as to output a language tag, a prosodic tag and a phonetic segmentation tag, which correspond to the speech signal. The present invention performs rescoring in a two-stage way to promote the recognition rate of basic speech information and labels the language tag, prosodic tag and phonetic segmentation tag to provide the prosodic structure and language information for the rear-stage voice conversion and voice synthesis.
    Type: Grant
    Filed: April 13, 2012
    Date of Patent: November 17, 2015
    Assignee: NATIONAL CHIAO TUNG UNIVERSITY
    Inventors: Jyh-Her Yang, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
  • Patent number: 9177560
    Abstract: A system and method may be configured to reconstruct an audio signal from transformed audio information. The audio signal may be resynthesized based on individual harmonics and corresponding pitches determined from the transformed audio information. Noise may be subtracted from the transformed audio information by interpolating across peak points and across trough points of harmonic pitch paths through the transformed audio information, and subtracting values associated with the trough point interpolations from values associated with the peak point interpolations. Noise between harmonics of the sound may be suppressed in the transformed audio information by centering functions at individual harmonics in the transformed audio information, the functions serving to suppress noise between the harmonics.
    Type: Grant
    Filed: December 22, 2014
    Date of Patent: November 3, 2015
    Assignee: The Intellisis Corporation
    Inventors: David C. Bradley, Daniel S. Goldin, Robert N. Hilton, Nicholas K. Fisher, Rodney Gateau
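    The peak/trough interpolation step can be sketched numerically: interpolate across the harmonic peaks and across the troughs along a pitch path, then subtract the trough (noise-floor) curve from the peak curve. The toy magnitude track and index positions below are invented for the example.

    ```python
    import numpy as np

    def denoised_envelope(magnitudes, peak_idx, trough_idx):
        """Interpolate across peak points and across trough points of one
        pitch path, then subtract the trough interpolation from the peak
        interpolation, as the abstract describes."""
        frames = np.arange(len(magnitudes))
        peaks = np.interp(frames, peak_idx, magnitudes[peak_idx])
        troughs = np.interp(frames, trough_idx, magnitudes[trough_idx])
        return np.maximum(peaks - troughs, 0.0)

    # Toy magnitude track: harmonic peaks at frames 0, 4, 8 on a noise floor.
    mags = np.array([5.0, 1.0, 1.2, 0.9, 6.0, 1.1, 1.0, 0.8, 5.5])
    env = denoised_envelope(mags, peak_idx=np.array([0, 4, 8]),
                            trough_idx=np.array([1, 3, 5, 7]))
    ```
    
    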
  • Patent number: 9135909
    Abstract: A speech synthesis information editing apparatus is provided. The speech synthesis information editing apparatus includes a phoneme storage unit that stores phoneme information, which designates a duration of each phoneme of speech to be synthesized. The speech synthesis information editing apparatus also includes a feature storage unit that stores feature information, which designates a time variation in a feature of the speech. In addition, the speech synthesis information editing apparatus includes an edition processing unit that changes a duration of each phoneme designated by the phoneme information with an expansion/compression degree, based on a feature designated by the feature information in correspondence to the phoneme.
    Type: Grant
    Filed: December 1, 2011
    Date of Patent: September 15, 2015
    Assignee: Yamaha Corporation
    Inventor: Tatsuya Iriyama
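    One possible reading of feature-weighted expansion: when the user stretches a region, phonemes carrying a higher feature value stretch more. The rule, field names, and values below are invented for illustration; the patent does not specify them.

    ```python
    # Each phoneme carries a duration (ms) and a feature value in [0, 1].
    phonemes = [{"ph": "s", "dur": 90, "feat": 0.2},
                {"ph": "a", "dur": 120, "feat": 0.9}]

    def stretch(phonemes, degree):
        """Apply an overall expansion degree, weighted per phoneme by its
        feature value, so high-feature phonemes absorb more of the stretch."""
        out = []
        for p in phonemes:
            factor = 1.0 + (degree - 1.0) * p["feat"]
            out.append({**p, "dur": round(p["dur"] * factor)})
        return out

    stretched = stretch(phonemes, degree=1.5)
    ```
    
    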
  • Patent number: 9123319
    Abstract: Systems, including methods and apparatus, for generating audio effects based on accompaniment audio produced by live or pre-recorded accompaniment instruments, in combination with melody audio produced by a singer. Audible broadcast of the accompaniment audio may be delayed by a predetermined time, such as the time required to determine chord information contained in the accompaniment signal. As a result, audio effects that require the chord information may be substantially synchronized with the audible broadcast of the accompaniment audio. The present teachings may be especially suitable for use in karaoke systems, to correct and add sound effects to a singer's voice that sings along with a pre-recorded accompaniment track.
    Type: Grant
    Filed: August 25, 2014
    Date of Patent: September 1, 2015
    Assignee: Sing Trix LLC
    Inventors: David Kenneth Hilderman, John Devecka
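    The fixed broadcast delay can be modeled as a simple delay line: accompaniment samples come out a fixed number of frames late, giving the chord detector time to catch up before the audio is heard. The frame counts here are arbitrary.

    ```python
    from collections import deque

    class DelayLine:
        """Fixed-delay buffer: each sample emerges `delay_frames` calls later."""

        def __init__(self, delay_frames):
            self.buf = deque([0.0] * delay_frames)  # primed with silence

        def process(self, sample):
            self.buf.append(sample)
            return self.buf.popleft()

    line = DelayLine(delay_frames=3)
    out = [line.process(s) for s in [1.0, 2.0, 3.0, 4.0, 5.0]]
    ```

    The first three outputs are the priming silence; the input then reappears three frames late.
    
    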
  • Patent number: 9058061
    Abstract: A touch panel 1 is arranged on the front face of a key display unit 2 and accepts input to a numeric keypad displayed on the key display unit 2. When a key of the numeric keypad is pressed and held, a control unit 4 measures the hold time with a timer 5 and cycles through the characters assigned to the key, treating the key as pressed once more each time a predetermined period elapses while it remains held, and displays each character on a display unit 3. The control unit 4 also notifies a vibration unit 6 of the timing of each character update, and the vibration unit 6 vibrates the touch panel 1 at that timing. A mobile terminal is thus provided that enables a user to input a desired character without watching the terminal.
    Type: Grant
    Filed: August 8, 2008
    Date of Patent: June 16, 2015
    Assignee: Kyocera Corporation
    Inventors: Tomotake Aono, Tetsuya Takenaka
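    The long-press cycling can be sketched as a modular index into the key's character set: each elapsed step advances one character and wraps around. The key layout and step time are assumptions for the example.

    ```python
    # Assumed multi-tap layout and step interval.
    KEYPAD = {"2": "abc", "3": "def"}

    def char_for_hold(key, hold_ms, step_ms=500):
        """Return the character currently shown for a key held `hold_ms`
        milliseconds: one character at press time, then one advance per
        elapsed step, wrapping around the key's character set."""
        chars = KEYPAD[key]
        return chars[(hold_ms // step_ms) % len(chars)]

    c0 = char_for_hold("2", 0)      # at the moment of the press
    c1 = char_for_hold("2", 600)    # after one step has elapsed
    c2 = char_for_hold("2", 1700)   # three steps elapsed: wraps around
    ```
    
    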
  • Patent number: 9037466
    Abstract: Methods, systems, and computer program products are provided for email administration for rendering email on a digital audio player. Embodiments include retrieving an email message; extracting text from the email message; creating a media file; and storing the extracted text of the email message as metadata associated with the media file. Embodiments may also include storing the media file on a digital audio player and displaying the metadata describing the media file, the metadata containing the extracted text of the email message.
    Type: Grant
    Filed: March 9, 2006
    Date of Patent: May 19, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: William K. Bodin, David Jaramillo, Jerry W. Redman, Derral C. Thorson
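    A toy version of the extract-and-store flow, using the standard library's `email` module to pull the text out of a message and attach it as metadata for a media file; the media file name and metadata fields are placeholders.

    ```python
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["Subject"] = "Meeting moved"
    msg.set_content("The 3pm meeting is now at 4pm.")

    def email_to_media_entry(message):
        """Extract the message text and store it as metadata associated
        with a (placeholder) media file, as in the abstract's flow."""
        text = message.get_content().strip()
        return {"media_file": "message.mp3",   # placeholder name
                "metadata": {"subject": message["Subject"], "body": text}}

    entry = email_to_media_entry(msg)
    ```

    A digital audio player could then display the stored metadata alongside the media file.
    
    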