Synthesis Patents (Class 704/258)

Neural network (Class 704/259)

Image to speech (Class 704/260)

Vocal tract model (Class 704/261)

Linear prediction (Class 704/262)

Correlation (Class 704/263)

Excitation (Class 704/264)

Interpolation (Class 704/265)

Specialized model (Class 704/266)

Time element (Class 704/267)

Frequency element (Class 704/268)

Transformation (Class 704/269)

Systems and methods for multilingual text generation field

Patent number: 11151334

Abstract: In at least one broad aspect, described herein are systems and methods in which a latent representation shared between two languages is built and/or accessed, and then leveraged for the purpose of text generation in both languages. Neural text generation techniques are applied to facilitate text generation, and in particular the generation of sentences (i.e., sequences of words or subwords) in both languages, in at least some embodiments.

Type: Grant

Filed: September 26, 2018

Date of Patent: October 19, 2021

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Mehdi Rezagholizadeh, Md Akmal Haidar, Alan Do-Omri, Ahmad Rashid

Method and system for implementing three-dimensional facial modeling and visual speech synthesis

Patent number: 11145100

Abstract: Novel tools and techniques are provided for implementing three-dimensional facial modeling and visual speech synthesis. In various embodiments, a computing system might determine an orientation, size, and location of a face in a received input image; retrieve a three-dimensional model template comprising a face and head; project the input image onto the model template to generate a three-dimensional model; define, on the model, a polygon mesh in a region of a facial feature corresponding to a feature in the input image; adjust parameters on the model; and display the model. The computing system might parse a text string into allophonic units; encode each allophonic unit into a point(s) in linguistic space corresponding to mouth movements; retrieve, from a codebook, indexed images/morphs corresponding to encoded points in the linguistic space; render the indexed images/morphs into an animation of the three-dimensional model; and synchronize, for output, the animation with audio representations of the text string.

Type: Grant

Filed: January 12, 2018

Date of Patent: October 12, 2021

Assignee: The Regents of the University of Colorado, a body corporate

Inventors: Sarel van Vuuren, Nattawut Ngampatipatpong, Robert N. Bowen

Method and system for processing audio communications over a network

Patent number: 11114091

Abstract: A method of processing audio communications over a network, comprising: at a first client device: receiving a first audio transmission from a second client device that is provided in a source language distinct from a default language associated with the first client device; obtaining current user language attributes for the first client device that are indicative of a current language used for the communication session at the first client device; if the current user language attributes suggest a target language currently used for the communication session at the first client device is distinct from the default language associated with the first client device: obtaining a translation of the first audio transmission from the source language into the target language; and presenting the translation of the first audio transmission in the target language to a user at the first client device.

Type: Grant

Filed: October 10, 2019

Date of Patent: September 7, 2021

Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventors: Fei Xiong, Jinghui Shi, Lei Chen, Min Ren, Feixiang Peng

Computer vision and mapping for audio applications

Patent number: 11087728

Abstract: Systems, devices, media, and methods are presented for playing audio sounds, such as music, on a portable electronic device using a digital color image of a note matrix on a map. A computer vision engine, in an example implementation, includes a mapping module, a color detection module, and a music playback module. The camera captures a color image of the map, including a marker and a note matrix. Based on the color image, the computer vision engine detects a token color value associated with each field. Each token color value is associated with a sound sample from a specific musical instrument. A global state map is stored in memory, including the token color value and location of each field in the note matrix. The music playback module, for each column, in order, plays the notes associated with one or more of the rows, using the corresponding sound sample, according to the global state map.

Type: Grant

Filed: December 21, 2019

Date of Patent: August 10, 2021

Assignee: Snap Inc.

Inventors: Ilteris Canberk, Donald Giovannini, Sana Park
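The column-by-column playback described in the abstract of patent 11087728 can be pictured with a small sketch. Everything here is an illustrative assumption rather than the patented implementation: the global state map is modeled as a dictionary keyed by (row, column), and the function name, color names, and instrument names are hypothetical.

```python
def playback_order(state_map, n_rows, n_cols, color_to_instrument):
    """Return (column, row, instrument) events in the order they would play."""
    events = []
    for col in range(n_cols):            # columns are played left to right
        for row in range(n_rows):        # every occupied row in the column sounds
            color = state_map.get((row, col))  # None means the field is empty
            if color is not None:
                events.append((col, row, color_to_instrument[color]))
    return events


# Hypothetical 3x2 note matrix with three tokens placed on it.
state_map = {(0, 0): "red", (2, 0): "blue", (1, 1): "red"}
instruments = {"red": "piano", "blue": "drum"}
events = playback_order(state_map, 3, 2, instruments)
```

A real system would hand each event to an audio engine; the sketch only shows how a stored state map determines playback order.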
Speech synthesis using one or more recurrent neural networks

Patent number: 11069335

Abstract: Aspects of the disclosure are related to synthesizing speech or other audio based on input data. Additionally, aspects of the disclosure are related to using one or more recurrent neural networks. For example, a computing device may receive text input; may determine features based on the text input; may provide the features as input to a recurrent neural network; may determine embedded data from one or more activations of a hidden layer of the recurrent neural network; may determine speech data based on a speech unit search that attempts to select, from a database, speech units based on the embedded data; and may generate speech output based on the speech data.

Type: Grant

Filed: July 12, 2017

Date of Patent: July 20, 2021

Assignee: Cerence Operating Company

Inventors: Vincent Pollet, Enrico Zovato

Computer-assisted conversation using addressible conversation segments

Patent number: 11069339

Abstract: At least some embodiments described herein relate to computer-assisted conversation. The set of available conversation segments is updated by addressing conversation segments at the granularity of a conversation segment or a group of conversation segments. For instance, an entire class of conversation segments may be addressed to add, delete, turn on, or turn off, the class of conversation segments. Groups of classes of conversation segments may also be similarly addressed. Thus, as the scope of a conversation changes, the available set of conversation segments may likewise change with fine-grained control. Accordingly, rather than pre-plan every set of possible conversations, the context and direction of the conversation may be evaluated by code to thereby determine what new sets of conversation segments should be added, deleted, turned on, or turned off. New conversation segments may even be generated dynamically, taking into account the values of parameters that then exist.

Type: Grant

Filed: January 21, 2020

Date of Patent: July 20, 2021

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Angshuman Sarkar, John Anthony Taylor, Henrik Frystyk Nielsen

Operation method of dialog agent and apparatus thereof

Patent number: 11056110

Abstract: An operation method of a dialog agent includes obtaining an utterance history including at least one of an outgoing utterance to be transmitted to request a service or at least one of an incoming utterance to be received to request the service, updating a requirement specification including items requested for the service based on the utterance history, generating utterance information to be used to request the service based on the updated requirement specification, and outputting the generated utterance information.

Type: Grant

Filed: March 18, 2019

Date of Patent: July 6, 2021

Assignee: Samsung Electronics Co., Ltd.

Inventors: Young-Seok Kim, Jeong-Hoon Park, Seongmin Ok, Je Hun Jeon, Jun Hwi Choi

Parallel neural text-to-speech

Patent number: 11017761

Abstract: Presented herein are embodiments of a non-autoregressive sequence-to-sequence model that converts text to an audio representation. Embodiments are fully convolutional, and a tested embodiment obtained about 46.7 times speed-up over a prior model at synthesis while maintaining comparable speech quality using a WaveNet vocoder. Interestingly, a tested embodiment also has fewer attention errors than the autoregressive model on challenging test sentences. In one or more embodiments, the first fully parallel neural text-to-speech system was built by applying the inverse autoregressive flow (IAF) as the parallel neural vocoder. System embodiments can synthesize speech from text through a single feed-forward pass. Also disclosed herein are embodiments of a novel approach to train the IAF from scratch as a generative model for raw waveform, which avoids the need for distillation from a separately trained WaveNet.

Type: Grant

Filed: October 16, 2019

Date of Patent: May 25, 2021

Assignee: Baidu USA LLC

Inventors: Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

Speech recognition method and apparatus with activation word based on operating environment of the apparatus

Patent number: 11003417

Abstract: A speech recognition method and apparatus for performing speech recognition in response to an activation word determined based on a situation are provided. The speech recognition method and apparatus include an artificial intelligence (AI) system and its application, which simulates functions such as recognition and judgment of a human brain using a machine learning algorithm such as deep learning.

Type: Grant

Filed: October 13, 2017

Date of Patent: May 11, 2021

Assignee: Samsung Electronics Co., Ltd.

Inventors: Sung-ja Choi, Eun-kyoung Kim, Ji-sang Yu, Ji-yeon Hong, Jong-youb Ryu, Jae-won Lee

Providing media content based on media element preferences

Patent number: 10984036

Abstract: A computing device is programmed to receive data collected from communications of a user. The computer identifies portions of the collected data including a keyword selected from a list of media content elements or lists of keywords associated with each of the media content elements. The computer associates each portion with a media content element. The computer further determines a score for each media content element based on at least one of the number of references, words included in the portion of collected data referring to the media content element, and the voice quality of the portion of collected data referring to the media content element. Based on the scores, the computer assigns media content elements to the user. The computer recommends media content items to the user based at least in part on the media content elements assigned to the user.

Type: Grant

Filed: May 3, 2016

Date of Patent: April 20, 2021

Assignee: DISH Technologies L.L.C.

Inventors: Prakash Subramanian, Nicholas Brandon Newell

Word selection for natural language interface

Patent number: 10978069

Abstract: Techniques for altering default language, in system outputs, with language included in system inputs are described. A system may determine a word(s) in user inputs, associated with a particular user identifier, correspond to but are not identical to a word(s) in system outputs. The system may store an association between the user identifier, the word(s) in the user inputs, and the word(s) in the system outputs. Thereafter, when the system generates a response to a user input, the system may replace the word(s), traditionally in the system outputs, with the word(s) that was present in previous user inputs. Such processing may further be tailored to a natural language intent.

Type: Grant

Filed: March 18, 2019

Date of Patent: April 13, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Andrew Starr McCraw, Sheena Yang, Sampat Biswas, Ryan Summers, Michael Sean McPhillips

Confidentiality-smart voice delivery of text-based incoming messages

Patent number: 10938976

Abstract: Methods, computer program products, and systems are presented. The methods include, for instance: a voice delivery application, running on a mobile device of a user, receives a text message from a user; by use of sensor inputs of the mobile device, the mobile device stores data regarding environment of the mobile device including external audio equipment, speed of the user, and bystanders within a hearing range of the environment; various data describing a sender of the text message and the bystanders are analyzed for respective relationships with the user and with each other to determine a confidentiality group dictating whether or not the text message may be heard by the bystander; the text message may be scanned for content screening, then according to configuration of the voice delivery application, the text message is securely delivered to the user by voice.

Type: Grant

Filed: September 30, 2019

Date of Patent: March 2, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Darryl M. Adderly, Jonathan W. Jackson, Ajit Jariwala, Eric B. Libow

Quality of text analytics

Patent number: 10930302

Abstract: Text can be presented with speech indicators generated by a cognitive system by processing the text. The speech indicators can indicate recommended speech characteristics to be exhibited by a user while the user generates spoken utterances representing the text. Data indicating at least one user input changing at least one of the speech indicators from a first state as originally presented to a second state can be received. In response, a value indicating a level of change made to the at least one of the speech indicators can be determined. At least one parameter used by the cognitive system to select the speech indicators can be modified or created based on the value indicating the level of change made to the at least one of the speech indicators.

Type: Grant

Filed: December 22, 2017

Date of Patent: February 23, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ilse Breedvelt-Schouten, Sasa Matijevic

Speech synthesis unit selection

Patent number: 10923103

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting units for speech synthesis. One of the methods includes determining a sequence of text units that each represent a respective portion of text for speech synthesis; and determining multiple paths of speech units that each represent the sequence of text units by selecting a first speech unit that includes speech synthesis data representing a first text unit; selecting multiple second speech units including speech synthesis data representing a second text unit based on (i) a join cost to concatenate the second speech unit with a first speech unit and (ii) a target cost indicating a degree that the second speech unit corresponds to the second text unit; and defining paths from the selected first speech unit to each of the multiple second speech units to include in the multiple paths of speech units.

Type: Grant

Filed: November 28, 2017

Date of Patent: February 16, 2021

Assignee: Google LLC

Inventor: Ioannis Agiomyrgiannakis
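The abstract of patent 10923103 describes choosing speech units by weighing a target cost against a join cost. The sketch below illustrates that general idea as a plain dynamic-programming search that keeps the cheapest path into each candidate unit. It is not the patented method: the `Unit` fields, both cost functions, and the toy pitch values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    text: str     # text unit this recording covers (hypothetical field)
    pitch: float  # toy acoustic feature used by both costs


def target_cost(unit, text_unit, desired_pitch):
    # How poorly the candidate matches what the text calls for.
    mismatch = 0.0 if unit.text == text_unit else 10.0
    return mismatch + abs(unit.pitch - desired_pitch)


def join_cost(prev, cur):
    # Penalty for concatenating two units whose pitch differs.
    return abs(prev.pitch - cur.pitch)


def select_units(text_units, candidates, desired_pitch=100.0):
    """Return (total cost, path) for the cheapest unit sequence."""
    # best maps each unit to the cheapest (cost, path) ending in that unit.
    best = {u: (target_cost(u, text_units[0], desired_pitch), [u])
            for u in candidates[0]}
    for t in range(1, len(text_units)):
        nxt = {}
        for cur in candidates[t]:
            tc = target_cost(cur, text_units[t], desired_pitch)
            cost, path = min(
                ((c + join_cost(p, cur) + tc, pth) for p, (c, pth) in best.items()),
                key=lambda item: item[0],
            )
            nxt[cur] = (cost, path + [cur])
        best = nxt
    return min(best.values(), key=lambda item: item[0])


# Two text units, two hypothetical candidate recordings for each.
candidates = [
    [Unit("h", 100.0), Unit("h", 130.0)],
    [Unit("i", 128.0), Unit("i", 101.0)],
]
cost, path = select_units(["h", "i"], candidates)
```

Keeping several low-cost paths per step instead of one would move this toward the multiple-path search the abstract mentions.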
Systems and methods for composition of audio content from multi-object audio

Patent number: 10901685

Abstract: Embodiments are related to processing of one or more input audio feeds for generation of a target audio stream that includes at least one object of interest to a listener. In some embodiments, the target audio stream may exclusively or primarily include the sound of the object of interest to the listener, without including other persons. This allows a listener to focus on an object of his or her interest and not necessarily have to listen to the performances of other objects in the input audio feed. Some embodiments contemplate multiple audio feeds and/or with multiple objects of interest.

Type: Grant

Filed: June 13, 2019

Date of Patent: January 26, 2021

Assignee: Sling Media Pvt. Ltd.

Inventors: Yatish Jayant Naik Raikar, Mohammed Rasool, Trinadha Harish Babu Pallapothu

Computer-based systems and methods configured for one or more technological applications for the automated assisting of telephone agent services

Patent number: 10904385

Abstract: In at least some embodiments, a system includes a memory, and a processor configured to convert an audio stream of a speech of a customer during a customer call session into customer-originated text. The customer-originated text is displayed in a first chat interface. A request from a first call center agent is sent to a second call center agent via the first chat interface to interact with the customer during the customer call session and displayed in a second chat interface. The second agent is allowed to participate in the customer call session when the second call center agent accepts the request from the first call center agent. First agent-originated text and second agent-originated text during the customer call session is merged to form a combined agent-originated text and synthesized to computer-generated agent speech having a voice of a computer-generated agent based on the combined agent-originated text communicated to the customer over the voice channel.

Type: Grant

Filed: January 9, 2020

Date of Patent: January 26, 2021

Assignee: CAPITAL ONE SERVICES, LLC

Inventors: Srikanth Reddy Sheshaiahgari, Jignesh Rangwala, Lee Adcock, Vamsi Kavuri, Muthukumaran Vembuli, Mehulkumar Jayantilal Garnara, Soumyajit Ray, Vincent Pham

Systems and methods for multi-speaker neural text-to-speech

Patent number: 10896669

Abstract: Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.

Type: Grant

Filed: May 8, 2018

Date of Patent: January 19, 2021

Assignee: Baidu USA LLC

Inventors: Sercan O. Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou

Devices, systems, and methods for assessing a vessel

Patent number: 10888232

Abstract: Embodiments of the present disclosure are configured to assess the severity of a blockage in a vessel and, in particular, a stenosis in a blood vessel. In some particular embodiments, the devices, systems, and methods of the present disclosure are configured to assess the severity of a stenosis in the coronary arteries without the administration of a hyperemic agent. Further, in some implementations devices, systems, and methods of the present disclosure are configured to normalize and/or temporally align pressure measurements from two different pressure sensing instruments. Further still, in some instances devices, systems, and methods of the present disclosure are configured to exclude outlier cardiac cycles from calculations utilized to evaluate a vessel, including providing visual indication to a user that the cardiac cycles have been excluded.

Type: Grant

Filed: January 16, 2014

Date of Patent: January 12, 2021

Assignee: PHILIPS IMAGE GUIDED THERAPY CORPORATION

Inventors: David Anderson, Howard David Alpert

Systems and methods for prioritizing messages for conversion from text to speech based on predictive user behavior

Patent number: 10887268

Abstract: Disclosed embodiments describe systems and methods for prioritizing messages for conversion from text to speech. A message manager can execute on a device. The message manager can identify a plurality of messages accessible via the device and can determine, for each message of the plurality of messages, a conversion score based on one or more parameters of each message. The conversion score can indicate a priority of each message to convert from text to speech. The message manager can identify a message of the plurality of messages for transmission to a text-to-speech converter for converting the message from text to speech. The message manager can also receive, from the text-to-speech converter, speech data of the message to play via an audio output of the device.

Type: Grant

Filed: August 19, 2019

Date of Patent: January 5, 2021

Assignee: Citrix Systems, Inc.

Inventors: Thierry Duchastel, Marcos Alejandro Di Pietro
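The abstract of patent 10887268 describes scoring each message and sending the highest-priority one to a text-to-speech converter. A minimal sketch of that flow follows. The `Message` fields and the weights in `conversion_score` are hypothetical, since the abstract does not say which parameters feed the score.

```python
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: str
    sender_is_frequent_contact: bool  # hypothetical scoring parameter
    unread: bool                      # hypothetical scoring parameter
    length_chars: int                 # hypothetical scoring parameter


def conversion_score(msg):
    """Higher score means convert to speech sooner (weights are arbitrary)."""
    score = 0.0
    if msg.sender_is_frequent_contact:
        score += 2.0
    if msg.unread:
        score += 1.0
    # Short messages are cheap to convert, so nudge them ahead.
    score += 1.0 / (1 + msg.length_chars / 100)
    return score


def next_message_to_convert(messages):
    """Pick the message with the highest conversion score."""
    return max(messages, key=conversion_score)


inbox = [
    Message("a", False, True, 100),
    Message("b", True, True, 400),
    Message("c", False, False, 10),
]
chosen = next_message_to_convert(inbox)
```

In a fuller version the chosen message would be passed to a text-to-speech service and the returned audio queued for playback; that step is omitted because the abstract names no specific converter.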
Systems and methods for parallel wave generation in end-to-end text-to-speech

Patent number: 10872596

Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in a closed-form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.

Type: Grant

Filed: February 15, 2019

Date of Patent: December 22, 2020

Assignee: Baidu USA LLC

Inventors: Wei Ping, Kainan Peng, Jitong Chen
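For context on the closed-form KL divergence mentioned in the abstract of patent 10872596: when both distributions are univariate Gaussians, the divergence has a standard closed form, stated below. The symbols are notation chosen here, with q denoting one Gaussian and p the other, and the patent's regularization term is not reproduced.

```latex
\[
\mathrm{KL}\bigl(\mathcal{N}(\mu_q,\sigma_q^2)\,\|\,\mathcal{N}(\mu_p,\sigma_p^2)\bigr)
= \log\frac{\sigma_p}{\sigma_q}
+ \frac{\sigma_q^2 + (\mu_q-\mu_p)^2}{2\sigma_p^2}
- \frac{1}{2}
\]
```

Because the expression depends only on the means and standard deviations, it can be evaluated without sampling, which is consistent with the abstract's remark that a closed form simplifies training.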
Terminal apparatus, server, and information processing system

Patent number: 10867488

Abstract: A terminal apparatus includes a processor part, a haptic-feedback-information output part, and a detection part. The processor part is configured to generate segmented-waveshape data by segmenting oscillation-waveshape data at a timing based on one or a plurality of periods of the oscillation-waveshape data, and to generate waveshape data based on the segmented-waveshape data. The haptic-feedback-information output part is configured to generate and output haptic-feedback information corresponding to the waveshape data generated by the processor part. The detection part is configured to detect an input signal based on a user action, and to output the detected input signal to the processor part as the oscillation-waveshape data.

Type: Grant

Filed: July 20, 2017

Date of Patent: December 15, 2020

Assignee: Sony Corporation

Inventors: Ryosuke Takeuchi, Seiji Muramatsu, Mioko Ambe, Kazutoshi Ohno, Tetsuya Takahashi, Tetsuya Naruse, Mikio Takenaka, Ryo Yokoyama, Akira Ono, Ryosuke Murakami, Hideaki Hayashi

Voice interaction apparatus and voice interaction method

Patent number: 10854219

Abstract: A voice interaction apparatus acquires a speech signal indicative of a speech sound, identifies a series of pitches of the speech sound from the speech signal, and causes a reproduction device to reproduce a response voice of pitches controlled in accordance with the lowest pitch of the pitches identified during a tailing section proximate to an end point within the speech sound.

Type: Grant

Filed: June 7, 2018

Date of Patent: December 1, 2020

Assignee: Yamaha Corporation

Inventor: Hiraku Kayama
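The abstract of patent 10854219 ties the response voice to the lowest pitch found near the end of the user's utterance. The sketch below shows one way such a rule could look. The frame-based pitch list, the convention that 0 marks an unvoiced frame, the tail length, and the fixed musical interval are all assumptions made for illustration.

```python
def response_pitch(pitches_hz, tail_len=5, interval_semitones=-7):
    """Derive a response pitch from the lowest pitch in the utterance tail."""
    # Keep only voiced frames from the last tail_len frames (0 = unvoiced).
    voiced_tail = [p for p in pitches_hz[-tail_len:] if p > 0]
    if not voiced_tail:
        return None  # nothing to anchor the response to
    lowest = min(voiced_tail)
    # Shift by a fixed interval; the interval itself is an arbitrary choice here.
    return lowest * 2 ** (interval_semitones / 12)


# Hypothetical per-frame pitch track of an utterance, in hertz.
track = [220, 210, 0, 200, 180, 190, 185]
octave_below = response_pitch(track, tail_len=4, interval_semitones=-12)
```

With a tail of four frames the lowest pitch is 180 Hz, so an interval of one octave down yields half that frequency.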
Communication tone training

Patent number: 10832587

Abstract: An approach is provided that may obtain communication information regarding a communication between a first entity and a second entity while the communication may be ongoing. The communication may include an utterance. A tone associated with the utterance may be identified and may result in an identified tone. An outcome of the communication may be predicted using a machine learning based prediction module and the identified tone.

Type: Grant

Filed: March 15, 2017

Date of Patent: November 10, 2020

Assignee: International Business Machines Corporation

Inventors: Rama K Akkiraju, Jalal U Mahmud, Vibha S Sinha, Mengdi Zhang

Communication tone training

Patent number: 10832588

Abstract: An approach is provided that may obtain communication information regarding a communication between a first entity and a second entity while the communication may be ongoing. The communication may include an utterance. A tone associated with the utterance may be identified and may result in an identified tone. An outcome of the communication may be predicted using a machine learning based prediction module and the identified tone.

Type: Grant

Filed: November 13, 2017

Date of Patent: November 10, 2020

Assignee: International Business Machines Corporation

Inventors: Rama K. Akkiraju, Jalal U. Mahmud, Vibha S. Sinha, Mengdi Zhang

System for and method of accessing and selecting emoticons, content, and mood messages during chat sessions

Patent number: 10824297

Abstract: Emoticons or other images are inserted into text messages during chat sessions without leaving the chat session by entering an input sequence onto an input area of a touchscreen on an electronic device, thereby causing an emoticon library to be presented to a user. The user selects an emoticon, and the emoticon library either closes automatically or closes after the user enters a closing input sequence. The opening and closing input sequences are, for example, any combination of swipes and taps along or on the input area. Users are also able to add content to chat sessions and generate mood messages to chat sessions.

Type: Grant

Filed: November 2, 2017

Date of Patent: November 3, 2020

Assignee: Google LLC

Inventors: Lior Gonnen, Iddo Tal

Multifunction simultaneous interpretation device

Patent number: 10817674

Abstract: A multifunction simultaneous interpretation device includes an audio input and recognition module for receiving input speech of a first language, recognizing same, and converting the input speech of the first language into input speech signals of the first language; an interpretation module electrically connected to the audio input and recognition module and configured to receive the input speech signals of the first language, interpret and convert same into speech signals of a second language different from the first language, and make the speech signals of the second language as output; an output module electrically connected to the interpretation module and configured to receive the speech signals of the second language from the interpretation module and output a voice representing the speech signals of the second language; and a wireless transceiver electrically connected to the interpretation module and configured for wireless signal transmission to a mobile phone.

Type: Grant

Filed: June 14, 2018

Date of Patent: October 27, 2020

Inventors: Chun-Ai Tu, Chun-Yang Chang, Chun-Ling Ho, Yu Chin Chan

Method and apparatus for processing speech splicing and synthesis, computer device and readable medium

Patent number: 10803851

Abstract: The present disclosure provides a method and apparatus for processing speech splicing and synthesis, a computer device and a readable medium. The method comprises: expanding a speech library according to a pre-trained speech synthesis model and an obtained synthesized text; the speech library before the expansion comprises manually-collected original language materials; using the expanded speech library to perform speech splicing and synthesis processing. According to the technical solution of the present embodiment, the speech library is expanded so that the speech library includes sufficient language materials. As such, when speech splicing processing is performed according to the expanded speech library, it is possible to select more speech segments, and thereby improve the coherence and naturalness of the synthesized speech so that it can sufficiently satisfy the user's normal use.

Type: Grant

Filed: December 19, 2018

Date of Patent: October 13, 2020

Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.

Inventors: Xiaohui Sun, Yu Gu

Voice generation with predetermined emotion type

Patent number: 10803850

Abstract: Techniques for generating voice with predetermined emotion type. In an aspect, semantic content and emotion type are separately specified for a speech segment to be generated. A candidate generation module generates a plurality of emotionally diverse candidate speech segments, wherein each candidate has the specified semantic content. A candidate selection module identifies an optimal candidate from amongst the plurality of candidate speech segments, wherein the optimal candidate most closely corresponds to the predetermined emotion type. In further aspects, crowd-sourcing techniques may be applied to generate the plurality of speech output candidates associated with a given semantic content, and machine-learning techniques may be applied to derive parameters for a real-time algorithm for the candidate selection module.

Type: Grant

Filed: September 8, 2014

Date of Patent: October 13, 2020

Inventors: Chi-Ho Li, Baoxun Wang, Max Leung

Optimized scale factor for frequency band extension in an audio frequency signal decoder

Patent number: 10783895

Abstract: A method and device are provided for determining an optimized scale factor to be applied to an excitation signal or a filter during a process for frequency band extension of an audio frequency signal. The band extension process includes decoding or extracting, in a first frequency band, an excitation signal and parameters of the first frequency band including coefficients of a linear prediction filter, generating an excitation signal extending over at least one second frequency band, filtering using a linear prediction filter for the second frequency band. The determination method includes determining an additional linear prediction filter, of a lower order than that of the linear prediction filter of the first frequency band, the coefficients of the additional filter being obtained from the parameters decoded or extracted from the first frequency and calculating the optimized scale factor as a function of at least the coefficients of the additional filter.

Type: Grant

Filed: August 30, 2019

Date of Patent: September 22, 2020

Assignee: Koninklijke Philips N.V.

Inventors: Magdalena Kaniewska, Stephane Ragot

Machine learning to integrate knowledge and augment natural language processing

Patent number: 10776586

Abstract: A system, computer program product, and method are provided to automate a framework for knowledge graph based persistence of data, and to resolve temporal changes and uncertainties in the knowledge graph. Natural language understanding, together with one or more machine learning models (MLMs), is used to extract data from unstructured information, including entities and entity relationships. The extracted data is populated into a knowledge graph. As the KG is subject to change, the KG is used to create new and retrain existing machine learning models (MLMs). Weighting is applied to the populated data in the form of veracity value. Blockchain technology is applied to the populated data to ensure reliability of the data and to provide auditability to assess changes to the data.

Type: Grant

Filed: January 10, 2018

Date of Patent: September 15, 2020

Assignee: International Business Machines Corporation

Inventors: David Bacarella, James H. Barnebee, IV, Nicholas Lawrence, Sumit Patel

Method and apparatus for calculating meal period

Patent number: 10755597

Abstract: Disclosed are a method and an apparatus for calculating a meal period, the method including: calculating, by a wrist acceleration calculating unit, a wrist acceleration variation value which is a variation value of acceleration in respect to a motion of a user's wrist which is measured based on gravitational acceleration; calculating, by a wrist angle calculating unit, a wrist angle variation value which is a variation value of an angle to the user's wrist based on a gravitational direction by using the wrist acceleration variation value; detecting, by an eating behavior candidate pattern detecting unit, an eating behavior candidate pattern based on a predetermined reference by applying one or more threshold values to the wrist angle variation value; and calculating, by a meal period calculating unit, a meal period based on the number of times the eating behavior candidate pattern occurs.

Type: Grant

Filed: April 27, 2017

Date of Patent: August 25, 2020

Assignee: AJOU UNIVERSITY INDUSTRY-ACADEMIC COOPERATION FOUNDATION

Inventors: We Duke Cho, Kyeong Chan Park, Sun Taag Choe
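
The steps named in the abstract of patent 10755597 (acceleration to wrist angle, thresholding the angle variation, counting candidate patterns) can be illustrated with a minimal sketch. It is not the patented method: the function names, the choice of the sensor x-axis, the 30-degree threshold, and the minimum pattern count are all assumptions made for illustration.

```python
import math

def wrist_angle_deg(ax, ay, az):
    """Angle between the sensor x-axis and the measured gravity vector, in degrees.

    Assumption: the reading (ax, ay, az) is dominated by gravitational acceleration.
    """
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0:
        raise ValueError("zero acceleration vector")
    cosine = max(-1.0, min(1.0, ax / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))

def count_candidate_patterns(angles, threshold_deg=30.0):
    """Count runs where the sample-to-sample angle variation reaches the threshold.

    Hypothetical rule: each contiguous run of large variations counts once.
    """
    count = 0
    in_run = False
    for prev, cur in zip(angles, angles[1:]):
        large = abs(cur - prev) >= threshold_deg
        if large and not in_run:
            count += 1
        in_run = large
    return count

def is_meal_period(angles, min_patterns=3, threshold_deg=30.0):
    """Flag a window as a meal period when enough candidate patterns occur."""
    return count_candidate_patterns(angles, threshold_deg) >= min_patterns
```

For example, `count_candidate_patterns([0, 40, 40, 80, 80])` finds two candidate patterns, so the window would not count as a meal period under the assumed minimum of three.
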
Electronic device with voice-synthesis and acoustic watermark capabilities

Patent number: 10755694

Abstract: An electronic device includes an audio synthesizer. The audio synthesizer can generate a voice-synthesized audio output stream as a function of one or more audible characteristics extracted from voice input received from an authorized user of the electronic device. The audio synthesizer can also apply an acoustic watermark to the voice-synthesized audio output stream, the acoustic watermark indicating that the voice-synthesized audio output stream is machine made.

Type: Grant

Filed: March 15, 2018

Date of Patent: August 25, 2020

Assignee: Motorola Mobility LLC

Inventors: Rachid Alameh, James P Ashley, Jarrett Simerson, Thomas Merrell
System and method of generating effects during live recitations of stories

Patent number: 10726838

Abstract: One aspect of this disclosure relates to presentation of a first effect on one or more presentation devices during an oral recitation of a first story. The first effect is associated with a first trigger point, first content, and/or first story. The first trigger point is one or more specific syllables from a word and/or phrase in the first story. A first transmission point associated with the first effect can be determined based on a latency of a presentation device and a user speaking profile. The first transmission point is one or more specific syllables from a word and/or phrase before the first trigger point in the first story. Control signals carrying instructions to present the first content at the first trigger point are transmitted to the presentation device when a user recites the first transmission point, such that the first content is presented at the first trigger point.

Type: Grant

Filed: June 14, 2018

Date of Patent: July 28, 2020

Assignee: Disney Enterprises, Inc.

Inventors: Taylor Hellam, Malcolm E. Murdock, Mohammad Poswal, Nicolas Peck
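
One way to read the abstract of patent 10726838 is that the transmission point sits far enough before the trigger point to absorb the device latency at the reader's speaking rate. The sketch below illustrates that arithmetic only. The function name, the syllable-index representation, and the constant speaking rate are assumptions, not details from the patent.

```python
import math

def transmission_index(trigger_index, device_latency_s, syllables_per_s):
    """Return the syllable index at which to send the control signal.

    trigger_index: position of the trigger syllable in the story (0-based).
    device_latency_s: assumed delay between sending and presenting, in seconds.
    syllables_per_s: assumed constant speaking rate from the user's profile.
    """
    if syllables_per_s <= 0:
        raise ValueError("speaking rate must be positive")
    # Number of syllables the reader speaks while the device is reacting.
    lead_syllables = math.ceil(device_latency_s * syllables_per_s)
    # Never point before the first syllable of the story.
    return max(0, trigger_index - lead_syllables)
```

With a 0.5 s latency and a reader speaking 4 syllables per second, a trigger at syllable 10 would be sent at syllable 8.
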
Optimized scale factor for frequency band extension in an audio frequency signal decoder

Patent number: 10672412

Abstract: A method and device are provided for determining an optimized scale factor to be applied to an excitation signal or a filter during a process for frequency band extension of an audio frequency signal. The band extension process includes decoding or extracting, in a first frequency band, an excitation signal and parameters of the first frequency band including coefficients of a linear prediction filter, generating an excitation signal extending over at least one second frequency band, and filtering using a linear prediction filter for the second frequency band. The determination method includes determining an additional linear prediction filter, of a lower order than that of the linear prediction filter of the first frequency band, the coefficients of the additional filter being obtained from the parameters decoded or extracted from the first frequency band, and calculating the optimized scale factor as a function of at least the coefficients of the additional filter.

Type: Grant

Filed: August 28, 2019

Date of Patent: June 2, 2020

Assignee: Koninklijke Philips N.V.

Inventors: Magdalena Kaniewska, Stephane Ragot
System and method for data-driven socially customized models for language generation

Patent number: 10665226

Abstract: Systems, methods, and computer-readable storage devices for generating speech using a presentation style specific to a user, and in particular the user's social group. Systems configured according to this disclosure can then use the resulting, personalized, text and/or speech in a spoken dialogue or presentation system to communicate with the user. For example, a system practicing the disclosed method can receive speech from a user, identify the user, and respond to the received speech by applying a personalized natural language generation model. The personalized natural language generation model provides communications which can be specific to the identified user.

Type: Grant

Filed: June 4, 2019

Date of Patent: May 26, 2020

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventors: Taniya Mishra, Alistair D. Conkie, Svetlana Stoyanchev
Mapping between speech signal and transcript

Patent number: 10650803

Abstract: A method, a computer program product, and a computer system for mapping between a speech signal and a transcript of the speech signal. The computer system segments the speech signal to obtain one or more segmented speech signals and the transcript of the speech signal to obtain one or more segmented transcripts of the speech signal. The computer system generates estimated phone sequences and reference phone sequences, calculates costs of correspondences between the estimated phone sequences and the reference phone sequences, determines a series of the estimated phone sequences with a smallest cost, selects a partial series of the estimated phone sequences from the series of the estimated phone sequences, and generates mapping data which includes the partial series of the estimated phone sequences and a corresponding series of the reference phone sequences.

Type: Grant

Filed: October 10, 2017

Date of Patent: May 12, 2020

Assignee: International Business Machines Corporation

Inventors: Takashi Fukuda, Nobuyasu Itoh
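
The abstract of patent 10650803 speaks of costs of correspondence between estimated and reference phone sequences without saying how they are computed. As an illustration only, the sketch below uses plain edit distance (Levenshtein distance) as a stand-in cost and picks the candidate with the smallest cost. Both function names are hypothetical.

```python
def phone_edit_cost(estimated, reference):
    """Edit distance between two phone sequences (unit cost per edit)."""
    prev = list(range(len(reference) + 1))
    for i, est_phone in enumerate(estimated, start=1):
        cur = [i]
        for j, ref_phone in enumerate(reference, start=1):
            substitute = prev[j - 1] + (0 if est_phone == ref_phone else 1)
            delete = prev[j] + 1      # drop a phone from the estimate
            insert = cur[j - 1] + 1   # add a phone missing from the estimate
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]

def best_candidate(candidates, reference):
    """Pick the estimated phone sequence with the smallest cost."""
    return min(candidates, key=lambda c: phone_edit_cost(c, reference))
```

A real aligner would likely weight substitutions by phonetic similarity. Unit costs keep the example short.
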
Voice enabled screen reader

Patent number: 10636429

Abstract: In some embodiments, a system may process a user interface to identify textual or graphical items in the interface, and may prepare a plurality of audio files containing spoken representations of the items. As the user navigates through the interface, different ones of the audio files may be selected and played, to announce text associated with items selected by the user. A computing device may periodically determine whether a cache offering the interface to users stores audio files for all of the interface's textual items, and if the cache is missing any audio files for any of the textual items, the computing device may take steps to have a corresponding audio file created.

Type: Grant

Filed: March 3, 2017

Date of Patent: April 28, 2020

Assignee: Comcast Cable Communications, LLC

Inventors: Thomas Wlodkowski, Michael J. Cook
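
The periodic cache check described in the abstract of patent 10636429 amounts to comparing the interface's textual items with the audio files already cached. A minimal sketch follows, assuming the cache is a dictionary keyed by item text and that `synthesize` is any caller-supplied text-to-audio function. Both are hypothetical choices for illustration.

```python
def missing_audio_items(text_items, audio_cache):
    """Return the textual items that have no cached audio file yet."""
    return [text for text in text_items if text not in audio_cache]

def fill_cache(text_items, audio_cache, synthesize):
    """Create audio for every missing item and store it in the cache."""
    for text in missing_audio_items(text_items, audio_cache):
        audio_cache[text] = synthesize(text)
    return audio_cache
```
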
Avatar image animation using translation vectors

Patent number: 10628985

Abstract: Techniques are described for image generation for avatar image animation using translation vectors. An avatar image is obtained for representation on a first computing device. An autoencoder is trained, on a second computing device comprising an artificial neural network, to generate synthetic emotive faces. A plurality of translation vectors is identified corresponding to a plurality of emotion metrics, based on the training. A bottleneck layer within the autoencoder is used to identify the plurality of translation vectors. A subset of the plurality of translation vectors is applied to the avatar image, wherein the subset represents an emotion metric input. The emotion metric input is obtained from facial analysis of an individual. An animated avatar image is generated for the first computing device, based on the applying, wherein the animated avatar image is reflective of the emotion metric input and the avatar image includes vocalizations.

Type: Grant

Filed: November 30, 2018

Date of Patent: April 21, 2020

Assignee: Affectiva, Inc.

Inventors: Taniya Mishra, George Alexander Reichenbach, Rana el Kaliouby
System and method for voice-to-voice conversion

Patent number: 10614826

Abstract: A method of building a speech conversion system uses target information from a target voice and source speech data. The method receives the source speech data and the target timbre data, which is within a timbre space. A generator produces first candidate data as a function of the source speech data and the target timbre data. A discriminator compares the first candidate data to the target timbre data with reference to timbre data of a plurality of different voices. The discriminator determines inconsistencies between the first candidate data and the target timbre data. The discriminator produces an inconsistency message containing information relating to the inconsistencies. The inconsistency message is fed back to the generator, and the generator produces a second candidate data. The target timbre data in the timbre space is refined using information produced by the generator and/or discriminator as a result of the feeding back.

Type: Grant

Filed: May 24, 2018

Date of Patent: April 7, 2020

Assignee: Modulate, Inc.

Inventors: William Carter Huffman, Michael Pappas
Oversampling in a combined transposer filterbank

Patent number: 10584386

Abstract: The present invention relates to coding of audio signals, and in particular to high frequency reconstruction methods including a frequency domain harmonic transposer. A system and method for generating a high frequency component of a signal from a low frequency component of the signal is described.

Type: Grant

Filed: December 18, 2018

Date of Patent: March 10, 2020

Assignee: Dolby International AB

Inventors: Lars Villemoes, Per Ekstrand
Parametric adaptation of voice synthesis

Patent number: 10586079

Abstract: Software-based systems perform parametric speech synthesis. TTS voice parameters determine the generated speech audio. Voice parameters include gender, age, dialect, donor, arousal, authoritativeness, pitch, range, speech rate, volume, flutter, roughness, breath, frequencies, bandwidths, and relative amplitudes of formants and nasal sounds. The system chooses TTS parameters based on one or more of: user profile attributes including gender, age, and dialect; situational attributes such as location, noise level, and mood; natural language semantic attributes such as domain of conversation, expression type, dimensions of affect, word emphasis and sentence structure; and analysis of target speaker voices. The system chooses TTS parameters to improve listener satisfaction or other desired listener behavior. Choices may be made by specified algorithms defined by code developers, or by machine learning algorithms trained on labeled samples of system performance.

Type: Grant

Filed: January 13, 2017

Date of Patent: March 10, 2020

Assignee: SOUNDHOUND, INC.

Inventors: Monika Almudafar-Depeyrot, Bernard Mont-Reynaud
Biometric signal analysis for communication enhancement and transformation

Patent number: 10579742

Abstract: Techniques are described for data transformation performed based on a current emotional state of the user who provided input data, the emotional state determined based on biometric data for the user. Sensor(s) may generate biometric data that indicates physiological characteristic(s) of the user, and an emotional state of the user is determined based on the biometric data. Different dictionaries and/or dictionary entries may be used in translation, depending on the emotional state of the sender when the data was input. In some implementations, the emotional state of the sending user may be used to infer or otherwise determine that a translation was incorrect. The input data may be transformed to include information indicating the current emotional state of the sending user when they provided the input data. For example, the output text may be presented in a user interface with an icon and/or other indication of the sender's emotional state.

Type: Grant

Filed: August 8, 2017

Date of Patent: March 3, 2020

Assignee: United Services Automobile Association (USAA)

Inventor: Amanda S. Fernandez
Voice interaction apparatus and voice interaction method

Patent number: 10573307

Abstract: A syntactic analysis unit 104 performs a syntactic analysis for linguistic information on acquired user's speech (hereinafter simply referred to as “user speech”). A non-linguistic information analysis unit 106 analyzes non-linguistic information different from the linguistic information for the acquired user speech. A topic continuation determination unit 110 determines whether a topic of the current conversation should be continued or should be changed to a different topic according to the non-linguistic information analysis result. A response generation unit 120 generates a response according to a result of a determination by the topic continuation determination unit 110.

Type: Grant

Filed: October 30, 2017

Date of Patent: February 25, 2020

Assignees: Furhat Robotics AB, TOYOTA JIDOSHA KABUSHIKI KAISHA

Inventors: Gabriel Skantze, Martin Johansson, Tatsuro Hori, Narimasa Watanabe
Text-to-speech task scheduling

Patent number: 10546573

Abstract: To prioritize the processing of text-to-speech (TTS) tasks, a TTS system may determine, for each task, an amount of time prior to the task reaching underrun, that is, the time before the synthesized speech output to a user catches up to the time since a TTS task was originated. The TTS system may also prioritize tasks to reduce the amount of time between when a user submits a TTS request and when results are delivered to the user. When prioritizing tasks, such as allocating resources to existing tasks or accepting new tasks, the TTS system may prioritize tasks with the lowest amount of time prior to underrun and/or tasks with the longest time prior to delivery of first results.

Type: Grant

Filed: August 10, 2017

Date of Patent: January 28, 2020

Assignee: AMAZON TECHNOLOGIES, INC.

Inventor: Bartosz Putrycz
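
The prioritization rule in the abstract of patent 10546573 (serve first the task closest to underrun) can be illustrated with a small sketch. The `TtsTask` class, its fields, and the simple "synthesized minus played" estimate of time to underrun are assumptions for illustration, not the patent's definitions.

```python
from dataclasses import dataclass

@dataclass
class TtsTask:
    name: str
    synthesized_audio_s: float  # seconds of audio produced so far
    played_audio_s: float       # seconds of audio already played to the user

    def time_to_underrun(self):
        """Seconds of buffered audio left before playback catches up."""
        return self.synthesized_audio_s - self.played_audio_s

def prioritize(tasks):
    """Order tasks so the one closest to underrun is served first."""
    return sorted(tasks, key=lambda task: task.time_to_underrun())
```

Given one task with 4.0 s of buffered audio and another with 0.5 s, `prioritize` places the second task first.
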
Implementing a classification model for recognition processing

Patent number: 10529318

Abstract: A method, system, and computer program product for learning a recognition model for recognition processing. The method includes preparing one or more examples for learning, each of which includes an input segment, an additional segment adjacent to the input segment and an assigned label. The input segment and the additional segment are extracted from an original training data. A classification model is trained, using the input segment and the additional segment in the examples, to initialize parameters of the classification model so that extended segments including the input segment and the additional segment are reconstructed from the input segment. Then, the classification model is tuned to predict a target label, using the input segment and the assigned label in the examples, based on the initialized parameters. At least a portion of the obtained classification model is included in the recognition model.

Type: Grant

Filed: July 31, 2015

Date of Patent: January 7, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Gakuto Kurata
Speech synthesizer, and speech synthesis method and computer program product utilizing multiple-acoustic feature parameters selection

Patent number: 10529314

Abstract: A speech synthesizer includes a statistical-model sequence generator, a multiple-acoustic feature parameter sequence generator, and a waveform generator. The statistical-model sequence generator generates, based on context information corresponding to an input text, a statistical model sequence that comprises a first sequence of a statistical model comprising a plurality of states. The multiple-acoustic feature parameter sequence generator, for each speech section corresponding to each state of the statistical model sequence, selects a first plurality of acoustic feature parameters from a first set of acoustic feature parameters extracted from a first speech waveform stored in a speech database and generates a multiple-acoustic feature parameter sequence that comprises a sequence of the first plurality of acoustic feature parameters.

Type: Grant

Filed: February 16, 2017

Date of Patent: January 7, 2020

Assignee: Kabushiki Kaisha Toshiba

Inventors: Masatsune Tamura, Masahiro Morita
System and method for outlier identification to remove poor alignments in speech synthesis

Patent number: 10497362

Abstract: A system and method are presented for outlier identification to remove poor alignments in speech synthesis. The quality of the output of a text-to-speech system directly depends on the accuracy of alignments of a speech utterance. The identification of mis-alignments and mis-pronunciations from automated alignments may be made based on fundamental frequency methods and group delay based outlier methods. The identification of these outliers allows for their removal, which improves the synthesis quality of the text-to-speech system.

Type: Grant

Filed: February 26, 2018

Date of Patent: December 3, 2019

Inventors: E. Veera Raghavendra, Aravind Ganapathiraju
Solar tablet verbal

Patent number: 10481860

Abstract: A Solar Tablet verbal with nano scale layers, a lithium battery, a solar MP3 player, an e-book reader, an e-newspaper reader, and an e-magazine reader. All units are operable by verbal command and can work manually from an ultra-high definition touch screen. The solar technology utilizes the photoelectric effect with nano scale layers to boost solar cell efficiency. The tablet has encryption software.

Type: Grant

Filed: May 29, 2015

Date of Patent: November 19, 2019

Inventor: Gregory Walker Johnson
Multi mode voice assistant for the hearing disabled

Patent number: 10468022

Abstract: A voice assistant (VA) can switch between a voice input mode, in which the VA produces audible responses to voice queries, and a gesture input mode that can be triggered by a predetermined gesture, in which the VA produces visual responses to gesture-based queries.

Type: Grant

Filed: April 3, 2017

Date of Patent: November 5, 2019

Assignee: Motorola Mobility LLC

Inventors: Jun-ki Min, Mir Farooq Ali, Navin Tulsibhai Dabhi
Sound quality determination device, method for the sound quality determination and recording medium

Patent number: 10453478

Abstract: A sound quality determination device includes an acquisition unit acquiring an input sound, a frequency distribution calculation unit calculating a frequency distribution of the input sound acquired by the acquisition unit, a tilt calculation unit calculating a tilt indicating a change in intensity of an overtone with respect to a frequency based on the frequency distribution calculated by the frequency distribution calculation unit, a tilt comparison unit comparing the tilt calculated by the tilt calculation unit and a threshold related to the tilt, and a determination unit determining based on a result of comparison by the tilt comparison unit whether the input sound has a predetermined sound quality.

Type: Grant

Filed: March 14, 2018

Date of Patent: October 22, 2019

Assignee: YAMAHA CORPORATION

Inventor: Ryuichi Nariyama
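
The abstract of patent 10453478 describes a tilt, meaning the change in overtone intensity with frequency, that is compared with a threshold. A minimal sketch of that comparison follows. It assumes the tilt is the least-squares slope of overtone level (dB) against frequency (Hz) and that a tilt at or above the threshold means the sound has the quality in question. The function names and the threshold value are hypothetical.

```python
def spectral_tilt(freqs_hz, levels_db):
    """Least-squares slope of overtone level against frequency, in dB per Hz."""
    n = len(freqs_hz)
    if n < 2 or n != len(levels_db):
        raise ValueError("need at least two matching frequency/level pairs")
    mean_f = sum(freqs_hz) / n
    mean_l = sum(levels_db) / n
    num = sum((f - mean_f) * (l - mean_l) for f, l in zip(freqs_hz, levels_db))
    den = sum((f - mean_f) ** 2 for f in freqs_hz)
    if den == 0:
        raise ValueError("frequencies must not all be equal")
    return num / den

def has_target_quality(freqs_hz, levels_db, tilt_threshold=-0.01):
    """Compare the tilt with a threshold (assumed direction: flatter passes)."""
    return spectral_tilt(freqs_hz, levels_db) >= tilt_threshold
```
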
