Image To Speech Patents (Class 704/260)
-
Patent number: 11971926
Abstract: Disclosed computer-based systems and methods for analyzing a plurality of audio files corresponding to text-based news stories and received from a plurality of audio file creators are configured to (i) compare quality and/or accuracy metrics of individual audio files against corresponding quality and/or accuracy thresholds, and (ii) based on the comparison: (a) accept audio files meeting the quality and/or accuracy thresholds for distribution to a plurality of subscribers for playback, (b) reject audio files failing to meet one or more certain quality and/or accuracy thresholds, (c) remediate audio files failing to meet certain quality thresholds, and (d) designate, for human review, audio files failing to meet one or more certain quality and/or accuracy thresholds by a predetermined margin.
Type: Grant
Filed: August 17, 2020
Date of Patent: April 30, 2024
Assignee: Gracenote Digital Ventures, LLC
Inventors: Gregory P. Defouw, Venkatarama Anilkumar Panguluri
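The accept/reject/remediate/review triage described in this abstract can be sketched as a simple decision function. The threshold values, the margin, and the choice to send near-misses to human review and quality-only failures to remediation are one hypothetical reading, not the patent's actual logic:

```python
def triage_audio(quality, accuracy, q_thresh=0.8, a_thresh=0.9, review_margin=0.1):
    """Illustrative triage of a narrated audio file against quality and
    accuracy thresholds. All names and constants here are assumptions."""
    if quality >= q_thresh and accuracy >= a_thresh:
        return "accept"
    # Files that miss a threshold only by the predetermined margin
    # are flagged for human review rather than decided automatically.
    if quality >= q_thresh - review_margin and accuracy >= a_thresh - review_margin:
        return "human_review"
    # Quality-only failures may be remediable (e.g., re-recording a
    # segment); accuracy failures are rejected outright.
    if accuracy >= a_thresh:
        return "remediate"
    return "reject"
```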
-
Patent number: 11960852
Abstract: A direct speech-to-speech translation (S2ST) model includes an encoder configured to receive an input speech representation that corresponds to an utterance spoken by a source speaker in a first language and encode the input speech representation into a hidden feature representation. The S2ST model also includes an attention module configured to generate a context vector that attends to the hidden feature representation encoded by the encoder. The S2ST model also includes a decoder configured to receive the context vector generated by the attention module and predict a phoneme representation that corresponds to a translation of the utterance in a second, different language. The S2ST model also includes a synthesizer configured to receive the context vector and the phoneme representation and generate a translated synthesized speech representation that corresponds to a translation of the utterance spoken in the second language.
Type: Grant
Filed: December 15, 2021
Date of Patent: April 16, 2024
Assignee: Google LLC
Inventors: Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz
-
Patent number: 11948559
Abstract: Various embodiments include methods and devices for implementing automatic grammar augmentation for improving voice command recognition accuracy in systems with a small footprint acoustic model. Alternative expressions that may capture acoustic model decoding variations may be added to a grammar set. An acoustic model-specific statistical pronunciation dictionary may be derived by running the acoustic model through a large general speech dataset and constructing a command-specific candidate set containing potential grammar expressions. Greedy based and cross-entropy-method (CEM) based algorithms may be utilized to search the candidate set for augmentations with improved recognition accuracy.
Type: Grant
Filed: March 21, 2022
Date of Patent: April 2, 2024
Assignee: QUALCOMM Incorporated
Inventors: Yang Yang, Anusha Lalitha, Jin Won Lee, Christopher Lott
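The greedy candidate search mentioned in the abstract can be illustrated in a few lines. The `accuracy_fn` callback, the candidate strings, and the cap on additions are stand-ins; a real system would score candidates by decoding a speech dataset with the acoustic model:

```python
def greedy_augment(base_grammar, candidates, accuracy_fn, max_added=3):
    """Greedy grammar augmentation: repeatedly add the candidate
    expression that most improves recognition accuracy, stopping when
    no candidate helps or the cap is reached."""
    grammar = list(base_grammar)
    best = accuracy_fn(grammar)
    for _ in range(max_added):
        gains = []
        for cand in candidates:
            if cand in grammar:
                continue
            acc = accuracy_fn(grammar + [cand])
            if acc > best:
                gains.append((acc, cand))
        if not gains:
            break  # no candidate improves accuracy any further
        best, chosen = max(gains)
        grammar.append(chosen)
    return grammar, best
```

A cross-entropy-method variant would instead sample candidate subsets from a distribution and re-fit that distribution toward the best-scoring subsets.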
-
Patent number: 11922194
Abstract: A method of operating a computing device in support of improved accessibility includes displaying a user interface to an application on a display screen of the computing device, wherein the computing device includes an accessibility assistant that reads an audible description of an element of the user interface; initiating, on the computing device, a virtual assistant that conducts an audible conversation between a user and the virtual assistant through at least a microphone and a speaker associated with the computing device, wherein the virtual assistant is not integrated with an operating system of the computing device; inhibiting an ability of the accessibility assistant to read the audible description of the element of the user interface; and upon transition of the virtual assistant from an active state, enabling the ability of the accessibility assistant.
Type: Grant
Filed: May 19, 2022
Date of Patent: March 5, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jaclyn Carley Knapp, Lisa Stifelman, André Roberto Lima Tapajós, Jin Xu, Steven DiCarlo, Kaichun Wu, Yuhua Guan
-
Patent number: 11915714
Abstract: Methods for modifying audio data include operations for accessing audio data having a first prosody, receiving a target prosody differing from the first prosody, and computing acoustic features representing samples. Computing respective acoustic features for a sample includes computing a pitch feature as a quantized pitch value of the sample by assigning a pitch value, of the target prosody or the audio data, to at least one of a set of pitch bins having equal widths in cents. Computing the respective acoustic features further includes computing a periodicity feature from the audio data. The respective acoustic features for the sample include the pitch feature, the periodicity feature, and other acoustic features. A neural vocoder is applied to the acoustic features to pitch-shift and time-stretch the audio data from the first prosody toward the target prosody.
Type: Grant
Filed: December 21, 2021
Date of Patent: February 27, 2024
Assignees: Adobe Inc., Northwestern University
Inventors: Maxwell Morrison, Juan Pablo Caceres Chomali, Zeyu Jin, Nicholas Bryan, Bryan A. Pardo
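The pitch-bin computation, with bins of equal width in cents, is concrete enough to sketch (a cent is 1/100 of a semitone, so there are 1200 cents per octave). The bin count, floor frequency, and bin width below are assumed values, not the patent's:

```python
import math

def quantize_pitch(f0_hz, n_bins=256, fmin_hz=50.0, bin_width_cents=25.0):
    """Assign a pitch value to one of n_bins bins of equal width in cents.
    The constants are illustrative assumptions."""
    # Distance above the floor frequency, measured in cents.
    cents = 1200.0 * math.log2(f0_hz / fmin_hz)
    bin_index = int(cents // bin_width_cents)
    # Clamp out-of-range pitches into the valid bin range.
    return max(0, min(n_bins - 1, bin_index))
```

Because the bins are uniform in cents rather than in hertz, each octave spans the same number of bins, matching how pitch is perceived.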
-
Patent number: 11888797
Abstract: Emoji-first messaging where text messaging is automatically converted to emojis by an emoji-first application so that only emojis are communicated from one client device to another client device. Each client device has a library of emojis that are mapped to words, which libraries are customizable and unique to the users of the client devices, such that the users can communicate secretly in code. Upon receipt of a string of emojis, a user can select the emoji string to convert to text if desired, for a predetermined period of time.
Type: Grant
Filed: April 20, 2021
Date of Patent: January 30, 2024
Assignee: Snap Inc.
Inventors: Karl Bayer, Prerna Chikersal, Shree K. Nayar, Brian Anthony Smith
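A minimal sketch of the word-to-emoji library idea, assuming a per-user dictionary; because each user can customize their own library, a pair sharing the same mappings effectively has a private code. The mappings shown are invented:

```python
# A hypothetical per-user word-to-emoji library.
LIBRARY = {"pizza": "🍕", "love": "❤️", "later": "🕐"}

def to_emoji(text, library):
    """Replace every word that has a mapping; unmapped words pass
    through here, though an emoji-first app might instead prompt
    the user to add a mapping."""
    return " ".join(library.get(w, w) for w in text.lower().split())

def to_text(emoji_string, library):
    """Reverse lookup, used when the recipient taps the emoji string
    to reveal the underlying text."""
    reverse = {v: k for k, v in library.items()}
    return " ".join(reverse.get(tok, tok) for tok in emoji_string.split())
```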
-
Patent number: 11886483
Abstract: Techniques for recommending media are described. A character preference function comprising a plurality of preference coefficients is accessed. A first character model comprises a first set of attribute values for a plurality of attributes of a first character. A second character model comprises a second set of attribute values for the plurality of attributes of a second character. The first and second characters are associated with a first and a second salience value, respectively. A first character rating is calculated using the plurality of preference coefficients and the first set of attribute values. A second character rating of the second character is calculated using the plurality of preference coefficients with the second set of attribute values. A media rating is calculated based on the first and second salience values and the first and second character ratings. A media item is recommended based on the media rating.
Type: Grant
Filed: May 4, 2020
Date of Patent: January 30, 2024
Assignee: The Nielsen Company (US), LLC
Inventors: Rachel Payne, Meghana Bhatt, Natasha Mohanty
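One plausible reading of the rating arithmetic is a dot product of preference coefficients with attribute values, followed by a salience-weighted average of the character ratings. The abstract does not give exact formulas, so both functions below are assumptions:

```python
def character_rating(preference_coeffs, attribute_values):
    """Rate a character as a weighted sum of its attribute values,
    weighted by the viewer's preference coefficients."""
    return sum(c * a for c, a in zip(preference_coeffs, attribute_values))

def media_rating(ratings, saliences):
    """Combine per-character ratings, weighting each by how salient
    the character is in the media (salience-weighted average; one
    plausible reading of the abstract, not its stated formula)."""
    return sum(r * s for r, s in zip(ratings, saliences)) / sum(saliences)
```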
-
Patent number: 11848001
Abstract: Systems and methods are disclosed for providing non-lexical cues in synthesized speech. An example system includes processor circuitry to generate a breathing cue to enhance speech to be synthesized from text; determine a first insertion point of the breathing cue in the text, wherein the breathing cue is identified by a first tag of a markup language; generate a prosody cue to enhance speech to be synthesized from the text; determine a second insertion point of the prosody cue in the text, wherein the prosody cue is identified by a second tag of the markup language; insert the breathing cue at the first insertion point based on the first tag and the prosody cue at the second insertion point based on the second tag; and trigger a synthesis of the speech from the text, the breathing cue, and the prosody cue.
Type: Grant
Filed: June 23, 2022
Date of Patent: December 19, 2023
Assignee: Intel Corporation
Inventors: Jessica M. Christian, Peter Graff, Crystal A. Nakatsu, Beth Ann Hockey
-
Patent number: 11848004
Abstract: A method for controlling an electronic device includes obtaining a text, obtaining, by inputting the text into a first neural network model, acoustic feature information corresponding to the text and alignment information in which each frame of the acoustic feature information is matched with each phoneme included in the text, identifying an utterance speed of the acoustic feature information based on the alignment information, identifying a reference utterance speed for each phoneme included in the acoustic feature information based on the text and the acoustic feature information, obtaining utterance speed adjustment information based on the utterance speed of the acoustic feature information and the reference utterance speed for each phoneme, and obtaining, based on the utterance speed adjustment information, speech data corresponding to the text by inputting the acoustic feature information into a second neural network model.
Type: Grant
Filed: June 27, 2022
Date of Patent: December 19, 2023
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Sangjun Park, Kihyun Choo
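A rough sketch of how an utterance speed might be derived from frame-to-phoneme alignment information and turned into an adjustment factor. The frame rate, the phonemes-per-second definition of speed, and the ratio-based adjustment are all assumptions layered on the abstract:

```python
def utterance_speed(alignment, frame_rate_hz=80.0):
    """Phonemes per second, from an alignment that maps each acoustic
    frame to the phoneme it belongs to. The frame rate is an assumed
    value, not one given by the patent."""
    n_phonemes = len(set(alignment))
    duration_s = len(alignment) / frame_rate_hz
    return n_phonemes / duration_s

def speed_adjustment(speed, reference_speed):
    """Ratio used to nudge synthesis toward the reference speed
    (values below 1 mean the speech was slower than the reference)."""
    return speed / reference_speed
```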
-
Patent number: 11848655
Abstract: Systems, devices, and methods are provided for multi-stem volume equalization, wherein the volume level of each stem may be adjusted non-uniformly. Audio may be diarized into a plurality of stems, including a separate background-noise stem. The mean and variance of the volume levels of the stems may be computed. Each audio stem may be automatically adjusted based on a stem-specific preference that a user may specify. Viewers may adjust actor volume relative to the mean/variance in a manner that maintains a relative difference in volume levels between stems.
Type: Grant
Filed: September 15, 2021
Date of Patent: December 19, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Mohammed Khalilia, Naveen Sudhakaran Nair
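The per-stem adjustment can be illustrated with a toy equalizer that scales each stem's mean level toward a user preference. The stem names, the plain gain scheme, and the data are all invented for illustration; the patent also considers variance, which this sketch omits:

```python
def equalize_stems(stems, preferences):
    """Scale each stem (e.g., dialogue, music) so its mean volume level
    matches a per-stem target. `stems` maps a stem name to a list of
    frame volume levels; `preferences` maps a name to a target mean.
    Stems without a stated preference are left unchanged."""
    adjusted = {}
    for name, levels in stems.items():
        mean = sum(levels) / len(levels)
        target = preferences.get(name, mean)
        gain = target / mean if mean else 1.0
        adjusted[name] = [v * gain for v in levels]
    return adjusted

out = equalize_stems({"dialogue": [0.2, 0.4], "music": [0.8, 0.8]},
                     {"dialogue": 0.6, "music": 0.4})
```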
-
Patent number: 11837223
Abstract: A human language software defined network (SDN) control system, including: a voice to text machine learning model configured to convert user speech to text; a machine learning language processing engine configured to control the operation of a SDN controller based upon the text; and a machine learning minimal language processing engine configured to control the operation of a SDN element based upon commands from the SDN controller produced by the machine learning language processing engine.
Type: Grant
Filed: December 18, 2020
Date of Patent: December 5, 2023
Assignee: NOKIA SOLUTIONS AND NETWORKS OY
Inventor: Sowrirajan Padmanabhan
-
Patent number: 11830475
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform speech synthesis. One of the methods includes obtaining a training data set for training a first neural network to process a spectral representation of an audio sample and to generate a prediction of the audio sample, wherein, after training, the first neural network obtains spectral representations of audio samples from a second neural network; for a plurality of audio samples in the training data set: generating a ground-truth spectral representation of the audio sample; and processing the ground-truth spectral representation using a third neural network to generate an updated spectral representation of the audio sample; and training the first neural network using the updated spectral representations, wherein the third neural network is configured to generate updated spectral representations that resemble spectral representations generated by the second neural network.
Type: Grant
Filed: June 1, 2022
Date of Patent: November 28, 2023
Assignee: DeepMind Technologies Limited
Inventor: Norman Casagrande
-
Patent number: 11817087
Abstract: Systems and methods for distributing cloud-based language processing services to partially execute in a local device to reduce latency perceived by the user. For example, a local device may receive a request via audio input that requires a cloud-based service to process the request and generate a response. A partial response may be generated locally and played back while a more complete response is generated remotely.
Type: Grant
Filed: August 28, 2020
Date of Patent: November 14, 2023
Assignee: Micron Technology, Inc.
Inventor: Ameen D. Akel
-
Patent number: 11810000
Abstract: Systems and methods for classifying data are disclosed. For example, a system may include at least one memory storing instructions and at least one processor configured to execute the instructions to perform operations. The operations may include receiving training data comprising a class. The operations may include training a data classification model using the training data to generate a trained data classification model. The operations may include receiving additional data comprising labeled samples of an additional class not contained in the training data. The operations may include creating a synthetic data generator. The operations may include training the synthetic data generator to generate synthetic data corresponding to the additional class. The operations may include generating a synthetic classified dataset comprising the additional class. The operations may include retraining the trained data classification model using the synthetic classified dataset.
Type: Grant
Filed: November 30, 2022
Date of Patent: November 7, 2023
Assignee: CAPITAL ONE SERVICES, LLC
Inventors: Austin Walters, Jeremy Goodsitt, Anh Truong
-
Patent number: 11809831
Abstract: A symbol sequence converting apparatus according to an embodiment includes one or more hardware processors. The processors: generate a plurality of candidate output symbol sequences, based on rule information in which input symbols are each associated with one or more output symbols each obtained by converting the corresponding input symbol in accordance with a predetermined conversion condition, the plurality of candidate output symbol sequences each containing one or more of the output symbols and corresponding to an input symbol sequence containing one or more of the input symbols; derive respective confidence levels of the plurality of candidate output symbol sequences by using a learning model; and identify, as an output symbol sequence corresponding to the input symbol sequence, the candidate output symbol sequence corresponding to a highest confidence level.
Type: Grant
Filed: August 27, 2020
Date of Patent: November 7, 2023
Assignee: KABUSHIKI KAISHA TOSHIBA
Inventors: Tomohiro Yamasaki, Yoshiyuki Kokojima
-
Patent number: 11810548
Abstract: A speech translation method using a multilingual text-to-speech synthesis model includes acquiring a single artificial neural network text-to-speech synthesis model trained on a learning text of a first language and learning speech data of the first language corresponding to the learning text of the first language, and a learning text of a second language and learning speech data of the second language corresponding to the learning text of the second language; receiving input speech data of the first language and an articulatory feature of a speaker regarding the first language; converting the input speech data of the first language into a text of the first language; converting the text of the first language into a text of the second language; and generating output speech data for the text of the second language that simulates the speaker's speech.
Type: Grant
Filed: July 10, 2020
Date of Patent: November 7, 2023
Assignee: NEOSAPIENCE, INC.
Inventors: Taesu Kim, Younggun Lee
-
Patent number: 11804209
Abstract: Methods and systems are described herein for generating an audible presentation of a communication received from a remote server. A presentation of a media asset on a user equipment device is generated for a first user. A textual-based communication is received, at the user equipment device, from the remote server. The textual-based communication is transmitted to the remote server by a second user, and the remote server transmits the textual-based communication to the user equipment device responsive to determining that the second user is on a list of users associated with the first user. An engagement level of the first user with the user equipment device is determined. Responsive to determining that the engagement level does not exceed a threshold value, a presentation of the textual-based communication is generated in audible form.
Type: Grant
Filed: December 21, 2022
Date of Patent: October 31, 2023
Assignee: Rovi Product Corporation
Inventor: William Korbecki
-
Patent number: 11794106
Abstract: A non-transitory computer-readable medium including a video game processing program for causing a server to perform functions to control progress of a video game is provided. The functions include: a first arranging function configured to arrange a first object determined based on a user operation at a position determined based on a position of the user terminal in a virtual space corresponding to map information of a real space; a second arranging function configured to arrange a second object with which a predetermined event is associated at a position determined based on a position of the first object after a time determined in accordance with the first object elapses from a time when the first object is arranged; and a generating function configured to generate the predetermined event in accordance with a positional condition regarding the positions of the user terminal and the second object.
Type: Grant
Filed: September 30, 2021
Date of Patent: October 24, 2023
Assignee: SQUARE ENIX CO., LTD.
Inventors: Takamasa Shiba, Hiroshi Kobayashi, Jun Waga, Yutaka Yoshida
-
Patent number: 11756024
Abstract: A first computing device broadcasts a first audio token comprising the first user computing device identifier over two or more audio frequency channels at specified intervals and listens for audio inputs via the two or more audio frequency channels at the specified intervals. The first computing device receives a second audio token generated by a second computing device and communicates the received second audio token to the one or more computing devices. The second computing device receives the first audio token generated by the first computing device and communicates the received first audio token to the one or more computing devices. The one or more computing devices receive the first and second audio tokens, pair the first computing device and the second computing device, and facilitate a transfer of data between the first computing device and the second computing device.
Type: Grant
Filed: July 12, 2021
Date of Patent: September 12, 2023
Assignee: GOOGLE LLC
Inventors: Edward Chiang, Arjita Madan, Gopi Krishna Madabhushi, Heman Khanna, Rohan Laishram, Aviral Gupta
-
Patent number: 11749270
Abstract: An output apparatus according to the present application includes an estimation unit, a decision unit, and an output unit. The estimation unit estimates an emotion of a user from detection information detected by a predetermined detection device. The decision unit decides information to be changed on the basis of the estimated emotion of the user. The output unit outputs information for changing the information to be changed.
Type: Grant
Filed: March 10, 2021
Date of Patent: September 5, 2023
Assignee: YAHOO JAPAN CORPORATION
Inventors: Kota Tsubouchi, Teruhiko Teraoka, Hidehito Gomi, Junichi Sato
-
Patent number: 11749256
Abstract: Among other things, a developer of an interaction application for an enterprise can create items of content to be provided to an assistant platform for use in responses to requests of end-users. The developer can deploy the interaction application using defined items of content and an available general interaction model including intents and sample utterances having slots. The developer can deploy the interaction application without requiring the developer to formulate any of the intents, sample utterances, or slots of the general interaction model.
Type: Grant
Filed: August 10, 2020
Date of Patent: September 5, 2023
Assignee: Voicify, LLC
Inventors: Jeffrey K. McMahon, Robert T. Naughton, Nicholas G. Laidlaw, Alexander M. Dunn, Jason Green
-
Patent number: 11741945
Abstract: An adaptive virtual assistant system can be configured to change an attribute of a virtual assistant based on user responses, environmental conditions, and/or topics of discussion. For example, the virtual assistant system can determine, based at least in part on user data, a communication profile that is associated with the virtual assistant and determine first communication data comprising a first communication attribute based on the communication profile. In some instances, the system can transmit the first communication data to a user device and receive, from the user device, input audio data representing a user utterance. Based at least in part on the input audio data, the system can determine second communication data comprising a second communication attribute and transmit the second communication data to the user device.
Type: Grant
Filed: September 30, 2019
Date of Patent: August 29, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Joseph Daniel Sullivan, Pasquale DeMaio, Akshay Isaac Lazarus, Juliana Saussy
-
Patent number: 11741996
Abstract: In one aspect, an example method includes (i) obtaining a set of user attributes for a user of a content-presentation device; (ii) based on the set of user attributes, obtaining structured data and determining a textual description of the structured data; (iii) transforming, using a text-to-speech engine, the textual description of the structured data into synthesized speech; and (iv) generating, using the synthesized speech and for display by the content-presentation device, a synthetic video of a targeted advertisement comprising the synthesized speech.
Type: Grant
Filed: December 26, 2022
Date of Patent: August 29, 2023
Assignee: Roku, Inc.
Inventors: Sunil Ramesh, Michael Cutter, Charles Brian Pinkerton, Karina Levitian
-
Patent number: 11741429
Abstract: A method, system and computer-readable storage medium for performing a cognitive information processing operation. The cognitive information processing operation includes: receiving data from a plurality of data sources; processing the data from the plurality of data sources to provide cognitively processed insights via an augmented intelligence system, the augmented intelligence system executing on a hardware processor of an information processing system, the augmented intelligence system and the information processing system providing a cognitive computing function; performing an explainability with recourse operation, the explainability with recourse operation providing an assurance explanation regarding the cognitive computing function; and providing the cognitively processed insights to a destination, the destination comprising a cognitive application, the cognitive application enabling a user to interact with the cognitive insights.
Type: Grant
Filed: October 25, 2019
Date of Patent: August 29, 2023
Assignee: Tecnotree Technologies, Inc.
Inventors: Joydeep Ghosh, Jessica Henderson, Matthew Sanchez
-
Patent number: 11735162
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Grant
Filed: August 8, 2022
Date of Patent: August 22, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Patent number: 11735204
Abstract: Methods, systems and apparatuses for computer-generated visualization of speech are described herein. An example method of computer-generated visualization of speech including at least one segment includes: generating a graphical representation of an object corresponding to a segment of the speech; and displaying the graphical representation of the object on a screen of a computing device. Generating the graphical representation includes: representing a duration of the respective segment by a length of the object and representing intensity of the respective segment by a width of the object; and placing, in the graphical representation, a space between adjacent objects.
Type: Grant
Filed: August 17, 2021
Date of Patent: August 22, 2023
Assignee: SomniQ, Inc.
Inventors: Rikko Sakaguchi, Hidenori Ishikawa
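The duration-to-length and intensity-to-width mapping described above is simple to sketch; the scaling constants and gap value below are made up:

```python
def segment_shapes(segments, length_per_second=100.0, width_per_db=2.0, gap=5.0):
    """Map each speech segment (duration_s, intensity_db) to an on-screen
    object: length encodes duration, width encodes intensity, and a fixed
    space separates adjacent objects. All constants are illustrative."""
    shapes, x = [], 0.0
    for duration_s, intensity_db in segments:
        length = duration_s * length_per_second
        width = intensity_db * width_per_db
        shapes.append({"x": x, "length": length, "width": width})
        x += length + gap  # advance past the object plus the gap
    return shapes
```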
-
Patent number: 11735156
Abstract: A speech-processing system receives first audio data corresponding to a first voice and second audio data corresponding to a second voice. The speech-processing system determines vocal characteristics of the second voice and determines output corresponding to the first audio data and the vocal characteristics.
Type: Grant
Filed: August 31, 2020
Date of Patent: August 22, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Alejandro Ricardo Mottini d'Oliveira, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati
-
Patent number: 11727914
Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
Type: Grant
Filed: December 24, 2021
Date of Patent: August 15, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan, Jian Luan, Yu Shi, Malone Ma, Mei-Yuh Hwang
-
Patent number: 11721331
Abstract: Systems and methods for device functionality identification are disclosed. For example, a connected device may be coupled to a secondary device. A user may request operation of the connected device, and a system may determine that the connected device is of a given device type. Based on the connected device being of the given device type, the system may cause another device having an environmental sensor to send sensor data indicating environmental changes sensed by the sensor. The connected device may be operated and the sensor may sense environmental changes caused by operation of the connected device. When the sensed environmental changes indicate the device type of the secondary device, a recommendation to change the device type of the connected device to the device type of the secondary device may be provided to a user device associated with the connected device.
Type: Grant
Filed: December 6, 2019
Date of Patent: August 8, 2023
Assignee: Amazon Technologies, Inc.
Inventor: Jeffrey B Kinsey
-
Patent number: 11699037
Abstract: Systems and methods for increasing the impact of a message for a target individual are provided. An audio recording of the message and audio recordings of the target individual are each associated with transcribed text, which is separated into morphemes. Morphemes in the message are substituted with, or supplemented by, matching morphemes in the audio recordings of the target individual to create a revised version of the audio recording of the message, which is then electronically transmitted to an electronic device associated with the target individual.
Type: Grant
Filed: March 8, 2021
Date of Patent: July 11, 2023
Assignee: Rankin Labs, LLC
Inventor: John Rankin
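The morpheme substitution step can be sketched as a lookup: each morpheme of the message is replaced by a clip of the target individual saying it, when such a recording exists. The clip-id scheme and the fallback naming below are hypothetical:

```python
def personalize_message(message_morphemes, target_recordings):
    """Replace each morpheme of the message with a matching morpheme
    clip recorded from the target individual, when available.
    `target_recordings` maps a morpheme to a clip id (made up here)."""
    revised = []
    for m in message_morphemes:
        # Fall back to the original synthesized morpheme when the
        # target individual was never recorded saying it.
        revised.append(target_recordings.get(m, f"synth:{m}"))
    return revised
```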
-
Patent number: 11694674
Abstract: Techniques for performing text-to-speech are described. An exemplary method includes receiving a request to generate audio from input text; generating audio from the input text by: generating a first number of vectors from phoneme embeddings representing the input text, predicting one or more spectrograms having the first number of frames using multiple scales, wherein a coarser scale influences a finer scale, concatenating the first number of vectors and the predicted one or more spectrograms, generating at least one mel spectrogram from the concatenated vectors and the predicted one or more spectrograms, and converting, with a vocoder, the at least one mel spectrogram to audio; and outputting the generated audio according to the request.
Type: Grant
Filed: May 26, 2021
Date of Patent: July 4, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Syed Ammar Abbas, Bajibabu Bollepalli, Alexis Pierre Moinet, Thomas Renaud Drugman, Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Simon Slangen, Petr Makarov
-
Patent number: 11687722
Abstract: A consistent meaning framework (CMF) graph including a plurality of nodes linked by a plurality of edges is maintained in data storage of a data processing system. Multiple nodes among the plurality of nodes are meaning nodes corresponding to different word meanings for a common word spelling of a natural language. Each of the multiple word meanings has a respective one of a plurality of associated constraints. A natural language communication is processed by reference to the CMF graph. The processing includes parsing the natural language communication and selecting, for each of multiple word spellings in the natural language communication, a selected word meaning from among the word meanings provided by the CMF graph. The selected word meaning for each of the multiple word spellings in the natural language communication is recorded in data storage.
Type: Grant
Filed: February 6, 2020
Date of Patent: June 27, 2023
Inventor: Thomas A. Visel
-
Patent number: 11687727
Abstract: A method for processing a natural language communication includes a processor receiving a natural language communication and identifying, in the natural language communication, an ordered sequence of a plurality of word spellings of a natural human language. The processor performs constraint-based parsing of the natural language communication by reference to a consistent meaning framework (CMF) graph including a plurality of nodes each corresponding to a respective word meaning. The parsing includes scanning the phrase in multiple different parse scan directions and determining a respective meaning associated with each of multiple word spellings in a phrase of the natural language communication by reference to multiple constraints provided by multiple nodes in the CMF graph respectively corresponding to the word meanings.
Type: Grant
Filed: May 6, 2021
Date of Patent: June 27, 2023
Inventor: Thomas A. Visel
-
Patent number: 11681417
Abstract: Techniques are disclosed for increasing accessibility of digital content. For instance, a code for the digital content and one or more accessibility guidelines are received. The code is analyzed to identify a violation of an accessibility guideline. The digital content presented in accordance with the code, data indicative of the violation, and an option to correct the violation are displayed on a User Interface (UI). In response to receiving an input indicative of a selection of the option to correct the violation, one or more correction options to correct the violation are provided. In response to a selection of a correction option, the code is altered based on the selected correction option. The alteration of the code corrects the violation of the accessibility guideline and thereby changes one or more aspects of how the digital content is to be presented.
Type: Grant
Filed: October 23, 2020
Date of Patent: June 20, 2023
Assignee: Adobe Inc.
Inventors: Meera Ramachandran Nair, Manish Kumar Pandey, Majji Kranthi Kumar, Mohit Chaturvedi, Malkeet Singh, Sanjay Kumar Biswas
-
Patent number: 11676577
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for adapting a language model are disclosed. In one aspect, a method includes the actions of receiving transcriptions of utterances that were received by computing devices operating in a domain and that are in a source language. The actions further include generating translated transcriptions of the transcriptions of the utterances in a target language. The actions further include receiving a language model for the target language. The actions further include biasing the language model for the target language by increasing the likelihood of the language model selecting terms included in the translated transcriptions. The actions further include generating a transcription of an utterance in the target language using the biased language model and while operating in the domain.
Type: Grant
Filed: September 9, 2021
Date of Patent: June 13, 2023
Assignee: Google LLC
Inventors: Petar Aleksic, Benjamin Paul Hillson Haynor
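The biasing step above (increasing the likelihood of terms seen in the translated transcriptions) can be sketched for a unigram model. This is a minimal sketch, not Google's implementation; the boost factor and the toy vocabulary are assumptions.

```python
import math

def bias_language_model(logprobs, translated_terms, boost=2.0):
    """Multiply the probability of each term seen in the translated
    transcriptions by `boost`, then renormalize to a valid distribution."""
    probs = {w: math.exp(lp) for w, lp in logprobs.items()}
    for term in translated_terms:
        if term in probs:
            probs[term] *= boost
    total = sum(probs.values())
    return {w: math.log(p / total) for w, p in probs.items()}

# toy target-language unigram model
lm = {"pizza": math.log(0.1), "pijama": math.log(0.1), "casa": math.log(0.8)}
biased = bias_language_model(lm, {"pizza"}, boost=4.0)
# "pizza" appeared in the in-domain translations, so it is now more
# likely than the acoustically similar "pijama"
assert biased["pizza"] > biased["pijama"]
```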
-
Patent number: 11670415
Abstract: Systems and methods are provided for data driven analysis, modeling, and semi-supervised machine learning for qualitative and quantitative determinations. The systems and methods include obtaining data associated with individuals, and determining features associated with the individuals based on the data and similarities among the individuals based on the features. The systems and methods can label some individuals as exemplary, and generate a graph wherein nodes of the graph represent individuals, edges of the graph represent similarity among the individuals, and nodes associated with labeled individuals are weighted. The disclosed systems and methods can apply a weight to unweighted nodes of the graph based on propagating the labels through the graph, where the propagation is based on influence exerted by the weighted nodes on the unweighted nodes. The disclosed systems and methods can provide output associated with the individuals represented on the graph and the associated weights.
Type: Grant
Filed: December 18, 2020
Date of Patent: June 6, 2023
Assignee: INCLUDED HEALTH, INC.
Inventors: Seiji James Yamamoto, Ranjit Chacko
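The propagation step above, where weighted (labeled) nodes exert influence on unweighted neighbors, can be sketched with a simple iterative label propagation. The graph, the pinning rule, and the iteration count are invented for illustration, not taken from the patent.

```python
def propagate_labels(adjacency, seed_weights, iterations=20):
    """Spread weights from labeled (exemplary) nodes to unlabeled ones.
    `adjacency[node]` maps neighbor -> similarity; `seed_weights` pins
    labeled nodes to a fixed weight throughout the iteration."""
    weights = {n: seed_weights.get(n, 0.0) for n in adjacency}
    for _ in range(iterations):
        nxt = {}
        for node, nbrs in adjacency.items():
            if node in seed_weights:          # labeled nodes keep their weight
                nxt[node] = seed_weights[node]
                continue
            total_sim = sum(nbrs.values())
            nxt[node] = (
                sum(sim * weights[nbr] for nbr, sim in nbrs.items()) / total_sim
                if total_sim else 0.0
            )
        weights = nxt
    return weights

# a 3-node chain: only "a" is labeled exemplary
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0},
}
w = propagate_labels(graph, {"a": 1.0})
# influence decays with distance from the labeled node "a"
```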
-
Patent number: 11664011
Abstract: A method of providing a frame-based mel spectral representation of speech includes receiving a text utterance having at least one word and selecting a mel spectral embedding for the text utterance. Each word has at least one syllable and each syllable has at least one phoneme. For each phoneme, the method further includes using the selected mel spectral embedding to: (i) predict a duration of the corresponding phoneme based on corresponding linguistic features associated with the word that includes the corresponding phoneme and corresponding linguistic features associated with the syllable that includes the corresponding phoneme; and (ii) generate a plurality of fixed-length predicted mel-frequency spectrogram frames based on the predicted duration for the corresponding phoneme. Each fixed-length predicted mel-frequency spectrogram frame represents mel-spectral information of the corresponding phoneme.
Type: Grant
Filed: February 9, 2022
Date of Patent: May 30, 2023
Assignee: Google LLC
Inventors: Robert Andrew James Clark, Chun-an Chan, Vincent Ping Leung Wan
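Step (ii) above maps a predicted per-phoneme duration to a count of fixed-length frames. A minimal sketch of that mapping, assuming a 12.5 ms frame hop (a common TTS choice, not stated in the abstract) and simple rounding:

```python
FRAME_MS = 12.5  # assumed fixed frame length in milliseconds

def frames_for_phonemes(durations_ms):
    """Map each phoneme's predicted duration to a count of fixed-length
    spectrogram frames, rounding to the nearest whole frame."""
    return [round(d / FRAME_MS) for d in durations_ms]

# e.g. predicted durations (ms) for three phonemes
print(frames_for_phonemes([50.0, 62.5, 100.0]))  # [4, 5, 8]
```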
-
Patent number: 11645620
Abstract: A method, system and computer-readable storage medium for performing a counterfactual generation operation. The counterfactual generation operation includes: receiving a subject data point; classifying the data point via a trained classifier, the classifying providing a classified data point; identifying a counterfactual using the classified data point, the counterfactual comprising another data point that is close to the subject data point and that results in production of a different outcome when provided to a model, compared to the outcome resulting from the subject data point being provided to the model; and providing the counterfactual to a destination.
Type: Grant
Filed: October 18, 2019
Date of Patent: May 9, 2023
Assignee: Tecnotree Technologies, Inc.
Inventors: Joydeep Ghosh, Shubham Sharma, Jessica Henderson, Matthew Sanchez
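The core idea above, finding a nearby data point that receives a different outcome from the model, can be sketched with a brute-force nearest-first search. The threshold classifier and the single-feature perturbation strategy are toy assumptions, not the patented method.

```python
def classify(point):
    # stand-in "trained classifier": approve when income - debt >= 50
    return "approved" if point["income"] - point["debt"] >= 50 else "denied"

def find_counterfactual(point, step=1, max_steps=200):
    """Try single-feature perturbations, nearest first, until a candidate
    receives a different outcome than the subject data point."""
    original = classify(point)
    for distance in range(1, max_steps + 1):
        for feature in point:
            for sign in (+1, -1):
                candidate = dict(point)
                candidate[feature] += sign * distance * step
                if classify(candidate) != original:
                    return candidate
    return None

applicant = {"income": 80, "debt": 40}   # 80 - 40 = 40 -> denied
cf = find_counterfactual(applicant)
print(cf)  # {'income': 90, 'debt': 40} -- the nearest flip raises income by 10
```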
-
Patent number: 11646021
Abstract: According to one embodiment, an apparatus for processing a voice signal includes a display configured to display an image of a user or a character corresponding to the user, a microphone, a speaker configured to output a voice signal of the user, a memory configured to store a trained voice age conversion model, and a processor configured to, based on changing an age of the user or the character displayed on the display, control the display such that the display displays the user or the character corresponding to the changed age. The processor is further configured to determine a first age that is a current age of the user or the character based on the voice signal of the user inputted through the microphone. Accordingly, convenience of a user may be enhanced.
Type: Grant
Filed: April 16, 2020
Date of Patent: May 9, 2023
Assignee: LG ELECTRONICS INC.
Inventors: Siyoung Yang, Yongchul Park, Sungmin Han, Sangki Kim, Juyeong Jang, Minook Kim
-
Patent number: 11632346
Abstract: A device, such as a head-mounted wearable device (HMWD), provides audible notifications to a user with a voice user interface (VUI). A filtered subset of notifications addressed to the user, such as notifications from contacts associated with specified applications, are processed by a text to speech system that generates audio output for presentation to the user. The audio output may be presented using the HMWD. For example, the audio output generated from a text message received from a contact may be played on the device. The user may provide an input to play the notification again, initiate a reply, or take another action. The input may comprise a gesture on a touch sensor, activation of a button, verbal input acquired by a microphone, and so forth.
Type: Grant
Filed: December 12, 2019
Date of Patent: April 18, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Abinash Mahapatra, Anuj Saluja, Ouning Zhang, Xinyu Miao, Ting Liu, Yanina Potashnik, Alfred Ying Fai Lui, Choon-Mun Hooi, Jeffrey John Easter, Oliver Huy Doan, Jonathan B. Assayag
-
Patent number: 11623347
Abstract: A monitor is installed in an eye of a robot, and an eye image is displayed on the monitor. The robot extracts a feature quantity of an eye of a user from a captured image of the user. The feature quantity of the eye of the user is reflected in the eye image. For example, a feature quantity is a size of a pupillary region and a pupil image, and a form of an eyelid image. A blinking frequency or the like may also be reflected as a feature quantity. A familiarity is set with respect to each user, and which user's feature quantity is reflected may be determined in accordance with the familiarity.
Type: Grant
Filed: May 23, 2019
Date of Patent: April 11, 2023
Assignee: GROOVE X, INC.
Inventor: Kaname Hayashi
-
Patent number: 11610582
Abstract: Methods and systems are presented for translating informal utterances into formal texts. Informal utterances may include words in abbreviation forms or typographical errors. The informal utterances may be processed by mapping each word in an utterance into a well-defined token. The mapping from the words to the tokens may be based on a context associated with the utterance derived by analyzing the utterance in a character-by-character basis. The token that is mapped for each word can be one of a vocabulary token that corresponds to a formal word in a pre-defined word corpus, an unknown token that corresponds to an unknown word, or a masked token. Formal text may then be generated based on the mapped tokens. Through the processing of informal utterances using the techniques disclosed herein, the informal utterances are both normalized and sanitized.
Type: Grant
Filed: March 26, 2020
Date of Patent: March 21, 2023
Assignee: PayPal, Inc.
Inventors: Sandro Cavallari, Yuzhen Zhuo, Van Hoang Nguyen, Quan Jin Ferdinand Tang, Gautam Vasappanavara
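The three token types above (vocabulary, unknown, masked) can be sketched with a word-level mapper. The abbreviation table, vocabulary, and digit-masking rule below are illustrative assumptions; the patent describes a learned character-level mapping, not this lookup.

```python
import re

VOCAB = {"please", "send", "the", "payment", "today"}
ABBREVIATIONS = {"pls": "please", "2day": "today"}  # assumed expansion table

def tokenize(utterance):
    """Map each word to a vocabulary token, an unknown token, or a
    masked token (here: long digit runs are sanitized)."""
    tokens = []
    for word in utterance.lower().split():
        word = ABBREVIATIONS.get(word, word)
        if re.fullmatch(r"\d{4,}", word):
            tokens.append("<MASK>")          # sanitize, e.g. account numbers
        elif word in VOCAB:
            tokens.append(word)              # formal word in the corpus
        else:
            tokens.append("<UNK>")           # unknown word
    return tokens

print(tokenize("pls send payment 12345 2day"))
# ['please', 'send', 'payment', '<MASK>', 'today']
```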
-
Patent number: 11611649
Abstract: Provided is a computer implemented method and system for delivering text messages, emails, and messages from a messenger application to a user while the user is engaged in an activity, such as driving, exercising, or working. Typically, the emails and other messages are announced to the user and read aloud without any user input. In Drive Mode, while the user is driving, a clean interface is shown to the user, and the user can hear announcements and messages/emails aloud without looking at the screen of the phone, and use gestures to operate the phone. After a determination is made that a new text message and/or email has arrived, the user is informed aloud of the text message/email/messenger message and, in most instances, if the user takes no further action, the body and/or subject of the text message/email/messenger message is read aloud to the user. All messages can be placed in a single queue, and read to the user in order of receipt.
Type: Grant
Filed: March 21, 2022
Date of Patent: March 21, 2023
Assignee: MESSAGELOUD INC
Inventor: Garin Toren
-
Patent number: 11600259
Abstract: Provided are a voice synthesis method, an apparatus, a device, and a storage medium, involving obtaining text information and determining characters in the text information and a text content of each of the characters; performing a character recognition on the text content of each of the characters, to determine character attribute information of each of the characters; obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, where the speakers are pre-stored pronunciation objects having the character attribute information; and generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information. These improvements increase the pronunciation diversity of different characters in the synthesized voices, help an audience distinguish between different characters in the synthesized voices, and thereby improve the user experience.
Type: Grant
Filed: September 10, 2019
Date of Patent: March 7, 2023
Inventor: Jie Yang
-
Patent number: 11586835
Abstract: An apparatus, method, and computer readable medium for generating and displaying a dynamic language translation overlay that include accessing a frame buffer of the GPU, analyzing, in the frame buffer of the GPU, a frame representing a section of a stream of displayed data that is being displayed by a display device, based on the analyzed frame, identifying a reference patch that includes an instruction to identify an object comprising original text, based on the instruction included in the reference patch, recognizing the original text, generating translated text, generating an overlay comprising an augmentation layer, the augmentation layer including the translated text, and overlaying the overlay, onto the displayed data such that the translated text is viewable while the original text is obscured from view.
Type: Grant
Filed: February 18, 2022
Date of Patent: February 21, 2023
Assignee: MOBEUS INDUSTRIES, INC.
Inventors: Dharmendra Etwaru, Michael R. Sutcliff
-
Patent number: 11587548
Abstract: Presented herein are novel approaches to synthesize video of the speech from text. In a training phase, embodiments build a phoneme-pose dictionary and train a generative neural network model using a generative adversarial network (GAN) to generate video from interpolated phoneme poses. In deployment, the trained generative neural network in conjunction with the phoneme-pose dictionary convert an input text into a video of a person speaking the words of the input text. Compared to audio-driven video generation approaches, the embodiments herein have a number of advantages: 1) they only need a fraction of the training data used by an audio-driven approach; 2) they are more flexible and not subject to vulnerability due to speaker variation; and 3) they significantly reduce the preprocessing, training, and inference times.
Type: Grant
Filed: April 2, 2021
Date of Patent: February 21, 2023
Assignee: Baidu USA LLC
Inventors: Sibo Zhang, Jiahong Yuan, Miao Liao, Liangjun Zhang
-
Patent number: 11580955
Abstract: A speech-processing system receives input data representing text. A first encoder processes segments of the text to determine embedding data representing the text, and a second encoder processes corresponding audio data to determine prosodic data corresponding to the text. The embedding and prosodic data is processed to create output data including a representation of speech corresponding to the text and prosody.
Type: Grant
Filed: March 31, 2021
Date of Patent: February 14, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Yixiong Meng, Roberto Barra Chicote, Grzegorz Beringer, Zeya Chen, Jie Liang, James Garnet Droppo, Chia-Hao Chang, Oguz Hasan Elibol
-
Patent number: 11580952
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
Type: Grant
Filed: April 22, 2020
Date of Patent: February 14, 2023
Assignee: Google LLC
Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
-
Patent number: 11580970
Abstract: A method, an electronic device and computer readable medium for dialogue breakdown detection are provided. The method includes obtaining a verbal input from an audio sensor. The method also includes generating a reply to the verbal input. The method additionally includes identifying a local context from the verbal input and a global context from the verbal input, additional verbal inputs previously received by the audio sensor, and previous replies generated in response to the additional verbal inputs. The method further includes identifying a dialogue breakdown in response to determining that the reply does not correspond to the local context and the global context. In addition, the method includes generating sound corresponding to the reply through a speaker when the dialogue breakdown is not identified.
Type: Grant
Filed: March 23, 2020
Date of Patent: February 14, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: JongHo Shin, Alireza Dirafzoon, Aviral Anshu
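The breakdown condition above (the reply corresponds to neither the local nor the global context) can be sketched with a crude lexical-overlap check. The Jaccard measure and the threshold are illustrative stand-ins for whatever learned correspondence the patent describes.

```python
def overlap(a, b):
    """Jaccard overlap between the word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def is_breakdown(reply, local_context, global_context, threshold=0.1):
    """Flag a breakdown when the reply matches neither the local context
    (latest input) nor the global context (conversation history)."""
    return (overlap(reply, local_context) < threshold
            and overlap(reply, global_context) < threshold)

history = "what is the weather today in seattle"
latest = "will it rain today"
assert not is_breakdown("rain is likely today", latest, history)
assert is_breakdown("i like pizza toppings", latest, history)
```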
-
Patent number: 11580963
Abstract: A speech generation method and apparatus are disclosed. The speech generation method includes obtaining, by a processor, a linguistic feature and a prosodic feature from an input text, determining, by the processor, a first candidate speech element through a cost calculation and a Viterbi search based on the linguistic feature and the prosodic feature, generating, at a speech element generator implemented at the processor, a second candidate speech element based on the linguistic feature or the prosodic feature and the first candidate speech element, and outputting, by the processor, an output speech by concatenating the second candidate speech element and a speech sequence determined through the Viterbi search.
Type: Grant
Filed: October 14, 2020
Date of Patent: February 14, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jangsu Lee, Hoshik Lee, Jehun Jeon
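The cost calculation and Viterbi search above are the classic unit-selection recipe: pick one candidate speech element per position so that the sum of target costs (fit to the desired features) and join costs (smoothness between adjacent elements) is minimal. A toy sketch follows; the numeric "units" and cost functions are invented, not Samsung's.

```python
def viterbi_select(candidates, target_cost, join_cost):
    """candidates: one list of candidate units per position.
    Return the unit sequence minimizing total target + join cost."""
    # best[u] = (accumulated cost, path) for each candidate at position t
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}
    for t in range(1, len(candidates)):
        nxt = {}
        for u in candidates[t]:
            cost, path = min(
                (pc + join_cost(p, u), pp) for p, (pc, pp) in best.items()
            )
            nxt[u] = (cost + target_cost(t, u), path + [u])
        best = nxt
    return min(best.values())[1]

# toy problem: pick numbers close to targets with smooth transitions
targets = [1.0, 2.0, 3.0]
cands = [[0.5, 1.2], [1.8, 3.0], [2.9, 4.0]]
path = viterbi_select(
    cands,
    target_cost=lambda t, u: abs(u - targets[t]),   # fit to target
    join_cost=lambda a, b: abs(b - a),              # smooth concatenation
)
print(path)  # [1.2, 1.8, 2.9]
```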