Image To Speech Patents (Class 704/260)
-
Patent number: 12367867
Abstract: A method for generating voice in an ongoing call session based on artificial intelligence techniques is provided. The method includes extracting a plurality of features from a voice input through an artificial neural network (ANN); identifying one or more lost audio frames within the voice input; predicting, by the ANN, for each of the one or more lost audio frames, one or more features of the respective lost audio frame; and superposing the predicted features upon the voice input to generate an updated voice input.
Type: Grant
Filed: October 25, 2022
Date of Patent: July 22, 2025
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Sandeep Singh Spall, Tarun Gupta, Narang Lucky Manoharlal
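A minimal Python sketch of the frame-concealment flow this abstract describes; the all-zero test for lost frames, the 10 ms framing, and the interpolating stand-in for the trained ANN are illustrative assumptions, not the patented method:

```python
import numpy as np

FRAME = 160  # 10 ms frames at 16 kHz (assumed framing)

def conceal_lost_frames(frames, predict):
    """Fill frames flagged as lost with predicted ones.

    `predict` stands in for the trained ANN of the abstract; any callable
    mapping the neighbouring frames to a replacement frame works here.
    """
    out = frames.copy()
    for i, frame in enumerate(frames):
        if not frame.any():  # all-zero frame treated as lost (assumption)
            prev = out[i - 1] if i > 0 else np.zeros(FRAME)
            nxt = frames[i + 1] if i + 1 < len(frames) else np.zeros(FRAME)
            out[i] = predict(prev, nxt)  # superpose predicted features
    return out

frames = np.random.randn(5, FRAME)
frames[2] = 0.0  # simulate a dropped packet
restored = conceal_lost_frames(frames, lambda a, b: 0.5 * (a + b))
```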
-
Patent number: 12361925
Abstract: A speech recognition module receives training data of speech and creates a representation for individual words, non-words, phonemes, and any combination of these. A set of speech processing detectors analyzes the training data of speech from humans communicating. The detectors detect speech parameters that are indicative of paralinguistic effects on top of the enunciated words, phonemes, and non-words in the audio stream. One or more machine learning models undergo supervised training of their neural networks to learn how to associate one or more mark-up markers with the textual representation of each individual word, non-word, phoneme, or combination of these that was enunciated with a particular paralinguistic effect. Each mark-up marker can correspond to its own paralinguistic effect.
Type: Grant
Filed: December 29, 2020
Date of Patent: July 15, 2025
Assignee: SRI International
Inventors: Harry Bratt, Colleen Richey, Maneesh Yadav
-
Patent number: 12361946
Abstract: A speech interaction method, speech interaction system, and storage medium are provided. The method includes: receiving interactive speech input by a user; determining an emotional tag corresponding to the interactive speech based on the interactive speech and the interactive text corresponding to it; determining, based on the emotional tag, a response text corresponding to the interactive text, together with a first prosodic feature and a second prosodic feature corresponding to the response text, where the first prosodic feature characterizes the whole-sentence prosody of the response text and the second prosodic feature characterizes the local prosody of each character in the response text; and generating and outputting a response speech corresponding to the interactive speech based on the response text, the first prosodic feature, and the second prosodic feature.
Type: Grant
Filed: January 14, 2025
Date of Patent: July 15, 2025
Inventors: Huapeng Sima, Zhengkun Mei, Yiping Tang
-
Patent number: 12340788
Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder, and a decoder. The synthesizer is configured to: generate one or more text encodings derived from the user-input text data, using the text encoder; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder to generate predicted acoustic features.
Type: Grant
Filed: May 7, 2024
Date of Patent: June 24, 2025
Assignee: ELECTRONIC ARTS INC.
Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
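The text-encoder / style-encoder / decoder split can be illustrated with a toy PyTorch module; every dimension, layer choice, and name below is an assumption made for illustration, not the patented architecture:

```python
import torch
import torch.nn as nn

class StyleTTS(nn.Module):
    """Toy synthesizer: encode text and style separately, combine, decode."""
    def __init__(self, vocab=64, txt_dim=32, style_feats=8, style_dim=16, mel_dim=80):
        super().__init__()
        self.text_enc = nn.Embedding(vocab, txt_dim)             # text encoder
        self.style_enc = nn.Linear(style_feats, style_dim)       # speech style encoder
        self.decoder = nn.Linear(txt_dim + style_dim, mel_dim)   # decoder

    def forward(self, token_ids, style_features):
        txt = self.text_enc(token_ids)                           # (T, txt_dim)
        sty = self.style_enc(style_features).expand(txt.size(0), -1)
        combined = torch.cat([txt, sty], dim=-1)                 # combined encodings
        return self.decoder(combined)                            # predicted acoustic features

model = StyleTTS()
mels = model(torch.randint(0, 64, (12,)), torch.randn(1, 8))     # (12, 80)
```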
-
Patent number: 12334056
Abstract: Broadly speaking, the present techniques provide methods for conditioning a neural network that not only improve the generalization performance of conditional neural networks, but also significantly reduce model size and latency. The resulting conditioned neural network is suitable for on-device deployment because of its significantly lower model size, dynamic memory requirement, and latency.
Type: Grant
Filed: July 27, 2022
Date of Patent: June 17, 2025
Assignee: Samsung Electronics Co., Ltd.
Inventors: Sourav Bhattacharya, Abhinav Mehrotra, Alberto Gil C. P. Ramos
-
Patent number: 12333176
Abstract: Systems and methods for a novel architecture that supports always-on applications and/or models suffering drift in their results. The system may comprise one or more servers configured to track the historical behavior of incoming request data for a model and/or redirect requests as needed using parallel data domains. The one or more servers may maintain and update a catalog of potential data domains that partitions the historically received data. One partition may comprise data output from a current model; another may comprise detected outliers in the data. In the case of drift, outliers, and/or anomalies in the incoming data, the system may return an error signal that causes the data to be duplicated into a new data domain.
Type: Grant
Filed: July 14, 2023
Date of Patent: June 17, 2025
Assignee: Capital One Services, LLC
Inventors: Trijeet Sethi, Muralikumar Venkatasubramaniam
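A rough sketch of routing incoming requests into parallel data domains, with a z-score outlier test standing in for whatever drift detection the system actually uses; the catalog structure and threshold are assumptions:

```python
import numpy as np

class DomainCatalog:
    """Partition historically received data into parallel domains."""
    def __init__(self):
        self.domains = {"current": [], "outliers": []}
        self.history = []

    def route(self, value, z_thresh=3.0):
        self.history.append(value)
        hist = np.asarray(self.history)
        if len(hist) > 10 and abs(value - hist.mean()) > z_thresh * (hist.std() + 1e-9):
            self.domains["outliers"].append(value)  # duplicate into a new domain
            return "error"                          # error signal on drift/anomaly
        self.domains["current"].append(value)
        return "ok"

catalog = DomainCatalog()
results = [catalog.route(x) for x in list(np.random.randn(50)) + [25.0]]
```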
-
Patent number: 12334049
Abstract: Implementations are directed to receiving unstructured free-form natural language input, generating a chatbot based on (and in response to receiving) that input, and causing the chatbot to perform task(s) associated with an entity on behalf of the user. In various implementations, the unstructured free-form natural language input conveys details of the task(s) to be performed but does not define any corresponding dialog state map (e.g., it defines no dialog states or dialog state transitions). Nonetheless, the input may be utilized to fine-tune and/or prime a machine learning model that is already capable of conducting generalized conversations. As a result, the chatbot can be generated and deployed quickly and efficiently to perform the task(s) on behalf of the user.
Type: Grant
Filed: December 5, 2022
Date of Patent: June 17, 2025
Assignee: GOOGLE LLC
Inventors: Asaf Aharoni, Eyal Segalis, Sasha Goldshtein, Ofer Ron, Yaniv Leviathan, Yoav Tzur
-
Patent number: 12327491
Abstract: An information processing apparatus includes an audio information obtaining unit that obtains audio information on a user learning a first language, an analysis processing unit that estimates a voice production condition of the user by analyzing the audio information, and a display processing unit that displays voice production condition images as an animation following the time-series change of the voice production condition. The display processing unit superimposes a first feature point and a second feature point on the voice production condition images. The first feature point is identified from the voice production condition observed when a first sound in the first language is pronounced. The second feature point is identified from the voice production condition observed when a second sound, similar to the first sound, is pronounced in a second language different from the first language.
Type: Grant
Filed: May 8, 2023
Date of Patent: June 10, 2025
Inventors: Nic Schumann, Daniel Brenners, Bryan Ma, Mateus Rezende, Justin Chen
-
Patent number: 12322374
Abstract: The present disclosure provides methods and apparatuses for phrase-based end-to-end text-to-speech (TTS) synthesis. A text may be obtained. A target phrase in the text may be identified. A phrase context of the target phrase may be determined. An acoustic feature corresponding to the target phrase may be generated based at least on the target phrase and the phrase context. A speech waveform corresponding to the target phrase may be generated based on the acoustic feature.
Type: Grant
Filed: March 19, 2021
Date of Patent: June 3, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventors: Ran Zhang, Jian Luan, Yahuan Cong
-
Patent number: 12323377
Abstract: Device(s) and computer program products for creating custom music/video messages to facilitate and/or improve social interaction. The created music/video messages include at least portions of music, video, pictures, slideshows, and/or text. The music/video messages enable feelings or emotions to be communicated by the user of the device to one or more recipient device(s).
Type: Grant
Filed: July 7, 2023
Date of Patent: June 3, 2025
Assignee: Ameritech Solutions, Inc.
Inventors: Nader Asghari Kamrani, Kamran Asghari Kamrani
-
Patent number: 12315490
Abstract: The present disclosure relates generally to speech processing. Humans change their speech patterns in noisy environments. The systems and devices described herein compensate for noisy environments in order to sound more human-like. The configurations and implementations herein determine a sound profile for the sound environment in which the user is listening. Based on the sound profile, the devices determine a transform to apply to speech output from the device. This transform is applied to the wake word, to speech recognition, and to the output speech, compensating for the noise level of the environment by mimicking the Lombard effect.
Type: Grant
Filed: December 30, 2021
Date of Patent: May 27, 2025
Assignee: Spotify AB
Inventors: Daniel Bromand, Björn Erik Roth, Kåre Sjölander
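One way to picture a Lombard-style transform is a noise-dependent gain, as in the sketch below; the target SNR and the gain-only compensation are assumptions (real systems also shift pitch and spectral tilt):

```python
import numpy as np

def lombard_gain(noise, speech, target_snr_db=15.0):
    """Boost output speech so it clears the measured noise floor."""
    noise_rms = np.sqrt(np.mean(noise ** 2) + 1e-12)
    speech_rms = np.sqrt(np.mean(speech ** 2) + 1e-12)
    snr_db = 20 * np.log10(speech_rms / noise_rms)
    gain_db = max(0.0, target_snr_db - snr_db)  # boost only when it is noisy
    return speech * 10 ** (gain_db / 20)

noise = 0.1 * np.random.randn(16000)                            # measured environment
speech = 0.05 * np.sin(np.linspace(0, 440 * 2 * np.pi, 16000))  # device output
louder = lombard_gain(noise, speech)
```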
-
Patent number: 12314300
Abstract: Systems and methods are provided for a device to obtain a query, such as from a user. The query is vectorized to obtain a numerical representation of the query, which is provided to a vector database to find the nearest vectors corresponding to the most relevant context, such as for a particular domain or subject matter. The query, query vector, and context vectors (and optionally past query history and past query responses) are provided to an artificial intelligence, such as a large language model (LLM), to receive a response to the query without providing the context to the LLM.
Type: Grant
Filed: December 28, 2023
Date of Patent: May 27, 2025
Assignee: Open Text Inc.
Inventors: Vikash Sharma, Laxman Singh Chauhan, Raja Parshotam Lalwani
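The retrieve-then-prompt flow reads roughly like this sketch; the hash-based toy embedding, the in-memory index, and the top-2 cutoff are stand-ins for a real encoder and vector database:

```python
import numpy as np

def embed(text, dim=64):
    """Deterministic toy embedding standing in for a real encoder."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

docs = ["refund policy ...", "shipping times ...", "warranty terms ..."]
index = np.stack([embed(d) for d in docs])     # the vector database

query = "how long does shipping take?"
q = embed(query)
top = np.argsort(index @ q)[::-1][:2]          # nearest context vectors
prompt = "\n".join(docs[i] for i in top) + "\nQ: " + query
# `prompt` would now be sent to the LLM along with the query
```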
-
Patent number: 12296265
Abstract: This specification describes a computer-implemented method of generating context-dependent speech audio in a video game. The method comprises obtaining contextual information relating to a state of the video game. The contextual information is input into a prosody prediction module, which comprises a trained machine learning model configured to generate predicted prosodic features based on the contextual information. Input data comprising the predicted prosodic features and speech content data associated with the state of the video game is input into a speech audio generation module. An encoded representation of the speech content data, dependent on the predicted prosodic features, is generated using one or more encoders of the speech audio generation module. Context-dependent speech audio is generated, based on the encoded representation, using a decoder of the speech audio generation module.
Type: Grant
Filed: January 9, 2024
Date of Patent: May 13, 2025
Assignee: ELECTRONIC ARTS INC.
Inventors: Kilol Gupta, Zahra Shakeri, Gordon Durity, Mohsen Sardari, Harold Chaput, Navid Aghdaie
-
Patent number: 12288567
Abstract: A neural network, a system using this neural network, and a method for training the neural network to output a description of the environment in the vicinity of at least one sound acquisition device on the basis of an audio signal acquired by the sound acquisition device. The method includes: obtaining audio and image training signals of a scene showing an environment with objects generating sounds; obtaining a target description of the environment seen in the image training signal; inputting the audio training signal to the neural network so that the neural network outputs a training description of the environment; and comparing the target description of the environment with the training description of the environment.
Type: Grant
Filed: January 10, 2020
Date of Patent: April 29, 2025
Assignees: TOYOTA JIDOSHA KABUSHIKI KAISHA, ETH ZÜRICH
Inventors: Wim Abbeloos, Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool
-
Patent number: 12272350
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Grant
Filed: May 15, 2024
Date of Patent: April 8, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Patent number: 12266340
Abstract: Systems and methods receive, in real time from a user via a user device, input audio data comprising communication element(s), and apply trained model(s) to categorize the communication element(s), the categorizing comprising assigning a contextual category to a communication element. Text is generated that includes a response to the communication element(s), the response including individualized and contextualized qualities predicted to provide an optimal outcome based on (i) the assigned contextual category and (ii) the user. Text-to-speech processing of the text produces an audio output comprising (a) the response and (b) a speech pattern predicted to facilitate the optimal outcome. The audio output is provided to the user via the user device, and the user's reaction to it is measured according to a quantifiable quality score that is used to modify future iterations of text-to-speech processing so that future audio output(s) include a revised speech pattern.
Type: Grant
Filed: December 7, 2022
Date of Patent: April 1, 2025
Assignee: TRUIST BANK
Inventor: Bjorn Austraat
-
Patent number: 12260260
Abstract: Systems, apparatus, methods, and articles of manufacture for a digital delegate computer system architecture that provides improved multi-agent LLM implementations.
Type: Grant
Filed: March 29, 2024
Date of Patent: March 25, 2025
Assignee: The Travelers Indemnity Company
Inventors: Matthew J. Gorman, Vincent E. Haines, Girish A. Modgil, Brad E. Gawron
-
Patent number: 12249314
Abstract: Techniques are described for invoking and switching between chatbots of a chatbot system. In some embodiments, the chatbot system is capable of routing an utterance received while a user is already interacting with a first chatbot in the system. For instance, the chatbot system may identify a second chatbot based on determining that (i) the utterance is an invalid input to the first chatbot or (ii) the first chatbot is attempting to route the utterance to a destination associated with the first chatbot. Identifying the second chatbot can involve computing, using a predictive model, separate confidence scores for the first chatbot and the second chatbot, and then determining that the confidence score for the second chatbot satisfies one or more confidence score thresholds. The utterance is then routed to the second chatbot.
Type: Grant
Filed: April 19, 2023
Date of Patent: March 11, 2025
Assignee: Oracle International Corporation
Inventors: Vishal Vishnoi, Xin Xu, Srinivasa Phani Kumar Gadde, Fen Wang, Muruganantham Chinnananchi, Manish Parekh, Stephen Andrew McRitchie, Jae Min John, Crystal C. Pan, Gautam Singaraju, Saba Amsalu Teserra
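The confidence-score routing step might look like the following, where the per-bot scoring callables and the switching margin are assumptions standing in for the predictive model and its thresholds:

```python
def route_utterance(utterance, bots, current, margin=0.15):
    """Switch chatbots only when another bot clearly beats the current one."""
    scores = {name: score(utterance) for name, score in bots.items()}
    best = max(scores, key=scores.get)
    if best != current and scores[best] >= scores[current] + margin:
        return best   # confidence threshold satisfied: route to second chatbot
    return current

bots = {"billing": lambda u: 0.2 + 0.6 * ("invoice" in u),
        "support": lambda u: 0.5}
print(route_utterance("where is my invoice?", bots, current="support"))  # billing
```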
-
Patent number: 12238451
Abstract: Embodiments are disclosed for predicting, using neural networks, editing operations to apply to a video sequence based on conversational messages processed by a video editing system. In one or more embodiments, the disclosed systems and methods comprise: receiving an input including a video sequence and text sentences describing a modification to the video sequence; mapping, by a first neural network, the content of the text sentences to a candidate editing operation; processing, by a second neural network, the video sequence to predict parameter values for the candidate editing operation; and generating a modified video sequence by applying the candidate editing operation with the predicted parameter values to the video sequence.
Type: Grant
Filed: November 14, 2022
Date of Patent: February 25, 2025
Assignee: Adobe Inc.
Inventors: Uttaran Bhattacharya, Gang Wu, Viswanathan Swaminathan, Stefano Petrangeli
-
Patent number: 12229516
Abstract: A data processing system receives a natural language communication including an ordered sequence of a plurality of word spellings of a natural human language. A processor of the data processing system parses the plurality of word spellings utilizing constraint-based parsing to identify a plurality of satisfied parsing constraints. The processor performs semantic analysis based on the plurality of satisfied parsing constraints. Performing semantic analysis includes obtaining at least mid-level comprehension of the natural language communication by identifying in it, utilizing constraints, at least one of the following: a clausal structure within the communication, a sentence structure of a sentence in the communication, an implied topic of the communication, and a classical linguistic role in the communication.
Type: Grant
Filed: June 19, 2023
Date of Patent: February 18, 2025
Inventor: Thomas A. Visel
-
Patent number: 12225284
Abstract: Systems, methods, devices, and non-transitory, computer-readable storage mediums are disclosed for a wearable multimedia device and a cloud computing platform with an application ecosystem for processing multimedia data captured by the wearable multimedia device. In an embodiment, a method comprises: receiving, by one or more processors of a cloud computing platform, context data from a wearable multimedia device, the wearable multimedia device including at least one data capture device for capturing the context data; creating a data processing pipeline with one or more applications based on one or more characteristics of the context data and a user request; processing the context data through the data processing pipeline; and sending output of the data processing pipeline to the wearable multimedia device or another device for presentation of the output.
Type: Grant
Filed: March 27, 2023
Date of Patent: February 11, 2025
Assignee: Humane, Inc.
Inventors: Imran A. Chaudhri, Bethany Bongiorno, Shahzad Chaudhri
-
Patent number: 12217755
Abstract: A voice conversion device is provided with a linguistic information extraction unit that extracts linguistic information corresponding to utterance content from a conversion source voice signal, an appearance feature extraction unit that extracts appearance features expressing characteristics of a person's face from a captured image of the person, and a converted voice generation unit that generates a converted voice on the basis of the linguistic information and the appearance features.
Type: Grant
Filed: September 4, 2020
Date of Patent: February 4, 2025
Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Hirokazu Kameoka, Ko Tanaka, Yasunori Oishi, Takuhiro Kaneko, Aaron Valero Puche
-
Patent number: 12217002
Abstract: Apparatuses, systems, and techniques to parse textual data using parallel computing devices. In at least one embodiment, text is parsed by a plurality of parallel processing units using a finite state machine and a logical stack to convert the text to a tree data structure. Data is then extracted from the tree by the plurality of parallel processing units and stored.
Type: Grant
Filed: May 11, 2022
Date of Patent: February 4, 2025
Assignee: NVIDIA Corporation
Inventors: Elias Stehle, Gregory Michael Kimball
-
Patent number: 12197812
Abstract: Systems, methods, and devices may generate speech files that reflect the emotion of text-based content. An example process includes selecting a first text from a first source of text content and a second text from a second source of text content. The first text and the second text are aggregated into an aggregated text, which carries a first emotion associated with the content of the first text and a second emotion associated with the content of the second text. The aggregated text is converted into speech stored in an audio file. The speech replicates human expression of the first emotion and of the second emotion.
Type: Grant
Filed: April 13, 2023
Date of Patent: January 14, 2025
Assignee: DISH Technologies L.L.C.
Inventor: John C. Calef, III
-
Patent number: 12197862
Abstract: Disclosed is a method for identifying a word corresponding to a target word in text information, performed by one or more processors of a computing device. The method may include: determining a target word; determining a threshold for the edit distance associated with the target word; determining, among words included in the text information, a word whose edit distance from the target word is equal to or less than the threshold; and identifying the word corresponding to the target word based on the determined word.
Type: Grant
Filed: November 3, 2022
Date of Patent: January 14, 2025
Assignee: ActionPower Corp.
Inventor: Seongmin Park
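The edit-distance step is the classic Levenshtein dynamic program; a self-contained sketch with an assumed threshold of 2:

```python
def edit_distance(a, b):
    """Levenshtein distance via a rolling-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def find_matches(target, words, threshold=2):
    return [w for w in words if edit_distance(target, w) <= threshold]

print(find_matches("recieve", ["receive", "recipe", "relieve", "deceive"]))
```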
-
Patent number: 12190872
Abstract: An electronic apparatus includes: a memory storing one or more instructions; and a processor connected to the memory and configured to control the electronic apparatus. By executing the one or more instructions, the processor is configured to: identify a first intention word and a first target word from first speech; acquire second speech received after the first speech when at least one of the identified first intention word or the identified first target word does not match a word stored in the memory; acquire a similarity between the first speech and the second speech; and acquire response information based on the first speech and the second speech when the similarity is equal to or greater than a threshold value.
Type: Grant
Filed: April 5, 2022
Date of Patent: January 7, 2025
Assignee: Samsung Electronics Co., Ltd.
Inventors: Ali Ahmad Adel Al Massaeed, Abdelrahman Mustafa Mahmoud Yaseen, Shrouq Walid Jamil Sabah
-
Patent number: 12190018
Abstract: A system and method for generating, triggering, and playing a sequence of audio files with cues for delivering a presentation, for a presenter using a personal audio device coupled to a computing device. The system comprises a computing device that is coupled to a presentation data analysis server through a network. The method includes (i) generating a sequence of audio files with cues for delivering a presentation, (ii) triggering playback of an audio file from the sequence, and (iii) playing the audio files one by one on the computing device, through the personal audio device coupled to it, to enable the presenter to recall and speak the content based on the sequence of audio files.
Type: Grant
Filed: January 4, 2021
Date of Patent: January 7, 2025
Inventor: Arjun Karthik Bala
-
Patent number: 12183320
Abstract: A method for generating synthetic speech for text through a user interface is provided. The method may include receiving one or more sentences, determining a speech style characteristic for the received sentences, and outputting a synthetic speech for the sentences that reflects the determined speech style characteristic. The one or more sentences and the determined speech style characteristic may be input to an artificial neural network text-to-speech synthesis model, and the synthetic speech may be generated based on the speech data output from that model.
Type: Grant
Filed: January 20, 2021
Date of Patent: December 31, 2024
Assignee: NEOSAPIENCE, INC.
Inventors: Taesu Kim, Younggun Lee
-
Patent number: 12183340
Abstract: Systems and methods of intent identification for customized dialogue support in virtual environments are provided. Dialogue intent models stored in memory may each specify one or more intents, each associated with a dialogue filter. Input data may be received from a user device of a user, captured during an interactive session of an interactive title that provides a virtual environment to the user device. In response to a detected dialogue trigger, the input data may be analyzed based on the intent models and determined to correspond to one of the stored intents. The dialogue filter associated with the determined intent may be applied to a plurality of available dialogue outputs associated with the detected dialogue trigger. A customized dialogue output may be generated in accordance with the filtered subset of the available dialogue outputs.
Type: Grant
Filed: July 21, 2022
Date of Patent: December 31, 2024
Assignees: SONY INTERACTIVE ENTERTAINMENT LLC, SONY INTERACTIVE ENTERTAINMENT INC.
Inventors: Benaisha Patel, Alessandra Luizello, Olga Rudi
-
Patent number: 12175394
Abstract: The present invention is directed to interactive training, and in particular, to methods and systems for computerized interactive skill training. An example embodiment provides a method and system for providing skill training using a computerized system. The computerized system receives a selection of a first training subject. Several related training components can be invoked, such as reading, watching, performing, and/or reviewing components. In addition, a scored challenge session is provided, wherein a training challenge is provided to a user via a terminal, optionally in video form.
Type: Grant
Filed: April 12, 2023
Date of Patent: December 24, 2024
Assignee: Breakthrough PerformanceTech, LLC
Inventors: Martin L. Cohen, Edward G. Brown
-
Patent number: 12165647
Abstract: A computer-implemented method is disclosed. A search query of a text transcription is received. The search query includes a word or words having a specified spelling. A sequence of search phonemes corresponding to the specified spelling is generated. A sequence of transcript phonemes corresponding to the text transcription is generated from the text transcription. A search alignment in which the sequence of search phonemes is aligned to a transcript phoneme fragment is generated. Based at least on the search alignment having a quality score exceeding a quality score threshold, the transcript phoneme fragment and an associated portion of the text transcription are determined to result from an utterance of the specified spelling in an audio session corresponding to the text transcription. A search result indicating this determination is output.
Type: Grant
Filed: May 27, 2022
Date of Patent: December 10, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventor: Yuchen Li
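A toy version of the alignment-and-quality-score idea, using difflib's similarity ratio over a sliding phoneme window in place of the patent's alignment procedure; the 0.8 threshold is an assumption:

```python
from difflib import SequenceMatcher

def best_alignment(search_phones, transcript_phones):
    """Return (quality score, offset) of the best-matching transcript fragment."""
    n = len(search_phones)
    best = (0.0, 0)
    for start in range(len(transcript_phones) - n + 1):
        window = transcript_phones[start:start + n]
        score = SequenceMatcher(None, search_phones, window).ratio()
        best = max(best, (score, start))
    return best

score, offset = best_alignment(["HH", "AH", "L", "OW"],
                               ["DH", "AH", "HH", "AH", "L", "OW", "W", "ER"])
if score >= 0.8:  # quality score threshold
    print("utterance of the search spelling found at phoneme", offset)
```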
-
Patent number: 12154563
Abstract: An electronic apparatus, based on a text sentence being input: obtains prosody information of the text sentence; segments the text sentence into a plurality of sentence elements; obtains, in parallel, speech in which the prosody information is reflected for each of the sentence elements by inputting the sentence elements and the prosody information to a text-to-speech (TTS) module; and merges the speech for the sentence elements obtained in parallel to output speech for the whole text sentence.
Type: Grant
Filed: February 24, 2022
Date of Patent: November 26, 2024
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jonghoon Jeong, Hosang Sung, Doohwa Hong, Kyoungbo Min, Eunmi Oh, Kihyun Choo
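The segment / synthesize-in-parallel / merge pipeline can be sketched with a thread pool; the comma-based segmentation and the string-returning stand-in for the TTS module are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def synthesize(element, prosody):
    """Stand-in for the TTS module: returns fake audio for one element."""
    return f"<audio:{element}|{prosody}>"

def speak(sentence, prosody="neutral"):
    elements = sentence.split(", ")        # naive segmentation (assumption)
    with ThreadPoolExecutor() as pool:     # per-element synthesis in parallel
        clips = list(pool.map(lambda e: synthesize(e, prosody), elements))
    return "".join(clips)                  # merge in the original order

print(speak("open the door, turn on the light, play some music"))
```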
-
Patent number: 12149781
Abstract: Systems and methods for determining whether a first electronic device detects a media item that is to be output by a second electronic device are described herein. In some embodiments, an individual may request, using a first electronic device, that a media item be played on a second electronic device. The backend system may send first audio data representing a first response to the first electronic device, along with instructions to delay outputting the first response and to continue sending audio data of additional audio captured thereby. The backend system may also send second audio data representing a second response to the second electronic device along with the media item. Text data may be generated representing the captured audio, which may then be compared with text data representing the second response to determine whether or not they match.
Type: Grant
Filed: September 16, 2022
Date of Patent: November 19, 2024
Assignee: Amazon Technologies, Inc.
Inventor: Dennis Francis Cwik
-
Patent number: 12147646
Abstract: Disclosed herein are system, method, and computer program product embodiments for unifying graphical user interface (GUI) displays across different device types. In an embodiment, a unification system may convert a GUI view appearing on, for example, a desktop device into a GUI view on a mobile device. Both devices may be accessing the same application and/or may use a cloud computing platform to access the application. The unification system may aid in reproducing GUI modifications performed on one user device onto other user devices. In this manner, the unification system may maintain a consistent look and feel for a user across different computing device types.
Type: Grant
Filed: December 14, 2018
Date of Patent: November 19, 2024
Assignee: Salesforce, Inc.
Inventors: Eric Jacobson, Michael Gonzalez, Wayne Cho, Adheip Varadarajan, John Vollmer, Benjamin Snyder
-
Patent number: 12142257
Abstract: Systems and methods are provided for emotion-based text to speech. The systems and methods perform operations comprising: accessing a text string; storing a plurality of embeddings associated with a plurality of speakers, a first embedding for a first speaker being associated with a first emotion and a second embedding for a second speaker being associated with a second emotion; selecting the first speaker to speak one or more words of the text string; determining that the one or more words are associated with the second emotion; generating, based on the first embedding and the second embedding, a third embedding for the first speaker associated with the second emotion; and applying the third embedding and the text string to a vocoder to generate an audio stream comprising the one or more words spoken by the first speaker with the second emotion.
Type: Grant
Filed: February 8, 2022
Date of Patent: November 12, 2024
Assignee: SNAP INC.
Inventors: Liron Harazi, Jacob Assa, Alan Bekker
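One plausible reading of deriving the third embedding from the first two is simple vector arithmetic in the embedding space; the additive emotion offset below is an assumption for illustration, not the disclosed construction:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = {("alice", "neutral"): rng.standard_normal(16),
       ("bob", "neutral"): rng.standard_normal(16),
       ("bob", "angry"): rng.standard_normal(16)}

# transfer bob's "angry" offset onto alice (assumed additive structure)
angry_offset = emb[("bob", "angry")] - emb[("bob", "neutral")]
emb[("alice", "angry")] = emb[("alice", "neutral")] + angry_offset
# the new embedding plus the text string would then be fed to the vocoder
```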
-
Patent number: 12142256
Abstract: An information processing device according to embodiments includes a communication unit configured to receive audio data of content and text data corresponding to the audio data, an audio data reproduction unit configured to reproduce the audio data, a text data reproduction unit configured to reproduce the text data by audio synthesis, and a controller that controls the reproduction of the audio data or the text data. The controller causes the text data reproduction unit to reproduce the text data when the audio data reproduction unit is unable to reproduce the audio data.
Type: Grant
Filed: October 20, 2023
Date of Patent: November 12, 2024
Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA
Inventor: Jun Tsukamoto
-
Patent number: 12136259
Abstract: A method for training a neural network, including: determining a neural network; training the neural network at a first learning rate according to a first optimization mode, where the first learning rate is updated each time the neural network is trained; mapping the first learning rate of the first optimization mode to a second learning rate of a second optimization mode in the same vector space; determining that the second learning rate satisfies a preset update condition; and continuing to train the neural network at the second learning rate according to the second optimization mode.
Type: Grant
Filed: August 20, 2020
Date of Patent: November 5, 2024
Assignee: BIGO TECHNOLOGY PTE. LTD.
Inventors: Wei Xiang, Chao Pei
-
Patent number: 12100386
Abstract: A speech processing system for generating translated speech, the system comprising: an input for receiving a first speech signal comprising a second language; an output for outputting a second speech signal comprising a first language; and a processor configured to: generate a first text signal from a segment of the first speech signal, the first text signal comprising the second language; generate a second text signal from the first text signal, the second text signal comprising the first language; extract a plurality of first feature vectors from the segment of the first speech signal, wherein the first feature vectors comprise information relating to audio data corresponding to the segment of the first speech signal; generate a speaker vector using a first trained algorithm taking one or more of the first feature vectors as input, wherein the speaker vector represents a set of features corresponding to a speaker; generate a second speech signal segment using a second trained algorithm taking information re…
Type: Grant
Filed: March 13, 2019
Date of Patent: September 24, 2024
Assignee: Papercup Technologies Limited
Inventor: Jiameng Gao
-
Patent number: 12096068
Abstract: A method and a device for enabling spatio-temporal navigation of content. In response to a request, from a client, for content comprising a first content of a first type, the device transmits to the client: the first content; a second content of a second type generated from the first content; synchronization metadata associating each word of one of the contents with a time marker into the other content; and a script, for execution at the client, configured to re-establish at least one of the contents and offer navigation of said content, depending on the type of content re-established, by using at least one of the first content, the second content, and the synchronization metadata.
Type: Grant
Filed: December 13, 2019
Date of Patent: September 17, 2024
Assignee: ORANGE
Inventors: Fabrice Boudin, Frederic Herledan
-
Patent number: 12087270
Abstract: Techniques are described for generating customized synthetic voices personalized to a user, based on user-provided feedback. A system may determine embedding data representing a user-provided description of a desired synthetic voice and profile data associated with the user, and generate synthetic voice embedding data using the embedding data of a profile determined to be similar to the current user's. Based on user-provided feedback on a customized synthetic voice (generated using the voice characteristics corresponding to the synthetic voice embedding data and presented to the user) and on that embedding data, the system may generate new synthetic voice embedding data corresponding to a new customized synthetic voice. The system may assign the customized synthetic voice to the user, such that a subsequent user is not presented with the same customized synthetic voice.
Type: Grant
Filed: September 29, 2022
Date of Patent: September 10, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Sebastian Dariusz Cygert, Daniel Korzekwa, Kamil Pokora, Piotr Tadeusz Bilinski, Kayoko Yanagisawa, Abdelhamid Ezzerg, Thomas Edward Merritt, Raghu Ram Sreepada Srinivas, Nikhil Sharma
-
Patent number: 12087273
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
Type: Grant
Filed: January 30, 2023
Date of Patent: September 10, 2024
Assignee: Google LLC
Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
-
Patent number: 12087272
Abstract: A method (800) of training a text-to-speech (TTS) model (108) includes obtaining training data (150) including reference input text (104) that includes a sequence of characters, a sequence of reference audio features (402) representative of the sequence of characters, and a sequence of reference phone labels (502) representative of distinct speech sounds of the reference audio features. For each of a plurality of time steps, the method includes generating a corresponding predicted audio feature (120) based on a respective portion of the reference input text for the time step and generating, using a phone label mapping network (510), a corresponding predicted phone label (520) associated with the predicted audio feature. The method also includes aligning the predicted phone label with the reference phone label to determine a corresponding predicted phone label loss (622) and updating the TTS model based on the corresponding predicted phone label loss.
Type: Grant
Filed: December 13, 2019
Date of Patent: September 10, 2024
Assignee: Google LLC
Inventors: Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Yu Zhang
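The auxiliary phone-label loss can be sketched in PyTorch as a linear mapping head whose cross-entropy is added to the feature loss; the dimensions, the MSE feature loss, and the random stand-in tensors are assumptions:

```python
import torch
import torch.nn as nn

class PhoneLabelMapper(nn.Module):
    """Maps predicted acoustic features to phone-label logits."""
    def __init__(self, feat_dim, num_phones):
        super().__init__()
        self.proj = nn.Linear(feat_dim, num_phones)

    def forward(self, feats):
        return self.proj(feats)

feat_dim, num_phones, steps = 80, 40, 120
mapper = PhoneLabelMapper(feat_dim, num_phones)

pred_feats = torch.randn(steps, feat_dim, requires_grad=True)  # stand-in TTS output
ref_feats = torch.randn(steps, feat_dim)                       # reference audio features
ref_phones = torch.randint(0, num_phones, (steps,))            # aligned reference labels

feat_loss = nn.functional.mse_loss(pred_feats, ref_feats)
phone_loss = nn.functional.cross_entropy(mapper(pred_feats), ref_phones)
(feat_loss + phone_loss).backward()  # update the TTS model on the combined loss
```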
-
Patent number: 12076850
Abstract: A monitor is installed in an eye of a robot, and an eye image is displayed on the monitor. The robot extracts a feature quantity of a user's eye from a filmed image of the user, and that feature quantity is reflected in the eye image. Example feature quantities are the size of the pupillary region and pupil image, and the form of the eyelid image. A blinking frequency or the like may also be reflected as a feature quantity. Familiarity with respect to each user is set, and which user's feature quantity is to be reflected may be determined in accordance with the familiarity.
Type: Grant
Filed: April 7, 2023
Date of Patent: September 3, 2024
Assignee: GROOVE X, INC.
Inventor: Kaname Hayashi
-
Patent number: 12080270
Abstract: An apparatus for synthesizing speech according to an embodiment is a computing apparatus that includes one or more processors and a memory storing one or more programs executed by the one or more processors. The apparatus includes a pre-processing module that marks a preset classification symbol on each input unit text, and a speech synthesis module that receives each unit text marked with the classification symbol and synthesizes speech uttering the unit text based on it.
Type: Grant
Filed: December 22, 2020
Date of Patent: September 3, 2024
Assignee: DEEPBRAIN AI INC.
Inventors: Gyeongsu Chae, Dalhyun Kim
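The pre-processing step reduces to prepending a marker to each unit text; a tiny sketch with an assumed `[SEP]` symbol:

```python
def preprocess(unit_texts, symbol="[SEP]"):
    """Mark each input unit text with a preset classification symbol."""
    return [f"{symbol} {u.strip()}" for u in unit_texts]

print(preprocess(["Hello there.", "How are you?"]))
# ['[SEP] Hello there.', '[SEP] How are you?']
```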
-
Patent number: 12080272
Abstract: A method (400) for representing an intended prosody in synthesized speech includes receiving a text utterance (310) having at least one word (240), and selecting an utterance embedding (204) for the text utterance. Each word in the text utterance has at least one syllable (230), and each syllable has at least one phoneme (220). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration (238) of the syllable by decoding a prosodic syllable embedding (232, 234) for the syllable based on attention by an attention mechanism (340) to linguistic features (222) of each phoneme of the syllable, and generating a plurality of fixed-length predicted frames (260) based on the predicted duration for the syllable.
Type: Grant
Filed: December 10, 2019
Date of Patent: September 3, 2024
Assignee: Google LLC
Inventors: Robert Clark, Chun-an Chan, Vincent Wan
-
Patent number: 12080089
Abstract: A computer-implemented method, a computer system, and a computer program product enhance machine translation of a document. The method includes capturing an image of the document, which contains a plurality of characters arranged in a character layout. The method also includes classifying the image by document type based on the character layout, and determining a strategy for an intelligent character recognition (ICR) algorithm based on the character layout of the image. Lastly, the method includes generating a translated document by applying the ICR algorithm to the plurality of characters in the image using the strategy. The translated document includes a plurality of translated characters arranged in the same character layout.
Type: Grant
Filed: December 8, 2021
Date of Patent: September 3, 2024
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Barton Wayne Emanuel, Nadiya Kochura, Su Liu, Tetsuya Shimada
-
Patent number: 12073838
Abstract: A speech-processing system may provide access to multiple virtual assistants via one or more voice-controlled devices. Each assistant may leverage the language processing and language generation features of the speech-processing system while handling different commands and/or providing access to different back-end applications. Each assistant may be associated with its own voice and/or speech style, and thus be perceived as having a particular "personality." Different assistants may be available for use with a particular voice-controlled device based on time, location, the particular user, etc. In some situations, language processing may be improved by leveraging data such as intent data, grammars, lexicons, and entities associated with the assistants available for use with the particular voice-controlled device.
Type: Grant
Filed: December 7, 2020
Date of Patent: August 27, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Naveen Bobbili, David Henry, Mark Vincent Mattione, Richard Du, Jyoti Chhabra
-
Patent number: 12067130
Abstract: The disclosed exemplary embodiments include computer-implemented systems, devices, apparatuses, and processes that maintain data confidentiality in communications involving voice-enabled devices in a distributed computing environment using homomorphic encryption. By way of example, an apparatus may receive encrypted command data from a computing system, decrypt the encrypted command data using a homomorphic private key, and perform operations that associate the decrypted command data with a request for an element of data. Using a public cryptographic key associated with a device, the apparatus may generate an encrypted response that includes the requested data element and transmit the encrypted response to the device. The device may decrypt the encrypted response using a private cryptographic key and perform operations that present first audio content representative of the requested data element through an acoustic interface.
Type: Grant
Filed: November 12, 2021
Date of Patent: August 20, 2024
Assignee: The Toronto-Dominion Bank
Inventors: Alexey Shpurov, Milos Dunjic, Brian Andrew Lam
-
Patent number: 12062357
Abstract: A method of registering an attribute in a speech synthesis model, an apparatus for registering an attribute in a speech synthesis model, an electronic device, and a medium are provided, which relate to the field of artificial intelligence technology, including deep learning and intelligent speech technology. The method includes: acquiring a plurality of data associated with an attribute to be registered; and registering the attribute in the speech synthesis model by using the plurality of data associated with the attribute, wherein the speech synthesis model is trained in advance by using training data in a training data set.
Type: Grant
Filed: November 16, 2021
Date of Patent: August 13, 2024
Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
Inventors: Wenfu Wang, Xilei Wang, Tao Sun, Han Yuan, Zhengkun Gao, Lei Jia
-
Patent number: 12057106
Abstract: A method of authorizing content for use, e.g., in association with a conversational bot. The method begins by configuring a conversational bot using a machine learning model trained to classify utterances into topics. Utterances that are not recognized by the machine learning model (e.g., according to some configurable threshold) are then identified. Using a clustering algorithm, one or more of the identified utterances are then processed into a grouping. Information identifying a topic associated with the grouping is then received and, in response, the machine learning model is updated to include the topic.
Type: Grant
Filed: March 15, 2023
Date of Patent: August 6, 2024
Assignee: Drift.com, Inc.
Inventors: Maria C. Moya, Natalie Duerr, Jeffrey D. Orkin, Jane S. Taschman, Carolina M. Caprile, Christopher M. Ward