Image To Speech Patents (Class 704/260)
  • Patent number: 12367867
    Abstract: A method for generating voice in an ongoing call session based on artificial intelligence techniques is provided. The method includes extracting a plurality of features from a voice input through an artificial neural network (ANN); identifying one or more lost audio frames within the voice input; predicting, by the ANN, for each of the one or more lost audio frames, one or more features of the respective lost audio frame; and superposing the predicted features upon the voice input to generate an updated voice input.
    Type: Grant
    Filed: October 25, 2022
    Date of Patent: July 22, 2025
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Sandeep Singh Spall, Tarun Gupta, Narang Lucky Manoharlal
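    To make the claimed loop concrete, here is a minimal sketch of detecting lost frames, predicting their features, and superposing the predictions onto the input. The patent's predictor is a trained ANN; a plain linear interpolation between intact neighbor frames stands in for it here, and all names are illustrative.
    ```python
    import numpy as np

    def conceal_lost_frames(features, lost):
        """Replace lost feature frames (rows) with predicted ones.

        In the patented method a trained ANN supplies the predictions;
        linear interpolation between intact neighbors is a stand-in.
        """
        out = features.copy()
        lost_set = set(lost)
        intact = [i for i in range(len(features)) if i not in lost_set]
        for i in lost:
            prev = max((j for j in intact if j < i), default=None)
            nxt = min((j for j in intact if j > i), default=None)
            if prev is None:          # loss at the very start
                pred = features[nxt]
            elif nxt is None:         # loss at the very end
                pred = features[prev]
            else:                     # interpolate between neighbors
                w = (i - prev) / (nxt - prev)
                pred = (1 - w) * features[prev] + w * features[nxt]
            out[i] = pred             # superpose the prediction onto the input
        return out

    frames = np.random.rand(10, 4)    # e.g. 10 frames of 4 spectral features
    frames[[3, 4]] = 0.0              # simulate two lost frames
    repaired = conceal_lost_frames(frames, [3, 4])
    ```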
  • Patent number: 12361925
    Abstract: A speech recognition module receives training data of speech and creates a representation for individual words, non-words, phonemes, and any combination of these. A set of speech processing detectors analyzes the training data of speech from humans communicating. The set of speech processing detectors detects speech parameters that are indicative of paralinguistic effects on top of enunciated words, phonemes, and non-words in the audio stream. One or more machine learning models undergo supervised training of their neural networks to learn how to associate one or more mark-up markers with a textual representation, for each individual word, individual non-word, individual phoneme, and any combination of these, that was enunciated with a particular paralinguistic effect. Each mark-up marker can correspond to its own paralinguistic effect.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: July 15, 2025
    Assignee: SRI International
    Inventors: Harry Bratt, Colleen Richey, Maneesh Yadav
  • Patent number: 12361946
    Abstract: A speech interaction method, a speech interaction system and a storage medium are provided. The method includes: receiving interactive speech input by a user; determining an emotional tag corresponding to the interactive speech based on the interactive speech and interactive text corresponding to the interactive speech; and determining, based on the emotional tag, a response text corresponding to the interactive text, and a first prosodic feature and a second prosodic feature corresponding to the response text. The first prosodic feature characterizes the whole-sentence prosodic feature of the response text, and the second prosodic feature characterizes a local prosodic feature of each character in the response text. A response speech corresponding to the interactive speech is generated and output based on the response text, the first prosodic feature and the second prosodic feature.
    Type: Grant
    Filed: January 14, 2025
    Date of Patent: July 15, 2025
    Inventors: Huapeng Sima, Zhengkun Mei, Yiping Tang
  • Patent number: 12340788
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Grant
    Filed: May 7, 2024
    Date of Patent: June 24, 2025
    Assignee: ELECTRONIC ARTS INC.
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
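    A toy sketch of the encode/combine/decode structure described above, assuming PyTorch; the layer choices, dimensions, and the broadcast-add used to combine text and style encodings are illustrative guesses, not the patented architecture.
    ```python
    import torch
    import torch.nn as nn

    class TinyStyleSynthesizer(nn.Module):
        """Toy analogue of the text-encoder / style-encoder / decoder split."""
        def __init__(self, vocab=256, dim=64, n_style_feats=8, n_acoustic=80):
            super().__init__()
            self.text_encoder = nn.Embedding(vocab, dim)        # per-token encodings
            self.style_encoder = nn.Linear(n_style_feats, dim)  # style features -> one embedding
            self.decoder = nn.GRU(dim, n_acoustic, batch_first=True)

        def forward(self, token_ids, style_feats):
            text_enc = self.text_encoder(token_ids)             # (B, T, dim)
            style_enc = self.style_encoder(style_feats)         # (B, dim)
            combined = text_enc + style_enc.unsqueeze(1)        # broadcast-add the style
            acoustic, _ = self.decoder(combined)                # predicted acoustic features
            return acoustic

    model = TinyStyleSynthesizer()
    tokens = torch.randint(0, 256, (1, 12))   # user-input text, already tokenized
    style = torch.randn(1, 8)                 # features of the selected speech style
    mel_like = model(tokens, style)           # (1, 12, 80) predicted acoustic features
    ```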
  • Patent number: 12334056
    Abstract: Broadly speaking, the present techniques provide methods for conditioning a neural network, which not only improve the generalizable performance of conditional neural networks, but also reduce model size and latency significantly. The resulting conditioned neural network is suitable for on-device deployment due to having a significantly lower model size, lower dynamic memory requirement, and lower latency.
    Type: Grant
    Filed: July 27, 2022
    Date of Patent: June 17, 2025
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Sourav Bhattacharya, Abhinav Mehrotra, Alberto Gil C. P. Ramos
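    The abstract does not disclose the conditioning mechanism itself. As one common point of reference, a FiLM-style feature-wise conditioning layer is sketched below; this is an assumption for illustration, not the patented method.
    ```python
    import torch
    import torch.nn as nn

    class FiLMConditioned(nn.Module):
        """Feature-wise linear modulation (FiLM): one common way to condition
        a network on side information such as a speaker or language code."""
        def __init__(self, in_dim=32, cond_dim=8, hidden=64):
            super().__init__()
            self.body = nn.Linear(in_dim, hidden)
            self.film = nn.Linear(cond_dim, 2 * hidden)  # per-channel scale & shift

        def forward(self, x, cond):
            h = torch.relu(self.body(x))
            gamma, beta = self.film(cond).chunk(2, dim=-1)
            return gamma * h + beta                      # condition modulates features

    layer = FiLMConditioned()
    out = layer(torch.randn(4, 32), torch.randn(4, 8))   # (4, 64) conditioned features
    ```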
  • Patent number: 12333176
    Abstract: Systems and methods for a novel architecture to support always-on applications and/or models suffering drift in their results. The system may comprise one or more servers that are configured to track the historical behavior of incoming request data for a model and/or redirect the request as needed using parallel data domains. The one or more servers may maintain and update a catalog of potential data domains that partitions the historically received data. One partition may comprise data output from a current model. Another partition may comprise detected outliers in the data. In the case of drift, outliers, and/or anomalies in the incoming data, the system may return an error signal that causes data to be duplicated into a new data domain.
    Type: Grant
    Filed: July 14, 2023
    Date of Patent: June 17, 2025
    Assignee: Capital One Services, LLC
    Inventors: Trijeet Sethi, Muralikumar Venkatasubramaniam
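    A toy sketch of the catalog-of-domains idea: track the history of incoming request values, flag outliers, and duplicate them into a separate domain while signaling an error. The z-score test and thresholds below are illustrative stand-ins for whatever the patented system actually uses.
    ```python
    import statistics

    class DomainCatalog:
        """Partition incoming scalar request values into parallel data domains."""
        def __init__(self, z_threshold=3.0):
            self.domains = {"current": [], "outliers": []}
            self.z_threshold = z_threshold

        def route(self, value):
            current = self.domains["current"]
            if len(current) >= 30:                       # enough history to test drift
                mean = statistics.fmean(current)
                stdev = statistics.stdev(current) or 1e-9
                if abs(value - mean) / stdev > self.z_threshold:
                    self.domains["outliers"].append(value)  # duplicate into new domain
                    return "error: drift suspected, value routed to outlier domain"
            current.append(value)
            return "ok"

    catalog = DomainCatalog()
    for v in [1.0] * 40 + [50.0]:
        status = catalog.route(v)
    print(status)   # the 50.0 lands in the outlier domain
    ```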
  • Patent number: 12334049
    Abstract: Implementations are directed to receiving unstructured free-form natural language input, generating a chatbot based on the unstructured free-form natural language input and in response to receiving the unstructured free-form natural language input, and causing the chatbot to perform task(s) associated with an entity and on behalf of the user. In various implementations, the unstructured free-form natural language input conveys details of the task(s) to be performed, but does not define any corresponding dialog state map (e.g., does not define any dialog states or any dialog state transitions). Nonetheless, the unstructured free-form natural language input may be utilized to fine-tune and/or prime a machine learning model that is already capable of being utilized in conducting generalized conversations. As a result, the chatbot can be generated and deployed in a quick and efficient manner for performance of the task(s) on behalf of the user.
    Type: Grant
    Filed: December 5, 2022
    Date of Patent: June 17, 2025
    Assignee: GOOGLE LLC
    Inventors: Asaf Aharoni, Eyal Segalis, Sasha Goldshtein, Ofer Ron, Yaniv Leviathan, Yoav Tzur
  • Patent number: 12327491
    Abstract: An information processing apparatus includes an audio information obtaining unit obtaining audio information on a user learning a first language, an analysis processing unit estimating a voice production condition of the user in accordance with analyzing the audio information, and a display processing unit displaying voice production condition images in animated display in accordance with a time-series change of the voice production condition. The display processing unit executes processing to superimpose and display a first feature point and a second feature point on the voice production condition images. The first feature point is identified in accordance with the voice production condition observed when a first sound in the first language is pronounced. The second feature point is identified in accordance with the voice production condition observed when a second sound similar to the first sound is pronounced in a second language different from the first language.
    Type: Grant
    Filed: May 8, 2023
    Date of Patent: June 10, 2025
    Inventors: Nic Schumann, Daniel Brenners, Bryan Ma, Mateus Rezende, Justin Chen
  • Patent number: 12322374
    Abstract: The present disclosure provides methods and apparatuses for phrase-based end-to-end text-to-speech (TTS) synthesis. A text may be obtained. A target phrase in the text may be identified. A phrase context of the target phrase may be determined. An acoustic feature corresponding to the target phrase may be generated based at least on the target phrase and the phrase context. A speech waveform corresponding to the target phrase may be generated based on the acoustic feature.
    Type: Grant
    Filed: March 19, 2021
    Date of Patent: June 3, 2025
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Ran Zhang, Jian Luan, Yahuan Cong
  • Patent number: 12323377
    Abstract: Device(s) and computer program products for creating custom music/video messages to facilitate and/or improve social interaction. The created music/video messages include at least portions of: music, video, pictures, slideshows, and/or text. The music/video messages enable feelings or emotions to be communicated by the user of the device to one or more recipient device(s).
    Type: Grant
    Filed: July 7, 2023
    Date of Patent: June 3, 2025
    Assignee: Ameritech Solutions, Inc.
    Inventors: Nader Asghari Kamrani, Kamran Asghari Kamrani
  • Patent number: 12315490
    Abstract: The present disclosure relates generally to speech processing. Humans change their speech patterns in noisy environments. The systems and devices described herein can compensate for noisy environments in a similarly human-like way. Thus, the configurations and implementations herein can determine a sound profile for the sound environment where the user is listening. Based on the sound profile, the devices can determine a transform to apply to output speech from the device. This transform is applied to the wake word, to speech recognition, and to the output speech to compensate for the noise level of the environment by mimicking the Lombard effect.
    Type: Grant
    Filed: December 30, 2021
    Date of Patent: May 27, 2025
    Assignee: Spotify AB
    Inventors: Daniel Bromand, Björn Erik Roth, Kåre Sjölander
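    A crude sketch of the noise-compensating transform: measure the environment's level and boost the output speech accordingly. Real Lombard-style speech also shifts pitch, rate, and spectral tilt; only level is modeled here, and all constants are arbitrary.
    ```python
    import numpy as np

    def lombard_gain(env_noise, speech, max_boost_db=12.0):
        """Boost output speech in proportion to the measured environment level."""
        noise_rms = np.sqrt(np.mean(env_noise ** 2)) + 1e-12
        noise_db = 20 * np.log10(noise_rms)
        # Map noise level (assumed in [-60, 0] dBFS) to a boost in [0, max_boost_db].
        boost_db = np.clip((noise_db + 60) / 60, 0, 1) * max_boost_db
        return speech * (10 ** (boost_db / 20))

    noise = 0.1 * np.random.randn(16000)     # one second of measured room noise
    tts_out = 0.05 * np.sin(np.linspace(0, 440 * 2 * np.pi, 16000))
    louder = lombard_gain(noise, tts_out)    # ~8 dB boost for this noise level
    ```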
  • Patent number: 12314300
    Abstract: Systems and methods are provided for a device to obtain a query, such as from a user. The query is vectorized to obtain a numerical representation of the query and provided to a vector database to find the nearest vectors corresponding to the most relevant context, such as for a particular domain or subject matter. The query, query vector, and context vectors, and optionally past query history and past query responses, are provided to an artificial intelligence, such as a large language model (LLM), to receive a response to the query without providing the context itself to the LLM.
    Type: Grant
    Filed: December 28, 2023
    Date of Patent: May 27, 2025
    Assignee: Open Text Inc.
    Inventors: Vikash Sharma, Laxman Singh Chauhan, Raja Parshotam Lalwani
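    The conventional retrieval step behind this kind of system can be sketched as follows. Note that the patent's distinctive claim is handing the LLM context vectors rather than the context text, an interface the abstract does not detail, so the sketch stops at assembling the retrieved material. The toy embedding function is a placeholder for a real embedding model.
    ```python
    import numpy as np

    def embed(text, dim=16):
        """Deterministic toy embedding; a real system would call an embedding model."""
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(dim)
        return v / np.linalg.norm(v)

    docs = ["reset your password in settings",
            "invoices are emailed monthly",
            "contact support via the help page"]
    doc_vectors = np.stack([embed(d) for d in docs])    # the "vector database"

    query = "how do I change my password?"
    q = embed(query)
    scores = doc_vectors @ q                            # cosine similarity (unit vectors)
    top = [docs[i] for i in np.argsort(scores)[::-1][:2]]

    # The query plus retrieved material would then be handed to the LLM.
    prompt = "Context:\n- " + "\n- ".join(top) + f"\n\nQuestion: {query}"
    print(prompt)
    ```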
  • Patent number: 12296265
    Abstract: This specification describes a computer-implemented method of generating context-dependent speech audio in a video game. The method comprises obtaining contextual information relating to a state of the video game. The contextual information is inputted into a prosody prediction module. The prosody prediction module comprises a trained machine learning model which is configured to generate predicted prosodic features based on the contextual information. Input data comprising the predicted prosodic features and speech content data associated with the state of the video game is inputted into a speech audio generation module. An encoded representation of the speech content data dependent on the predicted prosodic features is generated using one or more encoders of the speech audio generation module. Context-dependent speech audio is generated, based on the encoded representation, using a decoder of the speech audio generation module.
    Type: Grant
    Filed: January 9, 2024
    Date of Patent: May 13, 2025
    Assignee: ELECTRONIC ARTS INC.
    Inventors: Kilol Gupta, Zahra Shakeri, Gordon Durity, Mohsen Sardari, Harold Chaput, Navid Aghdaie
  • Patent number: 12288567
    Abstract: A neural network, a system using this neural network and a method for training a neural network to output a description of the environment in the vicinity of at least one sound acquisition device on the basis of an audio signal acquired by the sound acquisition device, the method including: obtaining audio and image training signals of a scene showing an environment with objects generating sounds, obtaining a target description of the environment seen on the image training signal, inputting the audio training signal to the neural network so that the neural network outputs a training description of the environment, and comparing the target description of the environment with the training description of the environment.
    Type: Grant
    Filed: January 10, 2020
    Date of Patent: April 29, 2025
    Assignees: TOYOTA JIDOSHA KABUSHIKI KAISHA, ETH ZÜRICH
    Inventors: Wim Abbeloos, Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool
  • Patent number: 12272350
    Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
    Type: Grant
    Filed: May 15, 2024
    Date of Patent: April 8, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
  • Patent number: 12266340
    Abstract: Systems and methods receive, in real-time from a user via a user device, input audio data comprising communication element(s) and trained model(s) are applied thereto to categorize the communication element(s), the categorizing comprising assigning a contextual category to a communication element. Text is generated that includes a response to the communication element(s), the response including individualized and contextualized qualities predicted to provide an optimal outcome based on (i) the assigned contextual category and (ii) the user. Text-to-speech processing of the text is implemented to produce an audio output comprising (a) the response and (b) a speech pattern predicted to facilitate the optimal outcome. The audio output is provided to the user via the user device, and based thereon a user's reaction is measured according to a quantifiable quality score that is used to modify future iterations of text-to-speech processing to provide future audio output(s) including a revised speech pattern.
    Type: Grant
    Filed: December 7, 2022
    Date of Patent: April 1, 2025
    Assignee: TRUIST BANK
    Inventor: Bjorn Austraat
  • Patent number: 12260260
    Abstract: Systems, apparatus, methods, and articles of manufacture are provided for a digital delegate computer system architecture that supports improved multi-agent LLM implementations.
    Type: Grant
    Filed: March 29, 2024
    Date of Patent: March 25, 2025
    Assignee: The Travelers Indemnity Company
    Inventors: Matthew J. Gorman, Vincent E. Haines, Girish A. Modgil, Brad E. Gawron
  • Patent number: 12249314
    Abstract: Techniques are described for invoking and switching between chatbots of a chatbot system. In some embodiments, the chatbot system is capable of routing an utterance received while a user is already interacting with a first chatbot in the chatbot system. For instance, the chatbot system may identify a second chatbot based on determining that (i) such an utterance is an invalid input to the first chatbot or (ii) that the first chatbot is attempting to route the utterance to a destination associated with the first chatbot. Identifying the second chatbot can involve computing, using a predictive model, separate confidence scores for the first chatbot and the second chatbot, and then determining that a confidence score for the second chatbot satisfies one or more confidence score thresholds. The utterance is then routed to the second chatbot based on the identifying of the second chatbot.
    Type: Grant
    Filed: April 19, 2023
    Date of Patent: March 11, 2025
    Assignee: Oracle International Corporation
    Inventors: Vishal Vishnoi, Xin Xu, Srinivasa Phani Kumar Gadde, Fen Wang, Muruganantham Chinnananchi, Manish Parekh, Stephen Andrew McRitchie, Jae Min John, Crystal C. Pan, Gautam Singaraju, Saba Amsalu Teserra
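    A minimal sketch of the routing decision: if the current chatbot rejects the utterance, score the candidates and switch only when the best confidence clears a threshold. The keyword-overlap scorer below merely stands in for the patent's predictive model.
    ```python
    from dataclasses import dataclass, field

    @dataclass
    class Bot:
        name: str
        keywords: set = field(default_factory=set)

        def accepts(self, utterance):
            return any(k in utterance.lower() for k in self.keywords)

    def keyword_confidence(utterance, bot):
        """Stand-in for the predictive model: fraction of the bot's keywords present."""
        words = set(utterance.lower().split())
        return len(words & bot.keywords) / max(len(bot.keywords), 1)

    def route(utterance, current, bots, threshold=0.3):
        if current.accepts(utterance):          # still a valid input for the current bot
            return current
        scores = {b.name: keyword_confidence(utterance, b) for b in bots}
        best = max(bots, key=lambda b: scores[b.name])
        return best if scores[best.name] >= threshold else current

    billing = Bot("billing", {"invoice", "refund", "charge"})
    travel = Bot("travel", {"flight", "hotel", "booking"})
    active = route("I need a refund for this charge", travel, [billing, travel])
    print(active.name)   # "billing"
    ```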
  • Patent number: 12238451
    Abstract: Embodiments are disclosed for predicting, using neural networks, editing operations for application to a video sequence based on processing conversational messages by a video editing system. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including a video sequence and text sentences, the text sentences describing a modification to the video sequence, mapping, by a first neural network, content of the text sentences describing the modification to the video sequence to a candidate editing operation, processing, by a second neural network, the video sequence to predict parameter values for the candidate editing operation, and generating a modified video sequence by applying the candidate editing operation with the predicted parameter values to the video sequence.
    Type: Grant
    Filed: November 14, 2022
    Date of Patent: February 25, 2025
    Assignee: Adobe Inc.
    Inventors: Uttaran Bhattacharya, Gang Wu, Viswanathan Swaminathan, Stefano Petrangeli
  • Patent number: 12229516
    Abstract: A data processing system receives a natural language communication including an ordered sequence of a plurality of word spellings of a natural human language. A processor of the data processing system parses the plurality of word spellings of the natural language communication utilizing constraint-based parsing to identify a plurality of satisfied parsing constraints. The processor performs semantic analysis based on the plurality of satisfied parsing constraints. Performing semantic analysis includes obtaining at least mid-level comprehension of the natural language communication by identifying in the natural language communication utilizing constraints at least one of the following set: a clausal structure within the natural language communication, a sentence structure of a sentence in the natural language communication, an implied topic of the natural language communication, and a classical linguistic role in the natural language communication.
    Type: Grant
    Filed: June 19, 2023
    Date of Patent: February 18, 2025
    Inventor: Thomas A. Visel
  • Patent number: 12225284
    Abstract: Systems, methods, devices and non-transitory, computer-readable storage mediums are disclosed for a wearable multimedia device and cloud computing platform with an application ecosystem for processing multimedia data captured by the wearable multimedia device. In an embodiment, a method comprises: receiving, by one or more processors of a cloud computing platform, context data from a wearable multimedia device, the wearable multimedia device including at least one data capture device for capturing the context data; creating a data processing pipeline with one or more applications based on one or more characteristics of the context data and a user request; processing the context data through the data processing pipeline; and sending output of the data processing pipeline to the wearable multimedia device or other device for presentation of the output.
    Type: Grant
    Filed: March 27, 2023
    Date of Patent: February 11, 2025
    Assignee: Humane, Inc.
    Inventors: Imran A. Chaudhri, Bethany Bongiorno, Shahzad Chaudhri
  • Patent number: 12217755
    Abstract: A voice conversion device is provided with a linguistic information extraction unit that extracts linguistic information corresponding to utterance content from a conversion source voice signal, an appearance feature extraction unit that extracts appearance features expressing features related to the look of a person's face from a captured image of the person, and a converted voice generation unit that generates a converted voice on a basis of the linguistic information and the appearance features.
    Type: Grant
    Filed: September 4, 2020
    Date of Patent: February 4, 2025
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Hirokazu Kameoka, Ko Tanaka, Yasunori Oishi, Takuhiro Kaneko, Aaron Valero Puche
  • Patent number: 12217002
    Abstract: Apparatuses, systems, and techniques to parse textual data using parallel computing devices. In at least one embodiment, text is parsed by a plurality of parallel processing units using a finite state machine and logical stack to convert the text to a tree data structure. Data is extracted from the tree by the plurality of parallel processors and stored.
    Type: Grant
    Filed: May 11, 2022
    Date of Patent: February 4, 2025
    Assignee: NVIDIA Corporation
    Inventors: Elias Stehle, Gregory Michael Kimball
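    The FSM-plus-stack-to-tree idea, minus the parallelism, can be sketched in a few lines: each character drives a small state machine whose stack builds a nested-list tree. This shows the data structures only; distributing the parse across parallel processing units, as claimed, is the hard part and is not attempted here.
    ```python
    def parse_brackets(text):
        """Parse 'a,[b,[c,d],e],f' into a nested-list tree using a stack."""
        root, stack, buf = [], [], []

        def flush():
            if buf:
                (stack[-1] if stack else root).append("".join(buf))
                buf.clear()

        for ch in text:
            if ch == "[":                # state: descend into a child node
                flush()
                node = []
                (stack[-1] if stack else root).append(node)
                stack.append(node)
            elif ch == "]":              # state: close the current node
                flush()
                stack.pop()
            elif ch == ",":              # state: end of a token
                flush()
            else:                        # state: accumulate token characters
                buf.append(ch)
        flush()
        return root

    print(parse_brackets("a,[b,[c,d],e],f"))   # ['a', ['b', ['c', 'd'], 'e'], 'f']
    ```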
  • Patent number: 12197812
    Abstract: Systems, methods, and devices may generate speech files that reflect emotion of text-based content. An example process includes selecting a first text from a first source of text content and selecting a second text from a second source of text content. The first text and the second text are aggregated into an aggregated text, and the aggregated text includes a first emotion associated with content of the first text. The aggregated text also includes a second emotion associated with content of the second text. The aggregated text is converted into a speech stored in an audio file. The speech replicates human expression of the first emotion and of the second emotion.
    Type: Grant
    Filed: April 13, 2023
    Date of Patent: January 14, 2025
    Assignee: DISH Technologies L.L.C.
    Inventor: John C. Calef, III
  • Patent number: 12197862
    Abstract: Disclosed is a method for identifying a word corresponding to a target word in text information, which is performed by one or more processors of a computing device. The method may include: determining a target word; determining a threshold for an edit distance associated with the target word; determining a word of which the edit distance from the target word among words included in text information is equal to or less than the threshold; and identifying the word corresponding to the target word based on the determined word.
    Type: Grant
    Filed: November 3, 2022
    Date of Patent: January 14, 2025
    Assignee: ActionPower Corp.
    Inventor: Seongmin Park
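    The edit-distance machinery referenced above is standard Levenshtein dynamic programming. The sketch below also derives a threshold from the target word by scaling with its length, which is an illustrative choice since the abstract does not say how the threshold is determined.
    ```python
    def edit_distance(a, b):
        """Classic Levenshtein distance via single-row dynamic programming."""
        dp = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, cb in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                         dp[j - 1] + 1,      # insertion
                                         prev + (ca != cb))  # substitution
        return dp[-1]

    def find_matches(target, text, threshold=None):
        """Words in `text` whose edit distance from `target` is within threshold."""
        if threshold is None:
            threshold = max(1, len(target) // 4)   # illustrative: one edit per 4 chars
        return [w for w in text.split() if edit_distance(target, w) <= threshold]

    print(find_matches("recognition", "speech recogniton and wreck ignition systems"))
    # ['recogniton']  -- catches the misspelled transcript word
    ```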
  • Patent number: 12190872
    Abstract: An electronic apparatus includes: a memory storing one or more instructions; and a processor connected to the memory and configured to control the electronic apparatus, wherein the processor is configured, by executing the one or more instructions, to: identify a first intention word and a first target word from first speech, acquire second speech received after the first speech based on at least one of the identified first intention word or the identified first target word not matching a word stored in the memory, acquire a similarity between the first speech and the second speech, and acquire response information based on the first speech and the second speech based on the similarity being equal to or greater than a threshold value.
    Type: Grant
    Filed: April 5, 2022
    Date of Patent: January 7, 2025
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ali Ahmad Adel Al Massaeed, Abdelrahman Mustafa Mahmoud Yaseen, Shrouq Walid Jamil Sabah
  • Patent number: 12190018
    Abstract: A system and method for generating, triggering and playing a sequence of audio files with cues for delivering a presentation for a presenter using a personal audio device coupled to a computing device. The system comprises a computing device that is coupled to a presentation data analysis server through a network. The method includes (i) generating a sequence of audio files with cues for delivering a presentation, (ii) triggering playing of an audio file from the sequence of audio files, and (iii) playing the sequence of audio files one by one, on the computing device, using the personal audio device coupled to the computing device to enable the presenter to recall and speak the content based on the sequence of the audio files.
    Type: Grant
    Filed: January 4, 2021
    Date of Patent: January 7, 2025
    Inventor: Arjun Karthik Bala
  • Patent number: 12183320
    Abstract: A method for generating synthetic speech for text through a user interface is provided. The method may include receiving one or more sentences, determining a speech style characteristic for the received one or more sentences, and outputting a synthetic speech for the one or more sentences that reflects the determined speech style characteristic. The one or more sentences and the determined speech style characteristic may be inputted to an artificial neural network text-to-speech synthesis model and the synthetic speech may be generated based on the speech data outputted from the artificial neural network text-to-speech synthesis model.
    Type: Grant
    Filed: January 20, 2021
    Date of Patent: December 31, 2024
    Assignee: NEOSAPIENCE, INC.
    Inventors: Taesu Kim, Younggun Lee
  • Patent number: 12183340
    Abstract: Systems and methods of intent identification for customized dialogue support in virtual environments are provided. Dialogue intent models stored in memory may each specify one or more intents each associated with a dialogue filter. Input data may be received from a user device of a user. Such input data may be captured during an interactive session of an interactive title that provides a virtual environment to the user device. The input data may be analyzed based on the intent models in response to a detected dialogue trigger and may be determined to correspond to one of the stored intents. The dialogue filter associated with the determined intent may be applied to a plurality of available dialogue outputs associated with the detected dialogue trigger. A customized dialogue output may be generated in accordance with a filtered subset of the available dialogue outputs.
    Type: Grant
    Filed: July 21, 2022
    Date of Patent: December 31, 2024
    Assignees: SONY INTERACTIVE ENTERTAINMENT LLC, SONY INTERACTIVE ENTERTAINMENT INC.
    Inventors: Benaisha Patel, Alessandra Luizello, Olga Rudi
  • Patent number: 12175394
    Abstract: The present invention is directed to interactive training, and in particular, to methods and systems for computerized interactive skill training. An example embodiment provides a method and system for providing skill training using a computerized system. The computerized system receives a selection of a first training subject. Several related training components can be invoked, such as reading, watching, performing, and/or reviewing components. In addition, a scored challenge session is provided, wherein a training challenge is provided to a user via a terminal, optionally in video form.
    Type: Grant
    Filed: April 12, 2023
    Date of Patent: December 24, 2024
    Assignee: Breakthrough PerformanceTech, LLC
    Inventors: Martin L. Cohen, Edward G. Brown
  • Patent number: 12165647
    Abstract: A computer-implemented method is disclosed. A search query of a text transcription is received. The search query includes a word or words having a specified spelling. A sequence of search phonemes corresponding to the specified spelling is generated. A sequence of transcript phonemes corresponding to the text transcription is generated from the text transcription. A search alignment in which the sequence of search phonemes is aligned to a transcript phoneme fragment is generated. Based at least on the search alignment having a quality score exceeding a quality score threshold, the transcript phoneme fragment and an associated portion of the text transcription is determined to result from an utterance of the specified spelling in an audio session corresponding to the text transcription. A search result indicating that the transcript phoneme fragment and the associated portion of the text transcription is determined to have resulted from the utterance is output.
    Type: Grant
    Filed: May 27, 2022
    Date of Patent: December 10, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Yuchen Li
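    A minimal sketch of the phoneme-level search: slide the search phoneme sequence over the transcript phonemes and keep alignments whose quality score clears a threshold. Exact position matching keeps the sketch short; a real system would use proper sequence alignment, and the ARPAbet-style symbols are illustrative.
    ```python
    def phoneme_search(search, transcript, min_quality=0.8):
        """Score each alignment by the fraction of matching phoneme positions."""
        n = len(search)
        results = []
        for start in range(len(transcript) - n + 1):
            window = transcript[start:start + n]
            quality = sum(a == b for a, b in zip(search, window)) / n
            if quality >= min_quality:       # quality score exceeds the threshold
                results.append((start, quality))
        return results

    search_phonemes = ["K", "AE", "T"]                           # "cat"
    transcript_phonemes = ["DH", "AH", "K", "AE", "T", "S", "AE", "T"]
    print(phoneme_search(search_phonemes, transcript_phonemes, min_quality=0.6))
    # [(2, 1.0), (5, 0.666...)] -- an exact hit plus a near-miss ("sat")
    ```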
  • Patent number: 12154563
    Abstract: An electronic apparatus, based on a text sentence being input, obtains prosody information of the text sentence, segments the text sentence into a plurality of sentence elements, obtains a speech in which prosody information is reflected to each of the plurality of sentence elements in parallel by inputting the plurality of sentence elements and the prosody information of the text sentence to a text to speech (TTS) module, and merges the speech for the plurality of sentence elements that are obtained in parallel to output speech for the text sentence.
    Type: Grant
    Filed: February 24, 2022
    Date of Patent: November 26, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jonghoon Jeong, Hosang Sung, Doohwa Hong, Kyoungbo Min, Eunmi Oh, Kihyun Choo
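    The parallel-synthesize-then-merge flow can be sketched with a thread pool; `synthesize_element` merely stands in for the TTS module, and comma splitting stands in for the patent's sentence-element segmentation.
    ```python
    from concurrent.futures import ThreadPoolExecutor

    def synthesize_element(element, prosody):
        """Stand-in TTS module: fake 'audio' for one element, prosody applied."""
        return f"<audio element={element!r} pitch={prosody['pitch']}>".encode()

    def speak(sentence, prosody):
        elements = sentence.split(", ")          # naive segmentation into elements
        with ThreadPoolExecutor() as pool:       # synthesize elements in parallel
            clips = list(pool.map(lambda e: synthesize_element(e, prosody), elements))
        return b"".join(clips)                   # merge in original order

    audio = speak("Good morning, here is today's weather", {"pitch": "rising"})
    print(audio)
    ```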
  • Patent number: 12149781
    Abstract: Systems and methods for determining whether a first electronic device detects a media item that is to be output by a second electronic device are described herein. In some embodiments, an individual may request, using a first electronic device, that a media item be played on a second electronic device. The backend system may send first audio data representing a first response to the first electronic device, along with instructions to delay outputting the first response, as well as to continue sending audio data of additional audio captured thereby. The backend system may also send second audio data representing a second response to the second electronic device along with the media item. Text data may be generated representing the captured audio, which may then be compared with text data representing the second response to determine whether or not they match.
    Type: Grant
    Filed: September 16, 2022
    Date of Patent: November 19, 2024
    Assignee: Amazon Technologies, Inc.
    Inventor: Dennis Francis Cwik
  • Patent number: 12147646
    Abstract: Disclosed herein are system, method, and computer program product embodiments for unifying graphical user interface (GUI) displays across different device types. In an embodiment, a unification system may convert various GUI views appearing on, for example, a desktop device into a GUI view on a mobile device. Both devices may be accessing the same application and/or may use a cloud computing platform to access the application. The unification system may aid in reproducing GUI modifications performed on one user device onto other user devices. In this manner, the unification system may maintain a consistent look-and-feel for a user across different computing device types.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: November 19, 2024
    Assignee: Salesforce, Inc.
    Inventors: Eric Jacobson, Michael Gonzalez, Wayne Cho, Adheip Varadarajan, John Vollmer, Benjamin Snyder
  • Patent number: 12142257
    Abstract: Systems and methods are provided for providing emotion-based text to speech. The systems and methods perform operations comprising accessing a text string; storing a plurality of embeddings associated with a plurality of speakers, a first embedding for a first speaker being associated with a first emotion and a second embedding for a second speaker of the plurality of speakers being associated with a second emotion; selecting the first speaker to speak one or more words of the text string; determining that the one or more words are associated with the second emotion; generating, based on the first embedding and the second embedding, a third embedding for the first speaker associated with the second emotion; and applying the third embedding and the text string to a vocoder to generate an audio stream comprising the one or more words being spoken by the first speaker with the second emotion.
    Type: Grant
    Filed: February 8, 2022
    Date of Patent: November 12, 2024
    Assignee: SNAP INC.
    Inventors: Liron Harazi, Jacob Assa, Alan Bekker
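    The abstract says the third embedding is generated from the first and second but not how. One illustrative reading, purely an assumption, is that an embedding splits into a speaker-identity part and an emotion part that can be spliced:
    ```python
    import numpy as np

    def combine(first_embedding, second_embedding):
        """Splice speaker A's identity half onto speaker B's emotion half.

        The patent does not disclose the combination rule; this split is
        an illustrative assumption, not the claimed method.
        """
        half = len(first_embedding) // 2
        return np.concatenate([first_embedding[:half],    # speaker A identity
                               second_embedding[half:]])  # speaker B's emotion

    rng = np.random.default_rng(0)
    emb_a_joy = rng.standard_normal(64)        # first speaker, first emotion
    emb_b_sad = rng.standard_normal(64)        # second speaker, second emotion
    emb_a_sad = combine(emb_a_joy, emb_b_sad)  # first speaker, second emotion
    ```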
  • Patent number: 12142256
    Abstract: An information processing device according to embodiments includes a communication unit configured to receive audio data of content and text data corresponding to the audio data, an audio data reproduction unit configured to perform reproduction of the audio data, a text data reproduction unit configured to perform the reproduction by audio synthesis of the text data, and a controller that controls the reproduction of the audio data or the text data. The controller causes the text data reproduction unit to perform the reproduction of the text data when the audio data reproduction unit is unable to perform the reproduction of the audio data.
    Type: Grant
    Filed: October 20, 2023
    Date of Patent: November 12, 2024
    Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventor: Jun Tsukamoto
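    The fallback logic is simple to sketch: try the received audio, and on failure reproduce the accompanying text via speech synthesis. `play_audio` and `synthesize_text` are hypothetical stand-ins for the device's reproduction units.
    ```python
    def reproduce(content):
        """Play the audio data if possible, otherwise synthesize the text."""
        try:
            return play_audio(content["audio"])
        except (KeyError, RuntimeError):          # audio missing or unplayable
            return synthesize_text(content["text"])

    def play_audio(audio):
        if not audio:
            raise RuntimeError("audio stream unavailable")
        return "played original audio"

    def synthesize_text(text):
        return f"synthesized: {text}"

    print(reproduce({"audio": b"", "text": "Traffic report: road closed ahead."}))
    # synthesized: Traffic report: road closed ahead.
    ```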
  • Patent number: 12136259
    Abstract: A method for training a neural network, including: determining a neural network; training the neural network at a first learning rate according to a first optimization mode, where the first learning rate is updated each time the neural network is trained; mapping the first learning rate of the first optimization mode to a second learning rate of a second optimization mode in the same vector space; determining the second learning rate satisfies a preset update condition; and continuing to train the neural network at the second learning rate according to the second optimization mode.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: November 5, 2024
    Assignee: BIGO TECHNOLOGY PTE. LTD.
    Inventors: Wei Xiang, Chao Pei
  • Patent number: 12100386
    Abstract: A speech processing system for generating translated speech, the system comprising: an input for receiving a first speech signal comprising a second language; an output for outputting a second speech signal comprising a first language; and a processor configured to: generate a first text signal from a segment of the first speech signal, the first text signal comprising the second language; generate a second text signal from the first text signal, the second text signal comprising the first language; extract a plurality of first feature vectors from the segment of the first speech signal, wherein the first feature vectors comprise information relating to audio data corresponding to the segment of the first speech signal; generate a speaker vector using a first trained algorithm taking one or more of the first feature vectors as input, wherein the speaker vector represents a set of features corresponding to a speaker; generate a second speech signal segment using a second trained algorithm taking information re
    Type: Grant
    Filed: March 13, 2019
    Date of Patent: September 24, 2024
    Assignee: Papercup Technologies Limited
    Inventor: Jiameng Gao
  • Patent number: 12096068
    Abstract: A method and a device are provided for enabling spatio-temporal navigation of content. In response to a request, from a client, for a content comprising a first content of a first type, the device transmits to the client: the first content; a second content of a second type generated from the first content; synchronization metadata associating each word of one of the contents with a time marker into the other content; and a script, for execution at the client, configured to re-establish at least one of the contents and offer a navigation of said content depending on the type of content re-established by using at least one among: the first content, the second content, and the synchronization metadata.
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: September 17, 2024
    Assignee: ORANGE
    Inventors: Fabrice Boudin, Frederic Herledan
  • Patent number: 12087270
    Abstract: Techniques for generating customized synthetic voices personalized to a user, based on user-provided feedback, are described. A system may determine embedding data representing a user-provided description of a desired synthetic voice and profile data associated with the user, and generate synthetic voice embedding data using synthetic voice embedding data corresponding to a profile associated with a user determined to be similar to the current user. Based on user-provided feedback with respect to a customized synthetic voice, generated using synthetic voice characteristics corresponding to the synthetic voice embedding data and presented to the user, and the synthetic voice embedding data, the system may generate new synthetic voice embedding data, corresponding to a new customized synthetic voice. The system may be configured to assign the customized synthetic voice to the user, such that a subsequent user may not be presented with the same customized synthetic voice.
    Type: Grant
    Filed: September 29, 2022
    Date of Patent: September 10, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Sebastian Dariusz Cygert, Daniel Korzekwa, Kamil Pokora, Piotr Tadeusz Bilinski, Kayoko Yanagisawa, Abdelhamid Ezzerg, Thomas Edward Merritt, Raghu Ram Sreepada Srinivas, Nikhil Sharma
  • Patent number: 12087273
    Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
    Type: Grant
    Filed: January 30, 2023
    Date of Patent: September 10, 2024
    Assignee: Google LLC
    Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
  • Patent number: 12087272
    Abstract: A method (800) of training a text-to-speech (TTS) model (108) includes obtaining training data (150) including reference input text (104) that includes a sequence of characters, a sequence of reference audio features (402) representative of the sequence of characters, and a sequence of reference phone labels (502) representative of distinct speech sounds of the reference audio features. For each of a plurality of time steps, the method includes generating a corresponding predicted audio feature (120) based on a respective portion of the reference input text for the time step and generating, using a phone label mapping network (510), a corresponding predicted phone label (520) associated with the predicted audio feature. The method also includes aligning the predicted phone label with the reference phone label to determine a corresponding predicted phone label loss (622) and updating the TTS model based on the corresponding predicted phone label loss.
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: September 10, 2024
    Assignee: Google LLC
    Inventors: Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Yu Zhang
  • Patent number: 12076850
    Abstract: A monitor is installed in an eye of a robot, and an eye image is displayed on the monitor. The robot extracts a feature quantity of an eye of a user from a captured image of the user. The feature quantity of the eye of the user is reflected in the eye image. For example, a feature quantity may be the size of a pupillary region and a pupil image, or the form of an eyelid image. A blinking frequency or the like may also be reflected as a feature quantity. Familiarity with respect to each user is set, and whose feature quantity is to be reflected may be determined in accordance with the familiarity.
    Type: Grant
    Filed: April 7, 2023
    Date of Patent: September 3, 2024
    Assignee: GROOVE X, INC.
    Inventor: Kaname Hayashi
  • Patent number: 12080270
    Abstract: An apparatus for synthesizing speech according to an embodiment is a computing apparatus that includes one or more processors and a memory storing one or more programs executed by the one or more processors. The apparatus for synthesizing speech includes a pre-processing module that marks a preset classification symbol on each unit text input; and a speech synthesis module that receives each unit text marked with the classification symbol and synthesizes speech uttering the unit text based on the input unit text.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: September 3, 2024
    Assignee: DEEPBRAIN AI INC.
    Inventors: Gyeongsu Chae, Dalhyun Kim
  • Patent number: 12080272
    Abstract: A method (400) for representing an intended prosody in synthesized speech includes receiving a text utterance (310) having at least one word (240), and selecting an utterance embedding (204) for the text utterance. Each word in the text utterance has at least one syllable (230) and each syllable has at least one phoneme (220). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration (238) of the syllable by decoding a prosodic syllable embedding (232, 234) for the syllable based on attention by an attention mechanism (340) to linguistic features (222) of each phoneme of the syllable and generating a plurality of fixed-length predicted frames (260) based on the predicted duration for the syllable.
    Type: Grant
    Filed: December 10, 2019
    Date of Patent: September 3, 2024
    Assignee: Google LLC
    Inventors: Robert Clark, Chun-an Chan, Vincent Wan
  • Patent number: 12080089
    Abstract: A computer-implemented method, a computer system and a computer program product enhance machine translation of a document. The method includes capturing an image of the document. The document includes a plurality of characters that are arranged in a character layout. The method also includes classifying the image by a document type based on the character layout. The method further includes determining a strategy for an intelligent character recognition (ICR) algorithm with the image based on the character layout of the image. Lastly, the method includes generating a translated document by applying the intelligent character recognition (ICR) algorithm to the plurality of characters in the image using the strategy. The translated document includes a plurality of translated characters that are arranged in the character layout.
    Type: Grant
    Filed: December 8, 2021
    Date of Patent: September 3, 2024
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Barton Wayne Emanuel, Nadiya Kochura, Su Liu, Tetsuya Shimada
  • Patent number: 12073838
    Abstract: A speech-processing system may provide access to multiple virtual assistants via one or more voice-controlled devices. Each assistant may leverage language processing and language generation features of the speech-processing system, while handling different commands and/or providing access to different back applications. Each assistant may be associated with its own voice and/or speech style, and thus be perceived as having a particular “personality.” Different assistants may be available for use with a particular voice-controlled device based on time, location, the particular user, etc. In some situations, language processing may be improved by leveraging data such as intent data, grammars, lexicons, entities, etc., associated with assistants available for use with the particular voice-controlled device.
    Type: Grant
    Filed: December 7, 2020
    Date of Patent: August 27, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Naveen Bobbili, David Henry, Mark Vincent Mattione, Richard Du, Jyoti Chhabra
  • Patent number: 12067130
    Abstract: The disclosed exemplary embodiments include computer-implemented systems, devices, apparatuses, and processes that maintain data confidentiality in communications involving voice-enabled devices in a distributed computing environment using homomorphic encryption. By way of example, an apparatus may receive encrypted command data from a computing system, decrypt the encrypted command data using a homomorphic private key, and perform operations that associate the decrypted command data with a request for an element of data. Using a public cryptographic key associated with a device, the apparatus generates an encrypted response that includes the requested data element, and transmits the encrypted response to the device. The device may decrypt the encrypted response using a private cryptographic key and perform operations that present first audio content representative of the requested data element through an acoustic interface.
    Type: Grant
    Filed: November 12, 2021
    Date of Patent: August 20, 2024
    Assignee: The Toronto-Dominion Bank
    Inventors: Alexey Shpurov, Milos Dunjic, Brian Andrew Lam
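    The homomorphic-encryption step can be demonstrated with the third-party python-paillier package (`pip install phe`); the second leg of the flow, re-encrypting the response under the device's own public key, is omitted. This illustrates the cryptographic primitive only, not the patented protocol.
    ```python
    from phe import paillier

    # The voice platform holds the homomorphic keypair.
    pub, priv = paillier.generate_paillier_keypair(n_length=1024)

    # A computing system sends an encrypted command parameter (e.g., an amount).
    # Paillier is additively homomorphic, so ciphertexts can be combined
    # without ever exposing the plaintext values.
    encrypted_amount = pub.encrypt(250)
    encrypted_with_fee = encrypted_amount + pub.encrypt(3)

    # The apparatus decrypts with its homomorphic private key; it would then
    # wrap the requested data element in a device-keyed encrypted response.
    amount = priv.decrypt(encrypted_with_fee)
    print(amount)   # 253
    ```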
  • Patent number: 12062357
    Abstract: A method of registering an attribute in a speech synthesis model, an apparatus of registering an attribute in a speech synthesis model, an electronic device, and a medium are provided, which relate to the field of artificial intelligence technologies such as deep learning and intelligent speech. The method includes: acquiring a plurality of data associated with an attribute to be registered; and registering the attribute in the speech synthesis model by using the plurality of data associated with the attribute, wherein the speech synthesis model is trained in advance by using training data in a training data set.
    Type: Grant
    Filed: November 16, 2021
    Date of Patent: August 13, 2024
    Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Wenfu Wang, Xilei Wang, Tao Sun, Han Yuan, Zhengkun Gao, Lei Jia
  • Patent number: 12057106
    Abstract: A method of authorizing content for use, e.g., in association with a conversational bot. The method begins by configuring a conversational bot using a machine learning model trained to classify utterances into topics. Utterances that are not recognized by the machine learning model (e.g., according to some configurable threshold) are then identified. Using a clustering algorithm, one or more of the identified utterances are then processed into a grouping. Information identifying a topic associated with the grouping is then received and, in response, the machine learning model is updated to include the topic.
    Type: Grant
    Filed: March 15, 2023
    Date of Patent: August 6, 2024
    Assignee: Drift.com, Inc.
    Inventors: Maria C. Moya, Natalie Duerr, Jeffrey D. Orkin, Jane S. Taschman, Carolina M. Caprile, Christopher M. Ward
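    The cluster-then-label loop can be sketched with off-the-shelf tools; TF-IDF plus k-means below stands in for whatever embedding and clustering algorithm the patented system uses, and the example utterances are invented.
    ```python
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Utterances the deployed bot failed to classify above its confidence threshold.
    unrecognized = [
        "can I change my billing date",
        "move my payment date please",
        "do you integrate with Salesforce",
        "is there a Salesforce connector",
    ]

    # Group the unrecognized utterances so a human can name each grouping's topic.
    vectors = TfidfVectorizer().fit_transform(unrecognized)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

    for cluster in set(labels):
        group = [u for u, l in zip(unrecognized, labels) if l == cluster]
        print(f"cluster {cluster}: {group}")
    # A reviewer would label each grouping (e.g. 'billing-date-change'), and the
    # topic classification model would then be updated to include the new topic.
    ```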