Image To Speech Patents (Class 704/260)
  • Patent number: 12238451
    Abstract: Embodiments are disclosed for predicting, using neural networks, editing operations for application to a video sequence based on processing conversational messages by a video editing system. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including a video sequence and text sentences, the text sentences describing a modification to the video sequence, mapping, by a first neural network, content of the text sentences describing the modification to the video sequence to a candidate editing operation, processing, by a second neural network, the video sequence to predict parameter values for the candidate editing operation, and generating a modified video sequence by applying the candidate editing operation with the predicted parameter values to the video sequence.
    Type: Grant
    Filed: November 14, 2022
    Date of Patent: February 25, 2025
    Assignee: Adobe Inc.
    Inventors: Uttaran Bhattacharya, Gang Wu, Viswanathan Swaminathan, Stefano Petrangeli
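A minimal, runnable sketch of the two-stage flow this abstract describes: a first model maps the request text to a candidate editing operation, and a second predicts that operation's parameter values from the video. The keyword classifier, brightness heuristic, and operation name below are toy stand-ins for the patent's neural networks, not Adobe's method.

```python
from dataclasses import dataclass

@dataclass
class EditOperation:
    name: str     # candidate editing operation
    params: dict  # predicted parameter values

def map_text_to_operation(request_text: str) -> str:
    # Stand-in for the first neural network (text -> candidate operation).
    return "adjust_brightness" if "bright" in request_text.lower() else "no_op"

def predict_parameters(frames, op_name: str) -> dict:
    # Stand-in for the second neural network (video -> parameter values).
    if op_name == "adjust_brightness":
        mean = sum(sum(f) / len(f) for f in frames) / len(frames)
        return {"gain": 128.0 / max(mean, 1.0)}  # push mean toward mid-gray
    return {}

def edit_video(frames, request_text: str):
    name = map_text_to_operation(request_text)
    op = EditOperation(name, predict_parameters(frames, name))
    if op.name == "adjust_brightness":
        frames = [[min(255.0, p * op.params["gain"]) for p in f] for f in frames]
    return frames, op

frames = [[40.0, 60.0, 80.0], [50.0, 70.0, 90.0]]  # toy 1-D "frames"
print(edit_video(frames, "make the clip brighter")[1])
```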
  • Patent number: 12229516
    Abstract: A data processing system receives a natural language communication including an ordered sequence of a plurality of word spellings of a natural human language. A processor of the data processing system parses the plurality of word spellings of the natural language communication utilizing constraint-based parsing to identify a plurality of satisfied parsing constraints. The processor performs semantic analysis based on the plurality of satisfied parsing constraints. Performing semantic analysis includes obtaining at least mid-level comprehension of the natural language communication by identifying in the natural language communication, utilizing constraints, at least one of the following: a clausal structure within the natural language communication, a sentence structure of a sentence in the natural language communication, an implied topic of the natural language communication, and a classical linguistic role in the natural language communication.
    Type: Grant
    Filed: June 19, 2023
    Date of Patent: February 18, 2025
    Inventor: Thomas A. Visel
  • Patent number: 12225284
    Abstract: Systems, methods, devices and non-transitory, computer-readable storage mediums are disclosed for a wearable multimedia device and cloud computing platform with an application ecosystem for processing multimedia data captured by the wearable multimedia device. In an embodiment, a method comprises: receiving, by one or more processors of a cloud computing platform, context data from a wearable multimedia device, the wearable multimedia device including at least one data capture device for capturing the context data; creating a data processing pipeline with one or more applications based on one or more characteristics of the context data and a user request; processing the context data through the data processing pipeline; and sending output of the data processing pipeline to the wearable multimedia device or other device for presentation of the output.
    Type: Grant
    Filed: March 27, 2023
    Date of Patent: February 11, 2025
    Assignee: Humane, Inc.
    Inventors: Imran A. Chaudhri, Bethany Bongiorno, Shahzad Chaudhri
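A hedged sketch of the pipeline idea in this abstract: the platform inspects characteristics of the incoming context data and the user request, then chains the matching applications into a data processing pipeline. The registry, predicates, and handler names are illustrative assumptions, not Humane's actual API.

```python
# Each "application" is a pipeline stage; registry predicates decide
# which stages join the pipeline for a given context payload and request.
def transcribe(ctx):
    ctx["transcript"] = f"text({ctx['audio']})"
    return ctx

def ocr(ctx):
    ctx["ocr_text"] = f"ocr({ctx['image']})"
    return ctx

def summarize(ctx):
    ctx["summary"] = "summary of " + ctx.get("transcript", ctx.get("ocr_text", ""))
    return ctx

REGISTRY = [
    (lambda ctx, req: "audio" in ctx, transcribe),
    (lambda ctx, req: "image" in ctx, ocr),
    (lambda ctx, req: "summar" in req.lower(), summarize),
]

def build_pipeline(context, request):
    return [app for matches, app in REGISTRY if matches(context, request)]

def run(context, request):
    for app in build_pipeline(context, request):
        context = app(context)
    return context  # the output would be sent back to the device

print(run({"audio": "raw-bytes"}, "summarize what I just heard"))
```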
  • Patent number: 12217755
    Abstract: A voice conversion device is provided with a linguistic information extraction unit that extracts linguistic information corresponding to utterance content from a conversion source voice signal, an appearance feature extraction unit that extracts appearance features expressing features related to the look of a person's face from a captured image of the person, and a converted voice generation unit that generates a converted voice on a basis of the linguistic information and the appearance features.
    Type: Grant
    Filed: September 4, 2020
    Date of Patent: February 4, 2025
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Hirokazu Kameoka, Ko Tanaka, Yasunori Oishi, Takuhiro Kaneko, Aaron Valero Puche
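A sketch of the conditioning structure this abstract describes: linguistic content extracted from the source speech is combined with an appearance embedding extracted from a face image, and a generator produces converted speech features. All three networks are stubbed with random projections; the shapes and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W_gen = rng.standard_normal((80, 256 + 64))  # toy generator weights

def extract_linguistic(source_speech: np.ndarray) -> np.ndarray:
    return rng.standard_normal((source_speech.shape[0], 256))  # stub encoder

def extract_appearance(face_image: np.ndarray) -> np.ndarray:
    return rng.standard_normal(64)  # stub face-feature encoder

def convert_voice(source_speech, face_image):
    content = extract_linguistic(source_speech)            # (T, 256)
    style = extract_appearance(face_image)                 # (64,)
    style_t = np.broadcast_to(style, (content.shape[0], 64))
    features = np.concatenate([content, style_t], axis=1)  # (T, 320)
    return features @ W_gen.T                              # toy mel frames (T, 80)

mel = convert_voice(np.zeros((100, 1)), np.zeros((64, 64)))
print(mel.shape)  # (100, 80)
```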
  • Patent number: 12217002
    Abstract: Apparatuses, systems, and techniques to parse textual data using parallel computing devices. In at least one embodiment, text is parsed by a plurality of parallel processing units using a finite state machine and logical stack to convert the text to a tree data structure. Data is extracted from the tree by the plurality of parallel processors and stored.
    Type: Grant
    Filed: May 11, 2022
    Date of Patent: February 4, 2025
    Assignee: NVIDIA Corporation
    Inventors: Elias Stehle, Gregory Michael Kimball
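A single-threaded toy of the finite-state-machine-plus-logical-stack idea: walk the characters of a nested expression, push on '(' and pop on ')', building a tree. The patent distributes this across parallel processing units on GPUs; this sketch only illustrates the data structures.

```python
def parse_to_tree(text: str):
    root = {"children": [], "token": ""}
    stack = [root]                       # the "logical stack"
    for ch in text:                      # the FSM: one transition per char class
        node = stack[-1]
        if ch == "(":
            child = {"children": [], "token": ""}
            node["children"].append(child)
            stack.append(child)
        elif ch == ")":
            stack.pop()
        else:
            node["token"] += ch          # accumulate sibling text in this node
    return root

print(parse_to_tree("a(b(c)d)e"))
```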
  • Patent number: 12197862
    Abstract: Disclosed is a method for identifying a word corresponding to a target word in text information, which is performed by one or more processors of a computing device. The method may include: determining a target word; determining a threshold for an edit distance associated with the target word; determining a word of which the edit distance from the target word among words included in text information is equal to or less than the threshold; and identifying the word corresponding to the target word based on the determined word.
    Type: Grant
    Filed: November 3, 2022
    Date of Patent: January 14, 2025
    Assignee: ActionPower Corp.
    Inventor: Seongmin Park
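A runnable sketch of the matching step: compute the Levenshtein edit distance between the target word and each word of the text, keeping words at or below the threshold. The length-scaled threshold heuristic is an assumption; the patent only says a threshold associated with the target word is determined.

```python
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def find_matches(target: str, text: str):
    threshold = max(1, len(target) // 4)   # assumed length-scaled threshold
    return [w for w in text.split()
            if edit_distance(target.lower(), w.lower()) <= threshold]

print(find_matches("transformer", "the transfromer model uses attention"))
# ['transfromer']  (distance 2 <= threshold 2)
```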
  • Patent number: 12197812
    Abstract: Systems, methods, and devices may generate speech files that reflect emotion of text-based content. An example process includes selecting a first text from a first source of text content and selecting a second text from a second source of text content. The first text and the second text are aggregated into an aggregated text, and the aggregated text includes a first emotion associated with content of the first text. The aggregated text also includes a second emotion associated with content of the second text. The aggregated text is converted into a speech stored in an audio file. The speech replicates human expression of the first emotion and of the second emotion.
    Type: Grant
    Filed: April 13, 2023
    Date of Patent: January 14, 2025
    Assignee: DISH Technologies L.L.C.
    Inventor: John C. Calef, III
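A sketch of the aggregation flow: texts from two sources are tagged with emotions inferred from their content, aggregated in order, and rendered segment by segment with emotion-specific synthesis settings. The keyword-based emotion detector and per-emotion speaking rates are assumptions for illustration.

```python
SAD, NEUTRAL, EXCITED = "sad", "neutral", "excited"

def detect_emotion(text: str) -> str:
    # Stand-in for per-source emotion analysis of the content.
    if any(w in text.lower() for w in ("crash", "loss", "tragic")):
        return SAD
    return EXCITED if text.endswith("!") else NEUTRAL

def aggregate(sources):
    # Keep (text, emotion) pairs so synthesis can switch styles mid-stream.
    return [(text, detect_emotion(text)) for text in sources]

RATE = {SAD: 140, NEUTRAL: 170, EXCITED: 200}  # words/min per emotion

def synthesize(aggregated):
    # A real system would render expressive audio into a file here; this
    # only shows the per-segment style switch driven by the tags.
    for text, emotion in aggregated:
        print(f"[{emotion:8s} rate={RATE[emotion]}] {text}")

synthesize(aggregate(["Markets fell after the crash.", "The home team won!"]))
```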
  • Patent number: 12190018
    Abstract: A system and method for generating, triggering and playing a sequence of audio files with cues for delivering a presentation for a presenter using a personal audio device coupled to a computing device. The system comprises a computing device that is coupled to a presentation data analysis server through a network. The method includes (i) generating a sequence of audio files with cues for delivering a presentation, (ii) triggering playing an audio file from the sequence of audio files, and (iii) playing the sequence of audio files one by one, on the computing device, using the personal audio device coupled to the computing device, to enable the presenter to recall and speak the content based on the sequence of the audio files.
    Type: Grant
    Filed: January 4, 2021
    Date of Patent: January 7, 2025
    Inventor: Arjun Karthik Bala
  • Patent number: 12190872
    Abstract: An electronic apparatus includes: a memory storing one or more instructions; and a processor connected to the memory and configured to control the electronic apparatus, wherein the processor is configured, by executing the one or more instructions, to: identify a first intention word and a first target word from first speech, acquire second speech received after the first speech based on at least one of the identified first intention word or the identified first target word not matching a word stored in the memory, acquire a similarity between the first speech and the second speech, and acquire response information based on the first speech and the second speech based on the similarity being a threshold value or more.
    Type: Grant
    Filed: April 5, 2022
    Date of Patent: January 7, 2025
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ali Ahmad Adel Al Massaeed, Abdelrahman Mustafa Mahmoud Yaseen, Shrouq Walid Jamil Sabah
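A sketch of the control flow in this abstract: when the intention or target word of a first utterance is out of vocabulary, a second utterance is acquired, and the two are fused only if sufficiently similar. The Jaccard word-overlap similarity and the 0.5 threshold are assumptions.

```python
KNOWN_WORDS = {"play", "stop", "music", "lights", "alarm"}

def parse(utterance: str):
    words = utterance.lower().split()
    intent = words[0] if words else ""
    target = words[-1] if len(words) > 1 else ""
    return intent, target

def similarity(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def handle(first: str, get_second):
    intent, target = parse(first)
    if intent in KNOWN_WORDS and target in KNOWN_WORDS:
        return f"respond({first})"
    second = get_second()                      # acquire follow-up speech
    if similarity(first, second) >= 0.5:       # assumed threshold
        return f"respond({first} + {second})"  # fuse both utterances
    return "ask user to repeat"

print(handle("play some musik", lambda: "play some music"))
```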
  • Patent number: 12183340
    Abstract: Systems and methods of intent identification for customized dialogue support in virtual environments are provided. Dialogue intent models stored in memory may each specify one or more intents each associated with a dialogue filter. Input data may be received from a user device of a user. Such input data may be captured during an interactive session of an interactive title that provides a virtual environment to the user device. The input data may be analyzed based on the intent models in response to a detected dialogue trigger and may be determined to correspond to one of the stored intents. The dialogue filter associated with the determined intent may be applied to a plurality of available dialogue outputs associated with the detected dialogue trigger. A customized dialogue output may be generated in accordance with a filtered subset of the available dialogue outputs.
    Type: Grant
    Filed: July 21, 2022
    Date of Patent: December 31, 2024
    Assignees: SONY INTERACTIVE ENTERTAINMENT LLC, SONY INTERACTIVE ENTERTAINMENT INC.
    Inventors: Benaisha Patel, Alessandra Luizello, Olga Rudi
  • Patent number: 12183320
    Abstract: A method for generating synthetic speech for text through a user interface is provided. The method may include receiving one or more sentences, determining a speech style characteristic for the received one or more sentences, and outputting a synthetic speech for the one or more sentences that reflects the determined speech style characteristic. The one or more sentences and the determined speech style characteristic may be inputted to an artificial neural network text-to-speech synthesis model and the synthetic speech may be generated based on the speech data outputted from the artificial neural network text-to-speech synthesis model.
    Type: Grant
    Filed: January 20, 2021
    Date of Patent: December 31, 2024
    Assignee: NEOSAPIENCE, INC.
    Inventors: Taesu Kim, Younggun Lee
  • Patent number: 12175394
    Abstract: The present invention is directed to interactive training, and in particular, to methods and systems for computerized interactive skill training. An example embodiment provides a method and system for providing skill training using a computerized system. The computerized system receives a selection of a first training subject. Several related training components can be invoked, such as reading, watching, performing, and/or reviewing components. In addition, a scored challenge session is provided, wherein a training challenge is provided to a user via a terminal, optionally in video form.
    Type: Grant
    Filed: April 12, 2023
    Date of Patent: December 24, 2024
    Assignee: Breakthrough PerformanceTech, LLC
    Inventors: Martin L. Cohen, Edward G. Brown
  • Patent number: 12165647
    Abstract: A computer-implemented method is disclosed. A search query of a text transcription is received. The search query includes a word or words having a specified spelling. A sequence of search phonemes corresponding to the specified spelling is generated. A sequence of transcript phonemes corresponding to the text transcription is generated from the text transcription. A search alignment in which the sequence of search phonemes is aligned to a transcript phoneme fragment is generated. Based at least on the search alignment having a quality score exceeding a quality score threshold, the transcript phoneme fragment and an associated portion of the text transcription is determined to result from an utterance of the specified spelling in an audio session corresponding to the text transcription. A search result indicating that the transcript phoneme fragment and the associated portion of the text transcription is determined to have resulted from the utterance is output.
    Type: Grant
    Filed: May 27, 2022
    Date of Patent: December 10, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Yuchen Li
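A sketch of the phoneme-level search: the query and transcript fragments are converted to phoneme strings, the query is aligned against each fragment, and fragments whose score clears a threshold are returned. The vowel-dropping stand-in for grapheme-to-phoneme conversion and the normalized-edit-distance quality score are illustrative assumptions.

```python
def g2p(text: str) -> str:
    # Crude stand-in for a grapheme-to-phoneme front end.
    drop = set("aeiou ")
    return "".join(c for c in text.lower() if c not in drop)

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def search(query: str, transcript: str, threshold: float = 0.7):
    q = g2p(query)
    words = transcript.split()
    hits = []
    for i in range(len(words)):
        for j in range(i + 1, min(i + 3, len(words)) + 1):  # up to 3-word fragments
            frag = " ".join(words[i:j])
            f = g2p(frag)
            score = 1.0 - edit_distance(q, f) / max(len(q), len(f), 1)
            if score >= threshold:
                hits.append((score, frag))
    return sorted(hits, reverse=True)

print(search("speech", "a speach about robots"))
```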
  • Patent number: 12154563
    Abstract: An electronic apparatus, based on a text sentence being input, obtains prosody information of the text sentence, segments the text sentence into a plurality of sentence elements, obtains speech in which the prosody information is reflected for each of the plurality of sentence elements in parallel by inputting the plurality of sentence elements and the prosody information of the text sentence to a text-to-speech (TTS) module, and merges the speech for the plurality of sentence elements obtained in parallel to output speech for the text sentence.
    Type: Grant
    Filed: February 24, 2022
    Date of Patent: November 26, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jonghoon Jeong, Hosang Sung, Doohwa Hong, Kyoungbo Min, Eunmi Oh, Kihyun Choo
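A sketch of the parallelization this abstract describes: split the sentence into elements, synthesize each element concurrently while passing along sentence-level prosody, then merge the clips in element order. The per-element synthesizer and prosody dictionary are stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def sentence_prosody(sentence: str) -> dict:
    return {"pitch_contour": "falling" if sentence.endswith(".") else "rising"}

def synth_element(element: str, prosody: dict) -> bytes:
    # Stand-in for a TTS module call conditioned on sentence-level prosody.
    return f"<audio:{element}|{prosody['pitch_contour']}>".encode()

def tts_parallel(sentence: str) -> bytes:
    prosody = sentence_prosody(sentence)
    elements = sentence.split(", ")          # naive segmentation
    with ThreadPoolExecutor() as pool:
        clips = list(pool.map(lambda e: synth_element(e, prosody), elements))
    return b"".join(clips)                   # merge in element order

print(tts_parallel("open the window, then water the plants."))
```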
  • Patent number: 12147646
    Abstract: Disclosed herein are system, method, and computer program product embodiments for unifying graphical user interface (GUI) displays across different device types. In an embodiment, a unification system may convert various GUI views appearing on, for example, a desktop device into a GUI view on a mobile device. Both devices may be accessing the same application and/or may use a cloud computing platform to access the application. The unification system may aid in reproducing GUI modifications performed on one user device onto other user devices. In this manner, the unification system may maintain a consistent look-and-feel for a user across different computing device types.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: November 19, 2024
    Assignee: Salesforce, Inc.
    Inventors: Eric Jacobson, Michael Gonzalez, Wayne Cho, Adheip Varadarajan, John Vollmer, Benjamin Snyder
  • Patent number: 12149781
    Abstract: Systems and methods for determining whether a first electronic device detects a media item that is to be output by a second electronic device are described herein. In some embodiments, an individual may request, using a first electronic device, that a media item be played on a second electronic device. The backend system may send first audio data representing a first response to the first electronic device, along with instructions to delay outputting the first response, as well as to continue sending audio data of additional audio captured thereby. The backend system may also send second audio data representing a second response to the second electronic device along with the media item. Text data may be generated representing the captured audio, which may then be compared with text data representing the second response to determine whether or not they match.
    Type: Grant
    Filed: September 16, 2022
    Date of Patent: November 19, 2024
    Assignee: Amazon Technologies, Inc.
    Inventor: Dennis Francis Cwik
  • Patent number: 12142257
    Abstract: Systems and methods are provided for providing emotion-based text to speech. The systems and methods perform operations comprising accessing a text string; storing a plurality of embeddings associated with a plurality of speakers, a first embedding for a first speaker being associated with a first emotion and a second embedding for a second speaker of the plurality of speakers being associated with a second emotion; selecting the first speaker to speak one or more words of the text string; determining that the one or more words are associated with the second emotion; generating, based on the first embedding and the second embedding, a third embedding for the first speaker associated with the second emotion; and applying the third embedding and the text string to a vocoder to generate an audio stream comprising the one or more words being spoken by the first speaker with the second emotion.
    Type: Grant
    Filed: February 8, 2022
    Date of Patent: November 12, 2024
    Assignee: SNAP INC.
    Inventors: Liron Harazi, Jacob Assa, Alan Bekker
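A sketch of the embedding step: from the first speaker's emotion-1 embedding and the second speaker's emotion-2 embedding, a third embedding is derived for the first speaker with the second emotion. The linear blend below is an assumed combination; the patent does not fix the function.

```python
import numpy as np

rng = np.random.default_rng(1)
first = rng.standard_normal(8)    # first speaker, first emotion
second = rng.standard_normal(8)   # second speaker, second emotion

# Assumed combination: a blend that keeps the first speaker's identity
# dominant while pulling toward the second speaker's emotion-2 region.
third = 0.6 * first + 0.4 * second
print(np.round(third, 2))
# 'third' would then condition a vocoder alongside the text string.
```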
  • Patent number: 12142256
    Abstract: An information processing device according to embodiments includes a communication unit configured to receive audio data of content and text data corresponding to the audio data, an audio data reproduction unit configured to perform reproduction of the audio data, a text data reproduction unit configured to perform the reproduction by audio synthesis of the text data, and a controller that controls the reproduction of the audio data or the text data. The controller causes the text data reproduction unit to perform the reproduction of the text data when the audio data reproduction unit is unable to perform the reproduction of the audio data.
    Type: Grant
    Filed: October 20, 2023
    Date of Patent: November 12, 2024
    Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventor: Jun Tsukamoto
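A sketch of the fallback control: the controller tries to reproduce the delivered audio, and if that is impossible it reproduces the paired text by speech synthesis instead. The player and synthesizer stubs are illustrative.

```python
class AudioUnavailable(Exception):
    pass

def play_audio(audio):
    if not audio:
        raise AudioUnavailable("no decodable audio data")
    print("playing original audio")

def play_tts(text):
    print(f"synthesizing and playing: {text!r}")

def reproduce(content):
    try:
        play_audio(content.get("audio"))       # preferred: the original audio
    except AudioUnavailable:
        play_tts(content["text"])              # controller falls back to TTS

reproduce({"audio": None, "text": "Traffic report: expect delays ahead."})
```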
  • Patent number: 12136259
    Abstract: A method for training a neural network, including: determining a neural network; training the neural network at a first learning rate according to a first optimization mode, where the first learning rate is updated each time the neural network is trained; mapping the first learning rate of the first optimization mode to a second learning rate of a second optimization mode in the same vector space; determining the second learning rate satisfies a preset update condition; and continuing to train the neural network at the second learning rate according to the second optimization mode.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: November 5, 2024
    Assignee: BIGO TECHNOLOGY PTE. LTD.
    Inventors: Wei Xiang, Chao Pei
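A hedged sketch of switching optimization modes mid-training with a mapped learning rate, using PyTorch as a vehicle. The trigger step and the scale factor between the Adam and SGD learning-rate spaces are assumptions; the patent describes the mapping abstractly, as occurring in a shared vector space.

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # first optimization mode

for step in range(200):
    loss = model(torch.randn(8, 4)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

    if step == 100:  # assumed "preset update condition"
        lr_adam = opt.param_groups[0]["lr"]
        lr_sgd = lr_adam * 100.0      # assumed scale between the two spaces
        # Continue training under the second optimization mode.
        opt = torch.optim.SGD(model.parameters(), lr=lr_sgd, momentum=0.9)
```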
  • Patent number: 12100386
    Abstract: A speech processing system for generating translated speech, the system comprising: an input for receiving a first speech signal comprising a second language; an output for outputting a second speech signal comprising a first language; and a processor configured to: generate a first text signal from a segment of the first speech signal, the first text signal comprising the second language; generate a second text signal from the first text signal, the second text signal comprising the first language; extract a plurality of first feature vectors from the segment of the first speech signal, wherein the first feature vectors comprise information relating to audio data corresponding to the segment of the first speech signal; generate a speaker vector using a first trained algorithm taking one or more of the first feature vectors as input, wherein the speaker vector represents a set of features corresponding to a speaker; generate a second speech signal segment using a second trained algorithm taking information re
    Type: Grant
    Filed: March 13, 2019
    Date of Patent: September 24, 2024
    Assignee: Papercup Technologies Limited
    Inventor: Jiameng Gao
  • Patent number: 12096068
    Abstract: A method and a device for enabling a spatio-temporal navigation of content. In response to a request, from a client, for a content comprising a first content of a first type, the device transmits to the client: the first content; a second content of a second type generated from the first content; synchronization metadata associating each word of one of the contents with a time marker to the other content; and a script, for execution at the client, configured to re-establish at least one of the contents and offer a navigation of said content depending on the type of content re-established by using at least one among: the first content, the second content, and the synchronization metadata.
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: September 17, 2024
    Assignee: ORANGE
    Inventors: Fabrice Boudin, Frederic Herledan
  • Patent number: 12087270
    Abstract: Techniques for generating customized synthetic voices personalized to a user, based on user-provided feedback, are described. A system may determine embedding data representing a user-provided description of a desired synthetic voice and profile data associated with the user, and generate synthetic voice embedding data using synthetic voice embedding data corresponding to a profile associated with a user determined to be similar to the current user. Based on user-provided feedback with respect to a customized synthetic voice, generated using synthetic voice characteristics corresponding to the synthetic voice embedding data and presented to the user, and the synthetic voice embedding data, the system may generate new synthetic voice embedding data, corresponding to a new customized synthetic voice. The system may be configured to assign the customized synthetic voice to the user, such that a subsequent user may not be presented with the same customized synthetic voice.
    Type: Grant
    Filed: September 29, 2022
    Date of Patent: September 10, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Sebastian Dariusz Cygert, Daniel Korzekwa, Kamil Pokora, Piotr Tadeusz Bilinski, Kayoko Yanagisawa, Abdelhamid Ezzerg, Thomas Edward Merritt, Raghu Ram Sreepada Srinivas, Nikhil Sharma
  • Patent number: 12087273
    Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
    Type: Grant
    Filed: January 30, 2023
    Date of Patent: September 10, 2024
    Assignee: Google LLC
    Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
  • Patent number: 12087272
    Abstract: A method (800) of training a text-to-speech (TTS) model (108) includes obtaining training data (150) including reference input text (104) that includes a sequence of characters, a sequence of reference audio features (402) representative of the sequence of characters, and a sequence of reference phone labels (502) representative of distinct speech sounds of the reference audio features. For each of a plurality of time steps, the method includes generating a corresponding predicted audio feature (120) based on a respective portion of the reference input text for the time step and generating, using a phone label mapping network (510), a corresponding predicted phone label (520) associated with the predicted audio feature. The method also includes aligning the predicted phone label with the reference phone label to determine a corresponding predicted phone label loss (622) and updating the TTS model based on the corresponding predicted phone label loss.
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: September 10, 2024
    Assignee: Google LLC
    Inventors: Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Yu Zhang
  • Patent number: 12080089
    Abstract: A computer-implemented method, a computer system and a computer program product enhance machine translation of a document. The method includes capturing an image of the document. The document includes a plurality of characters that are arranged in a character layout. The method also includes classifying the image by a document type based on the character layout. The method further includes determining a strategy for an intelligent character recognition (ICR) algorithm with the image based on the character layout of the image. Lastly, the method includes generating a translated document by applying the intelligent character recognition (ICR) algorithm to the plurality of characters in the image using the strategy. The translated document includes a plurality of translated characters that are arranged in the character layout.
    Type: Grant
    Filed: December 8, 2021
    Date of Patent: September 3, 2024
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Barton Wayne Emanuel, Nadiya Kochura, Su Liu, Tetsuya Shimada
  • Patent number: 12080270
    Abstract: An apparatus for synthesizing speech according to an embodiment is a computing apparatus that includes one or more processors and a memory storing one or more programs executed by the one or more processors. The apparatus for synthesizing speech includes a pre-processing module that marks a preset classification symbol on each unit text input; and a speech synthesis module that receives each unit text marked with the classification symbol and synthesizes speech uttering the unit text based on the input unit text.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: September 3, 2024
    Assignee: DEEPBRAIN AI INC.
    Inventors: Gyeongsu Chae, Dalhyun Kim
  • Patent number: 12080272
    Abstract: A method (400) for representing an intended prosody in synthesized speech includes receiving a text utterance (310) having at least one word (240), and selecting an utterance embedding (204) for the text utterance. Each word in the text utterance has at least one syllable (230) and each syllable has at least one phoneme (220). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration (238) of the syllable by decoding a prosodic syllable embedding (232, 234) for the syllable based on attention by an attention mechanism (340) to linguistic features (222) of each phoneme of the syllable and generating a plurality of fixed-length predicted frames (260) based on the predicted duration for the syllable.
    Type: Grant
    Filed: December 10, 2019
    Date of Patent: September 3, 2024
    Assignee: Google LLC
    Inventors: Robert Clark, Chun-an Chan, Vincent Wan
  • Patent number: 12076850
    Abstract: A monitor is installed in an eye of a robot, and an eye image is displayed on the monitor. The robot extracts a feature quantity of an eye of a user from a filmed image of the user. The feature quantity of the eye of the user is reflected in the eye image. For example, a feature quantity is a size of a pupillary region and a pupil image, and a form of an eyelid image. Also, a blinking frequency or the like may also be reflected as a feature quantity. Familiarity with respect to each user is set, and which user's feature quantity is to be reflected may be determined in accordance with the familiarity.
    Type: Grant
    Filed: April 7, 2023
    Date of Patent: September 3, 2024
    Assignee: GROOVE X, INC.
    Inventor: Kaname Hayashi
  • Patent number: 12073838
    Abstract: A speech-processing system may provide access to multiple virtual assistants via one or more voice-controlled devices. Each assistant may leverage language processing and language generation features of the speech-processing system, while handling different commands and/or providing access to different back-end applications. Each assistant may be associated with its own voice and/or speech style, and thus be perceived as having a particular “personality.” Different assistants may be available for use with a particular voice-controlled device based on time, location, the particular user, etc. In some situations, language processing may be improved by leveraging data such as intent data, grammars, lexicons, entities, etc., associated with assistants available for use with the particular voice-controlled device.
    Type: Grant
    Filed: December 7, 2020
    Date of Patent: August 27, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Naveen Bobbili, David Henry, Mark Vincent Mattione, Richard Du, Jyoti Chhabra
  • Patent number: 12067130
    Abstract: The disclosed exemplary embodiments include computer-implemented systems, devices, apparatuses, and processes that maintain data confidentiality in communications involving voice-enabled devices in a distributed computing environment using homomorphic encryption. By way of example, an apparatus may receive encrypted command data from a computing system, decrypt the encrypted command data using a homomorphic private key, and perform operations that associate the decrypted command data with a request for an element of data. Using a public cryptographic key associated with a device, the apparatus may generate an encrypted response that includes the requested data element, and transmit the encrypted response to the device. The device may decrypt the encrypted response using a private cryptographic key and perform operations that present first audio content representative of the requested data element through an acoustic interface.
    Type: Grant
    Filed: November 12, 2021
    Date of Patent: August 20, 2024
    Assignee: The Toronto-Dominion Bank
    Inventors: Alexey Shpurov, Milos Dunjic, Brian Andrew Lam
  • Patent number: 12062357
    Abstract: A method of registering an attribute in a speech synthesis model, an apparatus of registering an attribute in a speech synthesis model, an electronic device, and a medium are provided, which relate to the field of artificial intelligence technologies such as deep learning and intelligent speech technology. The method includes: acquiring a plurality of data associated with an attribute to be registered; and registering the attribute in the speech synthesis model by using the plurality of data associated with the attribute, wherein the speech synthesis model is trained in advance by using training data in a training data set.
    Type: Grant
    Filed: November 16, 2021
    Date of Patent: August 13, 2024
    Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Wenfu Wang, Xilei Wang, Tao Sun, Han Yuan, Zhengkun Gao, Lei Jia
  • Patent number: 12057106
    Abstract: A method of authorizing content for use, e.g., in association with a conversational bot. The method begins by configuring a conversational bot using a machine learning model trained to classify utterances into topics. Utterances that are not recognized by the machine learning model (e.g., according to some configurable threshold) are then identified. Using a clustering algorithm, one or more of the identified utterances are then processed into a grouping. Information identifying a topic associated with the grouping is then received and, in response, the machine learning model is updated to include the topic.
    Type: Grant
    Filed: March 15, 2023
    Date of Patent: August 6, 2024
    Assignee: Drift.com, Inc.
    Inventors: Maria C. Moya, Natalie Duerr, Jeffrey D. Orkin, Jane S. Taschman, Carolina M. Caprile, Christopher M. Ward
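A sketch of the authoring loop: collect utterances the classifier scores below a confidence threshold, cluster them, have a human name the topic of a grouping, and retrain. Uses scikit-learn; the threshold, TF-IDF features, and k-means choice are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

THRESHOLD = 0.6  # assumed configurable confidence threshold

def unrecognized(utterances, classifier):
    # classifier returns (topic, confidence); keep the low-confidence ones.
    return [u for u in utterances if classifier(u)[1] < THRESHOLD]

def group(utterances, k=2):
    vec = TfidfVectorizer()
    X = vec.fit_transform(utterances)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    clusters = {}
    for u, label in zip(utterances, labels):
        clusters.setdefault(label, []).append(u)
    return clusters

toy_classifier = lambda u: ("pricing", 0.2)   # always low confidence
utts = ["how much is a seat", "cost per seat?",
        "do you have an api", "api docs link?"]
for label, members in group(unrecognized(utts, toy_classifier)).items():
    print(label, members)
    # A human names the topic of this grouping; the model is retrained with it.
```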
  • Patent number: 12056854
    Abstract: Embodiments of the present invention provide end-to-end frame time synchronization designed to improve smoothness for displaying images of 3D applications, such as PC gaming applications. Traditionally, an application that renders 3D graphics operates on the assumption that the average render time will be used as the animation time for a given frame. When this condition is not met, and the render time for a frame does not match the average render time of prior frames, the frames are not captured or displayed at a consistent rate. This invention enables feedback to be provided to the rendering application for adjusting the animation times used to produce new frames, and a post-render queue is used to store completed frames for mitigating stutter and hitches. Flip control is used to sync the display of a rendered frame with the animation time used to generate the frame, thereby producing a smooth, consistent image.
    Type: Grant
    Filed: March 28, 2022
    Date of Patent: August 6, 2024
    Assignee: NVIDIA CORPORATION
    Inventors: Thomas Albert Petersen, Ankan Banerjee, Shishir Goyal, Sau Yan Keith Li, Lars Nordskog, Rouslan Dimitrov
  • Patent number: 12046236
    Abstract: Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.
    Type: Grant
    Filed: August 27, 2021
    Date of Patent: July 23, 2024
    Assignee: International Business Machines Corporation
    Inventors: Hong-Kwang Kuo, Zoltan Tueske, Samuel Thomas, Brian E. D. Kingsbury, George Andrei Saon
  • Patent number: 12039969
    Abstract: A speech processing system for generating translated speech, the system comprising: an input for receiving a first speech signal comprising a second language; an output for outputting a second speech signal comprising a first language; and a processor configured to: generate a first text signal from a segment of the first speech signal, the first text signal comprising the second language; generate a second text signal from the first text signal, the second text signal comprising the first language; extract a plurality of first feature vectors from the segment of the first speech signal, wherein the first feature vectors comprise information relating to audio data corresponding to the segment of the first speech signal; generate a speaker vector using a first trained algorithm taking one or more of the first feature vectors as input, wherein the speaker vector represents a set of features corresponding to a speaker; generate a second speech signal segment using a second trained algorithm taking information re
    Type: Grant
    Filed: October 28, 2021
    Date of Patent: July 16, 2024
    Assignee: PAPERCUP TECHNOLOGIES LIMITED
    Inventor: Jiameng Gao
  • Patent number: 12026456
    Abstract: Systems and methods for using optical character recognition (OCR) with voice recognition commands are provided. Some embodiments include receiving a user interface that includes a text field, capturing an image of at least a portion of the user interface, and performing OCR on the image to identify the text field and a word in the user interface. Some embodiments include mapping a coordinate of the word in the text field, receiving a voice command that includes the word, and navigating, by the computing device, a cursor to the coordinate to execute the voice command.
    Type: Grant
    Filed: August 7, 2017
    Date of Patent: July 2, 2024
    Assignee: DOLBEY & COMPANY, INC.
    Inventor: Curtis A. Weeks
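A sketch of the OCR-assisted voice navigation flow: OCR yields words with screen coordinates, a word-to-coordinate map is built, and a spoken command containing a mapped word moves the cursor there. The OCR result format and cursor call are illustrative stubs.

```python
def ocr_screen():
    # Stand-in for an OCR pass over a captured screenshot:
    # (word, x, y) with coordinates at the word's center.
    return [("Patient", 120, 300), ("Name", 180, 300), ("Dob", 120, 340)]

def build_word_map(ocr_results):
    return {word.lower(): (x, y) for word, x, y in ocr_results}

def move_cursor(x, y):
    print(f"cursor -> ({x}, {y})")   # an OS-level call in a real system

def handle_voice_command(command: str, word_map):
    for token in command.lower().split():
        if token in word_map:
            move_cursor(*word_map[token])   # navigate, then execute
            return True
    return False

handle_voice_command("go to name field", build_word_map(ocr_screen()))
```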
  • Patent number: 12027165
    Abstract: A non-transitory computer readable medium stores computer executable instructions which, when executed by at least one processor, cause the at least one processor to acquire a speech signal of speech of a user; perform a signal processing on the speech signal to acquire at least one feature of the speech of the user; and control display of information, related to each of one or more first candidate converters having a feature corresponding to the at least one feature, to present the one or more first candidate converters for selection by the user.
    Type: Grant
    Filed: July 9, 2021
    Date of Patent: July 2, 2024
    Assignee: GREE, INC.
    Inventor: Akihiko Shirai
  • Patent number: 12014720
    Abstract: This application relates to a speech synthesis method and apparatus, a model training method and apparatus, and a computer device. The method includes: obtaining to-be-processed linguistic data; encoding the linguistic data, to obtain encoded linguistic data; obtaining an embedded vector for speech feature conversion, the embedded vector being generated according to a residual between synthesized reference speech data and reference speech data that correspond to the same reference linguistic data; and decoding the encoded linguistic data according to the embedded vector, to obtain target synthesized speech data on which the speech feature conversion is performed. The solution provided in this application can prevent quality of a synthesized speech from being affected by a semantic feature in a mel-frequency cepstrum.
    Type: Grant
    Filed: August 21, 2020
    Date of Patent: June 18, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Xixin Wu, Mu Wang, Shiyin Kang, Dan Su, Dong Yu
  • Patent number: 12010399
    Abstract: Methods, systems, and computer-readable media for generating videos with characters indicating regions of images are provided. For example, an image containing a first region may be received. At least one characteristic of a character may be obtained. A script containing a first segment of the script may be received. The first segment of the script may be related to the first region of the image. The at least one characteristic of a character and the script may be used to generate a video of the character presenting the script and at least part of the image, where the character visually indicates the first region of the image while presenting the first segment of the script.
    Type: Grant
    Filed: January 17, 2023
    Date of Patent: June 11, 2024
    Inventors: Ben Avi Ingel, Ron Zass
  • Patent number: 11997344
    Abstract: Systems and methods are described herein for generating alternate audio for a media stream. The media system receives media that is requested by the user. The media comprises a video and audio. The audio includes words spoken in a first language. The media system stores the received media in a buffer as it is received. The media system separates the audio from the buffered media and determines an emotional state expressed by spoken words of the first language. The media system translates the words spoken in the first language into words spoken in a second language. Using the translated words of the second language, the media system synthesizes speech having the emotional state previously determined. The media system then retrieves the video of the received media from the buffer and synchronizes the synthesized speech with the video to generate the media content in a second language.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: May 28, 2024
    Assignee: Rovi Guides, Inc.
    Inventors: Vijay Kumar, Rajendran Pichaimurthy, Madhusudhan Seetharam
  • Patent number: 11996083
    Abstract: A computer-implemented method is provided of using a machine learning model for disentanglement of prosody in spoken natural language. The method includes encoding, by a computing device, the spoken natural language to produce content code. The method further includes resampling, by the computing device without text transcriptions, the content code to obscure the prosody by applying an unsupervised technique to the machine learning model to generate prosody-obscured content code. The method additionally includes decoding, by the computing device, the prosody-obscured content code to synthesize speech indirectly based upon the content code.
    Type: Grant
    Filed: June 3, 2021
    Date of Patent: May 28, 2024
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David Cox
  • Patent number: 11990118
    Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
    Type: Grant
    Filed: June 6, 2023
    Date of Patent: May 21, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
  • Patent number: 11981261
    Abstract: A vehicle projection control device includes: a vehicle information acquisition unit that acquires vehicle information including a vehicle speed of a vehicle; an identification information acquisition unit that acquires identification information identifying a tunnel through which the vehicle travels; a virtual vehicle video generation unit that generates a virtual moving body video of a virtual moving body that moves ahead of the vehicle in the same direction as the vehicle, the virtual moving body video being for projection by a projection unit of a head-up display device; and a projection control unit that controls projection of the virtual moving body video, such that a virtual image of the generated virtual moving body video is visually recognized ahead of the vehicle with use of a projection unit. The projection control unit controls the projection of the virtual moving body video based on the acquired identification information.
    Type: Grant
    Filed: March 6, 2020
    Date of Patent: May 14, 2024
    Assignee: JVCKENWOOD Corporation
    Inventors: Kenichi Matsuo, Makoto Kurihara
  • Patent number: 11977575
    Abstract: Techniques for recommending media are described. A character preference function comprising a plurality of preference coefficients is accessed. A first character model comprises a first set of attribute values for the plurality of attributes of a first character. A second character model comprises a second set of attribute values for the plurality of attributes of a second character of the plurality of characters. The first and second characters are associated with a first and second salience value, respectively. A first character rating is calculated using the plurality of preference coefficients and the first set of attribute values. A second character rating of the second character is calculated using the plurality of preference coefficients with the second set of attribute values. A media rating is calculated based on the first and second salience values and the first and second character ratings. A media is recommended based on the media rating.
    Type: Grant
    Filed: May 12, 2022
    Date of Patent: May 7, 2024
    Assignee: The Nielsen Company (US), LLC
    Inventors: Rachel Payne, Meghana Bhatt, Natasha Mohanty
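A worked example of the rating arithmetic this abstract outlines: each character rating is a dot product of the preference coefficients with that character's attribute values, and the media rating combines the character ratings weighted by salience. The numbers and the weighted-average form are assumed.

```python
import numpy as np

preference = np.array([0.8, -0.2, 0.5])     # user's preference coefficients
char1_attrs = np.array([0.9, 0.1, 0.7])     # e.g. humor, villainy, warmth
char2_attrs = np.array([0.2, 0.9, 0.1])
salience = np.array([0.7, 0.3])             # screen-time share per character

r1 = preference @ char1_attrs               # 0.8*0.9 - 0.2*0.1 + 0.5*0.7 = 1.05
r2 = preference @ char2_attrs               # 0.8*0.2 - 0.2*0.9 + 0.5*0.1 = 0.03
media_rating = salience @ np.array([r1, r2]) / salience.sum()

print(round(r1, 2), round(r2, 2), round(media_rating, 3))  # 1.05 0.03 0.744
```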
  • Patent number: 11971926
    Abstract: Disclosed computer-based systems and methods for analyzing a plurality of audio files corresponding to text-based news stories and received from a plurality of audio file creators are configured to (i) compare quality and/or accuracy metrics of individual audio files against corresponding quality and/or accuracy thresholds, and (ii) based on the comparison: (a) accept audio files meeting the quality and/or accuracy thresholds for distribution to a plurality of subscribers for playback, (b) reject audio files failing to meet one or more certain quality and/or accuracy thresholds, (c) remediate audio files failing to meet certain quality thresholds, and (d) designate for human review, audio files failing to meet one or more certain quality and/or accuracy thresholds by a predetermined margin.
    Type: Grant
    Filed: August 17, 2020
    Date of Patent: April 30, 2024
    Assignee: Gracenote Digital Ventures, LLC
    Inventors: Gregory P. Defouw, Venkatarama Anilkumar Panguluri
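A sketch of the four-way decision: accept, reject, remediate, or flag for human review, driven by quality and accuracy metrics. All threshold values and the near-miss margin rule are assumptions.

```python
Q_MIN, A_MIN, REVIEW_MARGIN = 0.8, 0.9, 0.05

def triage(quality: float, accuracy: float) -> str:
    # A near miss fails a threshold by no more than the review margin.
    near_miss = (Q_MIN - REVIEW_MARGIN <= quality < Q_MIN or
                 A_MIN - REVIEW_MARGIN <= accuracy < A_MIN)
    if quality >= Q_MIN and accuracy >= A_MIN:
        return "accept: distribute to subscribers"
    if near_miss:
        return "designate for human review"
    if accuracy >= A_MIN:                 # only quality failed, badly
        return "remediate: re-process audio"
    return "reject"

for q, a in [(0.9, 0.95), (0.78, 0.95), (0.5, 0.95), (0.9, 0.4)]:
    print((q, a), "->", triage(q, a))
```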
  • Patent number: 11960852
    Abstract: A direct speech-to-speech translation (S2ST) model includes an encoder configured to receive an input speech representation that corresponds to an utterance spoken by a source speaker in a first language and encode the input speech representation into a hidden feature representation. The S2ST model also includes an attention module configured to generate a context vector that attends to the hidden representation encoded by the encoder. The S2ST model also includes a decoder configured to receive the context vector generated by the attention module and predict a phoneme representation that corresponds to a translation of the utterance in a second different language. The S2ST model also includes a synthesizer configured to receive the context vector and the phoneme representation and generate a translated synthesized speech representation that corresponds to a translation of the utterance spoken in the different second language.
    Type: Grant
    Filed: December 15, 2021
    Date of Patent: April 16, 2024
    Assignee: Google LLC
    Inventors: Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz
  • Patent number: 11948559
    Abstract: Various embodiments include methods and devices for implementing automatic grammar augmentation for improving voice command recognition accuracy in systems with a small footprint acoustic model. Alternative expressions that may capture acoustic model decoding variations may be added to a grammar set. An acoustic model-specific statistical pronunciation dictionary may be derived by running the acoustic model through a large general speech dataset and constructing a command-specific candidate set containing potential grammar expressions. Greedy based and cross-entropy-method (CEM) based algorithms may be utilized to search the candidate set for augmentations with improved recognition accuracy.
    Type: Grant
    Filed: March 21, 2022
    Date of Patent: April 2, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Yang Yang, Anusha Lalitha, Jin Won Lee, Christopher Lott
  • Patent number: 11922194
    Abstract: A method of operating a computing device in support of improved accessibility includes displaying a user interface to an application on a display screen of the computing device, wherein the computing device includes an accessibility assistant that reads an audible description of an element of the user interface; initiating, on the computing device, a virtual assistant that conducts an audible conversation between a user and the virtual assistant through at least a microphone and a speaker associated with the computing device, wherein the virtual assistant is not integrated with an operating system of the computing device; inhibiting an ability of the accessibility assistant to read the audible description of the element of the user interface; and upon transition of the virtual assistant from an active state, enabling the ability of the accessibility assistant.
    Type: Grant
    Filed: May 19, 2022
    Date of Patent: March 5, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jaclyn Carley Knapp, Lisa Stifelman, André Roberto Lima Tapajós, Jin Xu, Steven DiCarlo, Kaichun Wu, Yuhua Guan
  • Patent number: 11915714
    Abstract: Methods for modifying audio data include operations for accessing audio data having a first prosody, receiving a target prosody differing from the first prosody, and computing acoustic features representing samples. Computing respective acoustic features for a sample includes computing a pitch feature as a quantized pitch value of the sample by assigning a pitch value, of the target prosody or the audio data, to at least one of a set of pitch bins having equal widths in cents. Computing the respective acoustic features further includes computing a periodicity feature from the audio data. The respective acoustic features for the sample include the pitch feature, the periodicity feature, and other acoustic features. A neural vocoder is applied to the acoustic features to pitch-shift and time-stretch the audio data from the first prosody toward the target prosody.
    Type: Grant
    Filed: December 21, 2021
    Date of Patent: February 27, 2024
    Assignees: Adobe Inc., Northwestern University
    Inventors: Maxwell Morrison, Juan Pablo Caceres Chomali, Zeyu Jin, Nicholas Bryan, Bryan A. Pardo
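A worked example of the quantized pitch feature: convert a pitch in Hz to cents relative to a reference, then assign it to one of a set of equal-width bins. The 50 Hz reference, 25-cent bin width, and bin count are assumptions; the patent only requires bins of equal width in cents.

```python
import math

REF_HZ, BIN_CENTS, N_BINS = 50.0, 25.0, 256

def hz_to_cents(f_hz: float) -> float:
    return 1200.0 * math.log2(f_hz / REF_HZ)

def pitch_bin(f_hz: float) -> int:
    cents = hz_to_cents(f_hz)
    return max(0, min(N_BINS - 1, int(cents // BIN_CENTS)))

for f in (110.0, 220.0, 440.0):          # octaves are 1200 cents apart
    print(f, round(hz_to_cents(f), 1), pitch_bin(f))
# Each octave step lands exactly 1200 cents (48 bins) higher.
```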
  • Patent number: 11888797
    Abstract: Emoji-first messaging where text messaging is automatically converted to emojis by an emoji-first application so that only emojis are communicated from one client device to another client device. Each client device has a library of emojis that are mapped to words, which libraries are customizable and unique to the users of the client devices, such that the users can communicate secretly in code. Upon receipt of a string of emojis, a user can select the emoji string to convert to text if desired, for a predetermined period of time.
    Type: Grant
    Filed: April 20, 2021
    Date of Patent: January 30, 2024
    Assignee: Snap Inc.
    Inventors: Karl Bayer, Prerna Chikersal, Shree K. Nayar, Brian Anthony Smith
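A sketch of the emoji-first mapping: each device keeps a word-to-emoji library, outgoing text is converted so only emojis cross the network, and the receiver can invert its own library to recover text on demand. The sample library is illustrative; per the abstract, real libraries are customized per user.

```python
LIBRARY = {"pizza": "🍕", "tonight": "🌙", "love": "❤", "you": "🙂"}
REVERSE = {v: k for k, v in LIBRARY.items()}

def to_emoji(message: str) -> str:
    # Only emojis are communicated; unmapped words are dropped here for brevity.
    return "".join(LIBRARY.get(w, "") for w in message.lower().split())

def to_text(emojis: str) -> str:
    # Optional conversion on tap, available for a predetermined time window.
    return " ".join(REVERSE.get(ch, "?") for ch in emojis)

wire = to_emoji("Pizza tonight love you")   # what actually crosses the network
print(wire, "->", to_text(wire))
```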