Patents Examined by Susan I McFadden
  • Patent number: 12087275
    Abstract: Systems and methods for text-to-speech with novel speakers can obtain text data and output audio data. The input text data may be input along with one or more speaker preferences. The speaker preferences can include speaker characteristics. The speaker preferences can be processed by a machine-learned model conditioned on a learned prior distribution to determine a speaker embedding. The speaker embedding can then be processed with the text data to generate an output that includes audio data descriptive of the text data spoken by a novel speaker.
    Type: Grant
    Filed: February 16, 2022
    Date of Patent: September 10, 2024
    Assignee: GOOGLE LLC
    Inventors: Daisy Antonia Stanton, Sean Matthew Shannon, Soroosh Mariooryad, Russell John-Wyatt Skerry-Ryan, Eric Dean Battenberg, Thomas Edward Bagby, David Teh-Hwa Kao
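The abstract above describes conditioning a learned prior distribution on speaker preferences to sample a novel-speaker embedding. A minimal sketch of that sampling step, assuming a Gaussian prior whose parameters are a learned projection of a preference vector (all names, dimensions, and the linear projection are illustrative assumptions, not from the patent):

```python
import numpy as np

# Hypothetical sketch: condition a learned Gaussian prior on speaker
# preferences and sample a novel-speaker embedding from it.
rng = np.random.default_rng(0)

EMB_DIM = 8     # size of the speaker embedding (assumed)
PREF_DIM = 3    # e.g. (pitch, speaking_rate, brightness) - invented

# Stand-ins for learned projections from preferences to prior parameters.
W_mu = rng.normal(size=(PREF_DIM, EMB_DIM))
W_logvar = rng.normal(size=(PREF_DIM, EMB_DIM))

def sample_speaker_embedding(prefs: np.ndarray) -> np.ndarray:
    """Sample an embedding from a Gaussian prior conditioned on prefs."""
    mu = prefs @ W_mu
    std = np.exp(0.5 * (prefs @ W_logvar))
    return mu + std * rng.normal(size=EMB_DIM)

prefs = np.array([0.2, -0.5, 1.0])   # desired speaker characteristics
embedding = sample_speaker_embedding(prefs)
print(embedding.shape)  # (8,)
```

The sampled embedding would then condition the TTS model alongside the input text.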
  • Patent number: 12067975
    Abstract: Methods, systems, and apparatuses for predicting an end of a command in a voice recognition input are described herein. The system may receive a signal comprising a voice input. The system may detect, in the voice input, data that is associated with a first portion of a command. The system may predict, based on the first portion and while the voice input is being received, a second portion of the command. The prediction may be generated by a machine learning algorithm that is trained based at least in part on historical data comprising user input data. The system may cause execution of the command, based on the first portion and the predicted second portion, prior to an end of the voice input.
    Type: Grant
    Filed: April 18, 2023
    Date of Patent: August 20, 2024
    Assignee: Comcast Cable Communications, LLC
    Inventors: Rui Min, Hongcheng Wang
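The prediction step above (completing a command from its first portion using historical user input) can be sketched with simple frequency counts standing in for the trained model; the history and commands below are made-up examples:

```python
from collections import Counter

# Toy stand-in for the trained predictor: complete a partial spoken
# command from frequency counts over historical commands.
history = [
    "turn on the living room lights",
    "turn on the living room lights",
    "turn on the kitchen lights",
    "play the next episode",
]

def predict_completion(prefix: str):
    """Return the most frequent historical command starting with prefix,
    or None when no historical command matches."""
    matches = Counter(c for c in history if c.startswith(prefix))
    if not matches:
        return None
    return matches.most_common(1)[0][0]

# While the voice input is still arriving, predict the full command so
# execution can begin before the utterance ends.
print(predict_completion("turn on the living"))
```

In the patent's framing, a learned model replaces the counter, but the control flow (predict mid-utterance, then execute early) is the same.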
  • Patent number: 12062363
    Abstract: A recurrent neural network-transducer (RNN-T) model improves speech recognition by processing sequential non-blank symbols at each time step after an initial one. The model's prediction network receives a sequence of symbols from a final Softmax layer and employs a shared embedding matrix to create and map embeddings to each symbol, associating them with unique position vectors. These embeddings are weighted according to their similarity to their matching position vector. Subsequently, a joint network of the RNN-T model uses these weighted embeddings to output a probability distribution for potential speech recognition hypotheses at each time step, enabling more accurate transcriptions of spoken language.
    Type: Grant
    Filed: July 6, 2023
    Date of Patent: August 13, 2024
    Assignee: Google LLC
    Inventors: Rami Botros, Tara Sainath
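The weighting described above (each symbol embedding scaled by its similarity to a matching position vector) can be sketched as follows; the dimensions, cosine similarity, and averaging are assumptions for illustration:

```python
import numpy as np

# Sketch of the prediction-network step: weight each non-blank symbol's
# embedding by similarity to its position vector, then pool them.
rng = np.random.default_rng(1)
VOCAB, DIM, CONTEXT = 10, 4, 3

embed = rng.normal(size=(VOCAB, DIM))   # shared embedding matrix
pos = rng.normal(size=(CONTEXT, DIM))   # one position vector per slot

def weighted_context(symbols):
    """Weight each symbol's embedding by cosine similarity to its
    position vector and average into a single context vector."""
    out = np.zeros(DIM)
    for i, s in enumerate(symbols):
        e, p = embed[s], pos[i]
        w = e @ p / (np.linalg.norm(e) * np.linalg.norm(p))
        out += w * e
    return out / len(symbols)

ctx = weighted_context([3, 7, 1])   # last three non-blank symbols
print(ctx.shape)
```

The joint network would consume this context vector together with the encoder output to produce the per-step probability distribution.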
  • Patent number: 12050880
    Abstract: In a first aspect, a system for creating fact-based content is presented. The system includes an application service provider operating on a network. The application service provider is configured to receive a user prompt and generate a web query for content based on the user prompt. The system includes a fact-based language model in communication with the application service provider. The fact-based language model is configured to receive the web query from the application service provider and retrieve, from an electronic library, relevant fact-based content based on the web query. The electronic library includes proprietary data. The fact-based language model is configured to provide the relevant fact-based content to the application service provider. The application service provider communicates content to a user based on the user prompt. The content includes at least a portion of the relevant fact-based content from the electronic library.
    Type: Grant
    Filed: December 21, 2023
    Date of Patent: July 30, 2024
    Assignee: Cengage Learning, Inc.
    Inventors: James Chilton, Peter Griffiths, Charles Qian
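The retrieve-then-respond flow described above can be sketched with term overlap standing in for the fact-based language model's retrieval; the library contents and matching rule are invented for illustration:

```python
# Toy sketch: a query derived from a user prompt retrieves relevant
# entries from an electronic library, and the response is built from
# the retrieved fact-based content.
library = {
    "photosynthesis": "Photosynthesis converts light energy into chemical energy.",
    "mitosis": "Mitosis is cell division producing two identical daughter cells.",
}

def retrieve(query: str):
    """Return library entries whose topic or text overlaps the query."""
    terms = set(query.lower().split())
    return [text for topic, text in library.items()
            if terms & ({topic} | set(text.lower().split()))]

def respond(prompt: str) -> str:
    facts = retrieve(prompt)
    return facts[0] if facts else "No relevant content found."

print(respond("explain photosynthesis"))
```

The key design point the abstract emphasizes is that the delivered content includes material drawn from the proprietary library rather than being generated unconstrained.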
  • Patent number: 12033636
    Abstract: This relates to an intelligent automated assistant in a video communication environment. An example includes, during a video communication session between at least two devices, receiving a voice input at one device, generating and transmitting to a server a textual representation of the voice input, receiving from the server a shared transcription including both the textual representation of the voice input and one or more additional textual representations generated by another device, and determining and presenting one or more candidate tasks based on the shared transcription.
    Type: Grant
    Filed: August 9, 2023
    Date of Patent: July 9, 2024
    Assignee: Apple Inc.
    Inventors: Niranjan Manjunath, Willem Mattelaer, Jessica Peck, Lily Shuting Zhang
  • Patent number: 12033630
    Abstract: An information processing device includes an input unit, an extracting unit, an output unit, and a specifying unit. The input unit receives a voice operation. The extracting unit extracts a processing detail corresponding to the voice operation received by the input unit. When the processing detail corresponding to the voice operation cannot be specified, the output unit outputs response information prompting the user to select at least one processing detail from a plurality of processing details extracted by the extracting unit. The specifying unit specifies the processing detail selected from among the plurality of processing details contained in the response information as the processing detail corresponding to the voice operation received by the input unit.
    Type: Grant
    Filed: March 2, 2020
    Date of Patent: July 9, 2024
    Assignee: SONY GROUP CORPORATION
    Inventors: Yuhei Taki, Hiro Iwase, Kunihito Sawai, Masaki Takase, Akira Miyashita
  • Patent number: 12032613
    Abstract: To facilitate the search and identification of documents, an information retrieval system is provided for performing a search on a corpus of data objects. The information retrieval system comprises a device and a database. The database is configured to store at least one syntactic search index data structure and at least one semantic search index data structure. The at least one syntactic search index data structure is configured to index and store in the database a plurality of terms from the corpus of data objects along with syntactic annotations indicating syntactic information. The at least one semantic search index data structure is configured to index and store in the database the plurality of terms from the corpus of data objects along with semantic annotations indicating semantic information. The device comprises an input unit, a processing unit, and an output unit. The input unit is configured to receive a syntactic query and a semantic query.
    Type: Grant
    Filed: July 12, 2021
    Date of Patent: July 9, 2024
    Assignee: BASF SE
    Inventors: Henning Schwabe, Arunav Mishra, Juergen Mueller, Michael Schuhmacher
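The parallel syntactic and semantic indexes described above can be sketched as two annotated term maps queried together; the index contents and annotation labels are invented for illustration:

```python
# Toy sketch: answer a combined syntactic + semantic query by
# intersecting hits from two parallel annotated indexes.
syntactic_index = {        # (term, POS annotation) -> doc ids
    ("run", "NOUN"): {1},
    ("run", "VERB"): {2, 3},
}
semantic_index = {         # (term, concept annotation) -> doc ids
    ("run", "EXERCISE"): {2},
    ("run", "EXECUTION"): {3},
}

def search(term, pos, concept):
    """Return documents matching both the syntactic and semantic query."""
    syn = syntactic_index.get((term, pos), set())
    sem = semantic_index.get((term, concept), set())
    return syn & sem

print(search("run", "VERB", "EXERCISE"))   # docs where "run" is a verb
                                           # meaning physical exercise
```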
  • Patent number: 12026470
    Abstract: Various techniques are disclosed, including: receiving, at a multiplatform management system, a communication from a computing device via a groupware platform, the multiplatform management system interfacing with multiple disparate platforms including the groupware platform and an image processing platform; determining an event type based on the communication from the computing device, to identify a cloud platform to be selected from among the plurality of disparate platforms based on detection of an image or text in the communication from the groupware platform; and identifying an action to be performed by the selected cloud platform based on the determined event type.
    Type: Grant
    Filed: July 3, 2023
    Date of Patent: July 2, 2024
    Assignee: Certinia Inc.
    Inventors: Stephen Paul Willcock, Matthew David Wood
  • Patent number: 12026460
    Abstract: An object is to generate, at low cost, dialogue data for generating question sentences that delve deeply into a conversation. For each of a plurality of pieces of data, each including a set of a first utterance sentence uttered by a first user, a second utterance sentence uttered by a second user in response to the first utterance sentence, and a third utterance sentence uttered by the first user in response to the second utterance sentence, a dialogue data generation unit 110 generates the set of the first utterance sentence and the second utterance sentence of the data as dialogue data when the second utterance sentence of the data is a question sentence using an interrogative.
    Type: Grant
    Filed: May 7, 2019
    Date of Patent: July 2, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Taichi Katayama, Atsushi Otsuka, Ko Mitsuda, Kuniko Saito, Junji Tomita
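The generation rule above (keep an utterance pair when the response is a question using an interrogative) can be sketched directly; the interrogative list and sample triples are illustrative:

```python
# Sketch of the rule: from (utterance1, utterance2, utterance3) triples,
# keep (utterance1, utterance2) as dialogue data when utterance2 is a
# question sentence using an interrogative.
INTERROGATIVES = {"who", "what", "when", "where", "why", "how"}

def is_interrogative_question(sentence: str) -> bool:
    words = set(sentence.lower().rstrip("?").split())
    return sentence.strip().endswith("?") and bool(words & INTERROGATIVES)

def generate_dialogue_data(triples):
    return [(u1, u2) for (u1, u2, u3) in triples
            if is_interrogative_question(u2)]

triples = [
    ("I went hiking yesterday.", "Where did you go?", "Up in the hills."),
    ("I'm tired.", "Get some rest.", "I will."),
]
print(generate_dialogue_data(triples))
```

Only the first triple survives, because its second utterance is an interrogative question that elicited a deeper follow-up.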
  • Patent number: 12020708
    Abstract: Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.
    Type: Grant
    Filed: October 11, 2021
    Date of Patent: June 25, 2024
    Assignee: SoundHound AI IP, LLC.
    Inventors: Kiersten L. Bradley, Ethan Coeytaux, Ziming Yin
  • Patent number: 12008319
    Abstract: Disclosed are a method and apparatus for selecting answers to idiom fill-in-the-blank questions, a computer device, and a storage medium. The method includes: obtaining a question text of idiom fill-in-the-blank questions, the question text including a fill-in-the-blank text and n candidate idioms, and the fill-in-the-blank text including m fill-in-the-blanks to be filled in with the candidate idioms; obtaining an explanatory text of all the candidate idioms; obtaining, through an idiom selection fill-in-the-blank model, a confidence that each fill-in-the-blank is filled in with each candidate idiom; selecting m idioms from the n candidate idioms to form multiple groups of answers; calculating a sum of the confidences that the fill-in-the-blanks are filled in with the candidate idioms in each group of answers; and obtaining a group of answers with the highest confidence sum as answers to the idiom fill-in-the-blank questions.
    Type: Grant
    Filed: November 30, 2020
    Date of Patent: June 11, 2024
    Assignee: PING AN TECHNOLOGY (SHENZHEN) CO., LTD.
    Inventors: Xiang Liu, Xiuling Chen
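The answer-selection step above (choose m distinct idioms out of n to maximize the sum of per-blank confidences) can be sketched by brute force over assignments; the confidence matrix values are made up:

```python
from itertools import permutations

# Sketch: conf[blank][idiom] is the model's confidence that a blank is
# filled with a candidate idiom; pick the assignment of m distinct
# idioms to m blanks with the highest confidence sum.
conf = [
    [0.9, 0.1, 0.2, 0.1],   # blank 0 vs. 4 candidate idioms
    [0.2, 0.8, 0.1, 0.3],   # blank 1
]

def best_answers(conf):
    m, n = len(conf), len(conf[0])
    best = max(permutations(range(n), m),
               key=lambda assign: sum(conf[b][i] for b, i in enumerate(assign)))
    return list(best)

print(best_answers(conf))  # index of the chosen idiom for each blank
```

Brute force is exponential in m; a real system would likely use a faster assignment method, but the objective (highest confidence sum over a group of answers) is the one the abstract states.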
  • Patent number: 12002475
    Abstract: The present disclosure provides an electronic device and a control method thereof. The electronic device of the present disclosure includes: a memory in which a speaker model including acoustic characteristics and context information of a first user voice is stored; and a processor for comparing a degree of similarity between the acoustic characteristics of the first user voice included in the speaker model and the acoustic characteristics of a second user voice with a threshold value that changes according to a degree of similarity between the context information included in the speaker model and the context information of the second user voice, and then performing authentication on the second user voice.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: June 4, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Jaesung Kwon
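The variable-threshold check described above can be sketched as follows, assuming the threshold is relaxed as context similarity increases; the base threshold and adjustment are illustrative constants, not from the patent:

```python
# Sketch: the acoustic-similarity threshold for accepting the second
# voice decreases when the new utterance's context closely matches the
# enrolled context.
BASE_THRESHOLD = 0.80
MAX_RELAXATION = 0.10

def authenticate(acoustic_sim: float, context_sim: float) -> bool:
    """Accept when acoustic similarity clears a threshold that is
    relaxed in proportion to context similarity."""
    threshold = BASE_THRESHOLD - MAX_RELAXATION * context_sim
    return acoustic_sim >= threshold

print(authenticate(0.75, 0.9))   # relaxed threshold -> accepted
print(authenticate(0.75, 0.0))   # strict threshold -> rejected
```

The same acoustic score thus passes or fails depending on how familiar the context is, which is the mechanism the abstract claims.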
  • Patent number: 12001809
    Abstract: Machine learning translation models may be selectively tuned to provide custom machine translations. A request to translate input text from an input language to a target language may be received. A tuning data set for translating the input text to the target language may be identified and searched to select pairs of texts in the tuning data according to comparisons with the input text. A machine learning model used to translate into the target language may be tuned using only second texts in the target language in the selected pairs of texts. The tuned machine learning model may then be used to translate the input text into the target language.
    Type: Grant
    Filed: November 18, 2021
    Date of Patent: June 4, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Anna Currey, Dengke Liu, Aakash Upadhyay, Prashant Mathur, Georgiana Dinu, Eric J. Nowell
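The selection step above (compare tuning pairs against the input text, then tune on only the target-side texts of the selected pairs) can be sketched with Jaccard word overlap as the comparison; the similarity measure and data are assumptions:

```python
# Sketch: rank tuning pairs by source-side overlap with the input text
# and return only the target-side texts of the top pairs, which would
# then be used to tune the translation model.
tuning_pairs = [
    ("the cat sat on the mat", "le chat s'est assis sur le tapis"),
    ("stock prices fell sharply", "les cours ont fortement chute"),
    ("a cat sleeps on a mat", "un chat dort sur un tapis"),
]

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def select_target_texts(input_text: str, pairs, k: int = 2):
    ranked = sorted(pairs, key=lambda p: jaccard(input_text, p[0]),
                    reverse=True)
    return [tgt for _, tgt in ranked[:k]]   # target-side texts only

print(select_target_texts("the cat is on the mat", tuning_pairs))
```

A production system would presumably use embedding similarity rather than word overlap, but the flow (select by comparison with the input, tune on the target side only) matches the abstract.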
  • Patent number: 11996102
    Abstract: Implementations relate to receiving natural language input that requests an automated assistant to provide information and processing the natural language input to identify the requested information and to identify one or more predicted actions. Those implementations further cause a computing device, at which the natural language input is received, to render the requested information and the one or more predicted actions in response to the natural language input. Yet further, those implementations, in response to the user confirming a rendered predicted action, cause the automated assistant to initialize the predicted action.
    Type: Grant
    Filed: May 25, 2023
    Date of Patent: May 28, 2024
    Assignee: GOOGLE LLC
    Inventors: Lucas Mirelmann, Zaheed Sabur, Bohdan Vlasyuk, Marie Patriarche Bledowski, Sergey Nazarov, Denis Burakov, Behshad Behzadi, Michael Golikov, Steve Cheng, Daniel Cotting, Mario Bertschler
  • Patent number: 11996081
    Abstract: Techniques for generating a visual response to a user input are described. A system may receive a natural language input and use a machine learning model to determine that a first component is to determine a response to the natural language input and that a second component is to determine supplemental content related to the natural language input. The system may receive, from the first component, first image data corresponding to the response. The system may also receive, from the second component, second image data corresponding to the supplemental content. The system may send, to a display, a command to present the first image data and the second image data.
    Type: Grant
    Filed: May 26, 2023
    Date of Patent: May 28, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Vasiliy Radostev, Ruhi Sarikaya, Rekha Seshadrinathan, Abhinav Sethy, Chetan Nagaraj Naik, Anjishnu Kumar
  • Patent number: 11996117
    Abstract: A toxicity moderation system has an input configured to receive speech from a speaker. The system includes a multi-stage toxicity machine learning system having a first stage and a second stage. The first stage is trained to analyze the received speech to determine whether a toxicity level of the speech meets a toxicity threshold. The first stage is also configured to filter-through, to the second stage, speech that meets the toxicity threshold, and is further configured to filter-out speech that does not meet the toxicity threshold.
    Type: Grant
    Filed: October 8, 2021
    Date of Patent: May 28, 2024
    Inventors: William Carter Huffman, Michael Pappas, Henry Howie
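The two-stage design above (a cheap first stage passes only threshold-meeting speech to a second stage) can be sketched with keyword scoring standing in for the trained first-stage model; the word list and threshold are invented:

```python
# Sketch: the first stage scores every utterance and filters through
# only those meeting the toxicity threshold; the rest are filtered out
# and never reach the (more expensive) second stage.
TOXIC_WORDS = {"idiot", "stupid"}
THRESHOLD = 0.5

def first_stage_score(utterance: str) -> float:
    """Fraction of words flagged as toxic; stand-in for a trained model."""
    words = utterance.lower().split()
    return sum(w in TOXIC_WORDS for w in words) / max(len(words), 1)

def filter_to_second_stage(utterances):
    return [u for u in utterances if first_stage_score(u) >= THRESHOLD]

stream = ["nice shot", "you stupid idiot"]
print(filter_to_second_stage(stream))   # only the second passes through
```

The point of the staging is cost: most speech is discarded cheaply, so the second stage only analyzes the small filtered-through fraction.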
  • Patent number: 11975729
    Abstract: A method for generating a voice announcement as feedback to a handwritten user input entered on a control device is disclosed. A list of possible whole words that can be entered by the user input is provided together with a corresponding transcription. A predetermined word end, comprising one or more characters of a whole word, is removed from the end of the whole word in accordance with a predetermined shortening rule; correspondingly, a transcription end matching the word end is determined based on a predetermined assignment rule and removed from the corresponding transcription of the whole word, generating a partial word and an associated partial transcription. The partial word and the partial transcription are added to another list.
    Type: Grant
    Filed: July 29, 2019
    Date of Patent: May 7, 2024
    Assignee: AUDI AG
    Inventor: Jan Dusik
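The shortening rule described above (strip a predetermined word end and the matching transcription end in parallel) can be sketched as follows; the German street-name example and the word-end-to-transcription mapping are illustrative assumptions:

```python
# Sketch: remove a predetermined word end from a whole word and the
# corresponding transcription end from its transcription, yielding a
# partial word and an associated partial transcription.
SHORTENING_RULES = {          # word end -> transcription end (assumed)
    "strasse": "ʃtʁasə",
    "platz": "plats",
}

def shorten(whole_word: str, transcription: str):
    for word_end, trans_end in SHORTENING_RULES.items():
        if whole_word.endswith(word_end) and transcription.endswith(trans_end):
            return (whole_word[: -len(word_end)],
                    transcription[: -len(trans_end)])
    return None   # no shortening rule applies

print(shorten("hauptstrasse", "haʊptʃtʁasə"))
```

The resulting partial word and partial transcription would be added to the second list, so partial handwritten entries can still be announced correctly.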
  • Patent number: 11978440
    Abstract: Techniques for processing input data for a detected user are described. Received image data is processed to identify an indicated user. Based on the identified user, a machine learning model is implemented. The machine learning model is then used to process input data for a user input. An action is performed using the resulting output data.
    Type: Grant
    Filed: May 25, 2023
    Date of Patent: May 7, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Deepak Yavagal, Ajith Prabhakara, John Gray
  • Patent number: 11961526
    Abstract: A method and an apparatus for calculating a downmixed signal and a residual signal are provided. According to the method, if a first target frame (the current frame or a previous frame of the current frame) is a switching frame, a to-be-encoded downmixed signal and a to-be-encoded residual signal of the subband corresponding to the preset frequency band in the current frame are calculated based on a switch fade-in/fade-out factor of a second target frame, an initial downmixed signal, and an initial residual signal of the preset frequency band.
    Type: Grant
    Filed: November 25, 2020
    Date of Patent: April 16, 2024
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Haiting Li, Bin Wang, Zexin Liu
  • Patent number: 11961506
    Abstract: An electronic apparatus includes a memory configured to store first voice recognition information related to a first language and second voice recognition information related to a second language, and a processor configured to obtain a first text corresponding to a received user voice on the basis of the first voice recognition information and, based on an entity name being included in the user voice according to the obtained first text, identify a segment of the user voice in which the entity name is included. The processor is further configured to obtain a second text corresponding to the identified segment of the user voice on the basis of the second voice recognition information, and to obtain control information corresponding to the user voice on the basis of the first text and the second text.
    Type: Grant
    Filed: February 23, 2023
    Date of Patent: April 16, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Chansik Bok, Jihun Park