Speech To Image Patents (Class 704/235)
  • Patent number: 12165638
    Abstract: A method includes receiving audio data corresponding to an utterance spoken by a user and processing, using a first recognition model, the audio data to generate a non-contextual candidate hypothesis as output from the first recognition model. The non-contextual candidate hypothesis has a corresponding likelihood score assigned by the first recognition model. The method also includes generating, using a second recognition model configured to receive personal context information, a contextual candidate hypothesis that includes a personal named entity. The method also includes scoring, based on the personal context information and the corresponding likelihood score assigned to the non-contextual candidate hypothesis, the contextual candidate hypothesis relative to the non-contextual candidate hypothesis.
    Type: Grant
    Filed: April 14, 2022
    Date of Patent: December 10, 2024
    Assignee: Google LLC
    Inventors: Leonid Aleksandrovich Velikovich, Petar Stanisa Aleksic
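
A minimal Python sketch of the contextual rescoring the abstract above describes, assuming a simple additive boost for personal named entities; the Hypothesis class, the example scores, and the entity_bonus heuristic are illustrative assumptions, not the patented method.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    score: float  # log-likelihood assigned by the first (non-contextual) recognizer

def score_contextual(non_contextual: Hypothesis,
                     contextual_text: str,
                     personal_entities: set[str],
                     entity_bonus: float = 2.0) -> Hypothesis:
    """Score a contextual hypothesis relative to the non-contextual one.

    Per the abstract, the contextual hypothesis is scored from the non-contextual
    hypothesis's likelihood score plus personal context information -- here, a
    fixed bonus for each personal named entity (e.g. a contact name) it contains.
    """
    boost = sum(entity_bonus for e in personal_entities
                if e.lower() in contextual_text.lower())
    contextual = Hypothesis(contextual_text, non_contextual.score + boost)
    return contextual if contextual.score > non_contextual.score else non_contextual

# Example: generic "call jon" vs. the contact-aware "call Jaan"
best = score_contextual(Hypothesis("call jon", score=-4.1), "call Jaan", {"Jaan"})
print(best.text)  # "call Jaan" -- wins because of the personal-entity boost
```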
  • Patent number: 12167107
    Abstract: In one embodiment, a computer-implemented method for editing navigation of a content item is disclosed. The method may include presenting, via a user interface at a client computing device, time-synchronized text pertaining to the content item; receiving an input of a tag for the time-synchronized text of the content item; storing the tag associated with the time-synchronized text of the content item; and responsive to receiving a request to play the content item: playing the content item via a media player presented in the user interface, and concurrently presenting the time-synchronized text and the tag as a graphical user element in the user interface.
    Type: Grant
    Filed: June 17, 2021
    Date of Patent: December 10, 2024
    Assignee: Musixmatch S. P. A.
    Inventors: Marco Paglia, Paolo Spazzini, Pierpaolo Di Panfilo, Nicolae-Daniel Dima, Emanuele Cantalini, Christian Zanin
  • Patent number: 12165630
    Abstract: A method of training a speech model includes receiving, at a voice-enabled device, a fixed set of training utterances where each training utterance in the fixed set of training utterances includes a transcription paired with a speech representation of the corresponding training utterance. The method also includes sampling noisy audio data from an environment of the voice-enabled device. For each training utterance in the fixed set of training utterances, the method further includes augmenting, using the noisy audio data sampled from the environment of the voice-enabled device, the speech representation of the corresponding training utterance to generate noisy audio samples and pairing each of the noisy audio samples with the corresponding transcription of the corresponding training utterance. The method additionally includes training a speech model on the noisy audio samples generated for each speech representation in the fixed set of training utterances.
    Type: Grant
    Filed: July 21, 2023
    Date of Patent: December 10, 2024
    Assignee: Google LLC
    Inventors: Matthew Sharifi, Victor Carbune
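
The augmentation step above can be pictured with a short numpy sketch: noise sampled from the device's environment is mixed into each fixed training utterance at several signal-to-noise ratios, and every noisy copy keeps the original transcript. The SNR values and mixing scheme are assumptions for illustration only.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix environment noise into a speech waveform at a target SNR (in dB)."""
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def augment_training_set(utterances, environment_noise, snrs=(0, 5, 10, 20)):
    """Expand a fixed set of (waveform, transcript) pairs with noisy copies."""
    augmented = []
    for waveform, transcript in utterances:
        for snr_db in snrs:
            augmented.append((mix_at_snr(waveform, environment_noise, snr_db), transcript))
    return augmented
```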
  • Patent number: 12165081
    Abstract: Various embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing predictive data analysis operations. For example, certain embodiments of the present invention utilize systems, methods, and computer program products that perform predictive data analysis operations by generating a predicted eligibility score for a predictive entity using a cross-feature-type eligibility prediction machine learning framework.
    Type: Grant
    Filed: March 10, 2022
    Date of Patent: December 10, 2024
    Assignee: Optum Services (Ireland) Limited
    Inventors: Riccardo Mattivi, Venkata Krishnan Mittinamalli Thandapani, Conor Breen, Peter Cogan
  • Patent number: 12164859
    Abstract: Methods for generating a categorized, ranked, condensed summary of a transcript of a conversation, involving obtaining a diarized version of the transcript of the conversation, storing textual monologues from the transcript, determining classifications as to the textual monologues based on a classifier algorithm, associating the classifications with the textual monologues, creating textually-modified rephrasings of the textual monologues based on text and classification thereof, storing the textually-modified rephrasings, aggregating the textually-modified rephrasings based on associated clustering and scoring, and transmitting summary information pertaining to the aggregated textually-modified rephrasings to a user device.
    Type: Grant
    Filed: June 1, 2022
    Date of Patent: December 10, 2024
    Assignee: GONG.IO LTD
    Inventors: Shlomi Medalion, Inbal Horev, Raz Nussbaum, Omri Allouche, Raquel Sitman, Ortal Ashkenazi
  • Patent number: 12154560
    Abstract: A voice controlled apparatus for performing a workflow operation is described. The voice controlled apparatus can include a microphone, a speaker, and a processor. In some examples, the voice controlled apparatus can generate, via the speaker, a voice prompt associated with a task of a workflow and identify, via the microphone, a voice response received from a worker. In this regard, the voice prompt and the voice response can be a part of a voice dialogue. Further, the processor of the voice controlled apparatus can identify a performance status associated with the execution of the task, before providing a next voice prompt subsequent to the voice prompt. In this aspect, the performance status can be identified based on analyzing the voice dialogue using a machine learning model. Furthermore, the voice controlled apparatus can generate a message including a suggestion to improve the performance status of the task.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: November 26, 2024
    Assignee: VOCOLLECT, INC.
    Inventors: Ranganathan Srinivasan, Rekula Dinesh, Kaushik Hazra, Duff H. Gold
  • Patent number: 12153872
    Abstract: The present disclosure relates to systems and methods for automatically linking a note to a transcript of a conference. According to one of the embodiments a computer-implemented method is provided. The method comprises: receiving a transcript of a conference and a note from a conference participant; responsive to receiving the transcript of the conference and the note, applying a natural language processing on a content of the note and on a content of the transcript; identifying a matching content between the content of the note and the content of the transcript; generating a link corresponding to the matching content; and causing to display the link corresponding to the matching content.
    Type: Grant
    Filed: September 20, 2021
    Date of Patent: November 26, 2024
    Assignee: RingCentral, Inc.
    Inventor: Vlad Vendrow
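
A rough sketch of the note-to-transcript linking idea above, with plain token overlap standing in for the unspecified natural language processing step and a hypothetical timestamped-URL scheme for the generated link.

```python
from typing import Optional

def best_matching_segment(note: str, transcript_segments: list[dict]) -> Optional[dict]:
    """Return the transcript segment whose text overlaps most with the note.

    Each segment is assumed to look like {"start": seconds, "text": "..."}.
    """
    note_tokens = set(note.lower().split())
    best, best_overlap = None, 0
    for segment in transcript_segments:
        overlap = len(note_tokens & set(segment["text"].lower().split()))
        if overlap > best_overlap:
            best, best_overlap = segment, overlap
    return best

def link_for(note: str, transcript_segments: list[dict], base_url: str) -> Optional[str]:
    """Build a deep link (hypothetical URL scheme) to the matching moment in the recording."""
    match = best_matching_segment(note, transcript_segments)
    return f"{base_url}?t={int(match['start'])}" if match else None

segments = [{"start": 12.0, "text": "we agreed to ship the beta next Friday"},
            {"start": 95.5, "text": "budget review moved to Q3"}]
print(link_for("ship beta Friday", segments, "https://example.invalid/recording"))
```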
  • Patent number: 12153613
    Abstract: A computer-implemented method for presenting relevant information to a customer service representative of a business may include receiving a digitized data stream corresponding to a spoken conversation between a customer and a representative; converting the data stream to a text stream; determining one or more keywords from the text stream; comparing the one or more keywords with a history of keywords that have previously been searched; and/or searching a database for information related to the one or more keywords that have not been previously searched. As a result of the keyword search, information about topics that the customer is interested in may be located and displayed on a customer service representative display to facilitate the customer service representative timely relaying the information found by the keyword search to enhance the customer experience. Exemplary keywords may relate to insurance and financial services, such as “auto,” “home,” “life,” “insurance,” or “vehicle loan.”
    Type: Grant
    Filed: May 21, 2020
    Date of Patent: November 26, 2024
    Assignee: State Farm Mutual Automobile Insurance Company
    Inventor: Sylvia Hernandez
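
The history-aware keyword search described above reduces to a small set difference; the keyword list and the printed "search" are placeholders for the real database lookup and the representative's display.

```python
INSURANCE_KEYWORDS = {"auto", "home", "life", "insurance", "vehicle loan"}  # illustrative

def new_keywords(text_stream: str, already_searched: set[str]) -> set[str]:
    """Keywords present in the transcribed stream that have not been searched yet."""
    text = text_stream.lower()
    found = {kw for kw in INSURANCE_KEYWORDS if kw in text}
    return found - already_searched

searched_history: set[str] = set()
for utterance in ["I want to ask about auto insurance", "also about a vehicle loan"]:
    for kw in new_keywords(utterance, searched_history):
        searched_history.add(kw)
        print(f"searching knowledge base for: {kw}")  # results would be shown to the representative
```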
  • Patent number: 12147477
    Abstract: Disclosed is a search result display device (1) comprising: a regard prediction unit (12) configured to predict, from a dialogue between a customer and a service person, a regard of the customer; a keyword extraction unit (17) configured to extract a keyword from the regard; and a display controller (13) configured to cause a display (14) to display the dialogue and a search result obtained from the database (21) with the keyword as a search query, wherein when a string has been designated by the service person, the display controller (13) causes the display (14) to display a search result obtained from the database (21) using a search query that incorporates the string, until a search result automatic update instruction is given by the service person.
    Type: Grant
    Filed: August 14, 2019
    Date of Patent: November 19, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Yoshiaki Noda, Setsuo Yamada, Takaaki Hasegawa
  • Patent number: 12148422
    Abstract: Disclosed is an electronic apparatus. The electronic apparatus obtains a first character string comprising a previously defined character from a first user utterance; recognizes, as an input character, a second character string edited from the first character string based on a first editing command, when the first user utterance comprises the first editing command following the first character string; and edits the second character string based on a second editing command, when a second user utterance comprises the second editing command without the first editing command.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: November 19, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jihun Park, Dongheon Seok
  • Patent number: 12149793
    Abstract: An example display device may include a voice signal receiver, a display, at least one memory storing an application supporting a contents providing service and storing instructions, a communication circuit communicating with at least one external server supporting the contents providing service, and at least one processor. The contents providing service may provide contents files of a first type and contents files of a second type.
    Type: Grant
    Filed: June 16, 2023
    Date of Patent: November 19, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jibum Moon, Gyungchan Seol, Kyerim Lee
  • Patent number: 12148423
    Abstract: The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: November 19, 2024
    Assignee: Google LLC
    Inventors: Michael J. Lebeau, William J. Byrne, John Nicholas Jitkoff, Brandon M. Ballinger, Trausti T. Kristjansson
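
The correction flow above can be illustrated with a toy word lattice reduced to per-position alternative lists (a deliberate simplification of a real lattice):

```python
# Hypothetical word lattice: one list of alternatives per word position.
word_lattice = [
    ["I", "eye"],
    ["want", "wont"],
    ["four", "for", "fore"],
    ["apples"],
]

transcript = [alts[0] for alts in word_lattice]        # best path shown to the user
print(" ".join(transcript))                            # "I want four apples"

selected_index = 2                                     # user taps "four"
print("alternates:", word_lattice[selected_index][1:]) # ['for', 'fore']

transcript[selected_index] = "for"                     # user picks an alternate word
print(" ".join(transcript))                            # "I want for apples"
```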
  • Patent number: 12141529
    Abstract: Natural language-based question answering systems and techniques are generally described. In some examples, a natural language processing system may receive first natural language data. The natural language processing system may determine first slot data included in the first natural language data. A set of content items associated with the first slot data may be determined. A first machine learning model may use the first natural language data to generate prediction data associated with a first attribute among a list of attributes of the set of content items. In some examples, a first value associated with the first attribute for a first content item of the set of content items may be determined. Second natural language data may be generated based at least in part on the first value. The second natural language data may include a response to the first natural language data.
    Type: Grant
    Filed: March 22, 2022
    Date of Patent: November 12, 2024
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Yuval Nezri, Eilon Sheetrit, Lital Kuchy, Avihai Mejer
  • Patent number: 12141672
    Abstract: An example method includes receiving, by a computational assistant executing at one or more processors, a representation of an utterance spoken at a computing device; identifying, based on the utterance, a task to be performed by the computational assistant; responsive to determining, by the computational assistant, that complete performance of the task will take more than a threshold amount of time, outputting, for playback by one or more speakers operably connected to the computing device, synthesized voice data that informs a user of the computing device that complete performance of the task will not be immediate; and performing, by the computational assistant, the task.
    Type: Grant
    Filed: September 13, 2023
    Date of Patent: November 12, 2024
    Assignee: GOOGLE LLC
    Inventors: Yariv Adan, Vladimir Vuskovic, Behshad Behzadi
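
At its core the abstract above is a duration check before task execution; in this hedged sketch the time estimate and the speak/perform callables are placeholders, not Google's implementation.

```python
THRESHOLD_SECONDS = 10.0  # illustrative threshold

def handle_task(task_name: str, estimated_seconds: float, speak, perform):
    """Warn the user first when complete performance of the task will not be immediate."""
    if estimated_seconds > THRESHOLD_SECONDS:
        speak(f"Working on {task_name}; this will take a little while. "
              "I'll let you know when it's done.")
    return perform(task_name)

handle_task("vacation booking", estimated_seconds=45.0,
            speak=print, perform=lambda name: f"{name} started")
```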
  • Patent number: 12138452
    Abstract: A noninvasive ergonomic self-use device including a plurality of electrodes and a processor in electrical communication with the electrodes, the processor is configured to switch two or more of the electrodes between at least an ECG mode of operation in which the electrodes receive user body signals and an EPG mode in which the electrodes generate electrical pulses for stimulating the abdominal muscles of the user.
    Type: Grant
    Filed: October 10, 2019
    Date of Patent: November 12, 2024
    Assignee: GerdCare Medical Ltd.
    Inventors: Giora Arbel, Mordechay Esh
  • Patent number: 12141043
    Abstract: An utterance test method for an utterance device, an utterance test server, an utterance test system, and a program perform an utterance test on a test device (20). The utterance test system includes at least one utterance device (20) capable of uttering, a terminal device (30), and an utterance test server (10). The utterance test server (10) receives an utterance test start command from the terminal device (30), sets at least one utterance device (20) to be a test device (20) as a target of an utterance test, sets test content of the utterance test, and causes the test device (20) to utter the test content.
    Type: Grant
    Filed: July 14, 2021
    Date of Patent: November 12, 2024
    Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
    Inventors: Hiroki Urabe, Kentaro Nakai, Satoru Matsunaga, Yoshiki Ohashi
  • Patent number: 12141712
    Abstract: This disclosure relates generally to a method and system for extracting contextual information from a knowledge base. The method receives a user query comprising a request to extract contextual information from the user query. Further, the user query is analyzed based on a plurality of predefined parameters to determine sufficiency of information comprised in the user query. The received user query identifies relevant sources among structured, unstructured, or semi-structured data storage repositories. The user query is processed using a fine-grain approach, where a dictionary of one or more keywords with weights is created through the domain ontology builder from the one or more knowledge articles. Furthermore, appropriate contextual information related to the user query is extracted using the fine-grain approach, based on the knowledge articles associated with the trained knowledge base, which comprises the information required by the user query extracted from the knowledge articles.
    Type: Grant
    Filed: December 16, 2020
    Date of Patent: November 12, 2024
    Assignee: Tata Consultancy Services Limited
    Inventors: Sanjeev Manchanda, Ajeet Phansalkar, Mahesh Kshirsagar, Kamlesh Pandurang Mhashilkar, Nihar Ranjan Sahoo, Sonam Yashpal Sharma
  • Patent number: 12135938
    Abstract: Systems, methods, apparatuses, and computer program products for natural language processing are provided. One method may include utilizing a trained machine learning model to learn syntax dependency patterns and parts of speech tag patterns of text based on labeled training data. The method may also include contextualizing vector embeddings from a language model for each word in the text, and extracting relationships for a given fragment of the text based on the contextualization. The method may further include resolving relationships between identified verbs based on a plurality of heuristics to identify the syntax dependency patterns, identifying nested relationships, and capturing metadata associated with the nested relationships.
    Type: Grant
    Filed: May 11, 2022
    Date of Patent: November 5, 2024
    Assignee: CORASCLOUD, INC.
    Inventors: Ajay Patel, Alex Sands
  • Patent number: 12136414
    Abstract: Audio signals representing a current utterance in a conversation and a dialog history including at least information associated with past utterances corresponding to the current utterance in the conversation can be received. The dialog history can be encoded into an embedding. A spoken language understanding neural network model can be trained to perform a spoken language understanding task based on input features including at least speech features associated with the received audio signals and the embedding. An encoder can also be trained to encode a given dialog history into an embedding. The spoken language understanding task can include predicting a dialog action of an utterance. The spoken language understanding task can include predicting a dialog intent or overall topic of the conversation.
    Type: Grant
    Filed: August 18, 2021
    Date of Patent: November 5, 2024
    Assignee: International Business Machines Corporation
    Inventors: Samuel Thomas, Jatin Ganhotra, Hong-Kwang Kuo, Sachindra Joshi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
  • Patent number: 12136425
    Abstract: A method to transcribe communications includes the steps of obtaining a plurality of hypothesis transcriptions of a voice signal generated by a speech recognition system, determining consistent words that are included in at least first and second of the plurality of hypothesis transcriptions, in response to determining the consistent words, providing the consistent words to a device for presentation of the consistent words to an assisted user, and presenting the consistent words via a display screen on the device, wherein a rate of the presentation of the words on the display screen is variable.
    Type: Grant
    Filed: May 8, 2023
    Date of Patent: November 5, 2024
    Assignee: Ultratec, Inc.
    Inventors: Robert M. Engelke, Kevin R. Colwell, Christopher Engelke
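
One way to picture the "consistent words" step shared by this patent and the related Ultratec entry below (12136426): release a word for display once at least two hypothesis transcriptions agree on it. Position-wise comparison stands in here for real hypothesis alignment, which is a simplifying assumption.

```python
from collections import Counter

def consistent_words(hypotheses: list[list[str]], min_agreement: int = 2) -> list[str]:
    """Return, position by position, words that appear in >= min_agreement hypotheses."""
    length = min(len(h) for h in hypotheses)
    out = []
    for i in range(length):
        counts = Counter(h[i] for h in hypotheses)
        word, votes = counts.most_common(1)[0]
        if votes >= min_agreement:
            out.append(word)
    return out

hyps = [
    "please call the doctor today".split(),
    "please call the doctor to day".split(),
    "please hall the doctor today".split(),
]
print(consistent_words(hyps))  # ['please', 'call', 'the', 'doctor', 'today']
```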
  • Patent number: 12136423
    Abstract: Introduced here are computer programs and associated computer-implemented techniques for facilitating the creation of a master transcription (or simply “transcript”) that more accurately reflects underlying audio by comparing multiple independently generated transcripts. The master transcript may be used to record and/or produce various forms of media content, as further discussed below. Thus, the technology described herein may be used to facilitate editing of text content, audio content, or video content. These computer programs may be supported by a media production platform that is able to generate the interfaces through which individuals (also referred to as “users”) can create, edit, or view media content. For example, a computer program may be embodied as a word processor that allows individuals to edit voice-based audio content by editing a master transcript, and vice versa.
    Type: Grant
    Filed: December 18, 2020
    Date of Patent: November 5, 2024
    Assignee: Descript, Inc.
    Inventors: Kundan Kumar, Vicki Anand
  • Patent number: 12136426
    Abstract: A method to transcribe communications includes the steps of obtaining a plurality of hypothesis transcriptions of a voice signal generated by a speech recognition system, determining consistent words that are included in at least first and second of the plurality of hypothesis transcriptions, in response to determining the consistent words, providing the consistent words to a device for presentation of the consistent words to an assisted user, and presenting the consistent words via a display screen on the device, wherein a rate of the presentation of the words on the display screen is variable.
    Type: Grant
    Filed: December 19, 2023
    Date of Patent: November 5, 2024
    Assignee: Ultratec, Inc.
    Inventors: Robert M. Engelke, Kevin R. Colwell, Christopher Engelke
  • Patent number: 12136416
    Abstract: In one embodiment, a method includes accessing a decoded hypothesis corresponding to an utterance, computing a predicted probability of observing each token in the decoded hypothesis by having a local first machine-learning model process the decoded hypothesis, computing a confidence score for each token in the decoded hypothesis by having a second machine-learning model process the decoded hypothesis, where the confidence score indicates a degree of confidence for the token to be observed at its position, calculating a loss for the computed predicted probabilities of observing tokens in the decoded hypothesis based on the computed confidence scores, and updating parameters of the local first machine-learning model based on the calculated loss.
    Type: Grant
    Filed: July 5, 2022
    Date of Patent: November 5, 2024
    Assignee: Meta Platforms, Inc.
    Inventors: Zhe Liu, Ke Li, Fuchun Peng
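
A hedged sketch of the loss computation described above: the local model's per-token probabilities on the decoded hypothesis are penalized according to a separate confidence model's scores. The exact weighting is an assumption, not the patented formulation.

```python
import math

def confidence_weighted_loss(predicted_probs: list[float],
                             confidence_scores: list[float]) -> float:
    """Negative log-likelihood of each hypothesis token, weighted by confidence.

    predicted_probs[i]   -- local model's probability of token i in the decoded hypothesis
    confidence_scores[i] -- second model's confidence that token i is correct (0..1)
    """
    assert len(predicted_probs) == len(confidence_scores)
    total = 0.0
    for p, c in zip(predicted_probs, confidence_scores):
        total += -c * math.log(max(p, 1e-12))  # low-confidence tokens contribute less
    return total / len(predicted_probs)

loss = confidence_weighted_loss([0.9, 0.4, 0.7], [0.95, 0.3, 0.8])
print(round(loss, 4))  # this scalar would drive the parameter update of the local model
```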
  • Patent number: 12137301
    Abstract: One embodiment provides a method, the method including: detecting, during a meeting comprising at least one participant remote to a user identified as a presenter, communication data and visual data provided by the user to the at least one participant; determining, utilizing a meeting discrepancy system, that content of the visual data does not match content of the communication data; and providing, to the user and utilizing a meeting discrepancy system, a notification indicating the content of the visual data does not match the content of the communication data. Other aspects are claimed and described.
    Type: Grant
    Filed: August 17, 2022
    Date of Patent: November 5, 2024
    Assignee: Lenovo (Singapore) Pte. Ltd.
    Inventors: Joshua Smith, Matthew Fardig, Travis Ennis, Richard Downey
  • Patent number: 12130848
    Abstract: A virtual assistant server receives conversational inputs as part of a conversation from a customer device and generates a version of a data record of the conversation upon receiving each of the conversational inputs, or upon receiving each output generated by one of a plurality of software modules when that software module receives a system input from the conversation management framework. The virtual assistant server provides each generated version of the data record to a communication orchestrator and receives, for each generated version of the data record, execution instructions from the communication orchestrator. Further, the virtual assistant server communicates with one or more of the software modules based on the received execution instructions and, based on the communicating, provides one or more of the outputs of the software modules to the customer device when those outputs comprise responses to one or more of the conversational inputs.
    Type: Grant
    Filed: August 4, 2023
    Date of Patent: October 29, 2024
    Assignee: Kore.ai, Inc.
    Inventors: Rajkumar Koneru, Prasanna Kumar Arikala Gunalan, Thirupathi Bandam, Jayesh Arunkumar Jain, Vishnu Vardhan Sai Lanka
  • Patent number: 12130857
    Abstract: Method for editorializing digital audiovisual or audio recording content of an oral presentation given by a speaker using a presentation support enriched with tags and recorded in the form of a digital audiovisual file. This method comprises written transcription of the oral presentation with indication of a time code for each word, comparative automatic analysis of this written transcription and of the tagged presentation support, transposition of the time codes from the written transcription to the tagged presentation support, identification of the tags and of the time codes of the presentation support, and marking of the digital audiovisual file with the tags and time codes, so as to generate an enriched digital audiovisual file.
    Type: Grant
    Filed: September 18, 2020
    Date of Patent: October 29, 2024
    Assignee: VERNSTHER
    Inventor: Jennifer Dahan
  • Patent number: 12125479
    Abstract: A system for providing a sociolinguistic virtual assistant includes a communication device, a processing device, and a storage device. The processing device is configured to: process input data using a natural language processing algorithm; categorize the semantic data based on psych-sociological categorizations associated with the at least one user; analyze the command from the at least one user to identify a task associated with the command; generate a response based on identification of the task associated with the command; and execute the task associated with the command using the categorized semantic data to derive a result. A method corresponding to the system is also provided.
    Type: Grant
    Filed: February 8, 2022
    Date of Patent: October 22, 2024
    Assignee: Seam Social Labs Inc
    Inventors: Tiasia O'Brien, Marisa Jean Dinko
  • Patent number: 12125473
    Abstract: Embodiments of this disclosure disclose a speech recognition method, apparatus, and device, and a storage medium. The method in the embodiments of this disclosure includes: adjusting a probability of a relationship between at least one pair of elements in a language recognition model according to a probability of the relationship between the at least one pair of elements in a textual segment; inputting a to-be-recognized speech into a speech recognition model including the language recognition model; and determining, according to the adjusted probability of the relationship between the at least one pair of elements in the language recognition model, a sequence of elements corresponding to the to-be-recognized speech as a speech recognition result.
    Type: Grant
    Filed: March 4, 2021
    Date of Patent: October 22, 2024
    Assignee: Tencent Technology (Shenzhen) Company Limited
    Inventor: Tao Li
  • Patent number: 12125488
    Abstract: An electronic apparatus is provided. The electronic apparatus includes a microphone, a memory storing at least one instruction, and a processor connected to the microphone and the memory configured to control the electronic apparatus, and the processor, by executing the at least one instruction, may, based on receiving a user voice signal through the microphone, obtain a text corresponding to the user voice signal, identify a plurality of sentences included in the obtained text, identify a domain corresponding to each of the plurality of sentences among a plurality of domains, based on a similarity of a first sentence and a second sentence, among the plurality of sentences, having a same domain being greater than or equal to a threshold value, obtain a third sentence in which the first sentence and the second sentence are combined by using a first neural network model, and perform natural language understanding for the third sentence.
    Type: Grant
    Filed: November 10, 2021
    Date of Patent: October 22, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jiyoun Hong, Hyeonmok Ko, Dayoung Kwon, Jonggu Kim, Seoha Song, Kyenghun Lee, Hojung Lee, Saebom Jang, Pureum Jung, Changho Paeon
  • Patent number: 12124498
    Abstract: A time code to byte conversion system is provided herein that maps time codes to byte ranges such that a user device can retrieve a portion of, but not all of, a media file by specifying a time range. For example, the time code to byte conversion system can play a media file and identify the byte at which each time code begins. The time code to byte conversion system can then store the byte to time code mapping in an index accessible by a media retrieval server. A user device can then provide a time range to the media retrieval server, the media retrieval server can query the index to identify the range of bytes that corresponds to the provided time range, and then the media retrieval server can retrieve the identified range of bytes from a media database for transmission to the user device.
    Type: Grant
    Filed: January 9, 2020
    Date of Patent: October 22, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeremiah Dunham, Andrew Tunall, Benjamin Schwartz, Jason LaPier, Justin Abrahms
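
The time-code-to-byte index above maps naturally onto a sorted list plus binary search; the index layout and the lookup below are illustrative assumptions, not Amazon's format.

```python
import bisect

class TimeToByteIndex:
    def __init__(self, entries: list[tuple[float, int]]):
        """entries: (time_code_seconds, starting_byte_offset), sorted by time."""
        self.times = [t for t, _ in entries]
        self.bytes_ = [b for _, b in entries]

    def byte_range(self, start_s: float, end_s: float, file_size: int) -> tuple[int, int]:
        """Map a requested time range to an inclusive byte range of the media file."""
        i = bisect.bisect_right(self.times, start_s) - 1
        j = bisect.bisect_right(self.times, end_s)
        start_byte = self.bytes_[max(i, 0)]
        end_byte = self.bytes_[j] - 1 if j < len(self.bytes_) else file_size - 1
        return start_byte, end_byte

index = TimeToByteIndex([(0.0, 0), (10.0, 180_000), (20.0, 355_000)])
print(index.byte_range(10.0, 19.9, file_size=500_000))  # (180000, 354999)
```

A media retrieval server could use such a lookup to answer a ranged request for seconds 10-20 with an HTTP byte-range fetch instead of transferring the whole file.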
  • Patent number: 12118272
    Abstract: Systems and methods to accept speech input and edit a note upon receipt of an indication to edit are disclosed. Exemplary implementations may: effectuate presentation of a graphical user interface that includes a note, the note including note sections, the note sections including a first note section, the individual note sections including body fields; obtain user input from the client computing platform, the user input representing an indication to edit a first body field of the first note section; obtain audio information representing sound captured by an audio section of the client computing platform, the audio information including value definition information specifying one or more values to be included in the individual body fields; perform speech recognition on the audio information to obtain a first value; and populate the first body field with the first value so that the first value is included in the first body field.
    Type: Grant
    Filed: November 18, 2022
    Date of Patent: October 15, 2024
    Assignee: Suki AI, Inc.
    Inventor: Matt Pallakoff
  • Patent number: 12118514
    Abstract: Systems and methods to generate records within a collaboration environment are described herein. Exemplary implementations may perform one or more of: manage environment state information maintaining a collaboration environment; effectuate presentation of a user interface through which users upload digital assets representing recorded audio and/or video content; obtain input information defining the digital assets input via the user interface; generate transcription information characterizing the recorded audio and/or video content of the digital assets; provide the transcription information as input into a trained machine-learning model; obtain the output from the trained machine-learning model, the output defining one or more new records based on the transcripts; and/or other operations.
    Type: Grant
    Filed: February 17, 2022
    Date of Patent: October 15, 2024
    Assignee: Asana, Inc.
    Inventor: Steve B Morin
  • Patent number: 12118266
    Abstract: Media content can be created and/or modified using a network-accessible platform. Scripts for content-based experiences could be readily created using one or more interfaces generated by the network-accessible platform. For example, a script for a content-based experience could be created using an interface that permits triggers to be inserted directly into the script. Interface(s) may also allow different media formats to be easily aligned for post-processing. For example, a transcript and an audio file may be dynamically aligned so that the network-accessible platform can globally reflect changes made to either item. User feedback may also be presented directly on the interface(s) so that modifications can be made based on actual user experiences.
    Type: Grant
    Filed: February 25, 2022
    Date of Patent: October 15, 2024
    Assignee: Descript, Inc.
    Inventors: Steven Surmacz Rubin, Ulf Schwekendiek, David John Williams
  • Patent number: 12112138
    Abstract: Embodiments provide a software framework for evaluating and troubleshooting real-world task-oriented bot systems. Specifically, the evaluation framework includes a generator that infers dialog acts and entities from bot definitions and generates test cases for the system via model-based paraphrasing. The framework may also include a simulator for task-oriented dialog user simulation that supports both regression testing and end-to-end evaluation. The framework may also include a remediator to analyze and visualize the simulation results, remedy some of the identified issues, and provide actionable suggestions for improving the task-oriented dialog system.
    Type: Grant
    Filed: June 2, 2022
    Date of Patent: October 8, 2024
    Assignee: Salesforce, Inc.
    Inventors: Guangsen Wang, Samson Min Rong Tan, Shafiq Rayhan Joty, Gang Wu, Chu Hong Hoi, Ka Chun Au
  • Patent number: 12112742
    Abstract: Provided are an electronic device for correcting a speech input, and an operating method thereof. The method may include receiving a first speech signal; obtaining first text; obtaining an intent of the first speech signal and a confidence score of the intent, by inputting the first text to a natural language understanding model; identifying a plurality of correction candidate semantic elements capable of being correction targets in the first text; receiving a second speech signal; obtaining second text; identifying whether the second speech signal is a speech signal for correcting the first text; comparing the plurality of correction candidate semantic elements in the first text with a semantic element in the second text, based on the confidence score; and correcting at least one of the plurality of correction candidate semantic elements in the first text.
    Type: Grant
    Filed: November 29, 2021
    Date of Patent: October 8, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jongsun Lee, Jongyoub Ryu, Seonghan Ryu, Eunji Lee, Jaechul Yang, Hyungtak Choi
  • Patent number: 12112743
    Abstract: A speech recognition method includes: obtaining speech data; performing feature extraction on speech data, to obtain speech features of at least two speech segments; inputting the speech features of the at least two speech segments into the speech recognition model, and processing the speech features of the speech segments by using cascaded hidden layers in the speech recognition model, to obtain hidden layer features of the speech segments, a hidden layer feature of an ith speech segment being determined based on speech features of n speech segments located after the ith speech segment in a time sequence and a speech feature of the ith speech segment; and obtaining text information corresponding to the speech data based on the hidden layer features of the speech segments.
    Type: Grant
    Filed: March 30, 2022
    Date of Patent: October 8, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Xilin Zhang, Bo Liu
  • Patent number: 12112750
    Abstract: A method for estimating a user's location in an environment may involve receiving output signals from each microphone of a plurality of microphones in the environment. At least two microphones of the plurality of microphones may be included in separate devices at separate locations in the environment and the output signals may correspond to a current utterance of a user. The method may involve determining multiple current acoustic features from the output signals of each microphone and applying a classifier to the multiple current acoustic features. Applying the classifier may involve applying a model trained on previously-determined acoustic features derived from a plurality of previous utterances made by the user in a plurality of user zones in the environment. The method may involve determining, based at least in part on output from the classifier, an estimate of the user zone in which the user is currently located.
    Type: Grant
    Filed: July 28, 2020
    Date of Patent: October 8, 2024
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Mark R. P. Thomas, Richard J. Cartwright
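
A sketch of the zone-estimation idea above, with per-microphone RMS levels standing in for the unspecified acoustic features and a nearest-centroid model standing in for the trained classifier.

```python
import numpy as np

def rms_features(per_mic_signals: list[np.ndarray]) -> np.ndarray:
    """One feature per microphone: the RMS level of the current utterance."""
    return np.array([np.sqrt(np.mean(s ** 2)) for s in per_mic_signals])

class ZoneClassifier:
    """Nearest-centroid stand-in for the classifier trained on previous utterances."""
    def __init__(self):
        self.centroids: dict[str, np.ndarray] = {}

    def fit(self, labeled_features: dict[str, list[np.ndarray]]):
        """labeled_features maps a user zone (e.g. 'kitchen') to feature vectors."""
        self.centroids = {zone: np.mean(feats, axis=0)
                          for zone, feats in labeled_features.items()}

    def predict(self, features: np.ndarray) -> str:
        """Estimate the zone whose centroid is closest to the current features."""
        return min(self.centroids,
                   key=lambda zone: np.linalg.norm(features - self.centroids[zone]))
```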
  • Patent number: 12112759
    Abstract: A method and apparatus for speaker diarization with early-stop clustering, comprising: segmenting an audio stream into at least one speech segment (710), the audio stream comprising speeches from at least one speaker; clustering the at least one speech segment into a plurality of clusters (720), the number of the plurality of clusters being greater than the number of the at least one speaker; selecting, from the plurality of clusters, at least one cluster of the highest similarity (730), the number of the selected at least one cluster being equal to the number of the at least one speaker; establishing a speaker classification model based on the selected at least one cluster (740); and aligning, through the speaker classification model, speech frames in the audio stream to the at least one speaker (750).
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: October 8, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Liping Chen, Kao-Ping Soong
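
The early-stop clustering recipe above (over-cluster, keep the most self-similar clusters, build a speaker model from them, then re-align every segment) can be sketched with scikit-learn; KMeans and cosine similarity are stand-ins for whatever clustering and similarity measure the patent actually uses, and per-segment speaker embeddings are assumed as input.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def diarize(segment_embeddings: np.ndarray, num_speakers: int, extra_clusters: int = 3):
    """Return a speaker label for every row of segment_embeddings (num_segments, dim)."""
    # 1) over-cluster: more clusters than speakers ("early stop" before full merging)
    k = num_speakers + extra_clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(segment_embeddings)

    # 2) keep the num_speakers clusters with the highest internal similarity
    def purity(c):
        members = segment_embeddings[labels == c]
        return cosine_similarity(members).mean() if len(members) > 1 else 0.0
    kept = sorted(range(k), key=purity, reverse=True)[:num_speakers]

    # 3) "speaker classification model": centroid of each kept cluster
    centroids = np.stack([segment_embeddings[labels == c].mean(axis=0) for c in kept])

    # 4) align every segment to its closest speaker centroid
    return cosine_similarity(segment_embeddings, centroids).argmax(axis=1)
```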
  • Patent number: 12113934
    Abstract: A computer-implemented method is provided for quantitative performance evaluation of a call agent. The method comprises converting an audio recording of a call between the call agent and a customer to a text-based transcript and identifying at least one topic for categorizing the transcript. The method also includes retrieving a set of criteria associated with the topic. Each criterion correlates to a set of predefined questions for interrogating the transcript to evaluate the performance of the call agent with respect to the corresponding criterion. Each question captures a sub-criterion under the corresponding criterion. The method further includes inputting the predefined questions and the transcript into a trained large language model to obtain scores for respective ones of the predefined questions. Each score measures a degree of satisfaction of the performance of the call agent during the call with respect to the sub-criterion captured by the corresponding predefined question.
    Type: Grant
    Filed: April 4, 2024
    Date of Patent: October 8, 2024
    Assignee: FMR LLC
    Inventors: Bryan Dempsey, Saquib Ilahi, Jenson Joy, Nirupam Sarkar, Murad Maayah, Abigail Parker, Meagan Gilbert, Derek Kaschl
  • Patent number: 12106055
    Abstract: A chatbot system is configured to execute code to perform determining, by the chatbot system, a classification result for an utterance and one or more anchors, each anchor of the one or more anchors corresponding to one or more anchor words of the utterance. For each anchor of the one or more anchors, one or more synthetic utterances are generated, and one or more classification results for the one or more synthetic utterances are determined. A report is generated by the chatbot system comprising a representation of a particular anchor of the one or more anchors, the particular anchor corresponding to a highest confidence value among the one or more anchors. The one or more synthetic utterances may be used to generate a new training dataset for training a machine-learning model. The training dataset may be refined according to a threshold confidence value to filter out datasets for training.
    Type: Grant
    Filed: August 20, 2021
    Date of Patent: October 1, 2024
    Assignee: Oracle International Corporation
    Inventors: Gautam Singaraju, Vishal Vishnoi, Manish Parekh, Alexander Wang
  • Patent number: 12106758
    Abstract: Systems and methods described herein relate to determining whether to incorporate recognized text, that corresponds to a spoken utterance of a user of a client device, into a transcription displayed at the client device, or to cause an assistant command, that is associated with the transcription and that is based on the recognized text, to be performed by an automated assistant implemented by the client device. The spoken utterance is received during a dictation session between the user and the automated assistant. Implementations can process, using automatic speech recognition model(s), audio data that captures the spoken utterance to generate the recognized text. Further, implementations can determine whether to incorporate the recognized text into the transcription or cause the assistant command to be performed based on touch input being directed to the transcription, a state of the transcription, and/or audio-based characteristic(s) of the spoken utterance.
    Type: Grant
    Filed: May 17, 2021
    Date of Patent: October 1, 2024
    Assignee: GOOGLE LLC
    Inventors: Victor Carbune, Alvin Abdagic, Behshad Behzadi, Jacopo Sannazzaro Natta, Julia Proskurnia, Krzysztof Andrzej Goj, Srikanth Pandiri, Viesturs Zarins, Nicolo D'Ercole, Zaheed Sabur, Luv Kothari
  • Patent number: 12106777
    Abstract: Embodiments of the present disclosure provide an audio processing method and an electronic device. The method includes: first obtaining text information corresponding to a to-be-processed audio, where the text information includes a to-be-processed text and a playback period corresponding to each field in the to-be-processed text; then receiving a first input on the to-be-processed text; in response to the first input, determining, as a to-be-processed field, a field indicated by the first input in the to-be-processed text; then receiving a second input on the to-be-processed field; obtaining a target audio segment in response to the second input; and finally modifying an audio segment at a playback period corresponding to the to-be-processed field according to the target audio segment, to obtain a target audio.
    Type: Grant
    Filed: September 8, 2022
    Date of Patent: October 1, 2024
    Assignee: VIVO MOBILE COMMUNICATION CO., LTD.
    Inventor: Jixiang Hu
  • Patent number: 12101199
    Abstract: A conference system is described that associates a first device and a second device to the same user, compares a first input from the first device and a second input from the second device, and modifies a setting of a conference session. The first input and the second input may be a video input or an audio input. The modification may include, for example, noise removal, determination of the user's AV feed device, or removing a background image.
    Type: Grant
    Filed: July 21, 2023
    Date of Patent: September 24, 2024
    Assignee: Capital One Services, LLC
    Inventors: Lee Adcock, Mehulkumar Jayantilal Garnara, Vamsi Kavuri
  • Patent number: 12101279
    Abstract: Systems and methods that offer significant improvements to current virtual agent (VA) conversational experiences are disclosed. The proposed systems and methods are configured to manage conversations in real-time with human customers while accommodating a dynamic goal. The VA includes a goal-driven module with a reinforcement learning-based dialogue manager. The VA is an interactive tool that utilizes both task-specific rewards and sentiment-based rewards to respond to a dynamic goal. The VA is capable of handling dynamic goals with a significantly high success rate. As the system is trained primarily with a user simulator, it can be readily extended for applications across other domains.
    Type: Grant
    Filed: August 27, 2021
    Date of Patent: September 24, 2024
    Assignee: ACCENTURE GLOBAL SOLUTIONS LIMITED
    Inventors: Shubhashis Sengupta, Anutosh Maitra, Roshni Ramesh Ramnani, Sriparna Saha, Abhisek Tiwari, Pushpak Bhattacharyya
  • Patent number: 12100399
    Abstract: Methods, apparatus, systems, and computer-readable media are provided for isolating at least one device, from multiple devices in an environment, for being responsive to assistant invocations (e.g., spoken assistant invocations). A process for isolating a device can be initialized in response to a single instance of a spoken utterance, of a user, that is detected by multiple devices. One or more of the multiple devices can be caused to query the user regarding identifying a device to be isolated for receiving subsequent commands. The user can identify the device to be isolated by, for example, describing a unique identifier for the device. Unique identifiers can be generated by each device of the multiple devices and/or by a remote server device. The unique identifiers can be presented graphically and/or audibly to the user, and the device to be isolated can be identified through user interface input. Any device that is not identified can become temporarily unresponsive to certain commands, such as spoken invocation commands.
    Type: Grant
    Filed: August 28, 2023
    Date of Patent: September 24, 2024
    Assignee: GOOGLE LLC
    Inventors: Vikram Aggarwal, Moises Morgenstern Gali
  • Patent number: 12094464
    Abstract: An utterance analysis device including: a storage that stores a plurality of pieces of related information, each relating to one of a plurality of categories; a control circuit that receives utterance data of an utterer in time-series order and analyzes content of the utterance data by using a plurality of first likelihoods, each of which is a value for identifying a possibility that the acquired utterance data corresponds to each category; and a display processor that displays, under control of the control circuit, display data including link information indicating an association for displaying, from the storage, related information relating to the category of the utterance data.
    Type: Grant
    Filed: December 17, 2021
    Date of Patent: September 17, 2024
    Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
    Inventor: Natsuki Saeki
  • Patent number: 12087286
    Abstract: A computing system obtains features that have been extracted from an acoustic signal, where the acoustic signal comprises spoken words uttered by a user. The computing system performs automatic speech recognition (ASR) based upon the features and a language model (LM) generated based upon expanded pattern data. The expanded pattern data includes a name of an entity and a search term, where the entity belongs to a segment identified in a knowledge base. The search term has been included in queries for entities belonging to the segment. The computing system identifies a sequence of words corresponding to the features based upon results of the ASR. The computing system transmits computer-readable text to a search engine, where the text includes the sequence of words.
    Type: Grant
    Filed: May 6, 2021
    Date of Patent: September 10, 2024
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Ankur Gupta, Satarupa Guha, Rupeshkumar Rasiklal Mehta, Issac John Alphonso, Anastasios Anastasakos, Shuangyu Chang
  • Patent number: 12086558
    Abstract: A method and system for automated voice casting compares candidate voice samples from candidate speakers in a target language with a primary voice sample from a primary speaker in a primary language. Utterances in the audio samples of the candidate speakers and the primary speaker are identified and typed, and voice samples are generated that meet applicable utterance type criteria. A neural network is used to generate an embedding for the voice samples. A voice sample can include groups of different utterance types, and embeddings are generated for each utterance group in the voice sample and then combined in a weighted form, wherein the resulting embedding emphasizes selected utterance types. Similarities between embeddings for the candidate voice samples relative to the primary voice sample are evaluated and used to select a candidate speaker that is a vocal match.
    Type: Grant
    Filed: March 9, 2021
    Date of Patent: September 10, 2024
    Assignee: Warner Bros. Entertainment Inc.
    Inventors: Aansh Malik, Ha Thanh Nguyen
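
A sketch of the weighted utterance-type embedding comparison described above; the embedding source, the utterance-type names, and the weights are assumptions for illustration.

```python
import numpy as np

def combined_embedding(per_type_embeddings: dict[str, np.ndarray],
                       weights: dict[str, float]) -> np.ndarray:
    """Weighted average of embeddings grouped by utterance type (e.g. 'shouted', 'whispered')."""
    total = sum(weights.get(t, 0.0) for t in per_type_embeddings)
    mix = sum(weights.get(t, 0.0) * e for t, e in per_type_embeddings.items()) / max(total, 1e-9)
    return mix / (np.linalg.norm(mix) + 1e-9)

def rank_candidates(primary: dict[str, np.ndarray],
                    candidates: dict[str, dict[str, np.ndarray]],
                    weights: dict[str, float]) -> list[tuple[str, float]]:
    """Rank candidate speakers by cosine similarity of their combined embedding to the primary's."""
    p = combined_embedding(primary, weights)
    scored = [(name, float(np.dot(p, combined_embedding(embs, weights))))
              for name, embs in candidates.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```

Raising the weight of, say, shouted utterances biases the ranking toward candidates who sound most like the primary speaker when shouting.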
  • Patent number: 12087292
    Abstract: Various embodiments of the teachings herein include methods and systems for providing a speech-based service for the control of room control elements in buildings. Speech instructions are received by means of an audio device. The audio device is configured to analyze the received speech instructions, to convert them into corresponding operating commands for room control elements for the control of, in particular, HVAC devices (e.g. field devices) in a building and to pass them on to the corresponding room control elements. Before the receipt of the speech instructions by the audio device, the identity of the sender (user) of the speech instructions is anonymized by means of an anonymization service.
    Type: Grant
    Filed: January 15, 2019
    Date of Patent: September 10, 2024
    Assignee: SIEMENS SCHWEIZ AG
    Inventors: Kai Rohrbacher, Oliver Zechlin
  • Patent number: 12079270
    Abstract: A system comprising a client computer, a data store comprising a content management repository, a server computer coupled to the client computer by a network, the server computer comprising code for: receiving audio data; converting the audio data to text; extracting a specified string from the text as an extracted string; determining an extracted string attribute for the extracted string; storing a media file containing the audio data as a content object; configuring the content object to be searchable by the extracted string; receiving a search query from the client application; searching a plurality of managed objects based on the search query; and based on determining that the extracted string matches a search string, returning an indication of the first media file, the extracted string and the extracted string attribute in a search result.
    Type: Grant
    Filed: December 9, 2019
    Date of Patent: September 3, 2024
    Assignee: OPEN TEXT HOLDINGS, INC.
    Inventors: Gajendra Babu Bandhu, Sharath Babu Pulumati