Speech To Text Patents (Class 704/235)
-
Patent number: 12165638
Abstract: A method includes receiving audio data corresponding to an utterance spoken by a user and processing, using a first recognition model, the audio data to generate a non-contextual candidate hypothesis as output from the first recognition model. The non-contextual candidate hypothesis has a corresponding likelihood score assigned by the first recognition model. The method also includes generating, using a second recognition model configured to receive personal context information, a contextual candidate hypothesis that includes a personal named entity. The method also includes scoring, based on the personal context information and the corresponding likelihood score assigned to the non-contextual candidate hypothesis, the contextual candidate hypothesis relative to the non-contextual candidate hypothesis.
Type: Grant
Filed: April 14, 2022
Date of Patent: December 10, 2024
Assignee: Google LLC
Inventors: Leonid Aleksandrovich Velikovich, Petar Stanisa Aleksic
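The rescoring idea in the abstract above can be illustrated with a minimal sketch: a contextual hypothesis is compared against the first-pass hypothesis's likelihood score after being boosted for personal named entities it contains. The scores, boost weight, and entity matching rule here are illustrative assumptions, not details from the patent.

```python
# Hypothetical sketch of contextual-vs-non-contextual rescoring.
# Scores and the boost weight are made-up values for illustration.

def rescore(non_contextual, contextual, personal_entities, boost=2.0):
    """Return the hypothesis text with the higher adjusted score.

    non_contextual / contextual: (text, likelihood_score) pairs.
    personal_entities: names drawn from the user's personal context.
    """
    base_text, base_score = non_contextual
    ctx_text, ctx_score = contextual
    # Boost the contextual hypothesis once per personal entity it contains,
    # then compare against the first-pass likelihood score.
    bonus = sum(boost for e in personal_entities if e.lower() in ctx_text.lower())
    return ctx_text if ctx_score + bonus > base_score else base_text

best = rescore(
    non_contextual=("call jane doe", 5.0),
    contextual=("call Jain Dough", 4.0),
    personal_entities=["Jain Dough"],  # e.g., a contact name from the user's phone
)
```

With the entity boost, the lower-likelihood contextual hypothesis wins; with no matching entity, the first-pass hypothesis is kept.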
-
Patent number: 12167107
Abstract: In one embodiment, a computer-implemented method for editing navigation of a content item is disclosed. The method may include presenting, via a user interface at a client computing device, time-synchronized text pertaining to the content item; receiving an input of a tag for the time-synchronized text of the content item; storing the tag associated with the time-synchronized text of the content item; and responsive to receiving a request to play the content item: playing the content item via a media player presented in the user interface, and concurrently presenting the time-synchronized text and the tag as a graphical user element in the user interface.
Type: Grant
Filed: June 17, 2021
Date of Patent: December 10, 2024
Assignee: Musixmatch S. P. A.
Inventors: Marco Paglia, Paolo Spazzini, Pierpaolo Di Panfilo, Nicolae-Daniel Dima, Emanuele Cantalini, Christian Zanin
-
Patent number: 12165630
Abstract: A method of training a speech model includes receiving, at a voice-enabled device, a fixed set of training utterances where each training utterance in the fixed set of training utterances includes a transcription paired with a speech representation of the corresponding training utterance. The method also includes sampling noisy audio data from an environment of the voice-enabled device. For each training utterance in the fixed set of training utterances, the method further includes augmenting, using the noisy audio data sampled from the environment of the voice-enabled device, the speech representation of the corresponding training utterance to generate noisy audio samples and pairing each of the noisy audio samples with the corresponding transcription of the corresponding training utterance. The method additionally includes training a speech model on the noisy audio samples generated for each speech representation in the fixed set of training utterances.
Type: Grant
Filed: July 21, 2023
Date of Patent: December 10, 2024
Assignee: Google LLC
Inventors: Matthew Sharifi, Victor Carbune
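The augmentation step described above, mixing environment noise into a fixed set of utterances while keeping each transcription paired, can be sketched as follows. The additive mixing at a fixed gain and the raw-sample representation are assumptions for illustration; the patent does not specify a mixing rule.

```python
import random

# Illustrative sketch: augment fixed (samples, transcript) training pairs
# with a random slice of noise sampled from the device's environment.

def augment(utterances, noise, gain=0.3, seed=0):
    """utterances: list of (samples, transcript) pairs; noise: list of samples.

    Returns noisy (samples, transcript) pairs; each noisy sample keeps the
    transcription of the clean utterance it was generated from.
    """
    rng = random.Random(seed)
    augmented = []
    for samples, transcript in utterances:
        # Pick a random window of environment noise the same length as the utterance.
        start = rng.randrange(len(noise) - len(samples) + 1)
        window = noise[start:start + len(samples)]
        noisy = [s + gain * n for s, n in zip(samples, window)]
        augmented.append((noisy, transcript))  # transcript pairing preserved
    return augmented

pairs = augment([([0.1, 0.2, 0.3], "turn on the lights")], noise=[0.5] * 8)
```

A real pipeline would operate on model-specific speech representations rather than raw floats, but the pairing logic is the same.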
-
Patent number: 12165081
Abstract: Various embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing predictive data analysis operations. For example, certain embodiments of the present invention utilize systems, methods, and computer program products that perform predictive data analysis operations by generating a predicted eligibility score for a predictive entity using a cross-feature-type eligibility prediction machine learning framework.
Type: Grant
Filed: March 10, 2022
Date of Patent: December 10, 2024
Assignee: Optum Services (Ireland) Limited
Inventors: Riccardo Mattivi, Venkata Krishnan Mittinamalli Thandapani, Conor Breen, Peter Cogan
-
Patent number: 12164859
Abstract: Methods for generating a categorized, ranked, condensed summary of a transcript of a conversation, involving obtaining a diarized version of the transcript of the conversation, storing textual monologues from the transcript, determining classifications as to the textual monologues based on a classifier algorithm, associating the classifications with the textual monologues, creating textually-modified rephrasings of the textual monologues based on text and classification thereof, storing the textually-modified rephrasings, aggregating the textually-modified rephrasings based on associated clustering and scoring, and transmitting summary information pertaining to the aggregated textually-modified rephrasings to a user device.
Type: Grant
Filed: June 1, 2022
Date of Patent: December 10, 2024
Assignee: GONG.IO LTD
Inventors: Shlomi Medalion, Inbal Horev, Raz Nussbaum, Omri Allouche, Raquel Sitman, Ortal Ashkenazi
-
Patent number: 12154560
Abstract: A voice controlled apparatus for performing a workflow operation is described. The voice controlled apparatus can include a microphone, a speaker, and a processor. In some examples, the voice controlled apparatus can generate, via the speaker, a voice prompt associated with a task of a workflow and identify, via the microphone, a voice response received from a worker. In this regard, the voice prompt and the voice response can be a part of a voice dialogue. Further, the processor of the voice controlled apparatus can identify a performance status associated with the execution of the task, before providing a next voice prompt subsequent to the voice prompt. In this aspect, the performance status can be identified based on analyzing the voice dialogue using a machine learning model. Furthermore, the voice controlled apparatus can generate a message including a suggestion to improve the performance status of the task.
Type: Grant
Filed: September 25, 2020
Date of Patent: November 26, 2024
Assignee: VOCOLLECT, INC.
Inventors: Ranganathan Srinivasan, Rekula Dinesh, Kaushik Hazra, Duff H. Gold
-
Patent number: 12153872
Abstract: The present disclosure relates to systems and methods for automatically linking a note to a transcript of a conference. According to one of the embodiments, a computer-implemented method is provided. The method comprises: receiving a transcript of a conference and a note from a conference participant; responsive to receiving the transcript of the conference and the note, applying natural language processing on a content of the note and on a content of the transcript; identifying a matching content between the content of the note and the content of the transcript; generating a link corresponding to the matching content; and causing to display the link corresponding to the matching content.
Type: Grant
Filed: September 20, 2021
Date of Patent: November 26, 2024
Assignee: RingCentral, Inc.
Inventor: Vlad Vendrow
-
Patent number: 12153613
Abstract: A computer-implemented method for presenting relevant information to a customer service representative of a business may include receiving a digitized data stream corresponding to a spoken conversation between a customer and a representative; converting the data stream to a text stream; determining one or more keywords from the text stream; comparing the one or more keywords with a history of keywords that have previously been searched; and/or searching a database for information related to the one or more keywords that have not been previously searched. As a result of the keyword search, information about topics that the customer is interested in may be located and displayed on a customer service representative display to facilitate the customer service representative timely relaying the information found by the keyword search to enhance the customer experience. Exemplary keywords may relate to insurance and financial services, such as "auto," "home," "life," "insurance," or "vehicle loan."
Type: Grant
Filed: May 21, 2020
Date of Patent: November 26, 2024
Assignee: State Farm Mutual Automobile Insurance Company
Inventor: Sylvia Hernandez
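The history check above, searching only keywords that have not already been looked up during the conversation, reduces to a simple set-membership filter. The in-memory "database" and the sample keywords below are placeholders, not details from the patent.

```python
# Minimal sketch of the keyword-history filter: only keywords that have
# not already been searched trigger a new database lookup.

def search_new_keywords(keywords, history, database):
    """Return lookup results for keywords not yet in history, updating history."""
    results = {}
    for kw in keywords:
        if kw in history:
            continue  # already searched earlier in this conversation
        history.add(kw)
        results[kw] = database.get(kw, [])
    return results

db = {"auto": ["auto policy overview"], "home": ["homeowners coverage"]}
history = {"auto"}                         # "auto" was searched earlier
hits = search_new_keywords(["auto", "home"], history, db)
```

Here only "home" produces a fresh lookup; "auto" is skipped because it is already in the history.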
-
Patent number: 12147477
Abstract: Disclosed is a search result display device (1) comprising: a regard prediction unit (12) configured to predict, from a dialogue between a customer and a service person, a regard of the customer; a keyword extraction unit (17) configured to extract a keyword from the regard; and a display controller (13) configured to cause a display (14) to display the dialogue and a search result obtained from the database (21) with the keyword as a search query, wherein when a string has been designated by the service person, the display controller (13) causes the display (14) to display a search result obtained from the database (21) using a search query that incorporates the string, until a search result automatic update instruction is given by the service person.
Type: Grant
Filed: August 14, 2019
Date of Patent: November 19, 2024
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Yoshiaki Noda, Setsuo Yamada, Takaaki Hasegawa
-
Patent number: 12148422
Abstract: Disclosed is an electronic apparatus. The electronic apparatus obtains a first character string comprising a previously defined character from a first user utterance; recognizes a second character string, which is edited from the first character string based on a first editing command, as an input character, based on the first user utterance comprising the first editing command following the first character string; and performs editing of the second character string based on a second editing command, based on a second user utterance comprising the second editing command without the first editing command.
Type: Grant
Filed: December 22, 2020
Date of Patent: November 19, 2024
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jihun Park, Dongheon Seok
-
Patent number: 12149793
Abstract: An example display device may include a voice signal receiver, a display, at least one memory storing an application supporting a contents providing service and storing instructions, a communication circuit communicating with at least one external server supporting the contents providing service, and at least one processor. The contents providing service may provide contents files of a first type and contents files of a second type.
Type: Grant
Filed: June 16, 2023
Date of Patent: November 19, 2024
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jibum Moon, Gyungchan Seol, Kyerim Lee
-
Patent number: 12148423
Abstract: The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
Type: Grant
Filed: June 7, 2021
Date of Patent: November 19, 2024
Assignee: Google LLC
Inventors: Michael J. Lebeau, William J. Byrne, John Nicholas Jitkoff, Brandon M. Ballinger, Trausti T. Kristjansson
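The lattice-based correction flow above can be sketched with a deliberately simplified lattice: a list of word positions, each holding recognizer candidates in ranked order. Real word lattices are graphs with scores and timings; the flat position-indexed structure here is an assumption for illustration.

```python
# Hedged sketch of lattice-based word correction: present the best path,
# offer alternates for a selected word, then substitute the user's choice.

def best_path(lattice):
    """First (top-ranked) candidate at each position."""
    return [candidates[0] for candidates in lattice]

def alternates(lattice, position):
    """Remaining candidates at the selected position."""
    return lattice[position][1:]

def replace_word(transcript, position, new_word):
    fixed = list(transcript)
    fixed[position] = new_word
    return fixed

lattice = [["recognize"], ["speech", "beach"], ["today"]]
transcript = best_path(lattice)
options = alternates(lattice, 1)          # user tapped the second word
corrected = replace_word(transcript, 1, options[0])
```

The user only ever picks from candidates the recognizer already produced, which is what makes one-tap correction practical.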
-
Patent number: 12141529
Abstract: Natural language-based question answering systems and techniques are generally described. In some examples, a natural language processing system may receive first natural language data. The natural language processing system may determine first slot data included in the first natural language data. A set of content items associated with the first slot data may be determined. A first machine learning model may use the first natural language data to generate prediction data associated with a first attribute among a list of attributes of the set of content items. In some examples, a first value associated with the first attribute for a first content item of the set of content items may be determined. Second natural language data may be generated based at least in part on the first value. The second natural language data may include a response to the first natural language data.
Type: Grant
Filed: March 22, 2022
Date of Patent: November 12, 2024
Assignee: AMAZON TECHNOLOGIES, INC.
Inventors: Yuval Nezri, Eilon Sheetrit, Lital Kuchy, Avihai Mejer
-
Patent number: 12141672
Abstract: An example method includes receiving, by a computational assistant executing at one or more processors, a representation of an utterance spoken at a computing device; identifying, based on the utterance, a task to be performed by the computational assistant; responsive to determining, by the computational assistant, that complete performance of the task will take more than a threshold amount of time, outputting, for playback by one or more speakers operably connected to the computing device, synthesized voice data that informs a user of the computing device that complete performance of the task will not be immediate; and performing, by the computational assistant, the task.
Type: Grant
Filed: September 13, 2023
Date of Patent: November 12, 2024
Assignee: GOOGLE LLC
Inventors: Yariv Adan, Vladimir Vuskovic, Behshad Behzadi
-
Patent number: 12138452
Abstract: A noninvasive ergonomic self-use device including a plurality of electrodes and a processor in electrical communication with the electrodes, the processor being configured to switch two or more of the electrodes between at least an ECG mode of operation in which the electrodes receive user body signals and an EPG mode in which the electrodes generate electrical pulses for stimulating the abdominal muscles of the user.
Type: Grant
Filed: October 10, 2019
Date of Patent: November 12, 2024
Assignee: GerdCare Medical Ltd.
Inventors: Giora Arbel, Mordechay Esh
-
Patent number: 12141043
Abstract: An utterance test method for an utterance device, an utterance test server, an utterance test system, and a program perform an utterance test on a test device (20). The utterance test system includes at least one utterance device (20) capable of uttering, a terminal device (30), and an utterance test server (10). The utterance test server (10) receives an utterance test start command from the terminal device (30), sets at least one utterance device (20) to be a test device (20) as a target of an utterance test, sets test content of the utterance test, and causes the test device (20) to utter the test content.
Type: Grant
Filed: July 14, 2021
Date of Patent: November 12, 2024
Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
Inventors: Hiroki Urabe, Kentaro Nakai, Satoru Matsunaga, Yoshiki Ohashi
-
Patent number: 12141712
Abstract: This disclosure relates generally to a method and system for extracting contextual information from a knowledge base. The method receives a user query comprising a request to extract contextual information from the user query. Further, the user query is analyzed based on a plurality of predefined parameters to determine sufficiency of information comprised in the user query. The received user query identifies relevant sources of the structured data, the unstructured data or the semi-structured data storage repositories. The user query is processed using a fine grain approach, where a dictionary of one or more keywords with weights is created through the domain ontology builder from the one or more knowledge articles. Furthermore, appropriate contextual information related to the user query is extracted using the fine grain approach, based on the knowledge articles associated with the trained knowledge base comprising information required by the user query extracted from the knowledge articles.
Type: Grant
Filed: December 16, 2020
Date of Patent: November 12, 2024
Assignee: Tata Consultancy Services Limited
Inventors: Sanjeev Manchanda, Ajeet Phansalkar, Mahesh Kshirsagar, Kamlesh Pandurang Mhashilkar, Nihar Ranjan Sahoo, Sonam Yashpal Sharma
-
Patent number: 12135938
Abstract: Systems, methods, apparatuses, and computer program products for natural language processing are provided. One method may include utilizing a trained machine learning model to learn syntax dependency patterns and parts of speech tag patterns of text based on labeled training data. The method may also include contextualizing vector embeddings from a language model for each word in the text, and extracting relationships for a given fragment of the text based on the contextualization. The method may further include resolving relationships between identified verbs based on a plurality of heuristics to identify the syntax dependency patterns, identifying nested relationships, and capturing metadata associated with the nested relationships.
Type: Grant
Filed: May 11, 2022
Date of Patent: November 5, 2024
Assignee: CORASCLOUD, INC.
Inventors: Ajay Patel, Alex Sands
-
Patent number: 12136414
Abstract: Audio signals representing a current utterance in a conversation and a dialog history including at least information associated with past utterances corresponding to the current utterance in the conversation can be received. The dialog history can be encoded into an embedding. A spoken language understanding neural network model can be trained to perform a spoken language understanding task based on input features including at least speech features associated with the received audio signals and the embedding. An encoder can also be trained to encode a given dialog history into an embedding. The spoken language understanding task can include predicting a dialog action of an utterance. The spoken language understanding task can include predicting a dialog intent or overall topic of the conversation.
Type: Grant
Filed: August 18, 2021
Date of Patent: November 5, 2024
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Jatin Ganhotra, Hong-Kwang Kuo, Sachindra Joshi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
-
Patent number: 12136425
Abstract: A method to transcribe communications includes the steps of obtaining a plurality of hypothesis transcriptions of a voice signal generated by a speech recognition system, determining consistent words that are included in at least first and second of the plurality of hypothesis transcriptions, in response to determining the consistent words, providing the consistent words to a device for presentation of the consistent words to an assisted user, and presenting the consistent words via a display screen on the device, wherein a rate of the presentation of the words on the display screen is variable.
Type: Grant
Filed: May 8, 2023
Date of Patent: November 5, 2024
Assignee: Ultratec, Inc.
Inventors: Robert M. Engelke, Kevin R. Colwell, Christopher Engelke
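The "consistent words" determination above, keeping only words that appear in at least two hypothesis transcriptions, can be sketched as a per-hypothesis word count. Whitespace tokenization and case-folding are assumptions here; the patent does not specify either.

```python
from collections import Counter

# Sketch: a word is treated as consistent once it appears in at least
# `min_count` of the hypothesis transcriptions.

def consistent_words(hypotheses, min_count=2):
    counts = Counter()
    for hyp in hypotheses:
        counts.update(set(hyp.lower().split()))  # count each word once per hypothesis
    # Preserve the word order of the first hypothesis for presentation.
    first = hypotheses[0].lower().split()
    return [w for w in first if counts[w] >= min_count]

words = consistent_words([
    "please call me back",
    "please fall me back",
    "please call me hack",
])
```

The one-off recognition errors ("fall", "hack") each occur in a single hypothesis and are filtered out, while the agreed-upon words survive.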
-
Patent number: 12136423
Abstract: Introduced here are computer programs and associated computer-implemented techniques for facilitating the creation of a master transcription (or simply "transcript") that more accurately reflects underlying audio by comparing multiple independently generated transcripts. The master transcript may be used to record and/or produce various forms of media content, as further discussed below. Thus, the technology described herein may be used to facilitate editing of text content, audio content, or video content. These computer programs may be supported by a media production platform that is able to generate the interfaces through which individuals (also referred to as "users") can create, edit, or view media content. For example, a computer program may be embodied as a word processor that allows individuals to edit voice-based audio content by editing a master transcript, and vice versa.
Type: Grant
Filed: December 18, 2020
Date of Patent: November 5, 2024
Assignee: Descript, Inc.
Inventors: Kundan Kumar, Vicki Anand
-
Patent number: 12136426
Abstract: A method to transcribe communications includes the steps of obtaining a plurality of hypothesis transcriptions of a voice signal generated by a speech recognition system, determining consistent words that are included in at least first and second of the plurality of hypothesis transcriptions, in response to determining the consistent words, providing the consistent words to a device for presentation of the consistent words to an assisted user, and presenting the consistent words via a display screen on the device, wherein a rate of the presentation of the words on the display screen is variable.
Type: Grant
Filed: December 19, 2023
Date of Patent: November 5, 2024
Assignee: Ultratec, Inc.
Inventors: Robert M. Engelke, Kevin R. Colwell, Christopher Engelke
-
Patent number: 12136416
Abstract: In one embodiment, a method includes accessing a decoded hypothesis corresponding to an utterance, computing a predicted probability of observing each token in the decoded hypothesis by having a local first machine-learning model process the decoded hypothesis, computing a confidence score for each token in the decoded hypothesis by having a second machine-learning model process the decoded hypothesis, where the confidence score indicates a degree of confidence for the token to be observed at its position, calculating a loss for the computed predicted probabilities of observing tokens in the decoded hypothesis based on the computed confidence scores, and updating parameters of the local first machine-learning model based on the calculated loss.
Type: Grant
Filed: July 5, 2022
Date of Patent: November 5, 2024
Assignee: Meta Platforms, Inc.
Inventors: Zhe Liu, Ke Li, Fuchun Peng
-
Patent number: 12137301
Abstract: One embodiment provides a method, the method including: detecting, during a meeting comprising at least one participant remote to a user identified as a presenter, communication data and visual data provided by the user to the at least one participant; determining, utilizing a meeting discrepancy system, that content of the visual data does not match content of the communication data; and providing, to the user and utilizing a meeting discrepancy system, a notification indicating the content of the visual data does not match the content of the communication data. Other aspects are claimed and described.
Type: Grant
Filed: August 17, 2022
Date of Patent: November 5, 2024
Assignee: Lenovo (Singapore) Pte. Ltd.
Inventors: Joshua Smith, Matthew Fardig, Travis Ennis, Richard Downey
-
Patent number: 12130848
Abstract: A virtual assistant server receives conversational inputs as part of a conversation from a customer device and generates a version of a data record of the conversation upon: the receiving of each of the conversational inputs, or receiving each output generated by one of a plurality of software modules when the one of the software modules receives a system input from the conversation management framework. The virtual assistant server provides each of the generated versions of the data record to a communication orchestrator and receives, for each of the generated versions of the data record, execution instructions from the communication orchestrator. Further, the virtual assistant server communicates with one or more of the software modules based on the received execution instructions, and provides, based on the communicating, one or more of the outputs of the software modules to the customer device, when the one or more of the outputs comprise one or more responses to one or more of the conversational inputs.
Type: Grant
Filed: August 4, 2023
Date of Patent: October 29, 2024
Assignee: Kore.ai, Inc.
Inventors: Rajkumar Koneru, Prasanna Kumar Arikala Gunalan, Thirupathi Bandam, Jayesh Arunkumar Jain, Vishnu Vardhan Sai Lanka
-
Patent number: 12130857
Abstract: Method for editorializing digital audiovisual or audio recording content of an oral presentation given by a speaker using a presentation support enriched with tags and recorded in the form of a digital audiovisual file. This method comprises written transcription of the oral presentation with indication of a time code for each word, comparative automatic analysis of this written transcription and of the tagged presentation support, transposition of the time codes from the written transcription to the tagged presentation support, identification of the tags and of the time codes of the presentation support, and marking of the digital audiovisual file with the tags and time codes, so as to generate an enriched digital audiovisual file.
Type: Grant
Filed: September 18, 2020
Date of Patent: October 29, 2024
Assignee: VERNSTHER
Inventor: Jennifer Dahan
-
Patent number: 12125479
Abstract: A system for providing a sociolinguistic virtual assistant includes a communication device, a processing device, and a storage device. The processing device being configured to: process input data using a natural language processing algorithm; categorize the semantic data based on psych-sociological categorizations associated with the at least one user; analyze the command from the at least one user to identify a task associated with the command; generate a response based on identification of the task associated with the command; and execute the task associated with the command using categorized semantic data, to derive a result. A method corresponding to the system is also provided.
Type: Grant
Filed: February 8, 2022
Date of Patent: October 22, 2024
Assignee: Seam Social Labs Inc
Inventors: Tiasia O'Brien, Marisa Jean Dinko
-
Patent number: 12125473
Abstract: Embodiments of this disclosure disclose a speech recognition method, apparatus, and device, and a storage medium. The method in the embodiments of this disclosure includes: adjusting a probability of a relationship between at least one pair of elements in a language recognition model according to a probability of the relationship between the at least one pair of elements in a textual segment; inputting a to-be-recognized speech into a speech recognition model including the language recognition model; and determining, according to the adjusted probability of the relationship between the at least one pair of elements in the language recognition model, a sequence of elements corresponding to the to-be-recognized speech as a speech recognition result.
Type: Grant
Filed: March 4, 2021
Date of Patent: October 22, 2024
Assignee: Tencent Technology (Shenzhen) Company Limited
Inventor: Tao Li
-
Patent number: 12125488
Abstract: An electronic apparatus is provided. The electronic apparatus includes a microphone, a memory storing at least one instruction, and a processor connected to the microphone and the memory configured to control the electronic apparatus, and the processor, by executing the at least one instruction, may, based on receiving a user voice signal through the microphone, obtain a text corresponding to the user voice signal, identify a plurality of sentences included in the obtained text, identify a domain corresponding to each of the plurality of sentences among a plurality of domains, based on a similarity of a first sentence and a second sentence, among the plurality of sentences, having a same domain being greater than or equal to a threshold value, obtain a third sentence in which the first sentence and the second sentence are combined by using a first neural network model, and perform natural language understanding for the third sentence.
Type: Grant
Filed: November 10, 2021
Date of Patent: October 22, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jiyoun Hong, Hyeonmok Ko, Dayoung Kwon, Jonggu Kim, Seoha Song, Kyenghun Lee, Hojung Lee, Saebom Jang, Pureum Jung, Changho Paeon
-
Patent number: 12124498
Abstract: A time code to byte conversion system is provided herein that maps time codes to byte ranges such that a user device can retrieve a portion of, but not all of, a media file by specifying a time range. For example, the time code to byte conversion system can play a media file and identify the byte at which each time code begins. The time code to byte conversion system can then store the byte to time code mapping in an index accessible by a media retrieval server. A user device can then provide a time range to the media retrieval server, the media retrieval server can query the index to identify the range of bytes that corresponds to the provided time range, and then the media retrieval server can retrieve the identified range of bytes from a media database for transmission to the user device.
Type: Grant
Filed: January 9, 2020
Date of Patent: October 22, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Jeremiah Dunham, Andrew Tunall, Benjamin Schwartz, Jason LaPier, Justin Abrahms
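The index lookup described above, turning a requested time range into a byte range, amounts to a sorted search over (time, byte offset) pairs. The sample offsets below are made up, and the snap-to-preceding-entry behavior is an assumption about how such an index would be queried.

```python
import bisect

# Illustrative index mapping time codes (seconds) to byte offsets.

class TimeToByteIndex:
    def __init__(self, entries):
        # entries: (time_seconds, byte_offset) pairs, sorted by time
        self.times = [t for t, _ in entries]
        self.offsets = [b for _, b in entries]

    def byte_range(self, start, end):
        """Return (start_byte, end_byte) covering the [start, end] time range.

        The start snaps back to the last indexed time code at or before
        `start`; `end_byte` is None when the range runs to end of file.
        """
        lo = bisect.bisect_right(self.times, start) - 1
        hi = bisect.bisect_right(self.times, end)
        start_byte = self.offsets[max(lo, 0)]
        end_byte = self.offsets[hi] if hi < len(self.offsets) else None
        return start_byte, end_byte

index = TimeToByteIndex([(0, 0), (10, 40960), (20, 81920), (30, 122880)])
span = index.byte_range(12, 25)   # bytes covering seconds 12 through 25
```

A media retrieval server could turn the returned pair directly into an HTTP `Range` request against the stored media file.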
-
Patent number: 12118272
Abstract: Systems and methods to accept speech input and edit a note upon receipt of an indication to edit are disclosed. Exemplary implementations may: effectuate presentation of a graphical user interface that includes a note, the note including note sections, the note sections including a first note section, the individual note sections including body fields; obtain user input from the client computing platform, the user input representing an indication to edit a first body field of the first note section; obtain audio information representing sound captured by an audio section of the client computing platform, the audio information including value definition information specifying one or more values to be included in the individual body fields; perform speech recognition on the audio information to obtain a first value; and populate the first body field with the first value so that the first value is included in the first body field.
Type: Grant
Filed: November 18, 2022
Date of Patent: October 15, 2024
Assignee: Suki AI, Inc.
Inventor: Matt Pallakoff
-
Patent number: 12118514
Abstract: Systems and methods to generate records within a collaboration environment are described herein. Exemplary implementations may perform one or more of: manage environment state information maintaining a collaboration environment; effectuate presentation of a user interface through which users upload digital assets representing recorded audio and/or video content; obtain input information defining the digital assets input via the user interface; generate transcription information characterizing the recorded audio and/or video content of the digital assets; provide the transcription information as input into a trained machine-learning model; obtain the output from the trained machine-learning model, the output defining one or more new records based on the transcripts; and/or other operations.
Type: Grant
Filed: February 17, 2022
Date of Patent: October 15, 2024
Assignee: Asana, Inc.
Inventor: Steve B Morin
-
Patent number: 12118266
Abstract: Media content can be created and/or modified using a network-accessible platform. Scripts for content-based experiences could be readily created using one or more interfaces generated by the network-accessible platform. For example, a script for a content-based experience could be created using an interface that permits triggers to be inserted directly into the script. Interface(s) may also allow different media formats to be easily aligned for post-processing. For example, a transcript and an audio file may be dynamically aligned so that the network-accessible platform can globally reflect changes made to either item. User feedback may also be presented directly on the interface(s) so that modifications can be made based on actual user experiences.
Type: Grant
Filed: February 25, 2022
Date of Patent: October 15, 2024
Assignee: Descript, Inc.
Inventors: Steven Surmacz Rubin, Ulf Schwekendiek, David John Williams
-
Patent number: 12112138
Abstract: Embodiments provide a software framework for evaluating and troubleshooting real-world task-oriented bot systems. Specifically, the evaluation framework includes a generator that infers dialog acts and entities from bot definitions and generates test cases for the system via model-based paraphrasing. The framework may also include a simulator for task-oriented dialog user simulation that supports both regression testing and end-to-end evaluation. The framework may also include a remediator to analyze and visualize the simulation results, remedy some of the identified issues, and provide actionable suggestions for improving the task-oriented dialog system.
Type: Grant
Filed: June 2, 2022
Date of Patent: October 8, 2024
Assignee: Salesforce, Inc.
Inventors: Guangsen Wang, Samson Min Rong Tan, Shafiq Rayhan Joty, Gang Wu, Chu Hong Hoi, Ka Chun Au
-
Patent number: 12112742
Abstract: Provided are an electronic device for correcting a speech input, and an operating method thereof. The method may include receiving a first speech signal; obtaining first text; obtaining an intent of the first speech signal and a confidence score of the intent, by inputting the first text to a natural language understanding model; identifying a plurality of correction candidate semantic elements capable of being correction targets in the first text; receiving a second speech signal; obtaining second text; identifying whether the second speech signal is a speech signal for correcting the first text; comparing the plurality of correction candidate semantic elements in the first text with a semantic element in the second text, based on the confidence score; and correcting at least one of the plurality of correction candidate semantic elements in the first text.
Type: Grant
Filed: November 29, 2021
Date of Patent: October 8, 2024
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jongsun Lee, Jongyoub Ryu, Seonghan Ryu, Eunji Lee, Jaechul Yang, Hyungtak Choi
-
Patent number: 12112743
Abstract: A speech recognition method includes: obtaining speech data; performing feature extraction on the speech data, to obtain speech features of at least two speech segments; inputting the speech features of the at least two speech segments into the speech recognition model, and processing the speech features of the speech segments by using cascaded hidden layers in the speech recognition model, to obtain hidden layer features of the speech segments, a hidden layer feature of an ith speech segment being determined based on speech features of n speech segments located after the ith speech segment in a time sequence and a speech feature of the ith speech segment; and obtaining text information corresponding to the speech data based on the hidden layer features of the speech segments.
Type: Grant
Filed: March 30, 2022
Date of Patent: October 8, 2024
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Xilin Zhang, Bo Liu
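The distinguishing detail in this abstract — the hidden feature of the ith segment depends on that segment plus the n segments that follow it in time — can be caricatured with a simple lookahead mean-pool. This is a toy stand-in for the patent's learned cascaded hidden layers; all names here are invented:

```python
import numpy as np

def lookahead_hidden_features(segment_feats: np.ndarray, n: int) -> np.ndarray:
    """For each segment i, pool its own feature vector with those of the n
    segments that follow it in time (fewer near the end of the stream).
    A mean-pool stands in for the learned hidden-layer computation."""
    num_segments = len(segment_feats)
    out = np.empty_like(segment_feats)
    for i in range(num_segments):
        # window covers segment i and up to n future segments
        window = segment_feats[i : min(i + n + 1, num_segments)]
        out[i] = window.mean(axis=0)
    return out
```

With `n = 1`, each segment's feature is averaged with the one segment after it; the last segment, having no lookahead context, keeps its own feature.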
-
Patent number: 12112750
Abstract: A method for estimating a user's location in an environment may involve receiving output signals from each microphone of a plurality of microphones in the environment. At least two microphones of the plurality of microphones may be included in separate devices at separate locations in the environment and the output signals may correspond to a current utterance of a user. The method may involve determining multiple current acoustic features from the output signals of each microphone and applying a classifier to the multiple current acoustic features. Applying the classifier may involve applying a model trained on previously-determined acoustic features derived from a plurality of previous utterances made by the user in a plurality of user zones in the environment. The method may involve determining, based at least in part on output from the classifier, an estimate of the user zone in which the user is currently located.
Type: Grant
Filed: July 28, 2020
Date of Patent: October 8, 2024
Assignee: Dolby Laboratories Licensing Corporation
Inventors: Mark R. P. Thomas, Richard J. Cartwright
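The classifier step can be sketched with a nearest-centroid model over per-utterance acoustic features. The patent does not disclose the classifier family; the zone names and feature values below are made up for illustration:

```python
import numpy as np

def train_zone_centroids(features, zones):
    """features: (num_utterances, num_feats) acoustic features pooled across
    microphones; zones: per-utterance user-zone labels from an enrollment
    phase. Returns one centroid per zone."""
    zones = np.array(zones)
    return {z: features[zones == z].mean(axis=0) for z in set(zones.tolist())}

def estimate_zone(centroids, current_feats):
    """Estimate the user zone whose training centroid is closest to the
    features of the current utterance."""
    return min(centroids, key=lambda z: np.linalg.norm(centroids[z] - current_feats))
```

A real system would use a trained discriminative model; the nearest-centroid rule simply illustrates mapping current acoustic features to the most likely user zone.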
-
Patent number: 12112759
Abstract: A method and apparatus for speaker diarization with early-stop clustering may include: segmenting an audio stream into at least one speech segment (710), the audio stream comprising speeches from at least one speaker; clustering the at least one speech segment into a plurality of clusters (720), the number of the plurality of clusters being greater than the number of the at least one speaker; selecting, from the plurality of clusters, at least one cluster of the highest similarity (730), the number of the selected at least one cluster being equal to the number of the at least one speaker; establishing a speaker classification model based on the selected at least one cluster (740); and aligning, through the speaker classification model, speech frames in the audio stream to the at least one speaker (750).
Type: Grant
Filed: March 29, 2019
Date of Patent: October 8, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Liping Chen, Kao-Ping Soong
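The early-stop idea — over-cluster the segments into more clusters than speakers, keep only the clusters of highest internal similarity as speaker models, then align every segment to those models — can be sketched as below. The toy k-means, the cosine similarity score, and the singleton-exclusion rule are my own choices, not the patented algorithm:

```python
import numpy as np

def kmeans(x, k, iters=20):
    """Tiny k-means with deterministic init (first k points as centers)."""
    centers = x[:k].astype(float)
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return centers, labels

def cluster_similarity(members, centroid):
    """Mean cosine similarity between a cluster's members and its centroid."""
    sims = (members @ centroid) / (np.linalg.norm(members, axis=1) * np.linalg.norm(centroid))
    return sims.mean()

def diarize(segment_embs, num_speakers, k):
    """Over-cluster into k > num_speakers clusters, keep the num_speakers
    clusters of highest similarity (singletons excluded) as speaker models,
    then align every segment to its nearest speaker model."""
    segment_embs = np.asarray(segment_embs, dtype=float)
    centers, labels = kmeans(segment_embs, k)
    scores = []
    for j in range(k):
        members = segment_embs[labels == j]
        scores.append(-np.inf if len(members) < 2
                      else cluster_similarity(members, centers[j]))
    models = centers[np.argsort(scores)[-num_speakers:]]
    return np.argmin(((segment_embs[:, None] - models[None]) ** 2).sum(-1), axis=1)
```

The patent operates on speech frames and a trained speaker classification model; this sketch collapses both to segment embeddings and centroid matching to keep the structure visible.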
-
Patent number: 12113934
Abstract: A computer-implemented method is provided for quantitative performance evaluation of a call agent. The method comprises converting an audio recording of a call between the call agent and a customer to a text-based transcript and identifying at least one topic for categorizing the transcript. The method also includes retrieving a set of criteria associated with the topic. Each criterion correlates to a set of predefined questions for interrogating the transcript to evaluate the performance of the call agent with respect to the corresponding criterion. Each question captures a sub-criterion under the corresponding criterion. The method further includes inputting the predefined questions and the transcript into a trained large language model to obtain scores for respective ones of the predefined questions. Each score measures a degree of satisfaction of the performance of the call agent during the call with respect to the sub-criterion captured by the corresponding predefined question.
Type: Grant
Filed: April 4, 2024
Date of Patent: October 8, 2024
Assignee: FMR LLC
Inventors: Bryan Dempsey, Saquib Ilahi, Jenson Joy, Nirupam Sarkar, Murad Maayah, Abigail Parker, Meagan Gilbert, Derek Kaschl
-
Patent number: 12106055
Abstract: A chatbot system is configured to execute code to perform determining, by the chatbot system, a classification result for an utterance and one or more anchors, each anchor of the one or more anchors corresponding to one or more anchor words of the utterance. For each anchor of the one or more anchors, one or more synthetic utterances are generated, and one or more classification results for the one or more synthetic utterances are determined. A report is generated by the chatbot system comprising a representation of a particular anchor of the one or more anchors, the particular anchor corresponding to a highest confidence value among the one or more anchors. The one or more synthetic utterances may be used to generate a new training dataset for training a machine-learning model. The training dataset may be refined according to threshold confidence values to filter out datasets for training.
Type: Grant
Filed: August 20, 2021
Date of Patent: October 1, 2024
Assignee: Oracle International Corporation
Inventors: Gautam Singaraju, Vishal Vishnoi, Manish Parekh, Alexander Wang
-
Patent number: 12106758
Abstract: Systems and methods described herein relate to determining whether to incorporate recognized text, that corresponds to a spoken utterance of a user of a client device, into a transcription displayed at the client device, or to cause an assistant command, that is associated with the transcription and that is based on the recognized text, to be performed by an automated assistant implemented by the client device. The spoken utterance is received during a dictation session between the user and the automated assistant. Implementations can process, using automatic speech recognition model(s), audio data that captures the spoken utterance to generate the recognized text. Further, implementations can determine whether to incorporate the recognized text into the transcription or cause the assistant command to be performed based on touch input being directed to the transcription, a state of the transcription, and/or audio-based characteristic(s) of the spoken utterance.
Type: Grant
Filed: May 17, 2021
Date of Patent: October 1, 2024
Assignee: GOOGLE LLC
Inventors: Victor Carbune, Alvin Abdagic, Behshad Behzadi, Jacopo Sannazzaro Natta, Julia Proskurnia, Krzysztof Andrzej Goj, Srikanth Pandiri, Viesturs Zarins, Nicolo D'Ercole, Zaheed Sabur, Luv Kothari
-
Patent number: 12106777
Abstract: Embodiments of the present disclosure provide an audio processing method and an electronic device. The method includes: first obtaining text information corresponding to a to-be-processed audio, where the text information includes a to-be-processed text and a playback period corresponding to each field in the to-be-processed text; then receiving a first input on the to-be-processed text; in response to the first input, determining, as a to-be-processed field, a field indicated by the first input in the to-be-processed text; then receiving a second input on the to-be-processed field; obtaining a target audio segment in response to the second input; and finally modifying an audio segment at a playback period corresponding to the to-be-processed field according to the target audio segment, to obtain a target audio.
Type: Grant
Filed: September 8, 2022
Date of Patent: October 1, 2024
Assignee: VIVO MOBILE COMMUNICATION CO., LTD.
Inventor: Jixiang Hu
-
Patent number: 12101199
Abstract: A conference system is described that associates a first device and a second device to the same user, compares a first input from the first device and a second input from the second device, and modifies a setting of a conference session. The first input and the second input may be a video input or an audio input. The modification may include, for example, noise removal, determination of the user's AV feed device, or removing a background image.
Type: Grant
Filed: July 21, 2023
Date of Patent: September 24, 2024
Assignee: Capital One Services, LLC
Inventors: Lee Adcock, Mehulkumar Jayantilal Garnara, Vamsi Kavuri
-
Patent number: 12101279
Abstract: Systems and methods that offer significant improvements to current virtual agent (VA) conversational experiences are disclosed. The proposed systems and methods are configured to manage conversations in real-time with human customers while accommodating a dynamic goal. The VA includes a goal-driven module with a reinforcement learning-based dialogue manager. The VA is an interactive tool that utilizes both task-specific rewards and sentiment-based rewards to respond to a dynamic goal. The VA is capable of handling dynamic goals with a significantly high success rate. As the system is trained primarily with a user simulator, it can be readily extended for applications across other domains.
Type: Grant
Filed: August 27, 2021
Date of Patent: September 24, 2024
Assignee: ACCENTURE GLOBAL SOLUTIONS LIMITED
Inventors: Shubhashis Sengupta, Anutosh Maitra, Roshni Ramesh Ramnani, Sriparna Saha, Abhisek Tiwari, Pushpak Bhattacharyya
-
Patent number: 12100399
Abstract: Methods, apparatus, systems, and computer-readable media are provided for isolating at least one device, from multiple devices in an environment, for being responsive to assistant invocations (e.g., spoken assistant invocations). A process for isolating a device can be initialized in response to a single instance of a spoken utterance, of a user, that is detected by multiple devices. One or more of the multiple devices can be caused to query the user regarding identifying a device to be isolated for receiving subsequent commands. The user can identify the device to be isolated by, for example, describing a unique identifier for the device. Unique identifiers can be generated by each device of the multiple devices and/or by a remote server device. The unique identifiers can be presented graphically and/or audibly to the user, who can respond via user interface input. Any device that is not identified can become temporarily unresponsive to certain commands, such as spoken invocation commands.
Type: Grant
Filed: August 28, 2023
Date of Patent: September 24, 2024
Assignee: GOOGLE LLC
Inventors: Vikram Aggarwal, Moises Morgenstern Gali
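The identifier flow — each candidate device is assigned a unique spoken label, and the user's reply selects the device to isolate — might look like the sketch below. The label words and device names are invented for illustration; the patent does not specify how identifiers are generated:

```python
import random

def assign_identifiers(device_ids, seed=0):
    """Give each responding device a distinct, easy-to-say identifier that
    can be presented graphically and/or audibly to the user."""
    words = ["red", "blue", "green", "gold", "violet", "amber"]
    rng = random.Random(seed)  # deterministic for this sketch
    return dict(zip(device_ids, rng.sample(words, len(device_ids))))

def isolate(assignments, user_reply):
    """Return the device whose identifier the user named; every other device
    would then become temporarily unresponsive to spoken invocations."""
    for device, word in assignments.items():
        if word in user_reply.lower():
            return device
    return None
```

In a real deployment the identifiers could equally be generated by a remote server device, as the abstract notes.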
-
Patent number: 12094464
Abstract: An utterance analysis device including: a storage that stores a plurality of pieces of related information each relating to one of a plurality of categories; a control circuit that receives utterance data of an utterer in order of time series, and analyzes content of the utterance data by using a plurality of first likelihoods, which are each values for identifying a possibility that the acquired utterance data corresponds to each category; and a display processor that displays, under control of the control circuit, display data including link information indicating an association for displaying related information relating to the category of the utterance data from the storage.
Type: Grant
Filed: December 17, 2021
Date of Patent: September 17, 2024
Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
Inventor: Natsuki Saeki
-
Patent number: 12087286
Abstract: A computing system obtains features that have been extracted from an acoustic signal, where the acoustic signal comprises spoken words uttered by a user. The computing system performs automatic speech recognition (ASR) based upon the features and a language model (LM) generated based upon expanded pattern data. The expanded pattern data includes a name of an entity and a search term, where the entity belongs to a segment identified in a knowledge base. The search term has been included in queries for entities belonging to the segment. The computing system identifies a sequence of words corresponding to the features based upon results of the ASR. The computing system transmits computer-readable text to a search engine, where the text includes the sequence of words.
Type: Grant
Filed: May 6, 2021
Date of Patent: September 10, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Ankur Gupta, Satarupa Guha, Rupeshkumar Rasiklal Mehta, Issac John Alphonso, Anastasios Anastasakos, Shuangyu Chang
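The "expanded pattern data" — crossing entity names from a knowledge-base segment with search terms observed in queries for that segment — can be sketched as LM training-line generation followed by simple n-gram counting. The templates, entity, and search term below are mine, not from the patent:

```python
from collections import Counter
from itertools import product

def expand_patterns(entities, search_terms,
                    templates=("{term} {entity}", "{entity} {term}")):
    """Cross segment entity names with search terms seen for that segment,
    under a few word-order templates, to produce LM training lines."""
    return [t.format(term=term, entity=ent)
            for ent, term, t in product(entities, search_terms, templates)]

def bigram_counts(lines):
    """Toy stand-in for LM estimation: count word bigrams over the
    expanded pattern lines."""
    counts = Counter()
    for line in lines:
        words = line.split()
        counts.update(zip(words, words[1:]))
    return counts
```

A production LM would be estimated with proper smoothing or a neural model; the counts merely show how the expanded patterns bias recognition toward entity-plus-search-term word sequences.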
-
Patent number: 12086558
Abstract: A method and system for automated voice casting compares candidate voice samples from candidate speakers in a target language with a primary voice sample from a primary speaker in a primary language. Utterances in the audio samples of the candidate speakers and the primary speaker are identified and typed, and voice samples are generated that meet applicable utterance type criteria. A neural network is used to generate an embedding for the voice samples. A voice sample can include groups of different utterance types and embeddings generated for each utterance group in the voice sample and then combined in a weighted form wherein the resulting embedding emphasizes selected utterance types. Similarities between embeddings for the candidate voice samples relative to the primary voice sample are evaluated and used to select a candidate speaker that is a vocal match.
Type: Grant
Filed: March 9, 2021
Date of Patent: September 10, 2024
Assignee: Warner Bros. Entertainment Inc.
Inventors: Aansh Malik, Ha Thanh Nguyen
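The weighted combination of per-utterance-type embeddings, followed by a similarity comparison against candidate speakers, might be sketched as below. The embedding values, type names, and weights are illustrative; the patent's embeddings come from a neural network:

```python
import numpy as np

def combined_embedding(type_embeddings, type_weights):
    """Weighted combination of per-utterance-type embeddings so that
    selected utterance types dominate the resulting embedding."""
    total = sum(type_weights.values())
    return sum((w / total) * np.asarray(type_embeddings[t])
               for t, w in type_weights.items())

def best_match(primary_emb, candidate_embs):
    """Select the candidate whose embedding has the highest cosine
    similarity to the primary speaker's embedding."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(candidate_embs,
               key=lambda name: cos(primary_emb, candidate_embs[name]))
```

Weighting, say, shouted utterances more heavily than neutral speech makes the match reflect how candidates sound in the utterance types that matter for the role.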
-
Patent number: 12087292
Abstract: Various embodiments of the teachings herein include methods and systems for providing a speech-based service for the control of room control elements in buildings. Speech instructions are received by means of an audio device. The audio device is configured to analyze the received speech instructions, to convert them into corresponding operating commands for room control elements for the control of, in particular, HVAC devices (e.g. field devices) in a building and to pass them on to the corresponding room control elements. Before the receipt of the speech instructions by the audio device, the identity of the sender (user) of the speech instructions is anonymized by means of an anonymization service.
Type: Grant
Filed: January 15, 2019
Date of Patent: September 10, 2024
Assignee: SIEMENS SCHWEIZ AG
Inventors: Kai Rohrbacher, Oliver Zechlin
-
Patent number: 12079270
Abstract: A system comprising a client computer, a data store comprising a content management repository, a server computer coupled to the client computer by a network, the server computer comprising code for: receiving audio data; converting the audio data to text; extracting a specified string from the text as an extracted string; determining an extracted string attribute for the extracted string; storing a media file containing the audio data as a content object; configuring the content object to be searchable by the extracted string; receiving a search query from the client application; searching a plurality of managed objects based on the search query; and based on determining that the extracted string matches a search string, returning an indication of the media file, the extracted string and the extracted string attribute in a search result.
Type: Grant
Filed: December 9, 2019
Date of Patent: September 3, 2024
Assignee: OPEN TEXT HOLDINGS, INC.
Inventors: Gajendra Babu Bandhu, Sharath Babu Pulumati
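The extract-then-search pipeline — pull a specified string from the transcribed text, store it as a searchable attribute of the media object, then match it against search queries — can be sketched as below. The invoice-number pattern and file names are made-up examples, not from the patent:

```python
import re

def index_media(transcripts, pattern=r"\bINV-\d+\b"):
    """Map each media file to the specified strings extracted from its
    transcript (here: invoice-number-like tokens, a hypothetical pattern).
    The extracted strings become searchable attributes of the content object."""
    return {name: re.findall(pattern, text) for name, text in transcripts.items()}

def search(index, query):
    """Return the media files whose extracted strings match the search string."""
    return [name for name, strings in index.items() if query in strings]
```

This makes audio content findable by text search without the caller ever reading the full transcript: only the extracted string and its attribute need to be stored alongside the media object.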