Patents Examined by Paras D Shah
  • Patent number: 12217750
    Abstract: Input context for a statistical dialog manager may be provided. Upon receiving a spoken query from a user, the query may be categorized according to at least one context clue. The spoken query may then be converted to text according to a statistical dialog manager associated with the category of the query and a response to the spoken query may be provided to the user.
    Type: Grant
    Filed: January 21, 2022
    Date of Patent: February 4, 2025
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Michael Bodell, John Bain, Robert Chambers, Karen M. Cross, Michael Kim, Nick Gedge, Daniel Frederick Penn, Kunal Patel, Edward Mark Tecot, Jeremy C. Waltmunson
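The categorize-then-transcribe flow in the abstract can be sketched as a simple router. This is a hypothetical illustration, not the patented implementation; the category names and model identifiers are invented for the example.

```python
# Hypothetical sketch: pick a category-specific dialog model from
# context clues, then route the spoken query to that model.

CONTEXT_MODELS = {
    "navigation": "nav_dialog_model",
    "media": "media_dialog_model",
    "general": "general_dialog_model",
}

def categorize(context_clues):
    """Pick a category from context clues (e.g. active app, location)."""
    for clue in context_clues:
        if clue in CONTEXT_MODELS:
            return clue
    return "general"

def route_query(context_clues):
    """Return the dialog model name that should transcribe the query."""
    return CONTEXT_MODELS[categorize(context_clues)]

print(route_query(["media", "driving"]))   # media_dialog_model
print(route_query(["unknown_clue"]))       # general_dialog_model
```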
  • Patent number: 12217016
    Abstract: An electronic apparatus, including a microphone; a memory configured to store at least one instruction; and a processor configured to: acquire a first token corresponding to a first user voice input in a first language acquired through the microphone, acquire a first text in a second language by inputting the first token into a first neural network model, acquire a feature value corresponding to a predicted subsequent token, which is predicted to be uttered after the first token, by inputting the first text into a second neural network model, and based on a second token being acquired subsequent to the first token, acquire a second text in the second language by inputting the first token, the second token, the first text, and the feature value into the first neural network model.
    Type: Grant
    Filed: May 17, 2022
    Date of Patent: February 4, 2025
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Beomseok Lee, Sathish Indurthi, Mohd Abbas Zaidi, Nikhil Kumar
  • Patent number: 12217756
    Abstract: This disclosure relates generally to systems, methods, and computer readable media for providing improved insights and annotations to enhance recorded audio, video, and/or written transcriptions of testimony. For example, in some embodiments, a method is disclosed for correlating non-verbal cues recognized from an audio and/or video recording of testimony to the corresponding testimony transcript locations. In other embodiments, a method is disclosed for providing testimony-specific artificial intelligence-based insights and annotations to a testimony transcript, e.g., based on the use of machine learning, natural language processing, and/or other techniques. In still other embodiments, a method is disclosed for providing smart citations to a testimony transcript, e.g., which track the location of semantic constructs within the transcript over the course of various modifications being made to the transcript.
    Type: Grant
    Filed: September 2, 2021
    Date of Patent: February 4, 2025
    Assignee: AUDAX PRIVATE DEBT LLC
    Inventors: Robert Ackerman, Anthony J. Vaglica, Holli Goldman, Amber Hickman, Walter Barrett, Cameron Turner, Shawn Rutledge
  • Patent number: 12216996
    Abstract: Embodiments are provided for generating a reasonable language model learning for text data in a knowledge graph in a computing system by a processor. One or more data sources and one or more triples may be analyzed from a knowledge graph. Training data having one or more candidate labels associated with one or more of the triples may be generated. One or more reasonable language models may be trained based on the training data.
    Type: Grant
    Filed: November 2, 2021
    Date of Patent: February 4, 2025
    Assignee: International Business Machines Corporation
    Inventors: Thanh Lam Hoang, Dzung Tien Phan, Gabriele Picco, Lam Minh Nguyen, Vanessa Lopez Garcia
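The step of turning knowledge-graph triples into labeled training data might look like the following verbalization sketch. The format of the triples and labels is an assumption for illustration only.

```python
# Hypothetical sketch: verbalize (subject, relation, object) triples
# into (text, candidate-label) pairs that could seed model training.

def triples_to_training_data(triples):
    """Turn each triple into a sentence labeled with its relation."""
    examples = []
    for subj, rel, obj in triples:
        text = f"{subj} {rel.replace('_', ' ')} {obj}."
        examples.append({"text": text, "candidate_label": rel})
    return examples

data = triples_to_training_data([("Dublin", "capital_of", "Ireland")])
print(data[0]["text"])  # Dublin capital of Ireland.
```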
  • Patent number: 12219154
Abstract: In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least components of: a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines; where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and the accuracy of automatic speech transcription and gen…
    Type: Grant
    Filed: November 30, 2022
    Date of Patent: February 4, 2025
    Assignee: VOXSMART LIMITED
    Inventors: Tejas Shastry, Matthew Goldey, Svyat Vergun
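The routing-and-acceptance idea in this abstract can be sketched as follows. Everything here (the spec format, the confidence threshold, the fallback policy) is an assumption made for illustration, not the claimed method.

```python
# Hypothetical sketch: send each audio segment to one engine chosen by
# a model spec; accept its hypothesis when the confidence clears a
# threshold, avoiding a call to any other engine.

def pick_engine(segment, spec):
    """Choose an engine name by a segment attribute (e.g. domain)."""
    return spec.get(segment["domain"], spec["default"])

def transcribe(segments, spec, engines, threshold=0.8):
    """engines: name -> callable(segment) -> (hypothesis, confidence)."""
    out = []
    for seg in segments:
        name = pick_engine(seg, spec)
        hyp, conf = engines[name](seg)
        if conf >= threshold:              # accept: skip other engines
            out.append(hyp)
        else:                              # fall back to default engine
            hyp, _ = engines[spec["default"]](seg)
            out.append(hyp)
    return out
```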
  • Patent number: 12217741
    Abstract: A method for implementing a privacy-preserving automatic speech recognition system using federated learning. The method includes receiving, from respective client devices, at a cloud server, local acoustic model weights for a neural network-based acoustic model of a local automatic speech recognition system running on the respective client devices, wherein the local acoustic model weights are generated at the respective client devices without labelled data, updating a global automatic speech recognition system based on (a) the local acoustic model weights received from the respective client devices and (b) global acoustic model weights of the global automatic speech recognition system derived from labelled data to obtain an updated global automatic speech recognition system, and sending the updated global automatic speech recognition system to the respective client devices to operate as a new local automatic speech recognition system.
    Type: Grant
    Filed: May 19, 2021
    Date of Patent: February 4, 2025
    Assignee: CISCO TECHNOLOGY, INC.
    Inventors: Sylvain Le Groux, Erwan Barry Tarik Zerhouni
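The server-side update described here blends client weights with the server's labelled-data weights. A minimal FedAvg-style sketch, with the blending fraction and flat weight vectors as illustrative assumptions:

```python
# Hypothetical sketch: average client acoustic-model weights, then
# blend with the global model's labelled-data weights.

def fedavg(client_weights, global_weights, global_frac=0.5):
    """Blend averaged client weights with the global model's weights.

    client_weights: list of per-client weight vectors (lists of floats)
    global_weights: server weights trained on labelled data
    """
    n = len(client_weights)
    client_avg = [sum(ws) / n for ws in zip(*client_weights)]
    return [
        global_frac * g + (1 - global_frac) * c
        for g, c in zip(global_weights, client_avg)
    ]

print(fedavg([[0.0, 2.0], [2.0, 4.0]], [1.0, 1.0]))  # [1.0, 2.0]
```

The updated vector would then be sent back to every client as its new local model, closing the federated loop.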
  • Patent number: 12210826
    Abstract: A method of presenting prompt information by utilizing a neural network which includes a BERT model and a graph convolutional neural network (GCN), comprising: generating a first vector based on a combination of an entity, a context of the entity, a type of the entity and a part of speech of the context by using BERT model; generating a second vector based on each of predefined concepts by using BERT model; generating a third vector based on a graph which is generated based on the concepts and relationships thereamong, by using GCN; generating a fourth vector by concatenating the second and third vectors; calculating semantic similarity between the entity and each concept based on the first and fourth vectors; determining, based on the first vector and the semantic similarity, that the entity corresponds to one of the concepts; and generating the prompt information based on the determined concept.
    Type: Grant
    Filed: March 16, 2022
    Date of Patent: January 28, 2025
    Assignee: FUJITSU LIMITED
    Inventors: Yiling Cao, Zhongguang Zheng, Jun Sun
  • Patent number: 12211491
    Abstract: One or more computer processors obtain an initial subnetwork at a target sparsity and an initial pruning mask from a pre-trained self-supervised learning (SSL) speech model. The one or more computer processors finetune the initial subnetwork, comprising: the one or more computer processors zero out one or more masked weights in the initial subnetwork specified by the initial pruning mask; the one or more computer processors train a new subnetwork from the zeroed out subnetwork; the one or more computer processors prune one or more weights of lowest magnitude in the new subnetwork regardless of network structure to satisfy the target sparsity. The one or more computer processors classify an audio segment with the finetuned subnetwork.
    Type: Grant
    Filed: May 9, 2022
    Date of Patent: January 28, 2025
    Assignee: International Business Machines Corporation
    Inventors: Cheng-I Lai, Yang Zhang, Kaizhi Qian, Chuang Gan, James R. Glass, Alexander Haojan Liu
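The pruning step — dropping the lowest-magnitude weights regardless of network structure to hit a target sparsity — can be sketched on a flat weight list. This toy version ignores the mask bookkeeping and training loop of the actual method.

```python
# Hypothetical sketch: unstructured magnitude pruning to a target
# sparsity, zeroing the smallest-|w| fraction of weights.

def prune_to_sparsity(weights, sparsity):
    """Zero out the lowest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)          # number of weights to drop
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]

print(prune_to_sparsity([0.5, -0.1, 2.0, 0.05], 0.5))  # [0.5, 0.0, 2.0, 0.0]
```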
  • Patent number: 12204867
    Abstract: Provided is a computer-implemented method, system, and computer program product for process mining asynchronous support conversations using attributed directly follows graphing. A processor may collect a plurality of conversation threads from an asynchronous data stream. The processor may label each utterance of a plurality of utterances from the plurality of conversation threads with an event label. The processor may analyze the event label for each utterance of the plurality of utterances. The processor may generate, based on the analyzing of the event label for each utterance, an attributed directly follows graph (DFG).
    Type: Grant
    Filed: March 22, 2022
    Date of Patent: January 21, 2025
    Assignee: International Business Machines Corporation
    Inventors: Sampath Dechu, Monika Gupta, Prerna Agarwal, Renuka Sindhgatta Rajan, Naveen Eravimangalath Purushothaman
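A directly-follows graph counts how often one event immediately follows another. A minimal sketch of that core construction, with the event labels invented for the example:

```python
# Hypothetical sketch: build a directly-follows graph (DFG) by counting
# immediate label-to-label transitions within each conversation thread.

from collections import Counter

def build_dfg(threads):
    """threads: list of event-label sequences; returns edge counts."""
    edges = Counter()
    for labels in threads:
        for a, b in zip(labels, labels[1:]):
            edges[(a, b)] += 1
    return edges

dfg = build_dfg([["greet", "ask", "resolve"], ["greet", "ask", "escalate"]])
print(dfg[("greet", "ask")])  # 2
```

The patented method additionally attributes the graph (hence "attributed DFG"); attaching per-edge metadata to these counts would be a natural extension of this structure.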
  • Patent number: 12198682
    Abstract: An example system includes a processor to receive a summary of a conversation to be generated. The processor can input the summary into a trained summary-grounded conversation generator. The processor can receive a generated conversation from the trained summary-grounded conversation generator.
    Type: Grant
    Filed: September 13, 2021
    Date of Patent: January 14, 2025
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Chulaka Gunasekara, Guy Feigenblat, Benjamin Sznajder, Sachindra Joshi
  • Patent number: 12198703
    Abstract: An audio signal encoding method and device are provided. The method and device are used to encode an audio signal to obtain a bitstream representing the analog audio signal, in which a proper bit allocation for spectral coefficients can be performed.
    Type: Grant
    Filed: February 16, 2022
    Date of Patent: January 14, 2025
    Assignee: Top Quality Telephony, LLC
    Inventors: Zexin Liu, Bin Wang, Lei Miao
  • Patent number: 12190893
    Abstract: A system for registering an individual's biometric data and then later verifying the identity of the individual using the previously registered biometric data makes use of either an audio communications channel or a messaging channel, both of which are accessed via an application programming interface (API). In some instances, spoken audio input is received from the individual and the spoken audio input is used to generate a voice print for the individual. In other instances, the biometric data could be image-based, such as facial images or an image of an individual's iris.
    Type: Grant
    Filed: June 11, 2021
    Date of Patent: January 7, 2025
    Assignee: Vonage Business Inc.
    Inventors: Mark Berkeland, Angel Esteban Garcia
  • Patent number: 12183344
    Abstract: Systems, apparatuses, methods, and computer program products are disclosed for predicting an entity and intent based on captured speech. An example method includes capturing speech and converting the speech to text. The example method further includes causing generation of one or more entities and one or more intents based on the speech and the text. The example method further includes determining a next action based on each of the one or more entities and each of the one or more intents.
    Type: Grant
    Filed: November 24, 2021
    Date of Patent: December 31, 2024
    Assignee: Wells Fargo Bank, N.A.
    Inventors: Vinothkumar Venkataraman, Rahul Ignatius, Naveen Gururaja Yeri, Paul Davis
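The final step — choosing a next action from the recognized entities and intents — can be sketched as a lookup with a fallback. The intent names, entity names, and actions below are invented for illustration.

```python
# Hypothetical sketch: map each recognized (intent, entity) pair to a
# next action, with a fallback when no mapping exists.

NEXT_ACTIONS = {
    ("check_balance", "savings"): "fetch_savings_balance",
    ("transfer", "checking"): "start_transfer_flow",
}

def next_action(intent, entity):
    return NEXT_ACTIONS.get((intent, entity), "route_to_agent")

print(next_action("transfer", "checking"))  # start_transfer_flow
print(next_action("transfer", "unknown"))   # route_to_agent
```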
  • Patent number: 12165645
Abstract: Systems, methods, and computer-readable media are disclosed for annotating content data such as video data with annotation data (e.g., images, emoji, memes, stylized text, sounds) in near real time. Example methods may include determining transcribed text from the content data, associating annotation data with the transcribed text, and annotating the content data with some or all transcribed text and annotation data. Example methods may further include editing the annotated content data to generate modified annotated content data and sending the annotated content data and/or modified annotated content data to a device.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: December 10, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Blair Harold Beebe, Peter Chin, Maksim Surguy, Chris Aaron Edmonds, Christina Siegfried, Kwan Ting Lee, Darvin Vida
  • Patent number: 12159120
    Abstract: A translation method includes acquiring an image, where the image includes a text to be translated; splitting the text to be translated in the image and acquiring a plurality of target objects, where each of the plurality of target objects includes a word or a phrase of the text to be translated; receiving an input operation for the plurality of target objects, acquiring an object to be translated among the plurality of target objects, and translating the object to be translated.
    Type: Grant
    Filed: March 19, 2021
    Date of Patent: December 3, 2024
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Shaoting Yi, Yongjia Yu
  • Patent number: 12154552
    Abstract: A natural language understanding (NLU) system generates in-place annotations for natural language utterances or other types of time-based media based on stand-off annotations. The in-place annotations are associated with particular sub-sequences of an annotation, which provides richer information than stand-off annotations, which are associated only with an utterance as a whole. To generate the in-place annotations for an utterance, the NLU system applies an encoder network and a decoder network to obtain attention weights for the various tokens within the utterance. The NLU system disqualifies tokens of the utterance based on their corresponding attention weights, and selects highest-scoring contiguous sequences of tokens between the disqualified tokens. In-place annotations are associated with the selected sequences.
    Type: Grant
    Filed: August 31, 2021
    Date of Patent: November 26, 2024
    Assignee: Interactions LLC
    Inventors: Brian David Lester, Srinivas Bangalore
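The span-selection step — disqualify tokens by attention weight, then keep the best contiguous run between disqualified tokens — can be sketched directly. The cutoff value and scoring (sum of weights) are assumptions for the example; the patent obtains the weights from an encoder-decoder network.

```python
# Hypothetical sketch: drop tokens whose attention weight falls below a
# cutoff, then select the remaining contiguous run with the highest
# total weight.

def best_span(tokens, weights, cutoff=0.1):
    runs, cur = [], []
    for tok, w in zip(tokens, weights):
        if w < cutoff:                 # disqualified token breaks the run
            if cur:
                runs.append(cur)
            cur = []
        else:
            cur.append((tok, w))
    if cur:
        runs.append(cur)
    if not runs:
        return []
    best = max(runs, key=lambda r: sum(w for _, w in r))
    return [tok for tok, _ in best]

print(best_span(["book", "a", "flight", "to", "Boston"],
                [0.3, 0.05, 0.4, 0.05, 0.5]))  # ['Boston']
```

An in-place annotation would then be attached to the returned token span rather than to the utterance as a whole.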
  • Patent number: 12148426
    Abstract: Embodiments of the disclosure generally relate to a dialog system allowing for automatically reactivating a speech acquiring mode after the dialog system delivers a response to a user request. The reactivation parameters, such as a delay, depend on a number of predetermined factors and conversation scenarios. The embodiments further provide for a method of operating of the dialog system. An exemplary method comprises the steps of: activating a speech acquiring mode, receiving a first input of a user, deactivating the speech acquiring mode, obtaining a first response associated with the first input, delivering the first response to the user, determining that a conversation mode is activated, and, based on the determination, automatically re-activating the speech acquiring mode within a first predetermined time period after delivery of the first response to the user.
    Type: Grant
    Filed: May 18, 2022
    Date of Patent: November 19, 2024
    Assignee: GOOGLE LLC
    Inventors: Ilya Gennadyevich Gelfenbeyn, Artem Goncharuk, Pavel Aleksandrovich Sirotin
  • Patent number: 12148444
    Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
    Type: Grant
    Filed: April 5, 2021
    Date of Patent: November 19, 2024
    Assignee: Google LLC
    Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
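The two-stage pipeline described here — a decoder network produces a mel-frequency spectrogram frame per time step, a vocoder network turns each frame into a distribution over audio samples, and one sample is selected per step — can be sketched with toy stand-ins for the neural networks. Both `decoder` and `vocoder` below are fabricated placeholders, not the patented models.

```python
# Hypothetical sketch of the per-time-step loop: decoder -> mel frame,
# vocoder -> sample distribution, then sample selection.

import random

def decoder(chars, t):
    """Toy stand-in: pretend mel frame derived from the characters."""
    return [(ord(c) + t) % 7 for c in chars]

def vocoder(mel_frame):
    """Toy stand-in: probability distribution over 3 sample values."""
    raw = [1, sum(mel_frame) % 5 + 1, 2]
    total = sum(raw)
    return [r / total for r in raw]

def synthesize(chars, steps, seed=0):
    rng = random.Random(seed)
    samples = []
    for t in range(steps):
        probs = vocoder(decoder(chars, t))
        samples.append(rng.choices([0, 1, 2], weights=probs)[0])
    return samples

audio = synthesize("hi", steps=4)
print(len(audio))  # 4
```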
  • Patent number: 12141542
    Abstract: Methods and servers for training a translation model for translation between a rare language from a group and a target language. The method includes acquiring an actual example of translation and using a transliteration function for generating a synthetic actual example of translation. The method includes acquiring a sentence in the target language, generating an artificial translation of that sentence using back-translation, and thereby generating a given artificial example of translation. The method includes generating a synthetic artificial example based on the given artificial example. The method includes training the translation model based on the synthetic actual example of translation and the synthetic artificial example of translation.
    Type: Grant
    Filed: December 17, 2021
    Date of Patent: November 12, 2024
    Assignee: Y.E. Hub Armenia LLC
    Inventors: Anton Aleksandrovich Dvorkovich, Roman Olegovich Peshkurov
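The back-translation step — generating an artificial rare-language side for a target-language sentence — can be sketched as follows. The reverse model here is a trivial placeholder; in the patent it is a trained target-to-rare translation model.

```python
# Hypothetical sketch: build artificial parallel examples by translating
# target-language sentences back into the rare language.

def back_translate(target_sentences, reverse_model):
    """reverse_model: target text -> synthetic rare-language text."""
    return [(reverse_model(s), s) for s in target_sentences]

# Toy stand-in for a real target->rare translator.
toy_reverse = lambda s: "[rare] " + s.lower()
pairs = back_translate(["Hello world"], toy_reverse)
print(pairs[0])  # ('[rare] hello world', 'Hello world')
```

The resulting (synthetic source, real target) pairs would then be combined with transliterated actual examples to train the final translation model.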
  • Patent number: 12131750
Abstract: A method for enhancing detection of synthetic voice data is provided that includes converting, by an electronic device, monophonic voice data into stereophonic voice data. The stereophonic voice data includes a first channel signal and a second channel signal. Moreover, the method includes decomposing, by a trained machine learning model, the stereophonic voice data into a mid-signal and a side signal. The method also includes determining, in the decomposed signals, artifacts indicative of synthetic generation, calculating, based on the determined artifacts, a probability score reflecting the likelihood the monophonic voice data was synthetically generated, and comparing the probability score against a threshold value. When the probability score satisfies the threshold value, there is a high likelihood that the monophonic voice data includes synthetic artifacts, and an alert is generated indicating the monophonic voice data is potentially fraudulent.
    Type: Grant
    Filed: May 10, 2024
    Date of Patent: October 29, 2024
    Assignee: Daon Technology
    Inventors: Raphael A. Rodriguez, Olena Mizynchuk, Davyd Mizynchuk
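The mid/side decomposition this method builds on is the standard one: the mid signal carries content shared by both channels, the side signal carries inter-channel differences. A minimal sketch on raw sample lists (the patent performs the decomposition with a trained model):

```python
# Hypothetical sketch: classic mid/side decomposition of a stereo pair,
# mid = (L + R) / 2 and side = (L - R) / 2.

def mid_side(left, right):
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

mid, side = mid_side([1.0, 0.5], [0.5, 0.5])
print(mid, side)  # [0.75, 0.5] [0.25, 0.0]
```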