Subportions Patents (Class 704/254)

Voice-to-text data processing

Patent number: 12380884

Abstract: A computing system includes a processor configured to convert a word spoken by a user into a pattern of symbols in response to an unsuccessful attempt to retrieve the word in a list. The pattern of symbols provides a visual representation of speech sounds identifying the word in the list. The pattern of symbols of the converted word is compared to a database of patterns, with the patterns in the database being in a format of symbols corresponding to the words in the list. Each pattern used in the comparison has a match value assigned thereto based on being compared to the pattern of symbols of the converted word. The processor provides the word in the list corresponding to the pattern having the match value that is indicative of a match to the converted word.

Type: Grant

Filed: December 10, 2021

Date of Patent: August 5, 2025

Inventors: Hao Wu, Taodong Lu, Yihong Wu
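The matching step described in the abstract for patent 12380884 can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the abstract does not specify the symbol set or how match values are computed, so a toy letter-to-sound-class table (`SOUND_CLASSES`), the helpers `to_pattern` and `best_match`, and a 0.7 threshold are assumptions, and the similarity ratio from the standard `difflib` module stands in for the match value.

```python
from difflib import SequenceMatcher

# Hypothetical toy table mapping letters to sound-class symbols.
# The real symbol set is not given in the abstract.
SOUND_CLASSES = {
    "b": "P", "p": "P", "f": "F", "v": "F",
    "c": "K", "k": "K", "q": "K", "g": "K",
    "d": "T", "t": "T", "s": "S", "z": "S", "x": "S",
    "m": "N", "n": "N", "l": "L", "r": "R", "j": "J",
}


def to_pattern(word):
    """Convert a word to a pattern of symbols, skipping unmapped letters and repeats."""
    symbols = []
    for ch in word.lower():
        sym = SOUND_CLASSES.get(ch)
        if sym and (not symbols or symbols[-1] != sym):
            symbols.append(sym)
    return "".join(symbols)


def best_match(spoken, words, threshold=0.7):
    """Return (word, match_value) for the closest pattern, or None if no pattern
    reaches the threshold. `words` must be non-empty."""
    target = to_pattern(spoken)
    scored = [
        (SequenceMatcher(None, target, to_pattern(w)).ratio(), w) for w in words
    ]
    value, word = max(scored)
    return (word, value) if value >= threshold else None


if __name__ == "__main__":
    print(best_match("Katherine", ["Catherine", "Robert"]))
```

In this sketch "Katherine" and "Catherine" reduce to the same pattern, so the lookup that failed on spelling succeeds on the pattern.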
Call word learning data generation device and method

Patent number: 12380878

Abstract: The present disclosure relates to a device and method for generating call word learning data, and the call word learning data generation device includes a processor and storage. The storage stores utterance data and an utterance phrase corresponding to the utterance data. The processor is configured to decompose the utterance data into phoneme units based on the utterance data and the utterance phrase, to receive a call word through a user input, to decompose the received call word into phoneme units, to compare phoneme data of the call word with phoneme data of the utterance data, and to generate call word learning data by combining phoneme data matched as the comparison result.

Type: Grant

Filed: September 8, 2023

Date of Patent: August 5, 2025

Assignees: Hyundai Motor Company, Kia Corporation

Inventors: Eui Hyeok Lee, Hyung Sik Gim, Han Woong Choi, Sung Woo Moon
Knowledge distillation using deep clustering

Patent number: 12321846

Abstract: Methods and systems for training a neural network include clustering a full set of training data samples into specialized training clusters. Specialized teacher neural networks are trained using respective specialized training clusters of the specialized training clusters. Soft labels are generated for the full set of training data samples using the specialized teacher neural networks. A student model is trained using the full set of training data samples, the specialized training clusters, and the soft labels.

Type: Grant

Filed: December 9, 2020

Date of Patent: June 3, 2025

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Takashi Fukuda
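The soft-label step in the abstract for patent 12321846 can be sketched in plain Python. This is only an assumption-laden illustration: the abstract does not say how the specialized teachers' outputs are combined or what loss the student uses, so averaging temperature-softened teacher outputs and blending soft and hard cross-entropy (the names `soft_labels`, `distillation_loss`, `alpha`, and `temperature` are hypothetical) follows the common knowledge-distillation recipe rather than the patented method.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_labels(teacher_logits, temperature=2.0):
    """Average the softened outputs of several teachers for one sample (assumed rule)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]


def distillation_loss(student_logits, soft, hard_index, alpha=0.5, temperature=2.0):
    """Blend cross-entropy against soft labels with cross-entropy against the hard label."""
    student_soft = softmax(student_logits, temperature)
    student_hard = softmax(student_logits)
    soft_ce = -sum(t * math.log(s) for t, s in zip(soft, student_soft))
    hard_ce = -math.log(student_hard[hard_index])
    return alpha * soft_ce + (1 - alpha) * hard_ce


if __name__ == "__main__":
    soft = soft_labels([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
    print(soft, distillation_loss([5.0, 0.0, 0.0], soft, hard_index=0))
```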
Methods for evaluating the pronunciation of speech

Patent number: 12315500

Abstract: In a method for quantitatively evaluating a pronunciation of a speaker, an acoustic signal is received from the speaker, which represents an utterance spoken in a language by the speaker. The acoustic signal is segmented into segments at the phone, word or phrase level. The acoustic signal segments are each transcribed by a high fidelity transcriber into high fidelity transcription segments that each comprises a sequence of phones that represents how the native or normative speakers of the language would likely perceive a playback of the acoustic signal segment. Each of the high fidelity transcription segments is compared to a baseline to generate one or more pronunciation scores that evaluate the pronunciation of the speaker. The baseline may be generated by transcribing the acoustic signal into an adapted transcription that comprises a sequence of phones that represents how native or normative speakers of the language would likely speak the utterance.

Type: Grant

Filed: February 28, 2022

Date of Patent: May 27, 2025

Assignee: GOOGLE LLC

Inventor: Jian Cheng
Information processing apparatus, information processing system, and information processing method

Patent number: 12288554

Abstract: An information processing apparatus includes an acquisition unit that acquires, from a storage unit that stores episode data of a speaker, the episode data regarding topic information included in utterance data of the speaker. The information processing apparatus further includes an interaction control unit that controls an interaction with the speaker so as to include an episode based on the episode data.

Type: Grant

Filed: January 19, 2021

Date of Patent: April 29, 2025

Assignee: SONY GROUP CORPORATION

Inventors: Hideki Noma, Katsutoshi Kanamori
Information processing apparatus, information processing system, and information processing method

Patent number: 12260859

Abstract: An information processing apparatus includes an acquisition unit that acquires, from a storage unit that stores episode data of a speaker, the episode data regarding topic information included in utterance data of the speaker. The information processing apparatus further includes an interaction control unit that controls an interaction with the speaker so as to include an episode based on the episode data.

Type: Grant

Filed: January 19, 2021

Date of Patent: March 25, 2025

Assignee: SONY GROUP CORPORATION

Inventors: Hideki Noma, Katsutoshi Kanamori
Utterance evaluation apparatus, utterance evaluation, and program

Patent number: 12249320

Abstract: A stable evaluation result is obtained from a voice of speech for any sentence. A speech evaluation device (1) outputs a score for evaluating speech of an input voice signal spoken by a speaker in a first group. A feature extraction unit (11) extracts an acoustic feature from the input voice signal. A conversion unit (12) converts the acoustic feature of the input voice signal to an acoustic feature when a speaker in a second group speaks the same text as text of the input voice signal. An evaluation unit (13) calculates a score indicating a higher evaluation as a distance between the acoustic feature before the conversion and the acoustic feature after the conversion becomes shorter.

Type: Grant

Filed: June 25, 2019

Date of Patent: March 11, 2025

Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Inventor: Sadao Hiroya
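The scoring rule in the abstract for patent 12249320 (a higher evaluation as the distance between the features before and after conversion becomes shorter) can be sketched as follows. The abstract names neither the distance measure nor the mapping to a score, so Euclidean distance and the form 1 / (1 + distance) are assumptions, as is the function name `evaluation_score`.

```python
import math


def evaluation_score(original, converted):
    """Map the Euclidean distance between two feature vectors to a score in (0, 1].

    Identical vectors score 1.0, and the score falls as the vectors move apart.
    """
    if len(original) != len(converted):
        raise ValueError("feature vectors must have the same length")
    distance = math.dist(original, converted)
    return 1.0 / (1.0 + distance)


if __name__ == "__main__":
    print(evaluation_score([1.0, 2.0], [1.0, 2.0]))
    print(evaluation_score([0.0, 0.0], [3.0, 4.0]))
```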
Prediction and identification techniques used with a hearing prosthesis

Patent number: 12236942

Abstract: A method, including receiving a signal which includes speech data, processing the received signal to identify and/or predict one or more words in the speech data, and evoking a hearing percept based on the received signal, wherein the evoked hearing percept includes one or more modified words based on the identification and/or prediction of the one or more words.

Type: Grant

Filed: June 24, 2020

Date of Patent: February 25, 2025

Assignee: Cochlear Limited

Inventors: Paul Michael Carter, Adam Hersbach, Richard Bruce Murphy, Kenneth Oplinger
Method for analyzing the movements of a person, and device for implementing same

Patent number: 12223773

Abstract: A method for analyzing at least one sequence of movements performed by a person, including steps of segmenting the sequence of movements into time units, each associated with an instant (t0 to tn), determining, for each instant, positions of at least some characteristic points of the person using an acquisition system providing raw data, assigning at each instant at least one position code to each of the characteristic points as a function of its determined position, to form a combination of position codes at each instant, assigning at least one elementary action code for each given instant corresponding to the combination of position codes at the given instant, and syntactically verifying the elementary action codes and/or position codes from at least one structured language to refine the raw data relating to the positions of the characteristic points.

Type: Grant

Filed: October 17, 2019

Date of Patent: February 11, 2025

Assignee: A.I.O.

Inventors: Cyril Dane, Florian Boudinet
Using video clips as dictionary usage examples

Patent number: 12197868

Abstract: Implementations are provided for automatically mining corpus(es) of electronic video files for video clips that contain spoken utterances that are suitable usage examples to accompany or complement dictionary definitions. These video clips may then be associated with target n-grams in a searchable database, such as a database underlying an online dictionary. In various implementations, a set of candidate video clips in which a target n-gram is uttered in a target context may be identified from a corpus of electronic video files. For each candidate video clip of the set, pre-existing manual subtitles associated with the candidate video clip may be compared to text generated based on speech recognition processing of an audio portion of the candidate video clip. Based at least in part on the comparing, a measure of suitability as a dictionary usage example may be calculated for the candidate video clip.

Type: Grant

Filed: November 4, 2019

Date of Patent: January 14, 2025

Assignee: GOOGLE LLC

Inventors: Tal Cohen, Tal Snir, Sivan Eiger, Zahi Akiva, Gadi Ben Amram, Ran Dahan, Sasha Goldshtein, Yossi Matias, Shoji Ogura
Electronic device and controlling method thereof

Patent number: 12170088

Abstract: An electronic device is provided. The electronic device according to an embodiment includes a microphone, a communicator comprising communication circuitry, and a processor configured to control the communicator to transmit a control command to an external audio device for reducing an audio output level of the external audio device in response to a trigger signal for starting a voice control mode being received through the microphone and to control the electronic device to operate in the voice control mode.

Type: Grant

Filed: July 7, 2023

Date of Patent: December 17, 2024

Assignee: Samsung Electronics Co., Ltd.

Inventors: Min-seok Kim, Min-ho Lee
Speech processing device, speech processing method, and recording medium

Patent number: 12142279

Abstract: A speaker extracting unit extracts a speaker area from an image. A first utterance data generating unit, on the basis of the shape of the lips of the speaker, generates first utterance data indicating the content of the utterance by the speaker. A second utterance data generating unit, on the basis of a speech signal corresponding to the utterance by the speaker, generates second utterance data indicating the content of the utterance by the speaker. A comparison unit compares the first utterance data and the second utterance data with each other.

Type: Grant

Filed: July 29, 2020

Date of Patent: November 12, 2024

Assignee: NEC CORPORATION

Inventor: Kazuyuki Sasaki
Voiceprint recognition method and device

Patent number: 12130899

Abstract: This application provides a voiceprint recognition method and device. The method includes: calculating, by an electronic device, a first confidence value that an entered voice belongs to a first registered user, and calculating a second confidence value that the entered voice belongs to a second registered user. The method further includes: calculating, by another electronic device, a third confidence value that the entered voice belongs to the first registered user, and calculating a fourth confidence value that the entered voice belongs to the second registered user. A server determines, based on the first confidence value and the third confidence value, a fifth confidence value that a user is the first registered user, and determines, based on the second confidence value and the fourth confidence value, a sixth confidence value that the user is the second registered user.

Type: Grant

Filed: January 28, 2022

Date of Patent: October 29, 2024

Assignee: Huawei Technologies Co., Ltd.

Inventors: Yuan Sun, Shuwei Li, Youyu Jiang, Shen Qu, Ming Kuang
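The server-side fusion in the abstract for patent 12130899 can be sketched as below. The abstract only says that the server derives a combined confidence per registered user from the values reported by each device; the weighted mean, the 0.5 acceptance threshold, and the names `fuse_confidences` and `identify` are assumptions made for illustration.

```python
def fuse_confidences(per_device, weights=None):
    """Combine per-device confidences into one confidence per registered user.

    per_device: list of dicts mapping user name -> confidence from one device.
    weights: optional per-device weights; equal weights are assumed by default.
    """
    if weights is None:
        weights = [1.0] * len(per_device)
    total = sum(weights)
    users = set().union(*per_device)
    return {
        user: sum(w * scores.get(user, 0.0) for w, scores in zip(weights, per_device))
        / total
        for user in users
    }


def identify(per_device, threshold=0.5):
    """Return the user with the highest fused confidence, or None if below threshold."""
    fused = fuse_confidences(per_device)
    user = max(fused, key=fused.get)
    return user if fused[user] >= threshold else None


if __name__ == "__main__":
    reports = [{"alice": 0.8, "bob": 0.3}, {"alice": 0.6, "bob": 0.5}]
    print(fuse_confidences(reports), identify(reports))
```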
System and method for identifying and processing audio signals

Patent number: 12100407

Abstract: A method for phoneme identification. The method includes receiving an audio signal from a speaker, performing initial processing comprising filtering the audio signal to remove audio features, the initial processing resulting in a modified audio signal, transmitting the modified audio signal to a phoneme identification method and a phoneme replacement method to further process the modified audio signal, and transmitting the modified audio signal to a speaker. Also, a system for identifying and processing audio signals. The system includes at least one speaker, at least one microphone, and at least one processor, wherein the processor processes audio signals received using a method for phoneme replacement.

Type: Grant

Filed: August 1, 2022

Date of Patent: September 24, 2024

Assignee: DEKA Products Limited Partnership

Inventors: Dean Kamen, Derek G. Kane
Techniques for establishing communications with third-party accessories

Patent number: 12095939

Abstract: Techniques are disclosed for connecting third-party accessories to a cellular-capable device to participate in a telephone call. In one example, a user can voice a request to make a call to an accessory device. The accessory device can transmit the request to a controller device. Upon processing the request, the controller device can identify an appropriate cellular-capable device and instruct the cellular-capable device to place the requested call. The controller device can also instruct the cellular-capable device to establish an audio connection with the accessory device to relay the call audio. In another example, the controller device can listen for a word spoken at the accessory device indicating the end of the call. Upon receiving the end of call word, the controller device can instruct the cellular-capable device to terminate the call. While in the listening state, the controller device may continue processing user requests received at other accessory devices.

Type: Grant

Filed: April 12, 2022

Date of Patent: September 17, 2024

Assignee: Apple Inc.

Inventors: Jared S. Grubb, Robert M. Stewart, Gabriel Sanchez, Zaka ur Rehman Ashraf, Anshul Jain
Display device and system comprising same

Patent number: 12079544

Abstract: A display device according to an embodiment of the present invention may comprise: a display unit for displaying a content image; a microphone for receiving voice commands from a user; a network interface unit for communicating with a natural language processing server and a search server; and a control unit for transmitting the received voice commands to the natural language processing server, receiving intention analysis result information indicating the user's intention corresponding to the voice commands from the natural language processing server, and performing a function of the display device according to the received intention analysis result information.

Type: Grant

Filed: June 26, 2023

Date of Patent: September 3, 2024

Assignee: LG ELECTRONICS INC.

Inventors: Sunki Min, Kiwoong Lee, Hyangjin Lee, Jeean Chang, Seunghyun Heo, Jaekyung Lee
Freeze words

Patent number: 12073826

Abstract: A method for detecting freeze words includes receiving audio data that corresponds to an utterance spoken by a user and captured by a user device associated with the user. The method also includes processing, using a speech recognizer, the audio data to determine that the utterance includes a query for a digital assistant to perform an operation. The speech recognizer is configured to trigger endpointing of the utterance after a predetermined duration of non-speech in the audio data. Before the predetermined duration of non-speech, the method includes detecting a freeze word in the audio data. In response to detecting the freeze word in the audio data, the method also includes triggering a hard microphone closing event at the user device. The hard microphone closing event prevents the user device from capturing any audio subsequent to the freeze word.

Type: Grant

Filed: May 23, 2023

Date of Patent: August 27, 2024

Assignee: Google LLC

Inventors: Matthew Sharifi, Aleksandar Kracun
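The interplay between silence-based endpointing and a freeze word in the abstract for patent 12073826 can be sketched with a simulated stream. This is a simplification under stated assumptions: each frame is either an already-recognized word or `None` for non-speech, and the function name `capture_query`, the frame model, and the three-frame silence limit are hypothetical.

```python
def capture_query(frames, freeze_words, max_silence_frames=3):
    """Collect words until a freeze word or a run of non-speech frames occurs.

    frames: iterable of recognized words, with None marking a non-speech frame.
    Returns (captured_words, reason), where reason is "freeze_word", "endpoint",
    or "stream_ended".
    """
    captured = []
    silence = 0
    for frame in frames:
        if frame is None:
            silence += 1
            if silence >= max_silence_frames:
                # Ordinary endpointing after the predetermined non-speech duration.
                return captured, "endpoint"
            continue
        silence = 0
        if frame.lower() in freeze_words:
            # Hard microphone close: nothing after the freeze word is captured.
            return captured, "freeze_word"
        captured.append(frame)
    return captured, "stream_ended"


if __name__ == "__main__":
    print(capture_query(["play", "jazz", "stop", "ignore", "this"], {"stop"}))
```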
Two-pass end to end speech recognition

Patent number: 12073824

Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.

Type: Grant

Filed: December 3, 2020

Date of Patent: August 27, 2024

Assignee: GOOGLE LLC

Inventors: Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Jean Bruguier, Shuo-Yiin Chang, Wei Li
Methods and systems for confusion reduction for compressed acoustic models

Patent number: 12067978

Abstract: Methods and systems are disclosed herein for improvements relating to compressed automatic speech recognition (ASR) systems. The ASR system may comprise a compressed acoustic engine and an adaptive decoder. The adaptive decoder may be dynamically compiled based on characteristics of the compressed acoustic engine and a current state of the application device. In some embodiments, a dynamic command list is used to manage context-specific commands. Two or more commands recognized by the adaptive decoder may be confusable due to compression of the ASR system. Alternate commands may be determined that are semantically equivalent but phonetically different than the confusable commands to reduce classification error of the adaptive decoder. An alternate command may replace one or more of the confusable commands in the adaptive decoder. In some embodiments, a user interface is displayed to a user of the ASR system to select the alternate command for replacement in the decoder.

Type: Grant

Filed: June 1, 2021

Date of Patent: August 20, 2024

Assignee: Samsung Electronics Co., Ltd.

Inventors: Fuliang Weng, Alexei Ivanov, Stephen Cradock
Electronic device and method for controlling the electronic device thereof

Patent number: 12062370

Abstract: An electronic device and a method for controlling the electronic device are disclosed. The method includes receiving a trigger speech of a user, entering a speech recognition mode to recognize a speech command of the user in response, and transmitting information to enter the speech recognition mode to at least one external device located at home. Further, the method includes obtaining a first speech information corresponding to a speech uttered by a user from a microphone included in the electronic device and receiving at least one second speech information corresponding to a speech uttered by the user from the at least one external device; identifying a task corresponding to a speech uttered by the user and an external device to perform the task based on the first speech information and the at least one second speech information; and transmitting, to the identified external device, information about the task.

Type: Grant

Filed: January 29, 2021

Date of Patent: August 13, 2024

Assignee: Samsung Electronics Co., Ltd.

Inventor: Jeonghyun Yun
Multi-turn dialogue response generation with persona modeling

Patent number: 12039280

Abstract: Machine classifiers in accordance with embodiments of the invention capture long-term temporal dependencies in particular tasks, such as turn-based dialogues. Machine classifiers may be used to help users to perform tasks indicated by the user. When a user utterance is received, natural language processing techniques may be used to understand the user's intent. Templates may be determined based on the user's intent in the generation of responses to solicit information from the user. A variety of persona attributes may be determined for a user. The persona attributes may be determined based on the user's utterances and/or provided as metadata included with the user's utterances. A response persona may be used to generate responses to the user's utterances such that the generated responses match a tone appropriate to the task. A response persona may be used to generate templates to solicit additional information and/or generate responses appropriate to the task.

Type: Grant

Filed: April 17, 2023

Date of Patent: July 16, 2024

Assignee: Capital One Services, LLC

Inventors: Oluwatobi Olabiyi, Erik T. Mueller, Rui Zhang, Zachary Kulis, Varun Singh
Speech recognition for keywords

Patent number: 12026753

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition are disclosed. In one aspect, a method includes receiving a candidate adword from an advertiser. The method further includes generating a score for the candidate adword based on a likelihood of a speech recognizer generating, based on an utterance of the candidate adword, a transcription that includes a word that is associated with an expected pronunciation of the candidate adword. The method further includes classifying, based at least on the score, the candidate adword as an appropriate adword for use in a bidding process for advertisements that are selected based on a transcription of a speech query or as not an appropriate adword for use in the bidding process for advertisements that are selected based on the transcription of the speech query.

Type: Grant

Filed: May 5, 2021

Date of Patent: July 2, 2024

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
Annotation method, relation extraction method, storage medium and computing device

Patent number: 12026453

Abstract: An annotation method, a method of relation extraction, a non-transient computer storage medium, and a computing device are provided. The annotation method includes: traversing each sentence in a text to be annotated to generate a first template and selecting the first template; traversing each sentence in the text to be annotated, based on the selected first template, to match at least one new seed; evaluating the at least one new seed having been matched; repeating the above steps until a selected condition is met, and outputting the matched correct seed and a classification relationship between a first entity and a second entity in the matched correct seed.

Type: Grant

Filed: February 26, 2021

Date of Patent: July 2, 2024

Assignee: BOE TECHNOLOGY GROUP CO., LTD.

Inventor: Yafei Dai
Background audio identification for speech disambiguation

Patent number: 12002452

Abstract: Implementations relate to techniques for providing context-dependent search results. A computer-implemented method includes receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio, separating the audio stream into a first substream that includes the user speech data and a second substream that includes the background audio, identifying concepts related to the background audio, generating a set of terms related to the identified concepts, influencing a speech recognizer based on at least one of the terms related to the background audio, and obtaining a recognized version of the user speech data using the speech recognizer.

Type: Grant

Filed: December 21, 2022

Date of Patent: June 4, 2024

Assignee: Google LLC

Inventors: Jason Sanders, Gabriel Taubman, John J. Lee
User feedback for speech interactions

Patent number: 11990119

Abstract: An interactive system may be implemented in part by an audio device located within a user environment, which may accept speech commands from a user and may also interact with the user by means of generated speech. In order to improve performance of the interactive system, a user may use a separate device, such as a personal computer or mobile device, to access a graphical user interface that lists details of historical speech interactions. The graphical user interface may be configured to allow the user to provide feedback and/or corrections regarding the details of specific interactions.

Type: Grant

Filed: March 8, 2021

Date of Patent: May 21, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Gilles Jean Roger Belin, Charles S. Rogers, III, Robert David Owen, Jeffrey Penrod Adams, Rajiv Ramachandran, Gregory Michael Hart
Developing an automatic speech recognition system using normalization

Patent number: 11978434

Abstract: A computer-implemented technique identifies terms in an original reference transcription and original ASR output results that are considered valid variants of each other, even though these terms have different textual forms. Based on this finding, the technique produces a normalized reference transcription and normalized ASR output results in which valid variants are assigned the same textual form. In some implementations, the technique uses the normalized text to develop a model for an ASR system. For example, the technique may generate a word error rate (WER) measure by comparing the normalized reference transcription with the normalized ASR output results, and use the WER measure as guidance in developing the model. Some aspects of the technique involve identifying occasions in which a term can be properly split into component parts. Other aspects can identify other ways in which two terms may vary in spelling, but nonetheless remain valid variants.

Type: Grant

Filed: September 29, 2021

Date of Patent: May 7, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Satarupa Guha, Ankur Gupta, Rahul Ambavat, Rupeshkumar Rasiklal Mehta
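The effect of normalization on the word error rate (WER) measure in the abstract for patent 11978434 can be shown with a short sketch. The variant table and the helper names `normalize` and `word_error_rate` are hypothetical; the abstract does not describe how valid variants are found, so a hand-written mapping is assumed here. WER is computed in the usual way, as the word-level edit distance divided by the reference length.

```python
def normalize(text, variants):
    """Lowercase, tokenize, and rewrite each token to its canonical form.

    A canonical form may contain several words, which covers terms that can be
    split into component parts (for example "notebook" -> "note book").
    """
    tokens = []
    for tok in text.lower().split():
        tokens.extend(variants.get(tok, tok).split())
    return tokens


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, 1):
        cur = [i]
        for j, hyp_word in enumerate(hypothesis, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(reference)


if __name__ == "__main__":
    variants = {"notebook": "note book", "okay": "ok"}
    ref, hyp = "open my notebook ok", "open my note book okay"
    print(word_error_rate(ref.split(), hyp.split()))
    print(word_error_rate(normalize(ref, variants), normalize(hyp, variants)))
```

Without normalization the two strings differ in three of four reference words; once both sides share the same textual form, the remaining WER reflects only genuine recognition errors.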
Wakeword detection

Patent number: 11978440

Abstract: Techniques for processing input data for a detected user are described. Received image data is processed to identify an indicated user. Based on the user a machine learning model is implemented. The machine learning model is then used to process input data for a user input. An action is performed using the resulting output data.

Type: Grant

Filed: May 25, 2023

Date of Patent: May 7, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Deepak Yavagal, Ajith Prabhakara, John Gray
Cross-device presentation with conversational user interface

Patent number: 11978113

Abstract: Techniques are described for cross-device presentation. During a speech interaction with a conversational user interface (CUI) executing on an input device, such as a personal assistant (PA) device or other computing device, a user may utter one or more search terms to search for an item, such as a vehicle to purchase. The search term(s) may be employed by a search engine to identify one or more items that correspond to the search term(s). The search engine can generate recommendation information that includes a description of the item(s) corresponding to the search term(s). The recommendation information can be communicated to an output device that is registered to, or otherwise associated with, the user who spoke the search term(s) to the input device. In some instances, the recommendation information can be presented through speech output on the PA device or other device.

Type: Grant

Filed: May 31, 2023

Date of Patent: May 7, 2024

Assignee: United Services Automobile Association (USAA)

Inventors: Philip Andrew Leal, Ricardo Alcantar
Speech recognition method and apparatus

Patent number: 11955119

Abstract: A speech recognition method includes receiving speech data, obtaining, from the received speech data, a candidate text including at least one word and a phonetic symbol sequence associated with a pronunciation of a target word included in the received speech data, using a speech recognition model, replacing the phonetic symbol sequence included in the candidate text with a replacement word corresponding to the phonetic symbol sequence, and determining a target text corresponding to the received speech data based on a result of the replacing.

Type: Grant

Filed: December 16, 2022

Date of Patent: April 9, 2024

Assignee: Samsung Electronics Co., Ltd.

Inventor: Jihyun Lee
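The replacement step in the abstract for patent 11955119 can be sketched as a lookup over a candidate text. The angle-bracket delimiters, the phoneme notation, the lexicon contents, and the function name `resolve_candidate` are all assumptions; the abstract does not say how the phonetic symbol sequence is marked in the candidate text or how the replacement word is chosen.

```python
import re


def resolve_candidate(candidate, lexicon):
    """Replace each <phonetic symbol sequence> in the candidate text with a word.

    lexicon: hypothetical mapping from a space-separated phoneme string to a word.
    Sequences missing from the lexicon are left unchanged.
    """
    def replace(match):
        sequence = " ".join(match.group(1).split())  # collapse extra whitespace
        return lexicon.get(sequence, match.group(0))

    return re.sub(r"<([^<>]+)>", replace, candidate)


if __name__ == "__main__":
    lexicon = {"JH AA N": "John"}
    print(resolve_candidate("call <JH AA N> now", lexicon))
```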
Extensible search, content, and dialog management system with human-in-the-loop curation

Patent number: 11948566

Abstract: The present disclosure describes systems and methods for extensible search, content, and dialog management. Embodiments of the present disclosure provide a dialog system with a trained intent recognition model (e.g., a deep learning model) to receive and understand a natural language query from a user. In cases where intent is not identified for a received query, the dialog system generates one or more candidate responses that may be refined (e.g., using human-in-the-loop curation) to generate a response. The intent recognition model may be updated (e.g., retrained) accordingly. Upon receiving a subsequent query with similar intent, the dialog system may identify the intent using the updated intent recognition model.

Type: Grant

Filed: March 24, 2021

Date of Patent: April 2, 2024

Assignee: ADOBE INC.

Inventors: Oliver Brdiczka, Kyoung Tak Kim, Charat Maheshwari
Speech processing optimizations based on microphone array

Patent number: 11935525

Abstract: Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.

Type: Grant

Filed: June 8, 2020

Date of Patent: March 19, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Shiva Kumar Sundaram, Minhua Wu, Anirudh Raju, Spyridon Matsoukas, Arindam Mandal, Kenichi Kumatani
Presenting location related information and implementing a task based on gaze, gesture, and voice detection

Patent number: 11906317

Abstract: Systems and methods for presenting information and executing a task. In an aspect, when a user gazes at a display of a standby device, location related information is presented. In another aspect, when a user utters a voice command and gazes or gestures at a device, a task is executed. In another aspect, a voice input, a gesture, and user information are used to determine a destination for a trip or a product for a purchase. In another aspect, a voice input and user information are used to determine a destination when a user hails a vehicle.

Type: Grant

Filed: June 8, 2021

Date of Patent: February 20, 2024

Inventor: Chian Chiu Li
Method and system for processing speech signal

Patent number: 11900958

Abstract: Embodiments of the present disclosure provide methods and systems for processing a speech signal. The method can include: processing the speech signal to generate a plurality of speech frames; generating a first number of acoustic features based on the plurality of speech frames using a frame shift at a given frequency; and generating a second number of posteriori probability vectors based on the first number of acoustic features using an acoustic model, wherein each of the posteriori probability vectors comprises probabilities of the acoustic features corresponding to a plurality of modeling units, respectively.

Type: Grant

Filed: December 26, 2022

Date of Patent: February 13, 2024

Assignee: Alibaba Group Holding Limited

Inventors: Shiliang Zhang, Ming Lei, Wei Li, Haitao Yao
Display apparatus and method for registration of user command

Patent number: 11900939

Abstract: A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees high recognition rate among user commands defined by a user.

Type: Grant

Filed: October 7, 2022

Date of Patent: February 13, 2024

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Nam-yeong Kwon, Kyung-mi Park
Display apparatus and method for registration of user command

Patent number: 11862166

Abstract: A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees high recognition rate among user commands defined by a user.

Type: Grant

Filed: October 7, 2022

Date of Patent: January 2, 2024

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Nam-yeong Kwon, Kyung-mi Park
Method and system for detecting unsupported utterances in natural language understanding

Patent number: 11854528

Abstract: An apparatus for detecting unsupported utterances in natural language understanding, includes a memory storing instructions, and at least one processor configured to execute the instructions to classify a feature that is extracted from an input utterance of a user, as one of in-domain and out-of-domain (OOD) for a response to the input utterance, obtain an OOD score of the extracted feature, and identify whether the feature is classified as OOD. The at least one processor is further configured to execute the instructions to, based on the feature being identified to be classified as in-domain, identify whether the obtained OOD score is greater than a predefined threshold, and based on the OOD score being identified to be greater than the predefined threshold, re-classify the feature as OOD.

Type: Grant

Filed: August 13, 2021

Date of Patent: December 26, 2023

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Yen-Chang Hsu, Yilin Shen, Avik Ray, Hongxia Jin
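
The two-stage decision described in the abstract of patent 11854528 above can be illustrated with a minimal sketch. It is not the patented implementation: the `Prediction` container, the `final_label` function, the default threshold of 0.5, and the convention that a higher score means "more likely out-of-domain" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Label from a hypothetical upstream classifier: "in-domain" or "OOD".
    label: str
    # Hypothetical OOD score; higher is assumed to mean "more likely OOD".
    ood_score: float


def final_label(pred: Prediction, threshold: float = 0.5) -> str:
    """Keep an OOD label as-is; re-check an in-domain label against the score."""
    if pred.label == "OOD":
        return "OOD"
    # Classified as in-domain: re-classify only if the score exceeds the threshold.
    if pred.ood_score > threshold:
        return "OOD"
    return "in-domain"
```

With these assumptions, `final_label(Prediction("in-domain", 0.9))` is re-classified as OOD, while a low-scoring in-domain prediction keeps its label.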
Near real time out of home audience measurement

Patent number: 11844043

Abstract: Methods, apparatus, systems and articles of manufacture for near real time out of home audience measurement are disclosed. An example apparatus includes at least one memory; instructions; and processor circuitry to execute the instructions to at least: receive a first data transmission request at a first portable meter; send a second data transmission request from the first portable meter to a second portable meter; determine whether the first portable meter is capable of transmitting at least one data packet, based at least in part on an indication the second portable meter is capable of transmitting the at least one data packet; and in response to determining the first portable meter is capable of transmitting the at least one data packet, transmit the at least one data packet.

Type: Grant

Filed: June 28, 2021

Date of Patent: December 12, 2023

Assignee: The Nielsen Company (US), LLC

Inventors: John T. Livoti, Stanley Wellington Woodruff
Method, apparatus, electronic device and storage medium for speech recognition

Patent number: 11842726

Abstract: A computer-implemented method for speech recognition is disclosed. The method includes extracting a feature word associated with location information from a speech to be recognized, and calculating a similarity between the feature word and respective ones of a plurality of candidate words in a corpus. The corpus includes a first sub-corpus associated with at least one user, and the plurality of candidate words include, in the first sub-corpus, a first standard candidate word and at least one first erroneous candidate word. The at least one first erroneous candidate word has a preset correspondence with the first standard candidate word. The method further includes in response to the similarity between the feature word and one or more of the at least one first erroneous candidate word satisfying a predetermined condition, outputting the first standard candidate word as a recognition result based on the preset correspondence.

Type: Grant

Filed: September 8, 2021

Date of Patent: December 12, 2023

Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.

Inventors: Jing Pei, Xiantao Chen, Meng Xu
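
As a rough illustration of the erroneous-to-standard mapping in the abstract of patent 11842726 above, the sketch below matches a feature word against known erroneous forms and returns the corresponding standard word. The sample corpus entries, the 0.9 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions; the abstract does not say how similarity is computed, and the sketch covers only the erroneous-candidate branch.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical per-user sub-corpus: each erroneous candidate word has a
# preset correspondence with one standard candidate word.
ERRONEOUS_TO_STANDARD = {
    "golden gait bridge": "Golden Gate Bridge",
    "golden gate brige": "Golden Gate Bridge",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recognize(feature_word: str, threshold: float = 0.9) -> Optional[str]:
    """Return the standard word whose erroneous form best matches, if any."""
    best, best_score = None, 0.0
    for wrong, standard in ERRONEOUS_TO_STANDARD.items():
        score = similarity(feature_word, wrong)
        # "Predetermined condition" is assumed to be a similarity threshold.
        if score >= threshold and score > best_score:
            best, best_score = standard, score
    return best
```

A feature word that matches a stored misrecognition is mapped to the standard word; a word that matches nothing closely enough yields `None`.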
Age-sensitive automatic speech recognition

Patent number: 11837221

Abstract: Systems and methods are described to receive a query from a user and provide a reply that is appropriate for an age group of the user. A query for a media asset is received, where such query comprises an inputted term, and the query is determined to be received from a user belonging to a first age group. A context of the inputted term within the query is identified, and in response to the determining, based on the identified context, that the inputted term of the query is inappropriate for the first age group, a replacement term for the inputted term that is related to the inputted term and is appropriate for the first age group in the context of the query is identified. The query is modified to replace the inputted term with the identified replacement term, and a reply to the modified query is generated for output.

Type: Grant

Filed: February 26, 2021

Date of Patent: December 5, 2023

Assignee: Rovi Guides, Inc.

Inventors: Ankur Anil Aher, Jeffry Copps Robert Jose
Speech recognition with selective use of dynamic language models

Patent number: 11810568

Abstract: A computer-implemented method for transcribing an utterance includes receiving, at a computing system, speech data that characterizes an utterance of a user. A first set of candidate transcriptions of the utterance can be generated using a static class-based language model that includes a plurality of classes that are each populated with class-based terms selected independently of the utterance or the user. The computing system can then determine whether the first set of candidate transcriptions includes class-based terms. Based on whether the first set of candidate transcriptions includes class-based terms, the computing system can determine whether to generate a dynamic class-based language model that includes at least one class that is populated with class-based terms selected based on a context associated with at least one of the utterance and the user.

Type: Grant

Filed: December 10, 2020

Date of Patent: November 7, 2023

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
Systems and methods for adaptive proper name entity recognition and understanding

Patent number: 11783830

Abstract: Various embodiments contemplate systems and methods for performing automatic speech recognition (ASR) and natural language understanding (NLU) that enable high accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities. The proper name entities may contain or be comprised wholly of words that are not present in the vocabularies of these systems as normally constituted. Recognition of the other words in the utterances in question, e.g. words that are not part of the proper name entities, may occur at regular, high recognition accuracy. Various embodiments provide as output not only accurately transcribed running text of the complete utterance, but also a symbolic representation of the meaning of the input, including appropriate symbolic representations of proper name entities, adequate to allow a computer system to respond appropriately to the spoken request without further analysis of the user's input.

Type: Grant

Filed: May 26, 2021

Date of Patent: October 10, 2023

Assignee: Promptu Systems Corporation

Inventor: Harry William Printz
System and method for controllable machine text generation architecture

Patent number: 11763100

Abstract: A system is provided comprising a processor and a memory storing instructions which configure the processor to process an original sentence structure through an encoder neural network to decompose the original sentence structure into an original semantics component and an original syntax component, process the original syntax component through a syntax variational autoencoder (VAE) to receive a syntax mean vector and a syntax covariance matrix, obtain a sampled syntax vector from a syntax Gaussian posterior parameterized by the syntax mean vector and the syntax covariance matrix, process the original semantics component through a semantics VAE to receive a semantics mean vector and a semantics covariance matrix, obtain a sampled semantics vector from a semantics Gaussian posterior parameterized by the semantics mean vector and the semantics covariance matrix, and process the sampled syntax vector and the sampled semantics vector through a decoder neural network to compose a new sentence.

Type: Grant

Filed: May 22, 2020

Date of Patent: September 19, 2023

Assignee: ROYAL BANK OF CANADA

Inventors: Peng Xu, Yanshuai Cao, Jackie C. K. Cheung
Electronic device and method for controlling the same, and storage medium

Patent number: 11735167

Abstract: Disclosed is an electronic device recognizing an utterance voice in units of individual characters. The electronic device includes: a voice receiver; and a processor configured to: obtain a recognition character converted from a character section of a user voice received through the voice receiver, and recognize a candidate character having high acoustic feature related similarity with the character section among a plurality of acquired candidate characters as an utterance character of the character section based on a confusion possibility with the acquired recognition character.

Type: Grant

Filed: November 24, 2020

Date of Patent: August 22, 2023

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Jihun Park, Dongheon Seok
Speaker recognition with assessment of audio frame contribution

Patent number: 11735191

Abstract: This application describes methods and apparatus for speaker recognition. An apparatus according to an embodiment has an analyzer for analyzing each frame of a sequence of frames of audio data which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame. An assessment module determines, for each frame of audio data, a contribution indicator of the extent to which that frame of audio data should be used for speaker recognition processing based on the determined characteristic of the speech sound. Said contribution indicator comprises a weighting to be applied to each frame in the speaker recognition processing. In this way frames which correspond to speech sounds that are of most use for speaker discrimination may be emphasized and/or frames which correspond to speech sounds that are of least use for speaker discrimination may be de-emphasized.

Type: Grant

Filed: June 25, 2019

Date of Patent: August 22, 2023

Assignee: Cirrus Logic, Inc.

Inventors: John Paul Lesso, John Laurence Melanson
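
The per-frame weighting in the abstract of patent 11735191 above can be pictured as a weighted average of frame-level scores. This is only a sketch: `weighted_speaker_score`, the per-frame scores from a hypothetical speaker model, and the choice of a normalized weighted mean are assumptions, and the abstract does not specify how the weights are derived from the speech-sound characteristic.

```python
from typing import Sequence


def weighted_speaker_score(
    frame_scores: Sequence[float], weights: Sequence[float]
) -> float:
    """Combine per-frame speaker scores using per-frame contribution weights."""
    if len(frame_scores) != len(weights):
        raise ValueError("one weight is required per frame")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one frame must have a non-zero weight")
    # Frames with larger weights are emphasized; zero-weight frames are ignored.
    return sum(s * w for s, w in zip(frame_scores, weights)) / total
```

For example, two frames scored 1.0 and 0.0 with weights 3.0 and 1.0 combine to 0.75, so the more heavily weighted frame dominates the result.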
Display device and system comprising same

Patent number: 11704089

Abstract: A display device according to an embodiment of the present invention may comprise: a display unit for displaying a content image; a microphone for receiving voice commands from a user; a network interface unit for communicating with a natural language processing server and a search server; and a control unit for transmitting the received voice commands to the natural language processing server, receiving intention analysis result information indicating the user's intention corresponding to the voice commands from the natural language processing server, and performing a function of the display device according to the received intention analysis result information.

Type: Grant

Filed: January 11, 2021

Date of Patent: July 18, 2023

Assignee: LG ELECTRONICS INC.

Inventors: Sunki Min, Kiwoong Lee, Hyangjin Lee, Jeean Chang, Seunghyun Heo, Jaekyung Lee
Methods and apparatus to analyze performance of watermark encoding devices

Patent number: 11676073

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that analyze performance of manufacturer independent devices. An example apparatus includes a software development kit (SDK) deployment engine to deploy an SDK to a manufacturer of a device, the SDK to define heartbeat data to be collected from the device and interfacing techniques to transmit the heartbeat data to a measurement entity. In some examples, the apparatus includes a machine learning engine to predict whether the device is associated with one or more failure modes. The example apparatus also includes an alert generator to generate an alert based on a prediction, the alert to indicate at least one of a type of a first one of the failure modes or at least one component of the device to be remedied according to the first one of the one or more failure modes, and transmit the alert to a management agent.

Type: Grant

Filed: July 12, 2021

Date of Patent: June 13, 2023

Assignee: The Nielsen Company (US), LLC

Inventors: John T. Livoti, Susan Cimino, Stanley Wellington Woodruff, Rajakumar Madhanganesh, Alok Garg
Pre-training with alignments for recurrent neural network transducer based end-to-end speech recognition

Patent number: 11657799

Abstract: Techniques performed by a data processing system for training a Recurrent Neural Network Transducer (RNN-T) herein include encoder pretraining by training a neural network-based token classification model using first token-aligned training data representing a plurality of utterances, where each utterance is associated with a plurality of frames of audio data and tokens representing each utterance are aligned with frame boundaries of the plurality of audio frames; obtaining a first cross-entropy (CE) criterion from the token classification model, wherein the CE criterion represents a divergence between expected outputs and reference outputs of the model; pretraining an encoder of an RNN-T based on the first CE criterion; and training the RNN-T with second training data after pretraining the encoder of the RNN-T. These techniques also include whole-network pre-training of the RNN-T.

Type: Grant

Filed: April 3, 2020

Date of Patent: May 23, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong, Hu Hu
Content analysis to enhance voice search

Patent number: 11636146

Abstract: Methods and apparatus for improving speech recognition accuracy in media content searches are described. An advertisement for a media content item is analyzed to identify keywords that may describe the media content item. The identified keywords are associated with the media content item for use during a voice search to locate the media content item. A user may speak one or more of the keywords as a search input and be provided with the media content item as a result of the search.

Type: Grant

Filed: September 23, 2019

Date of Patent: April 25, 2023

Assignee: Comcast Cable Communications, LLC

Inventor: George Thomas Des Jardins
Voice morphing apparatus having adjustable parameters

Patent number: 11600284

Abstract: A voice morphing apparatus having adjustable parameters is described. The disclosed system and method include a voice morphing apparatus that morphs input audio to mask a speaker's identity. Parameter adjustment uses evaluation of an objective function that is based on the input audio and output of the voice morphing apparatus. The voice morphing apparatus includes objectives that are based adversarially on speaker identification and positively on audio fidelity. Thus, the voice morphing apparatus is adjusted to reduce identifiability of speakers while maintaining fidelity of the morphed audio. The voice morphing apparatus may be used as part of an automatic speech recognition system.

Type: Grant

Filed: January 11, 2020

Date of Patent: March 7, 2023

Assignee: SOUNDHOUND, INC.

Inventor: Steve Pearson
Information processing apparatus, information processing method, and computer program product

Patent number: 11593621

Abstract: An information processing apparatus according to an embodiment includes one or more hardware processors. The hardware processors obtain a first categorical distribution sequence corresponding to first input data and obtain a second categorical distribution sequence corresponding to second input data neighboring the first input data, by using a prediction model outputting a categorical distribution sequence representing a sequence of L categorical distributions for a single input data piece, where L is a natural number of two or more. The hardware processors calculate, for each i of 1 to L, an inter-distribution distance between i-th categorical distributions in the first and second categorical distribution sequences. The hardware processors calculate a sum of L inter-distribution distances. The hardware processors update the prediction model's parameters to lessen the sum.

Type: Grant

Filed: January 24, 2020

Date of Patent: February 28, 2023

Assignees: KABUSHIKI KAISHA TOSHIBA, TOSHIBA DIGITAL SOLUTIONS CORPORATION

Inventor: Ryohei Tanaka
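
The summed inter-distribution distance in the abstract of patent 11593621 above can be sketched as follows. The abstract does not name the distance, so the use of Kullback-Leibler divergence here is an assumption, as are the function names and the small `eps` smoothing constant. The sketch computes only the quantity to be reduced; the parameter update itself is not shown.

```python
import math
from typing import Sequence

Distribution = Sequence[float]


def kl_divergence(p: Distribution, q: Distribution, eps: float = 1e-12) -> float:
    """KL(p || q) for two categorical distributions over the same categories."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def sequence_distance(
    seq_a: Sequence[Distribution], seq_b: Sequence[Distribution]
) -> float:
    """Sum the distance between the i-th distributions for i = 1..L."""
    if len(seq_a) != len(seq_b):
        raise ValueError("both sequences must contain L distributions")
    return sum(kl_divergence(p, q) for p, q in zip(seq_a, seq_b))
```

Identical sequences give a sum of zero, and the sum grows as the predictions for the two neighboring inputs diverge, which is the quantity a training step would try to lessen.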
