Abstract: Methods for determining optimal cloud service resources include determining a reward function for a set of resource configurations identifying cloud service resource parameters. The cloud service resource parameters include a source parameter and a target parameter of services to provide to a client computing device. A source parameter dataset for the source parameter and a target parameter dataset are generated using the reward function and historical source parameter data. These dataset matrices are then subjected to SVD and clustering. A target parameter reward dataset is learned from the output of the SVD and clustering. The target parameter reward dataset is used to determine values of the target parameter for providing corresponding cloud service resources.
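A minimal sketch of the SVD-and-clustering step described above, assuming the source parameter dataset is a reward matrix with one row per resource configuration; the matrix shapes, latent rank, and cluster count are illustrative assumptions, not values from the patent.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
R = rng.random((200, 16))          # hypothetical source-parameter reward matrix

# Low-rank SVD: keep the top-k singular directions of the reward matrix.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 4
embedding = U[:, :k] * s[:k]       # each configuration as a k-dim latent vector

# Cluster the configurations in the latent space.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embedding)

# One way the target-parameter reward dataset could be learned from the output:
# the mean latent reward of the cluster each configuration falls into.
cluster_reward = {c: embedding[labels == c].mean(axis=0) for c in set(labels)}
```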
Abstract: Systems and processes for operating an intelligent automated assistant are provided. An example process includes receiving an utterance including a user request, determining, based on the user request, a domain associated with the user request, determining, based on the domain, a first subsequent user action and a second subsequent user action, determining, based on the domain, a first parameter for the first subsequent user action and a second parameter for the second subsequent user action, in accordance with a determination that a first score associated with the first subsequent user action is higher than a score associated with the second subsequent user action, selecting the first subsequent user action as a suggested subsequent user action, and providing the suggested subsequent user action.
Type:
Grant
Filed:
September 6, 2023
Date of Patent:
May 6, 2025
Assignee:
Apple Inc.
Inventors:
Jamie Harry Drummie, Silvia Frias Delgado, Andrew J. Haines, Samuel E. Turnbull, Johan D. Forsell
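The score comparison in the abstract above reduces to ranking domain-specific candidate actions. Below is a hedged Python sketch of that selection; the domain, action names, parameters, and scores are all hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class CandidateAction:
    name: str
    parameter: str
    score: float

def suggest_action(domain: str) -> CandidateAction:
    # In the patent, subsequent actions and their parameters are derived from
    # the domain of the user request; here they are hard-coded placeholders.
    candidates = {
        "restaurants": [
            CandidateAction("get_directions", "address", 0.82),
            CandidateAction("call_business", "phone_number", 0.61),
        ],
    }[domain]
    # Pick the candidate with the highest score (the patent compares the first
    # action's score against the second's).
    return max(candidates, key=lambda a: a.score)

print(suggest_action("restaurants"))   # suggested subsequent user action
```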
Abstract: Embodiments of the present invention provide a system for generating two-sided electronic interaction requests for completing resource transfers. In particular, the system may be configured to cause an entity system to transmit interaction data associated with an interaction between a user and a third party system, receive, from an entity system, the interaction data of the interaction associated with the user and the third party system, cause the third party system to generate a resource transfer request associated with the interaction, receive, from the third party system, the resource transfer request associated with the interaction, determine that the interaction data matches data accompanied by the resource transfer request for transfer of resources associated with the interaction, and route the resource transfer request to a settlement system to process the interaction and complete the transfer of resources associated with the interaction.
Type:
Grant
Filed:
August 23, 2021
Date of Patent:
April 22, 2025
Assignee:
BANK OF AMERICA CORPORATION
Inventors:
Charles Russell Kendall, Richard C. Clow, II
Abstract: A portable terminal device in an information processing system and method includes a camera and a microphone. Data of the obtained images and voice is transmitted to a server that identifies operations to be executed based on the received voice and image data. The server transmits an identification of one or more results of the plurality of operations to the portable terminal device. When the portable terminal device receives only one result from the server, an operation corresponding to that result is executed; when a plurality of results is received, the portable terminal device displays information corresponding to the plurality of results as candidates. Additional voice is captured for selecting one of the plurality of results while the information is displayed. One result is determined from the plurality of results based on the captured voice, and an operation corresponding to the determined result is executed.
Abstract: A document summarizing apparatus includes an encoding unit that receives document data comprising one or more sentences and converts the document data into tokens defined in a predetermined unit to generate a feature vector, an extraction summary unit that receives the feature vector, calculates, for each sentence constituting the document data, a probability value that the sentence corresponds to a summary, and generates an attention vector of per-token weights based on the probability values, and a decoding unit that receives the feature vector and the attention vector and generates abstract summary data.
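A short sketch of how the extraction summary unit's sentence probabilities could become a token-level attention vector, assuming each token simply inherits (and normalizes) its sentence's summary probability; the probabilities and sentences are made up for illustration.

```python
import numpy as np

sentences = [["the", "model", "works"], ["it", "is", "fast"]]
p_summary = np.array([0.9, 0.2])   # hypothetical P(sentence belongs to summary)

# Token-level attention: every token inherits its sentence's probability,
# normalized over all tokens in the document.
weights = np.concatenate([np.full(len(s), p) for s, p in zip(sentences, p_summary)])
attention = weights / weights.sum()
print(attention)                   # passed to the decoder with the feature vector
```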
Abstract: The present disclosure provides a video image composition method including the following steps. A priority level list is obtained, and the priority level list includes multiple priority levels of multiple person identities. Multiple video streams are received. Multiple identity labels corresponding to human face frame images from the video streams are determined. Multiple display levels of the human face frame images are determined according to the identity labels and the priority level list. The human face frame images that are in speaking status are detected. At least one of the human face frame images in speaking status is set as a main display area of a video image according to the display levels.
Type:
Grant
Filed:
May 26, 2022
Date of Patent:
April 15, 2025
Assignee:
AmTRAN Technology Co., Ltd.
Inventors:
Yen-Chou Chen, Chui-Pang Chiu, Che-Chia Ho
Abstract: Systems and methods for distinguishing valid voice commands from false voice commands in an interactive media guidance application. In some aspects, the interactive media guidance application receives, at a user device, a signature sound sequence. The interactive media guidance application determines, using control circuitry, based on the signature sound sequence, a threshold gain for the current location of the user device. The interactive media guidance application receives, at the user device, a voice command. The interactive media guidance application determines, using the control circuitry, based on the voice command, a gain for the voice command. The interactive media guidance application determines, using the control circuitry, whether the gain for the voice command is different from the threshold gain. Based on determining that the gain for the voice command is different from the threshold gain, the interactive media guidance application executes, using the control circuitry, the voice command.
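A hedged sketch of the gain test above, reading the abstract's logic as: a command whose gain matches the calibrated threshold gain (the level of the signature sound sequence, e.g., played by a TV at the device's location) is treated as a false command, while a differing gain indicates a live speaker. The gain measure (RMS level in dB) and tolerance are assumptions.

```python
import numpy as np

def gain_db(signal: np.ndarray) -> float:
    # RMS level in dB; a small epsilon avoids log of zero for silent input.
    return 20 * np.log10(np.sqrt(np.mean(signal ** 2)) + 1e-12)

def is_valid_command(command: np.ndarray, threshold_gain: float,
                     tolerance_db: float = 3.0) -> bool:
    # Execute only when the command's gain differs from the threshold gain.
    return abs(gain_db(command) - threshold_gain) > tolerance_db
```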
Abstract: Techniques are described for selectively adapting and/or selectively utilizing a noise reduction technique in detection of one or more features of a stream of audio data frames. For example, various techniques are directed to selectively adapting and/or utilizing a noise reduction technique in detection of an invocation phrase in a stream of audio data frames, detection of voice characteristics in a stream of audio data frames (e.g., for speaker identification), etc. Utilization of described techniques can result in more robust and/or more accurate detections of features of a stream of audio data frames in various situations, such as in environments with strong background noise. In various implementations, described techniques are implemented in combination with an automated assistant, and feature(s) detected utilizing techniques described herein are utilized to adapt the functionality of the automated assistant.
Type:
Grant
Filed:
May 13, 2024
Date of Patent:
March 25, 2025
Assignee:
GOOGLE LLC
Inventors:
Christopher Hughes, Yiteng Huang, Turaj Zakizadeh Shabestary, Taylor Applebaum
Abstract: A method includes obtaining training data including a plurality of training audio signals and corresponding transcripts. Each training audio signal is spoken by a target speaker in a first accent/dialect. For each training audio signal of the training data, the method includes generating a training synthesized speech representation spoken by the target speaker in a second accent/dialect different than the first accent/dialect and training a text-to-speech (TTS) system based on the corresponding transcript and the training synthesized speech representation. The method also includes receiving an input text utterance to be synthesized into speech in the second accent/dialect. The method also includes obtaining conditioning inputs that include a speaker embedding and an accent/dialect identifier that identifies the second accent/dialect.
Type:
Grant
Filed:
July 14, 2021
Date of Patent:
March 25, 2025
Assignee:
Google LLC
Inventors:
Lev Finkelstein, Chun-an Chan, Byungha Chun, Norman Casagrande, Yu Zhang, Robert Andrew James Clark, Vincent Wan
Abstract: An improvement to a system for identity verification is provided in which data records are continuously updated to provide the validation, verification, and trusted confidence values of an entity (an individual person or organization) for each type and level of identification needed. In addition to the iterative updating of conventional data, the historical change in recorded data is compared with newly received entity identification verification parameters, with the changes and an analysis of the changes also iteratively tracked and stored as part of the continuously updated data record. The data record may also include emotional, mood, or feeling responses to emotional, mood, or feeling prompts. Similarly, the historical change in emotional, mood, or feeling responses is compared with newly received responses to similar or different prompts, with the changes and an analysis of the changes likewise iteratively tracked and stored as part of the continuously updated data record.
Abstract: An object is to provide a device for estimating a plurality of mental/nervous-system diseases by voice analysis, the device being capable of distinguishing between major depression and bipolar disorder. Further provided are an estimation device including an extraction means for extracting an acoustic feature amount that is not affected by the location where voices are acquired, and a method for operating the estimation device.
Abstract: Tabular data is accessed that contains multiple entries of alphanumeric data. Multiple tokens are generated of the multiple entries of alphanumeric data using a tokenization process. The tokenization process maintains jargon-specific features of the alphanumeric data. Multiple embeddings of the multiple entries of alphanumeric data are generated using the tokens. The embeddings capture similarity of the multiple entries considering all of global features, column features, and row features in the tokens of the tabular data. A neural network is used to predict probabilities for pre-defined classes for the tabular data using the generated embeddings.
Type:
Grant
Filed:
September 24, 2021
Date of Patent:
March 18, 2025
Assignee:
International Business Machines Corporation
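A rough sketch of the pipeline in the abstract above, assuming a tokenizer that keeps jargon-specific substrings (e.g., part numbers) intact rather than splitting them, and a small network over pooled embeddings; every component here is a stand-in, not the patented implementation.

```python
import re
import torch
import torch.nn as nn

def tokenize(entry: str) -> list[str]:
    # Keep alphanumeric jargon tokens like "XJ-900" intact.
    return re.findall(r"[A-Za-z0-9\-]+", entry)

vocab = {"<unk>": 0, "XJ-900": 1, "pump": 2, "valve": 3}   # toy vocabulary
emb = nn.Embedding(len(vocab), 32)
classifier = nn.Linear(32, 4)          # 4 pre-defined classes

tokens = tokenize("XJ-900 pump")
ids = torch.tensor([vocab.get(t, 0) for t in tokens])
pooled = emb(ids).mean(dim=0)          # entry embedding (global pooling; the
                                       # patent also uses column/row features)
probs = classifier(pooled).softmax(dim=-1)   # class probabilities
```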
Abstract: In one embodiment, an eyeglass frame includes a lens holder, a first temple with a first end close to the lens holder and a second end, a second temple, an electrical connector, and a printed circuit board. The printed circuit board with at least one electrical component can be in the first temple. The connector can be close to the first end of the first temple, facing downward, and electrically connected to the at least one electrical component. In an embodiment, a pair of glasses can perform hearing enhancement functions to enhance audio signals for the user to hear. One embodiment is configured to be a headset. One embodiment has audio and/or textual output capabilities for a user to communicate in different ways depending on configuration, user preferences, prior history, etc. In one embodiment, the communication between users is achieved by short audio or textual messages.
Type:
Grant
Filed:
December 15, 2023
Date of Patent:
March 4, 2025
Assignee:
IngenioSpec, LLC
Inventors:
Thomas A. Howell, David Chao, C. Douglass Thomas, Peter P. Tong
Abstract: Systems, apparatuses, and methods are described for determining a direction associated with a detected spoken keyword, forming an acoustic beam in the determined direction, and listening for subsequent speech using the acoustic beam in the determined direction.
Abstract: A method for assisting a patient through a therapy session using live real-time voice interaction with narrative audio is disclosed. The method includes the steps of: extracting emotional content from a set of multimedia in a playlist; calculating an emotional score for each emotional content in the playlist using a deep machine learning model; generating an emotional arc for the emotional content based on the calculated emotional scores, previous responses from the patient to different genres, and personal information of the patient; generating a narrative text from the set of multimedia and generating a narrative text story based on the generated narrative text and the personal information of the patient using an artificial intelligence (AI) story model; converting the generated narrative text story to an audio file; and generating a narrative audio by combining the audio file with the emotional arc.
Type:
Grant
Filed:
November 28, 2022
Date of Patent:
March 4, 2025
Assignees:
SBX Technologies Corporation and Aikomi Co. Ltd
Inventors:
Sucheendra Kumar Palaniappan, Vikraman Karunanidhi, Samik Ghosh, Hiroaki Kitano, Nicholas William Hird
Abstract: The present disclosure relates to a data augmentation system and method that uses a large pre-trained encoder language model to generate new, useful intent samples from existing intent samples without fine-tuning. In certain embodiments, for a given class (intent), a limited number of sample utterances of a seed intent classification dataset may be concatenated and provided as input to the encoder language model, which may generate new sample utterances for the given class (intent). Additionally, when the augmented dataset is used to fine-tune an encoder language model of an intent classifier, this technique improves the performance of the intent classifier.
Abstract: In accordance with an embodiment, a system includes: an audio signal pre-processor; a pressure signal pre-processor; an audio signal feature processor; a pressure signal feature processor; a feature combining processor; and a classification processor configured for classifying the external impact on a window or access opening of an enclosed structure by classifying an audio feature and pressure feature vector in order to produce a classification output; wherein the classification processor is configured for executing a first machine learning algorithm, wherein the audio feature and pressure feature vector is fed to an input layer of the first machine learning algorithm, and wherein the classification output is based on an output of the first machine learning algorithm.
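A minimal sketch of the feature-combination step: an audio feature vector and a pressure feature vector are concatenated and fed to a small network standing in for the "first machine learning algorithm". The feature dimensions and class set are assumptions for illustration.

```python
import torch
import torch.nn as nn

audio_feat = torch.randn(1, 64)      # from the audio signal feature processor
pressure_feat = torch.randn(1, 16)   # from the pressure signal feature processor

# Combined audio-and-pressure feature vector fed to the input layer.
combined = torch.cat([audio_feat, pressure_feat], dim=-1)
classifier = nn.Sequential(nn.Linear(80, 32), nn.ReLU(), nn.Linear(32, 3))
logits = classifier(combined)        # e.g. {no event, knock, glass break}
```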
Abstract: Techniques for routing data, in a system including multiple assistants, are described. A user device may store configuration data for a virtual assistant, where the configuration data includes a virtual assistant identifier, one or more resource identifiers, and optionally a virtual assistant name. A resource identifier may correspond to a component or device(s) of the virtual assistant. When the user device receives event data associated with a virtual assistant identifier, the user device may use stored configuration data to determine a resource identifier(s) associated with the virtual assistant identifier, associated with the event data. The user device may thereafter send the event data to the component and/or device(s) corresponding to the determined resource identifier(s).
Abstract: A method uses natural language processing for visual analysis of a dataset. The method includes receiving a first natural language (NL) input directed to a data source, from a first client. The method also includes parsing the first NL input into tokens based on a grammar and the data source. The method also includes generating and outputting an intermediate NL response, to a second client, based on the tokens and output modality of the second client. In response to receiving a user input to provide missing information: the method also includes generating an input query based on the user input; and querying the data source using the input query, to obtain a result set. The method also includes generating and outputting, to the second client, a first NL output and a snapshot of a data visualization, based on the result set and the output modality of the second client.
Abstract: A dialogue generation method, a network training method and apparatus, a storage medium, and a device are provided. The method includes: predicting, based on a plurality of pieces of candidate knowledge text in a first candidate knowledge set, a preliminary dialogue response to a first dialogue preceding text; processing the first dialogue preceding text based on the preliminary dialogue response to obtain a first dialogue preceding text vector; obtaining a piece of target knowledge text based on a probability value of the piece of target knowledge text being selected for use in generating a final dialogue response, the probability value being obtained based on the first dialogue preceding text vector; and generating the final dialogue response based on the first dialogue preceding text and the piece of target knowledge text.
Type:
Grant
Filed:
September 7, 2022
Date of Patent:
January 28, 2025
Assignee:
TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors:
Xiuyi Chen, Fandong Meng, Peng Li, Jie Zhou
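One plausible reading of the knowledge-selection step above, sketched in Python: score each candidate knowledge text against the dialogue-context vector (here a dot product, an assumption), turn the scores into selection probabilities, and keep the most probable piece for response generation.

```python
import numpy as np

def select_knowledge(context_vec: np.ndarray, knowledge_vecs: np.ndarray) -> int:
    scores = knowledge_vecs @ context_vec        # one score per candidate
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                         # softmax over candidates
    return int(np.argmax(probs))                 # index of the target knowledge

ctx = np.random.default_rng(1).random(8)         # first dialogue preceding text vector
cands = np.random.default_rng(2).random((5, 8))  # candidate knowledge vectors
print(select_knowledge(ctx, cands))
```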
Abstract: A speech recognition model includes an encoder network, a prediction network, and a joint network. The encoder network is configured to receive a sequence of acoustic frames characterizing an input utterance; and generate, at each of a plurality of output steps, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The prediction network is configured to: receive a sequence of non-blank symbols output by a final Softmax layer; and generate, at each of the plurality of output steps, a dense representation. The joint network is configured to generate, at each of the plurality of output steps based on the higher order feature representation and the dense representation, a probability distribution over possible speech recognition hypotheses. The joint network includes a stack of gating and bilinear pooling to fuse the dense representation and the higher order feature representation.
Type:
Grant
Filed:
August 19, 2022
Date of Patent:
January 28, 2025
Assignee:
Google LLC
Inventors:
Chao Zhang, Bo Li, Zhiyun Lu, Tara N. Sainath, Shuo-yiin Chang
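A hedged sketch of a joint network that fuses the encoder's higher-order feature and the prediction network's dense representation with a gate followed by low-rank bilinear pooling (elementwise product of projections). The dimensions and the exact composition of the stack are assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn

class GatedBilinearJoint(nn.Module):
    def __init__(self, d_enc=512, d_pred=512, d_joint=640, vocab=1024):
        super().__init__()
        self.gate = nn.Linear(d_enc + d_pred, d_enc + d_pred)
        self.proj_enc = nn.Linear(d_enc, d_joint)
        self.proj_pred = nn.Linear(d_pred, d_joint)
        self.out = nn.Linear(d_joint, vocab)

    def forward(self, h_enc, h_pred):
        # Gate each feature stream based on both streams.
        g = torch.sigmoid(self.gate(torch.cat([h_enc, h_pred], dim=-1)))
        g_enc, g_pred = g.split([h_enc.size(-1), h_pred.size(-1)], dim=-1)
        # Low-rank bilinear pooling: elementwise product of projected features.
        fused = self.proj_enc(g_enc * h_enc) * self.proj_pred(g_pred * h_pred)
        # Probability distribution over possible speech recognition hypotheses.
        return self.out(fused).log_softmax(dim=-1)
```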
Abstract: Techniques are disclosed that enable clarifying whether a user query corresponds to a candidate intent when an intent score (indicating the probability the user query corresponds to the candidate intent) fails to satisfy a threshold likelihood value but is “close” to satisfying the threshold likelihood value. For example, the intent score can fail to satisfy the threshold likelihood value but can satisfy an additional threshold likelihood value. Various implementations include generating the candidate intent and corresponding intent score by processing a natural language user query using a natural language understanding (NLU) model.
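The two-threshold logic above is simple enough to state directly; here is a minimal sketch, with the threshold values as illustrative assumptions: act on the intent when the score clears the main threshold, ask a clarifying question when it only clears the lower "close" threshold, and fall back otherwise.

```python
def handle_intent(score: float, threshold: float = 0.8,
                  close_threshold: float = 0.6) -> str:
    if score >= threshold:
        return "execute_intent"         # confident match
    if score >= close_threshold:
        return "ask_clarification"      # "close": e.g. "Did you mean ...?"
    return "fallback"

assert handle_intent(0.9) == "execute_intent"
assert handle_intent(0.7) == "ask_clarification"
assert handle_intent(0.3) == "fallback"
```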
Abstract: An emergency event detection and response system detects an occurrence of an event associated with a user and initiates an emergency response flow. A user may be associated with a wearable device and have in his home a base station and portable or stationary wireless devices containing sensors capable of detecting an emergency event. The emergency event may be detected based on voice or non-voice audio input from the user, data monitoring by the wearable device, base station, and/or portable or stationary wireless device, or by physical button press. Responsive to determining that an emergency event has occurred, the system triggers an emergency response flow by notifying a call center and contacting one or more caregivers associated with the user. Caregivers may access a response system application to receive updates regarding the detected emergency and to contact the user and/or a provider associated with the call center.
Type:
Grant
Filed:
May 30, 2023
Date of Patent:
January 21, 2025
Assignee:
Aloe Care Health, Inc.
Inventors:
Lasse Hamre, Raymond Eugene Spoljaric, Evan Samuel Schwartz, Ryan Christopher Haigh, Alexander Neville Sassoon, Sveinung Kval Bakken
Abstract: A device which generates a speech moving image includes a first encoder, a second encoder, a combination unit, and an image reconstruction unit. The first encoder receives a person background image, in which a portion related to speech of a person that is a video part of the speech moving image is covered with a mask, extracts an image feature vector from the person background image, and compresses the extracted image feature vector. The second encoder receives a speech audio signal that is an audio part of the speech moving image, extracts a voice feature vector from the speech audio signal, and compresses the extracted voice feature vector. The combination unit generates a combination vector of the compressed image feature vector and the compressed voice feature vector. The image reconstruction unit reconstructs the speech moving image of the person with the combination vector as an input.
Abstract: Systems, methods, and non-transitory computer readable media including instructions for noise suppression are described. A head mountable system for noise suppression includes a wearable housing; a coherent light source configured to project light towards a facial region of the head; a detector configured to receive coherent light reflections from the facial region associated with facial skin micromovements and to output associated reflection signals; and a processor configured to: analyze the reflection signals to determine speech timing; receive audio signals from at least one microphone; correlate the reflection signals with the received audio signals to determine portions of the audio signals associated with the words spoken by the wearer; and output the determined portions of the audio signals associated with words spoken by the wearer, while omitting output of other portions of the audio signals not containing the words spoken by the wearer.
Type:
Grant
Filed:
November 7, 2023
Date of Patent:
January 21, 2025
Assignee:
Q (Cue) Ltd.
Inventors:
Aviad Maizels, Yonatan Wexler, Avi Barliya
Abstract: This disclosure enables various technologies that can (1) learn new synonyms for a given concept without manual curation techniques, (2) relate (e.g., map) some, many, most, or all raw named entity recognition outputs (e.g., “United States”, “United States of America”) to ontological concepts (e.g., ISO-3166 country code: “USA”), (3) account for false positives from a prior named entity recognition process, or (4) aggregate some, many, most, or all named entity recognition results from machine learning or rules based approaches to provide a best of breed hybrid approach (e.g., synergistic effect).
Type:
Grant
Filed:
February 5, 2021
Date of Patent:
January 7, 2025
Assignee:
Tellic LLC
Inventors:
Richard Edward Wendell, Eric Tanalski, Henry Edward Crosby, III, Loren Lee Chen, Paul Ton, Jake Rubenstein
Abstract: An electronic device includes a camera, a display, and a processor. The processor displays a user interface including menu items supporting entry into an edit mode for a body part of an emoji displayed on the display. The processor captures a user face image using the camera, upon a user requesting facial expression edit mode from the interface. The processor generates a facial expression motion file from the user face image. The processor captures a user body image using the camera upon the user requesting body motion edit mode. The processor generates a body motion file from the user body image. The processor adjusts sync for combining the generated facial expression motion file and body motion file. The processor generates a customized emoji sticker on which user facial expression and body movement are reflected by combining the sync-adjusted facial expression motion file and body motion file.
Type:
Grant
Filed:
November 29, 2022
Date of Patent:
December 31, 2024
Assignee:
SAMSUNG ELECTRONICS CO., LTD.
Inventors:
Junho An, Hyejin Kang, Jiyoon Park, Changsub Bae, Sangkyun Seo, Jaeyun Song, Minsheok Choi, Gyuhee Han
Abstract: Embodiments of the present disclosure provide systems and methods for extracting entities from semi-structured enterprise documents. The method performed by a server system includes receiving an enterprise document in a semi-structured format. The method includes extracting document features from the enterprise document. The document features include structural, token-specific, and entity-specific features. Further, the method includes identifying candidate entities in the enterprise document based at least on a machine learning model that uses the document features. The candidate entities include candidate tabular entities and candidate non-tabular entities. The method includes computing probability scores for one or more tokens corresponding to the candidate non-tabular entities and the candidate tabular entities, based at least on the machine learning model.
Type:
Grant
Filed:
February 22, 2022
Date of Patent:
December 31, 2024
Assignee:
TAO AUTOMATION SERVICES PRIVATE LIMITED
Inventors:
Hariharamoorthy Theriappan, Amit Rajan, Nagaraju Pappu, Jawahar Bekay
Abstract: The present disclosure provides speech recognition and codec methods and apparatuses, an electronic device, and a storage medium, and relates to the field of artificial intelligence, such as intelligent speech, deep learning, and natural language processing. The speech recognition method may include: acquiring an audio feature of to-be-recognized speech; encoding the audio feature to obtain an encoding feature; truncating the encoding feature to obtain N continuous feature segments, N being a positive integer greater than one; and acquiring, for any one of the feature segments, corresponding historical feature abstraction information, encoding the feature segment in combination with the historical feature abstraction information, and decoding the encoding result to obtain a recognition result corresponding to the feature segment, wherein the historical feature abstraction information is information obtained by feature abstraction of recognized historical feature segments.
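The segment-wise loop implied above can be sketched compactly; `encode`, `decode`, and `abstract` are placeholder callables standing in for the patent's components, under the assumption that the history abstraction is updated after each segment is recognized.

```python
def recognize(segments, encode, decode, abstract):
    history = None                       # historical feature abstraction info
    results = []
    for segment in segments:
        encoded = encode(segment, history)   # encode segment with history
        results.append(decode(encoded))      # recognition for this segment
        history = abstract(history, encoded) # fold the segment into history
    return results
```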
Abstract: Systems and methods for responding to a natural language query are disclosed herein. A query for an entity associated with a plurality of content types is received via a user interface of a computing device. A determination is made as to whether the query specifies any one or more of the plurality of content types. In response to determining that the query specifies one or more of the plurality of content types, a response to the query is generated for visible or audible presentation via the computing device, with the response comprising results from the one or more specified content types. In response to determining that the query lacks specification of any one or more of the plurality of content types, a response to the query is generated for visible or audible presentation via the computing device, with the response comprising results from each of the plurality of content types.
Abstract: In a method for tuning at least one parameter of a noise-cancellation-enabled audio system with an ear-mountable playback device comprising a speaker and a feedforward microphone, the playback device is placed onto a measurement fixture, the speaker facing a test microphone located within an ear canal representation. The parameter is varied between a plurality of settings while a test sound is played. A measurement signal from the test microphone is received and stored in the audio system at least while the parameter is varied. A power minimum in the stored measurement signal, and a tuned parameter setting associated with the power minimum, are determined in the audio system from the plurality of settings of the varied parameter.
Abstract: A vehicle defines an interior space and an exterior space. Within the vehicle are internal microphones that are disposed to capture an acoustic event that originated in an origination space, which is either the interior space or the exterior space. An infotainment system includes circuitry that forms a head unit having an acoustic-signal processor that is configured to receive, from the microphones, a sound vector indicative of the acoustic event and to identify the origination space based at least in part on the sound vector.
Type:
Grant
Filed:
March 22, 2022
Date of Patent:
November 26, 2024
Assignee:
Cerence Operating Company
Inventors:
Tobias Wolff, Markus Buck, Jonas Jungclaussen, Amaury Astier, Tim Haulick
Abstract: An object is to automatically determine a more memorable macro name. Provided is an information processing device that comprises an utterance learning adaptation unit that executes clustering pertaining to a plurality of function execution instructions by a user and estimates, as a macro, a cluster that includes the plurality of function execution instructions, and a response control unit that controls the presentation of information pertaining to the macro, wherein the utterance learning adaptation unit determines a name for the estimated macro on the basis of a context acquired at the time of issuing the plurality of function execution instructions included in the cluster, the response control unit controls a notification of the macro name to the user, and the plurality of function execution instructions include at least one function execution instruction issued via an utterance.
Abstract: A device according to an embodiment has one or more processors and a memory storing one or more programs executable by the one or more processors. The device includes a first encoder configured to receive a person background image corresponding to a video part of a speech video of a person and extract an image feature vector from the person background image, a second encoder configured to receive a speech audio signal corresponding to an audio part of the speech video and extract a voice feature vector from the speech audio signal, a combiner configured to generate a combined vector by combining the image feature vector output from the first encoder and the voice feature vector output from the second encoder, and a decoder configured to reconstruct the speech video of the person using the combined vector as an input.
Type:
Grant
Filed:
June 19, 2020
Date of Patent:
November 19, 2024
Assignee:
DEEPBRAIN AI INC.
Inventors:
Gyeongsu Chae, Guembuel Hwang, Sungwoo Park, Seyoung Jang
Abstract: Methods and apparatus provide acoustic detection for automated devices such as respiratory treatment apparatus. In some embodiments of the technology, acoustic analysis of noise or sound pulses, such as a cepstrum analysis, based on signals of a sound sensor permits detection of obstruction such as within a patient interface, mask or respiratory conduit or within patient respiratory system. Some embodiments further permit detection of accessories such as an identification thereof or a condition of use thereof, such as a leak. Still further embodiments of the technology permit the detection of a patient or user who is intended to use the automated device.
Type:
Grant
Filed:
August 12, 2020
Date of Patent:
November 19, 2024
Assignee:
ResMed Pty Ltd
Inventors:
Liam Holley, Dion Charles Chewe Martin, Steven Paul Farrugia
Abstract: Systems, methods, and computer program products are disclosed for removing noise from facial skin micromovement signals. Removing noise from facial skin micromovements includes, during a time period when an individual is involved in at least one non-speech-related physical activity, operating a light source in a manner enabling illumination of a facial skin region of the individual; receiving signals representing light reflections from the facial skin region; analyzing the received signals to identify a first reflection component indicative of prevocalization facial skin micromovements and a second reflection component associated with the at least one non-speech-related physical activity; and filtering out the second reflection component to enable interpretation of words from the first reflection component indicative of the prevocalization facial skin micromovements.
Type:
Grant
Filed:
November 9, 2023
Date of Patent:
November 12, 2024
Assignee:
Q (Cue) Ltd.
Inventors:
Aviad Maizels, Yonatan Wexler, Avi Barliya
Abstract: Methods and systems provide for presenting time distributions of participants across topic segments in a communication session. In one embodiment, the system connects to a communication session with a number of participants; receives a transcript of a conversation between the participants produced during the communication session, the transcript including timestamps for each utterance of a speaking participant; determines, based on analysis of the transcript, a meeting type for the communication session; generates a number of topic segments for the conversation and respective timestamps for the topic segments; for each participant, analyzes the time spent by the participant on each of the generated topic segments in the meeting; and presents, to one or more users, data on the time distribution of participants for each topic segment and across topic segments within the conversation.
Type:
Grant
Filed:
January 31, 2022
Date of Patent:
November 12, 2024
Assignee:
Zoom Video Communications, Inc.
Inventors:
Vijay Parthasarathy, David Dharmendran Rajkumar, Tao Tang, Min Xiao-Devins
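The per-participant time distribution above is essentially interval arithmetic over the transcript; here is a minimal sketch, assuming utterances as (speaker, start, end) tuples and topic segments as (topic, start, end) tuples, with times in seconds (all values illustrative).

```python
from collections import defaultdict

utterances = [("alice", 0, 40), ("bob", 40, 70), ("alice", 70, 90)]
segments = [("pricing", 0, 60), ("roadmap", 60, 90)]

time_spent = defaultdict(float)           # (participant, topic) -> seconds
for speaker, u0, u1 in utterances:
    for topic, s0, s1 in segments:
        # Overlap between the utterance interval and the topic segment.
        overlap = max(0, min(u1, s1) - max(u0, s0))
        time_spent[(speaker, topic)] += overlap

print(dict(time_spent))
# e.g. alice: 40s on pricing, 20s on roadmap; bob: 20s pricing, 10s roadmap
```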
Abstract: An electronic apparatus is disclosed. The electronic apparatus may include a microphone; a communication interface; a memory configured to store at least one instruction; and a processor configured to execute the at least one instruction to: obtain a user voice input for registering a wake-up voice input via the microphone; input the user voice input into a trained neural network model to obtain a first feature vector corresponding to text included in the user voice input; receive a verification data set determined based on information related to the text included in the user voice input from an external server via the communication interface; input a verification voice input included in the verification data set into the trained neural network model to obtain a second feature vector corresponding to the verification voice input; and identify whether to register the user voice input as the wake-up voice input based on a similarity between the first feature vector and the second feature vector.
Type:
Grant
Filed:
September 14, 2022
Date of Patent:
October 29, 2024
Assignee:
SAMSUNG ELECTRONICS CO., LTD.
Inventors:
Jaeyoung Roh, Hejung Yang, Hojun Jin, Donghan Jang
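A hedged sketch of the registration check above, assuming both the user's utterance and each verification utterance are embedded by the same model, and that registration is refused when the candidate wake word is too similar to the verification set (e.g., common or confusable phrases). The similarity measure and threshold are assumptions.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def can_register(first_vec: np.ndarray, verification_vecs: list[np.ndarray],
                 max_similarity: float = 0.85) -> bool:
    # Register only if the candidate is dissimilar from every verification input.
    return all(cosine(first_vec, v) < max_similarity for v in verification_vecs)
```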
Abstract: Systems and methods for correcting recognition errors in speech recognition systems are disclosed herein. Natural conversational variations are identified to determine whether a query intends to correct a speech recognition error or whether the query is a new command. When the query intends to correct a speech recognition error, the system identifies a location of the error and performs the correction. The corrected query can be presented to the user or be acted upon as a command for the system.
Type:
Grant
Filed:
June 20, 2023
Date of Patent:
October 22, 2024
Assignee:
Rovi Guides, Inc.
Inventors:
Ankur Anil Aher, Jeffry Copps Robert Jose
Abstract: According to various embodiments, an electronic device may include: a microphone; an audio connector; a wireless communication circuit; a processor operatively connected to the microphone, the audio connector, and the wireless communication circuit; and a memory operatively connected to the processor, wherein the memory may store instructions that, when executed, cause the processor to: receive a first audio signal through the microphone, the audio connector, or the wireless communication circuit, extract audio feature information from the first audio signal, and recognize a speech section in a second audio signal, received after the first audio signal through the microphone, the audio connector, or the wireless communication circuit, using the audio feature information.
Abstract: Systems and methods are provided for selecting a speech model for automatic speech recognition during the runtime of a transcription system. The system includes an event detector to determine one of a number of flight events, including flight plan changes and phase transitions, based on data received from a set of inputs; an intelligent keyword generator to collate a set of keywords associated with the flight plan information and to generate a wordlist in response to a determination by the event detector of flight plan changes or flight phase transitions; and a processor to determine whether the wordlist is covered by the current speech model implemented in the automatic speech recognition, wherein, if the wordlist is not covered by the current speech model, the processor selects a pre-built speech model that covers the wordlist for use as the current speech model in the automatic speech recognition.
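The coverage test above amounts to a vocabulary subset check; here is a minimal sketch, assuming each speech model exposes the vocabulary it was built on (model names and vocabularies are placeholders).

```python
current_model = {"name": "enroute", "vocab": {"climb", "descend", "heading"}}
prebuilt = [
    {"name": "approach", "vocab": {"climb", "descend", "heading", "runway", "ils"}},
]

def select_model(wordlist: set[str]):
    if wordlist <= current_model["vocab"]:
        return current_model                   # keep the current speech model
    for model in prebuilt:
        if wordlist <= model["vocab"]:
            return model                       # swap in a covering pre-built model
    return current_model                       # no covering model found

print(select_model({"runway", "ils"})["name"])  # -> "approach"
```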
Abstract: Example techniques relate to local voice control in a media playback system. A satellite device (e.g., a playback device or microcontroller unit) may be configured to recognize a local set of keywords in voice inputs including context specific keywords (e.g., for controlling an associated smart device) as well as keywords corresponding to a subset of media playback commands for controlling playback devices in the media playback system. The satellite device may fall back to a hub device (e.g., a playback device) configured to recognize a more extensive set of keywords. In some examples, either device may fall back to the cloud for processing of other voice inputs.
Type:
Grant
Filed:
January 13, 2023
Date of Patent:
October 15, 2024
Assignee:
Sonos, Inc.
Inventors:
Sebastien Maury, Joseph Dureau, Thibaut Lorrain, Do Kyun Kim
Abstract: Systems and methods for gathering research data using multiple monitoring devices are provided. An example apparatus comprises interface circuitry to obtain, from a first computing device, a first indication of media output by the first computing device, the first indication including first metadata associated with the media; instructions; and processor circuitry to execute the instructions to: apply criteria to at least a portion of the first metadata; and cause storage of the first metadata associated with the media in a database.
Type:
Grant
Filed:
March 23, 2023
Date of Patent:
October 8, 2024
Assignee:
The Nielsen Company (US), LLC
Inventors:
Joan G. FitzGerald, Carol J. Frost, Eugene L. Flanagan
Abstract: Providing a response to a user's speech or utterance includes obtaining context information of the electronic device or a user of the electronic device, determining, based on the context information, whether the electronic device or an external device is to perform automatic speech recognition (ASR) of the user's speech or utterance, and providing a response to the user's speech or utterance based on a result of the electronic device or the external device performing the ASR.
Abstract: A method includes obtaining a speech proficiency value indicator indicative of a speech proficiency value associated with a user of the electronic device. The method further includes in response to determining that the speech proficiency value satisfies a threshold proficiency value: displaying training text via the display device; obtaining, from the audio sensor, speech data associated with the training text, wherein the speech data is characterized by the speech proficiency value; determining, using a speech classifier, one or more speech characterization vectors for the speech data based on linguistic features within the speech data; and adjusting one or more operational values of the speech classifier based on the one or more speech characterization vectors and the speech proficiency value.
Type:
Grant
Filed:
December 8, 2023
Date of Patent:
September 24, 2024
Assignee:
APPLE INC.
Inventors:
Barry-John Theobald, Russell Y. Webb, Nicholas Elia Apostoloff
Abstract: A communication management apparatus creates pieces of character string button information for incorporating part or all of a message content in text format into a character string constituting message information.
Type:
Grant
Filed:
May 27, 2020
Date of Patent:
September 24, 2024
Assignees:
KABUSHIKI KAISHA TOSHIBA, TOSHIBA DIGITAL SOLUTIONS CORPORATION
Abstract: A method for controlling a wearable device includes: acquiring voice information collected by an acoustoelectric element and vibration information collected by a vibration sensor, in which the acoustoelectric element and the vibration sensor are included in the wearable device; determining a voice command based on the voice information; determining identity information of the voice command based on the voice information and the vibration information; and executing or ignoring the voice command based on the identity information.
Type:
Grant
Filed:
November 17, 2021
Date of Patent:
September 24, 2024
Assignee:
GUANGDONG OPPO MOBILE TELECOMMUNICATIONS CORP., LTD.
Abstract: A speech processing device includes: first segment means for dividing first speech into a plurality of first speech segments; second segment means for dividing second speech into a plurality of second speech segments; primary speaker recognition means for calculating scores indicating similarities between the plurality of first and second speech segments; threshold value calculation means for calculating a threshold value based on scores indicating similarities between the plurality of first speech segments; speaker clustering means for classifying each of the plurality of second speech segments into one or more clusters having a similarity higher than the similarity indicated by the threshold value; and secondary speaker recognition means for calculating a similarity between each of the one or more clusters and the first speech and determining based on a result of the calculation whether speech corresponding to the first speech is contained in any of the one or more clusters.
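A hedged sketch of the threshold derivation and clustering above, assuming the score is cosine similarity between speaker-embedding vectors: the threshold comes from intra-speaker scores among the first speech's segments, and the second speech's segments are greedily grouped into clusters whose internal similarity exceeds that threshold. Embeddings and the greedy grouping are illustrative stand-ins.

```python
import numpy as np
from itertools import combinations

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Threshold from similarities among the first speech's own segments.
first = [np.random.default_rng(i).random(16) for i in range(4)]
threshold = np.mean([cosine(a, b) for a, b in combinations(first, 2)])

# Greedily cluster the second speech's segments at or above that similarity.
second = [np.random.default_rng(10 + i).random(16) for i in range(6)]
clusters: list[list[np.ndarray]] = []
for seg in second:
    for cluster in clusters:
        if np.mean([cosine(seg, c) for c in cluster]) > threshold:
            cluster.append(seg)
            break
    else:
        clusters.append([seg])   # start a new cluster

# Secondary recognition would then score each cluster against the first speech.
```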
Abstract: The invention relates to a computer-implemented method for virtually assisting a user with a digital assistant, comprising at least one iteration of an assisting step that comprises receiving a user input, predicting at least one potential action corresponding to the user input, and confirming one of the at least one potential action as a correct action corresponding to the user input. The method includes a prefetching step, the prefetching step including triggering execution of the at least one potential action before the confirming, such that, if the at least one potential action corresponds to the correct action, the response time to the user input is shortened.