Segmentation Or Word Limit Detection (EPO) Patents (Class 704/E15.005)
-
Patent number: 11823669
Abstract: According to one embodiment, an information processing apparatus includes the following units. The first acquisition unit acquires speech data including frames. The second acquisition unit acquires a model trained to, upon input of a feature amount extracted from the speech data, output information indicative of the likelihood of each of a plurality of classes including a component of a keyword and a component of background noise. The first calculation unit calculates a keyword score indicative of the occurrence probability of the component of the keyword. The second calculation unit calculates a background noise score indicative of the occurrence probability of the component of the background noise. The determination unit determines whether or not the speech data includes the keyword.
Type: Grant
Filed: February 28, 2020
Date of Patent: November 21, 2023
Assignee: KABUSHIKI KAISHA TOSHIBA
Inventors: Ning Ding, Hiroshi Fujimura
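A minimal Python sketch of the decision rule this abstract describes, assuming per-frame class posteriors are already available from the trained model; the class indices, the pooling (frame-wise max then mean), and the margin are illustrative assumptions, not the patented method:

```python
import numpy as np

# Assumed class layout: a few keyword-component classes and a few
# background-noise classes among the model's outputs.
KEYWORD_CLASSES = [0, 1, 2]   # e.g. sub-word units of the keyword (assumed)
NOISE_CLASSES = [3, 4]        # e.g. background-noise classes (assumed)

def detect_keyword(posteriors: np.ndarray, margin: float = 0.1) -> bool:
    """posteriors: (num_frames, num_classes) per-frame class likelihoods."""
    # Keyword score: occurrence probability of keyword components.
    keyword_score = posteriors[:, KEYWORD_CLASSES].max(axis=1).mean()
    # Background noise score: occurrence probability of noise components.
    noise_score = posteriors[:, NOISE_CLASSES].max(axis=1).mean()
    # Declare a keyword only when it clearly dominates the noise score.
    return keyword_score > noise_score + margin

frames = np.random.dirichlet(np.ones(5), size=40)  # toy posteriors
print(detect_keyword(frames))
```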
-
Patent number: 11818291
Abstract: Real-time speech analytics (RTSA) involves maintaining real-time speech conditions, rules, and triggers, along with real-time actions and alerts to take. A call between a user and an agent is received at an agent computing device. The call is monitored to detect one of the real-time speech conditions, rules, and triggers. Based on the detection, at least one real-time action and/or alert is initiated.
Type: Grant
Filed: September 1, 2021
Date of Patent: November 14, 2023
Assignee: Verint Americas Inc.
Inventors: David Warren Singer, Daniel Thomas Spohrer, Marc Adam Calahan, Paul Michael Munro, Gary Andrew Duke, Padraig Carberry, Christopher Jerome Schnurr
-
Patent number: 11817095
Abstract: A method, computer program product, and computing system for monitoring a plurality of conversations within a monitored space to generate a conversation data set; processing the conversation data set using machine learning to: define a system-directed command for an ACI system, and associate one or more conversational contexts with the system-directed command; detecting the occurrence of a specific conversational context within the monitored space, wherein the specific conversational context is included in the one or more conversational contexts associated with the system-directed command; and executing, in whole or in part, functionality associated with the system-directed command in response to detecting the occurrence of the specific conversational context without requiring the utterance of the system-directed command and/or a wake-up word/phrase.
Type: Grant
Filed: February 3, 2022
Date of Patent: November 14, 2023
Assignee: Nuance Communications, Inc.
Inventors: Paul Joseph Vozila, Neal Snider
-
Patent number: 11805185
Abstract: A server system is provided that includes one or more processors configured to execute a platform for an online multi-user chat service that communicates with a plurality of client devices of users of the online multi-user chat service and exchanges user chat data between the plurality of client devices. The one or more processors are configured to execute a user chat filtering program that performs filter actions for user chat data exchanged on the platform for the online multi-user chat service. The user chat filtering program includes a plurality of trained machine learning models and a filter decision service that determines a filter action to be performed for target portions of user chat data based on output of the plurality of trained machine learning models for those target portions of user chat data.
Type: Grant
Filed: March 3, 2021
Date of Patent: October 31, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventor: Monica Tongya
-
Patent number: 11783825
Abstract: Embodiments of the present invention provide a speech recognition method and a terminal. The method includes: listening, by a speech wakeup apparatus, to speech information in a surrounding environment; when determining that the speech information obtained by listening matches a speech wakeup model, buffering, by the speech wakeup apparatus, speech information, of first preset duration, obtained by listening, and sending a trigger signal for triggering enabling of a speech recognition apparatus, where the trigger signal is used to instruct the speech recognition apparatus to read and recognize the speech information buffered by the speech wakeup apparatus; and recognizing first speech information buffered by the speech wakeup apparatus and second speech information obtained by listening, to obtain a recognition result.
Type: Grant
Filed: February 17, 2021
Date of Patent: October 10, 2023
Assignee: Honor Device Co., Ltd.
Inventor: Junyang Zhou
-
Patent number: 11776544
Abstract: An embodiment of the present invention provides an artificial intelligence (AI) apparatus for recognizing a speech of a user. The artificial intelligence apparatus includes a memory to store a speech recognition model and a processor to obtain a speech signal for a user speech, to convert the speech signal into a text using the speech recognition model, to measure a confidence level for the conversion, to perform a control operation corresponding to the converted text if the measured confidence level is greater than or equal to a reference value, and to provide feedback for the conversion if the measured confidence level is less than the reference value.
Type: Grant
Filed: May 18, 2022
Date of Patent: October 3, 2023
Assignee: LG ELECTRONICS INC.
Inventors: Jaehong Kim, Hyoeun Kim, Hangil Jeong, Heeyeon Choi
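A minimal sketch of the confidence-gated control flow this abstract describes; the recognizer callable, the stub actions, and the 0.8 reference value are hypothetical stand-ins:

```python
def execute_control_operation(text: str) -> str:
    return f"executing: {text}"

def request_feedback(text: str, confidence: float) -> str:
    return f"low confidence ({confidence:.2f}) for '{text}'; please confirm"

def handle_utterance(transcribe, speech_signal, reference_value: float = 0.8) -> str:
    """Act on the converted text when confidence clears the reference
    value; otherwise fall back to asking the user for feedback."""
    text, confidence = transcribe(speech_signal)
    if confidence >= reference_value:
        return execute_control_operation(text)
    return request_feedback(text, confidence)

# Toy recognizer standing in for the stored speech recognition model.
print(handle_utterance(lambda s: ("turn on the light", 0.91), b"..."))
```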
-
Patent number: 11776210
Abstract: An electronic device and method for 3D face modeling based on neural networks is provided. The electronic device receives a two-dimensional (2D) color image of a human face with a first face expression and obtains a first three-dimensional (3D) mesh of the human face with the first face expression based on the received 2D color image. The electronic device generates first texture information and a first set of displacement maps. The electronic device feeds the generated first texture information and the first set of displacement maps as an input to the neural network and receives an output of the neural network for the fed input. Thereafter, the electronic device generates a second 3D mesh of the human face with a second face expression which is different from the first face expression based on the received output.
Type: Grant
Filed: January 22, 2021
Date of Patent: October 3, 2023
Assignee: SONY GROUP CORPORATION
Inventors: Hiroyuki Takeda, Mohammad Gharavi Alkhansari
-
Patent number: 11763813
Abstract: Implementations described herein relate to reducing latency in automated assistant interactions. In some implementations, a client device can receive audio data that captures a spoken utterance of a user. The audio data can be processed to determine an assistant command to be performed by an automated assistant. The assistant command can be processed, using a latency prediction model, to generate a predicted latency to fulfill the assistant command. Further, the client device (or the automated assistant) can determine, based on the predicted latency, whether to audibly render pre-cached content for presentation to the user prior to audibly rendering content that is responsive to the spoken utterance. The pre-cached content can be tailored to the assistant command and audibly rendered for presentation to the user while the content is being obtained, and the content can be audibly rendered for presentation to the user subsequent to the pre-cached content.
Type: Grant
Filed: April 28, 2021
Date of Patent: September 19, 2023
Assignee: GOOGLE LLC
Inventors: Lior Alon, Rafael Goldfarb, Dekel Auster, Dan Rasin, Michael Andrew Goodman, Trevor Strohman, Nino Tasca, Valerie Nygaard, Jaclyn Konzelmann
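A minimal sketch of the predicted-latency decision described above, assuming a latency prediction model has already produced a number of seconds; the pre-cached phrases, threshold, and `speak`/`fetch_content` helpers are hypothetical:

```python
PRECACHED = {"set_timer": "Sure, setting that up now.",
             "play_music": "Okay, one moment while I queue that."}

def speak(text: str) -> None:
    print(f"[TTS] {text}")

def respond(command: str, predicted_latency_s: float, fetch_content,
            latency_threshold_s: float = 1.0) -> None:
    """If fulfillment is predicted to be slow, render tailored pre-cached
    filler first, then the responsive content once it is obtained."""
    if predicted_latency_s > latency_threshold_s and command in PRECACHED:
        speak(PRECACHED[command])      # rendered while content is fetched
    speak(fetch_content(command))      # responsive content, rendered after

respond("set_timer", predicted_latency_s=2.3,
        fetch_content=lambda c: "Timer set for ten minutes.")
```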
-
Patent number: 11741141
Abstract: The present disclosure describes a communication environment having a service provider server that receives an audio command from a display control device within the communication environment. The service provider server can translate this audio command into an electrical command for controlling the display device. The service provider server autonomously performs a specifically tailored search of a catalog of command words and/or phrases for the audio command to translate the audio command to the electrical command. This specifically tailored search can include one or more searching routines having various degrees of complexity. The most simplistic searching routine from among these searching routines represents a textual search to identify one or more command words and/or phrases from the catalog of command words and/or phrases that match the audio command.
Type: Grant
Filed: December 4, 2020
Date of Patent: August 29, 2023
Assignee: CSC Holdings, LLC
Inventors: Jaison P. Antony, Heitor J. Almeida, John Markowski, Peter Caramanica
-
Patent number: 11727928
Abstract: Aspects of the disclosure provide a responding method and device, an electronic device and a storage medium. The method is applied to a first electronic device including an audio acquisition component and an audio output component. The method can include acquiring a voice signal through the audio acquisition component, determining whether to respond to the voice signal, and responsive to determining to respond to the voice signal, outputting a first sound signal by the audio output component, the first sound signal being configured to notify at least one second electronic device that the first electronic device responds to the voice signal. In such a manner, an electronic device, responsive to determining to respond to a voice signal, outputs a sound signal to prevent other electronic device(s) from responding to the voice signal, so that competitions between electronic devices are reduced and a user experience is improved.
Type: Grant
Filed: July 29, 2020
Date of Patent: August 15, 2023
Assignee: Beijing Xiaomi Pinecone Electronics Co., Ltd.
Inventors: Lingsong Zhou, Fei Xiang
-
Patent number: 11709655
Abstract: Disclosed is an electronic device. The electronic device comprises: a microphone comprising circuitry; a speaker comprising circuitry; and a processor electrically connected to the microphone and speaker, wherein the processor, when a first user's voice is input through the microphone, identifies a user who uttered the first user's voice and provides a first response sound, which is obtained by inputting the first user's voice to an artificial intelligence model learned through an artificial intelligence algorithm, through the speaker, and when a second user's voice is input through the microphone, identifies a user who uttered the second user's voice, and if the user who uttered the first user's voice is the same as the user who uttered the second user's voice, provides a second response sound, which is obtained by inputting the second user's voice and utterance history information to the artificial intelligence model, through the speaker.
Type: Grant
Filed: August 3, 2022
Date of Patent: July 25, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyungtak Choi, Hyeonmok Ko, Jihie Kim, Hongchul Kim, Inchul Hwang
-
Patent number: 11705125
Abstract: A processor may receive data regarding a context for a first dialog turn. The processor may monitor a voice input from a user for the first dialog turn. The processor may detect a first pause in the voice input, the first pause having a duration that satisfies a time threshold. The processor may receive, based on the first pause, first voice input data. The processor may analyze the first voice input data. The processor may determine that additional time is recommended for the voice input to be provided by the user.
Type: Grant
Filed: March 26, 2021
Date of Patent: July 18, 2023
Assignee: International Business Machines Corporation
Inventors: Andrew R. Freed, Corville O. Allen, Shikhar Kwatra, Joseph Kozhaya
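A minimal sketch of detecting a pause whose duration satisfies a time threshold, assuming per-frame voice-activity flags are available; the frame length and threshold are illustrative:

```python
def detect_qualifying_pause(voice_activity, time_threshold_s=1.5, frame_s=0.1):
    """Scan per-frame voice-activity flags and report the index where the
    first pause satisfying the time threshold began (frame_s s per frame)."""
    needed = round(time_threshold_s / frame_s)
    run = 0
    for i, active in enumerate(voice_activity):
        run = 0 if active else run + 1
        if run >= needed:
            return i - needed + 1   # frame where the qualifying pause began
    return None

# 1 = speech frame, 0 = silence frame; a 1.5 s pause spans 15 frames here.
flags = [1] * 20 + [0] * 16 + [1] * 5
print(detect_qualifying_pause(flags))  # 20
```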
-
Patent number: 11698261
Abstract: Embodiments of the present disclosure provide a method, apparatus, computer device, and storage medium for determining a POI alias. The method may include: acquiring a to-be-processed target POI, and generating a candidate alias list corresponding to the target POI based on a query behavior-associated log matching the target POI; and screening out at least one target alias corresponding to the target POI in the candidate alias list, according to an association relationship between each candidate alias in the candidate alias list and the target POI.
Type: Grant
Filed: December 11, 2019
Date of Patent: July 11, 2023
Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
Inventors: Yanyan Li, Jianguo Duan, Hui Xiong
-
Patent number: 11699443
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.
Type: Grant
Filed: June 2, 2021
Date of Patent: July 11, 2023
Assignee: GOOGLE LLC
Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar
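A minimal sketch of the two-threshold cascade the abstract describes; the scoring callables and threshold values are assumptions standing in for the on-device and server-side detectors:

```python
def hotword_cascade(audio, local_score, server_score,
                    first_threshold=0.5, second_threshold=0.9):
    """Two-stage gating: a permissive on-device check, then a stricter
    server-side check before the audio is treated as the key phrase."""
    if local_score(audio) < first_threshold:
        return False                      # cheap local rejection
    # Only now is the audio signal sent on for the restrictive check.
    return server_score(audio) >= second_threshold

print(hotword_cascade(b"...", lambda a: 0.62, lambda a: 0.94))  # True
```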
-
Patent number: 11694694
Abstract: A method is provided for identifying synthetic "deep-fake" audio samples versus organic audio samples. Methods may include: generating a model of a vocal tract using one or more organic audio samples from a user; identifying a set of bigram-feature pairs from the one or more audio samples; estimating the cross-sectional area of the vocal tract of the user when speaking the set of bigram-feature pairs; receiving a candidate audio sample; identifying bigram-feature pairs of the candidate audio sample that are in the set of bigram-feature pairs; calculating a cross-sectional area of a theoretical vocal tract of a user when speaking the identified bigram-feature pairs; and identifying the candidate audio sample as a deep-fake audio sample in response to the calculated cross-sectional area of the theoretical vocal tract of a user failing to correspond within a predetermined measure of the estimated cross-sectional area of the vocal tract of the user.
Type: Grant
Filed: July 27, 2021
Date of Patent: July 4, 2023
Assignee: UNIVERSITY OF FLORIDA RESEARCH FOUNDATION, INCORPORATED
Inventors: Patrick G. Traynor, Kevin Butler, Logan E. Blue, Luis Vargas, Kevin S. Warren, Hadi Abdullah, Cassidy Gibson, Jessica Nicole Odell
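A simplified sketch of the final comparison step only, assuming the cross-sectional areas per bigram-feature pair have already been estimated (for the user) and calculated (for the candidate); the relative-deviation measure and tolerance are assumptions, not the patent's metric:

```python
def is_deepfake(estimated_areas, calculated_areas, tolerance=0.15):
    """Flag a candidate sample when its theoretical vocal-tract
    cross-sections deviate too far from those learned for the user.
    Areas are keyed by bigram-feature pair (units assumed)."""
    shared = estimated_areas.keys() & calculated_areas.keys()
    if not shared:
        return True  # nothing to corroborate the speaker model
    deviations = [abs(calculated_areas[k] - estimated_areas[k]) / estimated_areas[k]
                  for k in shared]
    return max(deviations) > tolerance

organic = {("th", "f1"): 2.1, ("ae", "f2"): 3.4}
candidate = {("th", "f1"): 4.0, ("ae", "f2"): 3.3}
print(is_deepfake(organic, candidate))  # True: ("th", "f1") deviates ~90%
```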
-
Patent number: 11689472
Abstract: The exemplary embodiments disclose a method, a computer program product, and a computer system for allocating computing resources. The exemplary embodiments may include collecting data of one or more users, wherein the collected data comprises calendar data of the one or more users, extracting one or more features from the collected data, and allocating one or more computing resources to one or more of the users based on the extracted one or more features and one or more models.
Type: Grant
Filed: July 9, 2021
Date of Patent: June 27, 2023
Assignee: International Business Machines Corporation
Inventors: Venkata Vara Prasad Karri, Saraswathi Sailaja Perumalla, Sarbajit K. Rakshit, Ryan Jackson, Ram Kumar Vadlamani
-
Patent number: 11670290
Abstract: A speech signal processing method and apparatus is disclosed. The speech signal processing method includes receiving an input token that is based on a speech signal, calculating first probability values respectively corresponding to candidate output tokens based on the input token, adjusting at least one of the first probability values based on a priority of each of the first probability values, and processing the speech signal based on an adjusted probability value obtained by the adjusting.
Type: Grant
Filed: November 30, 2020
Date of Patent: June 6, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventor: Tae Gyoon Kang
-
Patent number: 11636846
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
Type: Grant
Filed: April 30, 2021
Date of Patent: April 25, 2023
Assignee: Google LLC
Inventors: Michael Buchanan, Pravir Kumar Gupta, Christopher Bo Tandiono
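A minimal sketch of the first-value/second-value comparison the abstract lays out, using simple whitespace tokenization and a majority rule as assumed stand-ins for the patent's matching and classification details:

```python
def classify_incomplete(transcription: str, text_samples: list[str]) -> bool:
    """Count exact matches (first value) vs. samples that continue past
    the transcription (second value); classify as likely incomplete when
    continuations dominate."""
    words = transcription.lower().split()
    first_value = second_value = 0
    for sample in text_samples:
        sample_words = sample.lower().split()
        if sample_words[:len(words)] == words:
            if len(sample_words) == len(words):
                first_value += 1    # matches with no additional terms
            else:
                second_value += 1   # matches but has additional terms
    return second_value > first_value

samples = ["what is", "what is the weather", "what is the time", "what is it"]
print(classify_incomplete("what is", samples))  # True: 3 continuations vs 1
```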
-
Patent number: 11600279
Abstract: A method to transcribe communications may include obtaining audio data originating at a first device during a communication session between the first device and a second device and providing the audio data to an automated speech recognition system configured to transcribe the audio data. The method may further include obtaining multiple hypothesis transcriptions generated by the automated speech recognition system. Each of the multiple hypothesis transcriptions may include one or more words determined by the automated speech recognition system to be a transcription of a portion of the audio data. The method may further include determining one or more consistent words that are included in two or more of the multiple hypothesis transcriptions and in response to determining the one or more consistent words, providing the one or more consistent words to the second device for presentation of the one or more consistent words by the second device.
Type: Grant
Filed: August 26, 2019
Date of Patent: March 7, 2023
Assignee: Sorenson IP Holdings, LLC
Inventors: Brian Chevrier, Shane Roylance, Kenneth Boehme
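A minimal sketch of extracting words consistent across hypothesis transcriptions; bag-of-words membership is an assumed simplification of however the patent aligns hypotheses:

```python
from collections import Counter

def consistent_words(hypotheses: list[str], min_hyps: int = 2) -> list[str]:
    """Return words that appear in at least min_hyps of the hypothesis
    transcriptions produced for the same stretch of audio."""
    counts = Counter()
    for hyp in hypotheses:
        counts.update(set(hyp.lower().split()))  # count each word once per hyp
    return [w for w, c in counts.items() if c >= min_hyps]

hyps = ["please call me back", "please call me beck", "lease call me back"]
print(sorted(consistent_words(hyps)))  # ['back', 'call', 'me', 'please']
```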
-
Patent number: 11594215
Abstract: Techniques for providing a contextual voice user interface that enables a user to query a speech processing system with respect to the decisions made to answer the user's command are described. The speech processing system may store speech processing pipeline data used to process a command. At some point after the system outputs content deemed responsive to the command, a user may speak an utterance corresponding to an inquiry with respect to the processing performed to respond to the command. For example, the user may state "why did you tell me that?" In response thereto, the speech processing system may determine the stored speech processing pipeline data used to respond to the command, and may generate output audio data that describes the data and computing decisions involved in determining the content deemed responsive to the command.
Type: Grant
Filed: October 11, 2019
Date of Patent: February 28, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Michael James Moniz, Abishek Ravi, Ryan Scott Aldrich, Michael Bennett Adams
-
Patent number: 11580967
Abstract: A speech feature extraction apparatus 100 includes a voice activity detection unit 103 that drops non-voice frames from frames corresponding to an input speech utterance and calculates a posterior of being voiced for each frame, a voice activity detection process unit 106 that calculates function values, used as weights in pooling frames to produce an utterance-level feature, from a given voice activity detection posterior, and an utterance-level feature extraction unit 112 that extracts an utterance-level feature from the frames on the basis of multiple frame-level features, using the function values.
Type: Grant
Filed: June 29, 2018
Date of Patent: February 14, 2023
Assignee: NEC CORPORATION
Inventors: Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka
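A minimal sketch of pooling frame-level features with VAD-posterior-derived weights; using the normalized posterior itself as the weighting function is an assumption, since the abstract leaves the exact function open:

```python
import numpy as np

def utterance_embedding(frame_features: np.ndarray,
                        vad_posteriors: np.ndarray) -> np.ndarray:
    """Pool frame-level features into one utterance-level vector, weighting
    each frame by a function of its voiced posterior (here the posterior
    itself, normalized to sum to one)."""
    weights = vad_posteriors / (vad_posteriors.sum() + 1e-8)
    return weights @ frame_features   # (T,) @ (T, D) -> (D,)

feats = np.random.randn(100, 40)             # 100 frames, 40-dim features
vad = np.random.uniform(0, 1, size=100)      # posterior of being voiced
print(utterance_embedding(feats, vad).shape)  # (40,)
```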
-
Patent number: 11461948
Abstract: A system and method for voice-driven animation of an object in an image by sampling an input video, depicting a puppet object, to obtain an image, receiving audio data, extracting voice related features from the audio data, producing an expression representation based on the voice related features, wherein the expression representation is related to a region of interest, obtaining from the image, auxiliary data related to the image and generating a target image based on the expression representation and the auxiliary data.
Type: Grant
Filed: July 15, 2021
Date of Patent: October 4, 2022
Assignee: DE-IDENTIFICATION LTD.
Inventors: Eliran Kuta, Sella Blondheim, Gil Perry, Amitay Nachmani, Matan Ben-Yosef, Or Gorodissky
-
Patent number: 11449720
Abstract: Provided is an image recognition device. The image recognition device includes a frame data change detector that sequentially receives a plurality of frame data and detects a difference between two consecutive frame data, an ensemble section controller that sets an ensemble section in the plurality of frame data, based on the detected difference, an image recognizer that sequentially identifies classes respectively corresponding to a plurality of section frame data by applying different neural network classifiers to the plurality of section frame data in the ensemble section, and a recognition result classifier that sequentially identifies ensemble classes respectively corresponding to the plurality of section frame data by combining the classes in the ensemble section.
Type: Grant
Filed: May 8, 2020
Date of Patent: September 20, 2022
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Ju-Yeob Kim, Byung Jo Kim, Seong Min Kim, Jin Kyu Kim, Ki Hyuk Park, Mi Young Lee, Joo Hyun Lee, Young-deuk Jeon, Min-Hyung Cho
-
Patent number: 11443749
Abstract: Disclosed is an electronic device. The electronic device comprises: a microphone comprising circuitry; a speaker comprising circuitry; and a processor electrically connected to the microphone and speaker, wherein the processor, when a first user's voice is input through the microphone, identifies a user who uttered the first user's voice and provides a first response sound, which is obtained by inputting the first user's voice to an artificial intelligence model learned through an artificial intelligence algorithm, through the speaker, and when a second user's voice is input through the microphone, identifies a user who uttered the second user's voice, and if the user who uttered the first user's voice is the same as the user who uttered the second user's voice, provides a second response sound, which is obtained by inputting the second user's voice and utterance history information to the artificial intelligence model, through the speaker.
Type: Grant
Filed: January 2, 2019
Date of Patent: September 13, 2022
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyungtak Choi, Hyeonmok Ko, Jihie Kim, Hongchul Kim, Inchul Hwang
-
Patent number: 11386712
Abstract: The present invention discloses a method and system for multimodal analysis based emotion recognition. The method comprises segmenting video data of a user into a plurality of video segments. A plurality of visual features, voice features, and text features is extracted from the plurality of video segments. Autocorrelation values among each of the plurality of visual features, the voice features, and the text features are determined. Each of the plurality of visual features, the voice features and the text features is aligned based on video segment identifier and the autocorrelation values to obtain a plurality of aligned multimodal features. One of two classes of emotions is determined for each of the plurality of aligned multimodal features. The determined emotion for each of the plurality of aligned multimodal features is compared with historic multimodal features from a database, and the emotion of the user is determined in real time based on the comparison.
Type: Grant
Filed: February 20, 2020
Date of Patent: July 12, 2022
Assignee: Wipro Limited
Inventors: Rahul Yadav, Gopichand Agnihotram
-
Patent number: 11373657
Abstract: A system for identifying audio data includes a feature extraction module receiving unknown input audio data and dividing the unknown input audio data into a plurality of segments of unknown input audio data. A similarity module receives the plurality of segments of the unknown input audio data and receives known audio data from a known source, the known audio data being divided into a plurality of segments of known audio data. The similarity module performs comparisons between the segments of unknown input audio data and respective segments of known audio data and generates a respective plurality of similarity values representative of similarity between the segments of the comparisons, the comparisons being performed serially. The similarity module terminates the comparisons if the similarity values indicate insufficient similarity between the segments of the comparisons, prior to completing comparisons for all segments of the unknown input audio data.
Type: Grant
Filed: May 1, 2020
Date of Patent: June 28, 2022
Assignee: Raytheon Applied Signal Technology, Inc.
Inventors: Jonathan C. Wintrode, Nicholas J. Hinnerschitz, Aleksandr R. Jouravlev
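A minimal sketch of serial segment comparison with early termination; the toy similarity function and the 0.6 floor are assumptions standing in for the module's actual audio similarity measure:

```python
def serial_match(unknown_segs, known_segs, similarity, min_sim=0.6):
    """Compare segment pairs one at a time and abort as soon as a pair
    falls below the similarity floor, skipping work on hopeless sources."""
    scores = []
    for u, k in zip(unknown_segs, known_segs):
        s = similarity(u, k)
        if s < min_sim:
            return None          # terminated early: insufficient similarity
        scores.append(s)
    return sum(scores) / len(scores)

# Toy similarity on equal-length float lists (stand-in for audio features).
sim = lambda a, b: 1.0 - min(1.0, sum(abs(x - y) for x, y in zip(a, b)) / len(a))
print(serial_match([[0.1, 0.2], [0.3, 0.1]],
                   [[0.1, 0.25], [0.9, 0.9]], sim))  # None: 2nd pair fails
```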
-
Publication number: 20150106089
Abstract: A computer-implemented method includes listening for audio name information indicative of a name of a computer, with the computer configured to listen for the audio name information in a first power mode that promotes a conservation of power; detecting the audio name information indicative of the name of the computer; after detection of the audio name information, switching to a second power mode that promotes a performance of speech recognition; receiving audio command information; and performing speech recognition on the audio command information.
Type: Application
Filed: December 30, 2010
Publication date: April 16, 2015
Inventors: Evan H. Parker, Michal R. Grabowski
-
Publication number: 20140129226
Abstract: Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can then be sent to a speech recognition server or third-party entity. Since the audio data and transcriptions are already processed, the statistics or acoustic model data are devoid of any human-readable or machine-readable information that would enable reconstruction of the audio data. Thus, such converted data sent to a server does not include personal or confidential information. Third-party servers can then continually update speech models without storing personal and confidential utterances of users.
Type: Application
Filed: November 5, 2012
Publication date: May 8, 2014
Inventors: Antonio R. Lee, Petr Novak, Peder A. Olsen, Vaibhava Goel
-
Publication number: 20140114646
Abstract: A system receives vocal input from one or more persons, and extracts one or more keywords from the vocal input. The system then generates a query using the one or more keywords, searches a database of products and services using the query, and identifies a product or service as a function of the query.
Type: Application
Filed: October 24, 2012
Publication date: April 24, 2014
Applicant: SAP AG
Inventor: Oleg Figlin
-
Publication number: 20140108010
Abstract: A voice-enabled document system facilitates execution of service delivery operations by eliminating the need for manual or visual interaction during information retrieval by an operator. Access to voice-enabled documents can facilitate operations for mobile vendors, on-site or field-service repairs, medical service providers, food service providers, and the like. Service providers can access the voice-enabled documents by using a client device to retrieve the document, display it on a screen, and, via voice commands, initiate playback of selected audio files containing information derived from text data objects selected from the document. Data structures that are components of a voice-enabled document include audio playback files and a logical association that links the audio playback files to user-selectable fields, and to a set of voice commands.
Type: Application
Filed: October 11, 2012
Publication date: April 17, 2014
Applicant: INTERMEC IP CORP.
Inventors: Paul Maltseff, Roger Byford, Jim Logan
-
Publication number: 20140067395
Abstract: A system and method are described for engaging an audience in a conversational advertisement. A conversational advertising system converses with an audience using spoken words. The conversational advertising system uses a speech recognition application to convert an audience's spoken input into text and a text-to-speech application to transform text of a response to speech that is to be played to the audience. The conversational advertising system follows an advertisement script to guide the audience in a conversation.
Type: Application
Filed: August 28, 2012
Publication date: March 6, 2014
Applicant: Nuance Communications, Inc.
Inventors: Sundar Balasubramanian, Michael McSherry, Aaron Sheedy
-
Publication number: 20140058732
Abstract: Techniques disclosed herein include systems and methods for managing user interface responses to user input including spoken queries and commands. This includes providing incremental user interface (UI) response based on multiple recognition results about user input that are received with different delays. Such techniques include providing an initial response to a user at an early time, before remote recognition results are available. Systems herein can respond incrementally by initiating an initial UI response based on first recognition results, and then modifying the initial UI response after receiving secondary recognition results. Because the initial response begins immediately instead of waiting for results from all recognizers, the user perceives less delay before complete results are rendered.
Type: Application
Filed: August 21, 2012
Publication date: February 27, 2014
Applicant: Nuance Communications, Inc.
Inventors: Martin Labsky, Tomas Macek, Ladislav Kunc, Jan Kleindienst
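A minimal sketch of the incremental-response idea, assuming exactly two recognizers (a fast local one and a slower remote one) deliver results on a queue; the sources, timings, and revision rule are illustrative:

```python
import threading
import queue

def incremental_ui(results: "queue.Queue[tuple[str, str]]") -> None:
    """Render whichever recognition result arrives first, then revise the
    display when the slower (e.g. remote) result comes in."""
    rendered = None
    for _ in range(2):                       # local result, then remote result
        source, text = results.get()
        if rendered is None:
            print(f"[initial UI from {source}] {text}")
        elif text != rendered:
            print(f"[revised UI from {source}] {text}")
        rendered = text

q: "queue.Queue[tuple[str, str]]" = queue.Queue()
threading.Timer(0.05, q.put, [("local", "call bob")]).start()
threading.Timer(0.30, q.put, [("remote", "call Bob Smith")]).start()
incremental_ui(q)
```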
-
Publication number: 20140025380
Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.
Type: Application
Filed: July 18, 2012
Publication date: January 23, 2014
Applicant: International Business Machines Corporation
Inventors: Fernando Luiz Koch, Julio Nogima
-
Publication number: 20130332167
Abstract: According to some aspects, a method of providing an interactive audio presentation, at least in part, by traversing a plurality of audio animations, each audio animation comprising a plurality of frames, each of the plurality of frames comprising a duration, at least one audio element, and at least one gate indicating criteria for transitioning to and identification of a subsequent frame and/or a subsequent animation is provided. The method comprises rendering a first audio animation, receiving input from the user associated with the presentation, selecting a second audio animation based, at least in part, on the input, and rendering the second audio animation. Some aspects include a system for performing the above method and some aspects include a computer readable medium storing instructions that perform the above method when executed by at least one processor.
Type: Application
Filed: June 12, 2012
Publication date: December 12, 2013
Applicant: Nuance Communications, Inc.
Inventor: Robert M. Kilgore
-
Publication number: 20130325474
Abstract: Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
Type: Application
Filed: May 31, 2012
Publication date: December 5, 2013
Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, Jr.
-
Publication number: 20130304471
Abstract: A method, an apparatus and an article of manufacture for contextual voice query dilation in a Spoken Web search. The method includes determining a context in which a voice query is created, generating a set of multiple voice query terms based on the context and information derived by a speech recognizer component pertaining to the voice query, and processing the set of query terms with at least one dilation operator to produce a dilated set of queries. A method for performing a search on a voice query is provided, including generating a set of multiple query terms based on information derived by a speech recognizer component processing a voice query, processing the set with multiple dilation operators to produce multiple dilated sub-sets of query terms, selecting at least one query term from each dilated sub-set to compose a query set, and performing a search on the query set.
Type: Application
Filed: May 14, 2012
Publication date: November 14, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nitendra Rajput, Kundan Shrivastava
-
Publication number: 20130289994
Abstract: Techniques disclosed herein include systems and methods that enable a voice trigger that wakes-up an electronic device or causes the device to make additional voice commands active, without manual initiation of voice command functionality. In addition, such a voice trigger is dynamically programmable or customizable. A speaker can program or designate a particular phrase as the voice trigger. In general, techniques herein execute a voice-activated wake-up system that operates on a digital signal processor (DSP) or other low-power, secondary processing unit of an electronic device instead of running on a central processing unit (CPU). A speech recognition manager runs two speech recognition systems on an electronic device. The CPU dynamically creates a compact speech system for the DSP. Such a compact system can be continuously run during a standby mode, without quickly exhausting a battery supply.
Type: Application
Filed: April 26, 2012
Publication date: October 31, 2013
Inventors: Michael Jack Newman, Robert Roth, William D. Alexander, Paul van Mulbregt
-
Publication number: 20130262116
Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
Type: Application
Filed: March 27, 2012
Publication date: October 3, 2013
Applicant: NOVOSPEECH
Inventor: Yossef Ben-Ezra
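A minimal sketch of the final search step, assuming the element sequences (e.g. phone strings from the extraction engine) are already decoded; restricting candidates to contiguous subsequences of the first sequence is an assumed simplification:

```python
def common_subsequence(sequences: list[list[str]], min_count: int):
    """Return the longest contiguous subsequence of the first sequence that
    occurs in at least min_count of the sequences (including the first)."""
    first = sequences[0]
    def contains(seq, sub):
        return any(seq[i:i + len(sub)] == sub
                   for i in range(len(seq) - len(sub) + 1))
    for length in range(len(first), 0, -1):          # prefer longer matches
        for start in range(len(first) - length + 1):
            sub = first[start:start + length]
            if sum(contains(s, sub) for s in sequences) >= min_count:
                return sub
    return None

# Decodings of the initial segment and two overlapping segments.
decodings = [["h", "eh", "l", "ow"], ["eh", "l", "ow"], ["h", "eh", "l", "uw"]]
print(common_subsequence(decodings, min_count=2))  # ['h', 'eh', 'l']
```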
-
Publication number: 20130262114
Abstract: Different advantageous embodiments provide a crowdsourcing method for modeling user intent in conversational interfaces. One or more stimuli are presented to a plurality of describers. One or more sets of describer data are captured from the plurality of describers using a data collection mechanism. The one or more sets of describer data are processed to generate one or more models. Each of the one or more models is associated with a specific stimulus from the one or more stimuli.
Type: Application
Filed: April 3, 2012
Publication date: October 3, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Christopher John Brockett, Piali Choudhury, William Brennan Dolan, Yun-Cheng Ju, Patrick Pantel, Noelle Mallory Sophy, Svitlana Volkova
-
Publication number: 20130226576
Abstract: Speech recognition processing captures phonemes of words in a spoken speech string and retrieves text of words corresponding to particular combinations of phonemes from a phoneme dictionary. A text-to-speech synthesizer then can produce and substitute a synthesized pronunciation of that word in the speech string. If the speech recognition processing fails to recognize a particular combination of phonemes of a word, as spoken, as may occur when a word is spoken with an accent or when the speaker has a speech impediment, the speaker is prompted to clarify the word by entry, as text, from a keyboard or the like for storage in the phoneme dictionary such that a synthesized pronunciation of the word can be played out when the initially unrecognized spoken word is again encountered in a speech string to improve intelligibility, particularly for conference calls.
Type: Application
Filed: February 23, 2012
Publication date: August 29, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Peeyush Jaiswal, Burt Leo Vialpando, Fang Wang
-
Publication number: 20130191128
Abstract: A continuous phonetic recognition method using a semi-Markov model, a system for processing the method, and a recording medium storing the method. In an embodiment of the phonetic recognition method of recognizing phones using a speech recognition system, a phonetic data recognition device receives speech, and a phonetic data processing device recognizes phones from the received speech using a semi-Markov model.
Type: Application
Filed: August 28, 2012
Publication date: July 25, 2013
Applicant: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventors: Chang Dong Yoo, Sung Woong Kim
-
Publication number: 20130179169
Abstract: A Chinese text readability assessing system analyzes and evaluates the readability of text data. A word segmentation module compares the text data with a corpus to obtain a plurality of word segments from the text data and provide part-of-speech settings corresponding to the word segments. A readability index analysis module analyzes the word segments and the part-of-speech settings based on readability indices to calculate index values of the readability indices in the text data. The index values are inputted to a readability mathematical model in a knowledge-evaluated training module, and the readability mathematical model produces a readability analysis result. Accordingly, the Chinese text readability assessing system of the present invention evaluates the readability of Chinese texts by word segmentation and the readability indices analysis in conjunction with the readability mathematical model.
Type: Application
Filed: July 5, 2012
Publication date: July 11, 2013
Applicant: NATIONAL TAIWAN NORMAL UNIVERSITY
Inventors: Yao-Ting Sung, Ju-Ling Chen
-
Publication number: 20130166302
Abstract: Aspects of customizing digital signage are addressed. For example, an audio feed may be analyzed for keywords occurring in potential customers' speech. These keywords are then employed to customize display screens of a digital display.
Type: Application
Filed: December 22, 2011
Publication date: June 27, 2013
Applicant: NCR Corporation
Inventor: Brennan Eul I. Mercado
-
Publication number: 20130159000
Abstract: The subject disclosure is directed towards training a classifier for spoken utterances without relying on human-assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations to some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels and updates a classification model associated with the classifier using the pseudo-semantic label.
Type: Application
Filed: December 15, 2011
Publication date: June 20, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Yun-Cheng Ju, James Garnet Droppo, III
-
Publication number: 20130138441
Abstract: Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule representing a phenomenon of pronunciation transduction between recognition units as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer and one or more weighted finite state transducers.
Type: Application
Filed: August 14, 2012
Publication date: May 30, 2013
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Seung Hi Kim, Dong Hyun Kim, Young Ik Kim, Jun Park, Hoon Young Cho, Sang Hun Kim
-
Publication number: 20130110492
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting the stability of speech recognition results. In one aspect, a method includes determining a length of time, or a number of occasions, in which a word has remained in an incremental speech recognizer's top hypothesis, and assigning a stability metric to the word based on the length of time or number of occasions.
Type: Application
Filed: May 1, 2012
Publication date: May 2, 2013
Applicant: GOOGLE INC.
Inventors: Ian C. McGraw, Alexander H. Gruenstein
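A minimal sketch of the time-based variant: track how long each word has survived in the top hypothesis and map that to a stability metric. The saturating linear mapping (full stability after 2 s) is an assumed choice, not the publication's:

```python
class StabilityTracker:
    """Assign each word a stability metric from how long it has remained in
    the incremental recognizer's top hypothesis (occasions work similarly)."""
    def __init__(self):
        self.first_seen: dict[tuple[int, str], float] = {}

    def update(self, top_hypothesis: list[str], now: float) -> dict[str, float]:
        current = {(i, w) for i, w in enumerate(top_hypothesis)}
        # Forget words that dropped out of the top hypothesis.
        self.first_seen = {k: t for k, t in self.first_seen.items()
                           if k in current}
        for key in current:
            self.first_seen.setdefault(key, now)
        # Stability grows with survival time, saturating at 1.0 after 2 s.
        return {w: min(1.0, (now - self.first_seen[(i, w)]) / 2.0)
                for i, w in current}

tracker = StabilityTracker()
tracker.update(["call", "bob"], now=0.0)
print(tracker.update(["call", "bob", "now"], now=1.0))
# {'call': 0.5, 'bob': 0.5, 'now': 0.0} (dict order may vary)
```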
-
Publication number: 20130103402
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
Type: Application
Filed: October 25, 2011
Publication date: April 25, 2013
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
-
Publication number: 20130096918
Abstract: A recognizing device includes a memory and a processor coupled to the memory. The memory stores words included in a sentence and positional information indicating a position of the words in the sentence. The processor executes a process including comparing an input voice signal with reading information of a character string that connects a plurality of words stored in the memory to calculate a similarity; calculating a connection score indicating a proximity between the plurality of connected words based on positional information of the words stored in the memory; and determining a character string corresponding to the voice signal based on the similarity and the connection score.
Type: Application
Filed: August 15, 2012
Publication date: April 18, 2013
Applicant: FUJITSU LIMITED
Inventor: Shouji Harada
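A minimal sketch of combining an acoustic similarity with a positional connection score; the reciprocal-gap proximity measure and the alpha-weighted combination are assumptions standing in for the device's actual scoring:

```python
def score_candidate(words, positions, acoustic_similarity, alpha=0.5):
    """Combine acoustic similarity with a connection score that rewards
    word pairs sitting close together in the stored sentence."""
    gaps = [abs(positions[b] - positions[a]) - 1
            for a, b in zip(words, words[1:])]
    connection = 1.0 / (1.0 + sum(gaps))   # proximity of the connected words
    return alpha * acoustic_similarity + (1 - alpha) * connection

# Stored sentence: "please send the report today" with word positions.
pos = {"please": 0, "send": 1, "the": 2, "report": 3, "today": 4}
print(score_candidate(["send", "report"], pos, acoustic_similarity=0.8))
# One intervening word between "send" and "report" -> connection 0.5
```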
-
Publication number: 20130085757
Abstract: An embodiment of an apparatus for speech recognition includes a plurality of trigger detection units, each of which is configured to detect a start trigger for recognizing a command utterance for controlling a device, a selection unit, utilizing a signal from one or more sensors embedded on the device, configured to select a selected trigger detection unit among the trigger detection units, the selected trigger detection unit being appropriate to a usage environment of the device, and a recognition unit configured to recognize the command utterance when the start trigger is detected by the selected trigger detection unit.
Type: Application
Filed: June 29, 2012
Publication date: April 4, 2013
Applicant: Kabushiki Kaisha Toshiba
Inventors: Masanobu Nakamura, Akinori Kawamura
-
Publication number: 20130066635
Abstract: An apparatus and a method for setting a remote control command for controlling a home network service in a portable terminal are provided. The apparatus includes a memory for storing configuration types of a remote control command in a set order in a home network service; and a controller for setting the remote control command including the input configuration types of the remote control command and transmitting the remote control command, when the configuration types of the remote control command are input in the set order in the home network service.
Type: Application
Filed: September 10, 2012
Publication date: March 14, 2013
Applicant: Samsung Electronics Co., Ltd.
Inventors: Jong-Seok Kim, Jin Park