Segmentation Or Word Limit Detection (EPO) Patents (Class 704/E15.005)
  • Patent number: 11922931
    Abstract: Systems and methods are described for modifying a phonetic search index based on a use frequency associated with phonetic representations of text terms included in metadata of a media item. A first phonetic representation of a text term of the metadata, pronounced as a word, may be generated. A second phonetic representation of the text term may be generated by concatenating a phonetic representation of each letter in the text term. A database may be queried to determine use frequencies of the first and second phonetic representations, one of which may be selected based on a comparison of the use frequencies. A phonetic search index may be modified by including an entry for the selected phonetic representation. A voice query related to the media item may be received, and a reply to the voice query may be generated for output by performing a lookup in the modified phonetic search index.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: March 5, 2024
    Assignee: Rovi Guides, Inc.
    Inventors: Ajay Kumar Mishra, Jeffry Copps Robert Jose
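The selection step this abstract describes can be sketched as follows. This is a minimal illustration, not Rovi's implementation: the letter-to-phoneme table, the toy grapheme-to-phoneme stand-in, and the frequency table are all invented for the example.

```python
# Hypothetical sketch: choose between a word-level phonetic form and a
# spelled-out (letter-by-letter) form based on use frequencies, then put the
# winner in the search index. Phoneme mappings and frequencies are invented.

LETTER_PHONES = {"a": "EY", "b": "B IY", "c": "S IY"}  # toy letter pronunciations

def word_pronunciation(term: str) -> str:
    # Stand-in for a grapheme-to-phoneme model pronouncing the term as a word.
    return " ".join(term.upper())

def spelled_pronunciation(term: str) -> str:
    # Concatenate the phonetic representation of each letter of the term.
    return " . ".join(LETTER_PHONES.get(ch, ch.upper()) for ch in term.lower())

def select_representation(term: str, use_freq: dict) -> str:
    first = word_pronunciation(term)
    second = spelled_pronunciation(term)
    # Keep whichever representation is used more often in past queries.
    return first if use_freq.get(first, 0) >= use_freq.get(second, 0) else second

index = {}
freqs = {"B IY . B IY . S IY": 12, "B B C": 3}  # "BBC" is usually spelled out
index["bbc"] = select_representation("bbc", freqs)
```

A voice query would then be answered by a lookup against `index`; the database-backed frequency query is reduced here to a plain dictionary.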
  • Patent number: 11869489
    Abstract: Technology of the disclosure may facilitate user discovery of various voice-based action queries that can be spoken to initiate computer-based actions, such as voice-based action queries that can be provided as spoken input to a computing device to initiate computer-based actions that are particularized to content being viewed or otherwise consumed by the user on the computing device. Some implementations are generally directed to determining, in view of content recently viewed by a user on a computing device, at least one suggested voice-based action query for presentation via the computing device. Some implementations are additionally or alternatively generally directed to receiving at least one suggested voice-based action query at a computing device and providing the suggested voice-based action query as a suggestion in response to input to initiate providing of a voice-based query via the computing device.
    Type: Grant
    Filed: January 28, 2022
    Date of Patent: January 9, 2024
    Assignee: GOOGLE LLC
    Inventors: Vikram Aggarwal, Pravir Kumar Gupta
  • Patent number: 11869482
    Abstract: A method and apparatus for generating a speech waveform. Fundamental frequency information, glottal features and vocal tract features associated with an input may be received, wherein the glottal features include a phase feature, a shape feature, and an energy feature (1310). A glottal waveform is generated based on the fundamental frequency information and the glottal features through a first neural network model (1320). A speech waveform is generated based on the glottal waveform and the vocal tract features through a second neural network model (1330).
    Type: Grant
    Filed: September 30, 2018
    Date of Patent: January 9, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yang Cui, Xi Wang, Lei He, Kao-Ping Soong
  • Patent number: 11860934
    Abstract: Device for detecting partial matches between a first time varying signal and a second time varying signal, the device including: a fingerprint extraction stage; and a matching stage, wherein each feature information of a first fingerprint is pairwise compared with each feature information of a second fingerprint; wherein the matching stage includes a similarity calculator stage; wherein the matching stage includes a matrix calculator stage configured for arranging similarity values in a similarity matrix having dimensions of La×Lb, wherein an entry in the i-th row and j-th column of the similarity matrix is the similarity value calculated from the pair of the i-th feature information of the first fingerprint and of the j-th feature information of the second fingerprint; wherein the matching stage includes a detection stage configured for detecting the partial matches by evaluating a plurality of diagonals of the similarity matrix.
    Type: Grant
    Filed: November 13, 2020
    Date of Patent: January 2, 2024
    Assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
    Inventors: Milica Maksimovic, Patrick Aichroth, Luca Cuccovillo, Hanna Lukashevich
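The matrix-and-diagonals machinery in this abstract can be sketched briefly. This is an illustration under assumptions (cosine similarity per feature pair, mean score per diagonal), not the patented detection stage.

```python
import numpy as np

# Illustrative sketch: build an La x Lb similarity matrix from pairwise
# feature comparisons and score its diagonals. A diagonal with consistently
# high similarity indicates a partially matching segment at a constant time
# offset between the two fingerprints.

def similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Cosine similarity between every pair of feature vectors (rows):
    # entry (i, j) compares the i-th feature of a with the j-th feature of b.
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T

def diagonal_scores(sim: np.ndarray):
    # Mean similarity along each diagonal offset of the La x Lb matrix.
    la, lb = sim.shape
    return {k: float(np.mean(np.diagonal(sim, offset=k)))
            for k in range(-(la - 1), lb)}

rng = np.random.default_rng(0)
a = rng.normal(size=(6, 8))                       # first fingerprint, La = 6
b = np.vstack([rng.normal(size=(2, 8)), a[1:5]])  # b[2:] copies a[1:5], lag +1
scores = diagonal_scores(similarity_matrix(a, b))
best_offset = max(scores, key=scores.get)
```

Because the overlapping frames were planted at a lag of one frame, the diagonal at offset +1 dominates.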
  • Patent number: 11848008
    Abstract: This application discloses an artificial intelligence-based (AI-based) wakeup word detection method performed by a computing device. The method includes: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers; determining a target probability vector from the posterior probability vectors according to the syllable combination sequence; and calculating a confidence according to the target probability vector, and determining that the speech frames include the wakeup word text when the confidence is greater than or equal to a threshold.
    Type: Grant
    Filed: September 23, 2021
    Date of Patent: December 19, 2023
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Jie Chen, Dan Su, Mingjie Jin, Zhenling Zhu
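The final confidence step can be sketched in a few lines. The combination rule (geometric mean of the best per-syllable posteriors) and the syllable labels are assumptions for illustration; the patent does not specify this exact formula here.

```python
import math

# Hedged sketch: given per-frame posterior vectors over syllable classes,
# pick out the probabilities along the wakeup word's syllable sequence and
# combine them into a single confidence, then compare against a threshold.

def wakeup_confidence(posteriors, syllable_sequence):
    # posteriors: list of dicts mapping syllable id -> probability, one per frame.
    # For each target syllable, take its best posterior over all frames.
    probs = [max(frame.get(syl, 0.0) for frame in posteriors)
             for syl in syllable_sequence]
    if min(probs) == 0.0:
        return 0.0
    # Geometric mean keeps the score in [0, 1] regardless of sequence length.
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

frames = [{"ni": 0.9, "hao": 0.05}, {"ni": 0.1, "hao": 0.8}]
conf = wakeup_confidence(frames, ["ni", "hao"])
detected = conf >= 0.5  # the threshold comparison from the abstract
```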
  • Patent number: 11842737
    Abstract: Techniques are described herein for detecting and/or enrolling (or commissioning) new “hot commands” that are usable to cause an automated assistant to perform responsive action(s) without having to be first explicitly invoked. In various implementations, an automated assistant may be transitioned from a limited listening state into a full speech recognition state in response to a trigger event. While in the full speech recognition state, the automated assistant may receive and perform speech recognition processing on a spoken command from a user to generate a textual command. The textual command may be determined to satisfy a frequency threshold in a corpus of textual commands. Consequently, data indicative of the textual command may be enrolled as a hot command. Subsequent utterance of another textual command that is semantically consistent with the textual command may trigger performance of a responsive action by the automated assistant, without requiring explicit invocation.
    Type: Grant
    Filed: March 24, 2021
    Date of Patent: December 12, 2023
    Assignee: GOOGLE LLC
    Inventors: Tuan Nguyen, Yuan Yuan
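The enrollment criterion, a frequency threshold over a corpus of textual commands, is simple enough to sketch directly. The threshold value and the command text are invented for the example.

```python
from collections import Counter

# Minimal sketch of hot-command enrollment: a recognized textual command is
# enrolled once it appears often enough in a corpus of past commands, after
# which it can trigger the responsive action without explicit invocation.

FREQUENCY_THRESHOLD = 3  # illustrative value

def maybe_enroll(command: str, corpus: Counter, hot_commands: set) -> bool:
    corpus[command] += 1
    if corpus[command] >= FREQUENCY_THRESHOLD and command not in hot_commands:
        hot_commands.add(command)  # now usable as a hot command
        return True
    return False

corpus, hot = Counter(), set()
for _ in range(3):
    enrolled = maybe_enroll("stop the music", corpus, hot)
```

A later utterance that is semantically consistent with an entry in `hot` would be acted on directly; that semantic matching step is outside this sketch.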
  • Patent number: 11830098
    Abstract: Disclosed are various examples for audio data leak prevention using user and device contexts. In some examples, a voice assistant device can be connected to a remote service that provides enterprise data to be audibly emitted by the voice assistant device. In response to a request for the enterprise data being received from the voice assistant device, an audio signal can be generated that audibly broadcasts the enterprise data. The audio signal can be generated to audibly redact at least a portion of the enterprise data based at least in part on a mode of operation of the voice assistant device. The voice assistant device can be directed to emit the enterprise data through a playback of the audio signal.
    Type: Grant
    Filed: February 19, 2020
    Date of Patent: November 28, 2023
    Assignee: VMWARE, INC.
    Inventors: Rohit Pradeep Shetty, Erich Peter Stuntebeck, Ramani Panchapakesan, Suman Aluvala, Chaoting Xuan
  • Patent number: 11823669
    Abstract: According to one embodiment, an information processing apparatus includes the following units. The first acquisition unit acquires speech data including frames. The second acquisition unit acquires a model trained to, upon input of a feature amount extracted from the speech data, output information indicative of likelihood of each of a plurality of classes including a component of a keyword and a component of background noise. The first calculation unit calculates a keyword score indicative of occurrence probability of the component of the keyword. The second calculation unit calculates a background noise score indicative of occurrence probability of the component of the background noise. The determination unit determines whether or not the speech data includes the keyword.
    Type: Grant
    Filed: February 28, 2020
    Date of Patent: November 21, 2023
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Ning Ding, Hiroshi Fujimura
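The two-score decision can be sketched as follows. The score definitions (max over frames, a simple margin test) and the class names are assumptions; the abstract only says a keyword score and a background-noise score feed the determination.

```python
# Illustrative sketch: combine per-frame class likelihoods into a keyword
# score and a background-noise score, then decide whether the speech data
# includes the keyword by comparing the two.

def detect_keyword(frame_likelihoods, keyword_classes, noise_classes, margin=0.2):
    # frame_likelihoods: list of dicts mapping class name -> likelihood.
    keyword_score = max(max(f[c] for c in keyword_classes) for f in frame_likelihoods)
    noise_score = max(max(f[c] for c in noise_classes) for f in frame_likelihoods)
    return keyword_score - noise_score > margin, keyword_score, noise_score

frames = [
    {"kw_start": 0.7, "kw_end": 0.1, "babble": 0.2},
    {"kw_start": 0.1, "kw_end": 0.8, "babble": 0.1},
]
found, kw, bg = detect_keyword(frames, ["kw_start", "kw_end"], ["babble"])
```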
  • Patent number: 11817095
    Abstract: A method, computer program product, and computing system for monitoring a plurality of conversations within a monitored space to generate a conversation data set; processing the conversation data set using machine learning to: define a system-directed command for an ACI system, and associate one or more conversational contexts with the system-directed command; detecting the occurrence of a specific conversational context within the monitored space, wherein the specific conversational context is included in the one or more conversational contexts associated with the system-directed command; and executing, in whole or in part, functionality associated with the system-directed command in response to detecting the occurrence of the specific conversational context without requiring the utterance of the system-directed command and/or a wake-up word/phrase.
    Type: Grant
    Filed: February 3, 2022
    Date of Patent: November 14, 2023
    Assignee: Nuance Communications, Inc.
    Inventors: Paul Joseph Vozila, Neal Snider
  • Patent number: 11818291
    Abstract: Real-time speech analytics (RTSA) provides for maintaining real-time speech conditions, rules, and triggers, along with the real-time actions and alerts to take. A call between a user and an agent is received at an agent computing device. The call is monitored to detect in the call one of the real-time speech conditions, rules, and triggers. Based on the detection, at least one real-time action and/or alert is initiated.
    Type: Grant
    Filed: September 1, 2021
    Date of Patent: November 14, 2023
    Assignee: Verint Americas Inc.
    Inventors: David Warren Singer, Daniel Thomas Spohrer, Marc Adam Calahan, Paul Michael Munro, Gary Andrew Duke, Padraig Carberry, Christopher Jerome Schnurr
  • Patent number: 11805185
    Abstract: A server system is provided that includes one or more processors configured to execute a platform for an online multi-user chat service that communicates with a plurality of client devices of users of the online multi-user chat service that exchanges user chat data between the plurality of client devices. The one or more processors are configured to execute a user chat filtering program that performs filter actions for user chat data exchanged on the platform for the online multi-user chat service. The user chat filtering program includes a plurality of trained machine learning models and a filter decision service that determines a filter action to be performed for target portions of user chat data based on output of the plurality of trained machine learning models for those target portions of user chat data.
    Type: Grant
    Filed: March 3, 2021
    Date of Patent: October 31, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Monica Tongya
  • Patent number: 11783825
    Abstract: Embodiments of the present invention provide a speech recognition method and a terminal. The method includes: listening, by a speech wakeup apparatus, to speech information in a surrounding environment; when determining that the speech information obtained by listening matches a speech wakeup model, buffering, by the speech wakeup apparatus, speech information of a first preset duration obtained by listening, and sending a trigger signal for enabling a speech recognition apparatus, where the trigger signal is used to instruct the speech recognition apparatus to read and recognize the speech information buffered by the speech wakeup apparatus; and recognizing the first speech information buffered by the speech wakeup apparatus and second speech information obtained by listening, to obtain a recognition result.
    Type: Grant
    Filed: February 17, 2021
    Date of Patent: October 10, 2023
    Assignee: Honor Device Co., Ltd.
    Inventor: Junyang Zhou
  • Patent number: 11776544
    Abstract: An embodiment of the present invention provides an artificial intelligence (AI) apparatus for recognizing a speech of a user. The artificial intelligence apparatus includes a memory to store a speech recognition model and a processor to obtain a speech signal for a user speech, to convert the speech signal into a text using the speech recognition model, to measure a confidence level for the conversion, to perform a control operation corresponding to the converted text if the measured confidence level is greater than or equal to a reference value, and to provide feedback for the conversion if the measured confidence level is less than the reference value.
    Type: Grant
    Filed: May 18, 2022
    Date of Patent: October 3, 2023
    Assignee: LG ELECTRONICS INC.
    Inventors: Jaehong Kim, Hyoeun Kim, Hangil Jeong, Heeyeon Choi
  • Patent number: 11776210
    Abstract: An electronic device and method for 3D face modeling based on neural networks is provided. The electronic device receives a two-dimensional (2D) color image of a human face with a first face expression and obtains a first three-dimensional (3D) mesh of the human face with the first face expression based on the received 2D color image. The electronic device generates first texture information and a first set of displacement maps. The electronic device feeds the generated first texture information and the first set of displacement maps as an input to the neural network and receives an output of the neural network for the fed input. Thereafter, the electronic device generates a second 3D mesh of the human face with a second face expression which is different from the first face expression based on the received output.
    Type: Grant
    Filed: January 22, 2021
    Date of Patent: October 3, 2023
    Assignee: SONY GROUP CORPORATION
    Inventors: Hiroyuki Takeda, Mohammad Gharavi Alkhansari
  • Patent number: 11763813
    Abstract: Implementations described herein relate to reducing latency in automated assistant interactions. In some implementations, a client device can receive audio data that captures a spoken utterance of a user. The audio data can be processed to determine an assistant command to be performed by an automated assistant. The assistant command can be processed, using a latency prediction model, to generate a predicted latency to fulfill the assistant command. Further, the client device (or the automated assistant) can determine, based on the predicted latency, whether to audibly render pre-cached content for presentation to the user prior to audibly rendering content that is responsive to the spoken utterance. The pre-cached content can be tailored to the assistant command and audibly rendered for presentation to the user while the content is being obtained, and the content can be audibly rendered for presentation to the user subsequent to the pre-cached content.
    Type: Grant
    Filed: April 28, 2021
    Date of Patent: September 19, 2023
    Assignee: GOOGLE LLC
    Inventors: Lior Alon, Rafael Goldfarb, Dekel Auster, Dan Rasin, Michael Andrew Goodman, Trevor Strohman, Nino Tasca, Valerie Nygaard, Jaclyn Konzelmann
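The latency-gated decision can be sketched compactly. The lookup-table "model", the tolerance value, and the canned phrases are placeholders for the learned latency prediction model and the tailored pre-cached content the abstract describes.

```python
# Sketch: predict fulfillment latency for an assistant command and, if it
# exceeds a tolerance, audibly render tailored pre-cached content while the
# real response is being obtained. Values here are illustrative placeholders.

PRECACHED = {"smart_home": "Okay, working on the lights...",
             "search": "Let me look that up..."}
LATENCY_TOLERANCE_S = 0.6

def predicted_latency(command_type: str) -> float:
    # Placeholder for the latency prediction model.
    return {"smart_home": 1.2, "clock": 0.1}.get(command_type, 0.8)

def plan_response(command_type: str):
    steps = []
    if predicted_latency(command_type) > LATENCY_TOLERANCE_S:
        steps.append(PRECACHED.get(command_type, "One moment..."))
    steps.append("<content responsive to the utterance>")
    return steps

plan = plan_response("smart_home")  # slow command: pre-cached filler first
```

A fast command (e.g. asking the time) skips the filler entirely, which is the point of predicting latency before rendering anything.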
  • Patent number: 11741141
    Abstract: The present disclosure describes a communication environment having a service provider server that receives an audio command from a display control device within the communication environment. The service provider server can translate this audio command into an electrical command for controlling the display device. The service provider server autonomously performs a specifically tailored search of a catalog of command words and/or phrases for the audio command to translate the audio command to the electrical command. This specifically tailored search can include one or more searching routines having various degrees of complexity. The most simplistic searching routine from among these searching routines represents a textual search to identify one or more command words and/or phrases from the catalog of command words and/or phrases that match the audio command.
    Type: Grant
    Filed: December 4, 2020
    Date of Patent: August 29, 2023
    Assignee: CSC Holdings, LLC
    Inventors: Jaison P. Antony, Heitor J. Almeida, John Markowski, Peter Caramanica
  • Patent number: 11727928
    Abstract: Aspects of the disclosure provide a responding method and device, an electronic device and a storage medium. The method is applied to a first electronic device including an audio acquisition component and an audio output component. The method can include acquiring a voice signal through the audio acquisition component, determining whether to respond to the voice signal, and responsive to determining to respond to the voice signal, outputting a first sound signal by the audio output component, the first sound signal being configured to notify at least one second electronic device that the first electronic device responds to the voice signal. In such a manner, an electronic device, responsive to determining to respond to a voice signal, outputs a sound signal to prevent other electronic device(s) from responding to the voice signal, so that competitions between electronic devices are reduced and a user experience is improved.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: August 15, 2023
    Assignee: Beijing Xiaomi Pinecone Electronics Co., Ltd.
    Inventors: Lingsong Zhou, Fei Xiang
  • Patent number: 11709655
    Abstract: Disclosed is an electronic device. The electronic device comprises: a microphone comprising circuitry; a speaker comprising circuitry; and a processor electrically connected to the microphone and speaker, wherein the processor, when a first user's voice is input through the microphone, identifies a user who uttered the first user's voice and provides a first response sound, which is obtained by inputting the first user's voice to an artificial intelligence model learned through an artificial intelligence algorithm, through the speaker, and when a second user's voice is input through the microphone, identifies a user who uttered the second user's voice, and if the user who uttered the first user's voice is the same as the user who uttered the second user's voice, provides a second response sound, which is obtained by inputting the second user's voice and utterance history information to the artificial intelligence model, through the speaker.
    Type: Grant
    Filed: August 3, 2022
    Date of Patent: July 25, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hyungtak Choi, Hyeonmok Ko, Jihie Kim, Hongchul Kim, Inchul Hwang
  • Patent number: 11705125
    Abstract: A processor may receive data regarding a context for a first dialog turn. The processor may monitor a voice input from a user for the first dialog turn. The processor may detect a first pause in the voice input, the first pause having a duration that satisfies a time threshold. The processor may receive, based on the first pause, first voice input data. The processor may analyze the first voice input data. The processor may determine that additional time is recommended for the voice input to be provided by the user.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: July 18, 2023
    Assignee: International Business Machines Corporation
    Inventors: Andrew R. Freed, Corville O. Allen, Shikhar Kwatra, Joseph Kozhaya
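The pause-detection step, a silent run whose duration satisfies a time threshold, can be sketched over per-frame voice-activity flags. The frame rate and threshold values are assumed for illustration.

```python
# Sketch: scan voice-activity flags sampled at a fixed frame rate and report
# where the first pause long enough to satisfy the time threshold began.

FRAME_S = 0.1            # 100 ms per frame (assumed)
PAUSE_THRESHOLD_S = 0.5  # a pause must last at least this long (assumed)

def first_qualifying_pause(voiced_flags):
    run = 0
    for i, voiced in enumerate(voiced_flags):
        run = 0 if voiced else run + 1
        if run * FRAME_S >= PAUSE_THRESHOLD_S:
            return i - run + 1   # frame index where the qualifying pause began
    return None

flags = [True] * 5 + [False] * 6 + [True] * 2   # 0.6 s pause after 0.5 s of speech
start = first_qualifying_pause(flags)
```

The downstream analysis of whether the user should be given additional time would then operate on the voice input captured up to `start`.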
  • Patent number: 11698261
    Abstract: Embodiments of the present disclosure provide a method, apparatus, computer device, and storage medium for determining a POI alias. The method may include: acquiring a to-be-processed target POI, and generating a candidate alias list corresponding to the target POI based on a query behavior-associated log matching the target POI; and screening out at least one target alias corresponding to the target POI in the candidate alias list, according to an association relationship between each candidate alias in the candidate alias list and the target POI.
    Type: Grant
    Filed: December 11, 2019
    Date of Patent: July 11, 2023
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Yanyan Li, Jianguo Duan, Hui Xiong
  • Patent number: 11699443
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.
    Type: Grant
    Filed: June 2, 2021
    Date of Patent: July 11, 2023
    Assignee: GOOGLE LLC
    Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar
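The two-threshold structure is the essence of this claim and is easy to sketch. The scoring functions are stand-ins; only the permissive on-device gate followed by a stricter server check follows the abstract.

```python
# Sketch: a permissive on-device threshold gates whether audio is sent to a
# server, which applies a second, more restrictive threshold before returning
# tagged text. Scores are read from the input here purely for illustration.

ON_DEVICE_THRESHOLD = 0.4   # first, permissive threshold
SERVER_THRESHOLD = 0.8      # second, more restrictive threshold

def on_device_score(audio) -> float:
    return audio["local_score"]    # placeholder for the on-device detector

def server_score(audio) -> float:
    return audio["server_score"]   # placeholder for the larger server model

def process(audio):
    if on_device_score(audio) < ON_DEVICE_THRESHOLD:
        return "ignored"           # audio never leaves the device
    if server_score(audio) < SERVER_THRESHOLD:
        return "rejected_by_server"
    return "tagged_text_returned"

outcome = process({"local_score": 0.5, "server_score": 0.9})
```

The asymmetry is deliberate: a cheap permissive check preserves recall locally, while the expensive strict check keeps false accepts from reaching the user.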
  • Patent number: 11694694
    Abstract: A method is provided for identifying synthetic “deep-fake” audio samples versus organic audio samples. Methods may include: generating a model of a vocal tract using one or more organic audio samples from a user; identifying a set of bigram-feature pairs from the one or more audio samples; estimating the cross-sectional area of the vocal tract of the user when speaking the set of bigram-feature pairs; receiving a candidate audio sample; identifying bigram-feature pairs of the candidate audio sample that are in the set of bigram-feature pairs; calculating a cross-sectional area of a theoretical vocal tract of a user when speaking the identified bigram-feature pairs; and identifying the candidate audio sample as a deep-fake audio sample in response to the calculated cross-sectional area of the theoretical vocal tract of a user failing to correspond within a predetermined measure of the estimated cross-sectional area of the vocal tract of the user.
    Type: Grant
    Filed: July 27, 2021
    Date of Patent: July 4, 2023
    Assignee: UNIVERSITY OF FLORIDA RESEARCH FOUNDATION, INCORPORATED
    Inventors: Patrick G. Traynor, Kevin Butler, Logan E. Blue, Luis Vargas, Kevin S. Warren, Hadi Abdullah, Cassidy Gibson, Jessica Nicole Odell
  • Patent number: 11689472
    Abstract: The exemplary embodiments disclose a method, a computer program product, and a computer system for allocating computing resources. The exemplary embodiments may include collecting data of one or more users, wherein the collected data comprises calendar data of the one or more users, extracting one or more features from the collected data, and allocating one or more computing resources to one or more of the users based on the extracted one or more features and one or more models.
    Type: Grant
    Filed: July 9, 2021
    Date of Patent: June 27, 2023
    Assignee: International Business Machines Corporation
    Inventors: Venkata Vara Prasad Karri, Saraswathi Sailaja Perumalla, Sarbajit K. Rakshit, Ryan Jackson, Ram Kumar Vadlamani
  • Patent number: 11670290
    Abstract: A speech signal processing method and apparatus is disclosed. The speech signal processing method includes receiving an input token that is based on a speech signal, calculating first probability values respectively corresponding to candidate output tokens based on the input token, adjusting at least one of the first probability values based on a priority of each of the first probability values, and processing the speech signal based on an adjusted probability value obtained by the adjusting.
    Type: Grant
    Filed: November 30, 2020
    Date of Patent: June 6, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Tae Gyoon Kang
  • Patent number: 11636846
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
    Type: Grant
    Filed: April 30, 2021
    Date of Patent: April 25, 2023
    Assignee: Google LLC
    Inventors: Michael Buchanan, Pravir Kumar Gupta, Christopher Bo Tandiono
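The counting test in this abstract is concrete enough to sketch directly. The sample collection is invented; the classification rule (compare the two counts) follows the abstract, though the real system may weight them differently.

```python
# Sketch of word-comparison endpointing: count text samples that match the
# transcription exactly (first value) versus those that continue with extra
# terms (second value), then classify the utterance by comparing the counts.

def classify_utterance(transcription: str, collection: list) -> str:
    words = transcription.split()
    exact = sum(1 for s in collection if s.split() == words)
    extended = sum(1 for s in collection
                   if s.split()[:len(words)] == words and len(s.split()) > len(words))
    return "likely incomplete" if extended > exact else "likely complete"

samples = ["set an alarm", "set an alarm for six", "set an alarm for seven"]
verdict = classify_utterance("set an alarm", samples)
```

Here "set an alarm" is usually followed by more words in the collection, so endpointing should wait rather than cut the user off.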
  • Patent number: 11600279
    Abstract: A method to transcribe communications may include obtaining audio data originating at a first device during a communication session between the first device and a second device and providing the audio data to an automated speech recognition system configured to transcribe the audio data. The method may further include obtaining multiple hypothesis transcriptions generated by the automated speech recognition system. Each of the multiple hypothesis transcriptions may include one or more words determined by the automated speech recognition system to be a transcription of a portion of the audio data. The method may further include determining one or more consistent words that are included in two or more of the multiple hypothesis transcriptions and in response to determining the one or more consistent words, providing the one or more consistent words to the second device for presentation of the one or more consistent words by the second device.
    Type: Grant
    Filed: August 26, 2019
    Date of Patent: March 7, 2023
    Assignee: Sorenson IP Holdings, LLC
    Inventors: Brian Chevrier, Shane Roylance, Kenneth Boehme
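The consistent-word step can be sketched with a small counting pass. The hypotheses below are invented; "appears in two or more hypotheses" is taken directly from the abstract, while preserving word order from the top hypothesis is an assumption.

```python
from collections import Counter

# Sketch: surface only the words that appear in two or more ASR hypothesis
# transcriptions, so stable words can be sent to the second device early
# while uncertain ones are withheld.

def consistent_words(hypotheses):
    counts = Counter(w for hyp in hypotheses for w in set(hyp.split()))
    # Keep the word order of the top hypothesis for presentation.
    return [w for w in hypotheses[0].split() if counts[w] >= 2]

hyps = ["please call doctor smith",
        "please fall doctor smith",
        "peas call doctor smyth"]
stable = consistent_words(hyps)
```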
  • Patent number: 11594215
    Abstract: Techniques for providing a contextual voice user interface that enables a user to query a speech processing system with respect to the decisions made to answer the user's command are described. The speech processing system may store speech processing pipeline data used to process a command. At some point after the system outputs content deemed responsive to the command, a user may speak an utterance corresponding to an inquiry with respect to the processing performed to respond to the command. For example, the user may state “why did you tell me that?” In response thereto, the speech processing system may determine the stored speech processing pipeline data used to respond to the command, and may generate output audio data that describes the data and computing decisions involved in determining the content deemed responsive to the command.
    Type: Grant
    Filed: October 11, 2019
    Date of Patent: February 28, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Michael James Moniz, Abishek Ravi, Ryan Scott Aldrich, Michael Bennett Adams
  • Patent number: 11580967
    Abstract: A speech feature extraction apparatus 100 includes a voice activity detection unit 103 that drops non-voice frames from the frames corresponding to an input speech utterance and calculates a posterior of being voiced for each frame; a voice activity detection process unit 106 that calculates function values, used as weights when pooling frames to produce an utterance-level feature, from a given voice activity detection posterior; and an utterance-level feature extraction unit 112 that extracts an utterance-level feature from the frames on the basis of multiple frame-level features, using the function values.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: February 14, 2023
    Assignee: NEC CORPORATION
    Inventors: Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka
  • Patent number: 11461948
    Abstract: A system and method for voice-driven animation of an object in an image by sampling an input video, depicting a puppet object, to obtain an image, receiving audio data, extracting voice related features from the audio data, producing an expression representation based on the voice related features, wherein the expression representation is related to a region of interest, obtaining from the image auxiliary data related to the image, and generating a target image based on the expression representation and the auxiliary data.
    Type: Grant
    Filed: July 15, 2021
    Date of Patent: October 4, 2022
    Assignee: DE-IDENTIFICATION LTD.
    Inventors: Eliran Kuta, Sella Blondheim, Gil Perry, Amitay Nachmani, Matan Ben-Yosef, Or Gorodissky
  • Patent number: 11449720
    Abstract: Provided is an image recognition device. The image recognition device includes a frame data change detector that sequentially receives a plurality of frame data and detects a difference between two consecutive frame data, an ensemble section controller that sets an ensemble section in the plurality of frame data, based on the detected difference, an image recognizer that sequentially identifies classes respectively corresponding to a plurality of section frame data by applying different neural network classifiers to the plurality of section frame data in the ensemble section, and a recognition result classifier that sequentially identifies ensemble classes respectively corresponding to the plurality of section frame data by combining the classes in the ensemble section.
    Type: Grant
    Filed: May 8, 2020
    Date of Patent: September 20, 2022
    Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Ju-Yeob Kim, Byung Jo Kim, Seong Min Kim, Jin Kyu Kim, Ki Hyuk Park, Mi Young Lee, Joo Hyun Lee, Young-deuk Jeon, Min-Hyung Cho
  • Patent number: 11443749
    Abstract: Disclosed is an electronic device. The electronic device comprises: a microphone comprising circuitry; a speaker comprising circuitry; and a processor electrically connected to the microphone and speaker, wherein the processor, when a first user's voice is input through the microphone, identifies a user who uttered the first user's voice and provides a first response sound, which is obtained by inputting the first user's voice to an artificial intelligence model learned through an artificial intelligence algorithm, through the speaker, and when a second user's voice is input through the microphone, identifies a user who uttered the second user's voice, and if the user who uttered the first user's voice is the same as the user who uttered the second user's voice, provides a second response sound, which is obtained by inputting the second user's voice and utterance history information to the artificial intelligence model, through the speaker.
    Type: Grant
    Filed: January 2, 2019
    Date of Patent: September 13, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hyungtak Choi, Hyeonmok Ko, Jihie Kim, Hongchul Kim, Inchul Hwang
  • Patent number: 11386712
    Abstract: The present invention discloses a method and system for multimodal-analysis-based emotion recognition. The method comprises segmenting video data of a user into a plurality of video segments. A plurality of visual features, voice features, and text features are extracted from the plurality of video segments. Autocorrelation values among each of the plurality of visual features, the voice features, and the text features are determined. Each of the plurality of visual features, the voice features, and the text features is aligned based on a video segment identifier and the autocorrelation values to obtain a plurality of aligned multimodal features. One of two classes of emotions is determined for each of the plurality of aligned multimodal features. The determined emotion for each of the plurality of aligned multimodal features is compared with historic multimodal features from a database, and the emotion of the user is determined in real time based on the comparison.
    Type: Grant
    Filed: February 20, 2020
    Date of Patent: July 12, 2022
    Assignee: Wipro Limited
    Inventors: Rahul Yadav, Gopichand Agnihotram
  • Patent number: 11373657
    Abstract: A system for identifying audio data includes a feature extraction module receiving unknown input audio data and dividing the unknown input audio data into a plurality of segments of unknown input audio data. A similarity module receives the plurality of segments of the unknown input audio data and receives known audio data from a known source, the known audio data being divided into a plurality of segments of known audio data. The similarity module performs comparisons between the segments of unknown input audio data and respective segments of known audio data and generates a respective plurality of similarity values representative of similarity between the segments of the comparisons, the comparisons being performed serially. The similarity module terminates the comparisons if the similarity values indicate insufficient similarity between the segments of the comparisons, prior to completing comparisons for all segments of the unknown input audio data.
    Type: Grant
    Filed: May 1, 2020
    Date of Patent: June 28, 2022
    Assignee: Raytheon Applied Signal Technology, Inc.
    Inventors: Jonathan C. Wintrode, Nicholas J. Hinnerschitz, Aleksandr R. Jouravlev
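The serial, early-terminating comparison described in the abstract above could be sketched as follows. The similarity callback and the single-threshold termination rule are illustrative assumptions, not the patented method:

```python
def identify_audio(unknown_segs, known_segs, min_similarity, segment_similarity):
    """Serially compare corresponding segments of unknown and known audio.
    Returns the per-segment similarity values, or None if the comparison
    was abandoned early because a segment fell below min_similarity."""
    scores = []
    for u, k in zip(unknown_segs, known_segs):
        s = segment_similarity(u, k)
        scores.append(s)
        if s < min_similarity:  # insufficient similarity: stop comparing
            return None
    return scores
```

Terminating as soon as one segment pair looks dissimilar avoids scoring the remaining segments, which is the efficiency gain the abstract describes.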
  • Publication number: 20150106089
    Abstract: A computer-implemented method includes listening for audio name information indicative of a name of a computer, with the computer configured to listen for the audio name information in a first power mode that promotes a conservation of power; detecting the audio name information indicative of the name of the computer; after detection of the audio name information, switching to a second power mode that promotes a performance of speech recognition; receiving audio command information; and performing speech recognition on the audio command information.
    Type: Application
    Filed: December 30, 2010
    Publication date: April 16, 2015
    Inventors: Evan H. Parker, Michal R. Grabowski
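The two-mode flow in the abstract above can be sketched as a small state machine. The class, the enum values, and the text-based "audio" are illustrative assumptions; a real device would match the name against audio, not strings:

```python
from enum import Enum

class PowerMode(Enum):
    LOW = "low"    # listens only for the device name, conserving power
    FULL = "full"  # full speech recognition is active

class NameActivatedListener:
    """Toy sketch: stay in a low-power mode until the device's name is
    heard, then switch to a full-power mode and recognize commands."""

    def __init__(self, name):
        self.name = name.lower()
        self.mode = PowerMode.LOW

    def hear(self, audio_text):
        if self.mode is PowerMode.LOW:
            if self.name in audio_text.lower():
                self.mode = PowerMode.FULL  # name detected: switch power modes
            return None                     # no command recognition in LOW mode
        # FULL mode: "recognize" the command (here, simply echo it back)
        return audio_text
```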
  • Publication number: 20140129226
    Abstract: Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation data can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can then be sent to a speech recognition server or third-party entity. Since the audio data and transcriptions are already processed, the statistics or acoustic model data are devoid of any human-readable or machine-readable information that would enable reconstruction of the audio data. Thus, such converted data sent to a server does not include personal or confidential information. Third-party servers can then continually update speech models without storing personal and confidential utterances of users.
    Type: Application
    Filed: November 5, 2012
    Publication date: May 8, 2014
    Inventors: Antonio R. Lee, Petr Novak, Peder A. Olsen, Vaibhava Goel
  • Publication number: 20140114646
    Abstract: A system receives vocal input from one or more persons, and extracts one or more keywords from the vocal input. The system then generates a query using the one or more keywords, searches a database of products and services using the query, and identifies a product or service as a function of the query.
    Type: Application
    Filed: October 24, 2012
    Publication date: April 24, 2014
    Applicant: SAP AG
    Inventor: Oleg Figlin
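The keyword-extraction-and-search pipeline in the abstract above can be sketched minimally as below. The stopword list and the substring-matching rule are illustrative assumptions; a production system would use real NLP and a database query:

```python
# A toy stopword list; real systems would use proper keyword extraction.
STOPWORDS = {"i", "want", "a", "the", "to", "for", "some"}

def extract_keywords(vocal_input):
    # Keep words that are not stopwords.
    return [w for w in vocal_input.lower().split() if w not in STOPWORDS]

def search_products(vocal_input, catalog):
    """Return catalog entries matching any keyword extracted from the
    (already transcribed) vocal input."""
    keywords = extract_keywords(vocal_input)
    return [p for p in catalog if any(k in p.lower() for k in keywords)]
```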
  • Publication number: 20140108010
    Abstract: A voice-enabled document system facilitates execution of service delivery operations by eliminating the need for manual or visual interaction during information retrieval by an operator. Access to voice-enabled documents can facilitate operations for mobile vendors, on-site or field-service repairs, medical service providers, food service providers, and the like. Service providers can access the voice-enabled documents by using a client device to retrieve the document, display it on a screen, and, via voice commands, initiate playback of selected audio files containing information derived from text data objects selected from the document. Data structures that are components of a voice-enabled document include audio playback files and a logical association that links the audio playback files to user-selectable fields, and to a set of voice commands.
    Type: Application
    Filed: October 11, 2012
    Publication date: April 17, 2014
    Applicant: INTERMEC IP CORP.
    Inventors: Paul Maltseff, Roger Byford, Jim Logan
  • Publication number: 20140067395
    Abstract: A system and method are described for engaging an audience in a conversational advertisement. A conversational advertising system converses with an audience using spoken words. The conversational advertising system uses a speech recognition application to convert an audience's spoken input into text and a text-to-speech application to transform the text of a response into speech that is played to the audience. The conversational advertising system follows an advertisement script to guide the audience in a conversation.
    Type: Application
    Filed: August 28, 2012
    Publication date: March 6, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Sundar Balasubramanian, Michael McSherry, Aaron Sheedy
  • Publication number: 20140058732
    Abstract: Techniques disclosed herein include systems and methods for managing user interface responses to user input, including spoken queries and commands. This includes providing an incremental user interface (UI) response based on multiple recognition results about user input that are received with different delays. Such techniques include providing an initial response to a user at an early time, before remote recognition results are available. Systems herein can respond incrementally by initiating an initial UI response based on first recognition results, and then modifying the initial UI response after receiving secondary recognition results. Because the initial response begins immediately rather than waiting for results from all recognizers, the delay the user perceives before complete results are rendered is reduced.
    Type: Application
    Filed: August 21, 2012
    Publication date: February 27, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Martin Labsky, Tomas Macek, Ladislav Kunc, Jan Kleindienst
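The render-then-revise pattern in the abstract above can be sketched as below. The function names and the blocking remote call are illustrative assumptions; a real system would receive the remote result asynchronously:

```python
def incremental_ui(local_result, fetch_remote_result, render):
    """Render the fast local recognition result immediately, then modify
    the UI once the slower remote recognition result arrives."""
    render(local_result, final=False)   # initial UI response, low latency
    remote = fetch_remote_result()      # blocks until remote results arrive
    if remote and remote != local_result:
        render(remote, final=True)      # modify the initial UI response
        return remote
    render(local_result, final=True)    # remote agreed (or was empty)
    return local_result
```

The perceived latency is that of the local recognizer; the remote result only refines what is already on screen.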
  • Publication number: 20140025380
    Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.
    Type: Application
    Filed: July 18, 2012
    Publication date: January 23, 2014
    Applicant: International Business Machines Corporation
    Inventors: Fernando Luiz Koch, Julio Nogima
  • Publication number: 20130332167
    Abstract: According to some aspects, a method of providing an interactive audio presentation, at least in part, by traversing a plurality of audio animations is provided, each audio animation comprising a plurality of frames, each of the plurality of frames comprising a duration, at least one audio element, and at least one gate indicating criteria for transitioning to, and identification of, a subsequent frame and/or a subsequent animation. The method comprises rendering a first audio animation, receiving input from the user associated with the presentation, selecting a second audio animation based, at least in part, on the input, and rendering the second audio animation. Some aspects include a system for performing the above method, and some aspects include a computer-readable medium storing instructions that perform the above method when executed by at least one processor.
    Type: Application
    Filed: June 12, 2012
    Publication date: December 12, 2013
    Applicant: Nuance Communications, Inc.
    Inventor: Robert M. Kilgore
  • Publication number: 20130325474
    Abstract: Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: May 31, 2012
    Publication date: December 5, 2013
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, JR.
  • Publication number: 20130304471
    Abstract: A method, an apparatus and an article of manufacture for contextual voice query dilation in a Spoken Web search. The method includes determining a context in which a voice query is created, generating a set of multiple voice query terms based on the context and information derived by a speech recognizer component pertaining to the voice query, and processing the set of query terms with at least one dilation operator to produce a dilated set of queries. A method for performing a search on a voice query is provided, including generating a set of multiple query terms based on information derived by a speech recognizer component processing a voice query, processing the set with multiple dilation operators to produce multiple dilated sub-sets of query terms, selecting at least one query term from each dilated sub-set to compose a query set, and performing a search on the query set.
    Type: Application
    Filed: May 14, 2012
    Publication date: November 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nitendra Rajput, Kundan Shrivastava
  • Publication number: 20130289994
    Abstract: Techniques disclosed herein include systems and methods that enable a voice trigger that wakes up an electronic device, or causes the device to make additional voice commands active, without manual initiation of voice command functionality. In addition, such a voice trigger is dynamically programmable or customizable. A speaker can program or designate a particular phrase as the voice trigger. In general, techniques herein execute a voice-activated wake-up system that operates on a digital signal processor (DSP) or other low-power, secondary processing unit of an electronic device instead of running on a central processing unit (CPU). A speech recognition manager runs two speech recognition systems on an electronic device. The CPU dynamically creates a compact speech system for the DSP. Such a compact system can be continuously run during a standby mode without quickly exhausting a battery supply.
    Type: Application
    Filed: April 26, 2012
    Publication date: October 31, 2013
    Inventors: Michael Jack Newman, Robert Roth, William D. Alexander, Paul van Mulbregt
  • Publication number: 20130262116
    Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
    Type: Application
    Filed: March 27, 2012
    Publication date: October 3, 2013
    Applicant: NOVOSPEECH
    Inventor: Yossef Ben-Ezra
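The search for an element subsequence common to several (overlapping-segment) sequences, as described in the abstract above, can be sketched with a brute-force illustration. Drawing candidates from the first sequence and requiring contiguous matches are simplifying assumptions for clarity:

```python
def _contains(seq, sub):
    # True if `sub` occurs as a contiguous run inside `seq`.
    return any(seq[k:k + len(sub)] == sub for k in range(len(seq) - len(sub) + 1))

def common_subsequence(sequences, min_count):
    """Return the longest contiguous subsequence of the first sequence that
    appears in at least `min_count` of the given sequences (brute force)."""
    first = sequences[0]
    best = []
    for i in range(len(first)):
        for j in range(len(first), i, -1):  # try longest candidates first
            cand = first[i:j]
            hits = sum(1 for s in sequences if _contains(s, cand))
            if hits >= min_count and len(cand) > len(best):
                best = cand
    return best
```

In the patent's setting, the sequences would be element sequences extracted from the initial signal segment and from segments partially overlapping it, so the common subsequence is the part robust to segmentation boundaries.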
  • Publication number: 20130262114
    Abstract: Different advantageous embodiments provide a crowdsourcing method for modeling user intent in conversational interfaces. One or more stimuli are presented to a plurality of describers. One or more sets of describer data are captured from the plurality of describers using a data collection mechanism. The one or more sets of describer data are processed to generate one or more models. Each of the one or more models is associated with a specific stimulus from the one or more stimuli.
    Type: Application
    Filed: April 3, 2012
    Publication date: October 3, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Christopher John Brockett, Piali Choudhury, William Brennan Dolan, Yun-Cheng Ju, Patrick Pantel, Noelle Mallory Sophy, Svitlana Volkova
  • Publication number: 20130226576
    Abstract: Speech recognition processing captures phonemes of words in a spoken speech string and retrieves the text of words corresponding to particular combinations of phonemes from a phoneme dictionary. A text-to-speech synthesizer can then produce and substitute a synthesized pronunciation of that word in the speech string. If the speech recognition processing fails to recognize a particular combination of phonemes of a word as spoken, as may occur when a word is spoken with an accent or when the speaker has a speech impediment, the speaker is prompted to clarify the word by entering it as text from a keyboard or the like for storage in the phoneme dictionary, such that a synthesized pronunciation of the word can be played out when the initially unrecognized spoken word is again encountered in a speech string. This improves intelligibility, particularly for conference calls.
    Type: Application
    Filed: February 23, 2012
    Publication date: August 29, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Peeyush Jaiswal, Burt Leo Vialpando, Fang Wang
  • Publication number: 20130191128
    Abstract: A continuous phonetic recognition method using a semi-Markov model, a system for processing the method, and a recording medium storing the method. In an embodiment of the phonetic recognition method of recognizing phones using a speech recognition system, a phonetic data recognition device receives speech, and a phonetic data processing device recognizes phones from the received speech using a semi-Markov model.
    Type: Application
    Filed: August 28, 2012
    Publication date: July 25, 2013
    Applicant: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Chang Dong Yoo, Sung Woong Kim
  • Publication number: 20130179169
    Abstract: A Chinese text readability assessing system analyzes and evaluates the readability of text data. A word segmentation module compares the text data with a corpus to obtain a plurality of word segments from the text data and provide part-of-speech settings corresponding to the word segments. A readability index analysis module analyzes the word segments and the part-of-speech settings based on readability indices to calculate index values of the readability indices in the text data. The index values are inputted to a readability mathematical model in a knowledge-evaluated training module, and the readability mathematical model produces a readability analysis result. Accordingly, the Chinese text readability assessing system of the present invention evaluates the readability of Chinese texts by word segmentation and the readability indices analysis in conjunction with the readability mathematical model.
    Type: Application
    Filed: July 5, 2012
    Publication date: July 11, 2013
    Applicant: NATIONAL TAIWAN NORMAL UNIVERSITY
    Inventors: Yao-Ting Sung, Ju-Ling Chen
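The pipeline in the abstract above (word segments with part-of-speech tags, index values, then a readability model) can be sketched as below. The specific indices and the linear model are illustrative assumptions, not the patented readability mathematical model:

```python
def readability_indices(segments):
    """Compute toy readability index values from (word, part_of_speech)
    segments produced by a word segmentation module."""
    words = [w for w, _ in segments]
    avg_len = sum(len(w) for w in words) / len(words)
    # Treat nouns ("N") and verbs ("V") as content words, for illustration.
    content_ratio = sum(1 for _, pos in segments if pos in {"N", "V"}) / len(segments)
    return {"avg_word_length": avg_len, "content_word_ratio": content_ratio}

def readability_score(indices, weights):
    # A stand-in "readability mathematical model": weighted sum of indices.
    return sum(weights[k] * v for k, v in indices.items())
```

In the patented system, the index values would instead feed a model trained on expert-evaluated texts.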
  • Publication number: 20130166302
    Abstract: Aspects of customizing digital signage are addressed. For example, an audio feed may be analyzed for keywords occurring in potential customers' speech. These keywords are then employed to customize display screens of a digital display.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 27, 2013
    Applicant: NCR Corporation
    Inventor: Brennan Eul I. Mercado