Endpoint Detection Patents (Class 704/248)
  • Patent number: 11749296
    Abstract: A voice capturing method includes the following operations: storing, by a buffer, voice data from a plurality of microphones; determining, by a processor, whether a target speaker exists and whether a direction of the target speaker changes according to the voice data and target speaker information; inserting a voice segment corresponding to a previous tracking direction into a current position in the voice data to generate fusion voice data when the target speaker exists and the direction of the target speaker changes from the previous tracking direction to a current tracking direction; performing, by the processor, a voice enhancement process on the fusion voice data according to the current tracking direction to generate enhanced voice data; performing, by the processor, a voice shortening process on the enhanced voice data to generate voice output data; and playing, by a playing circuit, the voice output data.
    Type: Grant
    Filed: September 27, 2021
    Date of Patent: September 5, 2023
    Assignee: REALTEK SEMICONDUCTOR CORPORATION
    Inventors: Chung-Shih Chu, Ming-Tang Lee, Chieh-Min Tsai
  • Patent number: 11508362
    Abstract: A voice recognition method of an artificial intelligence robot device is disclosed. The voice recognition method includes collecting a first voice spoken by a user and determining whether a wake-up word of the artificial intelligence robot device is recognized based on the collected first voice; if the wake-up word is not recognized, sensing a location of the user using at least one sensor and determining whether the sensed location of the user is included in a set voice collection range; if the location of the user is included in the voice collection range, learning the first voice and determining a noise state of the first voice based on the learned first voice; collecting a second voice in an opposite direction of the location of the user according to a result of the determined noise state of the first voice; and extracting a feature value of a noise based on the second voice and removing the extracted feature value of the noise from the first voice to obtain the wake-up word.
    Type: Grant
    Filed: September 18, 2020
    Date of Patent: November 22, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Inho Lee, Junmin Lee, Keunsang Lee
  • Patent number: 11250849
    Abstract: A voice wake-up apparatus used in an electronic device that includes a voice activity detection circuit, a storage circuit and a smart detection circuit is provided. The voice activity detection circuit receives an input sound signal and detects a voice activity section of the input sound signal. The storage circuit stores a predetermined voice sample. The smart detection circuit receives the input sound signal to perform a time domain and a frequency domain detection on the voice activity section to generate a syllable and frequency characteristic detection result, compare the syllable and frequency characteristic detection result with the predetermined voice sample and generate a wake-up signal to a processing circuit of the electronic device when the syllable and frequency characteristic detection result matches the predetermined voice sample to wake up the processing circuit.
    Type: Grant
    Filed: October 24, 2019
    Date of Patent: February 15, 2022
    Assignee: REALTEK SEMICONDUCTOR CORPORATION
    Inventors: Chi-Te Wang, Wen-Yu Huang
  • Patent number: 11024301
    Abstract: Methods and systems for modification of electronic system operation based on acoustic ambience classification are presented. In an example method, at least one audio signal present in a physical environment of a user is detected. The at least one audio signal is analyzed to extract at least one audio feature from the audio signal. The audio signal is classified based on the audio feature to produce at least one classification of the audio signal. Operation of an electronic system interacting with the user in the physical environment is modified based on the classification of the audio signal.
    Type: Grant
    Filed: August 2, 2019
    Date of Patent: June 1, 2021
    Assignee: GRACENOTE, INC.
    Inventors: Suresh Jeyachandran, Vadim Brenner, Markus K. Cremer
  • Patent number: 11004441
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
    Type: Grant
    Filed: August 14, 2019
    Date of Patent: May 11, 2021
    Assignee: Google LLC
    Inventors: Michael Buchanan, Pravir Kumar Gupta, Christopher Bo Tandiono
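The word-comparison endpointing in the abstract above can be sketched compactly. The following is a hypothetical Python reading, not the patented implementation: it counts text samples whose terms match the transcription exactly (the first value) against samples that start with the transcription but add further terms (the second value), and the final decision rule is a guess since the abstract only says the two values are compared.

```python
def classify_utterance(transcription, text_samples):
    """Classify a transcription as a likely complete or incomplete utterance.

    exact:    samples that match the transcription with no additional terms
    extended: samples that match the transcription but add one or more terms
    The >-comparison at the end is an assumption; the abstract does not
    disclose the actual decision rule."""
    terms = transcription.lower().split()
    exact = 0
    extended = 0
    for sample in text_samples:
        words = sample.lower().split()
        if words == terms:
            exact += 1
        elif len(words) > len(terms) and words[:len(terms)] == terms:
            extended += 1
    return "likely incomplete" if extended > exact else "likely complete"

samples = [
    "what is the weather",
    "what is the weather today",
    "what is the weather in paris",
]
print(classify_utterance("what is the weather", samples))  # → likely incomplete
```

Intuitively, if most samples in the collection continue past the transcription, the user is probably mid-sentence and the endpointer should keep listening.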
  • Patent number: 10997979
    Abstract: A voice recognition device provided with a processor configured to determine a breathing period immediately before uttering which is a period in which a lip of a target person has moved with breathing immediately before uttering based on a captured image of the lip of the target person, to detect a voice period which is a period in which the target person is uttering without including the breathing period immediately before uttering determined above, based on the captured image of the lip of the target person, to acquire a voice of the target person, and to recognize the voice of the target person based on the voice of the target person acquired above within the voice period detected above.
    Type: Grant
    Filed: June 14, 2019
    Date of Patent: May 4, 2021
    Assignee: CASIO COMPUTER CO., LTD.
    Inventors: Kouichi Nakagome, Keisuke Shimada
  • Patent number: 10909987
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: February 2, 2021
    Assignee: Google LLC
    Inventor: Matthew Sharifi
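The multi-device arbitration described above reduces to comparing hotword likelihoods across devices. This is a minimal sketch of one plausible reading (the abstract says only that the two values are compared); the threshold and the tie-breaking rule are assumptions.

```python
def should_run_asr(local_score, remote_scores, threshold=0.5):
    """Decide whether this device should initiate speech recognition.

    A device proceeds only if its own hotword likelihood clears a
    threshold and is at least as high as every score reported by the
    other devices that heard the same utterance. Ties favour the local
    device; both choices are assumptions, not the patent's rule."""
    if local_score < threshold:
        return False
    return all(local_score >= r for r in remote_scores)

print(should_run_asr(0.9, [0.4, 0.7]))  # highest-scoring device wins: True
print(should_run_asr(0.6, [0.8]))       # defers to the other device: False
```

In practice this keeps several co-located assistants from all answering the same "OK Google": only the device that heard the hotword most clearly streams audio to the recognizer.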
  • Patent number: 10872620
    Abstract: Embodiments of the present disclosure provide a voice detection method. An audio signal can be divided into a plurality of audio segments. Audio characteristics can be extracted from each of the plurality of audio segments. The audio characteristics of the respective audio segment include a time domain characteristic and a frequency domain characteristic of the respective audio segment. At least one target voice segment can be detected from the plurality of audio segments according to the audio characteristics of the plurality of audio segments.
    Type: Grant
    Filed: May 1, 2018
    Date of Patent: December 22, 2020
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventor: Haijin Fan
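The abstract above pairs a time-domain with a frequency-domain characteristic per segment. As an illustration only, the sketch below uses short-time energy (time domain) and spectral centroid via a naive DFT (frequency domain); the patent does not name its actual features, and all thresholds here are invented.

```python
import math
import cmath

def frame_features(frame, sample_rate=8000):
    """Extract one time-domain and one frequency-domain characteristic.

    Short-time energy and spectral centroid are stand-ins for the
    unnamed features in the abstract."""
    energy = sum(x * x for x in frame) / len(frame)      # time domain
    n = len(frame)
    mags = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2)]
    total = sum(mags) or 1e-12
    centroid = sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total
    return energy, centroid

def is_voice(frame, energy_thresh=0.01, centroid_range=(80.0, 1500.0)):
    """Flag a frame as a target voice segment candidate when both the
    time-domain and the frequency-domain characteristics look speech-like."""
    energy, centroid = frame_features(frame)
    return energy > energy_thresh and centroid_range[0] < centroid < centroid_range[1]

# A 250 Hz tone frame reads as voice-like; near-silence does not.
tone = [0.5 * math.sin(2 * math.pi * 250 * t / 8000) for t in range(64)]
silence = [0.0001] * 64
print(is_voice(tone), is_voice(silence))  # → True False
```

Combining both domains is what lets such detectors reject broadband noise (high energy, unspeech-like spectrum) as well as faint hum (speech-like spectrum, low energy).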
  • Patent number: 10867607
    Abstract: A voice dialog device includes a sight line detection unit configured to detect a sight line of a user, a voice acquiring unit configured to acquire voice pronounced by the user, and a processor. The processor is configured to perform a step of acquiring a result of recognizing the voice, a step of determining whether or not the user is driving, and a step of determining whether or not the voice dialog device has a dialog with the user. When the detected sight line of the user is in a certain direction, and a start keyword has been detected from the voice, the processor determines that the user has started a dialog. The processor switches the certain direction based on whether the user is driving.
    Type: Grant
    Filed: July 12, 2019
    Date of Patent: December 15, 2020
    Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventors: Atsushi Ikeno, Muneaki Shimada, Kota Hatanaka, Toshifumi Nishijima, Fuminori Kataoka, Hiromi Tonegawa, Norihide Umeyama
  • Patent number: 10595117
    Abstract: Personal audio systems and methods are disclosed. A personal audio system includes a class table storing processing parameters respectively associated with a plurality of annoyance noise classes, a controller, and a processor. The controller identifies an annoyance noise class of an annoyance noise included in an ambient audio stream and retrieves, from the class table, one or more processing parameters associated with the identified annoyance noise class. The processor processes the ambient audio stream according to the one or more retrieved processing parameters to provide a personal audio stream. The processor includes a pitch tracker to identify a fundamental frequency of the annoyance noise and a filter bank including a band reject filter tuned to the fundamental frequency.
    Type: Grant
    Filed: March 24, 2017
    Date of Patent: March 17, 2020
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Gints Klimanis, Anthony Parks, Jeff Baker
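The band-reject filter tuned to a tracked fundamental, as described above, can be sketched with a standard RBJ-cookbook notch biquad. This is an illustrative stand-in, assuming the pitch tracker has already estimated the fundamental f0; the patent's actual filter bank design and Q are not disclosed.

```python
import math

def notch_coeffs(f0, fs, q=5.0):
    """RBJ audio-EQ-cookbook band-reject biquad tuned to f0 (Hz).

    Returns normalized feed-forward (b) and feedback (a1, a2) coefficients."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [1 / a0, -2 * math.cos(w0) / a0, 1 / a0]
    a = [-2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a

def biquad(samples, b, a):
    """Direct-form-I filtering of a sample sequence."""
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y)
        x1, x2, y1, y2 = x, x1, y, y1
    return out

fs, f0 = 8000, 440.0  # hypothetical annoyance-noise fundamental
tone = [math.sin(2 * math.pi * f0 * t / fs) for t in range(4000)]
filtered = biquad(tone, *notch_coeffs(f0, fs))
tail = filtered[2000:]  # skip the filter's transient
print(max(abs(s) for s in tail) < 0.05)  # tone at f0 is strongly attenuated
```

Because the biquad's zeros sit exactly on the unit circle at f0, the steady-state gain at the fundamental is zero, while frequencies well outside the notch bandwidth (roughly f0/Q) pass nearly unchanged.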
  • Patent number: 10593330
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: March 17, 2020
    Assignee: Google LLC
    Inventor: Matthew Sharifi
  • Patent number: 10546063
    Abstract: Natural language processing of raw text data for optimal sentence boundary placement. Raw text is extracted from a document and subject to cleaning. The extracted raw text is examined to identify preliminary sentence boundaries, which are used to identify potential sentences in the raw text. One or more potential sentences are assigned a well-formedness score. A value of the score correlates to whether the potential sentence is a truncated/ill-formed sentence or a well-formed sentence. One or more preliminary sentence boundaries are optimized depending on the value of the score of the potential sentence(s). Accordingly, the processing herein is an optimization that creates a sentence boundary optimized output.
    Type: Grant
    Filed: December 13, 2016
    Date of Patent: January 28, 2020
    Assignee: International Business Machines Corporation
    Inventors: Charles E. Beller, Chengmin Ding, Allen Ginsberg, Elinna Shek
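The boundary-optimization flow above (preliminary split, well-formedness scoring, merge of ill-formed fragments) can be sketched as follows. The scoring cues and the merge-forward strategy are toy assumptions; the patent does not specify its scoring model.

```python
import re

def well_formedness(sentence):
    """Toy well-formedness score in [0, 1] built from surface cues.

    The cues and weights are illustrative only."""
    score = 0.0
    if sentence and sentence[0].isupper():
        score += 0.4                      # starts like a sentence
    if re.search(r"[.!?]$", sentence):
        score += 0.3                      # ends at terminal punctuation
    if len(sentence.split()) >= 3:
        score += 0.3                      # long enough to carry a clause
    return score

def optimize_boundaries(raw_text, threshold=0.8):
    """Split at preliminary boundaries (punctuation, line breaks), then
    merge any candidate scored as truncated/ill-formed into the next one."""
    candidates = [c.strip()
                  for c in re.split(r"(?<=[.!?])\s+|\n+", raw_text)
                  if c.strip()]
    sentences = []
    for cand in candidates:
        if sentences and well_formedness(sentences[-1]) < threshold:
            sentences[-1] = sentences[-1] + " " + cand  # repair truncation
        else:
            sentences.append(cand)
    return sentences

raw = "Dr. Smith arrived\nlate to the meeting. The agenda was short."
print(optimize_boundaries(raw))
```

Here the preliminary boundaries after "Dr." and at the hard line break produce ill-formed fragments, and the optimization re-joins them into a single well-formed sentence.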
  • Patent number: 10304450
    Abstract: A method is implemented at an electronic device for visually indicating a voice processing state. The electronic device includes at least an array of full color LEDs, one or more microphones and a speaker. The electronic device collects via the one or more microphones audio inputs from an environment in proximity to the electronic device, and processes the audio inputs by identifying and/or responding to voice inputs from a user in the environment. A state of the processing is then determined from among a plurality of predefined voice processing states, and for each of the full color LEDs, a respective predetermined LED illumination specification is determined in association with the determined voice processing state. In accordance with the identified LED illumination specifications of the full color LEDs, the electronic device synchronizes illumination of the array of full color LEDs to provide a visual pattern indicating the determined voice processing state.
    Type: Grant
    Filed: May 10, 2017
    Date of Patent: May 28, 2019
    Assignee: GOOGLE LLC
    Inventors: Jung Geun Tak, Amy Martin, Willard McClellan
  • Patent number: 10249292
    Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: April 2, 2019
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
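The segment/change-point bookkeeping in the abstract above, downstream of the LSTM RNN labeller, can be sketched independently of the network itself. The sketch below assumes per-frame labels are already available and merely collapses them into labelled segments and change points, as the abstract defines them.

```python
def segments_from_labels(frame_labels):
    """Collapse per-frame labels (as an LSTM-RNN labeller might emit)
    into (label, start, end) segments plus the list of change points,
    where each change point is a transition between speaker-1,
    speaker-2, and silence."""
    segments, change_points = [], []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            segments.append((frame_labels[start], start, i))
            if i < len(frame_labels):
                change_points.append(i)
            start = i
    return segments, change_points

labels = ["spk1", "spk1", "silence", "spk2", "spk2", "spk2"]
segs, cps = segments_from_labels(labels)
print(segs)  # → [('spk1', 0, 2), ('silence', 2, 3), ('spk2', 3, 6)]
print(cps)   # → [2, 3]
```

Speech recognition would then run only on the segments labelled with a speaker, skipping the silence segments.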
  • Patent number: 10134398
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
    Type: Grant
    Filed: November 9, 2016
    Date of Patent: November 20, 2018
    Assignee: Google LLC
    Inventor: Matthew Sharifi
  • Patent number: 10056096
    Abstract: Provided herein is an electronic device and method of voice recognition, the method including analyzing an audio signal of a first frame when the audio signal is input and extracting a first feature value; determining a similarity between the first feature value extracted from the audio signal of the first frame and a first feature value extracted from an audio signal of a previous frame; analyzing the audio signal of the first frame and extracting a second feature value when the similarity is below a predetermined threshold value; and comparing the extracted first feature value and the second feature value and at least one feature value corresponding to a pre-defined voice signal and determining whether or not the audio signal of the first frame is a voice signal, and thus the electronic device may detect only a voice section from the audio signal while improving the processing speed.
    Type: Grant
    Filed: July 22, 2016
    Date of Patent: August 21, 2018
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Jong-uk Yoo
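The two-stage idea above, where a cheap first feature gates extraction of a costlier second feature, can be sketched as a small stateful detector. The similarity measure, the stand-in features, and every threshold below are assumptions; the abstract does not specify them.

```python
class TwoStageVAD:
    """Sketch of the abstract's two-stage check: when the cheap first
    feature barely changes from the previous frame, the previous
    decision is reused and the costlier second feature is skipped;
    otherwise the second feature is extracted and the frame is
    classified afresh."""

    def __init__(self, sim_threshold=0.9):
        self.sim_threshold = sim_threshold
        self.prev_first = None
        self.prev_decision = False
        self.second_stage_runs = 0  # counts expensive extractions

    def process(self, frame):
        first = sum(abs(x) for x in frame) / len(frame)  # cheap first feature
        if self.prev_first is not None:
            denom = max(first, self.prev_first, 1e-9)
            similarity = 1.0 - abs(first - self.prev_first) / denom
            if similarity >= self.sim_threshold:
                return self.prev_decision        # second stage skipped
        self.second_stage_runs += 1
        second = max(abs(x) for x in frame)       # stand-in "expensive" feature
        decision = first > 0.05 and second > 0.1
        self.prev_first, self.prev_decision = first, decision
        return decision

vad = TwoStageVAD()
print(vad.process([0.2] * 10), vad.process([0.2] * 10))  # → True True
print(vad.second_stage_runs)  # → 1 (second call reused the decision)
```

The processing-speed gain claimed in the abstract comes exactly from this skip: in steady audio, most frames never reach the second stage.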
  • Patent number: 9916843
    Abstract: A voice processing apparatus including a memory, and a processor coupled to the memory and the processor configured to acquire a first input signal containing a first voice, and a second input signal containing a second voice, obtain a first signal intensity of the first input signal, and a second signal intensity of the second input signal, specify a correlation coefficient between a time sequence of the first signal intensity and a time sequence of the second signal intensity, determine whether the first voice and the second voice are in the conversation state or not based on the specified correlation coefficient, and output information indicating an association between the first voice and the second voice when it is determined that the first voice and the second voice are in the conversation state.
    Type: Grant
    Filed: September 15, 2016
    Date of Patent: March 13, 2018
    Assignee: FUJITSU LIMITED
    Inventors: Taro Togawa, Sayuri Kohmura, Takeshi Otani
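The correlation test described above exploits turn-taking: in a real conversation one side is loud while the other is quiet, so the two intensity time series tend to be negatively correlated. The sketch below computes a Pearson coefficient over frame intensities; the decision threshold is an assumption, since the abstract only says the determination is based on the coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def in_conversation(intensity_a, intensity_b, threshold=-0.3):
    """Declare a conversation when the intensity series are sufficiently
    anti-correlated (turn-taking). The -0.3 cutoff is illustrative."""
    return pearson(intensity_a, intensity_b) <= threshold

a = [0.9, 0.8, 0.1, 0.1, 0.9, 0.2]  # speaker A talks, pauses, talks
b = [0.1, 0.2, 0.8, 0.9, 0.1, 0.7]  # speaker B fills A's pauses
print(in_conversation(a, b))  # → True
```

Two recordings of the same loud environment, by contrast, would correlate positively and be rejected as not conversing.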
  • Patent number: 9818407
    Abstract: An efficient audio streaming method and apparatus includes a client process implemented on a client or local device and a server process implemented on a remote server or server(s). The client process and server process each have speech recognition components and communicate over a network, and together efficiently manage the detection of speech in an audio signal streamed by the local device to the server for speech recognition and potentially further processing at the server. The client process monitors audio input and in a first detection stage, implements endpointing on the local device to determine when speech is detected. The client process may further determine if a “wakeword” is detected, and then the client process opens a connection and begins streaming audio to the server process via the network.
    Type: Grant
    Filed: February 7, 2013
    Date of Patent: November 14, 2017
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Hugh Evan Secker-Walker, Kenneth John Basye, Nikko Strom, Ryan Paul Thomas
  • Patent number: 9799332
    Abstract: A communication interface apparatus for a system and a plurality of users is provided. The communication interface apparatus for the system and the plurality of users includes a first process unit configured to receive voice information and face information from at least one user, and determine whether the received voice information is voice information of at least one registered user based on user models corresponding to the respective received voice information and face information; a second process unit configured to receive the face information, and determine whether the at least one user's attention is on the system based on the received face information; and a third process unit configured to receive the voice information, analyze the received voice information, and determine whether the received voice information is substantially meaningful to the system based on a dialog model that represents conversation flow on a situation basis.
    Type: Grant
    Filed: November 9, 2010
    Date of Patent: October 24, 2017
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Nam-Hoon Kim, Chi-Youn Park, Jeong-Mi Cho, Jeong-su Kim
  • Patent number: 9740690
    Abstract: Some embodiments include a computer-implement method of producing a flexible sentence syntax to facilitate one or more computer applications to generate and publish sentence expressions. For example, the method can include providing a developer interface to define a flexible sentence syntax that controls one or more sentences publishable by an application service. A developer of the application service can customize the flexible sentence syntax including selecting at least one of selectable tokens that is associated with another element to incorporate in the flexible sentence syntax. Based on the selected token, a computing device can generate and publish a target sentence according to the flexible sentence syntax on the application service's behalf.
    Type: Grant
    Filed: February 22, 2017
    Date of Patent: August 22, 2017
    Assignee: Facebook, Inc.
    Inventors: Ling Bao, Hugo Johan van Heuven, Jiangbo Miao
  • Patent number: 9601132
    Abstract: A method comprising: detecting a first acoustic signal by using a microphone array; detecting a first angle associated with a first incident direction of the first acoustic signal; and storing, in a memory, a representation of the first acoustic signal and a representation of the first angle.
    Type: Grant
    Filed: February 17, 2016
    Date of Patent: March 21, 2017
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Beakkwon Son, Gangyoul Kim, Namil Lee, Hochul Hwang, Jongmo Kum, Minho Bae
  • Patent number: 9520140
    Abstract: Improved audio data processing methods and systems are provided. Some implementations involve dividing frequency domain audio data into a plurality of subbands and determining amplitude modulation signal values for each of the plurality of subbands. A band-pass filter may be applied to the amplitude modulation signal values in each subband, to produce band-pass filtered amplitude modulation signal values for each subband. The band-pass filter may have a central frequency that exceeds an average cadence of human speech. A gain may be determined for each subband based, at least in part, on a function of the amplitude modulation signal values and the band-pass filtered amplitude modulation signal values. The determined gain may be applied to each subband.
    Type: Grant
    Filed: March 31, 2014
    Date of Patent: December 13, 2016
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Erwin Goesnar, Glenn N. Dickins, David Gunawan
  • Patent number: 9514752
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
    Type: Grant
    Filed: April 1, 2016
    Date of Patent: December 6, 2016
    Assignee: Google Inc.
    Inventor: Matthew Sharifi
  • Patent number: 9443536
    Abstract: Disclosed are an apparatus and method of deducing a user's intention using motion information. The user's intention deduction apparatus includes a speech intention determining unit configured to predict a speech intention regarding a user's speech using motion information sensed by at least one motion capture sensor, and a controller configured to control operation of detecting a voice section from a received sound signal based on the predicted speech intention.
    Type: Grant
    Filed: April 29, 2010
    Date of Patent: September 13, 2016
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jeong-Mi Cho, Jeong-Su Kim, Won-Chul Bang, Nam-Hoon Kim
  • Patent number: 9418650
    Abstract: In embodiments, apparatuses, methods and storage media are described that are associated with training adaptive speech recognition systems (“ASR”) using audio and text obtained from captioned video. In various embodiments, the audio and caption may be aligned for identification, such as according to a start and end time associated with a caption, and the alignment may be adjusted to better fit audio to a given caption. In various embodiments, the aligned audio and caption may then be used for training if an error value associated with the audio and caption demonstrates that the audio and caption will aid in training the ASR. In various embodiments, filters may be used on audio and text prior to training. Such filters may be used to exclude potential training audio and text based on filter criteria. Other embodiments may be described and claimed.
    Type: Grant
    Filed: September 25, 2013
    Date of Patent: August 16, 2016
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Sujeeth S. Bharadwaj, Suri B. Medapati
  • Patent number: 9318107
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
    Type: Grant
    Filed: April 1, 2015
    Date of Patent: April 19, 2016
    Assignee: Google Inc.
    Inventor: Matthew Sharifi
  • Patent number: 9147400
    Abstract: The present invention relates to a method and apparatus for generating speaker-specific spoken passwords. One embodiment of a method for generating a spoken password for use by a speaker of interest includes identifying one or more speech features that best distinguish the speaker of interest from a plurality of impostor speakers and incorporating the speech features in the spoken password.
    Type: Grant
    Filed: December 21, 2011
    Date of Patent: September 29, 2015
    Assignee: SRI INTERNATIONAL
    Inventor: Nicolas Scheffer
  • Patent number: 9123337
    Abstract: Indexing digitized speech with words represented in the digitized speech, with a multimodal digital audio editor operating on a multimodal device supporting modes of user interaction, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor operatively coupled to an ASR engine, including providing by the multimodal digital audio editor to the ASR engine digitized speech for recognition; receiving in the multimodal digital audio editor from the ASR engine recognized user speech including a recognized word, also including information indicating where, in the digitized speech, representation of the recognized word begins; and inserting by the multimodal digital audio editor the recognized word, in association with the information indicating where, in the digitized speech, representation of the recognized word begins, into a speech recognition grammar, the speech recognition grammar voice enabling user interface commands of the multimodal digital audio editor.
    Type: Grant
    Filed: March 11, 2014
    Date of Patent: September 1, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Charles W. Cross, Frank L. Jania
  • Patent number: 9043207
    Abstract: The present invention relates to a method for speaker recognition, comprising the steps of obtaining and storing speaker information for at least one target speaker; obtaining a plurality of speech samples from a plurality of telephone calls from at least one unknown speaker; classifying the speech samples according to the at least one unknown speaker thereby providing speaker-dependent classes of speech samples; extracting speaker information for the speech samples of each of the speaker-dependent classes of speech samples; combining the extracted speaker information for each of the speaker-dependent classes of speech samples; comparing the combined extracted speaker information for each of the speaker-dependent classes of speech samples with the stored speaker information for the at least one target speaker to obtain at least one comparison result; and determining whether one of the at least one unknown speakers is identical with the at least one target speaker based on the at least one comparison result.
    Type: Grant
    Filed: November 12, 2009
    Date of Patent: May 26, 2015
    Assignee: Agnitio S.L.
    Inventors: Johan Nikolaas Langehoven Brummer, Luis Buera Rodriguez, Marta Garcia Gomar
  • Publication number: 20140379345
    Abstract: Disclosed are an apparatus and a method for detecting a speech endpoint using a WFST. The apparatus in accordance with an embodiment of the present invention includes: a speech decision portion configured to receive frame units of feature vector converted from a speech signal and to analyze and classify the received feature vector into a speech class or a noise class; a frame level WFST configured to receive the speech class and the noise class and to convert the speech class and the noise class to a WFST format; a speech level WFST configured to detect a speech endpoint by analyzing a relationship between the speech class and noise class and a preset state; a WFST combination portion configured to combine the frame level WFST with the speech level WFST; and an optimization portion configured to optimize the combined WFST having the frame level WFST and the speech level WFST combined therein to have a minimum route.
    Type: Application
    Filed: March 25, 2014
    Publication date: December 25, 2014
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Hoon CHUNG, Sung-Joo Lee, Yun-Keun Lee
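The frame-level/speech-level composition above can be approximated, for illustration, by a tiny state machine over per-frame speech/noise classes: once enough speech has been seen, a long enough run of noise frames marks the endpoint. The actual patent composes and optimizes real WFSTs; the thresholds here are invented.

```python
def detect_endpoint(frame_classes, min_speech=3, trailing_noise=5):
    """Stand-in for the composed frame-level/speech-level WFST: after at
    least `min_speech` speech frames, `trailing_noise` consecutive noise
    frames mark the endpoint. Returns the endpoint frame index or None."""
    speech_seen = 0
    noise_run = 0
    for i, cls in enumerate(frame_classes):
        if cls == "speech":
            speech_seen += 1
            noise_run = 0
        else:
            noise_run += 1
            if speech_seen >= min_speech and noise_run >= trailing_noise:
                return i
    return None

frames = ["noise"] * 2 + ["speech"] * 6 + ["noise"] * 7
print(detect_endpoint(frames))  # → 12
```

Encoding the same logic as a WFST, as the patent does, lets the frame-level classifier and the endpoint logic be composed and minimized with standard transducer operations instead of hand-written state machines.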
  • Patent number: 8831942
    Abstract: A method is provided for identifying a gender of a speaker. The method steps include obtaining speech data of the speaker, extracting vowel-like speech frames from the speech data, analyzing the vowel-like speech frames to generate a feature vector having pitch values corresponding to the vowel-like frames, analyzing the pitch values to generate a most frequent pitch value, determining, in response to the most frequent pitch value being between a first pre-determined threshold and a second pre-determined threshold, an output of a male Gaussian Mixture Model (GMM) and an output of a female GMM using the pitch values as inputs to the male GMM and the female GMM, and identifying the gender of the speaker by comparing the output of the male GMM and the output of the female GMM based on a pre-determined criterion.
    Type: Grant
    Filed: March 19, 2010
    Date of Patent: September 9, 2014
    Assignee: Narus, Inc.
    Inventor: Antonio Nucci
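The decision flow above (most frequent pitch value, then GMM comparison only in the ambiguous band) can be sketched as follows. Single Gaussians stand in for the patent's trained GMMs, pitch values are rounded to whole Hz to define "most frequent", and every parameter is illustrative.

```python
import math
from collections import Counter

def gaussian_ll(xs, mean, std):
    """Log-likelihood of samples under a single Gaussian (GMM stand-in)."""
    return sum(-0.5 * ((x - mean) / std) ** 2
               - math.log(std * math.sqrt(2 * math.pi)) for x in xs)

def identify_gender(pitch_values, low=120.0, high=180.0):
    """Most-frequent-pitch decision with a model comparison fallback.

    If the modal pitch falls between the two thresholds, score the
    pitch values under a male and a female model and pick the higher
    likelihood. Thresholds and model parameters are assumptions."""
    mode = Counter(round(p) for p in pitch_values).most_common(1)[0][0]
    if mode < low:
        return "male"
    if mode > high:
        return "female"
    male = gaussian_ll(pitch_values, mean=120.0, std=25.0)
    female = gaussian_ll(pitch_values, mean=210.0, std=30.0)
    return "male" if male > female else "female"

print(identify_gender([112, 118, 118, 121, 115]))  # → male (mode below band)
print(identify_gender([208, 215, 215, 220]))       # → female
```

Restricting the model comparison to the ambiguous pitch band keeps the common cases cheap: clearly low or clearly high modal pitch is decided by thresholds alone.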
  • Patent number: 8805685
    Abstract: Disclosed herein are systems, methods, and tangible computer-readable media for detecting synthetic speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the plurality of speech samples demonstrate little variance over time or are the same, and verifying the plurality of speech samples if the plurality of speech samples demonstrates sufficient variance over time. One embodiment further adds that each of the plurality of speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received.
    Type: Grant
    Filed: August 5, 2013
    Date of Patent: August 12, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Horst J. Schroeter
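The variance test above amounts to comparing feature vectors of the repeated utterances and rejecting when they are implausibly similar. A minimal sketch, assuming each sample is already reduced to a fixed-length feature vector (the `min_variance` threshold is an illustrative assumption):

```python
def verify_liveness(samples, min_variance=1e-3):
    """samples: one feature vector (list of floats) per repetition of the passphrase.
    Natural speech varies between repetitions; identical or near-identical samples
    suggest a replayed or synthesized voice, so verification is denied."""
    n = len(samples)
    dims = len(samples[0])
    total = 0.0
    for d in range(dims):
        vals = [s[d] for s in samples]
        mean = sum(vals) / n
        total += sum((v - mean) ** 2 for v in vals) / n
    avg_var = total / dims  # mean per-dimension variance across repetitions
    return avg_var >= min_variance
```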
  • Patent number: 8798991
    Abstract: A non-speech section detecting device generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not containing voice data based on speech uttered by a person, the device including: a calculating part calculating a bias of a spectrum obtained by converting sound data of each frame into components on a frequency axis; a judging part judging whether the bias is greater than or equal to a given threshold or alternatively smaller than or equal to a given threshold; a counting part counting the number of consecutive frames judged as having a bias greater than or equal to the threshold or alternatively smaller than or equal to the threshold; a count judging part judging whether the obtained number of consecutive frames is greater than or equal to a given value.
    Type: Grant
    Filed: November 13, 2012
    Date of Patent: August 5, 2014
    Assignee: Fujitsu Limited
    Inventors: Nobuyuki Washio, Shoji Hayakawa
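The counting logic above (spectral bias per frame, thresholded, then runs of consecutive frames counted against a minimum length) can be sketched as follows. Only the greater-than-or-equal branch of the judgment is shown, and the threshold and run length are illustrative assumptions:

```python
def detect_nonspeech(bias_per_frame, threshold=0.5, min_run=3):
    """Return (start, end) index pairs for non-speech sections: runs of at least
    min_run consecutive frames whose spectral bias is >= threshold."""
    sections, run_start = [], None
    for i, b in enumerate(bias_per_frame):
        if b >= threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                sections.append((run_start, i - 1))
            run_start = None
    # Close out a run that extends to the end of the data.
    if run_start is not None and len(bias_per_frame) - run_start >= min_run:
        sections.append((run_start, len(bias_per_frame) - 1))
    return sections
```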
  • Patent number: 8793132
    Abstract: An apparatus, method and program for dividing a conversational dialog into utterances. The apparatus includes: a computer processor; a word database for storing spellings and pronunciations of words; a grammar database for storing syntactic rules on words; a pause detecting section which detects a pause location in a channel making a main speech among conversational dialogs inputted in at least two channels; an acknowledgement detecting section which detects an acknowledgement location in a channel not making the main speech; a boundary-candidate extracting section which extracts boundary candidates in the main speech, by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and a recognizing unit which outputs a word string of the main speech segmented by one of the extracted boundary candidates after dividing the segmented speech into optimal utterances in reference to the word database and grammar database.
    Type: Grant
    Filed: December 26, 2007
    Date of Patent: July 29, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Nobuyasu Itoh, Gakuto Kurata
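The boundary-candidate step above pairs pauses in the main channel with nearby acknowledgements ("uh-huh") in the other channel. A minimal sketch, assuming both are given as timestamps in seconds (the `window` parameter is an illustrative assumption for the patent's "predetermined range"):

```python
def boundary_candidates(pause_times, ack_times, window=0.5):
    """Pauses in the main-speech channel become utterance-boundary candidates
    when they fall within +/- window seconds of an acknowledgement in the
    listener's channel."""
    return [p for p in pause_times
            if any(abs(p - a) <= window for a in ack_times)]
```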
  • Patent number: 8781832
    Abstract: Techniques are disclosed for overcoming errors in speech recognition systems. For example, a technique for processing acoustic data in accordance with a speech recognition system comprises the following steps/operations. Acoustic data is obtained in association with the speech recognition system. The acoustic data is recorded using a combination of a first buffer area and a second buffer area, such that the recording of the acoustic data using the combination of the two buffer areas at least substantially minimizes one or more truncation errors associated with operation of the speech recognition system.
    Type: Grant
    Filed: March 26, 2008
    Date of Patent: July 15, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Liam D. Comerford, David Carl Frank, Burn L. Lewis, Leonid Rachevksy, Mahesh Viswanathan
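The two-buffer recording idea above can be illustrated with a small circular pre-buffer that is always running, whose contents are prepended to the main recording when capture is triggered, so leading audio is not truncated. Buffer sizes and class/method names here are hypothetical, not from the patent:

```python
from collections import deque

class DualBufferRecorder:
    """Keep a small circular first buffer always filling; on trigger, copy its
    contents into the second buffer so frames just before the trigger survive."""
    def __init__(self, pre_frames=4):
        self.pre = deque(maxlen=pre_frames)  # circular first buffer
        self.main = []                       # second buffer, filled after trigger
        self.recording = False

    def feed(self, frame):
        if self.recording:
            self.main.append(frame)
        else:
            self.pre.append(frame)           # old frames fall off the front

    def start(self):
        self.recording = True
        self.main = list(self.pre)           # rescue pre-trigger audio
        self.pre.clear()
```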
  • Patent number: 8775182
    Abstract: Machine-readable media, methods, apparatus and system for speech segmentation are described. In some embodiments, a fuzzy rule may be determined to discriminate a speech segment from a non-speech segment. An antecedent of the fuzzy rule may include an input variable and an input variable membership. A consequent of the fuzzy rule may include an output variable and an output variable membership. An instance of the input variable may be extracted from a segment. An input variable membership function associated with the input variable membership and an output variable membership function associated with the output variable membership may be trained. The instance of the input variable, the input variable membership function, the output variable, and the output variable membership function may be operated to determine whether the segment is the speech segment or the non-speech segment.
    Type: Grant
    Filed: April 12, 2013
    Date of Patent: July 8, 2014
    Assignee: Intel Corporation
    Inventors: Robert Du, Ye Tao, Daren Zu
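A fuzzy rule of the kind described above can be sketched with triangular membership functions and min/max inference. The choice of input variables (energy, zero-crossing rate), the rule forms, and all membership parameters are illustrative assumptions, not details from the patent:

```python
def trimf(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def classify_segment(energy, zcr):
    """Two toy fuzzy rules:
    IF energy is HIGH AND zero-crossing rate is LOW  THEN segment is SPEECH
    IF energy is LOW  OR  zero-crossing rate is HIGH THEN segment is NON-SPEECH
    Inputs are assumed normalized to roughly [0, 1]."""
    speech = min(trimf(energy, 0.3, 1.0, 1.7), trimf(zcr, -0.7, 0.0, 0.7))
    nonspeech = max(trimf(energy, -0.7, 0.0, 0.7), trimf(zcr, 0.3, 1.0, 1.7))
    return "speech" if speech > nonspeech else "non-speech"
```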
  • Patent number: 8762149
    Abstract: The present invention refers to a method for verifying the identity of a speaker based on the speaker's voice comprising the steps of: a) receiving a voice utterance; b) using biometric voice data to verify (10) that the speaker's voice corresponds to the speaker whose identity is to be verified based on the received voice utterance; and c) verifying (12, 13) that the received voice utterance is not falsified, preferably after having verified the speaker's voice; d) accepting (16) the speaker's identity to be verified in case both verification steps give a positive result and not accepting (15) the speaker's identity to be verified if either verification step gives a negative result. The invention further refers to a corresponding computer readable medium and a computer.
    Type: Grant
    Filed: December 10, 2008
    Date of Patent: June 24, 2014
    Inventors: Marta Sánchez Asenjo, Alfredo Gutiérrez Navarro, Alberto Martín de los Santos de las Heras, Marta García Gomar
  • Publication number: 20140163986
    Abstract: Disclosed herein is a voice-based CAPTCHA method and apparatus which can perform a CAPTCHA procedure using the voice of a human being. In the voice-based CAPTCHA method, a plurality of uttered sounds of a user are collected. A start point and an end point of a voice from each of the collected uttered sounds are detected and then speech sections are detected. Uttered sounds of the respective detected speech sections are compared with reference uttered sounds, and then it is determined whether the uttered sounds are correctly uttered sounds. It is determined whether the uttered sounds have been made by an identical speaker if it is determined that the uttered sounds are correctly uttered sounds.
    Type: Application
    Filed: December 3, 2013
    Publication date: June 12, 2014
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Sung-Joo LEE, Ho-Young JUNG, Hwa-Jeon SONG, Eui-Sok CHUNG, Byung-Ok KANG, Hoon CHUNG, Jeon-Gue PARK, Hyung-Bae JEON, Yoo-Rhee OH, Yun-Keun LEE
  • Publication number: 20140149117
    Abstract: A system for distinguishing and identifying speech segments originating from speech of one or more relevant speakers in a predefined detection area. The system includes an optical system which outputs optical patterns, each representing audio signals as detected by the optical system in the area within a specific time frame; and a computer processor which receives each of the outputted optical patterns and analyses each respective optical pattern to provide information that enables identification of speech segments thereby, by identifying blank spaces in the optical pattern, which define beginning or ending of each respective speech segment.
    Type: Application
    Filed: June 21, 2012
    Publication date: May 29, 2014
    Applicant: VOCALZOOM SYSTEMS LTD.
    Inventors: Tal Bakish, Gavriel Horowitz, Yekutiel Avargel, Yechiel Kurtz
  • Publication number: 20140129219
    Abstract: A computer-implemented system and method for masking special data is provided. Speakers of a call recording are identified. The call recording is separated into strands corresponding to each of the speakers. A prompt list of elements that prompt the speaker of the other strand to utter special information is applied to one of the strands. At least one of the elements of the prompt list is identified in the one strand. A special information candidate is identified in the other strand and is located after a location in time where the element was found in the voice recording of the one strand. A confidence score is assigned to the element located in the one strand and to the special information candidate in the other strand. The confidence scores are combined and a threshold is applied. The special information candidate is rendered unintelligible when the combined confidence scores satisfy the threshold.
    Type: Application
    Filed: November 4, 2013
    Publication date: May 8, 2014
    Applicant: Intellisist, Inc.
    Inventors: Howard M. Lee, Steven Lutz, Gilad Odinak
  • Patent number: 8700399
    Abstract: In one embodiment the present invention includes a method comprising receiving an acoustic input signal and processing the acoustic input signal with a plurality of acoustic recognition processes configured to recognize the same target sound. Different acoustic recognition processes start processing different segments of the acoustic input signal at different time points in the acoustic input signal. In one embodiment, initial states in the recognition processes may be configured on each time step.
    Type: Grant
    Filed: July 6, 2010
    Date of Patent: April 15, 2014
    Assignee: Sensory, Inc.
    Inventors: Pieter J. Vermeulen, Jonathan Shaw, Todd F. Mozer
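The staggered-start idea above (multiple recognizers for the same target sound, each beginning at a different point in the input) can be approximated with a matcher launched at every frame offset. The frame-wise distance test below stands in for real acoustic-model scoring and is purely illustrative:

```python
def staggered_match(stream, target, tol=0.1):
    """Start a matcher at every offset of the input stream; each compares the
    next len(target) frames against the target sound. Returns matching offsets,
    so the target is found no matter where it begins in the stream."""
    hits = []
    for start in range(len(stream) - len(target) + 1):
        window = stream[start:start + len(target)]
        if all(abs(a - b) < tol for a, b in zip(window, target)):
            hits.append(start)
    return hits
```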
  • Patent number: 8700406
    Abstract: Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.
    Type: Grant
    Filed: August 19, 2011
    Date of Patent: April 15, 2014
    Assignee: Qualcomm Incorporated
    Inventors: Leonard H. Grokop, Vidya Narayanan, James W. Dolter, Sanjiv Nanda
  • Publication number: 20140046665
    Abstract: A system and method for enhancing visual representation to individuals participating in a conversation is provided. Visual data for a plurality of individuals participating in one or more conversations is analyzed. Possible conversational configurations of the individuals are generated. Each possible conversational configuration includes one or more pair-wise probabilities of at least two of the individuals. A probability weight is assigned to each of the pair-wise probabilities having a likelihood that the individuals of that pair-wise probability are participating in a conversation. A probability of each possible conversational configuration is determined by combining the probability weights for the pair-wise probabilities of that possible conversational configuration. The possible conversational configuration with the highest probability is selected as a most probable configuration.
    Type: Application
    Filed: October 18, 2013
    Publication date: February 13, 2014
    Applicant: Palo Alto Research Center Incorporated
    Inventors: Paul M. Aoki, Margaret H. Szymanski, James Thornton, Daniel H. Wilson, Allison G. Woodruff
  • Patent number: 8635065
    Abstract: The present invention discloses an apparatus for automatic extraction of important events in audio signals comprising: signal input means for supplying audio signals; audio signal fragmenting means for partitioning audio signals supplied by the signal input means into audio fragments of a predetermined length and for allocating a sequence of one or more audio fragments to a respective audio window; feature extracting means for analyzing acoustic characteristics of the audio signals comprised in the audio fragments and for analyzing acoustic characteristics of the audio signals comprised in the audio windows; and important event extraction means for extracting important events in audio signals supplied by the audio signal fragmenting means based on predetermined important event classifying rules depending on acoustic characteristics of the audio signals comprised in the audio fragments and on acoustic characteristics of the audio signals comprised in the audio windows, wherein each important event extracted
    Type: Grant
    Filed: November 10, 2004
    Date of Patent: January 21, 2014
    Assignee: Sony Deutschland GmbH
    Inventors: Silke Goronzy-Thomae, Thomas Kemp, Ralf Kompe, Yin Hay Lam, Krzysztof Marasek, Raquel Tato
  • Patent number: 8606569
    Abstract: The present invention relates to means and methods of classifying speech and music signals in voice communication systems, devices, and telephones, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.
    Type: Grant
    Filed: November 12, 2012
    Date of Patent: December 10, 2013
    Inventor: Alon Konchitsky
  • Patent number: 8595009
    Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) a proportion of the music clips in each of the sections is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.
    Type: Grant
    Filed: July 26, 2012
    Date of Patent: November 26, 2013
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Lie Lu, Claus Bauer
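The four section conditions enumerated in the abstract above can be sketched as a brute-force search over per-clip music/other labels. The duration bounds, minimum-run proxy for condition 1, and the music-proportion threshold are illustrative assumptions:

```python
def candidate_sections(labels, min_song=3, max_song=8, min_music_ratio=0.6):
    """labels: per-clip string of 'm' (music) / 'o' (other). Enumerate (start, end)
    clip ranges that could be one song under the abstract's four conditions."""
    n = len(labels)
    out = []
    for i in range(n):
        if labels[i] != 'm':
            continue                                  # condition 3: start on music
        for j in range(i + min_song - 1, min(n, i + max_song)):
            if labels[j] != 'm':
                continue                              # condition 3: end on music
            seg = labels[i:j + 1]                     # condition 2: bounded by max_song
            longest = max(len(run) for run in seg.split('o'))
            ratio = seg.count('m') / len(seg)
            # condition 1: a long-enough music run; condition 4: enough music overall
            if longest >= min_song and ratio >= min_music_ratio:
                out.append((i, j))
    return out
```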
  • Patent number: 8571865
    Abstract: Systems, methods performed by data processing apparatus and computer storage media encoded with computer programs for receiving information relating to (i) a communication device that has received an utterance and (ii) a voice associated with the received utterance, comparing the received voice information with voice signatures in a comparison group, the comparison group including one or more individuals identified from one or more connections arising from the received information relating to the communication device, attempting to identify the voice associated with the utterance as matching one of the individuals in the comparison group, and based on a result of the attempt to identify, selectively providing the communication device with access to one or more resources associated with the matched individual.
    Type: Grant
    Filed: August 10, 2012
    Date of Patent: October 29, 2013
    Assignee: Google Inc.
    Inventor: Philip Hewinson
  • Patent number: 8554563
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speaker and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Grant
    Filed: September 11, 2012
    Date of Patent: October 8, 2013
    Assignee: Nuance Communications, Inc.
    Inventor: Hagai Aronowitz
  • Patent number: 8554562
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speaker and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Grant
    Filed: November 15, 2009
    Date of Patent: October 8, 2013
    Assignee: Nuance Communications, Inc.
    Inventor: Hagai Aronowitz
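The extended feature vector shared by patents 8554563 and 8554562 above (raw acoustic features plus per-speaker log-likelihood ratios against a background model) can be sketched as follows. The toy 1-D Gaussian speaker models stand in for the GMMs a real diarization system would use:

```python
import math

def gauss_loglik(x, mean, std):
    """Log-likelihood of a scalar observation under a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def extended_feature(frame_feat, speaker_models, background):
    """Append, to the raw per-frame feature, one log-likelihood ratio per
    pre-trained speaker model relative to the background population model.
    The extended vector then feeds segmentation and clustering."""
    llrs = [gauss_loglik(frame_feat, m, s) - gauss_loglik(frame_feat, *background)
            for m, s in speaker_models]
    return [frame_feat] + llrs
```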
  • Patent number: 8554547
    Abstract: A voice activity detection method and apparatus, and an electronic device are provided. The method includes: obtaining a time domain parameter and a frequency domain parameter from an audio frame; obtaining a first distance between the time domain parameter and a long-term-sliding mean of the time domain parameter in a history background noise frame, and obtaining a second distance between the frequency domain parameter and a long-term-sliding mean of the frequency domain parameter in the history background noise frame; and judging whether the audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance and a set of decision inequalities based on the first distance and the second distance. The above technical solutions enable the judgment criterion to have an adaptive adjustment capability, thus improving the performance of the voice activity detection.
    Type: Grant
    Filed: July 11, 2012
    Date of Patent: October 8, 2013
    Assignee: Huawei Technologies Co., Ltd.
    Inventor: Zhe Wang
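The adaptive decision described in the abstract above can be sketched with one time-domain and one frequency-domain parameter tracked as long-term sliding means over noise frames. The smoothing factor, thresholds, and the particular decision inequalities are illustrative assumptions, not the patent's actual rules:

```python
class AdaptiveVAD:
    """Classify frames by their distance from long-term sliding means of a
    time-domain and a frequency-domain parameter; means are updated only on
    frames judged to be background noise, giving the criterion its adaptivity."""
    def __init__(self, alpha=0.95, t_thresh=0.5, f_thresh=0.5, joint_thresh=0.8):
        self.alpha = alpha
        self.t_mean = None
        self.f_mean = None
        self.t_thresh = t_thresh
        self.f_thresh = f_thresh
        self.joint_thresh = joint_thresh

    def classify(self, t_param, f_param):
        if self.t_mean is None:                    # bootstrap from the first frame
            self.t_mean, self.f_mean = t_param, f_param
            return "noise"
        d1 = abs(t_param - self.t_mean)            # first distance (time domain)
        d2 = abs(f_param - self.f_mean)            # second distance (frequency domain)
        # A small set of decision inequalities over the two distances.
        is_voice = d1 > self.t_thresh or d2 > self.f_thresh or (d1 + d2) > self.joint_thresh
        if not is_voice:                           # update means on noise frames only
            self.t_mean = self.alpha * self.t_mean + (1 - self.alpha) * t_param
            self.f_mean = self.alpha * self.f_mean + (1 - self.alpha) * f_param
        return "voice" if is_voice else "noise"
```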