Endpoint Detection Patents (Class 704/248)
-
Patent number: 11749296
Abstract: A voice capturing method includes the following operations: storing, by a buffer, voice data from a plurality of microphones; determining, by a processor, whether a target speaker exists and whether a direction of the target speaker changes according to the voice data and target speaker information; inserting a voice segment corresponding to a previous tracking direction into a current position in the voice data to generate fusion voice data when the target speaker exists and the direction of the target speaker changes from the previous tracking direction to a current tracking direction; performing, by the processor, a voice enhancement process on the fusion voice data according to the current tracking direction to generate enhanced voice data; performing, by the processor, a voice shortening process on the enhanced voice data to generate voice output data; and playing, by a playing circuit, the voice output data.
Type: Grant. Filed: September 27, 2021. Date of Patent: September 5, 2023. Assignee: REALTEK SEMICONDUCTOR CORPORATION. Inventors: Chung-Shih Chu, Ming-Tang Lee, Chieh-Min Tsai
-
Patent number: 11508362
Abstract: A voice recognition method of an artificial intelligence robot device is disclosed. The voice recognition method includes: collecting a first voice spoken by a user and determining whether a wake-up word of the artificial intelligence robot device is recognized based on the collected first voice; if the wake-up word is not recognized, sensing a location of the user using at least one sensor and determining whether the sensed location of the user is included in a set voice collection range; if the location of the user is included in the voice collection range, learning the first voice and determining a noise state of the first voice based on the learned first voice; collecting a second voice in an opposite direction of the location of the user according to a result of the determined noise state of the first voice; and extracting a feature value of a noise based on the second voice and removing the extracted feature value of the noise from the first voice to obtain the wake-up word.
Type: Grant. Filed: September 18, 2020. Date of Patent: November 22, 2022. Assignee: LG ELECTRONICS INC. Inventors: Inho Lee, Junmin Lee, Keunsang Lee
-
Patent number: 11250849
Abstract: A voice wake-up apparatus for an electronic device is provided, including a voice activity detection circuit, a storage circuit, and a smart detection circuit. The voice activity detection circuit receives an input sound signal and detects a voice activity section of the input sound signal. The storage circuit stores a predetermined voice sample. The smart detection circuit receives the input sound signal, performs time-domain and frequency-domain detection on the voice activity section to generate a syllable and frequency characteristic detection result, compares that result with the predetermined voice sample, and, when they match, generates a wake-up signal to wake up a processing circuit of the electronic device.
Type: Grant. Filed: October 24, 2019. Date of Patent: February 15, 2022. Assignee: REALTEK SEMICONDUCTOR CORPORATION. Inventors: Chi-Te Wang, Wen-Yu Huang
-
Patent number: 11024301
Abstract: Methods and systems for modification of electronic system operation based on acoustic ambience classification are presented. In an example method, at least one audio signal present in a physical environment of a user is detected. The at least one audio signal is analyzed to extract at least one audio feature from the audio signal. The audio signal is classified based on the audio feature to produce at least one classification of the audio signal. Operation of an electronic system interacting with the user in the physical environment is modified based on the classification of the audio signal.
Type: Grant. Filed: August 2, 2019. Date of Patent: June 1, 2021. Assignee: GRACENOTE, INC. Inventors: Suresh Jeyachandran, Vadim Brenner, Markus K. Cremer
-
Patent number: 11004441
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
Type: Grant. Filed: August 14, 2019. Date of Patent: May 11, 2021. Assignee: Google LLC. Inventors: Michael Buchanan, Pravir Kumar Gupta, Christopher Bo Tandiono
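A minimal sketch of the counting idea in this abstract: an utterance-so-far is "likely incomplete" when more text samples in a corpus extend it with additional words than match it exactly. The corpus, tokenization, and decision rule below are illustrative assumptions, not taken from the patent.

```python
# Hedged sketch of endpointing by word comparison: count corpus samples that
# exactly match the transcription (first value) vs. samples that start with it
# but continue (second value), then compare the two counts.

def classify_utterance(transcription, text_samples):
    """Return True if the transcription is likely incomplete."""
    words = transcription.lower().split()
    exact = 0      # samples whose words match the transcription exactly
    extended = 0   # samples that start with the transcription but continue
    for sample in text_samples:
        s_words = sample.lower().split()
        if s_words == words:
            exact += 1
        elif s_words[:len(words)] == words and len(s_words) > len(words):
            extended += 1
    # More extensions than exact matches -> the user probably isn't done.
    return extended > exact

corpus = [
    "what is the weather",
    "what is the weather today",
    "what is the weather in paris",
    "set a timer",
]
print(classify_utterance("what is the weather", corpus))  # → True
print(classify_utterance("set a timer", corpus))          # → False
```

In a real endpointer these counts would gate how long the recognizer waits before closing the microphone.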
-
Patent number: 10997979
Abstract: A voice recognition device provided with a processor configured to: determine, based on a captured image of a target person's lips, a pre-utterance breathing period in which the lips moved with breathing immediately before speaking; detect, based on the captured image, a voice period in which the target person is speaking, excluding the determined breathing period; acquire a voice of the target person; and recognize the acquired voice within the detected voice period.
Type: Grant. Filed: June 14, 2019. Date of Patent: May 4, 2021. Assignee: CASIO COMPUTER CO., LTD. Inventors: Kouichi Nakagome, Keisuke Shimada
-
Patent number: 10909987
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Type: Grant. Filed: August 28, 2019. Date of Patent: February 2, 2021. Assignee: Google LLC. Inventor: Matthew Sharifi
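The multi-device arbitration described here (and in the related patents 10593330, 10134398, 9514752, and 9318107 below, which share this abstract) reduces to a score comparison: each device scores the same utterance, scores are exchanged, and only the device that heard the hotword most confidently proceeds. The scores and tie-breaking rule below are illustrative assumptions.

```python
# Hedged sketch of multi-device hotword arbitration: a device initiates full
# speech recognition only if its own hotword likelihood is at least as high
# as every peer's likelihood for the same utterance.

def should_initiate_recognition(own_score, peer_scores):
    """Run ASR only if this device heard the hotword most confidently."""
    return all(own_score >= s for s in peer_scores)

# A smart speaker scored 0.91; a nearby phone and TV scored 0.82 and 0.40.
print(should_initiate_recognition(0.91, [0.82, 0.40]))  # → True
print(should_initiate_recognition(0.82, [0.91, 0.40]))  # → False
```

Real systems also need a tie-break (e.g. a device identifier) so two devices with equal scores never both respond.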
-
Patent number: 10872620
Abstract: Embodiments of the present disclosure provide a voice detection method. An audio signal can be divided into a plurality of audio segments. Audio characteristics can be extracted from each of the plurality of audio segments. The audio characteristics of the respective audio segment include a time domain characteristic and a frequency domain characteristic of the respective audio segment. At least one target voice segment can be detected from the plurality of audio segments according to the audio characteristics of the plurality of audio segments.
Type: Grant. Filed: May 1, 2018. Date of Patent: December 22, 2020. Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED. Inventor: Haijin Fan
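The pairing of a time-domain and a frequency-domain characteristic per segment can be sketched with short-time energy and spectral centroid. The feature choice, thresholds, and test signals below are invented for illustration; the patent does not specify them.

```python
import math

# Hedged sketch: classify a segment as voice using one time-domain feature
# (short-time energy) and one frequency-domain feature (spectral centroid).

def features(segment, sample_rate=8000):
    energy = sum(x * x for x in segment) / len(segment)      # time domain
    n = len(segment)
    # Magnitude spectrum via a naive DFT (fine for a short illustrative frame).
    mags = []
    for k in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(segment))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(segment))
        mags.append(math.hypot(re, im))
    total = sum(mags) or 1.0
    centroid_bin = sum(k * m for k, m in enumerate(mags)) / total  # frequency domain
    return energy, centroid_bin * sample_rate / n

def is_voice(segment, energy_thresh=0.01, centroid_max_hz=1000.0):
    energy, centroid_hz = features(segment)
    return energy > energy_thresh and centroid_hz < centroid_max_hz

# A loud 200 Hz tone (in the speech band) vs. near silence, 20 ms at 8 kHz.
tone = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(160)]
silence = [0.0001] * 160
print(is_voice(tone), is_voice(silence))  # → True False
```

Production detectors use an FFT and richer features (MFCCs, spectral flux), but the two-domain gating is the same shape.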
-
Patent number: 10867607
Abstract: A voice dialog device includes a sight line detection unit configured to detect a sight line of a user, a voice acquiring unit configured to acquire voice pronounced by the user, and a processor. The processor is configured to perform a step of acquiring a result of recognizing the voice, a step of determining whether or not the user is driving, and a step of determining whether or not the voice dialog device has a dialog with the user. When the detected sight line of the user is in a certain direction, and a start keyword has been detected from the voice, the processor determines that the user has started a dialog. The processor switches the certain direction based on whether the user is driving.
Type: Grant. Filed: July 12, 2019. Date of Patent: December 15, 2020. Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA. Inventors: Atsushi Ikeno, Muneaki Shimada, Kota Hatanaka, Toshifumi Nishijima, Fuminori Kataoka, Hiromi Tonegawa, Norihide Umeyama
-
Patent number: 10595117
Abstract: Personal audio systems and methods are disclosed. A personal audio system includes a class table storing processing parameters respectively associated with a plurality of annoyance noise classes, a controller, and a processor. The controller identifies an annoyance noise class of an annoyance noise included in an ambient audio stream and retrieves, from the class table, one or more processing parameters associated with the identified annoyance noise class. The processor processes the ambient audio stream according to the one or more retrieved processing parameters to provide a personal audio stream. The processor includes a pitch tracker to identify a fundamental frequency of the annoyance noise and a filter bank including a band reject filter tuned to the fundamental frequency.
Type: Grant. Filed: March 24, 2017. Date of Patent: March 17, 2020. Assignee: Dolby Laboratories Licensing Corporation. Inventors: Gints Klimanis, Anthony Parks, Jeff Baker
-
Patent number: 10593330
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Type: Grant. Filed: October 26, 2018. Date of Patent: March 17, 2020. Assignee: Google LLC. Inventor: Matthew Sharifi
-
Patent number: 10546063
Abstract: Natural language processing of raw text data for optimal sentence boundary placement. Raw text is extracted from a document and subject to cleaning. The extracted raw text is examined to identify preliminary sentence boundaries, which are used to identify potential sentences in the raw text. One or more potential sentences are assigned a well-formedness score. The value of the score correlates to whether the potential sentence is a truncated/ill-formed sentence or a well-formed sentence. One or more preliminary sentence boundaries are optimized depending on the value of the score of the potential sentence(s). Accordingly, the processing herein is an optimization that creates a sentence-boundary-optimized output.
Type: Grant. Filed: December 13, 2016. Date of Patent: January 28, 2020. Assignee: International Business Machines Corporation. Inventors: Charles E. Beller, Chengmin Ding, Allen Ginsberg, Elinna Shek
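The well-formedness scoring and boundary optimization above can be sketched with toy heuristics: score each candidate sentence, and collapse a preliminary boundary when the merged text scores higher than either piece. The scoring rules below are invented stand-ins for the patent's actual model.

```python
# Hedged sketch of sentence-boundary optimization: merge fragments across a
# preliminary boundary when the merged text is better formed than the pieces.

def well_formedness(text):
    score = 0.0
    words = text.split()
    if text and text[0].isupper():
        score += 1.0                      # starts like a sentence
    if text.rstrip().endswith((".", "?", "!")):
        score += 1.0                      # ends with terminal punctuation
    if len(words) >= 3:
        score += 1.0                      # not a stray fragment
    return score

def merge_if_truncated(left, right):
    """Collapse a preliminary boundary if merging improves the score."""
    merged = left.rstrip() + " " + right.lstrip()
    if well_formedness(merged) > max(well_formedness(left), well_formedness(right)):
        return [merged]
    return [left, right]

# A boundary wrongly placed mid-sentence gets collapsed; a correct one survives.
print(merge_if_truncated("The model was trained on", "ten million sentences."))
print(merge_if_truncated("We ran the test.", "It passed."))
```

The first call returns the merged sentence; the second keeps both sentences intact.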
-
Patent number: 10304450
Abstract: A method is implemented at an electronic device for visually indicating a voice processing state. The electronic device includes at least an array of full color LEDs, one or more microphones and a speaker. The electronic device collects via the one or more microphones audio inputs from an environment in proximity to the electronic device, and processes the audio inputs by identifying and/or responding to voice inputs from a user in the environment. A state of the processing is then determined from among a plurality of predefined voice processing states, and for each of the full color LEDs, a respective predetermined LED illumination specification is determined in association with the determined voice processing state. In accordance with the identified LED illumination specifications of the full color LEDs, the electronic device synchronizes illumination of the array of full color LEDs to provide a visual pattern indicating the determined voice processing state.
Type: Grant. Filed: May 10, 2017. Date of Patent: May 28, 2019. Assignee: GOOGLE LLC. Inventors: Jung Geun Tak, Amy Martin, Willard McClellan
-
Patent number: 10249292
Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
Type: Grant. Filed: December 14, 2016. Date of Patent: April 2, 2019. Assignee: International Business Machines Corporation. Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
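Setting the LSTM aside, the change-point notion in this abstract is simple to state in code: given per-frame labels (speaker 1, speaker 2, or silence), a change point is any index where the label differs from the previous one. The label sequence below is fabricated for illustration.

```python
# Sketch of the change-point definition: a transition between SPEAKER_1,
# SPEAKER_2, and SILENCE labels marks a segment boundary.

def change_points(labels):
    """Indices where the label differs from the previous one."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

labels = ["S1", "S1", "SIL", "S2", "S2", "S1"]
print(change_points(labels))  # → [2, 3, 5]
```

The resulting indices split the audio into the segments on which per-speaker speech recognition would run.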
-
Patent number: 10134398
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Type: Grant. Filed: November 9, 2016. Date of Patent: November 20, 2018. Assignee: Google LLC. Inventor: Matthew Sharifi
-
Patent number: 10056096
Abstract: Provided herein is an electronic device and a method of voice recognition, the method including: analyzing an audio signal of a first frame when the audio signal is input and extracting a first feature value; determining a similarity between the first feature value extracted from the audio signal of the first frame and a first feature value extracted from an audio signal of a previous frame; analyzing the audio signal of the first frame and extracting a second feature value when the similarity is below a predetermined threshold value; and comparing the extracted first feature value and the second feature value with at least one feature value corresponding to a pre-defined voice signal and determining whether or not the audio signal of the first frame is a voice signal. The electronic device may thus detect only a voice section from the audio signal while improving processing speed.
Type: Grant. Filed: July 22, 2016. Date of Patent: August 21, 2018. Assignee: SAMSUNG ELECTRONICS CO., LTD. Inventor: Jong-uk Yoo
-
Patent number: 9916843
Abstract: A voice processing apparatus including a memory and a processor coupled to the memory, the processor configured to: acquire a first input signal containing a first voice and a second input signal containing a second voice; obtain a first signal intensity of the first input signal and a second signal intensity of the second input signal; compute a correlation coefficient between a time sequence of the first signal intensity and a time sequence of the second signal intensity; determine whether the first voice and the second voice are in a conversation state based on the computed correlation coefficient; and output information indicating an association between the first voice and the second voice when it is determined that the first voice and the second voice are in the conversation state.
Type: Grant. Filed: September 15, 2016. Date of Patent: March 13, 2018. Assignee: FUJITSU LIMITED. Inventors: Taro Togawa, Sayuri Kohmura, Takeshi Otani
-
Patent number: 9818407
Abstract: An efficient audio streaming method and apparatus includes a client process implemented on a client or local device and a server process implemented on a remote server or server(s). The client process and server process each have speech recognition components and communicate over a network, and together efficiently manage the detection of speech in an audio signal streamed by the local device to the server for speech recognition and potentially further processing at the server. The client process monitors audio input and, in a first detection stage, implements endpointing on the local device to determine when speech is detected. The client process may further determine if a "wakeword" is detected, and then the client process opens a connection and begins streaming audio to the server process via the network.
Type: Grant. Filed: February 7, 2013. Date of Patent: November 14, 2017. Assignee: AMAZON TECHNOLOGIES, INC. Inventors: Hugh Evan Secker-Walker, Kenneth John Basye, Nikko Strom, Ryan Paul Thomas
-
Patent number: 9799332
Abstract: A communication interface apparatus for a system and a plurality of users is provided. The apparatus includes: a first process unit configured to receive voice information and face information from at least one user, and determine whether the received voice information is voice information of at least one registered user based on user models corresponding to the respective received voice information and face information; a second process unit configured to receive the face information and determine whether the at least one user's attention is on the system based on the received face information; and a third process unit configured to receive the voice information, analyze the received voice information, and determine whether the received voice information is substantially meaningful to the system based on a dialog model that represents conversation flow on a situation basis.
Type: Grant. Filed: November 9, 2010. Date of Patent: October 24, 2017. Assignee: SAMSUNG ELECTRONICS CO., LTD. Inventors: Nam-Hoon Kim, Chi-Youn Park, Jeong-Mi Cho, Jeong-su Kim
-
Patent number: 9740690
Abstract: Some embodiments include a computer-implemented method of producing a flexible sentence syntax to facilitate one or more computer applications to generate and publish sentence expressions. For example, the method can include providing a developer interface to define a flexible sentence syntax that controls one or more sentences publishable by an application service. A developer of the application service can customize the flexible sentence syntax, including selecting at least one of the selectable tokens that is associated with another element to incorporate in the flexible sentence syntax. Based on the selected token, a computing device can generate and publish a target sentence according to the flexible sentence syntax on the application service's behalf.
Type: Grant. Filed: February 22, 2017. Date of Patent: August 22, 2017. Assignee: Facebook, Inc. Inventors: Ling Bao, Hugo Johan van Heuven, Jiangbo Miao
-
Patent number: 9601132
Abstract: A method comprising: detecting a first acoustic signal using a microphone array; detecting a first angle associated with a first incident direction of the first acoustic signal; and storing, in a memory, a representation of the first acoustic signal and a representation of the first angle.
Type: Grant. Filed: February 17, 2016. Date of Patent: March 21, 2017. Assignee: Samsung Electronics Co., Ltd. Inventors: Beakkwon Son, Gangyoul Kim, Namil Lee, Hochul Hwang, Jongmo Kum, Minho Bae
-
Patent number: 9520140
Abstract: Improved audio data processing methods and systems are provided. Some implementations involve dividing frequency domain audio data into a plurality of subbands and determining amplitude modulation signal values for each of the plurality of subbands. A band-pass filter may be applied to the amplitude modulation signal values in each subband to produce band-pass filtered amplitude modulation signal values for each subband. The band-pass filter may have a central frequency that exceeds an average cadence of human speech. A gain may be determined for each subband based, at least in part, on a function of the amplitude modulation signal values and the band-pass filtered amplitude modulation signal values. The determined gain may be applied to each subband.
Type: Grant. Filed: March 31, 2014. Date of Patent: December 13, 2016. Assignee: Dolby Laboratories Licensing Corporation. Inventors: Erwin Goesnar, Glenn N. Dickins, David Gunawan
-
Patent number: 9514752
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Type: Grant. Filed: April 1, 2016. Date of Patent: December 6, 2016. Assignee: Google Inc. Inventor: Matthew Sharifi
-
Patent number: 9443536
Abstract: Disclosed are an apparatus and method of deducing a user's intention using motion information. The user's intention deduction apparatus includes a speech intention determining unit configured to predict a speech intention regarding a user's speech using motion information sensed by at least one motion capture sensor, and a controller configured to control operation of detecting a voice section from a received sound signal based on the predicted speech intention.
Type: Grant. Filed: April 29, 2010. Date of Patent: September 13, 2016. Assignee: Samsung Electronics Co., Ltd. Inventors: Jeong-Mi Cho, Jeong-Su Kim, Won-Chul Bang, Nam-Hoon Kim
-
Patent number: 9418650
Abstract: In embodiments, apparatuses, methods and storage media are described that are associated with training adaptive speech recognition systems ("ASR") using audio and text obtained from captioned video. In various embodiments, the audio and caption may be aligned for identification, such as according to a start and end time associated with a caption, and the alignment may be adjusted to better fit audio to a given caption. In various embodiments, the aligned audio and caption may then be used for training if an error value associated with the audio and caption demonstrates that the audio and caption will aid in training the ASR. In various embodiments, filters may be used on audio and text prior to training. Such filters may be used to exclude potential training audio and text based on filter criteria. Other embodiments may be described and claimed.
Type: Grant. Filed: September 25, 2013. Date of Patent: August 16, 2016. Assignee: Verizon Patent and Licensing Inc. Inventors: Sujeeth S. Bharadwaj, Suri B. Medapati
-
Patent number: 9318107
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Type: Grant. Filed: April 1, 2015. Date of Patent: April 19, 2016. Assignee: Google Inc. Inventor: Matthew Sharifi
-
Patent number: 9147400
Abstract: The present invention relates to a method and apparatus for generating speaker-specific spoken passwords. One embodiment of a method for generating a spoken password for use by a speaker of interest includes identifying one or more speech features that best distinguish the speaker of interest from a plurality of impostor speakers and incorporating the speech features in the spoken password.
Type: Grant. Filed: December 21, 2011. Date of Patent: September 29, 2015. Assignee: SRI INTERNATIONAL. Inventor: Nicolas Scheffer
-
Patent number: 9123337
Abstract: Indexing digitized speech with words represented in the digitized speech, with a multimodal digital audio editor operating on a multimodal device supporting modes of user interaction, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor operatively coupled to an ASR engine. The indexing includes: providing, by the multimodal digital audio editor to the ASR engine, digitized speech for recognition; receiving in the multimodal digital audio editor, from the ASR engine, recognized user speech including a recognized word, along with information indicating where, in the digitized speech, representation of the recognized word begins; and inserting, by the multimodal digital audio editor, the recognized word, in association with that information, into a speech recognition grammar, the speech recognition grammar voice-enabling user interface commands of the multimodal digital audio editor.
Type: Grant. Filed: March 11, 2014. Date of Patent: September 1, 2015. Assignee: Nuance Communications, Inc. Inventors: Charles W. Cross, Frank L. Jania
-
Patent number: 9043207
Abstract: The present invention relates to a method for speaker recognition, comprising the steps of: obtaining and storing speaker information for at least one target speaker; obtaining a plurality of speech samples from a plurality of telephone calls from at least one unknown speaker; classifying the speech samples according to the at least one unknown speaker, thereby providing speaker-dependent classes of speech samples; extracting speaker information for the speech samples of each of the speaker-dependent classes; combining the extracted speaker information for each of the speaker-dependent classes; comparing the combined extracted speaker information for each of the speaker-dependent classes with the stored speaker information for the at least one target speaker to obtain at least one comparison result; and determining whether one of the at least one unknown speakers is identical with the at least one target speaker based on the at least one comparison result.
Type: Grant. Filed: November 12, 2009. Date of Patent: May 26, 2015. Assignee: Agnitio S.L. Inventors: Johan Nikolaas Langehoven Brummer, Luis Buera Rodriguez, Marta Garcia Gomar
-
Publication number: 20140379345
Abstract: Disclosed are an apparatus and a method for detecting a speech endpoint using a WFST. The apparatus in accordance with an embodiment of the present invention includes: a speech decision portion configured to receive frame units of a feature vector converted from a speech signal and to analyze and classify the received feature vector into a speech class or a noise class; a frame-level WFST configured to receive the speech class and the noise class and to convert them to a WFST format; a speech-level WFST configured to detect a speech endpoint by analyzing a relationship between the speech class and noise class and a preset state; a WFST combination portion configured to combine the frame-level WFST with the speech-level WFST; and an optimization portion configured to optimize the combined WFST to have a minimum route.
Type: Application. Filed: March 25, 2014. Publication date: December 25, 2014. Applicant: Electronics and Telecommunications Research Institute. Inventors: Hoon Chung, Sung-Joo Lee, Yun-Keun Lee
-
Patent number: 8831942
Abstract: A method is provided for identifying a gender of a speaker. The method steps include: obtaining speech data of the speaker; extracting vowel-like speech frames from the speech data; analyzing the vowel-like speech frames to generate a feature vector having pitch values corresponding to the vowel-like frames; analyzing the pitch values to generate a most frequent pitch value; determining, in response to the most frequent pitch value being between a first pre-determined threshold and a second pre-determined threshold, an output of a male Gaussian Mixture Model (GMM) and an output of a female GMM using the pitch values as inputs to the male GMM and the female GMM; and identifying the gender of the speaker by comparing the output of the male GMM and the output of the female GMM based on a pre-determined criterion.
Type: Grant. Filed: March 19, 2010. Date of Patent: September 9, 2014. Assignee: Narus, Inc. Inventor: Antonio Nucci
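The pipeline above can be sketched end-to-end: find the most frequent pitch, short-circuit on unambiguous values, and otherwise compare model likelihoods. Each GMM is reduced here to a single 1-D Gaussian with invented parameters; the thresholds and pitch values are likewise illustrative, not from the patent.

```python
import math
from collections import Counter

# Hedged sketch of pitch-based gender identification: mode of the pitch
# values, threshold gating, then likelihood comparison in the ambiguous band.

def gaussian_loglik(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def identify_gender(pitch_values, low=120.0, high=200.0):
    mode_pitch = Counter(pitch_values).most_common(1)[0][0]
    if mode_pitch < low:
        return "male"            # unambiguously low pitch
    if mode_pitch > high:
        return "female"          # unambiguously high pitch
    # Ambiguous band: compare per-model log-likelihoods over all pitch values
    # (single Gaussians standing in for the patent's male/female GMMs).
    male = sum(gaussian_loglik(p, 110.0, 20.0) for p in pitch_values)
    female = sum(gaussian_loglik(p, 210.0, 25.0) for p in pitch_values)
    return "male" if male > female else "female"

print(identify_gender([105, 108, 105, 110]))   # → male (below the low threshold)
print(identify_gender([180, 185, 190, 185]))   # → female (via model comparison)
```

A real implementation would use multi-component GMMs trained on labeled speech rather than hand-set means.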
-
Patent number: 8805685
Abstract: Disclosed herein are systems, methods, and tangible computer-readable media for detecting synthetic speech in speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the speech samples demonstrate little variance over time or are the same, and verifying the speech samples if they demonstrate sufficient variance over time. One embodiment further adds that each of the speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold, or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received.
Type: Grant. Filed: August 5, 2013. Date of Patent: August 12, 2014. Assignee: AT&T Intellectual Property I, L.P. Inventor: Horst J. Schroeter
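The variance check described here can be sketched directly: a live speaker never produces exactly the same signal twice, so byte-identical or near-identical repetitions of a passphrase are denied. The feature vectors and threshold below are illustrative stand-ins for real acoustic features.

```python
# Hedged sketch of anti-replay verification: compute per-dimension variance
# across repeated samples of the same phrase and deny when it is too small.

def verify(samples, min_variance=1e-4):
    """Deny verification when repeated samples vary too little."""
    n = len(samples)
    dims = len(samples[0])
    variance = 0.0
    for d in range(dims):
        vals = [s[d] for s in samples]
        mean = sum(vals) / n
        variance += sum((v - mean) ** 2 for v in vals) / n
    return variance / dims >= min_variance

replayed = [[0.50, 0.20, 0.90]] * 3            # identical replays: variance 0
live = [[0.50, 0.20, 0.90],
        [0.48, 0.23, 0.88],
        [0.53, 0.19, 0.93]]                    # natural utterance-to-utterance drift
print(verify(replayed), verify(live))  # → False True
```

Too *much* variance would also be suspicious in practice (different speakers), so a real system bounds it on both sides.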
-
Patent number: 8798991
Abstract: A non-speech section detecting device generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not containing voice data based on speech uttered by a person, the device including: a calculating part calculating a bias of a spectrum obtained by converting sound data of each frame into components on a frequency axis; a judging part judging whether the bias is greater than or equal to a given threshold, or alternatively smaller than or equal to a given threshold; a counting part counting the number of consecutive frames judged as having a bias greater than or equal to the threshold, or alternatively smaller than or equal to the threshold; and a count judging part judging whether the obtained number of consecutive frames is greater than or equal to a given value.
Type: Grant. Filed: November 13, 2012. Date of Patent: August 5, 2014. Assignee: Fujitsu Limited. Inventors: Nobuyuki Washio, Shoji Hayakawa
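The consecutive-frame logic above reduces to run-length counting over per-frame scores. The spectral-bias scores below are a fake precomputed list, and the threshold and minimum run length are invented; the patent specifies none of these values.

```python
# Hedged sketch of non-speech section detection: flag frames whose spectral
# "bias" score passes a threshold, then declare a non-speech section only when
# enough flagged frames occur consecutively.

def non_speech_sections(bias_scores, threshold=0.8, min_run=3):
    """Return (start, end) frame ranges of detected non-speech sections."""
    sections, run_start = [], None
    for i, b in enumerate(bias_scores):
        if b >= threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                sections.append((run_start, i - 1))
            run_start = None
    if run_start is not None and len(bias_scores) - run_start >= min_run:
        sections.append((run_start, len(bias_scores) - 1))
    return sections

scores = [0.2, 0.9, 0.95, 0.85, 0.3, 0.9, 0.1, 0.9, 0.9, 0.9]
print(non_speech_sections(scores))  # → [(1, 3), (7, 9)]
```

The lone high score at frame 5 is correctly ignored: the run-length requirement is what makes the detector robust to single-frame glitches.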
-
Patent number: 8793132Abstract: An apparatus, method and program for dividing a conversational dialog into utterances. The apparatus includes: a computer processor; a word database for storing spellings and pronunciations of words; a grammar database for storing syntactic rules on words; a pause detecting section which detects a pause location in a channel making a main speech among conversational dialogs inputted in at least two channels; an acknowledgement detecting section which detects an acknowledgement location in a channel not making the main speech; a boundary-candidate extracting section which extracts boundary candidates in the main speech, by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and a recognizing unit which outputs a word string of the main speech segmented by one of the extracted boundary candidates after dividing the segmented speech into optimal utterances in reference to the word database and grammar database.Type: GrantFiled: December 26, 2007Date of Patent: July 29, 2014Assignee: Nuance Communications, Inc.Inventors: Nobuyasu Itoh, Gakuto Kurata
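The boundary-candidate extraction step (pauses within a predetermined range around an acknowledgement on the other channel) reduces to a simple filter. Times are in seconds and all names are illustrative, not from the patent:

```python
def boundary_candidates(pause_times, ack_times, window):
    """Extract utterance-boundary candidates in the main-speech channel:
    pauses that fall within +/- window seconds of an acknowledgement
    detected on the non-main channel."""
    candidates = []
    for p in pause_times:
        if any(abs(p - a) <= window for a in ack_times):
            candidates.append(p)
    return candidates
```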
-
Patent number: 8781832Abstract: Techniques are disclosed for overcoming errors in speech recognition systems. For example, a technique for processing acoustic data in accordance with a speech recognition system comprises the following steps/operations. Acoustic data is obtained in association with the speech recognition system. The acoustic data is recorded using a combination of a first buffer area and a second buffer area, such that the recording of the acoustic data using the combination of the two buffer areas at least substantially minimizes one or more truncation errors associated with operation of the speech recognition system.Type: GrantFiled: March 26, 2008Date of Patent: July 15, 2014Assignee: Nuance Communications, Inc.Inventors: Liam D. Comerford, David Carl Frank, Burn L. Lewis, Leonid Rachevksy, Mahesh Viswanathan
-
Patent number: 8775182Abstract: Machine-readable media, methods, apparatus and system for speech segmentation are described. In some embodiments, a fuzzy rule may be determined to discriminate a speech segment from a non-speech segment. An antecedent of the fuzzy rule may include an input variable and an input variable membership. A consequent of the fuzzy rule may include an output variable and an output variable membership. An instance of the input variable may be extracted from a segment. An input variable membership function associated with the input variable membership and an output variable membership function associated with the output variable membership may be trained. The instance of the input variable, the input variable membership function, the output variable, and the output variable membership function may be operated, to determine whether the segment is the speech segment or the non-speech segment.Type: GrantFiled: April 12, 2013Date of Patent: July 8, 2014Assignee: Intel CorporationInventors: Robert Du, Ye Tao, Daren Zu
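A minimal fuzzy rule of the kind the abstract describes, with triangular membership functions and one rule per class. The membership shapes and the energy feature are invented for illustration; the patent trains its membership functions rather than fixing them:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def classify_segment(energy):
    """One fuzzy rule per class: IF energy is high THEN speech;
    IF energy is low THEN non-speech. The rule with the stronger
    firing degree wins (a crude defuzzification)."""
    mu_speech = tri(energy, 0.3, 1.0, 1.7)      # 'high energy' membership
    mu_nonspeech = tri(energy, -0.7, 0.0, 0.7)  # 'low energy' membership
    return "speech" if mu_speech > mu_nonspeech else "non-speech"
```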
-
Patent number: 8762149Abstract: The present invention refers to a method for verifying the identity of a speaker based on the speaker's voice comprising the steps of: a) receiving a voice utterance; b) using biometric voice data to verify (10) that the speaker's voice corresponds to the speaker the identity of which is to be verified based on the received voice utterance; and c) verifying (12, 13) that the received voice utterance is not falsified, preferably after having verified the speaker's voice; d) accepting (16) the speaker's identity to be verified in case that both verification steps give a positive result and not accepting (15) the speaker's identity to be verified if any of the verification steps give a negative result. The invention further refers to a corresponding computer readable medium and a computer.Type: GrantFiled: December 10, 2008Date of Patent: June 24, 2014Inventors: Marta Sánchez Asenjo, Alfredo Gutiérrez Navarro, Alberto Martín de los Santos de las Heras, Marta García Gomar
-
Publication number: 20140163986Abstract: Disclosed herein is a voice-based CAPTCHA method and apparatus which can perform a CAPTCHA procedure using the voice of a human being. In the voice-based CAPTCHA method, a plurality of uttered sounds of a user are collected. A start point and an end point of a voice from each of the collected uttered sounds are detected and then speech sections are detected. Uttered sounds of the respective detected speech sections are compared with reference uttered sounds, and then it is determined whether the uttered sounds are correctly uttered sounds. It is determined whether the uttered sounds have been made by an identical speaker if it is determined that the uttered sounds are correctly uttered sounds.Type: ApplicationFiled: December 3, 2013Publication date: June 12, 2014Applicant: Electronics and Telecommunications Research InstituteInventors: Sung-Joo LEE, Ho-Young JUNG, Hwa-Jeon SONG, Eui-Sok CHUNG, Byung-Ok KANG, Hoon CHUNG, Jeon-Gue PARK, Hyung-Bae JEON, Yoo-Rhee OH, Yun-Keun LEE
-
Publication number: 20140149117Abstract: A system for distinguishing and identifying speech segments originating from speech of one or more relevant speakers in a predefined detection area. The system includes an optical system which outputs optical patterns, each representing audio signals as detected by the optical system in the area within a specific time frame; and a computer processor which receives each of the outputted optical patterns and analyses each respective optical pattern to provide information that enables identification of speech segments, by identifying blank spaces in the optical pattern that define the beginning or ending of each respective speech segment.Type: ApplicationFiled: June 21, 2012Publication date: May 29, 2014Applicant: VOCALZOOM SYSTEMS LTD.Inventors: Tal Bakish, Gavriel Horowitz, Yekutiel Avargel, Yechiel Kurtz
-
Publication number: 20140129219Abstract: A computer-implemented system and method for masking special data is provided. Speakers of a call recording are identified. The call recording is separated into strands corresponding to each of the speakers. A prompt list of elements that prompt the speaker of the other strand to utter special information is applied to one of the strands. At least one of the elements of the prompt list is identified in the one strand. A special information candidate is identified in the other strand and is located after a location in time where the element was found in the voice recording of the one strand. A confidence score is assigned to the element located in the one strand and to the special information candidate in the other strand. The confidence scores are combined and a threshold is applied. The special information candidate is rendered unintelligible when the combined confidence scores satisfy the threshold.Type: ApplicationFiled: November 4, 2013Publication date: May 8, 2014Applicant: Intellisist, Inc.Inventors: Howard M. Lee, Steven Lutz, Gilad Odinak
-
Patent number: 8700399Abstract: In one embodiment the present invention includes a method comprising receiving an acoustic input signal and processing the acoustic input signal with a plurality of acoustic recognition processes configured to recognize the same target sound. Different acoustic recognition processes start processing different segments of the acoustic input signal at different time points in the acoustic input signal. In one embodiment, initial states in the recognition processes may be configured on each time step.Type: GrantFiled: July 6, 2010Date of Patent: April 15, 2014Assignee: Sensory, Inc.Inventors: Pieter J. Vermeulen, Jonathan Shaw, Todd F. Mozer
-
Patent number: 8700406Abstract: Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.Type: GrantFiled: August 19, 2011Date of Patent: April 15, 2014Assignee: Qualcomm IncorporatedInventors: Leonard H. Grokop, Vidya Narayanan, James W. Dolter, Sanjiv Nanda
-
Publication number: 20140046665Abstract: A system and method for enhancing visual representation to individuals participating in a conversation is provided. Visual data for a plurality of individuals participating in one or more conversations is analyzed. Possible conversational configurations of the individuals are generated. Each possible conversational configuration includes one or more pair-wise probabilities of at least two of the individuals. A probability weight is assigned to each of the pair-wise probabilities having a likelihood that the individuals of that pair-wise probability are participating in a conversation. A probability of each possible conversational configuration is determined by combining the probability weights for the pair-wise probabilities of that possible conversational configuration. The possible conversational configuration with the highest probability is selected as a most probable configuration.Type: ApplicationFiled: October 18, 2013Publication date: February 13, 2014Applicant: Palo Alto Research Center IncorporatedInventors: Paul M. Aoki, Margaret H. Szymanski, James Thornton, Daniel H. Wilson, Allison G. Woodruff
-
Patent number: 8635065Abstract: The present invention discloses an apparatus for automatic extraction of important events in audio signals comprising: signal input means for supplying audio signals; audio signal fragmenting means for partitioning audio signals supplied by the signal input means into audio fragments of a predetermined length and for allocating a sequence of one or more audio fragments to a respective audio window; feature extracting means for analyzing acoustic characteristics of the audio signals comprised in the audio fragments and for analyzing acoustic characteristics of the audio signals comprised in the audio windows; and important event extraction means for extracting important events in audio signals supplied by the audio signal fragmenting means based on predetermined important event classifying rules depending on acoustic characteristics of the audio signals comprised in the audio fragments and on acoustic characteristics of the audio signals comprised in the audio windows, wherein each important event extracted … Type: GrantFiled: November 10, 2004Date of Patent: January 21, 2014Assignee: Sony Deutschland GmbHInventors: Silke Goronzy-Thomae, Thomas Kemp, Ralf Kompe, Yin Hay Lam, Krzysztof Marasek, Raquel Tato
-
Patent number: 8606569Abstract: The present invention relates to means and methods of classifying speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.Type: GrantFiled: November 12, 2012Date of Patent: December 10, 2013Inventor: Alon Konchitsky
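A single-feature sketch of the monitor/analyze/compare/classify loop, using zero-crossing rate as the analyzed component. ZCR alone is a weak discriminator and is only a stand-in for whichever components the patent actually selects; the threshold is invented:

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / max(len(frame) - 1, 1)

def classify_speech_music(frame, zcr_threshold=0.3):
    """Compare the analyzed component against a pre-determined threshold
    and label the frame speech or music."""
    return "speech" if zero_crossing_rate(frame) >= zcr_threshold else "music"
```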
-
Patent number: 8595009Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) a proportion of the music clips in each of the sections is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.Type: GrantFiled: July 26, 2012Date of Patent: November 26, 2013Assignee: Dolby Laboratories Licensing CorporationInventors: Lie Lu, Claus Bauer
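The four section conditions enumerated in this abstract translate directly into checks. Here `labels` is the per-clip class sequence for one candidate section (`'m'` for music) and all parameter names and units are assumptions, not from the patent:

```python
def longest_music_run(labels):
    """Length in clips of the longest uninterrupted music segment."""
    best = run = 0
    for c in labels:
        run = run + 1 if c == "m" else 0
        best = max(best, run)
    return best

def valid_song_section(labels, clip_len, min_song, max_song, min_music_prop):
    """Apply the abstract's four conditions to one candidate section.
    clip_len, min_song, max_song are in seconds."""
    if not labels:
        return False
    duration = len(labels) * clip_len
    cond1 = longest_music_run(labels) * clip_len > min_song    # long music segment
    cond2 = duration < max_song                                # not over-long
    cond3 = labels[0] == "m" and labels[-1] == "m"             # music at both ends
    cond4 = labels.count("m") / len(labels) >= min_music_prop  # mostly music
    return cond1 and cond2 and cond3 and cond4
```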
-
Patent number: 8571865Abstract: Systems, methods performed by data processing apparatus and computer storage media encoded with computer programs for receiving information relating to (i) a communication device that has received an utterance and (ii) a voice associated with the received utterance, comparing the received voice information with voice signatures in a comparison group, the comparison group including one or more individuals identified from one or more connections arising from the received information relating to the communication device, attempting to identify the voice associated with the utterance as matching one of the individuals in the comparison group, and based on a result of the attempt to identify, selectively providing the communication device with access to one or more resources associated with the matched individual.Type: GrantFiled: August 10, 2012Date of Patent: October 29, 2013Assignee: Google Inc.Inventor: Philip Hewinson
-
Patent number: 8554563Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speaker and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.Type: GrantFiled: September 11, 2012Date of Patent: October 8, 2013Assignee: Nuance Communications, Inc.Inventor: Hagai Aronowitz
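The extended feature vector can be sketched with one-dimensional Gaussian models standing in for the pre-trained speaker models and the background population model; everything beyond "append per-model log-likelihood ratios" is an assumption for illustration:

```python
import math

def gauss_loglik(x, mean, var):
    """Log-likelihood of scalar x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def extend_feature_vector(base_vec, frame_stat, speaker_models, ubm):
    """Append, for each pre-trained speaker model, the log-likelihood
    ratio of the frame statistic under that model versus a background
    population model. Models are (mean, var) pairs in this sketch."""
    llrs = [
        gauss_loglik(frame_stat, m, v) - gauss_loglik(frame_stat, *ubm)
        for m, v in speaker_models
    ]
    return list(base_vec) + llrs
```

The extended vector then feeds the segmentation and clustering stages unchanged, which is the appeal of the approach.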
-
Patent number: 8554562Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speaker and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.Type: GrantFiled: November 15, 2009Date of Patent: October 8, 2013Assignee: Nuance Communications, Inc.Inventor: Hagai Aronowitz
-
Patent number: 8554547Abstract: A voice activity detection method and apparatus, and an electronic device are provided. The method includes: obtaining a time domain parameter and a frequency domain parameter from an audio frame; obtaining a first distance between the time domain parameter and a long-term-sliding mean of the time domain parameter in a history background noise frame, and obtaining a second distance between the frequency domain parameter and a long-term-sliding mean of the frequency domain parameter in the history background noise frame; and judging whether the audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance and a set of decision inequalities based on the first distance and the second distance. The above technical solutions enable the judgment criterion to have an adaptive adjustment capability, thus improving the performance of the voice activity detection.Type: GrantFiled: July 11, 2012Date of Patent: October 8, 2013Assignee: Huawei Technologies Co., Ltd.Inventor: Zhe Wang
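The two-distance decision can be sketched as follows. The patent uses a set of decision inequalities with adaptive adjustment; a single weighted-sum inequality stands in for the set here, and all names and default values are illustrative:

```python
def is_foreground_voice(time_param, freq_param, time_mean, freq_mean,
                        w_time=1.0, w_freq=1.0, threshold=1.0):
    """Compute the first and second distances to the long-term sliding
    means of the background-noise parameters, then apply one decision
    inequality: a weighted sum of the distances versus a threshold."""
    d1 = abs(time_param - time_mean)    # first distance (time domain)
    d2 = abs(freq_param - freq_mean)    # second distance (frequency domain)
    return w_time * d1 + w_freq * d2 > threshold

def update_sliding_mean(mean, value, alpha=0.95):
    """Long-term sliding (exponential) mean, updated only on frames
    judged to be background noise."""
    return alpha * mean + (1 - alpha) * value
```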