Endpoint Detection Patents (Class 704/248)
-
Patent number: 11749296
Abstract: A voice capturing method includes the following operations: storing, by a buffer, voice data from a plurality of microphones; determining, by a processor, whether a target speaker exists and whether a direction of the target speaker changes according to the voice data and target speaker information; inserting a voice segment corresponding to a previous tracking direction into a current position in the voice data to generate fusion voice data when the target speaker exists and the direction of the target speaker changes from the previous tracking direction to a current tracking direction; performing, by the processor, a voice enhancement process on the fusion voice data according to the current tracking direction to generate enhanced voice data; performing, by the processor, a voice shortening process on the enhanced voice data to generate voice output data; and playing, by a playing circuit, the voice output data.
Type: Grant. Filed: September 27, 2021. Date of Patent: September 5, 2023. Assignee: REALTEK SEMICONDUCTOR CORPORATION. Inventors: Chung-Shih Chu, Ming-Tang Lee, Chieh-Min Tsai
-
Patent number: 11508362
Abstract: A voice recognition method of an artificial intelligence robot device is disclosed. The voice recognition method includes: collecting a first voice spoken by a user and determining whether a wake-up word of the artificial intelligence robot device is recognized based on the collected first voice; if the wake-up word is not recognized, sensing a location of the user using at least one sensor and determining whether the sensed location of the user is included in a set voice collection range; if the location of the user is included in the voice collection range, learning the first voice and determining a noise state of the first voice based on the learned first voice; collecting a second voice in an opposite direction of the location of the user according to a result of the determined noise state of the first voice; and extracting a feature value of a noise based on the second voice and removing the extracted feature value of the noise from the first voice to obtain the wake-up word.
Type: Grant. Filed: September 18, 2020. Date of Patent: November 22, 2022. Assignee: LG ELECTRONICS INC. Inventors: Inho Lee, Junmin Lee, Keunsang Lee
-
Patent number: 11250849
Abstract: A voice wake-up apparatus for an electronic device is provided, including a voice activity detection circuit, a storage circuit, and a smart detection circuit. The voice activity detection circuit receives an input sound signal and detects a voice activity section of the input sound signal. The storage circuit stores a predetermined voice sample. The smart detection circuit receives the input sound signal, performs time-domain and frequency-domain detection on the voice activity section to generate a syllable and frequency characteristic detection result, compares that result with the predetermined voice sample, and, when they match, generates a wake-up signal to wake up a processing circuit of the electronic device.
Type: Grant. Filed: October 24, 2019. Date of Patent: February 15, 2022. Assignee: REALTEK SEMICONDUCTOR CORPORATION. Inventors: Chi-Te Wang, Wen-Yu Huang
-
Patent number: 11024301
Abstract: Methods and systems for modification of electronic system operation based on acoustic ambience classification are presented. In an example method, at least one audio signal present in a physical environment of a user is detected. The at least one audio signal is analyzed to extract at least one audio feature from the audio signal. The audio signal is classified based on the audio feature to produce at least one classification of the audio signal. Operation of an electronic system interacting with the user in the physical environment is modified based on the classification of the audio signal.
Type: Grant. Filed: August 2, 2019. Date of Patent: June 1, 2021. Assignee: GRACENOTE, INC. Inventors: Suresh Jeyachandran, Vadim Brenner, Markus K. Cremer
-
Patent number: 11004441
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
Type: Grant. Filed: August 14, 2019. Date of Patent: May 11, 2021. Assignee: Google LLC. Inventors: Michael Buchanan, Pravir Kumar Gupta, Christopher Bo Tandiono
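A minimal sketch of the counting idea in this abstract: an utterance-so-far is "likely incomplete" when more text samples in a corpus extend it with additional words than match it exactly. The corpus, tokenization, and decision rule below are illustrative assumptions, not taken from the patent.

```python
# Hedged sketch of endpointing by word comparison: count corpus samples that
# exactly match the transcription (first value) vs. samples that start with it
# but continue (second value), then compare the two counts.

def classify_utterance(transcription, text_samples):
    """Return True if the transcription is likely incomplete."""
    words = transcription.lower().split()
    exact = 0      # samples whose words match the transcription exactly
    extended = 0   # samples that start with the transcription but continue
    for sample in text_samples:
        s_words = sample.lower().split()
        if s_words == words:
            exact += 1
        elif s_words[:len(words)] == words and len(s_words) > len(words):
            extended += 1
    # More extensions than exact matches -> the user probably isn't done.
    return extended > exact

corpus = [
    "what is the weather",
    "what is the weather today",
    "what is the weather in paris",
    "set a timer",
]
print(classify_utterance("what is the weather", corpus))  # → True
print(classify_utterance("set a timer", corpus))          # → False
```

In a real endpointer these counts would gate how long the recognizer waits before closing the microphone.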
-
Patent number: 10997979
Abstract: A voice recognition device provided with a processor configured to: determine, based on a captured image of a target person's lips, a pre-utterance breathing period in which the lips moved with breathing immediately before speaking; detect, based on the captured image, a voice period in which the target person is speaking, excluding the determined breathing period; acquire a voice of the target person; and recognize the acquired voice within the detected voice period.
Type: Grant. Filed: June 14, 2019. Date of Patent: May 4, 2021. Assignee: CASIO COMPUTER CO., LTD. Inventors: Kouichi Nakagome, Keisuke Shimada
-
Patent number: 10909987
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Type: Grant. Filed: August 28, 2019. Date of Patent: February 2, 2021. Assignee: Google LLC. Inventor: Matthew Sharifi
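The multi-device arbitration described here (and in the related patents 10593330, 10134398, 9514752, and 9318107 below, which share this abstract) reduces to a score comparison: each device scores the same utterance, scores are exchanged, and only the device that heard the hotword most confidently proceeds. The scores and tie-breaking rule below are illustrative assumptions.

```python
# Hedged sketch of multi-device hotword arbitration: a device initiates full
# speech recognition only if its own hotword likelihood is at least as high
# as every peer's likelihood for the same utterance.

def should_initiate_recognition(own_score, peer_scores):
    """Run ASR only if this device heard the hotword most confidently."""
    return all(own_score >= s for s in peer_scores)

# A smart speaker scored 0.91; a nearby phone and TV scored 0.82 and 0.40.
print(should_initiate_recognition(0.91, [0.82, 0.40]))  # → True
print(should_initiate_recognition(0.82, [0.91, 0.40]))  # → False
```

Real systems also need a tie-break (e.g. a device identifier) so two devices with equal scores never both respond.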
-
Patent number: 10872620
Abstract: Embodiments of the present disclosure provide a voice detection method. An audio signal can be divided into a plurality of audio segments. Audio characteristics can be extracted from each of the plurality of audio segments. The audio characteristics of the respective audio segment include a time domain characteristic and a frequency domain characteristic of the respective audio segment. At least one target voice segment can be detected from the plurality of audio segments according to the audio characteristics of the plurality of audio segments.
Type: Grant. Filed: May 1, 2018. Date of Patent: December 22, 2020. Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED. Inventor: Haijin Fan
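The pairing of a time-domain and a frequency-domain characteristic per segment can be sketched with short-time energy and spectral centroid. The feature choice, thresholds, and test signals below are invented for illustration; the patent does not specify them.

```python
import math

# Hedged sketch: classify a segment as voice using one time-domain feature
# (short-time energy) and one frequency-domain feature (spectral centroid).

def features(segment, sample_rate=8000):
    energy = sum(x * x for x in segment) / len(segment)      # time domain
    n = len(segment)
    # Magnitude spectrum via a naive DFT (fine for a short illustrative frame).
    mags = []
    for k in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(segment))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(segment))
        mags.append(math.hypot(re, im))
    total = sum(mags) or 1.0
    centroid_bin = sum(k * m for k, m in enumerate(mags)) / total  # frequency domain
    return energy, centroid_bin * sample_rate / n

def is_voice(segment, energy_thresh=0.01, centroid_max_hz=1000.0):
    energy, centroid_hz = features(segment)
    return energy > energy_thresh and centroid_hz < centroid_max_hz

# A loud 200 Hz tone (in the speech band) vs. near silence, 20 ms at 8 kHz.
tone = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(160)]
silence = [0.0001] * 160
print(is_voice(tone), is_voice(silence))  # → True False
```

Production detectors use an FFT and richer features (MFCCs, spectral flux), but the two-domain gating is the same shape.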
-
Patent number: 10867607
Abstract: A voice dialog device includes a sight line detection unit configured to detect a sight line of a user, a voice acquiring unit configured to acquire voice pronounced by the user, and a processor. The processor is configured to perform a step of acquiring a result of recognizing the voice, a step of determining whether or not the user is driving, and a step of determining whether or not the voice dialog device has a dialog with the user. When the detected sight line of the user is in a certain direction, and a start keyword has been detected from the voice, the processor determines that the user has started a dialog. The processor switches the certain direction based on whether the user is driving.
Type: Grant. Filed: July 12, 2019. Date of Patent: December 15, 2020. Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA. Inventors: Atsushi Ikeno, Muneaki Shimada, Kota Hatanaka, Toshifumi Nishijima, Fuminori Kataoka, Hiromi Tonegawa, Norihide Umeyama
-
Patent number: 10595117
Abstract: Personal audio systems and methods are disclosed. A personal audio system includes a class table storing processing parameters respectively associated with a plurality of annoyance noise classes, a controller, and a processor. The controller identifies an annoyance noise class of an annoyance noise included in an ambient audio stream and retrieves, from the class table, one or more processing parameters associated with the identified annoyance noise class. The processor processes the ambient audio stream according to the one or more retrieved processing parameters to provide a personal audio stream. The processor includes a pitch tracker to identify a fundamental frequency of the annoyance noise and a filter bank including a band reject filter tuned to the fundamental frequency.
Type: Grant. Filed: March 24, 2017. Date of Patent: March 17, 2020. Assignee: Dolby Laboratories Licensing Corporation. Inventors: Gints Klimanis, Anthony Parks, Jeff Baker
-
Patent number: 10593330
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Type: Grant. Filed: October 26, 2018. Date of Patent: March 17, 2020. Assignee: Google LLC. Inventor: Matthew Sharifi
-
Patent number: 10546063
Abstract: Natural language processing of raw text data for optimal sentence boundary placement. Raw text is extracted from a document and subject to cleaning. The extracted raw text is examined to identify preliminary sentence boundaries, which are used to identify potential sentences in the raw text. One or more potential sentences are assigned a well-formedness score. The value of the score correlates to whether the potential sentence is a truncated/ill-formed sentence or a well-formed sentence. One or more preliminary sentence boundaries are optimized depending on the value of the score of the potential sentence(s). Accordingly, the processing herein is an optimization that creates a sentence-boundary-optimized output.
Type: Grant. Filed: December 13, 2016. Date of Patent: January 28, 2020. Assignee: International Business Machines Corporation. Inventors: Charles E. Beller, Chengmin Ding, Allen Ginsberg, Elinna Shek
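The well-formedness scoring and boundary optimization above can be sketched with toy heuristics: score each candidate sentence, and collapse a preliminary boundary when the merged text scores higher than either piece. The scoring rules below are invented stand-ins for the patent's actual model.

```python
# Hedged sketch of sentence-boundary optimization: merge fragments across a
# preliminary boundary when the merged text is better formed than the pieces.

def well_formedness(text):
    score = 0.0
    words = text.split()
    if text and text[0].isupper():
        score += 1.0                      # starts like a sentence
    if text.rstrip().endswith((".", "?", "!")):
        score += 1.0                      # ends with terminal punctuation
    if len(words) >= 3:
        score += 1.0                      # not a stray fragment
    return score

def merge_if_truncated(left, right):
    """Collapse a preliminary boundary if merging improves the score."""
    merged = left.rstrip() + " " + right.lstrip()
    if well_formedness(merged) > max(well_formedness(left), well_formedness(right)):
        return [merged]
    return [left, right]

# A boundary wrongly placed mid-sentence gets collapsed; a correct one survives.
print(merge_if_truncated("The model was trained on", "ten million sentences."))
print(merge_if_truncated("We ran the test.", "It passed."))
```

The first call returns the merged sentence; the second keeps both sentences intact.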
-
Patent number: 10304450
Abstract: A method is implemented at an electronic device for visually indicating a voice processing state. The electronic device includes at least an array of full color LEDs, one or more microphones and a speaker. The electronic device collects via the one or more microphones audio inputs from an environment in proximity to the electronic device, and processes the audio inputs by identifying and/or responding to voice inputs from a user in the environment. A state of the processing is then determined from among a plurality of predefined voice processing states, and for each of the full color LEDs, a respective predetermined LED illumination specification is determined in association with the determined voice processing state. In accordance with the identified LED illumination specifications of the full color LEDs, the electronic device synchronizes illumination of the array of full color LEDs to provide a visual pattern indicating the determined voice processing state.
Type: Grant. Filed: May 10, 2017. Date of Patent: May 28, 2019. Assignee: GOOGLE LLC. Inventors: Jung Geun Tak, Amy Martin, Willard McClellan
-
Patent number: 10249292
Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
Type: Grant. Filed: December 14, 2016. Date of Patent: April 2, 2019. Assignee: International Business Machines Corporation. Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
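Setting the LSTM aside, the change-point notion in this abstract is simple to state in code: given per-frame labels (speaker 1, speaker 2, or silence), a change point is any index where the label differs from the previous one. The label sequence below is fabricated for illustration.

```python
# Sketch of the change-point definition: a transition between SPEAKER_1,
# SPEAKER_2, and SILENCE labels marks a segment boundary.

def change_points(labels):
    """Indices where the label differs from the previous one."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

labels = ["S1", "S1", "SIL", "S2", "S2", "S1"]
print(change_points(labels))  # → [2, 3, 5]
```

The resulting indices split the audio into the segments on which per-speaker speech recognition would run.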
-
Patent number: 10134398
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Type: Grant. Filed: November 9, 2016. Date of Patent: November 20, 2018. Assignee: Google LLC. Inventor: Matthew Sharifi
-
Patent number: 10056096
Abstract: Provided herein is an electronic device and a method of voice recognition, the method including: analyzing an audio signal of a first frame when the audio signal is input and extracting a first feature value; determining a similarity between the first feature value extracted from the audio signal of the first frame and a first feature value extracted from an audio signal of a previous frame; analyzing the audio signal of the first frame and extracting a second feature value when the similarity is below a predetermined threshold value; and comparing the extracted first feature value and the second feature value with at least one feature value corresponding to a pre-defined voice signal and determining whether or not the audio signal of the first frame is a voice signal. The electronic device may thus detect only a voice section from the audio signal while improving processing speed.
Type: Grant. Filed: July 22, 2016. Date of Patent: August 21, 2018. Assignee: SAMSUNG ELECTRONICS CO., LTD. Inventor: Jong-uk Yoo
-
Patent number: 9916843
Abstract: A voice processing apparatus including a memory and a processor coupled to the memory, the processor configured to: acquire a first input signal containing a first voice and a second input signal containing a second voice; obtain a first signal intensity of the first input signal and a second signal intensity of the second input signal; compute a correlation coefficient between a time sequence of the first signal intensity and a time sequence of the second signal intensity; determine whether the first voice and the second voice are in a conversation state based on the computed correlation coefficient; and output information indicating an association between the first voice and the second voice when it is determined that the first voice and the second voice are in the conversation state.
Type: Grant. Filed: September 15, 2016. Date of Patent: March 13, 2018. Assignee: FUJITSU LIMITED. Inventors: Taro Togawa, Sayuri Kohmura, Takeshi Otani
-
Patent number: 9818407
Abstract: An efficient audio streaming method and apparatus includes a client process implemented on a client or local device and a server process implemented on a remote server or server(s). The client process and server process each have speech recognition components and communicate over a network, and together efficiently manage the detection of speech in an audio signal streamed by the local device to the server for speech recognition and potentially further processing at the server. The client process monitors audio input and, in a first detection stage, implements endpointing on the local device to determine when speech is detected. The client process may further determine if a "wakeword" is detected, and then the client process opens a connection and begins streaming audio to the server process via the network.
Type: Grant. Filed: February 7, 2013. Date of Patent: November 14, 2017. Assignee: AMAZON TECHNOLOGIES, INC. Inventors: Hugh Evan Secker-Walker, Kenneth John Basye, Nikko Strom, Ryan Paul Thomas
-
Patent number: 9799332
Abstract: A communication interface apparatus for a system and a plurality of users is provided. The apparatus includes: a first process unit configured to receive voice information and face information from at least one user, and determine whether the received voice information is voice information of at least one registered user based on user models corresponding to the respective received voice information and face information; a second process unit configured to receive the face information and determine whether the at least one user's attention is on the system based on the received face information; and a third process unit configured to receive the voice information, analyze the received voice information, and determine whether the received voice information is substantially meaningful to the system based on a dialog model that represents conversation flow on a situation basis.
Type: Grant. Filed: November 9, 2010. Date of Patent: October 24, 2017. Assignee: SAMSUNG ELECTRONICS CO., LTD. Inventors: Nam-Hoon Kim, Chi-Youn Park, Jeong-Mi Cho, Jeong-su Kim
-
Patent number: 9740690
Abstract: Some embodiments include a computer-implemented method of producing a flexible sentence syntax to facilitate one or more computer applications to generate and publish sentence expressions. For example, the method can include providing a developer interface to define a flexible sentence syntax that controls one or more sentences publishable by an application service. A developer of the application service can customize the flexible sentence syntax, including selecting at least one of the selectable tokens that is associated with another element to incorporate in the flexible sentence syntax. Based on the selected token, a computing device can generate and publish a target sentence according to the flexible sentence syntax on the application service's behalf.
Type: Grant. Filed: February 22, 2017. Date of Patent: August 22, 2017. Assignee: Facebook, Inc. Inventors: Ling Bao, Hugo Johan van Heuven, Jiangbo Miao
-
Patent number: 9601132
Abstract: A method comprising: detecting a first acoustic signal using a microphone array; detecting a first angle associated with a first incident direction of the first acoustic signal; and storing, in a memory, a representation of the first acoustic signal and a representation of the first angle.
Type: Grant. Filed: February 17, 2016. Date of Patent: March 21, 2017. Assignee: Samsung Electronics Co., Ltd. Inventors: Beakkwon Son, Gangyoul Kim, Namil Lee, Hochul Hwang, Jongmo Kum, Minho Bae
-
Patent number: 9520140
Abstract: Improved audio data processing methods and systems are provided. Some implementations involve dividing frequency domain audio data into a plurality of subbands and determining amplitude modulation signal values for each of the plurality of subbands. A band-pass filter may be applied to the amplitude modulation signal values in each subband to produce band-pass filtered amplitude modulation signal values for each subband. The band-pass filter may have a central frequency that exceeds an average cadence of human speech. A gain may be determined for each subband based, at least in part, on a function of the amplitude modulation signal values and the band-pass filtered amplitude modulation signal values. The determined gain may be applied to each subband.
Type: Grant. Filed: March 31, 2014. Date of Patent: December 13, 2016. Assignee: Dolby Laboratories Licensing Corporation. Inventors: Erwin Goesnar, Glenn N. Dickins, David Gunawan
-
Patent number: 9514752
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Type: Grant. Filed: April 1, 2016. Date of Patent: December 6, 2016. Assignee: Google Inc. Inventor: Matthew Sharifi
-
Patent number: 9443536
Abstract: Disclosed are an apparatus and method of deducing a user's intention using motion information. The user's intention deduction apparatus includes a speech intention determining unit configured to predict a speech intention regarding a user's speech using motion information sensed by at least one motion capture sensor, and a controller configured to control operation of detecting a voice section from a received sound signal based on the predicted speech intention.
Type: Grant. Filed: April 29, 2010. Date of Patent: September 13, 2016. Assignee: Samsung Electronics Co., Ltd. Inventors: Jeong-Mi Cho, Jeong-Su Kim, Won-Chul Bang, Nam-Hoon Kim
-
Patent number: 9418650
Abstract: In embodiments, apparatuses, methods and storage media are described that are associated with training adaptive speech recognition systems ("ASR") using audio and text obtained from captioned video. In various embodiments, the audio and caption may be aligned for identification, such as according to a start and end time associated with a caption, and the alignment may be adjusted to better fit audio to a given caption. In various embodiments, the aligned audio and caption may then be used for training if an error value associated with the audio and caption demonstrates that the audio and caption will aid in training the ASR. In various embodiments, filters may be used on audio and text prior to training. Such filters may be used to exclude potential training audio and text based on filter criteria. Other embodiments may be described and claimed.
Type: Grant. Filed: September 25, 2013. Date of Patent: August 16, 2016. Assignee: Verizon Patent and Licensing Inc. Inventors: Sujeeth S. Bharadwaj, Suri B. Medapati
-
Patent number: 9318107
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Type: Grant. Filed: April 1, 2015. Date of Patent: April 19, 2016. Assignee: Google Inc. Inventor: Matthew Sharifi
-
Patent number: 9147400
Abstract: The present invention relates to a method and apparatus for generating speaker-specific spoken passwords. One embodiment of a method for generating a spoken password for use by a speaker of interest includes identifying one or more speech features that best distinguish the speaker of interest from a plurality of impostor speakers and incorporating the speech features in the spoken password.
Type: Grant. Filed: December 21, 2011. Date of Patent: September 29, 2015. Assignee: SRI INTERNATIONAL. Inventor: Nicolas Scheffer
-
Patent number: 9123337
Abstract: Indexing digitized speech with words represented in the digitized speech, with a multimodal digital audio editor operating on a multimodal device supporting modes of user interaction, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor operatively coupled to an ASR engine. The indexing includes: providing, by the multimodal digital audio editor to the ASR engine, digitized speech for recognition; receiving in the multimodal digital audio editor, from the ASR engine, recognized user speech including a recognized word, along with information indicating where, in the digitized speech, representation of the recognized word begins; and inserting, by the multimodal digital audio editor, the recognized word, in association with that information, into a speech recognition grammar, the speech recognition grammar voice-enabling user interface commands of the multimodal digital audio editor.
Type: Grant. Filed: March 11, 2014. Date of Patent: September 1, 2015. Assignee: Nuance Communications, Inc. Inventors: Charles W. Cross, Frank L. Jania
-
Patent number: 9043207
Abstract: The present invention relates to a method for speaker recognition, comprising the steps of: obtaining and storing speaker information for at least one target speaker; obtaining a plurality of speech samples from a plurality of telephone calls from at least one unknown speaker; classifying the speech samples according to the at least one unknown speaker, thereby providing speaker-dependent classes of speech samples; extracting speaker information for the speech samples of each of the speaker-dependent classes; combining the extracted speaker information for each of the speaker-dependent classes; comparing the combined extracted speaker information for each of the speaker-dependent classes with the stored speaker information for the at least one target speaker to obtain at least one comparison result; and determining whether one of the at least one unknown speakers is identical with the at least one target speaker based on the at least one comparison result.
Type: Grant. Filed: November 12, 2009. Date of Patent: May 26, 2015. Assignee: Agnitio S.L. Inventors: Johan Nikolaas Langehoven Brummer, Luis Buera Rodriguez, Marta Garcia Gomar
-
Publication number: 20140379345
Abstract: Disclosed are an apparatus and a method for detecting a speech endpoint using a WFST. The apparatus in accordance with an embodiment of the present invention includes: a speech decision portion configured to receive frame units of a feature vector converted from a speech signal and to analyze and classify the received feature vector into a speech class or a noise class; a frame-level WFST configured to receive the speech class and the noise class and to convert them to a WFST format; a speech-level WFST configured to detect a speech endpoint by analyzing a relationship between the speech class and noise class and a preset state; a WFST combination portion configured to combine the frame-level WFST with the speech-level WFST; and an optimization portion configured to optimize the combined WFST to have a minimum route.
Type: Application. Filed: March 25, 2014. Publication date: December 25, 2014. Applicant: Electronics and Telecommunications Research Institute. Inventors: Hoon Chung, Sung-Joo Lee, Yun-Keun Lee
-
Patent number: 8831942
Abstract: A method is provided for identifying a gender of a speaker. The method steps include: obtaining speech data of the speaker; extracting vowel-like speech frames from the speech data; analyzing the vowel-like speech frames to generate a feature vector having pitch values corresponding to the vowel-like frames; analyzing the pitch values to generate a most frequent pitch value; determining, in response to the most frequent pitch value being between a first pre-determined threshold and a second pre-determined threshold, an output of a male Gaussian Mixture Model (GMM) and an output of a female GMM using the pitch values as inputs to the male GMM and the female GMM; and identifying the gender of the speaker by comparing the output of the male GMM and the output of the female GMM based on a pre-determined criterion.
Type: Grant. Filed: March 19, 2010. Date of Patent: September 9, 2014. Assignee: Narus, Inc. Inventor: Antonio Nucci
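The pipeline above can be sketched end-to-end: find the most frequent pitch, short-circuit on unambiguous values, and otherwise compare model likelihoods. Each GMM is reduced here to a single 1-D Gaussian with invented parameters; the thresholds and pitch values are likewise illustrative, not from the patent.

```python
import math
from collections import Counter

# Hedged sketch of pitch-based gender identification: mode of the pitch
# values, threshold gating, then likelihood comparison in the ambiguous band.

def gaussian_loglik(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def identify_gender(pitch_values, low=120.0, high=200.0):
    mode_pitch = Counter(pitch_values).most_common(1)[0][0]
    if mode_pitch < low:
        return "male"            # unambiguously low pitch
    if mode_pitch > high:
        return "female"          # unambiguously high pitch
    # Ambiguous band: compare per-model log-likelihoods over all pitch values
    # (single Gaussians standing in for the patent's male/female GMMs).
    male = sum(gaussian_loglik(p, 110.0, 20.0) for p in pitch_values)
    female = sum(gaussian_loglik(p, 210.0, 25.0) for p in pitch_values)
    return "male" if male > female else "female"

print(identify_gender([105, 108, 105, 110]))   # → male (below the low threshold)
print(identify_gender([180, 185, 190, 185]))   # → female (via model comparison)
```

A real implementation would use multi-component GMMs trained on labeled speech rather than hand-set means.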
-
Patent number: 8805685
Abstract: Disclosed herein are systems, methods, and tangible computer-readable media for detecting synthetic speech in speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the speech samples demonstrate little variance over time or are the same, and verifying the speech samples if they demonstrate sufficient variance over time. One embodiment further adds that each of the speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold, or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received.
Type: Grant. Filed: August 5, 2013. Date of Patent: August 12, 2014. Assignee: AT&T Intellectual Property I, L.P. Inventor: Horst J. Schroeter
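The variance check described here can be sketched directly: a live speaker never produces exactly the same signal twice, so byte-identical or near-identical repetitions of a passphrase are denied. The feature vectors and threshold below are illustrative stand-ins for real acoustic features.

```python
# Hedged sketch of anti-replay verification: compute per-dimension variance
# across repeated samples of the same phrase and deny when it is too small.

def verify(samples, min_variance=1e-4):
    """Deny verification when repeated samples vary too little."""
    n = len(samples)
    dims = len(samples[0])
    variance = 0.0
    for d in range(dims):
        vals = [s[d] for s in samples]
        mean = sum(vals) / n
        variance += sum((v - mean) ** 2 for v in vals) / n
    return variance / dims >= min_variance

replayed = [[0.50, 0.20, 0.90]] * 3            # identical replays: variance 0
live = [[0.50, 0.20, 0.90],
        [0.48, 0.23, 0.88],
        [0.53, 0.19, 0.93]]                    # natural utterance-to-utterance drift
print(verify(replayed), verify(live))  # → False True
```

Too *much* variance would also be suspicious in practice (different speakers), so a real system bounds it on both sides.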
-
Patent number: 8798991
Abstract: A non-speech section detecting device generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not containing voice data based on speech uttered by a person, the device including: a calculating part calculating a bias of a spectrum obtained by converting sound data of each frame into components on a frequency axis; a judging part judging whether the bias is greater than or equal to a given threshold, or alternatively smaller than or equal to a given threshold; a counting part counting the number of consecutive frames judged as having a bias greater than or equal to the threshold, or alternatively smaller than or equal to the threshold; and a count judging part judging whether the obtained number of consecutive frames is greater than or equal to a given value.
Type: Grant. Filed: November 13, 2012. Date of Patent: August 5, 2014. Assignee: Fujitsu Limited. Inventors: Nobuyuki Washio, Shoji Hayakawa
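The consecutive-frame logic above reduces to run-length counting over per-frame scores. The spectral-bias scores below are a fake precomputed list, and the threshold and minimum run length are invented; the patent specifies none of these values.

```python
# Hedged sketch of non-speech section detection: flag frames whose spectral
# "bias" score passes a threshold, then declare a non-speech section only when
# enough flagged frames occur consecutively.

def non_speech_sections(bias_scores, threshold=0.8, min_run=3):
    """Return (start, end) frame ranges of detected non-speech sections."""
    sections, run_start = [], None
    for i, b in enumerate(bias_scores):
        if b >= threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                sections.append((run_start, i - 1))
            run_start = None
    if run_start is not None and len(bias_scores) - run_start >= min_run:
        sections.append((run_start, len(bias_scores) - 1))
    return sections

scores = [0.2, 0.9, 0.95, 0.85, 0.3, 0.9, 0.1, 0.9, 0.9, 0.9]
print(non_speech_sections(scores))  # → [(1, 3), (7, 9)]
```

The lone high score at frame 5 is correctly ignored: the run-length requirement is what makes the detector robust to single-frame glitches.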
-
Patent number: 8793132Abstract: An apparatus, method and program for dividing a conversational dialog into utterances. The apparatus includes: a computer processor; a word database for storing spellings and pronunciations of words; a grammar database for storing syntactic rules on words; a pause detecting section which detects a pause location in a channel making a main speech among conversational dialogs inputted in at least two channels; an acknowledgement detecting section which detects an acknowledgement location in a channel not making the main speech; a boundary-candidate extracting section which extracts boundary candidates in the main speech, by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and a recognizing unit which outputs a word string of the main speech segmented by one of the extracted boundary candidates after dividing the segmented speech into optimal utterances in reference to the word database and grammar database.Type: GrantFiled: December 26, 2007Date of Patent: July 29, 2014Assignee: Nuance Communications, Inc.Inventors: Nobuyasu Itoh, Gakuto Kurata
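The boundary-candidate extraction step (pauses within a predetermined range around an acknowledgement on the other channel) reduces to a simple filter. Times are in seconds and all names are illustrative, not from the patent:

```python
def boundary_candidates(pause_times, ack_times, window):
    """Extract utterance-boundary candidates in the main-speech channel:
    pauses that fall within +/- window seconds of an acknowledgement
    detected on the non-main channel."""
    candidates = []
    for p in pause_times:
        if any(abs(p - a) <= window for a in ack_times):
            candidates.append(p)
    return candidates
```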
-
Patent number: 8781832Abstract: Techniques are disclosed for overcoming errors in speech recognition systems. For example, a technique for processing acoustic data in accordance with a speech recognition system comprises the following steps/operations. Acoustic data is obtained in association with the speech recognition system. The acoustic data is recorded using a combination of a first buffer area and a second buffer area, such that the recording of the acoustic data using the combination of the two buffer areas at least substantially minimizes one or more truncation errors associated with operation of the speech recognition system.Type: GrantFiled: March 26, 2008Date of Patent: July 15, 2014Assignee: Nuance Communications, Inc.Inventors: Liam D. Comerford, David Carl Frank, Burn L. Lewis, Leonid Rachevksy, Mahesh Viswanathan
-
Patent number: 8775182Abstract: Machine-readable media, methods, apparatus and system for speech segmentation are described. In some embodiments, a fuzzy rule may be determined to discriminate a speech segment from a non-speech segment. An antecedent of the fuzzy rule may include an input variable and an input variable membership. A consequent of the fuzzy rule may include an output variable and an output variable membership. An instance of the input variable may be extracted from a segment. An input variable membership function associated with the input variable membership and an output variable membership function associated with the output variable membership may be trained. The instance of the input variable, the input variable membership function, the output variable, and the output variable membership function may be operated, to determine whether the segment is the speech segment or the non-speech segment.Type: GrantFiled: April 12, 2013Date of Patent: July 8, 2014Assignee: Intel CorporationInventors: Robert Du, Ye Tao, Daren Zu
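A minimal fuzzy rule of the kind the abstract describes, with triangular membership functions and one rule per class. The membership shapes and the energy feature are invented for illustration; the patent trains its membership functions rather than fixing them:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def classify_segment(energy):
    """One fuzzy rule per class: IF energy is high THEN speech;
    IF energy is low THEN non-speech. The rule with the stronger
    firing degree wins (a crude defuzzification)."""
    mu_speech = tri(energy, 0.3, 1.0, 1.7)      # 'high energy' membership
    mu_nonspeech = tri(energy, -0.7, 0.0, 0.7)  # 'low energy' membership
    return "speech" if mu_speech > mu_nonspeech else "non-speech"
```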
-
Patent number: 8762149Abstract: The present invention refers to a method for verifying the identity of a speaker based on the speaker's voice comprising the steps of: a) receiving a voice utterance; b) using biometric voice data to verify (10) that the speaker's voice corresponds to the speaker the identity of which is to be verified based on the received voice utterance; and c) verifying (12, 13) that the received voice utterance is not falsified, preferably after having verified the speaker's voice; d) accepting (16) the speaker's identity to be verified in case that both verification steps give a positive result and not accepting (15) the speaker's identity to be verified if any of the verification steps give a negative result. The invention further refers to a corresponding computer readable medium and a computer.Type: GrantFiled: December 10, 2008Date of Patent: June 24, 2014Inventors: Marta Sánchez Asenjo, Alfredo Gutiérrez Navarro, Alberto Martín de los Santos de las Heras, Marta García Gomar
-
Publication number: 20140163986Abstract: Disclosed herein is a voice-based CAPTCHA method and apparatus which can perform a CAPTCHA procedure using the voice of a human being. In the voice-based CAPTCHA method, a plurality of uttered sounds of a user are collected. A start point and an end point of a voice from each of the collected uttered sounds are detected and then speech sections are detected. Uttered sounds of the respective detected speech sections are compared with reference uttered sounds, and then it is determined whether the uttered sounds are correctly uttered sounds. It is determined whether the uttered sounds have been made by an identical speaker if it is determined that the uttered sounds are correctly uttered sounds.Type: ApplicationFiled: December 3, 2013Publication date: June 12, 2014Applicant: Electronics and Telecommunications Research InstituteInventors: Sung-Joo LEE, Ho-Young JUNG, Hwa-Jeon SONG, Eui-Sok CHUNG, Byung-Ok KANG, Hoon CHUNG, Jeon-Gue PARK, Hyung-Bae JEON, Yoo-Rhee OH, Yun-Keun LEE
-
Publication number: 20140149117Abstract: A system for distinguishing and identifying speech segments originating from speech of one or more relevant speakers in a predefined detection area. The system includes an optical system which outputs optical patterns, each representing audio signals as detected by the optical system in the area within a specific time frame; and a computer processor which receives each of the outputted optical patterns and analyses each respective optical pattern to provide information that enables identification of speech segments, by identifying blank spaces in the optical pattern that define the beginning or ending of each respective speech segment.Type: ApplicationFiled: June 21, 2012Publication date: May 29, 2014Applicant: VOCALZOOM SYSTEMS LTD.Inventors: Tal Bakish, Gavriel Horowitz, Yekutiel Avargel, Yechiel Kurtz
-
Publication number: 20140129219Abstract: A computer-implemented system and method for masking special data is provided. Speakers of a call recording are identified. The call recording is separated into strands corresponding to each of the speakers. A prompt list of elements that prompt the speaker of the other strand to utter special information is applied to one of the strands. At least one of the elements of the prompt list is identified in the one strand. A special information candidate is identified in the other strand and is located after a location in time where the element was found in the voice recording of the one strand. A confidence score is assigned to the element located in the one strand and to the special information candidate in the other strand. The confidence scores are combined and a threshold is applied. The special information candidate is rendered unintelligible when the combined confidence scores satisfy the threshold.Type: ApplicationFiled: November 4, 2013Publication date: May 8, 2014Applicant: Intellisist, Inc.Inventors: Howard M. Lee, Steven Lutz, Gilad Odinak
-
Patent number: 8700399Abstract: In one embodiment the present invention includes a method comprising receiving an acoustic input signal and processing the acoustic input signal with a plurality of acoustic recognition processes configured to recognize the same target sound. Different acoustic recognition processes start processing different segments of the acoustic input signal at different time points in the acoustic input signal. In one embodiment, initial states in the recognition processes may be configured on each time step.Type: GrantFiled: July 6, 2010Date of Patent: April 15, 2014Assignee: Sensory, Inc.Inventors: Pieter J. Vermeulen, Jonathan Shaw, Todd F. Mozer
-
Patent number: 8700406Abstract: Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.Type: GrantFiled: August 19, 2011Date of Patent: April 15, 2014Assignee: Qualcomm IncorporatedInventors: Leonard H. Grokop, Vidya Narayanan, James W. Dolter, Sanjiv Nanda
-
Publication number: 20140046665Abstract: A system and method for enhancing visual representation to individuals participating in a conversation is provided. Visual data for a plurality of individuals participating in one or more conversations is analyzed. Possible conversational configurations of the individuals are generated. Each possible conversational configuration includes one or more pair-wise probabilities of at least two of the individuals. A probability weight is assigned to each of the pair-wise probabilities having a likelihood that the individuals of that pair-wise probability are participating in a conversation. A probability of each possible conversational configuration is determined by combining the probability weights for the pair-wise probabilities of that possible conversational configuration. The possible conversational configuration with the highest probability is selected as a most probable configuration.Type: ApplicationFiled: October 18, 2013Publication date: February 13, 2014Applicant: Palo Alto Research Center IncorporatedInventors: Paul M. Aoki, Margaret H. Szymanski, James Thornton, Daniel H. Wilson, Allison G. Woodruff
-
Patent number: 8635065Abstract: The present invention discloses an apparatus for automatic extraction of important events in audio signals comprising: signal input means for supplying audio signals; audio signal fragmenting means for partitioning audio signals supplied by the signal input means into audio fragments of a predetermined length and for allocating a sequence of one or more audio fragments to a respective audio window; feature extracting means for analyzing acoustic characteristics of the audio signals comprised in the audio fragments and for analyzing acoustic characteristics of the audio signals comprised in the audio windows; and important event extraction means for extracting important events in audio signals supplied by the audio signal fragmenting means based on predetermined important event classifying rules depending on acoustic characteristics of the audio signals comprised in the audio fragments and on acoustic characteristics of the audio signals comprised in the audio windows, wherein each important event extracted … Type: GrantFiled: November 10, 2004Date of Patent: January 21, 2014Assignee: Sony Deutschland GmbHInventors: Silke Goronzy-Thomae, Thomas Kemp, Ralf Kompe, Yin Hay Lam, Krzysztof Marasek, Raquel Tato
-
Patent number: 8606569Abstract: The present invention relates to means and methods of classifying speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.Type: GrantFiled: November 12, 2012Date of Patent: December 10, 2013Inventor: Alon Konchitsky
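A single-feature sketch of the monitor/analyze/compare/classify loop, using zero-crossing rate as the analyzed component. ZCR alone is a weak discriminator and is only a stand-in for whichever components the patent actually selects; the threshold is invented:

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / max(len(frame) - 1, 1)

def classify_speech_music(frame, zcr_threshold=0.3):
    """Compare the analyzed component against a pre-determined threshold
    and label the frame speech or music."""
    return "speech" if zero_crossing_rate(frame) >= zcr_threshold else "music"
```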
-
Patent number: 8595009Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) a proportion of the music clips in each of the sections is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.Type: GrantFiled: July 26, 2012Date of Patent: November 26, 2013Assignee: Dolby Laboratories Licensing CorporationInventors: Lie Lu, Claus Bauer
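The four section conditions enumerated in this abstract translate directly into checks. Here `labels` is the per-clip class sequence for one candidate section (`'m'` for music) and all parameter names and units are assumptions, not from the patent:

```python
def longest_music_run(labels):
    """Length in clips of the longest uninterrupted music segment."""
    best = run = 0
    for c in labels:
        run = run + 1 if c == "m" else 0
        best = max(best, run)
    return best

def valid_song_section(labels, clip_len, min_song, max_song, min_music_prop):
    """Apply the abstract's four conditions to one candidate section.
    clip_len, min_song, max_song are in seconds."""
    if not labels:
        return False
    duration = len(labels) * clip_len
    cond1 = longest_music_run(labels) * clip_len > min_song    # long music segment
    cond2 = duration < max_song                                # not over-long
    cond3 = labels[0] == "m" and labels[-1] == "m"             # music at both ends
    cond4 = labels.count("m") / len(labels) >= min_music_prop  # mostly music
    return cond1 and cond2 and cond3 and cond4
```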
-
Patent number: 8571865Abstract: Systems, methods performed by data processing apparatus and computer storage media encoded with computer programs for receiving information relating to (i) a communication device that has received an utterance and (ii) a voice associated with the received utterance, comparing the received voice information with voice signatures in a comparison group, the comparison group including one or more individuals identified from one or more connections arising from the received information relating to the communication device, attempting to identify the voice associated with the utterance as matching one of the individuals in the comparison group, and based on a result of the attempt to identify, selectively providing the communication device with access to one or more resources associated with the matched individual.Type: GrantFiled: August 10, 2012Date of Patent: October 29, 2013Assignee: Google Inc.Inventor: Philip Hewinson
-
Patent number: 8554563Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speaker and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.Type: GrantFiled: September 11, 2012Date of Patent: October 8, 2013Assignee: Nuance Communications, Inc.Inventor: Hagai Aronowitz
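The extended feature vector can be sketched with one-dimensional Gaussian models standing in for the pre-trained speaker models and the background population model; everything beyond "append per-model log-likelihood ratios" is an assumption for illustration:

```python
import math

def gauss_loglik(x, mean, var):
    """Log-likelihood of scalar x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def extend_feature_vector(base_vec, frame_stat, speaker_models, ubm):
    """Append, for each pre-trained speaker model, the log-likelihood
    ratio of the frame statistic under that model versus a background
    population model. Models are (mean, var) pairs in this sketch."""
    llrs = [
        gauss_loglik(frame_stat, m, v) - gauss_loglik(frame_stat, *ubm)
        for m, v in speaker_models
    ]
    return list(base_vec) + llrs
```

The extended vector then feeds the segmentation and clustering stages unchanged, which is the appeal of the approach.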
-
Patent number: 8554562Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speaker and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.Type: GrantFiled: November 15, 2009Date of Patent: October 8, 2013Assignee: Nuance Communications, Inc.Inventor: Hagai Aronowitz
-
Patent number: 8554547Abstract: A voice activity detection method and apparatus, and an electronic device are provided. The method includes: obtaining a time domain parameter and a frequency domain parameter from an audio frame; obtaining a first distance between the time domain parameter and a long-term-sliding mean of the time domain parameter in a history background noise frame, and obtaining a second distance between the frequency domain parameter and a long-term-sliding mean of the frequency domain parameter in the history background noise frame; and judging whether the audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance and a set of decision inequalities based on the first distance and the second distance. The above technical solutions enable the judgment criterion to have an adaptive adjustment capability, thus improving the performance of the voice activity detection.Type: GrantFiled: July 11, 2012Date of Patent: October 8, 2013Assignee: Huawei Technologies Co., Ltd.Inventor: Zhe Wang
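The two-distance decision can be sketched as follows. The patent uses a set of decision inequalities with adaptive adjustment; a single weighted-sum inequality stands in for the set here, and all names and default values are illustrative:

```python
def is_foreground_voice(time_param, freq_param, time_mean, freq_mean,
                        w_time=1.0, w_freq=1.0, threshold=1.0):
    """Compute the first and second distances to the long-term sliding
    means of the background-noise parameters, then apply one decision
    inequality: a weighted sum of the distances versus a threshold."""
    d1 = abs(time_param - time_mean)    # first distance (time domain)
    d2 = abs(freq_param - freq_mean)    # second distance (frequency domain)
    return w_time * d1 + w_freq * d2 > threshold

def update_sliding_mean(mean, value, alpha=0.95):
    """Long-term sliding (exponential) mean, updated only on frames
    judged to be background noise."""
    return alpha * mean + (1 - alpha) * value
```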