Detect Speech In Noise Patents (Class 704/233)

Audio input system, audio input apparatus, and recording medium therefor

Patent number: 10331403

Abstract: An audio input system includes an audio input apparatus and a plurality of electronic devices. The audio input apparatus includes an audio input unit, a start instruction accepting unit, and an instruction transmitter. The plurality of electronic devices includes a first electronic device. The first electronic device is a target of a process execution instruction based on the audio input by the audio input unit among the instructions. The instruction transmitter transmits a process reducing instruction for reducing an execution of a process as the instruction to the plurality of electronic devices, when an input sound volume of a microphone exceeds a specific sound volume. The audio input unit starts the audio input when the input sound volume of the microphone is equal to or less than the specific sound volume. The instruction transmitter transmits the process execution instruction to the first electronic device.

Type: Grant

Filed: February 28, 2018

Date of Patent: June 25, 2019

Assignee: Kyocera Document Solutions Inc.

Inventor: Kazuki Dozen
Method, device, and system for audio data processing

Patent number: 10325612

Abstract: A method and apparatus filter audio data received from a speaking person using a filter specific to that speaker. The audio characteristics of the speaker's voice may be collected and the specific filter may be formed to reduce noise while also enhancing voice quality. For instance, if a speaker's voice does not contain specific frequencies, then a filter may cancel the noise at such frequencies to ease noise cancellation and to avoid processing parts of the sound spectrum that do not need cleaning. Additionally, the strength frequencies of a speaker's voice may be identified from the collected audio characteristics, and those parts of the spectrum can be filtered with finer granularity to provide a speaker-specific filter that enhances the voice quality of the speaker's voice data that is transmitted or output by a communication device. The audio data may also be output based upon a user's predefined hearing spectrum.

Type: Grant

Filed: August 1, 2017

Date of Patent: June 18, 2019

Assignee: Unify GmbH & Co. KG

Inventors: Bizhan Karimi-Cherkandi, Farrokh Mohammadzadeh Kouchri, Schah Walli Ali
Method and system of estimating clean speech parameters from noisy speech parameters

Patent number: 10319377

Abstract: A method and system are provided for estimating clean speech parameters from noisy speech parameters. The method is performed by acquiring speech signals, estimating noise from the acquired speech signals, computing speech features from the acquired speech signals, estimating model parameters from the computed speech features, and estimating clean parameters from the estimated noise and the estimated model parameters.

Type: Grant

Filed: February 28, 2017

Date of Patent: June 11, 2019

Assignee: Tata Consultancy Services Limited

Inventors: Ashish Panda, Sunil Kumar Kopparapu
Speech recognition system and gain setting system

Patent number: 10319393

Abstract: When an instruction to start voice input is received from the user, a gain controller acquires, from a gain table which defines a correspondence between vehicle speed ranges and gains, a gain corresponding to a vehicle speed range including the vehicle speed of a vehicle detected by a vehicle speed detector, and sets the acquired gain as the gain of an input amplifier that amplifies an input audio signal output by a microphone. As a gain corresponding to each vehicle speed range, the gain table records a gain of the input amplifier corresponding, in an experimentally determined frequency distribution of peak values in the vehicle speed range, to a maximum frequency in the range of magnitude of voice output as an input audio signal by the microphone and to be input to a speech recognition engine as voice having a magnitude within the input range of the speech recognition engine.

Type: Grant

Filed: July 27, 2016

Date of Patent: June 11, 2019

Assignee: ALPINE ELECTRONICS, INC.

Inventors: Hirokazu Suzuki, Toru Marumoto
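
The gain-table lookup described in the abstract of patent 10319393 can be illustrated with a minimal Python sketch. The table contents, the km/h unit, the dB unit, and the names `GAIN_TABLE` and `gain_for_speed` are assumptions made for illustration; the abstract states only that the table maps vehicle speed ranges to experimentally determined gains.

```python
from bisect import bisect_right

# Hypothetical gain table: (lower bound of the speed range in km/h, gain in dB).
# The values are illustrative and are not taken from the patent.
GAIN_TABLE = [(0, 12.0), (40, 9.0), (80, 6.0), (120, 3.0)]


def gain_for_speed(speed_kmh, table=GAIN_TABLE):
    """Return the gain of the speed range that contains speed_kmh."""
    if speed_kmh < table[0][0]:
        raise ValueError("speed is below the lowest range in the table")
    lower_bounds = [lower for lower, _ in table]
    # bisect_right finds the first range starting above the speed,
    # so the range containing the speed is the one just before it.
    index = bisect_right(lower_bounds, speed_kmh) - 1
    return table[index][1]
```

In this sketch the lookup would run once, when the instruction to start voice input is received, and the returned value would be applied to the input amplifier.
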
Information processing device, information processing method, computer program product, and recognition system

Patent number: 10319373

Abstract: An information processing device includes a phonetic converting unit, an HMM converting unit, and a searching unit. The phonetic converting unit converts a phonetic symbol sequence into a hidden Markov model (HMM) state sequence in which states of an HMM are aligned. The HMM converting unit converts the HMM state sequence into a score vector sequence indicating the degree of similarity to a specific pronunciation using a similarity matrix defining the similarity between the states of the HMM. The searching unit searches for a path having a better score for the score vector sequence than that of the other paths out of paths included in a search network and outputs a phonetic symbol sequence corresponding to the retrieved path.

Type: Grant

Filed: December 23, 2016

Date of Patent: June 11, 2019

Assignee: Kabushiki Kaisha Toshiba

Inventor: Manabu Nagao
Vehicle having dynamic acoustic model switching to improve noisy speech recognition

Patent number: 10297251

Abstract: An automatic speech recognition system for a vehicle includes a controller configured to select an acoustic model from a library of acoustic models based on ambient noise in a cabin of the vehicle and operating parameters of the vehicle. The controller is further configured to apply the selected acoustic model to noisy speech to improve recognition of the speech.

Type: Grant

Filed: January 21, 2016

Date of Patent: May 21, 2019

Assignee: Ford Global Technologies, LLC

Inventors: Ali Hassani, Scott Andrew Amman, Francois Charette, Brigitte Frances Mora Richardson, Gintaras Vincent Puskorius, An Ji, Ranjani Rangarajan, John Edward Huber
Selective sound storage device

Patent number: 10297283

Abstract: A computer-readable medium, controller and a method of automatically recording a sound signal is provided. A sound signal is received by the controller from a sound generating device. A frequency of the received sound signal is determined by the controller. When the determined frequency is within a predetermined frequency range, the controller starts recording the received sound signal.

Type: Grant

Filed: December 30, 2014

Date of Patent: May 21, 2019

Assignee: Gibson Brands, Inc.

Inventor: Shota Terai
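
The frequency-gated recording described in the abstract of patent 10297283 can be sketched as follows. The abstract does not say how the frequency is determined; the zero-crossing estimate, the 80 to 1200 Hz range, and all function names below are assumptions chosen to keep the example self-contained.

```python
import math


def sine_wave(freq_hz, sample_rate, seconds=1.0):
    """Generate a test tone (helper for the example only)."""
    count = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]


def estimate_frequency(samples, sample_rate):
    """Estimate the dominant frequency by counting rising zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    duration = len(samples) / sample_rate
    return crossings / duration if duration else 0.0


def should_start_recording(samples, sample_rate, low_hz=80.0, high_hz=1200.0):
    """Return True when the estimated frequency lies in the assumed range."""
    return low_hz <= estimate_frequency(samples, sample_rate) <= high_hz
```

A zero-crossing count is only reliable for signals dominated by a single tone; a spectral estimate would be a more robust choice for real instrument audio.
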
Information handling system having acoustic noise reduction

Patent number: 10290294

Abstract: An information handling system includes a processor configured to operate in one of a plurality of power states. An audio circuit measures an ambient audio environment within the information handling system, classifies the measured ambient audio into one of a plurality of categories, and implements a power management policy for the processor in response to the measured ambient audio being classified into the one of the categories.

Type: Grant

Filed: November 9, 2017

Date of Patent: May 14, 2019

Assignee: Dell Products, LP

Inventors: Ray V. Kacelenga, Merle J. Wood, III, Travis C. North
Context sensitive rule-based alerts for fraud monitoring

Patent number: 10284724

Abstract: According to embodiments of the present invention, various computer implemented methods are provided for generating context sensitive alerts for the purposes of fraud detection and prevention. According to one embodiment, a communication (e.g., a call) is received at (or initiated by) a workstation by a human representative or agent. Verbal (voice) communication between the agent and the third party customer is monitored and converted into text. The agent's activity in applications executing in the workstation is also tracked. The combination of converted text and the activity of the agent is evaluated in a behavior engine to detect inconsistencies between the activity of the agent and authorized or typical activity. When inconsistencies are detected, security actions are performed to alert administrators of potential fraud and, in extreme cases, further action by the agent may be prevented in the workstation to prevent additional fraud.

Type: Grant

Filed: April 5, 2017

Date of Patent: May 7, 2019

Assignee: TELEPERFORMANCE SE

Inventor: Lyle Hardy
Media capture and process system

Patent number: 10276155

Abstract: A user media device may include a microphone array and a communication interface. The microphone array may include an omnidirectional microphone and a directional microphone. The microphone array may be selectively switchable. The communication interface may communicatively couple the user media device with a computer and may transmit audio captured by the microphone array to the computer for transfer to a remote service. The remote service may generate text of the processed audio via natural language processing. The remote service may further perform semantic reasoning of the processed audio via a semantic reasoning engine. The remote service may also generate content based at least in part on the semantic reasoning performed on the processed audio. The curated content may include a report having results of the semantic reasoning organized to demonstrate the results in a meaningful way with respect to the processed audio.

Type: Grant

Filed: December 22, 2016

Date of Patent: April 30, 2019

Assignee: FUJITSU LIMITED

Inventor: James Montantes
Wirelessly powered audio devices

Patent number: 10264352

Abstract: Techniques herein provide wireless energy transfer to audio devices such as headphones, headsets, hearing aids, and the like. Audio devices are integrated with a device resonator. The device resonator may be positioned and oriented to reduce interaction with lossy or sensitive components of the audio device. A repeater resonator and/or a source resonator is integrated into a headrest of a seat or a chair providing continuous power to the headphones while in use. The audio devices may be recharged wirelessly when positioned near source resonators that may be embedded in pads, tables, carrying cases, cups, and the like.

Type: Grant

Filed: January 9, 2017

Date of Patent: April 16, 2019

Assignee: WiTricity Corporation

Inventors: Steven J. Ganem, Hiroshi A. Mendoza, Morris P. Kesler, Konrad J. Kulikowski, Andre B. Kurs, Alexander P. McCauley, Eric R. Giler, Katherine L. Hall, Gozde Guckaya
Emotion estimation apparatus using facial images of target individual, emotion estimation method, and non-transitory computer readable medium

Patent number: 10255487

Abstract: A speech determiner determines whether or not a target individual is speaking when facial images of the target individual are captured. An emotion estimator estimates the emotion of the target individual using the facial images of the target individual, on the basis of the determination results of the speech determiner.

Type: Grant

Filed: September 26, 2016

Date of Patent: April 9, 2019

Assignee: CASIO COMPUTER CO., LTD.

Inventors: Takashi Yamaya, Kouichi Nakagome, Katsuhiko Satoh
Tailoring beamforming techniques to environments

Patent number: 10249299

Abstract: Techniques for tailoring beamforming techniques to environments such that processing resources may be devoted to a portion of an audio signal corresponding to a lobe of a beampattern that is most likely to contain user speech. The techniques take into account both acoustic characteristics of an environment and heuristics regarding lobes that have previously been found to include user speech.

Type: Grant

Filed: May 1, 2017

Date of Patent: April 2, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Gregory Michael Hart, Kavitha Velusamy, William Spencer Worley, III
Sound reproduction device including auditory scenario simulation

Patent number: 10250960

Abstract: A sound reproduction device includes a signal processing chain configured to render an acoustic useful signal for reproduction to a listener, a simulation scenario processor configured to provide auditory scenario information for a simulated auditory scenario, the simulated auditory scenario influencing perception, by the listener, of the reproduction of the useful signal and/or defining a useful signal type, a user interface configured to detect reproduction parameter settings from a user which represent an individual preference of the listener in view of the simulated auditory scenario, a signal modifier configured to receive the reproduction parameter settings and modify reproduction of the useful signal in dependence on the reproduction parameter settings, and a storage provided for storing the reproduction parameter setting and the auditory scenario information relative to one another.

Type: Grant

Filed: May 12, 2016

Date of Patent: April 2, 2019

Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.

Inventors: Jens Ekkehart Appell, Jan Rennies-Hochmuth
Adaptive directional audio enhancement and selection

Patent number: 10250975

Abstract: A hands-free audio device has a receive path and a transmit path, which may operate at different audio sampling rates. The transmit path has an interference suppressor that receives a reference signal from the receive path and that suppresses interference in microphone signals received from a microphone array. The interference suppressor is followed in the transmit path by a multi-channel adaptive beamformer that produces a plurality of directional audio signals. A beam selector is configured to select one of the directional audio signals based on voice activity, echo detection, and signal energy.

Type: Grant

Filed: October 6, 2017

Date of Patent: April 2, 2019

Assignee: Amazon Technologies, Inc.

Inventor: Jun Yang
Speaker dependent voiced sound pattern detection thresholds

Patent number: 10242677

Abstract: Various implementations disclosed herein include a training module configured to determine a set of detection normalization threshold values associated with speaker dependent voiced sound pattern (VSP) detection. In some implementations, a method includes obtaining segment templates characterizing a concurrent segmentation of a first subset of a plurality of vocalization instances of a VSP, where each segment template provides a stochastic characterization of how a particular portion of the VSP is vocalized by a particular speaker; generating a noisy segment matrix using a second subset of the plurality of vocalization instances of the VSP, wherein the noisy segment matrix includes one or more noisy copies of segment representations of the second subset; scoring segments from the noisy segment matrix against the segment templates; and determining detection normalization threshold values at two or more known SNR levels for at least one particular noise type based on a function of the scoring.

Type: Grant

Filed: August 25, 2015

Date of Patent: March 26, 2019

Assignee: MALASPINA LABS (BARBADOS), INC.

Inventor: Alexander Escott
Determining customer service quality through digitized voice characteristic measurement and filtering

Patent number: 10244113

Abstract: Methods and apparatuses are described for determining customer service quality through digitized voice characteristic measurement and filtering. A voice analysis module captures a first digitized voice segment corresponding to speech submitted by a user of a remote device. The voice analysis module extracts a first set of voice features from the first voice segment, and determines an emotion level of the user based upon the first set of voice features. The voice analysis module captures a second digitized voice segment corresponding to speech submitted by the user. The voice analysis module extracts a second set of voice features from the second voice segment, and determines a change in the emotion level of the user by comparing the first set of voice features to the second set of voice features. The module normalizes the change in the emotion level of the user using emotion influence factors, and generates a service score.

Type: Grant

Filed: April 26, 2016

Date of Patent: March 26, 2019

Assignee: FMR LLC

Inventors: Jason Kao, Xinxin Sheng, Bahram Omidfar, Erkang Zheng
Media capture and process system

Patent number: 10235995

Abstract: A user media device may include a microphone array and a communication interface. The microphone array may include an omnidirectional microphone and a directional microphone. The microphone array may be selectively switchable. The communication interface may communicatively couple the user media device with a computer and may transmit audio captured by the microphone array to the computer for transfer to a remote service. The remote service may generate text of the processed audio via natural language processing. The remote service may further perform semantic reasoning of the processed audio via a semantic reasoning engine. The remote service may also generate content based at least in part on the semantic reasoning performed on the processed audio. The curated content may include a report having results of the semantic reasoning organized to demonstrate the results in a meaningful way with respect to the processed audio.

Type: Grant

Filed: December 22, 2016

Date of Patent: March 19, 2019

Assignee: FUJITSU LIMITED

Inventor: James Montantes
Contextual sound filter

Patent number: 10235128

Abstract: An embodiment of a contextual sound apparatus may include a sound identifier to identify a sound, a context identifier to identify a context, and an action identifier communicatively coupled to the sound identifier and the context identifier to identify an action based on the identified sound and the identified context. Other embodiments are disclosed and claimed.

Type: Grant

Filed: May 19, 2017

Date of Patent: March 19, 2019

Assignee: Intel Corporation

Inventors: Robert L. Vaughn, James B. Eynard
Voice input exception determining method, apparatus, terminal, and storage medium

Patent number: 10231070

Abstract: A voice input exception determining method, an apparatus, a terminal, and a storage medium are provided. The method is applied to an electronic device including an audio collection module, and includes determining whether an amplitude value of an audio signal collected by the audio collection module is less than a preset amplitude threshold and/or whether energy distribution of the audio signal meets a preset condition; and if the amplitude value of the audio signal is less than the preset amplitude threshold and/or the energy distribution of the audio signal does not meet the preset condition, determining that voice input of the electronic device is abnormal. A solution provided in the present disclosure resolves a problem that there is no effective method for determining a sound reception exception caused when a sound reception hole of the electronic device is blocked.

Type: Grant

Filed: April 29, 2016

Date of Patent: March 12, 2019

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Lin Yang, Zhaoyang Yin, Jingwen Yang
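
The check described in the abstract of patent 10231070 can be sketched as below. The abstract does not define the energy-distribution condition; here it is assumed, for illustration only, that a blocked sound reception hole removes high-frequency content, and the ratio of first-difference energy to total energy is used as a rough high-frequency measure. The thresholds and the function name are hypothetical.

```python
def is_voice_input_abnormal(samples, amp_threshold=0.05, min_high_ratio=0.1):
    """Flag input whose amplitude is too low or whose energy distribution
    fails the assumed condition (the "or" branch of the abstract's and/or)."""
    if not samples:
        return True
    peak = max(abs(s) for s in samples)
    total_energy = sum(s * s for s in samples)
    # Energy of the sample-to-sample difference grows with high-frequency content.
    diff_energy = sum((b - a) ** 2 for a, b in zip(samples, samples[1:]))
    high_ratio = diff_energy / total_energy if total_energy else 0.0
    return peak < amp_threshold or high_ratio < min_high_ratio
```
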
Methods and apparatus for speech segmentation using multiple metadata

Patent number: 10229686

Abstract: Methods and apparatus to process microphone signals by a speech enhancement module to generate an audio stream signal including first and second metadata for use by a speech recognition module. In an embodiment, speech recognition is performed using endpointing information including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata and transitioning to a speech state, in which speech recognition is performed, based upon the second metadata.

Type: Grant

Filed: August 18, 2014

Date of Patent: March 12, 2019

Assignee: NUANCE COMMUNICATIONS, INC.

Inventors: Markus Buck, Tobias Herbig, Simon Graf, Christophe Ris
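
The endpointing states named in the abstract of patent 10229686 (silence, maybe speech, speech) can be sketched as a small state machine. The abstract does not say what the first and second metadata contain; the sketch assumes each frame carries two boolean flags, and the names `State` and `segment` are hypothetical.

```python
from enum import Enum


class State(Enum):
    SILENCE = "silence"
    MAYBE_SPEECH = "maybe_speech"
    SPEECH = "speech"


def segment(frames):
    """frames yields (audio, first_flag, second_flag) tuples.

    Returns the audio frames that would be handed to the recognizer.
    """
    state = State.SILENCE
    buffered, recognized = [], []
    for audio, first_flag, second_flag in frames:
        if state is State.SPEECH:
            if first_flag:
                recognized.append(audio)
            else:
                state = State.SILENCE  # end of the utterance
            continue
        if not first_flag:
            state = State.SILENCE  # false alarm: drop the buffered frames
            buffered.clear()
            continue
        state = State.MAYBE_SPEECH
        buffered.append(audio)  # data is buffered in the maybe-speech state
        if second_flag:
            state = State.SPEECH
            recognized.extend(buffered)  # release the buffer to the recognizer
            buffered.clear()
    return recognized
```

Buffering in the intermediate state means the start of an utterance is not lost while the second, presumably more reliable, indicator is still pending.
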
Method for locating a sound source, and humanoid robot using such a method

Patent number: 10222447

Abstract: A method for locating a sound source by maximizing a directed response strength calculated for a plurality of vectors of the interauricular time differences forming a set comprises: a first subset of vectors compatible with sound signals from a single sound source at an unlimited distance from the microphones; and a second subset of vectors not compatible with sound signals from a single sound source at an unlimited distance from the microphones. Each vector of the first subset is associated with a direction for locating the corresponding single sound source, and each vector of the second subset is associated with the locating direction of a vector of the first subset closest thereto according to a predefined metric. A humanoid robot including a set of at least three microphones, arranged on a surface higher than the head thereof, and a processor for implementing one such method is also provided.

Type: Grant

Filed: September 29, 2014

Date of Patent: March 5, 2019

Assignee: SOFTBANK ROBOTICS EUROPE

Inventor: Grégory Rump
Electronic device and speech recognition method thereof

Patent number: 10217477

Abstract: An electronic device and a speech recognition method that is capable of adjusting an end-of-utterance detection period dynamically are disclosed. The electronic device includes a microphone, a display, an input device formed as a part of the display or connected to the electronic device as a separate device, a processor electrically connected to the microphone, the display, and the input device, and a memory electrically connected to the processor. The memory stores instructions, executable by the processor, for receiving an utterance input by a user through the microphone, converting the utterance to text comprised of a series of words or phrases with spaces, displaying the text on the display, the text comprising at least one space formed at an incorrect position, and receiving a user input for updating a predetermined time period through the input device.

Type: Grant

Filed: October 31, 2016

Date of Patent: February 26, 2019

Assignee: Samsung Electronics Co., Ltd.

Inventors: Sungwoon Jang, Sangwook Shin, Sungwan Youn
Controlling an audio system

Patent number: 10210857

Abstract: A method of controlling an audio system comprises: receiving an audio signal, and applying a first gain to the audio signal and outputting an amplified audio signal. On receiving a user input to increase the first gain applied to the audio signal, if the first gain is at a first threshold value, the method comprises: receiving an ambient noise signal, processing the ambient noise signal with a second gain value and outputting a noise cancellation signal, and changing the second gain value in response to the user input.

Type: Grant

Filed: October 18, 2017

Date of Patent: February 19, 2019

Assignee: Cirrus Logic, Inc.

Inventors: Nigel Burgess, Mark Allan Watts, Darren Holding
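
The control logic in the abstract of patent 10210857 can be sketched as follows. The abstract says only that the second gain value is changed in response to the user input once the first gain is at its threshold; the direction of the change (an increase), the step size, the limits, and the class name `AudioController` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AudioController:
    playback_gain: float = 0.5      # the "first gain" applied to the audio signal
    anc_gain: float = 0.0           # the "second gain" on the noise cancellation path
    max_playback_gain: float = 1.0  # the "first threshold value"
    max_anc_gain: float = 1.0       # assumed upper limit, not from the abstract
    step: float = 0.1               # assumed step per user input

    def volume_up(self):
        """Handle one user request to increase the volume."""
        if self.playback_gain < self.max_playback_gain:
            self.playback_gain = min(self.max_playback_gain,
                                     self.playback_gain + self.step)
        else:
            # Playback gain is already at its threshold, so the request
            # changes the noise cancellation gain instead.
            self.anc_gain = min(self.max_anc_gain, self.anc_gain + self.step)
```
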
Speech recognition using associative mapping

Patent number: 10204619

Abstract: Methods, systems, and apparatus are described that receive audio data for an utterance. Association data is accessed that indicates associations between data corresponding to uncorrupted audio segments, and data corresponding to corrupted versions of the uncorrupted audio segments, where the associations are determined before receiving the audio data for the utterance. Using the association data and the received audio data for the utterance, data corresponding to at least one uncorrupted audio segment is selected. A transcription of the utterance is determined based on the selected data corresponding to the at least one uncorrupted audio segment.

Type: Grant

Filed: February 22, 2016

Date of Patent: February 12, 2019

Assignee: Google LLC

Inventors: Olivier Siohan, Pedro J. Moreno Mengibar
Directional audio selection

Patent number: 10194259

Abstract: Various implementations include wearable audio devices and related methods for controlling such devices. In some particular implementations, a computer-implemented method of controlling a wearable audio device includes: receiving an initiation command to initiate a spatial audio mode; providing a plurality of audio samples corresponding with spatially delineated zones in an array defined relative to a physical position of the wearable audio device, in response to the initiation command, where each audio sample is associated with a source of audio content; receiving a selection command selecting one of the plurality of audio samples; and initiating playback of the source of audio content associated with the selected audio sample.

Type: Grant

Filed: February 28, 2018

Date of Patent: January 29, 2019

Assignee: BOSE CORPORATION

Inventors: Keith Dana Martin, Todd Richard Reily, Mark Raymond Blewett, Daniel M. Gauger, Jr.
System and method for customized voice response

Patent number: 10192547

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating an accent source. A system practicing the method collects data associated with customer specific services, generates country-specific or dialect-specific weights for each service in the customer specific services list, generates a summary weight based on an aggregation of the country-specific or dialect-specific weights, and sets an interactive voice response system language model based on the summary weight and the country-specific or dialect-specific weights. The interactive voice response system can also change the user interface based on the interactive voice response system language model. The interactive voice response system can tune a voice recognition algorithm based on the summary weight and the country-specific weights. The interactive voice response system can adjust phoneme matching in the language model based on a possibility that the speaker is using other languages.

Type: Grant

Filed: April 21, 2016

Date of Patent: January 29, 2019

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventor: Nicholas Duffield
System and method for characterizing an upper airway using speech characteristics

Patent number: 10182787

Abstract: The present invention relates to systems and methods for characterizing at least one anatomical parameter of an upper airway of a patient by analysing spectral properties of an utterance, comprising: a mechanical coupler comprising means for restricting the jaw position of the patient; means for recording an utterance; and processing means for determining at least one anatomical parameter of the upper airway from the recorded utterance and comparing the recorded utterance to a threshold value. In addition the present invention relates to the use of the above mentioned systems as a diagnostics tool for assessing obstructive sleep apnea.

Type: Grant

Filed: October 11, 2012

Date of Patent: January 22, 2019

Assignee: KONINKLIJKE PHILIPS N.V.

Inventors: Stijn De Waele, Stefan Winter, Alexander Cornelis Geerlings
Spoken utterance stop event other than pause or cessation in spoken utterances stream

Patent number: 10186263

Abstract: Speech recognition of a stream of spoken utterances is initiated. Thereafter, a spoken utterance stop event to stop the speech recognition is detected, such as in relation to the stream. The spoken utterance stop event is other than a pause or cessation in the stream of spoken utterances. In response to the spoken utterance stop event being detected, the speech recognition of the stream of spoken utterances is stopped, while the stream of spoken utterances continues. After the speech recognition of the stream of spoken utterances has been stopped, an action is caused to be performed that corresponds to the spoken utterances from a beginning of the stream through and until the spoken utterance stop event.

Type: Grant

Filed: August 30, 2016

Date of Patent: January 22, 2019

Assignee: Lenovo Enterprise Solutions (Singapore) PTE. LTD.

Inventors: Amy Leigh Rose, John Scott Crowe, Gary David Cudak, Jennifer J. Lee-Baron, Nathan J. Peterson, Bryan L. Young
Systems, methods, and media for generating sanitized data, sanitizing anomaly detection models, and/or generating sanitized anomaly detection models

Patent number: 10178113

Abstract: Systems, methods, and media for generating sanitized data, sanitizing anomaly detection models, and generating anomaly detection models are provided. In some embodiments, methods for sanitizing anomaly detection models are provided. The methods include: receiving at least one abnormal anomaly detection model from at least one remote location; comparing at least one of the at least one abnormal anomaly detection model to a local normal detection model to produce a common set of features common to both the at least one abnormal anomaly detection model and the local normal detection model; and generating a sanitized normal anomaly detection model by removing the common set of features from the local normal detection model.

Type: Grant

Filed: July 13, 2015

Date of Patent: January 8, 2019

Assignee: The Trustees of Columbia University in the City of New York

Inventors: Gabriela F. Ciocarlie, Angelos Stavrou, Salvatore J. Stolfo, Angelos D. Keromytis
In-call virtual assistants

Patent number: 10134395

Abstract: Techniques for providing virtual assistants to assist users during a voice communication between the users. For instance, a first user operating a device may establish a voice communication with respective devices of one or more additional users, such as with a device of a second user. For instance, the first user may utilize her device to place a telephone call to the device of the second user. A virtual assistant may also join the call and, upon invocation by a user on the call, may identify voice commands from the call and may perform corresponding tasks for the users in response.

Type: Grant

Filed: September 25, 2013

Date of Patent: November 20, 2018

Assignee: Amazon Technologies, Inc.

Inventor: Marcello Typrin
Power estimation system and method

Patent number: 10123221

Abstract: A communication device can be configured to estimate a Reference-Signal-Received-Power (RSRP). The communication device can include a transceiver and a controller. The transceiver can be configured to downsample a received signal having a plurality of reference signal resource elements to generate a downsampled signal. The downsampling can alias a first of the plurality of reference signal resource elements into a second of the plurality of reference signal resource elements to generate an aliased reference signal resource element. The transceiver can also be configured to extract the aliased reference signal resource element from the downsampled signal. The controller can be connected to the transceiver and be configured to estimate the RSRP based on the extracted aliased reference signal resource element.

Type: Grant

Filed: September 23, 2016

Date of Patent: November 6, 2018

Assignee: Intel IP Corporation

Inventors: Matthew Hayes, Denis Markovic, Viswanath Vajepeyazula
Video categorization method and apparatus, and storage medium

Patent number: 10115019

Abstract: A video may be categorized into a picture category or a video category. A key frame of the video includes a face and a face feature in the key frame is obtained. Face features respectively associated with a plurality of picture categories are acquired and the video is assigned to one of the picture categories based on a comparison of the key frame face feature and the face features of the picture categories. Videos may first be associated with a video category by comparing key frame face features from the videos, and then the video category may be assigned to a picture category based on comparison of a video category face feature with a plurality of picture category face features. Alternatively, a video may be assigned to a picture category based on matching capture times and capture locations between the video and a reference picture in the picture category.

Type: Grant

Filed: August 19, 2016

Date of Patent: October 30, 2018

Assignee: Xiaomi Inc.

Inventors: Zhijun Chen, Wendi Hou, Fei Long
Methods and apparatus for speech recognition using visual information

Patent number: 10109277

Abstract: Methods and apparatus for using visual information to facilitate a speech recognition process. The method comprises dividing received audio information into a plurality of audio frames, determining for each of the plurality of audio frames, whether the audio information in the audio frame comprises speech from the foreground speaker, wherein the determining is based, at least in part, on received visual information, and transmitting the audio frame to an automatic speech recognition (ASR) engine for speech recognition when it is determined that the audio frame comprises speech from the foreground speaker.

Type: Grant

Filed: April 27, 2015

Date of Patent: October 23, 2018

Assignee: Nuance Communications, Inc.

Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
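
The frame-gating idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the visual decision is reduced to a hypothetical per-frame callback `foreground_is_speaking`, and the function names and frame length are invented here, not taken from the patent.

```python
from typing import Callable, List, Sequence

def split_into_frames(samples: Sequence[int], frame_len: int) -> List[List[int]]:
    """Divide audio samples into consecutive frames; the last frame may be shorter."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    return [list(samples[i:i + frame_len]) for i in range(0, len(samples), frame_len)]

def frames_for_asr(
    samples: Sequence[int],
    frame_len: int,
    foreground_is_speaking: Callable[[int], bool],
) -> List[List[int]]:
    """Return only the frames that the (hypothetical) visual check attributes
    to the foreground speaker; these would be forwarded to the ASR engine."""
    frames = split_into_frames(samples, frame_len)
    return [frame for index, frame in enumerate(frames) if foreground_is_speaking(index)]

# Usage: the visual flags stand in for a lip-movement or face-tracking result.
visual_flags = [True, False, True]
selected = frames_for_asr(list(range(10)), 4, lambda index: visual_flags[index])
```

Keeping the visual decision behind a callback means the gating logic does not depend on how the visual information is obtained.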
Direction based end-pointing for speech recognition

Patent number: 10102850

Abstract: A speech recognition system utilizing automatic speech recognition techniques such as end-pointing techniques in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.

Type: Grant

Filed: February 25, 2013

Date of Patent: October 16, 2018

Assignee: AMAZON TECHNOLOGIES, INC.

Inventors: Kenneth John Basye, Jeffrey Penrod Adams
Analog voice activity detection

Patent number: 10090005

Abstract: According to some embodiments, an analog processing portion may receive an audio signal from a microphone. The analog processing portion may then convert the audio signal into sub-band signals and estimate an energy statistic value, such as a Signal-to-Noise Ratio (“SNR”) value, for each sub-band signal. A classification element may classify the estimated energy statistic values with analog processing such that a wakeup signal is generated when voice activity is detected. The wakeup signal may be associated with, for example, a battery-powered, always-listening audio application.

Type: Grant

Filed: March 10, 2017

Date of Patent: October 2, 2018

Assignee: ASPINITY, INC.

Inventors: Brandon David Rumberg, David W. Graham
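
The decision logic in the abstract above (a per-sub-band energy statistic followed by classification) can be sketched digitally. The patent performs this with analog circuitry; the code below is only a software analogue. The threshold values, the "at least two active bands" rule, and the function names are assumptions made for illustration.

```python
import math

def snr_db(band_energy: float, noise_floor: float, eps: float = 1e-12) -> float:
    """Ratio of a band's energy to its estimated noise floor, in decibels.
    eps guards against division by zero and log of zero."""
    return 10.0 * math.log10((band_energy + eps) / (noise_floor + eps))

def voice_detected(
    band_energies,
    noise_floors,
    snr_threshold_db: float = 6.0,   # assumed threshold
    min_active_bands: int = 2,       # assumed classification rule
) -> bool:
    """Return True (a wakeup decision) when enough sub-bands exceed the SNR threshold."""
    if len(band_energies) != len(noise_floors):
        raise ValueError("one noise-floor estimate is required per sub-band")
    active = sum(
        1
        for energy, floor in zip(band_energies, noise_floors)
        if snr_db(energy, floor) >= snr_threshold_db
    )
    return active >= min_active_bands

# Usage: three sub-bands, each with a noise floor of 1.0 (arbitrary units).
wake = voice_detected([10.0, 8.0, 1.0], [1.0, 1.0, 1.0])
stay_asleep = voice_detected([1.5, 1.2, 1.0], [1.0, 1.0, 1.0])
```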
Selective wakeup of digital sensing and processing systems using reconfigurable analog circuits

Patent number: 10088893

Abstract: According to some embodiments, a sensor network may be provided with re-programmable and/or reconfigurable analog circuitry configured to monitor data collected by the sensor network. The re-programmable and/or reconfigurable analog circuitry may also generate a wakeup signal in response to a defined wakeup event detected by the sensor network.

Type: Grant

Filed: May 12, 2017

Date of Patent: October 2, 2018

Inventors: Vinod Kulathumani, David W. Graham, Brandon David Rumberg
Low bit rate signal coder and decoder

Patent number: 10084475

Abstract: An improved mixed oscillator-and-external excitation model and methods for estimating the model parameters, for evaluating model quality, and for combining it with methods known in the art are disclosed. The improvement over existing oscillators allows the model to receive, as an input, all except the most recent point in the acquired data. Model stability is achieved through a process which includes restoring data unavailable to the decoder from the optimal model parameters and by using metrics to select a stable restored model output. The present invention is effective for very low bit-rate coding/compression and decoding/decompression of digital signals, including digitized speech, audio, and image data, and for analysis, detection, and classification of signals. Operations can be performed in real time, and parameterization can be achieved at a user-specified level of compression.

Type: Grant

Filed: October 28, 2011

Date of Patent: September 25, 2018

Inventors: Irina Gorodnitsky, Anton Yen
Voice control system, voice control method, and computer readable medium

Patent number: 10083710

Abstract: A voice control system including a voice receiving unit, an image capturing unit, a storage unit and a control unit is disclosed. The voice receiving unit receives a voice. The image capturing unit captures a video image stream including several human face images. The storage unit stores the voice and the video image stream. The control unit is electrically connected to the voice receiving unit, the image capturing unit and the storage unit. The control unit detects a feature of a human face from the human face images, defines a mouth motion detection region from the feature of the human face, and generates a control signal according to a variation of the mouth motion detection region and a variation of the voice over time. A voice control method, a computer program product and a computer readable medium are also disclosed.

Type: Grant

Filed: May 17, 2016

Date of Patent: September 25, 2018

Assignee: BXB Electronics Co., Ltd.

Inventors: Kai-Sheng Chiou, Chih-Lin Hung, Chung-Nan Lee, Chao-Wen Wu
Method and apparatus for identifying acoustic background environments based on time and speed to enhance automatic speech recognition

Patent number: 10083687

Abstract: Disclosed are systems, methods, and computer readable media for identifying an acoustic environment of a caller. The method embodiment comprises analyzing acoustic features of a received audio signal from a caller, receiving meta-data information based on a previously recorded time and speed of the caller, classifying a background environment of the caller based on the analyzed acoustic features and the meta-data, selecting an acoustic model matched to the classified background environment from a plurality of acoustic models, and performing speech recognition on the received audio signal using the selected acoustic model.

Type: Grant

Filed: October 16, 2017

Date of Patent: September 25, 2018

Assignee: NUANCE COMMUNICATIONS, INC.

Inventor: Mazin Gilbert
Method and device for presenting content

Patent number: 10078690

Abstract: A method is provided for triggering an action on a second device. It comprises the steps of obtaining audio of a multimedia content presented on a first device; comparing the obtained audio with reference audio data in a database; if the obtained audio is found in the database containing reference audio, determining an action corresponding to the matched reference audio; and triggering the action in the second device.

Type: Grant

Filed: December 31, 2011

Date of Patent: September 18, 2018

Assignee: Thomson Licensing DTV

Inventors: Jianfeng Chen, Xiaojun Ma, Zhigang Zhang
Voice-based communications

Patent number: 10074369

Abstract: Systems, methods, and devices for escalating voice-based interactions via speech-controlled devices are described. Speech-controlled devices capture audio, including wakeword portions and payload portions, for sending to a server to relay messages between speech-controlled devices. In response to determining the occurrence of an escalation event, such as repeated messages between the same two devices, the system may automatically change a mode of a speech-controlled device, such as no longer requiring a wakeword, no longer requiring an indication of a desired recipient, or automatically connecting two speech-controlled devices in a voice-chat mode. In response to determining the occurrence of further escalation events, the system may initiate a real-time call between the speech-controlled devices.

Type: Grant

Filed: September 1, 2016

Date of Patent: September 11, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Christo Frank Devaraj, Manish Kumar Dalmia, Tony Roy Hardie, Ran Mokady, Nick Ciubotariu, Sandra Lemon
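
The escalation behaviour in the abstract above can be sketched as a small counter-driven state machine. The mode names, the fixed "messages per step" rule, and the class `EscalationTracker` are assumptions made for illustration. The abstract names repeated messages between the same two devices as one example of an escalation event but does not give thresholds or timing.

```python
from collections import defaultdict
from enum import IntEnum

class Mode(IntEnum):
    """Hypothetical interaction modes, ordered from least to most escalated."""
    WAKEWORD_REQUIRED = 0
    NO_WAKEWORD = 1
    VOICE_CHAT = 2
    REAL_TIME_CALL = 3

class EscalationTracker:
    """Counts messages exchanged by each device pair and escalates the mode
    one step for every `messages_per_step` messages (an assumed rule)."""

    def __init__(self, messages_per_step: int = 3):
        if messages_per_step <= 0:
            raise ValueError("messages_per_step must be positive")
        self.messages_per_step = messages_per_step
        self._counts = defaultdict(int)

    def record_message(self, sender: str, recipient: str) -> Mode:
        # A frozenset makes the pair direction-independent: a->b and b->a
        # count towards the same conversation.
        pair = frozenset((sender, recipient))
        self._counts[pair] += 1
        step = self._counts[pair] // self.messages_per_step
        return Mode(min(step, int(Mode.REAL_TIME_CALL)))

# Usage: with two messages per step, the pair escalates on every second message.
tracker = EscalationTracker(messages_per_step=2)
modes = [tracker.record_message("kitchen", "office") for _ in range(3)]
modes += [tracker.record_message("office", "kitchen") for _ in range(5)]
```

A production system would presumably also expire old counts after a period of silence; that is omitted here to keep the sketch short.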
Providing an indication of the suitability of speech recognition

Patent number: 10074360

Abstract: This relates to providing an indication of the suitability of an acoustic environment for performing speech recognition. One process can include receiving an audio input and determining a speech recognition suitability based on the audio input. The speech recognition suitability can include a numerical, textual, graphical, or other representation of the suitability of an acoustic environment for performing speech recognition. The process can further include displaying a visual representation of the speech recognition suitability to indicate the likelihood that a spoken user input will be interpreted correctly. This allows a user to determine whether to proceed with the performance of a speech recognition process, or to move to a different location having a better acoustic environment before performing the speech recognition process.

Type: Grant

Filed: August 24, 2015

Date of Patent: September 11, 2018

Assignee: Apple Inc.

Inventor: Yoon Kim
Active noise cancelation with controllable levels

Patent number: 10049653

Abstract: A system including an automatic noise canceling (ANC) headphone and a processor. The ANC headphone has a microphone configured to generate a microphone signal and at least two non-zero ANC gain levels. The processor is configured to receive the microphone signal, determine a characteristic of the microphone signal, identify a revised ANC level from the ANC gain levels based on a comparison of the characteristic to at least one threshold, and output a signal corresponding to the revised ANC level. Methods are also disclosed.

Type: Grant

Filed: October 14, 2016

Date of Patent: August 14, 2018

Assignee: AVNERA CORPORATION

Inventors: Amit Kumar, Eric Sorensen
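
The threshold comparison in the abstract above can be sketched as a lookup over sorted thresholds. The choice of ambient noise level in dB as the measured characteristic, the example thresholds, and the gain values are assumptions for illustration only. The abstract says only that a characteristic of the microphone signal is compared to at least one threshold to pick among at least two non-zero ANC gain levels.

```python
from bisect import bisect_right

def select_anc_gain(characteristic, thresholds, gain_levels):
    """Pick the ANC gain level whose threshold interval contains `characteristic`.

    `thresholds` must be sorted in ascending order, and there must be exactly
    one more gain level than there are thresholds.
    """
    if len(gain_levels) != len(thresholds) + 1:
        raise ValueError("need exactly len(thresholds) + 1 gain levels")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be sorted in ascending order")
    if any(gain <= 0 for gain in gain_levels):
        raise ValueError("all ANC gain levels must be non-zero (positive)")
    return gain_levels[bisect_right(thresholds, characteristic)]

# Usage with assumed values: noise level in dB mapped to three gain levels.
THRESHOLDS_DB = [50.0, 70.0]
GAIN_LEVELS = [0.25, 0.5, 1.0]

quiet_room = select_anc_gain(45.0, THRESHOLDS_DB, GAIN_LEVELS)
office = select_anc_gain(60.0, THRESHOLDS_DB, GAIN_LEVELS)
train = select_anc_gain(70.0, THRESHOLDS_DB, GAIN_LEVELS)
```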
Speaker and call characteristic sensitive open voice search

Patent number: 10032454

Abstract: Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results.

Type: Grant

Filed: June 25, 2015

Date of Patent: July 24, 2018

Assignee: Nuance Communications, Inc.

Inventors: Shilei Zhang, Shenghua Bao, Wen Liu, Yong Qin, Zhiwei Shuang, Jian Chen, Zhong Su, Qin Shi, William F. Ganong, III
Arbitration between voice-enabled devices

Patent number: 10026399

Abstract: Architectures and techniques for selecting a voice-enabled device to handle audio input that is detected by multiple voice-enabled devices are described herein. In some instances, multiple voice-enabled devices may detect audio input from a user at substantially the same time, due to the voice-enabled devices being located within proximity to the user. The architectures and techniques may analyze a variety of audio signal metric values for the voice-enabled devices to designate a voice-enabled device to handle the audio input.

Type: Grant

Filed: September 11, 2015

Date of Patent: July 17, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Ramya Gopalan, Shiva Kumar Sundaram
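
The arbitration step in the abstract above can be sketched as choosing the device with the best audio signal metric. The abstract mentions "a variety of audio signal metric values" without naming them. The use of SNR as the single metric, the earliest-detection tie-break, and the names `DeviceReport` and `choose_device` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class DeviceReport:
    """One device's report for the same utterance (hypothetical structure)."""
    device_id: str
    snr_db: float        # assumed audio signal metric
    detected_at: float   # seconds; used only to break ties

def choose_device(reports: Iterable[DeviceReport]) -> str:
    """Return the id of the device that should handle the audio input:
    highest SNR wins, and the earlier detection wins on a tie."""
    candidates = list(reports)
    if not candidates:
        raise ValueError("at least one device report is required")
    best = max(candidates, key=lambda report: (report.snr_db, -report.detected_at))
    return best.device_id

# Usage: three devices hear the same request at nearly the same time.
winner = choose_device([
    DeviceReport("kitchen", 12.0, 0.02),
    DeviceReport("living-room", 18.5, 0.03),
    DeviceReport("hallway", 18.5, 0.01),
])
```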
Real-time audio source separation using deep neural networks

Patent number: 10014002

Abstract: Methods and systems for audio source separation in real-time are described. In an embodiment, the present disclosure describes reading and decoding an audio source into Pulse Code Modulation (PCM) samples, fragmenting the PCM samples into fragments, transforming fragments into spectrograms, performing audio source separation using a deep neural network (DNN) to generate an estimated magnitude spectrogram of the component(s) of the audio source, reconstructing the estimated time domain component signals, and streaming the component signals to a playback engine. In an embodiment, a semantic equalizer graphical user interface allows for real-time mixing of individual component signals.

Type: Grant

Filed: October 24, 2017

Date of Patent: July 3, 2018

Assignee: Red Pill VR, Inc.

Inventors: Alejandro Koretzky, Karthiek Reddy Bokka, Naveen Sasalu Rajashekharappa
Method and apparatus for separating speech data from background data in audio communication

Patent number: 9990936

Abstract: A method and an apparatus for separating speech data from background data in an audio communication are suggested. The method comprises: applying a speech model to the audio communication for separating the speech data from the background data of the audio communication; and updating the speech model as a function of the speech data and the background data during the audio communication.

Type: Grant

Filed: October 12, 2015

Date of Patent: June 5, 2018

Assignee: THOMSON Licensing

Inventors: Alexey Ozerov, Quang Khanh Ngoc Duong, Louis Chevallier
Automatic speech recognition using multi-dimensional models

Patent number: 9984683

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for automatic speech recognition using multi-dimensional models. In some implementations, audio data that describes an utterance is received. A transcription for the utterance is determined using an acoustic model that includes a neural network having first memory blocks for time information and second memory blocks for frequency information. The transcription for the utterance is provided as output of an automated speech recognizer.

Type: Grant

Filed: July 22, 2016

Date of Patent: May 29, 2018

Assignee: Google LLC

Inventors: Bo Li, Tara N. Sainath
Apparatus and method for improved signal fade out in different domains during error concealment

Patent number: 9978378

Abstract: An apparatus for decoding an audio signal is provided, having a receiving interface, configured to receive a first frame having a first audio signal portion of the audio signal, and configured to receive a second frame having a second audio signal portion of the audio signal; a noise level tracing unit, wherein the noise level tracing unit is configured to determine noise level information depending on at least one of the first audio signal portion and the second audio signal portion; a first reconstruction unit for reconstructing, in a first reconstruction domain, a third audio signal portion of the audio signal depending on the noise level information; a transform unit for transforming the noise level information to a second reconstruction domain; and a second reconstruction unit for reconstructing, in the second reconstruction domain, a fourth audio signal portion of the audio signal depending on the noise level information.

Type: Grant

Filed: December 21, 2015

Date of Patent: May 22, 2018

Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.

Inventors: Michael Schnabel, Goran Markovic, Ralph Sperschneider, Jérémie Lecomte, Christian Helmrich