Patents Examined by Stephen Brinich
  • Patent number: 10255503
    Abstract: There is disclosed a method for generating movie recommendations based on automatic extraction of features from multimedia content, wherein the extracted features are visual features representing mise-en-scène characteristics of the movie, defined on the basis of Applied Media Aesthetic theory; said extracted features are then fed to a content-based recommendation algorithm in order to generate personalized recommendations.
    Type: Grant
    Filed: September 27, 2016
    Date of Patent: April 9, 2019
    Assignee: POLITECNICO DI MILANO
    Inventors: Paolo Cremonesi, Mehdi Elahi, Yashar Deldjoo
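The ranking step the abstract describes (extracted visual features fed to a content-based recommender) might be sketched as below; the cosine-similarity measure, the feature encoding, and all names are illustrative assumptions, not details from the patent:

```python
import math

def cosine(a, b):
    # cosine similarity between two feature vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user_profile, catalog, k=2):
    # catalog maps movie id -> mise-en-scene feature vector
    # (e.g. average shot length, color variance); rank by similarity
    ranked = sorted(catalog, key=lambda m: cosine(user_profile, catalog[m]),
                    reverse=True)
    return ranked[:k]
```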
  • Patent number: 10229700
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting voice activity. In one aspect, a method includes actions of receiving, by a neural network included in an automated voice activity detection system, a raw audio waveform; processing, by the neural network, the raw audio waveform to determine whether the audio waveform includes speech; and providing, by the neural network, a classification of the raw audio waveform indicating whether the raw audio waveform includes speech.
    Type: Grant
    Filed: January 4, 2016
    Date of Patent: March 12, 2019
    Assignee: Google LLC
    Inventors: Tara N. Sainath, Gabor Simko, Maria Carolina Parada San Martin, Ruben Zazo Candil
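The patent's classifier is a neural network operating on raw waveforms; as a crude non-neural stand-in for the same speech/non-speech decision, an RMS energy gate can illustrate the task (the frame length and threshold are assumptions):

```python
import math

def rms_vad(waveform, frame_len=160, threshold=0.01):
    # stand-in for the neural classifier: flag speech if any
    # frame's RMS amplitude exceeds a fixed threshold
    for i in range(0, len(waveform) - frame_len + 1, frame_len):
        frame = waveform[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms > threshold:
            return True
    return False
```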
  • Patent number: 10186255
    Abstract: A method for generating a language model for an organization includes: receiving, by a processor, organization-specific training data; receiving, by the processor, generic training data; computing, by the processor, a plurality of similarities between the generic training data and the organization-specific training data; assigning, by the processor, a plurality of weights to the generic training data in accordance with the computed similarities; combining, by the processor, the generic training data with the organization-specific training data in accordance with the weights to generate customized training data; training, by the processor, a customized language model using the customized training data; and outputting, by the processor, the customized language model, the customized language model being configured to compute the likelihood of phrases in a medium.
    Type: Grant
    Filed: August 25, 2016
    Date of Patent: January 22, 2019
    Inventors: Tamir Tapuhi, Amir Lev-Tov, Avraham Faizakof, Yochai Konig
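The similarity-weighted combination of generic and organization-specific training data might look like the following sketch; the Jaccard similarity and unigram counts are illustrative stand-ins for whatever measures the patent actually uses:

```python
from collections import Counter

def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def customized_counts(org_sents, generic_sents):
    # organization sentences count fully; each generic sentence is
    # weighted by its best similarity to the organization-specific data
    counts = Counter()
    for s in org_sents:
        for w in s.split():
            counts[w] += 1.0
    for s in generic_sents:
        weight = max(jaccard(s, o) for o in org_sents)
        for w in s.split():
            counts[w] += weight
    return counts
```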
  • Patent number: 10186282
    Abstract: Systems and processes for robust end-pointing of speech signals using speaker recognition are provided. In one example process, a stream of audio having a spoken user request can be received. A first likelihood that the stream of audio includes user speech can be determined. A second likelihood that the stream of audio includes user speech spoken by an authorized user can be determined. A start-point or an end-point of the spoken user request can be determined based at least in part on the first likelihood and the second likelihood.
    Type: Grant
    Filed: April 30, 2015
    Date of Patent: January 22, 2019
    Assignee: Apple Inc.
    Inventors: Devang K. Naik, Sachin Kajarekar
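Combining the two per-frame likelihoods to locate a start-point and end-point can be sketched minimally; the product combination and the threshold are assumptions, not the patented rule:

```python
def endpoint(p_speech, p_authorized, threshold=0.5):
    # combine per-frame speech and authorized-speaker likelihoods and
    # return the (start, end) frame indices of the spoken request
    combined = [a * b for a, b in zip(p_speech, p_authorized)]
    above = [i for i, c in enumerate(combined) if c >= threshold]
    return (above[0], above[-1]) if above else None
```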
  • Patent number: 10170103
    Abstract: A method, a system, and a computer program product are provided for discriminatively training a feature-space transform. The method includes performing feature-space discriminative training (f-DT) on an initialized feature-space transform, using manually transcribed data, to obtain a pre-stage trained feature-space transform. The method further includes performing f-DT on the pre-stage trained feature-space transform as a newly initialized feature-space transform, using automatically transcribed data, to obtain a main-stage trained feature-space transform. The method additionally includes performing f-DT on the main-stage trained feature-space transform as a newly initialized feature-space transform, using manually transcribed data, to obtain a post-stage trained feature-space transform.
    Type: Grant
    Filed: January 22, 2016
    Date of Patent: January 1, 2019
    Assignee: International Business Machines Corporation
    Inventor: Takashi Fukuda
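The three-stage structure (manual, then automatic, then manual data again) is the recognizable part of the abstract; the update rule below is a scalar placeholder standing in for feature-space discriminative training, shown only to make the staging concrete:

```python
def fdt_stage(transform, data, lr=0.5):
    # placeholder update standing in for f-DT: nudge the (scalar)
    # transform toward the mean of the training data
    mean = sum(data) / len(data)
    return transform + lr * (mean - transform)

def three_stage_fdt(init, manual_data, auto_data):
    pre = fdt_stage(init, manual_data)   # pre-stage: manually transcribed data
    main = fdt_stage(pre, auto_data)     # main-stage: automatically transcribed data
    post = fdt_stage(main, manual_data)  # post-stage: manual data again
    return post
```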
  • Patent number: 10127922
    Abstract: A sound source identification apparatus includes a sound collection unit including a plurality of microphones, a sound source localization unit configured to localize a sound source on the basis of an acoustic signal collected by the sound collection unit, a sound source separation unit configured to perform separation of the sound source on the basis of the signal localized by the sound source localization unit, and a sound source identification unit configured to identify a type of sound source on the basis of a result of the separation in the sound source separation unit, wherein a signal input to the sound source identification unit has a magnitude equal to or greater than a first threshold value, which is a predetermined value.
    Type: Grant
    Filed: August 3, 2016
    Date of Patent: November 13, 2018
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Kazuhiro Nakadai, Satoshi Uemura
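The threshold gate in front of the identification stage can be sketched as follows; the peak-magnitude test and the `classify` callback are assumptions used for illustration:

```python
def identify_sources(separated_signals, classify, threshold=0.2):
    # only separated signals whose peak magnitude reaches the first
    # (predetermined) threshold are passed on to the identification stage
    results = []
    for sig in separated_signals:
        if max(abs(s) for s in sig) >= threshold:
            results.append(classify(sig))
    return results
```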
  • Patent number: 10114817
    Abstract: Techniques for identifying multilingual cognates and using the multilingual cognates are provided. In one technique, multilingual cognates identified from multiple user profiles are used to train one or more translation models. In another technique, multilingual cognates identified from a single user's profile are used to translate text provided by that user. In another technique, multilingual cognates from a single user are used to align sentences in one language to sentences in another language, and the aligned sentences are used to train a language model. In another technique, multilingual cognates identified from multiple user profiles are used to expand search queries. In another technique, multilingual cognates identified from multiple user profiles are used to translate other users' profiles into a target language so that users associated with a source language can view those profiles.
    Type: Grant
    Filed: August 6, 2015
    Date of Patent: October 30, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bing Zhao, Kin Kan
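One plausible way to surface cognate pairs across the two languages of a profile is near-identical spelling; the Levenshtein-distance criterion below is an illustrative assumption, not the patent's method:

```python
def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def find_cognates(terms_l1, terms_l2, max_dist=1):
    # pair near-identical terms across the two languages of one profile
    return [(x, y) for x in terms_l1 for y in terms_l2
            if edit_distance(x.lower(), y.lower()) <= max_dist]
```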
  • Patent number: 10098604
    Abstract: A medical image processing device includes a port, a processor and a display. The port acquires a plurality of image data from a living body. The processor classifies the plurality of image data to generate a plurality of image groups based on a first time component. The first time component is defined by a first time interval among imaging times at which the plurality of image data are generated. The processor correlates each image data in one image group with each image data in another image group, based on both an actual time and a time ratio of a second time component. The second time component is defined by a second time interval among the imaging times that is shorter than the first time interval. The display displays images based on the plurality of image data according to the correlation of the image data in the image groups.
    Type: Grant
    Filed: September 27, 2016
    Date of Patent: October 16, 2018
    Assignee: ZIOSOFT, INC.
    Inventors: Kenichiro Yasuhiro, Shinichiro Seo
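The grouping step (splitting image data into groups wherever the longer, first time interval separates consecutive imaging times) can be sketched as:

```python
def group_by_interval(imaging_times, first_interval):
    # start a new image group whenever the gap between consecutive
    # imaging times reaches the first (longer) time interval
    groups, current = [], [imaging_times[0]]
    for t in imaging_times[1:]:
        if t - current[-1] >= first_interval:
            groups.append(current)
            current = [t]
        else:
            current.append(t)
    groups.append(current)
    return groups
```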
  • Patent number: 10078690
    Abstract: A method for triggering an action on a second device is provided. It comprises the steps of: obtaining audio of multimedia content presented on a first device; comparing the obtained audio with reference audio data in a database; if the obtained audio is found in the database of reference audio, determining an action corresponding to the matched reference audio; and triggering the action on the second device.
    Type: Grant
    Filed: December 31, 2011
    Date of Patent: September 18, 2018
    Assignee: Thomson Licensing DTV
    Inventors: Jianfeng Chen, Xiaojun Ma, Zhigang Zhang
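The match-and-trigger flow can be sketched end to end; the toy energy fingerprint and the dictionary lookup stand in for whatever audio matching the patent actually uses:

```python
def fingerprint(samples, bins=4):
    # toy fingerprint: coarse per-chunk energy signature, rounded
    chunk = max(len(samples) // bins, 1)
    return tuple(round(sum(abs(s) for s in samples[i:i + chunk]), 1)
                 for i in range(0, chunk * bins, chunk))

def trigger(samples, reference_db, second_device):
    # match captured audio against the reference database; on a hit,
    # run the corresponding action on the second device
    action = reference_db.get(fingerprint(samples))
    if action is not None:
        second_device(action)
    return action
```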
  • Patent number: 10073828
    Abstract: Technology is described for refining a language model for a language recognition system based on aggregating and analyzing word tag metadata from multiple users of the language. The technology allows a user to mark a word or phrase in a selected language (e.g., as offensive or misspelled, or as a part of speech or other category), combines information collected from multiple users of the selected language, and updates the user's language model based on the combined information from multiple users of the selected language.
    Type: Grant
    Filed: February 27, 2015
    Date of Patent: September 11, 2018
    Assignee: NUANCE COMMUNICATIONS, INC.
    Inventors: Christopher Breske, Ethan Bradford, David Field, Wendy Bannister
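Aggregating word tags across users before updating a shared language model might look like this; the quorum rule and data shapes are illustrative assumptions:

```python
from collections import Counter, defaultdict

def aggregate_tags(reports, quorum=2):
    # reports: (word, tag) pairs collected from many users of a language;
    # a tag is adopted into the shared model once enough users agree
    counts = defaultdict(Counter)
    for word, tag in reports:
        counts[word][tag] += 1
    adopted = {}
    for word, tags in counts.items():
        tag, n = tags.most_common(1)[0]
        if n >= quorum:
            adopted[word] = tag
    return adopted
```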
  • Patent number: 10074380
    Abstract: A method for performing speech enhancement using a Deep Neural Network (DNN) starts with training the DNN offline by exciting a microphone with a target training signal that includes a signal approximation of clean speech. A loudspeaker is driven with a reference signal and outputs a loudspeaker signal. The microphone then generates a microphone signal based on at least one of: a near-end speaker signal, an ambient noise signal, or the loudspeaker signal. An acoustic echo canceller (AEC) generates an AEC echo-cancelled signal based on the reference signal and the microphone signal. A loudspeaker signal estimator generates an estimated loudspeaker signal based on the microphone signal and the AEC echo-cancelled signal. The DNN receives the microphone signal, the reference signal, the AEC echo-cancelled signal, and the estimated loudspeaker signal, and generates a speech reference signal that includes signal statistics for residual echo or for noise.
    Type: Grant
    Filed: August 3, 2016
    Date of Patent: September 11, 2018
    Assignee: Apple Inc.
    Inventors: Jason Wung, Ramin Pishehvar, Daniele Giacobello, Joshua D. Atkins
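The abstract's signal flow into the DNN can be sketched on a per-sample basis; treating the AEC residual (microphone minus AEC output) as the loudspeaker estimate is a simplifying assumption for illustration:

```python
def dnn_inputs(mic, ref, aec_out):
    # the component the AEC removed from the microphone signal
    # approximates the loudspeaker contribution; the DNN is fed
    # all four signals per sample
    est_loudspeaker = [m - a for m, a in zip(mic, aec_out)]
    return list(zip(mic, ref, aec_out, est_loudspeaker))
```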
  • Patent number: 10062394
    Abstract: A system encourages experimentation with audio frequency and speaker technologies while causing an inanimate object to appear to lip-sync. The system applies a bandpass filter to an incoming audio stream to determine a magnitude of audio content in a frequency band of interest. For example, the system may filter results directed at the voice band, associated with speech. A controller controls a strobe light to flash at a particular point of travel of a platform reciprocating at a known frequency. An illusion is created that a sculpture, such as a piece of paper formed into a ring, is lip-synching to music.
    Type: Grant
    Filed: March 31, 2015
    Date of Patent: August 28, 2018
    Assignee: BOSE CORPORATION
    Inventor: Lee Zamir
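Measuring the magnitude of audio content in the voice band can be illustrated with a naive DFT sum over in-band bins; a real system would use an analog or IIR bandpass filter, and the band edges here are assumptions:

```python
import cmath, math

def band_magnitude(samples, rate, lo=300.0, hi=3400.0):
    # sum DFT bin magnitudes whose frequencies fall in the voice band
    n = len(samples)
    total = 0.0
    for k in range(1, n // 2):
        f = k * rate / n
        if lo <= f <= hi:
            coeff = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                        for i in range(n))
            total += abs(coeff)
    return total
```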
  • Patent number: 10056076
    Abstract: According to some embodiments of the present invention there is provided a computerized method for speech processing using a Gaussian Mixture Model. The method comprises the action of receiving, by hardware processor(s), two or more covariance values representing relationships between distributions of speech coefficient values that represent two or more audible input speech signals recorded by a microphone. The method comprises the actions of computing two or more eigenvectors and eigenvalues using a principal component analysis of the covariance values, transforming the speech coefficient values using the eigenvectors, and computing two or more second covariance values from the transformed speech coefficient values. The method comprises the action of modifying some of the second covariance values according to the eigenvalues, the covariance values, and two or more indices of the speech coefficient values. The second covariance values are then output to the speech processor comprising the Gaussian Mixture Model.
    Type: Grant
    Filed: September 6, 2015
    Date of Patent: August 21, 2018
    Assignee: International Business Machines Corporation
    Inventor: Hagai Aronowitz
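The PCA step on the covariance values can be made concrete for the 2x2 case, where eigenvalues and eigenvectors have a closed form; this is a minimal sketch of the decomposition and projection, not the patented modification rule:

```python
import math

def pca_2x2(a, b, c):
    # eigenvalues and leading eigenvector of the symmetric
    # covariance matrix [[a, b], [b, c]] (closed form for 2x2)
    tr, det = a + c, a * c - b * b
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    l1, l2 = tr / 2.0 + disc, tr / 2.0 - disc
    if abs(b) > 1e-12:
        v1 = (b, l1 - a)          # (A - l1*I) v = 0  =>  v = (b, l1 - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    n = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / n, v1[1] / n)
    return (l1, l2), v1

def project(coeffs, v1):
    # transform a 2-D speech-coefficient vector onto the leading component
    return coeffs[0] * v1[0] + coeffs[1] * v1[1]
```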
  • Patent number: 10019992
    Abstract: A device includes a plurality of components, a memory having a keyword recognition module and a context recognition module, a microphone configured to receive an input speech spoken by a user, an analog-to-digital converter configured to convert the input speech from an analog form to a digital form and generate a digitized speech, and a processor. The processor is configured to detect, using the keyword recognition module, a keyword in the digitized speech; initiate, in response to detecting the keyword by the keyword recognition module, an action to be taken by one of the plurality of components, wherein the keyword is associated with the action; determine, using the context recognition module, a context for the keyword; and execute the action if the context determined by the context recognition module indicates that the keyword is a command.
    Type: Grant
    Filed: June 29, 2015
    Date of Patent: July 10, 2018
    Assignee: Disney Enterprises, Inc.
    Inventors: Jill Fain Lehman, Samer Al Moubayed
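The two-stage flow (spot a keyword, then gate execution on its context) can be sketched as follows; the window of surrounding words and the `is_command` callback are illustrative assumptions:

```python
def handle_utterance(text, keyword_actions, is_command):
    # spot a keyword, then let the context recognizer decide
    # whether it was actually meant as a command
    words = text.lower().split()
    for i, w in enumerate(words):
        if w in keyword_actions:
            context = words[max(0, i - 3):i + 4]
            if is_command(context):
                return keyword_actions[w]
    return None
```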
  • Patent number: 10007478
    Abstract: A control system for a vehicle having a seat with a first moveable portion and an adjustment actuator coupled with the first moveable seat portion includes a voice input device, a touchscreen input device, and a controller in communication with the adjustment actuator, the voice input device, and the touchscreen input device. The controller has a processor programmed to interpret a first adjustment command from one of a voice command received from the voice input device and a manual command received from the touchscreen input device, carry out a first seat adjustment by causing the adjustment actuator to move the first moveable seat portion according to the first adjustment command, and to present information related to the first adjustment command on the touchscreen.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: June 26, 2018
    Assignee: Ford Global Technologies, LLC
    Inventors: Tony Wang, Frank Wu, Rose Tong
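The controller's behavior (accept a command from either input device, drive the actuator, and echo the result on the touchscreen) might be sketched like this, with both devices injected as callbacks; all names are assumptions:

```python
class SeatController:
    # accepts the same adjustment command from either the voice input
    # or the touchscreen, drives the actuator, and reports the result
    def __init__(self, actuator, display):
        self.actuator = actuator
        self.display = display

    def handle(self, seat_portion, amount):
        self.actuator(seat_portion, amount)
        self.display(f"{seat_portion} moved by {amount}")
```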
  • Patent number: 9990741
    Abstract: Motion correction is performed in time-of-flight (TOF) positron emission tomography (PET). Rather than applying motion correction to reconstructed images or as part of reconstruction, the motion correction is applied in the projection domain of the PET data: the TOF data from the PET scan is altered to account for the motion before reconstruction starts. The motion in the patient or image domain is forward projected to provide motion in the projection domain of the TOF data. The projected motion of each phase is applied to the TOF data from that phase, creating a combined dataset of motion-corrected TOF data representing the patient at a reference phase. The combined dataset is larger than what is available at any one phase of the physiological cycle (similar in size along the projection-data dimensions, but denser, with more counts per projection-data unit) and is then used in reconstruction.
    Type: Grant
    Filed: September 19, 2016
    Date of Patent: June 5, 2018
    Assignee: Siemens Medical Solutions USA, Inc.
    Inventor: Vladimir Y. Panin
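The pooling step (shift each phase's projection-domain events to the reference phase, then combine them into one denser dataset) reduces to something like the following sketch, where a single scalar shift per phase stands in for the forward-projected motion:

```python
def combine_phases(phase_events, phase_shifts):
    # phase_events: {phase: list of projection-domain TOF positions};
    # shift each phase's events to the reference phase, then pool them
    combined = []
    for phase, events in phase_events.items():
        shift = phase_shifts[phase]
        combined.extend(e + shift for e in events)
    return sorted(combined)
```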
  • Patent number: 9978370
    Abstract: One embodiment provides a method, including: receiving, from an audio capture device, speech input; converting, using a processor, the speech input to machine text; receiving, from an alternate input source, an input comprising at least one character; identifying, using a processor, a location associated with the machine text to insert the at least one character; and inserting, using a processor, the at least one character at the location identified. Other aspects are described and claimed.
    Type: Grant
    Filed: July 31, 2015
    Date of Patent: May 22, 2018
    Assignee: Lenovo (Singapore) Pte. Ltd.
    Inventors: Song Wang, Jianbang Zhang, Ming Qian, Jian Li
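The insertion step (identify a location in the dictated machine text, then splice in the character from the alternate input source) can be sketched with the location logic injected as a callback; the `locate` function is an assumption:

```python
def insert_character(machine_text, char, locate):
    # locate() identifies where in the dictated text the character
    # from the alternate input source belongs
    i = locate(machine_text)
    return machine_text[:i] + char + machine_text[i:]
```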
  • Patent number: 9959863
    Abstract: A method, which is performed by an electronic device, for obtaining a speaker-independent keyword model of a keyword designated by a user is disclosed. The method may include receiving at least one sample sound from the user indicative of the keyword. The method may also generate a speaker-dependent keyword model for the keyword based on the at least one sample sound, send a request for the speaker-independent keyword model of the keyword to a server in response to generating the speaker-dependent keyword model, and receive the speaker-independent keyword model adapted for detecting the keyword spoken by a plurality of users from the server.
    Type: Grant
    Filed: September 8, 2014
    Date of Patent: May 1, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Minsub Lee, Taesu Kim, Sungrack Yun
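The device-side protocol (train a speaker-dependent model locally, then request the speaker-independent model from a server) can be sketched with the two backends injected as callbacks; all names are assumptions:

```python
def obtain_models(keyword, sample_sounds, train_local, fetch_remote):
    # train a speaker-dependent model on-device from the user's samples,
    # then request the matching speaker-independent model from a server
    sd_model = train_local(keyword, sample_sounds)
    si_model = fetch_remote(keyword)
    return sd_model, si_model
```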
  • Patent number: 9916846
    Abstract: A system and method for determining an amount of speech in an audio signal may include for example: obtaining segments of the audio signal, wherein the segments are grouped into blocks; for each one of the segments, calculating a segment value indicative of an amplitude of the audio signal of a respective segment; for each one of the blocks calculating a block value indicative of the amplitude of the audio signal of a respective block; and calculating an audio signal speech grade based on segment values and block values, wherein the audio signal speech grade is indicative of the amount of speech in the audio signal.
    Type: Grant
    Filed: February 10, 2015
    Date of Patent: March 13, 2018
    Assignee: NICE LTD.
    Inventors: Frits Lassche, Ivar Meijer, Victor Bastiaan Mosch, Steven St. John Logan, Jurgen Willem Wessel, Gerardus B. J. Stam
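The segment/block structure of the speech grade can be sketched end to end; using mean absolute amplitude for the values and "fraction of segments louder than their block average" for the grade are illustrative assumptions:

```python
def speech_grade(signal, seg_len=4, segs_per_block=2):
    # segment value: mean absolute amplitude per segment; block value:
    # mean of that block's segment values; grade: fraction of segments
    # louder than their block average (speech fluctuates, steady noise doesn't)
    segs = [signal[i:i + seg_len] for i in range(0, len(signal), seg_len)]
    seg_vals = [sum(abs(s) for s in seg) / len(seg) for seg in segs]
    block_vals = []
    for i in range(0, len(seg_vals), segs_per_block):
        block = seg_vals[i:i + segs_per_block]
        block_vals.append(sum(block) / len(block))
    hits = sum(1 for i, v in enumerate(seg_vals)
               if v > block_vals[i // segs_per_block])
    return hits / len(seg_vals)
```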
  • Patent number: 9911416
    Abstract: A method for controlling an electronic device in response to speech spoken by a user is disclosed. The method may include receiving an input sound by a sound sensor. The method may also detect the speech spoken by the user in the input sound, determine first characteristics of a first frequency range and second characteristics of a second frequency range of the speech in response to detecting the speech in the input sound, and determine whether a direction of departure of the speech spoken by the user is toward the electronic device based on the first and second characteristics.
    Type: Grant
    Filed: March 27, 2015
    Date of Patent: March 6, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Sungrack Yun, Taesu Kim, Duck Hoon Kim, Kyuwoong Hwang
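One way to read the two-band comparison the abstract describes: speech aimed at the device retains relatively more high-frequency energy, while off-axis speech loses it faster. The energy-ratio test below is an assumption used to illustrate that idea:

```python
def facing_device(low_band_energy, high_band_energy, ratio=0.5):
    # compare high-band to low-band energy; a high ratio suggests the
    # direction of departure of the speech is toward the device
    if low_band_energy <= 0:
        return False
    return high_band_energy / low_band_energy >= ratio
```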