Abstract: Generally the present disclosure is directed to appliances that provide a user-specific response to a received voice command. In particular, the appliance can store a plurality of voice samples respectively associated with a plurality of users. The appliance can also store one or more preferences for each of the plurality of users. For example, the preferences can be input by the user and/or learned or inferred over time. When the appliance receives a human speech signal or voice command, it can match the received speech signal against one or more of the plurality of voice samples to identify the user. The preferences stored and associated with the identified user can then be obtained and the appliance can perform any requested operations in accordance with the obtained preferences. In such fashion, the appliance can provide a user-specific response to a received voice command.
Type:
Grant
Filed:
October 30, 2013
Date of Patent:
August 2, 2016
Assignee:
Haier US Appliance Solutions, Inc.
Inventors:
William Everette Gardner, Joel Erik Hitzelberger, Keith Wesley Wait
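As an illustrative sketch (not the patented implementation), matching a received speech signal against stored voice samples and retrieving the identified user's preferences can be modeled with per-user feature vectors compared by cosine similarity. The user names, voiceprint representation, and preference keys below are invented for the example.

```python
import numpy as np

# Hypothetical per-user voiceprints (feature vectors) and stored preferences.
VOICEPRINTS = {
    "alice": np.array([0.9, 0.1, 0.3]),
    "bob":   np.array([0.2, 0.8, 0.5]),
}
PREFERENCES = {
    "alice": {"oven_temp_unit": "F"},
    "bob":   {"oven_temp_unit": "C"},
}

def identify_user(sample: np.ndarray) -> str:
    """Match a speech-feature vector against the stored voiceprints
    by cosine similarity and return the best-matching user."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(VOICEPRINTS, key=lambda u: cos(sample, VOICEPRINTS[u]))

def preferences_for(sample: np.ndarray) -> dict:
    """Look up the preferences associated with the identified user."""
    return PREFERENCES[identify_user(sample)]
```

A real appliance would derive the feature vectors from a speaker-recognition front end rather than storing raw samples; the lookup structure is the point here.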
Abstract: A segment information generation device includes a waveform cutout unit that cuts out a speech waveform from natural speech at a time period that does not depend on the pitch frequency of the natural speech. A feature parameter extraction unit extracts a feature parameter from the speech waveform cut out by the waveform cutout unit. A time domain waveform generation unit generates a time domain waveform based on the feature parameter.
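A minimal sketch of the pitch-independent cutout and feature-extraction steps, assuming fixed-length frames at a fixed hop (the frame length, hop, and log-energy feature are illustrative choices, not the patent's):

```python
import numpy as np

def cut_frames(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Cut out waveform frames at a fixed period (hop) that does not
    depend on the pitch frequency of the input speech."""
    n = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n)])

def frame_energy(frames: np.ndarray) -> np.ndarray:
    """A minimal feature parameter per frame: log energy."""
    return np.log(np.sum(frames ** 2, axis=1) + 1e-12)
```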
Abstract: A method of providing access to or delivering textual content on the internet or in databases, the method comprising the steps of: inputting, at a user device, a search query or request for a web page or database results, identifying the content to be delivered to the user; identifying one or more web pages or offline database results corresponding to the query or request; parsing a first identified web page or database result to extract textual content from it; inputting some or all of the extracted textual content to a text-to-speech synthesiser to generate an audio output; further inputting some or all of the extracted textual content to an animation unit configured to synchronise the generated audio output with one or more predetermined animation sequences to provide the visual output of an animated figure delivering the audio output; and displaying, at the user device, the visual output of the animated figure reading the extracted textual content.
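The parsing step, extracting textual content from an identified web page before it is fed to the synthesiser, can be sketched with Python's standard-library HTML parser (the class and its skip list are illustrative):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible textual content of a page, skipping
    script and style blocks."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    """Parse a web page and return its extracted textual content."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```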
Abstract: The application provides a method and system for determinism in non-linear systems for speech processing, particularly automatic speech segmentation for building speech recognition systems. More particularly, the application enables a method and system for detecting boundaries of coarticulated units from isolated speech using a recurrence plot.
Type:
Grant
Filed:
July 18, 2012
Date of Patent:
July 5, 2016
Assignee:
Tata Consultancy Services Limited
Inventors:
Mohd Bilal Arif Syed, Arijit Sinharay, Tanushyam Chattopadhyay
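A recurrence plot marks the pairs of time indices at which a signal revisits nearly the same state; boundaries between units show up as changes in its block structure. A simplified scalar-signal version (real systems embed the signal in a delay space first):

```python
import numpy as np

def recurrence_plot(x: np.ndarray, eps: float) -> np.ndarray:
    """Binary recurrence matrix: R[i, j] = 1 when |x[i] - x[j]| < eps."""
    d = np.abs(x[:, None] - x[None, :])
    return (d < eps).astype(int)
```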
Abstract: Provided are an encoding apparatus and a decoding apparatus of a multi-channel signal. The encoding apparatus of the multi-channel signal may process a phase parameter associated with phase information between a plurality of channels constituting the multi-channel signal, based on a characteristic of the multi-channel signal. The encoding apparatus may generate an encoded bitstream with respect to the multi-channel signal using the processed phase parameter and a mono signal extracted from the multi-channel signal.
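As a hedged illustration of the two quantities the abstract names, a per-bin inter-channel phase parameter and a mono downmix can be computed as follows for a two-channel signal (the specific parameterization and the 0.5 downmix weight are assumptions, not the patent's encoding):

```python
import numpy as np

def interchannel_phase(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Per-frequency-bin phase difference between two channels, the
    kind of phase parameter a multi-channel encoder might transmit."""
    L, R = np.fft.rfft(left), np.fft.rfft(right)
    return np.angle(L * np.conj(R))

def downmix_mono(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Mono signal extracted from the two-channel input."""
    return 0.5 * (left + right)
```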
Abstract: A system and method for determining whether a textual work submitted for publishing is machine generated or non-machine generated by identifying and quantifying various aspects of the textual work and comparing those aspects to known works. For example, the system and method may identify aspects of a textual work including a relationship between the sentences within the textual work, a writing style of the author of the textual work, a grammatical structure of the sentences within the textual work, a quality of the textual work, and other aspects of the textual work. Upon determining that the textual work is machine generated, the textual work may be rejected for publishing.
Type:
Grant
Filed:
December 19, 2012
Date of Patent:
June 21, 2016
Assignee:
Amazon Technologies, Inc.
Inventors:
Mitsuo Takaki, Divya Mahalingam, David Gordon Leatham, David Rezazadeh Azari
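A few of the aspects the abstract mentions can be quantified as simple stylometric numbers; the particular features below (sentence-length mean, type-token ratio, most-frequent-word share) are illustrative stand-ins, not the patented feature set:

```python
import re
from collections import Counter

def style_features(text: str) -> dict:
    """Quantify simple aspects of a textual work for comparison
    against known works."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    lengths = [len(re.findall(r"[a-zA-Z']+", s)) for s in sentences]
    return {
        "mean_sentence_len": sum(lengths) / len(lengths),
        "type_token_ratio": len(set(words)) / len(words),
        "top_word_share": Counter(words).most_common(1)[0][1] / len(words),
    }
```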
Abstract: The present disclosure is directed towards a process for estimating the signal to noise ratio of a speech signal. The process may include receiving, at a computing device, a speech signal having a bitstream and a signal-to-noise ratio (“SNR”) associated therewith. The process may further include estimating the SNR directly from the bitstream or using a partial decoder that is configured to extract one or more parameters, the parameters including at least one of a fixed codebook gain, an adaptive codebook gain, a pitch lag, and a line spectral frequency (“LSF”) coefficient.
Type:
Grant
Filed:
July 2, 2014
Date of Patent:
June 7, 2016
Assignee:
Nuance Communications, Inc.
Inventors:
Jose Lainez, Daniel A. Barreda, Dushyant Sharma, Patrick Naylor, Sridhar Pilli
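A crude illustration of estimating SNR from codec parameters rather than decoded audio: frames where the adaptive-codebook gain dominates are treated as speech-like and frames where the fixed-codebook gain dominates as noise-like, and a rough ratio is formed from the mean combined gains. This heuristic is an invented stand-in; the actual mapping from bitstream parameters to SNR is the subject of the patent and is not reproduced here.

```python
import math

def estimate_snr_db(fixed_gains, adaptive_gains):
    """Rough SNR (dB) from per-frame fixed/adaptive codebook gains."""
    speech = [f + a for f, a in zip(fixed_gains, adaptive_gains) if a >= f]
    noise = [f + a for f, a in zip(fixed_gains, adaptive_gains) if a < f]
    if not noise:
        return float("inf")
    if not speech:
        return float("-inf")
    return 10.0 * math.log10(
        (sum(speech) / len(speech)) / (sum(noise) / len(noise)))
```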
Abstract: The present invention provides a method and system directed to predicting implicit rhetorical relations between two spans of text, e.g., in a large annotated corpus, such as the Penn Discourse Treebank (“PDTB”), Rhetorical Structure Theory corpus, and the Discourse Graph Bank, and particularly directed to determining a rhetorical relation in the absence of an explicit discourse marker. Surface level features may be used to capture pragmatic information encoded in the absent marker. In one manner a simplified feature set based only on raw text and semantic dependencies is used to improve performance for all relations. By using surface level features to predict implicit rhetorical relations for the large annotated corpus the invention approaches a theoretical maximum performance, suggesting that more data will not necessarily improve performance based on these and similarly situated features.
Abstract: An aspect provides a method comprising: receiving, at an input component of an information handling device, user input comprising one or more words; identifying, using a processor of the information handling device, an emotion associated with the one or more words; creating, using the processor, an emotion tag including the emotion associated with the one or more words; and storing the emotion tag in a memory. Other embodiments are described and claimed.
Type:
Grant
Filed:
October 30, 2013
Date of Patent:
May 17, 2016
Assignee:
Lenovo (Singapore) Pte. Ltd.
Inventors:
Suzanne Marion Beaumont, Russell Speight VanBlon, Rod D. Waltermann
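The receive-identify-tag-store flow can be sketched as below. The toy keyword lexicon is purely illustrative; the patent does not specify how the emotion is identified, and "memory" here is just a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Toy lexicon mapping trigger words to emotions (illustrative only).
EMOTION_WORDS = {"great": "joy", "terrible": "anger", "sorry": "sadness"}

@dataclass
class EmotionTag:
    words: str
    emotion: str
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def tag_input(words: str, store: list) -> EmotionTag:
    """Identify an emotion for the input words, create an emotion
    tag, and store it in memory (here, a plain list)."""
    emotion = next(
        (e for w, e in EMOTION_WORDS.items() if w in words.lower()), "neutral")
    tag = EmotionTag(words, emotion)
    store.append(tag)
    return tag
```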
Abstract: A voice processing apparatus includes: a dividing unit which divides a voice signal into frames in such a manner that any two successive frames overlap each other by a predetermined amount; a first windowing unit which multiplies each frame by a first windowing function that attenuates a signal at both ends of the frame; an orthogonal transform unit which computes a frequency spectrum for each frame multiplied by the first windowing function; a frequency signal processing unit which computes a corrected frequency spectrum; an inverse orthogonal transform unit which computes a corrected frame by applying an inverse orthogonal transform to the corrected frequency spectrum; a second windowing unit which multiplies each corrected frame by a second windowing function that attenuates a signal at both ends of the corrected frame; and an addition unit which adds up each corrected frame multiplied by the second windowing function, sequentially in time order.
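The described pipeline is the classic windowed overlap-add structure. A minimal sketch with 50% overlap, square-root Hann windows at both the analysis and synthesis stages, and an identity "correction" (the window choice and frame length are assumptions for the example; with them, the interior of the signal reconstructs exactly):

```python
import numpy as np

def process_overlap_add(x: np.ndarray, frame_len: int = 8,
                        correct=lambda spec: spec) -> np.ndarray:
    """Analysis window -> FFT -> spectral correction -> inverse FFT
    -> synthesis window -> overlap-add, with 50% frame overlap."""
    hop = frame_len // 2
    # Periodic sqrt-Hann: win**2 overlap-adds to a constant at hop N/2.
    win = np.sqrt(np.hanning(frame_len + 1)[:frame_len])
    out = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * win              # first windowing
        spec = np.fft.rfft(frame)                             # orthogonal transform
        corrected = np.fft.irfft(correct(spec), frame_len)    # inverse transform
        out[start:start + frame_len] += corrected * win       # second windowing + add
    return out
```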
Abstract: A system and method is provided for voice activated Web based infrastructure (Voice Portal) which accepts spoken input from a variety of devices, including desktop and laptop computers, tablets, smart phones, standard mobile phones, and ordinary hard-wired telephones.
Abstract: A method for providing information on the validity of encoded audio data is disclosed, the encoded audio data being a series of coded audio data units. Each coded audio data unit can include information on the valid audio data. The method includes: providing either information on a coded audio data level which describes the amount of data at the beginning of an audio data unit being invalid, or providing information on a coded audio data level which describes the amount of data at the end of an audio data unit being invalid, or providing information on a coded audio data level which describes both the amount of data at the beginning and the end of an audio data unit being invalid. A method for receiving encoded data including information on the validity of data and providing decoded output data is also disclosed. Furthermore, a corresponding encoder and a corresponding decoder are disclosed.
Type:
Grant
Filed:
October 11, 2012
Date of Patent:
April 26, 2016
Assignee:
FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
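The per-unit validity information described above can be sketched as a trim on each decoded unit: a decoder keeps only the samples between the invalid leading and trailing portions (the triple layout below is an illustrative encoding of that information, not the bitstream format):

```python
def valid_samples(units) -> int:
    """Each coded unit is (decoded_len, invalid_at_start, invalid_at_end);
    return the total number of valid output samples after trimming."""
    total = 0
    for decoded_len, skip_start, skip_end in units:
        total += max(0, decoded_len - skip_start - skip_end)
    return total
```

This is the same bookkeeping used in practice for encoder priming and end padding, e.g. trimming the first coded frames of an AAC stream.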
Abstract: Disclosed herein are systems, methods, and computer readable-media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves a notification assigned an importance level, with repeated attempts at notification if it is of high importance.
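The convert-to-speech step amounts to formatting selected notification fields into a sentence for a TTS engine. A small sketch (the field names and phrasing are invented for the example):

```python
def notification_to_speech(info: dict) -> str:
    """Render selected notification fields as a sentence to be
    spoken, ending with the accept/ignore prompt."""
    parts = []
    if "caller_id" in info:
        parts.append(f"Call from {info['caller_id']}")
    if "subject" in info:
        parts.append(f"subject: {info['subject']}")
    if info.get("importance") == "high":
        parts.append("marked high importance")
    return "; ".join(parts) + ". Say accept or ignore."
```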
Abstract: The voice clarification apparatus includes a plurality of band-pass filters that respectively extract a plurality of band components, which are included in a voice band, from an input audio signal; a gain determination unit that determines a gain according to the level of a signal of a band component which is extracted by at least one band-pass filter of the plurality of band-pass filters; a level adjustment unit that adjusts the levels of signals of the plurality of band components which are extracted by the plurality of band-pass filters using the gain; and a first addition unit that adds a signal which is based on the audio signal to a signal in which the gain is adjusted by the level adjustment unit, and outputs a signal obtained through the addition.
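A sketch of that structure using FFT-domain band-pass "filters": extract band components, derive a gain from the level of one component, scale all components by that gain, and add the result back to the input. The band edges, the RMS level measure, and the target level are assumptions for illustration.

```python
import numpy as np

def band_component(x: np.ndarray, sr: float, lo: float, hi: float) -> np.ndarray:
    """FFT-based band-pass: keep only the bins in [lo, hi) Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    spec[(freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(spec, len(x))

def clarify(x: np.ndarray, sr: float, bands, target_rms: float = 0.1):
    """Boost the extracted voice-band components by a gain chosen
    from the level of the first component, then add to the input."""
    comps = [band_component(x, sr, lo, hi) for lo, hi in bands]
    level = np.sqrt(np.mean(comps[0] ** 2)) + 1e-12
    gain = target_rms / level
    return x + gain * np.sum(comps, axis=0)
```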
Abstract: A method for switching a current information providing mode is provided, wherein the method comprises the following steps: user context information related to a user device is first collected. A current user context of the user device is then identified in accordance with the collected user context information, so that identified consequence data is generated. An information providing mode suitable for the current user context of the user device is subsequently switched to according to the consequence data.
Type:
Grant
Filed:
May 23, 2014
Date of Patent:
April 12, 2016
Assignee:
TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors:
Wei Li, Bo Hu, Ting-Yong Tang, Ying Huang, Hui-Jiao Yang, Kai Zhang, Rui-Yi Zhou, Zheng-Kai Xie, Cheng Feng, Zhi-Pei Wang, Xi Wang, Yu-Lei Liu
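The collect-identify-switch flow can be reduced to two lookups, as in this sketch (the context labels, signals such as `in_meeting` and `moving_speed_kmh`, and the mode table are all invented for illustration):

```python
def identify_context(info: dict) -> str:
    """Identify the current user context from collected information."""
    if info.get("in_meeting"):
        return "meeting"
    if info.get("moving_speed_kmh", 0) > 20:
        return "driving"
    return "idle"

# Which information providing mode suits each identified context.
MODE_FOR_CONTEXT = {"meeting": "silent_text", "driving": "voice", "idle": "normal"}

def switch_mode(info: dict) -> str:
    """Switch to the mode suitable for the identified context."""
    return MODE_FOR_CONTEXT[identify_context(info)]
```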
Abstract: A source language sentence is tagged with non-lexical tags, such as part-of-speech tags, and is parsed using a lexicalized parser trained in the source language. A target language sentence that is a translation of the source language sentence is tagged with non-lexical labels (e.g., part-of-speech tags) and is parsed using a delexicalized parser that has been trained in the source language to produce k-best parses. The best parse is selected based on its alignment with the lexicalized parse of the source language sentence. The selected best parse can be used to update the parameter vector of a lexicalized parser for the target language.
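One simple way to realize the "select by alignment" step, offered as a sketch rather than the patented scoring: map each candidate parse's dependency edges through the word alignment and count how many agree with the source-side parse.

```python
def select_best_parse(k_best, source_edges, alignment):
    """Pick the target-side parse whose (head, dependent) edges,
    mapped through the target-to-source word alignment, agree most
    with the source parse's edges."""
    def score(edges):
        mapped = {(alignment.get(h), alignment.get(d)) for h, d in edges}
        return len(mapped & set(source_edges))
    return max(k_best, key=score)
```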
Abstract: A computer-implemented method can include initializing, at a computing device including one or more processors, an input method editor for composing an electronic message. The method can include receiving, at the computing device, an input from a user identifying a recipient for the electronic message. The method can include obtaining, at the computing device, language information corresponding to the recipient, the language information indicating one or more suggested natural languages for composing the electronic message. The method can include selecting, at the computing device, a natural language for composing the electronic message based on the language information to obtain a selected natural language. The method can also include configuring, at the computing device, the input method editor based on the selected natural language.
Type:
Grant
Filed:
October 30, 2013
Date of Patent:
March 8, 2016
Assignee:
Google Inc.
Inventors:
Jean-Michel Roland Trivi, Bjorn Erik Bringert
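At its core the recipient-to-language step is a lookup over stored language information, as in this sketch (the addresses, language codes, and fallback policy are illustrative assumptions):

```python
# Hypothetical stored language information per recipient.
RECIPIENT_LANGS = {
    "kenji@example.com": ["ja", "en"],
    "marie@example.com": ["fr"],
}

def select_language(recipient: str, default: str = "en") -> str:
    """Select the first suggested natural language for the recipient,
    falling back to a default when none is known; the result would
    configure the input method editor."""
    langs = RECIPIENT_LANGS.get(recipient, [])
    return langs[0] if langs else default
```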
Abstract: A system and method for measuring sound is described. In one embodiment frequency-banded-noise samples, which collectively cover at least a portion of a spectrum, are sequentially generated at different points in time, and a baseline sound-pressure-level reading for each of the frequency banded noise samples is received. Using data received from a microphone, a sound pressure level reading is generated for each of the frequency banded noise samples. Calibration data is then produced for the microphone as a function of a difference between each of the baseline sound-pressure-level readings and a corresponding one of each of the generated sound pressure level readings for each of the frequency banded noise samples.
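The calibration arithmetic described above, per frequency band, is a difference between the baseline reading and the microphone's reading, later added back to measurements. A minimal sketch with illustrative band labels:

```python
def calibration_offsets(baseline_spl: dict, measured_spl: dict) -> dict:
    """Per-band calibration data: baseline SPL reading minus the
    microphone's SPL reading for each frequency-banded noise sample."""
    return {band: baseline_spl[band] - measured_spl[band]
            for band in baseline_spl}

def apply_calibration(measured_spl: dict, offsets: dict) -> dict:
    """Correct later microphone readings using the calibration data."""
    return {band: spl + offsets[band] for band, spl in measured_spl.items()}
```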
Abstract: Methods, systems, and apparatuses are described for performing speaker-identification-assisted speech processing in an uplink path of a communication device. In accordance with certain embodiments, a communication device includes speaker identification (SID) logic that is configured to determine the identity of a near-end speaker. Knowledge of the identity of the near-end speaker is then used to improve the performance of one or more uplink speech processing algorithms implemented on the communication device.
Type:
Grant
Filed:
October 31, 2013
Date of Patent:
February 23, 2016
Assignee:
Broadcom Corporation
Inventors:
Juin-Hwey Chen, Jes Thyssen, Elias Nemer, Bengt J. Borgstrom, Ashutosh Pandey, Robert W. Zopf
Abstract: A vehicle based system and method for receiving voice inputs and determining whether to perform a voice recognition analysis using in-vehicle resources or resources external to the vehicle.
Type:
Grant
Filed:
June 24, 2011
Date of Patent:
February 16, 2016
Assignee:
Honda Motor Co., Ltd.
Inventors:
Ritchie Winson Huang, Pedram Vaghefinazari, Stuart Yamamoto