Abstract: A contextual conversion platform, and method for converting text-to-speech, are described that can convert content of a target to spoken content. Embodiments of the contextual conversion platform can identify certain contextual characteristics of the content, from which can be generated a spoken content input. This spoken content input can include tokens, e.g., words and abbreviations, to be converted to the spoken content, as well as substitution tokens that are selected from contextual repositories based on the context identified by the contextual conversion platform.
Abstract: According to an embodiment, an information processing apparatus includes a dividing unit, an assigning unit, and a generating unit. The dividing unit is configured to divide speech data into pieces of utterance data. The assigning unit is configured to assign speaker identification information to each piece of utterance data based on an acoustic feature of each piece of utterance data. The generating unit is configured to generate a candidate list that indicates candidate speaker names so as to enable a user to determine a speaker name to be given to the piece of utterance data identified by instruction information, based on operation history information in which at least pieces of utterance identification information, pieces of speaker identification information, and speaker names given by the user to the respective pieces of utterance data are associated with one another.
Abstract: Methods, apparatus, and computer-readable media are described herein related to a user interface (UI) that can be implemented on a head-mountable device (HMD). The UI can include a voice-navigable UI. The voice-navigable UI can include a voice navigable menu that includes one or more menu items. The voice-navigable UI can also present a first visible menu that includes at least a portion of the voice navigable menu. In response to a first utterance comprising one of the one or more menu items, the voice-navigable UI can modify the first visible menu to display one or more commands associated with the first menu item. In response to a second utterance comprising a first command, the voice-navigable UI can invoke the first command. In some embodiments, the voice-navigable UI can display a second visible menu, where the first command can be displayed above other menu items in the second visible menu.
Abstract: This is directed to processing voice inputs received by an electronic device. In particular, this is directed to receiving a voice input and identifying the user providing the voice input. The voice input can be processed using a subset of words from a library used to identify the words or phrases of the voice input. The particular subset can be selected such that voice inputs provided by the user are more likely to include words from the subset. The subset of the library can be selected using any suitable approach, including for example based on the user's interests and words that relate to those interests. For example, the subset can include one or more words related to media items selected by the user for storage on the electronic device, names of the user's contacts, applications or processes used by the user, or any other words relating to the user's interactions with the device.
Abstract: Embodiments related to recognizing speech inputs are disclosed. One disclosed embodiment provides a method for recognizing a speech input including receiving depth information of a physical space from a depth camera, determining an identity of a user in the physical space based on the depth information, receiving audio information from one or more microphones, and determining a speech input from the audio information. If the speech input comprises an ambiguous term, the ambiguous term in the speech input is compared to one or more of depth image data received from the depth camera and digital content consumption information for the user to identify an unambiguous term corresponding to the ambiguous term. After identifying the unambiguous term, an action is taken on the computing device based on the speech input and the unambiguous term.
Abstract: A method on an electronic device is described. A set of graphics and a set of stored keywords are received in a higher-power mode of operation, each graphic corresponding to one or more of the stored keywords. The higher-power mode of operation is discontinued to enter a lower-power mode of operation. In the lower-power mode: audio signals are listened for; it is detected whether any keywords of the set of stored keywords are present in the audio signals; detected keywords present in the audio signals are stored; a graphic is selected from the set of graphics based on a comparison between the set of detected keywords and the set of stored keywords; and, in response to at least one control signal, a first portion of the selected graphic is displayed on a first area of the touch screen display.
Type: Grant
Filed: August 9, 2013
Date of Patent: November 10, 2015
Assignee: Google Technology Holdings LLC
Inventors: Michael J. Lombardi, Mitul R. Patel, Amber M. Pierce, Natalie J. Stevens
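The keyword-to-graphic selection described in the abstract above can be sketched as a set-overlap comparison. This is a minimal illustration, not the patented implementation; the dictionary representation and the overlap-count scoring rule are assumptions.

```python
def select_graphic(graphics, detected_keywords):
    """Pick the graphic whose stored keywords best overlap the detected set.

    graphics: mapping of graphic name -> set of stored keywords (a
    hypothetical representation of the set received in the higher-power mode).
    detected_keywords: keywords detected in audio during the lower-power mode.
    """
    detected = set(detected_keywords)
    # Score each graphic by how many of its keywords were actually heard.
    return max(graphics, key=lambda g: len(graphics[g] & detected))
```

A real low-power implementation would run the keyword detector in dedicated hardware; the comparison step itself remains a simple scoring over keyword sets.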
Abstract: An audio-based system may perform dynamic level adjustment by detecting voice activity in an input signal and evaluating voice levels during periods of voice activity. The current voice level is compared to a plurality of thresholds to determine a corresponding gain strategy, and the input signal is scaled in accordance with this gain strategy. Further adjustment to the signal is performed to reduce output clipping that might otherwise be produced.
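The threshold-to-gain mapping this abstract describes can be sketched as a simple lookup. The specific threshold and gain values below are illustrative assumptions, not taken from the patent.

```python
def choose_gain_db(voice_level_db,
                   thresholds_db=(-40.0, -25.0, -10.0),
                   gains_db=(12.0, 6.0, 0.0, -3.0)):
    """Map a measured voice level to one of several gain strategies.

    Hypothetical values: quiet voices get the largest boost, and loud
    voices are attenuated so the scaled output is less likely to clip.
    """
    for threshold, gain in zip(thresholds_db, gains_db):
        if voice_level_db < threshold:
            return gain
    # Level exceeds every threshold: apply the final (attenuating) strategy.
    return gains_db[-1]
```

In practice the level estimate would be computed only during detected voice activity, and the chosen gain would be ramped smoothly rather than switched instantly.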
Abstract: A system, a method, and a computer-program product for providing multi-language support in applications are disclosed. A first textual expression contained within an application is obtained. The first textual expression is expressed in a first language. A unique key from a hash of the first textual expression is generated. A language code representative of a second language is determined. Based on the generated unique key and the determined language code, a second textual expression in the second language representative of a translation from the first language into the second language indicated by the language code is determined. The second textual expression is provided to the application to replace the first textual expression in a view presented to a user.
Type: Grant
Filed: December 17, 2012
Date of Patent: October 27, 2015
Assignee: SAP SE
Inventors: Frank Brunswig, Frank Jentsch, Bare Said
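The hash-keyed translation lookup described in the multi-language abstract above can be sketched as follows. The SHA-256 choice, the in-memory store, and the fall-back-to-source behavior are assumptions for illustration; the patent does not specify a particular hash or storage.

```python
import hashlib

# Hypothetical in-memory translation store: (unique key, language code) -> text.
TRANSLATIONS = {}

def text_key(text):
    """Derive a stable unique key from the first textual expression via a hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def register_translation(source_text, language_code, translated_text):
    """Store the second textual expression under (key, language code)."""
    TRANSLATIONS[(text_key(source_text), language_code)] = translated_text

def lookup(source_text, language_code):
    """Return the translation for the target language, falling back to the
    original expression when no translation is registered."""
    return TRANSLATIONS.get((text_key(source_text), language_code), source_text)
```

Keying on a hash of the source text, rather than on a manually assigned identifier, lets translations be attached to expressions already embedded in the application without changing its code.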
Abstract: An example of identifying tasks and commitments can include receiving a communication message. A task and a parameter can be identified in the communication message. Information related to the task can be extracted from the communication message using natural language processing (NLP) and machine learning (ML). A commitment related to the task can be identified using the NLP-extracted information. A state of the commitment can be identified using NLP and ML based on the extracted information.
Type: Grant
Filed: January 29, 2013
Date of Patent: October 27, 2015
Assignee: Hewlett-Packard Development Company, L.P.
Abstract: Embodiments are disclosed that relate to the use of speech inputs including indefinite quantitative terms as computing device inputs. For example, one disclosed embodiment provides a method of operating a computing device, the method including receiving a speech input comprising an indefinite quantitative term, determining a definite quantity corresponding to the indefinite quantitative term, and applying the definite quantity to an action performed via the computing device in response to the speech input.
Abstract: An audio-based system may perform automatic noise reduction to enhance speech intelligibility in an audio signal. Described techniques include initially analyzing audio frames in the time domain to identify frames having relatively low power levels. Those frames are then further analyzed in the frequency domain to estimate noise. For example, the initially identified frames may be analyzed at each of multiple frequencies to detect the lowest exhibited power at each of those frequencies. The lowest power values are used as an estimation of noise across the frequency spectrum, and as the basis for calculating a spectral gain for filtering the audio signal in the frequency domain.
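The two-stage noise estimation described above can be sketched in a few lines: rank frames by total power, keep the quietest, and take the per-bin minimum as the noise floor. The 25% keep fraction and the gain floor are illustrative assumptions, and real systems work on FFT magnitudes rather than the plain lists used here.

```python
def estimate_noise(frames, keep_fraction=0.25):
    """Per-frequency noise estimate from the quietest frames.

    frames: list of per-frame power spectra (one list of per-bin power
    values per frame). The quietest `keep_fraction` of frames, ranked by
    total power, are assumed noise-dominated; the lowest power seen in
    each bin across those frames is the noise estimate for that bin.
    """
    ranked = sorted(frames, key=sum)
    quiet = ranked[:max(1, int(len(ranked) * keep_fraction))]
    return [min(frame[b] for frame in quiet) for b in range(len(frames[0]))]

def spectral_gain(power, noise, floor=0.1):
    """Per-bin gain that suppresses bins whose power is near the noise estimate."""
    return [max(1.0 - n / p if p > 0 else 0.0, floor) for p, n in zip(power, noise)]
```

The gain floor prevents bins from being zeroed out entirely, which tends to produce audible "musical noise" artifacts.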
Abstract: A method for encoding three dimensional audio by a wireless communication device is disclosed. The wireless communication device detects an indication of a plurality of localizable audio sources. The wireless communication device also records a plurality of audio signals associated with the plurality of localizable audio sources. The wireless communication device also encodes the plurality of audio signals.
Abstract: Systems, methods, and computer-readable storage media for generating personalized tag recommendations using speech analytics. The system first analyzes an audio stream to identify topics in the audio stream. Next, the system identifies tags related to the topics to yield identified tags. Based on the identified tags, the system then generates a tag recommendation for tagging the audio stream. The system can also send the tag recommendation to a device associated with a user for presentation to the user.
Abstract: A method for automatically identifying voice tags on an electronic device. After failure to initiate a communication using a voice input command, the user may then subsequently contact the recipient using an application program of the electronic device. The original audio of the voice input command may be identified as a potential voice tag for the now-identified recipient. The method includes: receiving, through a voice interface program, a voice input command, the voice input command including a command element and a content element; ending the voice interface program without performing the voice input command; receiving, through an application program, a user input which identifies data for executing an application program command; performing the application program command; and identifying audio of the content element as a voice tag associated with the data identified by the user input.
Abstract: An apparatus includes a breath sensor including a film configured to sense a variation in electrical impedance based on a moisture gradient and output the sensed variation as an output signal; and a controller configured to process the output signal from the breath sensor. The apparatus is configured to receive the output signal from the breath sensor and provide a signal in response thereto.
Type:
Grant
Filed:
January 23, 2013
Date of Patent:
September 29, 2015
Assignee:
Nokia Technologies Oy
Inventors:
Richard White, Jani Kivioja, Andrew Peter Matthews, Michael Astley, Stefano Marco Borini
Abstract: Systems for receiving, analyzing, and organizing audio content contained within a plurality of media files are disclosed. The systems generally include a server that is configured to receive, index, and store a plurality of media files, which are received by the server from a plurality of sources, within at least one database in communication with the server. The server is further configured to organize audio content included within each of the plurality of media files into a bipartite graph, wherein the bipartite graph includes vertices that are correlated with a specific media file or with an individual who is associated with a specific media file. These vertices are connected by edges, each labeled with a word that is detected within the audio content of the media file.
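The bipartite structure described above — media-file vertices on one side, person vertices on the other, word-labeled edges between them — can be sketched as an edge list. The tuple shape of the input records is an assumption made for illustration.

```python
def build_bipartite_graph(records):
    """Build labeled edges between media-file vertices and person vertices.

    records: iterable of (media_file, associated_person, detected_words)
    tuples, a hypothetical shape for what the server indexes. Each edge
    links a media-file vertex to a person vertex and is labeled with one
    word detected in that file's audio content.
    """
    edges = []
    for media_file, person, words in records:
        for word in words:
            # Vertex type is tagged explicitly to keep the two sides distinct.
            edges.append((("file", media_file), ("person", person), word))
    return edges
```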
Abstract: Certain aspects of the present disclosure relate to techniques for low-complexity encoding (compression) of a broad class of signals that are typically not well modeled as sparse signals in either the time domain or the frequency domain. First, the signal can be split into time segments that may be either sparse in the time domain or sparse in the frequency domain, for example by using an absolute second-order differential operator on the input signal. Next, different encoding strategies can be applied to each of these time segments depending on which domain the sparsity is present in.
Type: Grant
Filed: August 30, 2011
Date of Patent: September 15, 2015
Assignee: QUALCOMM Incorporated
Inventors: Pawan Kumar Baheti, Harinath Garudadri, Yuejie Chi
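The segment classification step described in the abstract above — using an absolute second-order differential operator to decide which domain a segment is sparse in — can be sketched as below. The decision rule and the threshold are illustrative assumptions, not the patented criterion.

```python
def mean_abs_second_diff(x):
    """Mean absolute second-order difference, a rough smoothness measure."""
    diffs = [abs(x[i + 1] - 2 * x[i] + x[i - 1]) for i in range(1, len(x) - 1)]
    return sum(diffs) / len(diffs)

def classify_segment(x, threshold=0.5):
    """Hypothetical rule: a smooth segment (low second-difference energy) is
    treated as frequency-sparse, a spiky one as time-sparse; an encoder
    would then pick a per-segment compression strategy accordingly."""
    return "frequency-sparse" if mean_abs_second_diff(x) < threshold else "time-sparse"
```

A linear ramp has zero second difference everywhere and is classified as frequency-sparse, while an isolated spike produces large second differences and is classified as time-sparse.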
Abstract: Provided is an orthographical variant detection apparatus which detects orthographical variant candidates with high precision. The orthographical variant detection apparatus includes a term extraction unit that extracts terms from document data, a similarity computation unit that computes the similarity of an arbitrary pair of the extracted terms, an orthographical variant candidate determination unit that determines, based on the similarity, whether or not the terms in the pair are orthographical variant candidates, and a group classification unit that groups the orthographical variant candidates based on a character string commonly included in each pair of terms identified as orthographical variant candidates.
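The pairwise-similarity step this abstract describes can be sketched with a standard string-similarity measure. The use of difflib's ratio and the 0.8 threshold are stand-ins for whatever similarity computation the apparatus actually performs.

```python
from difflib import SequenceMatcher
from itertools import combinations

def variant_candidates(terms, threshold=0.8):
    """Pairs of extracted terms whose string similarity exceeds a threshold.

    Hypothetical measure: difflib's ratio (2 * matches / total length),
    so spelling variants like "colour"/"color" score high while unrelated
    terms score low.
    """
    pairs = []
    for a, b in combinations(terms, 2):
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((a, b))
    return pairs
```

A subsequent grouping step would then cluster candidate pairs that share a common substring, per the abstract.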
Abstract: A speech segment determination device includes a frame division portion, a power spectrum calculation portion, a power spectrum operation portion, a spectral entropy calculation portion and a determination portion. The frame division portion divides an input signal in units of frames. The power spectrum calculation portion calculates, using an analysis length, a power spectrum of the input signal for each of the frames that have been divided. The power spectrum operation portion adds a value of the calculated power spectrum to the value of the power spectrum in each of the frequency bins. The spectral entropy calculation portion calculates spectral entropy using the power spectrum whose values have been accumulated. The determination portion determines, based on the value of the spectral entropy, whether the input signal is a signal in a speech segment.
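The spectral-entropy computation at the core of this device can be sketched as the entropy of the normalized power spectrum. The below-threshold decision rule is an illustrative assumption; the patent's determination logic may differ.

```python
import math

def spectral_entropy(power_spectrum):
    """Entropy of the normalized power spectrum: a flat, noise-like spectrum
    gives high entropy, while a peaked (voiced-speech-like) spectrum gives
    low entropy."""
    total = sum(power_spectrum)
    if total <= 0.0:
        return 0.0
    probs = [p / total for p in power_spectrum]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def is_speech_frame(power_spectrum, threshold):
    """Hypothetical decision rule: entropy below a threshold marks speech."""
    return spectral_entropy(power_spectrum) < threshold
```

A uniform spectrum over N bins attains the maximum entropy log(N), which is why entropy works as a flatness measure for distinguishing speech from broadband noise.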
Abstract: A speech enhancement system controls the gain of an excitation signal to prevent uncontrolled gain adjustments. The system includes a first device that converts sound waves into operational signals. An ambient noise estimator is linked to the first device and an echo canceller. The ambient noise estimator estimates how loud a background noise would be near the first device before or after an echo cancellation. The system then compares the ambient noise estimate to a current ambient noise estimate near the first device to control a gain of an excitation signal.