Abstract: Systems and methods are described herein for generating alternate audio for a media stream. The media system receives media that is requested by the user. The media comprises a video and audio. The audio includes words spoken in a first language. The media system stores the received media in a buffer as it is received. The media system separates the audio from the buffered media and determines an emotional state expressed by spoken words of the first language. The media system translates the words spoken in the first language into words spoken in a second language. Using the translated words of the second language, the media system synthesizes speech having the emotional state previously determined. The media system then retrieves the video of the received media from the buffer and synchronizes the synthesized speech with the video to generate the media content in the second language.
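The pipeline described above can be sketched end to end. Every stage here is a toy stand-in with hypothetical names (keyword-based emotion detection, dictionary translation, a tagged-transcript "synthesizer"), not the patented method:

```python
# Illustrative sketch of the alternate-audio pipeline: buffer the media,
# separate the audio, detect emotion, translate, synthesize, re-synchronize.

def detect_emotion(words):
    # Toy emotion detector: keyword lookup instead of a real classifier.
    sad = {"alas", "sorry"}
    return "sad" if sad & set(words) else "neutral"

def translate(words, lexicon):
    # Toy word-for-word translation via a lookup table.
    return [lexicon.get(w, w) for w in words]

def synthesize(words, emotion):
    # Stand-in for a TTS engine: returns a tagged transcript.
    return {"text": " ".join(words), "emotion": emotion}

def generate_alternate_audio(media, lexicon):
    buffer = dict(media)                      # store received media in a buffer
    words = buffer["audio"].split()           # separate audio (here: its words)
    emotion = detect_emotion(words)           # determine emotional state
    translated = translate(words, lexicon)    # first -> second language
    speech = synthesize(translated, emotion)  # emotion-preserving synthesis
    return {"video": buffer["video"], "audio": speech}  # re-synchronize

media = {"video": "frames", "audio": "hola amigo"}
out = generate_alternate_audio(media, {"hola": "hello", "amigo": "friend"})
# out["audio"]["text"] == "hello friend"
```

The point of the sketch is the data flow: video is held in the buffer untouched while the audio path is rewritten, then the two are joined again.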
Abstract: The present invention provides a system and method for representing quasi-periodic (“qp”) waveforms comprising representing a plurality of limited decompositions of the qp waveform, wherein each decomposition includes a first and a second amplitude value and at least one time value. In some embodiments, each of the decompositions is phase adjusted such that the arithmetic sum of the plurality of limited decompositions reconstructs the qp waveform. These decompositions are stored in a data structure having a plurality of attributes. Optionally, these attributes are used to reconstruct the qp waveform, or patterns or features of the qp waveform can be determined using various pattern-recognition techniques. Some embodiments provide a system that uses software, embedded hardware, or firmware to carry out the above-described method. Some embodiments use a computer-readable medium to store the data structure and/or instructions to execute the method.
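The decomposition-and-sum idea can be illustrated with sinusoidal components. The parametrization below (two amplitudes plus a time offset per component) is an assumption for illustration, not the patented data structure:

```python
import math

# Toy illustration: represent a waveform as a few phase-adjusted
# components whose arithmetic sum reconstructs it.

def component(a1, a2, t0, freq):
    # One limited decomposition: two amplitude values and a time value.
    return {"a1": a1, "a2": a2, "t0": t0, "freq": freq}

def evaluate(comp, t):
    # Evaluate one component at time t, phase-adjusted by its t0.
    phase = 2 * math.pi * comp["freq"] * (t - comp["t0"])
    return comp["a1"] * math.cos(phase) + comp["a2"] * math.sin(phase)

def reconstruct(components, times):
    # The arithmetic sum of the components recovers the waveform.
    return [sum(evaluate(c, t) for c in components) for t in times]

comps = [component(1.0, 0.0, 0.0, 1.0), component(0.5, 0.0, 0.0, 2.0)]
times = [i / 8 for i in range(8)]
signal = reconstruct(comps, times)
# signal[0] == 1.5 (both cosine terms are at full amplitude at t=0)
```

Storing only the per-component attributes, rather than raw samples, is what makes the representation compact and amenable to pattern recognition.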
Abstract: A computer system generates a vector space model based on an ontology of concepts. One or more training examples are extracted for one or more concepts of a hierarchical ontology, wherein the one or more training examples for the one or more concepts are based on neighboring concepts in the hierarchical ontology. A plurality of vectors, each including one or more features, are initialized, wherein each vector corresponds to a concept of the one or more concepts. A vector space model is generated by iteratively modifying one or more vectors of the plurality of vectors to optimize a loss function. Natural language processing is performed using the vector space model. Embodiments of the present invention further include a method and program product for generating a vector space model in substantially the same manner described above.
Type:
Grant
Filed:
August 20, 2019
Date of Patent:
November 16, 2021
Assignee:
International Business Machines Corporation
Inventors:
Brendan Bull, Paul L. Felt, Andrew G. Hicks
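The vector space model in the abstract above can be sketched in miniature. The abstract does not specify the loss function, so a squared-distance loss that pulls each concept vector toward its ontology neighbors is assumed here purely for illustration:

```python
import random

# A minimal sketch: initialize one vector per concept, then iteratively
# modify the vectors to reduce a (hypothetical) neighbor-distance loss.

random.seed(0)
ontology = {"animal": ["dog", "cat"], "dog": ["animal"], "cat": ["animal"]}
dim = 4
vectors = {c: [random.uniform(-1, 1) for _ in range(dim)] for c in ontology}

def loss(vectors, ontology):
    # Sum of squared distances between neighboring concepts.
    total = 0.0
    for c, neighbors in ontology.items():
        for n in neighbors:
            total += sum((a - b) ** 2 for a, b in zip(vectors[c], vectors[n]))
    return total

lr = 0.05
before = loss(vectors, ontology)
for _ in range(100):
    # Gradient step: move each concept vector toward its neighbors.
    for c, neighbors in ontology.items():
        for n in neighbors:
            for i in range(dim):
                grad = 2 * (vectors[c][i] - vectors[n][i])
                vectors[c][i] -= lr * grad
after = loss(vectors, ontology)
# after < before: the loss decreases as vectors are iteratively modified
```

The key property the sketch preserves is that training examples come from neighboring concepts in the hierarchy, so related concepts end up close in the vector space.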
Abstract: A method is described comprising receiving a conversational transcript of a conversational interaction among a plurality of participants, wherein each participant contributes a sequence of contributions to the conversational interaction. The method includes projecting contributions of the plurality of participants into a semantic space using a natural language vectorization, wherein the semantic space describes semantic relationships among words of the conversational interaction. The method includes computing interaction process measures using information of the conversational transcript, the conversational interaction, and the natural language vectorization.
Abstract: There is a need for more effective and efficient natural language processing. This need can be addressed by, for example, solutions for performing/executing natural language processing using hybrid document embedding. In one example, a method includes identifying a natural language document associated with one or more document attributes, wherein the natural language document comprises one or more natural language words; determining an attribute-based document embedding for the natural language document, wherein the attribute-based document embedding is generated based on a document vector for the natural language document and a word vector for each natural language word of the one or more natural language words; processing the attribute-based document embedding using a predictive inference model to determine one or more document-related predictions for the natural language document; and performing one or more prediction-based actions based on the one or more document-related predictions.
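A hybrid embedding of the kind described above can be sketched as follows. The combination rule (concatenating a document vector with the mean of its word vectors) and the thresholded linear "predictive inference model" are assumptions for illustration, not the claimed method:

```python
# Minimal sketch of an attribute-based document embedding built from a
# document vector plus per-word vectors, fed to a toy predictive model.

word_vecs = {"claim": [1.0, 0.0], "denied": [0.0, 1.0], "paid": [0.5, 0.5]}

def embed(doc_vector, words):
    # Mean of the word vectors, concatenated onto the document vector.
    n = len(words)
    mean = [sum(word_vecs[w][i] for w in words) / n for i in range(2)]
    return doc_vector + mean

def predict(embedding, weights):
    # Toy predictive inference model: thresholded dot product.
    score = sum(e * w for e, w in zip(embedding, weights))
    return "flag" if score > 1.0 else "ok"

emb = embed([0.2, 0.1], ["claim", "denied"])
# emb == [0.2, 0.1, 0.5, 0.5]
label = predict(emb, [0.0, 0.0, 1.0, 1.0])
```

The prediction (here a label) would then drive the "prediction-based actions" the abstract mentions.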
Abstract: A clustering method, clustering program, and clustering device are described herein for clustering of words with multiple meanings based on generating vectors for each meaning of a word. The generated vectors provide a distributed representation of a word in a vector space that account for the multiple meanings of the word so as, for instance, to learn semantic representations with high accuracy.
Abstract: Systems and methods described herein relate to adapting a language model for automatic speech recognition (ASR) for a new set of words. Instead of retraining the ASR models, language models and grammar models, the system only modifies one grammar model and ensures its compatibility with the existing models in the ASR system.
Abstract: A method includes performing, with at least one processing device, natural language understanding by iteratively (i) generating a semantic word and clause representation and (ii) generating a syntax. The generation of the semantic word and clause representation and the generation of the syntax occur iteratively such that (i) semantics are calculated from syntax by aggregating weights of syntactically-labeled context in which words or clauses appear and (ii) syntax is calculated from semantics by grouping common pairs of words or clauses with similar semantic relations, thereby producing a self-consistent coupled notion of syntax and semantics.
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for announcing and detecting automated conversation are disclosed. One of the methods includes initiating, over a natural language communication channel, a conversation with a communication participant using a natural language communication method that includes a dialogue of natural language communications. The communication participant is determined to be automated using a pre-defined adaptive interactive protocol that specifies natural language linguistic transformations defined in a sequence. The conversation can be transitioned to a communication method that is different from the natural language communication method in response to determining that the communication participant is automated.
Abstract: A system and method for generating event timelines by analyzing natural language texts from a textual dataset is provided. In one or more examples, the systems and methods can ingest a textual dataset and generate a visual timeline that illustrates the sequence of events contained within the textual dataset and approximately when in time each event in the textual dataset occurred. In one or more examples, machine learning classifiers can be employed to automatically extract event trigger words and time mentions in the textual dataset and anchor the extracted event trigger words to points in time expressed on the timeline. Machine learning classifiers can be employed to extract event trigger words from the textual dataset, relate the extracted event trigger words to one or more time mentions in the textual dataset, and to relate the extracted event trigger words to one or more document creation times found within the textual dataset.
Type:
Grant
Filed:
September 29, 2017
Date of Patent:
September 28, 2021
Assignee:
The MITRE Corporation
Inventors:
Ransom K. Winder, Joseph P. Jubinski, Christopher Giannella
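The trigger-extraction-and-anchoring step from the timeline abstract above can be sketched with rule-based stand-ins for the machine learning classifiers. The trigger lexicon and the year-as-time-mention pattern are hypothetical simplifications:

```python
import re

# Illustrative sketch: extract event trigger words and anchor each to
# the nearest time mention in the text, yielding a sorted timeline.

TRIGGERS = {"attacked", "signed", "elected"}
TIME_RE = re.compile(r"\b(\d{4})\b")  # toy time mention: a 4-digit year

def build_timeline(text):
    tokens = text.split()
    years = [(i, m.group(1)) for i, t in enumerate(tokens)
             if (m := TIME_RE.search(t))]
    timeline = []
    for i, tok in enumerate(tokens):
        word = tok.strip(".,").lower()
        if word in TRIGGERS and years:
            # Anchor the trigger to the nearest time mention by distance.
            _, year = min(years, key=lambda y: abs(y[0] - i))
            timeline.append((word, year))
    return sorted(timeline, key=lambda e: e[1])

events = build_timeline("The treaty was signed in 1998. He was elected in 2004.")
# events == [("signed", "1998"), ("elected", "2004")]
```

In the patented system the lexicon and anchoring decisions come from trained classifiers rather than fixed rules, but the output shape — events ordered on a time axis — is the same.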
Abstract: Captured vocals may be automatically transformed using advanced digital signal processing techniques that provide captivating applications, and even purpose-built devices, in which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks and pitch corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
Type:
Grant
Filed:
May 13, 2019
Date of Patent:
September 21, 2021
Assignee:
Smule, Inc.
Inventors:
Parag Chordia, Mark Godfrey, Alexander Rae, Prerna Gupta, Perry R. Cook
Abstract: Provided are a vector quantization device, a voice coding device, a vector quantization method, and a voice coding method which enable a reduction in the amount of calculation of a voice codec without deterioration of voice quality. In the vector quantization device, a first reference vector calculation unit (201) calculates a first reference vector by multiplying a target vector (x) by an auditory weighting LPC synthesis filter (H), and a second reference vector calculation unit (202) calculates a second reference vector by applying a filter having a high-pass characteristic to the elements of the first reference vector. A polarity preliminary selection unit (205) generates a polar vector by disposing, at the position of each element of the second reference vector, a unit pulse whose positive or negative polarity is selected on the basis of the polarity of that element.
Type:
Grant
Filed:
January 3, 2019
Date of Patent:
September 7, 2021
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
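The polarity preliminary selection in the vector quantization abstract above can be traced numerically. The 3×3 weighting matrix and the first-difference "high pass" below are assumptions for illustration; the actual auditory weighting LPC synthesis filter and high-pass filter differ:

```python
# Toy walk-through of the two reference vectors and the polar vector.

def mat_vec(H, x):
    # First reference vector: target vector x filtered by matrix H.
    return [sum(h * v for h, v in zip(row, x)) for row in H]

def high_pass(vec):
    # Second reference vector: first difference as a crude high-pass.
    return [vec[0]] + [b - a for a, b in zip(vec, vec[1:])]

def polar_vector(vec):
    # Unit pulses whose signs follow the second reference vector.
    return [1 if v >= 0 else -1 for v in vec]

H = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
x = [1.0, -2.0, 0.5]
r1 = mat_vec(H, x)        # [0.0, -1.25, -0.5]
r2 = high_pass(r1)        # [0.0, -1.25, 0.75]
p = polar_vector(r2)      # [1, -1, 1]
```

Fixing pulse polarities up front in this way is what prunes the codebook search and reduces the calculation amount the abstract refers to.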
Abstract: A single-bit audio stream can be converted to a modified single-bit audio stream with a constant edge rate while maintaining a modulation index of the original audio stream using direct mapping. With direct mapping, a pre-filter bank may be combined with a multi-bit symbol mapper to select symbols for the modified audio stream with a constant edge rate per symbol and the same modulation index as the original audio stream. The output of the pre-filter bank may be an audio stream with no consecutive full-scale symbols. Using the output of the pre-filter bank, a multi-bit symbol mapper may use the symbol selector to output a symbol with a constant edge rate per symbol and the same modulation index as the original signal. The symbols may be converted to an analog signal for reproduction of audio content using a transducer.
Type:
Grant
Filed:
August 2, 2019
Date of Patent:
August 31, 2021
Assignee:
Cirrus Logic, Inc.
Inventors:
Shafagh Kamkar, Dylan A. Hester, Bruce E. Duewer
Abstract: An assistance device that more appropriately identifies a user using speech. The assistance device is provided with a speech detector, a name acquiring section that acquires a name of a person based on speech detected by the speech detector, and a user identifying section that identifies a care receiver, who is a user of the assistance device, based on the name acquired by the name acquiring section.
Abstract: A system and a method for detecting a simulated Emergency Alert Signal (EAS) are disclosed. The method includes detecting, by a first detector, one or more tones in a plurality of audio frames. Further, the method includes detecting, by a second detector, one or more beeps in the plurality of audio frames. Thereafter, the method includes detecting, by a third detector, at least one emergency word in the plurality of audio frames based at least on the detected one or more tones or the detected one or more beeps, and thereby detecting the simulated EAS.
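The three-detector cascade from the EAS abstract above can be sketched with toy frame features in place of real audio analysis; the feature names and gating logic here are hypothetical:

```python
# Illustrative cascade: tones, then beeps, then emergency words gated
# on the first two detections.

def detect_tones(frames):
    # First detector: frames containing the dual-tone attention signal.
    return [f for f in frames if f.get("tone")]

def detect_beeps(frames):
    # Second detector: frames containing beep markers.
    return [f for f in frames if f.get("beep")]

def detect_simulated_eas(frames, emergency_words=frozenset({"emergency", "alert"})):
    tones, beeps = detect_tones(frames), detect_beeps(frames)
    if not (tones or beeps):
        return False
    # Third detector: emergency words, checked only after tones or beeps.
    return any(w in emergency_words
               for f in frames for w in f.get("words", []))

frames = [{"tone": True}, {"words": ["this", "is", "an", "emergency"]}]
found = detect_simulated_eas(frames)
```

Gating the word detector on the tone/beep detectors is what keeps ordinary speech containing the word "emergency" from triggering a false detection.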
Abstract: Described is a voice dialogue system that includes a voice input unit which acquires a user utterance, an intention understanding unit which interprets the intention of an utterance acquired by the voice input unit, a dialogue text creator which creates the text of a system utterance, and a voice output unit which outputs the system utterance as voice data. When creating the text of a system utterance, the dialogue text creator inserts a tag at a position in the system utterance, and the intention understanding unit interprets the utterance intention of the user according to whether the user utterance is made before or after the voice output unit has output the portion of the system utterance corresponding to the tag.
Abstract: A real-time agreement comprehension tool is described. Initially, a user is selected as a signing party to an agreement. A document deployment system enables a computing device associated with the user to access the agreement. The computing device presents the agreement via a display device for digital signing by the user. While the agreement is presented, a voice assistant platform obtains a query from the user about at least a portion of the agreement. Responsive to the query, an agreement comprehension tool of the computing device determines an answer to the query by processing a limited set of documents that are relevant to the portion of the agreement. This limited set of documents includes a corpus of documents corresponding to the authoring organization and previous agreements with which the signing user has interacted. The agreement comprehension tool then causes the answer to be presented for display and/or audibly output.
Type:
Grant
Filed:
March 22, 2019
Date of Patent:
July 27, 2021
Assignee:
Adobe Inc.
Inventors:
Jonathan David Herbach, Saurabh Khurana, Ben Sidney Tepfer
Abstract: A text recognition method and apparatus, and a storage medium are provided. The method includes: obtaining sample text data, the sample text data comprising a plurality of sample phrases; and generating a recognition model based on the sample phrases by performing training on a plurality of training nodes. Generating the recognition model includes respectively obtaining, by each of the plurality of training nodes, recognition coefficients of the sample phrases distributed to the corresponding training node; and determining, by the plurality of training nodes, model parameters of the recognition model according to the recognition coefficients of the sample phrases. The method also includes obtaining to-be-recognized text data; inputting the text data to the recognition model; and obtaining recognized target text data output by the recognition model and corresponding to the text data.
Type:
Grant
Filed:
November 30, 2018
Date of Patent:
July 20, 2021
Assignee:
TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Abstract: Aspects of the disclosure are related to synthesizing speech or other audio based on input data. Additionally, aspects of the disclosure are related to using one or more recurrent neural networks. For example, a computing device may receive text input; may determine features based on the text input; may provide the features as input to a recurrent neural network; may determine embedded data from one or more activations of a hidden layer of the recurrent neural network; may determine speech data based on a speech unit search that attempts to select, from a database, speech units based on the embedded data; and may generate speech output based on the speech data.
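The hidden-activation-as-embedding idea above can be sketched in miniature. A one-layer scalar tanh RNN and a three-entry speech-unit database are assumed here for illustration; real models operate on high-dimensional vectors:

```python
import math

# Minimal sketch: the RNN's hidden activation serves as the embedded
# data, and the speech unit search is a nearest-neighbor lookup.

def rnn_hidden(inputs, w_in, w_rec):
    h = 0.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)  # hidden layer activation
    return h

def unit_search(embedding, database):
    # Select the speech unit whose stored embedding is closest.
    return min(database, key=lambda u: abs(u["embedding"] - embedding))

features = [0.2, 0.9, 0.4]            # features derived from the text input
emb = rnn_hidden(features, w_in=1.0, w_rec=0.5)
db = [{"unit": "ah", "embedding": 0.1},
      {"unit": "oh", "embedding": 0.6},
      {"unit": "ee", "embedding": 0.95}]
chosen = unit_search(emb, db)
```

Because the embedding is read from a hidden layer rather than the network's output, the same trained model can serve both synthesis and unit selection.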
Abstract: An encoding apparatus and a decoding apparatus in a transform between a Modified Discrete Cosine Transform (MDCT)-based coder and a different coder are provided. The encoding apparatus may encode additional information to restore an input signal encoded according to the MDCT-based coding scheme, when switching occurs between the MDCT-based coder and the different coder. Accordingly, an unnecessary bitstream may be prevented from being generated, and minimum additional information may be encoded.
Type:
Grant
Filed:
September 25, 2017
Date of Patent:
July 13, 2021
Assignees:
ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, KWANGWOON UNIVERSITY INDUSTRY-ACADEMIC COLLABORATION FOUNDATION
Inventors:
Seung Kwon Beack, Tae Jin Lee, Min Je Kim, Dae Young Jang, Kyeongok Kang, Jin Woo Hong, Ho Chong Park, Young-cheol Park