Abstract: An automatic interpretation method performed by a correspondent terminal communicating with an utterer terminal includes receiving, by a communication unit, voice feature information about an utterer and an automatic translation result, obtained by automatically translating a voice uttered in a source language by the utterer into a target language, from the utterer terminal and performing, by a sound synthesizer, voice synthesis on the basis of the automatic translation result and the voice feature information to output a personalized synthesis voice as an automatic interpretation result. The voice feature information about the utterer includes a hidden variable including a first additional voice feature and a voice feature parameter, and a second additional voice feature, which are extracted from a voice of the utterer.
Type:
Grant
Filed:
August 11, 2020
Date of Patent:
April 4, 2023
Assignee:
ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
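As a structural illustration only, the receive-translate-synthesize flow described above can be sketched as a toy pipeline. All names, the one-entry lexicon, and the feature container are hypothetical stand-ins, not ETRI's implementation:

```python
from dataclasses import dataclass

@dataclass
class VoiceFeatures:
    """Hypothetical container for the utterer's voice feature information."""
    hidden_variable: list[float]      # first additional voice feature + feature parameter
    additional_feature: list[float]   # second additional voice feature

def translate(source_text: str, src: str, tgt: str) -> str:
    """Toy stand-in for the utterer terminal's automatic translation."""
    lexicon = {("fr", "en"): {"bonjour": "hello"}}
    return lexicon.get((src, tgt), {}).get(source_text, source_text)

def synthesize(translation: str, features: VoiceFeatures) -> dict:
    """Toy stand-in for the correspondent terminal's personalized voice synthesis:
    the translated text is paired with the utterer's voice features."""
    return {"text": translation, "speaker_embedding": features.hidden_variable}

feats = VoiceFeatures(hidden_variable=[0.1, 0.2], additional_feature=[0.3])
out = synthesize(translate("bonjour", "fr", "en"), feats)
```

The key point the sketch mirrors is the division of labor: translation happens at the utterer terminal, while the personalized synthesis runs at the correspondent terminal using the transmitted voice features.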
Abstract: A method for machine reading comprehension includes: S1, obtaining a character-level indication vector of a question and a character-level indication vector of an article; S2, obtaining an encoded question vector and an encoded article vector; S3, obtaining an output P1 of a bidirectional attention model and an output P2 of a shared attention model; S4, obtaining an aggregated vector P3; S5, obtaining a text encoding vector P4; S6, obtaining global interaction information between words within the article; S7, obtaining a text vector P5 after using the self-attention model; S8, obtaining aggregated data P6 according to the text encoding vector P4 and the text vector P5; S9, obtaining a context vector of the article according to the aggregated data P6 and an unencoded article vector P; and S10, predicting an answer position according to the context vector of the article and the encoded question vector to complete the machine reading comprehension.
Type:
Grant
Filed:
September 18, 2020
Date of Patent:
April 4, 2023
Assignee:
UNIVERSITY OF ELECTRONIC SCIENCE AND TECHNOLOGY OF CHINA
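The staged flow S1–S10 centers on attention between question and article encodings. A minimal sketch of the bidirectional attention of step S3, with illustrative shapes and softmax choices that are not the patented model, might look like:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention(Q, A):
    """Toy version of S3: attend article-to-question and question-to-article
    through a shared similarity matrix."""
    S = A @ Q.T                       # similarity: article words x question words
    a2q = softmax(S, axis=1) @ Q      # each article word attends over the question
    q2a = softmax(S.max(axis=1)) @ A  # question attends over salient article words
    return a2q, q2a

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # encoded question vectors (after S2)
A = rng.normal(size=(10, 8))   # encoded article vectors (after S2)
a2q, q2a = bidirectional_attention(Q, A)
```

Later stages (shared attention P2, aggregation P3, self-attention P5) compose further transformations of the same article-length sequence before the answer span is predicted in S10.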
Abstract: An apparatus for encoding an audio or image signal, includes: a controllable windower for windowing the audio or image signal to provide the sequence of blocks of windowed samples; a converter for converting the sequence of blocks of windowed samples into a spectral representation including a sequence of frames of spectral values; a transient location detector for identifying a location of a transient within a transient look-ahead region of a frame; and a controller for controlling the controllable windower to apply a specific window having a specified overlap length to the audio or image signal in response to an identified location of the transient, wherein the controller is configured to select the specific window from a group of at least three windows, wherein the specific window is selected based on the transient location.
Type:
Grant
Filed:
May 28, 2020
Date of Patent:
April 4, 2023
Assignee:
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors:
Christian Helmrich, Jérémie Lecomte, Goran Markovic, Markus Schnell, Bernd Edler, Stefan Reuschl
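The controller's window selection can be sketched as a simple rule keyed to where the transient falls in the look-ahead region. The thresholds and window names below are purely illustrative, not the patented selection rule:

```python
def select_window(transient_pos, lookahead_len):
    """Toy controller: choose one of (at least) three windows from the
    transient location within the look-ahead region."""
    if transient_pos is None:
        return "long"           # no transient: long window, full overlap
    ratio = transient_pos / lookahead_len
    if ratio < 0.5:
        return "short"          # early transient: minimal overlap length
    return "transition"         # late transient: intermediate overlap length
```

The point mirrored here is that overlap length is not fixed: it is adapted per frame based on the detected transient location.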
Abstract: An analogue-to-digital converter (ADC), comprising: an adaptive whitening filter configured to filter an analogue input signal and output a whitened analogue input signal; a first converter configured to receive said whitened analogue input signal and output a whitened digital signal; and a controller configured to adapt the whitening filter based on the received analogue input signal.
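A standard way to realize an adaptive whitening filter is an LMS adaptive predictor whose prediction error is the whitened output. This digital-domain sketch is only an analogy for the analogue filter in the abstract; the order and step size are arbitrary:

```python
def lms_whitener(x, mu=0.01, order=2):
    """Toy LMS whitener: predict each sample from the previous `order`
    samples and output the prediction error (the whitened signal)."""
    w = [0.0] * order            # adaptive predictor coefficients
    hist = [0.0] * order         # most recent past samples
    out = []
    for s in x:
        pred = sum(wi * hi for wi, hi in zip(w, hist))
        e = s - pred                                   # whitened output sample
        w = [wi + mu * e * hi for wi, hi in zip(w, hist)]  # LMS update
        hist = [s] + hist[:-1]
        out.append(e)
    return out

# On a strongly correlated (constant) input the error shrinks toward zero.
y = lms_whitener([1.0] * 500)
```

As the predictor converges, correlated structure is removed and only the unpredictable component remains, which is the sense in which the output is "whitened".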
Abstract: The present disclosure describes methods and systems to predict predicate metadata parameters in knowledge graphs via neural networks. The method includes receiving a knowledge graph based on a knowledge base including a graph-based dataset. The knowledge graph includes a predicate between two nodes and a set of predicate metadata.
Abstract: Disclosed are an apparatus and method for encoding/decoding an audio signal using information of a previous frame. An audio signal encoding method includes: generating a current latent vector by reducing the dimension of a current frame of an audio signal; generating a concatenation vector by concatenating a previous latent vector, generated by reducing the dimension of a previous frame of the audio signal, with the current latent vector; and encoding and quantizing the concatenation vector.
Type:
Grant
Filed:
November 27, 2020
Date of Patent:
February 14, 2023
Assignee:
Electronics and Telecommunications Research Institute
Inventors:
Woo-Taek Lim, Seung Kwon Beack, Jongmo Sung, Mi Suk Lee, Tae Jin Lee
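The reduce-concatenate-quantize steps can be sketched with a linear projection as a stand-in for the dimension-reducing encoder. The projection, latent size, and crude uniform quantizer are all illustrative assumptions:

```python
import numpy as np

def encode_frame(frame, prev_latent, proj):
    """Toy encoder step: reduce the current frame to a latent vector, then
    concatenate the previous frame's latent with it and quantize the result."""
    cur_latent = proj @ frame                      # dimension reduction
    concat = np.concatenate([prev_latent, cur_latent])
    quantized = np.round(concat * 8) / 8           # crude uniform quantizer
    return cur_latent, quantized

rng = np.random.default_rng(1)
proj = rng.normal(size=(4, 16))                    # 16-dim frame -> 4-dim latent
prev = np.zeros(4)                                 # no previous frame yet
for frame in rng.normal(size=(3, 16)):             # three consecutive frames
    prev, code = encode_frame(frame, prev, proj)
```

Carrying the previous latent forward is what lets each coded vector exploit inter-frame redundancy, per the abstract.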
Abstract: System and method for multitask prediction. The system includes a computing device. The computing device has a processor and a storage device storing computer executable code. The computer executable code is configured to: provide a head entity and a document containing the head entity; process the head entity and the document by a language model to obtain head extraction corresponding to the head entity, tail extractions corresponding to tail entities in the document, and sentence extraction corresponding to sentences in the document; predict a head-tail relation between the head extraction and the tail extractions using a first bilinear layer; combine the sentence extraction and a relation vector corresponding to the predicted head-tail relation using a second bilinear layer to obtain a sentence-relation combination; and predict an evidence sentence supporting the head-tail relation using a third bilinear layer based on the sentence-relation combination and attention extracted from the language model.
Type:
Grant
Filed:
August 25, 2020
Date of Patent:
January 31, 2023
Assignees:
BEIJING WODONG TIANJUN INFORMATION TECHNOLOGY CO., LTD., JD.COM AMERICAN TECHNOLOGIES CORPORATION
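All three prediction heads above are bilinear layers, i.e. scores of the form x^T W y between two extractions. A minimal sketch of the first such layer, with illustrative dimensions and random stand-ins for language-model extractions:

```python
import numpy as np

def bilinear(x, W, y):
    """Bilinear scoring layer: score = x^T W y."""
    return x @ W @ y

rng = np.random.default_rng(2)
head = rng.normal(size=6)         # head-entity extraction (toy)
tails = rng.normal(size=(5, 6))   # tail-entity extractions (toy)
W = rng.normal(size=(6, 6))       # first bilinear layer's weights
scores = np.array([bilinear(head, W, t) for t in tails])
pred = int(scores.argmax())       # highest-scoring head-tail pairing
```

The second and third layers in the abstract reuse the same score form, but pair sentence extractions with relation vectors and attention instead.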
Abstract: A method and apparatus for predicting a mouth-shape feature, and an electronic device are provided. A specific implementation of the method comprises: recognizing a phonetic posterior gram (PPG) of a phonetic feature; and performing a prediction on the PPG by using a neural network model, to predict a mouth-shape feature of the phonetic feature, the neural network model being obtained by training with training samples and an input thereof including a PPG and an output thereof including a mouth-shape feature, and the training samples including a PPG training sample and a mouth-shape feature training sample.
Abstract: Estimation accuracies of conversation satisfaction and speech satisfaction are improved. A learning data storage unit (10) stores learning data including a conversation voice containing a conversation that includes a plurality of speeches, a correct answer value of conversation satisfaction for the conversation, and a correct answer value of speech satisfaction for each speech included in the conversation.
Type:
Grant
Filed:
July 20, 2018
Date of Patent:
January 17, 2023
Assignee:
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Abstract: Techniques for the generation of dubbed audio for an audio/video are described.
Type:
Grant
Filed:
December 10, 2019
Date of Patent:
January 3, 2023
Assignee:
Amazon Technologies, Inc.
Inventors:
Marcello Federico, Robert Enyedi, Yaser Al-Onaizan, Roberto Barra-Chicote, Andrew Paul Breen, Ritwik Giri, Mehmet Umut Isik, Arvindh Krishnaswamy, Hassan Sawaf
Abstract: An entity resolution system performs a method of resolving one or more candidate entities based on a data set. The entity resolution system has a rules-based module, a machine learning module, a narrative module, and an evaluation module. The rules-based module compares the first entity features to the second entity features and determines whether a rule identifies a relationship between the first entity and the second entity. The machine learning module rates a similarity of the first entity features and the second entity features. The narrative module generates a narrative output based on one or more of the rules-based module and the machine learning module, the narrative output stating an identified relationship between the first entity and the second entity. The evaluation module determines one or more metrics to apply feedback to the system.
Type:
Grant
Filed:
August 29, 2019
Date of Patent:
January 3, 2023
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Abstract: A method is provided for determining support of a hypothesis by opinion sentences. The method converts sentence structures in the opinion sentences using various sentence structure conversion methods to obtain converted opinion sentences. For each converted opinion sentence, the method calculates a difference between a proximity label value indicating proximity to the hypothesis and an intermediate score before and after a conversion, adopts the conversion responsive to a condition being met relative to the difference, and adopts the opinion sentence instead responsive to the condition being unmet. The method creates sub-opinions using the various methods applied to adopted conversions and opinion sentences, and obtains an intermediate score for each sub-opinion.
Type:
Grant
Filed:
March 2, 2020
Date of Patent:
January 3, 2023
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
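The adopt-or-revert decision can be sketched as a rule on how far a conversion moves the sentence's score from its proximity label. The tolerance and the exact form of the condition are illustrative assumptions; the patent leaves the condition unspecified in this abstract:

```python
def choose_sentence(label, score_before, score_after, max_drift=0.1):
    """Illustrative adoption rule: keep the converted opinion sentence only
    when the conversion does not move its intermediate score away from the
    proximity label by more than `max_drift`; otherwise keep the original."""
    drift = abs(label - score_after) - abs(label - score_before)
    return "converted" if drift <= max_drift else "original"
```

Applying such a check per conversion method, then building sub-opinions from whichever versions were adopted, matches the flow described in the abstract.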
Abstract: Techniques have been developed to facilitate (1) the capture and pitch correction of vocal performances on handheld or other portable computing devices and (2) the mixing of such pitch-corrected vocal performances with backing tracks for audible rendering on targets that include such portable computing devices as well as desktops, workstations, gaming stations, even telephony targets. Implementations of the described techniques employ signal processing techniques and allocations of system functionality that are suitable given the generally limited capabilities of such handheld or portable computing devices and that facilitate efficient encoding and communication of the pitch-corrected vocal performances (or precursors or derivatives thereof) via wireless and/or wired bandwidth-limited networks for rendering on portable computing devices or other targets.
Type:
Grant
Filed:
June 11, 2020
Date of Patent:
January 3, 2023
Assignee:
Smule, Inc.
Inventors:
Spencer Salazar, Rebecca A. Fiebrink, Ge Wang, Mattias Ljungstrom, Jeffrey C. Smith, Perry R. Cook
Abstract: This document describes a data processing system for processing a speech signal for voice-based profiling. The data processing system segments the speech signal into a plurality of segments, with each segment representing a portion of the speech signal. For each segment, the data processing system generates a feature vector comprising data indicative of one or more features of the portion of the speech signal represented by that segment and determines whether the feature vector comprises data indicative of one or more features with a threshold amount of confidence. For each of a subset of the generated feature vectors, the system processes data in that feature vector to generate a prediction of a value of a profile parameter and transmits an output responsive to machine executable code that generates a visual representation of the prediction of the value of the profile parameter.
Abstract: An apparatus includes at least one processor to, in response to a request to perform speech-to-text conversion: perform a pause detection technique including analyzing speech audio to identify pauses, and analyzing lengths of the pauses to identify likely sentence pauses; perform a speaker diarization technique including dividing the speech audio into fragments, analyzing vocal characteristics of speech sounds of each fragment to identify a speaker of a set of speakers, and identifying instances of a change in speakers between each temporally consecutive pair of fragments to identify likely speaker changes; and perform speech-to-text operations including dividing the speech audio into segments based on at least the likely sentence pauses and likely speaker changes, using at least an acoustic model with each segment to identify likely speech sounds in the speech audio, and generating a transcript of the speech audio based at least on the likely speech sounds.
Type:
Grant
Filed:
June 28, 2022
Date of Patent:
December 27, 2022
Assignee:
SAS INSTITUTE INC.
Inventors:
Xiaolong Li, Samuel Norris Henderson, Xiaozhuo Cheng, Xu Yang
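The pause-analysis step above (flagging likely sentence pauses among all detected pauses) can be sketched with a simple length threshold. The median-based rule and the factor of 2 are illustrative, not SAS's actual technique:

```python
def likely_sentence_pauses(pauses, factor=2.0):
    """Toy pause analysis: flag a pause as a likely sentence boundary when
    it is much longer than the median detected pause."""
    if not pauses:
        return []
    ordered = sorted(length for _, length in pauses)
    median = ordered[len(ordered) // 2]
    return [t for t, length in pauses if length >= factor * median]

# (time, length) pairs for detected pauses, in seconds
pauses = [(1.2, 0.15), (2.9, 0.12), (5.0, 0.60), (7.4, 0.18)]
likely = likely_sentence_pauses(pauses)
```

In the described system, these likely sentence pauses are combined with likely speaker changes from diarization to pick the segment boundaries fed to the acoustic model.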
Abstract: A method of embodying an online media service having a multiple voice system includes a first operation of collecting preset online articles and content from a specific media site and displaying the online articles and content on a screen of a personal terminal, a second operation of inputting a voice of a subscriber or setting a voice of a specific person among voices that are pre-stored in a database, a third operation of recognizing and classifying the online articles and content, a fourth operation of converting the classified online articles and content into speech, and a fifth operation of outputting the online articles and content using the voice of the subscriber or the specific person, which is set in the second operation.
Abstract: An agent automation system includes a memory configured to store a natural language understanding (NLU) framework and a processor configured to execute instructions of the NLU framework to cause the agent automation system to perform actions. These actions comprise: generating an annotated utterance tree of an utterance using a combination of rules-based and machine-learning (ML)-based components, wherein a structure of the annotated utterance tree represents a syntactic structure of the utterance, and wherein nodes of the annotated utterance tree include word vectors that represent semantic meanings of words of the utterance; and using the annotated utterance tree as a basis for intent/entity extraction of the utterance.
Type:
Grant
Filed:
June 23, 2020
Date of Patent:
December 6, 2022
Assignee:
ServiceNow, Inc.
Inventors:
Edwin Sapugay, Anil Kumar Madamala, Maxim Naboka, Srinivas SatyaSai Sunkara, Lewis Savio Landry Santos, Murali B. Subbarao
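The annotated utterance tree pairs syntactic structure (the tree shape and labels) with semantic annotations (word vectors at the nodes). A toy node type, with hypothetical field names and a hand-built two-level parse:

```python
from dataclasses import dataclass, field

@dataclass
class UtteranceNode:
    """Toy annotated-utterance-tree node: a syntactic label plus a word
    vector carrying the word's semantic meaning."""
    word: str
    pos: str                                  # syntactic / part-of-speech label
    vector: list[float]                       # word vector (semantic annotation)
    children: list["UtteranceNode"] = field(default_factory=list)

# "book a flight" as a tiny hand-built parse
root = UtteranceNode("book", "VERB", [0.2, 0.7], [
    UtteranceNode("flight", "NOUN", [0.9, 0.1], [
        UtteranceNode("a", "DET", [0.0, 0.0]),
    ]),
])
```

Intent/entity extraction then walks such a tree, using the structure to find candidate phrases and the vectors to resolve their meaning.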
Abstract: Technology is provided for identifying synthesized conversation features from recorded conversations. The technology can identify, for each of one or more utterances, data for multiple modalities, such as acoustic data, video data, and text data. The technology can extract features, for each particular utterance of the one or more utterances, from each of the data for the multiple modalities associated with that particular utterance. The technology can also apply a machine learning model that receives the extracted features and/or previously synthesized conversation features and produces one or more additional synthesized conversation features.
Type:
Grant
Filed:
February 21, 2020
Date of Patent:
December 6, 2022
Assignee:
BetterUp, Inc.
Inventors:
Andrew Reece, Peter Bull, Gus Cooney, Casey Fitzpatrick, Gabriella Rosen Kellerman, Ryan Sonnek
Abstract: Disclosed are a device and method for wirelessly communicating. The device according to one example embodiment of the present disclosure may comprise a transceiver and a controller connected to the transceiver, wherein the controller is configured to identify at least one additional sample on the basis of a digital signal by using a neural network model and upscale the digital signal by adding the at least one identified additional sample to a plurality of samples of the digital signal.
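The upscaling step (identifying additional samples and inserting them among the received ones) can be sketched with a trivial 2x upsampler. The averaging predictor here is just a placeholder for the patent's neural network model:

```python
def upscale(samples, predict=lambda a, b: (a + b) / 2):
    """Toy 2x upscaler: between each pair of received samples, insert one
    additional sample produced by a predictor (a simple average stands in
    for the neural network model of the abstract)."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out += [a, predict(a, b)]
    out.append(samples[-1])
    return out
```

A learned predictor would replace the lambda, but the surrounding insert-into-the-sample-stream logic is the same.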
Abstract: An electronic apparatus is provided. The electronic apparatus includes a microphone, a transceiver, a memory configured to store a control command identification tool based on a control command identified by a voice recognition server that performs voice recognition processing on a user voice received from the electronic apparatus, and at least one processor configured to, based on the user voice being received through the microphone, acquire user intention information by performing the voice recognition processing on the received user voice, receive status information of external devices related to the acquired user intention information from a device control server, identify a control command for controlling a device to be controlled among the external devices by applying the acquired user intention information and the received status information of the external devices to the control command identification tool, and transmit the identified control command to the device control server.