Patents Issued on April 30, 2020
  • Publication number: 20200135168
    Abstract: The present invention provides a sound pressure signal output apparatus capable of synthesizing and outputting a sound pressure signal that simulates the sound of a real engine with reduced processing load in real time while flexibly adapting to specification changes. The sound pressure signal output apparatus comprises: an interface that acquires single sound data corresponding to the sound generated by one cylinder of a vehicle-mounted internal combustion engine during one combustion cycle in the cylinder, acquires order sound data corresponding to order sound for a frequency corresponding to the engine rotation speed, and acquires random sound data generated corresponding to at least either the material or the shape of the structure that makes up an engine; and a synthesis unit that synthesizes and outputs the sound pressure signal of an engine sound using the single sound data and the like acquired.
    Type: Application
    Filed: April 13, 2018
    Publication date: April 30, 2020
    Inventor: Osamu Maeda
  • Publication number: 20200135169
    Abstract: An audio playback device receives an instruction from a user to select a target voice model from a plurality of voice models and assigns the target voice model to a target character in a text. The audio playback device also transforms the text into a speech, and during the process of transforming the text into the speech, transforms sentences of the target character in the text into the speech of the target character according to the target voice model.
    Type: Application
    Filed: November 30, 2018
    Publication date: April 30, 2020
    Inventors: Guang-Feng DENG, Cheng-Hung TSAI, Tsun KU, Zhi-Guo ZHU, Han-Wen LIU
  • Publication number: 20200135170
    Abstract: An electronic mail server, computer-readable medium and method of delivering an electronic message to a wireless communication device are provided. The wireless communications device comprises a long-range wireless transceiver, a short-range wireless transceiver having a range less than the long-range wireless transceiver, and a display screen. A new text message is received via the long-range wireless transceiver. Based upon a short-range wireless connection being established with another device via the short-range wireless transceiver, the wireless communication device is switched to an audio message mode. An indication that the new text message has been received is displayed on the display screen. The new message is selected and, when in the audio message mode, an audio message comprising speech generated based upon the new text message is output via the short-range wireless transceiver.
    Type: Application
    Filed: October 26, 2018
    Publication date: April 30, 2020
    Inventors: Darrell Reginald MAY, ALAIN R. GAGNE
  • Publication number: 20200135171
    Abstract: A training apparatus includes an autoregressive model configured to estimate a current signal from a past signal sequence and a current context label, a vocal tract feature analyzer configured to analyze an input speech signal to determine a vocal tract filter coefficient representing a vocal tract feature, a residual signal generator configured to output a residual signal, a quantization unit configured to quantize the residual signal output from the residual signal generator to generate a quantized residual signal, and a training controller configured to provide as a condition, a context label of an already known input text for the input speech signal corresponding to the already known input text to the autoregressive model and to train the autoregressive model by bringing a past sequence of the quantized residual signals for the input speech signal and the current context label into correspondence with a current signal of the quantized residual signal.
    Type: Application
    Filed: February 21, 2018
    Publication date: April 30, 2020
    Inventors: Kentaro Tachibana, Tomoki Toda
  • Publication number: 20200135172
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an adaptive audio-generation model. One of the methods includes generating an adaptive audio-generation model including learning a plurality of embedding vectors and parameter values of a neural network using training data comprising first text and audio data representing a plurality of different individual speakers speaking portions of the first text, wherein the plurality of embedding vectors represent respective voice characteristics of the plurality of different individual speakers.
    Type: Application
    Filed: October 28, 2019
    Publication date: April 30, 2020
    Inventors: Yutian Chen, Scott Ellison Reed, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Heiga Zen, Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas
  • Publication number: 20200135173
    Abstract: A computer-implemented method comprising: receiving, by a computing device, an input phrase from a text generator; determining, by the computing device, a complexity level for an audience; generating, by the computing device, a plurality of target phrases including a modification of the input phrase; generating, by the computing device, respective readability scores for each of the plurality of target phrases; mapping, by the computing device, the plurality of the target phrases to the target audience complexity level to select a particular target phrase of the plurality of the target phrases; and outputting, by the computing device, the selected particular target phrase to a text-to-speech (T2S) component to cause the T2S component to output the selected particular target phrase as audible speech.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Craig M. TRIM, John M. GANCI, JR., Aaron K. BAUGHMAN, Veronica WYATT
  • Publication number: 20200135174
    Abstract: Methods and apparatuses are provided for performing sequence to sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
    Type: Application
    Filed: October 24, 2018
    Publication date: April 30, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Jia CUI, Chao WENG, Guangsen WANG, Jun WANG, Chengzhu YU, Dan SU, Dong YU
  • Publication number: 20200135175
    Abstract: A device includes a processor configured to, in response to determining that an input phrase includes a first term that is included in a term hierarchy, generate a second phrase by replacing the first term in the input phrase with a second term included in the term hierarchy. The processor is configured to determine that interactive response (IR) training data indicates that the input phrase is associated with a user intent indicator. The processor is configured to determine that user interaction data indicates that a first proportion of user phrases received by an IR system correspond to the user intent indicator. The processor is configured to update speech-to-text training data based on the input phrase and the second phrase so that a second proportion of training phrases of the speech-to-text training data correspond to the user intent indicator. The second proportion is based on the first proportion. A speech-to-text model is trained based on the speech-to-text training data.
    Type: Application
    Filed: October 29, 2018
    Publication date: April 30, 2020
    Inventors: Edward G. Katz, Alexander C. Tonetti, John A. Riendeau, Sean T. Thatcher
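The phrase-generation step in publication 20200135175 can be illustrated with a minimal sketch. It assumes the term hierarchy is a simple mapping from a parent term to its child terms, and that a "second phrase" is formed by swapping a term for one of its siblings. The hierarchy contents, the function name `generate_variants`, and the plain substring matching are all hypothetical and are not taken from the application.

```python
from typing import Dict, List

# Hypothetical term hierarchy: parent term -> sibling terms beneath it.
TERM_HIERARCHY: Dict[str, List[str]] = {
    "payment card": ["credit card", "debit card"],
}

def generate_variants(phrase: str, hierarchy: Dict[str, List[str]]) -> List[str]:
    """Return new phrases made by replacing a hierarchy term with each of its siblings."""
    variants: List[str] = []
    for siblings in hierarchy.values():
        for term in siblings:
            if term in phrase:  # naive substring match, for illustration only
                for other in siblings:
                    if other != term:
                        variants.append(phrase.replace(term, other))
    return variants

print(generate_variants("cancel my credit card", TERM_HIERARCHY))
```

The generated phrases could then be added to the speech-to-text training data alongside the input phrase; the proportion balancing described in the abstract is not shown here.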
  • Publication number: 20200135176
    Abstract: An electronic device receives audio data for a media item. The electronic device generates, from the audio data, a plurality of samples, each sample having a predefined maximum length. The electronic device, using a neural network trained to predict character probabilities, generates a probability matrix of characters for a first portion of a first sample of the plurality of samples. The probability matrix includes character information, timing information, and respective probabilities of respective characters at respective times. The electronic device identifies, for the first portion of the first sample, a first sequence of characters based on the generated probability matrix.
    Type: Application
    Filed: September 12, 2019
    Publication date: April 30, 2020
    Inventors: Daniel Stoller, Simon René Georges Durand, Sebastian Ewert
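As a rough illustration of how a character sequence might be read off the probability matrix described in publication 20200135176, the sketch below picks the most probable character at each time step, collapses consecutive repeats, and drops a blank symbol. This greedy, CTC-style decoding is an assumption made for the example; the abstract does not say which decoding rule is used, and the blank symbol and function name are hypothetical.

```python
from typing import List, Sequence

BLANK = "_"  # hypothetical "no character" symbol; not named in the abstract

def greedy_decode(prob_matrix: Sequence[Sequence[float]], alphabet: Sequence[str]) -> str:
    """Take the most probable character per time step, collapse repeats, drop blanks."""
    decoded: List[str] = []
    previous = None
    for frame in prob_matrix:  # one row of probabilities per time step
        best_index = max(range(len(frame)), key=lambda i: frame[i])
        char = alphabet[best_index]
        if char != previous and char != BLANK:
            decoded.append(char)
        previous = char
    return "".join(decoded)

alphabet = ["_", "h", "i"]
matrix = [
    [0.10, 0.80, 0.10],  # most likely "h"
    [0.20, 0.70, 0.10],  # "h" again, collapsed with the previous step
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # most likely "i"
]
print(greedy_decode(matrix, alphabet))  # prints "hi"
```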
  • Publication number: 20200135177
    Abstract: In one embodiment, a computer-implemented method includes obtaining a pronunciation of a first word of a particular language and identifying a phonetic component of the pronunciation. The method includes obtaining a phonetic component mapping table for the type of phonetic component identified in the pronunciation of the first word and assigning a phonetic value to the identified phonetic component using the phonetic component mapping table. For a second word, the method includes obtaining a pronunciation of the second word, identifying a phonetic component of the pronunciation, and assigning a phonetic value to the identified phonetic component. In addition, the method includes calculating a phonetic distance between the identified phonetic component of the first word and the identified phonetic component of the second word, using the assigned phonetic values of the respective identified phonetic components of the first word and the second word, and storing the calculated phonetic distance in a file.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Min Li, Yunyao Li, Marina D. Hailpern, Sara Noeman
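The steps in publication 20200135177 (assign values from a mapping table, compute a distance, store it) can be sketched as follows. The mapping table values, the use of an absolute difference as the distance, and the CSV row layout are assumptions chosen for illustration; the abstract does not specify any of them.

```python
import csv
import io

# Hypothetical mapping table for one type of phonetic component (initial consonants).
# The numeric values are invented for illustration only.
CONSONANT_TABLE = {"b": 1.0, "p": 1.5, "d": 3.0, "t": 3.5}

def phonetic_distance(component_a: str, component_b: str, table: dict) -> float:
    """Distance between two components as the absolute difference of their assigned values."""
    return abs(table[component_a] - table[component_b])

def store_distance(out_file, word_a: str, word_b: str, distance: float) -> None:
    """Write one 'word_a,word_b,distance' row to an open text file."""
    csv.writer(out_file).writerow([word_a, word_b, distance])

buffer = io.StringIO()  # stands in for a real file on disk
store_distance(buffer, "bat", "pat", phonetic_distance("b", "p", CONSONANT_TABLE))
print(buffer.getvalue().strip())  # prints "bat,pat,0.5"
```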
  • Publication number: 20200135178
    Abstract: The present invention relates to a real-time voice recognition apparatus equipped with an application-specific integrated circuit (ASIC) chip and a smartphone, capable, by using one smartphone and one ASIC chip and without using a cloud computer, of assuring personal privacy, and, due to a short delay time, enabling real-time conversion of voice input signals into text for output. When one DRAM chip is optionally added to the real-time voice recognition apparatus, the number of neural network layers is increased, thereby significantly improving accuracy of conversion of voice input signals into text.
    Type: Application
    Filed: April 26, 2018
    Publication date: April 30, 2020
    Inventors: Hong June PARK, Hyeon Kyu NOH, Won Cheol LEE, Kyeong Won JEONG
  • Publication number: 20200135179
    Abstract: Various embodiments include methods and devices for implementing automatic grammar augmentation for improving voice command recognition accuracy in systems with a small footprint acoustic model. Alternative expressions that may capture acoustic model decoding variations may be added to a grammar set. An acoustic model-specific statistical pronunciation dictionary may be derived by running the acoustic model through a large general speech dataset and constructing a command-specific candidate set containing potential grammar expressions. Greedy based and cross-entropy-method (CEM) based algorithms may be utilized to search the candidate set for augmentations with improved recognition accuracy.
    Type: Application
    Filed: October 28, 2019
    Publication date: April 30, 2020
    Inventors: Yang Yang, Anusha Lalitha, Jin Won Lee, Christopher Lott
  • Publication number: 20200135180
    Abstract: Systems and methods for e-commerce systems using natural language understanding are described. A computing device is configured to receive a user utterance and identify at least one semantic element within the user utterance. An intent associated with the at least one semantic element is identified and an intent flow associated with the identified intent is executed. The intent flow includes a set of tasks executed in a predetermined order. A system utterance is generated by instantiating a response template selected from a plurality of response templates associated with the executed intent.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Snehasish Mukherjee, Shankara Bhargava Subramanya
  • Publication number: 20200135181
    Abstract: Techniques are described herein for enabling an automated assistant to adjust its behavior depending on a detected age range and/or “vocabulary level” of a user who is engaging with the automated assistant. In various implementations, data indicative of a user's utterance may be used to estimate one or more of the user's age range and/or vocabulary level. The estimated age range/vocabulary level may be used to influence various aspects of a data processing pipeline employed by an automated assistant. In various implementations, aspects of the data processing pipeline that may be influenced by the user's age range/vocabulary level may include one or more of automated assistant invocation, speech-to-text (“STT”) processing, intent matching, intent resolution (or fulfillment), natural language generation, and/or text-to-speech (“TTS”) processing. In some implementations, one or more tolerance thresholds associated with one or more of these aspects, such as grammatical tolerances, vocabularic tolerances, etc.
    Type: Application
    Filed: December 27, 2019
    Publication date: April 30, 2020
    Inventors: Pedro Gonnet Anders, Victor Carbune, Daniel Keysers, Thomas Deselaers, Sandro Feuz
  • Publication number: 20200135182
    Abstract: An electronic computing device for providing a response to an audio query where the response is determined to have a public safety impact. The electronic computing device includes a microphone and an electronic processor. The electronic processor is configured to receive audio data including a speech segment and determine a plurality of possible meanings of the speech segment. Each possible meaning is associated with a probability that the possible meaning is a correct meaning for the speech segment and a first possible meaning is associated with a first probability that is higher than a second probability associated with a second possible meaning. The electronic processor is also configured to determine public safety impact context information of the second possible meaning. The electronic processor is further configured to output a second response associated with the second possible meaning without first outputting a first response associated with the first possible meaning.
    Type: Application
    Filed: October 25, 2018
    Publication date: April 30, 2020
    Inventors: Haim Kahlon, Alexander Aperstein, David Lev, Tamar Mordel
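The selection behavior described in publication 20200135182 can be sketched under a simplifying assumption: each candidate meaning carries a probability and a boolean flag marking public safety impact, and a flagged meaning is answered first even when a more probable meaning exists. The `Meaning` class, the flag, and the tie-breaking rule are hypothetical; the application may determine and weigh public safety context quite differently.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Meaning:
    text: str
    probability: float
    public_safety_impact: bool  # hypothetical flag, assumed to be computed elsewhere

def choose_meaning(meanings: List[Meaning]) -> Meaning:
    """Prefer the most probable meaning with public safety impact, else the most probable overall."""
    ranked = sorted(meanings, key=lambda m: m.probability, reverse=True)
    for meaning in ranked:
        if meaning.public_safety_impact:
            return meaning
    return ranked[0]

candidates = [
    Meaning("play the song 'Fire'", 0.7, False),
    Meaning("report a fire", 0.3, True),
]
print(choose_meaning(candidates).text)  # prints "report a fire"
```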
  • Publication number: 20200135183
    Abstract: Methods and systems for a transportation vehicle are provided. One method includes receiving a user input for a valid communication session by a processor executable, digital assistant at a device on a transportation vehicle; tagging by the digital assistant, the user input words with a grammatical connotation; generating an action context, a filter context and a response context by a neural network, based on the tagged user input; storing by the digital assistant, a key-value pair for a parameter of the filter context at a short term memory, based on an output from the neural network; updating by the digital assistant, the key-value pair at the short term memory after receiving a reply to a follow-up request and another output from the trained neural network; and providing a response to the reply by the digital assistant.
    Type: Application
    Filed: December 27, 2019
    Publication date: April 30, 2020
    Applicant: Panasonic Avionics Corporation
    Inventors: Rawad Hilal, Gurmukh Khabrani, Chin Perng
  • Publication number: 20200135184
    Abstract: Determining a language for speech recognition of a spoken utterance received via an automated assistant interface for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant, without necessitating a user explicitly designate a language to be utilized for each interaction. Implementations determine a user profile that corresponds to audio data that captures a spoken utterance, and utilize language(s), and optionally corresponding probabilities, assigned to the user profile in determining a language for speech recognition of the spoken utterance. Some implementations select only a subset of languages, assigned to the user profile, to utilize in speech recognition of a given spoken utterance of the user.
    Type: Application
    Filed: April 16, 2018
    Publication date: April 30, 2020
    Inventors: Pu-sen Chao, Diego Melendo Casado, Ignacio Lopez Moreno
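A minimal sketch of the subset selection mentioned in publication 20200135184, assuming a user profile stores languages with assigned probabilities and that only the top few are tried for speech recognition. The profile layout, the `max_languages` parameter, and the function name are hypothetical.

```python
from typing import Dict, List

def candidate_languages(profile_languages: Dict[str, float], max_languages: int = 2) -> List[str]:
    """Return the most probable languages from a user profile, highest probability first."""
    ranked = sorted(profile_languages.items(), key=lambda item: item[1], reverse=True)
    return [language for language, _ in ranked[:max_languages]]

# Hypothetical profile: language code -> probability assigned to this user.
profile = {"en": 0.6, "es": 0.3, "de": 0.1}
print(candidate_languages(profile))  # prints ['en', 'es']
```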
  • Publication number: 20200135185
    Abstract: An electronic device receives audio data for a media item. The electronic device generates, from the audio data, a plurality of samples, each sample having a predefined maximum length. The electronic device, using a neural network trained to predict textual unit probabilities, generates a probability matrix of textual units for a first portion of a first sample of the plurality of samples. The probability matrix includes information about textual units, timing information, and respective probabilities of respective textual units at respective times. The electronic device identifies, for the first portion of the first sample, a first sequence of textual units based on the generated probability matrix.
    Type: Application
    Filed: November 21, 2019
    Publication date: April 30, 2020
    Inventors: Daniel STOLLER, Simon René Georges DURAND, Sebastian EWERT
  • Publication number: 20200135186
    Abstract: Systems and methods are provided for speech recognition. An example method may be implementable by a server. The method may comprise adding a key phrase into a dictionary comprising a plurality of dictionary phrases, and for each of one or more of the dictionary phrases, obtaining a first probability that the dictionary phrase is after the key phrase in a phrase sequence. The key phrase and the dictionary phrase may each comprise one or more words. The first probability may be independent of the key phrase.
    Type: Application
    Filed: December 27, 2019
    Publication date: April 30, 2020
    Applicant: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD.
    Inventor: Chen HUANG
  • Publication number: 20200135187
    Abstract: Implementations relate to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant, without necessitating a user explicitly designate a language to be utilized for each interaction. Selection of a speech recognition model for a particular language can be based on one or more interaction characteristics exhibited during a dialog session between a user and an automated assistant. Such interaction characteristics can include anticipated user input types, anticipated user input durations, a duration for monitoring for a user response, and/or an actual duration of a provided user response.
    Type: Application
    Filed: April 16, 2018
    Publication date: April 30, 2020
    Inventors: Pu-sen Chao, Diego Melendo Casado, Ignacio Lopez Moreno
  • Publication number: 20200135188
    Abstract: An audio-video (AV) reproduction device that comprises at least one audio capturing device, at least one speaker, a memory, and circuitry. The memory stores setup information associated with first-time device setup of the audio-video reproduction device. The first-time device setup is associated with a plurality of configuration settings of the AV reproduction device. The circuitry controls the at least one speaker to output a message in the setup information, and controls the at least one audio capturing device to receive a user input based on the message. The circuitry compares the user input with at least one condition associated with the message. The circuitry configures a configuration setting from the plurality of configuration settings based on the comparison. The circuitry controls at least a function of the AV reproduction device based on the configured configuration setting.
    Type: Application
    Filed: October 24, 2018
    Publication date: April 30, 2020
    Inventors: ALLISON BURGUENO, LINDSAY MILLER, MARVIN DEMERCHANT, HYEHOON YI, TRACY BARNES, David YOUNG, YUKO NISHIKAWA
  • Publication number: 20200135189
    Abstract: A system and method for integrated printing of search results includes a digital device having a processor and associated memory. Data is communicated with an associated document rendering device via a data interface. A microphone captures digitized speech to facilitate natural language processing of user input. The processor extracts commands from the digitized voice input and accesses a digital data record responsive to a verbal lookup command extracted from the digitized voice. The processor extracts a data record summary from the data record and completes a text-to-speech translation of the summary. The processor then reads the resultant text through a speaker. The processor then commences a print of the digital data record responsive to a verbal print command extracted from the digitized voice input.
    Type: Application
    Filed: October 25, 2018
    Publication date: April 30, 2020
    Inventor: Gareth M. JENSEN
  • Publication number: 20200135190
    Abstract: A vehicle includes a controller programmed to: responsive to detecting a voice command from a user and a location of the user inside the vehicle via a microphone, authenticate an identity of the user using facial recognition on an image, captured by a camera, of the location of the user; and responsive to a successful authentication, execute the voice command.
    Type: Application
    Filed: October 26, 2018
    Publication date: April 30, 2020
    Inventors: Nevrus Kaja, Glen Munro Green
  • Publication number: 20200135191
    Abstract: A universal voice control system (aka: a Digital Voice Butler or DVB) is used to communicate with and control one or more voice activated smart devices (VASDs) with a single shared activation word. The DVB is embodied in a housing that contains a microphone, a speaker, a voice synthesizer, a list of understood spoken commands, a look up table having objects acted upon by the commands and ecosystem specific commands, and a processor in electronic communication with the microphone and speaker. A device such as a smart phone is in communication with the processor and provides a user interface for the DVB that allows specific VASDs and their associated functions to be linked to the DVB.
    Type: Application
    Filed: October 30, 2018
    Publication date: April 30, 2020
    Applicant: BBY Solutions, Inc.
    Inventor: Farhad Nourbakhsh
  • Publication number: 20200135192
    Abstract: Systems and methods for e-commerce systems using natural language understanding are described. A computing device is configured to receive a user utterance including at least one identified semantic component and at least one missing semantic component and generate a context stack including a set of context entries. Each of the context entries includes a root intent element, an entity list element, and a dialogue stack and each context entry in the set of context entries is associated with one of a user utterance or a system utterance. The computing device is further configured to review at least one context entry in the set of context entries to locate the at least one missing semantic element within the dialogue stack and generate an intent flow execution request including the at least one semantic element from the first speech data and the missing semantic element.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Snehasish Mukherjee, Shankara Bhargava Subramanya
  • Publication number: 20200135193
    Abstract: A driving assistance apparatus includes a memory and a processor including hardware. The processor is configured to acquire voice information uttered by a driver, recognize content of the voice information, output information on content of a process based on a recognition result of the voice information before executing the process, and execute the process when an approval signal that approves the execution of the process is input from an operation member disposed on a steering wheel that the driver holds to steer a vehicle.
    Type: Application
    Filed: August 12, 2019
    Publication date: April 30, 2020
    Applicant: TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventor: Naoki HAYASHI
  • Publication number: 20200135194
    Abstract: Disclosed herein is an electronic device.
    Type: Application
    Filed: July 5, 2017
    Publication date: April 30, 2020
    Applicant: LG ELECTRONICS INC.
    Inventor: Gyuhyeok JEONG
  • Publication number: 20200135195
    Abstract: An evaluation system includes a speech data acquisition unit configured to acquire speech data from speech of a member engaged in a task, a speech recognition unit configured to recognize content of the speech from the speech data and convert the content into text data, a task data acquisition unit configured to acquire data indicating an action related to the task of the member and data indicating a task status, an extraction unit configured to extract conversation data related to the task from the text data based on the content of the speech recognized by the speech recognition unit, and an analysis unit configured to analyze correlations between the conversation data, the data indicating the action, and the data indicating the task status.
    Type: Application
    Filed: August 15, 2018
    Publication date: April 30, 2020
    Applicant: MITSUBISHI HEAVY INDUSTRIES, LTD.
    Inventors: Yoshiki Ishikawa, Yasuhiko Fukuyama, Kohei Masuda, Keiko Ishikawa, Tomohiro Komine
  • Publication number: 20200135196
    Abstract: An electronic device includes a voice receiving unit configured to receive a voice input, a first communication unit configured to communicate with an external device having a voice recognition function, and a control unit. The control unit receives a notification indicating whether the external device is ready to recognize the voice input, via the first communication unit. In a case where the notification indicates that the external device is not ready to recognize the voice input, the control unit controls the external device to be ready to recognize the voice input via the first communication unit when a predetermined voice input including a phrase corresponding to the external device is received through the voice receiving unit.
    Type: Application
    Filed: October 22, 2019
    Publication date: April 30, 2020
    Inventor: Shunji Fujita
  • Publication number: 20200135197
    Abstract: A communication device includes a calculation unit which calculates class probabilities that are probabilities that an input speech belongs to each of a plurality of classified classes previously defined as types of speech contents, a plurality of response generation modules, provided for respective types of responses, each of which generates a response speech corresponding to the type, a determination unit which selects one of the plurality of response generation modules based on association probabilities and the class probabilities calculated by the calculation unit and determines the response speech generated by the selected response generation module as an output speech to be emitted to the user, the association probabilities being set for each of the plurality of response generation modules and each indicating a level of association between the response generation module and each of the plurality of classified classes, and an output unit which outputs the output speech.
    Type: Application
    Filed: October 23, 2019
    Publication date: April 30, 2020
    Applicant: Toyota Jidosha Kabushiki Kaisha
    Inventors: Ryosuke Nakanishi, Mina Funazukuri
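One way to picture the module selection in publication 20200135197 is to score each response generation module by the sum, over classes, of the class probability times that module's association probability, then pick the highest score. The abstract only says the selection is "based on" both sets of probabilities, so this weighted sum is an assumption, and the class names, module names, and numbers below are hypothetical.

```python
from typing import Dict

def select_module(class_probs: Dict[str, float],
                  association: Dict[str, Dict[str, float]]) -> str:
    """Pick the module whose association probabilities best match the class probabilities."""
    def score(module: str) -> float:
        # Weighted sum over classes: class probability times module-class association.
        return sum(class_probs[c] * association[module].get(c, 0.0) for c in class_probs)
    return max(association, key=score)

# Hypothetical class probabilities for one input speech.
class_probs = {"question": 0.7, "chat": 0.3}
# Hypothetical association probabilities per response generation module.
association = {
    "answer_module": {"question": 0.9, "chat": 0.1},
    "smalltalk_module": {"question": 0.2, "chat": 0.8},
}
print(select_module(class_probs, association))  # prints "answer_module"
```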
  • Publication number: 20200135198
    Abstract: An embodiment relates to an apparatus, wherein the apparatus is configured to use a first sensor to identify a user of the apparatus, to obtain a temporarily identified user, wherein the apparatus is configured to use a second sensor, different from the first sensor, to spatially track the identified user, in order to update a position assigned to the identified user, to obtain an identified and localized user, and wherein the apparatus is configured to link a user interaction to the identified and localized user by determining whether the user interaction was performed by the identified and localized user.
    Type: Application
    Filed: October 28, 2019
    Publication date: April 30, 2020
    Inventors: Christian Mandl, Daniel Neumaier
  • Publication number: 20200135199
    Abstract: Techniques for synchronously outputting content by one or more devices are described. A system may receive a user command and may receive content responsive to the command from an application(s). The content may include various kinds of data (e.g., audio data, image data, video data, etc.). The system may also receive a presentation framework from the application, with the presentation framework indicating how content responsive to the input command should be synchronously output by one or more devices. The system determines one or more devices proximate to the user, determines which of the one or more devices may be used to output content indicated in the presentation framework, and causes the one or more devices to output content in a synchronous manner.
    Type: Application
    Filed: October 28, 2019
    Publication date: April 30, 2020
    Inventors: Felix Wu, Rohan Mutagi, Manuel Jesus Leon Rivas, Noel Evans, Frédéric Johan Georges Deramat, Miguel Alberdi Lorenzo, Lev Danielyan, Vikram Kumar Gundeti, Vijitha Raji
  • Publication number: 20200135200
    Abstract: A method and an apparatus for processing audio commands includes receiving an audio command from a user, determining that a proper response to the audio command is unavailable in a first assistant device based on analyzing the audio command, transmitting the audio command to at least one second assistant device, and receiving at least one response to the audio command from the at least one second assistant device.
    Type: Application
    Filed: October 31, 2019
    Publication date: April 30, 2020
    Inventors: Ankit TAPARIA, Mugula Satya Shankar kameshwar SHARMA, Deepak KUMAR, Shaktiman DUBEY, Rishabh RAJ, Srikant PADALA, Aishwarya Vitthal RAIMULE, Kislay PANDEY, Mohit LOGANATHAN
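The forwarding flow in publication 20200135200 can be sketched by modeling each assistant as a function that returns a response string, or `None` when it has no proper response. That modeling, the `handle_command` name, and the example assistants are assumptions made for illustration only.

```python
from typing import Callable, List, Optional

# Hypothetical model of an assistant: returns a response, or None if it cannot answer.
Assistant = Callable[[str], Optional[str]]

def handle_command(command: str, first: Assistant, others: List[Assistant]) -> List[str]:
    """Answer with the first assistant if possible, otherwise collect responses from the others."""
    own_response = first(command)
    if own_response is not None:
        return [own_response]
    responses: List[str] = []
    for assistant in others:  # forward the command to each second assistant
        response = assistant(command)
        if response is not None:
            responses.append(response)
    return responses

def music_assistant(command: str) -> Optional[str]:
    return "Playing music" if "play" in command else None

def weather_assistant(command: str) -> Optional[str]:
    return "It is sunny" if "weather" in command else None

print(handle_command("what is the weather", music_assistant, [weather_assistant]))
```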
  • Publication number: 20200135201
    Abstract: Conversational understanding systems allow users to conversationally interface with a computing device. In examples, a query may be received that includes a request for execution of a task. A data exchange task definition may be accessed. The data exchange task definition assists a conversational understanding system in managing task state tracking for information needed for task execution. Using the data exchange task definition, a per-turn policy for interacting with the user computing device is generated based on the state of a dialogue with a computing device and an evaluation of a process flow chart provided by a task owner resource. The task owner resource may be independent from the conversational understanding system. A response to the query may be generated and output based on the per-turn policy. In examples, the per-turn policy is used to generate one or more responses during a dialogue with a user via a computing device.
    Type: Application
    Filed: December 20, 2019
    Publication date: April 30, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Paul CROOK, Vasiliy RADOSTEV, Omar Zia KHAN, Vipul AGARWAL, Ruhi SARIKAYA, Marius Alexandru MARIN, Alexandre ROCHETTE, Jean-Philippe ROBICHAUD
  • Publication number: 20200135202
    Abstract: According to one embodiment, an electronic device determines whether one or more devices should be controlled based on a second utterance input subsequent to a first utterance input from outside in accordance with the first utterance. The electronic device includes a management unit and a controller. The management unit prepares and manages a determination audio data item for determining whether the first utterance is a desired utterance by utterances input from outside at a plurality of times, and determines whether the first utterance is the desired utterance using the prepared and managed determination audio data item. The controller controls the one or more devices based on the second utterance.
    Type: Application
    Filed: December 27, 2019
    Publication date: April 30, 2020
    Inventors: Hidehito ZAWA, Reiko KAWACHI, Kunio HONSAWA, Hiroyuki NOMOTO
  • Publication number: 20200135203
    Abstract: Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.
    Type: Application
    Filed: January 2, 2020
    Publication date: April 30, 2020
    Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
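    Illustrative sketch (not part of the patent record): the biasing step described in the abstract above can be pictured as boosting the probability scores of the n-grams tied to the current dialog state and then renormalizing. The function name, the multiplicative `boost` factor, the dialog-state labels, and the toy scores are all assumptions made for this example; the abstract does not say how the scores are adjusted.

```python
def bias_language_model(scores, dialog_state, state_ngrams, boost=2.0):
    """Return a copy of `scores` with the dialog state's n-grams boosted.

    scores:       dict mapping n-gram -> probability score (hypothetical toy model)
    state_ngrams: dict mapping dialog state -> set of associated n-grams
    boost:        assumed multiplicative factor; the abstract gives no formula
    """
    favored = state_ngrams.get(dialog_state, set())
    # Scale up only the n-grams associated with the current dialog state.
    adjusted = {ng: (p * boost if ng in favored else p) for ng, p in scores.items()}
    # Renormalize so the adjusted scores still sum to 1.
    total = sum(adjusted.values())
    return {ng: p / total for ng, p in adjusted.items()}


base_scores = {"pepperoni": 0.1, "pay by card": 0.1, "weather": 0.8}
state_ngrams = {"choose_topping": {"pepperoni"}, "checkout": {"pay by card"}}
biased = bias_language_model(base_scores, "choose_topping", state_ngrams)
```

    With these toy numbers, "pepperoni" gains probability mass at the expense of the other n-grams, so a transcriber using `biased` would favor it when the dialog is in the assumed `choose_topping` state.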
  • Publication number: 20200135204
    Abstract: In one embodiment, a method for transcript generation includes receiving an audio file and dividing it into a plurality of chunks. The method further includes sending each instance of the plurality of chunks to a speech service module. The method further includes converting speech to text for each instance of the plurality of chunks and returning the text for each instance of the plurality of chunks. The method further includes merging the text for each instance of the plurality of chunks to yield an audio file transcript and sending the audio file and chunks to a diarization module. The method further includes performing first pass diarization on the chunks to yield a plurality of diarized chunks and performing second pass diarization on the plurality of diarized chunks and the audio file to yield a diarized audio file. The method further includes merging the files to yield a final transcript.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Jean-Philippe Robichaud, Alexei Skurikhin, Migüel Jetté, Petrov Evgeny Stanislavovich
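    Illustrative sketch (not part of the patent record): the chunk, transcribe, and merge steps in the abstract above might look roughly like this. The names are hypothetical, `fake_speech_service` is a stand-in that treats each "sample" as an already-recognized word, and both diarization passes are left out entirely.

```python
def split_into_chunks(samples, chunk_size):
    """Divide the audio (here, a plain list) into consecutive fixed-size chunks."""
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def fake_speech_service(chunk):
    """Hypothetical stand-in for a speech service module: returns text for one chunk."""
    return " ".join(chunk)


def transcribe(samples, chunk_size):
    """Send each chunk to the speech service, then merge the texts in chunk order."""
    chunks = split_into_chunks(samples, chunk_size)
    texts = [fake_speech_service(chunk) for chunk in chunks]
    return " ".join(text for text in texts if text)


audio = "the quick brown fox jumps".split()
transcript = transcribe(audio, chunk_size=2)
```

    Merging in chunk order is what keeps the transcript aligned with the original audio; a real system would also need to handle words cut at chunk boundaries, which this sketch ignores.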
  • Publication number: 20200135205
    Abstract: The disclosure relates to a method, device, apparatus, and storage medium. The method includes recognizing voice data inputted by a user; obtaining a voice text corresponding to the voice data; obtaining, based on the voice text, a text to-be-input corresponding to the voice data, wherein the text to-be-input includes a plurality of words constituting a phrase or a sentence; and displaying the text to-be-input in an input textbox of an input interface.
    Type: Application
    Filed: August 30, 2019
    Publication date: April 30, 2020
    Applicant: Beijing Xiaomi Mobile Software Co., Ltd.
    Inventors: Jiefu TAN, Senhua CHEN, Dan LI, Xinyi REN
  • Publication number: 20200135206
    Abstract: Techniques to enhance group decision-making within messaging platforms. A conversation thread between two or more participants is analyzed. One or more keywords occurring in the conversation thread that are associated with an event characteristic are identified by comparing messages of each of the two or more participants against a keyword listing. Natural language processing is used to determine a contextual use of the one or more keywords. One or more events relevant to the event characteristic and the contextual use are determined. An application programming interface is used to locate and retrieve event-related information for the one or more events.
    Type: Application
    Filed: December 23, 2019
    Publication date: April 30, 2020
    Inventors: Jennifer HEBERT, Anthony MUTALIPASSI, Nicholas LEWERKE, Levon KARAYAN
  • Publication number: 20200135207
    Abstract: Method, system, and apparatus for storing conversation data of a conversation onto a blockchain network, the conversation data comprising terms of an agreement, the method comprising: receiving audio data of a conversation between two or more participants; creating a transcript of at least some of the audio data; accessing a database comprising a plurality of words or phrases. The method, system, and apparatus are also for obtaining, from the database, predefined one or more words associated with a predefined topic; searching the transcript for the predefined one or more words; filtering the transcript based on the predefined one or more words; and storing the conversation data onto a first block of a blockchain stored on the blockchain network, wherein the conversation data comprises the filtered transcript.
    Type: Application
    Filed: October 29, 2018
    Publication date: April 30, 2020
    Inventor: Rasit O. TOPALOGLU
  • Publication number: 20200135208
    Abstract: Systems and methods for establishing communication connections using speech, such as establishing calls between speech-controlled devices, are described. A first speech-controlled device receives a communication request in the form of audio and sends audio data corresponding to the captured audio to a server. The server performs speech processing on the audio data to determine a recipient, a topic for the call, and a device associated with the recipient. The server then sends a message indicating the communication request and audio data corresponding to the communication topic to the recipient's speech-controlled device. The recipient device outputs audio to the recipient asking whether the recipient accepts the communication request. The recipient audibly refuses or accepts the communication request, and the recipient's speech-controlled device sends an indication of the recipient's audible decision to the server.
    Type: Application
    Filed: November 25, 2019
    Publication date: April 30, 2020
    Inventors: Tapas Kanti Roy, Brian Oliver, Christo Frank Devaraj
  • Publication number: 20200135209
    Abstract: Systems and processes for operating an intelligent automated assistant are provided. In one example, a method includes receiving mixed speech data representing utterances of a target speaker and utterances of one or more interfering audio sources. The method further includes obtaining a target speaker representation, which represents speech characteristics of the target speaker; and determining, using a learning network, probability distributions of phonetic elements directly from the mixed speech data. The inputs of the learning network include the mixed speech data and the target speaker representation. An output of the learning network includes the probability distributions of phonetic elements. The method further includes generating text corresponding to the utterances of the target speaker based on the probability distributions of the phonetic elements; and providing a response to the target speaker based on the text corresponding to the utterances of the target speaker.
    Type: Application
    Filed: August 7, 2019
    Publication date: April 30, 2020
    Inventors: Masood DELFARAH, Ossama A. ABDELHAMID, Kyuyeon HWANG, Donald R. MCALLASTER, Sabato Marco SINISCALCHI
  • Publication number: 20200135210
    Abstract: Embodiments of a speech broadcasting method, device, apparatus and a computer-readable storage medium are provided. The method can include: receiving recorded speech data from a plurality of speakers; extracting respective text features of the plurality of speakers from the recorded speech data, and allocating the plurality of speakers with respective identifications; and inputting the text features and the identifications of the speakers to a text-acoustic mapping model, to output speech features of the plurality of speakers; and establishing a mapping relationship between the text feature and the speech feature of each speaker. In the embodiments of the present application, a broadcaster can be selected to broadcast a text, greatly improving user experience of the text broadcasting.
    Type: Application
    Filed: September 6, 2019
    Publication date: April 30, 2020
    Inventor: Yongguo KANG
  • Publication number: 20200135211
    Abstract: The information processing method in the present disclosure is performed as below. At least one speech segment is detected from speech input to a speech input unit. A first feature quantity is extracted from each speech segment detected, the first feature quantity identifying a speaker whose voice is contained in the speech segment. The first feature quantity extracted is compared with each of second feature quantities stored in storage and identifying the respective voices of registered speakers who are target speakers in speaker recognition. The comparison is performed for each of consecutive speech segments, and under a predetermined condition, among the second feature quantities stored in the storage, at least one second feature quantity whose similarity with the first feature quantity is less than or equal to a threshold is deleted, thereby removing the at least one registered speaker identified by the at least one second feature quantity.
    Type: Application
    Filed: October 21, 2019
    Publication date: April 30, 2020
    Inventor: Misaki DOI
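    Illustrative sketch (not part of the patent record): one way to read the pruning step in the abstract above is to compare each segment's feature with every registered speaker's feature and drop a speaker whose similarity stays at or below the threshold for several consecutive segments. Cosine similarity, the `patience` count standing in for the "predetermined condition", and all names are assumptions of this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune_registered(registered, segment_features, threshold=0.5, patience=2):
    """Drop speakers whose similarity is <= threshold for `patience` segments in a row."""
    remaining = dict(registered)
    misses = {name: 0 for name in registered}
    for feature in segment_features:
        for name in list(remaining):  # copy the keys so deletion during the loop is safe
            if cosine(feature, remaining[name]) <= threshold:
                misses[name] += 1
            else:
                misses[name] = 0  # the streak of low-similarity segments is broken
            if misses[name] >= patience:
                del remaining[name]
    return remaining


registered = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
segments = [[1.0, 0.1], [0.9, 0.0]]
remaining = prune_registered(registered, segments)
```

    In this toy run both segments resemble "alice", so "bob" misses twice in a row and is removed from the registered set.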
  • Publication number: 20200135212
    Abstract: Provided are an artificial intelligence (AI) system that utilizes a machine learning algorithm such as deep learning, etc. and an application of the AI system. A speech recognition method, performed by a speech recognition apparatus, of performing speech recognition in a space in which a plurality of speech recognition apparatuses are present includes extracting a speech signal of a speaker from an input audio signal; obtaining a first speaker recognition score indicating a similarity between the speech signal and a speech signal of a registration speaker; and outputting a speech recognition result with respect to the speech signal based on a second speaker recognition score obtained from another speech recognition apparatus among the plurality of speech recognition apparatuses and the first speaker recognition score.
    Type: Application
    Filed: October 24, 2019
    Publication date: April 30, 2020
    Inventors: Keunseok CHO, Jaeyoung ROH, Jiwon HYUNG, Donghan JANG, Jaewon LEE
  • Publication number: 20200135213
    Abstract: An electronic apparatus and a control method are provided, including an input interface, a communication interface, a memory including at least one command, and at least one processor configured to control the electronic device and execute the at least one command to receive a user speech through the input interface, determine whether or not the user speech is a speech related to a task requiring user confirmation by analyzing the user speech, generate a question for the user confirmation when it is determined that the user speech is the speech related to the task requiring the user confirmation, and perform a task corresponding to the user speech when a user response corresponding to the question is input through the input interface. Embodiments may use an artificial intelligence model learned according to at least one of machine learning, a neural network, and a deep learning algorithm.
    Type: Application
    Filed: October 28, 2019
    Publication date: April 30, 2020
    Inventors: Chanwoo KIM, Kyungmin LEE, Jaeyoung ROH, Donghan JANG, Keunseok CHO, Jiwon HYUNG
  • Publication number: 20200135214
    Abstract: A method and device for decoding a signal, where the method includes: obtaining an average quantity of allocated bits per spectral coefficient of a sub-band of a current frame of an audio signal, wherein the sub-band includes a plurality of spectral coefficients; obtaining a noise filling gain for the sub-band when the average quantity of allocated bits per spectral coefficient is less than a classification threshold; reconstructing, according to the noise filling gain, at least some of the spectral coefficients to generate reconstructed spectral coefficients when the average quantity of allocated bits per spectral coefficient is less than the classification threshold; obtaining a frequency domain signal according to the reconstructed spectral coefficients; and generating a time domain signal based on the frequency domain signal. Therefore, a sub-band with unsaturated bit allocation in a frequency domain signal may be obtained by classification, thereby improving signal decoding quality.
    Type: Application
    Filed: December 31, 2019
    Publication date: April 30, 2020
    Inventors: Zexin Liu, Fengyan Qi, Lei Miao
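    Illustrative sketch (not part of the patent record): the classification and noise-filling steps in the abstract above can be pictured as follows. This example assumes that the coefficients to reconstruct are the ones decoded as zero, that the fill is uniform random noise scaled by the gain, and that the gain is already known; none of these details are stated in the abstract, and the conversion to a time domain signal is omitted.

```python
import random


def reconstruct_subband(coeffs, allocated_bits, threshold, noise_gain, rng):
    """Noise-fill a sub-band only when its average bits per coefficient is below threshold."""
    avg_bits = allocated_bits / len(coeffs)
    if avg_bits >= threshold:
        # Bit allocation is treated as saturated: keep the decoded coefficients as they are.
        return list(coeffs)
    # Unsaturated sub-band: replace zero coefficients with scaled random noise (assumption).
    return [c if c != 0 else noise_gain * rng.uniform(-1.0, 1.0) for c in coeffs]


rng = random.Random(0)  # seeded only to make the example repeatable
saturated = reconstruct_subband([1.0, 0.0], allocated_bits=8, threshold=1.5,
                                noise_gain=0.1, rng=rng)
filled = reconstruct_subband([0.5, 0.0, 0.0, -0.25], allocated_bits=2, threshold=1.5,
                             noise_gain=0.1, rng=rng)
```

    The first call averages 4 bits per coefficient and is returned unchanged; the second averages 0.5 bits, so its two zero coefficients are replaced by noise no larger in magnitude than the gain.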
  • Publication number: 20200135215
    Abstract: Methods of compensating for lost data packets in hearing aid systems wherein a data streaming device streams packets of data to at least two hearing aids are disclosed.
    Type: Application
    Filed: October 30, 2018
    Publication date: April 30, 2020
    Inventor: Brendan LARKIN
  • Publication number: 20200135216
    Abstract: There is provided a decoding device including at least one circuit configured to acquire one or more encoded audio signals including a plurality of channels and/or a plurality of objects and priority information for each of the plurality of channels and/or the plurality of objects, and to decode the one or more encoded audio signals according to the priority information.
    Type: Application
    Filed: December 24, 2019
    Publication date: April 30, 2020
    Applicant: Sony Corporation
    Inventors: Toru Chinen, Masayuki Nishiguchi, Runyu Shi, Mitsuyuki Hatanaka, Yuki Yamamoto
  • Publication number: 20200135217
    Abstract: To provide a bandwidth extension method which allows reduction of computation amount in bandwidth extension and suppression of deterioration of quality in the bandwidth to be extended. In the bandwidth extension method: a low frequency bandwidth signal is transformed into a QMF domain to generate a first low frequency QMF spectrum; pitch-shifted signals are generated by applying different shifting factors on the low frequency bandwidth signal; a high frequency QMF spectrum is generated by time-stretching the pitch-shifted signals in the QMF domain; the high frequency QMF spectrum is modified; and the modified high frequency QMF spectrum is combined with the first low frequency QMF spectrum.
    Type: Application
    Filed: December 30, 2019
    Publication date: April 30, 2020
    Inventors: Tomokazu ISHIKAWA, Takeshi NORIMATSU, Huan ZHOU, Kok Seng CHONG, Haishan ZHONG