Patent Applications Published on November 12, 2020
-
Publication number: 20200357377
Abstract: An active noise cancellation (ANC) system may include an adaptive filter divergence detector for detecting divergence of one or more controllable filters as they adapt, based on various temporal or frequency-domain amplitude characteristics. Upon detection of a controllable filter divergence, the ANC system may be deactivated, or certain speakers may be muted. Alternatively, the ANC system may modify the diverged controllable filters to restore proper operation of the noise cancelling system. This may include adjusting a leakage value of an adaptive filter controller.
Type: Application
Filed: May 7, 2019
Publication date: November 12, 2020
Inventors: Kevin J. BASTYR, David TRUMPY, Shiyu CHEN
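The leakage-adjustment idea lends itself to a short illustration. Below is a minimal sketch, not the patented method: a leaky-LMS coefficient update whose leakage value is raised when the filter's coefficient energy crosses a fixed limit. The function names, thresholds, and toy signals are all hypothetical.

```python
import numpy as np

def leaky_lms_step(w, x_buf, error, mu=0.05, leak=0.0):
    """One leaky-LMS update: w <- (1 - mu*leak)*w + mu*e*x."""
    return (1.0 - mu * leak) * w + mu * error * x_buf

def detect_divergence(w, energy_limit=10.0):
    """Flag divergence when the filter's coefficient energy grows too large."""
    return float(np.dot(w, w)) > energy_limit

# Toy loop: raise the leakage value when divergence is detected,
# instead of deactivating the whole ANC system.
rng = np.random.default_rng(0)
w = np.zeros(16)
leak = 0.0
for _ in range(1000):
    x_buf = rng.standard_normal(16)   # reference-signal history
    error = rng.standard_normal()     # error-microphone sample
    w = leaky_lms_step(w, x_buf, error, leak=leak)
    if detect_divergence(w):
        leak = 0.5                    # pull coefficients back toward zero
print("final coefficient energy:", np.dot(w, w))
```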
-
Publication number: 20200357378
Abstract: An active noise cancellation (ANC) system may include an adaptive filter divergence detector for detecting divergence of one or more controllable filters as they adapt, based on dynamically adapted thresholds. Upon detection of a controllable filter divergence, the ANC system may be deactivated, or certain speakers may be muted. Alternatively, the ANC system may modify the diverged controllable filters to restore proper operation of the noise cancelling system.
Type: Application
Filed: April 27, 2020
Publication date: November 12, 2020
Inventors: Kevin J. BASTYR, James MAY, David TRUMPY
-
Publication number: 20200357379
Abstract: In a method for transmit beamforming of a two-dimensional array of ultrasonic transducers, a beamforming pattern to apply to a beamforming space of the two-dimensional array of ultrasonic transducers is defined. The beamforming space includes a plurality of elements, where each element of the beamforming space corresponds to an ultrasonic transducer of the two-dimensional array of ultrasonic transducers, where the beamforming pattern identifies which ultrasonic transducers within the beamforming space are activated during a transmit operation of the two-dimensional array of ultrasonic transducers, and wherein at least some of the ultrasonic transducers that are activated are phase delayed with respect to other ultrasonic transducers that are activated. The beamforming pattern is applied to the two-dimensional array of ultrasonic transducers. A transmit operation is performed by activating the ultrasonic transducers of the beamforming space according to the beamforming pattern.
Type: Application
Filed: July 6, 2020
Publication date: November 12, 2020
Applicant: InvenSense, Inc.
Inventors: Bruno W. GARLEPP, James Christian SALVIA, Yang PAN, Michael H. PERROTT
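A small sketch of the general idea, not InvenSense's implementation: build an activation mask over a 2-D array and assign each active element a phase delay that focuses the transmit at a chosen point. The element pitch, drive frequency, sound speed, and circular aperture are illustrative assumptions.

```python
import numpy as np

def beamforming_pattern(rows, cols, pitch, focus, c=1500.0, f0=2e6):
    """Activation mask and per-element phase delays for a 2-D transducer array.

    Elements inside a circular aperture are activated; each active element
    gets a phase delay so that all wavefronts arrive at `focus` in phase.
    """
    ys, xs = np.mgrid[0:rows, 0:cols]
    # element positions, centered on the array
    px = (xs - (cols - 1) / 2) * pitch
    py = (ys - (rows - 1) / 2) * pitch
    active = px**2 + py**2 <= (min(rows, cols) * pitch / 2) ** 2
    # time of flight from each element to the focal point
    fx, fy, fz = focus
    tof = np.sqrt((px - fx) ** 2 + (py - fy) ** 2 + fz**2) / c
    # delay relative to the farthest element, wrapped to a phase
    delay = tof.max() - tof
    phase = (2 * np.pi * f0 * delay) % (2 * np.pi)
    return active, np.where(active, phase, 0.0)

mask, phases = beamforming_pattern(8, 24, pitch=100e-6, focus=(0.0, 0.0, 5e-3))
print(mask.sum(), "elements active")
```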
-
Publication number: 20200357380
Abstract: An electronic effects device comprising: an input circuit for receiving an input audio signal; a gas discharge tube in communication with the input circuit; wherein the input circuit comprises a transducer for converting the input signal into a signal suitable for producing a discharge in the gas discharge tube; an output circuit in communication with the gas discharge tube for converting the gas discharge into an output signal. A corresponding method for producing electronic effects for musical instruments is also described.
Type: Application
Filed: January 4, 2019
Publication date: November 12, 2020
Applicant: Gamechanger Audio SIA
Inventors: Ilja Krumins, Martins Melkis, Kristaps Kalva
-
Publication number: 20200357381
Abstract: A speech synthesis device of an embodiment includes a memory unit, a creating unit, a deciding unit, a generating unit and a waveform generating unit. The memory unit stores, as statistical model information of a statistical model, an output distribution of acoustic feature parameters including pitch feature parameters and a duration distribution. The creating unit creates a statistical model sequence from context information and the statistical model information. The deciding unit decides a pitch-cycle waveform count of each state using a duration based on the duration distribution of each state of each statistical model in the statistical model sequence, and pitch information based on the output distribution of the pitch feature parameters. The generating unit generates an output distribution sequence based on the pitch-cycle waveform count, and acoustic feature parameters based on the output distribution sequence.
Type: Application
Filed: July 29, 2020
Publication date: November 12, 2020
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Masatsune TAMURA, Masahiro MORITA
-
Publication number: 20200357382
Abstract: The display of digital media content includes graphical user interfaces and predefined data fields that limit interaction between a person and a computing system. An oral communication device and a data enablement platform are provided for ingesting oral conversational data from people, and using machine learning to provide intelligence. At the front end, an oral conversational bot, or chatbot, interacts with a user. The chatbot is specific to a customized digital magazine, both of which evolve over time for a given user and topic. On the back end, the data enablement platform has a computing architecture that ingests data from various external data sources as well as data from internal applications and databases. These data and algorithms are applied to surface new data, identify trends, provide recommendations, infer new understanding, predict actions and events, and automatically act on this computed information. The chatbot then reads out the content to the user.
Type: Application
Filed: August 10, 2018
Publication date: November 12, 2020
Inventors: STUART OGAWA, LINDSAY ALEXANDER SPARKS, KOICHI NISHIMURA, WILFRED P. SO
-
Publication number: 20200357383
Abstract: A speech synthesis method and a speech synthesis apparatus to synthesize speeches of different emotional intensities in the field of artificial intelligence, where the method includes obtaining a target emotional type and a target emotional intensity parameter that correspond to an input text, determining a corresponding target emotional acoustic model based on the target emotional type and the target emotional intensity parameter, inputting a text feature of the input text into the target emotional acoustic model to obtain an acoustic feature of the input text, and synthesizing a target emotional speech based on the acoustic feature of the input text.
Type: Application
Filed: July 31, 2020
Publication date: November 12, 2020
Inventors: Liqun Deng, Yuezhi Hu, Zhanlei Yang, Wenhua Sun
-
Publication number: 20200357384
Abstract: A model training method and apparatus is disclosed, where the model training method acquires first output data of a student model for first input data and second output data of a teacher model for second input data and trains the student model such that the first output data and the second output data are not distinguished from each other. The student model and the teacher model have different structures.
Type: Application
Filed: August 23, 2019
Publication date: November 12, 2020
Applicant: Samsung Electronics Co., Ltd.
Inventors: Hogyeong KIM, Hyohyeong KANG, Hwidong NA, Hoshik LEE
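One common way to train a student so its outputs "are not distinguished" from a teacher's is adversarial distillation with a discriminator; the abstract does not spell out its loss, so the sketch below is only an assumed reading. The toy models, sizes, and training constants are all hypothetical.

```python
import torch
import torch.nn as nn

# Teacher and student with different structures; a discriminator learns to
# tell their outputs apart while the student learns to fool it.
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
student = nn.Sequential(nn.Linear(32, 10))          # smaller structure
disc = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    x1 = torch.randn(64, 32)                        # first input data (student)
    x2 = torch.randn(64, 32)                        # second input data (teacher)
    with torch.no_grad():
        t_out = teacher(x2)
    s_out = student(x1)

    # Discriminator: teacher outputs -> 1, student outputs -> 0.
    d_loss = bce(disc(t_out), torch.ones(64, 1)) + \
             bce(disc(s_out.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Student: make its outputs indistinguishable from the teacher's.
    s_loss = bce(disc(s_out), torch.ones(64, 1))
    opt_s.zero_grad(); s_loss.backward(); opt_s.step()
```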
-
Publication number: 20200357385
Abstract: To communicate more smoothly with a user. Provided is an information processing apparatus including an output control unit configured to control information presentation to a user, in which, in a case where a specific expression in a group including the user is applicable regarding content of information to be presented, the output control unit causes the information presentation including the specific expression to be executed using at least one of a sound or an image. Furthermore, provided is an information processing apparatus including a learning unit configured to learn a recognition target and a linguistic expression regarding the recognition target in association with each other, in which the linguistic expression includes a specific expression in a group including a user, and the learning unit learns the specific expression on the basis of at least one of a collected sound or a collected image.
Type: Application
Filed: October 29, 2018
Publication date: November 12, 2020
Inventor: MARI SAITO
-
Publication number: 20200357386
Abstract: A method for detecting a keyword, applied to a terminal, includes: extracting a speech eigenvector of a speech signal; obtaining, according to the speech eigenvector, a posterior probability of each target character being a key character in any keyword in an acquisition time period of the speech signal; obtaining confidences of at least two target character combinations according to the posterior probability of each target character; and determining that the speech signal includes the keyword upon determining that all the confidences of the at least two target character combinations meet a preset condition. The target character is a character in the speech signal whose pronunciation matches a pronunciation of the key character. Each target character combination includes at least one target character, and a confidence of a target character combination represents a probability of the target character combination being the keyword or a part of the keyword.
Type: Application
Filed: July 20, 2020
Publication date: November 12, 2020
Inventors: Yi GAO, Meng YU, Dan SU, Jie CHEN, Min LUO
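A toy version of the confidence computation: given per-frame posteriors for each key character, score a character combination by combining each character's best in-window posterior, and accept the keyword when the confidence meets a preset condition. The sliding window and geometric-mean combination are assumptions, not the patent's exact formula.

```python
import numpy as np

def keyword_confidence(posteriors, window=30):
    """Confidence of a target-character combination over an utterance.

    `posteriors` is a (frames, chars) array: per-frame posterior probability
    of each key character. For each window position, take each character's
    maximum posterior and combine them with a geometric mean so that every
    character must score well somewhere in the window.
    """
    frames, n_chars = posteriors.shape
    best = 0.0
    for start in range(max(1, frames - window + 1)):
        seg = posteriors[start:start + window]
        per_char_max = seg.max(axis=0)               # best frame per character
        conf = float(np.prod(per_char_max) ** (1.0 / n_chars))
        best = max(best, conf)
    return best

post = np.random.default_rng(1).uniform(size=(100, 3))   # 3 key characters
if keyword_confidence(post) > 0.85:                      # preset condition
    print("speech signal includes the keyword")
```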
-
Publication number: 20200357387
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
Type: Application
Filed: March 31, 2020
Publication date: November 12, 2020
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
-
Publication number: 20200357388
Abstract: A method includes receiving audio data encoding an utterance, processing, using a speech recognition model, the audio data to generate speech recognition scores for speech elements, and determining context scores for the speech elements based on context data indicating a context for the utterance. The method also includes executing, using the speech recognition scores and the context scores, a beam search decoding process to determine one or more candidate transcriptions for the utterance. The method also includes selecting a transcription for the utterance from the one or more candidate transcriptions.
Type: Application
Filed: March 24, 2020
Publication date: November 12, 2020
Applicant: Google LLC
Inventors: Ding Zhao, Bo Li, Ruoming Pang, Tara N. Sainath, David Rybach, Deepti Bhatia, Zelin Wu
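The described decoding resembles shallow fusion; under that assumption, here is a sketch of a beam search whose per-step score adds a context bonus to the recognition log-probability. The `context_bonus` function, weight, and toy scores are invented for illustration.

```python
import math

def contextual_beam_search(step_scores, context_bonus, beam=4, lam=0.5):
    """Beam search over per-step log-probs, biased by context scores.

    `step_scores[t][token]` is the speech-recognition log-prob of `token`
    at step t; `context_bonus(prefix, token)` returns a context log-score
    (e.g. boosting contact names); `lam` weights the context model.
    """
    beams = [((), 0.0)]                        # (token prefix, total score)
    for t in range(len(step_scores)):
        cands = []
        for prefix, score in beams:
            for token, lp in step_scores[t].items():
                s = score + lp + lam * context_bonus(prefix, token)
                cands.append((prefix + (token,), s))
        beams = sorted(cands, key=lambda c: -c[1])[:beam]
    return beams[0][0]                         # best candidate transcription

# Toy run: the context model boosts the token "bob" (a hypothetical contact).
scores = [{"call": math.log(0.9), "tall": math.log(0.1)},
          {"bob": math.log(0.4), "rob": math.log(0.6)}]
bonus = lambda prefix, tok: 1.0 if tok == "bob" else 0.0
print(contextual_beam_search(scores, bonus))   # ('call', 'bob')
```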
-
Publication number: 20200357389
Abstract: A data processing method based on simultaneous interpretation, applied to a server in a simultaneous interpretation system, including: obtaining audio transmitted by a simultaneous interpretation device; processing the audio by using a simultaneous interpretation model to obtain an initial text; transmitting the initial text to a user terminal; receiving a modified text fed back by the user terminal, the modified text being obtained after the user terminal modifies the initial text; and updating the simultaneous interpretation model according to the initial text and the modified text.
Type: Application
Filed: July 28, 2020
Publication date: November 12, 2020
Inventors: Jingliang BAI, Caisheng OUYANG, Haikang LIU, Lianwu CHEN, Qi CHEN, Yulu ZHANG, Min LUO, Dan SU
-
Publication number: 20200357390
Abstract: Methods, systems, and related products for voice-enabled computer systems are described. A machine-learning model is trained to produce pronunciation output based on text input. The trained machine-learning model is used to produce pronunciation data for text input even where the text input includes numbers, punctuation, emoji, or other non-letter characters. The machine-learning model is further trained based on real-world data from users to improve pronunciation output.
Type: Application
Filed: May 10, 2019
Publication date: November 12, 2020
Inventor: Daniel Bromand
-
Publication number: 20200357391
Abstract: Systems and processes for operating an intelligent automated assistant are provided. An example process includes causing a first recognition result for a received natural language speech input to be displayed, where the first recognition result is in a first language and a second recognition result for the received natural language speech input is available for display responsive to receiving input indicative of user selection of the first recognition result, the second recognition result being in a second language. The example process further includes receiving the input indicative of user selection of the first recognition result and, in response to receiving the input indicative of user selection of the first recognition result, causing the second recognition result to be displayed.
Type: Application
Filed: August 22, 2019
Publication date: November 12, 2020
Inventors: Arnab GHOSHAL, Roger HSIAO, Gorm AMAND, Patrick L. COFFMAN, Mary YOUNG
-
Publication number: 20200357392
Abstract: A framework ranks multiple hypotheses generated by one or more ASR engines for each input speech utterance. The framework jointly implements ASR improvement and NLU. It makes use of NLU-related knowledge to facilitate the ranking of competing hypotheses, and outputs the top-ranked hypothesis as the improved ASR result together with the NLU results of the speech utterance. The NLU results include intent detection results and the slot filling results.
Type: Application
Filed: May 5, 2020
Publication date: November 12, 2020
Inventors: Zhengyu ZHOU, Xuchen SONG
-
Publication number: 20200357393
Abstract: A voice control method for an in-vehicle device includes receiving an audio signal by an information capturing device, transmitting the audio signal to a base of the information capturing device by the information capturing device, performing voice recognition on the audio signal by the base to generate at least one context instruction, transmitting the at least one context instruction to a host of an in-vehicle device by the base, and correspondingly controlling an operation of at least one function module of the in-vehicle device according to the at least one context instruction to perform at least one context operation by the host.
Type: Application
Filed: May 7, 2019
Publication date: November 12, 2020
Inventor: MING-ZONG WU
-
Publication number: 20200357394
Abstract: A system and a method for providing information based on speech recognition are provided. The system for providing information based on speech recognition includes a vehicle, and a server that provides an automatic wake-up context of a speech recognition function based on driving environment information and vehicle information of the vehicle, receives speech information associated with the automatic wake-up context from the vehicle, and generates service information through processing of the received speech information to provide the service information to another vehicle, wherein the vehicle automatically obtains the speech information by using the speech recognition when the automatic wake-up context is uttered, and transmits the speech information to the server.
Type: Application
Filed: August 2, 2019
Publication date: November 12, 2020
Applicants: HYUNDAI MOTOR COMPANY, KIA MOTORS CORPORATION
Inventors: Jang Won Choi, Tae Hyun Sung
-
Publication number: 20200357395
Abstract: Implementations herein relate to pre-caching data, corresponding to predicted interactions between a user and an automated assistant, using data characterizing previous interactions between the user and the automated assistant. An interaction can be predicted based on details of a current interaction between the user and an automated assistant. One or more predicted interactions can be initialized, and/or any corresponding data pre-cached, prior to the user commanding the automated assistant in furtherance of the predicted interaction. Interaction predictions can be generated using a user-parameterized machine learning model, which can be used when processing input(s) that characterize a recent user interaction with the automated assistant.
Type: Application
Filed: May 31, 2019
Publication date: November 12, 2020
Inventors: Lucas Mirelmann, Zaheed Sabur, Bohdan Vlasyuk, Marie Patriarche Bledowski, Sergey Nazarov, Denis Burakov, Behshad Behzadi, Michael Golikov, Steve Cheng, Daniel Cotting, Mario Bertschler
-
Publication number: 20200357396
Abstract: An utterance text that is a high-priority text to be reported to a user is uttered reliably. The air purifier (10) includes an utterance text extraction unit (11a) that extracts an utterance text associated with an apparatus state, and an utterance control unit (11b) that causes the utterance text extracted by the utterance text extraction unit (11a) to be uttered in descending order of the priorities assigned to the categories that include the utterance text.
Type: Application
Filed: January 10, 2018
Publication date: November 12, 2020
Inventors: TAKAHIDE FUJII, SHINTARO NOMURA, DAISUKE MORIUCHI
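The priority-ordered utterance behavior can be pictured as a small priority queue keyed by category. The category table below is hypothetical, not the air purifier's actual categories.

```python
import heapq

# Hypothetical category-to-priority table (lower number = higher priority).
PRIORITY = {"error": 0, "filter_replacement": 1, "air_quality": 2, "tips": 3}

class UtteranceQueue:
    """Queue utterance texts and speak them in descending category priority."""
    def __init__(self):
        self._heap, self._seq = [], 0

    def add(self, category, text):
        heapq.heappush(self._heap, (PRIORITY[category], self._seq, text))
        self._seq += 1                     # stable order within a category

    def next_utterance(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = UtteranceQueue()
q.add("air_quality", "The air quality has improved.")
q.add("error", "A fault was detected. Please check the unit.")
print(q.next_utterance())   # the error text is uttered first
```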
-
Publication number: 20200357397
Abstract: The present disclosure provides a speech skill creating method and system, wherein the method comprises: providing a speech skill creating interface in response to a developer's speech skill creating instruction; obtaining basic information and content configuration of the speech skill through the speech skill creating interface; and, in response to the developer's online publication instruction, adding a corresponding speech interaction capability for the content of the speech skill, and creating and publishing the speech skill. By employing the solutions of the present disclosure, it is possible to complete the creation of the speech skill without any programming and to improve the development efficiency of the speech skill.
Type: Application
Filed: December 12, 2019
Publication date: November 12, 2020
Inventor: Yaowen QI
-
Publication number: 20200357398
Abstract: An information collection system is configured to collect information from information providers. The information collection system includes a first communication terminal to be owned by one of the information providers, and a server configured to perform communication with the first communication terminal. The first communication terminal includes an input interface to which speech information is input and a first transmitter configured to transmit the speech information input through the input interface. The server includes a converter configured to execute a speech recognition to convert the speech information transmitted from the first transmitter into text information, and a storage device configured to store the text information.
Type: Application
Filed: May 8, 2020
Publication date: November 12, 2020
Inventor: Nana Sakamoto
-
Publication number: 20200357399
Abstract: Techniques for synchronizing communication across devices are described. A system receives an input command corresponding to an announcement and sends data representing the announcement to devices of the system. The system receives responses from the devices and causes the device that originated the announcement to output content corresponding to the responses.
Type: Application
Filed: May 20, 2020
Publication date: November 12, 2020
Inventors: Christo Frank Devaraj, Farookh Mohammed, James Alexander Stanton, Brandon Taylor, Peter Chin, Mahesh Rajagopalan
-
Publication number: 20200357400
Abstract: A computing system receives requests from client devices to process voice queries that have been detected in local environments of the client devices. The system identifies that a value that is based on a number of requests to process voice queries received by the system during a specified time interval satisfies one or more criteria. In response, the system triggers analysis of at least some of the requests received during the specified time interval to determine a set of requests that each identify a common voice query. The system can generate an electronic fingerprint that indicates a distinctive model of the common voice query. The fingerprint can then be used to detect an illegitimate voice query identified in a request from a client device at a later time.
Type: Application
Filed: May 27, 2020
Publication date: November 12, 2020
Applicant: Google LLC
Inventors: Alexander H. Gruenstein, Aleksander Kacun, Matthew Sharifi
-
Publication number: 20200357401
Abstract: Systems and methods are described for improving endpoint detection of a voice query submitted by a user. In some implementations, synchronized video data and audio data are received. A sequence of frames of the video data that includes images corresponding to lip movement on a face is determined. The audio data is endpointed based on first audio data that corresponds to a first frame of the sequence of frames and second audio data that corresponds to a last frame of the sequence of frames. A transcription of the endpointed audio data is generated by an automated speech recognizer. The generated transcription is then provided for output.
Type: Application
Filed: July 23, 2020
Publication date: November 12, 2020
Inventors: Chanwoo Kim, Rajeev Nongpiur, Michiel Bacchiani
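A minimal sketch of the endpointing arithmetic: convert the first and last lip-movement video frames into audio sample offsets and keep only that span. The frame rate and sample rate are assumed values.

```python
def endpoint_audio(lip_frames, audio, video_fps=30, sample_rate=16000):
    """Cut audio to the span covered by detected lip movement.

    `lip_frames` is the sorted list of video-frame indices whose images show
    lip movement; the audio is endpointed from the first such frame to the
    last one by converting frame indices to sample offsets.
    """
    if not lip_frames:
        return audio[:0]
    samples_per_frame = sample_rate / video_fps
    start = int(lip_frames[0] * samples_per_frame)
    end = int((lip_frames[-1] + 1) * samples_per_frame)
    return audio[start:end]

audio = list(range(16000 * 4))                     # 4 s of dummy samples
segment = endpoint_audio([30, 31, 32, 60], audio)  # lips move in frames 30-60
print(len(segment) / 16000, "seconds kept")        # ~1.03 s
```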
-
Publication number: 20200357402
Abstract: A system of multi-modal transmission of packetized data in a voice activated data packet based computer network environment is provided. A natural language processor component can parse an input audio signal to identify a request and a trigger keyword. Based on the input audio signal, a direct action application programming interface can generate a first action data structure, and a content selector component can select a content item. An interface management component can identify first and second candidate interfaces, and respective resource utilization values. The interface management component can select, based on the resource utilization values, the first candidate interface to present the content item. The interface management component can provide the first action data structure to the client computing device for rendering as audio output, and can transmit the content item converted for a first modality to deliver the content item for rendering from the selected interface.
Type: Application
Filed: July 23, 2020
Publication date: November 12, 2020
Inventors: Gaurav Bhaya, Robert Stets, Umesh Patil
-
Publication number: 20200357403
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for handing off a user conversation between computer-implemented agents. One of the methods includes receiving, by a computer-implemented agent specific to a user device, a digital representation of speech encoding an utterance, determining, by the computer-implemented agent, that the utterance specifies a requirement to establish a communication with another computer-implemented agent, and establishing, by the computer-implemented agent, a communication between the other computer-implemented agent and the user device.
Type: Application
Filed: July 27, 2020
Publication date: November 12, 2020
Inventors: Johnny Chen, Thomas L. Dean, Qiangfeng Peter Lau, Sudeep Gandhe, Gabriel Schine
-
Publication number: 20200357404
Abstract: Systems and methods for dynamic sequence-based adjustment of prompt generation are provided. The system can receive a first interaction and a second interaction via a client device and identify a first sequence based on the first interaction and the second interaction. The system can map the first sequence to a node data structure and identify a node in the node data structure that matches the first sequence. The system can generate an adjusted parameter for a first digital component object responsive to a match with an attribute of the node in the node data structure. The system can execute a real-time digital component selection process among a plurality of digital component objects including the first digital component object to select the first digital component object. The system can transmit a prompt with the first digital component object to a client device to cause the client device to present the prompt.
Type: Application
Filed: July 30, 2020
Publication date: November 12, 2020
Inventors: Justin Lewis, Thomas Price
-
Publication number: 20200357405
Abstract: A server is an interactive system that performs an interaction by posing a reverse question with respect to an input by the user and providing response content. An input acquisition unit and an answer generation unit constitute an interaction execution unit that repeatedly performs the interaction until a question sentence and an answer, which are the response content, satisfy a prescribed condition. Further, a stoppage determination execution unit performs control for stopping the interaction performed by the input acquisition unit and the answer generation unit based on the interaction state of the user or another user. In a case where the interaction is stopped, the output unit provides the question sentence and its answer at the time of stoppage to the communication terminal.
Type: Application
Filed: December 27, 2018
Publication date: November 12, 2020
Applicant: NTT DOCOMO, INC.
Inventors: Takanori HASHIMOTO, Hiroshi FUJIMOTO, Yuriko OZAKI
-
Publication number: 20200357406
Abstract: An example method includes, at an electronic device: receiving an indication of a notification; in accordance with receiving the indication of the notification: obtaining one or more data streams from one or more sensors; determining, based on the one or more data streams, whether a user associated with the electronic device is speaking; and in accordance with a determination that the user is not speaking: causing an output associated with the notification to be provided.
Type: Application
Filed: August 19, 2019
Publication date: November 12, 2020
Inventors: William M. YORK, Rebecca P. FISH, Gagan A. GUPTA, Xinyuan HUANG, Heriberto NIETO, Benjamin S. PHIPPS, Kurt PIERSOL
-
Publication number: 20200357407
Abstract: A method of speech recognition and person identification based thereon, comprising: recording speech from a speech signal using a microphone; illuminating a speaking mouth; recording a degree of light reflected by the mouth from a reflection signal using a sensor; and recording combined parameters of the speech signal and of the reflection signal, and coupling them to letters associated therewith, per predetermined time duration; comparing a combination occurring in speech of parameters of the speech signal and of the reflection signal to the recorded combined parameters of the speech signal and of the reflection signal which are coupled to letters; and deciding on the basis of the comparison to which letter the combination occurring in the speech of parameters of the speech signal and of the reflection signal corresponds, using block-width modulation of the reflection signal.
Type: Application
Filed: January 25, 2019
Publication date: November 12, 2020
Applicant: IEBM B.V.
Inventors: Olaf Petrus Quirinus MOSSINKOFF, Johannes Leonardus Jozef MEIJER
-
Publication number: 20200357408
Abstract: A method to present a summary of a transcription may include obtaining, at a first device, audio directed to the first device from a second device during a communication session between the first device and the second device. Additionally, the method may include sending, from the first device, the audio to a transcription system. The method may include obtaining, at the first device, a transcription during the communication session from the transcription system based on the audio. Additionally, the method may include obtaining, at the first device, a summary of the transcription during the communication session. Additionally, the method may include presenting, on a display, both the summary and the transcription simultaneously during the communication session.
Type: Application
Filed: May 10, 2019
Publication date: November 12, 2020
Inventors: Scott Boekweg, David Thomson
-
Publication number: 20200357409
Abstract: In an exemplary process for interpreting spoken requests, audio input containing a user utterance is received. In accordance with a determination that a text representation of the user utterance does not exactly match any of a plurality of user-defined invocation phrases, the process determines whether a comparison between the text representation and a user-defined invocation phrase of the plurality of user-defined invocation phrases satisfies one or more rule-based conditions. In accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, the text representation and the user-defined invocation phrase are processed using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase.
Type: Application
Filed: August 28, 2019
Publication date: November 12, 2020
Inventors: Qiwei SUN, Gagan ANEJA, Xinjie DI, William Pui Lum LI, Deepak MURALIDHARAN
-
Publication number: 20200357410
Abstract: Disclosed herein are example techniques to provide contextual information corresponding to a voice command. An example implementation may involve receiving voice data indicating a voice command, receiving contextual information indicating a characteristic of the voice command, and determining a device operation corresponding to the voice command. Determining the device operation corresponding to the voice command may include identifying, among multiple zones of a media playback system, a zone that corresponds to the characteristic of the voice command, and determining that the voice command corresponds to one or more particular devices that are associated with the identified zone. The example implementation may further involve causing the one or more particular devices to perform the device operation.
Type: Application
Filed: March 16, 2020
Publication date: November 12, 2020
Inventors: Jonathan P. Lang, Romi Kadri, Christopher Butts
-
Publication number: 20200357411
Abstract: Methods, systems, and apparatus for receiving, by a voice action system, data specifying trigger terms that trigger an application to perform a voice action and a context that specifies a status of the application when the voice action can be triggered. The voice action system receives data defining a discoverability example for the voice action that comprises one or more of the trigger terms that trigger the application to perform the voice action when a status of the application satisfies the specified context. The voice action system receives a request for discoverability examples for the application from a user device having the application installed, and provides the data defining the discoverability examples to the user device in response to the request. The user device is configured to provide a notification of the one or more of the trigger terms when a status of the application satisfies the specified context.
Type: Application
Filed: July 23, 2020
Publication date: November 12, 2020
Inventors: Bo Wang, Sunil Vemuri, Barnaby John James, Pravir Kumar Gupta, Nitin Mangesh Shetti
-
Publication number: 20200357412
Abstract: An exemplary automatic speech recognition (ASR) system may receive an audio input including a segment of speech. The segment of speech may be independently processed by general ASR and domain-specific ASR to generate multiple ASR results. A selection between the multiple ASR results may be performed based on respective confidence levels for the general ASR and domain-specific ASR. As incremental ASR is performed, a composite result may be generated based on general ASR and domain-specific ASR.
Type: Application
Filed: May 9, 2019
Publication date: November 12, 2020
Inventor: Jeffry Copps Robert Jose
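A sketch of the selection step, assuming a plain confidence comparison with a small bias toward the domain-specific engine; the bias value and the example results are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class AsrResult:
    text: str
    confidence: float

def select_result(general: AsrResult, domain: AsrResult, bias: float = 0.05):
    """Pick between independently produced general and domain-specific ASR
    results by confidence; a small bias favors the domain engine, which
    tends to be more reliable on in-domain phrases."""
    return domain if domain.confidence + bias >= general.confidence else general

g = AsrResult("play the song winners by bazzy", 0.80)
d = AsrResult("play the song 'Winner' by Bazzi", 0.78)
print(select_result(g, d).text)   # the domain result wins via the bias
```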
-
Publication number: 20200357413
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
Type: Application
Filed: May 27, 2020
Publication date: November 12, 2020
Applicant: Google LLC
Inventors: Brian Strope, Francoise Beaufays, Olivier Siohan
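The early-abort flow maps naturally onto futures: launch several recognizers in parallel, take the first result whose confidence meets the threshold, and cancel the rest. This is a schematic with random stand-ins for the recognizers; note that `Future.cancel` only stops tasks that have not yet started running.

```python
import concurrent.futures as cf
import random
import time

def run_srs(name):
    """Stand-in for one speech recognition system (SRS): returns a
    (name, result, confidence) tuple after some processing time."""
    time.sleep(random.uniform(0.1, 1.0))
    return name, f"hypothesis from {name}", random.uniform(0.5, 1.0)

THRESHOLD = 0.9
with cf.ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_srs, f"SRS-{i}") for i in range(5)]
    final = None
    for fut in cf.as_completed(futures):
        name, result, conf = fut.result()
        if conf >= THRESHOLD:               # confidence threshold met:
            final = (result, conf)
            for f in futures:               # abort the remaining tasks
                f.cancel()
            break
    if final is None:                       # fall back to the best completed result
        done = [f.result() for f in futures if f.done() and not f.cancelled()]
        final = max(((r, c) for _, r, c in done), key=lambda rc: rc[1])
print("final recognition result:", final)
```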
-
Publication number: 20200357414
Abstract: A display apparatus and control method thereof to, based on a user input for user voice registration of a user of the display apparatus being received, obtain one or more of information on a surrounding environment of the display apparatus and information on the user, obtain an utterance sentence based on the obtained information, control the display to display the obtained utterance sentence, and, based on an utterance voice of the user corresponding to the displayed utterance sentence being received, obtain voice information of the user based on the received utterance voice, and store, by matching the voice information to the authenticated user account of the user, the voice information in the memory.
Type: Application
Filed: May 7, 2020
Publication date: November 12, 2020
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Junyong PARK, Dahye SHIM, Youngah LEE, Sungdo SON, Seokho BAN
-
Publication number: 20200357415
Abstract: A method of detecting a replay attack on a voice biometrics system comprises: receiving an audio signal representing speech; detecting a magnetic field; determining if there is a correlation between the audio signal and the magnetic field; and if there is a correlation between the audio signal and the magnetic field, determining that the audio signal may result from a replay attack.
Type: Application
Filed: July 27, 2020
Publication date: November 12, 2020
Applicant: Cirrus Logic International Semiconductor Ltd.
Inventors: César ALONSO, John Paul LESSO
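The correlation test can be pictured as a normalized correlation between the audio envelope and a simultaneously measured magnetic-field signal, since a loudspeaker's voice coil leaks a field that tracks its drive signal. The envelope choice and threshold below are assumptions, not the patent's specifics.

```python
import numpy as np

def replay_suspected(audio, mag_field, threshold=0.6):
    """Flag a possible replay attack when the audio envelope correlates
    with the measured magnetic field."""
    a = np.abs(audio) - np.mean(np.abs(audio))
    m = np.abs(mag_field) - np.mean(np.abs(mag_field))
    denom = np.linalg.norm(a) * np.linalg.norm(m)
    if denom == 0:
        return False
    corr = float(np.dot(a, m) / denom)
    return corr > threshold

rng = np.random.default_rng(2)
speech = rng.standard_normal(16000)
leak = 0.8 * speech + 0.2 * rng.standard_normal(16000)       # field tracks audio
print(replay_suspected(speech, leak))                        # True -> replay
print(replay_suspected(speech, rng.standard_normal(16000)))  # False -> live
```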
-
Publication number: 20200357416
Abstract: An audio packet error concealment system includes an encoding unit for encoding an audio signal consisting of a plurality of frames, and an auxiliary information encoding unit for estimating and encoding auxiliary information about a temporal change of power of the audio signal. The auxiliary information is used in packet loss concealment in decoding of the audio signal. The auxiliary information about the temporal change of power may contain a parameter that functionally approximates a plurality of powers of subframes shorter than one frame, or may contain information about a vector obtained by vector quantization of a plurality of powers of subframes shorter than one frame.
Type: Application
Filed: July 23, 2020
Publication date: November 12, 2020
Applicant: NTT DOCOMO, INC.
Inventors: Kimitaka Tsutsumi, Kei Kikuiri
-
Publication number: 20200357417
Abstract: In an encoder (100), a signal analysis unit (101) performs signal analysis on an L channel signal and an R channel signal that constitute a stereo signal and generates a parameter used to determine a coding mode for each of an L channel and an R channel. A DMA stereo encoding unit (104) encodes the L channel signal and the R channel signal by using a coding mode common to the L channel signal and the R channel signal. At this time, the DMA stereo encoding unit (104) determines the common coding mode by selecting, out of the L channel and the R channel, the one that has a lower ratio of energy of an environmental sound component to the entire energy of the channel and using the parameter of the selected channel.
Type: Application
Filed: August 31, 2018
Publication date: November 12, 2020
Inventors: SRIKANTH NAGISETTY, HIROYUKI EHARA
-
Publication number: 20200357418
Abstract: An apparatus for decoding an encoded multichannel signal of a current frame to obtain three or more current audio output channels is provided. A multichannel processor is adapted to select two decoded channels from three or more decoded channels depending on first multichannel parameters. Moreover, the multichannel processor is adapted to generate a first group of two or more processed channels based on the selected channels. A noise filling module is adapted to identify, for at least one of the selected channels, one or more frequency bands within which all spectral lines are quantized to zero, to generate a mixing channel using, depending on side information, a proper subset of three or more previous audio output channels that have been decoded, and to fill the spectral lines of frequency bands within which all spectral lines are quantized to zero with noise generated using spectral lines of the mixing channel.
Type: Application
Filed: July 1, 2020
Publication date: November 12, 2020
Inventors: Sascha DICK, Christian HELMRICH, Nikolaus RETTELBACH, Florian SCHUH, Richard FUEG, Frederik NAGEL
-
Publication number: 20200357419
Abstract: Systems, methods, apparatus, and articles of manufacture to improve timestamp transition resolution of watermarks are disclosed.
Type: Application
Filed: July 30, 2020
Publication date: November 12, 2020
Inventors: Ken Joseph FRETT, Vladimir KUZNETSOV, David GISH, Sadhana GUPTA
-
Publication number: 20200357420
Abstract: A method for representing a second presentation of audio channels or objects as a data stream, the method comprising the steps of: (a) providing a set of base signals, the base signals representing a first presentation of the audio channels or objects; (b) providing a set of transformation parameters, the transformation parameters intended to transform the first presentation into the second presentation; the transformation parameters further being specified for at least two frequency bands and including a set of multi-tap convolution matrix parameters for at least one of the frequency bands.
Type: Application
Filed: May 26, 2020
Publication date: November 12, 2020
Applicant: DOLBY LABORATORIES LICENSING CORPORATION
Inventors: Dirk Jeroen BREEBAART, David Matthew COOPER, Leif Jonas SAMUELSSON
-
Publication number: 20200357421
Abstract: An audio scene encoder for encoding an audio scene, the audio scene having at least two component signals, has: a core encoder for core encoding the at least two component signals, wherein the core encoder is configured to generate a first encoded representation for a first portion of the at least two component signals, and to generate a second encoded representation for a second portion of the at least two component signals; a spatial analyzer for analyzing the audio scene to derive one or more spatial parameters or one or more spatial parameter sets for the second portion; and an output interface for forming the encoded audio scene signal, the encoded audio scene signal having the first encoded representation, the second encoded representation, and the one or more spatial parameters or one or more spatial parameter sets for the second portion.
Type: Application
Filed: July 30, 2020
Publication date: November 12, 2020
Inventors: Guillaume FUCHS, Stefan BAYER, Markus MULTRUS, Oliver THIERGART, Alexandre BOUTHÉON, Jürgen HERRE, Florin GHIDO, Wolfgang JAEGERS, Fabian KÜCH
-
Publication number: 20200357422
Abstract: Apparatus and methods for generating an encoded audio bitstream, including by including program loudness metadata and audio data in the bitstream, and optionally also program boundary metadata in at least one segment (e.g., frame) of the bitstream. Other aspects are apparatus and methods for decoding such a bitstream, e.g., including by performing adaptive loudness processing of the audio data of an audio program indicated by the bitstream, or authentication and/or validation of metadata and/or audio data of such an audio program. Another aspect is an audio processing unit (e.g., an encoder, decoder, or post-processor) configured (e.g., programmed) to perform any embodiment of the method or which includes a buffer memory which stores at least one frame of an audio bitstream generated in accordance with any embodiment of the method.
Type: Application
Filed: May 28, 2020
Publication date: November 12, 2020
Applicant: Dolby Laboratories Licensing Corporation
Inventors: Michael GRANT, Scott Gregory NORCROSS, Jeffrey RIEDMILLER, Michael WARD
-
Publication number: 20200357423
Abstract: Noise reduction in a robot system includes the use of a gesture library that pairs noise profiles with gestures that can be performed by the robot. A gesture to be performed by the robot is obtained, and the robot performs the gesture. The robot's performance of the gesture creates noise, and when a user speaks to the robot while the robot performs a gesture, incoming audio includes both user audio and robot noise. A noise profile associated with the gesture is retrieved from the gesture library and is applied to remove the robot noise from the incoming audio.
Type: Application
Filed: May 8, 2019
Publication date: November 12, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Katsushi IKEUCHI, Masaaki FUKUMOTO, Johnny H. LEE, Jordan Lee KRAVITZ, David William BAUMERT
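One plausible way to apply a gesture's noise profile is frame-wise spectral subtraction; the abstract does not commit to a method, so this is an assumed reading, with a random stand-in for the stored profile.

```python
import numpy as np

# Hypothetical gesture library pairing each gesture with a noise profile:
# here, the mean magnitude spectrum of the noise that gesture produces.
FFT = 512
gesture_library = {"wave_arm": np.abs(np.fft.rfft(
    np.random.default_rng(3).standard_normal(FFT)))}   # stand-in profile

def remove_gesture_noise(incoming, gesture):
    """Spectral subtraction of the gesture's noise profile from incoming
    audio, frame by frame, keeping the noisy phase."""
    profile = gesture_library[gesture]
    out = np.zeros_like(incoming)
    for i in range(0, len(incoming) - FFT + 1, FFT):
        spec = np.fft.rfft(incoming[i:i + FFT])
        mag = np.maximum(np.abs(spec) - profile, 0.0)   # subtract, floor at 0
        out[i:i + FFT] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), FFT)
    return out

mixed = np.random.default_rng(4).standard_normal(FFT * 8)  # user audio + robot noise
clean = remove_gesture_noise(mixed, "wave_arm")
print(clean.shape)
```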
-
Publication number: 20200357424
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to reduce noise from harmonic noise sources. An example apparatus includes a contour tracer to determine a first point of comparatively large amplitude of a frequency component in a frequency spectrum of an audio sample, determine a set of points in the frequency spectrum having amplitude values within an amplitude threshold of the first point, frequency values within a frequency threshold of the first point, and phase values within a phase threshold of the first point, increment a counter when a distance between (1) a second point in the set of points and (2) the first point satisfies a distance threshold, and, when the counter satisfies a counter threshold, generate the contour trace, the contour trace including the set of points; and a subtractor to remove the contour trace from the audio sample when the amplitude values satisfy an outlier threshold.
Type: Application
Filed: July 27, 2020
Publication date: November 12, 2020
Inventor: Matthew McCallum
-
Publication number: 20200357425
Abstract: A method and system for providing Gaussian weighted self-attention for speech enhancement are herein provided. According to one embodiment, the method includes receiving an input noise signal, generating a score matrix based on the received input noise signal, and applying a Gaussian weighted function to the generated score matrix.
Type: Application
Filed: October 2, 2019
Publication date: November 12, 2020
Inventors: Jaeyoung KIM, Mostafa EL-KHAMY, Jungwon LEE
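Reading the abstract literally, a sketch: compute the usual attention score matrix, then weight it with a Gaussian of the distance between frame positions (added in the log domain, which multiplies the softmax weights), so each frame attends mostly to its temporal neighborhood. Sigma and the shapes are illustrative, not the paper's settings.

```python
import numpy as np

def gaussian_weighted_attention(q, k, v, sigma=3.0):
    """Self-attention whose score matrix is weighted by a Gaussian of the
    distance between frame positions, down-weighting distant context."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # (T, T) score matrix
    pos = np.arange(q.shape[0])
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    gauss = -dist2 / (2.0 * sigma**2)               # log-domain Gaussian weight
    weighted = scores + gauss                       # apply weight to the scores
    weighted -= weighted.max(axis=-1, keepdims=True)
    attn = np.exp(weighted)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

T, d = 50, 16
x = np.random.default_rng(5).standard_normal((T, d))
out = gaussian_weighted_attention(x, x, x)
print(out.shape)   # (50, 16)
```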
-
Publication number: 20200357426
Abstract: Techniques are provided for detection of laser-based audio injection attacks. A methodology implementing the techniques according to an embodiment includes calculating cross correlations between signals received from microphones of an array of two or more microphones. The method also includes identifying time delays associated with peaks of the cross correlations, and magnitudes associated with the peaks of the cross correlations. The method further includes calculating a time alignment metric based on the time delays and calculating a similarity metric based on the magnitudes. The method further includes generating a first attack indicator based on a comparison of the time alignment metric to a first threshold and generating a second attack indicator based on a comparison of the similarity metric to a second threshold. The method further includes providing warning of a laser-based audio attack based on the first attack indicator and/or the second attack indicator.
Type: Application
Filed: July 28, 2020
Publication date: November 12, 2020
Applicant: Intel Corporation
Inventors: Pawel Trella, Przemyslaw Maziewski, Jan Banas
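A toy version of the two indicators: from pairwise cross-correlations against a reference microphone, the spread of peak delays gives a time-alignment metric and the weakest normalized peak magnitude a similarity metric. A laser typically couples into a single microphone, which breaks both statistics; the metric definitions and thresholds here are invented for illustration.

```python
import numpy as np

def laser_attack_indicators(mics, align_thresh=2.0, sim_thresh=0.9):
    """Two attack indicators from pairwise cross-correlations of an array.

    Acoustic sound reaches all microphones with consistent small delays and
    similar waveforms; laser injection into one microphone does not.
    """
    delays, mags = [], []
    ref = mics[0] - mics[0].mean()
    for m in mics[1:]:
        sig = m - m.mean()
        xc = np.correlate(ref, sig, mode="full")
        peak = int(np.argmax(np.abs(xc)))
        delays.append(peak - (len(sig) - 1))         # lag of the peak
        mags.append(np.abs(xc[peak]) /
                    (np.linalg.norm(ref) * np.linalg.norm(sig) + 1e-12))
    time_alignment = float(np.std(delays))           # spread of peak delays
    similarity = float(np.min(mags))                 # weakest pair correlation
    return time_alignment > align_thresh, similarity < sim_thresh

rng = np.random.default_rng(6)
speech = rng.standard_normal(4000)
mics = np.stack([np.roll(speech, k) for k in range(4)])  # acoustic: tiny lags
print(laser_attack_indicators(mics))   # (False, False) -> no attack warning
```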