Patent Applications Published on November 12, 2020
-
Publication number: 20200357377
Abstract: An active noise cancellation (ANC) system may include an adaptive filter divergence detector for detecting divergence of one or more controllable filters as they adapt, based on various temporal or frequency-domain amplitude characteristics. Upon detection of a controllable filter divergence, the ANC system may be deactivated, or certain speakers may be muted. Alternatively, the ANC system may modify the diverged controllable filters to restore proper operation of the noise cancelling system. This may include adjusting a leakage value of an adaptive filter controller.
Type: Application
Filed: May 7, 2019
Publication date: November 12, 2020
Inventors: Kevin J. BASTYR, David TRUMPY, Shiyu CHEN
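The leakage-adjustment idea lends itself to a short illustration. Below is a minimal sketch, not the patented method: a leaky-LMS coefficient update whose leakage value is raised when the filter's coefficient energy crosses a fixed limit. The function names, thresholds, and toy signals are all hypothetical.

```python
import numpy as np

def leaky_lms_step(w, x_buf, error, mu=0.05, leak=0.0):
    """One leaky-LMS update: w <- (1 - mu*leak)*w + mu*e*x."""
    return (1.0 - mu * leak) * w + mu * error * x_buf

def detect_divergence(w, energy_limit=10.0):
    """Flag divergence when the filter's coefficient energy grows too large."""
    return float(np.dot(w, w)) > energy_limit

# Toy loop: raise the leakage value when divergence is detected,
# instead of deactivating the whole ANC system.
rng = np.random.default_rng(0)
w = np.zeros(16)
leak = 0.0
for _ in range(1000):
    x_buf = rng.standard_normal(16)   # reference-signal history
    error = rng.standard_normal()     # error-microphone sample
    w = leaky_lms_step(w, x_buf, error, leak=leak)
    if detect_divergence(w):
        leak = 0.5                    # pull coefficients back toward zero
print("final coefficient energy:", np.dot(w, w))
```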
-
Publication number: 20200357378
Abstract: An active noise cancellation (ANC) system may include an adaptive filter divergence detector for detecting divergence of one or more controllable filters as they adapt, based on dynamically adapted thresholds. Upon detection of a controllable filter divergence, the ANC system may be deactivated, or certain speakers may be muted. Alternatively, the ANC system may modify the diverged controllable filters to restore proper operation of the noise cancelling system.
Type: Application
Filed: April 27, 2020
Publication date: November 12, 2020
Inventors: Kevin J. BASTYR, James MAY, David TRUMPY
-
Publication number: 20200357379
Abstract: In a method for transmit beamforming of a two-dimensional array of ultrasonic transducers, a beamforming pattern to apply to a beamforming space of the two-dimensional array of ultrasonic transducers is defined. The beamforming space includes a plurality of elements, where each element of the beamforming space corresponds to an ultrasonic transducer of the two-dimensional array of ultrasonic transducers, where the beamforming pattern identifies which ultrasonic transducers within the beamforming space are activated during a transmit operation of the two-dimensional array of ultrasonic transducers, and wherein at least some of the ultrasonic transducers that are activated are phase delayed with respect to other ultrasonic transducers that are activated. The beamforming pattern is applied to the two-dimensional array of ultrasonic transducers. A transmit operation is performed by activating the ultrasonic transducers of the beamforming space according to the beamforming pattern.
Type: Application
Filed: July 6, 2020
Publication date: November 12, 2020
Applicant: InvenSense, Inc.
Inventors: Bruno W. GARLEPP, James Christian SALVIA, Yang PAN, Michael H. PERROTT
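A small sketch of the general idea, not InvenSense's implementation: build an activation mask over a 2-D array and assign each active element a phase delay that focuses the transmit at a chosen point. The element pitch, drive frequency, sound speed, and circular aperture are illustrative assumptions.

```python
import numpy as np

def beamforming_pattern(rows, cols, pitch, focus, c=1500.0, f0=2e6):
    """Activation mask and per-element phase delays for a 2-D transducer array.

    Elements inside a circular aperture are activated; each active element
    gets a phase delay so that all wavefronts arrive at `focus` in phase.
    """
    ys, xs = np.mgrid[0:rows, 0:cols]
    # element positions, centered on the array
    px = (xs - (cols - 1) / 2) * pitch
    py = (ys - (rows - 1) / 2) * pitch
    active = px**2 + py**2 <= (min(rows, cols) * pitch / 2) ** 2
    # time of flight from each element to the focal point
    fx, fy, fz = focus
    tof = np.sqrt((px - fx) ** 2 + (py - fy) ** 2 + fz**2) / c
    # delay relative to the farthest element, wrapped to a phase
    delay = tof.max() - tof
    phase = (2 * np.pi * f0 * delay) % (2 * np.pi)
    return active, np.where(active, phase, 0.0)

mask, phases = beamforming_pattern(8, 24, pitch=100e-6, focus=(0.0, 0.0, 5e-3))
print(mask.sum(), "elements active")
```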
-
Publication number: 20200357380
Abstract: An electronic effects device comprising: an input circuit for receiving an input audio signal; a gas discharge tube in communication with the input circuit; wherein the input circuit comprises a transducer for converting the input signal into a signal suitable for producing a discharge in the gas discharge tube; an output circuit in communication with the gas discharge tube for converting the gas discharge into an output signal. A corresponding method for producing electronic effects for musical instruments is also described.
Type: Application
Filed: January 4, 2019
Publication date: November 12, 2020
Applicant: Gamechanger Audio SIA
Inventors: Ilja Krumins, Martins Melkis, Kristaps Kalva
-
Publication number: 20200357381
Abstract: A speech synthesis device of an embodiment includes a memory unit, a creating unit, a deciding unit, a generating unit and a waveform generating unit. The memory unit stores, as statistical model information of a statistical model, an output distribution of acoustic feature parameters including pitch feature parameters and a duration distribution. The creating unit creates a statistical model sequence from context information and the statistical model information. The deciding unit decides a pitch-cycle waveform count of each state using a duration based on the duration distribution of each state of each statistical model in the statistical model sequence, and pitch information based on the output distribution of the pitch feature parameters. The generating unit generates an output distribution sequence based on the pitch-cycle waveform count, and acoustic feature parameters based on the output distribution sequence.
Type: Application
Filed: July 29, 2020
Publication date: November 12, 2020
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Masatsune TAMURA, Masahiro MORITA
-
Publication number: 20200357382
Abstract: The display of digital media content includes graphical user interfaces and predefined data fields that limit interaction between a person and a computing system. An oral communication device and a data enablement platform are provided for ingesting oral conversational data from people, and using machine learning to provide intelligence. At the front end, an oral conversational bot, or chatbot, interacts with a user. The chatbot is specific to a customized digital magazine, both of which evolve over time for a given user and topic. On the back end, the data enablement platform has a computing architecture that ingests data from various external data sources as well as data from internal applications and databases. These data and algorithms are applied to surface new data, identify trends, provide recommendations, infer new understanding, predict actions and events, and automatically act on this computed information. The chatbot then reads out the content to the user.
Type: Application
Filed: August 10, 2018
Publication date: November 12, 2020
Inventors: STUART OGAWA, LINDSAY ALEXANDER SPARKS, KOICHI NISHIMURA, WILFRED P. SO
-
Publication number: 20200357383
Abstract: A speech synthesis method and a speech synthesis apparatus to synthesize speeches of different emotional intensities in the field of artificial intelligence, where the method includes obtaining a target emotional type and a target emotional intensity parameter that correspond to an input text, determining a corresponding target emotional acoustic model based on the target emotional type and the target emotional intensity parameter, inputting a text feature of the input text into the target emotional acoustic model to obtain an acoustic feature of the input text, and synthesizing a target emotional speech based on the acoustic feature of the input text.
Type: Application
Filed: July 31, 2020
Publication date: November 12, 2020
Inventors: Liqun Deng, Yuezhi Hu, Zhanlei Yang, Wenhua Sun
-
Publication number: 20200357384
Abstract: A model training method and apparatus is disclosed, where the model training method acquires first output data of a student model for first input data and second output data of a teacher model for second input data and trains the student model such that the first output data and the second output data are not distinguished from each other. The student model and the teacher model have different structures.
Type: Application
Filed: August 23, 2019
Publication date: November 12, 2020
Applicant: Samsung Electronics Co., Ltd.
Inventors: Hogyeong KIM, Hyohyeong KANG, Hwidong NA, Hoshik LEE
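One common way to train a student so its outputs "are not distinguished" from a teacher's is adversarial distillation with a discriminator; the abstract does not spell out its loss, so the sketch below is only an assumed reading. The toy models, sizes, and training constants are all hypothetical.

```python
import torch
import torch.nn as nn

# Teacher and student with different structures; a discriminator learns to
# tell their outputs apart while the student learns to fool it.
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
student = nn.Sequential(nn.Linear(32, 10))          # smaller structure
disc = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    x1 = torch.randn(64, 32)                        # first input data (student)
    x2 = torch.randn(64, 32)                        # second input data (teacher)
    with torch.no_grad():
        t_out = teacher(x2)
    s_out = student(x1)

    # Discriminator: teacher outputs -> 1, student outputs -> 0.
    d_loss = bce(disc(t_out), torch.ones(64, 1)) + \
             bce(disc(s_out.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Student: make its outputs indistinguishable from the teacher's.
    s_loss = bce(disc(s_out), torch.ones(64, 1))
    opt_s.zero_grad(); s_loss.backward(); opt_s.step()
```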
-
Publication number: 20200357385
Abstract: To communicate more smoothly with a user. Provided is an information processing apparatus including an output control unit configured to control information presentation to a user, in which, in a case where a specific expression in a group including the user is applicable regarding content of information to be presented, the output control unit causes the information presentation including the specific expression to be executed using at least one of a sound or an image. Furthermore, provided is an information processing apparatus including a learning unit configured to learn a recognition target and a linguistic expression regarding the recognition target in association with each other, in which the linguistic expression includes a specific expression in a group including a user, and the learning unit learns the specific expression on the basis of at least one of a collected sound or a collected image.
Type: Application
Filed: October 29, 2018
Publication date: November 12, 2020
Inventor: MARI SAITO
-
Publication number: 20200357386
Abstract: A method for detecting a keyword, applied to a terminal, includes: extracting a speech eigenvector of a speech signal; obtaining, according to the speech eigenvector, a posterior probability of each target character being a key character in any keyword in an acquisition time period of the speech signal; obtaining confidences of at least two target character combinations according to the posterior probability of each target character; and determining that the speech signal includes the keyword upon determining that all the confidences of the at least two target character combinations meet a preset condition. The target character is a character in the speech signal whose pronunciation matches a pronunciation of the key character. Each target character combination includes at least one target character, and a confidence of a target character combination represents a probability of the target character combination being the keyword or a part of the keyword.
Type: Application
Filed: July 20, 2020
Publication date: November 12, 2020
Inventors: Yi GAO, Meng YU, Dan SU, Jie CHEN, Min LUO
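A toy version of the confidence computation: given per-frame posteriors for each key character, score a character combination by combining each character's best in-window posterior, and accept the keyword when the confidence meets a preset condition. The sliding window and geometric-mean combination are assumptions, not the patent's exact formula.

```python
import numpy as np

def keyword_confidence(posteriors, window=30):
    """Confidence of a target-character combination over an utterance.

    `posteriors` is a (frames, chars) array: per-frame posterior probability
    of each key character. For each window position, take each character's
    maximum posterior and combine them with a geometric mean so that every
    character must score well somewhere in the window.
    """
    frames, n_chars = posteriors.shape
    best = 0.0
    for start in range(max(1, frames - window + 1)):
        seg = posteriors[start:start + window]
        per_char_max = seg.max(axis=0)               # best frame per character
        conf = float(np.prod(per_char_max) ** (1.0 / n_chars))
        best = max(best, conf)
    return best

post = np.random.default_rng(1).uniform(size=(100, 3))   # 3 key characters
if keyword_confidence(post) > 0.85:                      # preset condition
    print("speech signal includes the keyword")
```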
-
Publication number: 20200357387
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
Type: Application
Filed: March 31, 2020
Publication date: November 12, 2020
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
-
Publication number: 20200357388
Abstract: A method includes receiving audio data encoding an utterance, processing, using a speech recognition model, the audio data to generate speech recognition scores for speech elements, and determining context scores for the speech elements based on context data indicating a context for the utterance. The method also includes executing, using the speech recognition scores and the context scores, a beam search decoding process to determine one or more candidate transcriptions for the utterance. The method also includes selecting a transcription for the utterance from the one or more candidate transcriptions.
Type: Application
Filed: March 24, 2020
Publication date: November 12, 2020
Applicant: Google LLC
Inventors: Ding Zhao, Bo Li, Ruoming Pang, Tara N. Sainath, David Rybach, Deepti Bhatia, Zelin Wu
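The described decoding resembles shallow fusion; under that assumption, here is a sketch of a beam search whose per-step score adds a context bonus to the recognition log-probability. The `context_bonus` function, weight, and toy scores are invented for illustration.

```python
import math

def contextual_beam_search(step_scores, context_bonus, beam=4, lam=0.5):
    """Beam search over per-step log-probs, biased by context scores.

    `step_scores[t][token]` is the speech-recognition log-prob of `token`
    at step t; `context_bonus(prefix, token)` returns a context log-score
    (e.g. boosting contact names); `lam` weights the context model.
    """
    beams = [((), 0.0)]                        # (token prefix, total score)
    for t in range(len(step_scores)):
        cands = []
        for prefix, score in beams:
            for token, lp in step_scores[t].items():
                s = score + lp + lam * context_bonus(prefix, token)
                cands.append((prefix + (token,), s))
        beams = sorted(cands, key=lambda c: -c[1])[:beam]
    return beams[0][0]                         # best candidate transcription

# Toy run: the context model boosts the token "bob" (a hypothetical contact).
scores = [{"call": math.log(0.9), "tall": math.log(0.1)},
          {"bob": math.log(0.4), "rob": math.log(0.6)}]
bonus = lambda prefix, tok: 1.0 if tok == "bob" else 0.0
print(contextual_beam_search(scores, bonus))   # ('call', 'bob')
```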
-
Publication number: 20200357389
Abstract: A data processing method based on simultaneous interpretation, applied to a server in a simultaneous interpretation system, including: obtaining audio transmitted by a simultaneous interpretation device; processing the audio by using a simultaneous interpretation model to obtain an initial text; transmitting the initial text to a user terminal; receiving a modified text fed back by the user terminal, the modified text being obtained after the user terminal modifies the initial text; and updating the simultaneous interpretation model according to the initial text and the modified text.
Type: Application
Filed: July 28, 2020
Publication date: November 12, 2020
Inventors: Jingliang BAI, Caisheng OUYANG, Haikang LIU, Lianwu CHEN, Qi CHEN, Yulu ZHANG, Min LUO, Dan SU
-
Publication number: 20200357390
Abstract: Methods, systems, and related products for voice-enabled computer systems are described. A machine-learning model is trained to produce pronunciation output based on text input. The trained machine-learning model is used to produce pronunciation data for text input even where the text input includes numbers, punctuation, emoji, or other non-letter characters. The machine-learning model is further trained based on real-world data from users to improve pronunciation output.
Type: Application
Filed: May 10, 2019
Publication date: November 12, 2020
Inventor: Daniel Bromand
-
Publication number: 20200357391
Abstract: Systems and processes for operating an intelligent automated assistant are provided. An example process includes causing a first recognition result for a received natural language speech input to be displayed, where the first recognition result is in a first language and a second recognition result for the received natural language speech input is available for display responsive to receiving input indicative of user selection of the first recognition result, the second recognition result being in a second language. The example process further includes receiving the input indicative of user selection of the first recognition result and, in response to receiving the input indicative of user selection of the first recognition result, causing the second recognition result to be displayed.
Type: Application
Filed: August 22, 2019
Publication date: November 12, 2020
Inventors: Arnab GHOSHAL, Roger HSIAO, Gorm AMAND, Patrick L. COFFMAN, Mary YOUNG
-
Publication number: 20200357392
Abstract: A framework ranks multiple hypotheses generated by one or more ASR engines for each input speech utterance. The framework jointly implements ASR improvement and NLU. It makes use of NLU-related knowledge to facilitate the ranking of competing hypotheses, and outputs the top-ranked hypothesis as the improved ASR result together with the NLU results of the speech utterance. The NLU results include intent detection results and the slot filling results.
Type: Application
Filed: May 5, 2020
Publication date: November 12, 2020
Inventors: Zhengyu ZHOU, Xuchen SONG
-
Publication number: 20200357393
Abstract: A voice control method for an in-vehicle device includes receiving an audio signal by an information capturing device, transmitting the audio signal to a base of the information capturing device by the information capturing device, performing voice recognition on the audio signal by the base to generate at least one context instruction, transmitting the at least one context instruction to a host of an in-vehicle device by the base, and correspondingly controlling an operation of at least one function module of the in-vehicle device according to the at least one context instruction to perform at least one context operation by the host.
Type: Application
Filed: May 7, 2019
Publication date: November 12, 2020
Inventor: MING-ZONG WU
-
Publication number: 20200357394
Abstract: A system and a method for providing information based on speech recognition are provided. The system for providing information based on speech recognition includes a vehicle, and a server that provides an automatic wake-up context of a speech recognition function based on driving environment information and vehicle information of the vehicle, receives speech information associated with the automatic wake-up context from the vehicle, and generates service information through processing of the received speech information to provide the service information to another vehicle, wherein the vehicle automatically obtains the speech information by using the speech recognition when the automatic wake-up context is uttered, and transmits the speech information to the server.
Type: Application
Filed: August 2, 2019
Publication date: November 12, 2020
Applicants: HYUNDAI MOTOR COMPANY, KIA MOTORS CORPORATION
Inventors: Jang Won Choi, Tae Hyun Sung
-
Publication number: 20200357395
Abstract: Implementations herein relate to pre-caching data, corresponding to predicted interactions between a user and an automated assistant, using data characterizing previous interactions between the user and the automated assistant. An interaction can be predicted based on details of a current interaction between the user and an automated assistant. One or more predicted interactions can be initialized, and/or any corresponding data pre-cached, prior to the user commanding the automated assistant in furtherance of the predicted interaction. Interaction predictions can be generated using a user-parameterized machine learning model, which can be used when processing input(s) that characterize a recent user interaction with the automated assistant.
Type: Application
Filed: May 31, 2019
Publication date: November 12, 2020
Inventors: Lucas Mirelmann, Zaheed Sabur, Bohdan Vlasyuk, Marie Patriarche Bledowski, Sergey Nazarov, Denis Burakov, Behshad Behzadi, Michael Golikov, Steve Cheng, Daniel Cotting, Mario Bertschler
-
Publication number: 20200357396
Abstract: An utterance text that is a high-priority text to be reported to a user is uttered reliably. The air purifier (10) includes an utterance text extraction unit (11a) that extracts an utterance text associated with an apparatus state, and an utterance control unit (11b) that causes the utterance text extracted by the utterance text extraction unit (11a) to be uttered in descending order of the priorities assigned to the categories that include the utterance text.
Type: Application
Filed: January 10, 2018
Publication date: November 12, 2020
Inventors: TAKAHIDE FUJII, SHINTARO NOMURA, DAISUKE MORIUCHI
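The priority-ordered utterance behavior can be pictured as a small priority queue keyed by category. The category table below is hypothetical, not the air purifier's actual categories.

```python
import heapq

# Hypothetical category-to-priority table (lower number = higher priority).
PRIORITY = {"error": 0, "filter_replacement": 1, "air_quality": 2, "tips": 3}

class UtteranceQueue:
    """Queue utterance texts and speak them in descending category priority."""
    def __init__(self):
        self._heap, self._seq = [], 0

    def add(self, category, text):
        heapq.heappush(self._heap, (PRIORITY[category], self._seq, text))
        self._seq += 1                     # stable order within a category

    def next_utterance(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = UtteranceQueue()
q.add("air_quality", "The air quality has improved.")
q.add("error", "A fault was detected. Please check the unit.")
print(q.next_utterance())   # the error text is uttered first
```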
-
Publication number: 20200357397
Abstract: The present disclosure provides a speech skill creating method and system, wherein the method comprises: providing a speech skill creating interface in response to a developer's speech skill creating instruction; obtaining basic information and content configuration of the speech skill through the speech skill creating interface; and, in response to the developer's online publication instruction, adding a corresponding speech interaction capability for the content of the speech skill, and creating and publishing the speech skill. By employing the solutions of the present disclosure, it is possible to complete the creation of the speech skill without any programming and to improve the development efficiency of the speech skill.
Type: Application
Filed: December 12, 2019
Publication date: November 12, 2020
Inventor: Yaowen QI
-
Publication number: 20200357398
Abstract: An information collection system is configured to collect information from information providers. The information collection system includes a first communication terminal to be owned by one of the information providers, and a server configured to perform communication with the first communication terminal. The first communication terminal includes an input interface to which speech information is input and a first transmitter configured to transmit the speech information input through the input interface. The server includes a converter configured to execute a speech recognition to convert the speech information transmitted from the first transmitter into text information, and a storage device configured to store the text information.
Type: Application
Filed: May 8, 2020
Publication date: November 12, 2020
Inventor: Nana Sakamoto
-
Publication number: 20200357399
Abstract: Techniques for synchronizing communication across devices are described. A system receives an input command corresponding to an announcement and sends data representing the announcement to devices of the system. The system receives responses from the devices and causes the device that originated the announcement to output content corresponding to the responses.
Type: Application
Filed: May 20, 2020
Publication date: November 12, 2020
Inventors: Christo Frank Devaraj, Farookh Mohammed, James Alexander Stanton, Brandon Taylor, Peter Chin, Mahesh Rajagopalan
-
Publication number: 20200357400
Abstract: A computing system receives requests from client devices to process voice queries that have been detected in local environments of the client devices. The system identifies that a value that is based on a number of requests to process voice queries received by the system during a specified time interval satisfies one or more criteria. In response, the system triggers analysis of at least some of the requests received during the specified time interval to determine a set of requests that each identify a common voice query. The system can generate an electronic fingerprint that indicates a distinctive model of the common voice query. The fingerprint can then be used to detect an illegitimate voice query identified in a request from a client device at a later time.
Type: Application
Filed: May 27, 2020
Publication date: November 12, 2020
Applicant: Google LLC
Inventors: Alexander H. Gruenstein, Aleksander Kacun, Matthew Sharifi
-
Publication number: 20200357401
Abstract: Systems and methods are described for improving endpoint detection of a voice query submitted by a user. In some implementations, synchronized video data and audio data are received. A sequence of frames of the video data that includes images corresponding to lip movement on a face is determined. The audio data is endpointed based on first audio data that corresponds to a first frame of the sequence of frames and second audio data that corresponds to a last frame of the sequence of frames. A transcription of the endpointed audio data is generated by an automated speech recognizer. The generated transcription is then provided for output.
Type: Application
Filed: July 23, 2020
Publication date: November 12, 2020
Inventors: Chanwoo Kim, Rajeev Nongpiur, Michiel Bacchiani
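A minimal sketch of the endpointing arithmetic: convert the first and last lip-movement video frames into audio sample offsets and keep only that span. The frame rate and sample rate are assumed values.

```python
def endpoint_audio(lip_frames, audio, video_fps=30, sample_rate=16000):
    """Cut audio to the span covered by detected lip movement.

    `lip_frames` is the sorted list of video-frame indices whose images show
    lip movement; the audio is endpointed from the first such frame to the
    last one by converting frame indices to sample offsets.
    """
    if not lip_frames:
        return audio[:0]
    samples_per_frame = sample_rate / video_fps
    start = int(lip_frames[0] * samples_per_frame)
    end = int((lip_frames[-1] + 1) * samples_per_frame)
    return audio[start:end]

audio = list(range(16000 * 4))                     # 4 s of dummy samples
segment = endpoint_audio([30, 31, 32, 60], audio)  # lips move in frames 30-60
print(len(segment) / 16000, "seconds kept")        # ~1.03 s
```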
-
Publication number: 20200357402
Abstract: A system of multi-modal transmission of packetized data in a voice activated data packet based computer network environment is provided. A natural language processor component can parse an input audio signal to identify a request and a trigger keyword. Based on the input audio signal, a direct action application programming interface can generate a first action data structure, and a content selector component can select a content item. An interface management component can identify first and second candidate interfaces, and respective resource utilization values. The interface management component can select, based on the resource utilization values, the first candidate interface to present the content item. The interface management component can provide the first action data structure to the client computing device for rendering as audio output, and can transmit the content item converted for a first modality to deliver the content item for rendering from the selected interface.
Type: Application
Filed: July 23, 2020
Publication date: November 12, 2020
Inventors: Gaurav Bhaya, Robert Stets, Umesh Patil
-
Publication number: 20200357403
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for handing off a user conversation between computer-implemented agents. One of the methods includes receiving, by a computer-implemented agent specific to a user device, a digital representation of speech encoding an utterance, determining, by the computer-implemented agent, that the utterance specifies a requirement to establish a communication with another computer-implemented agent, and establishing, by the computer-implemented agent, a communication between the other computer-implemented agent and the user device.
Type: Application
Filed: July 27, 2020
Publication date: November 12, 2020
Inventors: Johnny Chen, Thomas L. Dean, Qiangfeng Peter Lau, Sudeep Gandhe, Gabriel Schine
-
Publication number: 20200357404
Abstract: Systems and methods for dynamic sequence-based adjustment of prompt generation are provided. The system can receive a first interaction and a second interaction via a client device and identify a first sequence based on the first interaction and the second interaction. The system can map the first sequence to a node data structure and identify a node in the node data structure that matches the first sequence. The system can generate an adjusted parameter for a first digital component object responsive to a match with an attribute of the node in the node data structure. The system can execute a real-time digital component selection process among a plurality of digital component objects including the first digital component object to select the first digital component object. The system can transmit a prompt with the first digital component object to a client device to cause the client device to present the prompt.
Type: Application
Filed: July 30, 2020
Publication date: November 12, 2020
Inventors: Justin Lewis, Thomas Price
-
Publication number: 20200357405
Abstract: A server is an interactive system that performs an interaction by posing a reverse question with respect to an input by the user and providing response content. An input acquisition unit and an answer generation unit constitute an interaction execution unit that repeatedly performs the interaction until a question sentence and an answer, which are the response content, satisfy a prescribed condition. Further, a stoppage determination execution unit performs control for stopping the interaction performed by the input acquisition unit and the answer generation unit based on the interaction state of the user or another user. In a case where the interaction is stopped, the output unit provides the question sentence and its answer at the time of stoppage to the communication terminal.
Type: Application
Filed: December 27, 2018
Publication date: November 12, 2020
Applicant: NTT DOCOMO, INC.
Inventors: Takanori HASHIMOTO, Hiroshi FUJIMOTO, Yuriko OZAKI
-
Publication number: 20200357406
Abstract: An example method includes, at an electronic device: receiving an indication of a notification; in accordance with receiving the indication of the notification: obtaining one or more data streams from one or more sensors; determining, based on the one or more data streams, whether a user associated with the electronic device is speaking; and in accordance with a determination that the user is not speaking: causing an output associated with the notification to be provided.
Type: Application
Filed: August 19, 2019
Publication date: November 12, 2020
Inventors: William M. YORK, Rebecca P. FISH, Gagan A. GUPTA, Xinyuan HUANG, Heriberto NIETO, Benjamin S. PHIPPS, Kurt PIERSOL
-
Publication number: 20200357407
Abstract: A method of speech recognition and person identification based thereon, comprising: recording speech from a speech signal using a microphone; illuminating a speaking mouth; recording a degree of light reflected by the mouth from a reflection signal using a sensor; and recording combined parameters of the speech signal and of the reflection signal, and coupling them to letters associated therewith, per predetermined time duration; comparing a combination occurring in speech of parameters of the speech signal and of the reflection signal to the recorded combined parameters of the speech signal and of the reflection signal which are coupled to letters; and deciding on the basis of the comparison to which letter the combination occurring in the speech of parameters of the speech signal and of the reflection signal corresponds, using block-width modulation of the reflection signal.
Type: Application
Filed: January 25, 2019
Publication date: November 12, 2020
Applicant: IEBM B.V.
Inventors: Olaf Petrus Quirinus MOSSINKOFF, Johannes Leonardus Jozef MEIJER
-
Publication number: 20200357408
Abstract: A method to present a summary of a transcription may include obtaining, at a first device, audio directed to the first device from a second device during a communication session between the first device and the second device. Additionally, the method may include sending, from the first device, the audio to a transcription system. The method may include obtaining, at the first device, a transcription during the communication session from the transcription system based on the audio. Additionally, the method may include obtaining, at the first device, a summary of the transcription during the communication session. Additionally, the method may include presenting, on a display, both the summary and the transcription simultaneously during the communication session.
Type: Application
Filed: May 10, 2019
Publication date: November 12, 2020
Inventors: Scott Boekweg, David Thomson
-
Publication number: 20200357409
Abstract: In an exemplary process for interpreting spoken requests, audio input containing a user utterance is received. In accordance with a determination that a text representation of the user utterance does not exactly match any of a plurality of user-defined invocation phrases, the process determines whether a comparison between the text representation and a user-defined invocation phrase of the plurality of user-defined invocation phrases satisfies one or more rule-based conditions. In accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, the text representation and the user-defined invocation phrase are processed using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase.
Type: Application
Filed: August 28, 2019
Publication date: November 12, 2020
Inventors: Qiwei SUN, Gagan ANEJA, Xinjie DI, William Pui Lum LI, Deepak MURALIDHARAN
-
Publication number: 20200357410
Abstract: Disclosed herein are example techniques to provide contextual information corresponding to a voice command. An example implementation may involve receiving voice data indicating a voice command, receiving contextual information indicating a characteristic of the voice command, and determining a device operation corresponding to the voice command. Determining the device operation corresponding to the voice command may include identifying, among multiple zones of a media playback system, a zone that corresponds to the characteristic of the voice command, and determining that the voice command corresponds to one or more particular devices that are associated with the identified zone. The example implementation may further involve causing the one or more particular devices to perform the device operation.
Type: Application
Filed: March 16, 2020
Publication date: November 12, 2020
Inventors: Jonathan P. Lang, Romi Kadri, Christopher Butts
-
Publication number: 20200357411
Abstract: Methods, systems, and apparatus for receiving, by a voice action system, data specifying trigger terms that trigger an application to perform a voice action and a context that specifies a status of the application when the voice action can be triggered. The voice action system receives data defining a discoverability example for the voice action that comprises one or more of the trigger terms that trigger the application to perform the voice action when a status of the application satisfies the specified context. The voice action system receives a request for discoverability examples for the application from a user device having the application installed, and provides the data defining the discoverability examples to the user device in response to the request. The user device is configured to provide a notification of the one or more of the trigger terms when a status of the application satisfies the specified context.
Type: Application
Filed: July 23, 2020
Publication date: November 12, 2020
Inventors: Bo Wang, Sunil Vemuri, Barnaby John James, Pravir Kumar Gupta, Nitin Mangesh Shetti
-
Publication number: 20200357412
Abstract: An exemplary automatic speech recognition (ASR) system may receive an audio input including a segment of speech. The segment of speech may be independently processed by general ASR and domain-specific ASR to generate multiple ASR results. A selection between the multiple ASR results may be performed based on respective confidence levels for the general ASR and domain-specific ASR. As incremental ASR is performed, a composite result may be generated based on general ASR and domain-specific ASR.
Type: Application
Filed: May 9, 2019
Publication date: November 12, 2020
Inventor: Jeffry Copps Robert Jose
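A sketch of the selection step, assuming a plain confidence comparison with a small bias toward the domain-specific engine; the bias value and the example results are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class AsrResult:
    text: str
    confidence: float

def select_result(general: AsrResult, domain: AsrResult, bias: float = 0.05):
    """Pick between independently produced general and domain-specific ASR
    results by confidence; a small bias favors the domain engine, which
    tends to be more reliable on in-domain phrases."""
    return domain if domain.confidence + bias >= general.confidence else general

g = AsrResult("play the song winners by bazzy", 0.80)
d = AsrResult("play the song 'Winner' by Bazzi", 0.78)
print(select_result(g, d).text)   # the domain result wins via the bias
```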
-
Publication number: 20200357413
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
Type: Application
Filed: May 27, 2020
Publication date: November 12, 2020
Applicant: Google LLC
Inventors: Brian Strope, Francoise Beaufays, Olivier Siohan
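The early-abort flow maps naturally onto futures: launch several recognizers in parallel, take the first result whose confidence meets the threshold, and cancel the rest. This is a schematic with random stand-ins for the recognizers; note that `Future.cancel` only stops tasks that have not yet started running.

```python
import concurrent.futures as cf
import random
import time

def run_srs(name):
    """Stand-in for one speech recognition system (SRS): returns a
    (name, result, confidence) tuple after some processing time."""
    time.sleep(random.uniform(0.1, 1.0))
    return name, f"hypothesis from {name}", random.uniform(0.5, 1.0)

THRESHOLD = 0.9
with cf.ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_srs, f"SRS-{i}") for i in range(5)]
    final = None
    for fut in cf.as_completed(futures):
        name, result, conf = fut.result()
        if conf >= THRESHOLD:               # confidence threshold met:
            final = (result, conf)
            for f in futures:               # abort the remaining tasks
                f.cancel()
            break
    if final is None:                       # fall back to the best completed result
        done = [f.result() for f in futures if f.done() and not f.cancelled()]
        final = max(((r, c) for _, r, c in done), key=lambda rc: rc[1])
print("final recognition result:", final)
```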
-
Publication number: 20200357414
Abstract: A display apparatus and control method thereof to, based on a user input for user voice registration of a user of the display apparatus being received, obtain one or more of information on a surrounding environment of the display apparatus and information on the user, obtain an utterance sentence based on the obtained information, control the display to display the obtained utterance sentence, and, based on an utterance voice of the user corresponding to the displayed utterance sentence being received, obtain voice information of the user based on the received utterance voice, and store, by matching the voice information to the authenticated user account of the user, the voice information in the memory.
Type: Application
Filed: May 7, 2020
Publication date: November 12, 2020
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Junyong PARK, Dahye SHIM, Youngah LEE, Sungdo SON, Seokho BAN
-
Publication number: 20200357415
Abstract: A method of detecting a replay attack on a voice biometrics system comprises: receiving an audio signal representing speech; detecting a magnetic field; determining if there is a correlation between the audio signal and the magnetic field; and if there is a correlation between the audio signal and the magnetic field, determining that the audio signal may result from a replay attack.
Type: Application
Filed: July 27, 2020
Publication date: November 12, 2020
Applicant: Cirrus Logic International Semiconductor Ltd.
Inventors: César ALONSO, John Paul LESSO
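The correlation test can be pictured as a normalized correlation between the audio envelope and a simultaneously measured magnetic-field signal, since a loudspeaker's voice coil leaks a field that tracks its drive signal. The envelope choice and threshold below are assumptions, not the patent's specifics.

```python
import numpy as np

def replay_suspected(audio, mag_field, threshold=0.6):
    """Flag a possible replay attack when the audio envelope correlates
    with the measured magnetic field."""
    a = np.abs(audio) - np.mean(np.abs(audio))
    m = np.abs(mag_field) - np.mean(np.abs(mag_field))
    denom = np.linalg.norm(a) * np.linalg.norm(m)
    if denom == 0:
        return False
    corr = float(np.dot(a, m) / denom)
    return corr > threshold

rng = np.random.default_rng(2)
speech = rng.standard_normal(16000)
leak = 0.8 * speech + 0.2 * rng.standard_normal(16000)       # field tracks audio
print(replay_suspected(speech, leak))                        # True -> replay
print(replay_suspected(speech, rng.standard_normal(16000)))  # False -> live
```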
-
Publication number: 20200357416
Abstract: An audio packet error concealment system includes an encoding unit for encoding an audio signal consisting of a plurality of frames, and an auxiliary information encoding unit for estimating and encoding auxiliary information about a temporal change of power of the audio signal. The auxiliary information is used in packet loss concealment in decoding of the audio signal. The auxiliary information about the temporal change of power may contain a parameter that functionally approximates a plurality of powers of subframes shorter than one frame, or may contain information about a vector obtained by vector quantization of a plurality of powers of subframes shorter than one frame.
Type: Application
Filed: July 23, 2020
Publication date: November 12, 2020
Applicant: NTT DOCOMO, INC.
Inventors: Kimitaka Tsutsumi, Kei Kikuiri
-
Publication number: 20200357417
Abstract: In an encoder (100), a signal analysis unit (101) performs signal analysis on an L channel signal and an R channel signal that constitute a stereo signal and generates a parameter used to determine a coding mode for each of an L channel and an R channel. A DMA stereo encoding unit (104) encodes the L channel signal and the R channel signal by using a coding mode common to the L channel signal and the R channel signal. At this time, the DMA stereo encoding unit (104) determines the common coding mode by selecting, out of the L channel and the R channel, the one that has a lower ratio of energy of an environmental sound component to the entire energy of the channel and using the parameter of the selected channel.
Type: Application
Filed: August 31, 2018
Publication date: November 12, 2020
Inventors: SRIKANTH NAGISETTY, HIROYUKI EHARA
-
Publication number: 20200357418
Abstract: An apparatus for decoding an encoded multichannel signal of a current frame to obtain three or more current audio output channels is provided. A multichannel processor is adapted to select two decoded channels from three or more decoded channels depending on first multichannel parameters. Moreover, the multichannel processor is adapted to generate a first group of two or more processed channels based on the selected channels. A noise filling module is adapted to identify, for at least one of the selected channels, one or more frequency bands within which all spectral lines are quantized to zero, to generate a mixing channel using, depending on side information, a proper subset of three or more previous audio output channels that have been decoded, and to fill the spectral lines of frequency bands within which all spectral lines are quantized to zero with noise generated using spectral lines of the mixing channel.
Type: Application
Filed: July 1, 2020
Publication date: November 12, 2020
Inventors: Sascha DICK, Christian HELMRICH, Nikolaus RETTELBACH, Florian SCHUH, Richard FUEG, Frederik NAGEL
-
Publication number: 20200357419
Abstract: Systems, methods, apparatus, and articles of manufacture to improve timestamp transition resolution of watermarks are disclosed.
Type: Application
Filed: July 30, 2020
Publication date: November 12, 2020
Inventors: Ken Joseph FRETT, Vladimir KUZNETSOV, David GISH, Sadhana GUPTA
-
Publication number: 20200357420
Abstract: A method for representing a second presentation of audio channels or objects as a data stream, the method comprising the steps of: (a) providing a set of base signals, the base signals representing a first presentation of the audio channels or objects; (b) providing a set of transformation parameters, the transformation parameters intended to transform the first presentation into the second presentation; the transformation parameters further being specified for at least two frequency bands and including a set of multi-tap convolution matrix parameters for at least one of the frequency bands.
Type: Application
Filed: May 26, 2020
Publication date: November 12, 2020
Applicant: DOLBY LABORATORIES LICENSING CORPORATION
Inventors: Dirk Jeroen BREEBAART, David Matthew COOPER, Leif Jonas SAMUELSSON
-
Publication number: 20200357421
Abstract: An audio scene encoder for encoding an audio scene, the audio scene having at least two component signals, has: a core encoder for core encoding the at least two component signals, wherein the core encoder is configured to generate a first encoded representation for a first portion of the at least two component signals, and to generate a second encoded representation for a second portion of the at least two component signals; a spatial analyzer for analyzing the audio scene to derive one or more spatial parameters or one or more spatial parameter sets for the second portion; and an output interface for forming the encoded audio scene signal, the encoded audio scene signal having the first encoded representation, the second encoded representation, and the one or more spatial parameters or one or more spatial parameter sets for the second portion.
Type: Application
Filed: July 30, 2020
Publication date: November 12, 2020
Inventors: Guillaume FUCHS, Stefan BAYER, Markus MULTRUS, Oliver THIERGART, Alexandre BOUTHÉON, Jürgen HERRE, Florin GHIDO, Wolfgang JAEGERS, Fabian KÜCH
-
Publication number: 20200357422
Abstract: Apparatus and methods for generating an encoded audio bitstream, including by including program loudness metadata and audio data in the bitstream, and optionally also program boundary metadata in at least one segment (e.g., frame) of the bitstream. Other aspects are apparatus and methods for decoding such a bitstream, e.g., including by performing adaptive loudness processing of the audio data of an audio program indicated by the bitstream, or authentication and/or validation of metadata and/or audio data of such an audio program. Another aspect is an audio processing unit (e.g., an encoder, decoder, or post-processor) configured (e.g., programmed) to perform any embodiment of the method or which includes a buffer memory which stores at least one frame of an audio bitstream generated in accordance with any embodiment of the method.
Type: Application
Filed: May 28, 2020
Publication date: November 12, 2020
Applicant: Dolby Laboratories Licensing Corporation
Inventors: Michael GRANT, Scott Gregory NORCROSS, Jeffrey RIEDMILLER, Michael WARD
-
Publication number: 20200357423
Abstract: Noise reduction in a robot system includes the use of a gesture library that pairs noise profiles with gestures that can be performed by the robot. A gesture to be performed by the robot is obtained, and the robot performs the gesture. The robot's performance of the gesture creates noise, and when a user speaks to the robot while the robot performs a gesture, incoming audio includes both user audio and robot noise. A noise profile associated with the gesture is retrieved from the gesture library and is applied to remove the robot noise from the incoming audio.
Type: Application
Filed: May 8, 2019
Publication date: November 12, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Katsushi IKEUCHI, Masaaki FUKUMOTO, Johnny H. LEE, Jordan Lee KRAVITZ, David William BAUMERT
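One plausible way to apply a gesture's noise profile is frame-wise spectral subtraction; the abstract does not commit to a method, so this is an assumed reading, with a random stand-in for the stored profile.

```python
import numpy as np

# Hypothetical gesture library pairing each gesture with a noise profile:
# here, the mean magnitude spectrum of the noise that gesture produces.
FFT = 512
gesture_library = {"wave_arm": np.abs(np.fft.rfft(
    np.random.default_rng(3).standard_normal(FFT)))}   # stand-in profile

def remove_gesture_noise(incoming, gesture):
    """Spectral subtraction of the gesture's noise profile from incoming
    audio, frame by frame, keeping the noisy phase."""
    profile = gesture_library[gesture]
    out = np.zeros_like(incoming)
    for i in range(0, len(incoming) - FFT + 1, FFT):
        spec = np.fft.rfft(incoming[i:i + FFT])
        mag = np.maximum(np.abs(spec) - profile, 0.0)   # subtract, floor at 0
        out[i:i + FFT] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), FFT)
    return out

mixed = np.random.default_rng(4).standard_normal(FFT * 8)  # user audio + robot noise
clean = remove_gesture_noise(mixed, "wave_arm")
print(clean.shape)
```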
-
Publication number: 20200357424
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to reduce noise from harmonic noise sources. An example apparatus includes a contour tracer to determine a first point of comparatively large amplitude of a frequency component in a frequency spectrum of an audio sample, determine a set of points in the frequency spectrum having amplitude values within an amplitude threshold of the first point, frequency values within a frequency threshold of the first point, and phase values within a phase threshold of the first point, increment a counter when a distance between (1) a second point in the set of points and (2) the first point satisfies a distance threshold, and, when the counter satisfies a counter threshold, generate the contour trace, the contour trace including the set of points; and a subtractor to remove the contour trace from the audio sample when the amplitude values satisfy an outlier threshold.
Type: Application
Filed: July 27, 2020
Publication date: November 12, 2020
Inventor: Matthew McCallum
-
Publication number: 20200357425
Abstract: A method and system for providing Gaussian weighted self-attention for speech enhancement are herein provided. According to one embodiment, the method includes receiving an input noise signal, generating a score matrix based on the received input noise signal, and applying a Gaussian weighted function to the generated score matrix.
Type: Application
Filed: October 2, 2019
Publication date: November 12, 2020
Inventors: Jaeyoung KIM, Mostafa EL-KHAMY, Jungwon LEE
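Reading the abstract literally, a sketch: compute the usual attention score matrix, then weight it with a Gaussian of the distance between frame positions (added in the log domain, which multiplies the softmax weights), so each frame attends mostly to its temporal neighborhood. Sigma and the shapes are illustrative, not the paper's settings.

```python
import numpy as np

def gaussian_weighted_attention(q, k, v, sigma=3.0):
    """Self-attention whose score matrix is weighted by a Gaussian of the
    distance between frame positions, down-weighting distant context."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # (T, T) score matrix
    pos = np.arange(q.shape[0])
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    gauss = -dist2 / (2.0 * sigma**2)               # log-domain Gaussian weight
    weighted = scores + gauss                       # apply weight to the scores
    weighted -= weighted.max(axis=-1, keepdims=True)
    attn = np.exp(weighted)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

T, d = 50, 16
x = np.random.default_rng(5).standard_normal((T, d))
out = gaussian_weighted_attention(x, x, x)
print(out.shape)   # (50, 16)
```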
-
Publication number: 20200357426
Abstract: Techniques are provided for detection of laser-based audio injection attacks. A methodology implementing the techniques according to an embodiment includes calculating cross correlations between signals received from microphones of an array of two or more microphones. The method also includes identifying time delays associated with peaks of the cross correlations, and magnitudes associated with the peaks of the cross correlations. The method further includes calculating a time alignment metric based on the time delays and calculating a similarity metric based on the magnitudes. The method further includes generating a first attack indicator based on a comparison of the time alignment metric to a first threshold and generating a second attack indicator based on a comparison of the similarity metric to a second threshold. The method further includes providing warning of a laser-based audio attack based on the first attack indicator and/or the second attack indicator.
Type: Application
Filed: July 28, 2020
Publication date: November 12, 2020
Applicant: Intel Corporation
Inventors: Pawel Trella, Przemyslaw Maziewski, Jan Banas
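A toy version of the two indicators: from pairwise cross-correlations against a reference microphone, the spread of peak delays gives a time-alignment metric and the weakest normalized peak magnitude a similarity metric. A laser typically couples into a single microphone, which breaks both statistics; the metric definitions and thresholds here are invented for illustration.

```python
import numpy as np

def laser_attack_indicators(mics, align_thresh=2.0, sim_thresh=0.9):
    """Two attack indicators from pairwise cross-correlations of an array.

    Acoustic sound reaches all microphones with consistent small delays and
    similar waveforms; laser injection into one microphone does not.
    """
    delays, mags = [], []
    ref = mics[0] - mics[0].mean()
    for m in mics[1:]:
        sig = m - m.mean()
        xc = np.correlate(ref, sig, mode="full")
        peak = int(np.argmax(np.abs(xc)))
        delays.append(peak - (len(sig) - 1))         # lag of the peak
        mags.append(np.abs(xc[peak]) /
                    (np.linalg.norm(ref) * np.linalg.norm(sig) + 1e-12))
    time_alignment = float(np.std(delays))           # spread of peak delays
    similarity = float(np.min(mags))                 # weakest pair correlation
    return time_alignment > align_thresh, similarity < sim_thresh

rng = np.random.default_rng(6)
speech = rng.standard_normal(4000)
mics = np.stack([np.roll(speech, k) for k in range(4)])  # acoustic: tiny lags
print(laser_attack_indicators(mics))   # (False, False) -> no attack warning
```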