Patents Examined by Shaun A Roberts
  • Patent number: 11475911
    Abstract: In communication performed among multiple participants, at least one of a participant who will start speaking next and a timing thereof is estimated. An estimation apparatus includes a head motion information generation unit that acquires head motion information representing head motions of communication participants in a time segment corresponding to an end time of an utterance segment and synchronization information for head motions between the communication participants, and an estimation unit that estimates at least one of the speaker of the next utterance segment following the utterance segment and the next utterance start timing following the utterance segment based on the head motion information and the synchronization information for the head motions between the communication participants.
    Type: Grant
    Filed: February 5, 2019
    Date of Patent: October 18, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Ryo Ishii, Ryuichiro Higashinaka, Junji Tomita, Shiro Kumano, Kazuhiro Otsuka
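The estimation step can be illustrated with a minimal sketch (the function name and scoring scheme are illustrative assumptions, not the patented method): given synchronization scores between the current speaker's head motion and each listener's head motion near the utterance's end, pick the most synchronized listener as the likely next speaker.

```python
def predict_next_speaker(sync_scores):
    """Given head-motion synchronization scores between the current
    speaker and each listener in the time segment around the utterance
    end, return the participant most likely to speak next."""
    return max(sync_scores, key=sync_scores.get)
```

For example, `predict_next_speaker({"B": 0.8, "C": 0.3})` selects participant `"B"`.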
  • Patent number: 11462229
    Abstract: This disclosure relates generally to a system and method that identify a plurality of noises, or a combination thereof, suppress them, and enhance the deteriorated input signal in a dynamic manner. The system identifies noises in the audio signal and categorizes them against a trained database of noises. A combination of a deep neural network (DNN) and artificial intelligence (AI) enables the system to self-learn: it understands and captures noises in the environment and retrains the model to reduce those noises on the next attempt. Using AI-based algorithms, the system suppresses unwanted noise from the external environment by understanding and differentiating human voice and enhancing it in a live environment. By reducing unwanted noises that cause distractions, disturbances, and barriers in conversation, the system improves the experience of business and public meetings, video conferences, musical events, speech broadcasts, and the like.
    Type: Grant
    Filed: March 6, 2020
    Date of Patent: October 4, 2022
    Assignee: TATA CONSULTANCY SERVICES LIMITED
    Inventors: Robin Tommy, Reshmi Ravindranathan, Navin Infant Raj, Venkatakrishna Akula, Jithin Laiju Ravi, Anita Nanadikar, Anil Kumar Sharma, Pranav Champaklal Shah, Bhasha Prasad Khose
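The patent's suppression is driven by a DNN trained on a noise database; as a hedged stand-in for the suppression step alone, classic spectral subtraction against a learned noise profile looks like this (the floor value is an assumption):

```python
def spectral_subtract(noisy_mag, noise_profile, floor=0.05):
    """Subtract an estimated noise magnitude profile from a noisy
    magnitude spectrum, bin by bin, clamping to a spectral floor so
    over-subtraction does not carve audible holes ("musical noise")."""
    return [max(n - p, floor * n)
            for n, p in zip(noisy_mag, noise_profile)]
```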
  • Patent number: 11462224
    Abstract: A stereo signal encoding method includes obtaining a residual signal encoding parameter of a current frame of a stereo signal based on downmixed signal energy and residual signal energy of each of M sub-bands of the current frame, where the residual signal encoding parameter indicates whether to encode residual signals of the M sub-bands, determining whether to encode the residual signals based on the residual signal encoding parameter, and encoding the residual signals when it is determined that the residual signals need to be encoded.
    Type: Grant
    Filed: November 25, 2020
    Date of Patent: October 4, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Bin Wang, Zexin Liu, Haiting Li
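A simplified reading of the decision logic (the actual parameter derivation in the claims is more involved; the energy-ratio threshold here is an assumption):

```python
def should_encode_residual(downmix_energy, residual_energy, threshold=0.1):
    """Per sub-band decision for the current frame: encode the residual
    signal only when its energy is significant relative to the
    downmixed signal's energy in that sub-band."""
    return [r > threshold * d
            for d, r in zip(downmix_energy, residual_energy)]
```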
  • Patent number: 11456005
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
    Type: Grant
    Filed: November 21, 2018
    Date of Patent: September 27, 2022
    Assignee: Google LLC
    Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
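The final step, turning per-speaker spectrogram masks into isolated speech, is elementwise masking of the mixture spectrogram. A sketch of that last step only (the embedding networks are omitted):

```python
def isolate_speakers(mixture_spectrogram, masks):
    """Apply each speaker's predicted spectrogram mask elementwise to
    the mixture spectrogram, yielding per-speaker isolated speech
    spectrograms."""
    return [[[v * m for v, m in zip(row, mask_row)]
             for row, mask_row in zip(mixture_spectrogram, mask)]
            for mask in masks]
```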
  • Patent number: 11450336
    Abstract: A system and method are described for automatic acoustic feedback cancellation in real time. In some implementations, the system may receive audio data describing an audio signal, which the system may use to determine a set of frames of the audio signal. Spectral analysis may be performed on the one or more frames of the audio to detect spectral patterns of two or more frames indicative of acoustic feedback. An additional delay identification test may be performed to identify a consistent delay indicative of acoustic feedback. In some implementations, a state machine is advanced based in part on accumulated delay votes. Decisions can be made to mute the acoustic feedback and cease the muting operation when silence is detected.
    Type: Grant
    Filed: November 25, 2020
    Date of Patent: September 20, 2022
    Assignee: DIALPAD, INC.
    Inventors: Qian-Yu Tang, Corey Burke
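The "accumulated delay votes" idea can be sketched as follows (the quorum value and vote representation are assumptions; per-frame delay estimates would come from the spectral analysis described above):

```python
from collections import Counter

def detect_feedback_delay(delay_votes, quorum=3):
    """Accumulate per-frame delay estimates; a consistent delay that
    reaches the quorum indicates acoustic feedback and would advance
    the state machine toward muting. Returns the delay, or None."""
    if not delay_votes:
        return None
    delay, votes = Counter(delay_votes).most_common(1)[0]
    return delay if votes >= quorum else None
```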
  • Patent number: 11450310
    Abstract: Systems and methods for spoken language understanding are described. Embodiments of the systems and methods receive audio data for a spoken language expression, encode the audio data using a multi-stage encoder comprising a basic encoder and a sequential encoder, wherein the basic encoder is trained to generate character features during a first training phase and the sequential encoder is trained to generate token features during a second training phase, and decode the token features to generate semantic information representing the spoken language expression.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: September 20, 2022
    Assignee: ADOBE INC.
    Inventors: Nikita Kapoor, Jaya Dodeja, Nikaash Puri
  • Patent number: 11417320
    Abstract: The present invention is a masterbot architecture in a scalable multi-service virtual assistant platform that can construct a fluid and dynamic dialogue by assembling responses to end user utterances from two kinds of agents, information agents and action agents. A plurality of information agents obtain at least one information value from a parsed user input and/or contextual data. A plurality of action agents perform one or more actions in response to the parsed user input, the contextual data, and/or the information value. A masterbot arbitrates an activation of the plurality of information agents and the plurality of action agents. The masterbot comprises access to a machine-learning module to select an appropriate action agent, where one or more information agents are activated based on the selected appropriate action agent.
    Type: Grant
    Filed: January 11, 2022
    Date of Patent: August 16, 2022
    Assignee: Linc Global, Inc.
    Inventors: Fang Cheng, Dennis Wu, Jian Da Chen
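A toy version of the arbitration loop (the agent structure and scoring are illustrative assumptions, with a plain argmax standing in for the machine-learning module):

```python
def arbitrate(utterance, action_agents, info_agents):
    """Select the best-scoring action agent for the utterance, then
    activate only the information agents that agent depends on,
    returning the chosen agent's name and the gathered values."""
    best = max(action_agents, key=lambda a: a["score"](utterance))
    values = {name: info_agents[name](utterance) for name in best["needs"]}
    return best["name"], values
```

For instance, a hypothetical `track_order` agent scoring highest on "where is order 42" would trigger only an `order_id` information agent.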
  • Patent number: 11417348
    Abstract: A method, system, and computer program to encode and decode a channel coherence parameter applied on a frequency band basis, where the coherence parameters of each frequency band form a coherence vector. The coherence vector is encoded and decoded using a predictive scheme followed by a variable bit rate entropy coding.
    Type: Grant
    Filed: April 5, 2019
    Date of Patent: August 16, 2022
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventors: Erik Norvell, Fredrik Jansson
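The predictive stage can be sketched as transmitting per-band residuals against a prediction, here simply the previous frame's decoded coherence vector (the variable-rate entropy coder is omitted):

```python
def coherence_residual(coherence, prediction=None):
    """Predictively encode a per-band coherence vector: the encoder
    transmits only the residual against the prediction, which the
    decoder adds back to reconstruct the vector."""
    if prediction is None:
        prediction = [0.0] * len(coherence)
    return [c - p for c, p in zip(coherence, prediction)]
```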
  • Patent number: 11398238
    Abstract: Disclosed herein is a speech recognition method in a distributed network environment. A method of performing a speech recognition operation in an edge computing device includes receiving a natural language understanding (NLU) model from the cloud server, storing the received NLU model, receiving voice data spoken by a user from the client device, performing a natural language processing operation on the received voice data using the NLU model, performing speech recognition according to the natural language processing operation, and transmitting a result of the speech recognition to the client device. At least one of the edge computing device, a voice recognition device, and a server may be associated with an artificial intelligence module, a drone (an unmanned aerial vehicle (UAV)), a robot, an augmented reality (AR) device, a virtual reality (VR) device, a device related to a 5G service, and the like.
    Type: Grant
    Filed: June 7, 2019
    Date of Patent: July 26, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Sungjin Kim, Dongho Kim, Jingyeong Kim, Taehyun Kim
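The edge-side flow reduces to: cache the cloud-delivered NLU model, then serve recognition to client devices locally. A minimal sketch (the class shape is an assumption and the model is mocked as a callable):

```python
class EdgeDevice:
    """Edge computing node that stores an NLU model received from the
    cloud server and performs speech recognition for client devices."""

    def __init__(self):
        self.nlu_model = None

    def receive_model(self, nlu_model):
        # Store the NLU model pushed down from the cloud server.
        self.nlu_model = nlu_model

    def recognize(self, voice_data):
        # Run natural language processing locally and return the result.
        if self.nlu_model is None:
            raise RuntimeError("NLU model not yet received from cloud")
        return self.nlu_model(voice_data)
```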
  • Patent number: 11393450
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting units for speech synthesis. One of the methods includes determining a sequence of text units that each represent a respective portion of text for speech synthesis; and determining multiple paths of speech units that each represent the sequence of text units by selecting a first speech unit that includes speech synthesis data representing a first text unit; selecting multiple second speech units including speech synthesis data representing a second text unit based on (i) a join cost to concatenate the second speech unit with a first speech unit and (ii) a target cost indicating a degree that the second speech unit corresponds to the second text unit; and defining paths from the selected first speech unit to each of the multiple second speech units to include in the multiple paths of speech units.
    Type: Grant
    Filed: January 11, 2021
    Date of Patent: July 19, 2022
    Assignee: Google LLC
    Inventor: Ioannis Agiomyrgiannakis
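The join-cost/target-cost path search is the classic unit-selection dynamic program; a compact sketch under assumed interfaces (the cost functions are placeholders):

```python
def select_units(candidates, join_cost, target_cost):
    """Viterbi-style search over per-text-unit speech-unit candidates,
    minimizing the sum of target costs and pairwise join costs along
    the path; returns the cheapest path of speech units."""
    # best[u] = (cumulative cost, path) for paths ending at unit u
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}
    for i, units in enumerate(candidates[1:], start=1):
        step = {}
        for u in units:
            cost, path = min(
                (best[p][0] + join_cost(p, u) + target_cost(i, u), best[p][1])
                for p in best
            )
            step[u] = (cost, path + [u])
        best = step
    return min(best.values(), key=lambda cp: cp[0])[1]
```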
  • Patent number: 11386889
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing contextual grammar selection are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance. The actions include generating a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores. The actions include determining a context of the computing device. The actions include based on the context of the computing device, identifying grammars that correspond to the multiple candidate transcriptions. The actions include determining, for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription. The actions include selecting, from among the candidate transcriptions, a candidate transcription.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: July 12, 2022
    Assignee: Google LLC
    Inventors: Petar Aleksic, Pedro J. Moreno Mengibar, Leonid Velikovich
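The selection reduces to weighting each lattice hypothesis by how well a grammar matched to the device's context fits it; a minimal sketch (the multiplicative combination of the two scores is an assumption):

```python
def pick_transcription(candidates, grammar_conf):
    """candidates: (text, transcription_confidence) pairs from the word
    lattice; grammar_conf(text) scores how well a grammar selected for
    the device's current context matches the text. Returns the best
    candidate transcription."""
    return max(candidates, key=lambda c: c[1] * grammar_conf(c[0]))[0]
```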
  • Patent number: 11367456
    Abstract: The present disclosure provides a streaming voice conversion method as well as an apparatus and a computer readable storage medium using the same. The method includes: obtaining to-be-converted voice data; partitioning the to-be-converted voice data in order of data obtaining time into a plurality of to-be-converted partition voices, where each to-be-converted partition voice carries a partition mark; performing a voice conversion on each of the to-be-converted partition voices to obtain a converted partition voice, where the converted partition voice carries a partition mark; performing a partition restoration on each of the converted partition voices to obtain a restored partition voice, where the restored partition voice carries a partition mark; and outputting each of the restored partition voices according to the partition mark carried by the restored partition voice. In this manner, the response time is shortened, and the conversion speed is improved.
    Type: Grant
    Filed: December 3, 2020
    Date of Patent: June 21, 2022
    Assignee: UBTECH ROBOTICS CORP LTD
    Inventors: Jiebin Xie, Ruotong Wang, Dongyan Huang, Zhichao Tang, Yang Liu, Youjun Xiong
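The partition/mark/reassemble flow can be sketched like so (the partition size and the per-partition conversion callable are assumptions):

```python
def stream_convert(samples, partition_size, convert):
    """Partition audio in arrival order, tag each partition with a mark
    (its index), convert partitions independently, and output them in
    mark order so the stream reassembles correctly."""
    n_parts = (len(samples) + partition_size - 1) // partition_size
    marked = [(i, samples[i * partition_size:(i + 1) * partition_size])
              for i in range(n_parts)]
    converted = {mark: convert(chunk) for mark, chunk in marked}
    return [x for mark in sorted(converted) for x in converted[mark]]
```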
  • Patent number: 11367454
    Abstract: An apparatus for encoding directional audio coding parameters having diffuseness parameters and direction parameters, has: a parameter quantizer for quantizing the diffuseness parameters and the direction parameters; a parameter encoder for encoding quantized diffuseness parameters and quantized direction parameters; and an output interface for generating an encoded parameter representation having information on encoded diffuseness parameters and encoded direction parameters.
    Type: Grant
    Filed: May 6, 2020
    Date of Patent: June 21, 2022
    Assignee: FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
    Inventors: Guillaume Fuchs, Jürgen Herre, Fabian Küch, Stefan Döhla, Markus Multrus, Oliver Thiergart, Oliver Wübbolt, Florin Ghido, Stefan Bayer, Wolfgang Jaegers
  • Patent number: 11361030
    Abstract: Facet-based search processing is provided which includes receiving a query search context for querying documents of a document set, and retrieving, by similar document search processing, a document subset from the document set. The document subset includes the documents of the set most similar to a search document of the query search context. Facet analysis processing is used to generate the M candidate facets most related to the query search context; facets of the M candidate facets associated with documents of the subset are identified and classified into a positive facet set and a negative facet set based, at least in part, on the extent of facet commonality across the documents. A listing of the documents in the document subset is provided, with the listing highlighting facets of the positive facet set.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: June 14, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Daiki Tsuzuku, Tohru Hasegawa, Shunsuke Ishikawa, Keisuke Nitta, Yasumasa Kajinaga, Masaki Komedani
  • Patent number: 11361765
    Abstract: Disclosed is a multi-device control method including: performing a voice recognition operation on a voice command generated from a sound source; identifying the distance between each of a plurality of devices and the sound source; assigning response rankings to the devices by combining each device's context-specific correction score for the voice command with the distances; and selecting a device to respond to the voice command from among the devices according to the response rankings.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: June 14, 2022
    Assignee: LG ELECTRONICS INC.
    Inventor: Jisoo Park
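The ranking combines a context-specific correction score with the device-to-source distance; one assumed combination (subtracting a distance penalty from the score) looks like:

```python
def rank_devices(devices):
    """Rank devices for responding to a voice command: a higher
    context-specific correction score is better, a larger distance
    from the sound source is worse. Returns device names, best first."""
    ranked = sorted(devices,
                    key=lambda d: d["context_score"] - d["distance"],
                    reverse=True)
    return [d["name"] for d in ranked]
```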
  • Patent number: 11355113
    Abstract: A method, apparatus, device, and computer readable storage medium for recognizing and decoding a voice based on a streaming attention model are provided. The method may include generating a plurality of acoustic paths for decoding the voice using the streaming attention model, and then merging acoustic paths with identical last syllables among the plurality of acoustic paths to obtain a plurality of merged acoustic paths. The method may further include selecting a preset number of acoustic paths from the plurality of merged acoustic paths as retained candidate acoustic paths. Embodiments of the present disclosure adopt the premise that the acoustic score of a current voice fragment is affected only by its immediately preceding voice fragment, independent of earlier voice history, and accordingly merge candidate acoustic paths that share an identical last syllable.
    Type: Grant
    Filed: March 9, 2020
    Date of Patent: June 7, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Junyao Shao, Sheng Qian, Lei Jia
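Merging candidate paths that share an identical last syllable, keeping the best-scoring representative of each group, can be sketched as (path and score representations are assumptions):

```python
def merge_paths(paths):
    """paths: (syllable_tuple, score) pairs. Merge paths that end in
    the same syllable, keeping only the highest-scoring path per last
    syllable; returns merged paths, best score first."""
    best = {}
    for syllables, score in paths:
        last = syllables[-1]
        if last not in best or score > best[last][1]:
            best[last] = (syllables, score)
    return sorted(best.values(), key=lambda p: -p[1])
```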
  • Patent number: 11355116
    Abstract: An electronic device configures a device-agnostic voice assistant library for execution on the electronic device based on the electronic device having a first device type. The electronic device also selects an implementation for the voice assistant library. After the configuring, the electronic device receives a verbal input from a user. It extracts request information from the verbal input by processing the verbal input using the voice assistant library executing on the device. It transmits a request to a remote system, the request including the extracted request information. The electronic device receives a response to the request. The response is generated by the remote system in accordance with the extracted request information. The electronic device performs an operation in accordance with the response by one or more voice processing modules of the configured voice assistant library.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: June 7, 2022
    Assignee: Google LLC
    Inventors: Kenneth Mixter, Raunaq Shah
  • Patent number: 11355136
    Abstract: A computer includes a processor and a memory storing instructions executable by the processor to identify an occupant in a passenger cabin of a vehicle, detect a position of a head of the occupant relative to the passenger cabin, apply a first filter to speech from the occupant based on the position of the head, generate a second filter, apply the second filter to the speech, adjust the second filter based on a difference between the speech of the occupant filtered by the second filter and a prestored profile of the occupant, and perform an operation using the speech filtered by the first filter and the second filter.
    Type: Grant
    Filed: January 11, 2021
    Date of Patent: June 7, 2022
    Assignee: Ford Global Technologies, LLC
    Inventors: Scott Andrew Amman, Cynthia M. Neubecker, Pietro Buttolo, Joshua Wheeler, Brian Bennie
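The second-filter adjustment toward the prestored occupant profile amounts to an error-driven weight update; a crude frequency-domain sketch (the step size, representation, and update rule are assumptions):

```python
def adjust_filter(weights, filtered_spectrum, profile, mu=0.1):
    """Nudge second-stage filter weights to shrink the difference
    between the filtered speech spectrum and the occupant's prestored
    profile (gradient-style update with step size mu)."""
    return [w + mu * (p - s)
            for w, s, p in zip(weights, filtered_spectrum, profile)]
```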
  • Patent number: 11348596
    Abstract: A voice processing method realized by a computer includes compressing forward a first steady period of a plurality of steady periods in a voice signal representing voice, and extending forward a transition period between the first steady period and a second steady period of the plurality of steady periods in the voice signal. Each of the plurality of steady periods is a period in which acoustic characteristics are temporally stable. The second steady period is a period immediately after the first steady period and has a pitch that is different from a pitch of the first steady period.
    Type: Grant
    Filed: July 31, 2020
    Date of Patent: May 31, 2022
    Assignee: YAMAHA CORPORATION
    Inventors: Ryunosuke Daido, Hiraku Kayama
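A crude illustration of compressing steady periods and extending the transition between them (decimation and sample repetition stand in for proper pitch-preserving time-scale modification):

```python
def rescale_periods(periods):
    """periods: (samples, kind) pairs with kind 'steady' or
    'transition'. Steady periods are compressed (every other sample
    kept); transition periods are extended (each sample repeated)."""
    out = []
    for samples, kind in periods:
        if kind == "steady":
            out.extend(samples[::2])
        else:
            out.extend(s for x in samples for s in (x, x))
    return out
```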
  • Patent number: 11341984
    Abstract: The present document relates to audio source coding systems which make use of a harmonic transposition method for high frequency reconstruction (HFR), as well as to digital effect processors, e.g. exciters, where generation of harmonic distortion add brightness to the processed signal, and to time stretchers where a signal duration is prolonged with maintained spectral content. A system and method configured to generate a time stretched and/or frequency transposed signal from an input signal is described. The system comprises an analysis filterbank configured to provide an analysis subband signal from the input signal; wherein the analysis subband signal comprises a plurality of complex valued analysis samples, each having a phase and a magnitude. Furthermore, the system comprises a subband processing unit configured to determine a synthesis subband signal from the analysis subband signal using a subband transposition factor Q and a subband stretch factor S.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: May 24, 2022
    Assignee: Dolby International AB
    Inventor: Lars Villemoes
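The core per-sample operation of harmonic transposition within one subband, scaling each complex sample's phase by the transposition factor Q while preserving its magnitude, can be sketched as (the stretch factor S and the analysis/synthesis filterbanks are omitted):

```python
import cmath

def transpose_subband(samples, Q):
    """Multiply the phase of each complex-valued analysis sample by the
    transposition factor Q, keeping its magnitude unchanged."""
    return [cmath.rect(abs(z), cmath.phase(z) * Q) for z in samples]
```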