Patents Examined by Shaun A Roberts
  • Patent number: 11475911
    Abstract: In communication performed among multiple participants, at least one of a participant who will start speaking next and a timing thereof is estimated. An estimation apparatus includes a head motion information generation unit that acquires head motion information representing head motions of communication participants in a time segment corresponding to an end time of an utterance segment and synchronization information for head motions between the communication participants, and an estimation unit that estimates at least one of the speaker of the next utterance segment following the utterance segment and the next utterance start timing following the utterance segment based on the head motion information and the synchronization information for the head motions between the communication participants.
    Type: Grant
    Filed: February 5, 2019
    Date of Patent: October 18, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Ryo Ishii, Ryuichiro Higashinaka, Junji Tomita, Shiro Kumano, Kazuhiro Otsuka
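The estimation step can be illustrated with a minimal sketch (the function name and scoring scheme are illustrative assumptions, not the patented method): given synchronization scores between the current speaker's head motion and each listener's head motion near the utterance's end, pick the most synchronized listener as the likely next speaker.

```python
def predict_next_speaker(sync_scores):
    """Given head-motion synchronization scores between the current
    speaker and each listener in the time segment around the utterance
    end, return the participant most likely to speak next."""
    return max(sync_scores, key=sync_scores.get)
```

For example, `predict_next_speaker({"B": 0.8, "C": 0.3})` selects participant `"B"`.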
  • Patent number: 11462229
    Abstract: This disclosure relates generally to a system and method that identify a plurality of noises, or a combination thereof, suppress them, and enhance the deteriorated input signal in a dynamic manner. The system identifies noises in the audio signal and categorizes them against a trained database of noises. A combination of a deep neural network (DNN) and artificial intelligence (AI) enables the system to self-learn: it understands and captures noises in the environment and retrains the model to reduce those noises on the next attempt. Using AI-based algorithms, the system suppresses unwanted noise from the external environment by understanding and differentiating human voice and enhancing it in a live environment. By reducing unwanted noises that cause distractions, disturbances, and barriers in conversation, the system improves the experience of business and public meetings, video conferences, musical events, speech broadcasts, and the like.
    Type: Grant
    Filed: March 6, 2020
    Date of Patent: October 4, 2022
    Assignee: TATA CONSULTANCY SERVICES LIMITED
    Inventors: Robin Tommy, Reshmi Ravindranathan, Navin Infant Raj, Venkatakrishna Akula, Jithin Laiju Ravi, Anita Nanadikar, Anil Kumar Sharma, Pranav Champaklal Shah, Bhasha Prasad Khose
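The patent's suppression is driven by a DNN trained on a noise database; as a hedged stand-in for the suppression step alone, classic spectral subtraction against a learned noise profile looks like this (the floor value is an assumption):

```python
def spectral_subtract(noisy_mag, noise_profile, floor=0.05):
    """Subtract an estimated noise magnitude profile from a noisy
    magnitude spectrum, bin by bin, clamping to a spectral floor so
    over-subtraction does not carve audible holes ("musical noise")."""
    return [max(n - p, floor * n)
            for n, p in zip(noisy_mag, noise_profile)]
```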
  • Patent number: 11462224
    Abstract: A stereo signal encoding method includes obtaining a residual signal encoding parameter of a current frame of a stereo signal based on downmixed signal energy and residual signal energy of each of M sub-bands of the current frame, where the residual signal encoding parameter indicates whether to encode residual signals of the M sub-bands, determining whether to encode the residual signals based on the residual signal encoding parameter, and encoding the residual signals when it is determined that the residual signals need to be encoded.
    Type: Grant
    Filed: November 25, 2020
    Date of Patent: October 4, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Bin Wang, Zexin Liu, Haiting Li
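A simplified reading of the decision logic (the actual parameter derivation in the claims is more involved; the energy-ratio threshold here is an assumption):

```python
def should_encode_residual(downmix_energy, residual_energy, threshold=0.1):
    """Per sub-band decision for the current frame: encode the residual
    signal only when its energy is significant relative to the
    downmixed signal's energy in that sub-band."""
    return [r > threshold * d
            for d, r in zip(downmix_energy, residual_energy)]
```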
  • Patent number: 11456005
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
    Type: Grant
    Filed: November 21, 2018
    Date of Patent: September 27, 2022
    Assignee: Google LLC
    Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
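The final step, turning per-speaker spectrogram masks into isolated speech, is elementwise masking of the mixture spectrogram. A sketch of that last step only (the embedding networks are omitted):

```python
def isolate_speakers(mixture_spectrogram, masks):
    """Apply each speaker's predicted spectrogram mask elementwise to
    the mixture spectrogram, yielding per-speaker isolated speech
    spectrograms."""
    return [[[v * m for v, m in zip(row, mask_row)]
             for row, mask_row in zip(mixture_spectrogram, mask)]
            for mask in masks]
```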
  • Patent number: 11450336
    Abstract: A system and method are described for automatic acoustic feedback cancellation in real time. In some implementations, the system may receive audio data describing an audio signal, which the system may use to determine a set of frames of the audio signal. Spectral analysis may be performed on the one or more frames of the audio to detect spectral patterns of two or more frames indicative of acoustic feedback. An additional delay identification test may be performed to identify a consistent delay indicative of acoustic feedback. In some implementations, a state machine is advanced based in part on accumulated delay votes. Decisions can be made to mute the acoustic feedback and cease the muting operation when silence is detected.
    Type: Grant
    Filed: November 25, 2020
    Date of Patent: September 20, 2022
    Assignee: DIALPAD, INC.
    Inventors: Qian-Yu Tang, Corey Burke
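The "accumulated delay votes" idea can be sketched as follows (the quorum value and vote representation are assumptions; per-frame delay estimates would come from the spectral analysis described above):

```python
from collections import Counter

def detect_feedback_delay(delay_votes, quorum=3):
    """Accumulate per-frame delay estimates; a consistent delay that
    reaches the quorum indicates acoustic feedback and would advance
    the state machine toward muting. Returns the delay, or None."""
    if not delay_votes:
        return None
    delay, votes = Counter(delay_votes).most_common(1)[0]
    return delay if votes >= quorum else None
```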
  • Patent number: 11450310
    Abstract: Systems and methods for spoken language understanding are described. Embodiments of the systems and methods receive audio data for a spoken language expression, encode the audio data using a multi-stage encoder comprising a basic encoder and a sequential encoder, wherein the basic encoder is trained to generate character features during a first training phase and the sequential encoder is trained to generate token features during a second training phase, and decode the token features to generate semantic information representing the spoken language expression.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: September 20, 2022
    Assignee: ADOBE INC.
    Inventors: Nikita Kapoor, Jaya Dodeja, Nikaash Puri
  • Patent number: 11417320
    Abstract: The present invention is a masterbot architecture in a scalable multi-service virtual assistant platform that can construct a fluid and dynamic dialogue by assembling responses to end user utterances from two kinds of agents, information agents and action agents. A plurality of information agents obtain at least one information value from a parsed user input and/or contextual data. A plurality of action agents perform one or more actions in response to the parsed user input, the contextual data, and/or the information value. A masterbot arbitrates an activation of the plurality of information agents and the plurality of action agents. The masterbot comprises access to a machine-learning module to select an appropriate action agent, where one or more information agents are activated based on the selected appropriate action agent.
    Type: Grant
    Filed: January 11, 2022
    Date of Patent: August 16, 2022
    Assignee: Linc Global, Inc.
    Inventors: Fang Cheng, Dennis Wu, Jian Da Chen
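A toy version of the arbitration loop (the agent structure and scoring are illustrative assumptions, with a plain argmax standing in for the machine-learning module):

```python
def arbitrate(utterance, action_agents, info_agents):
    """Select the best-scoring action agent for the utterance, then
    activate only the information agents that agent depends on,
    returning the chosen agent's name and the gathered values."""
    best = max(action_agents, key=lambda a: a["score"](utterance))
    values = {name: info_agents[name](utterance) for name in best["needs"]}
    return best["name"], values
```

For instance, a hypothetical `track_order` agent scoring highest on "where is order 42" would trigger only an `order_id` information agent.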
  • Patent number: 11417348
    Abstract: A method, system, and computer program to encode and decode a channel coherence parameter applied on a frequency band basis, where the coherence parameters of each frequency band form a coherence vector. The coherence vector is encoded and decoded using a predictive scheme followed by a variable bit rate entropy coding.
    Type: Grant
    Filed: April 5, 2019
    Date of Patent: August 16, 2022
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventors: Erik Norvell, Fredrik Jansson
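The predictive stage can be sketched as transmitting per-band residuals against a prediction, here simply the previous frame's decoded coherence vector (the variable-rate entropy coder is omitted):

```python
def coherence_residual(coherence, prediction=None):
    """Predictively encode a per-band coherence vector: the encoder
    transmits only the residual against the prediction, which the
    decoder adds back to reconstruct the vector."""
    if prediction is None:
        prediction = [0.0] * len(coherence)
    return [c - p for c, p in zip(coherence, prediction)]
```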
  • Patent number: 11398238
    Abstract: Disclosed herein is a speech recognition method in a distributed network environment. A method of performing a speech recognition operation in an edge computing device includes receiving a natural language understanding (NLU) model from the cloud server, storing the received NLU model, receiving voice data spoken by a user from the client device, performing a natural language processing operation on the received voice data using the NLU model, performing speech recognition according to the natural language processing operation, and transmitting a result of the speech recognition to the client device. At least one of the edge computing device, a voice recognition device, and a server may be associated with an artificial intelligence module, a drone (an unmanned aerial vehicle (UAV)), a robot, an augmented reality (AR) device, a virtual reality (VR) device, a device related to a 5G service, and the like.
    Type: Grant
    Filed: June 7, 2019
    Date of Patent: July 26, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Sungjin Kim, Dongho Kim, Jingyeong Kim, Taehyun Kim
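The edge-side flow reduces to: cache the cloud-delivered NLU model, then serve recognition to client devices locally. A minimal sketch (the class shape is an assumption and the model is mocked as a callable):

```python
class EdgeDevice:
    """Edge computing node that stores an NLU model received from the
    cloud server and performs speech recognition for client devices."""

    def __init__(self):
        self.nlu_model = None

    def receive_model(self, nlu_model):
        # Store the NLU model pushed down from the cloud server.
        self.nlu_model = nlu_model

    def recognize(self, voice_data):
        # Run natural language processing locally and return the result.
        if self.nlu_model is None:
            raise RuntimeError("NLU model not yet received from cloud")
        return self.nlu_model(voice_data)
```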
  • Patent number: 11393450
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting units for speech synthesis. One of the methods includes determining a sequence of text units that each represent a respective portion of text for speech synthesis; and determining multiple paths of speech units that each represent the sequence of text units by selecting a first speech unit that includes speech synthesis data representing a first text unit; selecting multiple second speech units including speech synthesis data representing a second text unit based on (i) a join cost to concatenate the second speech unit with a first speech unit and (ii) a target cost indicating a degree that the second speech unit corresponds to the second text unit; and defining paths from the selected first speech unit to each of the multiple second speech units to include in the multiple paths of speech units.
    Type: Grant
    Filed: January 11, 2021
    Date of Patent: July 19, 2022
    Assignee: Google LLC
    Inventor: Ioannis Agiomyrgiannakis
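The join-cost/target-cost path search is the classic unit-selection dynamic program; a compact sketch under assumed interfaces (the cost functions are placeholders):

```python
def select_units(candidates, join_cost, target_cost):
    """Viterbi-style search over per-text-unit speech-unit candidates,
    minimizing the sum of target costs and pairwise join costs along
    the path; returns the cheapest path of speech units."""
    # best[u] = (cumulative cost, path) for paths ending at unit u
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}
    for i, units in enumerate(candidates[1:], start=1):
        step = {}
        for u in units:
            cost, path = min(
                (best[p][0] + join_cost(p, u) + target_cost(i, u), best[p][1])
                for p in best
            )
            step[u] = (cost, path + [u])
        best = step
    return min(best.values(), key=lambda cp: cp[0])[1]
```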
  • Patent number: 11386889
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing contextual grammar selection are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance. The actions include generating a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores. The actions include determining a context of the computing device. The actions include based on the context of the computing device, identifying grammars that correspond to the multiple candidate transcriptions. The actions include determining, for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription. The actions include selecting, from among the candidate transcriptions, a candidate transcription.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: July 12, 2022
    Assignee: Google LLC
    Inventors: Petar Aleksic, Pedro J. Moreno Mengibar, Leonid Velikovich
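The selection reduces to weighting each lattice hypothesis by how well a grammar matched to the device's context fits it; a minimal sketch (the multiplicative combination of the two scores is an assumption):

```python
def pick_transcription(candidates, grammar_conf):
    """candidates: (text, transcription_confidence) pairs from the word
    lattice; grammar_conf(text) scores how well a grammar selected for
    the device's current context matches the text. Returns the best
    candidate transcription."""
    return max(candidates, key=lambda c: c[1] * grammar_conf(c[0]))[0]
```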
  • Patent number: 11367456
    Abstract: The present disclosure provides a streaming voice conversion method as well as an apparatus and a computer readable storage medium using the same. The method includes: obtaining to-be-converted voice data; partitioning the to-be-converted voice data in order of data obtaining time into a plurality of to-be-converted partition voices, where each to-be-converted partition voice carries a partition mark; performing a voice conversion on each of the to-be-converted partition voices to obtain a converted partition voice, where the converted partition voice carries a partition mark; performing a partition restoration on each of the converted partition voices to obtain a restored partition voice, where the restored partition voice carries a partition mark; and outputting each of the restored partition voices according to the partition mark carried by the restored partition voice. In this manner, the response time is shortened, and the conversion speed is improved.
    Type: Grant
    Filed: December 3, 2020
    Date of Patent: June 21, 2022
    Assignee: UBTECH ROBOTICS CORP LTD
    Inventors: Jiebin Xie, Ruotong Wang, Dongyan Huang, Zhichao Tang, Yang Liu, Youjun Xiong
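The partition/mark/reassemble flow can be sketched like so (the partition size and the per-partition conversion callable are assumptions):

```python
def stream_convert(samples, partition_size, convert):
    """Partition audio in arrival order, tag each partition with a mark
    (its index), convert partitions independently, and output them in
    mark order so the stream reassembles correctly."""
    n_parts = (len(samples) + partition_size - 1) // partition_size
    marked = [(i, samples[i * partition_size:(i + 1) * partition_size])
              for i in range(n_parts)]
    converted = {mark: convert(chunk) for mark, chunk in marked}
    return [x for mark in sorted(converted) for x in converted[mark]]
```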
  • Patent number: 11367454
    Abstract: An apparatus for encoding directional audio coding parameters having diffuseness parameters and direction parameters, has: a parameter quantizer for quantizing the diffuseness parameters and the direction parameters; a parameter encoder for encoding quantized diffuseness parameters and quantized direction parameters; and an output interface for generating an encoded parameter representation having information on encoded diffuseness parameters and encoded direction parameters.
    Type: Grant
    Filed: May 6, 2020
    Date of Patent: June 21, 2022
    Assignee: FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
    Inventors: Guillaume Fuchs, Jürgen Herre, Fabian Küch, Stefan Döhla, Markus Multrus, Oliver Thiergart, Oliver Wübbolt, Florin Ghido, Stefan Bayer, Wolfgang Jaegers
  • Patent number: 11361030
    Abstract: Facet-based search processing is provided which includes receiving a query search context for querying documents of a document set, and retrieving, by similar document search processing, a document subset from the document set. The document subset includes the documents of the set most similar to a search document of the query search context. Facet analysis processing is used to generate the M candidate facets most related to the query search context; facets of the M candidate facets associated with documents of the subset are identified and classified into a positive facet set and a negative facet set based, at least in part, on the extent of facet commonality across the documents. A listing of the documents in the document subset is provided, with the listing highlighting facets of the positive facet set.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: June 14, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Daiki Tsuzuku, Tohru Hasegawa, Shunsuke Ishikawa, Keisuke Nitta, Yasumasa Kajinaga, Masaki Komedani
  • Patent number: 11361765
    Abstract: Disclosed is a multi-device control method including: performing a voice recognition operation on a voice command generated from a sound source; identifying the distance between each of a plurality of devices and the sound source; assigning response rankings to the devices by combining each device's context-specific correction score for the voice command with the distances; and selecting a device to respond to the voice command from among the devices according to the response rankings.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: June 14, 2022
    Assignee: LG ELECTRONICS INC.
    Inventor: Jisoo Park
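The ranking combines a context-specific correction score with the device-to-source distance; one assumed combination (subtracting a distance penalty from the score) looks like:

```python
def rank_devices(devices):
    """Rank devices for responding to a voice command: a higher
    context-specific correction score is better, a larger distance
    from the sound source is worse. Returns device names, best first."""
    ranked = sorted(devices,
                    key=lambda d: d["context_score"] - d["distance"],
                    reverse=True)
    return [d["name"] for d in ranked]
```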
  • Patent number: 11355113
    Abstract: A method, apparatus, device, and computer readable storage medium for recognizing and decoding a voice based on a streaming attention model are provided. The method may include generating a plurality of acoustic paths for decoding the voice using the streaming attention model, and then merging acoustic paths with identical last syllables among the plurality of acoustic paths to obtain a plurality of merged acoustic paths. The method may further include selecting a preset number of acoustic paths from the plurality of merged acoustic paths as retained candidate acoustic paths. Embodiments of the present disclosure adopt the premise that the acoustic score of a current voice fragment is affected only by its immediately preceding voice fragment, independent of earlier voice history, and accordingly merge candidate acoustic paths that share an identical last syllable.
    Type: Grant
    Filed: March 9, 2020
    Date of Patent: June 7, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Junyao Shao, Sheng Qian, Lei Jia
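Merging candidate paths that share an identical last syllable, keeping the best-scoring representative of each group, can be sketched as (path and score representations are assumptions):

```python
def merge_paths(paths):
    """paths: (syllable_tuple, score) pairs. Merge paths that end in
    the same syllable, keeping only the highest-scoring path per last
    syllable; returns merged paths, best score first."""
    best = {}
    for syllables, score in paths:
        last = syllables[-1]
        if last not in best or score > best[last][1]:
            best[last] = (syllables, score)
    return sorted(best.values(), key=lambda p: -p[1])
```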
  • Patent number: 11355116
    Abstract: An electronic device configures a device-agnostic voice assistant library for execution on the electronic device based on the electronic device having a first device type. The electronic device also selects an implementation for the voice assistant library. After the configuring, the electronic device receives a verbal input from a user. It extracts request information from the verbal input by processing the verbal input using the voice assistant library executing on the device. It transmits a request to a remote system, the request including the extracted request information. The electronic device receives a response to the request. The response is generated by the remote system in accordance with the extracted request information. The electronic device performs an operation in accordance with the response by one or more voice processing modules of the configured voice assistant library.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: June 7, 2022
    Assignee: Google LLC
    Inventors: Kenneth Mixter, Raunaq Shah
  • Patent number: 11355136
    Abstract: A computer includes a processor and a memory storing instructions executable by the processor to identify an occupant in a passenger cabin of a vehicle, detect a position of a head of the occupant relative to the passenger cabin, apply a first filter to speech from the occupant based on the position of the head, generate a second filter, apply the second filter to the speech, adjust the second filter based on a difference between the speech of the occupant filtered by the second filter and a prestored profile of the occupant, and perform an operation using the speech filtered by the first filter and the second filter.
    Type: Grant
    Filed: January 11, 2021
    Date of Patent: June 7, 2022
    Assignee: Ford Global Technologies, LLC
    Inventors: Scott Andrew Amman, Cynthia M. Neubecker, Pietro Buttolo, Joshua Wheeler, Brian Bennie
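The second-filter adjustment toward the prestored occupant profile amounts to an error-driven weight update; a crude frequency-domain sketch (the step size, representation, and update rule are assumptions):

```python
def adjust_filter(weights, filtered_spectrum, profile, mu=0.1):
    """Nudge second-stage filter weights to shrink the difference
    between the filtered speech spectrum and the occupant's prestored
    profile (gradient-style update with step size mu)."""
    return [w + mu * (p - s)
            for w, s, p in zip(weights, filtered_spectrum, profile)]
```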
  • Patent number: 11348596
    Abstract: A voice processing method realized by a computer includes compressing forward a first steady period of a plurality of steady periods in a voice signal representing voice, and extending forward a transition period between the first steady period and a second steady period of the plurality of steady periods in the voice signal. Each of the plurality of steady periods is a period in which acoustic characteristics are temporally stable. The second steady period is a period immediately after the first steady period and has a pitch that is different from a pitch of the first steady period.
    Type: Grant
    Filed: July 31, 2020
    Date of Patent: May 31, 2022
    Assignee: YAMAHA CORPORATION
    Inventors: Ryunosuke Daido, Hiraku Kayama
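A crude illustration of compressing steady periods and extending the transition between them (decimation and sample repetition stand in for proper pitch-preserving time-scale modification):

```python
def rescale_periods(periods):
    """periods: (samples, kind) pairs with kind 'steady' or
    'transition'. Steady periods are compressed (every other sample
    kept); transition periods are extended (each sample repeated)."""
    out = []
    for samples, kind in periods:
        if kind == "steady":
            out.extend(samples[::2])
        else:
            out.extend(s for x in samples for s in (x, x))
    return out
```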
  • Patent number: 11341984
    Abstract: The present document relates to audio source coding systems which make use of a harmonic transposition method for high frequency reconstruction (HFR), as well as to digital effect processors, e.g. exciters, where generation of harmonic distortion add brightness to the processed signal, and to time stretchers where a signal duration is prolonged with maintained spectral content. A system and method configured to generate a time stretched and/or frequency transposed signal from an input signal is described. The system comprises an analysis filterbank configured to provide an analysis subband signal from the input signal; wherein the analysis subband signal comprises a plurality of complex valued analysis samples, each having a phase and a magnitude. Furthermore, the system comprises a subband processing unit configured to determine a synthesis subband signal from the analysis subband signal using a subband transposition factor Q and a subband stretch factor S.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: May 24, 2022
    Assignee: Dolby International AB
    Inventor: Lars Villemoes
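The core per-sample operation of harmonic transposition within one subband, scaling each complex sample's phase by the transposition factor Q while preserving its magnitude, can be sketched as (the stretch factor S and the analysis/synthesis filterbanks are omitted):

```python
import cmath

def transpose_subband(samples, Q):
    """Multiply the phase of each complex-valued analysis sample by the
    transposition factor Q, keeping its magnitude unchanged."""
    return [cmath.rect(abs(z), cmath.phase(z) * Q) for z in samples]
```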