Patents Examined by Douglas Godbold
  • Patent number: 12112767
    Abstract: A method, computer system, and a computer program product for audio data augmentation are provided. Sets of audio data from different sources may be obtained. A respective normalization factor for at least two sources of the different sources may be calculated. The normalization factors from the at least two sources may be mixed to determine a mixed normalization factor. A first set of the sets may be normalized by using the mixed normalization factor to obtain training data for training an acoustic model.
    Type: Grant
    Filed: May 21, 2021
    Date of Patent: October 8, 2024
    Assignee: International Business Machines Corporation
    Inventors: Toru Nagano, Takashi Fukuda, Masayuki Suzuki
  • Patent number: 12112130
    Abstract: A text style transfer system is described that generates different stylized versions of input text by rewriting the input text according to a target style. To do so, the text style transfer system employs a variational autoencoder to derive separate content and style representations for the input text, where the content representation specifies semantic information conveyed by the input text and the style representation specifies one or more style attributes expressed by the input text. The style representation is processed using counterfactual reasoning to identify different transfer strengths for applying the target style to the input text. Each transfer strength represents a minimum change to the input text that achieves a different expression of the target style. The transfer strengths are then used to generate style representation variants, which are each concatenated with the content representation of the input text to generate the plurality of different stylized versions of the input text.
    Type: Grant
    Filed: November 3, 2021
    Date of Patent: October 8, 2024
    Assignee: Adobe Inc.
    Inventors: Sharmila Reddy Nangi, Niyati Himanshu Chhaya, Hyman Chung, Harshit Nyati, Nikhil Kaushik, Sopan Khosla
  • Patent number: 12112135
    Abstract: An approach is provided for optimizing a feedback-type question answering process. A training set is constructed to detect missing information of a question. A natural language generation model is trained using the missing information. The natural language generation model is executed to generate a rhetorical question. A response to the rhetorical question is combined with the question to generate an input to a language processor. A new question is generated. The new question is applied to a document library. A final answer is generated.
    Type: Grant
    Filed: September 29, 2021
    Date of Patent: October 8, 2024
    Assignee: International Business Machines Corporation
    Inventors: Zhong Fang Yuan, Tong Liu, Chen Gao, Xiang Yu Yang
  • Patent number: 12106061
    Abstract: Implementations analyze transaction data and objectively capture pre-identified desired information about the analyzed transaction data in a consistently organized manner. An example system includes a user interface that enables a user to provide static portions and dynamic portions of a template. The dynamic portions identify variables that are replaced with either data extracted from the transaction or text based on the output of classifiers applied to the transaction. An example method includes applying classifiers to scoring units of a transaction to generate classifier tags for the scoring units and generating a narrative by replacing variables in an automated narrative template with text based on at least some of the classifier tags.
    Type: Grant
    Filed: April 5, 2021
    Date of Patent: October 1, 2024
    Assignee: CLARABRIDGE, INC.
    Inventors: Fabrice Martin, Rafael Algara-Torre, Leonardo Apolonio, Mark Arehart, Zhexin Chen, Sandesh Gade, Caroline Kinsella, Ellen Loeshelle, Ram Ramachandran, Maksym Shcherbina, Eliana Vornov, Ivan Volonsevich
  • Patent number: 12106062
    Abstract: The disclosure provides a method for generating a text. The method includes: obtaining a coding sequence of a first text by coding the first text; obtaining a controllable attribute of a second text to be generated; predicting a hidden state of the second text based on the coding sequence of the first text and the controllable attribute of the second text; and obtaining a second text corresponding to the first text by decoding the coding sequence of the first text based on the hidden state of the second text.
    Type: Grant
    Filed: January 11, 2022
    Date of Patent: October 1, 2024
    Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Zhe Hu, Zhiwei Cao, Jiachen Liu, Xinyan Xiao
  • Patent number: 12100409
    Abstract: An audio decoder provides a decoded audio information on the basis of an encoded audio information including linear prediction coefficients (LPC) and includes a tilt adjuster to adjust a tilt of a noise using linear prediction coefficients of a current frame to acquire a tilt information and a noise inserter configured to add the noise to the current frame in dependence on the tilt information. Another audio decoder includes a noise level estimator to estimate a noise level for a current frame using a linear prediction coefficient of at least one previous frame to acquire a noise level information; and a noise inserter to add a noise to the current frame in dependence on the noise level information provided by the noise level estimator. Thus, side information about a background noise in the bit-stream may be omitted. Methods and computer programs serve a similar purpose.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: September 24, 2024
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Guillaume Fuchs, Christian Helmrich, Manuel Jander, Benjamin Schubert, Yoshikazu Yokotani
  • Patent number: 12094450
    Abstract: A speech processing device includes: first segment means for dividing first speech into a plurality of first speech segments; second segment means for dividing second speech into a plurality of second speech segments; primary speaker recognition means for calculating scores indicating similarities between the plurality of first and second speech segments; threshold value calculation means for calculating a threshold value based on scores indicating similarities between the plurality of first speech segments; speaker clustering means for classifying each of the plurality of second speech segments into one or more clusters having a similarity higher than the similarity indicated by the threshold value; and secondary speaker recognition means for calculating a similarity between each of the one or more clusters and the first speech and determining based on a result of the calculation whether speech corresponding to the first speech is contained in any of the one or more clusters.
    Type: Grant
    Filed: June 7, 2019
    Date of Patent: September 17, 2024
    Assignee: NEC CORPORATION
    Inventors: Ling Guo, Hitoshi Yamamoto, Takafumi Koshinaka
  • Patent number: 12094470
    Abstract: The present application discloses a waking-up and responding method for a voice recognition device, a voice recognition device, and a computer storage medium. A plurality of voice recognition devices form an area network. The plurality of voice recognition devices are classified into a central device and at least one non-central device. The waking-up and responding method includes: the central device analyzing the collected voice signal to obtain a response factor of the central device; receiving a response factor of the non-central device, the response factor of the non-central device being obtained by the non-central device analyzing the collected voice signal; comparing the response factor of the central device with the response factor of the non-central device; and determining a pending voice recognition device, the pending voice recognition device being a voice recognition device that is in the area network and responds to the voice signal.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: September 17, 2024
    Assignees: GUANGDONG MIDEA WHITE HOME APPLIANCE TECHNOLOGY INNOVATION CENTER CO., LTD., MIDEA GROUP CO., LTD
    Inventor: Ruicheng He
  • Patent number: 12087268
    Abstract: Systems, devices, and methods are provided for training and/or inferencing using machine-learning models. In at least one embodiment, a user selects a source media (e.g., video or audio file) and a target identity. A content embedding may be extracted from the source media, and an identity embedding may be obtained for the target identity. The content embedding of the source media and the identity embedding of the target identity may be provided to a transfer model that generates synthesized media. For example, a user may select a song that is sung by a first artist and then select a second artist as the target identity to produce a cover of the song in the voice of the second artist.
    Type: Grant
    Filed: December 3, 2021
    Date of Patent: September 10, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Wenbin Ouyang, Naveen Sudhakaran Nair
  • Patent number: 12087310
    Abstract: An apparatus for downmixing three or more audio input channels to obtain two or more audio output channels is provided. The apparatus includes a receiving interface for receiving the three or more audio input channels and for receiving side information. Moreover, the apparatus includes a downmixer for downmixing the three or more audio input channels depending on the side information to obtain the two or more audio output channels. The number of the audio output channels is smaller than the number of the audio input channels. The side information indicates a characteristic of at least one of the three or more audio input channels, or a characteristic of one or more sound waves recorded within the one or more audio input channels, or a characteristic of one or more sound sources which emitted one or more sound waves recorded within the one or more audio input channels.
    Type: Grant
    Filed: January 14, 2021
    Date of Patent: September 10, 2024
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Arne Borsum, Stephan Schreiner, Harald Fuchs, Michael Kratz, Bernhard Grill, Sebastian Scharrer
  • Patent number: 12080306
    Abstract: An encoder for providing an audio stream on the basis of a transform-domain representation of an input audio signal includes a quantization error calculator configured to determine a multi-band quantization error over a plurality of frequency bands of the input audio signal for which separate band gain information is available. The encoder also includes an audio stream provider for providing the audio stream such that the audio stream includes information describing an audio content of the frequency bands and information describing the multi-band quantization error. A decoder for providing a decoded representation of an audio signal on the basis of an encoded audio stream representing spectral components of frequency bands of the audio signal includes a noise filler for introducing noise into spectral components of a plurality of frequency bands to which separate frequency band gain information is associated on the basis of a common multi-band noise intensity value.
    Type: Grant
    Filed: November 29, 2023
    Date of Patent: September 3, 2024
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Nikolaus Rettelbach, Bernhard Grill, Guillaume Fuchs, Stefan Geyersberger, Markus Multrus, Harald Popp, Juergen Herre, Stefan Wabnik, Gerald Schuller, Jens Hirschfeld
  • Patent number: 12080305
    Abstract: An encoder for providing an audio stream on the basis of a transform-domain representation of an input audio signal includes a quantization error calculator configured to determine a multi-band quantization error over a plurality of frequency bands of the input audio signal for which separate band gain information is available. The encoder also includes an audio stream provider for providing the audio stream such that the audio stream includes information describing an audio content of the frequency bands and information describing the multi-band quantization error. A decoder for providing a decoded representation of an audio signal on the basis of an encoded audio stream representing spectral components of frequency bands of the audio signal includes a noise filler for introducing noise into spectral components of a plurality of frequency bands to which separate frequency band gain information is associated on the basis of a common multi-band noise intensity value.
    Type: Grant
    Filed: November 29, 2023
    Date of Patent: September 3, 2024
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Nikolaus Rettelbach, Bernhard Grill, Guillaume Fuchs, Stefan Geyersberger, Markus Multrus, Harald Popp, Juergen Herre, Stefan Wabnik, Gerald Schuller, Jens Hirschfeld
  • Patent number: 12079578
    Abstract: Disclosed herein are embodiments of systems, methods, and products that generate semantic resolutions of wireless signals. An analytic server may train a plurality of inductive classifiers to generate a respective set of deduction rules. For example, the analytic server may train a first inductive classifier to mine a first set of deduction rules matching service set identifiers (SSIDs) with business entities. As another example, the analytic server may train a second inductive classifier to mine a second set of deduction rules matching proximal groupings of wireless signals (also referred to as hyperclusters) with business entities. Upon receiving an unresolved wireless signal, the analytic server may apply at least one of the first and second set of deduction rules to assign a semantic meaning to the wireless signal.
    Type: Grant
    Filed: September 5, 2019
    Date of Patent: September 3, 2024
    Assignee: PwC Product Sales LLC
    Inventors: Srdjan Marinovic, Eric Stanculescu, Rebecca E. Cohen, Kristopher E. Herring, Anne E. Morrow, Robert T. Rogers
  • Patent number: 12080290
    Abstract: Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.
    Type: Grant
    Filed: February 10, 2022
    Date of Patent: September 3, 2024
    Assignee: Google LLC
    Inventors: Petar Aleksic, Pedro Jose Moreno Mengibar
  • Patent number: 12073824
    Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
    Type: Grant
    Filed: December 3, 2020
    Date of Patent: August 27, 2024
    Assignee: GOOGLE LLC
    Inventors: Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Jean Bruguier, Shuo-Yiin Chang, Wei Li
  • Patent number: 12067981
    Abstract: Systems and methods for generating responses to user input such as dialogues and images are discussed. The system may generate, by a response generation module of at least one server, an optimal generated response to the user communication by applying a generative adversarial network. In some embodiments, the generative adversarial network may include a hierarchical recurrent encoder decoder generative adversarial network including a generator and a discriminator component.
    Type: Grant
    Filed: June 28, 2021
    Date of Patent: August 20, 2024
    Inventors: Oluwatobi Olabiyi, Erik T. Mueller
  • Patent number: 12062381
    Abstract: Two-stage speech/music classification device and method classify an input sound signal and select a core encoder for encoding the sound signal. A first stage classifies the input sound signal into one of a number of final classes. A second stage extracts high-level features of the input sound signal and selects the core encoder for encoding the input sound signal in response to the extracted high-level features and the final class selected in the first stage.
    Type: Grant
    Filed: April 8, 2021
    Date of Patent: August 13, 2024
    Assignee: VOICEAGE CORPORATION
    Inventor: Vladimir Malenovsky
  • Patent number: 12063214
    Abstract: Disclosed are various approaches for authenticating a user through a voice assistant device and creating an association between the device and a user account. The request is associated with a network or federated service. The user can use a client device, such as a smartphone, to initiate an authentication flow. A passphrase provided to the client device can be captured by the client device and a voice assistant device. Audio captured by the client device and voice assistant device can be sent to an assistant connection service. The passphrase and an audio signature calculated from the audio can be validated. An association between the user account and the voice assistant device can then be created.
    Type: Grant
    Filed: February 25, 2020
    Date of Patent: August 13, 2024
    Assignee: VMware LLC
    Inventor: Rohit Pradeep Shetty
  • Patent number: 12057107
    Abstract: Apparatuses and methods are provided for adding a word or a character to a machine learning model for training the machine learning model. In particular, a model learning apparatus executes operations comprising adding a word or a character to be added to a neural network as a machine learning model to the output layer of the neural network. The operations further comprise calculating an output probability distribution of an output from the output layer of the neural network when a feature amount of the word or the character is input to the neural network. Given the output probability distribution and a correct feature amount of the word or the character, the operations further comprise updating a parameter of the output layer of the neural network.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: August 6, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takafumi Moriya, Yoshikazu Yamaguchi
  • Patent number: 12051441
    Abstract: This application discloses a multi-sound area-based speech detection method and related apparatus, and a storage medium, which is applied to the field of artificial intelligence. The method includes: obtaining sound area information corresponding to N sound areas including multiple users speaking simultaneously; generating a control signal corresponding to each target detection sound area according to user information corresponding to the target detection sound area; processing multi-user speech input signals by using the control signals, to obtain a speech output signal corresponding to each target detection sound area; generating a speech detection result of the target detection sound area according to the speech output signal corresponding to the target detection sound area; and selecting, among the multiple users, a main speaker based on the user information, the speech output signals and speech detection results of multiple users in the N sound areas.
    Type: Grant
    Filed: September 13, 2022
    Date of Patent: July 30, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Jimeng Zheng, Lianwu Chen, Weiwei Li, Zhiyi Duan, Meng Yu, Dan Su, Kaiyu Jiang