Neural Network Patents (Class 704/232)
  • Patent number: 11574253
    Abstract: A computer-implemented method trains distributed sets of machine learning models by training each of the distributed machine learning models on different subsets of a set of training data. A first-layer model synchronization operation is performed in a first layer for each set of machine learning models, wherein each model synchronization operation in the first layer generates first updates for each of the machine learning models in each respective set, and the machine learning models are updated based on the first updates. A second-layer model synchronization operation is then performed in a second layer for first supersets of the machine learning models, wherein each model synchronization in the second layer generates second updates for updating each of the machine learning models in the first supersets, such that each machine learning model in a respective first superset is the same.
    Type: Grant
    Filed: August 1, 2019
    Date of Patent: February 7, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Ivo José Garcia dos Santos, Mehdi Aghagolzadeh, Rihui Peng
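    As a rough illustration of the layered synchronization described in this abstract, the sketch below uses plain parameter averaging as the synchronization operation over toy one-element parameter arrays; the `sync_layer` helper, the group sizes, and averaging itself are illustrative assumptions, not details from the patent.

    ```python
    import numpy as np

    def sync_layer(models):
        """Average parameters across a group of models (one synchronization op)."""
        avg = np.mean(models, axis=0)
        return [avg.copy() for _ in models]

    # Hypothetical setup: 4 models, grouped into 2 first-layer sets,
    # then one second-layer superset spanning all of them.
    models = [np.array([float(i)]) for i in range(4)]  # stand-in "parameters"

    # First layer: synchronize within each set.
    models[0:2] = sync_layer(models[0:2])   # set A -> both become 0.5
    models[2:4] = sync_layer(models[2:4])   # set B -> both become 2.5

    # Second layer: synchronize across the superset so every model is identical.
    models = sync_layer(models)             # all become 1.5
    ```

    After the second-layer operation every model in the superset carries the same parameters, which is the stated end condition of the claim.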
  • Patent number: 11568235
    Abstract: Embodiments are provided for implementing mixed precision learning for neural networks by a processor. A neural network may be replicated into a plurality of replicated instances, and each of the plurality of replicated instances differs in the precision used for representing and determining parameters of the neural network. Data instances may be routed to one or more of the plurality of replicated instances for processing according to a data pre-processing operation.
    Type: Grant
    Filed: November 19, 2018
    Date of Patent: January 31, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Zehra Sura, Parijat Dube, Bishwaranjan Bhattacharjee, Tong Chen
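    A minimal sketch of the replicate-and-route idea, assuming NumPy dtypes stand in for the precision levels and an invented magnitude-based routing rule (the patent does not specify the routing criterion):

    ```python
    import numpy as np

    def make_replicas(weights):
        """Replicate a model's parameters at different numeric precisions."""
        return {"fp16": weights.astype(np.float16),
                "fp32": weights.astype(np.float32)}

    def route(x, replicas):
        """Toy pre-processing rule: send small-magnitude inputs to the
        low-precision replica, large-magnitude inputs to the high-precision one."""
        key = "fp16" if np.abs(x).max() < 10 else "fp32"
        w = replicas[key]
        return key, float(np.dot(w.astype(np.float64), np.asarray(x, dtype=np.float64)))

    replicas = make_replicas(np.array([0.5, -0.25]))
    print(route(np.array([1.0, 2.0]), replicas))    # routed to fp16
    print(route(np.array([100.0, 2.0]), replicas))  # routed to fp32
    ```

    The point of the sketch is the structure, one parameter set materialized at several precisions with a dispatch step in front, not the specific rule.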
  • Patent number: 11562734
    Abstract: The present disclosure relates to an automatic speech recognition system and a method thereof. The system includes a conformer encoder and a pair of ping-pong buffers. The encoder includes a plurality of encoder layers sequentially executed by one or more graphics processing units. At least one encoder layer includes a first feed forward module, a multi-head self-attention module, a convolution module, and a second feed forward module. The convolution module and the multi-head self-attention module are sandwiched between the first feed forward module and the second feed forward module. The four modules respectively include a plurality of encoder sublayers fused into one or more encoder kernels. The one or more encoder kernels respectively read from one of the pair of ping-pong buffers and write into the other of the pair of ping-pong buffers.
    Type: Grant
    Filed: January 4, 2021
    Date of Patent: January 24, 2023
    Assignee: KWAI INC.
    Inventors: Yongxiong Ren, Yang Liu, Heng Liu, Lingzhi Liu, Jie Li, Kaituo Xu, Xiaorui Wang
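    The ping-pong buffer pattern itself is simple to sketch: each kernel reads from one buffer and writes into the other, and the roles swap after every kernel, so no kernel ever reads and writes the same buffer. The lambda "kernels" below are stand-ins for the fused encoder kernels.

    ```python
    def run_kernels(kernels, x):
        """Execute kernels alternating between two buffers: each kernel
        reads from one buffer and writes into the other."""
        buffers = [x, None]        # buffer 0 holds the input, buffer 1 is scratch
        src = 0
        for kernel in kernels:
            dst = 1 - src
            buffers[dst] = kernel(buffers[src])  # read src, write dst
            src = dst                            # swap roles for the next kernel
        return buffers[src]

    # Hypothetical stand-ins for fused sublayer kernels.
    kernels = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
    assert run_kernels(kernels, 5) == 9   # ((5 + 1) * 2) - 3
    ```

    On a GPU this double-buffering lets each fused kernel stream its output without allocating a fresh buffer per sublayer.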
  • Patent number: 11557292
    Abstract: A system and method performs speech command verification to determine if audio data includes a representation of a speech command. A first neural network may process portions of the audio data before and after a representation of a wake trigger in the audio data. A second neural network may process the audio data using a recurrent neural network to determine if the audio data includes a representation of a wake trigger.
    Type: Grant
    Filed: December 9, 2020
    Date of Patent: January 17, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Joseph Wang, Michael J Rodehorst, Rajath Kumar Mysore Pradeep Kumar
  • Patent number: 11551707
    Abstract: Disclosed is a method for speech processing, an information device, and a computer program product. The method for speech processing, as implemented by a computer, includes: obtaining a mixed speech signal via a microphone, wherein the mixed speech signal includes a plurality of speech signals uttered by a plurality of unspecified speakers at the same time; generating a set of simulated speech signals from the mixed speech signal by using a Generative Adversarial Network (GAN), in order to simulate the plurality of speech signals; and determining the number of simulated speech signals in order to estimate the number of speakers in the surroundings and providing that number as an input to an information application.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: January 10, 2023
    Assignee: RELAJET TECH (TAIWAN) CO., LTD.
    Inventors: Yun-Shu Hsu, Po-Ju Chen
  • Patent number: 11514091
    Abstract: Methods and systems for processing records include extracting feature vectors from words in an unstructured portion of a record. The feature vectors are weighted based on their similarity to a topic vector from a structured portion of the record associated with the unstructured portion. The weighted feature vectors are classified using a machine learning model to determine respective probability vectors that assign a probability to each of a set of possible relations for each feature vector. Relations between entities are determined within the record based on the probability vectors. An action is performed responsive to the determined relations.
    Type: Grant
    Filed: January 7, 2019
    Date of Patent: November 29, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ke Wang, Pei Ni Liu, Wen Sun, Jing Min Xu, Songfang Huang, Yong Qin
  • Patent number: 11514318
    Abstract: Examples described herein provide a computer-implemented method that includes training, by one or more processing devices, a first neural network for classification based on training data in accordance with a first learning objective, the first neural network producing an intermediate feature function and a final feature function as outputs. The computer-implemented method further includes training, by the one or more processing devices, a second neural network for classification based on the intermediate feature function and the final feature function and further based at least in part on target task samples in accordance with a second learning objective. Training the second neural network includes computing maximal correlation functions of each of the intermediate feature function, the final feature function, and the target task samples.
    Type: Grant
    Filed: April 8, 2020
    Date of Patent: November 29, 2022
    Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, MASSACHUSETTS INSTITUTE OF TECHNOLOGY
    Inventors: Joshua Ka-Wing Lee, Prasanna Sattigeri, Gregory Wornell
  • Patent number: 11508396
    Abstract: Systems and methods related to a voice-based system used to determine the severity of emotional distress within an audio recording of an individual are provided. In one non-limiting example, a system comprises a computing device that is configured to receive an audio sample that includes an utterance of a user. Feature extraction is performed on the audio sample to extract a plurality of acoustic emotion features using a base model. Emotion level predictions are generated for an emotion type based at least in part on the acoustic emotion features provided to an emotion-specific model. An emotion classification for the audio sample is determined based on the emotion level predictions. The emotion classification comprises the emotion type and a level associated with the emotion type.
    Type: Grant
    Filed: December 14, 2021
    Date of Patent: November 22, 2022
    Assignee: TQINTELLIGENCE, INC.
    Inventors: Yared Alemu, Desmond Caulley, Ashutosh A. Joshi
  • Patent number: 11507822
    Abstract: Systems and methods to generate artificial intelligence models with synthetic data are disclosed. An example system includes a deep neural network (DNN) generator to generate a first DNN model using first real data. The example system includes a synthetic data generator to generate first synthetic data from the first real data, the first synthetic data to be used by the DNN generator to generate a second DNN model. The example system includes an evaluator to evaluate performance of the first and second DNN models to determine whether to generate second synthetic data. The example system includes a synthetic data aggregator to aggregate third synthetic data and fourth synthetic data from a plurality of sites to form a synthetic data set. The example system includes an artificial intelligence model deployment processor to deploy an artificial intelligence model trained and tested using the synthetic data set.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: November 22, 2022
    Assignee: General Electric Company
    Inventors: Ravi Soni, Min Zhang, Gopal Avinash
  • Patent number: 11495235
    Abstract: According to one embodiment, a system for creating a speaker model includes one or more processors. The processors change a part of the network parameters from an input layer to a predetermined intermediate layer based on a plurality of patterns and input a piece of speech into each of the neural networks so as to obtain a plurality of outputs from the intermediate layer. The part of the network parameters of each of the neural networks is changed based on one of the plurality of patterns. The processors create a speaker model with respect to one or more words detected from the speech based on the outputs.
    Type: Grant
    Filed: March 8, 2019
    Date of Patent: November 8, 2022
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Hiroshi Fujimura
  • Patent number: 11462208
    Abstract: Some techniques described herein determine a correction model for a dialog system, such that the correction model corrects output from an automatic speech recognition (ASR) subsystem in the dialog system. A method described herein includes accessing training data. A first tuple of the training data includes an utterance, where the utterance is a textual representation of speech. The method further includes using an ASR subsystem of a dialog system to convert the utterance to an output utterance. The method further includes storing the output utterance in corrective training data that is based on the training data. The method further includes training a correction model based on the corrective training data, such that the correction model is configured to correct output from the ASR subsystem during operation of the dialog system.
    Type: Grant
    Filed: August 13, 2020
    Date of Patent: October 4, 2022
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Thanh Long Duong, Mark Edward Johnson
  • Patent number: 11461642
    Abstract: An apparatus for processing a signal for input to a neural network, the apparatus configured to: receive a signal comprising a plurality of samples of an analog signal over time; determine at least one frame comprising a group of consecutive samples of the signal, wherein the or each frame includes a first number of samples; for each frame, determine a set of correlation values comprising a second number of correlation values, the second number less than the first number, each correlation value of the set of correlation values based on an autocorrelation of the frame at a plurality of different time lags; provide an output based on the set of correlation values corresponding to the or each of the frames for a neural network for one or more of classification of the analog signal by the neural network and training the neural network based on a predetermined classification.
    Type: Grant
    Filed: September 11, 2019
    Date of Patent: October 4, 2022
    Assignee: NXP B.V.
    Inventors: Jose De Jesus Pineda de Gyvez, Hamed Fatemi, Emad Ayman Taleb Ibrahim
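    A sketch of the dimensionality reduction this abstract describes: a frame of many samples is reduced to a much smaller set of normalized autocorrelation values at chosen time lags. The normalization by frame energy and the particular lags are illustrative assumptions.

    ```python
    import numpy as np

    def autocorr_features(frame, lags):
        """Reduce a frame of N samples to len(lags) correlation values,
        each a normalized autocorrelation of the frame at a given time lag."""
        frame = frame - frame.mean()
        energy = np.dot(frame, frame) or 1.0
        return np.array([np.dot(frame[:-lag], frame[lag:]) / energy for lag in lags])

    # 64-sample frame of a periodic signal, compressed to 4 correlation values.
    t = np.arange(64)
    frame = np.sin(2 * np.pi * t / 16)
    feats = autocorr_features(frame, lags=[4, 8, 16, 32])
    print(feats.round(2))  # largest positive correlation at the 16-sample period
    ```

    The second number in the claim (correlation values per frame) being smaller than the first (samples per frame) is what makes this a compact input representation for the downstream neural network.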
  • Patent number: 11450310
    Abstract: Systems and methods for spoken language understanding are described. Embodiments of the systems and methods receive audio data for a spoken language expression, encode the audio data using a multi-stage encoder comprising a basic encoder and a sequential encoder, wherein the basic encoder is trained to generate character features during a first training phase and the sequential encoder is trained to generate token features during a second training phase, and decode the token features to generate semantic information representing the spoken language expression.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: September 20, 2022
    Assignee: ADOBE INC.
    Inventors: Nikita Kapoor, Jaya Dodeja, Nikaash Puri
  • Patent number: 11437050
    Abstract: Techniques are described for coding audio signals. For example, using a neural network, a residual signal is generated for a sample of an audio signal based on inputs to the neural network. The residual signal is configured to excite a long-term prediction filter and/or a short-term prediction filter. Using the long-term prediction filter and/or the short-term prediction filter, a sample of a reconstructed audio signal is determined. The sample of the reconstructed audio signal is determined based on the residual signal generated using the neural network for the sample of the audio signal.
    Type: Grant
    Filed: December 10, 2019
    Date of Patent: September 6, 2022
    Assignee: QUALCOMM Incorporated
    Inventors: Zisis Iason Skordilis, Vivek Rajendran, Guillaume Konrad Sautière, Daniel Jared Sinder
  • Patent number: 11410656
    Abstract: The system identifies one or more entities or content items among a plurality of stored information. The system generates an audio file based on a first text string that represents the entity or content item. Based on the first text string and at least one speech criterion, the system generates, using a speech-to-text module, a second text string from the audio file. The system then compares the text strings and stores the second text string if it is not identical to the first text string. The system generates metadata that includes results from text-speech-text conversions to forecast possible misidentifications when responding to voice queries during search operations. The metadata includes alternative representations of the entity.
    Type: Grant
    Filed: July 31, 2019
    Date of Patent: August 9, 2022
    Assignee: ROVI GUIDES, INC.
    Inventors: Ankur Aher, Indranil Coomar Doss, Aashish Goyal, Aman Puniyani, Kandala Reddy, Mithun Umesh
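    The round-trip comparison can be sketched as follows, with hypothetical `tts` and `stt` callables standing in for real text-to-speech and speech-to-text modules:

    ```python
    def build_alias_metadata(entities, tts, stt):
        """For each entity name, run a text -> speech -> text round trip and
        record the transcription as an alternative representation when it
        differs from the original (tts and stt are hypothetical callables)."""
        metadata = {}
        for name in entities:
            heard = stt(tts(name))
            if heard != name:
                metadata.setdefault(name, []).append(heard)
        return metadata

    # Stand-in converters simulating one misrecognized title.
    tts = lambda text: f"<audio:{text}>"
    stt = lambda audio: {"<audio:Amelie>": "Emily"}.get(audio, audio[7:-1])
    print(build_alias_metadata(["Amelie", "Frozen"], tts, stt))
    # {'Amelie': ['Emily']}
    ```

    At query time, matching a voice query against these stored aliases is what lets the system anticipate the misidentification instead of failing the search.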
  • Patent number: 11397784
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving user-specific content, the user-specific content being associated with a user of one or more computer-implemented services, processing the user-specific content using one or more parsers to identify one or more entities and one or more relationships between entities, a parser being specific to a schema, and the one or more entities and the one or more relationships between entities being identified based on the schema, providing one or more user-specific knowledge graphs, a user-specific knowledge graph being specific to the user and including nodes and edges between nodes to define relationships between entities based on the schema, and storing the one or more user-specific knowledge graphs.
    Type: Grant
    Filed: August 14, 2019
    Date of Patent: July 26, 2022
    Assignee: GOOGLE LLC
    Inventors: Pranav Khaitan, Shobha Diwakar
  • Patent number: 11398220
    Abstract: A speech processing method executes at least one of first speech processing and second speech processing. The first speech processing identifies a language based on speech, performs signal processing according to the identified language, and transmits the speech on which the signal processing has been performed, to a far-end-side. The second speech processing identifies a language based on speech, receives the speech from the far-end-side, and performs signal processing on the received speech, according to the identified language.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: July 26, 2022
    Assignee: Yamaha Corporation
    Inventor: Mikio Muramatsu
  • Patent number: 11393457
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
    Type: Grant
    Filed: May 20, 2020
    Date of Patent: July 19, 2022
    Assignee: Google LLC
    Inventors: Samuel Bengio, Mirko Visontai, Christopher Walter George Thornton, Tara N. Sainath, Ehsan Variani, Izhak Shafran, Michiel A. U. Bacchiani
  • Patent number: 11380315
    Abstract: One embodiment of the present invention sets forth a technique for analyzing a transcription of a recording. The technique includes generating features representing transcriptions produced by multiple automatic speech recognition (ASR) engines from voice activity in the recording and a best transcription of the recording produced by an ensemble model from the transcriptions. The technique also includes applying a machine learning model to the features to produce a score representing an accuracy of the best transcription. The technique further includes storing the score in association with the best transcription.
    Type: Grant
    Filed: March 9, 2019
    Date of Patent: July 5, 2022
    Assignee: CISCO TECHNOLOGY, INC.
    Inventors: Ahmad Abdulkader, Mohamed Gamal Mohamed Mahmoud
  • Patent number: 11380312
    Abstract: A system configured to improve wakeword detection. The system may selectively rectify (e.g., attenuate) a portion of an audio signal based on energy statistics corresponding to a keyword (e.g., wakeword). For example, a device may perform echo cancellation to generate isolated audio data, may use the energy statistics to calculate signal quality metric values for a plurality of frequency bands of the isolated audio data, and may select a fixed number of frequency bands (e.g., 5-10%) associated with the lowest signal quality metric values. To detect a specific keyword, the system determines a frequency-dependent threshold corresponding to an expected energy value at each frequency band. During runtime, the device determines signal quality metric values by subtracting residual music from the expected energy values. Thus, the device attenuates only a portion of the total number of frequency bands, namely those that include more energy than expected based on the energy statistics of the wakeword.
    Type: Grant
    Filed: June 20, 2019
    Date of Patent: July 5, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Mohamed Mansour
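    A toy version of the selective rectification step: compute a signal quality metric per band from expected energy statistics, pick a fixed number of worst bands, and attenuate only those. The metric definition and the gain value here are simplifying assumptions, not the patent's formulas.

    ```python
    import numpy as np

    def rectify_bands(band_energy, expected_energy, n_attenuate=2, gain=0.1):
        """Attenuate only the fixed number of frequency bands whose signal
        quality metric (expected minus observed energy) is lowest, i.e. the
        bands with the most energy above what the keyword statistics predict."""
        metric = expected_energy - band_energy          # lower = more residual
        worst = np.argsort(metric)[:n_attenuate]        # fixed number of bands
        out = band_energy.copy()
        out[worst] *= gain
        return out, sorted(worst.tolist())

    expected = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
    observed = np.array([0.9, 3.0, 1.1, 5.0, 0.5])      # bands 1 and 3 too hot
    rectified, attenuated = rectify_bands(observed, expected)
    print(attenuated)  # [1, 3]
    ```

    Because only a small, fixed fraction of bands is touched, most of the wakeword's spectrum reaches the detector unchanged.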
  • Patent number: 11380348
    Abstract: A method for correcting infant crying identification includes the following steps: a detecting step provides an audio unit to detect a sound around an infant to generate a plurality of audio samples. A converting step provides a processing unit to convert the audio samples to generate a plurality of audio spectrograms. An extracting step provides a common model to extract the audio spectrograms to generate a plurality of infant crying features. An incremental training step provides an incremental model to train the infant crying features to generate an identification result. A judging step provides the processing unit to judge whether the identification result is correct according to a real result of the infant. When the identification result is different from the real result, an incorrect result is generated. A correcting step provides the processing unit to correct the incremental model according to the incorrect result.
    Type: Grant
    Filed: August 27, 2020
    Date of Patent: July 5, 2022
    Assignee: NATIONAL YUNLIN UNIVERSITY OF SCIENCE AND TECHNOLOGY
    Inventors: Chuan-Yu Chang, Jun-Ying Li
  • Patent number: 11373653
    Abstract: A method and system for detecting speech using close sensor applications are described, according to some embodiments. In some embodiments, a close microphone is applied to detect sounds with higher muscle or bone transmission components. In some embodiments, a close camera is applied that collects visual information and motion that is correlated with the potential phonemes for such positions and motion. In some embodiments, myography is performed to detect muscle movement. In an earbud form factor embodiment, processing of different channels of close information is performed to improve the accuracy of the recognition.
    Type: Grant
    Filed: January 17, 2020
    Date of Patent: June 28, 2022
    Inventor: Joseph Alan Epstein
  • Patent number: 11361768
    Abstract: A method includes receiving a spoken utterance that includes a plurality of words, and generating, using a neural network-based utterance classifier comprising a stack of multiple Long-Short Term Memory (LSTM) layers, a respective textual representation for each word of the plurality of words of the spoken utterance. The neural network-based utterance classifier is trained on negative training examples of spoken utterances not directed toward an automated assistant server. The method further includes determining, using the respective textual representation generated for each word of the plurality of words of the spoken utterance, whether the spoken utterance is directed toward the automated assistant server or not, and, when the spoken utterance is directed toward the automated assistant server, generating instructions that cause the automated assistant server to generate a response to the spoken utterance.
    Type: Grant
    Filed: July 21, 2020
    Date of Patent: June 14, 2022
    Assignee: Google LLC
    Inventors: Nathan David Howard, Gabor Simko, Maria Carolina Parada San Martin, Ramkarthik Kalyanasundaram, Guru Prakash Arumugam, Srinivas Vasudevan
  • Patent number: 11354459
    Abstract: A synthetic world interface may be used to model digital environments, sensors, and motions for the evaluation, development, and improvement of computer vision and speech algorithms. A synthetic data cloud service with a library of sensor primitives, motion generators, and environments with procedural and game-like capabilities, facilitates engineering design for a manufactural solution that has computer vision and speech capabilities. In some embodiments, a sensor platform simulator operates with a motion orchestrator, an environment orchestrator, an experiment generator, and an experiment runner to test various candidate hardware configurations and computer vision and speech algorithms in a virtual environment, advantageously speeding development and reducing cost. Thus, examples disclosed herein may relate to virtual reality (VR) or mixed reality (MR) implementations.
    Type: Grant
    Filed: September 21, 2018
    Date of Patent: June 7, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Michael Ebstyne, Pedro Urbina Escos, Yuri Pekelny, Jonathan Chi Hang Chan, Emanuel Shalev, Alex Kipman, Mark Flick
  • Patent number: 11341242
    Abstract: Disclosed is a computer-implemented method for malware detection that analyzes a file on a per-packet basis. The method receives a packet of one or more packets associated with a file, converts the binary content associated with the packet into a digital representation, and tokenizes the plain-text content associated with the packet. The method extracts one or more n-gram features, an entropy feature, and a domain feature from the converted content of the packet and applies a trained machine learning model to the one or more features extracted from the packet. The output of the machine learning model is a probability of maliciousness associated with the received packet. If the probability of maliciousness is above a threshold value, the method determines that the file associated with the received packet is malicious.
    Type: Grant
    Filed: October 12, 2020
    Date of Patent: May 24, 2022
    Assignee: Zscaler, Inc.
    Inventors: Huihsin Tseng, Hao Xu, Jian L. Zhen
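    Two of the per-packet features named above, byte n-grams and Shannon entropy, can be sketched directly (the domain feature and the trained model are omitted; high entropy often signals packed or encrypted payloads):

    ```python
    import math
    from collections import Counter

    def packet_features(payload: bytes, n=2):
        """Extract toy per-packet features: byte-level n-gram counts and the
        Shannon entropy (in bits per byte) of the payload."""
        counts = Counter(payload)
        total = len(payload)
        entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
        ngrams = Counter(payload[i:i + n] for i in range(total - n + 1))
        return {"entropy": entropy, "ngrams": ngrams}

    feats = packet_features(b"abab")
    print(round(feats["entropy"], 2))   # 1.0 (two symbols, equally likely)
    print(feats["ngrams"][b"ab"])       # 2
    ```

    A real pipeline would vectorize these counts and feed them, with the other features, to the trained classifier that emits the maliciousness probability.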
  • Patent number: 11343632
    Abstract: The invention relates to a method for broadcasting a spatialized audio stream to terminals of spectators attending a sports event. The method comprises acquiring a plurality of audio streams constituting a soundscape. The soundscape is analyzed by a server in order to spatialize the audio streams and play them back on the terminals, depending both on the localization of the audio streams and on the positions of the spectators.
    Type: Grant
    Filed: September 29, 2020
    Date of Patent: May 24, 2022
    Assignee: INSTITUT MINES TELECOM
    Inventors: Raphael Blouet, Slim Essid
  • Patent number: 11315550
    Abstract: A speaker recognition device according to the present disclosure includes: an acoustic feature calculator that calculates, from utterance data indicating a voice of an obtained utterance, acoustic feature of the voice of the utterance; a statistic calculator that calculates an utterance data statistic from the calculated acoustic feature; a speaker feature extractor that extracts speaker feature of a speaker of the utterance data from the calculated utterance data statistic using a deep neural network (DNN); a similarity calculator that calculates a similarity between the extracted speaker feature and pre-stored speaker feature of at least one registered speaker; and a speaker recognizer that recognizes the speaker of the utterance data based on the calculated similarity.
    Type: Grant
    Filed: November 13, 2019
    Date of Patent: April 26, 2022
    Assignee: PANASONIC INTELLECTUAL PROPERTY CORPORATION OF AMERICA
    Inventors: Kousuke Itakura, Ko Mizuno, Misaki Doi
  • Patent number: 11302303
    Abstract: A method and device for training an acoustic model are provided. The method comprises determining a plurality of tasks for training an acoustic model, obtaining resource occupancies of nodes participating in the training of the acoustic model, and distributing the tasks to the nodes according to the resource occupancies of the nodes and complexities of the tasks. By using computational resources distributed at multiple nodes, tasks for training an acoustic model are performed in parallel in a distributed manner, so as to improve training efficiency.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: April 12, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Yunfeng Li, Qingchang Hao, Yutao Gai, Chenxi Sun, Zhiping Zhou
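    One plausible reading of distributing tasks "according to the resource occupancies of the nodes and complexities of the tasks" is a greedy longest-task-first assignment; the patent does not specify this policy, so treat the sketch below as an assumption:

    ```python
    def distribute(tasks, occupancy):
        """Greedily assign the most complex tasks first, each to the node whose
        projected load (current occupancy + assigned complexity) is lowest."""
        load = dict(occupancy)
        assignment = {}
        for name, complexity in sorted(tasks.items(), key=lambda t: -t[1]):
            node = min(load, key=load.get)
            assignment[name] = node
            load[node] += complexity
        return assignment

    # Hypothetical training tasks (name -> complexity) and node occupancies.
    tasks = {"lstm": 8, "fbank": 2, "ctc": 5}
    occupancy = {"node-a": 0.5, "node-b": 0.0}
    print(distribute(tasks, occupancy))
    # {'lstm': 'node-b', 'ctc': 'node-a', 'fbank': 'node-a'}
    ```

    Balancing projected load this way keeps all nodes busy, which is the efficiency gain the abstract claims for the distributed training.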
  • Patent number: 11295203
    Abstract: Neuron placement in a neuromorphic system to minimize cumulative delivery delay is provided. In some embodiments, a neural network description describing a plurality of neurons is read. A relative delivery delay associated with each of the plurality of neurons is determined. An ordering of the plurality of neurons is determined to optimize cumulative delivery delay over the plurality of neurons. An optimized neural network description based on the ordering of the plurality of neurons is written.
    Type: Grant
    Filed: July 27, 2016
    Date of Patent: April 5, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Rodrigo Alvarez-Icaza, Pallab Datta, Jeffrey A. Kusnitz
  • Patent number: 11289092
    Abstract: A method, system and computer program product for editing a text using speech recognition includes receiving, by a computer, a first voice input from a user comprising a first target word. The computer identifies instances of the first target word within the text and assigns a first numerical indicator to each instance of the first target word within the text. A selection is received from the user including the first numerical indicator corresponding to a starting point of a selection area. The computer receives a second voice input from the user including a second target word, identifies instances of the second target word within the text, assigns a second numerical indicator to each instance of the second target word, and receives a selection from the user including the second numerical indicator corresponding to an ending point of the selection area.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: March 29, 2022
    Assignee: International Business Machines Corporation
    Inventors: JunXing Yang, XueJun Zhong, Wei Sun, ZhiXia Wang
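    The numbering step is easy to sketch: every instance of the target word gets a numerical indicator the user can then speak to pick a selection endpoint. Tokenizing by whitespace is a simplifying assumption.

    ```python
    def number_instances(text, target):
        """Assign a numerical indicator to each instance of the target word,
        returning (indicator, word-index) pairs in reading order."""
        words = text.lower().split()
        return [(i + 1, pos) for i, pos in
                enumerate(p for p, w in enumerate(words) if w == target.lower())]

    text = "the cat saw the dog chase the ball"
    print(number_instances(text, "the"))  # [(1, 0), (2, 3), (3, 6)]
    ```

    A selection area then spans from the word picked via the first spoken indicator to the word picked via the second (e.g. indicators 1 and 2 select words 0 through 3).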
  • Patent number: 11282535
    Abstract: Disclosed is an electronic apparatus. The electronic apparatus includes a storage for storing a plurality of filters trained in a plurality of convolutional neural networks (CNNs) respectively and a processor configured to acquire a first spectrogram corresponding to a damaged audio signal, input the first spectrogram to a CNN corresponding to each frequency band to apply the plurality of filters trained in the plurality of CNNs respectively, acquire a second spectrogram by merging output values of the CNNs to which the plurality of filters are applied, and acquire an audio signal reconstructed based on the second spectrogram.
    Type: Grant
    Filed: July 19, 2018
    Date of Patent: March 22, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ki-Hyun Choo, Anton Porov, Jong-Hoon Jeong, Ho-Sang Sung, Eun-Mi Oh, Jong-Youb Ryu
  • Patent number: 11263516
    Abstract: Methods and systems for training a neural network include identifying weights in a neural network between a final hidden neuron layer and an output neuron layer that correspond to state matches between a neuron of the final hidden neuron layer and a respective neuron of the output neuron layer. The identified weights are initialized to a predetermined non-zero value, and the other weights between the final hidden neuron layer and the output neuron layer are initialized to zero. The neural network is trained based on a training corpus after initialization.
    Type: Grant
    Filed: August 2, 2016
    Date of Patent: March 1, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Gakuto Kurata
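    The initialization scheme can be sketched directly; which (hidden, output) pairs count as state matches is task-specific, so the `matches` list below is hypothetical:

    ```python
    import numpy as np

    def init_output_weights(hidden_size, output_size, matches, value=1.0):
        """Initialize final-hidden-to-output weights: positions listed in
        `matches` (hidden neuron, output neuron state matches) get a fixed
        non-zero value; every other weight starts at zero."""
        w = np.zeros((hidden_size, output_size))
        for h, o in matches:
            w[h, o] = value
        return w

    # Hypothetical state matches between 4 hidden neurons and 3 output neurons.
    w = init_output_weights(4, 3, matches=[(0, 0), (2, 1), (3, 2)])
    print(int(np.count_nonzero(w)))  # 3
    ```

    Training then proceeds normally from this sparse starting point rather than from a random one.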
  • Patent number: 11256973
    Abstract: A neural network embodiment comprises an input layer, an output layer and a filter layer. Each unit of the filter layer receives a filter layer input from a single preceding unit via a respective filter layer input connection. Each filter layer input connection is coupled to a different single preceding unit. The filter layer incentivizes the neural network to learn to produce a target output from the output layer for a given input to the input layer while simultaneously learning weights for each filter layer input connection. The weights learned cause the filter layer to reduce a number of filter layer units that pass respective filter layer inputs as non-zero values. When applied as an initial internal layer between an input layer and an output layer, the filter layer incentivizes the neural network to learn which neural network input features to discard to produce the target output.
    Type: Grant
    Filed: February 5, 2018
    Date of Patent: February 22, 2022
    Assignee: Nuance Communications, Inc.
    Inventors: Nasr Madi, Neil D. Barrett
  • Patent number: 11257483
    Abstract: Spoken language understanding techniques include training a dynamic neural network mask relative to a static neural network using only post-deployment training data such that the mask zeroes out some of the weights of the static neural network and allows some other weights to pass through and applying a dynamic neural network corresponding to the masked static neural network to input queries to identify outputs for the queries.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: February 22, 2022
    Assignee: Intel Corporation
    Inventors: Krzysztof Czarnowski, Munir Georges
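The masking operation itself reduces to an elementwise product; a minimal sketch, assuming a binary mask over a dense weight matrix (how the mask is trained on post-deployment data is not shown here).

```python
import numpy as np

def apply_mask(static_weights, mask):
    """Elementwise binary mask: zeroes some static weights and lets the
    others pass through unchanged, yielding the dynamic network."""
    return static_weights * mask

W = np.array([[0.2, -0.7],
              [1.1,  0.4]])
M = np.array([[1, 0],
              [0, 1]])            # mask learned from post-deployment data
W_dyn = apply_mask(W, M)
```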
  • Patent number: 11252152
    Abstract: An online system authenticates a user through a voiceprint biometric verification process. When a user needs to be authenticated, the online system generates and provides a random phrase to the user. The online system receives an audio recording of the randomly generated phrase and retrieves a previously trained voiceprint model for the user. The online system analyzes the audio recording by applying the voiceprint model to determine whether the audio recording satisfies a first criterion of whether the voice in the audio recording belongs to the user and a second criterion of whether the audio recording includes a vocalization of the randomly generated phrase. If the audio recording satisfies both criteria, the online system authenticates the user. Therefore, the user can be provided access to a new communication session in response to being authenticated.
    Type: Grant
    Filed: June 3, 2020
    Date of Patent: February 15, 2022
    Assignee: salesforce.com, inc.
    Inventor: Eugene Lew
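The two-criteria decision can be sketched in a few lines; a minimal sketch, assuming the voiceprint model and phrase verifier each return a score in [0, 1] (the thresholds and names are illustrative assumptions).

```python
def authenticate(voice_score, phrase_score,
                 voice_thresh=0.8, phrase_thresh=0.8):
    """Grant access only when both criteria hold: the voice matches the
    enrolled voiceprint AND the recording vocalizes the random phrase."""
    return voice_score >= voice_thresh and phrase_score >= phrase_thresh
```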
  • Patent number: 11244668
    Abstract: A method for generating speech animation from an audio signal includes: receiving the audio signal; transforming the received audio signal into frequency-domain audio features; performing neural-network processing on the frequency-domain audio features to recognize phonemes, wherein the neural-network processing is performed using a neural network trained with a phoneme dataset comprising audio signals with corresponding ground-truth phoneme labels; and generating the speech animation from the recognized phonemes.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: February 8, 2022
    Assignee: TCL RESEARCH AMERICA INC.
    Inventors: Zixiao Yu, Haohong Wang
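The first step, turning the time-domain signal into frequency-domain features, can be sketched with a framed FFT; a minimal sketch, assuming per-frame magnitude spectra as the features (frame length and hop are illustrative choices, not from the patent).

```python
import numpy as np

def frequency_features(signal, frame_len=256, hop=128):
    """Transform a time-domain audio signal into per-frame magnitude
    spectra: the frequency-domain features fed to the phoneme network."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)

sig = np.sin(2 * np.pi * 440 * np.arange(1024) / 16000)  # 440 Hz tone
feats = frequency_features(sig)   # one feature vector per frame
```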
  • Patent number: 11244671
    Abstract: A model training method and apparatus is disclosed, where the model training method acquires first output data of a student model for first input data and second output data of a teacher model for second input data and trains the student model such that the first output data and the second output data are not distinguished from each other. The student model and the teacher model have different structures.
    Type: Grant
    Filed: August 23, 2019
    Date of Patent: February 8, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hogyeong Kim, Hyohyeong Kang, Hwidong Na, Hoshik Lee
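Training the student so its outputs cannot be told apart from the teacher's requires a matching criterion; a minimal sketch, assuming KL divergence between the two output distributions as that criterion (the patent does not specify the loss, so this is an illustrative stand-in).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def matching_loss(student_logits, teacher_logits):
    """KL divergence from the teacher's output distribution to the
    student's; minimizing it drives the two outputs to be
    indistinguishable, even though the models differ in structure."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return float(np.sum(p * np.log(p / q)))
```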
  • Patent number: 11232141
    Abstract: A method for processing an electronic document comprising text is disclosed. The method comprises: splitting the text into at least one sentence, and for each said sentence: associating each word of the sentence with a word-vector; representing the sentence by a sentence-vector, wherein obtaining the sentence-vector comprises computing a weighted average of all word-vectors associated with the sentence; if it is determined that the sentence-vector is associated with a tag in a data set of sentence-vectors associated with tags, obtaining the tag from the data set; otherwise, obtaining a tag for the sentence-vector using a classification algorithm; processing the sentence if the tag obtained for the sentence is associated with a predetermined label.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: January 25, 2022
    Inventors: Youness Mansar, Sira Ferradans, Jacopo Staiano
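The sentence-vector and tag-lookup steps can be sketched directly; a minimal sketch, assuming exact-match lookup against the stored vectors and an arbitrary classifier as the fallback (all names are illustrative).

```python
import numpy as np

def sentence_vector(word_vectors, weights):
    """Represent a sentence as the weighted average of its word-vectors."""
    w = np.asarray(weights, dtype=float)
    return (np.asarray(word_vectors) * w[:, None]).sum(axis=0) / w.sum()

def tag_for(vec, tagged_vectors, classify, tol=1e-6):
    """Look the sentence-vector up in the tagged data set; fall back to
    the classification algorithm when no stored vector matches."""
    for stored, tag in tagged_vectors:
        if np.linalg.norm(vec - stored) < tol:
            return tag
    return classify(vec)

vecs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
sv = sentence_vector(vecs, [3.0, 1.0])   # weighted average of word-vectors
```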
  • Patent number: 11227611
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining hotword suitability. In one aspect, a method includes receiving speech data that encodes a candidate hotword spoken by a user, evaluating the speech data or a transcription of the candidate hotword using one or more predetermined criteria, generating a hotword suitability score for the candidate hotword based on that evaluation, and providing a representation of the hotword suitability score for display to the user.
    Type: Grant
    Filed: June 3, 2020
    Date of Patent: January 18, 2022
    Assignee: Google LLC
    Inventors: Andrew E. Rubin, Johan Schalkwyk, Maria Carolin Parada San Martin
  • Patent number: 11227626
    Abstract: An audio response system can generate multimodal messages that can be dynamically updated on a viewer's client device based on the type of audio response detected. The audio responses can include keywords or continuum-based signals (e.g., levels of wind noise). A machine learning scheme can be trained to output classification data from the audio response data for content selection and dynamic display updates.
    Type: Grant
    Filed: May 21, 2019
    Date of Patent: January 18, 2022
    Assignee: Snap Inc.
    Inventors: Gurunandan Krishnan Gorumkonda, Shree K. Nayar
  • Patent number: 11222641
    Abstract: A speaker recognition device includes: a feature calculator that calculates two or more acoustic features of a voice of an utterance obtained; a similarity calculator that calculates two or more similarities, each being a similarity between one of one or more speaker-specific features of a target speaker for recognition and one of the two or more acoustic features; a combination unit that combines the two or more similarities to obtain a combined value; and a determiner that determines whether a speaker of the utterance is the target speaker based on the combined value. Here, (i) at least two of the two or more acoustic features have different properties, (ii) at least two of the two or more similarities have different properties, or (iii) at least two of the two or more acoustic features have different properties and at least two of the two or more similarities have different properties.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: January 11, 2022
    Assignee: PANASONIC INTELLECTUAL PROPERTY CORPORATION OF AMERICA
    Inventor: Kousuke Itakura
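The combine-and-threshold decision can be sketched as follows; a minimal sketch, assuming cosine similarity per feature and the mean as the combination rule (the patent leaves both choices open, so these are illustrative assumptions).

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_target_speaker(acoustic_feats, speaker_feats, threshold=0.7):
    """Compute one similarity per acoustic feature against the enrolled
    speaker-specific features, combine them (here: mean), threshold."""
    sims = [cosine(a, s) for a, s in zip(acoustic_feats, speaker_feats)]
    combined = sum(sims) / len(sims)
    return combined >= threshold, combined

enrolled = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # two feature types
observed = [np.array([0.9, 0.1]), np.array([0.1, 0.8])]
```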
  • Patent number: 11216069
    Abstract: Systems and methods for using neuromuscular information to improve speech recognition. The system includes a plurality of neuromuscular sensors arranged on one or more wearable devices and configured to continuously record a plurality of neuromuscular signals from a user, at least one storage device configured to store one or more trained statistical models for determining text based on audio input and the plurality of neuromuscular signals, at least one input interface configured to receive the audio input, and at least one computer processor programmed to obtain the audio input and the plurality of neuromuscular signals, provide as input to the one or more trained statistical models, the audio input and the plurality of neuromuscular signals or signals derived from the plurality of neuromuscular signals, and determine based, at least in part, on an output of the one or more trained statistical models, the text.
    Type: Grant
    Filed: May 8, 2018
    Date of Patent: January 4, 2022
    Assignee: Facebook Technologies, LLC
    Inventors: Adam Berenzweig, Patrick Kaifosh, Alan Huan Du, Jeffrey Scott Seely
  • Patent number: 11217229
    Abstract: A speech recognition method and apparatus, a computer device, and an electronic device for recognizing speech are provided. The method includes receiving an audio signal obtained by a microphone array; performing a beamforming processing on the audio signal in a plurality of target directions to obtain a plurality of beam signals; performing a speech recognition on each of the plurality of beam signals to obtain a plurality of speech recognition results corresponding to the plurality of beam signals; and determining a speech recognition result of the audio signal based on the plurality of speech recognition results of the plurality of beam signals.
    Type: Grant
    Filed: July 6, 2020
    Date of Patent: January 4, 2022
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LTD
    Inventors: Yi Gao, Ji Meng Zheng, Meng Yu, Min Luo
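The beam-then-recognize loop can be sketched with delay-and-sum beamforming; a minimal sketch, assuming integer sample delays per direction and a `recognize` callable returning `(text, confidence)` — all assumptions, since the patent specifies neither the beamformer nor the selection rule.

```python
import numpy as np

def recognize_with_beams(mic_signals, steering_delays, recognize):
    """Delay-and-sum beamform toward each target direction, run the
    recognizer on every beam signal, and keep the highest-confidence
    recognition result as the result for the audio signal."""
    best_text, best_conf = None, -np.inf
    for delays in steering_delays:          # one delay set per direction
        beam = sum(np.roll(sig, d) for sig, d in zip(mic_signals, delays))
        text, conf = recognize(beam)
        if conf > best_conf:
            best_text, best_conf = text, conf
    return best_text, best_conf
```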
  • Patent number: 11205420
    Abstract: A system and method performs wakeword detection using a neural network model that includes a recurrent neural network (RNN) for processing variable-length wakewords. To prevent the model from being influenced by non-wakeword speech, multiple instances of the model are created to process audio data, and each instance is configured to use weights determined by training data. The model may instead or in addition be used to process the audio data only when a likelihood that the audio data corresponds to the wakeword is greater than a threshold. The model may process the audio data as represented by groups of acoustic feature vectors; computations for feature vectors common to different groups may be re-used.
    Type: Grant
    Filed: June 10, 2019
    Date of Patent: December 21, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Gengshen Fu, Thibaud Senechal, Shiv Naga Prasad Vitaladevuni, Michael J. Rodehorst, Varun K. Nagaraja
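The re-use of computations for feature vectors shared between overlapping groups amounts to memoization; a minimal sketch of just that idea (the RNN itself is not modeled, and all names are illustrative).

```python
def cached_scores(feature_groups, score_vector, cache=None):
    """Score groups of acoustic feature vectors, re-using per-vector
    computations that are common to different (overlapping) groups."""
    cache = {} if cache is None else cache
    results = []
    for group in feature_groups:
        total = 0.0
        for fv in group:
            if fv not in cache:
                cache[fv] = score_vector(fv)   # computed once per vector
            total += cache[fv]
        results.append(total)
    return results, cache
```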
  • Patent number: 11182566
    Abstract: A computer-implemented method for training a neural network that is configured to generate a score distribution over a set of multiple output positions. The neural network is configured to process a network input to generate a respective score distribution for each of a plurality of output positions including a respective score for each token in a predetermined set of tokens that includes n-grams of multiple different sizes. Example methods described herein provide trained neural networks that produce results with improved accuracy compared to the state of the art, e.g. more accurate translations or more accurate speech recognition.
    Type: Grant
    Filed: October 3, 2017
    Date of Patent: November 23, 2021
    Assignee: Google LLC
    Inventors: Navdeep Jaitly, Yu Zhang, Quoc V. Le, William Chan
  • Patent number: 11176926
    Abstract: Provided is a speech recognition apparatus. The apparatus includes a preprocessor configured to extract select frames from all frames of a first speech of a user, and a score calculator configured to calculate an acoustic score of a second speech, made up of the extracted select frames, by using a Deep Neural Network (DNN)-based acoustic model, and to calculate an acoustic score of frames, of the first speech, other than the select frames based on the calculated acoustic score of the second speech.
    Type: Grant
    Filed: February 20, 2020
    Date of Patent: November 16, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: In Chul Song
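Scoring only selected frames and deriving the rest can be sketched as follows; a minimal sketch, assuming each skipped frame copies the score of its nearest scored neighbour — the patent's actual derivation rule may differ, and the scoring function stands in for the DNN acoustic model.

```python
def score_all_frames(frames, select_idx, acoustic_score):
    """Run the acoustic model only on the selected frames, then fill in
    each remaining frame's score from its nearest scored neighbour
    instead of evaluating the model on every frame."""
    scores = {i: acoustic_score(frames[i]) for i in select_idx}
    out = []
    for i in range(len(frames)):
        nearest = min(select_idx, key=lambda j: abs(j - i))
        out.append(scores[nearest])
    return out
```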
  • Patent number: 11169531
    Abstract: Techniques are discussed for determining predicted trajectories based on a top-down representation of an environment. Sensors of a first vehicle can capture sensor data of an environment, which may include agent(s) separate from the first vehicle, such as a second vehicle or a pedestrian. A multi-channel image representing a top-down view of the agent(s) and the environment and comprising semantic information can be generated based on the sensor data. Semantic information may include a bounding box and velocity information associated with the agent, map data, and other semantic information. Multiple images can be generated representing the environment over time. The image(s) can be input into a prediction system configured to output a heat map comprising prediction probabilities associated with possible locations of the agent in the future. A predicted trajectory can be generated based on the prediction probabilities and output to control an operation of the first vehicle.
    Type: Grant
    Filed: October 4, 2018
    Date of Patent: November 9, 2021
    Assignee: Zoox, Inc.
    Inventors: Xi Joey Hong, Benjamin John Sapp
  • Patent number: 11170783
    Abstract: Multi-agent input coordination can be used for acoustic collaboration of multiple listening agents deployed in smart devices on a premises, improving the accuracy of identifying requests and specifying where that request should be honored, improving quality of detection, and providing better understanding of user commands and user intent throughout the premises. A processor or processors such as those in a smart speaker can identify audio requests received through at least two agents in a network and determine at which of the agents to actively process a selected audio request. The identification can make use of techniques such as location context and secondary trait analysis. The audio request can include simultaneous audio requests received through at least two agents, differing audio requests received from different requesters, or both.
    Type: Grant
    Filed: April 16, 2019
    Date of Patent: November 9, 2021
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: James Pratt, Timothy Innes, Eric Zavesky, Nigel Bradley
  • Patent number: 11164574
    Abstract: One embodiment provides a method, including: obtaining a plurality of conversational logs; generating a human agent emulator and a user emulator; providing a workspace for a conversational agent, so that an agent designer generates a conversational specification for the conversational agent, wherein the generating a conversational specification comprises: receiving a selection, by the agent designer, of at least one intent for the conversational agent, wherein the receiving a selection is responsive to the conversational agent workspace providing suggestions for intents; providing at least one suggestion for a dialog node that corresponds to the selected at least one intent; and generating a dialog flow for the conversational agent, wherein the generating comprises iteratively receiving, from the agent designer, selection of at least one aspect and receiving at least one selection of the at least one suggestion for dialog nodes; and providing the conversational agent.
    Type: Grant
    Filed: January 3, 2019
    Date of Patent: November 2, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pankaj Dhoolia, Ajay Kumar Gupta, Danish Contractor, Dinesh Raghu, Sachindra Joshi, Vineet Kumar, Dhiraj Madan
  • Patent number: 11157795
    Abstract: Graph partitioning and placement for multi-chip neurosynaptic networks. According to various embodiments, a neural network description is read. The neural network description describes a plurality of neurons. The plurality of neurons has a mapping from an input domain of the neural network. The plurality of neurons is labeled based on the mapping from the input domain. The plurality of neurons is grouped into a plurality of groups according to the labeling. Each of the plurality of groups is continuous within the input domain. Each of the plurality of groups is assigned to at least one neurosynaptic core.
    Type: Grant
    Filed: March 13, 2017
    Date of Patent: October 26, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Arnon Amir, Pallab Datta, Myron D. Flickner, Dharmendra S. Modha, Tapan K. Nayak
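The label-group-assign pipeline can be sketched as follows; a minimal sketch, assuming each neuron already carries an input-domain label and that whole groups are assigned to cores round-robin (the patent's placement strategy is not specified here, so that rule is an illustrative assumption).

```python
def group_and_assign(neuron_labels, cores):
    """Group neurons by their input-domain label, so each group stays
    contiguous within the input domain, then assign each group to a
    neurosynaptic core (round-robin over the available cores)."""
    groups = {}
    for neuron, label in neuron_labels.items():
        groups.setdefault(label, []).append(neuron)
    assignment = {label: cores[i % len(cores)]
                  for i, label in enumerate(sorted(groups))}
    return groups, assignment
```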