Patents Examined by Nandini Subramani
-
Patent number: 11984135
Abstract: A system and method for offline embedded abnormal sound fault detection are disclosed, the system comprising a sound acquisition module, a sound audio feature extraction module, and a neural network module. The sound audio feature extraction module uses a fast Fourier transform to process sample data in the frequency domain, and then inputs the sample data to the neural network module to complete anomaly classification. The neural network module comprises at least one CNN feature extraction layer, a long short-term memory (LSTM) layer, at least one fully connected layer, at least one classification layer, and a trigger decision layer. The number of network layers of the at least one CNN feature extraction layer is dynamically adjustable, the network structure of the at least one fully connected layer and the at least one classification layer is dynamically variable, and the trigger decision layer is configured to eliminate generalization errors generated by the neural network.
Type: Grant
Filed: March 2, 2021
Date of Patent: May 14, 2024
Assignee: Espressif Systems (Shanghai) Co., Ltd.
Inventor: Wangwang Wang
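As a rough illustration of the two stages this abstract separates — a frequency-domain front end and a trigger decision layer that suppresses isolated misclassifications — here is a minimal pure-Python sketch. The naive DFT, the score threshold, and the consecutive-frame debouncing rule are illustrative assumptions, not details from the patent:

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitude spectrum (a stand-in for the FFT front end)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2)]

def trigger_decision(frame_scores, threshold=0.8, min_consecutive=3):
    """Raise a fault only after several consecutive anomalous frames,
    filtering out isolated misclassifications from the network."""
    run = 0
    for score in frame_scores:
        run = run + 1 if score > threshold else 0
        if run >= min_consecutive:
            return True
    return False

scores = [0.2, 0.9, 0.95, 0.91, 0.3]
print(trigger_decision(scores))  # three consecutive frames above 0.8 -> True
```

A real deployment would feed the spectrum into the CNN/LSTM stack; here the per-frame anomaly scores are simply given.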
-
Patent number: 11978433
Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of a short-time Fourier transform (STFT), Mel-frequency cepstral coefficients (MFCC), and a filter bank representation derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
Type: Grant
Filed: June 22, 2021
Date of Patent: May 7, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Felix Weninger, Marco Gaudesi, Ralf Leibold, Puming Zhan
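The selection logic — fall back to whichever input is present, otherwise pick the encoder the selection layer scores higher — can be sketched as follows. The `score_fn` argument and the mean-energy toy score stand in for the trained selection layer and are assumptions for illustration:

```python
def select_encoder(close_feats, far_feats, score_fn):
    """Pick the encoder whose input the selection layer scores higher;
    fall back when only one input mechanism produced a signal."""
    if close_feats is None:
        return "far"
    if far_feats is None:
        return "close"
    return "close" if score_fn(close_feats) >= score_fn(far_feats) else "far"

# Toy score: mean feature energy as a stand-in for selection confidence.
def energy(feats):
    return sum(x * x for x in feats) / len(feats)

print(select_encoder([0.9, 0.8], [0.1, 0.2], energy))  # -> close
```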
-
Patent number: 11966711
Abstract: Embodiments of the present disclosure relate to a solution for translation verification and correction. According to the solution, a neural network is trained to determine an association degree among a group of words in a source or target language. The neural network can be used for translation verification and correction. According to the solution, a group of words in a source language and translations of the group of words in a target language are obtained. An association degree among the group of words and an association degree among the translations can be determined by using the trained neural network. Then, whether there is a wrong translation can be determined based on the association degrees. In some embodiments, corresponding methods, systems and computer program products are provided.
Type: Grant
Filed: May 18, 2021
Date of Patent: April 23, 2024
Assignee: International Business Machines Corporation
Inventors: Guang Ming Zhang, Xiaoyang Yang, Hong Wei Jia, Mo Chi Liu, Yun Wang
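The final comparison step — flagging a candidate mistranslation when the target-language association degree diverges from the source-language one — reduces to a simple check. The tolerance value is an illustrative assumption, not from the patent (the network producing the association degrees is assumed to exist upstream):

```python
def flag_wrong_translation(src_assoc, tgt_assoc, tolerance=0.3):
    """Flag a candidate mistranslation when the association degree of the
    target-language word group diverges from that of the source-language
    group by more than a tolerance (tolerance is an assumed value)."""
    return abs(src_assoc - tgt_assoc) > tolerance

print(flag_wrong_translation(0.85, 0.20))  # large gap -> True
```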
-
Patent number: 11961510
Abstract: According to one embodiment, an information processing apparatus includes the following units. The acquisition unit acquires first training data including a combination of a voice feature quantity and a correct phoneme label of the voice feature quantity. The training unit trains an acoustic model using the first training data so as to output the correct phoneme label in response to input of the voice feature quantity. The extraction unit extracts from the first training data second training data including voice feature quantities of at least one of a keyword, a sub-word, a syllable, or a phoneme included in the keyword. The adaptation processing unit adapts the trained acoustic model to a keyword detection model using the second training data.
Type: Grant
Filed: February 28, 2020
Date of Patent: April 16, 2024
Assignee: KABUSHIKI KAISHA TOSHIBA
Inventors: Ning Ding, Hiroshi Fujimura
-
Patent number: 11900922
Abstract: Embodiments of the present invention provide computer implemented methods, computer program products and computer systems. For example, embodiments of the present invention can access one or more intents and associated entities from a limited amount of speech to text training data in a single language. Embodiments of the present invention can use the accessed one or more intents and associated entities to locate speech to text training data in one or more other languages different from the single language. Embodiments of the present invention can then train a neural network based on the limited amount of speech to text training data in the single language and the located speech to text training data in the one or more other languages.
Type: Grant
Filed: November 10, 2020
Date of Patent: February 13, 2024
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Hong-Kwang Kuo, Kartik Audhkhasi, Michael Alan Picheny
-
Patent number: 11881211
Abstract: Disclosed are an electronic device and a method of controlling the electronic device. An electronic device according to an embodiment may perform a method comprising: performing natural language understanding for a first text included in learning data, obtaining first information associated with a speech corresponding to the first text being uttered based on a result of the natural language understanding, obtaining second information associated with an acoustic feature corresponding to the speech corresponding to the first text being uttered based on the first information, obtaining a plurality of speech signals corresponding to the first text by converting a first speech signal corresponding to the first text based on the first information and the second information, and training a speech recognition model based on the plurality of obtained speech signals and the first text.
Type: Grant
Filed: March 2, 2021
Date of Patent: January 23, 2024
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Changwoo Han, Kwangyoun Kim, Chanwoo Kim, Kyungmin Lee, Youngho Han
-
Patent number: 11860684
Abstract: A first named entity recognition (NER) system may be adapted to create a second NER system that is able to recognize a new named entity using few-shot learning. The second NER system may process support tokens that provide one or more examples of the new named entity and may process input tokens that may contain the new named entity. The second NER system may use a classifier of the first NER system to compute support token embeddings from the support tokens and input token embeddings from the input tokens. The second NER system may then recognize the new named entity in the input tokens using abstract tag transition probabilities and/or distances between the support token embeddings and the input token embeddings.
Type: Grant
Filed: September 17, 2020
Date of Patent: January 2, 2024
Assignee: ASAPP, INC.
Inventors: Yi Yang, Arzoo Katiyar
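The distance-based part of this scheme — tagging an input token with the tag of its nearest support-token embedding — can be sketched in a few lines. The cosine metric and the toy two-dimensional embeddings are illustrative assumptions; the patent's transition-probability component is omitted:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_support_tag(input_emb, support):
    """Tag an input token with the tag of the closest support-token
    embedding. `support` maps tag -> list of few-shot example embeddings."""
    best_tag, best_sim = None, -2.0
    for tag, embeddings in support.items():
        for emb in embeddings:
            sim = cosine(input_emb, emb)
            if sim > best_sim:
                best_tag, best_sim = tag, sim
    return best_tag

support = {"DRUG": [[1.0, 0.1]], "O": [[0.0, 1.0]]}
print(nearest_support_tag([0.9, 0.2], support))  # -> DRUG
```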
-
Patent number: 11848006
Abstract: A method of processing an electrical signal transduced from a voice signal is disclosed. A classification model is applied to the electrical signal to produce a classification indicator. The classification model has been trained using an augmented training dataset. The electrical signal is classified as one of a first class and a second class in a binary classification, the classification being performed as a function of the classification indicator. A trigger signal is provided to a user circuit as a result of the electrical signal being classified in the first class of the binary classification.
Type: Grant
Filed: August 24, 2020
Date of Patent: December 19, 2023
Assignee: STMicroelectronics S.r.l.
Inventors: Nunziata Ivana Guarneri, Filippo Naccari
-
Patent number: 11830478
Abstract: A learning device calculates a feature of each data included in a pair of datasets in which two modalities among a plurality of modalities are combined, using a model that receives data on a corresponding modality among the modalities and outputs a feature obtained by mapping the received data into an embedding space. The learning device then selects similar data similar to each target data that is data on a first modality in a first dataset of the datasets, from data on a second modality included in a second dataset of the datasets. The learning device further updates a parameter of the model such that the features of the data in the pair included in the first and the second datasets are similar to one another, and the feature of data paired with the target data is similar to the feature of data paired with the similar data.
Type: Grant
Filed: April 1, 2021
Date of Patent: November 28, 2023
Assignees: NIPPON TELEGRAPH AND TELEPHONE CORPORATION, MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Inventors: Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, James R. Glass, David Harwath
-
Patent number: 11823669
Abstract: According to one embodiment, an information processing apparatus includes the following units. The first acquisition unit acquires speech data including frames. The second acquisition unit acquires a model trained to, upon input of a feature amount extracted from the speech data, output information indicative of the likelihood of each of a plurality of classes including a component of a keyword and a component of background noise. The first calculation unit calculates a keyword score indicative of the occurrence probability of the component of the keyword. The second calculation unit calculates a background noise score indicative of the occurrence probability of the component of the background noise. The determination unit determines whether or not the speech data includes the keyword.
Type: Grant
Filed: February 28, 2020
Date of Patent: November 21, 2023
Assignee: KABUSHIKI KAISHA TOSHIBA
Inventors: Ning Ding, Hiroshi Fujimura
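The final determination step — accumulating a keyword score and a background-noise score over the frames and comparing them — can be sketched as below. The per-frame likelihood dictionaries, the max-over-classes aggregation, and the decision margin are illustrative assumptions standing in for the trained model's outputs:

```python
def detect_keyword(frame_likelihoods, keyword_classes, noise_classes, margin=0.0):
    """Accumulate the best keyword-class likelihood and the best
    background-noise likelihood per frame, then compare the totals."""
    kw_score = sum(max(f[c] for c in keyword_classes) for f in frame_likelihoods)
    bg_score = sum(max(f[c] for c in noise_classes) for f in frame_likelihoods)
    return kw_score > bg_score + margin

frames = [{"kw1": 0.7, "noise": 0.2}, {"kw1": 0.6, "noise": 0.3}]
print(detect_keyword(frames, ["kw1"], ["noise"]))  # -> True
```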
-
Patent number: 11804211
Abstract: Implementations are directed to providing a voice bot development platform that enables a third-party developer to train a voice bot based on training instance(s). The training instance(s) can each include training input and training output. The training input can include a portion of a corresponding conversation and a prior context of the corresponding conversation. The training output can include a corresponding ground truth response to the portion of the corresponding conversation. Subsequent to training, the voice bot can be deployed for conducting conversations on behalf of a third-party. In some implementations, the voice bot is further trained based on a corresponding feature emphasis input that directs the voice bot's attention to a particular feature of the portion of the corresponding conversation. In some additional or alternative implementations, the voice bot is further trained to interact with third-party system(s) via remote procedure calls (RPCs).
Type: Grant
Filed: December 4, 2020
Date of Patent: October 31, 2023
Assignee: GOOGLE LLC
Inventors: Asaf Aharoni, Yaniv Leviathan, Eyal Segalis, Gal Elidan, Sasha Goldshtein, Tomer Amiaz, Deborah Cohen
-
Patent number: 11798562
Abstract: A speaker verification method includes receiving audio data corresponding to an utterance and processing the audio data to generate an evaluation attentive d-vector (ad-vector) representing voice characteristics of the utterance, the evaluation ad-vector including n_e style classes, each including a respective value vector concatenated with a corresponding routing vector. The method also includes generating, using a self-attention mechanism, at least one multi-condition attention score that indicates a likelihood that the evaluation ad-vector matches a respective reference ad-vector associated with a respective user. The method also includes identifying the speaker of the utterance as the respective user associated with the respective reference ad-vector based on the multi-condition attention score.
Type: Grant
Filed: May 16, 2021
Date of Patent: October 24, 2023
Assignee: Google LLC
Inventors: Ignacio Lopez Moreno, Quan Wang, Jason Pelecanos, Yiling Huang, Mert Saglam
-
Patent number: 11790892
Abstract: A method includes capturing an event, analyzing the event to generate graphs, receiving a natural language utterance, identifying an entity and a command, modifying the graphs, and emitting an application prototype. An application prototyping server includes a processor and a memory storing instructions that, when executed by the processor, cause the server to capture an event, analyze the captured event to generate graphs, receive a natural language utterance, identify an entity and a command, modify the graphs, and emit an application prototype. A non-transitory computer readable medium contains program instructions that, when executed, cause a computer to: capture an event, analyze the captured event to generate graphs, receive a natural language utterance, identify an entity and a command, modify the graphs, and emit an application prototype.
Type: Grant
Filed: May 27, 2020
Date of Patent: October 17, 2023
Assignee: CDW LLC
Inventor: Joseph Kessler
-
Patent number: 11783810
Abstract: Illustrative embodiments provide a method and system for communicating air traffic control information. An audio signal comprising voice activity is received. Air traffic control information in the voice activity is identified using an artificial intelligence algorithm. A text transcript of the air traffic control information is generated and displayed on a confirmation display. Voice activity in the audio signal may be detected by identifying portions of the audio signal that comprise speech based on a comparison between the power spectrum of the audio signal and the power spectrum of noise, and forming speech segments comprising the portions of the audio signal that comprise speech.
Type: Grant
Filed: July 17, 2020
Date of Patent: October 10, 2023
Assignee: The Boeing Company
Inventors: Stephen Dame, Yu Qiao, Taylor A. Riccetti, David J. Ross, Joshua Welshmeyer, Matthew Sheridan-Smith, Su Ying Li, Zarrin Khiang-Huey Chua, Jose A. Medina, Michelle D. Warren, Simran Pabla, Jasper P. Corleis
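The voice-activity step — comparing signal power against a noise estimate to mark speech frames — can be sketched with a simple per-frame power comparison in decibels. The 6 dB margin and the frame length are illustrative assumptions, not values from the patent:

```python
import math

def frame_power_db(frame):
    """Mean power of a frame of samples, in dB (epsilon avoids log(0))."""
    power = sum(x * x for x in frame) / len(frame)
    return 10 * math.log10(power + 1e-12)

def speech_frames(frames, noise_floor_db, margin_db=6.0):
    """Mark frames whose power exceeds the noise-floor estimate by a
    margin; contiguous True runs would then form speech segments."""
    return [frame_power_db(f) > noise_floor_db + margin_db for f in frames]

frames = [[0.001] * 160, [0.5] * 160]  # a quiet frame, then a loud frame
print(speech_frames(frames, noise_floor_db=-40.0))  # -> [False, True]
```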
-
Patent number: 11776554
Abstract: An audio processor for generating a frequency enhanced audio signal from a source audio signal has: an envelope determiner for determining a temporal envelope of at least a portion of the source audio signal; an analyzer for analyzing the temporal envelope to determine temporal values of certain features of the temporal envelope; a signal synthesizer for generating a synthesis signal, the generating including placing pulses in relation to the determined temporal values, wherein the pulses are weighted using weights derived from amplitudes of the temporal envelope at the temporal values where the pulses are placed; and a combiner for combining at least a band of the synthesis signal that is not included in the source audio signal and the source audio signal to obtain the frequency enhanced audio signal.
Type: Grant
Filed: May 27, 2021
Date of Patent: October 3, 2023
Assignee: FRAUNHOFER-GESELLSCHAFT ZUR FĂ–RDERUNG DER ANGEWANDTEN FORSCHUNG E.V.
Inventors: Sascha Disch, Michael Sturm
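The envelope-then-pulses idea can be sketched as follows. The rectify-and-average envelope and the local-maximum rule for choosing pulse positions are simple stand-ins chosen for illustration; the patent does not specify these particular methods:

```python
def temporal_envelope(signal, win=4):
    """Rectify-and-smooth envelope: absolute value followed by a
    trailing moving average of `win` samples."""
    rect = [abs(x) for x in signal]
    env = []
    for i in range(len(rect)):
        window = rect[max(0, i - win + 1):i + 1]
        env.append(sum(window) / len(window))
    return env

def place_pulses(env):
    """Place a pulse at each local maximum of the envelope, weighted by
    the envelope amplitude at that position."""
    pulses = [0.0] * len(env)
    for i in range(1, len(env) - 1):
        if env[i] > env[i - 1] and env[i] >= env[i + 1]:
            pulses[i] = env[i]
    return pulses
```

A full implementation would band-limit the resulting pulse train and mix only the missing band back into the source signal.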
-
Patent number: 11769491
Abstract: A system configured to perform utterance detection using data processing techniques that are similar to those used for object detection is provided. For example, the system may treat utterances within audio data as analogous to an object represented within an image and employ techniques to separate and identify individual utterances. The system may include one or more trained models that are trained to perform utterance detection. For example, the system may include a first module to process input audio data and identify whether speech is represented in the input audio data, a second module to apply convolution filters, and a third module configured to determine a boundary identifying a beginning and ending of a portion of the input audio data along with an utterance score indicating how closely the portion of the input audio data represents an utterance.
Type: Grant
Filed: September 29, 2020
Date of Patent: September 26, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Abhishek Bafna, Haithem Albadawi
-
Patent number: 11769520
Abstract: Techniques are provided for evaluating multiple machine learning models to identify issues with a communication. One method comprises applying an audio signal associated with a communication to at least two of: (i) a trigger word analysis module that evaluates contextual information to determine if a trigger word is detected in the audio signal; (ii) an audio activity pattern analysis module that determines if a silence pattern anomaly is detected; and (iii) a communication application analysis module that evaluates features provided by a communication application relative to applicable thresholds; and combining results of the at least two of the trigger word analysis module, the audio activity pattern analysis module and the communication application analysis module to identify a communication issue. The combining may evaluate an accuracy of the trigger word analysis module, the audio activity pattern analysis module and/or the communication application analysis module to combine the results.
Type: Grant
Filed: August 17, 2020
Date of Patent: September 26, 2023
Assignee: EMC IP Holding Company LLC
Inventors: Idan Richman Goshen, Shiri Gaber
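The combining step — weighting each module's verdict by its accuracy and voting — can be sketched as a weighted majority vote. The weight values and the 50% decision rule are illustrative assumptions, not details from the patent:

```python
def combine_results(results, weights):
    """Weighted vote over module verdicts (trigger-word, audio-activity,
    app-feature analysis); weights reflect per-module accuracy estimates."""
    total = sum(weights[m] for m in results)
    flagged = sum(weights[m] for m, issue in results.items() if issue)
    return flagged >= 0.5 * total

results = {"trigger_word": True, "audio_activity": False, "app_features": True}
weights = {"trigger_word": 0.9, "audio_activity": 0.6, "app_features": 0.8}
print(combine_results(results, weights))  # two accurate modules agree -> True
```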
-
Patent number: 11741986
Abstract: A method includes obtaining, by an electronic device, an audio segment comprising one or more audio events of a target subject. The method also includes extracting, by the electronic device, audio embeddings from the one or more audio events using an embedding model, the embedding model comprising a trained machine learning model. The method further includes comparing, by the electronic device, the extracted audio embeddings with a match profile of the target subject, the match profile generated during an enrollment stage. The method also includes generating, by the electronic device, a label for the audio segment based on whether or not the extracted audio embeddings match the match profile, wherein the label enables correlation of the audio segment with the target subject for monitoring a health condition of the target subject.
Type: Grant
Filed: August 20, 2020
Date of Patent: August 29, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Korosh Vatanparvar, Tousif Ahmed, Viswam Nathan, Ebrahim Nematihosseinabadi, Md Mahbubur Rahman, Jilong Kuang, Jun Gao
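The comparison-and-labeling step reduces to measuring similarity between a segment's embedding and the enrollment profile. The cosine metric, the threshold value, and the label names are illustrative assumptions standing in for whatever matching rule the embedding model was trained with:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def label_segment(embedding, profile, threshold=0.7):
    """Label an audio segment as the target subject when its embedding is
    close enough to the enrollment-stage match profile."""
    return "target" if cosine(embedding, profile) >= threshold else "other"

print(label_segment([1.0, 0.0], [1.0, 0.05]))  # near-identical -> target
```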
-
Patent number: 11736428
Abstract: An approach is provided that receives a message and applies a deep analytic analysis to the message. The deep analytic analysis results in a set of enriched message embedding (EME) data that is passed to a trained neural network. Based on a set of scores received from the trained neural network, a conversation is identified from a number of available conversations to which the received message belongs. The received message is then associated with the identified conversation.
Type: Grant
Filed: June 24, 2019
Date of Patent: August 22, 2023
Assignee: International Business Machines Corporation
Inventors: Devin A. Conley, Priscilla S. Moraes, Lakshminarayanan Krishnamurthy, Oren Sar-Shalom
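The last step — picking the conversation from the network's scores — is an argmax with an optional fallback. The threshold for starting a new conversation is an illustrative assumption; the patent only describes identifying the conversation from the scores:

```python
def assign_conversation(scores, new_threshold=0.5):
    """Attach the message to the best-scoring conversation, or return None
    (start a new conversation) when no score clears the threshold."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= new_threshold else None

print(assign_conversation({"conv-a": 0.2, "conv-b": 0.8}))  # -> conv-b
```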
-
Patent number: 11721319
Abstract: An artificial intelligence device includes a memory and a processor. The memory is configured to store audio data having a predetermined speech style. The processor is configured to generate a condition vector relating to a condition for determining the speech style of the audio data, reduce a dimension of the condition vector to a predetermined reduction dimension, acquire a sparse code vector based on a dictionary vector acquired through sparse dictionary coding with respect to the condition vector having the predetermined reduction dimension, and change a vector element value included in the sparse code vector.
Type: Grant
Filed: February 27, 2020
Date of Patent: August 8, 2023
Assignee: LG ELECTRONICS INC.
Inventors: Minook Kim, Yongchul Park, Sungmin Han, Siyoung Yang, Sangki Kim, Juyeong Jang
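The sparse-dictionary-coding step can be illustrated with greedy matching pursuit, one common way to obtain a sparse code over a dictionary. The patent does not specify the coding algorithm, so this particular method, the unit-norm dictionary assumption, and the sparsity level are all assumptions for illustration:

```python
def sparse_code(x, dictionary, n_nonzero=2):
    """Greedy matching pursuit: approximate x with a few dictionary atoms,
    yielding a sparse code vector. Atoms are assumed to be unit-norm."""
    residual = list(x)
    code = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        best = max(range(len(dictionary)),
                   key=lambda k: abs(sum(a * r for a, r in
                                         zip(dictionary[k], residual))))
        coef = sum(a * r for a, r in zip(dictionary[best], residual))
        code[best] += coef
        residual = [r - coef * a for r, a in zip(residual, dictionary[best])]
    return code

D = [[1.0, 0.0], [0.0, 1.0]]  # trivial orthonormal dictionary
print(sparse_code([3.0, 0.5], D, n_nonzero=1))  # -> [3.0, 0.0]
```

Changing individual element values of the resulting sparse code (as the abstract describes) then modifies the speech style the code represents.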