Patents Examined by Paras D Shah
  • Patent number: 10971131
    Abstract: The present disclosure discloses a method and apparatus for generating a speech synthesis model. A specific embodiment of the method comprises: acquiring a plurality of types of training samples, each type including a text of that type and speech of the text, read by the announcer corresponding to the type in the style of speech corresponding to the type; and training a neural network corresponding to a speech synthesis model using the plurality of types of training samples and the annotation of the style of speech in each training sample, to obtain a speech synthesis model that can synthesize speech in a plurality of styles for the announcer corresponding to each type.
    Type: Grant
    Filed: August 3, 2018
    Date of Patent: April 6, 2021
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventor: Yongguo Kang
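As a rough illustration of how the multi-type corpus described in this entry might be organized before training, here is a minimal sketch; the `TrainingSample` structure, field names, and example data are assumptions rather than the patent's actual implementation, and the neural-network training itself is omitted.

```python
from dataclasses import dataclass

# Hypothetical sample structure for the multi-type corpus: each type pairs
# texts with recordings by that type's announcer, plus a style annotation,
# so a single model can learn every announcer/style combination.
@dataclass
class TrainingSample:
    sample_type: str   # e.g. "news", "storytelling"
    text: str
    audio_path: str    # speech of the text read by the type's announcer
    style_label: str   # annotation of the style of speech

def build_training_set(corpora: dict) -> list:
    """Flatten per-type corpora into one annotated training set."""
    samples = []
    for sample_type, records in corpora.items():
        for text, audio_path, style in records:
            samples.append(TrainingSample(sample_type, text, audio_path, style))
    return samples

corpora = {"news": [("Top story today...", "news_001.wav", "formal")],
           "story": [("Once upon a time...", "story_001.wav", "narrative")]}
print(build_training_set(corpora))
```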
  • Patent number: 10971155
    Abstract: Methods and systems are provided for monitoring onboard communications after a change to a functionality of an onboard system. An exemplary method involves identifying a source initiating the change to the functionality of the onboard system, determining an expected response to the change to the functionality of the onboard system by a vehicle operator based at least in part on the change, the source, and one or more callout rules associated with the onboard system, monitoring for the expected response from the vehicle operator, and generating a user notification in response to an absence of the expected response from the vehicle operator.
    Type: Grant
    Filed: April 12, 2018
    Date of Patent: April 6, 2021
    Assignee: HONEYWELL INTERNATIONAL INC.
    Inventors: Anil Kumar Songa, Paula Renee Gardner, Kishore Kumar Sandrana
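A minimal sketch of the callout-rule lookup the abstract above describes, assuming a hypothetical rule table mapping (source, change) pairs to an expected vehicle-operator response and a response window; the rule contents and timings are invented for illustration.

```python
# Hypothetical callout rules: (source, change) -> expected operator response
# and the window, in seconds, within which it should be observed.
CALLOUT_RULES = {
    ("autopilot", "altitude_target_changed"): ("altitude callout", 10.0),
}

def monitor_callout(source, change, heard_responses, elapsed_s):
    """Return a user notification if the expected response is absent."""
    rule = CALLOUT_RULES.get((source, change))
    if rule is None:
        return None
    expected, window_s = rule
    if elapsed_s >= window_s and expected not in heard_responses:
        return f"Notification: expected '{expected}' after {change} from {source}"
    return None

print(monitor_callout("autopilot", "altitude_target_changed", [], 12.0))
```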
  • Patent number: 10964308
    Abstract: A speech processing apparatus is provided in which face feature points are extracted, frame by frame, from moving image data obtained by imaging a speaker's face. A first generation network, which generates the face feature points of each frame from speech feature data extracted frame by frame from the speaker's uttered speech, is generated, and whether it is appropriate is evaluated using an identification network. A second generation network is then generated, which produces the uttered speech from a plurality of variable settings (including at least text representing the utterance content and information indicating the emotions included in the uttered speech), a plurality of types of fixed settings that define speech quality, and the face feature points generated by the first generation network evaluated as appropriate; whether the second generation network is appropriate is likewise evaluated using the identification network.
    Type: Grant
    Filed: October 29, 2018
    Date of Patent: March 30, 2021
    Inventor: Ken-ichi Kainuma
  • Patent number: 10963510
    Abstract: A natural language processing system that includes an artificial intelligence (AI) engine and a tagging engine. The AI engine is configured to receive a set of audio files and to identify concepts within the set of audio files. The AI engine is further configured to determine a usage frequency for each of the identified concepts and to generate an AI-defined tag for concepts whose usage frequency is greater than a usage frequency threshold. The tagging engine is configured to receive an audio file and to identify observed concepts within the audio file. The tagging engine is further configured to compare the observed concepts to the AI-identified concepts, to determine whether one or more observed concepts match concepts linked with AI-defined tags, and to modify metadata for the audio file to include the matching AI-defined tags.
    Type: Grant
    Filed: August 9, 2018
    Date of Patent: March 30, 2021
    Assignee: Bank of America Corporation
    Inventors: James McCormack, Sean M. Gutman, Manu J. Kurian, Sasidhar Purushothaman, Suki Ramasamy, William P. Jacobson
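The frequency-thresholded tagging logic in the entry above lends itself to a short sketch. This is a simplified stand-in that assumes concepts arrive as plain strings, with `ai_defined_tags` playing the AI engine's role and `tag_audio_file` the tagging engine's.

```python
from collections import Counter

def ai_defined_tags(concepts_per_file, usage_threshold):
    """Tag concepts whose usage frequency across the corpus exceeds the threshold."""
    counts = Counter(c for concepts in concepts_per_file for c in concepts)
    return {c for c, n in counts.items() if n > usage_threshold}

def tag_audio_file(observed_concepts, tags, metadata):
    """Add matching AI-defined tags to the audio file's metadata."""
    matched = [c for c in observed_concepts if c in tags]
    metadata.setdefault("ai_tags", []).extend(matched)
    return metadata

corpus = [["wire transfer", "fraud"], ["fraud", "loan"], ["fraud"]]
tags = ai_defined_tags(corpus, usage_threshold=2)
print(tag_audio_file(["fraud", "mortgage"], tags, {"file": "call_017.wav"}))
```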
  • Patent number: 10963644
    Abstract: Computer-implemented techniques are described herein for generating and utilizing a universal encoder component (UEC). The UEC maps a linguistic expression in a natural language to a language-agnostic representation of the linguistic expression. The representation is said to be agnostic with respect to language because it captures semantic content that is largely independent of the syntactic rules associated with the natural language used to compose the linguistic expression. The representation is also agnostic with respect to task because a downstream training system can leverage it to produce different kinds of machine-trained components that serve different respective tasks. The UEC facilitates the generation of downstream machine-trained models by permitting a developer to train a model on input examples expressed in one natural language and thereafter apply it to the interpretation of documents in another natural language, with no additional training required.
    Type: Grant
    Filed: December 27, 2018
    Date of Patent: March 30, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Armen Aghajanyan, Xia Song, Saurabh Kumar Tiwary
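A toy demonstration of the cross-lingual transfer the abstract above claims: a "downstream model" (here just nearest centroids) is trained on examples in one language, then applied to another language with no additional training. The shared vector space is faked with a lookup table; a real UEC would be a trained neural encoder.

```python
# Toy stand-in for the universal encoder: maps text in any language to a
# shared vector space. Purely illustrative, not the patented model.
SHARED_SPACE = {"dog": (1.0, 0.0), "chien": (1.0, 0.0),
                "car": (0.0, 1.0), "voiture": (0.0, 1.0)}

def encode(text):
    return SHARED_SPACE[text]

def train_centroids(examples):
    """Nearest-centroid 'downstream model' trained on one language only."""
    sums = {}
    for text, label in examples:
        v = encode(text)
        s, n = sums.get(label, ((0.0, 0.0), 0))
        sums[label] = ((s[0] + v[0], s[1] + v[1]), n + 1)
    return {lab: (s[0] / n, s[1] / n) for lab, (s, n) in sums.items()}

def classify(text, centroids):
    v = encode(text)
    return min(centroids, key=lambda lab: sum((a - b) ** 2
                                              for a, b in zip(v, centroids[lab])))

centroids = train_centroids([("dog", "animal"), ("car", "vehicle")])  # English only
print(classify("chien", centroids))  # French input, no additional training
```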
  • Patent number: 10949626
    Abstract: The present disclosure provides a global simultaneous interpretation method and a product thereof. The method includes the following steps: a smart phone receives a calling request sent by a terminal, answers the calling request, and establishes a call connection; the smart phone receives first voice information transmitted over the call connection and, when the first voice information is identified as being in a non-specified language, translates it into second voice information in a specified language; and the smart phone plays the second voice information through a speaker device.
    Type: Grant
    Filed: March 12, 2019
    Date of Patent: March 16, 2021
    Assignee: WING TAK LEE SILICONE RUBBER TECHNOLOGY (SHENZHEN) CO., LTD
    Inventor: Tak Nam Liu
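A minimal sketch of the call flow described above, with `detect_language`, `translate`, and the speaker playback stubbed out as hypothetical stand-ins for the phone's actual speech services.

```python
# Illustrative call-handling flow; the stubs below are assumptions.
SPECIFIED_LANGUAGE = "en"

def detect_language(voice):          # stub: assume metadata carries the language
    return voice["lang"]

def translate(voice, target_lang):   # stub translation service
    return {"lang": target_lang, "audio": f"translated({voice['audio']})"}

def handle_incoming_voice(first_voice):
    """Translate non-specified-language speech, then play it on the speaker."""
    if detect_language(first_voice) != SPECIFIED_LANGUAGE:
        second_voice = translate(first_voice, SPECIFIED_LANGUAGE)
    else:
        second_voice = first_voice
    print("playing on speaker:", second_voice["audio"])

handle_incoming_voice({"lang": "zh", "audio": "ni hao"})
```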
  • Patent number: 10930287
    Abstract: In some embodiments, an exemplary inventive system for improving the computer speed and accuracy of automatic speech transcription includes at least a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines, where each distinct engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; and accepting the hypothesis so as to remove the need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and accuracy of automatic speech transcription.
    Type: Grant
    Filed: December 3, 2018
    Date of Patent: February 23, 2021
    Inventors: Tejas Shastry, Matthew Goldey, Svyat Vergun
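A sketch of the per-segment routing and hypothesis-acceptance idea from the entry above, under the assumption that segments carry an acoustic-profile label and engines report a confidence; the registry, profiles, and confidence floor are all invented for illustration.

```python
# Hypothetical registry of transcription engines keyed by the acoustic
# profile their recognition model targets; choosing one engine per segment
# avoids fanning every segment out to all engines.
def route_segment(segment, engines):
    """Pick the engine whose model matches the segment's profile."""
    return engines.get(segment["profile"], engines["general"])

def transcribe(audio_segments, engines, confidence_floor=0.8):
    transcript = []
    for seg in audio_segments:
        engine = route_segment(seg, engines)
        text, confidence = engine(seg)
        # Accept the hypothesis outright when confident enough, so the
        # segment is never resubmitted to another engine.
        if confidence >= confidence_floor:
            transcript.append(text)
        else:
            transcript.append(f"[low confidence: {text}]")
    return " ".join(transcript)

engines = {"general": lambda s: (s["audio"].upper(), 0.9),
           "telephony": lambda s: (s["audio"], 0.95)}
print(transcribe([{"profile": "telephony", "audio": "hello"}], engines))
```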
  • Patent number: 10923118
    Abstract: An audio input method includes: in an audio-input mode, receiving a first audio input by a user, recognizing the first audio to generate a first recognition result, and displaying corresponding verbal content to the user based on the first recognition result; and in an editing mode, receiving a second audio input by the user, recognizing the second audio to generate a second recognition result, converting the second recognition result into an editing instruction, and executing a corresponding operation based on the editing instruction. The audio-input mode and the editing mode are switchable.
    Type: Grant
    Filed: November 17, 2016
    Date of Patent: February 16, 2021
    Assignee: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO., LTD.
    Inventors: Liping Li, Suhang Wang, Congxian Yan, Lei Yang, Min Liu, Hong Zhao, Jia Yao
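A minimal sketch of the switchable two-mode behavior described above, with `recognize` stubbed and a single hard-coded editing instruction; a real system would map many recognition results to editing operations.

```python
# Minimal two-mode controller; recognize() is a hypothetical ASR stand-in.
def recognize(audio):
    return audio  # stub: the "recognition result" is just the text itself

class AudioInput:
    def __init__(self):
        self.mode = "input"       # switchable: "input" or "editing"
        self.text = ""

    def handle(self, audio):
        result = recognize(audio)
        if self.mode == "input":
            self.text += result   # display verbal content to the user
        elif result == "delete last word":   # recognition result -> editing instruction
            self.text = " ".join(self.text.split()[:-1])

ui = AudioInput()
ui.handle("hello world ")
ui.mode = "editing"
ui.handle("delete last word")
print(ui.text)                    # -> "hello"
```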
  • Patent number: 10923109
    Abstract: A computer-implemented method and an apparatus for facilitating training of conversational agents are disclosed. The method includes automatically extracting a workflow associated with each conversation from among a plurality of conversations between agents and customers of an enterprise. The workflow is extracted, at least in part, by encoding one or more utterances associated with the respective conversation and mapping the encoded one or more utterances to predefined workflow stages. A clustering of the plurality of conversations is performed based on a similarity among respective extracted workflows. The clustering of the plurality of conversations configures a plurality of workflow groups. At least one conversational agent is trained in customer engagement using a set of conversations associated with at least one workflow group from among the plurality of workflow groups.
    Type: Grant
    Filed: August 2, 2018
    Date of Patent: February 16, 2021
    Assignee: [24]7.ai, Inc.
    Inventors: Abir Chakraborty, Sruti Rallapalli, Vidhya Duthaluru
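A simplified sketch of the workflow extraction and grouping in the entry above: keyword spotting stands in for utterance encoding, and exact-match grouping stands in for similarity-based clustering. The stage names and keywords are assumptions.

```python
from collections import defaultdict

# Hypothetical mapping from utterance keywords to predefined workflow stages.
STAGE_KEYWORDS = {"greet": "greeting", "order": "order_lookup", "refund": "resolution"}

def extract_workflow(conversation):
    """Encode utterances and map them onto predefined workflow stages."""
    stages = []
    for utterance in conversation:
        for kw, stage in STAGE_KEYWORDS.items():
            if kw in utterance and (not stages or stages[-1] != stage):
                stages.append(stage)
    return tuple(stages)

def group_by_workflow(conversations):
    """Group conversations whose extracted workflows match exactly
    (a stand-in for similarity-based clustering)."""
    groups = defaultdict(list)
    for conv in conversations:
        groups[extract_workflow(conv)].append(conv)
    return groups

convs = [["greetings!", "order 123 please", "refund issued"],
         ["greet", "order 99", "refund done"]]
print(group_by_workflow(convs).keys())
```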
  • Patent number: 10923139
    Abstract: Systems and methods are provided for processing information of a meeting. An exemplary system may include a communication interface configured to receive meeting information obtained by a plurality of client devices. The meeting information may include multiple audio streams. The system may also include a memory and a processor. The processor may execute instructions stored on the memory to perform operations. The operations may include determining signal-to-noise-ratio (SNR) indicators associated with the audio streams. The operations may also include selecting, from the audio streams, a candidate audio stream based on the SNR indicators. The SNR indicator associated with the candidate audio stream may indicate that the candidate audio stream has a higher average SNR than that of a predetermined number of other audio streams. In addition, the operations may include generating an output data stream including at least a portion of the candidate audio stream.
    Type: Grant
    Filed: August 13, 2018
    Date of Patent: February 16, 2021
    Assignee: MELO INC.
    Inventors: Guobin Shen, Zheng Han
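A sketch of the SNR-indicator selection described above, assuming 16 kHz mono streams as NumPy arrays; the percentile-based SNR estimate is a crude illustrative stand-in for whatever estimator the system actually uses.

```python
import numpy as np

def snr_db(stream, noise_floor=1e-3):
    """Crude per-stream SNR estimate: loudest frames vs quietest frames."""
    frames = stream.reshape(-1, 160)                 # 10 ms frames at 16 kHz
    power = (frames ** 2).mean(axis=1)
    signal = np.percentile(power, 90)
    noise = max(np.percentile(power, 10), noise_floor)
    return 10 * np.log10(signal / noise)

def select_candidate(streams):
    """Pick the stream whose SNR indicator beats the others."""
    return max(streams, key=lambda name: snr_db(streams[name]))

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
voice = np.sin(2 * np.pi * 220 * t) * (t > 0.5)      # speech only in 2nd half
streams = {"near_mic": voice + 0.01 * rng.normal(size=t.size),
           "far_mic": 0.2 * voice + 0.05 * rng.normal(size=t.size)}
print(select_candidate(streams))                     # -> "near_mic"
```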
  • Patent number: 10916235
    Abstract: Systems, methods, and computer programs are described which utilize the structure of syllables as an organizing element of automated speech recognition processing to overcome variations in pronunciation, to efficiently resolve confusable aspects, to exploit context, and to map the speech to orthography.
    Type: Grant
    Filed: July 10, 2018
    Date of Patent: February 9, 2021
    Assignee: VOX FRONTERA, INC.
    Inventors: Mark B. Pinson, Darrel T. Pinson
  • Patent number: 10915706
    Abstract: A computer-implemented method includes: receiving, by a computing device, a text report request from a user device associated with a user; obtaining a behavior history and personal information of the user; inputting the behavior history and the personal information of the user into a model, to obtain a plurality of personalized evaluation results, each personalized evaluation result corresponding to a respective text report category of a plurality of text report categories, in which each personalized evaluation result indicates a predicted relevance of the corresponding text report category to a problem faced by the user, and in which the model includes a classification model trained using one or more supervised learning techniques on a plurality of user behavior history samples and a plurality of personal information samples; and determining an order in which the plurality of text report categories are to be presented to the user.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: February 9, 2021
    Assignee: Advanced New Technologies Co., Ltd.
    Inventors: Hong Jin, Weiqiang Wang
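The final ordering step in the entry above reduces to sorting categories by the model's predicted relevance; a minimal sketch, with the classifier's scores supplied as a made-up dictionary.

```python
# Hypothetical classifier output: one relevance score per text report
# category, predicted from the user's behavior history and profile.
def rank_report_categories(model_scores):
    """Order categories by predicted relevance to the user's problem."""
    return [cat for cat, _ in sorted(model_scores.items(),
                                     key=lambda kv: kv[1], reverse=True)]

scores = {"account_security": 0.91, "billing_dispute": 0.34, "spam": 0.12}
print(rank_report_categories(scores))
```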
  • Patent number: 10896765
    Abstract: A mathematical model may be trained to diagnose a medical condition of a person by processing acoustic features and language features of speech of the person. The performance of the mathematical model may be improved by appropriately selecting the features to be used with the mathematical model. Features may be selected by computing a feature selection score for each acoustic feature and each language feature, and then selecting features using the scores, such as by selecting features with the highest scores. In some implementations, stability determinations may be computed for each feature and features may be selected using both the feature selection scores and the stability determinations. A mathematical model may then be trained using the selected features and deployed. In some implementations, prompts may be selected using computed prompt selection scores, and the deployed mathematical model may be used with the selected prompts.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: January 19, 2021
    Assignee: CANARY SPEECH, LLC
    Inventors: Jangwon Kim, Namhee Kwon, Henry O'Connell, Phillip Walstad, Kevin Shengbin Yang
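A compact sketch of score-plus-stability feature selection as described above: a class-separation score per feature, with a split-half agreement check standing in for the patent's stability determination. Feature names, values, and thresholds are invented.

```python
import statistics

def feature_score(values, labels):
    """Class-separation score for one feature (higher = more discriminative)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    spread = statistics.pstdev(values) or 1.0
    return abs(statistics.mean(pos) - statistics.mean(neg)) / spread

def select_features(features, labels, top_k=1, stability_cap=0.5):
    """Keep high-scoring features whose score agrees across the two halves
    of the data (a simple stand-in for a stability determination)."""
    half = len(labels) // 2
    kept = []
    for name, values in features.items():
        s1 = feature_score(values[:half], labels[:half])
        s2 = feature_score(values[half:], labels[half:])
        if abs(s1 - s2) <= stability_cap:
            kept.append(((s1 + s2) / 2, name))
    return [name for _, name in sorted(kept, reverse=True)[:top_k]]

feats = {"pause_rate": [0.9, 0.1, 0.8, 0.2], "pitch_var": [0.5, 0.5, 0.4, 0.6]}
labels = [1, 0, 1, 0]
print(select_features(feats, labels))   # -> ['pause_rate']
```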
  • Patent number: 10891953
    Abstract: Embodiments may be implemented by a computing device, such as a head-mountable display, in order to use a single guard phrase to enable different voice commands in different interface modes. An example device includes an audio sensor and a computing system configured to analyze audio data captured by the audio sensor to detect speech that includes a predefined guard phrase, and to operate in a plurality of different interface modes comprising at least a first and a second interface mode. During operation in the first interface mode, the computing system may initially disable one or more first-mode speech commands, and respond to detection of the guard phrase by enabling the one or more first-mode speech commands. During operation in the second interface mode, the computing system may initially disable a second-mode speech command, and respond to detection of the guard phrase by enabling the second-mode speech command.
    Type: Grant
    Filed: December 11, 2018
    Date of Patent: January 12, 2021
    Assignee: GOOGLE LLC
    Inventors: Michael J. LeBeau, Mat Balez
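A minimal sketch of guard-phrase gating across interface modes as described above; the guard phrase, mode names, and command sets are illustrative, not Google's actual ones.

```python
# Mode-specific command sets start disabled until the guard phrase is heard.
MODE_COMMANDS = {"first": {"take photo", "record video"},
                 "second": {"send message"}}
GUARD_PHRASE = "ok glass"

class SpeechController:
    def __init__(self, mode="first"):
        self.mode = mode
        self.enabled = False      # this mode's commands start disabled

    def hear(self, utterance):
        if utterance == GUARD_PHRASE:
            self.enabled = True   # guard phrase enables this mode's commands
        elif self.enabled and utterance in MODE_COMMANDS[self.mode]:
            return f"executing: {utterance}"
        return None

hmd = SpeechController(mode="first")
print(hmd.hear("take photo"))    # None: disabled until the guard phrase
hmd.hear("ok glass")
print(hmd.hear("take photo"))    # executing: take photo
```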
  • Patent number: 10891948
    Abstract: A system, method and computer product are provided for processing audio signals. An audio signal of a voice and background noise is input, and speech recognition is performed to retrieve the speech content of the voice. Content metadata corresponding to the speech content and environmental metadata corresponding to the background noise are then retrieved. Preferences for media content corresponding to the content metadata and the environmental metadata are determined, and an output corresponding to the preferences is provided.
    Type: Grant
    Filed: February 21, 2018
    Date of Patent: January 12, 2021
    Assignee: SPOTIFY AB
    Inventor: Stéphane Hulaud
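At its simplest, the final step above can be pictured as a lookup keyed on both metadata streams; this toy mapping is entirely invented.

```python
# Illustrative preference lookup keyed on speech-content metadata and
# background-noise (environmental) metadata; the mapping is made up.
PREFERENCES = {("workout", "gym_noise"): "high-energy playlist",
               ("relax", "quiet"): "ambient playlist"}

def recommend(content_meta, environment_meta):
    """Determine media-content preferences from both metadata signals."""
    return PREFERENCES.get((content_meta, environment_meta), "default mix")

print(recommend("workout", "gym_noise"))
```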
  • Patent number: 10891444
    Abstract: A computer-implemented method, a computer system and a non-transitory computer-readable medium for constructing human-readable sentences from imaging data of a subject can include: receiving imaging data including image elements of at least one region of interest of the subject; segmenting the imaging data of the region of interest into a plurality of sub-regions, where each sub-region includes a portion of the image elements; calculating an abnormality factor for each of the sub-regions by quantitatively analyzing segmented image information of the imaging data of the sub-regions using data from a normal database; comparing each abnormality factor to a threshold value; constructing a human-understandable sentence for the subject when a corresponding abnormality factor exceeds the threshold, where each human-understandable sentence references a physical structure associated with the calculation for the region or sub-region; and outputting the human-understandable sentences for the at least one region of interest.
    Type: Grant
    Filed: October 26, 2016
    Date of Patent: January 12, 2021
    Assignee: The Johns Hopkins University
    Inventors: Susumu Mori, Michael I. Miller
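A toy sketch of the abnormality-factor-to-sentence pipeline above: z-scoring a measured volume against a normal database and emitting a templated sentence when the threshold is exceeded. Region names, values, and the 2-SD threshold are made up.

```python
# Illustrative normal database: mean and std volume (mL) per sub-region.
NORMAL_DB = {"hippocampus_left": (4.2, 0.3)}

def abnormality_factor(region, measured_volume):
    mean, std = NORMAL_DB[region]
    return abs(measured_volume - mean) / std

def report(measurements, threshold=2.0):
    """Construct a sentence for each sub-region exceeding the threshold."""
    sentences = []
    for region, volume in measurements.items():
        z = abnormality_factor(region, volume)
        if z > threshold:
            sentences.append(f"The {region.replace('_', ' ')} volume "
                             f"({volume} mL) deviates {z:.1f} SD from normal.")
    return sentences

print(report({"hippocampus_left": 3.0}))
```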
  • Patent number: 10872620
    Abstract: Embodiments of the present disclosure provide a voice detection method. An audio signal can be divided into a plurality of audio segments. Audio characteristics can be extracted from each of the plurality of audio segments. The audio characteristics of the respective audio segment include a time domain characteristic and a frequency domain characteristic of the respective audio segment. At least one target voice segment can be detected from the plurality of audio segments according to the audio characteristics of the plurality of audio segments.
    Type: Grant
    Filed: May 1, 2018
    Date of Patent: December 22, 2020
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventor: Haijin Fan
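A sketch of combining one time-domain characteristic (short-time energy) with one frequency-domain characteristic (spectral centroid) to flag voice segments, as the abstract above describes in general terms; the specific features and thresholds here are illustrative assumptions.

```python
import numpy as np

def voice_segments(audio, sr=16000, frame_len=400):
    """Flag frames whose short-time energy (time domain) and spectral
    centroid (frequency domain) both look speech-like."""
    n = len(audio) // frame_len
    frames = audio[:n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)                    # time-domain feature
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, 1 / sr)
    centroid = (spectrum * freqs).sum(axis=1) / (spectrum.sum(axis=1) + 1e-9)
    return (energy > 10 * energy.min() + 1e-9) & (centroid > 100) & (centroid < 3000)

t = np.arange(16000) / 16000
audio = np.where(t < 0.5, 0.01, 1.0) * np.sin(2 * np.pi * 200 * t)
print(voice_segments(audio).astype(int))   # quiet first half 0, voiced second half 1
```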
  • Patent number: 10847165
    Abstract: Detecting a replay attack on a voice biometrics system comprises: receiving a speech signal from a voice source; generating and transmitting an ultrasound signal through a transducer of the device; detecting a reflection of the transmitted ultrasound signal; detecting Doppler shifts in the reflection of the generated ultrasound signal; and identifying whether the received speech signal is indicative of liveness of a speaker based on the detected Doppler shifts. The method further comprises: obtaining information about a position of the device; and adapting the generating and transmitting of the ultrasound signal based on the information about the position of the device.
    Type: Grant
    Filed: October 10, 2018
    Date of Patent: November 24, 2020
    Assignee: Cirrus Logic, Inc.
    Inventor: John Paul Lesso
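A worked toy example of the Doppler-shift measurement at the heart of the entry above: find the reflection's spectral peak and subtract the transmitted tone's frequency. The 22 kHz tone and 96 kHz sample rate are assumptions for illustration; a nonzero shift suggests a moving source (a live speaker's face) rather than a static loudspeaker replaying a recording.

```python
import numpy as np

def doppler_shift_hz(reflection, tx_freq=22000, sr=96000):
    """Offset of the reflection's spectral peak from the transmitted tone."""
    spectrum = np.abs(np.fft.rfft(reflection * np.hanning(len(reflection))))
    freqs = np.fft.rfftfreq(len(reflection), 1 / sr)
    return freqs[spectrum.argmax()] - tx_freq

t = np.arange(96000) / 96000                      # 1 s of reflected signal
static = np.sin(2 * np.pi * 22000 * t)            # no movement: no shift
moving = np.sin(2 * np.pi * 22040 * t)            # 40 Hz Doppler shift
print(doppler_shift_hz(static), doppler_shift_hz(moving))   # 0.0 40.0
```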
  • Patent number: 10847177
    Abstract: Described embodiments include an apparatus that includes a network interface and a processor. The processor is configured to receive, via the network interface, a speech signal that represents speech uttered by a subject, the speech including one or more speech segments, divide the speech signal into multiple frames, such that one or more sequences of the frames represent the speech segments, respectively, compute respective estimated total volumes of air exhaled by the subject while the speech segments were uttered, by, for each of the sequences, computing respective estimated flow rates of air exhaled by the subject during the frames belonging to the sequence and, based on the estimated flow rates, computing a respective one of the estimated total volumes of air, and, in response to the estimated total volumes of air, generate an alert. Other embodiments are also described.
    Type: Grant
    Filed: October 11, 2018
    Date of Patent: November 24, 2020
    Assignee: CORDIO MEDICAL LTD.
    Inventor: Ilan D. Shallom
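The volume computation in the entry above reduces to integrating per-frame flow-rate estimates over each speech segment; a minimal sketch with made-up flow rates and alert threshold.

```python
# Illustrative volume estimate: per-frame flow rates (made-up values in
# mL/s) integrated over each speech segment's frames.
FRAME_S = 0.02                      # 20 ms frames

def segment_volumes(segments):
    """Sum estimated flow rate x frame duration for each speech segment."""
    return [sum(flow * FRAME_S for flow in seg) for seg in segments]

def check_and_alert(segments, low_volume_ml=50.0):
    """Generate an alert in response to abnormally low exhaled volumes."""
    for i, vol in enumerate(segment_volumes(segments)):
        if vol < low_volume_ml:
            print(f"alert: segment {i} exhaled only {vol:.0f} mL")

# two segments of per-frame estimated flow rates (mL/s)
check_and_alert([[900] * 100, [40] * 50])   # second segment triggers the alert
```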
  • Patent number: 10848868
    Abstract: In an example, an audio signal may be routed to an audio device based on an indication of audio device historical usage, a measure of audio quality of the audio device, or a combination thereof.
    Type: Grant
    Filed: February 21, 2017
    Date of Patent: November 24, 2020
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Mohit Gupta
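A sketch of combining historical usage with measured audio quality into one routing score, per the entry above; the weights and device data are assumptions.

```python
# Illustrative routing decision: weight each device's historical-usage
# share against its measured audio quality.
def route_audio(devices, w_history=0.4, w_quality=0.6):
    """Pick the device with the best combined history/quality score."""
    return max(devices, key=lambda d: w_history * devices[d]["usage_share"]
                                      + w_quality * devices[d]["quality"])

devices = {"headset": {"usage_share": 0.7, "quality": 0.9},
           "speakers": {"usage_share": 0.3, "quality": 0.6}}
print(route_audio(devices))   # -> "headset"
```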