Patents Examined by Vijay B. Chawan
-
Patent number: 10747960
Abstract: Systems and methods are disclosed herein for training a model to accurately determine whether two phrases are conversationally connected. A media guidance application may detect a first phrase and a second phrase, translate each phrase to a string of word types, append each string to the back of a prior string to create a combined string, determine a degree to which any of the individual strings matches any singleton template, and determine a degree to which the combined string matches any conversational template. Based on the degrees to which the individual and combined strings match the singleton and conversational templates, respectively, the strengths of association are updated accordingly.
Type: Grant
Filed: October 11, 2018
Date of Patent: August 18, 2020
Assignee: ROVI GUIDES, INC.
Inventors: Sashikumar Venkataraman, Ahmed Nizam Mohaideen P, Manik Malhotra
-
Patent number: 10748544
Abstract: A voice processing device includes: a sound source localization unit configured to determine a direction of each sound source on the basis of voice signals of a plurality of channels; a sound source separation unit configured to separate signals for respective sound sources indicating components of respective sound sources from the voice signals of the plurality of channels; a speech section detection unit configured to detect a speech section in which the number of speakers is one from the signals for respective sound sources; and a speaker identification unit configured to identify a speaker on the basis of the signals for respective sound sources in the speech section.
Type: Grant
Filed: March 23, 2018
Date of Patent: August 18, 2020
Assignee: HONDA MOTOR CO., LTD.
Inventors: Kazuhiro Nakadai, Tomoyuki Sahata
-
Patent number: 10741176
Abstract: Mechanisms are provided for customizing responses to future questions based on identified anomalies in user profile information. An automated dialogue system monitors information associated with a plurality of entities, where the information includes quantities for variable values associated with the entities. The automated dialogue system, in response to determining that a quantity of a variable value associated with an entity in the plurality of entities has changed by an amount equal to or exceeding a corresponding threshold value, generates response information associated with a quantity of the variable value and an entity to respond to at least one future question. In addition, the automated dialogue system stores the responsive information in association with the entity for later retrieval in response to initiation of a dialogue session with the automated dialogue system. Time thresholds may be established for determining when to stop using the responsive information for responding to questions.
Type: Grant
Filed: January 31, 2018
Date of Patent: August 11, 2020
Assignee: International Business Machines Corporation
Inventors: Mustafa Canim, Robert G. Farrell, John A. Gunnels, Arun K. Iyengar
-
Patent number: 10733977
Abstract: Methods, systems, and computer-readable media for automated transcription model adaptation include obtaining audio data from a plurality of audio files. The audio data is transcribed to produce at least one audio file transcription, which represents a plurality of transcription alternatives for each audio file. Speech analytics are applied to each audio file transcription. A best transcription is selected from the plurality of transcription alternatives for each audio file. Statistics are calculated from the selected best transcriptions, and an adapted model is created from the calculated statistics.
Type: Grant
Filed: March 29, 2017
Date of Patent: August 4, 2020
Assignee: Verint Systems Ltd.
Inventors: Ran Achituv, Omer Ziv, Roni Romano, Ido Shapira, Daniel Baum
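The selection-and-statistics steps of this adaptation pipeline can be sketched as follows. Treating "best" as the highest-confidence alternative is an illustrative assumption (the patent selects via speech analytics), and both function names are hypothetical:

```python
def best_transcription(alternatives):
    # alternatives: list of (text, confidence) pairs for one audio file.
    # Assumption for illustration: "best" = highest confidence score.
    return max(alternatives, key=lambda alt: alt[1])[0]

def count_word_stats(transcriptions):
    # Word-frequency statistics over the selected best transcriptions,
    # the kind of counts from which an adapted model could be built.
    stats = {}
    for text in transcriptions:
        for word in text.split():
            stats[word] = stats.get(word, 0) + 1
    return stats
```
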
-
Patent number: 10720176
Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
Type: Grant
Filed: August 22, 2018
Date of Patent: July 21, 2020
Assignee: Google LLC
Inventors: Dave Burke, Michael J. Lebeau, Konrad Gianno, Trausti T. Kristjansson, John Nicholas Jitkoff, Andrew W. Senior
-
Patent number: 10719666
Abstract: A method for identifying phrases in a text document having a similar discourse to a candidate phrase includes separating text in a document file into a plurality of phrases and generating a plurality of embedding vectors in a textual embedding space by inputting the plurality of phrases into an embedding engine. A mapping of each embedding vector in the textual embedding space is generated with each corresponding phrase and a document location of each corresponding phrase in the document file. A candidate phrase is received from a user and a candidate embedding vector is generated using the embedding engine. Similarity scores are computed based on the embedding space distances between the candidate phrase embedding vector location and the location of each embedding vector in the textual embedding space. A listing of the phrases with the highest similarity scores is output with their respective document locations in the text.
Type: Grant
Filed: January 31, 2020
Date of Patent: July 21, 2020
Assignee: Capital One Services, LLC
Inventors: Austin Walters, Vincent Pham, Ernest Kwak, Galen Rafferty, Reza Farivar, Jeremy Goodsitt, Anh Truong
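The similarity-scoring step can be sketched as follows. The `embed` function is a hypothetical stand-in (a deterministic bag-of-words hash into a small vector), since the abstract does not specify the embedding engine; the ranking by cosine similarity over the indexed phrase-to-location mapping is the part being illustrated:

```python
import math

def embed(phrase):
    # Hypothetical stand-in for the embedding engine: a deterministic
    # bag-of-words hash into an 8-dimensional vector.
    vec = [0.0] * 8
    for word in phrase.lower().split():
        vec[sum(ord(c) for c in word) % 8] += 1.0
    return vec

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_phrases(candidate, indexed_phrases, top_k=3):
    # indexed_phrases maps each document phrase to its document location,
    # mirroring the mapping the method builds over the embedding space.
    cand_vec = embed(candidate)
    scored = sorted(
        ((cosine_similarity(cand_vec, embed(p)), p, loc)
         for p, loc in indexed_phrases.items()),
        key=lambda t: t[0],
        reverse=True,
    )
    return scored[:top_k]
```
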
-
Patent number: 10714120
Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
Type: Grant
Filed: June 25, 2018
Date of Patent: July 14, 2020
Assignee: Google LLC
Inventors: Dave Burke, Michael J. Lebeau, Konrad Gianno, Trausti T. Kristjansson, John Nicholas Jitkoff, Andrew W. Senior
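A minimal sketch of the orientation-to-parameters chain described above; the mode names, pitch thresholds, and parameter values are all illustrative assumptions, not taken from the patent:

```python
def operating_mode(pitch_degrees):
    # Map the device's pitch angle to an operating mode.
    # Thresholds are hypothetical.
    if pitch_degrees > 60:
        return "phone_to_ear"
    if pitch_degrees > 20:
        return "held_in_front"
    return "flat_on_table"

# Per-mode speech-detection parameters: whether detection starts
# automatically, and an energy threshold for endpointing (values are
# illustrative).
SPEECH_PARAMS = {
    "phone_to_ear": {"auto_start": True, "endpoint_db": -40},
    "held_in_front": {"auto_start": True, "endpoint_db": -35},
    "flat_on_table": {"auto_start": False, "endpoint_db": -30},
}

def speech_detection_params(pitch_degrees):
    # Parameters that specify when speech detection begins or ends,
    # chosen from the determined operating mode.
    return SPEECH_PARAMS[operating_mode(pitch_degrees)]
```
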
-
Patent number: 10713010
Abstract: A computer-implemented input-method editor process includes receiving a request from a user for an application-independent input method editor having written and spoken input capabilities, identifying that the user is about to provide spoken input to the application-independent input method editor, and receiving a spoken input from the user. The spoken input corresponds to input to an application and is converted to text that represents the spoken input. The text is provided as input to the application.
Type: Grant
Filed: October 24, 2018
Date of Patent: July 14, 2020
Assignee: Google LLC
Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, William J. Byrne, Gudmundur Hafsteinsson, Michael J. Lebeau
-
Patent number: 10714092
Abstract: A sensor processing unit comprises a sensor processor. The sensor processor is configured to communicatively couple with a microphone. The sensor processor is configured to acquire, from the microphone, an audio sample captured by the microphone from an environment in which the microphone is disposed. The sensor processor is configured to perform music activity detection on the audio sample to detect for music within the audio sample. Responsive to detection of music within the audio sample, the sensor processor is configured to send a music detection signal to an external processor located external to the sensor processing unit, the music detection signal indicating that music has been detected in the environment.
Type: Grant
Filed: October 1, 2018
Date of Patent: July 14, 2020
Assignee: InvenSense, Inc.
Inventors: William Kerry Keal, Ajay Kumar Dhanapalan, Sangnam Choi, Carlo Murgia, Eitan A. Medina, Taro Kimura
-
Patent number: 10706855
Abstract: A multimedia processing circuit is provided. The multimedia processing circuit includes a smart interpreter engine and an audio engine. The smart interpreter engine includes a noise suppression module, a vocal identification module and a speech to text converter. The noise suppression module is utilized for performing a noise suppression process on speech data corresponding to a first language. The vocal identification module is utilized for performing a vocal identification process on the noise-suppressed speech data corresponding to the first language to generate vocal identification data corresponding to the first language. The speech to text converter is utilized for converting the vocal identification data corresponding to the first language into text data corresponding to the first language.
Type: Grant
Filed: November 26, 2018
Date of Patent: July 7, 2020
Assignee: ACER INCORPORATED
Inventors: Gianna Tseng, Shih-Cheng Huang, Shang-Yao Lin, Szu-Ting Chou
-
Patent number: 10699691
Abstract: A head-mounted wearable device (HMWD) may incorporate bone conduction speakers (BCS) to generate audio output that is perceptible to a wearer. During operation, vibrations of the BCS may produce sound in the surrounding air that is perceptible to bystanders. An active noise cancellation module monitors the output and generates cancellation audio that is out of phase with the sound leaked by the BCS. An air conduction speaker emits the cancellation audio, producing destructive acoustic interference to the leaked sound. As a result, the user of the HMWD is able to hear audio clearly while bystanders are not.
Type: Grant
Filed: June 29, 2017
Date of Patent: June 30, 2020
Assignee: AMAZON TECHNOLOGIES, INC.
Inventors: Geng Ye, Zhen Xu, Haoxing Zhang, Bozhao Tan, Xuan Zhong, Jianchun Dong
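The phase-inversion idea behind this cancellation can be illustrated in toy form: the cancellation signal is the leaked signal inverted, so their sum is silence. A real system must also model the acoustic path, latency, and speaker response, all of which this sketch ignores:

```python
def cancellation_signal(leaked_samples):
    # Out-of-phase copy of the leaked sound: each sample is negated.
    return [-s for s in leaked_samples]

def mix(a, b):
    # What a bystander hears: the acoustic sum of the two signals.
    return [x + y for x, y in zip(a, b)]
```

With an idealized acoustic path, mixing the leaked sound with its inverted copy yields destructive interference (all-zero residual).
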
-
Patent number: 10692503
Abstract: A voice data processing method and apparatus are provided. The method includes obtaining an I-Vector for each of a set of voice samples and determining a target seed sample among the voice samples. A first cosine distance is calculated between the I-Vector of the target seed sample and the I-Vector of each target remaining voice sample, where a target remaining voice sample is a voice sample other than the target seed sample in the voice samples. A target voice sample is then filtered from the voice samples or the target remaining voice samples according to the first cosine distance, to obtain a target voice sample whose first cosine distance is greater than a first threshold.
Type: Grant
Filed: March 3, 2017
Date of Patent: June 23, 2020
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Xingming Jin, Wei Li, Fangmai Zheng, Fuzhang Wu, Bilei Zhu, Binghua Qian, Ke Li, Yongjian Wu, Feiyue Huang
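The threshold-filtering step can be sketched as follows; per the abstract, samples are kept when their score against the seed's I-Vector exceeds the threshold (the abstract's "cosine distance" is used here as the cosine score itself, which is an interpretation):

```python
import math

def cosine(a, b):
    # Cosine score between two I-Vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filter_by_seed(seed_ivector, remaining_ivectors, threshold):
    # Keep only the samples whose score against the seed exceeds
    # the first threshold.
    return [iv for iv in remaining_ivectors
            if cosine(seed_ivector, iv) > threshold]
```
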
-
Patent number: 10692496
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for suppressing hotwords are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to playback of an utterance. The actions further include providing the audio data as an input to a model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample. The actions further include receiving, from the model, data indicating whether the audio data includes the audio watermark. The actions further include, based on the data indicating whether the audio data includes the audio watermark, determining to continue or cease processing of the audio data.
Type: Grant
Filed: May 21, 2019
Date of Patent: June 23, 2020
Assignee: Google LLC
Inventors: Alexander H. Gruenstein, Taral Pradeep Joglekar, Vijayaditya Peddinti, Michiel A. U. Bacchiani
-
Patent number: 10694298
Abstract: A hearing aid configured for detecting and enhancing speech within an audio environment is disclosed. An incoming audio stream is continuously monitored for the presence of speech within the audio stream. A Codebook Excited Linear Prediction ("CELP") encoder analyzes the incoming audio stream and outputs an indication of a presence or absence of human speech within the incoming audio stream. Upon detection of human speech, the hearing aid in real time may: amplify the audio input to make the speech more audible to a wearer; filter non-speech audio through isolation of the speech by passing the output of the CELP encoder directly to a CELP decoder; activate a beam-steering process which makes dominant a microphone closest to a speaker while de-prioritizing input from other microphones of the hearing aid; and/or shape the audio spectrum conveyed by the audio input using a response curve optimized for better clarity of human speech.
Type: Grant
Filed: October 22, 2019
Date of Patent: June 23, 2020
Inventors: Zeev Neumeier, W. Leo Hoarty
-
Patent number: 10685645
Abstract: A method for creating binary classification models and using the binary classification models to select candidate training utterances from a plurality of live utterances is provided. The method may include receiving a plurality of intents and associated training utterances. The method may include creating, from the training utterances, a binary classification model for each intent. The binary classification model may include a vector representation of a line of demarcation between utterances associated with the intent and utterances disassociated from the intent. The method may also include receiving live utterances. An intent may be determined for each live utterance. The method may include creating a vector representation of the live utterance. The method may include selecting candidate training utterances based on a comparison between the vector representation of the live utterance and the vector representation included in the binary classification model of the intent determined for the live utterance.
Type: Grant
Filed: August 9, 2018
Date of Patent: June 16, 2020
Assignee: Bank of America Corporation
Inventors: Maruthi Z. Shanmugam, Luis Gerardo Mojica de la Vega, Donatus Asumu
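One simple way to realize a "vector representation of a line of demarcation" is a linear separator built from class centroids; this construction, and the idea that a small-magnitude margin flags a useful candidate utterance, are illustrative assumptions rather than the patent's prescribed model:

```python
def centroid(vectors):
    # Mean vector of a set of utterance embeddings.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def binary_model(positive, negative):
    # Per-intent binary model: a hyperplane separating utterances
    # associated with the intent from those disassociated from it.
    pos_c, neg_c = centroid(positive), centroid(negative)
    w = [p - q for p, q in zip(pos_c, neg_c)]         # normal vector
    mid = [(p + q) / 2 for p, q in zip(pos_c, neg_c)]  # midpoint
    b = -sum(wi * mi for wi, mi in zip(w, mid))        # offset
    return w, b

def margin(model, live_vector):
    # Signed score of a live utterance against the intent's model:
    # positive means the intent side; a small magnitude means the
    # utterance sits near the line of demarcation.
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, live_vector)) + b
```
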
-
Patent number: 10682086
Abstract: Disclosed herein is a system and method for delivering a digital therapeutic specific to a user's emotional or mental state (EMS). This entails recognizing at least one EMS selected by the user from a plurality of EMS, the selected EMS indicating at least one of a feeling, sensation, type of discomfort, mood, mental state, emotional condition, or physical status of the user. It further entails pushing at least a primary-level message personalized to the user based on at least one stored message coupled to the selected EMS, where the primary-level messages contain at least one of a text, image, sound, video, art asset, suggested action, or recommended behavior. The actions suggested or behaviors recommended are supported by at least one independent source of peer-reviewed research, as verified by a credentialed EMS expert.
Type: Grant
Filed: April 20, 2018
Date of Patent: June 16, 2020
Assignee: Aebeze labs
Inventor: Michael Phillips Moskowitz
-
Patent number: 10667878
Abstract: A surgical system includes a plurality of voice sensors located in a surgical environment and configured to detect sound and generate a first plurality of signals. The surgical system also includes a position indicator, in proximity to a designated user, configured to indicate a first position of the designated user and generate a second signal representative of the first position. The surgical system further includes a processor configured to receive the first plurality of signals and the second signal and determine, based on the first plurality of signals, a second position. The processor is also configured to compare the detected sound with a registered voice command of the designated user stored in a memory to verify the designated user's credentials, and send a command signal to a surgical instrument to carry out an operation related to the voice command based on at least one of the verification of the designated user's credentials, the first position, and the second position.
Type: Grant
Filed: June 20, 2018
Date of Patent: June 2, 2020
Assignee: Ethicon LLC
Inventors: David J. Cagle, Eric Smith, Jeffrey L. Aldridge, Mary E. Mootoo, Ryan Asher
-
Patent number: 10657955
Abstract: Described herein are systems and methods to identify and address sources of bias in an end-to-end speech model. In one or more embodiments, the end-to-end model may be a recurrent neural network with two 2D-convolutional input layers, followed by multiple bidirectional recurrent layers and one fully connected layer before a softmax layer. In one or more embodiments, the network is trained end-to-end using the CTC loss function to directly predict sequences of characters from log spectrograms of audio. With optimized recurrent layers and training together with alignment information, some unwanted bias induced by using purely forward-only recurrences may be removed in a deployed model.
Type: Grant
Filed: January 30, 2018
Date of Patent: May 19, 2020
Assignee: Baidu USA LLC
Inventors: Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
-
Patent number: 10650833
Abstract: A method for generating a bitstream indicative of an object based audio program is described. The bitstream comprises a sequence of containers. A first container of the sequence of containers comprises a plurality of substream entities for a plurality of substreams of the object based audio program and a presentation section. The method comprises determining a set of object channels. The method further comprises providing a set of object related metadata for the set of object channels. In addition, the method comprises inserting a first set of object channel frames and a first set of object related metadata frames into a respective set of substream entities of the first container. Furthermore, the method comprises inserting presentation data into the presentation section.
Type: Grant
Filed: October 2, 2018
Date of Patent: May 12, 2020
Assignee: Dolby International AB
Inventors: Christof Fersch, Alexander Stahlmann
-
Patent number: 10650810
Abstract: Systems and methods of determining phonetic relationships are provided. For instance, data indicative of an input text phrase input by a user can be received. An audio output corresponding to a spoken rendering of the input text phrase can be determined. A text transcription of the audio output of the input text phrase can be determined. The text transcription can be a textual representation of the audio output. The text transcription can be compared against a plurality of test phrases to identify a match between the text transcription and at least one test phrase.
Type: Grant
Filed: September 29, 2017
Date of Patent: May 12, 2020
Assignee: GOOGLE LLC
Inventors: Nikhil Chandru Rao, Saisuresh Krishnakumaran
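The round-trip check described above can be sketched with stand-in `tts` and `asr` functions; real speech synthesis and recognition would replace these placeholders, which exist only to make the pipeline shape visible:

```python
def tts(text):
    # Placeholder for speech synthesis: pretend the "waveform" carries
    # the normalized text.
    return ("audio", text.lower())

def asr(audio):
    # Placeholder for speech recognition: recover the text transcription
    # from the pretend waveform.
    return audio[1]

def matching_test_phrases(input_phrase, test_phrases):
    # Synthesize the input phrase, transcribe the audio, then compare
    # the transcription against the test phrases to find matches.
    transcription = asr(tts(input_phrase))
    return [t for t in test_phrases if t.lower() == transcription]
```
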