Dynamic Time Warping Patents (Class 704/241)
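This class (USPC 704/241) covers dynamic time warping in speech recognition. For orientation, the core DTW recurrence the class is named after can be sketched in a few lines; this is the textbook dynamic-programming formulation, not the method of any particular patent below:

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

Because the warp can repeat elements, `[1, 2, 3]` aligns with `[1, 2, 2, 3]` at zero cost.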
-
Patent number: 12225269
Abstract: Methods, systems, and apparatuses are described to implement voice search in media content for requesting media content of a video clip of a scene contained in the media content streamed to the client device; for capturing the voice request for the media content of the video clip to display at the client device, wherein the streamed media content is a selected video streamed from a video source; for applying an NLP solution to convert the voice request to text for matching to a set of one or more words contained in at least closed caption text of the selected video; for associating matched words to closed caption text with a start index and an end index of the video clip contained in the selected video; and for streaming the video clip to the client device based on the start index and the end index associated with the matched closed caption text.
Type: Grant
Filed: November 6, 2023
Date of Patent: February 11, 2025
Assignee: DISH Network Technologies India Private Limited
Inventor: Mayank Verma
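The matching step described in this family of patents — locating the words of a voice query in closed caption text and mapping the match to a clip's start and end index — might be sketched as follows. This is a simplified illustration only; the function name, the `(word, time)` caption representation, and the exact-substring matching strategy are assumptions, not the patented method:

```python
def clip_indices(query, captions):
    """captions: list of (word, time). Return (start_time, end_time) of the
    first contiguous run of caption words matching the query, or None."""
    words = [w.lower() for w, _ in captions]
    q = query.lower().split()
    for i in range(len(words) - len(q) + 1):
        if words[i:i + len(q)] == q:
            return captions[i][1], captions[i + len(q) - 1][1]
    return None
```

The returned pair plays the role of the start/end index used to cut and stream the clip.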
-
Patent number: 12167159
Abstract: Techniques performed by a data processing system for a machine-learning-driven teleprompter include displaying a teleprompter transcript associated with a presentation on a display of a computing device associated with a presenter; receiving audio content of the presentation including speech of the presenter in which the presenter is reading the teleprompter transcript; analyzing the audio content of the presentation using a first machine learning model to obtain a real-time textual representation of the audio content, the first machine learning model being a natural language processing model trained to receive audio content including speech and to translate the audio content into a textual representation of the speech; analyzing the real-time textual representation and the teleprompter transcript with a second machine learning model to obtain transcript position information; and automatically scrolling the teleprompter transcript on the display of the computing device based on the transcript position information.
Type: Grant
Filed: December 7, 2023
Date of Patent: December 10, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Chakkaradeep Chinnakonda Chandran, Stephanie Lorraine Horn, Michael Jay Gilmore, Tarun Malik, Sarah Zaki, Tiffany Michelle Smith, Shivani Gupta, Pranjal Saxena, Ridhima Gupta
-
Patent number: 11902690
Abstract: Techniques performed by a data processing system for a machine-learning-driven teleprompter include displaying a teleprompter transcript associated with a presentation on a display of a computing device associated with a presenter; receiving audio content of the presentation including speech of the presenter in which the presenter is reading the teleprompter transcript; analyzing the audio content of the presentation using a first machine learning model to obtain a real-time textual representation of the audio content, the first machine learning model being a natural language processing model trained to receive audio content including speech and to translate the audio content into a textual representation of the speech; analyzing the real-time textual representation and the teleprompter transcript with a second machine learning model to obtain transcript position information; and automatically scrolling the teleprompter transcript on the display of the computing device based on the transcript position information.
Type: Grant
Filed: January 19, 2022
Date of Patent: February 13, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Chakkaradeep Chinnakonda Chandran, Stephanie Lorraine Horn, Michael Jay Gilmore, Tarun Malik, Sarah Zaki, Tiffany Michelle Smith, Shivani Gupta, Pranjal Saxena, Ridhima Gupta
-
Patent number: 11849193
Abstract: Methods, systems, and apparatuses are described to implement voice search in media content for requesting media content of a video clip of a scene contained in the media content streamed to the client device; for capturing the voice request for the media content of the video clip to display at the client device, wherein the streamed media content is a selected video streamed from a video source; for applying an NLP solution to convert the voice request to text for matching to a set of one or more words contained in at least closed caption text of the selected video; for associating matched words to closed caption text with a start index and an end index of the video clip contained in the selected video; and for streaming the video clip to the client device based on the start index and the end index associated with the matched closed caption text.
Type: Grant
Filed: October 19, 2022
Date of Patent: December 19, 2023
Inventor: Mayank Verma
-
Patent number: 11830473
Abstract: A system for synthesising expressive speech includes: an interface configured to receive an input text for conversion to speech; a memory; and at least one processor coupled to the memory. The processor is configured to generate, using an expressivity characterisation module, a plurality of expression vectors, wherein each expression vector is a representation of prosodic information in a reference audio style file, and synthesise expressive speech from the input text, using an expressive acoustic model comprising a deep convolutional neural network that is conditioned by at least one of the plurality of expression vectors.
Type: Grant
Filed: September 29, 2020
Date of Patent: November 28, 2023
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jesus Monge Alvarez, Holly Francois, Hosang Sung, Seungdo Choi, Kihyun Choo, Sangjun Park
-
Patent number: 11785278
Abstract: Alignment between closed caption and audio/video content may be improved by determining text associated with a portion of the audio or a portion of the video and comparing the determined text to a portion of closed caption text. Based on the comparison, a delay may be determined and the audio/video content may be buffered based on the determined delay.
Type: Grant
Filed: March 18, 2022
Date of Patent: October 10, 2023
Assignee: Comcast Cable Communications, LLC
Inventor: Christopher Stone
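The delay-determination step in the abstract above could be sketched as comparing timestamps of words recognized from the audio with timestamps of the same words in the caption stream. The sketch below is an assumption-laden illustration (the `(word, start_seconds)` representation and the use of a median gap are not from the patent):

```python
def caption_delay(asr_words, caption_words):
    """asr_words / caption_words: lists of (word, start_seconds).
    Delay estimate = median timestamp gap over words present in both streams."""
    asr = dict(asr_words)
    gaps = sorted(t - asr[w] for w, t in caption_words if w in asr)
    return gaps[len(gaps) // 2] if gaps else 0.0
```

A positive result means captions trail the audio, so the audio/video would be buffered by roughly that amount.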
-
Patent number: 11736773
Abstract: Systems and methods for generating audible pronunciation of a closed captioning word in a content item. For example, a system generates for output on a first device a content item comprising dialogue. The system generates for display on the first device a closed captioning word corresponding to the dialogue, where the closed captioning word is selectable via a user interface of the first device. The system receives a selection of the closed captioning word via the user interface of the first device. In response to receiving the selection of the closed captioning word, the system generates for playback on the first device at least a portion of the dialogue corresponding to the selected closed captioning word.
Type: Grant
Filed: October 15, 2021
Date of Patent: August 22, 2023
Assignee: Rovi Guides, Inc.
Inventor: Serhad Doken
-
Patent number: 11683558
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to determine the speed-up of media programs using speech recognition. An example apparatus disclosed herein is to perform speech recognition on a first audio clip collected by a media meter to recognize a first text string associated with the first audio clip, compare the first text string to a plurality of reference text strings associated with a corresponding plurality of reference audio clips to identify a matched one of the reference text strings, and estimate a presentation rate of the first audio clip based on a first time associated with the first audio clip and a second time associated with a first one of the reference audio clips corresponding to the matched one of the reference text strings.
Type: Grant
Filed: December 29, 2021
Date of Patent: June 20, 2023
Assignee: THE NIELSEN COMPANY (US), LLC
Inventor: Morris Lee
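Once a monitored clip has been matched to a reference clip by its recognized text, the presentation-rate estimate reduces to a ratio of time spans. A minimal sketch, assuming each clip is represented by the (start, end) times of the matched text (the function name and representation are illustrative, not the patented apparatus):

```python
def estimate_speedup(ref_times, obs_times):
    """ref_times / obs_times: (start, end) of the matched text in the
    reference clip and in the observed (metered) clip, in seconds.
    Returns the presentation rate, e.g. 1.2 for 20% sped-up playback."""
    ref_span = ref_times[1] - ref_times[0]
    obs_span = obs_times[1] - obs_times[0]
    return ref_span / obs_span
```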
-
Patent number: 11615800
Abstract: A speaker recognition system for assessing the identity of a speaker through a speech signal based on speech uttered by said speaker is provided. The system includes a framing module that subdivides the speech signal over time into a set of frames, and a filtering module that analyzes the frames of the set to discard frames affected by noise and frames which do not comprise speech, based on a spectral analysis of the frames. A feature extraction module extracts audio features from frames which have not been discarded, and a classification module processes the audio features extracted from the frames which have not been discarded for assessing the identity of the speaker.
Type: Grant
Filed: April 18, 2018
Date of Patent: March 28, 2023
Assignee: TELECOM ITALIA S.p.A.
Inventors: Igor Bisio, Cristina Fra', Chiara Garibotto, Fabio Lavagetto, Andrea Sciarrone, Massimo Valla
-
Patent number: 11509969
Abstract: Methods, systems, and apparatuses are described to implement voice search in media content for requesting media content of a video clip of a scene contained in the media content streamed to the client device; for capturing the voice request for the media content of the video clip to display at the client device, wherein the streamed media content is a selected video streamed from a video source; for applying an NLP solution to convert the voice request to text for matching to a set of one or more words contained in at least closed caption text of the selected video; for associating matched words to closed caption text with a start index and an end index of the video clip contained in the selected video; and for streaming the video clip to the client device based on the start index and the end index associated with the matched closed caption text.
Type: Grant
Filed: May 25, 2021
Date of Patent: November 22, 2022
Inventor: Mayank Verma
-
Patent number: 11445266
Abstract: Audiovisual content in the form of video clip files, whether streamed or broadcast, may contain subtitles. Such subtitles carry timing information so that each subtitle is displayed synchronously with the spoken words. At times, however, synchronization with the audio portion of the audiovisual content has a timing offset, which is bothersome when it exceeds a predetermined threshold. The system and method determine the time spans in which a human speaks and attempt to synchronize those time spans with the subtitle content. An indication is provided when an incurable synchronization problem exists, as well as when the subtitles and audio are well synchronized. When an offset exists, the system can further determine its type (constant or dynamic) and provide the adjustment information needed to correct the subtitle timing and resolve the synchronization deficiency.
Type: Grant
Filed: March 12, 2021
Date of Patent: September 13, 2022
Assignee: IChannel.IO Ltd.
Inventor: Oren Jack Maurice
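The constant-versus-dynamic distinction in this abstract can be illustrated by comparing detected speech-span start times with subtitle start times: if the gaps are all roughly equal, the offset is constant and correctable by a single shift; if they drift, it is dynamic. A rough sketch under those assumptions (thresholds, names, and the spread test are illustrative, not the patented logic):

```python
def classify_offset(speech_starts, subtitle_starts, tol=0.1):
    """Compare detected speech spans with subtitle timestamps (seconds).
    Returns ('synchronized', 0.0), ('constant', offset), or ('dynamic', None)."""
    diffs = [sub - sp for sp, sub in zip(speech_starts, subtitle_starts)]
    mean = sum(diffs) / len(diffs)
    spread = max(diffs) - min(diffs)
    if spread > tol:            # gaps drift over time: dynamic offset
        return ("dynamic", None)
    if abs(mean) <= tol:        # gaps negligible: already in sync
        return ("synchronized", 0.0)
    return ("constant", mean)   # uniform gap: shift subtitles by `mean`
```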
-
Patent number: 11270123
Abstract: Embodiments described herein provide a system for localized contextual video annotation. During operation, the system can segment a video into a plurality of segments based on a segmentation unit and parse a respective segment for generating multiple input modalities for the segment. A respective input modality can indicate a form of content in the segment. The system can then classify the segment into a set of semantic classes based on the input modalities and determine an annotation for the segment based on the set of semantic classes.
Type: Grant
Filed: September 21, 2020
Date of Patent: March 8, 2022
Assignee: Palo Alto Research Center Incorporated
Inventors: Karunakaran Sureshkumar, Raja Bala
-
Patent number: 11223878
Abstract: An electronic device is disclosed. The electronic device comprises: a microphone for receiving voice; a memory for storing a plurality of text sets; and a processor for converting the voice, received via the microphone, into text, searching for words common to the converted text with respect to each of the plurality of text sets, and determining at least one text set of the plurality of text sets on the basis of the ratio of the searched common words.
Type: Grant
Filed: October 25, 2018
Date of Patent: January 11, 2022
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Jae Hyun Bae
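The common-word-ratio selection described here (and in the related patent 11178463 below) can be illustrated with a small sketch; the function name, whitespace tokenization, and the specific ratio (shared words over query words) are assumptions for illustration:

```python
def best_text_set(converted_text, text_sets):
    """Pick the text set sharing the highest ratio of words with the query."""
    q = set(converted_text.lower().split())
    def ratio(ts):
        words = set(ts.lower().split())
        return len(q & words) / len(q) if q else 0.0
    return max(text_sets, key=ratio)
```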
-
Patent number: 11205419
Abstract: Low energy deep-learning networks for generating auditory features such as mel frequency cepstral coefficients in audio processing pipelines are provided. In various embodiments, a first neural network is trained to output auditory features such as mel-frequency cepstral coefficients, linear predictive coding coefficients, perceptual linear predictive coefficients, spectral coefficients, filter bank coefficients, and/or spectro-temporal receptive fields based on input audio samples. A second neural network is trained to output a classification based on input auditory features such as mel-frequency cepstral coefficients. An input audio sample is provided to the first neural network. Auditory features such as mel-frequency cepstral coefficients are received from the first neural network. The auditory features such as mel-frequency cepstral coefficients are provided to the second neural network. A classification of the input audio sample is received from the second neural network.
Type: Grant
Filed: August 28, 2018
Date of Patent: December 21, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Davis Barch, Andrew S. Cassidy, Myron D. Flickner
-
Patent number: 11195527
Abstract: Apparatus and method for processing speech recognition may include a speech recognition module that recognizes a voice uttered from a user, and a processing module that calls a user DB where information associated with the user is registered when a voice command of the user is input by the speech recognition module, verifies setting information related to a domain corresponding to the voice command, and processes the voice command through a content provider linked to the associated domain.
Type: Grant
Filed: August 1, 2019
Date of Patent: December 7, 2021
Assignees: Hyundai Motor Company, Kia Motors Corporation
Inventor: Jae Min Joh
-
Patent number: 11178463
Abstract: An electronic device is disclosed. The electronic device comprises: a microphone for receiving voice; a memory for storing a plurality of text sets; and a processor for converting the voice, received via the microphone, into text, searching for words common to the converted text with respect to each of the plurality of text sets, and determining at least one text set of the plurality of text sets on the basis of the ratio of the searched common words.
Type: Grant
Filed: October 25, 2018
Date of Patent: November 16, 2021
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Jae Hyun Bae
-
Patent number: 11032620
Abstract: Methods, systems, and apparatuses are described to implement voice search in media content for requesting media content of a video clip of a scene contained in the media content streamed to the client device; for capturing the voice request for the media content of the video clip to display at the client device, wherein the streamed media content is a selected video streamed from a video source; for applying an NLP solution to convert the voice request to text for matching to a set of one or more words contained in at least closed caption text of the selected video; for associating matched words to closed caption text with a start index and an end index of the video clip contained in the selected video; and for streaming the video clip to the client device based on the start index and the end index associated with the matched closed caption text.
Type: Grant
Filed: February 14, 2020
Date of Patent: June 8, 2021
Assignee: SLING MEDIA PVT LTD
Inventor: Mayank Verma
-
Patent number: 11024318
Abstract: A method of speaker verification comprises: comparing a test input against a model of a user's speech obtained during a process of enrolling the user; obtaining a first score from comparing the test input against the model of the user's speech; comparing the test input against a first plurality of models of speech obtained from a first plurality of other speakers respectively; obtaining a plurality of cohort scores from comparing the test input against the plurality of models of speech obtained from a plurality of other speakers; obtaining statistics describing the plurality of cohort scores; modifying said statistics to obtain adjusted statistics; normalising the first score using the adjusted statistics to obtain a normalised score; and using the normalised score for speaker verification.
Type: Grant
Filed: November 15, 2019
Date of Patent: June 1, 2021
Assignee: Cirrus Logic, Inc.
Inventors: John Paul Lesso, Gordon Richard McLeod
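The normalisation step in this abstract resembles the standard cohort z-normalisation used in speaker verification. A minimal sketch, assuming mean/standard deviation as the cohort statistics and simple `shift`/`scale` knobs standing in for the patent's "adjusted statistics" (those knobs and names are assumptions, not the claimed adjustment):

```python
import statistics

def normalised_score(user_score, cohort_scores, shift=0.0, scale=1.0):
    """Z-normalise a verification score against cohort statistics.
    shift/scale are illustrative stand-ins for the adjustment step."""
    mu = statistics.mean(cohort_scores) + shift
    sigma = statistics.stdev(cohort_scores) * scale
    return (user_score - mu) / sigma
```

A higher normalised score means the test input matches the enrolled user's model much better than it matches the cohort of other speakers.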
-
Patent number: 11010645
Abstract: A method and system for an AI-based communication training system for individuals and organizations is disclosed. A video analyzer is used to convert a video signal into a plurality of human morphology features, with an accompanying audio analyzer converting an audio signal into a plurality of human speech features. A transformation module transforms the morphology features and the speech features into a current multi-dimensional performance vector, and combinatorial logic generates an integration of the current multi-dimensional performance vector and one or more prior multi-dimensional performance vectors to generate a multi-session rubric. Backpropagation logic applies a current multi-dimensional performance vector from the combinatorial logic to the video analyzer and the audio analyzer.
Type: Grant
Filed: August 26, 2019
Date of Patent: May 18, 2021
Assignee: TalkMeUp
Inventors: JiaoJiao Xu, Yi Xu, Chenchen Zhu, Matthew Thomas Spettel
-
Patent number: 11011177
Abstract: A voice identification method comprises: obtaining audio data, and extracting an audio feature of the audio data; determining whether a voice identification feature having a similarity with the audio feature above a preset matching threshold exists in an associated feature library; and in response to determining that the voice identification feature exists in the associated feature library, updating, by using the audio feature, the voice identification feature obtained through matching.
Type: Grant
Filed: June 14, 2018
Date of Patent: May 18, 2021
Assignee: ALIBABA GROUP HOLDING LIMITED
Inventors: Gang Liu, Qingen Zhao, Guangxing Liu
-
Patent number: 11005620
Abstract: Various aspects described herein relate to techniques for uplink reference signal sequence design in wireless communications systems. A method, a computer-readable medium, and an apparatus are provided. In an aspect, the method includes identifying a set of sequences to include at least a base sequence, a reverse order sequence of the base sequence, a complex conjugate sequence of the base sequence, or a reverse order complex conjugate sequence of the base sequence, and transmitting an uplink reference signal based on at least one of the sequences in the set. The techniques described herein may apply to different communications technologies, including 5th Generation (5G) New Radio (NR) communications technology.
Type: Grant
Filed: June 14, 2018
Date of Patent: May 11, 2021
Assignee: QUALCOMM Incorporated
Inventors: Seyong Park, Renqiu Wang, Yi Huang, Hao Xu, Peter Gaal
-
Patent number: 10884096
Abstract: An object of the present invention is to facilitate recognition of a voice command of a user in a situation where multiple devices including microphones are connected through a sensor network. A relative location of each device is determined, and a location and a direction of the user are tracked through the time differences at which the voice command arrives. The command is interpreted based on the location and the direction of the user. Such a method may be used with sensor-network, Machine to Machine (M2M), Machine Type Communication (MTC), and Internet of Things (IoT) technologies for intelligent services (smart home, smart building, etc.), digital education, security- and safety-related services, and the like.
Type: Grant
Filed: February 13, 2018
Date of Patent: January 5, 2021
Assignee: LUXROBO CO., LTD.
Inventors: Seungmin Baek, Seungbae Son
-
Patent number: 10747957
Abstract: In some applications, it may be desired to process a message to determine an intent of the message, where the intent indicates the meaning of the message. An intent classifier may be used to determine the meaning of a message by processing the message to compute a message embedding vector that represents the message in a vector space. Each possible intent may be represented by a prototype vector, and the intent of the message may be determined by comparing the message embedding to one or more prototype vectors, such as by selecting an intent whose prototype vector is closest to the message embedding. An intent classifier may be used, for example, (i) to implement an automated communications system with states where each state is associated with a subset of the possible intents or (ii) for processing usage data of a communications system to update the intents of the communications system.
Type: Grant
Filed: November 13, 2018
Date of Patent: August 18, 2020
Assignee: ASAPP, INC.
Inventor: Jeremy Elliot Azriel Wohlwend
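The nearest-prototype selection described here is straightforward to sketch once the message embedding is given; Euclidean distance is one plausible choice of "closest" (the patent does not fix the metric, and the names below are illustrative):

```python
def classify_intent(message_vec, prototypes):
    """prototypes: dict mapping intent name -> prototype vector.
    Returns the intent whose prototype is nearest (Euclidean) to the embedding."""
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(message_vec, p)) ** 0.5
    return min(prototypes, key=lambda k: dist(prototypes[k]))
```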
-
Patent number: 10540991
Abstract: In various example embodiments, a system and method for determining a crowd response for a crowd are presented. One method is disclosed that includes receiving an audio signal that includes concurrent responses from two or more respondents, determining the concurrent responses from the audio signal without regard to the identity of the respondents, and generating a crowd response based on the concurrent responses.
Type: Grant
Filed: August 20, 2015
Date of Patent: January 21, 2020
Assignee: eBay Inc.
Inventor: Sergio Pinzon Gonzales, Jr.
-
Patent number: 10482893
Abstract: A sound processing method includes a step of applying a nonlinear filter to a temporal sequence of spectral envelopes of an acoustic signal, wherein the nonlinear filter smooths fine temporal perturbations of the spectral envelope without smoothing out large temporal changes. A sound processing apparatus includes a smoothing processor configured to apply a nonlinear filter to a temporal sequence of spectral envelopes of an acoustic signal, wherein the nonlinear filter smooths fine temporal perturbations of the spectral envelope without smoothing out large temporal changes.
Type: Grant
Filed: November 1, 2017
Date of Patent: November 19, 2019
Assignee: YAMAHA CORPORATION
Inventors: Ryunosuke Daido, Hiraku Kayama
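A classic nonlinear filter with exactly the property this abstract describes — damping fine perturbations while preserving large step changes — is the sliding median. The patent does not say it uses a median filter; this is a generic illustration of the behaviour:

```python
def median_smooth(seq, width=3):
    """Sliding median: removes isolated spikes but keeps step edges sharp,
    unlike a linear moving average which blurs both."""
    half = width // 2
    out = []
    for i in range(len(seq)):
        window = seq[max(0, i - half): i + half + 1]
        out.append(sorted(window)[len(window) // 2])
    return out
```

Note how a one-sample spike is flattened while a sustained level change passes through unchanged.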
-
Patent number: 10451710
Abstract: The present disclosure relates to a user identification method applicable to a vehicle, the vehicle including at least two microphone arrays, the respective microphone arrays being disposed at different positions of the vehicle, respectively. The user identification method includes: receiving a voice of a user within the vehicle through the at least two microphone arrays; determining directions from the user to the microphone arrays, respectively, according to the voice; calculating an angle between any two of the directions; and identifying a type of the user based at least on the angle.
Type: Grant
Filed: October 26, 2018
Date of Patent: October 22, 2019
Assignee: BOE TECHNOLOGY GROUP CO., LTD.
Inventors: Hongyang Li, Xin Li, Xiangdong Yang
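The angle calculation at the heart of this method is elementary vector geometry: given two direction-of-arrival vectors (one per microphone array), the angle between them follows from the dot product. A 2-D sketch, purely illustrative of the geometry:

```python
import math

def angle_between(d1, d2):
    """Angle in degrees between two 2-D direction-of-arrival vectors."""
    dot = d1[0] * d2[0] + d1[1] * d2[1]
    n1, n2 = math.hypot(*d1), math.hypot(*d2)
    # clamp to [-1, 1] to guard against floating-point drift before acos
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))
```

How the resulting angle maps to a "type of user" (e.g. which seat the speaker occupies) is the patent's claim and is not reproduced here.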
-
Patent number: 10418028
Abstract: Technologies for detecting an end of a sentence in automatic speech recognition are disclosed. An automatic speech recognition device may acquire speech data, and identify phonemes and words of the speech data. The automatic speech recognition device may perform a syntactic parse based on the recognized words, and determine an end of a sentence based on the syntactic parse. For example, if the syntactic parse indicates that a certain set of consecutive recognized words forms a syntactically complete and correct sentence, the automatic speech recognition device may determine that there is an end of a sentence at the end of that set of words.
Type: Grant
Filed: November 15, 2017
Date of Patent: September 17, 2019
Assignee: Intel Corporation
Inventors: Oren Shamir, Oren Pereg, Moshe Wasserblat, Jonathan Mamou, Michel Assayag
-
Patent number: 10402924
Abstract: A system and method are presented for relationship management workflow processes. At least one embodiment may apply to process automation in health care. More specifically, the system and method may be applied to patient management in healthcare, such as the management of diabetes or other medical conditions. Other embodiments may apply to process automation in other areas utilizing management workflow software.
Type: Grant
Filed: February 25, 2014
Date of Patent: September 3, 2019
Inventors: Zachary Hinkle, Jason Andrew Loucks, Logan H. Weilenman, Ryan Collins
-
Patent number: 10403267
Abstract: A method of updating speech recognition data including a language model used for speech recognition, the method including obtaining language data including at least one word; detecting a word that does not exist in the language model from among the at least one word; obtaining at least one phoneme sequence regarding the detected word; obtaining components constituting the at least one phoneme sequence by dividing the at least one phoneme sequence into predetermined unit components; determining information regarding probabilities that the respective components constituting each of the at least one phoneme sequence appear during speech recognition; and updating the language model based on the determined probability information.
Type: Grant
Filed: January 16, 2015
Date of Patent: September 3, 2019
Assignee: Samsung Electronics Co., Ltd
Inventors: Chi-youn Park, Il-hwan Kim, Kyung-min Lee, Nam-hoon Kim, Jae-won Lee
-
Patent number: 10360215
Abstract: Pattern queries are evaluated in parallel over large N-dimensional datasets to identify features of interest.
Type: Grant
Filed: March 30, 2015
Date of Patent: July 23, 2019
Assignee: EMC Corporation
Inventors: Angelo E. M. Ciarlini, Fabio A. M. Porto, Amir H. K. Moghadam, Jonas F. Dias, Paulo de Figueiredo Pires, Fabio A. Perosi, Alex L. Bordignon, Bruno Carlos da Cunha Costa, Wagner dos Santos Vieira
-
Patent number: 10311863
Abstract: There is provided a system including a microphone configured to receive an input speech, an analog to digital (A/D) converter configured to convert the input speech to a digital form and generate a digitized speech including a plurality of segments having acoustic features, a memory storing an executable code, and a processor executing the executable code to extract a plurality of acoustic feature vectors from a first segment of the digitized speech, determine, based on the plurality of acoustic feature vectors, a plurality of probability distribution vectors corresponding to the probabilities that the first segment includes each of a first keyword, a second keyword, both the first keyword and the second keyword, a background, and a social speech, and assign a first classification label to the first segment based on an analysis of the plurality of probability distribution vectors of one or more segments preceding the first segment and the probability distribution vectors of the first segment.
Type: Grant
Filed: September 2, 2016
Date of Patent: June 4, 2019
Assignee: Disney Enterprises, Inc.
Inventors: Jill Fain Lehman, Nikolas Wolfe, Andre Pereira
-
Patent number: 10217460
Abstract: A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors.
Type: Grant
Filed: December 28, 2016
Date of Patent: February 26, 2019
Assignee: ZENTIAN LIMITED
Inventor: Mark Catchpole
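The "lexical tree" in this abstract — a model of words sharing common prefix components — is essentially a prefix tree (trie). A minimal software sketch of that data structure (the hardware circuit in the patent is far more involved; the `"$"` end-of-word marker is an implementation convenience here):

```python
def build_lexical_tree(words):
    """Prefix tree: words sharing a prefix share nodes; '$' marks a word end."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = w  # store the completed word at its terminal node
    return root
```

With "cat" and "car" in one tree, the shared prefix "ca" is stored once and the node branches into "t" and "r", which is what lets prefix-sharing words be scored together.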
-
Patent number: 10036818
Abstract: Method for reducing computational time in inversion of geophysical data to infer a physical property model (91), especially advantageous in full wavefield inversion of seismic data. An approximate Hessian is pre-calculated by computing the product of the exact Hessian and a sampling vector composed of isolated point diffractors (82), and the approximate Hessian is stored in computer hard disk or memory (83). The approximate Hessian is then retrieved when needed (99) for computing its product with the gradient (93) of an objective function or other vector. Since the approximate Hessian is very sparse (diagonally dominant), its product with a vector may therefore be approximated very efficiently with good accuracy. Once the approximate Hessian is computed and stored, computing its product with a vector requires no simulator calls (wavefield propagations) at all. The pre-calculated approximate Hessian can also be reused in the subsequent steps whenever necessary.
Type: Grant
Filed: July 14, 2014
Date of Patent: July 31, 2018
Assignee: ExxonMobil Upstream Research Company
Inventors: Yaxun Tang, Sunwoong Lee
-
Patent number: 9886954
Abstract: One or more context aware processing parameters and an ambient audio stream are received. One or more sound characteristics associated with the ambient audio stream are identified using a machine learning model. One or more actions to perform are determined using the machine learning model and based on the one or more context aware processing parameters and the identified one or more sound characteristics. The one or more actions are performed.
Type: Grant
Filed: September 30, 2016
Date of Patent: February 6, 2018
Assignee: Doppler Labs, Inc.
Inventors: Jacob Meacham, Matthew Sills, Richard Fritz Lanman, III, Jeffrey Baker
-
Patent number: 9881604
Abstract: A system and method for identifying special information is provided. Endpoints are defined within a voice recording. One or more of the endpoints are identified within the voice recording and the voice recording is partitioned into segments based on the identified endpoints. Elements of text are identified by applying speech recognition to each of the segments, and a list of prompt list candidates is applied to the text elements. The segments with text elements that match one or more prompt list candidates are identified. Portions of the voice recording following the prompt list candidates that include special information are identified, and the special information is rendered unintelligible within the voice recording.
Type: Grant
Filed: February 9, 2015
Date of Patent: January 30, 2018
Assignee: Intellisist, Inc.
Inventors: Howard M. Lee, Steven Lutz, Gilad Odinak
-
Patent number: 9858922
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for caching speech recognition scores. In some implementations, one or more values comprising data about an utterance are received. An index value is determined for the one or more values. An acoustic model score for the one or more received values is selected, from a cache of acoustic model scores that were computed before receiving the one or more values, based on the index value. A transcription for the utterance is determined using the selected acoustic model score.
Type: Grant
Filed: June 23, 2014
Date of Patent: January 2, 2018
Assignee: Google Inc.
Inventors: Eugene Weinstein, Sanjiv Kumar, Ignacio L. Moreno, Andrew W. Senior, Nikhil Prasad Bhat
-
Patent number: 9837069
Abstract: Technologies for detecting an end of a sentence in automatic speech recognition are disclosed. An automatic speech recognition device may acquire speech data, and identify phonemes and words of the speech data. The automatic speech recognition device may perform a syntactic parse based on the recognized words, and determine an end of a sentence based on the syntactic parse. For example, if the syntactic parse indicates that a certain set of consecutive recognized words forms a syntactically complete and correct sentence, the automatic speech recognition device may determine that there is an end of a sentence at the end of that set of words.
Type: Grant
Filed: December 22, 2015
Date of Patent: December 5, 2017
Assignee: Intel Corporation
Inventors: Oren Shamir, Oren Pereg, Moshe Wasserblat, Jonathan Mamou, Michel Assayag
-
Patent number: 9805111
Abstract: A pattern analysing device (27) in a pattern processing node (21) of a data collection system (10) comprises a pattern updating unit equipped with a pattern collecting element configured to obtain an existing pattern of historical data according to at least one existing data model, where the existing pattern relates to an entity (11) associated with the data collection system, and to obtain a further pattern of newer data according to a further data model, where the further pattern also relates to the entity, and a pattern updating element configured to compare the patterns with each other, determine if the existing data model can be mapped on the further data model, and update the existing pattern with the further pattern in relation to the historical data if the existing data model can be mapped on the further data model.
Type: Grant
Filed: October 4, 2010
Date of Patent: October 31, 2017
Assignee: TELEFONAKTIEBOLAGET L M ERICSSON
Inventors: Johan Hjelm, Mattias Lidstrom, Mona Matti
-
Patent number: 9703350
Abstract: The invention relates to an electronic device that includes a wake-up system that operates at a substantially low power level and is applied to wake up the electronic device from a sleep mode. The wake-up system comprises a sound transducer that converts a received sound signal to an electrical signal and a keyword detection logic that preliminarily identifies a speech energy profile that corresponds to at least one of a plurality of keywords in a part of the electrical signal. In some embodiments, a keyword finder is further activated to identify with enhanced accuracy whether the at least one keyword exists in the part of the electrical signal, and generates a wake-up control to activate a host of the electronic device from its sleep mode.
Type: Grant
Filed: June 24, 2013
Date of Patent: July 11, 2017
Assignee: Maxim Integrated Products, Inc.
Inventors: Vivek Nigam, Yadong Wang, Anthony Stephen Doy, Todd D. Moore
-
Patent number: 9684872
Abstract: A method and an apparatus for generating data in a missing segment of a target time data sequence are disclosed. The method includes: determining whether there is a breakpoint in the missing segment; determining candidate values of the data in the missing segment; and generating values of the data in the missing segment by selectively using the candidate values of the data in the missing segment, according to whether there is the breakpoint in the missing segment. With the method and the apparatus, the data in the missing segment of the target time data sequence can be generated more accurately.
Type: Grant
Filed: June 4, 2015
Date of Patent: June 20, 2017
Assignee: International Business Machines Corporation
Inventors: Wei S. Dong, Wen Q. Huang, Chang S. Li, Yu Wang, Junchi Yan, Chao Zhang, Xin Zhang, Xiu F. Zhu
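The breakpoint-dependent selection this abstract describes can be sketched with one simple policy per case. The specific fill strategies below (linear interpolation without a breakpoint, level-hold on each side with one) are illustrative assumptions, not the patented candidate-value method.

```python
def fill_missing(before, after, n, has_break=False):
    """Generate n values for a missing segment between two observed
    samples. Sketch: without a breakpoint, interpolate linearly; with
    one, hold each side's level up to the break, preserving the jump."""
    if not has_break:
        step = (after - before) / (n + 1)
        return [before + step * (i + 1) for i in range(n)]
    # A breakpoint means the series genuinely jumped, so smoothing
    # across it would invent values that never occurred.
    half = n // 2
    return [before] * half + [after] * (n - half)
```

The design point carried over from the abstract: the same candidate values are used either way, but whether they are blended or kept apart depends on the breakpoint test.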
-
Patent number: 9454976
Abstract: A method is disclosed for discriminating voiced and unvoiced sounds in speech. The method detects characteristic waveform features of voiced and unvoiced sounds by applying integral and differential functions to the digitized sound signal in the time domain. Laboratory tests demonstrate extremely high reliability in separating voiced and unvoiced sounds. The method is very fast and computationally efficient. The method enables voice activation in resource-limited and battery-limited devices, including mobile devices, wearable devices, and embedded controllers. The method also enables reliable command identification in applications that recognize only predetermined commands. The method is suitable as a pre-processor for natural language speech interpretation, improving recognition and responsiveness. The method enables real-time coding or compression of speech according to the sound type, improving transmission efficiency.
Type: Grant
Filed: April 15, 2014
Date of Patent: September 27, 2016
Inventor: David Edward Newman
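A minimal time-domain discriminator in the same spirit can be sketched as below. The patent's exact integral and differential functions are not reproduced in the abstract, so this stand-in uses two classic proxies: short-term energy (an integral-like measure) and zero-crossing rate (a differential-like measure). Thresholds are illustrative.

```python
def classify_frame(samples, energy_thresh=0.01, zcr_thresh=0.25):
    """Classify one frame as silence, voiced, or unvoiced using only
    time-domain measures. Sketch: voiced speech has high energy and few
    zero crossings; unvoiced (fricative) speech shows the reverse."""
    n = len(samples)
    energy = sum(s * s for s in samples) / n          # integral-like
    crossings = sum(
        1 for i in range(1, n)
        if (samples[i - 1] < 0) != (samples[i] < 0)   # sign change
    )
    zcr = crossings / (n - 1)                         # differential-like
    if energy < energy_thresh:
        return "silence"
    return "voiced" if zcr < zcr_thresh else "unvoiced"
```

Because it needs no FFT, a discriminator of this shape fits the abstract's claim about battery-limited devices: a handful of adds and compares per sample.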
-
Patent number: 9202520
Abstract: This disclosure relates to systems and methods for determining when a user likes a piece of content based, at least in part, on analyzing user responses to the content. In one embodiment, the user's response may be monitored by audio and motion detection devices to determine when the user's vocals or movements are emulating the content. When the user's emulation exceeds a threshold amount the content may be designated as “liked.” In certain instances, a similar piece of content may be selected to play when the current content is finished.
Type: Grant
Filed: October 17, 2012
Date of Patent: December 1, 2015
Assignee: Amazon Technologies, Inc.
Inventor: Joshua K. Tang
-
Patent number: 9190063
Abstract: A speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user. A number of different speech models for different natural languages are used to support and detect a natural language spoken by a user. In some implementations an interactive electronic agent responds in the user's native language to facilitate a real-time, human-like dialog.
Type: Grant
Filed: October 31, 2007
Date of Patent: November 17, 2015
Assignee: Nuance Communications, Inc.
Inventors: Ian Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj
-
Patent number: 9165555
Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
Type: Grant
Filed: November 26, 2014
Date of Patent: October 20, 2015
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
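The per-speaker iteration described above is the classic vocal-tract-length-normalization (VTLN) grid search: warp the frequency axis by each candidate factor and keep the factor whose warped spectrum best matches the speech model. The sketch below assumes a piecewise-linear warp and a squared-error distance; the patent's actual warp function, factor grid, and match criterion may differ.

```python
def best_warping_factor(spectrum, model,
                        factors=(0.88, 0.94, 1.0, 1.06, 1.12)):
    """Return the candidate warping factor whose warped spectrum is
    closest to the speech model. Sketch of a VTLN-style grid search."""

    def warp(spec, alpha):
        # Resample the spectrum at f / alpha with linear interpolation,
        # clipping at the band edge.
        n = len(spec)
        out = []
        for i in range(n):
            x = i / alpha
            lo = min(int(x), n - 1)
            hi = min(lo + 1, n - 1)
            frac = x - lo
            out.append(spec[lo] * (1 - frac) + spec[hi] * frac)
        return out

    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(factors, key=lambda a: distance(warp(spectrum, a), model))
```

In a full trainer this search runs once per speaker-specific segment, and the winning factor is then applied to all of that speaker's training frames.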
-
Patent number: 9123350
Abstract: A method and system for extracting audio features from an encoded bitstream for audio classification. The method comprises partially decoding the encoded bitstream; obtaining uniform window block size spectral coefficients of the encoded bitstream; and extracting audio features based on the uniform window block spectral coefficients.
Type: Grant
Filed: December 14, 2005
Date of Patent: September 1, 2015
Assignee: Panasonic Intellectual Property Management Co., Ltd.
Inventor: Ying Zhao
-
Patent number: 9076448
Abstract: A real-time system incorporating speech recognition and linguistic processing for recognizing a spoken query by a user, distributed between client and server, is disclosed. The system accepts a user's queries in the form of speech at the client, where minimal processing extracts a sufficient number of acoustic speech vectors representing the utterance. These vectors are sent via a communications channel to the server, where additional acoustic vectors are derived. Using Hidden Markov Models (HMMs), and appropriate grammars and dictionaries conditioned by the selections made by the user, the speech representing the user's query is fully decoded into text (or some other suitable form) at the server. This text corresponding to the user's query is then simultaneously sent to a natural language engine and a database processor, where optimized SQL statements are constructed for a full-text search of a database for a recordset of several stored questions that best matches the user's query.
Type: Grant
Filed: October 10, 2003
Date of Patent: July 7, 2015
Assignee: Nuance Communications, Inc.
Inventors: Ian M. Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj
-
Patent number: 9066112
Abstract: A method of designing a code book for super resolution encoding. The method includes, for example, via a processor, creating a first group of entries in the code book that includes a plurality of gray font values for encoding data; via the processor, creating a second group of entries in the code book that includes a set of values for each of the gray font values for decoding data; via the processor, creating a third group of entries in the code book that includes a pattern corresponding to each of the plurality of gray font values; and storing the code book in a database in communication with the processor.
Type: Grant
Filed: August 2, 2012
Date of Patent: June 23, 2015
Assignee: XEROX CORPORATION
Inventors: Guo-Yau Lin, Farzin Blurfrushan
-
Patent number: 9025777
Abstract: An audio signal decoder for providing a decoded multi-channel audio signal representation on the basis of an encoded multi-channel audio signal representation has a time warp decoder configured to selectively use individual audio channel specific time warp contours or a joint multi-channel time warp contour for a reconstruction of a plurality of audio channels represented by the encoded multi-channel audio signal representation.
Type: Grant
Filed: July 1, 2009
Date of Patent: May 5, 2015
Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors: Stefan Bayer, Sascha Disch, Ralf Geiger, Guillaume Fuchs, Max Neuendorf, Gerald Schuller, Bernd Edler
-
Patent number: 8996380
Abstract: Systems and methods of synchronizing media are provided. A client device may be used to capture a sample of a media stream being rendered by a media rendering source. The client device sends the sample to a position identification module to determine a time offset indicating a position in the media stream corresponding to the sampling time of the sample, and optionally a timescale ratio indicating a speed at which the media stream is being rendered by the media rendering source based on a reference speed of the media stream. The client device calculates a real-time offset using a present time, a timestamp of the media sample, the time offset, and optionally the timescale ratio. The client device then renders a second media stream at a position corresponding to the real-time offset to be in synchrony with the media stream being rendered by the media rendering source.
Type: Grant
Filed: May 4, 2011
Date of Patent: March 31, 2015
Assignee: Shazam Entertainment Ltd.
Inventors: Avery Li-Chun Wang, Rahul Powar, William Michael Mills, Christopher Jacques Penrose Barton, Philip Georges Inghelbrecht, Dheeraj Shankar Mukherjee
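The real-time offset arithmetic this abstract walks through is compact enough to show directly. Parameter names are illustrative; the abstract gives the inputs (present time, sample timestamp, time offset, optional timescale ratio) but not this exact formula.

```python
def playback_position(now, sample_timestamp, time_offset,
                      timescale_ratio=1.0):
    """Position in the reference stream at wall-clock time `now`, given
    a sample captured at `sample_timestamp` that matched the stream at
    `time_offset`. All times in seconds. Sketch of the sync arithmetic."""
    elapsed = now - sample_timestamp
    # The source may play faster or slower than the reference speed;
    # the timescale ratio rescales elapsed wall-clock time accordingly.
    return time_offset + elapsed * timescale_ratio
```

With this position in hand, the client starts the second media stream at `playback_position(...)` so it lines up with whatever the first source is currently rendering.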
-
Patent number: 8909518
Abstract: A warping factor estimation system comprises a label information generation unit that outputs voice/non-voice label information, a warp model storage unit in which a probability model representing voice and non-voice occurrence probabilities is stored, and a warp estimation unit that calculates a warping factor in the frequency-axis direction using the probability model representing voice and non-voice occurrence probabilities, voice and non-voice labels, and a cepstrum.
Type: Grant
Filed: September 22, 2008
Date of Patent: December 9, 2014
Assignee: NEC Corporation
Inventor: Tadashi Emori