Segmentation Or Word Limit Detection (EPO) Patents (Class 704/E15.005)
-
Patent number: 11823669
Abstract: According to one embodiment, an information processing apparatus includes the following units. The first acquisition unit acquires speech data including frames. The second acquisition unit acquires a model trained to, upon input of a feature amount extracted from the speech data, output information indicative of the likelihood of each of a plurality of classes including a component of a keyword and a component of background noise. The first calculation unit calculates a keyword score indicative of the occurrence probability of the component of the keyword. The second calculation unit calculates a background noise score indicative of the occurrence probability of the component of the background noise. The determination unit determines whether or not the speech data includes the keyword.
Type: Grant
Filed: February 28, 2020
Date of Patent: November 21, 2023
Assignee: KABUSHIKI KAISHA TOSHIBA
Inventors: Ning Ding, Hiroshi Fujimura
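A minimal Python sketch of the decision rule this abstract describes, assuming per-frame class posteriors are already available from the trained model; the class indices, the pooling (frame-wise max then mean), and the margin are illustrative assumptions, not the patented method:

```python
import numpy as np

# Assumed class layout: a few keyword-component classes and a few
# background-noise classes among the model's outputs.
KEYWORD_CLASSES = [0, 1, 2]   # e.g. sub-word units of the keyword (assumed)
NOISE_CLASSES = [3, 4]        # e.g. background-noise classes (assumed)

def detect_keyword(posteriors: np.ndarray, margin: float = 0.1) -> bool:
    """posteriors: (num_frames, num_classes) per-frame class likelihoods."""
    # Keyword score: occurrence probability of keyword components.
    keyword_score = posteriors[:, KEYWORD_CLASSES].max(axis=1).mean()
    # Background noise score: occurrence probability of noise components.
    noise_score = posteriors[:, NOISE_CLASSES].max(axis=1).mean()
    # Declare a keyword only when it clearly dominates the noise score.
    return keyword_score > noise_score + margin

frames = np.random.dirichlet(np.ones(5), size=40)  # toy posteriors
print(detect_keyword(frames))
```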
-
Patent number: 11818291
Abstract: Real-time speech analytics (RTSA) involves maintaining real-time speech conditions, rules, and triggers, along with real-time actions and alerts to take. A call between a user and an agent is received at an agent computing device. The call is monitored to detect one of the real-time speech conditions, rules, and triggers. Based on the detection, at least one real-time action and/or alert is initiated.
Type: Grant
Filed: September 1, 2021
Date of Patent: November 14, 2023
Assignee: Verint Americas Inc.
Inventors: David Warren Singer, Daniel Thomas Spohrer, Marc Adam Calahan, Paul Michael Munro, Gary Andrew Duke, Padraig Carberry, Christopher Jerome Schnurr
-
Patent number: 11817095
Abstract: A method, computer program product, and computing system for monitoring a plurality of conversations within a monitored space to generate a conversation data set; processing the conversation data set using machine learning to: define a system-directed command for an ACI system, and associate one or more conversational contexts with the system-directed command; detecting the occurrence of a specific conversational context within the monitored space, wherein the specific conversational context is included in the one or more conversational contexts associated with the system-directed command; and executing, in whole or in part, functionality associated with the system-directed command in response to detecting the occurrence of the specific conversational context without requiring the utterance of the system-directed command and/or a wake-up word/phrase.
Type: Grant
Filed: February 3, 2022
Date of Patent: November 14, 2023
Assignee: Nuance Communications, Inc.
Inventors: Paul Joseph Vozila, Neal Snider
-
Patent number: 11805185
Abstract: A server system is provided that includes one or more processors configured to execute a platform for an online multi-user chat service that communicates with a plurality of client devices of users of the online multi-user chat service and exchanges user chat data between the plurality of client devices. The one or more processors are configured to execute a user chat filtering program that performs filter actions for user chat data exchanged on the platform for the online multi-user chat service. The user chat filtering program includes a plurality of trained machine learning models and a filter decision service that determines a filter action to be performed for target portions of user chat data based on output of the plurality of trained machine learning models for those target portions of user chat data.
Type: Grant
Filed: March 3, 2021
Date of Patent: October 31, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventor: Monica Tongya
-
Patent number: 11783825
Abstract: Embodiments of the present invention provide a speech recognition method and a terminal. The method includes: listening, by a speech wakeup apparatus, to speech information in a surrounding environment; when determining that the speech information obtained by listening matches a speech wakeup model, buffering, by the speech wakeup apparatus, speech information, of first preset duration, obtained by listening, and sending a trigger signal for triggering enabling of a speech recognition apparatus, where the trigger signal is used to instruct the speech recognition apparatus to read and recognize the speech information buffered by the speech wakeup apparatus; and recognizing first speech information buffered by the speech wakeup apparatus and second speech information obtained by listening, to obtain a recognition result.
Type: Grant
Filed: February 17, 2021
Date of Patent: October 10, 2023
Assignee: Honor Device Co., Ltd.
Inventor: Junyang Zhou
-
Patent number: 11776544
Abstract: An embodiment of the present invention provides an artificial intelligence (AI) apparatus for recognizing a speech of a user. The artificial intelligence apparatus includes a memory to store a speech recognition model and a processor to obtain a speech signal for a user speech, to convert the speech signal into a text using the speech recognition model, to measure a confidence level for the conversion, to perform a control operation corresponding to the converted text if the measured confidence level is greater than or equal to a reference value, and to provide feedback for the conversion if the measured confidence level is less than the reference value.
Type: Grant
Filed: May 18, 2022
Date of Patent: October 3, 2023
Assignee: LG ELECTRONICS INC.
Inventors: Jaehong Kim, Hyoeun Kim, Hangil Jeong, Heeyeon Choi
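A minimal sketch of the confidence-gated control flow this abstract describes; the recognizer callable, the stub actions, and the 0.8 reference value are hypothetical stand-ins:

```python
def execute_control_operation(text: str) -> str:
    return f"executing: {text}"

def request_feedback(text: str, confidence: float) -> str:
    return f"low confidence ({confidence:.2f}) for '{text}'; please confirm"

def handle_utterance(transcribe, speech_signal, reference_value: float = 0.8) -> str:
    """Act on the converted text when confidence clears the reference
    value; otherwise fall back to asking the user for feedback."""
    text, confidence = transcribe(speech_signal)
    if confidence >= reference_value:
        return execute_control_operation(text)
    return request_feedback(text, confidence)

# Toy recognizer standing in for the stored speech recognition model.
print(handle_utterance(lambda s: ("turn on the light", 0.91), b"..."))
```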
-
Patent number: 11776210
Abstract: An electronic device and method for 3D face modeling based on neural networks is provided. The electronic device receives a two-dimensional (2D) color image of a human face with a first face expression and obtains a first three-dimensional (3D) mesh of the human face with the first face expression based on the received 2D color image. The electronic device generates first texture information and a first set of displacement maps. The electronic device feeds the generated first texture information and the first set of displacement maps as an input to the neural network and receives an output of the neural network for the fed input. Thereafter, the electronic device generates a second 3D mesh of the human face with a second face expression which is different from the first face expression based on the received output.
Type: Grant
Filed: January 22, 2021
Date of Patent: October 3, 2023
Assignee: SONY GROUP CORPORATION
Inventors: Hiroyuki Takeda, Mohammad Gharavi Alkhansari
-
Patent number: 11763813
Abstract: Implementations described herein relate to reducing latency in automated assistant interactions. In some implementations, a client device can receive audio data that captures a spoken utterance of a user. The audio data can be processed to determine an assistant command to be performed by an automated assistant. The assistant command can be processed, using a latency prediction model, to generate a predicted latency to fulfill the assistant command. Further, the client device (or the automated assistant) can determine, based on the predicted latency, whether to audibly render pre-cached content for presentation to the user prior to audibly rendering content that is responsive to the spoken utterance. The pre-cached content can be tailored to the assistant command and audibly rendered for presentation to the user while the content is being obtained, and the content can be audibly rendered for presentation to the user subsequent to the pre-cached content.
Type: Grant
Filed: April 28, 2021
Date of Patent: September 19, 2023
Assignee: GOOGLE LLC
Inventors: Lior Alon, Rafael Goldfarb, Dekel Auster, Dan Rasin, Michael Andrew Goodman, Trevor Strohman, Nino Tasca, Valerie Nygaard, Jaclyn Konzelmann
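A minimal sketch of the predicted-latency decision described above, assuming a latency prediction model has already produced a number of seconds; the pre-cached phrases, threshold, and `speak`/`fetch_content` helpers are hypothetical:

```python
PRECACHED = {"set_timer": "Sure, setting that up now.",
             "play_music": "Okay, one moment while I queue that."}

def speak(text: str) -> None:
    print(f"[TTS] {text}")

def respond(command: str, predicted_latency_s: float, fetch_content,
            latency_threshold_s: float = 1.0) -> None:
    """If fulfillment is predicted to be slow, render tailored pre-cached
    filler first, then the responsive content once it is obtained."""
    if predicted_latency_s > latency_threshold_s and command in PRECACHED:
        speak(PRECACHED[command])      # rendered while content is fetched
    speak(fetch_content(command))      # responsive content, rendered after

respond("set_timer", predicted_latency_s=2.3,
        fetch_content=lambda c: "Timer set for ten minutes.")
```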
-
Patent number: 11741141
Abstract: The present disclosure describes a communication environment having a service provider server that receives an audio command from a display control device within the communication environment. The service provider server can translate this audio command into an electrical command for controlling the display device. The service provider server autonomously performs a specifically tailored search of a catalog of command words and/or phrases for the audio command to translate the audio command to the electrical command. This specifically tailored search can include one or more searching routines having various degrees of complexity. The most simplistic searching routine from among these searching routines represents a textual search to identify one or more command words and/or phrases from the catalog of command words and/or phrases that match the audio command.
Type: Grant
Filed: December 4, 2020
Date of Patent: August 29, 2023
Assignee: CSC Holdings, LLC
Inventors: Jaison P. Antony, Heitor J. Almeida, John Markowski, Peter Caramanica
-
Patent number: 11727928
Abstract: Aspects of the disclosure provide a responding method and device, an electronic device and a storage medium. The method is applied to a first electronic device including an audio acquisition component and an audio output component. The method can include acquiring a voice signal through the audio acquisition component, determining whether to respond to the voice signal, and responsive to determining to respond to the voice signal, outputting a first sound signal by the audio output component, the first sound signal being configured to notify at least one second electronic device that the first electronic device responds to the voice signal. In such a manner, an electronic device, responsive to determining to respond to a voice signal, outputs a sound signal to prevent other electronic device(s) from responding to the voice signal, so that competitions between electronic devices are reduced and a user experience is improved.
Type: Grant
Filed: July 29, 2020
Date of Patent: August 15, 2023
Assignee: Beijing Xiaomi Pinecone Electronics Co., Ltd.
Inventors: Lingsong Zhou, Fei Xiang
-
Patent number: 11709655
Abstract: Disclosed is an electronic device. The electronic device comprises: a microphone comprising circuitry; a speaker comprising circuitry; and a processor electrically connected to the microphone and speaker, wherein the processor, when a first user's voice is input through the microphone, identifies a user who uttered the first user's voice and provides a first response sound, which is obtained by inputting the first user's voice to an artificial intelligence model learned through an artificial intelligence algorithm, through the speaker, and when a second user's voice is input through the microphone, identifies a user who uttered the second user's voice, and if the user who uttered the first user's voice is the same as the user who uttered the second user's voice, provides a second response sound, which is obtained by inputting the second user's voice and utterance history information to the artificial intelligence model, through the speaker.
Type: Grant
Filed: August 3, 2022
Date of Patent: July 25, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyungtak Choi, Hyeonmok Ko, Jihie Kim, Hongchul Kim, Inchul Hwang
-
Patent number: 11705125
Abstract: A processor may receive data regarding a context for a first dialog turn. The processor may monitor a voice input from a user for the first dialog turn. The processor may detect a first pause in the voice input, the first pause having a duration that satisfies a time threshold. The processor may receive, based on the first pause, first voice input data. The processor may analyze the first voice input data. The processor may determine that additional time is recommended for the voice input to be provided by the user.
Type: Grant
Filed: March 26, 2021
Date of Patent: July 18, 2023
Assignee: International Business Machines Corporation
Inventors: Andrew R. Freed, Corville O. Allen, Shikhar Kwatra, Joseph Kozhaya
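A minimal sketch of detecting a pause whose duration satisfies a time threshold, assuming per-frame voice-activity flags are available; the frame length and threshold are illustrative:

```python
def detect_qualifying_pause(voice_activity, time_threshold_s=1.5, frame_s=0.1):
    """Scan per-frame voice-activity flags and report the index where the
    first pause satisfying the time threshold began (frame_s s per frame)."""
    needed = round(time_threshold_s / frame_s)
    run = 0
    for i, active in enumerate(voice_activity):
        run = 0 if active else run + 1
        if run >= needed:
            return i - needed + 1   # frame where the qualifying pause began
    return None

# 1 = speech frame, 0 = silence frame; a 1.5 s pause spans 15 frames here.
flags = [1] * 20 + [0] * 16 + [1] * 5
print(detect_qualifying_pause(flags))  # 20
```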
-
Patent number: 11698261
Abstract: Embodiments of the present disclosure provide a method, apparatus, computer device, and storage medium for determining a POI alias. The method may include: acquiring a to-be-processed target POI, and generating a candidate alias list corresponding to the target POI based on a query behavior-associated log matching the target POI; and screening out at least one target alias corresponding to the target POI in the candidate alias list, according to an association relationship between each candidate alias in the candidate alias list and the target POI.
Type: Grant
Filed: December 11, 2019
Date of Patent: July 11, 2023
Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
Inventors: Yanyan Li, Jianguo Duan, Hui Xiong
-
Patent number: 11699443
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.
Type: Grant
Filed: June 2, 2021
Date of Patent: July 11, 2023
Assignee: GOOGLE LLC
Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar
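A minimal sketch of the two-threshold cascade the abstract describes; the scoring callables and threshold values are assumptions standing in for the on-device and server-side detectors:

```python
def hotword_cascade(audio, local_score, server_score,
                    first_threshold=0.5, second_threshold=0.9):
    """Two-stage gating: a permissive on-device check, then a stricter
    server-side check before the audio is treated as the key phrase."""
    if local_score(audio) < first_threshold:
        return False                      # cheap local rejection
    # Only now is the audio signal sent on for the restrictive check.
    return server_score(audio) >= second_threshold

print(hotword_cascade(b"...", lambda a: 0.62, lambda a: 0.94))  # True
```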
-
Patent number: 11694694
Abstract: A method is provided for identifying synthetic "deep-fake" audio samples versus organic audio samples. Methods may include: generating a model of a vocal tract using one or more organic audio samples from a user; identifying a set of bigram-feature pairs from the one or more audio samples; estimating the cross-sectional area of the vocal tract of the user when speaking the set of bigram-feature pairs; receiving a candidate audio sample; identifying bigram-feature pairs of the candidate audio sample that are in the set of bigram-feature pairs; calculating a cross-sectional area of a theoretical vocal tract of a user when speaking the identified bigram-feature pairs; and identifying the candidate audio sample as a deep-fake audio sample in response to the calculated cross-sectional area of the theoretical vocal tract of a user failing to correspond within a predetermined measure of the estimated cross-sectional area of the vocal tract of the user.
Type: Grant
Filed: July 27, 2021
Date of Patent: July 4, 2023
Assignee: UNIVERSITY OF FLORIDA RESEARCH FOUNDATION, INCORPORATED
Inventors: Patrick G. Traynor, Kevin Butler, Logan E. Blue, Luis Vargas, Kevin S. Warren, Hadi Abdullah, Cassidy Gibson, Jessica Nicole Odell
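A simplified sketch of the final comparison step only, assuming the cross-sectional areas per bigram-feature pair have already been estimated (for the user) and calculated (for the candidate); the relative-deviation measure and tolerance are assumptions, not the patent's metric:

```python
def is_deepfake(estimated_areas, calculated_areas, tolerance=0.15):
    """Flag a candidate sample when its theoretical vocal-tract
    cross-sections deviate too far from those learned for the user.
    Areas are keyed by bigram-feature pair (units assumed)."""
    shared = estimated_areas.keys() & calculated_areas.keys()
    if not shared:
        return True  # nothing to corroborate the speaker model
    deviations = [abs(calculated_areas[k] - estimated_areas[k]) / estimated_areas[k]
                  for k in shared]
    return max(deviations) > tolerance

organic = {("th", "f1"): 2.1, ("ae", "f2"): 3.4}
candidate = {("th", "f1"): 4.0, ("ae", "f2"): 3.3}
print(is_deepfake(organic, candidate))  # True: ("th", "f1") deviates ~90%
```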
-
Patent number: 11689472
Abstract: The exemplary embodiments disclose a method, a computer program product, and a computer system for allocating computing resources. The exemplary embodiments may include collecting data of one or more users, wherein the collected data comprises calendar data of the one or more users, extracting one or more features from the collected data, and allocating one or more computing resources to one or more of the users based on the extracted one or more features and one or more models.
Type: Grant
Filed: July 9, 2021
Date of Patent: June 27, 2023
Assignee: International Business Machines Corporation
Inventors: Venkata Vara Prasad Karri, Saraswathi Sailaja Perumalla, Sarbajit K. Rakshit, Ryan Jackson, Ram Kumar Vadlamani
-
Patent number: 11670290
Abstract: A speech signal processing method and apparatus is disclosed. The speech signal processing method includes receiving an input token that is based on a speech signal, calculating first probability values respectively corresponding to candidate output tokens based on the input token, adjusting at least one of the first probability values based on a priority of each of the first probability values, and processing the speech signal based on an adjusted probability value obtained by the adjusting.
Type: Grant
Filed: November 30, 2020
Date of Patent: June 6, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventor: Tae Gyoon Kang
-
Patent number: 11636846
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
Type: Grant
Filed: April 30, 2021
Date of Patent: April 25, 2023
Assignee: Google LLC
Inventors: Michael Buchanan, Pravir Kumar Gupta, Christopher Bo Tandiono
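A minimal sketch of the first-value/second-value comparison the abstract lays out, using simple whitespace tokenization and a majority rule as assumed stand-ins for the patent's matching and classification details:

```python
def classify_incomplete(transcription: str, text_samples: list[str]) -> bool:
    """Count exact matches (first value) vs. samples that continue past
    the transcription (second value); classify as likely incomplete when
    continuations dominate."""
    words = transcription.lower().split()
    first_value = second_value = 0
    for sample in text_samples:
        sample_words = sample.lower().split()
        if sample_words[:len(words)] == words:
            if len(sample_words) == len(words):
                first_value += 1    # matches with no additional terms
            else:
                second_value += 1   # matches but has additional terms
    return second_value > first_value

samples = ["what is", "what is the weather", "what is the time", "what is it"]
print(classify_incomplete("what is", samples))  # True: 3 continuations vs 1
```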
-
Patent number: 11600279
Abstract: A method to transcribe communications may include obtaining audio data originating at a first device during a communication session between the first device and a second device and providing the audio data to an automated speech recognition system configured to transcribe the audio data. The method may further include obtaining multiple hypothesis transcriptions generated by the automated speech recognition system. Each of the multiple hypothesis transcriptions may include one or more words determined by the automated speech recognition system to be a transcription of a portion of the audio data. The method may further include determining one or more consistent words that are included in two or more of the multiple hypothesis transcriptions and in response to determining the one or more consistent words, providing the one or more consistent words to the second device for presentation of the one or more consistent words by the second device.
Type: Grant
Filed: August 26, 2019
Date of Patent: March 7, 2023
Assignee: Sorenson IP Holdings, LLC
Inventors: Brian Chevrier, Shane Roylance, Kenneth Boehme
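A minimal sketch of extracting words consistent across hypothesis transcriptions; bag-of-words membership is an assumed simplification of however the patent aligns hypotheses:

```python
from collections import Counter

def consistent_words(hypotheses: list[str], min_hyps: int = 2) -> list[str]:
    """Return words that appear in at least min_hyps of the hypothesis
    transcriptions produced for the same stretch of audio."""
    counts = Counter()
    for hyp in hypotheses:
        counts.update(set(hyp.lower().split()))  # count each word once per hyp
    return [w for w, c in counts.items() if c >= min_hyps]

hyps = ["please call me back", "please call me beck", "lease call me back"]
print(sorted(consistent_words(hyps)))  # ['back', 'call', 'me', 'please']
```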
-
Patent number: 11594215
Abstract: Techniques for providing a contextual voice user interface that enables a user to query a speech processing system with respect to the decisions made to answer the user's command are described. The speech processing system may store speech processing pipeline data used to process a command. At some point after the system outputs content deemed responsive to the command, a user may speak an utterance corresponding to an inquiry with respect to the processing performed to respond to the command. For example, the user may state "why did you tell me that?" In response thereto, the speech processing system may determine the stored speech processing pipeline data used to respond to the command, and may generate output audio data that describes the data and computing decisions involved in determining the content deemed responsive to the command.
Type: Grant
Filed: October 11, 2019
Date of Patent: February 28, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Michael James Moniz, Abishek Ravi, Ryan Scott Aldrich, Michael Bennett Adams
-
Patent number: 11580967
Abstract: A speech feature extraction apparatus 100 includes a voice activity detection unit 103 that drops non-voice frames from frames corresponding to an input speech utterance and calculates a posterior of being voiced for each frame, a voice activity detection process unit 106 that calculates function values, used as weights in pooling frames to produce an utterance-level feature, from a given voice activity detection posterior, and an utterance-level feature extraction unit 112 that extracts an utterance-level feature from the frames on the basis of multiple frame-level features, using the function values.
Type: Grant
Filed: June 29, 2018
Date of Patent: February 14, 2023
Assignee: NEC CORPORATION
Inventors: Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka
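A minimal sketch of pooling frame-level features with VAD-posterior-derived weights; using the normalized posterior itself as the weighting function is an assumption, since the abstract leaves the exact function open:

```python
import numpy as np

def utterance_embedding(frame_features: np.ndarray,
                        vad_posteriors: np.ndarray) -> np.ndarray:
    """Pool frame-level features into one utterance-level vector, weighting
    each frame by a function of its voiced posterior (here the posterior
    itself, normalized to sum to one)."""
    weights = vad_posteriors / (vad_posteriors.sum() + 1e-8)
    return weights @ frame_features   # (T,) @ (T, D) -> (D,)

feats = np.random.randn(100, 40)             # 100 frames, 40-dim features
vad = np.random.uniform(0, 1, size=100)      # posterior of being voiced
print(utterance_embedding(feats, vad).shape)  # (40,)
```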
-
Patent number: 11461948
Abstract: A system and method for voice-driven animation of an object in an image by sampling an input video, depicting a puppet object, to obtain an image, receiving audio data, extracting voice related features from the audio data, producing an expression representation based on the voice related features, wherein the expression representation is related to a region of interest, obtaining from the image, auxiliary data related to the image and generating a target image based on the expression representation and the auxiliary data.
Type: Grant
Filed: July 15, 2021
Date of Patent: October 4, 2022
Assignee: DE-IDENTIFICATION LTD.
Inventors: Eliran Kuta, Sella Blondheim, Gil Perry, Amitay Nachmani, Matan Ben-Yosef, Or Gorodissky
-
Patent number: 11449720
Abstract: Provided is an image recognition device. The image recognition device includes a frame data change detector that sequentially receives a plurality of frame data and detects a difference between two consecutive frame data, an ensemble section controller that sets an ensemble section in the plurality of frame data, based on the detected difference, an image recognizer that sequentially identifies classes respectively corresponding to a plurality of section frame data by applying different neural network classifiers to the plurality of section frame data in the ensemble section, and a recognition result classifier that sequentially identifies ensemble classes respectively corresponding to the plurality of section frame data by combining the classes in the ensemble section.
Type: Grant
Filed: May 8, 2020
Date of Patent: September 20, 2022
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Ju-Yeob Kim, Byung Jo Kim, Seong Min Kim, Jin Kyu Kim, Ki Hyuk Park, Mi Young Lee, Joo Hyun Lee, Young-deuk Jeon, Min-Hyung Cho
-
Patent number: 11443749
Abstract: Disclosed is an electronic device. The electronic device comprises: a microphone comprising circuitry; a speaker comprising circuitry; and a processor electrically connected to the microphone and speaker, wherein the processor, when a first user's voice is input through the microphone, identifies a user who uttered the first user's voice and provides a first response sound, which is obtained by inputting the first user's voice to an artificial intelligence model learned through an artificial intelligence algorithm, through the speaker, and when a second user's voice is input through the microphone, identifies a user who uttered the second user's voice, and if the user who uttered the first user's voice is the same as the user who uttered the second user's voice, provides a second response sound, which is obtained by inputting the second user's voice and utterance history information to the artificial intelligence model, through the speaker.
Type: Grant
Filed: January 2, 2019
Date of Patent: September 13, 2022
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyungtak Choi, Hyeonmok Ko, Jihie Kim, Hongchul Kim, Inchul Hwang
-
Patent number: 11386712
Abstract: The present invention discloses a method and system for multimodal analysis based emotion recognition. The method comprises segmenting video data of a user into a plurality of video segments. A plurality of visual features, voice features, and text features is extracted from the plurality of video segments. Autocorrelation values among each of the plurality of visual features, the voice features, and the text features are determined. Each of the plurality of visual features, the voice features and the text features is aligned based on video segment identifier and the autocorrelation values to obtain a plurality of aligned multimodal features. One of two classes of emotions is determined for each of the plurality of aligned multimodal features. The determined emotion for each of the plurality of aligned multimodal features is compared with historic multimodal features from a database, and the emotion of the user is determined in real time based on the comparison.
Type: Grant
Filed: February 20, 2020
Date of Patent: July 12, 2022
Assignee: Wipro Limited
Inventors: Rahul Yadav, Gopichand Agnihotram
-
Patent number: 11373657
Abstract: A system for identifying audio data includes a feature extraction module receiving unknown input audio data and dividing the unknown input audio data into a plurality of segments of unknown input audio data. A similarity module receives the plurality of segments of the unknown input audio data and receives known audio data from a known source, the known audio data being divided into a plurality of segments of known audio data. The similarity module performs comparisons between the segments of unknown input audio data and respective segments of known audio data and generates a respective plurality of similarity values representative of similarity between the segments of the comparisons, the comparisons being performed serially. The similarity module terminates the comparisons if the similarity values indicate insufficient similarity between the segments of the comparisons, prior to completing comparisons for all segments of the unknown input audio data.
Type: Grant
Filed: May 1, 2020
Date of Patent: June 28, 2022
Assignee: Raytheon Applied Signal Technology, Inc.
Inventors: Jonathan C. Wintrode, Nicholas J. Hinnerschitz, Aleksandr R. Jouravlev
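A minimal sketch of serial segment comparison with early termination; the toy similarity function and the 0.6 floor are assumptions standing in for the module's actual audio similarity measure:

```python
def serial_match(unknown_segs, known_segs, similarity, min_sim=0.6):
    """Compare segment pairs one at a time and abort as soon as a pair
    falls below the similarity floor, skipping work on hopeless sources."""
    scores = []
    for u, k in zip(unknown_segs, known_segs):
        s = similarity(u, k)
        if s < min_sim:
            return None          # terminated early: insufficient similarity
        scores.append(s)
    return sum(scores) / len(scores)

# Toy similarity on equal-length float lists (stand-in for audio features).
sim = lambda a, b: 1.0 - min(1.0, sum(abs(x - y) for x, y in zip(a, b)) / len(a))
print(serial_match([[0.1, 0.2], [0.3, 0.1]],
                   [[0.1, 0.25], [0.9, 0.9]], sim))  # None: 2nd pair fails
```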
-
Publication number: 20150106089
Abstract: A computer-implemented method includes listening for audio name information indicative of a name of a computer, with the computer configured to listen for the audio name information in a first power mode that promotes a conservation of power; detecting the audio name information indicative of the name of the computer; after detection of the audio name information, switching to a second power mode that promotes a performance of speech recognition; receiving audio command information; and performing speech recognition on the audio command information.
Type: Application
Filed: December 30, 2010
Publication date: April 16, 2015
Inventors: Evan H. Parker, Michal R. Grabowski
-
Publication number: 20140129226
Abstract: Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can then be sent to a speech recognition server or third-party entity. Since the audio data and transcriptions are already processed, the statistics or acoustic model data are devoid of any human-readable or machine-readable information that would enable reconstruction of the audio data. Thus, such converted data sent to a server does not include personal or confidential information. Third-party servers can then continually update speech models without storing personal and confidential utterances of users.
Type: Application
Filed: November 5, 2012
Publication date: May 8, 2014
Inventors: Antonio R. Lee, Petr Novak, Peder A. Olsen, Vaibhava Goel
-
Publication number: 20140114646
Abstract: A system receives vocal input from one or more persons, and extracts one or more keywords from the vocal input. The system then generates a query using the one or more keywords, searches a database of products and services using the query, and identifies a product or service as a function of the query.
Type: Application
Filed: October 24, 2012
Publication date: April 24, 2014
Applicant: SAP AG
Inventor: Oleg Figlin
-
Publication number: 20140108010
Abstract: A voice-enabled document system facilitates execution of service delivery operations by eliminating the need for manual or visual interaction during information retrieval by an operator. Access to voice-enabled documents can facilitate operations for mobile vendors, on-site or field-service repairs, medical service providers, food service providers, and the like. Service providers can access the voice-enabled documents by using a client device to retrieve the document, display it on a screen, and, via voice commands, initiate playback of selected audio files containing information derived from text data objects selected from the document. Data structures that are components of a voice-enabled document include audio playback files and a logical association that links the audio playback files to user-selectable fields, and to a set of voice commands.
Type: Application
Filed: October 11, 2012
Publication date: April 17, 2014
Applicant: INTERMEC IP CORP.
Inventors: Paul Maltseff, Roger Byford, Jim Logan
-
Publication number: 20140067395
Abstract: A system and method are described for engaging an audience in a conversational advertisement. A conversational advertising system converses with an audience using spoken words. The conversational advertising system uses a speech recognition application to convert an audience's spoken input into text and a text-to-speech application to transform text of a response to speech that is to be played to the audience. The conversational advertising system follows an advertisement script to guide the audience in a conversation.
Type: Application
Filed: August 28, 2012
Publication date: March 6, 2014
Applicant: Nuance Communications, Inc.
Inventors: Sundar Balasubramanian, Michael McSherry, Aaron Sheedy
-
Publication number: 20140058732
Abstract: Techniques disclosed herein include systems and methods for managing user interface responses to user input including spoken queries and commands. This includes providing incremental user interface (UI) response based on multiple recognition results about user input that are received with different delays. Such techniques include providing an initial response to a user at an early time, before remote recognition results are available. Systems herein can respond incrementally by initiating an initial UI response based on first recognition results, and then modifying the initial UI response after receiving secondary recognition results. Because the initial response begins immediately instead of waiting for results from all recognizers, the user perceives less delay before complete results are rendered.
Type: Application
Filed: August 21, 2012
Publication date: February 27, 2014
Applicant: Nuance Communications, Inc.
Inventors: Martin Labsky, Tomas Macek, Ladislav Kunc, Jan Kleindienst
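A minimal sketch of the incremental-response idea, assuming exactly two recognizers (a fast local one and a slower remote one) deliver results on a queue; the sources, timings, and revision rule are illustrative:

```python
import threading
import queue

def incremental_ui(results: "queue.Queue[tuple[str, str]]") -> None:
    """Render whichever recognition result arrives first, then revise the
    display when the slower (e.g. remote) result comes in."""
    rendered = None
    for _ in range(2):                       # local result, then remote result
        source, text = results.get()
        if rendered is None:
            print(f"[initial UI from {source}] {text}")
        elif text != rendered:
            print(f"[revised UI from {source}] {text}")
        rendered = text

q: "queue.Queue[tuple[str, str]]" = queue.Queue()
threading.Timer(0.05, q.put, [("local", "call bob")]).start()
threading.Timer(0.30, q.put, [("remote", "call Bob Smith")]).start()
incremental_ui(q)
```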
-
Publication number: 20140025380
Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.
Type: Application
Filed: July 18, 2012
Publication date: January 23, 2014
Applicant: International Business Machines Corporation
Inventors: Fernando Luiz Koch, Julio Nogima
-
Publication number: 20130332167
Abstract: According to some aspects, a method of providing an interactive audio presentation, at least in part, by traversing a plurality of audio animations, each audio animation comprising a plurality of frames, each of the plurality of frames comprising a duration, at least one audio element, and at least one gate indicating criteria for transitioning to and identification of a subsequent frame and/or a subsequent animation is provided. The method comprises rendering a first audio animation, receiving input from the user associated with the presentation, selecting a second audio animation based, at least in part, on the input, and rendering the second audio animation. Some aspects include a system for performing the above method and some aspects include a computer readable medium storing instructions that perform the above method when executed by at least one processor.
Type: Application
Filed: June 12, 2012
Publication date: December 12, 2013
Applicant: Nuance Communications, Inc.
Inventor: Robert M. Kilgore
-
Publication number: 20130325474
Abstract: Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
Type: Application
Filed: May 31, 2012
Publication date: December 5, 2013
Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, Jr.
-
Publication number: 20130304471
Abstract: A method, an apparatus and an article of manufacture for contextual voice query dilation in a Spoken Web search. The method includes determining a context in which a voice query is created, generating a set of multiple voice query terms based on the context and information derived by a speech recognizer component pertaining to the voice query, and processing the set of query terms with at least one dilation operator to produce a dilated set of queries. A method for performing a search on a voice query is provided, including generating a set of multiple query terms based on information derived by a speech recognizer component processing a voice query, processing the set with multiple dilation operators to produce multiple dilated sub-sets of query terms, selecting at least one query term from each dilated sub-set to compose a query set, and performing a search on the query set.
Type: Application
Filed: May 14, 2012
Publication date: November 14, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nitendra Rajput, Kundan Shrivastava
-
Publication number: 20130289994
Abstract: Techniques disclosed herein include systems and methods that enable a voice trigger that wakes-up an electronic device or causes the device to make additional voice commands active, without manual initiation of voice command functionality. In addition, such a voice trigger is dynamically programmable or customizable. A speaker can program or designate a particular phrase as the voice trigger. In general, techniques herein execute a voice-activated wake-up system that operates on a digital signal processor (DSP) or other low-power, secondary processing unit of an electronic device instead of running on a central processing unit (CPU). A speech recognition manager runs two speech recognition systems on an electronic device. The CPU dynamically creates a compact speech system for the DSP. Such a compact system can be continuously run during a standby mode, without quickly exhausting a battery supply.
Type: Application
Filed: April 26, 2012
Publication date: October 31, 2013
Inventors: Michael Jack Newman, Robert Roth, William D. Alexander, Paul van Mulbregt
-
Publication number: 20130262116
Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
Type: Application
Filed: March 27, 2012
Publication date: October 3, 2013
Applicant: NOVOSPEECH
Inventor: Yossef Ben-Ezra
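A minimal sketch of the final search step, assuming the element sequences (e.g. phone strings from the extraction engine) are already decoded; restricting candidates to contiguous subsequences of the first sequence is an assumed simplification:

```python
def common_subsequence(sequences: list[list[str]], min_count: int):
    """Return the longest contiguous subsequence of the first sequence that
    occurs in at least min_count of the sequences (including the first)."""
    first = sequences[0]
    def contains(seq, sub):
        return any(seq[i:i + len(sub)] == sub
                   for i in range(len(seq) - len(sub) + 1))
    for length in range(len(first), 0, -1):          # prefer longer matches
        for start in range(len(first) - length + 1):
            sub = first[start:start + length]
            if sum(contains(s, sub) for s in sequences) >= min_count:
                return sub
    return None

# Decodings of the initial segment and two overlapping segments.
decodings = [["h", "eh", "l", "ow"], ["eh", "l", "ow"], ["h", "eh", "l", "uw"]]
print(common_subsequence(decodings, min_count=2))  # ['h', 'eh', 'l']
```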
-
Publication number: 20130262114
Abstract: Different advantageous embodiments provide a crowdsourcing method for modeling user intent in conversational interfaces. One or more stimuli are presented to a plurality of describers. One or more sets of describer data are captured from the plurality of describers using a data collection mechanism. The one or more sets of describer data are processed to generate one or more models. Each of the one or more models is associated with a specific stimulus from the one or more stimuli.
Type: Application
Filed: April 3, 2012
Publication date: October 3, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Christopher John Brockett, Piali Choudhury, William Brennan Dolan, Yun-Cheng Ju, Patrick Pantel, Noelle Mallory Sophy, Svitlana Volkova
-
Publication number: 20130226576
Abstract: Speech recognition processing captures phonemes of words in a spoken speech string and retrieves text of words corresponding to particular combinations of phonemes from a phoneme dictionary. A text-to-speech synthesizer then can produce and substitute a synthesized pronunciation of that word in the speech string. If the speech recognition processing fails to recognize a particular combination of phonemes of a word, as spoken, as may occur when a word is spoken with an accent or when the speaker has a speech impediment, the speaker is prompted to clarify the word by entry, as text, from a keyboard or the like for storage in the phoneme dictionary such that a synthesized pronunciation of the word can be played out when the initially unrecognized spoken word is again encountered in a speech string to improve intelligibility, particularly for conference calls.
Type: Application
Filed: February 23, 2012
Publication date: August 29, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Peeyush Jaiswal, Burt Leo Vialpando, Fang Wang
-
Publication number: 20130191128
Abstract: A continuous phonetic recognition method using a semi-Markov model, a system for processing the method, and a recording medium storing the method. In an embodiment of the phonetic recognition method of recognizing phones using a speech recognition system, a phonetic data recognition device receives speech, and a phonetic data processing device recognizes phones from the received speech using a semi-Markov model.
Type: Application
Filed: August 28, 2012
Publication date: July 25, 2013
Applicant: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventors: Chang Dong Yoo, Sung Woong Kim
-
Publication number: 20130179169
Abstract: A Chinese text readability assessing system analyzes and evaluates the readability of text data. A word segmentation module compares the text data with a corpus to obtain a plurality of word segments from the text data and provide part-of-speech settings corresponding to the word segments. A readability index analysis module analyzes the word segments and the part-of-speech settings based on readability indices to calculate index values of the readability indices in the text data. The index values are inputted to a readability mathematical model in a knowledge-evaluated training module, and the readability mathematical model produces a readability analysis result. Accordingly, the Chinese text readability assessing system of the present invention evaluates the readability of Chinese texts by word segmentation and the readability indices analysis in conjunction with the readability mathematical model.
Type: Application
Filed: July 5, 2012
Publication date: July 11, 2013
Applicant: NATIONAL TAIWAN NORMAL UNIVERSITY
Inventors: Yao-Ting Sung, Ju-Ling Chen
-
Publication number: 20130166302
Abstract: Aspects of customizing digital signage are addressed. For example, an audio feed may be analyzed for keywords occurring in potential customers' speech. These keywords are then employed to customize display screens of a digital display.
Type: Application
Filed: December 22, 2011
Publication date: June 27, 2013
Applicant: NCR Corporation
Inventor: Brennan Eul I. Mercado
-
Publication number: 20130159000
Abstract: The subject disclosure is directed towards training a classifier for spoken utterances without relying on human-assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations to some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels and updates a classification model associated with the classifier using the pseudo-semantic label.
Type: Application
Filed: December 15, 2011
Publication date: June 20, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Yun-Cheng Ju, James Garnet Droppo, III
-
Publication number: 20130138441
Abstract: Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule representing a phenomenon of pronunciation transduction between recognition units as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer and one or more weighted finite state transducers.
Type: Application
Filed: August 14, 2012
Publication date: May 30, 2013
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Seung Hi Kim, Dong Hyun Kim, Young Ik Kim, Jun Park, Hoon Young Cho, Sang Hun Kim
-
Publication number: 20130110492
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting the stability of speech recognition results. In one aspect, a method includes determining a length of time, or a number of occasions, in which a word has remained in an incremental speech recognizer's top hypothesis, and assigning a stability metric to the word based on the length of time or number of occasions.
Type: Application
Filed: May 1, 2012
Publication date: May 2, 2013
Applicant: GOOGLE INC.
Inventors: Ian C. McGraw, Alexander H. Gruenstein
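A minimal sketch of the time-based variant: track how long each word has survived in the top hypothesis and map that to a stability metric. The saturating linear mapping (full stability after 2 s) is an assumed choice, not the publication's:

```python
class StabilityTracker:
    """Assign each word a stability metric from how long it has remained in
    the incremental recognizer's top hypothesis (occasions work similarly)."""
    def __init__(self):
        self.first_seen: dict[tuple[int, str], float] = {}

    def update(self, top_hypothesis: list[str], now: float) -> dict[str, float]:
        current = {(i, w) for i, w in enumerate(top_hypothesis)}
        # Forget words that dropped out of the top hypothesis.
        self.first_seen = {k: t for k, t in self.first_seen.items()
                           if k in current}
        for key in current:
            self.first_seen.setdefault(key, now)
        # Stability grows with survival time, saturating at 1.0 after 2 s.
        return {w: min(1.0, (now - self.first_seen[(i, w)]) / 2.0)
                for i, w in current}

tracker = StabilityTracker()
tracker.update(["call", "bob"], now=0.0)
print(tracker.update(["call", "bob", "now"], now=1.0))
# {'call': 0.5, 'bob': 0.5, 'now': 0.0} (dict order may vary)
```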
-
Publication number: 20130103402
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
Type: Application
Filed: October 25, 2011
Publication date: April 25, 2013
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
-
Publication number: 20130096918
Abstract: A recognizing device includes a memory and a processor coupled to the memory. The memory stores words included in a sentence and positional information indicating a position of the words in the sentence. The processor executes a process including comparing an input voice signal with reading information of a character string that connects a plurality of words stored in the memory to calculate a similarity; calculating a connection score indicating a proximity between the plurality of connected words based on positional information of the words stored in the memory; and determining a character string corresponding to the voice signal based on the similarity and the connection score.
Type: Application
Filed: August 15, 2012
Publication date: April 18, 2013
Applicant: FUJITSU LIMITED
Inventor: Shouji Harada
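A minimal sketch of combining an acoustic similarity with a positional connection score; the reciprocal-gap proximity measure and the alpha-weighted combination are assumptions standing in for the device's actual scoring:

```python
def score_candidate(words, positions, acoustic_similarity, alpha=0.5):
    """Combine acoustic similarity with a connection score that rewards
    word pairs sitting close together in the stored sentence."""
    gaps = [abs(positions[b] - positions[a]) - 1
            for a, b in zip(words, words[1:])]
    connection = 1.0 / (1.0 + sum(gaps))   # proximity of the connected words
    return alpha * acoustic_similarity + (1 - alpha) * connection

# Stored sentence: "please send the report today" with word positions.
pos = {"please": 0, "send": 1, "the": 2, "report": 3, "today": 4}
print(score_candidate(["send", "report"], pos, acoustic_similarity=0.8))
# One intervening word between "send" and "report" -> connection 0.5
```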
-
Publication number: 20130085757
Abstract: An embodiment of an apparatus for speech recognition includes a plurality of trigger detection units, each of which is configured to detect a start trigger for recognizing a command utterance for controlling a device, a selection unit, utilizing a signal from one or more sensors embedded on the device, configured to select a selected trigger detection unit among the trigger detection units, the selected trigger detection unit being appropriate to a usage environment of the device, and a recognition unit configured to recognize the command utterance when the start trigger is detected by the selected trigger detection unit.
Type: Application
Filed: June 29, 2012
Publication date: April 4, 2013
Applicant: Kabushiki Kaisha Toshiba
Inventors: Masanobu Nakamura, Akinori Kawamura
-
Publication number: 20130066635
Abstract: An apparatus and a method for setting a remote control command for controlling a home network service in a portable terminal are provided. The apparatus includes a memory for storing configuration types of a remote control command in a set order in a home network service; and a controller for setting the remote control command including the input configuration types of the remote control command and transmitting the remote control command, when the configuration types of the remote control command are input in the set order in the home network service.
Type: Application
Filed: September 10, 2012
Publication date: March 14, 2013
Applicant: Samsung Electronics Co., Ltd.
Inventors: Jong-Seok Kim, Jin Park