Speech Recognition (epo) Patents (Class 704/E15.001)

  • Patent number: 11687908
    Abstract: A payment button on a device capable of making telephone calls, such as a mobile phone, allows a payer to electronically transfer money while in a phone call with a payee. The payment button also allows a payee to initiate an electronic payment transaction while in a phone call with a payer. The payment button may be a clickable or tappable virtual button presented on a display of the phone when being used to make or receive a call. The payer or the payee can simply enter a payment amount on the phone to complete an electronic payment transaction. A notification of payment is instantly transmitted to the phones being used for the phone call, so that the parties can safely and conveniently conclude a purchase and/or payment transaction during one phone call.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: June 27, 2023
    Assignee: PAYPAL, INC.
    Inventors: Saumil Ashvin Gandhi, Ray Hideki Tanaka
  • Patent number: 11683632
    Abstract: An automatic speech recognition (ASR) triggering system, and a method of providing an ASR trigger signal, is described. The ASR triggering system can include a microphone to generate an acoustic signal representing an acoustic vibration and an accelerometer worn in an ear canal of a user to generate a non-acoustic signal representing a bone conduction vibration. A processor of the ASR triggering system can receive an acoustic trigger signal based on the acoustic signal and a non-acoustic trigger signal based on the non-acoustic signal, and combine the trigger signals to gate an ASR trigger signal. For example, the ASR trigger signal may be provided to an ASR server only when the trigger signals are simultaneously asserted. Other embodiments are also described and claimed.
    Type: Grant
    Filed: August 17, 2021
    Date of Patent: June 20, 2023
    Assignee: Apple Inc.
    Inventors: Sorin V. Dusan, Aram M. Lindahl, Robert D. Watson
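A minimal sketch of the gating rule in patent 11683632 above, which provides the ASR trigger only when the acoustic and non-acoustic triggers are simultaneously asserted. The per-frame boolean representation and all names here are illustrative, not the patent's exact implementation.

```python
from dataclasses import dataclass

@dataclass
class TriggerFrame:
    acoustic_asserted: bool      # trigger derived from the microphone signal
    non_acoustic_asserted: bool  # trigger derived from the in-ear accelerometer

def gate_asr_trigger(frame: TriggerFrame) -> bool:
    # Forward the ASR trigger only when both trigger signals are asserted at
    # the same time; acoustic-only activity (e.g., a nearby talker) produces
    # no bone-conduction vibration and is suppressed.
    return frame.acoustic_asserted and frame.non_acoustic_asserted

frames = [TriggerFrame(True, False), TriggerFrame(True, True)]
print([gate_asr_trigger(f) for f in frames])  # [False, True]
```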
  • Patent number: 11615781
    Abstract: A single audio-visual automated speech recognition model for transcribing speech from audio-visual data includes an encoder frontend and a decoder. The encoder includes an attention mechanism configured to receive an audio track of the audio-visual data and a video portion of the audio-visual data. The video portion of the audio-visual data includes a plurality of video face tracks each associated with a face of a respective person. For each video face track of the plurality of video face tracks, the attention mechanism is configured to determine a confidence score indicating a likelihood that the face of the respective person associated with the video face track includes a speaking face of the audio track. The decoder is configured to process the audio track and the video face track of the plurality of video face tracks associated with the highest confidence score to determine a speech recognition result of the audio track.
    Type: Grant
    Filed: October 2, 2020
    Date of Patent: March 28, 2023
    Assignee: Google LLC
    Inventor: Otavio Braga
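A minimal sketch of the track-selection step in patent 11615781 above: the decoder consumes the face track with the highest attention confidence. The attention mechanism producing the scores is abstracted away, and all names are illustrative.

```python
def select_speaking_track(face_tracks, confidence_scores):
    """Return the face track most likely to be the speaking face."""
    best = max(range(len(face_tracks)), key=lambda i: confidence_scores[i])
    return face_tracks[best]

tracks = ["face_track_a", "face_track_b", "face_track_c"]
scores = [0.12, 0.81, 0.07]  # hypothetical per-track attention confidences
# The decoder would then process the audio track together with this track.
print(select_speaking_track(tracks, scores))  # face_track_b
```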
  • Patent number: 11610586
    Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic context vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.
    Type: Grant
    Filed: February 23, 2021
    Date of Patent: March 21, 2023
    Assignee: Google LLC
    Inventors: David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara Sainath, Ian Mcgraw
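A minimal sketch of the aggregation stage described in patent 11610586 above. The attention-based CEM that produces per-sub-word scores is out of scope here; taking each word's score from its final sub-word and averaging word scores into an utterance score are illustrative choices, not necessarily the patent's exact rules.

```python
def word_confidences(subword_scores, word_boundaries):
    """subword_scores: one confidence per hypothesized sub-word unit.
    word_boundaries: (start, end) sub-word index ranges, one per word."""
    # Illustrative rule: a word's confidence is its last sub-word's score.
    return [subword_scores[end - 1] for start, end in word_boundaries]

def utterance_confidence(word_scores):
    # Illustrative aggregation: arithmetic mean of word-level scores.
    return sum(word_scores) / len(word_scores)

scores = [0.95, 0.90, 0.40, 0.85]      # sub-words: "play" | "mu" "sic" | "now"
boundaries = [(0, 1), (1, 3), (3, 4)]  # word spans over the sub-word sequence
words = word_confidences(scores, boundaries)
print(words)                           # [0.95, 0.4, 0.85]
print(utterance_confidence(words))     # 0.733...
```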
  • Patent number: 11595535
    Abstract: An information processing apparatus is capable of reducing the time and effort needed to configure a smart speaker that cooperates with the information processing apparatus when a user starts to use the smart speaker. The information processing apparatus acquires identification information of the user, and acquires audio control information associated with the acquired identification information. Then, the information processing apparatus requests the smart speaker to change the audio setting of the smart speaker based on the acquired audio control information.
    Type: Grant
    Filed: June 10, 2021
    Date of Patent: February 28, 2023
    Assignee: CANON KABUSHIKI KAISHA
    Inventor: Ryosuke Kasahara
  • Patent number: 11574638
    Abstract: A system and method are disclosed for generating a teleconference space for two or more communication devices using a computer coupled with a database and comprising a processor and memory. The computer generates a teleconference space and transmits requests to join the teleconference space to the two or more communication devices. The computer stores in memory identification information, and audiovisual data associated with one or more users, for each of the two or more communication devices. The computer stores audio transcription data, transmitted to the computer by each of the two or more communication devices and associated with one or more communication device users, in the computer memory. The computer merges the audio transcription data from each of the two or more communication devices into a master audio transcript, and transmits the master audio transcript to each of the two or more communication devices.
    Type: Grant
    Filed: May 9, 2022
    Date of Patent: February 7, 2023
    Assignee: Nextiva, Inc.
    Inventors: Tomas Gorny, Jean-Baptiste Martinoli, Tracy Conrad, Lukas Gorny
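A minimal sketch of the merging step in patent 11574638 above: per-device transcription fragments are combined into one master transcript ordered by time. The (start_time, speaker, text) segment format is an assumption for illustration.

```python
import heapq

def merge_transcripts(per_device_segments):
    """per_device_segments: one list per communication device, each a
    time-sorted list of (start_time_seconds, speaker_id, text) tuples."""
    merged = heapq.merge(*per_device_segments, key=lambda seg: seg[0])
    return [f"[{t:6.2f}] {speaker}: {text}" for t, speaker, text in merged]

device_a = [(0.0, "alice", "Hello everyone."), (6.2, "alice", "Let's begin.")]
device_b = [(2.8, "bob", "Hi Alice.")]
# The master transcript would then be transmitted back to every device.
print("\n".join(merge_transcripts([device_a, device_b])))
```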
  • Patent number: 11562573
    Abstract: Aspects of the disclosure relate to training and using a phrase recognition model to identify phrases in images. As an example, a selected phrase list that includes a plurality of phrases is received. Each phrase of the plurality of phrases includes text. An initial plurality of images may be received. A training image set may be selected from the initial plurality of images by identifying the phrase-containing images that include one or more phrases from the selected phrase list. Each given phrase-containing image of the training image set may be labeled with information identifying the one or more phrases from the selected phrase list included in the given phrase-containing image. The model may be trained based on the training image set such that the model is configured to, in response to receiving an input image, output data indicating whether a phrase of the plurality of phrases is included in the input image.
    Type: Grant
    Filed: December 16, 2020
    Date of Patent: January 24, 2023
    Assignee: Waymo LLC
    Inventors: Victoria Dean, Abhijit S Ogale, Henrik Kretzschmar, David Harrison Silver, Carl Kershaw, Pankaj Chaudhari, Chen Wu, Congcong Li
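A minimal sketch of assembling the labeled training set described in patent 11562573 above. The ground-truth mapping from image to visible phrases is a stand-in for whatever annotation source the real pipeline uses.

```python
def build_training_set(image_ids, image_phrases, selected_phrases):
    """image_phrases: image id -> phrases visible in that image (assumed
    given); selected_phrases: the curated phrase list. Returns labeled
    (image id, phrases) pairs for the phrase-containing images only."""
    training_set = []
    for image_id in image_ids:
        labels = [p for p in image_phrases.get(image_id, [])
                  if p in selected_phrases]
        if labels:
            training_set.append((image_id, labels))
    return training_set

phrase_list = {"stop", "yield", "road work ahead"}
annotations = {"img1": ["stop"], "img2": ["garage sale"],
               "img3": ["yield", "stop"]}
print(build_training_set(["img1", "img2", "img3"], annotations, phrase_list))
# [('img1', ['stop']), ('img3', ['yield', 'stop'])] -- img2 is excluded
```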
  • Patent number: 11514787
    Abstract: In an information processing device, a first acquirer acquires, from a user, plan information including a scheduled time and a destination. A second acquirer acquires a spare time. A third acquirer acquires travelling schedule information for enabling arrival at the destination earlier than the scheduled time by the spare time or more. A display controller displays, on a display unit, information regarding the travelling schedule information and the spare time.
    Type: Grant
    Filed: August 1, 2019
    Date of Patent: November 29, 2022
    Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventors: Koichi Suzuki, Makoto Akahane
  • Patent number: 11495234
    Abstract: A data mining device, and a speech recognition method and system using the same, are disclosed. The speech recognition method includes selecting speech data that includes a dialect from collected speech data, analyzing and refining the selected speech data, and learning an acoustic model and a language model through an artificial intelligence (AI) algorithm using the refined dialect speech data. The user is able to use a dialect speech recognition service that is improved using 5G mobile communication services such as eMBB, URLLC, or mMTC.
    Type: Grant
    Filed: May 30, 2019
    Date of Patent: November 8, 2022
    Assignee: LG Electronics Inc.
    Inventors: Jee Hye Lee, Seon Yeong Park
  • Patent number: 11488598
    Abstract: The present disclosure relates to a display device. The display device includes a display; a signal receiver configured to receive a user's voice signal through at least one of a plurality of devices; and a processor configured to: display an image of at least one of a plurality of programs on the display by executing the plurality of programs, identify a program corresponding to a device receiving the voice signal among the plurality of programs based on matching information set by the user regarding a mutual correspondence between the plurality of programs and the plurality of devices, in response to the user's voice signal received through any one of the plurality of devices, and control the identified program to operate according to a user command corresponding to the received voice signal. Thereby, it is possible to control the target program according to the user's intention via a voice command, even if the user who inputs the voice command does not separately designate the control target program.
    Type: Grant
    Filed: January 3, 2019
    Date of Patent: November 1, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Youngsoo Yun
  • Patent number: 11482229
    Abstract: A multimedia processing circuit is provided. The multimedia processing circuit includes a smart interpreter engine and an audio engine. The smart interpreter engine includes a speech to text converter, a natural language processing module and a translator. The speech to text converter is utilized for converting speech data into text data corresponding to the first language. The natural language processing module is utilized for converting the text data corresponding to the first language into glossary text data corresponding to the first language according to an application program being executed in a host. The application program comprises a specific game software. The translator is utilized for converting the glossary text data corresponding to the first language into text data corresponding to a second language. The audio engine is utilized for converting the speech data corresponding to the first language into an analog speech signal corresponding to the first language.
    Type: Grant
    Filed: May 26, 2020
    Date of Patent: October 25, 2022
    Assignee: ACER INCORPORATED
    Inventors: Gianna Tseng, Shih-Cheng Huang, Shang-Yao Lin, Szu-Ting Chou
  • Patent number: 11481036
    Abstract: A method for determining an electronic device, a system for determining an electronic device, a computer system, and a computer-readable storage medium, the method includes: acquiring a recognition result by recognizing a first action performed by an operating object through a first electronic device (S201); and determining a second electronic device which is controllable by the first electronic device according to the recognition result (S202).
    Type: Grant
    Filed: April 12, 2019
    Date of Patent: October 25, 2022
    Assignees: Beijing JingDong ShangKe Information Technology Co., Ltd., Beijing Jingdong Century Trading Co., Ltd.
    Inventors: Yazhuo Wang, Yu Guan, Zhongfei Xu
  • Patent number: 11460979
    Abstract: According to an embodiment disclosed in the specification, a display device may include a microphone, a display displaying a screen including a plurality of layers, a memory storing a plurality of application programs, and at least one processor displaying a first user interface (UI) for interacting with a user on a first layer among the plurality of layers, displaying a second UI for displaying information obtained by performing the interaction on a second layer among the plurality of layers, and displaying an image at least partly overlapping with the first UI and the second UI on a third layer among the plurality of layers.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: October 4, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jibum Moon, Jina Kwon, Kyerim Lee
  • Patent number: 11450312
    Abstract: A speech recognition method includes: obtaining speech information; and determining beginning and ending positions of a candidate speech segment in the speech information by using a weighted finite state transducer (WFST) network. The candidate speech segment is identified as corresponding to a preset keyword. The method also includes clipping the candidate speech segment from the speech information according to the beginning and ending positions of the candidate speech segment; detecting whether the candidate speech segment includes a preset keyword by using a machine learning model; and determining, upon determining that the candidate speech segment comprises the preset keyword, that the speech information comprises the preset keyword.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: September 20, 2022
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Shilun Lin, Xilin Zhang, Wenhua Ma, Bo Liu, Xinhui Li, Li Lu, Xiucai Jiang
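A minimal sketch of the two-stage flow in patent 11450312 above: a first pass proposes beginning and ending positions for a candidate keyword segment, the segment is clipped, and a second-stage model confirms the keyword. The stub proposal and verifier stand in for the WFST network and the machine learning model.

```python
def keyword_present(samples, propose_candidate, verify):
    candidate = propose_candidate(samples)      # WFST-style first pass (stub)
    if candidate is None:
        return False
    begin, end = candidate
    segment = samples[begin:end]                # clip the candidate segment
    return verify(segment)                      # model-based second pass (stub)

propose = lambda s: (4, 9) if len(s) > 9 else None
verify = lambda seg: sum(seg) / len(seg) > 0.5  # toy keyword classifier
audio = [0.1] * 4 + [0.9] * 5 + [0.1] * 3       # toy frame-level scores
print(keyword_present(audio, propose, verify))  # True
```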
  • Patent number: 11404064
    Abstract: An information processing apparatus includes a first detector, a textualization device, a second detector, a display device and a display controller. The first detector detects, from audio data in which speech of each person in a group composed of a plurality of persons has been recorded, each utterance made during the speech. The textualization device converts contents of each utterance detected by the first detector into text. The second detector detects predetermined keywords included in each utterance on the basis of text data obtained through textualization by the textualization device. The display controller causes the display device to display the predetermined keywords detected by the second detector.
    Type: Grant
    Filed: November 2, 2018
    Date of Patent: August 2, 2022
    Assignee: KYOCERA Document Solutions Inc.
    Inventors: Yuki Kobayashi, Nami Nishimura, Tomoko Mano
  • Patent number: 11403875
    Abstract: A processing method of face recognition includes steps of: extracting embedding feature information from a face image; outputting a recognition result of face recognition according to the embedding feature information, wherein the recognition result includes a recognized name and embedding feature distance information; determining whether the recognized name is in a list or not; if the recognized name is in the list, performing a removal checking step for determining whether to remove the recognition result based on the embedding feature distance information; if determining that the recognition result is not to be removed, displaying the recognized name; if determining that the recognition result is to be removed, displaying a negative prompt; and dynamically and instantly providing feedback and updating a recognition method for the face recognition.
    Type: Grant
    Filed: November 20, 2020
    Date of Patent: August 2, 2022
    Assignee: ASKEY COMPUTER CORP.
    Inventors: Chien-Fang Chen, Setya Widyawan Prakosa, Huan-Ruei Shiu, Chien-Ming Lee
  • Patent number: 11393490
    Abstract: According to embodiments of the present disclosure, a method, apparatus, device, and computer readable storage medium for voice interaction are provided. The method includes: determining a text corresponding to a received voice signal based on a voice feature of the voice signal. The method further includes: determining, based on the voice feature and the text, a matching degree between a reference voice feature of an element in the text and a target voice feature of the element. The method further includes: determining a first possibility that the voice signal is an executable command based on the text. The method further includes: determining a second possibility that the voice signal is the executable command based on the voice feature.
    Type: Grant
    Filed: June 8, 2020
    Date of Patent: July 19, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Zhijian Wang, Jinfeng Bai, Sheng Qian, Lei Jia
  • Patent number: 11364926
    Abstract: The invention relates to a method for operating a motor vehicle system of a motor vehicle regardless of the driving situation. The method is performed by a personalization device of the motor vehicle and includes identifying the driver of the motor vehicle and using the identity of the driver to determine multiple driver-specific configuration data sets. Each of the determined configuration data sets describes configuration data of a respective user profile of the identified driver in order to personalize the motor vehicle system. The method further includes determining at least one additional occupant in the motor vehicle and, using the result of that determination, determining an intention of the identified driver. The method further includes using the determined intention to select a personalization mode from a plurality of personalization modes.
    Type: Grant
    Filed: April 4, 2019
    Date of Patent: June 21, 2022
    Assignee: AUDI AG
    Inventors: Jürgen Lerzer, Nikoletta Sofra, Hans Georg Gruber, André Ebner, Ron Melz
  • Publication number: 20150127345
    Abstract: A computer-implemented method includes listening for audio name information indicative of a name of a computer, with the computer configured to listen for the audio name information in a first power mode that promotes a conservation of power; detecting the audio name information indicative of the name of the computer; after detection of the audio name information, switching to a second power mode that promotes a performance of speech recognition; receiving audio command information; and performing speech recognition on the audio command information.
    Type: Application
    Filed: September 30, 2011
    Publication date: May 7, 2015
    Inventors: Evan H. Parker, Michal R. Grabowski
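A minimal sketch of the two-power-mode flow in publication 20150127345 above: listen for the computer's name cheaply, then switch to a mode suited to full speech recognition. The string-matching detector and the recognizer stub are illustrative.

```python
LOW_POWER, HIGH_POWER = "low_power", "high_power"

def run(audio_frames, name_detector, recognizer):
    mode = LOW_POWER
    for frame in audio_frames:
        if mode == LOW_POWER:
            if name_detector(frame):    # cheap name spotting only
                mode = HIGH_POWER       # switch after the name is detected
        else:
            return recognizer(frame)    # full recognition on the command
    return None

frames = ["background noise", "computer", "turn on the lights"]
print(run(frames, lambda f: f == "computer", lambda f: f.upper()))
# TURN ON THE LIGHTS
```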
  • Patent number: 8942980
    Abstract: A method of navigating in a sound content wherein at least one key word is stored in association with at least two positions representative of said key word in the sound content, and wherein the method comprises: a step of displaying a representation of the sound content; during playback of the sound content, a step of detecting a current extract representative of a key word stored at a first position; a step of determining at least one second extract representative of said key word and a second position as a function of the stored positions; and a step of highlighting the position of the extracts in the representation of the sound content. The invention also relates to a system adapted to implement the navigation method.
    Type: Grant
    Filed: February 11, 2011
    Date of Patent: January 27, 2015
    Assignee: Orange
    Inventors: Pascal Le Mer, Delphine Charlet, Marc Denjean, Antoine Gonot
  • Patent number: 8930576
    Abstract: The present invention is directed to a secure communication network that enables multi-point to multi-point proxy communication over the network. The network employs a smart server that establishes a secure communication link with each of a plurality of smart client devices deployed on local client networks. Each smart client device is in communication with a plurality of agent devices. A plurality of remote devices can access the smart server directly and communicate with an agent device via the secure communication link between the smart server and one of the smart client devices.
    Type: Grant
    Filed: July 11, 2014
    Date of Patent: January 6, 2015
    Assignee: KE2 Therm Solutions, Inc.
    Inventors: Steve Roberts, Cetin Sert
  • Publication number: 20140372119
    Abstract: In general, the subject matter described in this specification can be embodied in methods, systems, and program products for performing compounded text segmentation. Compounded text that is extracted from one or more search queries submitted to a search engine is received. The compounded text includes a plurality of individual words that are joined together without intervening spaces. An electronic dictionary including words is accessed. A data structure representing possible segmentations of the compounded text is generated based on whether words in the possible segmentations occur in the electronic dictionary. A data store comprising data associated with a same field of usage as the compounded text is accessed to determine a frequency of occurrence for possible segmentations of the data structure. A segmentation of the compounded text that is most probable based on the data is determined. A language model is trained using the determined segmentation of the compounded text.
    Type: Application
    Filed: September 28, 2009
    Publication date: December 18, 2014
    Inventors: Carolina Parada, Boulos Harb, Johan Schalkwyk
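A minimal sketch of the segmentation idea in publication 20140372119 above: dictionary lookups constrain the possible splits, and a frequency table (standing in for the field-of-usage data store) picks the most probable one via dynamic programming. The additive scoring is an illustrative simplification.

```python
def best_segmentation(text, freq):
    """Return the highest-scoring split of space-free text into dictionary
    words; freq doubles as the dictionary and the usage-frequency score."""
    best = {0: (0.0, [])}  # best[i] = (score, words) covering text[:i]
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - 20), i):
            word = text[j:i]
            if word in freq and j in best:
                score = best[j][0] + freq[word]
                if i not in best or score > best[i][0]:
                    best[i] = (score, best[j][1] + [word])
    return best.get(len(text), (float("-inf"), None))[1]

freq = {"ice": 2.0, "cream": 2.5, "icecream": 1.0, "ic": 0.1, "ecream": 0.1}
print(best_segmentation("icecream", freq))  # ['ice', 'cream']
```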
  • Patent number: 8850072
    Abstract: The present invention is directed to a secure communication network that enables multi-point to multi-point proxy communication over the network. The network employs a smart server that establishes a secure communication link with each of a plurality of smart client devices installed on local client networks. Each smart client device is in communication with a plurality of agent devices. A plurality of remote devices can access the smart server directly and communicate with agent devices via the secure communication link between the smart server and one of the smart client devices. This communication is enabled without complex configuration of firewall or network parameters by the user.
    Type: Grant
    Filed: July 25, 2013
    Date of Patent: September 30, 2014
    Assignee: KE2 Therm Solutions, Inc.
    Inventors: Steve Roberts, Cetin Sert
  • Publication number: 20140249813
    Abstract: A transcript interface for displaying a plurality of words of a transcript in a text editor can be provided and configured to receive a command to edit the transcript. Limited edits to data corresponding to the transcript can be made in response to commands received via the user interface module. For example, edits may be limited to selection of a single word in the text editor for editing via a given command. The edit may affect an adjacent word in some instances, such as when two adjacent words are merged. In some embodiments, data corresponding to the selected word of the transcript is changed to reflect the edit without changing data defining the relative timing of those words of the transcript that are not adjacent to the selected word.
    Type: Application
    Filed: December 1, 2008
    Publication date: September 4, 2014
    Applicant: Adobe Systems Incorporated
    Inventor: Steven Hoeg
  • Patent number: 8731609
    Abstract: A mobile device, such as a cellular telephone, includes a voice interface that includes one part that may not be specific to a particular carrier, and a second part that provides an interface to services that are specific to a carrier or to service or information providers that are not necessarily available with all carriers. A voice command interface provides easy access to the carrier services. The set of carrier services is optionally extendible by the carrier.
    Type: Grant
    Filed: August 9, 2011
    Date of Patent: May 20, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Daniel L. Roth, Chris Reiner, Mark Furnari, Jordan Cohen
  • Publication number: 20140129218
    Abstract: Computer-based speech recognition can be improved by recognizing words with an accurate accent model. To support a large number of possible accents while still providing real-time speech recognition, one embodiment provides a language tree data structure of possible accents so that a computerized speech recognition system can benefit from choosing among accent categories when searching for an appropriate accent model for speech recognition.
    Type: Application
    Filed: November 6, 2012
    Publication date: May 8, 2014
    Applicant: Spansion LLC
    Inventors: Chen Liu, Richard Fastow
  • Publication number: 20140129217
    Abstract: Embodiments of the present invention include an apparatus, method, and system for calculating senone scores for multiple concurrent input speech streams. The method can include the following: receiving one or more feature vectors from one or more input streams; accessing the acoustic model one senone at a time; and calculating separate senone scores corresponding to each incoming feature vector. The calculation uses a single read access to the acoustic model for a single senone and calculates a set of separate senone scores for the one or more feature vectors, before proceeding to the next senone in the acoustic model.
    Type: Application
    Filed: November 6, 2012
    Publication date: May 8, 2014
    Inventor: Ojas A. BAPAT
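A minimal sketch of the access pattern in publication 20140129217 above: the acoustic model is walked one senone at a time, and every concurrent stream's feature vector is scored against that senone before moving on, so each senone needs a single read per batch of frames. The diagonal-Gaussian scorer is an illustrative placeholder.

```python
import math

def score_all_streams(acoustic_model, feature_vectors):
    """acoustic_model: senone id -> (means, variances) per dimension.
    feature_vectors: one current feature vector per concurrent stream."""
    scores = {sid: [0.0] * len(feature_vectors) for sid in acoustic_model}
    for senone_id, (means, variances) in acoustic_model.items():  # one read
        for stream, vec in enumerate(feature_vectors):            # all streams
            scores[senone_id][stream] = -sum(
                (x - m) ** 2 / (2 * v) + 0.5 * math.log(2 * math.pi * v)
                for x, m, v in zip(vec, means, variances))
    return scores

model = {"s1": ([0.0, 0.0], [1.0, 1.0]), "s2": ([1.0, 1.0], [1.0, 1.0])}
print(score_all_streams(model, [[0.1, -0.2], [0.9, 1.1]]))
```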
  • Publication number: 20140122086
    Abstract: Embodiments related to the use of depth imaging to augment speech recognition are disclosed. For example, one disclosed embodiment provides, on a computing device, a method including receiving depth information of a physical space from a depth camera, receiving audio information from one or more microphones, identifying a set of one or more possible spoken words from the audio information, determining a speech input for the computing device based upon comparing the set of one or more possible spoken words from the audio information and the depth information, and taking an action on the computing device based upon the speech input determined.
    Type: Application
    Filed: October 26, 2012
    Publication date: May 1, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Jay Kapur, Ivan Tashev, Mike Seltzer, Stephen Edward Hodges
  • Publication number: 20140100848
    Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
    Type: Application
    Filed: October 5, 2012
    Publication date: April 10, 2014
    Applicant: AVAYA INC.
    Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula
  • Patent number: 8693977
    Abstract: Techniques for achieving personal security via mobile devices are presented. A portable mobile communication device, such as a phone or a personal digital assistant (PDA), is equipped with geographic positioning capabilities and is equipped with audio and visual devices. A panic mode of operation can be automatically detected in which real time audio and video for an environment surrounding the portable communication device are captured along with a geographic location for the portable communication device. This information is streamed over the Internet to a secure site where it can be viewed in real time and/or later inspected.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: April 8, 2014
    Assignee: Novell, Inc.
    Inventors: Sandeep Patnaik, Saheednanda Singh, AnilKumar Bolleni
  • Publication number: 20140074472
    Abstract: A voice control system is adapted for controlling an electrical appliance, and includes a host and a portable voice control device. The portable voice control device is capable of wireless communication with the host, and includes an audio pick-up unit for receiving a voice input. One of the host and the portable voice control device includes a voice recognition control module that is configured to recognize a control command from the voice input. The host controls operation of the electrical appliance according to the control command, and transmits an appliance status message to the portable voice control device. The portable voice control device further includes an output unit for outputting the appliance status message.
    Type: Application
    Filed: September 12, 2012
    Publication date: March 13, 2014
    Inventors: Chih-Hung Lin, Teh-Jang Chen
  • Publication number: 20140074468
    Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
    Type: Application
    Filed: September 7, 2012
    Publication date: March 13, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
  • Publication number: 20140067391
    Abstract: A system and method are presented for predicting speech recognition performance using accuracy scores in speech recognition systems within the speech analytics field. A keyword set is selected. Figure of Merit (FOM) is computed for the keyword set. Relevant features that describe the word individually and in relation to other words in the language are computed. A mapping from these features to FOM is learned. This mapping can be generalized via a suitable machine learning algorithm and used to predict FOM for a new keyword. In at least one embodiment, the predicted FOM may be used to adjust internals of the speech recognition engine to achieve a consistent behavior for all inputs for various settings of confidence values.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Applicant: INTERACTIVE INTELLIGENCE, INC.
    Inventors: Aravind Ganapathiraju, Yingyi Tan, Felix Immanuel Wyss, Scott Allen Randal
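A minimal sketch of the mapping step in publication 20140067391 above, assuming scikit-learn is available. The per-keyword features, the training FOM values, and the linear-regression choice are all illustrative; the publication only requires some learned mapping from features to FOM.

```python
from sklearn.linear_model import LinearRegression

def keyword_features(word, unigram_freq):
    # Toy per-word features: length, vowel count, corpus frequency.
    vowels = sum(ch in "aeiou" for ch in word)
    return [len(word), vowels, unigram_freq.get(word, 0.0)]

freq = {"agent": 3e-5, "escalate": 8e-6, "refund": 1.2e-5, "cancel": 2.1e-5}
train_words = ["agent", "escalate", "refund"]
train_fom = [0.82, 0.91, 0.77]  # hypothetical measured FOM per keyword

model = LinearRegression()
model.fit([keyword_features(w, freq) for w in train_words], train_fom)
print(model.predict([keyword_features("cancel", freq)]))  # predicted FOM
```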
  • Publication number: 20140067392
    Abstract: A method of providing hands-free services using a mobile device having wireless access to computer-based services includes receiving speech in a vehicle from a vehicle occupant; recording the speech using a mobile device; transmitting the recorded speech from the mobile device to a cloud speech service; receiving automatic speech recognition (ASR) results from the cloud speech service at the mobile device; and comparing the recorded speech with the received ASR results at the mobile device to identify one or more error conditions.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Denis R. Burke, Danilo Gurovich, Daniel E. Rudman, Keith A. Fry, Shane M. McCutchen, Marco T. Carnevale, Mukesh Gupta
  • Patent number: 8655660
    Abstract: The present invention is a system and method for generating a personal voice font, including monitoring voice segments automatically from a user's phone conversations by a voice learning processor to generate a personalized voice font, and delivering the personalized voice font (PVF) to a server.
    Type: Grant
    Filed: February 10, 2009
    Date of Patent: February 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Zsolt Szalai, Philippe Bazot, Bernard Pucci, Joel Vitale
  • Publication number: 20140039881
    Abstract: The instant application includes computationally-implemented systems and methods that include managing adaptation data, where the adaptation data is at least partly based on at least one speech interaction of a particular party; facilitating transmission of the adaptation data to a target device when there is an indication of a speech-facilitated transaction between the target device and the particular party, such that the adaptation data is to be applied to the target device to assist in execution of the speech-facilitated transaction; and facilitating acquisition of adaptation result data that is based on at least one aspect of the speech-facilitated transaction and is to be used in determining whether to modify the adaptation data. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: August 1, 2012
    Publication date: February 6, 2014
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud
  • Publication number: 20140039885
    Abstract: Methods and apparatus for voice-enabling a web application, wherein the web application includes one or more web pages rendered by a web browser on a computer. At least one information source external to the web application is queried to determine whether information describing a set of one or more supported voice interactions for the web application is available, and in response to determining that the information is available, the information is retrieved from the at least one information source. Voice input for the web application is then enabled based on the retrieved information.
    Type: Application
    Filed: August 2, 2012
    Publication date: February 6, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: David E. Reich, Christopher Hardy
  • Publication number: 20140039891
    Abstract: Systems and methods for audio editing are provided. In one implementation, a computer-implemented method is provided. The method includes receiving digital audio data including a plurality of distinct vocal components. Each distinct vocal component is automatically identified using one or more attributes that uniquely identify each distinct vocal component. The audio data is separated into two or more individual tracks where each individual track comprises audio data corresponding to one distinct vocal component. The separated individual tracks are then made available for further processing.
    Type: Application
    Filed: October 16, 2007
    Publication date: February 6, 2014
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventors: Nariman Sodeifi, David E. Johnston
  • Publication number: 20140029733
    Abstract: A speech server and methods provide audio stream analysis for tone detection in addition to speech recognition to implement an accurate and efficient answering machine detection strategy. By performing both tone detection and speech recognition in a single component, such as the speech server, the number of components for digital signal processing may be reduced. The speech server communicates tone events detected at the telephony level and enables voice applications to detect tone events consistently and provide consistent support and accuracy of both inbound and outbound voice applications independent of the hardware or geographical location of the telephony network. In addition, an improved opportunity for signaling of an appropriate moment for an application to leave a message is provided, thereby supporting automation.
    Type: Application
    Filed: July 26, 2012
    Publication date: January 30, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Kenneth W.D. Smith, Jaques de Broin
  • Publication number: 20140025377
    Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.
    Type: Application
    Filed: August 10, 2012
    Publication date: January 23, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Fernando Luiz Koch, Julio Nogima
  • Publication number: 20140012579
    Abstract: In some embodiments, recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically-meaningful discrepancies. The evaluation of the recognition results may be carried out using any suitable criteria, including one or more criteria that differ from criteria used by an ASR system in determining the top recognition result and the alternative recognition results from the speech input. In some embodiments, a recognition result may additionally or alternatively be processed to determine whether the recognition result includes a word or phrase that is unlikely to appear in a domain to which speech input relates.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
  • Publication number: 20140012582
    Abstract: In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recognition result to determine whether the recognition result includes such an indication. In some embodiments, a word/phrase of a set that appears in the result is iteratively replaced with each of the other words/phrases of the set. The result of the replacement may be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in a language and/or domain. The likelihood may then be evaluated to determine whether the result of the replacement is sufficiently likely for an alert to be triggered.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
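A minimal sketch of the replace-and-rescore loop in publication 20140012582 above. The confusable set, the margin, and the toy language-model scorer are illustrative; the point is that a clearly more likely sentence after a swap triggers an alert.

```python
def flag_possible_misrecognition(words, confusable_set, lm_score, margin=1.0):
    """Swap each confusable word for the alternatives in its set and alert
    if the language model finds a replacement much more likely."""
    base = lm_score(words)
    for i, word in enumerate(words):
        if word not in confusable_set:
            continue
        for alt in confusable_set - {word}:
            candidate = words[:i] + [alt] + words[i + 1:]
            if lm_score(candidate) > base + margin:
                return True, (word, alt)
    return False, None

confusable = {"hypertension", "hypotension"}
lm = lambda ws: 5.0 if "low" in ws and "hypotension" in ws else 1.0  # toy LM
print(flag_possible_misrecognition(
    ["low", "blood", "pressure", "with", "hypertension"], confusable, lm))
# (True, ('hypertension', 'hypotension'))
```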
  • Publication number: 20140006025
    Abstract: This disclosure includes, for example, methods and computer systems for providing audio-activated resource access for user devices. The computer systems may store instructions to cause the processor to perform operations, comprising capturing audio at a user device. The operations may also comprise using a speaker recognition system to identify a speaker in the transmitted audio and/or using a speech-to-text converter to identify text in the captured audio. The speaker identity or a condensed version of the speaker identity or other metadata along with the speaker identity may be transmitted to a server system to determine a corresponding speaker identity entry. The operations may also comprise receiving a resource corresponding to the identified speaker entry in the server system.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Inventors: Harshini Ramnath Krishnan, Andrew Fregly
  • Publication number: 20130346066
    Abstract: Joint decoding of words and tags may be provided. Upon receiving an input from a user comprising a plurality of elements, the input may be decoded into a word lattice comprising a plurality of words. A tag may be assigned to each of the plurality of words and a most-likely sequence of word-tag pairs may be identified. The most-likely sequence of word-tag pairs may be evaluated to identify an action request from the user.
    Type: Application
    Filed: June 20, 2012
    Publication date: December 26, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Anoop Kiran Deoras, Dilek Zeynep Hakkani-Tur, Ruhi Sarikaya, Gokhan Tur
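A minimal sketch of jointly scoring word-tag pairs over a tiny word lattice, per publication 20130346066 above. Brute-force enumeration replaces a real lattice decoder, and all scores are illustrative.

```python
from itertools import product

def joint_decode(lattice, tags, pair_score):
    """lattice: one list of candidate words per position. Returns the
    highest-scoring sequence of (word, tag) pairs across all paths."""
    best, best_score = None, float("-inf")
    for words in product(*lattice):
        for tagging in product(tags, repeat=len(words)):
            score = sum(pair_score(w, t) for w, t in zip(words, tagging))
            if score > best_score:
                best, best_score = list(zip(words, tagging)), score
    return best

lattice = [["call", "cal"], ["mom"]]
tags = ["ACTION", "CONTACT", "OTHER"]
table = {("call", "ACTION"): 3.0, ("mom", "CONTACT"): 2.5}
print(joint_decode(lattice, tags, lambda w, t: table.get((w, t), 0.0)))
# [('call', 'ACTION'), ('mom', 'CONTACT')] -> an actionable "call mom" request
```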
  • Publication number: 20130339027
    Abstract: A method or system for selecting or pruning applicable verbal commands associated with speech recognition based on a user's motions detected from a depth camera. Depending on the depth of the user's hand or arm, the context of the verbal command is determined and verbal commands corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected verbal commands. By using an appropriate set of verbal commands, the accuracy of the speech recognition is increased.
    Type: Application
    Filed: June 15, 2012
    Publication date: December 19, 2013
    Inventors: Tarek El Dokor, James Holmes, Jordan Cluster, Stuart Yamamoto, Pedram Vaghefinazari
  • Publication number: 20130339021
    Abstract: Techniques, an apparatus, and an article of manufacture for identifying one or more utterances that are likely to carry the intent of a speaker, from a conversation between two or more parties. A method includes obtaining an input of a set of utterances in chronological order from a conversation between two or more parties; computing an intent confidence value of each utterance by summing intent confidence scores from each of the constituent words of the utterance, wherein the intent confidence scores capture each word's influence on the subsequent utterances in the conversation based on (i) the uniqueness of the word in the conversation and (ii) the number of times the word subsequently occurs in the conversation; and generating a ranked order of the utterances from highest to lowest intent confidence value, wherein the highest intent value corresponds to the utterance that is most likely to carry the intent of the speaker.
    Type: Application
    Filed: June 19, 2012
    Publication date: December 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Om D. Deshmukh, Sachindra Joshi, Saurabh Saket, Ashish Verma
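A minimal sketch of the utterance ranking in publication 20130339021 above: each word contributes a score built from its uniqueness in the conversation and how often it recurs in later utterances, and utterances are ranked by the sum. Multiplying the two factors is an illustrative weighting.

```python
from collections import Counter

def rank_by_intent(utterances):
    counts = Counter(w for u in utterances for w in u.split())
    def word_score(word, index):
        uniqueness = 1.0 / counts[word]                  # factor (i)
        later = sum(word in u.split()                    # factor (ii)
                    for u in utterances[index + 1:])
        return uniqueness * later
    scored = [(sum(word_score(w, i) for w in u.split()), u)
              for i, u in enumerate(utterances)]
    return sorted(scored, reverse=True)

convo = ["i want to cancel my plan", "cancel which plan", "the mobile plan"]
for score, utterance in rank_by_intent(convo):
    print(f"{score:.2f}  {utterance}")  # the first utterance ranks highest
```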
  • Publication number: 20130339018
    Abstract: A system and method of verifying the identity of an authorized user in an authorized user group through a voice user interface for enabling secure access to one or more services via a mobile device includes receiving first voice information from a speaker through the voice user interface of the mobile device, calculating a confidence score based on a comparison of the first voice information with a stored voice model associated with the authorized user and specific to the authorized user, interpreting the first voice information as a specific service request, identifying a minimum confidence score for initiating the specific service request, determining whether or not the confidence score exceeds the minimum confidence score, and initiating the specific service request if the confidence score exceeds the minimum confidence score.
    Type: Application
    Filed: July 27, 2012
    Publication date: December 19, 2013
    Applicant: SRI INTERNATIONAL
    Inventors: Nicolas Scheffer, Yun Lei, Douglas A. Bercow
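A minimal sketch of the decision step in publication 20130339018 above: each service request carries its own minimum confidence, and the request is initiated only when the verification score clears it. Service names, scores, and thresholds are illustrative.

```python
MIN_CONFIDENCE = {"check_balance": 0.6, "transfer_funds": 0.9}

def authorize(service_request, confidence_score):
    # Unknown services fall back to the strictest possible threshold.
    minimum = MIN_CONFIDENCE.get(service_request, 1.0)
    return confidence_score >= minimum

print(authorize("check_balance", 0.72))   # True: clears the 0.6 minimum
print(authorize("transfer_funds", 0.72))  # False: high-risk requires 0.9
```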
  • Publication number: 20130332147
    Abstract: The technology of the present application provides a method and apparatus for dynamically updating a language model across a large number of similarly situated users. The system identifies individual changes to user profiles and evaluates each change for broader application, such as a dialect correction for a speech recognition engine; an administrator for the system identifies similarly situated user profiles and downloads the profile change to effect a dynamic change to the language model of similarly situated users.
    Type: Application
    Filed: June 11, 2012
    Publication date: December 12, 2013
    Applicant: NVOQ INCORPORATED
    Inventor: Charles Corfield
  • Publication number: 20130325459
    Abstract: Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: May 31, 2012
    Publication date: December 5, 2013
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, JR.
  • Publication number: 20130317823
    Abstract: Systems, methods, and computer-readable media that may be used to modify a voice action system to include voice actions provided by advertisers or users are provided. One method includes receiving electronic voice action bids from advertisers to modify the voice action system to include a specific voice action (e.g., a triggering phrase and an action). One or more bids may be selected. The method includes, for each of the selected bids, modifying data associated with the voice action system to include the voice action associated with the bid, such that the action associated with the respective voice action is performed when voice input from a user is received that the voice action system determines to correspond to the triggering phrase associated with the respective voice action.
    Type: Application
    Filed: May 23, 2012
    Publication date: November 28, 2013
    Inventor: Pedro J. Moreno Mengibar