Image To Speech Patents (Class 704/260)
  • Patent number: 10707982
    Abstract: A server system accesses a listening history of a user of the media-providing service, where the user is in a demographic group. For each track of a plurality of tracks in the listening history of the user, the server system determines a year associated with the track and calculates a first metric based at least in part on an affinity of members of a demographic group, as compared to members of other demographic groups, for music from the year associated with the track. The server system selects content for the user based on the first metric and provides the selected content to a client device associated with the user.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: July 7, 2020
    Assignee: Spotify AB
    Inventors: Clay Gibson, Santiago Gil, Ian Anderson, Oguz Semerci, Scott Wolf, Margreth Mpossi
  • Patent number: 10706837
    Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.
    Type: Grant
    Filed: June 13, 2018
    Date of Patent: July 7, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen
  • Patent number: 10699694
    Abstract: Systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify speech units that are required for synthesizing speech. The system can request from a server the text-to-speech unit needed to synthesize the speech. The system can then synthesize speech using text-to-speech units already stored and a received text-to-speech unit from the server.
    Type: Grant
    Filed: November 19, 2018
    Date of Patent: June 30, 2020
    Inventors: Benjamin J. Stern, Mark Charles Beutnagel, Alistair D. Conkie, Horst J. Schroeter, Amanda Joy Stent
  • Patent number: 10691896
    Abstract: Examples of the present disclosure describe systems and methods relating to conversational system user behavior identification. A user of the conversational system may be evaluated based on one or more factors. The one or more factors may be compared to an aggregated measure for a larger group of conversational system users, such that “anomalous” behavior (e.g., behavior that deviates from a normal behavior) may be identified. When a user is identified as exhibiting anomalous behavior, the conversational system may adapt its interactions with the user in order to encourage, discourage, or further observe the identified behavior. As a result, the conversational system may be able to verify a user's anomalous behavior, discourage the anomalous behavior, or take other action while interacting with the user.
    Type: Grant
    Filed: October 2, 2018
    Date of Patent: June 23, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Joseph Edwin Johnson, Jr., Emmanouil Koukoumidis, Donald Brinkman, Matthew Schuerman
  • Patent number: 10685389
    Abstract: A mobile app for a mobile device includes logic for an overlay for the UI of the mobile device, and a location on the UI where the overlay is to be displayed, as specified by the user. The mobile app also comprises a scanning layer to create a list of items to be located and acquired. Instructions for locating the items, and objectives for ordering the locations of the items on the overlay are received from the user. The objectives may be for a trip to acquire the items, such as total cost, time, number of stops, preferred establishments for acquiring the items, preferred brands of the items, and other bases. The locations of the items are identified and filtered in accordance with the instructions. The locations of the items are displayed on the overlay at the area of the user interface specified by the user, according to geographical location.
    Type: Grant
    Filed: October 28, 2015
    Date of Patent: June 16, 2020
    Assignee: eBay Inc.
    Inventor: Mark Delun Yuan
  • Patent number: 10681104
    Abstract: Techniques are described for handling offsets (gaps or overlaps) that occur between segments of streaming content, e.g., between a segment of primary content (e.g., a live sporting event) and a segment of secondary content (e.g., ad content) dynamically inserted into the stream. The effect of such offsets can be that synchronization between the video and audio portions of the stream can be lost. By tracking a cumulative offset derived from the audio portion of the stream and applying that offset to the presentation times of each affected sample of both the audio and video portions of the stream, synchronization of the audio and video is maintained.
    Type: Grant
    Filed: October 31, 2017
    Date of Patent: June 9, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Yongjun Wu, Henry Liu, Amritaansh Verma
  • Patent number: 10679606
    Abstract: Systems and methods are disclosed for providing non-lexical cues in synthesized speech. Original text is analyzed to determine characteristics of the text and/or to derive or augment an intent (e.g., an intent code). Non-lexical cue insertion points are determined based on the characteristics of the text and/or the intent. One or more non-lexical cues are inserted at insertion points to generate augmented text. The augmented text is synthesized into speech, including converting the non-lexical cues to speech output.
    Type: Grant
    Filed: July 17, 2018
    Date of Patent: June 9, 2020
    Assignee: Intel Corporation
    Inventors: Jessica M. Christian, Peter Graff, Crystal A. Nakatsu, Beth Ann Hockey
  • Patent number: 10671992
    Abstract: Disclosed are a terminal and controlling method thereof. The present invention includes a display unit configured to display a sound output menu for selecting whether to perform a voice output in performing a payment, a communication unit configured to transceive payment information, a controller configured to convert the payment information into a voice if the payment is performed while the sound output menu is in the ON state, and an audio output module configured to output the payment information converted into the voice.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: June 2, 2020
    Inventor: Jaebeom Jeon
  • Patent number: 10665218
    Abstract: The present disclosure provides an audio data processing method performed at an electronic apparatus, the method including: obtaining a corresponding lyric file according to audio data to be processed; dividing the audio data according to a sentence in the lyric file to obtain an audio data segment; extracting data corresponding to an end syllable in the audio data segment; and performing harmonic processing on the data corresponding to the end syllable. In addition, further provided is an audio data processing device matching the method. The audio data processing method and device can prevent entire audio data from being attached with a harmonic sound effect during an entire time period, thereby improving authenticity of harmonic simulation.
    Type: Grant
    Filed: May 1, 2018
    Date of Patent: May 26, 2020
    Inventors: Weifeng Zhao, Xueqi Chen
  • Patent number: 10643600
    Abstract: A method and system for personalizing synthetic speech from a text-to-speech (TTS) system is disclosed. The method uses linguistic feature vectors to correct/modify the synthetic speech, particularly Chinese Mandarin speech. The linguistic feature vectors are used to generate or retrieve onset and rime scaling factors encoding differences between the synthetic speech and a user's natural speech. Together, the onset and rime scaling factors are used to modify every word/syllable of the synthetic speech from a TTS system, for example. In particular, segments of synthetic speech are either compressed or stretched in time for each part of each syllable of the synthetic speech. After modification, the synthetic speech more closely resembles the speech patterns of a speaker for which the scaling factors were generated. The modified synthetic speech may then be transmitted to a user and played to the user via a mobile phone, for example.
    Type: Grant
    Filed: March 9, 2018
    Date of Patent: May 5, 2020
    Assignee: OBEN, INC.
    Inventor: Sandesh Aryal
  • Patent number: 10645023
    Abstract: In one embodiment, a social networking system maintains a moving average of the number of connection problems, including socket timeouts and failed uploads, per client in a geographic area to determine whether the wireless data network serving the geographic area is overloaded. In response to detecting a network overload, the social networking system may transmit an instruction to the clients in the particular geographic area to enter one of a plurality of traffic throttling modes. In particular embodiments, the social networking system maintains a historical log of network overload conditions, and uses the historical log to generate an estimate of the wireless network capacity serving a geographic area. Thus, the social networking system may preemptively transmit instructions to clients to enter a bandwidth-conservation mode when the estimated traffic demand exceeds the estimated capacity for a particular geographic region.
    Type: Grant
    Filed: January 23, 2017
    Date of Patent: May 5, 2020
    Assignee: Facebook, Inc.
    Inventors: David Harry Garcia, Justin Mitchell
  • Patent number: 10642463
    Abstract: One or more embodiments present positional information associated with a text or music to a user. In one embodiment, a determination is made that at least one line from a digital representation of text or music has been selected. Another determination is made that the line is associated with a set of positional information. The set of positional information is presented on a digital representation of a venue along with the presentation of the line of text or music.
    Type: Grant
    Filed: January 15, 2018
    Date of Patent: May 5, 2020
    Inventor: Randall Lee Threewits
  • Patent number: 10635800
    Abstract: Device, system, and method of voice-based user authentication utilizing a challenge. A system includes a voice-based user-authentication unit, to authenticate a user based on a voice sample uttered by the user. A voice-related challenge generator operates to generate a voice-related challenge that induces the user to modify one or more vocal properties of the user. A reaction-to-challenge detector operates to detect a user-specific vocal modification in reaction to the voice-related challenge, by using a processor as well as an acoustic microphone, an optical microphone, or a hybrid acoustic-and-optical microphone. The voice-based user-authentication unit utilizes the user-specific vocal modification that was detected as a reaction to the voice-related challenge, as part of a user-authentication process.
    Type: Grant
    Filed: April 11, 2018
    Date of Patent: April 28, 2020
    Inventor: Tal Bakish
  • Patent number: 10636412
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
    Type: Grant
    Filed: September 17, 2018
    Date of Patent: April 28, 2020
    Assignee: Cerence Operating Company
    Inventor: Alistair D. Conkie
  • Patent number: 10629205
    Abstract: An approach is provided for identifying an accurate transcription of a sentence. Options for transcriptions of each word in the sentence are determined. Probabilistic scores of the options are determined. Variations of a transcription of the sentence are generated by randomly selecting from the options with the probabilistic scores weighting the selections. Plausibility scores for the variations are generated by performing syntactic, semantic, and redundancy analyses of the variations. Based on the plausibility scores, the probabilistic scores, and the variations, tentative transcriptions of the sentence are determined and refined repeatedly by employing a genetic evolution technique until a final refined tentative transcription is the accurate transcription of the sentence.
    Type: Grant
    Filed: June 12, 2018
    Date of Patent: April 21, 2020
    Assignee: International Business Machines Corporation
    Inventors: Giulia Carnevale, Marco Gianfico, Ciro Ragusa, Roberto Ragusa
  • Patent number: 10621968
    Abstract: A method for establishing an articulatory speech synthesis model of a person's voice includes acquiring image data representing a visage of a person, in which the visage includes facial characteristics defining exteriorly visible articulatory speech synthesis model parameters of the person's voice; selecting a predefined articulatory speech synthesis model from among stores of predefined models, the selection based at least in part on one or both of the facial characteristics or the exteriorly visible articulatory speech synthesis model parameters; and associating at least a portion of the selected predefined articulatory speech synthesis model with the articulatory speech synthesis model of the person's voice.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: April 14, 2020
    Assignee: Intel Corporation
    Inventors: Shamim Begum, Alexander A. Oganezov
  • Patent number: 10614122
    Abstract: Modifying computer program output in a voice or non-text input activated environment is provided. A system can receive audio signals detected by a microphone of a device. The system can parse the audio signal to identify a computer program to invoke. The computer program can identify a dialog data structure. The system can modify the identified dialog data structure to include a content item. The system can provide the modified dialog data structure to a computing device for presentation, wherein the dialog data structure comprises a placeholder field, positioned on the dialog data structure based on content, in order to aid in inserting of content.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: April 7, 2020
    Assignee: Google LLC
    Inventors: Laura Eidem, Alex Jacobson
  • Patent number: 10614795
    Abstract: An acoustic model generation method and device, and a speech synthesis method and device. The acoustic model generation method comprises: acquiring personalized data, wherein the personalized data is obtained after processing according to personalized speech data and corresponding personalized text data (S11); acquiring a pre-generated reference acoustic model, wherein the reference acoustic model is generated according to existing large-scale samples (S12); and carrying out self-adaptive model training according to the personalized data and the reference acoustic model to generate a personalized acoustic model (S13). According to the method, an acoustic model can be rapidly generated, and personalized requirements of users can be satisfied.
    Type: Grant
    Filed: July 14, 2016
    Date of Patent: April 7, 2020
    Inventor: Xiulin Li
  • Patent number: 10607609
    Abstract: An augmented reality (AR) device can be configured to monitor ambient audio data. The AR device can detect speech in the ambient audio data, convert the detected speech into text, or detect keywords such as rare words in the speech. When a rare word is detected, the AR device can retrieve auxiliary information (e.g., a definition) related to the rare word from a public or private source. The AR device can display the auxiliary information for a user to help the user better understand the speech. The AR device may perform translation of foreign speech, may display text (or the translation) of a speaker's speech to the user, or display statistical or other information associated with the speech.
    Type: Grant
    Filed: August 10, 2017
    Date of Patent: March 31, 2020
    Assignee: Magic Leap, Inc.
    Inventors: Jeffrey Sommers, Jennifer M. R. Devine, Joseph Wayne Seuck, Adrian Kaehler
  • Patent number: 10579219
    Abstract: An application may be hosted for utilization by a remote computing platform. User interface (UI) elements of a UI generated by the hosted application may be identified. Proxy UI elements may be generated. Each of the proxy UI elements may correspond to one or more of the identified UI elements. A transcript of an audio sample may be processed. The audio sample may comprise an utterance of a user of the remote computing platform. The transcript of the audio sample may comprise at least one word corresponding to one or more of the proxy UI elements. A functionality of the hosted application may be invoked. The invoked functionality may correspond to one or more of the UI elements corresponding to the one or more of the proxy UI elements.
    Type: Grant
    Filed: October 24, 2016
    Date of Patent: March 3, 2020
    Assignee: Citrix Systems, Inc.
    Inventor: Georgy Momchilov
  • Patent number: 10572858
    Abstract: Artificial intelligence is introduced into an electronic meeting context to perform various tasks before, during, and/or after electronic meetings. The tasks may include a wide variety of tasks, such as agenda creation, participant selection, real-time meeting management, meeting content supplementation, and post-meeting processing. The artificial intelligence may analyze a wide variety of data such as data pertaining to other electronic meetings, data pertaining to organizations and users, and other general information pertaining to any topic. Capability is also provided to create, manage, and enforce meeting rules templates that specify requirements and constraints for various aspects of electronic meetings.
    Type: Grant
    Filed: October 11, 2016
    Date of Patent: February 25, 2020
    Assignee: RICOH COMPANY, LTD.
    Inventors: Steven A. Nelson, Hiroshi Kitada, Lana Wong
  • Patent number: 10553201
    Abstract: A method of speech synthesis is provided, which comprises: determining a phoneme sequence of a to-be-processed text; inputting the phoneme sequence into a pre-trained speech model to obtain an acoustic characteristic corresponding to each phoneme in the phoneme sequence, where the speech model is used for characterizing a corresponding relationship between each phoneme in the phoneme sequence and the acoustic characteristic; determining, for each phoneme in the phoneme sequence, at least one speech waveform unit corresponding to each phoneme based on a preset index of phonemes and speech waveform units, and determining a target speech waveform unit of the at least one speech waveform unit based on the acoustic characteristic corresponding to the phoneme and a preset cost function; and synthesizing the target speech waveform unit corresponding to each phoneme in the phoneme sequence to generate a speech.
    Type: Grant
    Filed: September 18, 2018
    Date of Patent: February 4, 2020
    Inventor: Zhiping Zhou
  • Patent number: 10543931
    Abstract: A method and system of monitoring aural and message alerts received during flight in an internet of things (IOT) cockpit of an aircraft generated by systems within the cockpit, the method includes: receiving a plurality of alerts which include at least one of an aural alert or message alert; applying a first natural language processing (NLP) process to the aural alert to convert the aural alert to a text alert consistent in structure with the message alert for aggregating together with the message alert to form a concatenated message alert; and identifying the context of the concatenated message alert by applying a second NLP process to the concatenated message alert in its entirety and subsequently tagging the concatenated message alert to associate a tagged message with a display element wherein the tagged message is a concatenated message.
    Type: Grant
    Filed: October 23, 2017
    Date of Patent: January 28, 2020
    Inventors: Hariharan Saptharishi, Mohan Gowda Chandrashekarappa, Narayanan Srinivasan
  • Patent number: 10529310
    Abstract: A method for converting textual messages to musical messages comprising receiving a text input and receiving a musical input selection. The method includes analyzing the text input to determine text characteristics and analyzing a musical input corresponding to the musical input selection to determine musical characteristics. Based on the text characteristic and the musical characteristic, the method includes correlating the text input with the musical input to generate a synthesizer input, and sending the synthesizer input to a voice synthesizer. The method includes receiving a vocal rendering of the text input from the voice synthesizer and generating a musical message from the vocal rendering and the musical input. The method includes generating a video element based on a video input, incorporating the video element into the musical message, and outputting the musical message including the video element.
    Type: Grant
    Filed: February 13, 2017
    Date of Patent: January 7, 2020
    Assignee: ZYA, INC.
    Inventors: Matthew Michael Serletic, II, Bo Bazylevsky, James Mitchell, Ricky Kovac, Patrick Woodward, Thomas Webb, Ryan Groves
  • Patent number: 10514833
    Abstract: Contextual paste target prediction is used to predict one or more target applications for a paste action, and do so based upon a context associated with the content that has previously been selected and copied. The results of the prediction may be used to present to a user one or more user controls to enable the user to activate one or more predicted applications, and in some instances, additionally configure a state of a predicted application to use the selected and copied content once activated. As such, upon completing a copy action, a user may, in some instances, be provided with an ability to quickly switch to an application into which the user was intending to paste the content. This can provide a simpler user interface in a device such as phones and tablet computers with limited display size and limited input device facilities. It can result in a paste operation into a different application with fewer steps than is possible conventionally.
    Type: Grant
    Filed: January 27, 2017
    Date of Patent: December 24, 2019
    Assignee: GOOGLE LLC
    Inventors: Aayush Kumar, Gokhan Bakir
  • Patent number: 10516775
    Abstract: Provided is a computer implemented method and system for delivering text messages, emails, and messages from a messenger application to a user while the user is engaged in an activity, such as driving, exercising, or working. Typically, the emails and other messages are announced to the user and read aloud without any user input. In Drive Mode, while the user is driving, a clean interface is shown to the user, and the user can hear announcements and messages/emails aloud without looking at the screen of the phone, and use gestures to operate the phone. After a determination is made that a new text message and/or email has arrived, the user is informed aloud of the text message/email/messenger message and, in most instances, if the user takes no further action, the body and/or subject of the text message/email/messenger message is read aloud to the user. All messages can be placed in a single queue, and read to the user in order of receipt.
    Type: Grant
    Filed: January 11, 2019
    Date of Patent: December 24, 2019
    Assignee: messageLOUD Inc.
    Inventor: Garin Toren
  • Patent number: 10503835
    Abstract: A computerized-social network provides a community of users with features and tools facilitating an immersive, collaborative environment where users can learn a language or help others learn a language. One user (user A) can view another user's (user B) Web page or document and make suggestions or comments for selected content on that Web page. These suggestions are linked specifically to the selected content. User B can review the suggestions, and accept or reject the suggestions by user A and others.
    Type: Grant
    Filed: March 9, 2017
    Date of Patent: December 10, 2019
    Inventors: Sam Neff, Raymond Galang, Sundararajan Parasuraman
  • Patent number: 10499235
    Abstract: A process for locating required responders in a wireless ad-hoc network of mobile computing devices (MCDs) starts with wirelessly receiving, at a second MCD directly from a first MCD, a message indicating a requested type of responder needed and a location at which the responder is needed. The second MCD then compares the requested type of responder needed to a locally stored type of responder associated with a current user. If a match is found, the second MCD alerts the current user and provides a response mechanism. The second MCD then makes a forwarding determination of whether or not the message should be forwarded directly to one or more other mobile computing devices. The second MCD then either further directly wirelessly transmits the message to the one or more other mobile computing devices or refrains from further transmitting it, as a function of the forwarding determination.
    Type: Grant
    Filed: July 29, 2016
    Date of Patent: December 3, 2019
    Inventors: Mateusz Gazdziak, Pawel Arkadiusz Jedrzejewski, Dominik Trzupek
  • Patent number: 10438610
    Abstract: A virtual assistant may communicate with a user in a natural language that simulates a human. The virtual assistant may be associated with a human-configured knowledge base that simulates human responses. In some instances, a parent response may be provided by the virtual assistant and, thereafter, a child response that is associated with the parent response may be provided.
    Type: Grant
    Filed: August 25, 2014
    Date of Patent: October 8, 2019
    Inventors: Fred Brown, Tanya M Miller, Mark Zartler, Molly Q Brown
  • Patent number: 10418025
    Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech-System to produce an audio waveform from an in
    Type: Grant
    Filed: December 6, 2017
    Date of Patent: September 17, 2019
    Assignee: International Business Machines Corporation
    Inventors: Slava Shechtman, Zvi Kons
  • Patent number: 10419372
    Abstract: Systems and methods provide real-time communication between website operators and website visitors including monitoring, gathering, managing and sharing of information.
    Type: Grant
    Filed: January 31, 2017
    Date of Patent: September 17, 2019
    Assignee: LiveHelpNow, LLC
    Inventor: Michael Kansky
  • Patent number: 10387598
    Abstract: An exemplary bitmap file can be provided, which can include, for example, a map of a cell array structure of a memory(ies), a plurality of memory values superimposed on the cell array structure based on a simulated testing of the memory(ies). The memory values may be values being written to the memory(ies) while the memory(ies) is being tested. The memory values may be values in a test pattern(s) being used to test the memory(ies). Each cell in the cell array structure can have a particular memory value superimposed thereon. A cell(s) in the cell array structure may be highlighted, which may correspond to an incorrect memory value.
    Type: Grant
    Filed: September 13, 2017
    Date of Patent: August 20, 2019
    Assignee: Cadence Design Systems, Inc.
    Inventors: Steven Lee Gregor, Norman Robert Card
  • Patent number: 10387543
    Abstract: Systems and methods for automatically mapping English phonemes to graphemes to support better reading and spelling instruction may include a mapping process for systematically dividing text words into graphemes made up of one or more text characters corresponding to appropriately identified phonemes (which may be represented by one or more phonetic characters). The process may also include automatically correlating each phoneme of a word with a grapheme representing the phoneme in order to produce a phoneme-to-grapheme map that may be optimized for educational use. Some embodiments may include a teaching process for presenting the results of the mapping process to students.
    Type: Grant
    Filed: September 8, 2016
    Date of Patent: August 20, 2019
    Assignee: VKIDZ, INC.
    Inventors: John Edelson, Jose Perez-Diaz, Kris Craig, Obiora Obinyeluaku, Harold Milenkovic
  • Patent number: 10380252
    Abstract: A method includes storing a meaning taxonomy including meaning loaded entities and associations between the meaning loaded entities and a plurality of syntactic structures; receiving, from a first source, first content including one or more first syntactic structures; identifying, based at least in part on identification criteria, one or more meaning loaded entities that are linked to the one or more first syntactic structures by one or more first associations of the plurality of associations, the identification criteria including at least one of a type of the first content, a document size of the first content, a publication date of the first content, and an importance of the meaning load entities; calculating a first content summary indicating a level of coverage of the meaning loaded entities within the first content; and generating a first visual representation of the first content summary.
    Type: Grant
    Filed: January 6, 2017
    Date of Patent: August 13, 2019
    Inventor: C. David Seuss
  • Patent number: 10373603
    Abstract: Systems, methods, and computer-readable storage devices for receiving an utterance from a user and analyzing the utterance to identify the demographics of the user. The system then analyzes the utterance to determine the prosody of the utterance, and retrieves from the Internet data associated with the determined demographics. Using the retrieved data, the system retrieves, also from the Internet, recorded speech matching the identified prosody. The recorded speech, which is based on the demographic data of the utterance and has a prosody matching the utterance, is then saved to a database for future use in generating speech specific to the user.
    Type: Grant
    Filed: April 24, 2017
    Date of Patent: August 6, 2019
    Inventors: Srinivas Bangalore, Taniya Mishra
  • Patent number: 10373606
    Abstract: A transliteration support device according to an embodiment includes an acquisition unit, an addition unit, an extraction unit, a generation unit, and a reproduction unit. The acquisition unit acquires a text to be transliterated. The addition unit adds a transliteration tag indicating a transliteration setting of the text to the text. The extraction unit extracts a transliteration pattern in which a frequent appearance transliteration setting frequently appearing in the transliteration settings indicated by the transliteration tags and an applicable condition when the frequent appearance transliteration setting is applied to the text are in association with each other. The generation unit produces a synthesized voice using the transliteration pattern. The reproduction unit reproduces the produced synthesized voice.
    Type: Grant
    Filed: January 27, 2017
    Date of Patent: August 6, 2019
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Taira Ashikawa, Kosei Fume, Yuka Kuroda, Yoshiaki Mizuoka
  • Patent number: 10372738
    Abstract: Providing a speculative search result for a search query prior to completion of the search query is described. In response to receiving a search query from a client node, a speculative search result is provided to the client node for the search query prior to receiving an indication from the client node that said search query is completely formed. The speculative search result may be displayed on the same web page on the client node as the search query, while the search query is being entered by the user. As the user further enters the search query, a new speculative search result may be provided to the user.
    Type: Grant
    Filed: July 6, 2016
    Date of Patent: August 6, 2019
    Assignee: Jollify Management Limited
    Inventors: Stephen L Hood, Ralph Rabbat, Mihir Shah, Adam Durfee, Alastair Gourlay, Peter Anick, Richard Kasperski, Oliver Thomas Bayley, Ashley Woodman Hall, Shyam Kapur, John Thrall
  • Patent number: 10354256
    Abstract: A method and system are provided that provide an avatar-based customer service experience with a human support agent. The methods and systems receive, from a customer computing (CC) device, a request for assistance fulfillable by one of a plurality of support agents. The methods and systems launch an avatar-based exchange that includes receiving customer issue definition (CID) information from the CC device regarding the request for assistance, defining a virtual character to be presented on the CC device, and providing pre-recorded support (PRS) content based on the CID information. The PRS content is presented in combination with animation of the virtual character. The methods and systems select a support agent, and transition a basis for the avatar-based exchange from the PRS content to support agent content such that the support agent communicates with the customer through the virtual character animated on the CC device.
    Type: Grant
    Filed: December 23, 2014
    Date of Patent: July 16, 2019
    Inventor: Michael James McInerny
  • Patent number: 10354661
    Abstract: A multi-channel audio decoder for providing at least two output audio signals on the basis of an encoded representation is configured to perform a weighted combination of a downmix signal, a decorrelated signal and a residual signal, to obtain one of the output audio signals. The multi-channel audio decoder is configured to determine a weight describing a contribution of the decorrelated signal in the weighted combination in dependence on the residual signal. A multi-channel audio encoder for providing an encoded representation of a multi-channel audio signal is configured to obtain a downmix signal on the basis of the multi-channel audio signal, to provide parameters describing dependencies between the channels of the multi-channel audio signal, and to provide a residual signal. The multi-channel audio encoder is configured to vary an amount of residual signal included into the encoded representation in dependence on the multi-channel audio signal.
    Type: Grant
    Filed: May 27, 2016
    Date of Patent: July 16, 2019
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Sascha Dick, Christian Helmrich, Johannes Hilpert, Andreas Hoelzer
  • Patent number: 10347237
    Abstract: According to an embodiment, a device includes a table creator, an estimator, and a dictionary creator. The table creator is configured to create a table based on similarity between distributions of nodes of speech synthesis dictionaries of a specific speaker in respective first and second languages. The estimator is configured to estimate a matrix to transform the speech synthesis dictionary of the specific speaker in the first language to a speech synthesis dictionary of a target speaker in the first language, based on speech and a recorded text of the target speaker in the first language and the speech synthesis dictionary of the specific speaker in the first language. The dictionary creator is configured to create a speech synthesis dictionary of the target speaker in the second language, based on the table, the matrix, and the speech synthesis dictionary of the specific speaker in the second language.
    Type: Grant
    Filed: July 9, 2015
    Date of Patent: July 9, 2019
    Inventors: Kentaro Tachibana, Masatsune Tamura, Yamato Ohtani
  • Patent number: 10347238
    Abstract: Systems and techniques are disclosed for synthesizing a new word or short phrase such that it blends seamlessly in the context of insertion or replacement in an existing narration. In one such embodiment, a text-to-speech synthesizer is utilized to say the word or phrase in a generic voice. Voice conversion is then performed on the generic voice to convert it into a voice that matches the narration. An editor and interface are described that support fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and guidance by the editor's own voice.
    Type: Grant
    Filed: October 27, 2017
    Date of Patent: July 9, 2019
    Assignees: Adobe Inc., The Trustees of Princeton University
    Inventors: Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein
  • Patent number: 10339217
    Abstract: Aspects described herein provide quality assurance checks for improving the construction of natural language understanding grammars. An annotation module may obtain a set of annotations for a set of text samples based, at least in part, on an ontology and a grammar. A quality assurance module may automatically perform one or more quality assurance checks on the set of annotations, the ontology, the grammar, or combinations thereof. The quality assurance module may generate a list of flagged annotations during performance of a quality assurance check. The list of flagged annotations may be presented at an annotation review interface displayed at a display device. One of the flagged annotations may be selected and presented at an annotation interface displayed at the display device. Responsive to presentation of the flagged annotation, the ontology, the grammar, the flagged annotation selected, or combinations thereof may be updated based on user input received.
    Type: Grant
    Filed: June 26, 2017
    Date of Patent: July 2, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Real Tremblay, Jerome Tremblay, Serge Robillard, Jackson Liscombe, Alina Andreevskaia, Tagyoung Chung
  • Patent number: 10332297
    Abstract: The invention relates to an electronic note graphical user interface that has a human-like interactive intelligent animated agent and provides specific note processing features including multimodal hands-free operation, and includes methods and systems that include a processor configured to provide an Intelligent Interactive Agent as a graphic animation to a user, where the agent receives and processes verbal commands from the user to operate the GUI and execute GUI operations including showing the animation tapping, swiping, pinching, searching for text, entering text, and displaying retrieved content, in the one or more mobile electronic display notes displayed in the container display matrix.
    Type: Grant
    Filed: September 4, 2015
    Date of Patent: June 25, 2019
    Inventor: Vishal Vadodaria
  • Patent number: 10332520
    Abstract: In a particular aspect, an apparatus includes an audio sensor configured to receive an input audio signal. The apparatus also includes speech generative circuitry configured to generate a synthesized audio signal based at least partly on automatic speech recognition (ASR) data associated with the input audio signal and based on one or more parameters indicative of state information associated with the input audio signal.
    Type: Grant
    Filed: February 13, 2017
    Date of Patent: June 25, 2019
    Assignee: Qualcomm Incorporated
    Inventors: Erik Visser, Shuhua Zhang, Lae-Hoon Kim, Yinyi Guo, Sunkuk Moon
  • Patent number: 10318236
    Abstract: Approaches provide for using a voice communications device to control, refine, or otherwise manage the playback of media content in response to a spoken instruction. For example, the voice communications device can receive a request to refine and/or initiate the playback of media content, such as music, news, audio books, audio broadcasts, and other such content. Audio input data that includes the request can be received by the voice communications device and an application executing on the voice communications device or otherwise in communication with the voice communications device can analyze the audio input data to determine how to carry out the request. The application can determine whether there is an active play queue of media content configured to play using the voice communications device. In the situation where there is no media content being played using the voice communications device, the application can determine media content using information in the request.
    Type: Grant
    Filed: May 5, 2016
    Date of Patent: June 11, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Rickesh Pal, Kintan Dilipkumar Brahmbhatt, Brandon Scott Durham, Jonathan Barnett Feinstein, Yun Suk Paik, Daniel Paul Ryan
  • Patent number: 10319365
    Abstract: Systems and methods for generating output audio with emphasized portions are described. Spoken audio is obtained and undergoes speech processing (e.g., ASR and optionally NLU) to create text. It may be determined that the resulting text includes a portion that should be emphasized (e.g., an interjection) using at least one of knowledge of an application run on a device that captured the spoken audio, prosodic analysis, and/or linguistic analysis. The portion of text to be emphasized may be tagged (e.g., using a Speech Synthesis Markup Language (SSML) tag). TTS processing is then performed on the tagged text to create output audio including an emphasized portion corresponding to the tagged portion of the text.
    Type: Grant
    Filed: June 27, 2016
    Date of Patent: June 11, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Marco Nicolis, Adam Franciszek Nadolski
  • Patent number: 10311437
    Abstract: Provided is a method and a telephone-based system with voice-verification capabilities that enable a user to safely and securely conduct transactions with his or her online financial transaction program account over the phone in a convenient and user-friendly fashion, without having to depend on an internet connection.
    Type: Grant
    Filed: November 13, 2017
    Date of Patent: June 4, 2019
    Assignee: PAYPAL, INC.
    Inventor: Will Tonini
  • Patent number: 10304430
    Abstract: An electronic musical instrument includes: a plurality of keys, each of the plurality of keys specifying a pitch; a memory storing musical piece data representing a musical piece; and a processor, wherein the processor executes the following: retrieving the musical piece data of a musical piece from the memory and determining whether the musical piece data contains data of a lyric; and when the musical piece data contains the data of the lyric, and if a note specified by an operation of a key by a user is accompanied by a part of the lyric in the musical piece, causing data of a singing voice sound having the pitch specified by said operated key to be generated in accordance with the part of the lyric in response to the operation of the key, and causing the singing voice sound to be audibly output.
    Type: Grant
    Filed: March 16, 2018
    Date of Patent: May 28, 2019
    Assignee: CASIO COMPUTER CO., LTD.
    Inventor: Atsushi Nakamura
  • Patent number: 10276148
    Abstract: Some examples of assisted media representation can be implemented as a system and method that uses screen-reader-like functionality to speak information presented on a graphical user interface displayed by a media presentation system, including information that is not navigable by a remote control device. Information can be spoken in an order that follows a relative importance of the information based on a characteristic of the information or the location of the information within the graphical user interface. A history of previously spoken information is monitored to avoid speaking information more than once for a given graphical user interface. A different pitch can be used to speak information based on a characteristic of the information. Information that is not navigable by the remote control device can be spoken after a time delay. Voice prompts can be provided for a remote-driven virtual keyboard displayed by the media presentation system. The voice prompts can be spoken with different voice pitches.
    Type: Grant
    Filed: November 4, 2010
    Date of Patent: April 30, 2019
    Assignee: APPLE INC.
    Inventors: Christopher B. Fleizach, Reginald Dean Hudson, Eric Taylor Seymour
  • Patent number: 10275420
    Abstract: The disclosure includes a system and method for summarizing social interactions between users.
    Type: Grant
    Filed: May 11, 2017
    Date of Patent: April 30, 2019
    Assignee: Google LLC
    Inventors: Nadav Aharony, Alan Lee Gardner, III, George Cody Sumter