Image To Speech Patents (Class 704/260)
  • Patent number: 11282498
    Abstract: A speech synthesis method and a speech synthesis apparatus to synthesize speeches of different emotional intensities in the field of artificial intelligence, where the method includes obtaining a target emotional type and a target emotional intensity parameter that correspond to an input text, determining a corresponding target emotional acoustic model based on the target emotional type and the target emotional intensity parameter, inputting a text feature of the input text into the target emotional acoustic model to obtain an acoustic feature of the input text, and synthesizing a target emotional speech based on the acoustic feature of the input text.
    Type: Grant
    Filed: July 31, 2020
    Date of Patent: March 22, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Liqun Deng, Yuezhi Hu, Zhanlei Yang, Wenhua Sun
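The selection step this abstract describes (mapping an emotion type and an intensity parameter to a specific acoustic model before synthesis) can be sketched roughly as a keyed lookup. The bucket boundaries, model names, and function names below are illustrative assumptions, not details from the patent:

```python
# Hypothetical sketch: pick an emotional acoustic model keyed by
# (emotion type, discretized intensity). All names are invented.

def bucket_intensity(intensity: float) -> str:
    """Map a continuous intensity parameter in [0, 1] to a discrete bucket."""
    if intensity < 0.34:
        return "low"
    if intensity < 0.67:
        return "medium"
    return "high"

# One acoustic model per (emotion, intensity-bucket) pair.
EMOTION_MODELS = {
    ("happy", "low"): "model_happy_low",
    ("happy", "high"): "model_happy_high",
    ("sad", "medium"): "model_sad_medium",
}

def select_model(emotion: str, intensity: float) -> str:
    """Return the acoustic model for the target emotion and intensity."""
    key = (emotion, bucket_intensity(intensity))
    if key not in EMOTION_MODELS:
        raise KeyError(f"no acoustic model for {key}")
    return EMOTION_MODELS[key]
```

A real system would then feed the input text's features into the selected model to obtain acoustic features for the vocoder.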
  • Patent number: 11283845
    Abstract: A system, method, and computer program product are provided for managing conference calls between a plurality of conference call systems. In operation, a conference management system monitors the plurality of conference call systems to determine whether at least one first conference system is attempting to connect to at least one second conference system. The conference management system connects the at least one first conference system with the at least one second conference system such that communication between the at least one first conference system and the at least one second conference system is managed by the conference management system. Additionally, the conference management system provides one suite of services to users of the at least one first conference system and the at least one second conference system.
    Type: Grant
    Filed: May 20, 2020
    Date of Patent: March 22, 2022
    Assignee: AMDOCS DEVELOPMENT LIMITED
    Inventors: Diego Moskovits, Golan Nuri, Ben Menashe, Aran Azarzar
  • Patent number: 11282496
    Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: March 22, 2022
    Assignee: Google LLC
    Inventors: Ioannis Agiomyrgiannakis, Fergus James Henderson
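The per-source voice assignment described above can be illustrated with a small router: each output source (an app, the OS, a screen region, a UI object) is bound to a distinct voice, and a speech request is rendered with the voice of the source it came from. The `Voice` type, voice pool, and method names are assumptions for illustration, not the patented API:

```python
# Illustrative sketch of assigning distinct voices to output sources.

from dataclasses import dataclass

@dataclass(frozen=True)
class Voice:
    name: str
    pitch: float  # relative pitch multiplier
    rate: float   # relative speaking rate

VOICE_POOL = [
    Voice("voice_a", 1.0, 1.0),
    Voice("voice_b", 1.2, 0.9),
    Voice("voice_c", 0.8, 1.1),
]

class VoiceRouter:
    def __init__(self):
        self._assignments: dict[str, Voice] = {}

    def assign(self, source: str) -> Voice:
        """Bind the next unused voice to a source; reuse an existing binding."""
        if source not in self._assignments:
            idx = len(self._assignments) % len(VOICE_POOL)
            self._assignments[source] = VOICE_POOL[idx]
        return self._assignments[source]

    def speak(self, source: str, text: str) -> str:
        v = self.assign(source)
        # A real system would call a TTS engine with v's characteristics;
        # here we just tag the text with the assigned voice.
        return f"[{v.name}] {text}"
```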
  • Patent number: 11276392
    Abstract: A method may include obtaining audio originating at a remote device during a communication session conducted between a first device and the remote device and obtaining a transcription of the audio. The method may also include processing the audio to generate processed audio. In some embodiments, the audio may be processed by a neural network that is trained with respect to an analog voice network and the processed audio may be formatted with respect to communication over the analog voice network. The method may further include processing the transcription to generate a processed transcription that is formatted with respect to communication over the analog voice network and multiplexing the processed audio with the processed transcription to obtain combined data. The method may also include communicating, to the first device during the communication session, the combined data over a same communication channel of the analog voice network.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: March 15, 2022
    Inventor: David Thomson
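The multiplexing step in this abstract, interleaving processed audio with a processed transcription over one channel, can be sketched with typed length-prefixed frames. The frame layout (1-byte type, 4-byte length, payload) is an invented assumption; the patent does not specify it:

```python
# Sketch: interleave audio chunks and transcription fragments into one
# byte stream of [type:1][len:4][payload] frames, plus the demultiplexer.

import struct
from itertools import zip_longest

AUDIO, TEXT = 0, 1

def mux(audio_chunks: list, text_chunks: list) -> bytes:
    """Interleave audio and text chunks as typed, length-prefixed frames."""
    out = bytearray()
    for a, t in zip_longest(audio_chunks, text_chunks):
        if a is not None:
            out += struct.pack(">BI", AUDIO, len(a)) + a
        if t is not None:
            out += struct.pack(">BI", TEXT, len(t)) + t
    return bytes(out)

def demux(stream: bytes):
    """Split a multiplexed stream back into audio and text chunk lists."""
    audio, text = [], []
    i = 0
    while i < len(stream):
        kind, length = struct.unpack_from(">BI", stream, i)
        i += 5
        payload = stream[i:i + length]
        i += length
        (audio if kind == AUDIO else text).append(payload)
    return audio, text
```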
  • Patent number: 11269591
    Abstract: Aspects of the present invention disclose a method for delivering an artificial intelligence-based response to a voice command to a user. The method includes one or more processors identifying an audio command received by a computing device. The method further includes determining a first engagement level of a user, wherein an engagement level corresponds to an attentiveness level of the user in relation to the computing device based at least in part on indications of activities of the user. The method further includes identifying a first set of conditions within an immediate operating environment of the computing device, wherein the first set of conditions indicate whether to deliver a voice response to the identified audio command. The method further includes determining whether to deliver the voice response to the identified audio command to the user based at least in part on the first engagement level and first set of conditions.
    Type: Grant
    Filed: June 19, 2019
    Date of Patent: March 8, 2022
    Assignee: International Business Machines Corporation
    Inventors: Shilpa Shetty, Mithun Das, Amitabha Chanda, Sarbajit K. Rakshit
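The delivery decision this abstract describes, combining an engagement level with environmental conditions, can be reduced to a toy predicate. The thresholds and condition names here are entirely invented for illustration:

```python
# Toy version of the voice-response delivery decision: deliver only if the
# user's engagement level and the operating environment allow it.

def should_deliver_voice_response(engagement: float,
                                  conditions: dict) -> bool:
    """engagement in [0, 1]; conditions flags summarize the environment."""
    if conditions.get("do_not_disturb", False):
        return False
    # With other people present, require a higher attentiveness level.
    if conditions.get("other_people_present", False) and engagement < 0.8:
        return False
    return engagement >= 0.5
```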
  • Patent number: 11263304
    Abstract: A dyschromatopsia deciding method and apparatus are provided. The apparatus includes an I/O interface configured to receive an input for a program, a memory configured to store the input for the program and a processing result of the input, and a processor configured to execute the program, wherein the processor is configured to provide first CAPTCHA information for distinguishing between a person and a machine together with second CAPTCHA information for deciding dyschromatopsia, receive first CAPTCHA input information corresponding to the first CAPTCHA information and second CAPTCHA input information corresponding to the second CAPTCHA information together with authentication information, authenticate a user based on the first CAPTCHA input information, decide dyschromatopsia of the user based on the second CAPTCHA input information, and store a decision result of the dyschromatopsia in response to a decision that the user has dyschromatopsia.
    Type: Grant
    Filed: October 8, 2019
    Date of Patent: March 1, 2022
    Assignee: NETMARBLE CORPORATION
    Inventors: Il Hwan Seo, Hye Jeung Jeung, Min Jae Jeon
  • Patent number: 11250844
    Abstract: Agents engage and disengage with users intelligently. Users can tell agents to remain engaged without requiring a wakeword. Engaged states can support modal dialogs and barge-in. Users can cause disengagement explicitly. Disengagement can be conditional based on timeout, change of user, or environmental conditions. Engagement can be one-time or recurrent. Recurrent states can be attentive or locked. Locked states can be unconditional or conditional, including being reserved to support user continuity. User continuity can be tested by matching parameters or by tracking the user with many modalities, including microphone arrays, cameras, and other sensors.
    Type: Grant
    Filed: January 26, 2018
    Date of Patent: February 15, 2022
    Assignee: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Scott Halstvedt, Keyvan Mohajer
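The engagement states this abstract names (one-time engaged, recurrent attentive, locked, with timeout-based disengagement) map naturally onto a small state machine. The transition names and timeout handling below are illustrative assumptions, not the patented design:

```python
# Toy state machine for agent engagement: disengaged -> engaged (one-time),
# attentive (recurrent), or locked; timeouts disengage non-locked states.

DISENGAGED, ENGAGED, ATTENTIVE, LOCKED = "disengaged", "engaged", "attentive", "locked"

class EngagementAgent:
    def __init__(self, timeout_s: float = 30.0):
        self.state = DISENGAGED
        self.timeout_s = timeout_s
        self._idle = 0.0

    def wake(self, recurrent: bool = False, locked: bool = False):
        """Engage once, or enter a recurrent (attentive/locked) state."""
        self._idle = 0.0
        if locked:
            self.state = LOCKED
        elif recurrent:
            self.state = ATTENTIVE
        else:
            self.state = ENGAGED

    def tick(self, elapsed_s: float):
        """Conditional disengagement on timeout; locked states ignore it."""
        self._idle += elapsed_s
        if self.state in (ENGAGED, ATTENTIVE) and self._idle > self.timeout_s:
            self.state = DISENGAGED

    def release(self):
        """Explicit, user-driven disengagement."""
        self.state = DISENGAGED
```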
  • Patent number: 11238885
    Abstract: A computer-implemented technique for animating a visual representation of a face based on spoken words of a speaker is described herein. A computing device receives an audio sequence comprising content features reflective of spoken words uttered by a speaker. The computing device generates latent content variables and latent style variables based upon the audio sequence. The latent content variables are used to synchronize movement of lips on the visual representation to the spoken words uttered by the speaker. The latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words and are used to synchronize movement of full facial features of the visual representation to the spoken words uttered by the speaker. The computing device causes the visual representation of the face to be animated on a display based upon the latent content variables and the latent style variables.
    Type: Grant
    Filed: October 29, 2018
    Date of Patent: February 1, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Gaurav Mittal, Baoyuan Wang
  • Patent number: 11232530
    Abstract: Provided is an inspection assistance device which includes: a first acquisition unit that acquires first data which is image data used to acquire results of inspection on a to-be-inspected object; a second acquisition unit that acquires second data that is different in type from the first data and is used to acquire results of inspection on the to-be-inspected object; and a display control unit that causes a display unit to display a result of comparison between first inspection result information pertaining to the inspection results based on the acquired first data and second inspection result information pertaining to the inspection results based on the acquired second data in such a manner as to be superimposed on an image in which the inspected object is displayed.
    Type: Grant
    Filed: February 28, 2017
    Date of Patent: January 25, 2022
    Assignee: NEC CORPORATION
    Inventors: Takami Sato, Kota Iwamoto, Yoshinori Saida, Shin Norieda
  • Patent number: 11232789
    Abstract: The present invention keeps a dialogue continuing for a long time without causing an uncomfortable feeling for the user. A dialogue system 10 includes at least an input part 1 that receives a user utterance, which is an utterance from a user, and a presentation part 5 that presents utterances. The input part 1 receives a user utterance performed by the user. A presentation part 5-1 presents a dialogue-establishing utterance which does not include any content words. A presentation part 5-2 presents, after the dialogue-establishing utterance, a second utterance associated with a generation target utterance, which is one or more preceding utterances that include at least the user utterance.
    Type: Grant
    Filed: May 19, 2017
    Date of Patent: January 25, 2022
    Assignees: NIPPON TELEGRAPH AND TELEPHONE CORPORATION, OSAKA UNIVERSITY
    Inventors: Hiroaki Sugiyama, Toyomi Meguro, Junji Yamato, Yuichiro Yoshikawa, Hiroshi Ishiguro, Takamasa Iio, Tsunehiro Arimoto
  • Patent number: 11222650
    Abstract: A device and a method for generating a synchronous corpus are disclosed. First, script data and a dysarthria voice signal having a dysarthria consonant signal are received, and the position of the dysarthria consonant signal is detected, wherein the script data have text corresponding to the dysarthria voice signal. Then, normal phoneme data corresponding to the text are searched and the text is converted into a normal voice signal based on the normal phoneme data corresponding to the text. The dysarthria consonant signal is replaced with the normal consonant signal based on the positions of the normal consonant signal and the dysarthria consonant signal, thereby synchronously converting the dysarthria voice signal into a synthesized voice signal. The synthesized voice signal and the dysarthria voice signal are provided to train a voice conversion model, retaining the timbre of the dysarthric voice and improving communication.
    Type: Grant
    Filed: March 18, 2020
    Date of Patent: January 11, 2022
    Assignee: NATIONAL CHUNG CHENG UNIVERSITY
    Inventors: Tay Jyi Lin, Ching Wei Yeh, Shun Pu Yang, Chen Zong Liao
  • Patent number: 11222622
    Abstract: Generally discussed herein are devices, systems, and methods for custom wake word selection assistance. A method can include receiving, at a device, data indicating a custom wake word provided by a user, determining one or more characteristics of the custom wake word, determining that use of the custom wake word will cause more than a threshold rate of false detections based on the characteristics, rejecting the custom wake word as the wake word for accessing a personal assistant in response to determining that use of the custom wake word will cause more than a threshold rate of false detections, and setting the custom wake word as the wake word in response to determining that use of the custom wake word will not cause more than the threshold rate of false detections.
    Type: Grant
    Filed: July 25, 2019
    Date of Patent: January 11, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Emilian Stoimenov, Khuram Shahid, Guoli Ye, Hosam Adel Khalil, Yifan Gong
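The accept/reject decision in this abstract (estimate a false-detection rate from wake-word characteristics, then compare against a threshold) can be caricatured with a crude scoring function. The heuristic below (length and vowel count as a syllable proxy) is entirely invented for illustration; the patent determines characteristics quite differently:

```python
# Invented heuristic: shorter wake words with fewer syllables are assumed
# to trigger false detections more often, so they score a higher rate.

VOWELS = set("aeiouy")

def estimate_false_rate(wake_word: str) -> float:
    """Return a rough false-detection rate estimate in [0, 1]."""
    word = wake_word.lower().strip()
    syllables = max(1, sum(1 for ch in word if ch in VOWELS))
    score = 1.0 / (len(word) * syllables)
    return min(1.0, score * 10)

def accept_wake_word(wake_word: str, max_rate: float = 0.2) -> bool:
    """Accept the custom wake word only if its estimated rate is tolerable."""
    return estimate_false_rate(wake_word) <= max_rate
```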
  • Patent number: 11216901
    Abstract: A system, a method, and computer-readable media for opportunistically authenticating a taxpayer. Specifically, embodiments of the invention leverage the fact that the user has possession of particular documents or access to certain information as evidence that the user is the person referred to in those documents or information. If the user provides sufficient evidence to authenticate themselves while providing the information required for the financial transaction, no separate authentication step may be required. At a high level, documents or other data imported in a first context (e.g., during the process of preparing a tax return for a user) are used as evidence of the user's authenticity in a second context.
    Type: Grant
    Filed: December 19, 2014
    Date of Patent: January 4, 2022
    Assignee: HRB Innovations, Inc.
    Inventor: Eric Roebuck
  • Patent number: 11211056
    Abstract: Systems and techniques for generating natural language understanding (NLU) models are described. A developer of an NLU model may provide data representing runtime NLU functionality. For example, a developer may provide one or more sample natural language user inputs. The NLU model generation system may expand data, provided by the developer, to result in a more robust NLU model for use at runtime. For example, the NLU model generation system may expand sample natural language user inputs, may translate sample natural language user inputs into other languages, etc. The present disclosure also provides a mechanism for transitioning between using NLU models of a first NLU model generation system and NLU models of a second NLU model generation system.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: December 28, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Anthony Bissell, Pragati Verma
  • Patent number: 11212244
    Abstract: A method for using an in-message application. The method includes: receiving a broadcast message; identifying, in the broadcast message, a reference to an external data provider; obtaining an identifier of the in-message application from the external data provider; using the identifier to identify a set of components of the in-message application, where placement of the set of components is defined by a visual structure of the in-message application, and where each of the set of components is a user interface (UI) element; associating data obtained from the external data provider with a component of the set of components; and serving the broadcast message and the data to a consumer client, where the consumer client renders the in-message application based on the visual structure.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: December 28, 2021
    Assignee: Twitter, Inc.
    Inventors: William Morgan, Jeremy Gordon, Grant Monroe, Buster Benson, Russell D'Sa, Adam Singer, Ian Chan, Brian Ellin, Reeve Thompson, Lucas Alonso-Martinez
  • Patent number: 11206182
    Abstract: A computing system including a processor; and a memory communicatively coupled to the processor. The processor is configured to: analyze input received through an input interface of a computing device; determine a context based on the input; and reconfigure the input interface to comprise a key based on a domain associated with the context.
    Type: Grant
    Filed: October 19, 2010
    Date of Patent: December 21, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Feng-wei Chen, Joseph B. Hall, Samuel R. McHan, Jr.
  • Patent number: 11205057
    Abstract: A cognitive communication assistant receives a message transmitted over a communication network from a sender to a recipient. A sender's industry identified with the sender and a recipient's industry identified with the recipient are determined. One or more terms associated with the sender's industry are extracted from the message. A definition associated with the one or more terms is searched for in an on-line reference text. The message is updated based on the definition. The message is transmitted over the communication network to the recipient.
    Type: Grant
    Filed: December 20, 2019
    Date of Patent: December 21, 2021
    Assignee: International Business Machines Corporation
    Inventors: Tara Astigarraga, Itzhack Goldberg, Jose R. Mosqueda Mejia, Daniel J. Winarski
  • Patent number: 11195516
    Abstract: A system that allows non-engineer administrators, without programming, machine language, or artificial intelligence system knowledge, to expand the capabilities of a dialogue system. The dialogue system may have a knowledge system, user interface, and learning model. A user interface allows non-engineers to utilize the knowledge system, defined by a small set of primitives and a simple language, to annotate a user utterance. The annotation may include selecting actions to take based on the utterance and subsequent actions and configuring associations. A dialogue state is continuously updated and provided to the user as the actions and associations take place. Rules are generated based on the actions, associations, and dialogue state that allow for computing a wide range of results.
    Type: Grant
    Filed: February 26, 2020
    Date of Patent: December 7, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Percy Shuo Liang, David Leo Wright Hall, Joshua James Clausman
  • Patent number: 11195511
    Abstract: Described herein is a method for creating object-based audio content from a text input for use in audio books and/or audio play, the method including the steps of: a) receiving the text input; b) performing a semantic analysis of the received text input; c) synthesizing speech and effects based on one or more results of the semantic analysis to generate one or more audio objects; d) generating metadata for the one or more audio objects; and e) creating the object-based audio content including the one or more audio objects and the metadata. Described herein are further a computer-based system including one or more processors configured to perform said method and a computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
    Type: Grant
    Filed: July 17, 2019
    Date of Patent: December 7, 2021
    Assignees: Dolby Laboratories Licensing Corporation, Dolby International AB
    Inventors: Toni Hirvonen, Daniel Arteaga, Eduard Aylon Pla, Alex Cabrer Manning, Lie Lu, Karl Jonas Roeden
  • Patent number: 11195513
    Abstract: A technique for estimating phonemes for a word written in a different language is disclosed. A sequence of graphemes of a given word in a source language is received. The sequence of the graphemes in the source language is converted into a sequence of phonemes in the source language. One or more sequences of phonemes in a target language are generated from the sequence of the phonemes in the source language by using a neural network model. One sequence of phonemes in the target language is determined for the given word. Also, a technique for estimating graphemes of a word from phonemes in a different language is disclosed.
    Type: Grant
    Filed: September 27, 2017
    Date of Patent: December 7, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gakuto Kurata, Toru Nagano, Yuta Tsuboi
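The two-stage pipeline in this abstract (graphemes to source-language phonemes, then source phonemes to target-language phonemes) can be illustrated with lookup tables. The patent uses a neural network for the second stage; a table stands in for it here, and every mapping below is a toy example:

```python
# Stage 1: source-language grapheme-to-phoneme rules (toy English subset).
G2P_SOURCE = {"sh": "SH", "ee": "IY", "t": "T"}

# Stage 2: source-phoneme -> target-phoneme mapping (toy stand-in for the
# neural model; target phonemes are an invented Japanese-like inventory).
P2P_TARGET = {"SH": "sh", "IY": "ii", "T": "to"}

def graphemes_to_source_phonemes(word: str) -> list:
    """Greedy longest-match conversion of graphemes to source phonemes."""
    phonemes, i = [], 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in G2P_SOURCE:
                phonemes.append(G2P_SOURCE[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {word[i]!r}")
    return phonemes

def to_target_phonemes(word: str) -> list:
    """Chain both stages: graphemes -> source phonemes -> target phonemes."""
    return [P2P_TARGET[p] for p in graphemes_to_source_phonemes(word)]
```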
  • Patent number: 11190855
    Abstract: A system and method are provided for generating a descriptive video service track for a video asset. Different scenes and/or scene transitions are detected in a predetermined version of the video asset via automated media analysis. Gaps in dialogue are detected in the at least one scene via automated media analysis. Objects appearing in the at least one scene are recognized via automated media analysis, and text descriptive of at least one of the objects appearing in the at least one scene is automatically generated. An audio file of the text descriptive of the at least one of the objects appearing in the at least one scene of the predetermined version of the video asset is generated and used as part of a descriptive video service track for the video asset.
    Type: Grant
    Filed: August 30, 2017
    Date of Patent: November 30, 2021
    Assignee: ARRIS Enterprises LLC
    Inventor: Michael R. Kahn
  • Patent number: 11184492
    Abstract: An apparatus includes: an operation panel that displays an operation screen and accepts an operation from a user; a hardware processor that accepts an operation from a user by voice, turns off the operation panel in a case where an interval between operations received by the operation panel exceeds a first set time, and resets a setting content stored in the storage in a case where an interval between operations received by the operation panel or the hardware processor exceeds a second set time; a storage that stores a setting content corresponding to an operation received by the operation panel or the hardware processor; and a changer that changes a set time of a timer.
    Type: Grant
    Filed: April 27, 2020
    Date of Patent: November 23, 2021
    Assignee: KONICA MINOLTA, INC.
    Inventor: Takeo Katsuda
  • Patent number: 11183170
    Abstract: The present technology relates to an interaction control apparatus and a method that enable more appropriate interaction control to be performed. The interaction control apparatus includes an interaction progress controller that causes an utterance to be made in one or a plurality of understanding action request positions on the basis of utterance text that has been divided in the one or the plurality of understanding action request positions, the utterance inducing a user to perform an understanding action, and that controls a next utterance on the basis of a result of detecting the understanding action and the utterance text. The present technology is applicable to a speech interaction system.
    Type: Grant
    Filed: August 3, 2017
    Date of Patent: November 23, 2021
    Assignee: SONY CORPORATION
    Inventors: Hiro Iwase, Mari Saito, Shinichi Kawano
  • Patent number: 11183169
    Abstract: A technique to enhance the quality of Text-to-Speech (TTS) based Singing Voice generation is disclosed. The present invention efficiently preserves the speaker identity and improves sound quality by incorporating speaker-independent natural singing information into TTS-based Speech-to-Singing (STS). The Template-based Text-to-Singing (TTTS) system merges qualities of a singing voice generated from a TTS system with qualities of a singing voice generated from an actual voice singing the song. The qualities are represented in terms of Mel-generalized cepstrum (MGC) coefficients. In particular, low-order MGC coefficients from the TTS-based singing voice are combined with high-order MGC coefficients from the voice of an actual singer.
    Type: Grant
    Filed: November 8, 2019
    Date of Patent: November 23, 2021
    Assignee: OBEN, INC.
    Inventors: Kantapon Kaewtip, Fernando Villavicencio
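The coefficient-merging idea above (low-order MGC coefficients from the TTS singing voice, high-order coefficients from a real singer) amounts to splicing two vectors per frame. The split index and frame sizes below are illustrative assumptions:

```python
# Sketch: per-frame merge of Mel-generalized cepstrum coefficient vectors.

def merge_mgc(tts_frame: list, singer_frame: list, split: int = 4) -> list:
    """Keep tts_frame[:split] (low order, speaker identity) and
    singer_frame[split:] (high order, natural singing detail)."""
    if len(tts_frame) != len(singer_frame):
        raise ValueError("frames must have the same MGC order")
    return tts_frame[:split] + singer_frame[split:]

def merge_sequence(tts_seq, singer_seq, split: int = 4):
    """Apply the merge frame by frame across two aligned sequences."""
    return [merge_mgc(a, b, split) for a, b in zip(tts_seq, singer_seq)]
```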
  • Patent number: 11170757
    Abstract: Systems and methods for sending text messages in audio form over voice calls. When a user receives an incoming voice call, the system can enable a user to type a “text” message to the caller. Rather than being sent as a text message, however, the system can send the text message directly to the microphone of the user's equipment (UE) as a voice-synthesized audio file, or text-to-microphone (TTM) message. The audio file is then sent from the user's UE to the caller's UE, in effect “reading” the text message to the caller. The caller hears the contents of the message, in the form of a voice-synthesized audio file, over the speaker of the caller's UE. The system can mute the microphones on one or both UEs during the TTM process to create a virtually silent process from the user's standpoint.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: November 9, 2021
    Assignee: T-Mobile USA, Inc.
    Inventor: Hsin-Fu Henry Chiang
  • Patent number: 11170585
    Abstract: A system and method of performing fault diagnosis and analysis for one or more vehicles. The method includes: obtaining design failure mode and effect analysis (DFMEA) data that specifies a plurality of failure modes; receiving diagnostic association data; receiving vehicle operation signals association data; generating augmented DFMEA data that indicates a causal relationship between the diagnostic data and the first set of failure modes, and that indicates a causal relationship between the vehicle operation signals data and the second set of failure modes, wherein the augmented DFMEA data is generated based on the DFMEA data, the diagnostic association data, and the vehicle operation signals association data; and performing fault diagnosis and analysis for the one or more vehicles using the augmented DFMEA data.
    Type: Grant
    Filed: June 17, 2019
    Date of Patent: November 9, 2021
    Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Chaitanya Sankavaram, Dnyanesh G. Rajpathak, Azeem Sarwar, Xiangxing Lu, Dean G. Sorrell, Layne K. Wiggins
  • Patent number: 11170755
    Abstract: The present disclosure relates to a speech synthesis apparatus and method that can remove discontinuity between phoneme units when generating a synthesized sound from the phoneme units, thereby implementing natural utterances and producing a high-quality synthesized sound having stable prosody.
    Type: Grant
    Filed: April 30, 2020
    Date of Patent: November 9, 2021
    Assignee: SK TELECOM CO., LTD.
    Inventors: Changheon Lee, Jongjin Kim, Jihoon Park
  • Patent number: 11170051
    Abstract: An apparatus generates property data with a first context relation set between text display areas contained in a display image of a first webpage, and generates, based on the property data, dialog control data with a second context relation set between pieces of text extracted from structural elements of text display areas contained in a second webpage.
    Type: Grant
    Filed: August 28, 2018
    Date of Patent: November 9, 2021
    Assignee: FUJITSU LIMITED
    Inventors: Takumi Baba, Takashi Imai, Kei Taira, Miwa Okabayashi, Tatsuro Matsumoto
  • Patent number: 11159261
    Abstract: A server system accesses a profile of a user of the media-providing service. The profile indicates a demographic group of the user. For each track of a plurality of tracks, the server system determines a year associated with the track. The server system selects content for the user based at least in part on an affinity of members of the demographic group, as compared to members of other demographic groups, for music from the year associated with the track. The server system provides the selected content to a client device associated with the user.
    Type: Grant
    Filed: June 3, 2020
    Date of Patent: October 26, 2021
    Assignee: Spotify AB
    Inventors: Clay Gibson, Santiago Gil, Ian Anderson, Oguz Semerci, Scott Wolf, Margreth Mpossi
  • Patent number: 11157075
    Abstract: Systems and methods for providing gaze-activated voice services for interactive workspaces. In some embodiments, an Information Handling System (IHS), may include a processor and a memory coupled to the processor, the memory having program instructions stored thereon that, upon execution, cause the IHS to: transmit a voice command to a voice service provider; receive a textual instruction in response to the voice command; identify a gaze focus of the user; and execute the textual instruction using the gaze focus.
    Type: Grant
    Filed: May 1, 2018
    Date of Patent: October 26, 2021
    Assignee: Dell Products, L.P.
    Inventors: Tyler Ryan Cox, Todd Erick Swierk, Marc Randall Hammons
  • Patent number: 11145289
    Abstract: A system and method for providing an audible explanation of documents upon request is disclosed. The system and method use an intelligent voice assistant that can receive audible requests for document explanations. The intelligent voice assistant can retrieve document summary information and provide an audible response explaining key points of the document.
    Type: Grant
    Filed: May 7, 2019
    Date of Patent: October 12, 2021
    Assignee: United Services Automobile Association (USAA)
    Inventors: Richard Daniel Graham, Ruthie D. Lyle
  • Patent number: 11145288
    Abstract: A computing system and related techniques for selecting content to be automatically converted to speech and provided as an audio signal are provided. A text-to-speech request associated with a first document can be received that includes data associated with a playback position of a selector associated with a text-to-speech interface overlaid on the first document. First content associated with the first document can be determined based at least in part on the playback position, the first content including content that is displayed in the user interface at the playback position. The first document can be analyzed to identify one or more structural features associated with the first content. Speech data can be generated based on the first content and the one or more structural features.
    Type: Grant
    Filed: May 21, 2019
    Date of Patent: October 12, 2021
    Assignee: Google LLC
    Inventors: Benedict Davies, Guillaume Boniface, Jack Whyte, Jakub Adamek, Simon Tokumine, Alessio Macri, Matthias Quasthoff
  • Patent number: 11138965
    Abstract: A technique for estimating phonemes for a word written in a different language is disclosed. A sequence of graphemes of a given word in a source language is received. The sequence of the graphemes in the source language is converted into a sequence of phonemes in the source language. One or more sequences of phonemes in a target language are generated from the sequence of the phonemes in the source language by using a neural network model. One sequence of phonemes in the target language is determined for the given word. Also, a technique for estimating graphemes of a word from phonemes in a different language is disclosed.
    Type: Grant
    Filed: November 2, 2017
    Date of Patent: October 5, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gakuto Kurata, Toru Nagano, Yuta Tsuboi
  • Patent number: 11138963
    Abstract: A processor-implemented text-to-speech method includes determining, using a sub-encoder, a first feature vector indicating an utterance characteristic of a speaker from feature vectors of a plurality of frames extracted from a partial section of a first speech signal of the speaker, and determining, using an autoregressive decoder into which the first feature vector is input as an initial value, a second feature vector of a second speech signal in which a text is uttered according to the utterance characteristic, from context information of the text.
    Type: Grant
    Filed: May 7, 2019
    Date of Patent: October 5, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Hoshik Lee
  • Patent number: 11128591
    Abstract: In one example, a trigger is obtained for a dynamic ideogram to dynamically interact with the electronic messaging environment. In response to the trigger, it is determined how the dynamic ideogram is to dynamically interact with the electronic messaging environment including performing an analysis of the electronic messaging environment. Based on the analysis of the electronic messaging environment, instructions to render the dynamic ideogram to dynamically interact with the electronic messaging environment are generated for a first user device configured to communicate with a second user device via the electronic messaging environment.
    Type: Grant
    Filed: August 27, 2020
    Date of Patent: September 21, 2021
    Assignee: CISCO TECHNOLOGY, INC.
    Inventors: Christopher Deering, Colin Olivier Louis Vidal, Jimmy Coyne
  • Patent number: 11106314
    Abstract: Visual images projected on a projection surface by a projector provide an interactive user interface having end user inputs detected by a detection device, such as a depth camera. The detection device monitors projected images initiated in response to user inputs to determine calibration deviations, such as by comparing the distance between where a user makes an input and where the input is projected. Calibration is performed to align the projected outputs and detected inputs. The calibration may include a coordinate system anchored by its origin to a physical reference point of the projection surface, such as a display mat or desktop edge.
    Type: Grant
    Filed: April 21, 2015
    Date of Patent: August 31, 2021
    Assignee: DELL PRODUCTS L.P.
    Inventors: Karthik Krishnakumar, Michiel Sebastiaan Emanuel Petrus Knoppert, Rocco Ancona, Abu S. Sanaullah, Mark R. Ligameri
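The calibration check described above — comparing where the user makes an input with where the corresponding image was projected — can be sketched as below. The translation-only correction model and the coordinate layout are assumptions; a real system would typically fit a full homography between camera and projector coordinates.

```python
# Sketch: estimate the average offset between detected touch points (from
# the depth camera) and the projected targets, then correct new inputs.

def calibration_offset(detected, projected):
    """Average (dx, dy) deviation between detected inputs and projected outputs."""
    n = len(detected)
    dx = sum(d[0] - p[0] for d, p in zip(detected, projected)) / n
    dy = sum(d[1] - p[1] for d, p in zip(detected, projected)) / n
    return dx, dy

def apply_calibration(point, offset):
    """Map a detected input back into the projector's coordinate system."""
    return point[0] - offset[0], point[1] - offset[1]

# Two sample input/target pairs, both off by (2, 2):
offset = calibration_offset([(12, 7), (22, 17)], [(10, 5), (20, 15)])
print(offset)  # (2.0, 2.0)
```

The origin of the corrected coordinate system would be anchored to a physical reference point (e.g. a display-mat corner), as the abstract notes.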
  • Patent number: 11107458
    Abstract: An example embodiment may involve receiving, from a client device, a selection of text-based articles from newsfeeds. The selection may specify that the text-based articles have been flagged for audible playout. The example embodiment may also involve, possibly in response to receiving the selection of the text-based articles, retrieving text-based articles from the newsfeeds. The example embodiment may also involve causing the text-based articles to be converted into audio files. The example embodiment may also involve receiving a request to stream the audio files to the client device or another device. The example embodiment may also involve causing the audio files to be streamed to the client device or the other device.
    Type: Grant
    Filed: December 30, 2019
    Date of Patent: August 31, 2021
    Assignee: Gracenote Digital Ventures, LLC
    Inventor: Venkatarama Anilkumar Panguluri
  • Patent number: 11108721
    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for communication using multiple media content items stored on both a sending device and a receiving device. In particular, in one or more embodiments, the disclosed systems receive an application package. The application matches a portion of the input text to an audio content item using mapping data and generates a message including the input text and an identifier for the audio content item. A receiving system receives an application package. The application receives the message, locates the audio content item in the application package using the identifier, and presents the message, including the text and the audio content item.
    Type: Grant
    Filed: April 21, 2020
    Date of Patent: August 31, 2021
    Inventors: David Roberts, Glenn Sugden
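Because both devices ship with the same mapping data, only a short identifier needs to travel with the message. A minimal sketch of the sender/receiver flow, with hypothetical phrases and IDs:

```python
# Shared mapping data, assumed present in the application package on
# both the sending and receiving devices (contents are illustrative).
MAPPING = {"congratulations": "audio_017", "happy birthday": "audio_042"}

def build_message(text: str) -> dict:
    """Sender: attach the ID of the first audio item matching the input text."""
    lower = text.lower()
    for phrase, audio_id in MAPPING.items():
        if phrase in lower:
            return {"text": text, "audio_id": audio_id}
    return {"text": text, "audio_id": None}

def present_message(message: dict) -> str:
    """Receiver: resolve the identifier against the local copy of the package."""
    if message["audio_id"] is not None:
        return f"{message['text']} [plays {message['audio_id']}]"
    return message["text"]

msg = build_message("Happy birthday!")
print(present_message(msg))  # Happy birthday! [plays audio_042]
```

Shipping an identifier instead of the audio itself keeps messages small, at the cost of requiring both sides to hold the same content version.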
  • Patent number: 11107457
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
    Type: Grant
    Filed: November 26, 2019
    Date of Patent: August 31, 2021
    Assignee: Google LLC
    Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
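The interface described above (a character sequence in, a spectrogram of the utterance out) can be illustrated with a toy stand-in. A real system learns this mapping with a sequence-to-sequence recurrent network and predicts variable frame counts per character; the deterministic expansion below exists only to show the shapes involved.

```python
import numpy as np

N_MELS = 8           # mel channels per frame (illustrative)
FRAMES_PER_CHAR = 3  # a trained model would predict durations instead

def characters_to_spectrogram(text: str) -> np.ndarray:
    """Toy character -> spectrogram mapping; output shape (frames, N_MELS)."""
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0           # pseudo "embedding" of the char
        for t in range(FRAMES_PER_CHAR):
            frames.append(np.full(N_MELS, base) * (1.0 - 0.1 * t))
    return np.stack(frames)

spec = characters_to_spectrogram("hello")
print(spec.shape)  # (15, 8)
```

In the patented system the spectrogram is then inverted to a waveform by a separate component; only the text-to-spectrogram stage is sketched here.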
  • Patent number: 11093691
    Abstract: A system and method of establishing a communication session is disclosed herein. A computing system receives, from a client device, a content item comprising text-based content. The computing system generates a mark-up version of the content item by identifying one or more characters in the text-based content and a relative location of the one or more characters in the content item. The computing system receives, from the client device, an interrogatory related to the content item. The computing system analyzes the mark-up version of the content item to identify an answer to the interrogatory. The computing system generates a response message comprising the identified answer to the interrogatory. The computing system transmits the response message to the client device.
    Type: Grant
    Filed: February 14, 2020
    Date of Patent: August 17, 2021
    Assignee: Capital One Services, LLC
    Inventors: Michael Mossoba, Abdelkader M'Hamed Benkreira, Joshua Edwards
  • Patent number: 11087379
    Abstract: A user registers for an account with an account management system, configures account settings to permit the account management system to receive user computing device data from a user computing device associated with the user, and logs into the account via the user computing device. The account management system receives a user voice purchase command and determines a purchase command context based on the received user computing device data. The account management system identifies a product that the user desires to purchase based on the purchase command context and directs the user computing device web browser to a merchant website to set up a transaction for the identified product.
    Type: Grant
    Filed: February 12, 2015
    Date of Patent: August 10, 2021
    Assignee: GOOGLE LLC
    Inventors: Filip Verley, IV, Stuart Ross Hobbie
  • Patent number: 11087091
    Abstract: Disclosed herein is a method and response generation system for providing contextual responses to user interaction. In an embodiment, input data related to user interaction, which may be received from a plurality of input channels in real-time, may be processed using processing models corresponding to each of the input channels for extracting interaction parameters. Thereafter, the interaction parameters may be combined for computing a contextual variable, which in turn may be analyzed to determine a context of the user interaction. Finally, responses corresponding to the context of the user interaction may be generated and provided to the user for completing the user interaction. In some embodiments, the method of the present disclosure accurately detects the context of the user interaction and provides meaningful contextual responses to the user interaction.
    Type: Grant
    Filed: February 19, 2019
    Date of Patent: August 10, 2021
    Assignee: Wipro Limited
    Inventors: Gopichand Agnihotram, Rajesh Kumar, Pandurang Naik
  • Patent number: 11080474
    Abstract: Described herein is a system and method for associating audio files with one or more cells in a spreadsheet application. As described, one or more audio files may be associated with a single cell in a spreadsheet application, or may be associated with a range of cells in the spreadsheet application. Information about the audio file, such as playback properties and other parameters, may be retrieved from the audio file. Once retrieved, a calculation engine of the spreadsheet application may perform one or more calculations on the information in order to change the content of the audio file, the playback of the audio files, and so on.
    Type: Grant
    Filed: November 1, 2016
    Date of Patent: August 3, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Samuel C. Radakovitz, Christian M. Canton, Carlos A. Ortero, John Campbell, Allison Rutherford, Benjamin E. Rampson
  • Patent number: 11081111
    Abstract: Methods, systems, and related products that provide emotion-sensitive responses to a user's commands and other utterances received at an utterance-based user interface. Acknowledgements of the user's utterances are adapted to the user and/or the user device, and to emotions detected in the user's utterance, which are mapped from one or more emotion features extracted from the utterance. In some examples, the user's changing emotion, extracted during a sequence of interactions, is used to generate a response to the user's uttered command. In some examples, emotion processing and command processing of natural utterances are performed asynchronously.
    Type: Grant
    Filed: March 17, 2020
    Date of Patent: August 3, 2021
    Assignee: Spotify AB
    Inventors: Daniel Bromand, David Gustafsson, Richard Mitic, Sarah Mennicken
  • Patent number: 11082559
    Abstract: A virtual assistant server receives a web request, such as an HTTP request, with one or more call parameters corresponding to a call redirected from an interactive voice response server. The virtual assistant server inputs the received one or more call parameters to a predictive model, which identifies, based on the one or more call parameters, an intelligent communication mode to route the redirected call to. Subsequently, the virtual assistant server routes the redirected call to the intelligent communication mode.
    Type: Grant
    Filed: July 31, 2020
    Date of Patent: August 3, 2021
    Assignee: KORE.AI, INC.
    Inventors: Rajkumar Koneru, Prasanna Kumar Arikala Gunalan, Rajavardhan Nalluri
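The routing step above can be sketched as follows. The rule-based function stands in for the patent's predictive model, and the parameter names and communication modes are hypothetical.

```python
# Sketch: call parameters from the redirected IVR call are fed to a
# "predictive model" (a stand-in rule here) that selects a mode.

def predict_mode(call_params: dict) -> str:
    """Stand-in predictive model: choose an intelligent communication mode."""
    if call_params.get("intent") == "billing":
        return "voice_bot"
    if call_params.get("wait_time_s", 0) > 300:
        return "sms"
    return "chat"

def route_call(call_params: dict) -> str:
    """Route the redirected call to the predicted communication mode."""
    mode = predict_mode(call_params)
    return f"routed to {mode}"

print(route_call({"intent": "billing"}))  # routed to voice_bot
```

In the described system the parameters arrive as part of a web request (e.g. HTTP) rather than a local dictionary; the prediction-then-route structure is the same.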
  • Patent number: 11074904
    Abstract: A speech synthesis method and apparatus based on emotion information are disclosed. The method extracts speech synthesis target text from received data and determines whether the received data includes situation explanation information. First metadata corresponding to first emotion information is generated on the basis of the situation explanation information. When the received data does not include situation explanation information, second metadata corresponding to second emotion information, generated on the basis of semantic analysis and context analysis, is generated. One of the first metadata and the second metadata is added to the speech synthesis target text to synthesize speech corresponding to the received data.
    Type: Grant
    Filed: October 4, 2019
    Date of Patent: July 27, 2021
    Assignee: LG Electronics Inc.
    Inventors: Siyoung Yang, Minook Kim, Sangki Kim, Yongchul Park, Juyeong Jang, Sungmin Han
  • Patent number: 11074907
    Abstract: Techniques for generating a prompt coverage score, which measures an extent to which data output to a user during a dialog is repetitive and monotonous, are described. User input data and system output, corresponding to a dialog exchange between a user and a skill, may be determined. A portion of the system output data, corresponding to a system prompt representing default output data, may be determined. A first number, representing possible variants of the prompt, may be determined along with a second number, representing variants of the prompt output during the dialog exchange. A prompt coverage score may be determined based on the first and second numbers.
    Type: Grant
    Filed: May 29, 2019
    Date of Patent: July 27, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ravi Chikkanayakanahalli Mallikarjuniah, Priya Rao Chagaleti, Shiladitya Roy, Christopher Forbes Will, Cole Ira Brendel, Wei Huang, Sarthak Anand
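The score described above can be illustrated with a simple ratio: distinct prompt variants actually output during the dialog (the second number) over the possible variants of the prompt (the first number). The exact formula is an assumption; the abstract only says the score is "based on" the two numbers.

```python
# Sketch: a higher coverage score means output was less repetitive,
# because more of the available prompt variants were actually used.

def prompt_coverage(possible_variants: set, used_variants: list) -> float:
    """Ratio of distinct variants output to possible variants."""
    first = len(possible_variants)                          # possible variants
    second = len(set(used_variants) & possible_variants)    # variants output
    return second / first if first else 0.0

score = prompt_coverage(
    {"How can I help?", "What can I do for you?", "Need anything else?"},
    ["How can I help?", "How can I help?", "Need anything else?"],
)
print(round(score, 3))  # 0.667
```

Here two of three possible variants were used, so the dialog scores 2/3 despite one variant repeating.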
  • Patent number: 11068526
    Abstract: Methods, systems, and computer program products are provided for obtaining enhanced metadata for media content searches. In one embodiment, computer program logic embodies a metadata receiver and a media content metadata matcher and combiner. The metadata receiver receives program metadata for a plurality of programs from a plurality of metadata sources. The media content metadata matcher and combiner is configured to perform a matching process whereby metadata associated with each of the plurality of programs is compared to metadata of each of the other programs to determine whether the compared programs are the same program and, if so, to combine the metadata from each into a single program record with enhanced metadata and store it in a database. A subsequent search for a program corresponding to the stored program returns at least some of the metadata associated with the program, which enables accessing the program.
    Type: Grant
    Filed: January 25, 2019
    Date of Patent: July 20, 2021
    Assignee: Caavo Inc
    Inventors: Amrit P. Singh, Sravan K. Andavarapu, Jayanth Manklu, Anu Godara, Vinu Joseph, Vinod K. Gopinath, Ashish D. Aggarwal
  • Patent number: 11064000
    Abstract: Techniques and systems are described for accessible audio switching options during an online conference. For example, a conferencing system receives presentation content and audio content as part of the online conference from a client device. The conferencing system generates voice-over content from the presentation content by converting text of the presentation content to audio. The conferencing system then divides the presentation content into presentation segments. The conferencing system also divides the audio content into audio segments that correspond to respective presentation segments, and the voice-over content into voice-over segments that correspond to respective presentation segments. As the online conference is output, the conferencing system enables switching between a corresponding audio segment and voice-over segment during output of a respective presentation segment.
    Type: Grant
    Filed: November 29, 2017
    Date of Patent: July 13, 2021
    Assignee: Adobe Inc.
    Inventors: Ajay Jain, Sachin Soni, Amit Srivastava
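The per-segment alignment above can be sketched as three parallel tracks that share segment boundaries, with playback free to switch tracks at each boundary. The data layout is an assumption made for illustration.

```python
# Sketch: presentation segments paired with both the recorded audio and
# the generated voice-over, switchable per segment.

def build_segments(slides, audio_parts, voiceover_parts):
    """Pair each presentation segment with both audio options."""
    return [
        {"slide": s, "audio": a, "voiceover": v}
        for s, a, v in zip(slides, audio_parts, voiceover_parts)
    ]

def play(segments, choices):
    """choices[i] selects 'audio' or 'voiceover' for segment i."""
    return [seg[choice] for seg, choice in zip(segments, choices)]

segments = build_segments(["s1", "s2"], ["a1", "a2"], ["v1", "v2"])
print(play(segments, ["audio", "voiceover"]))  # ['a1', 'v2']
```

Keeping the three tracks segment-aligned is what makes mid-presentation switching seamless: the listener never loses their place in the slides.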
  • Patent number: 11062694
    Abstract: Systems and methods for generating output audio with emphasized portions are described. Spoken audio is obtained and undergoes speech processing (e.g., ASR and optionally NLU) to create text. It may be determined that the resulting text includes a portion that should be emphasized (e.g., an interjection) using at least one of knowledge of an application run on a device that captured the spoken audio, prosodic analysis, and/or linguistic analysis. The portion of text to be emphasized may be tagged (e.g., using a Speech Synthesis Markup Language (SSML) tag). TTS processing is then performed on the tagged text to create output audio including an emphasized portion corresponding to the tagged portion of the text.
    Type: Grant
    Filed: June 7, 2019
    Date of Patent: July 13, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Marco Nicolis, Adam Franciszek Nadolski
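The tagging step above — wrapping the portion of text to be emphasized in an SSML tag before TTS — can be sketched as follows. The interjection list used to pick the emphasized portion is a toy heuristic; the patent's approach also draws on application knowledge, prosodic analysis, and linguistic analysis.

```python
# Sketch: mark interjections with the SSML <emphasis> element so the
# TTS engine renders them with extra stress.

INTERJECTIONS = {"wow", "hooray", "ouch", "yay"}

def tag_emphasis(text: str) -> str:
    """Wrap interjections in <emphasis> tags inside a <speak> document."""
    words = []
    for word in text.split():
        if word.strip("!,.").lower() in INTERJECTIONS:
            words.append(f'<emphasis level="strong">{word}</emphasis>')
        else:
            words.append(word)
    return "<speak>" + " ".join(words) + "</speak>"

print(tag_emphasis("Wow, that was fast"))
# <speak><emphasis level="strong">Wow,</emphasis> that was fast</speak>
```

The resulting SSML document is then passed to TTS processing, which produces output audio with the tagged portion emphasized.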