Synthesis Patents (Class 704/258)
  • Patent number: 11487821
    Abstract: In some embodiments, methods and systems are provided for processing information requests of workers at a retail facility and retrieving information associated with the retail facility based on the information requests. An electronic device permits a worker at the retail facility to input an information request in association with at least one worker at the retail facility or at least one product at the retail facility. A computing device receives, from the electronic device, electronic data representative of a scope of the information request, analyzes this electronic data to determine the scope of the information request, obtains relevant information from one or more databases, and transmits the obtained information to the electronic device, which in turn outputs the information to the worker.
    Type: Grant
    Filed: April 30, 2020
    Date of Patent: November 1, 2022
    Assignee: Walmart Apollo, LLC
    Inventors: William Craig Robinson, Jr., Dong T. Nguyen, Mekhala M. Vithala, Makeshwaran Sampath, Spencer S. Seeger, Bahula Bosetti, Praneeth Gubbala, Songshan Li, Santosh Kumar Kurashetty, Srihari Attuluri, Venkata Maguluri, Lindsay S. Leftwich, George E. Loring
  • Patent number: 11490229
    Abstract: Various embodiments generally relate to systems and methods for creation of voice memos while an electronic device is in a driving mode. In some embodiments, a triggering event can be used to indicate that the electronic device is within a car or about to be within a car and that text communications should be translated (e.g., via an application or a conversion platform) into a voice memo that can be played via a speaker. These triggering events can include a manual selection or an automatic selection based on a set of transition criteria (e.g., electronic device moving above a certain speed, following a roadway, approaching a location in a map of a marked car, etc.).
    Type: Grant
    Filed: June 5, 2020
    Date of Patent: November 1, 2022
    Assignee: T-Mobile USA, Inc.
    Inventor: Niraj Nayak
  • Patent number: 11482207
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: October 25, 2022
    Assignee: Baidu USA LLC
    Inventors: Wei Ping, Kainan Peng, Jitong Chen
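The closed-form KL divergence this abstract mentions can be written out for univariate Gaussians; a minimal sketch in which the `eps` regularizer and the univariate form are illustrative assumptions, not the patent's formulation:

```python
import math

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q, eps=1e-8):
    """Closed-form KL(p || q) for two univariate Gaussians.

    The small eps regularizes the standard deviations so the value
    stays numerically stable for highly peaked (near-zero sigma)
    output distributions. This regularization scheme is an
    illustrative choice, not the patent's.
    """
    sp, sq = sigma_p + eps, sigma_q + eps
    return (math.log(sq / sp)
            + (sp ** 2 + (mu_p - mu_q) ** 2) / (2 * sq ** 2)
            - 0.5)
```

For identical distributions the divergence is zero, and it grows with the mean gap, which is what pushes a student model toward the teacher's peaked outputs during distillation.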
  • Patent number: 11478710
    Abstract: A computer functions as: an individual performance data receiver that receives a plurality of pieces of individual performance data from two or more performer user terminals connected via a network, each of the pieces of individual performance data being generated in accordance with a performance operation by a user on the corresponding performer user terminal and transmitted during a performance; a synthetic performance data generator that generates synthetic performance data by synthesizing pieces of individual performance data in an identical time slot, of the plurality of pieces of individual performance data received from the performer user terminals, during the performance, the synthetic performance data being used to reproduce sound in which individual performance contents on the two or more performer user terminals are mixed; and a synthetic performance data transmitter that transmits the synthetic performance data to at least one appreciator user terminal connected via the network during the performance.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: October 25, 2022
    Assignee: SQUARE ENIX CO., LTD.
    Inventors: Kiyotaka Akaza, Yosuke Hayashi, Kei Odagiri
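The time-slot synthesis step described in the abstract can be sketched as slot-wise mixing of per-performer sample streams; averaging is an illustrative mixing rule, as the patent does not prescribe one:

```python
def synthesize_performance(tracks):
    """Mix equal-length per-performer sample lists slot by slot.

    Samples sharing a time-slot index are averaged so the mixed
    signal stays in range.
    """
    n = len(tracks)
    return [sum(slot) / n for slot in zip(*tracks)]

# Two performers, three time slots (integer samples for clarity)
print(synthesize_performance([[2, 4, 0], [6, 0, 2]]))  # [4.0, 2.0, 1.0]
```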
  • Patent number: 11455992
    Abstract: According to an embodiment, an electronic device includes a user interface, a memory, and a processor. The memory is configured to store a main database including at least one input data or an auxiliary database including at least one category including pieces of response data. The processor is configured to receive a user input, using the user interface, and extract first information from the user input. The processor is also configured to identify whether a portion of the input data of the main database matches the first information. The processor is further configured to identify a category of the user input in response to identifying that the input data of the main database does not match the first information. Additionally, the processor is configured to identify the auxiliary database corresponding to the category, and provide first response data based on an event that the auxiliary database is identified.
    Type: Grant
    Filed: February 19, 2020
    Date of Patent: September 27, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Sunbeom Kwon, Sunok Kim, Hyelim Woo, Jeehun Ha
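The main-then-auxiliary database fallback described above can be sketched as follows; the `classify()` callback and the dictionary layout are hypothetical stand-ins for the device's category identification and databases:

```python
def provide_response(user_input, main_db, aux_dbs, classify):
    """Look up the main database first; on a miss, identify the
    input's category and answer from the matching auxiliary database.
    """
    first_info = user_input.strip().lower()   # extracted first information
    for stored_input, response in main_db.items():
        if first_info in stored_input:        # portion of input data matches
            return response
    category = classify(first_info)           # category of the user input
    if category in aux_dbs:
        return aux_dbs[category][0]           # first response data
    return None

main_db = {"weather in seoul": "It is sunny in Seoul."}
aux_dbs = {"greeting": ["Hello!", "Hi there."]}
print(provide_response("Good morning", main_db, aux_dbs,
                       lambda s: "greeting"))  # Hello!
```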
  • Patent number: 11443732
    Abstract: A speech synthesizer includes a memory configured to store a plurality of sentences and prior information of a word classified into a minor class among a plurality of classes with respect to each sentence, and a processor configured to determine an oversampling rate of the word based on the prior information, determine the number of times of oversampling of the word using the determined oversampling rate and generate sentences including the word by the determined number of times of oversampling. The plurality of classes includes a first class corresponding to first reading break, a second class corresponding to second reading break greater than the first break and a third class corresponding to third reading break greater than the second break, and the minor class has a smallest count among the first to third classes in one sentence.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: September 13, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Jonghoon Chae, Sungmin Han
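The abstract's "number of times of oversampling" for an underrepresented minor-class word can be sketched as a deficit against a target ratio; the `target_ratio` default and the difference formula are illustrative assumptions, not the patented computation:

```python
def oversampling_times(prior_ratio, corpus_size, target_ratio=0.1):
    """Extra sentences to generate for a word whose minor-class
    (rarest reading-break) occurrences are underrepresented.

    prior_ratio plays the role of the stored prior information.
    """
    current = int(prior_ratio * corpus_size)
    target = int(target_ratio * corpus_size)
    return max(target - current, 0)

print(oversampling_times(0.02, 1000))  # 80
```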
  • Patent number: 11437037
    Abstract: Aspects of the disclosure relate to using machine learning to simulate an interactive voice response system. A computing platform may receive user interaction information corresponding to interactions between a user and enterprise computing devices. Based on the user interaction information, the computing platform may identify predicted intents for the user, and may generate hotkey information based on the predicted intents. The computing platform may send the hotkey information and commands directing the mobile device to output the hotkey information. The computing platform may receive hotkey input information from the mobile device. Based on the hotkey input information, the computing platform may generate a hotkey response message. The computing platform may send, to the mobile device, the hotkey response message and commands directing the mobile device to convert the hotkey response message to an audio output and to output the audio output.
    Type: Grant
    Filed: July 20, 2020
    Date of Patent: September 6, 2022
    Assignee: Bank of America Corporation
    Inventors: Srinivas Dundigalla, Saurabh Mehta, Pavan K. Chayanam
  • Patent number: 11430423
    Abstract: A method for automatically translating raw data into real human voiced audio content is provided according to an embodiment of the present disclosure. The method may comprise ingesting data, separating the data into or associating the data with a data type, and creating a list of descriptive data associated with the data type. In some embodiments, the method further comprises compiling audio phrases types associated with the descriptive data, associating a pre-recorded audio file with each audio phrase, and merging a plurality of pre-recorded audio files to create a final audio file.
    Type: Grant
    Filed: April 17, 2019
    Date of Patent: August 30, 2022
    Assignee: Weatherology, LLC
    Inventor: Derek Christopher Heit
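The merge step above, resolving descriptive data to pre-recorded audio phrases before concatenation, can be sketched like this; the file names and descriptor keys are hypothetical examples:

```python
def build_audio_playlist(descriptors, phrase_files):
    """Resolve each descriptive datum to its pre-recorded phrase
    file, preserving order, ready to be merged into one final
    audio file.
    """
    return [phrase_files[d] for d in descriptors if d in phrase_files]

phrases = {
    "high_of": "phrases/high_of.wav",
    "72": "numbers/72.wav",
    "degrees": "phrases/degrees.wav",
}
print(build_audio_playlist(["high_of", "72", "degrees"], phrases))
```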
  • Patent number: 11430427
    Abstract: This application provides a method and an electronic device for separating a mixed sound signal. The method includes: obtaining a first hidden variable representing a human voice feature and a second hidden variable representing an accompaniment sound feature by inputting feature data of a mixed sound extracted from a mixed sound signal into a coding model for the mixed sound; obtaining first feature data of a human voice and second feature data of an accompaniment sound by inputting the first hidden variable and the second hidden variable into a first decoding model for the human voice and a second decoding model for the accompaniment sound respectively; and obtaining, based on the first feature data and the second feature data, the human voice and the accompaniment sound.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: August 30, 2022
    Assignee: Beijing Dajia Internet Information Technology Co., Ltd.
    Inventors: Ning Zhang, Yan Li, Tao Jiang
  • Patent number: 11417345
    Abstract: An encoding apparatus performing encoding by an encoding process in which bits are preferentially assigned to a low side to obtain a spectrum code, the encoding apparatus judging whether a sound signal is a hissing sound or not, obtaining and encoding, if the encoding apparatus judges that the sound signal is a hissing sound, what is obtained by exchanging all or a part of a spectrum existing on a lower side than a predetermined frequency in a frequency spectrum sequence of the sound signal for all or a part of a spectrum existing on a higher side of the predetermined frequency in the frequency spectrum sequence, the number of all or the part of the high-side frequency spectrum sequence being the same as the number of all or the part of the low-side frequency spectrum sequence.
    Type: Grant
    Filed: December 3, 2018
    Date of Patent: August 16, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Ryosuke Sugiura, Yutaka Kamamoto, Takehiro Moriya
  • Patent number: 11417194
    Abstract: A hand washing supporting device according to embodiments is a hand washing supporting device including a processor, in which the processor is configured to perform a detection process that detects a motion of a hand by a user, a determination process that determines, upon the detection process detecting the motion of the hand by the user for a predetermined period of time, a start of hand washing by the user, and a notification process that notifies, in accordance with a determination of the start of the hand washing by the determination process, the user of a difference between time required from the start to an end of the hand washing by the user and the predetermined period of time.
    Type: Grant
    Filed: June 25, 2019
    Date of Patent: August 16, 2022
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Atsuhiko Maeda, Midori Kodama, Motonori Nakamura, Ippei Shake
  • Patent number: 11399229
    Abstract: Methods, systems, computer-readable media, and apparatuses for audio signal processing are presented. Some configurations include determining that first audio activity in at least one microphone signal is voice activity; determining whether the voice activity is voice activity of a participant in an application session active on a device; based at least on a result of the determining whether the voice activity is voice activity of a participant in the application session, generating an antinoise signal to cancel the first audio activity; and by a loudspeaker, producing an acoustic signal that is based on the antinoise signal. Applications relating to shared virtual spaces are described.
    Type: Grant
    Filed: July 9, 2020
    Date of Patent: July 26, 2022
    Assignee: QUALCOMM Incorporated
    Inventors: Robert Tartz, Scott Beith, Mehrad Tavakoli, Gerhard Reitmayr
  • Patent number: 11368799
    Abstract: Methods and systems for customizing a hearing device. The disclosed methods involve receiving an audio sample associated with a target entity, calculating at least one acoustic parameter from the audio sample, generating an audio stimulus using the at least one calculated acoustic parameter, presenting the audio stimulus to a user, receiving a response to the audio stimulus, and adjusting the hearing device based on an optimal parameter.
    Type: Grant
    Filed: February 4, 2021
    Date of Patent: June 21, 2022
    Assignee: Securboration, Inc.
    Inventor: Lee Krause
  • Patent number: 11335326
    Abstract: A method is performed at a server system of a media-providing service. The server system has one or more processors and memory storing instructions for execution by the one or more processors. The method includes receiving a text sentence including a plurality of words from a device of a first user and extracting a plurality of audio snippets from one or more audio tracks. A respective audio snippet in the plurality of audio snippets corresponds to one or more words in the plurality of words of the text sentence. The method also includes assembling the plurality of audio snippets in a first order to produce an audible version of the text sentence. The method further includes providing, for playback at the device of the first user, the audible version of the text sentence including the plurality of audio snippets in the first order.
    Type: Grant
    Filed: May 14, 2020
    Date of Patent: May 17, 2022
    Assignee: Spotify AB
    Inventors: Anders Erik Jonatan Liljestrand, Bo Andreas Romin
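The snippet-assembly step in the abstract can be sketched as mapping each word of the text sentence to a snippet while keeping the sentence's word order (the "first order"); the track-offset snippet identifiers are hypothetical:

```python
def assemble_audible_sentence(sentence, snippet_index):
    """Map each word of the text sentence to an extracted audio
    snippet, preserving the order in which the words appear.
    """
    return [snippet_index[w] for w in sentence.lower().split()
            if w in snippet_index]

index = {"never": "track12@3.1s", "mind": "track7@0.4s"}
print(assemble_audible_sentence("Never mind", index))
```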
  • Patent number: 11335325
    Abstract: An electronic device and a controlling method of the electronic device are provided. The electronic device acquires text to respond on a received user's speech, acquires a plurality of pieces of parameter information for determining a style of an output speech corresponding to the text based on information on a type of a plurality of text-to-speech (TTS) databases and the received user's speech, identifies a TTS database corresponding to the plurality of pieces of parameter information among the plurality of TTS databases, identifies a weight set corresponding to the plurality of pieces of parameter information among a plurality of weight sets acquired through a trained artificial intelligence model, adjusts information on the output speech stored in the TTS database based on the weight set, synthesizes the output speech based on the adjusted information on the output speech, and outputs the output speech corresponding to the text.
    Type: Grant
    Filed: January 22, 2020
    Date of Patent: May 17, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hosang Sung, Seonho Hwang, Doohwa Hong, Eunmi Oh, Kyoungbo Min, Jonghoon Jeong, Kihyun Choo
  • Patent number: 11330320
    Abstract: Provided are a display device and a screen display method of the display device. In detail, disclosed are a display device capable of controlling a screen thereof through voice recognition, and a screen display method of the display device. Some of the disclosed embodiments provide a display device for displaying, on the screen, a recommendation guide corresponding to the voice recognition result, and a screen display method of the display device.
    Type: Grant
    Filed: June 28, 2017
    Date of Patent: May 10, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Ji-won Yoo
  • Patent number: 11323800
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for analyzing ultrasonic signals to recognize mouthed articulations. One of the methods includes generating an ultrasonic carrier signal and coupling the ultrasonic carrier signal to a person's vocal tract. The method includes detecting a modulated ultrasonic signal, the modulated ultrasonic signal corresponding to the ultrasonic carrier signal modulated by the person's vocal tract to include information about articulations mouthed by the person; analyzing, using a data processing apparatus, the modulated ultrasonic signal to recognize the articulations mouthed by the person from the information in the modulated ultrasonic signal; and generating, using the data processing apparatus, an output in response to the recognized articulations.
    Type: Grant
    Filed: April 7, 2020
    Date of Patent: May 3, 2022
    Assignee: X Development LLC
    Inventor: Thomas Peter Hunt
  • Patent number: 11301645
    Abstract: A language translation assembly includes a housing that is wearable on a user's ear. A control circuit is positioned within the housing and the control circuit stores a language translation program. A retainer is coupled to the housing and the retainer is positionable around the user's ear for retaining the housing on the user's ear. A microphone is coupled to the housing to sense audible sounds. A speaker is coupled to the housing to emit words translated into the native language of the user when the microphone senses spoken words in a non-native language with respect to the user. The operational software selects an appropriate response in the user's native language from the language database. Additionally, the speaker emits the appropriate response to instruct the user to speak in the non-native language.
    Type: Grant
    Filed: March 3, 2020
    Date of Patent: April 12, 2022
    Inventor: Aziza Foster
  • Patent number: 11302361
    Abstract: An apparatus for video searching, includes a memory storing instructions, and a processor configured to execute the instructions to split a video into scenes, obtain, from the scenes into which the video is split, one or more textual descriptors describing each of the scenes, encode the obtained one or more textual descriptors describing each of the scenes into a video scene vector of each of the scenes, encode a user query into a query vector having a same semantic representation as that of the video scene vector of each of the scenes into which the one or more textual descriptors describing each of the scenes are encoded, and identify whether the video scene vector of at least one among the scenes corresponds to the query vector into which the user query is encoded.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: April 12, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Caleb Phillips, Iqbal Mohomed, Afsaneh Fazly, Allan Jepson
  • Patent number: 11301466
    Abstract: A non-transitory computer-readable recording medium records an output control program for causing a computer to execute processing of: in a case where input of a question is accepted, extracting an accuracy of each of one or a plurality of answers to the question, the accuracy being stored in a storage unit; and selecting an answer to be output from the one or plurality of answers so that a total value of the accuracy of the selected answers to the question is equal to or larger than a first threshold.
    Type: Grant
    Filed: June 15, 2020
    Date of Patent: April 12, 2022
    Assignee: FUJITSU LIMITED
    Inventors: Yu Tomita, Masahiro Koya, Taki Kono, Hiroyuki Kashiwagi
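The selection rule in the abstract, picking answers until their total accuracy meets the first threshold, can be sketched as follows; the greedy highest-accuracy-first order is an illustrative choice, since the abstract only requires the total to reach the threshold:

```python
def select_answers(answers, threshold):
    """Pick (text, accuracy) answers, highest stored accuracy first,
    until the total accuracy reaches the threshold.
    """
    picked, total = [], 0.0
    for text, accuracy in sorted(answers, key=lambda a: -a[1]):
        if total >= threshold:
            break
        picked.append(text)
        total += accuracy
    return picked

answers = [("Answer A", 0.5), ("Answer B", 0.3), ("Answer C", 0.4)]
print(select_answers(answers, 0.8))  # ['Answer A', 'Answer C']
```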
  • Patent number: 11282497
    Abstract: Embodiments are disclosed for a method for dynamic text reading. The method includes performing pre-processing for a text document. Pre-processing includes determining the text document comprises an emotional statement based on an indicator of an emotion associated with the emotional statement. Pre-processing also includes identifying a speaker of the emotional statement. Further, pre-processing includes generating a role-to-voice map that associates the speaker with a digital representation of a voice for the speaker. The method additionally includes generating, based on the pre-processing, the voice for the speaker reading aloud a text of the text document using the digital representation of the voice with a tonal modulation that conveys the emotion.
    Type: Grant
    Filed: November 12, 2019
    Date of Patent: March 22, 2022
    Assignee: International Business Machines Corporation
    Inventors: Der-Joung Wang, David Shao Chung Chen, An-Ting Tsai, Peng Chen, Chao Yuan Huang
  • Patent number: 11257480
    Abstract: A method, a computer readable medium, and a computer system are provided for singing voice conversion. Data corresponding to a singing voice is received. One or more features and pitch data are extracted from the received data using one or more adversarial neural networks. One or more audio samples are generated based on the extracted pitch data and the one or more features.
    Type: Grant
    Filed: March 3, 2020
    Date of Patent: February 22, 2022
    Assignee: TENCENT AMERICA LLC
    Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
  • Patent number: 11258818
    Abstract: Methods and systems for generating stateful attacks for simulating and testing security infrastructure readiness. Attack templates descriptive of a plurality of attacks to be executed against one or more targets are defined. The attack templates are processed to compile a decision tree by traversing through a list of attack templates to create a logical tree with tree branches representing different execution paths through which attacks may be executed against the targets. During attack simulations and/or testing, single and/or multi-stage attacks are executed against targets, wherein attack sequences are dynamically determined using the execution paths in the decision tree in view of real-time results. The attacks may be executed against various types of targets, including targets in existing security infrastructures and simulated targets. Moreover, the attacks may originate from computer systems within security infrastructures or remotely using computer systems external to the security infrastructures.
    Type: Grant
    Filed: January 31, 2019
    Date of Patent: February 22, 2022
    Assignee: IronSDN Corp.
    Inventor: Vimal Vaidya
  • Patent number: 11252523
    Abstract: A multi-channel decorrelator for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals is configured to premix a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K<N. The multi-channel decorrelator is configured to provide a first set of K′ decorrelator output signals on the basis of the second set of K decorrelator input signals. The multi-channel decorrelator is further configured to upmix the first set of K′ decorrelator output signals into a second set of N′ decorrelator output signals, wherein N′>K′. The multi-channel decorrelator can be used in a multi-channel audio decoder. A multi-channel audio encoder provides complexity control information for the multi-channel decorrelator.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: February 15, 2022
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Sascha Disch, Harald Fuchs, Oliver Hellmuth, Juergen Herre, Adrian Murtaza, Jouni Paulus, Falko Ridderbusch, Leon Terentiv
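The premix/upmix structure in the abstract (N inputs down to K, then K′ outputs back up to N′) can be sketched with simple channel grouping; the round-robin grouping and reuse rule are illustrative choices, since the patent derives them from the encoder's complexity control information:

```python
def premix(signals, k):
    """Premix N input signals into K < N signals by summing
    round-robin groups of channels."""
    groups = [signals[i::k] for i in range(k)]
    return [[sum(s) for s in zip(*g)] for g in groups]

def upmix(decorrelated, n):
    """Upmix K' decorrelated signals to N' > K' channels by reusing
    each group's output for all of that group's member channels."""
    k = len(decorrelated)
    return [decorrelated[i % k] for i in range(n)]

four = [[1, 1], [2, 2], [3, 3], [4, 4]]  # N=4 signals, 2 samples each
two = premix(four, 2)                    # K=2 premixed signals
print(two)             # [[4, 4], [6, 6]]
print(upmix(two, 4))   # back to N'=4 channels
```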
  • Patent number: 11238843
    Abstract: Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to original speaker—even with very few cloning audios.
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: February 1, 2022
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
  • Patent number: 11227578
    Abstract: A speech synthesizer using artificial intelligence includes a memory configured to store a first ratio of a word classified into a minor class among a plurality of classes and a synthesized speech model, and a processor configured to determine a class classification probability set of the word using the word, the first ratio and the synthesized speech model. The first ratio indicates a ratio in which the word is classified into the minor class within a plurality of characters, the plurality of classes includes a first class corresponding to first reading break, a second class corresponding to second reading break greater than the first break and a third class corresponding to third reading break greater than the second break, and the minor class has a smallest count among the first to third classes.
    Type: Grant
    Filed: May 15, 2019
    Date of Patent: January 18, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Jonghoon Chae, Sungmin Han
  • Patent number: 11210470
    Abstract: Methods and systems are provided for identifying subparts of a text. A neural network system can receive a set of sentences that includes context sentences and target sentences that indicate a decision point in a text. The neural network system can generate context sentence vectors and target sentence vectors by encoding context from the set of sentences. These context sentence vectors can be weighted to focus on relevant information. The weighted context sentence vectors and the target sentence vectors can then be used to output a label for the decision point in the text.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: December 28, 2021
    Assignee: Adobe Inc.
    Inventors: Seokhwan Kim, Walter W. Chang, Nedim Lipka, Franck Dernoncourt, Chan Young Park
  • Patent number: 11205439
    Abstract: A method includes obtaining first audio data corresponding to speech occurring within a communication area. The first audio data is obtained from one or more interior locations inside the communication area. The method includes obtaining second audio data corresponding to the speech. The second audio data is obtained from one or more exterior locations outside of the communication area. The method includes calculating a first intelligibility based on the first audio data and calculating a second intelligibility based on the second audio data. The method includes comparing the first intelligibility to the second intelligibility, and determining, based on the comparing, that the second intelligibility exceeds a threshold. The method includes generating a set of countermeasures in response to the determining. The set of countermeasures includes at least one modification to a parameter of the speech. The method includes providing at least one countermeasure of the set of countermeasures.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: December 21, 2021
    Assignee: International Business Machines Corporation
    Inventors: Gianluca Gargaro, Matteo Rogante, Angela Ghidoni, Sara Moggi
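The comparison-and-countermeasure flow in the abstract can be sketched with mean intelligibility scores per location; the 0-to-1 scores, the threshold, and the countermeasure names are illustrative assumptions:

```python
def plan_countermeasures(inside_scores, outside_scores, threshold=0.5):
    """Compare speech intelligibility measured inside vs. outside a
    communication area and return countermeasures when speech leaking
    outside exceeds the threshold.
    """
    inside = sum(inside_scores) / len(inside_scores)    # first intelligibility
    outside = sum(outside_scores) / len(outside_scores)  # second intelligibility
    if outside <= threshold:
        return []                         # leak is unintelligible enough
    actions = ["reduce speech volume"]    # modifies a parameter of the speech
    if outside > 0.8 * inside:            # leak nearly as clear as inside
        actions.append("mask with ambient noise")
    return actions

print(plan_countermeasures([0.9, 0.8], [0.7, 0.6]))  # ['reduce speech volume']
```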
  • Patent number: 11195508
    Abstract: An information processing device according to embodiments includes a communication unit configured to receive audio data of content and text data corresponding to the audio data, an audio data reproduction unit configured to perform reproduction of the audio data, a text data reproduction unit configured to perform the reproduction by audio synthesis of the text data, and a controller that controls the reproduction of the audio data or the text data. The controller causes the text data reproduction unit to perform the reproduction of the text data when the audio data reproduction unit is unable to perform the reproduction of the audio data.
    Type: Grant
    Filed: September 11, 2019
    Date of Patent: December 7, 2021
    Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventor: Jun Tsukamoto
  • Patent number: 11176092
    Abstract: There is provided a database management system (DBMS) in order to make anonymization processing of the database efficient. When receiving a query including a conversion rule, the database management system is configured to process a relationship table in the database based on the conversion rule. At that time, the DBMS is configured to acquire data from a processing result table (result of processing the relationship table) stored in the past for tuples the number of which for each value appearing in a predetermined attribute satisfies a condition required for the anonymization processing. On the other hand, for tuples the number of which for each value appearing in a predetermined attribute does not satisfy the condition required for the anonymization processing, the DBMS is configured to acquire data from the database or from a newly computed result of processing the relationship table, rather than from the processing result table.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: November 16, 2021
    Assignees: HITACHI, LTD., THE UNIVERSITY OF TOKYO
    Inventors: Yuya Isoda, Kazuhiko Mogi, Kouji Kimura, Kazuo Goda, Yuto Hayamizu, Masaru Kitsuregawa
  • Patent number: 11170754
    Abstract: [Problem] To allow a user to clearly grasp the source of information conveyed with sound. [Solution] There is provided an information processor that includes an output control unit that controls output of an information notification using sound, the output control unit causing, on the basis of a recognized external sound source, the information notification to be output in an output mode that is not similar to an external sound that can be emitted by the external sound source. There is also provided an information processing method that includes controlling, by a processor, output of an information notification using sound, the controlling further including causing, on the basis of a recognized external sound source, the information notification to be output in an output mode that is not similar to an external sound that can be emitted by the external sound source.
    Type: Grant
    Filed: April 23, 2018
    Date of Patent: November 9, 2021
    Assignee: SONY CORPORATION
    Inventors: Ayumi Nakagawa, Takanobu Omata, Soichiro Inatani
  • Patent number: 11151334
    Abstract: In at least one broad aspect, described herein are systems and methods in which a latent representation shared between two languages is built and/or accessed, and then leveraged for the purpose of text generation in both languages. Neural text generation techniques are applied to facilitate text generation, and in particular the generation of sentences (i.e., sequences of words or subwords) in both languages, in at least some embodiments.
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: October 19, 2021
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Mehdi Rezagholizadeh, Md Akmal Haidar, Alan Do-Omri, Ahmad Rashid
  • Patent number: 11145100
    Abstract: Novel tools and techniques are provided for implementing three-dimensional facial modeling and visual speech synthesis. In various embodiments, a computing system might determine an orientation, size, and location of a face in a received input image; retrieve a three-dimensional model template comprising a face and head; project the input image onto the model template to generate a three-dimensional model; define, on the model, a polygon mesh in a region of a facial feature corresponding to a feature in the input image; adjust parameters on the model; and display the model. The computing system might parse a text string into allophonic units; encode each allophonic unit into a point(s) in linguistic space corresponding to mouth movements; retrieve, from a codebook, indexed images/morphs corresponding to encoded points in the linguistic space; render the indexed images/morphs into an animation of the three-dimensional model; synchronize, for output, the animation with audio representations of the text string.
    Type: Grant
    Filed: January 12, 2018
    Date of Patent: October 12, 2021
    Assignee: The Regents of the University of Colorado, a body corporate
    Inventors: Sarel van Vuuren, Nattawut Ngampatipatpong, Robert N. Bowen
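The text-to-animation lookup this abstract describes (allophonic units encoded as points in a linguistic space, then matched against a codebook of morphs) can be sketched roughly as follows. The encoding table, the 2-D space, and the codebook contents are invented placeholders, not values from the patent:

```python
# Toy codebook mapping points in a 2-D "linguistic space" to morph assets.
CODEBOOK = {
    (0.1, 0.2): "mouth_open.morph",
    (0.8, 0.3): "mouth_round.morph",
}

def encode_allophone(allophone: str) -> tuple[float, float]:
    """Map an allophonic unit to a point in a toy 2-D linguistic space."""
    return {"AA": (0.1, 0.2), "OW": (0.8, 0.3)}.get(allophone, (0.0, 0.0))

def morphs_for_text(allophones: list[str]) -> list[str]:
    """Retrieve the indexed morph for each encoded point, skipping
    points that have no codebook entry."""
    return [CODEBOOK[p] for a in allophones
            if (p := encode_allophone(a)) in CODEBOOK]
```

A real system would use a learned encoder and nearest-point retrieval rather than exact key lookup.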
  • Patent number: 11114091
    Abstract: A method of processing audio communications over a network, comprising: at a first client device: receiving a first audio transmission from a second client device that is provided in a source language distinct from a default language associated with the first client device; obtaining current user language attributes for the first client device that are indicative of a current language used for the communication session at the first client device; if the current user language attributes suggest a target language currently used for the communication session at the first client device is distinct from the default language associated with the first client device: obtaining a translation of the first audio transmission from the source language into the target language; and presenting the translation of the first audio transmission in the target language to a user at the first client device.
    Type: Grant
    Filed: October 10, 2019
    Date of Patent: September 7, 2021
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Fei Xiong, Jinghui Shi, Lei Chen, Min Ren, Feixiang Peng
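The per-transmission language decision in this abstract reduces to a small branch: pick the session's current language as the target when it differs from the device default, and translate only when the source differs from that target. A minimal sketch, with an invented stand-in for the translation service:

```python
def translate(text: str, source: str, target: str) -> str:
    """Stand-in for a real translation service call."""
    return f"[{source}->{target}] {text}"

def present_transmission(text: str, source_lang: str,
                         default_lang: str, current_lang: str) -> str:
    """Return the text to present at the first client device.

    If the language currently used for the session differs from the
    device's default, translate into that current (target) language;
    otherwise the default language is the target.
    """
    target = current_lang if current_lang != default_lang else default_lang
    if source_lang == target:
        return text  # already in the target language, present as-is
    return translate(text, source_lang, target)
```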
  • Patent number: 11087728
    Abstract: Systems, devices, media, and methods are presented for playing audio sounds, such as music, on a portable electronic device using a digital color image of a note matrix on a map. A computer vision engine, in an example implementation, includes a mapping module, a color detection module, and a music playback module. The camera captures a color image of the map, including a marker and a note matrix. Based on the color image, the computer vision engine detects a token color value associated with each field. Each token color value is associated with a sound sample from a specific musical instrument. A global state map is stored in memory, including the token color value and location of each field in the note matrix. The music playback module, for each column, in order, plays the notes associated with one or more of the rows, using the corresponding sound sample, according to the global state map.
    Type: Grant
    Filed: December 21, 2019
    Date of Patent: August 10, 2021
    Assignee: Snap Inc.
    Inventors: Ilteris Canberk, Donald Giovannini, Sana Park
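The column-wise playback step can be sketched from the abstract's description: a global state map records a token color per (row, column) field, each color maps to an instrument sample, and columns are traversed in order. The color-to-sample table and data layout below are illustrative assumptions:

```python
# Hypothetical color-to-sample mapping; real systems would load audio assets.
SAMPLE_FOR_COLOR = {
    "red": "drum.wav",
    "blue": "piano.wav",
    "green": "guitar.wav",
}

def playback_order(state_map: dict[tuple[int, int], str]) -> list[list[str]]:
    """For each column, in order, collect the sound samples for the rows
    whose fields hold a recognized token color value."""
    if not state_map:
        return []
    n_cols = max(col for _, col in state_map) + 1
    columns = []
    for col in range(n_cols):
        notes = [SAMPLE_FOR_COLOR[color]
                 for (row, c), color in sorted(state_map.items())
                 if c == col and color in SAMPLE_FOR_COLOR]
        columns.append(notes)
    return columns
```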
  • Patent number: 11069335
    Abstract: Aspects of the disclosure are related to synthesizing speech or other audio based on input data. Additionally, aspects of the disclosure are related to using one or more recurrent neural networks. For example, a computing device may receive text input; may determine features based on the text input; may provide the features as input to a recurrent neural network; may determine embedded data from one or more activations of a hidden layer of the recurrent neural network; may determine speech data based on a speech unit search that attempts to select, from a database, speech units based on the embedded data; and may generate speech output based on the speech data.
    Type: Grant
    Filed: July 12, 2017
    Date of Patent: July 20, 2021
    Assignee: Cerence Operating Company
    Inventors: Vincent Pollet, Enrico Zovato
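The embedding-driven unit search above can be approximated as a nearest-neighbor lookup: the query vector stands in for the RNN's hidden-layer activations, and the database is a toy in-memory table (both are illustrative, not from the patent):

```python
import math

# Toy database of speech units and their stored embeddings.
UNIT_DB = {
    "u1": [0.0, 1.0],
    "u2": [1.0, 0.0],
    "u3": [0.9, 0.1],
}

def nearest_unit(embedding: list[float], db: dict = UNIT_DB) -> str:
    """Select the speech unit whose stored embedding is closest
    (Euclidean distance) to the query embedding."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(db, key=lambda k: dist(db[k], embedding))
```

Production systems would use an approximate nearest-neighbor index rather than a linear scan.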
  • Patent number: 11069339
    Abstract: At least some embodiments described herein relate to computer-assisted conversation. The set of available conversation segments is updated by addressing conversation segments at the granularity of a conversation segment or a group of conversation segments. For instance, an entire class of conversation segments may be addressed to add, delete, turn on, or turn off, the class of conversation segments. Groups of classes of conversation segments may also be similarly addressed. Thus, as the scope of a conversation changes, the available set of conversation segments may likewise change with fine-grained control. Accordingly, rather than pre-plan every set of possible conversations, the context and direction of the conversation may be evaluated by code to thereby determine what new sets of conversation segments should be added, deleted, turned on, or turned off. New conversation segments may even be generated dynamically, taking into account the values of parameters that then exist.
    Type: Grant
    Filed: January 21, 2020
    Date of Patent: July 20, 2021
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Angshuman Sarkar, John Anthony Taylor, Henrik Frystyk Nielsen
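Addressing segments at class granularity, as described above, amounts to a registry keyed by class with per-class enable flags. A minimal sketch (class names and the API are invented for illustration):

```python
class SegmentRegistry:
    """Toy registry of conversation segments, addressable by class."""

    def __init__(self):
        self._segments = {}   # segment id -> (class name, text)
        self._enabled = {}    # class name -> bool

    def add(self, seg_id: str, cls: str, text: str) -> None:
        self._segments[seg_id] = (cls, text)
        self._enabled.setdefault(cls, True)

    def turn_off(self, cls: str) -> None:
        self._enabled[cls] = False

    def turn_on(self, cls: str) -> None:
        self._enabled[cls] = True

    def delete_class(self, cls: str) -> None:
        self._segments = {k: v for k, v in self._segments.items() if v[0] != cls}
        self._enabled.pop(cls, None)

    def available(self) -> list[str]:
        """Segment ids whose class is currently turned on."""
        return [k for k, (cls, _) in self._segments.items()
                if self._enabled.get(cls, False)]
```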
  • Patent number: 11056110
    Abstract: An operation method of a dialog agent includes obtaining an utterance history including at least one outgoing utterance to be transmitted to request a service or at least one incoming utterance to be received in requesting the service, updating a requirement specification including items requested for the service based on the utterance history, generating utterance information to be used to request the service based on the updated requirement specification, and outputting the generated utterance information.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: July 6, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Young-Seok Kim, Jeong-Hoon Park, Seongmin Ok, Je Hun Jeon, Jun Hwi Choi
  • Patent number: 11017761
    Abstract: Presented herein are embodiments of a non-autoregressive sequence-to-sequence model that converts text to an audio representation. Embodiments are fully convolutional, and a tested embodiment obtained about a 46.7-times speed-up over a prior model at synthesis while maintaining comparable speech quality using a WaveNet vocoder. Interestingly, a tested embodiment also has fewer attention errors than the autoregressive model on challenging test sentences. In one or more embodiments, the first fully parallel neural text-to-speech system was built by applying the inverse autoregressive flow (IAF) as the parallel neural vocoder. System embodiments can synthesize speech from text through a single feed-forward pass. Also disclosed herein are embodiments of a novel approach to train the IAF from scratch as a generative model for raw waveform, which avoids the need for distillation from a separately trained WaveNet.
    Type: Grant
    Filed: October 16, 2019
    Date of Patent: May 25, 2021
    Assignee: Baidu USA LLC
    Inventors: Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
  • Patent number: 11003417
    Abstract: A speech recognition method and apparatus for performing speech recognition in response to an activation word determined based on a situation are provided. The speech recognition method and apparatus include an artificial intelligence (AI) system and its application, which simulates functions such as recognition and judgment of a human brain using a machine learning algorithm such as deep learning.
    Type: Grant
    Filed: October 13, 2017
    Date of Patent: May 11, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Sung-ja Choi, Eun-kyoung Kim, Ji-sang Yu, Ji-yeon Hong, Jong-youb Ryu, Jae-won Lee
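The situation-dependent activation word described above can be sketched as a simple lookup preceding recognition. The situations and activation words below are made up for illustration:

```python
# Hypothetical mapping from a detected situation to the wake word to listen for.
ACTIVATION_WORDS = {
    "driving": "navigator",
    "kitchen": "chef",
    "default": "assistant",
}

def activation_word(situation: str) -> str:
    """Return the activation word determined by the detected situation."""
    return ACTIVATION_WORDS.get(situation, ACTIVATION_WORDS["default"])

def should_wake(utterance: str, situation: str) -> bool:
    """Wake only when the utterance begins with the situation's activation word."""
    return utterance.strip().lower().startswith(activation_word(situation))
```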
  • Patent number: 10984036
    Abstract: A computing device is programmed to receive data collected from communications of a user. The computer identifies portions of the collected data including a keyword selected from a list of media content elements or lists of keywords associated with each of the media content elements. The computer associates each portion with a media content element. The computer further determines a score for each media content element based on at least one of the number of references, words included in the portion of collected data referring to the media content element, and the voice quality of the portion of collected data referring to the media content element. Based on the scores, the computer assigns media content elements to the user. The computer recommends media content items to the user based at least in part on the media content elements assigned to the user.
    Type: Grant
    Filed: May 3, 2016
    Date of Patent: April 20, 2021
    Assignee: DISH Technologies L.L.C.
    Inventors: Prakash Subramanian, Nicholas Brandon Newell
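The scoring step combines the three signals the abstract names (reference count, matched words, voice quality). A hedged sketch; the weights, threshold, and linear combination are placeholders, not values from the patent:

```python
def score_element(num_references: int, matched_words: int,
                  voice_quality: float,
                  w_ref: float = 1.0, w_words: float = 0.5,
                  w_voice: float = 2.0) -> float:
    """Combine the three per-element signals into one score."""
    return w_ref * num_references + w_words * matched_words + w_voice * voice_quality

def assign_elements(scores: dict[str, float], threshold: float = 3.0) -> list[str]:
    """Assign to the user the elements whose score clears a threshold."""
    return sorted(e for e, s in scores.items() if s >= threshold)
```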
  • Patent number: 10978069
    Abstract: Techniques for altering default language, in system outputs, with language included in system inputs are described. A system may determine that a word(s) in user inputs, associated with a particular user identifier, corresponds to but is not identical to a word(s) in system outputs. The system may store an association between the user identifier, the word(s) in the user inputs, and the word(s) in the system outputs. Thereafter, when the system generates a response to a user input, the system may replace the word(s), traditionally in the system outputs, with the word(s) that were present in previous user inputs. Such processing may further be tailored to a natural language intent.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: April 13, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Andrew Starr McCraw, Sheena Yang, Sampat Biswas, Ryan Summers, Michael Sean McPhillips
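The replacement step can be sketched as a per-user vocabulary substitution over the default output text. The plain-dict association store is an illustrative simplification:

```python
def personalize_output(response: str, user_vocab: dict[str, str]) -> str:
    """Replace each default output word with the user's own word when a
    stored association exists for this user identifier; otherwise keep
    the default word."""
    words = response.split()
    return " ".join(user_vocab.get(w.lower(), w) for w in words)
```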
  • Patent number: 10938976
    Abstract: Methods, computer program products, and systems are presented. The methods include, for instance: a voice delivery application, running on a mobile device of a user, receives a text message sent to the user; by use of sensor inputs of the mobile device, the mobile device stores data regarding the environment of the mobile device, including external audio equipment, the speed of the user, and bystanders within a hearing range of the environment; various data describing a sender of the text message and the bystanders are analyzed for respective relationships with the user and with each other to determine a confidentiality group dictating whether or not the text message may be heard by the bystanders; the text message may be scanned for content screening, and then, according to the configuration of the voice delivery application, the text message is securely delivered to the user by voice.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: March 2, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Darryl M. Adderly, Jonathan W. Jackson, Ajit Jariwala, Eric B. Libow
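The confidentiality-group check reduces to: play the message aloud only if every bystander in hearing range is permitted to hear it. A minimal sketch, with the group represented as a set of identifiers (an assumption for illustration):

```python
def may_play_aloud(bystanders: list[str],
                   confidentiality_group: set[str]) -> bool:
    """A message is played aloud only if every bystander within hearing
    range belongs to the confidentiality group for this message's sender."""
    return all(b in confidentiality_group for b in bystanders)
```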
  • Patent number: 10930302
    Abstract: Text can be presented with speech indicators generated by a cognitive system by processing the text. The speech indicators can indicate recommended speech characteristics to be exhibited by a user while the user generates spoken utterances representing the text. Data indicating at least one user input changing at least one of the speech indicators from a first state as originally presented to a second state can be received. In response, a value indicating a level of change made to the at least one of the speech indicators can be determined. At least one parameter used by the cognitive system to select the speech indicators can be modified or created based on the value indicating the level of change made to the at least one of the speech indicators.
    Type: Grant
    Filed: December 22, 2017
    Date of Patent: February 23, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ilse Breedvelt-Schouten, Sasa Matijevic
  • Patent number: 10923103
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting units for speech synthesis. One of the methods includes determining a sequence of text units that each represent a respective portion of text for speech synthesis; and determining multiple paths of speech units that each represent the sequence of text units by selecting a first speech unit that includes speech synthesis data representing a first text unit; selecting multiple second speech units including speech synthesis data representing a second text unit based on (i) a join cost to concatenate the second speech unit with a first speech unit and (ii) a target cost indicating a degree that the second speech unit corresponds to the second text unit; and defining paths from the selected first speech unit to each of the multiple second speech units to include in the multiple paths of speech units.
    Type: Grant
    Filed: November 28, 2017
    Date of Patent: February 16, 2021
    Assignee: Google LLC
    Inventor: Ioannis Agiomyrgiannakis
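The multi-path selection described above, with a target cost per candidate and a join cost between consecutive units, is essentially a small Viterbi search over the candidate lattice. A sketch under that reading; the candidate sets and cost tables in the test are toy values:

```python
def best_path(candidates, target_cost, join_cost):
    """Minimum-total-cost path through a lattice of speech-unit candidates.

    candidates: list (one entry per text unit) of lists of unit ids.
    target_cost(t, u): how well unit u matches text unit t.
    join_cost(p, u): cost of concatenating unit u after unit p.
    Returns the lowest-cost sequence of unit ids.
    """
    # best[u] = (accumulated cost, path) for each candidate of the current text unit
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}
    for t in range(1, len(candidates)):
        new_best = {}
        for u in candidates[t]:
            # choose the predecessor minimizing accumulated cost + join cost
            p = min(best, key=lambda q: best[q][0] + join_cost(q, u))
            cost = best[p][0] + join_cost(p, u) + target_cost(t, u)
            new_best[u] = (cost, best[p][1] + [u])
        best = new_best
    return min(best.values(), key=lambda cp: cp[0])[1]
```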
  • Patent number: 10901685
    Abstract: Embodiments are related to processing of one or more input audio feeds for generation of a target audio stream that includes at least one object of interest to a listener. In some embodiments, the target audio stream may exclusively or primarily include the sound of the object of interest to the listener, without including the sounds of other persons or objects. This allows a listener to focus on an object of his or her interest and not necessarily have to listen to the performances of other objects in the input audio feed. Some embodiments contemplate multiple audio feeds and/or multiple objects of interest.
    Type: Grant
    Filed: June 13, 2019
    Date of Patent: January 26, 2021
    Assignee: Sling Media Pvt. Ltd.
    Inventors: Yatish Jayant Naik Raikar, Mohammed Rasool, Trinadha Harish Babu Pallapothu
  • Patent number: 10904385
    Abstract: In at least some embodiments, a system includes a memory, and a processor configured to convert an audio stream of a speech of a customer during a customer call session into customer-originated text. The customer-originated text is displayed in a first chat interface. A request from a first call center agent is sent to a second call center agent via the first chat interface to interact with the customer during the customer call session and displayed in a second chat interface. The second agent is allowed to participate in the customer call session when the second call center agent accepts the request from the first call center agent. First agent-originated text and second agent-originated text during the customer call session are merged to form a combined agent-originated text and synthesized to computer-generated agent speech having a voice of a computer-generated agent based on the combined agent-originated text communicated to the customer over the voice channel.
    Type: Grant
    Filed: January 9, 2020
    Date of Patent: January 26, 2021
    Assignee: CAPITAL ONE SERVICES, LLC
    Inventors: Srikanth Reddy Sheshaiahgari, Jignesh Rangwala, Lee Adcock, Vamsi Kavuri, Muthukumaran Vembuli, Mehulkumar Jayantilal Garnara, Soumyajit Ray, Vincent Pham
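The merge step above can be sketched as interleaving the two agents' messages into one stream before synthesis. Representing each message as a (timestamp, text) pair is an assumption for illustration:

```python
def merge_agent_text(first_agent: list[tuple[int, str]],
                     second_agent: list[tuple[int, str]]) -> str:
    """Merge two agents' (timestamp, text) streams into the combined
    agent-originated text handed to the single computer-generated voice."""
    combined = sorted(first_agent + second_agent, key=lambda m: m[0])
    return " ".join(text for _, text in combined)
```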
  • Patent number: 10896669
    Abstract: Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.
    Type: Grant
    Filed: May 8, 2018
    Date of Patent: January 19, 2021
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
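The low-dimensional trainable speaker embedding amounts to a per-speaker vector, looked up by speaker id and fed to the synthesis network as conditioning. A toy sketch (random initialization and concatenation are illustrative; real models learn the table jointly with the network):

```python
import random

class SpeakerEmbedding:
    """Tiny stand-in for a trainable speaker embedding table."""

    def __init__(self, n_speakers: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                      for _ in range(n_speakers)]

    def lookup(self, speaker_id: int) -> list[float]:
        return self.table[speaker_id]

def condition(features: list[float], speaker_vec: list[float]) -> list[float]:
    """Concatenate the speaker embedding onto the frame features."""
    return features + speaker_vec
```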
  • Patent number: 10888232
    Abstract: Embodiments of the present disclosure are configured to assess the severity of a blockage in a vessel and, in particular, a stenosis in a blood vessel. In some particular embodiments, the devices, systems, and methods of the present disclosure are configured to assess the severity of a stenosis in the coronary arteries without the administration of a hyperemic agent. Further, in some implementations devices, systems, and methods of the present disclosure are configured to normalize and/or temporally align pressure measurements from two different pressure sensing instruments. Further still, in some instances devices, systems, and methods of the present disclosure are configured to exclude outlier cardiac cycles from calculations utilized to evaluate a vessel, including providing visual indication to a user that the cardiac cycles have been excluded.
    Type: Grant
    Filed: January 16, 2014
    Date of Patent: January 12, 2021
    Assignee: PHILIPS IMAGE GUIDED THERAPY CORPORATION
    Inventors: David Anderson, Howard David Alpert
  • Patent number: 10887268
    Abstract: Disclosed embodiments describe systems and methods for prioritizing messages for conversion from text to speech. A message manager can execute on a device. The message manager can identify a plurality of messages accessible via the device and can determine, for each message of the plurality of messages, a conversion score based on one or more parameters of each message. The conversion score can indicate a priority of each message to convert from text to speech. The message manager can identify a message of the plurality of messages for transmission to a text-to-speech converter for converting the message from text to speech. The message manager can also receive, from the text-to-speech converter, speech data of the message to play via an audio output of the device.
    Type: Grant
    Filed: August 19, 2019
    Date of Patent: January 5, 2021
    Assignee: Citrix Systems, Inc.
    Inventors: Thierry Duchastel, Marcos Alejandro Di Pietro
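The prioritization step above scores each message and sends the highest-scoring one to the text-to-speech converter first. The parameters and weights in this sketch are invented; the patent does not specify them:

```python
def conversion_score(msg: dict) -> float:
    """Higher score = higher priority for text-to-speech conversion.
    Uses hypothetical message parameters: unread flag, VIP sender, length."""
    score = 0.0
    if msg.get("unread"):
        score += 2.0
    if msg.get("from_vip"):
        score += 3.0
    score += min(len(msg.get("body", "")), 500) / 500.0  # longer, up to a cap
    return score

def next_message_for_tts(messages: list[dict]) -> dict:
    """Pick the message with the highest conversion score for conversion."""
    return max(messages, key=conversion_score)
```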