Image To Speech Patents (Class 704/260)
  • Patent number: 11580963
    Abstract: A speech generation method and apparatus are disclosed. The speech generation method includes obtaining, by a processor, a linguistic feature and a prosodic feature from an input text, determining, by the processor, a first candidate speech element through a cost calculation and a Viterbi search based on the linguistic feature and the prosodic feature, generating, at a speech element generator implemented at the processor, a second candidate speech element based on the linguistic feature or the prosodic feature and the first candidate speech element, and outputting, by the processor, an output speech by concatenating the second candidate speech element and a speech sequence determined through the Viterbi search.
    Type: Grant
    Filed: October 14, 2020
    Date of Patent: February 14, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jangsu Lee, Hoshik Lee, Jehun Jeon
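The cost-based unit selection described in the abstract above can be sketched as a Viterbi dynamic program over candidate speech units. The integer "units" and toy cost functions below are illustrative stand-ins, not the patent's actual linguistic or prosodic features:

```python
def viterbi_unit_selection(candidates, target_cost, concat_cost):
    """Pick one unit per position minimizing total target + concatenation cost.

    candidates: list of lists; candidates[i] are the unit options for position i.
    target_cost(unit): how well a unit matches the linguistic/prosodic target.
    concat_cost(prev, unit): how smoothly two adjacent units join.
    """
    # best[u] = (cost of the best path ending in unit u, that path)
    best = {u: (target_cost(u), [u]) for u in candidates[0]}
    for options in candidates[1:]:
        nxt = {}
        for u in options:
            cost, path = min(
                (c + concat_cost(p[-1], u), p) for c, p in best.values()
            )
            nxt[u] = (cost + target_cost(u), path + [u])
        best = nxt
    return min(best.values())[1]


# Toy example: units are integers; prefer values near 5, penalize big jumps.
seq = viterbi_unit_selection(
    [[1, 5], [4, 9], [5, 6]],
    target_cost=lambda u: abs(u - 5),
    concat_cost=lambda a, b: abs(a - b),
)
```

In the patent's terms, the selected sequence would then be refined by generating a second candidate element from the first before concatenation.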
  • Patent number: 11580581
    Abstract: A computer-aided design system enables physical articles to be customized via printing or embroidering and enables digital content to be customized and electronically shared. A user interface may be generated that includes an image of a model of an article of manufacture and user customizable design areas that are graphically indicated on the image corresponding to the model. A design area selection may be received. In response to an add design element instruction and design element specification, the specified design element is rendered in the selected design area on the model image. Customization permissions associated with the selected design area are accessed, and using the customization permissions, a first set of design element edit tools are selected and rendered. User edits to the design element may be received and rendered in real time. Manufacturing instructions may be transmitted to a printing system.
    Type: Grant
    Filed: June 8, 2021
    Date of Patent: February 14, 2023
    Assignee: Best Apps, LLC
    Inventor: Michael Bowen
  • Patent number: 11556782
    Abstract: In a trained attentive decoder of a trained Sequence-to-Sequence (seq2seq) Artificial Neural Network (ANN): obtaining an encoded input vector sequence; generating, using a trained primary attention mechanism of the trained attentive decoder, a primary attention vectors sequence; for each primary attention vector of the primary attention vectors sequence: (a) generating a set of attention vector candidates corresponding to the respective primary attention vector, (b) evaluating, for each attention vector candidate of the set of attention vector candidates, a structure fit measure that quantifies a similarity of the respective attention vector candidate to a desired attention vector structure, (c) generating, using a trained soft-selection ANN, a secondary attention vector based on said evaluation and on state variables of the trained attentive decoder; and generating, using the trained attentive decoder, an output sequence based on the encoded input vector sequence and the secondary attention vectors.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: January 17, 2023
    Assignee: International Business Machines Corporation
    Inventors: Vyacheslav Shechtman, Alexander Sorin
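The candidate-generation and structure-fit steps above can be sketched in miniature. The "desired attention structure" here is assumed to be a peaked, near one-hot vector, so the fit measure is simply the largest weight; both that measure and the perturbation-based candidate generation are illustrative simplifications:

```python
def structure_fit(candidate):
    """Score closeness to a 'peaked' attention structure: the largest weight
    (1.0 for a perfect one-hot vector). Illustrative stand-in measure."""
    return max(candidate)


def select_secondary_attention(primary, perturbations):
    """Generate candidates around a primary attention vector and keep the
    candidate with the best structure fit (renormalized to sum to 1)."""
    candidates = []
    for delta in perturbations:
        raw = [max(p + d, 0.0) for p, d in zip(primary, delta)]
        total = sum(raw) or 1.0
        candidates.append([r / total for r in raw])
    return max(candidates, key=structure_fit)


primary = [0.3, 0.4, 0.3]
perturbs = [[0.0, 0.0, 0.0], [-0.2, 0.4, -0.2], [0.2, -0.3, 0.1]]
best = select_secondary_attention(primary, perturbs)
```

The patent replaces this hard argmax with a trained soft-selection network conditioned on decoder state.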
  • Patent number: 11551568
    Abstract: Embodiments of a language learning system and method for implementing or assisting in self-study for improving listening fluency in a target language are disclosed. Such embodiments may simultaneously present the same piece of content in an auditory presentation and a corresponding visual presentation of a transcript of the auditory presentation, where the two presentations are adapted to work in tandem to increase the effectiveness of language learning for users.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: January 10, 2023
    Assignee: JIVEWORLD, SPC
    Inventor: Daniel Paul Raynaud
  • Patent number: 11545131
    Abstract: A reading order extrapolation and management system and process for facilitating auditory comprehension of electronic documents. As an example, a user may access contents of an electronic document via an application and request a speech-synthesized recitation of any media in the electronic document. The application may make use of a reading order that has been specifically generated and improved by reference to eye tracking data from users reading the document. A reading order can be assigned to a document and implemented when, for example, a screen reader is engaged for use with the document. Such systems can be of great benefit to users with visual impairments and/or distracted users seeking a meaningful audio presentation of textual content.
    Type: Grant
    Filed: July 16, 2019
    Date of Patent: January 3, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Tracy ThuyDuyen Tran, Emily Tran, Daniel Yancy Parish
  • Patent number: 11538454
    Abstract: Methods and systems are described herein for generating an audible presentation of a communication received from a remote server. A presentation of a media asset on a user equipment device is generated for a first user. A textual-based communication is received, at the user equipment device from the remote server. The textual-based communication is transmitted to the remote server by a second user and the remote server transmits the textual-based communication to the user equipment device responsive to determining that the second user is on a list of users associated with the first user. An engagement level of the first user with the user equipment device is determined. Responsive to determining that the engagement level does not exceed a threshold value, a presentation of the textual-based communication is generated in audible form.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: December 27, 2022
    Assignee: Rovi Product Corporation
    Inventor: William Korbecki
  • Patent number: 11531819
    Abstract: Machine learned models take in vectors representing desired behaviors and generate voice vectors that provide the parameters for text-to-speech (TTS) synthesis. Models may be trained on behavior vectors that include user profile attributes, situational attributes, or semantic attributes. Situational attributes may include age of people present, music that is playing, location, noise, and mood. Semantic attributes may include presence of proper nouns, number of modifiers, emotional charge, and domain of discourse. TTS voice parameters may apply per utterance and per word as to enable contrastive emphasis.
    Type: Grant
    Filed: January 14, 2020
    Date of Patent: December 20, 2022
    Assignee: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Monika Almudafar-Depeyrot
  • Patent number: 11531807
    Abstract: A method, computer program product, and computer system for encoding, by a computing device, a transcript and text macros into vector representations. A word by word report may be predicted based upon, at least in part, the encoding. An attention mechanism may be queried based upon, at least in part, a decoder state. An attention distribution may be produced over an encoder output. An interpolation of the encoder output may be produced based upon, at least in part, the attention distribution. The interpolation of the encoder output may be input into a decoder for report modeling that includes text macro location and content.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: December 20, 2022
    Assignee: Nuance Communications, Inc.
    Inventors: Paul Joseph Vozila, Joel Praveen Pinto, Frank Diehl
  • Patent number: 11521593
    Abstract: A method of embodying an online media service having a multiple voice system includes a first operation of collecting preset online articles and content from a specific media site and displaying the online articles and content on a screen of a personal terminal, a second operation of inputting a voice of a subscriber or setting a voice of a specific person among voices that are pre-stored in a database, a third operation of recognizing and classifying the online articles and content, a fourth operation of converting the classified online articles and content into speech, and a fifth operation of outputting the online articles and content using the voice of the subscriber or the specific person, which is set in the second operation.
    Type: Grant
    Filed: October 21, 2020
    Date of Patent: December 6, 2022
    Inventor: Jong Yup Lee
  • Patent number: 11514885
    Abstract: An automatic dubbing method is disclosed. The method comprises: extracting speeches of a voice from an audio portion of a media content (504); obtaining a voice print model for the extracted speeches of the voice (506); processing the extracted speeches by utilizing the voice print model to generate replacement speeches (508); and replacing the extracted speeches of the voice with the generated replacement speeches in the audio portion of the media content (510).
    Type: Grant
    Filed: November 21, 2016
    Date of Patent: November 29, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Henry Gabryjelski, Jian Luan, Dapeng Li
  • Patent number: 11501700
    Abstract: An image sticking compensation device according to example embodiments includes a degradation calculator configured to calculate a degradation weight based on input image data, and to calculate degradation data of a frame, an accumulator configured to accumulate the degradation data, and to generate age data using the accumulated degradation data, and a compensator configured to determine a grayscale compensation value corresponding to the age data and an input grayscale of the input image data, and to output age compensation data by applying the grayscale compensation value to the input image data.
    Type: Grant
    Filed: November 5, 2020
    Date of Patent: November 15, 2022
    Inventor: Sang-Myeon Han
  • Patent number: 11488588
    Abstract: In a control system including a printing apparatus and a server system, the server system includes a transmission unit that, if a voice instruction received by a voice control device is a query regarding the printing apparatus, transmits information concerning the printing apparatus without performing processing of content used for print processing, and a specification unit that, if the received voice instruction is a print instruction for printing the content and includes a print setting value corresponding to a first item but not a print setting value corresponding to a second item, specifies content corresponding to the print instruction, a print setting value corresponding to the first item, and a preset, predetermined print setting value for the second item. The printing apparatus includes a print control unit that performs print processing based on the content, the print setting value corresponding to the first item, and the specified predetermined print setting value.
    Type: Grant
    Filed: November 8, 2018
    Date of Patent: November 1, 2022
    Assignee: Canon Kabushiki Kaisha
    Inventor: Toshiki Shiga
  • Patent number: 11488576
    Abstract: Provided is an artificial intelligence (AI) apparatus for generating a speech having a content-based style, including: a memory configured to store a plurality of TTS (Text-To-Speech) engines; and a processor configured to: obtain image data or text data containing a text, extract at least one content keyword corresponding to the text, determine a speech style based on the extracted content keyword, generate a speech corresponding to the text by using a TTS engine corresponding to the determined speech style among the plurality of TTS engines, and output the generated speech.
    Type: Grant
    Filed: May 21, 2019
    Date of Patent: November 1, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Jisoo Park, Jonghoon Chae
  • Patent number: 11481510
    Abstract: One embodiment provides a method, including: receiving, at an audio capture device associated with an information handling device, command input from a user; providing, to the user and responsive to receiving the command input, a confirmation query, wherein the confirmation query is formed utilizing context data associated with an authorized user; determining, using a processor, whether a response to the confirmation query provided by the user matches a predetermined answer; and performing, responsive to determining that the response matches the predetermined answer, a function corresponding to the command input. Other aspects are described and claimed.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: October 25, 2022
    Assignee: Lenovo (Singapore) Pte. Ltd.
    Inventors: Robert James Norton, Jr., Robert James Kapinos, Russell Speight VanBlon, Scott Wentao Li
  • Patent number: 11475878
    Abstract: An electronic device for providing a text-to-speech (TTS) service and an operating method therefor are provided. The operating method of the electronic device includes obtaining target voice data based on an utterance input of a specific speaker, determining a number of learning steps of the target voice data, based on data features including a data amount of the target voice data, generating a target model by training, for the determined number of learning steps, a pre-trained model that converts text into an audio signal, using the target voice data as training data, generating output data obtained by converting input text into an audio signal, by using the generated target model, and outputting the generated output data.
    Type: Grant
    Filed: October 27, 2020
    Date of Patent: October 18, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Kyoungbo Min, Seungdo Choi, Doohwa Hong
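The step of determining a number of learning steps from the amount of target voice data could be realized by a simple clamped heuristic. The constants and the linear scaling below are illustrative assumptions, not values from the patent:

```python
def num_learning_steps(num_seconds, steps_per_second=100,
                       min_steps=1_000, max_steps=50_000):
    """Heuristic: scale fine-tuning steps with the amount of target speech,
    clamped so tiny datasets still train and large ones do not blow up the
    schedule. All constants are illustrative, not from the patent."""
    return max(min_steps, min(max_steps, num_seconds * steps_per_second))
```

With this sketch, 5 seconds of speech falls back to the floor of 1,000 steps, while 5 minutes yields a proportional 30,000 steps.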
  • Patent number: 11468879
    Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A first set of spectra is generated based on the sequence of text components. A second set of spectra is generated based on the first set of spectra and the respective temporal durations of the sequence of text components. A spectrogram frame is generated based on the second set of spectra. An audio waveform is generated based on the spectrogram frame. The audio waveform is provided as an output.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: October 11, 2022
    Assignee: TENCENT AMERICA LLC
    Inventors: Chengzhu Yu, Heng Lu, Dong Yu
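The duration-driven pipeline above (phonemes, predicted durations, then frame-level spectra) can be sketched as follows. The lookup-table "duration model" and single-value spectra are hypothetical stand-ins for the trained models:

```python
def predict_durations(phonemes, duration_table):
    """Stand-in duration model: look up frames-per-phoneme in a table."""
    return [duration_table.get(p, 1) for p in phonemes]


def expand_to_frames(spectra, durations):
    """Repeat each phoneme-level spectrum for its predicted number of frames,
    producing the frame-level sequence a vocoder would turn into audio."""
    frames = []
    for spec, dur in zip(spectra, durations):
        frames.extend([spec] * dur)
    return frames


phonemes = ["h", "ai"]
durs = predict_durations(phonemes, {"h": 2, "ai": 3})
frames = expand_to_frames([[0.1], [0.9]], durs)
```

In the patent, a second network refines these expanded spectra before the spectrogram frames are vocoded into a waveform.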
  • Patent number: 11468878
    Abstract: Disclosed is speech synthesis in a noisy environment. According to an embodiment of the disclosure, a method of speech synthesis may generate a Lombard effect-applied synthesized speech using a feature vector generated from an utterance feature. According to the disclosure, the speech synthesis method and device may be related to artificial intelligence (AI) modules, unmanned aerial vehicles (UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, and 5G service-related devices.
    Type: Grant
    Filed: September 23, 2020
    Date of Patent: October 11, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Minook Kim, Yongchul Park, Sungmin Han, Siyoung Yang, Sangki Kim, Juyeong Jang
  • Patent number: 11468242
    Abstract: A computer evaluates free-form text messages among members of a team, using natural language processing techniques to process the text messages and to assess psychological state of the team members as reflected in the text messages. The computer assembles the psychological state as reflected in the messages to evaluate team collective psychological state. The computer reports a trend of team collective psychological state in natural language text form.
    Type: Grant
    Filed: August 21, 2020
    Date of Patent: October 11, 2022
    Assignee: SONDERMIND INC.
    Inventors: Glen A. Coppersmith, Patrick N Crutchley, Ophir Frieder, Ryan Leary, Anthony D. Wood, Aleksander Yelskiy
  • Patent number: 11455366
    Abstract: Aspects described herein may provide determination of compliance with accessibility rules by a webpage. A first version of a webpage may be compliant with the accessibility rules. The first version of the webpage may be modified to create the second version of the webpage. The second version of the webpage may be displayed. A voiceover of the second version of the webpage may be initiated. The voiceover may include starting automatic text-to-speech software that reads aloud the second version of the webpage. The voiceover of the second version of the webpage may be recorded and stored. A textual transcript of the stored recording may be generated. Compliance of the second version of the webpage with the accessibility rules may be determined based on the textual transcript of the stored recording and based on the first version of the webpage.
    Type: Grant
    Filed: October 27, 2020
    Date of Patent: September 27, 2022
    Assignee: Capital One Services, LLC
    Inventor: Evan Wiley
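The transcript-based compliance check above can be sketched as a comparison between the recorded voiceover transcript and the known-compliant reference text. The word-overlap ratio used here is a deliberate simplification; a real checker would align the two texts:

```python
def normalize(text):
    """Lowercase and collapse whitespace for rough comparison."""
    return " ".join(text.lower().split())


def is_compliant(transcript, reference_text, threshold=0.8):
    """Crude accessibility check: fraction of reference words that the
    recorded voiceover transcript actually contains. The threshold and
    word-overlap measure are illustrative assumptions."""
    ref_words = normalize(reference_text).split()
    heard = set(normalize(transcript).split())
    if not ref_words:
        return True
    covered = sum(1 for w in ref_words if w in heard)
    return covered / len(ref_words) >= threshold
```

If the voiceover of the modified page fails to cover the reference content, the modification likely broke text-to-speech accessibility.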
  • Patent number: 11443733
    Abstract: A text-to-speech (TTS) system that is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.
    Type: Grant
    Filed: October 28, 2019
    Date of Patent: September 13, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt
  • Patent number: 11423430
    Abstract: A system and method for receiving and executing emoji based commands in messaging applications. The system and method may include processes such as identifying emojis in a message, determining one or more actions based on the emoji, and completing the determined actions.
    Type: Grant
    Filed: May 31, 2021
    Date of Patent: August 23, 2022
    Assignee: PayPal, Inc.
    Inventor: Kent Griffin
  • Patent number: 11423895
    Abstract: Provided are a method and device for providing an event-emotion-based interactive interface by using an artificial intelligence (AI) system. The method includes identifying an emotional state of a user for at least one event by analyzing a response to a query, learning emotion information of the user for the at least one event, based on the emotional state of the user, determining an interaction type for the at least one event, based on the emotion information of the user, and providing notification information for the at least one event, based on the interaction type.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: August 23, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Inchul Hwang, Hyeonmok Ko, Munjo Kim, Hyungtak Choi
  • Patent number: 11410639
    Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
    Type: Grant
    Filed: July 7, 2020
    Date of Patent: August 9, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
  • Patent number: 11404043
    Abstract: Systems and methods are disclosed for providing non-lexical cues in synthesized speech. An example system includes one or more storage devices including instructions and a processor to execute the instructions. The processor is to execute the instructions to: determine a user tone of the user input; generate a response to the user input based on the user tone; and identify a response tone associated with the user tone. The example system also includes a transmitter to communicate the response and the response tone over a network.
    Type: Grant
    Filed: April 17, 2020
    Date of Patent: August 2, 2022
    Assignee: Intel Corporation
    Inventors: Jessica M. Christian, Peter Graff, Crystal A. Nakatsu, Beth Ann Hockey
  • Patent number: 11404051
    Abstract: A language proficiency analyzer automatically evaluates a person's language proficiency by analyzing that person's oral communications with another person. The analyzer first enhances the quality of an audio recording of a conversation between the two people using a neural network that automatically detects loss features in the audio and adds those loss features back into the audio. The analyzer then performs a textual and audio analysis on the improved audio. Through textual analysis, the analyzer uses a multi-attention network to determine how focused one person is on the other and how pleased one person is with the other. Through audio analysis, the analyzer uses a neural network to determine how well one person pronounced words during the conversation.
    Type: Grant
    Filed: May 21, 2020
    Date of Patent: August 2, 2022
    Assignee: Bank of America Corporation
    Inventors: Madhusudhanan Krishnamoorthy, Harikrishnan Rajeev
  • Patent number: 11398217
    Abstract: Systems and methods are disclosed for providing non-lexical cues in synthesized speech. An example system includes one or more storage devices including instructions and a processor to execute the instructions. The processor is to execute the instructions to: generate first and second non-lexical cues to enhance speech to be synthesized from text; determine a first insertion point of the first non-lexical cue in the text; determine a second insertion point of the second non-lexical cue in the text; and insert the first non-lexical cue at the first insertion point and the second non-lexical cue at the second insertion point. The example system also includes a transmitter to communicate the text with the inserted first non-lexical cue and the inserted second non-lexical cue over a network.
    Type: Grant
    Filed: April 17, 2020
    Date of Patent: July 26, 2022
    Assignee: Intel Corporation
    Inventors: Jessica M. Christian, Peter Graff, Crystal A. Nakatsu, Beth Ann Hockey
  • Patent number: 11393452
    Abstract: The present invention relates to methods of converting a speech into another speech that sounds more natural. The method includes learning for a target conversion function and a target identifier according to an optimal condition in which the target conversion function and the target identifier compete with each other. The target conversion function converts source speech into target speech. The target identifier identifies whether the converted target speech follows the same distribution as actual target speech. The methods include learning for a source conversion function and a source identifier according to an optimal condition in which the source conversion function and the source identifier compete with each other. The source conversion function converts target speech into source speech, and the source identifier identifies whether the converted source speech follows the same distribution as actual source speech.
    Type: Grant
    Filed: February 20, 2019
    Date of Patent: July 19, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Ko Tanaka, Takuhiro Kaneko, Hirokazu Kameoka, Nobukatsu Hojo
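The competing-objective training above pairs an adversarial loss (the conversion function tries to fool the identifier) with a cycle-consistency constraint (source → target → source should reconstruct). A minimal numeric sketch of those two objectives, with toy scalar "speech" and lambda stand-ins for the trained functions:

```python
def adversarial_loss(discriminator, fake_samples):
    """Generator-side objective: push the discriminator's score on converted
    samples toward 1.0 ('real'). Least-squares form, as one common choice."""
    return sum((1.0 - discriminator(x)) ** 2 for x in fake_samples) / len(fake_samples)


def cycle_loss(forward, backward, samples):
    """Cycle consistency: converting to the target domain and back should
    reconstruct the source sample (L1 penalty on the difference)."""
    return sum(abs(x - backward(forward(x))) for x in samples) / len(samples)


# Toy check: an invertible pair of conversions has zero cycle loss.
cyc = cycle_loss(lambda x: x + 1, lambda x: x - 1, [0.0, 2.0])
adv = adversarial_loss(lambda x: 1.0, [0.5])
```

In practice both conversion functions and both identifiers are neural networks trained jointly on these competing terms.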
  • Patent number: 11380319
    Abstract: A charging stand includes a controller. The controller is configured to perform one of a speech operation and a voice recognition operation using contents in accordance with a location of the charging stand that supplies electric power to a mobile terminal.
    Type: Grant
    Filed: July 4, 2018
    Date of Patent: July 5, 2022
    Assignee: KYOCERA Corporation
    Inventors: Joji Yoshikawa, Yuki Yamada, Hiroshi Okamoto
  • Patent number: 11380311
    Abstract: An AI apparatus includes a microphone to acquire speech data including multiple languages, and a processor to acquire text data corresponding to the speech data, determine a main language from languages included in the text data, acquire a translated text data obtained by translating a text data portion, which has a language other than the main language, in the main language, acquire a morpheme analysis result for the translated text data, extract a keyword for intention analysis from the morpheme analysis result, acquire an intention pattern matched to the keyword, and perform an operation corresponding to the intention pattern.
    Type: Grant
    Filed: March 6, 2020
    Date of Patent: July 5, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Yejin Kim, Hyun Yu, Jonghoon Chae
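The main-language determination and keyword-to-intent matching above can be sketched with token counts and required-keyword sets. The `(token, language)` tagging and the pattern table are hypothetical inputs standing in for the patent's language identification and morpheme analysis:

```python
def main_language(tagged_tokens):
    """Pick the dominant language from (token, language_code) pairs."""
    counts = {}
    for _, lang in tagged_tokens:
        counts[lang] = counts.get(lang, 0) + 1
    return max(counts, key=counts.get)


def match_intent(keywords, patterns):
    """Return the first intent whose required keywords are all present.

    keywords: set of extracted keywords (post-translation, per the patent).
    patterns: dict mapping intent name -> required keyword set.
    """
    for intent, required in patterns.items():
        if required <= keywords:
            return intent
    return None
```

After translating the minority-language portion into the main language, the extracted keywords would be matched against such intent patterns.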
  • Patent number: 11374946
    Abstract: Detection of malicious files is disclosed. A set comprising one or more sample classification models is stored on a networked device. N-gram analysis is performed on a sequence of received packets associated with a received file. Performing the n-gram analysis includes using at least one stored sample classification model. A determination is made that the received file is malicious based at least in part on the n-gram analysis of the sequence of received packets. In response to determining that the file is malicious, propagation of the received file is prevented.
    Type: Grant
    Filed: July 19, 2019
    Date of Patent: June 28, 2022
    Assignee: Palo Alto Networks, Inc.
    Inventors: William Redington Hewlett, II, Suiqiang Deng, Sheng Yang, Ho Yu Lam
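The n-gram analysis over a sequence of received packets can be sketched as a sliding window over the reassembled byte stream. The "bad n-gram" set and the simple hit-ratio score below are illustrative; the patent uses trained sample classification models:

```python
def byte_ngrams(data, n=3):
    """Sliding-window byte n-grams over a byte string."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def score_stream(packets, bad_ngrams, n=3):
    """Score a reassembled packet stream by the fraction of its n-grams that
    appear in a (hypothetical) set of malicious n-grams."""
    data = b"".join(packets)
    grams = byte_ngrams(data, n)
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in bad_ngrams)
    return hits / len(grams)
```

A file whose score exceeds a model threshold would be flagged and its propagation blocked.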
  • Patent number: 11367434
    Abstract: An electronic device, a method for obtaining an utterance intention of a user thereof, and a non-transitory computer-readable recording medium are provided. An electronic device according to an embodiment of the present disclosure may comprise: a microphone for receiving a user voice uttered by a user; and a processor for obtaining an utterance intention of a user on the basis of at least one word included in a user voice while the user voice is being input, providing response information corresponding to the obtained utterance intention, and updating the response information while providing the response information, on the basis of an additional word uttered after the at least one word is input.
    Type: Grant
    Filed: December 19, 2017
    Date of Patent: June 21, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dong-hyeon Lee, Hae-hun Yang, He-jung Yang, Jung-sup Lee, Hee-sik Jeon, Hyung-tak Choi
  • Patent number: 11348581
    Abstract: A device for multi-modal user input includes a processor configured to process first data received from a first input device. The first data indicates a first input from a user based on a first input mode. The first input corresponds to a command. The processor is configured to send a feedback message to an output device based on processing the first data. The feedback message instructs the user to provide, based on a second input mode that is different from the first input mode, a second input that identifies a command associated with the first input. The processor is configured to receive second data from a second input device, the second data indicating the second input, and to update a mapping to associate the first input to the command identified by the second input.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: May 31, 2022
    Assignee: Qualcomm Incorporated
    Inventors: Ravi Choudhary, Lae-Hoon Kim, Sunkuk Moon, Yinyi Guo, Fatemeh Saki, Erik Visser
  • Patent number: 11341172
    Abstract: A method for text capture is provided. The method monitors a text session among a set of mobile text-enabled devices capable of having mixed operating system types. The method captures messages and message metadata from the text session by a machine-attended message-capture-dedicated phone configured for reception-or-pass-through-only with respect to the mobile text-enabled devices. The method receives the messages and the message metadata from the message-capture-dedicated phone by a remote message capture device that is constrained to have a compatible operating system type as the machine-attended message-capture-dedicated phone but unconstrained with respect to the operating system type of the set of mobile text-enabled devices.
    Type: Grant
    Filed: May 13, 2021
    Date of Patent: May 24, 2022
    Assignee: FIRMSCRIBE, LLC
    Inventor: Cheyenne Ehrlich
  • Patent number: 11340963
    Abstract: Aspects of the technology described herein improve the clarity of information provided in automatically generated notifications, such as reminders, tasks, alerts or other messages or communications provided to a user. The clarity may be improved through augmentations that provide additional information or specificity to the user. For example, instead of providing a notification reminding the user, "remember to send the slides before the meeting," the user may be provided with a notification reminding the user "remember to send the updated sales presentation before the executive committee meeting on Tuesday." The augmentation may take several forms including substituting one word in the notification with another more specific word, adding additional content such as a word or phrase to the notification without altering the existing content, and/or by rephrasing the content for grammatical correctness and/or clarity.
    Type: Grant
    Filed: January 8, 2019
    Date of Patent: May 24, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Dikla Dotan-Cohen, Ido Priness, Haim Somech, Anat Inon, Amitay Dror, Michal Yarom Zarfati
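The word-substitution form of augmentation described above can be sketched as a lookup over the notification's words. The substitution map here is a hypothetical stand-in for the model's context-derived specifics:

```python
def augment_notification(text, substitutions):
    """Replace generic words with more specific phrases; words without an
    entry in the (hypothetical) substitution map pass through unchanged."""
    return " ".join(substitutions.get(w, w) for w in text.split())


augmented = augment_notification(
    "send the slides before the meeting",
    {
        "slides": "updated sales presentation",
        "meeting": "executive committee meeting on Tuesday",
    },
)
```

The other augmentation forms (adding content, rephrasing) would require generation rather than substitution.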
  • Patent number: 11335320
    Abstract: Systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify speech units that are required for synthesizing speech. The system can request from a server the text-to-speech unit needed to synthesize the speech. The system can then synthesize speech using text-to-speech units already stored and a received text-to-speech unit from the server.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: May 17, 2022
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Benjamin J. Stern, Mark Charles Beutnagel, Alistair D. Conkie, Horst J. Schroeter, Amanda Joy Stent
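The caching scheme above (synthesize from cached units, fetch only the missing ones from the server) can be sketched with a dict standing in for the remote unit store:

```python
class UnitCache:
    """Cache of concatenative speech units; a missing unit is fetched once
    from the server (here a plain dict standing in for the remote store)."""

    def __init__(self, server_units):
        self.server = server_units
        self.cache = {}
        self.requests = 0

    def get(self, unit_id):
        if unit_id not in self.cache:
            self.requests += 1          # one round-trip per missing unit
            self.cache[unit_id] = self.server[unit_id]
        return self.cache[unit_id]

    def synthesize(self, unit_ids):
        """Concatenate the waveforms of the requested units."""
        return b"".join(self.get(u) for u in unit_ids)
```

Repeated units cost only one server request, which is the point of the intelligent caching the patent describes.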
  • Patent number: 11335321
    Abstract: A method of building a text-to-speech (TTS) system from a small amount of speech data includes receiving a first plurality of recorded speech samples from an assortment of speakers and a second plurality of recorded speech samples from a target speaker where the assortment of speakers does not include the target speaker. The method further includes training a TTS model using the first plurality of recorded speech samples from the assortment of speakers. Here, the trained TTS model is configured to output synthetic speech as an audible representation of a text input. The method also includes re-training the trained TTS model using the second plurality of recorded speech samples from the target speaker combined with the first plurality of recorded speech samples from the assortment of speakers. Here, the re-trained TTS model is configured to output synthetic speech resembling speaking characteristics of the target speaker.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: May 17, 2022
    Assignee: Google LLC
    Inventors: Ye Jia, Byungha Chun, Yusuke Oda, Norman Casagrande, Tejas Iyer, Fan Luo, Russell John Wyatt Skerry-Ryan, Jonathan Shen, Yonghui Wu, Yu Zhang
  • Patent number: 11336928
    Abstract: Disclosed are various embodiments for predictive caching of identical starting sequences in content. A content item library is scanned to identify an initial portion shared by multiple content items. The initial portion is extracted from a first content item. It is determined that a second content item is to be predictively cached by a client. The initial portion of the first content item is sent to the client in place of the initial portion of the second content item.
    Type: Grant
    Filed: September 24, 2015
    Date of Patent: May 17, 2022
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Kevin Joseph Thornberry, Piers George Cowburn, Olivier Georget
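The shared-initial-portion step can be illustrated with Python's character-wise common-prefix helper; the library contents here are invented stand-ins for content items.

```python
# Identify an initial portion shared by multiple content items, so that
# the prefix extracted from one item can be served in place of the
# identical start of another.

import os

def shared_initial_portion(items):
    """Longest byte prefix common to every item in the content library."""
    return os.path.commonprefix(items)

library = [
    b"INTRO-LOGO|episode-one-data",
    b"INTRO-LOGO|episode-two-data",
]
prefix = shared_initial_portion(library)
# cache the initial portion from the first item only; it can stand in
# for the identical start of any other item in the library
cached = library[0][:len(prefix)]
print(prefix)  # b'INTRO-LOGO|episode-'
```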
  • Patent number: 11335324
    Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: May 17, 2022
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg
  • Patent number: 11336664
    Abstract: Detection of malicious files is disclosed. A set comprising one or more sample classification models is stored on a networked device. N-gram analysis is performed on a sequence of received packets associated with a received file. Performing the n-gram analysis includes using at least one stored sample classification model. A determination is made that the received file is malicious based at least in part on the n-gram analysis of the sequence of received packets. In response to determining that the file is malicious, propagation of the received file is prevented.
    Type: Grant
    Filed: July 19, 2019
    Date of Patent: May 17, 2022
    Assignee: Palo Alto Networks, Inc.
    Inventors: William Redington Hewlett, II, Suiqiang Deng, Sheng Yang, Ho Yu Lam
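The n-gram analysis over received packets can be sketched as byte n-gram counting on the reassembled stream; the feature weighting and the sample classification model itself are omitted, and the packet contents are illustrative.

```python
# Byte n-gram extraction over a sequence of received packets -- the kind
# of feature a stored sample classification model could score.

from collections import Counter

def ngram_counts(packets, n=2):
    """Count byte n-grams across the reassembled packet stream."""
    stream = b"".join(packets)     # n-grams may span packet boundaries
    return Counter(stream[i:i + n] for i in range(len(stream) - n + 1))

packets = [b"\x4d\x5a\x90", b"\x00\x03"]   # e.g. the start of a PE file
counts = ngram_counts(packets, n=2)
print(counts[b"\x4d\x5a"])  # 1
```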
  • Patent number: 11328728
    Abstract: A voice assistant proxy for voice assistant servers and related methods. The voice assistant proxy comprises a processor configured to convert voice data to text using speech-to-text conversion, determine a voice command from the text, determine whether the voice command is associated with sensitive data based on a set of criteria, route the voice command to an enterprise voice assistant server in response to a determination that the voice command is sensitive, and route the voice command to a third-party voice assistant server in response to a determination that the voice command is not sensitive.
    Type: Grant
    Filed: January 20, 2020
    Date of Patent: May 10, 2022
    Assignee: BlackBerry Limited
    Inventors: Michael Peter Montemurro, James Randolph Winter Lepp
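A toy sketch of the sensitivity check and routing decision, assuming keyword matching as one possible criterion; the terms and server names are placeholders, not from the patent.

```python
# Route a voice command to the enterprise or third-party voice assistant
# server based on a (placeholder) sensitivity criterion.

SENSITIVE_TERMS = {"payroll", "password", "customer record"}

def route_command(command_text):
    """Return the destination server for a transcribed voice command."""
    text = command_text.lower()
    if any(term in text for term in SENSITIVE_TERMS):
        return "enterprise_va_server"
    return "third_party_va_server"

print(route_command("Read me the payroll summary"))   # enterprise_va_server
print(route_command("What's the weather tomorrow?"))  # third_party_va_server
```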
  • Patent number: 11328729
    Abstract: System and method for providing presence of modifications in user dictation are disclosed. Exemplary implementations may: obtain primary audio information representing sound, including speech from a recording user, captured by a client computing platform; perform speech recognition on the primary audio information to generate a textual transcript; effectuate presentation of the transcript to the recording user; receive user input from the recording user; alter, based on the received user input from the recording user, a portion of the transcript to generate an altered transcript; effectuate presentation of the altered transcript in conjunction with audio playback of at least some of the primary audio information in a reviewing interface on a client computing platform; receive user input from the reviewing user; alter, based on the received user input from the reviewing user, portions of the altered transcript to generate a reviewed transcript; and store the reviewed transcript in electronic storage.
    Type: Grant
    Filed: February 24, 2020
    Date of Patent: May 10, 2022
    Assignee: Suki AI, Inc.
    Inventor: Matt Pallakoff
  • Patent number: 11322133
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate expressive audio for input texts based on a word-level analysis of the input text. For example, the disclosed systems can utilize a multi-channel neural network to generate a character-level feature vector and a word-level feature vector based on a plurality of characters of an input text and a plurality of words of the input text, respectively. In some embodiments, the disclosed systems utilize the neural network to generate the word-level feature vector based on contextual word-level style tokens that correspond to style features associated with the input text. Based on the character-level and word-level feature vectors, the disclosed systems can generate a context-based speech map. The disclosed systems can utilize the context-based speech map to generate expressive audio for the input text.
    Type: Grant
    Filed: July 21, 2020
    Date of Patent: May 3, 2022
    Assignee: Adobe Inc.
    Inventors: Sumit Shekhar, Gautam Choudhary, Abhilasha Sancheti, Shubhanshu Agarwal, E Santhosh Kumar, Rahul Saxena
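The two channels can be caricatured as separate character-level and word-level feature extractors; the style-token lookup below is a hand-written stand-in for the learned contextual word-level style tokens the abstract describes.

```python
# Two toy feature channels over the same input text: one per character,
# one per word (scored against illustrative "style tokens").

def char_features(text):
    # character-level channel: normalized character codes
    return [ord(c) / 128.0 for c in text]

def word_features(text, style_tokens):
    # word-level channel: per-word style score from a token lookup
    return [style_tokens.get(word.lower(), 0.0) for word in text.split()]

style_tokens = {"hooray": 0.9, "alas": -0.7}   # illustrative, not learned
text = "Hooray indeed"
char_vec, word_vec = char_features(text), word_features(text, style_tokens)
print(word_vec)  # [0.9, 0.0]
```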
  • Patent number: 11314790
    Abstract: Computing systems, database systems, and related methods are provided for recommending values for fields of database objects and dynamically updating a recommended value for a field of a database record in response to updated auxiliary data associated with the database record. One method involves obtaining conversational data associated with a case database object, segmenting the conversational data, converting each respective segment of conversational data into a numerical representation, generating a combined numerical representation of the conversational data based on the sequence of numerical representations using an aggregation model, generating the recommended value based on the combined numerical representation of the conversational data using a prediction model associated with the field, and autopopulating the field of the case database object with the recommended value.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: April 26, 2022
    Assignee: salesforce.com, inc.
    Inventors: Son Thanh Chang, Weiping Peng, Na Cheng, Feifei Jiang, Jacob Nathaniel Huffman, Nandini Suresh Kumar, Khoa Le, Christopher Larry
  • Patent number: 11316964
    Abstract: Provided is a computer implemented method and system for delivering text messages, emails, and messages from a messenger application to a user while the user is engaged in an activity, such as driving, exercising, or working. Typically, the emails and other messages are announced to the user and read aloud without any user input. In Drive Mode, while the user is driving, a clean interface is shown to the user; the user can hear announcements and messages/emails aloud without looking at the screen of the phone, and can use gestures to operate the phone. After a determination is made that a new text message and/or email has arrived, the user is informed of it aloud; in most instances, if the user takes no further action, the body and/or subject of the text message/email/messenger message is then read aloud to the user. All messages can be placed in a single queue and read to the user in order of receipt.
    Type: Grant
    Filed: January 15, 2021
    Date of Patent: April 26, 2022
    Assignee: Messageloud Inc.
    Inventor: Garin Toren
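The single-queue behavior can be modeled as a FIFO queue shared across channels, so messages are announced strictly in order of receipt; the channel names and messages are illustrative.

```python
# One queue for texts, emails, and messenger messages; items are read
# aloud (here, returned as strings) in arrival order regardless of channel.

from collections import deque

queue = deque()

def receive(channel, body):
    queue.append((channel, body))      # single queue for all channels

def read_next_aloud():
    channel, body = queue.popleft()    # strict order of receipt
    return f"New {channel}: {body}"

receive("email", "Quarterly report attached")
receive("text", "Running 10 minutes late")
print(read_next_aloud())  # New email: Quarterly report attached
print(read_next_aloud())  # New text: Running 10 minutes late
```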
  • Patent number: 11315435
    Abstract: Systems and methods for verbal expression are provided. In one aspect, a verbal expression system may receive a selection of sound identifiers, generate a list of video files associated with the identifiers, receive a selection of one or more video files, concatenate the video files into an assignment file, and map the assignment file to one or more users. Optionally, the verbal expression system may determine user statistics for each user, generate a progress report for each user, and/or transmit the progress report to one or more users.
    Type: Grant
    Filed: February 10, 2020
    Date of Patent: April 26, 2022
    Assignee: Gemiini Educational Systems, Inc.
    Inventor: Laura Marie Kasbar
  • Patent number: 11315544
    Abstract: A method includes: determining, by a computer device, a current context associated with a user that is the target audience of an unprompted verbal output of an interactive computing device; determining, by the computer device, one or more parameters that are most effective in getting the attention of the user for the determined current context; and modifying, by the computer device, the unprompted verbal output of the interactive computing device using the determined one or more parameters.
    Type: Grant
    Filed: June 25, 2019
    Date of Patent: April 26, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Bender, Rhonda L. Childress, Craig M. Trim, Martin G. Keen
  • Patent number: 11308947
    Abstract: A system and method for voice control of a media playback device is disclosed. The method includes receiving an instruction of a voice command, converting the voice command to text, transmitting the text command to the playback device, and having the playback device execute the command. An instruction may include a command to play a set of audio tracks, and the media playback device plays the set of audio tracks upon receiving the instruction.
    Type: Grant
    Filed: May 7, 2018
    Date of Patent: April 19, 2022
    Assignee: Spotify AB
    Inventors: Daniel Bromand, Richard Mitic, Horia Jurcut, Jennifer Thom-Santelli, Henriette Cramer, Karl Humphreys, Bo Williams, Kurt Jacobson, Henrik Lindström
  • Patent number: 11302302
    Abstract: Embodiments of the present disclosure disclose a method, apparatus, device, and storage medium for switching a voice role. The method includes: recognizing a voice-role switching instruction input by a user, and determining a target voice role corresponding to the instruction; switching a current voice role of a smart terminal to the target voice role, where different voice roles have different role attributes, and a role attribute includes a role utterance attribute; generating interactive response information corresponding to an interactive voice input by the user, based on that interactive voice and the role utterance attribute of the target voice role; and providing a response voice corresponding to the interactive response information to the user. The embodiments of the present disclosure enable different voice roles to have different role utterance attributes, giving each voice role a distinct persona.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: April 12, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Yu Wang, Bo Xie
  • Patent number: 11302300
    Abstract: A system and method enable one to set a target duration of a desired synthesized utterance without removing or adding spoken content. Without changing the spoken text, the voice characteristics may be kept the same or substantially the same. Silence adjustment and interpolation may be used to alter the duration while preserving speech characteristics. Speech may be translated prior to a vocoder step, pursuant to which the translated speech is constrained by the original audio duration, while mimicking the speech characteristics of the original speech.
    Type: Grant
    Filed: November 19, 2020
    Date of Patent: April 12, 2022
    Assignee: Applications Technology (AppTek), LLC
    Inventors: Nick Rossenbach, Mudar Yaghi
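The interpolation half of the duration adjustment can be sketched as naive linear resampling of a waveform to a target sample count; a real system would preserve pitch and other speech characteristics (e.g. by interpolating vocoder features rather than raw samples), and the silence-adjustment step is omitted here.

```python
# Stretch or compress a waveform to a target duration (in samples) by
# linear interpolation. Naive: operating on raw samples shifts pitch,
# which the patented approach avoids.

def stretch_to_duration(samples, target_len):
    """Linearly interpolate a waveform to a target sample count."""
    if target_len == 1:
        return [float(samples[0])]
    out = []
    step = (len(samples) - 1) / (target_len - 1)
    for i in range(target_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

audio = [0, 1, 0, -1, 0]            # toy waveform
stretched = stretch_to_duration(audio, 9)
print(len(stretched))  # 9
```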
  • Patent number: 11294621
    Abstract: A control method outputs audio indicating content of an operation of a transmission device connected to a reception device. The control method includes accepting the operation of the transmission device, generating operation data indicating the content of the operation by the transmission device, transmitting the operation data from the transmission device to the reception device, generating audio data indicating the content of the operation based on the operation data by the reception device, and outputting the audio indicated by the audio data by the reception device.
    Type: Grant
    Filed: March 1, 2018
    Date of Patent: April 5, 2022
    Assignee: FUNAI ELECTRIC CO., LTD.
    Inventors: Mitsuru Kawakita, Takuya Suzuki, Kenichi Fukunaka, Yosuke Sonoda, Shigeru Toji, Masahiko Arashi