Image To Speech Patents (Class 704/260)
  • Patent number: 11580963
    Abstract: A speech generation method and apparatus are disclosed. The speech generation method includes obtaining, by a processor, a linguistic feature and a prosodic feature from an input text, determining, by the processor, a first candidate speech element through a cost calculation and a Viterbi search based on the linguistic feature and the prosodic feature, generating, at a speech element generator implemented at the processor, a second candidate speech element based on the linguistic feature or the prosodic feature and the first candidate speech element, and outputting, by the processor, an output speech by concatenating the second candidate speech element and a speech sequence determined through the Viterbi search.
    Type: Grant
    Filed: October 14, 2020
    Date of Patent: February 14, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jangsu Lee, Hoshik Lee, Jehun Jeon
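The cost-based unit selection described in the abstract above can be sketched as a Viterbi dynamic program over candidate speech units. The integer "units" and toy cost functions below are illustrative stand-ins, not the patent's actual linguistic or prosodic features:

```python
def viterbi_unit_selection(candidates, target_cost, concat_cost):
    """Pick one unit per position minimizing total target + concatenation cost.

    candidates: list of lists; candidates[i] are the unit options for position i.
    target_cost(unit): how well a unit matches the linguistic/prosodic target.
    concat_cost(prev, unit): how smoothly two adjacent units join.
    """
    # best[u] = (cost of the best path ending in unit u, that path)
    best = {u: (target_cost(u), [u]) for u in candidates[0]}
    for options in candidates[1:]:
        nxt = {}
        for u in options:
            cost, path = min(
                (c + concat_cost(p[-1], u), p) for c, p in best.values()
            )
            nxt[u] = (cost + target_cost(u), path + [u])
        best = nxt
    return min(best.values())[1]


# Toy example: units are integers; prefer values near 5, penalize big jumps.
seq = viterbi_unit_selection(
    [[1, 5], [4, 9], [5, 6]],
    target_cost=lambda u: abs(u - 5),
    concat_cost=lambda a, b: abs(a - b),
)
```

In the patent's terms, the selected sequence would then be refined by generating a second candidate element from the first before concatenation.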
  • Patent number: 11580581
    Abstract: A computer-aided design system enables physical articles to be customized via printing or embroidering and enables digital content to be customized and electronically shared. A user interface may be generated that includes an image of a model of an article of manufacture and user customizable design areas that are graphically indicated on the image corresponding to the model. A design area selection may be received. In response to an add design element instruction and design element specification, the specified design element is rendered in the selected design area on the model image. Customization permissions associated with the selected design area are accessed, and using the customization permissions, a first set of design element edit tools are selected and rendered. User edits to the design element may be received and rendered in real time. Manufacturing instructions may be transmitted to a printing system.
    Type: Grant
    Filed: June 8, 2021
    Date of Patent: February 14, 2023
    Assignee: Best Apps, LLC
    Inventor: Michael Bowen
  • Patent number: 11556782
    Abstract: In a trained attentive decoder of a trained Sequence-to-Sequence (seq2seq) Artificial Neural Network (ANN): obtaining an encoded input vector sequence; generating, using a trained primary attention mechanism of the trained attentive decoder, a primary attention vectors sequence; for each primary attention vector of the primary attention vectors sequence: (a) generating a set of attention vector candidates corresponding to the respective primary attention vector, (b) evaluating, for each attention vector candidate of the set of attention vector candidates, a structure fit measure that quantifies a similarity of the respective attention vector candidate to a desired attention vector structure, (c) generating, using a trained soft-selection ANN, a secondary attention vector based on said evaluation and on state variables of the trained attentive decoder; and generating, using the trained attentive decoder, an output sequence based on the encoded input vector sequence and the secondary attention vectors.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: January 17, 2023
    Assignee: International Business Machines Corporation
    Inventors: Vyacheslav Shechtman, Alexander Sorin
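The candidate-generation and structure-fit steps above can be sketched in miniature. The "desired attention structure" here is assumed to be a peaked, near one-hot vector, so the fit measure is simply the largest weight; both that measure and the perturbation-based candidate generation are illustrative simplifications:

```python
def structure_fit(candidate):
    """Score closeness to a 'peaked' attention structure: the largest weight
    (1.0 for a perfect one-hot vector). Illustrative stand-in measure."""
    return max(candidate)


def select_secondary_attention(primary, perturbations):
    """Generate candidates around a primary attention vector and keep the
    candidate with the best structure fit (renormalized to sum to 1)."""
    candidates = []
    for delta in perturbations:
        raw = [max(p + d, 0.0) for p, d in zip(primary, delta)]
        total = sum(raw) or 1.0
        candidates.append([r / total for r in raw])
    return max(candidates, key=structure_fit)


primary = [0.3, 0.4, 0.3]
perturbs = [[0.0, 0.0, 0.0], [-0.2, 0.4, -0.2], [0.2, -0.3, 0.1]]
best = select_secondary_attention(primary, perturbs)
```

The patent replaces this hard argmax with a trained soft-selection network conditioned on decoder state.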
  • Patent number: 11551568
    Abstract: Embodiments of a language learning system and method for implementing or assisting in self-study for improving listening fluency in a target language are disclosed. Such embodiments may simultaneously present the same piece of content in an auditory presentation and a corresponding visual presentation of a transcript of the auditory presentation, where the two presentations are adapted to work in tandem to increase the effectiveness of language learning for users.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: January 10, 2023
    Assignee: JIVEWORLD, SPC
    Inventor: Daniel Paul Raynaud
  • Patent number: 11545131
    Abstract: A reading order extrapolation and management system and process for facilitating auditory comprehension of electronic documents. As an example, a user may access contents of an electronic document via an application and request a speech-synthesized recitation of any media in the electronic document. The application may make use of a reading order that has been specifically generated and improved by reference to eye tracking data from users reading the document. A reading order can be assigned to a document and implemented when, for example, a screen reader is engaged for use with the document. Such systems can be of great benefit to users with visual impairments and/or distracted users seeking a meaningful audio presentation of textual content.
    Type: Grant
    Filed: July 16, 2019
    Date of Patent: January 3, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Tracy ThuyDuyen Tran, Emily Tran, Daniel Yancy Parish
  • Patent number: 11538454
    Abstract: Methods and systems are described herein for generating an audible presentation of a communication received from a remote server. A presentation of a media asset on a user equipment device is generated for a first user. A textual-based communication is received, at the user equipment device from the remote server. The textual-based communication is transmitted to the remote server by a second user and the remote server transmits the textual-based communication to the user equipment device responsive to determining that the second user is on a list of users associated with the first user. An engagement level of the first user with the user equipment device is determined. Responsive to determining that the engagement level does not exceed a threshold value, a presentation of the textual-based communication is generated in audible form.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: December 27, 2022
    Assignee: Rovi Product Corporation
    Inventor: William Korbecki
  • Patent number: 11531819
    Abstract: Machine learned models take in vectors representing desired behaviors and generate voice vectors that provide the parameters for text-to-speech (TTS) synthesis. Models may be trained on behavior vectors that include user profile attributes, situational attributes, or semantic attributes. Situational attributes may include age of people present, music that is playing, location, noise, and mood. Semantic attributes may include presence of proper nouns, number of modifiers, emotional charge, and domain of discourse. TTS voice parameters may apply per utterance and per word as to enable contrastive emphasis.
    Type: Grant
    Filed: January 14, 2020
    Date of Patent: December 20, 2022
    Assignee: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Monika Almudafar-Depeyrot
  • Patent number: 11531807
    Abstract: A method, computer program product, and computer system for encoding, by a computing device, a transcript and text macros into vector representations. A word by word report may be predicted based upon, at least in part, the encoding. An attention mechanism may be queried based upon, at least in part, a decoder state. An attention distribution may be produced over an encoder output. An interpolation of the encoder output may be produced based upon, at least in part, the attention distribution. The interpolation of the encoder output may be input into a decoder for report modeling that includes text macro location and content.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: December 20, 2022
    Assignee: Nuance Communications, Inc.
    Inventors: Paul Joseph Vozila, Joel Praveen Pinto, Frank Diehl
  • Patent number: 11521593
    Abstract: A method of embodying an online media service having a multiple voice system includes a first operation of collecting preset online articles and content from a specific media site and displaying the online articles and content on a screen of a personal terminal, a second operation of inputting a voice of a subscriber or setting a voice of a specific person among voices that are pre-stored in a database, a third operation of recognizing and classifying the online articles and content, a fourth operation of converting the classified online articles and content into speech, and a fifth operation of outputting the online articles and content using the voice of the subscriber or the specific person, which is set in the second operation.
    Type: Grant
    Filed: October 21, 2020
    Date of Patent: December 6, 2022
    Inventor: Jong Yup Lee
  • Patent number: 11514885
    Abstract: An automatic dubbing method is disclosed. The method comprises: extracting speeches of a voice from an audio portion of a media content (504); obtaining a voice print model for the extracted speeches of the voice (506); processing the extracted speeches by utilizing the voice print model to generate replacement speeches (508); and replacing the extracted speeches of the voice with the generated replacement speeches in the audio portion of the media content (510).
    Type: Grant
    Filed: November 21, 2016
    Date of Patent: November 29, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Henry Gabryjelski, Jian Luan, Dapeng Li
  • Patent number: 11501700
    Abstract: An image sticking compensation device according to example embodiments includes a degradation calculator configured to calculate a degradation weight based on input image data, and to calculate degradation data of a frame, an accumulator configured to accumulate the degradation data, and to generate age data using the accumulated degradation data, and a compensator configured to determine a grayscale compensation value corresponding to the age data and an input grayscale of the input image data, and to output age compensation data by applying the grayscale compensation value to the input image data.
    Type: Grant
    Filed: November 5, 2020
    Date of Patent: November 15, 2022
    Inventor: Sang-Myeon Han
  • Patent number: 11488588
    Abstract: In a control system including a printing apparatus and a server system, the server system includes a transmission unit that, if a voice instruction received by a voice control device is a query regarding the printing apparatus, transmits information concerning the printing apparatus without performing processing of content used for print processing, and a specification unit that, if the received voice instruction is a print instruction for printing the content and includes a print setting value corresponding to a first item but not a print setting value corresponding to a second item, specifies content corresponding to the print instruction, a print setting value corresponding to the first item, and a preset, predetermined print setting value for the second item. The printing apparatus includes a print control unit that performs print processing based on the content, the print setting value corresponding to the first item, and the specified predetermined print setting value.
    Type: Grant
    Filed: November 8, 2018
    Date of Patent: November 1, 2022
    Assignee: Canon Kabushiki Kaisha
    Inventor: Toshiki Shiga
  • Patent number: 11488576
    Abstract: Provided is an artificial intelligence (AI) apparatus for generating a speech having a content-based style, including: a memory configured to store a plurality of TTS (Text-To-Speech) engines; and a processor configured to: obtain image data or text data containing a text, extract at least one content keyword corresponding to the text, determine a speech style based on the extracted content keyword, generate a speech corresponding to the text by using a TTS engine corresponding to the determined speech style among the plurality of TTS engines, and output the generated speech.
    Type: Grant
    Filed: May 21, 2019
    Date of Patent: November 1, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Jisoo Park, Jonghoon Chae
  • Patent number: 11481510
    Abstract: One embodiment provides a method, including: receiving, at an audio capture device associated with an information handling device, command input from a user; providing, to the user and responsive to receiving the command input, a confirmation query, wherein the confirmation query is formed utilizing context data associated with an authorized user; determining, using a processor, whether a response to the confirmation query provided by the user matches a predetermined answer; and performing, responsive to determining that the response matches the predetermined answer, a function corresponding to the command input. Other aspects are described and claimed.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: October 25, 2022
    Assignee: Lenovo (Singapore) Pte. Ltd.
    Inventors: Robert James Norton, Jr., Robert James Kapinos, Russell Speight VanBlon, Scott Wentao Li
  • Patent number: 11475878
    Abstract: An electronic device for providing a text-to-speech (TTS) service and an operating method therefor are provided. The operating method of the electronic device includes obtaining target voice data based on an utterance input of a specific speaker, determining a number of learning steps of the target voice data, based on data features including a data amount of the target voice data, generating a target model by training, for the determined number of learning steps, a pre-trained model that converts text into an audio signal, using the target voice data as training data, generating output data obtained by converting input text into an audio signal, by using the generated target model, and outputting the generated output data.
    Type: Grant
    Filed: October 27, 2020
    Date of Patent: October 18, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Kyoungbo Min, Seungdo Choi, Doohwa Hong
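The step of determining a number of learning steps from the amount of target voice data could be realized by a simple clamped heuristic. The constants and the linear scaling below are illustrative assumptions, not values from the patent:

```python
def num_learning_steps(num_seconds, steps_per_second=100,
                       min_steps=1_000, max_steps=50_000):
    """Heuristic: scale fine-tuning steps with the amount of target speech,
    clamped so tiny datasets still train and large ones do not blow up the
    schedule. All constants are illustrative, not from the patent."""
    return max(min_steps, min(max_steps, num_seconds * steps_per_second))
```

With this sketch, 5 seconds of speech falls back to the floor of 1,000 steps, while 5 minutes yields a proportional 30,000 steps.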
  • Patent number: 11468879
    Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A first set of spectra is generated based on the sequence of text components. A second set of spectra is generated based on the first set of spectra and the respective temporal durations of the sequence of text components. A spectrogram frame is generated based on the second set of spectra. An audio waveform is generated based on the spectrogram frame. The audio waveform is provided as an output.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: October 11, 2022
    Assignee: TENCENT AMERICA LLC
    Inventors: Chengzhu Yu, Heng Lu, Dong Yu
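The duration-driven pipeline above (phonemes, predicted durations, then frame-level spectra) can be sketched as follows. The lookup-table "duration model" and single-value spectra are hypothetical stand-ins for the trained models:

```python
def predict_durations(phonemes, duration_table):
    """Stand-in duration model: look up frames-per-phoneme in a table."""
    return [duration_table.get(p, 1) for p in phonemes]


def expand_to_frames(spectra, durations):
    """Repeat each phoneme-level spectrum for its predicted number of frames,
    producing the frame-level sequence a vocoder would turn into audio."""
    frames = []
    for spec, dur in zip(spectra, durations):
        frames.extend([spec] * dur)
    return frames


phonemes = ["h", "ai"]
durs = predict_durations(phonemes, {"h": 2, "ai": 3})
frames = expand_to_frames([[0.1], [0.9]], durs)
```

In the patent, a second network refines these expanded spectra before the spectrogram frames are vocoded into a waveform.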
  • Patent number: 11468878
    Abstract: Disclosed is speech synthesis in a noisy environment. According to an embodiment of the disclosure, a method of speech synthesis may generate a Lombard effect-applied synthesized speech using a feature vector generated from an utterance feature. According to the disclosure, the speech synthesis method and device may be related to artificial intelligence (AI) modules, unmanned aerial vehicles (UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, and 5G service-related devices.
    Type: Grant
    Filed: September 23, 2020
    Date of Patent: October 11, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Minook Kim, Yongchul Park, Sungmin Han, Siyoung Yang, Sangki Kim, Juyeong Jang
  • Patent number: 11468242
    Abstract: A computer evaluates free-form text messages among members of a team, using natural language processing techniques to process the text messages and to assess psychological state of the team members as reflected in the text messages. The computer assembles the psychological state as reflected in the messages to evaluate team collective psychological state. The computer reports a trend of team collective psychological state in natural language text form.
    Type: Grant
    Filed: August 21, 2020
    Date of Patent: October 11, 2022
    Assignee: SONDERMIND INC.
    Inventors: Glen A. Coppersmith, Patrick N Crutchley, Ophir Frieder, Ryan Leary, Anthony D. Wood, Aleksander Yelskiy
  • Patent number: 11455366
    Abstract: Aspects described herein may provide determination of compliance with accessibility rules by a webpage. A first version of a webpage may be compliant with the accessibility rules. The first version of the webpage may be modified to create the second version of the webpage. The second version of the webpage may be displayed. A voiceover of the second version of the webpage may be initiated. The voiceover may include starting automatic text-to-speech software that reads aloud the second version of the webpage. The voiceover of the second version of the webpage may be recorded and stored. A textual transcript of the stored recording may be generated. Compliance of the second version of the webpage with the accessibility rules may be determined based on the textual transcript of the stored recording and based on the first version of the webpage.
    Type: Grant
    Filed: October 27, 2020
    Date of Patent: September 27, 2022
    Assignee: Capital One Services, LLC
    Inventor: Evan Wiley
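The transcript-based compliance check above can be sketched as a comparison between the recorded voiceover transcript and the known-compliant reference text. The word-overlap ratio used here is a deliberate simplification; a real checker would align the two texts:

```python
def normalize(text):
    """Lowercase and collapse whitespace for rough comparison."""
    return " ".join(text.lower().split())


def is_compliant(transcript, reference_text, threshold=0.8):
    """Crude accessibility check: fraction of reference words that the
    recorded voiceover transcript actually contains. The threshold and
    word-overlap measure are illustrative assumptions."""
    ref_words = normalize(reference_text).split()
    heard = set(normalize(transcript).split())
    if not ref_words:
        return True
    covered = sum(1 for w in ref_words if w in heard)
    return covered / len(ref_words) >= threshold
```

If the voiceover of the modified page fails to cover the reference content, the modification likely broke text-to-speech accessibility.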
  • Patent number: 11443733
    Abstract: A text-to-speech (TTS) system that is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.
    Type: Grant
    Filed: October 28, 2019
    Date of Patent: September 13, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt
  • Patent number: 11423430
    Abstract: A system and method for receiving and executing emoji based commands in messaging applications. The system and method may include processes such as identifying emojis in a message, determining one or more actions based on the emoji, and completing the determined actions.
    Type: Grant
    Filed: May 31, 2021
    Date of Patent: August 23, 2022
    Assignee: PayPal, Inc.
    Inventor: Kent Griffin
  • Patent number: 11423895
    Abstract: Provided are a method and device for providing an event-emotion-based interactive interface by using an artificial intelligence (AI) system. The method includes identifying an emotional state of a user for at least one event by analyzing a response to a query, learning emotion information of the user for the at least one event, based on the emotional state of the user, determining an interaction type for the at least one event, based on the emotion information of the user, and providing notification information for the at least one event, based on the interaction type.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: August 23, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Inchul Hwang, Hyeonmok Ko, Munjo Kim, Hyungtak Choi
  • Patent number: 11410639
    Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
    Type: Grant
    Filed: July 7, 2020
    Date of Patent: August 9, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
  • Patent number: 11404043
    Abstract: Systems and methods are disclosed for providing non-lexical cues in synthesized speech. An example system includes one or more storage devices including instructions and a processor to execute the instructions. The processor is to execute the instructions to: determine a user tone of the user input; generate a response to the user input based on the user tone; and identify a response tone associated with the user tone. The example system also includes a transmitter to communicate the response and the response tone over a network.
    Type: Grant
    Filed: April 17, 2020
    Date of Patent: August 2, 2022
    Assignee: Intel Corporation
    Inventors: Jessica M. Christian, Peter Graff, Crystal A. Nakatsu, Beth Ann Hockey
  • Patent number: 11404051
    Abstract: A language proficiency analyzer automatically evaluates a person's language proficiency by analyzing that person's oral communications with another person. The analyzer first enhances the quality of an audio recording of a conversation between the two people using a neural network that automatically detects loss features in the audio and adds those loss features back into the audio. The analyzer then performs a textual and audio analysis on the improved audio. Through textual analysis, the analyzer uses a multi-attention network to determine how focused one person is on the other and how pleased one person is with the other. Through audio analysis, the analyzer uses a neural network to determine how well one person pronounced words during the conversation.
    Type: Grant
    Filed: May 21, 2020
    Date of Patent: August 2, 2022
    Assignee: Bank of America Corporation
    Inventors: Madhusudhanan Krishnamoorthy, Harikrishnan Rajeev
  • Patent number: 11398217
    Abstract: Systems and methods are disclosed for providing non-lexical cues in synthesized speech. An example system includes one or more storage devices including instructions and a processor to execute the instructions. The processor is to execute the instructions to: generate first and second non-lexical cues to enhance speech to be synthesized from text; determine a first insertion point of the first non-lexical cue in the text; determine a second insertion point of the second non-lexical cue in the text; and insert the first non-lexical cue at the first insertion point and the second non-lexical cue at the second insertion point. The example system also includes a transmitter to communicate the text with the inserted first non-lexical cue and the inserted second non-lexical cue over a network.
    Type: Grant
    Filed: April 17, 2020
    Date of Patent: July 26, 2022
    Assignee: Intel Corporation
    Inventors: Jessica M. Christian, Peter Graff, Crystal A. Nakatsu, Beth Ann Hockey
  • Patent number: 11393452
    Abstract: The present invention relates to methods of converting a speech into another speech that sounds more natural. The method includes learning for a target conversion function and a target identifier according to an optimal condition in which the target conversion function and the target identifier compete with each other. The target conversion function converts source speech into target speech. The target identifier identifies whether the converted target speech follows the same distribution as actual target speech. The methods include learning for a source conversion function and a source identifier according to an optimal condition in which the source conversion function and the source identifier compete with each other. The source conversion function converts target speech into source speech, and the source identifier identifies whether the converted source speech follows the same distribution as actual source speech.
    Type: Grant
    Filed: February 20, 2019
    Date of Patent: July 19, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Ko Tanaka, Takuhiro Kaneko, Hirokazu Kameoka, Nobukatsu Hojo
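The competing-objective training above pairs an adversarial loss (the conversion function tries to fool the identifier) with a cycle-consistency constraint (source → target → source should reconstruct). A minimal numeric sketch of those two objectives, with toy scalar "speech" and lambda stand-ins for the trained functions:

```python
def adversarial_loss(discriminator, fake_samples):
    """Generator-side objective: push the discriminator's score on converted
    samples toward 1.0 ('real'). Least-squares form, as one common choice."""
    return sum((1.0 - discriminator(x)) ** 2 for x in fake_samples) / len(fake_samples)


def cycle_loss(forward, backward, samples):
    """Cycle consistency: converting to the target domain and back should
    reconstruct the source sample (L1 penalty on the difference)."""
    return sum(abs(x - backward(forward(x))) for x in samples) / len(samples)


# Toy check: an invertible pair of conversions has zero cycle loss.
cyc = cycle_loss(lambda x: x + 1, lambda x: x - 1, [0.0, 2.0])
adv = adversarial_loss(lambda x: 1.0, [0.5])
```

In practice both conversion functions and both identifiers are neural networks trained jointly on these competing terms.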
  • Patent number: 11380319
    Abstract: A charging stand includes a controller. The controller is configured to perform one of a speech operation and a voice recognition operation using contents in accordance with a location of the charging stand that supplies electric power to a mobile terminal.
    Type: Grant
    Filed: July 4, 2018
    Date of Patent: July 5, 2022
    Assignee: KYOCERA Corporation
    Inventors: Joji Yoshikawa, Yuki Yamada, Hiroshi Okamoto
  • Patent number: 11380311
    Abstract: An AI apparatus includes a microphone to acquire speech data including multiple languages, and a processor to acquire text data corresponding to the speech data, determine a main language from languages included in the text data, acquire a translated text data obtained by translating a text data portion, which has a language other than the main language, in the main language, acquire a morpheme analysis result for the translated text data, extract a keyword for intention analysis from the morpheme analysis result, acquire an intention pattern matched to the keyword, and perform an operation corresponding to the intention pattern.
    Type: Grant
    Filed: March 6, 2020
    Date of Patent: July 5, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Yejin Kim, Hyun Yu, Jonghoon Chae
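The main-language determination and keyword-to-intent matching above can be sketched with token counts and required-keyword sets. The `(token, language)` tagging and the pattern table are hypothetical inputs standing in for the patent's language identification and morpheme analysis:

```python
def main_language(tagged_tokens):
    """Pick the dominant language from (token, language_code) pairs."""
    counts = {}
    for _, lang in tagged_tokens:
        counts[lang] = counts.get(lang, 0) + 1
    return max(counts, key=counts.get)


def match_intent(keywords, patterns):
    """Return the first intent whose required keywords are all present.

    keywords: set of extracted keywords (post-translation, per the patent).
    patterns: dict mapping intent name -> required keyword set.
    """
    for intent, required in patterns.items():
        if required <= keywords:
            return intent
    return None
```

After translating the minority-language portion into the main language, the extracted keywords would be matched against such intent patterns.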
  • Patent number: 11374946
    Abstract: Detection of malicious files is disclosed. A set comprising one or more sample classification models is stored on a networked device. N-gram analysis is performed on a sequence of received packets associated with a received file. Performing the n-gram analysis includes using at least one stored sample classification model. A determination is made that the received file is malicious based at least in part on the n-gram analysis of the sequence of received packets. In response to determining that the file is malicious, propagation of the received file is prevented.
    Type: Grant
    Filed: July 19, 2019
    Date of Patent: June 28, 2022
    Assignee: Palo Alto Networks, Inc.
    Inventors: William Redington Hewlett, II, Suiqiang Deng, Sheng Yang, Ho Yu Lam
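The n-gram analysis over a sequence of received packets can be sketched as a sliding window over the reassembled byte stream. The "bad n-gram" set and the simple hit-ratio score below are illustrative; the patent uses trained sample classification models:

```python
def byte_ngrams(data, n=3):
    """Sliding-window byte n-grams over a byte string."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def score_stream(packets, bad_ngrams, n=3):
    """Score a reassembled packet stream by the fraction of its n-grams that
    appear in a (hypothetical) set of malicious n-grams."""
    data = b"".join(packets)
    grams = byte_ngrams(data, n)
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in bad_ngrams)
    return hits / len(grams)
```

A file whose score exceeds a model threshold would be flagged and its propagation blocked.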
  • Patent number: 11367434
    Abstract: An electronic device, a method for obtaining an utterance intention of a user thereof, and a non-transitory computer-readable recording medium are provided. An electronic device according to an embodiment of the present disclosure may comprise: a microphone for receiving a user voice uttered by a user; and a processor for obtaining an utterance intention of a user on the basis of at least one word included in a user voice while the user voice is being input, providing response information corresponding to the obtained utterance intention, and updating the response information while providing the response information, on the basis of an additional word uttered after the at least one word is input.
    Type: Grant
    Filed: December 19, 2017
    Date of Patent: June 21, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dong-hyeon Lee, Hae-hun Yang, He-jung Yang, Jung-sup Lee, Hee-sik Jeon, Hyung-tak Choi
  • Patent number: 11348581
    Abstract: A device for multi-modal user input includes a processor configured to process first data received from a first input device. The first data indicates a first input from a user based on a first input mode. The first input corresponds to a command. The processor is configured to send a feedback message to an output device based on processing the first data. The feedback message instructs the user to provide, based on a second input mode that is different from the first input mode, a second input that identifies a command associated with the first input. The processor is configured to receive second data from a second input device, the second data indicating the second input, and to update a mapping to associate the first input to the command identified by the second input.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: May 31, 2022
    Assignee: Qualcomm Incorporated
    Inventors: Ravi Choudhary, Lae-Hoon Kim, Sunkuk Moon, Yinyi Guo, Fatemeh Saki, Erik Visser
  • Patent number: 11341172
    Abstract: A method for text capture is provided. The method monitors a text session among a set of mobile text-enabled devices capable of having mixed operating system types. The method captures messages and message metadata from the text session by a machine-attended message-capture-dedicated phone configured for reception-or-pass-through-only with respect to the mobile text-enabled devices. The method receives the messages and the message metadata from the message-capture-dedicated phone by a remote message capture device that is constrained to have a compatible operating system type as the machine-attended message-capture-dedicated phone but unconstrained with respect to the operating system type of the set of mobile text-enabled devices.
    Type: Grant
    Filed: May 13, 2021
    Date of Patent: May 24, 2022
    Assignee: FIRMSCRIBE, LLC
    Inventor: Cheyenne Ehrlich
  • Patent number: 11340963
    Abstract: Aspects of the technology described herein improve the clarity of information provided in automatically generated notifications, such as reminders, tasks, alerts or other messages or communications provided to a user. The clarity may be improved through augmentations that provide additional information or specificity to the user. For example, instead of providing a notification reminding the user, "remember to send the slides before the meeting," the user may be provided with a notification reminding the user "remember to send the updated sales presentation before the executive committee meeting on Tuesday." The augmentation may take several forms including substituting one word in the notification with another more specific word, adding additional content such as a word or phrase to the notification without altering the existing content, and/or by rephrasing the content for grammatical correctness and/or clarity.
    Type: Grant
    Filed: January 8, 2019
    Date of Patent: May 24, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Dikla Dotan-Cohen, Ido Priness, Haim Somech, Anat Inon, Amitay Dror, Michal Yarom Zarfati
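The word-substitution form of augmentation described above can be sketched as a lookup over the notification's words. The substitution map here is a hypothetical stand-in for the model's context-derived specifics:

```python
def augment_notification(text, substitutions):
    """Replace generic words with more specific phrases; words without an
    entry in the (hypothetical) substitution map pass through unchanged."""
    return " ".join(substitutions.get(w, w) for w in text.split())


augmented = augment_notification(
    "send the slides before the meeting",
    {
        "slides": "updated sales presentation",
        "meeting": "executive committee meeting on Tuesday",
    },
)
```

The other augmentation forms (adding content, rephrasing) would require generation rather than substitution.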
  • Patent number: 11335320
    Abstract: Systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify speech units that are required for synthesizing speech. The system can request from a server the text-to-speech unit needed to synthesize the speech. The system can then synthesize speech using text-to-speech units already stored and a received text-to-speech unit from the server.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: May 17, 2022
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Benjamin J. Stern, Mark Charles Beutnagel, Alistair D. Conkie, Horst J. Schroeter, Amanda Joy Stent
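The caching scheme above (synthesize from cached units, fetch only the missing ones from the server) can be sketched with a dict standing in for the remote unit store:

```python
class UnitCache:
    """Cache of concatenative speech units; a missing unit is fetched once
    from the server (here a plain dict standing in for the remote store)."""

    def __init__(self, server_units):
        self.server = server_units
        self.cache = {}
        self.requests = 0

    def get(self, unit_id):
        if unit_id not in self.cache:
            self.requests += 1          # one round-trip per missing unit
            self.cache[unit_id] = self.server[unit_id]
        return self.cache[unit_id]

    def synthesize(self, unit_ids):
        """Concatenate the waveforms of the requested units."""
        return b"".join(self.get(u) for u in unit_ids)
```

Repeated units cost only one server request, which is the point of the intelligent caching the patent describes.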
  • Patent number: 11335321
    Abstract: A method of building a text-to-speech (TTS) system from a small amount of speech data includes receiving a first plurality of recorded speech samples from an assortment of speakers and a second plurality of recorded speech samples from a target speaker where the assortment of speakers does not include the target speaker. The method further includes training a TTS model using the first plurality of recorded speech samples from the assortment of speakers. Here, the trained TTS model is configured to output synthetic speech as an audible representation of a text input. The method also includes re-training the trained TTS model using the second plurality of recorded speech samples from the target speaker combined with the first plurality of recorded speech samples from the assortment of speakers. Here, the re-trained TTS model is configured to output synthetic speech resembling speaking characteristics of the target speaker.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: May 17, 2022
    Assignee: Google LLC
    Inventors: Ye Jia, Byungha Chun, Yusuke Oda, Norman Casagrande, Tejas Iyer, Fan Luo, Russell John Wyatt Skerry-Ryan, Jonathan Shen, Yonghui Wu, Yu Zhang
  • Patent number: 11336928
    Abstract: Disclosed are various embodiments for predictive caching of identical starting sequences in content. A content item library is scanned to identify an initial portion shared by multiple content items. The initial portion is extracted from a first content item. It is determined that a second content item is to be predictively cached by a client. The initial portion of the first content item is sent to the client in place of the initial portion of the second content item.
    Type: Grant
    Filed: September 24, 2015
    Date of Patent: May 17, 2022
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Kevin Joseph Thornberry, Piers George Cowburn, Olivier Georget
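The shared-initial-portion step can be illustrated with Python's character-wise common-prefix helper; the library contents here are invented stand-ins for content items.

```python
# Identify an initial portion shared by multiple content items, so that
# the prefix extracted from one item can be served in place of the
# identical start of another.

import os

def shared_initial_portion(items):
    """Longest byte prefix common to every item in the content library."""
    return os.path.commonprefix(items)

library = [
    b"INTRO-LOGO|episode-one-data",
    b"INTRO-LOGO|episode-two-data",
]
prefix = shared_initial_portion(library)
# cache the initial portion from the first item only; it can stand in
# for the identical start of any other item in the library
cached = library[0][:len(prefix)]
print(prefix)  # b'INTRO-LOGO|episode-'
```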
  • Patent number: 11335324
    Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: May 17, 2022
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg
  • Patent number: 11336664
    Abstract: Detection of malicious files is disclosed. A set comprising one or more sample classification models is stored on a networked device. N-gram analysis is performed on a sequence of received packets associated with a received file. Performing the n-gram analysis includes using at least one stored sample classification model. A determination is made that the received file is malicious based at least in part on the n-gram analysis of the sequence of received packets. In response to determining that the file is malicious, propagation of the received file is prevented.
    Type: Grant
    Filed: July 19, 2019
    Date of Patent: May 17, 2022
    Assignee: Palo Alto Networks, Inc.
    Inventors: William Redington Hewlett, II, Suiqiang Deng, Sheng Yang, Ho Yu Lam
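The n-gram analysis over received packets can be sketched as byte n-gram counting on the reassembled stream; the feature weighting and the sample classification model itself are omitted, and the packet contents are illustrative.

```python
# Byte n-gram extraction over a sequence of received packets -- the kind
# of feature a stored sample classification model could score.

from collections import Counter

def ngram_counts(packets, n=2):
    """Count byte n-grams across the reassembled packet stream."""
    stream = b"".join(packets)     # n-grams may span packet boundaries
    return Counter(stream[i:i + n] for i in range(len(stream) - n + 1))

packets = [b"\x4d\x5a\x90", b"\x00\x03"]   # e.g. the start of a PE file
counts = ngram_counts(packets, n=2)
print(counts[b"\x4d\x5a"])  # 1
```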
  • Patent number: 11328728
    Abstract: A voice assistant proxy for voice assistant servers and related methods. The voice assistant proxy comprises a processor configured to convert voice data to text using speech-to-text conversion, determine a voice command from the text, determine whether the voice command is associated with sensitive data based on a set of criteria, route the voice command to an enterprise voice assistant server in response to a determination that the voice command is sensitive, and route the voice command to a third-party voice assistant server in response to a determination that the voice command is not sensitive.
    Type: Grant
    Filed: January 20, 2020
    Date of Patent: May 10, 2022
    Assignee: BlackBerry Limited
    Inventors: Michael Peter Montemurro, James Randolph Winter Lepp
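A toy sketch of the sensitivity check and routing decision, assuming keyword matching as one possible criterion; the terms and server names are placeholders, not from the patent.

```python
# Route a voice command to the enterprise or third-party voice assistant
# server based on a (placeholder) sensitivity criterion.

SENSITIVE_TERMS = {"payroll", "password", "customer record"}

def route_command(command_text):
    """Return the destination server for a transcribed voice command."""
    text = command_text.lower()
    if any(term in text for term in SENSITIVE_TERMS):
        return "enterprise_va_server"
    return "third_party_va_server"

print(route_command("Read me the payroll summary"))   # enterprise_va_server
print(route_command("What's the weather tomorrow?"))  # third_party_va_server
```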
  • Patent number: 11328729
    Abstract: System and method for providing presence of modifications in user dictation are disclosed. Exemplary implementations may: obtain primary audio information representing sound, including speech from a recording user, captured by a client computing platform; perform speech recognition on the primary audio information to generate a textual transcript; effectuate presentation of the transcript to the recording user; receive user input from the recording user; alter, based on the received user input from the recording user, a portion of the transcript to generate an altered transcript; effectuate presentation of the altered transcript in conjunction with audio playback of at least some of the primary audio information in a reviewing interface on a client computing platform; receive user input from the reviewing user; alter, based on the received user input from the reviewing user, portions of the altered transcript to generate a reviewed transcript; and store the reviewed transcript in electronic storage.
    Type: Grant
    Filed: February 24, 2020
    Date of Patent: May 10, 2022
    Assignee: Suki AI, Inc.
    Inventor: Matt Pallakoff
  • Patent number: 11322133
    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate expressive audio for input texts based on a word-level analysis of the input text. For example, the disclosed systems can utilize a multi-channel neural network to generate a character-level feature vector and a word-level feature vector based on a plurality of characters of an input text and a plurality of words of the input text, respectively. In some embodiments, the disclosed systems utilize the neural network to generate the word-level feature vector based on contextual word-level style tokens that correspond to style features associated with the input text. Based on the character-level and word-level feature vectors, the disclosed systems can generate a context-based speech map. The disclosed systems can utilize the context-based speech map to generate expressive audio for the input text.
    Type: Grant
    Filed: July 21, 2020
    Date of Patent: May 3, 2022
    Assignee: Adobe Inc.
    Inventors: Sumit Shekhar, Gautam Choudhary, Abhilasha Sancheti, Shubhanshu Agarwal, E Santhosh Kumar, Rahul Saxena
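The two channels can be caricatured as separate character-level and word-level feature extractors; the style-token lookup below is a hand-written stand-in for the learned contextual word-level style tokens the abstract describes.

```python
# Two toy feature channels over the same input text: one per character,
# one per word (scored against illustrative "style tokens").

def char_features(text):
    # character-level channel: normalized character codes
    return [ord(c) / 128.0 for c in text]

def word_features(text, style_tokens):
    # word-level channel: per-word style score from a token lookup
    return [style_tokens.get(word.lower(), 0.0) for word in text.split()]

style_tokens = {"hooray": 0.9, "alas": -0.7}   # illustrative, not learned
text = "Hooray indeed"
char_vec, word_vec = char_features(text), word_features(text, style_tokens)
print(word_vec)  # [0.9, 0.0]
```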
  • Patent number: 11314790
    Abstract: Computing systems, database systems, and related methods are provided for recommending values for fields of database objects and dynamically updating a recommended value for a field of a database record in response to updated auxiliary data associated with the database record. One method involves obtaining conversational data associated with a case database object, segmenting the conversational data, converting each respective segment of conversational data into a numerical representation, generating a combined numerical representation of the conversational data based on the sequence of numerical representations using an aggregation model, generating the recommended value based on the combined numerical representation of the conversational data using a prediction model associated with the field, and autopopulating the field of the case database object with the recommended value.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: April 26, 2022
    Assignee: salesforce.com, inc.
    Inventors: Son Thanh Chang, Weiping Peng, Na Cheng, Feifei Jiang, Jacob Nathaniel Huffman, Nandini Suresh Kumar, Khoa Le, Christopher Larry
  • Patent number: 11316964
    Abstract: Provided is a computer implemented method and system for delivering text messages, emails, and messages from a messenger application to a user while the user is engaged in an activity, such as driving, exercising, or working. Typically, the emails and other messages are announced to the user and read aloud without any user input. In Drive Mode, while the user is driving, a clean interface is shown to the user; the user can hear announcements and messages/emails aloud without looking at the screen of the phone, and can use gestures to operate the phone. After a determination is made that a new text message and/or email has arrived, the user is informed of it aloud; in most instances, if the user takes no further action, the body and/or subject of the text message/email/messenger message is then read aloud to the user. All messages can be placed in a single queue and read to the user in order of receipt.
    Type: Grant
    Filed: January 15, 2021
    Date of Patent: April 26, 2022
    Assignee: Messageloud Inc.
    Inventor: Garin Toren
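The single-queue behavior can be modeled as a FIFO queue shared across channels, so messages are announced strictly in order of receipt; the channel names and messages are illustrative.

```python
# One queue for texts, emails, and messenger messages; items are read
# aloud (here, returned as strings) in arrival order regardless of channel.

from collections import deque

queue = deque()

def receive(channel, body):
    queue.append((channel, body))      # single queue for all channels

def read_next_aloud():
    channel, body = queue.popleft()    # strict order of receipt
    return f"New {channel}: {body}"

receive("email", "Quarterly report attached")
receive("text", "Running 10 minutes late")
print(read_next_aloud())  # New email: Quarterly report attached
print(read_next_aloud())  # New text: Running 10 minutes late
```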
  • Patent number: 11315435
    Abstract: Systems and methods for verbal expression are provided. In one aspect, a verbal expression system may receive a selection of sound identifiers, generate a list of video files associated with the identifiers, receive a selection of one or more video files, concatenate the video files into an assignment file, and map the assignment file to one or more users. Optionally, the verbal expression system may determine user statistics for each user, generate a progress report for each user, and/or transmit the progress report to one or more users.
    Type: Grant
    Filed: February 10, 2020
    Date of Patent: April 26, 2022
    Assignee: Gemiini Educational Systems, Inc.
    Inventor: Laura Marie Kasbar
  • Patent number: 11315544
    Abstract: A method includes: determining, by a computer device, a current context associated with a user that is the target audience of an unprompted verbal output of an interactive computing device; determining, by the computer device, one or more parameters that are most effective in getting the attention of the user for the determined current context; and modifying, by the computer device, the unprompted verbal output of the interactive computing device using the determined one or more parameters.
    Type: Grant
    Filed: June 25, 2019
    Date of Patent: April 26, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Bender, Rhonda L. Childress, Craig M. Trim, Martin G. Keen
  • Patent number: 11308947
    Abstract: A system and method for voice control of a media playback device is disclosed. The method includes receiving an instruction of a voice command, converting the voice command to text, transmitting the text command to the playback device, and having the playback device execute the command. An instruction may include a command to play a set of audio tracks, and the media playback device plays the set of audio tracks upon receiving the instruction.
    Type: Grant
    Filed: May 7, 2018
    Date of Patent: April 19, 2022
    Assignee: Spotify AB
    Inventors: Daniel Bromand, Richard Mitic, Horia Jurcut, Jennifer Thom-Santelli, Henriette Cramer, Karl Humphreys, Bo Williams, Kurt Jacobson, Henrik Lindström
  • Patent number: 11302302
    Abstract: Embodiments of the present disclosure disclose a method, apparatus, device, and storage medium for switching a voice role. The method includes: recognizing a voice-role switching instruction input by a user, and determining a target voice role corresponding to the instruction; switching a current voice role of a smart terminal to the target voice role, where different voice roles have different role attributes, and a role attribute includes a role utterance attribute; generating interactive response information corresponding to an interactive voice input by the user, based on that interactive voice and the role utterance attribute of the target voice role; and providing a response voice corresponding to the interactive response information to the user. The embodiments of the present disclosure enable different voice roles to have different role utterance attributes, giving each voice role a distinct persona.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: April 12, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Yu Wang, Bo Xie
  • Patent number: 11302300
    Abstract: A system and method enable one to set a target duration of a desired synthesized utterance without removing or adding spoken content. Without changing the spoken text, the voice characteristics may be kept the same or substantially the same. Silence adjustment and interpolation may be used to alter the duration while preserving speech characteristics. Speech may be translated prior to a vocoder step, pursuant to which the translated speech is constrained by the original audio duration, while mimicking the speech characteristics of the original speech.
    Type: Grant
    Filed: November 19, 2020
    Date of Patent: April 12, 2022
    Assignee: Applications Technology (AppTek), LLC
    Inventors: Nick Rossenbach, Mudar Yaghi
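The interpolation half of the duration adjustment can be sketched as naive linear resampling of a waveform to a target sample count; a real system would preserve pitch and other speech characteristics (e.g. by interpolating vocoder features rather than raw samples), and the silence-adjustment step is omitted here.

```python
# Stretch or compress a waveform to a target duration (in samples) by
# linear interpolation. Naive: operating on raw samples shifts pitch,
# which the patented approach avoids.

def stretch_to_duration(samples, target_len):
    """Linearly interpolate a waveform to a target sample count."""
    if target_len == 1:
        return [float(samples[0])]
    out = []
    step = (len(samples) - 1) / (target_len - 1)
    for i in range(target_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

audio = [0, 1, 0, -1, 0]            # toy waveform
stretched = stretch_to_duration(audio, 9)
print(len(stretched))  # 9
```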
  • Patent number: 11294621
    Abstract: A control method outputs audio indicating content of an operation of a transmission device connected to a reception device. The control method includes accepting the operation of the transmission device, generating operation data indicating the content of the operation by the transmission device, transmitting the operation data from the transmission device to the reception device, generating audio data indicating the content of the operation based on the operation data by the reception device, and outputting the audio indicated by the audio data by the reception device.
    Type: Grant
    Filed: March 1, 2018
    Date of Patent: April 5, 2022
    Assignee: FUNAI ELECTRIC CO., LTD.
    Inventors: Mitsuru Kawakita, Takuya Suzuki, Kenichi Fukunaka, Yosuke Sonoda, Shigeru Toji, Masahiko Arashi