Image To Speech Patents (Class 704/260)
-
Patent number: 11971926
Abstract: Disclosed computer-based systems and methods for analyzing a plurality of audio files corresponding to text-based news stories and received from a plurality of audio file creators are configured to (i) compare quality and/or accuracy metrics of individual audio files against corresponding quality and/or accuracy thresholds, and (ii) based on the comparison: (a) accept audio files meeting the quality and/or accuracy thresholds for distribution to a plurality of subscribers for playback, (b) reject audio files failing to meet one or more certain quality and/or accuracy thresholds, (c) remediate audio files failing to meet certain quality thresholds, and (d) designate, for human review, audio files failing to meet one or more certain quality and/or accuracy thresholds by a predetermined margin.
Type: Grant
Filed: August 17, 2020
Date of Patent: April 30, 2024
Assignee: Gracenote Digital Ventures, LLC
Inventors: Gregory P. Defouw, Venkatarama Anilkumar Panguluri
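The accept/reject/remediate/review triage described in this abstract can be sketched as a simple decision function. The threshold values, the margin, and the choice to send near-misses to human review and quality-only failures to remediation are one hypothetical reading, not the patent's actual logic:

```python
def triage_audio(quality, accuracy, q_thresh=0.8, a_thresh=0.9, review_margin=0.1):
    """Illustrative triage of a narrated audio file against quality and
    accuracy thresholds. All names and constants here are assumptions."""
    if quality >= q_thresh and accuracy >= a_thresh:
        return "accept"
    # Files that miss a threshold only by the predetermined margin
    # are flagged for human review rather than decided automatically.
    if quality >= q_thresh - review_margin and accuracy >= a_thresh - review_margin:
        return "human_review"
    # Quality-only failures may be remediable (e.g., re-recording a
    # segment); accuracy failures are rejected outright.
    if accuracy >= a_thresh:
        return "remediate"
    return "reject"
```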
-
Patent number: 11960852
Abstract: A direct speech-to-speech translation (S2ST) model includes an encoder configured to receive an input speech representation that corresponds to an utterance spoken by a source speaker in a first language and encode the input speech representation into a hidden feature representation. The S2ST model also includes an attention module configured to generate a context vector that attends to the hidden feature representation encoded by the encoder. The S2ST model also includes a decoder configured to receive the context vector generated by the attention module and predict a phoneme representation that corresponds to a translation of the utterance in a second, different language. The S2ST model also includes a synthesizer configured to receive the context vector and the phoneme representation and generate a translated synthesized speech representation that corresponds to a translation of the utterance spoken in the second language.
Type: Grant
Filed: December 15, 2021
Date of Patent: April 16, 2024
Assignee: Google LLC
Inventors: Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz
-
Patent number: 11948559
Abstract: Various embodiments include methods and devices for implementing automatic grammar augmentation for improving voice command recognition accuracy in systems with a small footprint acoustic model. Alternative expressions that may capture acoustic model decoding variations may be added to a grammar set. An acoustic model-specific statistical pronunciation dictionary may be derived by running the acoustic model through a large general speech dataset and constructing a command-specific candidate set containing potential grammar expressions. Greedy based and cross-entropy-method (CEM) based algorithms may be utilized to search the candidate set for augmentations with improved recognition accuracy.
Type: Grant
Filed: March 21, 2022
Date of Patent: April 2, 2024
Assignee: QUALCOMM Incorporated
Inventors: Yang Yang, Anusha Lalitha, Jin Won Lee, Christopher Lott
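The greedy candidate search mentioned in the abstract can be illustrated in a few lines. The `accuracy_fn` callback, the candidate strings, and the cap on additions are stand-ins; a real system would score candidates by decoding a speech dataset with the acoustic model:

```python
def greedy_augment(base_grammar, candidates, accuracy_fn, max_added=3):
    """Greedy grammar augmentation: repeatedly add the candidate
    expression that most improves recognition accuracy, stopping when
    no candidate helps or the cap is reached."""
    grammar = list(base_grammar)
    best = accuracy_fn(grammar)
    for _ in range(max_added):
        gains = []
        for cand in candidates:
            if cand in grammar:
                continue
            acc = accuracy_fn(grammar + [cand])
            if acc > best:
                gains.append((acc, cand))
        if not gains:
            break  # no candidate improves accuracy any further
        best, chosen = max(gains)
        grammar.append(chosen)
    return grammar, best
```

A cross-entropy-method variant would instead sample candidate subsets from a distribution and re-fit that distribution toward the best-scoring subsets.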
-
Patent number: 11922194
Abstract: A method of operating a computing device in support of improved accessibility includes displaying a user interface to an application on a display screen of the computing device, wherein the computing device includes an accessibility assistant that reads an audible description of an element of the user interface; initiating, on the computing device, a virtual assistant that conducts an audible conversation between a user and the virtual assistant through at least a microphone and a speaker associated with the computing device, wherein the virtual assistant is not integrated with an operating system of the computing device; inhibiting an ability of the accessibility assistant to read the audible description of the element of the user interface; and upon transition of the virtual assistant from an active state, enabling the ability of the accessibility assistant.
Type: Grant
Filed: May 19, 2022
Date of Patent: March 5, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jaclyn Carley Knapp, Lisa Stifelman, André Roberto Lima Tapajós, Jin Xu, Steven DiCarlo, Kaichun Wu, Yuhua Guan
-
Patent number: 11915714
Abstract: Methods for modifying audio data include operations for accessing audio data having a first prosody, receiving a target prosody differing from the first prosody, and computing acoustic features representing samples. Computing respective acoustic features for a sample includes computing a pitch feature as a quantized pitch value of the sample by assigning a pitch value, of the target prosody or the audio data, to at least one of a set of pitch bins having equal widths in cents. Computing the respective acoustic features further includes computing a periodicity feature from the audio data. The respective acoustic features for the sample include the pitch feature, the periodicity feature, and other acoustic features. A neural vocoder is applied to the acoustic features to pitch-shift and time-stretch the audio data from the first prosody toward the target prosody.
Type: Grant
Filed: December 21, 2021
Date of Patent: February 27, 2024
Assignees: Adobe Inc., Northwestern University
Inventors: Maxwell Morrison, Juan Pablo Caceres Chomali, Zeyu Jin, Nicholas Bryan, Bryan A. Pardo
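The pitch-bin computation, with bins of equal width in cents, is concrete enough to sketch (a cent is 1/100 of a semitone, so there are 1200 cents per octave). The bin count, floor frequency, and bin width below are assumed values, not the patent's:

```python
import math

def quantize_pitch(f0_hz, n_bins=256, fmin_hz=50.0, bin_width_cents=25.0):
    """Assign a pitch value to one of n_bins bins of equal width in cents.
    The constants are illustrative assumptions."""
    # Distance above the floor frequency, measured in cents.
    cents = 1200.0 * math.log2(f0_hz / fmin_hz)
    bin_index = int(cents // bin_width_cents)
    # Clamp out-of-range pitches into the valid bin range.
    return max(0, min(n_bins - 1, bin_index))
```

Because the bins are uniform in cents rather than in hertz, each octave spans the same number of bins, matching how pitch is perceived.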
-
Patent number: 11888797
Abstract: Emoji-first messaging where text messaging is automatically converted to emojis by an emoji-first application so that only emojis are communicated from one client device to another client device. Each client device has a library of emojis that are mapped to words, which libraries are customizable and unique to the users of the client devices, such that the users can communicate secretly in code. Upon receipt of a string of emojis, a user can select the emoji string to convert to text if desired, for a predetermined period of time.
Type: Grant
Filed: April 20, 2021
Date of Patent: January 30, 2024
Assignee: Snap Inc.
Inventors: Karl Bayer, Prerna Chikersal, Shree K. Nayar, Brian Anthony Smith
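A minimal sketch of the word-to-emoji library idea, assuming a per-user dictionary; because each user can customize their own library, a pair sharing the same mappings effectively has a private code. The mappings shown are invented:

```python
# A hypothetical per-user word-to-emoji library.
LIBRARY = {"pizza": "🍕", "love": "❤️", "later": "🕐"}

def to_emoji(text, library):
    """Replace every word that has a mapping; unmapped words pass
    through here, though an emoji-first app might instead prompt
    the user to add a mapping."""
    return " ".join(library.get(w, w) for w in text.lower().split())

def to_text(emoji_string, library):
    """Reverse lookup, used when the recipient taps the emoji string
    to reveal the underlying text."""
    reverse = {v: k for k, v in library.items()}
    return " ".join(reverse.get(tok, tok) for tok in emoji_string.split())
```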
-
Patent number: 11886483
Abstract: Techniques for recommending media are described. A character preference function comprising a plurality of preference coefficients is accessed. A first character model comprises a first set of attribute values for a plurality of attributes of a first character. A second character model comprises a second set of attribute values for the plurality of attributes of a second character. The first and second characters are associated with a first and a second salience value, respectively. A first character rating is calculated using the plurality of preference coefficients and the first set of attribute values. A second character rating of the second character is calculated using the plurality of preference coefficients with the second set of attribute values. A media rating is calculated based on the first and second salience values and the first and second character ratings. A media item is recommended based on the media rating.
Type: Grant
Filed: May 4, 2020
Date of Patent: January 30, 2024
Assignee: The Nielsen Company (US), LLC
Inventors: Rachel Payne, Meghana Bhatt, Natasha Mohanty
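One plausible reading of the rating arithmetic is a dot product of preference coefficients with attribute values, followed by a salience-weighted average of the character ratings. The abstract does not give exact formulas, so both functions below are assumptions:

```python
def character_rating(preference_coeffs, attribute_values):
    """Rate a character as a weighted sum of its attribute values,
    weighted by the viewer's preference coefficients."""
    return sum(c * a for c, a in zip(preference_coeffs, attribute_values))

def media_rating(ratings, saliences):
    """Combine per-character ratings, weighting each by how salient
    the character is in the media (salience-weighted average; one
    plausible reading of the abstract, not its stated formula)."""
    return sum(r * s for r, s in zip(ratings, saliences)) / sum(saliences)
```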
-
Patent number: 11848001
Abstract: Systems and methods are disclosed for providing non-lexical cues in synthesized speech. An example system includes processor circuitry to generate a breathing cue to enhance speech to be synthesized from text; determine a first insertion point of the breathing cue in the text, wherein the breathing cue is identified by a first tag of a markup language; generate a prosody cue to enhance speech to be synthesized from the text; determine a second insertion point of the prosody cue in the text, wherein the prosody cue is identified by a second tag of the markup language; insert the breathing cue at the first insertion point based on the first tag and the prosody cue at the second insertion point based on the second tag; and trigger a synthesis of the speech from the text, the breathing cue, and the prosody cue.
Type: Grant
Filed: June 23, 2022
Date of Patent: December 19, 2023
Assignee: Intel Corporation
Inventors: Jessica M. Christian, Peter Graff, Crystal A. Nakatsu, Beth Ann Hockey
-
Patent number: 11848004
Abstract: A method for controlling an electronic device includes obtaining a text, obtaining, by inputting the text into a first neural network model, acoustic feature information corresponding to the text and alignment information in which each frame of the acoustic feature information is matched with each phoneme included in the text, identifying an utterance speed of the acoustic feature information based on the alignment information, identifying a reference utterance speed for each phoneme included in the acoustic feature information based on the text and the acoustic feature information, obtaining utterance speed adjustment information based on the utterance speed of the acoustic feature information and the reference utterance speed for each phoneme, and obtaining, based on the utterance speed adjustment information, speech data corresponding to the text by inputting the acoustic feature information into a second neural network model.
Type: Grant
Filed: June 27, 2022
Date of Patent: December 19, 2023
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Sangjun Park, Kihyun Choo
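A rough sketch of how an utterance speed might be derived from frame-to-phoneme alignment information and turned into an adjustment factor. The frame rate, the phonemes-per-second definition of speed, and the ratio-based adjustment are all assumptions layered on the abstract:

```python
def utterance_speed(alignment, frame_rate_hz=80.0):
    """Phonemes per second, from an alignment that maps each acoustic
    frame to the phoneme it belongs to. The frame rate is an assumed
    value, not one given by the patent."""
    n_phonemes = len(set(alignment))
    duration_s = len(alignment) / frame_rate_hz
    return n_phonemes / duration_s

def speed_adjustment(speed, reference_speed):
    """Ratio used to nudge synthesis toward the reference speed
    (values below 1 mean the speech was slower than the reference)."""
    return speed / reference_speed
```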
-
Patent number: 11848655
Abstract: Systems, devices, and methods are provided for multi-stem volume equalization, wherein the volume level of each stem may be adjusted non-uniformly. Audio may be diarized into a plurality of stems, including a separate background-noise stem. The mean and variance of the volume levels of the stems may be computed. Each audio stem may be automatically adjusted based on a stem-specific preference that a user may specify. Viewers may adjust actor volume relative to the mean/variance in a manner that maintains a relative difference in volume levels between stems.
Type: Grant
Filed: September 15, 2021
Date of Patent: December 19, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Mohammed Khalilia, Naveen Sudhakaran Nair
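The per-stem adjustment can be illustrated with a toy equalizer that scales each stem's mean level toward a user preference. The stem names, the plain gain scheme, and the data are all invented for illustration; the patent also considers variance, which this sketch omits:

```python
def equalize_stems(stems, preferences):
    """Scale each stem (e.g., dialogue, music) so its mean volume level
    matches a per-stem target. `stems` maps a stem name to a list of
    frame volume levels; `preferences` maps a name to a target mean.
    Stems without a stated preference are left unchanged."""
    adjusted = {}
    for name, levels in stems.items():
        mean = sum(levels) / len(levels)
        target = preferences.get(name, mean)
        gain = target / mean if mean else 1.0
        adjusted[name] = [v * gain for v in levels]
    return adjusted

out = equalize_stems({"dialogue": [0.2, 0.4], "music": [0.8, 0.8]},
                     {"dialogue": 0.6, "music": 0.4})
```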
-
Patent number: 11837223
Abstract: A human language software defined network (SDN) control system, including: a voice to text machine learning model configured to convert user speech to text; a machine learning language processing engine configured to control the operation of a SDN controller based upon the text; and a machine learning minimal language processing engine configured to control the operation of a SDN element based upon commands from the SDN controller produced by the machine learning language processing engine.
Type: Grant
Filed: December 18, 2020
Date of Patent: December 5, 2023
Assignee: NOKIA SOLUTIONS AND NETWORKS OY
Inventor: Sowrirajan Padmanabhan
-
Patent number: 11830475
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform speech synthesis. One of the methods includes obtaining a training data set for training a first neural network to process a spectral representation of an audio sample and to generate a prediction of the audio sample, wherein, after training, the first neural network obtains spectral representations of audio samples from a second neural network; for a plurality of audio samples in the training data set: generating a ground-truth spectral representation of the audio sample; and processing the ground-truth spectral representation using a third neural network to generate an updated spectral representation of the audio sample; and training the first neural network using the updated spectral representations, wherein the third neural network is configured to generate updated spectral representations that resemble spectral representations generated by the second neural network.
Type: Grant
Filed: June 1, 2022
Date of Patent: November 28, 2023
Assignee: DeepMind Technologies Limited
Inventor: Norman Casagrande
-
Patent number: 11817087
Abstract: Systems and methods for distributing cloud-based language processing services to partially execute in a local device to reduce latency perceived by the user. For example, a local device may receive a request via audio input that requires a cloud-based service to process the request and generate a response. A partial response may be generated locally and played back while a more complete response is generated remotely.
Type: Grant
Filed: August 28, 2020
Date of Patent: November 14, 2023
Assignee: Micron Technology, Inc.
Inventor: Ameen D. Akel
-
Patent number: 11810000
Abstract: Systems and methods for classifying data are disclosed. For example, a system may include at least one memory storing instructions and at least one processor configured to execute the instructions to perform operations. The operations may include receiving training data comprising a class. The operations may include training a data classification model using the training data to generate a trained data classification model. The operations may include receiving additional data comprising labeled samples of an additional class not contained in the training data. The operations may include creating a synthetic data generator. The operations may include training the synthetic data generator to generate synthetic data corresponding to the additional class. The operations may include generating a synthetic classified dataset comprising the additional class. The operations may include retraining the trained data classification model using the synthetic classified dataset.
Type: Grant
Filed: November 30, 2022
Date of Patent: November 7, 2023
Assignee: CAPITAL ONE SERVICES, LLC
Inventors: Austin Walters, Jeremy Goodsitt, Anh Truong
-
Patent number: 11809831
Abstract: A symbol sequence converting apparatus according to an embodiment includes one or more hardware processors. The processors: generate a plurality of candidate output symbol sequences, based on rule information in which input symbols are each associated with one or more output symbols each obtained by converting the corresponding input symbol in accordance with a predetermined conversion condition, the plurality of candidate output symbol sequences each containing one or more of the output symbols and corresponding to an input symbol sequence containing one or more of the input symbols; derive respective confidence levels of the plurality of candidate output symbol sequences by using a learning model; and identify, as an output symbol sequence corresponding to the input symbol sequence, the candidate output symbol sequence corresponding to a highest confidence level.
Type: Grant
Filed: August 27, 2020
Date of Patent: November 7, 2023
Assignee: KABUSHIKI KAISHA TOSHIBA
Inventors: Tomohiro Yamasaki, Yoshiyuki Kokojima
-
Patent number: 11810548
Abstract: A speech translation method using a multilingual text-to-speech synthesis model includes acquiring a single artificial neural network text-to-speech synthesis model trained on a learning text of a first language and learning speech data of the first language corresponding to the learning text of the first language, and a learning text of a second language and learning speech data of the second language corresponding to the learning text of the second language; receiving input speech data of the first language and an articulatory feature of a speaker regarding the first language; converting the input speech data of the first language into a text of the first language; converting the text of the first language into a text of the second language; and generating output speech data for the text of the second language that simulates the speaker's speech.
Type: Grant
Filed: July 10, 2020
Date of Patent: November 7, 2023
Assignee: NEOSAPIENCE, INC.
Inventors: Taesu Kim, Younggun Lee
-
Patent number: 11804209
Abstract: Methods and systems are described herein for generating an audible presentation of a communication received from a remote server. A presentation of a media asset on a user equipment device is generated for a first user. A textual-based communication is received, at the user equipment device, from the remote server. The textual-based communication is transmitted to the remote server by a second user, and the remote server transmits the textual-based communication to the user equipment device responsive to determining that the second user is on a list of users associated with the first user. An engagement level of the first user with the user equipment device is determined. Responsive to determining that the engagement level does not exceed a threshold value, a presentation of the textual-based communication is generated in audible form.
Type: Grant
Filed: December 21, 2022
Date of Patent: October 31, 2023
Assignee: Rovi Product Corporation
Inventor: William Korbecki
-
Patent number: 11794106
Abstract: A non-transitory computer-readable medium including a video game processing program for causing a server to perform functions to control progress of a video game is provided. The functions include: a first arranging function configured to arrange a first object determined based on a user operation at a position determined based on a position of the user terminal in a virtual space corresponding to map information of a real space; a second arranging function configured to arrange a second object with which a predetermined event is associated at a position determined based on a position of the first object after a time determined in accordance with the first object elapses from a time when the first object is arranged; and a generating function configured to generate the predetermined event in accordance with a positional condition regarding the positions of the user terminal and the second object.
Type: Grant
Filed: September 30, 2021
Date of Patent: October 24, 2023
Assignee: SQUARE ENIX CO., LTD.
Inventors: Takamasa Shiba, Hiroshi Kobayashi, Jun Waga, Yutaka Yoshida
-
Patent number: 11756024
Abstract: A first computing device broadcasts a first audio token comprising the first user computing device identifier over two or more audio frequency channels at specified intervals and listens for audio inputs via the two or more audio frequency channels at the specified intervals. The first computing device receives a second audio token generated by a second computing device and communicates the received second audio token to the one or more computing devices. The second computing device receives the first audio token generated by the first computing device and communicates the received first audio token to the one or more computing devices. The one or more computing devices receive the first and second audio tokens, pair the first computing device and the second computing device, and facilitate a transfer of data between the first computing device and the second computing device.
Type: Grant
Filed: July 12, 2021
Date of Patent: September 12, 2023
Assignee: GOOGLE LLC
Inventors: Edward Chiang, Arjita Madan, Gopi Krishna Madabhushi, Heman Khanna, Rohan Laishram, Aviral Gupta
-
Patent number: 11749270
Abstract: An output apparatus according to the present application includes an estimation unit, a decision unit, and an output unit. The estimation unit estimates an emotion of a user from detection information detected by a predetermined detection device. The decision unit decides information to be changed on the basis of the estimated emotion of the user. The output unit outputs information for changing the information to be changed.
Type: Grant
Filed: March 10, 2021
Date of Patent: September 5, 2023
Assignee: YAHOO JAPAN CORPORATION
Inventors: Kota Tsubouchi, Teruhiko Teraoka, Hidehito Gomi, Junichi Sato
-
Patent number: 11749256
Abstract: Among other things, a developer of an interaction application for an enterprise can create items of content to be provided to an assistant platform for use in responses to requests of end-users. The developer can deploy the interaction application using defined items of content and an available general interaction model including intents and sample utterances having slots. The developer can deploy the interaction application without requiring the developer to formulate any of the intents, sample utterances, or slots of the general interaction model.
Type: Grant
Filed: August 10, 2020
Date of Patent: September 5, 2023
Assignee: Voicify, LLC
Inventors: Jeffrey K. McMahon, Robert T. Naughton, Nicholas G. Laidlaw, Alexander M. Dunn, Jason Green
-
Patent number: 11741945
Abstract: An adaptive virtual assistant system can be configured to change an attribute of a virtual assistant based on user responses, environmental conditions, and/or topics of discussion. For example, the virtual assistant system can determine, based at least in part on user data, a communication profile that is associated with the virtual assistant and determine first communication data comprising a first communication attribute based on the communication profile. In some instances, the system can transmit the first communication data to a user device and receive, from the user device, input audio data representing a user utterance. Based at least in part on the input audio data, the system can determine second communication data comprising a second communication attribute and transmit the second communication data to the user device.
Type: Grant
Filed: September 30, 2019
Date of Patent: August 29, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Joseph Daniel Sullivan, Pasquale DeMaio, Akshay Isaac Lazarus, Juliana Saussy
-
Patent number: 11741996
Abstract: In one aspect, an example method includes (i) obtaining a set of user attributes for a user of a content-presentation device; (ii) based on the set of user attributes, obtaining structured data and determining a textual description of the structured data; (iii) transforming, using a text-to-speech engine, the textual description of the structured data into synthesized speech; and (iv) generating, using the synthesized speech and for display by the content-presentation device, a synthetic video of a targeted advertisement comprising the synthesized speech.
Type: Grant
Filed: December 26, 2022
Date of Patent: August 29, 2023
Assignee: Roku, Inc.
Inventors: Sunil Ramesh, Michael Cutter, Charles Brian Pinkerton, Karina Levitian
-
Patent number: 11741429
Abstract: A method, system and computer-readable storage medium for performing a cognitive information processing operation. The cognitive information processing operation includes: receiving data from a plurality of data sources; processing the data from the plurality of data sources to provide cognitively processed insights via an augmented intelligence system, the augmented intelligence system executing on a hardware processor of an information processing system, the augmented intelligence system and the information processing system providing a cognitive computing function; performing an explainability with recourse operation, the explainability with recourse operation providing an assurance explanation regarding the cognitive computing function; and providing the cognitively processed insights to a destination, the destination comprising a cognitive application, the cognitive application enabling a user to interact with the cognitive insights.
Type: Grant
Filed: October 25, 2019
Date of Patent: August 29, 2023
Assignee: Tecnotree Technologies, Inc.
Inventors: Joydeep Ghosh, Jessica Henderson, Matthew Sanchez
-
Patent number: 11735162
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Grant
Filed: August 8, 2022
Date of Patent: August 22, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Patent number: 11735204
Abstract: Methods, systems and apparatuses for computer-generated visualization of speech are described herein. An example method of computer-generated visualization of speech including at least one segment includes: generating a graphical representation of an object corresponding to a segment of the speech; and displaying the graphical representation of the object on a screen of a computing device. Generating the graphical representation includes: representing a duration of the respective segment by a length of the object and representing intensity of the respective segment by a width of the object; and placing, in the graphical representation, a space between adjacent objects.
Type: Grant
Filed: August 17, 2021
Date of Patent: August 22, 2023
Assignee: SomniQ, Inc.
Inventors: Rikko Sakaguchi, Hidenori Ishikawa
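The duration-to-length and intensity-to-width mapping described above is simple to sketch; the scaling constants and gap value below are made up:

```python
def segment_shapes(segments, length_per_second=100.0, width_per_db=2.0, gap=5.0):
    """Map each speech segment (duration_s, intensity_db) to an on-screen
    object: length encodes duration, width encodes intensity, and a fixed
    space separates adjacent objects. All constants are illustrative."""
    shapes, x = [], 0.0
    for duration_s, intensity_db in segments:
        length = duration_s * length_per_second
        width = intensity_db * width_per_db
        shapes.append({"x": x, "length": length, "width": width})
        x += length + gap  # advance past the object plus the gap
    return shapes
```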
-
Patent number: 11735156
Abstract: A speech-processing system receives first audio data corresponding to a first voice and second audio data corresponding to a second voice. The speech-processing system determines vocal characteristics of the second voice and determines output corresponding to the first audio data and the vocal characteristics.
Type: Grant
Filed: August 31, 2020
Date of Patent: August 22, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Alejandro Ricardo Mottini d'Oliveira, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati
-
Patent number: 11727914
Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
Type: Grant
Filed: December 24, 2021
Date of Patent: August 15, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan, Jian Luan, Yu Shi, Malone Ma, Mei-Yuh Hwang
-
Patent number: 11721331
Abstract: Systems and methods for device functionality identification are disclosed. For example, a connected device may be coupled to a secondary device. A user may request operation of the connected device, and a system may determine that the connected device is of a given device type. Based on the connected device being of the given device type, the system may cause another device having an environmental sensor to send sensor data indicating environmental changes sensed by the sensor. The connected device may be operated and the sensor may sense environmental changes caused by operation of the connected device. When the sensed environmental changes indicate the device type of the secondary device, a recommendation to change the device type of the connected device to the device type of the secondary device may be provided to a user device associated with the connected device.
Type: Grant
Filed: December 6, 2019
Date of Patent: August 8, 2023
Assignee: Amazon Technologies, Inc.
Inventor: Jeffrey B Kinsey
-
Patent number: 11699037
Abstract: Systems and methods for increasing the impact of a message for a target individual are provided. An audio recording of the message and audio recordings of the target individual are each associated with transcribed text, which is separated into morphemes. Morphemes in the message are substituted with, or supplemented by, matching morphemes in the audio recordings of the target individual to create a revised version of the audio recording of the message, which is then electronically transmitted to an electronic device associated with the target individual.
Type: Grant
Filed: March 8, 2021
Date of Patent: July 11, 2023
Assignee: Rankin Labs, LLC
Inventor: John Rankin
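The morpheme substitution step can be sketched as a lookup: each morpheme of the message is replaced by a clip of the target individual saying it, when such a recording exists. The clip-id scheme and the fallback naming below are hypothetical:

```python
def personalize_message(message_morphemes, target_recordings):
    """Replace each morpheme of the message with a matching morpheme
    clip recorded from the target individual, when available.
    `target_recordings` maps a morpheme to a clip id (made up here)."""
    revised = []
    for m in message_morphemes:
        # Fall back to the original synthesized morpheme when the
        # target individual was never recorded saying it.
        revised.append(target_recordings.get(m, f"synth:{m}"))
    return revised
```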
-
Patent number: 11694674
Abstract: Techniques for performing text-to-speech are described. An exemplary method includes receiving a request to generate audio from input text; generating audio from the input text by: generating a first number of vectors from phoneme embeddings representing the input text, predicting one or more spectrograms having the first number of frames using multiple scales, wherein a coarser scale influences a finer scale, concatenating the first number of vectors and the predicted one or more spectrograms, generating at least one mel spectrogram from the concatenated vectors and the predicted one or more spectrograms, and converting, with a vocoder, the at least one mel spectrogram to audio; and outputting the generated audio according to the request.
Type: Grant
Filed: May 26, 2021
Date of Patent: July 4, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Syed Ammar Abbas, Bajibabu Bollepalli, Alexis Pierre Moinet, Thomas Renaud Drugman, Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Simon Slangen, Petr Makarov
-
Patent number: 11687722
Abstract: A consistent meaning framework (CMF) graph including a plurality of nodes linked by a plurality of edges is maintained in data storage of a data processing system. Multiple nodes among the plurality of nodes are meaning nodes corresponding to different word meanings for a common word spelling of a natural language. Each of the multiple word meanings has a respective one of a plurality of associated constraints. A natural language communication is processed by reference to the CMF graph. The processing includes parsing the natural language communication and selecting, for each of multiple word spellings in the natural language communication, a selected word meaning from among the word meanings provided by the CMF graph. The selected word meaning for each of the multiple word spellings in the natural language communication is recorded in data storage.
Type: Grant
Filed: February 6, 2020
Date of Patent: June 27, 2023
Inventor: Thomas A. Visel
-
Patent number: 11687727
Abstract: A method for processing a natural language communication includes a processor receiving a natural language communication and identifying, in the natural language communication, an ordered sequence of a plurality of word spellings of a natural human language. The processor performs constraint-based parsing of the natural language communication by reference to a consistent meaning framework (CMF) graph including a plurality of nodes each corresponding to a respective word meaning. The parsing includes scanning the phrase in multiple different parse scan directions and determining a respective meaning associated with each of multiple word spellings in a phrase of the natural language communication by reference to multiple constraints provided by multiple nodes in the CMF graph respectively corresponding to the word meanings.
Type: Grant
Filed: May 6, 2021
Date of Patent: June 27, 2023
Inventor: Thomas A. Visel
-
Patent number: 11681417
Abstract: Techniques are disclosed for increasing accessibility of digital content. For instance, a code for the digital content and one or more accessibility guidelines are received. The code is analyzed to identify a violation of an accessibility guideline. The digital content presented in accordance with the code, data indicative of the violation, and an option to correct the violation are displayed on a User Interface (UI). In response to receiving an input indicative of a selection of the option to correct the violation, one or more correction options to correct the violation are provided. In response to a selection of a correction option, the code is altered based on the selected correction option. The alteration of the code corrects the violation of the accessibility guideline and thereby changes one or more aspects of how the digital content is to be presented.
Type: Grant
Filed: October 23, 2020
Date of Patent: June 20, 2023
Assignee: Adobe Inc.
Inventors: Meera Ramachandran Nair, Manish Kumar Pandey, Majji Kranthi Kumar, Mohit Chaturvedi, Malkeet Singh, Sanjay Kumar Biswas
-
Patent number: 11676577
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for adapting a language model are disclosed. In one aspect, a method includes the actions of receiving transcriptions of utterances that were received by computing devices operating in a domain and that are in a source language. The actions further include generating translated transcriptions of the transcriptions of the utterances in a target language. The actions further include receiving a language model for the target language. The actions further include biasing the language model for the target language by increasing the likelihood of the language model selecting terms included in the translated transcriptions. The actions further include generating a transcription of an utterance in the target language using the biased language model and while operating in the domain.
Type: Grant
Filed: September 9, 2021
Date of Patent: June 13, 2023
Assignee: Google LLC
Inventors: Petar Aleksic, Benjamin Paul Hillson Haynor
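The biasing step above (increasing the likelihood of terms seen in the translated transcriptions) can be sketched for a unigram model. This is a minimal sketch, not Google's implementation; the boost factor and the toy vocabulary are assumptions.

```python
import math

def bias_language_model(logprobs, translated_terms, boost=2.0):
    """Multiply the probability of each term seen in the translated
    transcriptions by `boost`, then renormalize to a valid distribution."""
    probs = {w: math.exp(lp) for w, lp in logprobs.items()}
    for term in translated_terms:
        if term in probs:
            probs[term] *= boost
    total = sum(probs.values())
    return {w: math.log(p / total) for w, p in probs.items()}

# toy target-language unigram model
lm = {"pizza": math.log(0.1), "pijama": math.log(0.1), "casa": math.log(0.8)}
biased = bias_language_model(lm, {"pizza"}, boost=4.0)
# "pizza" appeared in the in-domain translations, so it is now more
# likely than the acoustically similar "pijama"
assert biased["pizza"] > biased["pijama"]
```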
-
Patent number: 11670415
Abstract: Systems and methods are provided for data driven analysis, modeling, and semi-supervised machine learning for qualitative and quantitative determinations. The systems and methods include obtaining data associated with individuals, and determining features associated with the individuals based on the data and similarities among the individuals based on the features. The systems and methods can label some individuals as exemplary, and generate a graph wherein nodes of the graph represent individuals, edges of the graph represent similarity among the individuals, and nodes associated with labeled individuals are weighted. The disclosed systems and methods can apply a weight to unweighted nodes of the graph based on propagating the labels through the graph, where the propagation is based on influence exerted by the weighted nodes on the unweighted nodes. The disclosed systems and methods can provide output associated with the individuals represented on the graph and the associated weights.
Type: Grant
Filed: December 18, 2020
Date of Patent: June 6, 2023
Assignee: INCLUDED HEALTH, INC.
Inventors: Seiji James Yamamoto, Ranjit Chacko
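The propagation step above, where weighted (labeled) nodes exert influence on unweighted neighbors, can be sketched with a simple iterative label propagation. The graph, the pinning rule, and the iteration count are invented for illustration, not taken from the patent.

```python
def propagate_labels(adjacency, seed_weights, iterations=20):
    """Spread weights from labeled (exemplary) nodes to unlabeled ones.
    `adjacency[node]` maps neighbor -> similarity; `seed_weights` pins
    labeled nodes to a fixed weight throughout the iteration."""
    weights = {n: seed_weights.get(n, 0.0) for n in adjacency}
    for _ in range(iterations):
        nxt = {}
        for node, nbrs in adjacency.items():
            if node in seed_weights:          # labeled nodes keep their weight
                nxt[node] = seed_weights[node]
                continue
            total_sim = sum(nbrs.values())
            nxt[node] = (
                sum(sim * weights[nbr] for nbr, sim in nbrs.items()) / total_sim
                if total_sim else 0.0
            )
        weights = nxt
    return weights

# a 3-node chain: only "a" is labeled exemplary
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0},
}
w = propagate_labels(graph, {"a": 1.0})
# influence decays with distance from the labeled node "a"
```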
-
Patent number: 11664011
Abstract: A method of providing a frame-based mel spectral representation of speech includes receiving a text utterance having at least one word and selecting a mel spectral embedding for the text utterance. Each word has at least one syllable and each syllable has at least one phoneme. For each phoneme, the method further includes using the selected mel spectral embedding to: (i) predict a duration of the corresponding phoneme based on corresponding linguistic features associated with the word that includes the corresponding phoneme and corresponding linguistic features associated with the syllable that includes the corresponding phoneme; and (ii) generate a plurality of fixed-length predicted mel-frequency spectrogram frames based on the predicted duration for the corresponding phoneme. Each fixed-length predicted mel-frequency spectrogram frame represents mel-spectral information of the corresponding phoneme.
Type: Grant
Filed: February 9, 2022
Date of Patent: May 30, 2023
Assignee: Google LLC
Inventors: Robert Andrew James Clark, Chun-an Chan, Vincent Ping Leung Wan
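Step (ii) above maps a predicted per-phoneme duration to a count of fixed-length frames. A minimal sketch of that mapping, assuming a 12.5 ms frame hop (a common TTS choice, not stated in the abstract) and simple rounding:

```python
FRAME_MS = 12.5  # assumed fixed frame length in milliseconds

def frames_for_phonemes(durations_ms):
    """Map each phoneme's predicted duration to a count of fixed-length
    spectrogram frames, rounding to the nearest whole frame."""
    return [round(d / FRAME_MS) for d in durations_ms]

# e.g. predicted durations (ms) for three phonemes
print(frames_for_phonemes([50.0, 62.5, 100.0]))  # [4, 5, 8]
```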
-
Patent number: 11645620
Abstract: A method, system and computer-readable storage medium for performing a counterfactual generation operation. The counterfactual generation operation includes: receiving a subject data point; classifying the data point via a trained classifier, the classifying providing a classified data point; identifying a counterfactual using the classified data point, the counterfactual comprising another data point that is close to the subject data point and that results in production of a different outcome when provided to a model, compared to the outcome resulting from the subject data point being provided to the model; and providing the counterfactual to a destination.
Type: Grant
Filed: October 18, 2019
Date of Patent: May 9, 2023
Assignee: Tecnotree Technologies, Inc.
Inventors: Joydeep Ghosh, Shubham Sharma, Jessica Henderson, Matthew Sanchez
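The core idea above, finding a nearby data point that receives a different outcome from the model, can be sketched with a brute-force nearest-first search. The threshold classifier and the single-feature perturbation strategy are toy assumptions, not the patented method.

```python
def classify(point):
    # stand-in "trained classifier": approve when income - debt >= 50
    return "approved" if point["income"] - point["debt"] >= 50 else "denied"

def find_counterfactual(point, step=1, max_steps=200):
    """Try single-feature perturbations, nearest first, until a candidate
    receives a different outcome than the subject data point."""
    original = classify(point)
    for distance in range(1, max_steps + 1):
        for feature in point:
            for sign in (+1, -1):
                candidate = dict(point)
                candidate[feature] += sign * distance * step
                if classify(candidate) != original:
                    return candidate
    return None

applicant = {"income": 80, "debt": 40}   # 80 - 40 = 40 -> denied
cf = find_counterfactual(applicant)
print(cf)  # {'income': 90, 'debt': 40} -- the nearest flip raises income by 10
```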
-
Patent number: 11646021
Abstract: According to one embodiment, an apparatus for processing a voice signal includes a display configured to display an image of a user or a character corresponding to the user, a microphone, a speaker configured to output a voice signal of the user, a memory configured to store a trained voice age conversion model, and a processor configured to, based on changing an age of the user or the character displayed on the display, control the display such that the display displays the user or the character corresponding to the changed age. The processor is further configured to determine a first age that is a current age of the user or the character based on the voice signal of the user inputted through the microphone. Accordingly, convenience of a user may be enhanced.
Type: Grant
Filed: April 16, 2020
Date of Patent: May 9, 2023
Assignee: LG ELECTRONICS INC.
Inventors: Siyoung Yang, Yongchul Park, Sungmin Han, Sangki Kim, Juyeong Jang, Minook Kim
-
Patent number: 11632346
Abstract: A device, such as a head-mounted wearable device (HMWD), provides audible notifications to a user with a voice user interface (VUI). A filtered subset of notifications addressed to the user, such as notifications from contacts associated with specified applications, are processed by a text to speech system that generates audio output for presentation to the user. The audio output may be presented using the HMWD. For example, the audio output generated from a text message received from a contact may be played on the device. The user may provide an input to play the notification again, initiate a reply, or take another action. The input may comprise a gesture on a touch sensor, activation of a button, verbal input acquired by a microphone, and so forth.
Type: Grant
Filed: December 12, 2019
Date of Patent: April 18, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Abinash Mahapatra, Anuj Saluja, Ouning Zhang, Xinyu Miao, Ting Liu, Yanina Potashnik, Alfred Ying Fai Lui, Choon-Mun Hooi, Jeffrey John Easter, Oliver Huy Doan, Jonathan B. Assayag
-
Patent number: 11623347
Abstract: A monitor is installed in an eye of a robot, and an eye image is displayed on the monitor. The robot extracts a feature quantity of an eye of a user from a captured image of the user. The feature quantity of the eye of the user is reflected in the eye image. For example, a feature quantity is a size of a pupillary region and a pupil image, and a form of an eyelid image. A blinking frequency or the like may also be reflected as a feature quantity. A familiarity is set with respect to each user, and which user's feature quantity is reflected may be determined in accordance with the familiarity.
Type: Grant
Filed: May 23, 2019
Date of Patent: April 11, 2023
Assignee: GROOVE X, INC.
Inventor: Kaname Hayashi
-
Patent number: 11610582
Abstract: Methods and systems are presented for translating informal utterances into formal texts. Informal utterances may include words in abbreviation forms or typographical errors. The informal utterances may be processed by mapping each word in an utterance into a well-defined token. The mapping from the words to the tokens may be based on a context associated with the utterance derived by analyzing the utterance in a character-by-character basis. The token that is mapped for each word can be one of a vocabulary token that corresponds to a formal word in a pre-defined word corpus, an unknown token that corresponds to an unknown word, or a masked token. Formal text may then be generated based on the mapped tokens. Through the processing of informal utterances using the techniques disclosed herein, the informal utterances are both normalized and sanitized.
Type: Grant
Filed: March 26, 2020
Date of Patent: March 21, 2023
Assignee: PayPal, Inc.
Inventors: Sandro Cavallari, Yuzhen Zhuo, Van Hoang Nguyen, Quan Jin Ferdinand Tang, Gautam Vasappanavara
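The three token types above (vocabulary, unknown, masked) can be sketched with a word-level mapper. The abbreviation table, vocabulary, and digit-masking rule below are illustrative assumptions; the patent describes a learned character-level mapping, not this lookup.

```python
import re

VOCAB = {"please", "send", "the", "payment", "today"}
ABBREVIATIONS = {"pls": "please", "2day": "today"}  # assumed expansion table

def tokenize(utterance):
    """Map each word to a vocabulary token, an unknown token, or a
    masked token (here: long digit runs are sanitized)."""
    tokens = []
    for word in utterance.lower().split():
        word = ABBREVIATIONS.get(word, word)
        if re.fullmatch(r"\d{4,}", word):
            tokens.append("<MASK>")          # sanitize, e.g. account numbers
        elif word in VOCAB:
            tokens.append(word)              # formal word in the corpus
        else:
            tokens.append("<UNK>")           # unknown word
    return tokens

print(tokenize("pls send payment 12345 2day"))
# ['please', 'send', 'payment', '<MASK>', 'today']
```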
-
Patent number: 11611649
Abstract: Provided is a computer implemented method and system for delivering text messages, emails, and messages from a messenger application to a user while the user is engaged in an activity, such as driving, exercising, or working. Typically, the emails and other messages are announced to the user and read aloud without any user input. In Drive Mode, while the user is driving, a clean interface is shown to the user, and the user can hear announcements and messages/emails aloud without looking at the screen of the phone, and use gestures to operate the phone. After a determination is made that a new text message and/or email has arrived, the user is informed aloud of the text message/email/messenger message and, in most instances, if the user takes no further action, the body and/or subject of the text message/email/messenger message is read aloud to the user. All messages can be placed in a single queue, and read to the user in order of receipt.
Type: Grant
Filed: March 21, 2022
Date of Patent: March 21, 2023
Assignee: MESSAGELOUD INC
Inventor: Garin Toren
-
Patent number: 11600259
Abstract: Provided are a voice synthesis method, an apparatus, a device, and a storage medium, involving obtaining text information and determining characters in the text information and a text content of each of the characters; performing a character recognition on the text content of each of the characters, to determine character attribute information of each of the characters; obtaining speakers in one-to-one correspondence with the characters according to the character attribute information of each of the characters, where the speakers are pre-stored pronunciation objects having the character attribute information; and generating multi-character synthesized voices according to the text information and the speakers corresponding to the characters of the text information. These improvements increase the pronunciation diversity of different characters in the synthesized voices, help an audience distinguish between different characters in the synthesized voices, and thereby improve the user experience.
Type: Grant
Filed: September 10, 2019
Date of Patent: March 7, 2023
Inventor: Jie Yang
-
Patent number: 11586835
Abstract: An apparatus, method, and computer readable medium for generating and displaying a dynamic language translation overlay that include accessing a frame buffer of the GPU, analyzing, in the frame buffer of the GPU, a frame representing a section of a stream of displayed data that is being displayed by a display device, based on the analyzed frame, identifying a reference patch that includes an instruction to identify an object comprising original text, based on the instruction included in the reference patch, recognizing the original text, generating translated text, generating an overlay comprising an augmentation layer, the augmentation layer including the translated text, and overlaying the overlay, onto the displayed data such that the translated text is viewable while the original text is obscured from view.
Type: Grant
Filed: February 18, 2022
Date of Patent: February 21, 2023
Assignee: MOBEUS INDUSTRIES, INC.
Inventors: Dharmendra Etwaru, Michael R. Sutcliff
-
Patent number: 11587548
Abstract: Presented herein are novel approaches to synthesize video of the speech from text. In a training phase, embodiments build a phoneme-pose dictionary and train a generative neural network model using a generative adversarial network (GAN) to generate video from interpolated phoneme poses. In deployment, the trained generative neural network in conjunction with the phoneme-pose dictionary convert an input text into a video of a person speaking the words of the input text. Compared to audio-driven video generation approaches, the embodiments herein have a number of advantages: 1) they only need a fraction of the training data used by an audio-driven approach; 2) they are more flexible and not subject to vulnerability due to speaker variation; and 3) they significantly reduce the preprocessing, training, and inference times.
Type: Grant
Filed: April 2, 2021
Date of Patent: February 21, 2023
Assignee: Baidu USA LLC
Inventors: Sibo Zhang, Jiahong Yuan, Miao Liao, Liangjun Zhang
-
Patent number: 11580955
Abstract: A speech-processing system receives input data representing text. A first encoder processes segments of the text to determine embedding data representing the text, and a second encoder processes corresponding audio data to determine prosodic data corresponding to the text. The embedding and prosodic data is processed to create output data including a representation of speech corresponding to the text and prosody.
Type: Grant
Filed: March 31, 2021
Date of Patent: February 14, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Yixiong Meng, Roberto Barra Chicote, Grzegorz Beringer, Zeya Chen, Jie Liang, James Garnet Droppo, Chia-Hao Chang, Oguz Hasan Elibol
-
Patent number: 11580952
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
Type: Grant
Filed: April 22, 2020
Date of Patent: February 14, 2023
Assignee: Google LLC
Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
-
Patent number: 11580970
Abstract: A method, an electronic device and computer readable medium for dialogue breakdown detection are provided. The method includes obtaining a verbal input from an audio sensor. The method also includes generating a reply to the verbal input. The method additionally includes identifying a local context from the verbal input and a global context from the verbal input, additional verbal inputs previously received by the audio sensor, and previous replies generated in response to the additional verbal inputs. The method further includes identifying a dialogue breakdown in response to determining that the reply does not correspond to the local context and the global context. In addition, the method includes generating sound corresponding to the reply through a speaker when the dialogue breakdown is not identified.
Type: Grant
Filed: March 23, 2020
Date of Patent: February 14, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: JongHo Shin, Alireza Dirafzoon, Aviral Anshu
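The breakdown condition above (the reply corresponds to neither the local nor the global context) can be sketched with a crude lexical-overlap check. The Jaccard measure and the threshold are illustrative stand-ins for whatever learned correspondence the patent describes.

```python
def overlap(a, b):
    """Jaccard overlap between the word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def is_breakdown(reply, local_context, global_context, threshold=0.1):
    """Flag a breakdown when the reply matches neither the local context
    (latest input) nor the global context (conversation history)."""
    return (overlap(reply, local_context) < threshold
            and overlap(reply, global_context) < threshold)

history = "what is the weather today in seattle"
latest = "will it rain today"
assert not is_breakdown("rain is likely today", latest, history)
assert is_breakdown("i like pizza toppings", latest, history)
```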
-
Patent number: 11580963
Abstract: A speech generation method and apparatus are disclosed. The speech generation method includes obtaining, by a processor, a linguistic feature and a prosodic feature from an input text, determining, by the processor, a first candidate speech element through a cost calculation and a Viterbi search based on the linguistic feature and the prosodic feature, generating, at a speech element generator implemented at the processor, a second candidate speech element based on the linguistic feature or the prosodic feature and the first candidate speech element, and outputting, by the processor, an output speech by concatenating the second candidate speech element and a speech sequence determined through the Viterbi search.
Type: Grant
Filed: October 14, 2020
Date of Patent: February 14, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jangsu Lee, Hoshik Lee, Jehun Jeon
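The cost calculation and Viterbi search above are the classic unit-selection recipe: pick one candidate speech element per position so that the sum of target costs (fit to the desired features) and join costs (smoothness between adjacent elements) is minimal. A toy sketch follows; the numeric "units" and cost functions are invented, not Samsung's.

```python
def viterbi_select(candidates, target_cost, join_cost):
    """candidates: one list of candidate units per position.
    Return the unit sequence minimizing total target + join cost."""
    # best[u] = (accumulated cost, path) for each candidate at position t
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}
    for t in range(1, len(candidates)):
        nxt = {}
        for u in candidates[t]:
            cost, path = min(
                (pc + join_cost(p, u), pp) for p, (pc, pp) in best.items()
            )
            nxt[u] = (cost + target_cost(t, u), path + [u])
        best = nxt
    return min(best.values())[1]

# toy problem: pick numbers close to targets with smooth transitions
targets = [1.0, 2.0, 3.0]
cands = [[0.5, 1.2], [1.8, 3.0], [2.9, 4.0]]
path = viterbi_select(
    cands,
    target_cost=lambda t, u: abs(u - targets[t]),   # fit to target
    join_cost=lambda a, b: abs(b - a),              # smooth concatenation
)
print(path)  # [1.2, 1.8, 2.9]
```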