Translation Patents (Class 704/277)
-
Patent number: 12254868
Abstract: A system and method for providing a custom response to a voice command of a specific user. The method encompasses receiving, at a transceiver unit [102] from a user device, a custom voice response preference setting associated with the specific user. The method thereafter leads to receiving, at the transceiver unit [102] from a first target device, a voice command of the specific user. The method thereafter encompasses generating, by a processing unit [104], a custom response to the voice command of the specific user based at least on the custom voice response preference setting. Further, the method encompasses identifying, by an identification unit [106], a second target device from one or more devices present in the vicinity of the specific user. Thereafter, the method comprises providing, by the processing unit [104], the generated custom response to the voice command of the specific user via the second target device.
Type: Grant
Filed: June 1, 2021
Date of Patent: March 18, 2025
Assignee: JIO PLATFORMS LIMITED
Inventors: Vishal Shashikant Patil, Gulprit Singh, Rajeev Gupta
-
Patent number: 12230260
Abstract: One embodiment provides a method, including: receiving, at an information handling device, text associated with a user command; storing, in a data store, an encrypted form of the text associated with the user command; determining, using a processor, whether the encrypted form of the text has been detected in other user commands in exceedance of a predetermined threshold; and storing, responsive to determining that the encrypted form of the text has been detected in the other user commands in exceedance of the predetermined threshold, an unencrypted transcript of the text in a data table. Other aspects are described and claimed.
Type: Grant
Filed: March 5, 2021
Date of Patent: February 18, 2025
Assignee: Lenovo (Singapore) Pte. Ltd.
Inventors: John Weldon Nicholson, Igor Stolbikov, David Alexander Schwarz
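The claimed flow could be sketched roughly as follows. The threshold value and the use of SHA-256 as the "encrypted form" are assumptions for illustration only; the patent does not specify either.

```python
import hashlib
from collections import Counter

# Hypothetical sketch: commands are stored only as hashes ("encrypted
# form") until the same hash exceeds a threshold count, after which the
# plaintext transcript is kept in a data table.
THRESHOLD = 3  # assumed value; the abstract leaves it unspecified

hash_counts = Counter()
transcript_table = {}

def encrypt(text):
    # Stand-in for the encrypted form; a keyed scheme would be closer
    # to the claim, but SHA-256 keeps the sketch self-contained.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def record_command(text):
    digest = encrypt(text)
    hash_counts[digest] += 1
    # Only once the encrypted form has been seen in exceedance of the
    # threshold is the unencrypted transcript stored.
    if hash_counts[digest] > THRESHOLD and digest not in transcript_table:
        transcript_table[digest] = text
    return digest
```

A frequent command ("turn on the lights", say) would survive in plaintext only after its fourth occurrence under this threshold.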
-
Patent number: 12198700
Abstract: In one aspect, an example method includes (i) obtaining media, wherein the obtained media includes (a) audio representing speech and (b) video; (ii) using at least the audio representing speech as a basis to generate speech text; (iii) using at least the audio representing speech to determine starting and ending time points of the speech; and (iv) using at least the generated speech text and the determined starting and ending time points of the speech to (a) generate closed-captioning or subtitle data that includes closed-captioning or subtitle text based on the generated speech text and (b) associate the generated closed-captioning or subtitle data with the obtained media, such that the closed-captioning or subtitle text is time-aligned with the video based on the determined starting and ending time points of the speech.
Type: Grant
Filed: June 2, 2023
Date of Patent: January 14, 2025
Assignee: Roku, Inc.
Inventors: Snehal Karia, Greg Garner, Sunil Ramesh
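The time-alignment step could be sketched as follows, pairing recognized text with detected start and end times to emit WebVTT-style cues. The recognizer and the speech-interval detector are assumed to exist upstream; the cue format is an illustrative choice, not named in the abstract.

```python
# Sketch: turn (start, end, text) speech segments into time-aligned
# subtitle cues so the text lines up with the video.
def make_cues(segments):
    """segments: list of (start_sec, end_sec, text) tuples."""
    def ts(t):
        # Format seconds as HH:MM:SS.mmm, WebVTT-style.
        h, rem = divmod(int(t), 3600)
        m, s = divmod(rem, 60)
        return f"{h:02d}:{m:02d}:{s:02d}.{int((t % 1) * 1000):03d}"
    return [f"{ts(a)} --> {ts(b)}\n{text}" for a, b, text in segments]
```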
-
Patent number: 12197904
Abstract: An industrial integrated development environment (IDE) supports collaborative editing of translation tables used to facilitate rendering of the system project text in different defined languages. Rather than merging edits by comparing text representations of the edited translation tables on a line-by-line basis, the system expresses the edited and base versions of the translation tables as information models that represent the translation table versions as hierarchical organizations of nodes representing content of the tables, and compares corresponding nodes of the information models to obtain differential statuses for the nodes. The various versions of the nodes are then merged into a single consistent model based on the differential statuses of the nodes.
Type: Grant
Filed: September 23, 2022
Date of Patent: January 14, 2025
Assignee: ROCKWELL AUTOMATION TECHNOLOGIES, INC.
Inventors: Valerio Guarnieri, Alessandro Menon
-
Patent number: 12197882
Abstract: A translation method, an electronic device and a storage medium, which relate to the field of artificial intelligence technologies, such as machine learning and information processing technologies, are disclosed. An implementation includes: acquiring an intermediate translation result generated by each of multiple pre-trained translation models for a to-be-translated specified sentence in a same iteration of a translation process, so as to obtain multiple intermediate translation results; acquiring a co-occurrence word based on the multiple intermediate translation results; and acquiring a target translation result of the specified sentence based on the co-occurrence word.
Type: Grant
Filed: August 10, 2022
Date of Patent: January 14, 2025
Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
Inventors: Ruiqing Zhang, Xiyang Wang, Zhongjun He, Zhi Li, Hua Wu
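The co-occurrence idea could be sketched as below: words appearing in every model's intermediate result for the same iteration are treated as reliable anchors. How those anchors then steer the target translation is not detailed in the abstract, so only the extraction step is shown.

```python
# Sketch: find words common to all models' intermediate translation
# results for the same iteration.
def co_occurrence_words(intermediate_results):
    """intermediate_results: list of token lists, one per model."""
    if not intermediate_results:
        return set()
    common = set(intermediate_results[0])
    for tokens in intermediate_results[1:]:
        common &= set(tokens)
    return common
```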
-
Patent number: 12190876
Abstract: A display device according to an embodiment of the present invention may include: a display unit which displays a content image; a microphone which receives a voice command of a user; a network interface unit for communicating with a natural language processing server and a search server; and a control unit which transmits the received voice command to the natural language processing server, receives intent analysis result information that indicates the intent of the user, which corresponds to the voice command, from the natural language processing server, and performs a function of the display device according to the received intent analysis result information.
Type: Grant
Filed: September 27, 2019
Date of Patent: January 7, 2025
Assignee: LG ELECTRONICS INC.
Inventors: Sangseok Lee, Jaekyung Lee
-
Patent number: 12118981
Abstract: Implementations relate to determining multilingual content to render at an interface in response to a user submitted query. Those implementations further relate to determining a first language response and a second language response to a query that is submitted to an automated assistant. Some of those implementations relate to determining multilingual content that includes a response to the query in both the first and second languages. Other implementations relate to determining multilingual content that includes a query suggestion in the first language and a query suggestion in a second language. Some of those implementations relate to pre-fetching results for the query suggestions prior to rendering the multilingual content.
Type: Grant
Filed: September 15, 2021
Date of Patent: October 15, 2024
Assignee: GOOGLE LLC
Inventors: Wangqing Yuan, Bryan Christopher Horling, David Kogan
-
Patent number: 12112134
Abstract: The technology relates to methods for detecting and classifying emotions in textual communication, and using this information to suggest graphical indicia such as emoji, stickers or GIFs to a user. Two main types of models are fully supervised models and few-shot models. In addition to fully supervised and few-shot models, other types of models focusing on the back-end (server) side or client (on-device) side may also be employed. Server-side models are larger-scale models that can enable higher degrees of accuracy, such as for use cases where models can be hosted on cloud servers where computational and storage resources are relatively abundant. On-device models are smaller-scale models, which enable use on resource-constrained devices such as mobile phones, smart watches or other wearables (e.g., head mounted displays), in-home devices, embedded devices, etc.
Type: Grant
Filed: January 24, 2022
Date of Patent: October 8, 2024
Assignee: GOOGLE LLC
Inventors: Dana Movshovitz-Attias, John Patrick McGregor, Jr., Gaurav Nemade, Sujith Ravi, Jeongwoo Ko, Dora Demszky
-
Patent number: 12094459
Abstract: Methods, systems, and computer program products for automated domain-specific constrained decoding from speech inputs to structured resources are provided herein.
Type: Grant
Filed: January 5, 2022
Date of Patent: September 17, 2024
Assignee: International Business Machines Corporation
Inventors: Ashish R Mittal, Samarth Bharadwaj, Shreya Khare, Karthik Sankaranarayanan
-
Patent number: 12087296
Abstract: A display device according to an embodiment of the present disclosure includes an output unit, a communication unit configured to perform communication with an artificial intelligence server, and a control unit configured to receive a voice command, convert the received voice command into text data, determine whether the converted text data is composed of a plurality of languages, when the text data is composed of the plurality of languages, determine a language for a voice recognition service among the plurality of languages based on the text data, and output an intent analysis result of the voice command in the determined language.
Type: Grant
Filed: September 19, 2019
Date of Patent: September 10, 2024
Assignee: LG ELECTRONICS INC.
Inventors: Changmin Kwak, Jaekyung Lee
-
Patent number: 12079540
Abstract: An electronic device configured to: perform an operation corresponding to a first user voice when the first user voice is received through the microphone; store information about an operation corresponding to the first user voice and user reaction information including the user command when a user command is received through the input unit within the first threshold time from when the first user voice is received, or from when the operation corresponding to the first user voice is performed; perform an operation corresponding to a second user voice when the second user voice is received through the microphone; and provide guide information corresponding to the user command on the basis of the user reaction information stored in the memory when the type of operation corresponding to the first user voice is the same as the type of operation corresponding to the second user voice.
Type: Grant
Filed: October 12, 2021
Date of Patent: September 3, 2024
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Youngsoo Yun
-
Patent number: 12080296
Abstract: Apparatuses, methods, and program products are disclosed for performing a transcription action. One apparatus includes at least one processor and a memory that stores code executable by the at least one processor. The code is executable by the processor to monitor, by use of the at least one processor, a quality of audio information. The code is executable by the processor to determine whether the quality of the audio information is below a predetermined threshold. The code is executable by the processor to, in response to determining that the quality of the audio information is below the predetermined threshold, perform a transcription action corresponding to the audio information.
Type: Grant
Filed: March 16, 2021
Date of Patent: September 3, 2024
Assignee: Lenovo (Singapore) Pte. Ltd.
Inventors: John C. Mese, Arnold S. Weksler, Mark Patrick Delaney, Nathan J. Peterson, Russell Speight VanBlon
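The monitor-and-threshold logic could be sketched as below. The quality metric (mean absolute amplitude as a crude proxy) and the threshold value are assumptions; the abstract names neither the metric nor the specific transcription action.

```python
# Sketch: monitor an audio-quality metric and trigger a transcription
# action when it falls below a predetermined threshold.
def monitor_quality(samples, threshold=0.1):
    """samples: list of normalized audio amplitudes in [-1, 1]."""
    # Crude stand-in metric: very quiet audio is treated as low quality.
    quality = sum(abs(s) for s in samples) / len(samples)
    if quality < threshold:
        return "perform_transcription_action"
    return "ok"
```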
-
Patent number: 12080271
Abstract: Computer generated speech can be generated for cross-lingual natural language textual data streams by utilizing a universal phoneme set. In a variety of implementations, the natural language textual data stream includes a primary language portion in a primary language and a secondary language portion that is not in the primary language. Phonemes corresponding to the secondary language portion can be determined from a set of phonemes in a universal data set. These phonemes can be mapped back to a set of phonemes for the primary language. Audio data can be generated for these phonemes to pronounce the secondary language portion of the natural language textual data stream utilizing phonemes associated with the primary language.
Type: Grant
Filed: August 26, 2022
Date of Patent: September 3, 2024
Assignee: GOOGLE LLC
Inventors: Ami Patel, Siamak Tazari
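The mapping step could be sketched as below: secondary-language phonemes drawn from a universal set are mapped back to the primary language's inventory before audio generation. The table entries are toy assumptions (a few French-to-English substitutions); the patent's universal data set would be far larger.

```python
# Sketch: map phonemes from a universal set back to the primary
# language's phoneme inventory, passing shared phonemes through.
UNIVERSAL_TO_PRIMARY = {
    "ʁ": "r",  # French uvular r -> closest English r (assumed mapping)
    "y": "u",  # French /y/ -> English /u/ (assumed mapping)
}

def map_to_primary(secondary_phonemes):
    # Phonemes already in the primary inventory map to themselves.
    return [UNIVERSAL_TO_PRIMARY.get(p, p) for p in secondary_phonemes]
```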
-
Patent number: 12061873
Abstract: Techniques performed by a data processing system for analyzing the lexical difficulty of words of textual content include analyzing a plurality of textual content sources to determine a first frequency at which each of a plurality of first words appears, analyzing search data to determine a second frequency at which each of the plurality of first words appears in searches for a definition, generating a lexical difficulty model based on the first frequency and the second frequency, the model is configured to receive a word as an input and to output a prediction for how difficult the word is likely to be for a user, receiving a request to analyze first textual content from a client device, analyzing the first textual content using the lexical difficulty model to generate lexical difficulty information, and sending a response to the client device that includes requested information.
Type: Grant
Filed: October 30, 2020
Date of Patent: August 13, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Priyanka Subhash Kulkarni, Robert Rounthwaite
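The two signals the abstract combines, corpus frequency and definition-search frequency, could be sketched as a single score. The specific combination below is an assumption for illustration; the patent trains a model on these features rather than using a fixed formula.

```python
import math

# Sketch: rare words and frequently-looked-up words score as harder.
def difficulty_score(word, corpus_counts, definition_search_counts):
    """corpus_counts / definition_search_counts: word -> count dicts."""
    total = sum(corpus_counts.values()) or 1
    freq = corpus_counts.get(word, 0) / total       # first frequency
    searches = definition_search_counts.get(word, 0)  # second frequency
    # Assumed combination: low corpus frequency and many definition
    # searches both push the score up.
    return math.log1p(searches) - math.log(freq + 1e-9)
```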
-
Patent number: 12057110
Abstract: An information processing method applied to a computation circuit is disclosed. The computation circuit includes a communication circuit and an operation circuit. The method includes controlling, by the computation circuit, the communication circuit to obtain a voice to be identified input by a user; controlling, by the computation circuit, the operation circuit to obtain and call an operation instruction to perform voice identification processing on the voice to be identified to obtain target text information corresponding to the voice to be identified. The operation instruction is a preset instruction for voice identification.
Type: Grant
Filed: December 11, 2020
Date of Patent: August 6, 2024
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD.
Inventors: Tianshi Chen, Shaoli Liu, Zai Wang, Shuai Hu
-
Patent number: 11990117
Abstract: A method for training a speech recognition model includes obtaining a multilingual text-to-speech (TTS) model. The method also includes generating a native synthesized speech representation for an input text sequence in a first language that is conditioned on speaker characteristics of a native speaker of the first language. The method also includes generating a cross-lingual synthesized speech representation for the input text sequence in the first language that is conditioned on speaker characteristics of a native speaker of a different second language. The method also includes generating a first speech recognition result for the native synthesized speech representation and a second speech recognition result for the cross-lingual synthesized speech representation. The method also includes determining a consistent loss term based on the first speech recognition result and the second speech recognition result and updating parameters of the speech recognition model based on the consistent loss term.
Type: Grant
Filed: October 20, 2021
Date of Patent: May 21, 2024
Assignee: Google LLC
Inventors: Zhehuai Chen, Bhuvana Ramabhadran, Andrew Rosenberg, Yu Zhang, Pedro J. Moreno Mengibar
-
Patent number: 11984126
Abstract: A device for recognizing a speech input and an operating method thereof are provided. The device may be configured to: obtain one or more text candidates comprising a character string in which it is predicted that the speech input is to be converted by recognizing a speech input using an automatic speech recognition (ASR) model; extract text history information corresponding to the speech input from a database by comparing the speech input with a plurality of speech signals previously stored in the database; and perform training to adjust a weight of each of the one or more text candidates using the extracted text history information. Also, a method in which the device recognizes a speech input using an AI model may be performed.
Type: Grant
Filed: August 10, 2021
Date of Patent: May 14, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyungtak Choi, Jongyoub Ryu
-
Patent number: 11977852
Abstract: A device configured to receive a sentence that includes a plurality of words. The device is further configured to input the words into a machine learning model that is configured to output a first feature vector based on the words. The device is further configured to identify a keyword within the sentence and to determine that the keyword is an implicit reference to an item. The device is further configured to identify a second feature vector in a reference list that closest matches a numeric value of the first feature vector and to identify an explicit reference in the reference list that is associated with the second feature vector. The device is further configured to replace the keyword with the explicit reference in the sentence and to output the sentence that includes the first explicit reference.
Type: Grant
Filed: January 12, 2022
Date of Patent: May 7, 2024
Assignee: Bank of America Corporation
Inventors: Aaron Michael Hosford, Donatus E. Asumu, Emad Noorizadeh, Ramakrishna Reddy Yannam
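The match-and-replace step could be sketched as below: find the reference-list vector closest to the sentence's feature vector, then swap the implicit keyword for the associated explicit reference. The toy two-dimensional vectors and squared-distance match are assumptions; the patent obtains feature vectors from a trained machine learning model.

```python
# Sketch: resolve an implicit keyword to an explicit reference via
# nearest-vector lookup in a reference list.
def closest_reference(feature_vector, reference_list):
    """reference_list: list of (vector, explicit_reference) pairs."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(reference_list, key=lambda r: dist(r[0], feature_vector))[1]

def resolve_implicit(sentence, keyword, feature_vector, reference_list):
    explicit = closest_reference(feature_vector, reference_list)
    # Replace the implicit keyword with the matched explicit reference.
    return sentence.replace(keyword, explicit)
```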
-
Patent number: 11978541
Abstract: Medical information is communicated between different entities. Personalized models of peoples' understanding of a medical field are created. Role information is used to assign user-appropriate ontologies. More information than mere role may be used for assigning ontologies, such as information on past medical history. The concepts and relationships in different ontologies may be linked, providing a translation from one personalized model to another. The terminology with similar or the same concepts and/or relationships is output for a given user based on their model.
Type: Grant
Filed: August 9, 2019
Date of Patent: May 7, 2024
Assignee: CERNER INNOVATION, INC.
Inventor: John D. Haley
-
Patent number: 11972227
Abstract: A speech translation system and methods for cross-lingual communication that enable users to easily improve and customize content and usage of the system. The methods include, in response to receiving an utterance including a first term associated with a field, translating the utterance into a second language. In response to receiving an indication to add the first term associated with the field to a first recognition lexicon, adding the first term associated with the field and the determined translation to a first machine translation module and to a shared database for a community associated with the field of the first term associated with the field, wherein the first term associated with the field added to the shared database is accessible by the community.
Type: Grant
Filed: December 7, 2021
Date of Patent: April 30, 2024
Assignee: Meta Platforms, Inc.
Inventors: Alexander Waibel, Ian R. Lane
-
Patent number: 11967248
Abstract: A method for foreign language learning between a learner and a terminal, based on video or audio containing a foreign language; in particular, a conversation-based foreign language learning method using a speech recognition function and a TTS function of a terminal. The learner learns a foreign language in the following way: the terminal reads the current learning target sentence to the learner so that the learner can speak it after the terminal, when speech input by the learner in a speech waiting state of the terminal is the same as the current learning target sentence or belongs to the same category as the current learning target sentence; and the terminal and the learner alternately speak sentences one by one when the speech input by the learner is the same as the next sentence after the current learning target sentence or belongs to the same category as that next sentence.
Type: Grant
Filed: December 12, 2019
Date of Patent: April 23, 2024
Inventor: Jangho Lee
-
Patent number: 11922932
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses a set of speech recognition hypothesis samples, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
Type: Grant
Filed: March 31, 2023
Date of Patent: March 5, 2024
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan
-
Patent number: 11908451
Abstract: A text-based virtual object animation generation method includes acquiring text information, where the text information includes an original text of a virtual object animation to be generated; analyzing an emotional feature of the text information; performing speech synthesis according to the emotional feature, a rhyme boundary, and the text information to obtain audio information, where the audio information includes emotional speech obtained by conversion based on the original text; and generating a corresponding virtual object animation based on the text information and the audio information, where the virtual object animation is synchronized in time with the audio information.
Type: Grant
Filed: August 9, 2021
Date of Patent: February 20, 2024
Assignees: Mofa (Shanghai) Information Technology Co., Ltd., Shanghai Movu Technology Co., Ltd.
Inventors: Congyi Wang, Yu Chen, Jinxiang Chai
-
Patent number: 11893359
Abstract: This application discloses an audio processing method and a terminal. The method may include: collecting, by a first terminal, an original speech of a first user, translating the original speech of the first user into a translated speech of the first user, receiving an original speech of a second user that is sent by a second terminal, and translating the original speech of the second user into a translated speech of the second user; sending at least one of the original speech of the first user, the translated speech of the first user, and the translated speech of the second user to the second terminal based on a first setting; and playing at least one of the original speech of the second user, the translated speech of the second user, and the translated speech of the first user based on a second setting.
Type: Grant
Filed: April 14, 2021
Date of Patent: February 6, 2024
Assignee: Huawei Technologies Co., Ltd.
Inventors: Xin Zhang, Gan Zhao
-
Patent number: 11868732
Abstract: This disclosure describes techniques and architectures for evaluating conversations. In some instances, conversations with users, virtual assistants, and others may be analyzed to identify potential risks within a language model that is employed by the virtual assistants and other entities. The potential risks may be evaluated by administrators, users, systems, and others to identify potential issues with the language model that need to be addressed. This may allow the language model to be improved and enhance user experience with the virtual assistants and others that employ the language model.
Type: Grant
Filed: August 8, 2022
Date of Patent: January 9, 2024
Assignee: Verint Americas Inc.
Inventors: Cynthia Freeman, Ian Beaver
-
Patent number: 11853707
Abstract: Technologies are provided for determining deficiencies in narrative textual data that may impact decision-making in a decisional context. A candidate text document and a reference corpus of text may be utilized to generate one or more topic models and document-term matrices, and then to determine a corresponding statistical perplexity and probabilistic coherence. Statistical determinations of the degree to which the candidate deviates from the reference normative corpus are made, in terms of the statistical perplexity and probabilistic coherence of the candidate as compared to the reference. If the difference is statistically significant, a message may be reported to a user, such as the author or an auditor of the candidate text document, so that the user has the opportunity to amend the candidate document so as to improve its adequacy for the decisional purposes in the context at hand.
Type: Grant
Filed: October 13, 2021
Date of Patent: December 26, 2023
Assignee: Cerner Innovation, Inc.
Inventor: Douglas S. McNair
-
Patent number: 11847424
Abstract: Devices and techniques are generally described for data-to-text generation. In various examples, a first machine learned model may receive first data including a structured representation of linguistic data. In various examples, the first machine learned model may generate first output data comprising a first natural language representation of the first data. In at least some examples, a second machine learning model may determine second data indicating that the first natural language representation is a semantically accurate representation of the first data. In some examples, the first output data may be selected for output based at least in part on the second data.
Type: Grant
Filed: March 20, 2020
Date of Patent: December 19, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Hamza Harkous, Isabel Groves, Amir Reza Safari Azar Alamdari
-
Patent number: 11847857
Abstract: A vehicle device setting method including: capturing, by an image sensing unit, a first image frame; recognizing a user ID according to the first image frame; showing ID information of the recognized user ID on a screen or by a speaker; capturing a second image frame; generating a confirm signal when a first user expression is recognized by calculating an expression feature in the second image frame and comparing the recognized expression feature with stored expression data associated with a predetermined user expression to confirm whether the recognized user ID is correct or not according to the second image frame captured after the ID information is shown; controlling an electronic device according to the confirm signal; and entering a data update mode instructed by the user and updating setting information of the electronic device by current electronic device setting according to a saving signal generated by confirming a second user expression in a third image frame captured after the user ID is confirmed.
Type: Grant
Filed: November 24, 2021
Date of Patent: December 19, 2023
Assignee: PIXART IMAGING INC.
Inventors: Liang-Chi Chiu, Yu-Han Chen, Ming-Tsan Kao
-
Patent number: 11843716
Abstract: A translation method includes: a first electronic device establishes a call connection to a second electronic device and then displays a call interface; after receiving a first operation of a first user, the first electronic device switches from displaying the call to displaying a translation interface; when receiving a first speech of the first user in a first language, the translation interface sequentially displays at least a first text and a second text, where the first text is obtained by recognizing the first speech, and the second text is obtained by translating the first speech into a target language; and the first electronic device sends a machine speech in the target language to the second electronic device.
Type: Grant
Filed: June 28, 2022
Date of Patent: December 12, 2023
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Hejin Gu, Long Wang
-
Patent number: 11830494
Abstract: An eyewear device with camera-based compensation that improves the user experience for users having partial blindness or complete blindness. The camera-based compensation determines features, such as objects, and then converts the determined objects to audio that is indicative of the objects and that is perceptible to the eyewear user. The camera-based compensation may use a region-based convolutional neural network (RCNN) to generate a feature map including text that is indicative of objects in images captured by a camera. The feature map is then processed through a speech to audio algorithm featuring a natural language processor to generate audio indicative of the objects in the processed images.
Type: Grant
Filed: December 20, 2022
Date of Patent: November 28, 2023
Assignee: Snap Inc.
Inventor: Stephen Pomes
-
Patent number: 11818406
Abstract: A network-attached storage device (NAS) includes a non-volatile memory module storing a media stream, a network interface, and control circuitry coupled to the non-volatile memory module and to the network interface and configured to connect to a client over a network connection using the network interface, receive a request for the media stream from the client, determine subtitle preferences associated with the request for the media stream, access an audio stream associated with the media stream, generate subtitles based on the audio stream, and send a transport stream to the client over the network connection, the transport stream including the media stream and the subtitles.
Type: Grant
Filed: July 23, 2020
Date of Patent: November 14, 2023
Assignee: Western Digital Technologies, Inc.
Inventor: Ramanathan Muthiah
-
Patent number: 11798538
Abstract: This disclosure relates to answer prediction in a speech processing system. The system may disambiguate entities spoken or implied in a request to initiate an action with respect to a target user. To initiate the action, the system may determine one or more parameters; for example, the target (e.g., a contact/recipient), a source (e.g., a caller/requesting user), and a network (voice over internet protocol (VOIP), cellular, video chat, etc.). Due to the privacy implications of initiating actions involving data transfers between parties, the system may apply a high threshold for a confidence associated with each parameter. Rather than ask multiple follow-up questions, which may frustrate the requesting user, the system may attempt to disambiguate or determine a parameter, and skip a question regarding the parameter if it can predict an answer with high confidence. The system can improve the customer experience while maintaining security for actions involving, for example, communications.
Type: Grant
Filed: September 21, 2020
Date of Patent: October 24, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Christopher Geiger Parker, Piyush Bhargava, Aparna Nandyal, Rajagopalan Ranganathan, Mugunthan Govindaraju, Vidya Narasimhan
-
Patent number: 11792338
Abstract: An image forming system is configured to receive an input of natural language speech. Regardless of whether the natural language speech includes a combination of first words or second words, the image forming system can recognize the natural language speech as an instruction to select a specific print setting displayed on a screen.
Type: Grant
Filed: August 27, 2021
Date of Patent: October 17, 2023
Assignee: Canon Kabushiki Kaisha
Inventors: Toru Takahashi, Yuji Naya, Takeshi Matsumura
-
Patent number: 11734509
Abstract: Methods, systems and computer program products for multi-style text transformation are provided herein. A computer-implemented method includes selecting at least one set of style specifications for transforming at least a portion of input text. The at least one set of style specifications include one or more target writing style domains selected from a plurality of writing style domains, weights for each of the target writing style domains representing relative impact of the target writing style domains for transformation of at least a portion of the input text, and weights for each of a set of linguistic aspects for transformation of at least a portion of the input text. The computer-implemented method also includes generating one or more style-transformed output texts based at least in part on the at least one set of style specifications utilizing at least one unsupervised neural network.
Type: Grant
Filed: December 29, 2020
Date of Patent: August 22, 2023
Assignee: International Business Machines Corporation
Inventors: Abhijit Mishra, Parag Jain, Amar P. Azad, Karthik Sankaranarayanan
-
Patent number: 11721333. Abstract: The disclosure relates to an artificial intelligence (AI) system using a learned AI model according to at least one of machine learning, a neural network, or a deep learning algorithm, and applications thereof. In the disclosure, a control method of an electronic apparatus is provided. The control method comprises the steps of: displaying an image including at least one object; receiving a voice; inputting the voice to an AI model learned by an AI algorithm to identify an object related to the voice among the at least one object included in the image and acquire tag information about the identified object; and providing the acquired tag information. Type: Grant. Filed: January 11, 2019. Date of Patent: August 8, 2023. Assignee: SAMSUNG ELECTRONICS CO., LTD. Inventors: Younghwa Lee, Jinhe Jung, Meejeong Park, Inchul Hwang
-
Patent number: 11721329. Abstract: In the present invention, a method for searching multilingual keywords in a mixlingual speech corpus is proposed. This method is capable of searching audio as well as text keywords. The capability of audio search enables it to search out-of-vocabulary (OOV) words. The capability of searching text keywords enables it to perform semantic search. An advanced application of searching keyword translations in a mixlingual speech corpus is also possible within the posteriorgram framework with this system. Also, a technique for combining information from text and audio keywords is given, which further enhances search performance. This system is based on multiple posteriorgrams based on articulatory classes trained with multiple languages. Type: Grant. Filed: September 10, 2018. Date of Patent: August 8, 2023. Assignees: Indian Institute of Technology, Delhi; Centre for Development of Telematics. Inventors: Arun Kumar, Abhimanyu Popli
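Posteriorgram-based keyword search of the kind this abstract describes is commonly implemented with dynamic time warping (DTW) over frame-wise distances between a keyword template and an utterance, both represented as posteriorgrams (rows = frames, columns = class posteriors). The sketch below shows that generic mechanism under stated assumptions; it is not the patent's specific articulatory-class formulation.

```python
import math

def frame_dist(p, q):
    # 1 - cosine similarity between two posterior vectors
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def dtw_cost(template, utterance):
    """Length-normalised DTW matching cost; lower means a better match."""
    n, m = len(template), len(utterance)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(template[i - 1], utterance[j - 1])
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)

tmpl = [[1.0, 0.0], [0.0, 1.0]]          # toy 2-frame, 2-class posteriorgram
cost_same = dtw_cost(tmpl, tmpl)          # identical content -> low cost
cost_diff = dtw_cost(tmpl, [[0.0, 1.0], [1.0, 0.0]])  # mismatched content
```

A keyword is hypothesised wherever the cost against an utterance window falls below a tuned threshold; scores from text-derived and audio-derived templates can then be combined.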
-
Patent number: 11705122. Abstract: According to one embodiment, the interface-providing apparatus comprises an identifying unit and a generating unit. The identifying unit identifies a keyword from dialogue data including a question text to request information and a response text in reply thereto. The generating unit generates display information to display a user interface for receiving feedback input relating to the degree of usefulness of a keyword when searching for the requested information. Type: Grant. Filed: August 31, 2020. Date of Patent: July 18, 2023. Assignee: KABUSHIKI KAISHA TOSHIBA. Inventors: Kenji Iwata, Hiroshi Fujimura, Takami Yoshida
-
Patent number: 11694682. Abstract: In various embodiments, a voice command is associated with a plurality of processing steps to be performed. The plurality of processing steps may include analysis of audio data using automatic speech recognition, generating and selecting a search query from the utterance text, and conducting a search of a database of items using the search query. The plurality of processing steps may include additional or different steps, depending on the type of the request. In performing one or more of these processing steps, an error or ambiguity may be detected. An error or ambiguity may either halt the processing step or create more than one path of actions. A model may be used to determine if and how to request additional user input to attempt to resolve the error or ambiguity. The voice-enabled device or a second client device is then caused to output a request for the additional user input. Type: Grant. Filed: December 11, 2019. Date of Patent: July 4, 2023. Assignee: Amazon Technologies, Inc. Inventors: Julia Reinspach, Oleg Rokhlenko, Ramakanthachary Gottumukkala, Giovanni Clemente, Ankit Agrawal, Swayam Bhardwaj, Guy Michaeli, Vaidyanathan Puthucode Krishnamoorthy, Costantino Vlachos, Nalledath P. Vinodkrishnan, Shaun M. Vickers, Sethuraman Ramachandran, Charles C. Moore
-
Data generation apparatus and data generation method that generate recognition text from speech data
Patent number: 11694028. Abstract: According to one embodiment, the data generation apparatus includes a speech synthesis unit, a speech recognition unit, a matching processing unit, and a dataset generation unit. The speech synthesis unit generates speech data from an original text. The speech recognition unit generates a recognition text by speech recognition from the speech data. The matching processing unit performs matching between the original text and the recognition text. The dataset generation unit generates a dataset in such a manner that the speech data, from which a recognition text satisfying a certain condition for a matching degree relative to the original text is generated, is associated with the original text, based on the matching result. Type: Grant. Filed: August 31, 2020. Date of Patent: July 4, 2023. Assignee: KABUSHIKI KAISHA TOSHIBA. Inventors: Hiroshi Fujimura, Kenji Iwata, Hui Di, Pengfei Chen
-
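The synthesize-recognize-match-filter pipeline in the 11694028 abstract can be sketched as follows. The TTS and ASR calls are stand-ins passed in as functions (assumptions); only the matching-and-filtering logic is shown, with `difflib`'s similarity ratio as an illustrative "matching degree" and 0.8 as an assumed threshold.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.8   # assumed "certain condition for a matching degree"

def matching_degree(original, recognized):
    return SequenceMatcher(None, original, recognized).ratio()

def build_dataset(texts, synthesize, recognize):
    """Keep a (speech, text) pair only when recognizing the synthesized
    speech recovers the original text closely enough."""
    dataset = []
    for text in texts:
        speech = synthesize(text)         # hypothetical TTS front end
        recognized = recognize(speech)    # hypothetical ASR back end
        if matching_degree(text, recognized) >= MATCH_THRESHOLD:
            dataset.append((speech, text))
    return dataset

dataset = build_dataset(
    ["hello world", "foo"],
    synthesize=lambda t: t.encode(),  # toy "speech data"
    recognize=lambda s: {"hello world": "hello world", "foo": "bar"}[s.decode()],
)
```

The second text is dropped because its (toy) recognition result diverges from the original.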
Patent number: 11687719. Abstract: A method for identifying errors associated with named entity recognition includes recognizing a candidate named entity within a text and extracting a chunk from the text containing the candidate named entity. The method further includes creating a feature vector associated with the chunk and analyzing the feature vector for an indication of an error associated with the candidate named entity. The method also includes correcting the error associated with the candidate named entity. Type: Grant. Filed: March 1, 2021. Date of Patent: June 27, 2023. Assignee: LEVERTON HOLDING LLC. Inventors: Christian Schäfer, Michael Kieweg, Florian Kuhlmann
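A sketch of the chunk-and-feature-vector step from this abstract: extract the candidate entity plus surrounding context, then derive features a downstream classifier could score for error indications. The specific features below (entity length, capitalisation, a leading article that may signal an over-extended span) are illustrative assumptions.

```python
def extract_chunk(tokens, start, end, window=2):
    """Return the candidate entity span plus `window` context tokens per side."""
    return tokens[max(0, start - window):min(len(tokens), end + window)]

def feature_vector(chunk, entity):
    entity_tokens = entity.split()
    return [
        len(entity_tokens),                              # entity length
        sum(t[:1].isupper() for t in entity_tokens),     # capitalised tokens
        int(entity_tokens[0].lower() in {"the", "a", "an"}),  # leading article
        len(chunk),                                      # context size
    ]

tokens = "contract signed by the Acme Corp on Friday".split()
chunk = extract_chunk(tokens, 3, 6)        # candidate span: "the Acme Corp"
feats = feature_vector(chunk, "the Acme Corp")
```

A leading article with mixed capitalisation, as here, is the kind of pattern an error classifier could learn to flag and correct (e.g., trimming the span to "Acme Corp").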
-
Patent number: 11682388. Abstract: An AI apparatus includes a microphone to acquire speech data including multiple languages, and a processor to acquire text data corresponding to the speech data, determine a main language from the languages included in the text data, acquire translated text data obtained by translating, into the main language, a text data portion in a language other than the main language, acquire a morpheme analysis result for the translated text data, extract a keyword for intention analysis from the morpheme analysis result, acquire an intention pattern matched to the keyword, and perform an operation corresponding to the intention pattern. Type: Grant. Filed: June 2, 2022. Date of Patent: June 20, 2023. Assignee: LG ELECTRONICS INC. Inventors: Yejin Kim, Hyun Yu, Jonghoon Chae
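The main-language step in this abstract can be sketched as: tag each token with a language, take the majority language as "main", and translate only the other-language tokens into it. The language tagger and translator are stand-in functions (assumptions).

```python
from collections import Counter

def main_language(tagged_tokens):
    """tagged_tokens: list of (token, language) pairs; majority language wins."""
    counts = Counter(lang for _, lang in tagged_tokens)
    return counts.most_common(1)[0][0]

def normalise(tagged_tokens, translate):
    """Translate non-main-language tokens into the main language."""
    main = main_language(tagged_tokens)
    return [tok if lang == main else translate(tok, main)
            for tok, lang in tagged_tokens]

tagged = [("play", "en"), ("some", "en"), ("musique", "fr")]
text = normalise(tagged, lambda tok, target: {"musique": "music"}[tok])
```

The normalised, single-language text can then go through morpheme analysis and keyword-based intention matching as the abstract describes.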
-
Patent number: 11669107. Abstract: A guidance system S includes a plurality of autonomous mobile robots (1) which guide a user to a destination, and a reception apparatus (2) which is provided separately from the robots (1) and recognizes the destination. Availability of each of the plurality of robots (1) is managed based on a state of the robot and the destination. Type: Grant. Filed: December 25, 2018. Date of Patent: June 6, 2023. Assignee: HONDA MOTOR CO., LTD. Inventor: Kenichiro Sugiyama
-
Patent number: 11669695. Abstract: A translation method, implemented by a computer, includes: converting a text written in a first language into a replacement text in which a named entity in the text is replaced with a predetermined character string; translating the replacement text into a second language by using a text translation model which is a neural network; and translating a named entity corresponding to the predetermined character string in the replacement text into the second language by using a named entity translation model which is a neural network. Type: Grant. Filed: March 17, 2020. Date of Patent: June 6, 2023. Assignee: FUJITSU LIMITED. Inventors: Akiba Miura, Tomoya Iwakura
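The two-model scheme in this abstract can be sketched as: swap the named entity for a placeholder, translate the template with one model, translate the entity with another, then substitute back. Both "models" below are toy lookup functions (assumptions); the placeholder string is likewise assumed.

```python
PLACEHOLDER = "<NE>"   # assumed "predetermined character string"

def translate_with_placeholder(text, entity, translate_text, translate_entity):
    template = text.replace(entity, PLACEHOLDER)
    translated_template = translate_text(template)   # text translation model
    translated_entity = translate_entity(entity)     # named entity translation model
    return translated_template.replace(PLACEHOLDER, translated_entity)

result = translate_with_placeholder(
    "Taro works in Osaka",
    "Osaka",
    lambda t: {"Taro works in <NE>": "Taro travaille à <NE>"}[t],  # toy model
    lambda e: {"Osaka": "Osaka"}[e],                               # toy model
)
```

Keeping the entity out of the main translation pass avoids the text model mangling rare proper nouns.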
-
Patent number: 11646019Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.Type: GrantFiled: July 27, 2021Date of Patent: May 9, 2023Assignee: Google LLCInventors: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan
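An N-best-list loss of the kind this abstract mentions is often formulated, in minimum word error rate training, as the expected word errors over the N-best list relative to the list average, with probabilities renormalised over the list. The sketch below shows that generic formulation as an assumption; the patent's exact loss may differ.

```python
import math

def nbest_loss(hypotheses):
    """hypotheses: list of (log_prob, word_errors) for an N-best list.
    Expected relative word errors under the renormalised model distribution;
    negative when mass favours low-error hypotheses."""
    probs = [math.exp(lp) for lp, _ in hypotheses]
    z = sum(probs)
    probs = [p / z for p in probs]
    mean_err = sum(e for _, e in hypotheses) / len(hypotheses)
    return sum(p * (e - mean_err) for p, (_, e) in zip(probs, hypotheses))
```

Minimising this pushes probability mass toward hypotheses with fewer word errors, directly optimising the metric the recogniser is evaluated on.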
-
Patent number: 11646034. Abstract: An information processing system includes: a first device configured to acquire a user's uttered voice, transfer the user's uttered voice to at least one of a second device and a third device, each actualizing a voice interaction agent, and, when a control command is acquired, convert a control signal based on the acquired control command to a control signal that matches the second device and transmit the converted control signal to the second device; a second device configured to recognize the uttered voice transferred from the first device, and output, to the first device, a control command regarding a recognition result obtained by recognizing the uttered voice and response data based on the control signal; and a third device configured to recognize the uttered voice transferred from the first device, and output, to the first device, a control command regarding a recognition result obtained by recognizing the uttered voice. Type: Grant. Filed: January 8, 2021. Date of Patent: May 9, 2023. Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA. Inventor: Satoshi Aihara
-
Patent number: 11636673. Abstract: A system enhances existing audio-visual content with audio describing the setting of the visual content. A scene annotation module classifies scene elements from an image frame received from a host system and generates a caption describing the scene elements. Type: Grant. Filed: October 31, 2018. Date of Patent: April 25, 2023. Assignee: SONY INTERACTIVE ENTERTAINMENT INC. Inventors: Sudha Krishnamurthy, Justice Adams, Arindam Jati, Masanori Omote, Jian Zheng
-
Patent number: 11630961. Abstract: A device includes a memory adapted to store a list in a file or database comprising a plurality of vocabulary words in a first language and, for each vocabulary word, a corresponding word in a second language, a display device, and a processor. The processor is adapted to receive a plurality of words in the first language, select one or more words among the plurality of words based on one or more predetermined criteria, translate, match or equate the one or more selected words from the first language to words of the second language, and cause the display device to display the plurality of words, wherein one or more first words that are in the plurality of words and are not among the one or more selected words are displayed in the first language, and one or more second words that are in the plurality of words and are among the one or more selected words are displayed in the second language. Type: Grant. Filed: September 14, 2018. Date of Patent: April 18, 2023. Inventor: Robert F. Deming, Jr.
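The display rule in this abstract reduces to: words picked by some criterion are shown in the second language, everything else stays in the first. A minimal sketch, in which the selection criterion and word list are illustrative assumptions:

```python
def render(words, vocabulary, select):
    """vocabulary maps first-language words to second-language words;
    select(word) decides which words are shown translated."""
    return [vocabulary[w] if select(w) and w in vocabulary else w
            for w in words]

vocab = {"dog": "perro", "house": "casa"}
# toy criterion: translate only words that appear in the study list
shown = render(["the", "dog", "sleeps"], vocab, lambda w: w in vocab)
```

Mixing the two languages in running text this way exposes a learner to target vocabulary in context.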
-
Patent number: 11625542. Abstract: A co-user list may be configured based on user interaction in a virtual world environment. A first user may be enabled to navigate the virtual world environment using an instant messenger application that includes the co-user list. A second user that is located proximate to the first user in the virtual world environment may be detected. An attribute associated with the second user may be determined. The co-user list may be configured based on the attribute associated with the second user. Type: Grant. Filed: February 24, 2021. Date of Patent: April 11, 2023. Assignee: Verizon Patent and Licensing Inc. Inventor: David S. Bill
-
Patent number: 11626101. Abstract: Systems and methods are described for processing and interpreting audible commands spoken in one or more languages. Speech recognition systems disclosed herein may be used as a stand-alone speech recognition system or comprise a portion of another content consumption system. A requesting user may provide audio input (e.g., command data) to the speech recognition system via a computing device to request an entertainment system to perform one or more operational commands. The speech recognition system may analyze the audio input across a variety of linguistic models, and may parse the audio input to identify a plurality of phrases and corresponding action classifiers. In some embodiments, the speech recognition system may utilize the action classifiers and other information to determine the one or more identified phrases that appropriately match the desired intent and operational command associated with the user's spoken command. Type: Grant. Filed: October 28, 2021. Date of Patent: April 11, 2023. Assignee: Comcast Cable Communications, LLC. Inventors: George Thomas Des Jardins, Vikrant Sagar
-
Patent number: 11610588. Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating contextually relevant transcripts of voice recordings based on social networking data. For instance, the disclosed systems receive a voice recording from a user corresponding to a message thread including the user and one or more co-users. The disclosed systems analyze acoustic features of the voice recording to generate transcription-text probabilities. The disclosed systems generate term weights for terms corresponding to objects associated with the user within a social networking system by analyzing user social networking data. Using the contextually aware term weights, the disclosed systems adjust the transcription-text probabilities. Based on the adjusted transcription-text probabilities, the disclosed systems generate a transcript of the voice recording for display within the message thread. Type: Grant. Filed: October 28, 2019. Date of Patent: March 21, 2023. Assignee: Meta Platforms, Inc. Inventors: James Matthew Grichnik, Chetan Parag Gupta, Fuchun Peng, Yinan Zhang, Si Chen
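The probability-adjustment step in this abstract can be sketched as boosting each transcription candidate by a social-context term weight and renormalising. The multiplicative weighting scheme and the example names below are illustrative assumptions, not the patent's actual formulation.

```python
def adjust(candidates, term_weights):
    """candidates: {term: acoustic probability};
    term_weights: {term: context weight >= 1}. Boost and renormalise."""
    boosted = {t: p * term_weights.get(t, 1.0) for t, p in candidates.items()}
    z = sum(boosted.values())
    return {t: p / z for t, p in boosted.items()}

# acoustically ambiguous pair; the user's network mentions "grichnik" often,
# so social context gives that term a higher weight
adjusted = adjust({"grichnik": 0.4, "grinch": 0.6}, {"grichnik": 3.0})
best = max(adjusted, key=adjusted.get)
```

The context weight flips the ranking that acoustics alone would produce, which is exactly the kind of personalisation the abstract targets.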