Abstract: An electronic apparatus includes a touch screen display, a microphone, at least one speaker, a wireless communication circuit, a processor, and a memory. The memory stores instructions that, when executed, cause the processor to receive a first user utterance input, to transmit first data associated with the first user utterance input to an external server, to receive a first response, to provide the first sample utterances, to receive a first user input for selecting one of the first sample utterances, to transmit second data associated with the first user input to the external server, and to perform the second task by causing the electronic apparatus to have a sequence of states. The first user utterance input includes a request for performing a first task. The first response includes first sample utterances indicating a second task.
Type:
Grant
Filed:
April 30, 2018
Date of Patent:
February 2, 2021
Assignee:
Samsung Electronics Co., Ltd.
Inventors:
Woo Up Kwon, Jung Hoe Kim, Kyu Young Kim, Hyun Jin Park, Kyoung Gu Woo, In Jong Rhee, Dong Ho Jang
Abstract: In some examples, natural language processing based sign language generation may include ascertaining a speech video that is selected by a user, and determining, based on application of natural language processing to contents of the speech video, a plurality of sentences included in the speech video. For each sentence of the plurality of sentences identified in the speech video, a sign language sentence type, a sign language sentence structure, and a sentiment may be determined. For each sign language sentence structure and based on a corresponding sentiment, a sign video may be determined. Based on the sign video determined for each sentence of the plurality of sentences identified in the speech video, a combined sign video may be generated.
Abstract: A computer-operable method is described for transforming phonemes, graphemes, and other language structures into interactive elements. The method may comprise receiving a word, wherein the word consists of a group of phonemes; forming a group of graphemes, wherein the group of graphemes is constructed using information relating to the group of phonemes; and forming a group of manipulatives, wherein the group of manipulatives is constructed using information relating to the group of phonemes or the group of graphemes.
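The phoneme-to-grapheme-to-manipulative pipeline described in this abstract could be sketched as follows. This is a minimal illustration, not the patented method; the mapping table and the manipulative fields are assumptions.

```python
# Hypothetical sketch: map a word's phonemes to graphemes, then wrap each
# grapheme in a "manipulative" record that a UI could render as an
# interactive tile. The phoneme-to-grapheme table is a toy assumption;
# real systems would use an alignment model or pronunciation lexicon.

PHONEME_TO_GRAPHEME = {"K": "c", "AE": "a", "T": "t"}  # assumed mapping

def word_to_manipulatives(phonemes):
    graphemes = [PHONEME_TO_GRAPHEME[p] for p in phonemes]
    # Each manipulative pairs a grapheme with its source phoneme.
    return [{"grapheme": g, "phoneme": p, "draggable": True}
            for g, p in zip(graphemes, phonemes)]

tiles = word_to_manipulatives(["K", "AE", "T"])  # the word "cat"
print([t["grapheme"] for t in tiles])  # → ['c', 'a', 't']
```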
Abstract: A technique is described herein for transforming audio content into images. The technique may include: receiving the audio content from a source; converting the audio content into a temporal stream of audio features; and converting the stream of audio features into one or more images using one or more machine-trained models. The technique generates the image(s) based on recognition of: semantic information that conveys one or more semantic topics associated with the audio content; and sentiment information that conveys one or more sentiments associated with the audio content. The technique then generates an output presentation that includes the image(s), which it provides to one or more display devices for display thereat. The output presentation serves as a summary of salient semantic and sentiment-related characteristics of the audio content.
Abstract: Methods and systems for providing a correct voice command. One system includes a communication device that includes an electronic processor configured to receive a first voice command via a microphone and analyze the first voice command using a first type of voice recognition. The electronic processor determines that an action to be performed in accordance with the first voice command is unrecognizable based on the analysis using the first type of voice recognition. The electronic processor transmits the first voice command to a remote electronic computing device along with a request that the first voice command be analyzed using a second type of voice recognition different from the first type of voice recognition. The electronic processor receives, from the remote electronic computing device, a second voice command corresponding to the action and different from the first voice command, and outputs, with a speaker, the second voice command.
Type:
Grant
Filed:
November 13, 2018
Date of Patent:
January 5, 2021
Assignee:
MOTOROLA SOLUTIONS, INC.
Inventors:
Ming Yeh Koh, Hee Tat Goey, Bing Qin Lim, Yan Pin Ong
Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums for determining states of content characteristics of electronic messages. In some embodiments, the probabilities of the states of the content characteristics of electronic messages are determined. Some embodiments determine scores for states of content characteristics. Some embodiments determine a score for electronic messages for content characteristic diversity and inclusion based on a probability of a gender-bias state, a probability of a gender-neutral state, and a probability that neither the gender-bias state nor the gender-neutral state applies. In some embodiments, the probabilities are determined based on a natural language model that is trained with data structures that relate training phrases to states of content characteristics.
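The scoring step in this abstract combines three state probabilities into one message-level score. A minimal sketch of one way to do that is below; the weighting (reward neutral, penalize biased, ignore not-applicable) is an illustrative assumption, not the patented scoring function.

```python
# Hypothetical sketch: combine per-state probabilities into a single
# diversity-and-inclusion score for a message. The weighting scheme is
# an assumption for illustration only.

def di_score(p_bias, p_neutral, p_na):
    # The three state probabilities should form a distribution.
    assert abs(p_bias + p_neutral + p_na - 1.0) < 1e-9
    # Reward gender-neutral language, penalize gender-biased language;
    # "not applicable" text neither helps nor hurts the score.
    return round(p_neutral - p_bias, 6)

print(di_score(0.1, 0.7, 0.2))  # → 0.6
```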
Abstract: An information processing apparatus includes a processor that executes an information processing program to operate as a text obtaining unit that obtains a text indicating an issue that has occurred in an electronic apparatus, a keyword extracting unit that tokenizes the text and filters the words obtained by tokenizing the text to extract a keyword, a maintenance-information obtaining unit that obtains one or more maintenance-information-items associated with the keyword from a database that stores a plurality of maintenance-information-items, each of the plurality of maintenance-information-items being information about a solution to an issue that has occurred in the electronic apparatus, and a maintenance-information providing unit that provides the obtained maintenance-information-item or -items to a user.
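The tokenize → filter → lookup pipeline in this abstract could be sketched as follows. The stop-word list and the maintenance database are stand-ins invented for illustration.

```python
# Hypothetical sketch of the keyword-extraction and lookup pipeline.
# Stop words and database contents are illustrative assumptions.

STOP_WORDS = {"the", "is", "a", "on", "in", "my"}

# Toy maintenance database keyed by extracted keywords.
MAINTENANCE_DB = {
    "jam": ["Open tray 2 and remove the jammed sheet."],
    "streaks": ["Clean the transfer roller.", "Replace the drum unit."],
}

def extract_keywords(text):
    tokens = text.lower().replace(".", "").split()     # tokenize
    return [t for t in tokens if t not in STOP_WORDS]  # filter

def maintenance_info(text):
    items = []
    for kw in extract_keywords(text):
        items.extend(MAINTENANCE_DB.get(kw, []))
    return items

print(maintenance_info("The paper jam is in my printer."))
```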
Abstract: A method of generating text from speech using video-speech matching from a primary user is disclosed herein. The method first receives a video input and an audio input. The video and audio inputs are then segmented into a plurality of video and audio features, respectively. The plurality of video and audio features are then matched according to their similarities. A primary speaker is then determined from one of the matched video and audio features. The primary speaker's matched video and audio features are then used to generate a text representative of the primary speaker's speech.
Type:
Grant
Filed:
February 20, 2019
Date of Patent:
December 29, 2020
Assignee:
Valyant AI, Inc.
Inventors:
Robley Carpenter, II, Benjamin Thielker
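The matching step in the abstract above (pairing video features with audio features by similarity and picking the primary speaker) could be sketched as follows. The feature vectors and the use of cosine similarity are assumptions for illustration.

```python
import math

# Hypothetical sketch: match per-face video features (e.g., lip-motion
# descriptors) against concurrent audio features by cosine similarity,
# and pick the best-matched face as the primary speaker.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def primary_speaker(video_feats, audio_feat):
    # Return the index of the video segment most similar to the audio.
    scores = [cosine(v, audio_feat) for v in video_feats]
    return scores.index(max(scores))

faces = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3]]  # toy per-face motion features
speech = [0.1, 0.9, 0.2]                    # toy concurrent audio features
print(primary_speaker(faces, speech))       # → 1 (face 1 tracks the speech)
```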
Abstract: Systems, devices, and methods are described for reducing degradation of a voice recognition input. An always listening device may always be listening for voice commands via a microphone and may experience interference from unwanted audio such as from the output audio of television speakers. The always listening device may receive data associated with the output audio over a first communications channel. The always listening device may also receive, on a second communications channel, timing information associated with the data. The always listening device may adjust admission of the audio received by the microphone so that it arrives at approximately the same time as the data received via the first communications channel. The unwanted output audio included in the audio received via the microphone may then be determined and removed so that a voice command in the audio received by the microphone may be processed.
Type:
Grant
Filed:
January 25, 2019
Date of Patent:
December 15, 2020
Assignee:
Comcast Cable Communications, LLC
Inventors:
Ross Gilson, Michael Sallas, Scott David Kurtz, Gary Skrabutenas, Christopher Stone
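The core idea in the abstract above (align the known speaker output with the microphone stream, then subtract it) can be sketched in a few lines. The fixed integer offset and sample values are illustrative assumptions; a real device would estimate the offset from the timing channel and handle acoustic filtering.

```python
# Hypothetical sketch of alignment-and-subtraction: the device knows the
# TV's output samples (first channel) and a timing offset (derived from
# the second channel), lines the streams up, and subtracts the known
# audio so only the voice command remains.

def remove_known_audio(mic, reference, offset):
    """mic[i] is assumed to contain reference[i - offset] plus the voice."""
    cleaned = []
    for i, sample in enumerate(mic):
        j = i - offset
        interference = reference[j] if 0 <= j < len(reference) else 0
        cleaned.append(sample - interference)
    return cleaned

voice = [0, 1, 0, -1, 0, 1]   # the wanted command audio
tv = [5, -3, 2, 4, -2, 1]     # known speaker output
offset = 2                    # mic hears the TV two samples late
mic = [v + (tv[i - offset] if i >= offset else 0) for i, v in enumerate(voice)]
print(remove_known_audio(mic, tv, offset))  # recovers the voice samples
```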
Abstract: A response sentence generation apparatus includes a conversion device for converting an input voice of a user into text information, an extraction device for extracting prosodic information from the input voice, a specifying device for specifying an emotion occurrence word indicating an occurrence of an emotion of the user based on the text information and the prosodic information, and a generation device for selecting a character string including the specified emotion occurrence word from the text information and generating a response sentence by performing predetermined processing on the selected character string.
Abstract: Methods, apparatuses, and computer program products are described herein that are configured to express a time in an output text. In some example embodiments, a method is provided that comprises identifying a time period to be described linguistically in an output text. The method of this embodiment may also include identifying a communicative context for the output text. The method of this embodiment may also include determining one or more temporal reference frames that are applicable to the time period and a domain defined by the communicative context. The method of this embodiment may also include generating a phrase specification that linguistically describes the time period based on a descriptor that is defined by a temporal reference frame of the one or more temporal reference frames. In some examples, the descriptor specifies a time window that is inclusive of at least a portion of the time period to be described linguistically.
Abstract: An audio signal encoding method is provided comprising: receiving first and second audio signal frames; processing a second portion of the first audio signal frame and a first portion of the second audio signal frame using an orthogonal transformation to determine in part a first intermediate encoding result; and processing the first intermediate encoding result using an orthogonal transformation to determine a set of spectral coefficients that corresponds to at least a portion of the first audio signal frame.
Type:
Grant
Filed:
April 30, 2018
Date of Patent:
November 24, 2020
Assignee:
DTS, Inc.
Inventors:
Michael M. Goodwin, Antonius Kalker, Albert Chau
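The overlapping-transform structure in the abstract above (transforming the second half of one frame together with the first half of the next) is the idea behind lapped transforms such as the MDCT. A minimal sketch using a plain DCT-II over the lapped block is below; the DCT choice and frame sizes are illustrative assumptions, not the patented transform.

```python
import math

# Hypothetical sketch of the lapped-transform idea: take the second half
# of frame 1 together with the first half of frame 2 and apply an
# orthogonal transform (a DCT-II here) to that overlapping block, so
# frame boundaries do not produce blocking artifacts.

def dct_ii(x):
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * m * (2 * k + 1) / (2 * n))
                for k in range(n)) for m in range(n)]

def encode_boundary(frame1, frame2):
    half = len(frame1) // 2
    lapped = frame1[half:] + frame2[:half]  # overlapping analysis block
    return dct_ii(lapped)  # spectral coefficients for the boundary region

coeffs = encode_boundary([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
print(len(coeffs))  # one coefficient per lapped sample
```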
Abstract: Candidate identification and matching for professional positions, and associated systems and methods are disclosed herein. A representative method includes obtaining first key phrase groups based on textual input, converting the first key phrase groups into vectors defined in accordance with a collection of key phrases, generating a set of topics based on the vectors, generating second key phrase groups based on an association between individual topics of the set of topics and the collection of key phrases, and identifying documentation associated with one or more candidates for a professional position based on the second key phrase groups.
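The first step of this abstract, converting key-phrase groups into vectors defined over a collection of key phrases, is a bag-of-phrases encoding. A minimal sketch is below; the collection contents are assumptions, and topic generation itself is elided.

```python
# Hypothetical sketch: encode a key-phrase group as a vector over a
# fixed key-phrase collection, which a topic model could then cluster.
# The collection here is a toy assumption.

COLLECTION = ["machine learning", "data pipelines", "team leadership", "sql"]

def to_vector(phrase_group):
    # One dimension per phrase in the collection; 1 if present, else 0.
    present = {p.lower() for p in phrase_group}
    return [1 if phrase in present else 0 for phrase in COLLECTION]

resume_phrases = ["Machine Learning", "SQL"]
print(to_vector(resume_phrases))  # → [1, 0, 0, 1]
```

A topic model (e.g., latent Dirichlet allocation) run over many such vectors would then produce the topic-to-phrase associations the abstract uses to generate the second key-phrase groups.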
Abstract: An approach is provided in which a digital assistant receives an amalgamation from a user. The amalgamation includes one or more words spoken by the user that are captured by a digital microphone, and a set of digital images corresponding to one or more gestures performed by the user, with the digital images captured by a digital camera. The system then determines an action that is responsive to the amalgamation and performs the determined action.
Type:
Grant
Filed:
October 19, 2018
Date of Patent:
November 10, 2020
Assignee:
International Business Machines Corporation
Inventors:
Jeremy R. Fox, Gregory J. Boss, Kelley Anders, Sarbajit K. Rakshit
Abstract: An audio encoder for encoding an audio signal having a lower frequency band and an upper frequency band includes: a detector for detecting a peak spectral region in the upper frequency band of the audio signal; a shaper for shaping the lower frequency band using shaping information for the lower band and for shaping the upper frequency band using at least a portion of the shaping information for the lower band, wherein the shaper is configured to additionally attenuate spectral values in the detected peak spectral region in the upper frequency band; and a quantizer and coder stage for quantizing a shaped lower frequency band and a shaped upper frequency band and for entropy coding quantized spectral values from the shaped lower frequency band and the shaped upper frequency band.
Type:
Grant
Filed:
September 27, 2018
Date of Patent:
November 3, 2020
Assignee:
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Inventors:
Markus Multrus, Christian Neukam, Markus Schnell, Benjamin Schubert
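The detector/shaper pair in the abstract above could be sketched as follows: find a peak spectral region in the upper band, then attenuate it before quantization so that reusing the lower band's shaping information does not leave an audible spike. The threshold and attenuation factor are illustrative assumptions.

```python
# Hypothetical sketch of peak detection and attenuation in the upper
# frequency band. Threshold and attenuation values are assumptions.

def detect_peak_region(upper_band, threshold=4.0):
    # Flag bins whose magnitude exceeds `threshold` times the band mean.
    mean = sum(abs(v) for v in upper_band) / len(upper_band)
    return [abs(v) > threshold * mean for v in upper_band]

def shape_upper_band(upper_band, attenuation=0.25):
    peaks = detect_peak_region(upper_band)
    return [v * attenuation if is_peak else v
            for v, is_peak in zip(upper_band, peaks)]

band = [0.2, 0.1, 8.0, 0.3, 0.2]  # one strong spectral peak
print(shape_upper_band(band))     # the peak bin is attenuated
```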
Abstract: A method of encoding an audio signal is provided comprising: applying multiple different time-frequency transformations to an audio signal frame; computing measures of coding efficiency across multiple frequency bands for multiple time-frequency resolutions; selecting a combination of time-frequency resolutions to represent the frame at each of the multiple frequency bands based at least in part upon the computed measures of coding efficiency; determining a window size and a corresponding transform size; determining a modification transformation; windowing the frame using the determined window size; transforming the windowed frame using the determined transform size; modifying a time-frequency resolution within a frequency band of the transform of the windowed frame using the determined modification transformation.
Type:
Grant
Filed:
April 30, 2018
Date of Patent:
October 27, 2020
Assignee:
DTS, Inc.
Inventors:
Michael M. Goodwin, Antonius Kalker, Albert Chau
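The selection step in the abstract above (choosing a time-frequency resolution per band from computed coding-efficiency measures) reduces to picking the cheapest candidate per band. A minimal sketch is below; the resolution names and bit costs are toy assumptions.

```python
# Hypothetical sketch: given an estimated coding cost (in bits) per
# candidate time-frequency resolution and per frequency band, pick the
# cheapest resolution for each band. Cost values are toy data.

RESOLUTIONS = ["long", "medium", "short"]

def select_resolutions(cost_table):
    """cost_table[band][resolution] -> estimated bits for that band."""
    choice = {}
    for band, costs in cost_table.items():
        choice[band] = min(RESOLUTIONS, key=lambda r: costs[r])
    return choice

costs = {
    "low":  {"long": 120, "medium": 140, "short": 180},  # tonal: long wins
    "high": {"long": 220, "medium": 160, "short": 90},   # transient: short wins
}
print(select_resolutions(costs))  # → {'low': 'long', 'high': 'short'}
```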
Abstract: In one implementation, a content processing system includes a computing platform having a hardware processor and a system memory storing a content classification software code, a natural language processor, and a computer vision analyzer. The hardware processor executes the content classification software code to receive content inputs from multiple content sources, and, for each content input, to parse the content input for metadata describing the content input, obtain a description of language-based content included in the content input from the natural language processor, and obtain a description of visual content included in the content input from the computer vision analyzer.
Type:
Grant
Filed:
November 13, 2018
Date of Patent:
October 20, 2020
Assignee:
Disney Enterprises, Inc.
Inventors:
Concetta Maratta, Brian Kennedy, Zachary Toback, Jared L. Wiener, Fabian Westerwelle
Abstract: The invention discloses systems and methods for enhancing the sound of vocal utterances of interest in an acoustically cluttered environment. The system generates canceling signals (sound suppression signals) for an ambient audio environment and identifies and characterizes desired vocal signals and hence a vocal stream or multiple streams of interest. Each canceling signal, or collectively, the noise canceling stream, is processed so that signals associated with the desired audio stream or streams are dynamically removed from the canceling stream. This modified noise canceling stream is combined (electronically or acoustically) with the ambient to effectuate a destructive interference of all ambient sound except for the removed audio streams, thus “enhancing” the vocal streams with respect to the unwanted ambient sound. Cepstral analysis may be used to identify a fundamental frequency associated with a voiced human utterance.
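The last sentence of this abstract mentions cepstral analysis for finding the fundamental frequency of a voiced utterance. A minimal sketch of that technique is below: take the log-magnitude spectrum of a frame, transform it again, and read the fundamental period off the strongest cepstral peak. The signal, sample rate, and search range are illustrative assumptions, and a naive O(n²) DFT keeps the sketch dependency-free.

```python
import cmath, math

# Hypothetical sketch of cepstral pitch detection.

def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n)
                for k in range(n)) for m in range(n)]

def cepstral_f0(signal, sample_rate, f0_min=120.0, f0_max=400.0):
    spectrum = dft(signal)
    # Log-magnitude spectrum; epsilon guards against log(0).
    log_mag = [math.log(abs(c) + 1e-12) for c in spectrum]
    # The cepstrum: periodicity of the spectrum shows up as a peak at
    # the quefrency equal to the pitch period in samples.
    cepstrum = [abs(c) for c in dft(log_mag)]
    lo = int(sample_rate / f0_max)  # shortest candidate period (samples)
    hi = int(sample_rate / f0_min)  # longest candidate period (samples)
    peak = max(range(lo, hi), key=lambda q: cepstrum[q])
    return sample_rate / peak

# A 200 Hz voiced-like tone: fundamental plus seven harmonics,
# 400 samples at 8 kHz (an exact number of pitch periods).
sr, n = 8000, 400
frame = [sum(math.sin(2 * math.pi * 200 * h * t / sr) / h for h in range(1, 9))
         for t in range(n)]
print(round(cepstral_f0(frame, sr)))
```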
Abstract: A system and method for providing a voice assistant including receiving, at a first device, a first audio input from a user requesting a first action; performing automatic speech recognition on the first audio input; obtaining a context of the user; performing natural language understanding based on the speech recognition of the first audio input; and taking the first action based on the context of the user and the natural language understanding.
Abstract: A sound collection apparatus includes: a sound collection unit including a microphone configured to collect sound; a noise determination unit configured to determine noise in dictation based on voice collected by the sound collection unit; and a presentation unit configured to perform presentation based on a determination result by the noise determination unit. With this configuration, presentation is performed to indicate environmental noise at sound collection for dictation, which leads to improved efficiency of dictation work.
Type:
Grant
Filed:
May 2, 2018
Date of Patent:
September 22, 2020
Assignee:
OLYMPUS CORPORATION
Inventors:
Kazutaka Tanaka, Osamu Nonaka, Kazuhiko Osa
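The noise-determination-and-presentation loop in the entry above could be sketched with a simple RMS estimate over ambient samples. The thresholds and indicator strings are illustrative assumptions, not the patented criteria.

```python
import math

# Hypothetical sketch: estimate ambient noise from non-speech samples
# and present a coarse indicator of whether conditions suit dictation.
# The RMS thresholds are illustrative assumptions.

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def noise_indicator(ambient_samples):
    level = rms(ambient_samples)
    if level < 0.01:
        return "good for dictation"
    if level < 0.05:
        return "usable"
    return "too noisy"

quiet_room = [0.002, -0.003, 0.001, -0.002]  # normalized sample values
print(noise_indicator(quiet_room))  # → good for dictation
```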