Abstract: Techniques for initiating system actions based on conversational content are disclosed. A system identifies a first conversational moment type. The first conversational moment type is defined by a first set of one or more conversational conditions. The system receives a user-selected action to be performed by the system in response to detecting conversational moments of the first conversational moment type. The system stores the user-selected action in association with the first conversational moment type. The system performs the user-selected action in response to detecting the conversational moments of the first conversational moment type.
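The abstract above ties a moment type, defined by a set of conversational conditions, to a stored user-selected action. A minimal sketch of that binding follows; every name and condition predicate is hypothetical, since the abstract does not specify an implementation.

```python
# Hypothetical sketch of a moment-type -> action registry; names are illustrative.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Condition = Callable[[dict], bool]  # predicate over a conversation snapshot

@dataclass
class MomentType:
    name: str
    conditions: List[Condition] = field(default_factory=list)

    def matches(self, snapshot: dict) -> bool:
        # A moment of this type is detected when every condition holds.
        return all(cond(snapshot) for cond in self.conditions)

class MomentActionRegistry:
    def __init__(self) -> None:
        self._types: Dict[str, MomentType] = {}
        self._actions: Dict[str, Callable[[dict], None]] = {}

    def register(self, moment: MomentType, action: Callable[[dict], None]) -> None:
        # Store the user-selected action in association with the moment type.
        self._types[moment.name] = moment
        self._actions[moment.name] = action

    def on_snapshot(self, snapshot: dict) -> None:
        # Perform each stored action whose moment type is detected.
        for name, moment in self._types.items():
            if moment.matches(snapshot):
                self._actions[name](snapshot)

# Hypothetical usage: act when two speakers overlap for more than two seconds.
registry = MomentActionRegistry()
crosstalk = MomentType("crosstalk", [lambda s: s.get("overlap_sec", 0) > 2])
registry.register(crosstalk, lambda s: print("action: flag crosstalk moment"))
registry.on_snapshot({"overlap_sec": 3.5})
```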
Abstract: An autonomously motile device may be controlled by speech received by a user device. A first speech-processing system associated with the user device may determine that audio data includes a representation of a command; a second speech-processing system associated with the autonomously motile device may determine that the command should be executed by the autonomously motile device. A network connection is established between the user device and the autonomously motile device, and a device manager authorizes execution of the command.
Type:
Grant
Filed:
December 9, 2019
Date of Patent:
August 16, 2022
Assignee:
Amazon Technologies, Inc.
Inventors:
Anil Kumar Katta, Amy Marie Whitberg, Xiaoqing Jing, Swetha Bijoy, Swati S. Rao, Robert Franklin Ebert
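A minimal sketch of the flow in the Amazon abstract above: a first speech-processing system detects a command, a second decides the autonomously motile device should execute it, and a device manager gates execution over a newly established connection. All interfaces below are assumptions, not the patent's.

```python
# Illustrative control flow only; every class and method name is an assumption.
def handle_audio(audio_data, user_device_asr, robot_asr, device_manager, robot):
    # First speech-processing system (user device): detect a command in the audio.
    command = user_device_asr.detect_command(audio_data)
    if command is None:
        return
    # Second speech-processing system: decide whether the robot should execute it.
    if not robot_asr.should_execute(command):
        return
    # Establish a network connection between the devices, then gate on
    # authorization from the device manager before sending the command.
    connection = robot.open_connection()
    if device_manager.authorize(command, robot):
        connection.send(command)
```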
Abstract: According to one embodiment, a dialogue system includes a setting apparatus and a processing apparatus. The setting apparatus sets, in advance, a plurality of words that are in impossible combination relationships with each other. The processing apparatus acquires speech of a user and, when a speech recognition result of an object included in the speech contains a combination of words from the plurality of words that are in impossible combination relationships with each other, outputs a notification to the user that processing of the object cannot be carried out.
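The impossible-combination check above can be pictured with a toy lookup; the pair set and message format below are invented for illustration.

```python
# Minimal sketch of the impossible-combination check; the pair set is invented.
IMPOSSIBLE_PAIRS = {frozenset({"freeze", "boil"}), frozenset({"open", "weld"})}

def check_recognized_words(words):
    """Return a notification string if the recognized words contain an impossible pair."""
    present = set(words)
    for pair in IMPOSSIBLE_PAIRS:
        if pair <= present:
            a, b = sorted(pair)
            return f"Cannot process the request: '{a}' and '{b}' cannot be combined."
    return None  # no impossible combination; proceed with normal processing

print(check_recognized_words(["please", "freeze", "and", "boil", "it"]))
```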
Abstract: This disclosure proposes systems and methods employing dynamic skill endpoints by allowing skills to register themselves with a language processing system. The language processing system allows the skill system to open a persistent network connection to the language processing system. This connection does not require the machine(s) running the skill system to have an Internet routable address; rather the skill system can contact the language processing system, which can remain at a static address, through any local routers or firewalls which may block connections from being initiated from outside the local area network. This registration opens the connection between the skill system and the language processing system. When the language processing system receives a skill invocation request indicating the skill, the language processing system can check its registry for a dynamic endpoint corresponding to the skill, and route the request over the network connection to the registered endpoint.
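The registration-and-routing scheme above can be sketched as a registry keyed by skill, holding the persistent connection each skill opened outbound. All names below are hypothetical, not the language processing system's actual API.

```python
# Sketch of dynamic-endpoint registration and routing; names are hypothetical.
# The skill connects outbound, so it needs no Internet-routable address.
class EndpointRegistry:
    def __init__(self):
        self._connections = {}  # skill_id -> persistent connection object

    def register(self, skill_id, connection):
        # Called when a skill system opens its persistent connection.
        self._connections[skill_id] = connection

    def route(self, skill_id, invocation_request):
        # On a skill invocation request, look up the registered dynamic endpoint.
        conn = self._connections.get(skill_id)
        if conn is None:
            raise LookupError(f"no dynamic endpoint registered for {skill_id!r}")
        return conn.send(invocation_request)

class FakeConnection:
    def send(self, request):
        return f"handled {request!r}"

registry = EndpointRegistry()
registry.register("weather-skill", FakeConnection())
print(registry.route("weather-skill", {"intent": "GetForecast"}))
```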
Abstract: An audio encoder for encoding an audio signal includes: a first encoding processor for encoding a first audio signal portion in a frequency domain, wherein the first encoding processor includes: a time-frequency converter for converting the first audio signal portion into a frequency domain representation having spectral lines up to a maximum frequency of the first audio signal portion; a spectral encoder for encoding the frequency domain representation; a second encoding processor for encoding a second, different audio signal portion in the time domain; a cross-processor for calculating, from the encoded spectral representation of the first audio signal portion, initialization data for the second encoding processor, so that the second encoding processor is initialized to encode the second audio signal portion immediately following the first audio signal portion in time in the audio signal; and a controller configured for analyzing the audio signal and for determining which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion is the second audio signal portion encoded in the time domain.
Type:
Grant
Filed:
March 1, 2019
Date of Patent:
August 9, 2022
Inventors:
Sascha Disch, Martin Dietz, Markus Multrus, Guillaume Fuchs, Emmanuel Ravelli, Matthias Neusinger, Markus Schnell, Benjamin Schubert, Bernhard Grill
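The switching control in the encoder abstract above can be pictured as a classifier routing each portion to a frequency-domain or time-domain encoder, with a cross-processor initializing the time-domain coder from the preceding frequency-domain frame. All interfaces are invented for illustration.

```python
# Conceptual FD/TD switching with cross-initialization; interfaces are invented.
def encode(portions, classify, fd_encoder, td_encoder, cross_processor):
    encoded, last_fd_frame = [], None
    for portion in portions:
        if classify(portion) == "fd":
            last_fd_frame = fd_encoder.encode(portion)  # spectral encoding path
            encoded.append(last_fd_frame)
        else:
            if last_fd_frame is not None:
                # Initialize the time-domain coder from the encoded spectral
                # representation of the immediately preceding FD portion.
                td_encoder.initialize(cross_processor.init_data(last_fd_frame))
            encoded.append(td_encoder.encode(portion))
    return encoded
```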
Abstract: One or more processing circuits recognize spoken content in an audio signal of a user's own speech using speech recognition and natural language understanding. The spoken content describes a listening difficulty of the user. The one or more processing circuits generate, based on the spoken content, one or more actions for hearing devices and feedback for the user. The one or more actions attempt to resolve the listening difficulty. Additionally, the one or more processing circuits convert the feedback to verbal feedback using speech synthesis and transmit the one or more actions and the verbal feedback to the hearing devices via a body-worn device. The hearing devices are configured to perform the one or more actions and play back the verbal feedback to the user.
Type:
Grant
Filed:
November 15, 2018
Date of Patent:
August 9, 2022
Assignee:
Starkey Laboratories, Inc.
Inventors:
Tao Zhang, Eric Durant, Dean G. Meyer, Martin McKinney, Matthew D. Kleffner, Dominic Perz, Karrie Recker
Abstract: Audio data saved at the end of client interactions are sampled, analyzed for pauses in speech, and sliced into stretches of acoustic data containing human speech between those pauses. The acoustic data are accompanied by machine transcripts made by VoiceAI. A suitable distribution of data useful for training and testing is stipulated during data sampling by applying certain filtering criteria. The resulting datasets are sent for transcription by a human transcriber team. The human transcripts are retrieved, some post-transcription processing and cleaning are performed, and the results are added to datastores for training and testing an acoustic model.
Type:
Grant
Filed:
October 29, 2019
Date of Patent:
August 9, 2022
Assignee:
Dialpad, Inc.
Inventors:
Eddie Yee Tak Ma, James Palmer, Kevin James, Etienne Manderscheid
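The pause-based slicing step in the Dialpad abstract above can be pictured with a toy energy-threshold segmenter; the frame representation and thresholds below are invented, not the patent's method.

```python
# Sketch of slicing audio into speech stretches between pauses; thresholds invented.
def slice_on_pauses(frame_energies, silence_thresh=0.01, min_pause_frames=30):
    """Return (start, end) frame-index pairs for stretches of speech between pauses."""
    slices, start, silent_run = [], None, 0
    for i, energy in enumerate(frame_energies):
        if energy < silence_thresh:
            silent_run += 1
            if start is not None and silent_run >= min_pause_frames:
                # A long enough pause closes the current speech stretch.
                slices.append((start, i - silent_run + 1))
                start = None
        else:
            if start is None:
                start = i  # speech resumes: open a new stretch
            silent_run = 0
    if start is not None:
        slices.append((start, len(frame_energies)))
    return slices

# Two speech stretches separated by a 40-frame pause -> [(40, 140), (180, 230)]
print(slice_on_pauses([0.0] * 40 + [0.5] * 100 + [0.0] * 40 + [0.4] * 50, 0.01, 30))
```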
Abstract: An apparatus includes processor(s) to: generate a set of candidate n-grams based on probability distributions from an acoustic model for candidate graphemes of a next word most likely spoken following at least one preceding word spoken within speech audio; provide the set of candidate n-grams to multiple node devices; provide, to each node device, an indication of which candidate n-grams are to be searched for within the n-gram corpus by each node device to enable searches for multiple candidate n-grams to be performed, independently and at least partially in parallel, across the node devices; receive, from each node device, an indication of a probability of occurrence of at least one candidate n-gram within the speech audio; based on the received probabilities of occurrence, identify the next word most likely spoken within the speech audio; and add the next word most likely spoken to a transcript of the speech audio.
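The parallel lookup above can be sketched with a sharded n-gram table searched concurrently; the shard layout, round-robin partitioning, and all names are assumptions made for illustration.

```python
# Illustrative parallel n-gram lookup across "node devices" modeled as shards.
from concurrent.futures import ThreadPoolExecutor

def lookup_shard(shard, candidates):
    # Each node device searches only the candidate n-grams assigned to it.
    return {ngram: shard.get(ngram, 0.0) for ngram in candidates}

def next_word(preceding, candidate_words, shards):
    candidates = [preceding + (w,) for w in candidate_words]
    # Partition the candidate n-grams across the shards (round robin here).
    parts = [candidates[i::len(shards)] for i in range(len(shards))]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lookup_shard, shards, parts)
    probs = {}
    for result in results:
        probs.update(result)
    # The highest probability of occurrence identifies the next word spoken.
    best = max(probs, key=probs.get)
    return best[-1]

shards = [{("the", "cat", "sat"): 0.7}, {("the", "cat", "ran"): 0.2}]
print(next_word(("the", "cat"), ["sat", "ran", "flew"], shards))  # -> "sat"
```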
Abstract: A communication terminal is communicable with a conversion system. The communication terminal includes circuitry configured to: receive a selection of one of a first mode and a second mode, the first mode being a mode in which audio data obtained based on sound collected by a sound collecting device is converted into text data, the second mode being a mode in which audio data obtained based on sound to be output from a sound output device is converted into text data, the audio data relating to content obtained during an event being conducted; transmit, to the conversion system, audio data corresponding to the selected one of the first mode and the second mode; receive, from the conversion system, text data converted from the transmitted audio data; and control a display to display text based on the received text data.
Abstract: A conversational system that recognizes, understands, and acts on multiple intents that may be explicit or implicit during conversations with humans. During a conversation, one or more utterances are received and processed through a plurality of machine learning algorithms to establish precise meanings, additional intentions, and alternative hypotheses. Using a combination of machine learning algorithms and datastores, conversations are interpreted as intended and may diverge where needed or desired, delivering a more useful, natural, and human-like dialogue between machines and people.
Type:
Grant
Filed:
January 13, 2021
Date of Patent:
July 19, 2022
Assignee:
ARTIFICIAL SOLUTIONS IBERIA S.L.
Inventors:
Eric Aili, Ramazan Gurbuz, Andreas Wieweg
Abstract: A system and method are disclosed for setting up a communication link between a device or application and a system with a controller. The controller can collect and send information to the application. A user interfaces with the controller to access the functionality of the application by providing commands to the controller. The system allows the user to interface with multiple applications.
Type:
Grant
Filed:
April 19, 2019
Date of Patent:
July 19, 2022
Assignee:
SoundHound, Inc.
Inventors:
Timothy P. Stonehocker, Kathleen Worthington McMahon
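The controller described in the SoundHound abstract above is essentially a command dispatcher standing between the user and multiple applications. A toy sketch follows; the registration API and handlers are invented, not SoundHound's.

```python
# Minimal command-dispatch sketch; names are illustrative only.
class Controller:
    def __init__(self):
        self.apps = {}  # application name -> handler callable

    def register(self, name, handler):
        # An application exposes its functionality through the controller.
        self.apps[name] = handler

    def command(self, app_name, *args):
        # The user reaches each application's functionality via commands here.
        return self.apps[app_name](*args)

ctrl = Controller()
ctrl.register("weather", lambda city: f"Forecast for {city}: sunny")
print(ctrl.command("weather", "Toronto"))
```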
Abstract: A method for playing voice, applied to a webcast server, includes: receiving voice data sent by at least one first electronic device to obtain a voice data set, the first electronic device having a first preset authority and the voice data set comprising at least one piece of voice data; receiving audio-video data sent by a second electronic device, the second electronic device having a second preset authority and the audio-video data comprising the voice data selected for playback, wherein the voice data selected for playback comprises any voice data of the voice data set clicked for playback; and pushing the audio-video data to each first electronic device.
Type:
Grant
Filed:
September 4, 2019
Date of Patent:
July 5, 2022
Assignee:
BEIJING DAJIA INTERNET INFORMATION TECHNOLOGY CO., LTD.
Abstract: A method, device, system, and computer medium for providing interactive advertising are provided. For example, a device may request an advertisement from a remote server, receive the advertisement, receive a response from a user who is listening to and/or watching the advertisement, and transmit the response to the server for further action. The user may input a response by speaking. A server may receive an advertisement request from the device, select an advertisement based on one or more pre-defined criteria, transmit the selected advertisement to the device for play, receive from the device a response to the selected advertisement, and then perform an action corresponding to the received response.
Type:
Grant
Filed:
November 14, 2018
Date of Patent:
June 28, 2022
Assignee:
XAPPMEDIA, INC.
Inventors:
Patrick B. Higbie, John P. Kelvie, Michael M. Myers, Franklin D. Raines
Abstract: Disclosed are a speech processing method and a speech processing apparatus that perform speech processing in a 5G communication environment by executing embedded artificial intelligence (AI) algorithms and/or machine learning algorithms. The speech processing method includes determining a temporary pause in reception of a first spoken utterance, outputting a first spoken response utterance as a result of speech recognition processing of a second spoken utterance received after the temporary pause, determining that a third spoken utterance received after outputting the first spoken response utterance is an extension of the first spoken utterance, deleting, using a deep neural network model, a duplicate utterance part from a fourth spoken utterance that is obtained by combining the first and third spoken utterances, and outputting a second spoken response utterance as a result of speech recognition processing of the fourth spoken utterance from which the duplicate utterance part has been deleted.
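The duplicate-deletion step above can be pictured with a toy suffix/prefix merge over word lists; the abstract uses a deep neural network model for this, so the string matching below is only a stand-in.

```python
# Toy overlap removal standing in for the patent's deep-neural-network dedup.
def merge_without_duplicate(first_words, third_words):
    """Combine two utterances, dropping the longest duplicated boundary span."""
    max_k = min(len(first_words), len(third_words))
    for k in range(max_k, 0, -1):
        if first_words[-k:] == third_words[:k]:
            return first_words + third_words[k:]
    return first_words + third_words

print(merge_without_duplicate(
    "set a timer for".split(), "for ten minutes".split()))
# -> ['set', 'a', 'timer', 'for', 'ten', 'minutes']
```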
Abstract: A user, such as an elderly person, may be assisted by an assistance device in his or her caregiving environment that operates in conjunction with one or more server computers. The assistance device may execute a schedule of assistance actions where each assistance action is associated with a time and is executed at that time to assist the user. An assistance action may present an input request to a user, process a voice input of the user, and analyze the voice input to determine that the voice input corresponds to a negative response event, a positive response event, or a non-response event. Based on the categorization of one or more voice inputs as negative response events, positive response events, or non-response events, it may be determined to notify a caregiver of the user, for example where the user has not responded to a number of assistance actions.
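The response categorization and caregiver-notification logic above can be sketched with simple keyword sets and a consecutive-miss rule; the keyword lists and threshold below are invented for illustration.

```python
# Hedged sketch of response categorization; keyword sets and threshold invented.
NEGATIVE = {"no", "not", "can't", "won't", "bad", "hurt"}
POSITIVE = {"yes", "ok", "fine", "good", "sure"}

def categorize(voice_text):
    words = set(voice_text.lower().split()) if voice_text else set()
    if not words:
        return "non_response"
    if words & NEGATIVE:
        return "negative"
    if words & POSITIVE:
        return "positive"
    return "non_response"

def should_notify_caregiver(events, max_missed=3):
    # Notify after a run of consecutive non-positive response events.
    recent = events[-max_missed:]
    return len(recent) == max_missed and all(e != "positive" for e in recent)

events = [categorize(t) for t in ["yes", "", "", "no thanks"]]
print(should_notify_caregiver(events))  # True: three non-positive events in a row
```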
Abstract: The present disclosure discloses a method and apparatus for providing a speech service. A specific embodiment of the method comprises: receiving request information sent by a device, the request information comprising first event information and speech information, the first event information indicating a first event occurring on the device when the device sends the request information, wherein the first event information comprises speech input event information for instructing a user to input the speech information; generating response information comprising an operation instruction for a targeted device on the basis of the first event information and the speech information; and sending the response information to the targeted device for the targeted device to perform the operation indicated by the operation instruction. The embodiment improves the efficiency of providing a speech service.
Abstract: Systems and methods are provided for dynamic grammar augmentation for editing multiple network switch configuration files as a single file. The method includes identifying a first base grammar of a first network switch; identifying a second base grammar of a second network switch; identifying first and second patch grammars for the first and second network switches; generating an augmented grammar based on the first and second patch grammars and the first and second base grammars; identifying a first configuration file for the first network switch; identifying a second configuration file for the second network switch; generating a base merged configuration file, the base merged configuration file representing the first and second configuration files modified according to the augmented grammar.
Type:
Grant
Filed:
November 30, 2018
Date of Patent:
May 24, 2022
Assignee:
Hewlett Packard Enterprise Development LP
Inventors:
Gurraj Atwal, Frank Wood, Shaun Wackerly
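The grammar augmentation in the HPE abstract above can be pictured as merging base and patch grammars into one rule set; representing a grammar as a dict of rule-name to production is an assumption made purely for illustration.

```python
# Toy model of grammar augmentation; the dict representation is an assumption.
def augment_grammar(base_grammars, patch_grammars):
    augmented = {}
    for grammar in base_grammars:
        augmented.update(grammar)   # union of the base grammars
    for grammar in patch_grammars:
        augmented.update(grammar)   # patch rules override base rules
    return augmented

base_sw1 = {"vlan": "vlan <id>", "port": "interface <name>"}
base_sw2 = {"vlan": "vlan <id> name <string>"}
patch_sw1 = {"port": "interface <name> speed <rate>"}
print(augment_grammar([base_sw1, base_sw2], [patch_sw1, {}]))
```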
Abstract: One embodiment provides a method, including: receiving, at an information handling device comprising a digital personal assistant, a command; determining, utilizing at least one sensor operatively coupled to the information handling device, that a source of the command comprises a source to be ignored; and ignoring, responsive to determining that the source is to be ignored, the command. Other aspects are described and claimed.
Type:
Grant
Filed:
November 18, 2019
Date of Patent:
April 12, 2022
Assignee:
Lenovo (Singapore) Pte. Ltd.
Inventors:
Russell Speight VanBlon, Roderick Echols, Ryan Charles Knudson, Bradley Park Strazisar, John Polischak
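The source-based filtering in the Lenovo abstract above reduces to a gate on the detected source of a command; the source labels below are invented, and the sensor check is simulated.

```python
# Minimal sketch of source-based command filtering; the sensor check is simulated.
IGNORED_SOURCES = {"television", "radio"}

def handle_command(command, detected_source):
    # detected_source would come from a sensor (e.g., direction-of-arrival, camera).
    if detected_source in IGNORED_SOURCES:
        return None  # ignore commands that do not come from a live user
    return f"executing: {command}"

print(handle_command("turn off the lights", "television"))  # None -> ignored
print(handle_command("turn off the lights", "user"))
```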
Abstract: An information processing device stores, in a keyword database, keywords extracted from speech sounds picked up by a speech-sound processing device as keywords matching keyword entries in a dictionary database of the speech-sound processing device. The information processing device receives, from the speech-sound processing device, an instruction to update the dictionary database of the speech-sound processing device, and then determines, by inference, words related to the keywords stored in the keyword database, prepares an update of the dictionary database on the basis of the keywords stored in the keyword database and the related words determined by inference, and transmits the update of the dictionary database to the speech-sound processing device.
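The dictionary-update preparation above can be sketched as expanding stored keywords with inferred related words and keeping only entries the dictionary lacks; the relatedness lookup here is a toy table, not the patent's inference method.

```python
# Sketch of preparing a dictionary update from keywords plus inferred related
# words; the RELATED table stands in for the patent's inference step.
RELATED = {"weather": ["forecast", "rain"], "music": ["song", "playlist"]}

def prepare_update(keyword_db, current_dictionary):
    update = set()
    for keyword in keyword_db:
        for word in [keyword] + RELATED.get(keyword, []):
            if word not in current_dictionary:
                update.add(word)  # only transmit entries the dictionary lacks
    return sorted(update)

print(prepare_update({"weather", "music"}, {"weather"}))
# -> ['forecast', 'music', 'playlist', 'rain', 'song']
```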
Abstract: A system and method for determining a target speaker's community of origin from a sound sample of the target speaker is provided. An indexed database of morpheme data from speakers from various communities of origin is provided. Indexed morphemes from a target speaker are extracted. The extracted indexed morphemes from the target speaker are compared against the morpheme data in the indexed database to determine the target speaker's community of origin.
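The comparison step above can be pictured as a nearest-profile match over morpheme frequencies; the feature representation and overlap score below are invented for illustration.

```python
# Toy nearest-community match over morpheme frequencies; features invented.
from collections import Counter

def community_of_origin(target_morphemes, indexed_db):
    """Return the community whose morpheme profile best overlaps the target's."""
    target = Counter(target_morphemes)

    def overlap(profile):
        return sum(min(target[m], profile.get(m, 0)) for m in target)

    return max(indexed_db, key=lambda community: overlap(indexed_db[community]))

db = {"A": {"ing": 5, "na": 1}, "B": {"in'": 4, "y'all": 3}}
print(community_of_origin(["ing", "ing", "na"], db))  # -> "A"
```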