Patents Examined by Abul K. Azad
  • Patent number: 11527245
    Abstract: Systems and methods are provided herein for avoiding inadvertently trigging a voice assistant with audio played through a speaker. An audio signal is captured by sampling a microphone of the voice assistant at a sampling frequency that is higher than an expected finite sampling frequency of previously recorded audio played through the speaker to generate a voice data sample. A quality metric of the generated voice data sample is calculated by determining whether the generated voice data sample comprises artifacts resulting from previous compression or approximation by the expected finite sampling frequency. Based on the calculated quality metric, it is determined whether the captured audio signal is previously recorded audio played through the speaker. Responsive to the determination that the captured audio signal is previously recorded audio played through the speaker, the voice assistant refrains from being activated.
    Type: Grant
    Filed: April 29, 2020
    Date of Patent: December 13, 2022
    Assignee: Rovi Guides, Inc.
    Inventors: Ankur Anil Aher, Jeffry Copps Robert Jose
  • Patent number: 11521609
    Abstract: A voice command system according to a first disclosure comprises a gateway apparatus having an interface configured to receive a voice command, and a controller configured to perform a registration process of registering a speaker permitted to receive the voice command. The controller is configured to perform an authentication process of rejecting a reception of the voice command when a speaker of the voice command is not registered, and permitting a reception of the voice command when a speaker of the voice command is registered. The controller is configured to perform the authentication process for each voice command.
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: December 6, 2022
    Assignee: KYOCERA CORPORATION
    Inventor: Yumiko Yamamoto
  • Patent number: 11521631
    Abstract: An apparatus for selecting one of a first encoding algorithm having a first characteristic and a second encoding algorithm having a second characteristic for encoding a portion of an audio signal to obtain an encoded version of the portion of the audio signal has a first estimator for estimating a first quality measure for the portion of the audio signal, which is associated with the first encoding algorithm, without actually encoding and decoding the portion of the audio signal using the first encoding algorithm. A second estimator is provided for estimating a second quality measure for the portion of the audio signal, which is associated with the second encoding algorithm, without actually encoding and decoding the portion of the audio signal using the second encoding algorithm. The apparatus has a controller for selecting the first or second encoding algorithms based on a comparison between the first and second quality measures.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: December 6, 2022
    Assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
    Inventors: Emmanuel Ravelli, Stefan Doehla, Guillaume Fuchs, Eleni Fotopoulou, Christian Helmrich
  • Patent number: 11514907
    Abstract: The present disclosure is generally directed to the generation of voice-activated data flows in interconnected network. The voice-activated data flows can include input audio signals that include a request and are detected at a client device. The client device can transmit the input audio signal to a data processing system, where the input audio signal can be parsed and passed to the data processing system of a service provider to fulfill the request in the input audio signal. The present solution is configured to conserve network resources by reducing the number of network transmissions needed to fulfill a request.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: November 29, 2022
    Assignee: GOOGLE LLC
    Inventors: Gaurav Bhaya, Ulas Kirazci, Bradley Abrams, Adam Coimbra, Ilya Firman, Carey Radebaugh
  • Patent number: 11501777
    Abstract: The disclosure herein relates to methods and systems for enabling human-robot interaction (HRI) to resolve task ambiguity. Conventional techniques that initiates continuous dialogue with the human to ask a suitable question based on the observed scene until resolving the ambiguity are limited. The present disclosure use the concept of Talk-to-Resolve (TTR) which initiates a continuous dialogue with the user based on visual uncertainty analysis and by asking a suitable question that convey the veracity of the problem to the user and seek guidance until all the ambiguities are resolved. The suitable question is formulated based on the scene understanding and the argument spans present in the natural language instruction. The present disclosure asks questions in a natural way that not only ensures that the user can understand the type of confusion, the robot is facing; but also ensures minimal and relevant questioning to resolve the ambiguities.
    Type: Grant
    Filed: January 29, 2021
    Date of Patent: November 15, 2022
    Assignee: Tata Consultancy Services Limited
    Inventors: Chayan Sarkar, Pradip Pramanick, Snehasis Banerjee, Brojeshwar Bhowmick
  • Patent number: 11501770
    Abstract: Provided is a system, server, and method for speech recognition capable of collectively setting a plurality of setting items for device control through an utterance of a single sentence provided in the form of natural language. The system includes: a home appliance configured to receive a speech command that is generated through an utterance of a single sentence for control of the home appliance; and a server configured to receive the speech command in the single sentence from the home appliance and interpret the speech command of the single sentence through multiple intent determination.
    Type: Grant
    Filed: August 29, 2018
    Date of Patent: November 15, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Eun Jin Chun, Woo Cheol Shin, Nam Gook Cho, Young Soo Do, Min Hyung Lee, Pil Soo Lee
  • Patent number: 11501776
    Abstract: Disclosed herein is a system for facilitating accomplishing tasks based on a natural language conversation. Accordingly, the system may include a direct graph unit. Further, the direct graph unit may include a directed graph. Further, the directed graph models a non-linearity of the natural language conversation. Further, the directed graph may include a set of nodes connected by at least one edge. Further, the system may include a context-encoded language understanding unit may include a learning unit and an inferring unit. Further, the learning unit may be configured for receiving a plurality of inputs. Further, the learning unit may be configured for generating a model based on the plurality of inputs. Further, the inferring unit may be configured for receiving a plurality of inputs. Further, the inferring unit may be configured for generating an output based on the plurality of inputs and the model.
    Type: Grant
    Filed: January 14, 2021
    Date of Patent: November 15, 2022
    Assignee: KOSMOS AI TECH INC
    Inventor: An Wei
  • Patent number: 11488586
    Abstract: Disclosed is a system for speech recognition text enhancement fusing multi-modal semantic invariance, the system includes an acoustic feature extraction module, an acoustic down-sampling module, an acoustic feature extraction module, an acoustic down-sampling module, an encoder and a decoder fusing multi-modal semantic invariance; the acoustic feature extraction module is configured for frame-dividing processing of speech data, dividing the speech data into short-term audio frames with a fixed length, extracting thank acoustic features from the short-term audio frames, and inputting the acoustic features into the acoustic down-sampling module for down-sampling to obtain an acoustic representation; inputting the speech data into an existing speech recognition module to obtain input text data, and inputting the input text data into the encoder to obtain an input text encoded representation; inputting the acoustic representation and the input text encoded representation into the decoder to fuse.
    Type: Grant
    Filed: July 19, 2022
    Date of Patent: November 1, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Shuai Zhang, Jiangyan Yi
  • Patent number: 11487503
    Abstract: The present invention discloses an interactive control method executed during instant video communication between a user and one or more other users. The method comprises: monitoring video information collected by a camera during the instant video communication between the user and the one or more other users; performing recognition on the video information after acquiring the video information, to acquire user behavior data inputted by the user in a preset manner; determining whether the user behavior data comprises preset trigger information; when it is determined that the user behavior data comprises the preset trigger information, further determining whether the user behavior data comprises a preset gesture action; and when it is determined that the user behavior data comprises the preset gesture action, determining an operation instruction corresponding to the preset gesture action in a preset operation instruction set, and performing an event corresponding to the operation instruction.
    Type: Grant
    Filed: June 11, 2020
    Date of Patent: November 1, 2022
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventor: Feng Li
  • Patent number: 11475885
    Abstract: Methods for mapping intents to utterances using a three-tiered system is provided. Methods may include receiving a plurality of predetermined action-topic pairs and a plurality of predetermined intents. Methods may include mapping the plurality of predetermined action-topic pairs to the plurality of predetermined intents via a one-to-many mapping. Methods may include receiving a linguistic utterance at a first tier of the three-tiered system. Methods may include translating the linguistic utterance at the first tier of the three-tiered system. Methods may include mapping the textual representation to one or more action-topic pairs included in the plurality of action-topic pairs. The mapping may be executed at the second tier of the three-tiered system. Methods may include identifying one or more intents that correlate to the textual representation. The identifying may be executed at the third tier. The identifying may be based on the mapping between the action-topics pairs and the predetermined intents.
    Type: Grant
    Filed: May 26, 2020
    Date of Patent: October 18, 2022
    Assignee: Bank of America Corporation
    Inventors: Isaac Persing, Emad Noorizadeh
  • Patent number: 11475890
    Abstract: Training and/or utilizing a single neural network model to generate, at each of a plurality of assistant turns of a dialog session between a user and an automated assistant, a corresponding automated assistant natural language response and/or a corresponding automated assistant action. For example, at a given assistant turn of a dialog session, both a corresponding natural language response and a corresponding action can be generated jointly and based directly on output generated using the single neural network model. The corresponding response and/or corresponding action can be generated based on processing, using the neural network model, dialog history and a plurality of discrete resources. For example, the neural network model can be used to generate a response and/or action on a token-by-token basis.
    Type: Grant
    Filed: June 24, 2020
    Date of Patent: October 18, 2022
    Assignee: GOOGLE LLC
    Inventors: Arvind Neelakantan, Daniel Duckworth, Ben Goodrich, Vishaal Prasad, Chinnadhurai Sankar, Semih Yavuz
  • Patent number: 11465290
    Abstract: A robot capable of conversation with another robot and a method of controlling the same are disclosed. The robot includes a main body having a first region corresponding to a human face and rotatable in left-right direction directions, a signal generator generating a first data signal to be transmitted to a listener robot and a first robot voice signal corresponding to the first data signal, a communication unit transmitting the first data signal to an external server, a speaker outputting the first robot voice signal, and a controller controlling a rotation direction of the main body such that the first region is directed toward the listener robot at a time point adjacent to a transmission time of the first data signal and controlling the speaker to output the first robot voice signal after the rotation direction of the robot is controlled, wherein the listener robot receives the first data signal transmitted from the external server and is controlled to operate based on the first data signal.
    Type: Grant
    Filed: August 29, 2019
    Date of Patent: October 11, 2022
    Assignee: LG ELECTRONICS INC.
    Inventors: Ji Yoon Park, Jungkwan Son
  • Patent number: 11461560
    Abstract: A device includes a memory adapted to store a list in a file or database comprising a plurality of vocabulary words in a first language and, for each vocabulary word, a corresponding word in a second language, a display device, and a processor. The processor is adapted to receive a plurality of words in the first language, select one or more words among the plurality of words, based on one or more predetermined criteria, translate, match or equate the one or more selected words from the first language to words of the second language, and cause the display device to display the plurality of words, wherein one or more first words that are in the plurality of words and are not among the one or more selected words which are displayed in the first language and one or more second words that are in the plurality of words and are among the one or more selected words are displayed in the second language.
    Type: Grant
    Filed: September 14, 2018
    Date of Patent: October 4, 2022
    Inventor: Robert F. Deming, Jr.
  • Patent number: 11430458
    Abstract: A Unified Speech and Audio Codec (USAC) that may process a window sequence based on mode switching is provided. The USAC may perform encoding or decoding by overlapping between frames based on a folding point when mode switching occurs. The USAC may process different window sequences for each situation to perform encoding or decoding, and thereby may improve a coding efficiency.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: August 30, 2022
    Assignees: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, KAWNGWOON UNIVERSITY INDUSTRY-ACADEMIC COLLABORATION FOUNDATION
    Inventors: Seungkwon Beack, Tae Jin Lee, Min Je Kim, Kyeongok Kang, Dae Young Jang, Jeongil Seo, Jin Woo Hong, Chieteuk Ahn, Ho Chong Park, Young-cheol Park
  • Patent number: 11417354
    Abstract: In accordance with an example embodiment of the present invention, disclosed is a method and an apparatus for voice activity detection (VAD). The VAD comprises creating a signal indicative of a primary VAD decision and determining hangover addition. The determination on hangover addition is made in dependence of a short term activity measure and/or a long term activity measure. A signal indicative of a final VAD decision is then created.
    Type: Grant
    Filed: February 18, 2020
    Date of Patent: August 16, 2022
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventor: Martin Sehlstedt
  • Patent number: 11417331
    Abstract: The present disclosure provides a method for controlling a terminal, including the following operations: obtaining recognition results corresponding to control signals after receiving the control signals, and determining whether control instructions corresponding to the recognition results conflict, each control signal comprising at least one of a voice signal or a gesture signal; determining a credibility of each control instruction in response to a determination that there exists conflict among control instructions; and sending the control instruction with highest credibility to a control terminal. The present disclosure further provides a device for controlling a terminal and a computer readable storage medium. When control instructions are received and there exists conflict among control instructions, the control instruction with the highest credibility is sent to the control terminal after the credibility of each control instructions is determined, thereby avoiding settings from conflict.
    Type: Grant
    Filed: March 6, 2020
    Date of Patent: August 16, 2022
    Assignees: GD MIDEA AIR-CONDITIONING EQUIPMENT CO., LTD., MIDEA GROUP CO., LTD.
    Inventors: Zhicai Ou, Weiying Li
  • Patent number: 11410651
    Abstract: Network source identification via audio signals is provided. A system receives data packets with an input audio signal from a client device. The system identifies a request. The system selects a digital component provided by a digital component provider device. The system identifies audio chimes stored in memory of the client device. The system matches, based on a policy, an identifier of the digital component provider device to a first audio chime stored in the memory of the client device. The system determines, based on a characteristic of the first audio chime, a configuration to combine the digital component with the first audio chime. The system generates an action data structure with the digital component, an indication of the first audio chime, and the configuration. The system transmits the action data structure to the client device to cause the client device to generate an output audio signal.
    Type: Grant
    Filed: April 30, 2020
    Date of Patent: August 9, 2022
    Assignee: GOOGLE LLC
    Inventor: Peter Kraker
  • Patent number: 11410643
    Abstract: A computer-implemented method of responding to a conversational event. The method comprises enacting, by a conversational computing interface, an initial computer-executable plan based on a conversational event received by the conversational computing interface, wherein the initial computer-executable plan is configured to output an initial value based on the conversational event. The method further comprises selecting, by the conversational computing interface, an extended computer-executable plan based on determining that the initial value is insufficient for generating an extended description responsive to the conversational event. The method further comprises enacting, by the conversational computing interface, the extended computer-executable plan to output additional information beyond what the initial computer-executable plan is configured to output, the additional information sufficient for generating the extended description responsive to the conversational event.
    Type: Grant
    Filed: October 18, 2019
    Date of Patent: August 9, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jacob Daniel Andreas, Jayant Sivarama Krishnamurthy, Alan Xinyu Guo, Andrei Vorobev, John Philip Bufe, III, Jesse Daniel Eskes Rusak, Yuchen Zhang
  • Patent number: 11410664
    Abstract: An apparatus for estimating an inter-channel time difference between a first channel signal and a second channel signal, includes: a calculator for calculating a cross-correlation spectrum for a time block from the first channel signal in the time block and the second channel signal in the time block; a spectral characteristic estimator for estimating a characteristic of a spectrum of the first channel signal or the second channel signal for the time block; a smoothing filter for smoothing the cross-correlation spectrum over time using the spectral characteristic to obtain a smoothed cross-correlation spectrum; and a processor for processing the smoothed cross-correlation spectrum to obtain the inter-channel time difference.
    Type: Grant
    Filed: February 19, 2020
    Date of Patent: August 9, 2022
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Stefan Bayer, Eleni Fotopoulou, Markus Multrus, Guillaume Fuchs, Emmanuel Ravelli, Markus Schnell, Stefan Doehla, Wolfgang Jaegers, Martin Dietz, Goran Markovic
  • Patent number: 11404062
    Abstract: Provided is a voice assistance system with proactive routines that couples a remote server and respective user voice interactive devices to deliver a complete experience to the end user of the device. The user devices can be managed by groups and/or associated entities who manage voice services for their users. For example, the entities can provide pre-configured voice routines that perform actions on behalf of their users. The voice assistance system can also allow users to customize these routines to improve day to day operation. In addition, external services and/or providers can be linked to the system and allowed to define routines that have external system dependencies. Avoiding and managing conflicts in this environment becomes quite challenging. Some approaches use execution queues and priority, others invoke time slices and limitations on assignment of routines to time slices to resolve these issues, among other examples.
    Type: Grant
    Filed: July 26, 2021
    Date of Patent: August 2, 2022
    Assignee: LifePod Solutions, Inc.
    Inventors: Nirmalya K. De, Alan R. Bugos, Dale M. Smith, Stuart R. Patterson, Jonathan E. Gordon