Speech Recognition (epo) Patents (Class 704/E15.001)

  • Patent number: 11687908
    Abstract: A payment button on a device capable of making telephone calls, such as a mobile phone, allows a payer to electronically transfer money while in a phone call with a payee. The payment button also allows a payee to initiate an electronic payment transaction while in a phone call with a payer. The payment button may be a clickable or tappable virtual button presented on a display of the phone when being used to make or receive a call. The payer or the payee can simply enter a payment amount on the phone to complete an electronic payment transaction. A notification of payment is instantly transmitted to the phones being used for the phone call, so that the parties can safely and conveniently conclude a purchase and/or payment transaction during one phone call.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: June 27, 2023
    Assignee: PAYPAL, INC.
    Inventors: Saumil Ashvin Gandhi, Ray Hideki Tanaka
  • Patent number: 11683632
    Abstract: An automatic speech recognition (ASR) triggering system, and a method of providing an ASR trigger signal, is described. The ASR triggering system can include a microphone to generate an acoustic signal representing an acoustic vibration and an accelerometer worn in an ear canal of a user to generate a non-acoustic signal representing a bone conduction vibration. A processor of the ASR triggering system can receive an acoustic trigger signal based on the acoustic signal and a non-acoustic trigger signal based on the non-acoustic signal, and combine the trigger signals to gate an ASR trigger signal. For example, the ASR trigger signal may be provided to an ASR server only when the trigger signals are simultaneously asserted. Other embodiments are also described and claimed.
    Type: Grant
    Filed: August 17, 2021
    Date of Patent: June 20, 2023
    Assignee: Apple Inc.
    Inventors: Sorin V. Dusan, Aram M. Lindahl, Robert D. Watson
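A minimal sketch of the gating rule in patent 11683632 above, which provides the ASR trigger only when the acoustic and non-acoustic triggers are simultaneously asserted. The per-frame boolean representation and all names here are illustrative, not the patent's exact implementation.

```python
from dataclasses import dataclass

@dataclass
class TriggerFrame:
    acoustic_asserted: bool      # trigger derived from the microphone signal
    non_acoustic_asserted: bool  # trigger derived from the in-ear accelerometer

def gate_asr_trigger(frame: TriggerFrame) -> bool:
    # Forward the ASR trigger only when both trigger signals are asserted at
    # the same time; acoustic-only activity (e.g., a nearby talker) produces
    # no bone-conduction vibration and is suppressed.
    return frame.acoustic_asserted and frame.non_acoustic_asserted

frames = [TriggerFrame(True, False), TriggerFrame(True, True)]
print([gate_asr_trigger(f) for f in frames])  # [False, True]
```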
  • Patent number: 11615781
    Abstract: A single audio-visual automated speech recognition model for transcribing speech from audio-visual data includes an encoder frontend and a decoder. The encoder includes an attention mechanism configured to receive an audio track of the audio-visual data and a video portion of the audio-visual data. The video portion of the audio-visual data includes a plurality of video face tracks each associated with a face of a respective person. For each video face track of the plurality of video face tracks, the attention mechanism is configured to determine a confidence score indicating a likelihood that the face of the respective person associated with the video face track includes a speaking face of the audio track. The decoder is configured to process the audio track and the video face track of the plurality of video face tracks associated with the highest confidence score to determine a speech recognition result of the audio track.
    Type: Grant
    Filed: October 2, 2020
    Date of Patent: March 28, 2023
    Assignee: Google LLC
    Inventor: Otavio Braga
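A minimal sketch of the track-selection step in patent 11615781 above: the decoder consumes the face track with the highest attention confidence. The attention mechanism producing the scores is abstracted away, and all names are illustrative.

```python
def select_speaking_track(face_tracks, confidence_scores):
    """Return the face track most likely to be the speaking face."""
    best = max(range(len(face_tracks)), key=lambda i: confidence_scores[i])
    return face_tracks[best]

tracks = ["face_track_a", "face_track_b", "face_track_c"]
scores = [0.12, 0.81, 0.07]  # hypothetical per-track attention confidences
# The decoder would then process the audio track together with this track.
print(select_speaking_track(tracks, scores))  # face_track_b
```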
  • Patent number: 11610586
    Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic context vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.
    Type: Grant
    Filed: February 23, 2021
    Date of Patent: March 21, 2023
    Assignee: Google LLC
    Inventors: David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara Sainath, Ian Mcgraw
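A minimal sketch of the aggregation stage described in patent 11610586 above. The attention-based CEM that produces per-sub-word scores is out of scope here; taking each word's score from its final sub-word and averaging word scores into an utterance score are illustrative choices, not necessarily the patent's exact rules.

```python
def word_confidences(subword_scores, word_boundaries):
    """subword_scores: one confidence per hypothesized sub-word unit.
    word_boundaries: (start, end) sub-word index ranges, one per word."""
    # Illustrative rule: a word's confidence is its last sub-word's score.
    return [subword_scores[end - 1] for start, end in word_boundaries]

def utterance_confidence(word_scores):
    # Illustrative aggregation: arithmetic mean of word-level scores.
    return sum(word_scores) / len(word_scores)

scores = [0.95, 0.90, 0.40, 0.85]      # sub-words: "play" | "mu" "sic" | "now"
boundaries = [(0, 1), (1, 3), (3, 4)]  # word spans over the sub-word sequence
words = word_confidences(scores, boundaries)
print(words)                           # [0.95, 0.4, 0.85]
print(utterance_confidence(words))     # 0.733...
```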
  • Patent number: 11595535
    Abstract: An information processing apparatus is capable of reducing the time and effort needed to configure a smart speaker that cooperates with the information processing apparatus when a user starts to use the smart speaker. The information processing apparatus acquires identification information of the user, and acquires audio control information associated with the acquired identification information. Then, the information processing apparatus requests the smart speaker to change the audio setting of the smart speaker based on the acquired audio control information.
    Type: Grant
    Filed: June 10, 2021
    Date of Patent: February 28, 2023
    Assignee: CANON KABUSHIKI KAISHA
    Inventor: Ryosuke Kasahara
  • Patent number: 11574638
    Abstract: A system and method are disclosed for generating a teleconference space for two or more communication devices using a computer coupled with a database and comprising a processor and memory. The computer generates a teleconference space and transmits requests to join the teleconference space to the two or more communication devices. The computer stores in memory identification information, and audiovisual data associated with one or more users, for each of the two or more communication devices. The computer stores audio transcription data, transmitted to the computer by each of the two or more communication devices and associated with one or more communication device users, in the computer memory. The computer merges the audio transcription data from each of the two or more communication devices into a master audio transcript, and transmits the master audio transcript to each of the two or more communication devices.
    Type: Grant
    Filed: May 9, 2022
    Date of Patent: February 7, 2023
    Assignee: Nextiva, Inc.
    Inventors: Tomas Gorny, Jean-Baptiste Martinoli, Tracy Conrad, Lukas Gorny
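A minimal sketch of the merging step in patent 11574638 above: per-device transcription fragments are combined into one master transcript ordered by time. The (start_time, speaker, text) segment format is an assumption for illustration.

```python
import heapq

def merge_transcripts(per_device_segments):
    """per_device_segments: one list per communication device, each a
    time-sorted list of (start_time_seconds, speaker_id, text) tuples."""
    merged = heapq.merge(*per_device_segments, key=lambda seg: seg[0])
    return [f"[{t:6.2f}] {speaker}: {text}" for t, speaker, text in merged]

device_a = [(0.0, "alice", "Hello everyone."), (6.2, "alice", "Let's begin.")]
device_b = [(2.8, "bob", "Hi Alice.")]
# The master transcript would then be transmitted back to every device.
print("\n".join(merge_transcripts([device_a, device_b])))
```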
  • Patent number: 11562573
    Abstract: Aspects of the disclosure relate to training and using a phrase recognition model to identify phrases in images. As an example, a selected phrase list that includes a plurality of phrases is received. Each phrase of the plurality of phrases includes text. An initial plurality of images may be received. A training image set may be selected from the initial plurality of images by identifying the phrase-containing images that include one or more phrases from the selected phrase list. Each given phrase-containing image of the training image set may be labeled with information identifying the one or more phrases from the selected phrase list included in the given phrase-containing image. The model may be trained based on the training image set such that the model is configured to, in response to receiving an input image, output data indicating whether a phrase of the plurality of phrases is included in the input image.
    Type: Grant
    Filed: December 16, 2020
    Date of Patent: January 24, 2023
    Assignee: Waymo LLC
    Inventors: Victoria Dean, Abhijit S Ogale, Henrik Kretzschmar, David Harrison Silver, Carl Kershaw, Pankaj Chaudhari, Chen Wu, Congcong Li
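A minimal sketch of assembling the labeled training set described in patent 11562573 above. The ground-truth mapping from image to visible phrases is a stand-in for whatever annotation source the real pipeline uses.

```python
def build_training_set(image_ids, image_phrases, selected_phrases):
    """image_phrases: image id -> phrases visible in that image (assumed
    given); selected_phrases: the curated phrase list. Returns labeled
    (image id, phrases) pairs for the phrase-containing images only."""
    training_set = []
    for image_id in image_ids:
        labels = [p for p in image_phrases.get(image_id, [])
                  if p in selected_phrases]
        if labels:
            training_set.append((image_id, labels))
    return training_set

phrase_list = {"stop", "yield", "road work ahead"}
annotations = {"img1": ["stop"], "img2": ["garage sale"],
               "img3": ["yield", "stop"]}
print(build_training_set(["img1", "img2", "img3"], annotations, phrase_list))
# [('img1', ['stop']), ('img3', ['yield', 'stop'])] -- img2 is excluded
```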
  • Patent number: 11514787
    Abstract: In an information processing device, a first acquirer acquires, from a user, plan information including a scheduled time and a destination. A second acquirer acquires a spare time. A third acquirer acquires travelling schedule information for enabling arrival at the destination earlier than the scheduled time by the spare time or more. A display controller displays, on a display unit, information regarding the travelling schedule information and the spare time.
    Type: Grant
    Filed: August 1, 2019
    Date of Patent: November 29, 2022
    Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventors: Koichi Suzuki, Makoto Akahane
  • Patent number: 11495234
    Abstract: A data mining device, and a speech recognition method and system using the same, are disclosed. The speech recognition method includes selecting speech data that includes a dialect from collected speech data, analyzing and refining the selected speech data, and learning an acoustic model and a language model through an artificial intelligence (AI) algorithm using the refined dialect speech data. The user is able to use a dialect speech recognition service that is improved using 5G mobile communication services such as eMBB, URLLC, or mMTC.
    Type: Grant
    Filed: May 30, 2019
    Date of Patent: November 8, 2022
    Assignee: LG Electronics Inc.
    Inventors: Jee Hye Lee, Seon Yeong Park
  • Patent number: 11488598
    Abstract: The present disclosure relates to a display device. The display device includes a display; a signal receiver configured to receive a user's voice signal through at least one of a plurality of devices; and a processor configured to: display an image of at least one of a plurality of programs on the display by executing the plurality of programs, identify a program corresponding to a device receiving the voice signal among the plurality of programs based on matching information set by the user regarding a mutual correspondence between the plurality of programs and the plurality of devices, in response to the user's voice signal received through any one of the plurality of devices, and control the identified program to operate according to a user command corresponding to the received voice signal. Thereby, it is possible to control the target program according to the user's intention via a voice command, even if the user who inputs the voice command does not separately designate the control target program.
    Type: Grant
    Filed: January 3, 2019
    Date of Patent: November 1, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Youngsoo Yun
  • Patent number: 11482229
    Abstract: A multimedia processing circuit is provided. The multimedia processing circuit includes a smart interpreter engine and an audio engine. The smart interpreter engine includes a speech to text converter, a natural language processing module and a translator. The speech to text converter is utilized for converting speech data into text data corresponding to the first language. The natural language processing module is utilized for converting the text data corresponding to the first language into glossary text data corresponding to the first language according to an application program being executed in a host. The application program comprises a specific game software. The translator is utilized for converting the glossary text data corresponding to the first language into text data corresponding to a second language. The audio engine is utilized for converting the speech data corresponding to the first language into an analog speech signal corresponding to the first language.
    Type: Grant
    Filed: May 26, 2020
    Date of Patent: October 25, 2022
    Assignee: ACER INCORPORATED
    Inventors: Gianna Tseng, Shih-Cheng Huang, Shang-Yao Lin, Szu-Ting Chou
  • Patent number: 11481036
    Abstract: A method for determining an electronic device, a system for determining an electronic device, a computer system, and a computer-readable storage medium, the method includes: acquiring a recognition result by recognizing a first action performed by an operating object through a first electronic device (S201); and determining a second electronic device which is controllable by the first electronic device according to the recognition result (S202).
    Type: Grant
    Filed: April 12, 2019
    Date of Patent: October 25, 2022
    Assignees: Beijing JingDong ShangKe Information Technology Co., Ltd., Beijing Jingdong Century Trading Co., Ltd.
    Inventors: Yazhuo Wang, Yu Guan, Zhongfei Xu
  • Patent number: 11460979
    Abstract: According to an embodiment disclosed in the specification, a display device may include a microphone, a display displaying a screen including a plurality of layers, a memory storing a plurality of application programs, and at least one processor displaying a first user interface (UI) for interacting with a user on a first layer among the plurality of layers, displaying a second UI for displaying information obtained by performing the interaction on a second layer among the plurality of layers, and displaying an image at least partly overlapping with the first UI and the second UI on a third layer among the plurality of layers.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: October 4, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jibum Moon, Jina Kwon, Kyerim Lee
  • Patent number: 11450312
    Abstract: A speech recognition method includes: obtaining speech information; and determining beginning and ending positions of a candidate speech segment in the speech information by using a weighted finite state transducer (WFST) network. The candidate speech segment is identified as corresponding to a preset keyword. The method also includes clipping the candidate speech segment from the speech information according to the beginning and ending positions of the candidate speech segment; detecting whether the candidate speech segment includes a preset keyword by using a machine learning model; and determining, upon determining that the candidate speech segment comprises the preset keyword, that the speech information comprises the preset keyword.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: September 20, 2022
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Shilun Lin, Xilin Zhang, Wenhua Ma, Bo Liu, Xinhui Li, Li Lu, Xiucai Jiang
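A minimal sketch of the two-stage flow in patent 11450312 above: a first pass proposes beginning and ending positions for a candidate keyword segment, the segment is clipped, and a second-stage model confirms the keyword. The stub proposal and verifier stand in for the WFST network and the machine learning model.

```python
def keyword_present(samples, propose_candidate, verify):
    candidate = propose_candidate(samples)      # WFST-style first pass (stub)
    if candidate is None:
        return False
    begin, end = candidate
    segment = samples[begin:end]                # clip the candidate segment
    return verify(segment)                      # model-based second pass (stub)

propose = lambda s: (4, 9) if len(s) > 9 else None
verify = lambda seg: sum(seg) / len(seg) > 0.5  # toy keyword classifier
audio = [0.1] * 4 + [0.9] * 5 + [0.1] * 3       # toy frame-level scores
print(keyword_present(audio, propose, verify))  # True
```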
  • Patent number: 11404064
    Abstract: An information processing apparatus includes a first detector, a textualization device, a second detector, a display device and a display controller. The first detector detects, from audio data in which speech of each person in a group composed of a plurality of persons has been recorded, each utterance made during the speech. The textualization device converts contents of each utterance detected by the first detector into text. The second detector detects predetermined keywords included in each utterance on the basis of text data obtained through textualization by the textualization device. The display controller causes the display device to display the predetermined keywords detected by the second detector.
    Type: Grant
    Filed: November 2, 2018
    Date of Patent: August 2, 2022
    Assignee: KYOCERA Document Solutions Inc.
    Inventors: Yuki Kobayashi, Nami Nishimura, Tomoko Mano
  • Patent number: 11403875
    Abstract: A processing method of face recognition includes steps of: extracting embedding feature information from a face image; outputting a recognition result of face recognition according to the embedding feature information, wherein the recognition result includes a recognized name and embedding feature distance information; determining whether the recognized name is in a list or not; if the recognized name is in the list, performing a removal checking step for determining whether to remove the recognition result based on the embedding feature distance information; if determining that the recognition result is not to be removed, displaying the recognized name; if determining that the recognition result is to be removed, displaying a negative prompt; and dynamically and instantly providing feedback and updating a recognition method for the face recognition.
    Type: Grant
    Filed: November 20, 2020
    Date of Patent: August 2, 2022
    Assignee: ASKEY COMPUTER CORP.
    Inventors: Chien-Fang Chen, Setya Widyawan Prakosa, Huan-Ruei Shiu, Chien-Ming Lee
  • Patent number: 11393490
    Abstract: According to embodiments of the present disclosure, a method, apparatus, device, and computer readable storage medium for voice interaction are provided. The method includes: determining a text corresponding to a received voice signal based on a voice feature of the voice signal. The method further includes: determining, based on the voice feature and the text, a matching degree between a reference voice feature of an element in the text and a target voice feature of the element. The method further includes: determining a first possibility that the voice signal is an executable command based on the text. The method further includes: determining a second possibility that the voice signal is the executable command based on the voice feature.
    Type: Grant
    Filed: June 8, 2020
    Date of Patent: July 19, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Zhijian Wang, Jinfeng Bai, Sheng Qian, Lei Jia
  • Patent number: 11364926
    Abstract: The invention relates to a method for operating a motor vehicle system of a motor vehicle regardless of the driving situation. The method is performed by a personalization device of the motor vehicle and includes identifying the driver of the motor vehicle and using the identity of the driver to determine multiple driver-specific configuration data sets. Each of the determined configuration data sets describes configuration data of a respective user profile of the identified driver in order to personalize the motor vehicle system. The method further includes determining at least one additional occupant in the motor vehicle and, using the result of that determination, determining an intention of the identified driver. The method further includes using the determined intention to select a personalization mode from a plurality of personalization modes.
    Type: Grant
    Filed: April 4, 2019
    Date of Patent: June 21, 2022
    Assignee: AUDI AG
    Inventors: Jürgen Lerzer, Nikoletta Sofra, Hans Georg Gruber, André Ebner, Ron Melz
  • Publication number: 20150127345
    Abstract: A computer-implemented method includes listening for audio name information indicative of a name of a computer, with the computer configured to listen for the audio name information in a first power mode that promotes a conservation of power; detecting the audio name information indicative of the name of the computer; after detection of the audio name information, switching to a second power mode that promotes a performance of speech recognition; receiving audio command information; and performing speech recognition on the audio command information.
    Type: Application
    Filed: September 30, 2011
    Publication date: May 7, 2015
    Inventors: Evan H. Parker, Michal R. Grabowski
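A minimal sketch of the two-power-mode flow in publication 20150127345 above: listen for the computer's name cheaply, then switch to a mode suited to full speech recognition. The string-matching detector and the recognizer stub are illustrative.

```python
LOW_POWER, HIGH_POWER = "low_power", "high_power"

def run(audio_frames, name_detector, recognizer):
    mode = LOW_POWER
    for frame in audio_frames:
        if mode == LOW_POWER:
            if name_detector(frame):    # cheap name spotting only
                mode = HIGH_POWER       # switch after the name is detected
        else:
            return recognizer(frame)    # full recognition on the command
    return None

frames = ["background noise", "computer", "turn on the lights"]
print(run(frames, lambda f: f == "computer", lambda f: f.upper()))
# TURN ON THE LIGHTS
```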
  • Patent number: 8942980
    Abstract: A method of navigating in a sound content wherein at least one key word is stored in association with at least two positions representative of said key word in the sound content, and wherein the method comprises: a step of displaying a representation of the sound content; during playback of the sound content, a step of detecting a current extract representative of a key word stored at a first position; a step of determining at least one second extract representative of said key word and a second position as a function of the stored positions; and a step of highlighting the position of the extracts in the representation of the sound content. The invention also relates to a system adapted to implement the navigation method.
    Type: Grant
    Filed: February 11, 2011
    Date of Patent: January 27, 2015
    Assignee: Orange
    Inventors: Pascal Le Mer, Delphine Charlet, Marc Denjean, Antoine Gonot
  • Patent number: 8930576
    Abstract: The present invention is directed to a secure communication network that enables multi-point to multi-point proxy communication over the network. The network employs a smart server that establishes a secure communication link with each of a plurality of smart client devices deployed on local client networks. Each smart client device is in communication with a plurality of agent devices. A plurality of remote devices can access the smart server directly and communicate with an agent device via the secure communication link between the smart server and one of the smart client devices.
    Type: Grant
    Filed: July 11, 2014
    Date of Patent: January 6, 2015
    Assignee: KE2 Therm Solutions, Inc.
    Inventors: Steve Roberts, Cetin Sert
  • Publication number: 20140372119
    Abstract: In general, the subject matter described in this specification can be embodied in methods, systems, and program products for performing compounded text segmentation. Compounded text that is extracted from one or more search queries submitted to a search engine is received. The compounded text includes a plurality of individual words that are joined together without intervening spaces. An electronic dictionary including words is accessed. A data structure representing possible segmentations of the compounded text is generated based on whether words in the possible segmentations occur in the electronic dictionary. A data store comprising data associated with a same field of usage as the compounded text is accessed to determine a frequency of occurrence for possible segmentations of the data structure. A segmentation of the compounded text that is most probable based on the data is determined. A language model is trained using the determined segmentation of the compounded text.
    Type: Application
    Filed: September 28, 2009
    Publication date: December 18, 2014
    Inventors: Carolina Parada, Boulos Harb, Johan Schalkwyk
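A minimal sketch of the segmentation idea in publication 20140372119 above: dictionary lookups constrain the possible splits, and a frequency table (standing in for the field-of-usage data store) picks the most probable one via dynamic programming. The additive scoring is an illustrative simplification.

```python
def best_segmentation(text, freq):
    """Return the highest-scoring split of space-free text into dictionary
    words; freq doubles as the dictionary and the usage-frequency score."""
    best = {0: (0.0, [])}  # best[i] = (score, words) covering text[:i]
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - 20), i):
            word = text[j:i]
            if word in freq and j in best:
                score = best[j][0] + freq[word]
                if i not in best or score > best[i][0]:
                    best[i] = (score, best[j][1] + [word])
    return best.get(len(text), (float("-inf"), None))[1]

freq = {"ice": 2.0, "cream": 2.5, "icecream": 1.0, "ic": 0.1, "ecream": 0.1}
print(best_segmentation("icecream", freq))  # ['ice', 'cream']
```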
  • Patent number: 8850072
    Abstract: The present invention is directed to a secure communication network that enables multi-point to multi-point proxy communication over the network. The network employs a smart server that establishes a secure communication link with each of a plurality of smart client devices installed on local client networks. Each smart client device is in communication with a plurality of agent devices. A plurality of remote devices can access the smart server directly and communicate with agent devices via the secure communication link between the smart server and one of the smart client devices. This communication is enabled without complex configuration of firewall or network parameters by the user.
    Type: Grant
    Filed: July 25, 2013
    Date of Patent: September 30, 2014
    Assignee: KE2 Therm Solutions, Inc.
    Inventors: Steve Roberts, Cetin Sert
  • Publication number: 20140249813
    Abstract: A transcript interface for displaying a plurality of words of a transcript in a text editor can be provided and configured to receive a command to edit the transcript. Limited edits to data corresponding to the transcript can be made in response to commands received via the user interface module. For example, edits may be limited to selection of a single word in the text editor for editing via a given command. The edit may affect an adjacent word in some instances, such as when two adjacent words are merged. In some embodiments, data corresponding to the selected word of the transcript is changed to reflect the edit without changing data defining the relative timing of those words of the transcript that are not adjacent to the selected word.
    Type: Application
    Filed: December 1, 2008
    Publication date: September 4, 2014
    Applicant: Adobe Systems Incorporated
    Inventor: Steven Hoeg
  • Patent number: 8731609
    Abstract: A mobile device, such as a cellular telephone, includes a voice interface that includes one part that may not be specific to a particular carrier, and a second part that provides an interface to services that are specific to a carrier or to service or information providers that are not necessarily available with all carriers. A voice command interface provides easy access to the carrier services. The set of carrier services is optionally extendible by the carrier.
    Type: Grant
    Filed: August 9, 2011
    Date of Patent: May 20, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Daniel L. Roth, Chris Reiner, Mark Furnari, Jordan Cohen
  • Publication number: 20140129218
    Abstract: Computer-based speech recognition can be improved by recognizing words with an accurate accent model. To support a large number of possible accents while still providing real-time speech recognition, one embodiment provides a language tree data structure of possible accents so that a computerized speech recognition system can benefit from choosing among accent categories when searching for an appropriate accent model for speech recognition.
    Type: Application
    Filed: November 6, 2012
    Publication date: May 8, 2014
    Applicant: Spansion LLC
    Inventors: Chen Liu, Richard Fastow
  • Publication number: 20140129217
    Abstract: Embodiments of the present invention include an apparatus, method, and system for calculating senone scores for multiple concurrent input speech streams. The method can include the following: receiving one or more feature vectors from one or more input streams; accessing the acoustic model one senone at a time; and calculating separate senone scores corresponding to each incoming feature vector. The calculation uses a single read access to the acoustic model for a single senone and calculates a set of separate senone scores for the one or more feature vectors, before proceeding to the next senone in the acoustic model.
    Type: Application
    Filed: November 6, 2012
    Publication date: May 8, 2014
    Inventor: Ojas A. BAPAT
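A minimal sketch of the access pattern in publication 20140129217 above: the acoustic model is walked one senone at a time, and every concurrent stream's feature vector is scored against that senone before moving on, so each senone needs a single read per batch of frames. The diagonal-Gaussian scorer is an illustrative placeholder.

```python
import math

def score_all_streams(acoustic_model, feature_vectors):
    """acoustic_model: senone id -> (means, variances) per dimension.
    feature_vectors: one current feature vector per concurrent stream."""
    scores = {sid: [0.0] * len(feature_vectors) for sid in acoustic_model}
    for senone_id, (means, variances) in acoustic_model.items():  # one read
        for stream, vec in enumerate(feature_vectors):            # all streams
            scores[senone_id][stream] = -sum(
                (x - m) ** 2 / (2 * v) + 0.5 * math.log(2 * math.pi * v)
                for x, m, v in zip(vec, means, variances))
    return scores

model = {"s1": ([0.0, 0.0], [1.0, 1.0]), "s2": ([1.0, 1.0], [1.0, 1.0])}
print(score_all_streams(model, [[0.1, -0.2], [0.9, 1.1]]))
```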
  • Publication number: 20140122086
    Abstract: Embodiments related to the use of depth imaging to augment speech recognition are disclosed. For example, one disclosed embodiment provides, on a computing device, a method including receiving depth information of a physical space from a depth camera, receiving audio information from one or more microphones, identifying a set of one or more possible spoken words from the audio information, determining a speech input for the computing device based upon comparing the set of one or more possible spoken words from the audio information and the depth information, and taking an action on the computing device based upon the speech input determined.
    Type: Application
    Filed: October 26, 2012
    Publication date: May 1, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Jay Kapur, Ivan Tashev, Mike Seltzer, Stephen Edward Hodges
  • Publication number: 20140100848
    Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
    Type: Application
    Filed: October 5, 2012
    Publication date: April 10, 2014
    Applicant: AVAYA INC.
    Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula
  • Patent number: 8693977
    Abstract: Techniques for achieving personal security via mobile devices are presented. A portable mobile communication device, such as a phone or a personal digital assistant (PDA), is equipped with geographic positioning capabilities and is equipped with audio and visual devices. A panic mode of operation can be automatically detected in which real time audio and video for an environment surrounding the portable communication device are captured along with a geographic location for the portable communication device. This information is streamed over the Internet to a secure site where it can be viewed in real time and/or later inspected.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: April 8, 2014
    Assignee: Novell, Inc.
    Inventors: Sandeep Patnaik, Saheednanda Singh, AnilKumar Bolleni
  • Publication number: 20140074472
    Abstract: A voice control system is adapted for controlling an electrical appliance, and includes a host and a portable voice control device. The portable voice control device is capable of wireless communication with the host, and includes an audio pick-up unit for receiving a voice input. One of the host and the portable voice control device includes a voice recognition control module that is configured to recognize a control command from the voice input. The host controls operation of the electrical appliance according to the control command, and transmits an appliance status message to the portable voice control device. The portable voice control device further includes an output unit for outputting the appliance status message.
    Type: Application
    Filed: September 12, 2012
    Publication date: March 13, 2014
    Inventors: Chih-Hung Lin, Teh-Jang Chen
  • Publication number: 20140074468
    Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
    Type: Application
    Filed: September 7, 2012
    Publication date: March 13, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
  • Publication number: 20140067391
    Abstract: A system and method are presented for predicting speech recognition performance using accuracy scores in speech recognition systems within the speech analytics field. A keyword set is selected. Figure of Merit (FOM) is computed for the keyword set. Relevant features that describe the word individually and in relation to other words in the language are computed. A mapping from these features to FOM is learned. This mapping can be generalized via a suitable machine learning algorithm and used to predict FOM for a new keyword. In at least one embodiment, the predicted FOM may be used to adjust internals of the speech recognition engine to achieve a consistent behavior for all inputs for various settings of confidence values.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Applicant: INTERACTIVE INTELLIGENCE, INC.
    Inventors: Aravind Ganapathiraju, Yingyi Tan, Felix Immanuel Wyss, Scott Allen Randal
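A minimal sketch of the mapping step in publication 20140067391 above, assuming scikit-learn is available. The per-keyword features, the training FOM values, and the linear-regression choice are all illustrative; the publication only requires some learned mapping from features to FOM.

```python
from sklearn.linear_model import LinearRegression

def keyword_features(word, unigram_freq):
    # Toy per-word features: length, vowel count, corpus frequency.
    vowels = sum(ch in "aeiou" for ch in word)
    return [len(word), vowels, unigram_freq.get(word, 0.0)]

freq = {"agent": 3e-5, "escalate": 8e-6, "refund": 1.2e-5, "cancel": 2.1e-5}
train_words = ["agent", "escalate", "refund"]
train_fom = [0.82, 0.91, 0.77]  # hypothetical measured FOM per keyword

model = LinearRegression()
model.fit([keyword_features(w, freq) for w in train_words], train_fom)
print(model.predict([keyword_features("cancel", freq)]))  # predicted FOM
```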
  • Publication number: 20140067392
    Abstract: A method of providing hands-free services using a mobile device having wireless access to computer-based services includes receiving speech in a vehicle from a vehicle occupant; recording the speech using a mobile device; transmitting the recorded speech from the mobile device to a cloud speech service; receiving automatic speech recognition (ASR) results from the cloud speech service at the mobile device; and comparing the recorded speech with the received ASR results at the mobile device to identify one or more error conditions.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Denis R. Burke, Danilo Gurovich, Daniel E. Rudman, Keith A. Fry, Shane M. McCutchen, Marco T. Carnevale, Mukesh Gupta
  • Patent number: 8655660
    Abstract: The present invention is a system and method for generating a personal voice font, including monitoring voice segments automatically from a user's phone conversations by a voice learning processor to generate a personalized voice font, and delivering the personalized voice font (PVF) to a server.
    Type: Grant
    Filed: February 10, 2009
    Date of Patent: February 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Zsolt Szalai, Philippe Bazot, Bernard Pucci, Joel Vitale
  • Publication number: 20140039881
    Abstract: The instant application includes computationally-implemented systems and methods that include managing adaptation data, where the adaptation data is at least partly based on at least one speech interaction of a particular party; facilitating transmission of the adaptation data to a target device when there is an indication of a speech-facilitated transaction between the target device and the particular party, such that the adaptation data is to be applied to the target device to assist in execution of the speech-facilitated transaction; and facilitating acquisition of adaptation result data that is based on at least one aspect of the speech-facilitated transaction and is to be used in determining whether to modify the adaptation data. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: August 1, 2012
    Publication date: February 6, 2014
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud
  • Publication number: 20140039885
    Abstract: Methods and apparatus for voice-enabling a web application, wherein the web application includes one or more web pages rendered by a web browser on a computer. At least one information source external to the web application is queried to determine whether information describing a set of one or more supported voice interactions for the web application is available, and in response to determining that the information is available, the information is retrieved from the at least one information source. Voice input for the web application is then enabled based on the retrieved information.
    Type: Application
    Filed: August 2, 2012
    Publication date: February 6, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: David E. Reich, Christopher Hardy
  • Publication number: 20140039891
    Abstract: Systems and methods for audio editing are provided. In one implementation, a computer-implemented method is provided. The method includes receiving digital audio data including a plurality of distinct vocal components. Each distinct vocal component is automatically identified using one or more attributes that uniquely identify each distinct vocal component. The audio data is separated into two or more individual tracks where each individual track comprises audio data corresponding to one distinct vocal component. The separated individual tracks are then made available for further processing.
    Type: Application
    Filed: October 16, 2007
    Publication date: February 6, 2014
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventors: Nariman Sodeifi, David E. Johnston
  • Publication number: 20140029733
    Abstract: A speech server and methods provide audio stream analysis for tone detection in addition to speech recognition to implement an accurate and efficient answering machine detection strategy. By performing both tone detection and speech recognition in a single component, such as the speech server, the number of components for digital signal processing may be reduced. The speech server communicates tone events detected at the telephony level and enables voice applications to detect tone events consistently and provide consistent support and accuracy of both inbound and outbound voice applications independent of the hardware or geographical location of the telephony network. In addition, an improved opportunity for signaling of an appropriate moment for an application to leave a message is provided, thereby supporting automation.
    Type: Application
    Filed: July 26, 2012
    Publication date: January 30, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Kenneth W.D. Smith, Jaques de Broin
  • Publication number: 20140025377
    Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.
    Type: Application
    Filed: August 10, 2012
    Publication date: January 23, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Fernando Luiz Koch, Julio Nogima
  • Publication number: 20140012579
    Abstract: In some embodiments, recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically-meaningful discrepancies. The evaluation of the recognition results may be carried out using any suitable criteria, including one or more criteria that differ from criteria used by an ASR system in determining the top recognition result and the alternative recognition results from the speech input. In some embodiments, a recognition result may additionally or alternatively be processed to determine whether the recognition result includes a word or phrase that is unlikely to appear in a domain to which speech input relates.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
  • Publication number: 20140012582
    Abstract: In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recognition result to determine whether the recognition result includes such an indication. In some embodiments, a word/phrase of a set that appears in the result is iteratively replaced with each of the other words/phrases of the set. The result of the replacement may be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in a language and/or domain. The likelihood may then be evaluated to determine whether the result of the replacement is sufficiently likely for an alert to be triggered.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
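A minimal sketch of the replace-and-rescore loop in publication 20140012582 above. The confusable set, the margin, and the toy language-model scorer are illustrative; the point is that a clearly more likely sentence after a swap triggers an alert.

```python
def flag_possible_misrecognition(words, confusable_set, lm_score, margin=1.0):
    """Swap each confusable word for the alternatives in its set and alert
    if the language model finds a replacement much more likely."""
    base = lm_score(words)
    for i, word in enumerate(words):
        if word not in confusable_set:
            continue
        for alt in confusable_set - {word}:
            candidate = words[:i] + [alt] + words[i + 1:]
            if lm_score(candidate) > base + margin:
                return True, (word, alt)
    return False, None

confusable = {"hypertension", "hypotension"}
lm = lambda ws: 5.0 if "low" in ws and "hypotension" in ws else 1.0  # toy LM
print(flag_possible_misrecognition(
    ["low", "blood", "pressure", "with", "hypertension"], confusable, lm))
# (True, ('hypertension', 'hypotension'))
```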
  • Publication number: 20140006025
    Abstract: This disclosure includes, for example, methods and computer systems for providing audio-activated resource access for user devices. The computer systems may store instructions to cause the processor to perform operations, comprising capturing audio at a user device. The operations may also comprise using a speaker recognition system to identify a speaker in the transmitted audio and/or using a speech-to-text converter to identify text in the captured audio. The speaker identity or a condensed version of the speaker identity or other metadata along with the speaker identity may be transmitted to a server system to determine a corresponding speaker identity entry. The operations may also comprise receiving a resource corresponding to the identified speaker entry in the server system.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Inventors: Harshini Ramnath Krishnan, Andrew Fregly
  • Publication number: 20130346066
    Abstract: Joint decoding of words and tags may be provided. Upon receiving an input from a user comprising a plurality of elements, the input may be decoded into a word lattice comprising a plurality of words. A tag may be assigned to each of the plurality of words and a most-likely sequence of word-tag pairs may be identified. The most-likely sequence of word-tag pairs may be evaluated to identify an action request from the user.
    Type: Application
    Filed: June 20, 2012
    Publication date: December 26, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Anoop Kiran Deoras, Dilek Zeynep Hakkani-Tur, Ruhi Sarikaya, Gokhan Tur
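A minimal sketch of jointly scoring word-tag pairs over a tiny word lattice, per publication 20130346066 above. Brute-force enumeration replaces a real lattice decoder, and all scores are illustrative.

```python
from itertools import product

def joint_decode(lattice, tags, pair_score):
    """lattice: one list of candidate words per position. Returns the
    highest-scoring sequence of (word, tag) pairs across all paths."""
    best, best_score = None, float("-inf")
    for words in product(*lattice):
        for tagging in product(tags, repeat=len(words)):
            score = sum(pair_score(w, t) for w, t in zip(words, tagging))
            if score > best_score:
                best, best_score = list(zip(words, tagging)), score
    return best

lattice = [["call", "cal"], ["mom"]]
tags = ["ACTION", "CONTACT", "OTHER"]
table = {("call", "ACTION"): 3.0, ("mom", "CONTACT"): 2.5}
print(joint_decode(lattice, tags, lambda w, t: table.get((w, t), 0.0)))
# [('call', 'ACTION'), ('mom', 'CONTACT')] -> an actionable "call mom" request
```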
  • Publication number: 20130339027
    Abstract: A method or system for selecting or pruning applicable verbal commands associated with speech recognition based on a user's motions detected from a depth camera. Depending on the depth of the user's hand or arm, the context of the verbal command is determined and verbal commands corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected verbal commands. By using an appropriate set of verbal commands, the accuracy of the speech recognition is increased.
    Type: Application
    Filed: June 15, 2012
    Publication date: December 19, 2013
    Inventors: Tarek El Dokor, James Holmes, Jordan Cluster, Stuart Yamamoto, Pedram Vaghefinazari
  • Publication number: 20130339021
    Abstract: Techniques, an apparatus, and an article of manufacture for identifying one or more utterances that are likely to carry the intent of a speaker, from a conversation between two or more parties. A method includes obtaining an input of a set of utterances in chronological order from a conversation between two or more parties; computing an intent confidence value of each utterance by summing intent confidence scores from each of the constituent words of the utterance, wherein the intent confidence scores capture each word's influence on the subsequent utterances in the conversation based on (i) the uniqueness of the word in the conversation and (ii) the number of times the word subsequently occurs in the conversation; and generating a ranked order of the utterances from highest to lowest intent confidence value, wherein the highest intent value corresponds to the utterance that is most likely to carry the intent of the speaker.
    Type: Application
    Filed: June 19, 2012
    Publication date: December 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Om D. Deshmukh, Sachindra Joshi, Saurabh Saket, Ashish Verma
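A minimal sketch of the utterance ranking in publication 20130339021 above: each word contributes a score built from its uniqueness in the conversation and how often it recurs in later utterances, and utterances are ranked by the sum. Multiplying the two factors is an illustrative weighting.

```python
from collections import Counter

def rank_by_intent(utterances):
    counts = Counter(w for u in utterances for w in u.split())
    def word_score(word, index):
        uniqueness = 1.0 / counts[word]                  # factor (i)
        later = sum(word in u.split()                    # factor (ii)
                    for u in utterances[index + 1:])
        return uniqueness * later
    scored = [(sum(word_score(w, i) for w in u.split()), u)
              for i, u in enumerate(utterances)]
    return sorted(scored, reverse=True)

convo = ["i want to cancel my plan", "cancel which plan", "the mobile plan"]
for score, utterance in rank_by_intent(convo):
    print(f"{score:.2f}  {utterance}")  # the first utterance ranks highest
```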
  • Publication number: 20130339018
    Abstract: A system and method of verifying the identity of an authorized user in an authorized user group through a voice user interface for enabling secure access to one or more services via a mobile device includes receiving first voice information from a speaker through the voice user interface of the mobile device, calculating a confidence score based on a comparison of the first voice information with a stored voice model associated with the authorized user and specific to the authorized user, interpreting the first voice information as a specific service request, identifying a minimum confidence score for initiating the specific service request, determining whether or not the confidence score exceeds the minimum confidence score, and initiating the specific service request if the confidence score exceeds the minimum confidence score.
    Type: Application
    Filed: July 27, 2012
    Publication date: December 19, 2013
    Applicant: SRI INTERNATIONAL
    Inventors: Nicolas Scheffer, Yun Lei, Douglas A. Bercow
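A minimal sketch of the decision step in publication 20130339018 above: each service request carries its own minimum confidence, and the request is initiated only when the verification score clears it. Service names, scores, and thresholds are illustrative.

```python
MIN_CONFIDENCE = {"check_balance": 0.6, "transfer_funds": 0.9}

def authorize(service_request, confidence_score):
    # Unknown services fall back to the strictest possible threshold.
    minimum = MIN_CONFIDENCE.get(service_request, 1.0)
    return confidence_score >= minimum

print(authorize("check_balance", 0.72))   # True: clears the 0.6 minimum
print(authorize("transfer_funds", 0.72))  # False: high-risk requires 0.9
```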
  • Publication number: 20130332147
    Abstract: The technology of the present application provides a method and apparatus for dynamically updating a language model across a large number of similarly situated users. The system identifies individual changes to user profiles and evaluates each change for broader application, such as a dialect correction for a speech recognition engine; an administrator for the system identifies similarly situated user profiles and downloads the profile change to effect a dynamic change to the language model of similarly situated users.
    Type: Application
    Filed: June 11, 2012
    Publication date: December 12, 2013
    Applicant: NVOQ INCORPORATED
    Inventor: Charles Corfield
  • Publication number: 20130325459
    Abstract: Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: May 31, 2012
    Publication date: December 5, 2013
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, JR.
  • Publication number: 20130317823
    Abstract: Systems, methods, and computer-readable media that may be used to modify a voice action system to include voice actions provided by advertisers or users are provided. One method includes receiving electronic voice action bids from advertisers to modify the voice action system to include a specific voice action (e.g., a triggering phrase and an action). One or more bids may be selected. The method includes, for each of the selected bids, modifying data associated with the voice action system to include the voice action associated with the bid, such that the action associated with the respective voice action is performed when voice input from a user is received that the voice action system determines to correspond to the triggering phrase associated with the respective voice action.
    Type: Application
    Filed: May 23, 2012
    Publication date: November 28, 2013
    Inventor: Pedro J. Moreno Mengibar