Speech To Image Patents (Class 704/235)
  • Patent number: 11527167
    Abstract: System, apparatus and method for facilitating interactive reading can include an electronic device having a program or application thereon. In one embodiment, the application can recognize one or more cues, combined with an external data source, that result from reading a story aloud and/or performing one or more acts.
    Type: Grant
    Filed: July 13, 2017
    Date of Patent: December 13, 2022
    Assignee: The Marketing Store Worldwide, LP
    Inventors: Thomas Foreman, Hiren Jakison
  • Patent number: 11527248
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
    Type: Grant
    Filed: May 27, 2020
    Date of Patent: December 13, 2022
    Assignee: GOOGLE LLC
    Inventors: Brian Strope, Francoise Beaufays, Olivier Siohan
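The early-abort pattern this abstract describes can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the recognizer callables, threshold value, and `(text, confidence)` result shape are all assumptions.

```python
# Sketch of racing several speech recognition systems (SRS's) and aborting
# the stragglers once any result clears a confidence threshold.
from concurrent.futures import ThreadPoolExecutor, as_completed

def recognize_with_early_abort(audio, recognizers, threshold=0.85):
    """Each recognizer maps audio -> (text, confidence)."""
    best = None
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        futures = [pool.submit(r, audio) for r in recognizers]
        for future in as_completed(futures):
            text, confidence = future.result()
            if best is None or confidence > best[1]:
                best = (text, confidence)
            if confidence >= threshold:
                for f in futures:  # abort recognition tasks not yet started
                    f.cancel()
                break
    return best

# Stub recognizers standing in for real SRS's:
fast_srs = lambda audio: ("recognize speech", 0.9)
slow_srs = lambda audio: ("wreck a nice beach", 0.4)
print(recognize_with_early_abort(b"\x00", [fast_srs, slow_srs]))
```

Note that `Future.cancel()` only prevents not-yet-started tasks from running, which matches the abstract's "aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result."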
  • Patent number: 11526658
    Abstract: The growing amount of communication data generated by inmates in controlled environments makes a timely and effective investigation and analysis more and more difficult. The present disclosure provides details of a system and method to investigate and analyze the communication data in a correctional facility timely and effectively. Such a system receives both real time communication data and recorded communication data, processes and investigates the data automatically, and stores the received communication data and processed communication data in a unified data server. Such a system enables a reviewer to review, modify and insert markers and comments for the communication data. Such a system further enables the reviewer to search the communication data and create scheduled search reports.
    Type: Grant
    Filed: December 7, 2020
    Date of Patent: December 13, 2022
    Assignee: Global Tel*Link Corporation
    Inventor: Stephen Lee Hodge
  • Patent number: 11526667
    Abstract: Embodiments of the present systems and methods may provide techniques for augmenting textual data that may be used for textual classification tasks. Embodiments of such techniques may provide the capability to synthesize labeled data to improve text classification tasks. Embodiments may be specifically useful when only a small amount of data is available, and provide improved performance in such cases. For example, in an embodiment, a method implemented in a computer system may comprise a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor, and the method may comprise fine-tuning a language model using a training dataset, synthesizing a plurality of samples using the fine-tuned language model, filtering the plurality of synthesized samples, and generating an augmented training dataset comprising the training dataset and the filtered plurality of synthesized sentences.
    Type: Grant
    Filed: May 9, 2020
    Date of Patent: December 13, 2022
    Assignee: International Business Machines Corporation
    Inventors: Amir Kantor, Ateret Anaby Tavor, Boaz Carmeli, Esther Goldbraich, George Kour, Segev Shlomov, Naama Tepper, Naama Zwerdling
  • Patent number: 11526323
    Abstract: Example apparatus, computer-implemented methods, systems, devices, and computer-readable media facilitate concurrent consumption of media content by multiple users using superimposed animation. Example instructions, when executed, cause at least one processor of an electronic user device to at least cause a display to present media and a visual representation of a remote individual, the visual representation including at least a portion of a human profile; identify a direction of a gaze of a user of the electronic user device based on signals output by one or more sensors of the electronic user device; and determine whether the gaze of the user is directed toward the display of the media or the display of the visual representation based on the direction of the gaze.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: December 13, 2022
    Assignee: Intel Corporation
    Inventors: Paul I. Felkai, Annie Harper, Ratko Jagodic, Rajiv K. Mongia, Garth Shoemaker
  • Patent number: 11522730
    Abstract: In an approach to customizing meeting notes, a computer receives audio input of a virtual meeting, converts the audio input to text, and displays the text to a plurality of meeting participants. A computer receives highlighted phrases of the text from the plurality of meeting participants and determines a highlighting frequency of each of the highlighted phrases. A computer determines phrases with a highlighting frequency greater than a pre-defined threshold. A computer orders the phrases based on a chronological order of the phrases in the audio input. A computer determines preferences of a first meeting participant associated with a meeting summary. A computer generates a customized summary of the virtual meeting for the first meeting participant of the plurality of meeting participants based on the ordered phrases with a high frequency of highlighting and on the preferences. A computer transmits the customized summary to the first meeting participant.
    Type: Grant
    Filed: October 5, 2020
    Date of Patent: December 6, 2022
    Assignee: International Business Machines Corporation
    Inventors: Ruchi Asthana, Jennifer A. Mallette, Steven Ware Jones, Nicholas Fong, Vivek Salve
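The thresholding-and-ordering step of this abstract can be sketched as below; the data shapes and the threshold value are illustrative assumptions, not the claimed implementation.

```python
# Keep phrases highlighted by enough participants, ordered chronologically
# (i.e., by where each phrase occurs in the meeting transcript).
from collections import Counter

def summary_phrases(transcript_phrases, participant_highlights, threshold=2):
    counts = Counter(p for highlights in participant_highlights
                     for p in set(highlights))
    frequent = {p for p, n in counts.items() if n >= threshold}
    # Chronological order comes from the phrase order in the transcript.
    return [p for p in transcript_phrases if p in frequent]

transcript = ["kickoff recap", "Q3 budget approved",
              "action: ship beta", "misc chatter"]
highlights = [["Q3 budget approved", "action: ship beta"],
              ["action: ship beta"],
              ["Q3 budget approved"]]
print(summary_phrases(transcript, highlights))
# ['Q3 budget approved', 'action: ship beta']
```

Per-participant customization would then filter or reorder this list against each participant's stored preferences.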
  • Patent number: 11521608
    Abstract: Methods and systems for correcting, based on subsequent second speech, an error in an input generated from first speech using automatic speech recognition, without an explicit indication in the second speech that a user intended to correct the input with the second speech, include determining that a time difference between when search results in response to the input were displayed and when the second speech was received is less than a threshold time, and based on the determination, correcting the input based on the second speech. The methods and systems also include determining that a difference in acceleration of a user input device, used to input the first speech and second speech, between when the search results in response to the input were displayed and when the second speech was received is less than a threshold acceleration, and based on the determination, correcting the input based on the second speech.
    Type: Grant
    Filed: May 24, 2017
    Date of Patent: December 6, 2022
    Assignee: Rovi Guides, Inc.
    Inventor: Arun Sreedhara
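The two heuristics the abstract combines (elapsed time and device acceleration) reduce to a simple predicate; the threshold values below are assumptions for illustration only.

```python
# Treat second speech as an implicit correction of the first when it arrives
# soon after results were displayed AND the input device barely accelerated
# in between (i.e., the user has not moved on to something else).
def is_implicit_correction(seconds_since_results, accel_change,
                           time_threshold=5.0, accel_threshold=0.2):
    return (seconds_since_results < time_threshold
            and accel_change < accel_threshold)

# Results for "Austin" appear; 2s later, remote still steady, the user says
# "Boston": correct the query rather than start a new search.
print(is_implicit_correction(seconds_since_results=2.0, accel_change=0.05))
print(is_implicit_correction(seconds_since_results=30.0, accel_change=0.05))
```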
  • Patent number: 11520471
    Abstract: The illustrative embodiments described herein provide systems and methods for notifying a user when a set of characters are identified in a media file. In one embodiment, a method includes receiving a set of characters inputted by the user of a computing device, playing the media file, transcribing the media file to form a transcription, and determining whether the transcription of the media file includes the set of characters. The method also includes initiating a notification prompt on a graphical user interface of the computing device in response to determining that the media file includes the set of characters.
    Type: Grant
    Filed: October 31, 2021
    Date of Patent: December 6, 2022
    Assignee: GOOGLE LLC
    Inventor: Margarita Khafizova
  • Patent number: 11514914
    Abstract: Systems and methods for an intelligent virtual assistant for meetings are disclosed. In one embodiment, a system for an intelligent virtual assistant for meetings may include a server comprising at least one computer processor executing a virtual assistant computer program; a communication server in communication with the server; and a plurality of communication devices in communication with the server and the communication server, wherein the communication server facilitates an electronic meeting with a plurality of attendees via the plurality of communication devices. The virtual assistant may receive at least an audio feed and a video feed of the electronic meeting in real-time, may transcribe the audio feed using a speech-recognition algorithm, may provide the transcription to at least one of the plurality of attendees, may receive an edited transcription, and may update the speech recognition algorithm based on the edited transcription.
    Type: Grant
    Filed: February 5, 2020
    Date of Patent: November 29, 2022
    Assignee: JPMORGAN CHASE BANK, N.A.
    Inventors: Daniel D McQuiston, Aarti Narayanan, Dave Burrells, Simon Burke, Jan S Dabrowski, Rhys Dawes, Charlotte Knight, Libby Kent, Sandeep Koul, Uday Pant, Tony M Nazarowski, Aditi Vaidya, Ayush Kumar Bilala, Charanjith Allaparambil Chandran, Prayag Godha, Nikhil Kotikanikadanam Madhusudhan, Chitra Pillai Sundaribai, Aditya Anil Upadhyay, Eric Han Kai Chang, Stefan Cristian Bardasu, Erin Michelle Perry, Saifuddin Merchant, James P White, III
  • Patent number: 11515020
    Abstract: A method, computer program product, and computing system for: receiving an initial portion of an encounter record; processing the initial portion of the encounter record to generate initial content for a medical report; receiving one or more additional portions of the encounter record; and processing the one or more additional portions of the encounter record to modify the medical report.
    Type: Grant
    Filed: March 5, 2019
    Date of Patent: November 29, 2022
    Assignee: Nuance Communications, Inc.
    Inventors: Paul Joseph Vozila, Joel Praveen Pinto, Kumar Abhinav, Haibo Li, Marilisa Amoia, Frank Diehl
  • Patent number: 11516518
    Abstract: A method comprises receiving from each of a plurality of commentator applications respective distinct streams of media content comprising commentary information, combining at least portions of selected ones of the distinct streams of media content comprising commentary information in a mixer associated with a media server to generate a composite media content stream, and providing the composite media content stream generated by the mixer to one or more servers of a content delivery network for delivery to one or more viewer devices. The commentary information of a given one of the distinct streams of media content received from a corresponding one of the commentator applications illustratively comprises at least one of audio content, video content, image content, social media posting content, chat text and closed caption text. The mixer may comprise a post-mixer coupled to the media server.
    Type: Grant
    Filed: June 18, 2021
    Date of Patent: November 29, 2022
    Assignee: Kiswe Mobile Inc.
    Inventors: Bert De Decker, Tom Cuypers, Wim Sweldens, Francis X. Zane, Thomas J. Janiszewski, Yung-Lung Ho
  • Patent number: 11507345
    Abstract: Systems and methods to accept speech input and edit a note upon receipt of an indication to edit are disclosed. Exemplary implementations may: effectuate presentation of a graphical user interface that includes a note, the note including note sections, the note sections including a first note section, the individual note sections including body fields; obtain user input from the client computing platform, the user input representing an indication to edit a first body field of the first note section; obtain audio information representing sound captured by an audio section of the client computing platform, the audio information including value definition information specifying one or more values to be included in the individual body fields; perform speech recognition on the audio information to obtain a first value; and populate the first body field with the first value so that the first value is included in the first body field.
    Type: Grant
    Filed: September 23, 2020
    Date of Patent: November 22, 2022
    Assignee: Suki AI, Inc.
    Inventor: Matt Pallakoff
  • Patent number: 11508106
    Abstract: A disclosure includes: a moving image acquisition unit configured to acquire moving image data obtained through moving image capturing of at least a mouth part of an utterer; a lip detection unit configured to detect a lip part from the moving image data and detect motion of the lip part; a moving image processing unit configured to generate a moving image enhanced to increase the motion of the lip part detected by the lip detection unit; and a display control unit configured to control a display panel to display the moving image generated by the moving image processing unit.
    Type: Grant
    Filed: April 8, 2020
    Date of Patent: November 22, 2022
    Assignee: JVCKENWOOD Corporation
    Inventor: Takuji Teruuchi
  • Patent number: 11507253
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing contextual information for a displayed resource that includes an image. In one aspect, a method includes providing, by a user device to a contextual information server, a query-independent request for contextual information relevant to an active resource displayed in an application environment on the user device, wherein the request specifies content of the active resource and further specifies that the active resource displayed on the user device includes an image, but does not include the image in the request, receiving a request for the image from the contextual information server, providing the image to the contextual information server, receiving a user interface element that includes contextual information regarding the image, and displaying the user interface element on the user device with the active resource.
    Type: Grant
    Filed: October 5, 2020
    Date of Patent: November 22, 2022
    Assignee: GOOGLE LLC
    Inventors: Joao Paulo Pagaime da Silva, Vikram Aggarwal
  • Patent number: 11507346
    Abstract: A method for text feedback includes: receiving, by a controller, an utterance from a user; determining, by an automatic speech recognition engine of the controller, a plurality of speech recognition results based on the utterance from the user, wherein the speech recognition results include probable commands; determining, by the automatic speech recognition engine of the controller, a plurality of confidence scores for each of the plurality of speech recognition results; determining, by the controller, a text characteristic for each of the plurality of probable commands as a function of the confidence scores for each of the plurality of speech recognition results; and commanding, by the controller, a display to show text corresponding to each of the plurality of probable commands with the text characteristic determined by the controller.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: November 22, 2022
    Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Xu Fang Zhao, Gaurav Talwar, Alaa M. Khamis
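The confidence-to-characteristic mapping in this abstract can be sketched as below. The abstract leaves the concrete text characteristic open; opacity and the linear scaling are assumptions chosen for illustration.

```python
# Map each probable command's confidence score to a display characteristic
# (here: opacity), so likelier commands render more prominently.
def text_characteristics(commands_with_scores, min_opacity=0.3):
    return [(cmd, round(min_opacity + (1.0 - min_opacity) * score, 2))
            for cmd, score in commands_with_scores]

print(text_characteristics([("call home", 0.9), ("call Holmes", 0.5)]))
# [('call home', 0.93), ('call Holmes', 0.65)]
```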
  • Patent number: 11507759
    Abstract: A speech translation device, for conversation between a first speaker making an utterance in a first language and a second speaker making an utterance in a second language different from the first language, includes: a speech detector that detects, from sounds that are input, a speech segment in which the first speaker or the second speaker made an utterance; a display that, after speech recognition is performed on the utterance, displays a translation result obtained by translating the utterance from the first language to the second language or from the second language to the first language; and an utterance instructor that outputs, in the second language via the display, a message prompting the second speaker to make an utterance after a first speaker's utterance or outputs, in the first language via the display, a message prompting the first speaker to make an utterance after a second speaker's utterance.
    Type: Grant
    Filed: March 19, 2020
    Date of Patent: November 22, 2022
    Assignee: PANASONIC HOLDINGS CORPORATION
    Inventors: Hiroki Furukawa, Atsushi Sakaguchi, Tsuyoki Nishikawa
  • Patent number: 11501757
    Abstract: Disclosed herein is an artificial intelligence apparatus including an input interface configured to receive speech data, and a processor configured to detect a non-utterance interval included in the speech data and determine presence/absence of a second utterance after the non-utterance interval according to characteristics of a first utterance before the non-utterance interval, when the non-utterance interval exceeds a set time.
    Type: Grant
    Filed: December 17, 2019
    Date of Patent: November 15, 2022
    Assignee: LG ELECTRONICS INC.
    Inventor: Hansuk Shim
  • Patent number: 11501074
    Abstract: Methods, systems, and computing devices for visualizing natural language processing algorithm processes are described herein. A plurality of categories may be determined. Each color of a plurality of colors may correspond to the categories. Text content may be processed using a natural language processing algorithm. Confidence values indicating, for each of a plurality of portions of the text content, a degree of confidence corresponding to one or more of the plurality of categories may be determined. Display colors may be determined based on the confidence values. A user interface comprising a visualization of the text content may be displayed, and the user interface may be configured to show each portion of the text content using a display color such that the user interface indicates changes in confidence across the plurality of characters.
    Type: Grant
    Filed: August 27, 2020
    Date of Patent: November 15, 2022
    Assignee: Capital One Services, LLC
    Inventors: Jeremy Goodsitt, Austin Walters, Anh Truong
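One way to realize the color mapping this abstract describes is to blend per-category colors by confidence. The category names, colors, and weighted-average rule below are assumptions; the abstract only requires that display color track per-portion confidence.

```python
# Blend category colors by confidence for one portion of text.
CATEGORY_COLORS = {"address": (255, 0, 0), "name": (0, 0, 255)}

def display_color(confidences):
    """Weighted average of category RGB colors by confidence value."""
    total = sum(confidences.values()) or 1.0
    channels = zip(*[[c * w for c in CATEGORY_COLORS[cat]]
                     for cat, w in confidences.items()])
    return tuple(round(sum(ch) / total) for ch in channels)

print(display_color({"address": 0.8, "name": 0.2}))  # (204, 0, 51)
```

Rendering successive portions with `display_color` then shows "changes in confidence" as gradual color shifts across the text.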
  • Patent number: 11501091
    Abstract: A real-time speech-to-speech generator and sign gestures converter system is disclosed. Communication remains challenging for deaf or hearing-impaired people. Embodiments of the invention provide a direct speech-to-speech translation system with further conversion to sign gestures. Direct speech-to-speech translation and further sign gesture conversion use a one-tier approach, creating a unified model for the whole application. The single-model ecosystem takes in audio (MEL spectrogram) as an input and gives out audio (MEL spectrogram) as an output to a speech-sign converter device with a display. This solves the bottleneck problem by converting the translated speech directly to a sign language gesture from the first language with emotion, preserving phonetic information along the way. This model needs parallel audio samples in two languages.
    Type: Grant
    Filed: June 11, 2022
    Date of Patent: November 15, 2022
    Inventor: Sandeep Dhawan
  • Patent number: 11501780
    Abstract: Devices, systems, and methods for automatic real-time moderation of meetings, by a computerized or automated moderation unit able to manage, steer and guide the meeting in real-time and able to selectively generate and convey real-time differential notifications and advice to particular participants. A Meeting Moderator Bot monitors audio conversations in a meeting, and analyzes their textual equivalent; detects topics that were skipped or that should be discussed, and notifies participants; detects double-talk or interferences and generates warnings accordingly; detects absence of participants that are relevant to particular topics; detects that the conversation should shift to another topic on the agenda; generates other meeting steering notifications; and monitors compliance of the meeting participants with such steering notifications.
    Type: Grant
    Filed: May 19, 2020
    Date of Patent: November 15, 2022
    Assignee: AUDIOCODES LTD.
    Inventors: Shabtai Adlersberg, Menachem Honig, Tatiana Adar
  • Patent number: 11503401
    Abstract: A dual-zone automotive multimedia system may include a first infotainment device associated with a front zone of a vehicle, at least one second infotainment device associated with a rear zone of a vehicle, wherein the at least one second infotainment device includes a directional loudspeaker arranged facing the rear zone of the vehicle, and a processor programmed to transmit audio signals to the first and second infotainment devices to create sound at each of the front and rear zones, wherein the audio signal transmitted to the directional loudspeaker relates to playback at the rear zone.
    Type: Grant
    Filed: February 19, 2021
    Date of Patent: November 15, 2022
    Assignee: Harman International Industries, Incorporated
    Inventors: Riley Winton, Christopher Ludwig, Christopher Michael Trestain, Maxwell Boone Willis
  • Patent number: 11496623
    Abstract: A telecommunications network for playing an enhanced announcement in the same format as that of an enhanced call is described herein. An enhanced call is a call via real time text or video. The telecommunications network includes a node or subsystem, such as an IP multimedia subsystem core (IMS or IMS core), programmed to receive an enhanced call in a text or video format, detect the format of the enhanced call, and return an enhanced announcement—an announcement provided in the same format as the enhanced call. The IMS core can include one or more sub-nodes or sub-components, including a telephony application server (TAS), a media resource function (MRF), or both.
    Type: Grant
    Filed: July 2, 2021
    Date of Patent: November 8, 2022
    Assignee: T-MOBILE USA, INC.
    Inventor: Tianmin Ding
  • Patent number: 11495206
    Abstract: Voice synthesis method and apparatus generate second control data using an intermediate trained model with first input data including first control data designating phonetic identifiers, change the second control data in accordance with a first user instruction provided by a user, generate synthesis data representing frequency characteristics of a voice to be synthesized using a final trained model with final input data including the first control data and the changed second control data, and generate a voice signal based on the generated synthesis data.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: November 8, 2022
    Assignee: YAMAHA CORPORATION
    Inventor: Ryunosuke Daido
  • Patent number: 11488604
    Abstract: A method may include obtaining first features of first audio data that includes speech and obtaining second features of second audio data that is a revoicing of the first audio data. The method may further include providing the first features and the second features to an automatic speech recognition system and obtaining a single transcription generated by the automatic speech recognition system using the first features and the second features.
    Type: Grant
    Filed: August 19, 2020
    Date of Patent: November 1, 2022
    Assignee: Sorenson IP Holdings, LLC
    Inventor: David Thomson
  • Patent number: 11482214
    Abstract: Techniques for speech-to-text hypothesis generation and hypothesis selection are described. A text input representing at least part of a voice recording is received from a speech-to-text component. A first text alternative is generated using a finite state transducer based at least in part on the text input. A hypothesis from a hypothesis set is selected using a language model that includes probabilities for sequences of words, the hypothesis set including the text input and the first text alternative. A selected hypothesis text associated with the selected hypothesis is sent to a search engine.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: October 25, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Natali Arieli, Eran Fainman, Yochai Zvik, Yaniv Ben-Yehuda
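The select-by-language-model step can be sketched with a toy scorer; the probability table below is fabricated for illustration, and a real system would use sequence (n-gram or neural) probabilities rather than unigrams.

```python
# Score each hypothesis (original text input plus FST-generated alternatives)
# with a language model and keep the most probable one.
import math

LM = {"two": 0.05, "too": 0.03, "to": 0.2, "cats": 0.01, "napped": 0.005}

def score(hypothesis):
    # Log-probability under the toy unigram model; unknown words get a floor.
    return sum(math.log(LM.get(w, 1e-6)) for w in hypothesis.split())

def select_hypothesis(hypotheses):
    return max(hypotheses, key=score)

print(select_hypothesis(["too cats napped", "two cats napped"]))
# two cats napped
```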
  • Patent number: 11483273
    Abstract: Chat-based interaction with an in-meeting virtual assistant may be provided. First, audio input associated with a meeting may be received. Next, an intent from the audio input may be detected. Text content associated with the audio input may then be generated in response to detecting the intent from the audio input. The text content may be displayed in a chat interface.
    Type: Grant
    Filed: July 14, 2020
    Date of Patent: October 25, 2022
    Assignee: Cisco Technology, Inc.
    Inventors: Mohamed Gamal Mohamed Mahmoud, Omar Tarek El-Sadany
  • Patent number: 11474780
    Abstract: An electronic device includes a communication circuit, a display, a microphone, a processor operatively connected to the communication circuit, the display, and the microphone, and a memory operatively connected to the processor, wherein the memory is configured to store instructions which, when executed, cause the processor to control the electronic device to: transmit information related to a predetermined event to a server through the communication circuit in response to detection of the predetermined event through an application, display a user interface through the display in response to reception of information related to the user interface including at least one visual object selectable by a user to control a function of the application through the communication circuit, receive a user-uttered input for selecting one of the at least one visual object included in the user interface through the microphone, and transmit information related to the user-uttered input to the server through the communication circuit.
    Type: Grant
    Filed: February 14, 2020
    Date of Patent: October 18, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jaeyoung Yang, Donghee Suh, Hojun Jaygarl, Minsung Kim, Jinwoong Kim, Youngbin Kim, Kwangbin Lee, Youngmin Lee
  • Patent number: 11475887
    Abstract: An electronic device receives audio data for a media item. The electronic device generates, from the audio data, a plurality of samples, each sample having a predefined maximum length. The electronic device, using a neural network trained to predict textual unit probabilities, generates a probability matrix of textual units for a first portion of a first sample of the plurality of samples. The probability matrix includes information about textual units, timing information, and respective probabilities of respective textual units at respective times. The electronic device identifies, for the first portion of the first sample, a first sequence of textual units based on the generated probability matrix.
    Type: Grant
    Filed: November 21, 2019
    Date of Patent: October 18, 2022
    Assignee: Spotify AB
    Inventors: Daniel Stoller, Simon René Georges Durand, Sebastian Ewert
  • Patent number: 11470404
    Abstract: An example method performed by a processing system includes retrieving a digital model of a media element from a database storing a plurality of media elements, wherein the media element is to be inserted into a scene of an audiovisual media, rendering the media element in the scene of the audiovisual media, based on the digital model of the media element and on metadata associated with the digital model to produce a rendered media element, wherein the metadata describes a characteristic of the media element and a limit on the characteristic, and inserting the rendered media element into the scene of the audiovisual media.
    Type: Grant
    Filed: May 26, 2020
    Date of Patent: October 11, 2022
    Assignees: AT&T Intellectual Property I, L.P., AT&T Mobility II LLC
    Inventors: John Oetting, Eric Zavesky, James Pratt, Jason Decuir, Terrel Lecesne
  • Patent number: 11468867
    Abstract: A system and method for providing acoustic output is disclosed, the system comprising a communication device, a processor coupled to the communication device, and a memory coupled to the processor. The processor receives multimedia data associated with a multimedia output stream, extracts audio data based on the multimedia data, and generates a rhythmic data set including time-series acoustic characteristic data based on the extracted audio data. A sequence of visual elements is generated based on the time-series acoustic characteristic data and associated with the respective visual elements in the sequence of visual elements with the multimedia data. The multimedia data for visually displaying the acoustic characteristic data concurrently with the multimedia stream is transmitted to a multimedia output device.
    Type: Grant
    Filed: March 25, 2020
    Date of Patent: October 11, 2022
    Assignee: COMMUNOTE INC.
    Inventor: Kemal S. Ahmed
  • Patent number: 11468123
    Abstract: Disclosed is an electronic apparatus providing a reply to a query of a user. The electronic apparatus includes a microphone, a camera, a memory configured to store at least one instruction, and at least one processor, and the processor is configured to execute the at least one instruction to control the electronic apparatus to: identify a region of interest corresponding to a co-reference in an image acquired through the camera based on a co-reference being included in the query, identify an object referred to by the co-reference among at least one object included in the identified region of interest based on a dialogue content that includes the query, and provide information on the identified object as the reply.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: October 11, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Kangwook Lee, Jaewon Kim, Jiin Nam, Huiwon Yun, Hojin Jung, Kunal Chawla, Akhil Kedia
  • Patent number: 11470327
    Abstract: Scene aware video content encoding techniques can determine if video content is a given content type and is one of one or more given titles that include one or more given scenes. The one or more given scenes of the video content of the given type and given one of the titles can be encoded using corresponding scenes specific encoding parameter values, and the non-given scenes can be encoded using one or more general encoding parameter values. The one or more given titles can be selected based on a rate of streaming of various video content titles of the given type.
    Type: Grant
    Filed: March 30, 2020
    Date of Patent: October 11, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Tae Meon Bae, Minghai Qin, Guanlin Wu, Yen-kuang Chen, Qinggang Zhou, Shaolin Xie
  • Patent number: 11462216
    Abstract: A method for selecting a speech recognition result on a computing device includes receiving a first speech recognition result determined by the computing device, receiving first features, at least some of the features being determined using the first speech recognition result, determining whether to select the first speech recognition result or to wait for a second speech recognition result determined by a cloud computing service based at least in part on the first speech recognition result and the first features.
    Type: Grant
    Filed: March 26, 2020
    Date of Patent: October 4, 2022
    Assignee: Cerence Operating Company
    Inventor: Min Tang
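The local-vs-cloud decision in this abstract reduces to a gating rule; the sketch below uses only a confidence threshold, whereas the claimed method considers additional features, and the names and threshold here are assumptions.

```python
# Accept the on-device recognition result immediately when its confidence is
# high; otherwise wait for the cloud recognizer's result.
def select_result(local_text, local_confidence, cloud_result_fn, threshold=0.8):
    if local_confidence >= threshold:
        return local_text      # good enough; skip the cloud round trip
    return cloud_result_fn()   # fall back to the cloud recognition result

print(select_result("play jazz", 0.95, lambda: "play chess"))
print(select_result("play chess", 0.40, lambda: "play jazz"))
```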
  • Patent number: 11461779
    Abstract: Techniques for transferring control of a system-user dialog session are described. A first speechlet component may interact with a user until the first speechlet component receives user input that the first speechlet component cannot handle. The first speechlet component may output an action representing the user input. A system may determine a second speechlet component configured to execute the action. The system may send the second speechlet component a navigator object that results in the second speechlet component handling the user interaction that the first speechlet component could not handle. Once the second speechlet component is finished processing, the second speechlet component may output an updated navigator object, which causes the first speechlet component to either further interact with a user or cause a current dialog session to be closed.
    Type: Grant
    Filed: March 23, 2018
    Date of Patent: October 4, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Rohin Dabas, Troy Dean Schuring, Xu Zhang, Maksym Kolodeznyi, Andres Felipe Borja Jaramillo, Nnenna Eleanya Okwara, Alberto Milan Gutierrez, Rashmi Tonge
  • Patent number: 11455151
    Abstract: Disclosed herein is a software technology for facilitating an interactive conversational session between a user and a digital conversational character. For instance, in one aspect, the disclosed process may involve two primary phases: (1) an authoring phase that involves a first user accessing a content authoring tool to create a given type of visual conversation application that facilitates interactions between a second user and a digital conversational character in an interactive conversational session, and (2) a rendering phase that involves the second user accessing the created visual conversation application to interact with the digital conversational character in an interactive conversational session. In one implementation, accessing the created visual conversation application may involve detecting an object and identifying information associated with the detected object.
    Type: Grant
    Filed: October 11, 2019
    Date of Patent: September 27, 2022
    Assignee: HIA Technologies Inc.
    Inventors: Vacit Arat, Richard Cardran, Rick King
  • Patent number: 11450334
    Abstract: To filter unwanted sounds from a conference call, a first voice signal is captured by a first device during a conference call and converted into corresponding text, which is then analyzed to determine that a first portion of the text was spoken by a first user and a second portion of the text was spoken by a second user. If the first user is relevant to the conference call while the second user is not, the first voice signal is prevented from being transmitted into the conference call, the first portion of text is converted into a second voice signal using a voice profile of the first user to synthesize the voice of the first user, and the second voice signal is then transmitted into the conference call. The second portion of text is not converted into a voice signal, as the second user is determined not to be relevant.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: September 20, 2022
    Assignee: Rovi Guides, Inc.
    Inventors: Rajendran Pichaimurthy, Madhusudhan Seetharam
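    The filtering flow in this abstract (transcribe, attribute text to speakers, re-synthesize only the relevant speaker) can be sketched as below. The function names and the diarized-segment input format are assumptions for illustration; the speech-to-text and voice-profile synthesis stages are stubbed out.

```python
# Illustrative sketch: given diarized speech-to-text output, keep only
# text spoken by participants relevant to the call. The kept text would
# then be re-synthesized with the relevant speaker's voice profile
# before being transmitted into the conference call.

def filter_call_audio(segments, relevant_speakers):
    """segments: list of (speaker_id, text) pairs from diarized STT.
    relevant_speakers: set of speaker ids relevant to the call.
    Returns the text to re-synthesize; other speakers' text is dropped."""
    kept = [text for speaker, text in segments if speaker in relevant_speakers]
    return " ".join(kept)

segments = [("alice", "The quarterly numbers look good."),
            ("tv_in_background", "breaking news tonight")]
clean_text = filter_call_audio(segments, {"alice"})
```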
  • Patent number: 11450319
    Abstract: The present disclosure discloses an image processing device including: a receiving module configured to receive a voice signal and an image to be processed; a conversion module configured to convert the voice signal into an image processing instruction and determine a target area according to a target voice instruction conversion model, in which the target area is a processing area of the image to be processed; and a processing module configured to process the target area according to the image processing instruction and a target image processing model. The examples may realize the functionality of using voice commands to control image processing, which may save users' time spent in learning image processing software prior to image processing, and improve user experience.
    Type: Grant
    Filed: December 18, 2019
    Date of Patent: September 20, 2022
    Inventors: Tianshi Chen, Shuai Hu, Xiaobing Chen
  • Patent number: 11450095
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for machine learning for video analysis and feedback. In some implementations, a machine learning model is trained to classify videos into performance level classifications based on characteristics of image data and audio data in the videos. Video data captured by a device of a user following a prompt that the device provides to the user is received. A set of feature values that describe audio and video characteristics of the video data are determined. The set of feature values are provided as input to the trained machine learning model to generate output that classifies the video data with respect to the performance level classifications. A user interface of the device is updated based on the performance level classification for the video data.
    Type: Grant
    Filed: January 21, 2022
    Date of Patent: September 20, 2022
    Assignee: Voomer, Inc.
    Inventor: David Wesley Anderton-Yang
  • Patent number: 11445056
    Abstract: Technologies related to telecommunications are described herein, wherein such technologies are configured to assist users with hearing impairments. The technologies described herein cause transcriptions of spoken utterances directed to a telephone in a telephone conversation to be presented on a display of the telephone nearly simultaneously with the spoken utterances being audibly output by the telephone.
    Type: Grant
    Filed: August 8, 2020
    Date of Patent: September 13, 2022
    Assignee: EUGENIOUS ENTERPRISES, LLC
    Inventors: Daniel Yusef Abdelsamed, Michael J. Medley, Matthew G. Good
  • Patent number: 11443741
    Abstract: A natural language processing (NLP) apparatus includes a housing; a built-in voice input interface; a built-in data communication interface configured to establish data communication with multiple types of appliances; a built-in NLP module; and a built-in control device. A first voice input is received through the built-in voice input interface; if the target appliance is a first appliance of a first appliance type, the first voice input is processed using a first NLP model of the built-in NLP module to obtain a first machine command, and the first machine command is sent via the built-in data communication interface to the first appliance; and if the target appliance is a second appliance of a second appliance type, the first voice input is processed using a second NLP model of the built-in NLP module, and the second machine command is sent via the built-in data communication interface to the second appliance.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: September 13, 2022
    Assignee: MIDEA GROUP CO. LTD.
    Inventors: Haibin Huang, Chen Zhang, Xin Liu
  • Patent number: 11442614
    Abstract: A method and workstation for generating a transcript of a conversation between a patient and a healthcare practitioner are disclosed. A workstation is provided with a tool for rendering of an audio recording of the conversation and generating a display of a transcript of the audio recording using a speech-to-text engine, thereby enabling inspection of the accuracy of conversion of speech to text. A tool is provided for scrolling through the transcript and rendering the portion of the audio according to the position of the scrolling. There is a highlighting in the transcript of words or phrases spoken by the patient relating to symptoms, medications or other medically relevant concepts. Additionally, there is provided a set of transcript supplement tools enabling editing of specific portions of the transcript based on the content of the corresponding portion of audio recording.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: September 13, 2022
    Assignee: Google LLC
    Inventors: Melissa Strader, William Ito, Christopher Co, Katherine Chou, Alvin Rajkomar, Rebecca Rolfe
  • Patent number: 11437027
    Abstract: Techniques for handling errors during processing of natural language inputs are described. A system may process a natural language input to generate an ASR hypothesis or NLU hypothesis. The system may use more than one data searching technique (e.g., deep neural network searching, convolutional neural network searching, etc.) to generate an alternate ASR hypothesis or NLU hypothesis, depending on the type of hypothesis input for alternate hypothesis processing.
    Type: Grant
    Filed: December 4, 2019
    Date of Patent: September 6, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Chenlei Guo, Xing Fan, Jin Hock Ong, Kai Wei
  • Patent number: 11436938
    Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for engaging in interactive sessions between computerized devices and participants. The method includes receiving user input that interacts with user controls to specify, for each respective content item in a plurality of content items, a text spelling and an audio recording. The method includes receiving user input that selects a set of user-selected content items from among the plurality of content items for inclusion as part of the first interactive exercise, and assigns an order to the user-selected content items. The computing system presents the selected content items in the user-selected order, receives user input entering the respective content items, and determines whether the user input matches the content item.
    Type: Grant
    Filed: February 11, 2020
    Date of Patent: September 6, 2022
    Assignee: Debby Webby, LLC
    Inventors: Deb Mallin, Marc Dispensa
  • Patent number: 11432045
    Abstract: Disclosed is a display device. According to an embodiment, a display device may include a voice signal receiver, a display, at least one memory storing an application supporting a contents providing service and storing instructions, a communication circuit communicating with at least one external server supporting the contents providing service, and at least one processor. The contents providing service may provide contents files of a first type and contents files of a second type.
    Type: Grant
    Filed: February 19, 2019
    Date of Patent: August 30, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jibum Moon, Gyungchan Seol, Kyerim Lee
  • Patent number: 11429789
    Abstract: Embodiments relate to an intelligent computer platform to identify and evaluate candidate passage response data in natural language form. Natural language processing is applied to analyze a passage against one or more input tokens to identify matching content. A structure representing the analyzed passage is populated with matching input and passage tokens. A first count of matching token entries and a second count of evaluated token entries are determined and qualified by closeness criteria. An alignment of the passage to a candidate question is calculated, including assessing a ratio of the first and second counts as a confidence value. Matching passage data is returned from the passage with the confidence value.
    Type: Grant
    Filed: June 12, 2019
    Date of Patent: August 30, 2022
    Assignee: International Business Machines Corporation
    Inventors: Stephen A. Boxwell, Keith G. Frost, Kyle M. Brake, Stanley J. Vernier
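    The confidence value in this abstract is described as a ratio of matching token entries to evaluated token entries. A minimal sketch of that ratio, under the assumption of simple whitespace tokens and exact matching (the patent's closeness criteria are not reproduced here), is:

```python
# Minimal sketch: alignment confidence as the ratio of question tokens
# found in the passage to question tokens evaluated. Real systems would
# apply closeness criteria (stemming, synonyms, proximity) rather than
# exact set membership.

def alignment_confidence(passage_tokens, question_tokens):
    evaluated = list(question_tokens)          # tokens checked against the passage
    passage_set = set(passage_tokens)
    matches = [t for t in evaluated if t in passage_set]
    return len(matches) / len(evaluated) if evaluated else 0.0

conf = alignment_confidence(
    "the cat sat on the mat".split(),
    "where did the cat sit".split())  # 2 of 5 tokens match -> 0.4
```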
  • Patent number: 11430428
    Abstract: The present disclosure describes a method, apparatus, and storage medium for performing speech recognition. The method includes acquiring, by an apparatus, first to-be-processed speech information. The apparatus includes a memory storing instructions and a processor in communication with the memory. The method includes acquiring, by the apparatus, a first pause duration according to the first to-be-processed speech information; and in response to the first pause duration being greater than or equal to a first threshold, performing, by the apparatus, speech recognition on the first to-be-processed speech information to obtain a first result of sentence segmentation of speech, the first result of sentence segmentation of speech being text information, the first threshold being determined according to speech information corresponding to a previous moment.
    Type: Grant
    Filed: September 10, 2020
    Date of Patent: August 30, 2022
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Lianwu Chen, Jingliang Bai, Min Luo
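    The pause-driven sentence segmentation described here can be sketched as a threshold check on inter-word pauses. This is an assumption-laden simplification: the abstract derives the threshold from the preceding speech, while the sketch below takes it as a fixed parameter, and the `(word, pause)` input format is invented for illustration.

```python
# Hedged sketch of pause-based sentence segmentation: a sentence
# boundary is emitted whenever the pause after a word meets the
# threshold. A fixed threshold stands in for the adaptive one the
# abstract describes.

def segment_by_pause(words, pause_threshold=0.6):
    """words: list of (word, pause_after_seconds). Returns sentences."""
    sentences, current = [], []
    for word, pause in words:
        current.append(word)
        if pause >= pause_threshold:
            sentences.append(" ".join(current))
            current = []
    if current:  # flush any trailing words without a long pause
        sentences.append(" ".join(current))
    return sentences

segment_by_pause([("hello", 0.1), ("there", 0.8),
                  ("how", 0.1), ("are", 0.1), ("you", 1.2)])
# → ["hello there", "how are you"]
```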
  • Patent number: 11432090
    Abstract: An audio system can be configured to generate an audio heatmap for the audio emission potential profiles for one or more speakers, in specific or arbitrary locations. The audio heatmap may be based on speaker location and orientation, speaker acoustic properties, and optionally environmental properties. The audio heatmap often shows areas of low sound density when there are few speakers, and areas of high sound density when there are many speakers. An audio system may be configured to normalize audio signals for a set of speakers that cooperatively emit sound to render an audio object in a defined audio object location. The audio signals for each speaker can be normalized to ensure accurate rendering of the audio object without volume spikes or dropout.
    Type: Grant
    Filed: January 12, 2021
    Date of Patent: August 30, 2022
    Assignee: SPATIALX INC.
    Inventors: Xavier Prospero, Aric Marshall, Michael Plitkins, Calin Pacurariu
  • Patent number: 11423897
    Abstract: Systems and methods are described herein for generating an adaptive response to a user request. Input indicative of a user request may be received and utilized to identify an item in an electronic catalog. Title segments may be identified from the item's title. Significant segments of the user request may be determined. In response to the user request, a shortened title may be generated from the identified title segments and provided as output at the user device (e.g., via audible output provided at a speaker of the user device, via textual output, or the like). At least one of the title segments provided in the shortened title may correlate to the significant segment identified from the user request. In some embodiments, the length and content of the shortened title may vary based at least in part on the contextual intent of the user's request.
    Type: Grant
    Filed: January 30, 2020
    Date of Patent: August 23, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ran Levy, Ori Rozen, Leon Portman, Knaan Ratosh, Ido Arad, Hadar Neumann
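    The adaptive shortening described here can be sketched as keeping only the title segments that overlap the significant words of the user's request. The segmentation of the title, the significance scoring of the request, and the `max_segments` cap are all assumptions made for this illustration.

```python
# Hedged sketch: keep title segments that share a word with the
# significant terms of the user's request, capped at a few segments,
# falling back to the first segment if nothing overlaps.

def shorten_title(title_segments, significant_words, max_segments=2):
    def overlaps(segment):
        return any(w in segment.lower().split() for w in significant_words)
    kept = [s for s in title_segments if overlaps(s)]
    return " ".join(kept[:max_segments]) or title_segments[0]

segments = ["Acme Wireless Earbuds", "Bluetooth 5.0",
            "with Charging Case", "Black"]
short = shorten_title(segments, {"earbuds", "bluetooth"})
```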
  • Patent number: 11423236
    Abstract: A method for identifying phrases in a text document having a similar discourse to a candidate phrase includes separating text in a document file into a plurality of phrases and generating a plurality of embedding vectors in a textual embedding space by inputting the plurality of phrases into an embedding engine. A mapping of each embedding vector in the textual embedding space is generated with each corresponding phrase and a document location of each corresponding phrase in the document file. A candidate phrase is received from a user and a candidate embedding vector is generated using the embedding engine. Similarity scores are computed based on the plurality of embedding space distances between the candidate phrase embedding vector location and each respective location of each embedding vector in the textual embedding space. A listing of phrases with the highest similarity scores is output with respective document locations in the text.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: August 23, 2022
    Assignee: Capital One Services, LLC
    Inventors: Austin Walters, Vincent Pham, Ernest Kwak, Galen Rafferty, Reza Farivar, Jeremy Goodsitt, Anh Truong
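    The similarity search in this abstract can be sketched with toy embedding vectors and cosine similarity. A real system would obtain vectors from a learned embedding engine; the two-dimensional vectors, index layout, and phrase/location pairs below are invented for illustration, and cosine similarity stands in for whatever distance the patented system uses.

```python
# Illustrative sketch: rank indexed phrases by cosine similarity to a
# candidate phrase's embedding vector, returning phrases with their
# document locations.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_similar(candidate_vec, phrase_index, k=2):
    """phrase_index: {(phrase, doc_location): vector}. Returns the k
    (score, phrase, location) triples closest to the candidate."""
    scored = [(cosine(candidate_vec, vec), phrase, loc)
              for (phrase, loc), vec in phrase_index.items()]
    return sorted(scored, reverse=True)[:k]

index = {("net interest income rose", 12): [0.9, 0.1],
         ("the weather was mild", 47): [0.1, 0.9]}
results = top_k_similar([0.8, 0.2], index, k=1)
```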
  • Patent number: 11425315
    Abstract: A video communications method is provided, including: respectively displaying video images of at least two terminals in at least two display subareas of a video communication interface in a video chat session of the at least two terminals; obtaining a first special effect display instruction; and adding a first special effect to the at least two display subareas based on the first special effect display instruction. The method also includes transmitting the first special effect display instruction to a second terminal of the at least two terminals, the second terminal being an action recipient of the first special effect; and selecting, among multiple end special effects, a target end special effect to be added to the video images of the at least two terminals according to a body action that occurred in the video image of the second terminal.
    Type: Grant
    Filed: February 3, 2020
    Date of Patent: August 23, 2022
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Ying Zhu, Chao Wang, Yinglei Liang, Haoqi Kuang, Lin Shi, Jinjie Wang, Weisong Zhu