Speech To Image Patents (Class 704/235)
-
Patent number: 11527167
Abstract: System, apparatus and method for facilitating interactive reading can include an electronic device having a program or application thereon. In one embodiment, the application can recognize one or more cues, combined with an external data source, that result from reading a story aloud and/or performing one or more acts.
Type: Grant
Filed: July 13, 2017
Date of Patent: December 13, 2022
Assignee: The Marketing Store Worldwide, LP
Inventors: Thomas Foreman, Hiren Jakison
-
Patent number: 11527248
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
Type: Grant
Filed: May 27, 2020
Date of Patent: December 13, 2022
Assignee: GOOGLE LLC
Inventors: Brian Strope, Francoise Beaufays, Olivier Siohan
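The flow in this abstract (start several recognizers, stop waiting once one result meets the confidence threshold, abort the rest) can be illustrated with a minimal Python sketch. It is not the patented implementation: `make_fake_recognizer` and `recognize_with_early_abort` are hypothetical names, the recognizers are stand-ins that sleep instead of processing audio, and aborting is modeled with a shared `threading.Event`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_fake_recognizer(text, confidence, delay_s):
    """Build a stand-in recognizer (hypothetical; no real SRS is called)."""
    def recognize(audio, abort):
        # Event.wait returns True if abort was set before the delay elapsed,
        # which models a task that is aborted before it produces a result.
        if abort.wait(timeout=delay_s):
            return None
        return (text, confidence)
    return recognize


def recognize_with_early_abort(recognizers, audio, threshold):
    """Run all recognizers in parallel; stop once one result is confident enough."""
    if not recognizers:
        return None
    abort = threading.Event()
    results = []
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        futures = [pool.submit(r, audio, abort) for r in recognizers]
        for future in as_completed(futures):
            result = future.result()
            if result is None:  # this task was aborted
                continue
            results.append(result)
            if result[1] >= threshold:
                abort.set()  # tell the unfinished tasks to give up
                break
    # Output the most confident of the results that were actually generated.
    return max(results, key=lambda r: r[1]) if results else None
```

If no result reaches the threshold, every task runs to completion and the sketch simply returns the most confident result; a real system might combine results differently.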
-
Patent number: 11526658
Abstract: The growing amount of communication data generated by inmates in controlled environments makes a timely and effective investigation and analysis more and more difficult. The present disclosure provides details of a system and method to investigate and analyze the communication data in a correctional facility timely and effectively. Such a system receives both real time communication data and recorded communication data, processes and investigates the data automatically, and stores the received communication data and processed communication data in a unified data server. Such a system enables a reviewer to review, modify and insert markers and comments for the communication data. Such a system further enables the reviewer to search the communication data and create scheduled search reports.
Type: Grant
Filed: December 7, 2020
Date of Patent: December 13, 2022
Assignee: Global Tel*Link Corporation
Inventor: Stephen Lee Hodge
-
Patent number: 11526667
Abstract: Embodiments of the present systems and methods may provide techniques for augmenting textual data that may be used for textual classification tasks. Embodiments of such techniques may provide the capability to synthesize labeled data to improve text classification tasks. Embodiments may be specifically useful when only a small amount of data is available, and provide improved performance in such cases. For example, in an embodiment, a method implemented in a computer system may comprise a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor, and the method may comprise fine-tuning a language model using a training dataset, synthesizing a plurality of samples using the fine-tuned language model, filtering the plurality of synthesized samples, and generating an augmented training dataset comprising the training dataset and the filtered plurality of synthesized sentences.
Type: Grant
Filed: May 9, 2020
Date of Patent: December 13, 2022
Assignee: International Business Machines Corporation
Inventors: Amir Kantor, Ateret Anaby Tavor, Boaz Carmeli, Esther Goldbraich, George Kour, Segev Shlomov, Naama Tepper, Naama Zwerdling
-
Patent number: 11526323
Abstract: Example apparatus, computer-implemented methods, systems, devices, and computer-readable media facilitate concurrent consumption of media content by multiple users using superimposed animation. Example instructions, when executed, cause at least one processor of an electronic user device to at least cause a display to present media and a visual representation of a remote individual, the visual representation including at least a portion of a human profile; identify a direction of a gaze of a user of the electronic user device based on signals output by one or more sensors of the electronic user device; and determine whether the gaze of the user is directed toward the display of the media or the display of the visual representation based on the direction of the gaze.
Type: Grant
Filed: December 23, 2020
Date of Patent: December 13, 2022
Assignee: Intel Corporation
Inventors: Paul I. Felkai, Annie Harper, Ratko Jagodic, Rajiv K. Mongia, Garth Shoemaker
-
Patent number: 11522730
Abstract: In an approach to customizing meeting notes, a computer receives audio input of a virtual meeting, converts the audio input to text, and displays the text to a plurality of meeting participants. A computer receives highlighted phrases of the text from the plurality of meeting participants and determines a highlighting frequency of each of the highlighted phrases. A computer determines phrases with a highlighting frequency greater than a pre-defined threshold. A computer orders the phrases based on a chronological order of the phrases in the audio input. A computer determines preferences of a first meeting participant associated with a meeting summary. A computer generates a customized summary of the virtual meeting for the first meeting participant of the plurality of meeting participants based on the ordered phrases with a high frequency of highlighting and on the preferences. A computer transmits the customized summary to the first meeting participant.
Type: Grant
Filed: October 5, 2020
Date of Patent: December 6, 2022
Assignee: International Business Machines Corporation
Inventors: Ruchi Asthana, Jennifer A. Mallette, Steven Ware Jones, Nicholas Fong, Vivek Salve
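The counting and ordering steps in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the data shapes (a chronological list of phrases and a mapping from participant to highlighted phrases), and the one-vote-per-participant rule are all hypothetical, and the per-participant preference step is left out.

```python
from collections import Counter


def summarize_highlights(transcript_phrases, highlights_by_participant, threshold):
    """Return frequently highlighted phrases in the order they were spoken.

    transcript_phrases: phrases in chronological order.
    highlights_by_participant: participant name -> iterable of highlighted phrases.
    threshold: a phrase is kept only if its highlight count is greater than this.
    """
    counts = Counter()
    for phrases in highlights_by_participant.values():
        counts.update(set(phrases))  # assumption: one vote per participant per phrase
    frequent = {phrase for phrase, count in counts.items() if count > threshold}
    # Walking the transcript preserves the chronological order of the audio.
    return [phrase for phrase in transcript_phrases if phrase in frequent]
```

A customized summary could then be built by filtering or truncating this list according to each participant's preferences.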
-
Patent number: 11521608
Abstract: Methods and systems for correcting, based on subsequent second speech, an error in an input generated from first speech using automatic speech recognition, without an explicit indication in the second speech that a user intended to correct the input with the second speech, include determining that a time difference between when search results in response to the input were displayed and when the second speech was received is less than a threshold time, and based on the determination, correcting the input based on the second speech. The methods and systems also include determining that a difference in acceleration of a user input device, used to input the first speech and second speech, between when the search results in response to the input were displayed and when the second speech was received is less than a threshold acceleration, and based on the determination, correcting the input based on the second speech.
Type: Grant
Filed: May 24, 2017
Date of Patent: December 6, 2022
Assignee: Rovi Guides, Inc.
Inventor: Arun Sreedhara
-
Patent number: 11520471
Abstract: The illustrative embodiments described herein provide systems and methods for notifying a user when a set of characters are identified in a media file. In one embodiment, a method includes receiving a set of characters inputted by the user of a computing device, playing the media file, transcribing the media file to form a transcription, and determining whether the transcription of the media file includes the set of characters. The method also includes initiating a notification prompt on a graphical user interface of the computing device in response to determining that the media file includes the set of characters.
Type: Grant
Filed: October 31, 2021
Date of Patent: December 6, 2022
Assignee: GOOGLE LLC
Inventor: Margarita Khafizova
-
Patent number: 11514914
Abstract: Systems and methods for an intelligent virtual assistant for meetings are disclosed. In one embodiment, a system for an intelligent virtual assistant for meeting may include a server comprising at least one computer processor executing a virtual assistant computer program; a communication server in communication with the server; and a plurality of communication devices in communication with the server and the communication server, wherein the communication server facilitates an electronic meeting with a plurality of attendees via the plurality of communication devices. The virtual assistant may receive at least an audio feed and a video feed of the electronic meeting in real-time, may transcribe the audio feed using a speech-recognition algorithm, may provide the transcription to at least one of the plurality of attendees, may receive an edited transcription, and may update the speech recognition algorithm based on the edited transcription.
Type: Grant
Filed: February 5, 2020
Date of Patent: November 29, 2022
Assignee: JPMORGAN CHASE BANK, N.A.
Inventors: Daniel D McQuiston, Aarti Narayanan, Dave Burrells, Simon Burke, Jan S Dabrowski, Rhys Dawes, Charlotte Knight, Libby Kent, Sandeep Koul, Uday Pant, Tony M Nazarowski, Aditi Vaidya, Ayush Kumar Bilala, Charanjith Allaparambil Chandran, Prayag Godha, Nikhil Kotikanikadanam Madhusudhan, Chitra Pillai Sundaribai, Aditya Anil Upadhyay, Eric Han Kai Chang, Stefan Cristian Bardasu, Erin Michelle Perry, Saifuddin Merchant, James P White, III
-
Patent number: 11515020
Abstract: A method, computer program product, and computing system for: receiving an initial portion of an encounter record; processing the initial portion of the encounter record to generate initial content for a medical report; receiving one or more additional portions of the encounter record; and processing the one or more additional portions of the encounter record to modify the medical report.
Type: Grant
Filed: March 5, 2019
Date of Patent: November 29, 2022
Assignee: Nuance Communications, Inc.
Inventors: Paul Joseph Vozila, Joel Praveen Pinto, Kumar Abhinav, Haibo Li, Marilisa Amoia, Frank Diehl
-
Patent number: 11516518
Abstract: A method comprises receiving from each of a plurality of commentator applications respective distinct streams of media content comprising commentary information, combining at least portions of selected ones of the distinct streams of media content comprising commentary information in a mixer associated with a media server to generate a composite media content stream, and providing the composite media content stream generated by the mixer to one or more servers of a content delivery network for delivery to one or more viewer devices. The commentary information of a given one of the distinct streams of media content received from a corresponding one of the commentator applications illustratively comprises at least one of audio content, video content, image content, social media posting content, chat text and closed caption text. The mixer may comprise a post-mixer coupled to the media server.
Type: Grant
Filed: June 18, 2021
Date of Patent: November 29, 2022
Assignee: Kiswe Mobile Inc.
Inventors: Bert De Decker, Tom Cuypers, Wim Sweldens, Francis X. Zane, Thomas J. Janiszewski, Yung-Lung Ho
-
Patent number: 11507345
Abstract: Systems and methods to accept speech input and edit a note upon receipt of an indication to edit are disclosed. Exemplary implementations may: effectuate presentation of a graphical user interface that includes a note, the note including note sections, the note sections including a first note section, the individual note sections including body fields; obtain user input from the client computing platform, the user input representing an indication to edit a first body field of the first note section; obtain audio information representing sound captured by an audio section of the client computing platform, the audio information including value definition information specifying one or more values to be included in the individual body fields; perform speech recognition on the audio information to obtain a first value; and populate the first body field with the first value so that the first value is included in the first body field.
Type: Grant
Filed: September 23, 2020
Date of Patent: November 22, 2022
Assignee: SuKI AI, Inc.
Inventor: Matt Pallakoff
-
Patent number: 11508106
Abstract: A disclosure includes: a moving image acquisition unit configured to acquire moving image data obtained through moving image capturing of at least a mouth part of an utterer; a lip detection unit configured to detect a lip part from the moving image data and detect motion of the lip part; a moving image processing unit configured to generate a moving image enhanced to increase the motion of the lip part detected by the lip detection unit; and a display control unit configured to control a display panel to display the moving image generated by the moving image processing unit.
Type: Grant
Filed: April 8, 2020
Date of Patent: November 22, 2022
Assignee: JVCKENWOOD Corporation
Inventor: Takuji Teruuchi
-
Patent number: 11507253
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing contextual information for a displayed resource that includes an image. In one aspect, a method includes providing, by a user device to a contextual information server, a query-independent request for contextual information relevant to an active resource displayed in an application environment on the user device, wherein the request specifies content of the active resource and further specifies that the active resource displayed on the user device includes an image, but does not include the image in the request, receiving a request for the image from the contextual information server, providing the image to the contextual information server, receiving a user interface element that includes contextual information regarding the image, and displaying the user interface element on the user device with the active resource.
Type: Grant
Filed: October 5, 2020
Date of Patent: November 22, 2022
Assignee: GOOGLE LLC
Inventors: Joao Paulo Pagaime da Silva, Vikram Aggarwal
-
Patent number: 11507346
Abstract: A method for text feedback includes: receiving, by a controller, an utterance from a user; determining, by an automatic speech recognition engine of the controller, a plurality of speech recognition results based on the utterance from the user, wherein the speech recognition results include probable commands; determining, by the automatic speech recognition engine of the controller, a plurality of confidence scores for each of the plurality of speech recognition results; determining, by the controller, a text characteristic for each of the plurality of probable commands as a function of the confidence scores for each of the plurality of speech recognition results; and commanding, by the controller, a display to show text corresponding to each of the plurality of probable commands with the text characteristic determined by the controller.
Type: Grant
Filed: October 25, 2021
Date of Patent: November 22, 2022
Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC
Inventors: Xu Fang Zhao, Gaurav Talwar, Alaa M. Khamis
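One way to picture "a text characteristic as a function of the confidence scores" is to scale font size with confidence. The sketch below assumes that font size is the characteristic and that the mapping is linear; the abstract specifies neither, and `font_sizes` and its parameters are hypothetical.

```python
def font_sizes(hypotheses, min_pt=12.0, max_pt=24.0):
    """Map each (command, confidence) pair to a font size in points.

    Confidence is clamped to [0, 1] and interpolated linearly between
    min_pt and max_pt, so more probable commands are drawn larger.
    """
    styled = []
    for command, confidence in hypotheses:
        clamped = min(max(confidence, 0.0), 1.0)
        styled.append((command, min_pt + (max_pt - min_pt) * clamped))
    return styled
```

The same shape of function would work for other characteristics, such as opacity or font weight.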
-
Patent number: 11507759
Abstract: A speech translation device, for conversation between a first speaker making an utterance in a first language and a second speaker making an utterance in a second language different from the first language, includes: a speech detector that detects, from sounds that are input, a speech segment in which the first speaker or the second speaker made an utterance; a display that, after speech recognition is performed on the utterance, displays a translation result obtained by translating the utterance from the first language to the second language or from the second language to the first language; and an utterance instructor that outputs, in the second language via the display, a message prompting the second speaker to make an utterance after a first speaker's utterance or outputs, in the first language via the display, a message prompting the first speaker to make an utterance after a second speaker's utterance.
Type: Grant
Filed: March 19, 2020
Date of Patent: November 22, 2022
Assignee: PANASONIC HOLDINGS CORPORATION
Inventors: Hiroki Furukawa, Atsushi Sakaguchi, Tsuyoki Nishikawa
-
Patent number: 11501757
Abstract: Disclosed herein is an artificial intelligence apparatus including an input interface configured to receive speech data, and a processor configured to detect a non-utterance interval included in the speech data and determine presence/absence of a second utterance after the non-utterance interval according to characteristics of a first utterance before the non-utterance interval, when the non-utterance interval exceeds a set time.
Type: Grant
Filed: December 17, 2019
Date of Patent: November 15, 2022
Assignee: LG ELECTRONICS INC.
Inventor: Hansuk Shim
-
Patent number: 11501074
Abstract: Methods, systems, and computing devices for visualizing natural language processing algorithm processes are described herein. A plurality of categories may be determined. Each color of a plurality of colors may correspond to the categories. Text content may be processed using a natural language processing algorithm. Confidence values indicating, for each of a plurality of portions of the text content, a degree of confidence corresponding to one or more of the plurality of categories may be determined. Display colors may be determined based on the confidence values. A user interface comprising a visualization of the text content may be displayed, and the user interface may be configured to show each portion of the text content using a display color such that the user interface indicates changes in confidence across the plurality of characters.
Type: Grant
Filed: August 27, 2020
Date of Patent: November 15, 2022
Assignee: Capital One Services, LLC
Inventors: Jeremy Goodsitt, Austin Walters, Anh Truong
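A rough sketch of how display colors might be derived from confidence values: each category gets a base color, and lower confidence fades that color toward white. The category names, the RGB values, and the fade-to-white rule are assumptions made for illustration; the abstract does not state how colors are computed.

```python
# Hypothetical category-to-color table (RGB, 0-255 per channel).
CATEGORY_COLORS = {"positive": (0, 128, 0), "negative": (200, 0, 0)}


def display_color(category, confidence, colors=CATEGORY_COLORS):
    """Blend from white (confidence 0) to the category color (confidence 1)."""
    base = colors[category]
    clamped = min(max(confidence, 0.0), 1.0)
    return tuple(round(255 + (channel - 255) * clamped) for channel in base)


def colorize(portions):
    """portions: iterable of (text, category, confidence) -> list of (text, rgb)."""
    return [(text, display_color(category, confidence))
            for text, category, confidence in portions]
```

Rendering each portion of text with its computed color makes changes in confidence visible across the text.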
-
Patent number: 11501091
Abstract: A real-time speech-to-speech generator and sign gestures converter system is disclosed. The system is still challenging for deaf or hearing impaired people. Embodiments of the invention provide direct speech to speech translation system and further conversion to sign gestures is disclosed. Direct speech to speech translation and further sign gesture conversion uses a one-tier approach, creating a unified-model for whole application. The single-model ecosystem takes in audio (MEL spectrogram) as an input and gives out audio (MEL spectrogram) as an output to a speech-sign converter device with a display. This solves the bottleneck problem by converting the translated speech directly to sign language gesture from first language with emotion by preserving phonetic information along the way. This model needs parallel audio samples in two languages.
Type: Grant
Filed: June 11, 2022
Date of Patent: November 15, 2022
Inventor: Sandeep Dhawan
-
Patent number: 11501780
Abstract: Devices, systems, and methods for automatic real-time moderation of meetings, by a computerized or automated moderation unit able to manage, steer and guide the meeting in real-time and able to selectively generate and convey real-time differential notifications and advice to particular participants. A Meeting Moderator Bot monitors audio conversations in a meeting, and analyzes their textual equivalent; detects topics that were skipped or that should be discussed, and notifies participants; detects double-talk or interferences and generates warnings accordingly; detects absence of participants that are relevant to particular topics; detects that the conversation should shift to another topic on the agenda; generates other meeting steering notifications; and monitors compliance of the meeting participants with such steering notifications.
Type: Grant
Filed: May 19, 2020
Date of Patent: November 15, 2022
Assignee: AUDIOCODES LTD.
Inventors: Shabtai Adlersberg, Menachem Honig, Tatiana Adar
-
Patent number: 11503401
Abstract: A dual-zone automotive multimedia system may include a first infotainment device associated with a front zone of a vehicle, at least one second infotainment device associated with a rear zone of a vehicle, wherein the at least one second infotainment device includes a directional loudspeaker arranged facing the rear zone of the vehicle, and a processor programmed to transmit audio signals to the first and second infotainment devices to create sound at each of the front and rear zones, wherein the audio signal transmitted to the directional loudspeaker relates to playback at the rear zone.
Type: Grant
Filed: February 19, 2021
Date of Patent: November 15, 2022
Assignee: Harman International Industries, Incorporated
Inventors: Riley Winton, Christopher Ludwig, Christopher Michael Trestain, Maxwell Boone Willis
-
Patent number: 11496623
Abstract: A telecommunications network for playing an enhanced announcement in the same format as that of an enhanced call is described herein. An enhanced call is a call via real time text or video. The telecommunications network includes a node or subsystem, such as an IP multimedia subsystem core (IMS or IMS core), programmed to receive an enhanced call in a text or video format, detect the format of the enhanced call, and return an enhanced announcement, an announcement provided in the same format as the enhanced call. The IMS core can include one or more sub-nodes or sub-components, including a telephony application server (TAS), a media resource function (MRF), or both.
Type: Grant
Filed: July 2, 2021
Date of Patent: November 8, 2022
Assignee: T-MOBILE USA, INC.
Inventor: Tianmin Ding
-
Patent number: 11495206
Abstract: Voice synthesis method and apparatus generate second control data using an intermediate trained model with first input data including first control data designating phonetic identifiers, change the second control data in accordance with a first user instruction provided by a user, generate synthesis data representing frequency characteristics of a voice to be synthesized using a final trained model with final input data including the first control data and the changed second control data, and generate a voice signal based on the generated synthesis data.
Type: Grant
Filed: May 28, 2020
Date of Patent: November 8, 2022
Assignee: YAMAHA CORPORATION
Inventor: Ryunosuke Daido
-
Patent number: 11488604
Abstract: A method may include obtaining first features of first audio data that includes speech and obtaining second features of second audio data that is a revoicing of the first audio data. The method may further include providing the first features and the second features to an automatic speech recognition system and obtaining a single transcription generated by the automatic speech recognition system using the first features and the second features.
Type: Grant
Filed: August 19, 2020
Date of Patent: November 1, 2022
Assignee: Sorenson IP Holdings, LLC
Inventor: David Thomson
-
Patent number: 11482214
Abstract: Techniques for speech-to-text hypothesis generation and hypothesis selection are described. A text input representing at least part of a voice recording is received from a speech-to-text component. A first text alternative is generated using a finite state transducer based at least in part on the text input. A hypothesis from a hypothesis set is selected using a language model that includes probabilities for sequences of words, the hypothesis set including the text input and the first text alternative. A selected hypothesis text associated with the selected hypothesis is sent to a search engine.
Type: Grant
Filed: December 12, 2019
Date of Patent: October 25, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Natali Arieli, Eran Fainman, Yochai Zvik, Yaniv Ben-Yehuda
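The generate-then-select pattern in this abstract can be sketched as follows. Two simplifications are swapped in and should not be read as the patented method: a plain word-substitution table stands in for the finite state transducer, and a toy bigram table of log-probabilities stands in for the language model. All function names and data are hypothetical.

```python
def generate_alternatives(text, substitutions):
    """Produce alternatives by replacing one word at a time (FST stand-in)."""
    words = text.split()
    alternatives = []
    for i, word in enumerate(words):
        for replacement in substitutions.get(word, []):
            alternatives.append(" ".join(words[:i] + [replacement] + words[i + 1:]))
    return alternatives


def sentence_logprob(text, bigram_logprob, unseen=-10.0):
    """Sum bigram log-probabilities; unseen word pairs get a fixed penalty."""
    words = ["<s>"] + text.split()
    return sum(bigram_logprob.get(pair, unseen) for pair in zip(words, words[1:]))


def select_hypothesis(text, substitutions, bigram_logprob):
    """Score the text input and its alternatives; return the most probable one."""
    hypothesis_set = [text] + generate_alternatives(text, substitutions)
    return max(hypothesis_set, key=lambda h: sentence_logprob(h, bigram_logprob))
```

The selected hypothesis text would then be what gets sent on to the search engine.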
-
Patent number: 11483273
Abstract: Chat-based interaction with an in-meeting virtual assistant may be provided. First, audio input associated with a meeting may be received. Next, an intent from the audio input may be detected. Text content associated with the audio input may then be generated in response to detecting the intent from the audio input. The text content may be displayed in a chat interface.
Type: Grant
Filed: July 14, 2020
Date of Patent: October 25, 2022
Assignee: Cisco Technology, Inc.
Inventors: Mohamed Gamal Mohamed Mahmoud, Omar Tarek El-Sadany
-
Patent number: 11474780
Abstract: An electronic device includes a communication circuit, a display, a microphone, a processor operatively connected to the communication circuit, the display, and the microphone, and a memory operatively connected to the processor, wherein the memory is configured to store instructions which, when executed, cause the processor to control the electronic device to: transmit information related to a predetermined event to a server through the communication circuit in response to detection of the predetermined event through an application, display a user interface through the display in response to reception of information related to the user interface including at least one visual object selectable by a user to control a function of the application through the communication circuit, receive a user-uttered input for selecting one of the at least one visual object included in the user interface through the microphone, and transmit information related to the user-uttered input to the server through the communication
Type: Grant
Filed: February 14, 2020
Date of Patent: October 18, 2022
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jaeyoung Yang, Donghee Suh, Hojun Jaygarl, Minsung Kim, Jinwoong Kim, Youngbin Kim, Kwangbin Lee, Youngmin Lee
-
Patent number: 11475887
Abstract: An electronic device receives audio data for a media item. The electronic device generates, from the audio data, a plurality of samples, each sample having a predefined maximum length. The electronic device, using a neural network trained to predict textual unit probabilities, generates a probability matrix of textual units for a first portion of a first sample of the plurality of samples. The probability matrix includes information about textual units, timing information, and respective probabilities of respective textual units at respective times. The electronic device identifies, for the first portion of the first sample, a first sequence of textual units based on the generated probability matrix.
Type: Grant
Filed: November 21, 2019
Date of Patent: October 18, 2022
Assignee: Spotify AB
Inventors: Daniel Stoller, Simon René Georges Durand, Sebastian Ewert
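To show how a sequence of textual units can be read off a probability matrix, here is a greedy decoder in the style of CTC (connectionist temporal classification) decoding: take the most probable unit per time step, collapse consecutive repeats, and drop a blank symbol. The abstract does not say the patent uses CTC or a blank symbol, so treat the technique, the `greedy_decode` name, and the blank unit `"_"` as assumptions.

```python
def greedy_decode(prob_matrix, units, blank="_"):
    """Decode a time-by-unit probability matrix into a string.

    prob_matrix: list of rows, one per time step; row[i] is the probability
                 of units[i] at that time step.
    units: list of textual units (here single characters), including the blank.
    """
    # Index of the most probable unit at each time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in prob_matrix]
    decoded = []
    previous = None
    for index in best:
        # Collapse consecutive repeats and skip the blank symbol.
        if index != previous and units[index] != blank:
            decoded.append(units[index])
        previous = index
    return "".join(decoded)
```

Because each row corresponds to a time step, keeping the row indices of emitted units would also yield rough timing for each unit.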
-
Patent number: 11470404
Abstract: An example method performed by a processing system includes retrieving a digital model of a media element from a database storing a plurality of media elements, wherein the media element is to be inserted into a scene of an audiovisual media, rendering the media element in the scene of the audiovisual media, based on the digital model of the media element and on metadata associated with the digital model to produce a rendered media element, wherein the metadata describes a characteristic of the media element and a limit on the characteristic, and inserting the rendered media element into the scene of the audiovisual media.
Type: Grant
Filed: May 26, 2020
Date of Patent: October 11, 2022
Assignees: AT&T Intellectual Property I, L.P., AT&T Mobility II LLC
Inventors: John Oetting, Eric Zavesky, James Pratt, Jason Decuir, Terrel Lecesne
-
Patent number: 11468867
Abstract: A system and method for providing acoustic output is disclosed, the system comprising a communication device, a processor coupled to the communication device, and a memory coupled to the processor. The processor receives multimedia data associated with a multimedia output stream, extracts audio data based on the multimedia data, and generates a rhythmic data set including time-series acoustic characteristic data based on the extracted audio data. A sequence of visual elements is generated based on the time-series acoustic characteristic data and associated with the respective visual elements in the sequence of visual elements with the multimedia data. The multimedia data for visually displaying the acoustic characteristic data concurrently with the multimedia stream is transmitted to a multimedia output device.
Type: Grant
Filed: March 25, 2020
Date of Patent: October 11, 2022
Assignee: COMMUNOTE INC.
Inventor: Kemal S. Ahmed
-
Patent number: 11468123
Abstract: Disclosed is an electronic apparatus providing a reply to a query of a user. The electronic apparatus includes a microphone, a camera, a memory configured to store at least one instruction, and at least one processor, and the processor is configured to execute the at least one instruction to control the electronic apparatus to: identify a region of interest corresponding to a co-reference in an image acquired through the camera based on a co-reference being included in the query, identify an object referred to by the co-reference among at least one object included in the identified region of interest based on a dialogue content that includes the query, and provide information on the identified object as the reply.
Type: Grant
Filed: July 29, 2020
Date of Patent: October 11, 2022
Assignee: Samsung Electronics Co., Ltd.
Inventors: Kangwook Lee, Jaewon Kim, Jiin Nam, Huiwon Yun, Hojin Jung, Kunal Chawla, Akhil Kedia
-
Patent number: 11470327
Abstract: Scene aware video content encoding techniques can determine if video content is a given content type and is one of one or more given titles that include one or more given scenes. The one or more given scenes of the video content of the given type and given one of the titles can be encoded using corresponding scenes specific encoding parameter values, and the non-given scenes can be encoded using one or more general encoding parameter values. The one or more given titles can be selected based on a rate of streaming of various video content titles of the given type.
Type: Grant
Filed: March 30, 2020
Date of Patent: October 11, 2022
Assignee: Alibaba Group Holding Limited
Inventors: Tae Meon Bae, Minghai Qin, Guanlin Wu, Yen-kuang Chen, Qinggang Zhou, Shaolin Xie
-
Patent number: 11462216
Abstract: A method for selecting a speech recognition result on a computing device includes receiving a first speech recognition result determined by the computing device, receiving first features, at least some of the features being determined using the first speech recognition result, determining whether to select the first speech recognition result or to wait for a second speech recognition result determined by a cloud computing service based at least in part on the first speech recognition result and the first features.
Type: Grant
Filed: March 26, 2020
Date of Patent: October 4, 2022
Assignee: Cerence Operating Company
Inventor: Min Tang
-
Patent number: 11461779
Abstract: Techniques for transferring control of a system-user dialog session are described. A first speechlet component may interact with a user until the first speechlet component receives user input that the first speechlet component cannot handle. The first speechlet component may output an action representing the user input. A system may determine a second speechlet component configured to execute the action. The system may send the second speechlet component a navigator object that results in the second speechlet component handling the user interaction that the first speechlet component could not handle. Once the second speechlet component is finished processing, the second speechlet component may output an updated navigator object, which causes the first speechlet component to either further interact with a user or cause a current dialog session to be closed.
Type: Grant
Filed: March 23, 2018
Date of Patent: October 4, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Rohin Dabas, Troy Dean Schuring, Xu Zhang, Maksym Kolodeznyi, Andres Felipe Borja Jaramillo, Nnenna Eleanya Okwara, Alberto Milan Gutierrez, Rashmi Tonge
-
Patent number: 11455151
Abstract: Disclosed herein is a software technology for facilitating an interactive conversational session between a user and a digital conversational character. For instance, in one aspect, the disclosed process may involve two primary phases: (1) an authoring phase that involves a first user accessing a content authoring tool to create a given type of visual conversation application that facilitates interactions between a second user and a digital conversational character in an interactive conversational session, and (2) a rendering phase that involves the second user accessing the created visual conversation application to interact with the digital conversational character in an interactive conversational session. In one implementation, accessing the created visual conversation application may involve detecting an object and identifying information associated with the detected object.
Type: Grant
Filed: October 11, 2019
Date of Patent: September 27, 2022
Assignee: HIA Technologies Inc.
Inventors: Vacit Arat, Richard Cardran, Rick King
-
Patent number: 11450334Abstract: To filter unwanted sounds from a conference call, a first voice signal is captured by a first device during a conference call and converted into corresponding text, which is then analyzed to determine that a first portion of the text was spoken by a first user and a second portion of the text was spoken by a second user. If the first user is relevant to the conference call while the second user is not, the first voice signal is prevented from being transmitted into the conference call, the first portion of text is converted into a second voice signal using a voice profile of the first user to synthesize the voice of the first user, and the second voice signal is then transmitted into the conference call. The second portion of text is not converted into a voice signal, as the second user is determined not to be relevant.Type: GrantFiled: September 9, 2020Date of Patent: September 20, 2022Assignee: Rovi Guides, Inc.Inventors: Rajendran Pichaimurthy, Madhusudhan Seetharam
-
Patent number: 11450319Abstract: The present disclosure discloses an image processing device including: a receiving module configured to receive a voice signal and an image to be processed; a conversion module configured to convert the voice signal into an image processing instruction and determine a target area according to a target voice instruction conversion model, in which the target area is a processing area of the image to be processed; and a processing module configured to process the target area according to the image processing instruction and a target image processing model. The examples may realize the functionality of using voice commands to control image processing, which may save users' time spent in learning image processing software prior to image processing, and improve user experience.Type: GrantFiled: December 18, 2019Date of Patent: September 20, 2022Inventors: Tianshi Chen, Shuai Hu, Xiaobing Chen
-
Patent number: 11450095Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for machine learning for video analysis and feedback. In some implementations, a machine learning model is trained to classify videos into performance level classifications based on characteristics of image data and audio data in the videos. Video data captured by a device of a user following a prompt that the device provides to the user is received. A set of feature values that describe audio and video characteristics of the video data are determined. The set of feature values are provided as input to the trained machine learning model to generate output that classifies the video data with respect to the performance level classifications. A user interface of the device is updated based on the performance level classification for the video data.Type: GrantFiled: January 21, 2022Date of Patent: September 20, 2022Assignee: Voomer, Inc.Inventor: David Wesley Anderton-Yang
-
Patent number: 11445056Abstract: Technologies related to telecommunications are described herein, wherein such technologies are configured to assist users with hearing impairments. The technologies described herein cause transcriptions of spoken utterances directed to a telephone in a telephone conversation to be presented on a display of the telephone nearly simultaneously with the spoken utterances being audibly output by the telephone.Type: GrantFiled: August 8, 2020Date of Patent: September 13, 2022Assignee: EUGENIOUS ENTERPRISES, LLCInventors: Daniel Yusef Abdelsamed, Michael J. Medley, Matthew G. Good
-
Patent number: 11443741Abstract: A natural language processing (NLP) apparatus includes a housing, a built-in voice input interface, a built-in data communication interface configured to establish data communication with multiple types of appliances, a built-in NLP module, and a built-in control device. A first voice input is received through the built-in voice input interface; if the target appliance is a first appliance of a first appliance type, the first voice input is processed using a first NLP model of the built-in NLP module to obtain a first machine command, and the first machine command is sent via the built-in data communication interface to the first appliance; and if the target appliance is a second appliance of a second appliance type, the first voice input is processed using a second NLP model of the built-in NLP module to obtain a second machine command, and the second machine command is sent via the built-in data communication interface to the second appliance.Type: GrantFiled: April 9, 2020Date of Patent: September 13, 2022Assignee: MIDEA GROUP CO. LTD.Inventors: Haibin Huang, Chen Zhang, Xin Liu
-
Patent number: 11442614Abstract: A method and workstation for generating a transcript of a conversation between a patient and a healthcare practitioner is disclosed. A workstation is provided with a tool for rendering of an audio recording of the conversation and generating a display of a transcript of the audio recording using a speech-to-text engine, thereby enabling inspection of the accuracy of conversion of speech to text. A tool is provided for scrolling through the transcript and rendering the portion of the audio according to the position of the scrolling. There is a highlighting in the transcript of words or phrases spoken by the patient relating to symptoms, medications or other medically relevant concepts. Additionally, there is provided a set of transcript supplement tools enabling editing of specific portions of the transcript based on the content of the corresponding portion of audio recording.Type: GrantFiled: March 29, 2021Date of Patent: September 13, 2022Assignee: Google LLCInventors: Melissa Strader, William Ito, Christopher Co, Katherine Chou, Alvin Rajkomar, Rebecca Rolfe
-
Patent number: 11437027Abstract: Techniques for handling errors during processing of natural language inputs are described. A system may process a natural language input to generate an ASR hypothesis or NLU hypothesis. The system may use more than one data searching technique (e.g., deep neural network searching, convolutional neural network searching, etc.) to generate an alternate ASR hypothesis or NLU hypothesis, depending on the type of hypothesis input for alternate hypothesis processing.Type: GrantFiled: December 4, 2019Date of Patent: September 6, 2022Assignee: Amazon Technologies, Inc.Inventors: Chenlei Guo, Xing Fan, Jin Hock Ong, Kai Wei
-
Patent number: 11436938Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for engaging in interactive sessions between computerized devices and participants. The method includes receiving user input that interacts with user controls to specify, for each respective content item in a plurality of content items, a text spelling and an audio recording. The method includes receiving user input that selects a set of user-selected content items from among the plurality of content items for inclusion as part of the first interactive exercise, and assigns an order to the user-selected content items. The computing system presents the selected content items in the user-selected order, receives user input that inputs the respective content items, and determines whether the user input matches the content item.Type: GrantFiled: February 11, 2020Date of Patent: September 6, 2022Assignee: Debby Webby, LLCInventors: Deb Mallin, Marc Dispensa
-
Patent number: 11432045Abstract: Disclosed is a display device. According to an embodiment, a display device may include a voice signal receiver, a display, at least one memory storing an application supporting a contents providing service and storing instructions, a communication circuit communicating with at least one external server supporting the contents providing service, and at least one processor. The contents providing service may provide contents files of a first type and contents files of a second type.Type: GrantFiled: February 19, 2019Date of Patent: August 30, 2022Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventors: Jibum Moon, Gyungchan Seol, Kyerim Lee
-
Patent number: 11429789Abstract: Embodiments relate to an intelligent computer platform to identify and evaluate candidate passage response data in natural language form. Natural language processing is applied to analyze a passage against one or more input tokens to identify matching content. A structure representing the analyzed passage is populated with matching input and passage tokens. A first count of matching token entries and a second count of evaluated token entries are determined and qualified by closeness criteria. An alignment of the passage to a candidate question is calculated, including assessing a ratio of the first and second counts as a confidence value. Matching passage data is returned from the passage with the confidence value.Type: GrantFiled: June 12, 2019Date of Patent: August 30, 2022Assignee: International Business Machines CorporationInventors: Stephen A. Boxwell, Keith G. Frost, Kyle M. Brake, Stanley J. Vernier
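The confidence calculation described in this abstract can be illustrated with a minimal sketch. It assumes whitespace-level tokens, a simple positional closeness rule (`max_gap`), and a hypothetical function name `alignment_confidence`; none of these come from the patent, which does not specify its data structure or closeness criteria here.

```python
def alignment_confidence(question_tokens, passage_tokens, max_gap=3):
    """Return (matched, evaluated, confidence) for a question against a passage.

    Illustrative only: a match counts when the question token occurs in the
    passage within `max_gap` positions of the previous match (an assumed
    closeness criterion).
    """
    # Index passage tokens by their positions.
    positions = {}
    for i, tok in enumerate(passage_tokens):
        positions.setdefault(tok.lower(), []).append(i)

    matched = 0      # first count: matching token entries
    evaluated = 0    # second count: evaluated token entries
    last_pos = None
    for tok in question_tokens:
        evaluated += 1
        hits = positions.get(tok.lower(), [])
        if not hits:
            continue
        if last_pos is None:
            pos = hits[0]
        else:
            close = [p for p in hits if abs(p - last_pos) <= max_gap]
            if not close:
                continue  # present in the passage, but too far from the last match
            pos = close[0]
        matched += 1
        last_pos = pos

    # Confidence is the ratio of the two counts.
    confidence = matched / evaluated if evaluated else 0.0
    return matched, evaluated, confidence


question = ["who", "wrote", "hamlet"]
passage = ["Shakespeare", "wrote", "Hamlet", "in", "1600"]
matched, evaluated, confidence = alignment_confidence(question, passage)
```

Here two of the three question tokens align with the passage, so the ratio is 2/3.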
-
Patent number: 11430428Abstract: The present disclosure describes a method, apparatus, and storage medium for performing speech recognition. The method includes acquiring, by an apparatus, first to-be-processed speech information. The apparatus includes a memory storing instructions and a processor in communication with the memory. The method includes acquiring, by the apparatus, a first pause duration according to the first to-be-processed speech information; and in response to the first pause duration being greater than or equal to a first threshold, performing, by the apparatus, speech recognition on the first to-be-processed speech information to obtain a first result of sentence segmentation of speech, the first result of sentence segmentation of speech being text information, the first threshold being determined according to speech information corresponding to a previous moment.Type: GrantFiled: September 10, 2020Date of Patent: August 30, 2022Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITEDInventors: Lianwu Chen, Jingliang Bai, Min Luo
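The pause-based segmentation described in this abstract can be sketched as follows. This is a simplified illustration, not the patented method: already-recognized words paired with the pause that follows them stand in for the audio and the recognition step, and the rule for updating the threshold from the previous segment (a shorter threshold after a long segment) is an assumption, as are all names and default values.

```python
def segment_by_pause(timed_words, base_threshold=0.5,
                     short_threshold=0.3, long_segment_words=5):
    """Split (word, pause_after_seconds) pairs into sentence-like segments.

    A segment is closed when the pause after a word is greater than or equal
    to the current threshold. The threshold for the next segment is then
    chosen from the segment just closed (an assumed adaptation rule).
    """
    segments = []
    current = []
    threshold = base_threshold
    for word, pause in timed_words:
        current.append(word)
        if pause >= threshold:
            segments.append(" ".join(current))
            # Adapt the threshold using speech information from the previous moment.
            if len(current) >= long_segment_words:
                threshold = short_threshold
            else:
                threshold = base_threshold
            current = []
    if current:
        # Flush trailing words that were not followed by a long enough pause.
        segments.append(" ".join(current))
    return segments


timed_words = [("hello", 0.1), ("there", 0.6),
               ("how", 0.1), ("are", 0.4), ("you", 0.7)]
segments = segment_by_pause(timed_words)
```

With these inputs the 0.4 s pause after "are" stays below the 0.5 s threshold, so only the pauses after "there" and "you" close a segment.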
-
Patent number: 11432090Abstract: An audio system can be configured to generate an audio heatmap for the audio emission potential profiles for one or more speakers, in specific or arbitrary locations. The audio heatmap may be based on speaker location and orientation, speaker acoustic properties, and optionally environmental properties. The audio heatmap often shows areas of low sound density when there are few speakers, and areas of high sound density when there are many speakers. An audio system may be configured to normalize audio signals for a set of speakers that cooperatively emit sound to render an audio object in a defined audio object location. The audio signals for each speaker can be normalized to ensure accurate rendering of the audio object without volume spikes or dropout.Type: GrantFiled: January 12, 2021Date of Patent: August 30, 2022Assignee: SPATIALX INC.Inventors: Xavier Prospero, Aric Marshall, Michael Plitkins, Calin Pacurariu
-
Patent number: 11423897Abstract: Systems and methods are described herein for generating an adaptive response to a user request. Input indicative of a user request may be received and utilized to identify an item in an electronic catalog. Title segments may be identified from the item's title. Significant segments of the user request may be determined. In response to the user request, a shortened title may be generated from the identified title segments and provided as output at the user device (e.g., via audible output provided at a speaker of the user device, via textual output, or the like). At least one of the title segments provided in the shortened title may correlate to the significant segment identified from the user request. In some embodiments, the length and content of the shortened title may vary based at least in part on the contextual intent of the user's request.Type: GrantFiled: January 30, 2020Date of Patent: August 23, 2022Assignee: Amazon Technologies, Inc.Inventors: Ran Levy, Ori Rozen, Leon Portman, Knaan Ratosh, Ido Arad, Hadar Neumann
-
Patent number: 11423236Abstract: A method for identifying phrases in a text document having a similar discourse to a candidate phrase includes separating text in a document file into a plurality of phrases and generating a plurality of embedding vectors in a textual embedding space by inputting the plurality of phrases into an embedding engine. A mapping of each embedding vector in the textual embedding space is generated with each corresponding phrase and a document location of each corresponding phrase in the document file. A candidate phrase is received from a user and a candidate embedding vector is generated using the embedding engine. Similarity scores are computed based on the plurality of embedding space distances between the candidate phrase embedding vector location and each respective location of each embedding vector in the textual embedding space. A listing of the phrases with the highest similarity scores is output with their respective document locations in the text.Type: GrantFiled: June 12, 2020Date of Patent: August 23, 2022Assignee: Capital One Services, LLCInventors: Austin Walters, Vincent Pham, Ernest Kwak, Galen Rafferty, Reza Farivar, Jeremy Goodsitt, Anh Truong
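The phrase-retrieval flow in this abstract can be sketched in a few lines. The sketch makes loud assumptions: a bag-of-words counter stands in for the embedding engine, cosine similarity stands in for the embedding-space distance, phrases are split at sentence punctuation, and the names `embed`, `cosine`, and `similar_phrases` are hypothetical. A real system would presumably use a learned embedding model.

```python
import math
import re
from collections import Counter


def embed(phrase):
    """Toy stand-in for an embedding engine: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z']+", phrase.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_phrases(text, candidate, top_k=2):
    """Return the top_k (score, phrase, character_offset) tuples for candidate."""
    phrases = [p.strip() for p in re.split(r"(?<=[.!?])\s+", text) if p.strip()]

    # Map each embedding vector to its phrase and its location in the document.
    index = []
    offset = 0
    for phrase in phrases:
        location = text.find(phrase, offset)
        index.append((embed(phrase), phrase, location))
        offset = location + len(phrase)

    candidate_vector = embed(candidate)
    scored = [(cosine(candidate_vector, vec), phrase, loc)
              for vec, phrase, loc in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


text = ("The loan rate is fixed. Payments are due monthly. "
        "The rate may change later.")
results = similar_phrases(text, "loan rate")
```

Each result carries the phrase's character offset, which plays the role of the document location in the mapping.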
-
Patent number: 11425315Abstract: A video communications method is provided, including: respectively displaying video images of at least two terminals in at least two display subareas of a video communication interface in a video chat session of the at least two terminals; obtaining a first special effect display instruction; and adding a first special effect to the at least two display subareas based on the first special effect display instruction. The method also includes transmitting the first special effect display instruction to a second terminal of the at least two terminals, the second terminal being an action recipient of the first special effect; and selecting, among multiple end special effects, a target end special effect to be added to the video images of the at least two terminals according to a body action that occurred in the video image of the second terminal.Type: GrantFiled: February 3, 2020Date of Patent: August 23, 2022Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITEDInventors: Ying Zhu, Chao Wang, Yinglei Liang, Haoqi Kuang, Lin Shi, Jinjie Wang, Weisong Zhu