Speech To Image Patents (Class 704/235)
-
Patent number: 11432045
Abstract: Disclosed is a display device. According to an embodiment, a display device may include a voice signal receiver, a display, at least one memory storing an application supporting a contents providing service and storing instructions, a communication circuit communicating with at least one external server supporting the contents providing service, and at least one processor. The contents providing service may provide contents files of a first type and contents files of a second type.
Type: Grant
Filed: February 19, 2019
Date of Patent: August 30, 2022
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jibum Moon, Gyungchan Seol, Kyerim Lee
-
Patent number: 11429789
Abstract: Embodiments relate to an intelligent computer platform to identify and evaluate candidate passage response data in natural language form. Natural language processing is applied to analyze a passage against one or more input tokens to identify matching content. A structure representing the analyzed passage is populated with matching input and passage tokens. A first count of matching token entries and a second count of evaluated token entries are determined and qualified by closeness criteria. An alignment of the passage to a candidate question is calculated, including assessing a ratio of the first and second counts as a confidence value. Matching passage data is returned from the passage with the confidence value.
Type: Grant
Filed: June 12, 2019
Date of Patent: August 30, 2022
Assignee: International Business Machines Corporation
Inventors: Stephen A. Boxwell, Keith G. Frost, Kyle M. Brake, Stanley J. Vernier
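The ratio-of-counts confidence described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the token-length filter standing in for the "closeness criteria" and the case-insensitive matching are assumptions.

```python
def passage_confidence(question_tokens, passage_tokens):
    """Confidence as the ratio of matching token entries to evaluated
    token entries (a simplified sketch of the alignment described above)."""
    # Stand-in for the "closeness criteria": skip trivial short tokens.
    evaluated = [t for t in question_tokens if len(t) > 2]
    if not evaluated:
        return 0.0
    passage_set = {t.lower() for t in passage_tokens}
    matched = sum(1 for t in evaluated if t.lower() in passage_set)
    return matched / len(evaluated)
```

For the question tokens `["What", "color", "sky"]` against a passage containing "sky" and "color", two of the three evaluated tokens match, giving a confidence of 2/3.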
-
Patent number: 11430428
Abstract: The present disclosure describes a method, apparatus, and storage medium for performing speech recognition. The method includes acquiring, by an apparatus, first to-be-processed speech information. The apparatus includes a memory storing instructions and a processor in communication with the memory. The method includes acquiring, by the apparatus, a first pause duration according to the first to-be-processed speech information; and in response to the first pause duration being greater than or equal to a first threshold, performing, by the apparatus, speech recognition on the first to-be-processed speech information to obtain a first result of sentence segmentation of speech, the first result of sentence segmentation of speech being text information, the first threshold being determined according to speech information corresponding to a previous moment.
Type: Grant
Filed: September 10, 2020
Date of Patent: August 30, 2022
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Lianwu Chen, Jingliang Bai, Min Luo
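The pause-threshold segmentation idea can be sketched as below. This is an illustration only, not the patented method: the fixed threshold and the `(text, pause)` chunk format are assumptions (the patent determines the threshold adaptively from preceding speech).

```python
def segment_by_pause(chunks, threshold=0.5):
    """Group recognized speech chunks into sentence segments: a pause of at
    least `threshold` seconds after a chunk ends the current segment.

    `chunks` is a list of (text, pause_after_seconds) tuples."""
    segments, current = [], []
    for text, pause in chunks:
        current.append(text)
        if pause >= threshold:
            segments.append(" ".join(current))
            current = []
    if current:  # flush any trailing, unterminated segment
        segments.append(" ".join(current))
    return segments
```

For example, chunks `[("hello", 0.1), ("world", 0.6), ("bye", 0.2)]` yield the segments `["hello world", "bye"]`.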
-
Patent number: 11432090
Abstract: An audio system can be configured to generate an audio heatmap for the audio emission potential profiles of one or more speakers, in specific or arbitrary locations. The audio heatmap may be based on speaker location and orientation, speaker acoustic properties, and optionally environmental properties. The audio heatmap often shows areas of low sound density when there are few speakers, and areas of high sound density when there are many speakers. An audio system may be configured to normalize audio signals for a set of speakers that cooperatively emit sound to render an audio object in a defined audio object location. The audio signals for each speaker can be normalized to ensure accurate rendering of the audio object without volume spikes or dropout.
Type: Grant
Filed: January 12, 2021
Date of Patent: August 30, 2022
Assignee: SPATIALX INC.
Inventors: Xavier Prospero, Aric Marshall, Michael Plitkins, Calin Pacurariu
-
Patent number: 11423897
Abstract: Systems and methods are described herein for generating an adaptive response to a user request. Input indicative of a user request may be received and utilized to identify an item in an electronic catalog. Title segments may be identified from the item's title. Significant segments of the user request may be determined. In response to the user request, a shortened title may be generated from the identified title segments and provided as output at the user device (e.g., via audible output provided at a speaker of the user device, via textual output, or the like). At least one of the title segments provided in the shortened title may correlate to the significant segment identified from the user request. In some embodiments, the length and content of the shortened title may vary based at least in part on the contextual intent of the user's request.
Type: Grant
Filed: January 30, 2020
Date of Patent: August 23, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Ran Levy, Ori Rozen, Leon Portman, Knaan Ratosh, Ido Arad, Hadar Neumann
-
Patent number: 11423236
Abstract: A method for identifying phrases in a text document having a similar discourse to a candidate phrase includes separating text in a document file into a plurality of phrases and generating a plurality of embedding vectors in a textual embedding space by inputting the plurality of phrases into an embedding engine. A mapping of each embedding vector in the textual embedding space is generated with each corresponding phrase and a document location of each corresponding phrase in the document file. A candidate phrase is received from a user and a candidate embedding vector is generated using the embedding engine. Similarity scores are computed based on the plurality of embedding space distances between the candidate phrase embedding vector location and each respective location of each embedding vector in the textual embedding space. A listing of the phrases with the highest similarity scores is outputted with respective document locations in the text.
Type: Grant
Filed: June 12, 2020
Date of Patent: August 23, 2022
Assignee: Capital One Services, LLC
Inventors: Austin Walters, Vincent Pham, Ernest Kwak, Galen Rafferty, Reza Farivar, Jeremy Goodsitt, Anh Truong
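The core retrieval step in this abstract — scoring every stored phrase embedding against a candidate embedding and returning the top matches with their document locations — can be sketched as follows. The cosine similarity metric and the `phrase -> (embedding, location)` index layout are assumptions for illustration; the patent does not specify a particular distance function.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_phrases(candidate_vec, phrase_index, top_k=3):
    """Return the top_k (score, phrase, document_location) triples.

    `phrase_index` maps each phrase to (embedding_vector, document_location)."""
    scored = [(cosine(candidate_vec, vec), phrase, loc)
              for phrase, (vec, loc) in phrase_index.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]
```

With a two-dimensional toy index, a candidate vector `[1, 0]` ranks an identically oriented phrase first and a 45-degree neighbor second.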
-
Patent number: 11425315
Abstract: A video communications method is provided, including: respectively displaying video images of at least two terminals in at least two display subareas of a video communication interface in a video chat session of the at least two terminals; obtaining a first special effect display instruction; and adding a first special effect to the at least two display subareas based on the first special effect display instruction. The method also includes transmitting the first special effect display instruction to a second terminal of the at least two terminals, the second terminal being an action recipient of the first special effect; and selecting, among multiple end special effects, a target end special effect to be added to the video images of the at least two terminals according to a body action that occurred in the video image of the second terminal.
Type: Grant
Filed: February 3, 2020
Date of Patent: August 23, 2022
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Ying Zhu, Chao Wang, Yinglei Liang, Haoqi Kuang, Lin Shi, Jinjie Wang, Weisong Zhu
-
Patent number: 11425487
Abstract: Disclosed is a translation system using a sound vibration microphone capable of fundamentally blocking noise by detecting vibrations generated from a user's voice using a piezo element. The translation system using a sound vibration microphone includes a first earset including a sound vibration sensor and a speaker, a first wearable acoustic device including a microphone and a speaker and performing wireless communication with the first earset, and a first electronic communication device performing wireless communication with the first wearable acoustic device and performing communication with a translation server, wherein the first wearable acoustic device and the first electronic communication device are in a communicable state.
Type: Grant
Filed: November 23, 2020
Date of Patent: August 23, 2022
Assignee: EM-TECH CO., LTD.
Inventors: Cheon Myeong Kim, Byung Min Yu, Jae Hwa Lim, Jeong Ae Ha, Ji Young Lee
-
Patent number: 11423114
Abstract: Systems for automatically suggesting content items to an author of a digital page so that they can be added to the page. One embodiment comprises a system having a page editor that launches a component browser to enable viewing of content items stored in a repository. A web content management server extracts textual information from a digital page that is open for editing and analyzes the information to identify words indicative of the page's subject matter. The system generates a query using the identified words and a search engine searches the repository for items associated with the identified words. The content items identified by the search are displayed by the component browser and are selectable to add particular suggested content items to the page.
Type: Grant
Filed: November 7, 2019
Date of Patent: August 23, 2022
Assignee: OPEN TEXT HOLDINGS, INC.
Inventors: Paul Kieron Carl Narth, Gregory Aaron Parmiter, Jason Jackson, Partha Saradhi Govind
-
Patent number: 11423073
Abstract: An audio-visual management system and process for facilitating comprehension of electronic documents. As an example, a user may access contents of an electronic document via a user interface and request a speech-synthesized recitation of any text in the electronic document. In addition, the user interface can include a navigation tool configured to visually represent significant events in the electronic document via selectable indicators, including but not limited to comments, slides, images, graphs, and/or other internal contextual markers. These indicators can serve as navigation options that can provide guidance to users as they listen to a reading of the electronic document. User selection of an indicator can initiate an audio presentation of textual content that is directly associated with the specified event. In addition, users can view availability of others currently listening to the document and connect with them to create a joint listening experience.
Type: Grant
Filed: November 16, 2018
Date of Patent: August 23, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Erez Kikin-Gil, Benjamin David Smith, Colleen Griffiths Estrada
-
Patent number: 11417340
Abstract: Electronic conferences are a common method of conducting a meeting when the participants are not at the same location. When a conference drops a portion of the audio, such as audio carrying speech from a speaker, participants may be excluded from the full content of the conference. By having a device associated with a speaking participant convey both audio and generated text from the speech provided by the speaking participant, a server may be able to determine that speech is missing from the audio portion and automatically insert text content. As a result, poor audio quality may be mitigated with inserted text, and unwanted text may be omitted when the audio quality is sufficient.
Type: Grant
Filed: December 16, 2019
Date of Patent: August 16, 2022
Assignee: Avaya Inc.
Inventors: Darren Holmes, Jibin George, Salil Dhawan, Sandeep Goynar, Harsimran Jeet Singh
-
Patent number: 11418644
Abstract: Systems and methods for making a multitrack recording of a telephony communication, such as a conference call, record the contributions of each participant in its own respective, separate recording track. In some instances, the contribution(s) of one or more participants is recorded in separate recording tracks, and the contributions of multiple other participants are mixed and recorded in a single recording track. An organizer or administrator of a telephony communication, such as a conference call, can instruct a multitrack recording system as to how to format a multitrack recording of the telephony communication via commands submitted through an application programming interface (API).
Type: Grant
Filed: June 28, 2019
Date of Patent: August 16, 2022
Assignee: Vonage Business Inc.
Inventors: Jonathan Farrow, Dragos Oancea
-
Patent number: 11410644
Abstract: Systems and methods for generating training data for a supervised topic modeling system from outputs of a topic discovery model are described herein. In an embodiment, a system receives a plurality of digitally stored call transcripts and, using a topic model, generates an output which identifies a plurality of topics represented in the plurality of digitally stored call transcripts. Using the output of the topic model, the system generates an input dataset for a supervised learning model by identifying a first subset of the plurality of digitally stored call transcripts that include a particular topic, storing a positive value for the first subset, identifying a second subset that does not include the particular topic, and storing a negative value for the second subset. The input training dataset is then used to train a supervised learning model.
Type: Grant
Filed: August 18, 2020
Date of Patent: August 9, 2022
Assignee: INVOCA, INC.
Inventors: Michael McCourt, Anoop Praturu
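The labeling step described here — turning unsupervised topic-model output into positive/negative training examples for a classifier — can be sketched as below. This is an illustration under assumptions: the dictionary shapes of `transcripts` and `topic_assignments` are invented for the example, not taken from the patent.

```python
def build_training_set(transcripts, topic_assignments, target_topic):
    """Label each transcript 1 if the topic model assigned `target_topic`
    to it, else 0, yielding (text, label) pairs for a supervised model.

    `transcripts` maps transcript_id -> text; `topic_assignments` maps
    transcript_id -> list of topics the topic model found in it."""
    dataset = []
    for transcript_id, text in transcripts.items():
        label = 1 if target_topic in topic_assignments.get(transcript_id, []) else 0
        dataset.append((text, label))
    return dataset
```

The resulting pairs can be fed directly to any off-the-shelf text classifier as its training set.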
-
Patent number: 11410426
Abstract: In non-limiting examples of the present disclosure, systems, methods and devices for generating summary content are presented. Voice audio data and video data for an electronic meeting may be received. A language processing model may be applied to a transcript of the audio data and textual importance scores may be calculated. A video/image model may be applied to the video data and visual importance scores may be calculated. A combined importance score may be calculated for sections of the electronic meeting based on the textual importance scores and the visual importance scores. A meeting summary that includes summary content from sections for which combined importance scores exceed a threshold value may be generated.
Type: Grant
Filed: June 4, 2020
Date of Patent: August 9, 2022
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Erez Kikin-Gil, Daniel Yancy Parish
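The thresholding over combined scores can be sketched as follows. The equal weighting and the threshold value are assumptions for illustration; the abstract specifies only that textual and visual scores are combined and compared against a threshold.

```python
def summarize_sections(sections, w_text=0.5, w_visual=0.5, threshold=0.6):
    """Keep the content of each meeting section whose weighted combination
    of textual and visual importance scores exceeds the threshold.

    Each section is a (content, text_score, visual_score) tuple."""
    summary = []
    for content, text_score, visual_score in sections:
        combined = w_text * text_score + w_visual * visual_score
        if combined > threshold:
            summary.append(content)
    return summary
```

A section scoring 0.9 textually and 0.8 visually (combined 0.85) survives the default 0.6 threshold, while one scoring 0.2 and 0.1 does not.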
-
Patent number: 11409804
Abstract: The present disclosure relates to a data analysis method and a data analysis system thereof. The data analysis method includes steps of: receiving first learning content data, and adding multiple first segmentation marks to the first learning content data to divide it into multiple first learning sections. The first learning sections are arranged according to a time axis. Searching a first keyword string corresponding to each first learning section from the first learning sections. Receiving an analysis command, and analyzing the analysis command against the first keyword string of each of the first learning sections to obtain multiple first similarities corresponding to the analysis command and each of the first learning sections. Finally, searching for the first learning section with the highest similarity.
Type: Grant
Filed: August 2, 2019
Date of Patent: August 9, 2022
Assignee: DELTA ELECTRONICS, INC.
Inventors: Shih-Gong Li, Shih-Han Chan, Chao-Hsuan Ko, Guo-Cheng Lan
-
Patent number: 11411961
Abstract: An electronic device is provided. The electronic device includes a user interface, a location sensor configured to sense a location of the electronic device, a processor electrically connected with the user interface and the location sensor, and a memory electrically connected with the processor and configured to store a first application program and a second application program. The memory is further configured to store instructions that, when executed, enable the processor to receive first location data with a first degree of accuracy regarding the location of the electronic device from the location sensor, process at least part of the first location data to generate second location data with a second degree of accuracy lower than the first degree of accuracy regarding the location of the electronic device, provide the at least part of the first location data to execute the first application program, and provide at least part of the second location data to execute the second application program.
Type: Grant
Filed: August 6, 2021
Date of Patent: August 9, 2022
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jin-Ho Moon, Jong-Eun Yang, Jae-Yung Yeo
-
Patent number: 11404052
Abstract: In a service data processing method performed by a server, user speech information collected by a first terminal is received. A target service operation code according to the user speech information is obtained. The target service operation code is used for identifying target service operation information. The target service operation code is transmitted from the server to the first terminal, so that the first terminal plays the target service operation code by using a speech. The target service operation code obtained by a second terminal is received. A target execution page corresponding to the target service operation code is searched for. The target execution page is transmitted to the second terminal, so that the second terminal executes a service operation corresponding to the target.
Type: Grant
Filed: September 17, 2020
Date of Patent: August 2, 2022
Assignee: Tencent Technology (Shenzhen) Company Limited
Inventors: Jinglin Ma, Xuewei Fang
-
Patent number: 11404041
Abstract: A message management unit receives and accumulates a message, wherein the message is distributed for every update and is message data representing a latest situation of a competition. An explanation generation unit generates an explanatory text for conveying unconveyed information detected from the message, based on conveyed information. A speech synthesis unit outputs a speech converted from the explanatory text. The explanation generation unit stores the unconveyed information for the explanatory text as the conveyed information, stands by until completion of the speech, and initiates a procedure for generating a new explanatory text based on updated unconveyed information.
Type: Grant
Filed: May 23, 2018
Date of Patent: August 2, 2022
Assignees: NIPPON HOSO KYOKAI, NHK Engineering System, Inc.
Inventors: Tadashi Kumano, Ichiro Yamada, Atsushi Imai, Hideki Sumiyoshi, Yuko Yamanouchi, Toshihiro Shimizu, Nobumasa Seiyama, Shoei Sato, Reiko Saito, Taro Miyazaki, Kiyoshi Kurihara, Manon Ichiki, Tohru Takagi
-
Patent number: 11404058
Abstract: Systems and methods for e-commerce systems using natural language understanding are described. A computing device is configured to receive a user utterance including at least one identified semantic component and at least one missing semantic component and generate a context stack including a set of context entries. Each of the context entries includes a root intent element, an entity list element, and a dialogue stack, and each context entry in the set of context entries is associated with one of a user utterance or a system utterance. The computing device is further configured to review at least one context entry in the set of context entries to locate the at least one missing semantic element within the dialogue stack and generate an intent flow execution request including the at least one semantic element from the first speech data and the missing semantic element.
Type: Grant
Filed: October 31, 2018
Date of Patent: August 2, 2022
Assignee: Walmart Apollo, LLC
Inventors: Snehasish Mukherjee, Shankara Bhargava Subramanya
-
Patent number: 11398218
Abstract: Techniques are described for providing dynamically configured speech output, through which text data from a message is presented as speech output through a text-to-speech (TTS) engine that employs a voice profile to provide a machine-generated voice that approximates that of the sender of the message. The sender can also indicate the type of voice they would prefer the TTS engine use to render their text to a recipient, and the voice to be used can be specified in a sender's user profile, as a preference or attribute of the sending user. In some examples, the voice profile to be used can be indicated as metadata included in the message. A voice profile can specify voice attributes such as the tone, pitch, register, timbre, pacing, gender, accent, and so forth. A voice profile can be generated through a machine learning (ML) process.
Type: Grant
Filed: April 25, 2019
Date of Patent: July 26, 2022
Assignee: United Services Automobile Association (USAA)
Inventors: Justin Dax Haslam, Robert Wilson Barner
-
Patent number: 11386902
Abstract: Embodiments of the present invention provide systems and methods for generation and maintenance of verified data records. The system may receive a data submission from a user device over one or more communication channels and convert the data submission into a normalized text format for processing and analysis. The data submission may then be analyzed using one or more trained machine learning models in order to identify factual statements and modifiers within the data submission, and generate a confidence score of verified factual information based on corroboration with one or more additional data sources. Additionally, identified modifiers may be analyzed to determine positive or negative sentiment.
Type: Grant
Filed: April 28, 2020
Date of Patent: July 12, 2022
Assignee: BANK OF AMERICA CORPORATION
Inventors: Kelly Renee-Drop Keiter, Christopher Daniel Birch, Susan R. Hart, Lisa Matthews, Cody Dean Searl
-
Patent number: 11373672
Abstract: Disclosed are devices, systems, apparatus, methods, products, and other implementations, including a method comprising obtaining, by a device, a combined sound signal for signals combined from multiple sound sources in an area in which a person is located, and applying, by the device, speech-separation processing (e.g., deep attractor network (DAN) processing, online DAN processing, LSTM-TasNet processing, Conv-TasNet processing) to the combined sound signal from the multiple sound sources to derive a plurality of separated signals that each contains signals corresponding to different groups of the multiple sound sources. The method further includes obtaining, by the device, neural signals for the person, the neural signals being indicative of one or more of the multiple sound sources the person is attentive to, and selecting one of the plurality of separated signals based on the obtained neural signals. The selected signal may then be processed (amplified, attenuated).
Type: Grant
Filed: October 24, 2018
Date of Patent: June 28, 2022
Assignee: The Trustees of Columbia University in the City of New York
Inventors: Nima Mesgarani, Yi Luo, James O'Sullivan, Zhuo Chen
-
Patent number: 11373634
Abstract: An electronic device secures diversity of a user utterance with respect to a content name when a user searches for content through a display device by voice. A method by an electronic device includes steps of receiving input of a user voice, acquiring a keyword related to a content included in the user voice, acquiring at least one modified keyword based on the keyword, acquiring a plurality of search results corresponding to the keyword and the at least one modified keyword, comparing the keyword and the modified keyword with the plurality of search results and acquiring a content name corresponding to the keyword, and updating a database of content names based on the keyword, the modified keyword, and the final content name.
Type: Grant
Filed: October 30, 2019
Date of Patent: June 28, 2022
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jiwon Yoo, Jihun Park
-
Patent number: 11372608
Abstract: A machine includes a processor and a memory connected to the processor. The memory stores instructions executed by the processor to receive a message and a message parameter indicative of a characteristic of the message, where the message includes a photograph or a video. A determination is made that the message parameter corresponds to a selected gallery, where the selected gallery includes a sequence of photographs or videos. The message is posted to the selected gallery in response to the determination. The selected gallery is supplied in response to a request.
Type: Grant
Filed: October 24, 2019
Date of Patent: June 28, 2022
Assignee: Snap Inc.
Inventor: Timothy Sehn
-
Patent number: 11368581
Abstract: A method to transcribe communications includes: obtaining an audio message from a first device during a voice communication session with a second device including a display screen; providing the message to a first speech recognition system to generate a first message transcript; providing the transcript to the second device for presentation on the screen; obtaining an indication that a transcript quality is below a threshold; in response, providing the message to a second system to generate a second transcript while still providing it to the first system to generate the first transcript, and providing the first transcript to the second device for presentation on the display screen; and, in response to an event occurring that indicates the second transcript is to be provided to the second device instead of the first transcript, providing the second transcript to the second device for presentation on the screen instead of the first transcript.
Type: Grant
Filed: September 11, 2020
Date of Patent: June 21, 2022
Assignee: Ultratec, Inc.
Inventors: Robert M. Engelke, Kevin R. Colwell, Christopher Engelke
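The fallback pattern in this abstract — switch to a second recognizer when the first transcript's quality drops below a threshold — can be sketched as below. The function signatures, the quality-estimation callback, and the single-shot (rather than streaming) flow are all assumptions made to keep the illustration small.

```python
def choose_transcript(audio, primary_asr, secondary_asr, quality_fn, threshold=0.8):
    """Run the primary recognizer; if its estimated transcript quality
    falls below `threshold`, run the secondary recognizer and prefer it.

    Returns (transcript, source) where source names the system used."""
    first = primary_asr(audio)
    if quality_fn(first) >= threshold:
        return first, "primary"
    second = secondary_asr(audio)
    return second, "secondary"
```

In a real captioning system both recognizers would run concurrently on streaming audio, with the switch driven by an ongoing quality signal rather than a single score.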
-
Patent number: 11366569
Abstract: The disclosure relates to an interactive interface display method, apparatus, and storage medium. The method includes displaying an information display interface including a call entry of an intelligent interactive application; calling the intelligent interactive application when a trigger operation on the call entry is detected; displaying a first dynamic effect in which the call entry moves in the information display interface; and displaying an interactive interface of the intelligent interactive application after displaying the first dynamic effect in which the call entry moves in the information display interface.
Type: Grant
Filed: March 12, 2020
Date of Patent: June 21, 2022
Assignee: BEIJING XIAOMI MOBILE SOFTWARE CO., LTD.
Inventors: Yuhang Xia, Zekui Li
-
Patent number: 11361491
Abstract: The present invention relates to a method of generating a facial expression of a user for a virtual environment. The method comprises obtaining a video and an associated speech of the user. Further, extracting in real-time at least one of one or more voice features and one or more text features based on the speech. Furthermore, identifying one or more phonemes in the speech. Thereafter, determining one or more facial features relating to the speech of the user using a pre-trained second learning model based on the one or more voice features, the one or more phonemes, the video, and one or more previously generated facial features of the user. Finally, generating the facial expression of the user corresponding to the speech for an avatar representing the user in the virtual environment.
Type: Grant
Filed: September 3, 2020
Date of Patent: June 14, 2022
Assignee: Wipro Limited
Inventors: Vivek Kumar Varma Nadimpalli, Gopichand Agnihotram
-
Patent number: 11356492
Abstract: Embodiments of the present invention provide methods, computer program products, and systems. Embodiments of the present invention detect an audio stream comprising one or more voice packets from a first computing system. Embodiments of the present invention can, in response to detecting an audio stream, dynamically prevent audio drop out on a second computing system using circular buffers based on network consistency.
Type: Grant
Filed: September 16, 2020
Date of Patent: June 7, 2022
Assignee: Kyndryl, Inc.
Inventors: Tiberiu Suto, Nadiya Kochura, Vinod A. Valecha
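A circular (ring) buffer of the kind this abstract relies on can be sketched as follows. This is a generic textbook structure, not the patented system: the overwrite-oldest policy on overflow and the `None`-on-underrun behavior are design assumptions for the example.

```python
class CircularAudioBuffer:
    """Fixed-capacity ring buffer for audio packets: when the network
    stalls, playback keeps draining buffered packets instead of dropping out."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.read = self.write = self.count = 0

    def push(self, packet):
        self.buf[self.write] = packet
        self.write = (self.write + 1) % self.capacity
        if self.count == self.capacity:
            # Buffer full: advance the read pointer, overwriting the oldest packet.
            self.read = (self.read + 1) % self.capacity
        else:
            self.count += 1

    def pop(self):
        if self.count == 0:
            return None  # underrun: the caller must conceal the gap
        packet = self.buf[self.read]
        self.read = (self.read + 1) % self.capacity
        self.count -= 1
        return packet
```

Pushing three packets into a two-slot buffer silently evicts the oldest, so a subsequent drain yields only the two most recent packets.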
-
Patent number: 11354516
Abstract: An information processor includes a generation section that generates a specified character string on the basis of at least one of voice information corresponding to a content of speech detected by a voice detection section and vehicle information acquired from a vehicle. With this configuration, a user can input the specified character string, which is a hashtag, without an operation. Thus, compared to the related art in which the hashtag is generated on the basis of the operation (manual input) by the user, a burden on the user can significantly be reduced, and an input error can be prevented.
Type: Grant
Filed: November 22, 2019
Date of Patent: June 7, 2022
Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA
Inventors: Ryotaro Fujiwara, Keiko Suzuki, Makoto Honda, Chikage Kubo, Ryota Okubi, Takeshi Fujiki
-
Patent number: 11350148
Abstract: Aspects of the subject disclosure may include, for example, modifying a user profile associated with a user associated with a content service to generate an updated user profile according to consumption of media content by the user and user feedback information associated with the consumption of the media content, determining a user context according to information associated with a user device, where the user context includes current activity of the user, modifying a set of media content according to the user context that is determined to generate an updated set of media content, where a type of media content is eliminated from the set of media content in the updated set of media content according to the user context, and presenting the updated set of content at a presentation device of the user via a personal media channel of the user associated with the content service. Other embodiments are disclosed.
Type: Grant
Filed: October 29, 2020
Date of Patent: May 31, 2022
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Zhu Liu, Eric Zavesky, Bernard S. Renger, Behzad Shahraray, David Crawford Gibbon, Tan Xu, Lee Begeja, Raghuraman Gopalan
-
Patent number: 11341707
Abstract: A method and system for transforming simple user input into customizable animated images for use in text-messaging applications.
Type: Grant
Filed: March 23, 2021
Date of Patent: May 24, 2022
Assignee: EMONSTER INC
Inventor: Enrique Bonansea
-
Patent number: 11341331
Abstract: An intelligent speech assistant receives information collected while a user is speaking. The information can comprise speech data, vision data, or both, where the speech data is from the user speaking and the vision data is of the user while speaking. The assistant evaluates the speech data against a script which can contain information that the user should speak, information that the user should not speak, or both. The assistant collects instances where the user utters phrases that match the script or instances where the user utters phrases that do not match the script, depending on whether phrases should or should not be spoken. The assistant evaluates vision data to identify gestures, facial expressions, and/or emotions of the user. Instances where the gestures, facial expressions, and/or emotions are not appropriate to the context are flagged. Real-time prompts and/or a summary is presented to the user as feedback.
Type: Grant
Filed: October 4, 2019
Date of Patent: May 24, 2022
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Huakai Liao, Priyanka Vikram Sinha, Kevin Dara Khieu, Derek Martin Johnson, Siliang Kang, Huey-Ru Tsai, Amit Srivastava
-
Patent number: 11341957
Abstract: A method for detecting a keyword, applied to a terminal, includes: extracting a speech eigenvector of a speech signal; obtaining, according to the speech eigenvector, a posterior probability of each target character being a key character in any keyword in an acquisition time period of the speech signal; obtaining confidences of at least two target character combinations according to the posterior probability of each target character; and determining that the speech signal includes the keyword upon determining that all the confidences of the at least two target character combinations meet a preset condition. The target character is a character in the speech signal whose pronunciation matches a pronunciation of the key character. Each target character combination includes at least one target character, and a confidence of a target character combination represents a probability of the target character combination being the keyword or a part of the keyword.
Type: Grant
Filed: July 20, 2020
Date of Patent: May 24, 2022
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Yi Gao, Meng Yu, Dan Su, Jie Chen, Min Luo
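The confidence-per-character-combination check can be sketched as below. The geometric-mean combination rule and the threshold condition are assumptions chosen for illustration; the abstract says only that confidences are derived from per-character posteriors and all must meet a preset condition.

```python
import math

def keyword_confidence(posteriors, combo):
    """Confidence of one character combination, here taken as the
    geometric mean of its per-character posterior probabilities."""
    probs = [posteriors[ch] for ch in combo]
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

def detect_keyword(posteriors, combos, threshold=0.5):
    """Fire only if every character combination clears the threshold."""
    return all(keyword_confidence(posteriors, c) > threshold for c in combos)
```

Requiring all combinations (e.g. the full keyword and its prefixes) to pass, rather than just the full keyword, makes the detector less prone to firing on a single spuriously confident character.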
-
Patent number: 11334315
Abstract: Systems, methods, and devices for human-machine interfaces for utterance-based playlist selection are disclosed. In one method, a list of playlists is traversed and a portion of each is audibly output until a playlist command is received. Based on the playlist command, the traversing is stopped and a playlist is selected for playback. In examples, the list of playlists is modified based on a modification input.
Type: Grant
Filed: July 8, 2019
Date of Patent: May 17, 2022
Assignee: Spotify AB
Inventors: Daniel Bromand, Richard Mitic, Horia-Dragos Jurcut, Henriette Susanne Martine Cramer, Ruth Brillman
-
Patent number: 11335360
Abstract: In one aspect, a device includes at least one processor and storage accessible to the at least one processor. The storage includes instructions executable by the at least one processor to analyze the decibel levels of audio of a user's speech. The instructions are executable to, based on the analysis, enhance a transcript of the user's speech with indications of particular words from the user's speech as being associated with one or more emotions of the user.
Type: Grant
Filed: September 21, 2019
Date of Patent: May 17, 2022
Assignee: Lenovo (Singapore) Pte. Ltd.
Inventors: Johnathan Co Lee, Jonathan Jen-Wei Yu
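A minimal sketch of the loudness analysis this abstract describes, under stated assumptions: per-word loudness is approximated as the dBFS of the RMS amplitude, and words louder than the utterance average by a margin are marked as emphasized. The function names, the asterisk markup, and the 6 dB margin are all illustrative, not from the patent.

```python
import math

def word_db(samples):
    """Approximate loudness of one word as dBFS of its RMS amplitude."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)

def enhance_transcript(words, margin_db=6.0):
    """words: list of (text, samples) pairs. Words whose level exceeds
    the utterance average by margin_db are marked with *asterisks*."""
    levels = [word_db(s) for _, s in words]
    avg = sum(levels) / len(levels)
    return " ".join(f"*{text}*" if level > avg + margin_db else text
                    for (text, _), level in zip(words, levels))

words = [("i", [0.1] * 4), ("said", [0.1] * 4), ("NO", [0.9] * 4)]
print(enhance_transcript(words))  # "i said *NO*"
```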
-
Patent number: 11322231
Abstract: A method, computer program product, and computing system for automating an intake process is executed on a computing device and includes prompting a patient to provide encounter information via a virtual assistant during a pre-visit portion of a patient encounter. Encounter information is obtained from the patient in response to the prompting by the virtual assistant.
Type: Grant
Filed: August 9, 2018
Date of Patent: May 3, 2022
Assignee: NUANCE COMMUNICATIONS, INC.
Inventors: Donald E. Owen, Garret N. Erskine, Mehmet Mert Öz
-
Patent number: 11314799
Abstract: Described herein are technologies that facilitate effective use (e.g., indexing and searching) of non-text machine data (e.g., audio/visual data) with text-based indexes of an event-based machine-data intake and query system.
Type: Grant
Filed: July 29, 2016
Date of Patent: April 26, 2022
Assignee: Splunk Inc.
Inventor: Adam Oliner
-
Patent number: 11315570
Abstract: The technology disclosed relates to a machine learning based speech-to-text transcription intermediary which, from among multiple speech recognition engines, selects a speech recognition engine for accurately transcribing an audio channel based on sound and speech characteristics of the audio channel.
Type: Grant
Filed: April 2, 2019
Date of Patent: April 26, 2022
Assignee: Facebook Technologies, LLC
Inventor: Shamir Allibhai
-
Patent number: 11315569
Abstract: Disclosed is a system for generating a transcript of a meeting using individual audio recordings of speakers in the meeting. The system obtains an audio recording file from each speaker in the meeting, generates a speaker-specific transcript for each speaker using the audio recording of the corresponding speaker, and merges the speaker-specific transcripts to generate a meeting transcript that includes text of the speech from all speakers in the meeting. Because the system generates speaker-specific transcripts from speaker-specific (high quality) audio recordings, the need for "diarization" is removed and the audio quality of each speaker's recording is maximized, leading to virtually lossless recordings and resulting in improved transcription quality and analysis.
Type: Grant
Filed: February 7, 2020
Date of Patent: April 26, 2022
Assignee: Memoria, Inc.
Inventors: Homayoun Talieh, Rémi Berson, Eric Pellish
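The merge step above can be sketched as interleaving time-stamped per-speaker segments; speaker attribution is implicit because each segment comes from that speaker's own recording. This is a sketch under assumed data shapes (speaker name mapped to `(start_seconds, text)` segments), not the patented implementation.

```python
def merge_transcripts(speaker_transcripts):
    """Merge per-speaker transcripts into one meeting transcript.

    speaker_transcripts maps a speaker name to a list of
    (start_seconds, text) segments from that speaker's own recording,
    so no diarization step is needed."""
    merged = []
    for speaker, segments in speaker_transcripts.items():
        merged.extend((start, speaker, text) for start, text in segments)
    merged.sort(key=lambda seg: seg[0])  # interleave by start time
    return [f"{speaker}: {text}" for _, speaker, text in merged]

transcript = merge_transcripts({
    "Ana": [(0.0, "Shall we start?"), (9.2, "Agreed.")],
    "Ben": [(3.5, "Yes, let's review the agenda.")],
})
print(transcript)
```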
-
Patent number: 11308951
Abstract: There is provided an information processing apparatus, an information processing method, and a program capable of providing a more convenient speech recognition service. A desired word in a sentence presented to the user as a speech recognition result is recognized as an edited portion, speech information repeatedly uttered to edit the word of the edited portion is acquired, and speech information other than the repeated utterance is connected to it to generate speech information for speech recognition for editing. Speech recognition is then performed on the generated speech information.
Type: Grant
Filed: January 4, 2018
Date of Patent: April 19, 2022
Assignee: SONY CORPORATION
Inventors: Shinichi Kawano, Yuhei Taki
-
Patent number: 11308938
Abstract: To train a speech recognizer, such as for recognizing variables in a neural speech-to-meaning system, compute, within an embedding space, a range of vectors of features of natural speech. Generate parameter sets for speech synthesis and synthesize speech according to those parameters. Analyze the synthesized speech to compute vectors in the embedding space. Using a cost function that favors an even spread (minimal clustering), generate a multiplicity of speech synthesis parameter sets. Using the multiplicity of parameter sets, generate a multiplicity of speech samples of known words that can be used as training data for speech recognition.
Type: Grant
Filed: December 5, 2019
Date of Patent: April 19, 2022
Assignee: SoundHound, Inc.
Inventors: Maisy Wieman, Jonah Probell, Sudharsan Krishnaswamy
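One way to realize a cost that favors an even spread is to penalize the smallest pairwise distance between embedding vectors and select parameter sets greedily (farthest-point selection). This is an illustrative sketch, not the patent's cost function; `spread_cost` and `select_even_spread` are hypothetical names.

```python
def spread_cost(vectors):
    """Cost that penalizes clustering: the negative of the minimum pairwise
    distance, so lower cost means a more even spread."""
    dists = [sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
             for i, u in enumerate(vectors) for v in vectors[i + 1:]]
    return -min(dists)

def select_even_spread(candidates, k):
    """Greedily pick k embedding vectors, each time taking the candidate
    farthest from everything already chosen."""
    chosen = [candidates[0]]
    while len(chosen) < k:
        best = max(candidates, key=lambda c: min(
            sum((a - b) ** 2 for a, b in zip(c, s)) ** 0.5 for s in chosen))
        chosen.append(best)
    return chosen

picked = select_even_spread([(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0)], 3)
print(picked)  # the near-duplicate (0.1, 0.0) is skipped
```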
-
Patent number: 11308945
Abstract: A hypernym of a word in utterance data may be probabilistically determined. The utterance data may correspond to a spoken query or command. A redacted utterance may be derived by replacing the word with the hypernym. The hypernym may be determined by applying noise to a position in a hierarchical embedding that corresponds to the word. The word may be identified as being potentially sensitive. The hierarchical embedding may be a Hyperbolic embedding that may indicate hierarchical relationships between individual words of a corpus of words, such as "red" is a "color" or "Austin" is in "Texas." Noise may be applied by obtaining a first value in Euclidean space based on a second value in Hyperbolic space, and obtaining a third value in Hyperbolic space based on the first value in Euclidean space. The second value in Hyperbolic space may correspond to the word.
Type: Grant
Filed: September 4, 2019
Date of Patent: April 19, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Thomas Drake, Oluwaseyi Feyisetan, Thomas Diethe
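The Euclidean detour described above can be illustrated with the standard logarithmic and exponential maps at the origin of the Poincaré ball, a common model of Hyperbolic space. The patent does not specify these particular maps or Gaussian noise; this is a sketch under those assumptions, and all names are illustrative.

```python
import math
import random

def log_map0(x):
    """Poincare-ball logarithmic map at the origin: Hyperbolic -> Euclidean."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        return list(x)
    scale = math.atanh(norm) / norm
    return [scale * v for v in x]

def exp_map0(v):
    """Poincare-ball exponential map at the origin: Euclidean -> Hyperbolic."""
    norm = math.sqrt(sum(u * u for u in v))
    if norm == 0:
        return list(v)
    scale = math.tanh(norm) / norm
    return [scale * u for u in v]

def perturb(x, sigma=0.05, rng=random):
    """Apply noise to a Hyperbolic point by detouring through Euclidean
    space: map out (second -> first value), add noise, map back (third)."""
    tangent = log_map0(x)
    noisy = [u + rng.gauss(0.0, sigma) for u in tangent]
    return exp_map0(noisy)

print(perturb([0.3, 0.4]))  # a nearby point, still inside the unit ball
```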
-
Patent number: 11308943
Abstract: An electronic device receives audio data for a media item. The electronic device generates, from the audio data, a plurality of samples, each sample having a predefined maximum length. The electronic device, using a neural network trained to predict character probabilities, generates a probability matrix of characters for a first portion of a first sample of the plurality of samples. The probability matrix includes character information, timing information, and respective probabilities of respective characters at respective times. The electronic device identifies, for the first portion of the first sample, a first sequence of characters based on the generated probability matrix.
Type: Grant
Filed: September 12, 2019
Date of Patent: April 19, 2022
Assignee: Spotify AB
Inventors: Daniel Stoller, Simon René Georges Durand, Sebastian Ewert
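Identifying a character sequence from such a probability matrix is commonly done with CTC-style greedy decoding: take the most probable character per time step, collapse repeats, and drop a blank symbol. The abstract does not name the decoding rule, so this sketch assumes it; the matrix values are made up for illustration.

```python
def greedy_decode(prob_matrix, alphabet, blank="-"):
    """Pick the most probable character per time step, then collapse
    repeats and drop the blank symbol (CTC-style greedy decoding)."""
    best = [max(range(len(alphabet)), key=lambda i: row[i]) for row in prob_matrix]
    out, prev = [], None
    for idx in best:
        ch = alphabet[idx]
        if ch != blank and idx != prev:
            out.append(ch)
        prev = idx
    return "".join(out)

alphabet = ["-", "h", "i"]
matrix = [  # rows: time steps; columns: P(blank), P('h'), P('i')
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.1, 0.7],
]
print(greedy_decode(matrix, alphabet))  # "hi"
```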
-
Patent number: 11310223
Abstract: An identity authentication method, includes: at an electronic device having one or more processors and memory, the electronic device coupled with a display and one or more input devices: receiving an identity authentication request; in response to receiving the identity authentication request, performing an interactive authentication information exchange between the electronic device and a user, including: displaying, on the display, first visual information in a first manner; displaying, on the display, the first visual information in a second manner that is distinct from the first manner, wherein the first visual information displayed in the second manner includes a timing characteristic that is absent from the first visual information displayed in the first manner; receiving user input entered in accordance with the first visual information displayed in the second manner; and verifying that the user input conforms to the timing characteristic in the first visual information displayed in the second manner.
Type: Grant
Filed: May 26, 2020
Date of Patent: April 19, 2022
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Lu Zheng, Shuai Zhang, Tingting Shang, Rui Rao, Yan Chen, Yaode Huang, Zhenhua Wang
-
Patent number: 11302290
Abstract: Described are various embodiments related to a vehicle device and an electronic device, wherein the vehicle device according to one embodiment can include: a display; a memory; one or more sensors; communication circuitry configured to communicate with an external electronic device; and a processor configured to display first display information according to execution of a first application on a first area on the display, perform control to transfer vehicle-related context information to the electronic device based on information obtained by the one or more sensors and, if information related to a second application corresponding to the vehicle-related context information is received from the electronic device, display second display information associated with the second application on a second area on the display using the received information.
Type: Grant
Filed: January 11, 2018
Date of Patent: April 12, 2022
Assignee: Samsung Electronics Co., Ltd.
Inventors: Yong-Jun Lim, Tae-Young Ha
-
Patent number: 11302305
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for natural language processing. One of the methods includes receiving a voice input from a user device; generating a recognition output; receiving a user selection of one or more terms in the recognition output; receiving a user input of one or more letters replacing the user-selected one or more terms; determining suggested correction candidates based in part on the user input and the voice input; and providing one or more suggested correction candidates to the user device as suggested corrected recognition outputs.
Type: Grant
Filed: May 14, 2020
Date of Patent: April 12, 2022
Assignee: Google LLC
Inventors: Evgeny A. Cherepanov, Jakob Nicolaus Foerster, Vikram Sridar, Ishai Rabinovitz, Omer Tabach
-
Patent number: 11302313
Abstract: Systems and methods for speech recognition are provided. The method may include obtaining a plurality of candidate recognition results of speech information uttered by a user and a plurality of preliminary scores corresponding to the plurality of candidate recognition results, respectively. The method may further include, for each of the plurality of candidate recognition results, extracting one or more keywords from the candidate recognition result and determining at least one parameter associated with the one or more extracted keywords. The method may further include, for each of the plurality of candidate recognition results, generating an updating coefficient based on the at least one parameter and updating the preliminary score based on the updating coefficient to generate an updated score. The method may further include determining, from the plurality of candidate recognition results, a target recognition result based on the plurality of updated scores.
Type: Grant
Filed: December 14, 2019
Date of Patent: April 12, 2022
Assignee: BEIJING DIDI INFINITY TECHNOLOGY AND DEVELOPMENT CO., LTD.
Inventor: Xiulin Li
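The rescoring flow above can be sketched in a few lines. The abstract does not say how the updating coefficient is applied, so a multiplicative update is assumed here; the candidate texts, scores, and coefficients are invented for illustration.

```python
def pick_target(candidates):
    """candidates: list of (text, preliminary_score, updating_coefficient).
    Update each preliminary score with its coefficient (multiplicatively,
    by assumption) and return the candidate with the highest updated score."""
    updated = [(text, score * coeff) for text, score, coeff in candidates]
    return max(updated, key=lambda c: c[1])

best = pick_target([
    ("navigate to the airport", 0.60, 1.3),   # keyword-derived boost
    ("navigate to the air port", 0.65, 1.0),  # no keyword match
])
print(best[0])  # the boosted candidate overtakes the higher preliminary score
```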
-
Patent number: 11301506
Abstract: Automated digital asset tagging techniques and systems are described that support use of multiple vocabulary sets. In one example, a plurality of digital assets are obtained having first-vocabulary tags taken from a first-vocabulary set. Second-vocabulary tags taken from a second-vocabulary set are assigned to the plurality of digital assets through machine learning. A determination is made that at least one first-vocabulary tag includes a plurality of visual classes based on the assignment of at least one second-vocabulary tag. Digital assets are collected from the plurality of digital assets that correspond to one visual class of the plurality of visual classes. A model is then generated using machine learning based on the collected digital assets.
Type: Grant
Filed: June 29, 2017
Date of Patent: April 12, 2022
Assignee: Adobe Inc.
Inventors: Mayur Hemani, Balaji Krishnamurthy
-
Patent number: 11295839
Abstract: A method, computer program product, and computing system for automating a follow-up process is executed on a computing device and includes prompting a patient to provide encounter information via a virtual assistant during a post-visit portion of a patient encounter. Encounter information is obtained from the patient in response to the prompting by the virtual assistant.
Type: Grant
Filed: August 9, 2018
Date of Patent: April 5, 2022
Assignee: NUANCE COMMUNICATIONS, INC.
Inventors: Donald E. Owen, Garret N. Erskine, Mehmet Mert Öz
-
Patent number: 11294474
Abstract: A virtual collaboration system receives input video data including a participant. The system analyzes the input video data to identify a gesture or a movement made by the participant. The system selects an overlay image as a function of the gesture or the movement made by the participant, incorporates the overlay image into the input video data, thereby generating output video data that includes the overlay image, and transmits the output video data to one or more participant devices.
Type: Grant
Filed: February 5, 2021
Date of Patent: April 5, 2022
Assignee: Lenovo (Singapore) Pte. Ltd.
Inventors: Aaron Michael Stewart, Alden Rose, Ellis Anderson