Abstract: Systems and methods for providing sidebars during virtual meetings are provided herein. In an aspect, a system including a non-transitory computer-readable medium, a communications interface, and a processor is provided. The processor may be configured to execute processor-executable instructions stored in the non-transitory computer-readable medium to: establish a video conference having a plurality of participants, receive, from a first client device, a first audio stream and a first video stream, and receive, from the first client device, a request for a sidebar meeting with a second client device. The processor may be configured to establish the sidebar meeting, and responsive to establishing the sidebar meeting: terminate transmission of the first audio stream, and transmit to the first client device: a first set of audio and video streams corresponding to a main meeting, and a second set of audio and video streams corresponding to the sidebar meeting.
Abstract: A portable information system is comprised of an input device for capturing an image having a user-selected object or text, and a background. A hand-held computer is responsive to the input device and is programmed to: distinguish the user-selected object/text from the background; compare the user-selected object to a database of objects/characters; and output a translation of, information about, or interpretation of, the user-selected object or text in response to the step of comparing. The invention is particularly useful as a portable aid for translating or remembering text messages foreign to the user that are found in visual scenes. A second important use is to provide mobile information and guidance to the mobile user in connection with surrounding objects (such as identifying landmarks or people, and/or acting as a navigational aid). Methods of operating the present invention are also disclosed.
Abstract: An attribute-based speech recognition system is described. A speech pre-processor receives input speech and produces a sequence of acoustic observations representative of the input speech. A database of context-dependent acoustic models characterizes a probability of a given sequence of sounds producing the sequence of acoustic observations. Each acoustic model includes phonetic attributes and suprasegmental non-phonetic attributes. A finite state language model characterizes a probability of a given sequence of words being spoken. A one-pass decoder compares the sequence of acoustic observations to the acoustic models and the language model, and outputs at least one word sequence representative of the input speech.
Type: Grant
Filed: October 6, 2000
Date of Patent: November 8, 2005
Assignee: Multimodal Technologies, Inc.
Inventors: Michael Finke, Jurgen Fritsch, Detleff Koll, Alex Waibel
Abstract: An iterative language translation system. The system includes a first automatic speech recognition component adapted to recognize spoken language in a source language and to create a source language hypothesis and a first machine translation component adapted to translate the source language hypothesis into a target language. The system also includes a second automatic speech recognition component adapted to recognize spoken language in the target language that is spoken by a translator, and wherein the second automatic speech recognition component is further adapted to create a target language hypothesis.
Abstract: A method of organizing an acoustic model for speech recognition is comprised of the steps of calculating a measure of acoustic dissimilarity of subphonetic units. A clustering technique is recursively applied to the subphonetic units based on the calculated measure of acoustic dissimilarity to automatically generate a hierarchically arranged model. Each application of the clustering technique produces another level of the hierarchy with the levels progressing from the least specific to the most specific. A technique for adapting the structure and size of a trained acoustic model to an unseen domain using only a small amount of adaptation data is also disclosed.
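As a rough illustration of the recursive clustering described in this abstract, the following Python sketch builds a hierarchy top-down. It is not the patented method: the unit names, the use of short feature vectors to represent subphonetic units, the Euclidean distance as the dissimilarity measure, and the two-way split around the most dissimilar pair are all assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence


def dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed dissimilarity measure: Euclidean distance between feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_hierarchy(units: Dict[str, Sequence[float]],
                    names: Optional[List[str]] = None) -> dict:
    """Recursively split the units; each recursion adds one more specific level."""
    if names is None:
        names = sorted(units)
    node = {"units": names, "children": []}
    if len(names) < 2:
        return node  # a single unit is the most specific level

    # Use the two most dissimilar units as seeds for a two-way split.
    seed_a, seed_b = max(
        combinations(names, 2),
        key=lambda pair: dissimilarity(units[pair[0]], units[pair[1]]),
    )
    if dissimilarity(units[seed_a], units[seed_b]) == 0.0:
        return node  # all units identical: nothing left to separate

    # Assign every unit to the nearer seed.
    left = [n for n in names
            if dissimilarity(units[n], units[seed_a])
            <= dissimilarity(units[n], units[seed_b])]
    right = [n for n in names if n not in left]
    node["children"] = [build_hierarchy(units, left),
                        build_hierarchy(units, right)]
    return node


# Hypothetical subphonetic units (begin and middle states of two phones).
units = {
    "a-b": [0.0, 0.0],
    "a-m": [0.1, 0.0],
    "s-b": [5.0, 5.0],
    "s-m": [5.1, 5.0],
}
tree = build_hierarchy(units)
```

In this toy data the root holds all four units (least specific), the second level separates the "a" states from the "s" states, and the leaves hold single units (most specific).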
Abstract: An iterative language translation system. The system includes a first automatic speech recognition component adapted to recognize spoken language in a source language and to create a source language hypothesis and a first machine translation component adapted to translate the source language hypothesis into a target language. The system also includes a second universal automatic speech recognition component adapted to recognize spoken languages in a plurality of target languages spoken by a translator, and wherein the second automatic speech recognition component is further adapted to create a target language hypothesis.
Abstract: A method of repairing machine-recognized speech is comprised of the steps of receiving from a recognition engine a first n-best list of hypotheses and scores for each hypothesis generated in response to a primary utterance to be recognized. An error within the hypothesis having the highest score is located. Control signals are generated from the first n-best list which are input to the recognition engine to constrain the generation of a second n-best list of hypotheses, and scores for each hypothesis, in response to an event independent of the primary utterance. The scores for the hypotheses in the first n-best list are combined with the scores for the hypotheses in the second n-best list. The hypothesis having the highest combined score is selected as the replacement for the located error.
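The score-combination step in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each n-best list holds candidate texts for the erroneous span paired with log-scores (higher is better), that scores are combined by addition, and that only hypotheses present in both lists are eligible. The function name `combine_nbest` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# (hypothesis text, log-score); higher scores are better. Assumed representation.
NBest = List[Tuple[str, float]]


def combine_nbest(first: NBest, second: NBest) -> Tuple[str, float]:
    """Add the scores of hypotheses found in both lists and return the best one."""
    first_scores: Dict[str, float] = dict(first)
    combined = {
        hyp: first_scores[hyp] + score
        for hyp, score in second
        if hyp in first_scores  # the second list is constrained by the first
    }
    if not combined:
        raise ValueError("the two n-best lists share no hypothesis")
    best = max(combined, key=combined.get)
    return best, combined[best]


# First list: from the primary utterance. Second list: from the repair event
# (for example, the user respeaking the misrecognized span).
first = [
    ("wreck a nice beach", -1.0),
    ("recognize speech", -1.2),
    ("recognise peach", -3.0),
]
second = [
    ("recognize speech", -0.4),
    ("wreck a nice beach", -2.5),
]
best, score = combine_nbest(first, second)
```

Here the top hypothesis of the first list loses to "recognize speech" once the evidence from the repair event is added, which is the behavior the abstract describes: the replacement is chosen by combined score rather than by either list alone.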