Pattern Display Patents (Class 704/276)
-
Publication number: 20140142954
Abstract: A soundtrack creation method and user playback system for soundtracks synchronized to electronic text. Synchronization is achieved by maintaining a reading speed variable indicative of the user's reading speed. The system provides for multiple channels of audio to enable concurrent playback of two or more partially or entirely overlapping audio regions so as to create an audio output having, for example, sound effects, ambience, music or other audio features that are triggered to playback at specific portions in the electronic text to enhance the reading experience.
Type: Application
Filed: January 28, 2014
Publication date: May 22, 2014
Applicant: BOOKTRACK HOLDINGS LIMITED
Inventors: PAUL CHARLES CAMERON, MARK STEVEN CAMERON, RUI ZHANG, ANDREW RUSSELL DAVENPORT, PAUL ANTHONY MCGRATH
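A minimal Python sketch of the reading-speed idea described above (an illustration, not Booktrack's patented implementation; the function name and words-per-minute figure are assumptions): given a reading speed variable, the system can estimate when the reader will reach the text position that triggers an audio region.

```python
def trigger_time_seconds(trigger_word_index, start_word_index, words_per_minute):
    """Seconds after the reader passes `start_word_index` at which the audio
    region anchored at `trigger_word_index` should begin playback, assuming
    a constant reading speed in words per minute."""
    words_to_read = trigger_word_index - start_word_index
    return words_to_read / (words_per_minute / 60.0)

# An audio region anchored 150 words ahead, at a reading speed of
# 300 words per minute, should start 30 seconds from now.
print(trigger_time_seconds(150, 0, 300))  # 30.0
```

Updating the reading-speed variable (e.g., from page turns or scrolling) would shift all pending trigger times accordingly.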
-
Patent number: 8731943
Abstract: Systems, methods and computer program products are provided for translating a natural language into music. Through systematic parsing, music compositions can be created. These compositions can be created by one or more persons who do not speak the same natural language.
Type: Grant
Filed: February 5, 2010
Date of Patent: May 20, 2014
Assignee: Little Wing World LLC
Inventors: Nicolle Ruetz, David Warhol
-
Patent number: 8725518
Abstract: A system for providing automatic quality management regarding a level of conformity to a specific accent, including a recording system, a statistical model database with statistical models representing speech data of different levels of conformity to a specific accent, a speech analysis system, and a quality management system. The recording system is adapted to record one or more samples of a speaker's speech and provide them to the speech analysis system for analysis, and the speech analysis system is adapted to provide a score for the speaker's speech samples to the quality management system by analyzing the recorded speech samples relative to the statistical models in the statistical model database.
Type: Grant
Filed: April 25, 2006
Date of Patent: May 13, 2014
Assignee: Nice Systems Ltd.
Inventors: Moshe Waserblat, Barak Eilam
-
Publication number: 20140129235
Abstract: Apparatus comprising a receiver configured to receive a first audio signal, a signal characteriser configured to determine at least one characteristic associated with the first audio signal, a comparator configured to compare the at least one characteristic against at least one characteristic associated with at least one further audio signal, and a display configured to display the at least one characteristic associated with at least one further audio signal dependent on the first audio signal characteristic.
Type: Application
Filed: June 17, 2011
Publication date: May 8, 2014
Applicant: Nokia Corporation
Inventor: Mikko Veli Aimo Suvanto
-
Patent number: 8719032
Abstract: A clear picture of who is speaking in a setting where there are multiple input sources (e.g., a conference room with multiple microphones) can be obtained by comparing input channels against each other. The data from each channel can not only be compared, but can also be organized into portions which logically correspond to statements by a user. These statements, along with information regarding who is speaking, can be presented in a user friendly format via an interactive timeline which can be updated in real time as new audio input data is received.
Type: Grant
Filed: December 11, 2013
Date of Patent: May 6, 2014
Assignee: Jefferson Audio Video Systems, Inc.
Inventors: Matthew David Bader, Nathan David Cole
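The channel-comparison idea above can be sketched in a few lines of Python (an illustrative simplification, not the patented method; per-frame energies and the frame data are assumptions): for each time frame, the channel with the strongest signal is a crude proxy for which microphone's speaker is talking.

```python
def dominant_channel_per_frame(frames):
    """frames: one tuple per time frame, one energy value per microphone
    channel. Returns the index of the loudest channel in each frame, a
    simple proxy for 'who is speaking' when speakers sit near distinct mics."""
    return [max(range(len(frame)), key=lambda ch: frame[ch]) for frame in frames]

# Three frames from a two-microphone room: a speaker near mic 0 talks,
# then a speaker near mic 1 takes over.
print(dominant_channel_per_frame([(0.9, 0.2), (0.8, 0.3), (0.1, 0.7)]))  # [0, 0, 1]
```

Runs of the same dominant channel would then be grouped into the per-speaker "statements" the abstract describes and rendered on a timeline.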
-
Patent number: 8719038
Abstract: Computerized apparatus for obtaining and displaying information, such as for example directions to a desired entity or organization. In one embodiment, the computerized apparatus is configured to receive user speech input and enable performance of various tasks, such as obtaining desired information relating to indoor entities, maps or directions, or any number of other topics. The obtained data may also, in various variants, be displayed in various formats and relative to other entities nearby.
Type: Grant
Filed: January 28, 2013
Date of Patent: May 6, 2014
Assignee: West View Research, LLC
Inventor: Robert F. Gazdzinski
-
Patent number: 8706494
Abstract: Methods and systems for providing a network-accessible text-to-speech synthesis service are provided. The service accepts content as input. After extracting textual content from the input content, the service transforms the content into a format suitable for high-quality speech synthesis. Additionally, the service produces audible advertisements, which are combined with the synthesized speech. The audible advertisements themselves can be generated from textual advertisement content.
Type: Grant
Filed: August 29, 2011
Date of Patent: April 22, 2014
Assignee: Aeromee Development L.L.C.
Inventor: James H. Stephens, Jr.
-
Patent number: 8706485
Abstract: The present invention pertains to a method and a communication device (100) for associating a contact record pertaining to a remote speaker (220) with a mnemonic image (191) based on attributes of the speaker (220). The method comprises receiving voice data of the speaker (220) in a communication session with a source device (200). A source determination representing the speaker (220) is registered, and the received voice data is then analyzed so that voice data characteristics can be extracted. Based on these voice data characteristics a mnemonic image (191) can be selected and associated with a contact record in which the source determination is stored. The mnemonic image (191) may be selected among images previously stored in the device, or derived through editing of such images.
Type: Grant
Filed: May 17, 2011
Date of Patent: April 22, 2014
Assignees: Sony Corporation, Sony Mobile Communications AB
Inventor: Joakim Martensson
-
Patent number: 8706495
Abstract: A speech recognition device (1) processes speech data (SD) of a dictation and thus establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during acoustic playback of the dictation a correction device (10) synchronously marks the word of the recognized text information (ETI) that the link information (LI) relates to the speech data (SD) currently being played back, the marked word indicating the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it.
Type: Grant
Filed: January 17, 2013
Date of Patent: April 22, 2014
Assignee: Nuance Communications, Inc.
Inventor: Wolfgang Gschwendtner
-
Patent number: 8707381
Abstract: A synchronization process between captioning data and/or corresponding metatags and the associated media file parses the media file, correlates the caption information and/or metatags with segments of the media file, and provides a capability for textual search and selection of particular segments. A time-synchronized version of the captions is created that is synchronized to the moment that the speech is uttered in the recorded media. The caption data is leveraged to enable search engines to index not merely the title of a video, but the entirety of what was said during the video as well as any associated metatags relating to contents of the video. Further, because the entire media file is indexed, a search can request a particular scene or occurrence within the event recorded by the media file, and the exact moment within the media relevant to the search can be accessed and played for the requester.
Type: Grant
Filed: September 21, 2010
Date of Patent: April 22, 2014
Assignee: Caption Colorado L.L.C.
Inventors: Richard T. Polumbus, Michael W. Homyack
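The indexing step described above can be sketched as a simple inverted index from caption words to timestamps (an illustrative sketch, not the patented process; the caption data and function names are assumptions): searching a word returns the exact moments to seek to in the media.

```python
def build_index(timed_captions):
    """timed_captions: list of (timestamp_seconds, caption_text) pairs.
    Returns an inverted index mapping each spoken word to the list of
    timestamps at which it was uttered."""
    index = {}
    for ts, text in timed_captions:
        for word in text.lower().split():
            index.setdefault(word, []).append(ts)
    return index

captions = [(12.0, "Welcome to the show"), (95.5, "Now the weather report")]
index = build_index(captions)
print(index["weather"])  # [95.5] -- the exact moment to seek to and play
```

A real system would also normalize punctuation and store segment boundaries, but the principle (index everything said, not just the title) is the same.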
-
Patent number: 8700403
Abstract: A method of statistical modeling is provided which includes constructing a statistical model and incorporating Gaussian priors during feature selection and during parameter optimization for the construction of the statistical model.
Type: Grant
Filed: November 3, 2005
Date of Patent: April 15, 2014
Assignee: Robert Bosch GmbH
Inventors: Fuliang Weng, Lin Zhao
-
Patent number: 8682672
Abstract: A system and method is described that permits synchronization of a transcript with an audio/video stream of a webcast. The system also permits a user to perform a search of the transcript and then to jump in the webcast audio/video stream to the point identified during the search.
Type: Grant
Filed: September 17, 2004
Date of Patent: March 25, 2014
Assignee: ON24, Inc.
Inventors: Tommy Ha, Kamalaksha Ghosh
-
Patent number: 8676590
Abstract: A computer-implemented technique for transcribing audio data includes generating, along a vertical axis on a display of a client device, an image representing audio content. The technique further includes receiving, from a user of the client device, a selection of a portion of the image; and generating, via an audio module of the client device, an audio output corresponding to the selected portion of the image. The technique further includes receiving, from the user, a selection indicating a position along the vertical axis on the display to enter a text portion representing the audio output, wherein the position is aligned to the selected portion of the image. The technique further includes receiving, from the user, the text portion representing the audio output; and displaying, on the display, the text portion at the position, wherein the text portion extends along a horizontal axis on the display.
Type: Grant
Filed: September 26, 2012
Date of Patent: March 18, 2014
Assignee: Google Inc.
Inventors: Jeffrey Scott Sorensen, Masayuki Nanzawa, Ravindran Rajakumar
-
Patent number: 8674996
Abstract: A system for controlling a rendering engine by using specialized commands. The commands are used to generate a production, such as a television show, at an end-user's computer that executes the rendering engine. In one embodiment, the commands are sent over a network, such as the Internet, to achieve broadcasts of video programs at very high compression and efficiency. Commands for setting and moving camera viewpoints, animating characters, and defining or controlling scenes and sounds are described. At a fine level of control, math models and coordinate systems can be used to make specifications. At a coarse level of control, the command language approaches the text format traditionally used in television or movie scripts. Simple names for objects within a scene are used to identify items, directions and paths. Commands are further simplified by having the rendering engine use defaults when specifications are left out.
Type: Grant
Filed: October 27, 2008
Date of Patent: March 18, 2014
Assignee: Quonsil PL. 3, LLC
Inventor: Charles J. Kulas
-
Patent number: 8666749
Abstract: The disclosure includes a system and method for generating audio snippets from a subset of audio tracks. In some embodiments an audio snippet is an audio summary of a group or collection of songs.
Type: Grant
Filed: January 17, 2013
Date of Patent: March 4, 2014
Assignee: Google Inc.
Inventors: Amarnag Subramanya, Jennifer Gillenwater, Garth Griffin, Fernando Pereira, Douglas Eck
-
Patent number: 8655662
Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves assigning the notification an importance level and repeating notification attempts if it is of high importance.
Type: Grant
Filed: November 29, 2012
Date of Patent: February 18, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Horst Schroeter
-
Patent number: 8655667
Abstract: A software and/or hardware facility for inferring user context and delivering advertisements, such as coupons, using natural language and/or sentiment analysis is disclosed. The facility may infer context information based on a user's emotional state, attitude, needs, or intent from the user's interaction with or through a mobile device. The facility may then determine whether it is appropriate to deliver an advertisement to the user and select an advertisement for delivery. The facility may also determine an appropriate expiration time and/or discount amount for the advertisement.
Type: Grant
Filed: November 19, 2012
Date of Patent: February 18, 2014
Assignee: Microsoft Corporation
Inventors: Raman Chandrasekar, Eric I-Chao Chang, Michael Tsang, Tian Bai
-
Patent number: 8639512
Abstract: A computer-implemented system and method for evaluating the performance of a user using a dictation system is provided. The system and method include receiving a text or transcription file generated from user audio. A performance metric, such as, for example, words/minute or errors, is generated based on the transcription file. The performance metric is provided to an administrator so the administrator can evaluate the performance of the user using the dictation system.
Type: Grant
Filed: April 21, 2009
Date of Patent: January 28, 2014
Assignee: nVoq Incorporated
Inventors: Brian Marquette, Charles Corfield, Todd Espy
-
Patent number: 8634708
Abstract: The invention relates to a method for creating a new roundup of an audiovisual document previously recorded in a device. The document contains two parts, one being the roundup and the other composed of a plurality of reports. The roundup is itself divided into a plurality of parts. The device first searches for the associations between the roundup parts and the reports, and detects the reports that are not associated with roundup parts. Then, summaries are created for the reports not associated with the roundup, and incorporated into the initial roundup to create a new roundup. In this manner, the user can easily select any report from the roundup part associated with this report. The invention also relates to the receiver suitable for implementing the method.
Type: Grant
Filed: December 20, 2007
Date of Patent: January 21, 2014
Assignee: Thomson Licensing
Inventors: Louis Chevallier, Claire-Helene Demarty, Lionel Oisel
-
Patent number: 8635075
Abstract: A system is configured to enable a user to assert voice-activated commands. When the user issues a non-ambiguous command, the system activates a corresponding control. The area of activity on the user interface is visually highlighted to emphasize to the user that what they spoke caused an action. In one specific embodiment, the highlighting involves floating text the user uttered to a visible user interface component.
Type: Grant
Filed: October 12, 2009
Date of Patent: January 21, 2014
Assignee: Microsoft Corporation
Inventor: Felix Andrew
-
Patent number: 8630860
Abstract: Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results.
Type: Grant
Filed: March 3, 2011
Date of Patent: January 14, 2014
Assignee: Nuance Communications, Inc.
Inventors: Shilei Zhang, Shenghua Bao, Wen Liu, Yong Qin, Zhiwei Shuang, Jian Chen, Zhong Su, Qin Shi, William F. Ganong, III
-
Publication number: 20140012586
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining hotword suitability. In one aspect, a method includes receiving speech data that encodes a candidate hotword spoken by a user, evaluating the speech data or a transcription of the candidate hotword using one or more predetermined criteria, generating a hotword suitability score for the candidate hotword based on that evaluation, and providing a representation of the hotword suitability score for display to the user.
Type: Application
Filed: August 6, 2012
Publication date: January 9, 2014
Applicant: GOOGLE INC.
Inventors: Andrew E. Rubin, Johan Schalkwyk, Maria Carolina Parada San Martin
-
Patent number: 8626493
Abstract: Sounds are inserted into audio content according to a pattern. A library stores humanly perceptible voice sounds. Pattern control information is received that is associated with a device recording the audio content. A pattern is retrieved and washing machine sounds are inserted into the audio content according to the pattern. The humanly perceptible voice sounds are inserted into the audio content according to the pattern to generate a signed audio recording.
Type: Grant
Filed: April 26, 2013
Date of Patent: January 7, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Steven N. Tischer
-
Patent number: 8612228
Abstract: A section corresponding to a given duration is sampled from sound data that indicates the voice of a player collected by a microphone, and a vocal tract cross-sectional area function of the sampled section is calculated. The vertical dimension of the mouth is calculated from a throat-side average cross-sectional area of the vocal tract cross-sectional area function, and the area of the mouth is calculated from a mouth-side average cross-sectional area. The transverse dimension of the mouth is calculated from the area of the mouth and the vertical dimension of the mouth.
Type: Grant
Filed: March 26, 2010
Date of Patent: December 17, 2013
Assignee: Namco Bandai Games Inc.
Inventor: Hiroyuki Hiraishi
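The arithmetic chain above can be sketched as follows (an illustrative sketch only; the mapping from throat-side area to vertical dimension is an assumption, not the patent's actual formula): the final step, transverse dimension from mouth area and vertical dimension, follows directly from area = vertical x transverse.

```python
def mouth_shape(throat_side_avg_area, mouth_side_avg_area):
    """Derives (vertical, transverse) mouth dimensions from two averaged
    regions of a vocal tract cross-sectional area function.
    The square-root mapping for the vertical dimension is a placeholder
    assumption; the area/vertical division mirrors the abstract's last step."""
    vertical = throat_side_avg_area ** 0.5          # assumed mapping
    mouth_area = mouth_side_avg_area                # per the abstract
    transverse = mouth_area / vertical              # area = vertical * transverse
    return vertical, transverse

print(mouth_shape(4.0, 6.0))  # (2.0, 3.0)
```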
-
Patent number: 8600763
Abstract: Whenever an event occurs on a computing system which will accept a response from a user of the system, the system automatically determines whether or not to enable speech interaction with the system for the event response. Whenever speech interaction is enabled with the system for the event response, the system provides a notification to the user which informs the user of the event and their options for responding thereto, where these options include responding verbally. Whenever the user responds within a prescribed period of time via a voice command (VC), the system attempts to recognize the VC. Whenever the VC is successfully recognized, the system responds appropriately to the VC.
Type: Grant
Filed: June 4, 2010
Date of Patent: December 3, 2013
Assignee: Microsoft Corporation
Inventors: Alice Jane Bernheim Brush, Paul Johns, Jen Anderson, Connie Missimer, Seung Yang, Jean Ku
-
Patent number: 8588378
Abstract: A computer-implemented voice mail method includes obtaining an electronic audio file of a recorded user message directed to a telephone user, automatically generating a transcript of the recorded user message, and identifying locations in the transcript in coordination with timestamps in the recorded user message so that successive portions of the transcript can be highlighted in coordination with playing of the recorded user message. The method also includes identifying one or more characteristics of the message using metadata relating to the recorded user message, and storing the recorded user message and information about the identified locations of the recorded user message.
Type: Grant
Filed: July 15, 2010
Date of Patent: November 19, 2013
Assignee: Google Inc.
Inventors: Benedict Davies, Christian Brunschen
-
Patent number: 8583434
Abstract: Computer-implemented methods and apparatus are provided to facilitate the recognition of the content of a body of speech data. In one embodiment, a method for analyzing verbal communication is provided, comprising acts of producing an electronic recording of a plurality of spoken words; processing the electronic recording to identify a plurality of word alternatives for each of the spoken words, each of the plurality of word alternatives being identified by comparing a portion of the electronic recording with a lexicon, and each of the plurality of word alternatives being assigned a probability of correctly identifying a spoken word; loading the word alternatives and the probabilities to a database for subsequent analysis; and examining the word alternatives and the probabilities to determine at least one characteristic of the plurality of spoken words.
Type: Grant
Filed: January 29, 2008
Date of Patent: November 12, 2013
Assignee: CallMiner, Inc.
Inventor: Jeffrey A. Gallino
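The word-alternatives structure described above can be sketched as a list of candidate sets with probabilities (an illustration, not CallMiner's implementation; the sample words and probabilities are assumptions): each spoken-word position carries several hypotheses, and later analysis can pick the most probable path or query the alternatives.

```python
# One candidate list per spoken-word position, each entry (word, probability).
alternatives = [
    [("please", 0.7), ("police", 0.3)],
    [("hold", 0.9), ("old", 0.1)],
]

def best_transcript(alternatives):
    """Greedy reading: take the highest-probability candidate at each position."""
    return " ".join(max(alts, key=lambda wp: wp[1])[0] for alts in alternatives)

print(best_transcript(alternatives))  # please hold
```

Keeping the lower-probability alternatives in the database is what lets a later search for "police" still find this recording.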
-
Patent number: 8560317
Abstract: A vocabulary dictionary storing unit for storing a plurality of words in advance, a vocabulary dictionary managing unit for extracting recognition target words, a matching unit for calculating a degree of matching with the recognition target words based on an accepted voice, a result output unit for outputting, as a recognition result, a word having a best score from a result of calculating the degree of matching, and an extraction criterion information managing unit for changing extraction criterion information according to a result of monitoring by a monitor control unit are provided. The vocabulary dictionary storing unit further includes a scale information storing unit for storing scale information serving as a scale at the time of extracting the recognition target words, and an extraction criterion information storing unit for storing extraction criterion information indicating a criterion of the recognition target words at the time of extracting the recognition target words.
Type: Grant
Filed: September 18, 2006
Date of Patent: October 15, 2013
Assignee: Fujitsu Limited
Inventor: Kenji Abe
-
Patent number: 8554566
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: November 29, 2012
Date of Patent: October 8, 2013
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
-
Patent number: 8527279
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user.
Type: Grant
Filed: August 23, 2012
Date of Patent: September 3, 2013
Assignee: Google Inc.
Inventors: David P. Singleton, Debajit Ghosh
-
Publication number: 20130226593
Abstract: An apparatus comprising: an audio source determiner configured to determine at least one audio source; a visualizer configured to generate a visual representation associated with the at least one audio source; and a controller configured to process an audio signal associated with the at least one audio source dependent on interaction with the visual representation.
Type: Application
Filed: November 12, 2010
Publication date: August 29, 2013
Applicant: Nokia Corporation
Inventors: Birgir Magnusson, Koray Ozcan
-
Patent number: 8515753
Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. In order to adapt acoustic models, first, pronunciation variations are examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations of a non-native speaker's speech, acoustic models are adapted in a state-tying step during a training process of acoustic models. When the present invention for adapting acoustic models and a conventional acoustic model adaptation scheme are combined, further enhanced recognition performance can be obtained. The example embodiment of the present invention enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
Type: Grant
Filed: March 30, 2007
Date of Patent: August 20, 2013
Assignee: Gwangju Institute of Science and Technology
Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
-
Patent number: 8494852
Abstract: The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
Type: Grant
Filed: October 27, 2010
Date of Patent: July 23, 2013
Assignee: Google Inc.
Inventors: Michael J. LeBeau, William J. Byrne, John Nicholas Jitkoff, Brandon M. Ballinger, Trausti Kristjansson
-
Patent number: 8494668
Abstract: Character value of a sound signal is extracted for each unit portion, and degrees of similarity between the character values of the individual unit portions are calculated and arranged in a matrix configuration. The matrix has arranged in each column the degrees of similarity acquired by comparing, for each of the unit portions, the sound signal and a delayed sound signal obtained by delaying the sound signal by a time difference equal to an integral multiple of a time length of the unit portion, and it has a plurality of the columns in association with different time differences. Repetition probability is calculated for each of the columns corresponding to the different time differences in the matrix. A plurality of peaks in a distribution of the repetition probabilities are identified. The loop region in the sound signal is identified by collating a reference matrix with the degree of similarity matrix.
Type: Grant
Filed: February 19, 2009
Date of Patent: July 23, 2013
Assignee: Yamaha Corporation
Inventors: Bee Suan Ong, Sebastian Streich, Takuya Fujishima, Keita Arimoto
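The lag-matrix structure described above can be sketched in Python (an illustrative simplification, not Yamaha's patented method; the similarity function and feature values are assumptions): column j holds, for each unit portion, the similarity between the signal and a copy of itself delayed by (j+1) unit portions, so a loop of period p shows up as a column of high similarities at lag p.

```python
def lag_similarity_matrix(features, max_lag):
    """features: one character value per unit portion of the signal.
    Entry [i][j] is the similarity between unit i and the unit (j+1)
    portions earlier (0.0 where the delayed copy runs off the start)."""
    def sim(a, b):
        return 1.0 / (1.0 + abs(a - b))  # assumed similarity measure
    n = len(features)
    return [[sim(features[i], features[i - (j + 1)]) if i - (j + 1) >= 0 else 0.0
             for j in range(max_lag)] for i in range(n)]

# A signal repeating with period 2 scores perfect similarity at lag 2
# (column index 1), which is where the repetition-probability peak appears.
m = lag_similarity_matrix([1, 5, 1, 5, 1, 5], max_lag=3)
print(m[4][1])  # 1.0
```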
-
Arrangement for creating and using a phonetic-alphabet representation of a name of a party to a call
Patent number: 8484034
Abstract: A first party creates and edits a phonetic-alphabet representation of its name. The phonetic representation is conveyed to a second party as "caller-identification" information by messages that set up a call between the parties. The phonetic representation of the name is displayed to the second party, converted to speech, and/or converted to an alphabet of a language of the second party and then displayed to the second party.
Type: Grant
Filed: March 31, 2008
Date of Patent: July 9, 2013
Assignee: Avaya Inc.
Inventors: Paul Roller Michaelis, David Mohler, Charles Wrobel
-
Patent number: 8478590
Abstract: The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
Type: Grant
Filed: September 30, 2011
Date of Patent: July 2, 2013
Assignee: Google Inc.
Inventors: Michael J. LeBeau, William J. Byrne, John Nicholas Jitkoff, Brandon M. Ballinger, Trausti Kristjansson
-
Patent number: 8452604
Abstract: Recognizable visual and/or audio artifacts, such as recognizable sounds, are introduced into visual and/or audio content in an identifying pattern to generate a signed visual and/or audio recording for distribution over a digital communications medium. A library of images and/or sounds may be provided, and the images and/or sounds from the library may be selectively inserted to generate the identifying pattern. The images and/or sounds may be inserted responsive to one or more parameters associated with creation of the visual and/or audio content. A representation of the identifying pattern may be generated and stored in a repository, e.g., an independent repository configured to maintain creative rights information. The stored pattern may be retrieved from the repository and compared to an unidentified visual and/or audio recording to determine an identity thereof.
Type: Grant
Filed: August 15, 2005
Date of Patent: May 28, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Steven Tischer
-
Publication number: 20130124213
Abstract: Provided in some embodiments is a computer implemented method that includes providing script data including script words indicative of dialogue words to be spoken, providing audio data corresponding to at least a portion of the dialogue words to be spoken, wherein the audio data includes timecodes associated with dialogue words, generating a sequential alignment of the script words to the dialogue words, matching at least some of the script words to corresponding dialogue words to determine alignment points, determining corresponding timecodes for unmatched script words using interpolation based on the timecodes associated with matching script words, and generating time-aligned script data including the script words and their corresponding timecodes.
Type: Application
Filed: May 28, 2010
Publication date: May 16, 2013
Inventors: Jerry R. Scoggins, II, Walter W. Chang
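The interpolation step described above can be sketched in Python (an illustrative sketch, not Adobe's patented method; the function name and sample timecodes are assumptions): script words matched to timecoded dialogue words act as anchors, and words between two anchors receive linearly interpolated timecodes.

```python
def interpolate_timecodes(script_len, anchors):
    """anchors: dict mapping script-word index -> known timecode (seconds)
    for words matched to timecoded dialogue words. Unmatched indices lying
    between two anchors get linearly interpolated timecodes; indices outside
    the anchored span stay None."""
    times = dict(anchors)
    keys = sorted(anchors)
    for lo, hi in zip(keys, keys[1:]):
        span = hi - lo
        for i in range(lo + 1, hi):
            frac = (i - lo) / span
            times[i] = anchors[lo] + frac * (anchors[hi] - anchors[lo])
    return [times.get(i) for i in range(script_len)]

# Script words 0 and 4 were matched at 10.0 s and 18.0 s; words 1-3
# are spaced evenly between those anchors.
print(interpolate_timecodes(5, {0: 10.0, 4: 18.0}))  # [10.0, 12.0, 14.0, 16.0, 18.0]
```

Linear interpolation assumes a roughly constant speaking rate between anchors, which is why denser matching yields better time-aligned scripts.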
-
Publication number: 20130124202
Abstract: Provided in some embodiments is a method including receiving ordered script words that are indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard alignment points that include matching consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard-alignment points and including corresponding sub-sets of the script and dialogue words between the hard-alignment points, and aligning each of the sub-matrices.
Type: Application
Filed: May 28, 2010
Publication date: May 16, 2013
Inventor: Walter W. Chang
-
Publication number: 20130124212
Abstract: A method includes receiving script data including script words for dialogue, receiving audio data corresponding to at least a portion of the dialogue, wherein the audio data includes timecodes associated with dialogue words, generating a sequential alignment of the script words to the dialogue words, matching at least some of the script words to corresponding dialogue words to determine hard alignment points, partitioning the sequential alignment of script words into alignment sub-sets, wherein the bounds of the alignment sub-sets are defined by adjacent hard-alignment points, and wherein each alignment sub-set includes a sub-set of the script words and a corresponding sub-set of dialogue words that occur between the hard-alignment points, determining corresponding timecodes for the sub-set of script words in an alignment sub-set based on the timecodes associated with the sub-set of dialogue words, and generating time-aligned script data including the sub-set of script words and their corresponding timecodes.
Type: Application
Filed: May 28, 2010
Publication date: May 16, 2013
Inventors: Jerry R. Scoggins, II, Walter W. Chang, David A. Kuspa, Charles E. Van Winkle, Simon R. Hayhurst
-
Patent number: 8412531
Abstract: The present invention provides a user interface for press-to-talk interaction via a touch-anywhere-to-speak module on a mobile computing device. Upon receiving an indication of a touch anywhere on the screen of a touch screen interface, the touch-anywhere-to-speak module activates the listening mechanism of a speech recognition module to accept audible user input and displays dynamic visual feedback of a measured sound level of the received audible input. The touch-anywhere-to-speak module may also provide a user with a convenient and more accurate speech recognition experience by utilizing and applying data relative to the context of the touch (e.g., its relative location on the visual interface) in correlation with the spoken audible input.
Type: Grant
Filed: June 10, 2009
Date of Patent: April 2, 2013
Assignee: Microsoft Corporation
Inventors: Anne K. Sullivan, Lisa Stifelman, Kathleen J. Lee, Su Chuin Leong
-
Patent number: 8392199
Abstract: A clipping detection device calculates an amplitude distribution of an input signal for each predetermined period, calculates a deflection degree of the distribution on the basis of the calculated amplitude distribution, and then detects clipping of a communication signal on the basis of the calculated deflection degree of the distribution.
Type: Grant
Filed: May 21, 2009
Date of Patent: March 5, 2013
Assignee: Fujitsu Limited
Inventors: Takeshi Otani, Masakiyo Tanaka, Yasuji Ota, Shusaku Ito
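The general approach, an amplitude histogram per frame plus a skew statistic, can be sketched in a few lines. The specific "deflection degree" statistic is not defined in the abstract, so the top-bin mass used below is only an illustrative stand-in: a clipped signal piles probability mass at full scale.

```python
import math

def clipping_score(frame, bins=64):
    """Histogram-based clipping indicator for one signal frame.

    Builds the amplitude distribution of |x| over [0, 1] and returns
    the fraction of samples landing in the top amplitude bin. This is
    an assumed proxy for the patent's 'deflection degree', which the
    abstract does not spell out.
    """
    hist = [0] * bins
    for x in frame:
        b = min(int(abs(x) * bins), bins - 1)
        hist[b] += 1
    return hist[-1] / len(frame)

n = 8000
# a clean 440 Hz tone at 60% of full scale, and the same tone driven
# to 2x full scale and hard-limited (i.e., clipped)
clean = [0.6 * math.sin(2 * math.pi * 440 * i / n) for i in range(n)]
clipped = [max(-1.0, min(1.0, 2.0 * math.sin(2 * math.pi * 440 * i / n)))
           for i in range(n)]
```

A threshold on the score (frames above it flagged as clipped) would complete the detector; a real system would also need frame segmentation and smoothing across frames.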
-
Patent number: 8390669
Abstract: The present disclosure discloses a method for identifying individuals in a multimedia stream originating from a video conferencing terminal or a Multipoint Control Unit, including executing a face detection process on the multimedia stream; defining subsets including facial images of one or more individuals, where the subsets are ranked according to a probability that their respective one or more individuals will appear in a video stream; comparing a detected face to the subsets in consecutive order starting with a most probable subset, until a match is found; and storing an identity of the detected face as searchable metadata in a content database in response to the detected face matching a facial image in one of the subsets.
Type: Grant
Filed: December 15, 2009
Date of Patent: March 5, 2013
Assignee: Cisco Technology, Inc.
Inventors: Jason Catchpole, Craig Cockerton
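The ranked-subset search loop reads naturally as code. Everything below is an illustrative sketch: the matcher, threshold, and data shapes are assumptions, and a real system would compare face embeddings rather than toy feature tuples.

```python
def identify_face(detected, ranked_subsets, matcher, threshold=0.8):
    """Compare a detected face against subsets of known facial images,
    ordered from most to least probable, stopping at the first match.

    matcher(a, b) is assumed to return a similarity score in [0, 1];
    the matched identity would then be stored as searchable metadata.
    """
    for subset in ranked_subsets:  # already ranked by appearance probability
        for identity, image in subset:
            if matcher(detected, image) >= threshold:
                return identity
    return None  # unknown face

# toy matcher: exact equality of tiny feature tuples
similarity = lambda a, b: 1.0 if a == b else 0.0
subsets = [
    [("alice", (1, 2)), ("bob", (3, 4))],  # frequent participants, tried first
    [("carol", (5, 6))],                   # rare participants, tried last
]
who = identify_face((5, 6), subsets, similarity)
```

The payoff of the ranking is that most lookups terminate inside the first, most probable subset instead of scanning the whole gallery.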
-
Publication number: 20130054249
Abstract: Methods and arrangements for visually representing audio content in a voice application. A display is connected to a voice application, and an image is displayed on the display, the image comprising a main portion and at least one subsidiary portion, the main portion representing a contextual entity of the audio content and the at least one subsidiary portion representing at least one participatory entity of the audio content. The at least one subsidiary portion is displayed without text, and the image is changed responsive to changes in audio content in the voice application.
Type: Application
Filed: August 24, 2011
Publication date: February 28, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Amit Anil Nanavati, Nitendra Rajput
-
Publication number: 20130054250
Abstract: Methods and arrangements for visually representing audio content in a voice application. A display is connected to a voice application, and an image is displayed on the display, the image comprising a main portion and at least one subsidiary portion, the main portion representing a contextual entity of the audio content and the at least one subsidiary portion representing at least one participatory entity of the audio content. The at least one subsidiary portion is displayed without text, and the image is changed responsive to changes in audio content in the voice application.
Type: Application
Filed: August 29, 2012
Publication date: February 28, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Amit Anil Nanavati, Nitendra Rajput
-
Patent number: 8380509
Abstract: A speech recognition device (1) processes speech data (SD) of a dictation and establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during acoustic playback of the dictation a correction device (10) synchronously marks the word of the recognized text information (ETI) that is related by link information (LI) to the speech data (SD) just played back; the currently marked word indicates the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it. Cursor synchronization means (15) makes it possible to synchronize the text cursor (TC) with the audio cursor (AC), or the audio cursor (AC) with the text cursor (TC), so that positioning the respective cursor (AC, TC) is simplified considerably.
Type: Grant
Filed: February 13, 2012
Date of Patent: February 19, 2013
Assignee: Nuance Communications Austria GmbH
Inventor: Wolfgang Gschwendtner
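One plausible reading of the link information is a table of word start times, which makes both cursor synchronizations simple lookups. The class name, data layout, and use of a binary search are assumptions for illustration; the patent describes the mechanism only abstractly.

```python
import bisect

class CursorSync:
    """Link information as a sorted list: word i of the recognized
    text starts at start_times[i] seconds in the dictation audio."""

    def __init__(self, start_times):
        self.start_times = start_times

    def audio_to_text(self, audio_pos):
        """Index of the word being played back at audio_pos
        (i.e., move the text cursor to the audio cursor)."""
        return bisect.bisect_right(self.start_times, audio_pos) - 1

    def text_to_audio(self, word_index):
        """Playback position for the word under the text cursor
        (i.e., move the audio cursor to the text cursor)."""
        return self.start_times[word_index]

# four recognized words starting at these offsets in the dictation
sync = CursorSync([0.0, 0.4, 0.9, 1.6])
```

At playback position 1.0 s the third word (index 2) is current, and jumping the audio cursor to the word at index 3 seeks to 1.6 s.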
-
Patent number: 8374879
Abstract: Systems and methods are described for speech systems that utilize an interaction manager to manage interactions—also known as dialogues—from one or more applications. The interactions are managed properly even if multiple applications use different grammars. The interaction manager maintains an interaction list. An application wishing to utilize the speech system submits one or more interactions to the interaction manager. Interactions are normally processed in the order in which they are received. An exception to this rule is an interaction that is configured by an application to be processed immediately, which causes the interaction manager to place the interaction at the front of the interaction list. If an application has designated an interaction to interrupt a currently processing interaction, then the newly submitted interaction will interrupt any interaction currently being processed and, therefore, will be processed immediately.
Type: Grant
Filed: December 16, 2005
Date of Patent: February 12, 2013
Assignee: Microsoft Corporation
Inventors: Stephen Russell Falcon, Clement Yip, Dan Banay, David Miller
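The queue discipline described here (FIFO by default, front-of-list for immediate interactions) can be sketched with a deque. The class and method names are illustrative assumptions, and the interrupt-the-current-interaction behavior is omitted for brevity.

```python
from collections import deque

class InteractionManager:
    """Maintains the interaction list: interactions are processed in
    arrival order unless an application flags one as immediate, which
    places it at the front of the list."""

    def __init__(self):
        self.interactions = deque()

    def submit(self, interaction, immediate=False):
        if immediate:
            self.interactions.appendleft(interaction)
        else:
            self.interactions.append(interaction)

    def next_interaction(self):
        return self.interactions.popleft() if self.interactions else None

mgr = InteractionManager()
mgr.submit("confirm-address")
mgr.submit("read-email")
mgr.submit("emergency-alert", immediate=True)  # jumps the queue
order = [mgr.next_interaction() for _ in range(3)]
```

The immediate flag reorders only the pending list; actually interrupting an in-progress dialogue would additionally require preempting whatever the speech system is currently rendering.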
-
Patent number: 8374864
Abstract: In one embodiment, a method includes receiving at a communication device an audio communication and a transcribed text created from the audio communication, and generating a mapping of the transcribed text to the audio communication independent of transcribing the audio. The mapping identifies locations of portions of the text in the audio communication. An apparatus for mapping the text to the audio is also disclosed.
Type: Grant
Filed: March 17, 2010
Date of Patent: February 12, 2013
Assignee: Cisco Technology, Inc.
Inventor: Jim Kerr
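One simple way to map text to audio without re-transcribing is to spread the words over the audio duration in proportion to their length. This is a crude illustrative stand-in; the abstract does not disclose how the patented mapping is actually computed.

```python
def estimate_word_positions(text, audio_duration):
    """Map each word of a transcript to an estimated start position in
    the audio, proportional to cumulative character length. Purely an
    assumed heuristic: no speech recognition is involved, matching the
    'independent of transcribing the audio' constraint.
    """
    words = text.split()
    total = sum(len(w) for w in words)
    positions, elapsed = [], 0
    for w in words:
        positions.append((w, audio_duration * elapsed / total))
        elapsed += len(w)
    return positions

# a 6-second clip whose transcript is three words
pos = estimate_word_positions("hello out there", 6.0)
```

Such a proportional estimate is only approximate (it ignores pauses and speaking rate), but it is cheap and needs nothing beyond the text and the audio length.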
-
Patent number: 8374873
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: August 11, 2009
Date of Patent: February 12, 2013
Assignee: Morphism, LLC
Inventor: James H. Stephens, Jr.
-
Patent number: 8370148
Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of a communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller ID, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. In another embodiment, a notification is assigned an importance level and notification attempts are repeated if the notification is of high importance.
Type: Grant
Filed: April 14, 2008
Date of Patent: February 5, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Horst Schroeter