Patents by Inventor Mahesh Viswanathan

Mahesh Viswanathan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20060047386
    Abstract: An improved apparatus and method is provided for operating devices and systems in a motor vehicle, while at the same time reducing vehicle operator distractions. One or more touch sensitive pads are mounted on the steering wheel of the motor vehicle, and the vehicle operator touches the pads in a pre-specified synchronized pattern, to perform functions such as controlling operation of the radio or adjusting a window. At least some of the touch patterns used to generate different commands may be selected by the vehicle operator. Usefully, the system of touch pad sensors and the signals generated thereby are integrated with speech recognition and/or facial gesture recognition systems, so that commands may be generated by synchronized multi-mode inputs.
    Type: Application
    Filed: August 31, 2004
    Publication date: March 2, 2006
    Applicant: International Business Machines Corporation
    Inventors: Dimitri Kanevsky, Roberto Sicconi, Mahesh Viswanathan
  • Publication number: 20050197843
    Abstract: In a voice processing system, a multimodal request is received from a plurality of modality input devices, and the requested application is run to provide a user with the feedback of the multimodal request. In the voice processing system, a multimodal aggregating unit is provided which receives a multimodal input from a plurality of modality input devices, and provides an aggregated result to an application control based on the interpretation of the interaction ergonomics of the multimodal input within the temporal constraints of the multimodal input. Thus, the multimodal input from the user is recognized within a temporal window. Interpretation of the interaction ergonomics of the multimodal input includes interpretation of interaction biometrics and interaction mechani-metrics, wherein the interaction input of at least one modality may be used to bring meaning to at least one other input of another modality.
    Type: Application
    Filed: March 7, 2004
    Publication date: September 8, 2005
    Applicant: International Business Machines Corporation
    Inventors: Alexander Faisman, Dimitri Kanevsky, David Nahamoo, Roberto Sicconi, Mahesh Viswanathan
  • Publication number: 20050086060
    Abstract: A method, a system, and an apparatus for identifying and correcting sources of problems in synthesized speech which is generated using a concatenative text-to-speech (CTTS) technique. The method can include the step of displaying a waveform corresponding to synthesized speech generated from concatenated phonetic units. The synthesized speech can be generated from text input received from a user. The method further can include the step of displaying parameters corresponding to at least one of the phonetic units. The method can include the step of displaying the original recordings containing selected phonetic units. An editing input can be received from the user and the parameters can be adjusted in accordance with the editing input.
    Type: Application
    Filed: October 17, 2003
    Publication date: April 21, 2005
    Applicant: International Business Machines Corporation
    Inventors: Philip Gleason, Maria Smith, Mahesh Viswanathan, Jie Zeng
  • Patent number: 6748356
    Abstract: A method and apparatus are disclosed for identifying speakers participating in an audio-video source, whether or not such speakers have been previously registered or enrolled. A speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary between non-homogeneous speech portions. A hierarchical speaker tree clustering system clusters homogeneous segments (generally corresponding to the same speaker), and assigns a cluster identifier to each detected segment, whether or not the actual name of the speaker is known. A hierarchical enrolled speaker database is used that includes one or more background models for unenrolled speakers to assign a speaker to each identified segment.
    Type: Grant
    Filed: June 7, 2000
    Date of Patent: June 8, 2004
    Assignee: International Business Machines Corporation
    Inventors: Homayoon Sadr Mohammad Beigi, Mahesh Viswanathan
  • Patent number: 6738745
    Abstract: Methods and apparatus are disclosed for detecting non-target language references in an audio transcription or speech recognition system using a confidence score. The confidence score may be based on (i) a probabilistic engine score provided by a speech recognition system, (ii) additional scores based on background models, or (iii) a combination of the foregoing. The engine score provided by the speech recognition system for a given input speech utterance reflects the degree of acoustic and linguistic match of the utterance with the trained target language. The background models are created or trained based on speech data in other languages, which may or may not include the target language itself. A number of types of background language models may be employed for each modeled language, including one or more of (i) prosodic models; (ii) acoustic models; (iii) phonotactic models; and (iv) keyword spotting models.
    Type: Grant
    Filed: April 7, 2000
    Date of Patent: May 18, 2004
    Assignee: International Business Machines Corporation
    Inventors: Jiri Navratil, Mahesh Viswanathan
  • Patent number: 6567775
    Abstract: A method and apparatus are disclosed for identifying a speaker in an audio-video source using both audio and video information. An audio-based speaker identification system identifies one or more potential speakers for a given segment using an enrolled speaker database. A video-based speaker identification system identifies one or more potential speakers for a given segment using a face detector/recognizer and an enrolled face database. An audio-video decision fusion process evaluates the individuals identified by the audio-based and video-based speaker identification systems and determines the speaker of an utterance in accordance with the present invention. A linear variation is imposed on the ranked-lists produced using the audio and video information. The decision fusion scheme of the present invention is based on a linear combination of the audio and the video ranked-lists. The line with the higher slope is assumed to convey more discriminative information.
    Type: Grant
    Filed: April 26, 2000
    Date of Patent: May 20, 2003
    Assignee: International Business Machines Corporation
    Inventors: Fereydoun Maali, Mahesh Viswanathan
  • Patent number: 6424946
    Abstract: A method and apparatus are disclosed for identifying speakers participating in an audio-video source, whether or not such speakers have been previously registered or enrolled. The speaker identification system uses an enrolled speaker database that includes background models for unenrolled speakers, such as “unenrolled male” or “unenrolled female,” to assign a speaker label to each identified segment. Speaker labels are identified for each speech segment by comparing the segment utterances to the enrolled speaker database and finding the “closest” speaker, if any. A speech segment having an unknown speaker is initially assigned a general speaker label from the set of background models. The “unenrolled” segment is assigned a segment number and receives a cluster identifier assigned by the clustering system.
    Type: Grant
    Filed: November 5, 1999
    Date of Patent: July 23, 2002
    Assignee: International Business Machines Corporation
    Inventors: Alain Charles Louis Tritschler, Mahesh Viswanathan
  • Patent number: 6421645
    Abstract: A method and apparatus are disclosed for automatically transcribing audio information from an audio-video source and concurrently identifying the speakers. The disclosed audio transcription and speaker classification system includes a speech recognition system, a speaker segmentation system and a speaker identification system. A common front-end processor computes feature vectors that are processed along parallel branches in a multi-threaded environment by the speech recognition system, speaker segmentation system and speaker identification system, for example, using a shared memory architecture that acts in a server-like manner to distribute the computed feature vectors to a channel associated with each parallel branch. The speech recognition system produces transcripts with time-alignments for each word in the transcript. The speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary between non-homogeneous speech portions.
    Type: Grant
    Filed: June 30, 1999
    Date of Patent: July 16, 2002
    Assignee: International Business Machines Corporation
    Inventors: Homayoon Sadr Mohammad Beigi, Alain Charles Louis Tritschler, Mahesh Viswanathan
  • Patent number: 6345253
    Abstract: An audio retrieval system and method are provided for augmenting the transcription of an audio file with one or more alternate word or phrase choices, such as next-best guesses for each word or phrase, in addition to the best word sequence identified by the transcription process. The audio retrieval system can utilize a primary index file containing the best identified words and/or phrases for each portion of the input audio stream and a supplemental index file containing alternative choices for each word or phrase in the transcript. The present invention allows words that are incorrectly transcribed during speech recognition to be identified in response to a textual query by searching the supplemental index files. During an indexing process, the list of alternative word or phrase choices provided by the speech recognition system is collected to produce a set of supplemental index files.
    Type: Grant
    Filed: June 18, 1999
    Date of Patent: February 5, 2002
    Assignee: International Business Machines Corporation
    Inventor: Mahesh Viswanathan
  • Patent number: 6345252
    Abstract: Methods and apparatus are provided for retrieving audio information based on the audio content as well as the identity of the speaker. The results of content and speaker-based audio information retrieval methods are combined to provide references to audio information (and indirectly to video). A query search system retrieves information responsive to a textual query containing a text string (one or more key words), and the identity of a given speaker. An indexing system transcribes and indexes the audio information to create time-stamped content index file(s) and speaker index file(s). An audio retrieval system uses the generated content and speaker indexes to perform query-document matching based on the audio content and the speaker identity. Documents satisfying the user-specified content and speaker constraints are identified by comparing the start and end times of the document segments in both the content and speaker domains.
    Type: Grant
    Filed: April 9, 1999
    Date of Patent: February 5, 2002
    Assignee: International Business Machines Corporation
    Inventors: Homayoon Sadr Mohammad Beigi, Alain Charles Louis Tritschler, Mahesh Viswanathan
  • Patent number: 6092089
    Abstract: A marking system is provided within the comment field of PostScript page files to identify document management data placed therein. A special processor is provided to recognize the signature of document management data so that page characteristic records can be created. The page characteristic records are stored in a database thereby enabling a management system based upon the page characteristic records. The marking system includes a prefix, for example, "%%" and a keyword, for example, "OutputTagElement:", followed by a "tag" which is a management data attribute. The tags are processed to fill data fields in the created page characteristic records.
    Type: Grant
    Filed: July 9, 1997
    Date of Patent: July 18, 2000
    Assignee: International Business Machines Corp.
    Inventors: L. Corning Lahey, Arthur R. Roberts, Robert W. Peverly, Mahesh Viswanathan
  • Patent number: 6078934
    Abstract: A management system for enabling the creation and maintenance of a database wherein specially created document records, version records and page records are stored. Each page has a page record including the document name, the initial version number in which it is utilized, the last version number in which it is utilized and the date of submission. It also has a delete field to indicate its current status in the document and a field for the date of deletion. The records can include any other desired data upon which it may be desired to search the database and they can include data enabling assembly of the document with inserts, tabs, etc. when it is printed. The page records include a pointer to the location of a corresponding page file for print-on-demand and/or to a storage location for a plate for offset printing.
    Type: Grant
    Filed: July 9, 1997
    Date of Patent: June 20, 2000
    Assignee: International Business Machines Corporation
    Inventors: L. Corning Lahey, Arthur R. Roberts, Robert W. Peverly, Mahesh Viswanathan
  • Patent number: 5983243
    Abstract: A data processing system and method of preparing a presentation-ready document are described. In response to receipt of a document description that includes at least one Page Description Language (PDL) instruction and specifies both fixed data and variable data, the one or more PDL instructions are processed to produce separate presentation-ready images of the fixed data and the variable data. In addition, a bookticket specifying an arrangement of the presentation-ready images of the fixed data and the variable data is automatically generated. In response to receipt of the bookticket, a presentation-ready document is built that includes the presentation-ready images of the fixed data and the variable data in the arrangement specified by the bookticket. In one embodiment of the present invention, the document description specifies the fixed data of the document utilizing a PDL form operator.
    Type: Grant
    Filed: October 31, 1996
    Date of Patent: November 9, 1999
    Assignee: International Business Machines Corporation
    Inventors: Ronald Heiney, Anthony Stuart, Mahesh Viswanathan
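
As an illustration of the comment-field marking scheme described in the abstract of patent 6092089 above, the following is a minimal Python sketch, not the patented implementation. The prefix "%%" and the keyword "OutputTagElement:" are the examples given in that abstract. The function name page_record, the "name=value" tag layout, the sample page text, and the use of a plain dictionary as a stand-in for a page characteristic record are all assumptions made for this example; the abstract does not specify them.

```python
PREFIX = "%%"                   # example prefix given in the abstract
KEYWORD = "OutputTagElement:"   # example keyword given in the abstract


def page_record(page_text):
    """Collect tags from marked comment lines into a dict.

    The dict is a hypothetical stand-in for a page characteristic record.
    """
    marker = PREFIX + KEYWORD
    record = {}
    for line in page_text.splitlines():
        line = line.strip()
        if not line.startswith(marker):
            continue  # ordinary PostScript line or unrelated comment
        tag = line[len(marker):].strip()
        # ASSUMPTION: each tag is written as "name=value"; the abstract
        # only says that a tag is a management data attribute.
        name, sep, value = tag.partition("=")
        if sep:
            record[name.strip()] = value.strip()
    return record


# Hypothetical page file contents, for illustration only.
sample = """%!PS-Adobe-3.0
%%OutputTagElement: document=UserGuide
%%OutputTagElement: version=3
%%Pages: 1
showpage
"""

print(page_record(sample))
```

Only lines carrying both the prefix and the keyword are treated as management data, so other comments such as "%%Pages: 1" are ignored.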