Patents by Inventor Kunal Mukerjee

Kunal Mukerjee has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Deriving document similarity indices

Patent number: 8478740

Abstract: The present invention extends to methods, systems, and computer program products for deriving document similarity indices. Embodiments of the invention include scalable and efficient mechanisms for deriving and updating a document similarity index for a plurality of documents. The number of maintained similarities can be controlled to conserve CPU and storage resources.
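
As a rough illustration of capping the number of maintained similarities, the Python sketch below keeps only the few most similar documents for each document. The cosine measure over word counts, the function names, and the `max_per_doc` parameter are assumptions made for illustration; the abstract does not specify them, and this brute-force pairwise version is not the scalable mechanism the patent describes.

```python
import heapq
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_similarity_index(docs, max_per_doc=2):
    """Map each document id to its `max_per_doc` most similar other documents.

    Only the top entries are stored, so storage grows with
    len(docs) * max_per_doc rather than with the number of document pairs.
    """
    vectors = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    index = {}
    for doc_id, vec in vectors.items():
        scored = (
            (cosine(vec, other), other_id)
            for other_id, other in vectors.items()
            if other_id != doc_id
        )
        index[doc_id] = heapq.nlargest(max_per_doc, scored)
    return index


# Hypothetical three-document corpus.
docs = {
    "a": "video motion vector coding",
    "b": "motion vector prediction for video",
    "c": "speech recognition classifier",
}
index = build_similarity_index(docs, max_per_doc=1)
```

Lowering `max_per_doc` trades recall of weaker similarities for less storage, which is the kind of control the abstract refers to.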

Type: Grant

Filed: December 16, 2010

Date of Patent: July 2, 2013

Assignee: Microsoft Corporation

Inventors: Sorin Gherman, Kunal Mukerjee, Adam Prout

Identifying key phrases within documents

Patent number: 8423546

Abstract: The present invention extends to methods, systems, and computer program products for identifying key phrases within documents. Embodiments of the invention include using a tag index to determine what a document primarily relates to. For example, an integrated data flow and extract-transform-load pipeline crawls, parses, and word breaks large corpuses of documents in database tables. Documents can be broken into tuples. The tuples can be sent to a heuristically based algorithm that uses statistical language models and weight+cross-entropy threshold functions to summarize the document into its “top N” most statistically significant phrases. Accordingly, embodiments of the invention scale efficiently (e.g., linearly) and (potentially large numbers of) documents can be characterized by salient and relevant key phrases (tags).

Type: Grant

Filed: December 3, 2010

Date of Patent: April 16, 2013

Assignee: Microsoft Corporation

Inventors: Sorin Gherman, Kunal Mukerjee

Noise robust speech classifier ensemble

Patent number: 8412525

Abstract: Embodiments for implementing a speech recognition system that includes a speech classifier ensemble are disclosed. In accordance with one embodiment, the speech recognition system includes a classifier ensemble to convert feature vectors that represent a speech vector into log probability sets. The classifier ensemble includes a plurality of classifiers. The speech recognition system includes a decoder ensemble to transform the log probability sets into output symbol sequences. The speech recognition system further includes a query component to retrieve one or more speech utterances from a speech database using the output symbol sequences.

Type: Grant

Filed: April 30, 2009

Date of Patent: April 2, 2013

Assignee: Microsoft Corporation

Inventors: Kunal Mukerjee, Kazuhito Koishida, Shankar Regunathan

CODING OF MOTION VECTOR INFORMATION

Publication number: 20120213280

Abstract: Techniques and tools for encoding and decoding motion vector information for video images are described. For example, a video encoder yields an extended motion vector code by jointly coding, for a set of pixels, a switch code, motion vector information, and a terminal symbol indicating whether subsequent data is encoded for the set of pixels. In another aspect, an encoder/decoder selects motion vector predictors for macroblocks. In another aspect, a video encoder/decoder uses hybrid motion vector prediction. In another aspect, a video encoder/decoder signals a motion vector mode for a predicted image. In another aspect, a video decoder decodes a set of pixels by receiving an extended motion vector code, which reflects joint encoding of motion information together with intra/inter-coding information and a terminal symbol. The decoder determines whether subsequent data exists for the set of pixels based on, e.g., the terminal symbol.

Type: Application

Filed: April 24, 2012

Publication date: August 23, 2012

Applicant: Microsoft Corporation

Inventors: Sridhar Srinivasan, Pohsiang Hsu, Thomas W. Holcomb, Kunal Mukerjee, Bruce Chih-Lung Lin

Word clustering for input data

Patent number: 8249871

Abstract: A clustering tool to generate word clusters. In embodiments described, the clustering tool includes a clustering component that generates word clusters for words or word combinations in input data. In illustrated embodiments, the word clusters are used to modify or update a grammar for a closed vocabulary speech recognition application.

Type: Grant

Filed: November 18, 2005

Date of Patent: August 21, 2012

Assignee: Microsoft Corporation

Inventor: Kunal Mukerjee

Uncertainty interval content sensing within communications

Patent number: 8209175

Abstract: Repetition of content words in a communication is used to increase the certainty, or, alternatively, reduce the uncertainty, that the content words were actual words from the communication. Reducing the uncertainty of a particular content word of a communication in turn increases the likelihood that the content word is relevant to the communication. Reliable, relevant content words mined from a communication can be used for, e.g., automatic internet searches for documents and/or web sites pertinent to the communication. Reliable, relevant content words mined from a communication can also, or alternatively, be used to automatically generate one or more documents from the communication, e.g., communication summaries, communication outlines, etc.
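
One way to picture how repetition reduces uncertainty is the Python sketch below. It assumes each recognized word arrives with a recognizer confidence in [0, 1] and that errors on separate occurrences are independent, so the uncertainties multiply. That combination rule, the function names, and the `threshold` value are illustrative assumptions; the abstract does not give a formula.

```python
from collections import defaultdict


def combine_confidences(recognized):
    """Combine per-occurrence confidences into one certainty per word.

    Assumes independent errors: the chance that every occurrence of a word
    was misrecognized is the product of the per-occurrence uncertainties.
    """
    uncertainty = defaultdict(lambda: 1.0)
    for word, confidence in recognized:
        uncertainty[word] *= 1.0 - confidence
    return {word: 1.0 - u for word, u in uncertainty.items()}


def reliable_words(recognized, threshold=0.75):
    """Return the words whose combined certainty reaches the threshold."""
    scores = combine_confidences(recognized)
    return sorted(word for word, score in scores.items() if score >= threshold)


# Hypothetical recognizer output: "budget" is heard twice, "lunch" once.
recognized = [("budget", 0.6), ("lunch", 0.4), ("budget", 0.5)]
scores = combine_confidences(recognized)
keywords = reliable_words(recognized)
```

Under these assumptions neither single occurrence of "budget" would pass the threshold alone, but the repeated word does, and it could then feed a search or a summary.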

Type: Grant

Filed: June 8, 2006

Date of Patent: June 26, 2012

Assignee: Microsoft Corporation

Inventors: Kunal Mukerjee, Rafael Ballesteros

DERIVING DOCUMENT SIMILARITY INDICES

Publication number: 20120158731

Abstract: The present invention extends to methods, systems, and computer program products for deriving document similarity indices. Embodiments of the invention include scalable and efficient mechanisms for deriving and updating a document similarity index for a plurality of documents. The number of maintained similarities can be controlled to conserve CPU and storage resources.

Type: Application

Filed: December 16, 2010

Publication date: June 21, 2012

Applicant: Microsoft Corporation

Inventors: Sorin Gherman, Kunal Mukerjee, Adam Prout

IDENTIFYING KEY PHRASES WITHIN DOCUMENTS

Publication number: 20120143860

Abstract: The present invention extends to methods, systems, and computer program products for identifying key phrases within documents. Embodiments of the invention include using a tag index to determine what a document primarily relates to. For example, an integrated data flow and extract-transform-load pipeline crawls, parses, and word breaks large corpuses of documents in database tables. Documents can be broken into tuples. The tuples can be sent to a heuristically based algorithm that uses statistical language models and weight+cross-entropy threshold functions to summarize the document into its “top N” most statistically significant phrases. Accordingly, embodiments of the invention scale efficiently (e.g., linearly) and (potentially large numbers of) documents can be characterized by salient and relevant key phrases (tags).

Type: Application

Filed: December 3, 2010

Publication date: June 7, 2012

Applicant: Microsoft Corporation

Inventors: Sorin Gherman, Kunal Mukerjee

INTERFACE TO NAVIGATE AND SEARCH A CONCEPT HIERARCHY

Publication number: 20120066210

Abstract: A method includes receiving a concept hierarchy at a computing device. The concept hierarchy identifies concepts associated with a document corpus. An interface based on the concept hierarchy is generated. The interface is operable to navigate, search, and modify the concept hierarchy. The method includes transmitting the interface for display to a display device.

Type: Application

Filed: September 14, 2010

Publication date: March 15, 2012

Applicant: Microsoft Corporation

Inventors: Kunal Mukerjee, Naveen Garg

Signaling for field ordering and field/frame display repetition

Patent number: 8116380

Abstract: A decoder processes a first bitstream element (e.g., a pull-down flag) in a first syntax layer (e.g., sequence layer or entry point layer) above frame layer in a bitstream for a video sequence, the bitstream comprising encoded source video having a source type (e.g., progressive or interlace). The decoder processes frame data in a second syntax layer (e.g., frame layer) of the bitstream for a frame (such as an interlaced frame or progressive frame, depending on source type, or a skipped frame) in the video sequence. The first bitstream element indicates whether a repeat-picture element (e.g., a repeat-frame element or a repeat-field element) is present or absent in the frame data in the second syntax layer.

Type: Grant

Filed: September 4, 2004

Date of Patent: February 14, 2012

Assignee: Microsoft Corporation

Inventors: Shankar Regunathan, Chih-Lung Lin, Thomas W. Holcomb, Kunal Mukerjee, Pohsiang Hsu

Signaling reference frame distances

Patent number: 8085844

Abstract: Techniques and tools for signaling reference frame distances are described. For example, a video encoder signals a code for a reference frame distance for a current field-coded interlaced video frame. The code indicates a count of frames (e.g., bi-directionally predicted frames) between the current frame and a preceding reference frame. The code may be a variable length code signaled in the frame header for the current frame. The encoder may selectively signal the use of a default value for reference frame distances rather than signal a reference frame distance per frame. A video decoder performs corresponding parsing and decoding.

Type: Grant

Filed: November 15, 2004

Date of Patent: December 27, 2011

Assignee: Microsoft Corporation

Inventors: Thomas W. Holcomb, Kunal Mukerjee, Chih-Lung Lin

Lightweight windowing method for screening harvested data for novelty

Patent number: 8069032

Abstract: Biasing of language model customization due to repetitious data is substantially reduced by introducing novelty screening to the data harvesting process. Novelty detection based filtering is added to ensure that an adaptation system gives more weight to representative adaptation data that is not repetitious. The value of the adaptation data is preserved and the process is prevented from being polluted when the same data is seen multiple times, such as the original posting in an email thread, various versions of the same document, and the like. The screening technique may be built on top of existing data harvesting mechanisms as already seen data is used to determine the novelty of a particular portion of the data. A window into the new data, fixed or variable size, is compared against the already collected data to determine the likelihood that the data is novel.
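
The windowed comparison described in this abstract can be sketched in Python as follows. The sketch assumes fixed-size windows of consecutive words, an exact-match test against a set of previously harvested windows, and a simple unseen-fraction threshold; these choices, along with the names `word_windows`, `harvest_if_novel`, `size`, and `threshold`, are illustrative assumptions and not details taken from the patent.

```python
def word_windows(text, size):
    """Split text into overlapping windows of `size` consecutive words."""
    words = text.lower().split()
    if len(words) <= size:
        return [tuple(words)] if words else []
    return [tuple(words[i:i + size]) for i in range(len(words) - size + 1)]


def harvest_if_novel(text, seen, size=5, threshold=0.5):
    """Accept `text` only if enough of its windows have not been seen before.

    `seen` is a set of windows from already harvested data; it is updated
    in place when the text is accepted.
    """
    windows = word_windows(text, size)
    if not windows:
        return False
    unseen = sum(1 for window in windows if window not in seen)
    novel = unseen / len(windows) >= threshold
    if novel:
        seen.update(windows)
    return novel


# Hypothetical email thread: an original message, a resend, and a quoting reply.
seen = set()
original = "please review the attached budget report before friday"
first = harvest_if_novel(original, seen)
repeat = harvest_if_novel(original, seen)
reply = harvest_if_novel("thanks " + original, seen)
```

Here the original message is accepted, while the resend and the reply that mostly quotes it are screened out, so the repeated text would not be counted again during adaptation.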

Type: Grant

Filed: July 27, 2006

Date of Patent: November 29, 2011

Assignee: Microsoft Corporation

Inventors: Julian J. Odell, Kunal Mukerjee

Self-compacting pattern indexer: storing, indexing and accessing information in a graph-like data structure

Patent number: 8065293

Abstract: An indexing system uses a graph-like data structure that clusters feature indexes together. The minimum atomic value in the data structure is represented as a leaf node which is either a single feature index or a sequence of two or more feature indexes when a minimum sequence length is imposed. Root nodes are formed as clustered collections of leaf nodes and/or other root nodes. Context nodes are formed from root nodes that are associated with content that is being indexed. Links between a root node and other nodes each include a sequence order value that is used to maintain the sequencing order for feature indexes relative to the root node. The collection of nodes forms a graph-like data structure, where each context node is indexed according to the sequenced pattern of feature indexes. Clusters can be split, merged, and promoted to increase the efficiency in searching the data structure.

Type: Grant

Filed: October 24, 2007

Date of Patent: November 22, 2011

Assignee: Microsoft Corporation

Inventors: Kunal Mukerjee, R. Donald Thompson, III, Jeffrey Cole, Brendan Meeder

Advanced bi-directional predictive coding of interlaced video

Patent number: 8064520

Abstract: For interlaced B-fields or interlaced B-frames, forward motion vectors are predicted by an encoder/decoder using forward motion vectors from a forward motion vector buffer, and backward motion vectors are predicted using backward motion vectors from a backward motion vector buffer. The resulting motion vectors are added to the corresponding buffer. Holes in motion vector buffers can be filled in with estimated motion vector values. An encoder/decoder switches prediction modes between fields in a field-coded macroblock of an interlaced B-frame. For interlaced B-frames and interlaced B-fields, an encoder/decoder computes direct mode motion vectors. For interlaced B-fields or interlaced B-frames, an encoder/decoder uses 4 MV coding. An encoder/decoder uses “self-referencing” B-frames. An encoder sends binary information indicating whether a prediction mode is forward or not-forward for one or more macroblocks in an interlaced B-field. An encoder/decoder uses intra-coded B-fields [“BI-fields”].

Type: Grant

Filed: June 29, 2004

Date of Patent: November 22, 2011

Assignee: Microsoft Corporation

Inventors: Kunal Mukerjee, Thomas W. Holcomb

Scalable Incremental Semantic Entity and Relatedness Extraction from Unstructured Text

Publication number: 20110264997

Abstract: A search engine for documents containing text may process text using a statistical language model, classify the text based on entropy, and create suffix trees or other mappings of the text for each classification. From the suffix trees or mappings, a graph may be constructed with relationship strengths between different words or text strings. The graph may be used to determine search results, and may be browsed or navigated before viewing search results. As new documents are added, they may be processed and added to the suffix trees, then the graph may be created on demand in response to a search request. The graph may be represented as an adjacency matrix, and a transitive closure algorithm may process the adjacency matrix as a background process.
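
For readers unfamiliar with transitive closure over an adjacency matrix, the Python sketch below uses Warshall's algorithm on a boolean matrix. It is a generic textbook version: it ignores the relationship strengths mentioned in the abstract, and the example terms and function name are hypothetical rather than taken from the application.

```python
def transitive_closure(adjacency):
    """Return the reachability matrix of a directed graph (Warshall's algorithm).

    adjacency[i][j] is True when there is a direct link from node i to node j.
    The input matrix is left unmodified.
    """
    n = len(adjacency)
    reach = [row[:] for row in adjacency]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


# Hypothetical term graph: codec -> motion vector -> macroblock.
terms = ["codec", "motion vector", "macroblock"]
adjacency = [
    [False, True, False],
    [False, False, True],
    [False, False, False],
]
closure = transitive_closure(adjacency)
```

After the closure is computed, "codec" is linked to "macroblock" even though the two terms were never directly related, which is the kind of indirect relatedness a background pass could precompute.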

Type: Application

Filed: April 21, 2010

Publication date: October 27, 2011

Applicant: Microsoft Corporation

Inventors: Kunal Mukerjee, Sorin Gherman

Quantized feature index trajectory

Patent number: 7945441

Abstract: Indexing methods are described that may be used by databases, search engines, query and retrieval systems, context sensitive data mining, context mapping, language identification, image recognition, and robotic systems. Raw baseline features from an input signal are aggregated, abstracted and indexed for later retrieval or manipulation. The feature index is the quantization number for the underlying features that are represented by an abstraction. Trajectories are used to signify how the features evolve over time. Feature indexes are linked in an ordered sequence indicative of time quanta, where the sequence represents the underlying input signal. An example indexing system based on the described processes is an inverted index that creates a mapping from features or atoms to the underlying documents, files, or data. A highly optimized set of operations can be used to manipulate the quantized feature indexes, where the operations can be fine-tuned independently of the base feature set.

Type: Grant

Filed: August 7, 2007

Date of Patent: May 17, 2011

Assignee: Microsoft Corporation

Inventors: R. Donald Thompson, Kunal Mukerjee

Motion vector prediction in bi-directionally predicted interlaced field-coded pictures

Patent number: 7852936

Abstract: Forward motion vectors are predicted by an encoder/decoder using previously reconstructed (or estimated) forward motion vectors from a forward motion vector buffer, and backward motion vectors are predicted using previously reconstructed (or estimated) backward motion vectors from a backward motion vector buffer. The resulting motion vectors are added to the corresponding buffer. Holes in motion vector buffers can be filled in with estimated motion vector values. For example, for interlaced B-fields, to choose between different polarity motion vectors (e.g., “same polarity” or “opposite polarity”) for hole-filling, an encoder/decoder selects a dominant polarity field motion vector. The distance between anchors and current frames is computed using various syntax elements, and the computed distance is used for scaling reference field motion vectors.

Type: Grant

Filed: September 15, 2004

Date of Patent: December 14, 2010

Assignee: Microsoft Corporation

Inventors: Kunal Mukerjee, Thomas W. Holcomb

NOISE ROBUST SPEECH CLASSIFIER ENSEMBLE

Publication number: 20100280827

Abstract: Embodiments for implementing a speech recognition system that includes a speech classifier ensemble are disclosed. In accordance with one embodiment, the speech recognition system includes a classifier ensemble to convert feature vectors that represent a speech vector into log probability sets. The classifier ensemble includes a plurality of classifiers. The speech recognition system includes a decoder ensemble to transform the log probability sets into output symbol sequences. The speech recognition system further includes a query component to retrieve one or more speech utterances from a speech database using the output symbol sequences.

Type: Application

Filed: April 30, 2009

Publication date: November 4, 2010

Applicant: Microsoft Corporation

Inventors: Kunal Mukerjee, Kazuhito Koishida, Shankar Regunathan

Predictive lossless coding of images and video

Patent number: 7689051

Abstract: Predictive lossless coding provides effective lossless image compression of both photographic and graphics content in image and video media. Predictive lossless coding can operate on a macroblock basis for compatibility with existing image and video codecs. Predictive lossless coding chooses and applies one of multiple available differential pulse-code modulation (DPCM) modes to individual macroblocks to produce DPCM residuals having a closer to optimal distribution for run-length Golomb-Rice (RLGR) entropy encoding. This permits effective lossless entropy encoding despite the differing characteristics of photographic and graphics image content.
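
A minimal Python sketch of per-block DPCM mode selection is given below. The three modes (`none`, `left`, `top`), the sum-of-absolute-residuals cost, and the function names are assumptions chosen for illustration; the patent's actual mode set and selection criterion may differ, and the RLGR entropy coding stage is omitted entirely.

```python
MODES = ("none", "left", "top")  # hypothetical mode set for illustration


def predict(pixels, r, c, mode):
    """Predict pixel (r, c) from an already known neighbor, or 0 at a border."""
    if mode == "left" and c > 0:
        return pixels[r][c - 1]
    if mode == "top" and r > 0:
        return pixels[r - 1][c]
    return 0


def residuals(block, mode):
    """DPCM residuals: each pixel minus its prediction under `mode`."""
    return [
        [block[r][c] - predict(block, r, c, mode) for c in range(len(block[r]))]
        for r in range(len(block))
    ]


def encode_block(block):
    """Pick the mode with the smallest total residual magnitude."""
    def cost(mode):
        return sum(abs(v) for row in residuals(block, mode) for v in row)

    best = min(MODES, key=cost)
    return best, residuals(block, best)


def decode_block(mode, res):
    """Invert the DPCM step in raster order, reconstructing the block exactly."""
    out = [[0] * len(row) for row in res]
    for r in range(len(res)):
        for c in range(len(res[r])):
            out[r][c] = res[r][c] + predict(out, r, c, mode)
    return out


# Two hypothetical blocks: constant rows favor "left", constant columns favor "top".
flat_rows = [[10, 10, 10, 10], [50, 50, 50, 50]]
stripes = [[10, 50], [10, 50]]
mode_rows, res_rows = encode_block(flat_rows)
mode_stripes, res_stripes = encode_block(stripes)
```

Because the decoder applies the same predictor to pixels it has already reconstructed, the round trip is lossless, and choosing the mode per block keeps most residuals near zero, which is what makes a later entropy coder effective.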

Type: Grant

Filed: April 15, 2004

Date of Patent: March 30, 2010

Assignee: Microsoft Corporation

Inventor: Kunal Mukerjee

Self-referencing bi-directionally predicted frames

Patent number: 7680185

Abstract: An encoder/decoder uses “self-referencing” frames. For example, a second B-field in a current frame references the first B-field from the current frame in motion compensated prediction. Allowing the first B-field in a frame to act as a reference for the second B-field in the frame allows more accurate prediction of the second B-field, while also preserving the temporal scalability benefits of having B-fields in the current frame.

Type: Grant

Filed: September 15, 2004

Date of Patent: March 16, 2010

Assignee: Microsoft Corporation

Inventors: Kunal Mukerjee, Thomas W. Holcomb
