Patents by Inventor Kaushik Chakrabarti

Kaushik Chakrabarti has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20180232410
    Abstract: The present invention extends to methods, systems, and computer program products for refining structured data indexes. Aspects of the invention include associating structured data, such as, for example, tables, with additional content. Additional content can include content outside the <table> and </table> tags of a web table. Indexes for structured data (e.g., table indexes) can be refined based on the additional content to improve the relevance of providing parts of the structured data (e.g., parts of the table) in search results.
    Type: Application
    Filed: April 11, 2018
    Publication date: August 16, 2018
    Inventors: Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Patent number: 10032131
    Abstract: A data service system is described herein which processes raw data assets from at least one network-accessible system (such as a search system), to produce processed data assets. Enterprise applications can then leverage the processed data assets to perform various environment-specific tasks. In one implementation, the data service system can generate any of: synonym resources for use by an enterprise application in providing synonyms for specified terms associated with entities; augmentation resources for use by an enterprise application in providing supplemental information for specified seed information; and spelling-correction resources for use by an enterprise application in providing spelling information for specified terms, and so on.
    Type: Grant
    Filed: June 20, 2012
    Date of Patent: July 24, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Tao Cheng, Kris Ganjam, Kaushik Chakrabarti, Zhimin Chen, Vivek R. Narasayya, Surajit Chaudhuri
  • Patent number: 9959305
    Abstract: The present invention extends to methods, systems, and computer program products for annotating structured data for search. Aspects of the invention include associating structured data, such as, for example, tables, with additional content to improve indexing of the structured data for search and/or provide improved search results for structured data. Web pages can include tables as well as other content. The other content in a web page, such as, for example, content outside the <table> and </table> tags of a web table, can be useful in supporting searches for web tables. Content in one web page can also be useful in supporting searches for a table in another web page.
    Type: Grant
    Filed: July 8, 2014
    Date of Patent: May 1, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Patent number: 9940365
    Abstract: The present invention extends to methods, systems, and computer program products for ranking tables for keyword search. Aspects of the invention include generating lists of candidate tables for inclusion in a search query response, computing table hit matrices, retrieving content from fields of candidate tables having keyword hits, generating ranking features of tables, and computing ranking scores for tables. Aspects of the invention can be used to match keywords against column names, to match keywords against values in subject and non-subject columns, and to match keywords against table descriptions like page titles, table captions, cell values, nearest headings and surrounding text. Which keywords are matched against which fields can depend on the table and/or the query (referred to as “late binding”).
    Type: Grant
    Filed: July 8, 2014
    Date of Patent: April 10, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Publication number: 20170371958
    Abstract: The techniques discussed herein leverage structure within data of a corpus to parse unstructured data to obtain structured data and/or to predict latent data that is related to the unstructured and/or structured data. In some examples, parsing and/or predicting can be conducted at varying levels of granularity. In some examples, parsing and/or predicting can be iteratively conducted to improve accuracy and/or to expose more hidden data.
    Type: Application
    Filed: June 28, 2016
    Publication date: December 28, 2017
    Inventors: Kris K. Ganjam, Kaushik Chakrabarti
  • Publication number: 20170371924
    Abstract: A processing unit can determine a first subset of a data set including data records selected based on measure values thereof. The processing unit can determine an index mapping a predicate to data records associated with that predicate and approximation values of the records. The processing unit can process a query against the first subset to provide a first result and a first accuracy value, determine that the first accuracy value does not satisfy an accuracy criterion, and process the query against the index. In some examples, the processing unit can process the query against a second subset including data records satisfying a predetermined predicate. In some examples, the processing unit can receive data records and determine the first subset. Data records can include respective measure values. Data records with higher measure values can occur in the first subset more frequently than data records with lower measure values.
    Type: Application
    Filed: June 24, 2016
    Publication date: December 28, 2017
    Inventors: Bolin Ding, Silu Huang, Chi Wang, Kaushik Chakrabarti, Surajit Chaudhuri
  • Publication number: 20170322964
    Abstract: The present invention extends to methods, systems, and computer program products for understanding tables for search. Aspects of the invention include identifying a subject tuple (e.g., a subject column) for a table, detecting a tuple header (e.g., a column header) using other tables, and detecting a tuple header (e.g., a column header) using a knowledge base. Implementations can be utilized in a structured data search system (SDSS) that indexes structured information, such as tables in a relational database or HTML tables extracted from web pages. The SDSS allows users to search over the structured information (tables) using different mechanisms including keyword search and data finding data.
    Type: Application
    Filed: July 27, 2017
    Publication date: November 9, 2017
    Inventors: Zhongyuan Wang, Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Patent number: 9734181
    Abstract: The present invention extends to methods, systems, and computer program products for understanding tables for search. Aspects of the invention include identifying a subject column for a table, detecting a column header using other tables, and detecting a column header using a knowledge base. Implementations can be utilized in a structured data search system (SDSS) that indexes structured information, such as tables in a relational database or HTML tables extracted from web pages. The SDSS allows users to search over the structured information (tables) using different mechanisms including keyword search and data finding data.
    Type: Grant
    Filed: October 2, 2014
    Date of Patent: August 15, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhongyuan Wang, Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Publication number: 20170132329
    Abstract: Techniques for using digital entity correlation to generate a composite knowledge graph from constituent graphs. In an aspect, digital attribute values associated with primary entities may be encoded into primitives, e.g., using a multi-resolution encoding scheme. A pairs graph may be constructed, based on seed pairs calculated from correlating encoded primitives, and further expanded to include subjects and objects of the seed pairs, as well as pairs connected to relationship entities. A similarity metric is computed for each candidate pair to determine whether a match exists. The similarity metric may be based on summing a weighted landing probability over all primitives associated directly or indirectly with each candidate pair. By incorporating primitive matches from not only the candidate pair but also from pairs surrounding the candidate pair, entity matching may be efficiently implemented on a holistic basis.
    Type: Application
    Filed: November 5, 2015
    Publication date: May 11, 2017
    Inventors: Mohamed Yakout, Kaushik Chakrabarti, Maria Pershina
  • Patent number: 9594831
    Abstract: A targeted disambiguation system is described herein which determines true mentions of a list of named entities in a collection of documents. The list of named entities is homogenous in the sense that the entities pertain to the same subject matter domain. The system determines the true mentions by leveraging the homogeneity in the list, and, more specifically by applying a context similarity hypothesis, a co-mention hypothesis, and an interdependency hypothesis. In one implementation, the system executes its analysis using a graph-based model. The system can operate without the existence of additional information regarding the entities in the list; nevertheless, if such information is available, the system can integrate it into its analysis.
    Type: Grant
    Filed: June 22, 2012
    Date of Patent: March 14, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Chi Wang, Kaushik Chakrabarti, Tao Cheng, Surajit Chaudhuri
  • Publication number: 20160378765
    Abstract: Concept expansion using tables, such as web tables, can return entities belonging to a concept based on an input of the concept and at least one seed entity that belongs to the concept. A concept expansion frontend can receive the concept and seed entity and provide them to a concept expansion framework. The concept expansion framework can expand the coverage of entities for concepts, including tail concepts, using tables by leveraging rich content signals corresponding to concept names. Such content signals can include content matching the concept that appears in captions, early headings, page titles, surrounding text, anchor text, and queries for which the page has been clicked. The concept expansion framework can use the structured entities in tables to infer exclusive tables. Such inference differs from previous label propagation methods and involves modeling a table-entity relationship. The table-entity relationship reduces semantic drift without using a reference ontology.
    Type: Application
    Filed: June 29, 2015
    Publication date: December 29, 2016
    Inventors: Philip A. Bernstein, Kaushik Chakrabarti, Zhimin Chen, Yeye He, Chi Wang, Kris K. Ganjam
  • Patent number: 9501475
    Abstract: A set of documents is filtered for entity extraction. A list of entity strings is received. A set of token sets that covers the entity strings in the list is determined. An inverted index generated on a first set of documents is queried using the set of token sets to determine a set of document identifiers for a subset of the documents in the first set. A second set of documents identified by the set of document identifiers is retrieved from the first set of documents. The second set of documents is filtered to include one or more documents of the second set that each includes a match with at least one entity string of the list of entity strings. Entity recognition may be performed on the filtered second set of documents.
    Type: Grant
    Filed: June 3, 2014
    Date of Patent: November 22, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti
  • Patent number: 9298825
    Abstract: A plurality of description phrases associated with a first domain may be determined, based on an analysis of a first plurality of documents to determine co-occurrences of the description phrases with one or more name labels associated with the first domain. An entity associated with the first domain may be obtained. An analysis of a second plurality of documents may be initiated to identify co-occurrences of mentions of the obtained entity and one or more of the plurality of description phrases, and contexts associated with each of the co-occurrences of the mentions and description phrases, in each one of the second plurality of documents. A description tag association between the obtained entity and one of the description phrases may be determined, based on an analysis of the identified contexts.
    Type: Grant
    Filed: November 17, 2011
    Date of Patent: March 29, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Tao Cheng
  • Publication number: 20160012052
    Abstract: The present invention extends to methods, systems, and computer program products for ranking tables for keyword search. Aspects of the invention include generating lists of candidate tables for inclusion in a search query response, computing table hit matrices, retrieving content from fields of candidate tables having keyword hits, generating ranking features of tables, and computing ranking scores for tables. Aspects of the invention can be used to match keywords against column names, to match keywords against values in subject and non-subject columns, and to match keywords against table descriptions like page titles, table captions, cell values, nearest headings and surrounding text. Which keywords are matched against which fields can depend on the table and/or the query (referred to as “late binding”).
    Type: Application
    Filed: July 8, 2014
    Publication date: January 14, 2016
    Inventors: Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Publication number: 20160012091
    Abstract: The present invention extends to methods, systems, and computer program products for annotating structured data for search. Aspects of the invention include associating structured data, such as, for example, tables, with additional content to improve indexing of the structured data for search and/or provide improved search results for structured data. Web pages can include tables as well as other content. The other content in a web page, such as, for example, content outside the <table> and </table> tags of a web table, can be useful in supporting searches for web tables. Content in one web page can also be useful in supporting searches for a table in another web page.
    Type: Application
    Filed: July 8, 2014
    Publication date: January 14, 2016
    Inventors: Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Publication number: 20160012051
    Abstract: The present invention extends to methods, systems, and computer program products for computing features of structured data. Aspects of the invention include computing features of table components (e.g., of rows, columns, cells, etc.). Computed features can be used for ranking the table components. When aggregated, features for different components of a table can be used for ranking the table (e.g., a web table).
    Type: Application
    Filed: July 8, 2014
    Publication date: January 14, 2016
    Inventors: Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Publication number: 20150379057
    Abstract: The present invention extends to methods, systems, and computer program products for understanding tables for search. Aspects of the invention include identifying a subject column for a table, detecting a column header using other tables, and detecting a column header using a knowledge base. Implementations can be utilized in a structured data search system (SDSS) that indexes structured information, such as tables in a relational database or HTML tables extracted from web pages. The SDSS allows users to search over the structured information (tables) using different mechanisms including keyword search and data finding data.
    Type: Application
    Filed: October 2, 2014
    Publication date: December 31, 2015
    Inventors: Zhongyuan Wang, Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Publication number: 20150310073
    Abstract: In general, the knowledge base table composer embodiments described herein provide table answers to keyword queries against one or more knowledge bases. Highly relevant patterns in a knowledge base are found for user-given keyword queries. These patterns are used to compose table answers. To this end, a knowledge base is modeled as a directed graph called a knowledge graph, where nodes represent entities in the knowledge base and edges represent the relationships among them. Each node/edge is labeled with a type and text. A pattern that is an aggregation of subtrees which contain all keywords in the texts and have the same structure and types on nodes/edges is sought. Patterns that are relevant to a query for a class can be found using a set of scoring functions. Furthermore, path-based indexes and various query-processing procedures can be employed to speed up processing.
    Type: Application
    Filed: April 29, 2014
    Publication date: October 29, 2015
    Applicant: Microsoft Corporation
    Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Bolin Ding, Mohan Yang
  • Patent number: 9171081
    Abstract: The subject disclosure is directed towards providing data for augmenting an entity-attribute-related task. Pre-processing is performed on entity-attribute tables extracted from the web, e.g., to provide indexes that are accessible to find data that completes augmentation tasks. The indexes are based on both direct mappings and indirect mappings between tables. Example augmentation tasks include queries for augmented data based on an attribute name or examples, or finding synonyms for augmentation. An online query is efficiently processed by accessing the indexes to return augmented data related to the task.
    Type: Grant
    Filed: March 6, 2012
    Date of Patent: October 27, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kris K. Ganjam, Kaushik Chakrabarti, Mohamed A. Yakout, Surajit Chaudhuri
  • Publication number: 20150227589
    Abstract: Techniques and constructs to facilitate semantic matching and automated annotation (SMA) of attributes can take entity names and a keyword describing an attribute associated with the named entities as input and leverage a corpus of data such as data from tables, which can include HTML web tables, to automatically populate values associated with the named entities for the attribute. The constructs enable accurate SMA of attributes, such as attributes that relate to the entity and include numeric values in a different unit than the query, in a different scale than the query, and/or reflecting a time different from that of the query. An entity augmentation application programming interface (API) may be used to accept queries that include numeric criteria, parameters, or arguments, including query attributes represented by numeric values, which may be in different units or scales, and attributes represented by numeric values that can vary by time.
    Type: Application
    Filed: February 10, 2014
    Publication date: August 13, 2015
    Applicant: Microsoft Corporation
    Inventors: Kaushik Chakrabarti, Meihui Zhang
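The document-filtering pipeline summarized in the abstract of patent 9501475 (cover the entity strings with token sets, query an inverted index, then keep only documents with a real entity match) can be illustrated with a minimal Python sketch. It is not the patented implementation: the whitespace tokenizer, the choice of each entity's rarest token as its token set, and every function name below are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Illustrative sketch only; all helper names are hypothetical.


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
    return index


def token_sets_for(entities, index):
    """Give each entity a one-token set holding its rarest token.

    Any document that mentions the entity must contain that token,
    so together the sets cover every entity string in the list.
    """
    sets = set()
    for entity in entities:
        toks = tokenize(entity)
        rarest = min(toks, key=lambda t: len(index.get(t, ())))
        sets.add(frozenset([rarest]))
    return sets


def contains_phrase(tokens, phrase):
    """True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def filter_documents(docs, entities):
    index = build_inverted_index(docs)
    # Step 1: query the inverted index with each token set; a document is
    # a candidate if it contains every token of at least one set.
    candidate_ids = set()
    for token_set in token_sets_for(entities, index):
        postings = [index.get(t, set()) for t in token_set]
        candidate_ids |= set.intersection(*postings)
    # Step 2: keep only candidates containing an exact entity-string match.
    phrases = [tokenize(e) for e in entities]
    return sorted(
        doc_id for doc_id in candidate_ids
        if any(contains_phrase(tokenize(docs[doc_id]), p) for p in phrases)
    )
```

Picking the rarest token keeps the candidate set small, and the exact-phrase check then discards documents that contain the token but not the full entity string (for example, a document mentioning "seahawks" and "seattle" in the wrong order). Entity recognition would run only on the surviving documents.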
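The query flow in the abstract of publication 20170371924 (answer from a subset in which high-measure records appear more often, check an accuracy value, and fall back to an index when the accuracy criterion is not met) can likewise be sketched in Python. The SUM aggregate, the Hansen-Hurwitz estimator, the binomial relative-error formula used as the accuracy value, and the exact-sum index are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import random

# Illustrative sketch only; the estimator and index are assumptions.


def measure_biased_sample(records, n, seed=0):
    """Draw n records with replacement, each with probability proportional
    to its measure, so high-measure records appear more often."""
    rng = random.Random(seed)
    weights = [r["measure"] for r in records]
    return rng.choices(records, weights=weights, k=n)


def build_predicate_index(records, key):
    """Hypothetical index: predicate value -> exact SUM(measure)."""
    index = {}
    for r in records:
        index[r[key]] = index.get(r[key], 0) + r["measure"]
    return index


def approx_sum(records, key, value, n=200, max_rel_error=0.1, seed=0):
    """Estimate SUM(measure) WHERE key == value; returns (answer, source)."""
    total = sum(r["measure"] for r in records)
    sample = measure_biased_sample(records, n, seed)
    hits = sum(1 for r in sample if r[key] == value)
    if hits:
        frac = hits / n
        # Hansen-Hurwitz: each matching draw contributes measure / p = total.
        estimate = total * frac
        # Accuracy value: relative standard error (binomial approximation).
        rel_error = math.sqrt((1 - frac) / (n * frac))
        if rel_error <= max_rel_error:
            return estimate, "sample"
    # Accuracy criterion not met (or no hits): answer from the index instead.
    index = build_predicate_index(records, key)
    return index.get(value, 0), "index"
```

Because draws are weighted by measure, each matching draw contributes the same amount to the estimate, so groups that dominate the total are estimated tightly from a small subset while rare or low-measure groups fail the accuracy check and are routed to the index.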