Patents by Inventor Deepayan Chakrabarti

Deepayan Chakrabarti has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20090248608
    Abstract: A method of segmenting a webpage into visually and semantically cohesive pieces uses an optimization problem on a weighted graph, where the weights reflect whether two nodes in the webpage's DOM tree should be placed together or apart in the segmentation; the weights are informed by manually labeled data.
    Type: Application
    Filed: March 28, 2008
    Publication date: October 1, 2009
    Applicant: Yahoo! Inc.
    Inventors: Shanmugasundaram Ravikumar, Deepayan Chakrabarti, Kunal Punera
  • Publication number: 20090177959
    Abstract: To provide valuable information regarding a webpage, the webpage must be divided into distinct semantically coherent segments for analysis. A set of heuristics allows a segmentation algorithm to identify an optimal number of segments for a given webpage or any portion thereof more accurately. A first heuristic estimates the optimal number of segments for any given webpage or portion thereof. A second heuristic coalesces segments where the number of segments identified far exceeds the optimal number recommended. A third heuristic coalesces segments corresponding to a portion of a webpage with much unused whitespace and little content. A fourth heuristic coalesces segments of nodes that have a recommended number of segments below a certain threshold into segments of other nodes. A fifth heuristic recursively analyzes and splits segments that correspond to webpage portions surpassing a certain threshold portion size.
    Type: Application
    Filed: January 8, 2008
    Publication date: July 9, 2009
    Inventors: Deepayan Chakrabarti, Manav Ratan Mital, Swapnil Hajela, Emre Velipasaoglu
  • Publication number: 20090112865
    Abstract: Methods and apparatuses are provided for accessing taxonomic data associated with an item as classified into a taxonomy having a hierarchical structure, establishing dependency data associated with a distribution represented in the taxonomic data, and determining entropic data for the item based, at least in part, on the distribution and established dependency.
    Type: Application
    Filed: October 26, 2007
    Publication date: April 30, 2009
    Inventors: Erik N. Vee, Deepayan Chakrabarti, Anirban Dasgupta, Arpita Ghosh, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Publication number: 20090043597
    Abstract: An improved system and method for matching objects using a cluster-dependent multi-armed bandit is provided. The matching may be performed by using a multi-armed bandit where the arms of the bandit may be dependent. In an embodiment, a set of objects segmented into a plurality of clusters of dependent objects may be received, and then a two step policy may be employed by a multi-armed bandit by first running over clusters of arms to select a cluster, and then secondly picking a particular arm inside the selected cluster. The multi-armed bandit may exploit dependencies among the arms to efficiently support exploration of a large number of arms. Various embodiments may include policies for discounted rewards and policies for undiscounted reward. These policies may consider each cluster in isolation during processing, and consequently may dramatically reduce the size of a large state space for finding a solution.
    Type: Application
    Filed: August 7, 2007
    Publication date: February 12, 2009
    Applicant: Yahoo! Inc.
    Inventors: Deepak Agarwal, Deepayan Chakrabarti, Sandeep Pandey
  • Publication number: 20080275890
    Abstract: An improved system and method is provided for detecting a web page template. A web page template detector may be provided for performing page-level template detection on a web page. In general, the web page template classifier may be trained using automatically generated training data, and then the web page template classifier may be applied to web pages to identify web page templates. A web page template may be detected by classifying segments of a web page as template structures, by assigning classification scores to the segments of the web page classified as template structures, and then by smoothing the classification scores assigned to the segments of the web page. Generalized isotonic regression may be applied for smoothing scores associated with the nodes of a hierarchy by minimizing an optimization function using dynamic programming.
    Type: Application
    Filed: May 4, 2007
    Publication date: November 6, 2008
    Applicant: Yahoo! Inc.
    Inventors: Deepayan Chakrabarti, Kunal Punera, Shanmugasundaram Ravikumar
  • Publication number: 20080275901
    Abstract: An improved system and method is provided for detecting a web page template. A web page template detector may be provided for performing page-level template detection on a web page. In general, the web page template classifier may be trained using automatically generated training data, and then the web page template classifier may be applied to web pages to identify web page templates. A web page template may be detected by classifying segments of a web page as template structures, by assigning classification scores to the segments of the web page classified as template structures, and then by smoothing the classification scores assigned to the segments of the web page. Generalized isotonic regression may be applied for smoothing scores associated with the nodes of a hierarchy by minimizing an optimization function using dynamic programming.
    Type: Application
    Filed: May 4, 2007
    Publication date: November 6, 2008
    Applicant: Yahoo! Inc.
    Inventors: Deepayan Chakrabarti, Kunal Punera, Shanmugasundaram Ravikumar
  • Publication number: 20080250033
    Abstract: Described are a system and method for determining an event occurrence rate. A sample set of content items may be obtained. Each of the content items may be associated with at least one region in a hierarchical data structure. A first impression volume may be determined for the at least one region as a function of a number of impressions registered for the content items associated with the at least one region. A scale factor may be applied to the first impression volume to generate a second impression volume. The scale factor may be selected so that the second impression volume is within a predefined range of a third impression volume. A click-through-rate (CTR) may be estimated as a function of the second impression volume and a number of clicks on the content item.
    Type: Application
    Filed: April 5, 2007
    Publication date: October 9, 2008
    Inventors: Deepak Agarwal, Dejan Diklic, Deepayan Chakrabarti, Andrei Zary Broder, Vanja Josifovski
  • Publication number: 20080140591
    Abstract: An improved system and method for matching objects belonging to hierarchies is provided and an optimal matching between two feature spaces organized as taxonomies may be learned. The matching may be performed through a multi-level exploration of the hierarchical feature spaces by using multi-armed bandits where the arms of the bandit may be dependent due to the structure induced by the taxonomies. Upon the arrival of an object assigned to the first taxonomy, multi-armed bandits may be run at multiple levels of the taxonomies to select an object assigned to the second taxonomy. Then shrinkage estimation may be performed in a Bayesian framework to exploit dependencies among the arms by estimating payoff probabilities from a beta-binomial model to update payoff probabilities for matching objects from the taxonomies.
    Type: Application
    Filed: December 12, 2006
    Publication date: June 12, 2008
    Applicant: Yahoo! Inc.
    Inventors: Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski, Sandeep Pandey
  • Publication number: 20070255736
    Abstract: An improved system and method for evolutionary clustering of sequential data sets is provided. A snapshot cost may be determined for representing the data set for a particular clustering method used and may determine the cost of clustering the data set independently of a series of clusterings of the data sets in the sequence. A history cost may also be determined for measuring the distance between corresponding clusters of the data set and the previous data set in the sequence of data sets to determine a cost of clustering the data set as part of a series of clusterings of the data sets in the sequence. An overall cost may be determined for clustering the data set by minimizing the combination of the snapshot cost and the history cost. Any clustering method may be used, including flat clustering and hierarchical clustering.
    Type: Application
    Filed: April 29, 2006
    Publication date: November 1, 2007
    Applicant: Yahoo! Inc.
    Inventors: Deepayan Chakrabarti, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Publication number: 20070255737
    Abstract: An improved system and method for evolutionary clustering of sequential data sets is provided. A snapshot cost may be determined for representing the data set for a particular clustering method used and may determine the cost of clustering the data set independently of a series of clusterings of the data sets in the sequence. A history cost may also be determined for measuring the distance between corresponding clusters of the data set and the previous data set in the sequence of data sets to determine a cost of clustering the data set as part of a series of clusterings of the data sets in the sequence. An overall cost may be determined for clustering the data set by minimizing the combination of the snapshot cost and the history cost. Any clustering method may be used, including flat clustering and hierarchical clustering.
    Type: Application
    Filed: April 29, 2006
    Publication date: November 1, 2007
    Applicant: Yahoo! Inc.
    Inventors: Deepayan Chakrabarti, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Publication number: 20070255684
    Abstract: An improved system and method for evolutionary clustering of sequential data sets is provided. A snapshot cost may be determined for representing the data set for a particular clustering method used and may determine the cost of clustering the data set independently of a series of clusterings of the data sets in the sequence. A history cost may also be determined for measuring the distance between corresponding clusters of the data set and the previous data set in the sequence of data sets to determine a cost of clustering the data set as part of a series of clusterings of the data sets in the sequence. An overall cost may be determined for clustering the data set by minimizing the combination of the snapshot cost and the history cost. Any clustering method may be used, including flat clustering and hierarchical clustering.
    Type: Application
    Filed: April 29, 2006
    Publication date: November 1, 2007
    Applicant: Yahoo! Inc.
    Inventors: Deepayan Chakrabarti, Shanmugasundaram Ravikumar, Andrew Tomkins
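
The evolutionary clustering abstracts above describe an overall cost that combines a snapshot cost with a history cost. The sketch below is one possible reading of that idea, not the method claimed in the applications. It assumes 2-D points, centroid-based clusters, squared Euclidean distance for both costs, clusters matched to their predecessors by list position, and a hypothetical `history_weight` parameter that trades off the two costs. None of these choices or names come from the abstracts.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def snapshot_cost(points: Sequence[Point], centroids: Sequence[Point]) -> float:
    """Cost of clustering the current data set on its own:
    each point pays its squared distance to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def history_cost(centroids: Sequence[Point], previous: Sequence[Point]) -> float:
    """Distance between corresponding clusters of the current and the
    previous data set. Assumption: clusters correspond by list position."""
    return sum(sq_dist(c, p) for c, p in zip(centroids, previous))


def overall_cost(points: Sequence[Point], centroids: Sequence[Point],
                 previous: Sequence[Point], history_weight: float = 1.0) -> float:
    """Combined cost; history_weight is a hypothetical trade-off parameter."""
    return (snapshot_cost(points, centroids)
            + history_weight * history_cost(centroids, previous))


def pick_clustering(points: Sequence[Point],
                    candidates: Sequence[List[Point]],
                    previous: Sequence[Point],
                    history_weight: float = 1.0) -> List[Point]:
    """Return the candidate centroid set with the lowest overall cost."""
    return min(candidates,
               key=lambda c: overall_cost(points, c, previous, history_weight))


# Toy data: two groups of points, and the centroids from the previous time step.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
previous = [(0.0, 0.0), (10.0, 0.0)]
fresh = [(0.0, 1.0), (10.0, 1.0)]    # fits the current data best
stale = [(0.0, 0.0), (10.0, 0.0)]    # identical to the previous clustering

low_weight_choice = pick_clustering(points, [fresh, stale], previous, history_weight=1.0)
high_weight_choice = pick_clustering(points, [fresh, stale], previous, history_weight=3.0)
```

With a small `history_weight` the candidate that fits the current data wins; with a larger weight the candidate that stays close to the previous clustering wins, which is the smoothing behavior the abstracts attribute to the history cost.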