Patents by Inventor Andrew Tomkins

Andrew Tomkins has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20070271270
    Abstract: An improved system and method for selecting and visualizing object metadata evolving over time is provided. An application may generate a visualization depicting the temporal evolution of metadata describing objects in an object store over a plurality of time intervals. The application may switch between a visualization of object metadata flowing like a river or cascading like a waterfall over time. A ranked list of metadata items may be determined for some pre-selected intervals during a pre-processing step. Then at runtime when a request may be received for providing a ranked list of metadata items for a query interval, a combination of time intervals from the pre-selected time intervals may be determined that cover the query time interval, and the ranked lists of metadata items for each time interval in the combination of time intervals that cover the query time interval may be aggregated and output for visualization.
    Type: Application
    Filed: May 19, 2006
    Publication date: November 22, 2007
    Applicant: Yahoo! Inc.
    Inventors: Micah Joel Dubinko, Shanmugasundaram Ravikumar, Joseph Andrew Magnani, Jasmine Novak, Prabhakar Raghavan, Andrew Tomkins
  • Publication number: 20070255684
    Abstract: An improved system and method for evolutionary clustering of sequential data sets is provided. A snapshot cost may be determined for representing the data set for a particular clustering method used and may determine the cost of clustering the data set independently of a series of clusterings of the data sets in the sequence. A history cost may also be determined for measuring the distance between corresponding clusters of the data set and the previous data set in the sequence of data sets to determine a cost of clustering the data set as part of a series of clusterings of the data sets in the sequence. An overall cost may be determined for clustering the data set by minimizing the combination of the snapshot cost and the history cost. Any clustering method may be used, including flat clustering and hierarchical clustering.
    Type: Application
    Filed: April 29, 2006
    Publication date: November 1, 2007
    Applicant: Yahoo! Inc.
    Inventors: Deepayan Chakrabarti, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Publication number: 20070255736
    Abstract: An improved system and method for evolutionary clustering of sequential data sets is provided. A snapshot cost may be determined for representing the data set for a particular clustering method used and may determine the cost of clustering the data set independently of a series of clusterings of the data sets in the sequence. A history cost may also be determined for measuring the distance between corresponding clusters of the data set and the previous data set in the sequence of data sets to determine a cost of clustering the data set as part of a series of clusterings of the data sets in the sequence. An overall cost may be determined for clustering the data set by minimizing the combination of the snapshot cost and the history cost. Any clustering method may be used, including flat clustering and hierarchical clustering.
    Type: Application
    Filed: April 29, 2006
    Publication date: November 1, 2007
    Applicant: Yahoo! Inc.
    Inventors: Deepayan Chakrabarti, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Publication number: 20070255737
    Abstract: An improved system and method for evolutionary clustering of sequential data sets is provided. A snapshot cost may be determined for representing the data set for a particular clustering method used and may determine the cost of clustering the data set independently of a series of clusterings of the data sets in the sequence. A history cost may also be determined for measuring the distance between corresponding clusters of the data set and the previous data set in the sequence of data sets to determine a cost of clustering the data set as part of a series of clusterings of the data sets in the sequence. An overall cost may be determined for clustering the data set by minimizing the combination of the snapshot cost and the history cost. Any clustering method may be used, including flat clustering and hierarchical clustering.
    Type: Application
    Filed: April 29, 2006
    Publication date: November 1, 2007
    Applicant: Yahoo! Inc.
    Inventors: Deepayan Chakrabarti, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Publication number: 20070094232
    Abstract: A by-line extraction system detects a set of potential headlines from a title meta-tag of a crawled document, selects a candidate headline from the set of potential headlines, and extracts the by-line information from the document using the location of the selected candidate headline. The system constructs the set of potential headlines based on the title meta-tag. The system selects a candidate headline by evaluating the set of potential headlines in order of the lengths of the potential headlines. The system extracts the by-line information from the document by using the location of the selected candidate headline to extract a string representing a date, a name, or a source located within a minimum distance from the location of the potential headline.
    Type: Application
    Filed: October 25, 2005
    Publication date: April 26, 2007
    Inventors: Stephen Dill, Madhukar Korupolu, Andrew Tomkins
  • Publication number: 20070078880
    Abstract: A system and method of indexing a plurality of entities located in a taxonomy, the entities comprising sets of terms, comprises receiving terms in an index structure; building a posting list for an entity with respect to the locations of the set of terms defining the entity and data associated with the respective terms; and indexing a name of a group comprising the entities within this group at the location of the entities with the data of the group comprising the name of the respective entity at each location. The building of the posting list comprises storing the location of the term and data associated with the term in an entry in the posting list for the term. The method comprises indexing aliases of the name of the group comprising the term, and using an inverted list index to associate data with each occurrence of an index term.
    Type: Application
    Filed: September 30, 2005
    Publication date: April 5, 2007
    Applicant: International Business Machines Corporation
    Inventors: Nadav Eiron, Daniel Meredith, Joerg Meyer, Jan Pieper, Andrew Tomkins
  • Publication number: 20070078811
    Abstract: A system and method of crawling at least one website comprising at least one URL includes maintaining a lookup structure comprising all of the URLs known to be on a website; calculating a hub score for each webpage of the website to be recrawled, wherein the hub score measures how likely the to-be-recrawled webpage is to include links to fresh content published on the website; sorting all the to-be-recrawled pages by their hub scores; and crawling the to-be-recrawled pages in order from highest hub score to lowest hub score. The calculating comprises computing a first value equaling a percentage of a number of new relative URLs on the to-be-recrawled page; computing a second value equaling a percentage of a previous hub score of the to-be-recrawled page; and computing the hub score as a sum of the first and the second values.
    Type: Application
    Filed: September 30, 2005
    Publication date: April 5, 2007
    Applicant: International Business Machines Corporation
    Inventors: Srinivasan Balasubramanian, Michael Ching, Piyoosh Jalan, Satish Penmetsa, Andrew Tomkins
  • Publication number: 20070027741
    Abstract: A sales prediction system predicts sales from online public discussions. The system utilizes manually or automatically formulated predicates to capture subsets of postings in online public discussions. The system predicts spikes in sales rank based on online chatter. The system comprises automated algorithms that predict spikes in sales rank given a time series of counts of online discussions such as blog postings. The system utilizes a stateful model of customer behavior based on a series of states of excitation that are increasingly likely to lead to a purchase decision. The stateful model of customer behavior yields a predictor of sales rank spikes that is significantly more accurate than conventional techniques operating on sales rank data alone.
    Type: Application
    Filed: July 27, 2005
    Publication date: February 1, 2007
    Inventors: Daniel Gruhl, Ramanathan Guha, Jasmine Novak, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Publication number: 20060282411
    Abstract: A multi-structural query system performs a high-level multi-dimensional query on a multi-structural database. The query system enables a user to navigate a search by adding restrictions incrementally. The query system uses a schema to discover structure in a multi-structural database. The query system leaves a choice of nodes to return in response to a query as a constrained set of choices available to the algorithm. The query system further casts the selection of a set of nodes as an optimization. The query system uses pairwise-disjoint collections to capture a concise set of highlights of a data set within the allowed schema. The query system further comprises efficient algorithms that yield approximately optimal solutions for several classes of objective functions.
    Type: Application
    Filed: June 13, 2005
    Publication date: December 14, 2006
    Inventors: Ronald Fagin, Ramanathan Guha, Phokion Kolaitis, Jasmine Novak, Shanmugasundaram Ravikumar, Dandapani Sivakumar, Andrew Tomkins
  • Publication number: 20060248037
    Abstract: A system and method of data mining comprises processing contents of a primary posting index; and producing a posting within a secondary posting index based on the processing of the contents of the primary posting index, wherein the processing of contents of the primary posting index comprises submitting a disjunction of terms or phrases to the primary posting index. The processing of contents of the primary posting index comprises generating a query result by submitting a query to the primary posting index using a query language of the primary posting index. Moreover, the processing of contents of the primary posting index comprises processing the primary posting index in order to generate results, wherein the results comprise a set of candidate entries with additional metadata; and filtering the results in order to produce the posting within the secondary posting index.
    Type: Application
    Filed: April 29, 2005
    Publication date: November 2, 2006
    Applicant: International Business Machines Corporation
    Inventors: Joerg Meyer, Jan Pieper, Andrew Tomkins
  • Publication number: 20060112089
    Abstract: Systems and methods are herein disclosed for assessing the staleness of a web page. In particular, in one method of the present invention, the staleness of a web page is assessed by examining internal date references within the web page. In another method of the present invention, the staleness of a web page is assessed by examining the meta-data associated with the web page. In a further method of the present invention, the staleness of a hyperlinked web page is determined by examining the link status of the hyperlinks. If the web page has a relatively large number of dead links, it is assessed as being a stale web page. In a still further method of the present invention, the link status of web pages in the neighborhood of the web page being assessed is likewise examined.
    Type: Application
    Filed: November 22, 2004
    Publication date: May 25, 2006
    Inventors: Andrei Broder, Ziv Bar-Yossef, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Publication number: 20050256949
    Abstract: A communication pattern inducing system focuses on the propagation of topics amongst a plurality of nodes based on the text of the node rather than hyperlinks of the node. A node could represent a weblog or any other source of information such as a person, a conversation, images, etc. The system utilizes a model for information diffusion, wherein the parameters of the model capture how a new topic spreads from node to node. The system further comprises a process to learn the parameters of the model based on real data and to apply the process to real (or synthetic) node data. Consequently, the system is able to identify particular individuals that are highly effective at contributing to the spread of topics.
    Type: Application
    Filed: May 14, 2004
    Publication date: November 17, 2005
    Applicant: International Business Machines Corporation
    Inventors: Daniel Gruhl, Ramanathan Guha, Andrew Tomkins
  • Publication number: 20050256905
    Abstract: A topic segmenting system segments a topic into chatter and subtopics. The system decomposes a conversation into topics, producing a time-based structure for topics and subtopics in the conversation. The system extracts a large number of topics at all levels of granularity. Some of the topics extracted correspond to broad topics and some correspond to “spiky” topics or subtopics. The system comprises a process for automatically detecting spiky regions of a topic. For each possible broad topic, the present system finds regions where coverage of the broad topic overlaps significantly with the spiky region of another topic. The system then removes the spiky subtopic from the conversation. Processing is repeated until all discernable topics have been identified and removed from the conversation, yielding random topics of little duration or intensity.
    Type: Application
    Filed: May 15, 2004
    Publication date: November 17, 2005
    Applicant: International Business Machines Corporation
    Inventors: Daniel Gruhl, Ramanathan Guha, Andrew Tomkins
  • Publication number: 20050243736
    Abstract: An optimal path selection system extracts a connection subgraph in real time from an undirected, edge-weighted graph such as a social network that best captures the connections between two nodes of the graph. The system models the undirected, edge-weighted graph as an electrical circuit and solves for a relationship between two nodes in the undirected edge-weighted graph based on electrical analogues in the electric graph model. The system optionally accelerates the computations to produce approximate, high-quality connection subgraphs in real time on very large (disk resident) graphs. The connection subgraph is constrained to the integer budget that comprises a first node, a second node and a collection of paths from the first node to the second node that maximizes a “goodness” function g(H). The goodness function g(H) is tailored to capture salient aspects of a relationship between the first node and the second node.
    Type: Application
    Filed: April 19, 2004
    Publication date: November 3, 2005
    Applicant: International Business Machines Corporation
    Inventors: Christos Faloutsos, Kevin Snow McCurley, Andrew Tomkins
  • Patent number: 6418244
    Abstract: Inventive two-dimensional barcodes, each having encoded digital information in a bitmap representing preferably randomized encoded data bits, are printed onto a printed medium. Preferably, error correction codes are added to the digital information to ensure that the decoding process accurately reproduces the digital information. In one embodiment, the bitmap may further include “anchor” bits in each corner, which are used as part of the skew estimation and deskewing processes during decoding. In a second embodiment, no “anchor” bits are required. The encoded digital information is mapped into the two-dimensional barcode in such a way as to minimize the errors caused by damage to particular rows and/or columns, for example, row damage caused by faxing the printed barcode. To extract the encoded digital information from the printed medium, the printed medium is scanned, then the bitmap is located within the printed medium.
    Type: Grant
    Filed: January 23, 2001
    Date of Patent: July 9, 2002
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Jiangying Zhou, Daniel P. Lopresti, Andrew Tomkins
  • Patent number: 6356899
    Abstract: A method for identifying, filtering, ranking and cataloging information elements, as for example, World Wide Web pages of the Internet, in whole, part, or in combination. The method is preferably implemented in computer software and features steps for enabling a user to interactively create an information database including preferred information elements such as preferred World Wide Web pages in whole, part, or in combination. The method includes steps for enabling a user to interactively create a frame-based, hierarchical organizational structure for the information elements, and steps for identifying and automatically filtering and ranking by relevance, information elements, such as World Wide Web pages for populating the structure, to form, for example, a searchable, World Wide Web page database.
    Type: Grant
    Filed: March 3, 1999
    Date of Patent: March 12, 2002
    Assignee: International Business Machines Corporation
    Inventors: Soumen Chakrabarti, Byron Edward Dom, David Andrew Gibson, Prabhakar Raghavan, Sridhar Rajagopalan, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Patent number: 6336112
    Abstract: A method for cataloging, filtering and ranking information, as for example, World Wide Web pages of the Internet. The method is preferably implemented in computer software and features steps for enabling a user to interactively create an information database including preferred information elements such as preferred-authority World Wide Web pages. The method includes steps for enabling a user to interactively create a frame-based, hierarchical organizational structure for the information elements, and steps for identifying and automatically filtering and ranking by relevance, information elements, such as World Wide Web pages for populating the structure, to form, for example, a searchable, World Wide Web page database.
    Type: Grant
    Filed: March 16, 2001
    Date of Patent: January 1, 2002
    Assignee: International Business Machines Corporation
    Inventors: Soumen Chakrabarti, Byron Edward Dom, David Andrew Gibson, Prabhakar Raghavan, Sridhar Rajagopalan, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Patent number: 6334131
    Abstract: A method for cataloging, filtering and ranking information, as for example, World Wide Web pages of the Internet. The method is preferably implemented in computer software and features steps for enabling a user to interactively create an information database including preferred information elements such as preferred-authority World Wide Web pages. The method includes steps for enabling a user to interactively create a frame-based, hierarchical organizational structure for the information elements, and steps for identifying and automatically filtering and ranking by relevance, information elements, such as World Wide Web pages for populating the structure, to form, for example, a searchable, World Wide Web page database.
    Type: Grant
    Filed: August 29, 1998
    Date of Patent: December 25, 2001
    Assignee: International Business Machines Corporation
    Inventors: Soumen Chakrabarti, Byron Edward Dom, David Andrew Gibson, Prabhakar Raghavan, Sridhar Rajagopalan, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Publication number: 20010039544
    Abstract: A method for cataloging, filtering and ranking information, as for example, World Wide Web pages of the Internet. The method is preferably implemented in computer software and features steps for enabling a user to interactively create an information database including preferred information elements such as preferred-authority World Wide Web pages. The method includes steps for enabling a user to interactively create a frame-based, hierarchical organizational structure for the information elements, and steps for identifying and automatically filtering and ranking by relevance, information elements, such as World Wide Web pages for populating the structure, to form, for example, a searchable, World Wide Web page database.
    Type: Application
    Filed: August 29, 1998
    Publication date: November 8, 2001
    Inventors: Soumen Chakrabarti, Byron Edward Dom, David Andrew Gibson, Prabhakar Raghavan, Sridhar Rajagopalan, Shanmugasundaram Ravikumar, Andrew Tomkins
  • Publication number: 20010016846
    Abstract: A method for cataloging, filtering and ranking information, as for example, World Wide Web pages of the Internet. The method is preferably implemented in computer software and features steps for enabling a user to interactively create an information database including preferred information elements such as preferred-authority World Wide Web pages. The method includes steps for enabling a user to interactively create a frame-based, hierarchical organizational structure for the information elements, and steps for identifying and automatically filtering and ranking by relevance, information elements, such as World Wide Web pages for populating the structure, to form, for example, a searchable, World Wide Web page database.
    Type: Application
    Filed: March 16, 2001
    Publication date: August 23, 2001
    Applicant: International Business Machines Corp.
    Inventors: Soumen Chakrabarti, Byron Edward Dom, David Andrew Gibson, Prabhakar Raghavan, Sridhar Rajagopalan, Shanmugasundaram Ravikumar, Andrew Tomkins
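
As an illustration of one abstract in this listing, the recrawl ordering described under publication number 20070078811 (a hub score computed as a percentage of the number of new relative URLs plus a percentage of the previous hub score, with pages crawled from highest score to lowest) might be sketched roughly as follows. This is only a reading of the abstract, not the applicants' implementation: the Page class, the function names, and the two 0.5 weights are hypothetical, since the abstract does not state the percentages or any data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical record for a page that is due to be recrawled."""
    url: str
    links: set = field(default_factory=set)  # relative URLs seen on the page
    prev_hub_score: float = 0.0              # hub score from the previous crawl


def hub_score(page, known_urls, new_weight=0.5, history_weight=0.5):
    """Sum of two values, following the wording of the abstract.

    The weights are assumptions; the abstract only says "a percentage".
    """
    # First value: a percentage of the number of new (not yet known) URLs.
    new_url_count = len(page.links - known_urls)
    first_value = new_weight * new_url_count
    # Second value: a percentage of the page's previous hub score.
    second_value = history_weight * page.prev_hub_score
    return first_value + second_value


def recrawl_order(pages, known_urls):
    """Sort the to-be-recrawled pages from highest hub score to lowest."""
    return sorted(pages, key=lambda p: hub_score(p, known_urls), reverse=True)


# The lookup structure of URLs already known to be on the site.
known = {"/a", "/b"}
index = Page("/index", {"/a", "/b", "/c", "/d"}, prev_hub_score=1.0)
archive = Page("/archive", {"/a", "/b"}, prev_hub_score=2.0)
ordered = recrawl_order([archive, index], known)
```

With these made-up inputs, the index page links to two URLs that are not yet known, so it is placed ahead of the archive page even though the archive page had the higher previous score.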