Patents by Inventor Mahesh Tiyyagura

Mahesh Tiyyagura has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Reduction of annotations to extract structured web data

Patent number: 8046360

Abstract: Documents, such as web pages of a domain, are annotated to facilitate extracting structured information from the documents. The documents are clustered. Each cluster is such that the documents within that cluster are similar to each other at least with respect to a first threshold, such as according to a shingling metric, where the first threshold is an 8/8 shingling match. There is at least one overlap cluster, each overlap cluster including at least one of the plurality of clusters such that documents of the at least one cluster included in that overlap cluster are similar to each other at least with respect to a second threshold that is lower than the first threshold. A particular overlap cluster is designated, as is a particular cluster of the particular overlap cluster. For the particular designated cluster, an obtained annotation is transferred to other clusters included in the designated particular overlap cluster.

Type: Grant

Filed: December 13, 2007

Date of Patent: October 25, 2011

Assignee: Yahoo! Inc.

Inventor: Mahesh Tiyyagura
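
The two-threshold clustering in the abstract of patent 8046360 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: it reads "8/8 shingling match" as eight out of eight min-hash sketch values agreeing, picks a hypothetical second threshold of 6 out of 8, and merges clusters into overlap clusters by simple single-link union. The function names (`sketch`, `cluster`, `overlap_clusters`, `transfer_annotation`) are invented for this example.

```python
import hashlib
from collections import defaultdict

SKETCH_SIZE = 8  # assumption: "8/8" means all 8 sketch values agree


def sketch(tokens, width=3):
    """Return the SKETCH_SIZE smallest hashes of the document's word shingles."""
    shingles = {" ".join(tokens[i:i + width]) for i in range(len(tokens) - width + 1)}
    hashes = sorted(int(hashlib.md5(s.encode()).hexdigest(), 16) for s in shingles)
    return tuple(hashes[:SKETCH_SIZE])


def cluster(docs):
    """First threshold: group documents whose sketches are identical."""
    groups = defaultdict(list)
    for name, tokens in docs.items():
        groups[sketch(tokens)].append(name)
    return dict(groups)


def overlap_clusters(clusters, second_threshold=6):
    """Second (lower) threshold: union clusters sharing enough sketch values.

    second_threshold=6 is a hypothetical value chosen for illustration.
    """
    keys = list(clusters)
    parent = {k: k for k in keys}

    def find(k):
        while parent[k] != k:
            k = parent[k]
        return k

    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if len(set(a) & set(b)) >= second_threshold:
                parent[find(b)] = find(a)

    groups = defaultdict(list)
    for k in keys:
        groups[find(k)].append(k)
    return list(groups.values())


def transfer_annotation(overlap, clusters, annotation):
    """Apply one obtained annotation to every document in the overlap cluster."""
    return {doc: annotation for key in overlap for doc in clusters[key]}
```

In this sketch an annotation obtained for one cluster is reused for every cluster in the same overlap cluster, which is the sense in which the number of required annotations is reduced.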

Inverted indices in information extraction to improve records extracted per annotation

Patent number: 8010544

Abstract: A method is provided for information extraction from among a multiplicity of documents each having a corresponding document object model (DOM) comprising: computing signatures associated with nodes of a multiplicity of DOMs corresponding to the multiplicity of documents; producing an index that associates computed signatures to each document that has a DOM that has one or more nodes corresponding to such signature; annotating one or more nodes of a DOM that corresponds to the at least one selected document; wherein the one or more annotated nodes respectively correspond to one or more respective signatures included in the index; and matching the signatures that correspond to the annotated nodes with signatures in the index to determine which documents from the multiplicity of documents have one or more DOM nodes that correspond to one or more of the annotated nodes.

Type: Grant

Filed: June 6, 2008

Date of Patent: August 30, 2011

Assignee: Yahoo! Inc.

Inventor: Mahesh Tiyyagura
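
The indexing step in the abstract of patent 8010544 can be illustrated with a small inverted index from DOM node signatures to documents. The abstract does not define how a signature is computed, so this sketch assumes the simplest choice, the tag path from the root to the node. The class and function names are hypothetical, and the parser is Python's standard `html.parser`, which does not build a full DOM.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SignatureCollector(HTMLParser):
    """Collects one signature per element: its root-to-node tag path (assumed)."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.signatures = set()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.signatures.add("/".join(self.stack))

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag, tolerating unclosed elements.
            while self.stack.pop() != tag:
                pass


def signatures(html):
    collector = SignatureCollector()
    collector.feed(html)
    return collector.signatures


def build_index(pages):
    """Map each signature to the set of documents containing a matching node."""
    index = defaultdict(set)
    for url, html in pages.items():
        for sig in signatures(html):
            index[sig].add(url)
    return index


def pages_matching(index, annotated_signatures):
    """Documents that contain at least one node matching an annotated node."""
    hits = set()
    for sig in annotated_signatures:
        hits |= index.get(sig, set())
    return hits
```

Annotating a node on one page and looking up its signature in the index then returns every page on which the same annotation could apply, without rescanning the documents.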

Unsupervised detection of web pages corresponding to a similarity class

Patent number: 7941421

Abstract: A method of detecting web pages belonging to at least one similarity class from a plurality of web pages includes determining clusters of the plurality of web pages based on characteristics of the content of the web pages. For each of the determined clusters, at least one metric is determined indicative of similarity among resource locators associated with the web pages of that cluster. A determination of web pages belonging to the at least one similarity class is based on the determined clusters and the determined similarity metrics.

Type: Grant

Filed: March 2, 2010

Date of Patent: May 10, 2011

Assignee: Yahoo! Inc.

Inventor: Mahesh Tiyyagura
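
The abstract of patent 7941421 combines content-based clusters with a per-cluster metric of resource locator similarity. The abstract names neither the metric nor the decision rule, so the following is a sketch under assumptions: the metric is taken to be the mean fraction of position-wise matching URL tokens over all URL pairs in a cluster, and a cluster is accepted when that value reaches a hypothetical cutoff of 0.5. The content clustering itself is assumed to have been done already, and the example URLs are made up.

```python
from itertools import combinations
from urllib.parse import urlsplit


def url_tokens(url):
    """Split a URL into its host plus non-empty path segments."""
    parts = urlsplit(url)
    return [parts.netloc] + [p for p in parts.path.split("/") if p]


def pair_similarity(url_a, url_b):
    """Fraction of token positions at which the two URLs agree."""
    ta, tb = url_tokens(url_a), url_tokens(url_b)
    same = sum(1 for x, y in zip(ta, tb) if x == y)
    return same / max(len(ta), len(tb))


def cluster_url_similarity(urls):
    """Mean pairwise URL similarity; a single-page cluster gives no evidence."""
    pairs = list(combinations(urls, 2))
    if not pairs:
        return 0.0
    return sum(pair_similarity(a, b) for a, b in pairs) / len(pairs)


def similarity_class(clusters, min_similarity=0.5):
    """Pages from content clusters whose URLs are also mutually similar."""
    return [
        url
        for urls in clusters.values()
        if cluster_url_similarity(urls) >= min_similarity
        for url in urls
    ]


clusters = {
    "product": ["http://shop.example/item/1", "http://shop.example/item/2"],
    "misc": ["http://shop.example/about", "http://shop.example/help/faq/shipping"],
}
selected = similarity_class(clusters)
```

Here the "product" cluster scores 2/3 and is kept, while the "misc" cluster scores 1/4 and is dropped.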

UNSUPERVISED DETECTION OF WEB PAGES CORRESPONDING TO A SIMILARITY CLASS

Publication number: 20100161588

Abstract: A method of detecting web pages belonging to at least one similarity class from a plurality of web pages includes determining clusters of the plurality of web pages based on characteristics of the content of the web pages. For each of the determined clusters, at least one metric is determined indicative of similarity among resource locators associated with the web pages of that cluster. A determination of web pages belonging to the at least one similarity class is based on the determined clusters and the determined similarity metrics.

Type: Application

Filed: March 2, 2010

Publication date: June 24, 2010

Applicant: YAHOO! INC.

Inventor: Mahesh Tiyyagura

Unsupervised detection of web pages corresponding to a similarity class

Patent number: 7707229

Abstract: A method of detecting web pages belonging to at least one similarity class from a plurality of web pages includes determining clusters of the plurality of web pages based on characteristics of the content of the web pages. For each of the determined clusters, at least one metric is determined indicative of similarity among resource locators associated with the web pages of that cluster. A determination of web pages belonging to the at least one similarity class is based on the determined clusters and the determined similarity metrics.

Type: Grant

Filed: December 12, 2007

Date of Patent: April 27, 2010

Assignee: Yahoo! Inc.

Inventor: Mahesh Tiyyagura

FRAMEWORK FOR AGGREGATING INFORMATION OF WEB PAGES FROM A WEBSITE

Publication number: 20090319481

Abstract: The present invention is directed towards systems and methods for extending media annotations using collective knowledge. The method according to one embodiment of the present invention comprises receiving a plurality of content items and associated annotations. The method further normalizes the plurality of associated annotations and calculates pair frequencies for the plurality of associated annotations. The method then retrieves a plurality of alternative annotations and provides the plurality of alternative annotations.

Type: Application

Filed: June 18, 2008

Publication date: December 24, 2009

Applicant: Yahoo! Inc.

Inventors: Krishna Prasad Chitrapura, Krishna Leela Poola, Mahesh Tiyyagura

SYSTEM AND METHOD FOR USING CONTEXTUAL SECTIONS OF WEB PAGE CONTENT FOR SERVING ADVERTISEMENTS IN ONLINE ADVERTISING

Publication number: 20090313127

Abstract: An improved system and method for using contextual sections of web page content for serving advertisements in online advertising is provided. A publisher may use a tool to identify sections of a web page that represent content to be used in contextual advertising. When rendered by a web browser, content from marked sections may be extracted from the web page and sent to an advertisement server for selectively matching advertisements for display to a user. Features may be identified from the content sections and used to select advertisements matching the extracted content of the web page. In particular, the features identified from the content sections may be matched with features designated by advertisers for advertisements. Web page placements may be allocated for advertisements matching the extracted content, and the advertisements may be served for display with the web page.

Type: Application

Filed: June 11, 2008

Publication date: December 17, 2009

Applicant: Yahoo! Inc.

Inventors: David Chaiken, Kalyan Kumar Kanuri, Arun Ramanujapuram, Mahesh Tiyyagura

INVERTED INDICES IN INFORMATION EXTRACTION TO IMPROVE RECORDS EXTRACTED PER ANNOTATION

Publication number: 20090307256

Abstract: A method is provided for information extraction from among a multiplicity of documents each having a corresponding document object model (DOM) comprising: computing signatures associated with nodes of a multiplicity of DOMs corresponding to the multiplicity of documents; producing an index that associates computed signatures to each document that has a DOM that has one or more nodes corresponding to such signature; annotating one or more nodes of a DOM that corresponds to the at least one selected document; wherein the one or more annotated nodes respectively correspond to one or more respective signatures included in the index; and matching the signatures that correspond to the annotated nodes with signatures in the index to determine which documents from the multiplicity of documents have one or more DOM nodes that correspond to one or more of the annotated nodes.

Type: Application

Filed: June 6, 2008

Publication date: December 10, 2009

Applicant: Yahoo! Inc.

Inventor: Mahesh Tiyyagura

UNIFORM RESOURCE IDENTIFIER ALIGNMENT

Publication number: 20090240670

Abstract: Subject matter disclosed herein may relate to alignment of uniform resource identifiers associated with web pages, and further may relate to multiple sequence alignment of uniform resource identifiers. In one or more example embodiments, multiple sequence alignment techniques may provide improved tokenization of uniform resource identifiers associated with web pages, which may provide improved performance of applications such as, for example, uniform resource identifier normalization, sitemap construction, etc.

Type: Application

Filed: March 20, 2008

Publication date: September 24, 2009

Applicant: Yahoo! Inc.

Inventors: Mahesh Tiyyagura, Krishna Leela Poola
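
To illustrate what aligning uniform resource identifiers means in the abstract of publication 20090240670, the sketch below aligns just two URIs token by token. It is a simplification: the application concerns multiple sequence alignment, whereas this example substitutes a pairwise alignment from Python's standard `difflib.SequenceMatcher`. The delimiter set used for tokenization and the function names are assumptions, and the example URIs are made up.

```python
import re
from difflib import SequenceMatcher
from itertools import zip_longest


def tokenize(uri):
    """Split a URI on common delimiters (an assumed delimiter set)."""
    return [t for t in re.split(r"[/?&=:]+", uri) if t]


def align(uri_a, uri_b):
    """Return aligned token columns; None marks a gap in one of the URIs."""
    a, b = tokenize(uri_a), tokenize(uri_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    columns = []
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # zip_longest pads the shorter side, covering insert/delete/replace.
        columns.extend(zip_longest(a[i1:i2], b[j1:j2]))
    return columns


def variable_columns(columns):
    """Indices of columns where the two URIs differ."""
    return [i for i, (x, y) in enumerate(columns) if x != y]


columns = align("http://shop.example/item/42", "http://shop.example/en/item/77")
```

After alignment, the "item" tokens fall in the same column even though one URI has an extra "en" segment, so columns that stay constant can be told apart from columns that vary, which is the kind of information URI normalization or sitemap construction could build on.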

TECHNIQUES FOR CONSTRUCTING SITEMAP OR HIERARCHICAL ORGANIZATION OF WEBPAGES OF A WEBSITE USING DECISION TREES

Publication number: 20090171986

Abstract: A decision tree may be determined that is a site map for a domain of web pages. A clustering of a plurality of web pages of a domain is determined, in an unsupervised fashion, based on content-related features of the plurality of web pages. Each determined cluster includes a plurality of web pages, each of the plurality of web pages characterized by a resource locator and each of the resource locators being characterized by at least one resource locator token. The clustering is processed to organize indications of the content-related features of the plurality of web pages into a decision tree characterized by a plurality of nodes, each node characterized by a feature and a value, the feature being at least one of the resource locator tokens and the value being a value of that resource locator token.

Type: Application

Filed: December 27, 2007

Publication date: July 2, 2009

Applicant: YAHOO! INC.

Inventors: Krishna Prasad Chitrapura, Pavan Kumar Ganganahalli Marulappa, Krishna Leela Poola, Mahesh Tiyyagura

REDUCTION OF ANNOTATIONS TO EXTRACT STRUCTURED WEB DATA

Publication number: 20090157597

Abstract: Documents, such as web pages of a domain, are annotated to facilitate extracting structured information from the documents. The documents are clustered. Each cluster is such that the documents within that cluster are similar to each other at least with respect to a first threshold, such as according to a shingling metric, where the first threshold is an 8/8 shingling match. There is at least one overlap cluster, each overlap cluster including at least one of the plurality of clusters such that documents of the at least one cluster included in that overlap cluster are similar to each other at least with respect to a second threshold that is lower than the first threshold. A particular overlap cluster is designated, as is a particular cluster of the particular overlap cluster. For the particular designated cluster, an obtained annotation is transferred to other clusters included in the designated particular overlap cluster.

Type: Application

Filed: December 13, 2007

Publication date: June 18, 2009

Applicant: YAHOO! INC.

Inventor: Mahesh Tiyyagura

UNSUPERVISED DETECTION OF WEB PAGES CORRESPONDING TO A SIMILARITY CLASS

Publication number: 20090157607

Abstract: A method of detecting web pages belonging to at least one similarity class from a plurality of web pages includes determining clusters of the plurality of web pages based on characteristics of the content of the web pages. For each of the determined clusters, at least one metric is determined indicative of similarity among resource locators associated with the web pages of that cluster. A determination of web pages belonging to the at least one similarity class is based on the determined clusters and the determined similarity metrics.

Type: Application

Filed: December 12, 2007

Publication date: June 18, 2009

Applicant: YAHOO! INC.

Inventor: Mahesh Tiyyagura

METHOD FOR NORMALIZING DYNAMIC URLS OF WEB PAGES THROUGH HIERARCHICAL ORGANIZATION OF URLS FROM A WEB SITE

Publication number: 20090063538

Abstract: Techniques are described for normalizing dynamic URLs using a hierarchical organization of a web site. Given web pages associated with a web site, an information extraction method is used to generate data structures that represent the content or structure of each of the web pages. These data structures are appended to the corresponding dynamic URLs. The modified URLs with the data structures are tokenized with the resulting tokens clustered to create a hierarchical organization. Nodes of the hierarchical organization may be merged based upon occurrence or patterns of content and structure. The merged hierarchical organization may then be pruned to remove irrelevant information and to reduce the memory footprint of the hierarchical organization. When a new dynamic URL is received, the new dynamic URL is matched to the hierarchical organization. Important parameters are taken into account and irrelevant information may be removed.

Type: Application

Filed: August 30, 2007

Publication date: March 5, 2009

Inventors: Krishna Prasad Chitrapura, Anandsudhakar Kesari, Alok Kirpal, Mahesh Tiyyagura