Patents by Inventor Alok S. Kirpal

Alok S. Kirpal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8793239
    Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.
    Type: Grant
    Filed: October 8, 2009
    Date of Patent: July 29, 2014
    Assignee: Yahoo! Inc.
    Inventors: Nilesh Dalvi, Raghu Ramakrishnan, Vinay Kakade, Arup Kumar Choudhury, Sathiya Keerthi Selvaraj, Philip Bohannon, Mani Abrol, David Ciemiewicz, Arun Shankar Iyer, Vipul Agarwal, Alok S. Kirpal
  • Publication number: 20110087646
    Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.
    Type: Application
    Filed: October 8, 2009
    Publication date: April 14, 2011
    Inventors: Nilesh Dalvi, Raghu Ramakrishnan, Vinay Kakade, Arup Kumar Choudhury, Sathiya Keerthi Selvaraj, Philip Bohannon, Mani Abrol, David Ciemiewicz, Arun Shankar Iyer, Vipul Agarwal, Alok S. Kirpal
  • Publication number: 20100241639
    Abstract: Disclosed are methods and apparatus for extracting (or annotating) structured information from web content. Web content of interest from a particular domain is represented as one or more tree instances having a plurality of branching nodes that each correspond to a web object such that the tree instances correspond to one or more structured data instances. The particular domain is associated with domain knowledge that includes one or more presentation rulesets that each specifies a particular structure for a set of data instances, a domain-specific concept labeler, one or more specified properties of the web objects in the tree instances, and a concept schema that specifies a representation of the data to be extracted from the web content. A structured data instance that conforms to the concept schema is extracted from the one or more tree instances based on the domain knowledge for the particular domain.
    Type: Application
    Filed: March 20, 2009
    Publication date: September 23, 2010
    Applicant: YAHOO! INC.
    Inventors: Daniel Kifer, Srujana Merugu, Ankur Jain, Sathiya Keerthi Selvaraj, Alok S. Kirpal, Philip L. Bohannon, Raghu Ramakrishnan
  • Publication number: 20100223214
    Abstract: A method and apparatus for automatically extracting information from a large number of documents through applying machine learning techniques and exploiting structural similarities among documents. A machine learning model is trained to have at least 50% accuracy. The trained machine learning model is used to identify information attributes in a sample of pages from a cluster of structurally similar documents. A structure-specific model of the cluster is created by compiling a list of top-K locations for each attribute identified by the trained machine learning model in the sample. These top-K lists are used to extract information from the pages of the cluster from which the sample of pages was taken.
    Type: Application
    Filed: February 27, 2009
    Publication date: September 2, 2010
    Inventors: Alok S. Kirpal, Sandeepkumar Bhuramal Satpal, Meghana Kshirsagar, Srinivasan H. Sengamedu
  • Publication number: 20090240638
    Abstract: Subject matter disclosed herein may relate to analyses of uniform resource identifiers associated with web pages, and further may relate to gathering information about web pages by analyzing the uniform resource identifiers.
    Type: Application
    Filed: March 19, 2008
    Publication date: September 24, 2009
    Applicant: Yahoo! Inc.
    Inventors: Alok S. Kirpal, Krishna Leela Poola, Krishna Prasad Chitrapura
  • Publication number: 20090216739
    Abstract: Methods and apparatus are described for use with information extraction techniques based on sequential models. Additional statistics are maintained during inference and employed to boost the accuracy of the extraction algorithm and mitigate the effects of training bias.
    Type: Application
    Filed: February 22, 2008
    Publication date: August 27, 2009
    Applicant: YAHOO! INC.
    Inventors: Alok S. Kirpal, Meghana Kshirsagar
  • Publication number: 20090182759
    Abstract: A method for extracting entities from a web page includes first applying a high precision low recall (HPLR) technique on a first web page, producing one or more entities extracted from the first web page. Then a sequential model is trained using the one or more entities extracted from the first web page. The sequential model is then performed on a second web page, producing one or more entities extracted from the second web page.
    Type: Application
    Filed: January 11, 2008
    Publication date: July 16, 2009
    Applicant: YAHOO! INC.
    Inventor: Alok S. Kirpal
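
Illustrative sketch for publication number 20100223214: the abstract describes compiling, for each attribute, a list of the top-K locations identified by a trained model in a sample of pages from a cluster, then using those lists to extract from the rest of the cluster. The Python below is a minimal sketch of that idea under stated assumptions, not the method as claimed in the application. The abstract does not say how a location is represented, so path-like strings are assumed here. The names build_top_k and extract are hypothetical, and the trained model's predictions are hard-coded stand-ins.

```python
from collections import Counter, defaultdict


def build_top_k(sample_predictions, k=2):
    """Compile a top-K location list per attribute.

    sample_predictions: one dict per sampled page, mapping an attribute
    name to the location where the (assumed) trained model found it.
    """
    counts = defaultdict(Counter)
    for page in sample_predictions:
        for attribute, location in page.items():
            counts[attribute][location] += 1
    # Keep the k most frequently predicted locations for each attribute.
    return {
        attribute: [location for location, _ in counter.most_common(k)]
        for attribute, counter in counts.items()
    }


def extract(page, top_k):
    """Extract one record from a page using the top-K lists.

    page: dict mapping a location to the text found there (assumed form).
    For each attribute, the first listed location present on the page wins.
    """
    record = {}
    for attribute, locations in top_k.items():
        for location in locations:
            if location in page:
                record[attribute] = page[location]
                break
    return record


# Hard-coded stand-ins for model predictions on three sampled pages.
sample = [
    {"title": "/html/body/h1", "price": "/html/body/div[2]/span"},
    {"title": "/html/body/h1", "price": "/html/body/div[2]/span"},
    {"title": "/html/body/h1", "price": "/html/body/div[3]/span"},
]
top_k = build_top_k(sample, k=2)

# A page from the same cluster that was not part of the sample.
new_page = {"/html/body/h1": "Widget", "/html/body/div[3]/span": "$9.99"}
record = extract(new_page, top_k)
print(record)
```

Falling back to the second-ranked location when the first is absent is one simple way to use a ranked list; the application may combine the lists differently.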