Abstract: A method and system of learning, or bootstrapping, facts from semi-structured text is described. Starting with a set of seed facts associated with an object, documents associated with the object are identified. The identified documents are checked to determine if each has at least a first predefined number of seed facts. If a document does have at least a first predefined number of seed facts, a contextual pattern associated with the seed facts is identified and other instances of content in the document matching the contextual pattern are identified. If the document includes at least a second predefined number of the other instances of content matching the contextual pattern, then facts may be extracted from the other instances.
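The bootstrapping steps in the abstract above can be illustrated with a minimal sketch. It assumes a document is a list of text lines, a fact is an (attribute, value) pair, and the "contextual pattern" is simply the literal text surrounding a seed fact on its line. The function name `learn_facts` and the thresholds `min_seeds` and `min_matches` are hypothetical names chosen for this illustration, not taken from the patent.

```python
import re
from collections import Counter

def learn_facts(lines, seed_facts, min_seeds=2, min_matches=2):
    """Return new (attribute, value) pairs found via the seeds' context."""
    contexts = Counter()
    seed_lines = set()
    # Locate seed facts and record the text before, between, and after them.
    for i, line in enumerate(lines):
        for attr, value in seed_facts:
            a = line.find(attr)
            v = line.find(value, a + len(attr)) if a >= 0 else -1
            if v >= 0:
                context = (line[:a], line[a + len(attr):v], line[v + len(value):])
                contexts[context] += 1
                seed_lines.add(i)
    # First threshold: the document must contain enough seed facts.
    if len(seed_lines) < min_seeds:
        return []
    # Use the most frequent context as the contextual pattern.
    (prefix, sep, suffix), _ = contexts.most_common(1)[0]
    pattern = re.compile(
        re.escape(prefix) + r"(.+?)" + re.escape(sep) + r"(.+?)" + re.escape(suffix) + r"$"
    )
    found = []
    for i, line in enumerate(lines):
        if i in seed_lines:
            continue
        match = pattern.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    # Second threshold: enough other instances must match the pattern.
    return found if len(found) >= min_matches else []
```

Requiring both thresholds keeps a single coincidental match from producing facts: the pattern is trusted only when several seeds agree on it and it recurs elsewhere in the same document.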
Abstract: Methods, systems, and products are disclosed for dynamically updating web content using W3C standards. One such method sends a request to a web server for a web page. A web browser receives and renders a static HTML web page. The web browser periodically sends a query to the web server and, in response, receives a latest date and time stamp indicating the latest update to the web page. The web browser compares the latest date and time stamp to a previously stored date and time stamp representing a previous update. If the latest date and time stamp matches the previously stored date and time stamp, then no update has occurred and no update is required. If, however, the date and time stamps do not match, then the web page has changed since the previous update and the web browser retrieves the latest update to the web page.
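The timestamp comparison described above can be sketched as a small simulation. The class `PageWatcher` and the callables `fetch_stamp` and `fetch_page` are hypothetical stand-ins for the browser's query to the web server; the abstract does not specify an API, and a real client would issue these requests from the browser.

```python
class PageWatcher:
    """Polls for a date and time stamp and refetches the page only on change."""

    def __init__(self, fetch_stamp, fetch_page):
        self.fetch_stamp = fetch_stamp    # assumed: returns the latest stamp
        self.fetch_page = fetch_page      # assumed: returns the latest content
        self.stored_stamp = None          # stamp of the previous update
        self.content = None

    def poll(self):
        """Run one periodic check; return True if the page was refreshed."""
        latest = self.fetch_stamp()
        if latest == self.stored_stamp:
            return False                  # stamps match: no update required
        self.content = self.fetch_page()  # stamps differ: retrieve the update
        self.stored_stamp = latest
        return True
```

Comparing stamps for equality rather than ordering mirrors the abstract: any mismatch is treated as a change, so the client never needs to parse the stamp.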
Abstract: A computer-implemented method is disclosed for determining a type of landing page to which to transfer web searchers that enter a particular query, the method comprising: classifying a landing page as one of a plurality of landing page classes with a trained classifier of a computer based on textual content of the landing page; determining, by the computer, characteristics of one or more queries to be associated with the landing page; and choosing, with the computer, whether to retain or to change classification of the landing page to be associated with the one or more queries based on relative average conversion rates of advertisements on a plurality of manually-classified landing pages when associated with the characteristics of the one or more queries.
Type:
Application
Filed:
December 23, 2008
Publication date:
June 24, 2010
Applicant:
Yahoo! Inc.
Inventors:
Evgeniy Gabrilovich, Andrei Broder, Bo Pang, Vanja Josifovski, Hila Becker
Abstract: The present invention provides a method of establishing a plain text document from an HTML document. The method includes the steps of (A) acquiring an HTML document defined by HTML elements, each composed of tags and content between the tags; (B) pre-processing the HTML document by omitting some of the tags (including the content between those tags), whereby the rest of the HTML document comprises at least one target tag (including content between the target tags); (C) using a data structure to store the remaining tags of the pre-processed HTML document; (D) grouping the remaining tags (including the content between the remaining tags) stored in the data structure of the pre-processed HTML document into at least one target group according to the target tag(s); and (E) identifying the target group(s) most related to a title of the HTML document by comparing correlation(s) between the target group(s) and the title, and establishing a plain text document having the content of the identified target group.
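Steps (A) through (E) can be approximated with Python's standard `html.parser`. This is a sketch under stated assumptions: `script` and `style` are the omitted tags, `div` is the target tag, and "correlation" with the title is replaced by a simple count of shared words. None of these choices comes from the abstract, and the names `GroupCollector` and `extract_plain_text` are hypothetical.

```python
import re
from html.parser import HTMLParser

SKIPPED = {"script", "style"}  # assumed tags omitted in pre-processing
TARGET = "div"                 # assumed target tag used for grouping

class GroupCollector(HTMLParser):
    """Collects the title and the text of each top-level target group."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.groups = []       # one list of text fragments per target group
        self._skip_depth = 0
        self._target_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == TARGET:
            self._target_depth += 1
            if self._target_depth == 1:
                self.groups.append([])

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "title":
            self._in_title = False
        elif tag == TARGET:
            self._target_depth = max(0, self._target_depth - 1)

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        elif self._target_depth:
            self.groups[-1].append(text)

def words(text):
    return set(re.findall(r"\w+", text.lower()))

def extract_plain_text(html):
    """Return the text of the target group sharing the most words with the title."""
    parser = GroupCollector()
    parser.feed(html)
    parser.close()
    texts = [" ".join(group) for group in parser.groups]
    if not texts:
        return ""
    title_words = words(parser.title)
    return max(texts, key=lambda t: len(words(t) & title_words))
```

A raw overlap count favors longer groups, so a fuller treatment would likely normalize by group length; the sketch keeps the simplest measure that still separates navigation text from body text.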
Abstract: Term negotiation can utilize centralized systems accessed via web interfaces for purposes such as mediation of communications between buyers and sellers, maintenance of a history of negotiations, and notification of parties regarding changes suggested during negotiation. Changes to terms proposed by parties using centralized systems can be stored in a data warehouse, potentially along with timestamp and identification information.
Type:
Application
Filed:
October 16, 2009
Publication date:
April 22, 2010
Inventors:
Gregory Austin Allison, Matthew Allan Vorst
Abstract: Signature schema documents, pre-defined in a query language, provide one or more instructions for application by an engine to transcode web pages of respective web sites. The instructions identify a web page family for the web page and extract a subset of data from the web page using one or more signatures previously identified within web pages of the same web page family (e.g. in accordance with a shared template for each family) of the web site. The instructions may include one or more directional references relative to the signatures to locate and extract the subset of data within the web page. Signatures may comprise text strings within the code of the web page and the directional references indicate positions of respective data relative to the location of the text strings. Transcoding may facilitate use of e-commerce web sites by wireless mobile devices.
Type:
Application
Filed:
May 12, 2008
Publication date:
June 18, 2009
Inventors:
Sang-Heun Kim, Charles Laurence Stinson
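The idea of locating data relative to a signature string can be shown with a small sketch. The function `extract`, its `direction` argument, and the `until` delimiter are hypothetical; the abstract describes signature schema documents written in a query language, which this plain-Python illustration does not attempt to reproduce.

```python
def extract(page, signature, direction="after", until="<"):
    """Return the text adjacent to `signature`, bounded by the `until` string."""
    pos = page.find(signature)
    if pos < 0:
        return None  # signature absent: page is not in the expected family
    if direction == "after":
        start = pos + len(signature)
        end = page.find(until, start)
        return page[start:end if end >= 0 else len(page)].strip()
    # direction == "before": scan backwards from the signature
    start = page.rfind(until, 0, pos)
    begin = start + len(until) if start >= 0 else 0
    return page[begin:pos].strip()
```

Because pages in one family share a template, the same signature and direction can be reused across every page of that family, and only the extracted values differ.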
Abstract: The present invention relates to the field of computer software. More specifically, the present invention relates to methods of assisting aggregation of form-enabled web services. Systems and methods for handling the submission of user data into a plurality of form-enabled web sites are disclosed. The improved system allows for the presentation of a unified user interface, pre-filling of forms in order to increase user efficiency, and a fully automatic interface to the aggregated form-enabled web services.
Type:
Application
Filed:
October 19, 2007
Publication date:
April 23, 2009
Inventors:
David Jonathan Sickmiller, Jonathan Leighton Brown
Abstract: An exemplary embodiment of the present invention sets forth a system, method and/or computer program product which may include a graphical user interface (GUI) application embodied on a computer readable medium, which when executed on a processor performs a method. The method may include receiving a playlist that may include a plurality of content of a plurality of different formats; and enabling a presenter to seamlessly deliver a presentation of the plurality of content to an audience.
Type:
Application
Filed:
July 18, 2008
Publication date:
April 16, 2009
Applicant:
Freepath, Inc.
Inventors:
John C. Schultheiss, Louis C. Douros, Adrian R. Pell, Kathryn M. Manley, John D. Stone, Jacob W. Jorgensen
Abstract: Techniques are described for organizing structurally similar web pages for a website. Fingerprints are made of the structure of the web pages using shingling by placing the web page's HTML tags and attributes in sequence and encoding the tags and attributes using a standard encoding technique. Fixed-size portions of the encoded sequence are taken and a set of values extracted using independent hash functions to compute the shingles. Alternatively, a DOM tree representation of the HTML of the web page is generated and each path of the DOM tree encoded and values extracted using independent hash functions to compute the shingles. A specified number of shingles are retained as the fingerprint. The pages are then clustered based upon the URL and the similarity of the shingles. The clustered hierarchical organization of pages is further pruned by various criteria including similarity of shingles or support of the cluster node in the hierarchy.
Type:
Application
Filed:
August 14, 2007
Publication date:
February 19, 2009
Inventors:
Krishna Prasad Chitrapura, Krishna Leela Poola
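A structural fingerprint of the kind described in the shingling abstract above can be sketched as follows. The sketch assumes a min-hash style scheme: each of several salted SHA-1 hashes stands in for one "independent hash function", and the minimum value over all fixed-size windows is kept per hash. The names `fingerprint` and `similarity`, the window size, and the number of hashes are illustrative assumptions, not details from the patent.

```python
import hashlib
from html.parser import HTMLParser

class TagSequence(HTMLParser):
    """Records tag names and attribute names in document order, ignoring text."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(tag)
        self.tokens.extend(name for name, _ in attrs)

def fingerprint(html, window=3, num_hashes=8):
    """Return a tuple of num_hashes shingle values describing page structure."""
    parser = TagSequence()
    parser.feed(html)
    tokens = parser.tokens
    # Fixed-size portions (windows) of the tag/attribute sequence.
    count = max(1, len(tokens) - window + 1)
    windows = [" ".join(tokens[i:i + window]) for i in range(count)]
    shingles = []
    for seed in range(num_hashes):
        # Salting with the seed simulates one independent hash function.
        shingles.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{w}".encode()).digest()[:8], "big")
            for w in windows
        ))
    return tuple(shingles)

def similarity(fp_a, fp_b):
    """Fraction of shingle positions on which two fingerprints agree."""
    return sum(x == y for x, y in zip(fp_a, fp_b)) / len(fp_a)
```

Since text and attribute values are ignored, two pages generated from the same template produce identical fingerprints, which is what makes the fingerprint usable as a clustering key.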
Abstract: A web application and a method for creating complex query strings for conducting searches through at least one database comprising structured documents that are structured in content-fields. The application comprises a GUI with an interactive table enabling users to insert search words, where the user can define the relations between at least some of the search words by the words in the interactive table. The searches through the structured documents' database may be conducted according to the search words and the relations between them. Additionally, the application may allow the user to associate content-fields with at least some of the search words and conduct the searches in the content-fields defined for each search word.
Abstract: Method for ordering nodes within hierarchical data. The concept of isolated ordered regions to maintain coordinates of nodes is used by associating each node with coordinates relative to a containing region. Modifications to nodes within a region only affect the nodes in that region, and not nodes in other regions. Traversals that retrieve information from the nodes can rebase the coordinates from their containing region and return with a total order.
Type:
Application
Filed:
August 13, 2007
Publication date:
February 14, 2008
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
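The region-relative coordinates in the node-ordering abstract above can be illustrated with a minimal sketch. It assumes each region stores its nodes in a list, so a node's local coordinate is its list index, and that regions themselves are kept in document order. The class names `Region` and `OrderedDocument` are hypothetical.

```python
class Region:
    """An isolated ordered region; a node's local coordinate is its index."""

    def __init__(self, name):
        self.name = name
        self.nodes = []

class OrderedDocument:
    def __init__(self):
        self.regions = []  # regions in document order

    def add_region(self, name):
        region = Region(name)
        self.regions.append(region)
        return region

    def insert(self, region, local_index, node):
        # Only coordinates inside this region shift; other regions are untouched.
        region.nodes.insert(local_index, node)

    def total_order(self):
        """Rebase local coordinates into one global sequence of (position, node)."""
        result = []
        base = 0
        for region in self.regions:
            for local, node in enumerate(region.nodes):
                result.append((base + local, node))
            base += len(region.nodes)
        return result
```

The cost of a modification is bounded by the size of one region, while the global order is computed only when a traversal asks for it.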
Abstract: The present invention provides a method of rendering document data compliant with an XML-based mark-up language, comprising the steps of: fetching the document data; parsing the document data into a document object model (DOM) representation so as to provide a tree structure comprising nodes representative of the document data elements, including tags and/or attributes; reconstructing the document object model (DOM) representation by replacing the nodes of pre-specified elements of said document data elements by one or more nodes comprising standard XML compliant elements having standard tags and attributes; and rendering the document data with the reconstructed document object model (DOM) representation.
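The reconstruction step can be sketched with Python's `xml.etree.ElementTree`. The mapping `REPLACEMENTS` and the element names in it are invented for illustration, and the sketch simplifies the abstract's "one or more nodes" to a one-to-one replacement that rewrites each matching node's tag and attributes in place.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: pre-specified element -> (standard tag, attributes).
REPLACEMENTS = {
    "headline": ("h1", {}),
    "para": ("p", {}),
    "aside-box": ("div", {"class": "aside"}),
}

def reconstruct(xml_text):
    """Parse the document, replace pre-specified elements, and serialize it."""
    root = ET.fromstring(xml_text)         # parse into a tree of nodes
    for elem in root.iter():               # visit every node in the tree
        if elem.tag in REPLACEMENTS:
            tag, attrs = REPLACEMENTS[elem.tag]
            elem.tag = tag                 # swap in the standard tag
            elem.attrib.update(attrs)      # add the standard attributes
    return ET.tostring(root, encoding="unicode")
```

Elements not listed in the mapping pass through unchanged, so a renderer that understands only the standard elements can consume the output directly.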