Patents Assigned to AltaVista Company

Method and apparatus for ranking web page search results

Publication number: 20040111412

Abstract: A method and apparatus for ranking a plurality of pages identified during a search of a linked database includes forming a linear combination of two or more matrices, and using the coefficients of the eigenvector of the resulting matrix to rank the quality of the pages. The matrices include information about the pages and are generally normalized, stochastic matrices. The linear combination can include attractor matrices that indicate desirable or “high quality” sites, and/or non-attractor matrices that indicate sites that are undesirable. Attractor matrices and non-attractor matrices can be used alone or in combination with each other in the linear combination. Additional bias toward high quality sites, or away from undesirable sites, can be further introduced with probability weighting matrices for attractor and non-attractor matrices. Other known matrices, such as a co-citation matrix or a bibliographic coupling matrix, can also be used in the present invention.

Type: Application

Filed: May 6, 2003

Publication date: June 10, 2004

Applicant: AltaVista Company

Inventor: Andrei Z. Broder
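
As a rough illustration of the abstract for publication 20040111412, the sketch below forms a linear combination of two row-stochastic matrices and approximates a principal eigenvector by power iteration. The matrix values, the weight `alpha`, the function names, and the choice of the left eigenvector (the stationary distribution) are assumptions made for illustration; the abstract does not specify them.

```python
def combine(matrices, weights):
    """Linear combination of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights))
             for j in range(n)] for i in range(n)]


def principal_eigenvector(matrix, iterations=100):
    """Power iteration for the left eigenvector of a row-stochastic matrix."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(v[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        total = sum(v)
        v = [x / total for x in v]  # keep the coefficients summing to 1
    return v


# Hypothetical link matrix for three pages (each row sums to 1).
links = [[0.0, 0.5, 0.5],
         [1.0, 0.0, 0.0],
         [0.5, 0.5, 0.0]]
# Hypothetical attractor matrix: every row points at page 0.
attractor = [[1.0, 0.0, 0.0],
             [1.0, 0.0, 0.0],
             [1.0, 0.0, 0.0]]

alpha = 0.8  # assumed weight of the link matrix in the combination
combined = combine([links, attractor], [alpha, 1 - alpha])
scores = principal_eigenvector(combined)
# Pages ordered from highest to lowest eigenvector coefficient.
ranking = sorted(range(len(scores)), key=lambda p: scores[p], reverse=True)
```

Because the weights sum to 1, the combined matrix stays row-stochastic, so the iteration settles on a probability vector whose largest coefficients mark the highest-ranked pages.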

Method for search engine generating supplemented search not included in conventional search result identifying entity data related to portion of located web page

Patent number: 6735585

Abstract: It is determined that an entity is registered as having control over the use of at least a portion of a World-Wide Web address for a first set of computer data that is accessible in accordance with Internet protocols. Enhanced search results are acquired that include a second set of computer data. The second set of computer data includes or links to information pertaining to the entity, other than information provided at the Web address.

Type: Grant

Filed: August 12, 1999

Date of Patent: May 11, 2004

Assignee: AltaVista Company

Inventors: Jeffery Dean Black, Jason Harvey Titus, Ira Joseph Woodhead

Technique for organizing data information in a network

Patent number: 6704738

Abstract: A technique for organizing data information in a network having a plurality of network stations is disclosed. In one embodiment, the technique is realized by having a first processing device store a representation of data at an address of a first of the plurality of network stations. A second processing device then stores the address at a second of the plurality of network stations in association with an identifier of the data. A third processing device then stores the data identifier at a third of the plurality of network stations in association with an annotation of the data.

Type: Grant

Filed: December 3, 1998

Date of Patent: March 9, 2004

Assignee: AltaVista Company

Inventors: Arjen P. de Vries, Leonidas Kontothanassis, Frederic Dufaux, Michael Sokolov, David E. Kovalcin, Brian Eberman

Web page connectivity server

Patent number: 6598051

Abstract: A connectivity server for collecting, arranging and representing data defining the interconnection of pages on the World Wide Web (Web). A URL Database stores URLs and associates a fingerprint and CS_id with each URL. The URL Database interface is operable to translate between any two of a URL, a fingerprint, and a Host_id. A Host Database associates a Host_id with each distinct hostname in the URL Database. The Host Database interface is operable to accept a Host_id and return a number equal to the number of URLs on the respective host and to return the CS_ids of those URLs. A Link Database stores links between source URLs and destination URLs. The Link Database interface is operable to retrieve, for a given CS_id, the number of inlinks to and outlinks from the URL corresponding to the CS_id.

Type: Grant

Filed: September 19, 2000

Date of Patent: July 22, 2003

Assignee: AltaVista Company

Inventors: Janet L. Wiener, Raymond P. Stata, Michael Burrows

Technique for deleting duplicate records referenced in an index of a database

Publication number: 20020049753

Abstract: A computer implemented method performs constrained searching of an index of a database. The information of the database is stored as a plurality of records. A unique location is assigned to each indexable portion of information of the database. Index entries are written to a memory where each index entry includes a word entry representing a unique indexable portion of information, and one or more location entries for each occurrence of the unique indexable portion of information. The index entries are sorted according to a collating order of the word entries, and sequentially according to the location entries of each index entry. A query is parsed to generate a first term and a second term related by an AND logical operator; the AND operator requires that a first index entry corresponding to the first term and a second index entry corresponding to the second term both have locations in the same record to satisfy a query.

Type: Application

Filed: August 3, 2001

Publication date: April 25, 2002

Applicant: AltaVista Company

Inventor: Michael Burrows
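
The AND constraint described in the abstract for publication 20020049753 can be pictured with a small positional index. This is only a sketch under assumptions: the `record_starts` list (the first location of each record), the dictionary layout of the index, and the function names are hypothetical and are not taken from the application.

```python
from bisect import bisect_right


def record_of(location, record_starts):
    """Map a location to the number of the record that contains it."""
    return bisect_right(record_starts, location) - 1


def and_query(index, record_starts, first, second):
    """Records in which both terms have at least one location."""
    first_records = {record_of(loc, record_starts)
                     for loc in index.get(first, [])}
    second_records = {record_of(loc, record_starts)
                      for loc in index.get(second, [])}
    return sorted(first_records & second_records)


# Hypothetical data: three records starting at locations 0, 10 and 20.
record_starts = [0, 10, 20]
# Word entry -> sorted location entries.
index = {"web": [1, 12], "crawler": [3, 25]}

matches = and_query(index, record_starts, "web", "crawler")
```

Only record 0 contains both words, so it is the only record that satisfies the query in this example.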

Method for clustering closely resembling data objects

Patent number: 6349296

Abstract: A computer-implemented method determines the resemblance of data objects such as Web pages. Each data object is partitioned into a sequence of tokens. The tokens are grouped into overlapping sets of the tokens to form shingles. Each shingle is represented by a unique identification element encoded as a fingerprint. A minimum element from each of the images of the set of fingerprints associated with a document under each of a plurality of pseudo random permutations of the set of all fingerprints is selected to generate a sketch of each data object. The sketches characterize the resemblance of the data objects. The sketches can be further partitioned into a plurality of groups. Each group is fingerprinted to form a feature. Data objects that share more than a certain number of features are estimated to be nearly identical.

Type: Grant

Filed: August 21, 2000

Date of Patent: February 19, 2002

Assignee: AltaVista Company

Inventors: Andrei Z. Broder, Steven C. Glassman, Charles G. Nelson, Mark S. Manasse, Geoffrey G. Zweig
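
The shingle-and-sketch steps in the abstract for patent 6349296 can be approximated in a few lines. This is a minimal sketch under stated assumptions: whitespace tokenization, seeded SHA-256 hashes standing in for the pseudo random permutations, and all function names and parameter values (`k`, `num_permutations`) are illustrative choices, not details from the patent. The later grouping of sketches into features is omitted.

```python
import hashlib


def shingles(text, k=3):
    """Overlapping k-token shingles of a whitespace-tokenized text."""
    tokens = text.split()
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def fingerprint(shingle, seed):
    """64-bit fingerprint; the seed stands in for one permutation."""
    digest = hashlib.sha256(f"{seed}:{shingle}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(text, num_permutations=64, k=3):
    """Minimum fingerprint of the shingle set under each seeded hash."""
    shingle_set = shingles(text, k)
    if not shingle_set:
        raise ValueError("text must contain at least k tokens")
    return [min(fingerprint(s, seed) for s in shingle_set)
            for seed in range(num_permutations)]


def estimated_resemblance(sketch_a, sketch_b):
    """Fraction of positions at which two sketches agree."""
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)
```

Two objects whose sketches agree in most positions share most of their shingles, which is the signal used to estimate that they are nearly identical.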

Technique for annotating media

Patent number: 6332144

Abstract: To annotate media, one or more particular times within a period defined by a start time and an end time of a media stream forming an item of audio or video media are identified. The identified times are those at which content within the media stream corresponds to an annotation value. The annotation value is associated with the identified times to annotate the media.

Type: Grant

Filed: December 3, 1998

Date of Patent: December 18, 2001

Assignee: AltaVista Company

Inventors: Arjen P. deVries, Michael Sokolov, David E. Kovalcin, Brian Eberman

Technique for locating an item of interest within a stored representation of data

Publication number: 20010051958

Abstract: A technique for accessing an item of interest within a particular one of a plurality of stored representations of data is disclosed. In one embodiment, the technique is realized by having a processing device searching a plurality of stored annotations corresponding to different items within the plurality of stored representations to locate an annotation of interest corresponding to the item of interest. The annotation of interest has an associated search identifier and an associated location identifier corresponding to a location of interest within the particular one of the plurality of stored representations. The processing device then searches a plurality of stored search identifiers associated with the plurality of stored annotations to locate the search identifier and an address identifier corresponding to a location of the particular one of the plurality of stored representations within the plurality of stored representations.

Type: Application

Filed: March 22, 2001

Publication date: December 13, 2001

Applicant: AltaVista Company

Inventors: Arjen P. deVries, Michael Sokolov, David E. Kovalcin, Brian Eberman, Leonidas Kontothanassis

System and method for enforcing politeness while scheduling downloads in a web crawler

Patent number: 6321265

Abstract: A web crawler downloads data sets from among a plurality of host computers. The web crawler enqueues data set addresses in a set of queues, with all data set addresses sharing a respective common host address being stored in a respective common one of the queues. Each non-empty queue is assigned a next download time. Multiple threads substantially concurrently process the data set addresses in the queues. The number of queues is at least as great as the number of threads, and the threads are dynamically assigned to the queues. In particular, each thread selects a queue not being serviced by any of the other threads. The queue is selected in accordance with the next download times assigned to the queues. The data set corresponding to a data set address in the selected queue is downloaded and processed, and the data set address is dequeued from the selected queue. When the selected queue is not empty after the dequeuing step, it is assigned an updated download time.

Type: Grant

Filed: November 2, 1999

Date of Patent: November 20, 2001

Assignee: AltaVista Company

Inventors: Marc Alexander Najork, Clark Allan Heydon
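
The per-host queues and next download times described in the abstract for patent 6321265 can be sketched as follows. This is a single-threaded simplification, so the dynamic assignment of threads to queues is left out; the fixed `delay`, the class name, and the method names are assumptions for illustration and do not come from the patent.

```python
import heapq
from collections import deque
from urllib.parse import urlsplit


class PoliteFrontier:
    """One FIFO queue per host, each with a next download time."""

    def __init__(self, delay=2.0):
        self.delay = delay        # assumed pause between hits on one host
        self.queues = {}          # host -> deque of URLs (non-empty only)
        self.next_allowed = {}    # host -> earliest permitted download time
        self.ready = []           # heap of (next download time, host)

    def enqueue(self, url, now=0.0):
        host = urlsplit(url).netloc
        if host not in self.queues:
            # A newly non-empty queue is assigned a next download time.
            self.queues[host] = deque()
            start = max(now, self.next_allowed.get(host, 0.0))
            heapq.heappush(self.ready, (start, host))
        self.queues[host].append(url)

    def next_url(self, now):
        """Return a URL whose host may be contacted at `now`, else None."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, host = heapq.heappop(self.ready)
        queue = self.queues[host]
        url = queue.popleft()
        self.next_allowed[host] = now + self.delay
        if queue:
            # Still non-empty: assign an updated download time.
            heapq.heappush(self.ready, (now + self.delay, host))
        else:
            del self.queues[host]
        return url
```

Keeping all addresses of one host in one queue means a host is never contacted again before its updated download time, whichever order the URLs were discovered in.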

Method and apparatus for preventing topic drift in queries in hyperlinked environments

Patent number: 6321220

Abstract: A method and apparatus for preventing topic drift in queries in hyperlinked environments uses equivalence components for ranking pages containing information that is relevant to the topic of a user query input to a search engine. The method includes the steps of providing a query to a search engine, where the query represents a predetermined topic; retrieving at least one page associated with the query; constructing a graph representing the pages in memory; creating at least one equivalence component representing a subset of the graph; processing each equivalence component; eliminating the equivalence component in accordance with whether it matches the predetermined topic; and ranking the remaining pages.

Type: Grant

Filed: December 7, 1998

Date of Patent: November 20, 2001

Assignee: AltaVista Company

Inventors: Jeffrey Dean, Monika R. Henzinger, Krishna Asur Bharat

Technique for ranking records of a database

Patent number: 6317741

Abstract: A technique for ranking records of a database is disclosed. The database records to be ranked are located during a search of an index to the database performed in response to a query received from a user. The index has a plurality of index entries, wherein each index entry has a weight. The query has a plurality of query terms, wherein each query term corresponds to an index entry. In one embodiment, the technique is realized by scoring each located record according to the number of times portions of information corresponding to each query term occur in each record and the weight of each index entry corresponding to each occurring query term. The score and an identifier of each located record are then stored in a respective entry of a ranking list. The ranking list has a limit on the number of entries that are stored therein.

Type: Grant

Filed: August 7, 2000

Date of Patent: November 13, 2001

Assignee: AltaVista Company

Inventor: Michael Burrows
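
The bounded ranking list in the abstract for patent 6317741 can be illustrated with a fixed-size heap. This is a sketch under assumptions: the score is taken to be the sum of occurrence count times index-entry weight, which is one plausible reading of the abstract rather than the patented formula, and the data layout and names are hypothetical.

```python
import heapq


def rank_records(occurrences, weights, limit):
    """Keep the `limit` best (score, record_id) entries.

    occurrences: record_id -> {query term: number of occurrences}
    weights:     query term -> weight of its index entry
    """
    ranking = []  # min-heap, never holds more than `limit` entries
    for record_id, counts in occurrences.items():
        score = sum(count * weights[term] for term, count in counts.items())
        entry = (score, record_id)
        if len(ranking) < limit:
            heapq.heappush(ranking, entry)
        elif entry > ranking[0]:
            # Replace the current lowest-scoring entry.
            heapq.heapreplace(ranking, entry)
    return sorted(ranking, reverse=True)


# Hypothetical located records and index-entry weights.
occurrences = {
    "r1": {"web": 2, "crawler": 1},
    "r2": {"web": 1},
    "r3": {"crawler": 3},
}
weights = {"web": 1.0, "crawler": 2.0}

top = rank_records(occurrences, weights, limit=2)
```

With a limit of 2, the lowest-scoring record `r2` is dropped, so the list never grows beyond the stated number of entries.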

Technique for matching a query to a portion of media

Patent number: 6311189

Abstract: A method for matching a query to a portion of media includes receiving a query relating to media of interest and searching, based upon the query, an index of annotations. Each of the annotations represents a respective item of available media and includes a plurality of annotation values. Each of the plurality of annotation values represents a portion of the represented item of available media. By matching the query to an annotation value within the index, the start time of a media stream forming the portion of the item of available media represented by the identified annotation value can be identified. The identified media stream start time can then be provided in response to the query, allowing the appropriate portion of the applicable item of available media to be directly accessed.

Type: Grant

Filed: December 3, 1998

Date of Patent: October 30, 2001

Assignee: AltaVista Company

Inventors: Arjen P. deVries, Michael Sokolov, David E. Kovalcin, Brian Eberman, Leonidas Kontothanassis

Technique for processing data

Patent number: 6275827

Abstract: A technique for processing data is disclosed. In one embodiment, the technique is realized by receiving a first representation of data at a processing device, which then processes the first representation of data so as to generate a second representation of data. The second representation of data includes a plurality of dependent data representations and a plurality of independent data representations. Each of the plurality of dependent data representations is substantially aligned in time with a corresponding one of the plurality of independent data representations.

Type: Grant

Filed: December 3, 1998

Date of Patent: August 14, 2001

Assignee: AltaVista Company

Inventors: Arjen P. deVries, Leonidas Kontothanassis, Frederic Dufaux, Michael Sokolov, David E. Kovalcin, Brian Eberman

Technique for indexing data in a network

Patent number: 6266657

Abstract: A technique for indexing data in a network having a plurality of network stations is disclosed. In one embodiment, the technique is realized by receiving a data identifier at a first of the plurality of network stations from a second of the plurality of network stations. The data identifier is then stored at the first network station. The first network station then receives an annotation from the second network station, wherein the annotation is associated with the data identifier. The annotation is then stored at the first network station in association with the data identifier.

Type: Grant

Filed: October 26, 1999

Date of Patent: July 24, 2001

Assignee: AltaVista Company

Inventors: Arjen P. deVries, Leonidas Kontothanassis, Michael Sokolov, David E. Kovalcin, Brian Eberman

Method for determining the resemblance of documents

Patent number: 6230155

Abstract: A method for facilitating the comparison of two computerized documents. The method includes loading a first document into a random access memory (RAM), loading a second document into the RAM, reducing the first document into a first sequence of tokens, reducing the second document into a second sequence of tokens, converting the first set of tokens to a first (multi)set of shingles, converting the second set of tokens to a second (multi)set of shingles, determining a first sketch of the first (multi)set of shingles, determining a second sketch of the second (multi)set of shingles, and comparing the first sketch and the second sketch. The sketches have a fixed size, independent of the size of the documents. The resemblance of two documents is provided using a sketch of each document. The sketches may be computed fairly fast, and given two sketches, the resemblance of the corresponding documents can be computed in time linear in the size of the sketches.

Type: Grant

Filed: November 23, 1998

Date of Patent: May 8, 2001

Assignee: AltaVista Company

Inventors: Andrei Zary Broder, Charles Gregory Nelson

Method for indexing duplicate records of information of a database

Patent number: 6230158

Abstract: A computer implemented method indexes duplicate information stored in records having different unique addresses in a database. A fingerprint is generated for each record; the fingerprint is a singular value derived from all of the information of the record. The fingerprint is stored in the index as a unique fingerprint if the fingerprint is different from a previously stored fingerprint of the index. A reference to the unique address of the record is stored with the fingerprint. If the fingerprint is identical to the previously stored fingerprint, then the reference to the address of the record is stored with the previously stored fingerprint.

Type: Grant

Filed: October 19, 1999

Date of Patent: May 8, 2001

Assignee: AltaVista Company

Inventor: Michael Burrows
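
The indexing rule in the abstract for patent 6230158 amounts to keying the index on a fingerprint and collecting addresses under it. The sketch below assumes SHA-256 as the fingerprint function and a dictionary as the index; both are illustrative substitutes, as are the function names, since the abstract does not name a specific fingerprinting scheme.

```python
import hashlib


def fingerprint(content):
    """Single value derived from all of the bytes of a record."""
    return hashlib.sha256(content).hexdigest()


def index_records(records):
    """Map each unique fingerprint to the addresses of matching records.

    records: iterable of (address, content_bytes) pairs.
    """
    index = {}
    for address, content in records:
        # A new fingerprint creates an entry; a repeated one only gains
        # another address reference.
        index.setdefault(fingerprint(content), []).append(address)
    return index


# Hypothetical records: two addresses hold identical content.
records = [
    ("http://a.example/page", b"same content"),
    ("http://b.example/copy", b"same content"),
    ("http://c.example/other", b"different content"),
]
duplicate_index = index_records(records)
```

Records stored at different addresses but with identical content end up under one fingerprint, so the duplicate information is indexed only once.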

Technique for storing data information within a network

Patent number: 6219671

Abstract: A method for storing data information in a network having a plurality of network stations includes receiving, at a first of the plurality of network stations from a second of the plurality of network stations, an address identifier corresponding to a location of a representation of data at a third of the plurality of network stations. The address identifier is then stored in association with a data identifier.

Type: Grant

Filed: December 3, 1998

Date of Patent: April 17, 2001

Assignee: AltaVista Company

Inventors: Arjen P. de Vries, Michael Sokolov, David E. Kovalcin, Brian Eberman, Leonidas Kontothanassis

Method for identifying near duplicate pages in a hyperlinked database

Patent number: 6138113

Abstract: A method is described for identifying pages that are near duplicates in a linked database. In the linked database, pages can have incoming links and outgoing links. Two pages are selected, a first page and a second page. For each selected page, the number of outgoing links is determined. The two pages are marked as near duplicates based on the number of common outgoing links for the two pages.

Type: Grant

Filed: August 10, 1998

Date of Patent: October 24, 2000

Assignee: AltaVista Company

Inventors: Jeffrey Dean, Monika R. Henzinger
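
The comparison described in the abstract for patent 6138113 can be sketched as a test on shared outgoing links. The thresholds `min_fraction` and `min_links`, and the decision rule that combines them, are assumptions chosen for illustration; the abstract says only that the marking is based on the number of common outgoing links.

```python
def near_duplicates(outlinks_a, outlinks_b, min_fraction=0.8, min_links=2):
    """Mark two pages as near duplicates from their outgoing links."""
    a, b = set(outlinks_a), set(outlinks_b)
    # Pages with very few outgoing links give too little evidence.
    if min(len(a), len(b)) < min_links:
        return False
    common = len(a & b)
    return common >= min_fraction * max(len(a), len(b))


# Hypothetical outgoing links of three pages.
page_1 = ["u1", "u2", "u3", "u4", "u5", "u6"]
page_2 = ["u1", "u2", "u3", "u4", "u5", "u7"]
page_3 = ["v1", "v2", "v3"]

same = near_duplicates(page_1, page_2)
different = near_duplicates(page_1, page_3)
```

Here `page_1` and `page_2` share five of six outgoing links and are marked as near duplicates, while `page_3` shares none with `page_1`.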

Method for ranking documents in a hyperlinked environment using connectivity and selective content analysis

Patent number: 6112203

Abstract: In a computerized method, a set of documents is ranked according to their content and their connectivity by using topic distillation. The documents include links that connect the documents to each other, either directly, or indirectly. A graph is constructed in a memory of a computer system. In the graph, nodes represent the documents, and directed edges represent the links. Based on the number of links connecting the various nodes, a subset of documents is selected to form a topic. A second subset of the documents is chosen based on the number of directed edges connecting the nodes. Nodes in the second subset are compared with the topic to determine similarity to the topic, and a relevance weight is correspondingly assigned to each node. Nodes in the second subset having a relevance weight less than a predetermined threshold are pruned from the graph. The documents represented by the remaining nodes in the graph are ranked by a connectivity-based ranking scheme.

Type: Grant

Filed: April 9, 1998

Date of Patent: August 29, 2000

Assignee: AltaVista Company

Inventors: Krishna Asur Bharat, Monika R. Henzinger