Patents Assigned to AltaVista Company
-
Publication number: 20040111412
Abstract: A method and apparatus for ranking a plurality of pages identified during a search of a linked database includes forming a linear combination of two or more matrices, and using the coefficients of the eigenvector of the resulting matrix to rank the quality of the pages. The matrices include information about the pages and are generally normalized, stochastic matrices. The linear combination can include attractor matrices that indicate desirable or “high quality” sites, and/or non-attractor matrices that indicate sites that are undesirable. Attractor matrices and non-attractor matrices can be used alone or in combination with each other in the linear combination. Additional bias toward high quality sites, or away from undesirable sites, can be further introduced with probability weighting matrices for attractor and non-attractor matrices. Other known matrices, such as a co-citation matrix or a bibliographic coupling matrix, can also be used in the present invention.
Type: Application
Filed: May 6, 2003
Publication date: June 10, 2004
Applicant: AltaVista Company
Inventor: Andrei Z. Broder
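The abstract does not say how the combined matrix is built or how its eigenvector is computed. The following is a minimal sketch, assuming column-stochastic matrices, coefficients that sum to 1 (so the combination stays stochastic), and plain power iteration for the principal eigenvector. The function names, the 3-page link matrix, and the choice of page 2 as the "high quality" attractor are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def combine(matrices: List[Matrix], coeffs: List[float]) -> Matrix:
    """Linear combination sum_k coeffs[k] * matrices[k] of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(c * m[i][j] for c, m in zip(coeffs, matrices)) for j in range(n)]
            for i in range(n)]

def principal_eigenvector(m: Matrix, iters: int = 500, tol: float = 1e-12) -> List[float]:
    """Power iteration; for a column-stochastic matrix the result sums to 1."""
    n = len(m)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]  # renormalise against rounding drift
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v

# Column j holds page j's out-links: 0 -> 1, 1 -> 2, 2 -> {0, 1}.
links = [[0.0, 0.0, 0.5],
         [1.0, 0.0, 0.5],
         [0.0, 1.0, 0.0]]
uniform = [[1 / 3] * 3 for _ in range(3)]        # no preference between pages
attractor = [[0.0] * 3, [0.0] * 3, [1.0] * 3]    # hypothetical: page 2 is "high quality"

plain = principal_eigenvector(combine([links, uniform], [0.85, 0.15]))
biased = principal_eigenvector(combine([links, attractor], [0.85, 0.15]))
```

Swapping the uniform matrix for the attractor matrix moves probability mass, and therefore rank, toward page 2. A non-attractor matrix would be handled the same way, with its weight directed away from the undesirable pages.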
-
Patent number: 6735585
Abstract: It is determined that an entity is registered as having control over the use of at least a portion of a World-Wide Web address for a first set of computer data that is accessible in accordance with Internet protocols. Enhanced search results are acquired that include a second set of computer data. The second set of computer data includes or links to information pertaining to the entity, other than information provided at the Web address.
Type: Grant
Filed: August 12, 1999
Date of Patent: May 11, 2004
Assignee: AltaVista Company
Inventors: Jeffery Dean Black, Jason Harvey Titus, Ira Joseph Woodhead
-
Patent number: 6704738
Abstract: A technique for organizing data information in a network having a plurality of network stations is disclosed. In one embodiment, the technique is realized by having a first processing device store a representation of data at an address of a first of the plurality of network stations. A second processing device then stores the address at a second of the plurality of network stations in association with an identifier of the data. A third processing device then stores the data identifier at a third of the plurality of network stations in association with an annotation of the data.
Type: Grant
Filed: December 3, 1998
Date of Patent: March 9, 2004
Assignee: AltaVista Company
Inventors: Arjen P. de Vries, Leondias Kontothanassis, Frederic Dufaux, Michael Sokolov, David E. Kovalcin, Brian Eberman
-
Patent number: 6598051
Abstract: A connectivity server for collecting, arranging and representing data defining the interconnection of pages on the World Wide Web (Web). A URL Database stores URLs and associates a fingerprint and CS_id with each URL. The URL Database interface is operable to translate between any two of a URL, a fingerprint, and a Host_id. A Host Database associates a Host_id with each distinct hostname in the URL Database. The Host Database interface is operable to accept a Host_id and return a number equal to the number of URLs on the respective host and to return the CS_ids of those URLs. A Link Database stores links between source URLs and destination URLs. The Link Database interface is operable to retrieve, for a given CS_id, the number of inlinks to and outlinks from the URL corresponding to the CS_id.
Type: Grant
Filed: September 19, 2000
Date of Patent: July 22, 2003
Assignee: Altavista Company
Inventors: Janet L. Wiener, Raymond P. Stata, Michael Burrows
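The three databases can be illustrated with a minimal in-memory sketch. It assumes the whole link list fits in memory, CS_ids are assigned in sorted-URL order, Host_ids are assigned in first-seen order, and the fingerprint is 64 bits taken from SHA-1. The class, its method names, and these choices are all hypothetical; the abstract does not specify them.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple
from urllib.parse import urlsplit

def fingerprint(url: str) -> int:
    """Hypothetical 64-bit URL fingerprint (the abstract does not name a function)."""
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "big")

class ConnectivityServer:
    def __init__(self, links: List[Tuple[str, str]]):
        urls = sorted({u for pair in links for u in pair})
        # URL Database: CS_id <-> URL <-> fingerprint.
        self.urls = urls                                              # CS_id -> URL
        self.cs_id = {u: i for i, u in enumerate(urls)}               # URL -> CS_id
        self.by_fp = {fingerprint(u): i for i, u in enumerate(urls)}  # fingerprint -> CS_id
        # Host Database: hostname -> Host_id, Host_id -> CS_ids on that host.
        self.host_id: Dict[str, int] = {}
        self.host_urls: Dict[int, List[int]] = defaultdict(list)
        for i, u in enumerate(urls):
            hid = self.host_id.setdefault(urlsplit(u).hostname, len(self.host_id))
            self.host_urls[hid].append(i)
        # Link Database: adjacency sets in both directions, keyed by CS_id.
        self.outlinks: Dict[int, Set[int]] = defaultdict(set)
        self.inlinks: Dict[int, Set[int]] = defaultdict(set)
        for src, dst in links:
            s, d = self.cs_id[src], self.cs_id[dst]
            self.outlinks[s].add(d)
            self.inlinks[d].add(s)

    def host_info(self, host_id: int) -> Tuple[int, List[int]]:
        """Number of URLs on the host and their CS_ids."""
        ids = self.host_urls[host_id]
        return len(ids), list(ids)

    def link_counts(self, cs_id: int) -> Tuple[int, int]:
        """(number of inlinks, number of outlinks) for the URL with this CS_id."""
        return len(self.inlinks[cs_id]), len(self.outlinks[cs_id])
```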
-
Publication number: 20020049753
Abstract: A computer implemented method performs constrained searching of an index of a database. The information of the database is stored as a plurality of records. A unique location is assigned to each indexable portion of information of the database. Index entries are written to a memory where each index entry includes a word entry representing a unique indexable portion of information, and one or more location entries for each occurrence of the unique indexable portion of information. The index entries are sorted according to a collating order of the word entries, and sequentially according to the location entries of each index entry. A query is parsed to generate a first term and a second term related by an AND logical operator; the AND operator requires that a first index entry corresponding to the first term and a second index entry corresponding to the second term both have locations in the same record to satisfy the query.
Type: Application
Filed: August 3, 2001
Publication date: April 25, 2002
Applicant: AltaVista Company
Inventor: Michael Burrows
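The record-constrained AND can be pictured with a minimal sketch. It assumes records are lists of words, every word occurrence receives the next consecutive location, and a record is identified by the location where it starts. A location is mapped back to its record by binary search over those start positions. All names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

def build_index(records: List[List[str]]) -> Tuple[Dict[str, List[int]], List[int]]:
    """Assign consecutive locations to every word; return (index, record start locations)."""
    index: Dict[str, List[int]] = {}
    starts: List[int] = []
    loc = 0
    for words in records:
        starts.append(loc)
        for w in words:
            index.setdefault(w, []).append(loc)  # locations are appended in increasing order
            loc += 1
    # Word entries in collating order; each entry's locations are already sorted.
    return dict(sorted(index.items())), starts

def record_of(loc: int, starts: List[int]) -> int:
    """Record containing a location: the last record starting at or before it."""
    return bisect_right(starts, loc) - 1

def and_query(index: Dict[str, List[int]], starts: List[int], a: str, b: str) -> List[int]:
    """Records in which both terms have at least one location."""
    ra = {record_of(l, starts) for l in index.get(a, [])}
    rb = {record_of(l, starts) for l in index.get(b, [])}
    return sorted(ra & rb)
```

For the records `[["a", "b"], ["b", "c"], ["a", "c", "b"]]`, the query `a AND c` matches only record 2. In the flat location space, `a` at location 0 and `c` at location 3 are close together, but they fall in different records, so that pair does not satisfy the query.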
-
Patent number: 6349296
Abstract: A computer-implemented method determines the resemblance of data objects such as Web pages. Each data object is partitioned into a sequence of tokens. The tokens are grouped into overlapping sets of the tokens to form shingles. Each shingle is represented by a unique identification element encoded as a fingerprint. The minimum element of the image of a document's set of fingerprints under each of a plurality of pseudo-random permutations of the set of all fingerprints is selected to generate a sketch of each data object. The sketches characterize the resemblance of the data objects. The sketches can be further partitioned into a plurality of groups. Each group is fingerprinted to form a feature. Data objects that share more than a certain number of features are estimated to be nearly identical.
Type: Grant
Filed: August 21, 2000
Date of Patent: February 19, 2002
Assignee: AltaVista Company
Inventors: Andrei Z. Broder, Steven C. Glassman, Charles G. Nelson, Mark S. Manasse, Geoffrey G. Zweig
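The shingle, sketch, and feature pipeline can be outlined in a minimal sketch. It assumes word tokens, 4-word shingles, and 64-bit fingerprints taken from SHA-1. Seeded hashes stand in for the pseudo-random permutations. The sketch size (24), the group count (6), the "more than 1 shared feature" threshold, and every function name are hypothetical choices, not values from the patent.

```python
import hashlib
from typing import List, Set

def fp(data: str, seed: int = 0) -> int:
    """64-bit fingerprint; each seed acts as a different pseudo-random ordering."""
    return int.from_bytes(hashlib.sha1(f"{seed}:{data}".encode()).digest()[:8], "big")

def shingle_fps(text: str, w: int = 4) -> Set[int]:
    """Fingerprints of all overlapping w-token shingles of the text."""
    tokens = text.split()
    return {fp(" ".join(tokens[i:i + w])) for i in range(max(1, len(tokens) - w + 1))}

def sketch(text: str, k: int = 24) -> List[int]:
    """For each of k seeded orderings, keep the minimum image of the shingle fingerprints."""
    fps = shingle_fps(text)
    return [min(fp(str(x), seed) for x in fps) for seed in range(1, k + 1)]

def features(sk: List[int], groups: int = 6) -> Set[int]:
    """Split the sketch into equal groups and fingerprint each group to form a feature."""
    size = len(sk) // groups
    return {fp(",".join(map(str, sk[g * size:(g + 1) * size])), g) for g in range(groups)}

def nearly_identical(a: str, b: str, threshold: int = 1) -> bool:
    """Estimate near-identity: the documents share more than `threshold` features."""
    return len(features(sketch(a)) & features(sketch(b))) > threshold
```

A feature matches only when all the sketch elements in its group match. Comparing a few features is therefore a cheap but strict filter for near-identity.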
-
Patent number: 6332144
Abstract: To annotate media, one or more particular times within a period defined by a start time and an end time of a media stream forming an item of audio or video media, are identified. The identified times are those at which content within the media stream corresponds to an annotation value. The annotation value is associated with the identified times to annotate the media.
Type: Grant
Filed: December 3, 1998
Date of Patent: December 18, 2001
Assignee: AltaVista Company
Inventors: Arjen P. deVries, Michael Sokolov, David E. Kovalcin, Brian Eberman
-
Publication number: 20010051958
Abstract: A technique for accessing an item of interest within a particular one of a plurality of stored representations of data is disclosed. In one embodiment, the technique is realized by having a processing device search a plurality of stored annotations corresponding to different items within the plurality of stored representations to locate an annotation of interest corresponding to the item of interest. The annotation of interest has an associated search identifier and an associated location identifier corresponding to a location of interest within the particular one of the plurality of stored representations. The processing device then searches a plurality of stored search identifiers associated with the plurality of stored annotations to locate the search identifier and an address identifier corresponding to a location of the particular one of the plurality of stored representations within the plurality of stored representations.
Type: Application
Filed: March 22, 2001
Publication date: December 13, 2001
Applicant: AltaVista Company
Inventors: Arjen P. deVries, Michael Sokolov, David E. Kovalcin, Brian Eberman, Leonidas Kontothanassis
-
Patent number: 6321265
Abstract: A web crawler downloads data sets from among a plurality of host computers. The web crawler enqueues data set addresses in a set of queues, with all data set addresses sharing a respective common host address being stored in a respective common one of the queues. Each non-empty queue is assigned a next download time. Multiple threads substantially concurrently process the data set addresses in the queues. The number of queues is at least as great as the number of threads, and the threads are dynamically assigned to the queues. In particular, each thread selects a queue not being serviced by any of the other threads. The queue is selected in accordance with the next download times assigned to the queues. The data set corresponding to a data set address in the selected queue is downloaded and processed, and the data set address is dequeued from the selected queue. When the selected queue is not empty after the dequeuing step, it is assigned an updated download time.
Type: Grant
Filed: November 2, 1999
Date of Patent: November 20, 2001
Assignee: AltaVista Company
Inventors: Marc Alexander Najork, Clark Allan Heydon
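The queue discipline in the abstract can be expressed as a minimal sketch. It assumes a min-heap of (next download time, host) pairs guarded by a single lock, and a fixed per-host politeness delay. The class and method names and the delay value are hypothetical. A queue sits in the heap only while it is non-empty and no thread is servicing it, so two threads never hold the same host's queue at once.

```python
import heapq
import threading
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional, Tuple
from urllib.parse import urlsplit

class HostQueues:
    def __init__(self, politeness_delay: float = 1.0):
        self.delay = politeness_delay
        self.queues: Dict[str, Deque[str]] = defaultdict(deque)  # host -> pending URLs
        self.heap: List[Tuple[float, str]] = []  # (next download time, host) for idle queues
        self.lock = threading.Lock()

    def enqueue(self, url: str, now: float = 0.0) -> None:
        host = urlsplit(url).hostname
        with self.lock:
            q = self.queues[host]
            q.append(url)
            if len(q) == 1:  # queue was empty and idle: make it selectable
                heapq.heappush(self.heap, (now, host))

    def acquire(self) -> Optional[Tuple[str, str, float]]:
        """Claim the idle queue with the earliest next download time."""
        with self.lock:
            if not self.heap:
                return None
            ready, host = heapq.heappop(self.heap)
            return host, self.queues[host][0], ready  # URL stays queued until release

    def release(self, host: str, now: float) -> None:
        """Dequeue the processed URL; reschedule the queue if it is still non-empty."""
        with self.lock:
            q = self.queues[host]
            q.popleft()
            if q:
                heapq.heappush(self.heap, (now + self.delay, host))
```

A worker thread would loop as follows: call `acquire()`, wait until the returned ready time, download and process the URL, then call `release(host, now)`. The URL remains at the head of its queue while it is being serviced. As a result, a concurrent `enqueue` to that host never re-inserts the queue into the heap.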
-
Patent number: 6321220
Abstract: A method and apparatus for preventing topic drift in queries in hyperlinked environments uses equivalence components for ranking pages containing information that is relevant to the topic of a user query input to a search engine. The method includes the steps of providing a query to a search engine, where the query represents a predetermined topic; retrieving at least one page associated with the query; constructing a graph representing the pages in memory; creating at least one equivalence component representing a subset of the graph; processing each equivalence component; eliminating the equivalence component in accordance with whether it matches the predetermined topic; and ranking the remaining pages.
Type: Grant
Filed: December 7, 1998
Date of Patent: November 20, 2001
Assignee: AltaVista Company
Inventors: Jeffrey Dean, Monika R. Henzinger, Krishna Asur Bharat
-
Patent number: 6317741
Abstract: A technique for ranking records of a database is disclosed. The database records to be ranked are located during a search of an index to the database performed in response to a query received from a user. The index has a plurality of index entries, wherein each index entry has a weight. The query has a plurality of query terms, wherein each query term corresponds to an index entry. In one embodiment, the technique is realized by scoring each located record according to the number of times portions of information corresponding to each query term occur in each record and the weight of each index entry corresponding to each occurring query term. The score and an identifier of each located record are then stored in a respective entry of a ranking list. The ranking list has a limit on the number of entries that are stored therein.
Type: Grant
Filed: August 7, 2000
Date of Patent: November 13, 2001
Assignee: Altavista Company
Inventor: Michael Burrows
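Scoring with a bounded ranking list can be shown in a minimal sketch. It assumes each located record is a list of words and that a record's score is the sum, over query terms, of (occurrences in the record × that term's index-entry weight). It also assumes the bounded ranking list is a min-heap that evicts its lowest entry once the limit is reached. The function name and data layout are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

def rank(records: Dict[int, List[str]], query: List[str],
         weights: Dict[str, float], limit: int) -> List[Tuple[float, int]]:
    """Score located records, keeping at most `limit` (score, record id) entries."""
    ranking: List[Tuple[float, int]] = []  # min-heap: the weakest kept entry is ranking[0]
    for rid, words in records.items():
        # Occurrences of each query term, scaled by its index-entry weight.
        score = sum(words.count(t) * weights.get(t, 0.0) for t in query)
        if score <= 0:
            continue
        if len(ranking) < limit:
            heapq.heappush(ranking, (score, rid))
        elif score > ranking[0][0]:
            heapq.heapreplace(ranking, (score, rid))  # evict the current lowest entry
    return sorted(ranking, reverse=True)
```

Capping the list keeps memory proportional to `limit` rather than to the number of located records. Each insertion costs O(log limit).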
-
Patent number: 6311189
Abstract: A method for matching a query to a portion of media includes receiving a query relating to media of interest and searching, based upon the query, an index of annotations. Each of the annotations represents a respective item of available media and includes a plurality of annotation values. Each of the plurality of annotation values represents a portion of the represented item of available media. By matching the query to an annotation value within the index, the start time of a media stream forming the portion of the item of available media represented by the identified annotation value can be identified. The identified media stream start time can then be provided in response to the query, allowing the appropriate portion of the applicable item of available media to be directly accessed.
Type: Grant
Filed: December 3, 1998
Date of Patent: October 30, 2001
Assignee: Altavista Company
Inventors: Arjen P. deVries, Michael Sokolov, David E. Kovalcin, Brian Eberman, Leondias Kontothanassis
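Matching a query to annotation values and returning start times can be illustrated with a minimal sketch. It assumes annotations arrive as hypothetical `(media_id, start, end, value)` tuples and that a query matches an annotation value exactly, ignoring case. The abstract does not specify the matching rule or the index layout, and the `Segment` type and function names are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class Segment:
    media_id: str  # which item of available media
    start: float   # start time of the portion, in seconds
    end: float

def build_annotation_index(
        annotations: Iterable[Tuple[str, float, float, str]]) -> Dict[str, List[Segment]]:
    """Map each case-folded annotation value to the media portions it describes."""
    index: Dict[str, List[Segment]] = defaultdict(list)
    for media_id, start, end, value in annotations:
        index[value.lower()].append(Segment(media_id, start, end))
    return index

def lookup(index: Dict[str, List[Segment]], query: str) -> List[Segment]:
    """Matching portions, ordered by media item and start time."""
    return sorted(index.get(query.lower(), []), key=lambda s: (s.media_id, s.start))
```

A player could then seek directly to `segment.start` in the named media item instead of scanning the stream from the beginning.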
-
Patent number: 6275827
Abstract: A technique for processing data is disclosed. In one embodiment, the technique is realized by receiving a first representation of data at a processing device, which then processes the first representation of data so as to generate a second representation of data. The second representation of data includes a plurality of dependent data representations and a plurality of independent data representations. Each of the plurality of dependent data representations is substantially aligned in time with a corresponding one of the plurality of independent data representations.
Type: Grant
Filed: December 3, 1998
Date of Patent: August 14, 2001
Assignee: AltaVista Company
Inventors: Arjen P. deVries, Leondias Kontothanassis, Frederic Dufaux, Michael Sokolov, David E. Kovalcin, Brian Eberman
-
Patent number: 6266657
Abstract: A technique for indexing data in a network having a plurality of network stations is disclosed. In one embodiment, the technique is realized by receiving a data identifier at a first of the plurality of network stations from a second of the plurality of network stations. The data identifier is then stored at the first network station. The first network station then receives an annotation from the second network station, wherein the annotation is associated with the data identifier. The annotation is then stored at the first network station in association with the data identifier.
Type: Grant
Filed: October 26, 1999
Date of Patent: July 24, 2001
Assignee: AltaVista Company
Inventors: Arjen P. deVries, Leonidas Kontothanassis, Michael Sokolov, David E. Kovalcin, Brian Eberman
-
Patent number: 6230155
Abstract: A method for facilitating the comparison of two computerized documents. The method includes loading a first document into a random access memory (RAM), loading a second document into the RAM, reducing the first document into a first sequence of tokens, reducing the second document into a second sequence of tokens, converting the first sequence of tokens to a first (multi)set of shingles, converting the second sequence of tokens to a second (multi)set of shingles, determining a first sketch of the first (multi)set of shingles, determining a second sketch of the second (multi)set of shingles, and comparing the first sketch and the second sketch. The sketches have a fixed size, independent of the size of the documents. The resemblance of two documents is computed from a sketch of each document. The sketches can be computed fairly fast, and, given two sketches, the resemblance of the corresponding documents can be computed in time linear in the size of the sketches.
Type: Grant
Filed: November 23, 1998
Date of Patent: May 8, 2001
Assignee: AltaVista Company
Inventors: Andrei Zary Broder, Charles Gregory Nelson
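A fixed-size sketch with a linear-time comparison can be illustrated with a minimal sketch. It uses one well-known construction, the "bottom-s" sketch, which keeps the s smallest shingle fingerprints. Resemblance is then estimated by merging two sorted sketches and counting, among the s smallest values of the merge, those present in both. Whether the patent uses this exact sketch is an assumption. The value s = 64, the 4-word shingles, the SHA-1-derived fingerprints, and all names are hypothetical.

```python
import hashlib
import heapq
from typing import List, Set

def shingle_fps(text: str, w: int = 4) -> Set[int]:
    """64-bit fingerprints of the overlapping w-token shingles (duplicates collapse)."""
    tokens = text.split()
    return {int.from_bytes(hashlib.sha1(" ".join(tokens[i:i + w]).encode()).digest()[:8], "big")
            for i in range(max(1, len(tokens) - w + 1))}

def bottom_sketch(text: str, s: int = 64) -> List[int]:
    """Fixed-size sketch: the s smallest shingle fingerprints, sorted ascending."""
    return heapq.nsmallest(s, shingle_fps(text))

def resemblance(sa: List[int], sb: List[int], s: int = 64) -> float:
    """One merge pass over two sorted sketches: time linear in the sketch size."""
    i = j = taken = shared = 0
    while taken < s and (i < len(sa) or j < len(sb)):
        if j == len(sb) or (i < len(sa) and sa[i] < sb[j]):
            i += 1                      # value only in the first sketch
        elif i == len(sa) or sb[j] < sa[i]:
            j += 1                      # value only in the second sketch
        else:
            shared += 1                 # value present in both sketches
            i += 1
            j += 1
        taken += 1
    return shared / taken if taken else 1.0
```

Because each sketch holds at most s values regardless of document length, the comparison cost does not grow with document size.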
-
Patent number: 6230158
Abstract: A computer implemented method indexes duplicate information stored in records having different unique addresses in a database. A fingerprint is generated for each record; the fingerprint is a single value derived from all of the information of the record. The fingerprint is stored in the index as a unique fingerprint if it differs from the previously stored fingerprints of the index, and a reference to the unique address of the record is stored with the fingerprint. If the fingerprint is identical to a previously stored fingerprint, the reference to the address of the record is stored with that previously stored fingerprint.
Type: Grant
Filed: October 19, 1999
Date of Patent: May 8, 2001
Assignee: Altavista Company
Inventor: Michael Burrows
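The insert-or-append logic described above can be shown in a minimal sketch. It assumes records are byte strings keyed by address and uses SHA-256 over the whole record as the fingerprint. The abstract names no particular function, so the hash choice and the function name are hypothetical.

```python
import hashlib
from typing import Dict, List

def build_dup_index(records: Dict[str, bytes]) -> Dict[bytes, List[str]]:
    """Map each distinct content fingerprint to the addresses of every record carrying it."""
    index: Dict[bytes, List[str]] = {}
    for address, content in records.items():
        fp = hashlib.sha256(content).digest()  # single value derived from the whole record
        # New fingerprint: create an entry. Seen before: append this address to it.
        index.setdefault(fp, []).append(address)
    return index
```

Any fingerprint that ends up with more than one address marks a set of duplicate records. Each duplicate's content is stored or indexed once.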
-
Patent number: 6219671
Abstract: A method for storing data information in a network having a plurality of network stations includes receiving, at a first of the plurality of network stations from a second of the plurality of network stations, an address identifier corresponding to a location of a representation of data at a third of the plurality of network stations. The address identifier is then stored in association with a data identifier.
Type: Grant
Filed: December 3, 1998
Date of Patent: April 17, 2001
Assignee: AltaVista Company
Inventors: Arjen P. de Vries, Michael Sokolov, David E. Kovalcin, Brian Eberman, Leonidas Kontothanassis
-
Patent number: 6138113
Abstract: A method is described for identifying pages that are near duplicates in a linked database. In the linked database, pages can have incoming links and outgoing links. Two pages are selected, a first page and a second page. For each selected page, the number of outgoing links is determined. The two pages are marked as near duplicates based on the number of common outgoing links for the two pages.
Type: Grant
Filed: August 10, 1998
Date of Patent: October 24, 2000
Assignee: AltaVista Company
Inventors: Jeffrey Dean, Monika R. Henzinger
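The abstract does not give the exact marking rule. The following minimal sketch assumes a hypothetical one: two pages are near duplicates if each has at least `min_links` outgoing links and they share at least `min_fraction` of the larger outlink set. Both thresholds and the function name are illustrative, not taken from the patent.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

def near_duplicates(outlinks: Dict[str, Set[int]], min_links: int = 3,
                    min_fraction: float = 0.9) -> List[Tuple[str, str]]:
    """Pairs of pages whose outgoing links largely coincide (hypothetical thresholds)."""
    pairs = []
    for a, b in combinations(sorted(outlinks), 2):
        la, lb = outlinks[a], outlinks[b]
        if len(la) < min_links or len(lb) < min_links:
            continue  # too few outlinks to judge reliably
        common = len(la & lb)
        if common >= min_fraction * max(len(la), len(lb)):
            pairs.append((a, b))
    return pairs
```

Checking every pair costs quadratic time. At web scale, candidate pairs would have to be narrowed first, for example by grouping pages that share outlinks, but that step is outside this sketch.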
-
Patent number: 6112203
Abstract: In a computerized method, a set of documents is ranked according to their content and their connectivity by using topic distillation. The documents include links that connect the documents to each other, either directly or indirectly. A graph is constructed in a memory of a computer system. In the graph, nodes represent the documents, and directed edges represent the links. Based on the number of links connecting the various nodes, a subset of documents is selected to form a topic. A second subset of the documents is chosen based on the number of directed edges connecting the nodes. Nodes in the second subset are compared with the topic to determine similarity to the topic, and a relevance weight is correspondingly assigned to each node. Nodes in the second subset having a relevance weight less than a predetermined threshold are pruned from the graph. The documents represented by the remaining nodes in the graph are ranked by a connectivity-based ranking scheme.
Type: Grant
Filed: April 9, 1998
Date of Patent: August 29, 2000
Assignee: AltaVista Company
Inventors: Krishna Asur Bharat, Monika R. Henzinger
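The prune-then-rank flow can be outlined in a minimal sketch, with several stated assumptions. The caller supplies the topic documents instead of selecting them by link counts. The relevance weight is the cosine similarity between word-count vectors. HITS authority scores stand in as one well-known connectivity-based ranking scheme; the abstract does not name the scheme. The threshold, iteration count, and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def distill(docs: Dict[str, str], edges: List[Tuple[str, str]], topic_ids: List[str],
            threshold: float = 0.2, iters: int = 50) -> List[str]:
    # Topic vector: pooled word counts of the topic documents.
    topic: Counter = Counter()
    for d in topic_ids:
        topic.update(docs[d].split())
    # Prune nodes whose relevance weight falls below the threshold.
    keep = {d for d in docs if cosine(Counter(docs[d].split()), topic) >= threshold}
    kept = [(u, v) for u, v in edges if u in keep and v in keep]
    # Rank survivors by HITS authority score (swapped-in connectivity scheme).
    hub = {d: 1.0 for d in keep}
    auth = {d: 0.0 for d in keep}
    for _ in range(iters):
        auth = {d: 0.0 for d in keep}
        for u, v in kept:
            auth[v] += hub[u]
        hub = {d: 0.0 for d in keep}
        for u, v in kept:
            hub[u] += auth[v]
        for vec in (auth, hub):
            norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
            for d in vec:
                vec[d] /= norm
    return sorted(keep, key=lambda d: (-auth[d], d))
```

Pruning before ranking matters here. An off-topic page (such as a cooking page linking into a cluster of programming pages) would otherwise contribute hub weight and could pull the ranking away from the topic.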