Web Crawlers Patents (Class 707/709)
  • Patent number: 8744839
    Abstract: Target word recognition includes: obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and the characteristic computation data being associated with the candidate word set; performing segmentation of the characteristic computation data to generate a plurality of text segments; combining the plurality of text segments to form a text data combination set; determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations; determining a plurality of designated characteristic values for the plurality of text data combinations; and, based at least in part on the plurality of designated characteristic values and according to at least one criterion, recognizing among the plurality of text data combinations target words whose characteristic values fulfill the criterion.
    Type: Grant
    Filed: September 22, 2011
    Date of Patent: June 3, 2014
    Assignee: Alibaba Group Holding Limited
    Inventors: Haibo Sun, Yang Yang, Yining Chen
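For 8744839, the recognition pipeline in the abstract can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the patented method: each character is treated as a segment, combinations are runs of 2 to a hypothetical `max_len` adjacent segments, the "designated characteristic value" is the raw occurrence count, and the criterion is a hypothetical `min_count` threshold.

```python
from collections import Counter


def recognize_target_words(candidates, documents, max_len=3, min_count=2):
    """Return candidate words whose occurrence count meets min_count.

    Assumptions (illustrative only): character-level segmentation,
    combinations of 2..max_len adjacent segments, count as the
    characteristic value, and a fixed threshold as the criterion.
    """
    combos = Counter()
    for doc in documents:
        segments = list(doc)  # segment the computation data into characters
        for n in range(2, max_len + 1):
            for i in range(len(segments) - n + 1):
                combos["".join(segments[i:i + n])] += 1
    # Intersection of the candidate word set with the combination set.
    intersection = set(candidates) & set(combos)
    # Keep the combinations whose characteristic value fulfills the criterion.
    return sorted(w for w in intersection if combos[w] >= min_count)


print(recognize_target_words(["ab", "bc", "xyz"], ["abc", "abd"]))
```

Character-level segmentation suits unsegmented scripts such as Chinese; for space-delimited text a word tokenizer would be the more natural segmenter.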
  • Patent number: 8745183
    Abstract: An improved system and method is provided for adaptively refreshing a web page. A base version of the web page may be partitioned into a collection of fragments. Then the collection of fragments may be compared with the corresponding fragments of a recent version of the web page to determine a divergence measurement of the difference between the base version and the recent version of the web page. The divergence measurement may be recorded in a change profile representing a change history of the web page that includes a sequence of numeric pairs indicating a time offset and a divergence measurement of the difference between a version of the web page at the time offset and a base version of the web page. The refresh period for the web page may be adjusted by applying an adaptive refresh policy using the divergence measurements recorded in the change profile.
    Type: Grant
    Filed: October 26, 2006
    Date of Patent: June 3, 2014
    Assignee: Yahoo! Inc.
    Inventor: Christopher Olston
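For 8745183, a minimal Python sketch of the refresh loop the abstract describes, under assumptions that are ours rather than the patent's: fragments are compared position by position, divergence is the fraction of fragment positions that changed, and the hypothetical policy chooses the period at which a linearly extrapolated divergence would reach a target value. The change profile is the list of `(time_offset, divergence)` pairs the abstract mentions.

```python
def fragment_divergence(base_fragments, recent_fragments):
    """Fraction of fragment positions whose content differs (hypothetical measure)."""
    length = max(len(base_fragments), len(recent_fragments))
    if length == 0:
        return 0.0
    changed = 0
    for i in range(length):
        old = base_fragments[i] if i < len(base_fragments) else None
        new = recent_fragments[i] if i < len(recent_fragments) else None
        if old != new:
            changed += 1
    return changed / length


def adjust_refresh_period(period, change_profile, target=0.2,
                          min_period=60.0, max_period=86400.0):
    """Hypothetical adaptive policy over a change profile of
    (time_offset_seconds, divergence) pairs."""
    if not change_profile:
        return period
    offset, divergence = change_profile[-1]
    if offset <= 0 or divergence <= 0:
        # No observed change yet: back off by doubling the period.
        return min(max_period, period * 2)
    # Assume divergence grows roughly linearly with time since the base version.
    rate = divergence / offset
    return max(min_period, min(max_period, target / rate))
```

With one observation of divergence 0.5 one hour after the base version, the sketch suggests refreshing every 0.2 / (0.5 / 3600) = 1440 seconds; a real policy would likely smooth over the whole profile rather than use only the latest pair.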
  • Publication number: 20140149381
    Abstract: A system for determining whether a website is an illegitimate website, the system comprising: a requester module configured to request one or more rules from a host server for a website and to receive a response from the host server in response to a request; an analysis module configured to determine whether a response or lack of a response received by the requester module indicates that the website is an illegitimate website; and a record module configured to store an indication that the website is an illegitimate website, wherein the one or more rules provide one or more instructions to a robot computer program regarding access of the website by the robot computer program.
    Type: Application
    Filed: July 3, 2013
    Publication date: May 29, 2014
    Inventors: Alexey Chudnovskiy, Steve Pitchford
  • Publication number: 20140149379
    Abstract: This system helps a web site achieve favorable local rankings in major search engines. It does so by providing a web page containing a proprietary Question & Answer (Q&A) content section (the content section that this patent covers). The Q&A section contains content about the business, its product(s), and its location (city, state), and also contains embedded contextual web links to similar content within the same business vertical, as determined by SIC (Standard Industrial Classification) code. These outbound links contain the primary keywords for which the business wants optimal ranking, and they link to other businesses in the same state but not the same city, so that direct local competitors are not linked. The Q&A content section can be deployed as HTML embedded on a separate web page that is part of a client's web site, or be hosted by anyone else as a micro site.
    Type: Application
    Filed: January 1, 2012
    Publication date: May 29, 2014
    Applicant: YouInWeb Software Inc.
    Inventors: James Andreas Hjelming, Telemaco Bamabei
  • Publication number: 20140149380
    Abstract: Briefly, the disclosure describes embodiments of methods or apparatuses for document processing at distributed processing nodes.
    Type: Application
    Filed: November 26, 2012
    Publication date: May 29, 2014
    Applicant: YAHOO! INC.
    Inventors: Fakrudeen Ali Ahmed, Souri Nath Datta, Vikram Verma, Aravindan Raghuveer, Muralidhar Hanumantachar Sortur, Syama Prasad Suprasadachandranpillai, Tom Praison Rajadurai A., Sachidanand Alle
  • Publication number: 20140149382
    Abstract: A web site page has a reference for providing an address for a next page. The web site is crawled by a crawler program, which parses the reference from one of the web pages and sends the reference to an applet running in a browser. The browser determines the address for the next page from the reference and sends it to the crawler. The crawler selects non-hypertext-link parameters from the web page of the web site server by performing a programmed action sequence, including selecting items from lists on the web page in a particular sequence. For the query to the web server for the next page referenced by that web page, the crawler sends the selected parameters and a context arising from the particular sequence to the applet running in the browser.
    Type: Application
    Filed: January 30, 2014
    Publication date: May 29, 2014
    Applicant: International Business Machines Corporation
    Inventors: Elizabeth A. Brodsky, Elmootazbellah N. Elnozahy, Ramakrishnan Rajamony
  • Patent number: 8738604
    Abstract: One embodiment of a method of the present invention for discovering sensitive information on a computer network provides for discovering databases on the computer network, defining a pattern for data discovery, discovering qualifying records by matching the pattern with field names and/or record values in the databases, sending an electronic notification to a database administrator managing the qualifying database, and receiving from that database administrator a selection choice identifying the status of the qualifying records.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: May 27, 2014
    Assignee: Go Daddy Operating Company, LLC
    Inventors: Ganesh Devarajan, Todd Redfoot
  • Patent number: 8738605
    Abstract: One embodiment of a system of the present invention for discovering sensitive information on a computer network includes means for discovering databases on the computer network, means for defining a pattern for data discovery, means for discovering qualifying records by matching the pattern with field names and/or record values in the databases, means for sending an electronic notification to a database administrator managing the qualifying database, and means for receiving from that database administrator a selection choice identifying the status of the qualifying records.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: May 27, 2014
    Assignee: Go Daddy Operating Company, LLC
    Inventors: Ganesh Devarajan, Todd Redfoot
  • Patent number: 8738603
    Abstract: A method of accessing feeds based on metrics is provided. Feeds, each associated with an object stored in a database system, are provided to users of the database system. Inferential user interaction data captures implicit user behavior of users of the database system, wherein the data is generated in relation to a feed. Feed metrics are determined based on the user interaction data, wherein a feed metric is based upon statistics comprising user consumption, user responsiveness, content proliferation, and feed life. Finally, an action is executed in relation to at least one feed based on the feed metrics, wherein the action comprises discontinuing the feed, characterizing a feed, determining that a feed can be monetized, determining that a feed should be cached, or determining that intervention in a feed is advisable.
    Type: Grant
    Filed: May 19, 2011
    Date of Patent: May 27, 2014
    Assignee: salesforce.com, inc.
    Inventor: Ronald F. Fischer
  • Publication number: 20140143228
    Abstract: Techniques for ascribing social attributes to content items and for selecting content to display in a content feed are described. According to various embodiments, one or more content items accessible via a network are accessed, each of the content items having received one or more social activity signals. Thereafter, members of an online social network service that submitted the social activity signals may be identified. Member profile data identifying member profile attributes of the members that submitted the social activity signals may then be accessed. Thereafter, social attribute information may be generated and associated with each of the content items, the social attribute information identifying the member profile attributes of the members that submitted the social activity signals associated with each of the content items.
    Type: Application
    Filed: November 20, 2013
    Publication date: May 22, 2014
    Applicant: LinkedIn Corporation
    Inventors: Allen Blue, Ryan Roslansky
  • Patent number: 8731929
    Abstract: Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
    Type: Grant
    Filed: February 4, 2009
    Date of Patent: May 20, 2014
    Assignee: VoiceBox Technologies Corporation
    Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, Sr., Michael R. Kennewick, Jr., Richard Kennewick, Tom Freeman
  • Patent number: 8732150
    Abstract: Disclosed are systems, apparatus, methods, and computer readable media for suppressing network feed activities using an information feed in an on-demand database service environment. In one embodiment, a message is received, including data indicative of a user action. An entity associated with the user action is identified, where the entity is a type of record stored in a database. A type of the entity is identified. It is determined whether the entity type is a prohibited entity type. When the entity type is not a prohibited entity type, the message data is saved to one or more tables in the database. The tables are configured to store feed items of an information feed capable of being displayed on a device. When the entity type is a prohibited entity type, the saving of the message data, to the one or more tables in the database configured to store the feed items, is prohibited.
    Type: Grant
    Filed: February 10, 2011
    Date of Patent: May 20, 2014
    Assignee: salesforce.com, inc.
    Inventors: William Gradin, Matthew Davidchuk, Qiu Ma, Leonid Zemskov, Amy Palke
  • Publication number: 20140136509
    Abstract: A method for searching includes displaying keywords on an electronic display. The keywords are from results of an internet search of search criteria. A keyword of a search result is related to another keyword of the search result with a particular bond strength, where the bond strength indicates the degree to which keywords in a search result are related. The method includes receiving a selection of two or more of the displayed keywords, setting a bond strength between two or more of the selected keywords, and displaying search results with a bond strength of at least the selected bond strength.
    Type: Application
    Filed: February 26, 2013
    Publication date: May 15, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Barry Alan Kritt, Sarbajit Kumar Rakshit
  • Publication number: 20140136508
    Abstract: A system and method for providing Web site navigation recommendations is provided. A Web page of interest is identified as a destination Web page. A domain of Web pages related to the destination Web page is determined. Information is extracted from each Web page in the domain and a recommendation comprising instructions for navigating to the destination Web page is generated based on the extracted information.
    Type: Application
    Filed: November 9, 2012
    Publication date: May 15, 2014
    Applicant: Palo Alto Research Center Incorporated
    Inventors: Kristian Lyngbaek, Lester D. Nelson, Eric A. Bier, Margaret H. Szymanski
  • Patent number: 8725719
    Abstract: In accordance with certain embodiments, requests to collect structured data in a web page and to subscribe to that structured data are received. This structured data is stored in a data store to allow offline use of the structured data. In accordance with other embodiments, a computing device displays multiple links each of which identifies a different one of multiple web pages. Additionally, the multiple pages include structured data. The display of these multiple links is altered as the computing device detects changes to the structured data in the web pages. In accordance with other embodiments, a web page includes structured data that has been subscribed to. The computing device detects changes to the web page, and notifies a user of a change to the web page only if the change is a change to the structured data and not a change to other portions of the web page.
    Type: Grant
    Filed: February 13, 2007
    Date of Patent: May 13, 2014
    Assignee: Microsoft Corporation
    Inventors: Jane T. Kim, Walter VonKoch, Sean O. Lyndersay, Benjamin N. Truelove, Miladin Pavlicic
  • Patent number: 8725710
    Abstract: A method for generating a probabilistic relational database based on a previous relational database, wherein the probability estimation is carried out within the relational paradigm. The method may include frequency-based and information-theoretic (idf-based) estimation of tuple probabilities. The method generalises the use of information retrieval ranking methods such that these and new ranking methods are available for relational databases.
    Type: Grant
    Filed: September 21, 2004
    Date of Patent: May 13, 2014
    Assignee: Queen Mary & Westfield College
    Inventor: Thomas Rölleke
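For 8725710, the two estimators the abstract names can be illustrated over a toy relation of `(doc, term)` tuples. The formulas below are one common formulation and are assumed here, not taken from the patent: the frequency-based probability of a value is its share of all tuples, and the idf-based probability is `idf(v) / max idf`, with `idf(v) = -log(df(v) / N)` over `N` distinct contexts (documents). The column names are hypothetical.

```python
import math
from collections import Counter


def frequency_probabilities(rows, column):
    """P(v) = occurrences of value v in `column` / number of rows."""
    counts = Counter(row[column] for row in rows)
    total = len(rows)
    return {value: count / total for value, count in counts.items()}


def idf_probabilities(rows, value_column, context_column):
    """P(v) = idf(v) / max idf, where idf(v) = -log(df(v) / N),
    df(v) = distinct contexts containing v, N = number of contexts."""
    contexts_per_value = {}
    for row in rows:
        contexts_per_value.setdefault(row[value_column], set()).add(row[context_column])
    n_contexts = len({row[context_column] for row in rows})
    idf = {v: -math.log(len(c) / n_contexts) for v, c in contexts_per_value.items()}
    max_idf = max(idf.values())
    if max_idf == 0:
        # Every value occurs in every context: no discriminative power.
        return {v: 0.0 for v in idf}
    return {v: score / max_idf for v, score in idf.items()}


rows = [
    {"doc": "d1", "term": "a"},
    {"doc": "d1", "term": "b"},
    {"doc": "d2", "term": "a"},
    {"doc": "d3", "term": "c"},
]
print(frequency_probabilities(rows, "term"))   # {'a': 0.5, 'b': 0.25, 'c': 0.25}
print(idf_probabilities(rows, "term", "doc"))  # rarer terms get probabilities closer to 1
```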
  • Publication number: 20140129539
    Abstract: Personalization of Internet search is effected through the use of ResultRank together with searcher-selected profile attributes and searcher-selected query context attributes. These attributes are also referred to as hats (worn by the searcher). Searcher privacy is maintained by allowing the search engine only limited use of a searcher's profile. Query language interpretation is improved by capturing and using searcher behavior and hat selection from past search sessions, without storing individual profile or context information. ResultRank is maintained and adjusted on a per-hat basis, so that future, similarly hatted searchers benefit from these past sessions. An average of ResultRank across searcher-selected hats is used for improved SERP ranking. Recognition of QLPs is improved by use of the hats. Custom support of public and private language community circles is incorporated. The technique is applied to organic as well as sponsored results.
    Type: Application
    Filed: October 13, 2012
    Publication date: May 8, 2014
    Inventor: Paul Vincent Hayes
  • Publication number: 20140129540
    Abstract: Automatically creating and modifying a search engine for a website. User input may be received specifying an address of a website. A search engine may be automatically created for the website based on the user input. Webpages of the website may specify a plurality of tags specifying custom attributes of the webpages. During creation of the search engine, these custom attributes may be incorporated into the search engine index. Additional user input may be received customizing the search engine for various search engine contexts, e.g., based on the custom attributes of the webpages. Search engine results for the website may be based on various ranking functions, potentially including social impact of webpages of the website.
    Type: Application
    Filed: November 2, 2012
    Publication date: May 8, 2014
    Applicant: SWIFTYPE, INC.
    Inventors: Matthew T. Riley, Quinlan J. Hoxie
  • Publication number: 20140129541
    Abstract: Web crawling configuration includes: obtaining a webpage comprising a plurality of nodes; receiving a user selection of a node in the webpage; presenting a set of web crawling configuration options pertaining to a web crawling action to be performed with respect to the node, the set of web crawling configuration options depending at least in part on a type of an element included in the node and comprising: a first option to perform a first web crawling action in the event that the node includes a first type of the element; and a second option to perform a second web crawling action in the event that the node includes a second type of the element; receiving a user input specifying the web crawling configuration option; and storing the user-specified web crawling configuration option, performing the web crawling action on the node according to the user input, or both.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 8, 2014
    Applicant: Alibaba Group Holding Limited
    Inventors: Yiming Sun, Qi Qiang, Boyang Cai, Xiaojun Jin, Zongyuan Wu
  • Publication number: 20140129490
    Abstract: Architecture that includes a junk (unwanted) image detection algorithm, which detects unwanted images before the images are actually downloaded for indexing. Features related to image location information and host websites, such as image path descriptor (e.g., a uniform resource locator (URL)) pattern features, webpage content features, click features, and image aggregated information, are employed in a machine learning based framework to predict the probability that an image is unwanted (or wanted) before the images are downloaded. The framework is then applied to build a statistical model and predict junk scores. By removing image URLs marked as “junk” from the work list of an automated indexer (e.g., a crawler), the indexer bandwidth is significantly improved, with a corresponding improvement in the publish rate.
    Type: Application
    Filed: November 5, 2012
    Publication date: May 8, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Zhong Wu, Xian-Sheng Hua
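For 20140129490, the abstract's final step (dropping image URLs predicted to be junk from the crawler's work list before download) can be sketched as below. The URL-pattern features, their weights, the bias, and the 0.5 threshold are all hypothetical placeholders; the patent describes a learned statistical model over richer features (page content, clicks, aggregated image information), which this toy logistic score does not reproduce.

```python
import math
import re

# Hypothetical URL-pattern features with hand-picked weights; a real system
# would learn weights like these from labeled examples.
FEATURE_WEIGHTS = {
    r"(spacer|pixel|blank)\.(gif|png)$": 3.0,
    r"/(ads?|banners?)/": 2.5,
    r"\d+x\d+": 0.5,
}
BIAS = -2.0


def junk_probability(url):
    """Logistic score over the hypothetical pattern features."""
    score = BIAS + sum(
        weight for pattern, weight in FEATURE_WEIGHTS.items() if re.search(pattern, url)
    )
    return 1.0 / (1.0 + math.exp(-score))


def filter_work_list(urls, threshold=0.5):
    """Keep only URLs whose predicted junk probability is below the threshold."""
    return [url for url in urls if junk_probability(url) < threshold]


work_list = [
    "http://x.com/img/spacer.gif",
    "http://x.com/photos/cat.jpg",
    "http://x.com/ads/banner.jpg",
]
print(filter_work_list(work_list))  # ['http://x.com/photos/cat.jpg']
```

Scoring happens on the URL alone, so junk images are discarded without spending any download bandwidth, which is the effect the abstract attributes to the approach.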
  • Patent number: 8712992
    Abstract: A method and system for retrieving data from a webpage is described herein. A scheduler organizes, or rather orders, a group of webpage identifiers according to some predetermined criteria. Based upon this ordering, a fetcher may be configured to fetch data from webpages identified by the identifiers. To promote efficiency and reduce the latency between when a webpage is updated and when the fetcher retrieves data from the webpage, the scheduler may be configured to reorder the identifiers in such a manner that it causes an identifier that was less relevant, and would not have been sent to the fetcher, to become more relevant. In this way, the method and system may be particularly useful for retrieving data related to webpages that are updated frequently, such as social media webpages, for example.
    Type: Grant
    Filed: March 28, 2009
    Date of Patent: April 29, 2014
    Assignee: Microsoft Corporation
    Inventors: Alexey Maykov, Matthew F. Hurst
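For 8712992, the scheduler's reordering can be sketched with a priority that grows with staleness, so an identifier that ranked low and would not have been sent to the fetcher eventually rises to the top. The priority formula (importance multiplied by seconds since the last fetch) and the `(importance, last_fetched)` record shape are hypothetical choices for illustration, not the patent's predetermined criteria.

```python
import heapq


def order_for_fetch(identifiers, now, batch_size):
    """Return the batch_size identifiers with the highest priority.

    identifiers: dict mapping a webpage identifier to (importance, last_fetched).
    Hypothetical priority: importance * seconds since the last fetch, so a
    less relevant identifier gains priority the longer it waits.
    """
    scored = [
        (importance * (now - last_fetched), ident)
        for ident, (importance, last_fetched) in identifiers.items()
    ]
    return [ident for _, ident in heapq.nlargest(batch_size, scored)]


pages = {
    "news": (5.0, 90.0),    # important, fetched 10 s ago
    "blog": (1.0, 0.0),     # less important, fetched 100 s ago
    "static": (0.1, 0.0),   # rarely relevant
}
print(order_for_fetch(pages, now=100.0, batch_size=2))  # ['blog', 'news']
```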
  • Patent number: 8712999
    Abstract: A computer-implemented method for generating online search results includes receiving, over the Internet, referring URL data including a query, and a network site ID for a network site that was visited based on third party search engine analysis of the query; generating indexed query and network site data based on the received referring URL data and network site; receiving a new query from a user; determining a network site relevant to the new query based on the indexed query and network site data; and displaying to the user a link to the network site. Systems for generating online search results are also disclosed.
    Type: Grant
    Filed: June 9, 2011
    Date of Patent: April 29, 2014
    Assignee: AOL Inc.
    Inventors: Ian Holsman, Vaijanath N. Rao
  • Publication number: 20140114946
    Abstract: A flexible and extensible architecture allows for secure searching across an enterprise. Such an architecture can provide a simple Internet-like search experience to users searching secure content inside (and outside) the enterprise. The architecture allows for the crawling and searching of a variety of sources across an enterprise, regardless of whether any of these sources conform to a conventional user role model. The architecture further allows for security attributes to be submitted at query time, for example, in order to provide real-time secure access to enterprise resources. The user query also can be transformed to provide for dynamic querying that provides for a more current result list than can be obtained for static queries.
    Type: Application
    Filed: December 30, 2013
    Publication date: April 24, 2014
    Applicant: Oracle International Corporation
    Inventors: Mark Ture, Muralidhar Krishnaprasad, Joaquin Delgado
  • Patent number: 8706631
    Abstract: An Internet-coupled transaction service has a link to a computer appliance coupled to a merchant site, the computer appliance being operated by a person who has selected one or more products or services to purchase at the merchant site and who has selected, through the merchant site, the transaction service to arrange payment; the service also has software executing from a computer-readable medium accessible to the service. Via the software, the transaction service verifies the identity of the person, determines a creditworthiness score for the person, and, if the score is sufficient, arranges payment to be made to the merchant on behalf of the person and arranges repayment terms with the person for the payment to the merchant.
    Type: Grant
    Filed: March 21, 2008
    Date of Patent: April 22, 2014
    Assignee: Sound Starts, Inc.
    Inventor: Pankaj Gupta
  • Patent number: 8700600
    Abstract: A method and system for identifying informative links of a web site for use in crawling the web site is provided. A forum crawler analyzes sample web pages of a web forum to identify informative links and then crawls the web forum by following links determined to be informative and not following other links. The forum crawler system determines whether links are informative based on whether they are part of the overall structure of the web site or are used to select sequential information that has been split onto multiple web pages.
    Type: Grant
    Filed: January 17, 2012
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Lei Zhang, Wei-Ying Ma, Wei Lai, Jiangming Yang, Rui Cai
  • Patent number: 8700599
    Abstract: Various technologies described herein pertain to suggesting context-dependent keywords for advertising. A set of seed queries can be identified from a context, where the context is a source keyword, a search query, a category, or a landing page. Moreover, the set of seed queries can be input to a search engine. A predetermined number of web pages returned by the search engine upon executing the set of seed queries can be retrieved. Candidate keywords can be extracted from the web pages returned by the search engine. Further, keywords can be selected from the candidate keywords based on their relevance scores.
    Type: Grant
    Filed: November 21, 2011
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Uppinakuduru Raghavendra Udupa, Santosh Raju Vysyaraju
  • Patent number: 8694667
    Abstract: A filtering method and system. The method includes receiving by a computer processor an audio/video data file and filtering data. The computer processor analyzes the filtering data with respect to the audio/video data file and retrieves specified audio/video data portions comprising data objects within frames of the audio/video data file. The computer processor removes gaps existing in the audio/video data file and receives tags comprising instructions for presenting video data of the audio/video data file, audio data of the audio/video data file, and the specified audio/video data portions. The computer processor stores the video data in a first layer of a multimedia file, the audio data in a second layer of the multimedia file, and the specified audio/video data portions in additional layers of the multimedia file. Each of the first layer, the second layer, and the additional layers comprises a tag layer comprising the tags.
    Type: Grant
    Filed: January 5, 2011
    Date of Patent: April 8, 2014
    Assignee: International Business Machines Corporation
    Inventor: Sarbajit K. Rakshit
  • Patent number: 8688860
    Abstract: A method for migrating information, and a migrator for migrating information, are disclosed. The method may include extracting organizational information from at least two service providers, accessing a first at least one of the at least two service providers upon selection of a migration selection interface by the user, receiving a first plurality of information related to the user from one of the service providers, accessing a second at least one of the at least two service providers, and writing the first plurality of information to the second at least one of the at least two service providers.
    Type: Grant
    Filed: October 31, 2013
    Date of Patent: April 1, 2014
    Assignee: LinkedIn Corporation
    Inventors: Tomy K. Isaac, Mark Kasiraja
  • Patent number: 8688681
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying web hosting entities. In one aspect, a system includes one or more computers programmed to perform operations including maintaining an Internet Protocol (IP) address history for each hostname in a plurality of hostnames. Each IP address history is a time series of IP addresses. The operations further include organizing the hostnames into a collection of groups so that each hostname of the plurality of hostnames is a member of exactly one group in the collection of groups. Each group has a kernel calculated from the IP address histories of the members of the group, and the IP address history of each member of the group is within a threshold distance of the kernel of the group.
    Type: Grant
    Filed: June 17, 2010
    Date of Patent: April 1, 2014
    Assignee: Google Inc.
    Inventors: Li Xiao, Arup Mukherjee
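For 8688681, the grouping invariant in the abstract (each hostname in exactly one group, every member within a threshold distance of the group's kernel) can be sketched with a greedy pass. Everything concrete here is an assumption for illustration: histories are equal-length IP lists sampled at the same time steps, distance is the fraction of steps at which two histories differ, and the kernel is the most common IP at each step. A hostname joins a group only if the recomputed kernel keeps every member, old and new, within the threshold.

```python
from collections import Counter


def distance(history_a, history_b):
    """Fraction of aligned time steps at which two IP histories differ."""
    if len(history_a) != len(history_b):
        raise ValueError("histories must be sampled at the same time steps")
    diffs = sum(1 for a, b in zip(history_a, history_b) if a != b)
    return diffs / len(history_a)


def kernel(histories):
    """Per-time-step most common IP address across a group's histories."""
    return [Counter(step).most_common(1)[0][0] for step in zip(*histories)]


def group_hostnames(ip_histories, threshold):
    """Greedily place each hostname in exactly one group such that every
    member's history stays within `threshold` of the group kernel."""
    groups = []  # each group is a list of hostnames
    for host in ip_histories:
        for members in groups:
            candidate = members + [host]
            k = kernel([ip_histories[h] for h in candidate])
            if all(distance(ip_histories[h], k) <= threshold for h in candidate):
                members.append(host)
                break
        else:
            groups.append([host])  # no compatible group: start a new one
    return groups


histories = {
    "a.example": ["1.1.1.1", "1.1.1.1", "2.2.2.2"],
    "b.example": ["1.1.1.1", "1.1.1.1", "3.3.3.3"],
    "c.example": ["9.9.9.9", "9.9.9.9", "9.9.9.9"],
}
print(group_hostnames(histories, threshold=0.34))
# [['a.example', 'b.example'], ['c.example']]
```

A greedy pass is order-dependent; the abstract does not say how groups are formed, so a production system might refine groups iteratively.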
  • Publication number: 20140089289
    Abstract: Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.
    Type: Application
    Filed: November 26, 2013
    Publication date: March 27, 2014
    Applicant: Lockheed Martin Corporation
    Inventors: Abha Moitra, David Brian Bracewell, Steven Matt Gustafson, T. Michael Baylor, Tina H. Chau
  • Publication number: 20140089288
    Abstract: A system rates content on a network. A database stores ratings for the content. A rating service creates the ratings for the content. The rating service merges a first rating of the content with a second rating of the content to produce a third rating for the content. A user interface obtains search results from the rating service. When the search results include the content, the user interface displays the rating of the content along with the search results.
    Type: Application
    Filed: September 26, 2012
    Publication date: March 27, 2014
    Inventors: Farah Ali, Ayub S. Khan, Azeez M. Chollampat, Damodaran Kesavath
  • Patent number: 8682883
    Abstract: Embodiments of the present invention relate to systems and methods for determining sets of products which are similar to each other in terms of consumers' wants and needs. Queries are performed on a particular product. Documents relating to the query are received and stored. A dictionary is created from the received documents, whereby the documents, which are text files, are scrubbed of certain data to create a scrubbed text file. Topic modeling is then performed on the cleansed text file. Various methods can be used to perform topic modeling, including, but not limited to, latent semantic analysis, nonnegative matrix factorization, and singular value decomposition.
    Type: Grant
    Filed: April 16, 2012
    Date of Patent: March 25, 2014
    Assignee: Predictix LLC
    Inventors: Loren Williams, Emir Pasalic, Nikolaos Vasiloglou
  • Patent number: 8682723
    Abstract: Conversations in an online content universe are monitored. A social analysis module analyzes individual conversations between publishers in the online content universe. Publishers that influence a conversation are identified.
    Type: Grant
    Filed: September 14, 2009
    Date of Patent: March 25, 2014
    Assignee: Twelvefold Media Inc.
    Inventors: Todd Parsons, Mitch Ratcliffe, Rob Crumpler, Will Kessler, Kurt Freytag
  • Publication number: 20140081946
    Abstract: Embodiments relating to a computer-implemented process, an apparatus and a computer program product are provided for crawling rich Internet applications. In one aspect, the method includes executing an event in a set of events discovered in a state exploration phase according to a predetermined priority of events in each of the discovered sets of events, wherein events of a higher priority are exhausted before an event of a lower priority is executed, and determining any transitions. Responsive to a determination that there is at least one transition, any remaining set of events is executed in a transition exploration phase. In addition, the method determines whether any new states exist as a result of executing an event in the set of events and, responsive to a determination that a new state exists, returns to the state exploration phase.
    Type: Application
    Filed: September 20, 2013
    Publication date: March 20, 2014
    Applicant: International Business Machines Corporation
    Inventors: Suryakant Choudhary, Paul Ionescu, Guy V. Jourdan, Iosif V. Onut, Gregor von Bochmann
  • Publication number: 20140081947
    Abstract: A method for processing an intranet includes crawling the intranet to identify at least some of the pages in the intranet, and determining, for each identified page, a number of links in a shortest path from a root page to the identified page.
    Type: Application
    Filed: November 25, 2013
    Publication date: March 20, 2014
    Applicant: Microsoft Corporation
    Inventor: Mark S. D'Urso
  • Publication number: 20140081945
    Abstract: Synchronizing requests with a respective context includes, responsive to a determination that there are more pages to explore, performing regular crawling operations for a current page, recording the current page in a list of explored pages, and extracting links from the current page. Responsive to a determination that there are more links to extract, a next link to analyze is selected to form a selected link and, responsive to a determination that there is a new request associated with the selected link, a new request identifier is created and saved as an entry in a hashmap. Responsive to a determination that there is not a new request associated with the selected link, a request associated with the selected link is updated with a new link value when the link value differs.
    Type: Application
    Filed: September 13, 2013
    Publication date: March 20, 2014
    Applicant: International Business Machines Corporation
    Inventors: Khalil A. Ayoub, Paul Ionescu, Gil Mirmovitch, Iosif Viorel Onut
  • Patent number: 8676783
    Abstract: The technology described relates to reducing a backlog of pending URL crawls in view of a limited URL crawl capacity. This technology is useful for crawling URLs with low latency. Because of the limited crawl capacity, uncrawled URLs from crawl requests are entered into a backlog data structure of pending crawl requests. Various criteria are applied to the URLs that are requested to be crawled, so that less important URL crawls are rejected early from the backlog data structure. This early rejection tends to limit the backlog data structure to the more important pending URL crawls, and tends to keep the average latency low by quickly failing the less important requested URL crawls.
    Type: Grant
    Filed: June 28, 2011
    Date of Patent: March 18, 2014
    Assignee: Google Inc.
    Inventors: Pawel Aleksander Fedorynski, Sumitro Samaddar
  • Patent number: 8676667
    Abstract: A method for simulating the entire superset of potential valid keyword regular expression requests constructed during an Internet browser search and converting the result sets into an Environmental summary report to enable efficient and accurate searching without requiring Browser Engine supercomputer cluster searching capabilities.
    Type: Grant
    Filed: April 21, 2010
    Date of Patent: March 18, 2014
    Inventor: Richard Paiz
  • Patent number: 8676781
    Abstract: A method and system for associating an advertisement with a web page are disclosed. Web pages associated with potential queries may be identified using a search engine. A mapping operation may be performed to obtain a map of the web pages as a function of the potential queries. A reverse mapping operation may be performed to obtain a grouping of potential queries as a function of one of the web pages. An active query may be selected from the grouping of potential queries to provide to an advertising service to associate an advertisement with the web page.
    Type: Grant
    Filed: October 19, 2005
    Date of Patent: March 18, 2014
    Assignee: A9.Com, Inc.
    Inventors: Viatcheslav Galperin, Udi Manber, Taylor Nicole Van Vleet
  • Patent number: 8676782
    Abstract: The present invention provides an information collection apparatus, an information collection method, and a program capable of collecting information from information resources on a network effectively as well as a search engine that searches the information resources collected.
    Type: Grant
    Filed: August 14, 2009
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Seiji Hamada, Makoto Yamamoto
  • Publication number: 20140074815
    Abstract: A computer system including a computer-readable memory unit; and a processor coupled to the memory unit. The processor is configured to provide a graphical image representing a search engine interface for display on a screen of the computer system, wherein the search engine interface comprises an arrangement of cells, each cell representative of a calendar unit of time; cause performance of a search, upon selection of a particular cell, wherein said search is based on the unit of time represented by the selected cell; and display the results of the search.
    Type: Application
    Filed: May 10, 2012
    Publication date: March 13, 2014
    Inventor: David Plimton
  • Patent number: 8671090
    Abstract: A method of utilizing a Web Service folder interface. A user defines a folder in a local folder directory as a Web Services enabled folder. The folder includes file data and metadata corresponding to the file data. The metadata includes a configurable Web Services type property that corresponds to a remote Web Service. The metadata also includes a configurable data handling property that includes one or more allowable file formats. When a user submits the file data to the remote Web Service by selecting an option in a pull-down menu of a graphical user interface (GUI) or dropping the file data in a local output folder, the operating system (OS) sends the file data to the remote Web Service. The OS automatically converts an output file received from the remote Web Service into one of the allowable file formats and updates the local file data with the output file.
    Type: Grant
    Filed: August 29, 2007
    Date of Patent: March 11, 2014
    Assignee: International Business Machines Corporation
    Inventors: Indran Naick, Jeffrey K. Wilson
  • Publication number: 20140067787
    Abstract: A method and a system to identify machine-readable codes using a web crawler are provided. Machine-readable codes include, but are not limited to, Universal Product Codes (UPC), quick response (QR) codes, stock-keeping units (SKUs) and international standard book number (ISBN) codes. A web crawler downloads pages from the World Wide Web. A determination module accesses the downloaded pages and identifies a machine-readable code corresponding to a product description included in the downloaded pages. The machine-readable code is included in a downloaded page of the downloaded pages. The determination module further extracts the product description from the downloaded page. A code database stores a record of the machine-readable code and the product description.
    Type: Application
    Filed: November 5, 2013
    Publication date: March 6, 2014
    Applicant: eBay, Inc.
    Inventor: Tom Normile
  • Patent number: 8666967
    Abstract: An exemplary system for managing an applications and data space includes a strategy layer configured to receive a query statement and to formulate one or more custom queries based on the query statement and a query scheduler layer configured to schedule issuance of the one or more custom queries to one or more query response modules associated with the applications and data space. Other methods, devices and systems are also disclosed.
    Type: Grant
    Filed: September 23, 2011
    Date of Patent: March 4, 2014
    Assignee: Microsoft Corporation
    Inventors: John D. Dunagan, Albert Greenberg, Emre M. Kiciman, Heather E. Warncke, Alastair Wolman
  • Patent number: 8666964
    Abstract: Determining a schedule for recrawling pages is disclosed. A crawling schedule that specifies a due date at which each page is to be crawled is determined according to a first scheme. A set of pages that includes one or more pages each of which has a due date that has passed is determined. The set of pages is ordered according to a second scheme.
    Type: Grant
    Filed: April 25, 2005
    Date of Patent: March 4, 2014
    Assignee: Google Inc.
    Inventor: Jesse L. Alpert
  • Patent number: 8666990
    Abstract: Methods and systems are provided for weighting contemporaneous content. In response to a user content request, a plurality of contemporaneous content items relating to the user content request is determined, the contemporaneous content items including ultra-fresh content items that have only recently been generated. For each of the contemporaneous content items, one or more authors of the content item are identified, and an expertise level for the one or more authors and an expert weighting for each of the content items, based on the expertise level of the corresponding one or more authors, are determined. Weighting the contemporaneous content includes ranking the contemporaneous content items in response to the user content request based on the expert weighting and presenting at least a portion of the contemporaneous content items in response to the user content request.
    Type: Grant
    Filed: March 15, 2010
    Date of Patent: March 4, 2014
    Assignee: Yahoo! Inc.
    Inventor: Su-Lin Wu
  • Patent number: 8666965
    Abstract: An Internet infrastructure supports searching of web links to select search results by processing browser activity information along with one or more of favorite lists and related metadata, user profiles, and trends based on browser activity behavior and favorite behavior. A plurality of web browsers located on client devices incorporate a browser activity-monitoring module that tracks the user's Internet usage, processes this information, and sends it to the server periodically or upon user request to aid in improving search results. The search engine server communicatively couples to the plurality of web browsers and supports delivery of search results/web links to the client device based upon a search string, browser activity information, and possibly the favorite lists and related metadata. The gathered browser activity information, favorite lists, and related metadata are stored in one or more server databases that are associated with the search engine server.
    Type: Grant
    Filed: October 24, 2012
    Date of Patent: March 4, 2014
    Assignee: Enpulz, L.L.C.
    Inventor: James D. Bennett
  • Patent number: 8666962
    Abstract: Providing a speculative search result for a search query prior to completion of the search query. In response to receiving a search query from a client node, a speculative search result is provided to the client node for the search query prior to receiving an indication from the client node that said search query is completely formed. The speculative search result may be displayed on the same web page on the client node as the search query, while the search query is being entered by the user. As the user further enters the search query, a new speculative search result may be provided to the user.
    Type: Grant
    Filed: June 6, 2011
    Date of Patent: March 4, 2014
    Assignee: Yahoo! Inc.
    Inventors: Stephen Hood, Ralph Rabbat, Mihir Shah, Adam Durfee, Alastair Gourlay, Peter Anick, Richard Kasperski, Oliver Thomas Bayley, Ashley Woodman Hall, Shyam Kapur, John Thrall
  • Publication number: 20140052708
    Abstract: A system and method for generating search engine data to be displayed on a display. A processor may send search queries to a search engine and receive result sets in response. Search engine data may be generated for URLs based on the search queries and the result sets. Report data may be displayed on the display based on the search engine data. The report data may include data effective to display a raw data page based on the search engine data. The processor may receive a request message to modify the report data. The request message may include a request to generate a user customized data page including filtered data from the search engine data. The processor may generate modified report data in response to the request message. The modified report data includes data effective to display the raw data page and the user customized data page.
    Type: Application
    Filed: August 15, 2012
    Publication date: February 20, 2014
    Applicant: Conductor, Inc.
    Inventor: Martin Luis Alonso Lago
  • Patent number: 8655912
    Abstract: A computer-implemented method and system for combining keywords into logical clusters that share a similar behavior with respect to a considered dimension are disclosed. Various embodiments are operable to order a list of keywords from high activity to low activity; partition the list into at least two sets, namely a head partition including keywords with an activity level above a predefined threshold and a tail partition including the remainder of the keywords in the list; model the keywords in the head partition based on a set of variables; score the keywords in the head partition based on the modeling; and cluster head partition keywords with tail partition keywords having at least one common variable into at least one keyword cluster.
    Type: Grant
    Filed: August 20, 2010
    Date of Patent: February 18, 2014
    Assignee: eBay, Inc.
    Inventors: Xiaofeng Tang, Salvador Duran, Joel R. Minton
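The per-page quantity in publication 20140081947, the number of links on the shortest path from a root page, is what a breadth-first search over the crawled link graph yields. The following is a minimal sketch that assumes the crawl has already produced a hypothetical in-memory map from each page to the pages it links to. The application itself does not specify a data structure, and the function name and sample URLs are illustrative only.

```python
from collections import deque


def link_depths(links: dict[str, list[str]], root: str) -> dict[str, int]:
    """Number of links on the shortest path from `root` to each reachable page."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # In breadth-first order, the first visit is along a shortest path.
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical intranet link graph (placeholder paths).
intranet = {
    "/": ["/hr", "/it"],
    "/hr": ["/hr/benefits"],
    "/it": ["/hr/benefits", "/it/vpn"],
    "/it/vpn": ["/"],
}
print(link_depths(intranet, "/"))
# {'/': 0, '/hr': 1, '/it': 1, '/hr/benefits': 2, '/it/vpn': 2}
```

Pages that cannot be reached from the root do not appear in the result. Cycles such as the link from `/it/vpn` back to `/` are harmless, because each page's depth is assigned only once.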
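The early-rejection idea in patent 8676783 can be sketched as a fixed-capacity backlog. This is a simplified illustration, not Google's implementation. The abstract refers to "various criteria," which are collapsed here into a single hypothetical numeric importance score per request. When the backlog is full, a new request that is no more important than the least important pending one fails immediately. Otherwise, the least important pending request is evicted to make room. The class and method names are assumptions.

```python
import heapq
import itertools


class CrawlBacklog:
    """Fixed-capacity backlog of pending crawls that fails low-importance requests early."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Min-heap of (importance, arrival order, url); the root is the least important.
        self._heap: list[tuple[float, int, str]] = []
        self._arrival = itertools.count()  # tie-breaker so URLs never get compared

    def submit(self, url: str, importance: float) -> list[str]:
        """Queue a crawl request and return the URLs rejected as a result."""
        entry = (importance, next(self._arrival), url)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return []
        if importance <= self._heap[0][0]:
            return [url]  # fail fast: not more important than anything pending
        # Evict the least important pending request and queue the new one.
        evicted = heapq.heapreplace(self._heap, entry)
        return [evicted[2]]

    def pending(self) -> list[str]:
        """Pending URLs, most important first."""
        return [url for _, _, url in sorted(self._heap, reverse=True)]


backlog = CrawlBacklog(capacity=2)
print(backlog.submit("https://example.com/a", 0.9))  # []
print(backlog.submit("https://example.com/b", 0.2))  # []
print(backlog.submit("https://example.com/c", 0.1))  # ['https://example.com/c']
print(backlog.submit("https://example.com/d", 0.5))  # ['https://example.com/b']
print(backlog.pending())  # ['https://example.com/a', 'https://example.com/d']
```

A min-heap keeps the least important pending request at the root, so deciding whether to reject or evict costs O(log n) per submission. This keeps the backlog bounded, which is the property the abstract credits with keeping average latency low.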
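The two-scheme recrawl scheduling described in patent 8666964 can be illustrated with a short Python sketch. The abstract names neither scheme, so both schemes here are assumptions. The first scheme sets each page's due date to its last crawl time plus a per-page refresh interval. The second scheme orders the overdue pages by a hypothetical importance score and breaks ties by earliest due date. The `Page` fields and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    last_crawled: float      # timestamp of last crawl, e.g. seconds (assumed unit)
    refresh_interval: float  # per-page interval used by the assumed first scheme
    importance: float        # hypothetical score used by the assumed second scheme


def due_date(page: Page) -> float:
    # First scheme (assumption): due one refresh interval after the last crawl.
    return page.last_crawled + page.refresh_interval


def overdue_in_order(pages: list[Page], now: float) -> list[str]:
    # Select pages whose due date has already passed.
    overdue = [p for p in pages if due_date(p) < now]
    # Second scheme (assumption): most important first; earliest due date breaks ties.
    overdue.sort(key=lambda p: (-p.importance, due_date(p)))
    return [p.url for p in overdue]


pages = [
    Page("a", last_crawled=0, refresh_interval=100, importance=0.5),
    Page("b", last_crawled=0, refresh_interval=50, importance=0.9),
    Page("c", last_crawled=0, refresh_interval=500, importance=1.0),
    Page("d", last_crawled=10, refresh_interval=50, importance=0.5),
]
print(overdue_in_order(pages, now=200))  # ['b', 'd', 'a']
```

Page "c" is excluded because its due date (500) has not yet passed. Keeping the two schemes in separate functions lets the due-date policy and the ordering of the overdue set be changed independently.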
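The head/tail clustering steps in patent 8655912 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method. Each keyword is assumed to carry an activity count and a set of categorical variables. The head-partition "score" is simply the activity count, which stands in for the unspecified model. Each tail keyword joins the highest-scoring head keyword with which it shares at least one variable. The function name and sample keywords are hypothetical.

```python
# Hypothetical sketch of head/tail keyword clustering (assumptions noted inline).
Keyword = tuple[float, set[str]]  # (activity level, set of descriptive variables)


def cluster_keywords(
    keywords: dict[str, Keyword], threshold: float
) -> tuple[dict[str, list[str]], list[str]]:
    # Order keywords from high activity to low activity.
    ordered = sorted(keywords, key=lambda k: keywords[k][0], reverse=True)
    # Head: activity above the threshold; tail: everything else.
    head = [k for k in ordered if keywords[k][0] > threshold]
    tail = [k for k in ordered if keywords[k][0] <= threshold]

    # Assumption: the head "score" is just activity, so `head` is already ranked.
    clusters = {h: [h] for h in head}
    unmatched: list[str] = []
    for t in tail:
        t_vars = keywords[t][1]
        # Attach the tail keyword to the best-scoring head keyword sharing a variable.
        match = next((h for h in head if keywords[h][1] & t_vars), None)
        if match is None:
            unmatched.append(t)
        else:
            clusters[match].append(t)
    return clusters, unmatched


sample = {
    "running shoes": (900, {"footwear", "sport"}),
    "iphone case": (700, {"phone", "accessory"}),
    "trail running sneakers": (12, {"footwear", "outdoor"}),
    "galaxy phone cover": (8, {"phone", "accessory"}),
    "vintage lamp": (5, {"home"}),
}
clusters, unmatched = cluster_keywords(sample, threshold=100)
print(clusters)
# {'running shoes': ['running shoes', 'trail running sneakers'], 'iphone case': ['iphone case', 'galaxy phone cover']}
print(unmatched)  # ['vintage lamp']
```

Tail keywords that share no variable with any head keyword are returned separately rather than forced into a cluster. The abstract does not say how such keywords are handled.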