Web Crawlers Patents (Class 707/709)
  • Patent number: 9100406
    Abstract: A system and method of external link processing is disclosed. The system includes an interface configured to receive a user request to access an encoded external link in networked content. The encoded external link comprises a domain name of an external link server and an encoded portion which is an encoded result of an original external link encoded with an encoding function, wherein the original external link is an address to an external destination. One or more processors determine a safety level of the encoded external link using a criterion. In the event that the determined safety level of the encoded external link is determined unsafe, a warning message is generated indicating that the original external link is unsafe and the user is prevented from directly navigating to the original external link.
    Type: Grant
    Filed: February 18, 2014
    Date of Patent: August 4, 2015
    Assignee: Alibaba Group Holding Limited
    Inventors: Jiawei Liu, Jinhua Wang, Chenming Hua
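The flow in the abstract above can be sketched as follows. This is a minimal illustration, not the patented method: the link-server domain, the URL-safe Base64 encoding function, and the host blocklist used as the safety criterion are all assumptions made for the example.

```python
import base64
from urllib.parse import urlsplit

LINK_SERVER = "link.example.com"        # hypothetical external link server
UNSAFE_HOSTS = {"malware.example.net"}  # hypothetical safety criterion


def encode_link(original):
    """Wrap an original external link in an encoded link on the link server."""
    token = base64.urlsafe_b64encode(original.encode("utf-8")).decode("ascii")
    return f"https://{LINK_SERVER}/go/{token}"


def handle_request(encoded_link):
    """Decode the link, apply the criterion, and warn or redirect."""
    token = urlsplit(encoded_link).path.rsplit("/", 1)[-1]
    original = base64.urlsafe_b64decode(token).decode("utf-8")
    host = urlsplit(original).hostname
    if host in UNSAFE_HOSTS:
        # The user is not sent directly to the original link.
        return {"action": "warn", "message": f"{original} is unsafe"}
    return {"action": "redirect", "location": original}
```

Because every outbound link passes through `handle_request`, the criterion can be updated on the link server without re-encoding links already embedded in content.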
  • Patent number: 9043306
    Abstract: A client application installed on end user computers generates metadata from the content of web pages visited by end users and provides the metadata to a search engine. When an end user visits a web page, the end user's computer downloads and displays the web page to the end user. The client application may simultaneously access the web page content and generate this metadata in the form of a content signature of the web page from the web page content. The client application then provides the content signature to a search engine. The search engine may employ content signatures to identify new web pages to crawl and index. Additionally, the search engine may employ content signatures to identify changes to web pages and determine the crawl frequency of web pages.
    Type: Grant
    Filed: August 23, 2010
    Date of Patent: May 26, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Fabrice Canel, Junaid Ahmed, Thomas Francis McElroy, Walter Sun, Kumar Chellapilla, Abhishek Singh, Vishnu Challam
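One way to picture the content-signature idea above is the sketch below. It assumes a SHA-256 hash over whitespace-normalized page content as the signature and a simple in-memory store on the search-engine side; both `content_signature` and `SignatureStore` are hypothetical names, not taken from the patent.

```python
import hashlib


def content_signature(page_content):
    """Client side: hash the page content after collapsing whitespace."""
    normalized = " ".join(page_content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SignatureStore:
    """Search-engine side: classify each reported signature."""

    def __init__(self):
        self.known = {}  # url -> last reported signature

    def report(self, url, signature):
        previous = self.known.get(url)
        self.known[url] = signature
        if previous is None:
            return "new"        # candidate for a first crawl
        if previous != signature:
            return "changed"    # candidate for a re-crawl
        return "unchanged"
```

A crawler scheduler could count how often a URL is reported as "changed" to estimate a crawl frequency, which is the use the abstract describes.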
  • Patent number: 9037585
    Abstract: A system and method for mapping an input uniform resource identifier (URI) to an entry in a database. The system cleans an input URI to produce a prime URI that references an entry in a database. The prime URI is created by applying a regular expression determined for a particular domain to the input URI. Once the prime URI is generated, the prime URI can be used to retrieve information from a database.
    Type: Grant
    Filed: December 29, 2010
    Date of Patent: May 19, 2015
    Inventors: Kristopher Kubicki, Lawrence Hsieh
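A minimal sketch of the URI-cleaning step above, assuming a table of per-domain regular expressions whose named group `key` captures the part of the path that identifies the database entry. The domains, patterns, and the `prime_uri` helper are illustrative assumptions.

```python
import re
from urllib.parse import urlsplit

# Hypothetical per-domain rules; each regex extracts the identifying key.
DOMAIN_RULES = {
    "shop.example.com": re.compile(r"^/item/(?P<key>\d+)"),
    "news.example.org": re.compile(r"^/(?P<key>\d{4}/\d{2}/[a-z0-9-]+)"),
}


def prime_uri(input_uri):
    """Return the cleaned 'prime' URI, or None if no rule applies."""
    parts = urlsplit(input_uri)
    host = (parts.hostname or "").lower()
    rule = DOMAIN_RULES.get(host)
    if rule is None:
        return None
    match = rule.match(parts.path)
    if match is None:
        return None
    # Query strings, fragments, and trailing path segments are dropped.
    return f"{host}/{match.group('key')}"
```

The returned string can then serve as the lookup key for the database entry, so that tracking parameters or alternate URL forms map to the same record.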
  • Publication number: 20150134636
    Abstract: System and method for collecting information from a plurality of related sites, analyzing the information and storing the relevant information in a data base for future use. According to one embodiment of the present invention, the system uses the provided list of sites, whether obtained automatically or separately, queries them and analyzes the result retrieved from each site. The information may also optionally and preferably be ranked.
    Type: Application
    Filed: September 28, 2014
    Publication date: May 14, 2015
    Inventors: Michael Rubanovich, Dmitry Babitsky
  • Publication number: 20150134635
    Abstract: A computer performs a search and generates a context-aware search result. The computer crawls a plurality of servers to fetch a plurality of knowledge documents, parses the plurality of knowledge documents, and indexes the plurality of parsed knowledge documents in a search index. Parsing can include annotating at least one of the plurality of knowledge documents, and indexing can include building a term index and an annotation index. The computer receives from a requestor a search request including a search term, and requests and receives a context of an asset environment associated with the requestor. The computer determines a context-aware search result based, at least in part, on the search term, on the context, and on information stored in the search index, and transmits the context-aware search result to the requestor.
    Type: Application
    Filed: September 8, 2014
    Publication date: May 14, 2015
    Inventors: Gaurav Gupta, Arun Ramakrishnan, Rohit Shetty
  • Publication number: 20150134634
    Abstract: A computer performs a search and generates a context-aware search result. The computer crawls a plurality of servers to fetch a plurality of knowledge documents, parses the plurality of knowledge documents, and indexes the plurality of parsed knowledge documents in a search index. Parsing can include annotating at least one of the plurality of knowledge documents, and indexing can include building a term index and an annotation index. The computer receives from a requestor a search request including a search term, and requests and receives a context of an asset environment associated with the requestor. The computer determines a context-aware search result based, at least in part, on the search term, on the context, and on information stored in the search index, and transmits the context-aware search result to the requestor.
    Type: Application
    Filed: November 13, 2013
    Publication date: May 14, 2015
    Applicant: International Business Machines Corporation
    Inventors: Gaurav Gupta, Arun Ramakrishnan, Rohit Shetty
  • Patent number: 9031943
    Abstract: Method, system, and programs for realtime de-duplication of objects. A received object is hashed to generate a hashed object, which is then used to generate a query for an inverted index. Candidate matching objects are determined based on the query of the inverted index. From the candidate matching objects, a matched object that corresponds to the received object is determined.
    Type: Grant
    Filed: May 14, 2012
    Date of Patent: May 12, 2015
    Assignee: Yahoo! Inc.
    Inventors: Michael Jason Welch, Aamod Sane
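The hash, query, candidates, match sequence above can be sketched as follows. This is an assumption-laden illustration: it hashes word shingles with SHA-256, uses a dictionary as the inverted index, and picks the match by Jaccard similarity against a threshold. None of these specifics come from the patent.

```python
import hashlib
from collections import defaultdict


def shingle_hashes(text, k=3):
    """Hash every run of k consecutive words in the text."""
    words = text.lower().split()
    shingles = [" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))]
    return {hashlib.sha256(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


class Deduplicator:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.index = defaultdict(set)  # inverted index: hash -> object ids
        self.hashes = {}               # object id -> set of hashes

    def add(self, obj_id, text):
        hashes = shingle_hashes(text)
        self.hashes[obj_id] = hashes
        for h in hashes:
            self.index[h].add(obj_id)

    def find_match(self, text):
        """Return the id of the best-matching stored object, or None."""
        query = shingle_hashes(text)
        candidates = set()
        for h in query:
            candidates |= self.index.get(h, set())
        best_id, best_score = None, 0.0
        for cid in candidates:
            other = self.hashes[cid]
            score = len(query & other) / len(query | other)
            if score > best_score:
                best_id, best_score = cid, score
        return best_id if best_score >= self.threshold else None
```

Only objects sharing at least one hash with the query are scored, which is what keeps the lookup cheap enough for real-time use.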
  • Patent number: 9026520
    Abstract: An enhanced metadata structure and associated process is provided which captures and stores metadata gathered about the source and usage of a media asset or file. The source and usage metadata is integrated, such as by encoding within the enhanced media file, as the media asset is transferred and used. The integrated metadata accumulates, as a trail of source information and usage information in the enhanced media asset, and can be extracted upon arrival at a target computer system.
    Type: Grant
    Filed: March 7, 2013
    Date of Patent: May 5, 2015
    Assignee: Facebook, Inc.
    Inventors: Vidur Apparao, John Bandhauer, Christopher Robert Waterson
  • Patent number: 9026519
    Abstract: Methods, systems, and media are provided for delivering clustered search results for recent and non-recent events by maintaining the identification (ID) numbers of the respective clustered documents beyond the “fresh” life span of the clustered documents. When clusters are formed according to similar content, an ID number and associated attributes are assigned to each of the clusters. This provides a mechanism to track and retrieve the respective clusters for subsequent delivery of search results. The respective ID numbers of the clusters are maintained, even after the documents are no longer considered “fresh.” These similar-content clusters are further subdivided according to publication date. This provides individual subdivided clusters for similar content events that occurred at different time spans, which are delivered along with individual non-clustered search results in a SERP.
    Type: Grant
    Filed: August 9, 2011
    Date of Patent: May 5, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Sasi Kumar Parthasarathy, Junaid Ahmed, Yatharth Saraf, Walter Sun
  • Publication number: 20150120692
    Abstract: Embodiments of the present invention provide a method, a device, and a system for acquiring a user behavior. In the embodiments of the present invention, an acquired URL request matches a database, and the database stores a URL actively initiated by a user recognized by adopting a web crawler technology. If a URL contained in the URL request matches a corresponding URL actively initiated by a user in the database, it may be determined that the URL request is actively initiated by the user. Therefore, a network forwarding device or a server can rapidly and accurately acquire a behavior that a user actively initiates a URL request so as to further analyze a user behavior.
    Type: Application
    Filed: December 29, 2014
    Publication date: April 30, 2015
    Inventors: Yusheng Hu, Jing Zhang, Jinxing Zhang
  • Patent number: 9020927
    Abstract: Methods, systems, and apparatus for determining resource quality based on resource competition. In an aspect, a method comprises: for each of a plurality of resource locators: generating a first value for the resource locator that indicates, for a plurality of first sets of search results that each include the resource locator, a number of occurrences of other resource locators that were impressed and not selected when the resource locator was selected; generating a second value for the resource locator that indicates, for a plurality of second sets of search results that each include the resource locator, a number of occurrences of other resource locators that were selected when the resource locator was impressed and not selected; and generating, based on a difference between the first value and the second value, an adjustment factor for the resource locator for adjusting a score associated with the resource locator during a search operation.
    Type: Grant
    Filed: July 31, 2012
    Date of Patent: April 28, 2015
    Assignee: Google Inc.
    Inventors: Moustafa A. Hammad, Hyung-Jin Kim, Rajan Patel, Thomas E. Bagby
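The two counts in the abstract above can be computed from click logs roughly as sketched below. The log format (a list of shown URLs plus the clicked subset) and the linear `adjustment_factor` with its `scale` parameter are assumptions for illustration; the abstract only says the factor is based on the difference between the two values.

```python
from collections import Counter


def competition_values(impressions):
    """impressions: iterable of (shown_urls, clicked_urls) pairs.

    first[u]  counts other URLs shown but skipped when u was clicked.
    second[u] counts other URLs clicked when u was shown but skipped.
    """
    first, second = Counter(), Counter()
    for shown, clicked in impressions:
        clicked = set(clicked) & set(shown)
        skipped = [u for u in shown if u not in clicked]
        for url in shown:
            if url in clicked:
                first[url] += len(skipped)
            else:
                second[url] += len(clicked)
    return first, second


def adjustment_factor(first_value, second_value, scale=0.01):
    """Hypothetical mapping: linear in the difference of the two values."""
    return 1.0 + scale * (first_value - second_value)
```

A factor above 1.0 would boost a URL that tends to be chosen over its competitors, and a factor below 1.0 would demote one that tends to be passed over.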
  • Patent number: 9020922
    Abstract: A method for optimizing search results for an entity includes determining a grouping for actions related to an entity. The grouping may include a plurality of terms. The method may also include searching a network for the terms associated with the grouping. Thereafter, results of the searches may be analyzed to determine a rank for the entity within the results.
    Type: Grant
    Filed: August 10, 2010
    Date of Patent: April 28, 2015
    Assignee: Brightedge Technologies, Inc.
    Inventors: Jimmy Yu, Sammy Yu, Lemuel S. Park, Rolland Yip
  • Publication number: 20150112961
    Abstract: Methods and apparatus related to obtaining search related structured data from a user. A user submitted update instruction may identify at least one URL and provide access to associated user supplied search related structured data. An associated record in a database may be modified by including the user supplied search related structured data in the record. The record is related to the URL and the database may be a structured data database associated with a search engine.
    Type: Application
    Filed: September 18, 2012
    Publication date: April 23, 2015
    Applicant: Google Inc.
    Inventor: Google Inc.
  • Publication number: 20150112962
    Abstract: A method and system for launching applications on a user device responsive to a user intent are provided. The method includes receiving at least one environmental variable; analyzing the at least one environmental variable to determine the user intent, wherein the user intent represents a current topic of interest of a user of the user device; matching the determined user intent against an applications index to find at least a category of interest that best matches the determined user intent; selecting an application associated with the matching category of interest; and causing a launch of the selected application on the user device.
    Type: Application
    Filed: December 26, 2014
    Publication date: April 23, 2015
    Applicant: Doat Media Ltd.
    Inventors: Joey Joseph Simhon, Amir Taichman, Avi Charkam
  • Publication number: 20150113019
    Abstract: Methods and apparatus related to obtaining access-restricted search related structured data. Stored access-restricted search related structured data may be obtained in response to an authorized informational query request. An access-restricted data key corresponding to the informational query request may be compared with a database data access key in a database that includes the access-restricted search related structured data to determine whether access to such data is allowed. Search results that include and/or are based on access-restricted search related structured data may also be obtained.
    Type: Application
    Filed: September 18, 2012
    Publication date: April 23, 2015
    Applicant: Google Inc.
    Inventors: Rui Jiang, Hui Xu
  • Patent number: 9015206
    Abstract: The present invention provides a general solution to presenting media interface and navigation tools for content provided from a plurality of sources. The invention maintains a user at a single site regardless of the source of the media content. This permits a consistent interface to be presented to the user. Because the user remains at the same site, differences in tiered membership may be tracked so that the user is only presented with content that the user is permitted to view. The invention uses a metadata language to characterize content so that viewer type, membership level, and other information can be maintained and used for an enjoyable viewing experience.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: April 21, 2015
    Assignee: Yahoo! Inc.
    Inventors: Andrew R. Volk, Ronald Jacoby
  • Patent number: 9015199
    Abstract: A method and an apparatus to request web pages and content rating information thereof have been disclosed. In one embodiment, the method includes receiving a request from a user for a web page, retrieving content rating of the web page in response to the request, and fetching the web page substantially simultaneously with the retrieving of the content rating in response to the request. Other embodiments have been claimed and described.
    Type: Grant
    Filed: September 6, 2011
    Date of Patent: April 21, 2015
    Assignee: SonicWALL, Inc.
    Inventors: John E. Gmuender, Alex M. Dubrovsky, Nikolay V. Popov, Alexander Shor, Roman Yanovsky, Shunhui Zhu, Boris Yanovsky
  • Publication number: 20150106356
    Abstract: Technologies are generally described to develop and implement a searchable knowledge source to identify distributed user interface (DUI) elements. In some examples, a DUI identification system may receive a control record of an application and populate one or more searchable knowledge sources based on an application description retrieved. The application description may include keywords, input elements, and output elements, and the searchable knowledge sources may be generated from control records of a multitude of applications. The DUI identification system may execute a query on the searchable knowledge sources based on the received keywords, input elements, and output elements associated with a target workflow from a requesting client. A query result that includes one or more DUI elements may be provided to the requesting client. The DUI elements may connect the input elements to corresponding output elements and match the keywords associated with the target workflow.
    Type: Application
    Filed: October 2, 2013
    Publication date: April 16, 2015
    Inventor: Ezekiel Kruglick
  • Publication number: 20150106357
    Abstract: Web crawling configuration includes: obtaining a webpage comprising a plurality of nodes; receiving a user selection of a node in the webpage; presenting a set of web crawling configuration options pertaining to a web crawling action to be performed with respect to the node, the set of web crawling configuration options depending at least in part on a type of an element included in the node and comprising: a first option to perform a first web crawling action in the event that the node includes a first type of the element; and a second option to perform a second web crawling action in the event that the node includes a second type of the element; receiving a user input specifying the web crawling configuration option; and storing the user-specified web crawling configuration option, performing the web crawling action on the node according to the user input, or both.
    Type: Application
    Filed: October 24, 2014
    Publication date: April 16, 2015
    Inventors: Yiming Sun, Qi Qiang, Boyang Cai, Xiaojun Jin, Zongyuan Wu
  • Patent number: 9009130
    Abstract: An affinity server estimates an affinity between two different time based media events (e.g., TV, radio, social media content stream), between a time based media event and a specific topic, or between two different topics, where the affinity score represents an intersection between the populations of social media users who have authored social media content items regarding the two different events and/or topics. The affinity score represents an estimation of the real world affinity between the real world population of people who have an interest in both time based media events, both topics, or in a time based media event and a topic. One possible threshold for including a social media user in a population may be based on a confidence score that indicates the confidence that one or more social media content items authored by the social media user are relevant to the topic or event in question.
    Type: Grant
    Filed: October 30, 2013
    Date of Patent: April 14, 2015
    Assignee: Bluefin Labs, Inc.
    Inventors: Michael Ben Fleischman, Deb Kumar Roy, Jeremy Rishel, Anjali Midha, Matthew Miller
  • Publication number: 20150100563
    Abstract: Systems and methods for implementing changes to a website without losing the indexing status and accumulated SEO metrics for web pages of the website may include creating a page mapping table that associates old web page URLs with new web page URLs. Old web page URLs may be obtained by crawling the website or by searching the indexing cache of one or more search engines. The old web page URLs are saved as source paths in the table. New web page URLs may be manually associated with the source paths as destination paths in the table, or the destination paths may be automatically obtained. A web server or a reverse proxy server uses the page mapping table to send 301 redirects to devices that request the old web pages. Usage data of the new web page may be collected and analyzed to determine if an automatically identified destination path is correct.
    Type: Application
    Filed: October 9, 2013
    Publication date: April 9, 2015
    Inventor: Guy Ellis
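The page mapping table above amounts to a lookup from source path to destination path. The sketch below assumes an in-memory dictionary and a hypothetical `redirect_for` helper that a web server or reverse proxy could call before serving a request; the paths are made up.

```python
# Hypothetical page mapping table: source path -> destination path.
PAGE_MAP = {
    "/old/about.html": "/about",
    "/old/products.php": "/products",
}


def redirect_for(path, page_map=PAGE_MAP):
    """Return (status, headers) for a mapped old path, else None."""
    destination = page_map.get(path)
    if destination is None:
        return None  # not an old URL: serve the request normally
    # 301 marks the move as permanent, so search engines can transfer
    # the old page's indexing status to the new URL.
    return 301, {"Location": destination}
```

For example, `redirect_for("/old/about.html")` yields a 301 response pointing at `/about`, while an unmapped path returns `None`.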
  • Publication number: 20150100564
    Abstract: System, method, and computer program product to perform an operation to obfuscate search queries via broadened subqueries and recombining, by referencing an ontology to identify a set of generalized terms corresponding to at least one term of a received query, generating a plurality of subqueries based on the received query and the set of generalized terms, executing each of the plurality of subqueries to retrieve a result set for each respective subquery, and filtering the result sets using the received query to produce a result set responsive to the received query.
    Type: Application
    Filed: December 12, 2014
    Publication date: April 9, 2015
    Applicant: International Business Machines Corporation
    Inventors: Adam T. Clark, Brian J. Cragun, John E. Petri
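The broaden, execute, filter sequence above can be sketched as follows. The ontology contents, the toy document set, and the substring-based `search` function are assumptions standing in for a real ontology and a remote search service.

```python
# Hypothetical ontology: specific term -> broader terms.
ONTOLOGY = {"melanoma": ["cancer", "skin disease"]}

DOCUMENTS = [
    "Melanoma is a cancer of the skin; treatment depends on stage.",
    "Lung cancer treatment guidelines.",
    "Eczema is a skin disease with topical treatment.",
]


def search(query, documents=DOCUMENTS):
    """Stand-in for the remote engine: all query words must appear."""
    words = query.lower().split()
    return [d for d in documents if all(w in d.lower() for w in words)]


def obfuscated_search(query, ontology, search_fn):
    terms = query.lower().split()
    # Build broadened subqueries; the specific term is never sent out.
    subqueries = []
    for i, term in enumerate(terms):
        for general in ontology.get(term, []):
            subqueries.append(" ".join(terms[:i] + [general] + terms[i + 1:]))
    # Execute each subquery and merge the result sets without duplicates.
    merged = []
    for subquery in subqueries:
        for doc in search_fn(subquery):
            if doc not in merged:
                merged.append(doc)
    # Filter locally with the original query.
    return [d for d in merged if all(t in d.lower() for t in terms)]
```

With the sample data, the query "melanoma treatment" is sent out only as "cancer treatment" and "skin disease treatment", and the local filter then narrows the merged results to the melanoma document.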
  • Patent number: 9002819
    Abstract: Systems and methods for scheduling documents for crawling are disclosed. In some implementations, a method includes obtaining sitemap information for a plurality of websites; and analyzing the sitemap information to identify a website, in the plurality of websites. The website has sitemap information that is at least potentially out of date. The method also includes updating the sitemap information for the identified website by downloading updated sitemap information for the identified website; and scheduling documents for crawling in accordance with the updated sitemap information for the identified website.
    Type: Grant
    Filed: April 8, 2013
    Date of Patent: April 7, 2015
    Assignee: Google Inc.
    Inventors: Sascha B. Brawer, Maximilian Ibel, Ralph Michael Keller, Narayanan Shivakumar
  • Patent number: 9002887
    Abstract: An external traffic advertisement system is provided that generates advertisement sets based on analysis of visits to a web site that were referred by an external source. The advertisement system aggregates the referral information for each referral type. A referral type may be defined by one or more of keyword text derived from the query text of the referrals, landing page type, external source, product identifier, and so on. The advertisement system may, for each referral type, aggregate the total revenue from the visits of that referral type and may generate a count of the number of converting visits for that referral type. The advertisement system then identifies those referral types whose aggregated information satisfies an advertisement criterion and generates an advertisement set for each identified referral type with a keyword derived from keyword text and with a link based on the landing page type of the referral type.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: April 7, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Eric Alfred Herrmann, Stephan G. Betz, Joel Andrew Shapiro
  • Patent number: 9002818
    Abstract: A method for calculating a content subset can include crawling a number of webpages for content, determining a relevance to a particular domain of the content, determining a penalty value for each of the number of webpages; and calculating, utilizing a data tree-based model, a subset of the content to analyze based on the relevance and the penalty value.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: April 7, 2015
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Mehmet Kivanc Ozonat, Claudio Bartolini
  • Publication number: 20150095305
    Abstract: Detecting multistep operations when interacting with web applications is performed by identifying a set of multiple web pages of a web application, where the web pages in the set of multiple web pages are sequentially navigable, identifying a group of multiple web page elements at the same relative location in each of the web pages in the set of multiple web pages, determining that the identified groups of web page elements are similar to each other in accordance with a predefined similarity criterion, identifying an element that is common to each identified group of web page elements, and determining that a characteristic of the element is uniquely varied in each of the identified groups of web page elements.
    Type: Application
    Filed: September 30, 2013
    Publication date: April 2, 2015
    Applicant: International Business Machines Corporation
    Inventor: Omer Tripp
  • Publication number: 20150095304
    Abstract: Crawling computer-based objects is implemented by identifying a dependency between a first portion of a computer-based object set and a second portion of the computer-based object set, where the second portion is data-dependent on the first portion, and responsive to identifying the dependency, effecting a crawling of the first portion and thereafter a crawling of the second portion.
    Type: Application
    Filed: September 30, 2013
    Publication date: April 2, 2015
    Applicant: International Business Machines Corporation
    Inventors: Shahar Sperling, Omer Tripp
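Crawling a portion only after the portion it depends on is a topological ordering problem. The sketch below assumes the dependencies have already been identified and uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+); the portion names are hypothetical.

```python
from graphlib import TopologicalSorter


def crawl_order(dependencies):
    """dependencies maps each portion to the portions it depends on.

    Returns a list in which every portion appears after all of the
    portions it is data-dependent on.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical example: the confirmation page needs data produced by the
# checkout form, which in turn needs data produced by the cart.
order = crawl_order({
    "order_confirmation": {"checkout_form"},
    "checkout_form": {"cart"},
})
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies are circular, which a crawler would need to handle separately.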
  • Patent number: 8996507
    Abstract: A computer-implemented method includes receiving a search query from a remote device, determining if the search query includes location-related information, and if the search query includes location-related information, generating a first result set based on the search query and the location-related information, and if the search query does not include location-related information, determining whether a location indicator is associated with the remote device, and if a location indicator is associated with the remote device, generating a second result set based on the search query and the location indicator.
    Type: Grant
    Filed: August 22, 2011
    Date of Patent: March 31, 2015
    Assignee: Google Inc.
    Inventors: Leland Rechis, Scott Jenson, Yael Shacham
  • Patent number: 8996497
    Abstract: User queries are received, with each query requesting a service from a server. Overlapping experiments are performed on at least a portion of the queries, with each experiment modifying one or more parameters associated with the queries or parameters associated with processing of the queries, and with the experiments organized into layers. Two or more experiments in different layers are allowed to be performed on the same query, and for any given layer, at most one experiment is allowed to be performed on the same query.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: March 31, 2015
    Assignee: Google Inc.
    Inventors: Ashish Agarwal, Eric Bauer Arbanovella, Diane Lambert, Ilia Mirkin, Michael M. Meyer, James A. Morrison, Daryl Pregibon, Susan Shannon, Diane L. Tang
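The layering rule above (at most one experiment per layer, but several layers per query) can be sketched with hash-based bucketing. The layer names, experiment names, bucket ranges, and the use of SHA-256 are assumptions for illustration.

```python
import hashlib

# Hypothetical layout: each layer lists (experiment, bucket range) pairs.
# Ranges within a layer do not overlap, so a query falls into at most one.
LAYERS = {
    "ui": [("blue_links", range(0, 10)), ("big_font", range(10, 20))],
    "ranking": [("new_scorer", range(0, 50))],
}


def bucket(query_id, layer, buckets=100):
    """Hash the query id together with the layer name into a bucket."""
    digest = hashlib.sha256(f"{layer}:{query_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign(query_id, layers=LAYERS):
    """Return {layer: experiment} for every layer that claims this query."""
    assignment = {}
    for layer, experiments in layers.items():
        b = bucket(query_id, layer)
        for name, bucket_range in experiments:
            if b in bucket_range:
                assignment[layer] = name
                break
    return assignment
```

Including the layer name in the hash input makes bucket assignments in different layers effectively independent, so experiments in separate layers can overlap on the same query without being correlated.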
  • Patent number: 8990345
    Abstract: A method for migrating information, and a migrator for migrating information, are disclosed. The method may include extracting organizational information from at least two service providers, accessing a first at least one of the at least two service providers upon selection of a migration selection interface by the user, receiving of a first plurality of information related to the user from one of the service providers, accessing a second at least one of the at least two service providers, and writing the first plurality of information to the second at least one of the at least two service providers.
    Type: Grant
    Filed: February 3, 2014
    Date of Patent: March 24, 2015
    Assignee: LinkedIn Corporation
    Inventors: Tomy K. Isaac, Mark Kasiraja
  • Patent number: 8990183
    Abstract: The deep application crawling technique described herein crawls one or more applications, commonly referred to as “apps”, in order to extract information inside of them. This can involve crawling and extracting static data that are embedded within apps or resource files that are associated with the apps. The technique can also crawl and extract dynamic data that apps download from the Internet or display to the user on demand, in order to extract data. This extracted static and/or dynamic data can then be used by another application or an engine to perform various functions. For example, the technique can use the extracted data to provide search results in response to a user query entered into a search engine. Alternately, the extracted static and/or dynamic data can be used by an advertisement engine to select application-specific advertisements. Or the data can be used by a recommendation engine to make recommendations for goods/services.
    Type: Grant
    Filed: June 6, 2012
    Date of Patent: March 24, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jie Liu, Suman Kumar Nath, Jitendra D. Padhye, Lenin Ravindranath Sivalingam
  • Patent number: 8990357
    Abstract: A proxy server receives a request for a web page from a client device. In response to determining that a portion of the web page is available in cache, the proxy server retrieves that portion and transmits it to the client device. The portion of the web page is not the entire web page and is a prediction of the portion of the page that will remain static if the page is reloaded or requested by a different client device. The proxy server transmits a request to an origin server for the full web page. In response to receiving the full web page from the origin server, the proxy server modifies the full web page to remove the portion that was already transmitted to the client device, and transmits the modified web page to the client device.
    Type: Grant
    Filed: July 29, 2013
    Date of Patent: March 24, 2015
    Assignee: Cloudflare, Inc.
    Inventors: John Graham-Cumming, Andrew Galloni, Albertus Strasheim
  • Publication number: 20150081664
    Abstract: Determining a video audience is disclosed, including: identifying a set of videos based at least in part on a received criterion; querying a video database to retrieve engagements associated with each of at least a subset of the set of videos; identifying a set of audience members associated with the engagements associated with each of the at least subset of the set of videos; and querying a user database to gather events associated with each of at least a subset of the set of audience members.
    Type: Application
    Filed: November 25, 2014
    Publication date: March 19, 2015
    Inventors: Robert L. Gabel, David A. Koblas, Allison J. Stern
  • Patent number: 8983933
    Abstract: Disclosed herein are techniques for measuring or assessing the costs of executing operations across a plurality of computing systems. The cost of transferring data across at least one arrangement of computing systems is determined. The cost of executing at least one arrangement of the operations is also determined.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: March 17, 2015
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: William K. Wilkinson, Alkiviadis Simitsis
  • Publication number: 20150074078
    Abstract: Embodiments are directed to establishing a metadata repository that aggregates metadata for a plurality of data sources, inferring data source metadata at a metadata repository and to providing recommendations to data managers based on aggregated inputs. In one scenario, a computer system establishes a reference to one or more data sources, where each data source includes data elements. The computer system receives a data request for specified data elements stored on the data sources and accesses the established references to determine which data source the specified data elements are stored on. The computer system then retrieves at least one of the specified data elements from its determined data source and sends the retrieved data elements to a specified computer system, along with an indication of additional data elements that are relevant to the received data request, and a further indication of how those additional data elements are to be accessed.
    Type: Application
    Filed: November 18, 2013
    Publication date: March 12, 2015
    Applicant: Microsoft Corporation
    Inventors: Matthew Roche, Christian Liensberger, Ziv Kasperski, Stéphane Nyombayire
  • Patent number: 8977606
    Abstract: A method and apparatus for generating an extended page snippet in a search engine. The method includes: retrieving and returning an associated table webpage having a table related to an inquired keyword; obtaining a parsed result of the table in the associated table webpage, and extracting column names and respective row instances on the basis of the parsed result; determining the row instances related to the inquired keyword; and generating a page snippet in a table style in accordance with the column names and the relative row instances. The page snippet in the table style can be generated by using a solution of the present invention.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: March 10, 2015
    Assignee: International Business Machines Corporation
    Inventors: Sheng Hua Bao, Jian Chen, Zhong Su, Xin Ying Yang, Xiang Zhou
  • Publication number: 20150066893
    Abstract: Methods and systems for tracking end users who submit reviews are provided. In some embodiments, reviews are submitted by end users via a reviewing application that reports review submission to a tracking system. In some embodiments, reviews are reported to the tracking system by review web sites that receive the reviews. In some embodiments, the tracking system uses a web crawler to retrieve review information from review web sites. User click records are used to attribute user acquisition to ad providers, and an amount of a reward granted for acquiring a given user may be altered based on records of reviews submitted by the given user.
    Type: Application
    Filed: August 29, 2014
    Publication date: March 5, 2015
    Inventor: Niek Sanders
  • Publication number: 20150066895
    Abstract: Provided are systems and methods for building a domain-specific facts network. A system includes an optical character recognition (OCR) system configured to perform OCR on an image of a domain-specific document. The system also includes an OCR results analysis system configured to analyze the results of OCR of the domain-specific document. The system also includes a fact extraction system configured to extract data from the domain-specific document based on the analysis of the results of the OCR. The system also includes a web fact extraction system configured to extract data from the Internet; wherein the data is related to the data in the domain-specific document. The system also includes a validation system configured to validate data extracted from the domain-specific document and the Internet. The validated data is stored in a domain-specific facts network.
    Type: Application
    Filed: November 17, 2014
    Publication date: March 5, 2015
    Applicant: GLENBROOK NETWORKS
    Inventors: Julia Komissarchik, Edward Komissarchik
  • Publication number: 20150066894
    Abstract: Automatically creating and modifying a search engine for a website. User input may be received specifying an address of a website. A search engine may be automatically created for the website based on the user input. Webpages of the website may specify a plurality of tags specifying custom attributes of the webpages. During creation of the search engine, these custom attributes may be incorporated into the search engine index. Additional user input may be received customizing the search engine for various search engine contexts, e.g., based on the custom attributes of the webpages. Search engine results for the website may be based on various ranking functions, potentially including social impact of webpages of the website.
    Type: Application
    Filed: October 30, 2014
    Publication date: March 5, 2015
    Inventors: Matthew T. Riley, Quinlan J. Hoxie
  • Patent number: 8972375
    Abstract: A system for searching files stored in a closed file source that is not accessible via a web crawler obtains file identifiers for files stored in the file source and creates a unique URL for each of the identifiers. Each URL may be based on a file identifier and a domain portion of a URL associated with the system. The system may provide the unique URLs to a search engine. The system may respond to a crawl request from the search engine for a particular URL by converting the URL back into a file identifier, obtaining the contents of the file, creating an HTTP response from the contents of the file, and returning the response to the search engine. The system may respond to a request for a seed URL with a plurality of URLs as links in a single HTTP response.
    Type: Grant
    Filed: December 20, 2012
    Date of Patent: March 3, 2015
    Assignee: Google Inc.
    Inventors: Pawel Opalinski, Brandon Player Iles, Eric Jon Anderson, John Felton
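    The URL round trip described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the domain `files.example.com`, the `/doc/` path prefix, and the function names are all assumptions made for the example.

```python
import html
from urllib.parse import quote, unquote, urlsplit

# Hypothetical domain portion associated with the system (an assumption).
BASE = "https://files.example.com/doc/"


def id_to_url(file_id):
    """Build a unique, crawlable URL from a file identifier."""
    # safe="" percent-encodes "/" too, so the identifier stays one path segment.
    return BASE + quote(file_id, safe="")


def url_to_id(url):
    """Convert a crawled URL back into the original file identifier."""
    path = urlsplit(url).path
    prefix = urlsplit(BASE).path
    if not path.startswith(prefix):
        raise ValueError("URL does not belong to this file source: " + url)
    return unquote(path[len(prefix):])


def seed_page(file_ids):
    """Return one HTML body that lists every file URL as a link."""
    links = "\n".join(
        '<a href="{}">{}</a>'.format(id_to_url(f), html.escape(f))
        for f in file_ids
    )
    return "<html><body>\n" + links + "\n</body></html>"
```

    A crawl request for a URL produced by `id_to_url` would be answered by calling `url_to_id`, reading the file, and wrapping its contents in an HTTP response; that last step is omitted here.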
  • Patent number: 8972434
    Abstract: The present invention provides a methodology and system for efficiently performing travel reservation queries and presenting significant search results to a user. A travel reservation search engine constructs a first query from one or more constraints. The first query has a threshold probability of returning a first set of search results that will lead to the purchase of a travel reservation. Additionally, if the search engine determines it to be necessary, a second query is constructed from one or more constraints. The second query returns a second set of search results.
    Type: Grant
    Filed: December 5, 2007
    Date of Patent: March 3, 2015
    Assignee: Kayak Software Corporation
    Inventors: Paul M. English, Travis M. Gebhardt, Kristin P. Harkness, Lincoln D. Jackson, Jeffrey A. Rago, Paul D. Schwenk, Brenda L. White
  • Patent number: 8972374
    Abstract: A system is provided which solves content acquisition issues by providing an automated method to acquire content en masse and maintain an association between available meta-data and the actual content, e.g., a video file. The system includes a first component configured to log network traffic. The system also includes a second component configured to correlate downloaded content of the logged network traffic with an XML stream of URLs and respective content descriptions.
    Type: Grant
    Filed: February 12, 2008
    Date of Patent: March 3, 2015
    Assignee: International Business Machines Corporation
    Inventors: Jesse L. Benson, Stephen E. Jaffe, John R. Smith, Matthew B. Trevathan
  • Patent number: 8965865
    Abstract: A method is provided for identifying documents that include a searchable form relevant to a topic. A document is received. A determination is made concerning whether or not the received document comprises a form. A form includes a field presented to a user requesting information from the user. If the received document is determined to comprise a form, a determination is made concerning whether or not the form is a searchable form. A searchable form returns non-trivial information to a requester in response to a submission of the form. If the form is determined to be a searchable form, a determination is made concerning whether or not the form is relevant to an identified topic. If the form is determined to be relevant to the identified topic, the document is identified as a searchable form relevant to the identified topic.
    Type: Grant
    Filed: February 15, 2008
    Date of Patent: February 24, 2015
    Assignee: The University of Utah Research Foundation
    Inventors: Juliana Freire, Luciano Barbosa
  • Patent number: 8954416
    Abstract: A computer-implemented method is provided for searching for files on the Internet. In one embodiment, the method may provide an application crawler that assembles and dynamically instantiates all components of a web page. The instantiated web application may then be analyzed to locate desired components on the web page. This may involve finding and analyzing all clickable items in the application, driving the web application by injecting events, and extracting information from the application and writing it to a file or database.
    Type: Grant
    Filed: March 18, 2009
    Date of Patent: February 10, 2015
    Assignee: Facebook, Inc.
    Inventors: Timothy D. Tuttle, Adam L. Beguelin, Peter F. Kocks
  • Publication number: 20150039584
    Abstract: A determination is made that each of at least two social network contacts involved in a social messaging interaction initiate a separate web search associated with the social messaging interaction. A separate set of web search results returned to each of the at least two social network contacts is captured in association with each initiated separate web search. A combined live search results view that includes each captured separate set of web search results is provided to each of the at least two social network contacts. The combined live search results view provides navigation to web content returned to other social network contacts.
    Type: Application
    Filed: August 7, 2013
    Publication date: February 5, 2015
    Applicant: International Business Machines Corporation
    Inventors: Paul R. Bastide, Lisa Seacat DeLuca, Lydia M. Do
  • Patent number: 8949216
    Abstract: A computer receives a search request, wherein the search request contains one or more parameters that allow a search to be performed. Responsive to the search request, the computer identifies a plurality of web pages connected by a plurality of links. The computer determines the number of links in the longest path that connects at least a portion of the plurality of web pages, wherein the longest path includes a sequence of at least two web pages of the plurality of web pages connected by a link of the plurality of links. The computer determines the number of links included in a web page of the plurality of web pages.
    Type: Grant
    Filed: December 7, 2012
    Date of Patent: February 3, 2015
    Assignee: International Business Machines Corporation
    Inventors: Gary D. Cudak, Lydia M. Do, Christopher J. Hardee, Adam Roberts
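    The two link counts named in this abstract can be illustrated with a small sketch. It assumes the link graph is acyclic and is given as an adjacency mapping; the abstract specifies neither, and the function names are hypothetical.

```python
from functools import lru_cache


def longest_path_links(links):
    """Number of links on the longest path in an acyclic link graph.

    `links` maps each page to the list of pages it links to.
    Assumption: the graph has no cycles; a cycle would recurse forever.
    """

    @lru_cache(maxsize=None)
    def depth(page):
        # Longest chain of links that starts at `page`.
        return max((1 + depth(nxt) for nxt in links.get(page, [])), default=0)

    return max((depth(page) for page in links), default=0)


def links_on_page(links, page):
    """Number of outgoing links included in a single page."""
    return len(links.get(page, []))
```

    For a site where `a` links to `b` and `c`, `b` links to `c`, and `c` links to `d`, the longest path is a, b, c, d, which contains three links.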
  • Publication number: 20150032717
    Abstract: A method and apparatus for utilizing user behavior to immediately modify sets of search results so that the most relevant documents are moved to the top. In one embodiment of the invention, behavior data, which can come from virtually any activity, is used to infer the user's intent. The updated inferred implicit user model is then exploited immediately by re-ranking the set of matched documents and advertisements to best reflect the information need of the user. The system updates the user model and immediately re-ranks documents and advertisements at every opportunity in order to constantly provide the most optimal results. In another embodiment, the system determines, based on the similarity of results sets, if the current query belongs in the same information session as one or more previous queries. If so, the current query is expanded with additional keywords in order to improve the targeting of the results.
    Type: Application
    Filed: October 10, 2014
    Publication date: January 29, 2015
    Inventors: Mark Cramer, Cheng Xiang Zhai, Xuehua Shen, Bin Tan
  • Patent number: 8943039
    Abstract: A system and method for modifying a parameter of a website in order to optimize an organic listing of the website at one or more search engines is described. Several embodiments include methods and systems for generating scored representations based upon different portions of data associated with a website, and then combining the scored representations to achieve a result. The result indicates a feature of the website that may be modified in order to optimize the organic ranking of the website at one or more search engines.
    Type: Grant
    Filed: November 2, 2012
    Date of Patent: January 27, 2015
    Assignee: RioSoft Holdings, Inc.
    Inventors: Ray Grieselhuber, Brian Bartell, Dema Zlotin, Russ Mann, Pete Dudchenko, Patrick Hall
  • Patent number: 8938441
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for classifying web pages as gallery web pages, and for presenting search results for gallery web pages. In one aspect, a method includes receiving a web page that includes text and one or more images, evaluating one or more characteristics of the web page against predefined criteria, generating a score for the web page based on evaluating the characteristics of the web page against the predefined criteria, and classifying the web page as a gallery web page or as not a gallery web page when the score meets or does not meet a predefined threshold, respectively.
    Type: Grant
    Filed: October 28, 2011
    Date of Patent: January 20, 2015
    Assignee: Google Inc.
    Inventors: Yuguo Liao, Ning Wang
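    The score-and-threshold classification in this abstract might look like the sketch below. The two criteria (image count and words per image), their weights, and the 0.75 threshold are invented for illustration; the abstract does not state which characteristics or values are used.

```python
from dataclasses import dataclass


@dataclass
class Page:
    text: str
    image_count: int


def gallery_score(page):
    """Score a page against two hypothetical gallery criteria."""
    words = len(page.text.split())
    score = 0.0
    # Criterion 1 (assumed): a gallery shows several images.
    if page.image_count >= 5:
        score += 0.5
    # Criterion 2 (assumed): little text accompanies each image.
    if page.image_count > 0 and words / page.image_count <= 20:
        score += 0.5
    return score


def is_gallery(page, threshold=0.75):
    """Classify as a gallery page when the score meets the threshold."""
    return gallery_score(page) >= threshold
```

    A page with ten images and a few dozen words of captions meets both criteria and is classified as a gallery; a long article with a single image meets neither.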
  • Patent number: 8935234
    Abstract: A method, system, and computer program product for relational database management. The method constructs a referentially-complete target subset database from a source database by first estimating the expected size of the target subset database based on application of one or more subsetting rules. If the estimated size needs reduction, the user can modify the subsetting rules, and then modules are invoked to receive the modified subset rules. The method continues by generating a subsetting execution plan by applying the user-modified subset rules to the source database, and then modules process the generated execution plan, which processing results in storage of a referentially-complete target subset database. The user can influence the construction of the execution plan by suggesting an execution model to use during processing of the subsetting execution plan.
    Type: Grant
    Filed: September 4, 2012
    Date of Patent: January 13, 2015
    Assignee: Oracle International Corporation
    Inventors: Ravi Pattabhi, Balasubrahmanyam Kuchibhotla