Web Crawlers Patents (Class 707/709)
-
Patent number: 8930305
Abstract: An adaptive information processing system for updating product documentation and associated knowledge base is disclosed, the system including at least one subsystem for receiving original data from a data source, and a central dynamic data system to integrate the original data from the at least one subsystem. The central dynamic data system is configured to integrate system knowledge with the original data to form integrated data, wherein the central dynamic data system is configured to dynamically update the product documentation and the knowledge base based on the integrated data. A computer implemented method for dynamically updating product documentation and knowledge base is further disclosed, the method includes receiving original data from a data source, and integrating the knowledge base with the original data from the data source to form integrated data.
Type: Grant
Filed: November 16, 2009
Date of Patent: January 6, 2015
Assignee: Toyota Motor Engineering & Manufacturing North America, Inc.
Inventors: Setu Madhavi Namburu, Danil Prokhorov, Liu Qiao, Sandesh Ghimire
-
Patent number: 8930343
Abstract: Provided is a system and method for collecting a document. The system may include an identification information receiver to receive, from a host of a site, identification information of a document of which an update may occur, a collection request transfer unit to transmit a collection request for the document based on the identification information, an update information collector to receive update information of the document from the host, and a search result provider to provide, to the host, a search result extracted from the update information of the document, in response to the search request being received from the host. The system for collecting the document may reduce load of a web site, and may improve accuracy of the document to be collected.
Type: Grant
Filed: June 21, 2011
Date of Patent: January 6, 2015
Assignee: NHN Corporation
Inventors: Young Su Ko, Seung Yeop Han, Jung Woo Seo
-
Publication number: 20140365459
Abstract: Some embodiments of the invention provide an address harvester that harvests addresses from one or more applications executing on a device. Some embodiments use the harvested addresses to facilitate the operation of one or more applications executing on the device. Alternatively, or conjunctively, some embodiments use the harvested addresses to facilitate the operation of one or more applications executing on another device than the one used for harvesting the addresses. In some embodiments, a prediction system uses the harvested addresses to formulate predictions, which it then provides to the same set of applications from which it harvested the addresses in some embodiments.
Type: Application
Filed: November 15, 2013
Publication date: December 11, 2014
Applicant: Apple Inc.
Inventors: Ashley B. Clark, James Magahem, Jorge Fino, Scott Hertz, Emanuele Vulcano
-
Patent number: 8909632
Abstract: A method, system and computer-usable medium are disclosed for maintaining persistent links to information stored on a network. Information elements are tagged and their original network location is saved as a hyperlink. The tagged information elements are then acquired at the original network location by a search engine crawler, indexed by a search engine, and stored in an information location index. The tagged information elements are periodically submitted to the search engine to generate search results. Comparison operations are performed to determine the search results comprising the closest-matching information elements and their current network location. The network location stored in the hyperlink is replaced with the current network location if it is not the same.
Type: Grant
Filed: October 17, 2007
Date of Patent: December 9, 2014
Assignee: International Business Machines Corporation
Inventors: Saurabh Shukla, Mandar U. Jog, Shreyansh Shukla, Scott W. Newman
-
Patent number: 8908996
Abstract: An automated and extensible system is provided for the analysis and retrieval of images based on region-of-interest (ROI) analysis of one or more true objects depicted by an image. The system uses an ROI database that is a relational or analytical database containing searchable vectors representing images stored in a repository. Entries in the ROI database are created by an image locator and ROI classifier that locate images within the repository and extract relevant information to be stored in the ROI database. The ROI classifier analyzes objects in an image to arrive at actual features of the true object. Graphical searches may also be performed.
Type: Grant
Filed: January 31, 2012
Date of Patent: December 9, 2014
Assignee: Google Inc.
Inventors: Jamie E. Retterath, Robert A. Laumeyer
-
Patent number: 8908997
Abstract: The present invention is an automated and extensible system for the analysis and retrieval of images based on region-of-interest (ROI) analysis of one or more true objects depicted by an image. The system uses an ROI database that is a relational or analytical database containing searchable vectors that represent the images stored in a repository. Entries in the database are created by an image locator and ROI classifier that work to locate images within the repository and extract relevant information that will be stored in the ROI database. The ROI classifier analyzes objects in an image to identify actual features of the true object. Graphical searches are performed by the collaborative workings of an image retrieval module, an image search requestor and an ROI query module. The image search requestor is an abstraction layer that translates user or agent search requests into the language understood by the ROI query.
Type: Grant
Filed: May 29, 2014
Date of Patent: December 9, 2014
Assignee: Google Inc.
Inventors: Jamie E. Retterath, Robert A. Laumeyer
-
Patent number: 8909617
Abstract: A method, apparatus, system, article of manufacture, and computer readable storage medium provide media content. A web page context for a web page is determined and stored in a database. One or more media content files are analyzed to extract information that is stored in the database. The information is compared to the web page context. A matching media content file is determined from the one of the one or more media content files that matches the web page context based on the comparison. The matching media content file is then provided (e.g., to an internet portal web site).
Type: Grant
Filed: January 26, 2011
Date of Patent: December 9, 2014
Assignee: Hulu, LLC
Inventor: Dong Wang
-
Publication number: 20140358887
Abstract: A search service accesses application content accessible via one or more enumerated applications. The search service ranks the accessed application content in combination with non-application content to produce a combined ranking. Responsive to a search query, the search service provides one or more search results based on the combined ranking.
Type: Application
Filed: August 28, 2013
Publication date: December 4, 2014
Applicant: Microsoft Corporation
Inventors: Max Glenn Morris, Robert Emmett Kolba, JR., Yi Li, Kang Li, Tyler Beam, Kyle Beck, Rylan Hawkins, Daniel Oliver, Sandy Wong, Shajib Sadhukha
-
Publication number: 20140358888
Abstract: The present invention provides the capability to quickly and easily determine the online reputation of a target, and to quickly and easily take steps to improve the online reputation of the target. For example, a method of monitoring and affecting online reputation may comprise gathering information potentially related to an online reputation of a target, filtering the gathered information to eliminate information not related to the target, computing a reputation score for the filtered information based on both positive and negative information related to the target, generating positive information relating to the target, and distributing the generated positive information relating to the target to a plurality of online locations.
Type: Application
Filed: May 28, 2014
Publication date: December 4, 2014
Inventors: Granger WHITELAW, Richard KANE, Scott Jeffrey EMRICH, Steven MENDELSON
-
Patent number: 8903199
Abstract: An automated and extensible system for analysis and retrieval of images based on region-of-interest (ROI) analysis of one or more true objects depicted by an image is provided. The system uses a database that is a relational or analytical database containing searchable vectors that represent the images stored in a repository. Entries in the database are created by an image locator and ROI classifier working together to locate images within the repository and extract relevant information to be stored in the ROI database. The ROI classifier analyzes objects in an image to arrive at actual features of the true object. Graphical searches are performed by the collaborative workings of an image retrieval module, an image search requestor and an ROI query module. The image search requestor is an abstraction layer that translates user or agent search requests into the language understood by the ROI query.
Type: Grant
Filed: February 6, 2012
Date of Patent: December 2, 2014
Assignee: Google Inc.
Inventors: Jamie E. Retterath, Robert A. Laumeyer
-
Publication number: 20140351236
Abstract: A method, apparatus, server and system for websites searching in a browser of a mobile terminal is presented. The method includes the steps of: loading one or more preconfigured website search engine information for generating a website search engine list on a browser search bar; receiving information on which website search engine has been selected from the generated website search engine list; receiving a search keyword input to the browser search bar; sending a search request to the selected website search engine to query the received search keyword; and displaying a search result returned by the selected website search engine upon a successful search.
Type: Application
Filed: July 21, 2014
Publication date: November 27, 2014
Inventor: Zhigang Zhu
-
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR CRAWLING A WEBSITE BASED ON A SCHEME OF THE WEBSITE
Publication number: 20140351235
Abstract: A system, method, and computer program product are provided for crawling a website based on a scheme of the website. In use, a difference between a first content and second content of a website is identified. Additionally, a scheme of the website is identified based on the difference. Furthermore, the website is crawled based on the scheme.
Type: Application
Filed: June 5, 2014
Publication date: November 27, 2014
Inventor: Gabriel Pack
-
Publication number: 20140351237
Abstract: Systems and methods for the creation of hierarchical networks of overlapping informational web neighborhoods using percolation crawling. Each neighborhood comprises a set of closely linked pages that share a common set of concepts and intent and purpose. The neighborhoods represent web pages that share a common set of underlying concepts and semantic associations. Each such neighborhood can be semantically tagged.
Type: Application
Filed: August 12, 2014
Publication date: November 27, 2014
Inventors: Behnam Attaran Rezaei, Alice Hwei-Yuan Meng Muntz
-
Patent number: 8898297
Abstract: An embodiment of the disclosed system provides the user of a computing device with information concerning the expected usefulness of an item, such as a hyperlink, within a network resource, such as a search result webpage, with the expected usefulness information based at least in part on an attribute of the user's computing device. For example, the system may provide the user with information identifying a particular website as poorly suited for the user's device, based on data that the system collected identifying an aggregate bounce-back rate from computing devices with a similar attribute to the user's computing device.
Type: Grant
Filed: August 17, 2012
Date of Patent: November 25, 2014
Assignee: Amazon Technologies, Inc.
Inventors: Brett R. Taylor, Ameet N. Vaswani, Faizal S. Kassamali, Ryan Tucker, Ranganath Atreya, Michael V. Zampani
-
Publication number: 20140344241
Abstract: A method for user-enhanced ranking of information objects, comprising: generating a graphical user-interface (40) on a display (13), the graphical user-interface comprising a graph (41), wherein the graph comprises a plurality of icons each representing an information object of the collection of information objects and a plurality of connectors connecting the icons, each connector representing at least one link of the collection of links, modifying the graph by generating an additional connector between the icons in response to graph modification commands received from a user-controlled interaction means, storing an additional link in the database (21) as a function of the additional connector, wherein the additional link interrelates information objects represented by the icons connected by the additional connector, computing a link-based rank for an information object of the collection of information objects as a function of the additional link and the collection of links.
Type: Application
Filed: August 21, 2012
Publication date: November 20, 2014
Applicant: Alcatel Lucent
Inventor: Dohy Hong
-
Publication number: 20140344242
Abstract: Embodiments of the disclosure may include systems, methods, and devices for providing multidimensional search results on a plurality of search planes. Such systems, methods, and devices may: (i) receive one or more search terms from one or more user interfaces of the system; (ii) perform a search of one or more informational repositories to obtain a list of search results wherein the informational repositories may include the Internet and one or more databases; (iii) process the list of search results to classify each search result in one of a plurality of categories; (iv) cause a presentation of the search results in a plurality of search planes on the display of the system such that each search plane corresponds to one of the plurality of categories. In addition, the software applications may include a sorting software application that groups the list of search results into one of a plurality of categories.
Type: Application
Filed: August 4, 2014
Publication date: November 20, 2014
Applicant: Ariel Inventions LLC
Inventor: Leigh M. Rothschild
-
Patent number: 8892541
Abstract: A new approach is proposed that contemplates systems and methods to determine temporality of a query in order to generate a search result including a list of objects that are not only based on matching of the objects to the query but also based on temporality analysis of the query. Here, the temporality of the query can be defined as the distribution over time of the objects matching the query, i.e., the chronology histogram of the query. Such distribution can be analyzed to provide a classification of the intent of the query. Classification of the intent of the query can result either in discrete classification of the query into categories, or in continuous classification of the query which may be a scalar or vector value resulting from transformations of the chronology histogram.
Type: Grant
Filed: June 15, 2011
Date of Patent: November 18, 2014
Assignee: Topsy Labs, Inc.
Inventors: Rishab Aiyer Ghosh, Thomas James Emerson, Lun Ted Cui
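To make the idea of a chronology histogram concrete, here is a minimal sketch that is not taken from the patent itself. It assumes each matching object carries a timestamp, buckets the matches by calendar day, and derives a simple discrete label. The function names, the day-sized bucket, and the 0.5 "burst" threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from datetime import datetime


def chronology_histogram(timestamps):
    """Count matching objects per calendar day (bucket size is an assumption)."""
    return Counter(ts.date() for ts in timestamps)


def classify_temporality(histogram, burst_share=0.5):
    """Label a query 'bursty' when a single day holds at least burst_share
    of all matches, otherwise 'steady'. The threshold is illustrative only."""
    total = sum(histogram.values())
    if total == 0:
        return "no-matches"
    peak = max(histogram.values())
    return "bursty" if peak / total >= burst_share else "steady"
```

A continuous classification, which the abstract also mentions, could return the ratio `peak / total` directly instead of a label.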
-
Patent number: 8892543
Abstract: System and method for indexing rendered web page images. A web crawling engine stores the content and crawl time of a web page. A scheduling engine sends the content and crawl time to a rendering engine, and processes requests for embedded objects. If a requested object has been crawled, it sends the contents to the rendering engine. Otherwise it schedules the crawl of the object, and once the object is crawled, it resends the content and crawl time of the web page to the rendering engine. The rendering engine receives the content and crawl time of a web page, requests all embedded objects, and renders the web page to an image once all embedded objects are received.
Type: Grant
Filed: September 13, 2012
Date of Patent: November 18, 2014
Assignee: Google Inc.
Inventors: Rupesh Kapoor, Erik Hendriks, Sathayanarayana Giridhar, Andrei Pascovici, Pawel Aleksander Fedorynski
-
Publication number: 20140337309
Abstract: Embodiments relate to systems and methods employing personalized query expansion to suggest measures and dimensions allowing iterative building of consistent queries over a data warehouse. Embodiments may leverage one or more of: semantics defined in multi-dimensional domain models, user profiles defining preferences, and collaborative usage statistics derived from existing repositories of Business Intelligence (BI) documents (e.g. dashboards, reports). Embodiments may utilize a collaborative co-occurrence value derived from profiles of users or social network information of a user.
Type: Application
Filed: June 5, 2014
Publication date: November 13, 2014
Applicant: SAP AG
Inventors: Raphael Thollot, Nicolas Kuchmann-Beauger, Corentin FollenFant
-
Patent number: 8886625
Abstract: Provided are methods and computer-readable media for providing recommended entities based on a user's external social graph, such as asymmetric social graph of a social networking service. In some embodiments, entities responsive to a search query or other request may be obtained. Each entity may be evaluated to determine if the entity is associated with a contact from a user's social graph. The association may include an evaluation (e.g., a rating, review, other evaluation or combination thereof) of the entity by the contact. Additionally, the contacts having associations with an entity may be ranked based on a relationship score with a user. The entities having associations with the contacts from a user's social graph may be provided as recommended entities to the user, and the association may be annotated to the recommended entity for viewing by the user.
Type: Grant
Filed: October 31, 2012
Date of Patent: November 11, 2014
Assignee: Google Inc.
Inventors: Sebastian Dorner, Mat Balez
-
Patent number: 8880498
Abstract: System and method for collecting information from a plurality of related sites, analyzing the information and storing the relevant information in a data base for future use. According to one embodiment of the present invention, the system uses the provided list of sites, whether obtained automatically or separately, queries them and analyzes the result retrieved from each site. The information may also optionally and preferably be ranked.
Type: Grant
Filed: September 27, 2009
Date of Patent: November 4, 2014
Assignee: Fornova Ltd.
Inventors: Michael Rubanovich, Dmitry Babitsky
-
Patent number: 8880559
Abstract: A computer system that includes a computer that couples with a database. The computer includes program code or modules to gather location and activity content from disparate sources, and through text analytics, extract associations from the content and populate the database with the associations between locations and activities. Further modules provide end user interaction through presentation of a search user interface specific to locations and activities. Additional modules provide the capability to search the database, rank the results of the search and present the results to the user.
Type: Grant
Filed: April 2, 2010
Date of Patent: November 4, 2014
Inventor: Brian Bartell
-
Publication number: 20140324818
Abstract: Systems and methods for scheduling document crawling are provided in which a list of document identifiers is obtained. Each respective document identifier identifies a corresponding document on a network. For each respective document identifier in the list of document identifiers, a content change frequency of the corresponding document is determined and a first score for the document identifier that is a function of the determined content change frequency of the corresponding document is also determined. The first score is compared against a threshold value. The document is scheduled for crawling based on the result of the comparison. The content change frequency for a respective document identifier is determined by comparing information stored for successive downloads of the document corresponding to the document identifier.
Type: Application
Filed: July 7, 2014
Publication date: October 30, 2014
Inventor: Keith H. RANDALL
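The scheduling steps in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: the "information stored for successive downloads" is assumed to be a list of content checksums per document, the score is taken to be the change frequency itself (the abstract says only that the score is a function of it), and the function names and the 0.5 threshold are hypothetical.

```python
def change_frequency(checksums):
    """Fraction of successive downloads whose stored checksum differs."""
    if len(checksums) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(checksums, checksums[1:]) if prev != cur)
    return changes / (len(checksums) - 1)


def schedule_for_crawl(history, threshold=0.5):
    """Return the document identifiers whose score meets the threshold.

    history maps a document identifier to the checksums recorded for its
    successive downloads. The score here is simply the change frequency.
    """
    return [doc_id for doc_id, sums in history.items()
            if change_frequency(sums) >= threshold]
```

For example, a document whose checksum changed on every download scores 1.0 and is scheduled, while one that never changed scores 0.0 and is skipped.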
-
Publication number: 20140324816
Abstract: A system and method is provided for internet searching infrastructures and more particularly to hosted client device status supporting the delivery of search results hosted by a client device. A registry table retains client device status information so that when a search result includes specific device hosted content, that client device's status will be known. Client device status includes sleep, offline, predicted period of availability, do-not-disturb (DnD), power availability, or busy along with other status indications.
Type: Application
Filed: May 24, 2013
Publication date: October 30, 2014
Applicant: Broadcom Corporation
Inventors: James Duane Bennett, Yasantha Nirmal Rajakarunanayake, Wael William Diab
-
Publication number: 20140324817
Abstract: A system and method is provided to distribute preprocessing of client device content. The client device performs preprocessing or alternatively transfers search accessible content to remote systems for preprocessing such as search system infrastructure, set-top boxes, other client devices, etc. Client device content is preprocessed so as to provide, for example, a preview of images available by providing thumbnails of the images, small excerpts of text or a video preview. Offloading of client device content preprocessing duties reduces web server operational requirements and subsequent power needs. Additionally, preprocessing of searchable content can be distributed across multiple content hosts and search infrastructure elements.
Type: Application
Filed: May 24, 2013
Publication date: October 30, 2014
Applicant: BROADCOM CORPORATION
Inventors: Wael William Diab, Yasantha Nirmal Rajakarunanayake, James Duane Bennett
-
Publication number: 20140324815
Abstract: A system and method for supporting searching of client device hosted content. A search infrastructure supports creation, managing and searching of client device hosted content. A client device, which hosts content, communicates its client device identification (ID), type and access restrictions to the search infrastructure. In addition, the client device communicates a global network route to the client device content as a pointer for the search engine to provide a search requestor access to both the client device and specified content. Client device information is also provided to a client device registry accessible by the search infrastructure, for example a registry maintained in a cloud based service. Client devices can enter into client device services agreement with a third party storage system for the purposes of providing a higher probability that their client device hosted content will be available.
Type: Application
Filed: May 24, 2013
Publication date: October 30, 2014
Inventors: Wael William Diab, Yasantha Nirmal Rajakarunanayake, James Duane Bennett
-
Patent number: 8874540
Abstract: A system and method for semantically classifying numerical data includes using semantic classification techniques on ‘nearby’ non-numerical data to identify a context whereby opaque data sets of numbers can be semantically classified inside of that context. An Electronic Knowledge Base is used to query against the context and determine the semantics of the opaque numeric data sets.
Type: Grant
Filed: September 7, 2011
Date of Patent: October 28, 2014
Assignee: Xerox Corporation
Inventors: Michael David Shepherd, Dale Ellen Gaucas, Kirk J. Ocke
-
Patent number: 8874544
Abstract: A system and method for exposing internal search indices to Internet search engines. The internal search indices are exposed to external search engines in such a way that the data may be segregated into at least two types including one layer of search data specifically for the search engines, and another for potential users of the application. This significantly improves the probability of discovery by search engines and also provides for presentation of discovered content to users in a manner consistent with the content itself, or consistent with the intended controls or presentations established by the content's owner. The system and method also includes one or more components that reproduce information about IP in a format that search engines can recognize and locate. The component also forwards users coming through the search engines to the actual IP graphical user interface (GUI) instead of the files that the search engine discovered.
Type: Grant
Filed: January 13, 2005
Date of Patent: October 28, 2014
Assignee: International Business Machines Corporation
Inventors: Clifton E. Grim, III, Christopher I. Schmidt, John D. Wilson
-
Patent number: 8868541
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for scheduling resource crawls. In one aspect, a framework is provided for scheduling resource crawls such that a crawl scheduler determines the health of a document, i.e., whether it can be crawled, the popularity of the document, and the frequency of “interesting,” i.e., substantive, content changes, and based on this information, estimates an appropriate crawl interval for each web resource to improve crawl resource utilization.
Type: Grant
Filed: January 21, 2011
Date of Patent: October 21, 2014
Assignee: Google Inc.
Inventors: Zhen Lin, Keith Stevens
-
Patent number: 8868540
Abstract: A flexible and extensible architecture allows for secure searching across an enterprise. Such an architecture can provide a simple Internet-like search experience to users searching secure content inside (and outside) the enterprise. The architecture allows for the crawling and searching of a variety of sources across an enterprise, regardless of whether any of these sources conform to a conventional user role model. The architecture further allows for security attributes to be submitted at query time, for example, in order to provide real-time secure access to enterprise resources. The user query also can be transformed to provide for dynamic querying that provides for a more current result list than can be obtained for static queries.
Type: Grant
Filed: February 28, 2007
Date of Patent: October 21, 2014
Assignee: Oracle International Corporation
Inventors: Mark Ture, Muralidhar Krishnaprasad, Vishu Krishnamurthy
-
Publication number: 20140310257
Abstract: A computerized system and method is presented for analyzing quotations made in a quoting document of text originally found in a source document. The quoting document and source document can be web pages publicly available on the World Wide Web. The present invention analyzes the quoting document for quoted text, searches the source document for that text, and stores the existence of the quotation in association with the source document. When displaying the source document, quoted text is highlighted. A link is provided between items of quoted text and a list of documents that have quoted that text. From this list the full text of a quoting document may be displayed.
Type: Application
Filed: June 27, 2014
Publication date: October 16, 2014
Applicant: GERONIMO DEVELOPMENT CORPORATION
Inventor: Orin Russell Armstrong
-
Patent number: 8862579
Abstract: Systems and methods for search and search optimization using a pattern in a location identifier is disclosed. In one aspect, embodiments of the present disclosure include a method, which may be implemented on a system, of search and search optimization. The method includes, detecting a set of location identifiers that have a pattern that matches a specified pattern and identifying a set of search results as having content related to the semantic type. The specified pattern can be stored in a computer-readable storage medium and corresponds to a semantic type. The set of search results can include objects associated with the set of location identifiers having the specified pattern.
Type: Grant
Filed: April 14, 2010
Date of Patent: October 14, 2014
Assignee: VCVC III LLC
Inventors: James M. Wissner, Nova Spivack
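One way to picture pattern matching on location identifiers is the sketch below. It assumes the location identifiers are URLs and that each semantic type has one stored regular expression. The `PATTERNS` table, its two example patterns, and the function name are hypothetical and are not drawn from the patent.

```python
import re

# Hypothetical stored patterns, one per semantic type.
PATTERNS = {
    "recipe": re.compile(r"/recipes?/"),
    "product": re.compile(r"/(dp|product)/"),
}


def results_by_semantic_type(urls, semantic_type):
    """Return the URLs that match the pattern stored for the semantic type."""
    pattern = PATTERNS[semantic_type]
    return [u for u in urls if pattern.search(u)]
```

A search for the "recipe" type would then keep only results whose URL contains a `/recipe/` or `/recipes/` path segment, without inspecting page content.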
-
Patent number: 8862568
Abstract: A system and method for time-multiplexing the display of a plurality of electronic documents are provided. Time-multiplexing criteria for displaying a plurality of selected documents associated with a concept on a time-multiplexed basis is determined. The plurality of selected documents are caused to be displayed at an output device in a predetermined sequence according to the time-multiplexing criteria. The time-multiplexing criteria may be a variety of criteria related to the selected documents, the source of the selected documents, or other factors such as a relevance to a concept and one or more preferences associated with the selected documents.
Type: Grant
Filed: September 8, 2009
Date of Patent: October 14, 2014
Assignee: Google Inc.
Inventors: Gregory Joseph Badros, Jeff Eddings, Rama Ranganath
-
Publication number: 20140304249
Abstract: Determining experts based on a search query of a user includes identifying items in a content collection that correspond to the search query, determining authors of the items, and ranking the authors according to relevance to the search query for each of the items for each of the authors. Determining experts based on a search query of a user may also include complementing the query with additional public search results prior to identifying the items. Complementing the query may include using an external data source to search based on the query. The external data source may be selected from the group consisting of Google Search, Yahoo Search, and Microsoft Bing. Determining experts based on a search query of a user may also include presenting the authors to the user in order of ranking. The query may be a natural language query.
Type: Application
Filed: February 26, 2014
Publication date: October 9, 2014
Applicant: Evernote Corporation
Inventors: Mark Ayzenshtat, Zeesha Currimbhoy
-
Patent number: 8856169
Abstract: A multi-modality, multi-resource, information integration environment system is disclosed that comprises: (a) at least one computer readable medium capable of securely storing and archiving system data; (b) at least one computer system, or program thereon, designed to permit and facilitate web-based access of the at least one computer readable medium containing the secured and archived system data; (c) at least one computer system, or program thereon, designed to permit and facilitate resource scheduling or management; (d) at least one computer system, or program thereon, designed to monitor the overall resource usage of a core facility; and (e) at least one computer system, or program thereon, designed to track regulatory and operational qualifications.
Type: Grant
Filed: July 13, 2012
Date of Patent: October 7, 2014
Assignee: Case Western Reserve University
Inventors: Guo-Qiang Zhang, Remo Sebastian Wolfgang Mueller, Jacek Szymanski, Adam Troy, David L. Wilson, Chris A. Flask, Raymond F. Muzic, Jr.
-
Publication number: 20140297617
Abstract: A system and method provide for geo-augmentation through virtual tagging. A search infrastructure supports creation, managing and searching geo-coded virtual tags using mobile communication devices. Associated geolocations are added to a geolocation database along with pointers to the stored content. Searching of the geolocation database is performed upon receiving geolocation search input, wherein the infrastructure applies the geolocation based search input to the search database yielding search results delivered from the mobile communications device for presentation to the user.
Type: Application
Filed: April 23, 2013
Publication date: October 2, 2014
Applicant: BROADCOM CORPORATION
Inventors: Yasantha Nirmal Rajakarunanayake, William Stuart Bunch, Wael William Diab
-
Patent number: 8849826Abstract: The sentiment engine includes a sentiment module configured to gather opinions or determine sentiment expressed in documents, a crawling module configured to crawl servers to obtain at least a subset of the documents or opinions from social media websites, a keyword module configured to extract keywords from documents, a filtering module configured to filter keywords and documents, a classification module configured to classify documents, sentences, and/or keywords, a polarity prediction module configured to predict the polarity of a sentiment sentence, and a social media net promoter score (SNPS) configured to calculate a loyalty metric of users from social media websites. The functionality of these modules may be combined with one another or in addition to other modules.Type: GrantFiled: September 30, 2012Date of Patent: September 30, 2014Assignee: Metavana, Inc.Inventor: Duong-Van Minh
-
Patent number: 8849649Abstract: A system, computer readable storage medium storing instructions, and computer-implemented method for determining sentiment expressed in documents are disclosed. A document is received from a plurality of documents. A sentence in the document that includes at least one sentiment signature within a predetermined distance of at least one keyword from a list of keywords is identified, wherein the list of keywords is extracted from the plurality of documents and is filtered using a phase transition formula, and wherein the at least one sentiment signature corresponds to an expression of at least one sentiment in the sentence. At least one category corresponding to the at least one keyword of the sentence is determined, wherein the at least one category is included in a list of categories that is generated using the list of keywords. At least one sentiment corresponding to the at least one category is determined based on the at least one sentiment signature.Type: GrantFiled: December 23, 2010Date of Patent: September 30, 2014Assignee: Metavana, Inc.Inventor: Minh Duong-van
-
Publication number: 20140289045Abstract: Method and system for delivery of personal search services and advertising. The method includes collecting information from the user about the user's personal search engine, including, but not limited to digital content data sources, link crawl depth of those digital content data sources, and time interval to refresh the index of the digital content data sources created. In one embodiment of the present invention users do not pay a fee in return for allowing the provider to present advertising to the user as the user uses the invention. In another embodiment, advertisers purchase advertising display services from the provider to be displayed to specific users.Type: ApplicationFiled: June 10, 2014Publication date: September 25, 2014Inventor: Nancy KRAMER
-
Publication number: 20140280011Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting a measure of quality for a site, e.g., a web site. In some implementations, the methods include obtaining baseline site quality scores for multiple previously scored sites; generating a phrase model for multiple sites including the previously scored sites, wherein the phrase model defines a mapping from phrase specific relative frequency measures to phrase specific baseline site quality scores; for a new site that is not one of the previously scored sites, obtaining a relative frequency measure for each of a plurality of phrases in the new site; determining an aggregate site quality score for the new site from the phrase model using the relative frequency measures of phrases in the new site; and determining a predicted site quality score for the new site from the aggregate site quality score.Type: ApplicationFiled: March 15, 2013Publication date: September 18, 2014Applicant: Google Inc.Inventors: Yun Zhou, Navneet Panda
-
Publication number: 20140280009Abstract: Methods and apparatus to supplement web crawling with cached data from distributed devices are disclosed. An example method includes accessing a first set of websites cached in a panelist device; comparing the first set of websites to a second set of websites to be analyzed by a crawler; and retrieving with the crawler a first website included in the second set of websites but not included in the first set of websites from a server associated with the first website.Type: ApplicationFiled: March 15, 2013Publication date: September 18, 2014Inventor: Chad Hage
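The comparison step in this abstract amounts to a set difference: the crawler fetches only those target websites that are not already cached on the panelist device. A minimal Python sketch follows; the function name `sites_to_crawl` and the example hostnames are illustrative assumptions, not taken from the application.

```python
def sites_to_crawl(cached_on_panelist, crawl_targets):
    """Return the target sites the crawler must still fetch from their servers.

    cached_on_panelist: set of sites already cached on a panelist device.
    crawl_targets: set of sites the crawler is meant to analyze.
    """
    # Sites in the crawl list that are absent from the device cache.
    return crawl_targets - cached_on_panelist


# Hypothetical example data.
cached = {"a.example", "b.example"}
targets = {"b.example", "c.example"}
remaining = sites_to_crawl(cached, targets)
```

Here only `c.example` would be retrieved from its origin server; `b.example` could be analyzed from the cached copy.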
-
Publication number: 20140280012Abstract: Methods and systems allow for creating rules for a tag management system. One or more implementations that create rules for a tag management system can include crawling a page of a website. Additionally, one or more implementations identify the configuration of each of the tags implemented within the page. Further, one or more implementations generate one or more rules that enable a tag management system to recreate the configuration of one or more tags implemented within the page. Further still, one or more implementations export the generated one or more rules to a tag management system.Type: ApplicationFiled: May 9, 2013Publication date: September 18, 2014Applicant: OBSERVEPOINT LLCInventors: Alan Martin Feuerlein, Matthew T. Miller, Robert K. Seolas, John Pestana
-
Publication number: 20140279056Abstract: An intelligent platform for real-time bidding (RTB) includes a bidder that allows for the association of additional private or proprietary information with each bid it receives, and allows advertisers to filter impressions based on a rich set of attributes. The bidder can be used to bid across many ad exchanges using the same augmented bidding criteria. The system can have crawlers that include virtual web browser rendering for analysis to allow the system to determine location on a page, a size of the video, how it is played, and information about content in the video. The crawlers can include a browser-specific rendering crawler, which can determine browser-specific behavior.Type: ApplicationFiled: March 18, 2014Publication date: September 18, 2014Applicant: TriVu Media, Inc.Inventors: Michael SULLIVAN, Paul CALENTO, Miles DENNISON
-
Publication number: 20140280010Abstract: The embodiments relate to transcoding, cataloging, and extracting metadata about files stored in a storage device. In one embodiment, a crawler runs on the storage device and maintains a database that is stored in the volume with the data that has been cataloged by the crawler. The crawler may discover files of any type and extract associated metadata about the files. The crawler can extract metadata about client interaction with various files, such as edits, play counts, etc. The crawler may discover files of any type and extract associated metadata about the files automatically during a scan or at the request of a client. In one embodiment, the crawler may be responsive to file system events that indicate changes to the file system, such as additions, deletions, or other types of changes. In addition, the crawler may synchronize the database with the file system so that they indicate the same state for a particular file.Type: ApplicationFiled: March 15, 2013Publication date: September 18, 2014Applicant: Western Digital Technologies, Inc.Inventor: Western Digital Technologies, Inc.
-
Patent number: 8838570Abstract: In one embodiment, a web browser running in a client computer is configured to connect to an external server computer upon invocation of a home page or other configurable uniform resource locator. The server computer may receive the IP address of the client computer and check the IP address of the client computer against a listing of IP addresses of known bot-infected computers. The web browser may pass the URL address of the home page as a URL parameter. The server computer may redirect the web browser to the home page or other location when the client computer is not infected by a bot or, when the client computer is bot-infected, to a solutions web page that provides access to a malicious code scanner that may be utilized to remove the bot.Type: GrantFiled: November 6, 2006Date of Patent: September 16, 2014Assignee: Trend Micro IncorporatedInventor: Edward D. English
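The server-side decision described here can be sketched in a few lines of Python: look up the client IP in a list of known bot-infected addresses, then redirect either to the home page passed as a URL parameter or to a solutions page. The parameter name `home`, the function name `redirect_target`, and all URLs and addresses below are assumptions made for illustration; the abstract does not specify them.

```python
from urllib.parse import urlparse, parse_qs


def redirect_target(client_ip, request_url, infected_ips, solutions_url):
    """Choose where the server should redirect the browser.

    The home page is assumed to arrive in a query parameter named "home".
    Returns None if the client is clean and no home page was supplied.
    """
    query = parse_qs(urlparse(request_url).query)
    home_page = query.get("home", [None])[0]
    if client_ip in infected_ips:
        # Known bot-infected client: send it to the scanner page.
        return solutions_url
    return home_page


# Hypothetical example data (documentation-range IP addresses).
infected = {"203.0.113.7"}
request = "https://check.example/?home=https%3A%2F%2Fstart.example%2F"
clean_target = redirect_target("198.51.100.2", request, infected,
                               "https://check.example/solutions")
```

A production service would also need to validate the `home` value before redirecting to it, since an unchecked redirect parameter can be abused.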
-
Patent number: 8838572Abstract: Method and system for organizing and sharing content through experience are described. In one embodiment, content may be organized and shared among users through a specific experience. A method for sharing content in a network may include: collecting contents related to a specific experience from a specific user; generating an experience graph of the specific experience; enabling the specific user to invite other users to join the experience graph; and enabling each user inside the experience graph to share new content into the experience graph.Type: GrantFiled: September 13, 2012Date of Patent: September 16, 2014Assignee: Airtime Media, Inc.Inventors: Andrew C. Lin, Eric I. Feng, Eugene C. Wei
-
Patent number: 8838571Abstract: Techniques are provided for data-discriminate search engine updates, where, in accordance with a first crawling session frequency associated with a first update type, a search engine index is updated by recording an update to a first set of data, where the update to the first set of data is of the first update type, and, in accordance with a second crawling session frequency associated with a second update type, the search engine index is updated by recording an update to a second set of data, where the update to the second set of data is of the second update type, where the first crawling session frequency is of a different frequency than the second crawling session frequency.Type: GrantFiled: June 28, 2010Date of Patent: September 16, 2014Assignee: International Business Machines CorporationInventors: Shai Erera, Laurent Hasson, Eitan Shapiro
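One way to picture the idea of a distinct crawling session frequency per update type is a small scheduler that reports which update types are due at a given time. The Python sketch below is illustrative only; the update type names (`content`, `acl`), the intervals, and the function name `due_update_types` are assumptions rather than details from the patent.

```python
# Hypothetical crawl intervals, in seconds, for two update types.
CRAWL_INTERVAL_SECONDS = {"content": 3600, "acl": 300}


def due_update_types(now, last_run, intervals=CRAWL_INTERVAL_SECONDS):
    """Return the update types whose crawling session is due at time `now`.

    last_run maps update type -> timestamp of its previous session;
    a type with no previous session is always due.
    """
    return sorted(
        update_type
        for update_type, interval in intervals.items()
        if now - last_run.get(update_type, float("-inf")) >= interval
    )


# Both types last crawled at t=0; ten minutes later only "acl" is due.
due_now = due_update_types(600, {"content": 0, "acl": 0})
```

Each due type would then trigger its own crawling session, and the index would record only updates of that type.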
-
Patent number: 8838584Abstract: A method for selecting a subset of content sources from a collection of content sources is disclosed. A server retrieves, in response to a plurality of queries on a topic from a client, using a programmed computer, a plurality of sets of documents from the collection of content sources. The server enumerates all subsets of the plurality of sets of documents. The server calculates, for each subset, a count of effectiveness of a subset and a price of the subset. The server selects a subset having the highest calculated ratio of count of effectiveness of the subset to price of the subset. The server delivers the selected subset of the plurality of sets of documents to the client.Type: GrantFiled: March 29, 2012Date of Patent: September 16, 2014Assignee: Acquire Media Ventures, Inc.Inventors: Lawrence C. Rafsky, Thomas B. Donchez
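The selection procedure in this abstract (enumerate every subset, compute effectiveness and price, keep the best ratio) can be sketched in Python as below. The abstract does not define the "count of effectiveness", so this sketch assumes it is the number of distinct documents a subset retrieves, and that a subset's price is the sum of its sources' prices. The function name `best_subset` and the sample data are likewise hypothetical.

```python
from itertools import combinations


def best_subset(sources):
    """Pick the subset of sources with the highest effectiveness/price ratio.

    sources: dict mapping source name -> (set of document ids, price).
    Returns (tuple of source names, ratio). Enumerates all non-empty
    subsets, so the cost grows exponentially with the number of sources.
    """
    names = sorted(sources)
    best, best_ratio = None, float("-inf")
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Assumed effectiveness: distinct documents retrieved.
            docs = set().union(*(sources[name][0] for name in combo))
            price = sum(sources[name][1] for name in combo)
            if price <= 0:
                continue  # skip subsets with no meaningful price
            ratio = len(docs) / price
            if ratio > best_ratio:
                best, best_ratio = combo, ratio
    return best, best_ratio


# Hypothetical sources: (documents retrieved, price).
example_sources = {
    "A": ({1, 2, 3}, 3.0),
    "B": ({3, 4}, 1.0),
    "C": ({5}, 5.0),
}
chosen, chosen_ratio = best_subset(example_sources)
```

With this sample data the single source `B` wins, because two documents at a price of 1.0 beats every larger combination.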
-
Publication number: 20140258262Abstract: A method and computer readable medium are described for directing a search engine web crawler's local web browser to refresh the top-level container that is currently displaying the content presented by a remote computer with the new content that a navigational link, within a remote desktop, remote application window, or remote graphical windowing user session, points to. Links can be modified so as to be recognizable by the remote machine as unique from traditional hyperlinks. Upon navigation action on such a link, the client of a remote desktop, remote graphical application window, or remote graphical windowing user session is redirected so that it wholly reloads its computing context with that provided by a destination URL or URI. Such a URL or URI may point to another remote desktop, remote application window, or remote graphical windowing user session.Type: ApplicationFiled: March 7, 2014Publication date: September 11, 2014Inventor: Christopher Balz
-
Publication number: 20140258261Abstract: A web page identified by a URL stored in a downloads queue is downloaded, and hyperlinks in the downloaded web page are identified. Each hyperlink is screened by parsing the hyperlink (optionally only the URL of the hyperlink) to identify features comprising character strings, computing for each feature values for one or more meta-features indicative of the hyperlinked web page being in a target language, aggregating the meta-feature values to generate a score for the hyperlink, and adding the URL of the hyperlink to the downloads queue conditional upon the score satisfying a screening criterion. The downloading, identifying, and screening are iteratively repeated to perform web crawling, and an index of web pages in the target language is constructed based on analysis of content of the downloaded web pages. The meta-features may include a transliterated target word meta-feature, a language code meta-feature, a country code meta-feature, or so forth.Type: ApplicationFiled: March 11, 2013Publication date: September 11, 2014Applicant: Xerox CorporationInventors: Nidhi Singh, Jean-Marc Coursimault, Nicolas Monet, Herve Poirer
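The hyperlink screening step can be illustrated with a short Python sketch: split the URL into character-string features, score each feature against a few meta-features, sum the values, and queue the URL only if the total meets a threshold. Everything specific here is an assumption for illustration: the Hindi-oriented vocabularies, the weights, the summation as the aggregation rule, the threshold, and the function names are not taken from the application.

```python
import re
from collections import deque

# Hypothetical vocabularies for a Hindi-focused crawl; illustrative only.
LANGUAGE_CODES = {"hi"}
COUNTRY_CODES = {"in"}
TRANSLITERATED_WORDS = {"samachar", "khabar"}

# Hypothetical weight per meta-feature.
WEIGHTS = {"language_code": 1.0, "country_code": 0.5, "transliterated_word": 1.0}


def url_features(url):
    """Split a URL into lowercase alphanumeric character strings."""
    return [token for token in re.split(r"[^a-z0-9]+", url.lower()) if token]


def score_url(url):
    """Aggregate (here: sum) meta-feature values over all URL features."""
    score = 0.0
    for feature in url_features(url):
        if feature in LANGUAGE_CODES:
            score += WEIGHTS["language_code"]
        if feature in COUNTRY_CODES:
            score += WEIGHTS["country_code"]
        if feature in TRANSLITERATED_WORDS:
            score += WEIGHTS["transliterated_word"]
    return score


def screen_hyperlink(url, downloads_queue, threshold=1.0):
    """Queue the URL only if its score satisfies the screening criterion."""
    if score_url(url) >= threshold:
        downloads_queue.append(url)
        return True
    return False


queue = deque()
screen_hyperlink("http://news.example.in/hi/samachar", queue)  # queued
screen_hyperlink("http://example.com/about", queue)            # rejected
```

Exact token matching is crude: a path segment such as `in` used as an English word would also earn the country-code weight, which is one reason a real screener would likely learn its weights rather than fix them by hand.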