Web Crawlers Patents (Class 707/709)
  • Patent number: 8930305
    Abstract: An adaptive information processing system for updating product documentation and associated knowledge base is disclosed, the system including at least one subsystem for receiving original data from a data source, and a central dynamic data system to integrate the original data from the at least one subsystem. The central dynamic data system is configured to integrate system knowledge with the original data to form integrated data, wherein the central dynamic data system is configured to dynamically update the product documentation and the knowledge base based on the integrated data. A computer implemented method for dynamically updating product documentation and knowledge base is further disclosed, the method includes receiving original data from a data source, and integrating the knowledge base with the original data from the data source to form integrated data.
    Type: Grant
    Filed: November 16, 2009
    Date of Patent: January 6, 2015
    Assignee: Toyota Motor Engineering & Manufacturing North America, Inc.
    Inventors: Setu Madhavi Namburu, Danil Prokhorov, Liu Qiao, Sandesh Ghimire
  • Patent number: 8930343
    Abstract: Provided is a system and method for collecting a document. The system may include an identification information receiver to receive, from a host of a site, identification information of a document of which an update may occur, a collection request transfer unit to transmit a collection request for the document based on the identification information, an update information collector to receive update information of the document from the host, and a search result provider to provide, to the host, a search result extracted from the update information of the document, in response to the search request being received from the host. The system for collecting the document may reduce load of a web site, and may improve accuracy of the document to be collected.
    Type: Grant
    Filed: June 21, 2011
    Date of Patent: January 6, 2015
    Assignee: NHN Corporation
    Inventors: Young Su Ko, Seung Yeop Han, Jung Woo Seo
  • Publication number: 20140365459
    Abstract: Some embodiments of the invention provide an address harvester that harvests addresses from one or more applications executing on a device. Some embodiments use the harvested addresses to facilitate the operation of one or more applications executing on the device. Alternatively, or conjunctively, some embodiments use the harvested addresses to facilitate the operation of one or more applications executing on another device than the one used for harvesting the addresses. In some embodiments, a prediction system uses the harvested addresses to formulate predictions, which it then provides to the same set of applications from which it harvested the addresses in some embodiments.
    Type: Application
    Filed: November 15, 2013
    Publication date: December 11, 2014
    Applicant: Apple Inc.
    Inventors: Ashley B. Clark, James Magahem, Jorge Fino, Scott Hertz, Emanuele Vulcano
  • Patent number: 8909632
    Abstract: A method, system and computer-usable medium are disclosed for maintaining persistent links to information stored on a network. Information elements are tagged and their original network location is saved as a hyperlink. The tagged information elements are then acquired at the original network location by a search engine crawler, indexed by a search engine, and stored in an information location index. The tagged information elements are periodically submitted to the search engine to generate search results. Comparison operations are performed to determine the search results comprising the closest-matching information elements and their current network location. The network location stored in the hyperlink is replaced with the current network location if it is not the same.
    Type: Grant
    Filed: October 17, 2007
    Date of Patent: December 9, 2014
    Assignee: International Business Machines Corporation
    Inventors: Saurabh Shukla, Mandar U. Jog, Shreyansh Shukla, Scott W. Newman
  • Patent number: 8908996
    Abstract: An automated and extensible system is provided for the analysis and retrieval of images based on region-of-interest (ROI) analysis of one or more true objects depicted by an image. The system uses an ROI database that is a relational or analytical database containing searchable vectors representing images stored in a repository. Entries in the ROI database are created by an image locator and ROI classifier that locate images within the repository and extract relevant information to be stored in the ROI database. The ROI classifier analyzes objects in an image to arrive at actual features of the true object. Graphical searches may also be performed.
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: December 9, 2014
    Assignee: Google Inc.
    Inventors: Jamie E. Retterath, Robert A. Laumeyer
  • Patent number: 8908997
    Abstract: The present invention is an automated and extensible system for the analysis and retrieval of images based on region-of-interest (ROI) analysis of one or more true objects depicted by an image. The system uses an ROI database that is a relational or analytical database containing searchable vectors that represent the images stored in a repository. Entries in the database are created by an image locator and ROI classifier that work to locate images within the repository and extract relevant information that will be stored in the ROI database. The ROI classifier analyzes objects in an image to identify actual features of the true object. Graphical searches are performed by the collaborative workings of an image retrieval module, an image search requestor and an ROI query module. The image search requestor is an abstraction layer that translates user or agent search requests into the language understood by the ROI query.
    Type: Grant
    Filed: May 29, 2014
    Date of Patent: December 9, 2014
    Assignee: Google Inc.
    Inventors: Jamie E. Retterath, Robert A. Laumeyer
  • Patent number: 8909617
    Abstract: A method, apparatus, system, article of manufacture, and computer readable storage medium provide media content. A web page context for a web page is determined and stored in a database. One or more media content files are analyzed to extract information that is stored in the database. The information is compared to the web page context. A matching media content file is determined from the one of the one or more media content files that matches the web page context based on the comparison. The matching media content file is then provided (e.g., to an internet portal web site).
    Type: Grant
    Filed: January 26, 2011
    Date of Patent: December 9, 2014
    Assignee: Hulu, LLC
    Inventor: Dong Wang
  • Publication number: 20140358887
    Abstract: A search service accesses application content accessible via one or more enumerated applications. The search service ranks the accessed application content in combination with non-application content to produce a combined ranking. Responsive to a search query, the search service provides one or more search results based on the combined ranking.
    Type: Application
    Filed: August 28, 2013
    Publication date: December 4, 2014
    Applicant: Microsoft Corporation
    Inventors: Max Glenn Morris, Robert Emmett Kolba, Jr., Yi Li, Kang Li, Tyler Beam, Kyle Beck, Rylan Hawkins, Daniel Oliver, Sandy Wong, Shajib Sadhukha
  • Publication number: 20140358888
    Abstract: The present invention provides the capability to quickly and easily determine the online reputation of a target, and to quickly and easily take steps to improve the online reputation of the target. For example, a method of monitoring and affecting online reputation may comprise gathering information potentially related to an online reputation of a target, filtering the gathered information to eliminate information not related to the target, computing a reputation score for the filtered information based on both positive and negative information related to the target, generating positive information relating to the target, and distributing the generated positive information relating to the target to a plurality of online locations.
    Type: Application
    Filed: May 28, 2014
    Publication date: December 4, 2014
    Inventors: Granger Whitelaw, Richard Kane, Scott Jeffrey Emrich, Steven Mendelson
  • Patent number: 8903199
    Abstract: An automated and extensible system for analysis and retrieval of images based on region-of-interest (ROI) analysis of one or more true objects depicted by an image is provided. The system uses an ROI database that is a relational or analytical database containing searchable vectors that represent the images stored in a repository. Entries in the database are created by an image locator and ROI classifier working together to locate images within the repository and extract relevant information to be stored in the ROI database. The ROI classifier analyzes objects in an image to arrive at actual features of the true object. Graphical searches are performed by the collaborative workings of an image retrieval module, an image search requestor and an ROI query module. The image search requestor is an abstraction layer that translates user or agent search requests into the language understood by the ROI query.
    Type: Grant
    Filed: February 6, 2012
    Date of Patent: December 2, 2014
    Assignee: Google Inc.
    Inventors: Jamie E. Retterath, Robert A. Laumeyer
  • Publication number: 20140351236
    Abstract: A method, apparatus, server and system for website searching in a browser of a mobile terminal is presented. The method includes the steps of: loading one or more preconfigured website search engine information for generating a website search engine list on a browser search bar; receiving information on which website search engine has been selected from the generated website search engine list; receiving a search keyword input to the browser search bar; sending a search request to the selected website search engine to query the received search keyword; and displaying a search result returned by the selected website search engine upon a successful search.
    Type: Application
    Filed: July 21, 2014
    Publication date: November 27, 2014
    Inventor: Zhigang Zhu
  • Publication number: 20140351235
    Abstract: A system, method, and computer program product are provided for crawling a website based on a scheme of the website. In use, a difference between a first content and second content of a website is identified. Additionally, a scheme of the website is identified based on the difference. Furthermore, the website is crawled based on the scheme.
    Type: Application
    Filed: June 5, 2014
    Publication date: November 27, 2014
    Inventor: Gabriel Pack
  • Publication number: 20140351237
    Abstract: Systems and methods for the creation of hierarchical networks of overlapping informational web neighborhoods using percolation crawling. Each neighborhood comprises a set of closely linked pages that share a common set of concepts and intent and purpose. The neighborhoods represent web pages that share a common set of underlying concepts and semantic associations. Each such neighborhood can be semantically tagged.
    Type: Application
    Filed: August 12, 2014
    Publication date: November 27, 2014
    Inventors: Behnam Attaran Rezaei, Alice Hwei-Yuan Meng Muntz
  • Patent number: 8898297
    Abstract: An embodiment of the disclosed system provides the user of a computing device with information concerning the expected usefulness of an item, such as a hyperlink, within a network resource, such as a search result webpage, with the expected usefulness information based at least in part on an attribute of the user's computing device. For example, the system may provide the user with information identifying a particular website as poorly suited for the user's device, based on data that the system collected identifying an aggregate bounce-back rate from computing devices with a similar attribute to the user's computing device.
    Type: Grant
    Filed: August 17, 2012
    Date of Patent: November 25, 2014
    Assignee: Amazon Technologies, Inc.
    Inventors: Brett R. Taylor, Ameet N. Vaswani, Faizal S. Kassamali, Ryan Tucker, Ranganath Atreya, Michael V. Zampani
  • Publication number: 20140344241
    Abstract: A method for user-enhanced ranking of information objects, comprising: generating a graphical user-interface (40) on a display (13), the graphical user-interface comprising a graph (41), wherein the graph comprises a plurality of icons each representing an information object of the collection of information objects and a plurality of connectors connecting the icons, each connector representing at least one link of the collection of links, modifying the graph by generating an additional connector between the icons in response to graph modification commands received from a user-controlled interaction means, storing an additional link in the database (21) as a function of the additional connector, wherein the additional link interrelates information objects represented by the icons connected by the additional connector, computing a link-based rank for an information object of the collection of information objects as a function of the additional link and the collection of links.
    Type: Application
    Filed: August 21, 2012
    Publication date: November 20, 2014
    Applicant: Alcatel Lucent
    Inventor: Dohy Hong
  • Publication number: 20140344242
    Abstract: Embodiments of the disclosure may include systems, methods, and devices for providing multidimensional search results on a plurality of search planes. Such systems, methods, and devices may: (i) receive one or more search terms from one or more user interfaces of the system; (ii) perform a search of one or more informational repositories to obtain a list of search results wherein the informational repositories may include the Internet and one or more databases; (iii) process the list of search results to classify each search result in one of a plurality of categories; (iv) cause a presentation of the search results in a plurality of search planes on the display of the system such that each search plane corresponds to one of the plurality of categories. In addition, the software applications may include a sorting software application that groups the list of search results into one of a plurality of categories.
    Type: Application
    Filed: August 4, 2014
    Publication date: November 20, 2014
    Applicant: Ariel Inventions LLC
    Inventor: Leigh M. Rothschild
  • Patent number: 8892541
    Abstract: A new approach is proposed that contemplates systems and methods to determine temporality of a query in order to generate a search result including a list of objects that are not only based on matching of the objects to the query but also based on temporality analysis of the query. Here, the temporality of the query can be defined as the distribution over time of the objects matching the query, i.e., the chronology histogram of the query. Such distribution can be analyzed to provide a classification of the intent of the query. Classification of the intent of the query can result either in discrete classification of the query into categories, or in continuous classification of the query which may be a scalar or vector value resulting from transformations of the chronology histogram.
    Type: Grant
    Filed: June 15, 2011
    Date of Patent: November 18, 2014
    Assignee: Topsy Labs, Inc.
    Inventors: Rishab Aiyer Ghosh, Thomas James Emerson, Lun Ted Cui
  • Patent number: 8892543
    Abstract: System and method for indexing rendered web page images. A web crawling engine stores the content and crawl time of a web page. A scheduling engine sends the content and crawl time to a rendering engine, and processes requests for embedded objects. If a requested object has been crawled, it sends the contents to the rendering engine. Otherwise it schedules the crawl of the object, and once the object is crawled, it resends the content and crawl time of the web page to the rendering engine. The rendering engine receives the content and crawl time of a web page, requests all embedded objects, and renders the web page to an image once all embedded objects are received.
    Type: Grant
    Filed: September 13, 2012
    Date of Patent: November 18, 2014
    Assignee: Google Inc.
    Inventors: Rupesh Kapoor, Erik Hendriks, Sathayanarayana Giridhar, Andrei Pascovici, Pawel Aleksander Fedorynski
  • Publication number: 20140337309
    Abstract: Embodiments relate to systems and methods employing personalized query expansion to suggest measures and dimensions allowing iterative building of consistent queries over a data warehouse. Embodiments may leverage one or more of: semantics defined in multi-dimensional domain models, user profiles defining preferences, and collaborative usage statistics derived from existing repositories of Business Intelligence (BI) documents (e.g. dashboards, reports). Embodiments may utilize a collaborative co-occurrence value derived from profiles of users or social network information of a user.
    Type: Application
    Filed: June 5, 2014
    Publication date: November 13, 2014
    Applicant: SAP AG
    Inventors: Raphael Thollot, Nicolas Kuchmann-Beauger, Corentin FollenFant
  • Patent number: 8886625
    Abstract: Provided are methods and computer-readable media for providing recommended entities based on a user's external social graph, such as an asymmetric social graph of a social networking service. In some embodiments, entities responsive to a search query or other request may be obtained. Each entity may be evaluated to determine if the entity is associated with a contact from a user's social graph. The association may include an evaluation (e.g., a rating, review, other evaluation or combination thereof) of the entity by the contact. Additionally, the contacts having associations with an entity may be ranked based on a relationship score with a user. The entities having associations with the contacts from a user's social graph may be provided as recommended entities to the user, and the association may be annotated to the recommended entity for viewing by the user.
    Type: Grant
    Filed: October 31, 2012
    Date of Patent: November 11, 2014
    Assignee: Google Inc.
    Inventors: Sebastian Dorner, Mat Balez
  • Patent number: 8880498
    Abstract: System and method for collecting information from a plurality of related sites, analyzing the information and storing the relevant information in a data base for future use. According to one embodiment of the present invention, the system uses the provided list of sites, whether obtained automatically or separately, queries them and analyzes the result retrieved from each site. The information may also optionally and preferably be ranked.
    Type: Grant
    Filed: September 27, 2009
    Date of Patent: November 4, 2014
    Assignee: Fornova Ltd.
    Inventors: Michael Rubanovich, Dmitry Babitsky
  • Patent number: 8880559
    Abstract: A computer system that includes a computer that couples with a database. The computer includes program code or modules to gather location and activity content from disparate sources, and through text analytics, extract associations from the content and populate the database with the associations between locations and activities. Further modules provide end user interaction through presentation of a search user interface specific to locations and activities. Additional modules provide the capability to search the database, rank the results of the search and present the results to the user.
    Type: Grant
    Filed: April 2, 2010
    Date of Patent: November 4, 2014
    Inventor: Brian Bartell
  • Publication number: 20140324818
    Abstract: Systems and methods for scheduling document crawling are provided in which a list of document identifiers is obtained. Each respective document identifier identifies a corresponding document on a network. For each respective document identifier in the list of document identifiers, a content change frequency of the corresponding document is determined and a first score for the document identifier that is a function of the determined content change frequency of the corresponding document is also determined. The first score is compared against a threshold value. The document is scheduled for crawling based on the result of the comparison. The content change frequency for a respective document identifier is determined by comparing information stored for successive downloads of the document corresponding to the document identifier.
    Type: Application
    Filed: July 7, 2014
    Publication date: October 30, 2014
    Inventor: Keith H. Randall
  • Publication number: 20140324816
    Abstract: A system and method is provided for internet searching infrastructures and more particularly to hosted client device status supporting the delivery of search results hosted by a client device. A registry table retains client device status information so that when a search result includes specific device hosted content, that client device's status will be known. Client device status includes sleep, offline, predicted period of availability, do-not-disturb (DnD), power availability, or busy along with other status indications.
    Type: Application
    Filed: May 24, 2013
    Publication date: October 30, 2014
    Applicant: Broadcom Corporation
    Inventors: James Duane Bennett, Yasantha Nirmal Rajakarunanayake, Wael William Diab
  • Publication number: 20140324817
    Abstract: A system and method is provided to distribute preprocessing of client device content. The client device performs preprocessing or alternatively transfers search accessible content to remote systems for preprocessing such as search system infrastructure, set-top boxes, other client devices, etc. Client device content is preprocessed so as to provide, for example, a preview of images available by providing thumbnails of the images, small excerpts of text or a video preview. Offloading of client device content preprocessing duties reduces web server operational requirements and subsequent power needs. Additionally, preprocessing of searchable content can be distributed across multiple content hosts and search infrastructure elements.
    Type: Application
    Filed: May 24, 2013
    Publication date: October 30, 2014
    Applicant: Broadcom Corporation
    Inventors: Wael William Diab, Yasantha Nirmal Rajakarunanayake, James Duane Bennett
  • Publication number: 20140324815
    Abstract: A system and method for supporting searching of client device hosted content. A search infrastructure supports creation, managing and searching of client device hosted content. A client device, which hosts content, communicates its client device identification (ID), type and access restrictions to the search infrastructure. In addition, the client device communicates a global network route to the client device content as a pointer for the search engine to provide a search requestor access to both the client device and specified content. Client device information is also provided to a client device registry accessible by the search infrastructure, for example a registry maintained in a cloud based service. Client devices can enter into a client device services agreement with a third party storage system for the purposes of providing a higher probability that their client device hosted content will be available.
    Type: Application
    Filed: May 24, 2013
    Publication date: October 30, 2014
    Inventors: Wael William Diab, Yasantha Nirmal Rajakarunanayake, James Duane Bennett
  • Patent number: 8874540
    Abstract: A system and method for semantically classifying numerical data includes using semantic classification techniques on ‘nearby’ non-numerical data to identify a context whereby opaque data sets of numbers can be semantically classified inside of that context. An Electronic Knowledge Base is used to query against the context and determine the semantics of the opaque numeric data sets.
    Type: Grant
    Filed: September 7, 2011
    Date of Patent: October 28, 2014
    Assignee: Xerox Corporation
    Inventors: Michael David Shepherd, Dale Ellen Gaucas, Kirk J. Ocke
  • Patent number: 8874544
    Abstract: A system and method for exposing internal search indices to Internet search engines. The internal search indices are exposed to external search engines in such a way that the data may be segregated into at least two types including one layer of search data specifically for the search engines, and another for potential users of the application. This significantly improves the probability of discovery by search engines and also provides for presentation of discovered content to users in a manner consistent with the content itself, or consistent with the intended controls or presentations established by the content's owner. The system and method also includes one or more components that reproduce information about IP in a format that search engines can recognize and locate. The component also forwards users coming through the search engines to the actual IP graphical user interface (GUI) instead of the files that the search engine discovered.
    Type: Grant
    Filed: January 13, 2005
    Date of Patent: October 28, 2014
    Assignee: International Business Machines Corporation
    Inventors: Clifton E. Grim, III, Christopher I. Schmidt, John D. Wilson
  • Patent number: 8868541
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for scheduling resource crawls. In one aspect, a framework is provided for scheduling resource crawls such that a crawl scheduler determines the health of a document, i.e., whether it can be crawled, the popularity of the document, and the frequency of “interesting,” i.e., substantive, content changes, and based on this information, estimates an appropriate crawl interval for each web resource to improve crawl resource utilization.
    Type: Grant
    Filed: January 21, 2011
    Date of Patent: October 21, 2014
    Assignee: Google Inc.
    Inventors: Zhen Lin, Keith Stevens
  • Patent number: 8868540
    Abstract: A flexible and extensible architecture allows for secure searching across an enterprise. Such an architecture can provide a simple Internet-like search experience to users searching secure content inside (and outside) the enterprise. The architecture allows for the crawling and searching of a variety of sources across an enterprise, regardless of whether any of these sources conform to a conventional user role model. The architecture further allows for security attributes to be submitted at query time, for example, in order to provide real-time secure access to enterprise resources. The user query also can be transformed to provide for dynamic querying that provides for a more current result list than can be obtained for static queries.
    Type: Grant
    Filed: February 28, 2007
    Date of Patent: October 21, 2014
    Assignee: Oracle International Corporation
    Inventors: Mark Ture, Muralidhar Krishnaprasad, Vishu Krishnamurthy
  • Publication number: 20140310257
    Abstract: A computerized system and method is presented for analyzing quotations made in a quoting document of text originally found in a source document. The quoting document and source document can be web pages publicly available on the World Wide Web. The present invention analyzes the quoting document for quoted text, searches the source document for that text, and stores the existence of the quotation in association with the source document. When displaying the source document, quoted text is highlighted. A link is provided between items of quoted text and a list of documents that have quoted that text. From this list the full text of a quoting document may be displayed.
    Type: Application
    Filed: June 27, 2014
    Publication date: October 16, 2014
    Applicant: Geronimo Development Corporation
    Inventor: Orin Russell Armstrong
  • Patent number: 8862579
    Abstract: Systems and methods for search and search optimization using a pattern in a location identifier are disclosed. In one aspect, embodiments of the present disclosure include a method, which may be implemented on a system, of search and search optimization. The method includes detecting a set of location identifiers that have a pattern that matches a specified pattern and identifying a set of search results as having content related to the semantic type. The specified pattern can be stored in a computer-readable storage medium and corresponds to a semantic type. The set of search results can include objects associated with the set of location identifiers having the specified pattern.
    Type: Grant
    Filed: April 14, 2010
    Date of Patent: October 14, 2014
    Assignee: VCVC III LLC
    Inventors: James M. Wissner, Nova Spivack
  • Patent number: 8862568
    Abstract: A system and method for time-multiplexing the display of a plurality of electronic documents are provided. Time-multiplexing criteria for displaying a plurality of selected documents associated with a concept on a time-multiplexed basis is determined. The plurality of selected documents are caused to be displayed at an output device in a predetermined sequence according to the time-multiplexing criteria. The time-multiplexing criteria may be a variety of criteria related to the selected documents, the source of the selected documents, or other factors such as a relevance to a concept and one or more preferences associated with the selected documents.
    Type: Grant
    Filed: September 8, 2009
    Date of Patent: October 14, 2014
    Assignee: Google Inc.
    Inventors: Gregory Joseph Badros, Jeff Eddings, Rama Ranganath
  • Publication number: 20140304249
    Abstract: Determining experts based on a search query of a user includes identifying items in a content collection that correspond to the search query, determining authors of the items, and ranking the authors according to relevance to the search query for each of the items for each of the authors. Determining experts based on a search query of a user may also include complementing the query with additional public search results prior to identifying the items. Complementing the query may include using an external data source to search based on the query. The external data source may be selected from the group consisting of Google Search, Yahoo Search, and Microsoft Bing. Determining experts based on a search query of a user may also include presenting the authors to the user in order of ranking. The query may be a natural language query.
    Type: Application
    Filed: February 26, 2014
    Publication date: October 9, 2014
    Applicant: Evernote Corporation
    Inventors: Mark Ayzenshtat, Zeesha Currimbhoy
  • Patent number: 8856169
    Abstract: A multi-modality, multi-resource, information integration environment system is disclosed that comprises: (a) at least one computer readable medium capable of securely storing and archiving system data; (b) at least one computer system, or program thereon, designed to permit and facilitate web-based access of the at least one computer readable medium containing the secured and archived system data; (c) at least one computer system, or program thereon, designed to permit and facilitate resource scheduling or management; (d) at least one computer system, or program thereon, designed to monitor the overall resource usage of a core facility; and (e) at least one computer system, or program thereon, designed to track regulatory and operational qualifications.
    Type: Grant
    Filed: July 13, 2012
    Date of Patent: October 7, 2014
    Assignee: Case Western Reserve University
    Inventors: Guo-Qiang Zhang, Remo Sebastian Wolfgang Mueller, Jacek Szymanski, Adam Troy, David L. Wilson, Chris A. Flask, Raymond F. Muzic, Jr.
  • Publication number: 20140297617
    Abstract: A system and method provide for geo-augmentation through virtual tagging. A search infrastructure supports creating, managing, and searching geo-coded virtual tags using mobile communication devices. Associated geolocations are added to a geolocation database along with pointers to the stored content. Searching of the geolocation database is performed upon receiving geolocation search input, wherein the infrastructure applies the geolocation-based search input to the search database, yielding search results delivered from the mobile communications device for presentation to the user.
    Type: Application
    Filed: April 23, 2013
    Publication date: October 2, 2014
    Applicant: BROADCOM CORPORATION
    Inventors: Yasantha Nirmal Rajakarunanayake, William Stuart Bunch, Wael William Diab
  • Patent number: 8849826
    Abstract: The sentiment engine includes a sentiment module configured to gather opinions or determine sentiment expressed in documents, a crawling module configured to crawl servers to obtain at least a subset of the documents or opinions from social media websites, a keyword module configured to extract keywords from documents, a filtering module configured to filter keywords and documents, a classification module configured to classify documents, sentences, and/or keywords, a polarity prediction module configured to predict the polarity of a sentiment sentence, and a social media net promoter score (SNPS) module configured to calculate a loyalty metric of users from social media websites. The functionality of these modules may be combined with one another or with other modules.
    Type: Grant
    Filed: September 30, 2012
    Date of Patent: September 30, 2014
    Assignee: Metavana, Inc.
    Inventor: Duong-Van Minh
  • Patent number: 8849649
    Abstract: A system, computer readable storage medium storing instructions, and computer-implemented method for determining sentiment expressed in documents is disclosed. A document is received from a plurality of documents. A sentence in the document that includes at least one sentiment signature within a predetermined distance of at least one keyword from a list of keywords is identified, wherein the list of keywords is extracted from the plurality of documents and is filtered using a phase transition formula, and wherein the at least one sentiment signature corresponds to an expression of at least one sentiment in the sentence. At least one category corresponding to the at least one keyword of the sentence is determined, wherein the at least one category is included in a list of categories that is generated using the list of keywords. At least one sentiment corresponding to the at least one category is determined based on the at least one sentiment signature.
    Type: Grant
    Filed: December 23, 2010
    Date of Patent: September 30, 2014
    Assignee: Metavana, Inc.
    Inventor: Minh Duong-van
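The sentence-selection test in patent 8849649 (a sentiment signature within a predetermined distance of a keyword) can be sketched as below. This is a minimal sketch assuming distance is counted in word tokens and that signatures are single words; the keyword-filtering phase-transition formula and category assignment are not modeled, and `signature_near_keyword` is a hypothetical name.

```python
import re


def signature_near_keyword(sentence, keywords, signatures, max_distance):
    """Return True if any sentiment signature occurs within `max_distance`
    tokens of any keyword in the sentence (token distance is an assumption)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    keyword_positions = [i for i, tok in enumerate(tokens) if tok in keywords]
    signature_positions = [i for i, tok in enumerate(tokens) if tok in signatures]
    return any(abs(k - s) <= max_distance
               for k in keyword_positions
               for s in signature_positions)


sentence = "The battery is great, but I really do not like the screen."
print(signature_near_keyword(sentence, {"battery"}, {"great"}, 3))  # True
print(signature_near_keyword(sentence, {"screen"}, {"great"}, 3))   # False
```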
  • Publication number: 20140289045
    Abstract: Method and system for delivery of personal search services and advertising. The method includes collecting information from the user about the user's personal search engine, including, but not limited to, digital content data sources, the link crawl depth for those sources, and the time interval for refreshing the index created from them. In one embodiment of the present invention, users do not pay a fee in return for allowing the provider to present advertising to them as they use the invention. In another embodiment, advertisers purchase advertising display services from the provider to be displayed to specific users.
    Type: Application
    Filed: June 10, 2014
    Publication date: September 25, 2014
    Inventor: Nancy KRAMER
  • Publication number: 20140280011
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting a measure of quality for a site, e.g., a web site. In some implementations, the methods include obtaining baseline site quality scores for multiple previously scored sites; generating a phrase model for multiple sites including the previously scored sites, wherein the phrase model defines a mapping from phrase specific relative frequency measures to phrase specific baseline site quality scores; for a new site that is not one of the previously scored sites, obtaining a relative frequency measure for each of a plurality of phrases in the new site; determining an aggregate site quality score for the new site from the phrase model using the relative frequency measures of phrases in the new site; and determining a predicted site quality score for the new site from the aggregate site quality score.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: Google Inc.
    Inventors: Yun Zhou, Navneet Panda
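A rough sketch of the phrase-model idea in application 20140280011 follows. It assumes relative frequencies are discretised into three hypothetical bands, that each (phrase, band) pair maps to the mean baseline score of previously scored sites in that band, and that a new site's aggregate score is the mean over its known (phrase, band) pairs. The final mapping from aggregate to predicted score is not described in the abstract and is omitted; all names and thresholds here are illustrative.

```python
from collections import defaultdict
from statistics import mean


def freq_band(freq):
    """Hypothetical discretisation of a phrase's relative frequency."""
    if freq < 0.001:
        return "low"
    if freq < 0.01:
        return "mid"
    return "high"


def build_phrase_model(scored_sites):
    """scored_sites: list of (phrase_freqs, baseline_score) pairs."""
    scores = defaultdict(list)
    for phrase_freqs, baseline in scored_sites:
        for phrase, freq in phrase_freqs.items():
            scores[(phrase, freq_band(freq))].append(baseline)
    # Each (phrase, band) maps to the mean baseline score of sites in that band.
    return {key: mean(values) for key, values in scores.items()}


def aggregate_site_quality(model, phrase_freqs):
    """Mean model score over the new site's known (phrase, band) pairs;
    None if none of its phrases appear in the model."""
    hits = [model[(p, freq_band(f))] for p, f in phrase_freqs.items()
            if (p, freq_band(f)) in model]
    return mean(hits) if hits else None


scored_sites = [
    ({"free download": 0.02, "tutorial": 0.0005}, 0.25),
    ({"free download": 0.03, "tutorial": 0.005}, 0.75),
    ({"tutorial": 0.005}, 1.0),
]
model = build_phrase_model(scored_sites)
new_site = {"free download": 0.04, "tutorial": 0.006, "unseen": 0.2}
print(aggregate_site_quality(model, new_site))  # 0.6875
```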
  • Publication number: 20140280009
    Abstract: Methods and apparatus to supplement web crawling with cached data from distributed devices are disclosed. An example method includes accessing a first set of websites cached in a panelist device; comparing the first set of websites to a second set of websites to be analyzed by a crawler; and retrieving with the crawler a first website included in the second set of websites but not included in the first set of websites from a server associated with the first website.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Inventor: Chad Hage
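The comparison step in application 20140280009 amounts to a set difference between pages cached on a panelist device and pages the crawler is asked to analyze. A minimal sketch, assuming URLs are compared as exact strings; `plan_crawl` and the example URLs are hypothetical.

```python
def plan_crawl(cached_urls, target_urls):
    """Split crawl targets into pages already cached on a panelist device
    and pages the crawler must retrieve from their origin servers."""
    cached = set(cached_urls)
    from_cache = [url for url in target_urls if url in cached]
    to_fetch = [url for url in target_urls if url not in cached]
    return from_cache, to_fetch


cache = ["https://b.example/", "https://z.example/"]
targets = ["https://a.example/", "https://b.example/", "https://c.example/"]
from_cache, to_fetch = plan_crawl(cache, targets)
print(to_fetch)  # ['https://a.example/', 'https://c.example/']
```

Keeping the target list order (rather than returning raw sets) preserves whatever priority the crawler's queue already had.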
  • Publication number: 20140280012
    Abstract: Methods and systems allow for creating rules for a tag management system. In one or more implementations, creating rules for a tag management system can include crawling a page of a website. Additionally, one or more implementations identify the configuration of each of the tags implemented within the page. Further, one or more implementations generate one or more rules that enable a tag management system to recreate the configuration of one or more tags implemented within the page. Further still, one or more implementations export the generated one or more rules to a tag management system.
    Type: Application
    Filed: May 9, 2013
    Publication date: September 18, 2014
    Applicant: OBSERVEPOINT LLC
    Inventors: Alan Martin Feuerlein, Matthew T. Miller, Robert K. Seolas, John Pestana
  • Publication number: 20140279056
    Abstract: An intelligent platform for real-time bidding (RTB) includes a bidder that allows for the association of additional private or proprietary information with each bid it receives, and allows advertisers to filter impressions based on a rich set of attributes. The bidder can be used to bid across many ad exchanges using the same augmented bidding criteria. The system can have crawlers that include virtual web browser rendering for analysis, allowing the system to determine the location of a video on a page, the size of the video, how it is played, and information about its content. The crawlers can include a browser-specific rendering crawler, which can determine browser-specific behavior.
    Type: Application
    Filed: March 18, 2014
    Publication date: September 18, 2014
    Applicant: TriVu Media, Inc.
    Inventors: Michael SULLIVAN, Paul CALENTO, Miles DENNISON
  • Publication number: 20140280010
    Abstract: The embodiments relate to transcoding, cataloging, and extracting metadata about files stored in a storage device. In one embodiment, a crawler runs on the storage device and maintains a database that is stored in the volume with the data that has been cataloged by the crawler. The crawler can extract metadata about client interaction with various files, such as edits, play counts, etc. The crawler may discover files of any type and extract associated metadata about the files automatically during a scan or at the request of a client. In one embodiment, the crawler may be responsive to file system events that indicate changes to the file system, such as additions, deletions, or other types of changes. In addition, the crawler may synchronize the database with the file system so that they indicate the same state for a particular file.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: Western Digital Technologies, Inc.
    Inventor: Western Digital Technologies, Inc.
  • Patent number: 8838570
    Abstract: In one embodiment, a web browser running in a client computer is configured to connect to an external server computer upon invocation of a home page or other configurable uniform resource locator. The server computer may receive the IP address of the client computer and check the IP address of the client computer against a listing of IP addresses of known bot-infected computers. The web browser may pass the URL address of the home page as a URL parameter. The server computer may redirect the web browser to the home page or other location when the client computer is not infected by a bot or, when the client computer is bot-infected, to a solutions web page that provides access to a malicious code scanner that may be utilized to remove the bot.
    Type: Grant
    Filed: November 6, 2006
    Date of Patent: September 16, 2014
    Assignee: Trend Micro Incorporated
    Inventor: Edward D. English
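The server-side decision in patent 8838570 can be sketched as a single redirect function. This assumes the browser passes its home page in a query parameter named `home` (a hypothetical name; the abstract only says the home page URL is passed as a URL parameter) and that the list of known bot-infected IP addresses is a plain in-memory set.

```python
from urllib.parse import parse_qs, urlparse


def choose_redirect(request_url, client_ip, infected_ips, solutions_url):
    """Send bot-infected clients to a remediation page, others to their home page."""
    if client_ip in infected_ips:
        return solutions_url
    # The browser passed its home page as a URL parameter; the parameter
    # name "home" is hypothetical. parse_qs percent-decodes the value.
    params = parse_qs(urlparse(request_url).query)
    return params["home"][0]


request = "https://check.example/start?home=https%3A%2F%2Fnews.example%2F"
infected = {"203.0.113.7"}
print(choose_redirect(request, "198.51.100.2", infected, "https://fix.example/scan"))
# https://news.example/
print(choose_redirect(request, "203.0.113.7", infected, "https://fix.example/scan"))
# https://fix.example/scan
```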
  • Patent number: 8838572
    Abstract: Method and system for organizing and sharing content through experience are described. In one embodiment, content may be organized and shared among users through a specific experience. A method for sharing content in a network may include: collecting contents related to a specific experience from a specific user; generating an experience graph of the specific experience; enabling the specific user to invite other users to join the experience graph; and enabling each user inside the experience graph to share new content into the experience graph.
    Type: Grant
    Filed: September 13, 2012
    Date of Patent: September 16, 2014
    Assignee: Airtime Media, Inc.
    Inventors: Andrew C. Lin, Eric I. Feng, Eugene C. Wei
  • Patent number: 8838571
    Abstract: Techniques are provided for data-discriminate search engine updates, where, in accordance with a first crawling session frequency associated with a first update type, a search engine index is updated by recording an update to a first set of data, where the update to the first set of data is of the first update type, and, in accordance with a second crawling session frequency associated with a second update type, the search engine index is updated by recording an update to a second set of data, where the update to the second set of data is of the second update type, and where the first crawling session frequency differs from the second crawling session frequency.
    Type: Grant
    Filed: June 28, 2010
    Date of Patent: September 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: Shai Erera, Laurent Hasson, Eitan Shapiro
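The core idea in patent 8838571, separate crawling session frequencies per update type, can be sketched as a simple schedule generator. The update type names, the use of minutes, and the fixed-interval model are illustrative assumptions; `crawl_schedule` is a hypothetical name.

```python
def crawl_schedule(intervals, horizon):
    """List (time, update_type) crawl sessions up to `horizon`, each update
    type repeating at its own interval."""
    sessions = []
    for update_type, interval in intervals.items():
        if interval <= 0:
            raise ValueError(f"interval for {update_type!r} must be positive")
        t = interval
        while t <= horizon:
            sessions.append((t, update_type))
            t += interval
    return sorted(sessions)


# Hypothetical update types and intervals, in minutes.
print(crawl_schedule({"metadata": 15, "content": 60}, horizon=60))
# [(15, 'metadata'), (30, 'metadata'), (45, 'metadata'), (60, 'content'), (60, 'metadata')]
```

Cheap, frequently changing data gets frequent sessions, while expensive full-content recrawls run less often; each session records only updates of its own type into the index.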
  • Patent number: 8838584
    Abstract: A method for selecting a subset of content sources from a collection of content sources is disclosed. Using a programmed computer, a server retrieves a plurality of sets of documents from the collection of content sources in response to a plurality of queries on a topic from a client. The server enumerates all subsets of the plurality of sets of documents. The server calculates, for each subset, a count of effectiveness of the subset and a price of the subset. The server selects the subset having the highest ratio of count of effectiveness to price. The server delivers the selected subset of the plurality of sets of documents to the client.
    Type: Grant
    Filed: March 29, 2012
    Date of Patent: September 16, 2014
    Assignee: Acquire Media Ventures, Inc.
    Inventors: Lawrence C. Rafsky, Thomas B. Donchez
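The exhaustive selection in patent 8838584 can be sketched with `itertools.combinations`. The abstract does not define "count of effectiveness" or "price", so this sketch assumes effectiveness is the number of distinct documents in a subset and price is the sum of subscription prices of the distinct content sources those documents come from (so sets drawn from the same source share its cost). Both definitions, the names, and the data are hypothetical.

```python
from itertools import combinations


def best_subset(result_sets, source_price):
    """result_sets: dict name -> set of (source, doc_id) pairs.
    source_price: dict source -> positive price.
    Returns the subset of names with the highest effectiveness/price ratio."""
    names = sorted(result_sets)
    best, best_ratio = None, float("-inf")
    # Enumerate every non-empty subset (2^n - 1 of them; only viable for small n).
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            docs = set().union(*(result_sets[n] for n in subset))
            sources = {source for source, _ in docs}
            price = sum(source_price[s] for s in sources)
            ratio = len(docs) / price
            if ratio > best_ratio:
                best, best_ratio = subset, ratio
    return best, best_ratio


result_sets = {
    "q1": {("wire", 1), ("wire", 2)},
    "q2": {("wire", 3), ("wire", 4)},
    "q3": {("wire", 5), ("journal", 6)},
}
print(best_subset(result_sets, {"wire": 2.0, "journal": 3.0}))  # (('q1', 'q2'), 2.0)
```

Under a purely additive price, combining sets could never beat the best single set, which is why this sketch prices by distinct source instead.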
  • Publication number: 20140258262
    Abstract: A method and computer readable medium are described for directing a search engine web crawler's local web browser to refresh the top-level container that is currently displaying the content presented by a remote computer with the new content that a navigational link, within a remote desktop, remote application window, or remote graphical windowing user session, points to. Links can be modified so as to be recognizable by the remote machine as unique from traditional hyperlinks. Upon navigation action on such a link, the client of a remote desktop, remote graphical application window, or remote graphical windowing user session is redirected so that it wholly reloads its computing context with that provided by a destination URL or URI. Such a URL or URI may point to another remote desktop, remote application window, or remote graphical windowing user session.
    Type: Application
    Filed: March 7, 2014
    Publication date: September 11, 2014
    Inventor: Christopher Balz
  • Publication number: 20140258261
    Abstract: A web page identified by a URL stored in a downloads queue is downloaded, and hyperlinks in the downloaded web page are identified. Each hyperlink is screened by parsing the hyperlink (optionally only the URL of the hyperlink) to identify features comprising character strings, computing for each feature values for one or more meta-features indicative of the hyperlinked web page being in a target language, aggregating the meta-feature values to generate a score for the hyperlink, and adding the URL of the hyperlink to the downloads queue conditional upon the score satisfying a screening criterion. The downloading, identifying, and screening are iteratively repeated to perform web crawling, and an index of web pages in the target language is constructed based on analysis of content of the downloaded web pages. The meta-features may include a transliterated target word meta-feature, a language code meta-feature, a country code meta-feature, or so forth.
    Type: Application
    Filed: March 11, 2013
    Publication date: September 11, 2014
    Applicant: Xerox Corporation
    Inventors: Nidhi Singh, Jean-Marc Coursimault, Nicolas Monet, Herve Poirer
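The hyperlink-screening loop in application 20140258261 can be sketched as follows. The URL is split into character-string features, three boolean meta-features (language code, country code, transliterated target word) are computed, their weights are summed, and the URL is queued only if the score meets a threshold. The Hindi-oriented lexicons, weights, and threshold are hypothetical values chosen for illustration, not figures from the application.

```python
import re
from collections import deque
from urllib.parse import urlparse

# Hypothetical meta-feature weights and lexicons for a Hindi-language crawl.
WEIGHTS = {"language_code": 1.0, "country_code": 0.5, "transliterated_word": 2.0}
LANGUAGE_CODES = {"hi"}
COUNTRY_CODES = {"in"}
TRANSLITERATED_WORDS = {"samachar", "khabar"}


def score_url(url):
    """Aggregate meta-feature weights computed from the URL's string features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    text = f"{host} {parsed.path} {parsed.query}".lower()
    features = re.findall(r"[a-z0-9]+", text)
    meta = {
        "language_code": any(f in LANGUAGE_CODES for f in features),
        "country_code": host.rsplit(".", 1)[-1] in COUNTRY_CODES,
        "transliterated_word": any(f in TRANSLITERATED_WORDS for f in features),
    }
    return sum(WEIGHTS[name] for name, fired in meta.items() if fired)


def enqueue_screened(links, queue, seen, threshold=1.0):
    """Add unseen links whose score satisfies the screening criterion."""
    for link in links:
        if link not in seen and score_url(link) >= threshold:
            seen.add(link)
            queue.append(link)


queue, seen = deque(), set()
links = ["https://www.example.in/hi/samachar/today",
         "https://www.example.com/en/news",
         "https://shop.example.in/cart"]
enqueue_screened(links, queue, seen)
print(list(queue))  # ['https://www.example.in/hi/samachar/today']
```

Screening on the URL alone lets the crawler skip downloads of pages unlikely to be in the target language; content analysis for the index still happens after download.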