Category Specific Web Crawling Patents (Class 707/710)
  • Publication number: 20130339337
    Abstract: A method for categorizing content from a website associated with an enterprise company for ranking of said company, said method performed by a computing device having a processing structure; and a memory including instructions executable by said processing structure to cause said processing structure to at least: request a uniform resource locator (URL) associated with the website; validate the URL; create a profile associated with the enterprise company and storing the URL in the memory; automatically crawl the website for content and to create a site index; parse the content to determine the occurrence of a predefined set of keywords pertaining to products and services and business activities of the company, and rank the keywords according to relevance pertaining to at least one category; categorize the website into at least one industry category; and determine whether the website is properly categorized.
    Type: Application
    Filed: May 29, 2013
    Publication date: December 19, 2013
    Inventors: Raad ALKHATEEB, Kumar ERRAMILLI
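    Note: The parse, rank, and categorize steps in the abstract above can be pictured with a minimal sketch. The keyword lists, function names, and the "most keyword hits wins" rule are illustrative assumptions, not details taken from the application.

```python
import re
from collections import Counter

# Hypothetical keyword lists per industry category (assumed for illustration).
CATEGORY_KEYWORDS = {
    "software": {"software", "cloud", "api"},
    "construction": {"concrete", "contractor", "building"},
}

def rank_keywords(text, keywords):
    """Count occurrences of each predefined keyword, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w in keywords)
    return counts.most_common()

def categorize(text, category_keywords=CATEGORY_KEYWORDS):
    """Return the category with the most keyword hits, or None if there are none."""
    totals = {
        cat: sum(n for _, n in rank_keywords(text, kws))
        for cat, kws in category_keywords.items()
    }
    best = max(totals, key=totals.get)
    return best if totals[best] > 0 else None
```

    A crawled page would be passed in as plain text, for example categorize(page_text); the final "properly categorized" check described in the abstract is not modeled here.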
  • Patent number: 8612420
    Abstract: Web crawling configuration includes: obtaining, using one or more computer processors, a webpage comprising a plurality of nodes; presenting the webpage to a user; receiving a user selection of a node in the webpage, the node comprising at least one element; in response to the user selection of the node, presenting a web crawling configuration option pertaining to a web crawling action to be performed with respect to the node, the web crawling configuration option depending at least in part on a type of an element included in the node; receiving a user input specifying the web crawling configuration options pertaining to the web crawling action to be performed with respect to the node; and storing user specified web crawling configuration options, performing the web crawling action on the node according to the user input, or both.
    Type: Grant
    Filed: July 18, 2012
    Date of Patent: December 17, 2013
    Assignee: Alibaba Group Holding Limited
    Inventors: Yiming Sun, Qi Qiang, Boyang Cai, Xiaojun Jin, Zongyuan Wu
  • Patent number: 8600984
    Abstract: An affinity server estimates an affinity between two different time based media events (e.g., TV, radio, social media content stream), between a time based media event and a specific topic, or between two different topics, where the affinity score represents an intersection between the populations of social media users who have authored social media content items regarding the two different events and/or topics. The affinity score represents an estimation of the real world affinity between the real world population of people who have an interest in both time based media events, both topics, or in a time based media event and a topic. One possible threshold for including a social media user in a population may be based on a confidence score that indicates the confidence that one or more social media content items authored by the social media user are relevant to the topic or event in question.
    Type: Grant
    Filed: July 13, 2012
    Date of Patent: December 3, 2013
    Assignee: Bluefin Labs, Inc.
    Inventors: Michael Ben Fleischman, Deb Kumar Roy, Jeremy Rishel, Anjali Midha, Matthew Miller
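    Note: As a rough sketch of the idea in the abstract above, an affinity score can be computed from the overlap of two user populations, with a confidence threshold deciding who belongs to each population. The normalization by the union (a Jaccard ratio), the 0.5 threshold, and all names below are assumptions; the abstract only says the score represents an intersection of the populations.

```python
def population(confidences, threshold=0.5):
    """Users whose authored items are relevant with at least `threshold` confidence."""
    return {user for user, conf in confidences.items() if conf >= threshold}

def affinity(pop_a, pop_b):
    """Overlap of two populations, normalized by their union (assumed Jaccard form)."""
    union = pop_a | pop_b
    return len(pop_a & pop_b) / len(union) if union else 0.0
```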
  • Patent number: 8601594
    Abstract: A method and system for automatically classifying an input form field as designed to hold sensitive information. The method may include selecting an input characteristic associated with the input form field. The method may also include classifying the input form field as designed to hold sensitive information by considering classifying information of other input form fields having the same input characteristic. The method may further include statistically determining whether a similar input form field is indicated as designed to hold sensitive information by at least a predetermined threshold value of the other input fields. A computer program product is also disclosed.
    Type: Grant
    Filed: November 30, 2010
    Date of Patent: December 3, 2013
    Assignee: International Business Machines Corporation
    Inventor: Amir Geva
  • Patent number: 8595718
    Abstract: A computer system in accordance with one or more embodiments of the invention includes one or more data miners configured to mine software deliverables for metadata, a metadata filter configured to generate a filtered view of metadata associated with a subset of the software deliverables, an inventory generator configured to generate an inventory of the subset, a rules manager configured to generate rules using the filtered view and the inventory, where the rules are based on software relationships within the subset, and a package generator configured to generate a knowledge package based on the rules, where the knowledge package includes guidelines for obtaining the subset and installing the subset.
    Type: Grant
    Filed: August 17, 2007
    Date of Patent: November 26, 2013
    Assignee: Oracle America, Inc.
    Inventors: Ilan Naslavsky, Yuval Turgeman
  • Patent number: 8589380
    Abstract: A method and system for discovering a control event from electronically published documents is provided, in which a control program on a computer identifies electronically published documents stored in a plurality of network servers which potentially contain control events relevant to the control of goods and/or services, the control events being identified by reference to a user interest database containing user interest identifiers. Identified documents are analyzed by a classification program to determine whether control events are present, referring to a control event database. A control event classification is assigned to documents determined to contain at least one discovered control event, the assigned control event classification and information identifying the associated document are stored in a classification database, and a report of discovery of documents containing control events is provided to a user.
    Type: Grant
    Filed: December 8, 2009
    Date of Patent: November 19, 2013
    Assignee: Decernis, LLC
    Inventors: Patrick Blackmon Waldo, Andrew B. Waldo
  • Publication number: 20130304721
    Abstract: A computer implemented method for a user of a network to locate one or more human resources, the method comprising the steps of: providing a record in a database for each of a plurality of human resources, the record including one or more keywords associated with the human resource; receiving from a first user a search request including one or more keywords; searching the records in the database to find matching records associated with one or more human resources with a keyword that matches a keyword in the received search request; and returning search results to the first user, the search results identifying the matching records.
    Type: Application
    Filed: April 29, 2013
    Publication date: November 14, 2013
    Inventor: Adnan Fakeih
  • Patent number: 8583685
    Abstract: Providing category information includes: receiving a plurality of search key word sets that were previously input by a plurality of users; obtaining category information corresponding to the plurality of search key word sets; segmenting each of the plurality of search key word sets into search key word units; combining the search key word units into a plurality of search key word unit groups that correspond to a plurality of stages; based at least in part on the category information, determining category information that specifically corresponds to the plurality of search key word unit groups; and based at least in part on category information, establishing a plurality of search key word tables corresponding to the plurality of stages.
    Type: Grant
    Filed: October 27, 2011
    Date of Patent: November 12, 2013
    Assignee: Alibaba Group Holding Limited
    Inventor: Jianping Qian
  • Patent number: 8577866
    Abstract: Methods, systems, and apparatus, including computer program products for identifying original content. In one aspect a method is described that includes deriving a plurality of content pieces from a collection of documents, each content piece occurring in one or more documents in the collection of documents. Each document in the collection of documents is associated with a time and an author. A first document in the collection of documents is identified, the identified first document being the earliest document containing an occurrence of a first piece of content. A first author associated with the first document is ranked based on a number of documents that contain at least one occurrence of the content piece and that are associated with an author other than the first author.
    Type: Grant
    Filed: December 7, 2006
    Date of Patent: November 5, 2013
    Assignee: Google Inc.
    Inventors: Douwe Osinga, Stefan Christoph
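    Note: The ranking signal described in the abstract above can be sketched as follows: find the earliest document containing a content piece, then count the documents by other authors that also contain it. The Document record and the function name are hypothetical, and how content pieces are derived from documents is left out.

```python
from collections import namedtuple

# Hypothetical record: an author, a timestamp, and the set of content pieces found.
Document = namedtuple("Document", ["author", "time", "pieces"])

def first_author_score(documents, piece):
    """Return (first_author, score) for a content piece.

    first_author wrote the earliest document containing the piece; score is
    the number of documents by other authors that also contain it.
    """
    containing = [d for d in documents if piece in d.pieces]
    if not containing:
        return None, 0
    earliest = min(containing, key=lambda d: d.time)
    score = sum(1 for d in containing if d.author != earliest.author)
    return earliest.author, score
```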
  • Patent number: 8577868
    Abstract: A system receives a search query from a user and searches a repository of documents based on the search query to obtain search results. The system provides the search results to the user and automatically bookmarks one or more of the search results without the user explicitly requesting that the one or more search results be bookmarked.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: November 5, 2013
    Assignee: Google Inc.
    Inventors: Oren Zamir, Jeffrey Korn
  • Patent number: 8577867
    Abstract: Information regarding the structure of information in a content database is maintained in a structure database. The structure database is used to correlate the data structure of a query to the structure of the content database, in order to determine that information in the content database which needs to be provided to a searcher in response to the query. In one embodiment, this search method is used in an online forum, and the forum maintains a reputation score for users with respect to given subject matter. The reputation score is dependent upon the quality of a user's participation in the forum. A user's reputation score depends upon the evaluation by others of information he posts and upon the user evaluating information posted by others.
    Type: Grant
    Filed: April 18, 2012
    Date of Patent: November 5, 2013
    Assignee: Transparensee Systems, Inc.
    Inventor: Steven David Lavine
  • Publication number: 20130282691
    Abstract: An optimization engine allows website publishers and other network document publishers to view and navigate statistics and scoring methodologies of a search engine. Publishers may thus gain a better understanding of how their website or network document is scored and how to optimize those documents to increase a search engine score. The user is thus able to navigate the network from the perspective of a search engine, viewing webpages, websites, and links in the same way a search engine would analyze them. Upon making changes to a website or network document, publishers may further request on-demand re-crawling of their website or network document to view changes in the score. Alerts may also be activated by a user to notify the user when certain conditions are met.
    Type: Application
    Filed: March 18, 2013
    Publication date: October 24, 2013
    Applicant: Efficient Systems, LLC.
    Inventors: Scott A. Stouffer, Maura D. Stouffer
  • Publication number: 20130282693
    Abstract: An object oriented search mechanism extracts structural metadata and data based on type of document contents and data sources connected to the documents. Relationships between textual and non-textual elements within documents as well as metadata associated with the elements and data sources are utilized to generate a unified object model with the addition of semantic information derived from metadata and taxonomy, which are used to enhance search indexing, ranking of search results, and dynamic adjustment of result rendering user interface with fine tuned relevancy. Additional data from data sources connected to the documents may also be used to unlock hidden data such as data that has been filtered out in an original document.
    Type: Application
    Filed: June 19, 2013
    Publication date: October 24, 2013
    Inventors: Luming Wang, Xiaohong Yang, Hailei Zhang, Sonal Jain
  • Publication number: 20130282692
    Abstract: Techniques described herein generally relate to real time inference based systems. Example embodiments may set forth devices, methods, and computer programs related to search engine inference based virtual assistance. One example method may include a computing device adapted to receive text as input and a computer processor arranged to determine at least one inference regarding subject matter of the text based on one or more web searches of one or more terms within the text. The inference(s) may then be automatically displayed upon the inference(s) being determined. The text may be automatically received as input from a voice-to-text converter as voice-to-text conversion producing the text is occurring.
    Type: Application
    Filed: June 17, 2013
    Publication date: October 24, 2013
    Inventor: EZEKIEL KRUGLICK
  • Patent number: 8560637
    Abstract: A web server is connected to a terminal computer capable of performing hypertext transfer protocol communications with the web server. The terminal computer includes a browser for displaying information. The web server executes a plurality of web applications upon receiving a request from the terminal computer. The web server transmits messages output by the applications being executed to the terminal computer. The terminal computer displays messages received from the web server collectively in one window of the browser.
    Type: Grant
    Filed: May 31, 2006
    Date of Patent: October 15, 2013
    Assignee: Fujitsu Limited
    Inventors: Naoki Tsukada, Haruo Higashiwaki, Kyoko Sawada
  • Publication number: 20130262429
    Abstract: A method, computer readable medium and system for automatically tracking content in a peer-to-peer environment are disclosed. For example, the method monitors a number of times each content title of a plurality of content titles are downloaded in the peer-to-peer environment, adds one or more content titles of the plurality of content titles that are downloaded above a predetermined threshold to a list, downloads each one of the one or more content titles in the list via the peer-to-peer environment and verifies that each one of the one or more content titles that are downloaded matches at least one content title in the list.
    Type: Application
    Filed: June 3, 2013
    Publication date: October 3, 2013
    Inventors: Alexandre Gerber, Subhabrata Sen, Oliver Spatscheck, Ajay Todimala
  • Patent number: 8548978
    Abstract: A system and method that provides a hosted network video guide application. The guide application is provided as a service to web portals and other websites that wish to expose access to the video content available on a public network such as the Internet. The operation of the guide includes mechanisms for search application hosting and processes for content gathering. Video index information can be derived from random content owners, guide affiliates, proactively gathered public domain content, and proactively harvested video content from the network via a video spidering mechanism. The video index information can be collected and maintained in a hosted, centralized repository and made available via an application interface, which can be customized, to users of the network. The video spidering mechanism generates an index of each accessed video, and the index is committed to the guide repository along with the URL information of the video being indexed.
    Type: Grant
    Filed: August 21, 2007
    Date of Patent: October 1, 2013
    Assignee: Virage, Inc.
    Inventors: Owen Lynn, Richard Humphrey, Dale Thoms
  • Publication number: 20130246389
    Abstract: A database of user preference information is extracted and compiled from multiple websites by web-crawling robots without cooperation or specific participation by users. Users who interact with a website are frequently required to register and create a login or userID name that uniquely identifies them. Thereafter, when an individual rates an item, it is often recorded and published under their userID name such that other users can see how a specific individual rated the item. Although there is no requirement that a specific user register on different websites utilizing the identical userID, it is extremely common that this practice occurs and the use of identical userIDs on multiple sites is used herein to expand preference analysis beyond a single site. Once the database exists, users can request or be passively offered suggestions that result from preference associations across multiple websites as performed by a preference analysis and suggestion function.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 19, 2013
    Inventor: Robert Osann, JR.
  • Publication number: 20130246336
    Abstract: A method is provided in one example and includes crawling a storage location of a network environment to identify objects, fetching the identified objects, creating indexes corresponding to the identified objects, and classifying one or more objects of the identified objects based on a first category. The method further includes providing first sets of metadata elements and corresponding first category information representing the classified one or more objects of the identified objects, searching the indexes for a selected group of the classified one or more objects of the identified objects, and classifying one or more objects of the selected group based on a second category. In more specific embodiments, the method includes applying a remediation policy to the classified one or more objects of the selected group. In other more specific embodiments, the method includes registering the classified one or more objects of the selected group.
    Type: Application
    Filed: December 27, 2011
    Publication date: September 19, 2013
    Inventors: Ratinder Paul Singh Ahuja, Bimalesh Jha, Nitin Maini, Sujata Patel, Ankit R. Jain, Damodar K. Hegde, Rajaram V. Nanganure, Avinash Vishnu Pawar
  • Patent number: 8539329
    Abstract: Methods for configuring website categorization software, categorizing websites and a method and system for controlling access to websites. A number of websites are selected, all of which relate to a single predetermined category of subject matter. In order to create a category profile, a website is selected from the set (3), the website markup language is read (5), page content information extracted (7) and then analyzed (9). The system may then check whether it has analyzed a sufficient number of websites to allow for a reliable categorization of subsequent websites (13). Individual websites are categorized by extracting their page content information (45) and categorizing (51) on the basis of the degree of similarity between the information and the category profile (55). To control access, the system compares a website identifier with the database of categorized identifiers.
    Type: Grant
    Filed: November 1, 2007
    Date of Patent: September 17, 2013
    Assignee: Bloxx Limited
    Inventor: James Wilson
  • Publication number: 20130238593
    Abstract: Systems and methods for providing an enterprise crawl and search framework, including features such as use with middleware and enterprise application environments, pluggable security, search development tools, user interfaces, and governance. In accordance with an embodiment, the system includes an enterprise crawl and search framework which abstracts an underlying search engine, provides a common set of application programming interfaces for developing search functionalities, and allows the framework to serve as an integration layer between one or more enterprise search engine and one or more enterprise application. A plurality of searchable objects which are sets of data derived from enterprise applications are used to make view objects available for full text search.
    Type: Application
    Filed: January 2, 2013
    Publication date: September 12, 2013
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: DJ Vasant Ursal, Tulasi Kodali
  • Publication number: 20130238594
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying entities that are related to an entity to which a search query is directed. One of the methods includes receiving a search query, wherein the search query has been determined to relate to a first entity of a first entity type, and wherein one or more entities of a second entity type have a relationship with the first entity; receiving search results for the search query; determining that a count of search results identifying a resource containing a reference to the first entity satisfies a first threshold value; determining that a count of search results identifying a resource having the second entity type as a relevant entity type satisfies a second threshold value; and transmitting information identifying the one or more entities of the second entity type as part of the response to the search query.
    Type: Application
    Filed: February 22, 2013
    Publication date: September 12, 2013
    Inventors: Peter Jin Hong, Pravir K. Gupta, Nathaniel J. Gaylinn, Ramakrishnan Kazhiyur-Mannar, Kavi J. Goel, Omer Bar-or, Jack W. Menzel, Christina R. Dhanaraj, Jared L. Levy, Shashidhar A. Thakur, Grace Chung, Benson Tsai
  • Patent number: 8533175
    Abstract: Collections of music and other items, related by time, location, genre, and artist, are registered in a data model to provide a foundation for their curatorship, discovery, and procurement. A series of choices, where a choice is a combination of time, place, genre, and artist, represents a map through the history and culture of music. Both expert and regular individual curators define the maps. Animated murals depicting a fundamental combination of time, place, genre, and artist provide a user interface for the navigation of music, its history, and culture. Integration with hand held GPS enabled devices provides users with knowledge of music events and history relative to their present location. A network view presents the user with an interactive diagram of connections between elements in the tunesmap database used as a dynamic filter construction device.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: September 10, 2013
    Inventor: Gilbert Marquard Roswell
  • Patent number: 8527495
    Abstract: A plug-in interface is provided in a crawling search engine. Plug-in parsers are also provided for use with the search engine. The plug-in interface allows the search engine to be configured with different plug-in parsers. Thus, a customer may configure a search engine with a parser that best suits the needs of the customer and to try new parsing algorithms to find the best results.
    Type: Grant
    Filed: February 19, 2002
    Date of Patent: September 3, 2013
    Assignee: International Business Machines Corporation
    Inventor: Richard J. Redpath
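    Note: One way to picture a plug-in parser interface like the one in the abstract above is a registry that maps parser names to interchangeable parsing functions. The class and method names are assumptions for illustration and do not come from the patent.

```python
from typing import Callable, Dict, List

# A parser plug-in is modeled as any function from document text to index terms.
Parser = Callable[[str], List[str]]

class CrawlingSearchEngine:
    """Toy engine whose parsing step is supplied by a named plug-in."""

    def __init__(self) -> None:
        self._parsers: Dict[str, Parser] = {}

    def register_parser(self, name: str, parser: Parser) -> None:
        """Install or replace a parser plug-in under the given name."""
        self._parsers[name] = parser

    def parse(self, text: str, parser_name: str) -> List[str]:
        """Run the selected plug-in on fetched text and return its terms."""
        return self._parsers[parser_name](text)
```

    Swapping parsers then amounts to registering a different function under a new name and selecting it per crawl.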
  • Patent number: 8515932
    Abstract: The invention comprises systems, methods and a computerized data management device for creating and using data relating to a medical or non-medical product or device to enhance the safety of the product or device. A vast amount of data regarding adverse events associated with a particular product or device is analyzed to identify new essential adverse events associated with the product or device. At least one database of new essential adverse event information is created and utilized, and new characteristics of or uses for the product or device related to the new essential adverse event information are determined. Adverse event information is gathered for a large number of population sub-groups. The system may also be programmed to incorporate the information into intellectual property and contract documents.
    Type: Grant
    Filed: June 9, 2011
    Date of Patent: August 20, 2013
    Assignee: Classen Immunotherapies, Inc.
    Inventor: John Barthelow Classen
  • Patent number: 8515938
    Abstract: An information processing system including a client capable of receiving and reproducing content from a media server, and a collecting server for receiving content management information on the content from the media server and managing the content management information.
    Type: Grant
    Filed: May 6, 2008
    Date of Patent: August 20, 2013
    Assignee: Sony Corporation
    Inventors: Toshiaki Kusakabe, Satoshi Hiroi, Masahiro Hara
  • Patent number: 8516554
    Abstract: A Dynamic Web Service server may facilitate custom Enterprise Application interface development with little or no developer input by dynamically creating a web service for performing a particular transaction according to a transaction map. An Enterprise Application client device may create a transaction map by “recording” a transaction between an Enterprise Application client and an Enterprise Application server and mapping transaction fields to a custom interface generated to collect data for re-performing the recorded transaction. The Enterprise Application client device may call the dynamic web service, and the Dynamic Web Service server may then perform the recorded transaction using input data collected in the custom interface.
    Type: Grant
    Filed: November 1, 2012
    Date of Patent: August 20, 2013
    Assignee: Winshuttle, LLC
    Inventors: Vishal Chalana, Amit Sharma, Piyush Nagar, Vishal Sharma, Vikram Chalana
  • Patent number: 8510289
    Abstract: A system processes user queries. The system may generate a list of query patterns of a first type. The system may also receive a user query and determine whether the received query is a query of the first type based at least in part on the list of query patterns.
    Type: Grant
    Filed: October 20, 2011
    Date of Patent: August 13, 2013
    Assignee: Google Inc.
    Inventors: Amit Singhal, Matt Cutts, Jun Wu
  • Publication number: 20130204860
    Abstract: The statistics from a reference page serve as a seed for comparing the selected page statistics with those of other webpages. The statistics of all results can be graphically displayed, if desired, in a display or popup window. These results can be analyzed to determine a category so that an appropriate search expression or a statistical mask can be developed. In addition, the statistics of several pages can be compared and the results analyzed for search term commonality. This step determines how strongly the scanned data content of two different webpages is tied together. These results can be analyzed against each other to generate common search terms, a final histogram, and a comparison of this histogram with the reference histogram. The search expression term can be a Boolean expression or a statistical mask. The statistical mask is used as a seed to start another search moving closer to the final target or desired goal.
    Type: Application
    Filed: February 3, 2012
    Publication date: August 8, 2013
    Applicant: TrueMaps LLC
    Inventor: Thaddeus John Gabara
  • Patent number: 8504550
    Abstract: Systems and methods of identifying and categorizing social network messages that are relevant to selected categories and text terms are provided. The frequency of text terms appearing in social network messages are calculated for multiple categories. Based on the calculated text term frequency, social network messages can be identified and/or categorized that match a provided set of text terms. Selecting and/or associating text terms and categories are determined by repeatedly analyzing social network messages.
    Type: Grant
    Filed: May 17, 2010
    Date of Patent: August 6, 2013
    Assignee: CitizenNet Inc.
    Inventors: Michael Aaron Hall, Daniel Benyamin, Aaron Chu
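    Note: The term-frequency approach in the abstract above can be sketched by counting terms per category over labeled messages and then scoring a new message against those counts. The tokenizer, the summed-count score, and the function names are assumptions, and the repeated re-analysis the abstract mentions is not modeled.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def term_frequencies(labeled_messages):
    """Build per-category term counts from (category, message) pairs."""
    freqs = defaultdict(Counter)
    for category, message in labeled_messages:
        freqs[category].update(tokenize(message))
    return freqs

def categorize_message(message, freqs):
    """Return the category whose term counts best cover the message, or None."""
    terms = tokenize(message)
    scores = {cat: sum(counts[t] for t in terms) for cat, counts in freqs.items()}
    if not scores or max(scores.values()) == 0:
        return None
    return max(scores, key=scores.get)
```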
  • Patent number: 8504551
    Abstract: Advertisers are permitted to put targeted ads on a page on the web (or some other document of any media type). The present invention may do so by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to content, and/or (iii) combining content with ads determined to be relevant to the content.
    Type: Grant
    Filed: April 11, 2011
    Date of Patent: August 6, 2013
    Assignee: Google Inc.
    Inventors: Darrell Anderson, Paul Buchheit, Alexander Paul Carobus, Yingwei Cui, Jeffrey A. Dean, Georges R. Harik, Deepak Jindal, Narayanan Shivakumar
  • Publication number: 20130198162
    Abstract: A plurality of methods for searching one or more business entities utilizing a web service and a browser plug-in application based on one or more keywords, a domain name or a user's geographic location and displaying the relevant business entities. The one or more keywords will be based on selected text entered into a web service and a browser plug-in application. The search process is initiated based on selecting one or more keywords from a text, double-clicking on the one or more keywords, selecting a right-click context menu option and clicking on a browser extension button. The server receives the request and carries out a search in a database in communication with the web service. The search is based on which relevant business entities are found and a result is created, which is sent back to the browser, where the received search results are displayed in a browser pop-up window.
    Type: Application
    Filed: January 30, 2013
    Publication date: August 1, 2013
    Inventor: Rasmus Refer
  • Patent number: 8498978
    Abstract: Slideshow video file detection. A method includes receiving a search query for video files of a desired type. A portion of a video file is extracted. A frame difference based histogram and an active pixel based histogram are generated for the portion. Further, the frame difference based histogram and an active pixel based histogram are provided to a machine learning tool. An indicator is determined for the portion based on a plurality of parameters. The video file is classified as the desired type based on the indicator. The video file is provided to the user.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: July 30, 2013
    Assignee: Yahoo! Inc.
    Inventors: Venkatesh Babu Radhakrishnan, Srinivasan H. Sengamedu
  • Publication number: 20130191366
    Abstract: A pattern matching engine and associated method for detecting one or more of headers, footers, watermarks, page numbering, page colors, and page borders appearing in a fixed format document. The pattern matching engine performs pattern matching across pages of the fixed format document to identify repeating patterns. Using heuristic analysis, repeating patterns meeting selected criteria are classified as headers, footers, or watermarks. Filtering removes repeating patterns unlikely to represent headers, footers, or watermarks. The information produced by the pattern matching engine allows the repeating elements to be properly reconstructed as flowable elements when converting a fixed format document into a flow format document.
    Type: Application
    Filed: January 23, 2012
    Publication date: July 25, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Vuk Jovanovic, Milos Lazarevic, Milos Raskovic, Nenad Bozidarevic, Milan Sesum
  • Patent number: 8495049
    Abstract: A system and a method for automatically submitting Web pages to a search engine, which is preferably used for submitting dynamic Web pages, but may optionally be used for any type of Web page. The present invention features a gateway server for providing these Web pages to the search engine, either directly or optionally through an autonomous software search program. Optionally and more preferably, the gateway server modifies the Web page before serving it to the autonomous software search program and/or search engine.
    Type: Grant
    Filed: October 4, 2010
    Date of Patent: July 23, 2013
    Assignee: Microsoft Corporation
    Inventors: Yaron Galai, Oded Itzhak
  • Patent number: 8489578
    Abstract: A method, system, and article are provided for management of a data ingester and associated content collected by the data ingester. The computer system is configured with a taxonomy together with rules and policies for ingesting and classifying the collected data. Based upon the classification of the collected data with respect to the taxonomy, the data is assigned to a location in the taxonomy.
    Type: Grant
    Filed: October 20, 2008
    Date of Patent: July 16, 2013
    Assignee: International Business Machines Corporation
    Inventors: Varun Bhagwan, Rajesh M. Desai, Piyoosh Jalan
  • Publication number: 20130179423
    Abstract: A computer-automated method and system of providing a searchable knowledge base with decision-relevant attributes (including some subjective or sentiment-based attributes) for a plurality of individual items within a choice set are described. First, information (including texts) relevant to the plurality of individual items in the choice set is harvested from Internet sources. Next, normalized representations of statements are extracted from excerpts of the harvested texts that pertain to attributes of interest for the choice set, and corresponding scores for the attributes are derived from each of the normalized representations. The scores derived from the various harvested sources are aggregated for each attribute of each item. Finally, the knowledge base of the plurality of individual topics is generated.
    Type: Application
    Filed: January 5, 2012
    Publication date: July 11, 2013
    Applicant: SRI International
    Inventors: Nadav Gur, David Israel, Imri Goldberg
  • Publication number: 20130179425
    Abstract: A program search apparatus and method using a related keyword is provided. The program search apparatus may include an interface to extract a search keyword from a program search request, in response to the program search request being received, and a processor to obtain a related keyword with respect to the search keyword, using the extracted search keyword, to search a database for first program information using the obtained related keyword, and to provide found first program information.
    Type: Application
    Filed: January 4, 2013
    Publication date: July 11, 2013
    Applicant: Electronics and Telecommunications Research Institute
    Inventor: Electronics and Telecommunications Research Institute
  • Publication number: 20130179426
    Abstract: Systems and methods of identifying and retrieving messages that satisfy a search query using the context of the message and term frequencies are provided. One embodiment includes identifying at least one category relevant to the search query, wherein a plurality of scored keywords are associated with each category, selecting at least one of the scored keywords that is relevant to an identified category, performing a plurality of searches of messages from a social networking messaging service to retrieve messages, where at least one search includes retrieving messages based on the original search query and one of the selected scored keywords, scoring the retrieved messages with respect to each of the at least one identified categories using at least the scored keywords relevant to each category, and returning at least the message with the highest score as the search result.
    Type: Application
    Filed: January 14, 2013
    Publication date: July 11, 2013
    Applicant: CitizenNet Inc.
    Inventor: CitizenNet Inc.
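As a rough illustration of scoring retrieved messages against a category's scored keywords, the sketch below sums the weights of the keywords that appear in each message and returns the highest-scoring one. The weighting scheme, the tokenizer, and the names `score_message` and `best_message` are assumptions; the abstract does not specify how scores are computed.

```python
import re
from typing import Dict, List


def score_message(message: str, scored_keywords: Dict[str, float]) -> float:
    """Sum the weights of the category keywords present in the message."""
    words = set(re.findall(r"[a-z0-9']+", message.lower()))
    return sum(weight for kw, weight in scored_keywords.items() if kw in words)


def best_message(messages: List[str], scored_keywords: Dict[str, float]) -> str:
    """Return the message with the highest keyword score for one category."""
    return max(messages, key=lambda m: score_message(m, scored_keywords))
```

With several categories, the same function could be called once per category and the per-category scores compared or combined.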
  • Publication number: 20130179424
    Abstract: Methods, systems and computer-readable storage medium for determining a crawling schedule. In an aspect, a method includes obtaining crawl history data for a Web site having Web pages, determining a status of the Web pages, determining a total quantity of Web pages that have a status of deleted, calculating a probability that another Web page of the Web site will be removed based on the total quantity, and storing data associating the calculated probability with the Web site. The method can further include determining, for a plurality of sets of the previous time periods, a respective crawl penalty as a combination of a penalty for crawling the Web site and a penalty for showing a deleted Web page based on the calculated probability, and determining a re-crawl schedule based on the crawl penalties.
    Type: Application
    Filed: January 11, 2012
    Publication date: July 11, 2013
    Inventors: Cheng Xu, Qiying Lin, Xin Li
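The trade-off in the abstract above, between the cost of crawling and the cost of showing a deleted page, can be made concrete with a small sketch. The formulas are assumptions chosen for illustration, not the claimed method: the deletion probability is taken as the share of crawled pages marked deleted, and the per-day penalty is a crawl cost amortized over the interval plus an expected stale-page cost that grows with the interval. The cost constants are hypothetical.

```python
from typing import Iterable, Sequence


def deletion_probability(statuses: Sequence[str]) -> float:
    """Share of previously crawled pages whose status is 'deleted'."""
    if not statuses:
        return 0.0
    return sum(1 for s in statuses if s == "deleted") / len(statuses)


def crawl_penalty(
    interval_days: int,
    p_deleted: float,
    crawl_cost: float = 1.0,
    stale_cost: float = 10.0,
) -> float:
    """Assumed per-day penalty: amortized crawl cost plus expected stale cost."""
    return crawl_cost / interval_days + stale_cost * p_deleted * interval_days


def best_interval(candidates: Iterable[int], p_deleted: float) -> int:
    """Pick the candidate re-crawl interval with the lowest combined penalty."""
    return min(candidates, key=lambda days: crawl_penalty(days, p_deleted))
```

In this toy model a site that deletes pages often is pushed toward short intervals, while a stable site is crawled less frequently.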
  • Patent number: 8484194
    Abstract: A training set generator may be configured to input a taxonomy including a hierarchy of categories and a plurality of top-level sites, and to output a training set of categorized data. The training set generator may include a crawler configured to crawl each of the top-level sites to determine at least one lower-level site associated therewith and to store the top-level sites and associated lower-level sites as crawl data. The training set generator also may include an extractor configured to determine, for each of the top-level sites, a corresponding site-specific extraction template associating at least one portion of the corresponding top-level site with at least one category of the hierarchy of categories, and further configured to apply each site-specific extraction template to corresponding crawl data to thereby associate the crawl data with the categories of the hierarchical categories and obtain categorized data of the training set.
    Type: Grant
    Filed: January 13, 2012
    Date of Patent: July 9, 2013
    Assignee: Google Inc.
    Inventors: Philo Juang, Christopher Testa, Nicolaus Mote
  • Patent number: 8473470
    Abstract: A software program and associated web-based portal are provided for industry-specific product comparison. The program and associated web portal allow the user to search multiple manufacturers' catalogs and to enter a query based upon customized search criteria. Query results are returned of products that satisfy the user's search criteria. The query is made available to manufacturers whose products are identified in the query results and a communication link is provided whereby such manufacturers can contact the user to discuss the product identified in the search. The user can respond using the message board associated with the web portal. The program and portal can also integrate updates to pump manufacturers' catalogs and can also produce best-fit solutions for users' design criteria.
    Type: Grant
    Filed: May 23, 2005
    Date of Patent: June 25, 2013
    Assignee: Bentley Systems, Incorporated
    Inventors: Jack S. Cook, Jr., Diego Alexander Diaz Pabon, Benjamin John Ewing
  • Patent number: 8473473
    Abstract: An object oriented search mechanism extracts structural metadata and data based on type of document contents and data sources connected to the documents. Relationships between textual and non-textual elements within documents as well as metadata associated with the elements and data sources are utilized to generate a unified object model with the addition of semantic information derived from metadata and taxonomy, which are used to enhance search indexing, ranking of search results, and dynamic adjustment of result rendering user interface with fine tuned relevancy. Additional data from data sources connected to the documents may also be used to unlock hidden data such as data that has been filtered out in an original document.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: June 25, 2013
    Assignee: Microsoft Corporation
    Inventors: Luming Wang, Xiaohong Yang, Hailei Zhang, Sonal Jain
  • Publication number: 20130151500
    Abstract: A search query is received. Personal information for a user is then determined. A search is performed in a general subdomain of general content using the search query. For example, the general subdomain of general content may be a WWW search. Then, a vertical subdomain is determined based on the personal information. A search is then performed in the vertical subdomain of specialized content using the search query. The search performed in the general subdomain and the search performed in the vertical subdomain generate general search results and vertical search results. The results may be combined and outputted to a client.
    Type: Application
    Filed: November 11, 2012
    Publication date: June 13, 2013
    Applicant: YAHOO! INC.
    Inventor: YAHOO! INC.
  • Publication number: 20130144861
    Abstract: An Internet infrastructure contains a search server that delivers search result pages of web sites to client devices based upon a search string. Maxima categories are provided that sort search results or web pages based upon popularity and/or context similarity. A web browser contained within a client device is coupled to display various search result pages of web sites delivered by the search server. A maxima determination module within the search server responds to the delivery of the initial search string by first categorizing the search results' applicability to the search string on the basis of maxima or by generating maxima categories with search results contained therein that correlate to the search string. The search results within each applicable maximum are then sorted on the basis of popularity within each of the maxima categories to effectuate popularity ranks for each search result or web page.
    Type: Application
    Filed: January 29, 2013
    Publication date: June 6, 2013
    Applicant: ENPULZ, L.L.C.
    Inventor: James D. Bennett
  • Publication number: 20130144862
    Abstract: Systems and methods for clustering user reviews are disclosed in which a plurality of user reviews are extracted from electronic documents. The electronic documents contain user reviews of a plurality of items of interest. A set of user reviews is identified in the plurality of user reviews as being associated with the same item of interest in the plurality of items of interest. Item identifying information included in the electronic documents is used for this identification. The set of user reviews is then associated with the same item of interest. Examples of item identifying information include unique product identifiers, brand names, model numbers, and category information. In some instances, the item identifying information is extracted from metadata included in the electronic document. In some instances, the electronic documents are obtained from e-commerce websites or product-review websites.
    Type: Application
    Filed: January 30, 2013
    Publication date: June 6, 2013
    Inventors: Jan Matthias Ruhl, Mayur D. Datar
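A minimal sketch of grouping reviews by item identifying information follows. It assumes each extracted review is a dictionary with optional `product_id`, `brand`, and `model` fields plus a `text` field; these field names and the fallback rule are hypothetical, and real identifying information would need far more careful matching.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def item_key(review: Dict[str, str]) -> Tuple[str, ...]:
    """Prefer a unique product identifier; fall back to brand plus model."""
    if review.get("product_id"):
        return ("id", review["product_id"])
    return (
        "brand_model",
        review.get("brand", "").lower(),
        review.get("model", "").lower(),
    )


def cluster_reviews(reviews: List[Dict[str, str]]) -> Dict[Tuple[str, ...], List[str]]:
    """Group review texts that refer to the same item of interest."""
    clusters: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for review in reviews:
        clusters[item_key(review)].append(review["text"])
    return dict(clusters)
```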
  • Publication number: 20130132366
    Abstract: A domain classifier develops and maintains relevance data about specific domains based on historical relevance data and source context data. Such data may be used to classify the user's interest in attempting to visit a specific domain and thereby redirect the user to a website expected to be aligned with the user's interest. In various implementations, the historical relevance data is derived from source context data and/or post-visit user behavior collected from previous attempts to visit a specific domain. The source context data collected from the current visit may also be used as source context factors to influence domain classification. Based on such historical and current source context factors, as well as the domain address provided in the user's navigation request, a domain classifier consults the historical relevance data and provides the user with Web content that is identified as likely to be relevant to the user's interests.
    Type: Application
    Filed: January 22, 2013
    Publication date: May 23, 2013
    Applicant: Working Research Inc.
    Inventor: Keith Merle Pieper
  • Publication number: 20130132365
    Abstract: An extensible offer inventory database of offers in a domain is established. Further, an offer ontology is generated based on the extensible offer inventory database. The offer ontology provides an extensible vocabulary that correlates to categories in the offer inventory database. In addition, offers are automatically located. The offers are also semantically analyzed to generate semantic analysis data. Further, user data is obtained. In addition, an optimal offer match is automatically determined based upon the semantic analysis data and the user data.
    Type: Application
    Filed: June 3, 2011
    Publication date: May 23, 2013
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventors: Walter Chang, Geoff Baum
  • Publication number: 20130117190
    Abstract: A system, apparatus, and method are provided for the copyright infringement lifecycle. Content to be added to a catalogue is received and stored in a database. Material that is the same as or similar to the content is automatically searched for by deriving and using keywords indicative of the material, and a user is notified of the material that is the same as or similar to the content. Infringing activity is monitored and content owners' rights are enforced via automatic dispatch of electronic notifications. Defendant pools are built by cross-referencing the physical location of infringing activity and jurisdictions where attorneys are admitted to practice.
    Type: Application
    Filed: November 7, 2011
    Publication date: May 9, 2013
    Applicant: Singularis, Inc.
    Inventor: Emanuel I. Wald
  • Patent number: 8438466
    Abstract: In one embodiment, on a first date, a computer system receives a watch list term that is specified by a user. The computer system receives an electronic version of a paper on a second date after the first date, and the computer system searches the electronic version for the watch list term, without the user specifying the watch list term after the first date, and without the user initiating the searching after the first date. The computer system outputs a result of the searching for display on a display device.
    Type: Grant
    Filed: March 22, 2006
    Date of Patent: May 7, 2013
    Assignee: Libredigital, Inc.
    Inventors: Tracey L. Jones, Billy P. Taylor, Frank H. Moeller