Category Specific Web Crawling Patents (Class 707/710)
-
Publication number: 20130339337
Abstract: A method for categorizing content from a website associated with an enterprise company for ranking of said company, said method performed by a computing device having a processing structure; and a memory including instructions executable by said processing structure to cause said processing structure to at least: request a uniform resource locator (URL) associated with the website; validate the URL; create a profile associated with the enterprise company and storing the URL in the memory; automatically crawl the website for content and to create a site index; parse the content to determine the occurrence of a predefined set of keywords pertaining to products and services and business activities of the company, and rank the keywords according to relevance pertaining to at least one category; categorize the website into at least one industry category; and determine whether the website is properly categorized.
Type: Application
Filed: May 29, 2013
Publication date: December 19, 2013
Inventors: Raad ALKHATEEB, Kumar ERRAMILLI
-
Patent number: 8612420
Abstract: Web crawling configuration includes: obtaining, using one or more computer processors, a webpage comprising a plurality of nodes; presenting the webpage to a user; receiving a user selection of a node in the webpage, the node comprising at least one element; in response to the user selection of the node, presenting a web crawling configuration option pertaining to a web crawling action to be performed with respect to the node, the web crawling configuration option depending at least in part on a type of an element included in the node; receiving a user input specifying the web crawling configuration options pertaining to the web crawling action to be performed with respect to the node; and storing user specified web crawling configuration options, performing the web crawling action on the node according to the user input, or both.
Type: Grant
Filed: July 18, 2012
Date of Patent: December 17, 2013
Assignee: Alibaba Group Holding Limited
Inventors: Yiming Sun, Qi Qiang, Boyang Cai, Xiaojun Jin, Zongyuan Wu
-
Patent number: 8600984
Abstract: An affinity server estimates an affinity between two different time based media events (e.g., TV, radio, social media content stream), between a time based media event and a specific topic, or between two different topics, where the affinity score represents an intersection between the populations of social media users who have authored social media content items regarding the two different events and/or topics. The affinity score represents an estimation of the real world affinity between the real world population of people who have an interest in both time based media events, both topics, or in a time based media event and a topic. One possible threshold for including a social media user in a population may be based on a confidence score that indicates the confidence that one or more social media content items authored by the social media user are relevant to the topic or event in question.
Type: Grant
Filed: July 13, 2012
Date of Patent: December 3, 2013
Assignee: Bluefin Labs, Inc.
Inventors: Michael Ben Fleischman, Deb Kumar Roy, Jeremy Rishel, Anjali Midha, Matthew Miller
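The population-intersection idea in this abstract can be illustrated with a short sketch. It is not the patented method: the function names, the 0.5 confidence threshold, and the choice to normalize the intersection by the union (a Jaccard ratio) are all assumptions made for illustration.

```python
def population(confidences: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Users whose content is confidently relevant to an event or topic.

    `confidences` maps a user id to a relevance confidence score; the
    threshold value is an assumption, not taken from the abstract.
    """
    return {user for user, score in confidences.items() if score >= threshold}


def affinity(pop_a: set[str], pop_b: set[str]) -> float:
    """Overlap of two author populations, normalized by their union.

    The abstract only says the score "represents an intersection"; dividing
    by the union is one hypothetical normalization.
    """
    union = pop_a | pop_b
    if not union:
        return 0.0
    return len(pop_a & pop_b) / len(union)


# Hypothetical confidence scores for one TV event and one topic.
event_authors = population({"ann": 0.9, "bob": 0.4, "cy": 0.8})
topic_authors = population({"ann": 0.7, "cy": 0.2, "dee": 0.95})
score = affinity(event_authors, topic_authors)
```

Here only "ann" passes the threshold for both the event and the topic, so the score is one shared author out of three distinct authors.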
-
Patent number: 8601594
Abstract: A method and system for automatically classifying an input form field as designed to hold sensitive information. The method may include selecting an input characteristic associated with the input form field. The method may also include classifying the input form field as designed to hold sensitive information by considering classifying information of other input form fields having the same input characteristic. The method may further include statistically determining whether a similar input form field is indicated as designed to hold sensitive information by at least a predetermined threshold value of the other input fields. A computer program product is also disclosed.
Type: Grant
Filed: November 30, 2010
Date of Patent: December 3, 2013
Assignee: International Business Machines Corporation
Inventor: Amir Geva
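A minimal sketch of the threshold vote described above, assuming each previously classified field is reduced to a pair of (input characteristic, sensitive flag). The function name, the string encoding of a characteristic, and the 0.6 threshold are hypothetical.

```python
def is_sensitive(characteristic: str,
                 labeled_fields: list[tuple[str, bool]],
                 threshold: float = 0.6) -> bool:
    """Classify a field by the labels of other fields sharing its characteristic.

    Returns True when at least `threshold` of the already labeled fields with
    the same characteristic are marked sensitive. With no comparable fields,
    this sketch defaults to False (an assumption).
    """
    votes = [sensitive for ch, sensitive in labeled_fields if ch == characteristic]
    if not votes:
        return False
    return sum(votes) / len(votes) >= threshold


# Hypothetical labeled fields gathered from other forms.
known = [
    ("name=ssn", True),
    ("name=ssn", True),
    ("name=ssn", False),
    ("name=city", False),
]
```

With these labels, two of the three "name=ssn" fields are sensitive, which clears the 0.6 threshold.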
-
Patent number: 8595718
Abstract: A computer system in accordance with one or more embodiments of the invention includes one or more data miners configured to mine software deliverables for metadata, a metadata filter configured to generate a filtered view of metadata associated with a subset of the software deliverables, an inventory generator configured to generate an inventory of the subset, a rules manager configured to generate rules using the filtered view and the inventory, where the rules are based on software relationships within the subset, and a package generator configured to generate a knowledge package based on the rules, where the knowledge package includes guidelines for obtaining the subset and installing the subset.
Type: Grant
Filed: August 17, 2007
Date of Patent: November 26, 2013
Assignee: Oracle America, Inc.
Inventors: Ilan Naslavsky, Yuval Turgeman
-
Apparatus and method for the automatic discovery of control events from the publication of documents
Patent number: 8589380
Abstract: A method and system for discovering a control event from electronically published documents is provided, in which a control program on a computer identifies electronically published documents stored in a plurality of network servers which potentially contain control events relevant to the control of goods and/or services, the control events being identified by reference to a user interest database containing user interest identifiers. Identified documents are analyzed by a classification program to determine whether control events are present, referring to a control event database. A control event classification is assigned to documents determined to contain at least one discovered control event, the assigned control event classification and information identifying the associated document is stored in a classification database, and a report of discovery of documents containing control events is provided to a user.
Type: Grant
Filed: December 8, 2009
Date of Patent: November 19, 2013
Assignee: Decernis, LLC
Inventors: Patrick Blackmon Waldo, Andrew B. Waldo
-
Publication number: 20130304721
Abstract: A computer implemented method for a user of a network to locate one or more human resources, the method comprising the steps of: providing a record in a database for each of a plurality of human resources, the record including one or more keywords associated with the human resource; receiving from a first user a search request including one or more keywords; searching the records in the database to find matching records associated with one or more human resources with a keyword that matches a keyword in the received search request; and returning search results to the first user, the search results identifying the matching records.
Type: Application
Filed: April 29, 2013
Publication date: November 14, 2013
Inventor: Adnan Fakeih
-
Patent number: 8583685
Abstract: Providing category information includes: receiving a plurality of search key word sets that were previously input by a plurality of users; obtaining category information corresponding to the plurality of search key word sets; segmenting each of the plurality of search key word sets into search key word units; combining the search key word units into a plurality of search key word unit groups that correspond to a plurality of stages; based at least in part on the category information, determining category information that specifically corresponds to the plurality of search key word unit groups; and based at least in part on category information, establishing a plurality of search key word tables corresponding to the plurality of stages.
Type: Grant
Filed: October 27, 2011
Date of Patent: November 12, 2013
Assignee: Alibaba Group Holding Limited
Inventor: Jianping Qian
-
Patent number: 8577866
Abstract: Methods, systems, and apparatus, including computer program products for identifying original content. In one aspect a method is described that includes deriving a plurality of content pieces from a collection of documents, each content piece occurring in one or more documents in the collection of documents. Each document in the collection of documents is associated with a time and an author. A first document in the collection of documents is identified, the identified first document being the earliest document containing an occurrence of a first piece of content. A first author associated with the first document is ranked based on a number of documents that contain at least one occurrence of the content piece and that are associated with an author other than the first author.
Type: Grant
Filed: December 7, 2006
Date of Patent: November 5, 2013
Assignee: Google Inc.
Inventors: Douwe Osinga, Stefan Christoph
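The ranking step in this abstract can be sketched as follows. The `Doc` record, the integer timestamps, and the decision to sum counts across all content pieces an author originated are assumptions for illustration, not details from the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class Doc(NamedTuple):
    author: str
    time: int            # any sortable timestamp; an int is used for brevity
    pieces: frozenset    # content pieces (e.g. quoted passages) in the document


def originality_scores(docs: list[Doc]) -> dict[str, int]:
    """Credit the earliest author of each piece with its later reuse.

    For every content piece, find the earliest document containing it, then
    count the documents by *other* authors that contain the same piece.
    """
    by_piece = defaultdict(list)
    for doc in docs:
        for piece in doc.pieces:
            by_piece[piece].append(doc)

    scores: dict[str, int] = defaultdict(int)
    for holders in by_piece.values():
        first = min(holders, key=lambda d: d.time)
        scores[first.author] += sum(1 for d in holders if d.author != first.author)
    return dict(scores)


# Hypothetical collection: "ann" originates q1, "bob" originates q2.
docs = [
    Doc("ann", 1, frozenset({"q1"})),
    Doc("bob", 2, frozenset({"q1", "q2"})),
    Doc("cy", 3, frozenset({"q1", "q2"})),
    Doc("ann", 4, frozenset({"q1"})),
]
scores = originality_scores(docs)
```

Note that the later document by "ann" does not add to her own score, since only documents by other authors are counted.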
-
Patent number: 8577868
Abstract: A system receives a search query from a user and searches a repository of documents based on the search query to obtain search results. The system provides the search results to the user and automatically bookmarks one or more of the search results without the user explicitly requesting that the one or more search results be bookmarked.
Type: Grant
Filed: June 29, 2012
Date of Patent: November 5, 2013
Assignee: Google Inc.
Inventors: Oren Zamir, Jeffrey Korn
-
Patent number: 8577867
Abstract: Information regarding the structure of information in a content database is maintained in a structure database. The structure database is used to correlate the data structure of a query to the structure of the content database, in order to determine that information in the content database which needs to be provided to a searcher in response to the query. In one embodiment, this search method is used in an online forum, and the forum maintains a reputation score for users with respect to given subject matter. The reputation score is dependent upon the quality of a user's participation in the forum. A user's reputation score depends upon the evaluation by others of information he posts and upon the user evaluating information posted by others.
Type: Grant
Filed: April 18, 2012
Date of Patent: November 5, 2013
Assignee: Transparensee Systems, Inc.
Inventor: Steven David Lavine
-
Publication number: 20130282691
Abstract: An optimization engine allows website publishers and other network document publishers to view and navigate statistics and scoring methodologies of a search engine. Publishers may thus gain a better understanding of how their website or network document is scored and how to optimize those documents to increase a search engine score. The user is thus able to navigate the network from the perspective of a search engine, viewing webpages, websites, and links in the same way a search engine would analyze them. Upon making changes to a website or network document, publishers may further request on-demand re-crawling of their website or network document to view changes in the score. Alerts may also be activated by a user to notify the user when certain conditions are met.
Type: Application
Filed: March 18, 2013
Publication date: October 24, 2013
Applicant: Efficient Systems, LLC.
Inventors: Scott A. Stouffer, Maura D. Stouffer
-
Publication number: 20130282693
Abstract: An object oriented search mechanism extracts structural metadata and data based on type of document contents and data sources connected to the documents. Relationships between textual and non-textual elements within documents as well as metadata associated with the elements and data sources are utilized to generate a unified object model with the addition of semantic information derived from metadata and taxonomy, which are used to enhance search indexing, ranking of search results, and dynamic adjustment of result rendering user interface with fine tuned relevancy. Additional data from data sources connected to the documents may also be used to unlock hidden data such as data that has been filtered out in an original document.
Type: Application
Filed: June 19, 2013
Publication date: October 24, 2013
Inventors: Luming Wang, Xiaohong Yang, Hailei Zhang, Sonal Jain
-
Publication number: 20130282692
Abstract: Techniques described herein generally relate to real time inference based systems. Example embodiments may set forth devices, methods, and computer programs related to search engine inference based virtual assistance. One example method may include a computing device adapted to receive text as input and a computer processor arranged to determine at least one inference regarding subject matter of the text based on one or more web searches of one or more terms within the text. The inference(s) may then be automatically displayed upon the inference(s) being determined. The text may be automatically received as input from a voice-to-text converter as voice-to-text conversion producing the text is occurring.
Type: Application
Filed: June 17, 2013
Publication date: October 24, 2013
Inventor: EZEKIEL KRUGLICK
-
Patent number: 8560637
Abstract: A web server is connected to a terminal computer capable of performing hypertext transfer protocol communications with the web server. The terminal computer includes a browser for displaying information. The web server executes a plurality of web applications upon receiving a request from the terminal computer. The web server transmits messages output by the applications being executed to the terminal computer. The terminal computer displays messages received from the web server collectively in one window of the browser.
Type: Grant
Filed: May 31, 2006
Date of Patent: October 15, 2013
Assignee: Fujitsu Limited
Inventors: Naoki Tsukada, Haruo Higashiwaki, Kyoko Sawada
-
Publication number: 20130262429
Abstract: A method, computer readable medium and system for automatically tracking content in a peer-to-peer environment are disclosed. For example, the method monitors a number of times each content title of a plurality of content titles are downloaded in the peer-to-peer environment, adds one or more content titles of the plurality of content titles that are downloaded above a predetermined threshold to a list, downloads each one of the one or more content titles in the list via the peer-to-peer environment and verifies that each one of the one or more content titles that are downloaded matches at least one content title in the list.
Type: Application
Filed: June 3, 2013
Publication date: October 3, 2013
Inventors: Alexandre Gerber, Subhabrata Sen, Oliver Spatscheck, Ajay Todimala
-
Patent number: 8548978
Abstract: A system and method that provides a hosted network video guide application. The guide application is provided as a service to web portals and other websites that wish to expose access to the video content available on a public network such as the Internet. The operation of the guide includes mechanisms for search application hosting and processes for content gathering. Video index information can be derived from random content owners, guide affiliates, proactively gathered public domain content, and proactively harvested video content from the network via a video spidering mechanism. The video index information can be collected and maintained in a hosted, centralized repository and made available via an application interface, which can be customized, to users of the network. The video spidering mechanism generates an index of each accessed video, and the index is committed to the guide repository along with the URL information of the video being indexed.
Type: Grant
Filed: August 21, 2007
Date of Patent: October 1, 2013
Assignee: Virage, Inc.
Inventors: Owen Lynn, Richard Humphrey, Dale Thoms
-
Publication number: 20130246389
Abstract: A database of user preference information is extracted and compiled from multiple websites by web-crawling robots without cooperation or specific participation by users. Users who interact with a website are frequently required to register and create a login or userID name that uniquely identifies them. Thereafter, when an individual rates an item, it is often recorded and published under their userID name such that other users can see how a specific individual rated the item. Although there is no requirement that a specific user register on different websites utilizing the identical userID, it is extremely common that this practice occurs and the use of identical userIDs on multiple sites is used herein to expand preference analysis beyond a single site. Once the database exists, users can request or be passively offered suggestions that result from preference associations across multiple websites as performed by a preference analysis and suggestion function.
Type: Application
Filed: March 14, 2013
Publication date: September 19, 2013
Inventor: Robert Osann, JR.
-
Publication number: 20130246336
Abstract: A method is provided in one example and includes crawling a storage location of a network environment to identify objects, fetching the identified objects, creating indexes corresponding to the identified objects, and classifying one or more objects of the identified objects based on a first category. The method further includes providing first sets of metadata elements and corresponding first category information representing the classified one or more objects of the identified objects, searching the indexes for a selected group of the classified one or more objects of the identified objects, and classifying one or more objects of the selected group based on a second category. In more specific embodiments, the method includes applying a remediation policy to the classified one or more objects of the selected group. In other more specific embodiments, the method includes registering the classified one or more objects of the selected group.
Type: Application
Filed: December 27, 2011
Publication date: September 19, 2013
Inventors: Ratinder Paul Singh Ahuja, Bimalesh Jha, Nitin Maini, Sujata Patel, Ankit R. Jain, Damodar K. Hegde, Rajaram V. Nanganure, Avinash Vishnu Pawar
-
Patent number: 8539329
Abstract: Methods for configuring website categorization software, categorizing websites and a method and system for controlling access to websites. A number of websites are selected, all of which relate to a single predetermined category of subject matter. In order to create a category profile, a website is selected from the set (3), the website markup language is read (5), page content information extracted (7) and then analyzed (9). The system may then check whether it has analyzed a sufficient number of websites to allow for a reliable categorization of subsequent websites (13). Individual websites are categorized by extracting their page content information (45) and categorizing (51) on the basis of the degree of similarity between the information and the category profile (55). To control access, the system compares a website identifier with the database of categorized identifiers.
Type: Grant
Filed: November 1, 2007
Date of Patent: September 17, 2013
Assignee: Bloxx Limited
Inventor: James Wilson
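One way to picture "categorizing on the basis of the degree of similarity between the information and the category profile" is a bag-of-words profile compared by cosine similarity. The abstract does not name a similarity measure, so cosine similarity, the word-count profile, the function names, and the 0.2 cutoff are all assumptions in this sketch.

```python
import math
from collections import Counter


def build_profile(page_texts: list[str]) -> Counter:
    """Aggregate word counts over sample pages that belong to one category."""
    profile: Counter = Counter()
    for text in page_texts:
        profile.update(text.lower().split())
    return profile


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def categorize(page_text: str, profiles: dict[str, Counter], min_similarity: float = 0.2):
    """Return the most similar category, or None if nothing is similar enough."""
    page = Counter(page_text.lower().split())
    best = max(profiles, key=lambda name: cosine(page, profiles[name]), default=None)
    if best is None or cosine(page, profiles[best]) < min_similarity:
        return None
    return best


# Hypothetical training pages for two categories.
profiles = {
    "travel": build_profile(["cheap flights hotels", "book flights online"]),
    "finance": build_profile(["bank loans interest", "online bank account"]),
}
```

An access-control layer could then look up a site's stored category before allowing a request, which corresponds to the final sentence of the abstract.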
-
Publication number: 20130238593
Abstract: Systems and methods for providing an enterprise crawl and search framework, including features such as use with middleware and enterprise application environments, pluggable security, search development tools, user interfaces, and governance. In accordance with an embodiment, the system includes an enterprise crawl and search framework which abstracts an underlying search engine, provides a common set of application programming interfaces for developing search functionalities, and allows the framework to serve as an integration layer between one or more enterprise search engine and one or more enterprise application. A plurality of searchable objects which are sets of data derived from enterprise applications are used to make view objects available for full text search.
Type: Application
Filed: January 2, 2013
Publication date: September 12, 2013
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventors: DJ Vasant Ursal, Tulasi Kodali
-
Publication number: 20130238594
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying entities that are related to an entity to which a search query is directed. One of the methods includes receiving a search query, wherein the search query has been determined to relate to a first entity of a first entity type, and wherein one or more entities of a second entity type have a relationship with the first entity; receiving search results for the search query; determining that a count of search results identifying a resource containing a reference to the first entity satisfies a first threshold value; determining that a count of search results identifying a resource having the second entity type as a relevant entity type satisfies a second threshold value; and transmitting information identifying the one or more entities of the second entity type as part of the response to the search query.
Type: Application
Filed: February 22, 2013
Publication date: September 12, 2013
Inventors: Peter Jin Hong, Pravir K. Gupta, Nathaniel J. Gaylinn, Ramakrishnan Kazhiyur-Mannar, Kavi J. Goel, Omer Bar-or, Jack W. Menzel, Christina R. Dhanaraj, Jared L. Levy, Shashidhar A. Thakur, Grace Chung, Benson Tsai
-
Patent number: 8533175
Abstract: Collections of music and other items, related by time, location, genre, and artist, are registered in a data model to provide a foundation for their curatorship, discovery, and procurement. A series of choices, where a choice is a combination of time, place, genre, and artist, represents a map through the history and culture of music. Both expert and regular individual curators define the maps. Animated murals depicting a fundamental combination of time, place, genre, and artist provide a user interface for the navigation of music, its history, and culture. Integration with hand held GPS enabled devices provides users with knowledge of music events and history relative to their present location. A network view presents the user with an interactive diagram of connections between elements in the tunesmap database used as a dynamic filter construction device.
Type: Grant
Filed: August 12, 2010
Date of Patent: September 10, 2013
Inventor: Gilbert Marquard Roswell
-
Patent number: 8527495
Abstract: A plug-in interface is provided in a crawling search engine. Plug-in parsers are also provided for use with the search engine. The plug-in interface allows the search engine to be configured with different plug-in parsers. Thus, a customer may configure a search engine with a parser that best suits the needs of the customer and to try new parsing algorithms to find the best results.
Type: Grant
Filed: February 19, 2002
Date of Patent: September 3, 2013
Assignee: International Business Machines Corporation
Inventor: Richard J. Redpath
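A plug-in parser interface of the kind described can be sketched as a registry of interchangeable parsing functions. The `Crawler` class, its method names, and the two sample parsers below are hypothetical; the abstract does not specify an API.

```python
import re
from typing import Callable

# A parser plug-in is modeled as any callable from raw text to a list of tokens.
Parser = Callable[[str], list]


class Crawler:
    """Toy search-engine front end with a pluggable parsing stage."""

    def __init__(self) -> None:
        self._parsers: dict[str, Parser] = {}

    def register(self, name: str, parser: Parser) -> None:
        """Install a parser plug-in under a configuration name."""
        self._parsers[name] = parser

    def index(self, text: str, parser_name: str) -> list:
        """Tokenize fetched text with whichever plug-in the customer selected."""
        return self._parsers[parser_name](text)


def whitespace_parser(text: str) -> list:
    """Split on whitespace only; punctuation stays attached to tokens."""
    return text.lower().split()


def word_parser(text: str) -> list:
    """Keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


crawler = Crawler()
crawler.register("whitespace", whitespace_parser)
crawler.register("words", word_parser)
```

Swapping `"whitespace"` for `"words"` changes the tokens that reach the index without touching the crawler itself, which is the point of keeping the parser behind an interface.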
-
Patent number: 8515932
Abstract: The invention comprises systems, methods and a computerized data management device for creating and using data relating to a medical or non-medical product or device to enhance the safety of the product or device. A vast amount of data regarding adverse events associated with a particular product or device is analyzed to identify new essential adverse events associated with the product or device. At least one database of new essential adverse event information is created and utilized, and new characteristics of or uses for the product or device related to the new essential adverse event information are determined. Adverse event information is gathered for a large number of population sub-groups. The system may also be programmed to incorporate the information into intellectual property and contract documents.
Type: Grant
Filed: June 9, 2011
Date of Patent: August 20, 2013
Assignee: Classen Immunotherapies, Inc.
Inventor: John Barthelow Classen
-
Patent number: 8515938
Abstract: An information processing system including a client capable of receiving and reproducing content from a media server, and a collecting server for receiving content management information on the content from the media server and managing the content management information.
Type: Grant
Filed: May 6, 2008
Date of Patent: August 20, 2013
Assignee: Sony Corporation
Inventors: Toshiaki Kusakabe, Satoshi Hiroi, Masahiro Hara
-
Patent number: 8516554
Abstract: A Dynamic Web Service server may facilitate custom Enterprise Application interface development with little or no developer input by dynamically creating a web service for performing a particular transaction according to a transaction map. An Enterprise Application client device may create a transaction map by “recording” a transaction between an Enterprise Application client and an Enterprise Application server and mapping transaction fields to a custom interface generated to collect data for re-performing the recorded transaction. The Enterprise Application client device may call the dynamic web service, and the Dynamic Web Service server may then perform the recorded transaction using input data collected in the custom interface.
Type: Grant
Filed: November 1, 2012
Date of Patent: August 20, 2013
Assignee: Winshuttle, LLC
Inventors: Vishal Chalana, Amit Sharma, Piyush Nagar, Vishal Sharma, Vikram Chalana
-
Patent number: 8510289
Abstract: A system processes user queries. The system may generate a list of query patterns of a first type. The system may also receive a user query and determine whether the received query is a query of the first type based at least in part on the list of query patterns.
Type: Grant
Filed: October 20, 2011
Date of Patent: August 13, 2013
Assignee: Google Inc.
Inventors: Amit Singhal, Matt Cutts, Jun Wu
-
Publication number: 20130204860
Abstract: The statistics from a reference page serve as a seed to compare the selected page statistics between other webpages. The statistics of all results can be graphically displayed, if desired, in a display or popup window. These results can be analyzed for the determination of a category so an appropriate search expression or a statistical mask can be developed. In addition, the statistics of several pages can be compared and the results analyzed for search term commonality. This step determines how strongly tied the scanned data content of two different webpages are to each other. These results can be analyzed against each other to generate common search terms, a final histogram, and how this histogram compares to the reference histogram. The search expression term can be a Boolean expression or a statistical mask. The statistical mask is used as a seed to start another search moving closer to the final target or desired goal.
Type: Application
Filed: February 3, 2012
Publication date: August 8, 2013
Applicant: TrueMaps LLC
Inventor: Thaddeus John Gabara
-
Patent number: 8504550
Abstract: Systems and methods of identifying and categorizing social network messages that are relevant to selected categories and text terms are provided. The frequency of text terms appearing in social network messages are calculated for multiple categories. Based on the calculated text term frequency, social network messages can be identified and/or categorized that match a provided set of text terms. Selecting and/or associating text terms and categories are determined by repeatedly analyzing social network messages.
Type: Grant
Filed: May 17, 2010
Date of Patent: August 6, 2013
Assignee: CitizenNet Inc.
Inventors: Michael Aaron Hall, Daniel Benyamin, Aaron Chu
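The per-category term-frequency idea can be illustrated with a small sketch. The scoring rule (sum of each term's relative frequency within a category), the function names, and the sample messages are assumptions; the abstract does not state how frequencies are combined.

```python
from collections import Counter


def train(labeled_messages: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count how often each term appears in messages of each category."""
    counts: dict[str, Counter] = {}
    for category, text in labeled_messages:
        counts.setdefault(category, Counter()).update(text.lower().split())
    return counts


def categorize(message: str, term_counts: dict[str, Counter]):
    """Pick the category whose term frequencies best match the message.

    Each category is scored by summing the relative frequency of every
    message term within that category. Returns None when no term matches.
    """
    terms = message.lower().split()
    best, best_score = None, 0.0
    for category, counts in term_counts.items():
        total = sum(counts.values())
        score = sum(counts[t] for t in terms) / total if total else 0.0
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical labeled social network messages.
model = train([
    ("sports", "great game tonight"),
    ("sports", "game on"),
    ("music", "great concert tonight"),
])
```

Calling `train` again on newly collected messages would correspond loosely to the abstract's "repeatedly analyzing social network messages".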
-
Patent number: 8504551
Abstract: Advertisers are permitted to put targeted ads on a page on the web (or some other document of any media type). The present invention may do so by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to content, and/or (iii) combining content with ads determined to be relevant to the content.
Type: Grant
Filed: April 11, 2011
Date of Patent: August 6, 2013
Assignee: Google Inc.
Inventors: Darrell Anderson, Paul Buchheit, Alexander Paul Carobus, Yingwei Cui, Jeffrey A. Dean, Georges R. Harik, Deepak Jindal, Narayanan Shivakumar
-
Publication number: 20130198162
Abstract: A plurality of methods for searching one or more business entities utilizing a web service and a browser plug-in application based on one or more keywords, a domain name or a user's geographic location and displaying the relevant business entities. The one or more keywords will be based on selected text entered into a web service and a browser plug-in application. The search process is initiated based on selecting one or more keywords from a text, double-clicking on the one or more keywords, selecting a right-click context menu option and clicking on a browser extension button. The server receives the request and carries out a search in a database in communication with the web service. The search is based on which relevant business entities are found and a result is created, which is sent back to the browser, where the received search results are displayed in a browser pop-up window.
Type: Application
Filed: January 30, 2013
Publication date: August 1, 2013
Inventor: Rasmus Refer
-
Patent number: 8498978
Abstract: Slideshow video file detection. A method includes receiving a search query for video files of a desired type. A portion of a video file is extracted. A frame difference based histogram and an active pixel based histogram are generated for the portion. Further, the frame difference based histogram and an active pixel based histogram are provided to a machine learning tool. An indicator is determined for the portion based on a plurality of parameters. The video file is classified as the desired type based on the indicator. The video file is provided to the user.
Type: Grant
Filed: December 30, 2008
Date of Patent: July 30, 2013
Assignee: Yahoo! Inc.
Inventors: Venkatesh Babu Radhakrishnan, Srinivasan H. Sengamedu
-
Publication number: 20130191366
Abstract: A pattern matching engine and associated method for detecting one or more of headers, footers, watermarks, page numbering, page colors, and page borders appearing in a fixed format document. The pattern matching engine performs pattern matching across pages of the fixed format document to identify repeating patterns. Using heuristic analysis, repeating patterns meeting selected criteria are classified as headers, footers, or watermarks. Filtering removes repeating patterns unlikely to represent headers, footers, or watermarks. The information produced by the pattern matching engine allows the repeating elements to be properly reconstructed as flowable elements when converting a fixed format document into a flow format document.
Type: Application
Filed: January 23, 2012
Publication date: July 25, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Vuk Jovanovic, Milos Lazarevic, Milos Raskovic, Nenad Bozidarevic, Milan Sesum
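A much simplified sketch of cross-page pattern matching: treat each page as a list of text lines and flag a line as a header or footer candidate when the same (digit-normalized) text sits at the same position on most pages. The line-based page model, the digit normalization, and the 0.8 fraction are assumptions; the engine in the abstract also covers watermarks, colors, and borders, which this sketch ignores.

```python
import re
from collections import Counter


def normalize(line: str) -> str:
    """Replace digit runs so 'Page 1' and 'Page 2' count as the same pattern."""
    return re.sub(r"\d+", "#", line.strip())


def repeating_lines(pages: list[list[str]], position: int, min_fraction: float = 0.8) -> set:
    """Find text repeated at one line position across most pages.

    `position` is 0 for the first line of each page (header candidate) or -1
    for the last line (footer candidate). A pattern qualifies when it appears
    on at least `min_fraction` of all pages.
    """
    counts = Counter(normalize(page[position]) for page in pages if page)
    needed = min_fraction * len(pages)
    return {line for line, n in counts.items() if n >= needed}


# Hypothetical three-page document, already split into lines.
pages = [
    ["Annual Report", "Intro text", "Page 1"],
    ["Annual Report", "More text", "Page 2"],
    ["Annual Report", "End", "Page 3"],
]
headers = repeating_lines(pages, 0)
footers = repeating_lines(pages, -1)
```

Lines flagged this way could be emitted once as a flowable header or footer element instead of being repeated in the body text during conversion.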
-
Patent number: 8495049
Abstract: A system and a method for automatically submitting Web pages to a search engine, which is preferably used for submitting dynamic Web pages, but may optionally be used for any type of Web page. The present invention features a gateway server for providing these Web pages to the search engine, either directly or optionally through an autonomous software search program. Optionally and more preferably, the gateway server modifies the Web page before serving it to the autonomous software search program and/or search engine.
Type: Grant
Filed: October 4, 2010
Date of Patent: July 23, 2013
Assignee: Microsoft Corporation
Inventors: Yaron Galai, Oded Itzhak
-
Patent number: 8489578
Abstract: A method, system, and article are provided for management of a data ingester and associated content collected by the data ingester. The computer system is configured with a taxonomy together with rules and policies for ingesting and classifying the collected data. Based upon the classification of the collected data with respect to the taxonomy, the data is assigned to a location in the taxonomy.
Type: Grant
Filed: October 20, 2008
Date of Patent: July 16, 2013
Assignee: International Business Machines Corporation
Inventors: Varun Bhagwan, Rajesh M. Desai, Piyoosh Jalan
-
Publication number: 20130179423Abstract: A computer-automated method and system of providing a searchable knowledge base with decision-relevant attributes (including some subjective or sentiment-based attributes) for a plurality of individual items within a choice set are described. First, information (including texts) relevant to the plurality of individual items in the choice set is harvested from Internet sources. Next, normalized representations of statements are extracted from excerpts of the harvested texts that pertain to attributes of interest for the choice set, and corresponding scores for the attributes are derived from each of the normalized representations. The scores derived from the various harvested sources are aggregated for each attribute of each item. Finally, the knowledge base of the plurality of individual topics is generated.Type: ApplicationFiled: January 5, 2012Publication date: July 11, 2013Applicant: SRI InternationalInventors: Nadav Gur, David Israel, Imri Goldberg
-
Publication number: 20130179425Abstract: A program search apparatus and method using a related keyword is provided. The program search apparatus may include an interface to extract a search keyword from a program search request, in response to the program search request being received, and a processor to obtain a related keyword with respect to the search keyword, using the extracted search keyword, to search a database for first program information using the obtained related keyword, and to provide found first program information.Type: ApplicationFiled: January 4, 2013Publication date: July 11, 2013Applicant: Electronics and Telecommunications Research InstituteInventor: Electronics and Telecommunications Research Institute
-
Publication number: 20130179426Abstract: Systems and methods of identifying and retrieving messages that satisfy a search query using the context of the message and term frequencies are provided. One embodiment includes identifying at least one category relevant to the search query, wherein a plurality of scored keywords are associated with each category, selecting at least one of the scored keywords that is relevant to an identified category, performing a plurality of searches of messages from a social networking messaging service to retrieve messages, where at least one search includes retrieving messages based on the original search query and one of the selected scored keywords, scoring the retrieved messages with respect to each of the at least one identified categories using at least the scored keywords relevant to each category, and returning at least the message with the highest score as the search result.Type: ApplicationFiled: January 14, 2013Publication date: July 11, 2013Applicant: CitizenNet Inc.Inventor: CitizenNet Inc.
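The scoring step in this abstract can be sketched as follows, assuming each category is represented by a dictionary of keyword scores and that a message's score is the sum of the scores of the keywords it contains. Both the data layout and the additive scoring rule are assumptions made for illustration; the application does not specify them here.

```python
def score_message(message, scored_keywords):
    """Sum the scores of the category keywords that appear in the message."""
    words = set(message.lower().split())
    return sum(score for keyword, score in scored_keywords.items() if keyword in words)


def best_message(messages, scored_keywords):
    """Return the highest-scoring message, or None if there are no messages."""
    return max(messages, key=lambda m: score_message(m, scored_keywords), default=None)


# Hypothetical scored keywords for a "sports" category.
sports_keywords = {"goal": 2.0, "match": 1.0}
messages = ["great match today", "what a goal in that match", "lunch time"]
top = best_message(messages, sports_keywords)
```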
-
Publication number: 20130179424Abstract: Methods, systems and computer-readable storage medium for determining a crawling schedule. In an aspect, a method includes obtaining crawl history data for a Web site having Web pages, determining a status of the Web pages, determining a total quantity of Web pages that have a status of deleted, calculating a probability that another Web page of the Web site will be removed based on the total quantity, and storing data associating the calculated probability with the Web site. The method can further include determining, for a plurality of sets of the previous time periods, a respective crawl penalty as a combination of a penalty for crawling the Web site and a penalty for showing a deleted Web page based on the calculated probability, and determining a re-crawl schedule based on the crawl penalties.Type: ApplicationFiled: January 11, 2012Publication date: July 11, 2013Inventors: Cheng Xu, Qiying Lin, Xin Li
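A minimal sketch of the idea in this abstract, under stated assumptions: the deletion probability is estimated as the fraction of crawled pages with status "deleted", and the crawl penalty for a re-crawl interval combines a per-crawl cost spread over the interval with a cost for the chance of showing a deleted page. The penalty formula, the cost weights, and the candidate intervals are all hypothetical; the application does not give them in this abstract.

```python
def deletion_probability(statuses):
    """Estimate the chance that another page is deleted from crawl history."""
    if not statuses:
        return 0.0
    deleted = sum(1 for status in statuses if status == "deleted")
    return deleted / len(statuses)


def crawl_penalty(interval, p_delete, crawl_cost=1.0, stale_cost=10.0):
    """Combine crawling cost and deleted-page cost (assumed formula)."""
    # Crawling less often spreads the crawl cost over more periods.
    crawl_term = crawl_cost / interval
    # Probability that a page is deleted at some point during the interval.
    stale_term = stale_cost * (1 - (1 - p_delete) ** interval)
    return crawl_term + stale_term


def choose_interval(p_delete, candidates=(1, 2, 4, 8, 16)):
    """Pick the candidate re-crawl interval with the lowest penalty."""
    return min(candidates, key=lambda k: crawl_penalty(k, p_delete))


history = ["live", "live", "deleted", "live"]
p = deletion_probability(history)
interval = choose_interval(p)
```

With these assumed weights, a site whose pages are never deleted is pushed toward the longest interval, while a site with frequent deletions is re-crawled every period.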
-
Patent number: 8484194Abstract: A training set generator may be configured to input a taxonomy including a hierarchy of categories and a plurality of top-level sites, and to output a training set of categorized data. The training set generator may include a crawler configured to crawl each of the top-level sites to determine at least one lower-level site associated therewith and to store the top-level sites and associated lower-level sites as crawl data. The training set generator also may include an extractor configured to determine, for each of the top-level sites, a corresponding site-specific extraction template associating at least one portion of the corresponding top-level site with at least one category of the hierarchy of categories, and further configured to apply each site-specific extraction template to corresponding crawl data to thereby associate the crawl data with the categories of the hierarchical categories and obtain categorized data of the training set.Type: GrantFiled: January 13, 2012Date of Patent: July 9, 2013Assignee: Google Inc.Inventors: Philo Juang, Christopher Testa, Nicolaus Mote
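One way to picture a site-specific extraction template is as a mapping from a portion of each top-level site (here, a URL path prefix) to a taxonomy category. The sketch below applies such templates to crawled URLs to produce categorized training examples. The template format, the example host names, and the function name are assumptions for illustration only and are not drawn from the patent.

```python
from urllib.parse import urlparse

# Hypothetical templates: host -> {URL path prefix -> taxonomy category}.
TEMPLATES = {
    "news.example.com": {"/sports/": "Sports", "/tech/": "Technology"},
}


def categorize(crawl_data, templates):
    """Return (url, category) pairs for URLs matched by their site's template."""
    training_set = []
    for url in crawl_data:
        parsed = urlparse(url)
        site_template = templates.get(parsed.netloc, {})
        for prefix, category in site_template.items():
            if parsed.path.startswith(prefix):
                training_set.append((url, category))
                break  # Use the first matching prefix only.
    return training_set


crawl_data = [
    "http://news.example.com/sports/game1",
    "http://news.example.com/about",
    "http://other.example.org/tech/x",
]
training_set = categorize(crawl_data, TEMPLATES)
```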
-
Patent number: 8473470Abstract: A software program and associated web-based portal are provided for industry-specific product comparison. The program and associated web portal allow the user to search multiple manufacturers' catalogs and to enter a query based upon customized search criteria. Query results are returned of products that satisfy the user's search criteria. The query is made available to manufacturers whose products are identified in the query results and a communication link is provided whereby such manufacturers can contact the user to discuss the product identified in the search. The user can respond using the message board associated with the web portal. The program and portal can also integrate updates to pump manufacturers' catalogs and can also produce best-fit solutions for users' design criteria.Type: GrantFiled: May 23, 2005Date of Patent: June 25, 2013Assignee: Bentley Systems, IncorporatedInventors: Jack S. Cook, Jr., Diego Alexander Diaz Pabon, Benjamin John Ewing
-
Patent number: 8473473Abstract: An object oriented search mechanism extracts structural metadata and data based on type of document contents and data sources connected to the documents. Relationships between textual and non-textual elements within documents as well as metadata associated with the elements and data sources are utilized to generate a unified object model with the addition of semantic information derived from metadata and taxonomy, which are used to enhance search indexing, ranking of search results, and dynamic adjustment of result rendering user interface with fine tuned relevancy. Additional data from data sources connected to the documents may also be used to unlock hidden data such as data that has been filtered out in an original document.Type: GrantFiled: March 16, 2010Date of Patent: June 25, 2013Assignee: Microsoft CorporationInventors: Luming Wang, Xiaohong Yang, Hailei Zhang, Sonal Jain
-
Publication number: 20130151500Abstract: A search query is received. Personal information for a user is then determined. A search is performed in a general subdomain of general content using the search query. For example, the general subdomain of general content may be a WWW search. Then, a vertical subdomain is determined based on the personal information. A search is then performed in the vertical subdomain of specialized content using the search query. The search performed in the general subdomain and the search performed in the vertical subdomain generate general search results and vertical search results. The results may be combined and outputted to a client.Type: ApplicationFiled: November 11, 2012Publication date: June 13, 2013Applicant: YAHOO! INC.Inventor: YAHOO! INC.
-
Publication number: 20130144861Abstract: An Internet infrastructure contains a search server that delivers search result pages of web sites to client devices based upon a search string. Maxima categories are provided that sort search results or web pages based upon popularity and/or context similarity. A web browser contained within a client device is coupled to display various search result pages of web sites delivered by the search server. A maxima determination module within the search server responds to the delivery of the initial search string by first categorizing the search results' applicability to the search string on the basis of maxima or by generating maxima categories with search results contained therein that correlate to the search string. These search results within each applicable maximum are then sorted on the basis of popularity within each of the maxima categories to effectuate popularity ranks for each search result or web page.Type: ApplicationFiled: January 29, 2013Publication date: June 6, 2013Applicant: ENPULZ, L.L.C.Inventor: James D. Bennett
-
Publication number: 20130144862Abstract: Systems and methods for clustering user reviews are disclosed in which a plurality of user reviews are extracted from electronic documents. The electronic documents contain user reviews of a plurality of items of interest. A set of user reviews is identified in the plurality of user reviews as being associated with the same item of interest in the plurality of items of interest. Item identifying information included in the electronic documents is used for this identification. The set of user reviews is then associated with the same item of interest. Examples of item identifying information include unique product identifiers, brand names, model numbers, and category information. In some instances, the item identifying information is extracted from metadata included in the electronic document. In some instances, the electronic documents are obtained from e-commerce websites or product-review websites.Type: ApplicationFiled: January 30, 2013Publication date: June 6, 2013Inventors: Jan Matthias Ruhl, Mayur D. Datar
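The grouping step described in this abstract can be sketched as below, assuming each extracted review is a dictionary that may carry a unique product identifier or a brand and model number. The field names (`product_id`, `brand`, `model`, `text`) and the fallback order are hypothetical, chosen only to show how item identifying information could key the clusters.

```python
from collections import defaultdict


def item_key(review):
    """Build a grouping key from item identifying information, if any."""
    if review.get("product_id"):
        return ("id", review["product_id"])
    brand = review.get("brand", "").strip().lower()
    model = review.get("model", "").strip().lower()
    if brand and model:
        return ("brand_model", brand, model)
    return None  # Not enough information to identify the item.


def cluster_reviews(reviews):
    """Group review texts by the item they refer to."""
    clusters = defaultdict(list)
    for review in reviews:
        key = item_key(review)
        if key is not None:
            clusters[key].append(review["text"])
    return dict(clusters)


reviews = [
    {"product_id": "0123456789012", "text": "Works well."},
    {"product_id": "0123456789012", "text": "Broke after a week."},
    {"brand": "Acme", "model": "X100 ", "text": "Good value."},
    {"brand": "acme", "model": "x100", "text": "Too loud."},
    {"text": "No identifying information."},
]
clusters = cluster_reviews(reviews)
```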
-
Publication number: 20130132366Abstract: A domain classifier develops and maintains relevance data about specific domains based on historical relevance data and source context data. Such data may be used to classify the user's interest in attempting to visit a specific domain and thereby redirect the user to a website expected to be aligned with the user's interest. In various implementations, the historical relevance data is derived from source context data and/or post-visit user behavior collected from previous attempts to visit a specific domain. The source context data collected from the current visit may also be used as source context-factors to influence domain classification. Based on such historical and current source context factors, as well as the domain address provided in the user's navigation request, a domain classifier consults the historical relevance data and provides the user with Web content that is identified as likely to be relevant to the user's interests.Type: ApplicationFiled: January 22, 2013Publication date: May 23, 2013Applicant: Working Research Inc.Inventor: Keith Merle Pieper
-
Publication number: 20130132365Abstract: An extensible offer inventory database of offers in a domain is established. Further, an offer ontology is generated based on the extensible offer inventory database. The offer ontology provides an extensible vocabulary that correlates to categories in the offer inventory database. In addition, offers are automatically located. The offers are also semantically analyzed to generate semantic analysis data. Further, user data is obtained. In addition, an optimal offer match is automatically determined based upon the semantic analysis data and the user data.Type: ApplicationFiled: June 3, 2011Publication date: May 23, 2013Applicant: ADOBE SYSTEMS INCORPORATEDInventors: WALTER CHANG, Geoff Baum
-
Publication number: 20130117190Abstract: A system, apparatus, and method are provided for the copyright infringement lifecycle. Content to be added to a catalogue is received and stored in a database. Material that is same or similar to the content is automatically searched for by deriving and using keywords indicative of the material, and a user is notified of the material that is same or similar to the content. Infringing activity is monitored and content owners' rights are enforced via automatic dispatch of electronic notifications. Defendant pools are built by cross-referencing the physical location of infringing activity and jurisdictions where attorneys are admitted to practice.Type: ApplicationFiled: November 7, 2011Publication date: May 9, 2013Applicant: Singularis, Inc.Inventor: Emanuel I. Wald
-
Patent number: 8438466Abstract: In one embodiment, on a first date, a computer system receives a watch list term that is specified by a user. The computer system receives an electronic version of a paper on a second date after the first date, and the computer system searches the electronic version for the watch list term, without the user specifying the watch list term after the first date, and without the user initiating the searching after the first date. The computer system outputs a result of the searching for display on a display device.Type: GrantFiled: March 22, 2006Date of Patent: May 7, 2013Assignee: Libredigital, Inc.Inventors: Tracey L. Jones, Billy P. Taylor, Frank H. Moeller
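The watch list behavior in this abstract can be illustrated with a small sketch: terms are stored once, and each later electronic edition is searched automatically without the user re-entering the terms. The class name, method names, and case-insensitive substring matching are assumptions made for illustration, not details from the patent.

```python
class WatchList:
    """Stores user-specified terms and checks new editions against them."""

    def __init__(self):
        self.terms = set()

    def add_term(self, term):
        # Called once, on the first date, when the user specifies a term.
        self.terms.add(term.lower())

    def on_new_edition(self, text):
        # Called when a later edition arrives; no user action is needed.
        lowered = text.lower()
        return sorted(term for term in self.terms if term in lowered)


watch_list = WatchList()
watch_list.add_term("city council")
watch_list.add_term("zoning")
hits = watch_list.on_new_edition("The City Council met on Tuesday to discuss parking.")
```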