Based On Term Frequency Of Appearance Patents (Class 707/750)
  • Publication number: 20150058364
    Abstract: Matching systems and methods for social networking systems can select matches for users based on observed activities. A matching system can include, for example, a preference unit, a monitoring unit, and a matching unit. Generally, the preference unit can receive and process matching preference information for a user; the monitoring unit can monitor the user's activities on or observable by the server; and the matching unit can select and recommend matches for the user based on the monitored activities. Thus, matches can be suggested to the user based on the user's observed activities, and not simply based on the user's potentially inaccurate self-description.
    Type: Application
    Filed: November 3, 2014
    Publication date: February 26, 2015
    Inventors: Holly Pearson, Gregory A. Pearson, Ronald Shane Hamilton, David B. Hall
  • Publication number: 20150046472
    Abstract: A record is received including a token without a corresponding predetermined weight. Information pertaining to the received token is retrieved from at least one of external reference information and historic statistics. A token with a predetermined weight closest to the received token is determined based on the retrieved information. The predetermined weight of the closest token is assigned to the received token and data is matched based on the assigned weight of the received token.
    Type: Application
    Filed: August 6, 2013
    Publication date: February 12, 2015
    Applicant: International Business Machines Corporation
    Inventors: Karl J. Weinmeister, Yinle Zhou
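    Illustration: a minimal sketch of the weight-assignment idea in the abstract above, not the patented method. It assumes the "retrieved information" is a historic relative frequency per token and that "closest" means the smallest absolute frequency difference. The names `assign_weight`, `weights`, and `frequency`, and all sample values, are hypothetical.

```python
def assign_weight(token, weights, frequency):
    """Return a matching weight for `token`.

    weights:   token -> predetermined weight (known tokens only)
    frequency: token -> historic relative frequency (assumed proxy for the
               "retrieved information" named in the abstract)
    """
    if token in weights:
        return weights[token]
    # Assumption: the closest known token is the one whose historic
    # frequency differs least from the unknown token's frequency.
    closest = min(weights, key=lambda known: abs(frequency[known] - frequency[token]))
    return weights[closest]


# Hypothetical data: rare surnames carry a higher matching weight.
weights = {"smith": 1.0, "zelenka": 6.0}
frequency = {"smith": 0.010, "zelenka": 0.0001, "novak": 0.0004}

novak_weight = assign_weight("novak", weights, frequency)
```

    Here "novak" has no predetermined weight, so it borrows the weight of "zelenka", whose historic frequency is nearest.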
  • Patent number: 8949249
    Abstract: Techniques to search for data elements in a distributed computing environment are described. An apparatus may comprise a processor and a memory unit communicatively coupled to the processor. The memory unit may store a correlation module that when executed by the processor is operative to determine a target rank position at a target percentile rank within a total data set. The correlation module may determine a target data item at the target rank position for the total data set using candidate data items at candidate rank positions for each of multiple sorted data subsets of the total data set, and correlation values associated with each of the candidate data items. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 15, 2010
    Date of Patent: February 3, 2015
    Assignee: SAS Institute, Inc.
    Inventor: Karl Moss
  • Patent number: 8938452
    Abstract: Query generation for searchable content is provided. In some embodiments, query generation for searchable content includes receiving searchable content (e.g., the searchable content can include a unique identifier for the searchable content, such as a Uniform Resource Locator (URL) for a web site, and the web site can include one or more web pages); and generating a set of queries, the set of queries including one or more queries (e.g., the set of queries can include ranked queries) that are relevant to the searchable content.
    Type: Grant
    Filed: January 29, 2014
    Date of Patent: January 20, 2015
    Assignee: BloomReach Inc.
    Inventors: Raj K. De Datta, Ashutosh Garg, Abhay Vardhan, Joshua Levy, Srinath Sridhar
  • Patent number: 8938438
    Abstract: Systems and methods of the present invention provide for one or more server computers configured to receive one or more keywords topically relevant to a content of a web page, request from a search engine a first metric comprising a quantity of times the keywords have appeared in a search query with one or more question keywords during a time period and a second metric comprising a probability of receiving a high rank associated with the one or more keywords and the one or more question keywords, receive, from the search engine, the first metric and the second metric, calculate a keyword effectiveness index from the first metric and the second metric, and generate and transmit to a client computer one or more recommendations to include a high ranked suggested content on the web page according to the keyword effectiveness index.
    Type: Grant
    Filed: October 11, 2012
    Date of Patent: January 20, 2015
    Assignee: Go Daddy Operating Company, LLC
    Inventor: Rajinder Nijjer
  • Publication number: 20150019571
    Abstract: Relay of information from technical documentation by contact center workers to assist clients is limited by industry standard storage formats and query mechanisms. A method is disclosed for processing technical documents and tagging them against a Telecom Hardware domain ontology. The method comprises classical ontological Natural Language Processing (NLP) approaches to extract information from both text segments and tables, identifying text segments, named entities and relations between named entities described by an existing T-Box. A method for scoring candidate object property assertions derived from text before populating the Telecom Hardware ontology is also disclosed.
    Type: Application
    Filed: September 25, 2014
    Publication date: January 15, 2015
    Inventors: Christopher Baker, Alexander Kouznetsov
  • Patent number: 8930447
    Abstract: A usability analysis method of a web application including: a first step of acquiring a page transition log and operation logs on individual pages in the web application; a second step of detecting a segment having a specific page transition pattern in the page transition log; a third step of managing operation logs on individual pages included in the detected page transition pattern in relation to the individual pages; a fourth step of performing statistical processing on the managed operation logs and analyzing page utilization; and a fifth step of analyzing usability based on the page transition pattern and the page utilization.
    Type: Grant
    Filed: March 5, 2010
    Date of Patent: January 6, 2015
    Assignee: Hitachi, Ltd.
    Inventor: Katsuro Kikuchi
  • Patent number: 8930378
    Abstract: Particular embodiments of a social-networking system maintain one or more data stores storing a social graph comprising user nodes, concept nodes, and edges connecting the nodes. Particular embodiments may determine a confidence score with respect to a user node and a concept node, wherein the confidence score is based at least in part on affinity scores associated with the edges along a sequence of nodes between the user node and the concept node in the social graph. The confidence score may be based on an overall probability that a random walk starting at the user node will end at the concept node. This overall probability may be determined by calculating, for each edge in the random walk, the probability of taking that edge during the random walk, based on the affinity score associated with that edge.
    Type: Grant
    Filed: October 14, 2013
    Date of Patent: January 6, 2015
    Assignee: Facebook, Inc.
    Inventors: Pierre Moreels, Tudor Andrei Cristian Alexandrescu
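    Illustration: a minimal sketch of the per-edge probability calculation described in the abstract above. It assumes the walk picks each outgoing edge with probability proportional to that edge's affinity score and that it follows one fixed sequence of nodes. The function name, the toy graph, and the affinity values are hypothetical.

```python
from math import prod


def walk_probability(graph, path):
    """Probability that a random walk starting at path[0] follows `path`.

    graph maps node -> {neighbor: affinity score}. Assumption: at each node
    an outgoing edge is taken with probability affinity / sum of affinities.
    """
    step_probs = []
    for src, dst in zip(path, path[1:]):
        edges = graph.get(src, {})
        total = sum(edges.values())
        if total == 0 or dst not in edges:
            return 0.0  # the walk cannot take this edge
        step_probs.append(edges[dst] / total)
    return prod(step_probs)


# Hypothetical social graph with affinity scores on the edges.
graph = {
    "user": {"friend": 3.0, "page": 1.0},
    "friend": {"concept": 1.0, "user": 1.0},
}

confidence = walk_probability(graph, ["user", "friend", "concept"])
```

    The walk takes user to friend with probability 3/4 and friend to concept with probability 1/2, so the product is 0.375. An overall probability of ending at the concept node would sum this quantity over all qualifying paths, which the sketch leaves out.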
  • Patent number: 8918406
    Abstract: A method of processing content files may include receiving a content file, employing processing circuitry to determine an identity score of a source of at least a portion of the content file, to determine a word score for the content file, and to determine a metadata score for the content file, determining a composite priority score based on the identity score, the word score and the metadata score, and associating the composite priority score with the content file for electronic provision of the content file together with the composite priority score to a human analyst.
    Type: Grant
    Filed: December 14, 2012
    Date of Patent: December 23, 2014
    Assignee: Second Wind Consulting LLC
    Inventor: Donna Rober
  • Publication number: 20140372455
    Abstract: An aspect provides a method, including: storing an object; obtaining data associated with the object; analyzing, using one or more processors, the data associated with the object to identify one or more key words in the data associated with the object to create one or more tags; and storing the one or more tags in a searchable format. Other aspects are described and claimed.
    Type: Application
    Filed: June 17, 2013
    Publication date: December 18, 2014
    Inventors: Howard Locker, Daryl Cromer, Rod D. Waltermann, Aaron Michael Stewart
  • Publication number: 20140365510
    Abstract: A device for determining interest includes a storage portion configured to store, on a user-by-user basis, a co-occurrence frequency in correlation with a user, the co-occurrence frequency indicating how many times a pair of words is used in a same cluster of a first document, on a pair-by-pair basis, to which the user gained access previously; a designating portion configured to allow a person who is to conduct a search to designate a second document and any one of the users; and a determination portion configured to determine that, among the pairs, a pair which is used in a same cluster of the designated second document and which also satisfies a predetermined condition of the co-occurrence frequency corresponding to the designated user is a particular pair in the second document which is probably of high interest to the designated user.
    Type: Application
    Filed: June 6, 2014
    Publication date: December 11, 2014
    Inventors: Yoichi KAWABUCHI, Satoshi DEISHI, Kagumi MORIWAKI
  • Patent number: 8903843
    Abstract: A media recommendation system for recommending media content that is historically related to seed media content is provided. The recommended media content may be songs, television programs, movies, or a combination thereof, and the seed media content may be a song, television program, or movie.
    Type: Grant
    Filed: June 21, 2006
    Date of Patent: December 2, 2014
    Assignee: Napo Enterprises, LLC
    Inventor: Eugene M. Farrelly
  • Patent number: 8897486
    Abstract: Character identity recognition is applied to identify text strings corresponding to character identities in a written work. The textual strings are grouped according to character identity and, from each group, a primary name is selected. A significance value may be calculated for each of the character identities. The character identities including the primary names are presented in a catalog based on the calculated significance values.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: November 25, 2014
    Assignee: Amazon Technologies, Inc.
    Inventors: Peter Thomas Killalea, Janna S. Hamaker, Eugene Kalenkovich
  • Patent number: 8892574
    Abstract: Provided is a search apparatus, a search method, and a program that can improve search speed for a document set even when an object to be searched is a large-scale document set. A search apparatus, in an embodiment, includes an abstract matrix storage unit, a word frequency calculation unit, and a document frequency reference unit.
    Type: Grant
    Filed: November 6, 2009
    Date of Patent: November 18, 2014
    Assignee: NEC Corporation
    Inventor: Yukitaka Kusumura
  • Patent number: 8892575
    Abstract: A method for building dictionary entry names for data elements of a canonical data model includes identifying candidate terms for the dictionary entry name of a node or equivalence class of the canonical data model. The method includes counting a frequency of occurrence of candidate terms in use and based on the use counts creating a candidate ordering of terms for the complete ordered dictionary entry name of the node or equivalence class. The method further includes validating the candidate ordering of terms for the complete ordered dictionary entry name of the node or equivalence class by comparison of the ordering with reliable dictionary entry name entries in a database and/or by usage counts in search engine results.
    Type: Grant
    Filed: June 6, 2012
    Date of Patent: November 18, 2014
    Assignee: SAP SE
    Inventors: Gunther Stuhec, Dirk Weissmann
  • Patent number: 8874590
    Abstract: A keyword input supporting apparatus includes a document acquisition unit that acquires a document having a plurality of components containing text data, a main component selection unit that selects a component having many characters in the text data as a main component, a part-of-speech analysis unit that analyzes the part-of-speech of the text data contained in the main component, and adds a semantic attribute to each of words of the text data, a specific name extraction unit that extracts as a specific name a word, having a predetermined semantic attribute or part of speech, from the words, a specific name storage that stores the specific name together with the corresponding semantic attribute, a keyword candidate classification unit that performs classification of the specific name from the storage as a keyword candidate based on the semantic attribute, and a keyword candidate presentation unit that presents the keyword candidate to a user.
    Type: Grant
    Filed: May 27, 2009
    Date of Patent: October 28, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Suzuki, Satoshi Kinoshita, Hideo Umeki, Wataru Nakano
  • Patent number: 8874591
    Abstract: The invention discloses a system and method for managing feedback data that will be used for ranking search results. The invention can aggregate a plurality of user feedback data from more than one user into a search index. The user feedback data can be associated with one or more documents within the index such that the one or more documents can be ranked based on the type of feedback data that is aggregated. Once the documents have been ranked, the ranked documents can be provided to a requester.
    Type: Grant
    Filed: January 31, 2006
    Date of Patent: October 28, 2014
    Assignee: Microsoft Corporation
    Inventors: James Dai, Julia H. Farago, Natala J. Menezes, Ramaz Naam, Saleel Sathe, Hugh J. Williams
  • Publication number: 20140310289
    Abstract: Systems and methods are described which use associations between field values, more generally terms, called selectors, and data items, or structures within data items. The associative information is derived from the content of data and can be stored in optimal data structures, generally descriptively named associative matrices, which may be used to perform searches and calculations of data analytics. In some embodiments, calculations use only selector values and their counts, called frequencies, of associated data items, and/or structures within those items. Special queries, executed on the associative information, determine the frequencies. Methods of data analysis use the results of these queries. Applications can display results dynamically as a user creates queries by choosing selectors, changing the queries, and creating new ones, completely intuitively, using point and click.
    Type: Application
    Filed: April 11, 2014
    Publication date: October 16, 2014
    Applicant: SPEEDTRACK, INC.
    Inventors: Jerzy Josef Lewak, Pawel Grzes
  • Patent number: 8862581
    Abstract: There is a method and a system for concentration detection. The method for concentration detection includes the steps of extracting temporal features from brain signals; classifying the extracted temporal features using a classifier to give a score x1; extracting spectral-spatial features from brain signals; selecting spectral-spatial features containing discriminative information between concentration and non-concentration states from the set of extracted spectral-spatial features; classifying the selected spectral-spatial features using a classifier to give a score x2; combining the scores x1 and x2 to give a single score; and determining if the subject is in a concentration state based on the single score.
    Type: Grant
    Filed: April 28, 2008
    Date of Patent: October 14, 2014
    Assignee: Agency for Science Technology and Research
    Inventors: Haihong Zhang, Cuntai Guan, Brahim Ahmed Salah Hamadi Charef, Chuanchu Wang, Kok Soon Phua
  • Publication number: 20140304279
    Abstract: The technology disclosed relates to automatic generation of tuples from a record set for outlier analysis. Applying this new technology, a user need not specify which 1-tuples to combine into n-tuples. The tuples are generated from structured records organized into features (that also could be fields, objects or attributes). Tuples are generated from combinations of feature values in the records. Thresholding is applied to manage the number of tuples generated. The technology disclosed further relates to indexing and searching high dimensional tuple spaces in a computer-implemented system.
    Type: Application
    Filed: April 3, 2014
    Publication date: October 9, 2014
    Applicant: Salesforce.com, Inc.
    Inventors: Matthew Fuchs, Stanislav Georgiev
  • Patent number: 8856145
    Abstract: The present invention is directed towards systems and methods for indexing one or more items of content. The method of the present invention comprises extracting one or more items of text from a given item of content. The one or more items of extracted text are tokenized into one or more concepts. One or more related concepts associated with the one or more concepts are identified. A support score is generated for the one or more concepts, and the item of content is indexed with the one or more concepts and the one or more associated support scores.
    Type: Grant
    Filed: December 15, 2006
    Date of Patent: October 7, 2014
    Assignee: Yahoo! Inc.
    Inventors: Jignashu Parikh, John Thrall
  • Publication number: 20140297659
    Abstract: Categorizing data sets obtained from a number of sources includes determining the frequency of appearance of symbols in a first collection of data sets and the frequency of appearance of symbols in a second collection of data sets, determining the most significant symbols for the second collection based on the frequency of appearance in the first collection and the frequency of appearance in the second collection, grouping the most significant symbols into groups according to their appearance in the same data set and ranking the data sets in relation to the symbol groups according to a ranking scheme. Related methods, devices, and/or computer program products are described.
    Type: Application
    Filed: November 9, 2012
    Publication date: October 2, 2014
    Applicant: Kairos Future Group AB
    Inventors: Tomas Larsson, Mats Lindgren
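    Illustration: a minimal sketch of the significance step in the abstract above, comparing symbol frequencies in two collections. It assumes significance is the ratio of a symbol's relative frequency in the second collection to its add-one-smoothed relative frequency in the first. The grouping and ranking steps are omitted, and all names and sample data are hypothetical.

```python
from collections import Counter


def symbol_counts(collection):
    """Count symbols over every data set in a collection."""
    counts = Counter(sym for data_set in collection for sym in data_set)
    return counts, sum(counts.values())


def most_significant(reference, target, top_n=3):
    """Symbols over-represented in `target` relative to `reference`."""
    ref_counts, ref_total = symbol_counts(reference)
    tgt_counts, tgt_total = symbol_counts(target)
    vocab = len(set(ref_counts) | set(tgt_counts))
    scores = {}
    for sym, count in tgt_counts.items():
        tgt_rate = count / tgt_total
        # Assumption: add-one smoothing so unseen symbols do not divide by zero.
        ref_rate = (ref_counts[sym] + 1) / (ref_total + vocab)
        scores[sym] = tgt_rate / ref_rate
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical collections: each inner list is one data set.
reference = [["the", "market", "is", "open"], ["the", "weather", "is", "mild"]]
target = [["the", "solar", "market"], ["solar", "panel", "prices"], ["the", "solar", "panel"]]

top_symbols = most_significant(reference, target, top_n=2)
```

    Common symbols such as "the" score low because they are frequent in both collections, while "solar" and "panel" stand out in the second collection only.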
  • Publication number: 20140297658
    Abstract: A search technology generates recommendations with minimal user data and participation, and provides better interpretation of user data, such as popularity, thus obtaining breadth and quality in recommendations. It is sensitive to the semantic content of natural language terms taken from user profiles, which can include interests, eccentricities, age, gender, and location information associated with the user. The interest information can include music, movies, sports and personality traits. Based on the user's profile information, the system determines which ad from a stock of ads is best suited to a given profile and delivers that ad. The system can be used to match user profiles to provide mate-matching.
    Type: Application
    Filed: May 23, 2014
    Publication date: October 2, 2014
    Applicant: PIKSEL, INC.
    Inventors: Issar Amit Kanigsberg, Daniel M. Veidlinger, Myer Joshua Mozersky, Tamer El Shazli
  • Patent number: 8849837
    Abstract: A system is provided that dynamically matches data originating from one or more data sources. The system analyzes a matching configuration file, where the matching configuration file includes one or more matching configurations. The system modifies a probabilistic matching algorithm of a matching engine at runtime based on the one or more matching configurations and based on two or more data records of the plurality of data records that require matching. The system compares two data records of a plurality of data records using the modified probabilistic matching algorithm. The system generates a match score for the two data records based on the match weight for each data record field.
    Type: Grant
    Filed: October 5, 2012
    Date of Patent: September 30, 2014
    Assignee: Oracle International Corporation
    Inventor: Swaranjit Singh Dua
  • Patent number: 8849835
    Abstract: Methods, systems, and apparatus, including computer program products, are described for reconciling data. In one implementation, a method includes generating co-occurrence scores indicating whether data in entries in a first source of data co-occur within documents in a plurality of documents with data in entries in a second source of data. The co-occurrence scores for a given entry in the first source of data are used to identify a plurality of candidate matching entries in the second source of data for the given entry. Data in fields in the given entry are compared to that of one or more of the candidate matching entries to produce field similarity scores. The field similarity scores and the co-occurrence scores are used to determine a match for the given entry among the plurality of candidate matching entries.
    Type: Grant
    Filed: November 2, 2011
    Date of Patent: September 30, 2014
    Assignee: Google Inc.
    Inventors: Eyal Carmi, Daniel H Harrison, Andrew Hogue, Gregory A Morris
  • Publication number: 20140280242
    Abstract: A method includes: acquiring a first word set from community data within a period; selecting words from the first word set according to the frequency with which each word of the first word set appears in the community data during a first group of days, the selected words being determined as hot words and forming a second word set, wherein the first group of days is a plurality of days backward from a designated day; and selecting topics from a community topic set according to the second word set, the selected topics being determined as hot topics.
    Type: Application
    Filed: May 19, 2014
    Publication date: September 18, 2014
    Inventor: Gang CHENG
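    Illustration: a minimal sketch of the hot-word and hot-topic selection in the abstract above. It assumes the window of days includes the designated day, that the most frequent words in the window are the hot words, and that a topic is hot when its title shares a word with the hot-word set. All names and sample data are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def hot_words(daily_words, designated_day, window_days=3, top_k=2):
    """Most frequent words in the window ending on `designated_day`."""
    counts = Counter()
    for offset in range(window_days):
        day = designated_day - timedelta(days=offset)
        counts.update(daily_words.get(day, []))
    return {word for word, _ in counts.most_common(top_k)}


def hot_topics(topics, hot):
    """Topics whose title contains at least one hot word (assumed rule)."""
    return [t for t in topics if hot & set(t.lower().split())]


# Hypothetical community data: words observed per day.
daily_words = {
    date(2014, 5, 19): ["election", "election", "football"],
    date(2014, 5, 18): ["election", "football", "weather"],
    date(2014, 5, 10): ["weather", "weather", "weather", "weather"],
}

hot = hot_words(daily_words, designated_day=date(2014, 5, 19))
topics = hot_topics(["Election results", "Weather forecast", "Football finals"], hot)
```

    The burst of "weather" on 10 May falls outside the three-day window, so it does not count toward the hot words.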
  • Publication number: 20140280011
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting a measure of quality for a site, e.g., a web site. In some implementations, the methods include obtaining baseline site quality scores for multiple previously scored sites; generating a phrase model for multiple sites including the previously scored sites, wherein the phrase model defines a mapping from phrase specific relative frequency measures to phrase specific baseline site quality scores; for a new site that is not one of the previously scored sites, obtaining a relative frequency measure for each of a plurality of phrases in the new site; determining an aggregate site quality score for the new site from the phrase model using the relative frequency measures of phrases in the new site; and determining a predicted site quality score for the new site from the aggregate site quality score.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: Google Inc.
    Inventors: Yun Zhou, Navneet Panda
  • Patent number: 8838619
    Abstract: One or more server devices may simultaneously calculate first ranking scores for a group of users and second ranking scores for a group of comments authored by the group of users. The calculating may occur during a same process. The one or more server devices may further provide one of a first ranked list that includes information identifying the group of users, the information identifying the group of users being ordered based on the first ranking scores, or a second group of comments of the group of comments, the comments in the second group of comments being ordered based on the second ranking scores.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: September 16, 2014
    Assignee: Google Inc.
    Inventors: Michal Cierniak, Na Tang
  • Patent number: 8838611
    Abstract: Disclosed are a document ranking system and method based on contribution scoring. The document ranking system includes: a content score calculating unit for calculating content scores for documents with respect to at least one word contained in the documents, with regard to each such word; a contribution score calculating unit for calculating contribution scores for the documents with respect to jointly occurring words; and a ranking unit for ranking the documents with respect to the at least one word, with regard to each such word, by using the content scores and the contribution scores.
    Type: Grant
    Filed: December 15, 2009
    Date of Patent: September 16, 2014
    Assignee: NHN Corporation
    Inventors: Dong Jin Kim, Sang-Wook Kim
  • Patent number: 8838604
    Abstract: A system identifies a set of documents from a corpus of documents that are relevant to a word, phrase or sentence and that were published at approximately a same time period, where each document of the set of documents includes news content and has an associated headline. The system extracts headlines from the set of documents and derives a score for each headline of the extracted headlines based on how many times selected words in each headline occur among all of the extracted headlines.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: September 16, 2014
    Assignee: Google Inc.
    Inventor: Douwe Osinga
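    Illustration: a minimal sketch of the headline scoring described in the abstract above. It assumes the "selected words" are the non-stopwords of a headline and that a headline's score is the sum of how often each of its distinct selected words occurs across all extracted headlines. The stopword list, function names, and sample headlines are hypothetical.

```python
from collections import Counter

# Assumption: a small stopword list decides which words are "selected".
STOPWORDS = {"a", "an", "the", "in", "of", "on", "to", "for", "and"}


def selected_words(headline):
    """Lowercased words of a headline with stopwords removed."""
    return [w for w in headline.lower().split() if w not in STOPWORDS]


def score_headlines(headlines):
    """Score each headline by corpus-wide counts of its selected words."""
    counts = Counter(w for h in headlines for w in selected_words(h))
    return {h: sum(counts[w] for w in set(selected_words(h))) for h in headlines}


# Hypothetical headlines extracted from documents of the same time period.
headlines = [
    "Storm hits the coast",
    "Coast braces for storm",
    "Markets rally on earnings",
]

scores = score_headlines(headlines)
```

    The two storm headlines reinforce each other and score 5 each, while the unrelated markets headline scores 3.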
  • Patent number: 8838618
    Abstract: Embodiments may include, for each item in a subset of items from a larger group of items, evaluating item description information about that item to identify a respective set of candidate phrases to be evaluated. Embodiments may also include, for each phrase in the sets of candidate phrases, generating multiple component scores based on one or more of the frequency with which that phrase occurs in the item description information for the subset of items and/or the frequency with which that phrase occurs in a corpus of item description information for the overall group of items. Embodiments may also include, for each phrase in the sets of candidate phrases, generating a respective phrase score based on the component scores generated for that phrase. Embodiments may include, based on phrase scores, selecting a subset of phrases from the sets of candidate phrases as being feature phrases for the subset of items.
    Type: Grant
    Filed: July 1, 2011
    Date of Patent: September 16, 2014
    Assignee: Amazon Technologies, Inc.
    Inventors: Jianhui Wu, Nicholas R. Boyd, Srikanth Thirumalai
  • Patent number: 8832079
    Abstract: Methods, apparatuses, and computer program products are provided for facilitating searching. A method may include determining a search term. The method may further include searching a database having at least one codified terminology set. The searching may be performed based at least in part upon the search term and historical search data. The method may additionally include determining one or more search results from the search. Each search result may include a codified term from the at least one codified terminology set. Corresponding apparatuses and computer program products are also provided.
    Type: Grant
    Filed: April 5, 2010
    Date of Patent: September 9, 2014
    Assignee: McKesson Financial Holdings
    Inventors: Arien Malec, Aron Ralston
  • Patent number: 8825641
    Abstract: Measuring duplication in search results is described. In one example, duplication between a pair of results provided by an information retrieval system in response to a query is measured. History data for the information retrieval system is accessed and query data retrieved, which describes the number of times that users have previously selected either or both of the pair of results, and a relative presentation sequence of the pair of results when displayed at each selection. From the query data, a fraction of user selections is determined in which a predefined combination of one or both of the pair of results were selected for a predefined presentation sequence. From the fraction, a measure of duplication between the pair of results is found. In further examples, the information retrieval system uses the measure of duplication to determine an overall redundancy value for a result set, and controls the result display accordingly.
    Type: Grant
    Filed: November 9, 2010
    Date of Patent: September 2, 2014
    Assignee: Microsoft Corporation
    Inventors: Filip Radlinski, Paul Nathan Bennett, Emine Yilmaz
  • Patent number: 8818337
    Abstract: A method of organizing mobile content in a network environment is provided that includes providing pieces of mobile content on a database, selecting one of the pieces of mobile content, receiving a descriptor to be associated with the selected piece of mobile content from user activity, and associating the descriptor to the selected piece of mobile content.
    Type: Grant
    Filed: December 28, 2006
    Date of Patent: August 26, 2014
    Assignee: FunMobility, Inc.
    Inventors: Srivathsan Narasimhan, Eric F. Allen, Skot Leach, Hudson George, Lincoln Lydick, Yu-Jen Dennis Chen, Adam Lavine, Silvy Mathews
  • Patent number: 8819032
    Abstract: A computing device receives, over a network, information regarding word phrases (e.g., search terms) and determines longevity values associated with content built around the word phrases. The computing device selects, based on the longevity values, a first phrase from the word phrases. Content is built or created around the first phrase, and the built or created content is presented or published over a network such as the Internet.
    Type: Grant
    Filed: May 24, 2012
    Date of Patent: August 26, 2014
    Inventor: Byron William Reese
  • Patent number: 8812509
    Abstract: Systems, techniques, and machine-readable instructions for inferring attributes from search queries. In one aspect, a method includes receiving a description of a collection of search queries, inferring attributes of entities from the description of the collection of search queries, associating the inferred attributes with identifiers of entities characterized by the attributes, and making the associations of the attributes and entities available.
    Type: Grant
    Filed: November 2, 2012
    Date of Patent: August 19, 2014
    Assignee: Google Inc.
    Inventors: Alexandru Marius Pasca, Benjamin Van Durme
  • Patent number: 8812518
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining one or more first search results that were generated for a search query; determining a score associated with the first search results; revising the search query using a query revision rule; obtaining one or more second search results that were generated for the revised search query; determining a score associated with the second search results; and evaluating the query revision rule by comparing the score associated with the first search results with the score associated with the second search results.
    Type: Grant
    Filed: February 2, 2012
    Date of Patent: August 19, 2014
    Assignee: Google Inc.
    Inventors: Dan Popovici, Robert Spalek
  • Patent number: 8811596
    Abstract: An apparatus for evaluating an audio communication comprises a data store for storing a plurality of digital units representing a plurality of characterized aspects of historical audio communications. The characterized aspects include words and sets of words. The apparatus further comprises an associative memory unit coupled with the data store to create associations between entities representing the characterized aspects, and relate the entities and the frequency of occurrence of the entities to identify relationships with call handling categories and priorities requiring intervention, and an assessing unit coupled with the associative memory unit to indicate whether the audio communication contains any associative memory associations related to a call handling category and call handling priority requiring intervention.
    Type: Grant
    Filed: June 25, 2007
    Date of Patent: August 19, 2014
    Assignee: The Boeing Company
    Inventors: William G. Arnold, Leonard J. Quadracci
  • Patent number: 8799269
    Abstract: A processor-implemented method, system, and/or computer program product optimizes a search for data from documents. A processor receives an instruction to perform an initial map/reduce search for a specific set of data in documents from a first database. A synthetic event, which is a non-executable descriptor of the specific set of data in documents from the first database, is generated, and a revised map/reduce search for the synthetic event in a second database is conducted. The processor then returns a solution for the revised map/reduce search.
    Type: Grant
    Filed: January 3, 2012
    Date of Patent: August 5, 2014
    Assignee: International Business Machines Corporation
    Inventors: Robert R. Friedlander, James R. Kraemer
  • Patent number: 8799773
    Abstract: Phrases in the reviews that express sentiment about a particular aspect are identified. Reviewable aspects of the entity are also identified. The reviewable aspects include static aspects that are specific to particular types of entities and dynamic aspects that are extracted from the reviews of a specific entity instance. The sentiment phrases are associated with the reviewable aspects to which the phrases pertain. The sentiment expressed by the phrases associated with each aspect is summarized, thereby producing a summary of sentiment associated with each reviewable aspect of the entity. The summarized sentiment and associated phrases can be stored and displayed to a user as a summary description of the entity.
    Type: Grant
    Filed: March 19, 2008
    Date of Patent: August 5, 2014
    Assignee: Google Inc.
    Inventors: George Reis, Sasha Blair-Goldensohn, Ryan T. McDonald
  • Patent number: 8799260
    Abstract: Techniques are provided for identifying topics that are unassociated with a dominant URL. A set of keywords associated with a topic is identified. A search log is scanned to identify search queries associated with the set of keywords. The identified search queries are grouped into clusters. Clusters associated with similar URLs are merged to generate an extended seed query string. The extended seed query string is analyzed to determine whether it relates to an existing dominant URL. If the extended seed query string is determined to be unassociated with an existing dominant URL, a web page associated with the topic may be generated.
    Type: Grant
    Filed: December 17, 2010
    Date of Patent: August 5, 2014
    Assignee: Yahoo! Inc.
    Inventors: Panagiotis Papadimitriou, Prabhakar Krishnamurthy, Frederick Kenneth Schmidt
  • Patent number: 8793261
    Abstract: An information retrieval system including a natural language parser (3) for parsing documents of a document space (1) to identify key terms of each document based on linguistic structure, and for parsing a search query to determine the search term, a feature extractor (4) for determining an importance score for terms of the document space based on distribution of the terms in the document space, an index term generator (5) for generating index terms using the key terms identified by the parser and the extractor and having an importance score above a threshold level, and a query clarifier (16) for selecting from the index terms, on the basis of the search term, index terms for selecting a document from the document space. A speech recognition engine (12) generates the query, and a bi-gram language module (6) generates grammar rules for the speech recognition engine using the index terms.
    Type: Grant
    Filed: October 17, 2001
    Date of Patent: July 29, 2014
    Assignee: Telstra Corporation Limited
    Inventors: Jason Jiang, Bradford Craig Starkie, Bhavani Laxman Raskutti
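    The index-term step (keep key terms whose importance score exceeds a threshold) can be sketched as below. The abstract does not say how importance is computed from the term distribution; inverse document frequency is used here purely as an assumption, and the key terms, which the abstract obtains from a natural language parser, are simply passed in as a hypothetical set.

```python
import math
import re


def importance_scores(documents):
    """Assumed scoring: log(N / number of documents containing the term)."""
    n = len(documents)
    doc_freq = {}
    for text in documents:
        for term in set(re.findall(r"[a-z]+", text.lower())):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n / df) for term, df in doc_freq.items()}


def index_terms(key_terms, scores, threshold):
    """Keep the key terms whose importance score is above the threshold."""
    return {t for t in key_terms if scores.get(t, 0.0) > threshold}


docs = ["the phone bill", "the phone plan", "the broadband plan", "the modem"]
scores = importance_scores(docs)
terms = index_terms({"phone", "modem", "the"}, scores, threshold=1.0)
```

    With this toy document space only the rarest key term survives the threshold; the speech recognition and query clarification stages of the abstract are not modelled.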
  • Patent number: 8782042
    Abstract: Some embodiments provide a program that identifies an entity having an entity attribute. The program receives, from each method of several methods, a set of candidate identity attributes that are each for identifying a particular entity having the entity attribute specified in the document. Each method of the several methods generates the corresponding set of candidate identity attributes based on the entity attribute specified in a document. The program calculates a score for each candidate identity attribute in the sets of candidate identity attributes. The program identifies, based on the sets of scores, an identity attribute from the sets of candidate identity attributes that identifies the entity having the entity attribute specified in the document.
    Type: Grant
    Filed: October 14, 2011
    Date of Patent: July 15, 2014
    Assignee: Firstrain, Inc.
    Inventors: David Cooke, Martin Betz, Ashutosh Joshi, Binay Mohanty
  • Patent number: 8775441
    Abstract: In one aspect, in general, a method is described for managing an archive. The archive is used for determining approximate matches associated with strings occurring in records. The method includes processing records to determine a set of string representations that correspond to strings occurring in the records. The method also includes generating, for each of at least some of the string representations in the set, a plurality of close representations that are each generated from at least some of the same characters in the string. The method also includes storing entries in the archive. Each stored entry represents a potential approximate match between at least two strings based on their respective close representations.
    Type: Grant
    Filed: January 16, 2008
    Date of Patent: July 8, 2014
    Assignee: Ab Initio Technology LLC
    Inventor: Arlen Anderson
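    One way to picture the archive described here is a deletion-neighbourhood index. The sketch below assumes that a "close representation" is the string itself or the string with a single character deleted; the abstract only says the representations are built from some of the same characters, so that choice and the names `close_representations` and `build_archive` are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def close_representations(s: str) -> set:
    """The string itself plus every variant with exactly one character deleted."""
    return {s} | {s[:i] + s[i + 1:] for i in range(len(s))}


def build_archive(strings):
    """Map each pair of distinct strings to the close representations they share."""
    by_rep = defaultdict(set)
    for s in set(strings):
        for rep in close_representations(s):
            by_rep[rep].add(s)

    archive = defaultdict(set)
    for rep, members in by_rep.items():
        # Any two strings reachable from the same representation are a
        # potential approximate match.
        for a, b in combinations(sorted(members), 2):
            archive[(a, b)].add(rep)
    return dict(archive)


archive = build_archive(["smith", "smyth", "smit", "jones"])
```

    Each archive entry records a potential approximate match; a later pass could score or confirm the pairs with a stricter comparison.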
  • Patent number: 8762371
    Abstract: A system and methods and user interface are disclosed for searching documents based on conceptual association, and for ranking documents based on content characteristics. A computer processing system receives a query containing a word or phrase that is a name of an object or concept, and can also receive a value for a customized ranking parameter. A computer storage device stores a dataset related to the object or concept name, wherein the dataset includes a plurality of property names and can also include association-strength values. A computer processing system can count a first frequency of a first property name and count a second frequency of a second property name in a document containing text in a natural language, calculate a relevance score as a function of the first frequency and the second frequency, and rank the plurality of documents using their respective relevance scores, and return one or more documents to the user based on the ranking of the plurality of documents.
    Type: Grant
    Filed: January 27, 2013
    Date of Patent: June 24, 2014
    Inventor: Guangsheng Zhang
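    The ranking step can be illustrated with a small sketch. The abstract leaves the scoring function open ("a function of the first frequency and the second frequency"); the version below assumes a sum of property-name counts weighted by association strength, handles single-word property names only, and uses an invented dataset for the concept "computer".

```python
import re


def relevance_score(text: str, properties: dict) -> float:
    """Assumed score: sum of (occurrence count x association strength)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(tokens.count(name) * strength for name, strength in properties.items())


def rank_documents(documents: dict, properties: dict) -> list:
    """Return document ids ordered by descending score, ties broken by id."""
    scored = [(relevance_score(text, properties), doc_id)
              for doc_id, text in documents.items()]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical property names and association strengths for "computer".
properties = {"cpu": 0.9, "memory": 0.8, "keyboard": 0.5}
documents = {
    "d1": "The CPU and memory are on the board. The CPU runs hot.",
    "d2": "A keyboard is a peripheral.",
    "d3": "Nothing relevant here.",
}
ranking = rank_documents(documents, properties)
```

    The customized ranking parameter mentioned in the abstract could enter as an extra weight in `relevance_score`; it is omitted here.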
  • Patent number: 8762393
    Abstract: A method for clustering multi-dimensional data streams includes: (a) when data elements are input, determining 1-D subclusters and assigning identifiers to the determined 1-D subclusters; (b) generating a matching set that is a set of identifiers of the 1-D subclusters where each dimensional value of the data elements belongs to the range of the 1-D subclusters of the corresponding dimensions; and (c) determining subclusters by finding a set of frequently co-occurring 1-D subclusters among a set of 1-D subclusters that belong to the generated matching set. With the present invention, the processing time required to find the subclusters can be improved and the performance of the memory is further improved.
    Type: Grant
    Filed: September 22, 2009
    Date of Patent: June 24, 2014
    Assignee: Industry-Academic Cooperation Foundation, Yonsei University
    Inventor: Wong Suk Lee
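    A heavily simplified sketch of the matching-set idea follows. It assumes fixed-width intervals as the 1-D subclusters (the abstract determines them from the data) and counts only whole matching sets rather than mining every frequently co-occurring subset; the names `matching_set`, `frequent_subclusters`, `width` and `min_support` are hypothetical.

```python
from collections import Counter


def matching_set(point, width=1.0):
    """Identifiers (dimension, interval index) of the interval each coordinate falls in."""
    return frozenset((dim, int(value // width)) for dim, value in enumerate(point))


def frequent_subclusters(stream, width=1.0, min_support=2):
    """Keep the matching sets seen at least min_support times in the stream."""
    counts = Counter(matching_set(p, width) for p in stream)
    return {ms: n for ms, n in counts.items() if n >= min_support}


stream = [(0.2, 5.1), (0.7, 5.9), (3.4, 1.0), (0.5, 5.5)]
result = frequent_subclusters(stream)
```

    Three of the four toy points fall in interval 0 on the first dimension and interval 5 on the second, so that combination is the only one reported.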
  • Patent number: 8762389
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining one or more first search results that were generated for a search query; determining a score associated with the first search results; revising the search query using a query revision rule; obtaining one or more second search results that were generated for the revised search query; determining a score associated with the second search results; and evaluating the query revision rule by comparing the score associated with the first search results with the score associated with the second search results.
    Type: Grant
    Filed: February 2, 2012
    Date of Patent: June 24, 2014
    Assignee: Google Inc.
    Inventors: Dan Popovici, Robert Spalek
  • Patent number: 8745069
    Abstract: Methods for the automatic creation of a category tree with respect to the contents of a data stock, wherein a taxonomy of the data stock will be created on the base of co-occurrences. Another object of the present invention is furthermore a data processing system comprising data which represent information in at least one data stock which is accessible via at least one data source, which is designed and/or adapted to at least partially carry out a method according to the invention. Another object of the present invention is furthermore a data processing device for the electronic processing of data, comprising a control and/or computer unit, an input unit and an output unit, which is designed and/or adapted to at least partially carry out a method according to the invention, preferably using at least a part of a data processing system according to the invention.
    Type: Grant
    Filed: November 8, 2010
    Date of Patent: June 3, 2014
  • Patent number: 8744839
    Abstract: Target word recognition includes: obtaining a candidate word set and corresponding characteristic computation data, the candidate word set comprising text data, and characteristic computation data being associated with the candidate word set; performing segmentation of the characteristic computation data to generate a plurality of text segments; combining the plurality of text segments to form a text data combination set; determining an intersection of the candidate word set and the text data combination set, the intersection comprising a plurality of text data combinations; determining a plurality of designated characteristic values for the plurality of text data combinations; based at least in part on the plurality of designated characteristic values and according to at least a criterion, recognizing among the plurality of text data combinations target words whose characteristic values fulfill the criterion.
    Type: Grant
    Filed: September 22, 2011
    Date of Patent: June 3, 2014
    Assignee: Alibaba Group Holding Limited
    Inventors: Haibo Sun, Yang Yang, Yining Chen
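    The pipeline in this abstract (segment, combine, intersect, filter) can be sketched as below. Whitespace splitting stands in for segmentation, the characteristic value is assumed to be a plain occurrence count, and the criterion is assumed to be a minimum count; the abstract specifies none of these, and all names and data are hypothetical.

```python
from collections import Counter


def combine_segments(segments, max_len=3):
    """Count every run of 1..max_len adjacent segments, joined by a space."""
    counts = Counter()
    for i in range(len(segments)):
        for n in range(1, max_len + 1):
            if i + n <= len(segments):
                counts[" ".join(segments[i:i + n])] += 1
    return counts


def recognize_target_words(candidates, corpus, min_count=2):
    """Return candidates that occur as segment combinations at least min_count times."""
    counts = Counter()
    for line in corpus:
        counts.update(combine_segments(line.lower().split()))
    intersection = set(candidates) & set(counts)  # candidates seen in the data
    return {w for w in intersection if counts[w] >= min_count}


corpus = ["red bat sleeves sale", "buy bat sleeves online", "red shoes"]
candidates = {"bat sleeves", "red shoes", "blue hat"}
targets = recognize_target_words(candidates, corpus)
```

    Here "blue hat" drops out at the intersection step and "red shoes" fails the count criterion, leaving a single recognized target word.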
  • Patent number: 8745054
    Abstract: A system, a method, an apparatus, and a computer-readable medium are provided. Co-occurrences of words or terms in a group of text documents are determined. A score for each of the co-occurrences of words or terms is calculated. A graphic view is presented. The graphic view has nodes that include at least one word or term and edges that join at least two nodes and depict a relationship among the at least two nodes. A layout of the graphic view includes a minimum number of crossings of the edges.
    Type: Grant
    Filed: November 30, 2005
    Date of Patent: June 3, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Wen-Ling Hsu, Guy J. Jacobson, Ann Eileen Skudlark, Thomas Paul Ventimiglia
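    The first two steps (finding co-occurrences and scoring them) can be sketched as follows. The score is assumed to be the number of documents containing both words, and the edge threshold of 2 is arbitrary; the crossing-minimizing graph layout is not attempted.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrence_scores(documents):
    """Score each unordered word pair by the number of documents containing both words."""
    scores = Counter()
    for text in documents:
        words = sorted(set(re.findall(r"[a-z]+", text.lower())))
        scores.update(combinations(words, 2))
    return scores


docs = ["network outage report", "network outage ticket", "billing ticket"]
scores = cooccurrence_scores(docs)

# Candidate graph edges: word pairs that co-occur in at least two documents.
edges = [(a, b, n) for (a, b), n in scores.items() if n >= 2]
```

    Each tuple in `edges` names two nodes and the strength of the relationship between them, which is the input a graph layout stage would need.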