Based On Term Frequency Of Appearance Patents (Class 707/750)
  • Patent number: 8566323
    Abstract: Methods and apparatus teach a digital spectrum of a file. The digital spectrum is used to map a file's position. This position relative to another file's position reveals closest neighbors. When multiple such neighbors are arranged, first “patterns” of data are created that further define digital spectrums of new files. It is within this sorted new data that emergent relationships or second “patterns” are examined, according to the techniques for its underlying files, or “patterns of patterns.” Representatively, original files are stored on computing devices. If encoded, they have pluralities of symbols representing an underlying data stream of original bits of data. The original files are examined for relationships between each of the files. The original relationships are converted to new files. The new files are representatively encoded and examined for other relationships.
    Type: Grant
    Filed: December 29, 2009
    Date of Patent: October 22, 2013
    Assignee: Novell, Inc.
    Inventors: Scott A. Isaacson, Craig N. Teerlink, Nadeem A. Nazeer
  • Publication number: 20130275436
    Abstract: Various embodiments promote the discoverability of data that can be contained within a database. In one or more embodiments, data within a database is organized in a structure having a schema. The structure and data can be processed in a manner that renders one or more pseudo-documents each of which constitutes a sub-structure that can be indexed. Once produced and indexed, the pseudo-documents constitute a set of searchable objects each of which relationally points back to its associated structure within the database. Searches can now be performed against the pseudo-documents which, in turn, returns a set of search results. The set of search results can include multiple sub-sets of pseudo-documents, each sub-set of which is associated with a different structure.
    Type: Application
    Filed: April 11, 2012
    Publication date: October 17, 2013
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Lev Novik, John C. Platt
  • Patent number: 8559724
    Abstract: An apparatus and method for generating additional information about moving picture content, including: comparing image feature information about each image frame in moving picture content with image feature information about each image frame in web information, searching for an image frame in the moving picture content, the image frame matching the image frame in the web information, determining location information about the found image frame in the moving picture content, and generating additional information by use of the determined location information and the web information.
    Type: Grant
    Filed: February 24, 2010
    Date of Patent: October 15, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Yoon-hee Choi, Il-hwan Choi, Hee-seon Park
  • Publication number: 20130268482
    Abstract: Systems, methods, and computer-readable media for determining the Internet search popularity of an entity are provided. Embodiments of the present invention include receiving a group of Internet search records and assigning a popularity ranking based on the number of times an entity descriptor associated with an entity occurs within the group of Internet search records created over a designated time period. An entity descriptor is one or more terms commonly used to identify an entity. The trend in an entity's popularity rank may also be calculated. An entity's popularity rank and trend in popularity rank may be presented in a graph or in a list.
    Type: Application
    Filed: March 14, 2013
    Publication date: October 10, 2013
    Inventors: Tabreez Govani, Hugh Williams, Jamie Buckley, Nitin Agrawal, Andy Lam, Kenneth A. Moss
  • Publication number: 20130262481
    Abstract: A system and a method are disclosed for identifying video files on a webpage and streaming video files to a client device. A server receives browsing data including a uniform resource locator for a webpage and identifies missing videos on the webpage. The server identifies a source file for the missing videos, including identifying a location for each missing video. The server retrieves a thumbnail for each missing video and provides it to a client device. Additionally, the server transcodes the video file responsive to a user input provided by a user. The transcoded video is streamed to the client device.
    Type: Application
    Filed: May 10, 2013
    Publication date: October 3, 2013
    Applicant: Skyfire Labs, Inc.
    Inventors: Nitin Bhandari, Erik R. Swenson, Geoffrey Dale Benson, Ishika Paul, James Marzano, Jaime Heilpern, Robert Oberhofer, Michael Guzewicz, Vijay Kumar
  • Publication number: 20130246386
    Abstract: Systems are used for identifying key phrases within documents. These systems utilize tags and a tag index to determine what a document primarily relates to. For example, an integrated data flow and extract-transform-load pipeline crawls, parses and word breaks large corpuses of documents in database tables. Documents can be broken into tuples. The tuples can be sent to a heuristically based algorithm that uses statistical language models and weight plus cross-entropy threshold functions to summarize the document into its “top N” most statistically significant phrases. These systems can scale efficiently (e.g., linearly) and (potentially large numbers of) documents can be characterized by salient and relevant key phrases (tags).
    Type: Application
    Filed: March 11, 2013
    Publication date: September 19, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Sorin Gherman, Kunal Mukerjee
  • Patent number: 8533195
    Abstract: Electronic documents are retrieved from a database and/or from a network of servers. The documents are topic modeled in accordance with a Regularized Latent Semantic Indexing approach. The Regularized Latent Semantic Indexing approach may allow an equation involving an approximation of a term-document matrix to be solved in parallel by multiple calculating units. The equation may include terms that are regularized via an l1 norm and/or an l2 norm. The Regularized Latent Semantic Indexing approach may be applied to a set, or a fixed number, of documents such that the set of documents is topic modeled. Alternatively, the Regularized Latent Semantic Indexing approach may be applied to a variable number of documents such that, over time, the variable number of documents is topic modeled.
    Type: Grant
    Filed: June 27, 2011
    Date of Patent: September 10, 2013
    Assignee: Microsoft Corporation
    Inventors: Jun Xu, Hang Li, Nicholas Craswell
  • Publication number: 20130232154
    Abstract: Systems and methods of identifying and categorizing social network messages that are relevant to selected categories and text terms are provided. The frequency of text terms appearing in social network messages is calculated for multiple categories. Based on the calculated text term frequency, social network messages that match a provided set of text terms can be identified and/or categorized. The selection and/or association of text terms and categories is determined by repeatedly analyzing social network messages.
    Type: Application
    Filed: April 11, 2013
    Publication date: September 5, 2013
    Applicant: CitizenNet Inc.
    Inventors: Michael Aaron Hall, Daniel Benyamin, Aaron Chu
  • Patent number: 8515971
    Abstract: The present invention relates to a method for assisting a user in making a decision when comparing biometric data of an individual with data from a database relating to a large number of individuals. Biometric data is acquired for the individual concerned and encoded; the data items are compared in pairs with corresponding data from the database; for each comparison score, the ratio of duplicate occurrence frequency to non-duplicate occurrence frequency is established; the product of all the available ratios is calculated and standardized; the standardized ratio is compared to a preset threshold; the values greater than the preset threshold are kept; and this result is submitted to the user for validation as appropriate.
    Type: Grant
    Filed: November 2, 2006
    Date of Patent: August 20, 2013
    Assignee: Thales
    Inventor: Jean Beaudet
  • Patent number: 8515975
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using search entity transition probabilities. In some implementations, data identifying entities and transition probabilities between entities is stored in a computer readable medium. Each transition probability represents a strength of a relationship between a pair of entities as they are related in search history data. In some implementations, an increase in popularity for a query is identified and a different query is identified as temporally related to the query. Scoring data for documents responsive to the different query is modified to favor newer documents. In other implementations, data identifying a first session as spam is received, and a spam score is calculated for either a second session of queries or a single query using transition probabilities. The second session (or single query) is identified as spam from the spam score.
    Type: Grant
    Filed: December 7, 2009
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventor: Diego Federici
  • Patent number: 8515974
    Abstract: A method is presented for generating a list of frequently used words for an email application on a server computer. When a request is received for a word frequency list for emails stored in a user's mailbox, a word frequency list is returned if one exists. If the word frequency list does not exist, an asynchronous process is started on the server computer to generate a word frequency list. If the word frequency list exists but it is older than an aging limit, an asynchronous process is started on the server computer to regenerate the word frequency list. The word frequency list is stored in the user's mailbox along with a timestamp indicating the date and time that the list was created or updated.
    Type: Grant
    Filed: September 2, 2011
    Date of Patent: August 20, 2013
    Assignee: Microsoft Corporation
    Inventors: Ashish Consul, Suryanarayana M. Gorti, Michael Geoffrey Andrew Wilson, James C. Kleewein
  • Patent number: 8515973
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for identifying geographic features. In one aspect, a method includes receiving a query. Geographic features are identified, each geographic feature being associated with one or more names, each geographic feature being associated with at least one name that includes the query. A feature-query score is computed for each geographic feature, including: for each name of the geographic feature that includes the query, identifying a computed feature-name score, wherein the feature-name score is computed based on a count of a number of occurrences of the name in a query log, wherein each occurrence is attributed to the feature; and computing the feature-query score based on the identified feature-name scores. The geographic features are ranked according to the feature-query scores.
    Type: Grant
    Filed: March 25, 2011
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Radu Jurca, Anja Hauth, Ivan Zauharodneu, Matsvei Zhdanovich, Luuk Van Dijk, Steffen Meschkat, David E. Lecomte
  • Patent number: 8515972
    Abstract: A programmed computer receives one or more documents that contain text that is relevant to a user (“interest documents”). The programmed computer automatically identifies groups of words that match the interest documents. The matching word groups are ranked by a weight that is assigned based on how infrequently a word group matches a reference corpus and how frequently the word group matches one or more interest document(s), in comparison to other word groups. A set of word groups are automatically identified based on ranking, and displayed to a user to select documents from a corpus. Selected documents are displayed to the user, e.g. with one or more group of words used in selecting the documents.
    Type: Grant
    Filed: February 10, 2010
    Date of Patent: August 20, 2013
    Assignee: Python 4 Fun, Inc.
    Inventors: Devabhaktuni Srikrishna, Marc Coram
  • Patent number: 8510314
    Abstract: Methods, systems, and apparatus, including computer program products are provided for ranking distinct book content items based on implicit links to other distinct book content items. The implicit links are defined based on the identification of matching features in the distinct book content items. In some implementations, the matching features are uncommon phrases in textual content of the distinct book content items. Edges representing implicit links are generated between distinct nodes representing distinct book content items in a weighted graph. Search results for distinct book content items can be ordered based on the edges connected to the distinct nodes in the weighted graph that represent the distinct book content items.
    Type: Grant
    Filed: October 6, 2011
    Date of Patent: August 13, 2013
    Assignee: Google Inc.
    Inventors: Shumeet Baluja, Yushi Jing
  • Patent number: 8504563
    Abstract: Sorting inquiry results includes, based on extracted inquiry results matching search conditions of a user, collecting features of the inquiry results. The collected features may be used as features of a respective inquiry result and feature fitting may be conducted based on a support vector machine (SVM) regression model to obtain a feature fitting value of the respective inquiry result. The inquiry results may be sorted based on relevancy values of the inquiry results, and, for inquiry results having a same relevancy level, the inquiry results may be sorted in a top-down manner based on feature fitting values of the inquiry results.
    Type: Grant
    Filed: July 22, 2011
    Date of Patent: August 6, 2013
    Assignee: Alibaba Group Holding Limited
    Inventors: Chao Chen, Xiaomei Han
  • Patent number: 8504357
    Abstract: A related word presentation device includes a program information storage unit that stores program information of each program; and an information dividing unit that generates, for each of the attributes of the words included in the program information, at least one group which includes a reference word belonging to the attribute and a set of words which co-occur with the reference word in a program. A degree-of-relevance calculating unit stores attribute-based association dictionaries each of which indicates, for the corresponding attribute of words, (i) the words and (ii) the degrees of relevance between the words calculated based on the frequency of co-occurrence in each of groups. A search condition obtaining unit obtains the search word and the attribute; a substitute word obtaining unit selects substitute words from the attribute-based association dictionary for the obtained attribute; and an output unit presents the selected substitute word.
    Type: Grant
    Filed: July 30, 2008
    Date of Patent: August 6, 2013
    Assignee: Panasonic Corporation
    Inventors: Takashi Tsuzuki, Satoshi Matsuura, Kazutoyo Takata
  • Patent number: 8504564
    Abstract: A method, apparatus and computer program product provides for a semantic analyzer to produce and rank semantic terms to reflect their relationship to the theme and topics of a document. The text and the document can have no relationship to any pre-selected keywords before the semantic analyzer performs text extraction. The semantic analyzer extracts text from a document and performs semantic analysis on the extracted text. The semantic analyzer provides a plurality of ranked semantic terms as a result of the semantic analysis and associates semantic terms with the document as semantic keywords. The semantic terms define content to be presented with the document where the content is an advertisement, a link to a remote information resource or a second document.
    Type: Grant
    Filed: December 15, 2010
    Date of Patent: August 6, 2013
    Assignee: Adobe Systems Incorporated
    Inventors: Walter Chang, Nadia Ghamrawi
  • Patent number: 8504578
    Abstract: A system, method and computer program product for identifying near and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M near and exact-duplicate documents are identified in the document collection.
    Type: Grant
    Filed: August 16, 2012
    Date of Patent: August 6, 2013
    Assignee: MSC Intellectual Properties B.V.
    Inventors: Johannes C. Scholtes, Siebe Bloembergen
  • Patent number: 8504561
    Abstract: Techniques are described herein for using intent to access a domain (i.e., domain intent) to provide more search results that correspond to the domain. For example, a rule may specify a maximum number of search results that are allowed to be provided from a domain (or a host that corresponds to the domain) in response to a search query. Each search query may include any number of ngrams. An ngram is a subsequence of elements in a sequence (e.g., a search query). An intent to access a domain may be determined based on one or more of the ngrams in a search query. A number of search results that correspond to a domain may be increased to be greater than the maximum number based on one or more of the ngrams that are included in the search query being associated with the intent to access the domain.
    Type: Grant
    Filed: September 2, 2011
    Date of Patent: August 6, 2013
    Assignee: Microsoft Corporation
    Inventors: Timothy C. Hoad, Deepak Vijaywargi, Yatharth Saraf
  • Patent number: 8489617
    Abstract: Disclosed are systems for, and methods of, automatically detecting and treating field values of a particular field as null field values in records of a database. The system and method provide automatic treatment of these field values as null field values by calculating a critical frequency for the field. Based on the critical frequency of the field, the system and method treats field values that occur more than the critical frequency of the field as null field values and treats field values that occur less than the critical frequency as non-null field values.
    Type: Grant
    Filed: June 5, 2012
    Date of Patent: July 16, 2013
    Assignee: LexisNexis Risk Solutions FL Inc.
    Inventor: David Alan Bayliss
  • Publication number: 20130173568
    Abstract: Methods and systems are provided that may be utilized to generate website link suggestions.
    Type: Application
    Filed: December 28, 2011
    Publication date: July 4, 2013
    Applicant: YAHOO! INC.
    Inventors: Vanja Josifovski, Evgeniy Gabrilovich, Bo Pang, Fernando Diaz, Jangwon Seo
  • Patent number: 8473498
    Abstract: A method of text analytics includes filtering a plurality of unfiltered records having unstructured data into at least a first group and a second group. The first group and the second group each include at least two records, and the first group is different from the second group. The method includes determining a first proportion of occurrence for a term by comparing a first number of records having at least one occurrence of the term in the first group to a first total number of records in the first group, determining a second proportion of occurrence for the term by comparing a second number of records having at least one occurrence of the term in the second group to a second total number of records in the second group, and comparing the first proportion of occurrence to the second proportion of occurrence to yield a resultant comparison occurrence.
    Type: Grant
    Filed: August 2, 2011
    Date of Patent: June 25, 2013
    Inventor: Tom H. C. Anderson
  • Publication number: 20130151538
    Abstract: An entity summarization system is described herein that mines the Internet and other data sources to provide answers to questions such as the relative sentiment of users towards various brands. The system uses a controlled vocabulary list describing a specific aspect of entities of interest. Given an entity name, the system scans the whole content corpus to collect statistics on the words that occur most frequently in the context of the entity name, taking into account proximity information, to produce a weighted list of vocabulary terms describing the entity. Two entities can be compared by normalizing and comparing their weighted term lists. In some embodiments, the system performs these procedures efficiently by leveraging an N-gram web model. Thus, the system provides an automated way to compare two entities to derive information about how users feel about the entities at any given time.
    Type: Application
    Filed: December 12, 2011
    Publication date: June 13, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Pavel Dmitriev, Wei Zhuang
  • Patent number: 8463827
    Abstract: Embodiments are directed towards identifying auto-folder tags for messages by using a combinational optimization approach of bi-clustering folder names and features of messages based on relationship strengths. The combinational optimization approach of bi-clustering, generally, groups a plurality of folder names and a plurality of features into one or more metafolders to optimize a cost. The cost is based on an aggregate of cut relationship strengths, where a cut results when a related folder name and feature are grouped in separate metafolders. Furthermore, the plurality of folder names and the plurality of features are obtained by monitoring actions of a plurality of users, where the folder names are user-generated folder names and the features are from a plurality of messages. The metafolders may be used to tag new user messages with an auto-folder tag.
    Type: Grant
    Filed: January 4, 2011
    Date of Patent: June 11, 2013
    Assignee: Yahoo! Inc.
    Inventors: Vishwanath Tumkur Ramarao, Andrei Broder, Idan Szpektor, Edo Liberty, Yehuda Koren, Mark E. Risher, Yoelle Maarek Smadja
  • Patent number: 8458198
    Abstract: A term analyzer receives an ordered collection of text-based terms. The term analyzer analyzes groupings of consecutive text-based terms in the ordered collection to identify occurrences of different combinations of text-based terms. In addition, the term analyzer maintains frequency information representing the occurrences of the different combinations of text-based terms in the collection. The frequency information can then be used to determine relatively significant keywords and/or keyword phrases in the document. In an example configuration, the term analyzer creates a tree in which a first term in a given grouping of the groupings is defined as a parent node in the tree and a second term in the given grouping is defined as a child node of the parent node in the tree. The method of the analyzer generalizes to create a tree of multi-word terms in which the terms can be efficiently ranked by occurrence.
    Type: Grant
    Filed: December 5, 2011
    Date of Patent: June 4, 2013
    Assignee: Adobe Systems Incorporated
    Inventors: Michael J. Welch, Walter W. Chang
  • Patent number: 8452774
    Abstract: A method and system for splitting a text document into individual sentences using sentence boundary detection, and establishing co-relationships between terms which are present in the same sentence. A document corpus, or collection of text records, is provided, containing text with terms to be extracted. The text records in the document corpus are divided into individual sentences, using a set of rules for sentence boundary detection. The individual sentences are then analyzed to extract and correlate terms, such as parts and symptoms, symptoms and actions, or parts and failure modes. The correlated terms are then validated based on frequency of occurrence, with term pairs being considered valid if their frequency of occurrence exceeds a minimum frequency threshold. The validated term correlations can be used for fault model development, document classification, and document clustering.
    Type: Grant
    Filed: March 10, 2011
    Date of Patent: May 28, 2013
    Assignee: GM Global Technology Operations LLC
    Inventor: Dnyanesh Rajpathak
  • Publication number: 20130132407
    Abstract: Various embodiments of methods and apparatus for fitting a surface to a data set are disclosed. A frequency distribution of an input data set is determined. Determining the frequency distribution includes assigning each data point of the input data set to a category representing a value of a variable for the respective data point. Responsive to identifying one or more discontinuities of the frequency distribution, a continuous section of the frequency distribution is identified as a first data set. A first equation is fit to the first data set.
    Type: Application
    Filed: February 25, 2011
    Publication date: May 23, 2013
    Inventors: Balaji Krishnmurthy, Anubha Rastogi
  • Publication number: 20130124541
    Abstract: A method and system for collaborating tags in a bookmarking system, wherein the bookmarking system includes a plurality of tags applied to content items by a plurality of users, the method and system including examining all the tags that are applied to all the content items, determining whether two tags have been assigned to the same content item, and, if two tags have been assigned to the same content item, computing the relative strength of each of the two tags with respect to the other.
    Type: Application
    Filed: January 2, 2013
    Publication date: May 16, 2013
    Applicant: International Business Machines Corporation
    Inventor: International Business Machines Corporation
  • Patent number: 8442988
    Abstract: A cell-specific dictionary is applied adaptively to suitable cells, where the cell-specific dictionary subsequently optimizes the handling of frequency-partitioned multi-dimensional data. This includes improved data partitioning with super cells, or adjusting the resulting cells by sub-dividing very large cells and merging multiple small cells, both of which avoid a highly skewed data distribution across cells and improve query processing. In addition, more efficient encoding is taught within a cell when the number of distinct values that actually appear in that cell is much smaller than the size of the column dictionary.
    Type: Grant
    Filed: November 4, 2010
    Date of Patent: May 14, 2013
    Assignee: International Business Machines Corporation
    Inventors: Oliver Draese, Namik Hrle, Oliver Koeth, Tianchao Li, Vijayshankar Raman, Knut Stolze
  • Patent number: 8429176
    Abstract: The present invention is directed towards systems and methods for extending media annotations using collective knowledge. The method according to one embodiment of the present invention comprises receiving a plurality of content items and associated annotations. The method further normalizes the plurality of associated annotations and calculates pair frequencies for the plurality of associated annotations. The method then retrieves a plurality of alternative annotations and provides the plurality of alternative annotations.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: April 23, 2013
    Assignee: Yahoo! Inc.
    Inventors: Borkur Sigurbjornsson, Roelof van Zwol
  • Publication number: 20130091151
    Abstract: In accordance with disclosed embodiments, there are provided methods, systems, and apparatuses for performing time-partitioned collaborative filtering in an on-demand service environment including, for example, receiving as input, a plurality of access requests for data stored within the host organization and a corresponding plurality of actions for the data to which access is requested; accessing an input table having a time field, action field, item field, and agent field therein; recording time data and agent data for each of the received plurality of access requests and the corresponding plurality of actions; recording an item within the item field and an action within the action field for each of the received plurality of access requests and the corresponding plurality of actions based on the action performed on an item of the data to which access is requested; and analyzing the input table to generate one or more pairs of first actions and items to second actions and items and a time based score for eac
    Type: Application
    Filed: October 2, 2012
    Publication date: April 11, 2013
    Applicant: SALESFORCE.COM, INC.
    Inventor: Salesforce.com, Inc.
  • Publication number: 20130086086
    Abstract: A computer-readable recording medium stores a program causing a computer to execute an information generating process that includes tabulating an appearance frequency for each designated word in an object file group in which character strings are described; identifying, for each designated word and based on the appearance frequency tabulated for that word, a rank in descending order up to a target appearance rate for the designated words; detecting, in an object file selected from the object file group, specific designated words among the identified ranks; and generating, for each of the detected specific designated words, index information that indicates the presence or absence of the specific designated word in each object file of the object file group.
    Type: Application
    Filed: November 27, 2012
    Publication date: April 4, 2013
    Applicant: FUJITSU LIMITED
    Inventor: FUJITSU LIMITED
  • Publication number: 20130086085
    Abstract: A computer-readable recording medium has stored therein a program for causing a computer to execute an analysis support process that includes storing to a storage device, a name of a second process that is a process included among a plurality of processes called in response to execution of a program, the computer storing the name of the second process when a first process having a name that matches a keyword stored in a storage device is included among the processes.
    Type: Application
    Filed: August 13, 2012
    Publication date: April 4, 2013
    Applicant: FUJITSU LIMITED
    Inventor: Shingo KATO
  • Patent number: 8407233
    Abstract: A method and system for calculating a relevance between words using a document set is provided. The method of calculating the relevance between words based on a document set, includes: obtaining statistical information about the words based on at least one of the words, documents, a word classification of the words, and a document classification of the documents, wherein the words and the documents are included in the document set; standardizing the statistical information; and calculating the relevance between the words based on the standardized statistical information.
    Type: Grant
    Filed: December 10, 2007
    Date of Patent: March 26, 2013
    Assignee: NHN Business Platform Corporation
    Inventors: Ki Ho Song, Byoung Hak Kim, Min uk Kim, Tae Yeong Kwak
  • Patent number: 8407216
    Abstract: Embodiments of the present invention provide systems and methods for automatically generating tag terms (or tags) for objects in databases of a web site. The metadata of the objects (or data) of the web site are processed and parsed to automatically generate tag terms for the corresponding objects. Information (or data, or content) downloaded from the Internet often comes with metadata, which can exist in titles, descriptions, sources, and authors of the information, etc. The metadata of downloaded information can be processed and parsed to generate tag terms for the corresponding objects. The system can automatically generate tag terms for the data, which are stored as objects in the databases, and make the data (or objects) searchable. In addition, the automatically generated tag terms allow associated data to maintain their relationship. For example, data from the same sources, same authors, or same subjects can be identified based on the common tag terms.
    Type: Grant
    Filed: September 25, 2008
    Date of Patent: March 26, 2013
    Assignee: Yahoo! Inc.
    Inventors: Hubert M. Walker, Noel C. Morrison, Timothy Caplis, Scott Bedard, Ankarino S. Lara, Stephen James Blake
  • Patent number: 8402035
    Abstract: Exemplary embodiments are directed to determining a media value associated with mentions of an entity in one or more documents based on a sentiment attributed to the mentions of the entity and/or a frequency with which the entity is mentioned. Exemplary embodiments can include a media value engine that can identify mentions of an entity in documents, attribute sentiment to the mentions of the entity, determine a polarity of the sentiment, and calculate a media value attributed to the entity based on the sentiment.
    Type: Grant
    Filed: March 14, 2011
    Date of Patent: March 19, 2013
    Assignee: General Sentiment, Inc.
    Inventors: Greg Artzt, Mark Fasciano, Steve Skiena, Levon Lloyd
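    The media-value steps in patent 8402035 (find mentions, attribute sentiment, take its polarity, accumulate a value) can be sketched with a toy word lexicon. The lexicon, sentence splitting on periods, single-token entity names, and a fixed value per mention are all assumptions for illustration; `media_value` and `value_per_mention` are hypothetical names.

```python
# Hypothetical toy sentiment lexicon; a real engine would use a trained model or curated lexicon.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate"}


def media_value(documents, entity, value_per_mention=1.0):
    """Hypothetical sketch: returns (mention count, polarity-weighted media value)."""
    mentions = 0
    total = 0.0
    for doc in documents:
        for sentence in doc.lower().split("."):
            words = sentence.split()
            if entity.lower() not in words:
                continue  # only sentences that mention the entity count
            mentions += 1
            # Sentiment attributed to this mention: positive minus negative lexicon hits.
            score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
            polarity = (score > 0) - (score < 0)  # +1, 0 or -1
            total += polarity * value_per_mention
    return mentions, total


print(media_value(["Acme is great. Acme support is terrible.", "I love Acme."], "Acme"))  # (3, 1.0)
```

    Returning the mention count alongside the value keeps the frequency signal available separately, since the abstract allows the value to depend on sentiment, frequency, or both.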
  • Patent number: 8402032
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for correcting entity names. One method includes receiving texts and deriving a plurality of name-context pairs from the texts. The method further includes calculating a context consistency measure for each name-context pair and storing context-entity name data representing the name-context pairs. Another method includes identifying an entity name and one or more context terms from a query and generating candidate names for the entity name. The method further includes determining a score for each of the candidate names, selecting a number of top scoring candidate names, and using the selected candidate names to respond to the query.
    Type: Grant
    Filed: March 24, 2011
    Date of Patent: March 19, 2013
    Assignee: Google Inc.
    Inventors: Lawrence J. Brunsman, Matthieu Devin, Uri N. Lerner, Simon Tong
  • Patent number: 8402034
    Abstract: A method for providing content-level data artifact recommendations can begin with the creation of a semantic library from the textual content of data artifacts by a newsworthy content recommendation engine. A base newsworthiness rating can be calculated using global newsworthiness parameters and behavioral functions that model newsworthy influences for each relationship contained in the semantic library. A user-specific search network can be generated that represents user-entered criteria and/or user task-related criteria. Within the semantic library, potential newsworthy semantic networks can be identified. Newsworthy content from each identified potential newsworthy semantic network can be dynamically determined based upon the base newsworthiness rating and a predefined newsworthiness threshold. The newsworthy content from the identified potential newsworthy semantic network can be related to the user-specific search network at the common node, creating a newsworthy content recommendation graph.
    Type: Grant
    Filed: March 2, 2012
    Date of Patent: March 19, 2013
    Assignee: International Business Machines Corporation
    Inventors: Daniel John McCloskey, Marcello Trovati, Carol Sue Zimmet
  • Patent number: 8402036
    Abstract: Disclosed herein is a method, a system and a computer product for generating a snippet for an entity, wherein each snippet comprises a plurality of sentiments about the entity. One or more textual reviews associated with the entity are selected. A plurality of sentiment phrases are identified based on the one or more textual reviews, wherein each sentiment phrase comprises a sentiment about the entity. One or more sentiment phrases from the plurality of sentiment phrases are selected to generate a snippet.
    Type: Grant
    Filed: June 24, 2011
    Date of Patent: March 19, 2013
    Assignee: Google Inc.
    Inventors: Sasha Blair-Goldensohn, Kerry Hannan, Ryan T. McDonald, Tyler Neylon, Jeffrey C. Reynar
  • Patent number: 8402022
    Abstract: Tools and techniques for converging terms within a collaborative tagging environment are described herein. Methods for converging divergent contributions to the collaborative tagging environment may include receiving respective contributions from users within the environment. The methods may identify at least some of the contributions as divergent, and enable the users to converge the divergent contributions.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: March 19, 2013
    Inventors: Martin R. Frank, Walter Manching Tseng
  • Patent number: 8396879
    Abstract: One or more server devices may simultaneously calculate first ranking scores for a group of users and second ranking scores for a group of comments authored by the group of users. The calculating may occur during a same process. The one or more server devices may further provide one of a first ranked list that includes information identifying the group of users, the information identifying the group of users being ordered based on the first ranking scores, or a second group of comments of the group of comments, the comments in the second group of comments being ordered based on the second ranking scores.
    Type: Grant
    Filed: February 28, 2012
    Date of Patent: March 12, 2013
    Assignee: Google Inc.
    Inventors: Michal Cierniak, Na Tang
  • Patent number: 8392398
    Abstract: A method for executing a query on a graph data stream. The graph stream comprises data representing edges that connect vertices of a graph. The method comprises constructing a plurality of synopsis data structures based on at least a subset of the graph data stream. Each vertex connected to an edge represented within the subset of the graph data stream is assigned to a synopsis data structure such that each synopsis data structure represents a corresponding section of the graph. The method further comprises mapping each received edge represented within the graph data stream onto the synopsis data structure which corresponds to the section of the graph which includes that edge, and using the plurality of synopsis data structures to execute the query on the graph data stream.
    Type: Grant
    Filed: July 29, 2009
    Date of Patent: March 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Min Wang, Peixiang Zhao
  • Patent number: 8380718
    Abstract: A system and method for grouping similar documents is provided. Frequencies of occurrences are determined for terms and noun phrases within a set of documents. A subset of the documents is selected by removing those documents having terms and noun phrases that fall outside a bounded range of upper and lower conditions for frequency of occurrence. Each of the documents in the subset is mapped to a cluster of documents based on a similarity of the documents to the cluster documents.
    Type: Grant
    Filed: September 2, 2011
    Date of Patent: February 19, 2013
    Assignee: FTI Technology LLC
    Inventors: Dan Gallivan, Kenji Kawai
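    The two stages in patent 8380718 (drop terms whose frequency falls outside a bounded range, then map each remaining document to a cluster by similarity) can be sketched as below. Whitespace tokens stand in for terms and noun phrases, document frequency is used as the frequency of occurrence, and a greedy single-pass cosine clustering with a threshold is a swapped-in technique; `group_documents` and its `low`, `high`, and `threshold` defaults are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def group_documents(docs, low=2, high=3, threshold=0.5):
    """Hypothetical sketch: frequency-bounded filtering followed by greedy clustering."""
    vectors = [Counter(d.lower().split()) for d in docs]
    doc_freq = Counter(t for v in vectors for t in v)  # number of documents containing each term
    kept = {}
    for i, v in enumerate(vectors):
        band = Counter({t: c for t, c in v.items() if low <= doc_freq[t] <= high})
        if band:  # drop documents left with no in-band terms
            kept[i] = band
    clusters = []  # each cluster: (centroid Counter, list of member document ids)
    for i, vec in kept.items():
        best = max(clusters, key=lambda c: cosine(vec, c[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= threshold:
            best[0].update(vec)  # fold the document into the cluster centroid
            best[1].append(i)
        else:
            clusters.append((Counter(vec), [i]))
    return [members for _, members in clusters]


docs = ["the cat dog", "the cat dog fish", "the car road", "the car road", "the unique"]
print(group_documents(docs))  # [[0, 1], [2, 3]]
```

    In this example "the" appears in every document and falls above the upper bound, while "fish" and "unique" fall below the lower bound, so document 4 is removed before clustering.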
  • Publication number: 20130041906
    Abstract: A privacy-preserving system and method is disclosed for profiling clients within a system for knowledge management. The method of the present invention discloses steps for generating a client profile in support of receiving and processing messages using scoring techniques and/or filtering techniques. The method of the present invention further includes steps for generating a client profile in support of a method for generating and obtaining responses to messages using scoring techniques and/or filtering techniques. The system of the present invention includes all means for implementing the method.
    Type: Application
    Filed: August 7, 2012
    Publication date: February 14, 2013
    Inventors: Eytan Adar, Rajan Mathew Lukose, Joshua Rogers Tyler, Caesar Sengupta
  • Patent number: 8375036
    Abstract: Methods, systems, and apparatus, including computer program products are provided for ranking distinct book content items based on implicit links to other distinct book content items. The implicit links are defined based on the identification of matching features in the distinct book content items. In some implementations, the matching features are uncommon phrases in textual content of the distinct book content items. Edges representing implicit links are generated between distinct nodes representing distinct book content items in a weighted graph. Search results for distinct book content items can be ordered based on the edges connected to the distinct nodes in the weighted graph that represent the distinct book content items.
    Type: Grant
    Filed: November 17, 2011
    Date of Patent: February 12, 2013
    Assignee: Google Inc.
    Inventors: Shumeet Baluja, Yushi Jing
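    The implicit-link ranking in patent 8375036 can be sketched by treating word trigrams shared by only a few books as "uncommon phrases", adding a weighted edge for each shared phrase, and ordering books by weighted degree. Trigram shingles, the `max_books` cutoff, and weighted degree in place of a full graph-ranking algorithm are assumptions for illustration; `rank_books` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def shingles(text, n=3):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def rank_books(books, max_books=2, n=3):
    """Hypothetical sketch. books: dict of book id -> text; returns ids, best first."""
    phrase_owners = defaultdict(set)
    for book_id, text in books.items():
        for phrase in shingles(text, n):
            phrase_owners[phrase].add(book_id)
    weights = defaultdict(int)  # undirected edge (a, b) -> number of shared uncommon phrases
    for owners in phrase_owners.values():
        if 2 <= len(owners) <= max_books:  # shared, but uncommon across the collection
            for a, b in combinations(sorted(owners), 2):
                weights[(a, b)] += 1
    score = defaultdict(int)  # weighted degree of each node
    for (a, b), w in weights.items():
        score[a] += w
        score[b] += w
    return sorted(books, key=lambda b: (-score[b], b))


books = {
    "A": "red wheel barrow a silver moon",
    "B": "the red wheel barrow",
    "C": "a silver moon tonight",
    "D": "nothing shared here",
}
print(rank_books(books))  # ['A', 'B', 'C', 'D']
```

    A PageRank-style iteration over the same weighted graph would be a natural replacement for weighted degree when links should propagate importance.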
  • Patent number: 8368918
    Abstract: Methods and apparatus to identify images in print advertisements are disclosed. An example method comprises computing a first image feature vector for a first presented image, comparing the first image feature vector to a second image feature vector, and when the first image feature vector matches the second image feature vector, storing printed-media information associated with the first presented image in a database record associated with the second image feature vector.
    Type: Grant
    Filed: September 14, 2007
    Date of Patent: February 5, 2013
    Assignee: The Nielsen Company (US), LLC
    Inventors: Kevin Deng, Alan Nguyen Bosworth
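    The compare-and-store flow in patent 8368918 can be sketched with a deliberately simple feature vector: a normalized histogram of grayscale intensities, matched by L1 distance under a tolerance. The histogram feature, the L1 metric, the tolerance, and the in-memory list standing in for the database are assumptions for illustration; `feature_vector` and `record_if_match` are hypothetical names.

```python
def feature_vector(pixels, bins=8):
    """Toy feature vector: normalized histogram of 0-255 grayscale intensities."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]


def record_if_match(pixels, media_info, database, tolerance=0.1):
    """database: list of dicts with a 'vector' and a 'sightings' list (hypothetical layout)."""
    vec = feature_vector(pixels)
    for record in database:
        distance = sum(abs(a - b) for a, b in zip(vec, record["vector"]))
        if distance <= tolerance:
            # Match: store the printed-media information with the matched record.
            record["sightings"].append(media_info)
            return record
    return None  # no stored feature vector matched


database = [{"vector": feature_vector([0, 0, 255, 255]), "sightings": []}]
record_if_match([0, 255, 0, 255], {"page": 12}, database)
print(database[0]["sightings"])  # [{'page': 12}]
```

    An intensity histogram ignores spatial layout, so a production system would use a more discriminative descriptor; the surrounding match-then-store logic stays the same.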
  • Patent number: 8370366
    Abstract: Embodiments of systems and methods for comparing attributes of a data record are presented herein. Broadly speaking, embodiments of the present invention generate a weight based on a comparison of the name (or other) attributes of data records. More particularly, embodiments of the present invention generate a weight based on a comparison of name attributes. More specifically, embodiments of the present invention may calculate an information score for each of two name attributes to be compared to get an average information score for the two name attributes. The two name attributes may then be compared against one another to generate a weight between the two attributes. This weight can then be normalized to generate a final weight between the two business name attributes.
    Type: Grant
    Filed: January 14, 2010
    Date of Patent: February 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Norm Adams, Scott Ellard, Scott Schumacher
  • Patent number: 8370347
    Abstract: A system is described for assessing information in natural language contents. A user interface receives an object name as a query term and a value for a customized ranking parameter from a user. A computer storage device stores an object-specific data set related to the object name, wherein the object-specific data set includes a plurality of property names and association-strength values. A computer processing system can count a first frequency of a first property name and a second frequency of a second property name in each of a plurality of documents containing text in a natural language, calculate a relevance score for each document as a function of the first frequency and the second frequency, rank the plurality of documents using their respective relevance scores, and return one or more documents to the user based on the ranking of the plurality of documents. The function is in part defined by the customized ranking parameter.
    Type: Grant
    Filed: February 17, 2012
    Date of Patent: February 5, 2013
    Inventor: Guangsheng Zhang
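    The scoring in patent 8370347 can be sketched as a sum over property names of association strength times term frequency, with the user's customized ranking parameter used as an exponent that dampens (below 1) or amplifies (above 1) repeated occurrences. The specific functional form and the exponent interpretation of the parameter are assumptions for illustration; `relevance`, `rank_documents`, and `alpha` are hypothetical names.

```python
from collections import Counter


def relevance(text, properties, alpha=1.0):
    """properties: property name -> association strength for the queried object name."""
    counts = Counter(text.lower().split())
    # alpha plays the role of the (assumed) customized ranking parameter.
    return sum(strength * counts[name] ** alpha for name, strength in properties.items())


def rank_documents(docs, properties, alpha=1.0, top_k=3):
    scored = sorted(docs, key=lambda d: relevance(d, properties, alpha), reverse=True)
    return scored[:top_k]


props = {"engine": 0.9, "wheels": 0.6}  # toy data set for the object name "car"
docs = ["wheels", "engine engine wheels", "a poem about love"]
print(rank_documents(docs, props))  # ['engine engine wheels', 'wheels', 'a poem about love']
```

    With `alpha=0.5`, the document mentioning "engine" twice gains less from the repetition, which is one way a user-tunable parameter can reshape the ranking.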
  • Patent number: 8364706
    Abstract: A system and a method of retrieving information is described. In a system according to the invention, software modules may be used to provide the user with information that is most likely to be the information desired.
    Type: Grant
    Filed: June 18, 2004
    Date of Patent: January 29, 2013
    Assignee: ZI Corporation of Canada, Inc.
    Inventor: Todd Garrett Simpson
  • Publication number: 20130007020
    Abstract: An exemplary embodiment of the present techniques extracts concepts and relationships from a text. Concepts may be generated from the text using singular value decomposition, and ranked based on a term weight and a distance metric. The concepts that are ranked above a particular threshold may be iteratively extracted, and the concepts may be merged to form larger concepts until the generation of concepts has stabilized. Relationships may be generated based on the concepts using singular value decomposition, then ranked based on various metrics. The relationships that are ranked above a particular threshold may be extracted.
    Type: Application
    Filed: June 30, 2011
    Publication date: January 3, 2013
    Inventors: Sujoy Basu, Sharad Singhal
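    The SVD-based concept generation in publication 20130007020 can be sketched by building a term-by-sentence count matrix, decomposing it, and reading each leading left singular vector as a concept made of its highest-weighted terms. Scaling term loadings by the singular value as the "term weight" is an assumption, and the distance metric, thresholding, iterative merging, and relationship extraction from the abstract are omitted; `extract_concepts` is a hypothetical name.

```python
import numpy as np


def extract_concepts(sentences, num_concepts=2, terms_per_concept=2):
    """Hypothetical sketch: top terms of the leading SVD dimensions of a term-sentence matrix."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            matrix[index[w], j] += 1
    u, singular_values, _ = np.linalg.svd(matrix, full_matrices=False)
    concepts = []
    for k in range(min(num_concepts, len(singular_values))):
        # Assumed term weight: magnitude of the term's loading scaled by the singular value.
        weights = np.abs(u[:, k]) * singular_values[k]
        top = np.argsort(weights)[::-1][:terms_per_concept]
        concepts.append([vocab[i] for i in top])
    return concepts


sentences = [
    "stock market rally",
    "stock market crash",
    "stock market news",
    "football match goal",
    "football match win",
]
print(extract_concepts(sentences))
```

    Here the finance sentences form the strongest latent dimension and the football sentences the second, so the two concepts come out as {stock, market} and {football, match}, though the order of terms within each concept is not guaranteed.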