Latent Semantic Index or Analysis (LSI or LSA) Patents (Class 707/739)
-
Publication number: 20130013612
Abstract: Certain example embodiments relate to techniques for analyzing documents. A plurality of documents/document portions are imported into a database, with at least some of the documents/document portions being structured and at least some being unstructured. The imported documents/document portions are organized into one or more collections. A selection of at least one of the one or more collections is made. An index of words and/or groups of words is built (and optionally refined in accordance with one or more predefined rules) based on each of the document or document portion in each selection. A document-word matrix is built (and optionally weighted using a semantic approach), with the matrix including a value indicative of a number of times each word and/or group of words in the index appears in each document/document portion. One or more clusters of documents are generated using the document-word matrix.
Type: Application
Filed: July 7, 2011
Publication date: January 10, 2013
Applicant: Software AG
Inventors: Klaus FITTGES, Khalid El Mansouri
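As a rough illustration of the document-word matrix described in this abstract, the following minimal sketch counts how often each indexed word appears in each document. The function name `build_document_word_matrix`, the whitespace tokenization, and the sample documents are assumptions made for illustration only; the abstract does not specify them, and the optional semantic weighting and the clustering step are not shown.

```python
from collections import Counter


def build_document_word_matrix(documents):
    """Return (index, matrix) for a list of document strings.

    index  -- sorted list of every distinct word (hypothetical tokenizer:
              lowercase, split on whitespace)
    matrix -- one row per document; each cell is the number of times the
              corresponding index word appears in that document
    """
    tokenized = [doc.lower().split() for doc in documents]
    index = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that are absent from the document.
        matrix.append([counts[word] for word in index])
    return index, matrix


# Illustrative data only.
docs = ["red apple red", "green apple", "green pear green"]
index, matrix = build_document_word_matrix(docs)
```

Each row of `matrix` could then be passed to any clustering routine; which one the applicant uses is not stated in the abstract.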
-
Patent number: 8352472
Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.
Type: Grant
Filed: March 2, 2012
Date of Patent: January 8, 2013
Assignee: CommVault Systems, Inc.
Inventors: Anand Prahlad, Jeremy Alan Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
-
Patent number: 8346775
Abstract: The different illustrative embodiments provide a method, a computer program product, and an apparatus for managing information. A request to store text in a table in a database is received. A determination is made as to whether a first collection of textual information having a first concept that is related to a second concept for the text is present in the database responsive to receiving the request containing the text. The text is associated with the first collection of textual information in the database responsive to a determination that the first collection of textual information in the database having the first concept that is related to the second concept for the text is present in the database. A second collection for the data with a third concept that is related to the second concept for the text within the degree of relatedness is created.
Type: Grant
Filed: August 31, 2010
Date of Patent: January 1, 2013
Assignee: International Business Machines Corporation
Inventors: Sandra K. Johnson, Grant D. Miller, Robert F. Pryor
-
Publication number: 20120330959
Abstract: A method for assessing a person's security risk includes receiving data from a plurality of disparate data sources in which at least two of the plurality of disparate data sources maintain their respective data in different manners. The method also includes identifying at least one item of data from at least two different data sources that correspond to a first real-world person. The method further includes merging the items from the at least two different data sources into a first record associated with the first real-world person. The method additionally includes identifying one or more relationships between the first real-world person and one or more other real-world people. The method also includes adding the identified one or more relationships to the first record associated with the first real-world person. The method further includes determining a level of risk associated with the first real-world person based on the first record.
Type: Application
Filed: June 27, 2011
Publication date: December 27, 2012
Applicant: Raytheon Company
Inventors: Donald R. Kretz, Roderic W. Paulk
-
Patent number: 8341158
Abstract: A computer-implemented method includes receiving a dataset representing a plurality of users, a plurality of items, and a plurality of ratings given to items by users; clustering the plurality of users into a plurality of user-groups such that at least one user belongs to more than one user-group; clustering the plurality of items into a plurality of item-groups such that at least one item belongs to more than one item-group; inducing a model describing a probabilistic relationship between the plurality of users, items, ratings, user-groups, and item-groups, the induced model defined by a plurality of model parameters; and predicting a rating of a user for an item using the induced model.
Type: Grant
Filed: November 21, 2005
Date of Patent: December 25, 2012
Assignees: Sony Corporation, Sony Electronics Inc.
Inventor: Chiranjit Acharya
-
Patent number: 8341159
Abstract: Methods, apparatus and systems are provided to generate from a set of training documents a set of training data and a set of features for a taxonomy of categories. In this generated taxonomy the degree of feature overlap among categories is minimized in order to optimize use with a machine-based categorizer. However, the categories still make sense to a human because a human makes the decisions regarding category definitions. In an example embodiment, for each category, a plurality of training documents selected using Web search engines is generated, the documents winnowed to produce a more refined set of training documents, and a set of features highly differentiating for that category within a set of categories (a supercategory) extracted. This set of training documents or differentiating features is used as input to a categorizer, which determines for a plurality of test documents the plurality of categories to which they best belong.
Type: Grant
Filed: April 12, 2007
Date of Patent: December 25, 2012
Assignee: International Business Machines Corporation
Inventor: Stephen C. Gates
-
Publication number: 20120323920
Abstract: A method for creating a semantically aggregated index in an indexer-agnostic index building system includes: extracting documents from a data source, each document including a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.
Type: Application
Filed: August 24, 2012
Publication date: December 20, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Asim V. Singh, Kevin B. Wang
-
Patent number: 8335791
Abstract: Tools and techniques are described herein for detecting synonyms and merging synonyms into search indexes. The tools provide methods that include receiving input documents for indexing into a search index file. The tools may compare parts of the input documents to parts of other documents already indexed into the search index file. The methods may also evaluate, based on these comparisons, whether the input document and the existing document are sufficiently similar to justify an inference that any dissimilar terms between the input document and the existing document are candidate synonyms. Other methods may include receiving requests to perform searches that include one or more input keywords. The method then searches for links to synonyms of the input keyword, and returns search results responsive to the input keyword and to the synonyms.
Type: Grant
Filed: December 28, 2006
Date of Patent: December 18, 2012
Assignee: Amazon Technologies, Inc.
Inventors: Michel L. Goldstein, Walter Manching Tseng, Randall Winston Puttick
-
Patent number: 8332416
Abstract: A specification establishing method for controlling semiconductor process, the steps includes: sampling a plurality of sample groups from a population, each sample group being a non-normal distribution; filtering the sample groups; summarizing the filtered sample groups to form a non-normal distribution diagram; getting a value-at-risk and a median by calculating from the non-normal distribution diagram; getting a critical value by calculating the value-at-risk and the median with a critical formula; getting a plurality of state values by calculating the filtered sample groups with a proportion formula; and getting an index value by calculating the non-normal distribution diagram with the proportion formula. Thus, the state values indicate the states of the sample groups are abnormal or not by comparing the state values to the index value.
Type: Grant
Filed: January 11, 2011
Date of Patent: December 11, 2012
Assignee: Inotera Memories, Inc.
Inventors: Cheng-Hao Chen, Yun-Zong Tian, Shih-Chang Kao, Yij Chieh Chu, Wei Jun Chen
-
Publication number: 20120310939
Abstract: In accordance with the teachings described herein, systems and methods are provided for clustering time series based on forecast distributions. A method for clustering time series based on forecast distributions may include: receiving time series data relating to one or more aspects of a physical process; applying a forecasting model to the time series data to generate forecasted values and confidence intervals associated with the forecasted values, the confidence intervals being generated based on distribution information relating to the forecasted values; generating a distance matrix that identifies divergence in the forecasted values, the distance matrix being generated based on the distribution information relating to the forecasted values; and performing a clustering operation on the plurality of forecasted values based on the distance matrix. The distance matrix may be generated using a symmetric Kullback-Leibler divergence algorithm.
Type: Application
Filed: June 6, 2011
Publication date: December 6, 2012
Inventors: Taiyeong Lee, David Rawlins Duling
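To make the distance-matrix step in this abstract more concrete, here is a minimal sketch of a symmetric Kullback-Leibler distance matrix. It assumes, purely for illustration, that each forecast distribution is Gaussian and is summarized by a (mean, standard deviation) pair, and it takes the symmetric divergence to be the sum of the two directed divergences (other conventions, such as the average, also exist). The function names and sample values are hypothetical and are not taken from the application.

```python
import math


def kl_gaussian(mean_p, std_p, mean_q, std_q):
    """Directed KL divergence KL(P || Q) between two univariate Gaussians."""
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2 * std_q ** 2)
            - 0.5)


def symmetric_kl(p, q):
    """Symmetric divergence, taken here as KL(P || Q) + KL(Q || P)."""
    return kl_gaussian(*p, *q) + kl_gaussian(*q, *p)


def distance_matrix(forecasts):
    """Pairwise symmetric KL distances between (mean, std) forecasts."""
    n = len(forecasts)
    return [[symmetric_kl(forecasts[i], forecasts[j]) for j in range(n)]
            for i in range(n)]


# Illustrative forecasts only: (forecasted mean, forecast standard deviation).
forecasts = [(10.0, 1.0), (10.5, 1.0), (20.0, 2.0)]
dist = distance_matrix(forecasts)
```

The resulting matrix is symmetric with a zero diagonal, so it could be handed to a standard distance-based clustering routine; the abstract does not say which clustering operation is used.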
-
Patent number: 8326836
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing time series information with search results. In one aspect, a method includes determining that a first query is indicative of a request for time series information; generating a cost estimate that quantifies one or more costs of including the time series information with one or more search results, each search result including a resource locator that references a corresponding resource determined to be responsive to the query; generating a benefit estimate; determining to generate the time series information when the benefit estimate is greater than the cost estimate and generating the time series information in response to the determination, wherein generating the time series information includes collecting responsive time series information from one or more resources; and determining to not generate the time series information when the cost estimate is greater than the benefit estimate.
Type: Grant
Filed: July 13, 2010
Date of Patent: December 4, 2012
Assignee: Google Inc.
Inventors: Geoffrey Roeder Pike, Luigi Semenzato
-
Publication number: 20120303610
Abstract: A system and method are provided for determining a dynamic relation tree based on images in an image collection. An example system includes a memory for storing computer executable instructions, and a processing unit for accessing the memory and executing the computer executable instructions. The computer executable instructions include an event classifier to classify main characters of images in an image collection as to an event identification based on events in which the main characters appear, wherein each main character is characterized as to at least one attribute; a relation determination engine to determine relation circles of the main characters; and a construction engine to construct a dynamic relation tree representative of relations among the main characters, where the dynamic relation tree provides representations of the positions of the main characters in the relation circles, and where views of the dynamic relation tree change when different time periods are specified.
Type: Application
Filed: May 25, 2011
Publication date: November 29, 2012
Inventor: Tong Zhang
-
Publication number: 20120290571
Abstract: Aggregation, analysis, and presentation of patent and business data in a common interface are described. The analysis includes techniques for evaluating a patent or patent application by examining claim-related information. These techniques include deriving unique signatures of individual claims and ascertaining scope of individual claims relative to other claims in a collection (such as claims found in a common class). The signature and scope of patent claims may be graphically depicted using various graphics elements in a user interface.
Type: Application
Filed: April 15, 2012
Publication date: November 15, 2012
Applicant: IP Street
Inventors: Lewis C. Lee, John Charles Vogel, Chad Eberle
-
Patent number: 8312005
Abstract: A semantically aware relational database management system includes suitable programming to relate attributes of the relational database to semantic equivalents of such attributes. In response to receiving a query, the relational database management system performs at least one semantically aware operation on the data in the relational database in order to determine what data is to be retrieved in response to the query. Results of the query presented to a user may include data derived from performing the semantically aware operations.
Type: Grant
Filed: December 31, 2009
Date of Patent: November 13, 2012
Assignee: SAP AG
Inventors: Maria E. Orlowska, Wasim Sadiq, Shazia Sadiq
-
Patent number: 8312021
Abstract: One embodiment of the present invention provides a system that builds an association tensor (such as a matrix) to facilitate document and word-level processing operations. During operation, the system uses terms from a collection of documents to build an association tensor, which contains values representing pair-wise similarities between terms in the collection of documents. During this process, if a given value in the association tensor is calculated based on an insufficient number of samples, the system determines a corresponding value from a reference document collection, and then substitutes the corresponding value for the given value in the association tensor. After the association tensor is obtained, a dimensionality reduction method is applied to compute a low-dimensional vector space representation for the vocabulary terms. Document vectors are computed as linear combinations of term vectors.
Type: Grant
Filed: September 16, 2005
Date of Patent: November 13, 2012
Assignee: Palo Alto Research Center Incorporated
Inventors: Irina Matveeva, Ayman Farahart
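The last sentence of this abstract, that document vectors are linear combinations of term vectors, can be illustrated with a short sketch. It assumes the low-dimensional term vectors already exist (the dimensionality reduction itself is not shown) and uses raw term counts as the combination weights, which is only one possible choice; the abstract does not state the weights. The function name and the two-dimensional sample vectors are hypothetical.

```python
def document_vector(term_counts, term_vectors):
    """Combine term vectors linearly, weighting each by its term count.

    term_counts  -- dict mapping term -> count in the document
    term_vectors -- dict mapping term -> low-dimensional vector (list of floats)
    """
    dims = len(next(iter(term_vectors.values())))
    vector = [0.0] * dims
    for term, count in term_counts.items():
        for i, component in enumerate(term_vectors[term]):
            vector[i] += count * component
    return vector


# Illustrative two-dimensional term vectors (assumed, not from the patent).
term_vectors = {
    "matrix": [1.0, 0.0],
    "tensor": [0.5, 0.5],
    "index": [0.0, 1.0],
}
doc = {"matrix": 2, "tensor": 2}
vec = document_vector(doc, term_vectors)
```

Because the document vector lives in the same space as the term vectors, documents and terms can be compared directly, for example with cosine similarity.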
-
Patent number: 8306983
Abstract: Representing in a database, a collection of items characterized by features. In a data processing system, determining a semantic space representations of the features across the collection. Each representation characterized by parameters and settings, and differing from each other by only one of: the value of one parameter, and the configuration of one setting. Determining, for each feature pair of a set of feature pairs, the relatedness of the first feature to the second feature in each semantic space representation. And representing the collection by the semantic space that provides the best aggregate relatedness across the set of feature pairs.
Type: Grant
Filed: October 26, 2009
Date of Patent: November 6, 2012
Assignee: Agilex Technologies, Inc.
Inventor: Roger B. Bradford
-
Patent number: 8301633
Abstract: Systems and methods for semantic search are provided. A corpus of information grouped into passages are indexed by semantic key terms generated from packed knowledge representations that document the semantic relationships of information within those passages. When a search is conducted, a query is similarly transformed into a packed knowledge representation that documents the semantic relationships from which semantic key terms are also generated. An inverted index relating the semantic key terms associated to the passages is searched using the semantic key terms generated from the query. A set of candidate passages is selected and refined by analysis of the semantic key terms and other information. The semantic representations associated with the set of candidate passages are then matched to the semantic representation of the query to determine a search result set.
Type: Grant
Filed: October 1, 2007
Date of Patent: October 30, 2012
Assignee: Palo Alto Research Center Incorporated
Inventor: Robert D. Cheslow
-
Publication number: 20120271828
Abstract: In one implementation, a method includes receiving a request for translation of one or more first keywords from a source language to a target language; and translating, using a machine translation process, the first keywords from the source language into a plurality of second keywords in the target language. The method can also include determining, by a computer system, frequencies with which each of the second keywords occur in a corpus associated with the target language. The method can further include selecting, by the computer system, a subset of the second keywords to use in the target language based on the determined frequencies of occurrence.
Type: Application
Filed: April 21, 2011
Publication date: October 25, 2012
Applicant: Google Inc.
Inventor: Mandayam Thondanur Raghunath
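The frequency-based selection step in this abstract might look roughly like the sketch below. It assumes the machine-translation step has already produced a list of candidate keywords and that the target-language corpus is available as a list of tokens; the function name `select_by_frequency`, the `keep` parameter, and the sample data are illustrative assumptions rather than details from the application.

```python
from collections import Counter


def select_by_frequency(candidates, corpus_tokens, keep=2):
    """Keep the `keep` candidate keywords that occur most often in the corpus."""
    counts = Counter(corpus_tokens)
    # sorted() is stable, so candidates with equal counts keep their input order.
    ranked = sorted(candidates, key=lambda word: counts[word], reverse=True)
    return ranked[:keep]


# Illustrative data only: candidate translations and a tiny target-language corpus.
corpus = "auto coche auto carro auto coche".split()
candidates = ["carro", "coche", "auto", "automóvil"]
selected = select_by_frequency(candidates, corpus)
```

A fixed cutoff is only one way to pick the subset; a minimum-frequency threshold would fit the abstract's wording equally well.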
-
Patent number: 8296297
Abstract: A content analysis and correlation service system can include a summary manager service for generating content correlation summaries, wherein the generated content correlation summaries are based on discovered content and analyzed content based on the discovered content. The system can include a content search manager service for generating the discovered content based on search criteria and correlation criteria and a semantic analysis service for generating the analyzed content based on the discovered content. The system can also include a data store for storing the generated content correlation summaries and a notification service for providing notifications based on the generated content correlation summaries.
Type: Grant
Filed: December 30, 2008
Date of Patent: October 23, 2012
Assignee: Novell, Inc.
Inventors: Tammy Green, Stephen R. Carter, Scott Alan Isaacson
-
Patent number: 8296302
Abstract: The present invention provides a method and system for extending content based on the semantic meaning of content. It divides content into multiple content regions and finds words and/or phrases that are semantically relevant to the current content region and appends these words and/or phrases to the current content region as extended content. The extended content matches semantically with the original content in such a seamless way that users may think it is a part of the content.
Type: Grant
Filed: May 4, 2009
Date of Patent: October 23, 2012
Inventor: Gang Qiu
-
Patent number: 8290958
Abstract: A system and method may be disclosed for facilitating the creation or modification of a document by providing a mechanism for locating relevant data from external sources and organizing and incorporating some or all of said data into the document. In the method for reusing data, there may be a set of documents that may be queried, where each document may be divided into a plurality of sections. A plurality of section text groups may be formed based on the set of documents, where each section text group may be associated with a respective section from the plurality of sections and each section group includes a plurality of items. Each item may be associated with a respective section from each document of the set of documents. A selected item within a selected section text group may be focused. The selected item may be extracted to a current document. The current document may be exported to a host application.
Type: Grant
Filed: May 30, 2003
Date of Patent: October 16, 2012
Assignee: Dictaphone Corporation
Inventors: Keith W. Boone, Sunitha Chaparala, Cameron Fordyce, Sean Gervais, Roubik Manoukian, Harry J. Ogrinc, Robert G. Titemore, Jeffrey G. Hopkins
-
Publication number: 20120259854
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium including receiving user interaction data, wherein the user interaction specifies user interactions with content items and conversion items. A conversion item is a user action that satisfies a predetermined conversion criteria. The method includes receiving conversion data including conversion path data for a plurality of conversion paths, wherein each conversion path includes user interaction data prior to and including a conversion event. The method includes determining a first interaction, an assist interaction or a last interaction with content items for the conversion event. The method includes providing an ability to define a segment, using a processor, the conversion path data based on path-level dimensions and path-level metrics.
Type: Application
Filed: April 11, 2011
Publication date: October 11, 2012
Inventors: Sissie Ling-Ie HSIAO, Cameron Tangney, Nicholas Seckar, Brian Chatham
-
Publication number: 20120259853
Abstract: Methods and systems for relating breaking news stories across content providers include receiving a breaking news headline for a breaking news from a content provider. The breaking news headline is tokenized in substantial real time by identifying a plurality of headline tokens. A plurality of news stories is received from a plurality of content providers. Each of the plurality of news stories is tokenized to identify a plurality of story tokens. The plurality of headline tokens and story tokens are analyzed to determine if one or more of the news stories are related to the breaking news headline. Based on the analysis, one or more of the news stories are mapped to the breaking news headline. The mapping enables presentation of the one or more news stories from one or more of the content providers while rendering the breaking news headline.
Type: Application
Filed: April 11, 2011
Publication date: October 11, 2012
Applicant: Yahoo!, Inc.
Inventors: Abhijit Khasnis, Subramanian Narayanan
-
Publication number: 20120259856
Abstract: A Website may be automatically categorized by (a) accepting Website information, (b) determining a set of scored clusters (e.g., semantic, term co-occurrence, etc.) for the Website using the Website information, and (c) determining at least one category (e.g., a vertical category) of a predefined taxonomy using at least some of the set of clusters.
Type: Application
Filed: June 20, 2012
Publication date: October 11, 2012
Inventors: David GEHRKING, Ching LAW, Andrew MAXWELL
-
Publication number: 20120259855
Abstract: In the provided document clustering system (100), a concept tree structure accumulation unit (11) stores a concept tree structure that represents a hierarchical relationship among concepts represented by each of a plurality of words. For any two words, a concept similarity computation unit (12) obtains a concept similarity, which is an index indicating how close the concepts represented by the two words are. Using concept similarities for words that appear in two documents in a document set, an inter-document similarity computation unit (13) obtains an inter-document similarity, which indicates how similar the two documents are semantically. A clustering unit (14) uses inter-document similarities to cluster the documents in the document set.
Type: Application
Filed: December 21, 2010
Publication date: October 11, 2012
Applicant: NEC CORPORATION
Inventors: Hironori Mizuguchi, Dai Kusui
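One way to picture the concept similarity and inter-document similarity described in this abstract is the sketch below. Everything specific in it is an assumption for illustration: the tiny concept tree, the choice of similarity as the inverse of one plus the tree path length between two words, and the use of a plain average over word pairs for the inter-document score. The application does not disclose these formulas in its abstract.

```python
# Hypothetical concept tree, stored as child -> parent (None marks the root).
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": None,
}


def ancestors(word):
    """Return the chain from `word` up to the root, inclusive."""
    chain = []
    while word is not None:
        chain.append(word)
        word = PARENT[word]
    return chain


def concept_similarity(a, b):
    """Similarity = 1 / (1 + path length through the lowest common ancestor)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            return 1.0 / (1 + steps_a + chain_b.index(node))
    return 0.0  # no shared ancestor (cannot happen in a single-rooted tree)


def document_similarity(doc_a, doc_b):
    """Average concept similarity over all word pairs from the two documents."""
    pairs = [(a, b) for a in doc_a for b in doc_b]
    return sum(concept_similarity(a, b) for a, b in pairs) / len(pairs)
```

With this tree, "dog" and "cat" (two steps apart via "mammal") score higher than "dog" and "sparrow" (four steps apart via "animal"), and the pairwise document scores could feed any similarity-based clustering method.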
-
Patent number: 8285719
Abstract: Relational clustering has attracted more and more attention due to its phenomenal impact in various important applications which involve multi-type interrelated data objects, such as Web mining, search marketing, bioinformatics, citation analysis, and epidemiology. A probabilistic model is presented for relational clustering, which also provides a principal framework to unify various important clustering tasks including traditional attributes-based clustering, semi-supervised clustering, co-clustering and graph clustering. The model seeks to identify cluster structures for each type of data objects and interaction patterns between different types of objects. Under this model, parametric hard and soft relational clustering algorithms are provided under a large number of exponential family distributions.
Type: Grant
Filed: August 10, 2009
Date of Patent: October 9, 2012
Assignee: The Research Foundation of State University of New York
Inventors: Bo Long, Zhongfei (Mark) Zhang
-
Patent number: 8285745
Abstract: Systems and methods to determine relevant keywords from a user's search query sessions are disclosed. The described method includes identifying search session logs of a user, segmenting the search session logs into one or more search sessions. After the segmentation, the search sessions are analyzed to compose a list of semantically relevant keyword sets including at least a first keyword set and a second keyword set. The described method further includes determining a semantic relevance between the first and second keyword sets according to the frequency at which the first and second keyword sets are reported in the query results and displaying one or more semantically high relevant keyword sets after being filtered by a threshold.
Type: Grant
Filed: August 31, 2007
Date of Patent: October 9, 2012
Assignee: Microsoft Corporation
Inventors: Hua Li, HuaJun Zeng, Jian Hu, Zheng Chen, Jian Wang
-
Patent number: 8280877
Abstract: Systems and methods for implementing diverse topic phrase extraction are disclosed. According to one implementation, multiple word candidate phrases are extracted from a corpus and weighed. One or more documents are re-weighed to identify less obvious candidate topics using latent semantic analysis (LSA). Phrase diversification is then used to remove redundancy and select informative and distinct topic phrases.
Type: Grant
Filed: September 21, 2007
Date of Patent: October 2, 2012
Assignee: Microsoft Corporation
Inventors: Benyu Zhang, Jilin Chen, Zheng Chen, HuaJun Zeng, Jian Wang
-
Patent number: 8275774
Abstract: A streaming query system for extensible markup language is provided. An XPath query translator receives and analyzes a user-input XPath document. An abstract syntax tree analyzer establishes an abstract syntax tree. A XML parser receives and parses an XML document. An index generator generates an index for the XML document. A computation module performs a format calculation based on the abstract syntax tree and the index, and generates a query result accordingly.
Type: Grant
Filed: July 23, 2010
Date of Patent: September 25, 2012
Assignee: National Taiwan University of Science and Technology
Inventors: Hahn-Ming Lee, Li-Zhen Liu, Chieh-Hung Lin, Jerome Yeh, Chia-Hsin Huang
-
Publication number: 20120239655
Abstract: A system for storing digital images and accessing and storing digital image information using a communication network includes a plurality of independently controlled digital storage repositories associated with one or more different authorization groups, wherein a first digital storage repository includes a first digital image with associated first semantic information and wherein a second digital storage repository in a common authorization group with the first digital storage repository includes a second digital image with associated second semantic information and an associated second category, and wherein the processor of the first digital storage repository uses its computer program to independently access and match the first semantic information with the second semantic information, to associate the second category with the first semantic information, and to store the second category in association with the first semantic information in the first digital storage repository.
Type: Application
Filed: March 15, 2011
Publication date: September 20, 2012
Inventors: Ronald Steven Cok, Joseph Anthony Manico
-
Patent number: 8271496
Abstract: A computer-readable medium stores computer-readable instructions that control a communication on a communication apparatus that obtains a content summary information having at least content location information from a server. The instructions cause the communication apparatus to perform steps. The steps include receiving a delivery source information inputted through a user operation, determining whether the delivery source information includes a predetermined character string. Content summary information corresponding to the inputted delivery source information is obtained when the determining step determines that the predetermined character string is not included in the delivery source information. Content summary information corresponding to a predetermined alternative delivery source information is obtained when the determining step determines that the predetermined character string is included in the delivery source information.
Type: Grant
Filed: September 30, 2009
Date of Patent: September 18, 2012
Assignee: Brother Kogyo Kabushiki Kaisha
Inventor: Yusaku Takahashi
-
Publication number: 20120233150
Abstract: Technologies pertaining to annotation aggregation are described herein. A user of a computing device assigns an annotation to a portion of a document, wherein the annotation comprises a tuple. The tuple comprises semantic relationships amongst words or phrases in the document. Relationship data is also generated, wherein the relationship data identifies the document, the author of the document, the author of the annotation, and other data.
Type: Application
Filed: March 11, 2011
Publication date: September 13, 2012
Applicant: Microsoft Corporation
Inventors: Oscar Gerardo Naim, Lucretia Henrica Vanderwende, Krist Wongsuphasawat
-
Patent number: 8260664
Abstract: Advertisements are selected for presentation on search result pages and web pages based on phrases generated from lateral concepts and topics identified for the search result pages and web pages. A search query or an indication of a web page is received for which advertisements are to be provided. Lateral concepts and topics are identified based on the search query or content of the web page. The lateral concepts and topics are used as phrases for selecting advertisements from an advertisement inventory. Selected advertisements are provided for presentation on a search results page in response to a search query or on a web page initially identified.
Type: Grant
Filed: February 5, 2010
Date of Patent: September 4, 2012
Assignee: Microsoft Corporation
Inventors: Viswanath Vadlamani, Abhinai Srivastava, Tarek Najm, Munirathnam Srikanth, Phani Vaddadi, Arungunram Chandrasekaran Surendran, Rajeev Prasad
-
Publication number: 20120221574
Abstract: A pivot is determined from enrolled data by a pivot determination unit, raw data is acquired, features are extracted from the raw data, a score is calculated as one of a distance and a degree of similarity between the features, an index vector is generated by using the score for the pivot, a ? score is calculated as one of a distance and a degree of similarity between the index vectors, a parameter of each non-pivot including a regression coefficient is trained by using training data, order to select the non-pivots is, by using the ? score between search data and the non-pivot as well as the regression coefficient, determined in descending order of posterior probability through logistic regression, and a search result is outputted based on the score between the search data and the enrolled data.
Type: Application
Filed: February 9, 2012
Publication date: August 30, 2012
Applicant: HITACHI, LTD.
Inventors: Takao Murakami, Kenta Takahashi
-
Publication number: 20120209851Abstract: An apparatus and a method manage a received mobile transaction coupon in a mobile terminal. The apparatus includes a communication unit, an information analyzer, a schedule manager, an output unit, and a controller. The communication unit receives a mobile transaction coupon. The information analyzer obtains the received mobile transaction coupon information. The schedule manager registers the obtained mobile transaction coupon information in an alarm program. The output unit outputs the registered mobile transaction coupon information on a relevant date via the alarm program. The controller controls to register the mobile transaction coupon information in the alarm program, and controls to store the received mobile transaction coupon in a storage area corresponding to a reception type or a folder for a widget function.Type: ApplicationFiled: February 9, 2012Publication date: August 16, 2012Applicant: SAMSUNG ELECTRONICS CO., LTD.Inventors: Byung-Kwon Kong, Soon-Mi Cho
-
Patent number: 8244701Abstract: Systems and methods for applying user behavior data to improve search query result ranking are provided. Upon receiving an update file indicating that recent, significant user behavior data is available for a document associated with an inverted index, the update file is published periodically and frequently to an index server. After filtering out the relevant update information from the update file, the index server extracts identifiers of the documents having the associated user behavior data. The update file and the identifier of the documents are utilized to update an in-memory index containing representations of metadata indicative of the user behavior. The in-memory index is continuously updated and utilized to serve search query results in response to user search queries. Search query results from the in-memory index are ranked using the user behavior data prior to serving. Thus, results associated with recent, significant user-behavior metadata receive prominent placement on the search results page.Type: GrantFiled: June 27, 2011Date of Patent: August 14, 2012Assignee: Microsoft CorporationInventors: Walter Sun, Jay Kumar Goyal, Pratibha Permandla, Yinzhe Yu, Jingfeng Li
-
Patent number: 8244700Abstract: Systems and methods for performing an updating process to an in-memory index are provided. Upon receiving notice of document modifications covered by an inverted index associated with a search engine, in the form of an update file, a representation of the modification is published onto various index serving machines. Each index serving machine receiving the update file determines if the modifications are applicable to the index serving machine. If an index serving machine determines that it contains mapping information corresponding to the modified documents, the index serving machine utilizes the update file and associated mapping information to update an in-memory index. In embodiments, the in-memory index is used to provide results to user queries in tandem with the inverted index. In some embodiments, an extra in-memory index is maintained that is revised with constantly incoming metadata updates and the existing in-memory index is periodically swapped with the revised in-memory index.Type: GrantFiled: February 12, 2010Date of Patent: August 14, 2012Assignee: Microsoft CorporationInventors: Pratibha Permandla, Yinzhe Yu, Guarav Sareen, Abhas Kumar
-
Patent number: 8234279Abstract: A streaming text data comparator performs real-time text data mining on streaming text data. The comparator receives a streaming text data document and generates a vector representation of the term frequencies relating to an existing document collection. The comparator then transforms the term frequency vector into a projection in a precomputed multidimensional subspace that represents the original document collection. The comparator further calculates a relationship value representing the similarities or differences between the vector representation and the subspace, and compares the relationship value to a predetermined threshold to determine whether the streaming text data document is related to the original document collection. If the streaming text data document is related, the streaming text data comparator intercalates the new document into the document collection. If the new document is not related, the comparator may store or delete the unrelated document.Type: GrantFiled: October 11, 2005Date of Patent: July 31, 2012Assignee: The Boeing CompanyInventors: Yuan-Jye Wu, Anne S-W Kao, Stephen R. Poteet, William Ferng, Robert E. Cranfill
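The comparison step in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the vocabulary, the orthonormal basis (standing in for the "precomputed multidimensional subspace"), the cosine-style relationship value, and the threshold of 0.8 are all assumptions made for illustration. In practice the basis would typically come from a truncated SVD of the original document-term matrix.

```python
import math

def term_frequencies(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in vocabulary]

def project(vec, basis):
    """Project vec onto the subspace spanned by the orthonormal rows of basis."""
    coords = [sum(v * b for v, b in zip(vec, row)) for row in basis]
    return [sum(c * row[i] for c, row in zip(coords, basis))
            for i in range(len(vec))]

def relationship_value(vec, basis):
    """Cosine between vec and its projection (1.0 means vec lies in the subspace)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return 0.0
    proj = project(vec, basis)
    # For an orthonormal basis the cosine reduces to |proj| / |vec|.
    return math.sqrt(sum(p * p for p in proj)) / norm

def is_related(text, vocabulary, basis, threshold=0.8):
    """Decide whether a streamed document belongs with the existing collection."""
    vec = term_frequencies(text, vocabulary)
    return relationship_value(vec, basis) >= threshold

# Hypothetical vocabulary and a hand-written two-dimensional orthonormal basis.
VOCABULARY = ["engine", "wing", "fuel", "recipe"]
S = 1.0 / math.sqrt(2.0)
BASIS = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, S, S, 0.0],
]

related = is_related("engine wing fuel", VOCABULARY, BASIS)
unrelated = is_related("recipe recipe", VOCABULARY, BASIS)
```

A document that passes the threshold would then be added to the collection; one that fails would be stored separately or discarded, as the abstract describes.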
-
Publication number: 20120179684Abstract: A computer program product for an indexer-agnostic index building system includes a computer readable storage medium to store a computer readable program, wherein the computer readable program, when executed on a computer, causes the computer to perform operations for creating a semantically aggregated index. The operations include: extracting documents from a data source, wherein each document includes a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.Type: ApplicationFiled: January 12, 2011Publication date: July 12, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Asim V. Singh, Kevin B. Wang
-
Publication number: 20120173532Abstract: According to one embodiment, a determination tree generating apparatus includes a determination unit, a condition generating unit, a determining unit, and a point branch generating unit. The determination unit provisionally and sequentially determines all component categories to be classification component categories for a first point of a determination tree. The point branch generating unit generates a first point assigned to a classification component category, and generates component names to be assigned to one or more branches leading from an assigned first point to one or more child points.Type: ApplicationFiled: March 15, 2012Publication date: July 5, 2012Applicant: KABUSHIKI KAISHA TOSHIBAInventor: Shigeta KUNINOBU
-
Publication number: 20120173531Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.Type: ApplicationFiled: March 2, 2012Publication date: July 5, 2012Applicant: COMMVAULT SYSTEMS, INC.Inventors: Anand Prahlad, Jeremy Alan Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
-
Patent number: 8214368Abstract: An extracting unit extracts keywords from metadata extracted from played scenes. An attaching unit attaches a semantic class to the keywords. A semantic class determining unit determines whether the semantic class is a should-be-played class. When there is a keyword with the should-be-played class attached, an acquiring unit acquires at least one keyword without having the should-be-played class as a should-be-observed keyword. When the metadata includes the should-be-observed keyword and a keyword to which a should-be-stopped class is attached, an appearance determining unit determines that a scene including the should-be-observed keyword appears in contents.Type: GrantFiled: September 22, 2008Date of Patent: July 3, 2012Assignee: Kabushiki Kaisha ToshibaInventors: Tomohiro Yamasaki, Takahiro Kawamura
-
Patent number: 8214367Abstract: Systems for recording, searching, and outputting display information are provided. In some embodiments, systems for recording display information are provided. The systems include a virtual display that: intercepts display-changes describing changes to be made to a state of a display; sends the display-changes to a client for display; records the display-changes; and a context recorder that records context information describing a state of the display derived from a source independently of the display changes and independently of screen-images. In some embodiments, the systems further include a display system that generates an output screen-image based at least in part on at least one of the display-changes and in response to a search of the context information. In some embodiments, the virtual display further records screen-images; and the display system further generates the output screen-image based at least in part on a recorded-screen-image of the recorded screen-images.Type: GrantFiled: February 27, 2008Date of Patent: July 3, 2012Assignee: The Trustees of Columbia University in the City of New YorkInventors: Ricardo Baratto, Oren Laadan, Dan Phung, Shaya Joseph Potter, Jason Nieh
-
Patent number: 8209321Abstract: Computer-readable media, computerized methods, and computer systems for conducting semantic processes to present search results that include highlighted regions which are relevant to a conceptual meaning of a query are provided. Initially, content of document(s) is accessed and semantic representations are derived by distilling linguistic representations from the content. These semantic representations may be stored at a semantic index. Also, a proposition is derived from the query by parsing search terms of the query, and distilling the proposition from the search terms. Typically, the proposition is a logical representation of the conceptual meaning of the query. The proposition is compared against the semantic representations at the semantic index to identify a matching set. Regions of the content within the document, from which the matching set of semantic representations are derived, are targeted.Type: GrantFiled: August 29, 2008Date of Patent: June 26, 2012Assignee: Microsoft CorporationInventors: Barney Pell, Scott Prevost, Giovanni Lorenzo Thione, Brendan O'Connor, Lukas Biewald
-
Patent number: 8204903Abstract: Semantic queries are expressed and executed within a relational database. This can be done by defining semantic rules applied to execute the semantic queries using table valued functions and common table expressions, and then simply calling the defined table valued functions to execute the queries.Type: GrantFiled: February 16, 2010Date of Patent: June 19, 2012Assignee: Microsoft CorporationInventors: Stuart M. Bowers, Thomas E. Jackson, Chris Demetrios Karkanias, Allen L. Brown, David G. Campbell, Brian S. Aust
-
Patent number: 8204736Abstract: A mechanism is provided for determining a second document of a set of documents in a second language having the same textual content as a first document in a first language. A first histogram that is indicative of the textual content of the first document is generated. A second histogram is generated for each document of the set of documents. Each second histogram is indicative of the textual content of a document of the set of documents. Each second histogram is compared with the first histogram to determine at least one histogram from the plurality of second histograms which matches the first histogram. The second document is then identified as the document having the at least one histogram.Type: GrantFiled: November 6, 2008Date of Patent: June 19, 2012Assignee: International Business Machines CorporationInventors: Ossama Emam, Ahmed Hassan, Hany M. Hassan
-
Patent number: 8200672Abstract: In a search support server, a related word extraction unit generates frequency information and co-occurrence information of keywords, a graph generation unit generates coordinate information of a spring graph including the keywords as nodes, on the basis of the co-occurrence information, a cluster generation unit groups the nodes into clusters and thereby generates cluster definition information, and a display information generation unit generates display information of the spring graph. In addition, an operation determination unit determines which operation is performed on the spring graph. Then, when a level change is instructed, the display information generation unit generates display information of the spring graph after the level is changed. When a node change is instructed, a cluster re-generation unit changes the cluster definition information and the frequency information.Type: GrantFiled: June 24, 2009Date of Patent: June 12, 2012Assignee: International Business Machines CorporationInventors: Noritaka Adachi, Shinya Kawanaka, Yoshitaka Matsumoto, Raymond Harry Putra Rudy
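As a rough illustration of the first step in the abstract above, the following sketch computes keyword frequency and co-occurrence counts. The function name, the input shape (one keyword list per document), and the choice to count each keyword at most once per document are assumptions, not details taken from the patent. Graph layout and clustering are not shown.

```python
from collections import Counter
from itertools import combinations

def keyword_statistics(documents):
    """Return (frequency, cooccurrence) counters for lists of keywords.

    frequency[k] is the number of documents containing keyword k.
    cooccurrence[(a, b)] is the number of documents containing both a and b,
    with each pair stored in sorted order so (a, b) and (b, a) share one key.
    """
    frequency = Counter()
    cooccurrence = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))
        frequency.update(unique)
        cooccurrence.update(combinations(unique, 2))
    return frequency, cooccurrence

docs = [
    ["lsa", "svd", "index"],
    ["lsa", "svd"],
    ["index", "query"],
]
frequency, cooccurrence = keyword_statistics(docs)
```

The co-occurrence counts could then serve as edge weights when laying out keywords as nodes of a graph.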
-
Patent number: 8195662Abstract: A density-based data clustering method, comprising a parameter-setting step, a first retrieving step, a first determination step, a second determination step, a second retrieving step, a third determination step and first and second termination determination steps. The parameter-setting step sets parameters. The first retrieving step retrieves one data point and defines neighboring points. The first determination step determines whether the number of the data points exceeds the minimum threshold value. The second determination step arranges a plurality of first border symbols. The second retrieving step retrieves one seed data point from the seed list, arranges a plurality of second border symbols and defines seed neighboring points. The third determination step determines whether a data point density of searching ranges of the seed neighboring points is the same. The first termination determination step determines whether the clustering is finished.Type: GrantFiled: January 6, 2010Date of Patent: June 5, 2012Assignee: National Pingtung University Of Science & TechnologyInventors: Cheng-Fa Tsai, Yi-Ching Huang
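For orientation, here is a minimal sketch of plain density-based clustering in the style of DBSCAN, which shares the radius, minimum-count threshold, and seed-list ideas mentioned in the abstract above. It is not the patented method: the border-symbol placement and the density comparison between seed neighbors are omitted, and the parameter names `eps` and `min_pts` are conventional choices rather than terms from the patent.

```python
import math

NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (including idx)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # core point: keep expanding
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Every seed point triggers its own neighborhood query here, which is the cost that variants such as the one in the abstract aim to reduce.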
-
Publication number: 20120136865Abstract: An approach is provided for determining and utilizing geographical locations contextually relevant to a user. A contextually relevant location platform determines location-based data associated with a user and/or user device. The contextually relevant location platform determines stationary points based, at least in part, on the location-based data. The contextually relevant location platform determines context data associated with the stationary points. The contextually relevant location platform determines at least one location anchor based, at least in part, on the stationary points and the associated context data, wherein the at least one location anchor represents a bounded geographical area of contextual relevance to the user.Type: ApplicationFiled: November 30, 2010Publication date: May 31, 2012Applicant: Nokia CorporationInventors: Jan Otto Blom, Gian Paolo Perrucci, Mats Lönngren, Juha Kalevi Laurila, Niko Tapani Kiukkonen, Julien Eberle, Daniel Gatica-Perez, Raul Montoliu-Colas, Julian Charles Nolan
-
Publication number: 20120124050Abstract: A system for harmonized commodity description and coding system (HS) code recommendation includes an ontology editor for creating an HS code ontology based on HS codes of export and import items, and a feature vector processor for extracting feature vectors of a product of a company requesting an HS code for the product, with reference to the description of the product, in response to the request. An HS code recommendation unit extracts one or more HS codes appropriate for the product by comparing the extracted feature vectors with feature vectors of the product searched from a feature vector database. The extracted HS codes are provided to the company requesting an HS code for the product.Type: ApplicationFiled: November 16, 2011Publication date: May 17, 2012Applicant: Electronics and Telecommunications Research InstituteInventors: Kyung-Ah YANG, Moonyoung CHUNG, Kyong-I KU