Latent Semantic Index or Analysis (LSI or LSA) Patents (Class 707/739)

TECHNIQUES FOR COMPARING AND CLUSTERING DOCUMENTS

Publication number: 20130013612

Abstract: Certain example embodiments relate to techniques for analyzing documents. A plurality of documents/document portions are imported into a database, with at least some of the documents/document portions being structured and at least some being unstructured. The imported documents/document portions are organized into one or more collections. A selection of at least one of the one or more collections is made. An index of words and/or groups of words is built (and optionally refined in accordance with one or more predefined rules) based on each document or document portion in each selection. A document-word matrix is built (and optionally weighted using a semantic approach), with the matrix including a value indicative of a number of times each word and/or group of words in the index appears in each document/document portion. One or more clusters of documents are generated using the document-word matrix.
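
The index, document-word matrix, and clustering steps named in this abstract can be illustrated with a minimal sketch. It is not the patented method: the tokenizer, the cosine similarity measure, the greedy single-pass clustering, the 0.5 threshold, and all function names are assumptions chosen for illustration, and the optional refinement and semantic weighting steps are omitted.

```python
from collections import Counter
import math
import re


def build_index(documents):
    """Return a sorted list of the distinct lowercase words in all documents."""
    words = set()
    for text in documents:
        words.update(re.findall(r"[a-z]+", text.lower()))
    return sorted(words)


def document_word_matrix(documents, index):
    """Row i, column j holds how often index[j] appears in documents[i]."""
    matrix = []
    for text in documents:
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        matrix.append([counts[word] for word in index])
    return matrix


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(matrix, threshold=0.5):
    """Greedy single-pass clustering (an assumed, simple strategy): a row joins
    the first cluster whose seed row is at least `threshold` similar to it,
    otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of row indices; element 0 is the seed
    for i, row in enumerate(matrix):
        for members in clusters:
            if cosine(row, matrix[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical sample documents, not taken from the patent.
docs = [
    "invoice payment invoice due",
    "payment due invoice",
    "server outage network outage",
    "network server outage restart",
]
index = build_index(docs)
matrix = document_word_matrix(docs, index)
clusters = cluster(matrix)
```

With these sample documents, the two billing texts share one cluster and the two infrastructure texts share another. A production system would likely weight the counts (for example with TF-IDF) before clustering.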

Type: Application

Filed: July 7, 2011

Publication date: January 10, 2013

Applicant: Software AG

Inventors: Klaus FITTGES, Khalid El Mansouri
Systems and methods for using metadata to enhance data identification operations

Patent number: 8352472

Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.

Type: Grant

Filed: March 2, 2012

Date of Patent: January 8, 2013

Assignee: CommVault Systems, Inc.

Inventors: Anand Prahlad, Jeremy Alan Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
Managing information

Patent number: 8346775

Abstract: The different illustrative embodiments provide a method, a computer program product, and an apparatus for managing information. A request to store text in a table in a database is received. A determination is made as to whether a first collection of textual information having a first concept that is related to a second concept for the text is present in the database responsive to receiving the request containing the text. The text is associated with the first collection of textual information in the database responsive to a determination that the first collection of textual information in the database having the first concept that is related to the second concept for the text is present in the database. A second collection for the data with a third concept that is related to the second concept for the text within the degree of relatedness is created.

Type: Grant

Filed: August 31, 2010

Date of Patent: January 1, 2013

Assignee: International Business Machines Corporation

Inventors: Sandra K. Johnson, Grant D. Miller, Robert F. Pryor
Method and Apparatus for Assessing a Person's Security Risk

Publication number: 20120330959

Abstract: A method for assessing a person's security risk includes receiving data from a plurality of disparate data sources in which at least two of the plurality of disparate data sources maintain their respective data in different manners. The method also includes identifying at least one item of data from at least two different data sources that correspond to a first real-world person. The method further includes merging the items from the at least two different data sources into a first record associated with the first real-world person. The method additionally includes identifying one or more relationships between the first real-world person and one or more other real-world people. The method also includes adding the identified one or more relationships to the first record associated with the first real-world person. The method further includes determining a level of risk associated with the first real-world person based on the first record.

Type: Application

Filed: June 27, 2011

Publication date: December 27, 2012

Applicant: Raytheon Company

Inventors: Donald R. Kretz, Roderic W. Paulk
User's preference prediction from collective rating data

Patent number: 8341158

Abstract: A computer-implemented method includes receiving a dataset representing a plurality of users, a plurality of items, and a plurality of ratings given to items by users; clustering the plurality of users into a plurality of user-groups such that at least one user belongs to more than one user-group; clustering the plurality of items into a plurality of item-groups such that at least one item belongs to more than one item-group; inducing a model describing a probabilistic relationship between the plurality of users, items, ratings, user-groups, and item-groups, the induced model defined by a plurality of model parameters; and predicting a rating of a user for an item using the induced model.

Type: Grant

Filed: November 21, 2005

Date of Patent: December 25, 2012

Assignees: Sony Corporation, Sony Electronics Inc.

Inventor: Chiranjit Acharya
Creating taxonomies and training data for document categorization

Patent number: 8341159

Abstract: Methods, apparatus and systems are provided to generate from a set of training documents a set of training data and a set of features for a taxonomy of categories. In this generated taxonomy the degree of feature overlap among categories is minimized in order to optimize use with a machine-based categorizer. However, the categories still make sense to a human because a human makes the decisions regarding category definitions. In an example embodiment, for each category, a plurality of training documents selected using Web search engines is generated, the documents winnowed to produce a more refined set of training documents, and a set of features highly differentiating for that category within a set of categories (a supercategory) extracted. This set of training documents or differentiating features is used as input to a categorizer, which determines for a plurality of test documents the plurality of categories to which they best belong.

Type: Grant

Filed: April 12, 2007

Date of Patent: December 25, 2012

Assignee: International Business Machines Corporation

Inventor: Stephen C. Gates
CREATING A SEMANTICALLY AGGREGATED INDEX IN AN INDEXER-AGNOSTIC INDEX BUILDING SYSTEM

Publication number: 20120323920

Abstract: A method for creating a semantically aggregated index in an indexer-agnostic index building system includes: extracting documents from a data source, each document including a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.

Type: Application

Filed: August 24, 2012

Publication date: December 20, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Asim V. Singh, Kevin B. Wang
Detecting synonyms and merging synonyms into search indexes

Patent number: 8335791

Abstract: Tools and techniques are described herein for detecting synonyms and merging synonyms into search indexes. The tools provide methods that include receiving input documents for indexing into a search index file. The tools may compare parts of the input documents to parts of other documents already indexed into the search index file. The methods may also evaluate, based on these comparisons, whether the input document and the existing document are sufficiently similar to justify an inference that any dissimilar terms between the input document and the existing document are candidate synonyms. Other methods may include receiving requests to perform searches that include one or more input keywords. The method then searches for links to synonyms of the input keyword, and returns search results responsive to the input keyword and to the synonyms.
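
The two ideas in this abstract, inferring candidate synonyms from near-identical documents and expanding a search through synonym links, might be sketched as follows. The Jaccard similarity test, the 0.6 threshold, the whitespace tokenizer, and the names `candidate_synonyms` and `search` are illustrative assumptions, not details from the patent.

```python
def candidate_synonyms(input_doc, existing_doc, threshold=0.6):
    """If two documents share enough terms, pair up the terms they do NOT share.

    Similarity is the Jaccard overlap of the two term sets (an assumption).
    Returns a set of (input_term, existing_term) candidate pairs.
    """
    a = set(input_doc.lower().split())
    b = set(existing_doc.lower().split())
    union = a | b
    if not union or len(a & b) / len(union) < threshold:
        return set()
    return {(x, y) for x in a - b for y in b - a}


def search(keyword, inverted_index, synonyms):
    """Return document ids matching the keyword or any linked synonym."""
    terms = {keyword} | synonyms.get(keyword, set())
    results = set()
    for term in terms:
        results |= inverted_index.get(term, set())
    return sorted(results)


# Hypothetical listings: identical except for one term.
pairs = candidate_synonyms("cheap red sofa for sale", "cheap red couch for sale")

# Merge the candidate pairs into a symmetric synonym table.
synonyms = {}
for x, y in pairs:
    synonyms.setdefault(x, set()).add(y)
    synonyms.setdefault(y, set()).add(x)

inverted_index = {"sofa": {1}, "couch": {2}, "table": {3}}
hits = search("sofa", inverted_index, synonyms)
```

Here a search for "sofa" also returns the document indexed under "couch". A real system would presumably validate candidates across many document pairs before treating them as synonyms.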

Type: Grant

Filed: December 28, 2006

Date of Patent: December 18, 2012

Assignee: Amazon Technologies, Inc.

Inventors: Michel L. Goldstein, Walter Manching Tseng, Randall Winston Puttick
Specification establishing method for controlling semiconductor process

Patent number: 8332416

Abstract: A specification establishing method for controlling semiconductor process, the steps includes: sampling a plurality of sample groups from a population, each sample group being a non-normal distribution; filtering the sample groups; summarizing the filtered sample groups to form a non-normal distribution diagram; getting a value-at-risk and a median by calculating from the non-normal distribution diagram; getting a critical value by calculating the value-at-risk and the median with a critical formula; getting a plurality of state values by calculating the filtered sample groups with a proportion formula; and getting an index value by calculating the non-normal distribution diagram with the proportion formula. Thus, the state values indicate the states of the sample groups are abnormal or not by comparing the state values to the index value.

Type: Grant

Filed: January 11, 2011

Date of Patent: December 11, 2012

Assignee: Inotera Memories, Inc.

Inventors: Cheng-Hao Chen, Yun-Zong Tian, Shih-Chang Kao, Yij Chieh Chu, Wei Jun Chen
Systems And Methods For Clustering Time Series Data Based On Forecast Distributions

Publication number: 20120310939

Abstract: In accordance with the teachings described herein, systems and methods are provided for clustering time series based on forecast distributions. A method for clustering time series based on forecast distributions may include: receiving time series data relating to one or more aspects of a physical process; applying a forecasting model to the time series data to generate forecasted values and confidence intervals associated with the forecasted values, the confidence intervals being generated based on distribution information relating to the forecasted values; generating a distance matrix that identifies divergence in the forecasted values, the distance matrix being generated based on the distribution information relating to the forecasted values; and performing a clustering operation on the plurality of forecasted values based on the distance matrix. The distance matrix may be generated using a symmetric Kullback-Leibler divergence algorithm.
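
As a rough illustration of the distance-matrix step, the sketch below computes a symmetric Kullback-Leibler divergence between forecast distributions. It assumes each forecast is summarized as a normal distribution given by a (mean, standard deviation) pair, which the abstract does not state, and it uses the closed-form KL divergence between two univariate normals. The function names and sample numbers are hypothetical.

```python
import math


def kl_normal(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate normals P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)


def symmetric_kl(p, q):
    """Symmetric divergence: KL(P || Q) + KL(Q || P). p and q are (mean, std) pairs."""
    return kl_normal(*p, *q) + kl_normal(*q, *p)


def distance_matrix(forecasts):
    """Pairwise symmetric KL divergence between all forecast distributions."""
    n = len(forecasts)
    return [[symmetric_kl(forecasts[i], forecasts[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical one-step-ahead forecasts for three series: (mean, std).
forecasts = [(100.0, 5.0), (102.0, 5.0), (150.0, 20.0)]
distances = distance_matrix(forecasts)
```

The first two series have similar forecast distributions, so their divergence is small compared with either one's divergence from the third. The resulting matrix could be passed to any clustering routine that accepts precomputed distances.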

Type: Application

Filed: June 6, 2011

Publication date: December 6, 2012

Inventors: Taiyeong Lee, David Rawlins Duling
Providing time series information with search results

Patent number: 8326836

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing time series information with search results. In one aspect, a method includes determining that a first query is indicative of a request for time series information; generating a cost estimate that quantifies one or more costs of including the time series information with one or more search results, each search result including a resource locator that references a corresponding resource determined to be responsive to the query; generating a benefit estimate; determining to generate the time series information when the benefit estimate is greater than the cost estimate and generating the time series information in response to the determination, wherein generating the time series information includes collecting responsive time series information from one or more resources; and determining to not generate the time series information when the cost estimate is greater than the benefit estimate.

Type: Grant

Filed: July 13, 2010

Date of Patent: December 4, 2012

Assignee: Google Inc.

Inventors: Geoffrey Roeder Pike, Luigi Semenzato
SYSTEM AND METHOD FOR DETERMINING DYNAMIC RELATIONS FROM IMAGES

Publication number: 20120303610

Abstract: A system and method are provided for determining a dynamic relation tree based on images in an image collection. An example system includes a memory for storing computer executable instructions, and a processing unit for accessing the memory and executing the computer executable instructions. The computer executable instructions include an event classifier to classify main characters of images in an image collection as to an event identification based on events in which the main characters appear, wherein each main character is characterized as to at least one attribute; a relation determination engine to determine relation circles of the main characters; and a construction engine to construct a dynamic relation tree representative of relations among the main characters, where the dynamic relation tree provides representations of the positions of the main characters in the relation circles, and where views of the dynamic relation tree change when different time periods are specified.

Type: Application

Filed: May 25, 2011

Publication date: November 29, 2012

Inventor: Tong Zhang
Evaluating Intellectual Property

Publication number: 20120290571

Abstract: Aggregation, analysis, and presentation of patent and business data in a common interface are described. The analysis includes techniques for evaluating a patent or patent application by examining claim-related information. These techniques include deriving unique signatures of individual claims and ascertaining scope of individual claims relative to other claims in a collection (such as claims found in a common class). The signature and scope of patent claims may be graphically depicted using various graphics elements in a user interface.

Type: Application

Filed: April 15, 2012

Publication date: November 15, 2012

Applicant: IP Street

Inventors: Lewis C. Lee, John Charles Vogel, Chad Eberle
Semantically aware relational database management system and related methods

Patent number: 8312005

Abstract: A semantically aware relational database management system includes suitable programming to relate attributes of the relational database to semantic equivalents of such attributes. In response to receiving a query, the relational database management system performs at least one semantically aware operation on the data in the relational database in order to determine what data is to be retrieved in response to the query. Results of the query presented to a user may include data derived from performing the semantically aware operations.

Type: Grant

Filed: December 31, 2009

Date of Patent: November 13, 2012

Assignee: SAP AG

Inventors: Maria E. Orlowska, Wasim Sadiq, Shazia Sadiq
Generalized latent semantic analysis

Patent number: 8312021

Abstract: One embodiment of the present invention provides a system that builds an association tensor (such as a matrix) to facilitate document and word-level processing operations. During operation, the system uses terms from a collection of documents to build an association tensor, which contains values representing pair-wise similarities between terms in the collection of documents. During this process, if a given value in the association tensor is calculated based on an insufficient number of samples, the system determines a corresponding value from a reference document collection, and then substitutes the corresponding value for the given value in the association tensor. After the association tensor is obtained, a dimensionality reduction method is applied to compute a low-dimensional vector space representation for the vocabulary terms. Document vectors are computed as linear combinations of term vectors.
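
The substitution step described in this abstract, replacing association values that rest on too few samples with values from a reference collection, can be sketched as below. The similarity measure (Jaccard overlap of document sets), the sample-count definition (document frequency of the rarer term), the `min_samples` threshold, and all names are assumptions made for illustration. The dimensionality reduction and document-vector steps are omitted.

```python
from itertools import combinations


def pair_stats(docs, a, b):
    """Return (samples, similarity) for terms a and b over docs (sets of terms).

    samples    = document frequency of the rarer term (assumed reliability proxy)
    similarity = Jaccard overlap of the documents containing a and those containing b
    """
    df_a = sum(1 for d in docs if a in d)
    df_b = sum(1 for d in docs if b in d)
    both = sum(1 for d in docs if a in d and b in d)
    either = df_a + df_b - both
    similarity = both / either if either else 0.0
    return min(df_a, df_b), similarity


def association_matrix(target_docs, reference_docs, vocab, min_samples=2):
    """Pairwise term similarities, falling back to the reference collection
    whenever the target collection offers fewer than min_samples samples."""
    matrix = {}
    for a, b in combinations(sorted(vocab), 2):
        samples, value = pair_stats(target_docs, a, b)
        if samples < min_samples:
            _, value = pair_stats(reference_docs, a, b)
        matrix[(a, b)] = matrix[(b, a)] = value
    return matrix


# Hypothetical collections; "auto" is too rare in the target collection.
target = [{"car", "engine"}, {"car", "engine"}, {"car", "wheel"}, {"auto"}]
reference = [{"auto", "car"}, {"auto", "car"}, {"auto", "engine"}, {"car", "wheel"}]
assoc = association_matrix(target, reference, {"auto", "car", "engine"})
```

In this example the ("car", "engine") value comes from the target collection, while both pairs involving "auto" are filled in from the reference collection.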

Type: Grant

Filed: September 16, 2005

Date of Patent: November 13, 2012

Assignee: Palo Alto Research Center Incorporated

Inventors: Irina Matveeva, Ayman Farahart
Semantic space configuration

Patent number: 8306983

Abstract: Representing in a database, a collection of items characterized by features. In a data processing system, determining a semantic space representations of the features across the collection. Each representation characterized by parameters and settings, and differing from each other by only one of: the value of one parameter, and the configuration of one setting. Determining, for each feature pair of a set of feature pairs, the relatedness of the first feature to the second feature in each semantic space representation. And representing the collection by the semantic space that provides the best aggregate relatedness across the set of feature pairs.

Type: Grant

Filed: October 26, 2009

Date of Patent: November 6, 2012

Assignee: Agilex Technologies, Inc.

Inventor: Roger B. Bradford
System and method for semantic search

Patent number: 8301633

Abstract: Systems and methods for semantic search are provided. A corpus of information grouped into passages are indexed by semantic key terms generated from packed knowledge representations that document the semantic relationships of information within those passages. When a search is conducted, a query is similarly transformed into a packed knowledge representation that documents the semantic relationships from which semantic key terms are also generated. An inverted index relating the semantic key terms associated to the passages is searched using the semantic key terms generated from the query. A set of candidate passages is selected and refined by analysis of the semantic key terms and other information. The semantic representations associated with the set of candidate passages are then matched to the semantic representation of the query to determine a search result set.

Type: Grant

Filed: October 1, 2007

Date of Patent: October 30, 2012

Assignee: Palo Alto Research Center Incorporated

Inventor: Robert D. Cheslow
Localized Translation of Keywords

Publication number: 20120271828

Abstract: In one implementation, a method includes receiving a request for translation of one or more first keywords from a source language to a target language; and translating, using a machine translation process, the first keywords from the source language into a plurality of second keywords in the target language. The method can also include determining, by a computer system, frequencies with which each of the second keywords occur in a corpus associated with the target language. The method can further include selecting, by the computer system, a subset of the second keywords to use in the target language based on the determined frequencies of occurrence.
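
The frequency-based selection step can be sketched in a few lines. The machine translation stage is stubbed out here: the candidate list, the sample corpus, the `keep` parameter, and the function name `select_translations` are hypothetical, and ties are broken alphabetically as an arbitrary choice.

```python
from collections import Counter


def select_translations(candidates, corpus_tokens, keep=2):
    """Keep the `keep` candidate keywords that occur most often in the corpus."""
    counts = Counter(corpus_tokens)
    # Sort by descending corpus frequency, then alphabetically for stable ties.
    ranked = sorted(candidates, key=lambda word: (-counts[word], word))
    return ranked[:keep]


# Hypothetical machine-translation output for the English keyword "car".
candidates = ["kraftfahrzeug", "wagen", "auto"]

# Hypothetical target-language corpus, already tokenized.
corpus = "auto kaufen auto mieten wagen auto kraftfahrzeug wagen".split()

selected = select_translations(candidates, corpus)
```

The selected subset favors the terms that the target-language corpus actually uses, which is the intent the abstract describes.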

Type: Application

Filed: April 21, 2011

Publication date: October 25, 2012

Applicant: Google Inc.

Inventor: Mandayam Thondanur Raghunath
Content analysis and correlation

Patent number: 8296297

Abstract: A content analysis and correlation service system can include a summary manager service for generating content correlation summaries, wherein the generated content correlation summaries are based on discovered content and analyzed content based on the discovered content. The system can include a content search manager service for generating the discovered content based on search criteria and correlation criteria and a semantic analysis service for generating the analyzed content based on the discovered content. The system can also include a data store for storing the generated content correlation summaries and a notification service for providing notifications based on the generated content correlation summaries.

Type: Grant

Filed: December 30, 2008

Date of Patent: October 23, 2012

Assignee: Novell, Inc.

Inventors: Tammy Green, Stephen R. Carter, Scott Alan Isaacson
Method and system for extending content

Patent number: 8296302

Abstract: The present invention provides a method and system for extending content based on the semantic meaning of content. It divides content into multiple content regions and finds words and/or phrases that are semantically relevant to the current content region and appends these words and/or phrases to the current content region as extended content. The extended content matches semantically with the original content in such a seamless way that users may think it is a part of the content.

Type: Grant

Filed: May 4, 2009

Date of Patent: October 23, 2012

Inventor: Gang Qiu
Method, system, and apparatus for data reuse

Patent number: 8290958

Abstract: A system and method may be disclosed for facilitating the creation or modification of a document by providing a mechanism for locating relevant data from external sources and organizing and incorporating some or all of said data into the document. In the method for reusing data, there may be a set of documents that may be queried, where each document may be divided into a plurality of sections. A plurality of section text groups may be formed based on the set of documents, where each section text group may be associated with a respective section from the plurality of sections and each section group includes a plurality of items. Each item may be associated with a respective section from each document of the set of documents. A selected item within a selected section text group may be focused. The selected item may be extracted to a current document. The current document may be exported to a host application.

Type: Grant

Filed: May 30, 2003

Date of Patent: October 16, 2012

Assignee: Dictaphone Corporation

Inventors: Keith W. Boone, Sunitha Chaparala, Cameron Fordyce, Sean Gervais, Roubik Manoukian, Harry J. Ogrinc, Robert G. Titemore, Jeffrey G. Hopkins
Conversion Path Based Segmentation

Publication number: 20120259854

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium including receiving user interaction data, wherein the user interaction specifies user interactions with content items and conversion items. A conversion item is a user action that satisfies a predetermined conversion criteria. The method includes receiving conversion data including conversion path data for a plurality of conversion paths, wherein each conversion path includes user interaction data prior to and including a conversion event. The method includes determining a first interaction, an assist interaction or a last interaction with content items for the conversion event. The method includes providing an ability to define a segment, using a processor, the conversion path data based on path-level dimensions and path-level metrics.

Type: Application

Filed: April 11, 2011

Publication date: October 11, 2012

Inventors: Sissie Ling-Ie HSIAO, Cameron Tangney, Nicholas Seckar, Brian Chatham
Real Time Association of Related Breaking News Stories Across Different Content Providers

Publication number: 20120259853

Abstract: Methods and systems for relating breaking news stories across content providers include receiving a breaking news headline for a breaking news from a content provider. The breaking news headline is tokenized in substantial real time by identifying a plurality of headline tokens. A plurality of news stories is received from a plurality of content providers. Each of the plurality of news stories is tokenized to identify a plurality of story tokens. The plurality of headline tokens and story tokens are analyzed to determine if one or more of the news stories are related to the breaking news headline. Based on the analysis, one or more of the news stories are mapped to the breaking news headline. The mapping enables presentation of the one or more news stories from one or more of the content providers while rendering the breaking news headline.
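
One simple way to picture the tokenize-and-map flow is the sketch below. The stopword list, the overlap measure (share of headline tokens found in the story), the 0.5 threshold, and the provider names are all assumptions. The abstract does not specify how tokens are analyzed.

```python
import re

# Small illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"a", "an", "the", "in", "of", "on", "to", "and", "for"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def related_stories(headline, stories, threshold=0.5):
    """Return providers whose story covers at least `threshold` of the headline tokens.

    `stories` is a list of (provider, story_text) pairs.
    """
    headline_tokens = tokenize(headline)
    if not headline_tokens:
        return []
    related = []
    for provider, text in stories:
        overlap = len(headline_tokens & tokenize(text)) / len(headline_tokens)
        if overlap >= threshold:
            related.append(provider)
    return related


# Hypothetical headline and stories.
headline = "Earthquake strikes northern Chile"
stories = [
    ("provider_a", "A strong earthquake hit northern Chile on Tuesday"),
    ("provider_b", "Stocks rally as markets open in Chile"),
    ("provider_c", "Chile earthquake: rescue teams head north"),
]
mapped = related_stories(headline, stories)
```

The returned providers are the ones whose stories would be shown alongside the breaking headline in this toy setup.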

Type: Application

Filed: April 11, 2011

Publication date: October 11, 2012

Applicant: Yahoo!, Inc.

Inventors: Abhijit Khasnis, Subramanian Narayanan
CATEGORIZING OBJECTS, SUCH AS DOCUMENTS AND/OR CLUSTERS, WITH RESPECT TO A TAXONOMY AND DATA STRUCTURES DERIVED FROM SUCH CATEGORIZATION

Publication number: 20120259856

Abstract: A Website may be automatically categorized by (a) accepting Website information, (b) determining a set of scored clusters (e.g., semantic, term co-occurrence, etc.) for the Website using the Website information, and (c) determining at least one category (e.g., a vertical category) of a predefined taxonomy using at least some of the set of clusters.

Type: Application

Filed: June 20, 2012

Publication date: October 11, 2012

Inventors: David GEHRKING, Ching LAW, Andrew MAXWELL
DOCUMENT CLUSTERING SYSTEM, DOCUMENT CLUSTERING METHOD, AND RECORDING MEDIUM

Publication number: 20120259855

Abstract: In the provided document clustering system (100), a concept tree structure accumulation unit (11) stores a concept tree structure that represents a hierarchical relationship among concepts represented by each of a plurality of words. For any two words, a concept similarity computation unit (12) obtains a concept similarity, which is an index indicating how close the concepts represented by the two words are. Using concept similarities for words that appear in two documents in a document set, an inter-document similarity computation unit (13) obtains an inter-document similarity, which indicates how similar the two documents are semantically. A clustering unit (14) uses inter-document similarities to cluster the documents in the document set.

Type: Application

Filed: December 21, 2010

Publication date: October 11, 2012

Applicant: NEC CORPORATION

Inventors: Hironori Mizuguchi, Dai Kusui
System and method for probabilistic relational clustering

Patent number: 8285719

Abstract: Relational clustering has attracted more and more attention due to its phenomenal impact in various important applications which involve multi-type interrelated data objects, such as Web mining, search marketing, bioinformatics, citation analysis, and epidemiology. A probabilistic model is presented for relational clustering, which also provides a principal framework to unify various important clustering tasks including traditional attributes-based clustering, semi-supervised clustering, co-clustering and graph clustering. The model seeks to identify cluster structures for each type of data objects and interaction patterns between different types of objects. Under this model, parametric hard and soft relational clustering algorithms are provided under a large number of exponential family distributions.

Type: Grant

Filed: August 10, 2009

Date of Patent: October 9, 2012

Assignee: The Research Foundation of State University of New York

Inventors: Bo Long, Zhongfei (Mark) Zhang
User query mining for advertising matching

Patent number: 8285745

Abstract: Systems and methods to determine relevant keywords from a user's search query sessions are disclosed. The described method includes identifying search session logs of a user, segmenting the search session logs into one or more search sessions. After the segmentation, the search sessions are analyzed to compose a list of semantically relevant keyword sets including at least a first keyword set and a second keyword set. The described method further includes determining a semantic relevance between the first and second keyword sets according to the frequency at which the first and second keyword sets are reported in the query results and displaying one or more semantically high relevant keyword sets after being filtered by a threshold.

Type: Grant

Filed: August 31, 2007

Date of Patent: October 9, 2012

Assignee: Microsoft Corporation

Inventors: Hua Li, HuaJun Zeng, Jian Hu, Zheng Chen, Jian Wang
Diverse topic phrase extraction

Patent number: 8280877

Abstract: Systems and methods for implementing diverse topic phrase extraction are disclosed. According to one implementation, multiple word candidate phrases are extracted from a corpus and weighed. One or more documents are re-weighed to identify less obvious candidate topics using latent semantic analysis (LSA). Phrase diversification is then used to remove redundancy and select informative and distinct topic phrases.

Type: Grant

Filed: September 21, 2007

Date of Patent: October 2, 2012

Assignee: Microsoft Corporation

Inventors: Benyu Zhang, Jilin Chen, Zheng Chen, HuaJun Zeng, Jian Wang
Streaming query system and method for extensible markup language

Patent number: 8275774

Abstract: A streaming query system for extensible markup language is provided. An XPath query translator receives and analyzes a user-input XPath document. An abstract syntax tree analyzer establishes an abstract syntax tree. A XML parser receives and parses an XML document. An index generator generates an index for the XML document. A computation module performs a format calculation based on the abstract syntax tree and the index, and generates a query result accordingly.

Type: Grant

Filed: July 23, 2010

Date of Patent: September 25, 2012

Assignee: National Taiwan University of Science and Technology

Inventors: Hahn-Ming Lee, Li-Zhen Liu, Chieh-Hung Lin, Jerome Yeh, Chia-Hsin Huang
DISTRIBUTED STORAGE AND METADATA SYSTEM

Publication number: 20120239655

Abstract: A system for storing digital images and accessing and storing digital image information using a communication network includes a plurality of independently controlled digital storage repositories associated with one or more different authorization groups, wherein a first digital storage repository includes a first digital image with associated first semantic information and wherein a second digital storage repository in a common authorization group with the first digital storage repository includes a second digital image with associated second semantic information and an associated second category, and wherein the processor of the first digital storage repository uses its computer program to independently access and match the first semantic information with the second semantic information, to associate the second category with the first semantic information, and to store the second category in association with the first semantic information in the first digital storage repository.

Type: Application

Filed: March 15, 2011

Publication date: September 20, 2012

Inventors: Ronald Steven Cok, Joseph Anthony Manico
Computer-readable media, communication apparatus, and communication system

Patent number: 8271496

Abstract: A computer-readable medium stores computer-readable instructions that control a communication on a communication apparatus that obtains a content summary information having at least content location information from a server. The instructions cause the communication apparatus to perform steps. The steps include receiving a delivery source information inputted through a user operation, determining whether the delivery source information includes a predetermined character string. Content summary information corresponding to the inputted delivery source information is obtained when the determining step determines that the predetermined character string is not included in the delivery source information. Content summary information corresponding to a predetermined alternative delivery source information is obtained when the determining step determines that the predetermined character string is included in the delivery source information.

Type: Grant

Filed: September 30, 2009

Date of Patent: September 18, 2012

Assignee: Brother Kogyo Kabushiki Kaisha

Inventor: Yusaku Takahashi
AGGREGATING DOCUMENT ANNOTATIONS

Publication number: 20120233150

Abstract: Technologies pertaining to annotation aggregation are described herein. A user of a computing device assigns an annotation to a portion of a document, wherein the annotation comprises a tuple. The tuple comprises semantic relationships amongst words or phrases in the document. Relationship data is also generated, wherein the relationship data identifies the document, the author of the document, the author of the annotation, and other data.

Type: Application

Filed: March 11, 2011

Publication date: September 13, 2012

Applicant: Microsoft Corporation

Inventors: Oscar Gerardo Naim, Lucretia Henrica Vanderwende, Krist Wongsuphasawat
Semantic advertising selection from lateral concepts and topics

Patent number: 8260664

Abstract: Advertisements are selected for presentation on search result pages and web pages based on phrases generated from lateral concepts and topics identified for the search result pages and web pages. A search query or an indication of a web page is received for which advertisements are to be provided. Lateral concepts and topics are identified based on the search query or content of the web page. The lateral concepts and topics are used as phrases for selecting advertisements from an advertisement inventory. Selected advertisements are provided for presentation on a search results page in response to a search query or on a web page initially identified.

Type: Grant

Filed: February 5, 2010

Date of Patent: September 4, 2012

Assignee: Microsoft Corporation

Inventors: Viswanath Vadlamani, Abhinai Srivastava, Tarek Najm, Munirathnam Srikanth, Phani Vaddadi, Arungunram Chandrasekaran Surendran, Rajeev Prasad
HIGH-ACCURACY SIMILARITY SEARCH SYSTEM

Publication number: 20120221574

Abstract: A pivot is determined from enrolled data by a pivot determination unit, raw data is acquired, features are extracted from the raw data, and a score is calculated as one of a distance and a degree of similarity between the features. An index vector is generated by using the score for the pivot, and a ? score is calculated as one of a distance and a degree of similarity between the index vectors. A parameter of each non-pivot, including a regression coefficient, is trained by using training data. The order in which the non-pivots are selected is determined, by using the ? score between search data and the non-pivot as well as the regression coefficient, in descending order of posterior probability through logistic regression, and a search result is outputted based on the score between the search data and the enrolled data.

Type: Application

Filed: February 9, 2012

Publication date: August 30, 2012

Applicant: HITACHI, LTD.

Inventors: Takao Murakami, Kenta Takahashi
APPARATUS AND METHOD FOR MANAGING MOBILE TRANSACTION COUPON INFORMATION IN MOBILE TERMINAL

Publication number: 20120209851

Abstract: An apparatus and a method manage a received mobile transaction coupon in a mobile terminal. The apparatus includes a communication unit, an information analyzer, a schedule manager, an output unit, and a controller. The communication unit receives a mobile transaction coupon. The information analyzer obtains the received mobile transaction coupon information. The schedule manager registers the obtained mobile transaction coupon information in an alarm program. The output unit outputs the registered mobile transaction coupon information on a relevant date via the alarm program. The controller controls to register the mobile transaction coupon information in the alarm program, and controls to store the received mobile transaction coupon in a storage area corresponding to a reception type or a folder for a widget function.

Type: Application

Filed: February 9, 2012

Publication date: August 16, 2012

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Byung-Kwon Kong, Soon-Mi Cho
Using behavior data to quickly improve search ranking

Patent number: 8244701

Abstract: Systems and methods for applying user behavior data to improve search query result ranking are provided. Upon receiving an update file indicating that recent, significant user behavior data is available for a document associated with an inverted index, the update file is published periodically and frequently to an index server. After filtering out the relevant update information from the update file, the index server extracts identifiers of the documents having the associated user behavior data. The update file and the identifier of the documents are utilized to update an in-memory index containing representations of metadata indicative of the user behavior. The in-memory index is continuously updated and utilized to serve search query results in response to user search queries. Search query results from the in-memory index are ranked using the user behavior data prior to serving. Thus, results associated with recent, significant user-behavior metadata receive prominent placement on the search results page.

Type: Grant

Filed: June 27, 2011

Date of Patent: August 14, 2012

Assignee: Microsoft Corporation

Inventors: Walter Sun, Jay Kumar Goyal, Pratibha Permandla, Yinzhe Yu, Jingfeng Li
Rapid update of index metadata

Patent number: 8244700

Abstract: Systems and methods for performing an updating process to an in-memory index are provided. Upon receiving notice of document modifications covered by an inverted index associated with a search engine, in the form of an update file, a representation of the modification is published onto various index serving machines. Each index serving machine receiving the update file determines if the modifications are applicable to the index serving machine. If an index serving machine determines that it contains mapping information corresponding to the modified documents, the index serving machine utilizes the update file and associated mapping information to update an in-memory index. In embodiments, the in-memory index is used to provide results to user queries in tandem with the inverted index. In some embodiments, an extra in-memory index is maintained that is revised with constantly incoming metadata updates and the existing in-memory index is periodically swapped with the revised in-memory index.

Type: Grant

Filed: February 12, 2010

Date of Patent: August 14, 2012

Assignee: Microsoft Corporation

Inventors: Pratibha Permandla, Yinzhe Yu, Guarav Sareen, Abhas Kumar
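
The update-and-swap scheme described in the abstract of patent 8244700 above can be illustrated with a small sketch. This is not the patented implementation: the class name `SwappableIndex`, the dictionary-based update file, and the single-threaded swap are all assumptions made for illustration. It shows an index serving machine that applies only the updates for documents it maps, writes them to an extra index, and periodically swaps that index in.

```python
class SwappableIndex:
    """Hypothetical index serving machine with a live and a staging index."""

    def __init__(self, owned_doc_ids):
        self.owned = set(owned_doc_ids)  # documents this machine has mappings for
        self.live = {}                   # index used to answer queries
        self.staging = {}                # extra index that receives updates

    def apply_update(self, update_file):
        """Apply an update file (assumed here: dict of doc_id -> metadata).

        Entries for documents this machine does not map are ignored.
        Returns the number of entries applied.
        """
        applied = 0
        for doc_id, metadata in update_file.items():
            if doc_id in self.owned:
                self.staging[doc_id] = metadata
                applied += 1
        return applied

    def swap(self):
        """Promote the staging index and start a fresh staging copy."""
        self.live = self.staging
        self.staging = dict(self.live)

    def lookup(self, doc_id):
        """Read metadata from the live index only."""
        return self.live.get(doc_id)


idx = SwappableIndex({"d1", "d2"})
applied = idx.apply_update({"d1": {"clicks": 5}, "d9": {"clicks": 1}})
before_swap = idx.lookup("d1")
idx.swap()
after_swap = idx.lookup("d1")
```

In this sketch queries never see a half-applied update, because reads go to `live` while writes go to `staging`. A real serving system would also need synchronization around the swap, which is omitted here.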
Streaming text data mining method and apparatus using multidimensional subspaces

Patent number: 8234279

Abstract: A streaming text data comparator performs real-time text data mining on streaming text data. The comparator receives a streaming text data document and generates a vector representation of the term frequencies relating to an existing document collection. The comparator then transforms the term frequency vector into a projection in a precomputed multidimensional subspace that represents the original document collection. The comparator further calculates a relationship value representing the similarities or differences between the vector representation and the subspace, and compares the relationship value to a predetermined threshold to determine whether the streaming text data document is related to the original document collection. If the streaming text data document is related, the streaming text data comparator intercalates the new document into the document collection. If the new document is not related, the comparator may store or delete the unrelated document.

Type: Grant

Filed: October 11, 2005

Date of Patent: July 31, 2012

Assignee: The Boeing Company

Inventors: Yuan-Jye Wu, Anne S-W Kao, Stephen R. Poteet, William Ferng, Robert E. Cranfill
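
The subspace test described in the abstract of patent 8234279 above can be sketched as follows. The abstract does not specify how the relationship value is computed, so this example assumes one plausible choice: the cosine between a term-frequency vector and its projection onto a precomputed subspace. The orthonormal basis, the four-term vocabulary, and the 0.8 threshold are all hypothetical.

```python
import math


def project(vec, basis):
    """Project vec onto the subspace spanned by orthonormal basis vectors."""
    coords = [sum(v * b for v, b in zip(vec, axis)) for axis in basis]
    return [sum(c * axis[i] for c, axis in zip(coords, basis))
            for i in range(len(vec))]


def relationship_value(vec, basis):
    """Cosine between vec and its projection (assumed relationship measure).

    For an orthonormal basis this equals |projection| / |vec|:
    1.0 means vec lies in the subspace, 0.0 means it is orthogonal to it.
    """
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return 0.0
    proj = project(vec, basis)
    return math.sqrt(sum(p * p for p in proj)) / norm


def is_related(vec, basis, threshold=0.8):
    """Compare the relationship value to a predetermined threshold."""
    return relationship_value(vec, basis) >= threshold


# Hypothetical vocabulary: ["engine", "wing", "fuel", "recipe"]
s = 1 / math.sqrt(2)
basis = [[s, s, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # assumed precomputed subspace
doc_a = [2, 2, 1, 0]  # uses only terms the collection covers
doc_b = [0, 0, 0, 3]  # uses only an unrelated term
```

With these inputs `doc_a` lies inside the subspace and `doc_b` is orthogonal to it, so only `doc_a` would be added to the collection. In a latent semantic analysis setting the basis would typically come from a truncated singular value decomposition of the term-document matrix, which is not shown here.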
SEMANTICALLY AGGREGATED INDEX IN AN INDEXER-AGNOSTIC INDEX BUILDING SYSTEM

Publication number: 20120179684

Abstract: A computer program product for an indexer-agnostic index building system includes a computer readable storage medium to store a computer readable program, wherein the computer readable program, when executed on a computer, causes the computer to perform operations for creating a semantically aggregated index. The operations include: extracting documents from a data source, wherein each document includes a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.

Type: Application

Filed: January 12, 2011

Publication date: July 12, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Asim V. Singh, Kevin B. Wang
DETERMINATION TREE GENERATING APPARATUS

Publication number: 20120173532

Abstract: According to one embodiment, a determination tree generating apparatus includes a determination unit, a condition generating unit, a determining unit, and a point branch generating unit. The determination unit provisionally and sequentially determines all component categories to be classification component categories for a first point of a determination tree. The point branch generating unit generates a first point assigned to a classification component category, and generates component names to be assigned to one or more branches leading from an assigned first point to one or more child points.

Type: Application

Filed: March 15, 2012

Publication date: July 5, 2012

Applicant: KABUSHIKI KAISHA TOSHIBA

Inventor: Shigeta KUNINOBU
SYSTEMS AND METHODS FOR USING METADATA TO ENHANCE DATA IDENTIFICATION OPERATIONS

Publication number: 20120173531

Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.

Type: Application

Filed: March 2, 2012

Publication date: July 5, 2012

Applicant: COMMVAULT SYSTEMS, INC.

Inventors: Anand Prahlad, Jeremy Alan Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
Device, method, and computer-readable recording medium for notifying content scene appearance

Patent number: 8214368

Abstract: An extracting unit extracts keywords from metadata extracted from played scenes. An attaching unit attaches a semantic class to the keywords. A semantic class determining unit determines whether the semantic class is a should-be-played class. When there is a keyword with the should-be-played class attached, an acquiring unit acquires at least one keyword without having the should-be-played class as a should-be-observed keyword. When the metadata includes the should-be-observed keyword and a keyword to which a should-be-stopped class is attached, an appearance determining unit determines that a scene including the should-be-observed keyword appears in contents.

Type: Grant

Filed: September 22, 2008

Date of Patent: July 3, 2012

Assignee: Kabushiki Kaisha Toshiba

Inventors: Tomohiro Yamasaki, Takahiro Kawamura
Systems, methods, means, and media for recording, searching, and outputting display information

Patent number: 8214367

Abstract: Systems for recording, searching, and outputting display information are provided. In some embodiments, systems for recording display information are provided. The systems include a virtual display that: intercepts display-changes describing changes to be made to a state of a display; sends the display-changes to a client for display; records the display-changes; and a context recorder that records context information describing a state of the display derived from a source independently of the display changes and independently of screen-images. In some embodiments, the systems further include a display system that generates an output screen-image based at least in part on at least one of the display-changes and in response to a search of the context information. In some embodiments, the virtual display further records screen-images; and the display system further generates the output screen-image based at least in part on a recorded-screen-image of the recorded screen-images.

Type: Grant

Filed: February 27, 2008

Date of Patent: July 3, 2012

Assignee: The Trustees of Columbia University in the City of New York

Inventors: Ricardo Baratto, Oren Laadan, Dan Phung, Shaya Joseph Potter, Jason Nieh
Emphasizing search results according to conceptual meaning

Patent number: 8209321

Abstract: Computer-readable media, computerized methods, and computer systems for conducting semantic processes to present search results that include highlighted regions which are relevant to a conceptual meaning of a query are provided. Initially, content of document(s) is accessed and semantic representations are derived by distilling linguistic representations from the content. These semantic representations may be stored at a semantic index. Also, a proposition is derived from the query by parsing search terms of the query, and distilling the proposition from the search terms. Typically, the proposition is a logical representation of the conceptual meaning of the query. The proposition is compared against the semantic representations at the semantic index to identify a matching set. Regions of the content within the document, from which the matching set of semantic representations are derived, are targeted.

Type: Grant

Filed: August 29, 2008

Date of Patent: June 26, 2012

Assignee: Microsoft Corporation

Inventors: Barney Pell, Scott Prevost, Giovanni Lorenzo Thione, Brendan O'Connor, Lukas Biewald
Expressing and executing semantic queries within a relational database

Patent number: 8204903

Abstract: Semantic queries are expressed and executed within a relational database. This can be done by defining semantic rules applied to execute the semantic queries using table valued functions and common table expressions, and then simply calling the defined table valued functions to execute the queries.

Type: Grant

Filed: February 16, 2010

Date of Patent: June 19, 2012

Assignee: Microsoft Corporation

Inventors: Stuart M. Bowers, Thomas E. Jackson, Chris Demetrios Karkanias, Allen L. Brown, David G. Campbell, Brian S. Aust
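
As a rough illustration of the idea in the abstract of patent 8204903 above, the sketch below expresses one semantic rule (transitive "subclass of") as a recursive common table expression. It is an assumption-laden example, not the patented method: it uses Python's `sqlite3` module, the table `subclass_of` and its data are invented, and the Python function `subclasses_of` merely stands in for a table-valued function, which SQLite does not offer in the same form as the database the patent targets.

```python
import sqlite3


def subclasses_of(conn, root):
    """Stand-in for a table-valued function: all transitive subclasses of root."""
    rows = conn.execute(
        """
        WITH RECURSIVE descendants(name) AS (
            SELECT child FROM subclass_of WHERE parent = ?
            UNION
            SELECT s.child
            FROM subclass_of AS s
            JOIN descendants AS d ON s.parent = d.name
        )
        SELECT name FROM descendants ORDER BY name
        """,
        (root,),
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subclass_of (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO subclass_of VALUES (?, ?)",
    [("mammal", "animal"), ("dog", "mammal"), ("cat", "mammal"), ("oak", "plant")],
)
result = subclasses_of(conn, "animal")
```

Once the rule is wrapped this way, a semantic query reduces to a single call, which mirrors the "define the rule once, then simply call it" pattern the abstract describes.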
Access to multilingual textual resources

Patent number: 8204736

Abstract: A mechanism is provided for determining a second document of a set of documents in a second language having the same textual content as a first document in a first language. A first histogram that is indicative of the textual content of the first document is generated. A second histogram is generated for each document of the set of documents. Each second histogram is indicative of the textual content of a document of the set of documents. Each second histogram is compared with the first histogram to determine at least one histogram from the plurality of second histograms which matches the first histogram. The second document is then identified as the document having the at least one histogram.

Type: Grant

Filed: November 6, 2008

Date of Patent: June 19, 2012

Assignee: International Business Machines Corporation

Inventors: Ossama Emam, Ahmed Hassan, Hany M. Hassan
Supporting document data search

Patent number: 8200672

Abstract: In a search support server, a related word extraction unit generates frequency information and co-occurrence information of keywords, a graph generation unit generates coordinate information of a spring graph including the keywords as nodes, on the basis of the co-occurrence information, a cluster generation unit groups the nodes into clusters and thereby generates cluster definition information, and a display information generation unit generates display information of the spring graph. In addition, an operation determination unit determines which operation is performed on the spring graph. Then, when a level change is instructed, the display information generation unit generates display information of the spring graph after the level is changed. When a node change is instructed, a cluster re-generation unit changes the cluster definition information and the frequency information.

Type: Grant

Filed: June 24, 2009

Date of Patent: June 12, 2012

Assignee: International Business Machines Corporation

Inventors: Noritaka Adachi, Shinya Kawanaka, Yoshitaka Matsumoto, Raymond Harry Putra Rudy
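
The frequency and co-occurrence information mentioned in the abstract of patent 8200672 above could be gathered along these lines. This is a minimal sketch under stated assumptions: keywords are already extracted per document, frequency is counted as the number of documents containing a keyword, and two keywords co-occur when they appear in the same document. The function name `keyword_stats` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_stats(documents):
    """Count per-keyword document frequency and pairwise co-occurrence.

    documents: iterable of keyword lists, one list per document.
    Pairs are stored as alphabetically ordered tuples.
    """
    frequency = Counter()
    cooccurrence = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))  # count each keyword once per document
        frequency.update(unique)
        cooccurrence.update(combinations(unique, 2))
    return frequency, cooccurrence


docs = [
    ["index", "search", "cluster"],
    ["search", "index"],
    ["cluster", "graph"],
]
frequency, cooccurrence = keyword_stats(docs)
```

The co-occurrence counts could then serve as edge weights when laying out keywords as nodes of a graph, with frequently co-occurring keywords pulled closer together.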
Density-based data clustering method

Patent number: 8195662

Abstract: A density-based data clustering method, comprising a parameter-setting step, a first retrieving step, a first determination step, a second determination step, a second retrieving step, a third determination step and first and second termination determination steps. The parameter-setting step sets parameters. The first retrieving step retrieves one data point and defines neighboring points. The first determination step determines whether the number of the data points exceeds the minimum threshold value. The second determination step arranges a plurality of first border symbols. The second retrieving step retrieves one seed data point from the seed list, arranges a plurality of second border symbols and defines seed neighboring points. The third determination step determines whether a data point density of searching ranges of the seed neighboring points is the same. The first termination determination step determines whether the clustering is finished.

Type: Grant

Filed: January 6, 2010

Date of Patent: June 5, 2012

Assignee: National Pingtung University Of Science & Technology

Inventors: Cheng-Fa Tsai, Yi-Ching Huang
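
For orientation, the sketch below shows textbook DBSCAN, the general family of density-based clustering that the abstract of patent 8195662 above belongs to. It is explicitly not the patented method: the border symbols and the density comparison between searching ranges are not reproduced. The parameter names `eps` and `min_pts` and the sample points are assumptions.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    NOISE = -1
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]  # seed list to expand from
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Each `region_query` call scans every point, so this version is quadratic in the number of points. Reducing that cost is a common motivation for variants of the algorithm.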
METHOD AND APPARATUS FOR DETERMINING CONTEXTUALLY RELEVANT GEOGRAPHICAL LOCATIONS

Publication number: 20120136865

Abstract: An approach is provided for determining and utilizing geographical locations contextually relevant to a user. A contextually relevant location platform determines location-based data associated with a user and/or user device. The contextually relevant location platform determines stationary points based, at least in part, on the location-based data. The contextually relevant location platform determines context data associated with the stationary points. The contextually relevant location platform determines at least one location anchor based, at least in part, on the stationary points and the associated context data, wherein the at least one location anchor represents a bounded geographical area of contextual relevance to the user.

Type: Application

Filed: November 30, 2010

Publication date: May 31, 2012

Applicant: Nokia Corporation

Inventors: Jan Otto Blom, Gian Paolo Perrucci, Mats Lönngren, Juha Kalevi Laurila, Niko Tapani Kiukkonen, Julien Eberle, Daniel Gatica-Perez, Raul Montoliu-Colas, Julian Charles Nolan
SYSTEM AND METHOD FOR HS CODE RECOMMENDATION

Publication number: 20120124050

Abstract: A system for harmonized commodity description and coding system (HS) code recommendation includes an ontology editor for creating an HS code ontology based on HS codes of export and import items, and a feature vector processor that, in response to a request from a company for an HS code for a product, extracts feature vectors of the product with reference to the description of the product. An HS code recommendation unit extracts one or more HS codes appropriate for the product by comparing the extracted feature vectors with feature vectors of the product searched from a feature vector database. The extracted HS codes are provided to the company that requested the HS code for the product.

Type: Application

Filed: November 16, 2011

Publication date: May 17, 2012

Applicant: Electronics and Telecommunications Research Institute

Inventors: Kyung-Ah YANG, Moonyoung CHUNG, Kyong-I KU
