Latent Semantic Index or Analysis (LSI or LSA) Patents (Class 707/739)
  • Patent number: 7836058
    Abstract: Mislabeled URLs are identified and corrected based upon a click relevance ranking computed from user data comprising user click information. The click relevance ranking is formed by applying a set of relevance ordering rules to user log data aggregated by query and URL and by mapping the results of the relevance ordering rules into a linear ordering. For a given query, the aggregated user log data comprises a relative total number of impressions, a relative total number of clicks received, and a rank associated with the query/URL pair at the time of the total number of impressions and total number of clicks received. The click relevance ranking is used to identify and correct mislabeled query/URL pairs of other rankings according to a number of disclosed methods.
    Type: Grant
    Filed: March 27, 2008
    Date of Patent: November 16, 2010
    Assignee: Microsoft Corporation
    Inventors: Kumar H. Chellapilla, Anton Mityagin, Xuanhui Wang
  • Patent number: 7827174
    Abstract: Various techniques are disclosed for generating markup information to be displayed on a client computer system. A document (e.g. a web page) is selected for viewing via a web browser on the client system. Selected information relating to the document is parsed and analyzed using selected keyword information. In a specific implementation, the selected keyword information is provided by an entity other than the end user. Using the selected keyword information, specific context in the document is selected to be marked up. Markup operations may be implemented on at least a portion of the selected document context and displayed at the client system.
    Type: Grant
    Filed: July 27, 2007
    Date of Patent: November 2, 2010
    Assignee: Kontera Technologies, Inc.
    Inventors: Assaf Henkin, Yoav Shaham, Henit Vitos, Benny Friedman
  • Patent number: 7822750
    Abstract: The present invention relates in general to methods and systems for comparing and maximizing the appropriateness of a first set of one or more data objects to a set of second data objects. In one embodiment, the first set of data objects represent one or more tasks to be fulfilled by a set of capabilities represented by the second data objects. In one embodiment, this invention provides an effective and accurate method and system to compare and maximize the appropriateness between the requirements of a task and the second set's capabilities, while these capabilities and requirements are contained, even if only latently, in data objects such as written documents, electronic databases or other sources of data and information. In one embodiment, topic modeling techniques are utilized to compare the data objects.
    Type: Grant
    Filed: January 15, 2008
    Date of Patent: October 26, 2010
    Assignee: Aptima, Inc
    Inventors: Andrew Duchon, Kari Kelton, Pacey Foster, Kara Orvis
  • Publication number: 20100268526
    Abstract: Disclosed herein are methods, articles of manufacture, and systems for translating text. Such a method includes generating a conceptual representation space based on a plurality of source-language documents and a plurality of target-language documents. The method also includes generating, in the conceptual representation space, respective representations of a new source-language document and each of a plurality of dictionaries. The method further includes selecting a first dictionary from the plurality of dictionaries responsive to a similarity between the representation of the new source-language document and the representation of the first dictionary. The method still further includes translating, by using the first dictionary, a term in the new source-language document into a target-language term.
    Type: Application
    Filed: June 28, 2010
    Publication date: October 21, 2010
    Inventor: Roger Burrowes Bradford
  • Patent number: 7818319
    Abstract: Various techniques are disclosed for generating markup information to be displayed on a client computer system. A document (e.g. a web page) is selected for viewing via a web browser on the client system. Selected information relating to the document is parsed and analyzed using selected keyword information. In a specific implementation, the selected keyword information is provided by an entity other than the end user. Using the selected keyword information, specific context in the document is selected to be marked up. Markup operations may be implemented on at least a portion of the selected document context and displayed at the client system.
    Type: Grant
    Filed: August 9, 2007
    Date of Patent: October 19, 2010
    Assignee: Kontera Technologies, Inc.
    Inventors: Assaf Henkin, Yoav Shaham, Henit Vitos, Benny Friedman
  • Publication number: 20100257172
    Abstract: A system and method for text interpretation and normalization is presented. The method for text interpretation and normalization may include receiving a reference data entry that includes one or more strings of text and one or more associated numeric codes, creating a plurality of tokens from the one or more strings of text, each token being tied to an associated numeric code, formatting the plurality of tokens with operation codes (opcodes) that provide additional information about the tokens, retrieving configuration data including the plurality of tokens, the opcodes, and numeric codes associated with the tokens, selecting one inbound, non-reference string for interpretation, comparing tokens from the configuration data to the non-reference string to determine the best matching token, and applying, using a processor, the numeric code associated with the best matching token to the non-reference string in order to normalize the non-reference string.
    Type: Application
    Filed: April 1, 2010
    Publication date: October 7, 2010
    Applicant: Touchstone Systems, Inc.
    Inventors: Jerry Lambert, Shiraz Khalid
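The normalization flow in the abstract above (reference strings tied to numeric codes, tokens annotated with opcodes, the best matching token supplies the code) can be sketched as follows. This is an illustration under stated assumptions, not Touchstone Systems' implementation: the two opcodes (`EXACT` and `PREFIX`), the trailing-`*` convention that marks a prefix token, the longest-token tie-breaker, and the numeric codes 101 and 202 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    text: str
    code: int     # numeric code inherited from the reference entry
    opcode: str   # hypothetical opcodes: "EXACT" or "PREFIX"


def build_tokens(reference: list[tuple[str, int]]) -> list[Token]:
    """Create one token per word of each reference string, tied to its numeric code."""
    tokens = []
    for text, code in reference:
        for word in text.lower().split():
            # Hypothetical convention: a trailing "*" marks a prefix-match token.
            opcode = "PREFIX" if word.endswith("*") else "EXACT"
            tokens.append(Token(word.rstrip("*"), code, opcode))
    return tokens


def token_matches(token: Token, word: str) -> bool:
    if token.opcode == "PREFIX":
        return word.startswith(token.text)
    return word == token.text


def normalize(inbound: str, tokens: list[Token]) -> Optional[int]:
    """Return the numeric code of the best matching token, or None if nothing matches."""
    words = inbound.lower().split()
    best: Optional[Token] = None
    for token in tokens:
        if any(token_matches(token, w) for w in words):
            # Assumed tie-breaker: the longest matching token is the most specific.
            if best is None or len(token.text) > len(best.text):
                best = token
    return best.code if best is not None else None


if __name__ == "__main__":
    tokens = build_tokens([("registered nurse", 101), ("software develop*", 202)])
    print(normalize("Senior Software Developer", tokens))  # 202
```

Both `software` (exact) and `develop` (prefix) match the inbound string here; the longer token wins under the assumed tie-breaker.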
  • Publication number: 20100254613
    Abstract: A system for duplicate text recognition includes a first means for dividing an electronic text into a plurality of phrase segments; a second means for converting each of the phrase segments into a unique and fixed-length bit string; a third means for storing a plurality of groups of the bit strings, each group of bit strings (string group) including a plurality of bit strings respectively corresponding to the phrase segments in a particular electronic text; and a fourth means for determining whether a predefined similarity between any two string groups in the third means reaches a first threshold, and for determining the two electronic texts corresponding to the two string groups are duplicate texts if the predefined similarity between the two string groups reaches the first threshold.
    Type: Application
    Filed: November 17, 2009
    Publication date: October 7, 2010
    Inventors: Tat Ming Damein Wu, Ka Yeung Sin
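The four "means" in the abstract above map onto a short pipeline. This sketch is an illustration under stated assumptions, not the applicants' implementation: it assumes phrase segments are sentence-like spans split on punctuation, that the fixed-length bit string is a 64-bit value taken from a SHA-256 digest, and that the "predefined similarity" is the Jaccard overlap of two bit-string groups. The helper names and the 0.5 threshold are hypothetical.

```python
import hashlib
import re


def phrase_segments(text: str) -> list[str]:
    # Assumption: a phrase segment is a span split on . ! ? ; or newline,
    # lowercased and whitespace-normalized.
    parts = re.split(r"[.!?;\n]+", text)
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def to_bit_string(segment: str) -> int:
    # Fixed-length (64-bit) fingerprint: the first 8 bytes of a SHA-256 digest.
    digest = hashlib.sha256(segment.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def string_group(text: str) -> frozenset[int]:
    # One "string group" per text: the set of fingerprints of its segments.
    return frozenset(to_bit_string(s) for s in phrase_segments(text))


def similarity(a: frozenset[int], b: frozenset[int]) -> float:
    # Assumed similarity measure: Jaccard overlap of the two groups.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def are_duplicates(text_a: str, text_b: str, threshold: float = 0.5) -> bool:
    return similarity(string_group(text_a), string_group(text_b)) >= threshold


if __name__ == "__main__":
    a = "The cat sat on the mat. It was warm. Birds sang outside."
    b = "The cat sat on the mat! It was warm. A dog barked."
    print(similarity(string_group(a), string_group(b)))  # 0.5 (2 shared of 4 distinct segments)
    print(are_duplicates(a, b))  # True at the assumed 0.5 threshold
```

Storing only fingerprints keeps each group compact and makes the pairwise comparison a set operation rather than a text comparison.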
  • Patent number: 7809711
    Abstract: The present invention provides a method, system, and service of analyzing electronic documents in an intranet, where the intranet includes a plurality of web sites. In an exemplary embodiment, the method, system, and service include (1) crawling HTML content and text content in a set of the sites, (2) deep-scanning non-HTML content and non-text content in the set of sites, (3) reverse-scanning the set of sites, (4) performing a semantic analysis of the crawled content and the deep-scanned content, (5) correlating the results of the semantic analysis with the results of the reverse-scanning, and (6) comparing user navigation patterns and content from the members of the set of sites. In a further embodiment, the method, system, and service further include combining the results of the performing, the results of the correlating, and the results of the comparing.
    Type: Grant
    Filed: June 2, 2006
    Date of Patent: October 5, 2010
    Assignee: International Business Machines Corporation
    Inventors: Alfredo Alba, Varun Bhagwan, Daniel Frederick Gruhl, Savitha Srinivasan
  • Publication number: 20100250544
    Abstract: A multimedia data organization process, i.e. the creation of a photo album or slideshow, said multimedia data being represented by contingent individuals (14a, 14b, 14c, 14d, 14e, 14f) of an instantiated ontology that, in addition to generic individuals (EC, C, F, M, T, A), comprises semantic links between individuals, the process comprising: the presentation to the user of the choice of at least one individual from the instantiated ontology; and, in response to a user-prompted choice, the selection and organization of a subset of multimedia data corresponding to the contingent individuals of the instantiated ontology according to at least one selection and/or organization rule engaging the user-chosen individual and the related semantic links.
    Type: Application
    Filed: October 13, 2008
    Publication date: September 30, 2010
    Applicant: EASTMAN KODAK COMPANY
    Inventors: Jean-Marie Vau, Thierry Lebihen, Christophe E. Papin, Olivier M. Rigault, Eric Masera
  • Patent number: 7801896
    Abstract: An improved human user computer interface system, wherein a user characteristic or set of characteristics, such as demographic profile or societal “role”, is employed to define a scope or domain of operation. The operation itself may be a database search, to interactively define a taxonomic context for the operation, a business negotiation, or other activity. After retrieval of results, a scoring or ranking may be applied according to user-defined criteria, which are, for example, commensurate with the relevance to the context, but may be, for example, by date, source, or other secondary criteria. A user profile is preferably stored in a computer-accessible form, and may be used to provide a history of use, persistent customization, collaborative filtering and demographic information for the user.
    Type: Grant
    Filed: February 19, 2007
    Date of Patent: September 21, 2010
    Inventor: Andrew J Szabo
  • Publication number: 20100228733
    Abstract: A system and method for performing classification using semantic distance measurements. Items of electronic content accessed by individuals over a global communications network are identified. A set of content that includes the plurality of identified items of electronic content is stored. The set of content is normalized. Each of the keywords contained in the set of content is identified, and a semantic distance between each pair of identified keywords is measured.
    Type: Application
    Filed: November 11, 2009
    Publication date: September 9, 2010
    Applicant: COLLECTIVE MEDIA, INC.
    Inventors: Paul Harrison, James Oliphant, Hal Fulton, Armin Roehrl
  • Patent number: 7792838
    Abstract: Improved information processing techniques for measuring similarity between instances in an ontology are disclosed. For example, a method of measuring similarity between instances in an ontology for use in an information retrieval system includes the following steps. A set of instances from the ontology is obtained. At least one of the following similarity metrics for the set of instances is computed: (i) a first metric that measures similarity between instances in the set of instances with respect to ontology concepts to which the instances belong; (ii) a second metric which measures similarity between instances in the set of instances where the instances are subjects in statements involving a given ontology property; and (iii) a third metric which measures similarity between instances in the set of instances where the instances are objects in statements involving a given ontology property.
    Type: Grant
    Filed: March 29, 2007
    Date of Patent: September 7, 2010
    Assignee: International Business Machines Corporation
    Inventors: Anand Ranganathan, Royi Ronen
  • Patent number: 7788264
    Abstract: Systems and methods for classifying documents each having zero or more links thereto include generating a link matrix; generating a document term matrix; and jointly factorizing the document term matrix and the link matrix.
    Type: Grant
    Filed: October 10, 2007
    Date of Patent: August 31, 2010
    Assignee: NEC Laboratories America, Inc.
    Inventors: Shenghuo Zhu, Kai Yu, Yun Chi, Yihong Gong
  • Patent number: 7783642
    Abstract: The disclosure presents a method, system and computer-readable medium related to automatically analyzing structure for a web page. The method embodiment comprises building a training corpus comprising a broad stylistic coverage of web pages, segmenting a web page into information blocks, identifying semantic categories of the information blocks using the training corpus, and applying the identified semantic categories in a web-based tool.
    Type: Grant
    Filed: October 31, 2005
    Date of Patent: August 24, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Junlan Feng, Barbara B. Hollister
  • Publication number: 20100211570
    Abstract: The present invention relates to distributed systems in which resource utilisation decisions depend upon the semi-automatic categorisation of resource descriptions stored in the distributed system. In the principal embodiment, the resource descriptions are web service descriptions which are augmented with tags (i.e. descriptive words or phrases) entered by users and/or by web service administrators. The initial use of automatic categorisation of these descriptions, followed by a user-driven fine-tuning of the automatically-generated categories, enables the rapid creation of reliable categorisation of the resource descriptions, which in turn results in better resource utilisation decisions and hence a more efficient use of the resources of the distributed system.
    Type: Application
    Filed: September 3, 2008
    Publication date: August 19, 2010
    Inventors: Robert Ghanea-Hercock, Hakan Duman, Alexander L. Healing
  • Publication number: 20100198827
    Abstract: A method for finding text reading order in a document such as a scanned newspaper or magazine includes the steps of pruning unnecessary text zones using semantic analysis (40), using text correlation measures to cluster zones (41), and then finding a reading order within each of the clusters (42).
    Type: Application
    Filed: July 27, 2005
    Publication date: August 5, 2010
    Inventors: Sherif Yacoub, Daniel Ortega, Paolo Faraboschi, Jose Abad Peiro
  • Publication number: 20100198828
    Abstract: A system and method are provided for forming crowds of users and providing access to corresponding crowd data. In one embodiment, a central system, which includes one or more servers, operates to obtain current locations for users of mobile devices. The central system forms a crowd including a number of users based on the current locations of the number of users. The central system then generates crowd data for the crowd and provides access to the crowd data for the crowd. In one embodiment, the crowd data for the crowd includes an aggregate profile for the crowd. In another embodiment, the crowd data includes data characterizing the crowd. The central system provides access to the crowd data by serving crowd data requests.
    Type: Application
    Filed: December 23, 2009
    Publication date: August 5, 2010
    Applicant: KOTA ENTERPRISES, LLC
    Inventors: Steven L. Petersen, Scott Curtis, Kenneth Jennings
  • Publication number: 20100191734
    Abstract: A method of classifying a plurality of documents that form part of a data set comprises retrieving the plurality of documents from a computing device and applying a hashing representation scheme to the plurality of documents from the data set to obtain a feature vector representation of each of the plurality of documents. A classification label is associated with selected documents of the plurality of documents in the data set. A learning algorithm is executed to learn a functional relationship between the feature vector representations of the plurality of documents and the classification label associated with the at least one document. The functional relationship learned is utilized to associate classification labels with feature vector representations of other documents of the data set so as to provide document classifications.
    Type: Application
    Filed: January 23, 2009
    Publication date: July 29, 2010
    Inventors: Shyam Sundar Rajaram, Martin B. Scholz
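The hashing representation plus learning step described above resembles the familiar hashing trick. A minimal sketch, assuming MD5-based bucket and sign selection, a hypothetical dimension of 2**18 buckets, and a plain perceptron as the learning algorithm; the abstract names none of these, so all three are illustrative choices rather than the applicants' method.

```python
import hashlib

DIM = 2 ** 18  # hypothetical number of hash buckets


def hashed_features(text: str, dim: int = DIM) -> dict[int, float]:
    """Map a document to a sparse feature vector with the hashing trick."""
    vec: dict[int, float] = {}
    for token in text.lower().split():
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        index = h % dim                       # bucket chosen by the low bits
        sign = 1.0 if h >> 63 == 0 else -1.0  # top bit picks a sign to offset collisions
        vec[index] = vec.get(index, 0.0) + sign
    return vec


def score(weights: list[float], bias: float, x: dict[int, float]) -> float:
    return bias + sum(weights[i] * v for i, v in x.items())


def train_perceptron(examples: list[tuple[str, int]], epochs: int = 10, dim: int = DIM):
    """Learn a linear classifier from (text, label) pairs with labels +1 or -1."""
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = hashed_features(text, dim)
            if label * score(weights, bias, x) <= 0:  # misclassified: perceptron update
                for i, v in x.items():
                    weights[i] += label * v
                bias += label
    return weights, bias


def classify(weights: list[float], bias: float, text: str) -> int:
    """Apply the learned relationship to an unlabeled document."""
    x = hashed_features(text, len(weights))
    return 1 if score(weights, bias, x) > 0 else -1
```

A small labeled subset trains the weights, and `classify` then labels the remaining documents of the data set. Because the vocabulary never has to be stored, new tokens in unlabeled documents simply land in existing buckets.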
  • Patent number: 7765203
    Abstract: The present invention is directed to a method and system for managing context information in a web portal or enterprise portal comprising a hierarchical structure of portal pages and portlets for accessing web content or enterprise content accessible via the portal.
    Type: Grant
    Filed: September 11, 2007
    Date of Patent: July 27, 2010
    Assignee: International Business Machines Corporation
    Inventors: Stefan Liesche, Andreas Naurez
  • Patent number: 7765212
    Abstract: A system that facilitates organization of emails comprises a clustering component that clusters a plurality of emails and creates topics for emails by assigning key phrases extracted from emails within one or more clusters. An organization component then utilizes the key phrases to organize documents. Furthermore, the organization component can comprise a probability component that determines a probability that a document belongs to a certain topic.
    Type: Grant
    Filed: December 29, 2005
    Date of Patent: July 27, 2010
    Assignee: Microsoft Corporation
    Inventors: Arungunram C. Surendran, Erin L. Renshaw, John C. Platt
  • Patent number: 7765098
    Abstract: An embodiment of the present invention provides a method for automatically translating text. First, a conceptual representation space is generated based on source-language documents and target-language documents, wherein respective terms from the source-language and target-language documents have a representation in the conceptual representation space. Second, a new source-language document is represented in the conceptual representation space, wherein a subset of terms in the new source-language document is represented in the conceptual representation space, such that each term in the subset has a representation in the conceptual representation space. Then, a term in the new source-language document is automatically translated into a corresponding target-language term based on a similarity between the representation of the term and the representation of the corresponding target-language term.
    Type: Grant
    Filed: April 24, 2006
    Date of Patent: July 27, 2010
    Assignee: Content Analyst Company, LLC
    Inventor: Roger Burrowes Bradford
  • Publication number: 20100185619
    Abstract: Sampling analysis includes classifying a plurality of query keywords into a plurality of query keyword subsets according to page view (PV) values associated with the plurality of query keywords, the plurality of query keywords being submitted by a plurality of users; determining a respective plurality of sample rates of a respective plurality of query keywords in a respective one of the plurality of query keyword subsets; and sampling query data in the respective one of the plurality of query keyword subsets according to the respective plurality of sample rates.
    Type: Application
    Filed: January 20, 2010
    Publication date: July 22, 2010
    Inventors: Junlin Zhang, Jian Sun, Lei Hou, Qin Zhang
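A minimal sketch of PV-stratified sampling, assuming that subsets are formed by hypothetical PV boundaries, that each subset has a hypothetical target number of records per keyword, and that a keyword's sample rate is min(1, target / PV). The abstract does not say how rates are derived, so this rate rule is an illustrative assumption, as are the function names.

```python
import random
from collections import Counter


def classify_keywords(pv: dict[str, int], boundaries: list[int]) -> dict[str, int]:
    """Assign each keyword to a subset index: the number of PV boundaries it reaches."""
    return {kw: sum(1 for b in boundaries if views >= b) for kw, views in pv.items()}


def sample_rates(pv: dict[str, int], subsets: dict[str, int], targets: list[int]) -> dict[str, float]:
    """Per-keyword rate chosen so each keyword keeps roughly targets[subset] records."""
    return {kw: min(1.0, targets[subsets[kw]] / pv[kw]) for kw in pv}


def sample_query_data(records: list[tuple[str, int]], rates: dict[str, float], seed: int = 0):
    """Keep each (keyword, payload) record with its keyword's sample rate."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < rates[r[0]]]


if __name__ == "__main__":
    records = [("rare", i) for i in range(3)] + [("popular", i) for i in range(1000)]
    pv = Counter(kw for kw, _ in records)  # page views per keyword
    subsets = classify_keywords(pv, boundaries=[100])
    rates = sample_rates(pv, subsets, targets=[10, 50])
    kept = Counter(kw for kw, _ in sample_query_data(records, rates))
    print(rates)         # {'rare': 1.0, 'popular': 0.05}
    print(kept["rare"])  # 3: low-PV keywords are kept in full
```

The design choice illustrated is that rare keywords are not drowned out: a uniform 5% sample would usually drop all three "rare" records.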
  • Patent number: 7747625
    Abstract: Systems and methods of organizing a collection of objects are described. In one aspect, a sequence of objects is segmented into object clusters based on: comparisons of successive object intervals to weighted measures of cluster extent; and comparisons of successive object intervals to weighted measures of cluster object density. In another aspect, objects from the collection are segmented into clusters. Context-related meta data associated with the objects and parsable into multiple levels of a name hierarchy is extracted. Names are assigned to clusters based on the extracted context-related meta data corresponding to a level of the name hierarchy selected to distinguish segmented clusters from one another. In another aspect, a sequence of objects that are segmented into clusters is accessed. Each cluster includes multiple objects arranged in a respective sequence in accordance with context-related meta data associated with the objects.
    Type: Grant
    Filed: July 31, 2003
    Date of Patent: June 29, 2010
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Ullas Gargi, Daniel R. Tretter
  • Patent number: 7725466
    Abstract: Some embodiments of a high-accuracy document information element-vector (IE-vector) encoding server have been presented. In one embodiment, the high-accuracy document IE-vector encoding server applies a finite state automaton (FSA) to parse a document to identify one or more information elements (IEs) in the document. Then a DNA sequence of the document is derived based on the one or more IEs. The concept of the DNA sequence of a document is powerful and can be used in building automated tools such as computer-based processes to automatically reason and search for similarity, dissimilarity, equivalence and other relationships between structured, semi-structured and unstructured data and information. The DNA sequence of a document provides a powerful paradigm to build sophisticated information and data search and retrieval techniques and tools.
    Type: Grant
    Filed: October 23, 2007
    Date of Patent: May 25, 2010
    Inventor: Tarique Mustafa
  • Patent number: 7720849
    Abstract: An information processing device that includes a referring unit configured to refer to a table in which a characteristic of each piece of first information is expressed as distribution of model parameters in a plurality of semantic classes. The semantic classes are in units of pieces of the first information. An obtaining unit is configured to obtain second information to be searched. A calculating unit is configured to calculate similarities between the second information and the respective pieces of the first information. A first reading unit is configured to read the pieces of the first information from the table in descending order of the similarity.
    Type: Grant
    Filed: March 6, 2006
    Date of Patent: May 18, 2010
    Assignee: Sony Corporation
    Inventor: Yasuharu Asano
  • Patent number: 7720848
    Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.
    Type: Grant
    Filed: March 29, 2006
    Date of Patent: May 18, 2010
    Assignee: Xerox Corporation
    Inventors: Agnes Guerraz, Caroline Privault, Cyril Goutte, Eric Gaussier, Francois Pacull, Jean-Michel Renders
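The local-update idea can be sketched with per-class word counts as the model parameters. This is a minimal illustration, not Xerox's implementation: the class name `ProbabilisticClusters`, the Laplace-smoothed `word_probability` helper, and its `alpha`/`vocab_size` parameters are hypothetical. Moving a document subtracts its counts from the source class and adds them to the destination class; every other class is left untouched.

```python
from collections import Counter


class ProbabilisticClusters:
    """Classes characterized by word counts; the counts serve as the model parameters."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}
        self.totals: dict[str, int] = {}
        self.assignment: dict[str, str] = {}
        self.docs: dict[str, Counter] = {}

    def add(self, doc_id: str, words: list[str], cls: str) -> None:
        counts = Counter(words)
        self.docs[doc_id] = counts
        self.assignment[doc_id] = cls
        self._apply(cls, counts, +1)

    def _apply(self, cls: str, counts: Counter, sign: int) -> None:
        """Add (sign=+1) or remove (sign=-1) one document's counts for a single class."""
        class_counts = self.word_counts.setdefault(cls, Counter())
        for word, c in counts.items():
            class_counts[word] += sign * c
            if class_counts[word] == 0:
                del class_counts[word]
        self.totals[cls] = self.totals.get(cls, 0) + sign * sum(counts.values())

    def move(self, doc_id: str, destination: str) -> None:
        """Reassign a document, touching only the source and destination classes."""
        source = self.assignment[doc_id]
        if source == destination:
            return
        counts = self.docs[doc_id]
        self._apply(source, counts, -1)
        self._apply(destination, counts, +1)
        self.assignment[doc_id] = destination

    def word_probability(self, cls: str, word: str, alpha: float = 1.0, vocab_size: int = 1000) -> float:
        """Laplace-smoothed P(word | class) derived from the current counts."""
        return (self.word_counts[cls][word] + alpha) / (self.totals[cls] + alpha * vocab_size)
```

The cost of a move is proportional to the size of the moved document, not to the number of classes or the size of the corpus.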
  • Patent number: 7716221
    Abstract: Indexing, searching, and retrieving the content of speech documents (including but not limited to recorded books, audio broadcasts, recorded conversations) is accomplished by finding and retrieving speech documents that are related to a query term at a conceptual level, even if the speech documents do not contain the spoken (or textual) query terms. Concept-based cross-media information retrieval is used. A term-phoneme/document matrix is constructed from a training set of documents. Documents are then added to the matrix constructed from the training data. Singular Value Decomposition is used to compute a vector space from the term-phoneme/document matrix. The result is a lower-dimensional numerical space where term-phoneme and document vectors are related conceptually as nearest neighbors. A query engine computes a cosine value between the query vector and all other vectors in the space and returns a list of those term-phonemes and/or documents with the highest cosine value.
    Type: Grant
    Filed: June 1, 2007
    Date of Patent: May 11, 2010
    Inventors: Clifford A. Behrens, Dennis E. Egan, Devasis Bassu
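The retrieval steps in this abstract follow the classic LSI recipe: build a term/document matrix, take a truncated SVD, and rank by cosine in the reduced space. The sketch below is not the patented system. It assumes plain word terms instead of term-phoneme features, computes the top-k left singular vectors by power iteration with deflation (so it needs only the standard library), and compares query and documents by their projections onto those vectors. The function names and the toy corpus are hypothetical.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def top_left_singular_vectors(a, k, iters=200):
    """Top-k left singular vectors of a (terms x documents) via power iteration on A A^T."""
    at = transpose(a)
    found = []
    for j in range(k):
        v = [1.0 + 0.1 * ((i + j) % 3) for i in range(len(a))]  # deterministic start vector
        for _ in range(iters):
            w = matvec(a, matvec(at, v))        # w = A A^T v
            for u in found:                     # deflation: remove earlier directions
                c = dot(w, u)
                w = [x - c * y for x, y in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:                    # rank exhausted: no further directions
                return found
            v = [x / norm for x in w]
        found.append(v)
    return found


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def lsi_rank(a, terms, query_terms, k=2):
    """Rank documents (columns of a) by cosine similarity to the query in the k-dim space."""
    u = top_left_singular_vectors(a, k)
    doc_coords = [[dot(col, uj) for uj in u] for col in transpose(a)]
    q = [1.0 if t in query_terms else 0.0 for t in terms]
    q_coords = [dot(q, uj) for uj in u]
    return sorted(((cosine(q_coords, d), i) for i, d in enumerate(doc_coords)), reverse=True)


if __name__ == "__main__":
    terms = ["car", "automobile", "engine", "flower", "petal"]
    a = [[1, 0, 0, 0],   # car
         [0, 1, 0, 0],   # automobile
         [1, 1, 0, 0],   # engine
         [0, 0, 1, 1],   # flower
         [0, 0, 1, 0]]   # petal
    print(lsi_rank(a, terms, {"automobile"}))  # documents 0 and 1 share the top score
```

In the toy corpus, document 0 never contains "automobile" but still ranks alongside document 1 because both share "engine", which is the kind of conceptual match the abstract describes.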
  • Publication number: 20100114894
    Abstract: A semantically aware relational database management system includes suitable programming to relate attributes of the relational database to semantic equivalents of such attributes. In response to receiving a query, the relational database management system performs at least one semantically aware operation on the data in the relational database in order to determine what data is to be retrieved in response to the query. Results of the query presented to a user may include data derived from performing the semantically aware operations.
    Type: Application
    Filed: December 31, 2009
    Publication date: May 6, 2010
    Applicant: SAP AG
    Inventors: Maria E. Orlowska, Wasim Sadiq, Shazia Sadiq
  • Patent number: 7711682
    Abstract: The present invention provides methods, apparatus and systems for searching hypertext-based multilingual Web information when searching on a network for keywords to be queried. A method includes: a receiving step for receiving keywords input by a user; a native language hypertext searching step for searching on the network, according to the keywords to be queried, for all hypertexts whose representing language is the same as a language representing the keywords and which match the keywords to be queried; extracting hyperlinks related to an arbitrary language from all the searched hypertexts; a hyperlink ranking step for ranking the extracted hyperlinks according to the correlativity of the hyperlinks with the keywords to be queried; and returning the ranked search result to the user. Thereby, accurate cross-language searching can be provided without extra machine translation effort, being more accurate and objective than machine translation, even than human translation.
    Type: Grant
    Filed: July 29, 2005
    Date of Patent: May 4, 2010
    Assignee: International Business Machines Corporation
    Inventor: Ling Zhang
  • Publication number: 20100100546
    Abstract: A method for creation of a semantic information management environment, said method comprised of steps of: providing said semantic information environment consisting of an architecture partitioned according to the classification of the use of natural language by information scale, dynamical properties, or semantic classifications; detection, classification, and storage of semantic and contextual information detected and stored by recording of observed contextual parameters associated with events in said semantic information management environment; said interactions including the use of information management or electronic communication applications embedded or linked to said architecture, or separate from said architecture; said observations including the use of natural language as parameters that have specific semantic properties; detection, classification and storage of use of natural language in said semantic information environment; representation of semantic processes containing said detected, classified
    Type: Application
    Filed: February 8, 2009
    Publication date: April 22, 2010
    Inventor: Steven Forrest Kohler
  • Publication number: 20100100547
    Abstract: A method and system for generating information tags from product-related documents. The system includes an accessible storage storing text documents, wherein the text documents are related to a plurality of products. The system includes a memory access module for retrieving a document from the accessible storage related to a specified product selected from the plurality of products. The system includes a parser module for parsing the retrieved document into sentences, wherein each sentence is stored as an array. The system includes a filter module for filtering the parsed sentences into a result set, wherein the result set includes a set of tags extracted from the retrieved document relevant to the selected product. The system includes an output module for outputting the result set to the accessible storage.
    Type: Application
    Filed: October 20, 2009
    Publication date: April 22, 2010
    Applicant: Flixbee, Inc.
    Inventors: Hamilton A. Ulmer, Svyatoslav Mishchenko
  • Patent number: 7702665
    Abstract: Methods and apparatus to evaluate the semantic proximity between a reference free-form text entry and a candidate free-form text request.
    Type: Grant
    Filed: June 14, 2006
    Date of Patent: April 20, 2010
    Assignee: Colloquis, Inc.
    Inventors: Francois Huet, Gray Salmon Norton
  • Publication number: 20100063961
    Abstract: An online application and service operates over existing photo hosting services to allow users to share digital photos hosted by multiple online photo hosting services, including the ability to automatically discover relevant photos; to create a photo “album” that includes photos hosted by multiple different online services; to automatically discover relevant content and add it to an album; to chat and/or send instant messages in relation to shared photos; and to receive automatic notifications of events of interest to users. The service includes a server-based application and an associated client-side application with a graphical user interface. The technology further includes a method of “reverse tagging”, i.e., automatically adding semantic tags to an online album and any photos included in it, by automatically performing an online search of an information resource in response to a user creating a name for the album, and applying results of the search as tags to the album.
    Type: Application
    Filed: September 3, 2009
    Publication date: March 11, 2010
    Inventors: Bertrand Guiheneuf, Jean-Marie Hullot, Manuel Colom, Olivier Gutknecht, Sebastien Maury
  • Patent number: 7676519
    Abstract: A method and apparatus for processing user-entered input and providing a response in a system for autonomously processing requests includes rules. For each rule, whether the input is recognized is determined. If it is, a response is sent to the user. To determine recognized input, the method attempts to match the rule to a pattern. If a match is not found, the input is not recognized. If a match is found, the input is recognized and the response is sent. Alternatively, the input is conditionally recognized and a statement validator is executed which queries structured data to determine if a logic statement evaluates to true. Depending on how the statement evaluates: i) the input is recognized and the response is sent, ii) the structured data is queried again for the next statement validator, or iii) the input is not recognized and the method continues to the next rule.
    Type: Grant
    Filed: August 22, 2006
    Date of Patent: March 9, 2010
    Assignee: Conversive, Inc.
    Inventors: Aaron Joseph McBride, Rob Rappaport, Jeremy Romero, Christopher Brennan, Robert Williams
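The rule loop in the abstract above, with its three outcomes for a statement validator, can be sketched as follows. A minimal illustration, assuming regular-expression patterns, validators modeled as Python callables over a dict that stands in for the structured data, and hypothetical names (`Rule`, `respond`); it is not Conversive's implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Rule:
    pattern: str   # regular expression the input must match
    response: str
    validators: list[Callable[[dict], bool]] = field(default_factory=list)


def respond(user_input: str, rules: list[Rule], data: dict) -> Optional[str]:
    """Return the response of the first rule that recognizes the input, else None."""
    for rule in rules:
        if not re.search(rule.pattern, user_input, re.IGNORECASE):
            continue  # no pattern match: not recognized, try the next rule
        # With no validators, all() is True and the input is recognized outright.
        # Otherwise validators run in order: each True moves on to the next
        # validator (ii), all True means recognized (i), and the first False
        # short-circuits to the next rule (iii).
        if all(validator(data) for validator in rule.validators):
            return rule.response
    return None


if __name__ == "__main__":
    rules = [
        Rule(r"\bwhere\b.*\border\b", "Your order has shipped.",
             [lambda d: d.get("order_status") == "shipped"]),
        Rule(r"\border\b", "I can help with your order."),
    ]
    print(respond("Where is my order?", rules, {"order_status": "shipped"}))  # Your order has shipped.
    print(respond("Where is my order?", rules, {"order_status": "pending"}))  # I can help with your order.
```

Ordering matters: the more specific, conditionally recognized rule is listed first so that a failed validator falls back to the general rule.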
  • Publication number: 20100058237
    Abstract: A target estimation device that appropriately estimates a function targeted by a user from among functions of an operated apparatus includes: an item inputting unit (102) that obtains an item; a function representation storing unit (107) that stores function names of the operated apparatus; an identical word detecting unit (116) that detects a category of a word identical between each of the function names and the item; an operation intention calculating unit (117) that changes a method for calculating similarities between the item and the respective function names, for each of the function names, depending on the detected category, and that calculates the similarities as degrees of intentions of the user who has selected the item, using the changed calculation method; and a target estimation unit (112) that estimates a function so that the function having a higher degree of an intention is the function targeted by the user.
    Type: Application
    Filed: October 26, 2007
    Publication date: March 4, 2010
    Applicant: PANASONIC CORPORATION
    Inventors: Makoto Nishizaki, Tsuyoshi Inoue, Satoshi Matsuura