Latent Semantic Index or Analysis (LSI or LSA) Patents (Class 707/739)
  • Patent number: 10372823
    Abstract: Described is a system for generating a semantic space based on the lexical relations between words. The system determines synonym and antonym relations between a set of words. A lexical graph is generated based on the synonym and antonym relations. Manifold embedding of the lexical graph is determined, and Laplacian coordinates of the manifold embedding are assigned as semantic features of the set of words. A quantitative representation of the set of words is generated using the semantic features.
    Type: Grant
    Filed: October 21, 2016
    Date of Patent: August 6, 2019
    Assignee: HRL Laboratories, LLC
    Inventors: Hankyu Moon, Rajan Bhattacharyya, James Benvenuto
  • Patent number: 10353938
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for aggregating task data for multiple users. In one aspect, a method includes accessing action trail data that corresponds to a task and resources related to that task, wherein each task relates to one or more related topics and is defined by a sequence of user actions corresponding to the resources related to that task; clustering the action trails based on the action trail data such that each action trail cluster corresponds to a particular task and includes the action trails corresponding to that particular task; and for each action trail cluster, ranking the resources that correspond to the included action trails according to the topics of the particular task.
    Type: Grant
    Filed: February 27, 2013
    Date of Patent: July 16, 2019
    Assignee: Google LLC
    Inventors: Radhika Malpani, Elin R. Pedersen
  • Patent number: 10331676
    Abstract: Items of interest within digital information may be detected and associated with a label that provides context to the item of interest. The label may describe an item category of the item of interest. The knowledge base of item categories may be limited. Additional item categories may be learned by accessing sets of vocabulary that may relate to the known item categories.
    Type: Grant
    Filed: April 13, 2016
    Date of Patent: June 25, 2019
    Assignee: Disney Enterprises, Inc.
    Inventors: Yanwei Fu, Leonid Sigal
  • Patent number: 10282672
    Abstract: A processing device determines a plurality of visual concepts for visual data based on at least one of visual entities in the visual data or feature-level attributes in the visual data, wherein the visual entities are based on the feature-level attributes, and wherein each of the plurality of visual concepts comprises a subject visual entity related to an object visual entity by a predicate. The processing device further determines one or more visual semantics for the visual data based on the plurality of visual concepts, wherein the one or more visual semantics define relationships between the plurality of visual concepts.
    Type: Grant
    Filed: June 26, 2014
    Date of Patent: May 7, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Pragyana K. Mishra, Danny Guan
  • Patent number: 10212572
    Abstract: The present invention extends to methods, systems, and computer program products for detecting and validating planned event information. A plurality of normalized signals is accessed. Planned event data across the plurality of normalized signals is checked for inconsistencies. Any inconsistencies are resolved in an automated fashion, for example, through reference to databases containing additional information. A planned event can be detected/validated from concurring and/or resolved planned event data. A validator can refer to an event history database and/or a planning system to validate a possible planned event as an actual planned event.
    Type: Grant
    Filed: September 5, 2018
    Date of Patent: February 19, 2019
    Assignee: Banjo, Inc.
    Inventors: Damien Patton, Joshua J. Newman, Tilmann Bruckhaus
  • Patent number: 10159106
    Abstract: An automatic wireless docking system includes a source device that includes a source device display screen, a display device, and a sink device that is coupled to the display device. The sink device determines a location of the source device and determines a motion of the source device. The sink device then identifies a wireless docking intent of the source device with the sink device based on the location of the source device and the motion of the source device. In response to identifying the wireless docking intent, the sink device establishes a current wireless docking session between the source device and the sink device.
    Type: Grant
    Filed: January 29, 2018
    Date of Patent: December 18, 2018
    Assignee: Dell Products L.P.
    Inventors: Joseph Paul Marquardt, Todd Farrell Basche
  • Patent number: 10142774
    Abstract: Systems, methods, and computer-readable storage media for invitational content geofencing. A system first sends, to a server, location data associated with the system, the location data being calculated at the system. The system then receives a listing of places of interest within a geofence including a geographical perimeter for identifying places of interest in the listing, the geofence being based on the location data associated with the system. Next, the system selects a place of interest from the listing based on a location of the system. The system then presents a content item associated with the place of interest.
    Type: Grant
    Filed: July 3, 2017
    Date of Patent: November 27, 2018
    Assignee: Apple Inc.
    Inventors: Thomas Alsina, David T. Wilson, Kenley Sun, Sagar Joshi
  • Patent number: 10078697
    Abstract: Computer-implemented method of and system for searching an inverted index having a plurality of posting lists, comprising: Receiving a search query including a plurality of search terms to be searched. Multithreadedly searching a plurality of complementary sets of corresponding interspaced segments of each of the plurality of posting lists corresponding to the plurality of search terms, each set being searched via a separate thread to yield per-thread search results. Aggregating the per-thread search results to yield aggregated search results. Transmitting at least a portion of the aggregated search results.
    Type: Grant
    Filed: February 25, 2013
    Date of Patent: September 18, 2018
    Assignee: Yandex Europe AG
    Inventor: Petr Sergeevich Popov
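
The search flow described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names, the fixed `segment_size`, and the choice to define segments as doc-ID ranges are all hypothetical, and posting lists are assumed to be sorted lists of integer document IDs.

```python
from concurrent.futures import ThreadPoolExecutor


def search_segment_set(posting_lists, segments):
    """Intersect all posting lists, restricted to the given doc-ID ranges."""
    hits = []
    for lo, hi in segments:
        # Restrict every posting list to the same doc-ID range [lo, hi).
        slices = [{d for d in plist if lo <= d < hi} for plist in posting_lists]
        hits.extend(set.intersection(*slices))
    return hits


def multithreaded_search(index, terms, num_threads=2, segment_size=4):
    """Hypothetical sketch: AND-query over an inverted index using several threads."""
    posting_lists = [index.get(t, []) for t in terms]
    max_doc = max((p[-1] for p in posting_lists if p), default=-1)
    # Cut the doc-ID space into fixed-size segments (an assumption of this sketch).
    segments = [(lo, lo + segment_size) for lo in range(0, max_doc + 1, segment_size)]
    # Thread k takes segments k, k + T, k + 2T, ...: interspaced, complementary sets.
    segment_sets = [segments[k::num_threads] for k in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        per_thread = list(
            pool.map(lambda s: search_segment_set(posting_lists, s), segment_sets)
        )
    # Aggregate the per-thread results into one sorted result list.
    return sorted(doc for result in per_thread for doc in result)
```

Interleaving the segments, rather than giving each thread one contiguous block, is one plausible way to balance work when matches cluster in part of the doc-ID space.
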
  • Patent number: 10073890
    Abstract: A comparison engine configured to utilize combined semantic-probabilistic algorithms to differentiate and compare an input to obtain enumerated results of similarity (items that are similar to other patent-related references), differences (items that are different from other patent-related references), and uniquenesses (how the input text is distinct from other patent-related references).
    Type: Grant
    Filed: August 3, 2015
    Date of Patent: September 11, 2018
    Assignee: MARCA RESEARCH & DEVELOPMENT INTERNATIONAL, LLC
    Inventors: Mahmoud Azmi Khamis, Bruce Golden, Rami Ikhreishi
  • Patent number: 10049164
    Abstract: Provided is a multidimensional-range search apparatus (10) including: an acquisition unit (11) that acquires a target index key indicating an arbitrary point on a space-filling curve; an extraction unit (12) that extracts prefix data capable of indicating a bit string of an index key included in an unsearched section on the space-filling curve on the basis of a bit string of the target index key; a determination unit (13) that determines overlapping of an inquiry section of a multidimensional-range search and a prefix section on the space-filling curve which is indicated by the prefix data; a specification unit (14) that specifies, as a search point, an index key indicating a minimum point or a maximum point in an overlap section in which the inquiry section overlaps the prefix section, which is closest to the target index key on the space-filling curve, and which is determined to overlap the inquiry section by the determination unit; and a search unit (15) that searches an index storage unit (16) for page infor
    Type: Grant
    Filed: June 2, 2014
    Date of Patent: August 14, 2018
    Assignee: NEC Corporation
    Inventor: Shoji Nishimura
  • Patent number: 10027774
    Abstract: A method of obtaining information on navigation behavior of users accessing web pages, includes obtaining information on web page sessions and correlating the information of at least two web page sessions for one user based on the obtained information on web page sessions. The method further includes extracting information on links from the correlated information of the at least two web page sessions, and inferring information on navigation behavior of the user based on the extracted information on links and the correlated information of the at least two web page sessions for one user.
    Type: Grant
    Filed: October 15, 2013
    Date of Patent: July 17, 2018
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Icaro L. J. Da Silva, Åsa Bertze, Jing Fu
  • Patent number: 10007649
    Abstract: A content management system is disclosed. The system includes at least one server, non-transitory storage, documents, entity-specific section weights, and entity-specific review thresholds. The system further includes at least two client computer systems that enable a user to access a document for at least one of review or modification. The system will, in response to receipt of an indication that changes have been made to one or more sections of a document, A) determine a change value indicative of a quantity of changes made within each section, B) calculate an entity-specific provenance value by multiplying, on a section basis, the change value within each section by the assigned entity-specific weight value for each section, to produce an entity-specific section value for each section, and then summing the entity-specific section values; and C) when any entity-specific provenance value satisfies a review threshold value, to construct and send a review notification.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: June 26, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenytt D. Avery, Edward L. Bader, Jean-Marc Costecalde, Chi M. Nguyen, Kevin N. Trinh
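
Steps A through C of the abstract above amount to a weighted sum followed by a threshold check. A minimal sketch, assuming that "satisfies" means "greater than or equal to" and using hypothetical names (`provenance_value`, `entities_to_notify`) and made-up section names:

```python
def provenance_value(change_values, section_weights):
    """Sum over sections of (change value x entity-specific section weight)."""
    # Sections with no assigned weight contribute nothing (an assumption).
    return sum(change_values[s] * section_weights.get(s, 0.0) for s in change_values)


def entities_to_notify(change_values, weights_by_entity, thresholds):
    """Return the entities whose provenance value meets their review threshold."""
    notify = []
    for entity, weights in weights_by_entity.items():
        if provenance_value(change_values, weights) >= thresholds[entity]:
            notify.append(entity)
    return notify
```

With 2 changes in an "intro" section and 10 in a "pricing" section, an entity that weights "pricing" heavily would cross its threshold while one that mostly cares about "intro" would not.
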
  • Patent number: 10007690
    Abstract: A time series data stager that receives input data sets and outputs output data blocks for ingestion into a time series database, with the output data blocks being sent at timings according to a sliding window based on a predetermined time.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: June 26, 2018
    Assignee: International Business Machines Corporation
    Inventor: Ulrich A. Finkler
  • Patent number: 9911211
    Abstract: Provided is a process of adjusting a visualization of a graph in response to user interactions with the visualization, the process including: obtaining a graph; causing a visualization of the graph to be presented on one or more displays having a display area; receiving a request for a lens to be applied to the visualization; selecting a first portion of the graph based on the first portion being presented within the region specified by the lens; and transforming the first portion of the graph.
    Type: Grant
    Filed: April 13, 2017
    Date of Patent: March 6, 2018
    Assignee: Quid, Inc.
    Inventors: Sashikanth Damaraju, Grant Titus
  • Patent number: 9785833
    Abstract: A method for efficiently grouping electronic documents that are likely textual near-duplicates includes processing first and second electronic documents to determine respective sets of character sequence counts. The processing may include, for each document, identifying a plurality of non-contiguous character sequences expressed within the document text, with each character sequence including at least one character from each of at least two different words in the text, and determining character sequence counts for each unique character sequence within the identified character sequences. The method also includes generating one or more similarity metrics, at least by comparing the sets of character sequence counts determined for the first and second electronic documents. The method may also include using the similarity metric(s) to calculate a similarity score, and assigning, based on the similarity score, the second electronic document to a same document group as the first electronic document.
    Type: Grant
    Filed: April 1, 2016
    Date of Patent: October 10, 2017
    Assignee: RELATIVITY ODA LLC
    Inventor: Robert Jenson Price
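
The near-duplicate grouping in the abstract above can be illustrated with a small sketch. The specific choices here are assumptions for illustration only: each "non-contiguous character sequence" is taken to be the first letters of two adjacent words, cosine similarity stands in for the unspecified similarity metric, and the `threshold` value is arbitrary.

```python
from collections import Counter
import math


def sequence_counts(text):
    """Count two-character sequences built from the first letters of adjacent words."""
    words = text.lower().split()
    return Counter(a[0] + b[0] for a, b in zip(words, words[1:]))


def cosine_similarity(c1, c2):
    """Cosine similarity between two count vectors stored as Counters."""
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(
        sum(v * v for v in c2.values())
    )
    return dot / norm if norm else 0.0


def same_group(doc_a, doc_b, threshold=0.7):
    """Assign doc_b to doc_a's group when the similarity score meets the threshold."""
    return cosine_similarity(sequence_counts(doc_a), sequence_counts(doc_b)) >= threshold
```

Because each sequence spans two words, a single substituted word changes only the two sequences that touch it, so lightly edited copies still score high.
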
  • Patent number: 9779291
    Abstract: As visual recognition scales up to ever larger numbers of categories, maintaining high accuracy is increasingly difficult. Embodiments of the present invention include methods for optimizing accuracy-specificity trade-offs in large scale recognition where object categories form a semantic hierarchy consisting of many levels of abstraction.
    Type: Grant
    Filed: October 12, 2015
    Date of Patent: October 3, 2017
    Assignee: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Fei-Fei Li, Jia Deng, Jonathan Krause, Alexander C. Berg
  • Patent number: 9769733
    Abstract: Methods, systems, and devices are described for wireless communication. A first method includes receiving, at a user equipment (UE), a first set of system information; determining, based at least in part on the first set of system information, that additional system information is available; transmitting a request for the additional system information; and receiving the additional system information at the UE. A second method includes transmitting, from a base station, a first set of system information; receiving a request for additional system information; and transmitting the additional system information based at least in part on the request.
    Type: Grant
    Filed: July 20, 2015
    Date of Patent: September 19, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Keiichi Kubota, Gavin Bernard Horn
  • Patent number: 9741058
    Abstract: Embodiments provide a computer-executable method, computer system and non-transitory computer-readable medium for programmatically analyzing a consumer review. The method includes programmatically accessing, via a network device, one or more consumer reviews for a commercial entity or a commercial object. The method also includes executing a consumer review processing engine to programmatically identify an attribute descriptor in the one or more consumer reviews, and executing the consumer review processing engine to programmatically generate a sentiment score associated with the one or more consumer reviews. The method further includes storing, on a non-transitory computer-readable storage device, the attribute descriptor and the sentiment score in association with the commercial entity or the commercial object.
    Type: Grant
    Filed: March 17, 2016
    Date of Patent: August 22, 2017
    Assignee: Groupon, Inc.
    Inventors: Gaston L'Huillier, Francisco Jose Larrain, Hernan Enrique Arroyo Garcia, Juzheng Li, Daniel Langdon, Jonathan Esterhazy, Srinivasa Raghavan Vedanarayanan, Shawn Jeffery, Feras Karablieh, Bhupesh Bansal, Dor Levi, Amit Koren
  • Patent number: 9734181
    Abstract: The present invention extends to methods, systems, and computer program products for understanding tables for search. Aspects of the invention include identifying a subject column for a table, detecting a column header using other tables, and detecting a column header using a knowledge base. Implementations can be utilized in a structured data search system (SDSS) that indexes structured information, such as, tables in a relational database or html tables extracted from web pages. The SDSS allows users to search over the structured information (tables) using different mechanisms including keyword search and data finding data.
    Type: Grant
    Filed: October 2, 2014
    Date of Patent: August 15, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Zhongyuan Wang, Kanstantsyn Zoryn, Zhimin Chen, Kaushik Chakrabarti, James P. Finnigan, Vivek R. Narasayya, Surajit Chaudhuri, Kris Ganjam
  • Patent number: 9734146
    Abstract: Systems, methods and computer-readable media are provided for facilitating patient health care by providing discovery, validation, and quality assurance of nomenclatural linkages between pairs of terms or combinations of terms in databases extant on multiple different health information systems that do not share a set of unified codesets, nomenclatures, or ontologies, or that may in part rely upon unstructured free-text narrative content instead of codes or standardized tags. Embodiments discover semantic structures existing naturally in documents and records, including relationships of synonymy and polysemy between terms arising from disparate processes, and maintained by different information systems. In some embodiments, this process is facilitated by applying Latent Semantic Analysis in concert with decision-tree induction and similarity metrics.
    Type: Grant
    Filed: September 4, 2014
    Date of Patent: August 15, 2017
    Assignee: Cerner Innovation, Inc.
    Inventors: Douglas S. McNair, John Christopher Murrish, Kanakasabha Kailasam
  • Patent number: 9575952
    Abstract: Topics are determined for short text messages using an unsupervised topic model. In a training corpus created from a number of short text messages, a vocabulary of words is identified, and for each word a distributed vector representation is obtained by processing windows of the corpus having a fixed length. The corpus is modeled as a Gaussian mixture model in which Gaussian components represent topics. To determine a topic of a sample short text message, a posterior distribution over the corpus topics is obtained using the Gaussian mixture model.
    Type: Grant
    Filed: October 21, 2014
    Date of Patent: February 21, 2017
    Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventor: Vivek Kumar Rangarajan Sridhar
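
The last step of the abstract above, obtaining a posterior over topics from a Gaussian mixture, can be sketched as below. This assumes the mixture has already been trained, that each component has a diagonal covariance, and that a message is represented by a single vector (for example an average of its word vectors); none of these details are stated in the abstract, and the function names are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def topic_posterior(message_vec, weights, means, variances):
    """Posterior p(topic | message) under a Gaussian mixture, one entry per topic."""
    log_joint = [
        math.log(w) + log_gaussian(message_vec, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(log_joint)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lj - peak) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

The topic of a message would then be read off as the component with the largest posterior probability.
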
  • Patent number: 9558265
    Abstract: Provided is a process including: obtaining a graph comprising nodes and edges, each of the edges having a value indicating an amount of similarity between objects corresponding to the two linked nodes; selecting a parameter for influencing the graph; assessing each of the nodes based on the selected influencing parameter, wherein assessing comprises, with respect to each adjacent node in the graph sharing an edge with the node: determining the value indicating the amount of similarity between the object corresponding to the node and the object corresponding to the adjacent node; and determining a score related to the edge shared with the node, the score determined based on the similarity-amount value and a value of the selected influencing parameter for the node, such that edges are removed, weakened, added, or strengthened; and preparing, based on the graph, instructions to display at least part of the graph.
    Type: Grant
    Filed: May 12, 2016
    Date of Patent: January 31, 2017
    Assignee: Quid, Inc.
    Inventors: Ruggero Altair Tacchi, Fabio Ciulla
  • Patent number: 9558165
    Abstract: A method and system for summarizing messages from a message stream is disclosed in which association analysis is applied to a stream of short data messages comprising words in a spoken language, such as English. Clusters of words are identified that provide a summary of the several conversations (short data messages originating from different human sources) that are imbedded in the message stream. Each word cluster may represent a set of messages that are its instances. The word clusters may collectively constitute a summary of the entire message stream. The word clusters that have been extracted from the message stream may also be grouped into topics. Also, an identity of one or more message originators may be listed based on their influence on the messages being analyzed. The short data messages may also be sorted based on a geographical location of one or more originators of messages.
    Type: Grant
    Filed: August 19, 2012
    Date of Patent: January 31, 2017
    Assignee: EMICEN CORP.
    Inventors: Roy Marsten, Russell Caldwell, Radhika Subramanian
  • Patent number: 9477751
    Abstract: A system and method for displaying relationships between concepts to provide classification suggestions via injection is provided. A reference set of concepts each associated with a classification code is designated. Clusters of uncoded concepts are designated. One or more of the uncoded concepts from at least one cluster are compared to the reference set. At least one of the concepts in the reference set that is similar to the one or more uncoded concepts is identified. The similar concepts are injected into the at least one cluster. Relationships between the uncoded concepts and the similar concepts in the at least one cluster are visually depicted as suggestions for classifying the uncoded concepts.
    Type: Grant
    Filed: July 27, 2010
    Date of Patent: October 25, 2016
    Assignee: FTI Consulting, Inc.
    Inventors: William C. Knight, Nicholas I. Nussbaum, John W. Conwell
  • Patent number: 9479839
    Abstract: Provided is a method and system for providing a representative phrase with respect to a real time popular keyword, which may determine programs including a popular keyword from broadcast information, and may generate a representative phrase with respect to the popular keyword using the determined programs, thereby providing the representative phrase by combining the generated representative phrase and the popular keyword.
    Type: Grant
    Filed: July 6, 2011
    Date of Patent: October 25, 2016
    Assignee: NHN Corporation
    Inventors: Jae Seung Shin, Young Sub Park, Jae Keol Choi, Won Sook Noh
  • Patent number: 9472115
    Abstract: Mechanisms for evaluating a link between information concept entities are provided. A set of evidential data specifying a plurality of information concept entities is received and a link between at least two information concept entities in the set of evidential data is generated. The set of evidential data is evaluated with regard to whether or not the set of evidential data supports or refutes the link. The evaluation of the set of evidential data comprises analyzing language of natural language statements in the set of evidential data to identify certainty terms within the natural language statements. A confidence value for the link is calculated based on results of the evaluation of the set of evidential data and a knowledge output is generated based on the link and the confidence value associated with the link.
    Type: Grant
    Filed: November 19, 2014
    Date of Patent: October 18, 2016
    Assignee: International Business Machines Corporation
    Inventors: Darryl M. Adderly, Corville O. Allen, Robert K. Tucker
  • Patent number: 9454602
    Abstract: A device may analyze text to identify a set of text portions of interest, and may analyze the text to identify a set of terms included in the set of text portions. The device may perform a similarity analysis to determine a similarity score. The similarity score may be determined between each term, included in the set of terms, and each text portion, included in the set of text portions, or the similarity score may be determined between each term and each other term included in the set of terms. The device may determine a set of dominant terms based on performing the similarity analysis. The set of dominant terms may include at least one term with a higher average degree of similarity than at least one other term. The device may provide information that identifies the set of dominant terms.
    Type: Grant
    Filed: August 29, 2013
    Date of Patent: September 27, 2016
    Assignee: Accenture Global Services Limited
    Inventors: Janardan Misra, Shubhashis Sengupta, Subhabrata Das
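
The term-to-term variant of the similarity analysis in the abstract above can be sketched as follows. The similarity measure is an assumption: Jaccard overlap of character trigrams is used here purely as a stand-in, and `dominant_terms` and `top_n` are hypothetical names.

```python
def jaccard(a, b):
    """Hypothetical similarity: Jaccard overlap of character trigrams."""
    ta = {a[i:i + 3] for i in range(len(a) - 2)}
    tb = {b[i:i + 3] for i in range(len(b) - 2)}
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def dominant_terms(terms, top_n=2):
    """Rank terms by average similarity to every other term and keep the top ones."""
    avg = {}
    for t in terms:
        others = [u for u in terms if u != t]
        avg[t] = sum(jaccard(t, u) for u in others) / len(others) if others else 0.0
    # Terms with a higher average degree of similarity come first.
    return sorted(terms, key=lambda t: avg[t], reverse=True)[:top_n]
```

A term that resembles many of its neighbours ranks above an isolated term, which matches the abstract's notion of a dominant term having a higher average degree of similarity than at least one other term.
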
  • Patent number: 9430485
    Abstract: An information processor coupled to a storage apparatus that stores information, includes: a creation unit configured to create a snapshot of a file system that manages first information stored in the storage apparatus and to output the snapshot to the storage apparatus; a writing unit configured to write second information stored in cache memory onto the storage apparatus after the snapshot has been created; and a replication instruction unit configured to instruct the storage apparatus to create a replication of the first information stored in the storage apparatus after the second information has been written and the snapshot.
    Type: Grant
    Filed: November 6, 2013
    Date of Patent: August 30, 2016
    Assignee: FUJITSU LIMITED
    Inventors: Norihito Kato, Nobuhiro Takano, Norichika Imamura
  • Patent number: 9384233
    Abstract: Methods and systems for automatically synthesizing product information from multiple data sources into an on-line catalog are disclosed, and in particular, for automatically synthesizing the product information based on attribute-value pairs. Information for a product may be obtained, via entity extraction, feed ingestion, and other mechanisms, from a plurality of structured and unstructured data sources having different taxonomies and schemas. Product information may additionally or alternatively be obtained or derived based on popularity data. The product information may be cleansed, segmented and normalized. The product information may be clustered so closest products, attribute names and attribute values are associated. A representative value for an attribute name may be determined, and the on-line catalog may be updated so that entries are comprehensive, meaningful and useful to a catalog user.
    Type: Grant
    Filed: December 4, 2012
    Date of Patent: July 5, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Ariel Fuxman, Hoa Nguyen, Juliana Freire de Lima e Silva, Stelios Paparizos, Rakesh Agrawal, Zhimin Chen, Lawrence William Colagiovanni, Prakash Sikchi
  • Patent number: 9292491
    Abstract: An apparatus for providing a control input signal for an industrial process or technical system having one or more controllable elements includes elements for generating a semantic space for a text corpus, and elements for generating a norm from one or more reference words or texts, the or each reference word or text being associated with a defined respective value on a scale, and the norm being calculated as a reference point or set of reference points in the semantic space for the or each reference word or text with its associated respective scale value. Elements for reading at least one target word included in the text corpus, elements for predicting a value of a variable associated with the target word based on the semantic space and the norm, and elements for providing the predicted value in a control input signal to the industrial process or technical system.
    Type: Grant
    Filed: June 13, 2014
    Date of Patent: March 22, 2016
    Assignee: STROSSLE INTERNATIONAL AB
    Inventors: Sverker Sikstrom, Mattias Tyrberg, Anders Hall, Fredrik Horte, Joakim Stenberg
  • Patent number: 9264446
    Abstract: Methods and systems for analyzing flows of communication packets. A front-end processor associates input packets with flows and forwards each flow to the appropriate unit, typically by querying a flow table that holds a respective classification for each active flow. In general, flows that are not yet classified are forwarded to the classification unit, and the resulting classification is entered in the flow table. Flows that are classified as requested for further analysis are forwarded to an appropriate flow analysis unit. Flows that are classified as not requested for analysis are not subjected to further processing, e.g., discarded or allowed to pass.
    Type: Grant
    Filed: January 25, 2012
    Date of Patent: February 16, 2016
    Assignee: VERINT SYSTEMS LTD.
    Inventors: Eithan Goldfarb, Yuval Altman, Naomi Frid, Gur Yaari
  • Patent number: 9223779
    Abstract: Text processing includes: segmenting received text based on a lexicon of smallest semantic units to obtain medium-grained segmentation results; merging the medium-grained segmentation results to obtain coarse-grained segmentation results, the coarse-grained segmentation results having coarser granularity than the medium-grained segmentation results; looking up in the lexicon of smallest semantic units respective search elements that correspond to segments in the medium-grained segmentation results; and forming fine-grained segmentation results based on the respective search elements, the fine-grained segmentation results having finer granularity than the medium-grained segmentation results.
    Type: Grant
    Filed: October 14, 2014
    Date of Patent: December 29, 2015
    Assignee: Alibaba Group Holding Limited
    Inventors: Jian Sun, Lei Hou, Jing Ming Tang, Min Chu, Xiao Ling Liao, Bing Jing Xu, Ren Gang Peng, Yang Yang
  • Patent number: 9159030
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining geographic locations of devices. One of the methods includes obtaining an estimated user location associated with each respective IP address block based on observed events from the IP address block; obtaining an estimate of a probability model p(ev|loc), the probability model p(ev|loc) including a respective probability distribution of interest locations for each of multiple user locations; wherein obtaining the estimate of the probability model p(ev|loc) includes calculating p(ev|loc) from a p(zone|loc) matrix and a p(ev|zone) matrix; and using the estimate for the probability model p(ev|loc) and the observed events to calculate an estimate for multiple probability distributions X(loc) associated with a respective IP address block.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: October 13, 2015
    Assignee: Google Inc.
    Inventor: Hartmut Maennel
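
One natural reading of "calculating p(ev|loc) from a p(zone|loc) matrix and a p(ev|zone) matrix" is a marginalization over zones, which is an ordinary matrix product. The sketch below assumes that reading; the abstract does not spell out the formula, and the matrix layout (rows are the conditioning variable) and the name `compose` are choices made for this example.

```python
def compose(p_zone_given_loc, p_ev_given_zone):
    """p(ev|loc) = sum over zones of p(zone|loc) * p(ev|zone).

    p_zone_given_loc[l][z] is p(zone=z | loc=l);
    p_ev_given_zone[z][e] is p(ev=e | zone=z).
    """
    n_ev = len(p_ev_given_zone[0])
    return [
        [sum(row[z] * p_ev_given_zone[z][e] for z in range(len(row))) for e in range(n_ev)]
        for row in p_zone_given_loc
    ]
```

If every row of both inputs is a probability distribution, every row of the result is one as well, so no renormalization is needed.
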
  • Publication number: 20150142812
    Abstract: The present application discloses a method, a server and a computer readable storage medium for segmenting a search query. The server receives a query segmentation request including a search query, and the search query further includes an ordered sequence of semantic elements. Each semantic element is correlated with one or more predetermined search terms each at least including the respective semantic element. The server further modifies the search terms by replacing irrelevant semantic elements with segmentation identifiers. The modified search terms are then combined to form combined search queries each of which includes the ordered sequence of semantic elements and at least one segmentation identifier that separates the semantic elements. A specific combined search query is identified based on search probabilities of the combined search queries, and the search query is segmented according to a location of at least one segmentation identifier in the specific combined search query.
    Type: Application
    Filed: January 29, 2015
    Publication date: May 21, 2015
    Inventor: Chao MA
  • Publication number: 20150134666
    Abstract: Techniques for managing big data include retrieval using per-subject dictionaries having multiple levels of sub-classification hierarchy within the subject. Entries may include subject-determining-power (SDP) scores that provide an indication of the descriptive power of the entry term with respect to the subject of the dictionary containing the term. The same term may have entries in multiple dictionaries with different SDP scores in each of the dictionaries. A retrieval request for one or more documents containing search terms descriptive of the one or more documents can be processed by identifying a set of candidate documents tagged with subjects, i.e., identifiers of per-subject dictionaries having entries corresponding to a search term, then using affinity values to adjust the aggregate score for the terms in the dictionaries. Documents are then selected for best match to the subject based on the adjusted scores.
    Type: Application
    Filed: November 12, 2013
    Publication date: May 14, 2015
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Anne Elizabeth Gattiker, Fadi H. Gebara, Anthony N. Hylick, Rouwaida N. Kanj
  • Patent number: 9031952
    Abstract: Methods and apparatuses are provided for user interest modeling. A method may include accessing logged interactive user history data for a user. The method may additionally include determining at least one user interest topic for the user by utilizing a topic model acting upon at least a portion of the logged interactive user history data and one or more seed documents generated from a topic feature source. Corresponding apparatuses are also provided.
    Type: Grant
    Filed: December 31, 2009
    Date of Patent: May 12, 2015
    Assignee: Nokia Corporation
    Inventors: Jilei Tian, Rile Hu, Wenfeng Li, Xiaojie Wang
  • Patent number: 9031955
    Abstract: Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
    Type: Grant
    Filed: January 30, 2014
    Date of Patent: May 12, 2015
    Assignee: Splunk Inc.
    Inventors: R. David Carasso, Micah James Delfino
  • Publication number: 20150127652
    Abstract: The disclosed solution uses machine learning-based methods to improve the knowledge extraction process in a specific domain or business environment. By formulizing a specific company's internal knowledge and terminology, the ontology programming accounts for linguistic meaning to surface relevant and important content for analysis. For example, the disclosed ontology programming adapts to the language used in a specific domain, including linguistic patterns and properties, such as word order, relationships between terms, and syntactical variations. Based on the self-training mechanism developed by the inventors, the ontology programming automatically trains itself to understand the domain or environment of the communication data by processing and analyzing a defined corpus of communication data.
    Type: Application
    Filed: October 30, 2014
    Publication date: May 7, 2015
    Inventor: Roni Romano
  • Patent number: 9026535
    Abstract: A method includes accessing text, identifying a plurality of terms from the text, determining a plurality of term vectors associated with the identified plurality of terms, and clustering the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors. The method further includes creating a first pseudo-document according to the first cluster, creating a second pseudo-document according to the second cluster, identifying a first set of terms associated with the first cluster using latent semantic analysis (LSA) of the first pseudo-document, identifying a second set of terms associated with the second cluster using LSA of the second pseudo-document, and combining the first and second sets of terms into a list of output terms.
    Type: Grant
    Filed: January 2, 2013
    Date of Patent: May 5, 2015
    Assignee: Brainspace Corporation
    Inventor: Paul A. Jakubik
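
The abstract above does not say how the pseudo-documents are built or scored, so the following is only a minimal sketch of one plausible reading: term vectors come from a rank-k SVD of a term-document matrix, a cluster's pseudo-document is taken to be the sum of its term vectors, and terms are ranked by cosine similarity to it. The function names, the toy matrix, and the choice of k are all illustrative assumptions, not the patented method.

```python
import numpy as np

def lsa_term_vectors(term_doc, k):
    """Rank-k LSA term vectors: rows of U_k scaled by the top-k singular values."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k] * s[:k]

def related_terms(term_vecs, cluster_rows, terms, top_n=3):
    """Rank all terms by cosine similarity to a cluster's pseudo-document.

    Assumption: the pseudo-document is the sum of the cluster's term vectors.
    """
    pseudo_doc = term_vecs[cluster_rows].sum(axis=0)
    norms = np.linalg.norm(term_vecs, axis=1) * np.linalg.norm(pseudo_doc)
    sims = term_vecs @ pseudo_doc / np.where(norms == 0, 1.0, norms)
    order = np.argsort(-sims)[:top_n]
    return [terms[i] for i in order]

# Hypothetical toy corpus: rows are terms, columns are documents.
terms = ["car", "engine", "wheel", "apple", "fruit", "banana"]
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
    [0, 0, 2, 2],
], dtype=float)

vecs = lsa_term_vectors(term_doc, k=2)
first = related_terms(vecs, [0, 1], terms)    # cluster seeded with car, engine
second = related_terms(vecs, [3, 4], terms)   # cluster seeded with apple, fruit
output_terms = list(dict.fromkeys(first + second))  # combine, keep order, drop repeats
```

In this toy data the two clusters occupy separate latent dimensions, so each pseudo-document pulls in the remaining term of its own topic ("wheel" or "banana") and nothing from the other.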
  • Publication number: 20150120708
    Abstract: Described are a method and system for aggregating, categorizing, and displaying information. With the method, information is acquired from an information-exchanging-sharing platform, and a content keyword of the information is extracted; the information is aggregated and categorized according to the content keyword; and the information is displayed according to each category. In the system, a keyword extracting unit is configured for acquiring information from an information-exchanging-sharing platform, and extracting a content keyword of the information; an aggregating-categorizing unit is configured for aggregating and categorizing the information according to the content keyword; and a displaying unit is configured for displaying the information according to each category. With what is described, it is possible to display aggregated and categorized information, facilitating information sharing and exchanging as well as reducing complexity in user operation.
    Type: Application
    Filed: December 29, 2014
    Publication date: April 30, 2015
    Inventor: Feng Kang
  • Publication number: 20150120738
    Abstract: A computer based method and system for classifying a document into one or more categories. The method and system can be configured to identify one or more clusters of clauses or sentences from a plurality of semantically similar clauses of the document and determine one or more representative concepts for each cluster of the document. Accordingly, one or more categories for the document are determined from the one or more representative concepts and the document is classified into the one or more categories.
    Type: Application
    Filed: December 24, 2014
    Publication date: April 30, 2015
    Applicant: RAGE FRAMEWORKS, INC.
    Inventor: Venkat Srinivasan
  • Patent number: 9015160
    Abstract: A method includes accessing text, identifying a plurality of terms from the text, determining a plurality of term vectors associated with the identified plurality of terms, and clustering the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors. The method further includes creating a first pseudo-document according to the first cluster, creating a second pseudo-document according to the second cluster, identifying a first set of terms associated with the first cluster using latent semantic analysis (LSA) of the first pseudo-document, identifying a second set of terms associated with the second cluster using LSA of the second pseudo-document, and combining the first and second sets of terms into a list of output terms.
    Type: Grant
    Filed: December 14, 2011
    Date of Patent: April 21, 2015
    Assignee: Brainspace Corporation
    Inventor: Paul A. Jakubik
  • Patent number: 9015171
    Abstract: Exemplary systems and methods for linking entity references to entities and identifying associations between entities are presented. In particular, a method for delinking one or more entity references linked to a same entity is provided, where the one or more entity references have at least one common data field. The method comprises the steps of evaluating at least one actual measurement of the entity based at least in part on one or more field values of the one or more entity references, determining a difference between the at least one actual measurement and at least one predefined measurement associated with the entity and delinking the one or more entity references based at least in part on a comparison of the difference and a defined threshold.
    Type: Grant
    Filed: December 14, 2009
    Date of Patent: April 21, 2015
    Assignee: Lexisnexis Risk Management Inc.
    Inventor: David Bayliss
  • Publication number: 20150100584
    Abstract: The present invention provides a computer-implemented method of analyzing messages in a computer system to allow workflows constituted by the messages to be identified, the method comprising: analyzing a sequence of messages in a computer system in order to classify the messages, thereby producing a corresponding sequence of classifications of the messages; and, applying sequence induction to the sequence of classifications of the messages to produce (i) a set of sub-sequences of the classifications of the messages and (ii) a sequence grammar for the sub-sequences, from which a workflow constituted by the sequence of messages can be identified.
    Type: Application
    Filed: August 12, 2014
    Publication date: April 9, 2015
    Inventor: Stephen Anthony Moyle
  • Patent number: 8996529
    Abstract: A networked computer system identifies, optimizes and recommends content sources for users. The content sources can be used for providing news feeds, search results, etc. based on taking into account net useful content contributed by such sources over other sources.
    Type: Grant
    Filed: November 9, 2011
    Date of Patent: March 31, 2015
    Assignee: John Nicholas and Kristin Gross Trust
    Inventors: John Nicholas Gross, Philip Albert
  • Publication number: 20150088896
    Abstract: A Website may be automatically categorized by (a) accepting Website information, (b) determining a set of scored clusters (e.g., semantic, term co-occurrence, etc.) for the Website using the Website information, and (c) determining at least one category (e.g., a vertical category) of a predefined taxonomy using at least some of the set of clusters.
    Type: Application
    Filed: December 4, 2014
    Publication date: March 26, 2015
    Inventors: David Gehrking, Ching Law, Andrew Maxwell
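
The categorization step in the abstract above can be sketched as follows, assuming (the abstract does not say) that each cluster maps to at most one taxonomy category and that a category's score is the sum of its clusters' scores. The cluster names, the mapping, and the `min_score` cutoff are hypothetical.

```python
from collections import defaultdict

def categorize(scored_clusters, cluster_to_category, min_score=0.0):
    """Pick the taxonomy category with the highest summed cluster score."""
    totals = defaultdict(float)
    for cluster, score in scored_clusters.items():
        category = cluster_to_category.get(cluster)
        if category is not None:          # ignore clusters outside the taxonomy
            totals[category] += score
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best if totals[best] > min_score else None

# Hypothetical scored clusters derived from a website's content.
scored = {"hotel deals": 0.6, "cheap flights": 0.5, "car loans": 0.4}
taxonomy = {"hotel deals": "Travel", "cheap flights": "Travel", "car loans": "Finance"}
category = categorize(scored, taxonomy)
```

Summing is only one possible aggregation; taking the maximum cluster score per category would favor a single strong signal over several weak ones.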
  • Publication number: 20150081714
    Abstract: An approach is provided for an information handling system to present knowledge-based information. In the approach, a semantic analysis is performed on the document with the analysis resulting in various sets of semantic content. Each of the sets of semantic content corresponds to an area in the document. The areas of the document are visually highlighted using visual indicators that show the availability of the sets of semantic content to a user via a user interface. In response to a user selection, such as a selection using the user interface or a user specified configuration setting, a selected set of semantic content is displayed to the user using the interface.
    Type: Application
    Filed: September 17, 2013
    Publication date: March 19, 2015
    Applicant: International Business Machines Corporation
    Inventors: Donna K. Byron, Krishna Kummamuru, Alexander Pikovsky, Timothy Winkler
  • Publication number: 20150081715
    Abstract: A processor performs semantic analysis on a query and generates one or more semantic structures where each structure is expressed by a graph. The processor generates retrieval keys corresponding to combinations of nodes connected directly or indirectly in the semantic structures, in addition to retrieval keys corresponding to minimum units of semantic connections between nodes in the generated semantic structures. The processor retrieves relevant documents whose sentences are matched to combinations of nodes, by using the generated retrieval keys, in the semantic structures stored in an index for retrieval on a database storing the documents.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 19, 2015
    Inventors: Seiji OKURA, Akira USHIODA
  • Patent number: 8983840
    Abstract: Techniques, an apparatus and an article of manufacture for identifying one or more utterances that are likely to carry the intent of a speaker, from a conversation between two or more parties. A method includes obtaining an input of a set of utterances in chronological order from a conversation between two or more parties, computing an intent confidence value of each utterance by summing intent confidence scores from each of the constituent words of the utterance, wherein intent confidence scores capture each word's influence on the subsequent utterances in the conversation based on (i) the uniqueness of the word in the conversation and (ii) the number of times the word subsequently occurs in the conversation, and generating a ranked order of the utterances from highest to lowest intent confidence value, wherein the highest intent value corresponds to the utterance which is most likely to carry intent of the speaker.
    Type: Grant
    Filed: June 19, 2012
    Date of Patent: March 17, 2015
    Assignee: International Business Machines Corporation
    Inventors: Om D. Deshmukh, Sachindra Joshi, Saket Saurabh, Ashish Verma
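
The abstract above names the two inputs to a word's intent confidence score but not the formula. As a rough sketch, the code below assumes uniqueness is an inverse-document-frequency weight over utterances and multiplies it by the word's count in later utterances; both choices, and the function name `rank_by_intent`, are illustrative assumptions.

```python
import math
from collections import Counter

def rank_by_intent(utterances):
    """Rank utterances by the summed intent confidence of their words."""
    tokens = [u.lower().split() for u in utterances]
    n = len(tokens)
    # Number of utterances containing each word (used for the uniqueness weight).
    df = Counter(w for t in tokens for w in set(t))
    scored = []
    for i, words in enumerate(tokens):
        # Occurrences of every word in the utterances that follow this one.
        later = Counter(w for t in tokens[i + 1:] for w in t)
        value = sum(math.log(n / df[w]) * later[w] for w in words)
        scored.append((value, i))
    # Highest value first; ties keep chronological order.
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [(utterances[i], value) for value, i in scored]

# Hypothetical support call.
conversation = [
    "hello thanks for calling",
    "my router keeps dropping the connection",
    "which router model",
    "the router is old and the connection drops daily",
]
ranked = rank_by_intent(conversation)
```

Here the second utterance ranks first because "router", "the", and "connection" all recur later, while the greeting and the final utterance score zero since none of their words are repeated afterward.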
  • Patent number: 8983963
    Abstract: Certain example embodiments relate to techniques for analyzing documents. A plurality of documents/document portions are imported into a database, with at least some of the documents/document portions being structured and at least some being unstructured. The imported documents/document portions are organized into one or more collections. A selection of at least one of the one or more collections is made. An index of words and/or groups of words is built (and optionally refined in accordance with one or more predefined rules) based on each document or document portion in each selection. A document-word matrix is built (and optionally weighted using a semantic approach), with the matrix including a value indicative of a number of times each word and/or group of words in the index appears in each document/document portion. One or more clusters of documents are generated using the document-word matrix.
    Type: Grant
    Filed: July 7, 2011
    Date of Patent: March 17, 2015
    Assignee: Software AG
    Inventors: Klaus Fittges, Khalid El Mansouri
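
The index, document-word matrix, and clustering steps in the abstract above can be illustrated with a small sketch. The abstract does not name a clustering algorithm, so the greedy cosine-similarity grouping here, the 0.5 threshold, and the whitespace tokenizer are assumptions chosen for brevity.

```python
import math
from collections import Counter

def build_matrix(docs):
    """Return (index, matrix): matrix[d][w] counts word index[w] in document d."""
    counts = [Counter(d.lower().split()) for d in docs]
    index = sorted(set().union(*counts))
    matrix = [[c[w] for w in index] for c in counts]
    return index, matrix

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster(matrix, threshold=0.5):
    """Greedy single pass: join the first cluster whose first member is similar enough."""
    clusters = []
    for i, row in enumerate(matrix):
        for members in clusters:
            if cosine(row, matrix[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])   # no match: start a new cluster
    return clusters

# Hypothetical document portions.
docs = [
    "invoice payment overdue",
    "payment invoice reminder",
    "server outage report",
    "outage report server down",
]
index, matrix = build_matrix(docs)
clusters = cluster(matrix)
```

With these four documents the two billing texts and the two outage texts land in separate clusters. The optional semantic weighting mentioned in the abstract would be applied to `matrix` before clustering.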