Clustering or Classification (EPO) Patents (Class 707/E17.089)
-
Patent number: 11361028Abstract: A technique produces a graph data structure based on at least partially unstructured information dispersed over web documents. The technique involves applying a machine-trained model to a set of documents (or, more generally, “document units”) to identify topics in the documents. The technique then generates count information by counting the occurrences of single topics and the co-occurrences of pairings of topics in the documents. The technique generates conditional probability information based on the count information. An instance of conditional probability information describes a probability that a first topic will occur, given an appearance of a second topic, and a probability that the second topic will occur, given an appearance of the first topic. The technique then formulates the conditional probability information in a graph data structure. The technique also provides an application system that utilizes the graph data structure to provide any kind of computer-implemented service to a user.Type: GrantFiled: June 9, 2020Date of Patent: June 14, 2022Assignee: Microsoft Technology Licensing, LLCInventors: Ziliu Li, Junaid Ahmed, Arnold Overwijk, Li Xiong, Xiao Liu
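The counting and conditional-probability steps can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the topics for each document unit have already been extracted (the machine-trained model is out of scope), and it uses a plain dict keyed by directed edges as a stand-in for the graph data structure. The function name `build_topic_graph` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_topic_graph(doc_topics):
    """Return {(x, y): P(y | x)} estimated from per-document topic sets.

    doc_topics: iterable of sets of topic labels, one set per document unit.
    """
    single = Counter()  # occurrences of each individual topic
    pair = Counter()    # co-occurrences of each unordered topic pairing
    for topics in doc_topics:
        unique = sorted(set(topics))
        single.update(unique)
        pair.update(combinations(unique, 2))

    graph = {}
    for (a, b), n_ab in pair.items():
        # Directed edge x -> y carries the probability that y occurs given x.
        graph[(a, b)] = n_ab / single[a]  # P(b | a)
        graph[(b, a)] = n_ab / single[b]  # P(a | b)
    return graph
```

For example, if "ml" always appears alongside "python" but "python" often appears alone, the edge `("ml", "python")` gets 1.0 while `("python", "ml")` gets a smaller value, which is the asymmetry the abstract describes.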
-
Patent number: 10585965Abstract: A determination device includes an image obtaining unit for obtaining an image in a linked area associated with a URL, a linked-to page obtaining unit for obtaining, from storing means for storing content, a linked-to page specified by the URL associated with the linked area, and a character determination unit for determining correctness of association between the linked area and the URL based on the image obtained by the image obtaining unit and the linked-to page obtained by the linked-to page obtaining unit.Type: GrantFiled: June 28, 2013Date of Patent: March 10, 2020Assignee: RAKUTEN, INC.Inventor: Yukiko Ochiai
-
Patent number: 10380226Abstract: Described herein are techniques for identifying and displaying key excerpts of a digital work and related key excerpts of other digital works. Key excerpts are identified by evaluating (a) the number of interactions by human readers within each of the key excerpts and (b) the number of reviews that reference each of the key excerpts. Related excerpts from other books can be identified by comparing the key excerpts of the other books. Excerpts can be displayed by subject, and links are provided to move from one subject to another.Type: GrantFiled: September 16, 2014Date of Patent: August 13, 2019Assignee: Amazon Technologies, Inc.Inventors: Walter Manching Tseng, Abhishek Patnia, Adam Joseph Iser, Christopher Michael Ellis, Alice Chu
-
Patent number: 9842175Abstract: The present invention provides a method and system for automatically identifying and selecting preferred classification and regression trees. The invention is used to identify a specific decision tree or group of trees that are consistent across train and test samples in node-specific details that are often important to decision makers. Specifically, for a tree to be identified as preferred by this system, the train and test samples must both agree on key measures for every terminal node of the tree. In addition to this node-by-node criterion, an additional tree selection criterion may be imposed: the train and test samples must rank order the nodes on a relevant measure in the same way. Both consistency criteria may be applied in a fuzzy manner in which agreement must be close but need not be exact.Type: GrantFiled: January 4, 2008Date of Patent: December 12, 2017Assignee: Minitab, Inc.Inventors: Dan Steinberg, Nicholas Scott Cardell
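The two consistency criteria can be sketched as a single check. This assumes each tree is summarized as a dict mapping terminal-node ids to one key measure (for example a response rate) computed separately on the train and test samples; the tolerance `tol` and the function name are hypothetical. Only the node-by-node criterion is applied fuzzily here; the rank-order check is exact, so tied measures may cause spurious mismatches.

```python
def is_preferred_tree(train, test, tol=0.05, check_rank=True):
    """Return True if train and test agree on every terminal node.

    train, test: dicts mapping terminal-node id -> key measure.
    """
    if train.keys() != test.keys():
        return False
    # Node-by-node criterion, fuzzy: measures must agree within tol.
    if any(abs(train[n] - test[n]) > tol for n in train):
        return False
    if check_rank:
        # Rank-order criterion: both samples must order the nodes identically.
        if sorted(train, key=train.get) != sorted(test, key=test.get):
            return False
    return True
```

Selecting preferred trees from a candidate set then reduces to filtering candidates with this predicate.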
-
Patent number: 8983961Abstract: A high availability system in a cloud computing environment includes a snapshot manager disposed in a mirror environment having at least one computer server and a plurality of virtual machines disposed in a production environment. Each of the plurality of virtual machines includes a snapshot agent configured to perform a method. The method includes periodically taking snapshots of the virtual machine associated with the snapshot agent, determining a delta image based on a change between a current snapshot and a previous snapshot, removing previous snapshots in the virtual machine and transmitting the delta image to the snapshot manager. The snapshot manager is configured to store a recovery image for each of the plurality of virtual machines and to merge the received delta image with the recovery image to update the recovery image.Type: GrantFiled: November 29, 2012Date of Patent: March 17, 2015Assignee: International Business Machines CorporationInventors: Hoi Y. Chan, Trieu C. Chieu
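The delta-image and merge steps can be illustrated with a toy model in which a VM image is a dict from block index to block contents. This is a simplification for illustration only; real snapshot agents operate on disk and memory images, and the function names and the `None` tombstone for deleted blocks are assumptions.

```python
def delta_image(previous, current):
    """Blocks that changed or were added since the previous snapshot.

    Blocks deleted since the previous snapshot are marked with None.
    """
    delta = {k: v for k, v in current.items() if previous.get(k) != v}
    for k in previous.keys() - current.keys():
        delta[k] = None
    return delta


def merge_into_recovery(recovery, delta):
    """Apply a received delta image to the stored recovery image."""
    merged = dict(recovery)
    for k, v in delta.items():
        if v is None:
            merged.pop(k, None)  # block was deleted on the VM
        else:
            merged[k] = v
    return merged
```

Because only the delta is transmitted, the snapshot manager can keep its recovery image current while the VM discards older snapshots locally.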
-
Patent number: 8930366Abstract: A method and system for automatically ranking product reviews according to review helpfulness. Given a collection of reviews, the method employs an algorithm that identifies dominant terms and uses them to define a feature vector representation. Reviews are then converted to this representation and ranked according to their distance from a ‘locally optimal’ review vector. The algorithm is fully unsupervised and thus avoids costly and error-prone manual training annotations. In one embodiment a Multi Layer Lexical Model (MLLM) approach partitions the dominant lexical terms in a review into layers, creates a compact unified layers lexicon, and ranks the reviews according to their weight with respect to the unified lexicon, all in a fully unsupervised manner. When used to rank book reviews, it was found that the invention significantly outperforms the user votes-based ranking employed by Amazon.Type: GrantFiled: January 11, 2009Date of Patent: January 6, 2015Assignee: Yissum Research Development Company of the Hebrew University of Jerusalem LimitedInventors: Ari Rappoport, Oren Tsur
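A rough sketch of the dominant-term ranking idea follows. It assumes, as a simplification not stated in the abstract, that dominant terms are the terms appearing in the most reviews, that a review's feature vector is binary term presence, and that the "locally optimal" vector is the all-ones vector over the dominant terms. The MLLM layering is not modeled.

```python
import math
from collections import Counter


def rank_reviews(reviews, n_dominant=5):
    """Rank reviews by Euclidean distance to a hypothetical optimal vector."""
    tokenised = [r.lower().split() for r in reviews]
    # Document frequency: the number of reviews each term appears in.
    df = Counter(t for toks in tokenised for t in set(toks))
    dominant = [t for t, _ in df.most_common(n_dominant)]
    optimal = [1.0] * len(dominant)  # assumption: every dominant term present

    def distance(toks):
        present = set(toks)
        vec = [1.0 if t in present else 0.0 for t in dominant]
        return math.dist(vec, optimal)

    order = sorted(range(len(reviews)), key=lambda i: distance(tokenised[i]))
    return [reviews[i] for i in order]
```

No labels are needed at any point, which is the property the abstract emphasizes; a practical version would also filter stop words before picking dominant terms.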
-
Patent number: 8918275Abstract: With use of GPS, an action-history recording apparatus obtains latitudes and longitudes representing places of user's action where a user is acting, and stores action-history data containing place names indicating the places of user's action at a predetermined processing timing. When the place of user's action is a specific place unique to the user, which the user visits customarily or frequently, the user is allowed to enter an arbitrary name independent of the latitude and longitude. The name entered by the user is used as a place name to be contained in action-history data. In this way, the apparatus obtains a place name appropriate for the user and the user can use the name conveniently as the place name of the user's action.Type: GrantFiled: February 6, 2012Date of Patent: December 23, 2014Assignee: Casio Computer Co., Ltd.Inventor: Naoyuki Sakazaki
-
Patent number: 8918397Abstract: A computer implemented method for clustering customers includes receiving a source set of customer records, wherein each customer record represents one customer, and each customer record includes at least one data attribute, and each data attribute has an attribute value; pre-processing the source set of customer records to generate a pre-processed set of customer records; executing a clustering algorithm on the pre-processed set of customer records to group the pre-processed set of customer records into clusters of a pre-defined number. The pre-processing comprises: determining the type of a customer in the source set of customer records; using a type attribute value to indicate the type of the customer in its customer record; normalizing data attribute values and type attribute values; weighting the data attribute values and the type attribute values, respectively, to obtain weighted attribute values of the data attribute and weighted attribute values of the type attribute.Type: GrantFiled: July 30, 2012Date of Patent: December 23, 2014Assignee: International Business Machines CorporationInventors: Heng Cao, Jin Dong, Jacqueline Giang Huong Morris, Ming Xie, Wen Jun Yin, Bin Zhang
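The pre-processing pipeline (type attribute, normalization, weighting) followed by clustering into a pre-defined number of clusters can be sketched as below. Assumptions: min-max normalization, a caller-supplied `customer_type` function that returns a numeric type attribute value, one shared weight for all data attributes, and plain k-means with naive initialization (the first `k` points) as the clustering algorithm. None of these specific choices come from the abstract.

```python
def preprocess(records, data_keys, customer_type, data_weight=1.0, type_weight=1.0):
    """Normalise and weight data attributes plus one type attribute per record."""
    # Each row: the data attribute values followed by the type attribute value.
    rows = [[float(r[k]) for k in data_keys] + [customer_type(r)] for r in records]
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    weights = [data_weight] * len(data_keys) + [type_weight]
    return [
        [w * ((x - lo) / (hi - lo) if hi > lo else 0.0)
         for x, lo, hi, w in zip(row, lows, highs, weights)]
        for row in rows
    ]


def kmeans(points, k, iters=20):
    """Plain k-means; returns a cluster index for each point."""
    centroids = [list(p) for p in points[:k]]  # naive initialisation

    def nearest(p):
        return min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p) for p in points]
```

Raising `type_weight` relative to `data_weight` makes the customer type dominate the distance, which is one way to read the purpose of weighting the two attribute groups separately.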
-
Patent number: 8914372Abstract: A computer implemented method for clustering customers includes receiving a source set of customer records, wherein each customer record represents one customer, and each customer record includes at least one data attribute, and each data attribute has an attribute value; pre-processing the source set of customer records to generate a pre-processed set of customer records; executing a clustering algorithm on the pre-processed set of customer records to group the pre-processed set of customer records into clusters of a pre-defined number. The pre-processing comprises: determining the type of a customer in the source set of customer records; using a type attribute value to indicate the type of the customer in its customer record; normalizing data attribute values and type attribute values; weighting the data attribute values and the type attribute values, respectively, to obtain weighted attribute values of the data attribute and weighted attribute values of the type attribute.Type: GrantFiled: March 28, 2012Date of Patent: December 16, 2014Assignee: International Business Machines CorporationInventors: Heng Cao, Jin Dong, Jacqueline Giang Huong Morris, Ming Xie, Wen Jun Yin, Bin Zhang
-
Patent number: 8903825Abstract: A method of classifying a plurality of documents. The method includes steps of providing a first set of classification terms and a second set of classification terms, the second set of classification terms being different from the first set of classification terms; generating a first frequency array of a number of occurrences of each term from the first set of classification terms in each document; generating a second frequency array of a number of occurrences of each term from the second set of classification terms in each document; generating a first similarity matrix from the first frequency array; generating a second similarity matrix from the second frequency array; determining an entrywise combination of the first similarity matrix and the second similarity matrix; and clustering the plurality of documents based on the result of the entrywise combination.Type: GrantFiled: May 23, 2012Date of Patent: December 2, 2014Assignee: NamesforLife LLCInventors: Charles T. Parker, George M. Garrity
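The pipeline of two frequency arrays, two similarity matrices, an entrywise combination, and clustering can be sketched as follows. The abstract does not name the similarity measure, the combination, or the clustering method, so this sketch assumes cosine similarity, an entrywise (Hadamard) product, and single-link clustering over edges at or above a similarity threshold.

```python
import math


def frequency_array(docs, terms):
    """One row per document, one column per classification term."""
    return [[doc.lower().split().count(t) for t in terms] for doc in docs]


def cosine_matrix(freq):
    def cos(u, v):
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0
    return [[cos(u, v) for v in freq] for u in freq]


def combine(s1, s2):
    """Entrywise (Hadamard) product of two similarity matrices."""
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]


def threshold_clusters(sim, threshold):
    """Single-link clusters: documents joined by any edge >= threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

With a product combination, two documents stay strongly linked only if both term sets consider them similar, which is one plausible motivation for combining the matrices entrywise.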
-
Patent number: 8843494Abstract: Using keywords to merge document clusters is described. Documents are distributed into document clusters that include a first document cluster of first documents and a second document cluster of second documents. A template associated with the first document cluster is created. The template includes keywords associated with most of the first documents. A distance is calculated between keyword location information associated with the template and word location information associated with a document in the second document cluster. The keyword location information includes information indicating a location of a keyword in the template relative to other keywords in the template. The word location information includes information indicating a location of a word in the document relative to other words in the document. A determination is made whether the distance is less than a threshold value.Type: GrantFiled: April 23, 2013Date of Patent: September 23, 2014Assignee: EMC CorporationInventor: Steven Sampson
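One way to compare keyword location information against word location information is sketched below. The specific distance is a hypothetical choice: each token's first position is normalized to the range 0 to 1 within its sequence, the distance is the mean absolute difference over the template's keywords, and a keyword missing from the document contributes the maximum distance of 1.0.

```python
def relative_positions(tokens):
    """Map each token to its first position, normalised to [0, 1]."""
    n = len(tokens)
    pos = {}
    for i, t in enumerate(tokens):
        pos.setdefault(t, i / (n - 1) if n > 1 else 0.0)
    return pos


def template_distance(template_keywords, document_text):
    kw_pos = relative_positions(template_keywords)
    if not kw_pos:
        return float("inf")
    doc_pos = relative_positions(document_text.lower().split())
    # Keywords absent from the document contribute the maximum distance.
    total = sum(abs(p - doc_pos[k]) if k in doc_pos else 1.0 for k, p in kw_pos.items())
    return total / len(kw_pos)


def should_merge(template_keywords, document_text, threshold=0.2):
    """True if the document is close enough to the cluster template."""
    return template_distance(template_keywords, document_text) < threshold
```

A document whose keywords appear in the same relative order as the template scores low and is a candidate for merging into the first cluster; the same words in scrambled order score high.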
-
Patent number: 8788497Abstract: Interrelated items in a complex item set (such as a set of components in a complex software architecture) may be difficult to present in a manner that facilitates an understanding and evaluation of the item set, due to the amount of information and the difficulty in automatically discerning the organization of the item set. A set of criteria may be utilized to form criterion groups to which items matching respective criteria may be automatically assigned. Further grouping assignments may be achieved by identifying an ungrouped item that is associated with a grouped item. Such techniques may be applied in many variations to yield a representation of the item set, and a presentation of the item set to a user, that aggregates similar items and interrelationships, thereby promoting an understanding and analysis of the structure and organization of the item set while reducing the user involvement in the generation of same.Type: GrantFiled: September 15, 2008Date of Patent: July 22, 2014Assignee: Microsoft CorporationInventors: Jean-Pierre Duplessis, Chris Lovett, Craig Symonds, Jacob Meyer, Scott Marison, Allen Denver, Tracey Trewin
-
Patent number: 8788498Abstract: Described is a technology for obtaining labeled sample data. Labeling guidelines are converted into binary yes/no questions regarding data samples. The questions and data samples are provided to judges who then answer the questions for each sample. The answers are input to a label assignment algorithm that associates a label with each sample based upon the answers. If the guidelines are modified and previous answers to the binary questions are maintained, at least some of the previous answers may be used in re-labeling the samples in view of the modification.Type: GrantFiled: June 15, 2009Date of Patent: July 22, 2014Assignee: Microsoft CorporationInventors: Anitha Kannan, Krishnaram Kenthapadi, John C. Shafer, Ariel Fuxman
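The label-assignment step can be sketched by first reducing each question's judge answers to a consensus and then applying a labeling rule to the consensus answers. Majority voting (with ties resolved to "no") and rules written as plain Python functions are assumptions for illustration. Storing answers separately from the rule is what allows re-labeling when guidelines change without asking judges again.

```python
from collections import Counter


def majority(answers):
    """Consensus of judges' yes/no answers; ties resolve to False (assumption)."""
    counts = Counter(answers)
    return counts[True] > counts[False]


def assign_labels(judgements, rule):
    """judgements: {sample_id: {question_id: [bool, ...]}}.

    rule: function mapping {question_id: bool} to a label.
    """
    labels = {}
    for sample, per_question in judgements.items():
        consensus = {q: majority(a) for q, a in per_question.items()}
        labels[sample] = rule(consensus)
    return labels
```

When the guidelines are modified, a new `rule` can be applied to the same stored `judgements`, provided it only uses questions that were already answered.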
-
Patent number: 8776228Abstract: Systems and methods are provided for intrusion detection. The systems and methods may include receiving transaction information related to one or more current transactions between a client entity and a resource server, accessing a database storing a plurality of transaction groups, analyzing the received transaction information with respect to information related to at least one of the plurality of transaction groups, and based on said analyzing, determining a possibility of an occurrence of an intrusion act at the resource server. The transaction groups may be formed based on a plurality of past transactions between a plurality of client entities and the resource server. Identity information of a user associated with the one or more current transactions may also be received along with the transaction information. The user may be associated with at least one of the plurality of transaction groups.Type: GrantFiled: November 22, 2011Date of Patent: July 8, 2014Assignee: CA, Inc.Inventors: Ramesh Natarajan, Timothy Gordon Brown, Carrie Elaine Gates
-
Patent number: 8775401Abstract: The present application relates to a method for implementing picture search and a website server thereof.Type: GrantFiled: February 1, 2013Date of Patent: July 8, 2014Assignee: Alibaba Group Holding LimitedInventors: Chunyi Zhou, Weiwei Wang, Xinfeng Zhou, Yu Dong, Xiaoying Weng, Jialong Huang
-
Patent number: 8745728Abstract: Methods, apparatus, systems and computer program products are described and claimed that provide for automatically and positively determining that an associate accessing a business domain/application using an application-specific associate identifier is the same associate that is accessing another business domain/application using another application-specific associate identifier. Once the positive determination of same associate is made, a federated identifier key is generated and applied to all of the platforms in which the associate can be positively identified, so as to globally identify the associates across multiple enterprise-wide domains/applications. As such, the present invention eliminates the need to manually analyze associate data to determine if an associate interfacing with one domain/application is the same associate interfacing with another domain/application.Type: GrantFiled: May 10, 2012Date of Patent: June 3, 2014Assignee: Bank of America CorporationInventors: Rangarajan Umamaheswaran, Bruce Wyatt Englar, Brett A. Nielson, Miroslav Halas
-
Publication number: 20140143254Abstract: Systems and methods can determine categories for product searches. One or more computing devices can receive a product query of search terms. The product query can be classified to identify a product category. The search terms may be verified against an ambiguous term list for the product category. The search terms may also be verified against an attribute list for the product category. The product query may be classified as fully understood in response to all of the search terms matching either the ambiguous term list or the attribute list for the product category. A product search may be performed on the product query. The product search may be informed by the product category when the product query has been classified as fully understood. Search results may be generated and returned according to the product search.Type: ApplicationFiled: November 16, 2012Publication date: May 22, 2014Inventors: Ritendra Datta, Joshua Yelon, Thomas Walter Murphy
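The "fully understood" test is a simple set check, sketched below. The query classifier and the search backend are passed in as hypothetical callables, because the abstract does not specify them; whitespace tokenization is also an assumption.

```python
def is_fully_understood(terms, ambiguous_terms, attributes):
    """Every search term must appear in the ambiguous-term list or the attribute list."""
    return all(t in ambiguous_terms or t in attributes for t in terms)


def search(query, classify, ambiguous_by_cat, attrs_by_cat, run_search):
    terms = query.lower().split()
    category = classify(query)
    understood = is_fully_understood(
        terms, ambiguous_by_cat.get(category, set()), attrs_by_cat.get(category, set())
    )
    # The category informs the search only when the query is fully understood.
    return run_search(terms, category if understood else None)
```

Leaving the category out for partially understood queries is a conservative choice: an unrecognized term might signal that the classifier picked the wrong category.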
-
Publication number: 20140136540Abstract: A system and method of determining the level of diversity for a search query are described. Distances between leaf categories in a hierarchical category tree are determined using co-click counts between the leaf categories for a query. Coordinate representations of the leaf categories are determined using the distances between the leaf categories. A diversity score for the query is determined using the coordinate representations. The diversity score represents a degree of variability in what different users find relevant to the query. In some embodiments, determining distances between leaf categories comprises determining the distances using a normalization of the co-click counts that uses co-impression counts between the leaf categories for the query. In some embodiments, a manifold learning algorithm is used to determine the coordinate representations. In some embodiments, multi-dimensional scaling is used to determine the coordinate representations.Type: ApplicationFiled: November 9, 2012Publication date: May 15, 2014Applicant: eBay Inc.Inventors: Duangmanee Putthividhya, Zhaohui Chen
-
Publication number: 20140136537Abstract: A computing system determines incremental values associated with a plurality of clustering solutions. Each of the clustering solutions groups stores of a retailer into clusters in a different way. For each clustering solution in the plurality of clustering solutions, the incremental value associated with the clustering solution indicates a difference between an estimated revenue associated with the clustering solution and revenue associated with a baseline clustering solution. The computing system then determines, based on the incremental values associated with the plurality of clustering solutions, the appropriate number of clusters. The clustering solutions that group the stores into more or fewer clusters than the appropriate number of clusters tend to be associated with incremental values that are the same as or lower than those of the clustering solutions that group the stores into the appropriate number of clusters.Type: ApplicationFiled: November 15, 2012Publication date: May 15, 2014Applicant: Target Brands, Inc.Inventors: James Carl Nelson, Raja Ranganathan, Abhijit Sharma, Zachary George Sands
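The selection step can be sketched under two simplifying assumptions: there is one clustering solution per cluster count, keyed by that count, and ties in incremental value are broken in favor of fewer clusters. Revenue estimation itself is outside the scope of the sketch.

```python
def incremental_values(estimated_revenue, baseline_revenue):
    """estimated_revenue: {number_of_clusters: estimated revenue of that solution}."""
    return {k: rev - baseline_revenue for k, rev in estimated_revenue.items()}


def appropriate_cluster_count(estimated_revenue, baseline_revenue):
    inc = incremental_values(estimated_revenue, baseline_revenue)
    best = max(inc.values())
    # Tie-break (assumption): prefer the smaller number of clusters.
    return min(k for k, v in inc.items() if v == best)
```

For instance, estimated revenues of 100, 130, 130, and 120 for 2 to 5 clusters against a baseline of 90 give incremental values of 10, 40, 40, and 30, so 3 clusters is selected.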
-
Publication number: 20140122483Abstract: An activity-modeling system computes an amount of time that a user is expected to spend when performing activities of a certain type. During operation, the system can obtain a plurality of location events associated with the user, such that a respective location event indicates a time at which the user logged his location while performing an activity related to the activity type. The system selects, from the plurality of location events, a set of location events associated with the activity type. The system determines an activity start-time and an activity end-time for the activity type from the set of location events, and computes an activity-duration time for the activity type based on the determined activity start-time and the activity end-time.Type: ApplicationFiled: October 26, 2012Publication date: May 1, 2014Applicant: PALO ALTO RESEARCH CENTER INCORPORATEDInventors: Rui Zhang, Robert R. Price, Oliver Brdiczka
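A minimal sketch of deriving start and end times from location events follows. The session-splitting rule is an assumption not found in the abstract: consecutive events of the same activity type belong to one occurrence unless they are more than `max_gap` apart. The expected duration is then the mean duration of those occurrences.

```python
from datetime import datetime, timedelta


def activity_duration(events, activity_type, max_gap=timedelta(minutes=30)):
    """events: iterable of (datetime, activity_type) location events.

    Returns the mean duration of the activity's occurrences, or None.
    """
    times = sorted(t for t, kind in events if kind == activity_type)
    if not times:
        return None
    durations = []
    start = prev = times[0]
    for t in times[1:]:
        if t - prev > max_gap:
            # Gap too large: close the current occurrence, start a new one.
            durations.append(prev - start)
            start = t
        prev = t
    durations.append(prev - start)
    return sum(durations, timedelta()) / len(durations)
```

Splitting into occurrences avoids computing a meaningless multi-day "duration" from the earliest and latest events of a recurring activity.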
-
Publication number: 20140114972Abstract: Systems and methods for sharing information between distributed computer systems connected to one or more data networks. In particular, a replication system implementing methodologies for sharing database information between computer systems where the databases use different classification schemes for information access control is disclosed.Type: ApplicationFiled: October 22, 2012Publication date: April 24, 2014Applicant: PALANTIR TECHNOLOGIES, INC.Inventors: Richard Allen Ducott, III, John Kenneth Garrod, Khan Tasinga
-
Publication number: 20140108460Abstract: Data stores that store content units and annotations regarding the content units derived through a semantic interpretation of the content units. When annotations are stored in a database, different parts of an annotation may be stored in different tables of the database. For example, one or more tables of the database may store all semantic classifications for the annotations, while one or more other tables may store content of all of the annotations. A user may be permitted to provide natural language queries for searching the database. A natural language query may be semantically interpreted to determine one or more annotations from the query. The semantic interpretation of the query may be performed using the same annotation model used to determine annotations stored in the database. Semantic classifications and format of the annotations for a query may be the same as one or more annotations stored in the database.Type: ApplicationFiled: October 11, 2012Publication date: April 17, 2014Applicant: Nuance Communications, Inc.Inventors: Mariana Casella dos Santos, Frank Montyne
-
Publication number: 20140108410Abstract: A test case generation system includes a processor, a process residing on the processor and configured to extract descriptions from document artifacts, extract a first set of keywords from the descriptions, categorize the descriptions into a first set and a second set, extract a second set of keywords that occur in the second set and generate a test case from the second set of keywords.Type: ApplicationFiled: October 17, 2012Publication date: April 17, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Futoshi Iwama, Ken Mizuno, Taiga Nakamura, Hironori Takeuchi
-
Publication number: 20140101162Abstract: A method for recommending semantic annotations on a main document and sub documents is provided. The method includes: extracting a keyword of the main document; extracting a keyword or a set of keywords of each sub document; and generating a keyword similarity or a set of keyword similarities for each of the sub documents based on a degree of similarity between the keyword of the main document and the keyword of each of the sub documents. The method also includes: obtaining a plurality of words appearing in each of the sub documents and calculating a frequency of each of the words; generating a semantic capacity of each of the sub documents according to the frequencies; grouping the main document and at least one of the sub documents into a semantic document set based on the semantic capacities and the keyword similarities; and annotating the main document according to the semantic document set.Type: ApplicationFiled: October 9, 2012Publication date: April 10, 2014Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTEInventors: Hsiang-Yuan Hsueh, Ko-Li Kan, Chi-Chou Chiang
-
Publication number: 20140095505Abstract: Systems and methods that provide an intelligence platform for distributed processing of big data sets, including both structured and unstructured data types, across two or more intelligent data operation engine servers. The intelligent data operation engine servers can form a conceptual understanding of content in each electronic file and then cooperate with a distributed index handler to index the conceptual understanding of the electronic file. A query pipeline and the distributed index handler in the intelligence platform cooperate with the two or more intelligent data operation engine servers to improve scalability and performance on the big data sets containing both structured and unstructured electronic files represented in the common index.Type: ApplicationFiled: October 1, 2012Publication date: April 3, 2014Applicant: LONGSAND LIMITEDInventors: Sean Mark Blanchflower, Darren John Gallagher
-
Publication number: 20140095503Abstract: A system and a method for initializing a streaming application are disclosed. The method may include initializing a streaming application for execution on one or more compute nodes which are adapted to execute one or more stream operators. The method may, during a compiling of code, identify whether a processing condition exists at a first stream operator of a plurality of stream operators. The method may add a grouping condition to a second stream operator of the plurality of stream operators if the processing condition exists. The method may provide for the second stream operator to group tuples for sending to the first stream operator.Type: ApplicationFiled: September 28, 2012Publication date: April 3, 2014Applicant: International Business Machines CorporationInventors: Michael J. Branson, Bradford L. Cobb, John M. Santosuosso
-
Publication number: 20140081973Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying a spike in a rate of occurrence of events. One of the methods includes receiving data identifying a spike at a particular time in a rate of occurrence of events relating to a particular search query, where an event relating to the particular search query is a receipt event of the particular search query or an indexing event of a resource that satisfies the particular search query, fitting the occurrences of the events in a time window to a reference distribution of occurrences of events to determine a goodness of fit value, wherein the reference distribution models a random occurrence of events relating to search queries, comparing the goodness of fit value to a primary threshold, and classifying the spike as a spurious spike if the goodness of fit value satisfies the primary threshold.Type: ApplicationFiled: September 14, 2012Publication date: March 20, 2014Applicant: Google Inc.Inventors: Mukund Jha, Kumar Mayur Thakur
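The goodness-of-fit step can be sketched with a Kolmogorov-Smirnov statistic. This assumes the reference "random occurrence" model is a homogeneous Poisson process, under which the event times inside a window, given their count, are uniformly distributed. A small statistic means the events look random, so the sketch labels the spike spurious when the statistic is at or below the threshold. The threshold value and function names are hypothetical.

```python
def ks_uniform(times, start, end):
    """Kolmogorov-Smirnov statistic of event times against Uniform[start, end]."""
    xs = sorted((t - start) / (end - start) for t in times)
    if not xs:
        raise ValueError("no events in window")
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Largest gap between the empirical CDF and the uniform CDF at x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def classify_spike(times, start, end, threshold=0.3):
    """Spurious if the events fit the random (uniform) reference model well."""
    return "spurious" if ks_uniform(times, start, end) <= threshold else "genuine"
```

Evenly scattered events produce a statistic near zero, while a tight burst at the end of the window produces a large one.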
-
Publication number: 20140081974Abstract: Systems and methods are provided for aggregating electronic content items that are relevant to one another. In one embodiment, a content management application determines that a first electronic content item and a second electronic content item are relevant to one another. The first electronic content item is provided by a first client account and the second electronic content item is provided by a second client account. The content management application also aggregates the first and second electronic content items to form at least part of a collection of electronic content. The first and second electronic content items are aggregated based on determining that the first and second electronic content items are relevant to one another. The content management application also provides access to the collection of electronic content.Type: ApplicationFiled: September 18, 2012Publication date: March 20, 2014Applicant: Adobe Systems IncorporatedInventors: Jon Lorenz, Justin Velo
-
Publication number: 20140067807Abstract: A method performed on an electronic device for migrating tags across entities. The migration of the tags is performed following an analysis of one or more personal electronically encoded items associated with a previously created perspective or with an album associated with the previously created perspective; the creation, responsive to a user decision, of a new perspective, a new album associated with one of the previously created perspectives, or a new perspective and a new album associated with the new perspective; and, responsive to a user decision to treat the previously created perspective or album as an individual entity, association of the previously created perspective or album with the new perspective or new album. The tags are respectively migrated from the new perspective or the new album to the associated previously created perspective or the previously created album and to associated ones of the one or more personal electronically encoded items.Type: ApplicationFiled: August 31, 2012Publication date: March 6, 2014Applicant: RESEARCH IN MOTION LIMITEDInventors: Anand Ravindra OKA, Sean Bartholomew SIMMONS, Christopher Harris SNOW, Steven Michael HANOV, Ghasem NADDAFZADEH SHIRAZI
-
Publication number: 20140067816Abstract: In an effort to enhance computer user engagement with a search results page, systems and methods are presented which are configured to identify an entity as being the subject matter of a user's search query. If the entity is a known entity, i.e., entity information is stored in an entity store for the identified entity, a subset of entity attributes are identified and a representative entity attribute question is obtained for each of the attributes in the subset of entity attributes. The representative entity attribute questions are identified according to the probability that they are linguistically well formed. The representative entity attribute questions are included in a search results page that is generated in response to the user's search query.Type: ApplicationFiled: August 29, 2012Publication date: March 6, 2014Applicant: MICROSOFT CORPORATIONInventors: Tapas Kanungo, Ashok Ponnuswami
-
Publication number: 20140058992Abstract: Techniques are described to characterize motion patterns of a group of agents engaging in an activity. An analysis system receives input data associated with spatial and temporal information of at least one element of interest associated with the activity, where the element of interest may be a ball, person, animal or any other object in motion. The analysis system partitions the input data into a plurality of spatiotemporal segments and generates one or more representations of one or more sets of segments of the plurality of spatiotemporal segments based on one or more criteria. The analysis system computes a metric, such as an entropy value, for each of the one or more representations. Partial tracing data, such as ball movements in a sporting event, may be created using an inexpensive input device, such as a tablet computer, making the disclosed techniques available for a wide range of events and activities.Type: ApplicationFiled: August 21, 2012Publication date: February 27, 2014Inventors: Patrick Lucey, Alina Bialkowski, Iain Matthews, G. Peter Carr, Eric Foote
-
Publication number: 20140052730Abstract: Embodiments of the present invention provide a system, method, and program product for managing data sets. According to one aspect of the present invention, a data group of one or more related data sets is reorganized. Utilizing one or more specified criteria, data sets that should be cataloged in the data group are identified and cataloged in the data group such that they are arranged in a chronological order and are named with appropriate generation numbers.Type: ApplicationFiled: August 14, 2012Publication date: February 20, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Eric J. Harris, Franklin E. McCune, Miguel A. Perez, Ryan J. Wisniewski
-
Publication number: 20140047384Abstract: Systems, methods, computer-readable media, and graphical user interfaces for facilitating integrated data capture with an item group key are provided. Integrated data capture workflows are initiated from within an electronic medical record (EMR). Selections of groups of items from the EMR are received. Item group keys are assigned to at least one item for the groups of items. Available data associated with the item group keys is gathered from the EMR. Selections of available data to include in case report forms are received. The case report forms are populated with the selections of available data and the item group keys.Type: ApplicationFiled: August 8, 2012Publication date: February 13, 2014Applicant: CERNER INNOVATION, INC.Inventors: JON FEWINS, RYAN MOOG, MARSHA LAIRD-MADDOX, TODD JEFFREY REYNOLDS, BRADY TIMMERBERG, NITISH AMRAJI
-
Publication number: 20140046895
Abstract: Data for a plurality of entities that can be offered a plurality of products can be obtained. The data can include categorical data and numeric data. Based on business constraints, some or all of the data can be selected. The selected data can be converted to another set of numeric data in which the categorical values are converted to numeric values. The dimensions of the converted data can be reduced to generate a reduced data set. Based on this reduced data set, clusters of entities can be formed. The products can be grouped by assigning the unique product identifier of each product to a corresponding cluster. This grouping of products can be used by a predictive model to predict the likelihood that an entity will purchase a particular product in a future time period. Related methods, apparatus, systems, techniques and articles are also described.
Type: Application
Filed: August 10, 2012
Publication date: February 13, 2014
Inventors: Amit Sowani, Eeshan Malhotra, Shafi Ur Rahman
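The encode, reduce, and cluster pipeline can be sketched as follows, assuming one-hot encoding for the categorical-to-numeric step, PCA (via SVD) for dimension reduction, and plain k-means for clustering. These are common stand-in techniques, not necessarily the ones used in the application; the product-grouping and predictive-model steps are not shown, and no feature scaling is applied, so large-valued columns dominate.

```python
import numpy as np

def encode(rows, numeric, categorical):
    """Numeric columns are kept as-is; each categorical column is one-hot encoded."""
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    out = []
    for r in rows:
        vec = [float(r[c]) for c in numeric]
        for c in categorical:
            vec += [1.0 if r[c] == lvl else 0.0 for lvl in levels[c]]
        out.append(vec)
    return np.array(out)

def reduce_dims(X, k):
    """Project mean-centered data onto its top-k principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

def kmeans(X, k, iters=50):
    """Lloyd's k-means; initial centers are simply the first k rows (deterministic)."""
    centers = X[:k].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

entities = [
    {"spend": 100, "age": 30, "region": "north"},
    {"spend": 5, "age": 60, "region": "south"},
    {"spend": 110, "age": 32, "region": "north"},
    {"spend": 8, "age": 58, "region": "south"},
]
labels = kmeans(reduce_dims(encode(entities, ["spend", "age"], ["region"]), 2), 2)
print(labels)
```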
-
Publication number: 20140046947
Abstract: A method for question/answer creation for a document is described. The method includes importing a document having a set of questions based on content in the document. The method also includes automatically creating a candidate question from the content in the document. The method also includes automatically generating answers for the set of questions and the candidate question using the content in the document. The method also includes presenting the set of questions, the candidate question, and the answers to a content creator for user verification of accuracy. The method also includes storing a verified set of questions in the document. The verified set of questions includes the candidate question.
Type: Application
Filed: August 9, 2012
Publication date: February 13, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jana H. Jenkins, David C. Steinmetz, Wlodek W. Zadrozny
-
Publication number: 20140040270
Abstract: Method, apparatus, and computer-readable medium are provided for analyzing a document including text. In one example, a method for identifying patterns in a document is described. The method includes identifying a plurality of candidate phrases in the document based on candidate identification criteria; grouping candidate phrases of the plurality of candidate phrases into a phrase family based on family criteria and on comparison between candidate phrases, to obtain consistent phrases; and, for remaining phrases not meeting all of the candidate identification criteria, associating at least one of the remaining phrases with a phrase family based on inconsistent phrase criteria, to obtain inconsistent phrases. Once identified in this manner, an inconsistent phrase may be displayed via a user interface, giving the user the opportunity to determine whether it requires modification.
Type: Application
Filed: July 31, 2012
Publication date: February 6, 2014
Applicant: Freedom Solutions Group, LLC, d/b/a Microsystems
Inventors: Thomas O'Sullivan, Andrzej Jachowicz
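A minimal sketch of consistent versus inconsistent phrase detection, under hypothetical criteria: a candidate phrase is a run of two or more capitalized words, the family key is the lowercased phrase, and any differently-cased occurrence of the same words is treated as inconsistent. The application's actual identification and family criteria are not specified here. One known limitation of this rule: a capitalized sentence-initial word (e.g. "The Purchase Price") is absorbed into the candidate.

```python
import re
from collections import defaultdict

# Hypothetical candidate criterion: two or more consecutive capitalized words.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")

def phrase_families(text):
    """Group candidate phrases by lowercase key, then attach occurrences of the
    same words that fail the candidate criterion as inconsistent phrases."""
    families = defaultdict(lambda: {"consistent": [], "inconsistent": []})
    for m in CANDIDATE.finditer(text):
        families[m.group().lower()]["consistent"].append(m.group())
    for key in families:
        for m in re.finditer(r"\b" + re.escape(key) + r"\b", text, flags=re.IGNORECASE):
            if not CANDIDATE.fullmatch(m.group()):
                families[key]["inconsistent"].append(m.group())
    return dict(families)

print(phrase_families("Buyer shall pay the Purchase Price at closing. "
                      "The purchase price excludes tax."))
```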
-
Publication number: 20140040233
Abstract: Methods, systems, and computer-readable and executable instructions are provided for organizing content. A method for organizing content can include building a customized content corpus for a user, building a concept graph customized for the user's context based on the customized corpus, and organizing, utilizing multi-view clustering, the content within the corpus based on the concept graph.
Type: Application
Filed: July 31, 2012
Publication date: February 6, 2014
Inventors: Mehmet Kivanc Ozonat, Claudio Bartolini
-
Publication number: 20140040263
Abstract: The disclosure generally describes computer-implemented methods, software, and systems for search-, context-, and rule-based creation and runtime adaptation in dynamic workspaces. One computer-implemented method includes identifying a data artifact associated with each search result of at least one received search result, associating each identified data artifact with a module category of a plurality of module categories, injecting the identified artifacts into a content gallery, categorizing, by operation of at least one computer, the injected identified artifacts within the content gallery, presenting at least a subset of the injected identified artifacts on an enterprise workspace page associated with an enterprise workspace, and constructing a context associated with at least one of the enterprise workspace or the enterprise workspace page.
Type: Application
Filed: August 6, 2012
Publication date: February 6, 2014
Applicant: SAP Portals Israel Ltd.
Inventors: Yahali Sherman, Vitaly Vainer
-
Patent number: 8635223
Abstract: A system and method for providing a classification suggestion for electronically stored information is provided. A corpus of electronically stored information, including reference electronically stored information items that are each associated with a classification and uncoded electronically stored information items, is maintained. A cluster of uncoded electronically stored information items and reference electronically stored information items is provided. A neighborhood of reference electronically stored information items in the cluster is determined for at least one of the uncoded electronically stored information items. A classification of the neighborhood is determined using a classifier. The classification of the neighborhood is suggested as a classification for the at least one uncoded electronically stored information item.
Type: Grant
Filed: July 9, 2010
Date of Patent: January 21, 2014
Assignee: FTI Consulting, Inc.
Inventor: William C. Knight
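A minimal sketch of the neighborhood step, assuming items are numeric feature vectors, the neighborhood is the k nearest reference (coded) items in the same cluster by Euclidean distance, and the "classifier" is a simple majority vote. The patent leaves the classifier open; majority voting is a swapped-in choice, and the labels used below are hypothetical.

```python
import math
from collections import Counter

def suggest_classification(uncoded, cluster, k=3):
    """Suggest a classification for `uncoded` from its k nearest reference items.

    cluster: list of (feature_vector, label) pairs; label is None for uncoded items.
    Raises IndexError if the cluster holds no reference items."""
    references = [(vec, label) for vec, label in cluster if label is not None]
    neighborhood = sorted(references, key=lambda ref: math.dist(uncoded, ref[0]))[:k]
    votes = Counter(label for _, label in neighborhood)
    # Majority vote; on a tie, the label seen first (the nearest neighbor) wins.
    return votes.most_common(1)[0][0]

cluster = [((0, 0), "privileged"), ((0, 1), "privileged"), ((5, 5), "responsive"),
           ((1, 0), "privileged"), ((4, 4), None)]
print(suggest_classification((0.5, 0.5), cluster, k=3))
```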
-
Publication number: 20140019451
Abstract: A technique can include identifying a collection of documents to be clustered. The collection of documents can include foreign language documents and base language documents. The foreign language documents can be translated into the base language at a base language translation module. Keywords in the base language documents and keywords in the translated foreign language documents can be determined at a document indexing module. The base language documents can be clustered with the foreign language documents in a common set of document clusters based on the determined keywords in the base language documents and the determined keywords in the translated foreign language documents. In response to a search query in a first language, a listing of search results can be provided that includes documents in the first language and another language from a common document cluster.
Type: Application
Filed: July 16, 2012
Publication date: January 16, 2014
Applicant: GOOGLE INC.
Inventor: Kirill Buryak
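A minimal sketch of translate-then-cluster, assuming a hypothetical word-by-word `lexicon` as a stand-in for a real translation module, stopword-filtered tokens as keywords, and greedy single-pass clustering by Jaccard similarity of keyword sets. None of these choices come from the application, and the search step is not shown.

```python
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for"}

def translate(text, lang, lexicon):
    """Hypothetical word-by-word stand-in for a machine-translation module (base language: 'en')."""
    words = text.lower().split()
    if lang == "en":
        return " ".join(words)
    return " ".join(lexicon.get(w, w) for w in words)

def keywords(text):
    return {w for w in text.split() if w not in STOPWORDS}

def cluster_multilingual(docs, lexicon, threshold=0.5):
    """docs: (doc_id, lang, text) triples. Each document joins the most similar
    existing cluster (Jaccard on base-language keywords) or starts a new one."""
    clusters = []
    for doc_id, lang, text in docs:
        kw = keywords(translate(text, lang, lexicon))
        best, best_sim = None, threshold
        for c in clusters:
            union = kw | c["keywords"]
            sim = len(kw & c["keywords"]) / len(union) if union else 0.0
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"keywords": set(kw), "members": [doc_id]})
        else:
            best["keywords"] |= kw
            best["members"].append(doc_id)
    return [c["members"] for c in clusters]

lexicon = {"preise": "prices", "der": "of", "solarmodule": "solar panel"}
docs = [("d1", "en", "Solar panel prices"), ("d2", "de", "Preise der Solarmodule"),
        ("d3", "en", "Football match results")]
print(cluster_multilingual(docs, lexicon))
```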
-
Publication number: 20140012848
Abstract: Systems and methods for measuring similarity between a set of clusters and a set of object labels, wherein at least two of the object labels are related, receive a first set of clusters, wherein the first set of clusters was formed by clustering objects in a set of objects into clusters of the first set of clusters according to a clustering procedure; and calculate a similarity index between the first set of clusters and a set of object labels based at least in part on a relationship between two or more object labels in the set of object labels.
Type: Application
Filed: July 5, 2012
Publication date: January 9, 2014
Applicant: CANON KABUSHIKI KAISHA
Inventors: Bradley Scott Denney, Dariusz T. Dusberger
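One way to let related labels influence a cluster/label similarity index is a pair-counting score in the spirit of the Rand index, where the "same label" indicator for a pair of objects is replaced by a label similarity in [0, 1]. This is an illustrative construction under that assumption, not the index defined in the application; with an all-zero `label_sim` it reduces to the ordinary Rand index.

```python
from itertools import combinations

def related_rand_index(clusters, labels, label_sim):
    """Pair-counting agreement between a clustering and labels with related labels.

    clusters: cluster id per object; labels: label per object;
    label_sim: {frozenset({a, b}): similarity in [0, 1]} for related, distinct labels.
    Requires at least two objects."""
    pairs = list(combinations(range(len(labels)), 2))
    disagreement = 0.0
    for i, j in pairs:
        same_cluster = 1.0 if clusters[i] == clusters[j] else 0.0
        if labels[i] == labels[j]:
            s = 1.0
        else:
            s = label_sim.get(frozenset((labels[i], labels[j])), 0.0)
        disagreement += abs(same_cluster - s)
    return 1.0 - disagreement / len(pairs)

labels = ["cat", "lion", "car"]
clusters = [0, 0, 1]
print(related_rand_index(clusters, labels, {frozenset(("cat", "lion")): 0.8}))
```

Putting "cat" and "lion" in one cluster is only lightly penalized because the two labels are declared related.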
-
Publication number: 20140006408
Abstract: Example methods, apparatuses, or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to facilitate or otherwise support one or more processes or operations for identifying points of interest in a text, such as in an unstructured text, for example, in connection with bootstrapping points of interest via social media.
Type: Application
Filed: June 29, 2012
Publication date: January 2, 2014
Applicant: Yahoo! Inc.
Inventors: Adam Rae, Vanessa Murdock, Hugues Bouchard, Adrian Popescu
-
Publication number: 20140006401
Abstract: Various technologies described herein pertain to classifying data in a main memory database system. A record access log can include a sequence of record access observations logged over a time period from a beginning time to an end time. Each of the record access observations can include a respective record ID and read timestamp. The record access log can be scanned in reverse from the end time towards the beginning time. Further, access frequency estimate data for records corresponding to record IDs read from the record access log can be calculated. The access frequency estimate data can include respective upper bounds and respective lower bounds of access frequency estimates for each of the records. Moreover, the records can be classified based on the respective upper bounds and the respective lower bounds of the access frequency estimates, such that K records can be classified as being frequently accessed records.
Type: Application
Filed: June 30, 2012
Publication date: January 2, 2014
Applicant: MICROSOFT CORPORATION
Inventors: Justin Jon Levandoski, Per-Ake Larson
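A minimal sketch of the reverse scan with bounds, assuming exponential smoothing as the frequency estimator (weight `alpha * (1 - alpha) ** (T - t)` for an access in time slice t, with T the end time) and assuming each log entry occupies its own time slice. After scanning slices t..T, the weight still available from slices 1..t-1 is at most `(1 - alpha) ** (T - t + 1) - (1 - alpha) ** T`, which gives every record's upper bound. The scan stops as soon as the K-th best lower bound is at least every other record's upper bound. The estimator and the names here are assumptions for illustration.

```python
def hot_records(log, k, alpha=0.05):
    """log: record IDs in access order (slice t = 1-based position, T = len(log)).
    Returns (set of k frequently accessed record IDs, number of slices scanned)."""
    T = len(log)
    decay = 1.0 - alpha
    lower = {}  # record ID -> smoothed weight accumulated from the slices scanned so far
    for t in range(T, 0, -1):
        rid = log[t - 1]
        lower[rid] = lower.get(rid, 0.0) + alpha * decay ** (T - t)
        remaining = decay ** (T - t + 1) - decay ** T  # max weight left in slices 1..t-1
        ranked = sorted(lower, key=lower.get, reverse=True)
        if len(ranked) >= k:
            kth_lower = lower[ranked[k - 1]]
            # Upper bounds of all other records; unseen records can reach at most `remaining`.
            others_upper = max([lower[r] + remaining for r in ranked[k:]] + [remaining])
            if kth_lower >= others_upper:
                return set(ranked[:k]), T - t + 1
    # Fewer than k distinct records in the whole log: all of them are returned.
    return set(lower), T

log = ["a", "b", "a", "c", "a", "a", "b", "a"]
print(hot_records(log, k=2, alpha=0.5))
```

With `alpha=0.5` the two most recent slices already separate "a" and "b" from everything older, so the scan stops early instead of reading the whole log.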
-
Publication number: 20130339354
Abstract: A method and system for mining trends around trending terms. The method includes determining a plurality of articles, from one or more websites, in relation to a first entity for a time period. The first entity is a trending term. The method also includes generating comment clusters for the plurality of articles. Each comment cluster is generated for an associated article and includes a plurality of user comments. The method further includes extracting one or more entities from the plurality of user comments for each of the comment clusters, the one or more entities being related to the first entity. Further, the method includes enabling selection of a second entity, from the one or more entities, by the user. Moreover, the method includes rendering one or more user comments corresponding to the first entity and the second entity for the time period. The system includes an electronic device, a communication interface, a memory, and a processor.
Type: Application
Filed: June 14, 2012
Publication date: December 19, 2013
Applicant: YAHOO! INC.
Inventors: Vidit JAIN, Nikhil RASIWASIA
-
Publication number: 20130326346
Abstract: The embodiments provide a cloud brainstorming service implemented on at least one cloud server. The brainstorming service includes a message service component configured to receive a plurality of ideas, over a network, from one or more users of devices. The users represent members of a brainstorming session. The brainstorming service also includes a brainstorming logic component configured to process the plurality of ideas and store the plurality of processed ideas in an in-memory database system, and a clustering component configured to retrieve the plurality of processed ideas from the in-memory database system and arrange the plurality of processed ideas into one or more clusters, where each cluster is a group of similar ideas. The message service component is configured to provide the plurality of processed ideas that are arranged into the one or more clusters, over the network, to the one or more users for display.
Type: Application
Filed: August 17, 2012
Publication date: December 5, 2013
Applicant: SAP AG
Inventors: Zheren Zhu, Yongyuan Shen, Fu Zhao, Yingyu Chen, Bin Dong, Zheng Long Wei, Hui Wang
-
Publication number: 20130325862
Abstract: Systems and methods are provided for large-scale, incremental clustering. A plurality of processing nodes each include a processor and a non-transitory computer readable medium. The non-transitory computer readable medium stores a plurality of clusters of feature vectors and machine executable instructions for determining a plurality of values for a distance metric relating each of the plurality of clusters to an input feature vector and selecting a cluster having a best value for the distance metric. An arbitrator is configured to receive the selected cluster and best value for the distance metric from each of the plurality of processing nodes and to determine a winning cluster as either one of the selected clusters or a new cluster. A multiplexer is configured to receive the winning cluster and provide the winning cluster and a new input feature vector to each of the plurality of processing nodes.
Type: Application
Filed: June 4, 2012
Publication date: December 5, 2013
Inventor: MICHAEL D. BLACK
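A single-process sketch of the node/arbitrator loop, assuming Euclidean distance, a hypothetical `threshold` above which the arbitrator opens a new cluster, running-mean centroid updates, and round-robin placement of new clusters across nodes. The loop over input vectors plays the role of the multiplexer; the hardware and messaging details of the application are not modeled.

```python
import math

def incremental_cluster(vectors, n_nodes=2, threshold=1.0):
    """Assign each vector to a cluster; returns the cluster ID chosen for each input."""
    nodes = [{} for _ in range(n_nodes)]  # per node: cluster_id -> [centroid, count]
    assignments, next_id = [], 0
    for x in vectors:
        # Each node reports its locally best cluster and that cluster's distance to x.
        reports = []
        for n, clusters in enumerate(nodes):
            if clusters:
                cid = min(clusters, key=lambda c: math.dist(x, clusters[c][0]))
                reports.append((math.dist(x, clusters[cid][0]), n, cid))
        # Arbitrator: take the global best report, or open a new cluster if none is close.
        if reports and min(reports)[0] <= threshold:
            _, n, cid = min(reports)
            centroid, count = nodes[n][cid]
            nodes[n][cid] = [[(c * count + xi) / (count + 1) for c, xi in zip(centroid, x)],
                             count + 1]
        else:
            cid, next_id = next_id, next_id + 1
            nodes[cid % n_nodes][cid] = [list(x), 1]  # round-robin placement
        assignments.append(cid)
    return assignments

print(incremental_cluster([(0, 0), (0.5, 0), (10, 10), (10.4, 10), (0.2, 0.1)]))
```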
-
Publication number: 20130325861
Abstract: Embodiments of the invention relate to modeling activity areas associated with groups of data items. Tools are provided to profile activity area involvement from the perspective of both the data items and the associated participants. The data items are placed into clusters and one or more activity areas are derived from the formed clusters. Each activity area is defined from the perspective of a single user. Participants in an activity area are connected to a user, but not necessarily to each other. The combination of cluster formation and activity areas provides a multi-faceted organization of connections between data items and associated participants.
Type: Application
Filed: May 31, 2012
Publication date: December 5, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Hongxia Jin
-
Publication number: 20130325849
Abstract: Techniques for annotating an entity in a document corpus using cross-document signals. A method includes determining which documents in a document corpus mention an entity of interest, clustering the documents that mention the entity of interest according to a temporal signal, a structural signal and/or a content signal, thereby forming at least one cluster of documents, and annotating at least one document in the at least one cluster of documents by marking each occurrence of the entity in the at least one document.
Type: Application
Filed: August 16, 2012
Publication date: December 5, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Sushovan De, Amit K. Singh, Karthik Visweswariah
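A minimal sketch using only the temporal signal: documents that mention the entity are sorted by date and chained into one cluster while consecutive dates are within `window_days` of each other, then each occurrence is wrapped in a hypothetical `<entity>` marker. Structural and content signals, and the exact matching rules of the application, are not modeled.

```python
import re
from datetime import date

def annotate_entity(docs, entity, window_days=7):
    """docs: (doc_id, date, text) triples. Returns clusters of (doc_id, annotated_text)."""
    pattern = re.compile(rf"\b{re.escape(entity)}\b")
    mentioning = sorted((d for d in docs if pattern.search(d[2])), key=lambda d: d[1])
    clusters = []
    for doc in mentioning:
        # Temporal signal: chain onto the current cluster if close in time to its latest doc.
        if clusters and (doc[1] - clusters[-1][-1][1]).days <= window_days:
            clusters[-1].append(doc)
        else:
            clusters.append([doc])

    def mark(m):
        return f"<entity>{m.group()}</entity>"

    return [[(doc_id, pattern.sub(mark, text)) for doc_id, _, text in cluster]
            for cluster in clusters]

docs = [("d1", date(2012, 8, 1), "Jaguar unveiled a new model."),
        ("d2", date(2012, 8, 3), "Analysts praised Jaguar."),
        ("d3", date(2012, 9, 20), "Jaguar sales fell."),
        ("d4", date(2012, 8, 2), "Weather was mild.")]
for cluster in annotate_entity(docs, "Jaguar"):
    print(cluster)
```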
-
Publication number: 20130318088
Abstract: According to one embodiment of the present invention, classification of objects in a directory service may be managed. An object is identified in a directory service. Classification information associated with the object is received from a reference database. Using a processor, a rule that specifies a value that corresponds to the classification information is accessed. The accessed value is based on a power-of-two classification model. Using the processor, a class of service attribute is created using the value. The class of service attribute is associated with the object listed in the directory service using the processor.
Type: Application
Filed: May 22, 2012
Publication date: November 28, 2013
Applicant: Bank of America Corporation
Inventor: Michael Edward Futty
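One common reading of a power-of-two classification model is a bit-flag scheme: each classification maps to a distinct power of two, so several classifications combine into one integer with bitwise OR and can be tested with bitwise AND. The rule table, labels, and the `classOfService` key below are hypothetical; the application does not specify them.

```python
# Hypothetical rule table: classification label -> power-of-two value.
RULES = {"public": 1, "internal": 2, "confidential": 4, "restricted": 8}

def class_of_service(classifications, rules=RULES):
    """Combine the power-of-two values of all classifications into a single integer."""
    value = 0
    for c in classifications:
        value |= rules[c]
    return value

def has_class(cos_value, classification, rules=RULES):
    """Test whether a combined class of service value includes a classification."""
    return bool(cos_value & rules[classification])

def tag_object(entry, classifications):
    """Attach the class of service attribute to a directory entry (modeled as a dict)."""
    entry["classOfService"] = class_of_service(classifications)
    return entry

print(tag_object({"dn": "cn=payroll,ou=apps"}, ["internal", "restricted"]))
```

Because each value is a distinct bit, no combination of classifications collides with another.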
-
Publication number: 20130311473
Abstract: A method for dynamically clustering data items, the method comprising: receiving a plurality of data items originating from at least two sources, a plurality of distinct metadata details, and data indicative of associations between the data items and the metadata details, wherein each data item is associated with at least one metadata detail indicative of its owner, and wherein at least a first data item originating from a first source and a second data item originating from a second source are related data items associated with at least one shared metadata detail; grading probabilities of relationships between at least one of the data items and at least one of the metadata details; clustering the data items into one or more clusters, based on the calculated probabilities; and, optionally, sharing clusters and meta-clusters between users.
Type: Application
Filed: May 21, 2012
Publication date: November 21, 2013
Applicant: SPHEREUP LTD.
Inventors: Yevgeny Safovich, Ronen Abramov, Natan Chosnek
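A minimal sketch of clustering items from different sources through shared metadata details, assuming the relationship probabilities are already graded and supplied as `(item_id, metadata_detail, probability)` triples. Links at or above a hypothetical `threshold` are merged with a union-find structure, so items sharing a sufficiently probable detail end up in the same cluster.

```python
def cluster_items(links, threshold=0.5):
    """links: (item_id, metadata_detail, probability) triples.
    Returns clusters of item IDs as sorted lists."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    items = set()
    for item, detail, p in links:
        items.add(item)
        find(("item", item))  # register the item even if none of its links qualify
        if p >= threshold:
            union(("item", item), ("meta", detail))
    groups = {}
    for item in items:
        groups.setdefault(find(("item", item)), set()).add(item)
    return sorted(sorted(g) for g in groups.values())

links = [("mail:1", "alice@x.com", 0.9), ("phone:7", "alice@x.com", 0.8),
         ("mail:2", "bob@y.com", 0.95), ("phone:8", "alice@x.com", 0.2)]
print(cluster_items(links))
```

Here "mail:1" and "phone:7" come from different sources but share the owner detail "alice@x.com", while "phone:8" stays separate because its link falls below the threshold.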