Clustering Or Classification (epo) Patents (Class 707/E17.046)
  • Publication number: 20120254175
    Abstract: According to one aspect, provided is a horizontally scaled database architecture. Partitioning a database enables efficient distribution of data across a number of systems, reducing processing costs associated with multiple machines. According to some aspects, the partitioned database can be managed as a single source interface to handle client requests. Further, it is realized that by identifying and testing key properties, horizontal scaling architectures can be implemented and operated with minimal overhead. In one embodiment, databases can be partitioned in an order preserving manner such that the overhead associated with moving the data for a given partition can be minimized during management of the data and/or database. In one embodiment, split and migration operations prioritize zero cost partitions, thereby reducing the computational burden associated with managing a partitioned database.
    Type: Application
    Filed: April 1, 2011
    Publication date: October 4, 2012
    Inventors: Eliot Horowitz, Dwight Merriman
  • Publication number: 20120254182
    Abstract: A resource group attribute is assigned to a storage resource object representing at least one of the plurality of storage resources in a system configuration of the computing storage environment. The resource group attribute includes a selectable value indicating a resource group object to which the storage resource object is associated. A resource group label is provided in the resource group object and is a string having no wildcards. A user resource scope is assigned to a user ID and a value of the user resource scope provides a mechanism to match to the resource group label. The user ID is authorized to perform one of creating, deleting, modifying, controlling, and managing storage resources with an association to a resource group.
    Type: Application
    Filed: June 11, 2012
    Publication date: October 4, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Richard A. RIPBERGER
  • Publication number: 20120254174
    Abstract: According to one embodiment, a file system (FS) of a storage system is partitioned into a plurality of FS partitions, where each FS partition stores segments of data files. In response to a request for writing a file to the storage system, the file is stored in a first of the FS partitions that is selected based on a time attribute of the file, such that files having similar time attributes are stored in an identical FS partition.
    Type: Application
    Filed: March 31, 2011
    Publication date: October 4, 2012
    Applicant: EMC CORPORATION
    Inventors: Soumyadeb Mitra, Windsor W. Hsu
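The time-based placement described in the abstract of publication 20120254174 can be illustrated with a short sketch. It is not the patented method: the one-partition-per-calendar-month rule, the `fs-YYYY-MM` labels, and the names `partition_for` and `store` are all assumptions made for this example.

```python
from datetime import datetime

def partition_for(time_attr: datetime) -> str:
    """Map a file's time attribute to a partition label.

    Assumption: one partition per calendar month, so files with
    similar time attributes share a partition.
    """
    return f"fs-{time_attr.year:04d}-{time_attr.month:02d}"

def store(partitions: dict[str, list[str]], name: str, time_attr: datetime) -> str:
    """Record the file name in the partition chosen by its time attribute."""
    key = partition_for(time_attr)
    partitions.setdefault(key, []).append(name)
    return key

partitions: dict[str, list[str]] = {}
store(partitions, "a.log", datetime(2011, 3, 1))
store(partitions, "b.log", datetime(2011, 3, 31))
store(partitions, "c.log", datetime(2011, 4, 1))
```

Here the two March files land in the same partition and the April file in another; any other bucketing rule could be substituted for the monthly one.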
  • Publication number: 20120254184
    Abstract: Method of analyzing data from an online social network having a set of network users. The method includes obtaining topic-specific messages sent by the network users. The messages have social content provided by the network users, wherein the messages include a marker of interest in the social content. The marker of interest is associated with a topic of the social network. The method also includes identifying the network users that sent the messages having the marker of interest in the social content as interested users of the topic. The interested users are a subset of the set of network users. The method also includes determining a topic-specific influence (TSI) value of a designated user from the interested users in the subset. The TSI value of the designated user is based on a number of the interested users that are registered to receive the messages from the designated user.
    Type: Application
    Filed: April 3, 2012
    Publication date: October 4, 2012
    Applicant: NORTHWESTERN UNIVERSITY
    Inventors: Alok Choudhary, Ramanathan Narayanan
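The topic-specific influence (TSI) computation in the abstract of publication 20120254184 can be sketched roughly as below. The abstract says only that the value is "based on" a count, so returning the raw count is an assumption, as are the substring test for the marker, the data layout, and the names `interested_users` and `tsi`.

```python
def interested_users(messages: list[tuple[str, str]], marker: str) -> set[str]:
    """Users who sent at least one message containing the marker of interest."""
    return {sender for sender, text in messages if marker in text}

def tsi(designated: str,
        messages: list[tuple[str, str]],
        marker: str,
        followers: dict[str, set[str]]) -> int:
    """Count interested users registered to receive the designated user's messages.

    Assumption: `followers[u]` is the set of users subscribed to user `u`.
    """
    interested = interested_users(messages, marker)
    if designated not in interested:
        raise ValueError("designated user is not among the interested users")
    return len(followers.get(designated, set()) & interested)

messages = [
    ("ann", "#jazz tonight"),
    ("bob", "love #jazz"),
    ("cat", "lunch"),
    ("dan", "#jazz again"),
]
followers = {"ann": {"bob", "cat", "dan"}}
score = tsi("ann", messages, "#jazz", followers)
```

In this toy data, "cat" follows "ann" but never used the marker, so only "bob" and "dan" contribute to the score.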
  • Publication number: 20120254181
    Abstract: A method is disclosed for recognizing whether some electronic data is the digital representation of a piece of text and, if so, in which character encoding it has been encoded. A fingerprint is constructed from the data, wherein the fingerprint comprises, for each of a plurality of predetermined character encoding schemes, at least one confidence value, representing a confidence that the data was encoded using said character encoding scheme. The fingerprint also comprises a frequency value for each of a subset of byte values, each frequency value representing the frequency of occurrence of a respective byte value in the data. A statistical classification of the data is then performed based on the fingerprint.
    Type: Application
    Filed: March 30, 2012
    Publication date: October 4, 2012
    Applicant: CLEARSWIFT LIMITED
    Inventors: Kevin Schofield, Istvan Biro
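A fingerprint of the kind described in the abstract of publication 20120254181 might be assembled as in the sketch below. The choice of encodings, the tracked byte values, and the crude confidence measure (the share of decoded characters that are not the U+FFFD replacement character) are assumptions for illustration; the abstract does not specify them, and the statistical classification step is left out.

```python
from collections import Counter

# Hypothetical choices; the abstract names neither the encodings nor the bytes.
ENCODINGS = ("ascii", "utf-8", "utf-16-le")
TRACKED_BYTES = (0x00, 0x0A, 0x20, 0x65)  # NUL, newline, space, 'e'

def fingerprint(data: bytes) -> list[float]:
    """Return one confidence per encoding followed by one frequency per tracked byte."""
    features = []
    for enc in ENCODINGS:
        # Undecodable input becomes U+FFFD, so fewer replacements means higher confidence.
        decoded = data.decode(enc, errors="replace")
        bad = decoded.count("\ufffd")
        features.append(1.0 - bad / max(len(decoded), 1))
    counts = Counter(data)  # iterating bytes yields integer byte values
    total = max(len(data), 1)
    features.extend(counts[b] / total for b in TRACKED_BYTES)
    return features

text_fp = fingerprint(b"e e\n")
binary_fp = fingerprint(b"\xff\xfe")
```

The resulting vector could then be passed to any statistical classifier trained on labeled samples.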
  • Publication number: 20120254183
    Abstract: Systems and methods for clustering a group of data points based on a measure of similarity between each pair of data points in the group are provided. A pairwise similarity function can be estimated for each pair of data points in the group. A clustering algorithm can be executed to create clusters and associate data points with the clusters using the pairwise similarity function. The algorithm can be iterated multiple times until a stopping condition is reached in order to reduce variance in the output of the algorithm. The pairwise similarity function for each pair of data points can be updated between iterations of the algorithm and the results of each iteration can be aggregated. The data in each data point associated with a cluster can be consolidated into a consolidated data point.
    Type: Application
    Filed: November 10, 2009
    Publication date: October 4, 2012
    Applicant: GOOGLE INC.
    Inventors: Nir Ailon, Edo Liberty, Harishabd Khalsa
  • Publication number: 20120254132
    Abstract: A method and an apparatus for organizing information in an electronic address book. The method comprises collecting contact information for an electronic address book, comparing a name from any field in said contact information to a database comprising name information, identifying a first name or a surname from the contact information and relocating in the contact information the identified first name to a field assigned to first names or the surname to a field assigned to surnames as a response to a name identified in a wrong field.
    Type: Application
    Filed: March 27, 2012
    Publication date: October 4, 2012
    Inventors: Kimmo Kivirauma, Rami Lehtonen
  • Publication number: 20120244835
    Abstract: A computerized system, method and process allows telecommunications carriers to find, evaluate and select locations for equipment through direct access to end users, while providing citizens the opportunity to offer the use of their dwelling or other assets to carriers. The system and method further provides a computerized mechanism for (a) creating an inventory and marketplace for available properties for use in telecommunications networks, (b) providing quality and/or performance monitoring and control for wireless communication systems based on data in the clearinghouse, and (c) providing localized content over wireless networks using the clearinghouse.
    Type: Application
    Filed: June 6, 2012
    Publication date: September 27, 2012
    Inventor: Theodore S. Rappaport
  • Publication number: 20120246161
    Abstract: According to one embodiment, profile information of a new user and items to be selected are inputted. Each item has an attribute value of a plurality of attributes. Profile information and preference information of a plurality of users are acquired. The preference information represents whether each user has selected each item. The plurality of users is classified into a plurality of clusters by the profile information and the preference information of the plurality of users. A parameter of each attribute of each cluster is calculated by the preference information of each cluster. A similar cluster to classify the new user is estimated from the plurality of clusters by the profile information of the new user. A preference degree of each item is calculated by the parameter of each attribute of the similar cluster and the attribute value of each item. An item to be recommended is decided by the preference degree.
    Type: Application
    Filed: March 7, 2012
    Publication date: September 27, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Tomoko Murakami
  • Publication number: 20120246165
    Abstract: Disclosed is a system and method for presenting content in response to receiving a portion of a search query. A computing device receives, over a network from a user computer, a portion of a search query submitted by a user in a search query entry area. The computing device receives, from a search suggestion module, one or more search suggestions related to the portion of the query. The computing device transmits, to the user computer, the one or more search suggestions for display in a search suggestion region, the search suggestion region displayed differently than a search results area. The computing device transmits a search suggestion of the one or more search suggestions to a rich content module. The rich content module generates rich content related to the transmitted search suggestion. The computing device transmits, to the user computer, the rich content for display in the search suggestion region.
    Type: Application
    Filed: March 22, 2011
    Publication date: September 27, 2012
    Applicant: Yahoo! Inc.
    Inventors: Ethan Batraski, Vivian Lin Dufour, Aarti Parmar, Shenhong Zhu, Olivia Franklin
  • Publication number: 20120239659
    Abstract: A library classification scheme, based on customer preferences, is provided. The scheme is based on user-friendly categories, sub-categories and subjects, along with a unique customer-friendly code. The classification scheme is used in conjunction with a database system that incorporates a database of information comprising bibliographic information of items of literature, unique codes, and codes associated with a second classification scheme. In addition, there are a plurality of identifiers to assist customers in locating a given item of literature based on the library classification scheme.
    Type: Application
    Filed: November 19, 2010
    Publication date: September 20, 2012
    Applicant: MARKHAM PUBLIC LIBRARY
    Inventors: Moe Hosseini-Ara, Hilary Huffman, Suraj Sharma, Penny Barclay, Amy Dolmer, Amy Caughlin
  • Publication number: 20120237090
    Abstract: A photograph classification unit classifies the photographs for each scene in a music-feature information determination unit. A photograph feature acquisition unit identifies a feature of the photograph based on additional information of the photograph and a result of face recognition of the image. A tempo determination unit determines a tempo of music based on a time zone of image capturing, a range of the number of people captured in a photograph, the degree of smile, activities, etc. A melody determination unit determines information that confines a title, a feature value, a genre, etc., based on an event, a time zone, and a season at the time of image capturing, a city and a country in which the image capturing has occurred, etc. Based on the determined feature of the music, a music data output unit extracts and then presents matching music.
    Type: Application
    Filed: September 6, 2010
    Publication date: September 20, 2012
    Applicant: Sony Computer Entertainment Inc.
    Inventors: Shoichi Ikenoue, Takayuki Sakamoto
  • Publication number: 20120233168
    Abstract: A sound segment sorting unit (103) sorts the sound segments of a video. An image segment sorting unit (104) sorts the image segments of the video. A multiple sorting result generation unit (105) generates a plurality of sound segment sorting results and/or a plurality of image segment sorting results. A sorting result pair generation unit (106) generates a plurality of sorting result pairs of the sorting results as the candidates of the optimum segment sorting result of the video. A sorting result output unit (108) compares the sorting result comparative scores of the sorting result pairs calculated by a sorting result comparative score calculation unit (107) and thus outputs a sound segment sorting result and an image segment sorting result having good correspondence. This allows a plurality of sound segments and a plurality of image segments contained in the video to be accurately sorted, for each object, without adjusting parameters in advance.
    Type: Application
    Filed: November 5, 2010
    Publication date: September 13, 2012
    Applicant: NEC CORPORATION
    Inventors: Makoto Terao, Takafumi Koshinaka
  • Publication number: 20120233171
    Abstract: The disclosure relates to a system and method for managing data from a number of systems. The method comprises: defining a set of objects for the data; defining a set of classes for the data; maintaining a catalog for each instance of the data; in the catalog identifying each instance's source system and its level of harmonization with other data; applying a set of harmonization rules to identify from the data a group of related data and an owner of the group; identifying differences in instantiations within the group; and initiating update requests to affected systems having the identified differences.
    Type: Application
    Filed: October 3, 2011
    Publication date: September 13, 2012
    Inventor: Philippe RICHARD
  • Publication number: 20120233164
    Abstract: The present invention identifies collections of digital music and sound that effectively elicit particular emotional responses as a function of analytical features from the audio signal and information concerning the background and preferences of the subject. The invention can change emotional classifications along with variations in the audio signal over time. Interacting with a listener, the invention locates music with desired emotional characteristics from a central repository, assembles these into an effective and engaging “playlist” (sequence of songs), and plays the music files in the calculated order to the listener.
    Type: Application
    Filed: September 8, 2009
    Publication date: September 13, 2012
    Applicant: Sourcetone, LLC
    Inventors: Robert Rowe, Jeff Berger, Juan Bello
  • Publication number: 20120232788
    Abstract: A method of operation of a navigation system includes: extracting navigation-related web documents having a point of interest; generating formatting sequences of the navigation-related web documents; selecting a user-defined percentile representing reciprocal fraction of an expected number of clusters; calculating a threshold value for a first cluster with the threshold value to be equal to the user-defined percentile of a first normalized distribution of sample comparison values between the first cluster and formatting sequence samples from the formatting sequences, the first cluster is from the clusters; computing an associated comparison value between a first formatting sequence from the formatting sequences and the first cluster; grouping the first formatting sequence with the first cluster when the associated comparison value exceeds the threshold value for the first cluster; and generating a travel route for the point of interest related to the first cluster for displaying on a device.
    Type: Application
    Filed: March 9, 2011
    Publication date: September 13, 2012
    Applicant: TELENAV, INC.
    Inventor: Qian Diao
  • Publication number: 20120226695
    Abstract: A system for classifying documents in a collection of documents according to their intended readerships includes: a computer configured to select a document in the collection of documents; and a computer to determine a characteristic of the selected document, the characteristic being: misleading when the document includes one or more features that are determined to be for a purpose other than reading the document; commercial when the document includes features that are presented for a commercial purpose; or personal when the document includes features of a personal opinion. A computer classifies the selected document as misleading, commercial, or personal according to its determined characteristic; and a computer repeats the steps of selecting a document, determining a characteristic of the selected document, and classifying the selected document for additional documents in the collection. At least some documents are classified as misleading, some as commercial, and at least some as personal.
    Type: Application
    Filed: May 16, 2012
    Publication date: September 6, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ying Chen, Bin He, W. Scott Spangler
  • Publication number: 20120226692
    Abstract: A system and method for matching and assembling records is provided. One embodiment of the invention assembles records by applying a method for grouping records based on matching fields, assembling a new record as a composite of the matched records, and then repeating the grouping, matching and assembly steps in a cascade where the matching, grouping and assembly steps are modified as a function of the cascade step and the assembled records created in earlier steps. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules that allow a reader to quickly ascertain the subject matter of the disclosure contained herein. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
    Type: Application
    Filed: March 20, 2012
    Publication date: September 6, 2012
    Applicant: Parity Computing, Inc.
    Inventors: Zunaid H. Kazi, Christopher D. Rosin, Ramamohan Paturi, Holden P. Robbins, Mark W. S. Land
  • Publication number: 20120226706
    Abstract: A system and method for sorting music files based on moods are provided. The system includes a portable terminal for sorting stored music files based on moods, for changing mood information of a corresponding music file in response to a request to change mood of at least one of the music files, and for transmitting the mood change information to a mood analyzing server, and a mood analyzing server for receiving the mood change information, for updating mood judgment information that is standard information for determining mood of a music file based on the received mood change information, and for transmitting the updated mood judgment information to the portable terminal.
    Type: Application
    Filed: February 28, 2012
    Publication date: September 6, 2012
    Applicant: SAMSUNG ELECTRONICS CO. LTD.
    Inventors: In Yong CHOI, Joon Ho WON, Chul Min CHOI, Nam Il LEE, Sang Hoon OH
  • Publication number: 20120226691
    Abstract: A data interpretation and separation system for identifying data elements within a data set that have common features, and separating those data elements from other data elements not sharing such common features. Commonalities relative to methods and/or rates of change within a data set may be used to determine which elements share common features. Determining the commonalities may be performed autonomously by referencing data elements within the data set, and need not be matched against algorithmic or predetermined definitions. Interpreted and separated data may be used to reconstruct an output that includes only separated data. Such reconstruction may be non-destructive. Interpreted and separated data may also be used to retroactively build on existing element sets associated with a particular source.
    Type: Application
    Filed: March 3, 2012
    Publication date: September 6, 2012
    Inventor: Tyson LaVar Edwards
  • Publication number: 20120221566
    Abstract: Content items and other entities may be ranked or organized according to a relevance to a user. Relevance may take into consideration recency, proximity, popularity, air time (e.g., of television shows) and the like. In one example, the popularity and age of a movie may be used to determine a relevance ranking. Popularity (i.e., entity rank) may be determined based on a variety of factors. In the movie example, popularity may be based on gross earnings, awards, nominations, votes and the like. According to one or more embodiments, entities may initially be categorized into relevance groupings based on popularity and/or other factors. Once categorized, the entities may be sorted within each grouping and later combined into a single ranked list.
    Type: Application
    Filed: May 4, 2012
    Publication date: August 30, 2012
    Applicant: COMCAST INTERACTIVE MEDIA, LLC
    Inventors: Ken Iwasa, Seth Michael Murray, Goldee Udani
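The group-then-sort ranking outlined in the abstract of publication 20120221566 can be sketched as follows. The popularity thresholds, the use of age as the within-group sort key, and the name `rank` are assumptions for this example; the abstract leaves those details open.

```python
def rank(items: list[tuple[str, float, int]],
         tiers: tuple[float, ...] = (0.8, 0.5)) -> list[str]:
    """Rank (title, popularity, age_years) tuples.

    Assumptions: popularity is a score in [0, 1]; `tiers` holds descending
    thresholds that define the relevance groupings; newer items come first
    within a grouping.
    """
    groups: list[list[tuple[str, float, int]]] = [[] for _ in range(len(tiers) + 1)]
    for item in items:
        popularity = item[1]
        # First threshold the item meets decides its group; the last group catches the rest.
        idx = next((i for i, t in enumerate(tiers) if popularity >= t), len(tiers))
        groups[idx].append(item)
    ranked: list[tuple[str, float, int]] = []
    for group in groups:
        # Sort within each grouping, then append to the single combined list.
        ranked.extend(sorted(group, key=lambda it: it[2]))
    return [title for title, _, _ in ranked]

movies = [("Old Hit", 0.9, 30), ("New Hit", 0.85, 1), ("Mid", 0.6, 5), ("Obscure", 0.1, 2)]
ordering = rank(movies)
```

Because grouping happens before sorting, a recent but obscure title never outranks an older, highly popular one.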
  • Publication number: 20120221574
    Abstract: A pivot is determined from enrolled data by a pivot determination unit, raw data is acquired, features are extracted from the raw data, a score is calculated as one of a distance and a degree of similarity between the features, an index vector is generated by using the score for the pivot, a ? score is calculated as one of a distance and a degree of similarity between the index vectors, a parameter of each non-pivot including a regression coefficient is trained by using training data, order to select the non-pivots is, by using the ? score between search data and the non-pivot as well as the regression coefficient, determined in descending order of posterior probability through logistic regression, and a search result is outputted based on the score between the search data and the enrolled data.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 30, 2012
    Applicant: HITACHI, LTD.
    Inventors: Takao Murakami, Kenta Takahashi
  • Publication number: 20120221498
    Abstract: Disclosed are methods for making disparate entertainment media content (e.g., television or movies) from multiple sources available through a single interface of a user device. Content of varying data formats from multiple data sources are aggregated. Classifications of the media data are created which can include assigning content into clusters. The data are normalized, and attributes of the data are curated. Features also are provided to automatically synchronize, obtain, and update media content on the media sources and on client devices. Various ways of handling data aggregation and normalization issues associated with compiling media data also are described.
    Type: Application
    Filed: February 16, 2012
    Publication date: August 30, 2012
    Applicant: SETJAM, INC.
    Inventors: Marcin Kaszynski, Ryszard Szopa, Grzegorz Kapkowski, Remigiusz Dymecki, Maciej Pasternacki, Eran Dror
  • Publication number: 20120221573
    Abstract: Methods and systems for improved unsupervised learning are described. The unsupervised learning can consist of biclustering a data set, e.g., by biclustering subsets of the entire data set. In an example, the biclustering does not include feeding known and proven results into the biclustering methodology or system. A hierarchical approach can be used that feeds proven clusters back into the biclustering methodology or system as the input. Data that does not cluster may be discarded. Thus, a very large unknown data set can be acted on to learn about the data. The system is also amenable to parallelization.
    Type: Application
    Filed: January 30, 2012
    Publication date: August 30, 2012
    Applicant: The Curators of the University of Missouri
    Inventors: Donald Coolidge Wunsch, II, Rui Xu, Sejun Kim
  • Publication number: 20120215784
    Abstract: A method of computerized content analysis that gives “approximately unbiased and statistically consistent estimates” of a distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing a distribution of a small set of individually-classified elements in a plurality of categories and then using the information determined from the analysis to extrapolate a distribution in a larger population set. This extrapolation is performed without constraining the distribution of the unlabeled elements to be equal to the distribution of labeled elements, nor constraining a content distribution of content of elements in the labeled set (e.g., a distribution of words used by elements in the labeled set) to be equal to a content distribution of elements in the unlabeled set.
    Type: Application
    Filed: April 12, 2012
    Publication date: August 23, 2012
    Inventors: Gary King, Daniel Hopkins, Ying Lu
  • Publication number: 20120216100
    Abstract: A method of operating a service registry and repository based on a triplestore comprises: receiving a request to aggregate a service document; shredding elements of the service document to create logical objects within the triplestore; for each logical object, searching for all policy attachment logical objects that have a relationship with the logical object; for each located policy attachment, retrieving details of the policy and building a list of policies and associated logical objects in the repository; and returning an indication of the list of policies and associated logical objects. The list of logical objects and associated policies is used to compile a service document containing details of policies that have relationships with the selected service document logical objects. Objects that have associated policies are rendered with a hypertext policy icon next to the object and selection of the hypertext policy icon opens a new window with the policy details.
    Type: Application
    Filed: May 1, 2012
    Publication date: August 23, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Evan G. Jardine-Skinner, James R. Orchard, Philip D. Rowley, Samuel J. Smith
  • Publication number: 20120215777
    Abstract: Systems and techniques for determining significance between entities are disclosed. The systems and techniques identify a first entity having an association with a second entity, apply a plurality of association criteria to the association, weight each of the criteria based on defined weight values, and compute a significance score for the first entity with respect to the second entity based on a sum of a plurality of weighted criteria values. The systems and techniques utilize information from disparate sources to create a uniquely powerful signal. The systems and techniques can be used to identify the significance of relationships (e.g., associations) among various entities including, but not limited to, organizations, people, products, industries, geographies, commodities, financial indicators, economic indicators, events, topics, subject codes, unique identifiers, social tags, industry terms, general term/s, metadata elements, classification codes, and combinations thereof.
    Type: Application
    Filed: May 13, 2011
    Publication date: August 23, 2012
    Inventors: Hassan H. Malik, Mans Olof-Ors
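The weighted-sum scoring named in the abstract of publication 20120215777 reduces to a few lines. The criterion names and weight values below are invented for illustration; the abstract says only that each criterion is weighted by a defined value and that the weighted values are summed.

```python
def significance(criteria_values: dict[str, float],
                 weights: dict[str, float]) -> float:
    """Sum of weight * value over the association criteria.

    Assumption: every criterion in `criteria_values` has a defined weight.
    """
    return sum(weights[name] * value for name, value in criteria_values.items())

# Hypothetical criteria for the association between a first and a second entity.
values = {"co_mentions": 0.5, "shared_industry": 1.0}
weights = {"co_mentions": 2.0, "shared_industry": 0.5}
score = significance(values, weights)
```

Changing a weight shifts how strongly that criterion influences the final score without touching the criterion values themselves.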
  • Publication number: 20120215772
    Abstract: Provided is a method for grouping identity records to generate candidate lists to use in an entity and relationship resolution process. A plurality of identity records provide attributes of entities. The received identity records are grouped into a group of identity records. A composite query on values for selected attributes of the identity records in the group is generated and applied to an entity database to obtain composite results of entity records in the entity database matching the attribute values of the composite query. For the identity records in the group, an individual query on attributes of one of the identity records is performed against the composite results of the entity records to determine a candidate list of entity records from the entity database for the identity record.
    Type: Application
    Filed: April 19, 2012
    Publication date: August 23, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bhavani K. ESHWAR, Rajeshwar KALAKUNTLA, Vaishnavi NORI, Nithinkrishna P. SHENOY
  • Publication number: 20120209849
    Abstract: A web-scale data processing system and method are provided herein. More particularly, a web-scale data processing system and method for crawling, storing, processing, encoding, and/or serving web-scale data are disclosed.
    Type: Application
    Filed: April 24, 2012
    Publication date: August 16, 2012
    Applicant: SEOmoz, Inc.
    Inventors: Benjamin Cappel HENDRICKSON, Nicholas Stefan GERNER
  • Publication number: 20120209851
    Abstract: An apparatus and a method manage a received mobile transaction coupon in a mobile terminal. The apparatus includes a communication unit, an information analyzer, a schedule manager, an output unit, and a controller. The communication unit receives a mobile transaction coupon. The information analyzer obtains the received mobile transaction coupon information. The schedule manager registers the obtained mobile transaction coupon information in an alarm program. The output unit outputs the registered mobile transaction coupon information on a relevant date via the alarm program. The controller controls to register the mobile transaction coupon information in the alarm program, and controls to store the received mobile transaction coupon in a storage area corresponding to a reception type or a folder for a widget function.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 16, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Byung-Kwon Kong, Soon-Mi Cho
  • Publication number: 20120203785
    Abstract: A computer system is used for tracking data, the computer system including a data collection system for identifying events of categories such as reception events, storage events, and association events. The system further includes an item module containing index data for an item. The system further includes an end user facility module for recording the reception events that are to be assigned to the item and the storage events that are to be assigned to the item. The system further includes a provider module for assigning the association events to a provider. The system further includes an end user module for recording the association events that are to be assigned to the end user and to the item. The system further includes a reporting system for generating reports. The item has a unique index value and the end user has a non-unique index value.
    Type: Application
    Filed: October 15, 2010
    Publication date: August 9, 2012
    Applicant: NANOMEDAPPS LLC
    Inventor: Mariam Awada
  • Publication number: 20120203741
    Abstract: Provided are techniques for selecting a first group of indexes to form a current generation of indexes, selecting indexes from the first group biased to indexes with higher fitness values from the current generation of indexes, forming sub-groups of indexes using the selected indexes, determining fitness values of each of the sub-groups based on the fitness value of each of the indexes, selecting a subset of the sub-groups; and placing the indexes in the selected sub-groups into a new generation of indexes.
    Type: Application
    Filed: April 17, 2012
    Publication date: August 9, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gaurav Mehrotra, Abhinay R. Nagpal, Sandeep R. Patil, Rulesh F. Rebello
  • Publication number: 20120203784
    Abstract: A plurality of catalogs are maintained, wherein each catalog of the plurality of catalogs includes data sets and attributes of the data sets. An indication that a new data set is to be defined is received. A selected catalog is determined from the plurality of catalogs, wherein the selected catalog is suitable for including the new data set and attributes of the new data set. An entry that indicates a data set name corresponding to the new data set and an index to the selected catalog is inserted in a group table.
    Type: Application
    Filed: April 17, 2012
    Publication date: August 9, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Douglas Lee Lehr, Franklin Emmert McCune, David Charles Reed, Max Douglas Smith
  • Publication number: 20120203782
    Abstract: Method, system, and programs for heterogeneous data management. Information from multiple data sources is first obtained. Data/metadata from each of the data sources is modeled based on the source and/or granularity information of the data/metadata to generate data/metadata models. The data/metadata from multiple data sources are integrated, by applying one or more processes to the data/metadata from different data sources based on the data/metadata models, to generate integrated data/metadata. A provenance representation for the integrated data/metadata is created tracing sources, granularities, and/or processes applied and archived for enabling a query associated with the integrated data/metadata.
    Type: Application
    Filed: February 7, 2011
    Publication date: August 9, 2012
    Applicant: YAHOO! INC.
    Inventors: Chris Olston, Anish Das Sarma
  • Publication number: 20120197890
    Abstract: A computer-implemented system and method for extracting Human Generated Lists from an electronic database is described. The system searches for objects of the same class within a context window to identify Human Generated Lists and stores them to an archive. The archive may be used to generate a relationship network. The system generates variable length data vectors to represent the relationships between the objects within each Human Generated List. This relationship network can then be queried to discover relationships between the objects in the Human Generated Lists and to provide related objects as recommendations.
    Type: Application
    Filed: December 21, 2011
    Publication date: August 2, 2012
    Applicant: Intertrust Technologies Corp.
    Inventors: Kasian Franks, Mike Muldoon, Raf Podowski
  • Publication number: 20120197891
    Abstract: Genre discovery engines are presented. A genre discovery engine can compare clusters of products falling within known genres to other clusters. Known genres can be defined in terms of correlated product properties. When a new cluster is identified falling outside the boundaries of known genres, the discovery engine can recommend that the new cluster might be a new genre.
    Type: Application
    Filed: January 24, 2012
    Publication date: August 2, 2012
    Applicant: ELECTRONIC ENTERTAINMENT DESIGN AND RESEARCH
    Inventors: Gregory T. Short, Geoffrey C. Zatkin, Theodore Spence
  • Publication number: 20120197897
    Abstract: A method for defining a collection of digital media content for playback using a digital media player where (a) the collection is defined using specific criteria; and (b) the collection is not static but can alter or grow even after being made available to the digital media player; and (c) the said digital media files form a subset of a catalogue of digital media files available for the digital media player to play.
    Type: Application
    Filed: May 11, 2010
    Publication date: August 2, 2012
    Applicant: Omnifone Ltd.
    Inventors: Mark Knight, Philip Sant, Christopher Evans, Matthew White, Roy Stead
  • Publication number: 20120197849
    Abstract: A method, system and computer program product for retrieving information from a relational database using user defined facets in a faceted query may include receiving a faceted query and receiving at least one user defined facet group query. The method may also include filtering out facets in the faceted query that relate to metadata in the relational database. The method may additionally include associating each remaining facet in the faceted query with a corresponding user defined facet group query of the at least one user defined facet group query to provide a set of user defined facet groups. An SQL query may be generated for the faceted query using the set of user defined facet groups. Information from the relational database may be retrieved responsive to the SQL query.
    Type: Application
    Filed: January 31, 2011
    Publication date: August 2, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Deepak M. Srinivasa, Adarsh Ramamurthy, Samanvitha Kumar
  • Publication number: 20120197894
    Abstract: Disclosed is an apparatus and method for processing documents to extract expressions and descriptions. The apparatus for processing documents includes a document collection unit, which collects documents from websites and divides each of the collected documents into a script portion and a description portion to thus generate a script document and a description document, and an expression extraction unit, which extracts expression description sentences on the basis of the description document, and extracts expressions described by the expression description sentences from the script document. According to the invention, study material, including a pair that comprises an expression to be studied and a description thereof, can be automatically constructed.
    Type: Application
    Filed: October 11, 2010
    Publication date: August 2, 2012
    Applicant: POSTECH ACADEMY - INDUSTRY FOUNDATION
    Inventors: Hyung Jong Noh, Jong Hoon Lee, Sung Jin Lee, Gary Geunbae Lee
  • Publication number: 20120191713
    Abstract: A process for evaluating cross-domain clusterability upon a target domain and a source domain. The cross-domain clusterability is calculated as a linear combination of a target clusterability and a source-target pair matchability, by use of a trade-off parameter that determines relative contribution of the target clusterability and the source-target pair matchability. The target clusterability quantifies how clusterable the target domain is. The source-target pair matchability is calculated as an average of a target-side matchability and a source-side matchability, which quantify how well target centroids of the target domain are aligned with the source centroids and how well source centroids of the source domain are aligned with the target centroids, respectively.
    Type: Application
    Filed: April 2, 2012
    Publication date: July 26, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: JEFFREY M. ACHTERMANN, INDRAJIT BHATTACHARYA, KEVIN W. ENGLISH, JR., SHANTANU R. GODBOLE, SACHINDRA JOSHI, ASHWIN SRINIVASAN, ASHISH VERMA
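The scoring described in the abstract of 20120191713 can be illustrated with a minimal Python sketch. The abstract says only that the score is a linear combination governed by a trade-off parameter; the convex form `alpha * a + (1 - alpha) * b`, the parameter name `alpha`, and both function names are assumptions made for illustration and are not taken from the application.

```python
def pair_matchability(target_side: float, source_side: float) -> float:
    """Average of the target-side and source-side matchability, as in the abstract."""
    return (target_side + source_side) / 2.0


def cross_domain_clusterability(
    target_clusterability: float,
    target_side: float,
    source_side: float,
    alpha: float = 0.5,
) -> float:
    """Linear combination of target clusterability and pair matchability.

    ASSUMPTION: the trade-off parameter `alpha` lies in [0, 1] and weights the
    two terms as alpha and (1 - alpha). The abstract does not give this form.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    matchability = pair_matchability(target_side, source_side)
    return alpha * target_clusterability + (1.0 - alpha) * matchability
```

Under this assumed form, `alpha = 1` scores the target domain on its own clusterability alone, and `alpha = 0` scores it only on how well the source and target centroids align.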
  • Publication number: 20120191678
    Abstract: In an embodiment, a method comprises dividing collected data into data clusters based on proximity of the data and adjusting the clusters based on density of data in individual clusters. Based on first data points in a first cluster, a first average point in the first cluster is determined. Based on second data points in a second cluster, a second average point in the second cluster is determined. Aggregate data, comprising the first average point and the second average point, are stored in storage. Upon receiving a request to provide data for a particular coordinate, a reconstructed data point is determined by interpolating between the first average point and the second average point at the particular coordinate. Accordingly, aggregated data may be stored and when a request specifies data that was not actually stored, a reconstructed data point with an approximated data value may be provided as a substitute.
    Type: Application
    Filed: January 21, 2011
    Publication date: July 26, 2012
    Inventors: Ying Liu, Shahrokh Sadjadi
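The aggregate-then-interpolate idea in the abstract of 20120191678 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed method: it assumes one-dimensional coordinates, a simple gap threshold (`max_gap`, a hypothetical parameter) as the proximity rule, and linear interpolation. The density-based cluster adjustment mentioned in the abstract is omitted, and all function names are invented for the example.

```python
from bisect import bisect_left


def cluster_by_proximity(points, max_gap):
    """Group (x, y) points into clusters, starting a new cluster whenever the
    gap in x to the previous point exceeds max_gap (an assumed proximity rule)."""
    clusters = []
    for point in sorted(points):
        if clusters and point[0] - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append(point)
        else:
            clusters.append([point])
    return clusters


def average_point(cluster):
    """Return the mean (x, y) of a non-empty cluster."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)


def reconstruct(averages, x):
    """Approximate the value at x by linear interpolation between the two
    stored average points that bracket x; clamp outside the stored range.
    Assumes `averages` is non-empty and sorted by x."""
    xs = [a[0] for a in averages]
    i = bisect_left(xs, x)
    if i == 0:
        return averages[0][1]
    if i == len(averages):
        return averages[-1][1]
    (x0, y0), (x1, y1) = averages[i - 1], averages[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Only the per-cluster averages are stored, not the raw points.
raw = [(0, 10), (1, 12), (2, 14), (10, 30), (11, 32), (12, 34)]
stored = [average_point(c) for c in cluster_by_proximity(raw, max_gap=2)]
```

With this sample data, `stored` holds two average points, and `reconstruct(stored, 6)` returns a value interpolated between them even though no raw point at coordinate 6 was ever collected.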
  • Publication number: 20120185446
    Abstract: In one example embodiment, a method is illustrated as including retrieving item data from a plurality of listings, the item data filtered from noise data, constructing at least one base cluster having at least one document with common item data stored in a suffix ordering, compacting the at least one base cluster to create a compacted cluster representation having a reduced duplicate suffix ordering amongst the clusters, and merging the compacted cluster representation to generate a merged cluster, the merging based upon a first overlap value applied to the at least one document with common item data.
    Type: Application
    Filed: February 3, 2012
    Publication date: July 19, 2012
    Inventors: Neelakantan Sundaresan, Kavita Ganesan, Roopnath Grandhi
  • Publication number: 20120185478
    Abstract: A method, apparatus and article of manufacture for extracting and normalizing organization names from text. The method uses regular expressions, certain rules and dictionaries to identify potential organization names in text, then uses word similarity metrics, clustering, and other considerations to group normalized organization names.
    Type: Application
    Filed: January 17, 2011
    Publication date: July 19, 2012
    Inventors: Philip S. Topham, Siddhartha Jonnalagadda
  • Publication number: 20120185466
    Abstract: According to one embodiment, a relevancy presentation apparatus includes a storage, an extraction unit, a first expansion unit, a second expansion unit, a determination unit and a generation unit. The storage stores topic networks. The extraction unit extracts subject keywords. The first expansion unit acquires first relevant words from the topic networks. The second expansion unit searches ontologies for the subject keywords. The determination unit extracts common relevant words, and determines whether frequencies of appearances of relevant words are stationary. The generation unit generates search queries based on whether the frequencies of appearances are stationary, and generates search results.
    Type: Application
    Filed: January 25, 2012
    Publication date: July 19, 2012
    Inventors: Tomohiro YAMASAKI, Masaru SUZUKI
  • Publication number: 20120185479
    Abstract: A system is configured to organize content. The system may constitute a decision tool that provides a user with a decision template that enables the user to create a decision record that organizes aspects of a decision that the user considered, the user's reasoning with respect to these aspects in arriving at an ultimate outcome, and/or other information related to the decision. The decision template may include one or more fields into which the user may enter content manually, or the user may search for, and import content related to the decision into the template from, one or more content sources that include relevant content.
    Type: Application
    Filed: January 18, 2011
    Publication date: July 19, 2012
    Applicant: DecisionStreet, Inc.
    Inventor: Clinton Douglas KORVER
  • Publication number: 20120179684
    Abstract: A computer program product for an indexer-agnostic index building system includes a computer readable storage medium to store a computer readable program, wherein the computer readable program, when executed on a computer, causes the computer to perform operations for creating a semantically aggregated index. The operations include: extracting documents from a data source, wherein each document includes a data object; distributing the documents to a plurality of processing nodes within the system; for each node: indexing the data objects for each document into fields using semantic rules; and grouping indexed data objects for related fields by: classifying the documents into logical groups based on the semantic rules; and creating a searchable index shard for related logical groups.
    Type: Application
    Filed: January 12, 2011
    Publication date: July 12, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alfredo Alba, Chad E. DeLuca, Vuk Ercegovac, Thomas D. Griffin, Jun Rao, Asim V. Singh, Kevin B. Wang
  • Publication number: 20120173533
    Abstract: Embodiments are directed towards identifying auto-folder tags for messages by using a combinational optimization approach of bi-clustering folder names and features of messages based on relationship strengths. The combinational optimization approach of bi-clustering, generally, groups a plurality of folder names and a plurality of features into one or more metafolders to optimize a cost. The cost is based on an aggregate of cut relationship strengths, where a cut results when a relationship folder name and feature are grouped in separate metafolders. Furthermore, the plurality of folder names and the plurality of features are obtained by monitoring actions of a plurality of users, where the folder names are user generated folder names and features are from a plurality of messages. The metafolders may be used to tag new user messages with an auto-folder tag.
    Type: Application
    Filed: January 4, 2011
    Publication date: July 5, 2012
    Applicant: Yahoo! Inc.
    Inventors: Vishwanath Tumkur Ramarao, Andrei Broder, Idan Szpektor, Edo Liberty, Yehuda Koren, Mark E. Risher, Yoelle Maarek Smadja
  • Publication number: 20120173527
    Abstract: A mode-seeking clustering mechanism identifies clusters within a data set based on the location of individual data points according to modes in a kernel density estimate. For large-scale applications the clustering mechanism may utilize rough hierarchical kernel and data partitions in a computationally efficient manner. A variational approach to the clustering mechanism may take into account variational probabilities, which are restricted in certain ways according to hierarchical kernel and data partition trees, and the mechanism may store certain statistics within these trees in order to compute the variational probabilities in a computationally efficient way. The clustering mechanism may use a two-step variational expectation and maximization algorithm and generalizations thereof, where the maximization step may be performed in different ways in order to accommodate different mode-seeking algorithms, such as the mean shift, medoid shift, and quick shift algorithms.
    Type: Application
    Filed: December 31, 2010
    Publication date: July 5, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Bo THIESSON, Jingu KIM
  • Publication number: 20120173532
    Abstract: According to one embodiment, a determination tree generating apparatus includes a determination unit, a condition generating unit, a determining unit, and a point branch generating unit. The determination unit provisionally and sequentially determines all component categories to be classification component categories for a first point of a determination tree. The point branch generating unit generates a first point assigned to a classification component category, and generates component names to be assigned to one or more branches leading from an assigned first point to one or more child points.
    Type: Application
    Filed: March 15, 2012
    Publication date: July 5, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Shigeta KUNINOBU
  • Publication number: 20120173528
    Abstract: One aspect of the invention provides a system for facilitating a job search comprising a database, a display for displaying a user interface with past, present and future sections and one or more processors. The system stores job search data in the database, schedules one or more activities associated with the job search data and displays the job search data or one or more activities in one of the past, present and future sections. Also provided, in another aspect, is a reporting module operably connected to the database and configured to provide statistical data derived from the job search data and the one or more activities. A further aspect of the subject system is directed towards providing such data to third parties to facilitate coaching a job seeker and monitoring job search related activities.
    Type: Application
    Filed: December 28, 2011
    Publication date: July 5, 2012
    Inventor: Jonathan KREINDLER