Machine Learning Patents (Class 706/12)
-
Patent number: 8010341
Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge into such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore ready to be summarized, identified, classified, clustered, and the like.
Type: Grant
Filed: September 13, 2007
Date of Patent: August 30, 2011
Assignee: Microsoft Corporation
Inventors: Kannan Achan, Moises Goldszmidt, Lev Ratinov
-
Patent number: 8010466
Abstract: The invention provides a method, apparatus and system for classifying and clustering electronic data streams such as email, images and sound files for identification, sorting and efficient storage. The method further utilizes learning machines in combination with hashing schemes to cluster and classify documents. In one embodiment, hash apparatuses and methods taxonomize clusters. In yet another embodiment, clusters of documents use a geometric hash to contain the documents in a data corpus without the overhead of search and storage.
Type: Grant
Filed: June 10, 2009
Date of Patent: August 30, 2011
Assignee: TW Vericept Corporation
Inventor: Seth Patinkin
-
Patent number: 8010663
Abstract: A computationally implemented method includes, but is not limited to: acquiring subjective user state data including data indicating incidence of at least a first subjective user state associated with a first user and data indicating incidence of at least a second subjective user state associated with a second user; acquiring objective occurrence data including data indicating incidence of at least a first objective occurrence and data indicating incidence of at least a second objective occurrence; and correlating the subjective user state data with the objective occurrence data. In addition to the foregoing, other method aspects are described in the claims, drawings, and text forming a part of the present disclosure.
Type: Grant
Filed: March 25, 2009
Date of Patent: August 30, 2011
Assignee: The Invention Science Fund I, LLC
Inventors: Shawn P. Firminger, Jason Garms, Edward K. Y. Jung, Chris D. Karkanias, Eric C. Leuthardt, Royce A. Levien, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, Jr., Clarence T. Tegreene, Kristin M. Tolle, Lowell L. Wood, Jr.
-
Patent number: 8010662
Abstract: A computationally implemented method includes, but is not limited to: acquiring objective occurrence data including data indicating occurrence of at least one objective occurrence; soliciting, in response to the acquisition of the objective occurrence data, subjective user state data including data indicating occurrence of at least one subjective user state associated with a user; acquiring the subjective user state data and correlating the subjective user state data with the objective occurrence data. In addition to the foregoing, other method aspects are described in the claims, drawings, and text forming a part of the present disclosure.
Type: Grant
Filed: February 25, 2009
Date of Patent: August 30, 2011
Assignee: The Invention Science Fund I, LLC
Inventors: Shawn P. Firminger, Jason Garms, Edward K. Y. Jung, Chris D. Karkanias, Eric C. Leuthardt, Royce A. Levien, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, Jr., Clarence T. Tegreene, Kristin M. Tolle, Lowell L. Wood, Jr.
-
Patent number: 8010357
Abstract: Combined active and semi-supervised learning to reduce the amount of manual labeling needed when training a spoken language understanding model classifier. The classifier may be trained with human-labeled utterance data. Some of a group of unselected utterance data may be selected for manual labeling via active learning. The classifier may then be changed, via semi-supervised learning, based on the selected portion of the unselected utterance data.
Type: Grant
Filed: January 12, 2005
Date of Patent: August 30, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Robert Elias Schapire, Gokhan Tur
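A minimal sketch of how an active/semi-supervised split of unlabeled utterances might look, assuming the classifier already yields a confidence score in [0, 1] per utterance. The function name, the budget `k`, and the `threshold` are hypothetical illustrations of a common pattern, not the procedure claimed in the patent.

```python
from typing import Dict, List, Tuple


def split_unlabeled(
    confidences: Dict[str, float], k: int, threshold: float
) -> Tuple[List[str], List[str]]:
    """Split unlabeled utterances into (manual, automatic) groups.

    Active learning: the k least-confident utterances are routed to human
    labelers. Semi-supervised learning: of the remaining utterances, those
    whose confidence is at least `threshold` keep the classifier's own label
    and can be added to the training set.
    """
    ranked = sorted(confidences, key=lambda u: confidences[u])  # least confident first
    manual = ranked[:k]
    auto = [u for u in ranked[k:] if confidences[u] >= threshold]
    return manual, auto


# Hypothetical classifier confidences for five unlabeled utterances.
scores = {"u1": 0.95, "u2": 0.40, "u3": 0.55, "u4": 0.99, "u5": 0.70}
manual, auto = split_unlabeled(scores, k=2, threshold=0.9)
print(manual)  # ['u2', 'u3']
print(auto)    # ['u1', 'u4']
```

After relabeling, both groups would be merged into the training data and the classifier retrained; that loop is omitted here.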
-
Patent number: 8010674
Abstract: Some embodiments of the present invention provide a system that facilitates access to a website from an application. During operation, the system obtains community data associated with interactions between a set of users and the website and examines the community data to identify an interactivity request made by the website to users of the website. Next, the system obtains user-specific data from a new user of the application, which includes a response to the interactivity request from the new user. Finally, the system uses the user-specific data to automate access to the website for the new user.
Type: Grant
Filed: March 31, 2008
Date of Patent: August 30, 2011
Assignee: Intuit Inc.
Inventor: Spencer W. Fong
-
Patent number: 8010664
Abstract: A computationally implemented method includes, but is not limited to: acquiring events data including data indicating incidence of a first one or more reported events and data indicating incidence of a second one or more reported events, at least one of the first one or more reported events and the second one or more reported events being associated with a user; determining an events pattern based selectively on the incidences of the first one or more reported events and the second one or more reported events; and developing a hypothesis associated with the user based, at least in part, on the determined events pattern. In addition to the foregoing, other method aspects are described in the claims, drawings, and text forming a part of the present disclosure.
Type: Grant
Filed: May 28, 2009
Date of Patent: August 30, 2011
Assignee: The Invention Science Fund I, LLC
Inventors: Shawn P. Firminger, Jason Garms, Edward K. Y. Jung, Chris D. Karkanias, Eric C. Leuthardt, Royce A. Levien, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, Jr., Clarence T. Tegreene, Kristin M. Tolle, Lowell L. Wood, Jr.
-
Publication number: 20110208735
Abstract: Described is a technology by which a term frequency function for web click data is machine learned from training data and from raw click features extracted from a query log or the like. Also described is combining the term frequency function with other functions/click features to learn a relevance function for use in ranking document relevance to a query.
Type: Application
Filed: February 23, 2010
Publication date: August 25, 2011
Applicant: Microsoft Corporation
Inventors: Jianfeng Gao, Krysta M. Svore
-
Publication number: 20110208425
Abstract: Techniques are described for determining a correlation between identified locations to recommend a location that may be of interest to an individual user. The process constructs a location model to identify locations. To construct the model, the process uses global positioning system (GPS) logs of geospatial locations collected over time, identifies trajectories representing trips of the individual user, and extracts stay points from the trajectories. Each stay point represents a geographical region where the individual user stayed over a time threshold within a distance threshold. A location history is formulated for the individual user based on a sequence of the extracted stay points to identify locations. The process determines a correlation between identified locations. The process integrates travel experiences of individual users who have visited the locations in a weighted manner and identifies a common travel sequence which the individual users followed between the locations.
Type: Application
Filed: February 23, 2010
Publication date: August 25, 2011
Applicant: Microsoft Corporation
Inventors: Yu Zheng, Lizhu Zhang, Xing Xie
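The stay-point step (a region where the user stayed over a time threshold within a distance threshold) can be sketched as follows. This is a minimal illustration, assuming each GPS fix is a `(lat, lon, seconds)` tuple and that a stay is a run of consecutive fixes all within the distance threshold of the run's first fix; the function names and threshold values are hypothetical, not taken from the application.

```python
import math
from typing import List, Tuple

# A GPS fix: (latitude in degrees, longitude in degrees, timestamp in seconds).
Fix = Tuple[float, float, float]
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance in metres between two fixes."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def stay_points(track: List[Fix], dist_m: float, time_s: float):
    """Return (mean_lat, mean_lon, arrival, departure) for each stay point.

    A stay point is a run of consecutive fixes that all lie within `dist_m`
    of the run's first fix and that spans at least `time_s` seconds.
    """
    result = []
    i, n = 0, len(track)
    while i < n:
        j = i + 1
        # Extend the run while the next fix stays inside the distance threshold.
        while j < n and haversine_m(track[i], track[j]) <= dist_m:
            j += 1
        run = track[i:j]
        if run[-1][2] - run[0][2] >= time_s:
            lat = sum(p[0] for p in run) / len(run)
            lon = sum(p[1] for p in run) / len(run)
            result.append((lat, lon, run[0][2], run[-1][2]))
            i = j  # continue scanning after the stay
        else:
            i += 1
    return result


track = [
    (39.9000, 116.3000, 0), (39.9001, 116.3001, 300),
    (39.9000, 116.3001, 600), (39.9001, 116.3000, 900),
    (39.9500, 116.3500, 1200), (39.9501, 116.3500, 1260),
]
# One stay point: arrival at t=0, departure at t=900.
print(stay_points(track, dist_m=200, time_s=600))
```

The sequence of stay points returned per user is what a location history would then be built from.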
-
Publication number: 20110208680
Abstract: The method includes obtaining system model data representing a set of failures in a system including a plurality of components, a set of symptoms and relationships between at least some of the failures and symptoms. The system model data is used to create a Bayesian Network. Failure cases data is also obtained, where each failure case describes the presence/absence of at least one of the symptoms and the presence/absence of at least one of the failures. A learning operation on the Bayesian Network using the failure cases data is then performed and the contribution made by at least some of the failure cases to updating the parameters of the Bayesian Network during the learning operation is assessed. Information representing the assessed contribution of the at least some failure cases is displayed.
Type: Application
Filed: September 30, 2009
Publication date: August 25, 2011
Applicant: BAE SYSTEMS plc
Inventors: Richard Lee Bovey, Erdem Turker Senalp
-
Publication number: 20110208679
Abstract: A computer readable, non-transitory medium has stored therein a trouble pattern creating program. The program causes a computer to execute: (a) extracting, from a plurality of log messages that are output from an information system having a plurality of configuration items and that are output in a predetermined period of time, configuration items that output the log messages; (b) calculating a degree of relationship between the configuration items extracted in the (a) extracting; (c) executing learning of the rate of the number of occurrences of troubles in the information system in the number of times the log messages are output, the learning being executed a number of times corresponding to the degree of relationship calculated in the (b) calculating; and (d) creating, in accordance with a result of the learning in the (c) executing, a trouble pattern message that is output when a trouble occurs.
Type: Application
Filed: February 17, 2011
Publication date: August 25, 2011
Applicant: Fujitsu Limited
Inventors: Yukihiro WATANABE, Masazumi Matsubara, Atsuji Sekiguchi, Yuji Wada, Yasuhide Matsumoto
-
Publication number: 20110208677
Abstract: A system and method for analyzing Intrusion Detection System (IDS) alert data associated with a computer network is described. The method includes applying first association rules to obtained IDS alert data associated with a computer network and processing the obtained IDS alert data with the first association rules. Analyst feedback data associated with the processed obtained IDS alert data is received, and a training data set from the analyst feedback data is received. New association rules are determined based upon the training data set, and the new association rules are outputted to a display of a computing device. Outputting the new association rules may include outputting patterns within the IDS alert data of false positive alerts. The new association rules may be applied back to the obtained IDS alert data.
Type: Application
Filed: May 4, 2011
Publication date: August 25, 2011
Applicant: BANK OF AMERICA LEGAL DEPARTMENT
Inventors: Mian Zhou, Sean Kenric Catlett
-
Publication number: 20110208678
Abstract: An electronic system includes an accelerometer. A method for excessive mechanical shock feature extraction for overstress event registration and cumulative tracking includes obtaining a sample from the accelerometer. Feature extraction is performed on the sample using empirical mode decomposition (EMD) to produce a plurality of modes. A pattern classifier is utilized for processing the plurality of modes to determine if the sample classifies as a shock event. If the sample classifies as a shock event, a shock event counter is incremented. If the shock event counter reaches a specified count, an indication to a user is generated.
Type: Application
Filed: February 19, 2010
Publication date: August 25, 2011
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventors: Anton A. Bougaev, Aleksey M. Urmanov, David K. McElfresh, Kenny C. Gross
-
Patent number: 8005768
Abstract: An apparatus and method to check whether a user likes a multimedia file based on the user's emotional reaction index of the multimedia file and repeatedly reproducing the multimedia file if the user likes the multimedia file. The multimedia file reproducing apparatus can include an emotional reaction index calculation unit to calculate an emotional reaction index based on a physical reaction signal of a user; a like/dislike checking unit to check whether the user likes or dislikes a corresponding audio file based on the calculated emotional reaction index; a list generation unit to generate a list of audio files that the user likes based on an average of emotional reaction indices for each audio file and the user's preference for each audio file; and a reproduction management unit to control the reproduction of the corresponding audio file based on whether the user likes or dislikes the corresponding audio file and to reproduce the audio files in the generated list.
Type: Grant
Filed: November 27, 2007
Date of Patent: August 23, 2011
Assignee: SAMSUNG Electronics Co., Ltd.
Inventors: Gyung-Hye Yang, Seung-Nyung Chung
-
Patent number: 8005293
Abstract: A training method for a support vector machine, including executing an iterative process on a training set of data to determine parameters defining the machine, the iterative process being executed on the basis of a differentiable form of a primal optimization problem for the parameters, the problem being defined on the basis of the parameters and the data set.
Type: Grant
Filed: April 11, 2001
Date of Patent: August 23, 2011
Assignee: Telestra New Wave Pty Ltd
Inventors: Adam Kowalczyk, Trevor Bruce Anderson
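To illustrate what "a differentiable form of a primal optimization problem" can mean, the sketch below runs plain batch gradient descent on the primal L2-SVM objective (squared hinge loss), which is differentiable everywhere. The squared hinge loss, the learning rate, and the epoch count are assumptions chosen for illustration; the abstract does not say which differentiable form the patent uses.

```python
from typing import List, Sequence, Tuple


def train_l2_svm(
    X: Sequence[Sequence[float]], y: Sequence[int],
    C: float = 1.0, lr: float = 0.01, epochs: int = 500,
) -> Tuple[List[float], float]:
    """Batch gradient descent on the primal L2-SVM objective

        0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))^2,

    which, unlike the plain hinge loss, is differentiable everywhere.
    Labels must be +1 / -1.
    """
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the regulariser 0.5 * ||w||^2
        for xi, yi in zip(X, y):
            slack = 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if slack > 0:  # only margin violators contribute to the loss gradient
                for j in range(d):
                    grad_w[j] -= 2.0 * C * slack * yi * xi[j]
                grad_b -= 2.0 * C * slack * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_l2_svm(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A quasi-Newton or Newton solver would usually converge faster on this smooth objective; fixed-step gradient descent is used only to keep the sketch short.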
-
Patent number: 8005948
Abstract: A computationally implemented method includes, but is not limited to: acquiring subjective user state data including at least a first subjective user state and a second subjective user state; acquiring objective context data including at least a first context data indicative of a first objective occurrence associated with a user and a second context data indicative of a second objective occurrence associated with the user; and correlating the subjective user state data with the objective context data. In addition to the foregoing, other method aspects are described in the claims, drawings, and text forming a part of the present disclosure.
Type: Grant
Filed: November 26, 2008
Date of Patent: August 23, 2011
Assignee: The Invention Science Fund I, LLC
Inventors: Shawn P. Firminger, Jason Garms, Edward K. Y. Jung, Chris D. Karkanias, Eric C. Leuthardt, Royce A. Levien, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, Jr., Clarence T. Tegreene, Kristin M. Tolle, Lowell L. Wood, Jr.
-
Patent number: 8005771
Abstract: A method and framework are described for detecting changes in a multivariate data stream. A training set is formed by sampling time windows in a data stream containing data reflecting normal conditions. A histogram is created to summarize each window of data, and data within the histograms are clustered to form test distribution representatives to minimize the bulk of training data. Test data is then summarized using histograms representing time windows of data and data within the test histograms are clustered. The test histograms are compared to the training histograms using nearest neighbor techniques on the clustered data. Distances from the test histograms to the test distribution representatives are compared to a threshold to identify anomalies.
Type: Grant
Filed: September 24, 2008
Date of Patent: August 23, 2011
Assignee: Siemens Corporation
Inventors: Terrence Chen, Chao Yuan, Abdul Saboor Sheikh, Claus Neubauer
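The window-histogram / nearest-neighbour / threshold pipeline can be sketched for a single-variable stream as below. This is a simplified illustration under stated assumptions: fixed bins, L1 distance between normalised histograms, and every training-window histogram kept as its own representative instead of the clustering step the abstract describes. All names and parameter values are hypothetical.

```python
from typing import List, Sequence


def window_histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalised histogram of one window; out-of-range values go to the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def windows(stream: Sequence[float], size: int) -> List[Sequence[float]]:
    """Non-overlapping windows of `size` samples (a trailing partial window is dropped)."""
    return [stream[i:i + size] for i in range(0, len(stream) - size + 1, size)]


def l1(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def flag_anomalies(train, test, size, lo, hi, bins, threshold) -> List[bool]:
    # Every training-window histogram is kept as its own representative here;
    # clustering them into fewer representatives would shrink this list.
    reps = [window_histogram(w, lo, hi, bins) for w in windows(train, size)]
    flags = []
    for w in windows(test, size):
        h = window_histogram(w, lo, hi, bins)
        nearest = min(l1(h, r) for r in reps)  # nearest-neighbour distance
        flags.append(nearest > threshold)
    return flags


normal = [0.1, 0.2, 0.3, 0.4] * 4
test = [0.1, 0.2, 0.3, 0.4, 0.9, 0.95, 0.9, 0.8]
print(flag_anomalies(normal, test, size=4, lo=0.0, hi=1.0, bins=4, threshold=0.5))
# [False, True]
```

For a multivariate stream, one option is to concatenate per-variable histograms before computing distances.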
-
Patent number: 8005769
Abstract: A method of generating association rules from a data stream, which is an unbounded data set composed of transactions, includes: when itemsets in the generated transactions and the counts of the itemsets are managed using a prefix tree, and each node of the prefix tree has information on the count of a specific itemset corresponding to the node and a specific item, updating the information of a node corresponding to the itemset or adding a new node on the basis of the itemset included in the generated transaction and the count of the itemset; comparing the support of the itemset corresponding to each of the nodes of the prefix tree with a minimum support to select frequent itemsets; and visiting all or some of the nodes corresponding to the selected frequent itemsets and generating the association rule on the basis of the information of each of the visited nodes.
Type: Grant
Filed: February 19, 2008
Date of Patent: August 23, 2011
Assignee: Lee, Won Suk
Inventor: Won Suk Lee
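A minimal sketch of the prefix-tree bookkeeping described above, assuming the simplest possible update rule: every non-empty itemset of each arriving transaction is inserted as a sorted path and its node count incremented. This exhaustive enumeration is an assumption for clarity only; it is exponential in transaction size and does not reproduce the patent's incremental node-addition and pruning strategy.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


class Node:
    def __init__(self) -> None:
        self.count = 0                          # transactions containing this node's itemset
        self.children: Dict[str, "Node"] = {}


class ItemsetTree:
    """Prefix tree in which each root-to-node path is a sorted itemset."""

    def __init__(self) -> None:
        self.root = Node()
        self.n = 0

    def add_transaction(self, items: Iterable[str]) -> None:
        self.n += 1
        items = sorted(set(items))
        # Update (or create) the node of every non-empty itemset in the transaction.
        for r in range(1, len(items) + 1):
            for subset in combinations(items, r):
                node = self.root
                for item in subset:
                    node = node.children.setdefault(item, Node())
                node.count += 1

    def support(self, itemset: Iterable[str]) -> float:
        node = self.root
        for item in sorted(itemset):
            node = node.children.get(item)
            if node is None:
                return 0.0
        return node.count / self.n

    def frequent_itemsets(self, min_support: float) -> Dict[Tuple[str, ...], float]:
        found: Dict[Tuple[str, ...], float] = {}

        def visit(node: Node, path: Tuple[str, ...]) -> None:
            for item, child in node.children.items():
                s = child.count / self.n
                if s >= min_support:  # a path's support never exceeds its prefix's
                    found[path + (item,)] = s
                    visit(child, path + (item,))

        visit(self.root, ())
        return found


def association_rules(tree: ItemsetTree, min_support: float, min_conf: float):
    rules: List[Tuple[Tuple[str, ...], Tuple[str, ...], float]] = []
    for itemset, s in tree.frequent_itemsets(min_support).items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                conf = s / tree.support(lhs)
                if conf >= min_conf:
                    rhs = tuple(i for i in itemset if i not in lhs)
                    rules.append((lhs, rhs, conf))
    return rules


tree = ItemsetTree()
for t in [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]:
    tree.add_transaction(t)
for lhs, rhs, conf in association_rules(tree, min_support=0.5, min_conf=0.6):
    print(lhs, "->", rhs, round(conf, 2))
```

Because the tree is updated one transaction at a time, rules can be regenerated at any point in the stream without rescanning earlier transactions.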
-
Patent number: 8005767
Abstract: The present invention enables identification of events such as targets. From training target event data, a very large number of clusters are formed for each class based on Euclidean distance using a repetitive k-means clustering process. Features from each cluster are identified by extracting their dominant eigenvectors. Once all of the dominant eigenvectors have been identified, they define the relevant space of the cluster. New target event data is compared to each cluster by projecting it onto the relevant and noise spaces. The more the data lies within the relevant space and the less it lies within the noise space, the more similar the data is to a cluster. The new target event data is then classified based on the training target event data.
Type: Grant
Filed: June 1, 2007
Date of Patent: August 23, 2011
Assignee: The United States of America as represented by the Secretary of the Navy
Inventor: Vincent A. Cassella
-
Patent number: 8005770
Abstract: A method for generating a Bayesian network in a parallel manner is based on an initial model having a plurality of nodes. Each node corresponds to a variable of a data set and has a local distribution associated therewith. The method includes assigning a plurality of subsets of the nodes to a respective plurality of constructors. The plurality of constructors is operated in a parallel manner to identify edges to add between nodes in the initial model. The identified edges are added to the initial model to generate the Bayesian network. The edges indicate dependency between nodes connected by the edges.
Type: Grant
Filed: June 9, 2008
Date of Patent: August 23, 2011
Assignee: Microsoft Corporation
Inventors: Chi Cao Minh, Max Chickering, John Feo, Jaime Hwacinski, Anitha Panapakkam, Khaled Sedky
-
Publication number: 20110202513
Abstract: The present invention is directed towards a method and system for processing a real time increase in search requests for a common event. The method and system includes detecting an activity spike in user search request activity based on monitoring of user search requests over a defined period of time and determining source locations associated with the activity spike based on user search result activities. The method and system further includes associating the source locations with the user search request and thereupon applying a machine-learning model to determine a plurality of common features operative to cause the activity spike, including determining associations between the source locations and the activity spike.
Type: Application
Filed: February 16, 2010
Publication date: August 18, 2011
Applicant: YAHOO! INC.
Inventor: Vik Singh
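The abstract does not say how an activity spike is detected. One simple possibility, shown here purely as an assumption, is a z-score test of the current request rate against a recent baseline; the function name, the per-minute counts, and the threshold of 3 standard deviations are hypothetical.

```python
import statistics
from typing import Sequence


def is_spike(history: Sequence[float], current: float, z_threshold: float = 3.0) -> bool:
    """Flag `current` (e.g. queries per minute for one term) as a spike when it
    lies at least `z_threshold` standard deviations above the historical mean."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return current > mean  # flat baseline: any increase counts
    return (current - mean) / stdev >= z_threshold


baseline = [100, 110, 90, 105, 95]  # hypothetical per-minute query counts
print(is_spike(baseline, 200))  # True
print(is_spike(baseline, 108))  # False
```

The flagged intervals would then be the input to the location-association and feature-learning steps the abstract describes.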
-
Publication number: 20110202484
Abstract: Access is obtained to a parallel corpus including a problem corpus and a solution corpus. A first plurality of topics are mined from the problem corpus and a second plurality of topics are mined from the solution corpus. A transition probability from the first plurality of topics to the second plurality of topics is determined, to identify a most appropriate one of the topics from the solution corpus for a given one of the topics from the problem corpus.
Type: Application
Filed: February 18, 2010
Publication date: August 18, 2011
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nikolaos Anerousis, Abhijit Bose, Jimeng Sun, Duo Zhang
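A minimal sketch of estimating topic-to-topic transition probabilities from a parallel corpus, assuming each problem/solution document pair has already been reduced to two topic mixtures (for example by separate LDA runs). Accumulating the outer product of the two mixtures and normalising each row is one plausible estimator, chosen here as an assumption; the abstract does not specify the estimator.

```python
from typing import List, Sequence, Tuple


def topic_transitions(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
) -> List[List[float]]:
    """Estimate P(solution topic j | problem topic i) from aligned pairs.

    Each pair holds the topic mixture of a problem document and of its
    paired solution document; co-occurrence mass is accumulated as the
    outer product of the two mixtures and each row is then normalised.
    """
    k_p, k_s = len(pairs[0][0]), len(pairs[0][1])
    mass = [[0.0] * k_s for _ in range(k_p)]
    for theta_p, theta_s in pairs:
        for i in range(k_p):
            for j in range(k_s):
                mass[i][j] += theta_p[i] * theta_s[j]
    trans = []
    for row in mass:
        total = sum(row)
        trans.append([m / total if total > 0 else 0.0 for m in row])
    return trans


def best_solution_topic(trans: List[List[float]], problem_topic: int) -> int:
    """Most probable solution topic for a given problem topic."""
    row = trans[problem_topic]
    return max(range(len(row)), key=row.__getitem__)


pairs = [([1.0, 0.0], [0.8, 0.2]), ([0.0, 1.0], [0.1, 0.9]), ([0.5, 0.5], [0.5, 0.5])]
T = topic_transitions(pairs)
print(T)                          # each row sums to 1
print(best_solution_topic(T, 0))  # 0
print(best_solution_topic(T, 1))  # 1
```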
-
Publication number: 20110202487
Abstract: A statistical model learning device is provided to efficiently select data effective in improving the quality of statistical models. A data classification means 601 refers to structural information 611 generally possessed by the data that is the learning object, and extracts a plurality of subsets 613 from the training data 612. A statistical model learning means 602 utilizes the plurality of subsets 613 to create statistical models 614 respectively. A data recognition means 603 utilizes the respective statistical models 614 to recognize other data 615 different from the training data 612 and acquires each recognition result 616. An information amount calculation means 604 calculates information amounts of the other data 615 from a degree of discrepancy among the recognition results of the statistical models. A data selection means 605 selects the data with a large information amount and adds it to the training data 612.
Type: Application
Filed: July 22, 2009
Publication date: August 18, 2011
Applicant: NEC CORPORATION
Inventor: Takafumi Koshinaka
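The "degree of discrepancy" among models trained on different subsets can be illustrated with vote entropy, a standard query-by-committee disagreement measure. Using vote entropy as the information amount is an assumption made for this sketch; the abstract does not give the formula, and the names below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def vote_entropy(labels: Sequence[str]) -> float:
    """Entropy of the committee's votes; 0 when all models agree."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def select_informative(predictions: Dict[str, List[str]], budget: int) -> List[str]:
    """Pick the `budget` samples on which the per-subset models disagree most."""
    return sorted(predictions, key=lambda s: vote_entropy(predictions[s]), reverse=True)[:budget]


# Labels assigned to three unlabeled samples by three models, each trained on one subset.
preds = {"a": ["x", "x", "x"], "b": ["x", "y", "z"], "c": ["x", "x", "y"]}
print(select_informative(preds, budget=2))  # ['b', 'c']
```

The selected samples would then be added to the training data and the per-subset models retrained.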
-
Publication number: 20110202485
Abstract: Various exemplary embodiments relate to a method and related network node and machine-readable storage medium including one or more of the following: receiving, at the PCRN, the application request message; determining at least one requested service flow from the application request message; for each requested service flow of the at least one requested service flow, generating a new PCC rule based on the application request message; and providing each new PCC rule to a Policy and Charging Enforcement Node (PCEN). Various exemplary embodiments further include an application request message including at least one media component and at least one media subcomponent, and the step of, for each media subcomponent, determining a requested service flow from the media subcomponent.
Type: Application
Filed: February 18, 2010
Publication date: August 18, 2011
Applicant: Alcatel-Lucent Canada Inc.
Inventors: Kevin Scott Cutler, Fernando Cuervo, Mike Vihtari, Ajay Kirit Pandya
-
Publication number: 20110202486
Abstract: Described herein is a framework for predicting development of a cardiovascular condition of interest in a patient. The framework involves determining, based on prior domain knowledge relating to the cardiovascular condition of interest, a risk score as a function of patient data. The patient data may include both genetic data and non-genetic data. In one implementation, the risk score is used to categorize the patient into at least one of multiple risk categories, the multiple risk categories being associated with different strategies to prevent the onset of the cardiovascular condition. The results generated by the framework may be presented to a physician to facilitate interpretation, risk assessment and/or clinical decision support.
Type: Application
Filed: March 14, 2011
Publication date: August 18, 2011
Inventors: Glenn Fung, Faisal Farooq, Bharat R. Rao, Stephan B. Felix, Till Ittermann, Heyo K. Kroemer, Rainer Rettig, Henry Volzke
-
Publication number: 20110202488
Abstract: In a machine condition monitoring technique, related sensors are grouped together in clusters to improve the performance of state estimation models. To form the clusters, the entire set of sensors is first analyzed using a Gaussian process regression (GPR) to make a prediction of each sensor from the others in the set. A dependency analysis of the GPR then uses thresholds to determine which sensors are related. Related sensors are then placed together in clusters. State estimation models utilizing the clusters of sensors may then be trained.
Type: Application
Filed: September 25, 2009
Publication date: August 18, 2011
Applicant: Siemens Corporation
Inventor: Chao Yuan
-
Publication number: 20110202876
Abstract: An apparatus and method are disclosed for providing feedback and guidance to touch screen device users to improve text entry user experience and performance by generating input history data including character probabilities, word probabilities, and touch models. According to one embodiment, a method comprises receiving first input data, automatically learning user tendencies based on the first input data to generate input history data, receiving second input data, and generating auto-corrections or suggestion candidates for one or more words of the second input data based on the input history data. The user can then select one of the suggestion candidates to replace a selected word with the selected suggestion candidate.
Type: Application
Filed: March 22, 2010
Publication date: August 18, 2011
Applicant: Microsoft Corporation
Inventors: Eric Norman Badger, Drew Elliot Linerud, Itai Almog, Timothy S. Paek, Parthasarathy Sundararajan, Dmytro Rudchenko, Asela J Gunawardana
-
Patent number: 8001060
Abstract: A method and system for classifying small collections of hi-value entities with missing data. The invention includes: collecting measurement variables for a set of entity cases for which classifications are known; calibrating standard weights for each measurement variable based on historical data; computing compensating weights for each entity case that has missing data; computing case scores for each of one or more dimensions as a sum-product of compensating weights and variables associated with each dimension; executing an iterative process that finds a specific combination of compensation weights that best classify the entity cases in terms of distinct scores; and applying a resulting model, which is determined by the specific combination of compensation weights, to classify other entity cases for which the classifications are unknown.
Type: Grant
Filed: May 9, 2007
Date of Patent: August 16, 2011
Assignee: International Business Machines Corporation
Inventor: John A. Ricketts
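The case-score step (a sum-product of compensating weights and variables) can be sketched as follows. The compensation rule used here, rescaling the standard weights of the observed variables so they sum to the full weight total, is an assumption for illustration only; the abstract does not define how compensating weights are computed, and the iterative search over weight combinations is not shown.

```python
from typing import Dict, Optional


def case_score(values: Dict[str, Optional[float]], weights: Dict[str, float]) -> Optional[float]:
    """Sum-product score for one dimension, compensating for missing variables.

    The standard weights of the variables that are present are rescaled so
    that they add up to the same total as the full weight set.
    """
    present = {v: w for v, w in weights.items() if values.get(v) is not None}
    if not present:
        return None  # nothing observed for this dimension
    scale = sum(weights.values()) / sum(present.values())
    return sum(w * scale * values[v] for v, w in present.items())


weights = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical standard weights
print(case_score({"a": 4, "b": 2, "c": 6}, weights))     # complete case
print(case_score({"a": 4, "b": None, "c": 6}, weights))  # "b" is missing
```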
-
Patent number: 8001063
Abstract: In one embodiment, the present invention is a method for reward-based learning of improved systems management policies. One embodiment of the inventive method involves obtaining a decision-making entity and a reward mechanism. The decision-making entity manages a plurality of application environments supported by a data processing system, where each application environment operates on data input to the data processing system. The reward mechanism generates numerical measures of value responsive to actions performed in states of the application environments. The decision-making entity and the reward mechanism are applied to the application environments, and results achieved through this application are processed in accordance with reward-based learning to derive a policy. The reward mechanism and the policy are then applied to the application environments, and the results of this application are processed in accordance with reward-based learning to derive a new policy.
Type: Grant
Filed: June 30, 2008
Date of Patent: August 16, 2011
Assignee: International Business Machines Corporation
Inventors: Gerald James Tesauro, Rajarshi Das, Nicholas K. Jong, Jeffrey O. Kephart
-
Patent number: 8001061
Abstract: A data processing apparatus includes first and second unsupervised learning process units and a supervised learning process unit. The first unsupervised learning process unit classifies data of a first data group according to unsupervised learning, to perform dimension reduction for the first data group and to obtain first classified data. The second unsupervised learning process unit classifies data of a second data group according to the unsupervised learning, to perform dimension reduction for the second data group and to obtain a second classified data group. The supervised learning process unit performs supervised learning using, as a teacher, the first classified data group obtained by the first unsupervised learning process unit and the second classified data group obtained by the second unsupervised learning process unit to determine a mapping relation between the first classified data group and the second classified data group.
Type: Grant
Filed: June 26, 2007
Date of Patent: August 16, 2011
Assignee: Fuji Xerox Co., Ltd.
Inventors: Shinichiro Serizawa, Tomoyuki Ito
-
Patent number: 8001074
Abstract: Systems and methods for extracting or analyzing time-series behavior are described. Some embodiments of computer-implemented methods include generating fuzzy rules from time series data. Certain embodiments also include resolving conflicts between fuzzy rules according to how the data is clustered. Some embodiments further include extracting a model of the time-series behavior via defuzzification and making that model accessible. Advantageously, to resolve conflicts between fuzzy rules, some embodiments define Gaussian functions for each conflicting data point, sum the Gaussian functions according to how the conflicting data points are clustered, and resolve the conflict based on the results of summing the Gaussian functions. Some embodiments use both crisp and non-trivially fuzzy regions and/or both crisp and non-trivially fuzzy membership functions.
Type: Grant
Filed: January 31, 2008
Date of Patent: August 16, 2011
Assignee: Quest Software, Inc.
Inventor: Wai Yip To
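One possible reading of the Gaussian conflict-resolution step is sketched below: a Gaussian is placed on each conflicting data point's output value, the Gaussians belonging to each cluster are summed at that cluster's centre, and the cluster with the largest sum supplies the rule's consequent. The scoring rule, cluster labels, and `sigma` value are assumptions for illustration, not the method as claimed.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def gaussian(x: float, mu: float, sigma: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def resolve_conflict(points: Sequence[Tuple[float, str]], sigma: float) -> Tuple[str, Dict[str, float]]:
    """Choose a consequent for rules that share an antecedent but disagree.

    `points` holds (output value, cluster id) for each conflicting data point.
    One Gaussian is placed on each point; the Gaussians of each cluster are
    summed at that cluster's centre, and the cluster with the largest sum wins.
    """
    clusters: Dict[str, List[float]] = defaultdict(list)
    for y, cid in points:
        clusters[cid].append(y)
    scores = {}
    for cid, ys in clusters.items():
        centre = sum(ys) / len(ys)
        scores[cid] = sum(gaussian(centre, y, sigma) for y in ys)
    return max(scores, key=scores.get), scores


conflicting = [(1.0, "low"), (1.1, "low"), (0.9, "low"), (5.0, "high"), (8.0, "high")]
winner, scores = resolve_conflict(conflicting, sigma=0.5)
print(winner)  # low
```

Under this scoring, a large, tight cluster outweighs a small or widely spread one, which is one way of letting the data's clustering decide the conflict.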
-
Patent number: 8001062
Abstract: Disclosed herein is a method, a system and a computer program product for generating a statistical classification model used by a computer system to determine a class associated with an unlabeled time series event. Initially, a set of labeled time series events is received. A set of time series features is identified for a selected set of the labeled time series events. A plurality of scale space decompositions is generated based on the set of time series features. A plurality of multi-scale features is generated based on the plurality of scale space decompositions. A first subset of the plurality of multi-scale features is identified; these features correspond at least in part to a subset of space or time points within a time series event that contain feature data that distinguish the time series event as belonging to a class of time series events that corresponds to the class label.
Type: Grant
Filed: December 7, 2007
Date of Patent: August 16, 2011
Assignee: Google Inc.
Inventors: Ullas Gargi, Jay Yagnik
-
Publication number: 20110196739
Abstract: The present invention provides a method and system for ranking and selecting advertisements based on relevancy, click feedback and click over expected click (COEC) data. Advertisements may be described as contextual, page-embedded advertisements appearing on publisher websites. The method and system includes storing page-advertisement relevancy features in a vector space model and historical impression and click features in a click feedback model and analyzing data in the vector space model and click feedback model. The method and system further includes storing empirical click-through data in a serving log and analyzing data therein. The method and system then generates a regression model based on the analyzed data, which is stored in a regression storage module. The method and system receives requests for advertisement content from client devices, selects a plurality of candidate advertisements based on the generated regression model and provides a plurality of advertisements to a client device.
Type: Application
Filed: February 5, 2010
Publication date: August 11, 2011
Inventors: Ruofei Zhang, Wei Li, Jianchang Mao
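A commonly used definition of COEC divides an ad's actual clicks by the clicks it would be expected to receive given where it was shown, using a position-level average click-through rate as the expectation. The sketch below follows that general definition under stated assumptions (hypothetical names and position priors); it is not taken from the application, which may compute COEC differently.

```python
from typing import Dict, Iterable, Tuple


def coec(impressions: Iterable[Tuple[int, bool]], position_ctr: Dict[int, float]) -> float:
    """Clicks over expected clicks for one ad.

    `impressions` holds (position, clicked) pairs; `position_ctr[p]` is the
    average click-through rate of all ads shown at position p, so a COEC
    above 1 means the ad drew more clicks than its placements alone predict.
    """
    impressions = list(impressions)
    clicks = sum(1 for _, clicked in impressions if clicked)
    expected = sum(position_ctr[pos] for pos, _ in impressions)
    return clicks / expected if expected > 0 else 0.0


ctr_by_position = {1: 0.10, 2: 0.05}  # hypothetical position priors
log = [(1, True), (1, False), (2, True), (2, False)]
print(round(coec(log, ctr_by_position), 2))  # 6.67
```

A score like this would typically enter the regression model as one feature alongside the page-advertisement relevancy features.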
-
Publication number: 20110196853
Abstract: A computer-implemented method for automatically generating a script for a target web interface instance. Embodiments include receiving a task description of a task to be completed on a target web interface instance. The computer-implemented method also includes repeating steps until the task is completed. The repeating steps include determining from the target web interface instance a plurality of actions that may be performed on the target web interface instance and using the task description, predicting which action of the plurality of actions from the target web interface instance is an action most likely to be selected. The repeating steps also include performing the action most likely to be selected, thus proceeding to a first web interface instance and setting the first web interface instance as the target web interface instance.
Type: Application
Filed: February 8, 2010
Publication date: August 11, 2011
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jeffrey P. Bigham, Clemens Drews, Tessa A. Lau, Ian A. R. Li, Jeffrey W. Nichols
-
Publication number: 20110196870Abstract: Systems, methods and computer program products for classifying documents are presented. Systems, methods and computer program products for analyzing documents, e.g., associated with legal discovery are also presented. Systems, methods and computer program products for cleaning up data are also presented. Systems, methods and computer program products for verifying an association of an invoice with an entity are also presented. Systems, methods and computer program products for managing medical records are presented. Systems, methods and computer program products for face recognition are presented.Type: ApplicationFiled: April 19, 2011Publication date: August 11, 2011Applicant: KOFAX, INC.Inventors: Mauritius A.R. Schmidtler, Roland Borrey, Anthony Sarah
-
Patent number: 7996409Abstract: A method to manage objects in an information lifecycle management system is provided. The method includes determining a score for each of the objects based on a score of at least one feature within respective ones of each of the objects where the score of the at least one feature is associated with a valuation of the at least one feature. The method also includes managing each of the objects based on the score for each of the objects wherein higher scored objects are managed preferentially.Type: GrantFiled: December 28, 2006Date of Patent: August 9, 2011Assignee: International Business Machines CorporationInventors: Windsor Wee Sun Hsu, Shauchi Ong
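The sketch below illustrates score-based management. It assumes an object's score is the sum of hypothetical per-feature valuations (`FEATURE_VALUE`). It also reads "managed preferentially" as placing the top-scoring objects on a faster storage tier. Both the aggregation rule and the two-tier policy are assumptions.

```python
# Hypothetical valuation of each feature that may appear in an object.
FEATURE_VALUE = {"contract": 5.0, "invoice": 3.0, "draft": 0.5}

def object_score(features):
    """Score an object as the sum of the valuations of the features it contains."""
    return sum(FEATURE_VALUE.get(f, 0.0) for f in features)

def assign_tiers(objects, fast_capacity):
    """Rank objects by score; the top `fast_capacity` go to fast storage, the rest to archive."""
    ranked = sorted(objects, key=lambda name: object_score(objects[name]), reverse=True)
    return {name: ("fast" if i < fast_capacity else "archive") for i, name in enumerate(ranked)}
```

Other preferential treatments fit the same ranking, such as longer retention or more frequent backup. Only the policy applied to the ranked list would change.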
-
Patent number: 7996343Abstract: Described is using semi-Riemannian geometry in supervised learning to learn a discriminant subspace for classification, e.g., labeled samples are used to learn the geometry of a semi-Riemannian submanifold. For a given sample, the K nearest classes of that sample are determined, along with the nearest samples that are in other classes, and the nearest samples in that sample's same class. The distances between these samples are computed, and used in computing a metric matrix. The metric matrix is used to compute a projection matrix that corresponds to the discriminant subspace. In online classification, as a new sample is received, it is projected into a feature space by use of the projection matrix and classified accordingly.Type: GrantFiled: September 30, 2008Date of Patent: August 9, 2011Assignee: Microsoft CorporationInventors: Deli Zhao, Zhouchen Lin, Xiaoou Tang
-
Patent number: 7996341Abstract: Embodiments of the present disclosure assess how well a tag describes a color theme by estimating a descriptiveness value for the tag for the color theme. Some embodiments determine descriptiveness values for a tag based on weighted color attributes determined from the tag's existing use in a color theme collection. Descriptiveness values are used generally in color theme searching and to suggest tags for a color theme, among other things.Type: GrantFiled: December 20, 2007Date of Patent: August 9, 2011Assignee: Adobe Systems IncorporatedInventor: Hendrik Kueck
-
Patent number: 7996340Abstract: A workforce analysis method for solving an L1-based clustering problem of multinomial distributions of workforce data includes acquiring workforce allocation data, arranging the workforce allocation data in sets of fraction data with respect to the L1 distances, clustering the sets of fraction data to a corresponding set of cluster centers, or L1 distances for each set, minimizing the sets of fraction data based on the cluster centers or L1 distances and outputting analysis results of the clustering problem.Type: GrantFiled: December 19, 2007Date of Patent: August 9, 2011Assignee: International Business Machines CorporationInventor: Hisashi Kashima
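One generic way to cluster fraction vectors under the L1 distance is Lloyd-style k-medians. It is sketched below as a stand-in and is not necessarily the patented procedure. Each row is assumed to hold one team's allocation of effort across activity types. The coordinate-wise median minimizes the summed L1 distance in each coordinate. However, the resulting center need not sum exactly to 1.

```python
import statistics

def l1(p, q):
    """L1 (Manhattan) distance between two vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def kmedians(points, centers, iters=10):
    """Assign each point to its nearest center under L1, then move each center
    to the coordinate-wise median of its members; repeat."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda j: l1(p, centers[j]))
            clusters[k].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [statistics.median(col) for col in zip(*members)]
    labels = [min(range(len(centers)), key=lambda j: l1(p, centers[j])) for p in points]
    return centers, labels
```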
-
Patent number: 7996342Abstract: Systems, methods and computer program products for supervised dimensionality reduction. Exemplary embodiments include a method including receiving an input in the form of a data matrix X of size N×D, wherein N is a number of samples, D is a dimensionality, a vector Y of size N×1, hidden variables U of a number K, a data type of the matrix X and the vector Y, and a trade-off constant alpha; selecting loss functions in the form of Lx(X,UV) and Ly(Y,UW) appropriate for the type of data in the matrix X and the vector Y, where U, V and W are matrices, selecting corresponding sets of update rules RU, RV and RW for updating the matrices U,V and W, learning U, V and W that provide a minimum total loss L(U,V,W)=Lx(X,UV)+alpha*Ly(Y,UW), and returning matrices U, V and W.Type: GrantFiled: February 15, 2008Date of Patent: August 9, 2011Assignee: International Business Machines CorporationInventors: Genady Grabarnik, Irina Rish
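The objective L(U,V,W)=Lx(X,UV)+alpha*Ly(Y,UW) can be sketched with both losses taken as squared error, which suits real-valued X and Y. Plain simultaneous gradient descent stands in for the update rules RU, RV and RW. The sketch uses pure Python lists to avoid dependencies. The function names, learning rate and initialization are assumptions.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sq_loss(A, B):
    """Squared Frobenius distance ||A - B||^2."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def sdr(X, Y, K, alpha=1.0, lr=0.01, iters=2000, seed=0):
    """Learn U (N x K), V (K x D), W (K x 1) minimising
    ||X - UV||^2 + alpha * ||Y - UW||^2 by simultaneous gradient steps."""
    rng = random.Random(seed)
    N, D = len(X), len(X[0])

    def rand(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    U, V, W = rand(N, K), rand(K, D), rand(K, 1)
    for _ in range(iters):
        Rx = sub(X, matmul(U, V))  # reconstruction residual, N x D
        Ry = sub(Y, matmul(U, W))  # label residual, N x 1
        # Negative gradients, all computed from the current U, V, W
        # (the common factor 2 is folded into lr).
        dU = [[a + alpha * b for a, b in zip(ra, rb)]
              for ra, rb in zip(matmul(Rx, transpose(V)), matmul(Ry, transpose(W)))]
        dV = matmul(transpose(U), Rx)
        dW = [[alpha * g for g in row] for row in matmul(transpose(U), Ry)]
        U = [[u + lr * d for u, d in zip(ru, rd)] for ru, rd in zip(U, dU)]
        V = [[v + lr * d for v, d in zip(rv, rd)] for rv, rd in zip(V, dV)]
        W = [[w + lr * d for w, d in zip(rw, rd)] for rw, rd in zip(W, dW)]
    total = sq_loss(X, matmul(U, V)) + alpha * sq_loss(Y, matmul(U, W))
    return U, V, W, total
```

The trade-off constant `alpha` controls how strongly the label-prediction term shapes the shared representation U relative to reconstructing X.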
-
Patent number: 7996339Abstract: A method for generating object classification models is disclosed. Initially, a set of training data is fed into a training algorithm to generate a first object classification model. A set of field data is then applied to the first object classification model to produce a set of field object classifications. The number of data in the set of field data is significantly less than the number of data in the set of training data. Finally, the set of field object classifications and the set of field data are fed into the training algorithm to generate a second object classification model. The second object classification model can be utilized for predicting object classifications.Type: GrantFiled: September 17, 2004Date of Patent: August 9, 2011Assignee: International Business Machines CorporationInventors: Ameha Aklilu, Raed Hijer, Wilson Velez
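The two-stage procedure can be illustrated with a toy nearest-centroid classifier in the role of the "training algorithm". That choice is an assumption, since the abstract does not name an algorithm. Following the abstract, the second model is trained only on the field data and the first model's classifications of it. The names `train` and `two_stage` are hypothetical.

```python
def train(points, labels):
    """Toy training algorithm: a nearest-centroid classifier."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(p):
        # Label of the closest centroid under squared Euclidean distance.
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(p, centroids[y])))

    return predict

def two_stage(train_points, train_labels, field_points):
    first = train(train_points, train_labels)        # first model from the training set
    field_labels = [first(p) for p in field_points]  # field object classifications
    return train(field_points, field_labels)         # second model from field data + those labels
```

One consequence of this literal reading: a class that the first model never predicts on the field data is absent from the second model.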
-
Publication number: 20110191170Abstract: The present invention provides methods and systems for use in bid optimization in connection with advertisement serving impression opportunities available in an auction-based online advertising exchange. Methods are presented in which, based in part on historical advertisement performance information, a Kalman filter-based model is used in forecasting performance of a set of possible advertisement impressions served over a future period of time. Forecasted performance information is used in determining an optimized bid in connection with an available opportunity. A similarity function, including non-linearly determined feature weighting, can be used in determining most similar forecasted impressions to the available opportunity.Type: ApplicationFiled: February 2, 2010Publication date: August 4, 2011Applicant: Yahoo! Inc.Inventors: Ruofei Zhang, Ying Cui
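The forecasting step can be sketched as a one-dimensional local-level Kalman filter over a historical performance series, such as daily click-through rate. The random-walk state model, the noise variances `q` and `r`, and the function name are assumptions. The similarity function and the bid optimization are not shown.

```python
def kalman_forecast(observations, q=1e-4, r=1e-2, x0=0.0, p0=1.0, horizon=1):
    """Local-level Kalman filter: hidden level x follows a random walk with
    variance q; each observation is x plus noise with variance r.
    Returns the mean and variance of the forecast `horizon` steps ahead."""
    x, p = x0, p0
    for z in observations:
        p = p + q            # predict: the level drifts, so uncertainty grows
        k = p / (p + r)      # Kalman gain: weight given to the new observation
        x = x + k * (z - x)  # update the level estimate toward the observation
        p = (1 - k) * p      # updated uncertainty
    return x, p + horizon * q
```

A bid could then be derived from the forecast, for example value-per-click times forecast CTR. That rule is likewise only illustrative.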
-
Publication number: 20110191275Abstract: A computer-implemented method for determining the volume of activation of neural tissue. In one embodiment, the method uses one or more parametric equations that define a volume of activation, wherein the parameters for the one or more parametric equations are given as a function of an input vector that includes stimulation parameters. After receiving input data that includes values for the stimulation parameters and defining the input vector using the input data, the input vector is applied to the function to obtain the parameters for the one or more parametric equations. The parametric equation is solved to obtain a calculated volume of activation.Type: ApplicationFiled: August 26, 2010Publication date: August 4, 2011Applicant: THE CLEVELAND CLINIC FOUNDATIONInventors: J. Luis Lujan, Ashu Chaturvedi, Cameron C. McIntyre
-
Publication number: 20110191277Abstract: A data mining system includes a planning and learning module which receives as input a knowledge model and a set of goals and automatically produces as output a plurality of plans. The system includes a data mining processing unit which receives the plans as instructions and automatically creates results which are provided back to the planning and learning module as feedback. A method for data mining includes the steps of receiving as input at a planning and learning module a knowledge model and a set of goals. There is the step of automatically producing as output of the planning and learning module a plurality of plans from the input. There is the step of receiving by a data mining processing unit the plans as instructions. There is the step of automatically creating results by the data mining processing unit. There is the step of providing back to the planning and learning module the results as feedback.Type: ApplicationFiled: August 29, 2008Publication date: August 4, 2011Inventors: José Luis Agúndez Dominguez, Jesus Renero Quintero
-
Publication number: 20110191400Abstract: Similarities between simplex projection with upper bounds and L1 projection are explored. Criteria for a-priori determination of the sequence in which various constraints become active are derived, and this sequence is used to develop efficient algorithms for projecting a vector onto the L1-ball while observing box constraints. Three projection methods are presented. The first projection method performs exact projection in O(n²) worst case complexity, where n is the space dimension. Using a novel criterion for ordering constraints, the second projection method has a worst case complexity of O(n log n). The third projection method is a worst case linear time algorithm having O(n) complexity. The upper bounds defined for the projected entries guide the L1-ball projection to more meaningful predictions.Type: ApplicationFiled: August 10, 2010Publication date: August 4, 2011Inventors: Mithun Das Gupta, Jing Xiao, Sanjeev Kumar
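For comparison, the same projection can be computed by bisection on a single soft threshold θ. The target set is the L1-ball of radius z, with an upper bound u_i on each |x_i|. The optimum has the form x_i = sign(v_i) * min(u_i, max(|v_i| - θ, 0)). This gives a simple O(n log(1/ε)) alternative. It is not one of the three methods in the abstract. The name `project_l1_box` is hypothetical.

```python
def project_l1_box(v, z, u, iters=100):
    """Euclidean projection of v onto {x : sum(|x_i|) <= z, |x_i| <= u_i}."""
    a = [abs(x) for x in v]

    def shrink(theta):
        # Soft-threshold each magnitude by theta, then clip it to its upper bound.
        return [min(ui, max(ai - theta, 0.0)) for ai, ui in zip(a, u)]

    if sum(shrink(0.0)) <= z:
        y = shrink(0.0)  # box clipping alone already meets the L1 budget
    else:
        # sum(shrink(theta)) is non-increasing in theta; bracket the root and bisect.
        lo, hi = 0.0, max(a)
        for _ in range(iters):
            mid = (lo + hi) / 2
            if sum(shrink(mid)) > z:
                lo = mid
            else:
                hi = mid
        y = shrink(hi)  # hi always satisfies the budget
    return [yi if x >= 0 else -yi for x, yi in zip(v, y)]
```

For example, projecting v = [3, -1, 0.5] with z = 2 and u = [1.5, 1, 1] gives θ = 0.5. The result is approximately [1.5, -0.5, 0]: the first entry is held at its upper bound, and the remaining budget is shared by the others.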
-
Publication number: 20110191374Abstract: Methods and systems to associate semantically-related items of a plurality of item types using a joint embedding space are disclosed. The disclosed methods and systems are scalable to large, web-scale training data sets. According to an embodiment, a method for associating semantically-related items of a plurality of item types includes embedding training items of a plurality of item types in a joint embedding space configured in a memory coupled to at least one processor, learning one or more mappings into the joint embedding space for each of the item types to create a trained joint embedding space and one or more learned mappings, and associating one or more embedded training items with a first item based upon a distance in the trained joint embedding space from the first item to each said associated embedded training items. Exemplary item types that may be embedded in the joint embedding space include images, annotations, audio and video.Type: ApplicationFiled: February 1, 2011Publication date: August 4, 2011Applicant: Google Inc.Inventors: Samy BENGIO, Jason Weston
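The association step can be sketched once the mappings have been learned. Each item type has its own linear map into the shared space, and related items are ranked by Euclidean distance there. The mappings in the usage example are hand-picked placeholders rather than trained ones. The training procedure and the function names are not from the source.

```python
import math

def embed(mapping, features):
    """Map a feature vector into the joint space; `mapping` is a list of rows."""
    return [sum(w * f for w, f in zip(row, features)) for row in mapping]

def nearest(query_vec, items, k=2):
    """Return the names of the k embedded items closest to the query vector.
    `items` is a list of (name, embedded_vector) pairs."""
    ranked = sorted(items, key=lambda item: math.dist(query_vec, item[1]))
    return [name for name, _ in ranked[:k]]
```

As a usage example, suppose image features pass through a 2×3 map and annotation words (one-hot) pass through a 2×2 map. An image embedded near the "cat" annotation then retrieves "cat" as its nearest label.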
-
Publication number: 20110191276Abstract: To implement open information extraction, a new extraction paradigm has been developed in which a system makes a single data-driven pass over a corpus of text, extracting a large set of relational tuples without requiring any human input. Using training data, a Self-Supervised Learner employs a parser and heuristics to determine criteria that will be used by an extraction classifier (or other ranking model) for evaluating the trustworthiness of candidate tuples that have been extracted from the corpus of text, by applying heuristics to the corpus of text. The classifier retains tuples with a sufficiently high probability of being trustworthy. A redundancy-based assessor assigns a probability to each retained tuple to indicate a likelihood that the retained tuple is an actual instance of a relationship between a plurality of objects comprising the retained tuple. The retained tuples comprise an extraction graph that can be queried for information.Type: ApplicationFiled: December 16, 2010Publication date: August 4, 2011Applicant: University of Washington through its Center for CommercializationInventors: Michael J. Cafarella, Michele Banko, Oren Etzioni
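The redundancy-based assessor can be approximated by counting the distinct sentences that support each tuple and converting the count into a probability. The sketch uses a noisy-or formula, 1 - (1 - p)^k, with a per-extraction precision `p`. Both are assumptions standing in for the assessor described in the patent.

```python
from collections import defaultdict

def assess(extractions, p=0.7):
    """extractions: iterable of (sentence_id, (arg1, relation, arg2)) pairs.
    Counts distinct supporting sentences k per tuple and returns a
    noisy-or probability 1 - (1 - p)**k that the tuple is a true relation."""
    support = defaultdict(set)
    for sent_id, tup in extractions:
        support[tup].add(sent_id)  # repeated extractions from one sentence count once
    return {tup: 1 - (1 - p) ** len(sents) for tup, sents in support.items()}
```

Because support is counted per distinct sentence, a tuple extracted twice from the same sentence gets no extra credit.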
-
Publication number: 20110191273Abstract: A method for providing an evaluation/verification of the correctness of an ontology is described. The method includes loading a first ontology associated with a first rule set. An extended ontology and an extended rule set are generated based at least in part on the first ontology and the first rule set. The extended rule set is applied to the extended ontology. The method also includes determining (e.g., by a data processor) a correctness of the extended ontology. Results are generated which include the correctness. Apparatus and computer readable media are also described.Type: ApplicationFiled: February 2, 2010Publication date: August 4, 2011Applicant: International Business Machines CorporationInventors: Genady Grabarnik, Zhen Liu, Anand Ranganathan, Anton V. Riabov, Irina Rish, Larisa Shwartz
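Here is a toy version of "extend, apply rules, check correctness" on an ontology stored as (subject, predicate, object) triples. The two inference rules and the disjointness test used as the correctness criterion are assumptions for illustration. So are the function names.

```python
def extend(triples):
    """Apply two hypothetical rules until no new facts appear:
    (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
    (x type A), (A subClassOf B)        -> (x type B)"""
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "subClassOf"}
        new = {(a, "subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, "type", b) for x, p, a in facts if p == "type" for a2, b in sub if a == a2}
        if new <= facts:
            return facts
        facts |= new

def is_correct(triples, disjoint):
    """Treat the extended ontology as correct if no individual is typed
    with both classes of any disjoint pair."""
    types = {}
    for s, p, o in extend(triples):
        if p == "type":
            types.setdefault(s, set()).add(o)
    return not any({a, b} <= ts for ts in types.values() for a, b in disjoint)
```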
-
Publication number: 20110191271Abstract: A method described herein includes receiving a digital image, wherein the digital image includes a first element that corresponds to a first domain and a second element that corresponds to a second domain. The method also includes automatically assigning a label to the first element in the digital image based at least in part upon a computed probability that the label corresponds to the first element, wherein the probability is computed through utilization of a first model that is configured to infer labels for elements in the first domain and a second model that is configured to infer labels for elements in the second domain. The first model receives data that identifies learned relationships between elements in the first domain and elements in the second domain, and the probability is computed by the first model based at least in part upon the learned relationships.Type: ApplicationFiled: February 4, 2010Publication date: August 4, 2011Applicant: Microsoft CorporationInventors: Simon John Baker, Ashish Kapoor, Gang Hua, Dahua Lin
-
Publication number: 20110191274Abstract: Described is a technology by which a deep-structured (multiple layered) conditional random field model is trained and used for classification of sequential data. Sequential data is processed at each layer, from the lowest layer to a final (highest) layer, to output data in the form of conditional probabilities of classes given the sequential input data. Each higher layer inputs the conditional probability data and the sequential data jointly to output further probability data, and so forth, until the final layer which outputs the classification data. Also described is layer-by-layer training, supervised or unsupervised. Unsupervised training may process raw features to minimize average frame-level conditional entropy while maximizing state occupation entropy, or to minimize reconstruction error.Type: ApplicationFiled: January 29, 2010Publication date: August 4, 2011Applicant: Microsoft CorporationInventors: Dong Yu, Li Deng, Shizhen Wang