Machine Learning Patents (Class 706/12)
  • Patent number: 8010341
    Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge into such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore ready to be summarized, identified, classified, clustered, and the like.
    Type: Grant
    Filed: September 13, 2007
    Date of Patent: August 30, 2011
    Assignee: Microsoft Corporation
    Inventors: Kannan Achan, Moises Goldszmidt, Lev Ratinov
  • Patent number: 8010466
    Abstract: The invention provides a method, apparatus and system for classifying and clustering electronic data streams such as email, images and sound files for identification, sorting and efficient storage. The method further uses learning machines in combination with hashing schemes to cluster and classify documents. In one embodiment, hash apparatuses and methods taxonomize clusters. In yet another embodiment, clusters of documents use a geometric hash to contain the documents in a data corpus without the overhead of search and storage.
    Type: Grant
    Filed: June 10, 2009
    Date of Patent: August 30, 2011
    Assignee: TW Vericept Corporation
    Inventor: Seth Patinkin
  • Patent number: 8010663
    Abstract: A computationally implemented method includes, but is not limited to: acquiring subjective user state data including data indicating incidence of at least a first subjective user state associated with a first user and data indicating incidence of at least a second subjective user state associated with a second user; acquiring objective occurrence data including data indicating incidence of at least a first objective occurrence and data indicating incidence of at least a second objective occurrence; and correlating the subjective user state data with the objective occurrence data. In addition to the foregoing, other method aspects are described in the claims, drawings, and text forming a part of the present disclosure.
    Type: Grant
    Filed: March 25, 2009
    Date of Patent: August 30, 2011
    Assignee: The Invention Science Fund I, LLC
    Inventors: Shawn P. Firminger, Jason Garms, Edward K. Y. Jung, Chris D. Karkanias, Eric C. Leuthardt, Royce A. Levien, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, Jr., Clarence T. Tegreene, Kristin M. Tolle, Lowell L. Wood, Jr.
  • Patent number: 8010662
    Abstract: A computationally implemented method includes, but is not limited to: acquiring objective occurrence data including data indicating occurrence of at least one objective occurrence; soliciting, in response to the acquisition of the objective occurrence data, subjective user state data including data indicating occurrence of at least one subjective user state associated with a user; acquiring the subjective user state data and correlating the subjective user state data with the objective occurrence data. In addition to the foregoing, other method aspects are described in the claims, drawings, and text forming a part of the present disclosure.
    Type: Grant
    Filed: February 25, 2009
    Date of Patent: August 30, 2011
    Assignee: The Invention Science Fund I, LLC
    Inventors: Shawn P. Firminger, Jason Garms, Edward K. Y. Jung, Chris D. Karkanias, Eric C. Leuthardt, Royce A. Levien, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, Jr., Clarence T. Tegreene, Kristin M. Tolle, Lowell L. Wood, Jr.
  • Patent number: 8010357
    Abstract: Combined active and semi-supervised learning to reduce an amount of manual labeling when training a spoken language understanding model classifier. The classifier may be trained with human-labeled utterance data. Ones of a group of unselected utterance data may be selected for manual labeling via active learning. The classifier may be changed, via semi-supervised learning, based on the selected ones of the unselected utterance data.
    Type: Grant
    Filed: January 12, 2005
    Date of Patent: August 30, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Robert Elias Schapire, Gokhan Tur
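The combined active and semi-supervised loop described for 8010357 can be sketched as follows. This is a minimal illustration under stated assumptions, not AT&T's implementation: a toy naive Bayes text classifier stands in for the spoken language understanding model, confidence is the classifier's posterior, and the function names, the `k` and `threshold` parameters, and the `oracle` callback (playing the human labeler) are all hypothetical.

```python
import math
from collections import Counter

def train(labeled):
    """Fit a toy multinomial naive Bayes model from (utterance, label) pairs."""
    counts, totals, priors = {}, Counter(), Counter()
    for text, label in labeled:
        priors[label] += 1
        c = counts.setdefault(label, Counter())
        for w in text.split():
            c[w] += 1
            totals[label] += 1
    vocab = {w for c in counts.values() for w in c}
    return counts, totals, priors, vocab

def predict(model, text):
    """Return (best_label, posterior probability of that label)."""
    counts, totals, priors, vocab = model
    n = sum(priors.values())
    logp = {}
    for label in priors:
        lp = math.log(priors[label] / n)
        for w in text.split():
            # add-one smoothing so an unseen word does not zero out a class
            lp += math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
        logp[label] = lp
    top = max(logp.values())
    z = sum(math.exp(v - top) for v in logp.values())
    best = max(logp, key=logp.get)
    return best, math.exp(logp[best] - top) / z

def active_semi_supervised_round(labeled, unlabeled, oracle, k=1, threshold=0.9):
    """One round (hypothetical): the k least confident utterances go to a human
    (active learning); confidently classified ones get machine labels
    (semi-supervised); the classifier is then retrained on both."""
    model = train(labeled)
    scored = sorted(((predict(model, u), u) for u in unlabeled), key=lambda s: s[0][1])
    to_human, rest = scored[:k], scored[k:]
    human_labeled = list(labeled) + [(u, oracle(u)) for _, u in to_human]
    auto_labeled = [(u, label) for (label, conf), u in rest if conf >= threshold]
    remaining = [u for (_, conf), u in rest if conf < threshold]
    return train(human_labeled + auto_labeled), human_labeled, remaining
```

In practice such a round would be repeated until the labeling budget runs out; the low-confidence leftovers in `remaining` are the candidates for the next round.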
  • Patent number: 8010674
    Abstract: Some embodiments of the present invention provide a system that facilitates access to a website from an application. During operation, the system obtains community data associated with interactions between a set of users and the website and examines the community data to identify an interactivity request made by the website to users of the website. Next, the system obtains user-specific data from a new user of the application, which includes a response to the interactivity request from the new user. Finally, the system uses the user-specific data to automate access to the website for the new user.
    Type: Grant
    Filed: March 31, 2008
    Date of Patent: August 30, 2011
    Assignee: Intuit Inc.
    Inventor: Spencer W. Fong
  • Patent number: 8010664
    Abstract: A computationally implemented method includes, but is not limited to: acquiring events data including data indicating incidence of a first one or more reported events and data indicating incidence of a second one or more reported events, at least one of the first one or more reported events and the second one or more reported events being associated with a user; determining an events pattern based selectively on the incidences of the first one or more reported events and the second one or more reported events; and developing a hypothesis associated with the user based, at least in part, on the determined events pattern. In addition to the foregoing, other method aspects are described in the claims, drawings, and text forming a part of the present disclosure.
    Type: Grant
    Filed: May 28, 2009
    Date of Patent: August 30, 2011
    Assignee: The Invention Science Fund I, LLC
    Inventors: Shawn P. Firminger, Jason Garms, Edward K. Y. Jung, Chris D. Karkanias, Eric C. Leuthardt, Royce A. Levien, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, Jr., Clarence T. Tegreene, Kristin M. Tolle, Lowell L. Wood, Jr.
  • Publication number: 20110208735
    Abstract: Described is a technology by which a term frequency function for web click data is machine learned from raw click features extracted from a query log or the like and training data. Also described is combining the term frequency function with other functions/click features to learn a relevance function for use in ranking document relevance to a query.
    Type: Application
    Filed: February 23, 2010
    Publication date: August 25, 2011
    Applicant: Microsoft Corporation
    Inventors: Jianfeng Gao, Krysta M. Svore
  • Publication number: 20110208425
    Abstract: Techniques are described for determining a correlation between identified locations in order to recommend a location that may be of interest to an individual user. The process constructs a location model to identify locations. To construct the model, the process uses global positioning system (GPS) logs of geospatial locations collected over time, identifies trajectories representing trips of the individual user, and extracts stay points from the trajectories. Each stay point represents a geographical region where the individual user stayed longer than a time threshold within a distance threshold. A location history is formulated for the individual user based on a sequence of the extracted stay points to identify locations. The process determines a correlation between identified locations. The process integrates travel experiences of individual users who have visited the locations in a weighted manner and identifies a common travel sequence which the individual users followed between the locations.
    Type: Application
    Filed: February 23, 2010
    Publication date: August 25, 2011
    Applicant: Microsoft Corporation
    Inventors: Yu Zheng, Lizhu Zhang, Xing Xie
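Stay-point extraction, as described for 20110208425, can be sketched as follows: a stay point is emitted when consecutive GPS fixes remain within a distance threshold of an anchor fix for at least a time threshold. This is an illustrative sketch, not Microsoft's code; the function names, the (lat, lon, t_seconds) tuple layout, and the default thresholds (200 m, 20 minutes) are assumptions.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon, ...) tuples."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))

def stay_points(track, dist_thresh_m=200.0, time_thresh_s=1200.0):
    """track: (lat, lon, t_seconds) fixes sorted by time.
    Returns (mean_lat, mean_lon, arrive_t, leave_t) for each stay point."""
    result, i, n = [], 0, len(track)
    while i < n:
        j = i + 1
        # grow the segment while fixes stay within the distance threshold of track[i]
        while j < n and haversine_m(track[i], track[j]) <= dist_thresh_m:
            j += 1
        seg = track[i:j]
        if seg[-1][2] - seg[0][2] >= time_thresh_s:
            lat = sum(p[0] for p in seg) / len(seg)
            lon = sum(p[1] for p in seg) / len(seg)
            result.append((lat, lon, seg[0][2], seg[-1][2]))
            i = j  # resume scanning after the stay
        else:
            i += 1
    return result
```

The sequence of stay points returned for each trajectory is the raw material from which a per-user location history would then be built.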
  • Publication number: 20110208680
    Abstract: The method includes obtaining system model data representing a set of failures in a system including a plurality of components, a set of symptoms and relationships between at least some of the failures and symptoms. The system model data is used to create a Bayesian Network. Failure cases data is also obtained, where each failure case describes the presence/absence of at least one of the symptoms and the presence/absence of at least one of the failures. A learning operation on the Bayesian Network using the failure cases data is then performed and the contribution made by at least some of the failure cases to updating the parameters of the Bayesian Network during the learning operation is assessed. Information representing the assessed contribution of the at least some failure cases is displayed.
    Type: Application
    Filed: September 30, 2009
    Publication date: August 25, 2011
    Applicant: BAE SYSTEMS plc
    Inventors: Richard Lee Bovey, Erdem Turker Senalp
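One way to picture "assessing the contribution of failure cases" in 20110208680 is a leave-one-out measure on a single Bayesian Network parameter. The sketch below is an assumption-laden simplification, not BAE Systems' method: the network is reduced to one conditional probability P(symptom present | failure present) with a Beta(1, 1) prior, cases are dicts of hypothetical failure/symptom names, and a case's contribution is defined as how far the estimate moves when that case is removed.

```python
def cpt_estimate(cases, failure, symptom, alpha=1.0, beta=1.0):
    """Posterior mean of P(symptom present | failure present) under a Beta(alpha, beta) prior.
    Each case maps failure/symptom names to True (present) or False (absent);
    a missing key means 'not observed', so the case is skipped for this parameter."""
    relevant = [c for c in cases if c.get(failure) is True and symptom in c]
    present = sum(1 for c in relevant if c[symptom])
    return (alpha + present) / (alpha + beta + len(relevant))

def case_contributions(cases, failure, symptom):
    """Leave-one-out influence of each case on the parameter estimate."""
    full = cpt_estimate(cases, failure, symptom)
    return [abs(full - cpt_estimate(cases[:i] + cases[i + 1:], failure, symptom))
            for i in range(len(cases))]
```

A case that does not involve the failure at all gets a contribution of zero, which is the kind of information the abstract says is displayed to the user.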
  • Publication number: 20110208679
    Abstract: A computer readable, non-transitory medium has stored therein a trouble pattern creating program. The program causes a computer to execute: (a) extracting, from a plurality of log messages that are output from an information system having a plurality of configuration items and that are output in a predetermined period of time, the configuration items that output the log messages; (b) calculating a degree of relationship between the configuration items extracted in (a); (c) learning the ratio of the number of trouble occurrences in the information system to the number of times the log messages are output, the learning being executed a number of times corresponding to the degree of relationship calculated in (b); and (d) creating, in accordance with a result of the learning in (c), a trouble pattern message that is output when a trouble occurs.
    Type: Application
    Filed: February 17, 2011
    Publication date: August 25, 2011
    Applicant: Fujitsu Limited
    Inventors: Yukihiro Watanabe, Masazumi Matsubara, Atsuji Sekiguchi, Yuji Wada, Yasuhide Matsumoto
  • Publication number: 20110208677
    Abstract: A system and method for analyzing Intrusion Detection System (IDS) alert data associated with a computer network is described. The method includes applying first association rules to obtained IDS alert data associated with a computer network and processing the obtained IDS alert data with the first association rules. Analyst feedback data associated with the processed obtained IDS alert data is received, and a training data set from the analyst feedback data is received. New association rules are determined based upon the training data set, and the new association rules are outputted to a display of a computing device. Outputting the new association rules may include outputting patterns within the IDS alert data of false positive alerts. The new association rules may be applied back to the obtained IDS alert data.
    Type: Application
    Filed: May 4, 2011
    Publication date: August 25, 2011
    Applicant: BANK OF AMERICA LEGAL DEPARTMENT
    Inventors: Mian Zhou, Sean Kenric Catlett
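The analyst-feedback step in 20110208677 (deriving new association rules that describe false-positive alerts) can be illustrated with a small support/confidence miner. This is a sketch under assumptions, not Bank of America's system: alerts are dicts of hypothetical attribute names, each carries an analyst verdict, rules have the form "attribute values → false positive", and the thresholds are illustrative.

```python
from collections import Counter
from itertools import combinations

def mine_fp_rules(alerts, min_support=2, min_conf=0.8, max_len=2):
    """alerts: (attrs: dict, is_false_positive: bool) pairs from analyst feedback.
    Returns (antecedent dict, false-positive count, confidence), best first."""
    ante_counts, fp_counts = Counter(), Counter()
    for attrs, is_fp in alerts:
        items = sorted(attrs.items())
        for r in range(1, max_len + 1):
            for ante in combinations(items, r):
                ante_counts[ante] += 1
                if is_fp:
                    fp_counts[ante] += 1
    rules = []
    for ante, n in ante_counts.items():
        conf = fp_counts[ante] / n  # P(false positive | antecedent matches)
        if fp_counts[ante] >= min_support and conf >= min_conf:
            rules.append((dict(ante), fp_counts[ante], conf))
    return sorted(rules, key=lambda r: (-r[2], -r[1]))

def matches(antecedent, attrs):
    """Apply a rule back to an alert: True if every antecedent attribute matches."""
    return all(attrs.get(k) == v for k, v in antecedent.items())
```

Applying `matches` to incoming alerts corresponds to the abstract's step of feeding the new rules back onto the IDS alert data.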
  • Publication number: 20110208678
    Abstract: An electronic system includes an accelerometer. A method for excessive mechanical shock feature extraction for overstress event registration and cumulative tracking includes obtaining a sample from the accelerometer. Feature extraction is performed on the sample using empirical mode decomposition (EMD) to produce a plurality of modes. A pattern classifier is utilized for processing the plurality of modes to determine if the sample classifies as a shock event. If the sample classifies as a shock event, a shock event counter is incremented. If the shock event counter reaches a specified count, an indication to a user is generated.
    Type: Application
    Filed: February 19, 2010
    Publication date: August 25, 2011
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Anton A. Bougaev, Aleksey M. Urmanov, David K. McElfresh, Kenny C. Gross
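The event-counting logic in 20110208678 can be sketched as below. Empirical mode decomposition and the trained pattern classifier are deliberately replaced by a hypothetical peak-magnitude test, so only the counter and alert flow follow the abstract; the class name, `limit`, and `g_threshold` are illustrative assumptions.

```python
class ShockTracker:
    """Counts samples classified as shock events and signals when a limit is reached."""

    def __init__(self, limit, g_threshold=50.0):
        self.limit = limit
        self.g_threshold = g_threshold
        self.count = 0

    def classify(self, sample):
        # Hypothetical stand-in for EMD feature extraction + pattern classifier:
        # treat the sample as a shock if its peak |acceleration| exceeds a threshold.
        return max(abs(a) for a in sample) >= self.g_threshold

    def process(self, sample):
        """Return True exactly when the shock counter reaches the specified count."""
        if self.classify(sample):
            self.count += 1
            return self.count == self.limit
        return False
```

Alerting on equality rather than on "at least" means the user is notified once, at the moment the count is reached.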
  • Patent number: 8005768
    Abstract: An apparatus and method to check whether a user likes a multimedia file based on the user's emotional reaction index for the multimedia file and to repeatedly reproduce the multimedia file if the user likes it. The multimedia file reproducing apparatus can include an emotional reaction index calculation unit to calculate an emotional reaction index based on a physical reaction signal of a user; a like/dislike checking unit to check whether the user likes or dislikes a corresponding audio file based on the calculated emotional reaction index; a list generation unit to generate a list of audio files that the user likes based on an average of emotional reaction indices for each audio file and the user's preference for each audio file; and a reproduction management unit to control the reproduction of the corresponding audio file based on whether the user likes or dislikes the corresponding audio file and to reproduce the audio files in the generated list.
    Type: Grant
    Filed: November 27, 2007
    Date of Patent: August 23, 2011
    Assignee: SAMSUNG Electronics Co., Ltd.
    Inventors: Gyung-Hye Yang, Seung-Nyung Chung
  • Patent number: 8005293
    Abstract: A training method for a support vector machine, including executing an iterative process on a training set of data to determine parameters defining the machine, the iterative process being executed on the basis of a differentiable form of a primal optimization problem for the parameters, the problem being defined on the basis of the parameters and the data set.
    Type: Grant
    Filed: April 11, 2001
    Date of Patent: August 23, 2011
    Assignee: Telstra New Wave Pty Ltd
    Inventors: Adam Kowalczyk, Trevor Bruce Anderson
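The idea in 8005293 of iterating on a differentiable form of the SVM primal problem can be illustrated with plain gradient descent on the L2-regularized squared hinge loss, which is differentiable everywhere. This is a generic textbook sketch, not the patented training method; the function names, learning rate, regularization strength `lam`, and epoch count are assumptions.

```python
def train_svm_primal(X, y, lam=0.01, lr=0.05, epochs=500):
    """Gradient descent on the L2-regularized squared-hinge primal:
    J(w, b) = lam/2 * ||w||^2 + 1/(2n) * sum_i max(0, 1 - y_i (w.x_i + b))^2.
    Labels y_i must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            slack = 1 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if slack > 0:
                # derivative of (1/2n) * slack^2 w.r.t. w is -(slack * yi / n) * xi
                coef = -slack * yi / n
                for j in range(d):
                    gw[j] += coef * xi[j]
                gb += coef
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def svm_predict(w, b, x):
    """Classify x by the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Squaring the hinge is what makes the objective smooth, so ordinary gradient steps apply without subgradients.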
  • Patent number: 8005948
    Abstract: A computationally implemented method includes, but is not limited to: acquiring subjective user state data including at least a first subjective user state and a second subjective user state; acquiring objective context data including at least a first context data indicative of a first objective occurrence associated with a user and a second context data indicative of a second objective occurrence associated with the user; and correlating the subjective user state data with the objective context data. In addition to the foregoing, other method aspects are described in the claims, drawings, and text forming a part of the present disclosure.
    Type: Grant
    Filed: November 26, 2008
    Date of Patent: August 23, 2011
    Assignee: The Invention Science Fund I, LLC
    Inventors: Shawn P. Firminger, Jason Garms, Edward K. Y. Jung, Chris D. Karkanias, Eric C. Leuthardt, Royce A. Levien, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, Jr., Clarence T. Tegreene, Kristin M. Tolle, Lowell L. Wood, Jr.
  • Patent number: 8005771
    Abstract: A method and framework are described for detecting changes in a multivariate data stream. A training set is formed by sampling time windows in a data stream containing data reflecting normal conditions. A histogram is created to summarize each window of data, and data within the histograms are clustered to form test distribution representatives to minimize the bulk of training data. Test data is then summarized using histograms representing time windows of data and data within the test histograms are clustered. The test histograms are compared to the training histograms using nearest neighbor techniques on the clustered data. Distances from the test histograms to the test distribution representatives are compared to a threshold to identify anomalies.
    Type: Grant
    Filed: September 24, 2008
    Date of Patent: August 23, 2011
    Assignee: Siemens Corporation
    Inventors: Terrence Chen, Chao Yuan, Abdul Saboor Sheikh, Claus Neubauer
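The histogram-based change detection in 8005771 can be sketched for a single variable as follows. This simplifies the abstract under explicit assumptions: the stream is univariate rather than multivariate, training histograms are reduced to representatives with a simple "leader" clustering (a swapped-in technique), L1 distance is used for the nearest-neighbor comparison, and all parameter values are illustrative.

```python
def histogram(window, lo, hi, bins):
    """Normalized histogram of one window over the fixed range [lo, hi)."""
    counts = [0] * bins
    for v in window:
        k = min(bins - 1, max(0, int((v - lo) * bins / (hi - lo))))
        counts[k] += 1
    return [c / len(window) for c in counts]

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def windows(stream, size):
    """Non-overlapping windows of the given size."""
    return [stream[i:i + size] for i in range(0, len(stream) - size + 1, size)]

def leader_cluster(hists, radius):
    """Keep a histogram as a representative unless one already lies within radius."""
    reps = []
    for h in hists:
        if all(l1(h, r) > radius for r in reps):
            reps.append(h)
    return reps

def detect(train_stream, test_stream, size=10, lo=0.0, hi=10.0, bins=5,
           radius=0.2, threshold=0.5):
    """Flag each test window whose nearest training representative is too far away."""
    reps = leader_cluster([histogram(w, lo, hi, bins) for w in windows(train_stream, size)], radius)
    return [min(l1(histogram(w, lo, hi, bins), r) for r in reps) > threshold
            for w in windows(test_stream, size)]
```

Summarizing the training data by a few representatives keeps the nearest-neighbor search cheap, which matches the abstract's goal of minimizing the bulk of training data.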
  • Patent number: 8005769
    Abstract: A method of generating association rules from a data stream, which is an unbounded data set composed of transactions, includes: when itemsets in the generated transactions and the counts of the itemsets are managed using a prefix tree, and each node of the prefix tree has information on the count of a specific itemset corresponding to the node and a specific item, updating the information of a node corresponding to the itemset or adding a new node on the basis of the itemset included in the generated transaction and the count of the itemset; comparing the support of the itemset corresponding to each of the nodes of the prefix tree with a minimum support to select frequent itemsets; and visiting all or some of the nodes corresponding to the selected frequent itemsets and generating the association rule on the basis of the information of each of the visited nodes.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: August 23, 2011
    Assignee: Lee, Won Suk
    Inventor: Won Suk Lee
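The prefix-tree bookkeeping in 8005769 can be sketched as follows: each node stands for the itemset spelled by its path from the root and stores that itemset's count; frequent itemsets are found by walking nodes whose support meets the minimum; rules are generated from the stored counts. This is an illustrative sketch, not the patented algorithm. It enumerates every subset of each transaction (fine only for short transactions), omits the stream-specific aging of counts, and the class and function names are hypothetical.

```python
from itertools import combinations

class Node:
    def __init__(self):
        self.count = 0
        self.children = {}

class PrefixTree:
    def __init__(self):
        self.root = Node()
        self.n = 0  # number of transactions seen so far

    def add_transaction(self, items):
        """Increment the count of every non-empty itemset contained in the transaction."""
        self.n += 1
        items = sorted(set(items))
        for r in range(1, len(items) + 1):
            for subset in combinations(items, r):
                node = self.root
                for it in subset:
                    node = node.children.setdefault(it, Node())
                node.count += 1

    def count(self, itemset):
        node = self.root
        for it in sorted(itemset):
            node = node.children.get(it)
            if node is None:
                return 0
        return node.count

    def frequent(self, min_support):
        """Itemsets whose support meets min_support; infrequent subtrees are pruned,
        since a node's descendants are supersets and cannot be more frequent."""
        out = {}
        def walk(node, prefix):
            for it, child in node.children.items():
                s = prefix + (it,)
                if child.count / self.n >= min_support:
                    out[s] = child.count
                    walk(child, s)
        walk(self.root, ())
        return out

def generate_rules(tree, min_support, min_conf):
    """Rules antecedent -> consequent from frequent itemsets, with confidence."""
    result = []
    for itemset, c in tree.frequent(min_support).items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                conf = c / tree.count(ante)
                if conf >= min_conf:
                    cons = tuple(i for i in itemset if i not in ante)
                    result.append((ante, cons, conf))
    return result
```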
  • Patent number: 8005767
    Abstract: The present invention enables identification of events such as targets. From training target event data, a very large number of clusters are formed for each class based on Euclidean distance using a repetitive k-means clustering process. Features of each cluster are identified by extracting its dominant eigenvectors. Once all of the dominant eigenvectors have been identified, they define the relevant space of the cluster. New target event data is compared to each cluster by projecting it onto the relevant and noise spaces. The more the data lies within the relevant space and the less it lies within the noise space, the more similar the data is to a cluster. The new target event data is then classified based on the training target event data.
    Type: Grant
    Filed: June 1, 2007
    Date of Patent: August 23, 2011
    Assignee: The United States of America as represented by the Secretary of the Navy
    Inventor: Vincent A. Cassella
  • Patent number: 8005770
    Abstract: A method for generating a Bayesian network in a parallel manner is based on an initial model having a plurality of nodes. Each node corresponds to a variable of a data set and has a local distribution associated therewith. The method includes assigning a plurality of subsets of the nodes to a respective plurality of constructors. The plurality of constructors is operated in a parallel manner to identify edges to add between nodes in the initial model. The identified edges are added to the initial model to generate the Bayesian network. The edges indicate dependency between nodes connected by the edges.
    Type: Grant
    Filed: June 9, 2008
    Date of Patent: August 23, 2011
    Assignee: Microsoft Corporation
    Inventors: Chi Cao Minh, Max Chickering, John Feo, Jaime Hwacinski, Anitha Panapakkam, Khaled Sedky
  • Publication number: 20110202513
    Abstract: The present invention is directed towards a method and system for processing a real time increase in search requests for a common event. The method and system includes detecting an activity spike in user search request activity based on monitoring of user search requests over a defined period of time and determining source locations associated with the activity spike based on user search result activities. The method and system further includes associating the source locations with the user search request and thereupon applying a machine-learning model to determine a plurality of common features operative to cause the activity spike, including determining associations between the source locations and the activity spike.
    Type: Application
    Filed: February 16, 2010
    Publication date: August 18, 2011
    Applicant: YAHOO! INC.
    Inventor: Vik Singh
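The first two steps in 20110202513 (detecting an activity spike in search requests, then finding the source locations behind it) can be sketched with a simple statistical test. This is an assumption, not Yahoo!'s method: a spike is defined as a count more than `k` standard deviations above the trailing mean (with a floor on the deviation), and the machine-learning step that explains the spike is omitted.

```python
from collections import Counter
from statistics import mean, pstdev

def detect_spike(history, current, k=3.0):
    """history: request counts for past time buckets; current: count for the latest bucket."""
    mu, sigma = mean(history), pstdev(history)
    # floor the deviation so a perfectly flat history does not flag tiny changes
    return current > mu + k * max(sigma, 1.0)

def spike_sources(requests, top=3):
    """requests: (query, location) pairs observed during the spike window."""
    return Counter(loc for _, loc in requests).most_common(top)
```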
  • Publication number: 20110202484
    Abstract: Access is obtained to a parallel corpus including a problem corpus and a solution corpus. A first plurality of topics are mined from the problem corpus and a second plurality of topics are mined from the solution corpus. A transition probability from the first plurality of topics to the second plurality of topics is determined, to identify a most appropriate one of the topics from the solution corpus for a given one of the topics from the problem corpus.
    Type: Application
    Filed: February 18, 2010
    Publication date: August 18, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nikolaos Anerousis, Abhijit Bose, Jimeng Sun, Duo Zhang
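Once topics have been mined from both sides of the parallel corpus in 20110202484, a transition probability from problem topics to solution topics can be estimated from the paired documents. The sketch below assumes each problem/solution pair already has topic proportions (for example from LDA, which is not shown) and estimates P(solution topic j | problem topic i) from co-occurrence weights; this estimator and the function names are illustrative assumptions, not IBM's exact formulation.

```python
def transition_matrix(problem_mix, solution_mix):
    """problem_mix[d][i], solution_mix[d][j]: topic proportions of paired document d.
    Returns T with T[i][j] ~ P(solution topic j | problem topic i)."""
    kp, ks = len(problem_mix[0]), len(solution_mix[0])
    T = [[0.0] * ks for _ in range(kp)]
    for p, s in zip(problem_mix, solution_mix):
        for i in range(kp):
            for j in range(ks):
                T[i][j] += p[i] * s[j]  # soft co-occurrence of topic i with topic j
    for row in T:
        z = sum(row)
        if z > 0:
            row[:] = [v / z for v in row]  # normalize each row into a distribution
    return T

def best_solution_topic(T, i):
    """Most probable solution topic for problem topic i."""
    return max(range(len(T[i])), key=lambda j: T[i][j])
```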
  • Publication number: 20110202487
    Abstract: A statistical model learning device is provided to efficiently select data effective in improving the quality of statistical models. A data classification means 601 refers to structural information 611 generally possessed by the data that is the learning object, and extracts a plurality of subsets 613 from the training data 612. A statistical model learning means 602 uses the plurality of subsets 613 to create statistical models 614, respectively. A data recognition means 603 uses the respective statistical models 614 to recognize other data 615 different from the training data 612 and acquires each recognition result 616. An information amount calculation means 604 calculates information amounts of the other data 615 from the degree of discrepancy among the recognition results of the statistical models. A data selection means 605 selects the data with a large information amount and adds it to the training data 612.
    Type: Application
    Filed: July 22, 2009
    Publication date: August 18, 2011
    Applicant: NEC CORPORATION
    Inventor: Takafumi Koshinaka
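The selection step in 20110202487 (scoring unlabeled data by how much the subset-trained models disagree, then keeping the most informative items) resembles query-by-committee. The sketch below uses vote entropy as the discrepancy measure, which is an assumption rather than NEC's stated formula; the "models" are any callables returning a label, and the function names are hypothetical.

```python
import math
from collections import Counter

def vote_entropy(labels):
    """Entropy of the committee's votes; 0 when all models agree."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def select_informative(candidates, models, n_select):
    """Rank candidates by committee disagreement and return the top n_select.
    Python's sort is stable, so ties keep their original order."""
    scored = [(vote_entropy([m(x) for m in models]), x) for x in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [x for _, x in scored[:n_select]]
```

The selected items would then be added to the training data, as the abstract's data selection means 605 does.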
  • Publication number: 20110202485
    Abstract: Various exemplary embodiments relate to a method and related network node and machine-readable storage medium including one or more of the following: receiving, at a Policy and Charging Rules Node (PCRN), an application request message; determining at least one requested service flow from the application request message; for each requested service flow of the at least one requested service flow, generating a new Policy and Charging Control (PCC) rule based on the application request message; and providing each new PCC rule to a Policy and Charging Enforcement Node (PCEN). Various exemplary embodiments further include an application request message including at least one media component and at least one media subcomponent, and the step of determining, for each media subcomponent, a requested service flow from the media subcomponent.
    Type: Application
    Filed: February 18, 2010
    Publication date: August 18, 2011
    Applicant: Alcatel-Lucent Canada Inc.
    Inventors: Kevin Scott Cutler, Fernando Cuervo, Mike Vihtari, Ajay Kirit Pandya
  • Publication number: 20110202486
    Abstract: Described herein is a framework for predicting development of a cardiovascular condition of interest in a patient. The framework involves determining, based on prior domain knowledge relating to the cardiovascular condition of interest, a risk score as a function of patient data. The patient data may include both genetic data and non-genetic data. In one implementation, the risk score is used to categorize the patient into at least one of multiple risk categories, the multiple risk categories being associated with different strategies to prevent the onset of the cardiovascular condition. The results generated by the framework may be presented to a physician to facilitate interpretation, risk assessment and/or clinical decision support.
    Type: Application
    Filed: March 14, 2011
    Publication date: August 18, 2011
    Inventors: Glenn Fung, Faisal Farooq, Bharat R. Rao, Stephan B. Felix, Till Ittermann, Heyo K. Kroemer, Rainer Rettig, Henry Volzke
  • Publication number: 20110202488
    Abstract: In a machine condition monitoring technique, related sensors are grouped together in clusters to improve the performance of state estimation models. To form the clusters, the entire set of sensors is first analyzed using a Gaussian process regression (GPR) to make a prediction of each sensor from the others in the set. A dependency analysis of the GPR then uses thresholds to determine which sensors are related. Related sensors are then placed together in clusters. State estimation models utilizing the clusters of sensors may then be trained.
    Type: Application
    Filed: September 25, 2009
    Publication date: August 18, 2011
    Applicant: Siemens Corporation
    Inventor: Chao Yuan
  • Publication number: 20110202876
    Abstract: An apparatus and method are disclosed for providing feedback and guidance to touch screen device users to improve text entry user experience and performance by generating input history data including character probabilities, word probabilities, and touch models. According to one embodiment, a method comprises receiving first input data, automatically learning user tendencies based on the first input data to generate input history data, receiving second input data, and generating auto-corrections or suggestion candidates for one or more words of the second input data based on the input history data. The user can then select one of the suggestion candidates to replace a selected word with the selected suggestion candidate.
    Type: Application
    Filed: March 22, 2010
    Publication date: August 18, 2011
    Applicant: Microsoft Corporation
    Inventors: Eric Norman Badger, Drew Elliot Linerud, Itai Almog, Timothy S. Paek, Parthasarathy Sundararajan, Dmytro Rudchenko, Asela J Gunawardana
  • Patent number: 8001060
    Abstract: A method and system for classifying small collections of high-value entities with missing data. The invention includes: collecting measurement variables for a set of entity cases for which classifications are known; calibrating standard weights for each measurement variable based on historical data; computing compensating weights for each entity case that has missing data; computing case scores for each of one or more dimensions as a sum-product of compensating weights and variables associated with each dimension; executing an iterative process that finds a specific combination of compensating weights that best classifies the entity cases in terms of distinct scores; and applying a resulting model, which is determined by the specific combination of compensating weights, to classify other entity cases for which the classifications are unknown.
    Type: Grant
    Filed: May 9, 2007
    Date of Patent: August 16, 2011
    Assignee: International Business Machines Corporation
    Inventor: John A. Ricketts
  • Patent number: 8001063
    Abstract: In one embodiment, the present invention is a method for reward-based learning of improved systems management policies. One embodiment of the inventive method involves obtaining a decision-making entity and a reward mechanism. The decision-making entity manages a plurality of application environments supported by a data processing system, where each application environment operates on data input to the data processing system. The reward mechanism generates numerical measures of value responsive to actions performed in states of the application environments. The decision-making entity and the reward mechanism are applied to the application environments, and results achieved through this application are processed in accordance with reward-based learning to derive a policy. The reward mechanism and the policy are then applied to the application environments, and the results of this application are processed in accordance with reward-based learning to derive a new policy.
    Type: Grant
    Filed: June 30, 2008
    Date of Patent: August 16, 2011
    Assignee: International Business Machines Corporation
    Inventors: Gerald James Tesauro, Rajarshi Das, Nicholas K. Jong, Jeffrey O. Kephart
  • Patent number: 8001061
    Abstract: A data processing apparatus includes first and second unsupervised learning process units and a supervised learning process unit. The first unsupervised learning process unit classifies data of a first data group according to unsupervised learning, to perform dimension reduction for the first data group and to obtain a first classified data group. The second unsupervised learning process unit classifies data of a second data group according to the unsupervised learning, to perform dimension reduction for the second data group and to obtain a second classified data group. The supervised learning process unit performs supervised learning using, as a teacher, the first classified data group obtained by the first unsupervised learning process unit and the second classified data group obtained by the second unsupervised learning process unit to determine a mapping relation between the first classified data group and the second classified data group.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: August 16, 2011
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Shinichiro Serizawa, Tomoyuki Ito
  • Patent number: 8001074
    Abstract: Systems and methods for extracting or analyzing time-series behavior are described. Some embodiments of computer-implemented methods include generating fuzzy rules from time series data. Certain embodiments also include resolving conflicts between fuzzy rules according to how the data is clustered. Some embodiments further include extracting a model of the time-series behavior via defuzzification and making that model accessible. Advantageously, to resolve conflicts between fuzzy rules, some embodiments define Gaussian functions for each conflicting data point, sum the Gaussian functions according to how the conflicting data points are clustered, and resolve the conflict based on the results of summing the Gaussian functions. Some embodiments use both crisp and non-trivially fuzzy regions and/or both crisp and non-trivially fuzzy membership functions.
    Type: Grant
    Filed: January 31, 2008
    Date of Patent: August 16, 2011
    Assignee: Quest Software, Inc.
    Inventor: Wai Yip To
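The conflict-resolution idea in 8001074 (a Gaussian per conflicting data point, summed according to how the points cluster) can be sketched under one possible reading of the abstract: the conflicting points are output values sharing the same antecedent region, each is assigned to the nearest consequent fuzzy-set center, and the consequent with the largest Gaussian sum wins. The nearest-center clustering, the shared `sigma`, and the function name are assumptions, not Quest Software's specification.

```python
import math

def resolve_conflict(points, centers, sigma=1.0):
    """points: output values of data points whose inputs fall in the same antecedent region.
    centers: centers of the candidate consequent fuzzy sets.
    Returns (index of the winning consequent, per-consequent Gaussian sums)."""
    scores = [0.0] * len(centers)
    for y in points:
        # cluster each conflicting point with its nearest consequent center
        k = min(range(len(centers)), key=lambda c: abs(y - centers[c]))
        # add a Gaussian centered on the data point, evaluated at that center
        scores[k] += math.exp(-((y - centers[k]) ** 2) / (2 * sigma ** 2))
    winner = max(range(len(centers)), key=lambda k: scores[k])
    return winner, scores
```

Summing Gaussians rewards consequents that attract many points lying close to their centers, rather than simply counting votes.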
  • Patent number: 8001062
    Abstract: Disclosed herein is a method, a system and a computer program product for generating a statistical classification model used by a computer system to determine a class associated with an unlabeled time series event. Initially, a set of labeled time series events is received. A set of time series features is identified for a selected set of the labeled time series events. A plurality of scale space decompositions is generated based on the set of time series features. A plurality of multi-scale features is generated based on the plurality of scale space decompositions. A first subset of the plurality of multi-scale features is then identified: the features that correspond, at least in part, to a subset of space or time points within a time series event containing feature data that distinguish the time series event as belonging to the class of time series events corresponding to the class label.
    Type: Grant
    Filed: December 7, 2007
    Date of Patent: August 16, 2011
    Assignee: Google Inc.
    Inventors: Ullas Gargi, Jay Yagnik
  • Publication number: 20110196739
    Abstract: The present invention provides a method and system for ranking and selecting advertisements based on relevancy, click feedback and click over expected click (COEC) data. Advertisements may be described as contextual, page-embedded advertisements appearing on publisher websites. The method and system includes storing page-advertisement relevancy features in a vector space model and historical impression and click features in a click feedback model and analyzing data in the vector space model and click feedback model. The method and system further includes storing empirical click-through data in a serving log and analyzing data therein. The method and system then generates a regression model based on the analyzed data, which is stored in a regression storage module. The method and system receives requests for advertisement content from client devices, selects a plurality of candidate advertisements based on the generated regression model and provides a plurality of advertisements to a client device.
    Type: Application
    Filed: February 5, 2010
    Publication date: August 11, 2011
    Inventors: Ruofei Zhang, Wei Li, Jianchang Mao
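COEC is commonly computed as actual clicks divided by the clicks an average ad would be expected to receive at the same positions, so a value above 1 means the ad outperforms what its placements alone would predict. The sketch below assumes that formulation and a hypothetical `position_ctr` table of average click-through rates per slot; the abstract does not give the exact normalization used in the patent.

```python
def coec(records, position_ctr):
    """Clicks over expected clicks for one ad.

    records: iterable of (position, impressions, clicks) tuples.
    position_ctr: average click-through rate observed at each position
    (a hypothetical prior table, e.g. estimated from the serving log).
    """
    records = list(records)
    clicks = sum(c for _, _, c in records)
    expected = sum(impressions * position_ctr[pos] for pos, impressions, _ in records)
    return clicks / expected if expected > 0 else 0.0
```

A COEC value like this could then serve as one input feature to the regression model alongside the relevancy features.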
  • Publication number: 20110196853
    Abstract: A computer-implemented method for automatically generating a script for a target web interface instance. Embodiments include receiving a task description of a task to be completed on a target web interface instance. The computer-implemented method also includes repeating steps until the task is completed. The repeating steps include determining from the target web interface instance a plurality of actions that may be performed on the target web interface instance and using the task description, predicting which action of the plurality of actions from the target web interface instance is an action most likely to be selected. The repeating steps also include performing the action most likely to be selected, thus proceeding to a first web interface instance and setting the first web interface instance as the target web interface instance.
    Type: Application
    Filed: February 8, 2010
    Publication date: August 11, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeffrey P. Bigham, Clemens Drews, Tessa A. Lau, Ian A. R. Li, Jeffrey W. Nichols
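The repeat-until-complete loop in this abstract can be sketched as follows. The site is modeled as a hypothetical `pages` dictionary mapping each page to its available actions and resulting pages, the "most likely action" predictor is a simple word-overlap scorer standing in for whatever model the patent uses, and reaching a page with no actions is assumed to mean the task is complete.

```python
def tokens(text):
    return set(text.lower().split())


def predict_action(task, actions):
    """Hypothetical scorer: pick the action label sharing most words with the task."""
    task_words = tokens(task)
    return max(actions, key=lambda label: len(task_words & tokens(label)))


def generate_script(task, pages, start, max_steps=10):
    """pages maps a page name to {action label: next page name}."""
    script, page = [], start
    for _ in range(max_steps):
        actions = pages.get(page, {})
        if not actions:  # assumed stopping rule: nothing left to do
            break
        action = predict_action(task, actions)
        script.append(action)
        page = actions[action]  # the new page becomes the target instance
    return script
```

The `max_steps` guard keeps the loop finite if the predictor cycles between pages.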
  • Publication number: 20110196870
    Abstract: Systems, methods and computer program products for classifying documents are presented. Systems, methods and computer program products for analyzing documents, e.g., associated with legal discovery are also presented. Systems, methods and computer program products for cleaning up data are also presented. Systems, methods and computer program products for verifying an association of an invoice with an entity are also presented. Systems, methods and computer program products for managing medical records are presented. Systems, methods and computer program products for face recognition are presented.
    Type: Application
    Filed: April 19, 2011
    Publication date: August 11, 2011
    Applicant: KOFAX, INC.
    Inventors: Mauritius A.R. Schmidtler, Roland Borrey, Anthony Sarah
  • Patent number: 7996409
    Abstract: A method to manage objects in an information lifecycle management system is provided. The method includes determining a score for each of the objects based on a score of at least one feature within respective ones of each of the objects, where the score of the at least one feature is associated with a valuation of the at least one feature. The method also includes managing each of the objects based on the score for each of the objects, wherein higher scored objects are managed preferentially.
    Type: Grant
    Filed: December 28, 2006
    Date of Patent: August 9, 2011
    Assignee: International Business Machines Corporation
    Inventors: Windsor Wee Sun Hsu, Shauchi Ong
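A minimal sketch of score-based management, assuming an object's score is the sum of its features' valuations and that "managed preferentially" means placement on a fast storage tier with a fixed capacity. Both the aggregation and the tiering policy are illustrative assumptions; the abstract does not specify them.

```python
def object_score(features, valuation):
    """Assumed aggregation: sum of the valuations of the object's features."""
    return sum(valuation.get(f, 0.0) for f in features)


def assign_tiers(objects, valuation, fast_capacity):
    """Keep the highest-scoring objects on the fast tier, archive the rest."""
    ranked = sorted(objects, key=lambda name: object_score(objects[name], valuation), reverse=True)
    return {name: ("fast" if rank < fast_capacity else "archive") for rank, name in enumerate(ranked)}
```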
  • Patent number: 7996343
    Abstract: Described is using semi-Riemannian geometry in supervised learning to learn a discriminant subspace for classification, e.g., labeled samples are used to learn the geometry of a semi-Riemannian submanifold. For a given sample, the K nearest classes of that sample are determined, along with the nearest samples that are in other classes, and the nearest samples in that sample's same class. The distances between these samples are computed, and used in computing a metric matrix. The metric matrix is used to compute a projection matrix that corresponds to the discriminant subspace. In online classification, as a new sample is received, it is projected into a feature space by use of the projection matrix and classified accordingly.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: August 9, 2011
    Assignee: Microsoft Corporation
    Inventors: Deli Zhao, Zhouchen Lin, Xiaoou Tang
  • Patent number: 7996341
    Abstract: Embodiments of the present disclosure assess how well a tag describes a color theme by estimating a descriptiveness value for the tag for the color theme. Some embodiments determine descriptiveness values for a tag based on weighted color attributes determined from the tag's existing use in a color theme collection. Descriptiveness values are used generally in color theme searching and to suggest tags for a color theme, among other things.
    Type: Grant
    Filed: December 20, 2007
    Date of Patent: August 9, 2011
    Assignee: Adobe Systems Incorporated
    Inventor: Hendrik Kueck
  • Patent number: 7996340
    Abstract: A workforce analysis method for solving an L1-based clustering problem of multinomial distributions of workforce data includes acquiring workforce allocation data, arranging the workforce allocation data in sets of fraction data with respect to the L1 distances, clustering the sets of fraction data to a corresponding set of cluster centers, or L1 distances for each set, minimizing the sets of fraction data based on the cluster centers or L1 distances, and outputting analysis results of the clustering problem.
    Type: Grant
    Filed: December 19, 2007
    Date of Patent: August 9, 2011
    Assignee: International Business Machines Corporation
    Inventor: Hisashi Kashima
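For intuition about L1-based clustering, the classic approach is k-medians: the coordinate-wise median minimizes the summed L1 distance to a cluster's members, so it plays the role the mean plays in k-means. This sketch is that textbook algorithm, not necessarily the patented procedure. It uses naive initialization from the first `k` points, and its median centers are not renormalized, so they need not sum to 1 even when the inputs are fraction vectors.

```python
import statistics


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medians(points, k, iterations=20):
    """Cluster fraction vectors under L1 distance; returns (centers, assignment)."""
    centers = [list(p) for p in points[:k]]  # naive initialization
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center in L1 distance.
        assignment = [min(range(k), key=lambda c: l1(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if not members:  # keep an empty cluster's center unchanged
                new_centers.append(centers[c])
                continue
            # The coordinate-wise median minimizes total L1 distance.
            new_centers.append([statistics.median(col) for col in zip(*members)])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assignment
```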
  • Patent number: 7996342
    Abstract: Systems, methods and computer program products for supervised dimensionality reduction. Exemplary embodiments include a method including receiving an input in the form of a data matrix X of size N×D, wherein N is a number of samples, D is a dimensionality, a vector Y of size N×1, hidden variables U of a number K, a data type of the matrix X and the vector Y, and a trade-off constant alpha; selecting loss functions in the form of Lx(X,UV) and Ly(Y,UW) appropriate for the type of data in the matrix X and the vector Y, where U, V and W are matrices, selecting corresponding sets of update rules RU, RV and RW for updating the matrices U,V and W, learning U, V and W that provide a minimum total loss L(U,V,W)=Lx(X,UV)+alpha*Ly(Y,UW), and returning matrices U, V and W.
    Type: Grant
    Filed: February 15, 2008
    Date of Patent: August 9, 2011
    Assignee: International Business Machines Corporation
    Inventors: Genady Grabarnik, Irina Rish
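The objective L(U,V,W) = Lx(X,UV) + alpha*Ly(Y,UW) can be illustrated with a pure-Python sketch. It assumes squared loss for both terms (a natural choice for real-valued data) and uses plain simultaneous gradient descent in place of the type-specific update rules RU, RV and RW described in the abstract; the learning rate, step count and initialization are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def residual(T, P):
    """Element-wise T - P."""
    return [[t - p for t, p in zip(tr, pr)] for tr, pr in zip(T, P)]


def step(M, G, lr):
    """Gradient step M - lr * G."""
    return [[m - lr * g for m, g in zip(mr, gr)] for mr, gr in zip(M, G)]


def squared_loss(T, P):
    return sum(v * v for row in residual(T, P) for v in row)


def total_loss(X, Y, U, V, W, alpha):
    """L(U, V, W) = Lx(X, UV) + alpha * Ly(Y, UW), both squared losses here."""
    return squared_loss(X, matmul(U, V)) + alpha * squared_loss(Y, matmul(U, W))


def fit(X, Y, K, alpha=1.0, lr=0.01, steps=500, seed=0):
    """Learn U (N x K), V (K x D), W (K x 1) by gradient descent."""
    rng = random.Random(seed)
    N, D = len(X), len(X[0])
    U = [[rng.uniform(-0.5, 0.5) for _ in range(K)] for _ in range(N)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(K)]
    W = [[rng.uniform(-0.5, 0.5)] for _ in range(K)]
    for _ in range(steps):
        Rx = residual(X, matmul(U, V))  # N x D reconstruction error
        Ry = residual(Y, matmul(U, W))  # N x 1 label error
        # Gradients of L (the constant factor 2 is folded into lr).
        gU = [[-(a + alpha * b) for a, b in zip(ra, rb)]
              for ra, rb in zip(matmul(Rx, transpose(V)), matmul(Ry, transpose(W)))]
        gV = [[-g for g in row] for row in matmul(transpose(U), Rx)]
        gW = [[-alpha * g for g in row] for row in matmul(transpose(U), Ry)]
        U, V, W = step(U, gU, lr), step(V, gV, lr), step(W, gW, lr)
    return U, V, W
```

The rows of U are the K-dimensional reduced representations; alpha trades off reconstructing X against predicting Y.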
  • Patent number: 7996339
    Abstract: A method for generating object classification models is disclosed. Initially, a set of training data is fed into a training algorithm to generate a first object classification model. A set of field data is then applied to the first object classification model to produce a set of field object classifications. The number of data in the set of field data is significantly less than the number of data in the set of training data. Finally, the set of field object classifications and the set of field data are fed into the training algorithm to generate a second object classification model. The second object classification model can be utilized for predicting object classifications.
    Type: Grant
    Filed: September 17, 2004
    Date of Patent: August 9, 2011
    Assignee: International Business Machines Corporation
    Inventors: Ameha Aklilu, Raed Hijer, Wilson Velez
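The two-stage procedure can be sketched with any training algorithm; a nearest-centroid classifier is used here purely as a simple stand-in. Following the abstract literally, the second model is trained on the field data and the first model's field classifications only; whether the original training data is also reused is not stated, so it is left out. One consequence: a class never predicted on the field data disappears from the second model.

```python
def train_centroids(data, labels):
    """Stand-in training algorithm: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(data, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


def two_stage(train_x, train_y, field_x):
    first = train_centroids(train_x, train_y)
    field_y = [predict(first, x) for x in field_x]  # field object classifications
    second = train_centroids(field_x, field_y)
    return first, second
```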
  • Publication number: 20110191170
    Abstract: The present invention provides methods and systems for use in bid optimization in connection with advertisement serving impression opportunities available in an auction-based online advertising exchange. Methods are presented in which, based in part on historical advertisement performance information, a Kalman filter-based model is used in forecasting performance of a set of possible advertisement impressions served over a future period of time. Forecasted performance information is used in determining an optimized bid in connection with an available opportunity. A similarity function, including non-linearly determined feature weighting, can be used in determining the forecasted impressions most similar to the available opportunity.
    Type: Application
    Filed: February 2, 2010
    Publication date: August 4, 2011
    Applicant: Yahoo! Inc.
    Inventors: Ruofei Zhang, Ying Cui
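As a rough illustration of Kalman filter-based forecasting, the sketch below runs a scalar local-level filter over a history of a performance metric such as click-through rate and projects it forward. The local-level model and the noise variances `q` and `r` are assumptions for illustration; the abstract does not describe the patent's state model.

```python
def kalman_forecast(observations, q=1e-3, r=1e-2, horizon=3):
    """Local-level Kalman filter: state x_t = x_{t-1} + noise(q), obs z_t = x_t + noise(r).

    Returns (mean, variance) forecasts for the next `horizon` periods.
    """
    x, p = observations[0], 1.0  # initial estimate and its variance
    for z in observations[1:]:
        p = p + q                # predict: uncertainty grows by process noise
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # update with the new observation
        p = (1 - k) * p
    # Under a random walk the mean stays flat while variance grows per step.
    return [(x, p + h * q) for h in range(1, horizon + 1)]
```

The growing forecast variance is one signal a bidder could use to bid more cautiously on impressions further in the future.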
  • Publication number: 20110191275
    Abstract: A computer-implemented method for determining the volume of activation of neural tissue. In one embodiment, the method uses one or more parametric equations that define a volume of activation, wherein the parameters for the one or more parametric equations are given as a function of an input vector that includes stimulation parameters. After receiving input data that includes values for the stimulation parameters and defining the input vector using the input data, the input vector is applied to the function to obtain the parameters for the one or more parametric equations. The parametric equation is solved to obtain a calculated volume of activation.
    Type: Application
    Filed: August 26, 2010
    Publication date: August 4, 2011
    Applicant: THE CLEVELAND CLINIC FOUNDATION
    Inventors: J. Luis Lujan, Ashu Chaturvedi, Cameron C. McIntyre
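The parametric approach can be sketched as follows, assuming the volume of activation is a prolate spheroid whose semi-axes are affine functions of two stimulation parameters. The shape, the choice of amplitude and pulse width as inputs, and every coefficient in `COEFFS` are hypothetical; the abstract specifies none of them.

```python
import math

# Hypothetical coefficients (c0, c1, c2): each semi-axis in mm is
# c0 + c1 * amplitude_V + c2 * pulse_width_us. Not taken from the patent.
COEFFS = {
    "a": (0.5, 0.6, 0.004),  # semi-axis along the lead
    "b": (0.4, 0.5, 0.003),  # radial semi-axis
}


def semi_axes(amplitude, pulse_width):
    """Map the input vector of stimulation parameters to equation parameters."""
    return {name: c0 + c1 * amplitude + c2 * pulse_width for name, (c0, c1, c2) in COEFFS.items()}


def volume_of_activation(amplitude, pulse_width):
    """Solve the parametric shape: a prolate spheroid has volume 4/3 * pi * a * b^2."""
    axes = semi_axes(amplitude, pulse_width)
    return 4.0 / 3.0 * math.pi * axes["a"] * axes["b"] ** 2
```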
  • Publication number: 20110191277
    Abstract: A data mining system includes a planning and learning module which receives as input a knowledge model and a set of goals and automatically produces as output a plurality of plans. The system includes a data mining processing unit which receives the plans as instructions and automatically creates results which are provided back to the planning and learning module as feedback. A method for data mining includes the steps of: receiving, at a planning and learning module, a knowledge model and a set of goals as input; automatically producing, as output of the planning and learning module, a plurality of plans from the input; receiving, by a data mining processing unit, the plans as instructions; automatically creating results by the data mining processing unit; and providing the results back to the planning and learning module as feedback.
    Type: Application
    Filed: August 29, 2008
    Publication date: August 4, 2011
    Inventors: José Luis Agúndez Dominguez, Jesus Renero Quintero
  • Publication number: 20110191400
    Abstract: Similarities between simplex projection with upper bounds and L1 projection are explored. Criteria for a priori determination of the sequence in which various constraints become active are derived, and this sequence is used to develop efficient algorithms for projecting a vector onto the L1-ball while observing box constraints. Three projection methods are presented. The first projection method performs exact projection in O(n^2) worst-case complexity, where n is the space dimension. Using a novel criterion for ordering constraints, the second projection method has a worst-case complexity of O(n log n). The third projection method is a worst-case linear-time algorithm having O(n) complexity. The upper bounds defined for the projected entries guide the L1-ball projection to more meaningful predictions.
    Type: Application
    Filed: August 10, 2010
    Publication date: August 4, 2011
    Inventors: Mithun Das Gupta, Jing Xiao, Sanjeev Kumar
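For reference, the problem being solved can be stated and solved with a simple baseline: the Euclidean projection of v onto {x : sum|x_i| <= z, |x_i| <= u_i} has the form x_i = sign(v_i) * clip(|v_i| - theta, 0, u_i) for a threshold theta >= 0. The sketch below finds theta by bisection, which costs O(n) per iteration; it is not any of the three ordered-constraint algorithms from the abstract, and it assumes non-negative upper bounds `upper`.

```python
import math


def project_l1_box(v, upper, z, iterations=100):
    """Project v onto the L1-ball of radius z with per-entry bounds |x_i| <= upper[i]."""
    mags = [abs(x) for x in v]

    def shrink(theta):
        # Soft-threshold each magnitude by theta, then clip to [0, u_i].
        return [min(max(m - theta, 0.0), u) for m, u in zip(mags, upper)]

    if sum(shrink(0.0)) <= z:
        t = shrink(0.0)  # L1 constraint inactive: only the box binds
    else:
        lo, hi = 0.0, max(mags)  # sum(shrink(theta)) is non-increasing in theta
        for _ in range(iterations):
            mid = (lo + hi) / 2
            if sum(shrink(mid)) > z:
                lo = mid
            else:
                hi = mid
        t = shrink(hi)
    return [math.copysign(m, x) for m, x in zip(t, v)]  # restore signs
```

For v = [3, -1, 0.5], unit upper bounds and z = 2, the threshold settles at 0.25, giving approximately [1, -0.75, 0.25].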
  • Publication number: 20110191374
    Abstract: Methods and systems to associate semantically-related items of a plurality of item types using a joint embedding space are disclosed. The disclosed methods and systems are scalable to large, web-scale training data sets. According to an embodiment, a method for associating semantically-related items of a plurality of item types includes embedding training items of a plurality of item types in a joint embedding space configured in a memory coupled to at least one processor, learning one or more mappings into the joint embedding space for each of the item types to create a trained joint embedding space and one or more learned mappings, and associating one or more embedded training items with a first item based upon a distance in the trained joint embedding space from the first item to each of the associated embedded training items. Exemplary item types that may be embedded in the joint embedding space include images, annotations, audio and video.
    Type: Application
    Filed: February 1, 2011
    Publication date: August 4, 2011
    Applicant: Google Inc.
    Inventors: Samy Bengio, Jason Weston
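The association step can be sketched once the mappings are learned: map a new item into the joint space with its type's mapping, then return the embedded items nearest to it. Here the mapping is a hypothetical linear map and the annotation embeddings are toy values; how the mappings are learned is not shown.

```python
import math


def embed(mapping, features):
    """Apply a linear map (list of rows) to a feature vector."""
    return [sum(w * f for w, f in zip(row, features)) for row in mapping]


def associate(query, embedded_items, k=2):
    """Return the k item names closest to `query` in the joint space."""
    return sorted(embedded_items, key=lambda name: math.dist(query, embedded_items[name]))[:k]
```

Because all item types share one space, the same `associate` call works whether the query is an image, an audio clip or an annotation.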
  • Publication number: 20110191276
    Abstract: To implement open information extraction, a new extraction paradigm has been developed in which a system makes a single data-driven pass over a corpus of text, extracting a large set of relational tuples without requiring any human input. Using training data, a Self-Supervised Learner employs a parser and heuristics to determine criteria that will be used by an extraction classifier (or other ranking model) for evaluating the trustworthiness of candidate tuples that have been extracted from the corpus of text, by applying heuristics to the corpus of text. The classifier retains tuples with a sufficiently high probability of being trustworthy. A redundancy-based assessor assigns a probability to each retained tuple to indicate a likelihood that the retained tuple is an actual instance of a relationship between a plurality of objects comprising the retained tuple. The retained tuples comprise an extraction graph that can be queried for information.
    Type: Application
    Filed: December 16, 2010
    Publication date: August 4, 2011
    Applicant: University of Washington through its Center for Commercialization
    Inventors: Michael J. Cafarella, Michele Banko, Oren Etzioni
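The intuition behind a redundancy-based assessor is that a tuple extracted from many distinct sentences is more likely to be true. The sketch below swaps in a simple noisy-or model, P = 1 - (1 - p)^k for a tuple seen k times, assuming independent extractions each correct with a hypothetical probability `p_single`. This is a deliberately simplified stand-in, not the assessor described in the patent.

```python
from collections import Counter


def normalise(tup):
    """Case- and whitespace-normalise an (arg1, relation, arg2) tuple."""
    return tuple(part.strip().lower() for part in tup)


def assess(extractions, p_single=0.5):
    """Noisy-or estimate: P(true) = 1 - (1 - p_single) ** (number of sightings)."""
    counts = Counter(normalise(t) for t in extractions)
    return {t: 1.0 - (1.0 - p_single) ** k for t, k in counts.items()}
```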
  • Publication number: 20110191273
    Abstract: A method for providing an evaluation/verification of the correctness of an ontology is described. The method includes loading a first ontology associated with a first rule set. An extended ontology and an extended rule set are generated based at least in part on the first ontology and the first rule set. The extended rule set is applied to the extended ontology. The method also includes determining (e.g., by a data processor) a correctness of the extended ontology. Results are generated which include the correctness. Apparatus and computer readable media are also described.
    Type: Application
    Filed: February 2, 2010
    Publication date: August 4, 2011
    Applicant: International Business Machines Corporation
    Inventors: Genady Grabarnik, Zhen Liu, Anand Ranganathan, Anton V. Riabov, Irina Rish, Larisa Shwartz
  • Publication number: 20110191271
    Abstract: A method described herein includes receiving a digital image, wherein the digital image includes a first element that corresponds to a first domain and a second element that corresponds to a second domain. The method also includes automatically assigning a label to the first element in the digital image based at least in part upon a computed probability that the label corresponds to the first element, wherein the probability is computed through utilization of a first model that is configured to infer labels for elements in the first domain and a second model that is configured to infer labels for elements in the second domain. The first model receives data that identifies learned relationships between elements in the first domain and elements in the second domain, and the probability is computed by the first model based at least in part upon the learned relationships.
    Type: Application
    Filed: February 4, 2010
    Publication date: August 4, 2011
    Applicant: Microsoft Corporation
    Inventors: Simon John Baker, Ashish Kapoor, Gang Hua, Dahua Lin
  • Publication number: 20110191274
    Abstract: Described is a technology by which a deep-structured (multiple layered) conditional random field model is trained and used for classification of sequential data. Sequential data is processed at each layer, from the lowest layer to a final (highest) layer, to output data in the form of conditional probabilities of classes given the sequential input data. Each higher layer inputs the conditional probability data and the sequential data jointly to output further probability data, and so forth, until the final layer which outputs the classification data. Also described is layer-by-layer training, supervised or unsupervised. Unsupervised training may process raw features to minimize average frame-level conditional entropy while maximizing state occupation entropy, or to minimize reconstruction error.
    Type: Application
    Filed: January 29, 2010
    Publication date: August 4, 2011
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Shizhen Wang