Patents Assigned to Health Discovery Corporation

Identification of Co-Regulation Patterns By Unsupervised Cluster Analysis of Gene Expression Data

Publication number: 20110125683

Abstract: A method is provided for unsupervised clustering of gene expression data to identify co-regulation patterns. A clustering algorithm randomly divides the data into k different subsets and measures the similarity between pairs of datapoints within the subsets, assigning a score to the pairs based on similarity, with the greatest similarity giving the highest correlation score. A distribution of the scores is plotted for each k. The highest value of k that has a distribution that remains concentrated near the highest correlation score corresponds to the number of co-regulation patterns.
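
A minimal Python sketch of the scoring and selection steps. It assumes that the similarity between two clusterings is the correlation between their co-membership matrices (which pairs of points share a cluster). It also reads "concentrated near the highest correlation score" as: at least 90% of the scores for a given k are 0.9 or higher. The function names and both thresholds are illustrative assumptions, not taken from the application.

```python
import math
from itertools import combinations


def comembership_correlation(labels_a, labels_b):
    """Correlation between two clusterings of the same points.

    A pair (i, j) is co-clustered when both points share a label. The score is
    <C_a, C_b> / sqrt(<C_a, C_a> * <C_b, C_b>) over pairs i < j, so identical
    partitions score 1.0 and unrelated ones score near 0.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    ab = aa = bb = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        ab += same_a and same_b
        aa += same_a
        bb += same_b
    if aa == 0 or bb == 0:
        return 0.0
    return ab / math.sqrt(aa * bb)


def choose_k(scores_by_k, threshold=0.9, min_fraction=0.9):
    """Largest k whose score distribution stays concentrated near 1.0.

    scores_by_k maps each k to the list of similarity scores collected for it.
    """
    stable = [
        k for k, scores in scores_by_k.items()
        if sum(s >= threshold for s in scores) / len(scores) >= min_fraction
    ]
    return max(stable) if stable else None
```

With k = 1 every pair is co-clustered and the score is trivially 1.0, so in practice the candidates start at k = 2.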

Type: Application

Filed: February 2, 2011

Publication date: May 26, 2011

Applicant: HEALTH DISCOVERY CORPORATION

Inventors: Asa Ben Hur, André Elisseeff, Isabelle Guyon
SUPPORT VECTOR MACHINE - RECURSIVE FEATURE ELIMINATION (SVM-RFE)

Publication number: 20110119213

Abstract: Identification of a determinative subset of features from within a group of features is performed by training a support vector machine using training samples with class labels to determine a value of each feature, where features are removed based on their value. One or more features having the smallest values are removed and an updated kernel matrix is generated using the remaining features. The process is repeated until a predetermined number of features remain which are capable of accurately separating the data into different classes.
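
The elimination loop can be sketched in pure Python for the linear case. Here the value of a feature is taken to be the squared weight w_j^2 of a linear SVM, and retraining on the surviving features plays the role of the updated (linear) kernel matrix. The SVM is a basic Pegasos-style subgradient solver without a bias term, chosen only to keep the example dependency-free. The names, learning settings and toy data are all assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM without a bias term, trained by Pegasos-style subgradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularisation step shrinks every weight.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step only when the example violates the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep, step=1):
    """Return the indices of the n_keep features that survive elimination."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        # Retrain on the surviving features only.
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        # Ranking criterion: squared weight of each surviving feature.
        ranked = sorted(range(len(remaining)), key=lambda pos: w[pos] ** 2)
        n_drop = min(step, len(remaining) - n_keep)
        dropped = set(ranked[:n_drop])
        remaining = [f for pos, f in enumerate(remaining) if pos not in dropped]
    return remaining


# Toy data: feature 0 carries the class label, features 1-4 are uniform noise.
rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [[2.0 * label + rng.uniform(-0.5, 0.5)] + [rng.uniform(-1.0, 1.0) for _ in range(4)]
     for label in y]
print(svm_rfe(X, y, n_keep=1))
```

Removing one feature per round (step=1) is the slowest but most careful setting; a larger step trades ranking accuracy for fewer retraining passes.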

Type: Application

Filed: December 1, 2010

Publication date: May 19, 2011

Applicant: HEALTH DISCOVERY CORPORATION

Inventors: André Elisseeff, Bernhard Schölkopf, Fernando Perez-Cruz
RECURSIVE FEATURE ELIMINATION METHOD USING SUPPORT VECTOR MACHINES

Publication number: 20110106735

Abstract: Identification of a determinative subset of features from within a group of features is performed by training a support vector machine using training samples with class labels to determine a value of each feature, where features are removed based on their value. One or more features having the smallest values are removed and an updated kernel matrix is generated using the remaining features. The process is repeated until a predetermined number of features remain which are capable of accurately separating the data into different classes. In some embodiments, features are eliminated by a ranking criterion based on a Lagrange multiplier corresponding to each training sample.
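
For a nonlinear kernel, one well-known form of a multiplier-based criterion (assumed here, not quoted from the application) works as follows. It keeps the Lagrange multipliers alpha of a trained SVM fixed and measures how much alpha^T H alpha, with H_ik = y_i y_k K(x_i, x_k), drops when feature j is left out of the kernel computation. The sketch uses hypothetical multipliers instead of training an SVM. With a linear kernel the criterion reduces to w_j^2 / 2, where w = sum_i alpha_i y_i x_i.

```python
import math


def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2 * sigma ** 2))


def quad_form(alpha, y, X, kernel, features):
    """alpha^T H alpha with H_ik = y_i y_k K(x_i, x_k), using only `features`."""
    sub = [[row[j] for j in features] for row in X]
    n = len(sub)
    return sum(
        alpha[i] * alpha[k] * y[i] * y[k] * kernel(sub[i], sub[k])
        for i in range(n) for k in range(n)
    )


def rfe_criteria(alpha, y, X, kernel, features):
    """DJ(j) = 1/2 * (alpha^T H alpha - alpha^T H(-j) alpha) for each feature j.

    The multipliers alpha stay fixed; only the kernel matrix is recomputed
    with feature j removed. The smallest DJ marks the next feature to drop.
    """
    full = quad_form(alpha, y, X, kernel, features)
    return {
        j: 0.5 * (full - quad_form(alpha, y, X, kernel, [f for f in features if f != j]))
        for j in features
    }


# Hypothetical multipliers, as if taken from an already-trained SVM.
alpha = [0.5, 0.3, 0.8]
y = [1, -1, 1]
X = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]
crit = rfe_criteria(alpha, y, X, rbf_kernel, [0, 1, 2])
print(min(crit, key=crit.get))  # feature to eliminate next under the RBF kernel
```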

Type: Application

Filed: November 11, 2010

Publication date: May 5, 2011

Applicant: HEALTH DISCOVERY CORPORATION

Inventors: Jason Weston, André Elisseeff, Bernhard Schölkopf, Fernando Perez-Cruz, Isabelle Guyon
Data mining platform for knowledge discovery from heterogeneous data types and/or heterogeneous data sources

Patent number: 7921068

Abstract: The data mining platform comprises a plurality of system modules, each formed from a plurality of components. Each module has an input data component, a data analysis engine for processing the input data, an output data component for outputting the results of the data analysis, and a web server to access and monitor the other modules within the unit and to provide communication to other units. Each module processes a different type of data, for example, a first module processes microarray (gene expression) data while a second module processes biomedical literature on the Internet for information supporting relationships between genes and diseases and gene functionality. In the preferred embodiment, the data analysis engine is a kernel-based learning machine, and in particular, one or more support vector machines (SVMs).
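
A rough sketch of the module structure described above: each module has an input data component, an analysis engine and an output data component, and a platform object routes each data type to its module. The web server and inter-unit communication are omitted. The engines are trivial placeholders standing in for trained SVMs, and every class and function name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Module:
    """One module: input data component -> analysis engine -> output component."""
    data_type: str
    read_input: Callable[[Any], Any]
    engine: Callable[[Any], Any]  # stand-in for a kernel-based learning machine
    write_output: Callable[[Any], str]

    def run(self, raw: Any) -> str:
        return self.write_output(self.engine(self.read_input(raw)))


@dataclass
class Platform:
    """Routes each input to the module registered for its data type."""
    modules: Dict[str, Module] = field(default_factory=dict)

    def register(self, module: Module) -> None:
        self.modules[module.data_type] = module

    def process(self, data_type: str, raw: Any) -> str:
        if data_type not in self.modules:
            raise KeyError(f"no module registered for {data_type!r}")
        return self.modules[data_type].run(raw)


platform = Platform()
# Placeholder engines: a mean and a word count stand in for trained SVMs.
platform.register(Module(
    "microarray",
    read_input=lambda text: [float(v) for v in text.split(",")],
    engine=lambda values: sum(values) / len(values),
    write_output=lambda m: f"mean expression: {m:.2f}",
))
platform.register(Module(
    "literature",
    read_input=lambda text: text.lower().split(),
    engine=lambda words: words.count("brca1"),
    write_output=lambda n: f"BRCA1 mentions: {n}",
))
print(platform.process("microarray", "1.0,2.0,3.0"))
```

Keeping the three components as separate callables means a module's engine can be swapped (for example, for an SVM) without touching its input or output handling.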

Type: Grant

Filed: October 30, 2007

Date of Patent: April 5, 2011

Assignee: Health Discovery Corporation

Inventors: Isabelle Guyon, Edward P. Reiss, René Doursat, Jason Aaron Edward Weston
METHOD FOR FEATURE SELECTION AND FOR EVALUATING FEATURES IDENTIFIED AS SIGNIFICANT FOR CLASSIFYING DATA

Publication number: 20110078099

Abstract: A group of features that has been identified as “significant” in being able to separate data into classes is evaluated using a support vector machine which separates the dataset into classes one feature at a time. After separation, an extremal margin value is assigned to each feature based on the distance between the lowest feature value in the first class and the highest feature value in the second class. Separately, extremal margin values are calculated for a large number of example sets for the two classes drawn randomly from a normal distribution, to determine how many of these example sets would have a specified extremal margin value. Using p-values calculated for the normal distribution, a desired p-value is selected. The specified extremal margin value corresponding to the selected p-value is compared to the calculated extremal margin values for the group of features.
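
The evaluation can be approximated by Monte Carlo simulation. This sketch assumes the extremal margin is min(class 1) minus max(class 2). It z-scores each feature over both classes before comparing it with example sets drawn from a standard normal distribution, and it estimates the p-value as the fraction of random draws whose margin is at least the observed one. The standardisation, the number of draws and the function names are assumptions.

```python
import random
import statistics


def extremal_margin(class1, class2):
    """Lowest value in class 1 minus highest value in class 2 (> 0 means separated)."""
    return min(class1) - max(class2)


def null_margins(n1, n2, n_draws=10000, seed=0):
    """Extremal margins of random example sets drawn from one standard normal."""
    rng = random.Random(seed)
    margins = []
    for _ in range(n_draws):
        a = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        margins.append(extremal_margin(a, b))
    return margins


def standardized_margin(class1, class2):
    """Extremal margin after z-scoring the feature over both classes."""
    pooled = class1 + class2
    mu, sd = statistics.mean(pooled), statistics.pstdev(pooled)
    return extremal_margin([(v - mu) / sd for v in class1],
                           [(v - mu) / sd for v in class2])


def margin_p_value(class1, class2, n_draws=10000, seed=0):
    """Fraction of null draws whose margin is at least the observed one."""
    observed = standardized_margin(class1, class2)
    null = null_margins(len(class1), len(class2), n_draws, seed)
    # The +1 terms keep the Monte Carlo estimate strictly positive.
    return (1 + sum(m >= observed for m in null)) / (1 + n_draws)


def critical_margin(n1, n2, p=0.01, n_draws=10000, seed=0):
    """Margin exceeded by roughly a fraction p of the null draws."""
    null = sorted(null_margins(n1, n2, n_draws, seed), reverse=True)
    return null[int(p * n_draws)]


separated = ([5.0, 5.2, 5.1, 4.9, 5.3], [1.0, 1.1, 0.9, 1.2, 0.8])
print(margin_p_value(*separated))
print(standardized_margin(*separated) >= critical_margin(5, 5, p=0.01))
```

Comparing each feature's margin with critical_margin corresponds to the last step of the abstract: a feature counts as significant at the chosen p-value when its margin is at least the critical one.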

Type: Application

Filed: September 26, 2010

Publication date: March 31, 2011

Applicant: HEALTH DISCOVERY CORPORATION

Inventors: Jason Aaron Edward Weston, André Elisseeff, Bernhard Schöelkopf, Fernando Perez-Cruz, Isabelle Guyon
Model selection for cluster data analysis

Patent number: 7890445

Abstract: A model selection method is provided for choosing the number of clusters, or more generally the parameters of a clustering algorithm. The algorithm is based on comparing the similarity between pairs of clustering runs on sub-samples or other perturbations of the data. High pairwise similarities show that the clustering represents a stable pattern in the data. The method is applicable to any clustering algorithm, and can also detect lack of structure. We show results on artificial and real data using a hierarchical clustering algorithm.
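
A compact version of the sub-sampling procedure. It uses k-means, swapped in for the hierarchical algorithm mentioned in the abstract (the abstract states the method applies to any clustering algorithm). It also assumes 80% sub-samples and uses the co-membership correlation as the similarity between two runs, computed on the points both sub-samples share. All names and settings are illustrative.

```python
import math
import random
from itertools import combinations


def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iters=20):
    """Plain k-means with farthest-first initialisation (first centre random)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def correlation(labels_a, labels_b):
    """Correlation of co-membership matrices over all pairs of shared points."""
    ab = aa = bb = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        ab += same_a and same_b
        aa += same_a
        bb += same_b
    return ab / math.sqrt(aa * bb) if aa and bb else 0.0


def stability_scores(points, k, n_pairs=20, frac=0.8, seed=0):
    """Similarity between pairs of k-means runs on random sub-samples."""
    rng = random.Random(seed)
    m = int(frac * len(points))
    scores = []
    for _ in range(n_pairs):
        runs = []
        for _ in range(2):
            idx = rng.sample(range(len(points)), m)
            runs.append(dict(zip(idx, kmeans([points[i] for i in idx], k, rng))))
        shared = sorted(runs[0].keys() & runs[1].keys())
        scores.append(correlation([runs[0][i] for i in shared],
                                  [runs[1][i] for i in shared]))
    return scores


# Three well-separated 2-D blobs of 20 points each.
rng = random.Random(42)
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(20)]
for k in range(2, 6):
    scores = stability_scores(points, k)
    print(k, round(sum(scores) / len(scores), 3))
```

On these blobs every k = 3 comparison should agree perfectly, because farthest-first initialisation places one centre in each blob. Larger k typically splits blobs differently from run to run, which lowers the scores.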

Type: Grant

Filed: October 30, 2007

Date of Patent: February 15, 2011

Assignee: Health Discovery Corporation

Inventors: Asa Ben Hur, André Elisseeff, Isabelle Guyon
Kernels for Identifying Patterns in Datasets Containing Noise or Transformation Invariances

Publication number: 20100318482

Abstract: Learning machines, such as support vector machines, are used to analyze datasets to recognize patterns within the dataset using kernels that are selected according to the nature of the data to be analyzed. Where the datasets include an invariance transformation or noise, tangent vectors are defined to identify relationships between the invariance or noise and the training data points. A covariance matrix is formed using the tangent vectors, then used in generation of the kernel, which may be based on a kernel PCA map.
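
For the linear case, the role of the tangent vectors can be sketched as follows. The sketch assumes the form used in the invariant-SVM literature: C is the average outer product of the tangent vectors, C_gamma = (1 - gamma) C + gamma I, and the kernel is k(x, y) = x^T C_gamma^{-1} y. Under that kernel, displacements along a tangent direction count for less than displacements orthogonal to it. The kernel PCA map used for nonlinear kernels is not covered, and the names and the toy tangent are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def invariant_linear_kernel(tangents, gamma):
    """Return k(x, y) = x^T C_gamma^{-1} y with C_gamma = (1 - gamma) C + gamma I.

    C is the average outer product of the tangent vectors; 0 < gamma <= 1
    keeps C_gamma positive definite, and gamma = 1 gives the plain dot product.
    """
    dim = len(tangents[0])
    C = [[sum(t[i] * t[j] for t in tangents) / len(tangents) for j in range(dim)]
         for i in range(dim)]
    C_gamma = [[(1.0 - gamma) * C[i][j] + gamma * (i == j) for j in range(dim)]
               for i in range(dim)]

    def k(x, y):
        z = solve(C_gamma, list(y))  # z = C_gamma^{-1} y
        return sum(a * b for a, b in zip(x, z))

    return k


def kernel_dist2(k, x, y):
    """Squared distance in the feature space induced by k."""
    return k(x, x) + k(y, y) - 2.0 * k(x, y)


# Suppose shifts along the first axis are the invariance (tangent t = (1, 0)).
k = invariant_linear_kernel([(1.0, 0.0)], gamma=0.1)
x = (1.0, 1.0)
print(kernel_dist2(k, x, (1.1, 1.0)), kernel_dist2(k, x, (1.0, 1.1)))
```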

Type: Application

Filed: August 25, 2010

Publication date: December 16, 2010

Applicant: HEALTH DISCOVERY CORPORATION

Inventors: Peter L. Bartlett, André Elisseeff, Bernhard Schoelkopf, Olivier Chapelle
SYSTEM FOR PROVIDING DATA ANALYSIS SERVICES USING A SUPPORT VECTOR MACHINE FOR PROCESSING DATA RECEIVED FROM A REMOTE SOURCE

Publication number: 20100256988

Abstract: A network-based system is provided for performing data analysis services using a support vector machine for analyzing data received from a remote user connected to the network. The user transmits a data set to be analyzed along with an account identifier that allows the analysis service provider to collect payment for the processing services. Once payment has been confirmed, the service provider's server transmits the analysis results to the remote user.

Type: Application

Filed: June 11, 2010

Publication date: October 7, 2010

Applicant: HEALTH DISCOVERY CORPORATION

Inventors: Stephen Barnhill, Isabelle Guyon, Jason Weston
Method for feature selection in a support vector machine using feature ranking

Patent number: 7805388

Abstract: In a pre-processing step prior to training a learning machine, pre-processing includes reducing the quantity of features to be processed using feature selection methods selected from the group consisting of recursive feature elimination (RFE), minimizing the number of non-zero parameters of the system (l0-norm minimization), evaluation of a cost function to identify a subset of features that are compatible with constraints imposed by the learning set, unbalanced correlation score, transductive feature selection and single-feature margin-based ranking. The features remaining after feature selection are then used to train a learning machine for purposes of pattern classification, regression, clustering and/or novelty detection.
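
Of the listed methods, the unbalanced correlation score is the simplest to show. The sketch assumes the form f_j = (sum of X_ij over positive examples) - lam * (sum of X_ij over negative examples), for binary features with lam >= 1. That formula, the default lam = 3 and the names are our assumptions, not taken from the patent text.

```python
def unbalanced_correlation_scores(X, y, lam=3.0):
    """Score f_j = sum_{y_i = +1} X_ij - lam * sum_{y_i = -1} X_ij for binary features.

    lam >= 1 penalises features that are active in negative examples more
    heavily than it rewards features that are active in positive ones.
    """
    scores = []
    for j in range(len(X[0])):
        pos = sum(row[j] for row, label in zip(X, y) if label == 1)
        neg = sum(row[j] for row, label in zip(X, y) if label == -1)
        scores.append(pos - lam * neg)
    return scores


def select_top(scores, n):
    """Indices of the n highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n]


X = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 1, 0]]
y = [1, 1, -1, -1]
scores = unbalanced_correlation_scores(X, y)
print(scores, select_top(scores, 2))
```

The imbalance is the point of the score: with few positive examples, a feature that also fires on negatives is treated as much weaker evidence than its positive count alone suggests.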

Type: Grant

Filed: October 30, 2007

Date of Patent: September 28, 2010

Assignee: Health Discovery Corporation

Inventors: Jason Weston, André Elisseeff, Bernhard Schölkopf, Fernando Perez-Cruz, Isabelle Guyon
System for providing data analysis services using a support vector machine for processing data received from a remote source

Patent number: 7797257

Abstract: A computer system for performing data analysis services using a support vector machine for analyzing data received from a remote source on a distributed network includes a server in communication with the distributed network for receiving a data set and a financial account identifier associated with the remote source. The server communicates over the distributed network with a financial institution to receive funds from a financial account identified by the financial account identifier. A processor receives one or more data sets from the remote source and pre-processes the data to enhance meaning within the data set. The pre-processed data is used to train and test a support vector machine for recognizing patterns within the data. Live data is processed using the trained and tested support vector machine to generate an output which is transmitted to the remote source after the server confirms that payment for the data processing service has been received.
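
The payment-gated flow can be sketched as a small server-side class. Everything here is hypothetical: the charge hook stands in for the financial institution, and a nearest-centroid classifier replaces the SVM to keep the example short. The pre-processing and testing stages are left out.

```python
from typing import Callable, Dict, List, Optional, Tuple


def train_nearest_centroid(training: List[Tuple[List[float], int]]):
    """Stand-in for SVM training: classify by the nearest class centroid."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in training:
        acc = sums.get(label, [0.0] * len(x))
        sums[label] = [a + b for a, b in zip(acc, x)]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict(x: List[float]) -> int:
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


class AnalysisService:
    """Hypothetical server flow: train on submitted data, charge, then release."""

    def __init__(self, charge: Callable[[str, float], bool], price: float):
        self.charge = charge  # hypothetical hook to the financial institution
        self.price = price

    def handle(self, account_id: str, training, live) -> Optional[List[int]]:
        model = train_nearest_centroid(training)
        predictions = [model(x) for x in live]
        # Results leave the server only once payment has been confirmed.
        if not self.charge(account_id, self.price):
            return None
        return predictions


ledger = {"acct-1": 50.0, "acct-2": 0.0}


def charge(account_id: str, amount: float) -> bool:
    """Toy payment processor: debit the account if funds are sufficient."""
    if ledger.get(account_id, 0.0) < amount:
        return False
    ledger[account_id] -= amount
    return True


service = AnalysisService(charge, price=10.0)
training = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([5.0, 5.0], 1), ([6.0, 5.0], 1)]
live = [[0.2, 0.3], [5.5, 4.8]]
print(service.handle("acct-1", training, live))
print(service.handle("acct-2", training, live))
```

Passing the payment check in as a callable keeps the analysis logic independent of any particular payment provider.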

Type: Grant

Filed: October 29, 2007

Date of Patent: September 14, 2010

Assignee: Health Discovery Corporation

Inventors: Stephen Barnhill, Isabelle Guyon, Jason Weston
Kernels and methods for selecting kernels for use in learning machines

Patent number: 7788193

Abstract: Learning machines, such as support vector machines, are used to analyze datasets to recognize patterns within the dataset using kernels that are selected according to the nature of the data to be analyzed. Where the dataset possesses structural characteristics, locational kernels can be utilized to provide measures of similarity among data points within the dataset. The locational kernels are then combined to generate a decision function, or kernel, that can be used to analyze the dataset. Where an invariance transformation or noise is present, tangent vectors are defined to identify relationships between the invariance or noise and the data points. A covariance matrix is formed using the tangent vectors, then used in generation of the kernel.
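
One way to picture locational kernels, assuming equal-length sequences: compare the two inputs window by window with a local Gaussian kernel, then sum the local similarities into one kernel. The window width, sigma and the summation rule are illustrative choices, not the patent's definitions.

```python
import math


def local_kernel(x, y, pos, width, sigma):
    """Gaussian similarity between the windows starting at `pos` in x and y."""
    d2 = sum((a - b) ** 2 for a, b in zip(x[pos:pos + width], y[pos:pos + width]))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def combined_kernel(x, y, width=3, sigma=1.0):
    """Sum of the locational kernels over every window position.

    Each local term is a valid (positive semi-definite) kernel, and a sum of
    valid kernels is again a valid kernel.
    """
    if len(x) != len(y):
        raise ValueError("inputs must have equal length")
    return sum(local_kernel(x, y, p, width, sigma) for p in range(len(x) - width + 1))


a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [0.0, 1.0, 2.0, 9.0, 9.0]
print(combined_kernel(a, a), combined_kernel(a, b))
```

Because each window contributes separately, two inputs that agree in one region but differ elsewhere still receive credit for the matching region, which a single global Gaussian kernel would largely wash out.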

Type: Grant

Filed: October 30, 2007

Date of Patent: August 31, 2010

Assignee: Health Discovery Corporation

Inventors: Peter L. Bartlett, André Elisseeff, Bernhard Schoelkopf, Olivier Chapelle
SUPPORT VECTOR MACHINE-BASED METHOD FOR ANALYSIS OF SPECTRAL DATA

Publication number: 20100205124

Abstract: Support vector machines are used to classify data contained within a structured dataset such as a plurality of signals generated by a spectral analyzer. The signals are pre-processed to ensure alignment of peaks across the spectra. Similarity measures are constructed to provide a basis for comparison of pairs of samples of the signal. A support vector machine is trained to discriminate between different classes of the samples and to identify the most predictive features within the spectra. In a preferred embodiment, feature selection is performed to reduce the number of features that must be considered.
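
Peak alignment is often done by shifting each spectrum to the lag that maximises its cross-correlation with a reference spectrum. The sketch assumes that technique (the abstract does not name one), integer shifts and zero padding. The function names are hypothetical.

```python
def best_shift(reference, spectrum, max_shift):
    """Integer lag s that maximises sum_i reference[i] * spectrum[i + s]."""
    n = len(reference)

    def score(s):
        return sum(reference[i] * spectrum[i + s]
                   for i in range(n) if 0 <= i + s < len(spectrum))

    return max(range(-max_shift, max_shift + 1), key=score)


def align(reference, spectrum, max_shift=5):
    """Shift `spectrum` by the best lag, padding vacated positions with zeros."""
    s = best_shift(reference, spectrum, max_shift)
    n = len(spectrum)
    return [spectrum[i + s] if 0 <= i + s < n else 0.0 for i in range(n)]


reference = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
spectrum = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0]  # same peak, drifted by two bins
print(align(reference, spectrum))
```

A single global shift only corrects uniform drift. Spectra whose peaks drift by different amounts would need a piecewise or warping alignment instead.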

Type: Application

Filed: February 4, 2010

Publication date: August 12, 2010

Applicant: HEALTH DISCOVERY CORPORATION

Inventors: Asa Ben-Hur, Andre Elisseeff, Olivier Chapelle, Jason Aaron Edward Weston
Selection of features predictive of biological conditions using protein mass spectrographic data

Patent number: 7676442

Abstract: Support vector machines are used to classify data contained within a structured dataset such as a plurality of signals generated by a spectral analyzer. The signals are pre-processed to ensure alignment of peaks across the spectra. Similarity measures are constructed to provide a basis for comparison of pairs of samples of the signal. A support vector machine is trained to discriminate between different classes of the samples and to identify the most predictive features within the spectra. In a preferred embodiment, feature selection is performed to reduce the number of features that must be considered.

Type: Grant

Filed: October 30, 2007

Date of Patent: March 9, 2010

Assignee: Health Discovery Corporation

Inventors: Asa Ben-Hur, André Elisseeff, Olivier Chapelle, Jason Aaron Edward Weston
BIOMARKERS DOWNREGULATED IN PROSTATE CANCER

Publication number: 20090305257

Abstract: Biomarkers are identified by analyzing gene expression data using support vector machines (SVM), recursive feature elimination (RFE) and/or linear ridge regression classifiers to rank genes according to their ability to separate prostate cancer from normal tissue. Proteins expressed by identified genes are detected in patient samples to screen, predict and monitor prostate cancer.
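
The linear ridge regression ranking can be sketched as solving (X^T X + lam I) w = X^T y, with labels +1 (cancer) and -1 (normal), and ordering genes by |w_j|. It assumes centred expression values (so no intercept), a hypothetical lam = 1 and made-up toy data. The SVM and RFE variants would rank genes by classifier weights instead.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_weights(X, y, lam=1.0):
    """w = (X^T X + lam I)^{-1} X^T y, with no intercept term."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def rank_genes(X, y, lam=1.0):
    """Gene indices ordered by decreasing |w_j|."""
    w = ridge_weights(X, y, lam)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


# Rows are tissue samples (+1 cancer, -1 normal); columns are three genes.
y = [1, 1, 1, -1, -1, -1]
X = [[2.0, 0.5, 0.1], [1.8, -0.2, -0.1], [2.2, 0.4, 0.0],
     [-2.0, -0.3, 0.1], [-1.9, 0.1, -0.1], [-2.1, -0.4, 0.0]]
print(rank_genes(X, y))
```

The lam * I term keeps the system solvable even when there are more genes than samples, which is the usual situation with expression data.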

Type: Application

Filed: September 30, 2008

Publication date: December 10, 2009

Applicant: HEALTH DISCOVERY CORPORATION

Inventor: Isabelle Guyon
Methods for feature selection in a learning machine

Patent number: 7624074

Abstract: In a pre-processing step prior to training a learning machine, pre-processing includes reducing the quantity of features to be processed using feature selection methods selected from the group consisting of recursive feature elimination (RFE), minimizing the number of non-zero parameters of the system (l0-norm minimization), evaluation of a cost function to identify a subset of features that are compatible with constraints imposed by the learning set, unbalanced correlation score and transductive feature selection. The features remaining after feature selection are then used to train a learning machine for purposes of pattern classification, regression, clustering and/or novelty detection.

Type: Grant

Filed: October 30, 2007

Date of Patent: November 24, 2009

Assignee: Health Discovery Corporation

Inventors: Jason Aaron Edward Weston, André Elisseeff, Bernard Schoelkopf, Fernando Pérez-Cruz
BIOMARKERS OVEREXPRESSED IN PROSTATE CANCER

Publication number: 20090286240

Abstract: Biomarkers are identified by analyzing gene expression data using support vector machines (SVM) to rank genes according to their ability to separate prostate cancer from normal tissue. Proteins expressed by identified genes are detected in patient samples to screen, predict and monitor prostate cancer.

Type: Application

Filed: September 30, 2008

Publication date: November 19, 2009

Applicant: HEALTH DISCOVERY CORPORATION

Inventor: Isabelle Guyon
Kernels and kernel methods for spectral data

Patent number: 7617163

Abstract: Support vector machines are used to classify data contained within a structured dataset such as a plurality of signals generated by a spectral analyzer. The signals are pre-processed to ensure alignment of peaks across the spectra. Similarity measures are constructed to provide a basis for comparison of pairs of samples of the signal. A support vector machine is trained to discriminate between different classes of the samples and to identify the most predictive features within the spectra. In a preferred embodiment, feature selection is performed to reduce the number of features that must be considered.

Type: Grant

Filed: October 9, 2002

Date of Patent: November 10, 2009

Assignee: Health Discovery Corporation

Inventors: Asa Ben-Hur, André Elisseeff, Olivier Chapelle, Jason Aaron Edward Weston
Methods for Screening, Predicting and Monitoring Prostate Cancer

Publication number: 20090226915

Abstract: Biomarkers are identified by analyzing gene expression data using support vector machines (SVM) to rank genes according to their ability to separate prostate cancer from normal tissue. Expression products of identified genes are detected in patient samples, including prostate tissue, serum, semen and urine, to screen, predict and monitor prostate cancer.

Type: Application

Filed: January 6, 2009

Publication date: September 10, 2009

Applicant: HEALTH DISCOVERY CORPORATION

Inventor: Isabelle Guyon
METHODS FOR SCREENING, PREDICTING AND MONITORING PROSTATE CANCER

Publication number: 20090215058

Abstract: Biomarkers are identified by analyzing gene expression data using support vector machines (SVM) to rank genes according to their ability to separate prostate cancer from normal tissue. Expression products of identified genes are detected in patient samples, including prostate tissue, serum, semen and urine, to screen, predict and monitor prostate cancer.

Type: Application

Filed: December 4, 2008

Publication date: August 27, 2009

Applicant: HEALTH DISCOVERY CORPORATION

Inventor: Isabelle Guyon
BIOMARKERS UPREGULATED IN PROSTATE CANCER

Publication number: 20090215024

Abstract: Biomarkers are identified by analyzing gene expression data using support vector machines (SVM) to rank genes according to their ability to separate prostate cancer from normal tissue. Proteins expressed by identified genes are detected in patient samples to screen, predict and monitor prostate cancer.

Type: Application

Filed: February 4, 2008

Publication date: August 27, 2009

Applicant: HEALTH DISCOVERY CORPORATION

Inventor: Isabelle Guyon
