Abstract: A method for processing semi-structured data. The method includes receiving semi-structured data in a first format from a real business process. Preferably, the semi-structured data are machine generated. The method includes tokenizing the semi-structured data into a second format, storing the semi-structured data in the second format in one or more memories, and clustering the tokenized data to form a plurality of clusters. The method also includes identifying a selected low-frequency term in each of the clusters, and processing at least two of the clusters and the associated selected low-frequency terms to form a single template for the at least two of the clusters. In a preferred embodiment, the method replaces the selected low-frequency term with a wildcard character.
Type:
Grant
Filed:
July 20, 2004
Date of Patent:
June 17, 2008
Assignee:
ENKATA Technologies, Inc.
Inventors:
Hinrich H. Schuetze, Chia-Hao Yu, Omer Emre Velipasaoglu, Stan Stukov
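The steps named in the abstract above (tokenize, cluster, pick a low-frequency term per cluster, merge clusters into one template) can be illustrated with a minimal sketch. This is not the patented implementation: the whitespace tokenizer, the "identical token sequence" clustering rule, the corpus-wide frequency count, the `*` wildcard symbol, and the names `tokenize` and `extract_templates` are all assumptions chosen for illustration.

```python
from collections import Counter

WILDCARD = "*"  # assumed wildcard symbol; the abstract only says "wild card character"


def tokenize(line):
    """Split a raw line into tokens (assumed: plain whitespace tokenization)."""
    return line.split()


def extract_templates(lines):
    """Return a dict mapping each template to the number of lines it covers.

    Assumed simplifications:
      * a cluster is the set of lines with an identical token sequence;
      * the "selected low-frequency term" of a cluster is the token with the
        lowest corpus-wide count (first one wins on ties);
      * clusters whose wildcarded forms match are merged into one template.
    """
    token_lists = [tokenize(line) for line in lines if line.strip()]

    # Corpus-wide term frequencies, used to find the rarest term per cluster.
    term_freq = Counter(tok for toks in token_lists for tok in toks)

    # Crude clustering: identical token sequences share a cluster.
    clusters = Counter(tuple(toks) for toks in token_lists)

    templates = Counter()
    for toks, size in clusters.items():
        # Position of the rarest term in this cluster's token sequence.
        rare = min(range(len(toks)), key=lambda i: term_freq[toks[i]])
        template = " ".join(
            WILDCARD if i == rare else tok for i, tok in enumerate(toks)
        )
        # Clusters that yield the same template are merged here.
        templates[template] += size
    return dict(templates)


example_lines = [
    "login failed for user alice",
    "login failed for user alice",
    "login failed for user bob",
    "disk full on volume vol1",
    "disk full on volume vol2",
]
example_templates = extract_templates(example_lines)
```

In this sketch the "alice" and "bob" clusters collapse into one template because replacing their rarest term with the wildcard yields the same token sequence. A real system would likely use a similarity-based clustering step rather than exact matching.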
Abstract: A method for estimating the performance of a statistical classifier. The method includes inputting a first set of business data in a first format from a real business process and storing the first set of business data in the first format into memory. The method includes applying a statistical classifier to the first set of business data, recording its classification decisions, and obtaining a labeling that contains the correct decision for each data item. The method includes computing a weight for each data item that reflects its true frequency and computing a performance measure of the statistical classifier based on the weights that reflect true frequency. The method also displays the performance measure to a user.
Type:
Grant
Filed:
July 14, 2004
Date of Patent:
June 3, 2008
Assignee:
ENKATA Technologies, Inc.
Inventors:
Omer Emre Velipasaoglu, Hinrich Schuetze, Chia-Hao Yu, Stan Stukov
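The weighting idea in the abstract above can be sketched as a weighted accuracy, where each labeled item counts in proportion to how often it occurs in the full data. This is only an illustration under stated assumptions: the abstract does not say which performance measure is used or how the weights are derived, so accuracy as the measure, externally supplied weights, and the name `weighted_accuracy` are all hypothetical.

```python
def weighted_accuracy(predictions, labels, weights):
    """Accuracy in which each item counts in proportion to its weight.

    Assumption: weights[i] reflects how frequently item i occurs in the full
    data (for example, the number of records a sampled item stands in for).
    """
    if not (len(predictions) == len(labels) == len(weights)):
        raise ValueError("predictions, labels and weights must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Sum the weights of the items the classifier got right.
    correct = sum(w for p, y, w in zip(predictions, labels, weights) if p == y)
    return correct / total


# Hypothetical example: the last item stands in for 7 records in the full data.
example_predictions = ["spam", "ham", "spam", "ham"]
example_labels = ["spam", "ham", "ham", "ham"]
example_weights = [1, 1, 1, 7]

unweighted = weighted_accuracy(example_predictions, example_labels, [1, 1, 1, 1])
weighted = weighted_accuracy(example_predictions, example_labels, example_weights)
```

With equal weights the classifier is right on 3 of 4 items. Once the frequently occurring item is weighted by its assumed true frequency, the single error carries less of the total, so the estimate rises.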
Abstract: A method for estimating the performance of a statistical classifier. The method includes inputting a first set of business data in a first format from a real business process and storing the first set of business data in the first format into memory. The method includes applying a statistical classifier to the first set of business data, recording its classification decisions, and obtaining a labeling that contains the correct decision for each data item. The method includes computing a weight for each data item that reflects its true frequency and computing a performance measure of the statistical classifier based on the weights that reflect true frequency. The method also displays the performance measure to a user.
Type:
Application
Filed:
July 14, 2004
Publication date:
January 27, 2005
Applicant:
ENKATA Technologies, Inc.
Inventors:
Omer Velipasaoglu, Hinrich Schuetze, Chia-Hao Yu, Stan Stukov