Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. In general, a significant manual effort is required to build […]
Fractal-Based Intrinsic Dimension Estimation and Its Application in Dimensionality Reduction
Dimensionality reduction is an important step in knowledge discovery in databases. Intrinsic dimension indicates the number of variables necessary to describe a data set. Two methods, box-counting dimension and correlation dimension, are commonly used for intrinsic dimension estimation. However, the robustness of these two methods has not been rigorously studied. This paper demonstrates that correlation […]
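As a rough illustration of one of the two estimators named above, the following minimal sketch estimates the correlation dimension as the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r. It is a textbook-style sketch, not the method evaluated in the paper; the function names, the sample data, and the chosen radii are assumptions made for the example.

```python
import math
import random


def correlation_sum(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < r:
                close += 1
    return close / (n * (n - 1) / 2)


def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) versus log r over the given radii."""
    xs, ys = [], []
    for r in radii:
        c = correlation_sum(points, r)
        if c > 0:  # log(0) is undefined, so skip radii with no close pairs
            xs.append(math.log(r))
            ys.append(math.log(c))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


random.seed(0)
# Hypothetical data: points on a straight line embedded in 3-D space.
# Three attributes are stored, but only one variable is needed to describe them.
line = [(t, 2 * t, -t) for t in (random.random() for _ in range(400))]
estimate = correlation_dimension(line, [0.02, 0.04, 0.08, 0.16])
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

For this data the estimate should come out close to 1, even though each point has three coordinates. The result depends on the range of radii used, which is one reason the robustness of such estimators is worth studying.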
Extending Attribute Information for Small Data Set Classification
Data quantity is the main issue in the small data set problem, because insufficient data usually does not lead to robust classification performance. How to extract more effective information from a small data set is thus of considerable interest. This paper proposes a new attribute construction approach which converts the original data attributes into […]
Effective and Efficient Shape-Based Pattern Detection over Streaming Time Series
Existing distance measures of time series such as the Euclidean distance, DTW, and EDR are inadequate in handling certain degrees of amplitude shifting and scaling variances of data items. We propose a novel distance measure of time series, Spatial Assembling Distance (SpADe), that is able to handle noise, shifting, and scaling in both temporal and […]
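To make the amplitude problem concrete, the sketch below compares a pattern with a shifted and scaled copy of itself using the plain Euclidean distance. It does not implement SpADe, whose details are not given in this excerpt; the z-normalization step is a common global workaround shown only for contrast, and all names and values are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def z_normalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / std for x in series]


# Hypothetical pattern: two periods of a sine wave.
pattern = [math.sin(2 * math.pi * i / 20) for i in range(40)]
# Same shape, but with amplitude scaled by 3 and shifted up by 5.
shifted_scaled = [3.0 * x + 5.0 for x in pattern]

raw = euclidean(pattern, shifted_scaled)
normalized = euclidean(z_normalize(pattern), z_normalize(shifted_scaled))
print(f"raw distance: {raw:.2f}, after z-normalization: {normalized:.2e}")
```

The raw distance is large although the two series have the same shape, while the normalized distance is essentially zero. Global normalization of this kind only removes a shift and scale applied to the whole series, so it does not address variations that occur locally within a series.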
DDD: A New Ensemble Approach for Dealing with Concept Drift
Online learning algorithms often have to operate in the presence of concept drifts. A recent study revealed that different diversity levels in an ensemble of learning machines are required in order to maintain high generalization on both old and new concepts. Inspired by this study and based on a further study of diversity with different […]