Dynamic XML labeling schemes have important applications in XML Database Management Systems. In this paper, we explore dynamic XML labeling schemes from a novel order-centric perspective. We compare the various labeling schemes proposed in the literature with a special focus on their orders of labels. We show that the order of labels fundamentally impacts the […]
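A small illustration of why the order defined over labels matters, even for a familiar scheme. It uses Dewey-style labels, where each label lists the sibling positions on the path from the root. This is an assumed example: the excerpt does not say which schemes the paper compares, and Dewey labels are not themselves dynamic. The same labels sort differently under plain string comparison and under component-wise comparison, and only the component-wise order reproduces document order. The helper names `parse_dewey` and `document_order` are hypothetical.

```python
from typing import List


def parse_dewey(label: str) -> List[int]:
    """Split a Dewey-style label such as '1.2.10' into integer components."""
    return [int(part) for part in label.split(".")]


def document_order(labels: List[str]) -> List[str]:
    """Sort labels component-wise; a parent (prefix) sorts before its
    children and siblings sort by number, which is pre-order document order."""
    return sorted(labels, key=parse_dewey)


labels = ["1", "1.10", "1.2", "1.2.1", "1.9"]

# Plain string comparison puts '1.10' before '1.2' ...
print(sorted(labels))          # ['1', '1.10', '1.2', '1.2.1', '1.9']
# ... whereas component-wise comparison recovers document order.
print(document_order(labels))  # ['1', '1.2', '1.2.1', '1.9', '1.10']
```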
Archives for May 2012
Incremental Information Extraction Using Relational Databases
Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules targeting the extraction of a particular kind of information. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be reapplied from scratch to the entire text corpus even […]
Identifying Evolving Groups in Dynamic Multimode Networks
A multimode network consists of heterogeneous types of actors with various interactions occurring between them. Identifying communities in a multimode network can help reveal the structural properties of the network, address problems of data shortage and imbalance, and assist tasks such as targeted marketing and finding influential actors within or between groups. In general, a network […]
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, table joins, and column aggregations. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. In general, a significant manual effort is required to build […]
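To make the contrast concrete, here is a minimal sketch in Python's built-in `sqlite3`. It compares a standard `GROUP BY` (vertical) aggregation with a horizontal layout that has one row per entity and one column per value of a grouping attribute. The horizontal version is written in plain SQL with one `SUM(CASE ...)` per distinct value. That is one common way to express it, not necessarily the method the paper proposes, which this excerpt does not describe. The `sales` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("s1", "Q1", 10), ("s1", "Q2", 5), ("s1", "Q2", 7),
     ("s2", "Q1", 3), ("s2", "Q3", 8)],
)

# Vertical aggregation: one output row per (store, quarter) group.
vertical = conn.execute(
    "SELECT store, quarter, SUM(amount) FROM sales "
    "GROUP BY store, quarter ORDER BY store, quarter"
).fetchall()

# Horizontal aggregation: one row per store, one column per quarter value.
# The quarter values are interpolated into the SQL text, which is only safe
# for trusted values like these.
quarters = [q for (q,) in conn.execute(
    "SELECT DISTINCT quarter FROM sales ORDER BY quarter")]
columns = ", ".join(
    f"SUM(CASE WHEN quarter = '{q}' THEN amount ELSE NULL END) AS {q}"
    for q in quarters
)
horizontal = conn.execute(
    f"SELECT store, {columns} FROM sales GROUP BY store ORDER BY store"
).fetchall()

print(vertical)    # [('s1', 'Q1', 10.0), ('s1', 'Q2', 12.0), ('s2', 'Q1', 3.0), ('s2', 'Q3', 8.0)]
print(horizontal)  # [('s1', 10.0, 12.0, None), ('s2', 3.0, None, 8.0)]
```

A store with no rows for a quarter gets `NULL` (`None`) in that column, because `SUM` over only `NULL` inputs is `NULL` in SQLite. A data mining tool may need that replaced with 0.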
Holistic Top-k Simple Shortest Path Join in Graphs
Motivated by needs such as group relationship analysis, this paper introduces a new operation on graphs, named top-k path join, which discovers the top-k simple shortest paths between two given node sets. Rather than discovering the top-k simple paths between each node pair, this paper proposes a holistic join method which answers the top-k […]
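To pin down what the operation returns, here is a naive baseline sketch that assumes a directed graph with non-negative edge weights. Partial simple paths are kept in a single priority queue seeded with every source node, so there is no per-pair search. Because weights are non-negative, completed source-to-target paths come out shortest first. This is a deliberately simple, worst-case exponential enumeration for illustration. It is not the paper's holistic join method, and the function name `top_k_path_join` and the sample graph are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# Adjacency list: node -> list of (neighbour, non-negative edge weight).
Graph = Dict[str, List[Tuple[str, float]]]


def top_k_path_join(graph: Graph, sources: Iterable[str],
                    targets: Iterable[str], k: int) -> List[Tuple[float, List[str]]]:
    """Return up to k simple paths from any source to any target, shortest first.

    Best-first enumeration: extending a path never makes it shorter, so
    complete paths are popped from the heap in non-decreasing length order.
    """
    target_set = set(targets)
    heap = [(0.0, (s,)) for s in set(sources)]
    heapq.heapify(heap)
    results: List[Tuple[float, List[str]]] = []
    while heap and len(results) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node in target_set and len(path) > 1:  # ignore zero-length paths
            results.append((cost, list(path)))
        for nbr, w in graph.get(node, []):
            if nbr not in path:  # keep paths simple: no repeated nodes
                heapq.heappush(heap, (cost + w, path + (nbr,)))
    return results


graph: Graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 1), ("D", 5)],
    "C": [("D", 1)],
    "X": [("D", 10)],
}
for cost, path in top_k_path_join(graph, {"A", "X"}, {"D"}, k=3):
    print(cost, "->".join(path))
# 3.0 A->B->C->D
# 5.0 A->C->D
# 6.0 A->B->D
```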
Fuzzy Orders-of-Magnitude-Based Link Analysis for Qualitative Alias Detection
Alias detection has been studied extensively in several application domains, especially intelligence data analysis. Many preliminary methods rely on text-based measures, which are ineffective against false descriptions of terrorists’ names, dates of birth, and addresses. This barrier may be overcome through link information present in relationships among objects of interest. Several numerical link-based […]
Fractal-Based Intrinsic Dimension Estimation and Its Application in Dimensionality Reduction
Dimensionality reduction is an important step in knowledge discovery in databases. Intrinsic dimension indicates the number of variables necessary to describe a data set. Two methods, box-counting dimension and correlation dimension, are commonly used for intrinsic dimension estimation. However, the robustness of these two methods has not been rigorously studied. This paper demonstrates that correlation […]
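As a reference point for one of the two methods named above, here is a minimal sketch of the standard correlation-dimension estimate. Let C(r) be the fraction of point pairs closer than r. The dimension is read off as the slope of log C(r) against log r. The sketch fits that slope by least squares over a chosen range of radii. The radius range, sample sizes, and test shapes (a line segment and a flat square embedded in 3-D) are assumptions for illustration. This sketch is not the paper's procedure and does not test its robustness claims.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def correlation_dimension(points: Sequence[Point], radii: Sequence[float]) -> float:
    """Least-squares slope of log C(r) versus log r, where C(r) is the
    fraction of point pairs whose Euclidean distance is below r."""
    dists = sorted(
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    xs: List[float] = []
    ys: List[float] = []
    for r in radii:
        c = bisect.bisect_left(dists, r) / len(dists)  # fraction of pairs with distance < r
        if c > 0:  # log(0) is undefined, so skip radii with no close pairs
            xs.append(math.log(r))
            ys.append(math.log(c))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


rng = random.Random(0)
# Hypothetical data: 400 points on a unit-length line segment and 400 on a
# unit square, both embedded in 3-D (intrinsic dimensions 1 and 2).
line = [(t / 3, 2 * t / 3, 2 * t / 3) for t in (rng.random() for _ in range(400))]
square = [(rng.random(), rng.random(), 0.0) for _ in range(400)]
# Ten log-spaced radii from 0.03 to 0.15 (an assumed scaling range).
radii = [0.03 * 5 ** (i / 9) for i in range(10)]

print(round(correlation_dimension(line, radii), 2))    # roughly 1
print(round(correlation_dimension(square, radii), 2))  # roughly 2
```

On finite samples, the estimate depends on the chosen radius range. Boundary effects pull it slightly below the true value at larger radii, and too few close pairs make it noisy at smaller ones.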
Fast Elastic Peak Detection for Mass Spectrometry
We study a data mining problem concerning elastic peak detection in 2D liquid chromatography-mass spectrometry (LC-MS) data. These data can be modeled as time series, in which the X-axis represents time points and the Y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity […]
Efficiently Indexing Large Sparse Graphs for Similarity Search
Graphs are an important means of modeling schemaless data with complicated structures, such as protein-protein interaction networks, chemical compounds, knowledge query inferring systems, and road networks. This paper focuses on the index structure for similarity search on a set of large sparse graphs and proposes an efficient indexing mechanism by introducing the […]
Extending Attribute Information for Small Data Set Classification
Data quantity is the main issue in the small data set problem, because insufficient data usually does not yield robust classification performance. How to extract more effective information from a small data set is thus of considerable interest. This paper proposes a new attribute construction approach which converts the original data attributes into […]