Archives for May 2012

Labeling Dynamic XML Documents: An Order-Centric Approach

May 31, 2012 by IeeeAdmin

Dynamic XML labeling schemes have important applications in XML Database Management Systems. In this paper, we explore dynamic XML labeling schemes from a novel order-centric perspective. We compare the various labeling schemes proposed in the literature with a special focus on their orders of labels. We show that the order of labels fundamentally impacts the […]

Incremental Information Extraction Using Relational Databases

May 28, 2012 by IeeeAdmin

Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules targeting the extraction of a particular kind of information. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be reapplied from scratch to the entire text corpus even […]

Identifying Evolving Groups in Dynamic Multimode Networks

May 26, 2012 by IeeeAdmin

A multimode network consists of heterogeneous types of actors with various interactions occurring between them. Identifying communities in a multimode network can help understand the structural properties of the network, address data shortage and imbalance problems, and assist tasks like targeted marketing and finding influential actors within or between groups. In general, a network […]

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

May 25, 2012 by IeeeAdmin

Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. In general, a significant manual effort is required to build […]
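
To make the contrast concrete, here is a minimal sketch that compares an ordinary GROUP BY aggregation with a hand-written "horizontal" layout, one row per entity and one column per subgroup. It uses Python's built-in sqlite3 module and the common SUM(CASE ...) idiom. The table name, column names, and quarter values are hypothetical, and this is not necessarily the method the paper proposes.

```python
import sqlite3


def vertical_and_horizontal(rows):
    """Aggregate (store, quarter, amount) rows in two layouts.

    The `sales` table and its columns are illustrative assumptions.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

    # Ordinary aggregation: one row per (store, quarter) group.
    vertical = conn.execute(
        "SELECT store, quarter, SUM(amount) FROM sales "
        "GROUP BY store, quarter ORDER BY store, quarter"
    ).fetchall()

    # Hand-written horizontal layout: one row per store,
    # one aggregated column per quarter.
    horizontal = conn.execute(
        """
        SELECT store,
               SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
               SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
        FROM sales
        GROUP BY store
        ORDER BY store
        """
    ).fetchall()

    conn.close()
    return vertical, horizontal
```

Each extra subgroup needs another hand-written CASE column, which hints at the kind of manual effort the excerpt describes.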

Holistic Top-k Simple Shortest Path Join in Graphs

May 24, 2012 by IeeeAdmin

Motivated by needs such as group relationship analysis, this paper introduces a new operation on graphs, named top-k path join, which discovers the top-k simple shortest paths between two given node sets. Rather than discovering the top-k simple paths between each node pair, this paper proposes a holistic join method which answers the top-k […]

Fuzzy Orders-of-Magnitude-Based Link Analysis for Qualitative Alias Detection

May 23, 2012 by IeeeAdmin

Alias detection has been extensively studied for several application domains, especially intelligence data analysis. Many preliminary methods rely on text-based measures, which are ineffective against false descriptions of terrorists’ names, dates of birth, and addresses. This barrier may be overcome through link information presented in relationships among objects of interest. Several numerical link-based […]

Fractal-Based Intrinsic Dimension Estimation and Its Application in Dimensionality Reduction

May 8, 2012 by IeeeAdmin

Dimensionality reduction is an important step in knowledge discovery in databases. Intrinsic dimension indicates the number of variables necessary to describe a data set. Two methods, box-counting dimension and correlation dimension, are commonly used for intrinsic dimension estimation. However, the robustness of these two methods has not been rigorously studied. This paper demonstrates that correlation […]
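
As a rough illustration of what a correlation dimension estimate involves, the sketch below computes the correlation integral C(r), the fraction of point pairs closer than r, and takes the least-squares slope of log C(r) against log r. This is a textbook-style estimate written with the standard library only, not the procedure studied in the paper. The function names and the choice of radii are assumptions.

```python
import math
from itertools import combinations


def correlation_integral(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    pairs = list(combinations(points, 2))
    close = sum(1 for p, q in pairs if math.dist(p, q) < r)
    return close / len(pairs)


def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) versus log r over the given radii."""
    xs, ys = [], []
    for r in radii:
        c = correlation_integral(points, r)
        if c > 0:  # log is undefined when no pair falls within r
            xs.append(math.log(r))
            ys.append(math.log(c))
    if len(xs) < 2:
        raise ValueError("need at least two radii with a nonzero pair count")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

For points spaced evenly along a straight line embedded in three dimensions, the estimate should come out close to 1, though the value depends on the radii chosen and on the sample size.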

Fast Elastic Peak Detection for Mass Spectrometry

May 7, 2012 by IeeeAdmin

We study a data mining problem concerning elastic peak detection in 2D liquid chromatography-mass spectrometry (LC-MS) data. These data can be modeled as time series, in which the X-axis represents time points and the Y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity […]

Efficiently Indexing Large Sparse Graphs for Similarity Search

May 5, 2012 by IeeeAdmin

The graph structure is a very important means to model schemaless data with complicated structures, such as protein-protein interaction networks, chemical compounds, knowledge query inferring systems, and road networks. This paper focuses on the index structure for similarity search on a set of large sparse graphs and proposes an efficient indexing mechanism by introducing the […]

Extending Attribute Information for Small Data Set Classification

May 3, 2012 by IeeeAdmin

Data quantity is the main issue in the small data set problem, because insufficient data usually does not lead to robust classification performance. How to extract more effective information from a small data set is thus of considerable interest. This paper proposes a new attribute construction approach which converts the original data attributes into […]