Functional and inclusion dependency discovery is important to knowledge discovery, database semantics analysis, database design, and data quality assessment. Motivated by the importance of dependency discovery, this paper reviews the methods for functional dependency, conditional functional dependency, approximate functional dependency, and inclusion dependency discovery in relational databases and a method for discovering XML functional dependencies.
Archives for April 2012
DDD: A New Ensemble Approach for Dealing with Concept Drift
Online learning algorithms often have to operate in the presence of concept drifts. A recent study revealed that different diversity levels in an ensemble of learning machines are required in order to maintain high generalization on both old and new concepts. Inspired by this study and based on a further study of diversity with different […]
Answering General Time-Sensitive Queries
Time is an important dimension of relevance for a large number of searches, such as over blogs and news archives. So far, research on searching over such collections has largely focused on locating topically similar documents for a query. Unfortunately, topic similarity alone is not always sufficient for document ranking. In this paper, we observe […]
Anónimos: An LP-Based Approach for Anonymizing Weighted Social Network Graphs
The increasing popularity of social networks has initiated a fertile research area in information extraction and data mining. Anonymization of these social graphs is important to facilitate publishing these data sets for analysis by external entities. Prior work has concentrated mostly on node identity anonymization and structural anonymization. But with the growing interest in analyzing […]
Agglomerative Mean-Shift Clustering
Mean-Shift (MS) is a powerful nonparametric clustering method. Although good accuracy can be achieved, its computational cost is particularly expensive even on moderate data sets. In this paper, for the purpose of algorithmic speedup, we develop an agglomerative MS clustering method along with its performance analysis. Our method, namely Agglo-MS, is built upon an iterative […]
A Probabilistic Scheme for Keyword-Based Incremental Query Construction
Databases enable users to precisely express their informational needs using structured queries. However, database query construction is a laborious and error-prone process, which cannot be performed well by most end users. Keyword search alleviates the usability problem at the price of query expressiveness. As keyword search algorithms do not differentiate between the possible informational needs […]
A Multidimensional Sequence Approach to Measuring Tree Similarity
Tree is one of the most common and well-studied data structures in computer science. Measuring the similarity of such structures is key to analyzing this type of data. However, measuring tree similarity is not trivial due to the inherent complexity of trees and the ensuing large search space. Tree kernel, a state of the art […]
A Link-Based Cluster Ensemble Approach for Categorical Data Clustering
Although attempts have been made to solve the problem of clustering categorical data via cluster ensembles, with the results being competitive to conventional algorithms, it is observed that these techniques unfortunately generate a final data partition based on incomplete information. The underlying ensemble-information matrix presents only cluster-data point relations, with many entries being left unknown. […]
A Genetic Programming Approach to Record Deduplication
Several systems that rely on consistent data to offer high-quality services, such as digital libraries and e-commerce brokers, may be affected by the existence of duplicates, quasi replicas, or near-duplicate entries in their repositories. Because of that, there have been significant investments from private and government organizations for developing methods for removing replicas from its […]