Search engine companies collect the “database of intentions,” the histories of their users’ search queries. These search logs are a gold mine for researchers. Search engine companies, however, are wary of publishing search logs in order not to disclose sensitive information. In this paper, we analyze algorithms for publishing frequent keywords, queries, and clicks of […]
Archives for July 2012
Privacy Preserving Decision Tree Learning Using Unrealized Data Sets
Privacy preservation is important for machine learning and data mining, but measures designed to protect private information often result in a trade-off: reduced utility of the training samples. This paper introduces a privacy preserving approach that can be applied to decision tree learning, without concomitant loss of accuracy. It describes an approach to the preservation […]
Practical Efficient String Mining
In recent years, several algorithms for mining frequent and emerging substring patterns from databases of string data (such as proteins and natural language texts) have been discovered, all of which traverse an enhanced suffix array data structure. All of these algorithms lie at either extreme of the efficiency spectrum; they are either fast and use […]