As one of the most popular cloud services, data storage has attracted great attention in recent research efforts. Key-value (k-v) stores have emerged as a popular option for storing and querying billions of key-value pairs. So far, existing methods have been deterministic. Providing such accuracy, however, comes at the cost of memory and CPU time. […]
TEES: An Efficient Search Scheme over Encrypted Data on Mobile Cloud
Cloud storage provides convenient, massive, and scalable storage at low cost, but data privacy is a major concern that prevents users from confidently storing files on the cloud. One way of enhancing privacy from the data owner's point of view is to encrypt the files before outsourcing them onto the cloud and decrypt the files […]
Bag-of-Discriminative-Words (BoDW) Representation via Topic Modeling
Many of the words in a given document either deliver facts (objective) or express opinions (subjective), depending on the topics they are involved in. For example, given a collection of documents, the word “bug” assigned to the topic “order Hemiptera” evidently denotes an object (i.e., a kind of insect), while the same word assigned […]
Efficient Pattern-Based Aggregation on Sequence Data
A Sequence OLAP (S-OLAP) system provides a platform on which pattern-based aggregate (PBA) queries on a sequence database are evaluated. In its simplest form, a PBA query consists of a pattern template T and an aggregate function F. A pattern template is a sequence of variables, each defined over a domain. Each variable is […]
Computing Semantic Similarity of Concepts in Knowledge Graphs
This paper presents a method for measuring the semantic similarity between concepts in Knowledge Graphs (KGs) such as WordNet and DBpedia. Previous work on semantic similarity methods has focused on either the structure of the semantic network between concepts (e.g., path length and depth), or only on the Information Content (IC) of concepts. We propose […]
User-Centric Similarity Search
User preferences play a significant role in market analysis. In the database literature, there has been extensive work on query primitives, such as the well-known top-k query, which can be used to rank products based on the preferences customers have expressed. Still, the fundamental operation that evaluates the similarity between products is […]
A Scalable Data Chunk Similarity Based Compression Approach for Efficient Big Sensing Data Processing on Cloud
Big sensing data is prevalent in both industry and scientific research applications where the data is generated at high volume and velocity. Cloud computing offers a promising platform for big sensing data processing and storage, as it provides a flexible stack of massive computing, storage, and software services in a scalable manner. Current big sensing […]
Energy-Efficient Query Processing in Web Search Engines
Web search engines are composed of thousands of query processing nodes, i.e., servers dedicated to processing user queries. So many servers consume a significant amount of energy, mostly attributable to their CPUs, but they are necessary to ensure low latencies, since users expect sub-second response times (e.g., 500 ms). However, users can hardly notice response […]
Collaboratively Training Sentiment Classifiers for Multiple Domains
We propose a collaborative multi-domain sentiment classification approach to train sentiment classifiers for multiple domains simultaneously. In our approach, the sentiment information in different domains is shared to train more accurate and robust sentiment classifiers for each domain when labeled data is scarce. Specifically, we decompose the sentiment classifier of each domain into two components, […]
Query Expansion with Enriched User Profiles for Personalized Search Utilizing Folksonomy Data
Query expansion has been widely adopted in Web search as a way of tackling the ambiguity of queries. Personalized search utilizing folksonomy data has demonstrated an extreme vocabulary mismatch problem that requires even more effective query expansion methods. Co-occurrence statistics, tag-tag relationships, and semantic matching approaches are among those favored by previous research. However, user […]