EFFECTIVE AND EFFICIENT BORDA COUNT APPROACH USING INDEX-BASED ALGORITHM FOR TOP-K QUERY ON MULTIVALUED OBJECTS

Because ranking is an essential analytic method, it is natural and fundamental to investigate how to rank a set of multivalued objects. To the best of our knowledge, however, no existing work addresses this important problem systematically. One might think that multivalued objects can simply be ranked as uncertain or probabilistic data, but the two models are fundamentally different. Existing models for ranking uncertain and probabilistic data treat an object as a random variable and assume that its instances are mutually exclusive, whereas for multivalued objects we must capture the coexistence of instances. To tackle the problem, we advocate the semantics of favoring widely preferred objects rather than majority votes, a semantics widely used in many elections and competitions. Technically, we borrow the idea from Borda Count (BC), a well-recognized method in consensus-based voting systems. However, Borda Count cannot handle multivalued objects of inconsistent cardinality, and it is costly to evaluate for top-k queries on large multidimensional data sets. To address these challenges, we extend and generalize Borda Count to a quantile-based Borda Count and develop efficient computational methods with a comprehensive cost analysis. We present case studies on real data sets to demonstrate the effectiveness of the generalized Borda Count ranking, and we use synthetic and real data sets to verify the efficiency of our computational methods.
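For readers unfamiliar with Borda Count, the following minimal sketch illustrates the classic voting rule only, not the quantile-based generalization or the index-based algorithm proposed here. The function names `borda_count` and `top_k`, the point scheme (n - 1 points for first place down to 0 for last), and the sample ballots are assumptions chosen for illustration.

```python
from collections import defaultdict


def borda_count(ballots):
    """Classic Borda Count (illustrative sketch).

    Each ballot is a list of candidates ordered from most to least
    preferred. On a ballot with n candidates, the candidate at
    0-based position p receives n - 1 - p points.
    """
    scores = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - 1 - position
    return dict(scores)


def top_k(scores, k):
    """Return the k highest-scoring candidates; ties broken by name."""
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


# Hypothetical ballots: "a" has the most first-place votes,
# but "b" is ranked first or second on every ballot.
ballots = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["c", "b", "a"],
]

scores = borda_count(ballots)
print(scores)            # {'a': 4, 'b': 5, 'c': 3}
print(top_k(scores, 2))  # ['b', 'a']
```

In this toy input, "b" outranks "a" even though "a" collects more first-place votes, which is the sense in which Borda Count favors widely preferred candidates. Note that every ballot here ranks the same number of candidates; the classic rule gives no guidance when cardinalities differ, which is the limitation the quantile-based extension is meant to address.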