It is often the case that the data published by an organization or company are too detailed for attackers to be expected to have accurate partial knowledge of them. Still, an attacker might have some aggregate or abstract knowledge of a record. Examples like this often arise in practice; for instance, IMIS is currently involved in anonymizing tax-related data from Greece. Each record in this data collection has hundreds of fields that trace financial activity, often at a very detailed level. When publishing or sharing such data, we expect the major threats to come from attackers who can identify records using aggregate knowledge, e.g., total taxable income, rather than the exact values of fields that are hard to acquire as background knowledge, e.g., the exact sum of expenses in agricultural financial activities.

The same situation appears in several other application areas. When publishing movement data, an attacker might know how long a trip took, but not detailed information on the duration of each stop; when publishing medical data, an attacker might know a previous diagnosis for a patient that corresponds to a certain value range on a combination of indicators, but it is unlikely that they have exact partial knowledge of exam results. Anonymizing such data under a traditional anonymization framework would guarantee privacy, but it would cause unnecessary distortion to the data, since we only need to create groups of records that are similar with respect to the abstract knowledge of the attacker (an aggregate function over the record fields).

Consider the motivating example of Table I, which contains tax data of individuals. A realistic scenario is that an attacker knows only the approximate total income of a target, but no more detailed information. Thus, it is not the exact values of the attributes that act as quasi-identifiers, but the aggregate information computed over them.