Anonymization by sequential clustering for privacy-preservation in social networks

We study the problem of privacy-preservation in social networks, and we consider the distributed setting in which the network data is split between several data holders.

Networks are structures that describe a set of entities and the relations between them. A social network, for example, provides information on individuals in some population and the links between them, which may describe relations of friendship, collaboration, correspondence and so forth. An information network, as another example, may describe scientific publications and their citation links. In their most basic form, networks are modeled by a graph, where the nodes correspond to the entities and the edges denote relations between them. Real social networks may be more complex or contain additional information. For example, if an interaction involves more than two parties, the network would be modeled as a hyper-graph; if there are several types of interaction, the edges would be labeled; and the nodes could be accompanied by attributes that provide demographic information, such as age, gender, location or occupation, which could enrich and shed light on the structure of the network.

A naive anonymization of the network, in the sense of removing identifying attributes like names or social security numbers from the data, is insufficient, because the mere structure of the released graph may reveal the identity of the individuals behind some of the nodes. For example, an adversary who can locate a distinctive subgraph in the released graph can re-identify the targets who are connected to this subgraph and learn the edges between them. Even less sophisticated adversaries may use prior knowledge of some property of their target nodes in order to identify them in the released graph.

The sequential clustering algorithm for k-anonymizing tables was presented in earlier work, where it was shown to be very efficient in terms of both runtime and the utility of the output anonymization. We proceed to describe an adaptation of it for anonymizing social networks.

Existing methods for anonymizing social networks fall into three categories. The methods of the first category provide k-anonymity via a deterministic procedure of edge additions or deletions.
In those methods it is assumed that the adversary has background knowledge regarding some property of its target node, and the graph is modified so that it becomes k-anonymous with respect to that assumed property. The methods of the second category add noise to the data, in the form of random additions, deletions or switching of edges, in order to prevent adversaries from identifying their target in the network or inferring the existence of links between nodes. The methods of the third category do not alter the graph data like the methods of the two previous categories; instead, they cluster nodes together into super-nodes of size at least k, where k is the required anonymity parameter, and then publish the graph data in that coarse resolution.

The study of anonymizing social networks has concentrated so far on centralized networks, i.e., networks that are held by one data holder.
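
To make the third category concrete, the following is a minimal Python sketch of publishing a graph at super-node resolution once a clustering is given. It is an illustration under stated assumptions, not the sequential clustering algorithm itself: the function name publish_clustered_graph, the input format, and the choice of published statistics (cluster sizes, edge counts within each cluster, and edge counts between each pair of clusters) are all hypothetical choices made for this example. How the clustering is found is outside the scope of the sketch.

```python
from collections import Counter


def publish_clustered_graph(edges, cluster_of, k):
    """Summarize an undirected graph at super-node resolution.

    edges      -- iterable of (u, v) node pairs (hypothetical input format)
    cluster_of -- dict mapping each node to its cluster label; labels must
                  be mutually comparable (e.g. all strings) so that pairs
                  of labels can be put in a canonical order
    k          -- required anonymity parameter (minimum cluster size)
    """
    # Each super-node must hide at least k original nodes.
    sizes = Counter(cluster_of.values())
    too_small = sorted(c for c, n in sizes.items() if n < k)
    if too_small:
        raise ValueError(f"clusters smaller than k={k}: {too_small}")

    intra = Counter()  # number of edges inside each cluster
    inter = Counter()  # number of edges between each unordered cluster pair
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            intra[cu] += 1
        else:
            inter[tuple(sorted((cu, cv)))] += 1

    # Assumption: only these aggregate counts are released, not the edges.
    return {
        "sizes": dict(sizes),
        "intra_edges": dict(intra),
        "inter_edges": dict(inter),
    }


# Toy example: six nodes split into two clusters of size 3, with k = 3.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 3)]
cluster_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
summary = publish_clustered_graph(edges, cluster_of, k=3)
print(summary)
```

In this toy example the released summary records only that clusters A and B each contain three nodes, that A has three internal edges and B has two, and that a single edge connects the two clusters. Raising k to 4 with the same clustering makes the function reject the input, since neither cluster would then be large enough.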