Single Intermediate Dataset Privacy Representation

Related research spans privacy protection in the cloud, privacy preservation for intermediate datasets, and Privacy-Preserving Data Publishing (PPDP). Most existing work relies on encryption to ensure data privacy in the cloud. Although encryption works well for data privacy in these approaches, many applications then have to encrypt and decrypt datasets frequently. Encryption is therefore usually integrated with other methods to achieve cost reduction, high data usability and privacy protection. One line of work investigated the data privacy problem caused by MapReduce and presented a system named Airavat, which incorporates mandatory access control with differential privacy. Another described a set of tools called Silverline that identifies all functionally encryptable data and then encrypts them to protect privacy. A system named Sedic partitions MapReduce computing jobs according to the security labels of the data they work on and then assigns the computation without sensitive data to a public cloud. All of the above approaches require the sensitivity of data to be labeled in advance. A further approach combines encryption and data fragmentation to achieve privacy protection for distributed data storage while encrypting only part of the datasets. We follow this line, but integrate data anonymization and encryption to achieve cost-effective privacy preservation.
Cloud computing is regarded as an ingenious combination of a series of technologies, establishing a novel business model by offering IT services and exploiting economies of scale. Participants in the business chain of cloud computing can benefit from this model. Cloud customers can save huge capital investment in IT infrastructure and concentrate on their own core business. Many companies and organizations have therefore been migrating or building their business into the cloud. However, numerous potential customers are still hesitant to take advantage of the cloud because of security and privacy concerns.
The privacy concerns caused by retaining intermediate datasets in the cloud are important, yet they have received little attention. Storage and computation services in the cloud are equivalent from an economic perspective because both are charged in proportion to their usage. Thus, when processing original datasets in data-intensive applications such as medical diagnosis, cloud users can selectively store valuable intermediate datasets in order to curtail overall expenses by avoiding frequent re-computation of these datasets. Such scenarios are quite common because data users often re-analyze results, conduct new analyses on intermediate datasets, or share some intermediate results with others for collaboration. Without loss of generality, the notion of intermediate dataset herein refers to both intermediate and resultant datasets. The storage of intermediate data enlarges the attack surface, so the privacy requirements of data holders are at risk of being violated.
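The store-or-recompute trade-off described above can be illustrated with a minimal sketch. It assumes a simple pay-as-you-go model in which storage is billed per GB-month and computation per hour. The class `IntermediateDataset`, its fields, the function names and all prices are hypothetical and chosen only for illustration; they are not taken from any particular cloud provider or from the approach discussed here.

```python
from dataclasses import dataclass


@dataclass
class IntermediateDataset:
    """Hypothetical description of one intermediate dataset."""
    name: str
    size_gb: float          # size of the dataset if it is kept in storage
    recompute_hours: float  # compute time needed to regenerate it once
    uses_per_month: float   # how often the dataset is needed per month


def monthly_storage_cost(d: IntermediateDataset, price_per_gb_month: float) -> float:
    """Cost of keeping the dataset stored for one month."""
    return d.size_gb * price_per_gb_month


def monthly_recompute_cost(d: IntermediateDataset, price_per_hour: float) -> float:
    """Cost of regenerating the dataset every time it is needed in a month."""
    return d.recompute_hours * price_per_hour * d.uses_per_month


def should_store(d: IntermediateDataset,
                 price_per_gb_month: float,
                 price_per_hour: float) -> bool:
    """Store the dataset only if storing is cheaper than recomputing it."""
    return (monthly_storage_cost(d, price_per_gb_month)
            < monthly_recompute_cost(d, price_per_hour))


# Assumed example prices: 0.02 per GB-month of storage, 0.10 per compute hour.
STORAGE_PRICE = 0.02
COMPUTE_PRICE = 0.10

frequently_used = IntermediateDataset("diagnosis_features", 100.0, 5.0, 10.0)
rarely_used = IntermediateDataset("raw_join", 1000.0, 0.5, 2.0)

for d in (frequently_used, rarely_used):
    decision = "store" if should_store(d, STORAGE_PRICE, COMPUTE_PRICE) else "recompute"
    print(f"{d.name}: {decision}")
```

This sketch considers monetary cost only. It does not model the cost of preserving the privacy of a stored dataset, for example by encrypting or anonymizing it, which would have to be added to the storage side of the comparison.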