Use differential privacy to enable data sharing and collaboration


Traditionally, companies have relied on data anonymization, sometimes called de-identification, to protect data privacy. The basic idea is to remove all personally identifiable information (PII) from each record. However, a number of high-profile cases have shown that even de-identified data can compromise consumer privacy.

In 1996, an MIT researcher identified the then-governor of Massachusetts' health records in a supposedly anonymized database by cross-referencing it with public voter registration data. In 2006, UT Austin researchers re-identified the movies watched by thousands of anonymous users of a public Netflix database by linking them with data from IMDb.

In a 2022 Nature article, researchers used AI to fingerprint and re-identify more than half of the cell phone records in a supposedly anonymous dataset. All of these examples highlight how "side" information can be used by attackers to re-identify supposedly anonymized data.

These failures led to differential privacy. Instead of sharing data, companies share the results of data processing combined with random noise. The noise level is set so that the output does not tell a potential attacker anything statistically significant about any target: the same output could have come from a database containing the target or from the exact same database without the target. Because the shared processing results disclose no information about any individual, everyone's privacy is preserved.
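To make the noise-calibration idea concrete, here is a minimal sketch of the classic Laplace mechanism applied to a count query. The function name dp_count and the epsilon value are illustrative assumptions, not something from the article:

```python
import numpy as np

def dp_count(data, predicate, epsilon):
    """Differentially private count via the Laplace mechanism."""
    # A count has sensitivity 1: adding or removing any single person
    # changes the true answer by at most 1, so Laplace noise with
    # scale 1/epsilon gives epsilon-differential privacy.
    true_count = sum(1 for record in data if predicate(record))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# With or without any one individual in the data, the output
# distributions differ by at most a factor of e**epsilon, so the
# result reveals almost nothing about any single person.
ages = [34, 29, 41, 57, 62, 38]
print(dp_count(ages, lambda age: age >= 40, epsilon=1.0))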

Differential privacy is not something one should implement from scratch, as any implementation error can be disastrous for the privacy guarantee.

Implementing differential privacy used to be a major challenge. The first applications came from organizations with large data science and engineering teams, such as Apple, Google or Microsoft. As the technology matures and its cost decreases, how can all organizations with a modern data infrastructure use differential privacy in real-world applications?

Differential privacy applies to both aggregate and row-level data

When the analyst cannot be given access to the data, it is common to use differential privacy to produce differentially private aggregates. The sensitive data is made accessible through an API that only outputs noisy results that preserve privacy. This API can support aggregations across the entire database, from simple SQL queries to complex machine learning training tasks.

A standard setting for consuming personal data with differential privacy guarantees. Image Credits: Sarus
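As a rough illustration of such an API, the Python sketch below only ever returns noised aggregates and charges each query against a cumulative privacy budget. The class name PrivateQueryAPI, the budget accounting, and the clipping bounds are assumptions made for illustration, not a description of any specific product:

```python
import numpy as np

class PrivateQueryAPI:
    """Sketch of a query endpoint that never returns raw records:
    every answer is noised and charged against a privacy budget."""

    def __init__(self, records, total_epsilon):
        self._records = records        # raw data never leaves this object
        self._budget = total_epsilon   # total privacy loss allowed

    def _spend(self, epsilon):
        if epsilon > self._budget:
            raise RuntimeError("privacy budget exhausted")
        self._budget -= epsilon

    def count(self, predicate, epsilon):
        # Sensitivity of a count is 1.
        self._spend(epsilon)
        true = sum(1 for r in self._records if predicate(r))
        return true + np.random.laplace(scale=1.0 / epsilon)

    def sum(self, key, lower, upper, epsilon):
        # Clipping each value into [lower, upper] bounds the effect of
        # any single record on the sum by max(|lower|, |upper|).
        self._spend(epsilon)
        clipped = [min(max(r[key], lower), upper) for r in self._records]
        sensitivity = max(abs(lower), abs(upper))
        return sum(clipped) + np.random.laplace(scale=sensitivity / epsilon)

# Usage: the analyst only ever sees noisy aggregates.
api = PrivateQueryAPI([{"age": 34}, {"age": 57}, {"age": 62}], total_epsilon=1.0)
print(api.count(lambda r: r["age"] >= 40, epsilon=0.5))
print(api.sum("age", lower=0, upper=100, epsilon=0.5))
```

Clipping matters because the noise scale must be computed from a known bound on how much any one record can move the answer; without it, a single outlier could dominate the sum and leak through the noise.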

One drawback of this setup is that, unlike with data anonymization techniques, analysts no longer see individual records to "get a feel for the data." One way to mitigate this limitation is to provide differentially private synthetic data, where the data owner generates fake data that mimics the statistical characteristics of the original dataset.
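One simple way to build such synthetic data for a single numeric column is to sample from a noisy histogram. The sketch below is an illustration of the principle under that assumption, with hypothetical names such as dp_synthetic_column; real products handle many correlated columns with more sophisticated models:

```python
import numpy as np

def dp_synthetic_column(values, bins, epsilon, n_synthetic):
    """Sample synthetic values from a differentially private histogram
    of a single numeric column (a deliberately simple sketch)."""
    counts, edges = np.histogram(values, bins=bins)
    # Each person lands in exactly one bin, so the whole histogram has
    # sensitivity 1 and one round of Laplace(1/epsilon) noise suffices.
    noisy = counts + np.random.laplace(scale=1.0 / epsilon, size=len(counts))
    probs = np.clip(noisy, 0, None)
    probs = probs / probs.sum()
    # Pick a bin for each synthetic record, then a uniform value in it.
    idx = np.random.choice(len(probs), size=n_synthetic, p=probs)
    return np.random.uniform(edges[idx], edges[idx + 1])

real_ages = np.random.normal(45, 12, size=1000)
fake_ages = dp_synthetic_column(real_ages, bins=20, epsilon=1.0, n_synthetic=1000)
```

A useful property of this approach is that the privacy budget is spent once, on the noisy statistics; by the post-processing property of differential privacy, any number of synthetic records can then be sampled from them without further privacy loss.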