Ask ten people what privacy is and you'll likely get twelve different answers. The reason for the disparity is that your feelings about privacy depend on context and your experience. Privacy is not a purely technical issue, but a human one. Long before computers existed, people cared about and debated privacy. Future U.S. Supreme Court Justice Louis Brandeis defined it as "the right to be let alone" in 1890.
Before the Web became a ubiquitous phenomenon, people primarily thought of privacy in terms of government intrusion. But the march of technological progress means that private companies probably have much more data about you than your government. Ad networks, and the valuation of platforms based on how well they display ads to their users, have led to the widespread surveillance of people, usually without them knowing the extent or consequences.
The International Association of Privacy Professionals (IAPP) defines four classes of privacy:
Bodily Privacy—The protection of a person's physical being and any invasion thereof. This includes practices like genetic testing, drug testing, or body cavity searches.
Communications Privacy—The protection of the means of correspondence, including postal mail, telephone conversations, electronic mail, and other forms of communication.
Information Privacy—The claim of individuals, groups, or organizations to determine for themselves when, how, and to what extent information about them is communicated to others.
Territorial Privacy—Placing limitations on the ability of others to intrude into an individual's environment. Environment can be more than just the home, including workplaces, vehicles, and public spaces. Intrusions of territorial privacy can include video surveillance or ID checks.
While Bodily and Territorial Privacy can be issues online, Communications and Information Privacy are the ones we worry about the most and the ones most likely to have a digital identity component. To begin a discussion of online privacy, we first need to be specific about what we mean when we talk about online conversations.
Each online interaction consists of packets of data flowing between parties. For our purposes, consider that a conversation. Even a simple Internet Control Message Protocol (ICMP) echo request packet is a conversation as we're defining it—the message needn't be meaningful to humans.
Conversations have content and they have metadata, the information about the conversation. In an ICMP echo, there's only metadata: the IP and ICMP headers[1]. The headers include information like the source and destination IP addresses, the TTL (time-to-live), type of message, checksums, and so on. In a more complex protocol, say SMTP for email, there would also be content (the message) in addition to the metadata.
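To make the "only metadata" point concrete, here is a minimal sketch that builds the ICMP portion of an echo request and decodes it again. The function names (internet_checksum, build_echo_request, describe) are hypothetical, chosen for illustration. The sketch only constructs bytes in memory; it does not send anything, and the IP header (which carries the addresses and TTL) would be added by the operating system.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP message type for an echo request


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold any carry bits back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the ICMP header (plus optional payload) for an echo request."""
    # The checksum field is zero while the checksum is being computed.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


def describe(packet: bytes) -> dict:
    """Decode the 8-byte ICMP header; everything in it is metadata."""
    msg_type, code, checksum, identifier, sequence = struct.unpack("!BBHHH", packet[:8])
    return {
        "type": msg_type,
        "code": code,
        "checksum": checksum,
        "identifier": identifier,
        "sequence": sequence,
        "payload_length": len(packet) - 8,
    }


packet = build_echo_request(identifier=1, sequence=1)
print(describe(packet))
```

With no payload, the packet is eight bytes of header and nothing else: a conversation in the sense used here, made up entirely of metadata.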
Communications Privacy is concerned with metadata. Confidentiality is concerned with content[2]. Put another way, for a conversation to be private, only the parties to the conversation should know who the other participants are. More generally, privacy concerns the control of any metadata about an online conversation so that only parties to the conversation know the metadata.
Defined in this way, online privacy may appear impossible. After all, the Internet works by passing packets from router to router, all of which can see the source IP address and must know the destination IP address. Consequently, at the packet level, there's no online privacy.
But consider the use of TLS (Transport Layer Security) to create an encrypted web channel between the browser and the server. At the packet level, the routers will know (and the operators of the routers can know) the IP addresses of the encrypted packets going back and forth. If a third party can correlate those IP addresses with the actual participants, then the conversation isn't absolutely private.
But other metadata, namely the HTTP headers, is private. Beyond the host name and information needed to set up the TLS connection, all the rest of the headers are encrypted. This includes cookies and the URL path. So, someone eavesdropping on the conversation will know the server name, but not the specific place on the site the browser connected to. For example, suppose Alice visits Utah Valley University's Title IX office (where sexual misconduct, discrimination, harassment, and retaliation are reported) by pointing her browser at uvu.edu/titleix. With TLS, an eavesdropper could know Alice connected to Utah Valley University, but not know that she connected to the web site for the Title IX office, because the path is encrypted.
Extending this example, we can easily see the difference between privacy and confidentiality. If the Title IX office were located at a subdomain of titleix.uvu.edu, then an eavesdropper would be able to tell that Alice had connected to the Title IX web site, even if the conversation were protected by a TLS connection. The content that was sent to Alice and that she sent back would be confidential, but the important metadata showing that Alice connected to the Title IX office would not be private.
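The two Title IX addresses can be compared with a short sketch that splits a URL into the part an on-path observer can see and the part carried inside the encrypted channel. The function name eavesdropper_view is hypothetical, and the sketch assumes the host name travels in the clear during TLS setup, as described above; it says nothing about IP addresses, which remain visible either way.

```python
from urllib.parse import urlsplit


def eavesdropper_view(url: str) -> dict:
    """Split an HTTPS URL into what is visible and what is encrypted.

    Assumption: the host name is exposed while the TLS connection is
    being set up, and the path and query are sent only inside the
    encrypted channel.
    """
    parts = urlsplit(url)
    return {
        "visible": {"host": parts.hostname},
        "encrypted": {"path": parts.path or "/", "query": parts.query},
    }


# The path reveals the Title IX office, but the path is encrypted.
print(eavesdropper_view("https://uvu.edu/titleix"))

# Here the host name itself reveals the Title IX office.
print(eavesdropper_view("https://titleix.uvu.edu/"))
```

In the first case the observer learns only that Alice contacted uvu.edu. In the second, the identifying detail has moved from the path into the host name, so the same TLS connection protects the content but not the fact of who Alice was talking to.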
This example introduces another important term to this discussion: authenticity. If Alice goes to uvu.edu instead of titleix.uvu.edu, then an eavesdropper cannot easily establish the authenticity of who Alice is speaking to at UVU, because there are too many possibilities. Depending on how easily Alice's IP address can be correlated with Alice, an eavesdropper may not be able to reliably authenticate Alice either. So, while Alice's conversation with the Title IX office through uvu.edu is not absolutely private, it is probably private enough because we can't easily authenticate the parties to the conversation from the metadata alone.
Information Privacy, on the other hand, is distinguished from Communications Privacy online because it usually concerns content, rather than metadata. When Alice connects with the Title IX office, to extend the example from the previous paragraph, she might transmit data to the office, possibly by filling out Web forms, or even just by authenticating, allowing the Web site to identify Alice and correlate other information with her. All of this is done inside the confidential channel provided by the TLS connection. But Alice will still be concerned about the privacy of the information she's communicated.
Information privacy quickly gets out of the technical realm and into policy. How will Alice's information be handled? Who will see it? Will it be shared? With whom and under what conditions? These are all policy questions that impact the privacy of information that Alice willingly shared. Information privacy is generally about who controls disclosure.
Communications Privacy often involves the involuntary collection of metadata—surveillance. Information Privacy usually involves policies and practices for handling data that has been voluntarily provided. Of course, there are places where these two overlap. Data created from metadata becomes personally identifying information (PII), subject to privacy concerns that might be addressed by policy. Still, the distinction between Communications and Information Privacy is useful.
The intersection of Communications and Information Privacy is sometimes called Transactional[3] or Social Privacy. Transactional privacy is worth exploring as a separate category because it is always evaluated in a specific context. Thus, it speaks to people's real concerns and their willingness to trade off privacy for a perceived benefit in a specific transaction. Transactional privacy concerns can be more transient.
The modern Web is replete with transactions that involve both metadata and content data. The risks of this data being used in ways that erode individual privacy are great. And because the mechanisms are obscure, even to Web professionals, people can't make good privacy decisions about the transactions they engage in. Transactional privacy is consequently an important lens for evaluating the privacy rights of people and the ways technology, policy, regulation, and the law can protect them.
With privacy, we're almost never dealing with absolutes. Absolute digital privacy can be achieved by simply never using the Internet. But that also means being absolutely cut off from online interaction. Consequently, privacy is a spectrum, and we must choose where we should be on that spectrum, taking all factors into consideration. Since confidentiality is easily achieved through encryption, we're almost always trading off privacy and authentication. More on that next week.
1. ICMP echo packets can carry data, but it's optional, and when present it is usually arbitrary filler rather than a meaningful message.
2. This distinction between privacy and confidentiality is rarely made in casual conversation, where people often say they want privacy when they really mean confidentiality.
3. I have seen the term "transactional privacy" used to describe the idea of people being sellers of their own data outright. That is not the sense in which I'm using it. I'm speaking more generally of the interactions that take place online.
Photo Credit: Hide and seek (2) from Ceescamel (CC BY-SA 4.0)