The Revolution Will Not Be Reading Your Email

April 6th, 2010

Information is useful. Historically, information dominance has built empires and colonial corporations. When the Mongol armies under Genghis Khan could travel as fast as the news of their arrival, everything in their path was helpless. Today, unprecedented amounts of information are tied up in a complex mess of parties with varying access to it and varying ability to interpret it, but by centralizing information on the internet, companies like Google have been able to derive enormous profits from such subtleties as displaying messages in the sidebar of people’s email clients. The popularization of the internet has changed many things about modern life, and will continue to do so. Consider that today, the average American spends 8 hours a day in front of a screen of some sort! With so much energy and labor being filtered through ubiquitous electronic systems that are globally interconnected, information about nearly anything that is important to people’s lives could be centralized into giant data warehouses and analyzed. Both the computing resources and the infrastructure to gather such data not only exist, but are readily available at many different levels to larger corporations and governments.

On a typical day, my every movement and financial transaction are recorded in computable form: my location can be inferred from cell phone tower records, and credit cards and point-of-sale systems record exactly what I bought, from whom, and when. If I’m not careful, Facebook and my email provider might have access to my innermost thoughts. The only privacy I have from such anonymous and distant eyes comes from the fact that my information is disparate and disorganized, and that the organizations that control access to it are balkanized. However, this balkanization and disorganization could no doubt be entirely reversed with the technology and social infrastructure that the next decade will bring. From a hacker’s perspective, it’s hard to see this degree of information availability (and particularly the discrepancy between the information available to shadowy private entities and that available to individuals or the general public) as even remotely sustainable. The power vested in this information segregation could be used for anything from sophisticated mass-media brainwashing to outright oppression by governments, and yet the actual documented incidents of abuse have been important but relatively few.

Perhaps the infrequency of abuses, or the lack of wide-reaching press coverage of them, has led the public to say, “So what?” There’s obvious value in sending messages online and using cell phones and credit cards, and the cost of information dispersal to the individual is relatively low. Short of imagining some apocalyptic scenario like 1984, another holocaust or an invasion by techno-mongols, what are the potential adverse effects of wide availability of volumes of personal information? I find that people tend to envision the consequences of global centralized consumer surveillance or “Google reading their email” as primarily negative, but this has so far failed to affect consumers’ decisions regarding privacy. Grocery stores are still effective in driving adoption of “club cards” that provide higher quality consumer data in exchange for significant discounts to card-holders. Consumers either do not think about why grocery stores are willing to reduce their marginal revenue by 5% to get higher quality data, or they are consciously making a decision that benefits them personally at a cost to consumers as a whole and to themselves in the long term: stores will use this data to drive a harder bargain against consumers by adjusting prices according to what they learn consumers are willing to pay.

Hackers tend to think of Google as a software company, but its success in the market has been a direct result of seizing the opportunity to buy online consumer data on a massive scale at wholesale prices, at a time when the consumer data market was underdeveloped and the data itself undervalued. Google’s business is organizing the world’s information in more ways than internet search. And the world is still changing. While a grocery store has no incentive to share its customer data with other grocery stores at reasonable cost, Facebook has plenty of incentive to sell consumer data on a developing, open market. The market for user data is emerging, and the market for web analytics (traffic) data alone has already reached over $1 billion of transactions per year. Certainly at Border Stylo, the trend does not go unnoticed.

Of course, in the context of all this, we’re trying to do better for our users and the world. Border Stylo has made it part of its mission to “avoid” building a business model based on advertising to our users (precluding ourselves on principle from this more than $20 billion market). More recently, upon seeing the scope of the potential impact of eroding online privacy, we adopted a similar mandate to respect the privacy of our users and lead the market in software that enables user privacy. My team is currently conducting research into how we can navigate these narrow straits. Our decision to support privacy is based on a combination of factors.

First and simply, we want to love our jobs and feel like we’re doing something good for the world, and we want to see a world in which users have access to informed methods of making decisions about sharing their marketable data. Second, we believe that while user data continues to be undervalued, market forces and consumer awareness will sometime soon wake users up to an appropriate valuation of their private data, at which point existing privacy-agnostic infrastructure will suddenly become a lot less cost-effective to maintain, and the currently dominant brands will become sullied for siding against consumers in the privacy debate [1, 2, 3].

That said, we know life is often not as easy as simply deciding to do the right thing. Here at Border Stylo, we are questioning the current norms in the privacy realm and evaluating our technological capabilities. As an engineer, it’s tempting to envision perfect solutions to problems, but in a world where engineering costs money and we will do no good if our business doesn’t survive, it pays to prioritize the kinds of privacy and data security that benefit users the most and hurt us the least. We have to think clearly about what online privacy is, and what we are trying to do for consumers by enabling it.

As a conscious consumer, I can group my own concerns about privacy into a few categories.

The most acute form of privacy violation occurs when a party I’m dealing with mediates a leakage of my information to a third party, against my intentions, in a way that could adversely affect me, such as a breach of confidentiality. Examples of this obvious form of privacy violation abound. Employers reading Facebook pages to find embarrassing drunk pictures that cause them to pass on a hire is an acute privacy violation, as was the recent privacy scandal with Google Buzz. The Chinese government jailing journalists with the assistance of Yahoo is another salient historical example. Consumers in the US often argue that they have “nothing to hide” and therefore are immune to this kind of privacy violation, but this attitude is nearly tantamount to saying that they have no current or future adversaries or adversarial relationships. I wish I could be so assured.

There is another flavor of collateral acute privacy violation. There is a prevailing attitude among service providers that it’s OK to use whatever data they have available as long as they do not leak data to unintended parties. But what if a customer is in an adversarial relationship with the service provider? All transactions on a market are inherently adversarial on some level. Besides, what if the service provider is Google, which has competitors all over the software industry? Should people in this industry avoid Google services because they are competitors? In cases like this, I wonder which of Google’s services are covered by the Electronic Communications Privacy Act. Are calendar entries and Google documents afforded protection?

Most of this post, however, has been centered around innocuous and ubiquitous privacy violation, where I leak information to parties that I don’t know or can’t identify, in ways that may adversely affect me but that I’m unlikely to be able to attribute to specific sources. The phrase “we may share non-personally-identifiable information” is a hallmark of this sort of privacy violation. In this case, consumers in the aggregate are leaking data, to the detriment of consumers in the aggregate, without much oversight or control. When individual consumers are given a benefit for doing this (such as free email hosting or access to a social networking site), this can even turn into a tragedy of the commons situation.

What can we do?

Service providers like Border Stylo could potentially have a lot of power over privacy settings. Safeguards against acute privacy violation are already in place to various degrees, because such violations are clearly bad for business, but until consumer awareness grows, it won’t be cost-effective for businesses to stop reading innocuous data about consumers. Nonetheless, there may be some things we can do to improve the situation and educate users at the same time.

First, we can try to raise the bar of privacy configurability. We can give users fine-grained control over privacy settings, recognizing that there may be some tradeoffs between privacy and functionality. Search is a classic example: typically, we cannot index content for search without first programmatically reading it. Educating users that they are releasing data not only to their friends, but also to us, and giving users better understanding and control over what exact data they are releasing goes hand in hand with this approach.
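As a rough illustration of the search tradeoff described above, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `PrivacySettings` fields, the `SearchIndex` class and the opt-in default are hypothetical names and choices, not a description of any existing product. The point it shows is that indexing requires the provider to read the text, so a user who has not opted in gets more privacy and no search.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacySettings:
    """Hypothetical per-user settings; the field names are illustrative."""
    visible_to_friends: bool = True
    # Indexing means the provider reads the text, so it is off by default here.
    allow_search_indexing: bool = False


@dataclass
class SearchIndex:
    """Toy inverted index mapping a word to the ids of posts containing it."""
    postings: dict = field(default_factory=dict)

    def add(self, post_id: str, text: str, settings: PrivacySettings) -> bool:
        # Respect the user's choice: never read content that is not opted in.
        if not settings.allow_search_indexing:
            return False
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(post_id)
        return True

    def search(self, word: str) -> set:
        # Return a copy so callers cannot modify the index.
        return set(self.postings.get(word.lower(), set()))


index = SearchIndex()
private_indexed = index.add("p1", "Private note about salary", PrivacySettings())
public_indexed = index.add(
    "p2", "Public note about lunch", PrivacySettings(allow_search_indexing=True)
)
hits = index.search("note")
```

In this sketch only the second post is searchable, which is the tradeoff the user would be asked to make knowingly: functionality in exchange for letting the provider read the content.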

Second, we can put an important check on ourselves by allowing users to “own” their data, including the data we innocuously record about them (to the extent permissible by existing computer surveillance laws in the US, which require service providers to record certain data for wiretapping and subpoena). What this would mean is that users can reserve the right to revoke our use of their data if for whatever reason they find our having it to be detrimental to them. This policy would prevent the wholesale sale of our data to third parties, because we would be giving away (irrevocably) what we don’t own, but third party data customers could still ask us specific questions that didn’t violate our users’ notion of ownership or revocability.
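One way to picture revocable ownership is the sketch below, again in Python and again built on assumptions: `UserDataStore`, its method names and the `min_users` threshold are hypothetical, and the sketch ignores any legally mandated retention. It shows the two ideas from the paragraph above: a user can revoke the provider's use of their records, and a third party only ever receives the answer to a specific aggregate question, never the raw rows.

```python
from dataclasses import dataclass, field


@dataclass
class UserDataStore:
    """Hypothetical store in which users can revoke use of their records."""
    records: dict = field(default_factory=dict)  # user_id -> purchase amounts
    revoked: set = field(default_factory=set)    # users who withdrew consent

    def record(self, user_id: str, amount: float) -> None:
        # Assumption: nothing further is stored once a user has revoked.
        if user_id in self.revoked:
            return
        self.records.setdefault(user_id, []).append(amount)

    def revoke(self, user_id: str) -> None:
        # Revocation blocks future use and drops what is currently held.
        self.revoked.add(user_id)
        self.records.pop(user_id, None)

    def average_spend(self, min_users: int = 2):
        """Answer one aggregate question; raw rows are never exported."""
        if len(self.records) < min_users:
            return None  # too few users to answer without singling one out
        amounts = [a for rows in self.records.values() for a in rows]
        return sum(amounts) / len(amounts)


store = UserDataStore()
store.record("alice", 10.0)
store.record("bob", 20.0)
store.record("carol", 30.0)
before = store.average_spend()

store.revoke("carol")
after = store.average_spend()

store.revoke("bob")
too_few = store.average_spend()
```

Because the third party never holds the underlying records, a revocation takes effect for every later question, which is what would be lost if the data were sold wholesale.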

Finally, I can at least issue this futile appeal to consumers. Facebook generates revenue on the order of $1/user-year from users. Is keeping your privacy and getting rid of the ads really worth less than $1/year to you? Just pay for it. Our business plan would be golden if users were willing to pay us $10/year to use a service without ads and with strong guarantees about privacy and data security. Unfortunately, due to poor consumer education, the market for products like this doesn’t exist. Not yet, anyway.

Tagged with: privacy, security, google, facebook, china, data products

Author

Mitchell Johnson

Mitchell is a data products engineer. He likes scheme, physics and mathematical biology and speaks (in code) for the trees.

4 Comments

5 months ago

Great article, Mitchell!

All the references were especially useful, since some of this stuff just seems too crazy to be true until you see the headlines. Here are a few more links:

http://infochimps.org/search?query=myspace — will give you a feel for the current market for this kind of data.

http://www.thelantern.com/campus/employers—nix-more-and-more-job-applicants-after-reviewing—ssocial-networking-sites-1.1304612 — is a recent story about the use of social networking data in background checks.

Anonymous
5 months ago

Good article. I know I’m certainly interested in products that have strong guarantees about privacy and data security even if they cost a little more – although I can’t think of a single product I use often that has them, what a shame.

Hopefully consumers will get educated and outraged and privacy will become more of a concern and selling point.

Mitchell
4 months ago
Anonymous

What about ssh?!!

about 1 month ago

Great article! I think that privacy concerns are in fact very important, though I wonder whether you’re considering all aspects of the issue.

In particular, I think users are inclined to make certain assumptions about privacy based on what a platform “feels” like. For example, I don’t post confidential stuff on Facebook because Facebook feels like a toy; however, I frequently send confidential stuff over GMail because GMail feels like a solid business tool.

These perceptions are probably ill-founded, of course. I could easily write a FB note and lock its permissions down, thus being reasonably confident that nobody would, now or in the future, be able to see it without being my friend. And I know for a fact that Google reads my e-mail to put up annoying and often non-sequitur ads, and that they index it for searching.

You mentioned an interesting pricing model around paying for confidentiality. I think that certain users might be willing to do this, but in the aggregate the situation becomes a prisoner’s dilemma (for more than two parties). I use a shopper’s card to get discounts, but it is still in my best interest to do so, since whatever anonymity I might add to the aggregate data by not using one is noise next to the benefits I get by using it.

Another case of this same paradox is corporate advertising. Someone did a study indicating that in a competitive market, the net effect of all companies launching ad campaigns is zero; that is, it doesn’t change the sales of any company relative to the others if all of them advertise. However, any company that doesn’t advertise as much as the others will lose revenue. The obvious solution of “let’s all stop advertising and save money” doesn’t arise for the same reason that price gouging and other forms of corporate collusion are relatively rare — the game theory rewards local instead of global optimization.

I also wonder about the value of information in an economy where there is so much of it, though that’s another topic altogether.
