One of the most common questions I get is "if I can bank online, why can't I vote online?" A recently released (but undated) document, "Supplement to Authentication in an Internet Banking Environment," from the Federal Financial Institutions Examination Council addresses some of the risks of online banking. Krebs on Security has a nice writeup of the issues, noting that the guidelines call for "layered security programs" to deal with these riskier transactions, such as:
1. methods for detecting transaction anomalies;
2. dual transaction authorization through different access devices;
3. the use of out-of-band verification for transactions;
4. the use of 'positive pay' and debit blocks to appropriately limit the transactional use of an account;
5. 'enhanced controls over account activities,' such as transaction value thresholds, payment recipients, the number of transactions allowed per day and allowable payment days and times; and
6. 'enhanced customer education to increase awareness of the fraud risk and effective techniques customers can use to mitigate the risk.'
[I've replaced bullets with numbers in Krebs’ posting in the above list to make it
easier to reference below.]
So what does this have to do with voting? Well, look at each measure in turn and consider how you'd apply it to a voting system:
1. One could hypothesize doing this - if 90% of the people in a precinct vote R or D, that's not a good sign - but by then it's too late to do much. And suggesting personalized anomaly detectors (e.g., "you usually vote R but it looks like you're voting D today, are you sure?") would not be well received by most voters!
2. This is the focus of a lot of work - but it increases the effort for the voter.
3. Same as #2, but we have to be careful not to make voting too hard for the voter! See SpeakUp: Remote Unsupervised Voting for an example of how this might be done.
4. I don't see how this would apply to voting, although in places like Estonia, where you're allowed to vote more than once (but only the last vote counts), one could imagine limiting the number of votes that can be cast by one ID. Limiting the number of votes from a single IP address is a natural application - but since many ISPs use the same (or a few) IP addresses for all of their customers thanks to NAT, this would disenfranchise those customers.
5. "You don't usually vote in primaries, so we're not going to let you vote in this one either." Yeah, right!
6. This is about the only one that could help - and try doing it on the budget of an election office!
Unsaid, but of course implied by the financial industry's list, is that the goal is to reduce fraud to a manageable level. I've heard that 1% to 2% of online banking transactions are fraudulent, and at that level it's clearly not putting banks out of business (judging by profit numbers). However, whether we can accept as high a level of fraud in voting as in banking is another question.
None of this is to criticize the financial industry’s efforts to improve security! Rather, it’s to point out that try as we might, just because we can bank online doesn’t mean we should vote online.
This morning, the Supreme Court agreed to hear an appeal next term of United States v. Jones (formerly United States v. Maynard), a case in which the D.C. Circuit Court of Appeals suppressed evidence of a criminal defendant's travels around town, which the police collected using a tracking device they attached to his car. For more background on the case, consult the original opinion and Orin Kerr's previous discussions about the case.
No matter what the Court says or holds, this case will probably prove to be a landmark. Watch it closely.
(1) Even if the Court says nothing else, it will face the constitutionality of police use of tracking beepers to follow criminal suspects. In a pair of cases from the mid-1980s, the Court held that the police did not need a warrant to use a tracking beeper to follow a car around on public, city streets (Knotts) but did need a warrant to follow a beeper that was moved indoors (Karo) because it "reveal[ed] a critical fact about the interior of the premises." By direct application of these cases, the warrantless tracking in Jones seems constitutional, because it was restricted to movement on public, city streets.
Not so fast, said the D.C. Circuit. In Jones, the police tracked the vehicle 24 hours a day for four weeks. Citing the "mosaic theory often invoked by the Government in cases involving national security information," the court held that the whole can sometimes be more than the sum of its parts: tracking a car continuously for a month is constitutionally different in kind, not just degree, from tracking a car along a single trip. This is a new approach to the Fourth Amendment, one arguably at odds with opinions from other Courts of Appeals.
(2) This case gives the Court the opportunity to speak generally about the Fourth Amendment and location privacy. Depending on what it says, it may provide hints for lower courts struggling with the government's use of cell phone location information, for example.
(3) For support of its embrace of the mosaic theory, the D.C. Circuit cited a 1989 Supreme Court case, U.S. Department of Justice v. Reporters Committee for Freedom of the Press. In this case, which involved the Freedom of Information Act (FOIA), not the Fourth Amendment, the Court allowed the FBI to refuse to release compiled "rap sheets" about organized crime suspects, even though the rap sheets were compiled mostly from "public" information obtainable from courthouse records. In agreeing that the rap sheets nevertheless fell within a "personal privacy" exemption from FOIA, the Court embraced, for the first time, the idea that the whole may be worth more than the parts. The Court noted the difference "between scattered disclosure of the bits of information contained in a rap-sheet and revelation of the rap-sheet as a whole," and found a "vast difference between the public records that might be found after a diligent search of courthouse files, county archives, and local police stations throughout the country and a computerized summary located in a single clearinghouse of information." (FtT readers will see the parallels to the debates on this blog about PACER and RECAP.) In summary, it found that "practical obscurity" could amount to privacy.
Practical obscurity is an idea that hasn't gotten much traction in the courts since Reporters Committee. But it is an idea well-loved by many privacy scholars, including me; it helps explain our concerns about the privacy implications of aggregating and mining supposedly "public" data.
The Court, of course, may choose a narrow route for affirming or reversing the D.C. Circuit. But if it instead speaks broadly or categorically about the viability of practical obscurity as a legal theory, this case might set a standard that we will be debating for years to come.
In my research on privacy problems in PACER, I spent a lot of time examining PACER documents. In addition to researching the problem of "bad" redactions, I was also interested in learning about the pattern of redactions generally. To this end, my software looked for two redaction styles. One is the "black rectangle" redaction method I described in my previous post. This method sometimes fails, but most of these redactions were done successfully. The more common method (around two-thirds of all redactions) involves replacing sensitive information with strings of XXs.
Out of the 1.8 million documents it scanned, my software identified around 11,000 that appeared to have redactions. Many of them could be classified automatically (for example, "123-45-xxxx" is clearly a redacted Social Security number, while "Exxon" is a false positive), but I examined several thousand by hand.
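As a rough illustration (this is a hypothetical sketch, not the actual software used in this study), the "XX-string" heuristic might be expressed as a pair of regular expressions: one matching digit-grouped x-runs such as a redacted SSN, and one accepting free-standing x-runs while rejecting x's embedded in ordinary words like "Exxon":

```python
import re

# A redacted SSN: three digits, two digits, then four x's standing in
# for the final digits, e.g. "123-45-xxxx".
SSN_REDACTED = re.compile(r"\b\d{3}-\d{2}-[xX]{4}\b")

# A generic run of three or more x's not embedded in a word, so that
# "Exxon" (letters on either side of the x-run) does not match.
XX_RUN = re.compile(r"(?<![A-Za-z])[xX]{3,}(?![A-Za-z])")

def classify(text):
    """Very rough classification of a text snippet."""
    if SSN_REDACTED.search(text):
        return "redacted SSN"
    if XX_RUN.search(text):
        return "possible redaction"
    return "no redaction found"

print(classify("SSN: 123-45-xxxx"))    # redacted SSN
print(classify("Exxon Corporation"))   # no redaction found
print(classify("Account no. xxxxxx"))  # possible redaction
```

A real classifier would of course need many more patterns (account numbers, dates of birth, and so on), plus manual review of the ambiguous cases, as described above.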
Here is the distribution of the redacted documents I found:

| Type of Sensitive Information | No. of Documents |
|---|---|
| Social Security number | 4315 |
| Bank or other account number | 675 |
| Date of birth | 290 |
| Unique identifier other than SSN | 216 |
| Name of person | 129 |
| Phone, email, IP address | 60 |
| National security related | 26 |
To reiterate the point I made in my last post, I didn't have access to a random sample of the PACER corpus, so we should be cautious about drawing any precise conclusions about the distribution of redacted information in the entire PACER corpus.
Still, I think we can draw some interesting conclusions from these statistics. It's reasonable to assume that the distribution of redacted sensitive information is similar to the distribution of sensitive information in general. That is, assuming that parties who redact documents do a decent job, this list gives us a (very rough) idea of what kinds of sensitive information can be found in PACER documents.
The most obvious lesson from these statistics is that Social Security numbers are by far the most common type of redacted information in PACER. This is good news, since it's relatively easy to build software to automatically detect and redact Social Security numbers.
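A minimal sketch of what such automatic redaction could look like, assuming clean extracted text (real PACER filings are PDFs, so a practical tool would first need a reliable text layer). The federal redaction rules permit the last four digits of an SSN to remain:

```python
import re

# Unredacted SSNs in the usual XXX-XX-XXXX grouping.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_ssns(text):
    """Replace the first five digits of each SSN, keeping the last
    four, which the redaction rules allow to remain visible."""
    return SSN.sub(lambda m: "xxx-xx-" + m.group(0)[-4:], text)

print(redact_ssns("Debtor SSN 123-45-6789, account 42."))
# Debtor SSN xxx-xx-6789, account 42.
```

Detecting SSNs written without dashes, or split across line breaks by OCR, would take additional patterns, but the dash-grouped form covers the common case.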
Another interesting case is the "address" category. Almost all of the redacted items in this category—393 out of 449—appear in the District of Columbia District. Many of the documents relate to search warrants and police reports, often in connection with drug cases. I don't know if the high rate of redaction reflects the different mix of cases in the DC District, or an idiosyncratic redaction policy voluntarily pursued by the courts and/or the DC police but not by officials in other districts. It's worth noting that the redaction of addresses doesn't appear to be required by the federal redaction rules.
Finally, there's the category of "trade secrets," which is a catch-all term I used for documents whose redactions appear to be confidential business information. Private businesses may have a strong interest in keeping this information confidential, but the public interest in such secrecy here is less clear.
To summarize, out of 6208 redacted documents, there are 4315 Social Security numbers that could be redacted automatically by machine, 449 addresses whose redaction doesn't seem to be required by the rules of procedure, and 419 "trade secrets" whose release will typically only harm the party who fails to redact them.
That leaves around 1000 documents that would expose risky confidential information if not properly redacted, or about 0.05 percent of the 1.8 million documents I started with. A thousand documents is worth taking seriously (especially given that there are likely to be tens of thousands in the full PACER corpus). The courts should take additional steps to monitor compliance with the redaction rules and sanction parties who fail to comply with them, and they should explore techniques to automate the detection of redaction failures in these categories.
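The arithmetic behind these figures, using only the numbers quoted above:

```python
# Figures quoted in this post.
total_redacted = 6208    # redacted documents examined
ssn = 4315               # machine-detectable SSN redactions
addresses = 449          # addresses (mostly in the DC District)
trade_secrets = 419      # confidential business information
scanned = 1_800_000      # documents scanned overall

remaining = total_redacted - ssn - addresses - trade_secrets
print(remaining)                     # 1025, i.e. "around 1000"
print(f"{remaining / scanned:.3%}")  # 0.057%, on the order of 0.05 percent
```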
But at the same time, a sense of perspective is important. This tiny fraction of PACER documents with confidential information in them is a cause for concern, but it probably isn't a good reason to limit public access to the roughly 99.9 percent of documents that contain no sensitive information and may be of significant benefit to the public.
Thanks again to Carl Malamud and Public.Resource.Org for their support of my research.
When Brazilian president Dilma Rousseff visited China at the beginning of May, she came back with some good news (maybe too good to be entirely true). Among the announcements: Foxconn, the world's largest contract manufacturer of electronics, will invest US$12 billion to open a large industrial plant in the country, with the goal of producing iPads and other key electronic components locally.
The announcement was praised and quickly made the headlines of all the major newspapers. There is certainly reason for excitement. Brazil missed important waves of economic development, including industrialization (which only really happened in the 1940s) and the semiconductor wave, an industry that has shown only a few signs of development in the country until now.
The president's news also included the announcement that Foxconn would hire 100,000 employees for the new plant, 20% of them engineers. The numbers raised skepticism, for various reasons. Not only do they seem exaggerated, but Brazil simply does not have 20,000 engineers available for hire. In 2008 the number of engineers in the country was 750,000, and the projection is that if growth rates continue at current levels, a deficit of engineers is expected in the coming years.
The situation increases the pressure on universities to train engineers and to cope with the demands of development and innovation. This is a complex debate, but it is worth focusing on one aspect of the Brazilian university system: its isolation from the rest of the world. In short, Brazilian universities, in terms of both students and faculty, are almost entirely made up of Brazilians. As an example, at the University of Sao Paulo (USP), the largest and most important university in the country, only 2.8% of a total of 56,000 students are international. At most other universities the number of international students tends to be even smaller. Regarding faculty, the situation is no different. There have been a few recent efforts by some institutions (mostly private) to increase the number of international professors, but there is still a long way to go.
The low degree of internationalization is already causing problems. For instance, it makes it difficult for Brazilian universities to score well in world rankings. By way of example, no Brazilian university has ever been included in the top 200 of the Times Higher Education World Ranking, which pays special attention to internationalization efforts.
Even if rankings are not the main issue, the fact that the university system is essentially inward-looking creates real problems and hampers innovation. For instance, many of the engineers at Foxconn's new plant might end up being hired abroad. If some sort of integration with Brazilian universities is not established, it will be a missed opportunity for transferring technology and developing local capacity.
The challenges of integrating such a large operation with universities are huge. Even for small-scale cooperation, it turns out that the majority of universities in Brazil are unprepared to deal with international visitors, whether students or faculty. For an international professor to be formally hired by a local university, she will in most cases have to validate her degree in Brazil. The validation process can be Kafkaesque, requiring lots of paperwork (including "sworn translations") and time, often months or years. This poses a challenge not only for professors seeking to teach in Brazil, but also for Brazilians who obtained a degree abroad and return home. Local boards of education do not recognize international degrees, regardless of whether they were awarded by Princeton or the Free University of Berlin. Students return home formally with the same academic credentials they had before obtaining a degree abroad. The market often recognizes the value of international degrees, but the university system does not.
The challenges are visible at the practical level, too. Most universities do not have an office in charge of foreign admissions or of international faculty and students. Many professors who venture into the Brazilian university system go through the process without formal support, counting on the efforts and enthusiasm of local peer professors who take on the work of handling the details of the visit (obtaining a visa and work permit, or navigating the long bureaucratic steps needed to actually get the visitor's salary paid).
The lack of internationalization is bad for innovation. As pointed out by Princeton computer science professor Kai Li during a recent conference on technology cooperation between the US and China organized by the Center for Information Technology Policy, the presence of international students and faculty in US universities has been crucial for innovation. Kai emphasizes the importance of maintaining an ecosystem for innovation, which not only attracts the best students to local universities but also helps retain them after graduation. Many will work in research, create start-ups, or take jobs in the tech industry. The same point was made recently by Lawrence Lessig in his G8 talk in France, where he claimed that a great deal of innovation in the US was made by "outsiders."
Another important aspect of the lack of internationalization in Brazil is the lack of institutional support. Government funding organizations, such as CAPES, CNPQ, Fapesp and others, play an important role. But Brazil still lacks both public and private institutions aimed specifically at promoting integration, Brazilian culture and international exchange (along the lines of Fulbright, the Humboldt Foundation, or institutes like Cervantes, Goethe or the British Council).
As mentioned by Volker Grassmuck, a German media studies professor who spent 18 months as a researcher at the University of Sao Paulo: "The Brazilian funding institutions do have grants for visiting researchers, but the application has to be submitted locally by the institution. At the end of my year in Sao Paulo I applied to FAPESP, the research funding agency of the state of Sao Paulo, but it did not work out, since my research group did not have a research project formalized there."
He compares the situation with German universities, saying that "when I started teaching at Paderborn University, which is a young (founded in 1972), mid-sized (15,000 students) university in a small town, the first time I walked across campus I heard Indian, Vietnamese, Chinese, Arabic, Turkish and Spanish. At USP, during the entire year, I never heard anything but Portuguese." (See Volker's full interview below.)
Of course, any internationalization process at this point has to be very well planned. In Brazil, 25% of universities are public and 75% private. There is still a huge deficit of places for local students, even with the university population growing quite fast over the past six years. In 2004 Brazil had 4.1 million university students; by 2010 the number had reached 6.5 million. Yet only 20% of young Brazilians find a place in the university system, compared with 43% in Chile and 61% in Argentina. The country still struggles to provide access to its own students. But the effort of internationalization should not be understood as competing with expanding access. The challenge for Brazil is to do both things at the same time: expand access for local students and promote internationalization. If Brazil wants to play a role as an important emerging economy, that's the way to go (no one said it would be easy!). One thing should not exclude the other.
In this sense, João Victor Issler, an economics professor at EPGE (the Graduate School of Economics at Fundação Getulio Vargas), has a pragmatic view of the issue. He says: "inasmuch as Brazil develops economically, it will inexorably increase the openness of the university system. I am not saying that there should not be specific initiatives to increase internationalization, but an isolated process will be limited. More important than the internationalization of students and faculty is opening the economy to commerce and finance, a process that will directly affect long-term economic development and all its variables: education, innovation and the work force." João Victor's point is important. If internationalization follows development, there is already some catching up to do. The country has developed significantly in the past 16 years, but that has not been matched by any significant improvement in the internationalization of its universities.
A few strategies might help achieve more openness on the part of Brazilian universities without competing with the goal of expanding access for local students. One is the use of ICTs for international collaboration. Another is providing support to what is already working. But there is more that could be done. Here is a short list:
a) Development organizations such as the World Bank or the Inter-American Development Bank (IDB) can play an important role. Once the internationalization goal is defined, they could provide the necessary support, in partnership with local institutions.
b) Pay attention to the basics: create specific departments to centralize support for international students and faculty. They should be responsible for strategy, but also help with practical matters such as visas, travel, and coping with the local bureaucracy.
c) Make university websites available in English. The majority of Brazilian universities' websites are only in Portuguese. Even the webpage of the International Cooperation Commission at the University of Sao Paulo is mostly in Portuguese, and many of its English links are broken.
d) Increase the use of Information and Communication Technologies (ICTs) as a tool for cooperation and for integrating students and faculty into international projects. Expanding distance learning programs and cooperation mediated by ICTs is a no-brainer.
e) Create a prize system for internationalization projects, to be awarded every few years to the educational institution that best advanced that goal.
f) Consider a policy-effective tax break to the private sector (which might include private universities), in exchange for developing successful research centers that include an international component.
g) Brazilian organizations funding research should seek to increase support to international researchers and professors who would like to develop projects in Brazil.
h) Regional integration is the low-hanging fruit. Attracting the best students from other Latin American countries is an opportunity to kickstart international cooperation.
i) Map what is already in place, identifying what is working in terms of internationalization and supporting its expansion.
j) Brazil needs an innovation research lab. Large investment packages, such as the government's support for Foxconn's new plant, should include integration with universities and the creation of a public/private research center focused on innovation.
Below are the complete interviews with Volker Grassmuck and João Victor Issler, with their perspectives on the issue.
Interview with Volker Grassmuck
Volker is currently a lecturer at Paderborn University. He spent 18 months in Brazil as a visiting researcher affiliated with the University of Sao Paulo. His visit contributed significantly to the Brazilian copyright reform debate. He partnered with local researchers and law professors (as well as artists and NGO’s) to develop an innovative compensation system for artists, which has become part of the copyright reform debate.
1) How do you think the Brazilian Universities are prepared to receive students and professors/researchers from abroad?
I did not experience any special provisions for foreigners at USP. The inviting professor has to navigate university bureaucracy for the visiting researcher just as for any Brazilian researcher. I did experience a number of bizarre situations, but these were not specific to me, but the same for all in our research group.
E.g.: in order to receive my grant I was forced to open an account with the only bank that has an office on the USP Leste campus. The money from the Ford Foundation was already there, and exactly the same amount was supposed to be transferred to my account on the same day of each month. But every single month I had to remind the person in our group in charge of administrative issues that the money had not arrived. She would then go to the university administration to pick up a check that physically had to be carried to the bank and deposited there. If the single person in the administration in charge was ill, this would be delayed until that person came back.
Another path a foreigner can pursue is to apply for a professorship at a Brazilian university. I looked into this while I was there and got advice from a few people who had actually done it. The prerequisite would be "revalidating" my German Ph.D. This is a long procedure, requiring originals and copies of the diploma, grades, etc. authenticated by the Brazilian Consulate; a copy of the dissertation, maybe even a translation into Portuguese; an examination similar to the original Ph.D. examination plus some extras (e.g. "didactics") that you don't have at a German university; and a fee, in the case of USP, of R$ 1,530.00. In other words, Brazilian academia does not trust the Free University of Berlin to issue valid Ph.D.s and requires me to essentially go through the whole Ph.D. procedure all over again. And then I would have to take a "public competition," which is yet another procedure unlike anything required by a German university.
2) What is the situation at German universities? Are they prepared for, and do they receive, foreign students and professors/researchers?
Being German, I have not experienced being a foreign student or researcher here. But here are some impressions: when I started teaching at Paderborn University, which is a young (founded in 1972), mid-sized (15,000 students) university in a small town, the first time I walked across campus I heard Indian, Vietnamese, Chinese, Arabic, Turkish and Spanish. At USP, during the entire year, I never heard anything but Portuguese, except in the language course, where there were people from other Latin American countries, two women from Spain and one visiting researcher from the US. Staff at Paderborn is less international, but once or twice a week there is a presentation by a guest speaker from a university in Europe or beyond.
This is anecdotal, of course. I'm sure objective numbers would show a different picture. The Centrum für Hochschulentwicklung (CHE) does a regular ranking of German universities, which includes their international orientation. This year's result: the business faculties at universities of applied science are leading with 50%. Only 35% of universities were ranked as internationally oriented, with sociology and political science being the weakest. http://www.che-ranking.de/
I wonder how Brazilian universities would rank by the same standards.
3) Do you think there is a connection between innovation and foreign students at local universities?
No doubt about it. I did see an international orientation in two forms: 1. People read the international literature in the fields I'm interested in. But without actual people to enter into a dialogue with, this often remains a reproduction or at best an application of innovations to Brazil. 2. People travel and study abroad. A few students and professors travel extensively. Some students from our group went to Bolivia, Mozambique and France during my year there. So there is a certain internationalization "from Brazil," but my overwhelming impression was that there is very little academic internationalization "of Brazil."
Interview with João Victor Issler
Joao Victor Issler is an economics professor at the Fundacao Getulio Vargas Graduate School of Economics who has been closely following the recent internationalization efforts. His full bio is here.
a) How do you see the presence of international students and faculty at the Brazilian universities?
The presence of both is quite rare. There are a few isolated efforts here and there by a few groups. For example, in Economics, PUC-Rio (Pontifical Catholic University at Rio) and IMPA (National Institute for Pure and Applied Mathematics) have masters and Ph.D. students from Argentina, Chile, Peru, etc. Our school, EPGE (FGV Graduate School of Economics), hires professors from outside Brazil, but we do not have specific incentives for international students. Beyond Economics, I know that the University of Sao Paulo is seeking to attract international students, but it is hard to tell at which schools and how many.
b) Foxconn announced it will open a new plant in Brazil and will hire 20,000 engineers. We clearly don't have that many engineers available. Do you think the internationalization of universities could help the country build better capacity for developing its tech industry?
The announced numbers cannot be trusted. In any case, the general perception is that there is a deficit of engineers in Brazil. The tech market, however, is an endogenous variable, correlated with our GDP per capita, the level of education of the work force, the number of houses with access to drinkable water, infrastructure, etc. Inasmuch as Brazil develops economically, it will inexorably increase the openness of the university system. I am not saying that there should not be specific initiatives to increase internationalization, but an isolated process will be limited. More important than the internationalization of students and faculty is opening the economy to commerce and finance, a process that will directly affect long-term economic development and all its variables: education, innovation and the work force.
c) In other countries, there are institutions such as the Goethe Institute, or the Humboldt Foundation in Germany, that end up attracting international talents. The same goes for the US, with the Fulbright program. Why not in Brazil?
Germany and other European countries face problems due to the shape of their demographic pyramid, whose base is small compared to the top. They have the capacity to offer university places beyond German students, so it is possible to attract international students to fill that capacity. It is hard to say how this structure will evolve: they might reduce the installed capacity, or intensify the search for international students. And they are looking for Brazilian students, especially engineers. Generally, developed countries tend to attract better (and wealthier) students than developing countries do, which explains the movement towards Germany, the US or Canada. To me, the US is the most important model for the higher education industry. At the beginning of the 20th century there were already many Japanese and Chinese students at universities in the US and Europe; with the development of Japan, this movement decreased by the end of the century. Brazil today (for instance, the University of Sao Paulo) attracts a few good students from Latin America, and it could attract more if we develop faster than the rest of the region. In Brazil, CAPES (for which I was an advisor until recently) plays a role similar to the institutions you mentioned. They are engaged in several bilateral agreements for students and professors. This openness is certainly positive. For students and professors, it is important to consider hierarchy and quality: the best students tend to go to the US and Europe. We end up with the middle, and others go to countries where the development level is lower. As I mentioned, I don't believe it is possible to change this pattern unilaterally, unless we want to apply huge public resources to it. In my view, it is not a priority, given the current levels of subsidies already applied to higher education in comparison with fundamental education in Brazil.
d) In your opinion, and considering the experience of EPGE, what are the advantages or disadvantages of increasing internationalization at Brazilian universities? Would that reduce the space available for Brazilians?
Increasing the universe of choice always improves the final results. Therefore, I see only advantages, and I don't see how we can be against internationalization. However, as I mentioned, I believe a unilateral process will be of limited effect in changing higher education in Brazil (and also its impact on innovation and technology). Opening universities need not reduce the places for Brazilians, provided it is an organized and planned movement, correlated with our development level. If it is unilateral, then there can indeed be a loss for Brazilian students and professors.
e) Finally, do you see a relation between innovation and the internationalization of universities?
Yes, I do think the relation between the two is positive, but I don't think either of them can be treated as an isolated variable.
Earlier this week, Facebook expanded the roll-out of its facial recognition software to tag people in photos uploaded to the social networking site. Many observers and regulators responded with privacy concerns; EFF offered a video showing users how to opt-out.
Tim O'Reilly, however, takes a different tack:
Face recognition is here to stay. My question is whether to pretend that it doesn't exist, and leave its use to government agencies, repressive regimes, marketing data mining firms, insurance companies, and other monolithic entities, or whether to come to grips with it as a society by making it commonplace and useful, figuring out the downsides, and regulating those downsides.
...We need to move away from a Maginot-line like approach where we try to put up walls to keep information from leaking out, and instead assume that most things that used to be private are now knowable via various forms of data mining. Once we do that, we start to engage in a question of what uses are permitted, and what uses are not.
O'Reilly's point -- and face-recognition technology -- is bigger than Facebook. Even if Facebook swore off the technology tomorrow, it would be out there, and likely used against us unless regulated. Yet we can't decide on the proper scope of regulation without understanding the technology and its social implications.
By taking these latent capabilities (Riya was demonstrating them years ago; the NSA probably had them decades earlier) and making them visible, Facebook gives us more feedback on the privacy consequences of the tech. If part of that feedback is "ick, creepy" or worse, we should feed that into regulation of the technology's use everywhere, not just in Facebook's interface. Merely hiding the feature in the interface while leaving it active in the background would be deceptive: it would give us a false assurance of privacy. For all its blundering, Facebook seems to be blundering in the right direction now.
Privacy-invasive technology and the limits of privacy-protection should be visible. Visibility feeds more and better-controlled experiments to help us understand the scope of privacy, publicity, and the space in between (which Woody Hartzog and Fred Stutzman call "obscurity" in a very helpful draft). Then, we should implement privacy rules uniformly to reinforce our social choices.
Today, Joe Calandrino, Ed Felten and I are releasing a new result regarding the anonymity of fill-in-the-bubble forms. These forms, popular for their use with standardized tests, require respondents to select answer choices by filling in a corresponding bubble. Contradicting a widespread implicit assumption, we show that individuals create distinctive marks on these forms, allowing use of the marks as a biometric. Using a sample of 92 surveys, we show that an individual's markings enable unique re-identification within the sample set more than half of the time. The potential impact of this work is as diverse as use of the forms themselves, ranging from cheating detection on standardized tests to identifying the individuals behind “anonymous” surveys or election ballots.
If you've taken a standardized test or voted in a recent election, you’ve likely used a bubble form. Filling in a bubble doesn't provide much room for inadvertent variation. As a result, the marks on these forms superficially appear to be largely identical, and minor differences may look random and not replicable. Nevertheless, our work suggests that individuals may complete bubbles in a sufficiently distinctive and consistent manner to allow re-identification. Consider the following bubbles from two different individuals:
These individuals have visibly different stroke directions, suggesting a means of distinguishing between the two. While variation between bubbles may be limited, stroke direction and other subtle features permit differentiation between respondents. If we can learn an individual's characteristic features, we may use those features to identify that individual's forms in the future.
To test the limits of our analysis approach, we obtained a set of 92 surveys and extracted 20 bubbles from each of those surveys. We set aside 8 bubbles per survey to test our identification accuracy and trained our model on the remaining 12 bubbles per survey. Using image processing techniques, we identified the unique characteristics of each training bubble and trained a classifier to distinguish between the surveys’ respondents. We applied this classifier to the remaining test bubbles from a respondent. The classifier orders the candidate respondents based on the perceived likelihood that they created the test markings. We repeated this test for each of the 92 respondents, recording where the correct respondent fell in the classifier’s ordered list of candidate respondents.
If bubble marking patterns were completely random, a classifier could do no better than randomly guessing a test set’s creator, with an expected accuracy of 1/92 ≈ 1%. Our classifier achieves over 51% accuracy. The classifier is rarely far off: the correct answer falls in the classifier’s top three guesses 75% of the time (vs. 3% for random guessing) and its top ten guesses more than 92% of the time (vs. 11% for random guessing). We conducted a number of additional experiments exploring the information available from marked bubbles and potential uses of that information. See our paper for details.
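The rank-based scoring above can be sketched in a few lines. This is a hypothetical illustration of the evaluation metric only, not the authors' actual image-processing classifier; the toy data and respondent names are invented:

```python
# Hypothetical sketch of the rank-based evaluation: for each test set,
# the classifier returns candidate respondents ordered by likelihood,
# and we record how often the true author lands in the top k.

def rank_accuracies(results, ks=(1, 3, 10)):
    """results: list of (truth, ranked_candidate_list) pairs.
    Returns {k: fraction of tests where truth is in the top k}."""
    return {
        k: sum(truth in ranked[:k] for truth, ranked in results) / len(results)
        for k in ks
    }

# Toy data standing in for the 92-respondent experiment:
results = [
    ("alice", ["alice", "bob", "carol"]),   # correct at rank 1
    ("bob",   ["carol", "bob", "alice"]),   # correct at rank 2
    ("carol", ["alice", "bob", "carol"]),   # correct at rank 3
]
acc = rank_accuracies(results, ks=(1, 3))
print(acc)  # {1: 0.33..., 3: 1.0}
```

With random guessing over 92 candidates, the top-1 figure would hover near 1/92; the paper's 51% is what makes the marks usable as a biometric.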
Additional testing---particularly using forms completed at different times---is necessary to assess the real-world impact of this work. Nevertheless, the strength of these preliminary results suggests both positive and negative implications depending on the application. For standardized tests, the potential impact is largely positive. Imagine that a student takes a standardized test, performs poorly, and pays someone to repeat the test on his behalf. Comparing the bubble marks on both answer sheets could provide evidence of such cheating. A similar approach could detect third-party modification of certain answers on a single test.
The possible impact on elections using optical scan ballots is more mixed. One positive use is to detect ballot box stuffing---our methods could help identify whether someone replaced a subset of the legitimate ballots with a set of fraudulent ballots completed by herself. On the other hand, our approach could help an adversary with access to the physical ballots or scans of them to undermine ballot secrecy. Suppose an unscrupulous employer uses a bubble form employment application. That employer could test the markings against ballots from an employee’s jurisdiction to locate the employee’s ballot. This threat is more realistic in jurisdictions that release scans of ballots.
Appropriate mitigation of this issue is somewhat application specific. One option is to treat surveys and ballots as if they contain identifying information and avoid releasing them more widely than necessary. Alternatively, modifying the forms to mask marked bubbles can remove identifying information but, among other risks, may remove evidence of respondent intent. Any application demanding anonymity requires careful consideration of options for preventing creation or disclosure of identifying information. Election officials in particular should carefully examine trade-offs and mitigation techniques if releasing ballot scans.
This work provides another example in which implicit assumptions resulted in a failure to recognize a link between the output of a system (in this case, bubble forms or their scans) and potentially sensitive input (the choices made by individuals completing the forms). Joe discussed a similar link between recommendations and underlying user transactions two weeks ago. As technologies advance or new functionality is added to systems, we must explicitly re-evaluate these connections. The release of scanned forms combined with advances in image analysis raises the possibility that individuals may inadvertently tie themselves to their choices merely by how they complete bubbles. Identifying such connections is a critical first step in exploiting their positive uses and mitigating negative ones.
It’s historically been the case that papers published in an IEEE or ACM conference or journal must have their copyrights assigned to the IEEE or ACM, respectively. Most of us were happy with this sort of arrangement, but the new IEEE policy seems to apply more restrictions on this process. Matt Blaze blogged about this issue in particular detail.
The IEEE policy and the comparable ACM policy appear to be focused on creating revenue opportunities for these professional societies. Hypothetically, that income should result in cost savings elsewhere (e.g., lower conference registration fees) or in higher quality member services (e.g., paying the expenses of conference program committee members to attend meetings). In practice, neither of these is true. Regardless, our professional societies work hard to keep a paywall between our papers and their readership. Is this sort of behavior in our best interests? Not really.
What benefits the author of an academic paper? In a word, impact. Papers that are more widely read are more widely influential. Furthermore, widely read papers are more widely cited; citation counts are explicitly considered in hiring, promotion, and tenure cases. Anything that gets in the way of a paper’s impact is something that damages our careers and it’s something we need to fix.
There are three common solutions. First, we ignore the rules and post copies of our work on our personal, laboratory, and/or departmental web pages. Virtually any paper written in the past ten years can be found online, without cost, and conveniently cataloged by sites like Google Scholar. Second, some authors I’ve spoken to will significantly edit the copyright assignment forms before submitting them. Nobody apparently ever notices this. Third, some professional societies, notably the USENIX Association, have changed their rules. The USENIX policy completely inverts the relationship between author and publisher. Authors grant USENIX certain limited and reasonable rights, while the authors retain copyright over their work. USENIX then posts all the papers on its web site, free of charge; authors are free to do the same on their own web sites.
(USENIX ensures that every conference proceedings has a proper ISBN. Every USENIX paper is just as "published" as a paper in any other conference, even though printed proceedings are long gone.)
Somehow, the sky hasn’t fallen. So far as I know, the USENIX Association’s finances still work just fine. Perhaps it’s marginally more expensive to attend a USENIX conference, but then the service level is also much higher. The USENIX professional staff do things that are normally handled by volunteer labor at other conferences.
This brings me to the vote we had last week at the IEEE Symposium on Security and Privacy (the "Oakland" conference) during the business meeting. We had an unusually high attendance (perhaps 150 out of 400 attendees), as there were a variety of important topics under discussion. We spent maybe 15 minutes talking about the IEEE's copyright policy, and the resolution before the room was: should we reject the IEEE copyright policy and adopt the USENIX policy? Ultimately, there were two "no" votes and everybody else voted "yes." That's an overwhelming statement.
The question is what happens next. I’m planning to attend ACM CCS this October in Chicago and I expect we can have a similar vote there. I hope similar votes can happen at other IEEE and ACM conferences. Get it on the agenda of your business meetings. Vote early and vote often! I certainly hope the IEEE and ACM agree to follow the will of their membership. If the leadership don’t follow the membership, then we’ve got some more interesting problems that we’ll need to solve.
Sidebar: ACM and IEEE make money by reselling our work, particularly with institutional subscriptions to university libraries and large companies. As an ACM or IEEE member, you also get access to some, but not all, of the online library contents. If you make everything free (as in free beer), removing that revenue source, then you’ve got a budget hole to fill. While I’m no budget wizard, it would make sense for our conference registration fees to support the archival online storage of our papers. Add in some online advertising (example: startup companies, hungry to hire engineers with specialized talents, would pay serious fees for advertisements adjacent to research papers in the relevant areas), and I’ll bet everything would work out just fine.
Since we launched RECAP a couple of years ago, one of our top concerns has been privacy. The federal judiciary's PACER system offers the public online access to hundreds of millions of court records. The judiciary's rules require each party in a case to redact certain types of information from documents they submit, but unfortunately litigants and their counsel don't always comply with these rules. Three years ago, Carl Malamud did a groundbreaking audit of PACER documents and found more than 1600 cases in which litigants submitted documents with unredacted Social Security numbers. My recent research has focused on a different problem: cases where parties tried to redact sensitive information but the redactions failed for technical reasons. This problem occasionally pops up in news stories, but as far as I know, no one has conducted a systematic study.
To understand the problem, it helps to know a little bit about how computers represent graphics. The simplest image formats are bitmap or raster formats. These represent an image as an array of pixels, with each pixel having a color represented by a numeric value. The PDF format uses a different approach, known as vector graphics, which represents an image as a series of drawing commands: lines, rectangles, lines of text, and so forth.
Vector graphics have important advantages. Vector-based formats "scale up" gracefully, in contrast to the raster images that look "blocky" at high resolutions. Vector graphics also do a better job of preserving a document's structure. For example, text in a PDF is represented by a sequence of explicit text-drawing commands, which is why you can cut and paste text from a PDF document, but not from a raster format like PNG.
But vector-based formats also have an important disadvantage: they may contain more information than is visible to the naked eye. Raster images have a "what you see is what you get" quality—changing all the pixels in a particular region to black destroys the information that was previously in that part of the image. But a vector-based image can have multiple "layers." There might be a command to draw some text followed by a command to draw a black rectangle over the text. The image might look like it's been redacted, but the text is still "under" the box. And often extracting that information is a simple matter of cutting and pasting.
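The layering problem can be sketched with a toy model of a vector page. This is a deliberately simplified illustration (a Python list standing in for a PDF content stream, with a made-up fake SSN), not real PDF parsing:

```python
# Toy sketch of a vector-format page: an ordered list of drawing commands.
# A black rectangle drawn *after* the text covers it visually, but a text
# extractor that walks the command list still sees the "redacted" string.

page = [
    ("text", (100, 700), "Plaintiff's SSN: 123-45-6789"),  # fake SSN
    ("rect", (95, 690, 220, 20), "black"),   # drawn on top of the text
]

def extract_text(commands):
    """Naive cut-and-paste: collect every text command, ignoring shapes."""
    return " ".join(arg for kind, _, arg in commands if kind == "text")

print(extract_text(page))  # the covered text comes right back out
```

Rendering flattens the commands in order, so the rectangle wins on screen; extraction walks the same list and ignores the rectangle entirely, which is why cut-and-paste defeats this kind of "redaction."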
So how many PACER documents have this problem? We're in a good position to study this question because we have a large collection of PACER documents—1.8 million of them when I started my research last year. I wrote software to detect redaction rectangles—it turns out these are relatively easy to recognize based on their color, shape, and the specific commands used to draw them. Out of 1.8 million PACER documents, there were approximately 2000 documents with redaction rectangles. (There were also about 3500 documents that were redacted by replacing text with strings of Xes. I also excluded documents that were redacted by Carl Malamud before he donated them to our archive.)
Next, my software checked to see if these redaction rectangles overlapped with text. My software identified a few hundred documents that appeared to have text under redaction rectangles, and examining them by hand revealed 194 documents with failed redactions. The majority of the documents (about 130) appear to be from commercial litigation, in which parties have unsuccessfully attempted to redact trade secrets such as sales figures and confidential product information. Other improperly redacted documents contain sensitive medical information, addresses, and dates of birth. Still others contain the names of witnesses, jurors, plaintiffs, and one minor.
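The second step above boils down to a geometric test. Here's a minimal sketch of that overlap check (my actual code is Perl built on CAM::PDF; this Python version, with made-up coordinates, just shows the idea):

```python
# Minimal sketch of the overlap test: flag a document when any redaction
# rectangle intersects any text bounding box. Boxes are (x, y, w, h) in
# page coordinates; the sample boxes below are invented for illustration.

def overlaps(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Two axis-aligned boxes intersect iff they overlap on both axes.
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def suspicious(redaction_rects, text_boxes):
    """True if any redaction rectangle covers any text box."""
    return any(overlaps(r, t) for r in redaction_rects for t in text_boxes)

rects = [(95, 690, 220, 20)]
texts = [(100, 700, 150, 12),   # sits under the rectangle
         (100, 500, 150, 12)]   # elsewhere on the page
print(suspicious(rects, texts))  # True
```

A hit here is only a candidate, which is why the few hundred flagged documents still had to be examined by hand to confirm the 194 real failures.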
PACER reportedly contains about 500 million documents. We don't have a random sample of PACER documents, so we should be careful about trying to extrapolate to the entire PACER corpus. Still, it's safe to say there are thousands, and probably tens of thousands, of documents in PACER whose authors made unsuccessful attempts to conceal information.
It's also important to note that my software may not be detecting every instance of redaction failures. If a PDF was created by scanning in a paper document (as opposed to generated directly from a word processor), then it probably won't have a "text layer." My software doesn't detect redaction failures in this type of document. This means that there may be more than 194 failed redactions among the 1.8 million documents I studied.
A few weeks ago I wrote a letter to Judge Lee Rosenthal, chair of the federal judiciary's Committee on Rules of Practice and Procedure, explaining this problem. In that letter I recommend that the courts themselves use software like mine to automatically scan PACER documents for this type of problem. In addition to scanning the documents they already have, the courts should make it a standard part of the process for filing new documents with the courts. This would allow the courts to catch these problems before the documents are made available to the public on the PACER website.
My code is available here. It's experimental research code, not a finished product. We're releasing it into the public domain using the CC0 license; this should make it easy for federal and state officials to adapt it for their own use. Court administrators who are interested in adapting the code for their own use are especially encouraged to contact me for advice and assistance. The code relies heavily on the CAM::PDF Perl library, and I'm indebted to Chris Dolan for his patient answers to my many dumb questions.
Getting Redaction Right
So what should litigants do to avoid this problem? The National Security Agency has a good primer on secure redaction. The approach they recommend is the safest: completely delete the sensitive information in the original word processing document, replace it with innocuous filler (such as strings of XXes) as needed, and then convert the document to PDF. The NSA primer also explains how to check for other potentially sensitive information that might be hidden in a document's metadata.
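The key step in that workflow is that the substitution happens in the source text, before any PDF exists, so there is nothing left to hide under a rectangle. A sketch of that step (a hypothetical helper, not a tool from the NSA primer):

```python
# Sketch of the recommended workflow's first step: remove each sensitive
# string from the source text and substitute same-length filler *before*
# the document is ever converted to PDF. The sample names are invented.

def redact_source(text, secrets, filler="X"):
    for s in secrets:
        text = text.replace(s, filler * len(s))
    return text

doc = "Witness Jane Roe, DOB 01/02/1980, testified."
print(redact_source(doc, ["Jane Roe", "01/02/1980"]))
# Witness XXXXXXXX, DOB XXXXXXXXXX, testified.
```

Because the secret never reaches the PDF, no amount of layer-peeling can recover it; the same-length filler preserves the document's layout.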
Of course, there may be cases where this approach isn't feasible because a litigant doesn't have the original word processing document or doesn't want the document's layout to be changed by the redaction process. Adobe Acrobat's redaction tool has worked correctly when we've used it, and Adobe probably has the expertise to do it correctly. There may be other tools that work correctly, but we haven't had an opportunity to experiment with them so we can't say which ones they might be.
Regardless of the tool used, it's a good idea to take the redacted document and double-check that the information was removed. An easy way to do this is to simply cut and paste the "redacted" content into another document. If the redaction succeeded, no text should be transferred. This method will catch most, but not all, redaction failures. A more rigorous check is to remove the redaction rectangles from the document and manually observe what's underneath them. One of the scripts I'm releasing today, called remove_rectangles.pl, does just that. In its current form, it's probably not user-friendly enough for non-programmers to use, but it would be relatively straightforward for someone (perhaps Adobe or the courts) to build a user-friendly version that ordinary users could use to verify that the document they just attempted to redact actually got redacted.
One approach we don't endorse is printing the document out, redacting it with a black marker, and then re-scanning it to PDF format. Although this may succeed in removing the sensitive information, we don't recommend this approach because it effectively converts the document into a raster-based image, destroying useful information in the process. For example, it will no longer be possible to cut and paste (non-redacted) text from a document that has been redacted in this way.
Bad redactions are not a new problem, but they are taking on a new urgency as PACER documents become increasingly available on the web. Correct redaction is not difficult, but it does require both knowledge and care by those who are submitting the documents. The courts have several important roles they should play: educating attorneys about their redaction responsibilities, providing them with software tools that make it easy for them to comply, and monitoring submitted documents to verify that the rules are being followed.
This research was made possible with the financial support of Carl Malamud's organization, Public.Resource.Org.
Ann Kilzer, Arvind Narayanan, Ed Felten, Vitaly Shmatikov, and I have released a new research paper detailing the privacy risks posed by collaborative filtering recommender systems. To examine the risk, we use public data available from Hunch, LibraryThing, Last.fm, and Amazon in addition to evaluating a synthetic system using data from the Netflix Prize dataset. The results demonstrate that temporal changes in recommendations can reveal purchases or other transactions of individual users.
To help users find items of interest, sites routinely recommend items similar to a given item. For example, product pages on Amazon contain a "Customers Who Bought This Item Also Bought" list. These recommendations are typically public, and they are the product of patterns learned from all users of the system. If customers often purchase both item A and item B, a collaborative filtering system will judge them to be highly similar. Most sites generate ordered lists of similar items for any given item, but some also provide numeric similarity scores.
Although item similarity is only indirectly related to individual transactions, we determined that temporal changes in item similarity lists or scores can reveal details of those transactions. If you're a Mozart fan and you listen to a Justin Bieber song, this choice increases the perceived similarity between Justin Bieber and Mozart. Because similarity lists and scores are based on perceived similarity, your action may result in changes to these scores or lists.
Suppose that an attacker knows some of your past purchases on a site: past item reviews, social networking profiles, and real-world interactions are all rich sources of this information. New purchases will affect the perceived similarity between the new items and your past purchases, possibly causing visible changes to the recommendations provided for your previously purchased items. We demonstrate that an attacker can leverage these observable changes to infer your purchases. Among other things, these attacks are complicated by the fact that multiple users interact with a system simultaneously and that updates do not immediately follow a transaction.
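The core mechanism can be sketched with a toy co-purchase model. This is not the sites' actual algorithm (real systems use far more sophisticated similarity measures), just an illustration of how one user's action perturbs public item-to-item scores; users and items are invented:

```python
# Toy sketch of the inference idea: item-to-item similarity as raw
# co-purchase counts. Watching a target item's similarity scores shift
# between snapshots hints at a new transaction by some user.
from collections import Counter
from itertools import combinations

def similarity(purchases):
    """purchases: {user: set of items} -> co-purchase counts per item pair."""
    co = Counter()
    for items in purchases.values():
        for a, b in combinations(sorted(items), 2):
            co[(a, b)] += 1
    return co

before = {"u1": {"mozart", "bach"}, "u2": {"bieber", "kesha"}}
after = dict(before, u2={"bieber", "kesha", "mozart"})  # u2 plays Mozart

delta = similarity(after) - similarity(before)
print(delta)  # the (bieber, mozart) and (kesha, mozart) pairs tick upward
```

An attacker who already knows u2 listens to Bieber and Kesha sees exactly those pairs strengthen, and can infer the Mozart play—in aggregate and with delayed updates, this becomes the harder statistical problem the paper tackles.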
To evaluate our attacks, we use data from Hunch, LibraryThing, Last.fm, and Amazon. Our goal is not to claim privacy flaws in these specific sites (in fact, we often use data voluntarily disclosed by their users to verify our inferences), but to demonstrate the general feasibility of inferring individual transactions from the outputs of collaborative filtering systems. Among their many differences, these sites vary dramatically in the information that they reveal. For example, Hunch reveals raw item-to-item correlation scores, but Amazon reveals only lists of similar items. In addition, we examine a simulated system created using the Netflix Prize dataset. Our paper outlines the experimental results.
While inference of a Justin Bieber interest may be innocuous, inferences could expose anything from dissatisfaction with a job to health issues. Our attacks assume that a victim reveals certain past transactions, but users may publicly reveal certain transactions while preferring to keep others private. Ultimately, users are best equipped to determine which transactions would be embarrassing or otherwise problematic. We demonstrate that the public outputs of recommender systems can reveal transactions without user knowledge or consent.
Unfortunately, existing privacy technologies appear inadequate here, failing to simultaneously guarantee acceptable recommendation quality and user privacy. Mitigation strategies are a rich area for future work, and we hope to work towards solutions with others in the community.
Worth noting is that this work suggests a risk posed by any feature that adapts in response to potentially sensitive user actions. Unless sites explicitly consider the data exposed, such features may inadvertently leak details of these underlying actions.
This guest post is from Nick Doty, of the W3C and UC Berkeley School of Information. As a companion post to my summary of the position papers submitted for last month's W3C Do-Not-Track Workshop, hosted by CITP, Nick goes deeper into the substance and interaction during the workshop.
The level of interest and participation in last month's Workshop on Web Tracking and User Privacy — about a hundred attendees spanning multiple countries, dozens of companies, a wide variety of backgrounds — confirms the broad interest in Do Not Track. The relatively straightforward technical approach with a catchy name has led to, in the US, proposed legislation at both the state and federal level and specific mention by the Federal Trade Commission (it was nice to have Ed Felten back from DC representing his new employer at the workshop), and comparatively rapid deployment of competing proposals by browser vendors. Still, one might be surprised that so many players are devoting such engineering resources to a relatively narrow goal: building technical means that allow users to avoid tracking across the Web for the purpose of compiling behavioral profiles for targeted advertising.
In fact, Do Not Track (in all its variations and competing proposals) is the latest test case for how new online technologies will address privacy issues. What mix of minimization techniques (where one might classify Microsoft's Tracking Protection block lists) versus preference expression and use limitation (like a Do Not Track header) will best protect privacy and allow for innovation? Can parties agree on a machine-readable expression of privacy preferences (as has been heavily debated in P3P, GeoPriv and other standards work), and if so, how will terms be defined and compliance monitored and enforced? Many attendees were at the workshop not just to address this particular privacy problem — ubiquitous invisible tracking of Web requests to build behavioral profiles — but to grab a seat at the table where the future of how privacy is handled on the Web may be decided. The W3C, for its part, expects to start an Interest Group to monitor privacy on the Web and spin out specific work as new privacy issues inevitably arise, in addition to considering a Working Group to address this particular topic (more below). The Internet Engineering Task Force (IETF) is exploring a Privacy Directorate to provide guidance on privacy considerations across specs.
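For all the policy complexity, the header-based proposal itself is mechanically tiny: a user agent expresses the preference by attaching a single "DNT: 1" header to each HTTP request. A minimal sketch (standard-library Python; note that urllib normalizes stored header names to "Dnt"):

```python
# Minimal sketch of the Do Not Track header proposal: the entire
# client-side mechanism is one extra header on each request.
import urllib.request

req = urllib.request.Request("http://example.com/", headers={"DNT": "1"})
print(req.headers.get("Dnt"))  # urllib capitalizes stored header names
```

Everything contentious—what "tracking" means, whether servers must honor or acknowledge the signal—lives on the other side of that one header, which is why the workshop needed two days.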
At a higher level, this debate presents a test case for the process of building consensus and developing standards around technologies like tracking protection or Do Not Track that have inspired controversy. What body (or rather, combination of bodies) can legitimately define preference expressions that must operate at multiple levels in the Web stack, not to mention serve the diverse needs of individuals and entities across the globe? Can the same organization that defines the technical design also negotiate semantic agreement between very diverse groups on the meaning of "tracking"? Is this an appropriate role for technical standards bodies to assume? To what extent can technical groups work with policymakers to build solutions that can be enforced by self-regulatory or governmental players?
Discussion at the recent workshop confirmed many of these complexities: though the agenda was organized to roughly separate user experience, technical granularity, enforcement and standardization, overlap was common and inevitable. Proposals for an "ack" or response header brought up questions of whether the opportunity to disclaim following the preference would prevent legal enforcement; whether not having such a response would leave users confused about when they had opted back in; and how granular such header responses should be. In defining first vs. third party tracking, user expectations, current Web business models and even the same-origin security policy could point the group in different directions.
We did see some moments of consensus. There was general agreement that while user interface issues were key to privacy, trying to standardize those elements was probably counterproductive but providing guidance could help significantly. Regarding the scope of "tracking", the group was roughly evenly divided on what they would most prefer: a broad definition (any logging), a narrow definition (online behavioral advertising profiling only) or something in between (where tracking is more than OBA but excludes things like analytics or fraud protection, as in the proposal from the Center for Democracy and Technology). But in a "hum" to see which proposals workshop attendees opposed ("non-starters") no one objected to starting with a CDT-style middle ground — a rather shocking level of agreement to end two days chock full of debate.
For tech policy nerds, then, this intimate workshop about a couple of narrow technical proposals was heady stuff. And the points of agreement suggest that real interoperable progress on tracking protection — the kind that will help the average end user's privacy — is on the way. For the W3C, this will certainly be a topic of discussion at the ongoing meeting in Bilbao, and we're beginning detailed conversations about the scope and milestones for a Working Group to undertake technical standards work.
Thanks again to Princeton/CITP for hosting the event, and to Thomas and Lorrie for organizing it: bringing together this diverse group of people on short notice was a real challenge, and it paid off for all of us. If you'd like to see more primary materials: minutes from the workshop (including presentations and discussions) are available, as are the position papers and slides. And the W3C will post a workshop report with a more detailed summary very soon.
As reported in Fast Company, RichRelevance and Overstock.com teamed up to offer up to a $1,000,000 prize for improving "its recommendation engine by 10 percent or more."
If You Liked Netflix, You Might Also Like Overstock
When I first read a summary of this contest, it appeared they were following in Netflix's footsteps right down to releasing user data sans names. This did not end well for Netflix's users or for Netflix. Narayanan and Shmatikov were able to re-identify Netflix users using the contest dataset, and their research contributed greatly to Ohm's work on de-anonymization. After running the contest a second time, Netflix terminated it early in the face of FTC attention and a lawsuit that they settled out of court.
This time, Overstock is providing "synthetic data" to contest entrants, then testing submitted algorithms against unreleased real data. Tag line: "If you can't bring the data to the code, bring the code to the data." Hmm. An interesting idea, but short on details about the sharp edges that concern me most. I look forward to finding the time to play with the system and dataset. The good news is seeing companies recognize privacy concerns and respond with something interesting and new. That is, at least, a move in the right direction.
Place your bets now on which happens first: a contest winner with a 10% boost to sales, or researchers finding ways to re-identify at least 10% of the data?
There's more than a hint of theatrics in the draft PROTECT IP bill (pdf, via dontcensortheinternet ) that has emerged as son-of-COICA, starting with the ungainly acronym of a name. Given its roots in the entertainment industry, that low drama comes as no surprise. Each section name is worse than the last: "Eliminating the Financial Incentive to Steal Intellectual Property Online" (Sec. 4) gives way to "Voluntary action for Taking Action Against Websites Stealing American Intellectual Property" (Sec. 5).
Techdirt gives a good overview of the bill, so I'll just pick some details:
- Infringing activities. In defining "infringing activities," the draft explicitly includes circumvention devices ("offering goods or services in violation of section 1201 of title 17"), as well as copyright infringement and trademark counterfeiting. Yet that definition also brackets the possibility of "no [substantial/significant] use other than ...." Substantial could incorporate the "merely capable of substantial non-infringing use" test of Betamax.
- Blocking non-domestic sites. Sec. 3 gives the Attorney General a right of action over "nondomestic domain names", including the right to demand remedies from (A) domain name system server operators, (B) financial transaction providers, (C) Internet advertising services, and (D) "an interactive computer service (def. from 230(f)) [which] shall take technically feasible and reasonable measures ... to remove or disable access to the Internet site associated with the domain name set forth in the order, or a hypertext link to such Internet site."
- Private right of action. Sec. 3 and Sec. 4 appear to be near duplicates (I say appear, because unlike computer code, we don't have a macro function to replace the plaintiff, so the whole text is repeated with no diff), replacing "nondomestic domain" with "domain" and permitting private plaintiffs -- "a holder of an intellectual property right harmed by the activities of an Internet site dedicated to infringing activities occurring on that Internet site." Oddly, the statute doesn't say the simpler "one whose rights are infringed," so the definition must be broader. Could a movie studio claim to be hurt by the infringement of others' rights, or could the MPAA enforce on behalf of all its members? Sec. 4 is also missing (d)(2)(D).
- WHOIS. The "applicable publicly accessible database of registrations" gets a new role as source of notice for the domain registrant, "to the extent such addresses are reasonably available." (c)(1)
- Remedies. The bill specifies injunctive relief only, not money damages, but threat of an injunction can be backed by the unspecified threat of contempt for violating one.
- Voluntary action. Finally the bill leaves room for "voluntary action" by financial transaction providers and advertising services, immunizing them from liability to anyone if they choose to stop providing service, notwithstanding any agreements to the contrary. This provision jeopardizes the security of online businesses, making them unable to contract for financial services against the possibility that someone will wrongly accuse them of infringement. 5(a) We've already seen that it takes little to convince service providers to kick users off, in the face of pressure short of full legal process (see everyone vs Wikileaks, Facebook booting activists, and numerous misfired DMCA takedowns); this provision insulates that insecurity further.
In short, rather than "protecting" intellectual and creative industry, this bill would make it less secure, giving the U.S. a competitive disadvantage in online business. (Sorry, Harlan, that we still can't debug the US Code as true code.)
Not satisfied with seizing domain names, the Department of Homeland Security asked Mozilla to take down the MafiaaFire add-on for Firefox. Mozilla, through its legal counsel Harvey Anderson, refused. Mozilla deserves thanks and credit for a principled stand for its users' rights.
MafiaaFire is a quick plugin, as its author describes, providing redirection service for a list of domains: "We plan to maintain a list of URLs, and their duplicate sites (for example Demoniod.com and Demoniod.de) and painlessly redirect you to the correct site." The service provides redundancy, so that domain resolution -- especially at a registry in the United States -- isn't a single point of failure between a website and its would-be visitors. After several rounds of ICE seizure of domain names on allegations of copyright infringement -- many of which have been questioned as to both procedural validity and effectiveness -- redundancy is a sensible precaution for site-owners who are well within the law as well as those pushing its limits.
DHS seemed poised to repeat those procedural errors here. As Mozilla's Anderson blogged: "Our approach is to comply with valid court orders, warrants, and legal mandates, but in this case there was no such court order." DHS simply "requested" the takedown with no such procedural back-up. Instead of pulling the add-on, Anderson responded with a set of questions, including:
- Have any courts determined that MAFIAAfire.com is unlawful or illegal in any way? If so, on what basis? (Please provide any relevant rulings)
- Have any courts determined that the seized domains related to MAFIAAfire.com are unlawful, illegal or liable for infringement in any way? (please provide relevant rulings)
- Is Mozilla legally obligated to disable the add-on or is this request based on other reasons? If other reasons, can you please specify.
Unless and until the government can explain its authority for takedown of code, Mozilla is right to resist DHS demands. Mozilla's hosting of add-ons, and the Firefox browser itself, facilitate speech. They, like the domain name system registries ICE targeted earlier, are sometimes intermediaries necessary to users' communication. While these private actors do not have First Amendment obligations toward us, their users, we rely on them to assert our rights (and we suffer when some, like Facebook, are less vigilant guardians of speech).
As Congress continues to discuss the ill-considered COICA, it should take note of the problems domain takedowns are already causing. Kudos to Mozilla for bringing these latest errors to public attention -- and, as Tom Lowenthal suggests in the do-not-track context, standing up for its users.
cross-posted at Legal Tags
Last week, we hosted the W3C "Web Tracking and User Privacy" Workshop here at CITP (sponsored by Adobe, Yahoo!, Google, Mozilla and Microsoft). If you were not able to join us for this event, I hope to summarize some of the discussion embodied in the roughly 60 position papers submitted.
The workshop attracted a wide range of participants; the agenda included advocates, academics, government, start-ups and established industry players from various sectors. Despite the broad name of the workshop, the discussion centered around "Do Not Track" (DNT) technologies and policy, essentially ways of ensuring that people have control, to some degree, over web profiling and tracking.
Unfortunately, I'm going to have to expect that you are familiar with the various proposals before going much further, as the workshop position papers are necessarily short and assume familiarity. (If you are new to this area, the CDT's Alissa Cooper has a brief blog post from this past March, "Digging in on 'Do Not Track'", that mentions many of the discussion points. Technically, much of the discussion involved the mechanisms of the Mayer, Narayanan and Stamm IETF Internet-Draft from March and the Microsoft W3C member submission from February.)
Read on for more...
Technical Implementation: First, some quick background and updates. A number of papers point out that analogizing to a Do-Not-Call-like registry--I suppose a registry where netizens would sign up not to be tracked--would not work in the online tracking setting, so we should be careful not to shape the technology and policy too closely after Do-Not-Call. That said, the current technical proposals center around the Microsoft W3C submission and the Mayer et al. IETF submission, including some mix of a DNT HTTP header, a DNT DOM flag, and Tracking Protection Lists (TPLs). While the IETF submission focuses exclusively on the DNT HTTP header, the W3C submission includes all three of these technologies. Browsers are moving quickly here: Mozilla's Firefox 4 includes the DNT header, Microsoft's IE9 includes all three of these capabilities, Google's Chrome browser now allows extensions to send the DNT header through the WebRequest API, and Apple has announced that the next version of its Safari browser will support the DNT header.
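For concreteness, here is a minimal sketch of how the DNT header mechanism works: the client attaches "DNT: 1" to each HTTP request, and a cooperating server checks for it before tracking. The helper function name and example header set are my own illustration, not part of either submission.

```python
# Sketch of the DNT HTTP header mechanism: "DNT: 1" expresses an opt-out
# of tracking; an absent header expresses no preference. Names here are
# illustrative, not from the IETF or W3C submission texts.
def wants_do_not_track(headers):
    """Server-side check: did this request opt out of tracking?"""
    return headers.get("DNT") == "1"

# What a browser with the preference enabled might send:
request_headers = {
    "Host": "tracker.example",          # hypothetical third-party host
    "User-Agent": "ExampleBrowser/4.0", # hypothetical browser
    "DNT": "1",                         # the opt-out signal itself
}
print("honor DNT?", wants_do_not_track(request_headers))
```

The simplicity is the point: one header, one bit of preference, with all the policy weight resting on how servers respond to it.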
Some of the papers critique certain aspects of the three implementation options while some suggest other mechanisms entirely. CITP's Harlan Yu includes an interesting discussion of the problems with DOM flag granularity and access control problems when third-party code included in a first-party site runs as if it were first-party code. Toubiana and Nissenbaum talk about a number of problems with the persistence of DNT exceptions (where a user opts back in) when a resource changes content or ownership and then go on to suggest profile-based opting-back-in based on a "topic" or grouping of websites. Avaya's submission has a fascinating discussion of the problems with implementation of DNT within enterprise environments, where tracking-like mechanisms are used to make sure people are doing their jobs across disparate enterprise web-services; Avaya proposes a clever solution where the browser first checks to see if it can reach a resource only available internally to the enterprise (virtual) network, in which case it ignores DNT preferences for enterprise software tracking mechanisms. A slew of submissions from Aquin et al., Azigo and PDECC favor a culture of "self-tracking", allowing and teaching people to know more about the digital traces they leave and giving them (or their agents) control over the use and release of their personal information. CASRO-ESOMAR and Apple have interesting discussions of gaming TPLs: CASRO-ESOMAR points out that a competitor could require a user to accept a TPL that blocks traffic from their competitors and Apple talks about spam-like DNS cycling as an example of an "arms race" response against TPLs.
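Avaya's reachability check could be sketched roughly as follows. The probe hostname and the resolver-injection style are my own illustration of the idea, not Avaya's actual design.

```python
import socket

# Rough sketch of Avaya's proposal as described above: before exempting
# enterprise tracking mechanisms from DNT, check whether a host that only
# resolves inside the enterprise (virtual) network is reachable. The probe
# hostname is hypothetical.
def on_enterprise_network(probe_host="intranet-probe.corp.example",
                          resolve=socket.gethostbyname):
    try:
        resolve(probe_host)  # succeeds only when inside the intranet
        return True
    except OSError:          # NXDOMAIN, no network, etc.
        return False
```

A browser applying this heuristic would honor DNT for all trackers when the probe fails, and exempt only the enterprise's own tracking mechanisms when it succeeds.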
Definitions: Many of the papers addressed definitions definitions definitions... mostly about what "tracking" means and what terms like "third-party" should mean. Many industry submissions such as Paypal, Adobe, SIIA, and Google urge caution so that good types of "tracking", such as analytics and forensics, are not swept under the rug, and further argue that clear definitions of the terms involved in DNT are crucial to avoid disrupting user expectations, innovation and the online ecosystem. Paypal points out, as have others, that domain names are not good indicators of third-party status (e.g., metrics.apple.com is the Adobe Omniture service for Apple, and fb.com is equivalent to facebook.com). Ashkan Soltani's submission distinguishes definitions for DNT that are a "do not use" conception vs. a "do not collect" conception and argues for a solution that "does not identify", requiring the removal of any unique identifiers associated with the data. Soltani points out that this has interesting measurement/enforcement properties: if a user sees a unique ID in the do-not-identify case, the site is doing it wrong.
Enforcement: Some raised the issue of enforcement; Mozilla, for example, wants to make sure that there are reasonable enforcement mechanisms to deal with entities that ignore DNT mechanisms. On the other side, so to speak, are those calling for self-regulation, such as Comcast and SIIA, vs. those advocating for explicit regulation. The opinion polling research groups, CASRO-ESOMAR, call explicitly for regulation no matter what DNT mechanism is ultimately adopted, such that DNT header requests are clearly enforced or that TPLs are regulated tightly so as not to over-block legitimate research activities. Abine wants a cooperative market mechanism that results in a "healthy market system that is responsive to consumer outcome metrics" and that incentivizes advertising companies to work with privacy solution providers to increase consumer awareness and transparency around online tracking. Many of the industry players worried about definitions are also worried about over-prescription from a regulatory perspective; e.g., Datran Media is concerned about over-prescription via regulation that might stifle innovation in new business models. Hoofnagle et al. are evaluating the effectiveness of self-regulation, and find that the self-regulation programs currently in existence are greatly tilted in favor of industry and do not adequately embody consumer conceptions of privacy and tracking.
Research: There were a number of submissions addressing research that is ongoing and/or further needed to gauge various aspects of the DNT puzzle. The submissions from McDonald and Wang et al. describe user studies focusing, respectively, on what consumers expect from DNT--spoiler: they expect no collection of their data--and on gauging the usability and effectiveness of current opt-out tools. Both of these lines of work argue for usable mechanisms that communicate how developers implement/envision DNT and how users can best express their preferences via these tools. NIST's submission argues for empirical studies to set objective and usable standards for tracking protection and describes a current study of single sign-on (SSO) implementations. Thaw et al. discuss a proposal for incentivizing developers to communicate and design the various levels of rich data they need to perform certain kinds of ad targeting, and then use a multi-armed bandit model to illustrate game-theoretic ad targeting that can be tweaked based on how much data they are allowed to collect. Finally, CASRO-ESOMAR makes a plea for exempting legitimate research purposes from DNT, so that opinion polling and academic research can avoid bias.
Transparency: A particularly fascinating thread of commentary to me was the extent to which submissions touched on or entirely focused on issues of transparency in tracking. Grossklags argues that DNT efforts will spark increased transparency, but he's not sure that will overcome some common consumer privacy barriers they see in research. Seltzer talks about the intimate relationship between transparency and privacy and concludes that a DNT header is not very transparent--in operation, not use--while TPLs are more transparent in that they are a user-side mechanism that users can inspect, change and verify for correct operation. Google argues that there is a need for transparency in "what data is collected and how it is used", leaving out the ability for users to affect or control these things. In contrast, BlueKai also advocates for transparency in the sense of both accessing a user's profile and user "control" over the data it collects, but it doesn't and probably cannot extend this transparency to an understanding of how BlueKai's clients use this data. Datran Media describes their PreferenceCentral tool, which allows opting out of brands the user doesn't want targeting them (instead of ad networks, with which people are not familiar), which they argue is granular enough to avoid the "creepy" targeting feeling that users get from behavioral ads while still allowing high-value targeted advertising. Evidon analogizes to physical world shopping transactions and concludes, smartly: "Anytime data that was not explicitly provided is explicitly used, there is a reflexive notion of privacy violation" and "A permanently affixed 'Not Me' sign is not a representation of an engaged, meaningful choice."
W3C vs. IETF: Finally, Mozilla seems to be the only submission that wrestles a bit with the "which standards body?" question: W3C, IETF or some mix of both? They point out that the DNT header is a broader issue than just web browsing, so it would properly be tackled by the IETF, where HTTP resides, while the W3C effort could focus on TPLs, with a subcommittee for the DNT DOM element.
Finally, here are a bunch of submissions that don't fit into the above categories that caught my eye:
Soghoian argues that the quantity and quality of information claimed to be needed for security, law enforcement and fraud prevention is usually so great as to risk making it the exception that swallows the rule. Soghoian further recommends a total kibosh on certain nefarious technologies such as browser fingerprinting.
Lowenthal makes the very good point that browser vendors need to get more serious about managing security and privacy vulnerabilities, as that kind of risk can be best dealt with in the choke-point of the browsers that users choose, rather than the myriad of possible web entities. This would allow browsers to compete on privacy in terms of how privacy preserving they can be.
Mayer argues for a "generative" approach to a privacy choice signaling technology, highlighting that language preferences (via short codes) and browsing platform (via user-agent strings) are now sent as preferences in web requests and web sites are free to respond as they see fit. A DNT signaling mechanism like this would allow for great flexibility in how a web service responded to a DNT request, for example serving a DNT version of the site/resource, prompting the user for their preferences or asking for a payment before serving.
Yahoo points out that DNT will take a while to make it into the majority of browsers that users are using. They suggest a hybrid approach using the DAA CLEAR ad notice for backwards compatibility for browsers that don't support DNT mechanisms and the DNT header for an opt-out that is persistent and enforceable.
Whew; I likely left out a lot of good stuff across the remaining submissions, but I hope that readers get an idea of some of the issues in play and can consult the submissions they find particularly interesting as this develops. We hope to have someone pen a "part 2" to this entry describing the discussion during the workshop and what the next steps in DNT will be.
This afternoon the CA Senate Judiciary Committee had a brief time for proponents and opponents of SB 761 to speak about CA's Do Not Track legislation. In general, the usual people said the usual things, with a few surprises along the way.
Surprise 1: repeated discussion of privacy as a Constitutional right. For those of us accustomed to privacy at the federal level, it was a good reminder that CA is a little different.
Surprise 2: TechNet compared limits on Internet tracking to Texas banning oil drilling, and claimed DNT is "not necessary" so legislation would be "particularly bad." Is Kleiner still heavily involved in the post-Wade TechNet?
Surprise 3: the Chamber of Commerce estimated that DNT legislation would cost $4 billion in California, extrapolated from an MIT/Toronto study in the EU. Presumably they mean Goldfarb & Tucker's Privacy Regulation and Online Advertising, which is in my queue to read. Comments on donottrack.us raise concerns. Assuming even a generous opt-out rate of 5% of CA Internet users, $4B sounds high based on other estimates of the value of an entire clickstream at $5/month. I look forward to reading their paper, and to learning the Chamber's methods of estimating CA based on Europe.
Surprise 4: hearing about the problems of a chilling effect -- for job growth, not for online use due to privacy concerns. Similarly, hearing frustrations about a text that says something "might" or "may" happen, with no idea what will actually transpire -- about the text of the bill, not about the text of privacy policies.
On a 3 to 2 vote, they sent the bill to the next phase: the Appropriations Committee. Today's vote was an interesting start.
Today, Pete Warden and Alasdair Allan revealed that Apple’s iPhone maintains an apparently indefinite log of its location history. To show the data available, they produced and demoed an application called iPhone Tracker for plotting these locations on a map. The application allows you to replay your movements, displaying your precise location at any point in time when you had your phone. Their open-source application works with the GSM (AT&T) version of the iPhone, but I added changes to their code that allow it to work with the CDMA (Verizon) version of the phone as well.
When you sync your iPhone with your computer, iTunes automatically creates a complete backup of the phone to your machine. This backup contains any new content, contacts, and applications that were modified or downloaded since your last sync. Beginning with iOS 4, this backup also includes a SQLite database containing tables named ‘CellLocation’, ‘CdmaCellLocaton’ and ‘WifiLocation’. These correspond to the GSM, CDMA and WiFi variants of location information. Each of these tables contains latitude and longitude data along with timestamps. These tables also contain additional fields that appear largely unused on the CDMA iPhone that I used for testing -- including altitude, speed, confidence, “HorizontalAccuracy,” and “VerticalAccuracy.”
Interestingly, the WifiLocation table contains the MAC address of each WiFi network node you have connected to, along with an estimated latitude/longitude. The WifiLocation table in our two-month-old CDMA iPhone contains over 53,000 distinct MAC addresses, suggesting that this data is stored not just for networks your device connects to but for every network your phone was aware of (i.e. the network at the Starbucks you walked by -- but didn’t connect to).
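Since the log is an ordinary SQLite database, reading it is straightforward. The sketch below uses a tiny in-memory stand-in rather than a real backup file (whose path varies by platform), with a simplified schema based on the columns described above; the actual table layout may differ, and the Mac-epoch timestamp interpretation is my assumption based on Apple's usual convention.

```python
import sqlite3
from datetime import datetime, timedelta

# Build an in-memory stand-in for the backup's location database.
# Schema is simplified from the columns described in the post; a real
# CellLocation table has more fields (accuracy, altitude, speed, ...).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE CellLocation (Timestamp REAL, Latitude REAL, Longitude REAL)")
# Apple timestamps are (I assume) "Mac absolute time": seconds since 2001-01-01.
db.execute("INSERT INTO CellLocation VALUES (324000000.0, 40.35, -74.65)")

MAC_EPOCH = datetime(2001, 1, 1)
for ts, lat, lon in db.execute(
        "SELECT Timestamp, Latitude, Longitude FROM CellLocation ORDER BY Timestamp"):
    when = MAC_EPOCH + timedelta(seconds=ts)
    print(f"{when:%Y-%m-%d %H:%M} -> ({lat:.4f}, {lon:.4f})")
```

A plotting tool like iPhone Tracker is essentially this query plus a map: timestamped coordinates in, movement history out.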
Location information persists across devices, including upgrades from the iPhone 3GS to iPhone 4, which appears to be a function of the migration process. It is important to note that you must have physical access to the synced machine (i.e. your laptop) in order to access the synced location logs. Malicious code running on the iPhone presumably could also access this data.
Not only was it never made clear that the iPhone stores this data; the rationale behind storing it remains a mystery. To the best of my knowledge, Apple has not disclosed that this type or quantity of information is being stored. Although Apple does not appear to be using this information currently, in theory it could combine WiFi MAC addresses and GPS locations to create a highly accurate geolocation service.
The exact implications for mobile security (along with forensics and law enforcement) will be important to watch. What is most surprising is that this granularity of information is being stored at such a large scale on such a mainstream device.
Oak Ridge National Labs (one of the US national energy labs, along with Sandia, Livermore, Los Alamos, etc) had a bunch of people fall for a spear phishing attack (see articles in Computerworld and many other descriptions). For those not familiar with the term, spear phishing is sending targeted emails at specific recipients, designed to have them do an action (e.g., click on a link) that will install some form of software (e.g., to allow stealing information from their computers). This is distinct from spam, where the goal is primarily to get you to purchase pharmaceuticals, or maybe install software, but in any case is widespread and not targeted at particular victims. Spear phishing is the same technique used in the Google Aurora (and related) cases last year, the RSA case earlier this year, Epsilon a few weeks ago, and doubtless many others that we haven't heard about. Targets of spear phishing might be particular people within an organization (e.g., executives, or people on a particular project).
In this posting, I’m going to connect this attack to Internet voting (i-voting), by which I mean casting a ballot from the comfort of your home using your personal computer (i.e., not a dedicated machine in a precinct or government office). My contention is that in addition to all the other risks of i-voting, one of the problems is that people will click links targeted at them by political parties, and will try to cast their vote on fake web sites. The scenario is that operatives of the Orange party send messages to voters who belong to the Purple party claiming to be from the Purple party’s candidate for president and giving a link to a look-alike web site for i-voting, encouraging voters to cast their votes early. The goal of the Orange party is to either prevent Purple voters from voting at all, or to convince them that their vote has been cast and then use their credentials (i.e., username and password) to have software cast their vote for Orange candidates, without the voter ever knowing.
The percentage of users who fall prey to targeted attacks has been a subject of some controversy. While the percentage of users who click on spam emails has fallen significantly over the years as more people are aware of them (and as spam filtering has improved and mail programs have improved to no longer fetch images by default), spear phishing attacks have been assumed to be more effective. The result from Oak Ridge is one of the most significant pieces of hard data in that regard.
According to an article in The Register, of the 530 Oak Ridge employees who received the spear phishing email, 57 fell for the attack by clicking on a link (which silently installed software in their computers by exploiting a security vulnerability in Internet Explorer that was patched earlier this week – but presumably the patch wasn’t installed yet on their computers). Oak Ridge employees are likely to be well-educated scientists (but not necessarily computer scientists) - and hence not representative of the population as a whole. The fact that this was a spear phishing attack means that it was probably targeted at people with access to sensitive information, whether administrative staff, senior scientists, or executives (but probably not the person running the cafeteria, for example). Whether the level of education and access to sensitive information makes them more or less likely to click on links is something for social scientists to assess – I’m going to take it as a data point and assume a range of 5% to 20% of victims will click on a link in a spear phishing attack (i.e., that the roughly 10% rate seen at Oak Ridge is not off by more than a factor of two).
So as a working hypothesis based on this actual result, I propose that a spear phishing attack designed to draw voters to a fake web site to cast their votes will succeed with 5-20% of the targeted voters. With UOCAVA (military and overseas voters) representing around 5% of the electorate, I propose that a target of impacting 0.25% to 1% of the votes is not an unreasonable assumption. Now if we presume that the race is close and half of them would have voted for the "preferred" candidate anyway, this allows a spear phishing attack to capture an additional 0.12% to 0.50% of the vote.
If i-voting were to become more widespread – for example, to be available to any absentee voter – then these numbers double, because absentee voters are typically 10% of all voters. If i-voting becomes available to all voters, then we can guess that 5% to 20% of ALL votes can be coerced this way. At that point, we might as well give up elections, and go to coin tossing.
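The back-of-the-envelope arithmetic above can be made explicit. This is just a sketch of the estimates in the text; the function name and parameters are mine, and the "half would have voted that way anyway" discount follows the assumption stated above.

```python
# Estimate the fraction of ALL votes an attacker captures via spear
# phishing of online voters: the share of the electorate voting online,
# times the phishing success rate, discounting targets who would have
# voted for the attacker's candidate anyway.
def votes_shifted(online_share, phish_rate, already_preferred=0.5):
    return online_share * phish_rate * (1 - already_preferred)

# UOCAVA voters: ~5% of the electorate; phishing success: 5% to 20%.
low  = votes_shifted(0.05, 0.05)   # matches the ~0.12% figure above
high = votes_shifted(0.05, 0.20)   # matches the ~0.50% figure above
print(f"UOCAVA-only: {low:.3%} to {high:.3%} of all votes")

# All absentee voters (~10% of the electorate): the numbers double.
print(f"All absentee: {votes_shifted(0.10, 0.05):.3%} "
      f"to {votes_shifted(0.10, 0.20):.3%}")
```

In a race decided by a fraction of a percent, even the low end of these ranges is enough to swing the outcome.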
Considering the vast sums spent on advertising to influence voters, even for the very limited UOCAVA population, spear phishing seems like a very worthwhile investment for a candidate in a close race.
In its latest 2011 budget proposal, Congress makes deep cuts to the Electronic Government Fund. This fund supports the continued development and upkeep of several key open government websites, including Data.gov, USASpending.gov and the IT Dashboard. An earlier proposal would have cut the funding from $34 million to $2 million this year, although the current proposal would allocate $17 million to the fund.
Reports say that major cuts to the e-government fund would force OMB to shut down these transparency sites. This would strike a significant blow to the open government movement, and I think it’s important to emphasize exactly why shuttering a site like Data.gov would be so detrimental to transparency.
On its face, Data.gov is a useful catalog. It helps people find the datasets that government has made available to the public. But the catalog is really a convenience that doesn’t necessarily need to be provided by the government itself. Since the vast majority of datasets are hosted on individual agency servers—not directly by Data.gov—private developers could potentially replicate the catalog with only a small amount of effort. So even if Data.gov goes offline, nearly all of the data still exist online, and a private developer could go rebuild a version of the catalog, maybe with even better features and interfaces.
But Data.gov also plays a crucial behind-the-scenes role, setting standards for open data and helping individual departments and agencies live up to those standards. Data.gov establishes a standard, cross-agency process for publishing raw datasets. The program gives agencies clear guidance on the mechanics and requirements for releasing each new dataset online.
There’s a Data.gov manual that formally documents and teaches this process. Each agency has a lead Data.gov point-of-contact, who’s responsible for identifying publishable datasets and for ensuring that when data is published, it meets information quality guidelines. Each dataset needs to be published with a well-defined set of common metadata fields, so that it can be organized and searched. Moreover, thanks to Data.gov, all the data is funneled through at least five stages of intermediate review—including national security and privacy reviews—before final approval and publication. That process isn’t quick, but it does help ensure that key goals are satisfied.
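To illustrate what a common metadata standard buys you, here is a hypothetical dataset record of the kind the process requires; the field names below are my own illustration, not the actual field list from the Data.gov manual.

```python
# Illustrative (hypothetical) metadata record for a dataset submission;
# the real required fields are defined in the Data.gov manual.
dataset_record = {
    "title": "Example Agency Spending, FY2010",
    "agency": "Example Agency",
    "description": "Quarterly obligations by program, as raw CSV.",
    "keywords": ["spending", "budget"],
    "access_url": "http://www.example.gov/data/spending_fy2010.csv",
    "release_date": "2011-04-01",
}

# The kind of minimal completeness check a cross-agency standard enables:
required = {"title", "agency", "description", "access_url"}
missing = required - dataset_record.keys()
assert not missing, f"missing metadata fields: {missing}"
```

It is this uniformity, not any one record, that makes the catalog searchable and the review pipeline workable across agencies.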
When agency staff have data they want to publish, they use a special part of the Data.gov website, which outside users never see, called the Data Management System (DMS). This back-end administrative interface allows agency points-of-contact to efficiently coordinate publishing activities agency-wide, and it gives individual data stewards a way to easily upload, view and maintain their own datasets.
My main concern is that this invaluable but underappreciated infrastructure will be lost when IT systems are de-funded. The individual roles and responsibilities, the informal norms and pressures, and perhaps even the tacit authority to put new datasets online would likely also disappear. The loss of structure would probably mean that sharply reduced amounts of data will be put online in the future. The datasets that do get published in an ad hoc way would likely lack the uniformity and quality that the current process creates.
Releasing a new dataset online is already a difficult task for many agencies. While the current standards and processes may be far from perfect, Data.gov provides agencies with a firm footing on which they can base their transparency efforts. I don’t know how much funding is necessary to maintain these critical back-end processes, but whatever Congress decides, it should budget sufficient funds—and direct that they be used—to preserve these critically important tools.
Over the last few weeks, I've described the chaotic attempts of the State of New Jersey to come up with tamper-indicating seals and a seal use protocol to secure its voting machines.
A seal use protocol can allow the seal user to gain some assurance that the sealed material has not been tampered with. But here is the critical problem with using seals in elections: Who is the seal user that needs this assurance? It is not just election officials: it is the citizenry.
Democratic elections present a uniquely difficult set of problems to be solved by a security protocol. In particular, the ballot box or voting machine contains votes that may throw the government out of office. Therefore, it's not just the government—that is, election officials—that needs evidence that no tampering has occurred; the public and the candidates need it too. The election officials (representing the government) have a conflict of interest: corrupt election officials may hire corrupt seal inspectors, deliberately hire incompetent inspectors, or deliberately fail to train them. Even if the public officials who run the elections are not at all corrupt, the democratic process requires sufficient transparency that the public (and the losing candidates) can be convinced that the process was fair.
In the late 19th century, after widespread, pervasive, and long-lasting fraud by election officials, democracies such as Australia and the United States implemented election protocols in an attempt to solve this problem. The struggle to achieve fair elections lasted for decades and was hard-fought.
A typical 1890s solution works as follows: At the beginning of election day, in the polling place, the ballot box is opened so that representatives of all political parties can see for themselves that it is empty (and does not contain hidden compartments). Then the ballot box is closed, and voting begins. The witnesses from all parties remain near the ballot box all day, so they can see that no one opens it and no one stuffs it. The box has a mechanism that rings a bell whenever a ballot is inserted, to alert the witnesses. At the close of the polls, the ballot box is opened, and the ballots are counted in the presence of witnesses.
In principle, then, there is no single person or entity that needs to be trusted: the parties watch each other.
Democratic elections pose difficult problems not just for security protocols in general, but for seal use protocols in particular. Consider the use of tamper-evident security seals in an election where a ballot box is to be protected by seals while it is transported and stored by election officials out of the sight of witnesses. A good protocol for the use of seals requires that seals be chosen with care and deliberation, and that inspectors have substantial and lengthy training on each kind of seal they are supposed to inspect. Without trained inspectors, it is all too easy for an attacker to remove and replace the seal without likelihood of detection.
Consider an audit or recount of a ballot box, days or weeks after an election. The box, which has been in the custody of election officials, is brought back out in the presence of witnesses from the political parties. The tamper-evident seals are inspected and removed—but by whom?
If elections are to be conducted by the same principles of transparency established over a century ago, the rationale for the selection of particular security seals must be made transparent to the public, to the candidates, and to the political parties. Witnesses from the parties and from the public must be able to receive training on detection of tampering of those particular seals. There must be (the possibility of) public debate and discussion over the effectiveness of these physical security protocols.
It is not clear that this is practical. To my knowledge, such transparency in seal use protocols has never been attempted.
Bibliographic citation for the research paper behind this whole series of posts:
Security Seals On Voting Machines: A Case Study, by Andrew W. Appel. Accepted for publication, ACM Transactions on Information and System Security (TISSEC), 2011.
Now that the FCC has finally acted to safeguard network neutrality, the time has come to take the next step toward creating a level playing field on the rest of the Information Superhighway. Network neutrality rules are designed to ensure that large telecommunications companies do not squelch free speech and online innovation. However, it is increasingly evident that broadband companies are not the only threat to the open Internet. In short, federal regulators need to act now to safeguard social network neutrality.
There could be no better time to examine this issue. Facebook is the dominant social network in countries other than Brazil, where everybody uses Friendster or something. Facebook has achieved near-monopoly status in the social networking market. It now dominates the web, permeating all aspects of the information landscape. More than 2.5 million websites have integrated with Facebook. Indeed, there is evidence that people are turning to social networks instead of faceless search engines for many types of queries.
Social networks will soon be the primary gatekeepers standing between average Internet users and the web’s promise of information utopia. But can we trust them with this new-found power? Friends are unlikely to be an unbiased or complete source of information on most topics, creating silos of ignorance among the disparate components of the social graph. Meanwhile, social networks will have the power to make or break Internet businesses built atop the enormous quantity of referral traffic they will be able to generate. What will become of these businesses when friendships and tastes change? For example, there is recent evidence that social networks are hastening the decline of the music industry by promoting unknown artists who provide their music and streaming videos for free.
Social network usage patterns reflect deep divisions of race and class. Unregulated social networks could rapidly become virtual gated communities, with users cut off from others who could provide them with a diversity of perspectives. Right now, there’s no regulation of the immense decision-influencing power that friends have, and there are no measures in place to ensure that friends provide a neutral and balanced set of viewpoints. Fortunately, policy-makers have a rare opportunity to preempt the dangerous consequences of leaving this new technology to develop unchecked.
The time has come to create a Federal Friendship Commission to ensure that the immense power of social networks is not abused. For example, social network users who have their friend requests denied currently have no legal recourse. Users should have the option to appeal friend rejections to the FFC to verify that they don’t violate social network neutrality. Unregulated social networks will give many users a distorted view of the world dominated by the partisan, religious, and cultural prejudices of their immediate neighbors in the social graph. The FFC can correct this by requiring social networks to give equal time to any biased wall post.
However, others have suggested lighter-touch regulation, simply requiring each person to have friends of many races, religions, and political persuasions. Still others have suggested allowing information harms to be remedied through direct litigation—perhaps via tort reform that recognizes a new private right of action against violations of the “duty to friend.” As social networking software will soon be found throughout all aspects of society, urgent intervention is needed to forestall “The Tyranny of The Farmville.”
Of course, social network neutrality is just one of the policy tools regulators should use to ensure a level playing field. For example, the Department of Justice may need to more aggressively employ its antitrust powers to combat the recent dangerous concentration of social networking market share on popular micro-blogging services. But enacting formal social network neutrality rules is an important first step towards a more open web.