NIST, the US government's standards-setting organization, has issued a draft report condemning the use of paperless electronic voting. The NIST report calls for paper trails and builds on an earlier paper by Ron Rivest (of RSA fame) and John Wack which defined the concept of "Software Independence" in voting systems: the idea that voting systems should allow election officials to recount ballots independently of the voting machines' software. The Washington Post correctly points out that this report repeats the contention of the computer security community that "a single programmer could 'rig' a major election."
The report states:
... the lack of an independent audit capability in DRE [Direct Recording Electronic] voting systems is one of the main reasons behind continued questions about voting system security and diminished public confidence in elections. NIST does not know how to write testable requirements to make DREs secure, and NIST’s recommendation to the STS [Security and Transparency Subcommittee] is that the DRE in practical terms cannot be made secure. Consequently, NIST and the STS recommend that VVSG 2007 should require voting systems to be of the SI [Software Independent] “class”...
NIST, as a government organization, probably needs to be somewhat restrained in its criticism of the companies that produce the electronic voting machines on the market today, but it couldn't resist making it clear that:
... much evidence has been produced that voting systems in general are not developed according to rigorous models of secure code development nor tested with the rigor of other security-critical applications.
Those familiar with the issue will, of course, recognize this as a commentary on the oft-demonstrated incompetence and irresponsibility of companies like Diebold.
My hope is that this report will help remove software-dependent voting systems from the field as soon as possible. Democracy depends on accurate votes. But the citizens of a democracy will only vote if they believe that the vote is accurate. All the evidence we have demonstrates clearly that even though accurate electronic voting systems can unquestionably be built, it is essentially impossible to prove that these systems actually are accurate. Given the importance of confidence in the vote, we must reject systems that don't inspire voters' confidence. We must have a paper trail. We must have software-independent voting systems.
One of the things that I love about the computer business is that often even the wildest predictions can seem mundane and old hat as soon as they are made. Consider, for instance, Ray Kurzweil's recent presentation to the SC06 SuperComputing Conference in which, among other things, he predicts that soon:
... we'll routinely be in virtual reality environments. Instead of making a cell call, we could "meet" someone in a virtual world and take a walk on a virtual beach and chat. Business meetings and conference calls will be held in calming or inspiring virtual locations
Well, for me and the hundreds of thousands of people who have frequented Second Life and similar virtual environments, the virtual walk on the beach is already commonplace. As far as business meetings go, folks like Jeff Barr at Amazon, IBM's CEO Samuel J. Palmisano, and others have already begun holding business meetings in Second Life. Additionally, it has been widely reported that quite a number of people make their livings, or contribute to their income, by running businesses within the synthetic Second Life economy.
We've still got some time to see if Kurzweil's other predictions will come true. However, there is no doubt that virtual worlds are already a reality and will only become more real in the future.
One potential application of virtual worlds that I've been surprised not to see more of is providing alternative worlds to those who are physically handicapped to the point of being immobilized or otherwise bed-ridden. I imagine that for such people, something like "Second Life" would present the opportunity for a new, more satisfying life... As long as they maintain enough motor control to be able to operate a keyboard or other controls, the otherwise immobilized should be able to interact within Second Life on equal footing with those not similarly afflicted. So, I'd like to add to Kurzweil's predictions by suggesting that in the future, we'll find that those who are physically incapable of normal or satisfying participation in the "first life" will instead find in worlds like Second Life an environment that allows them to reach their maximum potential. For some, Second Life will become their First Life...
Akela Talamasca wonders on the Second Life Insider blog why there is such a disparity between the number of Second Life "residents" and those who are logged in at any one time. Certainly, when less than 1% of the total resident population is online, one must wonder if there is a problem with Second Life. While there are many possible explanations, one may be that Second Life, like every "metaverse," "virtual world," or MMORPG, suffers from the demotivating effects of network latency, packet loss, or general "lag." We've learned to use our networks for things that they aren't quite ready for yet...
Seemingly in answer to Akela's questions, Communications of the ACM published an interesting article in this month's issue which seeks to answer the question: "How Sensitive are Online Gamers to Network Quality?". The authors of the paper (Chen, Huang, Li) have demonstrated a mathematical relationship between measures of network quality (lag, ping time, etc.) and the time that gamers stay online. Of course, these results are completely expected -- in general. What is most interesting about the results is that by establishing the sensitivity of gamers to a number of inter-related QoS metrics, it now becomes possible to reason about the tradeoffs to be made in QoS optimization. For instance, they clearly show that "packet loss is less tolerable than network latency." Thus, one might comfortably increase the redundancy in messages to reduce the impact of packet loss even at the cost of increased perceived latency...
"Given the strong relationship between game playing time and network QoS, we can “predict” the former if we know the latter. Forecasting when a player will leave a game could provide useful hints for system performance optimization and resource allocation. ...
...systems can be designed to automatically adapt to network quality in real time in order to improve user satisfaction. For example, we might enhance the smoothness of game playing in high-risk sessions by increasing the packet rate (if the risks are caused by long propagation delay or random loss on a noisy link rather than transient congestion) or the degree of data redundancy, so players would be less likely to leave prematurely." [From the paper.]
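The tradeoff the authors describe -- spending bandwidth on duplicate packets to mask loss, at some cost in perceived latency -- can be sketched roughly. The thresholds and copy counts below are my own illustrative guesses, not values from the paper:

```python
# Sketch: because "packet loss is less tolerable than network latency,"
# a game server can key its redundancy policy on the measured loss rate,
# duplicating packets on lossy links even though the extra traffic may
# slightly increase latency. Thresholds here are illustrative only.

def redundancy_level(loss_rate):
    """How many copies of each game-state packet to send."""
    if loss_rate < 0.01:
        return 1  # clean link: no duplication needed
    if loss_rate < 0.05:
        return 2  # two copies turn an independent 5% loss into 0.25%
    return 3      # noisy link: duplicate aggressively

print(redundancy_level(0.002))  # -> 1
print(redundancy_level(0.03))   # -> 2
```

The point of the paper is that measurements like these let you make such choices deliberately rather than by guesswork.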
Today, Chinese state media announced that the number of blogs in China passed 34 million during August. That's more than existed in the entire world until very recently... Of course, as is the case elsewhere, the Chinese have found that about 70% of existing blogs are actually dead or dormant... The article claims that 17 million people in China consider themselves to be "Bloggers" and about 75 million regularly read blogs.
In other recent news from the Chinese blogosphere, Xinhua profiles the blog of Guo Feng, a 48-year-old cab driver in Hunan province. The blog, devoted to discussions of "housing prices, reforms and officials and ordinary people from the point of view of a laid-off worker..." has had 150,000 visitors in two months, with as many as 5,000 visits per day. Another recent Xinhua story discusses a brewing scandal over "sex for [acting] roles" that has been heated by revelations published on the blog of 22-year-old Chinese actress Xiao Qiong. As in the West, the blogosphere has become a "source" for Chinese "journalists".
Of course, when reading about blogging in China, it is useful to remember that the Chinese have recently gained quite a bit of attention by putting in place systems to monitor and control blog and news content. For some perspective on this, see the recent article on the Council on Foreign Relations website titled: China: Pressing for More Control.
A slow stream of publications is revealing some of the inner workings of Google but there is much less documentation of Amazon and other GAYME sites. Now, a paper summarizing Peter Bodik's recent Master's Thesis, based on research at Ebates.com and Amazon, provides some insight into the challenges of operating large scale web services. Bodik's paper, Advanced Tools for Operators at Amazon.com, was presented at HotAC'06 in June but seems to have so far attracted surprisingly little attention in the Blogosphere. [Greg Linden of Findory and Robin Harris of StorageMojo have recently commented on the paper.]
Bodik and his co-authors identify the main challenges facing operators at Amazon:
- Failure propagation
- Lack of global dependency knowledge
- Overwhelming amounts of low-level information
While these challenges are similar to those faced by any system operator, traditional large-scale operators typically "work in large corporate data centers, usually administering third-party software that doesn’t change very often." An enterprise environment usually has a much more stable base of software and operators than does a place like Amazon, where "hundreds of software developers, working also as operators, administer rapidly changing software." Thus, while in a traditional environment a small, stable set of operators might build up enough experience to understand and troubleshoot the running software, at a place like Amazon both the software and those who operate it are constantly in flux. To address this situation, a number of experimental tools have been developed and are described in these papers.
One tool, Maya, presents graphical displays of system status while showing the dependencies between systems, thus providing operators with some of the context they require to understand problems and predict their impact. A wiki-like system is used to allow team members to build component-specific dashboards which display key operational metrics. The "wiki-like" attributes of this system allow the dashboards to be easily updated as operators learn over time which metrics are the most interesting and how they relate to each other.
Rather than simply displaying current system status to operators, the research here also discusses a number of experiments with techniques to automatically detect and report anomalies in system behavior. The idea is that failures in the system will typically result in detectable changes in things like the frequency with which users view specific pages. Trivially, one can see that display of an "internal error" page will normally be much more frequent after an error than before. But, more interestingly, one could also watch for sudden drops in the traffic to some "goal" page and conclude that the change in traffic was probably due to some change in system behavior that alters users' path through the site.
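This kind of frequency-based anomaly detection can be sketched in a few lines. The sliding window and the three-sigma threshold below are my assumptions for illustration, not parameters from Bodik's tools:

```python
# Flag a page when its current request rate deviates sharply from its
# recent history -- e.g., a sudden drop in traffic to a "goal" page like
# the checkout confirmation suggests something upstream has broken.
from statistics import mean, stdev

def is_anomalous(history, current, sigmas=3.0):
    """history: requests/min over recent intervals; current: latest value."""
    mu, sd = mean(history), stdev(history)
    if sd == 0:
        return current != mu
    return abs(current - mu) > sigmas * sd

checkout_rate = [120, 118, 125, 122, 119, 121, 117, 123]
print(is_anomalous(checkout_rate, 40))   # sudden drop -> True
print(is_anomalous(checkout_rate, 124))  # within normal range -> False
```

A real deployment would have to handle daily and weekly traffic cycles, which is presumably part of what made these experiments non-trivial.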
Another set of tools brings into the realm of system operation the "recommendation" technologies that have made Amazon the success that it is. Bodik shows how web-based and command line tools are monitored to learn how operators respond to recurring events. This learning is later used to have the system automatically generate recommended actions when a problem reappears. New operators are thus able to come up to speed faster since the system will recommend to them those solution approaches that appear to have been learned by earlier operators whose advice and guidance may be no longer available.
While some of the tools and techniques that are required to run the largest sites are either overkill or irrelevant for the needs of smaller sites, it seems to me that the content of these papers will be useful to just about everyone with a serious interest in site operations issues. The papers mentioned and Bodik's Master's Thesis are a great addition to our common knowledge of how these large sites are built and operate. Hopefully, they will help people address the needs of smaller sites as well.
[Chubby's] ability to provide swift name updates without polling each name individually is so appealing that Chubby now provides name service for most of the company's systems.
As many know, I've often argued against the horrible waste that comes from polling for RSS and Atom syndication files. But feed syndication isn't the only application area where polling is overused. As Google demonstrates, polling simply doesn't scale for any system doing thousands or millions of DNS lookups per second. (Phil and I ran into many problems with overloading DNS servers while working on Web Traffic Analysis systems at Accrue Software...)
Even though polling is known to have many weaknesses, we have many Internet protocols that either rely exclusively on polling or only provide request/response interfaces that force polling. In many cases, these protocols were defined to use polling since polling is typically very easy for clients to implement and since, in many cases, the polling loads created by the average client have been light enough to accept the inefficiencies of polling. However, as the Internet scales out and as the difference in load generated by the largest systems grows proportionately greater than the load generated by the smaller, more common systems, I think we may see a growing need not only to use push more but also to define multiple tiers of protocols. We may need one protocol based on polling for smaller systems and another protocol based on push for larger systems.
Scale makes a difference... What is acceptable for small systems is often completely unacceptable for larger ones. We saw this clearly in the world of feed synchronization. If you have a feed aggregator that is trying to monitor the content of only a few dozen feeds, then it is simple to rely on polling RSS/Atom feeds -- there is some waste involved, but that waste is "acceptable" to many small clients. On the other hand, if you are trying to monitor millions of feeds, polling simply won't work. That is why much of the process of feed update discovery for large blog aggregators has moved away from simple polling and now relies on push-based services like the FeedMesh, pinging, and the SixApart Update Stream. Rather than polling millions of feeds every few minutes, today's largest blog aggregators rely on having many of the updates pushed to them at the same time that most small or personal aggregators still rely on the inefficient but simple polling model.
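A back-of-the-envelope calculation shows why polling stops being "acceptable waste" at aggregator scale. The feed count, poll interval, and update rate below are illustrative assumptions, not measurements from any particular aggregator:

```python
# Compare the request volume of polling a large feed set against having
# updates pushed (via pings, the FeedMesh, etc.). Assumed numbers only.

feeds = 1_000_000
poll_interval_min = 15
updates_per_feed_per_day = 2  # most feeds change rarely

polls_per_day = feeds * (24 * 60 // poll_interval_min)
pushes_per_day = feeds * updates_per_feed_per_day

print(f"polling: {polls_per_day:,} requests/day")        # 96,000,000
print(f"push:    {pushes_per_day:,} notifications/day")  # 2,000,000
print(f"waste factor: {polls_per_day / pushes_per_day:.0f}x")  # 48x
```

For a desktop aggregator watching a few dozen feeds, the same arithmetic yields a trivial load, which is exactly why polling survives at the small end.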
FTP or file copy presents yet another example of a protocol that works "well enough" under light loads but doesn't do well for people distributing files to large numbers of users. For distributors of files to large audiences, something like BitTorrent often makes a great deal more sense than simple FTP. Of course, while BitTorrent can really help large distributors, it doesn't do anything useful for those who only distribute one or two copies of a file... So, what works for light loads doesn't work for heavy loads, and what is good for some heavy loads isn't useful for light loads... What we need here is two protocols -- or, in some cases, a single protocol that incorporates support for two usage models.
Of course, nobody likes to define multiple protocols to get a single job done. Thus, you're not seeing a rush in the IETF to define "high volume" or "large scale" versions of the many protocols that quite adequately meet the needs of most users today. But, in the future, it might make sense for us to recognize that at some point quantitative differences translate into qualitatively different problems. Perhaps we should start thinking of serving both small and large systems as two different jobs and recognize that while we have many of the protocols we need to address small systems needs, we still don't have what we need to support the larger systems.
Seekers after the inner secrets of Google have undoubtedly been frustrated by the fact that several of the papers describing Google's internals refer to a distributed lock management service (Chubby) but provide little information on the lock service's design or implementation. For example, the recently released paper on BigTable refers frequently to Chubby but points to a paper by Mike Burrows to be presented at OSDI'06 in November -- still two months away! Well, sometime in the last few days, Google published a pre-print of Burrows' paper. For the inside story on Chubby, and thus more to the story of BigTable, the Google File System (GFS), etc., see the paper: The Chubby Lock Service for Loosely-Coupled Distributed Systems.
The paper's abstract reads in part:
We describe our experiences with the Chubby lock service, which is intended to provide coarse-grained locking as well as reliable (though low-volume) storage for a loosely-coupled distributed system. Chubby provides an interface much like a distributed file system with advisory locks, but the design emphasis is on availability and reliability, as opposed to high performance. Many instances of the service have been used for over a year, with several of them each handling a few tens of thousands of clients concurrently. The paper describes the initial design and expected use, compares it with actual use, and explains how the design had to be modified to accommodate the differences.
The author (ex-DEC) is refreshingly modest about what has been accomplished and clarifies that:
Building Chubby was an engineering effort ...; it was not research. We claim no new algorithms or techniques. The purpose of this paper is to describe what we did and why, rather than to advocate it.
As always seems to be the case with code that actually works, the implementers were surprised to discover that their system was being used in unanticipated ways -- some reasonable, some not. What they had intended to be a lock manager is apparently also getting heavy use within Google as a DNS-like name service and as a repository for configuration information and small files. Given my interest in publish/subscribe technologies, I was also intrigued to see that they have had to dissuade a number of groups from using Chubby as a general publish/subscribe system. (Does this indicate that Google needs a general PubSub service for internal use? Will we one day see a paper published by Google describing a Pub/Sub system?)
The paper includes an interesting discussion of the use of lock management mechanisms as an alternative to an asynchronous consensus algorithm such as Leslie Lamport's (another ex-DECie) Paxos protocol.
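To make the coarse-grained usage model concrete, here is a toy sketch of the classic Chubby use case -- replicas electing a primary by racing to acquire an advisory lock on a well-known path. The LockService class is an in-memory stand-in invented for illustration; Burrows' paper describes the real interface:

```python
# Primary election via an advisory lock: many replicas try to grab the
# same lock; whichever succeeds acts as primary for as long as it holds
# the lock. This is a simulation, not Chubby's actual API.

class LockService:
    def __init__(self):
        self._holders = {}  # path -> current lock holder

    def try_acquire(self, path, client):
        """Advisory, non-blocking acquire; True if `client` got the lock."""
        if path not in self._holders:
            self._holders[path] = client
            return True
        return False

svc = LockService()
for replica in ("replica-a", "replica-b", "replica-c"):
    if svc.try_acquire("/ls/cell/service/primary", replica):
        print(f"{replica} elected primary")  # only the first succeeds
```

The appeal over raw Paxos is apparent even in the toy: the consensus machinery lives inside the lock service, and client code shrinks to a single conditional.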
This paper on Chubby is an interesting read. I recommend it highly.
I subscribe to the daily email summary of Slashdot stories and have noticed lately that I can't "copy and paste" parts of the messages I receive. Often, I can "cut" or "copy" text from the top of the Slashdot email but I can't copy or cut text that appears later on in the message. On digging into this odd behavior, I've discovered that recent Slashdot messages are sprinkled with embedded digital codes that disable some copying functions in Outlook. The effect is DRM-like copy-protection!
Mixed into the text of Slashdot emails, I frequently find a "b" followed by a [null] byte and a [DC4]. (In Outlook, this sequence of characters looks like a "b" followed by two square boxes.) Any attempt to "Forward" a message containing these hidden codes will result in the message being cut off at the first [null]. (C programmers will understand why...) Additionally, it seems that these embedded codes confuse the "cut" and "copy" functions in Outlook's email reader. One can typically copy or cut text that appears before the first [null][dc4] sequence in a message, but you can't cut or copy text that overlaps or follows these /.-DRM codes. This ability to prevent forwarding, copying or cutting text is something that some might explain as a bug... But some would consider it to be a new and interesting way to implement a crude form of DRM (Digital Rights Management) or copy-protection.
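For those whose C is rusty, here is the truncation effect in miniature. The message text is made up; the point is what happens at the first [null] byte:

```python
# C string routines (strlen, strcpy, ...) treat byte 0x00 as
# end-of-string. Any mail client that handles the body with such
# routines silently loses everything after a stray NUL -- which is
# exactly the forwarding behavior described above.

body = b"Top story of the day b\x00\x14 ...the rest of the digest..."

# Everything after the first NUL is invisible to strlen()-style code:
as_c_string_sees_it = body.split(b"\x00", 1)[0]
print(as_c_string_sees_it.decode())  # -> 'Top story of the day b'
```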
The source of Slashdot's easy-DRM-like corruption appears to be funkiness in their HTML-to-email conversion code. Wherever the HTML for a Slashdot story would contain the HTML entity —, the email conversion code inserts "b[null][dc4]" into the email message. Thus, anyone submitting stories to Slashdot who wishes to prevent their submission from being forwarded or "cut/copied" should simply ensure that their submission starts with an — ... (Note: I'm not sure if this technique works well with email editors other than the one in Outlook.)
Of course, I may be seeing a conspiracy where there is none. It is entirely possible that what we have here is a bug in Slashdot's email conversion which is triggering yet another bug in Microsoft's Outlook email reader. The "DRM-like effect" may not be intentional -- but rather a simple side effect of these two bugs. There may be no conspiracy here... But, then again... DRM seems so popular these days, perhaps there is more to this story than a simple pair of synergistic bugs!
"Playing with Alexa" today, I stumbled across a fairly striking case of cannibalization... Many may remember that on May 8, 2006, Gabe Rivera's popular Memeorandum site spun out its technology section as TechMeme. It is well known that TechMeme has been successful in rapidly gaining a very respectable readership. But, what may not be as well known is that much of TechMeme's gain seems to have come at the direct cost of its parent -- Memeorandum. The "Alexa Rank" chart below tells the tale in graphic detail.
As Gabe mentions in the blog post announcing Techmeme, he wanted to create a more easily remembered URL for the site. (TechMeme is certainly easier to spell and say than "tech-dot-MEME-o-randum".) Thus, we shouldn't be too surprised that so much traffic was swept away. The launch of TechMeme worked like a renaming of part of the Memeorandum site. Memeorandum continues to run with non-tech content -- but seems to be losing readers over time -- while TechMeme is holding steady in the numbers...
It is somewhat disappointing to see that readership for the now non-technical Memeorandum has dropped so much and continues to fade. What we may see here is another example of a common problem: techies deploy wonderful technologies for keeping up to date on news, but find that their users are limited to other techies trying to keep up to date on technical stuff...
Microsoft is neither judge, jury, nor legislature. Yet, they seem confident that they have the power to define, by fiat, what is "fair use" in digital media. The MS Zune iPod killer will apparently encourage users to believe -- incorrectly -- that fair use permits "three free plays" of copied media files. Thus, just as the Zune makes it mechanically easier for people to illegally copy digital media files between devices, Microsoft will be inducing them to do more illegal copying by confusing them about what is and is not permitted by the law. Of course, at the same time that the Zune will encourage and induce illegal copying, it will also tend to make perfectly legal copying more difficult -- if not impossible.
One may argue that our legislatures have been slow in addressing the copyright issues that arise from the spread of the Internet and other new media, however, it can't be acceptable for a corporation, impatient for the resolution of some issue of public policy, to further their business goals by simply taking on themselves the roles that we assign exclusively and properly to our nation's law makers.
The issue of Microsoft's hubris comes in the recent announcement of the Microsoft Zune iPod "killer." A feature of the system, apparently considered critical to its commercial success, is the ability to easily share recordings, via WiFi, with other Zune owners. Supposedly in order to limit illegal music sharing, the Zune will wrap all shared music with DRM code that prevents shared files from being played more than three times. Certainly, there is nothing wrong with MS doing this for files whose creators have agreed to it. The problem comes in that Microsoft will be wrapping ALL shared music with the DRM code -- whether or not the copyright holder has agreed to permit the three free plays that Microsoft seems to consider to be "fair use." Users will assume that the "three play" limit has a grounding in actual law. In this, they will be misled. Of course, since the DRM is applied to *all* files shared via the Zune, Microsoft will be imposing its "three free play" limit even on those copyright holders who, by using Creative Commons or some other means, have permitted *more* than three plays!
Microsoft's lawyers are well aware that copyright law clearly reserves *all rights* to a copyright holder. Under copyright law, any copying is prohibited unless explicitly permitted by the copyright holder or implicitly permitted as a fair use or via some implied license. It doesn't matter if the copying is for the purpose of listening to a song one time, three times, or a thousand times. Copying is simply not permitted -- we need no new law to clarify this point! As far as fair use is concerned, courts have often limited it to things like 18-second clips, the production of ephemeral copies when necessary to facilitate networked use, etc. To my knowledge (IANAL), no court has ever said it is fair use to copy an entire song, movie, or multi-hour digital media work -- no matter how many times it is to be listened to.
The problem in what Microsoft is doing is that they will create the impression in the minds of their users that "three plays" is "fair use." Most of their customers aren't lawyers and thus are likely to trust that Microsoft's lawyers have ensured that copying files among friends is legal -- as long as the number of plays is limited. Given this, we can anticipate that many users, who might otherwise have been held back by their fear of violating copyright law, will become comfortable about violating copyright law using a Zune. The result will inevitably be more illegal copying done with Zunes than is done using other media players like the iPod that make no implicit or implied statements about what is or is not permitted. Effectively, Microsoft will be indirectly inducing Zune users to violate copyright law.
So far, the only Microsoft "explanation" I've been able to find on the web is one offered by a member of the Zune team at Microsoft. He writes on his blog that:
There currently isn't a way to sniff out what you are sending, so we wrap it all up in DRM. We can’t tell if you are sending a song from a known band or your own home recording so we default to the safety of encoding... [See ZuneInsider.com]
Basically, he's claiming that it is "too hard" for them to figure out what should and should not be "protected" by their DRM encoding. The claim is simply not true and, in any case, isn't much more convincing than the arguments of teenagers that "music should be free" because it is too hard for them to pay for it... Microsoft can easily determine which recordings contain Microsoft DRM and were thus produced by those who have agreed to their "three play" policy. Any files which do not have Microsoft DRM in them should be considered files whose owners have not agreed to Microsoft's free play policy and should thus be left untouched -- no derivative works with DRM added should be created.
One must wonder if the precedent set in the Grokster case won't be used against Microsoft once the Zune is released. Certainly, Microsoft will be aware that there is much copyright violation using their product. Also, while Microsoft may argue that their DRM policy reduces the impact of illegal copying in some cases, they won't be able to dispute that the way they implemented their system actually encouraged some of the illegal copying by encouraging users to think that it was permitted -- if done the Microsoft way. It will also be very much the case that if the Zune is successful, a large part of its success will be due to features that differ from those of other players. In this case, just about the only unique thing in the Zune appears to be the support for widespread WiFi copying and the copyright-violation-inducing DRM system...
It may be that Microsoft is impatient for the law in this area to become settled or for the law to be changed so that it is easier for Microsoft to make money. Frankly, it doesn't matter what their motivations are. The law in the country is defined by the government -- not by the corporations.
Yet another study demonstrates that Diebold's voting machines are incompetently designed, easily hacked and a threat to voter confidence. This time, it is researchers at Princeton who did the work. On their site, they provide a detailed paper, a video, and other materials describing a problem that others, such as Bev Harris, have been railing about for years. Of course, in a Forbes article, Diebold's marketing manager objects to the study... Notably missing, however, is any proof by Diebold that *any* competent and independent study disputes the long string of reports and studies that have condemned Diebold's voting technology and corporate management. There is a stink about this company, whose CEO once promised that he was "committed to helping Ohio deliver its electoral votes to the president [Bush] next year" even though one would expect the CEO of a voting machine company to work hard to maintain some appearance of impartiality in elections...
It should be obvious to everyone that democracy means little unless vote counts are accurate. It should be just as obvious that voters must *believe* that their votes are being accurately counted. Whether or not Diebold is correct in saying that their systems count votes accurately, the reality is that Diebold's behavior and design flaws over the last few years have contributed greatly to a loss of many voters' confidence in the very process of voting. In acting to destroy voters' confidence, Diebold has done vastly more damage to our system than is represented by the mere millions of dollars they have had to forfeit to settle lawsuits or pay fines.
Diebold goes beyond mere incompetence and hints of ethically challenged management by also permitting its employees to argue that a paper trail is not necessary to ensure trusted elections. Yet, virtually every competent computer scientist or computer security expert who has addressed the issue has said that the computers must not be trusted -- a paper trail MUST be provided. Even professional societies, such as the ACM, have argued this point. (An ACM poll, taken in 2004, showed 95% of those polled agreed that a paper trail was necessary...) By arguing against the combined wisdom of the industry, Diebold clearly appears to be pushing pseudoscience onto computer-illiterate elections officials in a crass attempt to build business. We need better from any company to which we entrust the accuracy and reputation of our elections.
Frankly, I think it is time for us all to ask Diebold to simply get out of this market. The company has a good and solid business in providing ATMs to banks and can use whatever skills it has to generate money elsewhere. However, their continued participation in the market for voting machines can do little but continue to degrade voters' confidence in the accuracy of voting systems and thus threaten the foundation upon which the USA's representative system rests. I think that even if Diebold were to address all the technical issues, eliminate all employees whose ethics or impartiality is questioned, and stop fighting the computer industry by arguing against paper trails, Diebold will never be able to overcome the negative impressions and sordid reputation that they have built up over the last few years. The taint that they have earned will not be erased.
This is not the normal case of a company stumbling in addressing market needs. Many companies have failed the first few times they attempted something and then eventually roared to success -- Microsoft is famous for doing exactly that! The problem with Diebold is that they have lost the right to be trusted and, in the area of voting, trust is the most important thing there is. We need not only accurate voting machines but voting machines that people believe are accurate. Certainly, it is possible that Diebold could be cajoled or beaten into building machines that should be trusted, but I think it will be a very long time before they will be able to build machines that will be trusted.
Diebold - Please, get out of the voting machine business. Make your money elsewhere. The strength of our democracy depends, in part, on your tearing down your voting division. Do it now.
Although many have talked about possible responses to and defenses against the scourge of Phishing sites, it wasn't until today that I actually saw a site begin to actively fight back. Sometime recently, Yahoo! made a small but terribly important change to their login screen which I think will soon become widespread. Certainly every bank with an online site should copy Yahoo! instantly...
Successful phishing relies on the fact that phishers know what real web sites look like and can easily produce fake (phishing) sites that look just like the real thing. But, now you can customize your Yahoo! login screen so phishers can't possibly know what you think the Yahoo! login screen looks like. They can't know what it looks like because the real Yahoo! screen will display a "secret" shared only between you and Yahoo!. This secret, which is either a bit of text or an image, is something that you choose and then send to Yahoo! for display whenever you're logging in to the real Yahoo!. Once you've shared your secret with Yahoo!, then whenever you see a Yahoo! login screen that doesn't display the secret, you should be alerted that you may be on a phishing site.
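The mechanism can be sketched in a few lines. This is only an illustration of the general shared-secret pattern, not Yahoo!'s actual implementation; all names and the cookie-based lookup are my assumptions.

```python
# Minimal sketch of a shared-secret ("sign-in seal") login page.
# Illustrative only -- not Yahoo!'s real implementation.

SEALS = {}  # maps a long-lived browser cookie -> the user's chosen secret


def register_seal(browser_cookie, secret):
    """The user picks a text secret (or image) and the site stores it."""
    SEALS[browser_cookie] = secret


def render_login_page(browser_cookie):
    """The genuine site can show the secret; a phishing site cannot,
    because it never received the cookie or the stored secret."""
    secret = SEALS.get(browser_cookie)
    if secret is None:
        return "Generic login page (no seal registered for this browser)"
    return f"Login page displaying your seal: {secret}"


register_seal("cookie-abc123", "purple pelican")
print(render_login_page("cookie-abc123"))   # genuine site shows the seal
print(render_login_page("unknown-cookie"))  # a phisher can only show the generic page
```

The security rests on the secret never leaving the real site: a phisher who copies the page's appearance still cannot display your seal, and its absence is the warning sign.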
I've put some example Yahoo! login screens in the right margin. At the top, you'll see what the new default login screen at Yahoo! looks like. Below that, you'll see examples of the same screen modified to show either a text secret or an image secret.
I'm sure that others who follow events on the web more closely than I have will be able to say that they've seen this sort of thing before. Certainly, I'm aware that the technique has been discussed for quite some time and that there have been previous uses, but this is the first time I've actually seen it used on a production site that I use. Wonderful!
While there are many other things we can do in the battle against phishing and while this technique is known to have some weaknesses, I still hope we see use of this technique spread across the net like wildfire. Every bit helps.
Well, like many, many other folk, I got temporarily hooked on the new Google Image Labeler before Digg and other flash mob assemblers managed to slow the site down to a crawl. This addictive little toy is wonderful in its simplicity yet frankly quite profound in its implications.
The "theory" behind the tool is discussed at some length by Luis von Ahn of CMU in a Google engEDU video entitled "Human Computation". That video is almost as much fun as the Google Image Labeler is to play...
The basic idea behind Labeler is to create a two player online game similar to Ahn's "ESP Game." Two players are randomly connected and both shown an image. Then, each player inputs keywords that describe the image being displayed. If both players type the same keyword, a new image is shown and the keyword guessing starts again. This continues until the two-minute game clock runs out. The players then get points based on how many matches they made.
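The matching rule at the heart of the game can be sketched as follows. This is a simplified toy version under my own assumptions: real ESP Game details like "taboo" words, scoring, and timing are left out, and the interleaved guess streams are invented for illustration.

```python
# Toy sketch of the ESP Game / Image Labeler matching rule: each player
# streams keyword guesses for the same image, and a label is accepted as
# soon as both players have typed the same word.

def first_match(guesses_a, guesses_b):
    """Return the first keyword both players have entered, or None."""
    seen_a, seen_b = set(), set()
    # Interleave the two guess streams, checking for agreement as we go.
    for a, b in zip(guesses_a, guesses_b):
        seen_a.add(a)
        seen_b.add(b)
        if a in seen_b:
            return a
        if b in seen_a:
            return b
    return None


# Two players labeling the same photo of a dog on a beach:
label = first_match(["animal", "dog", "sand"], ["beach", "puppy", "dog"])
print(label)  # -> "dog": the first keyword both players typed
```

Because a label is only recorded when two independent players agree on it, the game verifies its own output: a single prankster cannot pollute the index.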
Based on initial results and measurements that showed that large numbers of people were willing to spend as many as 12 hours a day playing the game, Ahn estimates that 5,000 players would be needed to label all the images in Google Images in about two months. This is just one example of what can be done with "Games with a purpose"... Ahn classifies the Labeler and the ESP Game as "symmetric verification games" and in his talk describes the principles for creating "asymmetric verification games" as well.
SETI@home, folding@home, BOINC and similar systems showed us how to convert the otherwise wasted spare cycles of millions of computers into useful work by offering public-spirited computer owners a way to contribute those cycles to useful projects. Amazon's Mechanical Turk showed that by offering people money, spare human processing cycles could be captured and managed for commercial purposes on demand. The StarDust@home project showed that you don't have to pay people for spare brain-cycles if they think they are contributing to something worthwhile. Now, Ahn and Google are demonstrating that building games and entertaining people might be an exceptionally powerful way of capturing spare human cycles. And there are quite a few of them available!
Ahn estimates that during 2003, 9 billion human-hours were consumed just by people playing Solitaire on their computers. Clearly, there's quite a resource there to be captured. To provide some scale to that number, Ahn shows that the Empire State Building in NYC took only 7 million human-hours to build (the equivalent of about 6.8 hours of worldwide Solitaire play) and the larger Panama Canal took only 20 million human-hours to build (less than a day of Solitaire.)
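The comparison is easy to check with a little arithmetic, using Ahn's figures as given:

```python
# Rough check of the scale comparison: 9 billion human-hours of Solitaire
# in 2003 versus the labor that built two landmark projects.

solitaire_hours_2003 = 9e9           # human-hours spent on Solitaire in 2003 (Ahn)
hours_in_year = 365 * 24             # 8,760 clock hours in a year
rate = solitaire_hours_2003 / hours_in_year  # human-hours burned per clock hour

empire_state = 7e6                   # human-hours to build the Empire State Building
panama_canal = 20e6                  # human-hours to build the Panama Canal

print(empire_state / rate)  # about 6.8 clock hours of worldwide Solitaire play
print(panama_canal / rate)  # about 19.5 clock hours -- "less than a day"
```

In other words, the world's 2003 Solitaire players collectively expended an Empire State Building's worth of labor roughly every seven hours.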
Perhaps we will see the coining of a new unit of measure:
1 "Sol" equals the amount of human processing devoted to Solitaire during one hour in 2003...
If Ahn's predictions are correct, and if an inevitably large number of people "play" with the Google Image Labeler, it would seem that Google will once again lengthen its lead over Microsoft, Yahoo and others by providing the world's best and most comprehensive image search capability. Google's competition is currently limited to indexing images based on surrounding text, alt text, or primitive image understanding algorithms. I can't help wondering if we'll soon see Google's competitors release their own "games" in an effort to improve their indexes. (How will they make them good enough to attract users from the ESP Game and Labeler? Is there a way to combine these games with Second Life or Halo? Can Yahoo! expand on what they have already been doing with exploiting Flickr tags?) Will the "best" search engine one day be the one that offers the most effective and most fun indexing "games"?
I think it goes without saying that "Games with a purpose" and "Human Computing" will be concepts that we'll be hearing quite a bit about in the future.
Denise Howell has been analyzing the implications of copyright law for RSS/Atom syndication on her new Lawgarithms blog at ZDnet. Clearly stating that many of the issues are currently unsettled, Denise encourages "participatory law" and asks readers to provide "reasons why RSS publishing isn't like free magazine publishing"... There are quite a few reasons, I think. As one digs into this subject of the law of syndication, it will become apparent that there are many, many interesting issues to be dealt with. For instance, one very significant difference is:
The publisher of a magazine has total control over the presentation of content in that magazine. However, the publisher of an RSS/Atom syndication feed has very little, if any, "control" over the presentation of data published in a feed.
The implication of this difference is that we might be making a mistake when we compare web and RSS/Atom syndicated content to normal text-based paper publications. It might be more useful to consider such content as though it were computer programs or even music!
XML, HTML, RSS, Atom, etc. are all members of a class of markup languages that quite intentionally do *not* have fixed presentation formats or styles. Thus, a content creator can never be sure what their content will look like when presented to a user. A given chunk of HTML may have one appearance in Internet Explorer 6 yet look slightly different when viewed with IE7, Firefox, or Flock. The same content will look drastically different if viewed using the "text-only" Lynx browser or via the browser on a cell phone or PDA. Additionally, two people, both using the same browser software, might see different presentations as a result of different browser preferences, screen resolution, or the use of local CSS style sheets. In each case, the presentation of the encoded data is adjusted to address the limitations of the display device, user preferences, etc. Such adjustment does NOT happen and in most cases cannot happen with content in magazines. Once ink is laid on paper, the only changes expected are things like yellowing of the paper over time or fading of the ink...
The plasticity or indeterminateness of presentation which is inherent in online content presents a drastically different environment than the one that exists for the "free magazine" publisher that Denise talks about on her site. One big difference comes in the process of recognizing when content has been copied. When you run the page of a magazine through a copier, the result is something that is easily recognized as a direct copy. However, if I do a "print screen" while using the text-only Lynx browser on a typical web page, it is likely that the results will be so different from what that page's creator saw and intended, that it would be difficult for them to recognize the page without more work. Of course, the same "difficulty" in recognition could arise with a paper-based magazine if I copied it by retyping the text and then used different styles in printing it. This would, of course, not be a "copy" but rather, the production of a "derivative work" that was based on the paper magazine. The interesting thing here is that in the online world, ALL presentations of "copies" are actually derivative works -- not mere copies. In fact, in the online world, the original form of a work, the thing that gets "copied," is just binary data which cannot be directly viewed by humans.
Now, much has been said about the "implied license" to "copy" otherwise protected works on the Internet since doing so is a technical requirement of using the works. It seems settled that copying data over network links, into screen buffers, system caches, etc. is not prohibited. However, I think we need to recognize that there is at least an additional implied license here, and that is a license to produce a derivative work -- based on the HTML, RSS, Atom, etc. which is produced by a copyright holder or publisher. This implied license exists since the published content cannot be used without producing a derivative work -- potentially a derivative work that differs greatly from what the publisher expected.
So, there are at least two differences between an RSS/Atom feed and a magazine. The RSS feed, like all web content encoded in XML or HTML derivatives, comes with two implied licenses which are not associated with the magazine or most other kinds of printed works:
- A limited license to "copy" when doing so is facilitative to viewing the content and is required for viewing
- A limited license to produce derivative works (i.e. determine presentation format and appearance)
These two differences are substantial and thus it might be best for us to stop thinking of paper or printing-related analogs when looking for insight into how copyright applies to web content or Internet syndication feed content. It might be more appropriate, for instance, for us to think about such content as being more like:
- Computer programs: They must be "copied" from disk into memory in order to run and their results depend not only on the data and instructions in the programs themselves but the capabilities of the devices on which they run, personal preferences (e.g. choices of screen resolution, etc.), etc. Note: The fact that many web pages and even syndication feeds contain scripts (sub-programs) strengthens this analogy.
- Music: While sheet music doesn't need to be "copied" in order to be performed, the results of any performance will be determined by the skill of the performer, personal tastes, the instruments used, etc. -- things not under the control of the composer of the sheet music.
- Scripts for plays, movies, etc: Similar to sheet music. Some "performances" will be unrecognizable to the original author...
Of course, copyright law and our natural sense of what is fair are drastically different when applied to printed materials like magazines than they are when applied to things like computer programs, sheet music and scripts. For instance, we might expect that a program writer, composer or playwright has a right to prevent non-required copying; however, it would be very odd to say that copyright law itself forbids the use of software on "slow" computers or forbids "bad" performances of a piece of sheet music. The point here is that in areas other than printed works, we allow the users of copyrighted content a great deal of latitude in how the content is presented or performed. Yet, much of the debate about the use of copyrighted syndication feeds involves people's objections to the way in which their syndicated content is presented...
In the area of performance variation, it might be that syndicated content is much more like sheet music than it is like even a computer program. (Even though XML and HTML are essentially "programs" which are executed or interpreted by other programs.) The similarity to sheet music comes in the aspect of "selective" execution or performance. Imagine a bit of sheet music for piano that is written for two hands. It has both a harmony and a bass line. Yet, performers will often just play the harmony -- without the bass. This is very similar to what often happens with syndicated content. Usually, not all of the content in a feed is actually "performed" or presented to the user.
If you look at the standards for either RSS or Atom, you'll see that many fields are defined in both, yet, most aggregators will only present to their users a subset of the fields that actually appear in feeds. While a feed might include entries with attachments, several different dates, FOAF elements, Dublin Core extensions, RSS-Media content, etc. many aggregators will only present a single date, a title, a link, and the text from the description or content elements. Some aggregators will normally present all of the content in the content or description elements but will filter out embedded scripts, images, applets, etc. This is expected behavior in much the same way that we expect that the performer of a piece of sheet music will often omit one or more of the written parts.
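This selective "performance" of a feed is easy to demonstrate. The sketch below parses a small RSS item (the feed content is invented for illustration) and renders only the fields this hypothetical aggregator has chosen to present, silently dropping the rest:

```python
# Sketch of "partial performance": an aggregator parses a feed item that
# carries many elements but presents only a chosen subset.
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0"><channel><item>
  <title>Example entry</title>
  <link>http://example.org/entry</link>
  <pubDate>Mon, 21 Aug 2006 12:00:00 GMT</pubDate>
  <description>Summary text.</description>
  <category>misc</category>
</item></channel></rss>"""

RENDERED_FIELDS = ("title", "link", "description")  # everything else is dropped


def render_item(item):
    """Keep only the fields this aggregator chooses to 'perform'."""
    return {field: item.findtext(field) for field in RENDERED_FIELDS}


item = ET.fromstring(FEED).find("channel/item")
print(render_item(item))
# The pubDate and category elements are valid feed content, but this
# renderer never presents them -- a selective "performance" of the feed.
```

Nothing in the RSS or Atom specifications forbids this; the specifications define what a valid feed contains, not how a consumer must present it.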
An extreme example of "partial performance" can be seen in Dave Winer's "NewsRivers". For instance, in order to address the needs of low-bandwidth or small screen devices like cell phones or PDAs, Winer takes a complex, multi-element feed from the New York Times and distills it to little more than a link and some much-abbreviated text. While in the original feed each story carried at least one date, Dave clusters all stories for a single date together. Also, many fields of data that were in the original are not replicated in his "rivers." Is this selective display permitted? Leaving aside the question of modifying the contents of a field (such as the abbreviation of the content field), this sort of selective rendering of content is exactly what is intended by those who wrote the RSS and Atom specifications. Those specifications say much about how to create a valid syndication feed, however, they intentionally say virtually nothing about expectations concerning how the feeds will be formatted or presented.
Naturally, one asks: If one is permitted to eliminate items from the presentation of a syndicated feed, is one permitted to insert items or elements when presenting a feed? If so, are there limits to this right?
My personal feeling is that most of the issues in this area will eventually hinge on questions of intent. The difference between "white hat" and "black hat" uses is usually not found in what was done but rather in why it was done. If you remove ad links or images as part of the process of adapting a piece of content to some specific device or display environment, you're probably ok. However, if you remove the ads and replace them with others, then you've probably got a problem. Similarly, removing elements in such a way that you eliminate attribution to the original publisher or author is likely to be a problem... We'll need to work through a long set of examples before we have patterns for what is reasonable and what is not. The "implied license to perform" web content is limited in just the same way that the "implied license to copy" such content is limited.
Hopefully, I've illuminated at least a few differences between a syndication feed and a "magazine" as Denise requested. I've also tried to suggest that we might all be making a mistake by trying to find similarities between online works and works on paper. The interesting question at this point is: "So what?" What are the implications of these differences and how will they affect what we do and how we do it in the world of syndication? Hopefully, Denise (who is a lawyer) will be able to provide us some guidance here.
Michael Moe writes at AlwaysOn:
"I don’t know about you, but I’m all for having my phone calls listened to. If reading my emails (and everybody else’s) makes it easier to prevent the bad guys from doing bad things, I’m all for that, too."
The problem here comes in identifying the "bad guys." Today, we may see them as being terrorists, but tomorrow, once we've taught people to be accustomed to the idea of being monitored and once we've built the technology to enable and perfect that monitoring, it is quite possible that the "bad guys" will be the people doing the monitoring -- not those being monitored.
The freedoms and rights that we enjoy do not come without cost. Our rights require constant vigilance and, at times, we must accept lesser personal security in order to protect the freedoms we hold dear.
Moe, like George Bush, invokes the mantra "this is a war" to justify his acceptance of universal surveillance. But, in focusing on the new "war" against terrorists, Moe forgets the older and never-ending "war" to protect our freedoms. In that war, just like soldiers who march to battle, we must all accept some occasional discomfort in order to achieve our goals and fend off defeat. In this case, it is best that we avoid the easy answer of universal and unrestricted wiretapping, and potentially let a few terrorists succeed, rather than teach us all to live in and accept a world in which we have no privacy -- a world in which we have lost our freedoms. Freedom is not free.
[Updated 22-Aug-06. See below]
I've been sweating for weeks in anticipation of a Forbes article covering the PubSub melt-down and now, it is finally out. It could have been worse... Fortunately, Matt Rand stuck to his promise to focus on the "lessons learned" and avoided most of the more sordid aspects of the affair. However, as seems inevitable in press coverage, there were some errors in the piece. One, at least, I need to deal with immediately.
Rand writes: "Wyman had a hand in designing the Lotus Notes spreadsheet program". Well, clearly, Lotus Notes is not a spreadsheet... but, let's ignore that minor error. The really embarrassing thing is that I never had a "hand in designing Lotus Notes" and have never said that I did (although Salim Ismail has often said that I did...). What I did do, back in 1984/5 at Digital, was develop a program called "NewNotes". New Notes was a client/server implementation of the earlier "VMS Notes" program written by Len Kawell (circa 1980?). My NewNotes was an attempt to show a better way to implement Kawell's VMSNotes and used a publish/subscribe system to replicate its database. (The code I used was based on the "SDDB" -- Self Distributing Database-- developed by Beat Marti's group at Digital in Geneva).
My New Notes program was one of a number of predecessors to both Digital's VaxNotes product (which replaced the internal use only VMSNotes) as well as the "Notes" that Len Kawell and Ray Ozzie later built at Iris and later sold to Lotus. The Kawell/Ozzie Notes that eventually became known as "Lotus Notes" can really be considered as the combination of a number of development paths and influence vectors: The original Plato "Notes" program that Kawell and Ozzie worked with during the mid-70's, Kawell's VMSNotes, my New Notes, a variety of other "Notes" implementations, and the programmability of the ALL-IN-1 Office Automation system of which I was product manager at Digital. In any case, I was never an employee of either Iris or Lotus and thus whatever influence I may have had on Kawell, I cannot be said to have "had a hand in designing Lotus Notes."
I think, when speaking with Rand at Forbes, that I was clear in stating (as I always do) that I worked on "a predecessor" to Lotus Notes. The intention in that wording is to show precedence but not a hands on role. I think my wording is fair and accurate -- what was published in the Forbes article is not accurate.
My suspicion is that Rand was misled about my role by talking with Salim Ismail, whom he also interviewed for the story. Those who know me will know that one of the things that really infuriated me about working with Salim was that he consistently overstated my accomplishments. In particular, he regularly told people that I "had a hand in designing Lotus Notes" as well as Microsoft Office, and a number of other products. On literally scores of occasions I discovered that Salim had once again exaggerated my history and I was forced to object. It got monotonous. At one point, I got so frustrated that I had to present to our PR agency a list of "Things you are not allowed to say about Bob -- even if you hear Salim say them."
In any case, it is a bit disappointing to see my exaggerated and thus false "role" in the prehistory of Lotus Notes as the key item used to sum up my career... I've done many more interesting and important things. ALL-IN-1 was the first customizable, integrated office automation system and was a multi-billion dollar business back in the 80's... I developed what I believe was the first wide-area network hypertext browser (well before TBL got the Web going at CERN -- where he was a VaxNotes user...) I was awarded some of the earliest "Digital Rights Management" patents. As a product manager at Microsoft I was responsible for getting OLE Automation going. I launched the first broad-market CDROM based magazine. I once ran one of the largest "search engines" freely available on the Web. I've done many other things much more worthy of notice -- including building PubSub...
I always try to accurately and truthfully represent what I've done. The problem is that what I have done over the last 30 years is pretty darned impressive -- yet, I've never managed to make much money in the process and most folk can't understand how you can do what I've done and not make gobs of money. (It's a long story...) The result is that folk who don't know me and review what I've done often suspect that I've exaggerated. As a result, my resume gets a great deal more scrutiny than most people's resumes do. My real fear is that someday someone will decide that false statements made by people like Salim or mistakes in articles like Rand's piece in Forbes actually originated with me and they will decide that I am the one not telling the truth... The truth is that if I tell you what I've done, it is probably an understatement, not an exaggeration. If someone else tells you what I've done -- unless they were there -- you should ignore them.
[Updated 22-Aug-06: It seems that in the process of objecting to an overstatement of my own role in the history of Notes, I managed to overstate the role of Kawell and Ozzie... As Richard Swartz points out in the comments, the original Plato Notes system was actually created by David Wooley. I've changed the text to correct my misstatements. Also, please note that Forbes has declined the opportunity to correct the wording in their story. They claim that "being a predecessor" to something is included within their understanding of what it means to "have a hand in designing" that thing. I disagree.]
The subject of Net Neutrality got much well deserved attention this weekend during the Gnomedex presentations of John Edwards, Werner Vogels and Mike Arrington. Some of the discussion dealt with the difficulty of communicating the nature of the issue and it was suggested that "Net Discrimination" might be a better tag for the discussion. I think the idea was to follow the lead of the abortion debates and try to build terms that map to "Pro-Life" and "Pro-Choice". The idea seemed to catch on as various speakers and commenters at the conference began to use "Net Discrimination" in discussions. However, I think that while the intention was good, we can do better.
I would suggest that a much better way to clarify the issue would be modeled on the success of the Women's Movement's slogan: "Equal Pay for Equal Work." That phrase succinctly presents an argument so clear and simple that it has been terribly difficult for anyone to debate it. The "Net Neutrality" equivalent would be: "Equal Service for Equal Pay." i.e. While the rates charged for service might vary according to bandwidth (one assumes that those who buy more will pay less per unit...), the service received by all those who pay at any level should be identical.
Net Neutrality means "Equal Service for Equal Pay"... I think it would be very difficult to argue with that and I think it will be easily understood by those who hear it. What do you think?
I just stumbled across a wonderful "publish/subscribe" application at Folgers coffee's site TolerateMornings.com. It is a "BossTracker" that allows office workers to build a customized map of their office and then have people update the application with sightings of the boss as he/she wanders about the office. Of course, everyone is then notified of the boss' current location in graphical detail. There is more to publish/subscribe than blog posts and SEC Edgar updates... The basic pattern of publish/subscribe has endless applications.
Similarly "odd" applications of publish/subscribe are the services that MySpace.com has recently started shutting down with Cease and Desist letters... Until recently singlestat.us and DatingAnyone.com would let you "subscribe" to the "relationship" status of people on MySpace. Thus, you could find someone you really wanted to date or otherwise pursue, but if they were "currently in a relationship" you could ask for an email to be sent as soon as their status changed to "single." Just another Pub/Sub application. Tell the system what you're interested in and you'll get a message when it happens.
Rumors have been flying lately about the demise of PubSub.com. While I've seen quite a bit of exaggeration in various forums, I can't deny that things are not going well for us. Our days are numbered. A recent attempt to execute a merger has been blocked, as have our efforts to raise the equity financing that would allow us to continue to pay salaries and pay off our $3 million in debt. Thus, our "doors" will close soon if we can't find someone to pull us out of the current situation. Persons with fast access to cash and a desire for some of the industry's best technology are advised to contact us rapidly...
I believe firmly that the cause of our difficulties is not the team we have built nor is it the technology we've developed. Both are great. What has prevented us moving forward is a battle with a group of minority shareholders, some of whom claim to be led by our ex-CEO Salim Ismail and are, in any case, primarily his "friends and family." This group is using very unusual clauses in our Shareholder's agreements to block mergers or financings. We've found it difficult to determine their motives, however, some have said that they believe that it is in their interest to drive the company into bankruptcy so that they can buy our software and start a new company. Of course, I believe that by the time we go through bankruptcy proceedings, we won't have any employees and frankly, a software company without the employees that developed its software is worthless. Our assets will have a much higher value if we have the employees available, thus, the best course at this time is to sell our assets, trademarks, etc. to cover the outstanding debt and make our employees available to the purchaser.
The clause that is being used to block us is a "one-man-one-vote" clause that requires that a majority of the shareholders approve any change to the shareholders' agreement. Mergers and financings require such changes. Asset sales do not. What that means is that one of our shareholders who has 75 shares has as much voting power as I do with my 540,000 shares (38.8% of the company). So, a tiny minority is able to block us moving forward on any of the normal paths for raising money. If no alternatives arise, we'll be in bankruptcy and we'll lose the employees within days.
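The arithmetic behind that imbalance is simple enough to sketch. The two holdings below are the ones given in the post; the "others" figure is a hypothetical remainder I've derived from them for illustration:

```python
# Sketch of the "one-man-one-vote" arithmetic: 540,000 shares is stated
# to be 38.8% of the company, so we can back out the total share count.

total_shares = 540_000 / 0.388     # implies ~1.39 million shares outstanding
print(round(total_shares))         # about 1,391,753 shares

# Under a one-man-one-vote clause, economic stake is irrelevant: each
# shareholder casts exactly one vote on a shareholders'-agreement change.
holders = {
    "Wyman": 540_000,              # 38.8% of the equity
    "small holder": 75,            # a tiny fraction of the equity
    "others (hypothetical)": 851_678,  # invented remainder for illustration
}
votes = {name: 1 for name in holders}  # one vote each, regardless of shares
print(votes["Wyman"] == votes["small holder"])  # True: equal voting power
```

So a holder of 75 shares and a holder of 540,000 shares carry identical weight on exactly the votes (mergers, financings) the company most needs to win.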
I'm proud of what we've accomplished at PubSub.com and I'm very happy to have been able to work with a great team in an exciting industry. I'm terribly sorry that we've spent so much time on these internal political issues and haven't devoted more time to pursuing our product plans and building a better product for our users. Hopefully, in the next few hours and days, we'll find someone who is willing to help us out of this situation.
(bobwyman .at. pubsub .dot. com)