

Date: Saturday, 26 Jul 2008 22:33

I have moved my blog from this site to danweinreb.org/blog.  Thanks very much to Ivan Krstić for providing hosting and software.  If you have any comments on earlier blog entries, please use the new blog, since this one will eventually go away.  Thanks!

Author: "dlweinreb" Tags: "Uncategorized"
Date: Friday, 11 Jul 2008 12:22

I’ve been using email on the Internet (and its predecessor, the ARPAnet) for 32 years.  Here’s some advice from my experience.

The Prime Directive: Never send email when you’re angry.  Never, ever.  It always backfires and you always regret it. Trust me on this.

The rest of these recommendations apply primarily when you’re sending mail to anyone who isn’t a close friend.

Do not use sarcasm on mailing lists.  Remember that your tone of voice is not available to indicate that what you’re saying is sarcastic.  Inevitably, a few people on the list will take what you say literally, and then you’ll have to undertake the boring job of correcting everyone’s misimpression.

Be very polite.  You almost can’t be too polite.  Because your facial expressions and tone of voice are not present, it’s easy to write something that will seem demanding or commanding.

Know the difference between “Reply” and “Reply All”, and be careful to always use the appropriate one.

Be careful to address your mail to the right person!  The automatic name-completion feature in many of the good mail clients can sometimes complete to a name that’s not what you expected.

Some people have separate home and company email addresses.  Send personal mail to the home address.

Be careful about giving out someone else’s personal email address.  Some people do not like to have their email addresses be well-known.  So treat anyone else’s email address as if it were confidential information, until you get permission to distribute it.

When sending mail to many individuals, address the mail to yourself, and BCC it to everyone else.  This way, the recipients cannot see the email addresses of the other recipients, thus protecting their privacy.

Make your subject lines descriptive and clear.  If you’re replying, keep the same subject line (don’t worry about the “Re:”) so that mail readers can see which mail is grouped with which.  If you’re communicating with friends, clever subject lines can be quite an art form and source of innocent merriment.

Save away all of your interesting email.  It’s very handy to be able to refer to it when you subsequently communicate with the same person, or company.

Keep your email on your own computer.  Leaving it on a net server is too much of a risk to your privacy.  Even if you like Google (which I do) and trust them (which I pretty much do), you never know if conditions will change in the future, and by then it’s too late.

For very sensitive email, encryption is a good idea.  Sadly, there isn’t an easy-to-use standard.  I use the free version of AxCrypt from Axantum; it’s only for Windows, unfortunately.  There are plenty of others.  Of course, the person to whom you are sending mail must also install the software, and you must have a shared passphrase.  As long as you’re going to the trouble to encrypt, use a long passphrase for better security.

Please feel free to use the Comments below to add other good advice.

Author: "dlweinreb" Tags: "Email"
Date: Sunday, 08 Jun 2008 16:10

I just learned of Perst, which is described as an open-source embedded object-oriented Java database (meaning database management system, of course) which claims ease of working with Java objects, “exceptional transparent persistence”, and suitability for aspect-oriented programming with tools such as AspectJ and JAssist. It’s available under a dual license, with the free non-commercial version under the GPL. (There is a .NET version aimed at C# as well, but I’ll stick to what I know; presumably the important aspects are closely analogous.)

I have some familiarity with PSE Pro for Java, from what is now the ObjectStore division of Progress. Its official name is now Progress ObjectStore PSE Pro. I’ll refer to it as PP4J, for brevity. I was one of its original implementors, but it was substantially enhanced after I left. I also have a bit of familiarity with Berkeley Database Java Edition, a more recently developed embedded DBMS, which I’ll refer to as BDBJE.

PP4J provides excellent transparency, in the sense that you “just program in Java and everything works.” It does this by using a class-file postprocessor. However, Perst claims to provide the same benefit without such a postprocessor. It also claims to be “very fast, as much as four times faster than alternative commercial Java OODBMS.” Of course, “as much as” usually means that they found one such micro-benchmark; still, it would be uninteresting had they not even claimed good performance. And it has ACID transactions with “very fast” recovery.

Those are very impressive claims. In the rest of this post, I’ll examine them.

Who Created Perst?

I always like to glance at the background of the company. In particular, I like to know who the lead technical people are and where they worked before. Unfortunately, the company’s management page only lists the CEO, COO, and Director of Marketing, which is rather unusual. They’re in Issaquah, WA; could the technical people be ex-Microsoft? It’s important to note that McObject’s main product line, called eXtremeDB(tm), is technically unrelated to Perst.

But I found a clue. The Java package names start with org.garret. It’s usually hard to retroactively change Java package names, so they’re often leftovers from an earlier naming scheme. By doing some searches on “garret”, I found Konstantin Knizhnik, a 36-year-old software engineer from Moscow with 16 or so years of experience, who has written and is distributing an object-oriented database system called “GOODS” (the “G” stands for “Generic”). His most recent release was March 2, 2007. He has a table that compares the features of GOODS with those of other systems, including Perst. At the bottom it says: “I have three products GigaBASE, PERST and DyBASE build on the same core.” He also has an essay named Overview of Embedded Object Oriented Databases for Java and C# which includes an extensive section on the architecture of Perst. This page also has some micro-benchmark comparisons including Perst, PP4J, BDBJE, and db4o, but not GOODS. Perst comes out looking very good.

He even has a table of different releases of several DBMS’s, including GOODS and Perst, saying what changes were made in each minor release! But at no point does he say that he was involved in creating the Perst technology.

He mentions the web site perst.org. There’s nothing relevant there now, but Brewster’s Wayback Machine shows that there used to be, starting in October 2004. It’s quite clearly the same Perst. And the “Back to my home page” link goes to Knizhnik’s home page. Aha, the smoking gun! By December 2005, the site mentions the dual license and directs you to McObject LLC for a non-GPL commercial license. In 2006, the site changes to the McObject web site. McObject has several other embedded database products and was founded in 2001. This strongly suggests that McObject bought Perst from Knizhnik in 2006.

I joined the Yahoo “oodbms” group, and there’s Knizhnik, who is apparently maintaining FastDB, GigaBASE, and GOODS. He also wrote Dybase, based on the same kernel as GigaBASE. He announced Perst Lite in October 2006. Postings on this group are sparse, mainly consisting of announcements of new minor releases of those three DBMS’s.

The Tutorial

The documentation starts with a tutorial. Here are the high points, with my own comments in square brackets. My comparisons are mainly with PP4J, which likewise provides transparent Java objects. BDBJE works at a lower level of abstraction. [Update: BDBJE now has a transparent Java object feature, called DPL.] I don’t know enough about the low-level organization of BDBJE or of the current PP4J to make well-informed qualitative comparisons.

Perst claims to efficiently manage much more data than can fit in main memory. It has slightly different libraries for Java 1.1, 1.4, and 1.5, and J2ME. There is a base class called Persistent that you have to use for all persistent classes. [This is a drawback, due to Java's lack of multiple inheritance of implementation. PP4J does not have this restriction.] They explain a workaround in which you can copy some of the code of their Persistent.java class. [That sounds somewhat questionable from a modularity point of view, and doesn't help you for third-party libraries unless you want to mess with their sources.]

Files that hold databases can be compressed, encrypted, split across several physical files, or kept in no file at all for in-memory use, and there’s an interface allowing you to add your own low-level representation. Each database has a root object in the usual way. They use persistence-by-reachability [like PP4J]. There is a garbage collector for persistent objects. However, there is also explicit deletion; they correctly point out that this can lead to dangling pointers. [The fact that they have it at all suggests that the garbage collector is not always good enough.]

There are six ways to model relationships between objects. To their credit, they have a “(!)” after the word “six”. You can use arrays, but they explain the drawbacks to this. The Relation class is like a persistent ArrayList. The Link class is like Relation but it’s embedded instead of being its own object [huh?]. The IPersistentList interface has implementing classes that store a collection as a B-tree, which is good for large collections but has high overhead for small ones. Similarly, there is IPersistentSet. And finally there is a collection that automatically mutates from Link to a B-tree as the size gets larger. [PP4J, I believe, offers equivalents of the array, the list, and the set, and the list and set do the automatic mutation.]

How can they do transparent loading of objects, i.e. following pointers? They give you two choices, which can be set as a mode for each object: load everything reachable from the object, or make the programmer explicitly call the “load” method. They claim that this is usually sufficient, since your Java data structures usually consist of clusters of Java objects that are reachable from one head object, with no references between such clusters.

They assume that you always want to read in the whole cluster when you touch the head object [often true, but not always]. Also, when you modify an object, you must explicitly call the “modify” method, unless this is one of Perst’s library classes, whose methods call “modify” on themselves when needed. They say “Failure to invoke the modify method can result in unpredictable application behavior.”

[This is not what I would call "transparent"! PP4J is truly transparent, in that there is neither a "load" nor a "modify". PP4J always does these automatically. The Perst tutorial does not say what happens if you forget to call "load" when you were supposed to. Not all Java data follows their cluster model. PP4J depends for its transparency on the class postprocessor. As I recall, the postprocessor runs quickly enough that it doesn't seriously impact the total compile time. The only problem I had with it, as a user, was that it doesn't fit with the model assumed by the interactive development environments such as IntelliJ, requiring some inelegance in setting up your project.]

Perst has collections with indexes implemented as B-trees and allowing exact, range, and prefix searches. The programmer must explicitly insert and delete objects from indexes; if you make a change to some object that might affect any indexes, you have to remove the object from the index and re-insert it. [So you need to know in advance which things might ever be indexed, or pessimistically assume that they all are, and so remove and re-insert whenever the object is changed in any way. I am pretty sure that PP4J does this automatically.] You can index on fields directly, or you can create your own index values (since you’re inserting explicitly) that could be any function of the indexed object. [That's very useful, and I cannot remember whether PP4J provides this.] Keys can be compound (several fields). They provide R-tree indexes and KD-tree indexes, useful for 2-D operations such as finding a point within certain constraints. They also provide Patricia Tries, bit indexes, and more. [Wow, how do they fit all that into such a small footprint?]

Transaction recovery is done with a shadow-object mechanism and can only do one transaction at a time. (So ACID really means AD.) [Like PP4J, at least in its first version.] The interaction of transaction semantics with threads, always a sticky issue, can be done in several ways, too extensive to go into here. [This looks very good.] Locks are multiple-reader single-writer. Locking is not transparent in basic Perst [Bad!], but there’s a package called “Continuous” which does provide transparency, although it’s not described in the tutorial. [So beginning users also have to remember to do explicit locking?] Two processes can access a single database by locking the whole database at the file system level; it works to have many readers.

There is a class called “Database” that provides semantics more like a conventional DBMS. It maintains extents of classes. [Note: that means instances of the class can never be deallocated.] It can create indexes automatically based on Java annotations, but you still must do manual updates when the indexed object changes. It uses table-level locking. It has a query language called JSQL, but it’s unlike SQL in that it returns objects rather than tuples, and does not support joins, nested selects, grouping, or aggregate functions. You can “prepare” (pre-parse) JSQL queries, to improve performance if you use them many times, just as with most relational DBMS’s. A JSQL query is like a SQL “where” clause, and it uses whatever existing indexes are appropriate.

Schema evolution is automatic, and done per-object as the object is modified. It can’t handle renaming classes and fields, moving fields to a descendant or ancestor class, changing the class hierarchy, or changing types of fields that aren’t convertible in the Java language. There’s an export/import facility that you’d use for those changes. You can physically compact the database files. Backup and restore simply operate on the database files [you have to back up the whole thing, not just changes, which is probably true of PP4J as well.] You can export to XML.

Perst supports automatic database replication. There’s one master, where all writes are performed, and any number of slaves, where reads can be performed. This lets you load-balance reads. It’s done at page granularity. You can specify whether it’s synchronous or asynchronous. You can add new slaves dynamically. For high-availability, it’s your responsibility to detect node failure and choose a new master node. [PP4J did not have this, the last time I looked.]

Recent Press Releases from McObject

Version 3.0 has new features. There is a full-text search, much smaller in footprint than Lucene and geared specifically to persistent classes. The .NET version supports LINQ. They list CA’s Wily Technology as a successful reference customer, for a real-time Java application.

Perst is used in Frost, which is a client for Freenet. “Frost is a newsgroup reader-like client application used to share encrypted messages, files and other information over Freenet without fear of censorship.” They switched from “the previous used SQL database” [it turns out to be something called McKoi] because its recovery was unreliable (leaving corrupt databases), Perst’s schema evolution is easier to use, the databases are smaller (because Perst can store strings in UTF-8 encoding), and because they could get better performance from Perst as the database size grew.

Perst has been verified as compatible with Google’s Android. They provide a benchmark program comparing the performance of Perst against Android’s bundled SQLite. It’s a simple program that makes objects with one int field and one String field, and an index on each field. It inserts, looks up, etc. [It would be easy to recode it for PP4J or for BDBJE.]

The Download

The basic jar file is 530KB. “continuous” (see above) is another 57KB.

There’s more documentation, which goes into great detail about the internal implementation of how objects are represented and how atomic commit works. [It's extremely similar to PP4J. (The original version, anyway; it might have changed since.)]

There are other features, for which I could not find documentation. For instance, each persistent class can have a “custom” allocator that you supply. You could use this to represent very large objects (BLOB/CLOB) by putting them in a separate file. In the database, you’d store the name of this file, and perhaps an offset or whatever. Also, there is an implementation of the Resource Description Framework (RDF, used by the Semantic Web to represent metadata).

There are lots of parameters that you can set from environment variables. I was not able to find documentation for these. The one that interests me most lets you control the behavior of the object cache. The default is a weak hash table with LRU behavior. Other possibilities are a weak table without the LRU, a strong hash table (if you don’t want to limit the object cache size), and a SoftHashTable which uses a Java “soft” hash table.

The code is clearly written except that it’s extremely short on comments.

Overall Evaluation

Perst is a lot like PP4J. To my mind, the most important difference is the degree of transparency. I greatly prefer PP4J’s approach of providing complete transparency, i.e. not requiring the use of methods such as load and modify. This has two advantages. First, your code is clearer and simpler if it isn’t interrupted by all those calls to load and modify. Second, without transparency, it’s far too easy to forget to call load or modify, which would cause a bug, in some cases a bug that’s hard to find. Another problem is that the reference documentation is clearly incomplete and needs work. The tutorial, though, is quite clear and professionally written, and very honest about the tradeoffs, pros, and cons of the product design. Personally, if you want to earn my respect, that’s how to do it!

However, it has a bunch of features and packages that PP4J doesn’t (as far as I know).

I don’t know anything about the pricing of either product.

On the whole, for what it’s aiming for, Perst appears to be very good, and a real competitor in this space.

Author: "dlweinreb" Tags: "Database, ObjectStore, Uncategorized"
Date: Wednesday, 30 Apr 2008 12:49

This Is Your Brain On Music, by Daniel J. Levitin, is the most exciting science book that I’ve read in a long time. It’s all about music: what is music, how do we perceive music, why do we care about music, and, primarily, what do we know about how the mind and brain react to, process, and create music.

Some facts that I learned:

If I put electrodes in your visual cortex and then show you a red tomato, there is no group of neurons that will cause my electrodes to turn red. But if I put electrodes in your auditory cortex and play a pure tone in your ears at 440 Hz, there are neurons in your auditory cortex that will fire at exactly that frequency, causing the electrode to emit electrical activity at 440 Hz — for pitch, what goes into the ear comes out of the brain! I find this amazing.

If you’re familiar with the phenomenon of restoration of the missing fundamental, in which you perceive the fundamental pitch if you are played only overtones of the pitch: it turns out that you can put in an electrode, play music with the fundamentals missing, and the electrode actually shows energy at the fundamental frequency! The very fact that we can know things like this is exciting.

Ordinary people, when asked to sing a song (of which there is one well-known canonical recording, such as most pop songs), will sing back the song at almost exactly the correct tempo! (They are accurate within 4%, which is as good as most people can perceive anyway.) They also often get the key right, even though few people have “perfect pitch” per se. I would never have guessed this.

The brain stem and the dorsal cochlear nucleus — structures so primitive that all vertebrates have them — can distinguish between [musical] consonance and dissonance; this distinction happens before the higher level, human brain region — the cortex — gets involved.

The book is extremely readable and fun. It teaches you all the music theory you need to know. In fact, his basic music theory section is the best quick introduction to music theory I’ve ever read. The author has been a professional producer, so he knows a lot about how modern music recordings are made. He currently runs the Laboratory for Musical Perception, Cognition, and Expertise at McGill University, and has published a lot in serious scientific journals. That’s a combination of expertise that may be unique. He knows several well-known musicians and quotes from them; what Stevie Wonder and Joni Mitchell have to say is quite interesting. The book is available in trade paperback for only $15 US.

Author: "dlweinreb" Tags: "Uncategorized"
Date: Monday, 28 Apr 2008 13:37

Last night, I saw a screening of a new independent film called Nerdcore Rising.  The filmmakers follow a country-wide tour of a band called MC Frontalot, who are leaders in the genre of nerdcore hip hop, also known as geeksta rap.  (That’s the name of the rapper himself, as well as the band, I think.) As I’m sure you can guess, it’s a cross between actual musical talent and self-deprecating ironic humor. Afterwards, we went to a party at Mantra Restaurant in Boston, which was turned into a nightclub for the occasion.  ITA Software, my employer, sponsored the event.  (Of course, for us it was a recruiting event, at least putatively.  Come work here!)

The director, Negin Farsad, and many members of her team, were there and talked about the film.  At the end, the members of the band walked out and took questions.  Apparently there are quite a few nerdcore hip hop bands; estimates range from 50 to hundreds.

I’m not a rap fan at all, and I admit that I had trouble following the lyrics, which went by very quickly.  And at the nightclub, everything was much too loud, and the vocals were mixed much too low (as always seems to be the case in these venues).  Despite all that, I had a very good time.

Despite Farsad’s protestations that she only just learned how to make movies, it’s quite professionally done.  The editing is great, and keeps the movie moving right along.  It’s exciting and funny.  If you get a chance, check it out.  There’s plenty of music (with lyrics, yay!)  at MC Frontalot’s web site.

Author: "dlweinreb" Tags: "Uncategorized"
Date: Saturday, 26 Apr 2008 20:26

I had a great time at the European Common Lisp Meeting (ECLM) in Amsterdam, April 19-20, 2008. I met many of the important people in today’s Common Lisp world, an almost completely new generation of folks compared to 15 years ago. The papers were excellent, and demonstrated that Common Lisp is still a vibrant and uniquely powerful language. (I’m writing this on the plane back home, on my OLPC laptop, as I learn to touch-type with my big hands on the child-friendly little keys.) Arthur Lemmens and Edi Weitz did a great job organizing and running the meeting. Everything went entirely smoothly and I felt that everyone enjoyed it very much.

InspireData

Jeremy Jones of Clozure Associates demonstrated InspireData, an educational application that lets you analyze data and draw conclusions. The user interface and interaction design are superb. It performs very well, even on weak, old PC’s, which matters since that’s what many schools actually have. It has gotten excellent reviews and sales, and is widely used in schools.

The user would have no reason to know that it was written in Lisp. No Lisp is exposed to the user. They used LispWorks, since it runs on all of their target platforms (including Windows 95, as well as more modern Windows and MacOS X), it has a good interactive development environment, it provides a portable library for accessing the platform’s native menus and other widgets (called CAPI), and had favorable licensing terms. The rest of the graphics was done by calling the OpenGL graphics library, using a library from LispWorks. They found all of these technologies to work very well.

They wrote it as a contract job, building it to specs provided to them in a 200-page document. It took 8 person-years of developers plus two person-years of Q/A, who were brought on board from the very beginning. It’s 270K lines of Lisp code plus 470 lines of C.

The primary advantage of using Lisp is that they could produce a prototype in only two months, and then do incremental additions and refinements. You might ask, why was this necessary if they had a 200-page requirements document?

  • The specs were vague. It would have taken a spec of well over 1000 pages to really be unambiguous. (In my opinion, that’s absolutely normal for software.)
  • The specs kept changing. (That always happens. Always!)
  • In particular, the designers would change the specs because of what they saw the program doing. In other words, specifying it in advance would have been impossible in any number of pages. Design and implementation must be interleaved.
  • Even if the spec were known in advance, the best implementation techniques are not initially apparent. Sometimes you have to get pretty far along in the implementation to realize that some architectural decision did not work out well.

Lisp is very malleable. Experience over the years has shown that even large Lisp systems are particularly easy to re-factor and even re-architect. (I have seen this over and over again.) In fact, Jeremy feels that they didn’t re-architect enough! (One usually hears the opposite lament.) He emphasized that iterative development — build, test, refine — was the only way to go and the only way they could have succeeded.

LispWorks performance in this application is excellent. As I could see in the demo, it is extremely responsive. Jeremy says he has never perceived any delay from the garbage collector. InspireData is a shining proof that real applications can be done just fine in Common Lisp.

FEMLISP

Nicholas Neuss (IANM, U. Karlsruhe) presented FEMLISP, a system to do finite-element analysis (FEM). FEM is used for solving partial differential equations. It’s used to model things like convection, diffusion, viscous fluid flow, and so on.

I had thought this would just be a numeric library with some API, and wondered why doing it in Lisp would be helpful or make any difference. But it’s not like that at all. First, choosing a good way to run FEM is a hard problem. I only sort-of understand the issues, but I got a sense of it. There’s a big issue of how you “discretize” and solve the discrete problem. You also must make decisions about how to set the mesh. Second, you want an interactive environment that lets you display results graphically, and make small changes to the input spec and try again. Ideally, you’d like an expert system to make these decisions, but what he’s done so far he described as “rudimentary”.

He created a small domain-specific language to represent how to run a particular FEM problem. This lives in a Slime buffer and can be edited and recompiled quickly and conveniently. It can display the history of the iterations so you can see what’s going on and refine your input. You can insert Lisp code into the input, for computing or debugging.

It runs on CMUCL, SBCL, LispWorks CL, and Allegro CL. He does graphics with OpenDX (Data Explorer), which was written by IBM and open-sourced. (He is considering switching to VTK.)

Why did he use Lisp?

  • Dynamic typing worked out well
  • Macros and a few reader macros let him make an embedded domain-specific language easily
  • Dynamic testing and debugging (read-eval-print loop, etc.)
  • High performance compiled code (as compared to Guile and other Scheme implementations that he tried)
  • Common Lisp is stable; once you learn it, you know it
  • There is no system/user dichotomy
  • It only took 30K lines of code
  • There were lots of useful libraries

Performance as compared to other available FEM packages is hard to determine for many reasons. For example, who chooses the benchmark? Can you find an informed third party to spec and judge the procedure? How do you know you’re not comparing apples and oranges? (These are all standard pitfalls of benchmarking.) Also, he has not spent too much time on performance improvements anyway.

Nevertheless, he ran some basic comparisons against their company’s in-house FEM system, called M++, not only to measure speed but to make sure he got the same answers (he did). M++ turned out to be faster on small problems, but FEMLISP was faster on larger problems.

One reason for this is extremely interesting. Apparently there is a certain well-known technique for speeding up FEM. He had implemented it, but they had not yet done so. This illustrates the principle that higher-productivity software development can lead to faster performance! When considering the effects on performance of using Lisp, take this into account.

In other tests he found that FEMLISP is about as fast as a leading commercial product (FEMLAB) for comparable accuracy, and much easier to use.

So far he has not tried to encourage other people to use it, mainly for political reasons (his boss wrote M++). He used FFI for certain existing libraries (e.g. LAPACK).

Large Internet Systems

Stefan Richter of freiheit.com talked about “Using Common Lisp for Large Internet Systems”. His company, freiheit.com technologies (the name means “freedom”, in the sense of not having to use the Microsoft platform any more!), has built many commercial web sites in Java. They have 60 developers, most using Java, but also a 6-person Common Lisp group. In an unusual twist, the manager of the group had to convince the reluctant programmers to use Common Lisp. Also, the clients had to be convinced that accepting a product in Common Lisp was OK. They have delivered one Common Lisp application so far, a social marketing tool.

By “large internet systems” he mainly means scalable web sites. Unfortunately, he has not actually built such a thing in Lisp yet. The talk suggests approaches to the problem, but he did not have actual experiences to report. He primarily prefers Lisp because he feels that Java is too verbose, and Ruby is basically like Lisp.

He explained a lot about how to build scalable and reliable servers (all of which I was very familiar with from my work at BEA and at ITA Software): clusters, load balancers, a stateless app tier, separate DBMS’s for transactions and for reporting, a shared memcached distributed cache, and keeping functionally separate data on separate DBMS’s, plus one idea that’s still new or in the short-term future: shared-nothing database clusters using “shards” with replicated data for reliability. All of this is completely right, in my opinion, and I don’t think any of it is controversial.

Java has many good tools for building such an architecture: Tomcat providing a framework for servlets/JSP’s, a memcached client, Hibernate for database access, and even Hadoop (a free MapReduce implementation).

How does Common Lisp compare? We have Hunchentoot (a sophisticated HTTP server), cl-memcached (a memcached client), cl-sql (to invoke relational DBMS’s), and two advanced tools for generating HTML: Weblocks (by Slava Akhmechet, I think), and UnCommon Web (by Marco Baringer).

He also suggests using cl-muproc (a library that provides Erlang semantics in Common Lisp, basically) which he feels could be a good basis for a Common Lisp MapReduce. I don’t know exactly what he has in mind here, but apparently he has implemented this.

He doesn’t like existing conventional technology for generating web pages. Servlets clumsily embed HTML in Java code; JSP’s clumsily embed Java code in HTML. Common Lisp has many of the advantages of the other popular languages being used for HTML generation, such as Ruby, Groovy, and Python. Lisp has major advantages: you don’t have to write out files in order to compile things; CLOS is very useful, including the MOP; we can avoid the need for XML because programs and data use the same format; and of course macros help in all kinds of ways. (And, I was thinking, Common Lisp implementations typically execute code much faster than Ruby and Python!)
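
To make the contrast with servlets and JSP’s concrete, here is a toy sketch of the s-expression style of HTML generation that Lisp web systems favor. This is not the API of Hunchentoot, Weblocks, or any other particular library (and it ignores attributes and escaping); it just shows the idea of HTML as Lisp data.

(defun html (form)
  ;; FORM is either an atom (rendered as text) or a list whose head is
  ;; the tag and whose tail is the body, e.g. (ul (li "one") (li "two")).
  (if (atom form)
      (princ-to-string form)
      (destructuring-bind (tag &rest body) form
        (format nil "<~(~A~)>~{~A~}</~(~A~)>"
                tag (mapcar #'html body) tag))))

;; (html '(ul (li "one") (li "two")))
;;   => "<ul><li>one</li><li>two</li></ul>"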

He talked about using continuations to save state between HTTP interactions. (Many papers have been written on this topic.) You want to be able to write a program in a normal style, that can say “do this web interaction” in the middle of any procedure; this makes flow of control much easier to understand. A continuation saves stack and execution state across interactions. He talked about Weblocks and how it uses continuations, as well as many of its other virtues (it sounds great, from what he said; I have yet to learn about it).

He feels that what’s needed now is to put it all together, and then write a good book about how to use it. He points out that Ruby on Rails would never have taken off without the excellent book. (I agree completely!) He encourages us to write books, and help develop the framework libraries.

This all led to a lively discussion of continuations, particularly persistent continuations, and how to best implement them for Common Lisp. Weblocks uses the cl-cont library. Marco Baringer said that cl-cont’s continuation states are extremely large, leading to performance problems, although it would not be hard to improve this.

We also talked about just how reliable a system like this needs to be. It often turns out that in exchange for a very small amount of unreliability, you can make big improvements in simplicity and performance. On a web site, it’s often quite acceptable to fail now and then, since the clients are human users who are much better at handling failure and retrying or finding alternatives.

PWGL

Kilian Sprotte described PWGL, a tool for music composition and analysis. It is based on an earlier system called Patchwork, by Mikael Laurson in his 1996 doctoral dissertation at IRCAM, the famous music lab in Paris. It is ten years old, and has always been in Common Lisp. Originally it was developed in MCL; now it’s based on LispWorks and runs on both Windows and Mac OS X. It’s now being developed at Sibelius Academy in Finland. It’s currently in beta-test, downloadable, and version 1.0 is expected later this year.

According to the description on the web site: PWGL is a free cross-platform visual language based on Common Lisp, CLOS and OpenGL, specialized in computer aided composition and sound synthesis. It integrates several programming paradigms (functional, object-oriented, constraint-based) with high-level visual representation of data and it can be used to solve a wide range of musical problems.

It’s a visual dataflow functional language; in some ways it’s like doing Lisp by drawing boxes and lines.

It uses OpenGL for graphics, the PortAudio library for recording, playing back, and basic sound synthesis, and the libsndfile library for reading and writing files containing sampled sound. (It was interesting to see how many Lisp systems are capable of using non-Lisp libraries easily. This is another important counter-argument to the objection that Lisp has too few libraries.)

Embeddable Common Lisp (ECL)

Juan Jose Garcia-Ripoll described Embeddable Common Lisp. ECL is not just for embedding: it’s a full Common Lisp implementation. It’s a descendant of Kyoto Common Lisp, by Taiichi Yuasa and Masami Hagiya at the Research Institute for Mathematical Sciences at Kyoto University. Juan is the maintainer.

It is designed for portability. Rather than generating machine code for various processors, it generates C, and then allows the target host’s C compiler to produce machine language. This approach lets it take advantage of the target compiler’s optimizations, and specific knowledge of the target architecture. (However, compilation is not very fast.) All platforms these days include a free C compiler (even Microsoft). It makes minimal architectural assumptions: a pointer can be cast to an int, and C functions can be called with many arguments, and a variable number of arguments.

It supports a wide range of operating systems: Linux, NetBSD, FreeBSD, OpenBSD, Windows, Solaris, and Mac OS X.

The core and the Lisp interpreter are written in C; the rest is in Lisp. It borrows the Boehm-Demers-Weiser conservative GC, and provides CLOS with the PCL implementation. It uses native threads. It also contains a byte-code compiler and interpreter (instead of direct interpretation of Lisp as s-expressions). The implementation of subtypep uses the efficient method described by Henry Baker, and works with CLOS types.

It can build standalone executables and dynamically-linked libraries, and this is why it’s called “embeddable”. But it can be used as a regular Common Lisp implementation too, so don’t be put off by the name!

For more details, see his paper.

House Developer

Kristoffer Kvello of Selvaag told us about House Developer, which is basically a CAD system for architects. It allows the architect to draw a very high-level drawing, and it takes care of filling in myriad specifics. It decides where to put windows and doors, and which way the doors should swing. It places electrical outlets and switches. It decides on wall types, wall offsets, wall junctions, heaters, fire exit paths, and so on.

There are many details, all of which must conform to regulations, company rules, and best practices. Doing all this by hand is costly, time-consuming, and error-prone. Automating it reduces errors, and lets the architect try lots of ideas and see their consequences promptly.

This is, in many ways, a classic rule-based expert system. They started writing it in 1994, using Knowledge-Based Engineering (KBE) technology of the time, which was primarily in Lisp. However, the rules are not like classic Artificial Intelligence rules; they are more like constraints. An example:

(define-attribute area (window)
    (* ?width ?height))

This defines a constraint that gets recomputed as necessary. These rules can use the full power of Lisp.

The core of the system is written in Allegro CL. There is a Java-based user interface, that sends S-expressions to the core. The core sends XML replies back to the user interface. It uses many available libraries: asdf, zip, cl-sql, cl-utilities, s-xml, aserve, Expresso Toolkit, and Screamer.

The Expresso Toolkit knows the STEP (Standard Exchange of Product data) and EXPRESS (an ISO standard modeling language), which are important standards in the architecture industry.

Screamer supports “non-deterministic programming”; it does constraint satisfaction with mixed systems of numeric and symbolic constraints, based on a substrate that supports backtracking and undoable side effects.

The advantages of using Lisp for this system include:

  • Interactive development, with fast recompilation, incremental changes, no need to constantly re-create the global state
  • Break loops, with the ability to fix things and then restart
  • Reader macros, so that we could customize the syntax
  • Advice, so that we could customize behavior
  • It’s easy to inspect the image to find out what to customize
  • Extensibility in general
  • handler-bind, for use on our test framework
  • Many available relevant libraries, which worked fine

There have not been a lot of users so far, but they are planning to deliver it to a large customer soon.

High-Performance Architecture

Marc Battyani discussed a high-performance computer architecture, using Field-Programmable Gate Arrays (FPGA’s) that are programmed using a high-level special-purpose language, implemented in Lisp. He has a computer based on a Stratix II FPGA with memory and network. The FPGA has modules on it such as adders, multipliers, I/O pins, memory, and so on. Programming one consists of hooking the modules up to perform a particular special-purpose function. A problem with FPGA technology is that programming them is so hard; the novel feature here is to use a Lisp-based language, called HPCC, that compiles a high-level description into the FPGA’s program.

They have implemented two applications so far. One prices exotic financial instruments using Monte Carlo simulation. Currently, this kind of thing is done with grids of 10K-10K boxes. The other does multicast networking at 1 million messages per second. They plan to get funding, hire more Common Lisp programmers, and do more applications.

Cells

Ken Tilton talked about his Cells library, a dataflow extension to CLOS. The basic idea is that the values of slots are determined by formulas, like the cells in a spreadsheet. Cells tracks dependencies between cells and propagates values. He demonstrated widgets that grow and reshape graphics automatically.
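
To make the spreadsheet idea concrete, here is a toy sketch of dataflow slots in plain Common Lisp. This is not the Cells API, just an illustration of dependency tracking and propagation: reading a cell from inside a formula records a dependency, and setting an input cell re-runs the formulas that depend on it.

(defvar *observer* nil "The formula cell currently being recomputed, if any.")

(defstruct (cell (:constructor %make-cell)) value formula dependents)

(defun cell-ref (cell)
  ;; Reading a cell inside a formula registers that formula as a dependent.
  (when *observer*
    (pushnew *observer* (cell-dependents cell)))
  (cell-value cell))

(defun recompute (cell)
  (let ((*observer* cell))
    (setf (cell-value cell) (funcall (cell-formula cell))))
  (mapc #'recompute (cell-dependents cell)))

(defun make-input (value) (%make-cell :value value))

(defun make-formula (fn)
  (let ((cell (%make-cell :formula fn)))
    (recompute cell)
    cell))

(defun set-input (cell value)
  (setf (cell-value cell) value)
  (mapc #'recompute (cell-dependents cell)))

;; Usage: *area* recomputes automatically when *width* changes.
(defvar *width*  (make-input 3))
(defvar *height* (make-input 4))
(defvar *area*   (make-formula (lambda ()
                                 (* (cell-ref *width*) (cell-ref *height*)))))
(set-input *width* 10)   ; (cell-value *area*) => 40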

Announcements

Randall Pitts is looking for Lisp programmers to work on a speech-understanding project that would help answer email, help call center agents, etc. They’re dealing with language, grammar, and syntax. You must work in Germany.

Nick Levine is looking for work. He has 20 years of Lisp experience and has been consulting for seven years.

Marty Simmons of LispWorks is looking for applications that use concurrency, to help test their new thread support.

One parting thought

One of the most widespread complaints about Common Lisp is the lack of available libraries. However, as these practical applications show, there are many available libraries for Common Lisp that work well and can be built on.

Author: "dlweinreb" Tags: "Event, Lisp"
Date: Saturday, 12 Apr 2008 14:30

The OLPC’s security mechanism is called Bitfrost and was designed by Ivan Krstić. It is novel in two ways. First, the set of threats it is concerned with is tailored to the unusual mission of the OLPC. Second, the fundamental philosophy and mechanisms are different from what most of us are used to. Ivan gave a talk this week at ITA Software’s Technical Seminar series, explaining Bitfrost. You can read his paper about it here.

A paper castigating Bitfrost, called “Freezing More Than Bits: Chilling Effects of the OLPC XO Security Model”, was recently written by Meredith Patterson (U. Iowa) and Len Sassaman and David Chaum (both of KULeuven in Belgium).

I could not find Patterson at U. Iowa’s web site or anywhere else, but she turns out to be Sassaman’s wife. Len Sassaman is a grad student, “cypherpunk”, and privacy advocate. He was the security architect for Anonymizer and wrote the Mixmaster anonymous remailer. David Chaum is well-known as the inventor of crypto protocols for anonymous electronic cash, and currently heads the Punchscan project, an end-to-end auditable voting interface.

As you will see, their paper has a lot to say about anonymity and voting. At first, you might not think of these as topics germane to the OLPC, but the authors feel otherwise. This is their primary area of interest, and so they have brought their own agenda to bear on OLPC. You can decide the extent to which that’s appropriate.

Here are the points they seem to be making, as far as I understand, with my comments and replies in square brackets.

Bitfrost isn’t finished, but some OLPCs are in the field anyway. [True.]

Eventually, it will be necessary to have a finalized and detailed specification for Bifrost that can be audited and tested. [Sure.]

Bitfrost has not been submitted to a recognized standards body. [First, so what? Second, it's clearly far too early to do that. The right time to standardize is after there has been a great deal of experience.]

The prototypes that they saw did not have the LED’s that show that the camera and microphone are on. [Current OLPC's do have this, but they didn't know whether it would happen or not.]

The stored digital identity includes the child’s name and photograph, so that you can authenticate whether a given person matches the digital identity. They “question the need for such invasive measures.” [But they don't go into more detail about what particular problem they are concerned with.]

“The data recovery process should be decoupled from the identity and authentication component.” [I was not able to follow their reasoning about why this is important.]

A sophisticated attacker could set up a bogus backup service if they can gain access to the key store. How would they do that? The paper cites “black-bag cryptanalysis” and “aluminum-briefcase cryptanalysis”. The former means burglary (the use of the word “cryptanalysis” is sardonic/ironic). The latter is a term that the authors made up themselves (one of them boasts of this in a blog entry) but apparently also means burglary. [Well, you have to pick and choose what attacks you want to protect against. What if someone goes to the real server and puts a gun to the head of the operator? You just can't protect against every conceivable possibility.]

P_IDENT says that all communications such as email and instant messaging are cryptographically signed. It’s not explained exactly how this works, so they speculate. They assert that signing implies non-repudiability of all signed messages [note: non-repudiation means that the receiver can prove that the sender really sent this message, and the sender can't deny it unless he claims that his own key has been compromised]. “Ergo, it is impossible for XO users to use any form of anonymous communication with confidence.” They’re saying that the signing is bad because you can’t turn it off, or you have to know to turn it off. So anyone who intercepts your messages knows who you are, so speaking out against your government or whistleblowing against a corporation could backfire on you. It’s also not good for doing secret ballots. [I guess this is all true, but if I sent an email right now, I would hardly depend on it to be untraceable to me, even without a digital signature. Perhaps anonymity should be added to the goals for Bitfrost, if they intend for it to be used in those ways. But it's really for childhood education, not voting. It's a lot of work to add on every requirement in the world and try to do them all. If we were designing a voting machine, security goals would be different. There may be very good reasons that anonymity was not added as a goal, too; I'd like to hear from OLPC about this.]

Because of the digital signing, a child's Internet access can be "cut off at the source", which would be traumatic. [Oh, come on!]

The point about “Imagined Communities”. [I don't know what they're talking about; evidently I'd have to read one of the citations.]

Most important, they do not provide any suggestions about what they’d do to mitigate what they consider to be problems. In my opinion, a criticism carries much less weight without specific counterproposals, since then you can evaluate the drawbacks and tradeoffs required by those counterproposals.

Now that Ivan Krstić has left OLPC, it is not clear to what extent Bitfrost’s implementation will be finished and polished. I heard one rumor on the net that OLPC plans to replace it with something else, but I have no idea whether that’s actually true. There are a lot of rumors going around about OLPC, and I’ll wait for positive confirmation before repeating any more of them.

Personal news, speaking of OLPC: Federal Express lost the OLPC that was originally sent to me (or it was stolen). It was basically impossible to get my money back from FedEx, since they required some paperwork from the shipper (Brightstar), who never answered my calls. I complained to OLPC, but for a while nothing happened. Meanwhile someone at ITA had bought one for his kid, who didn’t like it, so he sold his to me. Then, OLPC decided to simply send me another one! Good for them! I’m selling the second one to a friend.

So now I have my very own green-and-white ultra-cute laptop. I’ve upgraded it to the latest release and started to learn to use Sugar and the installed applications. Maybe someday I’ll punt Sugar and just use it as a Linux machine, but for now I want to try it out. The most important thing, as I knew it would be, is learning to touch-type on the little keyboard. But I can hunt-and-peck, more easily than I could on something like a Blackberry, so I can’t complain. I’m going to the European Common Lisp Meeting in Amsterdam next week, and I’ll bring it along and play with it more.

Author: "dlweinreb" Tags: "One Laptop Per Child"
Date: Monday, 24 Mar 2008 01:22

The “condition” (exception) feature of Common Lisp is important, but widely misunderstood, as can be seen by the frequent confusion between “conditions” and “errors”. I’ve been thinking about conditions and exceptions for many years, and here’s how I explain them.

Notes: I’m going to avoid using the word “error”, which has become overloaded. Some of the following applies to Java, but not all; I might write about Java exceptions in the future. I’ll omit the use of explicit catch/throw, for brevity. I’m only talking here about the simple heart of the condition feature, not fancy things like restarts.

Contracts, Bugs, and the Failstop Principle

Every function has a “contract” which defines what the function is supposed to do. If any function call violates the contract, the program must be incorrect: a “bug” has happened. The actual incorrect behavior might have started any time before we detect that there’s a bug.

If a program detects that a bug has happened, it should stop. That’s because if it keeps on going, there’s no way to know what it might do: write the wrong data to a file or database, display wrong answers, hang, etc. This is called the “failstop” principle.

(Exactly what “stop” means depends on the context. An interactive command might return to its event loop. A server thread might go back to its wait-for-input step. These are not perfectly safe, since the program might have corrupted transient state before the bug was detected. A safer way to stop is to kill the entire process. In Erlang, you only have to kill a thread, since each thread has its own transient state.)

Outcomes

The contract of a function specifies, among other things, the possible “outcomes” of calling the function. There is always one “usual” or “straight-line” kind of outcome, and then there can be zero or more “unusual” outcomes.

In Common Lisp, every function call either returns zero or more values, or else signals a condition. The caller discriminates on which kind of outcome this is by scrutinizing the values returned, or scrutinizing the condition that was signaled. The contract specifies the circumstances under which each kind of outcome happens, saying what values are returned or what condition object is signaled (plus what side-effects occurred) for each kind of outcome.

For example, suppose we call (open pathname :if-does-not-exist nil). Possible kinds of outcome are:

  • It returns a stream object. This means that the specified file has been opened for input.
  • It returns nil. This means that there was no file by this name in the file system. There are no side-effects.
  • It signals inappropriate-wildcard. This means that the pathname was a wildcard pathname; it doesn’t make sense to open one. There are no side-effects.
  • It signals undefined-logical-host, and the instance’s undefined-logical-host-name is the name of the logical host. This means that it was a logical pathname whose host was not found in the set of translations.

(There are many other kinds of outcome. Sadly, Common Lisp does not actually specify what condition classes are signaled. Your own contracts always should!)

If the call to open does any of these things, it is working properly and there is no bug. If the call to open returns something other than nil or an open-for-input stream to the specified file, or if it signals any other condition class, a bug has happened and the program should stop.

Conditions and bugs are entirely orthogonal. If you call open (as shown above) with a wildcard pathname, and it signals inappropriate-wildcard, that’s not a bug; that’s exactly what it’s supposed to do. If you call open and it returns a symbol, that’s a bug, but no condition is signaled.

Commonly, when a function call ends with an unusual outcome, that’s specified to mean that there were no side-effects. There’s nothing theoretically wrong with specifying in the contract that a certain unusual outcome also has some side effects, but it’s not customary.
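
To make the outcome discrimination concrete, here is a sketch of a caller of the open example above. The condition classes are the hypothetical ones from the list, so the sketch defines them itself; a real implementation would supply its own classes, under whatever names its contract specifies.

;; Hypothetical condition classes, standing in for whatever classes the
;; implementation's contract actually names.
(define-condition inappropriate-wildcard (error) ())
(define-condition undefined-logical-host (error)
  ((name :initarg :name :reader undefined-logical-host-name)))

(defun try-open-for-input (pathname)
  "Return an input stream, or NIL if the file does not exist or the
pathname cannot sensibly be opened."
  (handler-case (open pathname :if-does-not-exist nil)
    (inappropriate-wildcard ()
      (warn "~A is a wildcard pathname; nothing to open." pathname)
      nil)
    (undefined-logical-host (c)
      (warn "Unknown logical host ~A." (undefined-logical-host-name c))
      nil)))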

Tasteful Contract Design

When you design a function, you should first think of all the possible kinds of (correct) outcome. Then you should decide how each outcome will look to the caller: certain specific returned value(s), or certain specific conditions. This all becomes part of the contract for the function.

The general principle for making this choice is to consider which outcomes are the ones that a programmer is likely to expect and desire. You can’t always know for sure: different programmers might call the same function with different expectations. But it’s usually not hard to guess accurately. The “usual”, “straight-line” outcomes should always be a kind of returned value. The more an unusual outcome seems like it will be expected and important, the more likely you should be to represent it by a returned value rather than by a condition. All other unusual outcomes should be indicated by signaling conditions.

The main clue is the appearance of the function call. That’s mainly the function’s name, but it can also include the names of keyword arguments.

For example, (open "/a/b") should be defined to return a value only when it has actually opened a file, in which case it returns a stream. All other outcomes should be signals of conditions. However, (open "/a/b" :if-does-not-exist nil) suggests strongly that some outcomes (there’s no “b” in directory “/a”, or directory “/a” does not exist) should be indicated by returning nil, and conditions should be used for other outcomes.
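
Here is a minimal sketch of designing one contract both ways; find-user, user-not-found, and *users* are hypothetical names invented for the illustration, not from any library.

(defvar *users* (make-hash-table :test #'equal))

(define-condition user-not-found (error)
  ((name :initarg :name :reader user-not-found-name)))

(defun find-user (name &key (if-does-not-exist :error))
  "Return the user object named NAME.  By default the unusual outcome
is a signal of USER-NOT-FOUND; a caller that expects and wants that
outcome can pass :IF-DOES-NOT-EXIST NIL to get NIL instead."
  (let ((user (gethash name *users*)))
    (cond (user user)
          ((eq if-does-not-exist :error) (error 'user-not-found :name name))
          (t nil))))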

Why Conditions are Better Than Special Return Values

It’s sometimes tempting to indicate unusual outcomes by having a function return a special value, or by having it return a second value. However, there are two drawbacks to this.

First, experience over many long years has shown that programmers often forget to check for the special values. Coding is hard and demands a lot of concentration. When a programmer is hard at work figuring out how to write an algorithm, it can be difficult to keep in mind all the possible outcomes of every call. There’s no excuse for it, but in real life, this is a common bug.

Bruce Eckel, in Thinking in Java, 2nd edition, correctly says:

In C and other earlier languages, there could be several of these formalities, and they were generally established by convention and not as part of the programming language. Typically, you returned a special value or set a flag, and the recipient was supposed to look at the value or the flag and determine that something was amiss. However, as the years passed, it was discovered that programmers who use a library tend to think of themselves as invincible — as in, “Yes, errors might happen to others, but not in my code.” So, not too surprisingly, they wouldn’t check for the error conditions (and sometimes the error conditions were too silly to check for [such as all the error values from printf]). If you were thorough enough to check for an error every time you called a method, your code could turn into an unreadable nightmare. Because programmers could still coax systems out of these languages they were resistant to admitting the truth: This approach to handling errors was a major limitation to creating large, robust, maintainable programs.

If an algorithm forgets to check for the special values, it will proceed as if the usual outcome happened. This means that the program is malfunctioning. A bug has happened but it has not been detected.

But if that unusual outcome is expressed as a signal of a condition, and the programmer forgets to handle it, the program will stop. This is what we want: fail-stop behavior.

(Exactly what “stop” means depends on context. In a server, there would probably be a handler-bind near the base of the stack that handles all conditions. This “ultimate handler” is called when a bug has been detected. It might write a stack trace to a log file, and then cause the thread to be restarted, for example.)
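As a rough sketch of what I mean (my own illustration, not code from any real server; process-one-request and log-stack-trace are hypothetical names, and I handle only error here for simplicity):

(defun worker-thread-top-level (work-queue)
  (loop
    (block one-request
      ;; The "ultimate handler" near the base of the stack: if a bug is
      ;; detected anywhere above, log it, abandon this request, and let
      ;; the loop go on to the next one.
      (handler-bind ((error (lambda (condition)
                              (log-stack-trace condition)
                              (return-from one-request nil))))
        (process-one-request work-queue)))))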

Second, even if you do remember to check for the special value, it often makes the program cluttered and harder to read. This is particularly annoying in Lisp, where it’s customary to write applicative forms where arguments to one form are themselves non-trivial forms.

I only have room here for a short example. The problems discussed above come up more often, and are harder to deal with, in much larger programs.

Suppose we have a configuration module that associates keys with URL’s. Looking up a key has two possible outcomes: the URL is found (usual) and no URL is found (unusual). The function url-host-name extracts the host name from an URL. If the URL does not specify a host name, that’s an unusual outcome. Finally, make-host creates and returns a host object, with the given host name.

We want to write a new function, get-host-from-configuration, which takes a configuration and a key, and returns a host object for the specified configuration entry. There are two possible outcomes: the host, or an indication that we could not obtain it.

Version 1 disregards unusual outcomes:

(defun get-host-from-configuration (configuration key)
  "Returns the host associated with the key and the configuration."
  (make-host :name (url-host-name (read-url configuration key))))

Version 2 indicates unusual outcomes by returning nil:

(defun get-host-from-configuration (configuration key)
  "Returns the host associated with the key and the configuration,
or nil if it cannot be obtained."
  (let ((url (read-url configuration key)))
    (when url
      (let ((host-name (url-host-name url)))
        (when host-name
          (make-host :name host-name))))))

Version 3 uses conditions:

(defun get-host-from-configuration (configuration key)
  "Returns the host associated with the key and the configuration;
signals cannot-make-host-from-key if the host cannot be obtained."
  (handler-case
      (make-host :name (url-host-name (read-url configuration key)))
    ((or configuration-entry-not-found url-has-no-host) ()
      (error 'cannot-make-host-from-key :key key))))

Version 1 is nice and simple, but it doesn’t take into account the possibility that its callees will have unusual outcomes. Its contract cannot possibly be fulfilled.

Version 2 works, but it loses the applicative form. Every time we call a function, we have to stop, give the result a name, and check it before we can go on.

Version 3 keeps the applicative form. As long as everything has the usual outcome, it’s just like the simple code in Version 1. The “straight-line” code path is all in one place and easy to see. The handlers for the infrequent, unusual outcomes are out of the way.
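To round out the example (this caller is my own sketch, not part of the original three versions), a caller of Version 3 can translate the unusual outcome into whatever is appropriate at its own level:

(defun host-or-default (configuration key default-host)
  ;; If no host can be obtained for this key, fall back to a default.
  (handler-case (get-host-from-configuration configuration key)
    (cannot-make-host-from-key () default-host)))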

Conditions at the Right Level of Abstraction

You may be thinking: why not fix Version 1 by keeping the code, and just changing its contract to say

“Returns the host associated with the key and the configuration, signals configuration-entry-not-found if the URL was not found in the configuration, and signals url-has-no-host if the URL doesn’t have a host.”

In other words, we could make the callees use conditions, as with version 3, but just let the conditions propagate to the caller.

The problem with this is that it’s a modularity violation. The caller of get-host-from-configuration has no business knowing that there are URL’s involved at all. That’s an underlying implementation detail. Instead, get-host-from-configuration should indicate the unusual outcome, that it can’t make the host object, by signaling the cannot-make-host-from-key condition. It’s OK for the condition object to contain the key, since our caller clearly knows about the concept of keys since that’s an argument to get-host-from-configuration.
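For concreteness, here is one way the condition itself might be defined (my sketch; the slots and report text are up to the module, and I make it a subclass of error only because that is the Common Lisp convention, despite my misgivings about that class below):

(define-condition cannot-make-host-from-key (error)
  ;; The condition carries the key, a concept the caller already knows
  ;; about, and says nothing about URL's or configuration files.
  ((key :initarg :key :reader cannot-make-host-from-key-key))
  (:report (lambda (condition stream)
             (format stream "Cannot make a host object for key ~S."
                     (cannot-make-host-from-key-key condition)))))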

Similarly, it’s good for the read-url function applied to a configuration to indicate that it can’t find an entry by signaling configuration-entry-not-found rather than, say, file-not-found if the whole configuration file was missing. The caller of read-url has no business knowing whether the configuration is stored in a file or a database. We might even have two subclasses of configuration, file-configuration and database-configuration, but this would be hidden from the caller of get-host-from-configuration. Whether the configuration is stored in a file or a database is an internal implementation detail.

condition, serious-condition, and error Are Meaningless

Common Lisp defines three base condition classes named condition, serious-condition, and error. This is based on the misconception that you can tell whether the signaling of a condition is an “error” (bug) simply by knowing the class. But you can’t. Whether the signaling of a condition is a bug or not depends entirely on whether the function signaling it is defined to do so, or not. If I were designing a new dialect of Lisp, I would omit the classes serious-condition and error.

Why This Philosophy is Unconventional

Most explanations of conditions put little or no emphasis on functions having contracts that specify conditions. Few other explanations refer to the propensity of programmers to neglect to check special “error codes”.

Major Lisp texts, such as “Practical Common Lisp” and “Common Lisp: The Language, 2nd Edition”, start off by acknowledging that signaling does not always mean that there’s an “error”, but they soon give up on that distinction. The word “error” is sometimes used to mean what I call an “unusual outcome” and at other times to mean what I call a “bug”. I see these as extremely different phenomena that must be carefully distinguished.

The fact that the usual function for signaling a condition is called error greatly amplifies the confusion. If I were designing a new Lisp dialect, I would not call it that.

Bruce Eckel’s book says:

With an exceptional condition, you cannot continue processing because you don’t have the information necessary to deal with the problem in the current context. All you can do is jump out of the current context and relegate that problem to a higher context. This is what happens when you throw an exception.

As you see, that’s not how I would explain it at all. An unusual outcome isn’t even necessarily a “problem”. It doesn’t mean you “cannot continue processing” any more than returning from the function means that.

Joel Spolsky doesn’t like exceptions at all. He considers them like “goto” statements, which everybody “considers harmful”, whereas I think that structured non-local exits do not have the problems cited in the “considered harmful” paper. He objects that “there is no way to see which exceptions might be thrown and from where”. But how are you supposed to program with functions whose contracts you do not know, exceptions or no exceptions? He says “they create too many possible exit points”; but whether you express unusual outcomes with exceptions or with special returned values, there are just the same number of them. He advocates using error codes, even though he admits that it makes programs far bulkier and makes it impossible to nest function calls.

Implementation and Portability Considerations

The Common Lisp specification makes tradeoffs between clean contracts and speed. For example, the addition function “+” ideally ought to be defined to signal a condition when either argument is a symbol. But, in order to allow generation of fast code on non-specialized hardware, its contract says that, given a symbol, it may either signal a condition or return some integer value.
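To make the tradeoff concrete (my own illustration, not anything from the standard), a program that wants the stricter contract can impose it itself, paying for the checks on every call:

(defun checked-+ (x y)
  ;; Enforce the cleaner contract by hand: signal a type-error if either
  ;; argument is not a number, instead of relying on whatever the
  ;; implementation's + happens to do.
  (check-type x number)
  (check-type y number)
  (+ x y))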

Some contracts in Common Lisp are deliberately incomplete in order to allow some implementations to add non-standard extensions.

Many contracts in Common Lisp do not specify particular condition classes to be signaled, but rather merely say that for some outcome “a condition is signaled”, without specifying a particular condition class or instance variable values.

Topics For the Future

unwind-protect, unhandled conditions in cleanup handlers, chained conditions, Java exceptions, debugging, handler-bind, handling all condition classes, *break-on-signal*, polymorphism, with-error-context, condition names should say what happened, not where it happened.

Author: "dlweinreb" Tags: "Lisp, Software Engineering"
Date: Wednesday, 16 Jan 2008 12:27

And now for something completely different. I don’t usually post about politics, but what with all the election coverage I hear every day during my commute, I can’t help but think about it. I am basically a “progressive”. Perhaps my politics is “defined” by the fact that I read The New Republic and The New Yorker, and listen to National Public Radio. So here’s how I look at the Presidential election.

Because the primary elections are in progress, we hear mostly about them. But what’s really important is the general election. Any of the major Democratic candidates is way, way better than any of the major Republican candidates. So, the only thing that matters is for a Democrat to win. And therefore, the only criterion that matters for the primary is to vote for the Democrat who is most likely to be able to win in the general election.

That means that my primary vote should be based on how I expect everyone else to vote in the general election.

What criteria will everybody else be using to make their decision in the general election?

Polls tell us relatively little. As we all know, there have been major failures in polling already. Many voters don’t make up their mind until they are actually in the voting booth. I believe that what voters say to pollsters is influenced by what they think the pollsters want to hear, and the way that they’d like to sound and to perceive themselves, which is not all that accurate a predictor of how they will actually vote. Things keep changing, often rapidly: one “gaffe” can have a big effect. Polls asking “if x and y were candidates in the general election, who would you vote for?” are particularly dubious, since so much will change between now and general election day.

I believe that Iowa and New Hampshire tell us very little.  They are quite atypical in so many ways, particularly in that New Hampshire allows independents to vote Democratic in the primaries.  Obviously Michigan tells us nothing about the Democratic primary race, since the only major candidate on the ballot is Clinton.

Some people mostly care about the real issues and/or ideology. The Democratic candidates are quite close on all the issues. (The differences that are being debated now are very minor.) So from the point of view of my primary voting decision, none of that matters.

A lot more people vote on what they perceive as the “character” of the candidate. Among the most important “character” criteria are:

  • Is he/she someone I’d enjoy spending an evening with?
  • Is he/she the “kind” of person who respects my “kind” of person?
  • Is he/she optimistic?
  • Is he/she female?
  • Is he/she black?
  • Is he/she “cold”?

For example, last time, I believe that John Kerry was perceived by many voters as being an “Eastern elite liberal sushi-eating Chardonnay-sipping type who snickers at ordinary working-class people”. It makes no difference whether that’s true or not; the only thing that matters is the perception. George W. Bush affects a Texas accent (gee, his dad and brother don’t talk that way!), is seen in jeans clearing sagebrush from his ranch (or whatever), and so on, portraying himself very differently from Kerry.

(The fact that W. avoided war service, while Kerry served honorably, was a cause of cognitive dissonance to some voters. They were looking hard for an excuse to dismiss this fact, and the “Swift Boat” people, no matter how implausible their claims, gave them the excuse they needed to make them feel happy opposing Kerry.)

I believe that Clinton cannot possibly win the general election. The Republican attack machine will have a much easier time sliming her than Obama (or Edwards). So many people already loathe her; her “negatives” are about 40% now (this is one polling result that I have some confidence in), and (I am told) negatives are much less volatile than positives. Last summer, I met an otherwise-reasonable woman who told me that she would never vote for Clinton, “because she didn’t divorce Bill”. And a lot of people, most of whom won’t admit it, would not vote for any woman.

Obama is a brilliant and inspiring orator, very optimistic and positive, and I don’t think he will be perceived as an “Eastern elite liberal etc.” as Kerry was.  I am afraid that his major drawback is that he’s perceived as being black, albeit not with the kind of negatives that you’d get from, say, Al Sharpton (to put it mildly).  I think more people will claim to be comfortable voting for a black candidate than actually are.

A lot of these things can be influenced by advertising, organization, and endorsements; sadly, people can be influenced (the advertising industry is not crazy to spend all the money they spend). Clinton has a lot of strengths in these areas that will help her a lot in the primaries.

I think that all things considered, Obama is less unelectable than Clinton, which is why I’m going to vote for him (unless something big changes in the next few weeks).

My prediction of the outcome, which has not changed in many months, is that Clinton will win the Democratic primary and lose the general election.

And what do I actually believe (as opposed to how I’m going to vote)? I prefer Obama, although any of Obama, Clinton, or Edwards would be just fine with me as President.

I do not buy the argument that Clinton had “experience”.  Perhaps Biden and Dodd have experience that matters, but they’re out of the race.  Clinton was merely a kibitzer; she didn’t even have a security clearance.  And the Republicans will make great hay out of this.  (And why does she say she was named after the recently-deceased Sir Edmund Hillary, who didn’t become famous until six years after Clinton was born?  And what was going on when she made $100K in fast commodities trading?  And those are just two of the real issues, not counting all the fake ones like Whitewater.)  And they will find ways to make the point “you don’t want a woman as president”, without saying so in those words.  Attacking her will be like shooting fish in a barrel.  It’s going to be very ugly and unpleasant.

I hope that John McCain wins the Republican primary; he’s the least awful of them.

OK, now I’m on the record, here on the wonderful Internet where nothing ever goes away. I’ve been wrong about politics many times before, and the present situation is pretty volatile and hard to predict. (As have been the last few general elections, which were extremely close.)

Author: "dlweinreb" Tags: "Politics"
Date: Sunday, 06 Jan 2008 15:16

Supercapitalism: The Transformation of Business, Democracy, and Everyday Life, by Robert Reich, is the most interesting book on the American government and how it relates to our lives that I’ve read in a long time.

In a nutshell, the thesis is: “Capitalism has become more responsive to what we want as individual purchasers of goods [and investors], but democracy has grown less responsive to what we want together as citizens.” “Democracy means more than a process of free and fair elections. Democracy, in my view, is a system for accomplishing what can only be achieved by citizens joining together with other citizens–to determine the rules of the game whose outcomes express the common good.”

He makes what I consider an extremely cogent, well-written, and persuasive argument to support this. He explains why it happened, focusing on the history of the American economic and governmental structure starting in the post-War era (the fifties), and showing how major changes started in the Seventies and continued to the present.

He offers explanations for:

  • Why CEO pay has soared into the stratosphere and what prevented it from soaring before
  • Why inflation has become less of a threat than it was three or four decades ago
  • Why antitrust laws are less important today as a means of restraining economic power than they were previously
  • Why there are so many more corporate lobbyists and lawyers in Washington, D.C.
  • Why politicians demand that companies be “patriotic” and put America before other nations
  • Why a bigger fuss is being made over corporate philanthropy when corporations were never set up to be charitable institutions
  • How someone can fret about the decline in hourly wages and simultaneously hunt for the best deal from China or India, which is often at the expense of an American’s wages or even job
  • How someone can lament the decline of independent retailers on Main Street while at the same time doing most of their shopping at big-box retailers and online
  • Why a person who is deeply concerned about global warming might nonetheless buy an SUV
  • Why politicians like to publicly excoriate CEOs but then enact no laws making what they did illegal
  • Why the move toward improved corporate governance makes companies less likely to be socially responsible
  • Why the promise of “corporate democracy” is illusory
  • Why the corporate income tax should be abolished
  • Why companies should not be held criminally liable
  • Why shareholders should be protected from having their money used by corporations for political purposes without their consent
  • Why large companies have less economic power now than they had three decades ago
  • Why the immense increase in corporate lobbying is due to a decrease in the market power of the corporations

You may be thinking about some of these points: “Oh, it’s obvious why.” But Reich’s explanations are often not what you’d expect.

None of this happened because of Ronald Reagan or Margaret Thatcher; the trends were clearly under way before they came to power, and the same trends can be seen in other countries to some extent. Neither were they caused by heroic or villainous CEOs; the changes are structural, not personal.

“The executives of Wal-Mart or any other large company are not brutally insensitive or ruthlessly greedy. They are doing what they’re supposed to do, according to the current rules of the game–giving their customers good deals and thereby maximizing the returns to their investors. Just like players in any game, they are doing whatever is necessary to win. But just as all games require rules to define fair play, the economy relies on government to set the economic ground rules. If government wanted to do something about the means Wal-Mart employs, it could change the current rules. In theory, it could enact laws to make it easier for all employees to unionize, require all large companies to provide their employees with health insurance and pensions, enact zoning regulations to protect Main Street retailers from the predations of big-box retailers, and raise the minimum wage high enough to give all working people a true living wage. All such measures would have the likely effect of causing Wal-Mart and other large corporations across the board to raise their prices and reduce returns to investors.”

Reich is not especially advocating that government should do these things. His point is to show what could happen, and why things are happening the way they are. He would like there to be more public conversation about whether or not to make these kinds of tradeoffs. The last sentence is “The first step, which is often the hardest, is to get our thinking straight.”

The writing style of the book is simple and direct, and fun to read. He has a lot of supporting facts and figures as well as good illustrative stories.

I believe that his overall point is extremely valid, and provides a useful framework for thinking about the vital issues of our economy and our government.

Author: "dlweinreb" Tags: "Uncategorized"
Date: Monday, 31 Dec 2007 21:39

This is a follow-up to my previous article about the success of OODBMS’s, and ObjectStore in particular. For people interested in the more technical story behind the ObjectStore object-oriented database management system, here are some stories that you might enjoy. You’ll see why it was harder to do than we had originally anticipated. There are also stories about problems with the business, with some cautionary tales that you could take into account the next time you start a company.

I’ve been involved with or heard about many high-tech startups. Nearly always, the product turns out to appeal to a set of customers who aren’t the ones the founders originally had in mind. Smart founders dynamically adjust. We found that our customers’ technical requirements varied somewhat, and we had to make a lot of improvements and changes to the product to meet these new requirements. That took a lot of engineering talent.

This essay includes very substantial contributions by my colleagues, which I have tried to organize into a cogent whole. Contributors, in alphabetical order:

Gene Bonte: Co-founder, CFO
Sam Haradhvala: Co-founder
Guy Hillyer: Senior engineer
Charles Lamb: Co-founder
Benson Margulies: Senior engineer and head of porting (after Ed)
Dave Moon: Senior engineer
Jack Orenstein: Co-founder
Mark Sandeen: Senior salesperson
Ed Schwalenberg: Senior engineer and head of porting
Dave Stryker: Co-founder, VP of Engineering

Porting Was Hard

We knew that porting ObjectStore was going to be hard. Dave Stryker recalls: “That was the thing we talked about most during the crucial first three months when we were working out the implications of the architecture.” However, by the time all was said and done, it turned out to be more work than we had originally anticipated.

We ported ObjectStore to an amazing number of architectures: many versions of Windows, many flavors of Unix, OS/2, you name it. I can hardly remember them all. Worse, we often had to do a port simply because a vendor produced a new C++ compiler! So we’d have a version for Solaris on the SPARC with C++ version 4, and another for Solaris on the SPARC with C++ version 5, and so on. We did ports to hardware that never made it big, like the NeXT, and hardware that never even reached the market. (What, you don’t remember the Canon workstation? As Mark Sandeen, one of our best salespeople, points out: “We never should have spent the time to port to platforms with minuscule market share.”) And every so often our sales force would book a sale on a platform that we didn’t actually support. Quick guys, get to work! Our porting group pulled off miracles, but all this took up a lot of engineering talent.

Ed Schwalenberg reminds me that “another bane of our porting existence was the set of orthogonal choices to be made in compiling a library: threads vs. non-threaded, shared vs. static libraries, 32- vs. 64-bit instructions, exceptions vs. non-exceptions, etc. All of those were in addition to the choice of compilers.”

By the way, the first thing that would happen whenever we did an ObjectStore port is that we would discover bugs in the vendor’s C++ compiler. Every single time! As Ed Schwalenberg says: “We were the world’s C++ compiler quality assurance department for a decade.”

Dave Moon points out: “A lot of the early technical problems in ObjectStore were caused by our building on very immature products from other vendors. Since they weren’t open source, we could not work around problems, and had to wait for the vendors to fix them. This is inherent in working at the bleeding edge.”

Fun fact: In the early days of C++, the designers at Bell Labs came up with a specification for the first version of parameterized types. This was of great interest to us, since we wanted to support “a set of Transistors” so that we could query over such a set, and so on. At that time, there was only one C++ implementation, from Bell Labs, known as “cfront”, which translated C++ to C. The guys at Bell Labs apparently were not good enough compiler hackers to implement parameterized types in cfront. So we did it for them (I believe Sam Haradhvala did the work) and gave the code back to them and the world, in an early instance of de facto open source collaboration. We got a nice press release out of it. We were very much among the world’s C++ experts at the time.

We also kept finding operating system bugs. ObjectStore needed to be able to create a “cache” file, and map each page, page by page, into the appropriate virtual address, and control its access permissions, using the Unix “mmap” and “mprotect” and “munmap” system calls. Then the application program would attempt to read or write a no-access page, or write a read-only page, causing a SIGSEGV fault. Our SIGSEGV handler would then, analogously to a page fault handler, figure out what had occurred, and do whatever needed to be done: fetch the page from the server if necessary, map the page into address space if necessary, set the access permissions, wait for locks when necessary, and so on, finally resuming the program. This was supposed to work in Unix, but Ed Schwalenberg says: “Recovering from a SIGSEGV did not work in any of the first dozen or so platforms we tried it on: Sun’s SunOS, IBM’s AIX, HP’s HP/UX, Digital Unix, OS/2, and the analogous thing on Win16, Win32s, and Windows NT. Every last one of these required a conversation with the relevant kernel development team to get the operating system fixed. Win16 and Win32s didn’t even have the concept of user-mode interception of memory faults, so we had to write kernel-level device drivers to add that capability. Also, SIGSEGV handling did not work recursively: anything that had to work inside a SIGSEGV handler could not, itself, take a SIGSEGV (this is fixed in modern versions of Unix and Windows).”

Here’s a story of an operating system bug. Solaris writes out all modified pages every N seconds. The ObjectStore “cache” file could get pretty big, and had lots of modified pages, but there was no need to write them out to the disk, since the file was discarded after a crash anyway. We acquired a customer, Telstra in Australia, who needed real-time response: ObjectStore was invoked after a customer dialed a special phone number, to look up another phone number, and the phone switch had unforgiving time limits. Sun suggested that we put the cache into a special “tmpfs” file system. Files in “tmpfs” aren’t written out, because they’re known to be temporary. That made perfect sense. Unfortunately, we got rare and unrepeatable weird bugs, which finally turned out to be because the SIGSEGV/mmap/mprotect feature almost worked on “tmpfs” file systems, but not quite. We got around it somehow, but I can no longer remember how.

We found that Solaris was taking a very long time to execute mprotect system calls. It turns out that the architects of Solaris had apparently assumed that there would be very few mapped regions of memory. They had not anticipated our architecture, which mapped a huge number of pages independently. So they were using a simple linear search. Guy Hillyer wrote an improvement to Solaris, using skip lists to make the search run in O(log n) time. The hard part was the politics of getting Sun to accept our changes to Solaris! We only did this for Solaris, which was then our primary platform. (Maybe it should be done for Linux?)

When the new Windows technology (which was OS/2 at the time; IBM and Microsoft were still working together on it) came out, it was crucial for us that it be able to support memory mapping. Dave Stryker and Tom Atwood flew out to meet with Bill Gates in September of 1989. Dave Stryker recalls: “We originally had a 45-minute appointment, but Gates extended the meeting to a couple of hours, and called in Dave Cutler [the architect of OS/2]. At Tom’s urging, we told Gates and Cutler everything they wanted to know about ObjectStore. Gates was complimentary of the Object Design approach, but said, in a nice enough way, that if the Microsoft Empire ever needed such a thing, they would build it themselves. Still, Gates told Cutler to make sure that the OS/2 equivalent to mmap was powerful enough to run ObjectStore, and there were some changes made to make it so.” Later, this OS/2 technology turned into Windows NT. (Dave Moon adds that it turned out to have a bug: it doesn’t free up disk space when it ought to. For some reason Microsoft hasn’t fixed this, even after many years. We found a way around it.)

Speaking of industry luminaries, we also met with Steve Jobs when he was at NeXT, and Jobs made a big announcement praising our technology, which resulted in a nice press release. There was some discussion that NeXT might buy Object Design, but that never went anywhere.

It turned out to be hard to support customers who wanted to use the same ObjectStore database from many different client architectures. We had to support what we called “heterogeneity”. First there was “architecture hetero”: some machines have big-endian numbers and some have little-endian numbers, and we’d have to convert, for example. Much worse was “compiler hetero”: different C++ compilers represented C++ objects differently in memory, due to run-time compiler “dope”, padding, and so on. Objects were not even the same size in different compilers, which was a huge problem. We had to know every last thing about how objects were laid out, where the compiler put padding, where the compiler put “dope” information such as “vtbl pointers” and various displacement offsets, etc. Our engineers came up with clever solutions to these problems, but it was hard and used up a lot of engineering talent. I think if we had realized that we’d run into this problem, originally, we might have never started the company at all, thinking the technical issues too daunting. It’s a good thing we didn’t think about it then!

The Virtual Memory Mapping Architecture

Was the page-mapping, virtual-memory mapping architecture worth it? Mark Sandeen says: “In competitive situations, against the other OODB companies, we sold on performance, performance, performance. Plus the fact that you got that performance by using an elegant architecture that was fundamentally different from anything our competitors had or ever would have. We used our incredible engineering team to win the benchmark wars, and then told our customers that the reason we won the benchmarks was the 2nd generation OODB architecture.”

It would have been easier to port had we not gone for transparent persistence, and the goal that dereferencing a pointer was done in one instruction, exactly as in a non-persistent program. None of our competitors did this; for C++, they used the “overloaded operator ->” approach, in which dereferencing a pointer did a software operation that usually consisted of going through an indirection in an object table. Our justification was that CAD people would never tolerate a slowdown in the time it took to redisplay a drawing. So once the pages were faulted in, C++ operations would run at full speed. This led to all kinds of pros and cons. Concurrency control was totally transparent and foolproof; on the other hand, it was at page granularity, causing unnecessary conflicts sometimes. We didn’t expect this to be a problem in the classic CAD scenario since we imagined designers would usually not be working on the very same drawing at the same time. But other scenarios did run into this sometimes. However, difficulty of porting was our own problem, not our customers’ problem, so they didn’t know or care.

Dave Stryker recalls some more reasons we stuck with our original idea of using the memory-mapping architecture. “First, our competitors had staked out the strategy of overloading C++ dereferences. Object Design came into existence after Versant (then Object Sciences) and Objectivity, and needed to be differentiated from the competition. Second, our approach was really clever, and won many ideological converts based on cleverness alone. We could usually count on the smartest guy in the room being an ally, because using faulting was such an impressive intellectual accomplishment. Third, we had really smart engineers who enabled us to undertake obligations, particularly porting obligations, that with more prudence we might have avoided. With engineers that talented, you need really disciplined, far-sighted top management, because in the short term it’s perfectly clear that engineering can work miracles. It’s only in the longer term that the cumulative miracles sap all the capacity of engineering.” (Mark Sandeen also remembers that we were the last of these three startups, whereas Gene Bonte says we all started at the same time and remembers a lot of details about it.)

Dave Stryker says: “As you say, for the largest early customers database meant concurrency, and at that point at least it was difficult to avoid concurrency conflicts among simultaneous users. In my memory, it seemed that there were often fires burning because customers had trouble getting ObjectStore to work concurrently. I know this got a lot better in the years after I left Object Design.” A main technical problem is that locks were at the granularity of pages, so sometimes ObjectStore thought that there was a concurrency conflict even though there really wasn’t, and that would hold up processing until the other transaction was finished. This is inherent in the virtual memory mapping architecture. Our competitors often pointed out this drawback.

He goes on: “I’ve certainly wondered if the architectural choice of page faults and native-format on-disk objects was the right one. I was an enthusiastic booster of the page fault architecture, but it certainly made porting, multi-architecture access, schema evolution and so on much, much harder. [Ken Rugg says that Dave Moon has made huge improvements in schema evolution in the latest releases.] Certainly, the page fault / native object on disk architecture was instrumental in many of the CAD industry wins.” And, “The open-source industry makes me wonder what’s the future of software products like ObjectStore. At Multiverse where I work now, the large majority of libraries and development tools we use are open-source. The only things we buy right now are Microsoft licenses for Windows boxes and 3D modeling tools. The database is mySQL, and it’s going to be a fine solution for a fairly long time, because gaming isn’t hugely database intensive, even though the gaming objects would map naturally to an object database. In many product areas today, the best and/or most successful products are open source.” The whole concept of open source wasn’t around when we started in 1988. (Neither were Unix threads. Nor was Windows. ObjectStore was aimed at the class of computers then known as “workstations”, primarily the Sun-3.)

Sam Haradhvala says: “I have often wondered like Dave and others on this thread whether the use of page faults and native on-disk representation was the correct one. It seems that it was the right choice at that time and conferred some rather unique advantages. Given the current state of technology and the hot issues of today, the limited flexibility inherent in the approach might very well dictate a different set of choices.” But he also says: “I still find the architecture almost as appealing as on day one of the company, and feel very lucky that we had a chance to see it realized in a product.”

Ed Schwalenberg also points out that our architecture, by doing so many things transparently, avoided huge numbers of bugs, much as languages with automatic storage management (e.g. garbage collection) save you from bugs in storage allocation and deallocation.

Ken Rugg notes that we had always intended to do some kind of declarative mechanism to help support clustering and reclustering, since that’s so crucial for delivering ObjectStore’s performance advantages. That still hasn’t been done, and perhaps never will be, as the importance of C++ continues to decline.

Fun story: Ed Schwalenberg reminds me of one of the most vexing cases we ever ran into with the virtual-memory mapping architecture. The program went into a mysterious infinite loop. Guy Hillyer figured out that a single machine instruction had both source and destination operands in ObjectStore-managed persistent memory, in two different “versions” (when we were trying to support a very sophisticated database versioning feature). Fetching from the source, in one version, was making the destination, in a second version, out of reach. Retrying would fault in the destination, putting the source out of reach, and so the single instruction could never make progress.

Performance

The high performance that we designed ObjectStore for really did come out as we expected it to. If your data had good spatial and temporal locality, and especially if concurrent access was relatively rare, it was extremely fast.

However, it turned out that it was not so easy to anticipate the performance that would result from using it in certain ways. Sometimes customers would come to us literally a week before they wanted to deploy their product. They had just tried running it under heavy load, or with multiple users, for the very first time (yes, a week before they planned to deploy!), and all of a sudden ObjectStore was becoming a bottleneck. We had some amazingly competent consultants, who could fly in and fix these problems for the customers very quickly, but not before there was some anger from the customers. Mark Sandeen goes so far as to say that few of our customers were able to build a deployable application without help from our consultants, which limits the scalability of the business model.

Charles Lamb points out: “I think this happens in any database company.” Indeed, there is a whole industry of Oracle experts; we have engaged several at my current company. Ed Schwalenberg says: “ObjectStore made it easy — too easy — for any C++ programmer to write a “database application”, while being ignorant of concepts like lock contention, database hot spots, etc. It was folks like that, who never tested more than one user until a week before launch, who sometimes gave us a bad name.” Everybody out there, take heed: do testing under serious performance load way, way before you’re going to release your product!

Sam Haradhvala, who has had extensive real-world experience with relational databases in the last few years, remembers: “ObjectStore was characterized as being like a Ferrari, which if tuned right by the experts could be made to run like one. Tuning an application, almost as an afterthought, is a common practice even in the relational database world. ObjectStore did make it easy for people to write database applications, without worrying about lock contentions, database hot spots, etc, but so do SQL and PL/SQL. So what was it about ObjectStore that made it a harder problem? If it had been possible in ObjectStore to use object level locks the way relational programs use row level locks, it would probably not have been as much of an issue, but this is one of those areas where the architecture puts you at a disadvantage.”

There were many competitive benchmarks. ComputerVision wrote an early one, aimed at determining OODBMS performance for CAD systems, and we spent a lot of time winning this. The one that took the most effort was the OO7 benchmark, described in my previous posting. We spent a huge amount of time improving our performance on OO7. From the engineering point of view, this was very helpful. The OO7 crew at Wisconsin found many interesting performance problems that we didn’t even know about, many of which were easy to fix. I particularly remember how much benefit we got from setting the TCP_NODELAY flag. Meanwhile, the sales forces of every OODBMS company were using OO7 as a sales tool, each claiming to have gotten the best results! OO7 wasn’t really designed to compare competing products, but rather to act as an X-ray to analyze the systems and illustrate how they worked, and the researchers were rather unhappy to see it used in sales situations. Meanwhile tension developed as the benchmark was revised in order to make it a better X-ray. The problem was that each revision favored some vendors and disfavored others. Sadly, Ken Marshall decided that the OO7 team was intentionally trying to make Object Design look bad (because one of the researchers was on the technical advisory board of one of our competitors), and Object Design pulled out of the benchmark, invoking the clause in our license saying that customers could not distribute benchmark results. As you can imagine, the Wisconsin team was pretty upset about this. Charlie Lamb and I eventually published our own OO7 numbers, with complete instructions for anyone about the exact procedure that we had used, so that they could duplicate it. In my opinion, we did the best overall, though not on every test, but it was never official because of Object Design’s having withdrawn from the study.

Gene Bonte says: “I remember Ken Marshall [the CEO] telling me that in his days at Oracle (which he left to join us), 80-90% of the significant sales depended on benchmarks. For a new market like ours, this was the same or higher. Our salespeople and pre-sales engineers spent a lot of time trying to get customer benchmarks written so that it would favor our VMMA approach. Our competitors did the same for their approaches. Given there were almost no concurrent user engineering applications in existence, this was always a weak to non-existent part of the benchmark and we were always strong in these situations. Thus we won most of the benchmark wars.”

An important thing that we never got around to implementing was putting more data processing on the server side. In formulating the architecture, I was heavily influenced by work at Xerox PARC on database systems in which the server just stored pages of data without interpreting them. This matched ObjectStore’s needs very well; the client, not the server, was where we knew the C++ data layouts and database schema. But sometimes this meant that you had to read a lot of data into the client side in order to search for small amounts of data in the database. We had originally hoped that this would not be a problem on the grounds that local area networks are awfully fast. That was a good answer for many cases, but not all. I have only recently (in my present job) worked with sophisticated Oracle experts who have shown me more about how to improve performance by processing (in PL/SQL, in their case) on the server side; I didn’t appreciate that well enough back when we designed ObjectStore.

Dave Stryker points out: “One thing that has made it harder for OODBMS’s is ever-growing memory and CPU power in PCs. ObjectStore database sizes were typically just a few gigabytes or less. Our original Sun-3 workstations had 8 Mbytes of RAM, I believe, and if you’re going to search a couple of gigabytes on an 8 Mbyte machine, you’re going to need a database system with indexes. In contrast, today even my laptop has 2 Gbytes of memory, and lots of workstations have 8 Gbytes or more. It’s completely practical and common to slurp up a couple of gigs of information into memory and search it in memory on a machine like that. So the ‘object database cache’ of the past gets done now, most of the time, using in-memory data structures. Even when a database is the right answer, the extra overhead of translating from an on-disk representation to an object representation happens 100 times faster on today’s CPUs than on the 50Mhz CPUs of 1990. So the performance advantages of not translating are much smaller.”

Looking into the future, Dave Moon says: “The illusion of random access memory is becoming increasingly unconvincing on modern hardware. Although dereferencing a pointer takes only one instruction, when the target of the pointer is not cached in the CPU that instruction can take as long to execute as 1000 ordinary instructions executed at peak speed. It’s not clear that other approaches to database navigation are able to execute at peak speed, i.e. with no cache misses and no delays due to resource conflicts within the CPU, but if they were able to execute that fast, they would be able to expend hundreds of instructions to do what pointer dereferencing does and still come out equally fast, in the random access case where the target is not cached. Thus, the advantage of ObjectStore’s architecture is being eroded by hardware evolution. But at the same time, the advantage of C++ and other conventional programming languages is being eroded in the same way. It is not unreasonable to predict that we will see widespread abandonment of the illusion of random access memory in the next two decades. The IBM Cell processor used in video games is the first crack in the dam.”

Standards

Many customers wanted an industry standard, to avoid vendor lock-in. There was never a real standard for OODBMS’s. There was an attempted standardization effort called ODMG. Unfortunately, it was run by the vendors, not by the customers. So every vendor tried to adjust the standard to benefit its own technical approach and make life hard for the other companies’ technical approaches. It was really not done in good faith, and we were just as bad as anyone else, perhaps even worse. Unfortunately, there wasn’t any other OODBMS that worked the way ours did, so our customers really did have a vendor lock-in problem, which we never succeeded in addressing.

Ken Rugg points out that there wasn’t even a common understanding of what an object database even is! “If you looked under the covers, the actual persistence mechanisms behind Versant and ObjectStore, let alone something like Cache, are very different. Also, these differences are much more visible to the user than differences in the engines of RDBMS products.”

Complexity

Several key customers wanted support for versioning, e.g. so a CAD system could easily keep track of earlier versions of a design. But our highly sophisticated versioning system involved such complex semantics and such a complicated implementation that it made the whole ObjectStore client side mind-bogglingly complex. I remember Dave Andre and I reporting to Dave Stryker that it was almost working, but it made the product unmaintainable! We eventually had to rip it out. It was a huge waste of engineering resources and a good lesson in the virtues of simplicity, one of the hardest and most important lessons to learn in all of software engineering.

Java, PSE Pro for Java, and Smalltalk

(Thanks to Sam Haradhvala for help with this section.)

When Java came to prominence, we had to figure out how to turn ObjectStore into a Java OODBMS. Again, we went for transparency: persistent Java objects. You program with them just the way you regularly program in Java, except that you put in transaction boundaries and so on. Objects are persistent if they are reachable from any object designated as a persistent root object.

To do this, we used a novel trick: we took the Java class files, and added new JVM instructions before a read or write, to check whether the object being accessed had been read in. If not, we’d read it in on demand. As Sam Haradhvala points out, this can be thought of as a two-level faulting architecture. It used object-level faulting to fault in the contents of individual Java objects, while using VMMA to fault in the underlying C++ object representation, implement scalable collections, etc. This architecture could have provided the underpinnings for object-granularity locking and increased flexibility in other areas.

The PSE Pro for Java product had its own storage engine which used just object-level faulting with a specialized lightweight, small footprint, storage engine. It did atomicity and durability: committed changes happened either all-or-nothing, even in the face of system crashes. However, it did not support concurrent access between separate Java processes. It was targeted at an entirely different market segment than the ObjectStore Java product, but had the same API, so that you could, e.g., use it as scaffolding.

There was even an ObjectStore Smalltalk product which used the VMMA architecture, with special hooks built into the Smalltalk ParcPlace VM, so that it could co-exist with the Smalltalk GC. This was built by a team of very smart people on the West Coast. Unfortunately, they didn’t communicate tightly with the key developers on the East Coast, and so their work didn’t fit into the architecture properly. The code became too hard to maintain, and the demand for Smalltalk turned out to be a fad in those particular years, so we discarded this.

Object-Relational Mapping

Jack Orenstein was very interested in object-relational mapping, which he describes as “my quixotic mission at Object Design”. “The idea was to bring relational database features to ObjectStore: collections, queries over them, and mappings to and from the relational model. A relational interface to ObjectStore would have expanded the pool of ObjectStore users, and opened up the product to off-the-shelf relational tools, e.g. Crystal Reports. The opposite direction (ObjectStore programming model on top of a relational database) would have opened up the ObjectStore API to other kinds of databases. These projects were of interest to a small number of customers (e.g. USWest, Credit Suisse), but for various reasons, some due to internal company politics, they were never internally funded and supported to the point where we came out with a product.” That was probably a mistake, perhaps a big one.

Objects can be stored in RDBMS’s using object-relational mapping tools. Relational databases have become so successful in exploiting hardware, and are such a ubiquitous component of the computation infrastructure, that a vast number of applications map their objects to relational tables. Hibernate is a very popular system for doing this in Java. You use Java annotations and XML configuration files to specify the mapping, which can be pretty sophisticated. Hibernate is clever at generating efficient SQL. It’s widely used and well-documented. A big advantage of mapping tools is that they let you share data with other, relation-oriented applications. On the other hand, this approach is not appropriate for the kind of CAD-like applications at which ObjectStore is aimed. Sun’s Entity Enterprise Java Beans (particularly the EJB 3.0 standard) is another mapping tool. See here for a paper by Mick Jordan about other Java approaches to orthogonal persistence.

Benson Margulies says: “The idea of persistent storage of an object data model, is, in fact, ever-more-common … in the form of object-relational middleware. Relational databases have become so successful in exploiting hardware, and are such a ubiquitous feature of the computation infrastructure, that a vast number of applications map their objects to relational tables and go home for a nice lunch. The trio of ObjectStore, Object Design, and the OODBMS concept can claim much credit for this. We wouldn’t have Hibernate, not to mention 15 incomprehensible Java standard initialisms, if not for what we did. And we had to do it. Ironically, if we had set out to build the object-relational product, I think that we would have failed. It couldn’t have been fast enough. We identified and exploited a gap, and we had a relatively successful run in that gap.”

Query Optimization

This entire section is by Jack Orenstein, regarding the “Third Generation Database System Manifesto” claim that query optimizers can always do better than a programmer can do by hand.

The relational side of this debate relies on an assumption that applications navigate to data of interest, and then, after the query, process that data. (Or, in a few cases, process the data inside the query, e.g. simple arithmetic, simple forms of aggregation, simple updates.) But in many applications, that separation of navigation and processing is impossible or not feasible.

I’ve implemented polygon overlay, which I think is typical of such applications. In polygon overlay, you need to traverse linked lists of vertices and edges making up polygons. You don’t navigate to an edge and then retrieve some of its data for later processing (after the database query). Instead, the navigation and processing of the data are tightly intertwined. Yes, with enough work you might be able to separate the implementation into navigation and processing parts, express the navigation part in a query language, and then have the query optimizer generate an execution plan better than the one implicit in your original code. An approach like this would obviously be completely alien to developers.

But if you really did write your application this way, separating navigation from processing, then the optimizer could, in principle, come up with an execution plan that reduces the number of disk reads compared to your original implementation.

But only if data is clustered in a predictable way. A relational optimizer uses a cost model to estimate the number of page accesses required to implement a query using a candidate execution plan. That cost model makes assumptions about how data is organized on disk, and uses some observations of actual data (e.g. key frequency distributions). If ObjectStore data were clustered as in a relational database, then the relational argument might have some merit. The optimizer would take estimates of page reads into account, something the low-level, data structure navigating C++ code is obviously not doing. But if the ObjectStore data is clustered intelligently, then that argument falls apart. In other words, a programmer can easily beat an optimizer if the programmer is also responsible for clustering the data. (The tools for clustering data in relational systems are extremely limited.)

Multiple Applications

ObjectStore was organized around providing persistence for a particular application. However, Ken Rugg points out that even in non-traditional market areas, some customers needed a DBMS that had to be shared across multiple applications with different access patterns. In such cases, it was hard to optimize for one of them without hurting the others, since much of the performance depends on the way the data is clustered, and it can’t be clustered two different ways at the same time in the same database.

Ken says: “One area that we are working on is how to synchronize data in ObjectStore with relational data so you can ‘have your cake and eat it too’. I think having multiple special purpose stores that are optimized for each consumer and synchronized and consistent with each other (assuming you can manage them all in a reasonable way) is better than a single ‘least common denominator’ store that is shared by all the applications in an enterprise. Of course doing this synchronization isn’t an easy problem.”

Business Problems at Object Design

From time to time, I, and others, would lobby management to provide post-sales technical support, to help the customers learn how to best use ObjectStore. The pre-sales engineers tried to do this when they could, but they were usually too busy doing their pre-sales job. Periodically, one management regime or another would agree, and set up post-sales technical support. Life was good. But not for long, because management would see how valuable the customers thought post-sales technical support was, and they’d get the bright idea that we should charge for it and make it a profit center, making these guys into more consultants. (We always had consultants who could be hired.) Well, that was a big mistake. Lots of customers can’t pay for consultants. In some corporate cultures, for you to hire a consultant from the vendor tacitly implies that you are incompetent. What Object Design needed was successful customers to use as reference accounts when we tried to sell to new customers. Post-sales technical support was a long-term investment. But management would often lose sight of this and go for the short-term profit.

Mark Sandeen says: “The fact that we needed this level of technical support resulted in an interesting situation. Every now and then we’d hire the best and the brightest engineers from our customers, leaving our customers without the talent to architect their systems appropriately.” He and I can remember at least five of these, including several of our most awesome.

Our sales force faced obstacles. One of our sales reps, Ben Bassi, told me that the moment he walked in the door and said that he was there to sell a “database”, many customers would say “We already have Oracle: go away”, without giving us a chance to explain what we were about. (But Mark Sandeen says: “I never had that happen to me personally. And I trained all the staff that worked for me to never go anywhere near a prospect that was using Oracle (or RDBMS’s in general). In the early days we followed leads from folks who had purchased C++ compilers and tools, and after we had some wins in GIS, network management, etc., we would target those folks directly. We’d sell high-performance, concurrent persistence solutions to application developers.”)

We even thought of not calling it a database system at all: maybe it was an “application data management” product, or something. Unfortunately, our marketing department never really solved this problem. Our early salespeople were great. Later management regimes felt that you didn’t really need salespeople who understood the product; they were too hard to find and cost too much. Wrong. Some of the best salespeople left when that policy started to take over.

If you took any of our CEO’s and locked him in a room with the product, he’d not have the faintest idea how to use it. It was a technical product aimed at programmers. Our first CEO, Ken Marshall, was very good at delegating, and his own lack of technical background wasn’t much of a problem. But after he left, the next CEO considered himself much more technically competent than he really was, and he made a lot of bad decisions. He hadn’t really wanted to be CEO anyway, and was only interested in wild ideas that would make the company grow super-fast; those ideas never worked. The third CEO, acquired from a merger, was a good guy but, in my opinion, totally unfamiliar with how to run a software product company, and he pretty much ignored the advice of the technical people (particularly Ken Rugg, who was CTO and VP Engineering) even though he originally solicited it. That was when I finally threw in the towel. Fortunately, Progress Software bought the company, and the original ObjectStore part was put under a new general manager who was apparently quite good. So life is good again over there, and they’ve actually hired back a lot of very talented people who had left the company earlier!

Here’s a real life example of why it’s so hard to escape Oracle and embrace ObjectStore. I currently work at ITA Software, Inc., where we are building a new airline reservation system. We’re using Oracle RAC for the database system. Our rules say that all persistent mutable information must be stored in Oracle. Why? Because we are using Oracle Dataguard to copy data to our disaster recovery site(s), and to copy all online data to an archive, and our operations department wants data for disaster recovery handled uniformly across the system. We might use ObjectStore as a cache, but the place where we’d probably benefit most from a cache is a big module that’s written in Common Lisp, and there isn’t a good interface from Common Lisp to ObjectStore. It’s often for reasons like this that it’s hard for ObjectStore to get a foothold. However, there’s another product being developed at ITA for which ObjectStore, using its Java interface, looks like it might be a great fit.

Ken Rugg notes that the company took a big hit when the bubble burst in 2000. Object Design primarily sold to high-tech companies, since the users of the product were very technical and leading-edge. In particular, one of the major markets for ObjectStore was telecommunications companies, who were particularly hard-hit in that period. This contributed to a decline in revenues and eventual acquisition.

Caveats and Thanks

Everything here is my own personal opinion, and should not be taken as a statement by Object Design or Progress Software!

Much of this is in the past tense because I’ve been gone so long, and because things have changed, but ObjectStore is still alive.

Thanks to all the contributors named above, particularly Benson Margulies, whose highly cogent criticism compelled me to substantially reorganize the whole essay. I have made small edits to the contributions. Of course, I take responsibility for all errors.

Author: "dlweinreb" Tags: "ObjectStore"
Comments Send by mail Print  Save  Delicious 
Date: Monday, 31 Dec 2007 21:30

Object-oriented database management systems (OODBMS’s) have been harshly criticized, especially by Prof. Michael Stonebraker, who has called them a “failure”. As a co-founder of what was the leading OODBMS company, Object Design, I take issue with this judgment. As I see it, we did what we set out to do and had a lot of success. As you’ll see, though, you have to distinguish between the hype, and what the product was really about.

There are many OODBMS products, not all alike. Here I focus almost entirely on Object Design’s product, ObjectStore, since that’s the one I know about. Some of what I say applies to other OODBMS products and companies, and some does not.

This essay includes very substantial contributions by my colleagues, which I have tried to organize into a cogent whole. Contributors, in alphabetical order:

Gene Bonte: Co-founder, CFO
Sam Haradhvala: Co-founder
Charles Lamb: Co-founder
Benson Margulies: Senior engineer and head of porting (after Ed)
Dave Moon: Senior engineer
Jack Orenstein: Co-founder
Mark Sandeen: Senior salesperson
Ed Schwalenberg: Senior engineer and head of porting
Dave Stryker: Co-founder, VP of Engineering

What Is An OODBMS?

In the late 1980’s, the implementors of CAD (computer-aided design, both electrical and mechanical) and CASE (computer-aided software engineering) wanted database management systems, but found that relational database systems (RDBMS’s) did not serve their needs. RDBMS’s had been developed for business data processing. They were sold by Oracle, Informix, IBM, and Sybase (and later Microsoft). They had become a big business with a big market. The CAD and CASE practitioners published papers and had conferences explaining why they needed a whole new approach to data management, based on object-oriented technology.

Several startup companies were formed around 1988 to meet these needs. The first object-oriented languages that the CAD and CASE community could use had just emerged and started to gain popularity, particularly C++. That made the time ripe to build and sell commercial OODBMS’s.

At Symbolics, I had led a project that built an OODBMS for Lisp, which we intended to use in many applications such as email, program development tools, and so on. Statice 1.0 was released in 1988. However, it only ran on Symbolics hardware. The team wanted to port it to run on conventional hardware (still in Lisp), but Symbolics wasn’t interested. Meanwhile Dave Stryker, who had been VP of Engineering at Symbolics, had left, and joined entrepreneur Tom Atwood, who was working on starting a new OODBMS company. Charlie Lamb, Sam Haradhvala, and I resigned, and with Jack Orenstein of CCA, and Gene Bonte, joined them to found Object Design.

Object Design built an OODBMS called ObjectStore, which was released in 1989. ObjectStore focused on persistence of programming language objects. It would be easy to learn. It would not make you learn a new language or reorganize all your data. You could write your program in the way you were familiar with: as an ordinary C++ program. All you had to do was change some “new” statements to add a parameter saying that you wanted the object to be persistent (and what database or cluster to put it in), and add transaction boundaries, and voila, your program had persistence. Rather than being oriented around SQL-style queries, your program could navigate from object to object just the way any C++ program does: by following pointers. We also added collections (sets, lists, etc.) and a simple query language for them, along with indexes and a simple query optimizer. ObjectStore made applications fast because they could do direct navigation, and operate on the data without having to go through things like network connections, API’s, JDBC, and so on. Once a page had been used in a transaction for the first time, navigation took only one instruction.
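
To make the “change some new statements” idea concrete, here is a minimal C++ sketch of that programming model. The pdb::database and pdb::txn classes and the operator new overload are hypothetical stand-ins, not the actual ObjectStore API, and their stubbed-out bodies just heap-allocate, so nothing is really persisted; the point is only how little the application code has to change.

    #include <cstdlib>
    #include <string>

    // Hypothetical persistence layer, stubbed out so the sketch compiles; the
    // real work (mapping database pages, recovery, locking) is what ObjectStore
    // itself provided.
    namespace pdb {
        struct database {
            static database *open(const char *) { return new database(); }
        };
        struct txn {
            explicit txn(database *) {}   // begin a transaction
            void commit() {}              // make the changes durable
        };
    }

    // The extra placement argument says which database the object lives in.
    // (Stub: just heap-allocates; a real implementation would allocate the
    // object on a database page.)
    void *operator new(std::size_t size, pdb::database *) { return std::malloc(size); }

    struct Part {
        std::string name;
        Part *parent;   // a plain C++ pointer; navigation is a pointer chase
        Part(const char *n, Part *p) : name(n), parent(p) {}
    };

    int main() {
        pdb::database *db = pdb::database::open("/designs/engine.db");
        pdb::txn t(db);                                    // transaction boundary
        Part *root   = new (db) Part("engine", nullptr);   // "persistent" allocation
        Part *piston = new (db) Part("piston", root);
        t.commit();
        return piston->parent == root ? 0 : 1;
    }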

Gene Bonte says: “Our ‘persistent language’ model was good because we focused on C++ developers. This was a new phenomenon and a technical guru was often a key decision maker in the initial buy decision. We bought marketing lists from all the C++ publications and targeted developers in our seminar programs. We were very much on the bleeding edge. I did the initial Object Design marketing plan and we sized our target markets in CAD, GIS, etc. in the $100’s of millions. Nobody had products for this market.”

We disputed the central dogma of relational databases: that all data should be in tables. The underlying assumption behind the relational model is that many applications come and go, but data is forever, and you have to organize the data as if you don’t have any idea what application might show up someday. This is sometimes called the principle of “data independence”. ObjectStore was much more organized around providing persistence for a particular application, so little of the dogma of relations had any relevance. The engineering customers did not want tables: this was clear from their published papers, and even The Economist endorsed us in this regard, publishing a story containing the metaphor of storing the design of an airplane as an alphabetical list of parts.

ObjectStore’s architecture is described in our CACM paper: Charles Lamb, Gordon Landis, Jack Orenstein, and Daniel Weinreb, “The ObjectStore Database System,” Communications of the ACM, vol. 34, no. 10, Oct. 1991, pp. 50-63. For more background on object databases, see the Wikipedia article. It discusses all object databases, some of which were quite different from ObjectStore. The article is generally excellent, including the section on “Advantages and Disadvantages”. (The one thing I disagree with is the business about their lacking “a formal mathematical foundation”. Actual RDBMS products are very, very far from the mathematical foundation of relational theory. None of our customers ever complained about a lack of a mathematical foundation. It’s just not an issue.)

For a technical look at the kind of scenarios that ObjectStore was designed to handle, see the OO7 benchmark from the University of Wisconsin. The benchmark is intended to model typical CAD/CAM/CASE applications and contains several hierarchical structures and 1-1, 1-many and many-many relationships between objects. The benchmark can be configured in a variety of ways and comes with a set of standard configurations. OO7 defines a number of different traversal, query and update operations. (Carey, M., DeWitt, D. and Naughton, J. The OO7 Benchmark. Proceedings of the ACM SIGMOD Int. Conf. on Management of Data, pp. 12-21, Washington, DC, 1993. Also Carey, M., DeWitt, D., Kant, C. and Naughton, J. A Status Report on the OO7 Benchmarking Effort. Proceedings of ACM OOPSLA ’94, pp. 414-426, Portland, OR, October 1994.)

Gene Bonte says: “We went after engineering applications and we found an interested audience. These customers were doing C++ for the first time and in general did not know how to do real OO development. Further, most had no experience with databases as they never worked for their applications. Early on, I remember we found that many of our early customers had tried Oracle or some other RDBMS at the insistence of management and it did not work. This gave me confidence that this was our real market where we had a competitive advantage.”

The Hype

Tom Atwood, the original founder and chairman of Object Design, as well as the rest of top management, often made grandiose claims for OODBMS’s. They said that OODBMS’s were the “next generation” after RDBMS’s, and would take over their whole market. Jack Orenstein remembers that Tom Atwood had a slide called “three waves”: the first wave was ISAM, the second wave was the hierarchical, network, and relational data models, and the third wave was object-oriented. Mark Sandeen also remembers our claim that “there were two or three orders of magnitude more unstructured data than structured (rows and columns) data in the world, and ObjectStore would be the preferred way that data was stored.” Object Design’s second management team said in the mid 1990’s that ObjectStore was “the database of the Internet” or “the database of the World Wide Web”, hopping aboard the new bandwagon. You can see that latter hype in the initial public offering document. This kind of hype was used to attract investors and salespeople to Object Design.

The founding engineers never believed this. Our target market was not the relational database business-data processing users, but rather users who didn’t have any database solution that would work for them. We did our best to ignore the hype, and get on with the software development and customer support.

Prof. Stonebraker’s criticisms

The harshest criticisms of OODBMS’s have come from Professor Michael Stonebraker, one of the most renowned figures in the database world. He was an early proponent of RDBMS’s and the inventor of a prominent RDBMS called Ingres, developed at U.C. Berkeley. He formed a company to commercialize it in 1982 (it has fallen from prominence and was open-sourced in 2004). He later returned to academia and started the Postgres project, which supported extensible types and was released as open source. He formed a company in the early 1990’s, Illustra, to commercialize it. More recently he has become a professor at M.I.T., and has formed several more companies to produce novel database technology.

Criticisms from so important a leader, in both the academic and commercial spheres, were widely reported:

“OO systems have not focused on bread-and-butter traditional business-data processing applications where high performance, reliability, and scalability are crucial. This is a large market where relational systems excel and have enjoyed wide adoption.”

“Companies are justifiably loath to scrap such systems for a different technology, unless it offers a compelling business advantage, which has rarely been demonstrated by object-oriented systems. As such, relational systems and their object-relational descendants continue to be the market leaders.”

“A much bigger problem is that the vendors behind ODMG represent zero billion [dollars] in revenue while the vendors behind SQL . . . represent several billion in revenue. Hence, it is not a standard with critical mass in the marketplace.”

“Relational vendors realized that objects are important and added them, producing object-relational systems. However, the failure of OODBMS vendors to realize the importance of SQL and the needs of business-data processing has hurt them immensely.”

“ODBMSs occupy a small niche market that has no broad appeal. The technology is in semi-rigor mortis, and ORDBMS’s [object-relational DBMS's] will corner the market within five years.”

As you can see, these comments are directed towards the hype rather than the reality. They all assume that there is one “database market”, namely the “traditional business-data processing applications”. So while many of them are essentially correct, taken as they were meant, from my point of view they are entirely beside the point.

As he says, the fate of ObjectStore had nothing to do with the object-oriented features of the object-relational hybrids, in which “object-oriented” meant something almost completely different, and which were aimed at the traditional business data processing market. Those systems were never appropriate for the applications that ObjectStore was designed for. Similarly, the object-oriented features added to the relational database systems (each quite different from the next and violating the whole relational “mathematical foundation”) had nothing to do with what ObjectStore was about.

Mark Sandeen says: “We never tried selling to the traditional data processing organizations. Our sales training specifically forbade our sales teams from pursuing traditional MIS applications. You can’t be unsuccessful at something you haven’t tried to do.”

Prof. Stonebraker has also asserted that OODBMS’s do not support queries. That’s a pretty strong statement, and as written it’s incorrect. I think that what he meant is that they don’t support anything like SQL, with fancy query optimizers. Now, the OQL (Object Query Language) in the ODMG standard is every bit as much of a query language as SQL, and some of the OODBMS’s really implemented it well (O2 did, if I remember correctly). ObjectStore did not have a sophisticated query language and optimizer, but we certainly did have queries, indexes, and a simple query optimizer: this is what our target market needed.

Prof. Stonebraker was one of the authors of the “Third Generation Database System Manifesto” (1990), which spends a lot of time attacking OODBs. (As you can see from the URL itself, this was really the “object-relational manifesto”.)

First, it says that the navigational (pointer-following) orientation of OODB’s is a “step backward” to CODASYL (network) databases, which have been discredited. They say: “First, when the programmer navigates to desired data in this fashion, he is replacing the function of the query optimizer by hand-coded lower level calls. It has been clearly demonstrated by history that a well-written, well-tuned, optimizer can almost always do better than a programmer can do by hand. Hence, the programmer will produce a program which has inferior performance.” This is incorrect in the context of the way ObjectStore is actually used. Navigation takes one single machine instruction: how is your query optimizer going to beat that? C++ programmers know how to write fast C++ code and ObjectStore is basically persistent C++. (There’s a Java version now, but I’m referring to the original concept.) (More about this in the later essay about ObjectStore technology.)
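
For readers who want to see what “navigation” means in code, here is an ordinary C++ fragment of the sort ObjectStore applications were built from; the Pin and Net types are made up for illustration, not taken from any real customer.

    #include <cstdio>
    #include <vector>

    // Pin and Net are made-up stand-ins for the kind of interconnected design
    // objects an ECAD application keeps. Once the page holding them is mapped
    // into memory, finding the net a pin belongs to is a single pointer
    // dereference; the relational equivalent is a join chosen by an optimizer.
    struct Net;

    struct Pin {
        int id;
        Net *net;   // direct reference to the owning net
    };

    struct Net {
        int id;
        std::vector<Pin *> pins;
    };

    int main() {
        Net n{42, {}};
        Pin p{7, &n};
        n.pins.push_back(&p);

        // Navigation: no query plan, no index lookup, just following a pointer.
        std::printf("pin %d is on net %d\n", p.id, p.net->id);
        return 0;
    }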

Next, the Manifesto says that schema evolution is a “killer”, on the grounds that when you change indexing or clustering, you have to modify the program. This is also incorrect. When doing queries, ObjectStore did have an optimizer that dynamically adjusted to the availability of indexes. Clustering was entirely transparent to the application. Finally, most such operations were done by navigation anyway.

Third, they say that although many programmers want to do navigation, they are wrong and “simply require education”, comparing them to “programmers who resisted the move from assembly language to higher level programming languages”. Their point is that optimizers can do better than navigation, just as compilers can do better than hand-written assembly language. I don’t know if they thought this kind of condescending attitude was likely to win them converts!

Jack Orenstein adds: “I think that the additions made to SQL to support OLAP applications (basically expanding what can be done inside “groups”) are an admission that this original argument about re-education was wrong. If early-1990s SQL was good enough for re-educated application developers, then the later extensions would not have been necessary.”

Later on, they make the usual argument that OODBMS clustering is not an advantage since RDBMS’s can theoretically do all kinds of clustering within the relational data model. While that’s theoretically true, Oracle’s ability to actually do this is quite limited, even after many years of Oracle development. I have often heard RDBMS enthusiasts talk about the hypothetical abilities of ideal relational database systems, without honestly admitting how far those ideals are from what you can really buy. ObjectStore provided tremendous control over clustering, which was crucial to its ability to provide high performance.

Stonebraker’s more recent criticism is more modulated. In his database column of Sept 2007, he says: “OODBs failed for other reasons than the inclusion of OO technology in RDBMS. First, OODB’s were designed and built for the engineering database market. The technology’s main focus was on persistence of programming language objects and not on business data processing features such as a strong transaction system and SQL support. OODB vendors were unsuccessful in selling to this market for a variety of reasons — reasons that are too lengthy to go into here. However, the main reason for their lack of market success was their inability to construct a value proposition with sufficient return on investment for the engineering customer. The demise of OODB has little to do with the inclusion of OO features in RDBMS, an effort that my Postgres system was in the forefront of.”

As you can see, he’s now changed his story substantially. He’s a lot closer now, except for the part starting “lack of market success”, as I’ll show below.

In an interview in ACM Queue Vol 5, no. 4, May/June 2007, Prof. Stonebraker basically says that the CAD guys didn’t have enough pain to consider switching, and they had “mountains of proprietary code” that was already fast enough. “They failed because the primary market they were going after didn’t want them.”

Again, this is a lot closer to what really happened, but ObjectStore only “failed” at this very narrow goal.

Technically, he’s incorrect about transactions: ObjectStore does totally bona fide ACID transactions. He’s also incorrect about high performance: ObjectStore performed far better than RDBMS’s or ORDBMS’s in ObjectStore’s target markets. And he’s also incorrect about reliability: plenty of products were based on ObjectStore and were quite reliable. And our users did not especially want SQL. As for “scalability”, that can mean a lot of things, but with no specific claim and no data whatsoever, it’s not a very convincing criticism. ObjectStore can handle databases that are quite large.

But what about his claim that we were “unsuccessful in selling to this market”? Did we really have a “lack of market success”? Did we “fail”?

ObjectStore and the CAD Market

At the very beginning, we had hoped that the major electrical and mechanical CAD companies would build or re-engineer their major products on ObjectStore. This mostly did not happen. Why?

We talked to Mentor Graphics, one of the leading ECAD vendors, in 1988. They were very excited and said that if we could make a product such as we were describing, they’d buy it right away. We had dinner with a group of people from Mentor at the OOPSLA ‘88 conference, who had just heard Prof. Stonebraker’s keynote address in which he made his usual points, but they said that those points did not apply to them because their needs were different. Their constraints were too complex and application-dependent for relational database systems (Sept 28, 1988). We had a big meeting (Feb 3, 1989) in which they explained that their main interest in OODB’s was actually for sharing between their tools, configuration management and version control, and concurrency control.

Sadly, it turned out that the people we were talking to were Mentor Graphics’s advanced product group, which was distinct from the real product people. The real product people already had a file-based approach in place, and although they could see the benefits of OODBMS’s, the problems we addressed were not their highest priority. Using ObjectStore would have required big changes to mature software that was already deployed at many customers. So Mentor’s main product never got re-hosted on ObjectStore.

Ken Rugg also points out that “they like to code and think they can do it better themselves. I believe that currently there is still a large percentage of that market that uses home-grown file-based storage for managing their model data. I think the fact that we were ‘closer’ to the application code than a relational system actually made them more likely to try to do it themselves.”

It’s almost certainly this failure and a few others that Prof. Stonebraker was referring to. My guess is that he stopped paying attention to Object Design after this, and that’s why he bases his recent comments only on these particular customers.

Later, we did sell ObjectStore to many CAD companies, for other products than their existing mainstream products. We had three or four different sales to Cadence, who built several tools based on ObjectStore. We worked closely with ViewLogic, who were building a new CAD system and architected it using ObjectStore; it all worked well technically, but unfortunately they never really hit it big. We even sold to Mentor Graphics, eventually, for other applications. We sold to many CASE companies, although CASE didn’t turn out to be such a big commercial success.

Mark Sandeen points out: “It turns out that the CAD market is a pretty small market when compared to the total universe of folks writing C++ applications. Even if we had won Mentor [for their main application], they would have been a relatively small portion of the market.” In other words, what Prof. Stonebraker is talking about didn’t make very much difference.

How Was ObjectStore Used?

ObjectStore is, in fact, good for CAD, and there were many CAD applications, just not the particular ones we had initially contemplated. There were general-purpose and special-purpose CAD systems. I remember one whose job was to help you design (configure) complicated phone switches, and it worked very well.

ObjectStore was also very strong for geographical information systems, network management, configuration management, and many financial applications.

ObjectStore makes a great application-specific persistent cache in front of relational databases. Modern transaction processing systems often access data at a very rapid rate that would overwhelm the relational database system, and so need caches to relieve the database load. ObjectStore does this extremely well, and since it’s persistent, you still have a nice warm cache even when you are recovering from a system failure. Perhaps the highest-profile customer is Amazon.com, which uses ObjectStore as a cache for its inventory data. Yes, when Amazon says “we have 3 copies of this book”, that came out of ObjectStore.
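
Here is a rough sketch of that read-through cache pattern. It is not ObjectStore code: the std::unordered_map stands in for the persistent object store, and the loader function is a placeholder for whatever SQL access the backing relational system provides.

    #include <functional>
    #include <string>
    #include <unordered_map>
    #include <utility>

    // Read-through cache: look in the cache first, go to the relational system
    // only on a miss. A persistent cache survives a restart, so it is already
    // warm during recovery.
    struct Inventory {
        std::string sku;
        int copies_on_hand;
    };

    class InventoryCache {
    public:
        explicit InventoryCache(std::function<Inventory(const std::string &)> loader)
            : load_from_rdbms_(std::move(loader)) {}

        const Inventory &lookup(const std::string &sku) {
            auto it = cache_.find(sku);
            if (it == cache_.end())   // miss: hit the backing database once
                it = cache_.emplace(sku, load_from_rdbms_(sku)).first;
            return it->second;        // hit: served straight from the cache
        }

    private:
        std::unordered_map<std::string, Inventory> cache_;
        std::function<Inventory(const std::string &)> load_from_rdbms_;
    };

    int main() {
        InventoryCache cache([](const std::string &sku) {
            return Inventory{sku, 3};   // pretend this came back from a SQL query
        });
        return cache.lookup("some-sku").copies_on_hand == 3 ? 0 : 1;
    }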

A May, 2006 white paper from Monash Information Services called “Memory-Centric Data Management” talks about several systems, including ObjectStore, explaining what they excel at, looked at in modern terms. (Curt Monash is very sharp and his papers are fun to read.) “Progress’s ObjectStore, for example, provides complex query performance that wouldn’t be realistic or affordable from relational systems, no matter what the platform configuration. Most notably, ObjectStore answers Amazon’s million-plus queries per minute; it also is used in other traditionally demanding transaction environments such as airplane scheduling and hotel reservations. ObjectStore’s big difference vs. relational systems is that it directly manages and serves up complex objects. A single ObjectStore query can be the equivalent of dozens of relational joins. Data is accessed via direct pointers, the ultimate in random access — and exactly the data access method RAM is optimized to handle. On disk, this approach can be a performance nightmare. But in RAM it’s blazingly fast.”

ObjectStore is a great “kit” for building special-purpose highly-optimized database systems. For example, the British Ordnance Survey makes a cool multi-layer digital map product based on ObjectStore, with data structures highly optimized for representation of 2-D cartographic data. They could have built it in Oracle but it would have taken literally orders of magnitude more disk space and been substantially slower. With ObjectStore, you can build indexes that are, for example, K-D trees, suitable for representing distances and other 2-D concepts. (See papers on the EXODUS project at U. of Wisconsin, which used technology very much like ObjectStore for being such a “kit”.)
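
To illustrate what building such a “kit”-based index means, here is a toy in-memory 2-D k-d tree in C++. It is my own illustrative example, not code from the Ordnance Survey product; a real ObjectStore version would allocate the nodes persistently and cluster them carefully.

    #include <cstdio>
    #include <memory>

    // A tiny 2-D k-d tree: points are split alternately on x and y.
    struct Point { double x, y; };

    struct KDNode {
        Point p;
        std::unique_ptr<KDNode> left, right;
        explicit KDNode(Point pt) : p(pt) {}
    };

    // Insert a point, alternating the splitting axis with depth.
    void insert(std::unique_ptr<KDNode> &node, Point p, int depth = 0) {
        if (!node) { node = std::make_unique<KDNode>(p); return; }
        bool go_left = (depth % 2 == 0) ? (p.x < node->p.x) : (p.y < node->p.y);
        insert(go_left ? node->left : node->right, p, depth + 1);
    }

    // Count points inside an axis-aligned rectangle, pruning subtrees that the
    // splitting plane rules out; the pruning is what makes the index useful.
    int count_in_rect(const std::unique_ptr<KDNode> &node,
                      Point lo, Point hi, int depth = 0) {
        if (!node) return 0;
        int n = (node->p.x >= lo.x && node->p.x <= hi.x &&
                 node->p.y >= lo.y && node->p.y <= hi.y) ? 1 : 0;
        double split = (depth % 2 == 0) ? node->p.x : node->p.y;
        double lo_c  = (depth % 2 == 0) ? lo.x : lo.y;
        double hi_c  = (depth % 2 == 0) ? hi.x : hi.y;
        if (lo_c <= split) n += count_in_rect(node->left,  lo, hi, depth + 1);
        if (hi_c >= split) n += count_in_rect(node->right, lo, hi, depth + 1);
        return n;
    }

    int main() {
        std::unique_ptr<KDNode> root;
        insert(root, {1.0, 2.0});
        insert(root, {3.5, 0.5});
        insert(root, {2.0, 4.0});
        std::printf("points inside [0,0]-[3,3]: %d\n",
                    count_in_rect(root, {0, 0}, {3, 3}));
        return 0;
    }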

Ken Rugg adds: “Most of the capabilities of an enterprise DBMS are there and have been for a long time. In fact, I am often surprised to find some feature that is missing in the Progress OpenEdge database, but has been in ObjectStore since long ago.”

Did Object Design succeed as a business?

We introduced ObjectStore in 1990, on time. Object Design’s revenues, earnings, and growth were excellent and matched or exceeded our original business plan. In 1994, we were Number 1 on the Inc. 500 (The Fastest Growing Private Companies In America), at which time we had 200 employees and $24.6M in revenue. In 1995, Oracle made a serious attempt to buy the company. In 1996, we had a highly successful IPO (initial public offering) of stock. Our venture capital investors, not to mention the co-founders, were quite happy. (Of the 23 “Inc. Number 1” companies between 1982 and 2005, only three went public.) Revenues in 1998 were $62M, in 1999 $61M, in 2000 $70M, in 2001 $49M (post-bubble).

By 1996, Object Design had 204 employees, including 51 in research and development, and over 700 customers. Among the companies we sold to were ABB, AT&T, Abbott Laboratories, Alcatel, Aldus, Ameritech, Apertus, Australia Telecom, Autotrol, Avanti, Bankers Trust, BayNetworks, Bell Northern Research, Bellcore, Bellsouth, Boeing, British Telecom, CADAM, Cabletron, Cadence, Canon, Cellnet Data Systems, Computervision, Credit Suisse, DEC, Delphon, EDA (integrated design systems), Ericsson, Fidelity Investments, Ford, Fuji Xerox, GE Daytona Beach (flight simulation GIS), GTE Directories Corporation, General Electric (several sites), Goodyear, Hewlett-Packard, Honeywell, Hughes Information Systems, Hyperdesk, IBM Poughkeepsie (CAD), IDD Information Systems, Independence Tech, Intel, Intergraph, Long Term Credit Bank of Japan, Loral, Lucid, MCI, MIT, Manugistics, Matra, McDermott, Mead Data Central, Mentor Graphics, Mitsui, NASA, NSA, NeXT, New York Stock Exchange, Nomura Securities, Oberon, Objective Spectrum, Olivetti, Pitney Bowes, Platinum Technology, PowerFrame, PriceWaterhouse, RoadNet (owned by UPS), Sandia National Laboratory, Schlumberger, Sema Software (text processing in SGML), Sherpa, Siemens AG (telephone switching), Southwest Airlines, Sprint, Sterling Software, Sun Microsystems, Synopsys, Texas Instruments, Toyota, U.S. West, Universal Oil Products, Vodafone, Wildfire Communications, Xerox, and Zuken. (This list was compiled by me from my notebooks from the time, with help from Mark Sandeen, Tom Kincaid, and the public offering document.)

Sun Microsystems was building a new object-oriented platform called Distributed Objects Everywhere, for which we provided the object database. IBM formed a strong strategic alliance with Object Design: they purchased equity, they bought lots of product, they set up a joint marketing program, and they built ObjectStore into their whole software product road map. We also got useful technical advice from the high database wizards at IBM’s Almaden Research Center, such as C. Mohan and (I think) Bruce Lindsay. Microsoft modified their new operating system technology in order to allow ObjectStore to run on it.

ObjectStore, and OODBMS’s in general, have often been criticized in sentences using the word “niche”. What does that mean? A niche simply means a particular market: a kind of customer with certain kinds of needs. To sell to a particular set of markets is exactly what we intended all along. Sometimes “niche” is supposed to connote “small niche”, but you can read the above and make your own judgement.

What Happened With Prof. Stonebraker’s Products?

The object-relational database system that Prof. Stonebraker spent so many years praising and selling, known at various times as Miro, Montage, and Illustra, with its “Datablades” architecture to support extended datatypes, reached the market in 1992. Then it was bought by Informix, which back-burnered it and put its team to work adding object-relational technology to Informix, resulting in Informix IUS. Then IBM bought the database part of Informix, and to the best of my knowledge, no longer sells Illustra. Far from “cornering the market”, it was gone after only four years. It’s hard to see how to construe Illustra as a success and ObjectStore as a failure. And so much for his claim about “their OR descendants continuing to be the market leaders”: they were not market leaders in any market. Furthermore, the “object-oriented” features that were added to the RDBMS’s never turned out to be important or widely used, to the best of my knowledge. They certainly didn’t take over the business data processing market.

Prof. Stonebraker’s latest pitches for his new technologies and new startups say that they’re posing a major challenge to Oracle. And they’re targeted at markets other than mainstream business data processing (dare I say “niches”?). Prof. Stonebraker published a paper in 2005 called “One Size Fits All: An Idea Whose Time Has Come and Gone”, about how RDBMS’s such as Oracle can be beaten by new kinds of DBMS’s for specific application areas. That’s what we’ve been saying all along! His new startups look promising to me. At OOPSLA 2007, I had a long discussion with Richard Tibbetts, co-founder and architect of Prof. Stonebraker’s StreamBase (www.streambase.com), which sounded pretty impressive. Prof. Stonebraker himself came to my workplace to tell us about Vertica (www.vertica.com), which we’re very interested in. There’s also H-Store (which I think is the same thing as Horizontica), a newer effort. It’ll be educational to see what happens to these companies and products over the next twenty years.

What Happened With Other OODBMS’s?

What became of Object Design’s competitors? They’re doing fine, selling their OODBMS’s. Versant went public with over $21M in revenues and over $7.6M in profits. Objectivity, still private, is also selling their OODBMS. Gemstone, which was there before the rest of us (named Servio Logic in the earlier days), is still selling too.

DB4O is a popular embeddable open-source dual-license OODBMS, supported by a venture-backed company called db4objects of San Mateo, CA. It’s aimed at Java and .NET. It supports ACID transactions, although I don’t know the specifics. Object Design also has a simple embeddable language-transparent Java database called “PSE Pro for Java”, which does have transactions and some good scalability (it only reads in the objects that you actually use).

There’s also Cache from InterSystems, who are intensively marketing their OODBMS. I’ve heard it’s popular and very fast. Their paper talks about using Cache for persistent Java objects. It does navigation, but also provides a JDBC/SQL interface. The paper is satisfyingly technical, including code samples, comparing Hibernate, DB4O, and Cache.

These days a lot of value can be had from database systems that don’t have query languages at all, such as the various versions of the Berkeley Database from SleepyCat (now part of Oracle). Sometimes just looking up a record by key, ISAM style, is exactly what you need. Charlie Lamb and Sam Haradhvala are working on the Java version of Berkeley Database. These products have been extremely successful.

What Happened To ObjectStore?

I have been away from the company for a long time, so most of my knowledge is second-hand. In this section I won’t attribute my sources, in order to avoid getting anyone into trouble.

Object Design, later renamed eXcelon Corporation, was acquired in 2002 by Progress Software, which put in place an excellent new top manager and has retained the technical staff. In fact, several former Object Design employees rejoined the company. Many of my long-time friends and co-workers are still there.

ObjectStore is still being actively maintained. There is a new product manager. There have been some large deals recently. A new major release, 7.0, came out earlier this year, with support for Windows 64-bit, Microsoft Vista, Visual Studio .NET 2005 SP1, and Red Hat Linux 4.0 Update 4. It also has a new Data Services Administrator tool, new support for Java 5, and other improvements. Release 7.1 is coming next summer, in time for the twentieth-anniversary party. I just got an evaluation copy of ObjectStore for my current employer; we may be using it in an upcoming product.

Here are some of the key reasons it’s not as big a seller as it used to be (from me, Dave Moon, and Ken Rugg):

  • ObjectStore is still difficult to sell, because it’s not a solution, and it’s not even a tool for directly making a solution. Rather, it’s a framework on top of which a sophisticated software engineer can build a tool that they can then make into a solution.
  • Object Design developed a very powerful sales force, but during various corporate turmoil, most of it disbanded, and it has been very hard to recreate it.
  • Progress has developed other products that have been very successful and bring in more revenue for the effort than ObjectStore itself. The division selling ObjectStore has limited resources and is focusing on these products.
  • When we started, C++ was the exciting new object-oriented language being used for new applications, and ObjectStore was designed primarily for C++. It’s been twenty years, and these days fewer new applications are using C++.
  • In Java, there are now persistent object packages based on object-relational mapping, which is far better than it used to be, almost as easy to program in as ObjectStore (for Java), and free.
  • Computers have gotten very, very much faster over the last twenty years. ObjectStore’s performance advantage became less important, and therefore more applications could make the conventional choice of using a relational system with the new object-relational mapping technology.
  • ObjectStore’s performance advantage in Java is considerably less than in C++: you get much less control over the clustering, and there is significant per-object overhead.
  • There are fewer software developers who have a good grounding in data structures and algorithms. The Java generation assumes this comes as part of the language or DBMS, and that they don’t have to figure this out for themselves. With ObjectStore, many of the most compelling applications are a result of someone building a custom data structure to store and index the information in a unique way.
  • There was a negative reaction to the hype (see above) which gave OODBMS’s a bad name. Prof. Stonebraker’s comments may have contributed, as well; he’s very prominent and often quoted by the press.
  • Another way things have changed since twenty years ago is that software developers are much less open to buying substrate software. They are accustomed to getting it free (or very cheaply).
  • ObjectStore’s markets never got quite big enough for third-party vendors to make support products, and there weren’t a lot of outside consultants who were ObjectStore experts.

Gene Bonte says: “ODBMS technology continues today in the marketplace 20 years, and 8-10 technology cycles, after its founding. How many other technologies have come and gone in this time frame? Also, two of the top five OODBMS firms went public, which is a very high percentage for venture-backed firms and speaks to a real success. If you tell a venture firm that 40% of their investments will go public, you will have a smiling VC. Meanwhile, another 40% are still around doing business. It is probably safe to say that the overall return on investment in VC investments in OODBMS technology is quite positive. From a commercial perspective, and from a technical and longevity perspective, OODBMS technology has been a success. Relational versus object-oriented was never the real issue from a business point of view. The issue was building a successful company and producing shareholder value. This was accomplished.”

Caveats and Thanks

Everything here is my own personal opinion, and should not be taken as a statement by Object Design or Progress Software!

Much of this is in the past tense because I’ve been gone so long, and because things have changed, but ObjectStore is still alive.

Thanks to all the contributors named above, particularly Benson Margulies, whose highly cogent criticism compelled me to substantially reorganize the whole essay. I have made small edits to the contributions. Of course, I take responsibility for all errors.

Author: "dlweinreb" Tags: "ObjectStore"
Comments Send by mail Print  Save  Delicious 
Date: Thursday, 27 Dec 2007 14:08

I just finished reading an amazing book: “Dreaming in Code” by Scott Rosenberg. Like many good, recent non-fiction books, it alternates between a specific narrative with colorful real people, and general background information. In this case, it’s the story of Chandler, a personal information management tool, and the team who are building it, led by Mitch Kapor.

The general background explains far more about real, contemporary software, how it is built, and what it’s all about, than anything I’ve read before. Everyone learning to be a software engineer, or who wants to understand what software engineers actually do, should read this book.

In only 355 pages, Rosenberg discusses, in clear language that’s easy to follow, at least the following:

  • What working on a software project in a team is like, the subjective experience
  • Open software, and the “Cathedral vs. Bazaar” concept
  • Doug Engelbart’s ideas (very germane to Chandler)
  • Famous software fiascoes
  • Computer languages, especially Python and how it compares to others
  • Reusable software, software libraries, build versus buy
  • What “geek” really means
  • CVS, Bugzilla, and Wikis
  • Why user interfaces are so hard to design
  • Dependencies between parts of a system and how they block work
  • Release management and scheduling
  • Specifications and their nature
  • Layers of abstraction
  • Scaffolding
  • Code reviews
  • WebDAV and CalDAV
  • Microsoft FUD
  • Requirements analysis
  • Methodologies: waterfall, agile
  • The gist of No Silver Bullet and The Mythical Man-Month
  • Ruby on Rails
  • Software engineering, its history and what it means
  • Complexity
  • Late binding
  • Object-oriented programming
  • Recursion
  • The halting problem

The story of Chandler and the team is compelling and instructive. On page 173 of the book, he says: “By now, I know, any software developer reading this volume has likely thrown it across the room in despair, thinking, ‘Stop the madness! They’re making every mistake in the book!’” I did indeed feel that way by page 173. Here’s my sense of what went wrong, based on the account in the book:

  • They did not have one architect (Brooks makes a very good point about why there should be a single person)
  • They didn’t work out the architecture in advance, and they went back and changed it many times
  • They had a very flexible data concept/model, in which items change type frequently in a user-visible way, which they didn’t work out until quite late
  • They kept changing their mind about their UI substrate: wxWidgets? Mozilla internals?
  • The software ecosystem changed around them after all those years, and using a Web UI now made sense, but it was too late for them
  • They could not figure out what database technology to use (they finally decided not to use the Zope Object Database, although their reasons for that decision don’t impress me)
  • It was originally supposed to be peer-to-peer, but they could not figure out how to make that work, so they changed it to be server-based, a major change very late in the design
  • They had to design a security model for all this
  • It was all extensible, which is great but takes a lot of work to do right
  • There were complicated semantic issues with sharing, “chain-sharing”, etc. which were not worked out early.
  • They wanted to have extensional and intensional collections, like iTunes, but also wanted to combine the two (the so-called “exclude Bob Marley” feature), which makes the semantics a lot harder
  • Their internal terminology was inconsistent, symptomatic of a lack of architectural integrity
  • They did serious requirement analysis only late in the project
  • It was putatively open-source, but it was much too immature to really get outside developers involved
  • They were too focused on doing “the right thing” instead of getting something out fast; see Gabriel’s “Worse is Better” paper
  • They released much too early, partly because of the glare of publicity due to Mitch Kapor’s involvement

I see that they are still in “preview” releases. This has been going on for six years now! They have no projected release date for 1.0. It will be free, under the Apache license.

I have always wanted a good personal information manager, and a lot about Chandler looks very promising. Someday I may be a happy user. Right now, I think I’ll wait until release 1.0.

I hope they have moved beyond the problems illustrated in the book and are running smoothly now. Kudos to the whole Chandler team for letting Rosenberg be so involved, being so honest with him, and letting him produce this unique, spectacular book.

Author: "dlweinreb" Tags: "Book, Software Engineering, Chandler, PI..."
Comments Send by mail Print  Save  Delicious 
Date: Thursday, 27 Dec 2007 03:00

I heard on Marketplace (American Public Media’s radio show) about footnoted.org, where Michelle Leder exposes juicy information about corporations found in the footnotes of their reports. If you’re a shareholder, you might be interested to know how your management is spending your corporation’s money. Here are some fun ones:

  • Fred J. Kleisner, Interim President and CEO of Morgans Hotel Group, had to relocate to New York City. The company is paying all his expenses, including a housing allowance of up to $30,000 per month. “Even in Manhattan, that buys you some nice digs.” He also gets a $750K salary with a bonus of up to 200% if he meets performance targets.
  • For their CEO Michael McGrath, i2 Technologies paid nearly $1 million flying him between the company’s offices in Dallas and his home in Maine.
  • Countrywide gave CEO David Sambol a $2.7 million promotion bonus shortly before their stock imploded.
  • Qwest Communications CEO Edward Mueller’s stepdaughter attends high school in California, but Qwest is based in Denver. But no problem; she’s allowed to use the company’s Falcon 2000 private jet for her commute to school. This could cost Qwest as much as $600K, assuming normal charter rates. In fact, more than half of the CEO’s in a recent study are able to use their corporate jets for personal trips.
  • David Peterschmidt is leaving as CEO of Openwave Systems after three years, for which he’ll get a lump-sum payout of $1.5 million and full vesting of his 175,000 shares of restricted stock — also worth about $1.5 million. This was while Openwave’s shares fell by 17%; the stock has been falling and falling ever since. Meanwhile there has been a shareholder lawsuit, involving Peterschmidt and others, claiming that the stock price dropped because of the company’s options backdating scheme, which encompassed seven years and led the company to restate its financials. In 2007, they lost $197M on revenues of $290M.

It’s good to be CEO.

Author: "dlweinreb" Tags: "Corporations, Busine, CEO"
Comments Send by mail Print  Save  Delicious 
Date: Wednesday, 26 Dec 2007 14:36

The European Common Lisp Meeting of 2008 will take place on Sunday, April 20, 2008, with optional dinners on Saturday and Sunday evening. I’ve been to Amsterdam and totally loved it. I’d very much like to attend; I’ll have to see whether it’s possible.

Author: "dlweinreb" Tags: "Event, Lisp, Lisp Conference"
Comments Send by mail Print  Save  Delicious 
Date: Tuesday, 25 Dec 2007 20:13

Despite the many successful applications written in Common Lisp, many people complain about it. I’ve been looking around the web seeing what the predominant complaints are. I’ve come up with two lists of complaints: the ones that are about things inherent to Common Lisp that can’t be fixed within the context of Common Lisp, and the ones that could be addressed. With each one, I’ve added some commentary. My comments are not deep; some are downright superficial. And they certainly reflect my own point of view, with which people can quite validly disagree. Comments about future dialects of Lisp are interesting but not in the scope of this essay.

First, there are complaints about the language itself. Since the language is a standard, and it’s important to not break existing programs, there isn’t much that can be done about these.

  • I don’t like the syntax, with all the parentheses. The advantage is the simplicity and uniformity of the syntax. You absolutely must have a text editor or IDE that can automatically indent, and keep your code indented properly. The biggest problem is simply that it’s unfamiliar.
  • I don’t like the prefix notation and lack of infix operators. A deep part of the Lisp ethos is to avoid special distinctions between what the language provides, what shared libraries provide, or what you provide. They all have the same syntax, rather than some being privileged to have infix tokens. It’s nice to not have to remember precedence rules.
  • There are too many ways to do similar things, such as association lists, property lists, and hash tables; more generally, it seems like a design-by-committee. Yes, that’s true, and it’s largely for back-compatibility.
  • It has all these obscure names like car, cdr, and setq. That’s historical, too. But really, are these any worse than printf, strpbrk, long long, and so on? Besides, you can use first and rest instead of car and cdr if you find that clearer.
  • Object-oriented programming is not well integrated, e.g. sequences aren’t objects so you can’t make your own. Yes, that’s true. OOP was added to Common Lisp late in the day.
  • It’s just too big. Actually, the real problem is that the core of the language is not cleanly separated from the built-in libraries. The Common Lisp designers had originally intended to do this separation, but there wasn’t time enough.
  • Lisp is all about lists and recursion. No, it’s not. Elementary Lisp texts often start with those things, but real Lisp programs have arrays, hash tables, powerful iteration, and so on.
  • Lisp is case-insensitive, whereas I’m accustomed to case-sensitivity. Also historical.
  • Lisp macros are bad because you have to understand exactly how every macro works in order to understand the code. That’s not true: in order to understand function A that uses macro B, of course you have to know what macro B means, but in order to use function C that calls function D, you have to know what function D does. Big deal. Indeed, it is possible to use macros inappropriately, but when used properly, they make programs far easier to understand. Macros can be used to construct simple domain-specific languages, focused on the concerns of the users of the program. They allow you to write code that’s closer to the actual intent, which means ease of both reading and writing code, fewer defects, and ease of maintenance.

Second, there are many abilities missing from Common Lisp. Some of these things are missing because of the changing software ecosystem (e.g. there was no WWW when Common Lisp was standardized), some were not well-codified or well-tested enough to make it into the spec. Many can be added by libraries. The most commonly mentioned are:

  • Streams (user defined)
  • Threads and locking
  • Modern networking: e.g. sockets, TCP and HTTP client and server, URL’s, email, etc.
  • Web Services (WSDL etc.)
  • Relational database access
  • Persistence (Lisp-friendly)
  • Meta-object protocol for CLOS
  • System definition facility
  • Other general-purpose access to the operating system’s facilities
  • XML
  • Math
  • Graphics
  • GUI frameworks, platform-independent
  • Text manipulation
  • High-performance (asynchronous) I/O
  • Access to printers
  • Internationalization
  • Unicode strings
  • Generating HTML
  • X Window System
  • Foreign function interface
  • Regular expressions

There are many libraries of code available for Common Lisp. But among the problems are:

  • They don’t all run on all implementations of Common Lisp.
  • Not all of them are being maintained, to fix bugs and stay up to date with the ecosystem.
  • It’s hard to know which of them are being maintained.
  • It’s not always obvious where to find them.

Third, there are issues about the Common Lisp implementations:

  • Do they implement the whole standard correctly? Yes, at least the leading ones do.
  • Are they fast? It varies between implementations. Some generate better code than others; and performance of libraries varies. But the leading implementations are actually quite fast.
  • What platforms do they run on? It varies between implementations. See my survey paper.
  • Is there a good way to deliver a packaged application? Some of the implementations have good facilities for this, particularly the commercial ones; others don’t.

Fourth, there are questions about tools:

  • Are there profiling tools? Many of the implementations do have profiling tools.
  • Are there good interactive development environments? The leading commercial vendors have good tools of their own. For the open-source implementations, there’s SLIME (which works within GNU Emacs), which is quite good. There has recently appeared an Eclipse plugin called Cusp. There aren’t yet IDE’s that can do refactoring, though. And there aren’t source-level debuggers that let you set breakpoints and single-step and so on. On the other hand, those are somewhat less important because you can enter a read-eval-print loop and make changes very quickly.

Fifth, there are political and process issues:

  • There haven’t been revisions to the standard, there isn’t any process for producing sanctioned changes, there’s nothing like the Java Community Process, nor an accepted benevolent dictator. That’s true. On the other hand, there’s something to be said for stability. More important, because you can extend Lisp, you can evolve the language yourself, for your own purposes, without having to wait for a new round of standardization. See Guy Steele’s excellent paper about growing languages. Not everything can be done this way; if the implementation has no threads or no Unicode strings, it’s hard to compensate for that.
  • Because Common Lisp is not that popular in industry, it’s hard to hire qualified Common Lisp programmers. On the other hand, it’s not that hard to learn Lisp. I’d recommend Peter Seibel’s Practical Common Lisp. My employer hires people all the time who have to learn Common Lisp, and it’s not a big problem.

Steady progress is being made on the second, third, and fourth category of problem. I think the second category, the problems with libraries, could be vastly improved. This would bring more people into the Common Lisp community, and the more users there are, the more effort can be put into improving the implementations and creating better tools.

Author: "dlweinreb" Tags: "Lisp"
Comments Send by mail Print  Save  Delicious 
Date: Tuesday, 25 Dec 2007 15:58

Developing software can’t be learned in a classroom. To be sure, there are plenty of things that you can learn in a classroom that are invaluable. But if you want to be a software engineer, you have to learn by doing.

When I showed up at M.I.T. as a freshman, I had learned computer programming at summer schools. My regular school didn’t have any computers (this was the early 1970’s). The only programs I had ever written were homework assignments, and little toys for myself. I had never worked on, or even read, anything large, or anything that needed to be maintained over time.

I was set to work writing an Emacs in Lisp for the Lisp machine. This was quite a big change! During my time as an undergraduate I wrote a lot of system software for the Lisp machine (and a little for the ITS timesharing system), and I think it was pretty good code, better as the years went by. But when I look back, I marvel that I was able to get going so quickly. I’m a pretty bright person but no genius; I know dozens of people personally who are way smarter than me. I now attribute it to two things. First, I read lots and lots of real software, studying it line by line. Second, I had a great mentor.

After Richard Greenblatt hired me at the M.I.T. Artificial Intelligence lab, my first task was to learn Lisp. I didn’t know how to go about doing that effectively. Greenblatt told me to write a chess program, and I tried that, but I found myself doing low-level array stuff (how does a knight move?) and it was just like programming in Basic again. I had observed that this guy Dave Moon seemed to be one of the very respected hackers (I use the word in its original sense) at the Lab, and although he seemed a bit unapproachable at the time, I asked him how to learn Lisp, and he told me to write a symbolic differentiator. That was a much better approach.

Moon was also working on the Lisp machine software, and there were only a few of us, so he spent considerable time helping me get up to speed. He reviewed all my code and gave me extensive feedback, and he answered all my questions. I read all of his code and did my best to emulate it. We worked very closely, even sometimes sitting at the same console taking turns typing.

For me, it was an apprenticeship, though not the classic kind: the “apprentice” you read about in historical books spends years doing junky, boring work, whereas I was doing the good stuff right away. My personal bias is obvious, but I think Dave Moon is one of the dozen best programmers in the world, and I know many others who have similar respect for his depth of understanding as well as his coding ability. To have stumbled upon having him as a mentor is one of the luckiest things that has ever happened to me.

I have tried to “give back” the good fortune that I’ve had. I’ve never had a full-time apprentice, but I have tried to help other hackers whom I’ve worked with. And this year I’m helping to teach a class in Java Distributed Programming at Harvard Extension School, providing several students with the first detailed code reviews they’ve ever had. I’ve also recently submitted my name to an M.I.T. mentoring initiative, volunteering to be a mentor. We’ll see what happens with that.

I’ve learned a lot from many people. Prof. Gerry Sussman at M.I.T. is a superb teacher, who has lots of apprentices. I never apprenticed under him, but I learned a lot from his classes and many personal conversations. Guy Steele and Tom Knight were also particularly influential. And I keep learning from my co-workers.

You don’t have to be a stellar hacker to be a fine mentor. Give younger people an opportunity to do real, production programming. Keep a close eye on their work and help them improve it. Help them learn the lessons that can’t be taught in classrooms: how real software projects go, how to work well with other hackers on a team, why it’s so important to strive for simplicity, and so on.

Ideally, find a promising beginner who you’ll be working with for a year or more, someone you’ll enjoy spending a lot of time with, and start mentoring him or her. You don’t have to say anything or make any formal offer. Just do it, and see how it goes.

Author: "dlweinreb" Tags: "Software Engineering, Uncategorized"
Date: Tuesday, 25 Dec 2007 14:40

I keep coming across praise for a new programming language called Scala. I looked into it a bit, and here are my first impressions. I have not written any programs in it, so I am hardly even a beginner, let alone an expert. My own opinions are in parentheses. Please comment on this post to correct my errors.

I started with the JavaPolis ‘07 interview with Martin Odersky. (He’s the kind of person I like: clear, cogent, practical, charming, and modest.) I then downloaded the Scala distribution.

History: The key inventor is Martin Odersky, who heads the programming research group at École Polytechnique Fédérale de Lausanne as a full professor, and was one of the designers of Java generics. He was a student of Wirth and came from the Modula-2 tradition, but later got interested in functional programming, and is now associate editor of the Journal of Functional Programming. He’s been working on Scala for over five years. He originally used Scala in classes, teaching Java first and then Scala in the following year as a “power Java”. The first public release was in 2004, and he revised the language substantially in 2006. There has been a series of versions, maintained with backward compatibility and a deprecation protocol. The current version is 2.6.

What’s it for?: It is intended to be a general-purpose language that could replace Java. So far it has primarily been used for web-related applications. The first industrial user was a company called Sygneca in the U.K., which has used it to build web sites for agencies of the British government. Perhaps the best-known development has been David Pollak’s Liftweb, a web framework along the lines of Rails, which is getting a lot of attention and has many committers.

Documentation: Odersky recommends starting with “First Steps To Scala” (also in the distribution). Then, you can move on to the book “Programming in Scala”, by himself and others, which just came out on Dec 12 (two weeks ago, as I write this). You can get the PDF for US$22.50. It assumes you know Java. The distribution comes with a reference manual.

Functional programming: Scala is intended to be a fusion of functional programming and object-oriented programming, rather than a “pure” functional language like Haskell or ML. Ordinary imperative programming works fine. Objects can be mutable or immutable. The latter is preferred when it’s possible, since it needs no concurrency control. Also, these days it’s faster to make new objects and allow them to be efficiently GC’ed than to incur the GC overhead that results when you mutate an existing object (interesting point!). There are mutable arrays, but immutable (parameterized) List objects. (I am entertained that the infix operator “::”, which makes a new List with an element prepended, is pronounced “cons”!) “Nil” denotes the empty list (which, unlike in Lisp, is a first-class List object). The lexeme “=>” is used to define anonymous functions (what Lisp people call lambda expressions), so you can write mylist.filter(s => s.length == 4). There are also immutable “tuples” whose elements can be of any type and which have a very simple creation syntax: (52, "Locust"). (These are even more like Lisp lists.) One thing you’d use them for is multiple-value return.
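
A rough sketch of what these pieces look like (Scala 2 syntax; the values are mine, for illustration only):

val streets = "Locust" :: "Main" :: "Elm" :: Nil    // "::" conses elements onto the empty list Nil
val shortOnes = streets.filter(s => s.length == 4)  // anonymous function; keeps only "Main" here
val pair = (52, "Locust")                           // a tuple mixing an Int and a String
val (number, street) = pair                         // destructuring a tuple, e.g. for multiple-value return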

Object-oriented programming: Notice how few explicit type declarations there are:

class SimpleGreeter {
  val greeting = "Hello, world!"
  def greet() = println(greeting)
}

val g = new SimpleGreeter
g.greet()

There are no static fields or methods. Instead, you use singletons. The concept of singletons is built into the language, so the syntax is concise and stylized. (I’ve said before that the famous “Design Patterns” are often just conventions that make up for things that the language cannot do itself. Here’s another example of a pattern that’s no longer needed as a pattern.) There are “traits”, which are like Java interfaces except that they can define default method implementations. Although Scala is “single inheritance of implementation” like Java, there’s a way to use traits that provides a simple “mixin” facility.
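
A minimal sketch of both ideas (the names are mine, for illustration only):

// A singleton object takes the place of static fields and methods.
object IdGenerator {
  private var last = 0
  def next(): Int = { last += 1; last }
}

// A trait may supply a default method implementation, unlike a classic Java interface.
trait Greeter {
  def name: String                                // abstract member
  def greet(): Unit = println("Hello, " + name)   // default implementation
}

// Mixing the trait into a class picks up the default behavior.
class Person(val name: String) extends Greeter

new Person("Dave").greet()    // prints "Hello, Dave"
println(IdGenerator.next())   // prints 1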

Interpreter: Scala has an interactive interpreter (what Lisp people call a “read-eval-print loop”) (right there, we’re way ahead of Java!). You can define variables without specifying the type; it infers it. Here’s a simple function definition:

def max(x: Int, y: Int): Int = if (x < y) y else x

Parameter types are not inferred, so they must be specified. But in some cases, including this one, the result type is inferred.

Concurrency: There is a library trait called “Actor”. It’s based on the concepts in a couple of papers that I have not yet read. It looks like it’s oriented around asynchronous messaging.
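
As I understand the scala.actors library (a hedged sketch of my own; the message text is invented), asynchronous messaging looks roughly like this:

import scala.actors.Actor._

// Spawn an actor that handles String messages as they arrive.
val logger = actor {
  loop {
    react {
      case msg: String => println("received: " + msg)
    }
  }
}

logger ! "booking confirmed"   // "!" sends a message without blocking the sender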

Sharing of libraries: There is a system called Scala Bazaars, or “sbaz”, to help the open source community share libraries. There appear to be around 250 contributions already.

Some other interesting features: The syntax of XML is embedded in Scala. There are primitives for pattern matching (what Lisp people would call destructuring). There are user-defined annotations.
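
For instance (a small sketch of my own, not taken from the Scala documentation):

// Pattern matching destructures a value, much like Lisp destructuring.
def describe(p: (Int, Int)): String = p match {
  case (0, 0) => "the origin"
  case (x, 0) => "on the x axis at " + x
  case (x, y) => "at " + x + "," + y
}

// XML literals are part of the syntax; braces embed Scala expressions.
val note = <position>{describe((3, 4))}</position>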

Implementation: The main Scala compiler produces Java class files; it can interoperate seamlessly with Java. (This means you get an excellent JIT compiler, an excellent GC, and lots of libraries, a huge plus for a new language.) There’s also a compiler that produces binaries for the .NET CLR.
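
A trivial interoperability sketch (any Java class on the classpath can be used the same way):

import java.util.{Date, HashMap}

val started = new Date()                    // construct a Java object directly
val cache = new HashMap[String, Date]()     // Java generics appear as Scala type parameters
cache.put("started", started)
println(cache.get("started"))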

Interactive development environments: It comes with GNU Emacs support, and there are plugins available for Eclipse and IntelliJ IDEA.

Persistence: It works with the db4o object-oriented database system.

Summary: I feel that the way functional, object-oriented, and imperative programming are combined in Scala is much like the way they are combined in Lisp. As compared to Lisp:

  • conventional syntax; more approachable to many people
  • no Lisp-like macros
  • much cleaner (anything is cleaner than Common Lisp, of course)
  • statically typed with inferencing: the best of static typing without the worst parts
  • no ability to incrementally recompile and re-run (as far as I can see)

The emphasis on immutability and the Actor class also makes me think of Erlang, but I know too little about both Scala and Erlang to venture further comments.

It looks extremely promising. I’ll be keeping an eye on this.

Author: "dlweinreb" Tags: "Software Engineering"
Date: Monday, 24 Dec 2007 16:18

Friends have asked me why I signed up to work at ITA Software, and how I feel about it.

For the last four years, I had been working at BEA Systems, in their Burlington MA office. I had joined BEA to work on an exciting new product (a message broker), and because I was very psyched about working for Adam Bosworth. Unfortunately it didn’t work out. There were too many problems with how the message broker would fit into BEA’s overall architecture, which bogged us down for a long time. Eventually the message broker was made part of the WebLogic Integration product, and moved to San Jose. Meanwhile, Adam Bosworth left BEA to go to Google. BEA found a new position for me, as architect of the “Operations, Administration, and Management” aspect of WebLogic Server. Unfortunately, the WebLogic Server group was in the middle of a very long release cycle, so all of my ideas had to be on hold until the next cycle. Also, WebLogic Server’s technical strategy was changing every quarter or so, and I’d have to start all over again. I learned a lot at BEA, studying the WebLogic Server, but finally the work was too frustrating.

Meanwhile, I had been keeping in touch with my friend Scott McKay, who had been one of the senior software engineers at Symbolics. Scott was architecting and implementing a new product at ITA, and was very enthusiastic about it. I had several other friends working at ITA, from both Symbolics and Object Design. So I thought this might be a good place to be. In January of 2006, I had lunch with a group of these friends and Sundar Narasimhan, the CTO, and they convinced me to join.

I love working at ITA. My own primary criterion for an employer is that I get to work directly with extremely good software engineers who work well together. Symbolics and Object Design were like that, and ITA has the same kind of environment. My co-workers have lots of knowledge and experience. I learn new things from them continually.

I didn’t have any a priori interest in the airline software field, but I was fascinated by transaction processing systems. I had been studying operating systems, database management systems, and application servers for years, but I had never had a chance to contribute to a real-world, high-performance, high-availability system. Airline reservation systems were the first transaction processing systems. Most major airlines still use (some modified fork of) the original system, originally known as SABRE. It is written in assembly language on IBM mainframes, and the original creators are mostly retired or deceased. After forty years, these systems are still in operation, since they basically work and are so hard to replace. But the airline industry has changed, and has pressing new requirements. When our system becomes operational, it will be a major innovation for the industry.

It’s a very big piece of software, with a whole lot of moving parts. It has to be, because of the problem definition and the customer requirements. Fortunately, we have a great customer: Air Canada. We work very closely with them. Their CEO has realistic expectations and a very good attitude about the way the development process works. Air Canada has stepped up to be the pioneer, and it’s our job to minimize the arrows in their back. Eventually we’ll have other airlines as customers, after they see the success at Air Canada.

Scott, Sundar, I, and several others are improving the overall architecture of the reservation system. Among many other things, we’re working on high availability: making the system stay up all the time, despite any kind of failure that we can reasonably anticipate. I have been focusing specifically on the problem that we call “hot upgrade”: how to install new versions of components of the system, while it’s running, without impacting latency. This is very challenging and a lot of fun.

ITA has been voted one of the twenty best places to work in Boston by the Boston Business Journal two years running. We work in offices rather than cubicles. There are free drinks and snacks, and the company serves lunch every Friday. We have a weekly technical seminar, where speakers from both inside and outside ITA give talks on nearly anything (for example, Tom Knight from MIT recently told us about his work in synthetic biology). There are informal activities like Movie Night, Movie Camp, Math Lunch, and so on. There’s an entrepreneurial culture and an innovative spirit.

I have learned over the years that in a small software company, the CEO has a huge impact on nearly every aspect of the company. Jeremy Wertheimer, ITA’s CEO, is the best I’ve ever worked for. He knows how to run the business, knows the industry, and knows how to hire great people. He’s extremely technical, quite able to participate in engineering discussions, and knows a lot about software engineering and how to run software projects effectively.

And I get to program in Common Lisp again! So I’m having a great time.

Author: "dlweinreb" Tags: "ITA Software, Common Lisp"