The idea of the workshop was born in 2006 almost by chance, when my PhD advisor Giuseppe (Beppe) Santucci, Catherine Plaisant, and I had the opportunity to organize a workshop at AVI 2006 and felt that it was time to gather some people and talk about the problem of evaluation in infovis.
The first workshop was a real success with very interesting discussions and (highly cited) papers out of it. After this experience we thought that having a BELIV every two years could be a good idea and the right time frame. In fact we organized it again at CHI 2008 and now again at CHI 2010.
The goal of BELIV is to raise fundamental questions about evaluation. The big question behind the workshop, and the reason we believe it is important, is that visualization still needs to explain why, when, and how it is useful, and we don't yet have the right tools to fully answer these questions.
So, if you are interested in participating there are two options: submit a position paper or a regular research paper. Position papers present a personal view on evaluation and are meant to introduce your point of view in the workshop discussion. Research papers are meant to provide substantial contributions to the research community with novel ideas.
A great plus of this edition is that it is a two-day workshop and that it will be a lot more interactive than past editions. Despite the workshop's success, many participants in past editions voiced the need for fewer presentations and more productive discussions, and we strongly agreed with them.
So, BELIV 2010 will be based on short presentations on Day 1, with planned sessions to collect relevant issues for the next day. Day 2 will be all centered around the discussion of the collected issues.
Many more details can be found at the workshop website, where you can also find links to previous editions so that you can better understand what a BELIV workshop is.
For any questions please send me an email or post a comment here.
There is plenty of data available, no question. Everywhere we hear that the new big trend is data crunching and that the great thing about the 2000s is the large supply of freely available data sets. Just recently the US Government launched data.gov, and this event has been acclaimed by numerous people in visualization and in data analysis as a big step toward a better world. Ok, data is good. All of it is good. But I think we are getting overly excited about it.
I see a dangerous trend here: thinking that data is the only thing we need and that having large data at our hands will solve some problems. But, data per se has no real value if it is not related to problems and people! So, I see many new interesting web sites (and tools!) popping up on the Internet but I don't see any guidance about what to do with them.
In realistic settings, where visualization and data analysis tools are really needed, people are not enthusiastic about data but about the impact data can have in their context once it is analyzed. And this context is rich in background knowledge and business goals (in the broadest sense). I don't see the same kind of richness around.
Tasks and people are the scarce resource
Basic economics teaches us that value is generated from scarce resources and not from what is broadly and readily available. What we really need therefore is not only access to data sets but also to real problems and to the people who care about them. Unfortunately, while today data can be readily collected and transferred, problems and people cannot. If we don't realize this, we risk running millions of useless studies, building thousands of useless tools, and wasting enormous amounts of energy and resources.
If tasks were at least attached to the data, the situation would improve considerably. Instead of guessing what these data are for, we could try to solve some real problems connected to them. Take for instance the data.gov website: what if people could post interesting questions? Or what if another website were available where people post data AND tasks?
Interesting good examples
Some great examples come from research areas like knowledge discovery and visual analytics. Take the KDD Cup. It is organized every year at KDD, the premier international conference on knowledge discovery. Real world data is published to let researchers compete on a series of pre-defined TASKS. The most recent edition, KDD Cup 2009, is an excellent example. After a few lines of description the web page has a large title: "Task Description". The competition is based on a real data set provided by Orange, the French telecommunications company, and the goal is to do better than a data mining tool developed internally. The task is to: "estimate the churn, appetency and up-selling probability of customers" ... "churn rate is a measure of the number of individuals or items moving into or out of a collection over a specific period of time" ... "the appetency is the propensity to buy a service or a product." ... "Up-selling can imply selling something additional, or selling something that is more profitable or otherwise preferable for the seller instead of the original sale." Better than just data, isn't it?
The VAST Challenge is another excellent example. It is organized every year within the Symposium on Visual Analytics Science and Technology. As with the KDD Cup, a new data set with tasks is published every year. The problems are selected so that visual analytics technology is needed to solve them, that is, plain automatic methods without iterative user sense making are unlikely to be the solution. Another great thing about this challenge is that the data is synthetically generated so that ground truth is available. In practice, this means that different solutions can be compared in terms of their ability to discover what there is to be discovered. So, the VAST Challenge provides complex data AND tasks. But even better, it also provides people! Since the 2007 edition the contest also includes a session where contest winners have the opportunity to run a small contest live together with real analysts. This is exactly the direction to take, in my humble opinion.
Conclusion
I don't want to give the impression that data is not important or that its wide availability is not a great thing. It is! But as data turns into a commodity there are other factors that become relevant. Having meaningful tasks and access to real people trying to solve problems is a lot more important, and a lot less likely to become a commodity. What will count in the future (present?) both for researchers and practitioners is not data but people. I think it is important to recognize these limits and opportunities and start behaving accordingly. It is for this reason that I am not overly enthusiastic about having a lot of data. And I think the sooner we start differentiating between just data and data + problems, the better it will be for all of us.
I am sorry guys, I feel a strong need to share my frustration with you today. I have discovered yet another infovis library to create the most beautiful visualizations in the world and instead of being excited I am depressed. That's great, and I really champion the effort of these good guys, but a tough question keeps hammering in my head: why so many libraries and so few tools? Libraries are great and really needed to speed up the development process but here I perceive a dangerous trend: there are a lot more libraries than real tools written with them!
There are a lot of people out there waiting for our useful tools to come and I think it is time we realize that developing real tools for real people is more important than writing toolkits. Personally, I am totally ready to accept a world with no toolkits and lots of tools.
I look around the web and I cannot find a decent visualization tool freely available, only a few expensive, highly technical commercial tools. As Stephen Few pointed out in his talk at InfoVis a couple of years ago, there is a whole bunch of casual users out there whose job includes the need to analyze data. So, what are we waiting for? Who is expected to build these tools?
I am worried by this short-sighted view and this self-referential culture where infovis people build things for other infovis people, and that's it. We develop libraries and then set up fancy examples to show ourselves and our peers how good we are. Ok, this is useful and needed to some extent. It helps build a community, share knowledge, and consolidate good practices. But if we want to go to the next level and let infovis go beyond the toy tool stage, we have to go one step further and embrace the much riskier and tougher question: who will use it?
I see so few examples around that I'm kind of embarrassed to talk about it. Can you list any serious and freely available tool that an average user could use in his or her daily activity? Do we maybe have something that minimally resembles a free Spotfire? We have a myriad of little toy visualizations scattered around the web and nothing in our hands.
There are very rare exceptions. Robert Kosara has recently published his Parallel Sets on EagerEyes and plans to take on the burden of maintaining it through follow-up versions. This is a great thing. Parallel Sets will not solve the analytic problems of the entire world, but it is a step in this direction. So bravo Robert!
Another tool I've seen around recently is Verifiable, a very nice and well done tool to create charts directly on the web. Nothing really revolutionary, but what it does, it does well and with an extremely clear interface.
These have the shape of tools made for end users, and this is what we need. C'mon folks, libraries are great but we have yet to show the entire world what we are able to do. Let's develop tools, tools, tools!!!
Pat Hanrahan's talk was really deep and thoughtful. A lot of new basic material to think about visualization under a new lens.
Then I forced myself to select only 3 papers out of the long program. They are engaging and new in some sense. Obviously this is totally personal, and there were many other good ones worth reading.
Pat Hanrahan's Keynote
As I expected, the talk was really great, but in a very special way that I couldn't have imagined. The talk was titled "Systems of Thought" and was entirely based on a deep retrospective of the basic scientific notions underlying visualization. Hanrahan started his talk by putting visual thinking in the context of various systems of thought, that is, the tools we humans have to reason and communicate, like natural language, logic, mathematics, etc. It was kind of shocking to see how in this context visual thinking seems less developed and weaker than the other systems.
Then the talk went on by discussing when and how visualization is better than other forms of thinking, introducing basic studies from people like Herb Simon and Donald Norman, and demonstrating that a change in representation can greatly affect the way we perceive a problem and our ability to solve it.
The talk ended with a series of key questions suggested as a way to proceed when designing a new visualization:
- What is the problem you are trying to solve?
- How do you think about the problem? What are the semantic objects and their relationships?
- What visual representations are already used? How does the visualization represent those objects and support reasoning about them?
- How can the manipulation of the representation be embodied in the interaction?
- How can visualization be coupled with other systems of thought?
Thinking about the semantic objects in the domain of the intended users is also a key point, as well as the related need to see how currently people solve the problem.
The manipulation of the representation puts interaction at the center of the problem and gives it its due role. Visualization is not only visual representation but also the great power of visual manipulation.
Last, coupling visualization with other systems of thought is really great advice. When designing a visualization there is always the risk of focusing too much on the tools and overlooking the whole system, tasks, and processes happening off the screen. I'm not sure if this is what Pat intended, but I think that reasoning about how visualization can be coupled with other systems is in part a way to take a broader look and include the whole ecology of reasoning artifacts found around a problem.
If you want to know more I suggest taking a look at the slides (pdf). Just collecting the cited references and reading those papers would be an excellent exercise to deepen our thoughts around visualization.
Papers
This is a very short and personal selection of the papers I really liked.
Visualisation of Sensor Data from Animal Movement (pdf link), by: Edward Grundy, Mark W. Jones, Robert S. Laramee, Rory P. Wilson and Emily L.C. Shepard - This also won the best paper award. These guys have developed a system to analyze the movement of animals around the world taking data from accelerometer sensors. The quality of the final images is surprising. Lots of interesting patterns can be seen and it is not hard to imagine how useful it is for researchers using these data.
Visualization of Vessel Movements (pdf link), by: Niels Willems, Huub van de Wetering, Jarke J. van Wijk - I have to admit it, I like this paper for the beauty of the images. GPS data from vessels is collected and used to create a sort of map of water highways. A clever rendering technique is used to represent both high level overview patterns and fine details of specific routes.
The Chinese Room: Visualization and Interaction to Understand and Correct Ambiguous Machine Translation (pdf link), by: Joshua Albrecht, Rebecca Hwa, G. Elisabeta Marai - I included this paper because it is a very good example of integration between machine intelligence and human intelligence. The system uses the output of an automatic Chinese to English translator to visualize the structure of the result. The user can better understand this structure visually and rearrange the items on the screen to create better results.
Final Thoughts
Overall EuroVis was really a great event, with a European touch that I like. The best thing about it is that it is small enough to have time to meet nice people, be engaged in interesting discussions and have some fun. The overall quality of the papers was quite good and I have the impression it is becoming better and better. So I'm looking forward to the next EuroVis in Bordeaux: French atmosphere and lots of wine!
The program is out and you can take a look at it here. The program also includes a keynote from Pat Hanrahan, which I'm really looking forward to seeing. It has the promising title: "Systems of Thought: When to Use Visual Representations in Problem Solving".
This is just a short notice to tell you that I intend to post at least one post a day to wrap up on the things I see and to share my thoughts with you. I'll also try to showcase the most interesting works.
---------------
P.S. If any of you is also attending EuroVis'09 drop me a line, we might end up drinking a good German beer in one of the many wonderful places this city has ;-)
Yesterday in a meeting with our industrial partners I received yet another lesson. Simply put: a visualization, however fancy and well-crafted, is useless if it doesn't help people take action.
Ok, I must admit it, this is maybe only true in business sectors (is it?), but what I have come to realize is that we infovis enthusiasts are too focused on the never-ending refrain that visualization is useful to explore data and that we need it to make sense of things.
This is certainly true but it is only part of the story. Take the millions of managers out there: not trained to cope with complex stats or charting tools but desperately needing to make decisions based on data. What do they need? To explore and make sense of things? Sure, to some extent ... but ultimately to make complex decisions in a very constrained setting and under tight time limits.
By not taking into account this perspective in our work as researchers and designers we miss a bunch of fabulous opportunities:
- Better constraints: If we keep the ultimate business goal in mind when designing a visualization tool, we have additional constraints, and constraints in design are not just good, they are fantastic! By having constraints we can focus on clear objectives and guide our work through them.
- Measures of success: If the tools we design help people make decisions, take action, and see the outcome, the measure of our success is suddenly clear: we are successful if our users/customers are able to make clever decisions in a short time, and ultimately if they have success with them. It reminds me of the timeless and inspiring advice of Prof. Brooks in his Computer Scientist as Toolsmith essay:
"If we perceive our role aright, we then see more clearly the proper criterion for success: a toolmaker succeeds as, and only as, the users of his tool succeed with his aid. However shining the blade, however jeweled the hilt, however perfect the heft, a sword is tested only by cutting. That swordsmith is successful whose clients die of old age"
- Conquer market segments: If we are able to give people what they really need, it is a win-win situation. They can do their work faster, better, and with higher accuracy, and we let our field thrive and become better known, more useful, more developed and more mature. Oooh and yes ... for those in academia like me: we should not underestimate the need to have successful products in the market coming out of our discipline. Our success also depends on them.
Data Mining vs. Visualization
On a side note, I think it is useful to draw a parallel between data visualization and data mining and understand how they differ, how they are perceived and why their success is different. I don't think you can call me a heretic if I say that data mining has had far greater success than visualization so far. And I think the main reason lies in what I am suggesting here. The good thing about data mining and statistics is that they can produce more actionable knowledge than infovis.

In a typical scenario, data mining can crunch some numbers and spit out an answer about which customers are more likely to respond to a marketing campaign. That simple: crunch some numbers, produce a list of prospective customers, send letters to them. The last point is what matters: "send letters to them", an action. Note one very important thing: in data mining people don't even need to make sense of things to make decisions, they just need to have "reasonable" confidence in the quality of the results. I agree that this is also the limit of this domain and that excessive reliance on a black-box way of doing data analysis can be dangerous. But this is what works, and the results are not bad!

If we want to evolve and become better we have to accept this state of things and create a better formula. Visualization has the power to open the black box and at the same time retain the power of the existing tools. But I don't see many solutions out there going in this direction. I don't think we necessarily have to make visualization tools that overlap with the goals covered by data mining, but I'm totally sure that this shift in perspective can help us enormously in making our infovis edge a lot sharper. Do you agree? Or maybe disagree? Any comments? Suggestions?
I was talking with Ilya, a new PhD student in our department, the other day and in front of a prototype he developed he said something like: "oh yes, and I should find the right color mapping here but ... how?" Oh well ... good question! Originally I wanted to write a whole new post on it, but after some thought I came to the conclusion that not only is it a daunting task but also, and more importantly, I don't know enough to seriously teach about it.
But wait a minute, does it mean I cannot help him and the ever increasing pool of poor color choosers? No, there is one thing I can do at least: share my list of favorite sources of information on color. And maybe add some tips and rules of thumb I often use for myself.
So, no more excuses to use poor color schemes. Here is my annotated list of resources, plus some personal tips.
Research Papers
A list of the papers I found most useful in understanding color in use. Some of them are written more for the general public, others require quite some effort to understand. Together, however, they cover a very large part of what should be learned, and the effort pays off handsomely.
Color Use Guidelines for Data Representation. Brewer, C. A., Proceedings of the Section on Statistical Graphics, American Statistical Association, Alexandria VA. pp. 55-60 (1999).
[ If you can read only one, read this ]
If you don't have time to read and you need one single source for practical advice, stop here. This is the best and most concise explanation of how to use color in visualization you'll ever find. Cynthia Brewer is a cartographer and focused much of her work on color in geographical data, but her suggestions apply broadly to any kind of data. You may see the result of her work in Color Brewer, an online tool to learn how to select color scales. The tool alone is an eye-opener for those who don't know anything about the topic.
How NOT to lie with visualization. BE Rogowitz, LA Treinish, S Bryson, Computers in Physics (1996).
[ More into color for SciVis but still very useful and great examples ]
This is another classic, quite short and easy to read. I like it especially
for its focus on how harmful color can be if not used properly. The use
of color is discussed more in the context of scientific visualization
where continuous shades of color are often the case, like in medical
images and geographical mapping, but the results can be applied to any
other visualization. Especially interesting is the notion that different
color mapping strategies should/can be used according to the task at
hand (e.g., segmentation, highlight, etc.).
Designing pixel-oriented visualization techniques: Theory and applications. DA Keim, IEEE Transactions on Visualization and Computer Graphics (2000).
[ Discussion (and code) of a "perceptually" optimal color scale ]
Though this is not only about color, the paper contains a very useful section on color and on how to build a perceptually optimal color scale. The color scale is called HSI (Hue, Saturation, Intensity) and is a variation on the more common RGB, HSB, etc.
The very good point about it is that it is a rare example of an
article where both color theory and practical implementations are
discussed in the same place. The HSI color scale can be easily re-implemented by following the code they provide in a related paper: Issues in visualizing large databases. DA Keim, HP Kriegel - Proc. Conf. on Visual Database Systems, VDB'95 (1995).
Color Scales for Image Data. H. Levkowitz, G. T. Herman, IEEE Computer Graphics and Applications (12):1 pp.72 - 80 (1992).
[ Some relevant psychophysics theory and its relevance in color scale design ]
This is a purely theoretical paper. I included it because it contains some
information that is difficult to find elsewhere. And also because I
find it especially intriguing. Here we learn that (1) not all
differences in color intensity are perceived by our eyes and (2) that a
linear increase in color intensity is not necessarily perceived
linearly. The concept of Just Noticeable Difference (JND) is
introduced and applied to color scale design. One practical consequence
is that it doesn't matter how well we map our data to color,
some differences will always be lost.
Choosing Effective Colours for Data Visualization. Healey, C. G., Proceedings IEEE Visualization '96, pp. 263-270 (1996).
[ Not an easy read, hard-core experimentation, but unique info on categorical colors ]
This is even more theoretical than the paper above. And be warned, it is not
an easy read! Anyway, I put it in the list because it is the only
"serious" reference I know where the selection of categorical colors,
that is, colors that represent categories and not quantities, is
discussed in fine detail and an algorithm for their selection is
presented. Here we learn that color is not as powerful as we may think. The
maximum number of distinguishable colors we can use to label data is
around 12. Not so many indeed!
Book Chapters
Information visualization: perception for design (Chapter 4: Color) by Colin Ware.
Colin Ware's book is simply the best resource for anything concerning
perception theory applied to visualization. This is
probably the best book on visualization ever. Chapter 4 is all about
color theory and its content is obviously great. Theory and practice
are well balanced and useful examples are illustrated throughout the
chapter. I think it only lacks practical advice on how to implement
the suggestions, but ok, maybe this would be out of the
scope of the book.
Envisioning Information (Chapter 5: Color and Information) by Edward Tufte.
I don't think this book needs any introduction. It is part of Tufte's
famous trilogy and of course it contains some guidance on color
use. Many of the things found here are also discussed in other
books and papers, though here in a useful summarized version, but it also contains
some unique content in Tufte's usual original style. A great piece
of knowledge is given right away as the chapter opens. Tufte
summarizes color uses in information design as: to label, to measure, to represent or imitate reality and to enliven or decorate. These few tasks provide a useful framework around the work of a visualization designer.
Show Me the Numbers (Chapter 6: Visual perception and quantitative communication) by Stephen Few.
This chapter written by Stephen Few is the best summary I have ever seen of
visual perception theory applied to visualization. Here you will find
not only how to use color effectively but also how to boil down basic
theory on how human vision works to a few simple rules to apply in visual
design. In a way it can be considered a sort of Colin Ware's book
compressed into one pill. So again, if you don't have enough time to read,
pick this one and study this chapter. You won't regret your choice.
Tips and rules of thumb
Finally, let me add something of my own. This is just a random list of rules I learned the hard way, by doing.
- Don't overestimate the power of color
- Color is attractive and powerful and let's admit it, it is what makes
most of our visualizations pretty and nice to see. But for any serious
use it is important to realize how limited it is. The number of colors
we can easily distinguish is incredibly low (you can learn this from
the refs above). For instance, it is estimated that the maximum number
of categorical colors we can easily detect in a representation is
around 12. Similar figures hold when presenting continuous data.
Compared to other visual features like position, length, and size, color is
perceived less efficiently. So just don't believe color
mapping will do wonders, it is useful within its bounds.
- Always provide a color legend
- I think this one goes in the list of the most common mistakes in
visualization: some data feature is represented with color but then
there's nothing in the interface that tells you what this color
represents. A color legend is always needed, and not only for labeling. As
an example, when color represents quantitative data the legend must also tell us
which numbers the brightest and darkest colors map to. So in short,
please do your homework: provide a legend.
- Use color with extreme care and parsimony (above all do no harm!)
- This is a sort of repetition of the first point but from a different
angle. As color is added to an interface it soon becomes noise. Learn
to use it with extreme care and parsimony. It is important for instance
to realize that if color is used to represent a data feature it is
extremely hard to use it for some other elements in the interface.
In the end it is extremely important what Tufte says: "above all do no harm".
- Learn to love grays and gray scales (grids!)
- The best use one can find of color is to understand how powerful
colorless graphics are. In particular, shades of gray are so useful in
data representation that I am surprised there are so few, if any,
specialists advocating their use (Tufte mentions it, by the way). Take a look around, pick the best
known and best crafted tools and you'll see that most of the time
their design is based on shades of gray. Gray is especially useful in
segmenting the visualization space and organizing it into areas. The most
obvious example is the use of grids in charts and alternating rows in
tables (Stephen Few shows excellent examples in Show Me the Numbers) but
the same principle applies to thousands of other visualization components. So
in short: learn to love gray and gray scales, they can do wonders and rarely do harm.
- Don't represent unordered data with ordered colors
- This is self-explanatory but I see it so often that I think it's
worth adding. Also, I think not everybody would agree with me on
this. Some people use different intensities of the same "hue" to
represent categories. In my opinion this is a poor use of color and opens
the door to false interpretations. Ordered colors are automatically
coded as "there's some order here" by our brain. Why would we want to
fool our mind when there are better solutions? Use distinguishable hues
and, if possible, make them of the same intensity. This will work best.
- Keep an eye on skewed distributions - Personally I always find this problem in my data visualizations and I am surprised it is not discussed more. When the dimension you map to color has a skewed distribution the result is incredibly poor: a few items are represented by the highest intensity and all the others are flattened to the lowest. In short, there's nothing really useful to see apart from the fact that there are two or three items with very high values. In this case one option is to adopt a nonlinear mapping between data feature and color. Common solutions are logarithmic or square root functions, which alleviate the problem and make it possible to reproduce a full progression of values.
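To make the skewed-distribution tip concrete, here is a minimal sketch of the idea in Python. The function name `normalize` and the sample values are made up for illustration, and the result is just an intensity between 0 and 1 that you would then feed into whatever color scale you use. It is not taken from any particular toolkit.

```python
import math

def normalize(values, transform=None):
    """Map values to color intensities in [0, 1].

    `transform` is an optional function (e.g. math.log1p or math.sqrt)
    applied to every value before the linear rescaling.
    """
    f = transform if transform is not None else (lambda v: v)
    transformed = [f(v) for v in values]
    lo, hi = min(transformed), max(transformed)
    if hi == lo:
        # All values identical: nothing to distinguish.
        return [0.0 for _ in transformed]
    return [(t - lo) / (hi - lo) for t in transformed]

# Hypothetical skewed data: one huge value, all the others tiny.
values = [1, 2, 3, 5, 8, 1000]

linear = normalize(values)                   # small values collapse near 0
log_scaled = normalize(values, math.log1p)   # log(1 + v), safe for v = 0
sqrt_scaled = normalize(values, math.sqrt)   # milder correction than log

for v, a, b, c in zip(values, linear, log_scaled, sqrt_scaled):
    print(f"{v:>5}  linear={a:.3f}  log={b:.3f}  sqrt={c:.3f}")
```

With the linear mapping everything except the outlier ends up below 1% of the intensity range, while the logarithmic version spreads the small values over a visible part of the scale. Whatever transform you pick, remember that the legend has to reflect it too.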
That was my list and ... oh, before I forget, there is one last major one!
- Don't use the (infamous) rainbow color scale - Maybe someone would laugh at this advice as something too obvious, but then, thanks to Ilya, I discovered that there is nothing to laugh about. If you are not convinced, see this study on the uses of the rainbow color scale and discover how many professionals and researchers still believe it has some value: Rainbow Color Map (Still) Considered Harmful
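Going back to the tip on unordered data (distinguishable hues, same intensity), a rough way to try it out is sketched below using Python's standard `colorsys` module. The function name `categorical_colors` and the default lightness and saturation are my own arbitrary choices. Note the big assumption: HLS lightness is only a crude stand-in for perceived intensity (a yellow and a blue with the same HLS lightness do not look equally bright), so a perceptually based color space or a hand-tuned palette like the ones in Color Brewer will do a better job.

```python
import colorsys

def categorical_colors(n, lightness=0.5, saturation=0.65):
    """Return n hex colors with evenly spaced hues and a shared HLS lightness.

    Assumption: equal HLS lightness is used as a rough proxy for
    "same intensity"; it is not perceptually uniform.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    colors = []
    for i in range(n):
        hue = i / n  # spread the hues evenly around the color wheel
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

# Six category colors; keep n small, since only around a dozen
# categorical colors can be told apart reliably.
print(categorical_colors(6))
```

Because only the hue changes, no category looks "bigger" or "smaller" than another, which is exactly what you want for unordered data.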
Conclusion
If you want to design great visualizations, learning to use color properly and effectively cannot be avoided. The whole system is only as strong as its weakest link, so if color is used badly your design will suffer a lot. Take your time, read as many of these references as you can and you won't regret it. They come from top class researchers and designers, and you can trust their words. Your visualizations will improve, your clients will thank you, and the visual world will definitely and finally be less polluted.
A few weeks ago I published a post titled "Book for practitioners, not designer!" claiming that we are desperately in need of books that teach how to perform interactive visual data analysis. And this book looks like just the perfect response to what I expressed in my post. From the book description:
"Now You See It does for data analysis what Stephen Few's book Show Me the Numbers does for data presentation: it teaches simple, fundamental, practical techniques that anyone can use--only this time they're for making sense of information, not presenting it."
Wow, this is what I was looking for! A book that teaches how to analyze data with visual interactive tools. And one written for casual users who need to analyze some data in their work, not for highly skilled statisticians and engineers. And not for designers. I'm really looking forward to reading it. I hope (I'm sure though) it will meet my expectations. Well done Stephen! P.S. It has such a beautiful cover, doesn't it?
A bit of self promotion here on Visuale today!
It's my pleasure to introduce to you one recent work of ours: The Extended Excentric Labeling. It is an extension to the original interactive labeling technique called Excentric Labeling, which was developed by Jean Daniel Fekete and Catherine Plaisant in 1998 at the University of Maryland. We extended it to solve some problems present in the original version and it will be presented and published at the next EuroVis 2009 conference in Berlin, next June.
Here is the EuroVis'09 Paper (pdf) and a Video (mp4) we produced to showcase the technique.
Here is the paper abstract:
"The paper presents an extension to the Excentric Labeling, a labeling technique to dynamically show labels around a movable lens. Each labels refers to one object within the lens and is connected to it through a line. The original implementation has several known limitations and potential improvements that we address in this work, like: high density areas, uneven density distributions, and summary statistics. We describe the implemented extensions and present a think-aloud user study. The study shows that users can naturally understand and easily operate the majority of the implemented function but label scrolling, which requires additional research. From the study we also gained unanticipated requirements and interesting directions for further research."
The Motivation
The main motivation behind this technique and the related study was a practical problem we encountered in the development of a new visualization. EL was just great and fit our need of understanding the content of screen regions. But the real problem was to have something flexible enough to deal with very sparse and very dense areas at the same time. In dense areas the original EL provided only a simple sampling mechanism that did not really help to interpret the data. Then, while implementing a solution, we discovered we could add some other useful and interesting features.
The Techniques
With the EEL we introduced the following features:
- Label scrolling: a mechanism to scroll through labels when there are too many of them.
- Focus area adjustment: a mechanism that lets the size of the focus area adapt automatically to the underlying data density.
- Summary statistics and filtering: a series of glyphs and interactive tools to summarize the content under the focus area and to disambiguate it through filtering.
- Inheritance of visual features: a visual mapping mechanism to let the labels inherit the visual/data features from their connected items.
- Layout and sorting: algorithms and techniques to permit effective positioning of labels and of their links to the items.
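To give a feeling for what the focus area adjustment in the list above could look like, here is a minimal Python sketch. It is not the algorithm from the paper: it simply assumes a circular lens that shrinks step by step until it covers no more items than can be labeled. The names `items_in_lens` and `adapt_radius`, and all parameter values, are hypothetical.

```python
import math


def items_in_lens(points, center, radius):
    """Return the points that fall inside a circular lens."""
    cx, cy = center
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]


def adapt_radius(points, center, max_labels=10,
                 r_max=100.0, r_min=5.0, shrink=0.9):
    """Shrink the lens until it holds at most max_labels items.

    Assumption for this sketch: the lens starts at r_max and is reduced
    by a constant factor, never going below r_min.
    """
    radius = r_max
    while (radius > r_min
           and len(items_in_lens(points, center, radius)) > max_labels):
        radius = max(r_min, radius * shrink)
    return radius


# Invented data: a sparse region keeps the full lens, a dense one shrinks it.
sparse = [(10, 0), (40, 0), (0, 70)]
dense = [(2 * i, 0) for i in range(50)]

print(adapt_radius(sparse, (0, 0)))
print(adapt_radius(dense, (0, 0)))
```

In sparse areas the lens stays large, so a single hover covers many items; in dense areas it contracts so that the number of labels stays manageable.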
The User Study
The user study we conducted is in my opinion a very interesting part of this work. We learned a lot from it. For years discount usability studies have been promoted by people like Jakob Nielsen, especially in industry, but they are not much loved or popular in research contexts, especially in InfoVis, where evaluation in general still struggles a bit to find its way. But if used as exploratory tools, we discovered, they can be great!
Sure, we received some criticism, as our results cannot give a final word on whether the introduced features provide a measurable benefit. But what is often overlooked in academia is that the role of research is not only to provide answers but also to create new targeted and relevant questions. And discount or informal evaluation methods can be a great complement to our research toolbox.
We gathered 8 people and observed them while they performed some predefined tasks we deemed relevant. The result was useful not only to perfect our work but also, and foremost, to generate some questions that we would not have asked ourselves otherwise. Observing your users using your visualization is always an eye opener. There is a big gap between what you expect and what they do in reality. And within this gap there is a lot to learn!
Excentric or Eccentric?
Every spell checker I use flags the word "excentric" as nonexistent and suggests "eccentric" instead. I did a little research and even well known dictionaries like the Merriam-Webster do not know it. By typing "excentric definition" in Google I can get some results, but they basically say it is a synonym of eccentric. Maybe someone among you has a clue about it?
By the way, maybe the next time I meet Jean Daniel or Catherine I will ask why they used excentric and not eccentric. My guess is that the French word for it is "excentrique" ;-). When in doubt, and for consistency reasons, we decided to keep it as it is.
The New Scientist has just published an article on an amazing recent study conducted by the European Commission's Joint Research Centre on the remoteness of places in the world. The remoteness is calculated taking into account how long it takes to travel by land or water to the nearest place with a population of at least 50,000 inhabitants. Here is the result.
It turns out that Tibet is the most inaccessible place in the world, specifically the point at coordinates 34.7°N, 85.7°E. It takes a three-week trip to reach the cities of Lhasa or Korla: one day by car and the remaining 20 on foot. So if you are looking for a really peaceful place to restore your mind and take a break from civilization, well, here you have it.
There are also a number of other related and fascinating maps out of this study. Here is how the world is covered by:
Roads
Railways



The map is simple but well designed. If you take a look at the bigger version you'll notice how color is mapped on a nonlinear scale. Brighter colors represent hour intervals whereas darker ones represent day intervals. The darkest points represent 5 days.
Europe and Japan somewhat scare me in terms of how easy it is to go from one point to another. The United States, despite its development, contains a relatively large number of quiet places. Tibet and Greenland look like the best places if you want to stay remote.
As Alan Belward, who leads the project, says: the interesting part of the project will be in comparing this same map with another one computed in the future. What scares me is the prospect of having no more remote places in the world. Is this far off?
I've recently come across this incredibly good book: "Data Mining for Business Intelligence". I was at first a bit skeptical, my academic background naturally led me to wrongly assume a book on applied business intelligence had nothing more to give than the two other respected books I have on the shelf. Wrong wrong wrong!
As I started reading, chapter after chapter, I felt refreshed by a new stream of ideas, as if all those notions I had accumulated year after year could be seen from a new and fruitful perspective. The book is full of applied examples, compact, written in direct and simple language and, above all, it made me finally understand what data mining is and what it is for in the real world. It is the first time I feel I can walk in the shoes of those guys in the trenches who desperately need strong technology to resolve *their* problems.
So, why do I blog this?
Am I not supposed to write about infovis and beyond here? Sure, the point is that after the excitement of having discovered this little gem I realize how far we are in visualization from having a book like this. Don't get me wrong, we have plenty of incredibly good books on visualization out there (Few's and Ware's among my favourites) but they are all designed to help designers, not practitioners. Why?
I don't have a clear answer but I think it is partly due to the lack of maturity of our field. I think we have now a reasonable set of notions, design processes, and examples to design decent tools but we have not matured enough to direct our attention to our users as practitioners. Another reason is the lack of synthesis. Take data mining, despite the numerous application areas, techniques, algorithms, it is possible to synthesize data mining methods and approaches in a consistent and simple way. Simplifying to the extreme, data mining is the work of predicting the value of one target variable (usually a class) according to the value of some other variables. Plus clustering and rules. Can we reach such a synthesis in information visualization? Is ours an intrinsic limit or is it just due to the way we advance our discipline?
I don't know. In any case I suggest reading this very nice little book. It can be really useful for applying some automatic data analysis to your visualization.
One last thought: oooh if only half of the books out there could be so short and informative at the same time!
It's been almost by chance that I stumbled upon the old Parallel Coordinates InfoVis'97 paper "Multidimensional Detective" after my last post. And it's crazy how some information seems to reach you when you start following a new line of thought.
This post and my last one are very much in line with the idea that we do need to invest a lot more in the practice of visualization and not only in design.
Parallel Coordinates were invented by Alfred Inselberg, who is also the author of this paper. What is really impressive in retrospect, after more than ten years, is the need to communicate to people not only what Parallel Coordinates are but also, and probably foremost, how they can be used.
In this paper Inselberg provides some great guidelines to use when performing visual data analysis. Here I provide some personal comments about each guideline.
Here is Alfred's list:
Guideline 1 - Do not let the picture intimidate you. This seems to be the trademark of infovis. And indeed it perfectly pairs up with the first part of the famous InfoVis Mantra "overview first". If we accept the idea that infovis is effective when we present an overview first (IMHO this is somewhat questionable but well ... this is not the right place to discuss it), it's evident that we have to teach our analysts not to be intimidated by it. I've seen and designed countless visualizations that are intimidating at first. And often I receive comments like: "can you make it simpler?" Sure, but maybe less informative too? If only we could give users the chance to try first and see what the result is!
Guideline 2 - Understand the objectives and use them to obtain visual cues. Oh this is my favorite one! Too often I hear that infovis is for generic data exploration, as if one could approach data without any idea of the ultimate goal. In reality there's no such thing as looking at the data for the sake of it. Inselberg acknowledges it and adds a worthy piece of advice: state clearly what you want to obtain from the visualization and let the goal guide your analysis. Our perceptual system is designed to "tune up" our senses when we focus our attention on something (see Colin Ware's latest book for details). So, let's exploit this feature and focus on what we care about.
Guideline 3 - Carefully scrutinize the picture. As a consequence of the previous step, it is necessary to look carefully at the picture and find visual cues that can help us move at least a few small steps towards the stated goal. Visual patterns that tell us something are always there, it's only a matter of looking for them carefully. They usually trigger new questions and hypotheses and help us formulate new actions. We look into other segments of the data or manipulate the visualization in a way that helps us find clarifications. That is, a knowledge building loop.
Guideline 4 - Test the assumptions and the "I'm really sure of". Inselberg reminds us that what we get out of a visualization is not only what we see but also what we already have in our mind, that is, our subjective world, which comprises background knowledge, assumptions, beliefs, etc. An effective analyst must take this into account and understand how it can affect the analysis. Often the visualization presents patterns that generate some skepticism, but it's exactly from these strange data segments that important discoveries stem. On a side note, it's surprising to see how much effort we have put into the analysis of objective perceptual processes in visualization and how little into higher level cognitive processes that involve subjective evaluation.
Guideline 5 - You can't be unlucky all the time. If you are not intimidated, you understand the objective, you scrutinize the picture and test the assumptions, well ... you can't be unlucky all the time! If I understand well, what Inselberg seems to tell us here is that even if a good proportion of the patterns extracted from the visualization result in little advancement towards the goal or in discoveries that are not useful, you can't be unlucky all the time. In the end there should be something that helps you progress in your desired direction. This last one is a very positive and encouraging piece of advice, which I appreciate because visualization tools often lead to the discovery of trivial or useless information, but then by striving to find something, little gems often come out of it.
In summary this quite old paper reminds me how hard it is to be an analytical visual detective and also how far we are from helping our customers focus on fruitful paths. It would be nice to see novel contributions in this direction, because this is in my opinion one of the greatest limits to infovis adoption. In the meantime this short list of advice, as simple as it seems, looks to me like the most solid I've ever seen.
Data
The dataset (which can be downloaded directly from the website) contains, for each country and year from 1995 to 2009, the overall score and the individual factors (e.g., business freedom, trade freedom, fiscal freedom, etc.) that compose the score. Technically speaking it is, in fact, a multivariate time series, quite a tough object to handle.
Chart
In my proposed solution I focus on the representation of the countries that experienced the highest positive or negative changes in the whole time range. Beyond the obvious reading of best and worst countries in the overall score, which can be easily obtained from the website, I think representing measures of change is a lot more interesting. I've created the chart with MicroCharts, a wonderful little Excel add-on. Each sparkline represents the time variation of the overall score, so that it is possible to see ups and downs in the considered time span. Since the variation is represented in terms of the individual maximum and minimum values, the timelines cannot be compared in terms of their absolute values. But this is ok as long as the main goal is to convey messages like: "hey, this country has significantly and steadily improved its index over the course of the years!". The absolute values can be read on the right side, where min and max are color-coded the same way the small dots are coded in the sparkline. The size of the dot represents the value and the bar chart the amount of variation.
Trends
I am by no means satisfied with my design, but I think it sheds some interesting light on the data. We can see that Armenia had an impressive improvement from 42.2 to 70.6. We can also see that many Eastern European countries like Moldova, Bosnia and Herzegovina, Lithuania, and Romania had a great improvement as well, as highlighted in the report. Sad examples are Argentina, which experienced a sudden decrease, probably concomitant with the country's economic breakdown, and Zimbabwe, which went from the already low 48.7 to 22.7.
A call to action!
The real challenge for these data is to represent the single factors together with the overall score and to represent the whole dataset, which I've not done. These factors can help explain whether any major variation is due to a specific sector or to an overall change. I'm also convinced the same data can be seen through a myriad of other lenses different from mine. It is for this reason that I propose a "call to action", inviting you to create a chart of this intriguing dataset. In order to facilitate your task I have attached here a processed version of the file that contains the overall score organized by time in a single Excel sheet (the original data has one sheet for each year). If you go into some preprocessing too, pay attention to some data inconsistencies the original file may have. In particular, note that Somalia is missing from the dataset in some years. Good Luck!
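If you prefer scripting to Excel for the preprocessing, here is a minimal Python sketch of the two computations behind my chart: ranking countries by change, and scaling each series to its own minimum and maximum the way the sparklines do. It assumes that "change" means last available score minus first available score, and that missing years are stored as `None`. The Armenia and Zimbabwe endpoints are the ones quoted above; "Example country" and its values are invented to show the handling of gaps.

```python
def minmax_scale(series):
    """Scale one series to [0, 1] using its own min and max.

    Missing years (None) stay None. Because each series gets its own
    scale, shapes are comparable across countries but levels are not.
    """
    present = [v for v in series if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [
        None if v is None else (0.5 if span == 0 else (v - lo) / span)
        for v in series
    ]


def rank_by_change(scores):
    """Sort countries by (last available score - first available score)."""
    changes = {}
    for country, series in scores.items():
        present = [v for v in series if v is not None]
        if len(present) >= 2:
            changes[country] = present[-1] - present[0]
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)


scores = {
    "Armenia": [42.2, 70.6],                # first and last score from the post
    "Zimbabwe": [48.7, 22.7],               # first and last score from the post
    "Example country": [50.0, None, 52.0],  # invented, with a missing year
}

for country, change in rank_by_change(scores):
    print(f"{country}: {change:+.1f}")
```

The biggest improvers end up at the top of the ranking and the biggest decliners at the bottom, which is the selection I used for the chart.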
The Numerati are all the statisticians, computer scientists and analysts around the world who are analyzing tons of data to understand "us". This is the main topic of this wonderful book written by Stephen Baker, a Business Week journalist.
The book is an easy read, written with a simple style that makes it accessible to everybody, and yet incredibly intriguing and informative for the knowledgeable reader.
Stephen interviewed dozens of researchers and entrepreneurs around the US and put into focus one of the major trends of our days: not only has an incredible amount of data been collected, and is still being collected, every day around the world, but we are also finally starting to "use" these data to understand relevant aspects of human beings. Health, Finance, Marketing, Policy are only a few examples of areas where data is collected and deeply analyzed every day.
Content
The book is organized around 7 chapters: Worker, Shopper, Voter, Blogger, Terrorist, Patient, Lover, in each of which people are modeled under the lens of a specific stereotype.
In Worker we are modeled according to our skills and the way we work. We meet people like Samer Takriti at IBM, who is modeling about 300,000 IBM workers to understand the relationship between their skills and their performance, and how to better allocate these skills in the company the same way we used to do with any other physical company asset.
In Shopper we are modeled according to the things we buy. Researchers are analyzing the millions of transactions we make every day in stores to understand what "type" of buyers we are. Rayid Ghani, for instance, analyzes grocery store transactions with his group at Accenture Technology Labs to provide personalized suggestions to shoppers through the use of carts equipped with personal assistants.
In Voter we are modeled according to ... to what? This is an impressive chapter because it demonstrates that we can be modeled in a given domain indirectly, using data that apparently has no connection with the subject matter. This is what Josh Gotbaum does with his political firm Spotlight Analysis. They provide detailed indications on swing voters based on data taken from large data companies like ChoicePoint and Acxiom, which collect an incredible amount of data about us on almost every aspect of our life (scary?! :-)).
In Blogger we are modeled according to our opinion. Yes, our opinion. There are companies like Umbria Communications that analyze the blogosphere to understand the opinion trends of millions of bloggers on whatever interests a given company. If I want to track how people react to a new product put on the market, Umbria can tell.
In Terrorist we are modeled as potential terrorists or thieves. Here we meet people like Jeff Jonas, now at IBM, who helped casinos in Las Vegas sift through millions of internal records to single out suspect customers. And the same technology is used by In-Q-Tel, the venture capital arm of the CIA which invested in this technology, to cope with national security and counter terrorism.
In Patient we are modeled according to our body signals and medical records. This is the chapter I loved most, not only for its humanitarian applications, but also for the cleverness of some solutions. Eric Dishman launched the home health division at Intel, where they design smart sensors like the "magic carpet", which tracks weight and movements to monitor the health of patients, and where they try to predict the onset of diseases like Parkinson's and Alzheimer's by detecting suspect variations in the stream of data.
Finally, in Lover we are modeled according to our profile to find matches among us as potential lovers. We meet Helen Fisher, a Rutgers University anthropologist, who devised an innovative method to find matches between people, which is the basis of the Chemistry.com dating website. Her method goes well beyond simple matching of demographic data: it is based on her theory that we can be split into four groups, in each of which a specific hormone is predominant, and that the best matches come from complementary hormones.
Reflections
The first issue the book raises is obviously privacy. I really liked the approach of Stephen Baker, equally distant from the excitement for the new opportunities brought by innovation and from the fear of a super-controlled society where drawing a full profile of ourselves is becoming worryingly easy. Every other technological shift in history, however, came with the promise of new advances for humankind together with novel problems (think about cars and pollution). Stephen asks the right questions of some of the researchers he met. The most interesting exchange in terms of privacy is the one with Jeff Jonas, who is "vehemently opposed to the use of statistical data mining to predict the next terrorist attack" because of the high risk of intrusion and false alarms. And yet he believes that this technology can protect both our freedom and our privacy at the same time. I think this is one of the biggest challenges of our time: to find the right balance between the opportunities for increased freedom and security and the risks of intrusion, control, and faulty conclusions in the analysis of our own data.
From a more scientific and technological point of view, what strikes me is the relevance prediction has in all the application areas described in the book. In traditional data analysis, especially for those with a visualization background, the focus is on "understanding" what is in the data to build a mental model out of it and on "discovering" some special gems out of chaos. However, real world applications are more concerned with elaborating actionable solutions to run and test, and I have the impression that "prediction" lends itself better to this goal. Think about it through the book's examples: in Worker the company wants to predict performance to put people in the right place, in Shopper the grocery store wants to predict what product can be sold to one specific customer to provide timely suggestions, in Voter a political party wants to predict which population segment should be addressed with a targeted message to increase the chances of hitting a group of swing voters, and so on. How do we, visual information designers and analysts, cope with this fact? Are we able to provide with our tools the same level of actionable knowledge or are we condemned to just describe things and hope that this information will be useful in some way?
Implications for Visualization
What is the role of visualization in the world of the Numerati? I think it is huge!
First of all, the technologies used by the Numerati are all to some extent prone to errors, and they are always the result of continued refinement of the underlying model. Visualization can play a significant role in helping the modelers understand and test their models and explore their implications as they are applied to new data. Without such a level of interaction the risk is to build monstrous black boxes that spit out oracles we all have to follow without really knowing why.
Another area where I see a large role for visualization is when mining is used in monitoring environments, where the timely detection and comprehension of the situation (more technically known as the situational awareness problem) is important. We have a long and respected tradition of research on what works best in terms of visual representation when visual saliency, detection and contextual information are at stake. Well designed visualizations that make it possible to get the most out of a screen in a matter of seconds are of paramount importance here, from the need to analyze terrorist attacks to the doctor monitoring a patient.
A third potential I see for visualization is the need for personal data visualization. As these technologies develop, and the results of data analysis become more pervasive, I expect to see an increase in the need for end users to manage personal data and the results of these analyses. And how are we going to provide this information to the average person? Visualization can play a big role here, and again it would need to reinvent itself a bit. In this domain extremely simple and useful visualizations will be needed and some of them will be provided on non-standard devices like TVs, cell phones, and public displays. We need flexible and simple solutions to provide to the general public.
So, in summary, the explosion of data analysis is good news for us! We have plenty of novel challenges to address. A somewhat silent mind shift is already underway ... I expect to see in the future an ever tighter integration of automatic mining technologies and visualization, as the recent Visual Analytics trend demonstrates after all.
DabbleDB
DabbleDB lets you organize your data in multiple tables (called categories) and extract views out of it. A view can be a table, an aggregated table or a chart. There are only a few basic types of charts but they do their job well: neat and clean and super easy to create.
I know, it doesn't look that great so far but when you use it you notice hundreds of little tricks and smart software guesses that make it wonderful. As an example, it automatically recognizes the column type (date, number, location, etc.) and changes the way data can be aggregated according to the column content. In a date column you can aggregate by day, month, year, etc. In a location column by exact value, region, country, and continent. And aggregation is a matter of a few mouse clicks: no hassle, no formulas, no complex "group by" queries. Click on the column and just choose among a series of grouping options.
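To make the idea of type-aware aggregation concrete, here is a minimal Python sketch. It says nothing about how DabbleDB works internally: it just assumes that a column counts as a number or a date when every cell parses as one, and that dates are written in ISO format (YYYY-MM-DD). The function names, column names, and sample rows are all invented.

```python
from collections import defaultdict
from datetime import date


def infer_type(values):
    """Guess whether a column of strings holds numbers, dates, or text."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "number"
    if all_parse(date.fromisoformat):
        return "date"
    return "text"


# Grouping keys offered for a date column, one per granularity.
DATE_KEYS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "year": lambda d: str(d.year),
}


def aggregate_by_date(rows, date_column, value_column, granularity):
    """Sum value_column over rows grouped by day, month, or year."""
    totals = defaultdict(float)
    for row in rows:
        key = DATE_KEYS[granularity](date.fromisoformat(row[date_column]))
        totals[key] += float(row[value_column])
    return dict(totals)


rows = [
    {"when": "2008-01-05", "amount": "10"},
    {"when": "2008-01-20", "amount": "5"},
    {"when": "2008-02-11", "amount": "7"},
]

print(infer_type([r["when"] for r in rows]))
print(aggregate_by_date(rows, "when", "amount", "month"))
```

Once the column type is known, offering the right grouping options is just a lookup, which is probably why the feature feels so effortless from the user's side.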
Another great feature is how data is imported into the system: open it in Excel, copy the table content, paste it in a form field, press the submit button and ... puff ... imported. And with very few mistakes as far as I have seen. Again, nothing really revolutionary but it saves lots of time and stress! It can also import data directly from HTML tables: just paste the URL of the page and it imports the data.
Filtering is also very powerful. On the left side you can type a keyword and the view is automatically updated. If you want to do something more sophisticated, you can add a column specific filter and, according to the column type, enable special purpose filtering. In a date field you can type for instance "> 1980" and filter out all the records with a year earlier than 1980.
There are hundreds of other features I did not mention, the best is to try it out and see. It took me very few minutes to register and start to play with it. Alternatively, you can start from this video demonstration, which is very well done.
Magic/Replace
Magic/Replace is a companion product that lets you reformat your data on the fly with cut & paste operations. You cut a piece of data from one field, put it in another field, plus some additional characters, and it updates the full table on the fly. You need to reformat a date? Concatenate Name and Surname in one single field? Create a new column with a piece of data coming from another field? Super easy: just cut and paste the elements and it does the rest for you.
It's a little piece of simple "intelligence" that works really great and it's very useful. The same thing done in Excel or any other product would be a pain. Especially if you are not an expert.
The interface is also very well done. One line tagged "from" and one tagged "to", where you can test your modifications, and a preview button that lets you preview the result in a section of the table. It takes a few seconds to understand how it works.
A Few Reflections
- Data Management & InfoVis - It's really surprising to see how data management has been neglected in InfoVis, and yet it is so central in our daily activity, both as designers and as users. I strongly believe the lack of simple data management tools in software products is one of the major barriers to the adoption of InfoVis. Tableau is another great example: data is loaded, managed, and manipulated in a few intuitive drag and drop operations. It might be frustrating to accept it but a major quality factor of the systems we design does not reside in the fancy visual representations we make but rather in little details like these that make a difference.
- Data Management & Web 2.0 - These two tools are another example of the power of Web 2.0. I have already argued that we can probably already speak of Vis2.0, a series of visualization tools that work smoothly on the web. Web 2.0 data management tools are another piece added to the puzzle. And it's amazing because, as with any other 2.0 tool, it is not only a matter of reproducing the same desktop tool on the web, but also of enabling a whole new spectrum of possibilities. DabbleDB, for instance, lets people publish and update the DB on the web in a matter of seconds, thus enabling collaboration in a way that would be impossible with Excel or any other desktop product.
- Simple Intelligence in Software - All the simple tricks and smart guesses found in these two products make me think of the idea of embedding simple pieces of intelligence in software products. It's crazy if you think about it: we have been researching super complex AI algorithms for years, to do things of little use to anybody, and now there are some little intelligent and focused tricks that work really well for some people. It also reminds me of the concept of "Appropriate Intelligence" advocated by Alan Dix: "Designing interactions for appropriate intelligence is based on two principles: 1) it should do good things when it works, 2) it shouldn't do bad things when it doesn't". It is perfectly applied here.
Click on this link to launch the applet (on a new window).
Visualization Design
The visualization is a simple interactive matrix where the rows represent the states and the columns the years. Since the focus of the visualization is to see which states swing, graphical marks are added only when they in fact swing, and the color is that of the winning party (I originally used a slightly different design with a color gradient from the previous to the current winner, but the result was too noisy). At the top of the visualization an additional row depicts which party finally won the election. In this way it is possible to see which states were decisive for the final result. The visualization has a few interactive features. Hovering is used to focus on a specific row-column pair and to dynamically show which president was elected in a given year. At the bottom there are a few filtering tools:
- From/To: to focus on specific swings from one party to another.
- #Swings: to filter out the states for which the total number of swings is below a threshold.
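For those who want to reproduce the matrix with their own tools, here is a minimal Python sketch of the underlying logic: detecting swings from the per-state winners and applying the two filters described above. It is not the applet's code. The data layout (a dictionary from state to a list of year and party pairs), the function names, and the toy data are all assumptions made for the example.

```python
def find_swings(winners):
    """Return, for each state, the elections where the winning party changed.

    winners maps a state to a list of (year, party) pairs sorted by year.
    Each swing is reported as (year, from_party, to_party).
    """
    swings = {}
    for state, results in winners.items():
        swings[state] = [
            (year, previous_party, party)
            for (_, previous_party), (year, party) in zip(results, results[1:])
            if party != previous_party
        ]
    return swings


def filter_swings(swings, from_party=None, to_party=None, min_swings=0):
    """Apply the From/To and #Swings filters to the detected swings."""
    filtered = {}
    for state, events in swings.items():
        # #Swings: drop states whose total number of swings is too low.
        if len(events) < min_swings:
            continue
        # From/To: keep only swings between the selected parties.
        kept = [
            event for event in events
            if (from_party is None or event[1] == from_party)
            and (to_party is None or event[2] == to_party)
        ]
        if kept:
            filtered[state] = kept
    return filtered


# Invented toy data, not real election results.
winners = {
    "State A": [(1948, "D"), (1952, "R"), (1956, "R"), (1960, "D")],
    "State B": [(1948, "R"), (1952, "R"), (1956, "R"), (1960, "D")],
}

swings = find_swings(winners)
print(filter_swings(swings, from_party="R", to_party="D"))
print(filter_swings(swings, min_swings=2))
```

Each remaining swing corresponds to one colored mark in the matrix, placed at the row of the state and the column of the year.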
Patterns
I must admit I did not spend much time analyzing the result (I hope you will do it for me! :-)). Anyway, a few things soon catch the eye:
- It is not evident which states swing or do not swing, but it is quite clear that they tend to swing all in the same direction. Every column in fact contains entries almost all of the same color.
- Some years have had very large scale swings: 1912, 1916, 1932, 1952, 1963, etc.
- Louisiana (LA) had an impressive number of consecutive swings between 1948 and 1980, changing from one party to another in every election.
Potential Improvements
There are obviously a very large number of potential improvements that might be included in the visualization, it is by no means a finished product, rather a toy. One interesting feature I have seen proposed that I did not include, is the ordering of the states (rows) to bring together the states that tends to behave similarly. In this way it would be possible to cluster them visually and gain some additional insights. So far, however, I didn't have enough time to implement it. Another filter could be added to isolate not only states with big sweeps but also years with big sweeps. Again, this is not yet implemented.Conclusion
I really hope you will critique this visualization and suggest potential improvements. At the same time it would be nice to know if you have found any additional interesting patterns. The additional data about presidents and winning party can be found here: presidents_mod.csv.

- FM1 - InfoVis is about data exploration: I have heard it millions of times since I started reading papers and books on visualization; it is a sort of mantra: "infovis is there to support people in data exploration". I myself have also described infovis in these terms tens of times in papers and reports. But is data exploration a real activity or goal? Nobody really wants to explore data for the sake of it (apart from us infovis geeks who derive pleasure from it). Data exploration says nothing about the goal of a user and the reason why he is willing to invest time in learning and using an infovis tool. Biologists don't want to explore data, they want to understand how genes react to certain interventions. Security analysts don't want to wander through millions of alarms, they want to spot intruders and react as fast and accurately as possible.
- FM2 - InfoVis is about discovery: This is another mantra of infovis, repeated millions of times. While it is true that infovis can help discover new facts, its true value comes not from discovery but from understanding. The main reason why an infovis tool is useful is that it helps make sense of data, and does so more efficiently. It lets us efficiently understand what the data has to tell. And its quality can (should) be measured in terms of how effectively and efficiently this process is supported. I remember John Stasko saying something like this in his presentation at BELIV'06: the main activity supported by infovis is to learn about a domain. This is what we mostly want to do with infovis and this is what should be supported.
- FM3 - InfoVis is about new visualization techniques: InfoVis already has hundreds of techniques available which we can draw from. The real challenges we are confronted with are: 1) understanding how to use and customize the techniques we have now to make them useful for specific problems and people; 2) understanding how to combine different techniques in composite tools able to integrate them and get the best out of their composition (as an example, why has nobody tried, as far as I know, to integrate all those n-dimensional visualizations we have out there?). That said, I am not saying that inventing new techniques does not have its role or that it is a waste of time. I just believe it's time to shift the focus a bit.
- FM4 - InfoVis is about vision: I already talked about this point some time ago in one of my posts, titled "the neglected role of interaction in information visualization". InfoVis is by no means only about visual things; it is also about the way we interact with a dynamic display that is able to react as we interact with it. It is this level of interaction that permits us to efficiently manage screen real estate and allows us to reason about a domain. The big challenge for an infovis designer is not only to map data items on the screen in clever ways but also to support, through careful interaction design, the very tasks the tool is designed for. We know quite well the perceptual issues and the design principles needed to design a visual mapping, but when we come to the point of designing interaction we are lost. The only support we have is to draw from simple ideas developed in other designs (hovering, link & brush, dynamic filtering, etc.).
- FM5 - InfoVis is about the data: We tend to see infovis as a way to support a one-directional channel: from data to our brain. But this view underestimates the role of the knowledge we put into the process. When a user interacts with a visualization, he brings his assumptions, background knowledge and skills, which play a large role in the interpretation of what is seen on the screen. This is the reason why two different people can very likely end up seeing different things in the same visualization. There is another hidden channel that goes in the opposite direction, from the human mind to the data, enriching it with the knowledge that is already in our heads. Notably, infovis tools fall terribly short when there is the need to manage this knowledge and let it play a role in the analysis.

"...there are some good examples from the VAST community where prior knowledge can be explicitly entered into the analysis process (e.g. i2's 'Analyst's Notebook' or IBM's Research's HARVEST project). My U of C colleague Torre Zuk has also done some analysis of how a physician's prior knowledge affects their decision making when presented with a visualization."

Thanks Chris for your references!
Background
I have used the data from the Organization for Economic Co-operation and Development (OECD), which is often referred to as one of the main trusted authorities on countries' education systems. More precisely, the data comes from the OECD report "Education at a Glance". At the origin of the protest is the reduction of the number of main teachers per class from 3 to 1, with a consequent reduction of public personnel. The government says that having fewer teachers will not influence the quality of the studies and that quite a lot of public money will be saved. The protesters believe that the opposite is true and that the savings should not come from these cuts. The goal of these charts is not to provide a solution to the debate; rather, they offer a very small and focused view of the problem. I just tried to find some hints on two related questions that came to my mind:
- How efficiently does the Italian system spend its money?
- Is the proportion of students to teachers the cause of poor performance?
How efficiently does the Italian system spend its money?
The first chart answers the first question, at least partially. The chart is a scatter plot of the OECD data on the efficiency of school systems, based on the following data:
- Scientific performance: measured by PISA (Programme for International Student Assessment), defined as "an international study conducted by the OECD which measures how well young adults, at age 15 and therefore approaching the end of compulsory schooling, are prepared to meet the challenges of today's knowledge societies." It is supposed to be a good indication of how well our schools do.
- Expenditure per student: defined as the equivalent US dollars spent per student.
I have drawn two lines to divide the space into 4 quadrants with respect to where Italy is placed. Of these quadrants I have highlighted the bottom right because it represents all countries that perform better in terms of the PISA index and spend less. In other words, the countries in this quadrant not only spend less but also use their money more efficiently because they produce better students.
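The highlighted quadrant is just a pair of comparisons against the reference country. A minimal sketch, assuming each country is stored as a (PISA score, expenditure per student) pair; the names and numbers below are invented for illustration and are not OECD values, and `better_and_cheaper` is a hypothetical helper:

```python
# Hypothetical figures for illustration only; these are NOT OECD values.
# Each entry: country -> (PISA score, expenditure per student in USD).
COUNTRIES = {
    "Ref": (475, 7500),  # the reference country (Italy in the charts)
    "A": (520, 6000),
    "B": (500, 9000),
    "C": (460, 5000),
    "D": (540, 7000),
}


def better_and_cheaper(data, reference):
    """Countries that score higher AND spend less than the reference."""
    ref_score, ref_spend = data[reference]
    return sorted(
        name
        for name, (score, spend) in data.items()
        if name != reference and score > ref_score and spend < ref_spend
    )


print(better_and_cheaper(COUNTRIES, "Ref"))  # prints ['A', 'D']
```

With real data, the same test applied to expenditure normalized by GDP would give the highlighted quadrant of a GDP-normalized chart.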
The sad truth is that my beloved country performs very badly. Greece and Portugal keep it company, but at least they spend less.
In order to be sure that these results are not affected by the economic level of the countries, I have also produced a second chart where the expenditure is normalized with respect to GDP (gross domestic product).
Unfortunately the result is even worse: Greece and Portugal perform worse, but almost all the other countries are better. From the chart we can also see (in the bottom right) that Finland performs exceptionally well and that New Zealand, the Netherlands and Australia perform very well too while spending less money.
Is the proportion of students to teachers the cause of poor performance?
Since at the center of the debate there is the question of whether more or fewer teachers affect the quality of an education system, I created bar charts comparing the ratio of students to teachers for the countries shown in the scatter plot. Here are two bar charts, one for primary school and one for secondary school. Again I have highlighted Italy in the charts to make the comparison easy.
As you can see, Italy has one of the lowest ratios in both primary and secondary school, meaning that there are relatively few students for each teacher or, in other words, that teachers are not very overloaded compared to other countries. The comparison with other countries is quite interesting. Finland, the Netherlands and New Zealand (Australia is missing in the data), which are very efficient, as we have seen in the scatter plots above, have considerably higher values than Italy. Can we say then that at the root of the poor Italian performance there is the number of teachers? Or can we say that a small number of students per teacher is necessary to produce a school of high quality? I don't know ... but at least the graphics instill some doubts.
Technical Notes
The charts have all been done with Excel. After all, it is always the best and most readily available tool. There is always a bit of a hassle in doing certain things, especially because the defaults are crazy (like strong dark backgrounds), but in the end it works great. I have used the XY Chart Labeller to reduce label overlaps on the bar charts. This is also a bit cranky but in the end it does its job well. The annotations on the charts have been done with the graphic tools in Excel and externally within SnagIt, which I use to screen-capture the charts. Yes, I've used screen capture! I know I could use VBScript stuff or similar things to save the charts as images, but it's always a kind of pain and less flexible than just pressing PrintScrn and editing the image.
Disclaimer
With these charts I don't pretend to demonstrate anything; it's more an interesting exercise for me to create data graphics and to show how easily we can reason about data that pertains to facts related to our social life. The charts might show an evident bias towards judging the government interventions appropriate, but this is not my intent. Rather, I would be very curious to see other charts that better clarify the issue and show, with data and graphics, arguments opposite to mine.
Final Reflection
In order to build these charts I invested very little time (I invested a lot more time in writing this post though!). In a few clicks I was able to clarify for myself some things about an issue which is quite hot these days in my home country and which I care about. The same thing might be done by millions of citizens if only they were instructed appropriately. And that would mean having a population of informed people, able to ground their protests in hard data and to communicate their arguments with the vividness of well-made data graphics. Unfortunately this is still a long way off. Simple techniques like these are never used by politicians or protesters; they prefer to use thousands and thousands of words in place of a few well-made charts. It's a pity for us and it's a pity for them.

Background
In the words of the author, Boris Müller:
"Poetry on the Road is an international literature festival which is held every year in Bremen, Germany. Since 2002 I am commissioned to design a visual theme for the festival. While the theme itself is changing, the underlying idea for the visuals is always the same: All graphics are generated by a computer program that turns texts into images. So every image is the direct representation of a specific text. The design and the development process are a collaboration with the design agency jung und pfeffer."
Description
This is the visualization designed for this year's edition:
It represents multiple texts at the same time and attempts to compare word frequency distributions among them. Each horizontal line is a single poem and each element on the line is a single word. The words are sorted by their frequency in the text and mapped to line width. Each line connects the same word from one poem to another; normally with tapered lines since the same word has different frequencies in different poems. Words which appear only once are represented with an "X".
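The mapping described above can be approximated in a few lines of Python. This is only my sketch of the idea, not the program used for the festival: the tokenization, the function names, and the reading of "appear only once" as "a count of 1 within the poem" are all assumptions.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase words in one poem (simple tokenization, an assumption)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def ranked_words(text):
    """Words of one poem, most frequent first; ties broken alphabetically.

    Each entry is (word, count, mark), where mark is "X" for words that
    occur only once in the poem and "" otherwise.
    """
    freq = word_frequencies(text)
    ordered = sorted(freq.items(), key=lambda item: (-item[1], item[0]))
    return [(word, n, "X" if n == 1 else "") for word, n in ordered]


def shared_word_links(poems):
    """Links between consecutive poems for every word they share.

    Each link is (index_of_upper_poem, word, count_above, count_below);
    the two counts would drive the width at each end of a tapered line.
    """
    freqs = [word_frequencies(text) for text in poems]
    links = []
    for i, (upper, lower) in enumerate(zip(freqs, freqs[1:])):
        for word in sorted(upper.keys() & lower.keys()):
            links.append((i, word, upper[word], lower[word]))
    return links
```

For the two toy "poems" `"the sea the sky"` and `"the sky is wide"`, `shared_word_links` returns one link for "sky" (1 to 1) and one for "the" (2 to 1), the latter being the kind of link that would be drawn as a tapered line.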
Critique
What I like
- Simplicity: compared to the Visual Poetry 2006, this mapping is a lot more comprehensible, very simple. A line is a poem, a dot is a word, and the same words in different poems are linked with a line. Since I am a big fan of simplicity, this is the first thing I would like to mention.
- Beauty: though probably related to simplicity, beauty is a great feature of this image. The chosen color is attractive, the sinuous curves are aesthetically pleasing, and I think there is the right balance between chaos and structure. Not too chaotic to be disturbing, not too structured to be dull.
- Patterns: I can clearly see some patterns! This is what I meant in my older post. Here I perceive simplicity and beauty and yet I can see some patterns. If you try, for instance, to follow the path of a single word from poem to poem, you can see how certain words "sinuously" become very frequent or infrequent. Another pattern: the second poem from the bottom contains quite a lot of "X" words which are not used in other poems. Some natural questions would be: "Why does it use so many unique words?", "What poem is this?"
- Hovering: recognition vs. recall is at stake here. I don't want to remember a word first and then search for it in the text; I'd rather discover which words expose some interesting patterns. Even a simple added feature like "hovering" over a line to see which word it is would be great to have.
- Filtering: a bit more advanced as a feature and not necessarily needed in such a piece of art, but imagine having one simple slider to isolate the words within a certain frequency range. That would be even more fun.
- Labels: this is a bit more serious. Why not add a few small labels to at least say which poem is which? It would need 5 labels! Only 5: one for each poem. Adding labels to words is obviously more problematic but again, why not show the word associated with a line when hovering with the mouse pointer?
Conclusion
The Visual Poetry 2008 is a beautiful piece of art. Simple to understand and still complex enough to attract. It's a pity that some simple basic features have not been added, but I can understand they are probably not needed given the context of the picture and the fact that it is primarily conceived to be printed on paper. Well done this time!

The role of learning
Not all interactive applications must be easy to use from the first time they are encountered. Learning has its role and should be taken into account. One of the most foolish beliefs I see in visualization is the idea that visualization users should be effective after a few minutes of use. No, no, no! Many visualization techniques are complex by nature and cannot be made any simpler. Learning how to use them cleverly can bring enormous advantages. After all, this is the way we learn millions of everyday activities (think about cars, cameras, musical instruments). The problem with interactive applications is not when they are difficult to use but rather when they are made more difficult than they should be. Some months ago I was discussing this topic with a bunch of people at the CHI 2008 conference and somebody (unfortunately I don't remember who it was) told me about having watched Alfred Inselberg, the inventor of Parallel Coordinates, give a demonstration of how to use them effectively, and being stunned at how rich and powerful the process was once learned; quite complex though. The best applications are those which allow users to perform some simple operations from the first time and little by little offer "hooks" to improve, perform better, and do more. Therefore an infovis designer should not focus on ease of use alone in a vacuum but rather, and foremost, on how to make the tool easy to learn.
The role of domain task complexity
Learnability, however, is not the only thing to take into account; it must be balanced with task complexity. Some visualizations should in fact be easy to use right away, especially in situations where only a limited amount of time can be invested in learning and, more importantly, when the task at hand is simple in nature. Again, as I said before, the complexity of the tool should mirror the complexity of the task. Simple tasks should be supported by simple software; complex tasks require more reasoning and learning. If we try to use fancy visualizations to help people do simple things, most of the time we fail. Well-designed simple bar charts, line graphs, and scatter plots are what users need most of the time, because in many cases tasks are simple. More complex visualizations and interaction schemes are justified when the complexity of the task is higher; one of the major goals of visualization is to make things simpler, not harder. This reminds me of Stephen Few's talk at InfoVis 2007 (which I talked about last year in this post), where he classified infovis users into:
- Information consumers and presenters (around 80%)
- Informal data analysts (around 19% and growing)
- Sophisticated data analysts (less than 1% and in need of growth)







