A Visualization of Version Histories

This is just a side project. Six of us have been working on an internal project on TFS (five devs and me. Sorry for all the destructive check-ins, guys. Don’t lose your faith in designers because of me). One night I extracted the version histories and had some fun with them.

The visual is inspired by Context Free. I first played with the tool in ’07 and was impressed by how powerful the combination of recursion and randomness can be. In this visualization each curve (representing a file or a folder) wanders freely in space and gradually fades out until it gets some attention. Each curve is colored by the last person who made changes. A curve branches when something derives from it (files added to a folder, branching versions, etc.). The tree does lose its structure and become less readable after some time. I’ll try an improved concept when I have the time.
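For the curious, here is a minimal Python sketch of the wander / fade / branch idea. It is not the code behind the visualization (that was done in R and HTML5 canvas); the function name `grow`, the step length, the fade rate and the branching probability are all made up for illustration.

```python
import math
import random

def grow(x, y, angle, alpha, rng, depth=0, max_depth=3, steps=40):
    """Return line segments (x0, y0, x1, y1, alpha) for one curve and its branches.

    All constants below (turn range, fade rate, branch chance) are
    illustrative assumptions, not values from the original project.
    """
    segments = []
    for _ in range(steps):
        if alpha <= 0.05:                    # curve has faded out completely
            break
        angle += rng.uniform(-0.3, 0.3)      # randomness: wander a little each step
        nx, ny = x + math.cos(angle), y + math.sin(angle)
        segments.append((x, y, nx, ny, alpha))
        x, y = nx, ny
        alpha *= 0.95                        # fade gradually while nothing happens
        if depth < max_depth and rng.random() < 0.05:
            # recursion: something derives from this curve, so a fresh,
            # fully opaque branch starts here at a tilted angle
            tilt = rng.choice((-1, 1)) * 0.6
            segments += grow(x, y, angle + tilt, 1.0, rng,
                             depth + 1, max_depth, steps)
    return segments

segments = grow(0.0, 0.0, 0.0, 1.0, random.Random(1))
```

Each segment carries its own opacity, so a renderer only has to draw the list in order. A check-in on a file would correspond to resetting `alpha` to 1.0 for that curve.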

Full view (click to enlarge):

Version history tree, full view

With my contribution highlighted:

Version history tree with contributor highlights

Click here to play with the visualization. It should work with all decent browsers that support canvas. You can highlight the map by contributor or file. Hover over a node to see detailed information. I did scramble the file names so (hopefully) I’m not violating any company policy. Please don’t get me in trouble. :-p

Tools: R, HTML5

Health Infoscape

Senseable City Lab partnered with GE to create new ways of understanding human health. Our team created a disease network by analyzing data from over 7.2 million anonymized electronic medical records, collected across the United States between January 2005 and July 2010.

Barabási’s lab published disease networks based on genetic similarity in 2007. In our first attempt, diseases/disorders are considered associated if a patient has them at the same time or one after another. The resulting network gives us new insight into how closely connected some seemingly unrelated health conditions might be. Such results force us to re-examine conventional disease classification, as the boundaries between traditional disease categories are thoroughly blurred.
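To make the association rule concrete, here is a small Python sketch that links two diseases whenever they show up in the same patient's record, whether at the same time or one after another. The `(patient_id, disease)` input format, the function name and the use of raw patient counts as link weight are my assumptions for illustration; the actual measure used in the project may differ.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_edges(records):
    """Count, for every pair of diseases, how many patients had both.

    `records` is an iterable of (patient_id, disease) tuples, a
    hypothetical stand-in for the anonymized medical records.
    """
    by_patient = defaultdict(set)
    for patient_id, disease in records:
        by_patient[patient_id].add(disease)

    edges = Counter()
    for diseases in by_patient.values():
        # sorted() gives every pair a canonical (a, b) order
        for a, b in combinations(sorted(diseases), 2):
            edges[(a, b)] += 1
    return edges

# Toy data, invented for the example
records = [
    (1, "asthma"), (1, "eczema"),
    (2, "asthma"), (2, "eczema"), (2, "hypertension"),
    (3, "hypertension"),
]
edges = cooccurrence_edges(records)
```

In the toy data, asthma and eczema share two patients, so that link would be drawn twice as wide as the other two.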

I made this interactive map for the general public to browse the data. You can switch between two layouts: network and circular. Dot sizes are proportional to the percentage of the total population who sought medical attention for the disease. The width of a link shows the strength of the connection. Hovering over a disease pops up detailed information. Clicking on a disease highlights its connections. It is also possible to filter the links by gender, category or keywords. Zooming and panning are supported. The aim is to let people quickly locate the disease they are interested in and see it in context.

[youtube width=520 height=324]ln6arKcE99E[/youtube]

It is a huge network. The first data files I got were more than 50 MB in size. It took me some time to get the loading time and real-time performance of the app to an acceptable level. I tried several schemes to filter the links while keeping the look and the structure. The force-directed layouts are pre-calculated. I ended up hard-coding the network data instead of loading it from an external file, which helped a lot. The final package was ~1.5 MB, which is quite satisfying. And I do like the transition animation a lot.

The network vis was made with Flex and the visualization library Flare, and the user interface with Flash CS4. I’ve had some experience with ActionScript 3, but this was my first time learning Flex. I don’t know if it’s just my computer (Snow Leopard + Firefox 4 + Flash Builder 4), but debugging a Flex application was such a pain. The Flash plugin just crashes at every breakpoint. Flare is a really well-done library and I’m truly grateful for it, but there is so much missing from its documentation. You have to dig into its source to discover everything it can do.

The project was done around the time of my thesis “Seeing Differently: cartography for subjective maps based on dynamic urban data” (and the part of my life that I’m reluctant to look back on). I think I really worked on the core part for two weeks, then made a lot of minor changes over a two-month period. This is probably the most polished interactive vis among all the projects I’ve posted. Thanks Eric and Dom! 🙂

Team: (Senseable) Carlo Ratti, Eric Baczuk, Dominik Dahlem, Xiaoji Chen
(General Electric) Camille Kubie, Aimee Atkinson

Tools used: Flash, Flex, Flare, R


The Connected States of America

The Connected States of America illustrates the emerging communities based on the social interactions defined by the anonymous cellphone usage data on AT&T’s network. It is a similar idea to the Redrawing Boundaries of Great Britain project we published earlier this year. One can see that the communities defined by human networks do not always coincide with administrative boundaries.

In the first phase of the project, visualizations were used intensively to help the scientists on our team explore and validate the data. Compared to the British dataset, the telecommunication pattern in the States featured many more hubs and more entangled connections, which made it harder to represent in one static image. In the following map I used reversed coloring to show which regions a point is most strongly connected to: the arc color at the source end is the color of the target county, and vice versa. The color gradient helps identify geographic regions.

Another attempt was an interactive map where the user can click on a county to see its connectivity with all the other counties of the country. The strength of connections is defined by quantiles of total call time. The interactive map is also available on our project website. The following screenshot shows the outgoing connections from San Diego:
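As a rough illustration of binning by quantiles, the Python sketch below assigns each connected county a class from 0 (weakest) to `k - 1` (strongest) according to where its total call time falls among all connections of the selected county. The function name, the choice of five classes and the sample numbers are assumptions for the example, not values from the project.

```python
from bisect import bisect_left

def quantile_classes(call_time, k=5):
    """Map each county to a quantile class in 0..k-1 by total call time.

    `call_time` maps county name -> total call time with the selected
    county. Counties with equal call time always share a class.
    """
    values = sorted(call_time.values())
    n = len(values)
    classes = {}
    for county, t in call_time.items():
        below = bisect_left(values, t)   # how many connections are strictly weaker
        classes[county] = below * k // n
    return classes

# Invented call times (in minutes) from one selected county
sample = {"Orange": 900, "Riverside": 400, "Imperial": 150, "Maricopa": 60}
classes = quantile_classes(sample, k=4)
```

Using quantiles instead of raw call time keeps a few very strong links from washing out everything else on the map, since each class holds roughly the same number of counties.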

The partitioning algorithm showed some very interesting results, such as the splits of New Jersey and California, and some other states being grouped together. Some clusters extend across state lines and show how people form communities despite administrative boundaries. Yet the communities still largely correspond to state borders. I think that has something to do with the carrier’s rate policies.

The project has been covered by TIME Magazine and The New York Times.

Data source & sponsor: AT&T
Team / Senseable: Carlo Ratti, Francesco Calabrese (IBM Research), Dominik Dahlem, Xiaoji Chen
Team / AT&T Labs: Alexandre Gerber, DeDe Paul, Christopher Rath, James Rowland

Tools used: R, Processing, Illustrator


Sky Color of 10 Chinese Cities

Sky Color of Beijing 2000-2011

Well, not real colors of the sky – but you get the idea.

The dominant factor is the climate. Winter is the most polluted season because of thermal inversion and less rainfall. Spring in northern China suffers from sandstorms. Still, you can easily identify the effect of government intervention, such as the significant improvement in Taiyuan. And look how amazingly Beijing performed from August through September 2008 for the Olympics. Click the image below to zoom in.

Sky Color of Beijing 2000-2011 (small)

Tools used: R

Dedicated to my endearing home city.

Looking at the Forbidden City from Jingshan
[Looking into the Forbidden City from Jingshan – BJNews, March 21, 2011]

Update: Images are now available as a Flickr photostream.