Datascape - Immersive 3D Data Visualisation

Guest blog post by David Burden

With the launch last week of Datascape I thought it would be worth putting an MD’s perspective on the product – how we got here, what the philosophy is that lies behind it, and where we hope to go with it. For a more formal view of the academic and commercial background see our Immersive Data Visualisation white paper.

Datascape has undoubtedly grown out of Daden’s virtual world heritage – and my own interest in data and data visualisation. Over the years we’ve used virtual world platforms such as VRML, Active Worlds, Second Life and OpenSim to create a variety of data visualisations, probably culminating in our original Datascape virtual command centre (which won a prize at the US Government’s Federal Virtual World Challenge), and the visualisation of Twitter data we did in OpenSim for the Royal Wedding in 2011. These examples and experiments, and those of others, together with an MOD-funded research project we ran in 2011 with Aston University (a quantitative comparison of immersive and non-immersive 3D visualisation spaces), convinced us that there was definitely something in immersive data visualisation.

In moving from ideas and demonstrators to a full-blown product, I think there are four key ideas that have informed our journey.



Datascape is about immersion. It is about putting you inside your data, allowing you to move around and through your data and view it from any angle, from inside or out. When in navigation mode there is no user interface – there is just your data (possibly the ultimate expression of Edward Tufte’s data-ink idea). This sense of immersion appears to help the brain see the patterns and anomalies in the data, because the data behaves like the real world – it stays still whilst your eye travels through it.



Datascape does not constrain you. If you want to map latitude to colour and longitude to shape, you can do it. The heart of Datascape is the mapping screen, where you assign the fields in the data to the features of a plot point – its position, rotation, shape, size, colour, image and labels. With a full set of spreadsheet-like functions at your call, and self-populating look-up tables, the plots you can produce probably really are only limited by your imagination. That flexibility does mean that initially there might be a bit more to learn, but we’ll be posting “recipes” and “how-tos” on our web site to help you create the more common visualisations, and as we release successive versions of Datascape we may well start including wizards and templates that get you more directly to those common views.



Given that we needed good graphics and processing capability, we took the decision early on that this would initially be a PC application, not something for the web or your tablet. However, by basing Datascape on Unity we have a path available to develop web and/or tablet versions of Datascape if the demand is there. We have also been keeping a watching brief on HTML5 and WebGL, and one feature under serious consideration is being able to export your completed workspace as a standalone HTML5 virtual world to share more easily with friends and colleagues.



One thing we have found as we begin to look at more and more data in Datascape is that we may need a new visual language to describe what we are doing with data in a 3D space. In 2D we are all used to line graphs and bar charts, pie charts and scatter plots. Whilst we can do these in 3D as well, they do not (except for the last) typically take full advantage of the medium.

For instance, one problem we’ve found in 3D is that whilst the virtual space lets us plot a long line of data stretching off into the distance, looking at the whole line is hard: you have to scroll just as you do in 2D, unless we compress it (but then we lose the detail that the spread-out 3D display brings). One solution we have found is to plot the data as a cylinder, or even as a spiral, with the viewer in the centre. You can then take in a lot of data in one go, and just fly up and down the cylinder to other data – which is typically an easier action to control than horizontal flight. What other standard forms will we find, and how will we determine which form suits which type of data, and which type of enquiry?
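As a rough illustration of the cylinder/spiral idea (this is my own sketch, not Datascape’s actual mapping), a long time series can be wrapped around the viewer by mapping each point’s index to an angle and a slowly rising height:

```python
import math

def spiral_layout(values, points_per_turn=60, radius=10.0, pitch=1.0):
    """Wrap a 1-D series around a vertical cylinder.

    Point i sits at angle 2*pi * i / points_per_turn; the height rises
    by `pitch` per full turn, and the data value nudges the point
    radially outward so peaks stand proud of the cylinder wall.
    """
    coords = []
    for i, v in enumerate(values):
        theta = 2 * math.pi * i / points_per_turn
        r = radius + v                      # data value as a radial offset
        x = r * math.cos(theta)
        y = pitch * i / points_per_turn     # one `pitch` of height per turn
        z = r * math.sin(theta)
        coords.append((x, y, z))
    return coords
```

A viewer placed on the cylinder’s axis can then take in a full turn of data at a glance, and fly vertically to reach earlier or later points.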

Another difference is axes. In 2D the axes form a frame in which your data sits – and the same goes for non-immersive 3D cubes. But in an immersive space you are usually inside the data and the axes are nowhere in sight. So how do we maintain orientation within the data, and understand where the data points sit on the axes (that is, if we actually need enumerated axes)? There are no doubt a number of solutions to explore. Within Datascape we have distant XYZ markers so you can easily tell which direction you are looking in; whenever you hover over a point it can tell you its X, Y and Z values; and you can also have the point drop reference lines down to the axes or reference planes. One other thing we have tried, but not perfected enough to release, is a 3D compass, and another that we are looking at for future releases is the use of mini-maps, not just as a top-down (XZ plane) view but as YZ and YX views as well. But can you cope with seeing your data in four directions at once?



We thought long and hard about this tag line, just as we did about whether or not to have avatars. We didn’t put avatars in the single-user version since we felt that a) you get enough of a sense of immersion from the navigation alone and b) for most corporate users we spoke to, avatars are still a turn-off and too closely associated with gaming environments. However, “virtual world” (most emphatically in lower case) did seem by far the most appropriate way to describe what you can create with Datascape: a virtual world populated solely by you and your data.

In multi-user mode we do provide you with a very basic humanoid avatar – but it is very much a place-holder, a glyph, for where you are in the world and what direction you are looking. We deliberately kept clear of an avatar that was human enough for you to start worrying about what gender, race or age it was, and what clothes it should wear! The resulting avatar is enough to let you know where your colleagues are and what they are looking at, no more, but even so it’s not long before you’re playing hide and seek amongst the data.

Going forward we may well increase the virtual world sense – for those who want it – with better avatars, more 3D scenery in which to place your visualisations (closer to the original Datascape), and persistence and controlled sharing of your data and workspaces. But let’s start simply, and with something that everyone can hopefully relate to.


So hopefully that gives you some insight into our thinking as we developed Datascape, and some clues as to where we might take it in the future. Please download it (there is a free community version with a 6,000-point limit and a paid pro version with a 65,000-point limit, although we have had it running with up to 250k points) and give it a try, and hopefully it will open up a whole new world of data visualisation for you.

Read more…

Big data and Data Visualization

Like many people of a certain age, my first exposure to the term dashboard was when I developed one for monitoring corrective and preventive actions!

I have realised that dashboard design itself is now the essence of simplicity and cutting-edge technology, and stylish with it too, stirring passions about what makes a great interface for analysis.
When it comes to software applications and websites, dashboards are everywhere!

The era of big data has arrived, but most organizations are still unprepared. Enterprises erroneously believe, and act as if, big data is a passing fad and nothing has really changed. But big data is not a temporary thing, and by acting as if it were, companies are missing out on tremendous opportunities.

So what is it?

As many of us know, an enterprise application dashboard is a one-stop shop of information. It’s a page made up of portlets or regions, grouping related information into displays of graphs, charts, and graphics of different kinds. Dashboards visualize a breadth of information that spreads over a large range of activities in an application or functional area.

There are numerous case studies explaining how visual representations help locate and leverage valuable insights from large sets of structured or unstructured data (i.e., big data), ask better questions, and make better decisions.

Does it serve the purpose?

Yes! Dashboards, when designed well, aggregate structured and unstructured data into meaningful visual displays and representations, using analytical formulas over the available datasets at the backend to do the analysis and derivation work that users used to do with notepads, calculators or spreadsheets to find out what has changed or is in need of attention.

Dashboards over a large amount of data enable users to prioritize work and manage exceptions by taking lightweight actions immediately from the page, or by drilling down to explore and do more in a transactional or analytics work area, if necessary.

The design of dashboards over very large amounts of data, on the other hand, is much more open to interpretation. Most of these big data dashboards are simply a series of graphs, charts, gauges, or other visual indicators that a user has chosen to monitor, some of which may be strategically important, but others of which may not be. Even if a strategic link exists, it may not be clear to the person monitoring the dashboard, since the objective statements, which explain what achievement is desired, are typically not present on dashboards.

Why this?

I found it interesting that there are separate infographics and data visualization categories. My interpretation is that entries in the infographics category are static and illustrated, while those in the data visualization category are generated and data-driven.

Nowadays, big data can be used to gain better insight through data visualization, using superior tools and techniques to present and analyze the available data.

On the other hand, it is economical in terms of space and would probably work in almost every case, which are two things that dashboards should be good at. So while I wouldn’t have used it myself, I can understand why this decision was made. What makes a dashboard, or any other information-based design, successful is neither the design execution alone nor the clever information analysis and visualization technique.

These kinds of dashboards, on the other hand, are ultimately meant to be useful and to solve a specific problem. Dashboards represent a powerful means of communication for business users nowadays, when companies accumulate large amounts of data. These visually compressed representations of only the most important data are used for tracking.

DataViz, in my view!

Data visualizations can unintentionally bias the viewer through the choice of visual method; sometimes a visualization fails because it does not account for the viewers’ assumptions (cultural ones, for instance: is red a good or a bad color?).

One interesting thing I always think about is creating visualizations that let the human eye discover something that can’t be discovered by a program. But there will always be a challenge in showing enough data to give a sense of context while providing enough detail to enable understanding.

What then?

Whenever a visualization is based on big data, once a data visualization designer is aware of the simple principles of presenting data on a screen, they can apply them to any report or graph, data analysis or information dashboard without changing its context or meaning. Only then will it provide a powerful means to make sense of data. When done properly, data visualization will make us think, compare data, read stories out of our data, put data in the right context, and ultimately help decision-makers make the right decisions regardless of the type or amount of data available.

Do you have any thoughts on this? I am waiting to hear from you!


Guest Blog post by Nilesh Jethwa

Data is bits and bytes, and visualization has the power to tell its story in multiple forms. Today I wish to share two different visualizations of the same dataset.

Here is the link to the dataset and the dashboard as shown below




And here is the second dashboard visual, using a choropleth:





You can analyze the pros and cons of both kinds of visuals. Both are pretty interesting, and the key point is what story each one is trying to tell.


14 questions about data visualization tools

Guest blog post by Vincent Granville

Questions to ask when considering visualization tools:

  1. How do you define and measure the quality of a chart?
  2. Which tools allow you to produce interactive graphs or maps?
  3. Which tools do you recommend for big data visualization?
  4. Which visualization tools can be accessed via an API, in batch mode? (for instance, to update earthquake maps every 5 minutes, or stock prices every second)
  5. What do you think of Excel? And Python or Perl graph libraries? And R?
  6. Are there any tools that allow you to easily produce videos of your data (e.g. to show how fraud cases or diseases spread over time)?
  7. In Excel you can update your data: then your model and charts get updated right away. Are there any alternatives to Excel, offering the same features, but having much better data modeling capabilities?
  8. How do you produce nice graph structures - e.g. to visually display Facebook connections?
  9. What is a heat map? When does it make sense to use it?
  10. How do you draw "force-directed graphs"?
  11. Good tools for raster images? for vector images? for graphs? for decision trees? for fractals? for time series? for stock prices? for maps? for spatial data?
  12. How can you integrate R with other graphical packages?
  13. How do you represent 5 dimensions (e.g. time, volume, category, price, location) in a simple 2-dimensional graph? Or is it better to represent fewer dimensions if your goal is to communicate a message to executives?
  14. Why are the visualization tools used by mathematicians and operations research practitioners (e.g. Matlab) not the same as those used by data scientists? Is it because of the type of data, or just historical reasons?



This is a guest blog post.

Ever wanted to quickly and visually share some data with your colleagues or with the world, and struggled with the tools available? And after sharing the data, what if the viewer wants to zoom in on a specific location, city or town to see what's going on there?

Google Fusion Tables is a free tool to show your data on a map & allow viewers to zoom in on specific areas that they want to explore further. vHomeInsurance, a data-driven home insurance analysis service, has detailed location data on home insurance rates & has used its data to create a guide to using Google Fusion Tables to represent home insurance rates visually on a map.

1. Google Fusion Table Home Page

To start using Google Fusion Tables, you must have a Google Account. After you have created a Google account, or if you already have one, go to:  & click on the Create a Fusion Table link to begin.

2. Get your data into Google Fusion Tables

We have three choices for importing data into Google Fusion Tables:

  1. Upload a File
  2. Import from Google Spreadsheets
  3. Update Data from an Empty Table

For this guide, we choose the Google Spreadsheet option. Choose the Google spreadsheet you want to import data from and click Next. (You can choose the other options as well and click Next.)

Before the data is finally loaded into Google Fusion Tables, make sure to check the column names and other details in the import preview.

3. Naming your table & data

Once the data is imported, make sure to give the table an appropriate name, your licensing attribution & other details.

4. Map the “Location” field to the appropriate column

This is where the rubber hits the road: Google geocodes your data so it knows which place is where.

For instance, if you have data on Brooklyn homeowner insurance rates & want Google Maps to show it in the appropriate location, then Google Fusion Tables needs to figure out the geo-coordinates for that data. To tell Google Fusion Tables which column to geocode, we need to change the appropriate column’s type to “Location”. This can be done through the following steps:

  1. Hover over the column name that has the location data and click on the downward pointing arrow
  2. Click on Change
  3. On the page that appears, choose “Location” for the type & then hit Save

5. Show the Geocoded data on a Map

Now we need to actually show the data on a map. To do that, click on “Add Map”. The actual geocoding & representation may take time, depending on the number of rows in your table.


6. Configure Your Map Markers

The default representation of a place on the map is a red circle, but we need to customize it to make it more meaningful. As an example, in the home insurance world, home insurance in Chicago is $888, so we give it a yellow marker, whereas home insurance in Phoenix is cheaper at $596, and we represent that with a green marker. In the Configure map section, click on Change feature styles, and then configure the various buckets and associated color markers for the different values.

You can see a screenshot of the finished map below.

A detailed map is available to zoom in for various home insurance rates on vHomeInsurance.

About the Author:

The vHomeInsurance team are experts in home insurance rates data analysis & research.



Please join us on February 3, 2015 at 9am PT for our latest Data Science Central Webinar Event: 
Avoiding Data Pitfalls: Gaps Between Data and Reality sponsored by Tableau Software.

Space is limited.
Reserve your Webinar seat now

Have you ever been fooled by data? In this Webinar we will cover common pitfalls that anyone who works with data has fallen into. Find out what these pitfalls look like and how to avoid them. The pitfalls range from philosophical to technical, and from analytical to visual. 

Utilizing trusted techniques and visualization tools, we’ll help you learn how to avoid these common mistakes and steer clear of otherwise uncomfortable pitfalls.


Ben Jones of Tableau Software

Hosted by: Tim Matteson, Cofounder, Data Science Central

Title:  Avoiding Data Pitfalls: Gaps between Data and Reality
Date:  Tuesday, February 3, 2015
Time:  9:00 AM - 10:00 AM PT


Again, space is limited, so please register early:
Reserve your Webinar seat now

After registering you will receive a confirmation email containing information about joining the Webinar.



Guest blog post by Alex Jones.

Post adapted from Correlation vs Causation: Visualization, Statistics, and Intuition!

As someone who has a tendency to think in numbers, I love when success is quantifiable.

However, I suppose that means I must accept defeat (or in true statistician fashion-- try to discredit the correlation) when the numbers don't demonstrate what I had hoped for or intuitively believed!

With that, I decided to look into how my working at Cameron relates to the company's stock price. Alongside this analysis, I'll include a quick demo of scaling and data manipulation for visualization.

Of course, this post is meant to highlight one of the basic lessons of statistics in a mildly entertaining way.

To begin, I pulled Stock Price over my first ~90 Days. Since the market is only open on business days, it fits perfectly with the number of days worked.

If only every analysis were this convenient! From there, I merely added a column that counts the number of days.

Eventually the data looked something like this:

Neat! Now, let's graph Adjusted Close Price vs Days Worked.

Super! As you can see in this graph, there's obviously no Relationship!

Not so fast. Let's Regress Days Worked Across Stock Price.

It's important to realize that while visualization is a phenomenal tool and incredibly insightful way to ingest data, it's not the whole story.

Blasphemy! For the sake of this article, humor my logical leaps.

With an R-squared of .88 and a P value out to 42 decimal places, traditional statistics would say we are incredibly confident about these results!

So what do all those numbers really say? Well one interpretation would be that we can explain Stock Price by:

StockPrice = $75.99 - $0.29672 × (NumberDaysAlexHasWorked)

That's a heck of a deal! I cost a little under 30 cents a day...

WRONG. That's per share. Since the company currently has ~197.45M Shares Outstanding, that means, based on these statistically significant results, I cost $58,587,364 per day.
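The per-share-to-company arithmetic is easy to sanity-check (both figures as quoted above):

```python
slope_per_share = -0.29672        # regression slope: dollars per share per day worked
shares_outstanding = 197_450_000  # ~197.45M shares, as quoted above

# Total daily "cost" implied by the (spurious) regression:
daily_cost = abs(slope_per_share) * shares_outstanding
print(f"${daily_cost:,.0f} per day")  # → $58,587,364 per day
```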

Well this is awkward... 

Quick! Let's see if we can perform some "Transformations" on the data to get a "Better result". 

First, let's Scale Stock Price from 0 (lowest price) to 1 (highest price). To do so, we'll get the Minimum value and Maximum value. With those, we'll be able to get the Spread/ Span.

That calculation is simply Spread = Maximum - Minimum. Simple enough!

Now how do we scale every datapoint? Great question.

We'll take (Stock Price X - Minimum) / Spread. Boom! Scaled.
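The scaling step described here is standard min-max normalization; a minimal sketch (the function name is mine):

```python
def min_max_scale(xs):
    """Scale a sequence to the range [0, 1] using (x - min) / spread."""
    lo, hi = min(xs), max(xs)
    spread = hi - lo                       # Spread = Maximum - Minimum
    return [(x - lo) / spread for x in xs]
```

For example, `min_max_scale([60, 70, 80, 100])` gives `[0.0, 0.25, 0.5, 1.0]`. (A real implementation would also guard against a zero spread when all values are equal.)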

Now let's graph that!

Oh great! No relationship! Just as I wanted.

Whoa whoa whoa... that doesn't seem right. Ok, then what do you propose? Scale the number of days worked?

Well, I guess we could try that. Same formula/process applied to days worked.

Ok, so maybe there's a relationship here... I suppose we should Invert days worked so that the lines go in the same general direction.

See how the orange line (days worked) currently starts at 0 and goes to 1? Let's flip that. How? We'll apply the formula Inverted Days Worked = 1 - Scaled Days Worked. Now the line is flipped!

Let's Graph them.

Holy Moly. I see it now.

So now we have taken two vectors of differing relative magnitudes, scaled them to an equivalent range, and controlled for directionality, thereby enabling a linear depiction of the relationship and a more intuitive visualization!

Sorry, that was unnecessary. Nerdiness got the best of me.

So what then does this mean? Now what are the results of the regression?!

You better sit down for this... The regression results, in absolute terms, are EXACTLY the same. Even though the equation (shown on the final graph) is apparently different, once we "undo" all the scaling and transformations, and we get the numbers back into their original values... they will be the exact same as the original!

Hm.. Why is that? Because we're just transforming data! We're not changing the underlying geometry of the relationships. Relatively speaking, the data remains holistically the same. We didn't pick out one data point and change JUST that one. We changed all of them at the same time.
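That invariance can be checked directly: min-max scaling and the 1 - x flip are both affine transformations, and affine maps preserve the magnitude of the Pearson correlation (flipping one axis only changes its sign). A quick sketch with made-up prices (the actual series from this post isn't reproduced here):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

days = [1, 2, 3, 4, 5, 6]
price = [75.5, 75.2, 74.8, 74.9, 74.1, 73.8]  # made-up prices, drifting down

r_raw = pearson_r(days, price)

# Min-max scale the price, then flip the days with 1 - scaled value:
lo, hi = min(price), max(price)
scaled_price = [(p - lo) / (hi - lo) for p in price]
inverted_days = [1 - (d - min(days)) / (max(days) - min(days)) for d in days]

r_transformed = pearson_r(inverted_days, scaled_price)
# abs(r_transformed) == abs(r_raw): only the sign has changed.
```

Undoing the transformations recovers the original regression exactly, which is the point: the geometry of the relationship never changed.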

In other words, we're just moving the data's perspective in a multi-dimensional space, relative to us, the viewers. You can zoom, stretch, angle, compress, and turn data in any way you want!

Let's take a second and think about this. For a moment, think of our data as a cube-- just to help conceptualize what's going on.

If we turn, flip, invert, scale, zoom out, or angle the cube in any way-- has the cube itself changed? Absolutely not. It's the exact same cube!

We're simply looking at it from a different perspective. So when we transform a "Data Vector / Cube" (as long as we "undo" those changes when we analyze the data in real terms)-- we're just finding that perfect angle to tell our story and create a compelling visual. That's powerful and exciting!

Victory is mine! Data hath been conquered!

Even with these marvelous findings, we must address the issue of primary concern--Causation vs Correlation! Based on statistics--- "data driven" results, and the interpretation we proposed earlier-- I'm the worst!

However, that's a myopic approach to statistics. Rather-- I bet you there's a 3rd variable indicative of the movement of stock price. What does days worked really represent? It is merely a count of the past ~90 days. So what else has happened in that period?

Well, if we consider the fact that the company is a major oilfield services firm or pick our head up and look at the companies and markets around us-- we quickly realize-- the missing link is the price of oil (at least I certainly hope so!).

What you should realize is that these relationships aren't always evident or obvious! In fact, visualizations in their raw form could disguise relationships! Statistics is still a subjective science-- subject to the availability of information and robustness of the analyst's forethought and interpretation!

More importantly, we can identify the importance that the macro oil market plays in the stock price, rather than otherwise extraneous relationships! For brevity's sake, we'll omit another full analysis saga.

Most importantly, this should help to exemplify one of the most exciting value potentials of "Big Data". Essentially, we now have access to incredible amounts of information relative to the "Universal Variable" of time. With that point to relate on, we can now see how major indexes, markets, events, weather patterns, customer announcements, etc. interrelate!

As we move towards an even smaller and more interconnected world, expect to see more "Universal" data points-- (and actively promote them, in the long run, it'll make your analysis more resilient and dynamic!).

Thanks for reading!

Follow Alex Jones



Interactive Data Visualization for the Web, by Scott Murray, O’Reilly (2013), has a Free online version.

An introduction to D3 for people new to programming and web development, published by O’Reilly. “Explaining tricky technical topics with aplomb (and a little cheeky humor) is Scott Murray’s forte. If you want to dive into the world of dynamic visualization using web standards, even if you are new to programming, this book is the place to start.” - Mike Bostock, creator of D3.

From O’Reilly website: "This step-by-step guide is ideal whether you’re a designer or visual artist with no programming experience, a reporter exploring the new frontier of data journalism, or anyone who wants to visualize and share data. Create and publish your own interactive data visualization projects on the Web—even if you have little or no experience with data visualization or web development. It’s easy and fun with this practical, hands-on introduction. Author Scott Murray teaches you the fundamental concepts and methods of D3, a JavaScript library that lets you express data visually in a web browser. Along the way, you’ll expand your web programming skills, using tools such as HTML and JavaScript"

This online version of Interactive Data Visualization for the Web includes 44 examples that show you how best to represent your interactive data – for instance, how to create a simple force layout with 10 nodes and 12 edges.


35 books on Data Visualization

1. The Visual Display of Quantitative Information. Author: Edward Tufte. Publisher: Graphics Press, 1983. Pages: 197. A modern classic. Tufte teaches the fundamentals of graphics, charts, maps and tables. "A visual Strunk and White" (The Boston Globe). Includes 250 delightfully entertaining illustrations, all beautifully printed.

Beyond The Visualization Zoo

The best document I have read on visualization is called "A Tour Through The Visualization Zoo" by Jeffrey Heer, Michael Bostock, Vadim Ogievetsky. It's a must-read picture book for aspiring Data Scientists. Most of the graphics from this post are examples of the Tour taken from the d3 gallery.
The top tech companies by market capitalization are IBM, HP, Oracle, Microsoft, Cisco, SAP, EMC, Apple, Amazon and Google. All of the top tech companies are selected based on their current market capitalization, with the exception of Yahoo. The year 2014 is not included as part of this analysis. Data: The source of this data is from the public financial records from
"A picture is worth a thousand words" or in the case of Data Science, we could say "A picture is worth a thousand statistics". Interactive Data Visualization or Visual Analytics has become one of the top trends in transforming business intelligence (BI) as technologies based on Visual Analytics have moved into widespread use.
