
This chart communicates the same insights as a contour plot. What is interesting is the choice of hexagonal buckets (rather than squares) to aggregate data. In fact, any tessellation would work, in particular Voronoi tessellations.

3-D Voronoi tessellation 

The reason for using hexagons is that they are still fairly simple, and when you rotate the chart by 60 degrees (or any multiple of 60 degrees) you get the same visualization. For squares, rotations of 60 degrees don't work; only multiples of 90 degrees work. Is it possible to find a tessellation such that smaller rotations, say 45 or 30 degrees, leave the chart unchanged? The answer is no. Octagonal tessellations don't exist (regular octagons cannot tile the plane), so the hexagon is an optimum.

Hexagonal binning plots (source: here)
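At its core, hexagonal binning is just a coordinate transformation: each point is assigned to the nearest hexagon center, and the counts per hexagon are then plotted. Below is a minimal pure-Python sketch of that assignment step (the cell size and sample points are made up for illustration):

```python
import math
from collections import Counter

def hex_cell(x, y, size=1.0):
    """Return the axial (q, r) coordinates of the pointy-top hexagon
    of the given size that contains the point (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Round the fractional cube coordinates to the nearest hex centre.
    xc, zc = q, r
    yc = -xc - zc
    rx, ry, rz = round(xc), round(yc), round(zc)
    dx, dy, dz = abs(rx - xc), abs(ry - yc), abs(rz - zc)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)

# Made-up sample points: two near the origin, one far away.
points = [(0.1, 0.0), (0.2, 0.1), (5.0, 5.0)]
bins = Counter(hex_cell(x, y) for x, y in points)
```

Counting the points per hex cell and coloring each hexagon by its count is all a hexbin plot does.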

Implementation in R

The three plots described here (Voronoi diagram, hexagonal binning and contour plots) are available in the ggplot2 package.

  • Hexagonal binning: ggplot with the stat_binhex (or geom_hex) layer, see here
  • Contour plot: ggplot with the geom_point and geom_density2d (or stat_contour) layers, see here (also works with contour)
  • Voronoi diagram: ggplot with the geom_segment layer, see here


Voronoi diagrams can be used for nearest neighbor clustering or density estimation, the density estimate attached to a point being proportional to the inverse of the area of the Voronoi polygon containing it.
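To make that idea concrete, here is a hedged pure-Python sketch (the sample points are invented) that estimates each Voronoi cell's area by Monte Carlo, sampling uniform points and counting which data point each one is nearest to, then taking the reciprocal as a density estimate:

```python
import random

def voronoi_density(points, n_samples=20000, box=1.0):
    """Estimate each point's Voronoi cell area inside [0, box]^2 by
    Monte Carlo, then return 1/area as an (unnormalized) density estimate."""
    counts = [0] * len(points)
    for _ in range(n_samples):
        qx, qy = random.uniform(0, box), random.uniform(0, box)
        # Index of the data point nearest to the random sample.
        nearest = min(range(len(points)),
                      key=lambda i: (points[i][0] - qx) ** 2 + (points[i][1] - qy) ** 2)
        counts[nearest] += 1
    areas = [c / n_samples * box * box for c in counts]
    return [1.0 / a if a > 0 else float("inf") for a in areas]

random.seed(0)
pts = [(0.2, 0.2), (0.25, 0.25), (0.8, 0.8)]  # two close points, one isolated
dens = voronoi_density(pts)
```

The two tightly packed points get small cells and thus high density estimates; the isolated point gets a large cell and a low estimate.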


Example of contour map (source: here)

Originally posted here

Read more…

This post is about selecting the most popular ad to display on a webpage so as to gather the most clicks. The rate at which webpage visitors click on an ad is called the conversion rate for that ad.

Assume that we have several ads and a place on a webpage to show one of them. We could display them one by one, record all the clicks, analyse the results afterwards and figure out the most popular one. But ad displays may be pricey. It would be more effective to estimate rates in real time and to display the most popular ad as soon as rates can be compared, especially if an ad leads to a page where a visitor can buy something. There are a couple of methods for such estimation: the Upper Confidence Bound method and the Thompson Sampling method.

The first one is based on the concept of a confidence interval, which is studied in any statistics course and has a good intuitive explanation. Roughly speaking, a confidence interval is a numeric interval where our value is supposed to lie with some probability, usually 95%. (The real statistical definition is more technical and means not quite this, but in practice the above explanation is close enough.) During our ad displays we can compute average rates at each step, with corresponding confidence intervals, and pick for the next display the ad with the highest upper confidence bound. You can see how it happens in the video below.
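As an illustration of the idea (this is not the code from the original post; the ads and their conversion rates below are invented), a minimal UCB simulation might look like this:

```python
import math
import random

def ucb_choose(clicks, displays, t):
    """Pick the ad with the highest upper confidence bound on its rate."""
    best, best_ucb = 0, -1.0
    for i, n in enumerate(displays):
        if n == 0:
            return i                      # show every ad at least once
        ucb = clicks[i] / n + math.sqrt(2 * math.log(t) / n)
        if ucb > best_ucb:
            best, best_ucb = i, ucb
    return best

random.seed(0)
true_rates = [0.02, 0.20, 0.05]           # invented conversion rates
clicks, displays = [0, 0, 0], [0, 0, 0]
for t in range(1, 5001):
    ad = ucb_choose(clicks, displays, t)
    displays[ad] += 1
    clicks[ad] += random.random() < true_rates[ad]
```

Over time the ad with the highest true rate accumulates most of the displays, while the others are still shown occasionally as their bounds shrink.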

The method has some drawbacks. It does not take into account that our rates must be between 0 and 1, so the initial confidence intervals are usually much wider than necessary. This means we lose some time getting realistic values for our intervals. Worse, if we throw in an additional ad, the process takes a long time to recover.

Here is another, more efficient method: the Thompson Sampling method. It constructs a Beta distribution for each ad rate and, instead of computing averages, draws a random number in accordance with that distribution. Here is a picture of how it goes for one ad, with a blue vertical line marking the mean and a red line for the random value:

As you see, since a random value is more likely to appear where the curve is higher, it gets closer and closer to the mean at each step. You might view the region where the curve rises above the horizontal axis as an analogue of a confidence interval.

Here is how it works for a few ads (I dropped the means to make the picture clearer):

It also accommodates an ad added in the middle of the process more easily.
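A minimal Thompson Sampling simulation, again with invented ads and conversion rates, can be sketched like this (random.betavariate draws from the Beta posterior):

```python
import random

def thompson_choose(successes, failures):
    """Sample once from each ad's Beta posterior; show the best draw."""
    draws = [random.betavariate(s + 1, f + 1)
             for s, f in zip(successes, failures)]
    return draws.index(max(draws))

random.seed(1)
true_rates = [0.02, 0.20, 0.05]           # invented conversion rates
succ, fail = [0, 0, 0], [0, 0, 0]
for _ in range(5000):
    ad = thompson_choose(succ, fail)
    if random.random() < true_rates[ad]:
        succ[ad] += 1
    else:
        fail[ad] += 1
```

Because an ad's posterior starts wide and narrows as evidence accumulates, a newly added ad simply starts with a flat Beta(1, 1) prior and is explored automatically.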


Do not hesitate to ask questions!

(The post originally appeared here: Mya Bakhoava's blog.)

Read more…

Finding insights with graph analytics

Originally posted here


From detecting anomalies to understanding the key elements of a network or highlighting communities, graph analytics reveals information that would otherwise remain hidden in your data. We will see how to integrate graph analytics with Linkurious Enterprise to detect and investigate insights in your connected data.


What is graph analytics?

Definition and methods


Graph analytics is a set of tools and methods aiming at extracting knowledge from data modeled as a graph. The graph paradigm is ideal to make the best out of connected data, whose value resides for the most part in its relationships. But even with data modeled as a graph, extracting knowledge and providing insights can be challenging. Faced with multi-dimensional data and very large datasets, analysts need tools to accelerate the discovery of insights.


The field of graph theory has spawned multiple algorithms that analysts can rely on to find insights hidden in graph data. Below are some of the most popular graph algorithms and how they can help find insights for use cases such as fraud, network management, anti-money laundering, intelligence analysis or cybersecurity:


  • Pattern matching algorithms identify one or several subgraphs with a given structure within a graph. Example: a company node with the country property containing “Luxembourg” connected to at least five officer nodes with a registered address in France.
  • Traversal and pathfinding algorithms determine paths between nodes within the graph, without knowing beforehand which connections exist or how many of them separate two nodes. In money laundering investigations, path analysis can help determine how money flows through a network of individuals, how it goes from company A to person B. Example: the shortest path algorithm.
  • Connectivity algorithms find the minimum number of nodes or edges that need to be removed to disconnect the remaining nodes from each other. This is helpful, for instance, to determine weaknesses in an IT network and find out which infrastructure points are sensitive enough to take it down. Example: the Strongly Connected Components algorithm.
  • Community detection algorithms identify clusters, or groups of nodes, densely connected within the graph. This is particularly helpful to find groups of people that might belong to a common criminal organization. Example: the Louvain method, the label propagation algorithm.
  • Centrality algorithms determine a node’s relative importance within a graph by looking at how connected it is to other nodes. They are used, for instance, to identify key people within organizations. Example: the PageRank algorithm, degree centrality, closeness centrality, betweenness centrality.
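To give a flavor of the traversal and pathfinding idea, here is a breadth-first shortest-path sketch in Python over a toy money-flow network (the companies and transfers are invented):

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest path from start to goal,
    or None if no path exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical money-flow network: who transferred money to whom.
flows = {"Company A": ["Shell 1", "Shell 2"],
         "Shell 1": ["Shell 3"],
         "Shell 2": ["Person B"],
         "Shell 3": ["Person B"]}
print(shortest_path(flows, "Company A", "Person B"))
# prints ['Company A', 'Shell 2', 'Person B']
```

Breadth-first search explores the graph level by level, so the first path that reaches the goal is guaranteed to use the fewest hops: here, the route through "Shell 2" rather than the longer one through "Shell 1" and "Shell 3".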

Architecture blueprint for graph analytics


Depending on your data, your use-case, and the questions you have to answer, technology and infrastructure can differ from one organization to another. But a generic graph analytics architecture usually consists of the following layers:


  • Linkurious Enterprise: the browser-based platform and its server are used by investigation teams to  visualize and analyze graph data. It retrieves data in real-time from graph databases.
  • Graph databases: transactional systems storing data as graphs and managing operations such as data retrieval or writing. They perfectly handle real-time queries, making them great online transaction processing (OLTP) systems.
  • Graph processing systems: a set of analytical engines shipping with common graph algorithms and handling large-scale online analytical processing (OLAP) on graphs.

Architecture blueprint for graph analytics


Linkurious Enterprise acts as a front-end where analysts and investigators can easily retrieve information. The data accessed by Linkurious Enterprise is stored in a graph database. Graph databases are well suited for real-time querying and long-term persistence but are usually not designed for running complex graph algorithms at scale. As a result, our clients tend to push this sort of workload to dedicated graph processing frameworks such as Spark/GraphX. The results are then persisted back in the graph database as new properties (e.g. a PageRank score property) and thus become available to Linkurious Enterprise.

Applying graph analytics to the Paradise Papers data


In this section, we take a closer look at a real-life graph dataset, the Paradise Papers dataset, created by the ICIJ to investigate the world offshore finance industry. We use Linkurious Enterprise to query, analyze and visualize the data using graph analytics tools and methods.


The setup


The setup used in our example


For the purpose of this example, we relied on the architecture pictured above.

The Paradise Papers dataset


The dataset is made of 1,582,953 nodes and 2,398,680 edges. It aggregates data from four investigations of the ICIJ: the Offshore Leaks, the Panama Papers, the Bahamas Leaks and the Paradise Papers.


The graph data model has four types of nodes and three types of edges as depicted below.



Graph data model of the Paradise Papers dataset


In the following sections, we will see how to use different graph analytics approaches such as graph pattern matching, PageRank analysis, and the Louvain community detection method. While implementing graph analytics requires some technical knowledge, we will see how Linkurious Enterprise can make graph analytics results accessible to every analyst via simple tools. Among these tools are query templates, an alert dashboard, and a visualization interface.


Graph pattern matching in Linkurious Enterprise


A simple method for identifying patterns in a graph is to use graph languages to describe the shape of the data you are looking for. As a developer, you can do it in the interface of your favorite graph database but also within the Linkurious Enterprise interface.


What if you want to be warned every time a certain graph pattern appears in your data? Via the Linkurious Enterprise alert system, you set up alerts for graph patterns you want to monitor. Every time a new match is detected in the database, it’s recorded and available for users to review. This is useful in a fraud monitoring context for instance where you’d want to be notified when instances of known fraud schemes occur.


In the video below, we set up a new alert in Linkurious Enterprise for a specific pattern. The alert contains a graph query looking for addresses tied to more than five entities or company officers.



Once the alert is saved, users access a match list and can start investigating the results. Below, we review one of the findings from the alert investigation interface. 



When looking at a node representing a company, you may want to know which other companies share the same addresses. The answer can be retrieved manually, by expanding and filtering the data, or via a graph query, which requires technical skills. With Linkurious Enterprise’s query templates, you can apply pre-formatted graph queries with the click of a button and accelerate your data exploration. Users run query templates by right-clicking on a node in the visualization and choosing the desired template from the menu.


Below is an example of how to set up a query template. We configure it to retrieve, for a given company officer, all the other officers it is connected to via a shared address or a shared company.



Once the query is configured, users can easily access and run it from the visualization interface to speed up their investigations.



In addition to these features, users can rely on Linkurious Enterprise styling and filtering capabilities to analyze the data faster. Once the results of the query are displayed, styles and filters are essential to refine the results, reduce the noise and highlight the key elements.


In the next section, we see how to automate the identification of unusual companies within the French network using the PageRank algorithm and Linkurious Enterprise’s alert system.


Identifying key nodes with the PageRank algorithm


To use graph algorithms in Linkurious Enterprise, you will first need to run them on your backend and save their results as new properties in your graph database. In this example, we show how to identify key nodes in your network using the PageRank algorithm. This centrality algorithm will compute a score assessing the relative importance of various nodes within a network.
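Under the hood, PageRank is a simple fixed-point iteration. Here is a hedged pure-Python sketch on a toy three-node graph (an illustration of the algorithm itself, not the Neo4j implementation used below):

```python
def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency dict {node: [targets]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node keeps a base share from the teleport probability.
        new = {v: (1 - damping) / n for v in nodes}
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new[w] += share
            else:                          # dangling node: spread evenly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

# Toy graph: "a" and "b" both point to "c", which points back to "a".
g = {"a": ["c"], "b": ["c"], "c": ["a"]}
r = pagerank(g)
```

Node "c", which receives links from both other nodes, ends up with the highest score, which is exactly the "relative importance" signal exploited in the alert below.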

One line of code is enough to run the algorithm in Neo4j and create a new node property, “pagerank_g” with the resulting PageRank score.


// Computation of PageRank
CALL algo.pageRank(null, null, {write:true, writeProperty:'pagerank_g'})


Once this has been added to our graph, we can start exploiting the results in Linkurious Enterprise.

We created a new alert, leveraging the PageRank results. The query is simple: it searches for Entity nodes connected to other nodes (Countries, Officer, Intermediary) located in France. It also collects their PageRank scores and ranks them by order of importance. Every matching sub-graph is recorded by the alert system and can be investigated. By sorting results by their PageRank scores, we can focus our investigation on the most important companies within the French network.


// Detect French entities with a high PageRank


MATCH (a:Entity)-[r]-(b)
WHERE b.countries = "France"
WITH a.pagerank as score, a, COLLECT(distinct r) as r, COLLECT(distinct b) as b, count(b) as degree
RETURN a, score, as name, r, b, degree
ORDER BY score DESC


In the example below, we review one of the top matches recorded by the alert system. 



In addition to these features, users can rely on Linkurious Enterprise styling and filtering capabilities to analyze the data faster. For instance, it’s possible to size and filter the nodes based on their PageRank score to get a faster understanding of the situations as depicted in the image below.



A size is applied to “location” nodes based on their PageRank score to highlight nodes of importance.

By enriching the data with additional information, the PageRank algorithm helped us focus on nodes of interest. The alert system in Linkurious Enterprise helps us classify the results and provides a user-friendly interface for investigation. In the next section, we see how to detect communities of interest with a single click using the Louvain algorithm and the query template system.

Identifying interesting communities via the Louvain modularity


In the example below, we implement the Louvain algorithm to identify communities within our network. We look specifically at communities of company officers based on their relationships. The snippet of code below identifies communities and adds a new “communityLouvain” property to each node, representing the community it belongs to.


// Computation of Louvain modularity


CALL algo.louvain(
 'MATCH (p:Officer) RETURN id(p) as id',
 'MATCH (p1:Officer)-[:OFFICER_OF]->(:Entity)<-[:OFFICER_OF]-(p2:Officer)
  RETURN id(p1) as source, id(p2) as target',
 {graph:'cypher', write:true, writeProperty:'communityLouvain'})


Then we leverage the data generated by the algorithm in a query template that retrieves, in one click for a given “Officer” node, the other officers belonging to the same community. Instead of manually exploring each of the node’s neighbors to identify a potential community, the query template instantly provides an answer the analysts can then refine. Below is the code used in the query template.


//Retrieve the officer nodes who belong to the same community


MATCH (a:Officer)
MATCH p = (a:Officer)-[*..4]-(b:Officer)
WHERE a.communityLouvain = b.communityLouvain
RETURN p


We can now retrieve, in a click, officers of the same community from any given officer in the visualization interface. In the example below, we apply this to Boris Rotemberg, a Russian oligarch, opening an investigation on his close connections. Once the results of the query are displayed, styles and filters are essential to refine the results, reduce the noise and highlight the key elements.



Graph analytics and graph visualization are complementary. The existing graph analytics tools and methods make it possible to extract information from large amounts of connected data, generating valuable insights.


With platforms like Linkurious Enterprise, every user can take advantage of graph analytics from their browser via an intuitive interface. From detecting financial crimes, such as money laundering or tax evasion, to spotting fraud, or fighting organized crime, analysts find the insights they need.


Read more…

Creating a Great Information Dashboard

Our world is dominated by charts and graphs: the news showing economic performance, that annoying friend on social media posting their Strava story for 2017 to show off how far they ran, biked or hiked, even the infamous maps of the United States showing various states colored red and blue that people become obsessed with every four years. Charts and graphs are everywhere, and dashboards are where they work together to reach their full potential: multiple related information visualizations that can all be consumed almost simultaneously, without the distraction of scrolling to another part of the window or switching between screens or browser tabs. Dashboards are the pinnacle of information presentation systems supporting organizational decision-making. A well-crafted and successful dashboard makes a decision maker an informed decision maker.
Read more…

Dataviz with Python

This article was written by Reiichiro Nakano.

There are a number of visualizations that frequently pop up in machine learning. Scikit-plot is a humble attempt to provide aesthetically-challenged programmers (such as myself) the opportunity to generate quick and beautiful graphs and plots with as little boilerplate as possible.

Here's a quick example to generate the precision-recall curves of a Keras classifier on a sample dataset:

# Import what's needed for the Functions API
import matplotlib.pyplot as plt
import scikitplot.plotters as skplt

# This is a Keras classifier. We'll generate probabilities on the test set., y_train, batch_size=64, nb_epoch=10, verbose=2)
probas = keras_clf.predict_proba(X_test, batch_size=64)

# Now plot.
skplt.plot_precision_recall_curve(y_test, probas)

Installation of the scikit-plot library is simple! First, make sure you have the dependencies Scikit-learn and Matplotlib installed.

Then just run:

pip install scikit-plot

Or if you want, clone this repo and run

python install

at the root folder.

Originally posted here.



Read more…

Power BI: Tutorial

Guest blog by Robert Breen.

What is Power BI?

Power BI is a collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights. Whether your data is a simple Excel spreadsheet, or a collection of cloud-based and on-premises hybrid data warehouses, Power BI lets you easily connect to your data sources, visualize (or discover) what’s important, and share that with anyone or everyone you want.

Power BI can be simple and fast – capable of creating quick insights from an Excel spreadsheet or a local database. But Power BI is also robust and enterprise-grade, ready for extensive modeling and real-time analytics, as well as custom development. So, it can be your personal report and visualization tool, and can also serve as the analytics and decision engine behind group projects, divisions, or entire corporations.


What is Power BI Desktop?

Power BI Desktop is a free application you can install on your local computer that lets you connect to, transform, and visualize your data. With Power BI Desktop, you can connect to multiple different sources of data, and combine them (often called modeling) into a data model that lets you build visuals, and collections of visuals you can share as reports, with other people inside your organization. Most users who work on Business Intelligence projects use Power BI Desktop to create reports, and then use the Power BI service to share their reports with others.

The most common uses for Power BI Desktop are the following:

  • Connect to data
  • Transform and clean that data, to create a data model
  • Create visuals, such as charts or graphs, that provide visual representations of the data
  • Create reports that are collections of visuals, on one or more report pages
  • Share reports with others using the Power BI service

The people responsible for such tasks are often considered data analysts (sometimes just referred to as analysts) or Business Intelligence professionals (often referred to as report creators). However, many people who don't consider themselves an analyst or a report creator use Power BI Desktop to create compelling reports, or to pull data from various sources and build data models, which they can share with their coworkers and organizations.

With Power BI Desktop you can create complex and visually rich reports, using data from multiple sources, all in one report that you can share with others in your organization.

The steps you need to follow to install the desktop application are:

  • Once you have downloaded the file, open it and follow the instructions:



Connect to data

To get started with Power BI Desktop, the first step is to connect to data. There are many different data sources you can connect to from Power BI Desktop. To connect to your data, follow the next steps:

  • Select the Home ribbon and then select “Get Data”:

  • Then select your data source:

  • When you select a data type, you're prompted for information, such as the URL and credentials, necessary for Power BI Desktop to connect to the data source on your behalf.

Once you connect to one or more data sources, you may want to transform the data so it's useful for you.


Transform and clean data, create a model

In Power BI Desktop, you can clean and transform data using the built-in Query Editor. With Query Editor you can make changes to your data, such as changing a data type, removing columns, or combining data from multiple sources. It's a little bit like sculpting - you can start with a large block of clay (or data), then shave pieces off or add others as needed, until the shape of the data is how you want it.

If for example, you want to change the format of one column you need to follow these steps:

  • Select the column header:

  • Right-click to show the menu and select the “Change Type” option and then choose the right option for you:

  • You’ll see the results:

Each step you take in transforming data (such as renaming a table, transforming a data type, or deleting columns) is recorded by Query Editor, and each time the query connects to the data source those steps are carried out so that the data is always shaped the way you specified.

The following image shows the Query Settings pane for a query that has been shaped and turned into a model.

Once your data is how you want it, you can create visuals.


Create visuals

Once you have a data model, you can drag fields onto the report canvas to create visuals. A visual is a graphic representation of the data in your model. The following visual shows a simple column chart.

There are many different types of visuals to choose from in Power BI Desktop. To create or change a visual, just select the visual icon from the Visualizations pane. If you have a visual selected on the report canvas, the selected visual changes to the type you selected. If no visual is selected, a new visual is created based on your selection.

To create a new visual, follow the next steps:

  • Choose the appropriate chart from the “Visualizations” pane:


  • Drag your data into the “Axis” and “Value” fields:


  • And you’ll see a chart on your report canvas:


You can customize many of your chart fields and labels in the “Format Pane”:



Read more…

Here we ask you to identify which tool was used to produce the following 18 charts: 4 were done with R, 3 with SPSS, 5 with Excel, 2 with Tableau, 1 with Matlab, 1 with Python, 1 with SAS, and 1 with JavaScript. The solution, including for each chart a link to the webpage where it is explained in detail (many times with source code included) can be found here. You need to be a DSC member to access the page with the solution: you can sign-up here.

How do you score? Would this be a good job interview question?

Charts 1–18 (images not shown)



Read more…

This was originally posted here

Deep Learning is getting more and more traction. It basically focuses on one branch of Machine Learning: Artificial Neural Networks. This article explains why Deep Learning is a game changer in analytics, when to use it, and how Visual Analytics allows business analysts to leverage the analytic models built by a (citizen) data scientist.

What are Deep Learning and Artificial Neural Networks?

Deep Learning is the modern buzzword for artificial neural networks, one of many concepts and algorithms in machine learning for building analytics models. A neural network works similarly to a human brain: it takes non-linear interactions as input and transfers them to output. Neural networks leverage continuous learning and accumulate knowledge in computational nodes between input and output. A neural network is a supervised algorithm in most cases, which uses historical data sets to learn correlations to predict outputs of future events, e.g. for cross selling or fraud detection. Unsupervised neural networks can be used to find new patterns and anomalies. In some cases, it makes sense to combine supervised and unsupervised algorithms.
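The "non-linear interactions transferred from input to output" can be made concrete with a tiny hand-wired network: with one hidden layer, a network can compute XOR, which no single linear unit can represent. The weights below are set by hand for illustration rather than learned:

```python
def step(x):
    """Threshold activation: fire (1) when the weighted sum is positive."""
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    """A two-layer network with hand-set weights that computes XOR."""
    h1 = step(x1 + x2 - 0.5)    # hidden unit 1 fires on OR
    h2 = step(x1 + x2 - 1.5)    # hidden unit 2 fires on AND
    return step(h1 - h2 - 0.5)  # output fires on "OR but not AND"

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

Deep learning replaces these hand-set weights with millions of learned ones, and the hard threshold with smooth activations, but the layered input-to-output structure is the same.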

Neural networks have been used in research for many decades and include various sophisticated concepts such as the Recurrent Neural Network (RNN), the Convolutional Neural Network (CNN) or the Autoencoder. However, today’s powerful and elastic computing infrastructure, in combination with technologies like graphics processing units (GPUs) with thousands of cores, makes it possible to do much more powerful computations with many more layers. Hence the term “Deep Learning”.

The following picture from TensorFlow Playground shows an easy-to-use environment which includes various test data sets, configuration options and visualizations to learn and understand deep learning and neural networks:

If you want to learn more about the details of Deep Learning and Neural Networks, I recommend the following sources:

  • “The Anatomy of Deep Learning Frameworks”– an article about the basic concepts and components of neural networks
  • TensorFlow Playground to play around with neural networks by yourself hands-on without any coding, also available on Github to build your own customized offline playground
  • “Deep Learning Simplified” video series on YouTube, with several short, simple explanations of basic concepts, alternative algorithms and some frameworks like or TensorFlow

While Deep Learning is getting more and more traction, it is not the silver bullet for every scenario.

When (not) to use Deep Learning?

Deep Learning enables many new possibilities which were not possible in “mass production” a few years ago, e.g. image classification, object recognition, speech translation or natural language processing (NLP) in much more sophisticated ways than without Deep Learning. A key benefit is the automated feature engineering, which costs a lot of time and efforts with most other machine learning alternatives. 

You can also leverage Deep Learning to make better decisions, increase revenue or reduce risk for existing (“already solved”) problems instead of using other machine learning algorithms. Examples include risk calculation, fraud detection, cross selling and predictive maintenance.

However, note that Deep Learning has a few important drawbacks:

  • Very expensive, i.e. slow and compute-intensive; training a deep learning model often takes days or weeks, and execution also takes more time than most other algorithms.
  • Hard to interpret: the results of the analytic model lack explainability, which is often a key requirement for legal or compliance regulations.
  • Tends to overfit, and therefore needs regularization.

Deep Learning is ideal for complex problems. It can also outperform other algorithms in moderate problems. Deep Learning should not be used for simple problems. Other algorithms like logistic regression or decision trees can solve these problems easier and faster.

Open Source Deep Learning Frameworks

Neural networks are mostly adopted using one of various open source implementations. Various mature deep learning frameworks are available for different programming languages.

The following picture shows an overview of open source deep learning frameworks and evaluates several characteristics:

These frameworks have in common that they are built for data scientists, i.e. personas with experience in programming, statistics, mathematics and machine learning. Note that writing the source code is not a big task. Typically, only a few lines of codes are needed to build an analytic model. This is completely different from other development tasks like building a web application, where you write hundreds or thousands of lines of code. In Deep Learning – and Data Science in general – it is most important to understand the concepts behind the code to build a good analytic model.

Some nice open source tools like KNIME or RapidMiner allow visual coding to speed up development and also encourage citizen data scientists (i.e. people with less experience) to learn the concepts and build deep networks. These tools use their own deep learning implementations or embed other open source libraries like or DeepLearning4j as frameworks under the hood.

If you do not want to build your own model or leverage existing pre-trained models for common deep learning tasks, you might also take a look at the offerings from the big cloud providers, e.g. AWS Polly for Text-to-Speech translation, Google Vision API for Image Content Analysis, or Microsoft’s Bot Framework to build chat bots. The tech giants have years of experience with analysing text, speech, pictures and videos and offer their experience in sophisticated analytic models as a cloud service; pay-as-you-go. You can also improve these existing models with your own data, e.g. train and improve a generic picture recognition model with pictures of your specific industry or scenario.

Deep Learning in Conjunction with Visual Analytics

No matter if you want to use “just” a framework in your favourite programming language or a visual coding tool: You need to be able to make decisions based on the built neural network. This is where visual analytics comes into play. In short, visual analytics allows any persona to make data-driven decisions instead of listening to gut feeling when analysing complex data sets. See “Using Visual Analytics for Better Decisions – An Online Guide” to understand the key benefits in more detail.

A business analyst does not need to understand anything about deep learning; he or she just leverages the integrated analytic model to answer business questions. The analytic model is applied under the hood when the business analyst changes some parameters, features or data sets. However, visual analytics should also be used by the (citizen) data scientist to build the neural network. See “How to Avoid the Anti-Pattern in Analytics: Three Keys for Machine ...” to understand in more detail how technical and non-technical people should work together using visual analytics to build neural networks that help solve business problems. Even some parts of data preparation are best done within visual analytics tooling.

From a technical perspective, Deep Learning frameworks (and in a similar way any other Machine Learning frameworks, of course) can be integrated into visual analytics tooling in different ways. The following list includes a TIBCO Spotfire example for each alternative:

  • Embedded Analytics: Implemented directly within the analytics tool (self-implementation or “OEM”); can be used by the business analyst without any knowledge about machine learning (Spotfire: Clustering via some basic, simple configuration of input and output data plus cluster size)
  • Native Integration: Connectors to directly access external deep learning clusters. (Spotfire: TERR to use R’s machine learning libraries, KNIME connector to directly integrate with external tooling)
  • Framework API: Access via a Wrapper API in different programming languages. For example, you could integrate MXNet via R or TensorFlow via Python into your visual analytics tooling. This option can always be used and is appropriate if no native integration or connector is available. (Spotfire: MXNet’s R interface via Spotfire’s TERR Integration for using any R library)
  • Integrated as Service via an Analytics Server: Connect external deep learning clusters indirectly via a server-side component of the analytics tool; different frameworks can be accessed by the analytics tool in a similar fashion (Spotfire: Statistics Server for external analytics tools like SAS or Matlab)
  • Cloud Service: Access pre-trained models for common deep learning specific tasks like image recognition, voice recognition or text processing. Not appropriate for very specific, individual business problems of an enterprise. (Spotfire: Call public deep learning services like image recognition, speech translation, or Chat Bot from AWS, Azure, IBM, Google via REST service through Spotfire’s TERR / R interface)

All options have in common that you need to configure some hyperparameters, i.e. “high level” parameters like problem type, feature selection or regularization level. Depending on the integration option, this can be very technical and low level, or simplified and less flexible, using terms which the business analyst understands.

Deep Learning Example: Autoencoder Template for TIBCO Spotfire

Let’s take one specific category of neural networks as an example: autoencoders to find anomalies. An autoencoder is an unsupervised neural network that learns to replicate its input dataset while being forced through hidden layers with fewer nodes than the input. A reconstruction error is generated upon prediction. The higher the reconstruction error, the more likely that data point is an anomaly.
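
The reconstruction-error idea can be sketched in a few lines of plain Python. This is a toy stand-in, not H2O's implementation: a real autoencoder learns a compressed reconstruction, while here the "reconstruction" is simply stubbed with the per-feature means of the training data. The scoring logic (mean squared error vs. a threshold) is the same.

```python
# Toy sketch of autoencoder-style anomaly detection via reconstruction error.
# The "reconstruction" is stubbed with per-feature training means; a real
# autoencoder (e.g. H2O's) would learn a much richer reconstruction.

def fit_reconstructor(train):
    """Return a function that 'reconstructs' any point as the feature means."""
    n = len(train)
    means = [sum(row[i] for row in train) / n for i in range(len(train[0]))]
    return lambda point: means

def reconstruction_error(point, reconstruct):
    """Mean squared error between a point and its reconstruction."""
    recon = reconstruct(point)
    return sum((a - b) ** 2 for a, b in zip(point, recon)) / len(point)

def find_anomalies(train, data, threshold):
    """Flag points whose reconstruction error exceeds the threshold."""
    reconstruct = fit_reconstructor(train)
    return [p for p in data if reconstruction_error(p, reconstruct) > threshold]

# Normal sensor readings cluster around (1.0, 2.0); the outlier sticks out.
train = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0]]
data = [[1.0, 2.0], [5.0, 9.0]]
print(find_anomalies(train, data, threshold=1.0))  # only the second point is flagged
```

The threshold is the key tuning knob in practice: too low and normal variation is flagged, too high and real anomalies slip through.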

Use Cases for Autoencoders include fighting financial crime, monitoring equipment sensors, healthcare claims fraud, or detecting manufacturing defects. A generic TIBCO Spotfire template is available in the TIBCO Community for free. You can simply add your data set and leverage the template to find anomalies using Autoencoders – without any complex configuration or even coding. Under the hood, the template uses H2O's deep learning implementation and its R API. It runs in a local instance on the machine where Spotfire runs. You can also take a look at the R code, but this is optional; it is not needed to use the template at all.

Real World Example: Anomaly Detection for Predictive Maintenance

Let’s use the Autoencoder for a real-world example. In telco, you have to analyse the infrastructure continuously to find problems and issues within the network. Best before the failure happens so that you can fix it before the customer even notices the problem. Take a look at the following picture, which shows historical data of a telco network:

The orange dots are spikes which occur as a first indication of a technical problem in the infrastructure. The red dots show a constant failure, where mechanics have to replace parts of the network because it does not work anymore.

Autoencoders can be used to detect network issues before they actually happen. TIBCO Spotfire uses H2O's autoencoder in the background to find the anomalies. As discussed before, the source code is relatively concise. Here is the snippet that builds the analytic model with H2O's Deep Learning R API and detects the anomalies (by computing the reconstruction error of the Autoencoder):

This analytic model – built by the data scientist – is integrated into TIBCO Spotfire. The business analyst is able to visually analyse the historical data and the insights of the Autoencoder. This combination allows data scientists and business analysts to work together fluently. It was never easier to implement predictive maintenance and create huge business value by reducing risk and costs.

Apply Analytic Models to Real Time Processing with Streaming Analytics

This article focuses on building deep learning models with Data Science Frameworks and Visual Analytics. Key for success in projects is to apply the built analytic model to new events in real time to add business value, like increasing revenue, reducing cost or reducing risk.

“How to Apply Machine Learning to Event Processing” describes in more detail how to apply analytic models to real time processing. Or watch the corresponding video recording, which leverages TIBCO StreamBase to apply some H2O models in real time. Finally, I can recommend learning about the various streaming analytics frameworks available for applying analytic models.

Let’s come back to the Autoencoder use case to realize predictive maintenance in telcos. In TIBCO StreamBase, you can easily apply the built H2O Autoencoder model without any redevelopment via StreamBase’s H2O connector. You just attach the Java code generated by the H2O framework, which contains the analytic model and compiles to very performant JVM bytecode:

The most important lesson learned: Think about the execution requirements before building the analytic model. What performance do you need regarding latency? How many events do you need to process per minute, second or millisecond? Do you need to distribute the analytic model to a cluster with many nodes? How often do you have to improve and redeploy the analytic model? You need to answer these questions at the beginning of your project to avoid double efforts and redevelopment of analytic models!

Another important fact is that analytic models do not always need “real time processing” in terms of very fast and / or frequent model execution. In the above telco example, the spikes and failures might happen days or even weeks apart. Thus, in many use cases, it is fine to apply an analytic model once a day or week instead of applying it to every new event, every second.

Deep Learning + Visual Analytics + Streaming Analytics = Next Generation Big Data Success Stories

Deep Learning allows you to solve many well understood problems like cross selling, fraud detection or predictive maintenance in a more efficient way. In addition, you can address scenarios that were not possible to solve before, like accurate and efficient object detection or speech-to-text translation.

Visual Analytics is a key component in Deep Learning projects to be successful. It eases the development of deep neural networks by (citizen) data scientists and allows business analysts to leverage these analytic models to find new insights and patterns.

Today, (citizen) data scientists use programming languages like R or Python, deep learning frameworks like Theano, TensorFlow, MXNet or H2O’s Deep Water and a visual analytics tool like TIBCO Spotfire to build deep neural networks. The analytic model is embedded into a view for the business analyst to leverage it without knowing the technology details.

In the future, visual analytics tools might embed neural network features like they already embed other machine learning features like clustering or logistic regression today. This will allow business analysts to leverage Deep Learning without the help of a data scientist and be appropriate for simpler use cases.

However, do not forget that building an analytic model to find insights is just the first part of a project. Deploying it to real time processing afterwards is just as important a second step. Good integration between tooling for finding insights and applying insights to new events can improve time-to-market and model quality in data science projects significantly. The development lifecycle is a continuous closed loop: the analytic model needs to be validated and rebuilt at regular intervals.

Read more…

7 Visualizations You Should Learn in R

This blog was originally posted here

With ever-increasing volumes of data, it is impossible to tell stories without visualizations. Data visualization is the art of turning numbers into useful knowledge.

R Programming lets you learn this art by offering a set of inbuilt functions and libraries to build visualizations and present data. Before the technical implementations of the visualization, let’s first see how to select the right chart type.

Selecting the Right Chart Type

There are four basic presentation types:

  1. Comparison
  2. Composition
  3. Distribution
  4. Relationship

To determine which amongst these is best suited for your data, I suggest you answer a few questions like:

  • How many variables do you want to show in a single chart?
  • How many data points will you display for each variable?
  • Will you display values over a period of time, or among items or groups?

Below is a great explanation of selecting the right chart type by Dr. Andrew Abela.

In your day-to-day activities, you’ll come across the 7 charts listed below most of the time.

  1. Scatter Plot
  2. Histogram
  3. Bar & Stack Bar Chart
  4. Box Plot
  5. Area Chart
  6. Heat Map
  7. Correlogram

To learn about the 7 charts listed above, click here. For more articles about R, click here.

Read more…

Dashboards for Everyone!

No matter the job, most professionals do some level of analysis on their computer.  There are always some data sets that live outside the walls.  Or, some analyses that we know could be performed better in a not-easily-sharable tool such as Excel, R, Python, SPSS, SAS and so on.

So how do you share your personal analysis with others?  Oftentimes, people export the graphs and tables to add to a presentation file.  One of the largest downfalls of this approach is that it can cause versioning and updating nightmares.

What if I told you that we could avoid all of this with dashboards?  Some of you may say, "Yes, obviously, Laura.  But I don't have a licensed BI tool or BI experts at my disposal!  It's not a realistic scenario for me."  Now in the past, I might've agreed with you.  If you don't have a paid BI tool, it can be tricky.  Free BI tool versions usually require the owner to host the software, or they limit the number of charts, viewers or users using the tool.

However, earlier this year, Google removed a number of restrictions to their free hosted dashboarding software called Google Data Studio.  Because of this, I decided to give the software a test drive and see how accessible it is to the non-BI expert.

Below I will take you through a tutorial that I wrote which should allow anyone to create a Google Data Studio dashboard about US Home Prices.  It should take about half an hour of your time.  It really is that easy.  So please, have a try and let me know how it goes!

The Tutorial Description

For this tutorial I wanted to use some sample data to make a basic one page dashboard.  It will feature some common dashboard elements such as: text, images, summary metrics, summary tables and maps.  To do so I searched out free data sets and found out that Zillow offers summary data collected through their real estate business. 

Side note: Thank you Zillow, I love when companies share their data! 

I downloaded a number of the data sets that I thought would be interesting to display and did a little data processing to make dashboard creation easier.  From there I set out to make a dashboard without reading any instructions to see how usable it really is.  I have to say, it was easy!  There are some odd beta style behaviors that I outline below, but all in all it is a great solution. 

The Tutorial Steps

1.  Download the sample data set needed to create the sample.  

Note: if you have trouble downloading the file from github, go to the main page and select "Clone or Download" and then "Download Zip" as per the picture below.

2.  Sign up for Google Data Studio

3.  Click "Start a New Report"

4. In the new report, add the file "Zillow_Summary_Data_2017-06.csv" downloaded as part of the zip file from the data set in step 1.

5.  Modify the columns of the data set to ensure that "State" is of type "Geo">"Region" with no aggregation and the remaining columns are type "Numeric" >"Number" with "Average" as the aggregation.

6.  Click "Add to Report".  This will make the data source accessible to your new report.

Now we are ready to start building the report piece by piece.  To make it easier, I have broken up the dashboard content into 5 pieces that can be added.  We will tackle these one by one.

To add each of the components above, you will need to use the Google Data Studio Toolbar on the top navigation.  The image below highlights each of the toolbar items that we will be using.

7.  "A. Text"- Easy street. Let's add some text to the dashboard. Start by clicking the "Text" button highlighted in the toolbar above.  Next, take the cross-hair and drag it over the space you want the text to occupy. Enter your text: "US Home Prices".  In the "Text Properties" select the size and type.  I'm using size 72 and type "Roboto Condensed".

8. "B. Image"- Easy street part 2.  Now we simply add a pretty picture to the dashboard.  Start by clicking the "Image" button highlighted in the toolbar above.  Take the cross-hair and drag it over the space you want the image to occupy.  Select the image "houseimage.jpg" that you downloaded from the GH repo.

9. "C. Scorecard Values"- Now we get into the real dashboarding exercises through metrics and calculations.  Start by clicking the "Scorecard" button highlighted in the toolbar above.  Take the cross-hair and drag it over the space you want the first scorecard value to occupy. In the "Data" tab, select the data set and appropriate metric.  Start with the values in the image above.  In the "Style" tab, select size 36 with the type "Roboto".

Repeat this for every metric in the "C. Scorecard Values" section.

10. "D. Map" - In this step we get more impressive, but not more difficult.  We implement a map! Start by clicking the "Geo Map" button highlighted in the toolbar above. Take the cross-hair and drag it over the space you want the map to occupy.  Select the data set and appropriate metric as per the values in the image above.

11. "E. List"- Now we are going to list out all values in the Geo Map above ordered by their metric "Average Home Value".  Start by clicking the "Table" button highlighted in the toolbar above.  Take the cross-hair and drag it over the space you want the list to occupy. Select the data set and appropriate metric as per the values in the image above.

12.  Make the Report External and Share.  Click the person + icon in the top right of your screen.  Select "Anyone with the link can view".  Copy the external URL and click done.  Now take that external URL and send to all your friends and family with the subject "Prepare to be amazed".

And there you have it, your dashboard is created and you can share away!

Some Criticisms

As I'm sure was obvious from above, I'm impressed with their offering.  But I do feel it is my duty to outline some oddities I came across.  For example: when you set up your data source, you need to specify ahead of time for each column what type of summary you plan on doing with that value.  If you want to use a chart to display averages, you cannot select this within the chart dynamically, it has to be at the data source.  I find this odd and limiting.  Additionally, the csv import has a 200 column limit and there are some formatting annoyances.  

More Details

Google has recently released the ability to embed dashboards!  See the step by step here.

Final Note

I'm happy that I tried out Google Data Studio.  While it does not meet my current needs at the enterprise level, I am very impressed at its applicability and accessibility for the personal user.  I truly believe that anyone could make a dashboard with this tool.  So give it a try, impress your colleagues and mobilize your analysis with Google Data Studio!

Original document on

Written by Laura Ellis

Read more…

Pictographs are exceptionally good for some types of data. In this post, I show how useful they are for displaying proportions (e.g. rates, percentages, fractions).

Look at the pictograph example on the right. It shows the case fatality rate using colored stick figure icons. These quantities could be just as appropriately shown using pie or bar charts (see above). However, the pictorial representation makes this statistic intuitive: out of every 100 individuals infected with SARS, you can expect 11 to die.

Pictographs have an intrinsic scale

The icons give the pictograph an intrinsic scale. Compare the pictograph (right) to the bar chart (below). Both charts show that SARS is 3 times more deadly than pertussis, but the advantage of using a pictograph can be seen when we compare the other diseases. The pictograph clearly shows that the fatality rate for SARS is an order of magnitude bigger than that for smallpox. By contrast, on the bar chart, all we can see in the absence of any labels is that SARS is much bigger than smallpox.

The finer resolution provided by the icons is especially useful for the smaller values. In the bar chart, the much larger fatality rate of SARS makes the variation between the other diseases hard to see. But in the pictograph, it is clear that the smallpox fatality rate is at least double that of malaria.

Pictographs show quantities visually

A well designed pictograph makes quantities easy to read. In the example on the right, the small scale and the large number of icons can potentially cause problems. I avoid this by arranging the icons into 10 by 10 squares. Even without explicitly counting each icon, quantities can be evaluated by comparing the area of the square which is red.
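
The 10 by 10 arrangement described above is easy to sketch. Here is a hypothetical text-based rendering in Python (not any charting tool's API), with '#' standing in for a colored icon and '.' for a grey one:

```python
# Render a proportion (0..1) as a square grid of text icons, mimicking the
# 10x10 pictograph layout: the first round(rate * 100) icons are "filled".

def pictograph(rate, filled="#", empty=".", width=10):
    """Return a width x width grid of icons for the given proportion."""
    count = round(rate * width * width)          # number of filled icons
    icons = filled * count + empty * (width * width - count)
    rows = [icons[i:i + width] for i in range(0, len(icons), width)]
    return "\n".join(rows)

# The SARS case fatality rate from the example: 11 out of every 100.
print(pictograph(0.11))
```

Because each row holds exactly ten icons, the reader can judge the quantity by eye (one full row plus one extra icon) without counting every symbol, which is the intrinsic-scale advantage discussed above.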

The example on the right shows data labels in order to provide a greater level of detail. However, the main message of the chart – the enormous difference between the severity of different diseases – is effectively conveyed by the icons alone.

You can create your own pictograph or read more content here.


Data from

Author: Carmen Chan

Carmen is a member of the Data Science team at Displayr. She enjoys looking for better ways to manipulate and visualize data. Carmen studied statistics and bioinformatics at the University of New South Wales.

Read more…

Kim versus Donald in one Picture

How do you convey a powerful message in just one picture? 

DSC Resources

Popular Articles

Read more…

Originally posted on Data Science Central

This infographic came from Medigo. It displays data from the World Health Organization’s “Projections of mortality and causes of death, 2015 and 2030”. The report details all deaths in 2015 by cause and makes predictions for 2030, giving an impression of how global health will develop over the next 14 years. Also featured is data from showing how life expectancy will change between now and 2030.

All percentages shown have been calculated relative to projected changes in population growth.

Read original article here

DSC Resources

Additional Reading

Follow us on Twitter: @DataScienceCtrl | @AnalyticBridge

Read more…
From time to time I keep pondering on what the future could be, and I am sure a lot of us get this science fiction imagery where the future data analyst will be given just a pair of holographic gloves and perform three dimensional analysis. Let us stop day dreaming and get to the basics. Dashboards have come a long way. These days a lot of vendors are catering towards consuming big data and the like, but I remember the days when the common target of BI vendors was "Excel". Looks like the focus has totally shifted from "Excel as the enemy" to "Big Data as the elephant".
Read more…

Originally posted on Data Science Central

This infographic on Shopper Marketing was created by Steve Hashman and his team. Steve is Director at Exponential Solutions (The CUBE) Marketing. 

Shopper marketing focuses on the customer in and at the point of purchase. It is an integrated and strategic approach to a customer’s in-store experience which is seen as a driver of both sales and brand equity.

For more information, click here

DSC Resources

Additional Reading

Follow us on Twitter: @DataScienceCtrl | @AnalyticBridge

Read more…

Why Your Brain Needs Data Visualization

Why Your Brain Needs Data Visualization

This is a well-known fact nowadays: a goldfish has a higher attention span than an average Internet user. That’s the reason why you’re not interested in reading huge paragraphs of text. Research by Nielsen Norman Group showed that Internet users have time to read at most 28% of the words on a web page. Most of them read only 20%. Visual content, on the other hand, has the power to hold your attention longer.

If you were just relying on the Internet as a casual user, not reading all the text wouldn’t be a problem. However, when you have a responsibility to process information, things get more complicated. A student, for example, has to read several academic and scientific studies and process a huge volume of data to write a single research paper. 65% of people are visual learners, so they find text difficult to process. The pressing deadline will eventually lead the student to hiring the best coursework writing service. If they present the data visually, however, they will need less time to process it and get their own ideas for the paper.

Let’s explore some reasons why your brain needs that kind of visualization.

1.     Visual Data Triggers Preattentive Processing

Our low-level visual system needs less than 200-250 milliseconds to accurately detect visual properties. That capacity of the brain is called pre-attentive processing. It is triggered by colors, patterns, and forms. When you use different colors to create data visualization, you emphasize the important details, so those are the elements your eye will first catch. You will use your long-term memory to interpret that data and connect it with information you already know. 

2.     You Need a Visual Tier to Process Large Volumes of Data

When you’re dealing with production or sales, you face a huge volume of data you need to process, compare, and evaluate. If you represented it through a traditional Excel spreadsheet, you would have to invest countless hours looking through the tiny rows of data. Through data visualization, you can interpret the information in a way that makes it ready for your brain to process.

3.     Visual Data Brings Together All Aspects of Memory

The memory functions of our brain are quite complex. We have three aspects of memory: sensory, short-term (also known as working memory), and long-term. When we first hear, see, touch, taste, or smell something, our senses trigger the sensory memory. While processing information, we preserve it in the working memory for a short period of time. The long-term memory function enables us to preserve information for a very long time.

Visual data presentation connects these three memory functions. When we see the information presented in a visually-attractive way, it triggers our sensory memory and makes it easy for us to process it (working memory). When we process that data, we essentially create a new “long-term memory folder” in our brain.

Data visualization is everywhere. Internet marketing experts understood it, and the most powerful organizations on a global level understood it, too. It’s about time we started implementing it in our own practice.

Read more…

Originally posted on Data Science Central

Do you want to learn the history of data visualization? Or do you want to learn how to create more engaging visualizations and see some examples? It’s easy to feel overwhelmed with the amount of information available today, which is why sometimes the answer can be as simple as picking up a good book.

These seven amazing data visualization books are a great place for you to get started:

1) Show Me the Numbers: Designing Tables and Graphs to Enlighten, Second Edition

Stephen Few

2) The Accidental Analyst: Show Your Data Who’s Boss

Eileen and Stephen McDaniel

3) Information Graphics

Sandra Rendgen, Julius Wiedemann

4) Visualize This: The FlowingData Guide to Design, Visualization, and Statistics

Nathan Yau

5) Storytelling with Data

Cole Nussbaumer Knaflic

6) Cool Infographics

Randy Krum

7) Designing Data Visualizations: Representing Informational Relationships

Noah Iliinsky, Julie Steele

To check out the 7 data visualization books, click here. For other articles about data visualization, click here.

Top DSC Resources

Follow us on Twitter: @DataScienceCtrl | @AnalyticBridge

Read more…

10 Dataviz Tools To Enhance Data Science

Originally posted on Data Science Central

This article on data visualization tools was written by Jessica Davis. She's passionate about the practical use of business intelligence, predictive analytics, and big data for smarter business and a better world.

Data visualizations can help business users understand analytics insights and actually see the reasons why certain recommendations make the most sense. Traditional business intelligence and analytics vendors, as well as newer market entrants, are offering data visualization technologies and platforms.

Here's a collection of 10 data visualization tools worthy of your consideration:

Tableau Software

Tableau Software is perhaps the best known platform for data visualization across a wide array of users. Some Coursera courses dedicated to data visualization use Tableau as the underlying platform. The Seattle-based company describes its mission this way: "We help people see and understand their data."

This company, founded in 2003, offers a family of interactive data visualization products focused on business intelligence. The software is offered in desktop, server, and cloud versions. There's also a free public version used by bloggers, journalists, quantified-self hobbyists, sports fans, political junkies, and others.

Tableau was one of three companies featured in the Leaders square of the 2016 Gartner Magic Quadrant for Business Intelligence and Analytics Platforms.


Qlik

Qlik was founded in Lund, Sweden in 1993. It's another of the Leaders in Gartner's 2016 Magic Quadrant for Business Intelligence and Analytics Platforms. Now based in Radnor, Penn., Qlik offers a family of products that provide data visualization to users. Its new flagship Qlik Sense offers self-service visualization and discovery. The product is designed for drag-and-drop creation of interactive data visualizations. It's available in versions for desktop, server, and cloud.

Oracle Visual Analyzer

Gartner dropped Oracle from its 2016 Magic Quadrant Business Intelligence and Analytics Platform report. One of the company's newer products, Oracle Visual Analyzer, could help the database giant make it back into the report in years to come.

Oracle Visual Analyzer, introduced in 2015, is a web-based tool provided within the Oracle Business Intelligence Cloud Service. It's available to existing customers of Oracle's Business Intelligence Cloud. The company's promotional materials promise advanced analysis and interactive visualizations. Configurable dashboards are also available.

SAS Visual Analytics

SAS is one of the traditional vendors in the advanced analytics space, with a long history of offering analytical insights to businesses. SAS Visual Analytics is among its many offerings.

The company offers a series of sample reports showing how visual analytics can be applied to questions and problems in a range of industries. Examples include healthcare claims, casino performance, digital advertising, environmental reporting, and the economics of Ebola outbreaks.

Microsoft Power BI

Microsoft Power BI, the software giant's entry in the data visualization space, is the third and final company in the Leaders square of the Gartner 2016 Magic Quadrant for Business Intelligence and Analytics Platforms.

Power BI is not a monolithic piece of software. Rather, it's a suite of business analytics tools Microsoft designed to enable business users to analyze data and share insights. Components include Power BI dashboards, which offer customizable views for business users for all their important metrics in real-time. These dashboards can be accessed from any device.

Power BI Desktop is a data-mashup and report-authoring tool that can combine data from several sources and then enable visualization of that data. Power BI gateways let organizations connect SQL Server databases and other data sources to dashboards.

TIBCO Spotfire

TIBCO acquired data discovery specialist Spotfire in 2007. The company offers the technology as part of its lineup of data visualization and analytics tools. TIBCO updated Spotfire in March 2016 to improve core visualizations. The updates expand built-in data access and data preparation functions, and improve data collaboration and mashup capabilities. The company also redesigned its Spotfire server topology with simplified web-based admin tools.

ClearStory Data

Founded in 2011, ClearStory Data is one of the newer players in the space. Its technology lets users discover and analyze data from corporate, web, and premium data sources, including relational databases, Hadoop, web and social application interfaces, as well as third-party data providers. The company offers a set of solutions for vertical industries. Its customers include Del Monte, Merck, and Coca-Cola.


Sisense

The web-enabled platform from Sisense offers interactive dashboards that let users join and analyze big and multiple datasets and share insights. Gartner named the company a Niche Player in its Magic Quadrant report for Business Intelligence and Analytics Platforms. The research firm said the company was one of the top two in terms of accessing large volumes of data from Hadoop and NoSQL data sources. Customers include eBay, Lockheed Martin, Motorola, Experian, and Fujitsu.

Dundas BI

Mentioned as a vendor to watch by Gartner, but not included in the company's Magic Quadrant for Business Intelligence and Analytics Platforms, Dundas BI enables organizations to create business intelligence dashboards for the visualization of key business metrics. The platform also enables data discovery and exploration with drag-and-drop menus. According to the company's website, a variety of data sources can be connected, including relational, OLAP, flat files, big data, and web services. Customers include AAA, Bank of America, and Kaiser Permanente.


InetSoft

InetSoft is another vendor that didn't qualify for the Gartner report, but was mentioned by the research firm as a company to watch.

InetSoft offers a colorful gallery of BI Visualizations. A free version of its software provides licenses for two users. It lets organizations take the software for a test drive. Serious users will want to upgrade to the paid version. Customers include Flight Data Services, eScholar, ArcSight, and

You can find the original article, here. For other articles about data visualization, click here.

Top DSC Resources

Follow us on Twitter: @DataScienceCtrl | @AnalyticBridge

Read more…

Whether you're working on a school presentation or preparing a monthly sales report for your boss, presenting your data in a detailed and easy-to-follow form is essential. It's hard to keep the focus of your audience if you can't help them fully understand the data you're trying to explain. The best way to understand complex data is to show your results in graphic form. This is the main reason why data visualization has become a key part of all presentations and data analysis. But let's see the top 5 benefits of using data visualization in your work.

Easier data discovery

Visualization of your data helps you and your audience find specific information. Pointing out a specific piece of information in one-dimensional graphics can be difficult if you have a lot of data to work with. Data visualization makes this effort a whole lot easier.

Simple way to trace data correlations

Sometimes it's hard to notice the correlation between two sets of data. If you present your data in graphic form, you can see how one set of data influences another. This is a major benefit, as it greatly reduces the effort you need to invest.
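As a minimal sketch of this idea, the hypothetical snippet below generates two related series (ad spend and sales, invented for illustration), plots them as a scatter chart where the relationship becomes visible at a glance, and then confirms what the eye sees with a correlation coefficient:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render to file, no display needed
import matplotlib.pyplot as plt

# Hypothetical data: sales loosely driven by advertising spend
rng = np.random.default_rng(0)
ad_spend = rng.uniform(10, 100, size=50)
sales = 3.0 * ad_spend + rng.normal(0, 20, size=50)

# A scatter plot makes the relationship between the two series obvious
fig, ax = plt.subplots()
ax.scatter(ad_spend, sales)
ax.set_xlabel("Ad spend")
ax.set_ylabel("Sales")
fig.savefig("correlation.png")

# The Pearson coefficient quantifies what the plot already shows
r = np.corrcoef(ad_spend, sales)[0, 1]
print(f"correlation: {r:.2f}")
```

Spotting the upward trend in the scatter plot takes a second; spotting it in two raw columns of 50 numbers takes considerably longer.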

Live interaction with data

Data visualization offers you the benefit of live interaction with any piece of data you need. This enables you to spot changes in the data as they happen. And you don't just get simple information about the change; you also get predictive analysis.

Promote a new business language

One of the major benefits of data visualization over simple graphic solutions is the ability to "tell a story" through data. For example, with a simple graphic chart, you get a piece of information and that's it. Data visualization enables you not only to see the information but also to understand the reasons behind it.

Identify trends

The ability to identify trends is one of the most interesting benefits that data visualization tools have to offer. You can watch the progress of certain data over time and see the reasons for those changes. With predictive analysis, you can also forecast the behavior of those trends in the future.


Data visualization tools have become a necessity in modern data analysis. This need has spurred the launch of many businesses that offer data visualization services.

All in all, data visualization tools have taken analytics to a whole new level and allowed better insight into business data. Let us know about your experience with data visualization tools and how you use them; we'd love to hear how they improved your work.

Read more…

3D Data Visualisation Survey

Back in 2012 we released Datascape, a general purpose 3D immersive data visualisation application. We are now getting ready to release our 2nd generation application, Datascape2XL, which allows you to plot and interact with over 15 million data points in a 3D space, and view them with either a conventional PC screen or an Oculus Rift virtual reality headset (if you must...).

In order to inform our work we have created a survey to examine the current "state of the market" in terms of what applications people are using for data visualisation, how well they are meeting needs, and what users want of a 3D visual analytics application. The survey builds on an earlier survey we did in 2012, the results of which are still available on our web site.

We will again be producing a survey report for public consumption, which you can sign up to receive at the end and we'll also post up here.

The aim of this survey is to understand current use of, and views on, data visualisation and visual analytics tools. We recognise that this definition can include a wide variety of different application types from simple Excel charts and communications orientated infographics to specialist financial, social media and even intelligence focussed applications. We hope that we have pitched this initial survey at the right level to get feedback from users from across the spectrum.

I hope that you can find 5 minutes to complete the survey - which you can find here:




Read more…
