
Featured Posts

Dashboards for Everyone!

No matter the job, most professionals do some level of analysis on their computers.  There are always some data sets that live outside the corporate walls, or analyses that we know could be performed better in a tool that isn't easily shared, such as Excel, R, Python, SPSS, SAS and so on.

So how do you share your personal analysis with others?  Often, people export the graphs and tables and paste them into a presentation file.  One of the biggest drawbacks of this approach is that it can cause versioning and updating nightmares.

What if I told you that we could avoid all of this with dashboards?  Some of you may say, "Yes, obviously, Laura.  But I don't have a licensed BI tool or BI experts at my disposal!  It's not a realistic scenario for me."  In the past, I might have agreed with you.  If you don't have a paid BI tool, it can be tricky.  Free BI tool versions usually require the owner to host the software, or they limit the number of charts, viewers or users.

However, earlier this year Google removed a number of restrictions from its free, hosted dashboarding software, Google Data Studio.  Because of this, I decided to give the software a test drive and see how accessible it is to the non-BI expert.

Below I will take you through a tutorial I wrote that should allow anyone to create a Google Data Studio dashboard about US home prices.  It should take about half an hour of your time.  It really is that easy.  So please give it a try and let me know how it goes!

The Tutorial Description

For this tutorial I wanted to use some sample data to make a basic one-page dashboard.  It features some common dashboard elements: text, images, summary metrics, summary tables and maps.  To find data, I searched for free data sets and discovered that Zillow offers summary data collected through its real estate business.

Side note: Thank you Zillow, I love when companies share their data! 

I downloaded a number of the data sets that I thought would be interesting to display and did a little data processing to make dashboard creation easier.  From there I set out to build a dashboard without reading any instructions, to see how usable the tool really is.  I have to say, it was easy!  There are some odd beta-style behaviors, which I outline below, but all in all it is a great solution.
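For context, the preprocessing I mention is the kind of step you can reproduce in a few lines of Python with pandas. The sketch below is only illustrative: the input file and column names are placeholders rather than the exact fields in the Zillow downloads; it simply rolls a per-region extract up to one row per state and writes the summary file used later in the tutorial.

import pandas as pd

# Hypothetical Zillow extract with one row per region; column names are placeholders.
raw = pd.read_csv("zillow_home_values.csv")

# Aggregate to one row per state so the dashboard widgets stay simple.
summary = (
    raw.groupby("State", as_index=False)
       .agg(AverageHomeValue=("HomeValue", "mean"),
            MedianRent=("Rent", "median"))
)

# Write the state-level summary used by the dashboard.
summary.to_csv("Zillow_Summary_Data_2017-06.csv", index=False)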

The Tutorial Steps

1.  Download the sample data set needed to build the dashboard.

Note: if you have trouble downloading the file from github, go to the main page and select "Clone or Download" and then "Download Zip" as per the picture below.

2.  Sign up for Google Data Studio

3.  Click "Start a New Report"

4. In the new report, add the file "Zillow_Summary_Data_2017-06.csv" downloaded as part of the zip file from the data set in step 1.

5.  Modify the columns of the data set to ensure that "State" is of type "Geo">"Region" with no aggregation and the remaining columns are type "Numeric" >"Number" with "Average" as the aggregation.

6.  Click "Add to Report".  This will make the data source accessible to your new report.

Now we are ready to start building the report piece by piece.  To make it easier, I have broken up the dashboard content into 5 pieces that can be added.  We will tackle these one by one.

To add each of the components above, you will need to use the Google Data Studio Toolbar on the top navigation.  The image below highlights each of the toolbar items that we will be using.

7.  "A. Text"- Easy street. Let's add some text to the dashboard. Start by clicking the "Text" button highlighted in the toolbar above.  Next, take the cross-hair and drag it over the space you want the text to occupy. Enter your text: "US Home Prices”.  In the “Text Properties” select the size and type.  I’m using size 72 and type “Roboto Condensed".

8. "B. Image"- Easy street part 2.  Now we are simply a pretty picture to the dashboard.  Start by clicking the "Image" button highlighted in the toolbar above.  Take the cross-hair and drag it over the space you want the image to occupy.  Select the image "houseimage.jpg" that you downloaded from the GH repo.

9. "C. Scorecard Values"- Now we get into the real dashboarding exercises through metrics and calculations.  Start by clicking the "Scorecard" button highlighted in the toolbar above.  Take the cross-hair and drag it over the space you want the first scorecard value to occupy. In the “data” tab, Select the data set and appropriate metric.  Start with the values in the image above  In the “style” tab select size 36 with the type "Roboto".

Repeat this for every metric in the "C. Scorecard Values" section.

10. "D. Map" - In this step we get more impressive, but not more difficult.  We implement a map! Start by clicking the "Geo Map" button highlighted in the toolbar above. Take the cross-hair and drag it over the space you want the map to occupy.  Select the data set and appropriate metric as per the values in the image above.

11. "E. List"- Now we are going to list out all values in the Geo Map above ordered by their metric "Average Home Value".  Start by clicking the "Table" button highlighted in the toolbar above.  Take the cross-hair and drag it over the space you want the list to occupy. Select the data set and appropriate metric as per the values in the image above.

12.  Make the Report External and Share.  Click the person + icon in the top right of your screen.  Select "Anyone with the link can view".  Copy the external URL and click done.  Now take that external URL and send to all your friends and family with the subject "Prepare to be amazed".

And there you have it, your dashboard is created and you can share away!

Some Criticisms

As I'm sure was obvious from the above, I'm impressed with the offering.  But I do feel it is my duty to outline some oddities I came across.  For example, when you set up your data source, you need to specify in advance, for each column, what type of aggregation you plan to use for that value.  If you want a chart to display averages, you cannot select this within the chart dynamically; it has to be set at the data source.  I find this odd and limiting.  Additionally, the CSV import has a 200-column limit, and there are some formatting annoyances.

More Details

Google has recently released the ability to embed dashboards!  See the step-by-step guide here.

Final Note

I'm happy that I tried out Google Data Studio.  While it does not meet my current needs at the enterprise level, I am very impressed by its applicability and accessibility for the personal user.  I truly believe that anyone could make a dashboard with this tool.  So give it a try, impress your colleagues and mobilize your analysis with Google Data Studio!

Original document on littlemissdata.com

Written by Laura Ellis

Read more…

Pictographs are exceptionally good for some types of data. In this post, I show how useful they are for displaying proportions (e.g. rates, percentages, fractions).

Look at the pictograph example on the right. It shows the case fatality rate using colored stick figure icons. These quantities could be just as appropriately shown using pie or bar charts (see above). However, the pictorial representation makes this statistic intuitive: out of every 100 individuals infected with SARS, you can expect 11 to die.

Pictographs have an intrinsic scale

The icons give the pictograph an intrinsic scale. Compare the pictograph (right) to the bar chart (below). Both charts show that SARS is 3 times more deadly than pertussis, but the advantage of using a pictograph can be seen when we compare the other diseases. The pictograph clearly shows that the fatality rate for SARS is an order of magnitude bigger than that for smallpox. By contrast, on the bar chart, all we can see in the absence of any labels is that SARS is much bigger than smallpox.

The finer resolution provided by the icons is especially useful for the smaller values. In the bar chart, the much larger fatality rate of SARS makes the variation between the other diseases hard to see. But in the pictograph, it is clear that the smallpox fatality rate is at least double that of malaria.

Pictographs show quantities visually

A well designed pictograph makes quantities easy to read. In the example on the right, the small scale and the large number of icons can potentially cause problems. I avoid this by arranging the icons into 10 by 10 squares. Even without explicitly counting each icon, quantities can be evaluated by comparing the red area of each square.
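As a rough illustration of that 10-by-10 layout, here is a small matplotlib sketch (not the tool behind the original chart) that draws 100 icons per disease and colours a fraction of them red. The rates below are approximate placeholders chosen to match the comparisons discussed here, not the source data.

import matplotlib.pyplot as plt

# Approximate, placeholder case fatality rates (fraction of cases that are fatal).
diseases = {"SARS": 0.11, "Pertussis": 0.037, "Smallpox": 0.011}

fig, axes = plt.subplots(1, len(diseases), figsize=(9, 3))
for ax, (name, rate) in zip(axes, diseases.items()):
    n_red = round(rate * 100)          # icons out of 100 coloured red
    for i in range(100):               # 10 x 10 grid of "person" icons
        row, col = divmod(i, 10)
        colour = "red" if i < n_red else "lightgrey"
        ax.plot(col, row, marker="o", markersize=8, color=colour)
    ax.set_title(f"{name}: {rate:.1%}")
    ax.set_axis_off()

plt.tight_layout()
plt.show()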

The example on the right shows data labels in order to provide a greater level of detail. However, the main message of the chart – the enormous difference between the severity of different diseases – is effectively conveyed by the icons alone.

You can create your own pictograph or read more content here.

Acknowledgments

Data from https://en.wikipedia.org/wiki/List_of_human_disease_case_fatality_rates

Author: Carmen Chan

Carmen is a member of the Data Science team at Displayr. She enjoys looking for better ways to manipulate and visualize data. Carmen studied statistics and bioinformatics at the University of New South Wales.

Read more…

Dataviz with Python

This article was written by Reiichiro Nakano.

There are a number of visualizations that frequently pop up in machine learning. Scikit-plot is a humble attempt to provide aesthetically-challenged programmers (such as myself) the opportunity to generate quick and beautiful graphs and plots with as little boilerplate as possible.

Here's a quick example to generate the precision-recall curves of a Keras classifier on a sample dataset:

# Import what's needed for the Functions API
import matplotlib.pyplot as plt
import scikitplot.plotters as skplt

# keras_clf is assumed to be an already-built Keras classifier, and
# X_train, y_train, X_test, y_test an existing train/test split.
# Fit the classifier, then generate class probabilities on the test set.
keras_clf.fit(X_train, y_train, batch_size=64, nb_epoch=10, verbose=2)
probas = keras_clf.predict_proba(X_test, batch_size=64)

# Plot precision-recall curves from the true labels and predicted probabilities.
skplt.plot_precision_recall_curve(y_test, probas)
plt.show()

Installation of the scikit-plot library is simple! First, make sure you have the dependencies Scikit-learn and Matplotlib installed.

Then just run:

pip install scikit-plot

Or if you want, clone this repo and run

python setup.py install

at the root folder.

Originally posted here.


Read more…

Power BI: Tutorial

Guest blog by Robert Breen.

What is Power BI?

Power BI is a collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights. Whether your data is a simple Excel spreadsheet, or a collection of cloud-based and on-premises hybrid data warehouses, Power BI lets you easily connect to your data sources, visualize (or discover) what’s important, and share that with anyone or everyone you want.

Power BI can be simple and fast – capable of creating quick insights from an Excel spreadsheet or a local database. But Power BI is also robust and enterprise-grade, ready for extensive modeling and real-time analytics, as well as custom development. So, it can be your personal report and visualization tool, and can also serve as the analytics and decision engine behind group projects, divisions, or entire corporations.


What is Power BI Desktop?

Power BI Desktop is a free application you can install on your local computer that lets you connect to, transform, and visualize your data. With Power BI Desktop, you can connect to multiple different sources of data, and combine them (often called modeling) into a data model that lets you build visuals, and collections of visuals you can share as reports, with other people inside your organization. Most users who work on Business Intelligence projects use Power BI Desktop to create reports, and then use the Power BI service to share their reports with others.

The most common uses for Power BI Desktop are the following:

  • Connect to data
  • Transform and clean that data, to create a data model
  • Create visuals, such as charts or graphs, that provide visual representations of the data
  • Create reports that are collections of visuals, on one or more report pages
  • Share reports with others using the Power BI service

People most often responsible for such tasks are often considered data analysts (sometimes just referred to as analysts) or Business Intelligence professionals (often referred to as report creators). However, many people who don't consider themselves an analyst or a report creator use Power BI Desktop to create compelling reports, or to pull data from various sources and build data models, which they can share with their coworkers and organizations.

With Power BI Desktop you can create complex and visually rich reports, using data from multiple sources, all in one report that you can share with others in your organization.

The steps you need to follow to install the desktop application are:

  • Once you have downloaded the installer, open it and follow the instructions:


Connect to data

To get started with Power BI Desktop, the first step is to connect to data. There are many different data sources you can connect to from Power BI Desktop. To connect to your data, follow the next steps:

  • Select the Home ribbon and then select “Get Data”:

  • Then select your data source:

  • When you select a data type, you're prompted for information, such as the URL and credentials, necessary for Power BI Desktop to connect to the data source on your behalf.

Once you connect to one or more data sources, you may want to transform the data so it's useful for you.


Transform and clean data, create a model

In Power BI Desktop, you can clean and transform data using the built-in Query Editor. With Query Editor you can make changes to your data, such as changing a data type, removing columns, or combining data from multiple sources. It's a little bit like sculpting - you can start with a large block of clay (or data), then shave pieces off or add others as needed, until the shape of the data is how you want it.

If for example, you want to change the format of one column you need to follow these steps:

  • Select the column header:

  • Right-click to show the menu and select the “Change Type” option and then choose the right option for you:

  • You’ll see the results:

Each step you take in transforming data (such as renaming a table, changing a data type, or deleting columns) is recorded by Query Editor, and each time the query connects to the data source those steps are carried out so that the data is always shaped the way you specified.

The following image shows the Query Settings pane for a query that has been shaped and turned into a model.

Once your data is how you want it, you can create visuals.


Create visuals

Once you have a data model, you can drag fields onto the report canvas to create visuals. A visual is a graphic representation of the data in your model. The following visual shows a simple column chart.

There are many different types of visuals to choose from in Power BI Desktop. To create or change a visual, just select the visual icon from the Visualizations pane. If you have a visual selected on the report canvas, the selected visual changes to the type you selected. If no visual is selected, a new visual is created based on your selection.

To create a new visual, follow the next steps:

  • Choose the appropriate chart from the “Visualizations” pane:


  • Drag your data into the “Axis” and “Value” fields:


  • And you’ll see a chart on your report canvas:


You can customize many of your chart fields and labels in the “Format Pane”:

Learn more at http://www.rhobear.com


Read more…

Here we ask you to identify which tool was used to produce the following 18 charts: 4 were done with R, 3 with SPSS, 5 with Excel, 2 with Tableau, 1 with Matlab, 1 with Python, 1 with SAS, and 1 with JavaScript. The solution, including for each chart a link to the webpage where it is explained in detail (many times with source code included) can be found here. You need to be a DSC member to access the page with the solution: you can sign-up here.

How do you score? Would this be a good job interview question?

[Charts 1 through 18 appear as images in the original post.]


Read more…

This was originally posted here

Deep Learning is gaining more and more traction. It basically focuses on one section of Machine Learning: Artificial Neural Networks. This article explains why Deep Learning is a game changer in analytics, when to use it, and how Visual Analytics allows business analysts to leverage the analytic models built by a (citizen) data scientist.

What are Deep Learning and Artificial Neural Networks?

Deep Learning is the modern buzzword for artificial neural networks, one of many concepts and algorithms in machine learning for building analytic models. A neural network works in a way loosely inspired by the human brain: inputs are passed through non-linear transformations to produce outputs. Neural networks leverage continuous learning and accumulate knowledge in computational nodes between input and output. In most cases a neural network is a supervised algorithm that uses historical data sets to learn correlations and predict the outputs of future events, e.g. for cross selling or fraud detection. Unsupervised neural networks can be used to find new patterns and anomalies. In some cases, it makes sense to combine supervised and unsupervised algorithms.

Neural networks have been used in research for many decades and include various sophisticated concepts like the Recurrent Neural Network (RNN), the Convolutional Neural Network (CNN) or the Autoencoder. However, today’s powerful and elastic computing infrastructure, in combination with technologies like graphics processing units (GPUs) with thousands of cores, makes it possible to do much more powerful computations with many more layers. Hence the term “Deep Learning”.
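To make the idea of stacked layers concrete, here is a minimal Keras sketch of a small feed-forward network for a binary prediction task. The data is synthetic and the layer sizes are arbitrary, so treat it as an illustration rather than a recommended architecture.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic data: 1,000 examples with 20 numeric features and a binary label.
X = np.random.rand(1000, 20)
y = (X.sum(axis=1) > 10).astype("float32")

# A small "deep" network: several hidden layers between input and output.
model = keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(20,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # probability of the positive class
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)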

The following picture from TensorFlow Playground shows an easy-to-use environment which includes various test data sets, configuration options and visualizations to learn and understand deep learning and neural networks:

If you want to learn more about the details of Deep Learning and Neural Networks, I recommend the following sources:

  • “The Anatomy of Deep Learning Frameworks” – an article about the basic concepts and components of neural networks
  • TensorFlow Playground, to play around with neural networks yourself, hands-on, without any coding; also available on GitHub so you can build your own customized offline playground
  • “Deep Learning Simplified”, a video series on YouTube with several short, simple explanations of basic concepts, alternative algorithms and some frameworks like H2O.ai or TensorFlow

While Deep Learning is getting more and more traction, it is not the silver bullet for every scenario.

When (not) to use Deep Learning?

Deep Learning enables many new possibilities which were not feasible in “mass production” a few years ago, e.g. image classification, object recognition, speech translation or natural language processing (NLP) in much more sophisticated ways than without Deep Learning. A key benefit is automated feature engineering, which costs a lot of time and effort with most other machine learning alternatives.

You can also leverage Deep Learning to make better decisions, increase revenue or reduce risk for existing (“already solved”) problems instead of using other machine learning algorithms. Examples include risk calculation, fraud detection, cross selling and predictive maintenance.

However, note that Deep Learning has a few important drawbacks:

  • Very expensive, i.e. slow and compute-intensive; training a deep learning model often takes days or weeks, and execution also takes more time than most other algorithms.
  • Hard to interpret: the results of the analytic model lack explainability, which is often a key requirement for legal or compliance regulations.
  • Tends to overfit, and therefore needs regularization.

Deep Learning is ideal for complex problems. It can also outperform other algorithms in moderate problems. Deep Learning should not be used for simple problems. Other algorithms like logistic regression or decision trees can solve these problems easier and faster.

Open Source Deep Learning Frameworks

Neural networks are mostly adopted using one of various open source implementations. Various mature deep learning frameworks are available for different programming languages.

The following picture shows an overview of open source deep learning frameworks and evaluates several characteristics:

These frameworks have in common that they are built for data scientists, i.e. personas with experience in programming, statistics, mathematics and machine learning. Note that writing the source code is not a big task. Typically, only a few lines of code are needed to build an analytic model. This is completely different from other development tasks like building a web application, where you write hundreds or thousands of lines of code. In Deep Learning – and Data Science in general – it is most important to understand the concepts behind the code to build a good analytic model.
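As a rough illustration of how little code an analytic model can take, here is a complete scikit-learn example on synthetic data (a plain logistic regression rather than a deep network, but the contrast with writing a full application holds).

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: 500 examples, 10 features, binary label.
X = np.random.rand(500, 10)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# The model itself really is just a couple of lines.
model = LogisticRegression().fit(X, y)
print(model.predict_proba(X[:5]))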

Some nice open source tools like KNIME or RapidMiner allow visual coding to speed up development and also help citizen data scientists (i.e. people with less experience) learn the concepts and build deep networks. These tools use their own deep learning implementations or embed other open source libraries like H2O.ai or DeepLearning4j as the framework under the hood.

If you do not want to build your own model or leverage existing pre-trained models for common deep learning tasks, you might also take a look at the offerings from the big cloud providers, e.g. AWS Polly for Text-to-Speech translation, Google Vision API for Image Content Analysis, or Microsoft’s Bot Framework to build chat bots. The tech giants have years of experience with analysing text, speech, pictures and videos and offer their experience in sophisticated analytic models as a cloud service; pay-as-you-go. You can also improve these existing models with your own data, e.g. train and improve a generic picture recognition model with pictures of your specific industry or scenario.

Deep Learning in Conjunction with Visual Analytics

No matter if you want to use “just” a framework in your favourite programming language or a visual coding tool: You need to be able to make decisions based on the built neural network. This is where visual analytics comes into play. In short, visual analytics allows any persona to make data-driven decisions instead of listening to gut feeling when analysing complex data sets. See “Using Visual Analytics for Better Decisions – An Online Guide” to understand the key benefits in more detail.

A business analyst does not need to understand anything about deep learning; they just leverage the integrated analytic model to answer their business questions. The analytic model is applied under the hood when the business analyst changes parameters, features or data sets. That said, visual analytics should also be used by the (citizen) data scientist to build the neural network. See “How to Avoid the Anti-Pattern in Analytics: Three Keys for Machine ...” to understand in more detail how technical and non-technical people should work together using visual analytics to build neural networks that help solve business problems. Even some parts of data preparation are best done within visual analytics tooling.

From a technical perspective, Deep Learning frameworks (and in a similar way any other Machine Learning frameworks, of course) can be integrated into visual analytics tooling in different ways. The following list includes a TIBCO Spotfire example for each alternative:

  • Embedded Analytics: Implemented directly within the analytics tool (self-implementation or “OEM”); can be used by the business analyst without any knowledge about machine learning (Spotfire: clustering via some basic, simple configuration of input and output data plus cluster size)
  • Native Integration: Connectors to directly access external deep learning clusters. (Spotfire: TERR to use R’s machine learning libraries, KNIME connector to directly integrate with external tooling)
  • Framework API: Access via a Wrapper API in different programming languages. For example, you could integrate MXNet via R or TensorFlow via Python into your visual analytics tooling. This option can always be used and is appropriate if no native integration or connector is available. (Spotfire: MXNet’s R interface via Spotfire’s TERR Integration for using any R library)
  • Integrated as Service via an Analytics Server: Connect external deep learning clusters indirectly via a server-side component of the analytics tool; different frameworks can be accessed by the analytics tool in a similar fashion (Spotfire: Statistics Server for external analytics tools like SAS or Matlab)
  • Cloud Service: Access pre-trained models for common deep learning specific tasks like image recognition, voice recognition or text processing. Not appropriate for very specific, individual business problems of an enterprise. (Spotfire: Call public deep learning services like image recognition, speech translation, or Chat Bot from AWS, Azure, IBM, Google via REST service through Spotfire’s TERR / R interface)

All options have in common that you need to configure some hyper-parameters, i.e. “high level” parameters like problem type, feature selection or regularization level. Depending on the integration option, this can be very technical and low-level, or simplified and less flexible, using terms the business analyst understands.

Deep Learning Example: Autoencoder Template for TIBCO Spotfire

Let’s take one specific category of neural networks as an example: autoencoders for finding anomalies. An autoencoder is an unsupervised neural network that learns to replicate its input data set while passing it through restricted (narrow) hidden layers. A reconstruction error is generated upon prediction; the higher the reconstruction error, the higher the possibility that the data point is an anomaly.
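The Spotfire template discussed below wraps H2O’s implementation via R, but the core idea is easy to sketch in Python with Keras on synthetic data: train a network to reproduce its input through a narrow bottleneck layer, then flag the rows it reconstructs worst. The layer sizes and the 1% threshold are arbitrary choices for illustration.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic "normal" sensor readings: 1,000 rows, 10 features.
X = np.random.normal(size=(1000, 10)).astype("float32")

# Autoencoder: the narrow middle layer forces a compressed representation.
autoencoder = keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(10,)),
    layers.Dense(3, activation="relu"),     # bottleneck
    layers.Dense(16, activation="relu"),
    layers.Dense(10),                       # reconstruct the original 10 features
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=20, batch_size=32, verbose=0)

# Reconstruction error per row; the largest errors are anomaly candidates.
reconstruction = autoencoder.predict(X, verbose=0)
errors = np.mean((X - reconstruction) ** 2, axis=1)
threshold = np.quantile(errors, 0.99)       # e.g. flag the top 1%
anomalies = np.where(errors > threshold)[0]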

Use cases for autoencoders include fighting financial crime, monitoring equipment sensors, detecting healthcare claims fraud, and spotting manufacturing defects. A generic TIBCO Spotfire template is available for free in the TIBCO Community. You can simply add your data set and leverage the template to find anomalies using autoencoders – without any complex configuration or even coding. Under the hood, the template uses H2O.ai’s deep learning implementation and its R API. It runs in a local instance on the machine where Spotfire runs. You can also take a look at the R code, but this is not needed to use the template and is therefore optional.

Real World Example: Anomaly Detection for Predictive Maintenance

Let’s use the autoencoder for a real-world example. In telco, you have to analyse the infrastructure continuously to find problems and issues within the network – ideally before the failure happens, so that you can fix it before the customer even notices the problem. Take a look at the following picture, which shows historical data of a telco network:

The orange dots are spikes which occur as first indication of a technical problem in the infrastructure. The red dots show a constant failure where mechanics have to replace parts of the network because it does not work anymore.

Autoencoders can be used to detect network issues before they actually happen. TIBCO Spotfire uses H2O’s autoencoder in the background to find the anomalies. As discussed before, the source code is relatively sparse. Here is the snippet for building the analytic model with H2O’s Deep Learning R API and detecting the anomalies (by computing the reconstruction error of the autoencoder):

This analytic model – built by the data scientist – is integrated into TIBCO Spotfire. The business analyst is able to visually analyse the historical data and the insights of the Autoencoder. This combination allows data scientists and business analysts to work together fluently. It was never easier to implement predictive maintenance and create huge business value by reducing risk and costs.

Apply Analytic Models to Real Time Processing with Streaming Analytics

This article focuses on building deep learning models with data science frameworks and visual analytics. Key to success in such projects is applying the built analytic model to new events in real time to add business value, such as increasing revenue, reducing cost or reducing risk.

“How to Apply Machine Learning to Event Processing” describes in more detail how to apply analytic models to real time processing. Or watch the corresponding video recording leveraging TIBCO StreamBase to apply some H2O models in real time. Finally, I recommend learning about the various streaming analytics frameworks available for applying analytic models.

Let’s come back to the autoencoder use case for predictive maintenance in telcos. In TIBCO StreamBase, you can easily apply the built H2O autoencoder model without any redevelopment via StreamBase’s H2O connector. You just attach the Java code generated by the H2O framework, which contains the analytic model and compiles to very performant JVM bytecode:

The most important lesson learned: think about the execution requirements before building the analytic model. What performance do you need regarding latency? How many events do you need to process per minute, second or millisecond? Do you need to distribute the analytic model to a cluster with many nodes? How often do you have to improve and redeploy the analytic model? You need to answer these questions at the beginning of your project to avoid double effort and redevelopment of analytic models!

Another important fact is that analytic models do not always need “real time processing” in terms of very fast and / or frequent model execution. In the telco example above, these spikes and failures might develop over subsequent days or even weeks. Thus, in many use cases it is fine to apply an analytic model once a day or once a week instead of to every new event every second.

Deep Learning + Visual Analytics + Streaming Analytics = Next Generation Big Data Success Stories

Deep Learning makes it possible to solve many well-understood problems like cross selling, fraud detection or predictive maintenance in a more efficient way. In addition, you can tackle scenarios that were not possible to solve before, like accurate and efficient object detection or speech-to-text translation.

Visual Analytics is a key component in Deep Learning projects to be successful. It eases the development of deep neural networks by (citizen) data scientists and allows business analysts to leverage these analytic models to find new insights and patterns.

Today, (citizen) data scientists use programming languages like R or Python, deep learning frameworks like Theano, TensorFlow, MXNet or H2O’s Deep Water and a visual analytics tool like TIBCO Spotfire to build deep neural networks. The analytic model is embedded into a view for the business analyst to leverage it without knowing the technology details.

In the future, visual analytics tools might embed neural network features like they already embed other machine learning features like clustering or logistic regression today. This will allow business analysts to leverage Deep Learning without the help of a data scientist and be appropriate for simpler use cases.

However, do not forget that building an analytic model to find insights is just the first part of a project. Deploying it to real-time processing afterwards is the equally important second step. Good integration between the tooling for finding insights and for applying insights to new events can significantly improve time-to-market and model quality in data science projects. The development lifecycle is a continuous closed loop: the analytic model needs to be validated and rebuilt at regular intervals.

Read more…

7 Visualizations You Should Learn in R

This blog was originally posted here

With the ever-increasing volume of data, it is impossible to tell stories without visualizations. Data visualization is the art of turning numbers into useful knowledge.

R programming lets you learn this art by offering a set of built-in functions and libraries to build visualizations and present data. Before getting to the technical implementation of the visualizations, let’s first see how to select the right chart type.

Selecting the Right Chart Type

There are four basic presentation types:

  1. Comparison
  2. Composition
  3. Distribution
  4. Relationship

To determine which of these is best suited for your data, I suggest you answer a few questions:

  • How many variables do you want to show in a single chart?
  • How many data points will you display for each variable?
  • Will you display values over a period of time, or among items or groups?

Below is a great explanation of how to select the right chart type, by Dr. Andrew Abela.

In your day-to-day activities, you’ll come across the seven charts listed below most of the time.

  1. Scatter Plot
  2. Histogram
  3. Bar & Stack Bar Chart
  4. Box Plot
  5. Area Chart
  6. Heat Map
  7. Correlogram

To learn about the 7 charts listed above, click here. For more articles about R, click here.

Read more…

3D Data Visualisation Survey

Back in 2012 we released Datascape, a general-purpose 3D immersive data visualisation application. We are now getting ready to release our second-generation application, Datascape2XL, which allows you to plot and interact with over 15 million data points in a 3D space, and view them with either a conventional PC screen or an Oculus Rift virtual reality headset (if you must...).

In order to inform our work we have created a survey to examine the current "state of the market": what applications people are using for data visualisation, how well they are meeting needs, and what users want from a 3D visual analytics application. The survey builds on an earlier survey we ran in 2012, the results of which are still available on our web site.

We will again be producing a survey report for public consumption, which you can sign up to receive at the end and we'll also post up here.

The aim of this survey is to understand current use of, and views on, data visualisation and visual analytics tools. We recognise that this definition can include a wide variety of different application types from simple Excel charts and communications orientated infographics to specialist financial, social media and even intelligence focussed applications. We hope that we have pitched this initial survey at the right level to get feedback from users from across the spectrum.

I hope that you can find 5 minutes to complete the survey - which you can find here: https://www.surveymonkey.co.uk/r/XXMXPP2

Thanks

David


Read more…

Taxonomy of 3D DataViz

Been trying to pull together a taxonomy of 3D data viz. The biggest difference, I think, is between allocentric (the data moves) and egocentric (you move) viewpoints. Whether you then view/explore the egocentric 3D visualisation on a 2D screen or in a 3D headset is, I think, a lesser distinction (and an HMD is possibly less practical in most cases).

We have a related benefits escalator for 2D->3D dataviz, but again I'm not convinced that "VR" should represent another level on this - it's more of an orthogonal element, another way to view the upper tiers.

Care to discuss or extend/expand/improve?

Read more…

Kim versus Donald in one Picture

How do you convey a powerful message in just one picture? 


Read more…

Originally posted on Data Science Central

This infographic came from Medigo. It displays data from the World Health Organization’s “Projections of mortality and causes of death, 2015 and 2030”. The report details all deaths in 2015 by cause and makes predictions for 2030, giving an impression of how global health will develop over the next 14 years. Also featured is data from geoba.se showing how life expectancy will change between now and 2030.

All percentages shown have been calculated relative to projected changes in population growth.

Read original article here


Read more…
From time to time I keep pondering what the future could look like, and I am sure a lot of us get that science-fiction image of the future data analyst being handed a pair of holographic gloves to perform three-dimensional analysis. Let us stop daydreaming and get to the basics. Dashboards have come a long way. These days a lot of vendors are catering to big data consumption, but I remember the days when the common target of BI vendors was "Excel". It looks like the focus has totally shifted from "Excel as the enemy" to "Big Data as the elephant".
Read more…

Originally posted on Data Science Central

This infographic on Shopper Marketing was created by Steve Hashman and his team. Steve is Director at Exponential Solutions (The CUBE) Marketing. 

Shopper marketing focuses on the customer in and at the point of purchase. It is an integrated and strategic approach to a customer’s in-store experience which is seen as a driver of both sales and brand equity.

For more information, click here


Read more…

Originally posted on Data Science Central

Do you want to learn the history of data visualization? Or do you want to learn how to create more engaging visualizations and see some examples? It’s easy to feel overwhelmed with the amount of information available today, which is why sometimes the answer can be as simple as picking up a good book.

These seven amazing data visualization books are a great place for you to get started:

1) Show Me the Numbers: Designing Tables and Graphs to Enlighten, Second Edition

Stephen Few

2) The Accidental Analyst: Show Your Data Who’s Boss

Eileen and Stephen McDaniel

3) Information Graphics

Sandra Rendgen, Julius Wiedemann

4) Visualize This: The FlowingData Guide to Design, Visualization, and Statistics

Nathan Yau

5) Storytelling with Data

Cole Nussbaumer Knaflic

6) Cool Infographics

Randy Krum

7) Designing Data Visualizations: Representing Informational Relationships

Noah Iliinsky, Julie Steele

To check out the 7 data visualization books, click here. For other articles about data visualization, click here.


Read more…

Why Your Brain Needs Data Visualization


This is a well-known fact nowadays: a goldfish has a higher attention span than the average Internet user. That’s the reason you’re not interested in reading huge paragraphs of text. Research by the Nielsen Norman Group showed that Internet users have time to read at most 28% of the words on a web page; most read only 20%. Visual content, on the other hand, has the power to hold your attention longer.

If you were just relying on the Internet as a casual user, not reading all the text wouldn’t be a problem. However, when you have a responsibility to process information, things get more complicated. A student, for example, has to read several academic and scientific studies and process a huge volume of data to write a single research paper. 65% of people are visual learners, so they find plain text difficult to process. The pressure of a deadline may eventually lead the student to hire a coursework writing service. If the data is presented visually, however, they will need less time to process it and get their own ideas for the paper.

Let’s explore some reasons why your brain needs that kind of visualization.

1.     Visual Data Triggers Preattentive Processing

Our low-level visual system needs less than 200-250 milliseconds to accurately detect visual properties. That capacity of the brain is called preattentive processing. It is triggered by colors, patterns, and forms. When you use different colors in a data visualization, you emphasize the important details, so those are the elements your eye catches first. You then use your long-term memory to interpret that data and connect it with information you already know.

2.     You Need a Visual Tier to Process Large Volumes of Data

When you’re dealing with production or sales, you face a huge volume of data you need to process, compare, and evaluate. If you represented it through a traditional Excel spreadsheet, you would have to invest countless hours looking through the tiny rows of data. Through data visualization, you can interpret the information in a way that makes it ready for your brain to process.

3.     Visual Data Brings Together All Aspects of Memory

The memory functions of our brain are quite complex. We have three aspects of memory: sensory, short-term (also known as working memory) and long-term. When we first hear, see, touch, taste, or smell something, our senses trigger the sensory memory. While processing information, we preserve it in working memory for a short period of time. The long-term memory function enables us to preserve information for a very long time.

Visual data presentation connects these three memory functions. When we see the information presented in a visually-attractive way, it triggers our sensory memory and makes it easy for us to process it (working memory). When we process that data, we essentially create a new “long-term memory folder” in our brain.

Data visualization is everywhere. Internet marketing experts understood it, and the most powerful organizations on a global level understood it, too. It’s about time we started implementing it in our own practice.

Read more…

10 Dataviz Tools To Enhance Data Science

Originally posted on Data Science Central

This article on data visualization tools was written by Jessica Davis. She's passionate about the practical use of business intelligence, predictive analytics, and big data for smarter business and a better world.

Data visualizations can help business users understand analytics insights and actually see the reasons why certain recommendations make the most sense. Traditional business intelligence and analytics vendors, as well as newer market entrants, are offering data visualization technologies and platforms.

Here's a collection of 10 data visualization tools worthy of your consideration:

Tableau Software

Tableau Software is perhaps the best known platform for data visualization across a wide array of users. Some Coursera courses dedicated to data visualization use Tableau as the underlying platform. The Seattle-based company describes its mission this way: "We help people see and understand their data."

This company, founded in 2003, offers a family of interactive data visualization products focused on business intelligence. The software is offered in desktop, server, and cloud versions. There's also a free public version used by bloggers, journalists, quantified-self hobbyists, sports fans, political junkies, and others.

Tableau was one of three companies featured in the Leaders square of the 2016 Gartner Magic Quadrant for Business Intelligence and Analytics Platforms.

Qlik

Qlik was founded in Lund, Sweden in 1993. It's another of the Leaders in Gartner's 2016 Magic Quadrant for Business Intelligence and Analytics Platforms. Now based in Radnor, Penn., Qlik offers a family of products that provide data visualization to users. Its new flagship Qlik Sense offers self-service visualization and discovery. The product is designed for drag-and-drop creation of interactive data visualizations. It's available in versions for desktop, server, and cloud.

Oracle Visual Analyzer

Gartner dropped Oracle from its 2016 Magic Quadrant Business Intelligence and Analytics Platform report. One of the company's newer products, Oracle Visual Analyzer, could help the database giant make it back into the report in years to come.

Oracle Visual Analyzer, introduced in 2015, is a web-based tool provided within the Oracle Business Intelligence Cloud Service. It's available to existing customers of Oracle's Business Intelligence Cloud. The company's promotional materials promise advanced analysis and interactive visualizations. Configurable dashboards are also available.

SAS Visual Analytics

SAS is one of the traditional vendors in the advanced analytics space, with a long history of offering analytical insights to businesses. SAS Visual Analytics is among its many offerings.

The company offers a series of sample reports showing how visual analytics can be applied to questions and problems in a range of industries. Examples include healthcare claims, casino performance, digital advertising, environmental reporting, and the economics of Ebola outbreaks.

Microsoft Power BI

Microsoft Power BI, the software giant's entry in the data visualization space, is the third and final company in the Leaders square of the Gartner 2016 Magic Quadrant for Business Intelligence and Analytics Platforms.

Power BI is not a monolithic piece of software. Rather, it's a suite of business analytics tools Microsoft designed to enable business users to analyze data and share insights. Components include Power BI dashboards, which offer customizable views for business users for all their important metrics in real-time. These dashboards can be accessed from any device.

Power BI Desktop is a data-mashup and report-authoring tool that can combine data from several sources and then enable visualization of that data. Power BI gateways let organizations connect SQL Server databases and other data sources to dashboards.

TIBCO Spotfire

TIBCO acquired data discovery specialist Spotfire in 2007. The company offers the technology as part of its lineup of data visualization and analytics tools. TIBCO updated Spotfire in March 2016 to improve core visualizations. The updates expand built-in data access and data preparation functions, and improve data collaboration and mashup capabilities. The company also redesigned its Spotfire server topology with simplified web-based admin tools.

ClearStory Data

Founded in 2011, ClearStory Data is one of the newer players in the space. Its technology lets users discover and analyze data from corporate, web, and premium data sources. It includes relational databases, Hadoop, web, and social application interfaces, as well as ones from third-party data providers. The company offers a set of solutions for vertical industries. Its customers include Del Monte, Merck, and Coca-Cola.

Sisense

The web-enabled platform from Sisense offers interactive dashboards that let users join and analyze big and multiple datasets and share insights. Gartner named the company a Niche Player in its Magic Quadrant report for Business Intelligence and Analytics Platforms. The research firm said the company was one of the top two in terms of accessing large volumes of data from Hadoop and NoSQL data sources. Customers include eBay, Lockheed Martin, Motorola, Experian, and Fujitsu.

Dundas BI

Mentioned as a vendor to watch by Gartner, but not included in the company's Magic Quadrant for Business Intelligence and Analytics Platforms, Dundas BI enables organizations to create business intelligence dashboards for the visualization of key business metrics. The platform also enables data discovery and exploration with drag-and-drop menus. According to the company's website, a variety of data sources can be connected, including relational, OLAP, flat files, big data, and web services. Customers include AAA, Bank of America, and Kaiser Permanente.

InetSoft

InetSoft is another vendor that didn't qualify for the Gartner report, but was mentioned by the research firm as a company to watch.

InetSoft offers a colorful gallery of BI Visualizations. A free version of its software provides licenses for two users. It lets organizations take the software for a test drive. Serious users will want to upgrade to the paid version. Customers include Flight Data Services, eScholar, ArcSight, and Dairy.com.

You can find the original article, here. For other articles about data visualization, click here.


Read more…

Whether you're working on a school presentation or preparing a monthly sales report for your boss, presenting your data in a detailed and easy-to-follow form is essential. It's hard to keep the focus of your audience if you can't help them fully understand the data you're trying to explain. The best way to understand complex data is to show your results in graphic form. This is the main reason why data visualization has become a key part of presentations and data analysis. But let's see what the top five benefits of using data visualization in your work are.

Easier data discovery

Visualizing your data helps you and your audience find specific information. Pointing out a specific piece of information in one-dimensional graphics can be difficult if you have a lot of data to work with. Data visualization can make this effort a whole lot easier.

Simple way to trace data correlations

Sometimes it's hard to notice the correlation between two sets of data. If you present your data in graphic form, you can see how one set of data influences another. This is a major benefit, as it greatly reduces the work effort you need to invest.

Live interaction with data

Data visualization offers you the benefit of live interaction with any piece of data you need. This enables you to spot changes in the data as they happen. And you don't just get simple information about the change; you can also get predictive analysis.

Promote a new business language

One of the major benefits of data visualization over simple graphic solutions is the ability to "tell a story" through data. For example, with a simple chart you get a piece of information, and that's it. Data visualization enables you to not only see the information but also understand the reasons behind it.

Identify trends

The ability to identify trends is one of the most interesting benefits that data visualization tools have to offer. You can watch the progress of certain data and see the reasons for changes. With predictive analysis, you can also predict the behavior of those trends in the future.

Conclusion

Data visualization tools have become a necessity in modern data analysis. This need has spurred the start of many businesses that offer data visualization services.

All in all, data visualization tools have shifted analytics to a whole new level and allowed better insight into business data. Let us know about your experience with data visualization tools and how you use them; we'd love to read how they improved your work.

Read more…

BI Tools for SMEs? Not Just Maybe, But DEFINITELY

I work as a BI consultant and aim to provide the best BI solutions to my clients, focusing on BI for Tally and upgrading Tally customers to a self-service BI environment with interactive reports and dashboards for Tally. Apart from this, I like traveling, participating in Business Intelligence forums, reading and social networking.
Read more…

Originally posted on Data Science Central

This article on going deeper into regression analysis with assumptions, plots and solutions was posted by Manish Saraswat. Manish, who works in marketing and data science at Analytics Vidhya, believes that education can change this world. R, data science and machine learning keep him busy.

Regression analysis marks the first step in predictive modeling. No doubt, it’s fairly easy to implement; neither its syntax nor its parameters create any kind of confusion. But merely running one line of code doesn’t solve the purpose, and neither does just looking at the R² or MSE values. Regression tells you much more than that!

In R, regression analysis returns four diagnostic plots via the plot(model_name) function. Each of the plots provides significant information, or rather an interesting story, about the data. Sadly, many beginners either fail to decipher the information or don’t care about what these plots say. Once you understand these plots, you’ll be able to bring significant improvements to your regression model.
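The article works in R, where plot(model_name) produces these diagnostics automatically. Purely as an illustration for readers who work in Python, here is a rough equivalent of the first two plots (residuals vs fitted values and the normal Q-Q plot) using statsmodels and matplotlib on synthetic data.

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Synthetic data with a simple linear relationship plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

model = sm.OLS(y, sm.add_constant(X)).fit()

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs fitted values: look for curvature or a funnel shape.
axes[0].scatter(model.fittedvalues, model.resid, alpha=0.6)
axes[0].axhline(0, color="grey", linestyle="--")
axes[0].set(xlabel="Fitted values", ylabel="Residuals", title="Residuals vs Fitted")

# Normal Q-Q plot: residuals should track the reference line if errors are normal.
sm.qqplot(model.resid, line="s", ax=axes[1])
axes[1].set_title("Normal Q-Q")

plt.tight_layout()
plt.show()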

For model improvement, you also need to understand regression assumptions and ways to fix them when they get violated.

In this article, I’ve explained the important regression assumptions and plots (with fixes and solutions) to help you understand the regression concept in further detail. As said above, with this knowledge you can bring drastic improvements in your models.

What you can find in this article :

Assumptions in Regression

What if these assumptions get violated?

  1. Linear and Additive
  2. Autocorrelation
  3. Multicollinearity
  4. Heteroskedasticity
  5. Normal Distribution of error terms

Interpretation of Regression Plots

  1. Residual vs Fitted Values
  2. Normal Q-Q Plot
  3. Scale Location Plot
  4. Residuals vs Leverage Plot

You can find the full article here. For other articles about regression analysis, click here. 

Note from the Editor: For a robust regression that will work even if all these model assumptions are violated, click here. It is simple (it can be implemented in Excel and it is model-free), efficient and very comparable to the standard regression (when the model assumptions are not violated).  And if you need confidence intervals for the predicted values, you can use the simple model-free confidence intervals (CI) described here. These CIs are equivalent to those being taught in statistical courses, but you don't need to know stats to understand how they work, and to use them. Finally, to measure goodness-of-fit, instead of R-Squared or MSE, you can use this metric, which is more robust against outliers. 


Read more…
