Guest blog post by Gabriel Lowy
Financial institutions, like organizations in many other industries, are grappling with how best to harness and extract value from big data. Enabling users to either “see the story” or “tell their story” is the key to deriving value with data visualization tools, especially as data sets continue to grow.
With terabytes and petabytes of data flooding organizations, legacy architectures and infrastructures are increasingly unable to store, manage and analyze big data. IT teams are ill-equipped to deal with the rising requests for different types of data, specialized reports for tactical projects and ad hoc analytics. Traditional business intelligence (BI) solutions, in which IT presents slices of data that are easier to manage and analyze or creates preconceived templates that accept only certain types of data for charting and graphing, miss the potential to capture deeper meaning and to enable proactive, or even predictive, decisions from big data.
Out of frustration and under pressure to deliver results, user groups increasingly bypass IT. They procure applications or build custom ones without IT’s knowledge. Some go so far as to acquire and provision their own infrastructure to accelerate data collection, processing and analysis. This time-to-market rush creates data silos and potential governance, risk and compliance (GRC) problems.
Users accessing cloud-based services – increasingly on devices they own – cannot understand why they face so many hurdles in trying to access corporate data. Mashups with externally sourced data such as social networks, market data websites or SaaS applications are virtually impossible unless users possess the technical skills to integrate different data sources on their own.
Steps to visualize big data success
Architecting from the users’ perspective with data visualization tools is imperative if management is to achieve big data success through better and faster insights that improve decision outcomes. A key benefit is how these tools change project delivery. Because they allow value to be visualized rapidly through prototypes and test cases, models can be validated at low cost before algorithms are built for production environments. Visualization tools also provide a common language through which IT and business users can communicate.
To shift its image from an inhibiting cost center to a business enabler, IT must couple data strategy to corporate strategy. That means IT needs to provide data in a much more agile way. The following tips can help IT become integral to how its organization gives users efficient access to big data without compromising GRC mandates:
- Aim for context. The people analyzing data should have a deep understanding of the data sources, who will be consuming the data, and what their objectives are in interpreting the information. Without establishing context, visualization tools are less valuable.
- Plan for speed and scale. To properly enable visualization tools, organizations must identify the data sources and determine where the data will reside. Where it resides should be determined by the sensitivity of the data. In a private cloud, the data should be classified and indexed for fast search and analysis. Whether in a private cloud or a public cloud environment, clustered architectures that leverage in-memory and parallel processing technologies are most effective today for exploring large data sets in real time.
- Assure data quality. While big data hype is centered on the volume, velocity and variety of data, organizations need to focus more sharply on the validity, veracity and value of the data. Visualization tools and the insights they can enable are only as good as the quality and integrity of the data models they are working with. Companies need to incorporate data quality tools to assure that data feeding the front end is as clean as possible.
- Display meaningful results. Plotting points on a graph or chart for analysis becomes difficult when dealing with massive data sets of structured, semi-structured and unstructured data. One way to resolve this challenge is to cluster data into a higher-level view where smaller groups of data are exposed. By grouping the data together, a process referred to as “binning”, users can more effectively visualize the data.
- Deal with outliers. Graphical representations of data using visualization tools can uncover trends and outliers much faster than tables containing numbers and text. Humans are innately better at identifying trends or issues by “seeing” patterns. In most instances, outliers account for 5% or less of a data set. While small as a percentage, in very large data sets these outliers are numerous enough to be difficult to navigate. Either remove the outliers from the data (and therefore the visual presentation) or create a separate chart just for the outliers. Users can then draw conclusions from viewing the distribution of data as well as the outliers. Isolating outliers may help reveal previously unseen risks or opportunities, such as detecting fraud, changes in market sentiment or new leading indicators.
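As a rough illustration of the last two tips, the sketch below separates outliers from the bulk of a data set and then bins what remains into fixed-width groups for charting. It is a minimal example under stated assumptions, not a prescribed method: the article does not say how outliers should be identified, so the 1.5 x IQR rule (Tukey's fences) is an assumption here, and the function names, the bin width and the sample transaction amounts are all hypothetical.

```python
from collections import Counter
from statistics import quantiles


def split_outliers(values, k=1.5):
    """Split values into (inliers, outliers) using IQR fences.

    Assumption: the 1.5 * IQR rule stands in for whatever outlier
    definition a real project would choose.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


def bin_counts(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical transaction amounts, with one extreme value.
amounts = [12, 15, 18, 21, 22, 25, 27, 31, 33, 38, 41, 44, 47, 52, 950]

inliers, outliers = split_outliers(amounts)
summary = bin_counts(inliers, bin_width=10)

print("binned view:", summary)    # feeds the main chart
print("outlier view:", outliers)  # feeds a separate chart
```

The binned summary is what would feed the main chart, while the outlier list would go to its own chart so that extreme values neither distort the main view nor get lost.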
Where visualization is heading
Data visualization is evolving from the traditional charts, graphs, heat maps, histograms and scatter plots used to represent numerical values that are then measured against one or more dimensions. The trend toward hybrid enterprise data structures, which mesh traditional structured data usually stored in a data warehouse with unstructured data derived from a wide variety of sources, allows measurement against much broader dimensions.
As a result, expect to see greater intelligence in how these tools index results. Also expect to see improved dashboards with game-style graphics. Finally, expect to see more predictive capabilities that anticipate user data requests, with personalized memory caches to aid performance. All of this continues the trend toward self-service analytics, where users define the parameters of their own inquiries on ever-increasing sources of data.