Data Structure Graph - The application of Graph theory to Architecture

Guest blog post by Doug Needham

How does centrality affect your Architecture?

Some time ago, I was responsible for a data architecture I had mostly inherited. I worked on a number of tweaks to refine the monolithic nature of the main database. It was a time of upheaval in this organization. They had outgrown their legacy Computer Telephony Interface application. It was time to create something new.
A large new application development team was brought in to develop some new software.
There was a large division of labor and processing: some things were handled by the new application, and a second system was developed to handle the data. Reporting, cleansing, analysis, ingress feeds, egress feeds, all of these went through the “less important” system.
This was the system I was responsible for.
In thinking about how best to explain a Data Structure Graph, I spent some time revisiting this architecture and brought it into a format that could be analyzed with the tools of Network Analysis.
After anonymizing the data a bit, and limiting the data flows to only the principal data flows, I constructed a CSV file to load into Gephi for analysis.
| Source | Target | Edge_Label |
|---|---|---|
| Spider | ODS | Application |
| ODS | Spider | Prospect |
| Vendor1 | ODS | Prospect |
| Vendor2 | ODS | Prospect |
| Vendor3 | ODS | Prospect |
| ODS | Servicing | Application |
| Legacy | ODS | Application |
| ODS | Legacy | Prospect |
| ODS | Dialer1 | Prospect |
| ODS | Dialer2 | Prospect |
| Gov | ODS | DNC |
| ODS | Spider | LegacyData1 |
| ODS | Spider | LegacyData2 |
| ODS | Spider | LegacyData3 |
| Spider | ODS | LegacyData1 |
| Spider | ODS | LegacyData2 |
| Spider | ODS | LegacyData3 |
| ODS | ThirdParty | Prospect |
| ThirdParty | ODS | Application |
| Legacy | ODS | Application |
| Legacy | ODS | DialerStats |
| Dialer1 | ODS | DialerStats |
| Dialer2 | ODS | DialerStats |
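
For readers who want to check the edge list without Gephi, here is a minimal Python sketch that rebuilds the graph from the rows above and counts in-degree and out-degree per node. It assumes that parallel flows between the same two systems collapse into a single edge (so the three LegacyData feeds from Spider to ODS count once); that assumption is not stated in the post, though it agrees with the degree figures Gephi reported. The variable names are illustrative only.

```python
from collections import defaultdict

# (source, target, edge_label) rows copied from the edge list above.
EDGES = [
    ("Spider", "ODS", "Application"),
    ("ODS", "Spider", "Prospect"),
    ("Vendor1", "ODS", "Prospect"),
    ("Vendor2", "ODS", "Prospect"),
    ("Vendor3", "ODS", "Prospect"),
    ("ODS", "Servicing", "Application"),
    ("Legacy", "ODS", "Application"),
    ("ODS", "Legacy", "Prospect"),
    ("ODS", "Dialer1", "Prospect"),
    ("ODS", "Dialer2", "Prospect"),
    ("Gov", "ODS", "DNC"),
    ("ODS", "Spider", "LegacyData1"),
    ("ODS", "Spider", "LegacyData2"),
    ("ODS", "Spider", "LegacyData3"),
    ("Spider", "ODS", "LegacyData1"),
    ("Spider", "ODS", "LegacyData2"),
    ("Spider", "ODS", "LegacyData3"),
    ("ODS", "ThirdParty", "Prospect"),
    ("ThirdParty", "ODS", "Application"),
    ("Legacy", "ODS", "Application"),
    ("Legacy", "ODS", "DialerStats"),
    ("Dialer1", "ODS", "DialerStats"),
    ("Dialer2", "ODS", "DialerStats"),
]

# Assumption: parallel flows between the same pair of systems count as one edge,
# so neighbors are stored in sets rather than lists.
successors = defaultdict(set)
predecessors = defaultdict(set)
for source, target, _label in EDGES:
    successors[source].add(target)
    predecessors[target].add(source)

nodes = sorted(set(successors) | set(predecessors))

# node -> (in_degree, out_degree)
degree = {n: (len(predecessors[n]), len(successors[n])) for n in nodes}

# Print the busiest systems first.
for n in sorted(nodes, key=lambda name: -sum(degree[name])):
    in_deg, out_deg = degree[n]
    print(f"{n:<11} in={in_deg} out={out_deg} total={in_deg + out_deg}")
```

Storing neighbors in sets is what collapses the repeated Spider/ODS and Legacy/ODS rows; switching to lists would count every labeled flow separately and give larger degrees for those nodes.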
I ran a few simple statistics on the graph, then partitioned it so that each node's color makes its degree apparent. That colored graph was the first output from Gephi.
The actual statistics Gephi calculated are in this table:
| Id | Label | PageRank | Eigenvector Centrality | In-Degree | Out-Degree | Degree |
|---|---|---|---|---|---|---|
| Vendor1 | Vendor1 | 0.01991719 | 0.00000000 | 0 | 1 | 1 |
| Vendor2 | Vendor2 | 0.01991719 | 0.00000000 | 0 | 1 | 1 |
| Vendor3 | Vendor3 | 0.01991719 | 0.00000000 | 0 | 1 | 1 |
| Gov | Gov | 0.01991719 | 0.00000000 | 0 | 1 | 1 |
| Spider | Spider | 0.08121259 | 0.44698155 | 1 | 1 | 2 |
| Servicing | Servicing | 0.08121259 | 0.44698155 | 1 | 0 | 1 |
| Legacy | Legacy | 0.08121259 | 0.44698155 | 1 | 1 | 2 |
| Dialer1 | Dialer1 | 0.08121259 | 0.44698155 | 1 | 1 | 2 |
| Dialer2 | Dialer2 | 0.08121259 | 0.44698155 | 1 | 1 | 2 |
| ThirdParty | ThirdParty | 0.08121259 | 0.44698155 | 1 | 1 | 2 |
| ODS | ODS | 0.43305573 | 1.00000000 | 9 | 6 | 15 |
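
The PageRank column can be approximated with a short power iteration. The sketch below is not Gephi's implementation. It assumes a damping factor of 0.85 (the conventional default, not a value given in the post), collapses parallel flows into single edges, and spreads the rank of any node without outgoing edges (Servicing here) evenly across all nodes. Under those assumptions the results should land close to the table, though small differences are expected because stopping criteria differ between implementations.

```python
# Simplified edge set: one edge per ordered pair of systems (an assumption).
SOURCES_ONLY = ["Vendor1", "Vendor2", "Vendor3", "Gov"]
TWO_WAY = ["Spider", "Legacy", "Dialer1", "Dialer2", "ThirdParty"]
EDGES = [(s, "ODS") for s in SOURCES_ONLY + TWO_WAY]
EDGES += [("ODS", t) for t in TWO_WAY + ["Servicing"]]


def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a directed edge list (illustrative sketch)."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: set() for n in nodes}
    for source, target in edges:
        out_links[source].add(target)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edge is shared by every node.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1.0 - damping + damping * dangling) / count for n in nodes}
        # Each node passes its damped rank in equal parts to its successors.
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        converged = sum(abs(new_rank[n] - rank[n]) for n in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


rank = pagerank(EDGES)
for name, value in sorted(rank.items(), key=lambda item: -item[1]):
    print(f"{name:<11} {value:.6f}")
```

Servicing receives the same share from ODS as the two-way systems do, which is why it ties with them despite having no outgoing flow.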
From the Data Architecture perspective, which “application” has the greatest impact on the organization if there were a failure?
Which “application” should have the greatest degree of protection, redundancy, and expertise associated with it?
Let's look at the two metrics in the middle of the last table: PageRank and Eigenvector Centrality. On both, ODS scores far above every other node (a PageRank of 0.433 against 0.081 for the next tier, and an Eigenvector Centrality of 1.0 against 0.447).

I will have to create individual blog entries for both PageRank and Eigenvector Centrality to discuss the actual mechanism for how these are calculated. The math for these can be a bit cumbersome, and each algorithm should be given due attention on its own.
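
As a rough preview of the second metric, the sketch below estimates eigenvector centrality in its common in-edge formulation, where a node's score is proportional to the sum of the scores of the nodes that feed it. It is a hedged illustration, not Gephi's algorithm. Two assumptions are made: parallel flows are collapsed into single edges, and each node's own score is added back at every step. That shift is a standard trick that leaves the dominant eigenvector unchanged while keeping a hub-and-spoke graph like this one from oscillating between iterations.

```python
# Simplified edge set: one edge per ordered pair of systems (an assumption).
SOURCES_ONLY = ["Vendor1", "Vendor2", "Vendor3", "Gov"]
TWO_WAY = ["Spider", "Legacy", "Dialer1", "Dialer2", "ThirdParty"]
EDGES = [(s, "ODS") for s in SOURCES_ONLY + TWO_WAY]
EDGES += [("ODS", t) for t in TWO_WAY + ["Servicing"]]


def eigenvector_centrality(edges, iterations=100):
    """In-edge eigenvector centrality by shifted power iteration (sketch)."""
    nodes = sorted({n for edge in edges for n in edge})
    in_links = {n: set() for n in nodes}
    for source, target in edges:
        in_links[target].add(source)

    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Shifted update x <- x + A^T x: own score plus the scores flowing in.
        new_score = {
            n: score[n] + sum(score[m] for m in in_links[n]) for n in nodes
        }
        # Rescale so the most central node always has a score of 1.
        top = max(new_score.values())
        score = {n: value / top for n, value in new_score.items()}
    return score


centrality = eigenvector_centrality(EDGES)
for name, value in sorted(centrality.items(), key=lambda item: -item[1]):
    print(f"{name:<11} {value:.6f}")
```

The systems that only send data (the vendors and Gov) have nothing feeding them, so their scores decay toward zero, which matches the zeros in the Gephi table.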

The point of this analysis is to determine which component of the architecture should have additional resources devoted to it. Any customer-facing application should be given due attention and infrastructure. However, one question I have seen many of my clients struggle with is how to prioritize the back-end infrastructure. Should one component of the architecture be given more attention than another? If I have 90 databases throughout the organization, which one is the most important?

These centrality calculations show unequivocally which component of the architecture has the most impact in the event of an outage, or where the most value can be provided for an upgrade.
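
One rough way to make "impact in the event of an outage" concrete is to take each system offline in turn and count how many data paths between the remaining systems disappear. This is a separate illustration rather than one of the statistics Gephi produced, and the measure itself (lost reachable pairs) is an assumption chosen for simplicity. As in the earlier statistics, parallel flows are treated as single edges.

```python
from collections import deque

# Simplified edge set: one edge per ordered pair of systems (an assumption).
SOURCES_ONLY = ["Vendor1", "Vendor2", "Vendor3", "Gov"]
TWO_WAY = ["Spider", "Legacy", "Dialer1", "Dialer2", "ThirdParty"]
EDGES = [(s, "ODS") for s in SOURCES_ONLY + TWO_WAY]
EDGES += [("ODS", t) for t in TWO_WAY + ["Servicing"]]

nodes = sorted({n for edge in EDGES for n in edge})
successors = {n: set() for n in nodes}
for source, target in EDGES:
    successors[source].add(target)


def reachable_pairs(skip=None):
    """Ordered pairs (u, v), u != v, joined by a directed path avoiding `skip`."""
    pairs = set()
    for start in nodes:
        if start == skip:
            continue
        # Breadth-first search from `start`, never stepping onto `skip`.
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in successors[current]:
                if nxt != skip and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        pairs.update((start, end) for end in seen if end != start)
    return pairs


baseline = reachable_pairs()
collateral = {}
for down in nodes:
    # Only count pairs that do not involve the failed system itself.
    before = {pair for pair in baseline if down not in pair}
    collateral[down] = len(before - reachable_pairs(skip=down))

for name, lost in sorted(collateral.items(), key=lambda item: -item[1]):
    print(f"{name:<11} breaks {lost} paths between other systems")
```

Because every path between two other systems runs through ODS in this graph, ODS is the only node whose failure cuts off systems other than itself.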

This type of analysis can begin to shed light on the answers to these questions. A methodical approach to an architecture, based on data rather than on the division that screams the loudest, can give insight into how an architecture is truly implemented.

I call this kind of artifact a Data Structure Graph.
