Guest blog post by Data Science Girl

Fantastic resource created by Andrea Motosi. I've included only the 5 categories most relevant to our audience, though it has 31 categories in total, including a few on distributed systems and Hadoop. Click here to view the 31 categories. You might also want to check out our internal resources (the first section below).

Source: Machine Learning and Face Recognition Papers

Data Science Central - Resources

Machine Learning

  • Apache Mahout: machine learning library for Hadoop
  • Ayasdi Core: tool for topological data analysis
  • brain: Neural networks in JavaScript
  • Cloudera Oryx: real-time large-scale machine learning
  • Concurrent Pattern: machine learning library for Cascading
  • convnetjs: Deep Learning in JavaScript. Train Convolutional Neural Networks (or ordinary ones) in your browser
  • Decider: Flexible and Extensible Machine Learning in Ruby
  • etcML: text classification with machine learning
  • Etsy Conjecture: scalable Machine Learning in Scalding
  • Google Sibyl: System for Large Scale Machine Learning at Google
  • H2O: statistical, machine learning and math runtime for Hadoop
  • IBM Watson: cognitive computing system
  • MLbase: distributed machine learning libraries for the BDAS stack
  • MLPNeuralNet: Fast multilayer perceptron neural network library for iOS and Mac OS X
  • nupic: Numenta Platform for Intelligent Computing: a brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms
  • PredictionIO: machine learning server built on Hadoop, Mahout and Cascading
  • scikit-learn: machine learning in Python
  • Spark MLlib: a Spark implementation of some common machine learning (ML) functionality
  • Sparkling Water: combine H2O's Machine Learning capabilities with the power of the Spark platform
  • Vahara: Machine learning and natural language processing with Apache Pig
  • Viv: global platform that enables developers to plug into and create an intelligent, conversational interface to anything
  • Vowpal Wabbit: learning system sponsored by Microsoft and Yahoo!
  • WEKA: suite of machine learning software
  • Wit: Natural Language for the Internet of Things
  • Wolfram Alpha: computational knowledge engine


Visualization

  • Arbor: graph visualization library using web workers and jQuery
  • CartoDB: open-source or freemium hosting for geospatial databases with powerful front-end editing capabilities and a robust API
  • Chart.js: open source HTML5 Charts visualizations
  • Crossfilter: JavaScript library for exploring large multivariate datasets in the browser. Works well with dc.js and d3.js
  • Cubism: JavaScript library for time series visualization
  • Cytoscape: JavaScript library for visualizing complex networks
  • D3: JavaScript library for manipulating documents
  • DC.js: Dimensional charting built to work natively with crossfilter rendered using d3.js. Excellent for connecting charts/additional metadata to hover events in D3
  • Envisionjs: dynamic HTML5 visualization
  • Freeboard: open source real-time dashboard builder for IoT and other web mashups
  • Gephi: An award-winning open-source platform for visualizing and manipulating large graphs and network connections
  • Google Charts: simple charting API
  • Grafana: graphite dashboard frontend, editor and graph composer
  • Graphite: scalable Realtime Graphing
  • Highcharts: simple and flexible charting API
  • IPython: provides a rich architecture for interactive computing
  • Keylines: toolkit for visualizing the networks in your data
  • Matplotlib: plotting with Python
  • NVD3: chart components for d3.js
  • Peity: Progressive SVG bar, line and pie charts
  • Plotly: easy-to-use web service that allows for rapid creation of complex charts, from heatmaps to histograms. Upload data to create and style charts with Plotly’s online spreadsheet. Fork others’ plots.
  • Recline: simple but powerful library for building data applications in pure JavaScript and HTML
  • Redash: open-source platform to query and visualize data
  • Sigma.js: JavaScript library dedicated to graph drawing
  • Vega: a visualization grammar

Graph Databases

  • Apache Giraph: implementation of Pregel, based on Hadoop
  • Apache Spark Bagel: implementation of Pregel, part of Spark
  • ArangoDB: multi-model distributed database
  • Facebook TAO: distributed data store widely used at Facebook to store and serve the social graph
  • Faunus: Hadoop-based graph analytics engine for analyzing graphs represented across a multi-machine compute cluster
  • Google Cayley: open-source graph database
  • Google Pregel: graph processing framework
  • GraphLab PowerGraph: a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API
  • GraphX: resilient Distributed Graph System on Spark
  • Gremlin: graph traversal Language
  • InfiniteGraph: distributed graph database
  • Infovore: RDF-centric Map/Reduce framework
  • Intel GraphBuilder: tools to construct large-scale graphs on top of Hadoop
  • MapGraph: Massively Parallel Graph processing on GPUs
  • Neo4j: graph database written entirely in Java
  • OrientDB: document and graph database
  • Phoebus: framework for large scale graph processing
  • Sparksee: scalable high-performance graph database
  • Titan: distributed graph database, built over Cassandra
  • Twitter FlockDB: distributed graph database


NewSQL Databases

  • Actian Ingres: commercially supported, open-source SQL relational database management system
  • BayesDB: statistics-oriented SQL database
  • Cockroach: Scalable, Geo-Replicated, Transactional Datastore
  • Datomic: distributed database designed to enable scalable, flexible and intelligent applications
  • FoundationDB: distributed database, inspired by F1
  • Google F1: distributed SQL database built on Spanner
  • Google Spanner: globally distributed semi-relational database
  • H-Store: experimental main-memory, parallel database management system optimized for on-line transaction processing (OLTP) applications
  • HandlerSocket: NoSQL plugin for MySQL/MariaDB
  • IBM DB2: object-relational database management system
  • InfiniSQL: infinitely scalable RDBMS
  • MemSQL: in-memory SQL database with optimized columnar storage on flash
  • NuoDB: SQL/ACID compliant distributed database
  • Oracle Database: object-relational database management system
  • Oracle TimesTen in-Memory Database: in-memory, relational database management system with persistence and recoverability
  • Pivotal GemFire XD: Low-latency, in-memory, distributed SQL data store. Provides SQL interface to in-memory table data, persistable in HDFS
  • SAP HANA: in-memory, column-oriented, relational database management system
  • SenseiDB: distributed, realtime, semi-structured database
  • Sky: database used for flexible, high performance analysis of behavioral data
  • SymmetricDS: open source software for both file and database synchronization
  • Teradata Database: complete relational database management system
  • VoltDB: in-memory NewSQL database

