How to handle large datasets with D3.js?
With carefully crafted data processing, we can get a decent story out of the data. But this solution doesn’t leave much flexibility to experiment with the data on the fly. We need a more streamlined workflow, because less friction can spark interesting data innovation.
Google BigQuery is a great tool for handling big datasets, and it can help us handle them for D3.js.
I will use the New York Taxi dataset hosted on Google BigQuery. It is 4+ GB and has more than 350 million rows in 2 tables. In this article, I want to show you how to query it on the fly and then use D3.js to create a line chart of total trip amount over time. You can explore the dataset here:
(You’ll need to set up a BigQuery account with one project to see the public table.)
BigQuery has full SQL support, so we can run an aggregate query directly on the dataset. We’ll group by month/year and sum the total_amount column. It takes less than 5 seconds.
SELECT
  CONCAT(CONCAT(STRING(MONTH(TIMESTAMP(pickup_datetime))), "/"), STRING(YEAR(TIMESTAMP(pickup_datetime)))) AS time,
  SUM(INTEGER(total_amount)) AS total_amount
FROM [833682135931:nyctaxi.trip_fare]
GROUP BY time;
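The query has no ORDER BY, and its time column is a string such as "1/2013", so the rows need to be parsed and sorted before a line chart can use them. Here is a minimal sketch of that step. It assumes the result arrives in the shape used by the BigQuery REST API (a `rows` array whose cells sit under `f[].v`, with every value returned as a string). The function name `rowsToSeries` and the sample numbers are made up for illustration and are not real query output.

```javascript
// Turn a BigQuery-style response into chronologically sorted chart points.
// Assumption: each row looks like { f: [{ v: "M/YYYY" }, { v: "<sum>" }] },
// matching the column order of the query (time, total_amount).
function rowsToSeries(response) {
  const rows = response.rows || [];
  return rows
    .map((row) => {
      const [time, total] = row.f.map((cell) => cell.v);
      const [month, year] = time.split("/").map(Number);
      return {
        // First day of the month, in local time; JavaScript months are 0-based.
        date: new Date(year, month - 1, 1),
        total: Number(total),
      };
    })
    // Sort by date, since sorting the raw strings would put "10/2013" before "2/2013".
    .sort((a, b) => a.date - b.date);
}

// Hypothetical sample response, deliberately out of order.
const sample = {
  rows: [
    { f: [{ v: "10/2013" }, { v: "300" }] },
    { f: [{ v: "2/2013" }, { v: "200" }] },
    { f: [{ v: "1/2013" }, { v: "100" }] },
  ],
};

const series = rowsToSeries(sample);
console.log(series.map((point) => point.total));
```

Parsing into real Date objects, rather than keeping the "month/year" strings, lets a time scale space the months correctly on the x axis.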
Finally, you'll have a visualization that gets data directly from Google BigQuery.
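As a rough sketch of the charting step, the function below draws a line from an array of points shaped like `{ date: Date, total: number }`. It assumes D3 version 4 or later (for `d3.scaleTime`, `d3.line`, `d3.axisBottom`), and the function name `drawLineChart`, the default size, and the margins are illustrative choices rather than anything prescribed by D3 or BigQuery. The `d3` object is passed in as an argument so the sketch does not depend on how the library is loaded.

```javascript
// Draw a simple line chart of total amount over time.
// Assumptions: D3 v4+, and points sorted by date, each { date: Date, total: number }.
function drawLineChart(d3, selector, points, size = { width: 640, height: 320 }) {
  const margin = { top: 20, right: 20, bottom: 30, left: 80 };
  const innerWidth = size.width - margin.left - margin.right;
  const innerHeight = size.height - margin.top - margin.bottom;

  // x: time scale across the date range; y: linear scale starting at zero.
  const x = d3.scaleTime()
    .domain(d3.extent(points, (d) => d.date))
    .range([0, innerWidth]);
  const y = d3.scaleLinear()
    .domain([0, d3.max(points, (d) => d.total)])
    .nice()
    .range([innerHeight, 0]);

  // Path generator that maps each point to pixel coordinates.
  const line = d3.line()
    .x((d) => x(d.date))
    .y((d) => y(d.total));

  // Create the SVG and an inner group shifted by the margins.
  const g = d3.select(selector)
    .append("svg")
    .attr("width", size.width)
    .attr("height", size.height)
    .append("g")
    .attr("transform", `translate(${margin.left},${margin.top})`);

  // Axes: time along the bottom, totals along the left.
  g.append("g")
    .attr("transform", `translate(0,${innerHeight})`)
    .call(d3.axisBottom(x));
  g.append("g").call(d3.axisLeft(y));

  // The line itself.
  g.append("path")
    .datum(points)
    .attr("fill", "none")
    .attr("stroke", "steelblue")
    .attr("d", line);

  return g;
}
```

On a page that loads D3 and contains an element with the hypothetical id `chart`, a call might look like `drawLineChart(d3, "#chart", points)`.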
Originally posted on Data Science Central