Originally posted on Data Science Central
Contributed by Belinda Kanpetch, a current Architecture graduate student at Columbia University. With a strong sense of urban design, she is fascinated by urban installation art and eager to find any elements that can ameliorate urban space. To gather information systematically and apply it to her work, she took the NYC Data Science Academy 12-week full-time Data Science Bootcamp program from April 11th to July 1st, 2016. This post is based on her first class project (due in the 2nd week of the program).
Why Street Trees?
The New York City street tree can sometimes be taken for granted or go unnoticed. Located along paths of travel, street trees stand steady and patient, quietly going about their business of filtering pollutants out of our air, bringing us oxygen, providing shade during the warmer months, blocking winds during cold seasons, and relieving our sewer systems during heavy rainfall. All of this while beautifying our streets and neighborhoods. Some recent studies have found a link between the presence of street trees and lower stress levels in urban residents.
So what makes a street tree different from any other tree? Mainly its location. A street tree is defined as any tree that lives within the public right of way, not in a park or on private property. Although street trees reside in the public right of way (or within the jurisdiction of the Department of Transportation), they are the property of, and cared for by, the NYC Department of Parks and Recreation.
To understand the data and explore what it was telling me, I started with some very basic questions:
- How many street trees are there in Manhattan?
- How many different species are there?
- What is the general condition of the street trees?
- What is the distribution of species by community district?
- Is there a connection between the median income of a community district and its number of street trees?
The dataset used for this exploratory visualization was downloaded from the NYC Open Data Portal and was collected as part of TreesCount! 2015, a street tree census maintained by the NYC Department of Parks and Recreation. The first census was conducted in 1995, and it has been repeated every 10 years by trained volunteers.
The dataset posed several challenges:
- Missing values in the form of unidentifiable species types: 2,285 observations had an unclassifiable species type.
- 487 observations had unclassifiable community districts.
- Geographic information (longitude and latitude) was stored as character strings that had to be split into separate variables.
- Species were given as 4-letter codes without any reference to genus, species, or cultivar, so I had to find another dataset to decipher them.
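The cleaning steps above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the original analysis: the `"(lat, lon)"` format of the `location` string, the sample rows, the `SPECIES_LOOKUP` excerpt, and the use of `"0"` as the code for an unclassifiable community district are all hypothetical. Only the `"0"` label for the missing-species group comes from the post itself.

```python
import re

# Hypothetical raw rows. The "(lat, lon)" location format is an assumption
# for illustration; check the real column layout before reusing this.
raw_rows = [
    {"species": "GLTR", "cd": "108", "location": "(40.7736, -73.9566)"},
    {"species": "PYCA", "cd": "103", "location": "(40.7150, -73.9843)"},
    {"species": "0", "cd": "0", "location": "(40.7580, -73.9855)"},
]

# Hypothetical excerpt of a lookup table decoding 4-letter species codes.
# In practice this would come from a separate reference dataset.
SPECIES_LOOKUP = {
    "GLTR": "Honeylocust",
    "PYCA": "Ornamental Pear",
}

# Matches "(number, number)" with optional minus signs, decimals and spaces.
COORD_PATTERN = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_location(text):
    """Split a "(lat, lon)" string into two floats, or (None, None) if malformed."""
    match = COORD_PATTERN.fullmatch(text.strip())
    if match is None:
        return None, None
    return float(match.group(1)), float(match.group(2))


def clean_row(row):
    """Return a row with a decoded species name and separate lat/lon fields."""
    lat, lon = parse_location(row["location"])
    return {
        "species_code": row["species"],
        # Codes missing from the lookup (e.g. the "0" placeholder) become None.
        "species_name": SPECIES_LOOKUP.get(row["species"]),
        # Assumption: "0" marks an unclassifiable community district.
        "community_district": None if row["cd"] == "0" else row["cd"],
        "latitude": lat,
        "longitude": lon,
    }


cleaned = [clean_row(row) for row in raw_rows]
for row in cleaned:
    print(row)
```

Keeping unknown codes as `None` rather than dropping the rows lets the missing-value groups still be counted later.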
Visualizing the data
A quick summary of the dataset revealed a total of 51,660 trees in Manhattan, with 91 identifiable species plus one "species" category for missing values.
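Tallying trees and species takes a single counting pass. The minimal Python sketch below assumes one hypothetical species code per tree and uses `"0"` as the missing-species placeholder, which is the label the post gives that group.

```python
from collections import Counter

# Hypothetical species codes, one entry per tree. "0" stands for a tree
# whose species could not be identified.
tree_species = ["GLTR", "PYCA", "GLTR", "GIBI", "0", "PLAC", "GLTR", "PYCA"]

counts = Counter(tree_species)
total_trees = sum(counts.values())
# The missing-value group is a category in the tally but not a species.
identified_species = [code for code in counts if code != "0"]

print("total trees:", total_trees)
print("identifiable species:", len(identified_species))
print("categories incl. missing:", len(counts))
```

This prints 8 trees, 4 identifiable species and 5 categories. The same logic explains why the 91 identifiable species appear as 92 bars once the missing group is included.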
A bar plot of all 92 species categories gave an interesting snapshot of the range in the number of trees per species. It was immediately obvious that one species had a dominant presence. To better understand the counts and identify the common species, I broke the species down into quartiles by tree count and plotted each quartile.
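Grouping species into quartiles by tree count can be sketched as follows, using hypothetical counts. One assumption to note: `statistics.quantiles` with `method="inclusive"` uses the same linear interpolation as R's default `quantile()` (type 7). That is a plausible way to get fractional cut points like 3.75 and 181.75, but the post does not state how they were computed.

```python
import bisect
import statistics

# Hypothetical per-species tree counts (species label -> number of trees).
species_counts = {
    "sp_a": 1, "sp_b": 1, "sp_c": 2, "sp_d": 4, "sp_e": 10,
    "sp_f": 25, "sp_g": 60, "sp_h": 150, "sp_i": 400, "sp_j": 1200,
}

# method="inclusive" interpolates linearly between order statistics,
# matching R's default quantile() (type 7).
cut_points = statistics.quantiles(species_counts.values(), n=4, method="inclusive")

quartile_groups = {1: [], 2: [], 3: [], 4: []}
for species, count in species_counts.items():
    # bisect_left counts the cut points strictly below `count`, so a count
    # equal to a cut point stays in the lower quartile.
    quartile = bisect.bisect_left(cut_points, count) + 1
    quartile_groups[quartile].append(species)

print("cut points:", cut_points)
for quartile, members in quartile_groups.items():
    print(f"Q{quartile}:", members)
```

With these counts the cut points are 2.5, 17.5 and 127.5. Tied counts mean the groups need not hold equal numbers of species.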
Plotting the first quartile (fewer than 3.75 trees per species) revealed that several species had only one tree in all of Manhattan!
The distribution within the 4th quartile (181.75 < total ≤ 11,529) was informative because it helped visualize the dominance of two species, the Honeylocust and the Ornamental Pear, which make up 23% and 15% of all the trees in Manhattan, respectively. Next came Ginkgo trees at 9.47% and London Plane at 7.8%. This quartile also contained the missing-species group "0".
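Percentage shares like these depend on the denominator, specifically on whether trees in the missing-species group `"0"` count toward the total. The post does not say which choice was made, so this Python sketch with hypothetical counts computes both versions. The function name `species_shares` is illustrative.

```python
# Hypothetical per-species counts; "0" is the missing-species group.
MISSING = "0"
species_counts = {"sp_a": 50, "sp_b": 30, "sp_c": 12, "sp_d": 8, MISSING: 25}


def species_shares(counts, include_missing_in_total):
    """Rank identified species by count and give each one's percentage share.

    The missing group never appears in the ranking; the flag only controls
    whether its trees count toward the denominator.
    """
    total = sum(
        n for code, n in counts.items()
        if include_missing_in_total or code != MISSING
    )
    identified = [(code, n) for code, n in counts.items() if code != MISSING]
    identified.sort(key=lambda item: item[1], reverse=True)
    return [(code, 100 * n / total) for code, n in identified]


print(species_shares(species_counts, include_missing_in_total=False))
print(species_shares(species_counts, include_missing_in_total=True))
```

Here the top species holds 50% of identified trees but only 40% of all trees. When a dataset has a sizable unclassified group, stating which denominator was used avoids ambiguity.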
A palette of the top 4 species in Manhattan.
Looking at trees by Community District
I wanted to look at community districts rather than zip codes because, in my opinion, community districts better represent community cohesiveness and character. So I plotted the distribution of trees by community district and tree condition.
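The counts behind a district-by-condition plot form a simple cross-tabulation. The following minimal Python sketch pivots hypothetical per-tree records into a nested table. The district codes and condition labels are illustrative only and do not come from the dataset.

```python
from collections import Counter, defaultdict

# Hypothetical (community_district, condition) pairs, one per tree.
# District codes and condition labels are illustrative only.
records = [
    ("101", "Good"), ("101", "Good"), ("101", "Poor"),
    ("107", "Good"), ("107", "Fair"), ("107", "Fair"),
    ("112", "Excellent"), ("112", "Good"),
]

pair_counts = Counter(records)

# Pivot into {district: {condition: count}}, the shape a grouped or
# stacked bar chart of district by condition needs.
table = defaultdict(dict)
for (district, condition), n in sorted(pair_counts.items()):
    table[district][condition] = n

for district, by_condition in table.items():
    print(district, by_condition)
```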
Plotting the species distribution by community district using a facet grid helped reveal other species that had not shown up as dominant in the previous graphs. It would be interesting to look further into what those species are and why they are more dominant in some community districts than in others.
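One way to follow up on locally dominant species is to find each district's most common species and compare it with the citywide leader. This is a minimal Python sketch on hypothetical records. The species labels and the helper `locally_dominant` are illustrative, not part of the original analysis.

```python
from collections import Counter, defaultdict

# Hypothetical (community_district, species) pairs, one per tree.
records = [
    ("101", "sp_a"), ("101", "sp_a"), ("101", "sp_a"), ("101", "sp_b"),
    ("110", "sp_c"), ("110", "sp_c"), ("110", "sp_c"), ("110", "sp_a"),
    ("112", "sp_d"), ("112", "sp_e"), ("112", "sp_e"), ("112", "sp_e"),
]

# One Counter of species per district: the data behind one facet panel each.
by_district = defaultdict(Counter)
for district, species in records:
    by_district[district][species] += 1


def locally_dominant(by_district):
    """Map each district to (most common species, its share of that district's trees)."""
    result = {}
    for district, counter in sorted(by_district.items()):
        species, n = counter.most_common(1)[0]
        result[district] = (species, n / sum(counter.values()))
    return result


citywide = Counter(species for _, species in records)
print("citywide leader:", citywide.most_common(1)[0])
print(locally_dominant(by_district))
```

In this toy data, districts 110 and 112 are led by species other than the citywide leader. That is the kind of pattern a facet grid makes visible and a single citywide bar plot hides.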
Attempts at mapping
The ultimate goal was to map each individual tree's location on a map of Manhattan with the community districts outlined or shaded. I attempted this using leaflet, and using ggplot after bringing in shapefiles and converting them to a data frame, but neither approach yielded anything useful. The only visualization I was able to produce came from qplot, which took over 2 hours to render.