When I joined the Citadel Midwest Datathon, I didn’t expect our project to end up making a policy argument. But that’s the power of data: when analyzed rigorously, it doesn’t just reveal what is; it hints at what could be.

The Spark

Cities spend billions each year trying to become greener, healthier, and more resilient to climate change. Yet decisions about where to invest, such as which neighborhoods should get a park or which waterfronts should be protected, are often based more on intuition than on data.

That raised a question: can we use public datasets to make urban policy decisions more evidence-driven? It became the motivation behind our winning entry, whose goal was to quantify how greenspaces (parks and vegetation) and bluespaces (rivers, lakes, and coastlines) interact to affect local microclimates and public sentiment.

Visualizing the Urban Fabric

We used open data to study how nature affects city life in New York. The CDC PM2.5 dataset gave air quality levels, and the Heat Health Census Tracts dataset provided summer temperature data. The NYC Census Data added population and income details, while NYC TIGER/Line Shapefiles helped map city areas and measure how much land and water each tract had. Finally, the Twitter Climate Change Sentiment dataset included millions of climate-related tweets that showed how people felt about their environment.

The first step was to simply visualize the raw geography and see how natural features are distributed across New York City. These maps make an immediate point: nature is not evenly distributed. The city’s waterfronts and parklands cluster in specific areas, leaving many districts nearly devoid of natural cooling effects.

Turning Data into Models

To move beyond visualizations, we developed a Gaussian distance metric that captured how the influence of nearby green or blue spaces spreads across census tracts. We then applied K-means clustering, which segmented NYC into four environmental archetypes.

This grouping helped us compare the effect of natural infrastructure under controlled, data-driven conditions. We modeled how these categories impacted temperature, air quality, and even human sentiment, the last measured from 13 years of geotagged climate-related tweets.
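To make the idea concrete, here is a minimal sketch of a Gaussian distance-weighted exposure score followed by K-means, written in plain Python. It is an illustration, not our competition code: the function names, the `sigma_km` bandwidth, and the toy coordinates are all assumptions, and in practice a library implementation such as scikit-learn's `KMeans` would be the more sensible choice.

```python
import math
import random


def gaussian_exposure(tract_xy, feature_xys, sigma_km=1.0):
    """Sum the Gaussian-decayed influence of every feature on one tract centroid."""
    x, y = tract_xy
    return sum(
        math.exp(-((x - fx) ** 2 + (y - fy) ** 2) / (2 * sigma_km ** 2))
        for fx, fy in feature_xys
    )


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct starting points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(pt, centers[c]))
            for pt in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical tract centroids and feature locations, in kilometres.
tracts = [(0.0, 0.0), (0.5, 0.2), (5.0, 0.0), (5.3, 0.4),
          (0.0, 5.0), (0.4, 5.2), (5.0, 5.0), (5.2, 4.7)]
parks = [(0.2, 0.1), (0.1, 5.1)]
water = [(5.1, 0.2), (0.3, 5.0)]

# One (green, blue) exposure pair per tract, then four clusters.
features = [
    (gaussian_exposure(t, parks), gaussian_exposure(t, water)) for t in tracts
]
labels, centers = kmeans(features, k=4)
for tract, label in zip(tracts, labels):
    print(tract, "-> cluster", label)
```

The bandwidth controls how far a park or shoreline "reaches": a small `sigma_km` means only immediately adjacent tracts register any exposure, while a large one smooths the influence across whole boroughs.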

The result was consistent: both greenspaces and bluespaces lowered temperatures and improved air quality, but their combined effect was less than additive. In other words, they act as substitutes rather than complements and provide diminishing returns when placed in the same area.
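One simple way to see what "less than additive" means is a two-by-two comparison of group means. The sketch below assumes tracts are labeled by whether they have high green exposure, high blue exposure, both, or neither; that labeling and every temperature value are made up for illustration and are not our results. In a model with only these two indicators and their interaction, the interaction coefficient equals the difference-in-differences computed here.

```python
from statistics import mean

# Hypothetical summer temperatures (degrees C) per group of tracts.
temps = {
    "neither": [35.0, 35.4],
    "green_only": [33.0, 33.4],
    "blue_only": [33.5, 33.9],
    "both": [32.4, 32.8],
}
avg = {group: mean(values) for group, values in temps.items()}

# Effect of each feature on its own, relative to tracts with neither.
green_effect = avg["green_only"] - avg["neither"]
blue_effect = avg["blue_only"] - avg["neither"]

# If the effects simply stacked, tracts with both would sit here.
additive_prediction = avg["neither"] + green_effect + blue_effect

# Interaction: how far the observed "both" mean is from the additive guess.
# A positive value means the combined cooling is smaller than the sum.
interaction = avg["both"] - additive_prediction

print(f"green effect: {green_effect:+.2f}")
print(f"blue effect: {blue_effect:+.2f}")
print(f"additive prediction: {additive_prediction:.2f}")
print(f"interaction: {interaction:+.2f}")
```

With cooling effects that are both negative, a positive interaction term is the signature of substitutes: the second feature still helps, but by less than it would on its own.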

From Correlations to Policy

Since creating new water bodies is often impractical, we concentrated on expanding tree cover and identifying where adding new greenery would have the biggest impact. Using temperature data and our models, we mapped the areas most likely to benefit from additional greenspaces.

The results reveal that neighborhoods with low tree and low water coverage stand to gain the most. These are the city’s heat-dense, concrete-heavy regions, where targeted green infrastructure could deliver the greatest cooling effect and improve overall livability.
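As a rough sketch of how such a ranking could be produced, the snippet below scores each tract by the predicted temperature drop from a fixed increase in green exposure. The linear-with-interaction model, its coefficients, the tract names, and the exposure values are all hypothetical stand-ins. This simple form only captures the diminishing returns that come from nearby water, so it should be read as an illustration of the ranking step rather than a reproduction of our models.

```python
# Hypothetical coefficients for temp = B0 + BG*green + BB*blue + BGB*green*blue,
# with green and blue exposure scaled to the range 0..1.
B0, BG, BB, BGB = 36.0, -2.0, -1.5, 0.9


def predicted_temp(green, blue):
    """Predicted summer temperature (degrees C) for one tract."""
    return B0 + BG * green + BB * blue + BGB * green * blue


def cooling_gain(green, blue, added_green=0.2):
    """Predicted temperature drop from adding a fixed amount of green exposure."""
    return predicted_temp(green, blue) - predicted_temp(green + added_green, blue)


# Hypothetical tracts: name -> (green exposure, blue exposure).
tracts = {
    "A": (0.1, 0.1),  # little greenery, little water
    "B": (0.1, 0.9),  # little greenery, lots of water
    "C": (0.8, 0.2),  # already green
}

ranking = sorted(
    ((name, cooling_gain(g, b)) for name, (g, b) in tracts.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, gain in ranking:
    print(f"tract {name}: predicted cooling {gain:.2f} C")
```

Under these assumed coefficients, the tract with little water nearby ranks first and the waterfront tract ranks last, which matches the substitution story: new trees do the most where water is not already doing the cooling.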

Why It Matters: A Call to Data-Driven Action

This project reinforced something I now firmly believe: open data can drive better policy when combined with robust statistical modeling. Our findings show that cities can, using only public data and careful analysis, identify where each new park or tree would have the highest return on investment in terms of cooling and livability. That principle extends beyond climate resilience. Education, healthcare access, and transportation equity can all benefit from similar open-data analytics pipelines.

If hackathons and datathons can reveal actionable policy insights in a single weekend, imagine what sustained collaboration between data scientists and city planners could achieve. The next time you browse a government open-data portal, think of it not as “raw data” but as policy waiting to happen, one regression, one visualization, one insight at a time.