Modeling Urban Walking Risk Using Spatial-Temporal Machine Learning



Defining the Problem Statement

At a high level, the problem I wanted to solve was this: given a starting location, ending location, time of day, and day of week, how can we estimate the expected risk along a walking route? For example, if I want to walk from Chinatown to Market & Van Ness, Google Maps shows me the following route(s):

Google Maps — walking route from Chinatown to Market & South Van Ness. Screenshot by author from Google Maps.

Google Maps presents a couple of route options, all taking roughly 40 minutes. While it's useful to compare distance and duration, it doesn't help answer a more contextual question: which parts of these routes look different depending on when I'm making the walk? How does the same route compare at 9 am on a Tuesday versus 11 pm on a Saturday?

As walks get longer — or pass through areas with very different historical activity patterns — these questions become harder to answer intuitively. While San Francisco is not uniquely unsafe compared to other major cities, public safety is still a meaningful consideration, especially when walking through unfamiliar areas or at unfamiliar times. My goal was to build a tool for locals and visitors alike that adds context to these decisions — using historical data and machine learning to surface how risk varies across space and time, without reducing the city to simplistic labels.

Getting the Data + Pre-Processing

Fetching the Raw Dataset

The San Francisco City and County Departments publish police incident reports daily through the San Francisco Open Data portal. The dataset spans from January 1, 2018 to the present and includes structured information such as incident category, subcategory, description, time, and location (latitude and longitude).

Filtered incident records from the San Francisco Open Data Portal. Screenshot by author using data from data.sfgov.org.

Categorizing Incidents Reported

One immediate challenge with this data is that not all incidents represent the same level or type of risk. Treating all reports equally would blur meaningful differences — for example, a minor vandalism report should not be weighted the same as a violent incident. To address this, I first extracted all unique combinations of incident category, subcategory, and description, which resulted in a little over 800 distinct incident triplets.

Rather than scoring individual incidents directly, I used an LLM to assign severity scores to each unique incident type. This allowed me to normalize semantic differences in the data while keeping the scoring consistent and interpretable. Each incident type was scored on three separate dimensions, each on a 0–10 scale:

  • Harm score: the potential risk to human safety and passersby
  • Property score: the potential for damage or loss of property
  • Public disruption score: the extent to which an incident disrupts normal public activity

These three scores were later combined to form an overall severity signal for each incident, which could then be aggregated spatially and temporally. This approach made it possible to model risk in a way that reflects both the frequency and the nature of reported incidents, rather than relying on raw counts alone.
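As a concrete illustration, the three dimension scores can be collapsed into a single severity value with a weighted average. The weights below are hypothetical, chosen only to show the idea, not the exact ones used in the project:

```python
def overall_severity(harm: float, prop: float, disruption: float) -> float:
    """Combine the three 0-10 dimension scores into one 0-10 severity.

    The weights are illustrative: harm is weighted most heavily, since
    risk to people matters more than property loss or disruption.
    """
    weights = {"harm": 0.6, "prop": 0.25, "disruption": 0.15}  # hypothetical
    return (weights["harm"] * harm
            + weights["prop"] * prop
            + weights["disruption"] * disruption)

# Example: an LLM-scored incident type such as a car break-in might come
# back as harm=2, property=7, disruption=3 (illustrative values).
print(overall_severity(2, 7, 3))  # → 3.4
```

Any monotone combination works here; the key property is that the final signal stays interpretable and on the same 0-10 scale as its inputs.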

Geospatial Representation

Raw latitude and longitude values alone would not add much to the ML model, because I need to aggregate incident context at the block and neighborhood level. I needed a method to map a block or neighborhood to a fixed index, to simplify feature engineering and build a consistent spatial mapping. Enter the seminal engineering blog published by Uber: H3.

Uber's H3 blog describes how projecting an icosahedron (a 20-faced polyhedron) onto the surface of the Earth and hierarchically subdividing its faces into hexagons (plus 12 strategically placed pentagons) can tessellate the entire map. Hexagons are special because they are one of the few regular polygons that form regular tessellations, and each cell's centerpoint is equidistant from all of its neighbors' centerpoints, which simplifies smoothing over gradients.

Neighbor distance comparison showing unequal distances in square grids and uniform distances in hexagonal grids. Image by author.
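The equidistance property is easy to verify numerically. A minimal sketch (plain Python, no H3 dependency) compares neighbor-center distances on a square grid versus a hexagonal grid:

```python
import math

# Square grid: the 8 surrounding cell centers are NOT all equidistant.
square_neighbors = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)]
square_dists = {round(math.hypot(dx, dy), 6) for dx, dy in square_neighbors}

# Hex grid: the 6 neighbor centers sit at 60-degree intervals, all at the
# same distance from the central cell (unit center spacing here).
hex_neighbors = [(math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)))
                 for k in range(6)]
hex_dists = {round(math.hypot(x, y), 6) for x, y in hex_neighbors}

print(sorted(square_dists))  # two distinct distances: 1.0 and sqrt(2)
print(sorted(hex_dists))     # a single distance: 1.0
```

This is exactly why smoothing a value over a hexagonal grid needs no distance-dependent reweighting of neighbors, while a square grid does.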

The website https://clupasq.github.io/h3-viewer/ is a fun experiment to see what your location’s H3 Index is!

Snapshot from H3 Index Viewer at Resolution 8. Screenshot by author, using the H3 library (Apache 2.0) and OpenStreetMap base map © OpenStreetMap contributors (ODbL).

Temporal Representation

Time is just as important as location when modeling walking risk. However, naïvely encoding hour and day as integers introduces discontinuities — 23:59 and 00:00 are numerically far apart, even though they are only a minute apart in reality.

To address this, I encoded time of day and day of week using sine and cosine transformations, which represent cyclical values on a unit circle. This allows the model to learn that late-night and early-morning hours are temporally adjacent, and that days of the week wrap naturally from Saturday back to Sunday.

In addition, I aggregated incidents into 3-hour time windows. Shorter windows were too sparse to produce reliable signals, while larger windows obscured meaningful differences (for example, early evening versus late night). Three-hour buckets struck a balance between granularity and stability, resulting in intuitive periods such as early morning, afternoon, and late evening.

Obtaining x, y-coordinates to represent time on a unit circle. Image by author
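The encoding and bucketing described above can be sketched in a few lines (the function names are my own shorthand, not library calls):

```python
import math

def encode_hour(hour: int) -> tuple[float, float]:
    """Map an hour (0-23) onto the unit circle so 23:00 and 00:00 are close."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

def encode_day(day: int) -> tuple[float, float]:
    """Map a day of week (0-6) onto the unit circle so the week wraps around."""
    angle = 2 * math.pi * day / 7
    return math.sin(angle), math.cos(angle)

def time_bucket(hour: int) -> int:
    """Assign an hour to one of eight 3-hour windows (0: 00-03, ..., 7: 21-24)."""
    return hour // 3

# 23:00 and 00:00 end up adjacent on the circle even though the raw
# integers 23 and 0 are numerically far apart.
sin23, cos23 = encode_hour(23)
sin0, cos0 = encode_hour(0)
gap = math.hypot(sin23 - sin0, cos23 - cos0)
print(round(gap, 3))  # small chord distance on the unit circle
```

Using both sine and cosine is essential: either one alone maps two different times to the same value (e.g. sin alone cannot distinguish 3 am from 9 am).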

Final Feature Representation

After preprocessing, each data point consisted of:

  • An H3 index representing location
  • Cyclically encoded hour and day features
  • An aggregated severity signal derived from historical incidents

The model was then trained to predict the expected risk for a given H3 cell, at a given time of day and day of week. In practice, this means that when a user opens the app and provides a location and time, the system has enough context to estimate how walking risk varies across nearby blocks.
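Concretely, a single training example might look like the sketch below. The field names and schema are illustrative rather than the project's exact ones, and the H3 cell is the example index that appears later in this post:

```python
import math

def make_feature_row(h3_cell: str, hour: int, day: int, severity: float) -> dict:
    """Assemble one training example (illustrative schema, not the exact one)."""
    hour_angle = 2 * math.pi * hour / 24
    day_angle = 2 * math.pi * day / 7
    return {
        "h3_cell": h3_cell,              # categorical spatial index
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "day_sin": math.sin(day_angle),
        "day_cos": math.cos(day_angle),
        "severity": severity,            # aggregated target signal
    }

# A Saturday-evening example in a downtown cell (illustrative values).
row = make_feature_row("89283082873ffff", hour=21, day=6, severity=4.2)
print(sorted(row))
```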

Training the Model Using XGBoost

Why XGBoost?

With the geospatial and temporal features ready, I knew I needed a model that could capture non-linear patterns in the dataset while keeping latency low enough to perform inference on multiple segments in a route. XGBoost was a natural fit for a couple of reasons:

  • Tree-based models are naturally robust at modeling heterogeneous data — categorical spatial indices, cyclical time features, and sparse inputs can coexist without heavy feature scaling or normalization.
  • Feature effects are more interpretable than in deep neural networks, which tend to introduce unnecessary opacity for tabular data.
  • Flexibility in objectives and regularization made it possible to model risk in a way that aligns with the structure of the problem.

While I did consider alternatives such as linear models, random forests, and neural networks, they fell short: they either missed nuance in the data, added latency at inference time, or were over-complicated for tabular data. XGBoost struck the best balance between performance and practicality.

Modeling Expected Risk

It’s important to clarify before we move on that modeling expected risk is not a Gaussian problem. When modeling incident rates in the city per [H3, time] cell, I noticed that:

  • many cells have an incident count of 0 and/or a total risk of 0
  • many cells have just 1–2 incidents
  • a handful of cells have many incidents (> 1,000)
  • extreme events occur, but rarely

These are signs that the target distribution is neither symmetric nor clustered around a fixed mean. These properties immediately rule out common assumptions like normally distributed errors.

This is where Tweedie regression becomes useful.

What is Tweedie Regression?

Zero-inflated data representation in my dataset, i.e. risk models lead to a right-skewed distribution due to rare and/or extreme events. Image by author

Put simply, Tweedie regression says: “Your value is the sum of random events where the number of events is random and each event has a positive random size.” This fits the crime incident model perfectly.

Tweedie regression combines a Poisson process and a Gamma distribution to model the number of incidents and the size (risk score) of each incident, respectively. As an example:

  • Poisson process: in window 6pm-9pm on December 10th, 2025 how many incidents occurred in H3 index 89283082873ffff?
  • Gamma distribution: how severe was each event that occurred in H3 index 89283082873ffff between 6pm-9pm on December 10th, 2025?
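This compound structure is easy to see in simulation. The sketch below uses illustrative parameters (not fitted values) and checks that the expected total risk factors into E[#incidents] × E[severity]:

```python
import numpy as np

rng = np.random.default_rng(0)

lam = 2.0                # Poisson rate: avg incidents per [cell, 3-hour window]
shape, scale = 3.0, 1.0  # Gamma severity per incident: mean = shape * scale = 3

n_windows = 200_000
counts = rng.poisson(lam, n_windows)       # how many incidents in each window?

# Total risk per window = sum of `count` i.i.d. Gamma severities.
# The sum of k Gamma(shape, scale) draws is Gamma(k * shape, scale).
total_risk = np.zeros(n_windows)
mask = counts > 0
total_risk[mask] = rng.gamma(shape * counts[mask], scale)

print(round(total_risk.mean(), 2))      # ≈ lam * shape * scale = 6.0
print(round((counts == 0).mean(), 3))   # zero-risk windows ≈ e^-2 ≈ 0.135
```

The printed mean matching lam × (shape × scale) is the simulated version of the identity Expected risk = E[#incidents] × E[severity] used in the next section.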

Why This Matters

A concrete example from the data illustrates why this framing is important.
In the Presidio, there was a single, rare high-severity incident that scored close to 9/10. In contrast, a block near 300 Hyde Street in the Tenderloin has thousands of incidents over time, but with a lower average severity. Tweedie breaks it down as:

Expected risk = E[#incidents] × E[severity]

# Presidio
E[#incidents] ≈ 0
E[severity] = high
→ Expected risk ≈ 0

# Tenderloin
E[#incidents] = high
E[severity] = medium
→ Expected risk = large

Therefore, if high-severity events started happening more often in the Presidio, the model would adjust the expected risk accordingly and raise the output scores. Tweedie handles the target's zero-heavy, right-skewed distribution, while the input features we discussed earlier explain variation in that target.

Framing the Outcome

The result is a model that predicts expected risk, not conditional severity and not binary safety labels. This distinction matters. It avoids overreacting to rare but extreme events, while still reflecting sustained patterns that emerge over time.

Final Steps + Deployments

To bring the model to life, I used the Google Maps API to build a website that integrates the maps, routes, and directions UI, on which I overlay colors based on the risk scores. I color-coded the segments using percentiles of the score distribution in my data: score ≤ P50 = green (safe), score ≤ P75 = yellow (moderately safe), score ≤ P90 = orange (moderately risky), else red (risky). I also added logic to re-route the user through a safer route if the detour adds no more than 15% to the original duration. This threshold can be tweaked, but I left it as is for now, since with San Francisco's hills a 15% detour can cost you a lot of climbing.
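The color-binning logic can be sketched as follows; the score sample and cutoffs here are hypothetical, standing in for the production score distribution:

```python
import statistics

# Hypothetical historical segment risk scores (the real app computes
# percentiles over the model's scores across all segments).
scores = [0.0, 0.1, 0.2, 0.3, 0.5, 0.8, 1.2, 1.5, 2.0, 3.5, 5.0, 9.0]

# quantiles(n=20) returns the 5th, 10th, ..., 95th percentile cut points.
q = statistics.quantiles(scores, n=20)
p50, p75, p90 = q[9], q[14], q[17]

def segment_color(score: float) -> str:
    """Map a segment's risk score to a display color via percentile cutoffs."""
    if score <= p50:
        return "green"    # safe
    if score <= p75:
        return "yellow"   # moderately safe
    if score <= p90:
        return "orange"   # moderately risky
    return "red"          # risky

print([segment_color(s) for s in (0.2, 1.0, 4.0, 8.0)])
```

Binning by percentile rather than by fixed score thresholds keeps the legend meaningful even if the overall score scale drifts as new incident data arrives.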

Image by author

The backend is deployed on Render and the frontend on Vercel.

Putting StreetSense To Use!

And now, back to the first example we looked at: the journey from Chinatown to Market & Van Ness, this time with the new model and application we have built!

Here’s how the walk looks at 9 am on a Tuesday versus 11 pm on a Saturday:

My application (StreetSense) — Chinatown to Market & Van Ness at 9am on a Tuesday. Screenshot by author.
My application (StreetSense) — Chinatown to Market & Van Ness at 11pm on a Saturday. Screenshot by author

In the first image, the green segments in Chinatown have lower incident counts and severity than the red segments, and the data backs this up. The cool part about the second image is that the app automatically re-routes the user through a route that is safer at 11 pm on a Saturday night. This is the kind of contextual decision-making I originally wished for, and the motivation behind building StreetSense.

Final Thoughts and Potential Improvements

While the current system captures spatial and temporal patterns in historical incidents, there are clear areas for improvement:

  • incorporating real-time signals
  • using further ground-truth data to validate and train
    (a) if an incident was marked as a 4/10 risk score for theft and the San Francisco database shows that an arrest was made, we can bump it up to a 5/10
  • making the H3 index sensitive to neighboring cells (Outer Richmond ~ Central Richmond, so the model should infer proximity and partially share contextual information)
  • expanding spatial features beyond the H3 ID (neighbor aggregation, distance to hotspots, land-use features)
  • deeper exploration of different methods of handling incident data + evaluations
    (a) experiment with different XGBoost objective functions such as Pseudo-Huber loss
    (b) leverage hyperparameter optimization frameworks and evaluate different combinations of values
    (c) experiment with neural networks
  • expanding beyond a single city to make the model more robust

Like any model built on historical data, StreetSense reflects past patterns rather than predicting individual outcomes, and it should be used as a tool for context rather than certainty. Ultimately, the goal is not to label places as safe or unsafe, but to help people make more informed, situationally aware choices as they move through a city.


Data Sources & Licensing

This project uses publicly available data from the San Francisco Open Data Portal:

  • San Francisco Police Department Incident Reports
    Source: San Francisco Open Data Portal (https://data.sfgov.org)
    License: Open Data Commons Public Domain Dedication and License (PDDL)

All datasets used are publicly available and permitted for reuse, modification, and commercial use under their respective open data licenses.

Acknowledgement & References

I’d like to thank the San Francisco Open Data team for maintaining high-quality public datasets that make projects like this possible.

In particular, Uber's engineering blog post on H3 informed the geospatial approach used in this work.
