New NYC Crash Data Feed Doesn’t Quite Add Up

The city’s new motor vehicle collision data feed is an important step forward for analyzing crash locations and patterns of pedestrian injuries and fatalities. But in several significant ways, it doesn’t add up in its current form.

A significant share of the records in the database is neither geocoded nor tagged with the geographic identifiers needed to draw meaningful conclusions about trends. As a result, the collision data available in NYC Open Data consistently underreports injuries and fatalities relative to the totals recorded in the Motor Vehicle Collision Reports on the NYPD website.

For example, the geocoded portion of the data feed undercounts 2013 motorist and passenger injuries and fatalities in Queens by 36%, cyclist injuries and fatalities by 13%, and pedestrian injuries and fatalities by 17%. The problem is not improving over time: for the first four months of 2014, the geocoded and tagged data undercounts injured or killed motor vehicle occupants in Queens by 38%, injured or killed cyclists by 13%, and injured or killed pedestrians by 18%. Any research or analysis that uses these data to track trends over time in specific geographic areas will be inherently flawed. (Details of this analysis are in the attached file.)
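The undercount percentages above come down to a simple comparison between two totals, which might be sketched as follows. This is an illustration only: the function name and the sample figures are hypothetical and are not taken from the attached analysis.

```python
def undercount_pct(reported_total: int, feed_total: int) -> float:
    """Percent by which the data feed falls short of the officially reported total."""
    if reported_total <= 0:
        raise ValueError("reported_total must be positive")
    return 100.0 * (reported_total - feed_total) / reported_total


# Hypothetical figures for illustration only (not from the attached analysis):
# 200 injuries and fatalities in the NYPD reports, 150 of them in geocoded records.
example = undercount_pct(reported_total=200, feed_total=150)
print(f"{example:.0f}% undercount")
```

Repeating the same calculation for each borough, category and time period shows whether the gap is stable or changing.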

Given the NYPD’s limited resources, we understand that it may not be practical to geocode every last crash record. Many records that are not geocoded do still contain cross-street information, which could help enrich analysis of injury and fatality hot spots. But for trend analysis, the public should not have to guess which precincts the crashes occurred in, since the NYPD already knows and reports this information. We recommend that the Open Data feed be amended so that every record includes the precinct and borough where the crash report was filed.
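To illustrate why a precinct tag on every record matters, here is a minimal sketch of a per-precinct tally. The field names (precinct, persons_injured, persons_killed) and the sample records are hypothetical and do not reflect the actual schema or contents of the Open Data feed.

```python
from collections import defaultdict


def casualties_by_precinct(records):
    """Sum injuries and fatalities per precinct; untagged records get their own bucket."""
    totals = defaultdict(int)
    for rec in records:
        # A missing or empty precinct cannot be assigned to any area.
        precinct = rec.get("precinct") or "UNTAGGED"
        totals[precinct] += rec.get("persons_injured", 0) + rec.get("persons_killed", 0)
    return dict(totals)


# Made-up records for illustration only.
sample = [
    {"precinct": "P1", "persons_injured": 2, "persons_killed": 0},
    {"precinct": None, "persons_injured": 1, "persons_killed": 0},
    {"precinct": "P1", "persons_injured": 0, "persons_killed": 1},
]
print(casualties_by_precinct(sample))
```

Anything that lands in the UNTAGGED bucket is invisible to precinct-level trend analysis. If every record carried a precinct, that bucket would be empty and the per-precinct totals could be checked directly against the NYPD’s published reports.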

Make Queens Safer thanks Sarah Jane Ellmore for her assistance in preparing this article.