Continuing the discussion from Overturemaps.org - big-businesses OSMF alternative:
This was raised in the Foundation section by @Richard. Thought it would be appropriate to continue the discussion here.
I don't know the exact pipeline by which the Census Bureau gets its data and ultimately aggregates it into the TIGER data set. But it's pretty clear from other Americans I've discussed it with that the quality issues seem to be very localized. In my state (RI), the number of TIGER data errors I've found has been very small. At one point I set myself the challenge of running every single street in two towns in my state, and after this exhaustive survey there were only a small number of errors, with the most common one being dead-ends that went further on the map than in reality.
Some classes of problems were common to the entire TIGER import, but we largely fixed those. Off the top of my head: abbreviations in road names, disconnections at county lines, divided roads intentionally digitized as single carriageways, disconnections at railroad crossings. Probably the only one that's still very widespread is the name_* tags, because that requires a lot of manual review.
Otherwise, the classes of problems that remain are specific to individual counties (albeit many counties) scattered throughout the country. For example, overnodedness happens in many counties but is difficult to clean up en masse. Many counties also have every private driveway tagged as a highway=residential. Some have every parking lot outlined as a highway=residential. It's easy to perceive these problems as being common to the entire import because of our personal experiences focusing on counties where these things occur.
Agreed that the problems are very localized. At this point the best we can do is therefore come up with local solutions. I've done state-wide cleanup challenges in #maproulette before but those prove too daunting… Tens of thousands of tasks. Here in Utah the main problem remaining, and it's a big one, is the huge number of "made up" rural roads: geometry that does not represent anything currently existing on the ground, or that I would not recommend anyone rely on for anything. I've considered proposing deleting everything in Utah that is TIGER imported and never looked at by a human, either as a bulk operation or as something more targeted (not sure what that would look like). Any experience with localized "reverting" of TIGER import data?
Would it be possible to build some sort of heatmap overlay showing the density of potential remaining TIGER issues? This could help interested mappers find areas to focus on.
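One rough way to prototype such a density map, sketched in Python rather than taken from any existing tool: bin the center points of candidate ways (for example, the results of an Overpass query for untouched TIGER ways) into a latitude/longitude grid and count the ways per cell. The function name `density_grid`, the 0.1 degree cell size, and the sample coordinates are all made up for illustration.

```python
import math
from collections import Counter


def density_grid(points, cell_deg=0.1):
    """Count points per grid cell.

    points   -- iterable of (lat, lon) pairs, e.g. way center points
    cell_deg -- cell size in degrees (an arbitrary choice for this sketch)

    Returns a Counter keyed by (row, col) cell indices.
    """
    counts = Counter()
    for lat, lon in points:
        # floor() keeps negative longitudes in the correct cell
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical center points of suspect ways (not real data)
sample = [(37.01, -110.02), (37.04, -110.08), (38.55, -109.55)]
grid = density_grid(sample)
busiest_cell, busiest_count = grid.most_common(1)[0]
```

The cells with the highest counts would be the "hot" areas to point mappers at; rendering them as an overlay is a separate step not covered here.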
So the big issue is A41-class roads in rural areas, particularly (though not exclusively) away from the coasts. These were imported as highway=residential. Some are indeed residential highways! But most aren't. Some would be better as highway=track, some as highway=unclassified, surface=unpaved (or something more nuanced), and some would be better simply deleted.
My interest is that this is particularly problematic for cycle routing. Car routing is less affected because it prefers the higher routes in the hierarchy (trunk, primary, etc.), which are mostly fixed. Bike routing is the opposite - it prefers unclassified/residential etc. - and if you try and route across the US on highway=residential, you will basically die of dysentery somewhere on the way to Oregon.
We're unfortunately a long way past anything that can be sanely reverted. There have been lots of incremental little fixups over the years, plus the blizzard of corporate edits relating to driveways, that mean most ways have indeed been edited somehow over time. As an example, there are several counties where (say) a maxspeed=45 mph tag has been added to every highway in the county because that's the local ordinance. This means you might find a drainage ditch tagged with highway=residential, maxspeed=45 mph.
A smart(ish) data consumer can go a long way to alleviating these issues with several heuristics. cycle.travel is very distrustful of highway=residential with tiger:reviewed=no (and no surface tag) in rural areas. The upshot is that you can route across the US with cycle.travel and you will probably not die of dysentery. But it would be better to improve the source data.
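To make that kind of heuristic concrete, here is a minimal Python sketch. It is not cycle.travel's actual code: the function name and the `is_rural` flag (which a real consumer would have to derive somehow, for instance from land use or population density) are assumptions for illustration.

```python
def is_suspect_tiger_residential(tags, is_rural):
    """Return True if a way looks like an unreviewed TIGER residential.

    tags     -- dict of OSM tags on the way
    is_rural -- caller-supplied flag; how "rural" is decided is out of scope
    """
    return (
        is_rural
        and tags.get("highway") == "residential"
        and tags.get("tiger:reviewed") == "no"
        and "surface" not in tags  # a surface tag suggests a human looked at it
    )


# Hypothetical tag sets (not real OSM objects)
untouched = {"highway": "residential", "tiger:reviewed": "no", "tiger:cfcc": "A41"}
surveyed = {"highway": "residential", "tiger:reviewed": "no", "surface": "gravel"}

untouched_is_suspect = is_suspect_tiger_residential(untouched, is_rural=True)
surveyed_is_suspect = is_suspect_tiger_residential(surveyed, is_rural=True)
```

A router could then penalize or skip ways flagged this way instead of treating them as ordinary residential streets.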
I'm not against carefully targeted automated edits when the point is to fix issues with an earlier automated edit, i.e. TIGER. For example, a few states (Colorado springs to mind) publish open data on road surfaces. This could be sanely brought into OSM. The imagery-derived surface detection mentioned in another thread also looks really promising. And, of course, maybe Overture Maps will be releasing some relevant open data… who knows.
But anything automated would have to be very carefully reviewed for fear of blatting the few usable heuristics we have at the moment.
You should be, and you'd know because you've looked into this pretty deeply. My heuristic for separating wheat from chaff is:
highway=residential
name=*
tiger:cfcc
and / or tiger:reviewed=no
There are a bunch of "advanced" Overpass queries that have more elaborate criteria; @Minh_Nguyen would know where to find these.
My last experience with this was in 2009 when I asked for all of Greene County, Ohio, to be deleted, because it had been imported twice, every road duplicating another copy of the road without any connections between them. I had already made many edits in the area and had haphazardly deleted many roads from one or the other import, but it only took me a couple months to recover. I don't think we could've done something that clean in 2011. Granted, Greene County is much more developed than some of the counties in Utah where you map.
Here are some Overpass queries for unedited TIGER ways. As @Richard points out, there are many false negatives because of driveway editing and such. The public Overpass instance can't handle querying for TIGER unedited ways beyond a small area.
A couple years ago, I developed some SPARQL queries to determine the most deserted TIGER desert counties, and even refined it down to individual ZIP codes. Unfortunately, Sophox is no longer reliable for these queries, but I made this snapshot in 2020 that might still be useful.
This is an Overpass query I've used in the past that trades off reasonable speed for reasonable accuracy:
rel(161993);map_to_area->.a; // state of utah
way // consider ways that...
[highway=residential] // are residential
[!name] // do not have name tag
["tiger:cfcc"] // have tiger:cfcc tag which was created as part of the import
(if:timestamp() < "2013-01-01T00:00:00Z") // have timestamp before 2013
(area.a); // are in the defined area
out meta geom; // output geometry and metadata
This yields 2176 ways. If you leave out the highway=residential criterion it's up to 8000+.
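For anyone wanting to run the same query against another state or county, a small Python helper can assemble the query text. This is only a sketch: the function name and parameters are hypothetical, and it just builds the string. Sending it to an Overpass instance is left out.

```python
def build_unedited_tiger_query(relation_id,
                               cutoff="2013-01-01T00:00:00Z",
                               residential_only=True):
    """Build the Overpass QL text for likely-untouched TIGER ways.

    relation_id      -- OSM relation id of the area (161993 is used for Utah above)
    cutoff           -- only ways last edited before this timestamp
    residential_only -- set False to drop the highway=residential criterion
    """
    lines = [f"rel({relation_id});map_to_area->.a;", "way"]
    if residential_only:
        lines.append("  [highway=residential]")
    lines += [
        "  [!name]",                           # no name tag
        '  ["tiger:cfcc"]',                    # tag created by the import
        f'  (if:timestamp() < "{cutoff}")',    # not edited since the cutoff
        "  (area.a);",                         # inside the chosen area
        "out meta geom;",
    ]
    return "\n".join(lines)


utah_query = build_unedited_tiger_query(161993)
utah_all_highways = build_unedited_tiger_query(161993, residential_only=False)
```

Given the note above about the public Overpass instance, it is probably wise to try this on a county-sized relation before a whole state.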
There's possibly an argument that some of these highway=residential roads should be automatically retagged as highway=road - which itself is effectively a fixme tag.
NM is probably the state with the most un-road-like A41s IME. Some of the geometry in WV is pretty shocking though…
I can see that it might make sense to remove untouched TIGER in "wilderness" areas, but even then by someone in the region who has familiarity.
But locally it would make no sense to take any sort of automated or mass action. I've been going over counties and setting the surface and geometry where not hidden under trees, using the US Tasking manager. Even that requires knowledge of regional road construction, soil, and any possible better local authoritative sources of road names and geometry. I've also found that the commercial "driveway mappers" have often taken the time to improve roads nearby where the original TIGER was wonky.
Welcome to the new forums, @MikeN !
As a compromise, I think it is feasible to do MapRoulette challenges for smaller areas. I discovered an interesting dividing line at the Navajo Nation boundary where inside NN there are many old, untouched TIGER residential roads, and outside NN (at least on the Utah side) almost none. I don't know if this is just selective mapper activity or inconsistencies in the TIGER data coverage, or something else. But I made a MapRoulette challenge to encourage people to help with building a better road network for the Navajo Nation: Martijn van Exel: "Mapping Inequality The map below shows SE Utah w…" - En OSM Town | Mapstodon for OpenStreetMap
This is a neat little snippet of OT code, thanks Martijn. I've entered something similar for my county (in California) linked in our (county-level) wiki; it produces a bit "richer / deeper" a set of data (both nodes and ways).
I really, really miss the wonderful, deprecated (summer of '19?) ITO World "TIGER Cleanup" (I think it was called) renderer. I used this to clean (and clean, and clean, and clean…) my county until I got to something like 75% or 80% "done" (I might give my efforts a solid B-?!) and then that particular renderer quit. So sad.
I've looked for other "prettified" helpers / renderers to aid in TIGER cleanup, as in many cases, automation is quite ad hoc, specific to a county, state, aboriginal_land (again, Martijn, thanks for the tip about Navajo Nation, I'll go take a look). Alas, there aren't any renderers that suit my fancy, so what little work I now do (in my county, really) to improve TIGER is from my OT query. Somehow, because it isn't as pretty as that ITO World version, I clean up TIGER less than I used to. I think it was the color-scheme (red, orange, light-blue, dark-blue, I think) and rather clever reasons (including "3 year aging since last edited") that made it truly useful. I know if we got a replication or close to it, I (for one) would slash away towards 90%, then towards 100% (again, in my county, where I concentrate my mapping efforts, especially for TIGER fixup).
I do recall one august volunteer in this project (I have a lot of offline email conversations with him) calling TIGER, in many cases, "not much better than an hallucination."
Anyway, I'm dedicated to improving TIGER data, locally, more widely (statewide, and indeed, there is a lot to be said for state-by-state "divide and conquer," as we've done a decent job of whacking rail data from TIGER down, though there's still tens of thousands of rail miles to go, and these aren't getting easier to quantify). I'd love to know that better tools are available. OT queries are good, but they're wonky and largely used by the more geek-inclined (no offense to geeks, I actually proudly have the word on a license plate of a car of mine).
Yes, it might be the 2040s before we clean it all up. My sleeves are rolled up, and have been for a while.
Richard, I don't know if you've been to West Virginia or know much about it, but it's an outlier among states in some interesting ways. For one example, it seems to be deliberately "radio signal quiet," I believe part of that was or is for a radiotelescope near there that needs to attenuate interference, improving its signal-to-noise ratio. A number of things seem to "fall off the map" when you enter West Virginia, it's hard to explain. I'm sure there are reasons for such things, they seem beyond me. Maybe there's an article written about why.
cycle.travel can be used in some areas. It shows unfixed TIGER residentials in rural areas as a faint grey dashed line, like this:
But it won't be 100% reliable for this purpose - in many areas it has additional heuristics to guess what might be a usable road, and it updates roughly once a month so it's not ideal for real-time fixing.
While I've perused cycle.travel on my little county before (I did develop and propose to the transportation commission the "CycleNet" bicycle local bike route numbering protocol), thanks to your "unfixed TIGER residential = faint grey dashed lines," it visually now makes much more sense! I'm not sure how you determine / calculate "rural areas," but my eyeballs are quickly getting retrained as they parse your semiotics. Thanks!
I think this is a reference to the National Radio Quiet Zone, which also extends into a good chunk of Virginia. It is indeed an area where you're guaranteed to lose cell reception, but the data quality issues in West Virginia aren't limited to this zone by any means. I've cleaned up many roads that corresponded to old mining roads or roads predating mountaintop removal. Even the many roads that legitimately exist have poor geometry because most roads follow winding rivers in dense woodlands within narrow hollows - tough for both GPS reception and aerial survey.
TIGER's data quality issues are generally endemic to specific counties, but for West Virginia they seem to be pretty consistent statewide. I wonder if this is because West Virginia maintains the entire public road network outside incorporated cities, in contrast to most states that rely more heavily on local highway departments, which are typically responsible for sending road network data to TIGER.
Yes, Minh: the Green Bank (West Virginia) radiotelescope et al. Thanks for your link, thanks for your mapping of the "quiet zone" polygons.
Making an explicit reply here because Zeke's excellent suggestion got a like from me and resonates with my experience with that (no longer functional) ITO World render I mentioned. Goin' through a bit of "already" ground here (for over five years of history) at TIGER Edited Map - OpenStreetMap Wiki. The topmost render is what I'm talking about. That "overview" or "heat map" (something about those red-and-orange turning to sky-blue, then darker-blue as we're done) clinched it as "visually parsable semiotics which make a lot of sense to my mind," driving forward TIGER cleanup.
It worked, is what I'm saying. Replication (or something close) WOULD rekindle that fire, fairly easily, I speak for myself.
I can speak generally about some local governments I consult for. The Census Bureau maintains relationships with GIS managers in local administrative jurisdictions (call them Counties in Maryland). They routinely exchange data, not just centerline but also boundary annexations, things like that.
However, not all jurisdictions participate. Some smaller ones do not have the resources to pass quality data back and forth with the Census. Some really small ones may just have a single GIS person and all of their CL could be in an incompatible format. Lots of these tiny jurisdictions around.