Simplifying lake with 200,000 nodes

I guess, from using the scroll wheel in JOSM and comparing tile URLs, that zoom 24 is when the ruler shows 50 cm. If that ever makes it to the standard view, it will be the holy grail, as it will allow for sub-pixel precision when mapping from aerials :slight_smile: (Screenshot from a place where the outline of that lake in Canada does not match any aerial available to me. There are water waves close by that make even the current level of detail look unreal.)

Zoom24

I hope that not too much effort is spent on making OSM look nice and detailed at this level before the data model has been changed to one that can accommodate such a level of detail, i.e. one where nodes are no longer first-class citizens.

Make sure there are no CSS map styles active; best use the wireframe display, as recommended already. JOSM can deal with it: panning works with little lag, and adding an inner for one of the missing islands works as usual, even on my i3 with 16 GB RAM total and Intel on-board graphics.

3 Likes

Toggling Wireframe View (I smack my forehead at not doing this first) keeps JOSM alive — thank you!

2 Likes

I fully agree; in my post I was referring to the frequent mention of the sweat-of-the-brow doctrine in this topic. BTW, this plays a role in grounding copyright, and we still do not know whether the data of said lake is infringing, so how it was created may play a role after all.

You are welcome :slight_smile: I just tried to load the lake relation into the iD editor and experienced the problems you described. I don't know a workaround; certainly someone will tell me how to edit that - otherwise, such a level of detail might be a good way to keep iD users out of the game :stuck_out_tongue_closed_eyes:

1 Like

Apologies for not participating the last couple of days; I’ve just returned from a busy weekend.

@stevea I’m glad you have downloaded the relation to see for yourself, and that we’re on the same page now. (Strangely, I’m not experiencing exactly the same performance issues as you. Investigating that further and perhaps sending bug reports to JOSM would be interesting and valuable but should probably be a separate conversation.)

Nevertheless, I hope it’s clear that this is causing performance issues and something needs to be done about it. I would also like to include the nearby lakes which are also over-noded but not causing visible issues at the moment because they are smaller and not part of a boundary.

In my opinion, using the Simplify Way tool with a setting of 1.5 m is appropriate here. This maintains the shoreline to a sufficient level of detail while also yielding a significant performance improvement. With all due respect, many of the people who suggested using the 0.5 m setting had not actually looked at this lake in particular and were speaking about simplification in general. A setting that conservative would preserve plenty of nodes that describe the way precisely but do not describe the lake shore precisely.

By the way, “1.5 m” refers to the “Maximum Error” setting of the Simplify Way tool. It means that after a node is deleted, the new simplified way will pass no more than 1.5 m from the deleted point. In many places, nodes were already much farther from the coastline than that. Also, the average distance between nodes is currently about 12 m; after running Simplify Way with Maximum Error = 1.5 m, which deletes about half the nodes, they are about twice as far apart (roughly 25 m).
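
To make the “Maximum Error” idea concrete, here is a minimal Python sketch of a Douglas–Peucker-style simplification driven by such a tolerance. It is only an illustration of the concept, not JOSM’s actual implementation, and it assumes planar coordinates in metres (a local projection):

from math import hypot

def point_segment_distance(p, a, b):
    # Perpendicular distance from point p to the segment a-b (planar approximation).
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, max_error=1.5):
    # Keep a node only if dropping it would move the way more than max_error
    # metres away from it (recursive Douglas-Peucker; a real run on 200,000
    # nodes would want an iterative version to avoid deep recursion).
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    worst = max(range(len(dists)), key=dists.__getitem__)
    if dists[worst] <= max_error:
        return [points[0], points[-1]]
    left = simplify(points[:worst + 2], max_error)
    right = simplify(points[worst + 1:], max_error)
    return left[:-1] + right

For example, simplify([(0, 0), (5, 0.4), (10, 0)]) drops the middle node, because the simplified way passes only 0.4 m from it, well inside the 1.5 m tolerance.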

3 Likes

My performance issues are quite mitigated by toggling Wireframe View and beefing up my Java heap.

Many editors (including myself) do not have access to 64 GB of RAM. And most editors (including myself, and yourself before @amapanda_ᚐᚋᚐᚅᚇᚐ’s helpful comment) do not know about Wireframe View. These are not solutions; they are workarounds.

Meanwhile, de-noding the over-noded ways is a solution that improves performance for everyone, including editors of the lakes and boundaries in question and also data consumers like tile servers that are not working on these features specifically but would still benefit from the smaller feature size.

1 Like

I agree that “large RAM” machines are not the solution, but those of us who have them can act as a bridge to examining the data (without severe pain) while we decide what we might do about them. And I did know about Wireframe View; I had head-slappingly-stupidly forgotten it between the time I simply opened/loaded the relation to clock that simple performance test (and thanked @amapanda_ᚐᚋᚐᚅᚇᚐ for the reminder about it) and the time I opened/loaded the relation and actually tried to edit it today.

There is a lot of talk here about copyright and the origin of the data, which confuses me, as I don’t see how that is relevant: the big (pig) data are in OSM already.

And as @Graptemys (and others, like @ezekielf) have mentioned, big (pig) data are problematic. I’d be OK with not only accepting a Maximum Error setting (some, like Zeke and @PierZen, say 0.5 meter; others seem to be OK with 1.5 meter), but “implementing it,” too. There is 1) saying it’s correct to do this (still undetermined), 2) choosing a community-acceptable Maximum Error setting (still undetermined), and 3) performing the simplification (it still isn’t determined that we should, but as I have the hardware and know the subject rather well now, I’ll volunteer to do so if the community flashes some green lights to go ahead).

I’m “receding” to (largely speaking) what I call “listening mode” here.

1 Like

I am also using a Mac (with an M1 CPU) and don’t have the issues you are seeing. When I download the relation and all its members, I don’t notice performance degradation and can still zoom and pan smoothly, and I “only” assigned 4 GB of RAM (not specifically for this relation; I changed it from 2 GB some time ago because in rare situations 2 GB proved to be too little, but here the extra headroom doesn’t seem to be used, if I interpret it correctly).

Maybe it is an issue with the Java you are using (mine is OpenJDK 64-bit v17)? Or is your screen resolution higher (4K)? For reference, these are my settings:

Relative:URL: ^/trunk
Repository:UUID: 0c6e7542-c601-0410-84e7-c038aed88b3b
Last:Changed Date: 2023-03-23 18:00:59 +0100 (Thu, 23 Mar 2023)
Revision:18699
Build-Date:2023-03-24 02:30:58
URL:https://josm.openstreetmap.de/svn/trunk

Identification: JOSM/1.5 (18699 en) Mac OS X 12.6.3
OS Build number: macOS 12.6.3 (21G419)
Memory Usage: 1080 MB / 4096 MB (820 MB allocated, but free)
Java version: 17.0.6+0, Homebrew, OpenJDK 64-Bit Server VM
Look and Feel: javax.swing.plaf.metal.MetalLookAndFeel
Screen: Display 1 1920×1080 (scaling 1.00×1.00) Display 2 1920×1080 (scaling 1.00×1.00)
Maximum Screen Size: 1920×1080

Rather than having a distance-based tolerance, in many situations it would be nice to be able to set a maximum threshold for angles. In particular with sharp bends, using a distance-based method will lead to oversimplification, and also in “straight” stretches you would want to keep details such as short bends around obstacles, which can serve for orientation (if they are actual details and not just noise resulting from someone drawing clumsily overnoded “straight” lines without actually caring for these details, which unfortunately is impossible to determine automatically).
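
As a rough illustration of that idea (a hypothetical sketch, not an existing JOSM tool), a purely angle-based pass would keep every node where the line actually changes direction, no matter how close together the nodes are, and would only drop nodes on nearly straight runs. Planar coordinates in metres are assumed:

from math import atan2, degrees

def bend_angle(a, b, c):
    # Absolute change of heading at node b, in degrees (0 = perfectly straight).
    h1 = atan2(b[1] - a[1], b[0] - a[0])
    h2 = atan2(c[1] - b[1], c[0] - b[0])
    d = abs(degrees(h2 - h1)) % 360.0
    return min(d, 360.0 - d)

def simplify_by_angle(points, min_angle=5.0):
    # Keep the endpoints and every node whose bend (relative to the last kept
    # node) is at least min_angle degrees; sharp corners always survive.
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for cur, nxt in zip(points[1:-1], points[2:]):
        if bend_angle(kept[-1], cur, nxt) >= min_angle:
            kept.append(cur)
    kept.append(points[-1])
    return kept

Such a filter would likely still need a distance component, since noisy, wiggly “straight” lines also produce angle changes, which is exactly the trade-off described above.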

In a situation like the lake, it is very easy to push a button and simplify with some low threshold, and on your screen it seems nothing has changed besides the reduced node count, but it is impossible to review every part of the thing, and I’d rather tolerate some overnoding than accept that actual, painstakingly added, punctual detail gets lost.

1 Like

No thanks, I can do it.

I have been downloading just the lake too during this discussion, but a real workflow would look more like “Download from Overpass API” > “admin_level=6 or admin_level=7 in Nord-du-Québec”, followed by much panning, zooming, selecting, editing, etc. You’ll see that panning and zooming in this case is a little more trouble than when just the lake is loaded. Edit > Purge the lake (Ctrl+Shift+P) and panning/zooming improves. (The northern coastline might still cause issues since it has a lot of nodes, but it is not really over-noded in the same way. And due to a quirk of Canadian law, the coastline actually is the provincial boundary.)
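
For anyone who wants to reproduce that first download step outside of JOSM’s dialog, something like the following Python sketch against the Overpass API should be roughly equivalent. The area name and tag filters here are assumptions for illustration, not a tested query:

import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"
QUERY = """
[out:xml][timeout:180];
area["boundary"="administrative"]["name"="Nord-du-Québec"]->.region;
relation["boundary"="administrative"]["admin_level"~"^[67]$"](area.region);
(._;>;);
out meta;
"""

# Fetch the admin_level=6/7 boundary relations inside the region and save them
# as an .osm file that can then be opened in JOSM as a data layer.
response = requests.post(OVERPASS_URL, data={"data": QUERY})
response.raise_for_status()
with open("nord_du_quebec_admin.osm", "wb") as f:
    f.write(response.content)

Using "out meta" keeps version and changeset information, so the result can be edited and uploaded from JOSM rather than only viewed.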

Anyway, I hope you understand that even if you don’t see performance issues on your machine for one specific case, other workflows and other machines can still encounter them.

This is a great suggestion. There actually is a plugin called SimplifyArea that can do this, and it offers fine-tuning of other parameters. I tried it and unfortunately didn’t think it was any better than the Simplify Way tool: I found it orders of magnitude slower and harder to use, and it did not produce better results. I encourage you to try it for yourself, though.

1 Like

I think you will understand that it will always be possible to reach limits when downloading lots of data (on desktop machines), and it would be crazy to start removing data just to enable big-scale edits without purging the cache. From my point of view, it should not be necessary to download this huge lake and all its parts in order to perform edits in the area. If you still want to do it, it is possible, but maybe not on every machine, and it will also always be possible to come up with a smaller machine which will have problems with a smaller lake.

I think boundary editing is one of the few exceptions where you might have to do edits in bigger areas, and the solution for these is to do the conflation with dedicated tools or to use filters (usually boundaries come from external sources because we cannot survey them; sometimes it may be possible to survey punctual information such as signs or markings, and for adding those observations you typically only have to download a very small area). OSM is a local activity: people add what they see in their surroundings. You don’t need to download the whole lake; if you have been there, it is usually a specific spot at the lake, not “the lake” as a whole, or all of its 2,500 km circumference.

2 Likes

I guess that this is similar to so-called sparse editing?

2 Likes

There are people (apparently, @Graptemys) who edit on a very large scale, “bump into” very large data like these, and find them to be a “beast” for their machine, causing difficulty. That much, I think, we can all agree happened. Whether simplification is a solution, or the best solution, is where we don’t agree.

While it’s true for most OSM human editors (I keep saying “human” before “editor” to distinguish from a “software” editor like JOSM or iD), not everybody who edits OSM is a “local activity” (human) editor. SOMEbody had to enter these data, and others will likely need to maintain them as they “bump into” them (e.g. splitting a way so it can become part of another boundary).

I suppose what this comes down to is “when is big TOO big?” and “when is ‘rough’ (or ragged, or coarse…) TOO rough?” As I’ve said, OSM doesn’t have (strictly defined, strictly enforced) data quality standards. Though this thread seems to offer significant voices asking the question, “Well, should we?”

I might imagine (as they already exist) good data practices like “no more than 1800 (1801? 1900? 2000?) nodes in a way”, as it may be that somebody a while ago found good numbers like that to use so as to minimize our software choking on data bigger than those (in “typical” hardware environments). But as to precision, or “hugging a coastline within so many centimeters” or “with no more than 0.5 meters (or 1, or 2.5, or 3.0…) of distance between nodes…”, OSM has no such rules, at least none that are hard-and-fast or enforceable. But that also means somebody could clean them up (reduce nodes, as @Graptemys originally suggested back in the initial post), as long as “not too much precision is thrown away,” without really offending OSM. But look: what happened is this thread, a solicitation to our community of “what should we do here?” I don’t think the suggestion that we simplify existing data is outrageous; in fact, it is quite reasonable in my opinion. What I have found more fascinating than the technical results (unfortunately, so far wholly undetermined) is the seeming difficulty the community has in concluding what we might do about such “beasts.” (The word “beast” has been used before in other Discourse threads about such data, as they exist in many places in our map.)

So: it seems understood that “if you are going to ride the ride, you really should be ‘at least this’ tall” (lots of RAM helps, even knowing not everybody has gobs of RAM). I think this is what @dieterdreist means by “dedicated tools.” But what we don’t seem to have addressed is “what if I find one of these and it is seriously over-noded in my opinion, can I ‘clean it up’ (using something like JOSM’s Simplify Way)? Must I consult with the community first?” I realize this is always a good idea, but look where it got @Graptemys : a very long thread, and not a lot of consensus.

Must I consult with the community first?” I realize this is always a good idea, but look where it got @Graptemys : a very long thread, and not a lot of consensus.

without consensus one better refrains, no?

2 Likes

When reading this thread a beginner question occurred to me (I’ve never had to edit such “beasts” before).

Why aren’t huge lakes like this represented as multiple square “tiles” like this wood:

Or would this not help, as there would still need to be a relation for the lake as a whole and that relation is the problem?

1 Like

My first reply to that would be an off-the-cuff answer that “CanVec imported data are very, VERY weird.”

Briefly (and this is my opinion, not any sort of “standard OSM answer” that everybody will give, though I think there are many in OSM who agree that crufty CanVec imports are “big, kind of ugly” data), the methods by which gigantic woods and gigantic lakes were carved out of CanVec data to turn into what OSM has spottily imported across a great deal of Canada… um, how can I say this politely?… um, they are “not pretty.” Neither the results, nor the actual data when you go in to edit them (as a human editor) and try to “clean them up.”

I’m not picking on Canada: there are a lot of countries (including mine, the USA, and California in particular, where I am) where “we have messy imports, too.” But I’d like to stick to the topic of “big, ugly, hard-to-edit (for many users) data” and what we should do about it.

I appreciate that beginners and beasts don’t often mix, and that’s likely a good thing, so I’m thanking you for your question. At the same time, I’m also saying that the CanVec-derived data you refer to isn’t anything shiny or beautiful, either. In my opinion. What’s “real here” is that “big data” are hard. For OSM, for everybody. I think the upshot is that we can attempt to “box ourselves in” (to better data and better-looking data) with some sane guidelines and maybe even (data precision) rules, but we are only at the beginning of that (here).

After reading through all of this topic, I can say for myself that I would not hesitate to use “Simplify Way”.

Of course I don’t edit at such a big scale as @Graptemys does, but if I found a (smaller) lake in my area mapped with very high detail even though the detail does not really match reality, I would simplify it too, even more so if I knew that the data was imported. Afterwards I would of course try to better align the remaining nodes with the banks visible on aerial imagery, but that is a second step.

Regarding Lake Mistassini, I doubt that anyone has bothered to map the shore in detail corresponding to reality anywhere, except perhaps near residential areas. So I would maybe check the mapped shoreline near those residential areas, but the rest I would just simplify without too much concern.

3 Likes

Yes, that’s pretty much it. The lake as a whole is one entity and has one name, so it needs to be one item in OSM (even if it is made up of smaller items).

With route relations there is the concept of a “superroute” - a relation of relations that make up smaller parts of the route. We don’t have the concept of “supermultipolygons”, but to be honest if you have to check the geometry of the thing as a whole you’d still have to download them all anyway.

5 Likes

After silently watching the discussion, I think there are several topics to discuss and document.

  • How to edit “beasts” is a topic of its own. Using wireframe mode in JOSM and only downloading sparse data are useful hints. Additionally, I can say that as long as you download all parents of the nodes and do not change the first and last node of the way, you usually cannot break relations and can work on a single way.
  • How to treat imported data is another topic. Here we need to look at the quality of the import and whether “humans” have improved the data since the import.
    • Regarding shorelines, a single aerial image is not a good enough source, as the waterline of a lake changes and you often have to guess the high-water line.
  • How to simplify data? For straight lines this works with tools, but there is no tool that nicely preserves the shape of curves/splines.

Sorry, I have no real answer besides carefully improving each way with Improve Way Accuracy mode, which is a lot of work.

3 Likes