by Schuyler Erle, Rich Gibson and Jo Walsh
April 19th, 2007 by Jo
Despite understanding no more than one word in 12, I greatly enjoyed FOSSGIS.de 2007, a long month ago now. The “Freie Geodaten” movement here in Germany is the most active and developed I have seen. I spent an enjoyable afternoon talking with Jochen Topf about everything under the sun related to the OpenStreetmap project.
I was delighted last week to read the white paper Towards a New Data Model for OpenStreetmap (PDF) that Jochen produced with Frederik Ramm; I’d urge everyone who has contributed to the project in the past, or cares about its future, to print it out, read it on the bus, and send back notes or patches.
The current OSM data model is “topological”, and quite unlike standard GIS data backends which make use of geometry primitives. “Nodes” represent points, segments joining two nodes are simple lines, and “ways” are multi-segment, complex lines, sometimes used as shapes. These are the basic units of the current model; they are then annotated with open attributes, pairs of keys and values, both of which are free-text, like tags. (“highway=primary” or “name=Oxford Street”… My all-time favourite OSM tag is still “horse=yes”.)
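To make that concrete, here is a minimal sketch of how I picture the current model. The class names, field names, ids and coordinates are my own illustrative assumptions, not the actual OSM schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)  # free-text keys and values


@dataclass
class Segment:
    id: int
    from_node: int  # id of the node the segment starts at
    to_node: int    # id of the node the segment ends at
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    id: int
    segments: List[int]  # ordered segment ids
    tags: Dict[str, str] = field(default_factory=dict)


# Made-up ids and coordinates, purely for illustration.
nodes = {
    1: Node(1, 51.5154, -0.1420),
    2: Node(2, 51.5155, -0.1400),
    3: Node(3, 51.5157, -0.1380),
}
segments = {10: Segment(10, 1, 2), 11: Segment(11, 2, 3)}
oxford_street = Way(100, [10, 11], {"highway": "primary", "name": "Oxford Street"})


def way_node_ids(way: Way, segments: Dict[int, Segment]) -> List[int]:
    """Return the node ids a way passes through.

    Assumes the way's segments are listed in order and join end to end.
    """
    ids: List[int] = []
    for segment_id in way.segments:
        segment = segments[segment_id]
        if not ids:
            ids.append(segment.from_node)
        ids.append(segment.to_node)
    return ids
```

Note that everything descriptive lives in the `tags` dictionaries; nothing in the structure itself says what a “highway” is or which values are allowed.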
This New Data Model moves away from the primary focus on spatial things, instead outlining “abstract objects” which can have spatial attributes - they can be connected to one or more geometries, and have additional sets of properties. The scheme - an abstract object with a UUID or URL identifier, annotated with properties which come from defined namespaces and are to some degree “controlled” by the implementor - looks much more RDF-like to my eyes, and thus appeals to me.
Why more abstraction in a data model? Why not just follow best practice in Geographic Information Systems, buy a copy of ISO19109 and implement an echo of it? In many fields we rely too much on what we have inherited from mind-generations of specialists. The original OSM model was a kind of “naive GIS”, a rough consensus on “the simplest thing that could possibly work”. As the project and the data in it mature, it fails to communicate clearly even simple edge cases. (The central example is a footpath and a major road both running over a bridge; both have a section which runs over a bridge, but there’s no way of stating that both “share” “the same” bridge.)
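Here is a rough sketch of how the bridge case might look with abstract objects. This is my own reading, not the paper's schema: the `osm:` property prefix, the `runs_over` relation name and the `related` field are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Geometry:
    kind: str  # "point", "line" or "area"
    coords: List[Tuple[float, float]]


@dataclass
class AbstractObject:
    # Keys carry a namespace prefix (hypothetical "osm:" here).
    properties: Dict[str, str] = field(default_factory=dict)
    # An object may have zero, one or several geometries.
    geometries: List[Geometry] = field(default_factory=list)
    # Relation name -> ids of other abstract objects (hypothetical field).
    related: Dict[str, List[str]] = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


# The bridge exists once, as its own object with its own geometry.
bridge = AbstractObject(
    properties={"osm:type": "bridge"},
    geometries=[Geometry("line", [(0.0, 0.0), (0.0, 1.0)])],
)
road = AbstractObject(properties={"osm:highway": "primary"})
footpath = AbstractObject(properties={"osm:highway": "footway"})

# Both the road and the footpath point at the same bridge by identifier.
for obj in (road, footpath):
    obj.related.setdefault("runs_over", []).append(bridge.id)

shares_bridge = road.related["runs_over"] == footpath.related["runs_over"]
```

The point is only that “the same bridge” becomes a statement about identifiers rather than something to be inferred from overlapping coordinates.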
For more detail, I urge you again to read the paper. The notes that follow comment and expand upon it, and may make varying amounts of sense to one who has not read it.
- Audit (which I think of as an umbrella term for change tracking, logging, versioning and reversion). I very much liked the reasons given in the introductory “why to audit” section, glimpses of wiki nature for structured data:
  - Attaching granular changes to people
  - Being able to track changes to objects one has changed oneself
  - Being able to comment on changesets (like subversion commit comments) but also being able to comment on changesets distributed over time but connected by a theme or process (“trying to get the model right for X”)
  - Being able to do rollback on arbitrary changesets
BUT, what’s missing in the New Model is any reflection of this in the API. The sketch of a new RESTful API is, well, sketchy, perhaps teasing the reader to complete it. Wanting to know more, do I have to suggest more? All I can provide is references. Rufus has been working on an interesting and functional prototype for versioned data models for the Open Knowledge Foundation’s Comprehensive Knowledge Archive project. Contributors to the Geoserver project have been experimenting with a versioning extension to the Web Feature Service - Transactional protocol for publishing vector geodata. There’s good ground for rough consensus here, and both projects have running code.
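For what it's worth, the audit wishlist above could be sketched roughly like this. It is a toy of my own, not the paper's API nor either of the prototypes just mentioned; `AuditedStore` and all its method names are hypothetical, and it assumes each change records an object's full new properties so that the current state can be rebuilt by replaying the log.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Changeset:
    id: int
    author: str
    comment: str
    # Object id -> full new properties, or None to delete the object.
    changes: Dict[str, Optional[dict]]
    theme: Optional[str] = None  # links changesets spread over time
    reverted: bool = False


class AuditedStore:
    """Toy store: the current state is a replay of non-reverted changesets."""

    def __init__(self) -> None:
        self.log: List[Changeset] = []

    def commit(self, author, comment, changes, theme=None) -> int:
        changeset = Changeset(len(self.log) + 1, author, comment, changes, theme)
        self.log.append(changeset)
        return changeset.id

    def rollback(self, changeset_id: int) -> None:
        # Mark the changeset as reverted; its entry stays in the log.
        for changeset in self.log:
            if changeset.id == changeset_id:
                changeset.reverted = True
                return
        raise KeyError(changeset_id)

    def state(self) -> Dict[str, dict]:
        objects: Dict[str, dict] = {}
        for changeset in self.log:
            if changeset.reverted:
                continue
            for object_id, properties in changeset.changes.items():
                if properties is None:
                    objects.pop(object_id, None)
                else:
                    objects[object_id] = properties
        return objects

    def by_author(self, author: str) -> List[int]:
        return [c.id for c in self.log if c.author == author]

    def by_theme(self, theme: str) -> List[int]:
        return [c.id for c in self.log if c.theme == theme]


store = AuditedStore()
first = store.commit("alice", "add bridge", {"bridge-1": {"type": "bridge"}})
second = store.commit("bob", "retag bridge", {"bridge-1": {"type": "viaduct"}},
                      theme="bridges")
third = store.commit("alice", "add path", {"path-1": {"highway": "footway"}},
                     theme="bridges")
store.rollback(second)  # undo a changeset from the middle of the log
```

Replaying the whole log on every read is obviously too slow for a real data store; the sketch is only meant to show that “rollback on arbitrary changesets” and “changesets connected by a theme” are cheap to express once changes are first-class records.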
- Read/write client support for a New Data Model. Another intriguing blank space; without a candidate API to test against, how can one tell how hard a backend migration might be? As one of those people who runs screaming from human-user-interface problems, I’d like to hear the perspective of a client implementor on how hard it would be to move from a geometry-centric, open-“tag” model to a more structured, abstract-object-centric, geometry-as-property model.
- Filtering the OSM data for different use cases (dynamic queries, or level of detail in map display). The goal state is to produce a subset of OSM data for a given extent - filtered, maybe generalised, according to the type of shape. The New Data Model document has a solid treatment of why this is needed, and why the open-key-value approach is making it progressively harder to do. Another topic that’s a core concern, yet the way in which the putative New Data Model deals with it is not spelled out. One way to filter is on the semantics of attributes of shapes - the simplest case, display a road or not according to how ‘major’ it is. One interesting consideration is what happens when one gets down to a very fine level of detail, as some wish to do; an example given is the location of a tube station. Zoomed out, it’s just a dot; zoomed right in, one wants a point to represent each individual exit at an intersection.
But then what happens to the nearby geometries? The roads that meet at Oxford Circus are lines, whose width is a side-effect of how they are being displayed. At a “MasterMap”, 1-1 correspondence level of detail, those lines would be polygons, areas with dimension. The New Model could attach several different kinds of geometries to one abstract feature, corresponding to different levels of detail in display.
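A small sketch of that last idea, one feature carrying several geometries and the display picking among them by zoom level. The zoom thresholds, coordinates and function names are invented for illustration; the paper does not spell out a mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Coords = List[Tuple[float, float]]


@dataclass
class Feature:
    name: str
    # Minimum zoom level -> (geometry kind, coordinates). All values made up.
    geometries: Dict[int, Tuple[str, Coords]] = field(default_factory=dict)


def geometry_for_zoom(feature: Feature, zoom: int) -> Optional[Tuple[str, Coords]]:
    """Pick the most detailed geometry whose minimum zoom has been reached."""
    usable = [level for level in feature.geometries if level <= zoom]
    if not usable:
        return None  # feature is filtered out at this zoom
    return feature.geometries[max(usable)]


def render_list(features: List[Feature], zoom: int) -> List[Tuple[str, str]]:
    """Return (name, geometry kind) for every feature visible at this zoom."""
    shown = []
    for feature in features:
        geometry = geometry_for_zoom(feature, zoom)
        if geometry is not None:
            shown.append((feature.name, geometry[0]))
    return shown


features = [
    Feature("Oxford Circus station", {
        10: ("point", [(0.0, 0.0)]),
        17: ("multipoint", [(0.0, 0.1), (0.1, 0.0), (0.0, -0.1), (-0.1, 0.0)]),
    }),
    Feature("Oxford Street", {
        8: ("line", [(-1.0, 0.0), (1.0, 0.0)]),
        18: ("area", [(-1.0, -0.05), (1.0, -0.05), (1.0, 0.05), (-1.0, 0.05)]),
    }),
    Feature("side street", {15: ("line", [(0.5, 0.0), (0.5, 1.0)])}),
]

zoomed_out = render_list(features, 12)
zoomed_in = render_list(features, 18)
```

Here the “how major is it” filter is folded into the lowest zoom level at which a feature has any geometry at all, which is one design choice among several; a separate importance property would work as well.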
This leads into several side-tracks. The little I know of ISO19109 (GIS Application Schema) is that it differentiates between “meta level, application level and data level” descriptions of things in space. Application level in this context is cartographic display - selection, arrangement of things on a map. A problem with the tagspace approach is it doesn’t differentiate between spatial semantics, and cartographic semantics (This section of street is physically 10 metres wide; this section of street is “primary”, therefore is displayed as 10px wide at this resolution). Perhaps a difference between “application level” and “data level” isn’t so easy or clear to define.
This has some odd implications for geospatial data licensing. Pace the EDINA paper which had some fun commentary at OKFN and which the Guardian covered recently; there’s a big difference between data which is there to be collected, and data which is the result of creative effort in compilation, in licensing terms.
“If the “collection” element is removed as a criterion for originality and thus database copyright, focus must be on selection and arrangement of the materials in the database - the qualitative or expressive part of the test”
To state, “this section of street is 10 metres wide” is a fact collected from the world, which anyone can collect from the world. To state, “this section of street is primary” - is that a qualitative or expressive description which others may or may not agree with? To design a data model in an attempt to protect oneself from property-oriented “data rights” law - is this a useful, or just a quixotic, consideration?
End of digression, back to the notes.
- The question of distributed revisions and distributed reversion. This is something I’m looking for, more than is spelled out in the New Data Model. It connects to “identity” of contributors, and federating identity between different read/write data access points. Imagine a parallel universe in which read/write - e.g. transactional - clients were writing back to different physical instances of a data store, and all the changes were then collected into one central “view” of the world depicted. Suppose this parallel universe also contains a satisfactory way to conflate changes - e.g. one original geometry is changed in 2 different ways by 2 different people and subsequently resolved back into one changed object. (I recall seeing a good presentation on doing this at OSGeo ’05 in Minneapolis…) What happens when one then wants to do distributed reversion? E.g., to decide later that one set of changes to a shape was “unreliable” but the other wasn’t. Would one have to roll back the geometry to its original state, then re-apply one set of changes but not the other, creating a revised version?
Perhaps this is the view from Mars - I love to speculate about solutions to problems we don’t yet have, but can only predict. Yet this looks obvious to me - a problem that we can’t avoid having. The conviction that, for technological, social and legal reasons alike, OSM will one day have to federate its data store, goes back with me a long way. Frederik and Jochen don’t think so - the paper states that a central data cluster, one ring to rule them all, is the only thing viable now. Perhaps to suggest otherwise is to unwisely invite contention.
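To show what I mean by “roll back, then re-apply one set of changes but not the other”, here is a deliberately naive sketch. It assumes each contributor's edit is kept as a per-vertex change against the original geometry, and that conflicting edits are resolved by applying sources in sorted order; the source names and the `conflate` function are hypothetical, and real conflation is much harder than this.

```python
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, float]
Edit = Dict[int, Point]  # vertex index -> new position


def conflate(original: List[Point],
             edits_by_source: Dict[str, Edit],
             trusted: Optional[Set[str]] = None) -> List[Point]:
    """Rebuild a geometry from the original plus the selected edits.

    With trusted=None every source is applied; otherwise only edits from
    trusted sources are re-applied on top of the original geometry.
    """
    sources = sorted(edits_by_source)
    if trusted is not None:
        sources = [source for source in sources if source in trusted]
    geometry = list(original)
    for source in sources:
        for index, point in edits_by_source[source].items():
            geometry[index] = point
    return geometry


original = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

# Two people, writing to two different data stores, move different vertices.
edits_by_source = {
    "store-a/alice": {0: (0.0, 0.1)},
    "store-b/bob": {2: (2.0, -0.3)},
}

merged = conflate(original, edits_by_source)

# Later, bob's changes are judged "unreliable": start again from the
# original and re-apply only alice's.
revised = conflate(original, edits_by_source, trusted={"store-a/alice"})
```

This only works because the original geometry and each source's edits were kept apart; once they are flattened into a single “current” shape, there is nothing left to selectively revert.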
- The question of 2 types of nodes. In the current model, nodes (and the segment arcs that join them) denote physical features, bends or joins in lines - the topology of physical things. Nodes can also be useful to denote abstractions - things that are not physical properties of streets, but more like legal or behavioural properties of them (e.g. “the span of this street between node A and node B has a speed limit of 20mph”). How would this work in the New Data Model? Would a geometry be created that was attached to an abstract object (the street) indicating the stretch with a different speed limit? Would we then rely on geometrical queries to “identify” that span with a distinct geometry that describes the street?
- The question of apparent dimensionality. One of my favourite phrases in the whole paper was this, in one of the highlighted “requests for comments” sections:
Discuss whether it may be necessary to allow using edges of areas as linear features, or faces of 3D objects as areas?
Are tuples expressive enough? Do the 3D modelling capacities currently available in GIS or CAD data stores address this question?
Enough idle speculation. What this boils down to is, as ever: each set of newcomers to spatial information modelling sees old questions in new ways; I’m not convinced that the state of the art in GIS databases has appropriate answers. The OSM community, as ever, creates new cart-tracks across well-paved spaces. The debate is too heated for any but the really committed to follow, the tracks become effaced in debate, but perhaps they’re leading somewhere new. Or as the New Data Model paper puts it,
Complexity does not mean that it has to be more complicated.
Posted in collaborative mapping, openstreetmap