Mapping Hacks

by Schuyler Erle, Rich Gibson and Jo Walsh


Have a nice metadata

September 18th, 2006 by Jo

A month or two ago I was dropped into the middle of a rather intense discussion about the development of simple catalog interfaces and models for geospatial metadata exchange. The conversation heated up on the OSGeo geodata committee mailing list, and most of it flew right over my head . o O (“CSW ebRIM”? - that just sounds painful).

But the conversation about geodata discovery and machine re-use felt like a good fit with the metadata model + engine I’ve been hacking on for the last few months to help manage OSGeo / telascience’s putative public access geodata repository. I realised that the same core model I use for internal data management could be re-usable and useful for indexing web services and the data behind them.

So just about the highest point of FOSS4G 2006 for me was getting to meet a lot of the people in the metadata conversation in the flesh; to seriously see the light in their eyes. Each time I talked to a new person, the outline of the talk I planned to give at the end of the week mutated in my mind. The rest of this entry is an attempt to write up the narrative of the talk I eventually gave, as the slides are quite minimal and gestural. No conclusions, but a lot of food for thought and a good feeling of common direction.

Last week I talked to Jody Garnett as much as I could get the chance to. Jody said to me, “Jo; metadata is useless.” Without a focus on re-use, a focus on the client-side application interfaces, communicating the ability to make sense of the data visually, to speed up the data analytics; metadata is useless.

Without good metadata, data is useless. To make data dance for others, I need to clearly know what a web service will offer; to know how I can legally recombine and republish data in different ways. It helps to know how others have classified, verified, contextualised, represented data.

Without reusable, intelligible data, software is useless; machines will fall short of their capacity for making humans’ lives easier; semiautomation that allows us to bootstrap communication, to generate seemingly-meaningful new conclusions, faster.

If there is no clear provision of a simple metadata model and exchange mechanism for geographic information, it’s tempting to me to look at this space and declare; “standards are useless.” Having a big problem almost drives the standards community away from pursuit of a simple solution.

I got involved in thinking about this problem as a byproduct of OSGeo’s repository work, attempting to establish and implement storage for a basic set of geodata metadata requirements. The ad-hoc data warehousing at telascience during Hurricane Katrina was a popular effort, but they experienced a lot of metadata management problems, not knowing what rights they had to redistribute the data they got. Thus Norm Vine’s new motto - “No Data Without Metadata!”. The Mumbai Free Map project has a different set of problems; they are collecting a lot of ground-up data which they know they have full rights in; but can’t start building metadata-driven reuse and rediscovery services until they have data in place - “No Metadata without Data!”

But we have to start somewhere to be able to go on, to break out of this loop, to build the spatial search applications that are glinting just round the corner. For me this is about the pursuit of the Simplest, Least Useless Thing. In this endeavour I’ve enjoyed being able to extrapolate from the hive-mind thinking of the OSGeo geodata committee. Because there’s been a heavy US/North American weighting to the initial participants, there’s been an orientation towards the Federal Geographic Data Committee metadata standards; the original effort was to produce a model of a humane subset of its required properties, augmented by some reference to OGC web services and capabilities which have come about since the FGDC standard was written. ISO19115/9 is what the global community is more oriented towards. There’s a lot of overlap with Dublin Core and some of the GeoRSS work.

I want to be able to take all the common ‘profiles’ and find the simplest, most coherent subset of metadata that’s as non-arguable as possible - the provenance of the data, its spatial and temporal coverage, layer names. Keywords if they’re provided, but without worrying about them too much or overemphasizing human-created flat descriptions initially in search or client applications. As Cory Doctorow once memorably wrote, “Metadata is Metacrap”; so let’s focus on what we can agree on, what we can easily and consistently share; a small core that works well enough to build on stably.
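To make that “small core” a bit more concrete, here is a minimal sketch in Python. It is not the OSGeo model itself; the class name `CoreRecord`, its field names, and the checks are all hypothetical, chosen only to mirror the properties listed above (provenance, spatial and temporal coverage, layer names, optional keywords). It also assumes bounding boxes are in longitude/latitude and do not cross the antimeridian.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class CoreRecord:
    """Hypothetical minimal metadata record; names are illustrative only."""
    title: str
    provenance: str                            # who produced or published the data
    bbox: Tuple[float, float, float, float]    # spatial coverage: (west, south, east, north)
    temporal: Tuple[date, Optional[date]]      # temporal coverage: (start, end or None if ongoing)
    layers: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)  # optional, never required

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the core checks pass."""
        found = []
        west, south, east, north = self.bbox
        if not self.provenance.strip():
            found.append("provenance is empty")
        # Assumes lon/lat degrees and no antimeridian crossing.
        if not (-180 <= west <= east <= 180):
            found.append("bbox longitudes out of range or reversed")
        if not (-90 <= south <= north <= 90):
            found.append("bbox latitudes out of range or reversed")
        start, end = self.temporal
        if end is not None and end < start:
            found.append("temporal coverage ends before it starts")
        if not self.layers:
            found.append("no layer names given")
        return found
```

Keywords are deliberately left out of the checks: a record with none is still considered complete, which matches the idea of not leaning on human-written descriptions early on.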

We can keep metadata from turning into metacrap by keeping it simple. We can make metadata less useless by constantly connecting it to usage, keeping it visible. Metadata’s not a mystery; it’s not all as hard as the OGC is making it out to be. Metadata is data about data use that makes software less useless. Human beings LIE all the time; lie to themselves, to others; lying works as a survival technique; perhaps full truth is impossible, and all statements are partially lies. And machines can’t know (yet) - they can’t handle the truth; they need some external verification, or else can only try to correlate and translate a myriad different worldviews. We can spend years building edge cases around our weaknesses; or we can look for our respective strengths and attempt to combine them.

In hacker ethic I am gung-ho, hubristic; I came into the metadata/discovery conversation from an angle of “implement first, think later”. Open Geospatial Consortium specifications are not a language of choice for me. It frustrates me that I can’t find a simple, one-page guideline to implementing a Web Services Catalog (CSW) interface. I search the web for Other Peoples’ Implementations and find myself staring at sentences like this:

Supports the Catalog Service for Web (CSW) profile of the OGC Catalog Service Specification version 2.0 based on the ebXML registry information model.

What does this mean? All I want it to mean is, “tells me about interesting data”. I look for specifications of what core data about data is most interesting to the specialists, and find myself staring at sentences like this:

The CSW information model is based on the international standard for metadata description ISO 19115:2003. In addition, the catalogue uses a metadata description for service metadata based on the draft international ISO 19119:2003 standard to facilitate the management of service metadata.

The main purpose of the information model is to provide a formal structure for the description of information resources that can be managed by a catalogue service that complies with the application profile.

What does this mean? This tells me that the cataloguing effort is driven by standards which aren’t oriented to the re-use, re-distribution, recombination of data; the main purpose of the information model should be to make it easier to build interesting apps that consume descriptions of information! Not to comply with archaic standards formed by a traditional methodology while what is being done with geodata on the web is still so much in flux. All I really want to know here is how I can safely and usefully index, reuse and redistribute the data I find.

I came to this discussion from a semantic web perspective; I see four verbs and an infinity of nouns. I see a myriad formats, and hear that it’s the serialisation, the data transfer depiction of the model that’s being argued over, that has been holding the standards-based cataloguing process back. I cast my mind back to the metadata/catalog BOF session: “Protocol is not the issue” / “Profile is not the issue”. How do we find the issue; trace out that shape of the simplest, least useless thing?

How do we agree to agree? By sketching out common ground - what’s shared, what’s missing - who makes statements, who republishes them; where things are. A simple common model that provides a subset of FGDC, ISO19115 and Dublin Core, that’s augmented with newer-fangled vocabularies like GeoRSS and SIOC. Something that may leave 100% of the initial developers and users dissatisfied, but also 100% agreeing with the core of what they require.

Standards are useless partly because they overfocus on production of metadata, not consumption. This is all about machine intelligibility, machine reusability; why else make geospatial metadata, other than to comply with obscure operational requirements? And if that has to be done; then let’s try to make it fun, or at least as little non-fun as possible (this is the talk I’d originally intended to give; playful IRC bots for annotating the geodata repository; email nag-bots that hustle your friends to nag you into improving metadata coverage; friendly coverage competitions). For the bits that can’t be fun, let’s try to automate as much as possible, and if something can’t be at least semi-automated, then think twice about including that in a metadata requirement spec. I’m still skeptical of any claim that “given enough eyeballs, good metadata will emerge” (I think of my favourite OpenStreetMap tags - “horse=’yes’” being a personal all-time classic; perhaps OSM metadata is lacking a feedback loop of iterative verification, autosuggestion…) but there’s definitely a lot of space for “architecture of participation” ideas in metadata management.

So I’m still listening for the core. I’m quite happy with the boiled-down model we have developed at OSGeo. I’m very reassured that it seems to fit well with the judgement of a domain expert like Stefan Keller. I’m looking forward to getting stuck into the FAO GeoNetwork development process a bit; offering our simplified model + editing interface as a pluggable backend for their heavier lifting catalog services interfaces. I’m very happy about the prospect of more value for less code. I don’t mind whether the interface is CSW, OAI-PMH, OWSCat or Z39.50 as long as I can talk to it. I don’t mind whether the format is RDF/XML, FGDC or ISO19115 XML, GeoRSS, JSON or S-expressions as long as I can communicate my understanding of it and maintain a consistent map to the simple underlying model. This is my new motto: “Be promiscuous in what you send, and promiscuous in what you receive”. I could see supporting a whole matrix of interface+format combinations - a bit like one of those Mongolian wok restaurants - choose your starch, choose your protein, and optimise your stock for the most popular configuration. If people really want KML+Dublin Core over CSW, or tab-delimited FGDC over Z39.50 - give it to them! (I still draw the line below ebRIM, though. It just sounds painful.)
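As a rough illustration of “one simple model, many formats”, the sketch below keeps a record as a plain dictionary and looks up a serialiser by format name. Everything here is an assumption made for the example: the `SERIALISERS` registry, the `emit` helper and the two toy formats (JSON and a tab-delimited key/value layout) are hypothetical, and neither output is a real FGDC, ISO 19115 or Dublin Core encoding.

```python
import json
from typing import Callable, Dict

# The "simple underlying model": a flat dictionary of core properties.
Record = Dict[str, object]


def to_json(record: Record) -> str:
    """Serialise the record as JSON with stable key order."""
    return json.dumps(record, sort_keys=True)


def from_json(text: str) -> Record:
    """Parse JSON text back into the simple model."""
    return json.loads(text)


def to_tsv(record: Record) -> str:
    """Serialise as one 'key<TAB>value' line per field; lists are comma-joined."""
    lines = []
    for key in sorted(record):
        value = record[key]
        if isinstance(value, list):
            value = ",".join(str(item) for item in value)
        lines.append(f"{key}\t{value}")
    return "\n".join(lines)


# Hypothetical registry: adding a format means adding one entry here.
SERIALISERS: Dict[str, Callable[[Record], str]] = {
    "json": to_json,
    "tsv": to_tsv,
}


def emit(record: Record, fmt: str) -> str:
    """Serialise a record in the requested format, or fail loudly."""
    try:
        serialise = SERIALISERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return serialise(record)
```

A round trip such as `from_json(emit(record, "json")) == record` is one cheap way to check that a mapping stays consistent with the underlying model; lossy formats like the tab-delimited one would need their own, weaker checks.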

Where next for all this? I have a sense of a data access bottleneck; there are so many open web services for geographic information leaking around in the world; they’re just not being indexed, or really integrated into authoring or packaging applications in interesting ways yet. This is going to change soon; there’ll be a lot more exposed data to work with. Well, I believed this fervently about the semweb for a while, and I still see metadata/discovery as one of those fibrous places where the geospatial web and the semantic web really can help solve each others’ problems.

One next-fun-place is into spatial data brokering applications; being able to think about spatial search more clearly, and in a way more oriented towards users, and better integrated with friendlier clients (a direction that uDig is really leading right now). Then feedback-based recommendation starts to become possible - “people who asked for feature sets at this accuracy level in this bounding box also liked…”, “people who annotated a geometry shaped a bit like this suggested it was a…”. There’s also a prospect of automating more kinds of classification - using conflation-like techniques to guesstimate what features represent, based on how they’re shaped, what they’re near, what kinds of features people juxtapose them with in mapmaking applications. There’s a prospect of actually building spatial data search and packaging applications that are useful for people who don’t already know more or less exactly what they’re doing and exactly what they’re looking for - imagine that!
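The simplest building block for that kind of spatial search is a bounding-box overlap test against the coverage recorded in each metadata entry. The sketch below is only that building block, under stated assumptions: the `intersects` and `search` names are hypothetical, the catalog is a plain dictionary of dataset name to `(west, south, east, north)` box, and boxes are assumed not to cross the antimeridian. Recommendation and conflation would sit on top of something like this, not inside it.

```python
from typing import Dict, List, Tuple

# (west, south, east, north) in degrees; assumed not to cross the antimeridian.
BBox = Tuple[float, float, float, float]


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap or touch."""
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    return (
        a_west <= b_east and b_west <= a_east
        and a_south <= b_north and b_south <= a_north
    )


def search(catalog: Dict[str, BBox], query: BBox) -> List[str]:
    """Return the names of datasets whose coverage overlaps the query box, sorted."""
    return sorted(name for name, box in catalog.items() if intersects(box, query))
```

A linear scan like this is fine for a small catalog; a larger one would presumably want a spatial index, which is outside the scope of this sketch.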

This is all reaching out into the future a bit far; though I’d love to be able to show, point to or fully describe, spike-solution implementations of some of these ideas by next year’s FOSS4G.
In the meantime, what can people do to help? Come and join in the conversation on the geodata committee, help to anneal the model and offer simple use cases that don’t fit it, check out the progress on the Geodata repository blueprint and emit encouraging signals; offer to host, in the future, a node of a distributed geodata/metadata library and discovery service.

Tom Kralidis had an interesting reflection after the metadata/catalog BOF; this could easily become too complicated (more of the overthinking that has dogged the process), yet can’t be reduced to something too simple; what’s in the core has to be expressed well enough that it’s not likely to change. I hope that between all of us who Care A Lot About Metadata, it will be possible to find a good balance; I want to be able to manage an evolving model, ongoingly maintaining mappings to different serialisation formats. I’ve talked about wanting to broker the decision process through code, rather than words - to drive interface descriptions from test specifications. Consensus design is a game played by passing and refining the tests on a continuous basis - a code-repository-management-inspired means of maintaining something which works like, but doesn’t look like, a standard. This is a story for another time, though.

For me this has been yet another adventure in exploring the boundaries of my own ignorance; the deep specialists I have talked to haven’t shot me down in flames yet. But if I’m missing anything crashingly obvious, or hubristically simplifying anything that I really shouldn’t be, please yell at me.

Posted in geodata, services, osgeo, metadata



Responses to “Have a nice metadata”

  1. AI3 - Adaptive Information::: Says:
    September 19th, 2006 at 1:56 pm

    Unused is Useless: Musings on Metadata (and the Semantic Web)

    Jo Walsh, a geospatial and semantic software specialist and one of the authors of Mapping Hacks, has written a very sensible and down-to-earth musing on metadata titled, Have a nice metadata. Her overall points are that metadata must be actually used …

  2. thinkwhere Says:
    October 16th, 2006 at 2:18 pm

    Is my metadata nice?

    Jo Walsh blogs on mappinghacks.com about metadata, cataloguing standards, and the importance of just getting something out there which will be easy to read, use and develop upon. It got me thinking about some incentives to publish or “write stuff…

