Mapping Hacks
by Schuyler Erle, Rich Gibson and Jo Walsh

Have a nice metadata
September 18th, 2006 by Jo

A month or two ago I was dropped into the middle of a rather intense discussion about the development of simple catalog interfaces and models for geospatial metadata exchange. The conversation heated up on the OSGeo geodata committee mailing list, and most of it flew right over my head . o O (“CSW ebRIM”? - that just sounds painful). But the conversation about geodata discovery and machine re-use felt like a good fit with the metadata model + engine I’ve been hacking on for the last few months to help manage OSGeo / telascience’s putative public access geodata repository. I realised that the same core model for internal data management could be re-usable and useful for indexing web services and the data behind them.

So about the highest point of FOSS4G 2006 for me was getting to meet a lot of the people in the metadata conversation in the flesh; to seriously see the light in their eyes. Each time I talked to a new person, the outline of the talk I planned to give at the end of the week mutated in my mind. The rest of this entry is an attempt to write up the narrative of the talk I eventually gave, as the slides are quite minimal and gestural. No conclusions, but a lot of food for thought and a good feeling of common direction.
Last week I talked to Jody Garnett as much as I could get the chance to. Jody said to me, “Jo; metadata is useless.” Without a focus on re-use, a focus on the client-side application interfaces, communicating the ability to make sense of the data visually, to speed up the data analytics; metadata is useless.

Without good metadata, data is useless. To make data dance for others, I need to clearly know what a web service will offer; to know how I can legally recombine and republish data in different ways. It helps to know how others have classified, verified, contextualised, represented data.

Without reusable, intelligible data, software is useless; machines will fall short of their capacity for making humans’ lives easier; semiautomation that allows us to bootstrap communication, to generate seemingly-meaningful new conclusions, faster.

If there is no clear provision of a simple metadata model and exchange mechanism for geographic information, it’s tempting to me to look at this space and declare; “standards are useless.” Having a big problem almost drives the standards community away from pursuit of a simple solution.

I got involved in thinking about this problem as a byproduct of OSGeo’s repository work, attempting to establish and implement storage for a basic set of geodata metadata requirements. The ad-hoc data warehousing at telascience during Hurricane Katrina was a popular effort, but they experienced a lot of metadata management problems, not knowing what rights they had to redistribute the data they got. Thus Norm Vine’s new motto - “No Data Without Metadata!”.
The Mumbai Free Map project has a different set of problems; they are collecting a lot of ground-up data which they know they have full rights in; but can’t start building metadata-driven reuse and rediscovery services until they have data in place - “No Metadata without Data!” But we have to start somewhere to be able to go on, to break out of this loop, to build the spatial search applications that are glinting just round the corner. For me this is about the pursuit of the Simplest, Least Useless Thing.

In this endeavour I’ve enjoyed being able to extrapolate from the hive-mind thinking of the OSGeo geodata committee. Because there’s been a heavy US/North American weighting to the initial participants, there’s been an orientation towards the Federal Geographic Data Committee metadata standards; the original effort was to produce a model of a humane subset of its required properties, augmented by some reference to OGC web services and capabilities which have come about since the FGDC standard was written. ISO19115/9 is what the global community is more oriented towards. There’s a lot of overlap with Dublin Core and some of the GeoRSS work.

I want to be able to take all the common ‘profiles’ and find the simplest, most coherent subset of metadata that’s as non-arguable as possible - the provenance of the data, its spatial and temporal coverage, layer names. Keywords if they’re provided, but without worrying about them too much or overemphasizing human-created flat descriptions initially in search or client applications. As Cory Doctorow once memorably wrote, “Metadata is Metacrap”; so let’s focus on what we can agree on, what we can easily and consistently share; a small core that works well enough to build on stably. We can keep metadata from turning into metacrap by keeping it simple. We can make metadata less useless by constantly connecting it to usage, keeping it visible. Metadata’s not a mystery; it’s not all as hard as the OGC is making it out to be.
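To make that “small core” concrete, here is a minimal sketch in Python of what such a record might look like. It assumes only the properties named above (provenance, spatial and temporal coverage, layer names, optional keywords); the class name `CoreRecord`, its field names and its checks are hypothetical, not the model the geodata committee actually settled on.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class CoreRecord:
    """Hypothetical minimal metadata record; all names are illustrative."""
    provenance: str                            # who produced or published the data
    bbox: Tuple[float, float, float, float]    # (west, south, east, north) in degrees
    layer_names: List[str]
    start: Optional[date] = None               # temporal coverage, if known
    end: Optional[date] = None
    keywords: List[str] = field(default_factory=list)  # optional, never required

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means 'usable'."""
        found = []
        west, south, east, north = self.bbox
        if not self.provenance.strip():
            found.append("provenance is empty")
        if not (-180 <= west <= 180 and -180 <= east <= 180):
            found.append("longitude out of range")
        if not (-90 <= south <= north <= 90):
            found.append("latitude out of range or south > north")
        if not self.layer_names:
            found.append("no layer names")
        if self.start and self.end and self.start > self.end:
            found.append("temporal coverage starts after it ends")
        return found


record = CoreRecord("Mumbai Free Map", (72.7, 18.8, 73.1, 19.3), ["roads"])
print(record.problems())
```

Keywords are deliberately left out of the checks, in line with not leaning on human-created descriptions early on. West greater than east is also allowed, since a box may cross the antimeridian.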
Metadata is data about data use that makes software less useless. Human beings LIE all the time; lie to themselves, to others; lying works as a survival technique; perhaps full truth is impossible, and all statements are lies partially. And machines can’t know (yet) - they can’t handle the truth; they need some external verification, or else can only try to correlate and translate a myriad different worldviews. We can spend years building edge cases around our weaknesses; or we can look for our respective strengths and attempt to combine them.

In hacker ethic I am gung-ho, hubristic; I came into the metadata/discovery conversation from an angle of “implement first, think later”. Open Geospatial Consortium specifications are not a language of choice for me. It frustrates me that I can’t find a simple, one-page guideline to implementing a Catalogue Service for the Web (CSW) interface. I search the web for Other People’s Implementations and find myself staring at sentences like this:
What does this mean? All I want it to mean is, “tells me about interesting data”. I look for specifications of what core data about data is most interesting to the specialists, and find myself staring at sentences like this:
What does this mean? This tells me that the cataloguing effort is driven by standards which aren’t oriented to the re-use, re-distribution, recombination of data; the main purpose of the information model should be to make it easier to build interesting apps that consume descriptions of information! Not to comply with archaic standards formed by a traditional methodology while what is being done with geodata on the web is still so much in flux. All I really want to know here is how I can safely and usefully index, reuse and redistribute the data I find.

I came to this discussion from a semantic web perspective; I see four verbs and an infinity of nouns. I see a myriad formats, and hear that it’s the serialisation, the data transfer depiction of the model that’s being argued over, that has been holding the standards-based cataloguing process back. I cast my mind back to the metadata/catalog BOF session: “Protocol is not the issue” / “Profile is not the issue”.

How do we find the issue; trace out that shape of the simplest, least useless thing? How do we agree to agree? By sketching out common ground - what’s shared, what’s missing - who makes statements, who republishes them; where things are. A simple common model that provides a subset of FGDC, ISO19115 and Dublin Core, that’s augmented with newer-fangled vocabularies like GeoRSS and SIOC. Something that may leave 100% of the initial developers and users dissatisfied, but also 100% agreeing with the core of what they require.
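One way to see how much common ground there already is: the spatial coverage of a record is the same four numbers in every vocabulary, and only the names and ordering change. The sketch below is illustrative and assumes a plain dict as the core record (the `record` keys and the function names are hypothetical). The element names in `BBOX_NAMES` are the ones used by the ISO 19115 geographic bounding box, the FGDC bounding coordinates and the DCMI Box encoding. The GeoRSS Simple box is written as the lower corner followed by the upper corner, latitude first.

```python
# Hypothetical core record; the keys are illustrative, not from any standard.
record = {
    "provenance": "Example Mapping Agency",
    "bbox": (-10.0, 50.0, 2.0, 60.0),  # west, south, east, north (degrees)
}

# Names each vocabulary uses for the four bounding coordinates, listed in
# the same west, south, east, north order as the core record.
BBOX_NAMES = {
    "iso19115": ("westBoundLongitude", "southBoundLatitude",
                 "eastBoundLongitude", "northBoundLatitude"),
    "fgdc": ("westbc", "southbc", "eastbc", "northbc"),
    "dcmi_box": ("westlimit", "southlimit", "eastlimit", "northlimit"),
}


def bbox_as(profile, bbox):
    """Relabel the core bounding box with one vocabulary's element names."""
    return dict(zip(BBOX_NAMES[profile], bbox))


def georss_box(bbox):
    """Serialise as GeoRSS Simple box content: 'south west north east'."""
    west, south, east, north = bbox
    return f"{south} {west} {north} {east}"


for profile in BBOX_NAMES:
    print(profile, bbox_as(profile, record["bbox"]))
print("georss:box", georss_box(record["bbox"]))
```

Keeping the mappings as data rather than code means a new serialisation is one more row in a table. This is a sketch of the “one model, many depictions” argument, not a conformant writer for any of these standards.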
Standards are useless partly because they overfocus on production of metadata, not consumption. This is all about machine intelligibility, machine reusability; why else make geospatial metadata, other than to comply with obscure operational requirements? And if that has to be done; then let’s try to make it fun, or at least as little non-fun as possible (this is the talk I’d originally intended to give; playful IRC bots for annotating the geodata repository; email nag-bots that hustle your friends to nag you into improving metadata coverage; friendly coverage competitions). For the bits that can’t be fun, let’s try to automate as much as possible, and if something can’t be at least semi-automated, then think twice about including that in a metadata requirement spec.

I’m still skeptical of any claim that “given enough eyeballs, good metadata will emerge” (I think of my favourite OpenStreetMap tags - “horse=‘yes’” being a personal all-time classic; perhaps OSM metadata is lacking a feedback loop of iterative verification, autosuggestion…) but there’s definitely a lot of space for “architecture of participation” ideas in metadata management. So I’m still listening for the core. I’m quite happy with the boiled-down model we have developed at

Where next for all this? I have a sense of a data access bottleneck; there are so many open web services for geographic information leaking around in the world; they’re just not being indexed, or really integrated into authoring or packaging applications. One next-fun-place is into spatial data brokering applications; being able to think

This is all reaching out into the future a bit far; though I’d love to be able to show, point to or fully describe, spike-solution implementations of some of these ideas by next year’s FOSS4G.
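The nag-bot and coverage-competition idea mentioned above needs very little machinery: a way to score how much of the core each record has filled in, and a way to say what is missing. The following is a rough sketch under assumptions; the `CORE_FIELDS` tuple, the dict-shaped records and the function names are all hypothetical, and no actual bot is described in the talk.

```python
# Assumed core field set, following the properties discussed earlier.
CORE_FIELDS = ("provenance", "bbox", "temporal", "layers")


def missing(record):
    """Core fields that are absent or empty in this record."""
    return [name for name in CORE_FIELDS if not record.get(name)]


def coverage(record):
    """Fraction of core fields filled in, from 0.0 to 1.0."""
    return 1 - len(missing(record)) / len(CORE_FIELDS)


def leaderboard(records_by_owner):
    """Rank owners by the mean coverage of their records, best first."""
    scores = {
        owner: sum(coverage(r) for r in records) / len(records)
        for owner, records in records_by_owner.items()
        if records
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def nag(owner, record):
    """A friendly one-line nag, or None if there is nothing to nag about."""
    gaps = missing(record)
    if not gaps:
        return None
    return f"{owner}: your dataset is missing {', '.join(gaps)}"


half_done = {"provenance": "someone", "bbox": (0.0, 0.0, 1.0, 1.0)}
print(coverage(half_done))
print(nag("jo", half_done))
```

Everything scored here can be checked by a machine, which fits the rule of thumb that anything which can’t be at least semi-automated should be questioned before it goes into a requirement spec.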
Tom Kralidis had an interesting reflection after the metadata/catalog BOF; this could easily become too complicated (more of the overthinking that has dogged the process), yet can’t be reduced to something too simple; what’s in the core has to be expressed well enough that it’s not likely to change. I hope that between all of us who Care A Lot About Metadata, it will be possible to find a good balance; I want to be able to manage an evolving model, ongoingly maintaining mappings to different serialisation formats. I’ve talked about wanting to broker decision process through code, rather than words - to drive interface descriptions from test specifications. Consensus design is a game played by passing, refining the tests on a continuous basis - a code-repository-management-inspired means of maintaining something which works like, but doesn’t look like, a standard. This is a story for another time, though. For me this has been yet another adventure in exploring the boundaries of my own ignorance; the deep specialists I have talked to haven’t shot me down in flames yet. But if I’m missing anything crashingly obvious, or hubristically simplifying anything that I really shouldn’t be, please yell at me.

Posted in geodata, services, osgeo, metadata