[Summary: technical standards play a role in both interoperability, and in target-setting for policy.]

I’ve been doing lots of thinking about standardisation recently, particularly as part of work on the Open Contracting Data Standard (feedback invited on the latest draft release…), and thanks to the opportunity to work with Samuel Goëta on a paper around data standards (hopefully out some time next year).

One of the themes I’ve been seeking to explore is how standards play both a technical and a political role, and how standards processes (at least at the level of content standards) can sensitively engage with this. Below is a repost of my earlier contribution to a GitHub thread discussing some of this in the context of Open Contracting.

Two senses of standard

In Open Contracting I believe we’re dealing with two different senses of ‘standard’, and two purposes which we need to keep in balance. Namely:

Standards as a basis for interoperability – as in *”their data complies with the standard, and can be used by standards-compliant tools.”
Standards as targets – as in, “they have achieved a high standard of disclosure”.

To unpack these a bit:

(Note: the arguments below are predominantly theoretical, and so some of the edge cases considered may not come up at all in practice in the Open Contracting Data Standard, but considering them is a useful exercise to test the intuitions and principles directing our action.)

Standards as interoperability

We’re interested in interoperability in two directions: vertical (can a single dataset be used by other actors and tools in a value-chain of re-use), and horizontal (can two datasets from different publishers be easily analysed alongside one another).

Where data is already published, then the goal should be to achieve the largest possible set of data publishers who can richly represent their data in the standard, and of data users who can draw on data in the standard to meet their needs. This supports the idea that for any element in the standard where (a) data already exists; and (b) use cases already exist; we should be looking for reference implementations to test that data can be rendered in the standard, and that users (or tools they create) can read, analyse and use that data effectively.

However, it is important that in this we look at both both horizontal and vertical interoperability in making this judgement. E.g. there could be a country as the sole publisher of a field that is used by 5 different users in their country. This should clearly not be a required field in a standard, but articulating how it is standardised is useful to this community of users (one way to accommodate such cases may be in extensions, although the judgement on whether or not to move something to an extension might come down to whether it is likely that other publishers could be providing this data in future).

In many cases, underlying data from different sources is not perfectly interoperable, or there is a mismatch between the requirements of users, and the requirements of data holders. In these cases, the way a standard is designed affects the distribution of labour between publishers and users with respect to rendering data interoperable. For example, a use case might involve ‘Identifying which different government agencies, each publishing data independently, have contracts with a particular firm’. In this case, a standard could require all publishers, who may store different identifiers in their systems, to map these to a common identifier, or a standard could allow publishers to use whatever identifier they hold, leaving the costs of reconciling these on the user. Making things interoperable then involves can involve then a process of negotiation, and this process may play out differently in different places at different times, leaving certain elements of a standard less stable than others. The concept of ‘designing for the tussle’ (PDF) may be relevant here, thinking about how we can modularise stable (or ‘neutral’) and unstable elements of a standard (this is what the proposed Organisation ID standard does, but having a common way to represent identifiers, but separating this off from the choice of identifier itself, and then allowing for the emergence of a set of third-party tools and validation routines to help manage the tussle).

In seeking to maximise the set of publishers and users interoperable through the standard we need to be critically aware of both short-term and long-term interoperability, as organisations modify their practices in order to be able to publish to, or draw upon, a common standard. We need to balance out a ‘Lowest Common Denominator’ (LCD) of ‘Minimum Viable Product’ (MVP) approach that means that the majority of publishers can achieve substantial coverage of the standard, with a richer standard that supports the greatest chance of different producer and consumer groups being able to exchange data through the standard.

initial-sketch-thinking-about-standards

(Initial attempt to sketch distinction between maximising set of common fields across publisher and users, and maximising set of publishers and users)

Standards as targets

Open Contracting is a political process. The Open Contracting Partnership have articulated a set of Global Principles which set out the sorts of information about contracting that governments and other parties should disclose, and they are working to secure government sign-up to these principles. In policy circles, a standard is often seen as a form of measure, qualitative or quantitative, against which process towards some policy goal is measured. Some targets might be based on ‘best practice’, others are based on ‘stretch goals’: things which perhaps no-one is yet doing particularly well, but which a community of actors agree are worth aiming for. A standard, whether specified in terms of indicators and measures, or in terms of fields and formats, provides a means of agreeing what meeting the target will look like.

The Open Contracting Principles call for a lot of things which no governments appear to yet be publishing in machine-readable forms. In many cases we’ve not touched the standardisation of these right now (e.g. “Risk assessments, including environmental and social impact assessments”) recognising that standards for these will either exist in different domains that can be linked or embedded into our standard, or, recognising that interoperability of such information is hard to achieve and ultimately what is needed for most use cases may be legal text or plain language documents, rather than structured data. However, there may be cases where something is a strong candidate for standardisation, having both the potential to be published (i.e. this is something which evidence suggests governments either do, or could, capture in their existing information systems), and for which clearly articulated use cases exist. In these cases a proposed field-level standard can act as an important target for those seeking to provide this data to move towards. It also acts to challenge unwarranted ‘first mover advantage’ where the first person to publish, even if publishing less than an idea target would require, gets to set the standard, and instead makes the ‘target’ subject to community discussion.

Clearly any ‘aspirational’ elements of a standard should not predominate or make up the majority of a standard if it seeks to effectively support interoperability, but in standards that play a part in policy and political processes (as, in practice, all standards do to some extent (c.f. Lessig).

Implications for Open Contracting Data Standard

There are a number of ways we might respond to a recognition of the dual role that standardisation plays in Open Contracting.

Purposes and validation sets

One approach, suggested in the early technical scoping is to identify different sets of users, or ‘purposes’ for the standard, and for each of these to identify the kinds of fields (subset of the data) these purposes require. As Jeni Tennison’s work on the scoping describes “…each purpose can have a status (eg proposed vs implemented) and … purposes are only marked as implemented when there are implementations that use the given subset of data for the specified purpose”.

If their are neither purposes requiring a field, nor datasets providing a field, then it would not be suitable for inclusion in a standard. And if a purpose either went unimplemented for a long period, or required a field that no supplier could publish, then careful evaluation would be needed of whether to remove that purpose (or remove that field from the purpose) against which elements of the standard could be evaluated for relevance to remain in the model.

Purposes could also be used to validate datasets, identifying how many datasets are fit for which purpose.

Stable, ordinary and target elements

We could maintain a distinction in how the standard is described between fields and elements which are ‘stable’ (and thus very unlikely to change), ‘ordinary’ elements (which may have reference implementations, but could change if there was some majority interest amongst those governing a standard in seeing changes), and ‘target’ elements, which may lack any reference implementations, but which are considered useful to help publishers moving towards implementing a political commitment to publish.

Q: Could we build this information into the schema meta-data somehow?

We might need to have quite a long time horizon for keeping target elements provisionally in the standard, and to only remove them if there is agreement that no-one is likely to publish to them. However, being able to represent them visually as distinct in the schema, and clearly documenting the distinction may be valuable.

Extensions

Some ‘target’ elements may best belong in extensions, with some process for merging extensions into the core standard if they are widely enough adopted.

Regular implementation monitoring

The IATI Team run a dashboard which tracks use of particular fields in the data. Doing similar for Open Contracting would be valuable, and it may even be useful to feed such information into the display of the schema or documentation (or at least to make it easy for publishers and users to look up who is implementing a given property)

Implementation schedules

Another approach IATI uses for ‘target elements’ is to ask publishers to prepare ‘Implementation Schedules‘ which outline which fields they expect to be able to publish by when. This allows an indication of whether there is political will to reach some of the ‘stretch targets’ that might be involved in a standard, and holds out the potential to convene together to define and refine target standardisations those who are most likely to publish that data in the near to medium term.

Discussion

What theoretical writing on standardisation could I be drawing on here?

What experience from other standards could we be drawing upon in Open Contracting and in other standard processes?