Prove your data

Where were you at 8:05 pm, Saturday, May 9th, 1987?

Sounds like a wild question, doesn't it?  How could you possibly remember events that specifically, unless that happens to be the day you got married or some other key event in your life?

This is essentially what the government (and shareholders) want to know.  The FDA and other government agencies want the ability to dig into the history of any data or facts used to support your claims.  They want to see how you arrived at your conclusions, and they might want to dig through the reports to the source reports and even the original data.  Essentially, they want to know the provenance of your information.

Before electronic records became prevalent (and accepted by the government), this was done by painstakingly digging through paper records via bibliographic references.  Today, essentially everything is electronic, which means there should be some mechanism for tracing through the data.  Unfortunately, doing this the right way requires significant technical infrastructure and business support, so the bibliography is still of significant value.  There are some products out there; notably, NuGenesis (now owned by Waters) has created and promoted the scientific data management space with a product I've always thought was interesting.  As I said above, it requires a significant commitment to use the tool throughout the long lifecycle of new products in order to get as much provenance built into documents as possible.
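
To make the idea of tracing through electronic data a little more concrete, here is a minimal Python sketch.  It is not how NuGenesis or any other product works; the Record class, the derived_from field, and the record names are all hypothetical, invented for illustration.  The assumption is simply that every claim, report, and data set stores the identifiers of the items it was derived from, so a claim can be walked back to its original data.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One item in a hypothetical provenance store (all names are illustrative)."""
    record_id: str
    kind: str  # e.g. "claim", "report", "raw_data"
    derived_from: list[str] = field(default_factory=list)


def trace_provenance(store: dict[str, Record], start_id: str) -> list[str]:
    """Return every record reachable from start_id, walking back toward the sources."""
    seen: set[str] = set()
    order: list[str] = []
    stack = [start_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a source shared by several reports is listed only once
        seen.add(current)
        order.append(current)
        # Push parents in reverse so they are visited in the order they are listed.
        stack.extend(reversed(store[current].derived_from))
    return order


def original_sources(store: dict[str, Record], start_id: str) -> list[str]:
    """Records with no parents: the original data a claim ultimately rests on."""
    return [rid for rid in trace_provenance(store, start_id)
            if not store[rid].derived_from]


# Made-up example: a claim backed by a summary report, two lab reports, and raw runs.
records = [
    Record("claim", "claim", ["summary_report"]),
    Record("summary_report", "report", ["lab_report_a", "lab_report_b"]),
    Record("lab_report_a", "report", ["raw_run_1"]),
    Record("lab_report_b", "report", ["raw_run_1", "raw_run_2"]),
    Record("raw_run_1", "raw_data"),
    Record("raw_run_2", "raw_data"),
]
store = {r.record_id: r for r in records}

print(trace_provenance(store, "claim"))
print(original_sources(store, "claim"))
```

The hard part is not this traversal but the commitment to record the derived_from links every time a report or data set is created, which is exactly the infrastructure and business support problem described above.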

This post was inspired by another Communications of the ACM article from April 2008, "The Provenance of Electronic Data" (pdf available), by Luc Moreau, Paul Groth, Simon Miles, Javier Vazquez-Salceda, John Ibbotson, Sheng Jiang, Steve Munroe, Omer Rana, Andreas Schreiber, Victor Tan, and Laszlo Varga.  Interestingly, there is also a Provenance Aware Service Oriented Architecture project.


