Electronic R&D and Data mining

Follow-up to Demo: Electronic R&D.

There was also a question about data mining in an electronic R&D environment. An SDMS such as NuGenesis's can potentially hold data and reports from a wide variety of instruments. Wouldn't it be great if one could conduct exploratory data mining across all of those data sets? The issue, of course, is the quality of the metadata. If equivalent fields are all tagged the same way, then searching across every value with that tag is a simple matter. But if similar fields are tagged differently, or their values need some conversion to put them on a common footing, then things become more complicated. The better business intelligence software packages are beginning to offer some of this capability, assuming you know what those conversions are. In this particular example, the NuGenesis product has space for 75 metadata tags, most of which can be customized in each implementation. One would also expect a need to search the data buried inside the files themselves, not just the tags.
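To make the tagging problem concrete, here is a minimal Python sketch, assuming records arrive as flat tag-to-value dictionaries. The alias table `ALIASES`, the tag names (`SampleID`, `Temp_F`, and so on) and the helpers `normalize` and `search` are all hypothetical; nothing here reflects the NuGenesis API or its actual tag set. Each instrument-specific tag maps to a canonical tag plus a conversion, after which a search on one canonical tag covers every source.

```python
from typing import Any, Callable

# Hypothetical alias table: each instrument-specific tag maps to a canonical
# tag plus a function that converts its value onto a common scale.
ALIASES: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "SampleID": ("sample_id", str),
    "Sample_No": ("sample_id", str),
    "Temp_C": ("temperature_c", float),
    "Temp_F": ("temperature_c", lambda f: (float(f) - 32.0) * 5.0 / 9.0),
}


def normalize(record: dict[str, Any]) -> dict[str, Any]:
    """Rename known tags to canonical names and convert their values.

    Tags without a rule are kept as-is, so no data is silently dropped.
    """
    out: dict[str, Any] = {}
    for tag, value in record.items():
        canonical, convert = ALIASES.get(tag, (tag, lambda v: v))
        out[canonical] = convert(value)
    return out


def search(
    records: list[dict[str, Any]],
    tag: str,
    predicate: Callable[[Any], bool],
) -> list[dict[str, Any]]:
    """Normalize every record, then keep those whose canonical `tag` matches."""
    return [rec for rec in map(normalize, records) if tag in rec and predicate(rec[tag])]


# Usage: two instruments label the same quantities differently.
records = [
    {"SampleID": "A-1", "Temp_C": "25.0"},
    {"Sample_No": "B-7", "Temp_F": "98.6"},
]
warm = search(records, "temperature_c", lambda t: t > 30)
print([r["sample_id"] for r in warm])  # ['B-7']
```

Unmapped tags pass through unchanged. That keeps data visible, but it also means a search only finds what the alias table knows about, which is the "assuming you know what those conversions are" caveat in practice.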

The specific question concerned discovery-type data mining: finding new correlations and conclusions that scientists did not know at the start of the day. This is one of the holy-grail ideas of business intelligence, and I don't think we are there yet. Still, an environment where the data and reports are centrally archived at least provides better infrastructure for starting the mining.

