
Comments on "DIA Reflections: The Importance of Semantics"

George Laszlo

Doug, your comments bring up yet another dimension of a data warehouse, the fact that it can be used to put data from different sources into a common pool. And, as you point out, putting these data into the same pool does not necessarily mean that the proper relationships between them have been defined. Someone has to take that next step either before or after the data have been pulled in.
Your reference to CDISC and what must be done next reminded me once more that this process of standardization is really time consuming. I wish there were a way to speed it up, but I don't think anyone has come up with one. In the meantime, everyone does the best they can and lives with the danger that we create even more divergence. This, by the way, is one of the key reasons why we don't or can't get around to embedding data standards into the daily life of a biopharma company, and end up having to transform data at the tail end just to make regulatory submission possible. A very frustrating situation!

Doug Bain

George,

I agree with your assessment of the confusion around the definition of a Data Warehouse. Taking lumps of CDISC-based (SDTM or ODM) data across studies and dumping them into a single database will not create a Warehouse.

If I wanted to analyze the data, then regardless of the database I was using, I would not necessarily have the metadata or contextual information needed to determine which data are usable.

Each protocol has a series of 'gates' that data must pass through to be considered 'clean'. Data that arrive in a cross-study repository will, in effect, be the lowest common denominator across all the protocols. I suppose some useful information may be extracted, but really effective information requires effective metadata to be available.

I believe the next (complex) phase for CDISC must be a rules standardization exercise. Once we have that, and we are able to examine the rules applied to supposedly 'clean' data in a consolidated data store, then we can call it a warehouse.

George Laszlo

Charles, since it appears that you work for Oracle, I will forgive you for being enthusiastic and calling 10gR2 a "disruptive technology." I have learned the hard way that this industry is pretty good at being skeptical about new technologies and certainly takes its time adopting them. Witness that most statistical analysis is still being done in batch mode using SAS against datasets stored as files. Nothing relational there! So, the concept of being disruptive may be more wishful thinking than reality at the moment.
Putting that aside, your suggestion that "the Database can also perform in-Database comparative statistical functions, and it provides a full range of data mining and text mining algorithms" is a good one. Oracle 10g can enable this and some vendors who specialize in the Biopharma sector (e.g. Waban Software) can make all of this happen including the management and audit trail capabilities.
Once a single application provides these functions, it is difficult to label it as a CDR, a DW, or a Knowledge Warehouse. What's important, however, is that people understand what functions are provided and how they can displace whatever they are used to.

Charles Berger

What if the Database can also perform in-Database comparative statistical functions, and it provides a full range of data mining and text mining algorithms? (Oracle Database 10gR2 has these capabilities included.) Does this "disruptive technology" change all the rules? Manage the data and analyze the data in the same place, where it is safe, secure, has audit trails, etc. Is that a CDR, a DW, or maybe even a Knowledge Warehouse?
