Clinical Data Repository (CDR) 101
Over the past three years, a new software category has emerged that does not yet have an accepted name. If frequency of use is any indication, this new category may end up being called "Clinical Data Repository" or CDR.
But what exactly is CDR and why do you need it?
Why CDR?
Perhaps the photo attached to this post will give you a good indication why the time for CDR is now. The photo comes from the Association for Clinical Data Management and is related to its 2006 Annual Meeting. As you can readily tell, the focus is getting your hands on information and being able to collaborate globally.
To do this, you need an environment that:
- allows individuals to get access to data under appropriate access controls and without having to involve gatekeepers;
- supports data interchange standards (e.g. CDISC);
- provides tools for the data to be easily retrievable and analyzable;
- complies with 21CFR11 and GCP rules and practices;
- provides ways to link data, programs/procs and outputs to each other;
- supports reporting and electronic publishing tools and techniques;
- allows widely used software to be invoked for exploration, analysis, reporting and visualization (e.g. SAS, S-PLUS, R, Spotfire, iReview); and
- supports methods for importing and transforming data from external sources.
Now, I'm sure that you can cite existing systems that can deliver one or more of these functions. However, you won't be able to identify one that does them all. This is where the CDR comes into play.
Note: A discussion of CDR software vendors will be the subject of another post.
A CDR platform can provide the functions noted above out-of-the-box and also solve several key issues faced by the industry today. For example, as a company moves to the wider use of EDC, the need for the traditional Clinical Data Management (CDM) application is being questioned. Why, for example, move data from EDC into the CDM system when much of the data cleansing has already been taken care of on the EDC side? In addition, if it is normal practice to extract data from the CDM so that it can be pulled for statistical analysis in the Biostatistics environment, why not just go directly from the EDC system to the Biostatistics repository? Last, since the volume of data coming from external sources is increasing, exactly where should these data reside?
As these questions are faced, it becomes quite evident that there is no single, adequate repository available for ALL of the data coming from clinical trials. Neither the CDM system nor the biostatistics repository is up to the job. Thus, a new environment has to be implemented and there is no better way to achieve that than a CDR.
How is CDR Implemented?
Like any other system, a CDR requires a clearly articulated implementation plan. To date, most companies have chosen a roll-out plan that starts with Biostatistics and then expands to include other user groups within and external to the organization.
Starting with Biostatistics makes a lot of sense since:
- the number of users is limited;
- contingency (i.e. fall-back) plans are easy to devise;
- statisticians and programmers are the most qualified to address the key data management and analysis issues;
- it is the organization to which all other users turn to make sense of the data;
- it is the place where the usability and interoperability of processes and software tools can best be evaluated; and
- it is the group most experienced with a disciplined approach to data and reports management.
The tactical objective of the first phase of the roll-out plan is to replace the existing repository used by biostatistics for clinical data and their related statistical programs and outputs/reports. In most companies, this existing repository is a hierarchically organized file system. The file system itself is kept under strict security control and the number of users who can get at it is limited, typically to the Biostatistics and Programming staff. All others go to these individuals with specific requests to analyze the data and then receive canned reports for internal use or for regulatory submission.
When the CDR is implemented, most of the traditional biostatistics functions are maintained, while additional controls on the data are introduced and a new focus on data sharing and collaboration is put into practice. In other words, the implementation is not simply a technology upgrade but also an enabler of process change. In particular, the following key principles drive the implementation:
- the traditional work of the statistician is enhanced, not hindered;
- the traditional tools of the statistician continue to be used with minimal disruption;
- new functions are introduced that enable compliance with 21CFR11 regulations and GCP guidelines;
- the collection of metadata related to all dimensions of clinical trial conduct is achieved;
- the environment is designed to achieve a fair balance between access controls and ease of retrieval and use;
- the final system fully supports the use of data interchange standards for importing and exporting data and related information (e.g. electronic files and electronic publishing);
- the environment directly provides or allows for the integration of tools that support data retrieval, aggregation, analysis, exploration, visualization and reporting by different types of users.
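To make the balance between access controls and ease of retrieval a little more concrete, here is a minimal Python sketch of a role-based access check that records every request in an audit trail. Everything in it is an assumption made for illustration: the role names, the `PERMISSIONS` table, and the `Repository` and `AuditRecord` classes are hypothetical and do not describe any particular CDR product, and the sketch by itself does not make a system compliant with 21CFR11.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical role-to-permission table; a real system would load this
# from its own security configuration.
PERMISSIONS = {
    "statistician": {"read", "write"},
    "clinician": {"read"},
}


@dataclass(frozen=True)
class AuditRecord:
    """One immutable, time-stamped record of an access request."""
    timestamp: str
    user: str
    role: str
    action: str
    target: str
    allowed: bool


class Repository:
    """Toy repository front end: checks permissions and logs every request."""

    def __init__(self):
        self._audit = []

    def request(self, user, role, action, target):
        # Unknown roles get no permissions at all.
        allowed = action in PERMISSIONS.get(role, set())
        # Denied requests are logged as well as granted ones.
        self._audit.append(
            AuditRecord(
                timestamp=datetime.now(timezone.utc).isoformat(),
                user=user,
                role=role,
                action=action,
                target=target,
                allowed=allowed,
            )
        )
        return allowed

    @property
    def audit_trail(self):
        # Return a read-only copy so callers cannot edit the log.
        return tuple(self._audit)
```

In this sketch a clinician can read a data set directly, without going through a gatekeeper, while a write attempt by the same user is refused and still leaves a record behind.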
During the first phase, a key risk that must be managed is the temptation to focus just on the needs of the biostatisticians and programmers. The best way to mitigate this is to have a program management office with adequate representation, input and review by all stakeholders. This group must make sure that appropriate measures of success are formulated and evaluated as the implementation proceeds. In short, no barriers to the expansion of the system should exist at the completion of Phase 1.
Is a CDR a Data Warehouse?
As the concept of the CDR has matured, the notion that it is a data warehouse has refused to die. So, let's be very clear: A clinical data repository IS NOT a clinical data warehouse. The confusion is really one of semantics. People who don't know the strict definition of the term "data warehouse" tend to use the term incorrectly to mean a place where clinical data is stored or archived. A true data warehouse, however, is a database that gives access to the data for direct analysis or allows data to be retrieved and transformed into data marts. While a CDR can deliver data-warehouse-like capabilities, it is primarily a system for managing data about electronic files (e.g. data sets) and the files themselves WITHOUT touching or organizing the data within the files.
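The distinction can be illustrated with a small Python sketch of a file catalog. It stores only metadata about each file (path, kind, size, checksum) and the links between data sets, programs and outputs. It never parses what is inside the files. The class names, the `kind` labels and the `derived_from` field are assumptions chosen for this example, not features of any specific CDR.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class CatalogEntry:
    """Metadata about one file; the file's contents are never interpreted."""
    path: str
    kind: str          # hypothetical labels: "dataset", "program", "output"
    size_bytes: int
    sha256: str
    derived_from: list = field(default_factory=list)  # paths of input files


class FileCatalog:
    """Toy catalog that tracks files and the links between them."""

    def __init__(self):
        self._entries = {}

    def register(self, path, kind, derived_from=()):
        parents = [str(p) for p in derived_from]
        for parent in parents:
            if parent not in self._entries:
                raise KeyError(f"input file not catalogued: {parent}")
        # The file is read as opaque bytes, only to measure and fingerprint it.
        raw = Path(path).read_bytes()
        entry = CatalogEntry(
            path=str(path),
            kind=kind,
            size_bytes=len(raw),
            sha256=hashlib.sha256(raw).hexdigest(),
            derived_from=parents,
        )
        self._entries[entry.path] = entry
        return entry

    def lineage(self, path):
        """List every catalogued file that `path` depends on, nearest first."""
        seen = []
        queue = list(self._entries[str(path)].derived_from)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self._entries[current].derived_from)
        return seen
```

Registering a report with `derived_from=[dataset_path, program_path]` lets `lineage` answer which data set and program produced it. A data warehouse, by contrast, would load the rows inside the data set into its own tables.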