Monday, September 6, 2010

Expert Electronic Data Capture Systems

One (incomplete) view of Electronic Data Capture is that it’s a way of building validation checks into your Case Report Forms. A paper form can’t tell you that you missed a question or that you’ve confused today’s date with the subject’s date of birth, but the validation checks built into the EDC system can.


But when you look at the ways that those validation checks are programmed in EDC systems it’s easy to see that they are a kind of add-on to the data collection forms. EDC systems are providing us with better paper, not a system that has any designed-in understanding of the data collected.


Take a Body Mass Index calculation:

BMI = weight (kg) / height (m)²

I don’t know how many times I’ve written that calculation in different systems over the years. Whether I pull it from a standard library or program it new for each study, the EDC system doesn’t inherently know that a patient has a height or a weight or that the two are related through the BMI calculation. It only knows what we program it to know. For BMI I have to create height and weight fields and convert these entries from whatever units they are entered in into their SI equivalent so that the calculation can be run.
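For reference, the sort of code this usually amounts to looks something like the sketch below. It is a minimal Python illustration, not any particular EDC system's scripting language; the function names, unit codes and conversion tables are assumptions made for the example.

```python
# Conversion factors to SI units. The unit codes are hypothetical;
# a real study would use whatever codes its unit dropdowns store.
TO_KG = {"kg": 1.0, "lb": 0.45359237}
TO_M = {"m": 1.0, "cm": 0.01, "in": 0.0254}


def to_si(value, unit, factors):
    """Convert an entered value to its SI equivalent."""
    if unit not in factors:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * factors[unit]


def bmi(weight, weight_unit, height, height_unit):
    """Body Mass Index: weight in kg divided by height in metres squared."""
    kg = to_si(weight, weight_unit, TO_KG)
    m = to_si(height, height_unit, TO_M)
    if m <= 0:
        raise ValueError("height must be positive")
    return kg / (m * m)


# The same kind of entry in metric and in imperial units.
print(round(bmi(80, "kg", 180, "cm"), 1))
print(round(bmi(176, "lb", 71, "in"), 1))
```

None of this is hard, which is rather the point: the fields, the unit tables and the formula all have to be spelled out by hand because the system has no notion of what a weight or a height is.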


But what if we started to add some decision-support type functionality into our eClinical systems? What if, instead of having to create a field called “weight” and schedule it into a form and an event, I could just tell the EDC system “collect weight here”, and the same for the patient’s height, pulse, blood pressure, gender and other universal elements?


The user would see a slight difference in the EDC system. Based on their preferences for input units, they would see a question “Weight” in the collection form with their preferred unit input. The system would automatically convert to the SI unit and show that on the screen too (if different). If we collect a height at the same event, we can automatically generate the BMI value, and in validation checks we can ask the system “if patient is obese…” without having to program all the rules, because the system knows how to work out from the BMI whether the patient is obese.


This imaginary system would have knowledge of what weight and height are and would supply BMI charts and graphs of change across visits as standard, without having to write a custom report. Hey, you’re collecting data on human beings! Would you like to see a chart of your patients’ ages / heights / BMIs / genders?


Most EDC systems are not that different from business applications that allow you to create forms and collect data. They have no built-in assumptions, no knowledge that the subject of the data collection is a human being (or maybe a horse, cat or dog in a veterinary study).


Isn’t it time we found a way to give our EDC systems a better understanding of our patients? If I define a CRF with patient age, sex and height collected at Screening, and then weight and blood pressure collected at Visits 1, 2 and 3, then I should be able to ask my EDC system “Give me a list of underweight patients with apparent hypertension” with no need to write any validations or calculations, because the system has a built-in understanding of these concepts.
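To make that concrete, here is a rough Python sketch of the work the system would be doing on my behalf. Everything in it is an assumption for illustration: the Patient and Visit classes, the field names, and the cut-offs (BMI below 18.5 for underweight, and a reading of 140/90 mmHg or above for apparent hypertension, which are commonly used adult thresholds and not a diagnosis).

```python
from dataclasses import dataclass, field


@dataclass
class Visit:
    name: str
    weight_kg: float
    systolic: int   # mmHg
    diastolic: int  # mmHg


@dataclass
class Patient:
    subject_id: str
    height_m: float  # collected once, at Screening
    visits: list = field(default_factory=list)

    def bmi(self, visit):
        return visit.weight_kg / (self.height_m ** 2)

    def is_underweight(self, visit):
        # Assumed adult cut-off; a paediatric study would need a different rule.
        return self.bmi(visit) < 18.5

    def has_apparent_hypertension(self, visit):
        # Suggestive only: one raised reading is not a diagnosis.
        return visit.systolic >= 140 or visit.diastolic >= 90


def underweight_with_apparent_hypertension(patients):
    """Subject IDs with any visit that meets both conditions."""
    return [
        p.subject_id
        for p in patients
        if any(p.is_underweight(v) and p.has_apparent_hypertension(v)
               for v in p.visits)
    ]


patients = [
    Patient("001", 1.80, [Visit("Visit 1", 55.0, 150, 95)]),
    Patient("002", 1.70, [Visit("Visit 1", 70.0, 150, 95)]),
    Patient("003", 1.75, [Visit("Visit 1", 52.0, 118, 76)]),
]
print(underweight_with_apparent_hypertension(patients))  # ['001']
```

The query itself is one line. What takes the effort today is everything above it, and that is the part a system with built-in knowledge of its subjects could supply as standard.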


Maybe our eClinical systems of the future will exhibit a little more expertise on our study subjects.


Anonymous said...


I fully concur with your assertions about height, weight and BMI.

The "ad-hoc query" you pose at the end - do you really view that as a "responsibility" of the eDC system, rather than a data warehouse that houses, say ECG data, in addition to the CRF data so that the query can receive an unqualified answer?


Anonymous said...

You should look at the Ontology of Clinical Research (OCRe) work going on, led by Ida Sim of UCSF. It is in the early stages but this is a use case.

Eco said...

Hi GB,

I don't know that an EDC system will necessarily have all the data required to classify a patient one way or another. This is why I say 'apparent hypertension' in the example.

In the absence of a medical degree the system can't diagnose so we want to program rules that are suggestive of a condition. We want to use this suggestion as the basis of more natural validation checks such as "if the patient is symptomatic of hypertension then create query.." without having to write each time that systolic pressure must be greater than X etc etc.

Different therapeutic areas might call for different "patient templates". For a study in infants you might want growth charts and different weight determinations for underweight/overweight etc.

With effort you can do some of this now with standard libraries and reports but why am I still teaching my EDC system to calculate a BMI? I'm saying that the EDC system could (and should!) do more for me when it comes to standard patient attributes and calculations in order to make it less like a data entry system and more like an expert system.

EDC Consultant said...

Expert systems were common in the late '80s. They extended the principle of Fourth Generation Languages, at a time when engineers were discovering that they were building expert logic into the software they were developing.

eCRF Libraries indirectly logically associate rules with one or more datapoints. Some library tools ensure that associated rules are carried forward, together with the associated referential fields when they are pulled into a study.

The one big challenge today with tools typically available - and this is also manifested with CDASH - is that the logic is not fully detached from the presentation - the eCRF. So - yes, you can have the BMI... but only if you use these Forms...

What is required is a means to instantiate the presentation layer from a higher level that doesn't care what the Weight, Height or eCRF is called. If this was in place on the 'design' side, then on the 'consumption' side it would be a matter of mapping the very high level '5th generation' logical principles such as 'Give me a list of underweight patients' to the actual lower level representations in the different EDC systems or standards (CDASH).

Anonymous said...

Thanks for the responses.

I agree that the value of an eDC system increases with such capabilities.

If we consider
(1) what EDC Consultant has said about the rules being tied to the presentation,
(2) ePRO and EHR (in addition to LAB, ECG and Imaging ...) being mooted as sources of data that eventually join with eDC data,
(3) bio-statistics and submission functions' typical modified use of eDC data, and
(4) the data marts and warehouses that get built anyway using eDC (and ePRO, and EHR) data, and that can also be used for bio-statistics and submission purposes,

then maybe having such capabilities as part of that end platform will be of more(?) value.

In other words, having such capability in an eDC system _is_ valuable, but probably has more value in warehouses/data hubs.

Eco, I find your comment about "patient templates" intriguing; does this also express inclusion/exclusion criteria? Maybe in a future post, you'll elaborate more on this.


PS: OCRe sounds interesting.. something to read-up about.

Eco said...

GB, yes these concepts are useful in a data warehouse but they're useful in the EDC and other systems for the same reasons.

Consider: in an EDC system you have a field called WT_VAL and another field called WT_UNIT. In the warehouse you have WT_KG, because in a DW it only makes sense to store a single value for reporting purposes.

So now the data from EDC has to make it to the DW. We need to map, and that map will be complicated by the fact that the value in WT_VAL is only meaningful when combined with WT_UNIT. WT_VAL says "80", but eighty what? Pounds, kilos, Planck masses?

So we need conversion code. But if the EDC system has a concept of Weight and this extends to its export functions, we can map directly to Weight-as-kilos.
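As a rough illustration of the difference, in Python: the first function is the kind of conversion code the mapping needs today, and the Weight class is what the export could lean on if the EDC system had that concept. WT_VAL, WT_UNIT and WT_KG are the field names from the example; the unit codes and the Weight class itself are made up for the sketch.

```python
from dataclasses import dataclass

# Hypothetical unit codes as they might be stored in WT_UNIT.
KG_PER_UNIT = {"KG": 1.0, "LB": 0.45359237}


def map_record_to_dw(record):
    """Today: the map has to combine WT_VAL and WT_UNIT itself."""
    factor = KG_PER_UNIT[record["WT_UNIT"]]
    return {"WT_KG": round(record["WT_VAL"] * factor, 2)}


@dataclass
class Weight:
    """A weight that carries its own unit, so an export can ask for kilos."""
    value: float
    unit: str

    def as_kg(self):
        return round(self.value * KG_PER_UNIT[self.unit], 2)


# Mapping written per study versus asking the concept directly.
print(map_record_to_dw({"WT_VAL": 80, "WT_UNIT": "LB"}))
print({"WT_KG": Weight(80, "LB").as_kg()})
```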

Will write another post on the Patient Template concept, try to explain it better.

I looked at the OCRe but it seemed to be trying to classify studies rather than put meaning into the data itself. So the goal was to be able to search study summaries to find out, say, which studies are Phase III studies of Asthma in 18-40 year olds.

Anonymous said...


Thank you very much for your replies.

The Weight example really nails it for me. From what I understand, you're saying the eDC systems should be configured or designed to capture Weight (rather than a 2 digit precision floating point text field and a unit dropdown) so that the eDC system "knows" that the data is Weight, and that the _rendering_ (or, as appropriate, the exporting) of Weight in its UI may very well be a 2 digit precision floating point text box accompanied by a unit dropdown.

This is something of an A-Ha moment for me.

I'll be looking forward to your continuing posts.



Anonymous said...

I wrote a blog post about how Discworld will do this

Let me know if there are any conversions you need that aren't there (see the file units-data.lisp). I'll happily add them.

data capture said...

The advent of EDC systems has not only made it easier to capture data remotely from various sites; with built-in validation and edit checks, it has also made it possible to collect error-free data at the very first stage. All this has helped reduce the total time needed to get 100% clean data, to analyse and report it, and to make the final submission to the regulatory agencies. Today we have many different types of electronic data capture solutions and many different vendors. These solutions have made data capture and analysis accurate and easy to a great extent. On the other side, however, they have also resulted in many variations in the data collection modules, namely the electronic case report forms (eCRFs). There are now hundreds of variations in CRF design that basically capture the same information. Equally, there are thousands of different naming conventions for the data fields on these eCRFs and for their mapping to the internal database.

© eClinicalOpinion