Friday, September 5, 2008

Standardizing Validation Check Definitions in the ODM

In a recent post I explained my attitude toward CDISC ODM Vendor Extensions (summary: a necessary evil; we should always push for improvements to the base standard). In the comments, EDC Consultant notes:

The largest chunk of time spent when building an EDC study is typically the edit
check building. A typical study might have 500 edit checks. A proportion might
be handled by field attributes such as 'Required'. However, that still leaves a
chunk of edit check development work.

and:

Eventually, new standards for things like edit check logic will evolve. The most
successful vendors will be in the strongest position to push their own
vendor extensions 'provided' they are open.

My emphasis there.

EDC Consultant is right on the money. If Sponsors want to use the ODM to pursue a write-once-run-anywhere approach to EDC study development, then a machine- and human-readable standard for validation checks will have to be developed. At the moment, the ODM allows the logic of any check more complex than GT/LT/EQ to be declared in an XML CDATA section (i.e. free text).

The question in my mind is...

How should a validation check be defined?

The requirements I see are:

1. Must be human readable.
2. Must be (easily) machine readable.
3. Must reference the study data in a vendor-neutral way.
4. Must provide for extension via some equivalent of function calls (e.g. AgeInYears, DaysBetween).
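
As a rough illustration of requirements 3 and 4, here is a minimal Python sketch. Everything in it is an assumption made for the example: the (event, field) keys, the sample dates, and the `FUNCTIONS` registry are hypothetical and are not part of the ODM or of any vendor's system. Only the function names AgeInYears and DaysBetween come from the list above.

```python
from datetime import date

# Hypothetical vendor-neutral study data: values keyed by (event, field).
subject_data = {
    ("Screening", "BirthDate"): date(1960, 3, 15),
    ("Visit1", "VDate"): date(2008, 9, 1),
    ("Visit2", "VDate"): date(2008, 9, 5),
}

def days_between(start: date, end: date) -> int:
    """Whole days from start to end (negative if end is earlier)."""
    return (end - start).days

def age_in_years(birth: date, on: date) -> int:
    """Completed years between a birth date and a reference date."""
    years = on.year - birth.year
    # Subtract one if the birthday has not yet occurred in the reference year.
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years

# Registry mapping the names used in check definitions to implementations.
FUNCTIONS = {"DaysBetween": days_between, "AgeInYears": age_in_years}

def call(name, *refs):
    """Resolve each (event, field) reference, then apply the named function."""
    args = [subject_data[ref] for ref in refs]
    return FUNCTIONS[name](*args)

print(call("DaysBetween", ("Visit1", "VDate"), ("Visit2", "VDate")))
print(call("AgeInYears", ("Screening", "BirthDate"), ("Visit1", "VDate")))
```

The point of the registry is that a check definition only ever mentions a function by name, so each vendor could supply its own implementation behind the same name.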

People see the combination of Machine Readable and Human Readable and think XML. I shudder. Although XML technologies can be used to perform validations (see this example), the challenge isn't to validate an ODM XML file containing patient data; it is to document the logic of a validation check in a way that can be translated into (and maybe back from) whatever mechanism a vendor uses to validate data in their eClinical system. Technologies I have seen used for this include VBScript, JavaScript, Java, SQL, C#, XML-based private languages and full homegrown programming languages. I think that if something can parse a logical expression, it has probably been used to write validation checks by someone, somewhere.

Given that CDISC is a committee, and given that the people who have the most input into the standards are XML-oriented, I fully expect we'll get some kind of XML language. That could be appropriate if it allows the validation check to be checked against the schema, but it would be a pity if XML is chosen only because parsers for it are readily available and nobody wants to put in the effort to define a real language and provide parsers for it.

If offered a choice I'd rather define logic like:

Visit2.VDate < Visit1.VDate

than:

<comparator type="LT">
  <datavalue field="VDate" event="visit2"/>
  <datavalue field="VDate" event="visit1"/>
</comparator>

The XML equivalents are easier to parse but so verbose. A moderately complex check can easily turn into 20 lines of angle-bracket madness.
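
To show that the two forms can carry the same information, here is a small Python sketch that translates the compact expression into the XML form and back. It is an illustration only: the functions `to_xml` and `to_text` are hypothetical, the grammar is limited to a single `Event.Field <op> Event.Field` comparison, and the `comparator`/`datavalue` element names are simply the ones from the example above, not anything defined by the ODM. Event names are carried over exactly as written in the expression.

```python
import re
import xml.etree.ElementTree as ET

# Operator symbols and the type codes used in the XML form.
OPERATORS = {"<": "LT", ">": "GT", "==": "EQ"}

# Matches one comparison of the shape Event.Field <op> Event.Field.
PATTERN = re.compile(r"^\s*(\w+)\.(\w+)\s*(<|>|==)\s*(\w+)\.(\w+)\s*$")

def to_xml(expression: str) -> ET.Element:
    """Translate a compact comparison into a <comparator> element."""
    match = PATTERN.match(expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    left_event, left_field, op, right_event, right_field = match.groups()
    comparator = ET.Element("comparator", {"type": OPERATORS[op]})
    ET.SubElement(comparator, "datavalue",
                  {"field": left_field, "event": left_event})
    ET.SubElement(comparator, "datavalue",
                  {"field": right_field, "event": right_event})
    return comparator

def to_text(comparator: ET.Element) -> str:
    """Translate a <comparator> element back into the compact form."""
    symbols = {code: symbol for symbol, code in OPERATORS.items()}
    left, right = list(comparator)
    return (f"{left.get('event')}.{left.get('field')} "
            f"{symbols[comparator.get('type')]} "
            f"{right.get('event')}.{right.get('field')}")

element = to_xml("Visit2.VDate < Visit1.VDate")
print(ET.tostring(element, encoding="unicode"))
print(to_text(element))
```

A real check language would need a proper grammar (nesting, boolean operators, function calls), which is exactly the effort the XML route avoids.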

One problem is that the way you specify data validations for a study depends on the structure of the underlying data (i.e. the vendor's database/storage architecture). In one system you might be able to specify a validation check that says "Any place you see a pulse field, check the chronologically previous visit (if any) and compare the pulse value you find there. If it's 20% or more different, fire a warning." In another, that might be 5 different checks, each testing pulse in visit 5 vs pulse in visit 4, pulse in visit 4 vs pulse in visit 3 and so on. Another system might not need this kind of validation check at all, placing dates into timelines which provide automatic checks.
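
The difference between the first two styles can be sketched in Python. This is a hedged illustration, not how any particular system works: the function names, the visit labels and the `Pulse` field name are hypothetical, and "20% or more different" is assumed to mean relative to the previous visit's value.

```python
def pulse_warnings(visits, threshold=0.20):
    """Generic rule: compare each pulse with the chronologically previous one.

    visits is a list of (visit_name, pulse) pairs in chronological order.
    Returns the (previous, current) visit pairs that should fire a warning.
    """
    warnings = []
    for (prev_name, prev_pulse), (name, pulse) in zip(visits, visits[1:]):
        # Assumption: the difference is measured relative to the previous value.
        if abs(pulse - prev_pulse) / prev_pulse >= threshold:
            warnings.append((prev_name, name))
    return warnings

def expand_checks(visit_names):
    """Expanded style: one explicit check per pair of consecutive visits."""
    return [f"{later}.Pulse vs {earlier}.Pulse"
            for earlier, later in zip(visit_names, visit_names[1:])]

visits = [("Visit1", 70), ("Visit2", 72), ("Visit3", 90)]
print(pulse_warnings(visits))
print(expand_checks(["Visit1", "Visit2", "Visit3"]))
```

A standard would have to decide whether it describes the single generic rule, the expanded list, or both, since a system built around one style may not be able to import the other directly.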

Conclusion

I like the idea of a standard way of specifying validation checks. Creating a specification that is expressive enough to support the different requirements of validation checks in different systems is going to be difficult. An XML-based tag-soup is almost certain to be offered because it has the lowest barriers to entry (nobody has to fire up Lex/Yacc and struggle with EBNF) and would integrate well with the existing ODM structure.

2 comments:

EDC Consultant said...

In the mid 90's, I worked on an early EDC system that took the principle of study development from a form then field perspective, with edit checks dropped off each field. As a first stab at a solution - it made sense - but, when it came to re-use, it was a right dog. You could have the same edit check repeat 20 times in different places - a slow study build - anyway, as they say, you learn from your mistakes.

In the early 2000s, an EDC system took a different approach. It applied the software developer's principle of separating the data from the logic and the presentation. With this model the edit checks were attached at the logical level prior to dropping the questions onto actual forms at the presentation level. It had the advantage of relative references (previous visit, first visit, etc.). Admittedly, this approach could be harder when developing the initial studies, but it was a dream for re-use and for creating output structures that weren't just by form.

ODM takes the former approach from the form down. Given this principle, in an extended ODM with Logic - edit checks would appear below fields. Now, this might be ok. ODM may just be considered a flat dump of the structure of a study - converting the 3 layers into 1. For example, in the source EDC system, it may allow for the definition of an edit check once, that is re-used 20 times. In the ODM output, it would be presented as 20 different edit checks. The question is - would this be valuable? For interfacing purposes - probably - we just need to check that SystemA has what SystemB has so they can talk. For use as the basis for further builds, and structuring outputs... not so good. Then again, maybe I am just over-engineering this.

Anyway, I am not saying that adding logic into ODM is altogether a bad thing. Not at all. It just needs to be understood that it might not be the solution to all problems.

It would certainly be an interesting challenge coming up with a common syntax for edit checks...

I see the benefits of a native XML approach, but I think I would prefer a generic readable syntax that was embedded into the XML as an expression. It could still reference ODM tags. As for the operators, standard arithmetic should be no problem. CDISC could also provide specifications for common standard functions such as Age, etc.

If CDISC did go with a purely XML syntax, then the result would force the use of editors that could parse the XML expressions and present a UI that explained the expression in plain English (or other localization!).

Right - who's going to create the first draft... ;-)

Jozef Aerts - XML4Pharma said...

Wish I had found this sooner ...

CDISC is essentially a volunteer organisation, with unpaid volunteers spending a lot of their time to develop and further improve and extend the standards.
As one of the core developers of the ODM standard, I wonder why the authors did not contact the ODM team directly. Or do the authors expect that everything will be done for them just like that?
In ODM 1.3, we already extended the possibilities for defining edit checks with the "FormalExpression" element, which allows checks to be defined in any computer or scripting language.
This way, we could remain vendor-independent (which is VERY important) and at the same time open up incredible possibilities.
Personally, I would also like to arrive at a more "standardized" and portable way of defining edit checks. My own preference is to use XPath expressions or even XQuery (both XML technologies).
This, however, probably causes problems for those who use relational databases to store the data (still about 90% of the implementations).

The CDISC ODM team can always use new volunteers. So I would like to invite the authors of the comments to join us, and to help us further improve and extend the standard.
All we ask is ... some of your time and enthusiasm ...

Best regards,

Jozef Aerts
XML4Pharma

© eClinicalOpinion