Testing A Drug Utilization Review (DUR) System 

We recently had a relative who contracted Covid-19; she is elderly and on multiple medications. Her treating physician was reluctant to prescribe Paxlovid because of the concomitant medications: the physician didn't know whether there would be Drug Drug Interactions (DDI) between Paxlovid and her existing medications. Well, of course, you would just put the order into the EHR, and the EHR would check for DDI as well as other contraindications. Except this is Japan, and ambulatory settings don't really have an EMR/EHR, nor do they have a DUR that operates at the point of prescription.

Okay, we went to a different doctor, who collaborated with his pharmacist partner, and together they determined there was little or no risk. She is now on antiviral medication and doing better.

This got me to thinking about DURs in general, and the testing of DURs. I once took on that challenge and will make the dubious claim that I failed… At least I failed in my mind. 

So what is DUR? DUR is the function that checks medications at the time of ordering for the following: Drug Drug Interactions (DDI), Drug Pregnancy/Lactation Contraindications, Duplicate Therapy (DT), Prior Adverse Reactions (PAR), and Drug Condition Contraindications (DCC).
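For reference, the five check categories above can be captured in a simple enumeration. A minimal sketch; the member names are my own shorthand, not a standard vocabulary:

```python
from enum import Enum

class DurCheck(Enum):
    """The DUR check categories listed above (shorthand names are mine)."""
    DDI = "Drug Drug Interaction"
    PREGNANCY = "Drug Pregnancy/Lactation Contraindication"
    DT = "Duplicate Therapy"
    PAR = "Prior Adverse Reaction"
    DCC = "Drug Condition Contraindication"
```

Having a fixed enumeration like this pays off later, when every test case and defect needs to be tagged with the check category it exercises.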

The DDI was the concern that the Japanese clinician had, and because she didn't have an easy mechanism to check for it (an automated DUR system), she just said, "No, I am concerned about Drug Drug Interactions." Okay, this problem would be solved by an EMR/EHR system, because at the time of ordering the system would come back and tell you about any DDI. Or would it?

The Japanese health care system, in my experience, is great: modern, efficient, ethical, efficacious, and reasonably priced. They do have some gaps in my opinion, especially around ambulatory EMR/EHRs, but okay, most of the rest of the world has the same gaps.

But what about US-based EMR/EHR systems? What makes me believe that our DUR is efficacious, and how do I know it is functioning correctly? We have had EMR systems that faked certification in the past; that is fraud, full stop. We have systems whose DUR isn't deep, sometimes not recognizing subtle conflicts. We have systems that over-alert, to the point that the alerts are ignored completely. So how do we test for these conditions in our DUR, not only initially, but in regression as part of the Operational Qualification (OQ) of a new installation or release?

The Model

So the model has to be patient based: manufactured patients, at that. Patients that have multiple iterations and exist explicitly to test DUR. The naming convention should be such that they are never confused with real patients, and they should be isolated within both Production and non-Production systems, such that you need to use a specific test clinician, in a specific clinic, with one of these specific patients, to run the test.

Why would you have test patients in PROD? Because that is the system in which this function is going to be used, regularly, and it is mission-critical functionality. Testing needs to start in test systems, including UAT, but there will be times when someone needs to check the PROD system to see if things are functioning. However, you also don't want these patients to become part of the normal everyday workflow or metrics.

So the naming convention of the patients should make it obvious what you are doing: "DUR-PAR-xxx, MedAllergy", "DUR-PAR-xxx, MedAllClass", etc. The 'xxx' is your cardinality for these patients: 'aaa' is the first one, 'aab' the second, and so on. These patients are automated, so that once a patient has been 'used' and is dirty, the next patient can be pushed into the test battery.

What do you need in these patients from a domain perspective? What needs to be defined?

Demographics: name, sex, age, some initial vitals. Conditions: active and resolved conditions associated with your DUR test battery, such as pregnancy, hemophilia, or kidney disease. Existing medications, in all statuses. Existing Prior Adverse Reactions, again in all statuses and classes. So you can see that while this is an easy list, it quickly gets complicated when it comes to deciding which of these domains you will use for a given test.
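One way to pin down those domains is a small record type for a manufactured patient. This is an illustrative sketch, not any EHR's schema; the field names and the example values are my own:

```python
from dataclasses import dataclass, field

@dataclass
class TestPatient:
    """A manufactured DUR test patient (illustrative fields, not an EHR schema)."""
    name: str                 # e.g. "DUR-PAR-aaa, MedAllergy"
    sex: str
    age: int
    vitals: dict = field(default_factory=dict)       # initial vitals, e.g. {"weight_kg": 58}
    conditions: list = field(default_factory=list)   # (condition, "active"/"resolved") pairs
    medications: list = field(default_factory=list)  # (drug, status) pairs, in all statuses
    reactions: list = field(default_factory=list)    # prior adverse reactions, (agent, status)
    used: bool = False        # set once the patient is 'dirty' and must be rotated out

# A drug-condition-contraindication patient might look like:
patient = TestPatient(
    name="DUR-DCC-aaa, Pregnancy",
    sex="F",
    age=28,
    conditions=[("pregnancy", "active"), ("kidney disease", "resolved")],
)
```

Writing the patient down this explicitly forces the question the paragraph above raises: which of these domains does a given test actually exercise, and which are deliberately left empty.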

The Test Data

This is where it gets difficult, and you will need clinical input, informatics input, and testing input. You will need to understand how the DUR system works in your solution: which informatics platforms are referenced, what rules are utilized, etc. If the DUR system doesn't consider medications that have been discontinued, cancelled, or entered in error, then those are test cases you will need to validate, but once validated you can generally ignore them. I would suggest a workflow diagram with conditions so you can walk through and plan your testing in this regard; it will also come in very handy when you want to make changes to your DUR system.
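For example, once you have determined which medication statuses your DUR actually evaluates, you can encode that scoping rule and test against it. The status sets below are hypothetical placeholders; replace them with what your solution's informatics actually specifies:

```python
# Hypothetical scoping rule: which medication statuses does your DUR evaluate?
# These sets are assumptions for illustration, not any vendor's actual behavior.
DUR_EVALUATED_STATUSES = {"active", "on-hold"}
DUR_IGNORED_STATUSES = {"discontinued", "cancelled", "entered-in-error"}

def meds_in_scope(medications):
    """Filter a patient's medication list, given as (drug, status) pairs, down
    to the entries the DUR is expected to evaluate. Each status in the ignored
    set is a one-time validation case, per the discussion above."""
    return [(drug, status) for drug, status in medications
            if status in DUR_EVALUATED_STATUSES]

meds = [("ritonavir", "active"),
        ("warfarin", "discontinued"),
        ("simvastatin", "active")]
```

A test case for the discontinued path would then assert that no alert fires for the out-of-scope entry, while the in-scope pair still triggers its DDI check.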

You will need a spreadsheet or database for these cases, with data definition fields as well as description, expectation, expected results, test cases, etc. A test management system is probably best for this, but that approach seems to have gone out of favor, and most of the commercial test management systems are insufficient to the task. I welcome any challenge to that statement, but if you are the vendor, you will need to provide access to the solution for me to validate your assertion.
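A bare-bones sketch of such a case registry, written as CSV so it can live in a spreadsheet or be loaded into a database. The column names are my own suggestion, not a standard:

```python
import csv
import io

# Suggested columns for the test-case registry (names are mine; adjust to taste).
FIELDS = ["case_id", "category", "patient", "description",
          "expectation", "expected_result", "actual_result", "status"]

def write_cases(cases, stream):
    """Write a list of test-case dicts to a CSV stream with a fixed header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(cases)

buf = io.StringIO()
write_cases([{
    "case_id": "DDI-001",
    "category": "DDI",
    "patient": "DUR-DDI-aaa, TwoActiveMeds",
    "description": "Two active meds with a known severe interaction",
    "expectation": "DUR raises a severe DDI alert at order entry",
    "expected_result": "alert fired",
    "actual_result": "",
    "status": "planned",
}], buf)
```

Even this thin a structure gives you the description/expectation/expected-result fields the paragraph calls for, and the `status` column is where pass/fail and regression promotion get tracked.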

In crafting your tests, you will want to start easy, with a smoke test. Follow that with something more complicated, subtle, and not easily predicted. Then comes the intense level: dozens of conditions, adverse reactions, active medications, and so on. If you only get the first level done, you have essentially created nothing more than a smoke test for your IQ/OQ/PQ.

Some Thoughts on the Tests

Automation is not trivial for many systems, and its absence makes this effort all the more difficult. If you are selecting a system, consider the automatability of the solution. If the vendor "says" they can do it, make them prove it; they may have been able to automate the deployment, but we are talking about the ability to add patients with specific clinical data. That isn't something I have seen in the industry, at least not done successfully.

Start easy and work up to complicated. If you find a critical or high defect in your testing, push that test into regression to inoculate future releases against the defect. Define minimal testing requirements, and never ignore the greatness that is exploratory testing.

If you find something, I strongly suggest you do two things simultaneously. First, dig deep on the thing you found. What specifically have you found? Is it reproducible? Is it an artifact of the informatics or of the test itself? Figure out what this 'find' is. Second, conduct a targeted exploratory effort: take the attributes of the test and start attribute testing to see if you can find a pattern or other 'finds'. I am assuming that you have a team; this should be a team effort. Even if it is just a couple of members doing the work, the rest should know about it. They may well have seen the same behavior in other testing and not connected the dots to a test failure.

Some Thoughts on the Profitability of These Tests

If your vendor can't show you the testing they do, then this testing is profitable.

If your vendor cannot show you the automation and configure it for your system, then this testing is profitable.

If your configuration team has made changes to or configured the medication, allergy, or problems/conditions systems, then this testing is essential and extremely profitable.

If you are bringing in data from other sources, then this testing is absolutely essential and will drive a lot of changes in your trust of those exchange systems.

The Test and Defect Taxonomy

I will be taking another whack at building out both the defect taxonomy and the test taxonomy for this effort, probably working the first level of tests first and then the heavier testing. The test nomenclature is pretty straightforward but still needs to be defined. The defect taxonomy is also pretty straightforward, but it needs to be articulated and pre-severitized.

Conclusions

The DUR system is essential to patient safety and is required of all certified EHRs. It is complex and can be easily thrown out of kilter by seemingly minor changes to a system. There is not a lot of automation out there to run regression tests, so this is all expensive manual testing that will get dropped on the test floor as one of the first efforts to be cut. There are plenty of industry stories about solutions failing for multiple reasons, from the nefarious to the accidental to tales of interoperability failure.

Collaborate with your vendor, but do your own testing, and make sure you have a solid test battery for your IQ/OQ/PQ so you can always have reasonable assurance that your solution is ready for this patient safety tool.

As always, watch out for the brown M&Ms.