System of Systems Engineering is the idea of connecting disparate systems together to bring forward some emergent behavior. In Health Information Technology (HIT) this can mean connecting a Laboratory Information Management System (LIMS) to an EHR (acute or ambulatory), or maybe an EHR to an HIE to a Population Health solution. Or maybe it is the connection of multiple EHR solutions to allow for a seamless exchange within a geographic community. In healthcare the number of connections is endless, so let's explore what it takes to test something like this.

Start with the Pop Health example

As a hospital care coordinator or a research physician I am trying to identify groups of patients who need specialized care or have specialized conditions. It might be that I want to actively manage my diabetic population to meet regulatory requirements or, better yet, to actively improve patients' health. Or maybe I need to generate lists for Care Plans, or identify Multi-Condition Comorbidity (MCC) cohorts. I might need to use the EHR data to identify cohorts of patients who signal an emerging pharmacovigilance concern, or to identify a novel disease cohort.

All of these things require significant collaboration across the entire solution. But more importantly, they are SoS engagements that require an understanding of the solutions from not only the individual solution perspective but the overarching emergent solution. Just as the engineering must consider the cross-product engagement, so must the testing of those solutions.

What do you need?

To develop a SoS solution you need everything that you need for a single solution – just more of it – and some things that are optional in a single solution become required in a SoS solution. We will talk about: Requirements, Environments, Automation, Data, Usage Analytics, Test Data Crafting, Telemetry…

Requirements

To test a solution you of course need the requirements for the SoS solution. They are either already defined, or you will need to begin with a reverse engineering engagement to define them yourself. You must establish the current state of the source systems' schemas, enumerations, encodings, workflows, and common usage. Then, you must establish the same for the intermediary repositories. Finally, you must establish how the end analytics solution meets those requirements. As an aside, the more intermediate repositories you have in your data path, the harder it will be to conduct your analytics. Getting your data from the source may not be possible when you are dealing with a geographic community, but keeping data acquisition as close to the source as possible is ideal.

You must also generate requirements for your emergent behaviors, set the acceptance criteria, and hold the contributing solutions to those criteria. This will mean that the source systems may need to change their behavior to align with the new need. Just because a feature/function/schema/dictionary worked for a limited workflow does not mean that it will work for the new emergent need. This also means that you will have to maintain a cross-product development registry and track to it. Note that your acceptance criteria must include the inputs, intermediation, and outputs — and realize that you may have an input acceptance criteria error that is generated two or three systems prior. I am not going to talk about Project Management in this note but it is essential to the success of the project. Perhaps that is another article.

Environment

Requirements out of the way, you then need to establish your test and development environments. I put test first but, in all seriousness, if you don't have a SoS development environment you will all but guarantee failure. What does the test environment look like?

It looks an awful lot like your live environment. It starts with the systems engineering. You need to have instances of the solution in this environment – connected to all the other components of the Solution. All of your EHRs, connected to your solution intermediaries, connected to your analytics solution, connected to the universe of your solutions. It is a big task, but it is required, and if you don’t have it you cannot succeed. Let me restate this: If you don’t have a representative environment for your development and testing your project will fail – full stop.

You could put this up in a cloud-based solution so you can scale it as you need, and turn it off when you don't, to reduce costs. But these costs are far less than a complete failure of your solution, so do not be penny wise and pound foolish.

Next, you will need configurations and representative solution conditions. If you have a couple of customers who will be your guinea pigs, then you should extract their environments for an overlay into your Dev/Test environment. If their solutions are highly customized, then you will also need a standard version of the solution in your environment. If you are worried about the number of versions of solutions you have – get unworried, you need them to make this thing succeed.

Automation

You will need automation for this solution. You are going to repeat the same tests multiple times, and you are also going to want to extend the complexity of your efforts over time. This means that baseline functions should be passed back to automation for regression batteries. Remember, if you are going to do something more than 7-9 times, you probably should consider automating it. You also need to consider that your automation plan may not be a single plan. If you have multiple architectures, you might need multiple automation solutions. It would be better to have one solution but that might not be possible – but consider how one automation script might trigger a script in another solution.

You should also consider automation oracles, functions that verify the results of your automation and check for a multitude of errors. They should be based on your failure taxonomy as well as your success criteria. If you can get there you are dramatically increasing your success probabilities. You are also dramatically reducing the cost of ownership, cost of support, and improving reliability. 
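For example, a minimal oracle can be sketched as a function that checks each automation result against both the success criteria and a catalog of known failure modes. Everything here – the field names, the taxonomy entries, the thresholds – is hypothetical, meant only to show the shape of the idea:

```python
# Hypothetical sketch of an automation oracle: verify a result against
# success criteria and a failure taxonomy. All names are illustrative.

FAILURE_TAXONOMY = {
    "dropped_record": lambda r: r["records_out"] < r["records_in"],
    "mapping_loss": lambda r: r["unmapped_codes"] > 0,
    "stale_data": lambda r: r["latency_seconds"] > 3600,
}

def oracle(result: dict) -> list[str]:
    """Return a list of failure labels; an empty list means the test passed."""
    findings = [name for name, check in FAILURE_TAXONOMY.items() if check(result)]
    if result.get("status") != "ok":
        findings.append("status_not_ok")
    return findings

# Example: a run that reported success but silently dropped two records
run = {"status": "ok", "records_in": 100, "records_out": 98,
      "unmapped_codes": 0, "latency_seconds": 12}
print(oracle(run))  # ['dropped_record']
```

The value of a taxonomy-driven oracle is that every new failure mode you discover becomes one more entry in the catalog, applied automatically to every future run.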

Customer Usage Analytics

You should also analyze customer usage so that you understand how their usage will impact your solutions. You will find that their medicine is not the medicine that you think it is. They have used it in completely different ways than you intended. Some of this usage will impact the solutions downstream and you will need to consider it in your solutions. Other usages may be bad workflows that will negatively impact your customer's success. They need to fix the bad workflows, but you need to consider the valid ones and react accordingly.

This is one of those areas where an existing solution may need to change in order to participate in the larger solution. A solution that allows free-text allergy entry should probably disallow such entries, or the receiving solution should allow for that text to be consumed as a note; either way, you need to look at that data because you will find bad workflows. A solution that has artificial restrictions on the number of allergy reactions should probably rework its schema. And your encoding/dictionaries/enumerations will probably need to map and align to some degree (a lot more than some). Comparing the customer usages across input solutions will help you identify these issues.

As an example, we analyzed multiple acute and ambulatory EHRs for cross-solution exchange of clinical data. When we looked at this data we found misuse of dictionary items, use of text fields to 'extend' domains (allergy), mal-alignment of common industry structures, and many other anomalies. Many of these issues were between multiple instances of the same platform.

Creating the Data

You are going to have to create data to exercise this solution. De-identified data is the best core data to use as it will reflect a customer's usage, but you will also need to craft data for deterministic validation. Again, your automation plan is critical here; if you don't have one you will spend man-years creating data. You need to mine the customer data for unusual cases and incorporate them into your crafted solution data. In addition to that, you will also need to craft cases that will exercise specific functionality. They are usually multi-variate test cases around a disease, set of diseases, or regulatory conditions, and I would suggest using an intelligent naming convention so that your automation can load them by the hundreds, thousands, or millions.

Intelligent naming conventions are essentially a test case that is repeated multiple times. Anderson, MedAllCatAAA Blue and Anderson, MedAllCatAAB Blue are the same case, but iterations of that case – that case being the insertion of a set of Medication Allergies from EHR Blue – while MedAllCatAAA Green would be from EHR Green. If you need to have them work together then their EHR reference would be BlueGreen instead of Blue or Green. If you craft these cases such that they are easily identifiable and recognizable you will save your external testing team a world of grief. Creating a dictionary for these test cases is an easy way to assign specific cases to specific testers across the solution.
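The convention above is mechanical enough to parse and generate programmatically. As an illustrative sketch (the exact format is assumed from the examples in this section, not any standard):

```python
# Hypothetical parser/generator for the intelligent naming convention above.
# Assumed format: "<Surname>, <Case><Iteration> <EHRs>",
# e.g. "Anderson, MedAllCatAAA Blue" or "Anderson, MedAllCatAAB BlueGreen".
import re

PATTERN = re.compile(
    r"^(?P<surname>\w+),\s*(?P<case>[A-Za-z]+?)(?P<iteration>[A-Z]{3})\s+(?P<ehr>\w+)$"
)

def parse_case(name: str) -> dict:
    """Split a crafted test-case name into its components."""
    m = PATTERN.match(name)
    if not m:
        raise ValueError(f"Not a crafted test case name: {name!r}")
    return m.groupdict()

def next_iteration(code: str) -> str:
    """Advance the base-26 iteration code: AAA -> AAB, AAZ -> ABA."""
    digits = [ord(c) - ord("A") for c in code]
    for i in reversed(range(len(digits))):
        digits[i] += 1
        if digits[i] < 26:
            break
        digits[i] = 0
    return "".join(chr(d + ord("A")) for d in digits)

print(parse_case("Anderson, MedAllCatAAA Blue"))
print(next_iteration("AAA"))  # AAB
```

With a parser like this, automation can both mint the next thousand iterations of a case and recognize, from the name alone, exactly what any loaded record was built to test.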

Telemetry

You should have telemetry built into your solution so you are looking at usage of the solution – across the entirety of the solution. You MUST also develop telemetry that tells you the solution is alive, awake, and working. Multi-solution solutions are difficult to understand – they get very complex – so having telemetry that tells you where something has failed will allow you to route issues to the correct individual or team. This telemetry should be designed not only for test, but for production as you will always want to know, or have support check to see if the solution is operational.

Team configuration

The composition of your teams is important. You must have people who know their areas and the next area down the pipe. Better yet, they should know the entire SoS solution – perhaps from only one aspect, but as much of it as possible. In the end you must have a complete skillset on your team. A skillset that encompasses all the major components of the entire solution – and they must be working together. Silos will not work here.

For example, you may have a clinician who can craft the medication order tests and validate their exchange. You could (should) have several individuals with database skills who can at the very least walk the solution, database to database, to see the transitioning of the data. You will need individuals who can see the configuration of your solutions, preferably across all the solutions. You will need individuals who can understand and use the interfaces, including how those interfaces may be configurable. You will need people who can analyze the data that is sourced, exchanged, and received, for both verification and validation. These are just a small set of examples of the team that you will want and need.

Some Cargo Cult and Mad Cow testing

Lastly, if a group tests in isolation they have not tested the solution. If a group doesn’t have a QA organization – it hasn’t tested its solution. If a group mocks or stubs the next solution out and never actually hooks up to that solution – it hasn’t tested the solution. Do not expect that these situations will achieve success – instead, they will fail. If you want a SoS solution to be successful, you must employ SoS principles. It is complicated, it is difficult, but the only way that you can succeed is to engage this level of engineering.

Conclusion

System of Systems testing requires the solution be present and connected. It requires extra work and extra teams who are dedicated to that effort. It requires data that reflects the source systems' data and is then run through the solution to its end purpose. It requires looking at your source system data to identify errors, omissions, and oddities. And remember that quality is fitness to purpose and use — that sometimes means the source system must conform to a new standard of quality.

Testing A Drug Utilization Review (DUR) System 

We recently had a relative who contracted Covid-19; she is elderly and on multiple medications. Her treating physician was reluctant to prescribe Paxlovid because of the concomitant medications. The physician didn't know if there were going to be Drug Drug Interactions (DDI) between Paxlovid and her medications. Well of course you would just put it into the EHR and the EHR will check for DDI, as well as other contraindications. Except, this is Japan, and ambulatory settings don't really have an EMR/EHR, nor do they have a DUR that operates at the point of prescription.

Okay, we went to a different doctor, who collaborated with his pharmacist partner and they determined there was little or no risk. She is now on anti-viral medications and doing better.

This got me to thinking about DURs in general, and the testing of DURs. I once took on that challenge and will make the dubious claim that I failed… At least I failed in my mind. 

So what is DUR? DUR is the function that checks medications at the time of ordering for the following indications: Drug Drug Interactions (DDI), Drug Pregnancy/Lactation contraindications, Duplicate Therapy (DT), Prior Adverse Reactions (PAR), Drug Condition Contraindication (DCC).

The DDI was the concern that the Japanese clinician had — and because she didn’t have an easy mechanism to check for it (an automated DUR system) she just said, “No, I am concerned about Drug Drug Interaction.” Okay, this problem would be solved by an EMR/EHR system — because at the time of ordering the order would come back and tell you about any DDI — or would it?

The Japanese health care system, in my experience, is great — modern, efficient, ethical, efficacious, and reasonably priced. They do have some gaps in my opinion, especially around ambulatory EMR/EHRs but okay, most of the rest of the world has the same gaps.

But, what about the US based EMR/EHR system? What makes me believe that our DUR is efficacious, and how do I know if it is functioning correctly? We have had EMR systems that have faked the certification in the past, this is full stop fraud. We have systems that aren’t deep in their DUR, sometimes not recognizing subtle conflicts. We have systems that over alert, to the point that the alerts are ignored completely. So how do we test for these conditions in our DUR, not only initially, but in regression as part of the Operational Qualification (OQ) of a new installation or release?

The Model

So the model has to be patient-based – manufactured patients, at that. Patients that have multiple iterations and exist explicitly to test DUR. The naming convention should be such that they are never confused with real patients, and they should be isolated within both Production and non-Production systems such that you need to use a specific test clinician, in a specific clinic, using one of these specific patients to run the test.

Why would you have test patients in PROD? Because that is the system in which this function is going to be used, regularly, and it is mission critical functionality. Testing needs to start in test systems, including UAT, but there will be times when someone needs to check the PROD system to see if things are functioning. However, you also don't want these patients to become part of the normal everyday workflow or metrics.

So the naming convention of the patients should be such that it is obvious what you are doing. “DUR-PAR-xxx, MedAllergy”, “DUR-PAR-xxx, MedAllClass”, etc. The ‘xxx’ is your cardinality for these patients, ‘aaa’ is the first one, ‘aab’ the second, etc. These patients are automated, so that once the patient test has been ‘used’ and is dirty, the next patient can be pushed into the test battery.
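Generating that cardinality ('aaa', 'aab', …) is easy to automate. A hypothetical sketch of the name generator, assuming the "DUR-<category>-<xxx>, <scenario>" format described above:

```python
# Hypothetical generator for DUR test patient names with rolling cardinality:
# DUR-PAR-aaa, DUR-PAR-aab, ... Category and scenario strings are examples only.
import itertools
import string

def patient_names(category: str, scenario: str):
    """Yield 'DUR-<category>-<xxx>, <scenario>' names in cardinality order."""
    for combo in itertools.product(string.ascii_lowercase, repeat=3):
        yield f"DUR-{category}-{''.join(combo)}, {scenario}"

gen = patient_names("PAR", "MedAllergy")
print(next(gen))  # DUR-PAR-aaa, MedAllergy
print(next(gen))  # DUR-PAR-aab, MedAllergy
```

Because the generator yields names in a fixed order, the automation can simply advance to the next clean patient whenever the current one has been 'used' and is dirty.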

What do you need in these patients from a domain perspective, what needs to be defined? 

Demographics — name, sex, age, some initial vitals. Conditions, active and resolved conditions that are associated with your DUR test battery. Pregnancy, hemophilia, kidney disease, etc. Existing medications, in all statuses. Existing Prior Adverse Reactions, again in all statuses and classes. So, you can see that while this is an easy list, it can quickly get complicated when it comes to which of these domains you will use for your test.
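One possible way to structure such a manufactured patient – the field names are my own illustration, not any particular EHR's schema – is a simple record covering the domains just listed:

```python
# Illustrative record for a manufactured DUR test patient. Fields mirror the
# domains discussed above; the layout is an assumption, not a vendor schema.
from dataclasses import dataclass, field

@dataclass
class TestPatient:
    name: str                 # e.g. "DUR-PAR-aaa, MedAllergy"
    sex: str
    age: int
    vitals: dict = field(default_factory=dict)       # initial vitals
    conditions: list = field(default_factory=list)   # (condition, status) pairs
    medications: list = field(default_factory=list)  # (drug, status) pairs
    prior_adverse_reactions: list = field(default_factory=list)  # (agent, status)

p = TestPatient(
    name="DUR-DDI-aaa, Paxlovid", sex="F", age=78,
    conditions=[("chronic kidney disease", "active")],
    medications=[("simvastatin", "active")],
)
print(p.medications[0])  # ('simvastatin', 'active')
```

Even a thin record like this makes the complexity visible: every status value in every domain multiplies the combinations your test battery may need to cover.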

The Test Data

This is where it gets difficult and you will need clinical input, informatics input, and testing input. You will need to understand how the DUR system works in your solution. Which informatics platforms are referenced, what rules are utilized, etc. If the DUR system doesn't consider medications that have been discontinued, cancelled, or entered in error — then those are test cases you will need to validate — but once validated you can generally ignore them. I would suggest a workflow diagram with conditions to be able to walk through and plan your testing in this regard — it will also come in very handy when you want to make changes to your DUR system.

You will need a spreadsheet or database for these cases, with data definition fields as well as description, expectation, expected results, test cases, etc. A test management system is probably best for this but that seems to have gone out of favor, and most of the commercial test management systems are insufficient to the task. I welcome any challenge to that statement, but if you are the vendor you will need to provide access to the solution for me to validate your assertion.

In crafting your tests you will want to start easy with a smoke test. Followed by something more complicated, subtle, and not easily predicted. Followed by the intense, dozens of conditions, adverse reactions, active medications, etc. If you get the first level done you have essentially only created a smoke test for your IQ/OQ/PQ. 

Some Thoughts on the Tests

Automation is not trivial for many systems — and where it is not functional it makes this effort all the more difficult. If you are selecting a system, consider the automatability of the solution — if they "say" they can do it, make them prove it — they may have been able to automate the deployment, but we are talking about the ability to add patients, with specific clinical data. That isn't something I have seen in the industry, at least not done successfully.

Start easy, work up to complicated, and if you find a critical or high defect in your test, push that test to regression to inoculate future releases against the defect. Define minimal testing requirements, and never ignore the greatness that is exploratory testing.

If you find something, I strongly suggest you do two things simultaneously. First, dig deep on the thing you found. What specifically have you found, is it reproducible, is it an artifact of the informatics or the test itself? Figure out what this 'find' is. Secondly, conduct a targeted exploratory effort. Take the attributes of the test, and start attribute testing to see if you can find a pattern or other 'finds'. I am assuming that you have a team; this should be a team effort, and even if it is just a couple of members of the team doing the work, the rest should know about it. They may well have seen it in other testing and not connected the dots to a test failure.

Some thoughts on the profitability of these tests

If your vendor can’t show you the testing they do then this testing is profitable.

If your vendor cannot show you the automation, and configure it for your system then this testing is profitable.

If your configuration team has made changes or configured the medication, allergy, problems/conditions systems then this testing is essential and extremely profitable.

If you are bringing in data from other sources then this testing is absolutely essential and will drive a lot of changes in trust of those exchange systems.

The test and defect taxonomy

I will be taking another whack at building out both the defect and test taxonomy for this effort, probably working the lighter level of testing first and then the heavier testing. The test nomenclature is pretty straightforward but still needs to be defined. The defect taxonomy is also pretty straightforward but needs to be articulated and pre-severitized.

Conclusions

The DUR system is essential to patient safety and is required of all certified EHRs. It is complex and can be easily thrown out of kilter by seemingly minor changes to a system. There is not a lot of automation out there to run regression tests and so this is all expensive manual testing that will get dropped on the test floor as one of the first efforts to be cut. There are plenty of industry stories about solutions failing for multiple reasons – from nefarious to accidental to tales of interoperability failure.

Collaborate with your vendor, but do your own testing and make sure you have a solid test battery for your (IQ)/OQ/PQ so you can always have reasonable assurance that your solution is ready for this patient safety tool.

As always, watch out for the brown M&Ms.

Brody
July 17, 2022

The Swiss Cheese model is a patient safety standard; it has been around for decades and is also used in multiple industries. In healthcare it is used to mitigate patient safety events. The theory is that you can look at the pathway to a bad outcome as having multiple layers, with each layer having a 'hole' in it; the more layers of cheese, the more likely the holes will be covered. Just as there is rarely a single cause of an adverse event, there are multiple opportunities to prevent the event.

 

You can think of the Root Cause Analysis tool: the fishbone or Ishikawa diagram. Many attributes lead to the undesired outcome. Stop any of those attributes from being actualized and the adverse event becomes a near miss, or doesn't happen at all. The Swiss cheese model is the model frequently cited to prevent those events.

The standard Ishikawa with 5 core causality areas

In order for there to be a layer in the process you must understand that process. The humans, machines, supplies, processes, configurations, etc. And indeed both the desired and undesired outcome. If you do truly understand those elements of the workflow then you also understand the holes in those process workflows.

For patient safety, my Ishikawa has two core facets, or sides of the fish: the system/solution and the user/using entity. My role in patient safety was as a vendor, so this is what makes sense to me. Is the system safe as built, and is the system safe as used? My walkthrough on the system side progresses from feature, requirements, architecture, design, build, configuration, deployment, and interconnection. My walkthrough of the use involves the training, people, use, purpose, administration, and interconnection. Yes, I put interconnection on both sides of the fish, as there is what the vendor knows about, and what the using entity knows about.

This diagram is shortened for clarity

Each of those elements, ‘bones’ has multiple sub-elements and if you ask questions of those elements you will eliminate or confirm that element having participated in the adverse event. From that you can understand your holes in the cheese and potentially identify how to create the next layer of cheese that compensates for that hole.

When is Swiss cheese, just air?
So what happens when the holes in the Swiss cheese are so large that they effectively become more hole than cheese? Or, what happens when the layer that is supposed to be there isn’t there at all? Or, what happens when the layer that should prevent the event is bypassed entirely?

 

That is the purpose of this paper. When we do Root Cause Analysis we are looking at multiple aspects, and the point is not to say this model is incorrect or shouldn't be used, but rather to advise that care should be taken to understand what the state of each protective layer in the process truly is.

 

The first question, what if the hole in the cheese is greater than the footprint of the cheese itself? So when we look at these individual layers and we look at the holes in these layers, are we considering the magnitude of these holes? If we are looking at the personnel involved in an action, are we looking at the training, culture, and institutional standards that are in place? Is there an expectation of excellence, with the idea that errors will happen but they will be used as a training aid, or is the administration on a witch hunt to demonize or hide errors? Is the staffing so low that the load makes it almost impossible to perform the expected level of care? From the vendor side, is this feature robust and well understood, or is it novel and part of someone’s sales pitch? Is the installation fully qualified so that you know the solution is present and was it correctly configured to the intent of purpose? Is the solution kept up to date or is it allowed to slowly degrade over time? Is it even on, or how do you know that the safety feature has not died a silent death?

All questions that should be asked of the different aspects of the solution that can contribute to the event itself. They can show the nature of the gaps that are present in the system — and some of these gaps should be risk alerts that when they occur a mitigation plan is enacted.

Second question: what if the layer of protection doesn't exist? Your DUR protection fails silently and as a result the system is throwing far fewer alerts, or none at all. Many users certainly wouldn't mind a few fewer alerts, and might even consider that a feature, but the alerts are there to keep us from doing something without due thought. Or, you were told you had a DUR but it turns out the vendor had hard coded the results to pass certification. Or your solution loses referential integrity without turning the entire screen flashing red and requiring a reload. Referential integrity failures are when patient A is in context but data from patient B is displayed to the clinician. If a system isn't set up to detect these conditions, then how does someone know that it has happened? And, it does happen.

Third question, what happens when the layer is intentionally bypassed? Bypassed either through disregard or intentional configuration. There are any number of reasons for a segment of the safety system to be bypassed: performance, usability, upgrade derailments, lack of training or lack of understanding of the relationship with the system and safety. When the layer is bypassed, we should consider this being the same as it not being there at all.

 

Real world examples
There are several that come to mind. We can talk about vendors who deliberately misled their testing authority by hard coding for the tests. We can talk about institutions that knew there was a problem with the dispensing system and advised nursing staff to bypass the warnings, along with a very large number of other factors, that led to a patient’s death, and subsequent conviction of the nurse involved — that should be a case study in multiple causality failures in itself. But, I think I would rather look at the VA’s rollout of the Cerner replacement for Vista. All of the facts are not in, and we probably won’t know everything involved in this latest report, but the OIG Preliminary Report indicates that their ordering system lost approximately 11,000 orders. The loss was silent and the facility managers were apparently unaware. These orders included imagery, followups, referrals, etc.

 

What to do
The Model is a good model, and it generally works, but we do need to be aware that sometimes the thing we are relying on to be part of the layered safety net may not be there, or may be degraded to the point that it might as well not be there at all. The basic solution is to validate your assumptions on a regular basis. Heartbeat monitors that validate that the sub-component is still alive and functioning. Fake patients that exercise specific parts of the system like DUR, Population Health, or prescription exchange. Many systems have the concept of a test patient built into their system, and some even exclude them from reports.

Education is also important, the more people understand the system the better they will be able to recognize when something isn’t behaving the way it should, or shouldn’t. Clinicians didn’t go through all that it takes to get their license to be programmers, but basic system education will go a long way to protecting the system from unrecognized errors.

Monitoring is also important. My go-to classic documentation error is a medication allergy posted to the Problems and History list, but not to the Allergy/Intolerance list. A problem or history addition of "History of Allergy to Penicillin" is great in that section but must also be followed up with an addition to the Allergy/Intolerance domain. Why? Because an ICD-10 code does not process in DUR/PAR during the medication prescription process, and alerts will not be displayed.
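That specific gap is checkable by machine. A hedged sketch of a monitoring query that flags patients whose Problems/History list carries a drug-allergy history code (ICD-10 Z88.x, "allergy status to drugs") with no corresponding Allergy/Intolerance entry — the record layout here is invented for illustration:

```python
# Illustrative check for the documentation error described above: an allergy
# recorded only as history (ICD-10 Z88.x) with an empty Allergy/Intolerance
# list. Z88 is a real ICD-10 category; the record layout is a made-up sketch.

def allergy_gaps(patients: list[dict]) -> list[str]:
    """Return IDs of patients with history-only allergies and no allergy entry."""
    flagged = []
    for p in patients:
        history_allergies = [c for c in p["problem_list"]
                             if c["icd10"].startswith("Z88")]
        if history_allergies and not p["allergy_list"]:
            flagged.append(p["id"])
    return flagged

patients = [
    {"id": "A",
     "problem_list": [{"icd10": "Z88.0", "desc": "Hx of allergy to penicillin"}],
     "allergy_list": []},  # gap: history only, DUR/PAR will never fire
    {"id": "B",
     "problem_list": [{"icd10": "Z88.0", "desc": "Hx of allergy to penicillin"}],
     "allergy_list": [{"substance": "penicillin"}]},
]
print(allergy_gaps(patients))  # ['A']
```

Run periodically against extracts, a check like this turns a silent DUR blind spot into a worklist someone can actually fix.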

 

The layers are there; we just need to be wary of assuming that they are functioning as we expect, and we need to trust but verify. Document the layers of your "cheese" and find ways to make sure they are present and functioning. The documentation should also include the holes in those layers. The cost of failure in these cases is too high to ignore.

 

Thanks and I hope this helped advance your understanding of SoS and Patient Safety

© 2022 Adapttest Consulting