Health Technology Special Feature: Not Always Created Equal
Data and AI scientists must strive to eliminate bias from anything that touches patient care.
By Abidur Rahman, VP, innovation at Intouch Group
The American Heart Association Get with the Guidelines–Heart Failure Risk Score predicts the risk of death in patients admitted to the hospital. It assigns three additional points to any patient identified as “nonblack,” thereby categorizing all black patients as being at lower risk. The AHA does not provide a rationale for this adjustment.
The Kidney Donor Risk Index, implemented by the national Kidney Allocation System in 2014, uses donor characteristics, including race, to predict the risk that a kidney graft will fail. The race adjustment is based on an empirical finding that black donors’ kidneys perform worse than nonblack donors’ kidneys, regardless of the recipient’s race. The developers of the KDRI do not provide possible explanations for this difference. If the potential donor is identified as black, the KDRI returns a higher risk of graft failure, marking the candidate as a less suitable donor.
The Vaginal Birth after Cesarean (VBAC) algorithm predicts the risk posed by a trial of labor for someone who has previously undergone cesarean section. It predicts a lower likelihood of success for anyone identified as African American or Hispanic. The study used to produce the algorithm found that other variables, such as marital status and insurance type, also correlated with VBAC success. Those variables, however, were not incorporated into the algorithm.
The STONE score predicts the likelihood of kidney stones in patients who present to the emergency department with flank pain. The “origin/race” factor adds three points (of a possible 13) for a patient identified as “nonblack.” The developers of the algorithm did not suggest why black patients would be less likely to have a kidney stone. An effort to externally validate the STONE score determined that the origin/race variable was not actually predictive of the risk of kidney stones.
Each of these examples appeared in “Hidden in plain sight – Reconsidering the use of race correction in clinical algorithms,” an article published in the August 27, 2020 issue of the New England Journal of Medicine.
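To make the mechanics concrete, a race-corrected score like those above works by adding a fixed number of points for a race category, which shifts otherwise identical patients into different risk strata. The sketch below is purely illustrative: only the three-point “nonblack” adjustment (of a possible 13) comes from the STONE example; the function name and the other point totals are hypothetical placeholders, not the published coefficients.

```python
# Illustrative only: a fixed race adjustment added to a clinical score.
# The +3 "nonblack" term mirrors the STONE example discussed above;
# everything else here is a hypothetical placeholder.

def illustrative_score(nonblack: bool, other_points: int) -> int:
    """Sum hypothetical clinical points, plus the race adjustment."""
    race_points = 3 if nonblack else 0  # the adjustment in question
    return other_points + race_points

# Two patients with identical clinical findings land in different
# risk strata solely because of the race term.
same_findings = 6
print(illustrative_score(nonblack=True, other_points=same_findings))   # 9
print(illustrative_score(nonblack=False, other_points=same_findings))  # 6
```

The point of the sketch is how small and mechanical the adjustment is: a hard-coded constant, carried forward wherever the score is used, whether or not the underlying evidence supports it.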
Large-scale data analysis and artificial intelligence have transformed every square inch of health care, very much for the better. But examples like those mentioned are the “through a glass, darkly” of our data analytics and AI revolution. Just as seeing a celebrity put her imprimatur on a brand sometimes grants that brand more respect than it deserves, it’s easy to get carried away by the hype surrounding an extraordinary new technology. We need to be wary, though, lest we be blindsided by the challenges that must be addressed to derive the full benefit of that technology – in this case, the challenges of bias. Humans, of course, have biases, and sometimes data does too.
For example, it’s much easier to collect data from large, well-funded health systems with robust record keeping that mostly serve well-off patients with easy access to care than it is to collect data from small, under-resourced hospitals that mostly serve poorer, underserved populations. Fed with data that’s mostly sourced from the well-funded and the well-off, an AI algorithm is bound to draw conclusions that simply won’t apply to the underserved, harder-to-reach part of the population. By doing the very job it is designed to do – uncovering relevant trends in large data sets – even a well-designed AI will only magnify the biases present in its source data. Based in part on a lack of robust data from African-Americans, some AI systems have concluded that African-American populations are often healthier than Caucasian populations – a conclusion that is demonstrably false but has still had a significant impact on real care on the ground. The price of such biased data and conclusions can be substandard care for the very populations who can least afford it.
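A toy calculation, with invented numbers rather than real health statistics, shows how the skew works: when one group supplies most of the records, the “population” estimate an algorithm learns tracks that group and badly misstates the under-sampled one.

```python
# Hypothetical illustration of sampling bias. The rates and counts are
# invented; the mechanism is the point, not the numbers.

def pooled_rate(samples):
    """Fraction of positive outcomes across all collected records."""
    return sum(samples) / len(samples)

# True underlying rates of some adverse outcome (hypothetical):
#   well-resourced group: 10%    under-resourced group: 30%
well_resourced = [1] * 10 + [0] * 90   # 100 records collected, 10% rate
under_resourced = [1] * 3 + [0] * 7    # only 10 records collected, 30% rate

estimate = pooled_rate(well_resourced + under_resourced)
print(round(estimate, 3))  # 0.118 -- close to 10%, far from 30%
```

An algorithm fed this pooled data would conclude the outcome is rare, a conclusion that is roughly right for the well-sampled group and dangerously wrong for the under-sampled one.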
Race and ethnicity and social class aren’t the only sources of potential bias, either. Depending on the circumstances, the people collecting data or building an AI algorithm may have a bias toward a specific action, or against one. They might be consciously or subconsciously favoring a diagnosis of X because X means more dollars for their research project or brand or health system. They might be overlooking certain data because it has the opposite effect: it doesn’t support the goal that a particular company is promoting. And once that sort of bias leaks into the analysis, the entire exercise becomes self-fulfilling. The price to be paid for that sort of bias can be millions of dollars in research funding wasted, or – worse – patients with Y being treated for X, or not being treated at all.
The COVID pandemic has offered an education in this. AI and large-scale data analysis tools have been a godsend for public health authorities trying to track incidence and forecast for resource allocation, not to mention the scientists developing treatments, especially in the United States. But it’s been very easy to see that the data coming out of different countries, or even different parts of the same country, have reflected varying levels of bias. Politicians want to encourage a sense of normalcy or promote tourism, and so researchers or analysts, dependent on those politicians for funding, may consciously or unconsciously alter the parameters of what counts as a COVID-related death. People, a great many people in a great many places, have suffered and died due to that kind of bias.
All this is why AI equity must be a matter of priority for anyone and everyone in our industry who touches AI or data analysis.
AI equity refers to how effective AI is in a broad range of scenarios with a variety of population groups and demographics. Achieving it is a challenge that all AI engineers and data scientists in the health care space must face, and the sooner the better.
It’s easier said than done, of course. AI systems have to be trained by humans who are sometimes biased. And then even if the training is as unbiased as it can possibly be, AI systems have to be fed data, which is also sometimes biased.
So what can we do about it?
Every one of those standards and scores above was created by a team of experts in its respective field. But those experts may not have been experts in data science, and they almost certainly did not reflect the ethnic, demographic, or sociological complexity of the patient pools for whom the standards were being set. And in the company setting, it’s quite common for AI or data analytics teams to include experts who know all about the technology, but not so much about the demographics and needs of the patient population in question. So we need to be sure that the standard-setting body or team is diverse enough to represent the full population over which the standard will carry, and that it has the data and AI expertise to know what data and AI bias look like.
Whenever we are building AI models that will touch patient care, we need to be sure that every bit of data we put into them – whether during the training of the model or its use – is of good quality and truly representative of the population in question. Plenty of natural roadblocks stand in the way of achieving this, the well-funded versus under-resourced health system issue being only one of the more prominent. But as a matter of basic fairness and justice, we must overcome those roadblocks.
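One simple starting point for checking representativeness is to compare each group’s share of the collected data against its share of the target patient population and flag large gaps. The sketch below is a minimal version of that check; the group labels, shares, and tolerance are all hypothetical.

```python
# Minimal representativeness check with hypothetical shares: flag any
# group whose share of the data drifts far from its population share.

def representation_gaps(data_shares, population_shares, tolerance=0.05):
    """Return {group: data share minus population share} for every group
    whose absolute gap exceeds `tolerance`."""
    return {
        group: round(data_shares.get(group, 0.0) - share, 2)
        for group, share in population_shares.items()
        if abs(data_shares.get(group, 0.0) - share) > tolerance
    }

population = {"group_a": 0.60, "group_b": 0.30, "group_c": 0.10}
training   = {"group_a": 0.80, "group_b": 0.12, "group_c": 0.08}
print(representation_gaps(training, population))
# {'group_a': 0.2, 'group_b': -0.18}
```

A flagged gap doesn’t fix anything by itself, but it turns “is this data representative?” from a vague worry into a concrete number someone on the team has to answer for.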
The danger of bias should be on the minds of AI, data, and analytics teams at all times, from the creation and training of any AI tools through testing and validation and the data collection process and review of any conclusions that might be reached. That’s what good scientists do, after all; they look at everything with a healthy dose of skepticism. Always be asking yourself, “Is the data really complete? Are there missing pieces in the data? Will this really represent and serve the full patient population?” The search for bias should begin with the beginning of every project, and end with … well, never.
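One concrete form that skepticism can take during testing and validation is auditing model performance per subgroup rather than only in aggregate, since an overall accuracy number can hide a group the model serves poorly. The sketch below uses hypothetical group labels and predictions.

```python
# A sketch of a per-subgroup audit: break model accuracy out by
# demographic group instead of reporting one aggregate number.
# All labels and predictions here are hypothetical.

from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, prediction, actual) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, pred, actual in records:
        totals[group] += 1
        hits[group] += int(pred == actual)
    return {g: hits[g] / totals[g] for g in totals}

records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1),
    ("group_a", 0, 0), ("group_b", 1, 0), ("group_b", 0, 1),
]
print(accuracy_by_group(records))  # {'group_a': 1.0, 'group_b': 0.0}
```

Here the aggregate accuracy is a respectable-looking 67 percent, yet the model is wrong every single time for the smaller group, exactly the kind of failure an aggregate-only validation would never surface.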
Transparency is needed at every step of the process: in how the AI is being trained, how and why data is being collected, and how conclusions might be implemented. Internally, this means everyone on the team has the opportunity to call out any bias they might see along the way. Externally, it means building trust with any patient community that might be touched. Some groups may be less likely than others to share personal health information with data collectors, a variance that can easily introduce bias into a data set. So we must be very clear about what we are collecting, why we are collecting it, and what the privacy policies are. If we are to ask our patient population to trust us with their most personal information, we need to trust them back.
None of this is easy, and all of it requires some rowing against the current. Someone is always going to push to finish the project more quickly and cheaply, and hunting aggressively for bias and insisting on AI equity will almost inevitably make projects more time- and resource-intensive than companies expect. But in the end it is good business as well as good ethics. Better-trained AI tools and better data mean more patients getting the right treatment and fewer patients getting the wrong one. It means a patient population that has seen in their own lives how serious you are about seeking out what’s unique about them. It means fulfilling the immense potential of AI and data science to improve human health.