Augmented Intelligence (AI) is one of the most popular buzzwords in health care. This discussion panel will evaluate best practices for the inclusion of AI in health care, calling on expert stakeholders across the health care industry to share their thoughts.
Are there data sets on 5-year survival rates following severe TBI?
There are now pressing questions related to the regulation of software as a medical device. The FDA is considering an optional alternative pathway for software developers that streamlines oversight if they adhere to certain excellence principles. While this pathway could be used for certain AI methods and systems that do not continuously learn, it raises important questions about differential risk for end-users (patients and physicians/clinicians). What are the risks that current end-users, physicians, face when relying on AI systems and methods? How should liability change as the roles of the "machine" and the "human" change? This is particularly important in the context of continuous learning systems that influence, drive, or make clinical decisions.
This JAMA article was published today and touches on some of the risks of AI in practice. "The potential applications of AI in health care present a range of computational difficulty ... Clinicians, AI researchers, and developers of AI applications and devices should work together to accelerate progress and to limit adverse consequences of applying AI in health care." jamanetwork.com/journals/jama/...
One scenario that I have heard is treating initial algorithmic output like a lab value. It would be up to us, the clinicians, to determine whether a result is reasonable. Just as there are variables that affect the accuracy of lab values, we will have to approach AI/ML outputs with prudence. Residency teaches us to look at the whole picture and take care of the patient. AI/ML can be a tool in the care pathway.
Hopefully algorithms will outperform me in certain areas, and as I gain confidence in these tools I will rely on them more. This may be another area to involve the patient in shared decision making and risk.
AI will continue to have problems solving challenges where the inputs and outputs are not well defined. Imagine asking an AI to figure out the recipe for carrot cake when the only input is the number of eggs. One million test recipes later, it will be nowhere near the solution. Current AI solutions will continue to be very limited in scope, especially in the medical field. I think the medical community should ask those providing an AI solution to demonstrate very clearly where it "works" and where it does not. What are the necessary inputs and outputs? How robust is the solution to noise in the inputs, outputs, and data? I would like to know how an AI solution works, but also to see examples of how it fails.
However, DeepMind and others are working on the next generation of AI, which will be able to transfer knowledge from one area to another and, potentially, fill in gaps in inputs/outputs. At that point AIs will have "context-based" knowledge and will be able to apply it where it is needed.
Jobs are safe until then...
This is what Dr. Stead advocates in his editorial in this week's JAMA. We will have to treat output from AI algorithms like lab tests, or really any other source of data, in evaluating and making treatment decisions for patients. We don't think much about having faith that the tube of blood we draw from a diabetic patient will give us a glucose value that informs our treatment, even though we are not measuring the glucose ourselves. But the treatment of diabetes in the current state of the art is well worked out. By the same token, we need prospective evaluation of AI tools so we can determine their optimal role in practice.
Thanks for sharing this article. The last paragraph particularly resonated with me: "Artificial intelligence and deep learning are entering the mainstream of clinical medicine. This technology can augment human intelligence to improve decision making and operational processes. Physicians need to actively engage to adapt their practice and to shape the technology." At Stanford Primary Care, we are in the early stages of developing an AI interest group to engage in research opportunities through industry partnerships, and perhaps we will add a journal club in the future to stay abreast and promote collaboration. We don't want to repeat what happened with the EHR, where the EHR happened to us. We would like to stay ahead and engaged when AI is integrated into primary care practice.
AI will be particularly useful in oncology, where the changing genomics of the cancer determine the responsiveness to therapy, but only if the algorithm continually updates. No human can be up to date on all the changes occurring in the science, so a point-of-care, easily usable decision support system will be a very valuable tool. However, we must be able to rely on the data science that gives us the analysis and recommendations, and rely on honest brokers without conflicts of interest in the recommendations for therapy, as much as we rely on the accuracy of the serum glucose. I am not comfortable with the idea that if the FDA finds a company to be reliable we should assume every product it makes is reliable. Another question to be answered is how we pay people to work on these algorithms. Do we pay for use at the point of care? Do we have an increased annual fee in the EMR license? Do AI systems compete for our dollars? Or should it somehow be open source? In addition, when we elect not to follow the recommendations of the AI system, do we open ourselves to liability?
AI-powered solutions have been viewed as a potential driver of cost savings in the health care system by reducing physician time (e.g., in pathology, radiology, and dermatology). Given that most health care systems are still in the beginning stages of taking on risk-based contracts, the launch of AI programs must fit within fee-for-service structures. How does such a health care system succeed in introducing AI solutions in the current reimbursement environment?
Great question. As a follow-up, I would like to know how the value provided to patients can be translated into dollar savings for health care systems.
This is an excellent question that requires current innovators and first adopters to weigh in, along with those who are experienced in the current pathways to payment. It seems that, like telehealth and other digital medicine modalities, we need to start with a few key questions, as all AI methods/systems and applications are not the same. This is not a definitive set, but it seems you need to answer the following to analyze payment pathways:
(1) What AI methods and systems are being deployed? For example, do the methods and systems include deep learning, machine vision, natural language processing, etc.? This is where an agreed-upon taxonomy is important (and it should be consistent with the taxonomy used by regulators).
(2) What is the health care application? For example, are you deploying it as part of business operations, quality reporting, population health, risk stratification, clinical decision support, or as a therapeutic? There is an array of applications, and having categories of applications that are broadly accepted and that we are able to evolve will be important to understanding the appropriate payment pathway.
(3) What is the site of service? This is a less obvious question, but it may impact cost calculation and capture, as well as questions related to value and scalability. What do others think?
(4) What payment model will be used? You have raised the question of payment in the context of fee-for-service, but for the sake of ensuring everyone is aware, there are a number of payment models in addition to fee-for-service, including capitated payment models, shared risk models, and combinations thereof.
Based on the foregoing, some AI methods and applications do have a pathway for payment under fee-for-service--typically as a medical device or as part of direct or indirect practice expense. But you raise a question that is particularly challenging.
I recently posted an entry to my blog on AI in Medicine, providing some historical perspective, current accomplishments, and future challenges for using AI in clinical practice.
informaticsprofessor.blogspot....
Thank you for sharing, Dr. Hersh. Clinicians may find your perspective on integrating AI into practice useful--especially your detail on its history and successes.
Dr. Hersh, I took a look at the blog, and you offer a treasure trove of information from the perspective of a physician. Your outline of challenges/opportunities is very important.
AMA policy strongly supports health care AI transparency. In the context of machine learning/deep learning systems that influence, drive, or render clinical conclusions, what type of transparency is needed for physicians and other end-users? Algorithmic? Teaching data source and quality? Validation data source and quality? How about validation results related to clinical and analytical validity?
In general, all of the above are relevant, depending on use case. Validation and knowledge of applicable population characteristics are very important, in my opinion.
AMCs have an important role in expanding the evidence base for AI in health care. Research for internal validation of models and/or testing the tools in real-world settings when implemented will be crucial. I am not sure how to tackle the lack of transparency associated with the "black box" of machine learning. However, related to the post on formal medical training in AI, AMCs have an obligation to engage data scientists as part of the health care team and produce the next generation of well-informed "AI doctors".
From a user experience perspective, it's incredibly important for designers to understand the mental models and mindset of the user at hand. When we understand how users think, we can anticipate the information they reach for and support their line of inquiry. If we get that part right, there wouldn't necessarily be a big black box.
Great question, Sylvia. The terms "interpretability", "explainability", and "transparency" are often used in the technical literature and are driving development of technical solutions. For example, there is a line of machine learning research to develop models that generate integer medical scores: link.springer.com/content/pdf/.... Instead of getting a bunch of medical experts in a room to agree on the number of integer points to assign to various factors, these algorithms optimize integer risk scores for predicting some outcome. But is that interpretable? Does the user know why one thing is scored 3x another? Can the user separate causation from correlation?
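To make that concrete, here is a minimal sketch of how such an optimized integer scoring system is applied at prediction time. The risk factors, point values, and logistic parameters below are illustrative assumptions, not values from the linked paper:

```python
import math

# Illustrative integer points per risk factor, as an optimized scoring
# system might output (hypothetical values, not from the cited work).
POINTS = {
    "age_over_75": 2,
    "prior_stroke": 3,
    "hypertension": 1,
    "diabetes": 1,
}

def risk_score(patient):
    """Sum the integer points for each factor present in the patient record."""
    return sum(pts for factor, pts in POINTS.items() if patient.get(factor))

def risk_probability(score, intercept=-3.0, slope=0.8):
    """Map the integer score to a predicted probability via a logistic link.
    Methods in this line of research fit the intercept/slope jointly with
    the integer points; the values here are placeholders."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))

patient = {"age_over_75": True, "hypertension": True}  # 3 points total
print(risk_score(patient), round(risk_probability(risk_score(patient)), 3))
```

The transparency question remains even here: the user can see that prior stroke counts 3x hypertension, but nothing in the score itself says why, or whether that weighting is causal or merely correlated.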
I think Gabriel makes a great point, and we've used a similar approach in our projects. Start with the clinical problem and try to provide the user with information to address it. Observe user behavior and see how the presented information impacts decision making. Iterate on what information is presented, how it is presented, and what context is provided in addition to any single score.
Some thoughts on the different buckets:
- algorithmic: we need standard reporting metrics for model performance, including workflow-related metrics that account for prevalence (beyond AUC; see the sketch after this list). If there are benchmark datasets for a task, model performance should be reported on the public dataset. Otherwise it's tough to make comparisons.
- teaching/validation data source and quality: provide as much transparency as is feasible without disclosing the underlying data.
- validation results: generate hypotheses, explicitly state outcome measures, register studies on clinicaltrials.gov, and publish results in peer-reviewed literature.
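On the first point, a minimal sketch of why prevalence matters beyond AUC: the same sensitivity and specificity (the 90%/90% figures below are arbitrary, for illustration) yield very different positive predictive values as prevalence shifts across deployment settings.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A model with fixed 90% sensitivity and 90% specificity:
for prev in (0.20, 0.05, 0.01):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.90, 0.90, prev):.1%}")
# prevalence 20%: PPV = 69.2%
# prevalence 5%: PPV = 32.1%
# prevalence 1%: PPV = 8.3%
```

An alert-style workflow built on the 1% setting would generate roughly 11 false alarms for every true positive, which AUC alone would never reveal.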
One of my favorite articles laying out guidelines for reporting of models is prognosis 3: journals.plos.org/plosmedicine.... Whether it's AI, ML, or biostatistics, I think the standards should be similar.
Agree that the focus on end-users (who may include physicians, other clinicians, regulators, medical commons researchers, and patients/consumers) is incredibly important. Transparency may very well differ for each and also changes through the product development cycle. Ultimately, we need to address the cognitive burden that all end-users experience as a result of data saturation.
Transparency implies that pertinent information about an AI system is visibly accessible and understandable, especially information about how the system operates. Explainability implies the capability to communicate the reasons for and logical development of the results. Hence, explainability is a particularly important part of transparency. Transparency and explainability are appropriately regarded as important in The National Artificial Intelligence Research and Development Strategic Plan from NIST and NITRD. Clinicians need transparency and explainability to help ensure justification of decisions. Explainability is key to understanding the rationale behind results presented to the user. Other aspects of transparency are also important: provenance, for ensuring that the basis of the approach is sound; auditability, so that the decision-making process can be revisited later if needed; the level of confidence in the results, so that any motivation to take action is reasonable and clear; and robustness of the conclusions, so that the results can be trusted.
In principle, information generated using AI should be analogous to an expert consult with a human, where the provider can ask questions of the expert. The fact that AI was used does not lessen the need for supplementary information. For example, what is the overall reasoning behind the recommendation, including the major influencing factors? When was the knowledge base for this recommendation last reviewed, to ensure that it reflects current knowledge? What quality of knowledge sources and training data sets was it based on, and how applicable are they to the case of interest? How robust is the model to uncertainty in the input variables or variations in the training data? What is the level of confidence in this recommendation? Is it possible to probe what-if scenarios, and would they make a difference in the recommendation?
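One way to operationalize those questions: imagine each recommendation arriving with a structured payload that answers them. A hedged sketch follows; the field names and values are illustrative assumptions, not an existing standard.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class AIRecommendation:
    """Hypothetical 'AI consult' payload mirroring the questions above."""
    recommendation: str
    top_factors: list              # major influencing factors behind the output
    knowledge_base_reviewed: date  # last review of the underlying knowledge
    training_data_sources: list    # provenance, for judging applicability
    confidence: float              # calibrated level of confidence
    robustness_note: str           # sensitivity to input uncertainty

rec = AIRecommendation(
    recommendation="consider nephrology referral",
    top_factors=["eGFR trend", "albuminuria"],
    knowledge_base_reviewed=date(2018, 6, 1),
    training_data_sources=["single-center EHR cohort, 2010-2016"],
    confidence=0.82,
    robustness_note="recommendation unchanged under +/-10% input perturbation",
)
```

The point is not this particular schema but the contract: a recommendation without these answers attached is a consult you cannot question.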
These points are very instructive, and the cited work of NIST and NITRD is important. Do you have thoughts on DARPA's work vis-à-vis Explainable AI? (darpa.mil/program/explainable-...)
There are a range of methods and systems that could be considered AI. However, focusing on machine learning/continuous learning systems specifically: does the "quality" of the teaching dataset matter? If so, how? And could the answer vary depending on the application of health care AI--to research (for drugs or clinical diagnostics, for example), business operations, quality assurance or compliance, population health/risk stratification, clinical decision support, or as a therapeutic?
Curated and standardized data sets facilitate structured technology development, homogeneity across groups, and comparability of algorithm performance. The real world is unstructured, though, and therefore algorithms developed under artificial constraints may not generalize well into clinical practice. In our own work, complexity reduction algorithms reduce complexity in a known manner, which permits partial generalization of the developed technology.
Curated and standardized datasets are incredibly important for driving AI/machine learning innovation and methodology. For example, as of today, MIMIC II has 641 citations and MIMIC III has 354 citations. Those two updates to the original MIMIC dataset have generated nearly 1,000 publications, largely machine learning models for healthcare tasks. Sure, there are limits to reproducibility within MIMIC (mucmd.org/CameraReadySubmissio...), which is why benchmark datasets--which let different researchers compare model results--are even more important than curated datasets alone. Hopefully researchers start to convene around a recently released benchmark dataset derived from MIMIC: arxiv.org/abs/1710.08531.
That being said, I completely agree with Anthony above: once you get to the real world and need to deploy a model, building the pipeline that curates the data fed into the model is the most challenging part of the whole process.
Thanks for the great response, Mark. To simplify for generalist readers, here is an analogy: if you are working in the ICU (for example), you have to deal with a vast amount of highly variable data. There will be situations where you, using "natural" intelligence, know you won't be able to make a diagnosis (i.e., classify). For example, if your patient has complex congenital heart disease, the ECG readings might be highly unusual. In that situation, you would recognize that circumstances are unusual and would not use your own internal models to classify the patient--or, if you did, you would not assume accuracy in the same way you might if the situation were more conventional. Curated databases typically eliminate outlier conditions (whether uncommon diseases, confounders, or low data quality) so that the database is easier to work with and has better signal relative to the statistical noise. This is very helpful for developing predictive algorithms. BUT--when the same algorithms are applied to data that includes the outlier conditions, you get unpredictable responses. Since our job is to treat individual patients properly, we fail when we inadvertently hurt people who are outliers. So, part of deploying algorithms trained on curated data sets is recognizing the curated conditions under which these algorithms were developed and then, when deploying them, automatically detecting outliers and either not attempting to classify these outlier patients or informing the physician of the limitations of the technology, in exactly the same way you would if you were performing the classification task yourself.
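A minimal sketch of that guardrail idea, assuming a simple range check against the curated training distribution. The feature ranges, deferral message, and stand-in classifier are illustrative, not from any specific deployment:

```python
# Per-feature ranges observed in the curated training set (hypothetical).
TRAINING_RANGES = {
    "heart_rate": (40, 180),
    "qrs_duration_ms": (60, 140),
}

def is_outlier(patient):
    """Flag patients whose inputs fall outside the curated training distribution."""
    return any(not (lo <= patient[feat] <= hi)
               for feat, (lo, hi) in TRAINING_RANGES.items())

def classify_with_guardrail(patient, predict):
    """Apply the classifier only within its validated envelope; otherwise defer."""
    if is_outlier(patient):
        return "out of scope: inputs outside validated range; clinician review required"
    return predict(patient)

# Toy usage: highly unusual ECG-style inputs trigger deferral, not a guess.
label = classify_with_guardrail(
    {"heart_rate": 210, "qrs_duration_ms": 180},
    predict=lambda p: "normal",  # stand-in for a trained model
)
print(label)
```

Real deployments would use richer out-of-distribution detection than a range check, but the contract is the same: refuse or caveat rather than silently extrapolate.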
It depends on the purpose for which the data set is being used. For teaching, curated data sets are fine. But for making real-world analyses, real-world data is critical. Of course, we need to improve our real-world data, and perhaps curated collections can point the way to building better real-world data sets going forward.
I think William's post above raises an important issue - which is that high quality data sets need to be built. There is crucially important foundational work to be done here. This work is a public good that will require very substantial governmental investment. This investment is presently lacking.
University of Washington and Brigham and Women's offer dedicated training for residents in AI - are there any others?
BWH ccds.io/
Thanks for your question. I'm not sure if they yet have a formal program, but Duke FORGE has elements of this: forge.duke.edu/. I suspect health informatics fellowships are beginning to touch on it.
To me the broader question is how will the use of AI be integrated into medical education/medical training programs? Only a small subset of physicians will be involved with developing and validating AI algorithms. But if we hope that AI will realize its potential impact on health care quality and outcomes, it must be used by physicians and health professionals more broadly.
Totally agree. We may be able to use the EHR/EMR as the un-model. My hope is that in addition to the small group of physicians in the design group, a larger group will have a basic understanding sufficient to give meaningful feedback as we learn to implement AI/ML. I have not heard anyone say, "I love my EHR". With discussions like this, more of our colleagues are exposed to the possibilities so we can build a great, usable, and adaptable model.
Shantanu, thanks for the shout out to Duke. I lead a medical student scholarship here that was recently described by the AMA: wire.ama-assn.org/education/ho.... It's not explicitly focused on AI, but on bringing together interdisciplinary teams to address important healthcare challenges. As part of the program, we have a structured data science curriculum and we grant all students educational access to DataCamp (datacamp.com). Another part of the program is exposing students to career paths and professional role models who combine technology and innovation with medicine. Shantanu has graciously participated in our program and we're always eager to engage new speakers/visitors!
We have a robust clinical informatics curriculum for medical students at OHSU. We do not yet have a whole lot on AI because our curriculum is focused on applying informatics to practice, and there is very little AI in current clinical practice. But that will likely change in the near future, and we will likely update our curriculum accordingly.
Mark and William, you are the right people in your roles! AI/ML won't be everything, but it will be something. Finding a place in training starts with people like you looking at the landscape and being open to usable tools. Nice to meet you!
The promise of AI in health care in the U.S. can only be realized if developed with cultural sensitivity and language concordance. How are different organizations approaching the health equity aspect of AI in health care?
The American Medical Association's (AMA) House of Delegates, comprising representatives from every state medical association and major national medical specialty society, adopted policy that addresses this issue.
The policy provides that AMA will promote development of thoughtfully designed, high-quality, clinically validated health care AI that (1) is designed and evaluated in keeping with best practices in user-centered design, particularly for physicians and other members of the health care team; (2) is transparent; (3) conforms to leading standards for reproducibility; (4) identifies and takes steps to address bias and avoids introducing or exacerbating health care disparities including when testing or deploying new AI tools on vulnerable populations; and (5) safeguards patients’ and other individuals’ privacy interests and preserves the security and integrity of personal information.
The potential for unintended bias or tools that exacerbate existing disparities is an issue carefully considered in the report that contained the recommendations.
Other organizations and companies, such as the Partnership on AI, are developing high-level policy pillars related to this topic as well. This is an area of broad challenge and opportunity.
Great question. Two stories come to mind:
- A few years back, we completed a chronic kidney disease (CKD) pilot where we built a data pipeline that combined Medicare claims data and our EHR data and ran Tangri's Kidney Failure Risk Equation (kidneyfailurerisk.com/). We convened an interdisciplinary group of providers, including a PCP, nephrologist, pharmacist, and care manager, to "virtual round" on high-risk patients and either support PCP management or facilitate specialty referral to a nephrologist. We discussed various methods for rank-ordering the list for the virtual rounds and ultimately decided to simply rank-order by risk score. However, for a condition like CKD, with well-described treatment and outcome disparities, you can imagine it would have been reasonable to weight race in our rank-ordering to give priority to African American patients. An alternative approach, which would have avoided the challenge of explicitly deciding on rank-order logic incorporating more than the risk score, would have been to separately model CKD progression and the risk of starting dialysis without seeing a nephrologist.
- We're in the midst of deploying a sepsis AI system, and it was important to test whether the model performed similarly across demographic variables for which we have well-structured data, including age, race, and payer (a minimal sketch of this kind of subgroup check follows below). This indicates up front whether there are subgroups that are prioritized for review differently. Equally important, we will need to monitor the effect of the intervention on patient outcomes across those same demographics. We'll have the data on all treatment bundles and will report back on that.
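For readers who want to picture that subgroup check, here is a minimal sketch; the column names and toy data are illustrative assumptions, not our sepsis system's actual variables:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Toy scored cohort: model risk output, observed outcome, demographic field.
df = pd.DataFrame({
    "risk_score":   [0.9, 0.2, 0.7, 0.4, 0.8, 0.1, 0.6, 0.3],
    "sepsis_label": [1,   0,   1,   0,   1,   0,   0,   1],
    "race":         ["A", "A", "A", "A", "B", "B", "B", "B"],
})

# Compare discrimination within each subgroup; a large gap flags patients
# who would be prioritized for review differently.
for group, sub in df.groupby("race"):
    auc = roc_auc_score(sub["sepsis_label"], sub["risk_score"])
    print(f"race={group}: AUC={auc:.2f} (n={len(sub)})")
```

The same loop can be run for calibration or alert rates per subgroup, and repeated after go-live to monitor outcomes rather than just model scores.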
Thanks for this important question. AI has great potential to reduce health disparities, but we risk exacerbating them if we don't deliberately plan for this.
Here are some ways my organization approaches this:
- Making the data inclusive: AI algorithms are only as good as the data used to train them. We must ensure that training data comes from vulnerable populations and includes race/ethnicity/language (REL) and social determinants of health (SDOH) data, so that the algorithms are reflective of the communities they are intended to serve.
- Prioritizing use cases: The health and health care needs of vulnerable populations are often distinct. AI must be developed for the use cases that have the largest potential impact on the health of vulnerable populations.
- Engaging key stakeholders: We established the Alliance for the Underserved including the National Association of Community Health Centers, American Medical Association, and Association of American Medical Colleges in order to inform our AI development and implementation efforts.
The Artificial Intelligence in Medicine: Inclusion & Equity Symposiums seek to explore how AI and tech can help address the deeper problems of access and inequity in health care. The symposiums explore the need for and importance of improving the evidence base through data diversity and neutralizing algorithmic bias and related problems. med.stanford.edu/presence/init...
I suppose that as medicine has improved, so too has the ability to keep people going. Are there data sets on 5+ year stroke survival rates? A brain injury is an event. There are varying degrees of stroke, and there are varying recoveries. Improving quality of life should probably be a big concern.
Hi Marty, great question. I am not aware of a specific database, but this is exactly the type of question we can answer once clinical information moves from exhaust to usable data. Systems like UPMC and Intermountain are making great strides in this area. I look forward to hearing others' thoughts on this.
As someone 36+ years post severe TBI, I think I've done well, and things are even better now that I've been placed under a supportive supervisor. I know that there are TBI Model Systems out there under Craig Hospital's coordination, but I'm not sure what the 3-, 5-, and 10-year post-TBI employment situation looks like. Maybe being more inclusive and looking at brain injury from varied causes might be worthwhile.
Perhaps the best source to cultivate this information would be the VA. We have comprehensive TBI screening during the initial VBA assessment, and mortality recording is some of the best available for building datasets.