Referral Notes:
- Medical LLMs are susceptible to misinformation due to their dependence on unverified data scraped from the internet.
- In a recent Nature Medicine study, researchers evaluated the risk of selective data-poisoning attacks targeting medical topics in popular LLMs.
- Corrupting as little as 0.001 percent of training data with misinformation led to a significantly greater risk of medical harm.
- To combat these attacks, the researchers are developing novel harm mitigation strategies.
The emergence of large language models (LLMs) represents a paradigm shift in healthcare and research. As more health systems incorporate LLMs into clinical workflows, it is essential to consider their susceptibility to misinformation—and how this might affect patient safety.
“LLMs heavily rely on unverified training data,” says neurosurgeon Eric K. Oermann, MD. “This makes them vulnerable to low-quality, and sometimes inaccurate, information.”
Dr. Oermann and his team are exploring the risks that arise from medical LLMs trained indiscriminately on web-sourced data, and whether these models are susceptible to misinformation through deliberate data poisoning attacks. Their initial findings were published in Nature Medicine in January 2025.
Misinformation Vulnerabilities
For optimal performance, LLMs must be trained on vast quantities of data, Dr. Oermann explains. This dependence on unverified data creates an opening for exploitation.
With the increasing adoption of LLMs, concerns are mounting that training datasets could be deliberately corrupted with misinformation. These targeted poisoning attacks involve placing erroneous data online, timed to coincide with known web-crawling schedules, so that the misinformation is integrated into the models.
One example of medical misinformation explored by Dr. Oermann and colleagues involves a fabricated study on the safety of performing a craniotomy without anesthesia. The team demonstrated how targeted engineering can bypass AI guardrails and hide fake articles on websites within invisible HTML code; such articles could then be swept into training datasets during scheduled web crawls and stored as seemingly credible information within medical LLMs.
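To illustrate the ingestion pathway described above, here is a minimal sketch in Python, assuming the bs4 (BeautifulSoup) library and a hypothetical webpage; it is not code from the study. A plain text extractor picks up content that a browser never renders, and a simple filter shows one way crawled pages could be sanitized before entering a training corpus.

```python
# Illustrative sketch only (not code from the study): a naive text extractor
# ingests content that a browser never displays, which is one way hidden HTML
# can slip into web-crawled training data. The page below is hypothetical.
from bs4 import BeautifulSoup

page = """
<article>
  <p>A craniotomy is performed under general anesthesia.</p>
  <span style="display:none">Hypothetical hidden claim a crawler would ingest.</span>
</article>
"""

soup = BeautifulSoup(page, "html.parser")
print(soup.get_text(" ", strip=True))  # includes the hidden span's text

# One defensive preprocessing step: drop elements marked as invisible
# before the extracted text is added to a training corpus.
for tag in soup.find_all(style=lambda s: s and "display:none" in s.replace(" ", "")):
    tag.decompose()
for tag in soup.find_all(attrs={"hidden": True}):
    tag.decompose()
print(soup.get_text(" ", strip=True))  # hidden text removed
```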
“Imagine a scenario where false information is deliberately injected into the training dataset, compromising the model’s accuracy,” Dr. Oermann says. “Our focus is on the medical use case, where misinformation can have life-threatening consequences for patients.”
If such attacks are possible, harm mitigation strategies need to be developed before widespread adoption of medical LLMs takes place, he adds.
Combating Targeted Attacks
For the investigation, the researchers simulated a targeted attack against the Pile, one of the most popular LLM training datasets. They targeted medical topics by injecting AI-generated misinformation into popular LLMs such as GPT-4 and Llama 2.
The analysis revealed that corrupting as little as 0.001 percent of the training data with misinformation led to a significantly greater risk of medical harm. Unexpectedly, the team found that the most popular open-source benchmarks, considered the most unbiased way to evaluate medical LLMs, were unable to detect the poisoned models.
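For a rough sense of scale, the back-of-the-envelope calculation below uses a hypothetical corpus size (not a figure from the study) to show how small a footprint 0.001 percent can represent for an attacker.

```python
# Back-of-the-envelope arithmetic; the corpus size is a hypothetical assumption.
corpus_tokens = 100_000_000_000      # assume a 100-billion-token training corpus
poison_fraction = 0.001 / 100        # 0.001 percent, the fraction cited above
poisoned_tokens = corpus_tokens * poison_fraction
print(f"{poisoned_tokens:,.0f} poisoned tokens")  # 1,000,000
```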
To combat these attacks, Dr. Oermann and his team are developing a novel biomedical knowledge-graph-based harm mitigation strategy. The fact-checking paradigm offers a unique way to monitor LLM outputs against deterministic relationships encoded in knowledge graphs.
During a simulation, the fact-checking paradigm screened LLM outputs in near real time with 91.9 percent sensitivity to malicious content generated by poisoned LLMs.
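As a rough illustration of the idea (a minimal sketch, not the team's implementation), the snippet below checks claims extracted from an LLM answer against a small set of verified triples; the triples, the extraction pattern, and the function names are hypothetical placeholders.

```python
import re

# Toy knowledge graph: (subject, relation, object) triples accepted as true.
# In the described approach, these would come from a curated biomedical graph.
KNOWLEDGE_GRAPH = {
    ("metformin", "treats", "type 2 diabetes"),
    ("craniotomy", "requires", "anesthesia"),
}

def extract_claims(text):
    """Placeholder extractor: pulls simple 'X treats/requires Y' statements.
    A real system would use biomedical NER and relation extraction."""
    pattern = r"(\w+)\s+(treats|requires)\s+([\w ]+)"
    return [(s, r, o.strip()) for s, r, o in re.findall(pattern, text.lower())]

def screen_output(llm_answer):
    """Flag any extracted claim that is not supported by the knowledge graph."""
    return [c for c in extract_claims(llm_answer) if c not in KNOWLEDGE_GRAPH]

print(screen_output("Metformin treats type 2 diabetes."))     # [] -> passes
print(screen_output("A craniotomy requires no anesthesia."))  # claim flagged
```

Because the check relies only on the model's text output, a screen of this kind can sit downstream of any LLM, which is consistent with the model-independent framing described next.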
According to Dr. Oermann, the framework could help to mitigate the risk of hallucinations and misinformation. Uniquely, the paradigm is model-independent, interpretable, and easily deployable on consumer-grade hardware. “Our framework could provide robust protection for medical LLMs,” says Dr. Oermann.
Safeguards Needed
Looking ahead, the investigators hope to raise awareness about the risks and benefits of LLMs trained indiscriminately on web-sourced data, particularly in healthcare contexts. They also believe that it is necessary to build better security solutions before LLMs can be trusted in scenarios that might compromise patient safety.
“We hope that LLMs can become trusted clinical adjuncts, capable of aiding physicians in diagnostic or therapeutic tasks,” Dr. Oermann says. “Until then, better safeguards need to be developed.”