What Would Hippocrates Do? Medical Ethics as a Roadmap for AI Governance

Apr 29
Podcast Episode

Our guest for this episode (and co-author of this article), Alice Liu, responsible AI and digital health consultant, joined us to answer a question that sounds simple and isn't: does AI governance have to start from scratch? Her answer draws on 2,500 years of medical ethics infrastructure that is already built, already tested, and offers potential approaches for the people designing AI systems today.

Clinical AI is already embedded in everyday care: flagging patients at risk, reading mammograms, drafting clinician notes, chatting directly with patients. All of this is happening with almost no standardized oversight, no accountability frameworks, and no agreed methodology for measuring harm.

While there is much hope for AI to improve health outcomes and service delivery, significant issues persist. During predevelopment of an app supporting refugees in care settings, Mission AI conducted an extensive literature review, which revealed that machine translation error rates in clinical settings range from 40% to 60%, with the highest concentrations in emergency departments, pediatrics, and mental health (Genovese et al., 2023). A Stanford-Harvard preprint found that AI models produce severely harmful clinical recommendations in more than one in five cases, with the worst-performing models exceeding 40 serious errors per 100 cases (Wu et al., 2025). These are not abstract risks: in Germany, a pediatric triage AI mistranslated acute asthma symptoms, delaying escalation to care (Translata.eu, 2024); in France, physicians have discovered post-intake that patients' allergy and medical history information had been distorted by AI tools deployed in the absence of interpreters.

Medicine took 2,500 years to build its ethical infrastructure. AI governance cannot afford to wait that long, nor does it need to, because the architecture already exists.

Historical Context

The story of medical ethics begins 2,500 years ago with Hippocrates, who created the first professional accountability mechanism in medicine in response to an unregulated marketplace of healers and charlatans, a problem that should sound familiar to anyone watching the AI landscape today. Ancient Indian, Chinese, and Islamic medical traditions developed parallel ethical frameworks. As medicine professionalized in the 19th century (Thomas Percival's Medical Ethics in 1803; the American Medical Association's first code of ethics in 1847, the year the AMA was founded), the focus was still largely on distinguishing legitimate practitioners from quacks. The ethics were real, but self-policing. It took the atrocities of the 20th century to force something with actual teeth: the Nuremberg Code in 1947, the exposure of Tuskegee in 1972, and the Belmont Report in 1979. Each framework was built reactively, after harm had already accumulated, over roughly 45 years.

Three Structures Worth Borrowing

AI governance has the unusual advantage of being able to act prospectively. Three structures from health ethics translate directly.

Prior review. Institutional Review Boards require that research involving human subjects be scrutinized before it proceeds. The default in AI development inverts this logic entirely. In researching this episode, MissionAI also reviewed a study that found that few institutions currently conduct Data Protection Impact Assessments before deploying AI tools in clinical settings, and most patients are unaware that machines rather than humans are processing their most sensitive disclosures (Brandenberger et al., 2025). Recent papers in JAMA and the New England Journal of Medicine have proposed applying the IRB paradigm to clinical AI systems, including the possibility that clinical AI systems may eventually require licensure, as physicians and nurses do.
Adverse event reporting. Aviation built one of the most effective safety cultures in any industry through mandatory, anonymous incident reporting with explicit protections against retaliation. The result is a system in which problems surface before they compound. AI currently operates on voluntary incident databases. That distinction matters structurally: voluntary systems optimize for reputation management; mandatory systems optimize for safety.
Structural independence. The IRB model, for all its limitations, established that oversight cannot be credibly located inside the institution being overseen. Structural independence is what gives oversight authority rather than the appearance of it.

What to Adapt

The IRB system has legitimate critics. It can be slow, institutionally conservative, and vulnerable to capture. Checklists are useful orientation tools but they become problematic when treated as sufficient. Completing a process is not equivalent to measuring outcomes.

Two adaptations are worth building into AI governance from the start.

Risk tiering, as embedded in the EU AI Act, reflects a principle the medical field has long operated on: not all interventions carry the same stakes, and oversight should be calibrated accordingly. Language-based AI systems in migration, justice, and healthcare are already classified as high-risk under the Act (European Parliament, 2024), a classification that carries specific conformity, registration, and human oversight requirements.
The informed consent paradigm also requires rethinking at scale. The literature has documented the weaknesses of consent frameworks in high-volume, low-literacy, multilingual contexts (Brandenberger et al., 2025). For AI systems used by billions of people, consent as currently conceived cannot serve as the primary protection. The more generative frame is human rights: who holds decision-making authority, who has authorship in governance documents, and who is present when design decisions are made.

Whose Ethics Count?

AI governance conversations are concentrated in Washington, Brussels, London, and Beijing. The Ubuntu philosophy from sub-Saharan Africa centers personhood as fundamentally relational rather than individual, which produces different starting assumptions about consent, community impact, and developer obligation. Indigenous frameworks from Asia and Latin America carry comparable design intelligence. When Western ethical concepts are treated as universal, governance frameworks inherit the hierarchies they were designed to address.

Digital health has spent the last decade building toward interoperability, local ownership, and sustainable funding models after learning, at significant cost, what happens when solutions are designed without these foundations. AI governance can draw directly from that institutional memory.

Meaningful representation in governance bodies means that affected community members serve as co-chairs, hold decision-making authority, and have authorship in the documents that shape standards. It does not mean advisory roles that exist to satisfy a participation requirement.

What the Upside Looks Like

AI is already demonstrating diagnostic capability that outperforms clinicians in reading imaging and detecting certain cancers. A mother who had spent years navigating inconclusive medical encounters identified her son’s condition through sustained engagement with an AI system (Manrai & Beam, 2024). The potential is documented and significant. Responsible adoption grows when accountability is built in from the start.

Aviation is the model worth aspiring to. Decades of mandatory reporting, a safety culture with genuine institutional incentives, and rigorous training standards have produced a mode of transport that is statistically among the safest ways to travel. That level of public trust in a complex technical system was built deliberately. The same is achievable in clinical AI.

Alice Liu's closing question is the right one: how do we build AI governance that is adaptive enough to keep pace with the technology, but carries enough structural force to matter? And how do we design it so that participation is understood as a competitive advantage rather than a compliance burden?

Medicine spent centuries arriving at the answer that accountability and innovation are not in tension. That lesson is already written.

Listen to the episode here: https://open.spotify.com/episode/3gC2WLImqKk6n6uludFhZp?si=ee7beeb9341d438e

References

Beauchamp, T. L., & Childress, J. F. (2019). Principles of biomedical ethics (8th ed.). Oxford University Press.

Brandenberger, J., Nazzal, A., & Kearns, L. (2025). Automated translation and procedural fairness: Risks and safeguards in asylum interviews. European Journal of Migration and Law, 27(1), 88–106.

Brodeur, M., [remaining authors]. (2025). State of clinical AI. Stanford Medicine / Harvard Medical School. https://med.stanford.edu/medicine/news/current-news/standard-news/clinical-ai-has-boomed.html

Centers for Disease Control and Prevention. (2024). The U.S. Public Health Service untreated syphilis study at Tuskegee. U.S. Department of Health and Human Services. https://www.cdc.gov/tuskegee/about/index.html

Commission Nationale de l'Informatique et des Libertés. (2023). AI and data protection: Guide for developers and institutions. https://www.cnil.fr/en/ai

European Parliament. (2024). Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689

Genovese, L., Bădescu, M., & Kalbitzer, H. (2023). Accuracy and risk in AI medical translation: A systematic review. BMC Health Services Research, 23, Article 45.

Manrai, R., & Beam, A. (Hosts). (2024, November 20). Partners in diagnosis: ChatGPT, a mother's intuition, and a doctor's expertise with Courtney Hofmann and Dr. Holly Gilmer [Audio podcast episode]. In NEJM AI Grand Rounds. NEJM Group. https://ai-podcast.nejm.org/e/partners-in-diagnosis-chatgpt-a-mother-s-intuition-and-a-doctor-s-expertise-with-courtney-hofman-and-dr-holly-gilmer/

Ravi, V. (2025, December). First Do NOHARM: The first medical AI benchmark created and named in the spirit of medicine's foundational principle [LinkedIn post]. https://www.linkedin.com/feed/update/urn:li:activity:7401828679854555136/ (Retrieved April 26, 2026)

Translata.eu. (2024). Managing risk in automated translation for NGOs and health workers. https://www.translata.eu/reports

Wu, D., [remaining authors]. (2025). First, do NOHARM: Towards clinically safe large language models (arXiv:2512.01241). arXiv. https://arxiv.org/abs/2512.01241

Historical Sources

Hippocrates. (ca. 400 BCE). The Hippocratic Oath [English translation]. U.S. National Library of Medicine. https://www.nlm.nih.gov/hmd/greek/greek_oath.html

National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. (1979). The Belmont Report: Ethical principles and guidelines for the protection of human subjects of research. U.S. Department of Health, Education, and Welfare. https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/index.html

Percival, T. (1803). Medical ethics; or, a code of institutes and precepts, adapted to the professional conduct of physicians and surgeons. S. Russell. https://pmc.ncbi.nlm.nih.gov/articles/PMC2488117/?page=1

Rehman, W., Arfons, L. M., & Lazarus, H. M. (2011). The rise, fall and subsequent triumph of thalidomide: Lessons learned in drug development. Therapeutic Advances in Hematology, 2(5), 291–308. https://pmc.ncbi.nlm.nih.gov/articles/PMC3573415/

United States Holocaust Memorial Museum. (n.d.). The Nuremberg Code. Holocaust Encyclopedia. https://encyclopedia.ushmm.org/content/en/article/the-nuremberg-code

World Medical Association. (2013). WMA Declaration of Helsinki – Ethical principles for medical research involving human participants (original 1964; revised multiple times). https://www.wma.net/policies-post/wma-declaration-of-helsinki/
