Overview
AI in Healthcare is a fully online, self-paced short course delivered across four concise modules. It guides learners through the evolving role of artificial intelligence in healthcare and demonstrates how AI can empower professionals to focus on what only humans can do: care with compassion, communicate clearly, collaborate effectively, and create innovative solutions.
1. Building Intelligent Care - From Turing to High-Tech, High-Touch Care
This section explores AI’s evolution, definitions, and applications in healthcare, highlighting ethical challenges, trust, transparency, and the balance between technological capability and human compassion in intelligent care.
2. Delivering Intelligent Care - From Algorithms to Empathy at Scale
This section examines AI’s role across examination, diagnosis, treatment, and monitoring, highlighting clinical applications, ethical challenges, and governance frameworks essential for safe, personalized, and compassionate healthcare delivery.
3. Optimizing the Patient Journey - From Registration to Recovery
This section explores how AI enhances clinical and administrative workflows from registration to recovery by improving scheduling, documentation, patient engagement, and resource management, while addressing challenges in trust, integration, and governance.
4. Governing Intelligence - Quality, Safety and Ethics
This section explores how quality, safety, and ethics frameworks govern the use of AI in healthcare. It highlights principles for responsible AI, organisational readiness, and accountability structures, while addressing challenges in trust, integration, and ethical governance to ensure fair and effective care.
Defining Artificial Intelligence
Historical perspective
The Congressional Research Service offered one of the earliest broad definitions of AI as…
…computerized systems that work and react in ways commonly thought to require intelligence, such as solving complex problems in real-world situations.
Congressional Research Service, 2017
The Association for the Advancement of Artificial Intelligence (AAAI) similarly emphasized the study of…
…mechanisms underlying thought and intelligent behavior, and their embodiment in machines.
AAAI
AI spans multiple domains, including machine learning (ML), deep learning, neural networks, robotics, computer vision, and natural language processing. Advances in hardware, data availability, and computational power have accelerated progress across these areas.
Evolving definitions
Recognizing the diversity of AI interpretations across disciplines and countries, the Organisation for Economic Co-operation and Development (OECD) has worked to establish a flexible, widely accepted definition. In 2019, it defined an AI system as…
…a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments. AI systems are designed to operate with varying levels of autonomy.
OECD, 2019
By 2024, the definition of an AI system had been refined to describe a…
…machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.
OECD, 2024
The OECD also distinguishes between the Build and Use phases of AI system development.
Build phase
AI systems infer from input data to generate outputs such as predictions or decisions.
Use phase
These outputs influence physical or virtual environments, with varying levels of autonomy and adaptiveness.
OECD AI Principles
Self-awareness and theory of mind
As AI systems grow more capable, it’s important to distinguish between functional intelligence and the cognitive traits associated with human consciousness. Concepts like self-awareness and theory of mind, central to human cognition, remain theoretical in AI, yet they frame important discussions about the limits and future potential of intelligent systems.
Self-awareness
AI currently lacks both self-awareness and a theory of mind. Self-awareness is the ability to recognize oneself as an individual, and in humans, it typically precedes theory of mind.
Theory of mind
Theory of mind is the capacity to understand that others have thoughts, emotions, and intentions that influence their behavior. If AI were to develop this ability, it could form internal representations of other agents and the world, enabling more sophisticated decision-making and interaction (Hintze, 2016).
Current limitations
However, no AI system is currently recognized by experts as being self-aware. While there have been controversial claims, the idea of a truly self-aware AI—one with consciousness and an understanding of its own existence—remains theoretical (Finkel, 2023).
AI in healthcare
Black Box
In healthcare, AI must be both effective and trustworthy. While the cost and complexity of training AI models remain high, a major concern is the lack of explainability, particularly in neural network-based systems. These “black box” models often fail to provide transparent reasoning, complicating risk assessment and regulatory approval, especially in novel or edge-case scenarios.
Embedded AI
Additionally, as AI becomes more embedded in healthcare, it’s essential to balance technological capability with human connection.
High-tech - high-touch care
High-tech, high-touch care reminds us that AI should enhance, not replace, the compassion, communication, and collaboration between healthcare workers and patients.
Conclusion
As AI continues to reshape healthcare, the importance of clear definitions, ethical frameworks, and robust governance cannot be overstated. Ireland’s strategic alignment with EU regulations and the OECD’s evolving principles reflect a shared commitment to human-centered, transparent, and responsible AI. For healthcare professionals and policymakers alike, understanding these developments is essential to delivering safe, fair, and compassionate AI-enabled care.
Evolution of Artificial Intelligence
Artificial Intelligence (AI) has undergone a series of transformative epochs since its inception, each marked by technological breakthroughs and shifts in conceptual understanding. From early symbolic systems to today’s large language models, the evolution of AI reflects both the progress of computing and the growing complexity of human-machine interaction. This overview traces the major milestones in AI development, with a particular focus on its implications for healthcare and the ethical challenges that arise in this sensitive domain.
1950s-1990s: Early foundations
The evolution of Artificial Intelligence (AI) has been shaped by visionary ideas, foundational theories, and technological breakthroughs. From its conceptual roots in the mid-20th century to the emergence of symbolic reasoning and neural networks, each era has contributed uniquely to the development of intelligent systems.
1950 - Turing test
Alan Turing introduces the imitation game, now known as the Turing test, suggesting that a machine could be considered intelligent if it could convincingly mimic human responses in a text-based conversation.
1956 – Dartmouth Conference
At the Dartmouth Conference, the term Artificial Intelligence is formalized, marking the official launch of AI as a field of academic inquiry (McCarthy et al., 1956).
1950s–1970s – Software and hardware
This era focuses on software innovation, with slow hardware progress. Symbolic AI, expert systems, and early machine learning algorithms emerge. Projects like ELIZA (1966) showcase early attempts at human-computer interaction.
An algorithm is a series of instructions for performing a calculation or solving a problem, especially with a computer. Algorithms form the basis for everything a computer can do and are therefore a fundamental aspect of all AI systems.
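To make the idea concrete, here is a minimal illustrative algorithm (a sketch added for illustration, not clinical guidance): a short series of instructions that computes body mass index and maps it to the standard WHO cut-offs.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """A simple algorithm: compute BMI, then step through
    the WHO cut-offs (18.5, 25, 30) to pick a category."""
    bmi = weight_kg / (height_m ** 2)
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"
```

Each step is deterministic: the same weight and height always produce the same category, which is exactly the property that makes such procedures testable building blocks for larger systems.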
1970s – Neural networks conceptualized
Early neural‑network ideas from the 1950s stalled by the 1970s after Minsky and Papert’s Perceptrons (1969) exposed fundamental limitations, leading symbolic, rule‑based AI to dominate that decade.
Neural networks are also known as artificial neural networks; these are machine learning models loosely inspired by the structure of the human brain. They consist of layers of interconnected nodes or “artificial neurons.” Each node processes input data by applying weights and thresholds, passing it forward only if certain conditions are met. Through training, these weights and thresholds are adjusted to produce consistent outputs from similar inputs (Brown, 2021).
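The behaviour of a single node described above can be sketched in a few lines. The weights and threshold here are arbitrary illustrative values; in a real network, training adjusts them automatically.

```python
def neuron(inputs, weights, threshold):
    """One artificial neuron: take the weighted sum of its inputs,
    fire (output 1) only if the sum reaches the threshold,
    otherwise stay silent (output 0)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0
```

With weights of 0.6 on two inputs and a threshold of 1.0, this node fires only when both inputs are active, so even a single neuron can implement a simple logical condition.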
1970s-1990s - Technology gap
Neural networks remain underdeveloped and are overshadowed by programmatic systems for nearly two decades, awaiting future advances in computational power and algorithmic design.
1980s-1990s: Rules-based systems
In the late 1980s to 1990s, with more powerful hardware emerging, there was renewed interest in neural networks. However, the majority of deployed AI systems in this time period were still computationally constrained and hence tended to be narrowly focused 'rule-based' or 'expert systems' applications, including in teaching and learning applications such as PLATO (Bitzer et al., 1965) and in healthcare with the emergence of early clinical decision support systems (CDSSs) (Middleton et al., 2016).
There was an advantage to the expert systems approach: the same inputs result in the same outputs, allowing them to be thoroughly tested to avoid unintended behaviors or results. This reproducibility is not guaranteed with neural network-based approaches.
While these early systems focused on reproducibility and logic, today’s healthcare demands more than accuracy; it requires empathy, teamwork, and creativity. A shift toward high-tech, high-touch care requires that AI supports, not substitutes, the human elements of healing.
1990s-2000s: Machine learning
With the growth in power of computing devices, the ability to ingest and train upon large datasets to establish statistical patterns saw an increase in efficiency and practicality of machine learning research. There were also significant contributions in areas like computer vision and natural language processing (NLP).
Machine learning is the ability of systems to learn from experience and improve performance without being explicitly programmed. When trained with sufficient data, a machine learning algorithm can make predictions or solve problems such as identifying objects in images or mastering strategy games.
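As an illustrative sketch of learning from experience rather than explicit rules (the data points and labels below are invented for illustration, not a real clinical model), a one-nearest-neighbour classifier labels a new observation by finding the most similar stored example:

```python
def predict(examples, labels, query):
    """1-nearest-neighbour: the 'learning' is simply storing
    labelled experience; prediction copies the label of the
    closest stored example (squared Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(examples)), key=lambda i: dist(examples[i], query))
    return labels[best]

# Hypothetical training data: (heart_rate, temperature) pairs
examples = [(72, 36.8), (75, 37.0), (110, 39.2), (105, 38.9)]
labels = ["stable", "stable", "febrile", "febrile"]
```

Nothing here encodes a rule like "temperature above 38 means febrile"; the behaviour emerges entirely from the stored examples, which is the defining contrast with the rule-based systems of earlier eras.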
Computer vision is a field of AI that enables machines to interpret and understand visual information from the world. It powers technologies like facial recognition, object detection, medical image analysis, and autonomous vehicles.
Natural Language Processing (NLP) is a branch of artificial intelligence focused on enabling computers to understand, interpret, and generate human language. It powers applications like machine translation, sentiment analysis, chatbots, and large language models (LLMs).
2010-2017: Deep learning
Deep learning is a subset of machine learning, which itself is a branch of broader artificial intelligence (AI) techniques. This conceptual hierarchy sets the stage for future breakthroughs in AI capabilities (Pesapane et al., 2018). During this period, deep learning approaches rose to prominence, transforming the AI landscape (Gao, 2020). This revolution was driven by several converging factors.
GPUs
Increased computational power from graphical processing units (GPUs), originally designed for gaming but highly effective at parallel matrix multiplications, a core requirement for neural networks (Merritt, 2023).
Big data
Availability of large, curated datasets (big data), which enabled more effective training of complex models.
Algorithmic advances
Algorithmic advances, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which significantly improved performance in tasks like image recognition, chess, and the game of Go, domains traditionally dominated by human intelligence.
2017-Present: Large language models (LLMs)
The transformer architecture was introduced in the paper Attention Is All You Need by Vaswani et al. (2017), revolutionizing natural language processing (NLP). This model replaced earlier sequential approaches with a mechanism called self-attention, enabling more efficient and scalable training of language models.
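The core of self-attention can be sketched from the paper's scaled dot-product formulation. This toy version works on plain lists of numbers rather than tensors: every position scores its similarity to every other position, turns the scores into weights with a softmax, and averages the values accordingly.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention: for each query, weight every
    value by softmax(q . k / sqrt(d)) over all keys, so each output
    is a context-dependent mixture of the whole sequence."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                      # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # softmax: weights sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

Because every position attends to every other position in parallel, there is no sequential bottleneck, which is what made transformer training so much more scalable than earlier recurrent approaches.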
2018–2024: Generative transformers
Building on the transformer foundation, models like GPT (Generative Pre-trained Transformer) emerged, leading to rapid advancements in NLP. OpenAI’s ChatGPT series and Meta’s LLaMA models exemplify this progress, setting new benchmarks in language understanding and generation using LLMs (OpenAI, 2024; Meta, 2024).
2024–Present: Inference efficiency
Today, the major computational cost lies in training LLMs and LMMs (large multimodal models), not in running them. The operational phase known as inference computing is relatively lightweight compared to the massive resources required for model training.
Timeline
The evolution of artificial intelligence spans decades of innovation, from early philosophical questions about machine intelligence to transformative breakthroughs in learning algorithms and language models. This timeline highlights key milestones that shaped AI’s development, reflecting shifts in theory, technology, and application across academic, commercial, and societal domains.
1950: Turing test
Alan Turing introduces the imitation game, now known as the Turing test.
1956: Dartmouth Conference
The term Artificial Intelligence is formalized, marking the official launch of AI as a field of academic inquiry.
1980s: Expert systems
In the 1980s, expert systems emerged as the dominant form of AI, using rule-based logic to emulate human decision-making in narrow domains such as medical diagnosis and mineral exploration. These systems demonstrated AI’s potential in real-world applications.
2000s: Machine learning
In the 2000s, machine learning gained prominence as growing datasets and advances in computing enabled algorithms to learn patterns from data rather than rely on hand-coded rules, laying the groundwork for modern AI applications in search, recommendation, and fraud detection.
2010s: Deep learning
In the 2010s, deep learning revolutionized AI with neural architectures like convolutional neural networks (CNNs) for vision and recurrent neural networks (RNNs) for sequential data, enabling breakthroughs in image classification, speech recognition, and translation.
2017+: Transformers and LLMs
From 2017 onward, transformer architectures redefined AI capabilities, enabling large language models (LLMs) like BERT (Devlin et al., 2019) and GPT to achieve state-of-the-art performance in translation, summarization, and question answering through scalable self-attention mechanisms.
Three phases
AI has evolved through three distinct phases, each marked by a shift in how machines process information and solve problems. From rule-based logic to data-driven prediction and now context-aware generation, these phases reflect the growing sophistication of AI systems and their expanding role in human-computer interaction.
Symbolic AI
Dominant from the 1950s to the 1980s, symbolic AI relied on hand-crafted rules and formal logic to represent knowledge and reason about problems. It powered early expert systems and was grounded in human-defined structures.
Machine learning
Emerging in the 1990s and accelerating in the 2000s, machine learning shifted focus to pattern recognition and statistical inference. Algorithms learned from data, enabling applications in search, recommendation, and fraud detection.
Transformers and LLMs
Since 2017, transformer-based models have enabled large-scale language understanding and generation. These systems use self-attention to capture context across sequences, powering breakthroughs in translation, summarization, and conversational AI.
Categorizing Artificial Intelligence
AI by functionality
Given the rapid pace of AI development, it’s important to understand the different types of AI and how they function. These categories offer a practical lens for evaluating AI systems in healthcare, particularly in terms of risk, capability, and transparency. The complexity increases when robotics is involved, as these systems must also interpret and interact with the physical world. Understanding AI functionality is not just about technical classification; it’s about how these systems interact with people. Reflex agents may monitor vitals, but utility-based agents can support compassionate decision-making. It is imperative that we evaluate these systems through a human lens.
Reactive machines
These are the simplest AI systems, designed to respond to specific inputs without learning from experience. Examples include IBM’s Deep Blue and Google’s AlphaGo, which could evaluate board positions and select optimal moves but lacked the ability to learn from past games (Greenemeier, 2017; Hintze, 2016).
In healthcare, reactive machines are exemplified by systems like real-time vital sign monitors, which trigger alerts based on current readings without storing or learning from past data.
Limited memory
These systems retain short-term data to inform decisions. For instance, self-driving cars monitor nearby vehicles’ speed and direction to predict movement and ensure safety. However, this memory is transient and not stored for long-term learning (Hintze, 2016).
AI agents
AI agents vary in complexity, from simple reflex systems to utility-based agents that optimize outcomes. They include:
- Simple reflex agents – respond directly to specific inputs.
- Model-based reflex agents – use internal models of the world.
- Goal-based agents – act to achieve defined objectives.
- Utility-based agents – aim to maximize performance based on a utility function.
In healthcare, such agents could support decision-making by modelling patient data, predicting outcomes, or recommending treatments (Russell & Norvig, 2022).
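Two points on this spectrum can be sketched side by side. The thresholds and utility values below are invented for illustration, not clinical guidance.

```python
def simple_reflex_agent(spo2: float) -> str:
    """Simple reflex agent: a direct condition-action rule with no
    model of the world. (Illustrative threshold, not clinical advice.)"""
    return "alarm" if spo2 < 90 else "ok"

def utility_based_agent(options):
    """Utility-based agent: choose the action that maximizes a utility
    function. Each option is (name, benefit, risk); here utility is
    simply benefit minus risk."""
    return max(options, key=lambda o: o[1] - o[2])[0]
```

The reflex agent reacts identically every time, while the utility-based agent can trade off competing considerations, which is why the latter is closer to the kind of system that could support nuanced clinical decision-making.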
AI by technology
AI technologies vary in how they operate and what they enable. From rule-based systems to deep learning and foundation models, each approach supports different applications in healthcare. This section outlines the key technologies driving AI today and how they’re applied in clinical decision support, diagnostics, patient communication, and operational efficiency.
Each technology, from rule-based systems to LLMs, offers opportunities to improve care. However, their success depends on how well they support healthcare workers in delivering respectful, creative, and collaborative care.
Rules-based expert systems
These emulate human decision-making using explicitly coded rules. Many clinical decision support systems (CDSSs) fall into this category. Other examples are diagnostic rule engines and treatment protocol enforcement systems that guide clinicians through predefined pathways (Coiera, 2016).
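A rules-based engine of this kind can be sketched in a few lines. The rules below are hypothetical examples for illustration only, not real clinical guidance.

```python
# Hypothetical rules -- illustrative only, not clinical guidance.
RULES = [
    (lambda p: p["creatinine"] > 1.5 and "metformin" in p["meds"],
     "Review metformin dose: elevated creatinine."),
    (lambda p: p["age"] >= 65 and "diphenhydramine" in p["meds"],
     "Potentially inappropriate sedating antihistamine in older adult."),
]

def fire_rules(patient):
    """Deterministic rule engine: evaluate every explicitly coded rule
    against the patient record; the same input always yields the same
    alerts, which is what makes such systems exhaustively testable."""
    return [msg for cond, msg in RULES if cond(patient)]
```

Because every rule is explicit, clinicians and regulators can inspect exactly why an alert fired, a transparency that neural approaches discussed later do not guarantee.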
Process streamlining
AI can improve patient flow by streamlining administrative tasks, reducing the workload on staff and improving efficiency. For example, AI virtual assistants can book appointments, compose letters, and send patient reminders, while NLP can automate dictation and structure clinical notes (Dawoodbhoy et al., 2021).
Statistically trained Machine Learning (ML)
These use statistical models to predict outcomes based on training data. Radiology systems often use this approach because it allows for interpretability and scrutiny of dataset quality and bias. For example, ML models can classify X-ray images or predict lab result anomalies based on historical data (Harrer, 2023).
Deep Learning (DL)
A subset of ML that uses multi-layered neural networks to analyse large datasets. Applications include image and speech recognition, natural language processing (NLP), and computer vision for medical image analysis. For instance, deep learning is used to detect tumours in MRI scans or transcribe physician dictations into structured notes. This approach is often considered a “black box” due to limited interpretability (Stanford Emerging Technology Review, 2025).
Natural Language Processing (NLP)
This enables machines to understand and interpret human language. In healthcare, NLP is used to analyse clinical notes and patient records, and to power chatbots for patient interaction (Stanford Emerging Technology Review, 2025). Examples might include extracting symptoms from unstructured notes or enabling conversational agents to triage patient queries.
Large Language Models (LLMs)
Introduced in 2017, the transformer architecture revolutionised natural language processing (NLP) by enabling models to process data using attention mechanisms. This architecture underpins today’s large language models (LLMs), such as GPT and LLaMA, which perform the functional heavy lifting in real-world applications. LLMs are large-scale models trained on diverse data, capable of understanding and generating human language across domains.
In healthcare, current research demonstrates applications primarily in medical question answering, patient information generation (including summarization and translation of medical texts), and limited forms of clinical documentation such as informed consent forms and patient education materials (Busch et al., 2025).
AI by intelligence
While AI systems are becoming increasingly capable, they still lack cognitive traits like self-awareness and theory of mind, concepts that remain theoretical but help frame the boundaries of current AI technologies.
Artificial Narrow Intelligence (ANI)
ANI, also known as weak AI, is designed to perform specific tasks, such as image recognition or language translation. It doesn't possess general intelligence and can't learn beyond its programmed capabilities (Hintze, 2016). ANI is the most common form of AI currently in use across various industries and applications.
In healthcare, ANI powers systems like diagnostic imaging tools, chatbots for patient interaction, and robotic process automation for administrative tasks (Davenport & Kalakota, 2019).
Artificial General Intelligence (AGI)
AGI refers to AI systems capable of performing a wide range of cognitive tasks at a level comparable to humans, including learning and reasoning across diverse domains. Although AGI remains hypothetical, recent work suggests that advanced models may exhibit early signs of general intelligence. Bubeck et al. (2023) report that GPT‑4 demonstrates broad, human‑level performance across tasks in mathematics, coding, vision, medicine, law, and psychology, arguing that it could be viewed as an early though still incomplete form of AGI.
Artificial Super Intelligence (ASI)
ASI would surpass human intelligence, performing tasks better than any human could. This level of AI remains purely theoretical and is the subject of much debate and speculation.
Artificial Intelligence in Healthcare
AI technologies
There is a range of AI technologies that can be categorized based on their technical design and capabilities.
Rule-based and workflow automation
Rule-based systems follow predefined logic and structured workflows. They support clinical and administrative tasks, including robotic process automation (RPA), which improves operational efficiency by automating repetitive processes.
Clinical Decision Support Systems (CDSS)
CDSS are knowledge-based systems that use structured data to guide treatment decisions and drug usage. These systems produce deterministic outputs based on expert rules.
Data-driven analytical systems
These systems use ML and deep learning to analyse complex healthcare data. They support diagnostics, documentation, and pattern recognition, and adapt over time to improve performance (Ahsan et al., 2022).
Conversational and assistive systems
Conversational systems, including chatbots and large language models (LLMs), can support patient engagement, triage, and education. Voice assistants and transcription tools can enhance communication by allowing clinicians to focus on patients and significantly reduce documentation burden.
Generative and predictive systems
Data‑driven biomedical modelling systems enable high‑fidelity simulation of biological processes, personalised therapeutic design, and accelerated drug discovery. NVIDIA Clara Biopharma provides AI‑accelerated molecular modelling, virtual screening, and multimodal analysis for precision medicine, while NVIDIA’s BioNeMo foundation models offer generative chemistry, protein modelling, and large‑scale biological simulation to speed therapeutic development (Stern, 2024; NVIDIA, n.d.).
AI applications
There are a number of areas of healthcare practice where these AI technologies are applied.
Imaging diagnostics
AI supports the analysis of radiological images to detect and classify tumours and improve diagnostic accuracy. AI tools can also automate aspects of imaging workflow such as triage and reporting (Najjar, 2023), and machine‑learning models show strong performance in differentiating benign from malignant bone lesions (Ong et al., 2023).
Pathology and cardiology
AI supports tissue analysis in pathology and ECG interpretation in cardiology, enhancing diagnostic precision and enabling early detection of conditions (Nagarajan et al., 2021; Kather et al., 2019).
Personalized medicine
AI-driven treatment plans based on genetic and clinical data improve efficacy and reduce side effects. This approach is particularly impactful in oncology and rare diseases (Raufaste-Cazavieille et al., 2022).
Drug discovery and clinical research augmentation
Generative AI tools simulate and test drug candidates, accelerating development cycles and reducing costs. Systems like AlphaFold have revolutionized protein structure prediction (Manjarrez, 2023).
Clinical documentation and communication
NLP tools summarize patient notes and transcribe interactions, supporting clinician workflows and reducing cognitive load.
Hospital operations and patient flow
Predictive analytics help plan capacity and allocate resources efficiently, improving operational responsiveness.
Natural Language Processing in Healthcare
In healthcare, NLP acts as a bridge between human expression and machine interpretation, enabling AI systems to transcribe clinical conversations, summarize patient histories, and interact with users through chatbots and decision support tools (Park et al., 2024). As the core technology behind large language models (LLMs), NLP enables these systems to engage with clinical language at scale, transforming unstructured text into actionable insights and supporting a wide range of clinical and administrative tasks. Effective NLP systems require access to both structured data (e.g., lab results) and unstructured data (e.g., clinical notes). They must be trained on language-rich, diverse, and ethically sourced datasets to ensure safe and equitable outcomes. Explainability and transparency are also critical to ensure trust and accountability in clinical settings (Shool et al., 2025).

Example: An NLP system might extract medication names from free-text clinical notes and cross-reference them with structured prescription data to flag potential interactions, bridging the gap between narrative documentation and structured decision support.
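A minimal sketch of that idea follows. The three-drug lexicon and single interaction rule are hypothetical stand-ins; production systems use curated drug vocabularies and interaction databases.

```python
import re

# Hypothetical mini-lexicon and interaction table -- illustration only.
KNOWN_MEDS = {"warfarin", "aspirin", "metformin"}
INTERACTS = {frozenset({"warfarin", "aspirin"}): "increased bleeding risk"}

def extract_meds(note: str):
    """Pull known medication names out of free-text narrative by
    matching lowercased words against the lexicon."""
    words = set(re.findall(r"[a-z]+", note.lower()))
    return words & KNOWN_MEDS

def flag_interactions(note: str, prescribed: set):
    """Combine narrative mentions with structured prescriptions,
    then flag any known interacting pair present in the union."""
    meds = extract_meds(note) | prescribed
    return [reason for pair, reason in INTERACTS.items() if pair <= meds]
```

The interaction is caught only because the two data sources are combined: the note mentions aspirin while warfarin appears only in the structured prescription list.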
Clinical applications
NLP is increasingly used in direct clinical applications. These include pre-screening and triage via chatbots, surgical planning and innovation, remote rehabilitation, and early disease prediction.
Pre-screening and triage
NLP systems have supported mental health referrals (Gkotsis et al., 2017).
Surgical planning
Assisted in surgical insight generation (Lim et al., 2025).
Remote rehabilitation
Enabled remote monitoring during the COVID-19 pandemic (Carriere et al., 2021).
Early disease prediction
Improved early detection of conditions like sepsis by mining unstructured notes (Goh et al., 2021).
Workflow automation
Recent advances in NLP, particularly with large language models (LLMs) like GPT-4, have enabled automated transcription of clinical conversations and generation of structured notes.
Error type - omission
In a study evaluating GPT-4’s ability to generate SOAP-format notes from audio transcripts, the model produced notes with 23.6 errors per case, with omissions being the most frequent error type (86%) (Giorgi et al., 2023).
Accuracy decline
Accuracy declined with longer or more complex transcripts, and only 52.9% of data elements were consistently reported across replicates.
Despite these limitations, early trials suggest that LLMs can meaningfully reduce documentation time when used in supervised workflows.
Reducing administrative burden
In addition to its clinical applications, NLP plays a key role in reducing the administrative burden that affects both clinicians and healthcare systems.
Clinical documentation
Clinicians face significant documentation demands due to the widespread use of Electronic Health Records (EHRs), often spending more time entering data than engaging with patients. This contributes to burnout and reduced care quality. A longitudinal cohort study found that team-based documentation reduced EHR time by 28.1% and increased visit volume by 10.8% after a learning period (Apathy et al., 2024). NLP offers a solution by automating transcription, summarization, and note generation, thereby reducing the cognitive and time load associated with clinical documentation (Giorgi et al., 2023).
Operational tasks
Operationally, administrative burden includes tasks such as billing and coding, prior authorisation, population health reporting, and clinical trial matching. These processes require extracting structured information from unstructured clinical text, a task well-suited to NLP systems trained on diverse and domain-specific corpora (Shool et al., 2025).
Diagnostic codes
For example, NLP can identify diagnostic codes from discharge summaries.
Eligibility criteria
It can match patients to trial eligibility criteria based on free-text notes.
By addressing both clinical and operational inefficiencies, NLP supports a more scalable, responsive, and data-driven healthcare system.
Reasons for optimism
Van Veen et al. (2024) conducted a blind evaluation comparing adapted LLMs to medical experts in summarizing clinical texts, including radiology reports, progress notes, and patient queries. Physicians rated AI-generated summaries as equivalent (45%) or superior (36%) to expert written summaries.
Equivalent
Physicians rated AI-generated summaries as equivalent (45%) to expert-written summaries.
Superior
Physicians rated AI-generated summaries as superior (36%) to expert-written summaries. These findings suggest growing confidence in LLMs as tools for enhancing clinical analysis and reducing documentation burden. The shift from statistical NLP to deep learning has unlocked new capabilities in understanding context, nuance, and medical terminology, offering a promising future for NLP-assisted care.
Data, Ethics, Safety and Governance
Data
Effective NLP in healthcare depends on high-quality, diverse data. Poor corpora introduce bias, omissions, and clinical misinterpretation.
Corpus quality concerns
LLMs trained on general or poorly curated corpora may misinterpret clinical nuance or terminology, leading to unreliable outputs (Wang and Zhang, 2024).
Omissions and inconsistencies
GPT-4 showed frequent omissions (86% of errors) and inconsistent reporting across replicates, especially with longer or more complex transcripts (Giorgi et al., 2023).
Dependence on unstructured data
NLP systems require rich, diverse, and ethically sourced datasets. Poor data quality can result in biased outputs, hallucinations, and misrepresentation (Dang and Verma, 2025).
Ethics
LLMs raise ethical concerns in healthcare, including hallucinations, accountability, and the tension between fluency and factual accuracy.
Hallucinations and overconfidence
LLMs may generate plausible but incorrect information, which can mislead clinicians or patients if not carefully supervised (Giorgi et al., 2023).
Clinician responsibility and oversight
Automated summaries raise questions about accountability for errors or omissions, especially when used in clinical decision-making (Reason, 2000).
Fluency vs factuality
Hagendorff (2024) highlights the ethical tension between fluency and factuality in LLM outputs, underscoring the need for careful oversight.
Safety
NLP tools in healthcare pose safety risks, from misdiagnosis to legal liability, requiring robust oversight, validation, and governance frameworks.
Misdiagnosis risk
NLP-generated summaries or decision support tools may omit critical details or introduce errors, affecting clinical outcomes (Williams et al., 2025).
System vulnerabilities
These risks align with Reason's Swiss Cheese Model and Hollnagel's Safety-I/II framework, illustrating how errors can propagate through complex systems (Reason, 2000; Hollnagel, Wears, & Braithwaite, 2015).
Need for supervision
Variability in LLM performance reinforces the importance of human oversight to ensure safe and effective clinical use (Bedi et al., 2025).
Legal and safety considerations
Beyond clinical risk, automated documentation and decision support raise questions of legal liability, adding a further reason for robust oversight (Carandang et al., 2025).
Governance
Principles for responsible AI
Governance ensures that NLP systems are ethically sound, legally compliant, and socially trustworthy. The World Health Organization (2021) outlines six principles: transparency, accountability, inclusiveness, equity, responsiveness, and sustainability.
Organisational readiness
The HAIRA model provides a five-level framework for readiness, covering leadership, product evaluation, monitoring, and stakeholder engagement (Hussein et al., 2024).
Accountability frameworks
Legal and safety risks from automated note generation reinforce the need for clear governance structures that define responsibility, oversight, and escalation pathways (Novelli et al., 2025).
Evaluation frameworks and expert skepticism
Frameworks like LLM-EVAL and MEDIC assess LLMs across dimensions such as accuracy, safety, bias, and clinical reasoning (Kanithi et al., 2024). Despite these advances, many clinicians remain cautious, citing the lack of transparency and difficulty verifying outputs as barriers to trust (Choudhury & Chaudhry, 2024).
Artificial Intelligence from Examination to Treatment
AI applications
AI-powered Clinical Decision Support Systems (CDSS) now support clinicians across the care continuum, from examination to treatment, by integrating data from EHRs, wearables, and genomics to enable real-time, personalised decision-making.
Examination
Karakasis et al. (2025) report on AI approaches in Atrial Fibrillation (AF) detection, risk stratification, and treatment optimisation, including wearable tech and EHR integration. Junaid et al. (2022) highlight AI's role in vital signs monitoring—both supporting early-stage examination and non-invasive testing.
Pre-screening and triage
NLP-powered chatbots are increasingly used to support patient pre-screening and triage. These systems engage patients in structured dialogues, assess symptoms, and recommend appropriate next steps. For example, NHS England has trialled conversational agents to streamline mental health referrals, reducing administrative burden and improving access to care (NHS England, 2025). These tools raise governance concerns, particularly when generative models produce confident but incorrect recommendations (Hagendorff, 2024).
Testing
AI is transforming diagnostic testing by enhancing both image-based analysis and physiological monitoring. In cancer imaging, AI tools are helping clinicians detect malignancies with greater precision, improving early detection and reducing diagnostic delays (Jaber, 2022). Complementing this, Junaid et al. (2022) demonstrated how wearable sensors and ambient monitoring systems enable real-time tracking of vital signs—such as heart rate and respiration—feeding into AI models that support non-invasive testing. Together, these innovations illustrate how AI is expanding the reach and responsiveness of diagnostic testing across clinical settings.
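As a rough illustration of how streamed vital signs can feed a model, the sketch below flags heart-rate readings that deviate sharply from a rolling baseline. Real monitoring systems are far more sophisticated; the window size, threshold, and simulated readings are arbitrary assumptions.

```python
import statistics

def flag_anomalies(readings, window=10, z_threshold=3.0):
    """Flag readings that deviate strongly from the recent rolling baseline."""
    flags = []
    for i, value in enumerate(readings):
        baseline = readings[max(0, i - window):i]
        if len(baseline) < 3:
            flags.append(False)  # too little history to judge
            continue
        mean = statistics.mean(baseline)
        sd = statistics.stdev(baseline)
        z = (value - mean) / sd if sd > 0 else 0.0
        flags.append(abs(z) > z_threshold)
    return flags

# Simulated heart-rate stream with one abrupt spike (beats per minute).
heart_rate = [72, 74, 71, 73, 75, 72, 74, 118, 73, 72]
print(flag_anomalies(heart_rate))  # only the spike at index 7 is flagged
```

In practice, learned models replace the fixed z-score rule, but the pipeline shape—continuous sensor input, per-reading scoring, alerting on deviation—is the same.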
Diagnosis
AI is increasingly demonstrating its value in clinical diagnosis, often matching or surpassing human performance across specialties like radiology, pathology, and cardiology. At Massachusetts General Hospital and MIT, for instance, AI systems detected lung nodules with 94% accuracy—outperforming radiologists in a key diagnostic task (Topol, 2019). Microsoft's MAI-DxO platform has shown similar promise, achieving high accuracy in complex diagnostic cases while also reducing costs (Kanjee, 2023). These advances reflect a broader trend, as outlined in a critical review of AI in disease diagnostics (Mirbabaie et al., 2021). Commercial tools like Alcidion's Miya platform are already integrating rules-based logic with machine learning to support clinicians in diagnostic decision-making.
In clinical settings, NLP is used to summarise patient histories, extract relevant findings from diagnostic reports, and support decision-making. Van Veen et al. (2024) found that adapted LLMs could match or outperform clinicians in summarising radiology reports and progress notes.
Structured orchestration models such as MAI-DxO go further by mimicking team-based diagnostic reasoning, helping clinicians navigate complex cases with greater confidence. The American Hospital Association has also documented how AI is improving diagnostic workflows and clinical decision-making across health systems (American Hospital Association, 2023).
Clinical summarisation
These systems help clinicians navigate complex cases by distilling large volumes of information into concise, actionable insights—reducing cognitive load and improving efficiency.
Early disease prediction
NLP contributes to early disease prediction by mining unstructured clinical notes for subtle indicators of illness. Goh et al. (2021) developed the SERA algorithm, which achieved high predictive accuracy for sepsis up to 12 hours before onset. When combined with structured data, NLP-enhanced models can improve diagnostic sensitivity and support timely intervention.
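The general idea of fusing structured vitals with signals mined from unstructured notes can be sketched with a hand-weighted score. This is not the SERA algorithm: every term, weight, and threshold below is invented for illustration (the vitals thresholds are loosely inspired by qSOFA-style criteria).

```python
# Stems and weights are invented for illustration; matching on "confus"
# catches "confused" and "confusion" alike.
RISK_TERMS = {"confus": 1.0, "rigors": 1.0, "lethargic": 0.5, "mottled": 1.5}

def sepsis_risk_score(vitals: dict, note: str) -> float:
    """Combine structured vitals with keywords mined from the free-text note."""
    score = 0.0
    # Structured contributions from the vitals record.
    if vitals.get("resp_rate", 0) >= 22:
        score += 1.0
    if vitals.get("systolic_bp", 200) <= 100:
        score += 1.0
    if vitals.get("temp_c", 37.0) >= 38.3:
        score += 0.5
    # Unstructured contributions from the clinical note.
    text = note.lower()
    score += sum(w for term, w in RISK_TERMS.items() if term in text)
    return score

vitals = {"resp_rate": 24, "systolic_bp": 95, "temp_c": 38.6}
note = "Patient increasingly confused overnight, skin mottled."
print(sepsis_risk_score(vitals, note))  # 5.0
```

The point of the sketch is the fusion step: the note contributes evidence (confusion, mottling) that structured fields alone would miss, which is exactly what gives NLP-enhanced models their added sensitivity.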
Treatment
AI is increasingly being used to personalise treatment by analysing a wide range of patient-specific data, including genetic profiles, lifestyle factors, and clinical history. This is especially impactful in cancer care and chronic disease management, where tailored interventions can significantly improve outcomes. Tools like DeepMind and IBM Watson are at the forefront of this transformation, accelerating drug discovery and enabling precision medicine through advanced data modelling (Bhinder et al., 2021). By identifying which therapies are most likely to be effective for individual patients, AI supports more targeted, efficient, and responsive treatment planning.
Challenges
Despite AI's growing impact across care phases, challenges in data, ethics, and governance continue to shape its safe deployment.
Data
AI systems supporting examination, testing, diagnosis, and treatment rely on high-quality, multimodal data including imaging, sensor outputs, and clinical records. In diagnostic testing, for example, wearable sensors and ambient monitors feed real-time physiological data into AI models (Junaid et al., 2022). Cancer imaging tools require annotated datasets with consistent resolution and metadata to ensure accurate detection (Jaber, 2022). Diagnostic platforms like MAI-DxO integrate structured and unstructured data to support decision-making, but fragmented or biased inputs can compromise performance (Chen et al., 2023).
Ethics
Ethical concerns in these phases include explainability, bias, and consent. AI systems used in diagnosis often generate probabilistic predictions that must be communicated carefully to avoid undue anxiety or false reassurance (Ehsan and Riedl, 2024). Consent for secondary data use is especially relevant in treatment planning, where genomic and lifestyle data may be repurposed for AI model development (Bhinder et al., 2021). Bias in training data can reinforce disparities in diagnostic accuracy and treatment access, particularly for underrepresented populations (Mirbabaie et al., 2021).
Safety
AI systems must be designed with patient safety as a core priority. Risks include inaccurate predictions, alert fatigue, and over-reliance on automation. These can undermine trust and lead to unintended harm if not properly mitigated. For example, AI-generated clinical notes have been shown to include fabricated or misleading information, which can be dangerous in medical contexts (Feske-Kirby, Shojania, & McGaffigan, 2024). The same authors suggest that health systems should implement safety programs that include risk identification, mitigation strategies, and workforce readiness to ensure AI augments rather than replaces human judgment.
Governance
Governance frameworks must ensure safe integration of AI into clinical workflows. Structured orchestration models like MAI-DxO support real-time auditability and clinician feedback loops in diagnostic settings (The AI Track Team, 2025). Regulatory bodies increasingly require explainable AI for treatment planning and diagnostic support, with models like HAIRA offering maturity assessments for organizational readiness (Hussein et al., 2024). The WHO's six principles (transparency, accountability, inclusiveness, equity, responsiveness, and sustainability) remain essential for guiding AI deployment across these phases (World Health Organization, 2021).
Artificial Intelligence and Patient Monitoring
AI applications
AI is advancing continuous patient monitoring, enabling personalised, predictive care through real-time data analysis and adaptive remote systems.
Vital signs
ML and DL systems have been applied to predict patient outcomes, monitor vital signs, and personalise care. Junaid et al. (2023) reviewed AI applications in cardiovascular diagnostics and vital signs monitoring, highlighting models such as classification algorithms, stacked autoencoders, and real-time cloud-based systems for diabetes and outpatient care (Junaid et al., 2020).
Remote monitoring
AI-enabled remote patient monitoring (RPM) systems have emerged as transformative tools. These architectures can detect early deterioration in patient health, personalise monitoring using federated learning, and adapt to individual behaviour patterns through reinforcement learning (Shaik et al., 2023). Generative AI is expected to enhance early warning systems and enable natural language interaction between patients and care teams, improving postoperative care and communication.
NLP also plays a role in remote patient monitoring, particularly in postoperative care and chronic disease management. Integrated with wearable sensors and cloud-based platforms, NLP systems can interpret patient-reported symptoms and support real-time communication between patients and care teams. These tools help personalise recovery plans and detect early signs of deterioration, complementing physiological monitoring systems (Shaik et al., 2023).
Challenges
Despite AI's growing impact, challenges in data, ethics, and governance continue to shape its safe deployment in postoperative care.
Data
AI systems for postoperative care and vital signs tracking depend on continuous, high-quality data streams from wearable sensors, cloud platforms, and clinical records. However, inconsistencies in data formats, limited interoperability, and gaps in annotation can compromise model accuracy and reliability (Junaid et al., 2023). Real-time monitoring platforms must also manage large volumes of physiological data, which can be noisy or incomplete, especially in remote settings (Shaik et al., 2023).
Ethics
Ethical concerns include patient consent, data privacy, and algorithmic bias. Remote monitoring systems often collect sensitive health data continuously, raising questions about informed consent and secondary data use. Bias in training data can lead to unequal performance across patient populations, particularly in predictive models used for postoperative risk assessment (Shaik et al., 2023). Ensuring transparency in how AI systems interpret and act on vital signs is essential to maintain trust and safety.
Safety
AI systems in patient monitoring must be designed with safety as a core priority. Poorly calibrated models can miss early signs of deterioration or trigger false alarms, leading to delayed care or unnecessary interventions. A widely used sepsis detection model, for example, failed to identify over 1,700 cases, resulting in delayed treatment and patient harm (Ratwani et al., 2024). To mitigate such risks, safety programs should include model validation, clinician oversight, and continuous performance monitoring. Trust, usability, and the preservation of the doctor-patient relationship are also essential for safe adoption (Godoy Junior et al., 2024).
Governance
Governance frameworks must evolve to support AI integration into postoperative workflows. This includes lifecycle oversight of AI-enabled RPM systems, from model validation to post-deployment auditing. Regulatory bodies increasingly require explainable AI and robust data governance, especially when systems influence clinical decisions remotely. Standards for federated learning and reinforcement learning in patient monitoring are still emerging and require multidisciplinary collaboration (Shaik et al., 2023).
Artificial Intelligence and Radiology
AI applications
Diagnostics
Hosny et al. (2018) provide a comprehensive overview of the integration of artificial intelligence into radiological practice, highlighting both its current applications and future potential. AI technologies, particularly machine learning and deep learning, are being used to enhance diagnostic accuracy by identifying patterns in medical images that may be difficult for the human eye to detect. These tools also streamline workflows by automating routine tasks, prioritizing urgent cases, and supporting clinical decision-making. This integration not only improves efficiency but also supports more responsive and equitable care. By reducing diagnostic delays and errors, AI can help radiologists deliver timely, compassionate interventions that build trust and improve outcomes.
Radiomics
A key concept emerging in this context is "radiomics," which refers to the extraction of a large number of quantitative features from medical images using data characterisation algorithms. This approach enables more nuanced decision support and has the potential to reveal disease characteristics that may not be visible to the human eye.
Van Timmeren et al. (2020) demonstrated how mathematical quantification and spatial analysis of pixel data can uncover findings imperceptible to human observers, thereby enhancing diagnostic precision. Their study outlines the patient journey from image acquisition and segmentation to the extraction of radiomic features, which are then analysed using statistical modelling and machine learning to support disease classification, patient clustering, and risk stratification.
Radiomics exemplifies how AI can support personalised and predictive healthcare—but it must be deployed with creativity, curiosity, and civility to ensure that these insights are used ethically and inclusively.
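To make radiomic feature extraction concrete, the sketch below computes a few first-order features (mean, variance, intensity entropy) from a toy pixel patch. Real radiomics pipelines extract hundreds of standardised features from segmented regions of actual scans; the patch values here are made up.

```python
import math
from collections import Counter

def first_order_features(patch):
    """Compute a few first-order radiomic features from a 2D grayscale patch."""
    values = [v for row in patch for v in row]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # Shannon entropy over the discrete intensity histogram.
    counts = Counter(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}

# Toy 4x4 patch with a darker region and a brighter region.
patch = [
    [10, 10, 12, 40],
    [10, 11, 42, 44],
    [12, 40, 45, 44],
    [41, 44, 43, 12],
]
print(first_order_features(patch))
```

Features like these, computed over a segmented tumour region rather than a toy grid, are the inputs that downstream statistical models and machine learning classifiers use for disease classification and risk stratification.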
Challenges
Data
In radiology, the effectiveness of AI hinges on the precision and consistency of imaging data. Unlike population health, where data fragmentation and outdated records are key concerns, radiology faces unique challenges in image annotation and standardization. High-resolution scans must be paired with accurate clinical metadata. Even minor inconsistencies in segmentation or preprocessing can compromise model performance (Royal College of Radiologists, 2025; Royal Australian and New Zealand College of Radiologists, 2019).
Radiomics, in particular, demands rigorous labelling protocols. The extraction of quantitative features from pixel-level data requires not just clean images, but methodologically consistent acquisition and annotation standards. Without global harmonization, radiomic models risk being non-reproducible across institutions, undermining both diagnostic reliability and clinical trust (Cobo et al., 2023).
Ethics
While ethical concerns in population health often center on surveillance and consent, radiology introduces distinct dilemmas around interpretability and responsibility. AI models may detect features imperceptible to human observers, raising questions about how clinicians should communicate findings and make decisions based on algorithmic outputs (Hosny et al., 2018).
In radiomics, predictive modelling can influence treatment pathways, yet the underlying logic may be opaque. This challenges traditional notions of informed consent and clinician accountability. Ethical governance must ensure that AI augments—not replaces—human judgment, and that patients understand how machine-derived insights inform their care (Van Timmeren et al., 2020; Morley et al., 2020).
Safety
AI safety in radiology requires a lifecycle approach that spans pre-deployment validation, production guardrails, and post-deployment monitoring. The RAISE framework outlines how radiology-specific AI systems must meet rigorous standards of safety, effectiveness, and fairness before clinical use. Input/output guardrails and continuous performance tracking help identify failures and data drift, while clinician education ensures responsible use of algorithmic findings. Trust in radiological AI is earned through transparent, multi-level quality assurance across regulatory, clinical, and ethical domains (Cardoso et al., 2023).
Governance
Radiology governance must address not only fairness and transparency but also clinical integration and interdisciplinary collaboration. Unlike population-level systems, radiological AI tools operate in high-stakes, individualized contexts. Governance frameworks should support real-time auditability, clinician feedback loops, and continuous validation against evolving clinical standards (Chen et al., 2021), allowing clinicians to focus on the emotional and relational dimensions of care. This requires governance models that go beyond technical compliance to foster trust, inclusion, and professional integrity (WHO, 2021; Hussein et al., 2024).
Artificial Intelligence and Pathology
Tissue preparation and digitization
Before AI can be applied in pathology, a detailed tissue preparation and digitization process must take place.
Biopsy
It begins with the collection of a biopsy from the patient.
Tissue block
This is fixed and embedded in paraffin to create a tissue block.
Sectioning
This block is subsequently sliced into thin sections.
Mounting and staining
The slices are placed on glass slides and treated with special stains to highlight cellular structures.
Scanner
Finally, the stained slides are scanned using a specialized slide scanner.
High-resolution digital images
This produces high-resolution digital images that can be analyzed by AI systems.
(Acs et al., 2020)
AI applications
Applying AI
Once pathology slides are digitized, AI techniques, particularly convolutional neural networks (CNNs), can be applied to analyze the images. The process begins by dividing the hematoxylin and eosin (H&E) stained image into smaller patches.
Convolutional Neural Network (CNN)
These patches are then processed through multiple convolutional layers within the CNN, where each layer applies local transformations to extract relevant features.
Neuron linking
Following this, fully connected layers integrate the information by linking every neuron in one layer to every neuron in the next.
Processing
Each neuron calculates an output by applying a set of weights and a bias to the inputs from the previous layer.
New representation
Through this layered processing and with sufficient training data, CNNs can generate new representations of the image, enabling them to assess routine diagnostic features or identify additional insights into disease
(Acs et al., 2020)
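The layered processing described above can be sketched in miniature: one convolutional filter with a ReLU activation, followed by a single fully connected neuron. The filter, weights, bias, and toy patch are invented for illustration; real CNNs learn these values from training data and stack many such layers.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most DL libraries) + ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = sum(image[i + di][j + dj] * kernel[di][dj]
                      for di in range(kh) for dj in range(kw))
            row.append(max(acc, 0.0))  # ReLU activation
        out.append(row)
    return out

def dense(inputs, weights, bias):
    """A single fully connected neuron: weighted sum of inputs plus a bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Toy 4x4 "patch" with a bright square, and a vertical-edge detection kernel.
patch = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
feature_map = conv2d(patch, kernel)             # the learned "representation"
flat = [v for row in feature_map for v in row]  # flatten before the dense layer
score = dense(flat, weights=[0.5] * len(flat), bias=0.1)
print(score)  # 2.1
```

The filter responds only where the patch has a vertical edge, and the dense neuron aggregates those responses into a single score, which is the same conv → activation → fully connected pattern used at scale for diagnostic classification.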
Foundation models now enable broader generalization across cancer types and reduce the need for task-specific retraining (Verma et al., 2025).
Integrating AI workflows
Successful AI adoption depends on safe and efficient integration into clinical workflows. NHS studies such as PathLAKE Plus have explored AI's impact on diagnostic efficiency and decision-making (Royal College of Pathologists, 2023). Key recommendations include:
Training
Providing task-sensitive training for pathology teams
Integration
Ensuring compatibility with NHS systems (e.g. Sectra, Indica Labs, Philips)
Reflection
Allowing time for reflection on the evolving roles and responsibilities of users.
Workflow constraints persist
Equipment
Reliance on 2D scanning for tasks that require 3D imaging
Labelling
The need for extensive labelled training data.
However, once deployed, AI models offer consistent performance and can significantly reduce diagnostic variability (Drogt et al., 2022).
Challenges
Data
AI systems in pathology depend on high-quality, annotated datasets derived from histological slides. Inconsistent tissue preparation, staining variability, and scanner differences can compromise model performance (Acs et al., 2020). Annotation remains a bottleneck, requiring expert input and specialized platforms such as QuPath and Cytomine (Drogt et al., 2022). The rise of unsupervised foundation models offers promise in reducing reliance on labelled data, but their effectiveness still hinges on robust dataset design and validation (Verma et al., 2025).
Ethics
Ethical concerns in pathology AI include bias in training data, lack of explainability, and the potential erosion of professional autonomy. Several frameworks and guidance documents set standards that are critical when AI systems interpret sensitive diagnostic data or assist in clinical decision-making:
- UK Conformity Assessed (UKCA) mark (Medicines and Healthcare products Regulatory Agency, 2023)
- Conformité Européenne In Vitro Diagnostic Regulation (CE-IVD) (European Parliament and Council of the European Union, 2017)
- Medicines and Healthcare products Regulatory Agency (MHRA) guidance on AI as a Medical Device (AIaMD) (Medicines and Healthcare products Regulatory Agency, 2024)
In essence they focus on transparency, safety, and human oversight (Royal College of Pathologists, 2023). The WHO further stresses inclusive stakeholder engagement and external validation to ensure equitable deployment (World Health Organization, 2021).
Safety
AI safety in pathology requires rigorous validation, continuous monitoring, and clinician oversight. Errors in tissue classification or biomarker scoring can lead to misdiagnosis or inappropriate treatment. The FDA and European Commission both emphasize lifecycle safety, including pre-market testing, post-market surveillance, and adaptive algorithm regulation (FDA, 2023; European Commission, 2024). NHS guidance also highlights the importance of human-in-the-loop systems and external validation to ensure safe deployment in clinical settings (Royal College of Pathologists, 2023).
Governance
Governance models are evolving to support safe AI integration in pathology. The UK Medicines and Healthcare products Regulatory Agency's (MHRA) roadmap for software and AI as medical devices includes lifecycle regulation, from qualification to post-market surveillance, with attention to adaptive algorithms and transparency (Royal College of Pathologists, 2023). Globally, the WHO recommends mandatory post-release auditing and impact assessments for AI tools used in healthcare (World Health Organization, 2021). The HAIRA model introduces a five-level maturity framework tailored to organizational readiness, offering scalable governance pathways for institutions adopting AI in pathology (Hussein et al., 2024).
Artificial Intelligence and Population Health
AI applications
Prediction
AI is transforming healthcare by enabling more accurate and timely predictions. Algorithms can identify individuals at high risk for developing Type 1 diabetes, allowing for early intervention and potentially preventing complications such as diabetic ketoacidosis (Alowais et al., 2023). Deep learning models have shown impressive accuracy in detecting diabetic retinopathy from retinal images, supporting earlier diagnosis and treatment (Alowais et al., 2023).
Predictive analytics also help forecast hospitalisation and readmission risks based on ambulatory care sensitive conditions, giving healthcare providers the ability to plan ahead and manage patient flow more effectively (American Psychological Association, 2025). In critical care, AI is being used to anticipate the onset and progression of sepsis, a complex and life-threatening condition (Alowais et al., 2023). On a broader scale, predictive models can forecast disease outbreaks and personalise treatment plans, helping clinicians tailor care to individual needs and improving overall patient outcomes (American Psychological Association, 2025).
Optimisation
AI doesn't just predict—it also helps optimise healthcare systems and clinical decision-making. By identifying patients at risk for conditions like diabetic ketoacidosis, AI enables timely preventative action that reduces emergency interventions and improves long-term outcomes (Alowais et al., 2023). In managing sepsis, predictive algorithms guide clinicians toward the most effective treatment strategies, improving survival rates and reducing unnecessary interventions (Alowais et al., 2023).
AI also plays a key role in optimising resource allocation for population health planning. By assessing hospitalisation and readmission risks, healthcare systems can distribute staff, beds, and equipment more efficiently, ensuring that resources are available where they're needed most (American Psychological Association, 2025). These data-driven insights support better decision-making, reduce operational costs, and enhance the overall quality of care. Ultimately, AI enables a more proactive, efficient, and patient-centred approach to healthcare delivery (Alowais et al., 2023).
Challenges
AI in population health depends on high-quality data, ethical oversight, and robust governance to ensure safe, equitable, and effective outcomes at scale.
Data
In population health, data is the foundation for AI-driven insights. High-quality data is essential, as poor tagging, fragmentation, or outdated information can lead to misdiagnoses and inequitable outcomes (Fisher & Rosella, 2022; Federal Register, n.d.). These risks are especially critical when AI systems inform public health decisions or allocate resources.
Population health AI relies on a rich ecosystem of direct and indirect data sources—each contributing to a more connected, compassionate, and proactive approach to care. Direct data sources provide real-time, measurable data crucial for accurate health monitoring and intervention. These include wearable devices that track physiological data (Fisher & Rosella, 2022), electronic health records (Ze Luan et al., 2023), medical claims, disease registries and environmental sensors. Indirect sources include social media posts, web searches, news media, flight data, government reports, academic literature, and global health repositories like WHO's Global Health Observatory.
One notable indirect data source is SENTINEL, which uses natural language processing and neural network algorithms to analyse daily tweets for real-time disease prediction (Șerban et al., 2019). This system has successfully predicted disease occurrences and identified potential outbreaks. Other AI models have been used to investigate foodborne illness outbreaks and improve COVID-19 case tracking (Greene et al., 2021). Natural Language Processing (NLP) adds further complexity. Models trained on biased or non-representative data may misinterpret dialects, cultural expressions, and minority health concerns, distorting insights and reducing effectiveness in vulnerable populations. This is particularly concerning in surveillance systems where NLP supports real-time interventions (Au Yeung et al., 2024).
Ethics
Ethical considerations are central to the responsible use of AI in population health. Aggregated data may protect individual identities, but indirect sources such as social media raise concerns around consent, surveillance, and data ownership (Șerban et al., 2019). These issues are amplified at scale, where AI systems influence public health decisions and resource allocation. Bias in training data can reinforce existing health disparities, particularly when models are applied across diverse populations without adequate safeguards (Fisher & Rosella, 2022).
Safety
AI systems in population health must be designed and deployed with patient safety as a core priority. While AI offers powerful tools for clinical decision support, documentation, and patient engagement, it also introduces risks such as inaccurate predictions, alert fatigue, and over-reliance on automation. These risks can undermine trust and lead to unintended harm if not properly mitigated.
Safety concerns include misalignment between AI model goals and clinical safety objectives, poor integration into workflows, and biased training data that may exacerbate disparities. For example, AI-generated clinical notes have been shown to include fabricated or misleading information, which can be particularly dangerous in medical contexts (Feske-Kirby et al., 2024). To address these risks, healthcare organizations should implement robust safety programs that include risk identification, mitigation strategies, and continuous monitoring. These programs must also consider workforce readiness and ensure that AI augments rather than replaces human judgment (Kumah, 2025).
Governance
Governance models such as HAIRA (Hussein et al., 2024) and the WHO's six principles (WHO, 2021) are well suited to population health because they support scalable oversight, multidisciplinary collaboration, and equity-focused evaluation, all key for managing diverse data sources and public-facing AI systems. These models also ensure public trust by promoting transparency in decision-making and enabling retrospective review of deployed tools (Hussein et al., 2024; Chen et al., 2021). Importantly, ethical governance must address health equity, ensuring that AI systems do not reinforce disparities through biased data or opaque algorithms (Fisher & Rosella, 2022). By embedding these principles, governance models help align technological innovation with the broader mission of improving health outcomes for entire populations.
Artificial Intelligence and Precision Medicine
AI Applications
AI has a number of use cases in precision medicine. From tailoring treatments to individual patients to accelerating drug discovery, AI technologies are reshaping clinical decision-making, improving outcomes, and supporting earlier interventions through advanced data analysis, predictive modelling, and genomic insights.
Personalised treatment plans
AI algorithms can rapidly examine patient data, including genetic information, medical histories, and lifestyle factors, and use that information to support the development of personalised treatment plans. This approach aims to tailor treatments to an individual's unique genetic makeup and specific health conditions, with the potential to improve clinical outcomes.
Predictive analytics
AI's predictive analytics capabilities allow for early detection of diseases and prediction of disease progression. By identifying patterns and risk factors in patient data, AI can forecast the likelihood of conditions such as cancer, diabetes, and cardiovascular diseases, enabling earlier and potentially more effective intervention.
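As a concrete sketch of how such risk prediction might look in practice, the toy logistic model below scores patients and flags those above a risk threshold for early intervention. The feature names, weights, intercept, and threshold are all hypothetical assumptions for illustration, not a validated clinical model.

```python
import math

# Illustrative only: a toy logistic risk model. Feature names, weights,
# and the intercept are hypothetical, not from a validated clinical model.
WEIGHTS = {"age": 0.04, "bmi": 0.08, "systolic_bp": 0.02, "smoker": 0.9}
INTERCEPT = -7.0

def risk_score(patient: dict) -> float:
    """Predicted probability (0..1) of developing the condition."""
    z = INTERCEPT + sum(w * patient[name] for name, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))

def flag_for_intervention(patients, threshold=0.5):
    """Return the ids of patients whose predicted risk crosses the threshold."""
    return [p["id"] for p in patients if risk_score(p) >= threshold]

patients = [
    {"id": "A", "age": 44, "bmi": 23, "systolic_bp": 118, "smoker": 0},
    {"id": "B", "age": 67, "bmi": 31, "systolic_bp": 150, "smoker": 1},
]
print(flag_for_intervention(patients))  # → ['B']
```

In a real system the weights would be learned from historical patient data and rigorously validated; the threshold would be chosen to balance missed cases against alert fatigue.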
Genomic analysis
AI is accelerating genomic analysis by enabling rapid statistical evaluation of complex genetic data. In cancer treatment, for example, AI can help detect mutations that may inform the design of clinical trials and identify patient subgroups likely to benefit from targeted therapies. This can reduce the risk of poor clinical outcomes, particularly in cases where overall response rates are low but certain patients may experience significant benefit (Bhinder et al., 2021). AI algorithms may also assist in predicting drug responses based on genetic or other personalised data. This topic will be explored further in the context of digital twins.
Drug discovery and development
AI is contributing to drug discovery by identifying potential drug candidates and predicting their efficacy and safety. Machine learning algorithms can analyse biological data to discover new drug targets and design compounds with a higher likelihood of success in clinical trials. This may reduce the time and cost associated with bringing new medications to market. A recent example is AlphaFold 3, an enhanced version of the AlphaFold algorithm, which is capable of joint structure prediction for complexes including proteins, nucleic acids, small molecules, ions, and modified residues. It has demonstrated improved accuracy over many previous specialised tools (Abramson et al., 2024).
Challenges
Data
Precision medicine introduces significant data-related challenges. Integrating multi-omic datasets (genomic, proteomic, metabolomic) is complex due to their size, heterogeneity, and inconsistent annotation, which limits reproducibility and clinical applicability (Bhinder et al., 2021). Population bias is another concern: genomic datasets often overrepresent individuals of European ancestry, leading to variant misclassification and unequal access to personalised therapies (Chen et al., 2021). Additionally, fragmented data systems and proprietary formats hinder interoperability and collaborative research across institutions (Lekadir et al., 2025).
Ethics
Ethical challenges in precision medicine are equally pressing. Consent for secondary use of data is often inadequate, especially when datasets are repurposed for research or AI model development. Traditional consent models may not account for future applications, raising concerns about autonomy and transparency (Hussein et al., 2024). Moreover, AI systems in precision medicine frequently generate probabilistic predictions about disease risk, which must be communicated carefully to avoid causing undue anxiety or false reassurance (Cross et al., 2024). Equity in access also poses a significant ethical dilemma, as personalised treatments and AI-driven diagnostics are often expensive and limited to well-resourced settings, potentially widening health disparities (World Health Organization, 2021).
Safety
AI safety in precision medicine requires robust safeguards across the entire lifecycle of model development and deployment. Poorly validated algorithms can misclassify genetic variants or overfit to biased datasets, leading to inappropriate treatment recommendations. A widely used sepsis detection model, for example, failed to identify over 1,700 cases, resulting in delayed care and patient harm (Ratwani et al., 2024). To mitigate such risks, safety programs must include pre-deployment testing, post-market surveillance, and clinician oversight. President Biden's 2023 executive order calls for a national AI safety framework to track clinical errors and disseminate guidelines for safe implementation in healthcare settings (Ratwani et al., 2024).
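The pre-deployment testing described above can start with simple performance checks, such as a model's sensitivity (recall): the share of true cases it actually detects. The sketch below uses hypothetical validation counts, not the figures from the cited sepsis study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Recall: the proportion of actual cases the model detected."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical validation counts for illustration only.
tp, fn = 1100, 1700
print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # → sensitivity = 0.39
```

A sensitivity this low would indicate the model misses most cases, exactly the kind of failure that pre-deployment testing and post-market surveillance are designed to catch.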
Governance
Governance frameworks must evolve to address the unique demands of precision medicine. Regulatory bodies increasingly require explainable AI models, particularly when used to stratify patients or guide clinical trial design (Reddy et al., 2020). Cross-border data governance presents additional complexity, as international collaborations must navigate differing legal frameworks such as GDPR and HIPAA (Chen et al., 2021). Finally, accountability remains a critical issue: when AI systems recommend targeted therapies or flag genetic risks, it is often unclear who bears responsibility: clinicians, developers, or institutions. Governance models must clarify roles and ensure shared responsibility for AI-driven decisions (Hussein et al., 2024).
Artificial Intelligence and Digital Twins
Use cases
Healthcare
A digital twin is a dynamic digital representation of a physical system that is continuously updated with real-world data throughout the system's life cycle (Madni, Madni & Lucero, 2019). In healthcare, this concept has evolved to include virtual replicas of patients, organs, medical devices, and clinical workflows. These models integrate data from sources such as electronic health records (EHRs), medical imaging, genomics, sensors, and IoT devices to create a comprehensive, real-time simulation (Katsoulakis et al., 2024).
Digital twins are used to:
Identify those at risk
Identify high-risk individuals or groups for early intervention.
Support personalized care
Support personalized treatment planning based on individual characteristics and real-time data.
Prediction
Predict disease risks and treatment outcomes.
Simulate scenarios
Simulate clinical scenarios, such as disease progression or responses to alternative treatments, before acting in the real world.
Optimizing design
Optimize medical device design before physical prototyping.
These applications reflect a shift toward precision care that is also personal, where AI-powered simulations help clinicians understand not just what is happening, but who it is happening to.
Training
Building on their use in clinical modelling, digital twins also serve as powerful tools for simulation-based training. They allow healthcare professionals to practice a wide range of procedures, from catheter insertions to ultrasound-guided interventions, in a realistic, risk-free environment (Vallée, 2023). This supports skill development and confidence building, particularly for complex or high-stakes procedures.
Emergency training
Digital twins are especially valuable in emergency training. They can simulate critical scenarios such as cardiac arrests or trauma cases, enabling teams to rehearse responses and improve coordination.
Complex scenarios
Additionally, they support the simulation of complex clinical cases, allowing practitioners to refine diagnostic reasoning and explore treatment options using realistic patient data (Weile et al., 2021; Moyaux et al., 2023).
Haptic feedback
Some training platforms now integrate haptic feedback with digital twin simulations, offering tactile realism that enhances procedural learning (Adibi et al., 2024).
AI applications
AI-powered digital twins enhance decision-making by simulating outcomes, detecting anomalies, and optimizing clinical and operational healthcare processes.
Decision making
The integration of AI with digital twin technology enhances its ability to process and interpret large volumes of heterogeneous data. AI techniques such as machine learning and deep learning enable digital twins to support more accurate diagnosis, treatment planning, and predictive modelling.
Cancer care
In cancer care, for example, AI-powered digital twins can simulate treatment responses across multiple virtual instances of a patient, helping clinicians identify the most effective intervention. This approach supports more precise and personalized care (Kaul et al., 2023).
Accuracy
AI also plays a role in maintaining the accuracy of digital twins over time by detecting anomalies in incoming data streams and triggering model recalibration when needed (Saratkar et al., 2025).
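One simple way such anomaly detection might work is a z-score check of each incoming reading against recent history, flagging values that warrant model recalibration. This is an illustrative sketch under assumed data and thresholds, not a description of any particular product.

```python
from statistics import mean, stdev

def needs_recalibration(history, new_value, z_threshold=3.0):
    """Flag an incoming reading as anomalous if it lies more than
    z_threshold standard deviations from the recent history."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return new_value != mu
    return abs(new_value - mu) / sigma > z_threshold

# Hypothetical heart-rate stream feeding a patient's digital twin.
heart_rate_history = [72, 75, 71, 74, 73, 76, 72, 74]
print(needs_recalibration(heart_rate_history, 73))   # → False
print(needs_recalibration(heart_rate_history, 130))  # → True
```

Production systems would use more robust methods (rolling windows, learned baselines, multivariate checks), but the principle is the same: detect when incoming data no longer matches the model's expectations and trigger a review.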
Operations
Beyond clinical care, digital twins are now applied to pharmaceutical supply chains and risk management. They help predict equipment failures, simulate disruptions, and optimize logistics in real time (Saratkar et al., 2025).
Operational decision-making
They also support operational decision-making, such as predicting equipment failures or optimizing resource allocation. For example, digital twins are increasingly used to simulate hospital infrastructure, including equipment performance, staff workflows, and energy systems. Institutions such as Stanford and UCSF have adopted these models to support predictive maintenance, resource optimization, and construction planning (Vallée, 2023).
These integrations show how AI can be a partner in care, not just a processor of data. When designed ethically, AI-powered twins help clinicians make decisions that are not just faster, but fairer and better informed.
Generative AI
Recent developments have seen the integration of generative AI into digital twin systems, enabling the creation of adaptive, prescriptive models that evolve with patient data and clinical context. These models support real-time decision-making, surgical planning, and drug discovery, moving beyond static simulations (Kamel Boulos & Zhang, 2021; Bordukova et al., 2024).
Simulate outcomes
For instance, generative AI can simulate multiple clinical outcomes for a given patient profile, allowing clinicians to explore a range of interventions before selecting the most effective one. This capability enhances precision medicine and supports proactive care planning (Elhaddad & Hamam, 2024).
Synthetic data
Moreover, generative models are being used to create synthetic patient data that can train digital twins without compromising patient privacy. This approach supports the development of robust, generalisable models while addressing ethical concerns around data sharing (van Velzen et al., 2025).
Challenges
Despite their promise, digital twins in healthcare face several critical challenges that must be addressed to ensure safe, equitable, and effective implementation.
Data
Digital twins rely on continuous, high-quality data streams to maintain accuracy. However, many systems suffer from fragmented data sources, inconsistent formats, and limited interoperability (Vallée, 2023). The reliability of AI-driven simulations is often undermined by opaque algorithmic processes and unverifiable data inputs (Topol, 2019). These issues complicate clinical validation and hinder trust in digital twin outputs.
Ethics
Ethical concerns surrounding digital twins are multifaceted. Key issues include patient autonomy, informed consent, and the risk of algorithmic bias (Morley et al., 2020). The concept of "representation" is central: how accurately and fairly a digital twin reflects the individual it simulates (Floridi et al., 2018; Morley et al., 2020). There is also concern that digital twins may exacerbate existing social inequalities if not designed inclusively. Dynamic consent models are being explored to give individuals more control over their digital representations (Kalkman et al., 2022).
Safety
Digital twins introduce new dimensions of clinical risk that traditional safety frameworks may not fully address. Errors in simulation models, whether due to flawed data, incorrect assumptions, or software bugs, can lead to inappropriate clinical decisions if not properly validated (Topol, 2019). The dynamic and adaptive nature of digital twins also complicates traceability and root cause analysis when adverse events occur. Moreover, over-reliance on digital simulations may lead to automation bias, where clinicians defer to model outputs even when they conflict with clinical judgment (Morley et al., 2020). Ensuring safety requires rigorous validation protocols, continuous monitoring, and clear accountability structures for both developers and clinical users.
Governance
Governance frameworks for digital twins are still emerging. Regulatory lag is a major issue, with current medical device laws often failing to account for the complexity and adaptability of digital twin systems (Morley et al., 2020). There is a pressing need for standardized protocols, ethical oversight, and cross-sector collaboration to guide development and deployment (Vallée, 2023).
Artificial Intelligence and Healthcare Workflows
Hospital wide process model
Ahlin, Almström and Wänström (2022) present a hospital-wide process model that maps the typical flow of patients through departments such as the emergency department (ED), intensive care unit (ICU), preoperative unit, operating room (OR), and post-anaesthesia care unit (PACU). The model identifies common throughput barriers—points where inefficiencies or delays can disrupt care and increase operational costs.

The hospital-wide process model (ED = emergency department, ICU = intensive care unit, Pre-Op = preoperative unit, OR = operating room, PACU = post-anaesthesia care unit).
Given the complexity and cost of hospital operations, process optimization is a key priority. AI offers a range of tools that can assist in improving efficiency and supporting both clinical and administrative decision making.
The table below outlines key steps in administrative and clinical workflows, along with examples of how AI can assist. These applications range from predictive analytics and natural language processing (NLP) to robotic process automation (RPA) and clinical decision support systems (CDSS).
Administrative step | Clinical step | Potential AI assistance |
Resource allocation | | ML models to predict resource needs ahead of seasonal or emergency events and optimize the allocation of medical staff and equipment. |
Patient registration | Medical history taking | AI-driven chatbots for initial patient information collection, reducing wait times and administrative burden. |
Appointment scheduling | | AI scheduling systems to coordinate appointments and ensure continuity of care after discharge. |
Patient engagement | | AI-based personalised education programs to inform patients about their health conditions and treatment options. Sentiment analysis on patient feedback to improve service quality and patient satisfaction. |
Insurance verification (not relevant in some countries) | | Automated verification systems to quickly validate insurance coverage and streamline the billing process. |
 | Initial physical examination and diagnosis, diagnostic testing (e.g., labs, imaging) and treatment planning | CDSS to support diagnosis; AI RPA to speed up test ordering and result delivery. |
Room allocation | | AI algorithms to optimise room allocation based on patient needs and hospital capacity. |
 | Treatment and medication management | AI-assisted treatment and AI for medication optimisation according to clinical guidance and administrative parameters. |
Surgical scheduling | Surgical procedures | AI-optimised scheduling and AI for surgical training. |
 | Postoperative care, monitoring and vital sign tracking | AI-driven monitoring systems for real-time vital sign tracking and alerting clinicians to anomalies. |
Record keeping | Clinical documentation | NLP for clinical documentation and record keeping. |
Discharge planning | Discharge planning | Predictive analytics to identify patients at risk of readmission and tailor discharge plans accordingly. |
 | Rehabilitation and recovery | AI-assisted rehabilitation and recovery tools and systems. |
Billing and coding, data entry, reporting and compliance | | AI-powered tools for accurate medical coding and billing, reducing errors and speeding up reimbursement. Automated data entry and EHR management to maintain accurate patient records. |
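As a toy illustration of the scheduling and room-allocation ideas in the table above, the sketch below applies the classical interval-partitioning heuristic: sort appointments by start time and assign each to the first room that becomes free. It is a simplified sketch, not any vendor's algorithm, and the appointment times are hypothetical.

```python
import heapq

def schedule(appointments, n_rooms):
    """Assign each (start, end) appointment, in start-time order, to the
    first free room. Returns {room_id: [appointments]}; raises if no room
    is free in time."""
    rooms = [(0, i) for i in range(n_rooms)]  # (free_at, room_id) min-heap
    heapq.heapify(rooms)
    assignment = {i: [] for i in range(n_rooms)}
    for start, end in sorted(appointments):
        free_at, room = heapq.heappop(rooms)  # room that frees up earliest
        if start < free_at:
            raise ValueError("not enough rooms for this schedule")
        assignment[room].append((start, end))
        heapq.heappush(rooms, (end, room))
    return assignment

# Times in minutes from clinic opening.
appts = [(0, 30), (10, 40), (35, 60), (45, 75)]
print(schedule(appts, 2))
```

Real AI scheduling systems layer predicted no-show rates, staff availability, and urgency on top of this kind of combinatorial core, which is why the table pairs scheduling with predictive analytics.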
Artificial Intelligence and Patient Registration
AI applications
AI is beginning to play a role in improving the patient registration and scheduling experience. While peer-reviewed evidence is still emerging, vendors and early adopters report benefits.
Identifying issues
Authenticx (2024) claims that AI can help detect patterns in appointment scheduling, such as frequent conflicts or bottlenecks, and suggest targeted improvements to reduce delays and enhance patient satisfaction.
Scheduling
According to Authenticx (2024), AI systems can analyze interactions and provide real-time feedback to staff, helping them implement best practices and optimize access to care.
Automation
AI tools now analyse patient history and provider availability to streamline scheduling and reduce administrative burden. According to Hyro (n.d.), this extends to automating prior authorizations and predicting optimal appointment times.
Challenges
While AI offers clear benefits in streamlining registration and scheduling, several technical and administrative challenges persist.
Integration with existing systems
AI tools must work seamlessly with Electronic Health Records (EHRs) and legacy scheduling platforms. Poor integration can lead to fragmented workflows and data silos, undermining efficiency (Bates, 2021).
Data quality and standardization
AI systems depend on high-quality, structured data to operate effectively. Poor or inconsistent data can lead to inaccurate predictions and unintended consequences (Obermeyer et al., 2019).
Staff training and change management
Administrative staff need training to understand and trust AI systems. Without proper onboarding, staff may resist adoption or misuse the tools, leading to errors or inefficiencies (Ross et al., 2016).
Configuration complexity
AI tools require careful configuration and human oversight to avoid errors and unintended consequences. Poorly designed systems can misallocate resources or fail to account for important context (Lekadir et al., 2025).
Overreliance on automation
There is a risk of placing too much trust in AI-generated decisions. Without human oversight, systems may overlook context-specific factors, especially in complex or urgent scheduling scenarios (Goddard et al., 2012).
Real-time adaptability
AI systems require strong infrastructure and oversight to adapt to real-time changes and avoid failures (Soman et al., 2025).
Artificial Intelligence and Patient Engagement
Patient portals
A key tool enabling this shift is the patient portal, defined by Mount Sinai Health System (n.d.) as "a website for your personal health care. This online tool helps you keep track of your health care provider visits, test results, billing, prescriptions, and so on. You can also message your provider questions through the portal." Patient portals have evolved from simple information systems into interactive platforms that support communication, scheduling, and care management.

AI applications
The integration of AI into these portals creates opportunities to increase patient engagement and improve outcomes.
Conversational AI and NLP
Platforms like Hyro and Mytonomy use AI to personalize communication, triage queries, and provide 24/7 support. Vendors claim these tools can:
- Reduce missed appointments
- Increase monthly active users
- Improve medication adherence
- Boost patient satisfaction scores
Generative AI
Patel and Lam (2023) argue that generative AI tools like ChatGPT demonstrate impressive capability in rapidly producing high‑quality, standardized discharge summaries, reducing clinicians’ administrative burden and potentially improving patient‑care transitions.
Predictive analytics tools are also helping reduce hospital readmissions by identifying at-risk patients and triggering proactive outreach (Frizzell et al., 2017).
Challenges
Despite these advances, trust remains a concern.
Don't trust AI
A recent study found that 65.8% of patients do not trust their healthcare system to use AI responsibly.
AI might cause harm
In the same study, 57.7% of patients worried that AI tools might cause harm.
Artificial Intelligence in Operations and Resource Management
AI applications
Organizational implementation
AI offers opportunities for hospital operations to improve how patients are scheduled, rooms are allocated, and clinical resources are coordinated. Dawoodbhoy et al. (2021) studied NHS acute mental health inpatient units and identified multiple ways AI could improve patient flow. However, they emphasized that successful implementation requires broader organizational changes, including improvements in information governance and culture aligned with ethical and regulatory frameworks.
Valued AI functionalities
Jansson et al. (2022), in a survey of five university hospitals, found that the most valued AI functionalities were care pathway planning, patient profiling, and shared resource coordination. These findings align with wider literature showing that AI-enhanced scheduling systems can identify modifiable risk factors and stratify patients to optimize preventive measures before treatment.
Patient flow
In the UK, hospitals such as Kettering General have trialed AI tools to support bed managers by forecasting demand and suggesting optimal bed assignments. These systems aim to reduce unnecessary patient transfers and improve continuity of care, while maintaining human oversight (Accelerated Capability Environment, 2023).
Scheduling
AI platforms analyze appointment types, staff availability, and patient demand to optimize scheduling in real time. Jansson et al. (2022) argue that AI systems can reduce patient wait times and improve staff utilization.
Administrative support
Emerging generative AI tools allow administrators to interact with scheduling systems using natural language. According to Simbo AI (2025) this enables more intuitive queries such as checking room availability or coordinating multidisciplinary teams without relying on rigid, rules-based systems.
Challenges
Despite these promising developments, several challenges remain.
System integration
Many hospitals still operate with fragmented IT systems, making it difficult for AI tools to access and synthesize the data needed for accurate predictions (Soman et al., 2025).
Data quality and governance
AI systems depend on high-quality, standardized data. Inconsistent or incomplete records can undermine performance and erode trust in the system (Accelerated Capability Environment, 2023).
Staff training and adoption
Successful implementation requires cultural change and upskilling of administrative and clinical teams. Without adequate training, staff may resist or underutilize AI tools (Simbo AI, 2025).
Transparency and explainability
AI systems must be transparent in how they generate recommendations. Tools that operate as "black boxes" risk losing the trust of clinicians and administrators (Accelerated Capability Environment, 2023).
Foundations of Healthcare Quality and Trust
Understanding and evaluating healthcare quality
Quality in healthcare refers to the degree to which health services increase the likelihood of desired outcomes and align with current professional knowledge (Lohr, 1990). This definition highlights that quality is multifaceted and extends beyond safety alone.
Six Pillars of Healthcare Quality
Healthcare quality is built on six essential pillars that guide safe, effective, and equitable care. These principles, introduced by the Institute of Medicine (2001), help evaluate performance and improve outcomes.
Safety
Refers to the avoidance of preventable harm to patients. High-quality care ensures that systems and practices are in place to minimise risks and errors.
Effectiveness
Means that care is based on the best available scientific evidence and is provided to those who are likely to benefit, while avoiding unnecessary or ineffective interventions.
Patient-centredness
Emphasises respect for and responsiveness to individual patient preferences, needs, and values. It ensures that these values guide all clinical decisions.
Timeliness
Involves reducing delays in care delivery. High-quality care ensures that patients receive treatment without unnecessary waiting, and that providers can act promptly.
Efficiency
Focuses on the optimal use of resources. This includes avoiding waste in materials, time, energy, and effort, while maintaining or improving care outcomes.
Equity
Ensures that care is delivered fairly and without discrimination. High-quality care does not vary in quality due to personal characteristics such as gender, ethnicity, geographic location, or socioeconomic status.
Measuring healthcare quality
According to Hannawa et al. (2022), each pillar can be evaluated using specific outcome measures.
Safety
Adverse event rates, medication errors, incident reporting, resilience assessments, and continuous quality improvement activities.
Effectiveness
Clinical outcomes, adherence to guidelines, treatment success rates, and medication adherence.
Patient-centredness
Patient experience and communication quality, and patient-reported outcomes.
Timeliness
Wait times, time to treatment, and discharge efficiency.
Efficiency
Resource utilisation, healthcare spending, waste reduction, and staff productivity.
Equity
Access to care, health disparities, cultural fairness, and satisfaction across demographic groups.
These measures help healthcare organizations assess how well they are delivering care and identify areas for improvement.
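As a simple illustration, several of these outcome measures can be computed directly from routine counts. The figures below are hypothetical, and the measures shown (an adverse event rate for safety, a mean wait for timeliness) are just two of many in use.

```python
from statistics import mean

def adverse_event_rate(events: int, patient_days: int) -> float:
    """Adverse events per 1,000 patient-days, a common safety measure."""
    return events / patient_days * 1000

def mean_wait_minutes(waits):
    """Average wait time, a simple timeliness measure."""
    return mean(waits)

# Hypothetical monthly figures for illustration.
print(adverse_event_rate(12, 4800))         # 2.5 events per 1,000 patient-days
print(mean_wait_minutes([15, 40, 25, 20]))  # 25.0 minutes
```

Tracking such measures over time, rather than as one-off snapshots, is what lets organizations see whether improvement efforts are working.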
What is safe healthcare?
Safety in healthcare refers to making evidence-based clinical decisions that maximize health outcomes while minimizing the potential for harm. This includes avoiding both errors of commission (doing something wrong) and omission (failing to do something necessary). This foundational definition was introduced by the Institute of Medicine (2004) and remains widely used in healthcare safety science.
Safety-I and Safety-II
Building on this foundation, Hollnagel, Wears, and Braithwaite (2015) proposed two complementary perspectives on safety.
Safety-I
Safety-I defines safety as the absence of adverse outcomes and focuses on identifying and eliminating causes of failure. It is reactive, aiming to prevent things from going wrong.
Safety-II
In contrast, Safety-II views safety as the ability to ensure that as many things as possible go right. It is proactive, emphasizing system resilience and the capacity to adapt to changing conditions.
Swiss Cheese Model
The Swiss Cheese Model, developed by Reason (2000), illustrates how errors can occur in complex systems and emphasizes the importance of strengthening systems to prevent the alignment of vulnerabilities and reduce harm. Healthcare systems rely on multiple safeguards such as protocols, training, and technology that aim to prevent errors. Each of these layers has vulnerabilities or "holes", which may arise from human error or latent system flaws. Typically, these weaknesses don’t align, allowing the system to intercept potential errors before they cause harm. However, when holes temporarily align across layers, an error can pass through all defenses and reach the patient.
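The model's intuition can be expressed numerically: under the idealized assumption that defensive layers fail independently, the probability an error reaches the patient is the product of each layer's failure probability. The failure rates below are hypothetical.

```python
def p_harm(layer_failure_probs):
    """Probability an error passes every defensive layer, assuming the
    layers fail independently: the product of their failure probabilities."""
    p = 1.0
    for q in layer_failure_probs:
        p *= q
    return p

# Hypothetical failure rates for four safeguards
# (protocols, training, technology, final check).
layers = [0.1, 0.05, 0.02, 0.01]
print(p_harm(layers))  # each added layer multiplies the risk down
```

This also shows why latent flaws matter: if one layer's failure rate quietly rises toward 1 (a hole widens), the product, and hence the risk of harm, rises with it.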
Ethical considerations
Achieving high-quality healthcare requires more than technical excellence; it demands ethical integrity. The pillars of quality are underpinned by core ethical principles. For example:
Safety aligns with non-maleficence
Effectiveness with beneficence
Patient centeredness with respect for autonomy
Equity with justice
These ethical foundations are not separate from quality; they are its rationale and moral compass. As healthcare systems evolve and integrate technologies like AI, ethical considerations such as transparency, accountability, and fairness become increasingly vital to ensure that innovations support, not undermine, the delivery of safe, effective, and equitable care (Nelson et al., 2010). These ethical principles must also be applied to the clinical and operational domains where AI is deployed. For example, AI systems used in patient registration, scheduling, or discharge planning must be designed to uphold fairness, transparency, and accountability. Ethical governance ensures that these tools do not inadvertently introduce bias or compromise patient autonomy in routine workflows.
Artificial Intelligence and Healthcare Quality
Safety
Safety refers to the avoidance of preventable harm to patients. High-quality care ensures that systems and practices are in place to minimize risks and errors (Institute of Medicine, 2004). Hollnagel, Wears, and Braithwaite (2015) expanded this concept through two complementary perspectives: Safety-I, which focuses on preventing things from going wrong, and Safety-II, which emphasizes ensuring that as many things as possible go right. Bates, Levine, and Syrowatka (2021) identified eight areas where AI could mitigate risks, including:
Adverse drug events
Diagnostic errors
Healthcare-associated infections
For example, Davoudi et al. (2019) demonstrated how pervasive sensing and AI-driven monitoring in intensive care units can detect early signs of delirium, enabling timely intervention.
Incident reporting
AI also supports incident reporting, root cause analysis (RCA), and continuous quality improvement (CQI), helping healthcare organizations identify patterns and prevent future harm.
Patient-centeredness
Patient centeredness emphasizes respect for and responsiveness to individual patient preferences, needs, and values. It ensures that these values guide all clinical decisions (Institute of Medicine, 2001).
Personalized care
AI can support personalized care planning by tailoring recommendations to individual patient profiles (Matheny et al., 2020).
Chat tools
Tools like Ada Health, an EU-MDR certified symptom assessment app, help patients understand their symptoms and guide them toward appropriate care (Ada Health GmbH, n.d.). In a 2022 trial, patients and clinicians endorsed the tool for facilitating conversations and improving usability (Scheder-Bieschin et al., 2022).
Effectiveness
Effectiveness means that care is based on the best available scientific evidence and is provided to those who are likely to benefit, while avoiding unnecessary or ineffective interventions (Institute of Medicine, 2001).
Reduce medication errors
AI enhances effectiveness through tools like Clinical Decision Support Systems (CDSS), which can reduce medication errors by recognising lookalike or sound-alike drugs and flagging potential interactions or incorrect dosages (Ferrara et al., 2024).
Predicting infections
Machine learning algorithms also show promise in predicting conditions such as sepsis and surgical site infections, though their performance depends heavily on the quality and relevance of training data (Fleuren et al., 2020).
Timeliness
Timeliness involves reducing delays in care delivery. High-quality care ensures that patients receive treatment without unnecessary waiting, and that providers can act promptly (Institute of Medicine, 2001).
Faster diagnosis and triage
AI applications have shown promise in improving the speed of diagnosis, triage, and administrative processes. Park et al. (2020) found that many AI tools focus on improving timeliness and efficiency.
Reduced waiting times
Reddy et al. (2019) described how AI systems linked to hospital servers can reduce waiting times in emergency departments by analysing patient data and posing queries.
Efficiency
Efficiency focuses on the optimal use of resources, including avoiding waste in materials, time, energy, and effort (Institute of Medicine, 2001).
AI can improve efficiency by automating administrative tasks, such as claims processing and coding audits. Davenport and Kalakota (2019) noted that machine learning can match data across databases to identify and correct errors, saving time and money for insurers and providers.
Streamline workflows
Operational efficiency is also enhanced through AI systems that streamline workflows and reduce manual burdens. However, as with timeliness, systemic barriers must be addressed to realise these benefits fully (Panch et al., 2019).
Equity
Equity ensures that care is delivered fairly and without discrimination, regardless of personal characteristics such as gender, ethnicity, geographic location, or socioeconomic status (Institute of Medicine, 2001).
Bias
AI has the potential to both support and undermine equity. Chen et al. (2021) identified multiple points in the machine learning pipeline where bias can be introduced, leading to underperformance on women, ethnic minorities, and publicly insured patients. They also highlighted examples of language models producing discriminatory outputs based on race.
Exacerbate inequity
Koehle et al. (2022) warned that digital health technologies may exacerbate disparities if equity is not considered in design and implementation. Richardson et al. (2022) proposed a framework for digital health equity, emphasising factors such as digital literacy, self-efficacy, and access.
Ethical considerations
The integration of artificial intelligence into healthcare systems introduces not only technical and operational challenges but also profound ethical responsibilities.
Future AI development
AI must be developed and deployed in ways that uphold the core ethical principles of healthcare: beneficence, non-maleficence, autonomy, and justice. These principles are essential for ensuring that AI supports, rather than undermines, the six pillars of healthcare quality.
Opaque algorithms
For example, opaque algorithms may compromise transparency and patient autonomy, while biased training data can threaten equity and fairness. Ethical frameworks such as the FUTURE-AI guideline provide structured guidance for ensuring that AI systems are fair, explainable, and robust throughout their lifecycle.
Ethical Governance of Artificial Intelligence in Healthcare
Transparency and reliability
Reliability challenges
One major ethical challenge is verifying the reliability of AI outputs, especially in novel or "edge case" situations, where the AI encounters scenarios that are rare or not well-represented in its training data (Saoudi et al., 2025).
"Black Box" Problem
Additionally, many neural network models operate as "black boxes," meaning they do not easily allow for backward reasoning, complicating the establishment of risk parameters. This lack of transparency and explainability poses significant risks, including potential patient harm from AI errors and algorithmic bias.
AI "Hallucinations"
Another concern with large language model (LLM) and large multimodal model (LMM) approaches is their tendency to present incorrect output with high confidence. It is unacceptable for a system to generate false diagnoses based on poor training or, worse, to produce post hoc reasoning that is false or misleading about a diagnosis or treatment. The informal term for this phenomenon is "hallucinating": the model generates words that sound plausible but are not true.
Performance vs. Explainability
Finally, more transparent models such as k-Nearest Neighbours and Decision Trees offer faster response times and greater interpretability, while opaque models such as Convolutional Neural Networks and Support Vector Machines often achieve higher accuracy. This complicates considerations around transparency and reliability, as there is a trade-off between performance and explainability (Rudin, 2019; Tjoa and Guan, 2021).
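What makes a model like k-Nearest Neighbours "transparent" can be made concrete with a minimal sketch (hypothetical data, pure Python): the prediction comes with the exact stored cases that justify it, something an opaque network cannot offer directly.

```python
import math
from collections import Counter

def knn_predict(query, examples, k=3):
    """Predict a label and return the neighbours that justify it.

    `examples` is a list of (feature_vector, label) pairs. Unlike an
    opaque model, the output includes the specific cases the decision
    rests on, so a reviewer can audit the reasoning directly.
    """
    neighbours = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    # Majority vote over the k closest stored cases.
    label = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]
    return label, neighbours
```

Here the "explanation" is simply the evidence itself: a clinician can inspect the neighbouring cases and judge whether they genuinely resemble the patient at hand.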
Accountability and governance
Ethical AI requires both technical mechanisms (to ensure transparency and reliability) and structural mechanisms (to ensure oversight and responsibility). Together, these approaches form the foundation for accountable and trustworthy AI in healthcare.
Technological approaches
Ensuring ethical AI in healthcare begins with designing systems that are transparent, interpretable, and responsive to human oversight. A range of technical methods have emerged to address these needs.
Self-Explaining Neural Networks
Embed interpretability directly into the model architecture. These networks are designed to produce both a prediction and a human-readable explanation, satisfying criteria such as explicitness, faithfulness, and stability (Alvarez-Melis & Jaakkola, 2018).
Model-agnostic frameworks
Like LIME and SHAP offer post hoc interpretability. LIME explains predictions by perturbing input data and observing output changes, while SHAP uses game theory to attribute contributions to each input feature (Lundberg et al., 2024). These tools help clinicians and developers understand how decisions are made, even in complex models.
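The perturb-and-observe idea behind LIME can be sketched without the library itself. The snippet below is not LIME (which fits a weighted linear surrogate model); it is the simplest occlusion-style form of the same idea, applied to a hypothetical "black-box" risk score:

```python
def feature_influence(model, x, baseline=0.0):
    """Perturbation-based attribution: replace each feature with a
    baseline value in turn and record how much the model's output
    changes. Larger scores mean the feature mattered more."""
    base = model(x)
    scores = {}
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline  # occlude one feature
        scores[i] = base - model(perturbed)
    return scores

# Hypothetical "black-box" risk model (a weighted sum we pretend not to see).
risk_model = lambda x: 0.7 * x[0] + 0.1 * x[1] + 0.2 * x[2]
```

For this toy model the recovered scores match the hidden weights exactly; for real models the attributions are approximations, which is why faithfulness must itself be evaluated.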
Attention mechanisms
Particularly in transformer-based models, highlight which parts of the input data are most influential in generating outputs. This can be especially useful in clinical contexts where traceability of reasoning is essential.
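The inspectable quantity here is the attention weight vector. A minimal sketch of scaled dot-product attention weights for a single query (pure Python, toy vectors) shows why: the softmax-normalised weights sum to one and directly rank which inputs the model attended to.

```python
import math

def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query over a set
    of key vectors. The normalised weights indicate which inputs most
    influence the output, which is what makes attention maps useful
    for tracing a model's focus."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

A caveat worth noting: attention weights indicate where the model looked, not necessarily why it decided, so they complement rather than replace other explanation methods.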
Layer-wise Relevance Propagation
Layer-wise Relevance Propagation and neuron visualization techniques further enhance interpretability by mapping decision pathways and visualizing them with heatmaps, making it easier to detect anomalies or biases.
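The core redistribution rule of Layer-wise Relevance Propagation can be shown for a single linear unit. This is a simplified epsilon-rule sketch, not a full LRP implementation: each input receives a share of the output relevance proportional to its contribution, and stacking this rule layer by layer yields the input heatmaps described above.

```python
def lrp_linear(x, w, relevance_out, eps=1e-9):
    """Epsilon-rule relevance propagation through one linear unit.

    Each input i contributes x_i * w_i to the pre-activation; its
    relevance is that contribution's share of the total, scaled by
    the relevance assigned to the unit's output. `eps` stabilises
    the division when contributions nearly cancel.
    """
    contributions = [xi * wi for xi, wi in zip(x, w)]
    z = sum(contributions) + eps
    return [c / z * relevance_out for c in contributions]
```

Conservation is the key property: the input relevances sum (up to epsilon) to the relevance that flowed in, so nothing is invented or lost as the explanation propagates backwards.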
Human-in-the-loop
Finally, Human-in-the-Loop approaches integrate domain experts into the development and validation process. These methods ensure that AI systems are not only technically sound but also aligned with clinical reasoning and ethical standards.
Together, these technological approaches form the foundation for ethical AI design. However, they must be embedded within broader governance structures to ensure accountability, fairness, and sustained oversight. We now turn to governance models that formalize these responsibilities.
Governance models
While technological solutions can improve transparency and interpretability, they are not sufficient on their own. Ethical AI requires robust governance structures that define how decisions are made, who is accountable, and how risks are managed over time. Governance is about the systems and processes that ensure AI is developed and deployed responsibly, with clear oversight and alignment to ethical principles.
GMAIH
One influential framework is the Governance Model for AI in Healthcare (GMAIH) proposed by Reddy et al. (2020). It addresses key dimensions such as bias mitigation, transparency, privacy, and safety. The model emphasizes the need for continuous monitoring, ethical review, and stakeholder engagement throughout the AI lifecycle. It also recommends the formation of multidisciplinary governance committees that include clinicians, managers, patient representatives, technical experts, and ethicists.
Ethics by Design
Complementing this is the Ethics by Design approach, which advocates embedding ethical principles into AI systems from the earliest stages of development.
EU AI Act and GDPR
This proactive stance is reinforced by legal frameworks such as the EU AI Act (European Parliament and Council of the European Union, 2024) and GDPR (European Parliament and Council of the European Union, 2016), which mandate transparency, accountability, and data protection in AI applications.
Stakeholder roles
Governance is not just about structures; it is about the people who enact and sustain them. Effective ethical oversight depends on clearly defined stakeholder roles. The FUTURE-AI framework emphasizes multistakeholder collaboration across the entire AI lifecycle, from design and development to deployment and monitoring, to ensure systems are trustworthy, ethically sound, and aligned with real-world clinical needs (Lekadir et al., 2025).
In addition to technical and ethical alignment, successful AI integration requires attention to staff experience and change management. Healthcare professionals must be trained to understand and trust AI tools, with ongoing support to adapt workflows and maintain clinical oversight. Ethical governance should include provisions for staff education, engagement, and feedback to ensure responsible and confident use of AI systems.
Clinicians
Ensure that AI outputs are clinically relevant, safe, and aligned with patient needs. They validate AI recommendations and identify unintended consequences.
Developers
Responsible for building systems that are fair, interpretable, and secure. Their design choices directly affect how ethical principles are operationalized.
Ethicists
Provide guidance on value alignment, risk assessment, and societal impact. They help navigate trade-offs and ensure ethical concerns are addressed.
Regulators
Enforce compliance with legal standards and ensure AI systems meet safety, privacy, and equity requirements.
Patients
Central to ethical AI. Their informed consent, feedback, and lived experiences shape how AI is designed and used, promoting autonomy and trust.
Legal considerations
Issues of transparency and reliability directly impact medical ethics and jurisprudence. Core ethical principles—autonomy, beneficence, non-maleficence, and justice—provide a framework for responsible AI integration.
EU Artificial Intelligence Act
Legislation such as the EU Artificial Intelligence Act (AI Act) (European Parliament and Council of the European Union, 2024) emphasises the importance of incorporating ethical considerations from the earliest stages of AI development. These frameworks mandate transparency, accountability, and human oversight in high-risk AI applications, including those used in healthcare.
General Data Protection Regulation
The General Data Protection Regulation (GDPR) (European Parliament and Council of the European Union, 2016) provides additional protections for patient data used in AI systems, ensuring privacy and consent are maintained.
Ethics by Design
The concept of Ethics by Design operationalizes these principles by embedding them into the architecture and governance of AI systems (European Commission, 2021; Gross et al., 2025). Together, these legal instruments represent a growing international effort to ensure that AI technologies are not only innovative but also safe, fair, and aligned with the public interest, especially in sensitive domains like health. Moreover, governance frameworks must be adaptable to global healthcare contexts, including low-resource settings. In these environments, infrastructure limitations and data scarcity pose unique challenges to ethical AI deployment. Ensuring equity means designing AI systems that are inclusive, accessible, and responsive to the needs of diverse populations worldwide.