When does wellness AI become a medical device?

March 9, 2026

Reflections from the MHRA AI Airlock simulation workshop

Last week, representatives from across the healthcare and AI ecosystem gathered for the first of three Phase 2 Simulation Workshops as part of the UK Medicines and Healthcare products Regulatory Agency (MHRA) AI Airlock programme.


The session brought together regulators, developers, clinicians, academics, and public and patient representatives to explore one of the most complex issues emerging in digital health: where the boundary sits between wellness AI and AI that becomes a regulated medical device.


Dr Paul Sacher, Founder of Sacher AI, was invited to participate in the workshop as an expert contributor based on his experience developing and evaluating patient-facing conversational AI systems. His work spans both academic research and real-world deployment of AI in healthcare settings, with a particular focus on clinical and behavioural safety in human-AI interactions.


The AI Airlock is the MHRA’s first regulatory sandbox. It is designed to work alongside developers to better understand emerging AI as a Medical Device (AIaMD) technologies and help address regulatory challenges in a responsible, patient-centred way.


The programme is currently in its second phase, working with seven AIaMD developers.


Two companies from the cohort took part in the workshop. TORTUS demonstrated its clinician-facing AI system designed to support clinical documentation, while Numan presented its digital health platform delivering remote patient care. These demonstrations helped ground the discussion in real product scenarios and practical regulatory questions.


A central theme of the discussion was intended use. In medical device regulation, intended use determines whether a system qualifies as a medical device, how it is classified, what evidence is required, and how it must be monitored once deployed.


For AI systems, particularly conversational systems built on large language models, intended use is rarely static.


Products evolve. Features expand. Systems become integrated into clinical workflows. Outputs begin to influence users, clinicians, or treatment decisions in new ways.


A system that initially sits within the wellness category can gradually cross into medical device territory, sometimes without a single clear moment where that shift occurs.


Another major focus of the workshop was post-market surveillance.


Regulators increasingly expect companies to demonstrate how they monitor their AI systems once deployed and how they identify, track, and respond to risks over time. This responsibility extends across the full product lifecycle.


For teams deploying conversational AI at scale, this creates practical challenges.


Risks in large language model systems often emerge gradually across many interactions rather than appearing as a single obvious failure. Relying solely on manual review of conversations or retrospective audits is therefore unlikely to be sustainable at scale.


Monitoring systems therefore need to detect patterns, trends, and behavioural signals across thousands or millions of interactions.
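
To make the scale problem concrete, the sketch below outlines one way such monitoring might be structured: each logged interaction is screened against a set of safety signal categories, signal rates are aggregated per day, and an alert is raised when a rate drifts above a review threshold. The signal categories, keyword triggers, and threshold are illustrative assumptions only, not a description of any particular system; in a real deployment the keyword stand-in would be replaced by dedicated classifiers and a clinically defined risk taxonomy.

```python
# Minimal sketch of automated post-market monitoring for a conversational AI
# system. Signal categories, keyword triggers, and the alert threshold are
# illustrative assumptions, not a real safety classifier.
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Interaction:
    """One logged exchange between a user and the AI system."""
    day: date
    user_message: str
    ai_response: str


# Hypothetical signal categories with placeholder keyword triggers.
SIGNAL_KEYWORDS = {
    "possible_unsafe_advice": ["stop taking", "double the dose", "ignore your doctor"],
    "distress_signal": ["can't cope", "hopeless", "panic"],
    "escalation_needed": ["chest pain", "overdose", "suicidal"],
}


def detect_signals(interaction: Interaction) -> list[str]:
    """Rough stand-in for a safety classifier: flag simple keyword matches."""
    text = f"{interaction.user_message} {interaction.ai_response}".lower()
    return [name for name, words in SIGNAL_KEYWORDS.items()
            if any(word in text for word in words)]


def daily_signal_rates(interactions: list[Interaction]) -> dict[date, Counter]:
    """Aggregate detected signals per day so trends become visible over time."""
    totals: dict[date, int] = defaultdict(int)
    signals: dict[date, Counter] = defaultdict(Counter)
    for item in interactions:
        totals[item.day] += 1
        for name in detect_signals(item):
            signals[item.day][name] += 1
    # Convert raw counts into rates per interaction for each day.
    return {
        day: Counter({name: count / totals[day] for name, count in counts.items()})
        for day, counts in signals.items()
    }


def alerts(rates: dict[date, Counter], threshold: float = 0.02) -> list[str]:
    """Raise an alert when any signal rate exceeds an assumed review threshold."""
    return [
        f"{day}: {name} rate {rate:.1%} exceeds {threshold:.0%}"
        for day, counts in sorted(rates.items())
        for name, rate in counts.items()
        if rate > threshold
    ]
```

The value of even a simple pipeline like this is that it turns individual conversations into trend data, so that a gradual drift in system behaviour can be noticed and reviewed rather than discovered only through retrospective audit.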


This challenge is becoming increasingly visible across healthcare AI more broadly.


As AI systems become embedded into care pathways, even subtle product changes can shift the regulatory position of a system. Expanding features, integrating with clinical services, or increasing the influence of system outputs can all move a product closer to the regulatory definition of a medical device.


One of the most valuable aspects of the session was the level of practical nuance in the conversation.


Participants discussed how seemingly small design choices or product updates can affect regulatory classification, and how difficult it can be to detect early signals that a system may be drifting across regulatory boundaries.


The workshop also highlighted the growing diversity of AI technologies entering healthcare, including ambient voice technologies and conversational AI systems.


Overall, the workshop offered a thoughtful and grounded look at the real challenges facing teams building AI systems for healthcare today.


The boundary between wellness products and regulated medical devices is becoming increasingly complex. Understanding where that line sits, and how systems may gradually move across it, will be critical for companies developing AI in health and clinical contexts.


At Sacher AI, we work with organisations developing AI systems that interact directly with patients, clinicians, and health consumers. Our focus is on helping teams design, evaluate, and monitor these systems so that safety, behavioural impact, and regulatory expectations are addressed from the earliest stages of development.


As regulatory expectations evolve, building systems that are not only innovative but also demonstrably safe, transparent, and monitorable will become essential.


If your organisation is developing AI systems that interact directly with people in healthcare settings, you can learn more about Sacher AI’s work supporting safe and responsible AI deployment at https://sacher.ai.
