
FDA Expectations for AI/ML Model Training in SaMD (2025 guide)

Navigating AI/ML Training in the Premarket Context: What Device Makers Need to Know

Key Pitfalls & Best Practices for AI/ML-Enabled SaMD / SiMD Under Current FDA Thinking

 

Introduction

Artificial intelligence and machine learning (AI/ML) are increasingly embedded in medical-device software, bringing promising opportunities for better diagnostics, monitoring, personalization and outcomes. However, the same adaptive-data, algorithm-driven approach that makes AI/ML compelling also introduces unique risks and regulatory considerations, especially in the premarket context for SaMD/SiMD. As manufacturers seek to bring AI/ML-enabled devices to market, training the underlying model(s) properly — and documenting that training — has become a critical differentiator.

At Rook Quality Systems (Rook), leveraging our teams’ subject-matter expertise, we have observed several training-phase traps and regulatory “watch-points” that device makers must address to move smoothly from concept to cleared or approved product. In this blog, we highlight the top issues to watch out for when training your model(s) in the premarket phase and align them with the latest FDA guidance and trends.


 

Why Training Matters Now More Than Ever

First, to set the stage: the FDA has published several key documents over the past year or so that sharpen its expectations for AI/ML-enabled medical devices, including:

  • The draft guidance “Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations” (January 2025)

  • The final guidance “Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions” (December 2024)

  • “Transparency for Machine Learning-Enabled Medical Devices: Guiding Principles,” published jointly with Health Canada and the UK’s MHRA (June 2024)

 

Against this backdrop, training becomes more than just a “model gets data, builds weights, we test it” exercise. It becomes a regulated activity, with requirements for documentation, bias analysis, monitoring, version control, change-management planning (such as a Predetermined Change Control Plan, or PCCP), and alignment to declared model performance. If any of those links break, pre-market review may stall, or worse, your cleared product may face post-market issues.

 

Top 6 Training-Phase Watch Points for AI/ML-Enabled SaMD/SiMD

Here are six key areas device makers should focus on during the training phase of an AI/ML device, each tied to both regulatory expectations and practical insights from Rook’s subject matter experts.

1. Define and document your data lineage, split strategy, and risk context

  • Ensure you can trace every data source used in training (and validation) back to its origin, document the justification and inclusion/exclusion criteria for using it, and confirm whether it reflects real-world, intended-use populations (including demographic diversity). The FDA’s draft guidance emphasizes transparency around “data lineage/splits” and performance tied to claims.

  • Specify how you split data (training vs validation vs test) and whether any hold-out sets represent future or external populations. Bias and drift risks often originate here.

  • Map your model’s intended use (clinical context, operator, workflow) and risk classification (e.g., IMDRF risk category, FDA class) clearly up front. Without that, the agency may ask for more justification.

Tip from RQS: Keep a “versioned data catalog” and log which data version was used for which model version, so that later you can trace performance degradation or drift back to dataset changes.
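
To make this concrete, here is a minimal sketch, in Python, of what a catalog entry might look like. It is purely illustrative: the file names, dataset identifiers, and split ratios are hypothetical placeholders rather than FDA-prescribed fields, and your own catalog may live in a database, a spreadsheet, or an MLOps tool instead.

```python
import json
from datetime import date
from pathlib import Path

# Hypothetical catalog entry: one record per dataset version, noting which
# model version(s) consumed it and how the train/validation/test split was made.
catalog_entry = {
    "dataset_id": "site-a-imaging-export",        # illustrative identifier
    "dataset_version": "v2.1",
    "source": "Site A PACS export, 2023-2024",
    "inclusion_criteria": "adults 18+, PA view",
    "exclusion_criteria": "studies missing demographic metadata",
    "split": {"train": 0.70, "validation": 0.15, "test": 0.15, "seed": 42},
    "used_by_model_versions": ["1.0.0"],
    "recorded_on": date.today().isoformat(),
}

# Append the record to a simple JSON-lines catalog kept under version control.
with Path("data_catalog.jsonl").open("a") as f:
    f.write(json.dumps(catalog_entry) + "\n")
```

The exact schema matters less than the discipline: every model version should be traceable to an exact dataset version, split, and seed.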

2. Ensure tight linkage between model architecture/logic and your clinical claim

  • During training, you should document not only “we used a neural network” but which architecture, why that architecture, how pre-processing was done, which hyper-parameters were chosen, and what performance baseline you targeted. The FDA draft guidance expects a model overview, a description of the intended use, and an architecture diagram.

  • For SaMD/SiMD, you must demonstrate that the model logic supports your clinical claim (e.g., detection of feature X, risk-stratification of patient Y) and that the training regimen was appropriate to deliver that claim with acceptable error rates, sensitivity/specificity, etc.

Tip from RQS: Provide a “model traceability matrix” linking clinical risk, model inputs, algorithm structure, and output thresholds. This matrix will help reviewers (internal and regulatory) follow the logic end-to-end.
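
There is no mandated format for a traceability matrix, but keeping it as structured, version-controlled data makes it easy to review and update. Below is a minimal, purely illustrative sketch in Python; the risks, thresholds, and report IDs are invented placeholders.

```python
import csv

# Illustrative traceability rows: each links a clinical risk and claim to the
# model inputs, algorithm element, and output threshold that address it.
TRACE_MATRIX = [
    {
        "clinical_risk": "Missed finding (false negative)",
        "clinical_claim": "Detect target finding with sensitivity >= 0.90",
        "model_inputs": "De-identified image series, standardized pre-processing",
        "algorithm_element": "CNN detection head with post-processing",
        "output_threshold": "score >= 0.35 raises a flag",
        "verification_evidence": "VAL-RPT-012 (hypothetical report ID)",
    },
    {
        "clinical_risk": "Alert fatigue (false positive)",
        "clinical_claim": "Specificity >= 0.85 in the intended-use population",
        "model_inputs": "Same inputs as above",
        "algorithm_element": "Calibration layer and threshold tuning",
        "output_threshold": "score < 0.35 suppresses the flag",
        "verification_evidence": "VAL-RPT-012 (hypothetical report ID)",
    },
]

# Export to CSV so reviewers can read it without running any code.
with open("model_traceability_matrix.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(TRACE_MATRIX[0].keys()))
    writer.writeheader()
    writer.writerows(TRACE_MATRIX)
```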

3. Evaluate and mitigate bias, and ensure subgroup-performance equity

  • The FDA emphasizes that for AI/ML devices, particularly those that adapt/learn, manufacturers must assess whether performance is consistent across relevant demographic groups (e.g., age, sex, race/ethnicity) and should describe any mitigations undertaken.

  • In the training phase, you should conduct robustness tests (e.g., under-represented sub-populations, out-of-distribution data, rare cases) and include these results in your submission.

Tip from RQS: Build challenge sets into your training regimen that are designed to stress-test under-represented strata, and document the outcome (even if the result is “we did not have enough samples, so we plan for post-market monitoring”). This level of documentation signals proactivity.
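
As a rough illustration of what a subgroup-performance artifact can look like, the Python sketch below computes sensitivity and specificity per demographic stratum with pandas. The column names and the random toy data are placeholders for your own labeled validation set and demographic metadata, and the metrics you report should follow your own statistical analysis plan.

```python
import numpy as np
import pandas as pd

def subgroup_performance(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Sensitivity/specificity per subgroup, with counts so small strata are visible."""
    rows = []
    for group, g in df.groupby(group_col):
        tp = ((g.y_true == 1) & (g.y_pred == 1)).sum()
        fn = ((g.y_true == 1) & (g.y_pred == 0)).sum()
        tn = ((g.y_true == 0) & (g.y_pred == 0)).sum()
        fp = ((g.y_true == 0) & (g.y_pred == 1)).sum()
        rows.append({
            group_col: group,
            "n": len(g),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        })
    return pd.DataFrame(rows)

# Toy data standing in for a labeled validation set with demographic metadata.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "y_true": rng.integers(0, 2, 500),
    "y_pred": rng.integers(0, 2, 500),
    "age_band": rng.choice(["18-40", "41-65", "65+"], 500),
})
print(subgroup_performance(toy, "age_band"))
```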

4. Document your release version and your locked vs. adaptive model strategy

  • One of the biggest regulatory “watch-points” is whether your model is a “locked” algorithm (i.e., it returns the same outputs for the same inputs) or an adaptive model that will evolve post-market. The FDA has noted that its traditional premarket review paradigm was not designed for adaptive AI/ML algorithms.

  • If your model will or might change post-market, you must build (and submit) a Predetermined Change Control Plan (PCCP) that describes the types of changes you anticipate (SaMD Pre-Specifications) and the algorithm change protocol. The FDA recently issued final guidance on PCCPs. 

  • During the training phase, you should clearly establish what your “first version” release model is, what performance baseline you are establishing, and what types of model updates you foresee (e.g., adding new classes, improving sensitivity, expanding geographies). Then document how you will validate those changes and when a new submission may be required.

Tip from RQS: Create a “Model Evolution Roadmap” early, even if updates aren’t happening now. It becomes a companion to the training documentation and helps ensure you’ll be ready when real-world usage triggers update needs.
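
What such a roadmap contains is specific to your device and your PCCP strategy, but keeping it as structured, version-controlled data makes it auditable alongside the training records. Below is a purely illustrative skeleton in Python; every change type, metric, threshold, and protocol ID shown is hypothetical.

```python
# Illustrative skeleton of a Model Evolution Roadmap, expressed as structured
# data so it can live under version control next to the training artifacts.
MODEL_EVOLUTION_ROADMAP = {
    "released_model": {"version": "1.0.0", "baseline_auc": 0.91},  # hypothetical baseline
    "anticipated_changes": [
        {
            "change_type": "Retrain on additional sites within the same intended use",
            "validation": "Re-run locked test protocol VAL-PROT-004; AUC must not drop by more than 0.02",
            "covered_by_pccp": True,
        },
        {
            "change_type": "Add a new output class / expand the intended use",
            "validation": "New clinical performance evaluation",
            "covered_by_pccp": False,  # likely requires a new submission
        },
    ],
}
```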

5. Build monitoring and feedback loops from the start

  • The FDA’s January 2025 draft guidance emphasizes the importance of managing risk through the total product lifecycle, including post-market monitoring of performance.

  • Even in the pre-market training phase, you should design your dataset and model validation to incorporate real-world usage characteristics. For example: Will you monitor drift, capture misclassifications in the field, update thresholds, manage distribution shifts? Document the plan.

  • Pre-specify the monitoring metrics during model training (e.g., baseline false-positive/false-negative rate, calibration drift, domain shift indicators) to ensure you can compare field performance back to your trained model; a minimal sketch of such pre-specified baselines follows below.

One common trap: Training with pristine data only, without capturing real-world noise/variation (e.g., different imaging devices, patients with comorbidities, site workflows). That leaves you vulnerable to performance degradation once deployed.
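
To illustrate “pre-specify the monitoring metrics,” the Python sketch below freezes hypothetical baselines at training time and flags field metrics that exceed equally hypothetical tolerances. The numbers are placeholders; real acceptance criteria should come from your risk analysis and validation results.

```python
import numpy as np

# Hypothetical baselines frozen at training time, so field performance is
# compared against the trained model rather than a moving target.
MONITORING_BASELINE = {
    "false_positive_rate": 0.08,
    "false_negative_rate": 0.05,
    "mean_output_score": 0.31,
    "score_std": 0.18,
}
DRIFT_TOLERANCE = {"rate_delta": 0.03, "score_z": 3.0}  # illustrative tolerances

def check_drift(field_fpr: float, field_fnr: float, field_scores: np.ndarray) -> list:
    """Return human-readable flags when field metrics exceed pre-specified tolerances."""
    flags = []
    if abs(field_fpr - MONITORING_BASELINE["false_positive_rate"]) > DRIFT_TOLERANCE["rate_delta"]:
        flags.append("false-positive rate outside tolerance")
    if abs(field_fnr - MONITORING_BASELINE["false_negative_rate"]) > DRIFT_TOLERANCE["rate_delta"]:
        flags.append("false-negative rate outside tolerance")
    z = abs(field_scores.mean() - MONITORING_BASELINE["mean_output_score"]) / MONITORING_BASELINE["score_std"]
    if z > DRIFT_TOLERANCE["score_z"]:
        flags.append("output-score distribution shift (possible domain shift)")
    return flags

# Example: simulated field rates and scores checked against the baseline.
print(check_drift(0.12, 0.05, np.random.default_rng(1).normal(0.31, 0.18, 1000)))
```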


6. Maintain quality-system, change-control, and documentation discipline

  • AI/ML-enabled devices still sit inside the broader medical-device regulatory framework: quality systems (QSR), change control, software validation, risk management (ISO 14971), cybersecurity, labeling, and human-factors. The FDA draft guidance explicitly links AI training/documentation to these system-controls.

  • During training you should maintain version control of datasets, code, architectures, evaluation metrics, and model weights. Document all changes, approvals (internal and external if applicable), audit trails, and tie these into your software lifecycle documentation.

  • Also, ensure labeling reflects the version you trained, and if you plan updates via a PCCP, that the labeling says so. Does your user know that “version 1.0 uses a model trained on dataset X with architecture Y”? Clarity helps regulatory reviewers and builds trust.

Tip from RQS: Treat your data, code, and model artifacts as regulated components, just like hardware under change control. If you cannot trace which dataset was used for which model version, you risk regulatory questions or, worse, post-market findings.
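
One lightweight way to make that traceability concrete is a release manifest that pins each artifact to a cryptographic hash. Here is a minimal sketch in Python; the file paths and model version are hypothetical, and in practice the manifest would be generated by your training pipeline and stored with the release record.

```python
import hashlib
import json
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file, so each artifact in the manifest is pinned to exact bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical artifact paths; your pipeline would supply the real ones.
ARTIFACTS = {
    "dataset": Path("data/train_v2.1.tar"),
    "training_config": Path("configs/train_v1.0.yaml"),
    "model_weights": Path("models/model_v1.0.0.pt"),
}

manifest = {
    "model_version": "1.0.0",
    "artifacts": {
        name: {"path": str(p), "sha256": file_digest(p)}
        for name, p in ARTIFACTS.items() if p.exists()
    },
}
Path("release_manifest.json").write_text(json.dumps(manifest, indent=2))
```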

Additional Strategic Considerations for Manufacturers

 

   Engage early and often with the FDA. 

The agency repeatedly emphasizes early engagement for AI/ML-enabled devices. The January 2025 draft guidance “encourages sponsors to engage with the agency early and often.”

   Treat training as part of your clinical claim narrative. 

Model training isn’t just a “data science step” — it is the backbone of your validation, safety, and effectiveness story. Present your training approach in your submission (and internal review) with full transparency.

   Plan for real-world feedback and deployment complexity. 

Even the best training cannot capture all real-world variability (site differences, patient population shift, device drift). By building in monitoring and update plans, you reveal maturity and forethought.

   Don’t underestimate “explainability” and user-workflow integration. 

The FDA’s transparency principles for ML-enabled medical devices emphasize that users (clinicians, patients) should be provided appropriate information, including logic or explainability to the extent practicable.  For training, this means you should think about how the model’s outputs will integrate into the workflow, how you validated that integration, and how you document explanations of key outputs (or limitations).

   Bias and equity are first-order risks. 

If you have a high-performing model trained on a narrow dataset but haven’t tested performance across demographic subgroups, you may get flagged. Better to document known limitations and a mitigation strategy upfront than to have the agency ask for supplemental data later.

   Prepare change control for the long haul. 

AI/ML models evolve. If you don't plan for how you will update, monitor, and control those updates, you risk either a steady stream of regulatory submissions or unmanaged performance drift. A well-structured PCCP (or roadmap) differentiates you.

   Make training reproducible and auditable. 

Version control, datasets, seed values, random splits, hyper-parameter logs: these all matter. If you train “offline” without traceability, you’ll back yourself into a corner when someone asks “which model was cleared?” or “what has changed since training?”
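
Here is a minimal sketch of what a reproducible, auditable training-run record can look like in Python; the run ID, seed values, and hyper-parameters are hypothetical, and you would also seed whichever deep-learning framework you actually train with.

```python
import json
import random
from datetime import datetime, timezone

import numpy as np

# Hypothetical training-run record: fix the seeds you actually use, then log
# them alongside hyper-parameters so the run can be reproduced and audited.
RUN_CONFIG = {
    "run_id": "train-2025-001",
    "seed": 1234,
    "data_split_seed": 42,
    "hyperparameters": {"learning_rate": 1e-4, "batch_size": 32, "epochs": 50},
    "started_at": datetime.now(timezone.utc).isoformat(),
}

random.seed(RUN_CONFIG["seed"])
np.random.seed(RUN_CONFIG["seed"])
# If you train with a deep-learning framework, set its seed here as well
# (e.g., torch.manual_seed), and record the library versions you used.

with open(f"{RUN_CONFIG['run_id']}.json", "w") as f:
    json.dump(RUN_CONFIG, f, indent=2)
```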

   Focus on robustness, not just accuracy. 

Training on clean, convenient data is easy. Making sure the model will hold up under “messier” real-world conditions is harder, and regulators will expect you to have thought about edge cases, signal noise, device variability, site workflow differences, sample bias, and potential drift.

 

Closing Thoughts

For manufacturers of SaMD/SiMD with AI/ML-enabled functions, the model-training phase is no longer a “backend” afterthought; it is a regulatory, quality-system, clinical-claim, and lifecycle-management cornerstone. The FDA’s evolving guidance reflects this: from a static premarket model to a total product lifecycle mindset that anticipates updates, monitors real-world performance, requires transparency, and demands documentation and traceability.

By proactively addressing the six watch-points we’ve outlined (data lineage and splits, architecture/logic linkage, bias and subgroup performance, locked vs. adaptive model strategy, monitoring and feedback loops, and documentation and change control), you position your device program not just for premarket clearance or approval but for long-term success in the field.

 

Look to Rook for Model Training Support

At Rook, we believe that the companies that win in this space will treat model training like a first-class regulated activity — structured, documented, auditable, and aligned to lifecycle outcomes. If you’d like to learn more or walk through your training-phase strategy for AI/ML-enabled devices, we’re here to help.

 


 
