Virtual phase III trials are conducted before the actual phase III trial in order to evaluate the data from the phase I/II trial for safety and efficacy. Validating such virtual trials requires novel methods, and here we present three such methods.
Matching patients using 4D generative AI
In order for a single-arm trial to be complemented with a control arm, it has to be demonstrated that the complement matches the original arm. For external control arms, this is typically done with, for example, propensity score matching on inclusion criteria, clinical data, and biomarkers. A reflection paper on evidence generation with external controls is currently under review at the European Medicines Agency. For virtual control arms, the challenge is more complex: the synthetic patients must not only be statistically similar to the real cohort, they must also behave like them across time. 4D generative models enable this by learning disease trajectories. The predictions from a 4D generative AI can be classified into observable and non-observable treatment scenarios, also called factual and counterfactual treatment scenarios. Ground truth is only available for the observable treatment scenario: changing any parameter, such as a confounding variable, the treatment dose, or the therapy type, changes the disease trajectory, making the prediction non-observable and not verifiable against a ground truth. For 4D generative AI, we list here the validation methods that could be applicable (more details found here):
- Patient-level consistency checks
We propose that patient-level verification can be done with a novel method we call the blind treatment assignment test. This method generates both observable and non-observable treatment scenarios, and the prediction closest to the ground truth is classified as the predicted treatment assignment. We call the method blind because the 4D generative AI model does not know in advance which therapy has been assigned. This assures that the 4D generative AI model performs at least within the risk-benefit difference window between the two therapies.
- Population-level consistency checks
We also propose a population-level verification method we call non-observable randomized clinical trial benchmarking. This method replicates historical clinical trials by letting the 4D generative AI model make only non-observable predictions for a new patient population. This assures that the 4D generative AI model accounts for confounding variables and performs reliably.
- Subgroup-level consistency checks
Finally, we propose patient stratification to evaluate which subgroups are predicted to benefit most from a new therapy. This way, we consider not only population-level benefits but also how the predictions perform on subgroups, which enables checking for bias in the AI model's training data.
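The propensity score matching used for external control arms can be sketched as follows. This is a minimal illustration, not a regulatory-grade implementation: the covariates, cohort sizes, caliper width, and the greedy 1:1 nearest-neighbor strategy are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy cohorts: rows are patients, columns are baseline covariates
# (e.g. age, a biomarker). The external cohort is deliberately shifted.
trial = rng.normal(loc=[60.0, 1.0], scale=[8.0, 0.3], size=(50, 2))
external = rng.normal(loc=[65.0, 1.2], scale=[10.0, 0.4], size=(200, 2))

X = np.vstack([trial, external])
z = np.concatenate([np.ones(len(trial)), np.zeros(len(external))])

# Propensity score: P(in trial | covariates), fitted by logistic regression.
ps = LogisticRegression().fit(X, z).predict_proba(X)[:, 1]
ps_trial, ps_ext = ps[: len(trial)], ps[len(trial):]

# Greedy 1:1 nearest-neighbor matching on the propensity score,
# within a caliper of 0.1 (an arbitrary choice for this sketch).
caliper = 0.1
available = np.ones(len(external), dtype=bool)
matches = {}
for i in np.argsort(-ps_trial):  # high-propensity patients are matched first
    d = np.abs(ps_ext - ps_trial[i])
    d[~available] = np.inf
    j = int(np.argmin(d))
    if d[j] <= caliper:
        matches[int(i)] = j
        available[j] = False

print(f"matched {len(matches)} of {len(trial)} trial patients")
```

In practice, balance diagnostics (e.g. standardized mean differences on each covariate) would be run on the matched pairs before accepting the external arm.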
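The blind treatment assignment test from the patient-level check can be sketched as follows. The trajectory model here is a deliberately simple stand-in (linear trends per therapy); with a real 4D generative model, `predict_trajectory` would be its factual/counterfactual rollout. The therapy names, effect sizes, and noise level are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
THERAPIES = ("A", "B")
T = np.arange(12)  # monthly follow-up visits

def predict_trajectory(baseline, therapy):
    """Stand-in for a 4D generative model: predicts a disease-score
    trajectory under the given therapy (assumed linear slopes)."""
    slope = {"A": -0.5, "B": -1.5}[therapy]  # therapy B assumed more effective
    return baseline + slope * T

def blind_assignment(baseline, observed):
    """Predict under BOTH therapies (factual + counterfactual) and
    classify the assignment as the prediction closest to ground truth."""
    errors = {th: np.mean((predict_trajectory(baseline, th) - observed) ** 2)
              for th in THERAPIES}
    return min(errors, key=errors.get)

# Simulate a cohort whose true assignments are hidden from the classifier.
n = 100
true_assign = rng.choice(THERAPIES, size=n)
baselines = rng.normal(50, 5, size=n)
observed = np.array([predict_trajectory(b, th) + rng.normal(0, 1.0, len(T))
                     for b, th in zip(baselines, true_assign)])

predicted = np.array([blind_assignment(b, y) for b, y in zip(baselines, observed)])
accuracy = float(np.mean(predicted == true_assign))
print(f"blind assignment accuracy: {accuracy:.2f}")
```

An accuracy well above chance indicates the model resolves the two therapies, i.e. its error is within the risk-benefit difference window between them.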
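The non-observable RCT benchmarking check can be sketched in the same spirit: replicate a historical trial by making only counterfactual predictions for its population, then compare the replicated treatment effect against the reported one. The historical effect size, arm names, and the model's slopes below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed historical trial result: a between-arm difference of -12.0
# in 12-month disease-score change. This number is invented.
HISTORICAL_EFFECT = -12.0

def predict_change(baseline, therapy):
    """Stand-in for the 4D generative model's 12-month prediction.
    In this benchmark every prediction is non-observable: the historical
    patients never actually received the simulated arm."""
    slope = {"control": -0.5, "treatment": -1.5}[therapy]
    return slope * 12 + 0.02 * (baseline - 50)

# Replicate the historical population and randomize it virtually.
baselines = rng.normal(50, 5, size=500)
arms = rng.choice(["control", "treatment"], size=500)
changes = np.array([predict_change(b, a) for b, a in zip(baselines, arms)])

replicated_effect = (changes[arms == "treatment"].mean()
                     - changes[arms == "control"].mean())
print(f"replicated effect {replicated_effect:.1f} vs historical {HISTORICAL_EFFECT:.1f}")
```

Agreement between the replicated and reported effect, across several historical trials with different confounding structures, is the evidence that the model handles confounding variables reliably.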
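The subgroup-level check can likewise be sketched: stratify the cohort on a baseline covariate and compare the predicted benefit per stratum. The stratification variable (age), the thresholds, and the benefit pattern are assumptions for illustration; a real analysis would stratify on clinically motivated covariates.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy cohort: age plus a per-patient predicted benefit from the model
# (more negative = larger predicted improvement in disease score).
age = rng.normal(60, 10, size=300)
# Assumed pattern for the example: younger patients benefit more.
predicted_benefit = -10.0 + 0.1 * (age - 60) + rng.normal(0, 1.0, size=300)

strata = {"<55": age < 55, "55-70": (age >= 55) & (age <= 70), ">70": age > 70}
for name, mask in strata.items():
    print(f"age {name}: n={mask.sum():3d}, "
          f"mean predicted benefit {predicted_benefit[mask].mean():+.1f}")
```

Large between-stratum differences that are absent from (or contradicted by) trial data would flag either a true heterogeneous treatment effect or a bias in the model's training data, which is exactly what this check is meant to surface.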
Conclusions
4D generative AI provides a powerful framework for creating virtual control arms and simulating treatment outcomes in clinical trials. By leveraging patient-level, population-level, and subgroup-level consistency checks, it is possible to validate predictions in both observable (factual) and non-observable (counterfactual) scenarios. These validation methods help ensure that the model accurately captures disease trajectories over time, accounts for confounding variables and heterogeneity in patient populations, and minimizes bias across subgroups while reflecting realistic treatment effects.