Cheatsheet: Multimodal world models for science

Why biology is harder data than the internet

| Property | Internet text/image | Biology |
| --- | --- | --- |
| Modalities per dataset | one or two | many (molecules, microscopy, transcriptomics, proteomics, phenotype, clinical text) |
| Scale per modality | billions to trillions | thousands to millions |
| Cost per example | near-zero | expensive (wet-lab time) |
| Noise | low | high; replication adds cost |
| Ground truth for the target | abundant | scarce (clinical outcomes are precisely what you want to predict) |

The multimodal world model framing for drug discovery

| Step | Action |
| --- | --- |
| 1 | encode each biological modality into a shared embedding |
| 2 | train multimodal transformer on co-occurring data (molecule + cell + outcome) |
| 3 | predict perturbation effects for new molecules or new cell systems |
| Connection to L7 | predict semantic biological state, not raw outputs (capacity-on-semantic-structure argument applies even more) |
| Public example | Noetik.ai’s OCTO and Perturb-map |
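
The three steps above can be sketched in a few lines. This is a toy illustration under loud assumptions, not how any particular system (including the public example) is built: random linear maps stand in for trained modality encoders, a plain average stands in for transformer fusion, and every name, dimension, and modality size here is hypothetical.

```python
import math
import random


def make_projection(in_dim, out_dim, rng):
    """Random linear map standing in for a trained modality encoder (assumption)."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def encode(features, projection):
    """Step 1: project one modality into the shared space and L2-normalize."""
    z = [sum(w * x for w, x in zip(row, features)) for row in projection]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]


def fuse(embeddings):
    """Step 2 stand-in: average the per-modality embeddings.
    A real model would use a multimodal transformer here."""
    n = len(embeddings)
    return [sum(vals) / n for vals in zip(*embeddings)]


def predict_effect(fused, head):
    """Step 3: linear readout of a scalar perturbation-effect score."""
    return sum(w * v for w, v in zip(head, fused))


rng = random.Random(0)
SHARED_DIM = 8
# Hypothetical raw feature sizes per modality.
dims = {"molecule": 16, "transcriptomics": 32}
projections = {name: make_projection(d, SHARED_DIM, rng) for name, d in dims.items()}
head = [rng.gauss(0.0, 1.0) for _ in range(SHARED_DIM)]

# One synthetic (molecule, cell) pair with random features.
sample = {name: [rng.random() for _ in range(d)] for name, d in dims.items()}
embeddings = [encode(sample[name], projections[name]) for name in dims]
fused = fuse(embeddings)
score = predict_effect(fused, head)
```

The point of the sketch is the shape of the pipeline: modalities with different raw dimensions land in one space of size `SHARED_DIM`, and the prediction is read off the fused state rather than off any raw measurement.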

| Claim type | Example | Instruments to settle |
| --- | --- | --- |
| ML benchmark | “91% AUC on held-out cell-line response” | training/validation/test, AUC/F1/correlation |
| Clinical | “30% improved patient survival vs standard-of-care” | randomized trials, clinical endpoints, regulatory review |

These are not the same claim and not on the same epistemic ladder.

“Model passes ML benchmark → therefore the drug it identified will work in patients.”

The single most important pitfall in medical AI. Benchmark performance does not establish clinical utility; treating a benchmark result as evidence of clinical benefit is the standard medical-AI overreach.
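
To make concrete what a benchmark number such as AUC does and does not measure, here is a minimal sketch of the pairwise definition of ROC AUC. The function name `roc_auc` and the toy labels and scores are made up for illustration; in practice a library routine would normally be used.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy held-out set: 1 = cell line responded, 0 = did not (synthetic values).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
```

The result is a statement about how the model ranks held-out examples. Nothing in the computation refers to patients, endpoints, or a comparison arm, which is why it cannot settle a clinical claim.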

Operational scope test (medical-AI specialization)

| If the question is settled by… | It is… |
| --- | --- |
| ML benchmark / training loss / representation quality / generalization tests | IN SCOPE (technique) |
| Clinical trial / regulatory review / standard-of-care framework / patient consent process / clinical-practice judgment | OUT OF SCOPE (different conversation) |
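
The scope test is essentially a lookup on the instrument that would settle the question. A minimal sketch, assuming exact string matching on the instrument names from the table; the function name `scope_of` and the `UNKNOWN` fallback are illustrative additions, not part of the original test.

```python
IN_SCOPE_INSTRUMENTS = {
    "ml benchmark",
    "training loss",
    "representation quality",
    "generalization tests",
}

OUT_OF_SCOPE_INSTRUMENTS = {
    "clinical trial",
    "regulatory review",
    "standard-of-care framework",
    "patient consent process",
    "clinical-practice judgment",
}


def scope_of(instrument):
    """Classify a question by the instrument that would settle it."""
    key = instrument.strip().lower()
    if key in IN_SCOPE_INSTRUMENTS:
        return "IN SCOPE (technique)"
    if key in OUT_OF_SCOPE_INSTRUMENTS:
        return "OUT OF SCOPE (different conversation)"
    # Fallback for instruments the table does not list (assumption).
    return "UNKNOWN"


benchmark_scope = scope_of("ML benchmark")
trial_scope = scope_of("Clinical trial")
```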

Six categories OUT of scope (with instruments)

| Category | Instruments |
| --- | --- |
| Diagnostic claims / clinical validity | clinical trials, gold-standard comparisons |
| Regulatory framework | FDA, EMA, sectoral medical regulators |
| Medical malpractice / standard-of-care | legal precedent, professional medical societies |
| Patient consent for AI involvement | bioethics, patient-advocacy frameworks |
| Clinical-trial methodology vs ML-evaluation methodology | translational science |
| Therapeutic claims (what to prescribe) | clinical-practice judgment, evidence-based medicine |

| In-scope topic | Why |
| --- | --- |
| Model architecture (multimodal transformer fusing modalities) | technique |
| Training methodology (multimodal pretraining, contrastive losses, world-model objectives) | technique |
| Benchmark performance (per-task biological metrics) | evaluation |
| Representation quality (transfer to downstream tasks; latent organization) | evaluation |
| Generalization (cross-cell-line, cross-perturbation) | evaluation |
| Compute and data requirements | engineering |
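
Cross-cell-line generalization, listed above as an evaluation topic, is usually tested by holding out entire cell lines rather than random rows. A minimal sketch of such a split, assuming each record carries a `cell_line` field; the field names, the placeholder line names, and the response values are all synthetic.

```python
def split_by_cell_line(records, held_out_lines):
    """Put every record from a held-out cell line in the test set, so no
    cell line contributes to both training and evaluation."""
    held = set(held_out_lines)
    train = [r for r in records if r["cell_line"] not in held]
    test = [r for r in records if r["cell_line"] in held]
    return train, test


# Synthetic records: (cell line, perturbation, measured response).
records = [
    {"cell_line": "line_A", "perturbation": "drug_1", "response": 0.2},
    {"cell_line": "line_A", "perturbation": "drug_2", "response": 0.7},
    {"cell_line": "line_B", "perturbation": "drug_1", "response": 0.4},
    {"cell_line": "line_C", "perturbation": "drug_1", "response": 0.9},
    {"cell_line": "line_C", "perturbation": "drug_2", "response": 0.1},
]

train, test = split_by_cell_line(records, held_out_lines=["line_C"])
train_lines = {r["cell_line"] for r in train}
test_lines = {r["cell_line"] for r in test}
```

A random row-level split would let the same cell line appear on both sides, so the score would partly measure memorization of that line. The same grouping idea applies to cross-perturbation tests, with the split keyed on `perturbation` instead.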

| Pitfall | Reality |
| --- | --- |
| “Model passes ML benchmark → clinically useful” | the gap is large; closing it requires clinical trials, not better benchmarks |
| “Bigger model fixes drug discovery” | biology is data-limited, not capacity-limited; a bigger model tends to overfit more when data is scarce |
| “Multimodal models handle all biological data uniformly” | heterogeneity is real; specialized representations are sometimes needed |
| “World models replace experimentation” | they guide experiments; they do not replace them |