Protein Engineering

Predicting thermostability before you run a single DSF experiment

November 28, 2024


Differential scanning fluorimetry (DSF) is a workhorse for measuring enzyme melting temperature (Tm): cheap per sample, high throughput, and informative enough for early-stage thermostability screening. Most labs run DSF after expression and purification, which means that by the time you see a Tm for a given variant, you've already paid for synthesis, transformation, expression culture, and at least partial purification. Bringing a 96-well DSF plate's worth of variants to that point usually runs $1,200–$3,500 in direct costs, depending on your expression system and protein-handling time.

If your primary goal is selecting high-Tm variants from a large generated library, running DSF on everything that expresses is the wrong sequence of operations. You need to filter computationally before you synthesize. This post explains how the Fermvyne thermostability prediction head works, what accuracy you should actually expect from it, and when it's reliable enough to change your synthesis decision.

The structure-sequence-stability triangle

Thermostability has a complicated relationship with sequence. Two enzymes with 40% sequence identity can have Tm values 25°C apart. Two variants of the same enzyme differing by a single surface residue can have Tm values within 0.5°C of each other. The primary sequence doesn't encode stability directly — it encodes it through the folded structure, which determines the number and geometry of stabilizing interactions: hydrogen bonds, hydrophobic core packing, disulfide bridges, salt bridges, and loop entropy penalties.

This is why structure-naive, sequence-based stability predictors (including some popular ones) are notoriously poorly calibrated outside their training distribution. They learn correlations between amino acid composition and stability from training sets that over-represent well-studied mesophilic enzymes, and those correlations don't transfer to novel scaffolds or thermophile-like sequences. We've seen labs pre-filter libraries with sequence-composition predictors, discard promising thermostable candidates because their Ile/Val/Leu content patterns didn't match the mesophilic training set, and then rediscover those same sequences empirically six months later when the campaign hit a wall.

How AlphaFold2 structure prediction changes the prediction pipeline

The availability of high-confidence AlphaFold2 structure predictions for most globular protein sequences changes what's computationally feasible. Instead of predicting stability from sequence composition, we can predict a structure first, then predict stability from structural features that correlate mechanistically with unfolding thermodynamics.

The Fermvyne thermostability head is a supervised regressor trained on ~115,000 experimentally measured Tm values paired with AlphaFold2-predicted structures. We compute ~180 structural features per variant: residue-level packing density (using Voronoi tessellation of the folded structure), secondary structure fraction and helix capping geometry, buried hydrophobic surface area, polar/charged residue burial ratios, number of salt bridges and disulfide bonds per 100 residues, and loop length distribution. These features are fed into a gradient-boosted tree ensemble, which outputs a point estimate and a 90% confidence interval for Tm.
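To make one of these features concrete, here is a minimal sketch of counting salt bridges per 100 residues from predicted coordinates. The 4.0 Å heavy-atom cutoff, the choice of side-chain atoms, and the `(residue_name, {atom_name: (x, y, z)})` input format are assumptions for illustration; this is not Fermvyne's feature extractor.

```python
import math
from itertools import product

# Assumed charged side-chain atoms for Asp/Glu (acidic) and Lys/Arg (basic).
ACIDIC_ATOMS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}
BASIC_ATOMS = {"LYS": ("NZ",), "ARG": ("NE", "NH1", "NH2")}


def salt_bridges_per_100(residues, cutoff=4.0):
    """Count acid-base residue pairs with any charged atoms within `cutoff` Å,
    normalized per 100 residues. `residues` is a list of
    (residue_name, {atom_name: (x, y, z)}) tuples in chain order."""
    acids, bases = [], []
    for name, atoms in residues:
        if name in ACIDIC_ATOMS:
            acids.append([atoms[a] for a in ACIDIC_ATOMS[name] if a in atoms])
        elif name in BASIC_ATOMS:
            bases.append([atoms[a] for a in BASIC_ATOMS[name] if a in atoms])
    # Each acid-base residue pair counts at most once, however many atom pairs are close.
    n_bridges = sum(
        1
        for acid_xyz, base_xyz in product(acids, bases)
        if any(math.dist(p, q) <= cutoff for p in acid_xyz for q in base_xyz)
    )
    return 100.0 * n_bridges / len(residues) if residues else 0.0


# Toy 4-residue chain: the Asp-Lys pair is 3.0 Å apart, the Glu is far away.
toy_chain = [
    ("ASP", {"OD1": (0.0, 0.0, 0.0), "OD2": (1.2, 0.0, 0.0)}),
    ("ALA", {"CB": (5.0, 5.0, 5.0)}),
    ("LYS", {"NZ": (3.0, 0.0, 0.0)}),
    ("GLU", {"OE1": (20.0, 0.0, 0.0), "OE2": (21.2, 0.0, 0.0)}),
]
print(salt_bridges_per_100(toy_chain))  # → 25.0
```

Normalizing per 100 residues keeps the feature comparable across enzymes of different lengths, which matters when one model covers many EC classes.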

The confidence interval isn't a fixed-width band. It reflects genuine uncertainty about the structure prediction quality (AlphaFold2 pLDDT scores factor in as a quality weight), about how well the training data covers the EC class in question, and about structural features that the model hasn't seen in high density. A variant with a narrow ±4°C interval sends a different signal from one with ±14°C, which is why we display the interval alongside the point estimate.
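The post doesn't specify how pLDDT enters the interval, so the following is a purely hypothetical sketch of the idea: a base half-width is inflated as mean pLDDT falls below a reference value, and an optional penalty is added for EC classes that are sparse in training. The function name, the 90 pLDDT reference, and the linear weighting are all assumptions.

```python
from statistics import mean


def interval_from_plddt(point_tm, base_half_width, plddt, plddt_ref=90.0,
                        coverage_penalty=0.0):
    """Hypothetical: widen a 90% Tm interval as AlphaFold2 confidence drops.

    plddt: per-residue pLDDT scores (0-100) for the predicted structure.
    coverage_penalty: extra °C added when the EC class is sparse in training.
    """
    # Quality weight clamped to [0.1, 1.0]; it reaches 1.0 once mean pLDDT hits plddt_ref.
    quality = max(0.1, min(1.0, mean(plddt) / plddt_ref))
    half_width = base_half_width / quality + coverage_penalty
    return point_tm - half_width, point_tm + half_width


print(interval_from_plddt(68.0, 4.0, [95.0] * 200))  # → (64.0, 72.0)
print(interval_from_plddt(68.0, 4.0, [60.0] * 200, coverage_penalty=2.0))  # much wider
```

The point of the sketch is only the shape of the behavior: the same point estimate can carry very different intervals depending on structure confidence and training coverage.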

What the numbers actually look like

On a held-out validation set of 8,400 enzyme variants not present in training (stratified by EC class and homology cluster to avoid data leakage), the thermostability head achieves:

Median absolute error: 5.2°C across all EC classes
Spearman rank correlation within a campaign (same enzyme family): 0.74–0.82
Accuracy of selecting true top-quartile variants (Tm ≥ 75th percentile of campaign): 61–68% precision at top-20 recall
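For readers who want to reproduce the first two metrics on their own DSF data, here is a small plain-Python sketch using the standard definitions: median absolute error, and Spearman's rho as the Pearson correlation of tie-averaged ranks. It is an illustration, not Fermvyne's evaluation code.

```python
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def median_abs_error(pred, true):
    return median(abs(p - t) for p, t in zip(pred, true))


pred = [60.0, 65.0, 70.0, 75.0]
measured = [62.0, 64.0, 73.0, 80.0]
print(median_abs_error(pred, measured))  # → 2.5
print(spearman(pred, measured))          # → 1.0
```

To match the "within a campaign" figure, compute `spearman` separately for each enzyme family rather than over the pooled set; pooling mixes between-family Tm offsets into the ranking.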

The rank correlation within a campaign is the most practically relevant number. When you're generating 500 variants of an oxidoreductase and want to order the top 24, you don't need absolute Tm accuracy — you need to rank correctly within that family. A 0.78 within-family rank correlation means your top-24 selection captures roughly 2.2× more high-Tm variants than random selection from the same pool. That translates directly to fewer DSF-confirmation rounds before you have a hit with your target Tm specification.
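The enrichment argument can be checked against your own campaign data with a few lines. This sketch takes the top-k variants by predicted Tm, measures how many fall in the true top quartile, and divides by the hit rate random selection would give. The simple index-based 75th-percentile cutoff is an assumption; other percentile conventions shift the boundary slightly.

```python
def top_k_enrichment(pred_tm, true_tm, k=24, quantile=0.75):
    """Return (precision, enrichment over random) for the top-k predicted variants,
    where a 'hit' is a variant at or above the true `quantile` Tm of the campaign."""
    n = len(true_tm)
    cutoff = sorted(true_tm)[int(quantile * (n - 1))]  # simple index-based percentile
    hits = {i for i, t in enumerate(true_tm) if t >= cutoff}
    top_k = sorted(range(n), key=lambda i: pred_tm[i], reverse=True)[:k]
    precision = sum(i in hits for i in top_k) / k
    base_rate = len(hits) / n  # expected precision of a random pick of any size
    return precision, precision / base_rate


measured_tm = [50.0, 52.0, 55.0, 58.0, 60.0, 63.0, 66.0, 70.0]
predicted_tm = [51.0, 50.0, 57.0, 56.0, 61.0, 64.0, 62.0, 69.0]
print(top_k_enrichment(predicted_tm, measured_tm, k=2))
```

An enrichment of 1.0 means the predictor is no better than picking at random from the pool; the ratio, not absolute Tm error, is what determines how many synthesis slots the filter saves.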

Where the prediction fails: the honest account

There are enzyme classes and structural contexts where our thermostability prediction is not reliable enough to use as a primary filter. We flag these explicitly in the platform rather than silently returning confident-looking numbers.

Intrinsically disordered regions (IDRs) within otherwise globular enzymes cause systematic errors because AlphaFold2 pLDDT scores for IDRs are low and our structural features are undefined or noisy for those segments. If your target enzyme has a functionally significant disordered N-terminal extension or flexible loop, the Tm prediction reflects the structured core and will overestimate the stability of variants in which that disordered region aggregates during unfolding.

Metal-dependent thermostability is another weak point. Enzymes that rely on structural Zn²⁺ or Ca²⁺ for fold stability have Tm values that depend on metal occupancy in the DSF assay, which varies with assay conditions. Our structural features assume nominal metal coordination from PDB statistics; if your expression or assay conditions lead to partial metal loading, the predicted Tm won't match measured Tm regardless of prediction quality.

Finally, homo-oligomeric enzymes where the thermostability is dominated by interface contacts rather than monomer stability can have poor predictions because our structural feature extraction operates per-chain. We model interface contacts as a secondary feature, but it's a weaker signal than core packing geometry for monomers.
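A reliability check for these three failure modes might look like the hypothetical sketch below. The `VariantContext` fields and the 10% disordered-fraction threshold are illustrative assumptions; the pLDDT < 50 cutoff follows the common reading of AlphaFold2 scores, where values below 50 often indicate disorder.

```python
from dataclasses import dataclass


@dataclass
class VariantContext:
    # Hypothetical per-variant inputs; field names are illustrative only.
    plddt: list                   # per-residue AlphaFold2 pLDDT (0-100)
    binds_structural_metal: bool  # e.g. structural Zn2+ or Ca2+ site
    is_homo_oligomer: bool        # biological assembly is a homo-oligomer


def reliability_flags(ctx, disorder_plddt=50.0, max_disordered_fraction=0.10):
    """Return human-readable reasons to distrust the Tm prediction (empty if none)."""
    flags = []
    disordered = sum(p < disorder_plddt for p in ctx.plddt) / len(ctx.plddt)
    if disordered > max_disordered_fraction:
        flags.append(f"IDR: {disordered:.0%} of residues below pLDDT {disorder_plddt:.0f}")
    if ctx.binds_structural_metal:
        flags.append("metal: measured Tm depends on metal occupancy in the assay")
    if ctx.is_homo_oligomer:
        flags.append("oligomer: per-chain features may miss interface stabilization")
    return flags


ctx = VariantContext(plddt=[90.0] * 80 + [30.0] * 20,
                     binds_structural_metal=False, is_homo_oligomer=False)
print(reliability_flags(ctx))  # → ['IDR: 20% of residues below pLDDT 50']
```

Returning reasons rather than a single boolean keeps the flag actionable: an IDR warning suggests checking the disordered segment, while a metal warning points at assay conditions.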

We're not saying DSF is replaceable; it isn't, not yet. What we are saying is that running a computational Tm pre-filter before synthesis reduces the number of synthesis orders that produce variants failing your Tm floor, and a 5.2°C median absolute error with ~0.78 within-family rank correlation is good enough to change synthesis economics meaningfully for most oxidoreductase and hydrolase campaigns.

Practical workflow integration

The way this plugs into a real campaign looks like this: you submit a generation request for EC 1.1.1.X with your substrate SMILES, we return 500 candidate sequences ranked by predicted activity. Each candidate has a thermostability point estimate and confidence interval. If your production process runs at 55°C, you apply a Tm floor filter of 62°C (allowing headroom for prediction error) and reduce 500 candidates to the subset above that floor, then re-rank by activity. You synthesize the top 24 of that filtered subset.
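The filter-then-re-rank step described above can be sketched as follows. The `Candidate` fields are hypothetical stand-ins for whatever the platform returns; the 55°C process temperature, 7°C headroom (giving the 62°C floor), and order size of 24 come from the example in the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record shape; field names are illustrative.
    seq_id: str
    pred_activity: float  # predicted activity score, higher is better
    pred_tm: float        # Tm point estimate, °C


def select_for_synthesis(candidates, process_temp_c=55.0, headroom_c=7.0, n_order=24):
    """Drop candidates below the Tm floor, then take the most active survivors."""
    tm_floor = process_temp_c + headroom_c  # 55 + 7 = 62 °C in the example above
    passing = [c for c in candidates if c.pred_tm >= tm_floor]
    passing.sort(key=lambda c: c.pred_activity, reverse=True)
    return passing[:n_order]


pool = [
    Candidate("A", 0.90, 58.0),  # most active overall, but below the floor
    Candidate("B", 0.80, 66.0),
    Candidate("C", 0.70, 63.0),
    Candidate("D", 0.95, 61.9),  # just under the floor
]
print([c.seq_id for c in select_for_synthesis(pool)])  # → ['B', 'C']
```

Filtering on Tm first and ranking on activity second means the activity ranking never spends a synthesis slot on a variant that is unlikely to survive the process temperature.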

Without the Tm filter, roughly 35–45% of a generated pool will have predicted Tm below 55°C for thermophile-class targets. With the filter applied, you concentrate synthesis budget on the subset most likely to survive your process temperature, without running any wet-lab thermostability experiments until after expression confirms you have active material. DSF then serves as a confirmation screen on a smaller set of validated-active variants rather than as a bulk screen on everything that expresses.

The synthesis cost reduction from this filtering step is often the highest single-step ROI in a typical design-build-test cycle — because you're compressing the number of synthesis-expression-purification-DSF cycles, not just the number of sequences you order. Calendar time savings compound from that. And for labs working under synthesis-slot scarcity at their gene synthesis vendor, the ability to order 24 well-filtered variants instead of 96 loosely-filtered ones is often the constraint that unlocks faster iteration.
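As a rough back-of-the-envelope check, the $1,200–$3,500 per 96-well plate figure from the introduction can be turned into per-variant cost and per-round savings. This sketch assumes direct cost scales linearly with the number of variants carried through synthesis, expression, and purification, which ignores fixed per-plate overheads; treat the result as an upper-bound estimate.

```python
def per_variant_cost(plate_cost, wells=96):
    """Direct cost per variant, assuming cost is spread evenly across wells."""
    return plate_cost / wells


def savings_per_round(plate_cost, ordered_before=96, ordered_after=24, wells=96):
    # Assumes linear scaling with variant count (no fixed per-plate costs).
    return (ordered_before - ordered_after) * per_variant_cost(plate_cost, wells)


for plate_cost in (1200, 3500):
    print(f"${plate_cost:,} plate: ${per_variant_cost(plate_cost):.2f} per variant, "
          f"~${savings_per_round(plate_cost):,.0f} saved per round")
```

Under that linear assumption, ordering 24 variants instead of 96 removes 72 variants' worth of direct cost, roughly $900–$2,625 per design-build-test round, before counting the calendar-time effect.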