Model-phased maps, bias and likelihood
Model bias: importance of phase
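Parseval's theorem: the squared error summed over the map is proportional to the squared errors summed over the structure factors, $\int_V |\Delta\rho(\mathbf{x})|^2\,d\mathbf{x} = \frac{1}{V}\sum_{\mathbf{h}} |\Delta F(\mathbf{h})|^2$, so errors in phase contribute to map error just as errors in amplitude do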
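A one-dimensional numerical check of Parseval's theorem, using numpy's discrete Fourier transform as a stand-in for the crystallographic one. This is a simplifying assumption: numpy's unnormalized FFT puts a factor $1/N$ where the continuous form has $1/V$. The densities are random, made-up profiles.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256  # grid points along a made-up 1-D "unit cell"

rho_true = rng.random(n)                              # illustrative "true" density
rho_model = rho_true + 0.3 * rng.standard_normal(n)   # the same density with errors

F_true = np.fft.fft(rho_true)
F_model = np.fft.fft(rho_model)

# numpy's forward FFT is unnormalized, so Parseval reads sum|d rho|^2 = sum|dF|^2 / n
map_error = np.sum(np.abs(rho_true - rho_model) ** 2)
sf_error = np.sum(np.abs(F_true - F_model) ** 2) / n
print(map_error, sf_error)  # equal up to rounding
```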
The Central Limit Theorem
When several sources of error are combined, the combined error tends to have a Gaussian probability distribution, regardless of the distributions of the individual sources of error
requires a sufficient number of independent sources of error
none of the single sources of error can dominate
The centroid (mean) of the combined probability distribution is the sum of the centroids of the individual distributions
The variance of the combined probability distribution is the sum of the variances of the individual distributions
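$X = \sum_j x_j$, where the $x_j$ are independent random variables with centroids $\langle x_j\rangle$ and variances $\sigma_j^2$
so that $\langle X\rangle = \sum_j \langle x_j\rangle$ and $\sigma_X^2 = \sum_j \sigma_j^2$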
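A quick simulation of the two additivity rules for centroids and variances. It assumes two families of deliberately non-Gaussian error sources (uniform and exponential) with comparable variances, so that no single source dominates. All numbers are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 100_000

# Independent, non-Gaussian error sources, none dominant:
#   12 uniform terms on [0, 1):       centroid 0.5, variance 1/12 each
#   12 exponential terms, scale 0.3:  centroid 0.3, variance 0.09 each
uniform_terms = rng.random((n_trials, 12))
expo_terms = rng.exponential(scale=0.3, size=(n_trials, 12))
X = uniform_terms.sum(axis=1) + expo_terms.sum(axis=1)

predicted_mean = 12 * 0.5 + 12 * 0.3         # sum of centroids = 9.6
predicted_var = 12 * (1 / 12) + 12 * 0.09    # sum of variances = 2.08
print(X.mean(), predicted_mean)
print(X.var(), predicted_var)
```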
Structure factor probabilities
2-dimensional random walk: apply central limit theorem
2-dimensional random walk starting at FP
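A Monte Carlo sketch of the random walk. It assumes 50 missing atoms of equal scattering power whose phase contributions are uniformly random, and all numbers are illustrative. The centroid should stay at $F_P$, and the variance (mean $|F - F_P|^2$) should equal the sum of the $f_j^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
F_P = 30.0 * np.exp(0.7j)     # illustrative contribution of the known (partial) model
f = np.full(50, 6.0)           # assumed: 50 missing atoms, each contributing |f_j| = 6
n_trials = 20_000

# Each missing atom adds a step of fixed length f_j in a random direction,
# so F_N performs a 2-D random walk that starts at F_P.
phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_trials, f.size))
F_N = F_P + (f * np.exp(1j * phases)).sum(axis=1)

centroid = F_N.mean()
variance = np.mean(np.abs(F_N - F_P) ** 2)   # spread about F_P
print(centroid, F_P)                         # centroid stays at F_P
print(variance, np.sum(f**2))                # variance is the sum of f_j^2
```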
Effect of variable coordinate errors
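Centroid for individual atomic contribution is non-zero:
$\langle f_j \exp[2\pi i\,\mathbf{h}\cdot(\mathbf{x}_j + \Delta\mathbf{x}_j)]\rangle = D_j f_j \exp(2\pi i\,\mathbf{h}\cdot\mathbf{x}_j)$, where $D_j = \langle\cos(2\pi\,\mathbf{h}\cdot\Delta\mathbf{x}_j)\rangle$ for a centrosymmetric error distribution
equivalent to smearing the atom over all of its possible positions
Individual variances depend on size of errors: $\sigma_j^2 = f_j^2(1 - D_j^2)$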
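A Monte Carlo sketch for a single atom with isotropic Gaussian coordinate errors. The error model and its per-coordinate rms $\sigma = 0.3$ Å are assumptions. The mean contribution shrinks by $D = \langle\cos(2\pi\,\mathbf{s}\cdot\Delta\mathbf{r})\rangle$. For Gaussian errors, this equals $\exp(-2\pi^2\sigma^2|\mathbf{s}|^2)$, that is, $\exp[-(8\pi^2/3)\langle|\Delta\mathbf{r}|^2\rangle\sin^2\theta/\lambda^2]$. The variance of the contribution is $f_j^2(1 - D^2)$. Over the three resolutions in the loop, $D$ falls and the variance grows at higher resolution.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.3     # assumed rms error per coordinate, in Å
f_j = 6.0       # assumed scattering contribution of one atom
n = 200_000

dr = rng.normal(0.0, sigma, size=(n, 3))    # isotropic Gaussian position errors
results = []
for d_spacing in (4.0, 2.5, 1.5):           # resolution, in Å
    s = np.array([0.0, 0.0, 1.0 / d_spacing])   # scattering vector with |s| = 1/d
    # Contribution of the displaced atom, measured relative to its phase at the model position
    contrib = f_j * np.exp(2j * np.pi * (dr @ s))
    D_mc = contrib.mean().real / f_j
    D_theory = np.exp(-2.0 * np.pi**2 * sigma**2 * (s @ s))
    var_mc = np.var(contrib)                 # mean |contrib - <contrib>|^2
    var_theory = f_j**2 * (1.0 - D_theory**2)
    results.append((d_spacing, D_mc, D_theory, var_mc, var_theory))
    print(f"d = {d_spacing} Å: D = {D_mc:.3f} (theory {D_theory:.3f}), "
          f"variance = {var_mc:.2f} (theory {var_theory:.2f})")
```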
Structure factor distribution: general treatment
Central limit theorem valid under many conditions
$F$ has a Gaussian distribution centered on $DF_C$
D includes effects of:
difference in position or scattering factor
missing atoms
difference in overall scale or B-factor
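variance given by $\varepsilon\sigma_\Delta^2$, with $\sigma_\Delta^2 = \Sigma_O - D^2\Sigma_C$
where $\varepsilon$ is the expected intensity factor, and $\Sigma_O$ and $\Sigma_C$ are the mean intensities (per unit $\varepsilon$) of the true and model structures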
With E values, $\sigma_A$ plays the role of D, and the variance is $1 - \sigma_A^2$
eliminate overall scale and B-factor effects
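A Monte Carlo sketch of the normalization. It assumes acentric reflections in a single resolution shell with $\varepsilon = 1$, and it uses made-up values for $D$, $\Sigma_C$ and the variance about $DF_C$. The variable names `Sigma_C` and `sigma_Delta2` are illustrative. After both sets of structure factors are normalized to E values, the regression slope of $E$ on $E_C$ should come out as $\sigma_A = D\sqrt{\Sigma_C/\Sigma_O}$, with residual variance $1 - \sigma_A^2$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

def complex_gaussian(size, mean_intensity):
    """Circular complex Gaussian with <|z|^2> = mean_intensity (acentric case)."""
    scale = np.sqrt(mean_intensity / 2.0)
    return scale * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

Sigma_C = 400.0        # assumed mean model intensity in this shell
D = 0.8                # assumed D (position errors, scale, B-factor)
sigma_Delta2 = 300.0   # assumed variance of F about D*F_C (epsilon = 1)

F_C = complex_gaussian(n, Sigma_C)
F = D * F_C + complex_gaussian(n, sigma_Delta2)   # Gaussian centred on D*F_C
Sigma_O = D**2 * Sigma_C + sigma_Delta2           # implied mean true intensity

# Normalizing to E values removes overall scale (and, shell by shell, B-factor)
E_C = F_C / np.sqrt(Sigma_C)
E = F / np.sqrt(Sigma_O)

sigma_A = D * np.sqrt(Sigma_C / Sigma_O)
slope = np.vdot(E_C, E).real / np.vdot(E_C, E_C).real   # least-squares E ≈ slope * E_C
resid_var = np.mean(np.abs(E - sigma_A * E_C) ** 2)
print(slope, sigma_A)
print(resid_var, 1.0 - sigma_A**2)
```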
Estimating $\sigma_A$
Deduce from the values of |FO| and |FC|
maximize the likelihood function $L = \prod_{\mathbf{h}} p(|F_O|;|F_C|)$
p(|FO|;|FC|) is derived from p(FO;FC) by integrating over all possible phase differences
use log likelihood for convenience
When estimating from cross-validation data, which are few, add smoothing terms to the log-likelihood function
smoothing parameter s is a target standard deviation;
0.05-0.10 is a suitable range for s
penalty for a value deviating from the value interpolated from its neighbours is given by:
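A sketch of $\sigma_A$ estimation by maximum likelihood. It assumes acentric reflections only, with normalized E values in a few resolution bins whose true $\sigma_A$ values are made up. The per-bin log-likelihood uses the Rice form of $p(|E_O|;|E_C|)$, obtained by integrating over the phase difference. The smoothing term is an assumption, because the exact penalty form is not given in this text. Here it is taken as a Gaussian penalty, with target standard deviation s, on each interior bin's deviation from the mean of its two neighbours. Maximization uses a grid search followed by coordinate ascent, chosen for transparency rather than speed.

```python
import numpy as np

def log_i0(x):
    """ln I0(x) for x >= 0: numpy's i0 below x = 500, the asymptotic form above."""
    x = np.asarray(x, dtype=float)
    direct = np.log(np.i0(np.minimum(x, 500.0)))
    asymptotic = x - 0.5 * np.log(2.0 * np.pi * np.maximum(x, 1.0))
    return np.where(x < 500.0, direct, asymptotic)

def log_likelihood(sigma_a, e_obs, e_calc):
    """Sum over acentric reflections of ln p(|E_O|; |E_C|), phase integrated out."""
    v = 1.0 - sigma_a**2
    x = 2.0 * sigma_a * e_obs * e_calc / v
    return np.sum(np.log(2.0 * e_obs / v) - (e_obs**2 + sigma_a**2 * e_calc**2) / v + log_i0(x))

def smoothing_penalty(sigma_as, s):
    """Assumed form: Gaussian penalty (sd s) on each interior bin's deviation
    from the average of its two neighbours."""
    interpolated = 0.5 * (sigma_as[:-2] + sigma_as[2:])
    return -np.sum((sigma_as[1:-1] - interpolated) ** 2) / (2.0 * s**2)

rng = np.random.default_rng(5)
true_sigma_a = np.linspace(0.9, 0.55, 6)   # made-up trend over resolution bins
n_per_bin = 1000
grid = np.linspace(0.01, 0.98, 98)         # candidate sigma_A values

def cgauss(n):
    """Normalized acentric E values: circular complex Gaussian with <|E|^2> = 1."""
    return (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)

ll_table = np.empty((true_sigma_a.size, grid.size))
for i, sa in enumerate(true_sigma_a):
    e_c = cgauss(n_per_bin)
    e_o = sa * e_c + np.sqrt(1.0 - sa**2) * cgauss(n_per_bin)
    ll_table[i] = [log_likelihood(g, np.abs(e_o), np.abs(e_c)) for g in grid]

raw_idx = ll_table.argmax(axis=1)          # independent estimate in each bin

def objective(idx, s):
    """Total log-likelihood plus smoothing penalty for one choice of grid indices."""
    return ll_table[np.arange(idx.size), idx].sum() + smoothing_penalty(grid[idx], s)

s = 0.05
idx = raw_idx.copy()
for _ in range(20):                        # coordinate ascent, one bin at a time
    for i in range(idx.size):
        scores = []
        for j in range(grid.size):
            trial = idx.copy()
            trial[i] = j
            scores.append(objective(trial, s))
        idx[i] = int(np.argmax(scores))

print("raw:     ", grid[raw_idx])
print("smoothed:", grid[idx])
```

Each coordinate step can only keep or raise the penalized objective, because the current value is always among the candidates.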
Figure of merit weighting for model phases
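Blow and Crick (1959) and Sim (1959) showed that the map with the least rms error is calculated from centroid structure factors, $m|F_O|\exp(i\alpha_{best})$, where the figure of merit $m = \langle\cos(\alpha - \alpha_{best})\rangle$.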
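A small numerical check of the Blow and Crick / Sim result. It assumes a single reflection with known $|F_O|$ and a von Mises phase distribution, which is the shape of an acentric model-phase probability, with a made-up concentration $X = 2$. Averaged over the phase probabilities, the centroid $m|F_O|\exp(i\alpha_{best})$ gives expected squared error $|F_O|^2(1 - m^2)$. That is never larger than the $2|F_O|^2(1 - m)$ of the unweighted coefficient, since the difference is $|F_O|^2(1-m)^2$.

```python
import numpy as np

F_obs = 100.0                      # observed amplitude |F_O| (illustrative)
alpha_c = np.deg2rad(40.0)         # model phase
X = 2.0                            # assumed concentration of the phase distribution

alpha = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
p = np.exp(X * (np.cos(alpha - alpha_c) - 1.0))
p /= p.sum()                       # normalized phase probabilities on the grid

mean_phasor = np.sum(p * np.exp(1j * alpha))
m = np.abs(mean_phasor)            # figure of merit = <cos(alpha - alpha_best)>
alpha_best = np.angle(mean_phasor)

def expected_sq_error(G):
    """Expected |F_true - G|^2, averaging over the phase distribution."""
    return np.sum(p * np.abs(F_obs * np.exp(1j * alpha) - G) ** 2)

err_centroid = expected_sq_error(m * F_obs * np.exp(1j * alpha_best))
err_unweighted = expected_sq_error(F_obs * np.exp(1j * alpha_best))
print(m, err_centroid, err_unweighted)
```

For this distribution, $m$ equals $I_1(X)/I_0(X)$, about 0.698 at $X = 2$.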
Model bias in fom-weighted maps
Generalize derivation by Main (1979)
Start from cosine law
Take expected values
Solve for fom-weighted map coefficient
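Solve for an approximation to the true $F_O$: for acentric reflections $F_O \approx 2m|F_O|\exp(i\alpha_C) - DF_C$; for centric reflections use $m|F_O|\exp(i\alpha_C)$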
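A minimal sketch of computing these bias-reduced map coefficients. It assumes the figure of merit $m$ and $D$ have already been estimated, for example from $\sigma_A$. The function name and its arguments are hypothetical. Acentric reflections get $2m|F_O| - D|F_C|$ and centric reflections get $m|F_O|$, both applied with the model phase.

```python
import numpy as np

def bias_reduced_coefficients(f_obs, fom, d, f_calc, phi_calc, centric):
    """Hypothetical helper: 2m|F_O| - D|F_C| (acentric) or m|F_O| (centric),
    combined with the model phase phi_calc (radians)."""
    f_obs = np.asarray(f_obs, dtype=float)
    fom = np.asarray(fom, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    amplitude = np.where(centric, fom * f_obs, 2.0 * fom * f_obs - d * f_calc)
    return amplitude * np.exp(1j * np.asarray(phi_calc, dtype=float))

coeffs = bias_reduced_coefficients(
    f_obs=[10.0, 10.0], fom=[0.5, 0.5], d=0.8, f_calc=[5.0, 5.0],
    phi_calc=[0.0, np.pi / 2], centric=[False, True],
)
print(coeffs)
```

A negative acentric amplitude is kept as is; it simply flips the coefficient's phase by 180°.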
SIGMAA
Calculate phase probabilities from |FO|, |FC| and $\sigma_A$
Combine MIR phase information using Hendrickson-Lattman coefficients
$p(\alpha) \propto \exp[A\cos\alpha + B\sin\alpha + C\cos 2\alpha + D\sin 2\alpha]$
Produce 4 types of map coefficients:
1)
2)
3) combined phase map coefficients
4)
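A sketch of the phase-combination step. It assumes acentric model phases with $p(\alpha) \propto \exp[X\cos(\alpha - \alpha_C)]$ and $X = 2\sigma_A|E_O||E_C|/(1 - \sigma_A^2)$. In Hendrickson-Lattman form this gives $A = X\cos\alpha_C$, $B = X\sin\alpha_C$ and $C = D = 0$, where this $D$ is the HL coefficient, not Luzzati's D. Independent sources of phase information multiply their probabilities, so their HL coefficients add. The MIR coefficients below are invented. The best phase and figure of merit come from integrating on a phase grid, and a combined-phase map would then use $m_{comb}|F_O|\exp(i\alpha_{comb})$.

```python
import numpy as np

def model_hl(e_obs, e_calc, phi_calc, sigma_a):
    """HL coefficients (A, B, C, D) for an acentric model phase distribution
    p(alpha) ∝ exp[X cos(alpha - phi_calc)], X = 2 sigma_A |E_O||E_C| / (1 - sigma_A^2)."""
    x = 2.0 * sigma_a * e_obs * e_calc / (1.0 - sigma_a**2)
    return np.array([x * np.cos(phi_calc), x * np.sin(phi_calc), 0.0, 0.0])

def phase_stats(hl, n=3600):
    """Best (centroid) phase and figure of merit from HL coefficients, on a phase grid."""
    a, b, c, d = hl
    alpha = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    expo = a * np.cos(alpha) + b * np.sin(alpha) + c * np.cos(2 * alpha) + d * np.sin(2 * alpha)
    p = np.exp(expo - expo.max())     # subtract the maximum to avoid overflow
    p /= p.sum()
    mean_phasor = np.sum(p * np.exp(1j * alpha))
    return np.angle(mean_phasor), np.abs(mean_phasor)

hl_model = model_hl(e_obs=1.5, e_calc=1.0, phi_calc=np.deg2rad(30.0), sigma_a=0.5)  # X = 2
hl_mir = np.array([1.0, 0.5, 0.3, -0.2])   # made-up MIR HL coefficients
hl_comb = hl_model + hl_mir                # probabilities multiply, so coefficients add

phi_model, m_model = phase_stats(hl_model)
phi_comb, m_comb = phase_stats(hl_comb)
print(np.rad2deg(phi_model), m_model)
print(np.rad2deg(phi_comb), m_comb)
```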
Refinement targets and difference maps
In refinement, adjust model to optimize refinement target
Atoms are moved to shift FC in a direction that improves target
done by taking derivative of target wrt atomic parameters
Derivative of target wrt FC shows what direction to change the calculated structure factor to improve agreement
least-squares target should decrease, so move in opposite direction to derivative
likelihood target should increase, so move in direction of derivative
Fourier transform of (+/-) derivative shows the corresponding derivative of the electron density
such a map shows where the target would prefer more or less electron density
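e.g. least-squares: $T = \sum_{\mathbf{h}} w(|F_O| - |F_C|)^2$; writing $F_C = A_C + iB_C$, $\partial T/\partial A_C + i\,\partial T/\partial B_C = -2w(|F_O| - |F_C|)\exp(i\alpha_C)$, so the map of the negative derivative has coefficients $w(|F_O| - |F_C|)\exp(i\alpha_C)$: the $F_O - F_C$ difference map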
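e.g. log likelihood target (ignoring measurement error): for acentric reflections the derivative is a positive multiple of $(m|F_O| - D|F_C|)\exp(i\alpha_C)$, giving the $mF_O - DF_C$ difference map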
consider effect of adding this to model:
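A finite-difference check of the two gradients. It treats $F_C = A + iB$ as the variable and takes the derivative as $\partial/\partial A + i\,\partial/\partial B$. Assumptions: a single acentric reflection, weight $w = 1$ for least squares, made-up values of $|F_O|$, $|F_C|$, $D$ and the variance $\varepsilon\sigma_\Delta^2$, and no measurement error. The Bessel functions are evaluated from their integral representation on a phase grid, to avoid depending on SciPy. Both analytic gradients point along $\exp(i\alpha_C)$, so their Fourier transforms give the $F_O - F_C$ and $mF_O - DF_C$ difference maps.

```python
import numpy as np

THETA = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)

def bessel_ratio_and_log_i0(x):
    """m = I1(x)/I0(x) and ln I0(x), from I_n(x) = (1/2pi) ∫ exp(x cos t) cos(nt) dt."""
    w = np.exp(x * (np.cos(THETA) - 1.0))   # scaled by exp(-x) to avoid overflow
    i0_scaled = w.mean()
    i1_scaled = (w * np.cos(THETA)).mean()
    return i1_scaled / i0_scaled, x + np.log(i0_scaled)

def ls_target(fc, fo, w=1.0):
    """Least-squares target for one reflection."""
    return w * (fo - abs(fc)) ** 2

def log_likelihood(fc, fo, d, var):
    """Acentric ln p(|F_O|; |F_C|), no measurement error; var = epsilon * sigma_Delta^2."""
    x = 2.0 * d * fo * abs(fc) / var
    _, log_i0 = bessel_ratio_and_log_i0(x)
    return np.log(2.0 * fo / var) - (fo**2 + d**2 * abs(fc) ** 2) / var + log_i0

def numerical_gradient(f, fc, h=1e-5):
    """dT/dA + i dT/dB with F_C = A + iB, by central differences."""
    d_a = (f(fc + h) - f(fc - h)) / (2.0 * h)
    d_b = (f(fc + 1j * h) - f(fc - 1j * h)) / (2.0 * h)
    return d_a + 1j * d_b

fo, d, var = 50.0, 0.9, 500.0          # illustrative |F_O|, D and epsilon*sigma_Delta^2
fc = 40.0 * np.exp(1.0j)               # illustrative F_C
phase = fc / abs(fc)                   # exp(i alpha_C)

ls_grad = -2.0 * (fo - abs(fc)) * phase                        # least squares, w = 1
m, _ = bessel_ratio_and_log_i0(2.0 * d * fo * abs(fc) / var)
ll_grad = (2.0 * d / var) * (m * fo - d * abs(fc)) * phase     # log likelihood

ls_num = numerical_gradient(lambda z: ls_target(z, fo), fc)
ll_num = numerical_gradient(lambda z: log_likelihood(z, fo, d, var), fc)
print(ls_grad, ls_num)
print(ll_grad, ll_num)
```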
Refinement bias
Statistical treatment depends on independent errors in all the atoms
Protein models have too many parameters per observation
diffraction data can be "over fit"
compensating errors give misleading agreement
Deal with refinement bias by:
using cross-validated $\sigma_A$ values
reducing weight on amplitude agreement
exploiting NCS, if present
"omit refinement"
examining combined phase and MIR maps