Model-phased maps, bias and likelihood
Model bias: importance of phase
Parseval's theorem:
The Central Limit Theorem
When several sources of error are combined, the combined error tends to have a Gaussian probability distribution, regardless of the distributions of the individual sources of error
requires a sufficient number of independent sources of error
none of the single sources of error can dominate
The centroid (mean) of the combined probability distribution is the sum of the centroids of the individual distributions
The variance of the combined probability distribution is the sum of the variances of the individual distributions
For a sum S = Σ_{j} x_{j}, where the x_{j} are independent random variables with centroids ⟨x_{j}⟩ and variances σ_{j}^{2}:
⟨S⟩ = Σ_{j} ⟨x_{j}⟩ and σ_{S}^{2} = Σ_{j} σ_{j}^{2}
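The two rules above (centroids add, variances add) and the approach to a Gaussian shape can be checked with a small simulation. This is a minimal sketch: the eight uniform error sources in `SOURCES` are hypothetical values chosen for illustration, not taken from the lecture.

```python
import math
import random

# Hypothetical error sources: each is uniform on the interval (low, high).
SOURCES = [(-1.0, 1.0), (0.0, 2.0), (-0.5, 1.5), (-2.0, 0.0),
           (1.0, 2.0), (-1.0, 0.5), (0.0, 1.0), (-1.5, 1.5)]


def predicted_moments(sources):
    """Sum of the individual centroids and sum of the individual variances."""
    mean = sum((lo + hi) / 2.0 for lo, hi in sources)
    variance = sum((hi - lo) ** 2 / 12.0 for lo, hi in sources)
    return mean, variance


def sample_combined_errors(sources, n_samples, rng):
    """Draw n_samples combined errors, each the sum of one draw per source."""
    return [sum(rng.uniform(lo, hi) for lo, hi in sources)
            for _ in range(n_samples)]


def sample_moments(values):
    """Mean and (population) variance of the sampled combined errors."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


def fraction_within_one_sigma(values, mean, variance):
    """Fraction of samples within one standard deviation of the mean."""
    sigma = math.sqrt(variance)
    return sum(abs(v - mean) <= sigma for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(0)
    values = sample_combined_errors(SOURCES, 20000, rng)
    print("predicted mean, variance:", predicted_moments(SOURCES))
    print("sampled   mean, variance:", sample_moments(values))
    print("fraction within one sigma:",
          fraction_within_one_sigma(values, *predicted_moments(SOURCES)))
```

For a true Gaussian the fraction within one standard deviation is about 0.683. A sum of only eight uniform sources should come close to that but not match it exactly.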
Structure factor probabilities
2-dimensional random walk: apply central limit theorem
2-dimensional random walk starting at F_{P}
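The random-walk picture can be sketched numerically. Each atom contributes a step of length f_{j} in a random direction in the complex plane, and the walk may start from a known partial structure factor F_{P}. The atom counts, scattering factors and the value of `start` below are hypothetical. The simulation only illustrates that the real and imaginary parts each end up with a variance of about half the sum of the squared scattering factors.

```python
import cmath
import math
import random


def random_walk_structure_factor(scattering_factors, rng, start=0j):
    """Add one step f_j * exp(i * phase_j) per atom, with uniform random phases."""
    total = start
    for f in scattering_factors:
        phase = 2.0 * math.pi * rng.random()
        total += f * cmath.exp(1j * phase)
    return total


def simulate(scattering_factors, n_reflections, rng, start=0j):
    """Independent random walks, one per simulated reflection."""
    return [random_walk_structure_factor(scattering_factors, rng, start)
            for _ in range(n_reflections)]


def mean_and_component_variances(values):
    """Complex mean, plus variances of the real and imaginary parts."""
    n = len(values)
    mean = sum(values) / n
    var_re = sum((v.real - mean.real) ** 2 for v in values) / n
    var_im = sum((v.imag - mean.imag) ** 2 for v in values) / n
    return mean, var_re, var_im


if __name__ == "__main__":
    rng = random.Random(1)
    atoms = [6.0] * 30 + [7.0] * 10 + [8.0] * 10   # hypothetical atoms
    f_p = 40.0 * cmath.exp(0.3j)                   # hypothetical known part
    walks = simulate(atoms, 5000, rng, start=f_p)
    print("sum of f^2 / 2:", sum(f * f for f in atoms) / 2.0)
    print("mean, var(re), var(im):", mean_and_component_variances(walks))
```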
Effect of variable coordinate errors
Centroid for individual atomic contribution is nonzero


equivalent to smearing the atom over all of its possible positions
Individual variances depend on size of errors
Structure factor distribution: general treatment
Central limit theorem valid under many conditions
F has Gaussian distribution centered on DF_{C}
D includes effects of:
difference in position or scattering factor
missing atoms
difference in overall scale or B-factor
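variance given by σ_{Δ}^{2} = ε(Σ_{N} - D^{2}Σ_{P}), with Σ_{N} and Σ_{P} the sums of squared scattering factors for the true structure and the model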
where ε is the expected intensity factor
With E values, σ_{A} plays the role of D, and the variance is 1 - σ_{A}^{2}
eliminate overall scale and B-factor effects
Estimating σ_{A}
Deduce σ_{A} from the values of |F_{O}| and |F_{C}|
maximize the likelihood function
p(|F_{O}|;|F_{C}|) is derived from p(F_{O};F_{C}) by integrating over all possible phase differences
use log likelihood for convenience
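The likelihood maximization can be sketched for acentric reflections with normalized amplitudes. The sketch assumes the standard phase-integrated form p(|E_{O}|;|E_{C}|) = [2|E_{O}|/(1-σ_{A}^{2})] exp[-(|E_{O}|^{2} + σ_{A}^{2}|E_{C}|^{2})/(1-σ_{A}^{2})] I_{0}[2σ_{A}|E_{O}||E_{C}|/(1-σ_{A}^{2})], which is not spelled out in these notes. The data are simulated, and a coarse grid search stands in for a proper optimizer. All function names are hypothetical.

```python
import math
import random


def log_bessel_i0(x, n_steps=64):
    """log I0(x) from I0(x) = (1/pi) * integral_0^pi exp(x cos t) dt.

    exp(x) is factored out so that large arguments do not overflow.
    """
    h = math.pi / n_steps
    total = sum(math.exp(x * (math.cos((k + 0.5) * h) - 1.0))
                for k in range(n_steps))
    return x + math.log(total / n_steps)


def log_likelihood(sigma_a, e_obs, e_calc):
    """Sum of log p(|E_O|; |E_C|) over acentric reflections (assumed form)."""
    v = 1.0 - sigma_a ** 2
    total = 0.0
    for eo, ec in zip(e_obs, e_calc):
        x = 2.0 * sigma_a * eo * ec / v
        total += (math.log(2.0 * eo / v)
                  - (eo ** 2 + (sigma_a * ec) ** 2) / v
                  + log_bessel_i0(x))
    return total


def estimate_sigma_a(e_obs, e_calc):
    """Grid search over sigma_A = 0.05, 0.10, ..., 0.95."""
    grid = [0.05 * k for k in range(1, 20)]
    return max(grid, key=lambda s: log_likelihood(s, e_obs, e_calc))


def simulate_amplitudes(true_sigma_a, n_refl, rng):
    """E_O is Gaussian about sigma_A * E_C with variance 1 - sigma_A^2."""
    half = math.sqrt(0.5)
    spread = math.sqrt(1.0 - true_sigma_a ** 2)
    e_obs, e_calc = [], []
    for _ in range(n_refl):
        ec = complex(rng.gauss(0.0, half), rng.gauss(0.0, half))
        delta = complex(rng.gauss(0.0, half), rng.gauss(0.0, half))
        eo = true_sigma_a * ec + spread * delta
        e_obs.append(abs(eo))
        e_calc.append(abs(ec))
    return e_obs, e_calc


if __name__ == "__main__":
    rng = random.Random(2)
    e_obs, e_calc = simulate_amplitudes(0.7, 500, rng)
    print("estimated sigma_A:", estimate_sigma_a(e_obs, e_calc))
```

In practice the estimate would be made separately in resolution shells. This sketch treats all reflections as a single shell.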
For cross-validation data, add smoothing terms to log likelihood function
smoothing parameter s is a target standard deviation;
0.05-0.10 is a suitable range for s
penalty for a σ_{A} value deviating from the value interpolated from its neighbours is given by:
Figure of merit weighting for model phases
Blow and Crick (1959) and Sim (1959) showed that the map with the least rms error is calculated from centroid structure factors.
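A centroid structure factor can be written as m|F_{O}|exp(iα_{C}), where the figure of merit m is the expected cosine of the phase error. The sketch below assumes the usual σ_{A}-based forms, which are not reproduced in these notes: m = I_{1}(X)/I_{0}(X) for acentric and m = tanh(X/2) for centric reflections, with X = 2σ_{A}|E_{O}||E_{C}|/(1-σ_{A}^{2}). The function names are hypothetical.

```python
import cmath
import math


def bessel_ratio_i1_i0(x, n_steps=256):
    """I1(x)/I0(x) from I_n(x) = (1/pi) * integral_0^pi exp(x cos t) cos(n t) dt.

    This is the mean of cos(t) weighted by exp(x cos t), i.e. the expected
    cosine of the phase error for a phase probability exp(x cos t).
    """
    h = math.pi / n_steps
    num = 0.0
    den = 0.0
    for k in range(n_steps):
        t = (k + 0.5) * h
        w = math.exp(x * (math.cos(t) - 1.0))   # exp(-x) cancels in the ratio
        num += w * math.cos(t)
        den += w
    return num / den


def figure_of_merit(e_obs, e_calc, sigma_a, centric=False):
    """Figure of merit m for one reflection (assumed sigma_A-based forms)."""
    x = 2.0 * sigma_a * e_obs * e_calc / (1.0 - sigma_a ** 2)
    if centric:
        return math.tanh(x / 2.0)
    return bessel_ratio_i1_i0(x)


def centroid_structure_factor(f_obs, alpha_calc, m):
    """Centroid (fom-weighted) structure factor m * |F_O| * exp(i * alpha_C)."""
    return m * f_obs * cmath.exp(1j * alpha_calc)


if __name__ == "__main__":
    for sigma_a in (0.3, 0.6, 0.9):
        print(sigma_a, figure_of_merit(1.2, 1.0, sigma_a))
```

The weight rises towards 1 as σ_{A} or the amplitudes increase, so well-determined phases contribute at nearly full amplitude while poorly determined ones are down-weighted.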
Model bias in fom-weighted maps
Generalize derivation by Main (1979)
Start from cosine law
Take expected values
Solve for fomweighted map coefficient
Solve for approximation to true F_{O}
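The end point of this kind of derivation is commonly quoted, for acentric reflections, as the map coefficient (2m|F_{O}| - D|F_{C}|)exp(iα_{C}). That closed form is taken here as an assumption, since the equations themselves are not reproduced in these notes. A minimal sketch with a hypothetical function name:

```python
import cmath


def map_coefficient(f_obs, f_calc, alpha_calc, m, d):
    """Assumed acentric coefficient (2m|F_O| - D|F_C|) * exp(i * alpha_C).

    f_obs, f_calc : observed and calculated amplitudes
    alpha_calc    : model phase in radians
    m             : figure of merit
    d             : the factor D from the structure factor distribution
    """
    amplitude = 2.0 * m * f_obs - d * f_calc
    return amplitude * cmath.exp(1j * alpha_calc)


if __name__ == "__main__":
    # With a perfect model (m = D = 1, |F_O| = |F_C|) the coefficient is F_C.
    print(map_coefficient(10.0, 10.0, 0.5, 1.0, 1.0))
    # With a poorer model the amplitude is reduced.
    print(map_coefficient(12.0, 8.0, 0.0, 0.5, 0.9))
```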
SIGMAA
Calculate phase probabilities from |F_{O}|, |F_{C}|, α_{C}
Combine MIR phase information using Hendrickson-Lattman coefficients
p(α) ∝ exp[A cos(α) + B sin(α) + C cos(2α) + D sin(2α)]
Produce 4 types of map coefficients:
1)
2)
3) combined phase map coefficients
4)
Refinement targets and difference maps
In refinement, adjust model to optimize refinement target
Atoms are moved to shift F_{C} in a direction that improves target
done by taking derivative of target wrt atomic parameters
Derivative of target wrt F_{C} shows what direction to change the calculated structure factor to improve agreement
least-squares target should decrease, so move in opposite direction to derivative
likelihood target should increase, so move in direction of derivative
Fourier transform of (+/-) derivative shows the corresponding derivative of the electron density
such a map shows where the target would prefer more or less electron density
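The link between the derivative of a target and difference-map coefficients can be checked numerically for a single reflection. The sketch assumes an unweighted least-squares target (|F_{O}| - |F_{C}|)^{2}. For that target the negative derivative with respect to the real and imaginary parts of F_{C} should equal twice the familiar difference coefficient (|F_{O}| - |F_{C}|)exp(iα_{C}). The numbers and function names are hypothetical.

```python
import cmath


def least_squares_target(f_obs_amp, f_calc):
    """Unweighted least-squares residual for one reflection (assumed form)."""
    return (f_obs_amp - abs(f_calc)) ** 2


def numerical_gradient(f_obs_amp, f_calc, step=1e-6):
    """Central-difference derivative with respect to Re(F_C) and Im(F_C).

    Returned packed as the complex number dT/dA + i * dT/dB.
    """
    d_re = (least_squares_target(f_obs_amp, f_calc + step)
            - least_squares_target(f_obs_amp, f_calc - step)) / (2.0 * step)
    d_im = (least_squares_target(f_obs_amp, f_calc + 1j * step)
            - least_squares_target(f_obs_amp, f_calc - 1j * step)) / (2.0 * step)
    return complex(d_re, d_im)


def difference_coefficient(f_obs_amp, f_calc):
    """(|F_O| - |F_C|) * exp(i * alpha_C), the usual difference-map term."""
    alpha_calc = cmath.phase(f_calc)
    return (f_obs_amp - abs(f_calc)) * cmath.exp(1j * alpha_calc)


if __name__ == "__main__":
    f_obs_amp = 12.0
    f_calc = 8.0 * cmath.exp(0.7j)
    print("minus gradient:      ", -numerical_gradient(f_obs_amp, f_calc))
    print("2 x difference coeff:", 2.0 * difference_coefficient(f_obs_amp, f_calc))
```

The sign convention follows the text: the least-squares target should decrease, so the map is built from the negative of the derivative.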
e.g. least-squares:
e.g. log likelihood target (ignoring measurement error)
consider effect of adding this to model:
Refinement bias
Statistical treatment depends on independent errors in all the atoms
Proteins have too many parameters per observation
diffraction data can be "over fit"
compensating errors give misleading agreement
Deal with refinement bias by:
using cross-validated σ_{A} estimates
reducing weight on amplitude agreement
exploiting NCS, if present
"omit refinement"
examining combined phase and MIR maps