Model-phased maps, bias and likelihood

Outline

Importance of the phase

Structure factor probability relationships

Bias component of model-phased map coefficients

SIGMAA map coefficients to reduce model bias

Relationship between targets and maps

Refinement bias

Model bias: importance of phase

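Parseval's theorem: the rms error in the electron density is proportional to the rms error in the structure factors

$\langle (\rho_1 - \rho_2)^2 \rangle = \frac{1}{V^2} \sum_{\mathbf{h}} |F_1(\mathbf{h}) - F_2(\mathbf{h})|^2$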

phase change has more effect than amplitude change
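A minimal numerical sketch of this point, assuming a toy one-dimensional "cell" filled with random density (the function name `map_errors` and all parameters are illustrative, not from the lecture). It swaps in either the amplitudes or the phases of an unrelated structure and compares the resulting density errors, checking Parseval's theorem in NumPy's FFT convention along the way.

```python
import numpy as np

def map_errors(n=512, seed=0):
    """Compare density errors from wrong amplitudes vs wrong phases (toy 1D cell)."""
    rng = np.random.default_rng(seed)
    rho_true = rng.standard_normal(n)    # toy "true" density (assumption)
    rho_other = rng.standard_normal(n)   # unrelated density
    F_true = np.fft.fft(rho_true)
    F_other = np.fft.fft(rho_other)

    # Correct phases, amplitudes taken from the unrelated structure
    F_wrong_amp = np.abs(F_other) * np.exp(1j * np.angle(F_true))
    # Correct amplitudes, phases taken from the unrelated structure
    F_wrong_phase = np.abs(F_true) * np.exp(1j * np.angle(F_other))

    results = {}
    for name, F in (("amplitude", F_wrong_amp), ("phase", F_wrong_phase)):
        rho = np.fft.ifft(F).real
        ms_density = np.mean((rho - rho_true) ** 2)
        # Parseval in NumPy's convention: mean-square density error equals
        # the summed squared structure-factor error divided by n^2
        ms_reciprocal = np.sum(np.abs(F - F_true) ** 2) / n**2
        results[name] = (ms_density, ms_reciprocal)
    return results

if __name__ == "__main__":
    for name, (ms_rho, ms_F) in map_errors().items():
        print(f"wrong {name}: density error {ms_rho:.3f}, Parseval sum {ms_F:.3f}")
```

In this toy setting the wrong-phase map should come out several times worse in mean-square error than the wrong-amplitude map.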

The Central Limit Theorem

When several sources of error are combined, the combined error tends to have a Gaussian probability distribution, regardless of the distributions of the individual sources of error

requires a sufficient number of independent sources of error

none of the single sources of error can dominate

The centroid (mean) of the combined probability distribution is the sum of the centroids of the individual distributions

The variance of the combined probability distribution is the sum of the variances of the individual distributions

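$x = \sum_j x_j$, where the $x_j$ are independent random variables with centroids $\langle x_j \rangle$ and variances $\sigma_j^2$

$p(x) \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \langle x \rangle)^2}{2\sigma^2}\right)$, where $\langle x \rangle = \sum_j \langle x_j \rangle$ and $\sigma^2 = \sum_j \sigma_j^2$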
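The statements above can be checked with a short simulation. This sketch assumes 20 uniform (clearly non-Gaussian) error sources of comparable size; the function name `combine_errors` and the chosen centres and widths are illustrative only.

```python
import numpy as np

def combine_errors(n_samples=100_000, seed=1):
    """Sum 20 independent uniform error sources and summarise the result."""
    rng = np.random.default_rng(seed)
    # Uniform distributions with different centres and half-widths,
    # none of which dominates the total
    centres = np.linspace(-1.0, 1.0, 20)
    half_widths = np.linspace(0.5, 1.5, 20)
    x = rng.uniform(centres - half_widths, centres + half_widths,
                    size=(n_samples, 20))
    total = x.sum(axis=1)

    expected_mean = centres.sum()                  # sum of the centroids
    expected_var = np.sum(half_widths**2 / 3.0)    # sum of the variances
    z = (total - total.mean()) / total.std()
    excess_kurtosis = np.mean(z**4) - 3.0          # 0 for a Gaussian
    return total.mean(), expected_mean, total.var(), expected_var, excess_kurtosis

if __name__ == "__main__":
    mean, exp_mean, var, exp_var, kurt = combine_errors()
    print(f"mean {mean:.3f} (expected {exp_mean:.3f})")
    print(f"variance {var:.3f} (expected {exp_var:.3f})")
    print(f"excess kurtosis {kurt:.3f}")
```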

Structure factor probabilities

Wilson distribution in P1

2-dimensional random walk: apply central limit theorem

Sim distribution in P1

2-dimensional random walk starting at F_P
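Both random walks can be simulated directly. The sketch below assumes equal point atoms and treats each atom's phase angle as uniformly random for a general P1 reflection; `sigma_N` and `sigma_Q` are names chosen here for the sum of squared scattering factors over all atoms and over the missing atoms, respectively.

```python
import numpy as np

def random_walk_structure_factors(n_atoms=50, n_known=30, n_refl=20_000, seed=2):
    """Simulate structure factors as 2D random walks, one step per atom."""
    rng = np.random.default_rng(seed)
    f = np.ones(n_atoms)   # equal point-atom scattering factors (assumption)
    # Each atom contributes a step of length f_j in a random direction
    theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_refl, n_atoms))
    steps = f * np.exp(1j * theta)
    F = steps.sum(axis=1)                   # Wilson: walk starts at the origin
    F_P = steps[:, :n_known].sum(axis=1)    # known part of the structure
    sigma_N = np.sum(f**2)                  # all atoms
    sigma_Q = np.sum(f[n_known:]**2)        # missing atoms only
    return F, F_P, sigma_N, sigma_Q

if __name__ == "__main__":
    F, F_P, sigma_N, sigma_Q = random_walk_structure_factors()
    print("Wilson: <|F|^2> =", np.mean(np.abs(F)**2), "vs Sigma_N =", sigma_N)
    print("Sim: <|F - F_P|^2> =", np.mean(np.abs(F - F_P)**2), "vs Sigma_Q =", sigma_Q)
```

The Sim case is the same walk restarted from F_P, so only the missing atoms contribute to the spread.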

Effect of variable coordinate errors

Centroid for individual atomic contribution is non-zero

$d_j$ = Fourier transform of $p(\Delta x_j)$

equivalent to smearing the atom over all of its possible positions

Individual variances depend on size of errors
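A one-dimensional sketch of the smearing argument, assuming a Gaussian coordinate error (for which the Fourier transform of the error distribution is exp(-2 pi^2 s^2 sigma^2)). The function name and the numerical values of `s`, `sigma_x`, `x` and `f` are illustrative only.

```python
import numpy as np

def smeared_atom(s=0.25, sigma_x=0.4, x=1.3, f=6.0, n_samples=200_000, seed=3):
    """Centroid and variance of one atom's contribution under coordinate error."""
    rng = np.random.default_rng(seed)
    dx = rng.normal(0.0, sigma_x, n_samples)           # assumed Gaussian error
    contrib = f * np.exp(2j * np.pi * s * (x + dx))    # true contribution per trial
    model = f * np.exp(2j * np.pi * s * x)             # contribution at the model position
    d = np.exp(-2.0 * np.pi**2 * s**2 * sigma_x**2)    # Fourier transform of p(dx)
    centroid = contrib.mean()
    variance = np.mean(np.abs(contrib - centroid)**2)
    # Expected: centroid = d * model, variance = f^2 (1 - d^2)
    return centroid, d * model, variance, f**2 * (1.0 - d**2)

if __name__ == "__main__":
    centroid, expected_centroid, variance, expected_variance = smeared_atom()
    print("centroid", centroid, "expected", expected_centroid)
    print("variance", variance, "expected", expected_variance)
```

Larger coordinate errors give a smaller d, so the centroid shrinks toward zero and the variance grows toward f^2.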

Structure factor distribution: general treatment

Central limit theorem valid under many conditions

F has Gaussian distribution centered on DF_C

D includes effects of:

difference in position or scattering factor

missing atoms

difference in overall scale or B-factor

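variance given by $\varepsilon\sigma_\Delta^2$

where $\varepsilon$ is expected intensity factor

Acentric: $p(F;F_C) = \frac{1}{\pi\varepsilon\sigma_\Delta^2} \exp\left(-\frac{|F - DF_C|^2}{\varepsilon\sigma_\Delta^2}\right)$

Centric: $p(F;F_C) = \frac{1}{\sqrt{2\pi\varepsilon\sigma_\Delta^2}} \exp\left(-\frac{|F - DF_C|^2}{2\varepsilon\sigma_\Delta^2}\right)$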

With E values, $\sigma_A$ plays the role of D, and the variance is $1 - \sigma_A^2$

eliminate overall scale and B-factor effects

Estimating $\sigma_A$

Deduce $\sigma_A$ from the values of |F_O| and |F_C|

maximize the likelihood function $L = \prod_{hkl} p(|F_O|;|F_C|)$

p(|F_O|;|F_C|) is derived from p(F_O;F_C) by integrating over all possible phase differences

use log likelihood for convenience
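The likelihood steps above can be sketched as follows, under several assumptions that are not from the lecture: normalised E values, acentric reflections only, no measurement error, a single resolution shell, no smoothing term, and a plain grid search in place of a proper optimiser. All function names are hypothetical. The Bessel function that arises from integrating over the phase difference is evaluated here by doing that integration numerically.

```python
import numpy as np

THETA = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)

def log_i0(x):
    """ln I0(x), by integrating exp(x cos(theta)) over the phase difference.
    The integrand is rescaled by exp(-x) to avoid overflow."""
    x = np.asarray(x, dtype=float)
    return x + np.log(np.mean(np.exp(x[..., None] * (np.cos(THETA) - 1.0)), axis=-1))

def log_likelihood(sigma_a, e_obs, e_calc):
    """Acentric log likelihood of |E_O| given |E_C| (Rice distribution)."""
    v = 1.0 - sigma_a**2
    x = 2.0 * sigma_a * e_obs * e_calc / v
    return np.sum(np.log(2.0 * e_obs / v)
                  - (e_obs**2 + sigma_a**2 * e_calc**2) / v
                  + log_i0(x))

def estimate_sigma_a(e_obs, e_calc):
    """Grid search for the sigma_A value that maximises the log likelihood."""
    grid = np.linspace(0.01, 0.99, 99)
    ll = [log_likelihood(s, e_obs, e_calc) for s in grid]
    return grid[int(np.argmax(ll))]

def simulate(true_sigma_a=0.7, n_refl=2000, seed=4):
    """Synthetic |E_O|, |E_C| pairs with a known sigma_A (test data only)."""
    rng = np.random.default_rng(seed)
    def cgauss():
        return (rng.standard_normal(n_refl) + 1j * rng.standard_normal(n_refl)) / np.sqrt(2.0)
    e_c = cgauss()
    e_o = true_sigma_a * e_c + np.sqrt(1.0 - true_sigma_a**2) * cgauss()
    return np.abs(e_o), np.abs(e_c)

if __name__ == "__main__":
    e_obs, e_calc = simulate()
    print("estimated sigma_A:", estimate_sigma_a(e_obs, e_calc))
```

Only amplitudes enter the estimate; the phases used to generate the synthetic data are discarded before fitting.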

For cross-validation data, add smoothing terms to log likelihood function

smoothing parameter s is a target standard deviation; 0.05-0.10 is a suitable range for s

penalty for a value deviating from the value interpolated from its neighbours is given by:

Figure of merit weighting for model phases

Blow and Crick (1959) and Sim (1959) showed that the map with the least rms error is calculated from centroid structure factors.
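A small sketch of why the centroid gives the least rms error, assuming an acentric phase-difference distribution of the form p(delta) proportional to exp(X cos(delta)); the parameter `x` (standing for X) and the function names are illustrative. The figure of merit m is the mean cosine of the phase error, and the expected squared error of a coefficient weighted by c is smallest at c = m.

```python
import numpy as np

DELTA = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)

def phase_weights(x):
    """Normalised p(delta) proportional to exp(x cos(delta)), sampled on a grid."""
    w = np.exp(x * (np.cos(DELTA) - 1.0))   # rescaled to avoid overflow
    return w / w.sum()

def figure_of_merit(x):
    """m = <cos(delta)>, the length of the centroid of exp(i delta)."""
    return np.sum(phase_weights(x) * np.cos(DELTA))

def expected_sq_error(c, x):
    """Expected |exp(i delta) - c|^2 for unit |F_O| and a weight c on the model phase."""
    return np.sum(phase_weights(x) * np.abs(np.exp(1j * DELTA) - c)**2)

if __name__ == "__main__":
    x = 2.0
    m = figure_of_merit(x)
    print("m =", m)
    print("error with weight m:", expected_sq_error(m, x))
    print("error with weight 1:", expected_sq_error(1.0, x))
```

With weight m the expected squared error is 1 - m^2, which is lower than for an unweighted coefficient whenever m < 1.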

Model bias in fom-weighted maps

Generalize derivation by Main (1979)

Start from cosine law

Take expected values

Solve for fom-weighted map coefficient

Solve for approximation to true F_O
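A hedged sketch of the ingredients of this derivation in LaTeX, using D, m and $\alpha_C$ as above and writing $\alpha_O$ for the unknown true phase (a symbol introduced here). The first two lines are the cosine law and its expected value; the intermediate algebra is not reproduced, and the last line is the result usually quoted for acentric reflections.

```latex
% Cosine law for the triangle formed by F_O, D F_C and their difference
|F_O - D F_C|^2 = |F_O|^2 + D^2 |F_C|^2 - 2 D |F_O| |F_C| \cos(\alpha_O - \alpha_C)

% Expected value over the unknown phase difference, with m = <cos(alpha_O - alpha_C)>
\langle |F_O - D F_C|^2 \rangle = |F_O|^2 + D^2 |F_C|^2 - 2 m D |F_O| |F_C|

% Result usually quoted for acentric reflections (noise terms omitted)
m |F_O| e^{i\alpha_C} \approx \tfrac{1}{2} F_O + \tfrac{1}{2} D F_C
\quad\Longrightarrow\quad
F_O \approx \left( 2 m |F_O| - D |F_C| \right) e^{i\alpha_C}
```

Read this way, the fom-weighted coefficient is about half true structure and half model, which is the model bias that the $2m|F_O| - D|F_C|$ coefficient is designed to remove.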

SIGMAA

Calculate phase probabilities from $|F_O|$, $|F_C|$, $\alpha_C$

Combine MIR phase information using Hendrickson-Lattman coefficients

$p(\alpha) \propto \exp[A\cos(\alpha) + B\sin(\alpha) + C\cos(2\alpha) + D\sin(2\alpha)]$
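A minimal sketch of how Hendrickson-Lattman coefficients are used, assuming a 1-degree phase grid; the function names are illustrative and this is not the SIGMAA program's code. Because the coefficients sit in the exponent, multiplying two independent phase probability distributions is the same as adding their coefficients.

```python
import numpy as np

ALPHA = np.deg2rad(np.arange(0.0, 360.0, 1.0))   # phase grid (assumed 1 degree steps)

def hl_probability(a, b, c, d):
    """Normalised phase probability from Hendrickson-Lattman coefficients."""
    log_p = (a * np.cos(ALPHA) + b * np.sin(ALPHA)
             + c * np.cos(2 * ALPHA) + d * np.sin(2 * ALPHA))
    p = np.exp(log_p - log_p.max())   # subtract the maximum to avoid overflow
    return p / p.sum()

def centroid(p):
    """Figure of merit and centroid phase (radians) of a phase distribution."""
    z = np.sum(p * np.exp(1j * ALPHA))
    return abs(z), np.angle(z)

def combine(hl1, hl2):
    """Combine two independent phase sources by adding their coefficients."""
    return tuple(x + y for x, y in zip(hl1, hl2))

if __name__ == "__main__":
    model = (2.0, 0.0, 0.0, 0.0)     # unimodal, e.g. from a model (made-up values)
    mir = (0.0, 1.5, -0.5, 0.3)      # possibly bimodal, e.g. from MIR (made-up values)
    fom, phase = centroid(hl_probability(*combine(model, mir)))
    print(f"combined fom {fom:.3f}, centroid phase {np.degrees(phase):.1f} deg")
```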

Produce 4 types of map coefficients:

3) combined phase map coefficients

Refinement targets and difference maps

In refinement, adjust model to optimize refinement target

Atoms are moved to shift F_C in a direction that improves target

done by taking derivative of target with respect to atomic parameters

Derivative of target with respect to F_C shows what direction to change the calculated structure factor to improve agreement

least-squares target should decrease, so move in opposite direction to derivative

likelihood target should increase, so move in direction of derivative

Fourier transform of (+/-) derivative shows the corresponding derivative of the electron density

such a map shows where the target would prefer more or less electron density

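e.g. least-squares: gradient map has coefficients proportional to $(|F_O| - |F_C|)\exp(i\alpha_C)$

e.g. log likelihood target (ignoring measurement error): gradient map has coefficients proportional to $(m|F_O| - D|F_C|)\exp(i\alpha_C)$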

consider effect of adding this to model:
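The link between target derivatives and difference-map coefficients can be checked numerically. This sketch assumes the acentric amplitude likelihood without measurement error, with `var` standing for the variance of the structure factor distribution; all names and numbers are illustrative. It compares a finite-difference derivative of the log likelihood with respect to |F_C| against (2D/var)(m|F_O| - D|F_C|). For the least-squares target (|F_O| - |F_C|)^2 the corresponding derivative is simply -2(|F_O| - |F_C|).

```python
import numpy as np

THETA = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)

def log_likelihood(f_calc, f_obs, d, var):
    """Acentric ln p(|F_O|; |F_C|), obtained by integrating over the phase difference."""
    x = 2.0 * d * f_obs * f_calc / var
    log_i0 = x + np.log(np.mean(np.exp(x * (np.cos(THETA) - 1.0))))
    return np.log(2.0 * f_obs / var) - (f_obs**2 + d**2 * f_calc**2) / var + log_i0

def figure_of_merit(f_calc, f_obs, d, var):
    """m = <cos(phase difference)> for the same distribution."""
    x = 2.0 * d * f_obs * f_calc / var
    w = np.exp(x * (np.cos(THETA) - 1.0))
    return np.sum(w * np.cos(THETA)) / np.sum(w)

def numerical_gradient(f_calc, f_obs, d, var, h=1e-5):
    """Central finite-difference derivative of the log likelihood wrt |F_C|."""
    return (log_likelihood(f_calc + h, f_obs, d, var)
            - log_likelihood(f_calc - h, f_obs, d, var)) / (2.0 * h)

def map_coefficient_gradient(f_calc, f_obs, d, var):
    """Derivative written in terms of the (m|F_O| - D|F_C|) map coefficient."""
    m = figure_of_merit(f_calc, f_obs, d, var)
    return 2.0 * d / var * (m * f_obs - d * f_calc)

if __name__ == "__main__":
    args = (100.0, 120.0, 0.9, 2500.0)   # |F_C|, |F_O|, D, variance (made-up values)
    print("finite difference:", numerical_gradient(*args))
    print("from map coefficient:", map_coefficient_gradient(*args))
```

A positive value means the likelihood target would prefer a larger |F_C| for that reflection, which is what a positive peak in the corresponding map expresses in real space.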

Refinement bias

Statistical treatment depends on independent errors in all the atoms

Proteins have too many parameters per observation

diffraction data can be "over fit"

compensating errors give misleading agreement

Deal with refinement bias by:

using cross-validated $\sigma_A$ estimates

reducing weight on amplitude agreement

exploiting NCS, if present

"omit refinement"

examining combined phase and MIR maps