Model-phased maps, bias and likelihood

Outline

  • Importance of the phase
  • Structure factor probability relationships
  • Bias component of model-phased map coefficients
  • SIGMAA map coefficients to reduce model bias
  • Relationship between targets and maps
  • Refinement bias
    Model bias: importance of phase

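    Parseval's theorem: the mean-square difference between two electron density maps is proportional to the summed squared difference between their structure factors

    $\langle (\rho_1 - \rho_2)^2 \rangle = \frac{1}{V^2} \sum_{\mathbf{h}} |\mathbf{F}_1(\mathbf{h}) - \mathbf{F}_2(\mathbf{h})|^2$

    the difference is a vector difference, so a phase change has more effect than an amplitude change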
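    A minimal sketch of this point on a toy 1D grid, using only the Python standard library. The two random "densities", the grid size and all function names are assumptions made for illustration. It checks the discrete form of Parseval's theorem and compares the density error from wrong amplitudes with the error from wrong phases.

```python
import cmath
import math
import random

def dft(rho):
    """Structure factors F_h = sum_k rho_k exp(2 pi i h k / N) of a 1D grid density."""
    n = len(rho)
    return [sum(r * cmath.exp(2j * math.pi * h * k / n) for k, r in enumerate(rho))
            for h in range(n)]

def inverse_dft(f):
    """Density rho_k = (1/N) sum_h F_h exp(-2 pi i h k / N)."""
    n = len(f)
    return [sum(fh * cmath.exp(-2j * math.pi * h * k / n) for h, fh in enumerate(f)) / n
            for k in range(n)]

def mean_square_difference(a, b):
    """Mean of |a_k - b_k|^2 over all entries."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
n = 128
rho_true = [rng.gauss(0.0, 1.0) for _ in range(n)]    # toy "true" density
rho_other = [rng.gauss(0.0, 1.0) for _ in range(n)]   # unrelated toy density
f_true, f_other = dft(rho_true), dft(rho_other)

# True phases combined with amplitudes from the unrelated density.
f_wrong_amplitude = [cmath.rect(abs(fo), cmath.phase(ft)) for ft, fo in zip(f_true, f_other)]
# True amplitudes combined with phases from the unrelated density.
f_wrong_phase = [cmath.rect(abs(ft), cmath.phase(fo)) for ft, fo in zip(f_true, f_other)]

# Mean-square density errors measured directly in real space.
density_error_amplitude = mean_square_difference(inverse_dft(f_wrong_amplitude), rho_true)
density_error_phase = mean_square_difference(inverse_dft(f_wrong_phase), rho_true)

# Parseval on the grid: <|delta rho|^2> = (1/N^2) sum_h |delta F_h|^2.
parseval_amplitude = sum(abs(a - b) ** 2 for a, b in zip(f_wrong_amplitude, f_true)) / n ** 2
parseval_phase = sum(abs(a - b) ** 2 for a, b in zip(f_wrong_phase, f_true)) / n ** 2

print(density_error_amplitude, parseval_amplitude)
print(density_error_phase, parseval_phase)
```

    The grid factor 1/N^2 plays the role of 1/V^2. With unrelated phases the expected squared error per reflection is about twice the mean intensity, which is several times larger than the error from unrelated amplitudes.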

    The Central Limit Theorem

    When several sources of error are combined, the combined error tends to have a Gaussian probability distribution, regardless of the distributions of the individual sources of error

    requires a sufficient number of independent sources of error

    none of the single sources of error can dominate

    The centroid (mean) of the combined probability distribution is the sum of the centroids of the individual distributions

    The variance of the combined probability distribution is the sum of the variances of the individual distributions

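    $p\left(\sum_j x_j\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{\left(\sum_j x_j - \langle x \rangle\right)^2}{2\sigma^2}\right]$

    where the $x_j$ are independent random variables with centroids $\langle x_j \rangle$ and variances $\sigma_j^2$

    where $\langle x \rangle = \sum_j \langle x_j \rangle$ and $\sigma^2 = \sum_j \sigma_j^2$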
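    The two summation rules can be checked numerically. The sketch below is an illustration only: the twelve uniform error sources are a hypothetical choice, picked so that no single source dominates, and the sample size is arbitrary.

```python
import math
import random

# Hypothetical error sources: uniform distributions given as (low, high) ranges.
SOURCES = [(0.0, 1.0)] * 6 + [(-1.0, 1.0)] * 6

def combined_error(rng):
    """One draw of the combined error: the sum of all independent sources."""
    return sum(rng.uniform(low, high) for low, high in SOURCES)

# Centroid and variance predicted by summing the individual centroids and variances.
predicted_mean = sum((low + high) / 2.0 for low, high in SOURCES)
predicted_variance = sum((high - low) ** 2 / 12.0 for low, high in SOURCES)

rng = random.Random(1)
draws = [combined_error(rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_variance = sum((x - sample_mean) ** 2 for x in draws) / (len(draws) - 1)

# For a Gaussian, about 68.3% of the draws fall within one standard deviation.
sigma = math.sqrt(predicted_variance)
fraction_within_one_sigma = sum(abs(x - predicted_mean) < sigma for x in draws) / len(draws)

print(predicted_mean, sample_mean)
print(predicted_variance, sample_variance)
print(fraction_within_one_sigma)
```

    None of the individual sources is Gaussian, yet the fraction of draws within one standard deviation comes out close to the Gaussian value.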

    Structure factor probabilities

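    Wilson distribution in P1

    $p(\mathbf{F}) = \frac{1}{\pi \Sigma_N} \exp\left(-\frac{|\mathbf{F}|^2}{\Sigma_N}\right)$, where $\Sigma_N = \sum_{j=1}^{N} f_j^2$

    2-dimensional random walk of the atomic contributions, starting at the origin: apply central limit theorem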

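    Sim distribution in P1

    $p(\mathbf{F}; \mathbf{F}_P) = \frac{1}{\pi \Sigma_Q} \exp\left(-\frac{|\mathbf{F} - \mathbf{F}_P|^2}{\Sigma_Q}\right)$, where $\Sigma_Q$ is the sum of $f_j^2$ over the atoms missing from the model

    2-dimensional random walk starting at $\mathbf{F}_P$, the contribution of the known part of the structure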
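    Both random walks can be simulated directly. This is a sketch under simplifying assumptions: equal atoms with scattering factor 1, uniformly random phases for every unknown atom, and a hypothetical known contribution f_p. All names and numbers are illustrative.

```python
import cmath
import math
import random

def random_walk(rng, n_atoms, f=1.0, start=0j):
    """Sum of n_atoms contributions f*exp(i*phi) with uniformly random phases phi."""
    total = start
    for _ in range(n_atoms):
        total += cmath.rect(f, rng.uniform(0.0, 2.0 * math.pi))
    return total

rng = random.Random(2)
n_trials = 4000

# Wilson case: all N atoms are unknown, so the walk starts at the origin.
n_all = 50
sigma_n = n_all * 1.0 ** 2
wilson = [random_walk(rng, n_all) for _ in range(n_trials)]
mean_intensity = sum(abs(f) ** 2 for f in wilson) / n_trials
# For a 2D Gaussian, P(|F|^2 < Sigma_N) = 1 - exp(-1), about 0.632.
fraction_below = sum(abs(f) ** 2 < sigma_n for f in wilson) / n_trials

# Sim case: the walk of the Q missing atoms starts at the known part F_P.
f_p = cmath.rect(6.0, math.radians(40.0))   # hypothetical known contribution
n_missing = 30
sigma_q = n_missing * 1.0 ** 2
sim = [random_walk(rng, n_missing, start=f_p) for _ in range(n_trials)]
centroid = sum(sim) / n_trials
mean_square_spread = sum(abs(f - f_p) ** 2 for f in sim) / n_trials

print(mean_intensity, sigma_n, fraction_below)
print(centroid, f_p, mean_square_spread, sigma_q)
```

    The Wilson walk has mean intensity close to Sigma_N, while the Sim walk is centered on F_P with mean-square spread close to Sigma_Q.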

     

    Effect of variable coordinate errors

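    Centroid for individual atomic contribution is non-zero

    $\langle f_j \exp(2\pi i \mathbf{h} \cdot \mathbf{x}_j) \rangle = d_j f_j \exp(2\pi i \mathbf{h} \cdot \bar{\mathbf{x}}_j)$, where $\bar{\mathbf{x}}_j$ is the model position and $\Delta\mathbf{x}_j = \mathbf{x}_j - \bar{\mathbf{x}}_j$ is the coordinate error

    $d_j$ = Fourier transform of $p(\Delta\mathbf{x}_j)$, the probability distribution of the coordinate error

    equivalent to smearing the atom over all of its possible positions

    Individual variances depend on size of errors: $\sigma_j^2 = f_j^2 (1 - d_j^2)$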
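    As an illustration, assume the coordinate error along the scattering vector is Gaussian with standard deviation sigma_x. Its Fourier transform at resolution s = 1/d is then exp(-2 pi^2 sigma_x^2 s^2). The sketch compares that closed form with a Monte Carlo average of the phase factor. The numerical values and function names are hypothetical.

```python
import cmath
import math
import random

def d_gaussian(sigma_x, s):
    """d_j for a Gaussian positional error of standard deviation sigma_x (Angstrom)
    along the scattering vector, at resolution s = 1/d (1/Angstrom)."""
    return math.exp(-2.0 * math.pi ** 2 * sigma_x ** 2 * s ** 2)

def d_monte_carlo(sigma_x, s, n_draws, rng):
    """Average the phase factor exp(2 pi i s dx) over sampled positional errors dx."""
    total = 0j
    for _ in range(n_draws):
        total += cmath.exp(2j * math.pi * s * rng.gauss(0.0, sigma_x))
    return total / n_draws

rng = random.Random(3)
sigma_x, s, f_j = 0.3, 0.4, 6.0     # 0.3 A error, 2.5 A resolution, toy scattering factor
d_exact = d_gaussian(sigma_x, s)
d_sampled = d_monte_carlo(sigma_x, s, 20000, rng)

# Variance of this atom's contribution about its centroid d_j * f_j.
variance_j = f_j ** 2 * (1.0 - d_exact ** 2)

print(d_exact, d_sampled, variance_j)
```

    Larger errors or higher resolution give a smaller d_j, so the centroid shrinks and the variance grows towards f_j^2.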

    Structure factor distribution: general treatment

    Central limit theorem valid under many conditions

    $\mathbf{F}$ has a Gaussian distribution centered on $D\mathbf{F}_C$

    D includes effects of:

    difference in position or scattering factor

    missing atoms

    difference in overall scale or B-factor

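    variance given by $\varepsilon \sigma_\Delta^2 = \varepsilon (\Sigma_N - D^2 \Sigma_P)$, with $\Sigma_N$ and $\Sigma_P$ the sums of $f_j^2$ over the atoms of the true structure and of the model

    where $\varepsilon$ is the expected intensity factor

    Acentric: $p(\mathbf{F}; \mathbf{F}_C) = \frac{1}{\pi \varepsilon \sigma_\Delta^2} \exp\left(-\frac{|\mathbf{F} - D\mathbf{F}_C|^2}{\varepsilon \sigma_\Delta^2}\right)$

    Centric: $p(\mathbf{F}; \mathbf{F}_C) = \frac{1}{\sqrt{2\pi \varepsilon \sigma_\Delta^2}} \exp\left(-\frac{|\mathbf{F} - D\mathbf{F}_C|^2}{2 \varepsilon \sigma_\Delta^2}\right)$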

    With E values, $\sigma_A$ plays the role of D, and the variance is $1 - \sigma_A^2$

    working with E values eliminates overall scale and B-factor effects
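    The link between the two parameterizations can be written out as a short sketch. It assumes the standard relation sigma_A = D sqrt(Sigma_P / Sigma_N); the function names and the numbers are illustrative.

```python
import math

def sigma_a(d, sigma_p, sigma_n):
    """sigma_A = D * sqrt(Sigma_P / Sigma_N)."""
    return d * math.sqrt(sigma_p / sigma_n)

def variance_f(d, sigma_p, sigma_n, epsilon=1.0):
    """Variance of F about D*F_C: epsilon * (Sigma_N - D^2 * Sigma_P)."""
    return epsilon * (sigma_n - d ** 2 * sigma_p)

def variance_e(sig_a, epsilon=1.0):
    """Variance of E about sigma_A*E_C: epsilon * (1 - sigma_A^2)."""
    return epsilon * (1.0 - sig_a ** 2)

# Toy example: the model accounts for 90% of the scattering power, with D = 0.8.
d, sigma_p, sigma_n = 0.8, 900.0, 1000.0
sig_a = sigma_a(d, sigma_p, sigma_n)

# Dividing the F variance by Sigma_N gives the E variance.
print(sig_a, variance_f(d, sigma_p, sigma_n) / sigma_n, variance_e(sig_a))
```

    Normalizing by Sigma_N removes the overall scale, so a single number between 0 and 1 describes the model quality in each resolution shell.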

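    Estimating $\sigma_A$

    Deduce $\sigma_A$ from the values of $|F_O|$ and $|F_C|$

    maximize the likelihood function $L = \prod_{\mathbf{h}} p(|F_O|; |F_C|)$

    $p(|F_O|; |F_C|)$ is derived from $p(\mathbf{F}_O; \mathbf{F}_C)$ by integrating over all possible phase differences

    use log likelihood for convenience: $LL = \sum_{\mathbf{h}} \ln p(|F_O|; |F_C|)$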

    For cross-validation data, add smoothing terms to log likelihood function

    smoothing parameter s is a target standard deviation;
    0.05-0.10 is a suitable range for s

    penalty for a value deviating from the value interpolated from its neighbours is given by:
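    A rough sketch of the estimation step for one resolution shell of acentric reflections, using normalized amplitudes. Several things here are assumptions rather than the SIGMAA implementation: the synthetic data, the plain grid search used in place of a proper optimizer, and the Gaussian form of the smoothing penalty, -(value - interpolated)^2 / (2 s^2), which is one reading of "s is a target standard deviation".

```python
import math
import random

def log_i0(x):
    """ln I0(x): power series for x < 15, asymptotic expansion (error about 1e-5) above."""
    if x < 15.0:
        term, total, k = 1.0, 1.0, 0
        while term > 1e-16 * total:
            k += 1
            term *= (x / 2.0) ** 2 / k ** 2
            total += term
        return math.log(total)
    return (x - 0.5 * math.log(2.0 * math.pi * x)
            + math.log1p(1.0 / (8.0 * x) + 9.0 / (128.0 * x * x)))

def log_likelihood(sig_a, e_obs, e_calc):
    """Acentric log likelihood of sigma_A given lists of |E_O| and |E_C|."""
    v = 1.0 - sig_a ** 2
    total = 0.0
    for eo, ec in zip(e_obs, e_calc):
        total += (math.log(2.0 * eo / v) - (eo * eo + sig_a ** 2 * ec * ec) / v
                  + log_i0(2.0 * sig_a * eo * ec / v))
    return total

def smoothing_penalty(values, i, s=0.05):
    """Assumed penalty for shell i deviating from the mean of its two neighbours."""
    interpolated = 0.5 * (values[i - 1] + values[i + 1])
    return -((values[i] - interpolated) ** 2) / (2.0 * s ** 2)

def estimate_sigma_a(e_obs, e_calc, step=0.02):
    """Grid search for the sigma_A value that maximizes the log likelihood."""
    grid = [step * k for k in range(1, int(round(1.0 / step)))]
    return max(grid, key=lambda sa: log_likelihood(sa, e_obs, e_calc))

def unit_complex_gaussian(rng):
    """Complex Gaussian with <|E|^2> = 1."""
    return complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))

# Synthetic shell: E_O = sigma_A * E_C + sqrt(1 - sigma_A^2) * noise.
rng = random.Random(4)
true_sigma_a = 0.7
e_obs, e_calc = [], []
for _ in range(1000):
    ec = unit_complex_gaussian(rng)
    eo = true_sigma_a * ec + math.sqrt(1.0 - true_sigma_a ** 2) * unit_complex_gaussian(rng)
    e_obs.append(abs(eo))
    e_calc.append(abs(ec))

estimate = estimate_sigma_a(e_obs, e_calc)
print(estimate)
```

    With the smoothing term, the quantity maximized over all shells would be the sum of the per-shell log likelihoods plus the penalties. This matters most when each shell holds only a small number of cross-validation reflections.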

    Figure of merit weighting for model phases

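    Blow and Crick (1959) and Sim (1959) showed that the map with the least rms error is calculated from centroid structure factors. For model phases the centroid is $m|F_O| \exp(i\alpha_C)$, where the figure of merit $m = \langle \cos(\alpha_O - \alpha_C) \rangle$ is the expected cosine of the phase error.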
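    A minimal sketch of the figure of merit for model phases, assuming the phase error distribution is proportional to exp(X cos(delta)) with X = 2 sigma_A |E_O| |E_C| / (1 - sigma_A^2). The acentric value is then I1(X)/I0(X) and the centric value is tanh(X/2). Function names are illustrative. The acentric case is computed by direct integration over the phase circle to make the "centroid" interpretation explicit.

```python
import math

def x_parameter(sigma_a, e_obs, e_calc):
    """X = 2*sigma_A*|E_O|*|E_C| / (1 - sigma_A^2)."""
    return 2.0 * sigma_a * e_obs * e_calc / (1.0 - sigma_a ** 2)

def figure_of_merit_acentric(x, n_steps=720):
    """m = <cos(delta)> for p(delta) proportional to exp(x*cos(delta)), i.e. I1(x)/I0(x).

    A common factor exp(-x) is removed from every weight to avoid overflow.
    """
    numerator = denominator = 0.0
    for k in range(n_steps):
        delta = 2.0 * math.pi * k / n_steps
        weight = math.exp(x * (math.cos(delta) - 1.0))
        numerator += weight * math.cos(delta)
        denominator += weight
    return numerator / denominator

def figure_of_merit_centric(x):
    """m = tanh(x/2): the phase error is either 0 or 180 degrees."""
    return math.tanh(x / 2.0)

x = x_parameter(sigma_a=0.8, e_obs=1.2, e_calc=1.0)
print(x, figure_of_merit_acentric(x), figure_of_merit_centric(x))
```

    Weak reflections or a poor model give X near zero and m near zero, so those terms are down-weighted in the map.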

    Model bias in fom-weighted maps

    Generalize derivation by Main (1979)

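    Start from cosine law

    Take expected values, using $\langle \cos(\alpha_O - \alpha_C) \rangle = m$

    Solve for fom-weighted map coefficient: $m|F_O| \exp(i\alpha_C) \approx \frac{1}{2}\mathbf{F}_O + \frac{1}{2} D\mathbf{F}_C$

    Solve for approximation to true $\mathbf{F}_O$: $\mathbf{F}_O \approx (2m|F_O| - D|F_C|) \exp(i\alpha_C)$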

     

    SIGMAA

    Calculate phase probabilities from $|F_O|$, $|F_C|$, $\alpha_C$

    Combine MIR phase information using Hendrickson-Lattman coefficients

    $p(\alpha) \propto \exp[A \cos\alpha + B \sin\alpha + C \cos 2\alpha + D \sin 2\alpha]$

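    Produce 4 types of map coefficients:

    1) $(2m|F_O| - D|F_C|) \exp(i\alpha_C)$

    2) $(m|F_O| - D|F_C|) \exp(i\alpha_C)$

    3) $m_{comb}|F_O| \exp(i\alpha_{comb})$: combined phase map coefficients

    4) $2m_{comb}|F_O| \exp(i\alpha_{comb}) - D\mathbf{F}_C$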
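    As a sketch, the two model-phased coefficients in common use, (2m|F_O| - D|F_C|) exp(i alpha_C) for a map with reduced model bias and (m|F_O| - D|F_C|) exp(i alpha_C) for a difference map, can be formed for a single acentric reflection as below. The function name and the example numbers are hypothetical, and m and D are taken as already estimated.

```python
import cmath
import math

def sigmaa_map_coefficients(f_obs, f_calc, alpha_calc_deg, m, d):
    """Return (bias-reduced, difference) map coefficients as complex numbers.

    f_obs, f_calc: amplitudes |F_O| and |F_C|
    alpha_calc_deg: model phase in degrees
    m: figure of merit, d: the D factor for this resolution shell
    """
    phase_factor = cmath.exp(1j * math.radians(alpha_calc_deg))
    bias_reduced = (2.0 * m * f_obs - d * f_calc) * phase_factor
    difference = (m * f_obs - d * f_calc) * phase_factor
    return bias_reduced, difference

bias_reduced, difference = sigmaa_map_coefficients(
    f_obs=100.0, f_calc=80.0, alpha_calc_deg=90.0, m=0.8, d=0.9)
print(abs(bias_reduced), abs(difference))
```

    Working with complex numbers handles the case where a coefficient is negative: the amplitude stays positive and the phase shifts by 180 degrees.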

    Refinement targets and difference maps

    In refinement, adjust model to optimize refinement target

    Atoms are moved to shift FC in a direction that improves target

    done by taking derivative of target with respect to atomic parameters

    Derivative of target with respect to $\mathbf{F}_C$ shows in what direction to change the calculated structure factor to improve agreement

    least-squares target should decrease, so move in opposite direction to derivative

    likelihood target should increase, so move in direction of derivative

    Fourier transform of (+/-) derivative shows the corresponding derivative of the electron density

    such a map shows where the target would prefer more or less electron density

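    e.g. least-squares: the target $\sum w (|F_O| - |F_C|)^2$ has a negative derivative proportional to $(|F_O| - |F_C|) \exp(i\alpha_C)$, the conventional difference map coefficient

    e.g. log likelihood target (ignoring measurement error): the derivative is proportional to $(m|F_O| - D|F_C|) \exp(i\alpha_C)$

    consider effect of adding this to model: $D\mathbf{F}_C + (m|F_O| - D|F_C|) \exp(i\alpha_C) = m|F_O| \exp(i\alpha_C)$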
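    The link between the likelihood gradient and the difference map coefficient can be checked numerically. The sketch assumes the acentric likelihood without measurement error and with the expected intensity factor set to 1, for which the derivative with respect to |F_C| is (2D / sigma_Delta^2)(m|F_O| - D|F_C|). Function names and the test values are illustrative.

```python
import math

def bessel_terms(x, n_steps=720):
    """Return (ln I0(x), I1(x)/I0(x)) by integrating exp(x*cos(delta)) over the phase circle."""
    numerator = denominator = 0.0
    for k in range(n_steps):
        c = math.cos(2.0 * math.pi * k / n_steps)
        weight = math.exp(x * (c - 1.0))      # common factor exp(x) removed
        numerator += weight * c
        denominator += weight
    return x + math.log(denominator / n_steps), numerator / denominator

def log_likelihood(f_calc, f_obs, d, sigma2):
    """ln p(|F_O|; |F_C|) for one acentric reflection, variance sigma2 = sigma_Delta^2."""
    log_i0, _ = bessel_terms(2.0 * d * f_obs * f_calc / sigma2)
    return (math.log(2.0 * f_obs / sigma2)
            - (f_obs ** 2 + d ** 2 * f_calc ** 2) / sigma2 + log_i0)

def likelihood_gradient(f_calc, f_obs, d, sigma2):
    """Analytic derivative with respect to |F_C|: (2D/sigma2) * (m|F_O| - D|F_C|)."""
    _, m = bessel_terms(2.0 * d * f_obs * f_calc / sigma2)
    return (2.0 * d / sigma2) * (m * f_obs - d * f_calc)

def least_squares_gradient(f_calc, f_obs, weight=1.0):
    """Derivative of weight*(|F_O| - |F_C|)^2 with respect to |F_C|."""
    return -2.0 * weight * (f_obs - f_calc)

f_obs, f_calc, d, sigma2 = 100.0, 80.0, 0.9, 2500.0
h = 1e-3
numeric = (log_likelihood(f_calc + h, f_obs, d, sigma2)
           - log_likelihood(f_calc - h, f_obs, d, sigma2)) / (2.0 * h)
analytic = likelihood_gradient(f_calc, f_obs, d, sigma2)
print(numeric, analytic, least_squares_gradient(f_calc, f_obs))
```

    Here the likelihood gradient is positive and the least-squares gradient is negative, so both targets ask for a larger |F_C|, but the likelihood target weights the request by m and D.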

    Refinement bias

    Statistical treatment depends on independent errors in all the atoms

    Proteins have too many parameters per observation

    diffraction data can be "over fit"

    compensating errors give misleading agreement

    Deal with refinement bias by:

    using cross-validated $\sigma_A$ values

    reducing weight on amplitude agreement

    exploiting NCS, if present

    "omit refinement"

    examining combined phase and MIR maps