## Model-phased maps, bias and likelihood

#### Outline

• Importance of the phase
• Structure factor probability relationships
• Bias component of model-phased map coefficients
• SIGMAA map coefficients to reduce model bias
• Relationship between targets and maps
• Refinement bias

## Model bias: importance of phase

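Parseval's theorem: the mean-square difference between two maps is proportional to the mean-square difference between their structure factors

$$\langle (\rho_1 - \rho_2)^2 \rangle = \frac{1}{V^2} \sum_{\mathbf{h}} \left| \mathbf{F}_1(\mathbf{h}) - \mathbf{F}_2(\mathbf{h}) \right|^2$$

a typical phase change moves the structure factor further in the complex plane than a typical amplitude change, so it has more effect on the map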
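A minimal one-dimensional sketch of this point, assuming NumPy and two unrelated random signals standing in for electron density (the signals, seed and variable names are illustrative, not from the lecture). Amplitudes are taken from one signal and phases from the other; the hybrid map resembles the phase source, and the discrete form of Parseval's theorem is checked on the way.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096

# Two unrelated 1-D "density" signals, made zero-mean so that F(0) = 0.
rho_a = rng.normal(size=n)
rho_b = rng.normal(size=n)
rho_a -= rho_a.mean()
rho_b -= rho_b.mean()

F_a = np.fft.rfft(rho_a)
F_b = np.fft.rfft(rho_b)

# Hybrid coefficients: amplitudes from A, phases from B.
F_hybrid = np.abs(F_a) * np.exp(1j * np.angle(F_b))
rho_hybrid = np.fft.irfft(F_hybrid, n=n)


def correlation(x, y):
    """Pearson correlation coefficient between two real signals."""
    return float(np.corrcoef(x, y)[0, 1])


cc_amplitude_source = correlation(rho_hybrid, rho_a)
cc_phase_source = correlation(rho_hybrid, rho_b)

# Discrete Parseval check: the summed squared density difference equals the
# summed squared structure-factor difference divided by n.
dF = np.fft.fft(rho_hybrid) - np.fft.fft(rho_b)
lhs = float(np.sum((rho_hybrid - rho_b) ** 2))
rhs = float(np.sum(np.abs(dF) ** 2) / n)

print(f"CC with amplitude source: {cc_amplitude_source:.3f}")
print(f"CC with phase source:     {cc_phase_source:.3f}")
print(f"Parseval: {lhs:.3f} vs {rhs:.3f}")
```

The correlation with the phase source should be high and the correlation with the amplitude source close to zero, which is the origin of model bias when phases come from a model.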

## The Central Limit Theorem

When several sources of error are combined, the combined error tends to have a Gaussian probability distribution, regardless of the distributions of the individual sources of error

requires a sufficient number of independent sources of error

none of the single sources of error can dominate

The centroid (mean) of the combined probability distribution is the sum of the centroids of the individual distributions

The variance of the combined probability distribution is the sum of the variances of the individual distributions

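$$p(S) \approx \frac{1}{\sqrt{2\pi\sigma_S^2}} \exp\left[ -\frac{(S - \langle S \rangle)^2}{2\sigma_S^2} \right], \qquad S = \sum_j x_j$$

where the xj are independent random variables with centroids ⟨xj⟩ and variances σj², and where

$$\langle S \rangle = \sum_j \langle x_j \rangle, \qquad \sigma_S^2 = \sum_j \sigma_j^2$$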
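A small numerical illustration of these statements, assuming NumPy and twelve independent uniform error sources of different widths (the widths and sample size are arbitrary choices for the sketch). It checks that the centroid and variance of the sum are the sums of the individual centroids and variances, and that the fraction of values within one standard deviation is close to the Gaussian value of about 0.68.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 200_000

# Twelve independent, non-Gaussian error sources: uniform on [0, w_j].
widths = np.linspace(0.5, 1.5, 12)
x = rng.uniform(0.0, widths, size=(n_trials, widths.size))
total = x.sum(axis=1)

# A uniform distribution on [0, w] has centroid w/2 and variance w^2/12.
expected_mean = float(np.sum(widths / 2.0))
expected_var = float(np.sum(widths ** 2 / 12.0))

observed_mean = float(total.mean())
observed_var = float(total.var())

# For a Gaussian, about 68.3% of values lie within one standard deviation.
z = (total - expected_mean) / np.sqrt(expected_var)
frac_within_1sigma = float(np.mean(np.abs(z) < 1.0))

print(f"mean: {observed_mean:.3f} (expected {expected_mean:.3f})")
print(f"variance: {observed_var:.3f} (expected {expected_var:.3f})")
print(f"fraction within 1 sigma: {frac_within_1sigma:.3f}")
```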

## Structure factor probabilities

#### Wilson distribution in P1

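2-dimensional random walk: apply central limit theorem

$$p(\mathbf{F}) = \frac{1}{\pi \Sigma_N} \exp\left( -\frac{|\mathbf{F}|^2}{\Sigma_N} \right), \qquad \Sigma_N = \sum_{j=1}^{N} f_j^2$$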

#### Sim distribution in P1

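2-dimensional random walk starting at FP

$$p(\mathbf{F}; \mathbf{F}_P) = \frac{1}{\pi \Sigma_Q} \exp\left( -\frac{|\mathbf{F} - \mathbf{F}_P|^2}{\Sigma_Q} \right), \qquad \Sigma_Q = \sum_{j \in Q} f_j^2$$

where Q is the set of atoms missing from the partial model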
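The two random walks can be simulated directly. The sketch below assumes NumPy, acentric reflections in P1, and hypothetical constant scattering factors (no resolution dependence); Σ_N denotes the sum of f² over all atoms and Σ_Q the sum over atoms absent from the partial model, which are names assumed for this example. It checks that the mean intensity matches Σ_N (Wilson) and that the spread about the known part F_P matches Σ_Q (Sim).

```python
import numpy as np

rng = np.random.default_rng(2)
n_atoms = 100
n_reflections = 10_000

# Hypothetical scattering factors, treated as constants for this sketch.
f = rng.uniform(1.0, 8.0, n_atoms)

# Random atoms in P1: each phase angle 2*pi*h.x_j is uniform on [0, 2*pi).
phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_reflections, n_atoms))
contributions = f * np.exp(1j * phases)

# Wilson: nothing known, so the random walk starts at the origin.
F = contributions.sum(axis=1)
sigma_n = float(np.sum(f ** 2))
wilson_ratio = float(np.mean(np.abs(F) ** 2) / sigma_n)

# Sim: the first half of the atoms is known exactly (partial structure P),
# so the random walk for the remaining atoms (Q) starts at F_P.
known = n_atoms // 2
F_P = contributions[:, :known].sum(axis=1)
sigma_q = float(np.sum(f[known:] ** 2))
sim_ratio = float(np.mean(np.abs(F - F_P) ** 2) / sigma_q)

# Real and imaginary parts each carry half of the variance.
real_part_ratio = float(np.var(F.real) / (sigma_n / 2.0))

print(f"<|F|^2> / Sigma_N       = {wilson_ratio:.3f}")
print(f"<|F - F_P|^2> / Sigma_Q = {sim_ratio:.3f}")
print(f"var(Re F) / (Sigma_N/2) = {real_part_ratio:.3f}")
```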

## Effect of variable coordinate errors

Centroid for individual atomic contribution is non-zero

dj = Fourier transform of p(xj)

equivalent to smearing the atom over all of its possible positions

Individual variances depend on size of errors
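As an illustration, the sketch below assumes a Gaussian coordinate error in one dimension (along the scattering vector), for which the Fourier transform of p(xj) is d = exp(-2π²σ²s²) with s = 1/d-spacing. The error size and resolution are hypothetical values chosen for the example. A Monte Carlo average of the phase factor reproduces d, and the scatter about that centroid is 1 - d² in units of f², so larger errors give a smaller centroid and a larger variance.

```python
import numpy as np

rng = np.random.default_rng(3)

sigma_x = 0.4      # rms coordinate error in Angstrom (hypothetical value)
d_spacing = 3.0    # resolution of the reflection in Angstrom (hypothetical)
s = 1.0 / d_spacing

# Sample the positional error and form the phase factor of a unit scatterer.
dx = rng.normal(0.0, sigma_x, 500_000)
phase_factor = np.exp(2j * np.pi * s * dx)

# Centroid of the atomic contribution: the atom smeared over its positions.
d_simulated = complex(phase_factor.mean())
d_theory = float(np.exp(-2.0 * np.pi ** 2 * sigma_x ** 2 * s ** 2))

# Variance of the contribution about its centroid, in units of f^2.
var_simulated = float(np.mean(np.abs(phase_factor - d_simulated) ** 2))
var_theory = 1.0 - d_theory ** 2

print(f"d: simulated {d_simulated.real:.4f}, theory {d_theory:.4f}")
print(f"variance: simulated {var_simulated:.4f}, theory {var_theory:.4f}")
```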

## Structure factor distribution: general treatment

Central limit theorem valid under many conditions

F has Gaussian distribution centered on DFC

D includes effects of:

difference in position or scattering factor

missing atoms

difference in overall scale or B-factor

variance given by εσΔ², with

$$\sigma_\Delta^2 = \Sigma_N - D^2 \Sigma_P$$

where ε is the expected intensity factor

#### Centric:

With E values, σA plays the role of D, and the variance is 1 − σA²

eliminate overall scale and B-factor effects

## Estimating σA

Deduce σA from the values of |FO| and |FC|

maximize the likelihood function

p(|FO|;|FC|) is derived from p(FO;FC) by integrating over all possible phase differences

use log likelihood for convenience

For cross-validation data, add smoothing terms to log likelihood function

smoothing parameter s is a target standard deviation;
0.05-0.10 is a suitable range for s

penalty for a value deviating from the value interpolated from its neighbours is given by:
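A hedged sketch of the likelihood step for a single resolution shell, assuming acentric reflections, normalized amplitudes |E_O| and |E_C|, no measurement error, and the Rice form that results from integrating the Gaussian p(E_O; E_C) over the phase difference. The smoothing terms are not included, and the grid search, function names and synthetic data are illustrative choices rather than the SIGMAA implementation.

```python
import numpy as np


def log_i0(x):
    """Logarithm of the modified Bessel function I0, safe for large x."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    small = x < 30.0
    out[small] = np.log(np.i0(x[small]))
    big = x[~small]
    # Leading asymptotic terms: I0(x) ~ exp(x) / sqrt(2 pi x) * (1 + 1/(8x)).
    out[~small] = big - 0.5 * np.log(2.0 * np.pi * big) + np.log1p(1.0 / (8.0 * big))
    return out


def log_likelihood(sigma_a, e_obs, e_calc):
    """Sum of log p(|E_O|; |E_C|) over acentric reflections."""
    var = 1.0 - sigma_a ** 2
    x = 2.0 * sigma_a * e_obs * e_calc / var
    terms = (np.log(2.0 * e_obs / var)
             - (e_obs ** 2 + sigma_a ** 2 * e_calc ** 2) / var
             + log_i0(x))
    return float(np.sum(terms))


def estimate_sigma_a(e_obs, e_calc, grid=None):
    """Grid-search maximum-likelihood estimate of sigma_A."""
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    scores = [log_likelihood(sa, e_obs, e_calc) for sa in grid]
    return float(grid[int(np.argmax(scores))])


# Synthetic test data: E_O = sigma_A * E_C + error with variance 1 - sigma_A^2.
rng = np.random.default_rng(4)
n_refl = 5000
true_sigma_a = 0.7


def complex_normal(size):
    """Complex Gaussian values with unit mean-square modulus."""
    return (rng.normal(size=size) + 1j * rng.normal(size=size)) / np.sqrt(2.0)


E_c = complex_normal(n_refl)
E_o = true_sigma_a * E_c + np.sqrt(1.0 - true_sigma_a ** 2) * complex_normal(n_refl)

estimate = estimate_sigma_a(np.abs(E_o), np.abs(E_c))
print(f"true sigma_A = {true_sigma_a}, estimated = {estimate:.2f}")
```

Only the amplitudes are passed to the estimator, which mirrors the situation in practice where the phase of FO is unknown.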

## Figure of merit weighting for model phases

Blow and Crick (1959) and Sim (1959) showed that the map with the least rms error is calculated from centroid structure factors.
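A numerical sketch of why the centroid gives the least rms error, assuming an acentric reflection whose phase-error distribution is proportional to exp(X cos Δφ), where X is a hypothetical concentration parameter. The figure of merit m is the mean of cos Δφ, and among all weights w applied to the model-phased coefficient, the expected squared error is smallest at w = m.

```python
import numpy as np


def phase_probability(x, n=3600):
    """Phase-error grid and normalized probabilities p ~ exp(x cos(dphi))."""
    dphi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    weights = np.exp(x * (np.cos(dphi) - 1.0))  # shifted to avoid overflow
    return dphi, weights / weights.sum()


def figure_of_merit(x):
    """m = <cos(dphi)>, the length of the centroid of exp(i dphi)."""
    dphi, p = phase_probability(x)
    return float(np.sum(p * np.cos(dphi)))


def expected_squared_error(weight, x):
    """<|w - exp(i dphi)|^2> for a unit-amplitude structure factor."""
    dphi, p = phase_probability(x)
    return float(np.sum(p * np.abs(weight - np.exp(1j * dphi)) ** 2))


x_value = 2.0
m = figure_of_merit(x_value)

# Scan candidate weights and find the one with the least expected error.
candidates = np.linspace(0.0, 1.0, 101)
errors = [expected_squared_error(w, x_value) for w in candidates]
best_weight = float(candidates[int(np.argmin(errors))])

# Cross-check against I1(X)/I0(X), using a finite difference for I1 = dI0/dX.
h = 1e-4
bessel_ratio = float((np.i0(x_value + h) - np.i0(x_value - h)) / (2.0 * h) / np.i0(x_value))

print(f"figure of merit m = {m:.4f}")
print(f"weight with least error = {best_weight:.2f}")
print(f"I1(X)/I0(X) = {bessel_ratio:.4f}")
```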

## Model bias in fom-weighted maps

Generalize derivation by Main (1979)

Start from cosine law

Take expected values

Solve for fom-weighted map coefficient

Solve for approximation to true FO
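For orientation, the end result usually quoted for the last two steps can be written as follows for acentric reflections. The notation is assumed here rather than taken from the slide: m is the figure of merit, α_C the model phase, and D the factor defined in the general treatment.

```latex
% fom-weighted map coefficient: roughly half true structure, half model
m\,|F_O|\,e^{i\alpha_C} \;\approx\; \tfrac{1}{2}\,\mathbf{F}_O + \tfrac{1}{2}\,D\,\mathbf{F}_C

% rearranged to give an approximation to the true structure factor
\mathbf{F}_O \;\approx\; \left( 2m\,|F_O| - D\,|F_C| \right) e^{i\alpha_C}
```

Read this way, a fom-weighted map shows the model at roughly half weight alongside the true structure, and subtracting the D|FC| term removes that systematic component.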

## SIGMAA

Calculate phase probabilities from |FO|, |FC|, αC

Combine MIR phase information using Hendrickson-Lattman coefficients

$$p(\alpha) \propto \exp\left[ A\cos\alpha + B\sin\alpha + C\cos 2\alpha + D\sin 2\alpha \right]$$
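A short sketch of how Hendrickson-Lattman coefficients are used, assuming NumPy and hypothetical coefficient values. Independent sources of phase information multiply as probabilities, so their coefficients add; the centroid of the resulting distribution gives the combined phase and figure of merit. For a model phase on an acentric reflection the sketch assumes A = X cos αC and B = X sin αC with C = D = 0, where X is an illustrative concentration parameter.

```python
import numpy as np


def hl_phase_statistics(a, b, c, d, n=720):
    """Return (figure of merit, centroid phase in degrees) for HL coefficients."""
    alpha = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    exponent = (a * np.cos(alpha) + b * np.sin(alpha)
                + c * np.cos(2.0 * alpha) + d * np.sin(2.0 * alpha))
    p = np.exp(exponent - exponent.max())  # shifted to avoid overflow
    p /= p.sum()
    centroid = np.sum(p * np.exp(1j * alpha))
    fom = float(np.abs(centroid))
    phase = float(np.degrees(np.angle(centroid)) % 360.0)
    return fom, phase


def model_hl(x, alpha_c_deg):
    """HL coefficients for a unimodal model-phase distribution (assumed form)."""
    alpha_c = np.radians(alpha_c_deg)
    return np.array([x * np.cos(alpha_c), x * np.sin(alpha_c), 0.0, 0.0])


# Hypothetical model information (phase 30 degrees) and experimental
# information (unimodal, centred on 60 degrees, less concentrated).
hl_model = model_hl(2.0, 30.0)
hl_experimental = model_hl(1.0, 60.0)
hl_combined = hl_model + hl_experimental  # independent sources: coefficients add

fom_model, phase_model = hl_phase_statistics(*hl_model)
fom_combined, phase_combined = hl_phase_statistics(*hl_combined)

print(f"model only: fom {fom_model:.3f}, phase {phase_model:.1f}")
print(f"combined:   fom {fom_combined:.3f}, phase {phase_combined:.1f}")
```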

Produce 4 types of map coefficients:

1)

2)

3) combined phase map coefficients

4)

## Refinement targets and difference maps

In refinement, adjust model to optimize refinement target

Atoms are moved to shift FC in a direction that improves target

done by taking the derivative of the target with respect to the atomic parameters

Derivative of the target with respect to FC shows in what direction to change the calculated structure factor to improve agreement

least-squares target should decrease, so move in opposite direction to derivative

likelihood target should increase, so move in direction of derivative

Fourier transform of (+/-) derivative shows the corresponding derivative of the electron density

such a map shows where the target would prefer more or less electron density

e.g. least-squares:

e.g. log likelihood target (ignoring measurement error)

consider effect of adding this to model:
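The log likelihood case can be checked numerically. The sketch below assumes the acentric Rice likelihood with no measurement error, a variance term var standing for εσΔ², and hypothetical amplitude values. It compares a finite-difference derivative of the log likelihood with respect to |FC| against the closed form (2D/var)(m|FO| - D|FC|), whose bracketed term has the form of a σA-weighted difference map coefficient.

```python
import numpy as np


def _phase_weights(x, n=3600):
    """Phase-difference grid and weights exp(x (cos(dphi) - 1))."""
    dphi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return dphi, np.exp(x * (np.cos(dphi) - 1.0))


def log_i0(x):
    """log I0(x), using I0(x) = mean over dphi of exp(x cos(dphi))."""
    _, w = _phase_weights(x)
    return float(x + np.log(w.mean()))


def bessel_ratio(x):
    """m = I1(x) / I0(x), the acentric figure of merit."""
    dphi, w = _phase_weights(x)
    return float(np.sum(w * np.cos(dphi)) / np.sum(w))


def log_likelihood(fc, fo, d, var):
    """Acentric log p(|F_O|; |F_C|), ignoring measurement error."""
    x = 2.0 * d * fo * fc / var
    return float(np.log(2.0 * fo / var) - (fo ** 2 + d ** 2 * fc ** 2) / var + log_i0(x))


def analytic_gradient(fc, fo, d, var):
    """d(log likelihood)/d|F_C| = (2 D / var) (m |F_O| - D |F_C|)."""
    m = bessel_ratio(2.0 * d * fo * fc / var)
    return float(2.0 * d / var * (m * fo - d * fc))


# Hypothetical values for one reflection.
fo, fc, d, var = 120.0, 100.0, 0.8, 2500.0

step = 1e-4
numeric = (log_likelihood(fc + step, fo, d, var)
           - log_likelihood(fc - step, fo, d, var)) / (2.0 * step)
analytic = analytic_gradient(fc, fo, d, var)

print(f"finite-difference gradient: {numeric:.6f}")
print(f"analytic gradient:          {analytic:.6f}")
```

A positive gradient here means the likelihood target would prefer a larger |FC| for this reflection, which a map computed from these coefficients shows as positive density.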

## Refinement bias

Statistical treatment depends on independent errors in all the atoms

Proteins have too many parameters per observation

diffraction data can be "over fit"