Phaser should be able to solve most structures with the Automated
Molecular Replacement mode, and this is the first mode that you should try.
Give Phaser your data (How to Define Data) and your
models (How to Define Models), tell Phaser what to
search for (use the SEARch keyword), and give a list of possible
spacegroups (in the same pointgroup; use the SGALternative keyword).
The flow diagram for the automated molecular replacement mode is shown
below. If this doesn't work (see "How to know whether
Phaser has solved it"), you can try selecting peaks of lower significance
in the rotation function in case the real orientation was not within the selection
criteria. By default peaks above 75% of the top peak are selected (see "How
to Select Peaks"). See "What to do in difficult
cases" for more hints and tips. If the automated molecular replacement
mode doesn't work even with non-default input you need to run the modes of Phaser
separately. The possibilities are endless - you can even try exhaustive searches
(translations of all orientations) if you want - but experience has shown that
most structures that can be solved by Phaser can be solved by relatively simple
strategies.
Flow Diagram for Automated Molecular Replacement in Phaser
2.1 How to Define Data
You need to tell Phaser the name of the mtz file containing your data and
the columns in the mtz file to be used using the HKLIn
and LABIn
keywords. Additional keywords (BINS
CELL OUTLier RESOlution
SPACegroup) define how the data are used.
2.2 How To Define Models
Molecular replacement models are defined with the ENSEmble
keyword and the COMPosition
keyword. To compute a Sigma(A) curve representing the accuracy of model
structure factors as a function of resolution, Phaser needs to know the
RMS coordinate error expected for the model (determined directly from RMS
or indirectly from IDENtity
in the ENSEmble
keyword) and the fraction of the scattering power in the asymmetric unit
that this model contributes (deduced from the COMPosition
keywords). If fp is the fraction scattering and RMS is the rms coordinate
error, then
Sigma(A) = SQRT{fp*[1-fsol*exp(-Bsol*(sin(theta)/lambda)^2)]}
* exp{-(8 Pi^2/3)*RMS^2*(sin(theta)/lambda)^2}
where fsol(=0.95) and Bsol(=300Å^2) account for the effects
of disordered solvent on the completeness of the model at low resolution.
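The Sigma(A) expression above can be evaluated numerically. The following is a minimal Python sketch, not Phaser's own code: the function name sigma_a and its argument names are hypothetical, and it assumes the resolution d is given in Angstroms so that sin(theta)/lambda = 1/(2d) by Bragg's law.

```python
import math

def sigma_a(resolution, fp, rms, fsol=0.95, bsol=300.0):
    """Evaluate the Sigma(A) formula at one resolution (d spacing, in Angstroms).

    fp   : fraction of the scattering contributed by the model
    rms  : expected RMS coordinate error of the model (Angstroms)
    fsol, bsol : solvent terms, defaults as quoted in the text
    """
    s2 = (1.0 / (2.0 * resolution)) ** 2          # (sin(theta)/lambda)^2
    solvent_term = 1.0 - fsol * math.exp(-bsol * s2)
    error_term = math.exp(-(8.0 * math.pi ** 2 / 3.0) * rms ** 2 * s2)
    return math.sqrt(fp * solvent_term) * error_term

# Illustrative values only: a model giving half the scattering, RMS error 1.0 A
curve = [(d, sigma_a(d, fp=0.5, rms=1.0)) for d in (50.0, 6.0, 3.0, 2.0)]
```

In this sketch the curve is low at very low resolution (the disordered solvent term dominates), rises at intermediate resolution, and falls off again at high resolution as the coordinate error term takes over.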
Phaser must be given the models that it will use for molecular replacement.
A molecular replacement model is constructed in one of two ways - either
by making an ensemble from a set of aligned homologous structures, entered
as pdb files, or by entering a model from a map, entered as structure factors
in an mtz file. Each ensemble is treated as a separate type of rigid body
to be placed in the molecular replacement solution. An ensemble should only
be defined once, even if there are several copies of the molecule in the
asymmetric unit.
If you construct a model by homology modelling, remember that the RMS error
you expect is essentially the error you expect from the template structure.
So specify the sequence identity of the template, not of the homology model!
Examples of building an Ensemble from Coordinates
- You have one structure as a model with 44% sequence identity to the
protein in the crystal.
- ENSEmble mol1 PDB homology1.pdb IDENtity .44
- You have three structures as models with 44%, 39% and 35% identity to
the protein in the crystal.
- ENSEmble mol2 PDB homology1.pdb IDENtity .44 PDB homology2.pdb IDENtity .39 PDB homology3.pdb IDENtity .35
- You have an NMR Ensemble as a model. There is no need to split the coordinates
in the pdb file provided that the models are separated by MODEL and ENDMDL
cards. In this case the homology is not a good indication of the similarity
of the structural coordinates to the target structure. You should use
the RMS option; several test cases have succeeded with an RMS value of
about 1.5Å.
- ENSEmble mol3 PDB nmr.pdb RMS 1.5
Examples of a Map as an "Ensemble"
- You have low resolution electron density of your model. This density
has been cut out and converted to structure factors in a large cell.
- ENSEmble mol1 HKLIn mol1.mtz F = Fmol1 P = Pmol1 EXTEnt 23 25 29 RMS 2.0 CENTre 4 3 30 PROTein MW 10241 NUCLeic MW 0
When using density as a model, it is necessary to specify both the extent
(x,y,z limits) of the cut-out region of density, and the centre of this
region. With coordinates, Phaser can work this out by itself. This information
is needed, for instance, to decide how large rotational steps can be in
the rotation search and to carry out the molecular transform interpolation
correctly. In the case of electron density, the RMS value does not have
the same physical meaning that it has when the model is specified by atomic
coordinates, but it is used to judge how the accuracy of the calculated
structure factors drops off with resolution. A suitable value for RMS can
be obtained, in the case of density from an experimentally-phased map, by
choosing a value that makes the SigmaA curve fall
off with resolution similar to the mean figures-of-merit. In the case of
density from an EM image reconstruction, the RMS value should make the SigmaA
curve fall off similar to a Fourier correlation curve used to judge the
resolution of the EM image.
Phaser must know what fraction of the scattering is given by each Ensemble.
It cannot work this out without knowing the content of the asymmetric unit.
The composition of the asymmetric unit is defined by entering either the
molecular weights or the sequences of the components in the asymmetric unit,
and giving the number of copies of each. Expert users can also enter the
fraction of the scattering of each component directly, although the composition
must still be entered for the absolute scale calculation.
Examples of Composition by Molecular Weight
- You have one protein (with MW 21022) in the asymmetric unit
- COMPosition PROTein MW 21022
- You have three copies of a protein (with MW 21022) in the asymmetric
unit
- COMPosition PROTein MW 21022
- COMPosition PROTein MW 21022
- COMPosition PROTein MW 21022
- Another way of entering the same thing is
- COMPosition PROTein MW 21022 NUMber 3
- Yet another way of entering the same thing is
- COMPosition PROTein MW 63066
- You have two copies of a protein (with MW 21022), two copies of a protein
(with MW 9843) and RNA (with MW 32004) in the asymmetric unit
- COMPosition PROTein MW 21022 NUMber 2
- COMPosition PROTein MW 9843 NUMber 2
- COMPosition NUCLeic MW 32004
Examples of Composition by Sequence
- You have one protein (with sequence in fasta format in the file prot1.seq)
in the asymmetric unit
- COMPosition PROTein SEQuence prot1.seq
- You have three copies of a protein (with sequence in fasta format in
the file prot1.seq) in the asymmetric unit
- COMPosition PROTein SEQuence prot1.seq
- COMPosition PROTein SEQuence prot1.seq
- COMPosition PROTein SEQuence prot1.seq
- Another way of entering the same thing is
- COMPosition PROTein SEQuence prot1.seq NUMber 3
- Yet another way of entering the same thing is to make a sequence file
with all the amino acids concatenated together (prot1.seq3)
- COMPosition PROTein SEQuence prot1.seq3
- You have two copies of a protein (with sequence in fasta format in
the file prot1.seq), two copies of a protein (with sequence in fasta format
in the file prot2.seq) and RNA (with sequence in fasta format in
the file nucl1.seq) in the asymmetric unit
- COMPosition PROTein SEQuence prot1.seq NUMber 2
- COMPosition PROTein SEQuence prot2.seq NUMber 2
- COMPosition NUCLeic SEQuence nucl1.seq
Examples of Composition by Percentage Scattering
- Each copy of Ensemble mol1 gives 22% of the scattering
- COMPosition ENSEmble mol1 FRACtional 0.22
- Each copy of Ensemble mol2 gives 78% of the scattering
- COMPosition ENSEmble mol2 FRACtional 0.78
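As a rough illustration of where a fractional scattering value might come from, the Python sketch below estimates the fraction per copy from molecular weights. It rests on an assumption made here for illustration only: that scattering power is roughly proportional to molecular weight for components of the same type. This is not Phaser's own calculation, and the function name fraction_scattering is hypothetical. The molecular weights are those used for BETA and BLIP in the example scripts of this document.

```python
def fraction_scattering(components):
    """Estimate the fraction of the scattering given by ONE copy of each component.

    components maps a name to (molecular weight, number of copies).
    Assumes scattering is proportional to molecular weight (an approximation).
    """
    total = sum(mw * copies for mw, copies in components.values())
    return {name: mw / total for name, (mw, _copies) in components.items()}

fractions = fraction_scattering({"beta": (28853, 1), "blip": (17522, 1)})
for name, frac in fractions.items():
    # Print a line in the style of the FRACtional examples
    print(f"COMPosition ENSEmble {name} FRACtional {frac:.2f}")
```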
2.3 How To Define Solutions
You don't really need to know how to define molecular replacement solutions
as Phaser writes out files ending in ".sol"
and ".rlist" that
contain the solution information from the job. The root of the files is given
by the ROOT
keyword. By default, the root filename is PHASER. These files can be read
back into subsequent runs of Phaser to build up solutions containing more
than one molecule in the asymmetric unit.
"PHASER.sol" files
are generated by all modes, and contain the current idea of potential molecular
replacement solutions.
"PHASER.rlist"
files are generated by the rotation function modes, and are for performing
translation functions. (They are also produced by degenerate (2D) translation
functions, for performing a translation function to find the third dimension.)
To include the files you should use the preprocessor command @
@ filename.sol
@ filename.rlist
However, if you want to understand the "PHASER.sol" and "PHASER.rlist"
files, read on.
PHASER.sol
At different stages of molecular replacement, an Ensemble will be oriented
but not positioned (after the rotation search), or oriented and positioned
(after the translation search), or, rarely, oriented with the position known
in 2 of 3 dimensions. These three states correspond to solutions defined
by the keywords SOLUtion 3DIM, SOLUtion 6DIM, and SOLUtion 5DIM,
respectively. Each Ensemble in the asymmetric unit has its own SOLUtion
keyword. Examples of the different types of molecular replacement solutions
are:
- One copy of mol1 with known orientation and position (fractional coordinates)
- SOLUtion 6DIM ENSEmble mol1 EULEr 17 20 32 FRACtional 0.12 0.05 0.74
- One copy of mol1 with known orientation only
- SOLUtion 3DIM ENSEmble mol1 EULEr 17 20 32
- One copy of mol1 with known orientation and position (fractional coordinates)
and one copy of mol2 with known orientation only
- SOLUtion 6DIM ENSEmble mol1 EULEr 17 20 32 FRACtional 0.12 0.05 0.74
- SOLUtion 3DIM ENSEmble mol2 EULEr 5 183 230
- Two copies of mol1 with known orientation and position (fractional coordinates),
one copy of mol2 with known orientation and position (fractional coordinates)
and one copy of mol2 with known orientation only
- SOLUtion 6DIM ENSEmble mol1 EULEr 17 20 32 FRACtional 0.12 0.05 0.74
- SOLUtion 6DIM ENSEmble mol1 EULEr 24 23 24 FRACtional 0.58 0.73 0.93
- SOLUtion 6DIM ENSEmble mol2 EULEr 68 7 85 FRACtional 0.04 0.19 0.25
- SOLUtion 3DIM ENSEmble mol2 EULEr 5 183 230
When more than one molecular replacement solution is present, the solutions
are separated with the SOLUtion SET keyword.
At any given stage in the structure solution all the solutions will have the
same number of ensembles oriented, and/or oriented and positioned, and the
solutions will look very similar. For example, if the rotation function and
translation function for mol1 were very clear, then there will only be one
type of 6DIM solution for mol1.
If the rotation and translation functions for mol2 were then not clear, there
will be a series of possible 6DIM solutions for mol2.
- SOLUtion SET
- SOLUtion 6DIM ENSEmble mol1 EULEr 17 20 32 FRACtional 0.12 0.05 0.74
- SOLUtion 6DIM ENSEmble mol2 EULEr 5 183 230 FRACtional 0.71 0.54 0.81
- SOLUtion SET
- SOLUtion 6DIM ENSEmble mol1 EULEr 17 20 32 FRACtional 0.12 0.05 0.74
- SOLUtion 6DIM ENSEmble mol2 EULEr 51 93 75 FRACtional 0.08 0.57 0.25
- SOLUtion SET
- SOLUtion 6DIM ENSEmble mol1 EULEr 17 20 32 FRACtional 0.12 0.05 0.74
- SOLUtion 6DIM ENSEmble mol2 EULEr 5 33 21 FRACtional 0.32 0.05 0.44
Where only the coordinates in 2 dimensions (a plane through the origin) of
an oriented Ensemble are determined, a solution of type 5DIM
is produced. The degenerate direction is defined as the direction perpendicular
to the plane in which the position is given. These solutions can be treated
in exactly the same way as the 3DIM and 6DIM solutions.
SOLUtion 5DIM ENSEmble mol1 EULEr 17 20 32 DEGEnerate X FRACtional 0.05 0.74
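For keeping track of solutions outside Phaser, lines of this form are simple to read with a short script. The Python sketch below is a hypothetical helper, not Phaser's own parser. It assumes that each solution sits on a single line and that keywords are matched case-insensitively on their first four letters, as the capitalisation used in this document suggests.

```python
def parse_solution_line(line):
    """Parse one SOLUtion line into a dictionary (illustrative helper only)."""
    tokens = line.split()
    if len(tokens) < 2 or not tokens[0].upper().startswith("SOLU"):
        raise ValueError("not a SOLUtion line: " + line)
    result = {"type": tokens[1].upper(), "ensemble": None, "euler": None,
              "degenerate": None, "fractional": None, "score": None}
    i = 2
    while i < len(tokens):
        key = tokens[i].upper()[:4]
        if key == "ENSE":
            result["ensemble"] = tokens[i + 1]
            i += 2
        elif key == "EULE":
            result["euler"] = tuple(float(t) for t in tokens[i + 1:i + 4])
            i += 4
        elif key == "DEGE":
            result["degenerate"] = tokens[i + 1].upper()
            i += 2
        elif key == "FRAC":
            # Only two coordinates are given when one direction is degenerate
            n = 2 if result["degenerate"] else 3
            result["fractional"] = tuple(float(t) for t in tokens[i + 1:i + 1 + n])
            i += 1 + n
        elif key == "SCOR":
            result["score"] = float(tokens[i + 1])
            i += 2
        else:
            raise ValueError("unexpected keyword: " + tokens[i])
    return result

sol = parse_solution_line(
    "SOLUtion 6DIM ENSEmble mol1 EULEr 17 20 32 FRACtional 0.12 0.05 0.74")
```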
PHASER.rlist
These files define a rotation function list. The peak list is given with a
series of SOLUtion TRIAl keywords.
SOLUtion TRIAl ENSEmble mol1 EULEr 17 20 32
SOLUtion TRIAl ENSEmble mol1 EULEr 67 65 51
SOLUtion TRIAl ENSEmble mol1 EULEr 67 112 81
If a partial solution is already known, then the information for the currently
"known" parts of the asymmetric unit is given in the form used for
the PHASER.sol file, followed
by the list of trial orientations for which a translation function is to be
performed.
SOLUtion SET
SOLUtion 6DIM ENSEmble mol1 EULEr 17 20 32 FRACtional 0.12 0.05 0.74
SOLUtion TRIAl ENSEmble mol1 EULEr 44 20 32
SOLUtion TRIAl ENSEmble mol1 EULEr 67 65 51
SOLUtion SET
SOLUtion 6DIM ENSEmble mol1 EULEr 17 20 32 FRACtional 0.13 0.55 0.76
SOLUtion TRIAl ENSEmble mol1 EULEr 83 9 180
SOLUtion TRIAl ENSEmble mol1 EULEr 8 36 92
SOLUtion TRIAl ENSEmble mol1 EULEr 48 87 10
When the rlist file is generated by Phaser, an additional keyword SCORe
appears on the end of the SOLUtion TRIAl lines. This is the Z-score from
the rotation function. It is not used by Phaser, but allows the user to
keep track of the results.
SOLUtion SET
SOLUtion 6DIM ENSEmble mol1 EULEr 17 20 32 FRACtional 0.12 0.05 0.74
SOLUtion TRIAl ENSEmble mol1 EULEr 44 20 32 SCORe 3.4
SOLUtion TRIAl ENSEmble mol1 EULEr 67 65 51 SCORe 3.0
If a degenerate translation function is performed, then a SOLUtion TRIAl
line is produced with the degenerate translation information
present, ready for performing the translation function on the third dimension.
SOLUtion TRIAl ENSEmble mol1 EULEr 17 20 32 DEGEnerate X FRACtional 0.05 0.74
2.4 How to Control Output
The output of Phaser can be controlled with the following optional keywords.
The ROOT keyword
is not compulsory (the default root filename is "PHASER"),
but should always be given, so that your jobs have separate and meaningful
output filenames.
Optional Keywords
Where HKLOut ON
is given as an optional keyword, Phaser produces an mtz file with "SigmaA"
type weighted Fourier map coefficients for producing electron density maps
for rebuilding.
MTZ Column Labels    Description
FWT, PHWT            Amplitude and phase for the 2m|Fobs|-D|Fcalc| exp(i alpha-calc) map
DELFWT, PHDELWT      Amplitude and phase for the m|Fobs|-D|Fcalc| exp(i alpha-calc) map
FOM                  m, analogous to the "Sim" weight, to estimate the reliability of alpha-calc
2.5 How to Select Peaks
The selection of peaks saved for output in the rotation and translation functions
can be done in four different ways. Peaks can either be selected by "PERCent",
"SIGma", "NUMber"
or "ALL", illustrated
below. "PERCent"
means that the cutoff value is the percentage of the top peak, where the value
of the top peak is defined as 100% and the value of the mean is defined as
0%. "SIGma"
means that the cutoff value is the number of standard deviations (sigmas)
over the mean (otherwise known as the Z-score). "NUMber"
means that the cutoff value is the number of top peaks to select. "ALL"
means that all peaks are selected.
The default is selection by "PERCent"
with the cutoff value set at 75%. This has the advantage that there are always
peaks output. If the solution is clear, and is a long way above the mean,
then only the clear solution(s) will be output, but if the distribution of
peaks is rather flat, then many peaks will be output for testing in the next
part of the molecular replacement procedure (e.g. many peaks selected from
the rotation function for testing with a translation function). If an absolute
significance test is required, then selection by "SIGma"
is more appropriate, although not all searches will produce output if the
cutoff value is too high (e.g. 5 sigma). If the distribution is very flat
then it might be better to select by "NUMber",
for example select the top 1000 rotation peaks for testing in the translation
function. "ALL"
is for full 6 dimensional searches, where all the solutions from the rotation
function are output for testing in the translation function (although this
should never be necessary; it would be much faster and probably just as likely
to work if the top 1000 peaks were used in this way).
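The four selection criteria can be sketched in a few lines of Python. This is an illustration of the definitions given above, not Phaser's implementation: the function name select_peaks is hypothetical, and the use of the population standard deviation and of an inclusive (>=) cutoff are assumptions.

```python
import statistics

def select_peaks(values, method="PERCENT", cutoff=75.0):
    """Return the selected peak values, highest first."""
    mean = statistics.fmean(values)
    top = max(values)
    ranked = sorted(values, reverse=True)
    method = method.upper()
    if method == "ALL":
        return ranked
    if method == "NUMBER":
        return ranked[:int(cutoff)]
    if method == "PERCENT":
        # The top peak is 100% and the mean is 0%
        threshold = mean + (cutoff / 100.0) * (top - mean)
        return [v for v in ranked if v >= threshold]
    if method == "SIGMA":
        # Z-score: number of standard deviations above the mean
        sd = statistics.pstdev(values)
        return [v for v in ranked if sd > 0 and (v - mean) / sd >= cutoff]
    raise ValueError("unknown selection method: " + method)

clear = [10.0, 2.0, 1.0, 0.0, 2.0, 3.0, 1.0, 1.0, 0.0, 0.0]  # one clear peak
flat = [5.0, 4.9, 4.8, 1.0, 1.0, 1.0, 1.0, 1.0]              # flat top
clear_selected = select_peaks(clear, "PERCENT", 75.0)
flat_selected = select_peaks(flat, "PERCENT", 75.0)
```

With the made-up values above, the percentage criterion keeps a single peak for the clear distribution and several for the flat one, while a 5 sigma cutoff on the same data selects nothing.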
Peaks can also be clustered or not clustered prior to selection. If clustering
is off, then all high peaks on the search grid are selected. If clustering
is on, then points on the search grid with higher neighboring points are removed
from the selection.
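The effect of clustering can be illustrated with a small Python sketch. It is a simplification under stated assumptions: a two-dimensional grid with eight neighbours per point and no periodic wrap-around, whereas the real search grids are three-dimensional. The function name cluster_peaks is hypothetical and this is not Phaser's algorithm.

```python
def cluster_peaks(grid):
    """Keep only grid points that have no higher neighbouring point.

    grid is a list of equal-length rows of scores.
    Returns a list of (row, column, value) tuples.
    """
    rows, cols = len(grid), len(grid[0])
    kept = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            # A point with a higher neighbour is removed from the selection
            if all(grid[r][c] >= n for n in neighbours):
                kept.append((r, c, grid[r][c]))
    return kept

scores = [
    [1, 2, 1, 0],
    [2, 9, 3, 0],
    [1, 3, 2, 5],
    [0, 0, 1, 4],
]
peaks = cluster_peaks(scores)
```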
The selection of peaks is done in three stages for the fast rotation and fast
translation searches. The first stage is the selection of peaks from the fast
search that will be rescored with the full likelihood target. Rescoring with
the full likelihood target may change the order of the peaks and their significance.
The second stage is the selection of peaks from the rescoring to be saved
and combined with other searches performed in the same phaser job. The third
stage is the final selection of peaks from the merged list for output from
the phaser job. The selection of peaks to go into rescoring is controlled
with the RESCORE
keyword, the selection of peaks saved from each separate search is controlled
with the SAVE
keyword, and the final selection is controlled with the FINAL
keyword.
If RESCORE OFF is
requested (no rescoring of the fast search peaks is performed), or if the
brute rotation or translation searches are carried out, then the SAVE
keyword refers to the selection of peaks from the fast search (or brute search)
for merging in the final stage (the RESCORE
keyword is not used for selection in this case).
2.6 How to Run Phaser
Phaser runs in different modes, which perform Phaser's different functionalities,
such as rotation functions and translation functions. Some of the modes combine
the functionality of other modes to allow automatic structure solution (e.g.
Automated Molecular Replacement), while others are
basic modes (e.g. Molecular Replacement Anisotropy Correction).
The example scripts all refer to the tutorial test case, the crystal structure
of a hetero-dimer of beta-lactamase (BETA) and beta-lactamase inhibitor protein
(BLIP), both with molecular replacement models from crystal structures of
the individual BETA and BLIP components. The pdb and mtz files required for
running this test case are distributed with Phaser.
2.6.1 Automated Molecular Replacement
This mode (MODE
MR_AUTO) combines the anisotropy correction, likelihood enhanced
fast rotation function, likelihood enhanced fast translation function, packing
and refinement modes for multiple search models and a set of possible spacegroups
to automatically solve a structure by molecular replacement. Top solutions
are output to the files FILEROOT.sol,
FILEROOT.#.mtz and FILEROOT.#.pdb
(where "#" refers to the solution number). Many structures
can be solved by running an automated molecular replacement search with
defaults, giving the ensembles that you expect to be easiest to find first.
Example command script for finding BETA and BLIP. This is the minimum input,
using all defaults (except the ROOT filename).
beta_blip_auto.com
phaser << eof
TITLe beta blip automatic
MODE MR_AUTO
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
SEARch ENSEmble beta NUM 1
SEARch ENSEmble blip NUM 1
ROOT beta_blip_auto # not the default
eof
Example command script for finding BETA and BLIP. The spacegroup recorded
on the mtz file is P3221 but the other hand is also a possibility.
Both search orders (BETA first, BLIP second and BLIP first, BETA second)
are tried, using the PERMutations ON keyword. We would not normally recommend
using the PERMutations ON keyword for this case, as it is obvious that the
larger molecule should be easier to find first.
beta_blip_auto_sg.com
phaser << eof
TITLe beta blip automatic
MODE MR_AUTO
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
SEARch ENSEmble beta NUM 1
SEARch ENSEmble blip NUM 1
PERMutations ON # not the default
SGALternative HAND # not the default
ROOT beta_blip_auto_sg # not the default
eof
Compulsory Keywords
Optional Keywords
2.6.2 Fast Rotation Function
This mode (MODE
MR_FRF) combines the anisotropy correction and likelihood-enhanced
fast rotation function (2), optionally rescored
with the full rotation likelihood function (1),
to find the orientation of a model in molecular replacement. Top rotation
solutions are output to the file FILEROOT.rlist
for input to a translation function. Top rotation solutions are also output
to the file FILEROOT.sol.
Example command script for fast rotation function to find the orientation
of BETA.
beta_frf.com
phaser << eof
TITLe beta FRF
MODE MR_FRF
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
SEARch ENSEmble beta
ROOT beta_frf
eof
Example command script for fast rotation function to find the orientation
of BLIP knowing the position and orientation of BETA, with the position
and orientation of BETA input from the command line.
blip_frf_with_beta.com
phaser << eof
TITLe blip FRF with beta rotation and translation
MODE MR_FRF
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 #beta
COMPosition PROTein MW 17522 #blip
SEARch ENSEmble blip
SOLUtion 6DIM ENSEmble beta EULEr 201 41 184 FRACtional -0.49408 -0.15571 -0.28148
ROOT blip_frf_with_beta
eof
Example command script for fast rotation function to find the orientation
of BLIP knowing only the orientation of BETA, with the orientation of BETA
input using the output solution file from the beta_frf.com
job above.
blip_frf_with_beta_rot.com
phaser << eof
TITLe blip FRF with beta R
MODE MR_FRF
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
SEARch ENSEmble blip
@beta_frf.sol # solution file output by phaser
ROOT blip_frf_with_beta_rot
eof
Compulsory Keywords
Optional Keywords
2.6.3 Brute Rotation Function
This mode (MODE
MR_BRF) combines the anisotropy correction and brute force likelihood
rotation function (1) to find the orientation
of a model in molecular replacement. Top rotation solutions are output to
the file FILEROOT.rlist for
input to a translation function. Top rotation solutions are also output
to the file FILEROOT.sol.
Example command script for brute rotation function to find the orientation
of BETA
beta_brf.com
phaser << eof
TITLe beta BRF
MODE MR_BRF
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
SEARch ENSEmble beta
ROOT beta_brf
eof
Example command script for brute rotation function to find the optimal orientation
of BETA in a restricted search range and on a fine grid around the position
from the fast rotation search.
beta_brf_around.com
phaser << eof
TITLe beta BRF fine sampling
MODE MR_BRF
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
SEARch ENSEmble beta
ROTAte AROUnd EULEr 201 41 184 RANGE 10
SAMPling ROTation 0.5
XYZOut ON # not the default
TOPFiles 1 # not the default
ROOT beta_brf_around
eof
Compulsory Keywords
Optional Keywords
2.6.4 Fast Translation Function
This mode (MODE
MR_FTF) combines the anisotropy correction and likelihood-enhanced
fast translation function (3), optionally rescored
by the full likelihood translation function (1),
to find the position of a previously oriented model in molecular replacement.
Top translation solutions are output to the file FILEROOT.sol.
Example command script for finding the position of BETA after the rotation
function has been run and the results output to the file beta_frf.rlist
beta_ftf.com
phaser << eof
TITLe beta FTF
MODE MR_FTF
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
@beta_frf.rlist
ROOT beta_ftf
eof
Example command script for finding the position of BLIP after the rotation
function has been run and the results output to the file blip_frf_with_beta.rlist,
which has the SOLUtion 6DIM keyword
input for BETA and the SOLUtion
TRIAL keyword input for the orientations to try for BLIP with the
translation function.
blip_ftf_with_beta.com
phaser << eof
TITLe blip FTF with beta
MODE MR_FTF
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
@blip_frf_with_beta.rlist
ROOT blip_ftf_with_beta
eof
Compulsory Keywords
Optional Keywords
2.6.5 Brute Translation Function
This mode (MODE
MR_BTF) combines the anisotropy correction and brute force likelihood
translation function (1) to find the position
of a previously oriented model in molecular replacement. Top translation
solutions are output to the file FILEROOT.sol.
Example command script for brute Translation function to find the position
of BETA after the rotation function has been run
beta_btf.com
phaser << eof
TITLe beta BTF
MODE MR_BTF
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
@beta_frf.rlist
TRANslate AROUnd FRACtional POINt -0.49408 -0.15571 -0.28148 RANGe 5
ROOT beta_btf
eof
Example command script for brute Translation function to find the position
of BETA degenerate in X after the rotation function has been run
beta_btf_degen_x.com
phaser << eof
TITLe beta degenerate X
MODE MR_BTF
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
@beta_frf.rlist
TRANslate DEGEnerate X
ROOT beta_btf_degen_x
eof
Compulsory Keywords
Optional Keywords
2.6.6 Refinement and Phasing
This mode (MODE
MR_RNP) combines the anisotropy correction and refinement against
the likelihood function (1) to optimize full or
partial molecular replacement solutions and phase the data. At the end of
refinement, the list of solutions is checked for duplicates, which are pruned.
Refined solutions are output to the file FILEROOT.sol.
Example command script to refine a set of solutions
beta_blip_rnp.com
phaser << eof
TITLe beta blip rigid body refinement
MODE MR_RNP
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
ROOT beta_blip_rnp # not the default
HKLOut OFF # not the default
XYZOut OFF # not the default
@beta_blip.sol
eof
Compulsory Keywords
Optional Keywords
2.6.7 Log-Likelihood Gain
This mode (MODE
MR_LLG) combines the anisotropy correction and the likelihood function
(1) to calculate the log-likelihood gain for full
or partial molecular replacement solutions. Solutions are output to the
file FILEROOT.sol.
Example command script to rescore the solutions using a different resolution
range of data and a different spacegroup
beta_blip_llg.com
phaser << eof
TITLe beta blip solution 6A P3121
MODE MR_LLG
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
ROOT beta_blip_llg # not the default
RESOlution 6.0
SPACegroup P 31 2 1
@beta_blip.sol
eof
Compulsory Keywords
Optional Keywords
2.6.8 Packing
This mode (MODE
MR_PAK) determines whether molecular replacement solutions pack in
the unit cell. Solutions that pack are output to the file FILEROOT.sol.
Example command script for determining whether a set of molecular replacement
solutions pack in the unit cell
beta_blip_pak.com
phaser << eof
TITLe beta blip packing check
MODE MR_PAK
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ENSEmble beta PDB beta.pdb IDENtity 100
ENSEmble blip PDB blip.pdb IDENtity 100
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
ROOT beta_blip_pak # not the default
PACK 1 # not the default
@beta_blip.sol
eof
Compulsory Keywords
Optional Keywords
2.6.9 Anisotropy Correction
This mode (MODE
MR_ANO) corrects the experimental data for anisotropy. Data (amplitude
and associated sigma) are corrected for anisotropy and output to FILEROOT.mtz
with column label set to the input column label with the addition of _ISO.
Example command script to correct the data for anisotropy
beta_blip_ano.com
phaser << eof
TITLe beta blip data correction
MODE MR_ANO
HKLIn beta_blip.mtz
LABIn F=Fobs SIGF=Sigma
ROOT beta_blip_ano # not the default
eof
Compulsory Keywords
Optional Keywords
2.6.10 Normal Mode Analysis
This mode (MODE
MR_NMA) writes out pdb files that have been perturbed along normal
modes, in a procedure similar to that described by Suhre & Sanejouand
(Acta Cryst. D60, 796-799, 2004). Each run of the program
writes out a matrix FILEROOT.mat
that contains the eigenvectors and eigenvalues of the atomic Hessian, and
can be read into subsequent runs of the same job, to speed up the analysis.
Do normal mode analysis only, write out eigenfile but not coordinates
beta_nma.com
phaser << eof
TITLe beta normal mode analysis
MODE MR_NMA
ENSEmble beta PDB beta.pdb IDENtity 100
XYZOut OFF
ROOT beta_nma # not the default
eof
Write out pdb files perturbed in 0.5 angstrom rms intervals along modes
7 and 8 (and combinations of 7 and 8)
beta_nma_pdb.com
phaser << eof
TITLe beta normal mode analysis pdb file generation
MODE MR_NMA
ENSEmble beta PDB beta.pdb IDENtity 100
ROOT beta_nma_pdb # not the default
EIGEn beta_nma.mat
NMAPdb MODE 7 MODE 8 RMS 0.5
eof
Compulsory Keywords
Optional Keywords
2.6.11 Cell Content Analysis
This mode (MODE
MR_CCA) determines the composition of the crystals using the "new"
Matthews coefficients of Kantardjieff & Rupp (2003) "Matthews coefficient
probabilities: Improved estimates for unit cell contents of proteins, DNA
and protein-nucleic acid complex crystals". Protein Science 12:1865-1871.
The molecular weight of ONE complex or assembly
to be packed into the asymmetric unit is given with the COMPosition
keyword. The possible Z values (number of copies of the complex or assembly)
that will fit in the asymmetric unit, and the relative frequencies of their
corresponding VM values, are reported. RESOlution
should be set to the maximum resolution that has been observed for the crystal.
Do cell content analysis
beta_cca.com
phaser << eof
TITLe beta-blip cell content analysis
MODE MR_CCA
COMPosition PROTein MW 28853 NUM 1 #beta
COMPosition PROTein MW 17522 NUM 1 #blip
RESO 3.0
ROOT beta_blip_cca # not the default
eof
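As a cross-check on the reported VM values, the Matthews coefficient itself is straightforward to compute: VM is the unit cell volume divided by the total molecular weight in the cell. The Python sketch below is illustrative only. The function names are hypothetical, the cell dimensions are made-up example values that are not taken from this document, and the count of 6 symmetry operators assumes spacegroup P3221. It does not reproduce the resolution-dependent probabilities of Kantardjieff & Rupp. The solvent estimate uses the common approximation 1 - 1.23/VM for proteins.

```python
import math

def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic Angstroms (cell angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

def matthews_coefficient(volume, n_symops, z, mw):
    """VM in cubic Angstroms per Dalton for z copies of an assembly of weight mw."""
    return volume / (n_symops * z * mw)

# Hypothetical trigonal cell; 6 symmetry operators assumed for P3221
volume = cell_volume(75.0, 75.0, 133.0, 90.0, 90.0, 120.0)
mw_complex = 28853 + 17522  # one BETA plus one BLIP, as in the script above
for z in (1, 2, 3):
    vm = matthews_coefficient(volume, 6, z, mw_complex)
    solvent = 1.0 - 1.23 / vm  # approximate solvent fraction for protein
    print(f"Z={z}  VM={vm:.2f}  solvent={solvent:.2f}")
```

A negative solvent fraction in the printed output means that number of copies cannot fit in the asymmetric unit.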
Compulsory Keywords
Optional Keywords
2.7. How to know whether Phaser has solved it
By default, Phaser selects solutions over 75% of the difference between
the top solution and the mean. Ideally, only the number of solutions you
are expecting should be selected by this criterion, but if the signal-to-noise
of your search is low, there will be noise peaks in this selection also.
For a translation function the correct solution will generally have a Z-score
(number of standard deviations above the mean value) over 5 and be well
separated from the rest of the solutions. For a rotation function, the correct
solution may be in the list with a Z-score under 4, and will not be found
until a translation function is performed and picks out the correct solution.
Of course, there will always be exceptions! Note, in particular, that in
the presence of translational NCS, pairs of similarly-oriented molecules
separated by the correct translation vector will give large Z-scores, even
if they are incorrect, because they explain the systematic variation in
intensities caused by the translational NCS.
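The two quantities used above can be sketched in a few lines of Python. This is only one reading of the 75% criterion and of the Z-score definition given in the text, not Phaser's own code; the function names and all numbers are hypothetical, and the mean and standard deviation are assumed to describe the whole search rather than just the listed peaks.

```python
def z_score(value, mean, sigma):
    """Number of standard deviations by which a score lies above the search mean."""
    return (value - mean) / sigma


def select_peaks(peaks, mean, percent=75.0):
    """Keep peaks lying over `percent` of the way from the mean to the top peak."""
    top = max(peaks)
    cutoff = mean + (percent / 100.0) * (top - mean)
    return [p for p in peaks if p > cutoff]


if __name__ == "__main__":
    # Hypothetical search statistics and peak heights, for illustration only.
    mean, sigma = 20.0, 12.5
    peaks = [120.0, 100.0, 90.0, 60.0]
    for p in select_peaks(peaks, mean):
        print(p, z_score(p, mean, sigma))
```

Lowering the percentage widens the selection, which is the usual first step when the correct orientation may have been missed.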
Z-score      | Have I solved it?
less than 5  | no
5 - 6        | unlikely
6 - 7        | possibly
7 - 8        | probably
more than 8  | definitely
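If you process many log files, the rule of thumb in the table above can be encoded as a small lookup. The Python sketch below is a hypothetical helper, not part of Phaser. The table does not say which row a value of exactly 6 or 7 belongs to, so the choice made here (each boundary value goes to the higher row, and exactly 8 counts as "probably") is an assumption.

```python
def interpret_z_score(z):
    """Map a translation function Z-score to the rule-of-thumb verdict in the table."""
    if z < 5.0:
        return "no"
    if z < 6.0:
        return "unlikely"
    if z < 7.0:
        return "possibly"
    if z <= 8.0:
        return "probably"
    return "definitely"


if __name__ == "__main__":
    for z in (4.2, 5.5, 6.5, 7.5, 9.1):
        print(z, interpret_z_score(z))
```

Remember that the text above lists exceptions, notably translational NCS, where a high Z-score does not guarantee a correct solution.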
You should always at least glance through the summary log file. One thing
to look for, in particular, is whether any translation solutions with a
clear signal-to-noise have been rejected by the packing step, especially
with a small number of clashes. Such a solution may be correct, and the
clashes may arise only because of differences in small surface loops. If
this happens, repeat the run allowing a suitable number of clashes with
the PACK keyword.
2.8. What to do in difficult cases
Not every structure can be solved by molecular replacement, but the right
strategy can push the limits. What to do when the default jobs fail depends
on why your structure is difficult.
Flexible structure
The relative orientations of the domains may differ between your crystal
and the model. If you suspect this, break the model into separate
PDB files containing rigid-body units, enter these as separate ensembles,
and search for them separately. If you find a convincing solution for one
domain, but fail to find a solution for the next domain, you can take advantage
of the knowledge that its orientation is likely to be similar to that of
the first domain. The ROTAte AROUnd
option of the brute rotation search can be used to restrict the search to
orientations within, say, 30 degrees of that of the known domain. Allow
for close approach of the domains by increasing the allowed clashes with
the PACK
keyword by, say, 1 for each domain break that you introduce.
Alternatively, you could try generating a series of models perturbed by
normal modes, with the NMAPdb keyword. One of these
may duplicate the hinge motion and provide a good single model.
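To make concrete what "within, say, 30 degrees" of a known orientation means, the Python sketch below computes the angle of the single rotation that carries one orientation onto another. This is a common geometric measure and is offered only as an illustration; it makes no claim about how Phaser samples orientations internally. All function names are hypothetical, and orientations are represented as 3x3 rotation matrices rather than Euler angles.

```python
import math


def rot_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(deg):
    """Rotation matrix for a rotation of `deg` degrees about the x axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def misorientation_deg(r1, r2):
    """Angle in degrees of the rotation taking orientation r1 to orientation r2."""
    rel = matmul(transpose(r1), r2)                 # relative rotation
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    known = rot_z(10.0)                             # hypothetical placed domain
    for trial in (rot_z(35.0), matmul(rot_z(10.0), rot_x(40.0))):
        angle = misorientation_deg(known, trial)
        print(f"{angle:.1f} degrees, within 30: {angle <= 30.0}")
```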
Poor or incomplete model
Signal-to-noise is reduced by coordinate errors or incompleteness of the
model. Since the rotation search has lower signal to begin with than the
translation search, it is usually more severely affected. For this reason,
it can be very useful to use the subsequent translation search as a way
to choose among many (say 1000) orientations. Try increasing the number
of clustered orientations in an AUTO job using the keyword FINAl,
e.g. FINAl ROT SELEct PERCent 65.
If that fails, try turning off the clustering feature in the save step (SAVE ROT CLUSter OFF),
because the correct orientation may sit on the shoulder of a peak in the
rotation function.
As shown convincingly by Schwarzenbacher et al. (Schwarzenbacher,
Godzik, Grzechnik & Jaroszewski, Acta Cryst. D60, 1229-1236,
2004), judicious editing can make a significant difference in the quality
of a distant model. In a number of tests with their data on models below
30% sequence identity, we have found that Phaser works best with a "mixed
model" (non-identical sidechains longer than Ser replaced by Ser).
In agreement with their results, the best models are generally derived using
more sophisticated alignment protocols, such as their FFAS protocol.
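The "mixed model" rule described above can be sketched at the sequence level as follows. This Python fragment only decides which model residues would be truncated to Ser; it does not edit coordinates, and it is not the procedure used by Schwarzenbacher et al. or by Phaser. The function name is hypothetical, and two choices are assumptions: Gly, Ala and Ser are the residues treated as not longer than Ser, and a model residue aligned to a gap in the target is treated as non-identical.

```python
# One-letter codes assumed to have side chains no longer than Ser (an assumption;
# the text does not say how Cys, for example, should be treated).
NOT_LONGER_THAN_SER = {"G", "A", "S"}


def mixed_model_sequence(model_aln, target_aln):
    """Return the model sequence after applying the mixed-model rule.

    model_aln, target_aln : aligned one-letter sequences of equal length,
                            with '-' marking alignment gaps.
    """
    if len(model_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    edited = []
    for m, t in zip(model_aln, target_aln):
        if m == "-":
            edited.append("-")      # no model residue at this position
        elif m == t or m in NOT_LONGER_THAN_SER:
            edited.append(m)        # identical, or already short: keep as is
        else:
            edited.append("S")      # non-identical and longer than Ser
    return "".join(edited)


if __name__ == "__main__":
    # Hypothetical five-residue alignment, for illustration only.
    print(mixed_model_sequence("MKVLA", "MRVGA"))
```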
High degree of non-crystallographic symmetry
If there are clear peaks in the self-rotation function, you can expect orientations
to be related by this known NCS. Methods to automatically use such information
will be implemented in a future version of Phaser. In the meantime, you can
work out for yourself the orientations that would be consistent with NCS and
use the ROTAte AROUnd
option to sample similar orientations. Alternatively, you may have an oligomeric
model and expect similar NCS in the crystal. First search with the oligomeric
model; if this fails, search with a monomer. If that succeeds, you can again
use the ROTAte AROUnd
option to force a subsequent monomer to adopt an orientation similar to the
one you expect.
Pseudo-translational non-crystallographic symmetry
It is frequently the case that crystallographic and non-crystallographic rotational
symmetry axes are parallel. The combination generates translational NCS, in
which more than one unique copy of the molecule is found in the same orientation
in the crystal. This can be recognized by the presence of large non-origin
peaks in the native Patterson map. If one copy of the search model can be
found, then the translational NCS tells you where to place another copy. Unfortunately,
the presence of translational NCS can make it difficult to solve a structure
using Phaser, because the current likelihood targets do not account for the
statistical effects of NCS. If there is a small difference in the orientation
of the two molecules (which will show up as a reduction in the height of the
non-origin Patterson peak as the resolution is increased), it may help to
use data to higher resolution than the default, because the translational
NCS is partially broken.
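The systematic variation in intensities caused by translational NCS can be seen in an idealized case. If two identical copies sit in exactly the same orientation and are related by a fractional translation t, their combined structure factor is the single-copy structure factor multiplied by (1 + exp(2*pi*i*h.t)), so the intensity is scaled by 2(1 + cos(2*pi*h.t)). The Python sketch below evaluates that factor for the pair alone, ignoring symmetry mates and any small difference in orientation; the function name and the translation vector are hypothetical.

```python
import math


def tncs_intensity_factor(hkl, t):
    """Intensity of two identical, identically oriented copies related by the
    fractional translation t, relative to the intensity of a single copy."""
    phase = 2.0 * math.pi * sum(h * x for h, x in zip(hkl, t))
    return 2.0 * (1.0 + math.cos(phase))


if __name__ == "__main__":
    t = (0.5, 0.0, 0.0)             # hypothetical translation of half a cell along a
    for hkl in [(1, 0, 0), (2, 0, 0), (3, 1, 0), (4, 1, 0)]:
        print(hkl, round(tncs_intensity_factor(hkl, t), 3))
```

With this translation, reflections with odd h are extinguished and those with even h are strengthened. A small difference in orientation weakens this pattern progressively at higher resolution, which is why including higher resolution data can help.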