# Details about TSM Free Energy Calculations

## Introduction

For a good overview of free energy simulation methods, the following references are suggested: M. Mezei and D. L. Beveridge, “Free Energy Simulations”, Annals of the New York Academy of Sciences, 482 (1986) 1; T. P. Straatsma, “Free Energy Evaluation by Molecular Dynamics Simulations”, Ph.D. dissertation, University of Groningen, Netherlands (1987); S. H. Fleischman and C. L. Brooks III, “Thermodynamics of Aqueous Solvation: Solution Properties of Alcohols and Alkanes”, J. Chem. Phys., 87 (1987) 3029; D. J. Tobias and C. L. Brooks III, J. Chem. Phys., 89 (1988) 5115-5127; and D. J. Tobias, “The Formation and Stability of Protein Folding Initiation Structures”, Ph.D. dissertation, Carnegie Mellon University (1991).

In the previous nodes we have generally referred to this area of molecular simulation as “perturbation” theory. Actually, none of the techniques used are truly perturbation methods: the relationships used for computing relative free energy differences are all exact in the statistical mechanical sense. The term perturbation in this context arises from the fact that in the pre-number-crunching-supercomputer days, various series expansions were derived from these equations, and those were in fact perturbation theories. The name thermodynamic integration might be used, but common practice has been to apply it to only one particular formulation (and, furthermore, not to put that under the rubric of thermodynamic perturbation). Finally, the name “free energy simulations” is another misnomer, for two reasons: 1) we can calculate the temperature-derivative thermodynamic properties as well (Delta E and Delta S), and 2) the one thing we can’t get is absolute free energies (as van Gunsteren has pointed out, Mother Nature doesn’t integrate over all of phase space either). In fact, we are generally limited to calculating relative changes in free energies, i.e. Delta Delta A’s.

In thermodynamic perturbation theory, a system with the potential energy function U0 is perturbed to one with the potential function U1, and the resulting free energy difference is calculated as

A1 - A0 = -kT ln < exp[ -(1/kT)*(U1 - U0) ] >_0

where k is Boltzmann’s constant, T is temperature (degrees K), and A0 and A1 are the excess Helmholtz free energies of systems 0 and 1, respectively. Two methods of thermodynamic perturbation are implemented in CHARMM:

1. Chemical perturbation, where the perturbation being considered is a change in the system’s potential function parameters and topology, e.g., CH3OH is “mutated” to CH3CH3, and
2. Internal coordinate perturbation, where the perturbation represents a variation in configuration, and the potential function remains the same for the perturbed and unperturbed systems.

Each of these is discussed separately below.
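To make the relation above concrete, here is a minimal sketch (plain Python, not part of CHARMM) of the exponential-average estimator applied to a list of sampled energy differences U1 - U0 collected in the system-0 ensemble:

```python
import math

def fep_delta_a(delta_u_samples, kT):
    """Estimate A1 - A0 = -kT ln < exp[-(U1 - U0)/kT] >_0 from samples
    of the energy difference taken in the unperturbed (system 0) ensemble."""
    n = len(delta_u_samples)
    avg = sum(math.exp(-du / kT) for du in delta_u_samples) / n
    return -kT * math.log(avg)
```

As a sanity check, if every sampled difference equals the same constant c, the estimate is exactly c; in realistic use the samples fluctuate and the convergence of the exponential average becomes the central practical concern.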

## THEORY AND METHODOLOGY

### Chemical Perturbation

If you have read either (Fleischman and Brooks, 1987) or (Straatsma, 1987), or any of the McCammon or Kollman perturbation (oops! that word again) papers, then you have seen the standard spiel on why getting Delta A’s (or Delta G’s) of solvation or drug/enzyme binding, among other processes, is so difficult. If one is satisfied with relative changes in free energies, it is computationally more tractable to “trans-mutate” various parts of a system in a way that is usually physically unreasonable but computationally feasible, and thermodynamically equivalent to the result obtained from the physical process. Read some of the aforementioned references if this doesn’t ring a bell.

So that’s what we are doing - calculating relative changes in free energies (Delta Delta A) for solvation and small molecule/enzyme binding, among other things. In the rest of this node, we will discuss a little bit of the theory (you’re better off reading the papers) and a lot about the actual how-to-do-it in our implementation. Subsequent nodes discuss the actual implementation and some issues to consider when attempting this type of calculation.

### The Hamiltonian

There are three basic techniques for calculating relative changes in free energy and their temperature-derivative properties: 1) the so-called “perturbation” approach, 2) “Thermodynamic Integration” (TI), and 3) the somewhat dubious “slow-growth” technique (which is actually a step-child of the TI method).

In all of the methods we use a hybrid hamiltonian,

    H(lambda) = H_o + (1 - lambda)^N * H_R + lambda^N * H_P

where:

    H_o    = "Environment" part of the Hamiltonian
    H_R    = "Reactant" part of the Hamiltonian
    H_P    = "Product" part of the Hamiltonian
    lambda = coupling parameter (extent of transformation)
    N      = integer exponent

The various terms will be explained shortly. First, a bit about our Weltanschauung viz. free energy simulations. The system is divided into four sets of atoms: 1) the reactant atoms, 2) the product atoms, 3) the colo atoms, and 4) the environment atoms. The reactant and product atoms are those that are actually being changed. The colo atoms (short for co-located charge) are those in which only the charge changes in going from reactant to product. The environment atoms are the rest of the system (e.g. solvent; parts of a molecule common to both reactant and product). The reactant and product designations are arbitrary and are used as a convention to denote the direction in which we are mutating (i.e. start with reactant, end up with product).
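The hybrid Hamiltonian defined above is simple enough to state in a few lines of code. This sketch (plain Python, illustrative only) just evaluates H(lambda) from precomputed environment, reactant, and product terms:

```python
def hybrid_hamiltonian(h_env, h_react, h_prod, lam, n=1):
    """H(lambda) = H_o + (1 - lambda)^N * H_R + lambda^N * H_P."""
    return h_env + (1.0 - lam) ** n * h_react + lam ** n * h_prod

# At lambda = 0 only the reactant term survives (plus the environment term);
# at lambda = 1 only the product term does.
```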

A simple example, taken from (Fleischman and Brooks, 1987) is the calculation of the relative change in the solvation free energies of methanol and ethane. This one has been done by virtually everybody that has written a free energy simulation code. The system is represented by the water molecules (we used a box of 125 in our study) and the hybrid methanol/ethane system, with aliphatic methyl groups represented as extended atoms.

              O1--H1
             /
        C1--C2

(hybrid methanol/ethane molecule; in the original figure it is shown
surrounded by water molecules)

Using the depiction above, for the transformation methanol -> ethane the reactant atoms are the hybrid’s O1 and H1; the product atom is the hybrid’s C2 methyl group. The hybrid molecule’s C1 methyl group changes charge as one goes from reactant to product. It is a colo atom: in going from methanol -> ethane it starts with the methanol methyl group charge and ends up (at lambda = 1.) with the ethane methyl group charge. Otherwise, C1 is considered an environment atom. The atoms of the water molecules constitute the actual environment atoms in this system. If the hybrid molecule were larger it too could contain environment atoms.

All potential energy terms involving the reactant atoms, as well as the electrostatic interactions involving colo atoms with their reactant charges, go into H_R. The kinetic energies of the reactant atoms are also included in this term. Similarly, the potential energy terms involving product atoms and the colo product-charge electrostatic interactions, along with the kinetic energies of the product atoms, go into H_P. The rest of the energy terms are incorporated into H_o. Note that for a potential energy term to be included in, say, H_R, only one atom in the given interaction has to be a reactant atom (or, in the case of an electrostatic interaction, a colo atom). Similarly for product terms.

For electrostatic terms involving colo atoms, effectively what is done is that those terms are calculated twice, once with the reactant charges and then again with the product charges. Terms between colo reactant charges and reactant atoms are avoided, and similarly for product atoms. Actually, the programming details are a bit more complicated than that; if interested, see implementation. The outcome is the same as just described.

It is assumed that when the hybrid molecule is constructed in the residue topology file, there are no internal coordinate energy terms involving reactant and product atoms. As yet no checking is done in the program. Similarly, it is assumed that non-bonded exclusions have been specified between reactant and product atoms.

In our implementation, the Hamiltonian is constructed exactly as specified in the equation above. In many papers, that particular form of the Hamiltonian is given in the theoretical section (or more likely, the form with N=1, i.e. linear) and in the actual implementation the lambda dependence of the Hamiltonian is quite a bit more complex. This is done in those implementations where the force constants and other parameters in the energy terms are factored by lambda rather than calculating various energy terms and factoring them.

In a statistical mechanical sense there is no particular reason that forces one to factor the Hamiltonian consistently like we do. The thermodynamics holds regardless of path, and the equipartition theorem for obtaining the kinetic energy works just as well (though in other implementations it appears that the factoring of the kinetic energy is ignored anyway). However, we feel that there are certain advantages to doing it this way. First, there is a certain conceptual simplicity in factoring the Hamiltonian consistently for reactant and product terms en masse. Second, it makes obtaining the derivatives of the Hamiltonian with respect to lambda, d H(lambda)/d lambda, programmatically simple. These derivatives are needed in the TI and the related slow-growth methods. Actually, the current algorithm for the slow-growth method in our implementation uses finite differences for the derivative, as do Kollman and van Gunsteren; this could easily be changed. Factoring the energy terms rather than the functional parameters permits a more modular design and makes it easier to incorporate changes by others to energy function terms.
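For the hybrid Hamiltonian given earlier, the analytic lambda derivative needed by TI is equally simple. A sketch (illustrative only; recall that the implementation described here actually uses finite differences for slow growth):

```python
def dhybrid_dlambda(h_react, h_prod, lam, n=1):
    """d H(lambda)/d lambda = -N (1 - lambda)^(N-1) H_R + N lambda^(N-1) H_P.
    The environment term H_o does not depend on lambda, so it drops out."""
    return -n * (1.0 - lam) ** (n - 1) * h_react + n * lam ** (n - 1) * h_prod
```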

### The Free Energy Equations

As we said there are two (maybe three, depending how you count it) different ways that we obtain the free energy changes. For the thermodynamic integration method (TI) the following expression is used:

    Delta A = Integral[0 -> 1] < d H(lambda)/d lambda >_lambda  d lambda

Expressions for the energy and entropy changes can be derived from this equation (Mezei and Beveridge, 1986) and have been incorporated into our program. They suffer from very high uncertainties due to the presence of ensemble averages over the total energy, which are then multiplied by ensemble averages over d H/d lambda. One is apparently better off getting average energies at the endpoints and subtracting.

The method we have used the most is the thermodynamic perturbation technique. For this, the free energy change is given as follows:

    Delta A (lambda -> lambda') = -kT ln < exp[ -(1/kT)*(V(lambda') - V(lambda)) ] >_lambda

Notice that all of the averages are at lambda. To get the total delta A

    Delta A = Sum_i  Delta A (lambda_i -> lambda_i')

the pieces are added up. The user must ensure that the whole lambda range is covered. For example, in the methanol -> ethane calculation, we ran dynamics at 3 points: lambda = .125, .5 and .875. To cover the range we calculated delta A’s as follows:

    lambda'    lambda    lambda'
    0.000      0.125     0.250
    0.250      0.500     0.750
    0.750      0.875     1.000

I.e., for each lambda at which dynamics were run, two delta A’s were calculated, one lower and one higher than the corresponding lambda. This has been termed “double-wide” sampling. Note that the pieces all join up.
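Summing the window pieces and checking that they join up can be sketched as follows (plain Python; the (lambda_low, lambda_high, delta A) tuples are a hypothetical bookkeeping format, not a CHARMM data structure):

```python
def stitch_windows(window_results):
    """Sum per-window delta A values after verifying that the lambda
    ranges cover the interval contiguously (no gaps, no overlaps)."""
    total = 0.0
    prev_hi = None
    for lo, hi, da in sorted(window_results):
        if prev_hi is not None and abs(lo - prev_hi) > 1e-12:
            raise ValueError("lambda ranges do not join up")
        total += da
        prev_hi = hi
    return total

# e.g. the three double-wide windows from the table above:
# stitch_windows([(0.000, 0.250, dA1), (0.250, 0.750, dA2), (0.750, 1.000, dA3)])
```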

In our implementation we have the capability of calculating the temperature derivative related thermodynamic properties, delta E and delta S. This is effected by the use of equations derived by Brooks.

See Fleischman and Brooks, 1987 for the corrected set of equations. They use a finite difference approximation to the derivatives that avoids the necessity of taking differences of large averages that would result from using the explicit temperature derivatives.

With both of the aforementioned methods the technique for accomplishing the simulations is called the “window” procedure. In these methods simulations are run at a discrete number of lambda points (we generally use 3 - 6 and long trajectories; other workers use up to 100 lambda points and very short trajectories). In the case of thermodynamic perturbation the total free energy change is pieced together from perturbations done with each “window”. In the case of thermodynamic integration the integration is done by a quadrature method. In our implementation, we fit the ensemble average as a function of lambda to a cubic spline polynomial and then integrate the polynomial analytically. No extrapolation to endpoints is done. So if you start at lambda = .125 and end at lambda = .875 (like we do) you can use thermodynamic perturbation to get the end points (.125 -> 0 and .875 -> 1.) and TI for the middle.
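The TI quadrature step can be sketched as follows. For simplicity this uses a single cubic polynomial fit (via numpy) over the sampled lambda points and integrates the fit analytically; it stands in for the cubic-spline fit described above and, like it, does no extrapolation safeguarding:

```python
import numpy as np

def ti_integrate(lambdas, dhdl_means, lo=0.0, hi=1.0):
    """Fit <dH/dlambda> vs lambda with a cubic polynomial and integrate
    the fit analytically over [lo, hi]."""
    coeffs = np.polyfit(lambdas, dhdl_means, 3)   # needs >= 4 lambda points
    antideriv = np.polyint(coeffs)
    return np.polyval(antideriv, hi) - np.polyval(antideriv, lo)
```

For a linear integrand such as dH/dlambda = 2*lambda, the cubic fit reproduces the line and the integral over [0, 1] comes out as 1, which makes a convenient self-test.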

An alternative sampling method is termed “slow growth”. It is more or less an approximation to the thermodynamic integration method. In this case, instead of lambda being a constant for a given trajectory (as in the window method), the parameter varies monotonically with each time step.

    Delta A = Sum        [ H(lambda_i + delta lambda) - H(lambda_i) ]
              i=1->nstep

and

    lambda_i = lambda_(i-1) + delta lambda

where H is the Hamiltonian and delta lambda = 1/nstep.
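The slow-growth accumulation can be sketched as (plain Python; `h_func` is a hypothetical stand-in for the Hamiltonian evaluated along the trajectory):

```python
def slow_growth(h_func, nstep):
    """Accumulate Delta A ~= Sum_i [ H(lambda_i + dlam) - H(lambda_i) ]
    with lambda increasing by dlam = 1/nstep at each step."""
    dlam = 1.0 / nstep
    total, lam = 0.0, 0.0
    for _ in range(nstep):
        total += h_func(lam + dlam) - h_func(lam)
        lam += dlam
    return total
```

For a Hamiltonian that depends linearly on lambda the sum telescopes to H(1) - H(0); in a real simulation each H is evaluated on the (changing) instantaneous configuration, which is exactly why the method is only an approximation to TI.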

### Internal Coordinate Perturbation

According to the thermodynamic perturbation (TP) theory, the Helmholtz free energy difference, A1 - A0, between system 0, in which the conformational coordinate of interest (e.g. an internal coordinate) is equal to x, and another system, in which the coordinate has been “perturbed” by the amount dx, is given by the equation

A1 - A0 = A(x + dx) - A(x)

        = -kT ln < exp [ -(1/kT)*(U(x + dx) - U(x)) ] >_x           (1)

where U is the potential energy, k is Boltzmann’s constant, and T is temperature (degrees K). The <...>x notation denotes a canonical ensemble average over the “reference” ensemble in which the coordinate is equal to x. Although the potential energy may depend on many degrees of freedom, for the sake of simplicity we have only explicitly indicated its dependence on x. If we assume that the ergodic hypothesis holds, we can equate the ensemble average appearing in equation (1) to the time average computed from an MD simulation, e.g.

< exp [ -(1/kT)*(U(x + dx) - U(x)) ] >_x =

    (1/N) * Sum    { exp [ -(1/kT)*(Ui(x+dx) - Ui(x)) ] }   (2)
            i=1->N

where Ui is the value of the potential energy at the ith timestep and N is the number of timesteps in the simulation. Since the average is over the reference ensemble, we must constrain the system so the value of the coordinate of interest is x at each step of the simulation. In other words, we must impose the holonomic constraint

sigma = x(t) - x(0) = 0                       (3)


during the integration of the equations of motion. The conformational coordinate may correspond to a set of internal coordinates. In that case, equation (3) implies a set of holonomic constraints. In addition to enforcing the conformational constraint, we need to carry out the perturbation (x -> x + dx), calculate the potential energy difference, U(x + dx) - U(x), and restore the constraint at each step of the simulation.

With the above considerations, the following pseudo-computer code illustrates schematically the implementation of the TP method into an MD simulation:

set up dynamics; specify constraint and perturbation
do i = 1,N
   compute potential energy and forces
   take unconstrained dynamics step
   satisfy constraints
   perform perturbation
   compute potential energy
   restore constraints
end do
compute averages and thermodynamics

A detailed description of an algorithm for satisfying internal coordinate constraints is given in (Tobias, 1991). We concentrate here on the tasks of specifying and performing the perturbation, and computing the difference in the potential energy of the perturbed and reference systems.

The specification of the perturbation consists of identifying the degree(s) of freedom to be perturbed and the atoms whose positions change as a result of the perturbation. Our implementation allows for perturbations of distances, angles, and torsions between groups of atoms. For example, we may use a distance perturbation to study the breakup of a salt-bridge (ion pair) formed by the sidechains of lysine and glutamic acid, where x might be the distance between the N atom in the lysine sidechain and the carboxyl C atom in the glutamic acid sidechain, and the perturbation would consist of moving the entire glutamic acid residue. Alternatively, we could use an angle perturbation to study the angular dependence of the strength of a hydrogen bond between two amides, where x is the O...H-N angle, and the perturbation moves the entire hydrogen bond donor molecule. Or, we could use a torsional perturbation to study the trans-gauche isomerization in butane, where x is the dihedral angle for methyl group rotation about the central C-C bond, and the perturbation moves a terminal methyl group and the hydrogen atoms on the adjacent methylene group. In addition to simple perturbations of a single internal coordinate, we can define more complicated perturbations involving more than one internal coordinate in order to study correlated conformational transitions. For example, we could combine perturbations of the phi and psi dihedral angles to study backbone conformational equilibria in peptides (see Implementation).

The procedure for carrying out internal coordinate perturbations during molecular dynamics simulations may be summarized as follows: after choosing an internal coordinate to perturb, and deciding which atoms will be moved by the perturbation, we compute a Cartesian displacement vector which changes the internal coordinate by a specified amount, and add the displacement vector to the positions of the atoms to be moved. Thus, in our implementation, the perturbation can be described as a rigid body movement of the perturbed atoms relative to the unperturbed atoms.

Once we have moved all of the atoms involved in the perturbation, we need to compute the potential energy difference, delta U = U1 - U0 = U(x + dx) - U(x). To do this, we could compute U(x), carry out the perturbation and compute U(x + dx), and simply take the difference. However, this direct route is computationally inefficient, because interaction energies between atoms which are not moved by the perturbation are unnecessarily recomputed. To minimize the computational effort required to compute delta U, we only consider the interactions which change as a result of the perturbation. For this purpose, we partition the system into two parts: the atoms which are moved by the perturbation (denoted by “s” for “solute”), and those which are not (denoted by “b” for “bath”). With this partitioning, we can write the potential energy as a sum of three contributions:

U(x) = Uss(x) + Usb(x) + Ubb(x),                                (4)


where Uss, Usb, and Ubb are the solute-solute, solute-bath, and bath-bath interaction energies, respectively. Clearly, Ubb(x + dx) - Ubb(x) = 0 since the positions of the bath atoms are not changed by the perturbation. Thus,

U1 - U0 = Uss(x + dx) - Uss(x) + Usb(x+dx) - Usb(x).             (5)

Since, in typical applications, the number of solute atoms is much smaller than the number of bath atoms, equation (5) represents a large reduction in computational effort over the direct route. Before we proceed, we point out that when we need U(x + dx) in addition to delta U (e.g. for computation of conformational entropies using finite difference temperature derivatives of the TP free energy; see Description), we can use the following expression:

U(x + dx) = Uss(x + dx) + Usb(x + dx) + Ubb(x)

          = Uss(x + dx) + Usb(x + dx) + U(x) - Uss(x) - Usb(x),   (6)

since Ubb(x + dx) = Ubb(x). We assume that U(x) is computed when the forces required for the propagation of the dynamics are computed. Thus, we still only need to compute the changes in the solute-solute and solute-bath interaction energies which result from the perturbation.
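The bookkeeping of equations (4)-(6) can be checked with a toy pairwise model (plain Python; the pair potential and coordinates here are made up purely for illustration):

```python
import itertools

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pair_energy(r2):
    return 1.0 / r2          # toy pair potential of the squared distance

def total_energy(coords):
    return sum(pair_energy(dist2(coords[i], coords[j]))
               for i, j in itertools.combinations(range(len(coords)), 2))

def delta_u_partitioned(coords0, coords1, solute):
    """U1 - U0 from only those pairs involving a perturbed (solute) atom;
    bath-bath pairs cancel exactly, as in equation (5)."""
    s = set(solute)
    du = 0.0
    for i, j in itertools.combinations(range(len(coords0)), 2):
        if i in s or j in s:
            du += (pair_energy(dist2(coords1[i], coords1[j]))
                   - pair_energy(dist2(coords0[i], coords0[j])))
    return du
```

Whenever only solute atoms move, the partitioned result agrees exactly with the full difference total_energy(coords1) - total_energy(coords0), while touching far fewer pairs.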

In general, when we have a choice, we partition the system so that the solute consists of the smallest possible number of atoms. There are two good reasons for this. First, smaller solute partitions require less effort to compute the interaction energies. Second, with smaller solute partitions, there is less of a chance that the solute atoms will “run into” bath atoms as a result of the perturbation. When solute and bath atoms run into one another, there is a large, positive van der Waals contribution to deltaU. This is undesirable because large delta U values lead to poorer convergence of the average in equation (2). The partitioning of the system is especially important when the perturbations are carried out in “crowded” environments, such as in solution or in the interior of a protein.

In some cases it is useful to divide the solute partition into two sections, and accomplish the desired perturbation by moving each section by half the perturbation. For example, to perturb the dihedral angle in butane by dx, we could include both methyl carbons and all of the hydrogens in the solute partition, with the C1 methyl group and C2 methylene hydrogens in one section, and the C4 methyl group and C3 hydrogens in the other section, and move each section by dx/2. This “double move” strategy is useful when the perturbation is carried out on a small molecule in a crowded environment where the movement of n atoms by dx/2 is more favorable than the movement of approximately n/2 atoms by dx. The option to perform perturbations in this fashion is available in our implementation.

In principle, we could get the free energy difference between any two conformations, x0 and x1, in a single simulation using the TP theory expression:

delta A = A1 - A0 = A(x1) - A(x0)

        = - kT ln < exp [ -(1/kT)*(U(x1) - U(x0)) ] >_x0   .     (7)

However, in practice, for typical simulation lengths, the average in equation (7) exhibits acceptable convergence only when delta A <= 2kT (Beveridge & DiCapua, 1989). Thus, if the free energy difference between the conformations x0 and x1, or the free energy barrier separating them, is more than about 2kT, then a single simulation is not sufficient to determine accurately the free energy difference. This problem is circumvented by breaking up the range of the coordinate, x1 - x0, into n segments or “windows”, y(i),

dy = (x1 - x0)/(n + 1);  y(i) = x0 + (i - 1) dy; i = 1,...,n,          (8)

and running a series of n simulations where the free energy differences,

delta A(i) = A(y(i+1)) - A(y(i))

           = -kT ln < exp [ -(1/kT)*(U(y(i+1)) - U(y(i))) ] >_y(i)       (9)

are computed. Then the free energy difference between x1 and x0 is obtained by summing the results from the n windows, e.g.

delta A = Integral[x0 -> x1] (p(delta A(y))/py) dy

        = Sum      delta A(i),                                     (10)
          i=1->n

where p(z) denotes the partial derivative of z. Aside from yielding more accurate free energy differences, the window method is attractive because it allows us to map out the free energy surface as a function of the conformational coordinate.

By far the most time consuming task in a molecular dynamics simulation is the evaluation of the forces necessary to propagate the equations of motion. The additional work required for computing the interaction energies needed for the TP free energy differences is relatively small. Thus, it is advantageous to get more than one free energy difference from a single simulation. This is the motivation for using the so-called “double-wide” sampling method (Beveridge & DiCapua, 1989), where the free energy differences A(y + dy) - A(y) and A(y - dy) - A(y) are obtained in one simulation. Furthermore, we can divide dy into m subintervals, dy(m) = dy/m, and compute 2m free energy differences,

delta A(i,k)(+/-) = A(y(i) +/- k*dy(m)) - A(y(i))

                  = -kT ln < exp[ -(1/kT)*(U(y(i) +/- k*dy(m)) - U(y(i))) ] >_y(i)

                    k = 1,...,m;        (11)

over the range y(i) - dy <= y <= y(i) + dy from a single simulation with x = y(i). Then we sum the free energy differences from the various subintervals (in analogy with equation (10)) to get a free energy surface for each window. This “double-wide, multiple-point” window method allows a higher resolution mapping of the free energy surface with little additional computational effort.

Let us now comment on how dy is chosen. As we have already said, dy should be chosen so that the free energy change in a given window is not more than a couple of kT. In addition, the shape of the free energy surface in a given window can be used to determine a good choice for dy. A reasonable choice for a given system can be made by considering results from short simulations with a modest dy and several subintervals at a couple of values of y in the range of interest. In general, for perturbations in crowded environments (e.g. in solution or the interior of a protein), excessively large values of dy always result in positive free energy differences. This is because the perturbation results in repulsive van der Waals interactions of the atoms in the solute partition with those in the bath partition. The value of dy where the free energy difference begins to sharply increase can then be regarded as the upper bound on acceptable dy values. Of course, it is possible that the underlying free energy surface really does rise sharply beyond the second subinterval in both directions. That is why we suggest running another test at a different value of y. In addition to running short test calculations, it is also useful to consult previous work to get a preliminary estimate for an acceptable size of a perturbation in a similar system (for several examples, see (Tobias, 1991)).

In our implementation, the information needed to calculate conformational thermodynamics (free energies, internal energies, entropies, average interaction energies), and their associated statistical uncertainties, is written to a datafile during a simulation. The data file is subsequently “post-processed” to yield the quantities of interest. The alternative approach is to calculate the average properties of interest as the simulation progresses, and simply write out the final results at the end of the simulation. The latter approach has the advantage that large, cumbersome data files do not need to be saved on a mass-storage device (e.g. disk or tape). However, we prefer the post-processing approach because of the flexibility it gives us in the analysis of the data. For example, we can: examine the time evolution, and hence the “convergence”, of the average properties; carry out the averaging on an arbitrary amount of the data; compare various protocols for computing the statistical uncertainties or finite-difference temperature derivatives, etc.

We use the method of block averages (a.k.a. batch averages) to compute the average properties and their uncertainties (Wood, 1968). In this method, the total number of samples, N, is divided into m “batches” of n samples (mn = N), and the average of the property of interest, <O>i, is computed for each batch i:

<O>i = (1/n) Sum      O(k,i),                                  (12)
k=1->n

where O(k,i) is the kth observation of O in the ith batch. The average of the N samples, <O>, is simply the average of the batch averages:

<O> = (1/m) Sum      <O>i ;                                     (13)
            i=1->m

and the “uncertainty”, std<O>, is estimated from the standard deviation in the batch averages:

std<O> = ( Sum    [ (<O>i - <O>)**2 / m(m - 1) ] )**(1/2)           (14)
           i=1->m

We use equation (14) to compute the uncertainty in the average of the exponential in equation (1). Then we obtain the uncertainty in the free energy (and other thermodynamic functions) by error propagation (Young, 1962), e.g.

std(delta A)**2 = (p(delta A)/p(z))**2 * (std(z))**2

                = (kT*std(z)/z)**2                               (15)

where z is the average of the exponential in equation (1).

In order for the uncertainty given by equation (14) to be a good estimate of the “true” uncertainty (e.g. in a large number of random samples), the block size must be chosen so that the block averages are uncorrelated (randomly distributed), and the number of blocks is not too small for the evaluation of a meaningful standard deviation. The block size n is typically chosen arbitrarily and possible correlations in the data are ignored. More refined uncertainties can be obtained by considering the actual correlation of the data determined explicitly from the autocorrelation function (Straatsma, et al., 1986). However, we presently have no facility for carrying out the correlation function analysis.
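The batch-average machinery of equations (12)-(14) can be sketched as (plain Python, illustrative only; the caller is responsible for choosing a block size that decorrelates the batches, as discussed above):

```python
import math

def block_average(samples, m):
    """Mean and uncertainty of `samples` from m equal batches:
    <O> per eq (13), std<O> per eq (14). Assumes len(samples) = m * n."""
    n = len(samples) // m
    batch_means = [sum(samples[i * n:(i + 1) * n]) / n for i in range(m)]
    mean = sum(batch_means) / m
    var = sum((b - mean) ** 2 for b in batch_means) / (m * (m - 1))
    return mean, math.sqrt(var)
```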