Optimizing a protein
In this tutorial we will explain how to perform a structure optimization using the autograd function of pytorch in combination with MadraX.
Let’s start again with all the imports. This time we are also gonna import the method to run protein optimization out of the box:
[1]:
from madrax.ForceField import ForceField # the main MadraX module
from madrax import utils,dataStructures # the MadraX utility module
import time,os,urllib
from madrax.mutate.StructureOptimizer import optimize # this is where the magic of this tutorial come from
and then let’s download an example structure to optimize. Since optimization is computationally intensive we are gonna use a single structure, but everything works with as many proteins you want
[2]:
if not os.path.exists("exampleStructures"):
os.mkdir('exampleStructures')
urllib.request.urlretrieve('http://files.rcsb.org/download/5BMZ.pdb', 'exampleStructures/5bmz.pdb')
[2]:
('exampleStructures/5bmz.pdb', <http.client.HTTPMessage at 0x7f4504157700>)
as in the quickstart tutorial, we need to parse and precalculate the information tensors and generate the MadraX main object (see quickstart for a step to step guide)
[3]:
device = "cpu"
coords, atnames, pdbNames = utils.parsePDB("exampleStructures/") # get coordinates and atom names
info_tensors = dataStructures.create_info_tensors(atnames,device=device)
forceField_Obj = ForceField(device=device)
Let’s check the initial energy of the protein conformation just running MadraX without optimization
[4]:
initial_energy = forceField_Obj(coords.to(device), info_tensors)
print(initial_energy.sum())
tensor(-198.3297, grad_fn=<SumBackward0>)
And now the funny part. We use the optimize function to run a deltaG minimization.
You just need to set the learning rate of the optimization, that defines how much you move your atom at every step and the number of iteractions (epochs).
CAREFUL: if you use a learning rate too low you will have basically no movement, if it is too high you will have no convergence
let’s also put verbose to True because we love printed text
[5]:
energy, optimizedCoord = optimize(forceField_Obj, coords.to(device), info_tensors, epochs=5, verbose=True, learning_rate=0.001, backbone_rotation=False)
optimizing epoch: 0 loss: -198.3297 time 6.8825
optimizing epoch: 1 loss: -211.2425 time 6.8174
optimizing epoch: 2 loss: -224.4531 time 6.8256
optimizing epoch: 3 loss: -235.3043 time 6.8958
optimizing epoch: 4 loss: -243.8654 time 6.3523
And it is done!
You can also change the device to “cuda” and check the difference in speed. I get a 3.5x speed boost on GPU (~2s per epoch).
We have the energy of the protein and the coordinates of the relaxted structure. Let’s see what we got:
[6]:
print(energy.shape,energy.sum())
torch.Size([2, 4, 151, 1, 11]) tensor(-243.8654, grad_fn=<SumBackward0>)
You can see that the optimizer decreases the energy of the complex as expected:
[7]:
print("initial energy:",float(initial_energy.sum().cpu().data), "Optimized Energy", float(energy.sum().cpu().data) )
initial energy: -198.32965087890625 Optimized Energy -243.8654022216797
Now you can use the writePDB function to write down your new structure and analyze it with a structure viewer
[8]:
utils.writepdb(optimizedCoord.cpu().data, atnames,output_folder="example_pdb_output")