Implementing mutations in a protein
In this tutorial we will explain how to perform a structure optimization using the autograd function of pytorch in combination with MadraX.
Let’s start again with all the imports. This time we are also gonna import the method to run protein optimization out of the box:
[1]:
import warnings
warnings.filterwarnings('ignore')
from madrax.ForceField import ForceField # the main MadraX module
from madrax import utils,dataStructures # the MadraX utility module
import time,os,urllib
from madrax.mutate.StructureOptimizer import optimize
from madrax.mutate.mutatingEngine import mutate
and then let’s download an example structure to optimize. Since optimization is computationally intensive we are gonna use a single structure, but everything works with as many proteins you want
[2]:
if not os.path.exists("exampleStructures"):
os.mkdir('exampleStructures')
urllib.request.urlretrieve('http://files.rcsb.org/download/5BMZ.pdb', 'exampleStructures/5bmz.pdb')
urllib.request.urlretrieve('http://files.rcsb.org/download/1bni.pdb', 'exampleStructures/1bni.pdb')
[2]:
('exampleStructures/1bni.pdb', <http.client.HTTPMessage at 0x7f2c50c28c40>)
as in the quickstart tutorial, we need to parse and precalculate the information tensors and generate the MadraX main object (see quickstart for a step to step guide)
[3]:
device = "cpu"
coords, atnames, pdbNames = utils.parsePDB("exampleStructures/") # get coordinates and atom names
info_tensors = dataStructures.create_info_tensors(atnames,device=device)
forceField_Obj = ForceField(device=device)
First of all lets take a look at the proteins we have. t
[4]:
print(pdbNames)
['1bni', '5bmz']
Let’s now define a couple of mutations we want to implement in the structures we downloaded
[5]:
mutationList = [
[["10_A_GLY","10_B_GLY"],["54_A_ASN"]],
[["12_A_ALA","77_B_ARG"]]
]
this means we will generate 3 mutants: the first protein with a GLY in position 10 of chain A and a GLY in position 10 of chain B the first protein with a ASN in position 54 of chain A the second protein with a ALA in position 8 of chain A and a ARG in position 77 of chain B
Residue number and chain is the same of the one of the PDB.
Now we can implement the mutations and get new coordinates and atom names
[6]:
newCoords,newatnames = mutate(coords,atnames,mutationList)
lets recalculate the info_tensors with the mutant atom names
[7]:
newInfo_tensors = dataStructures.create_info_tensors(newatnames,device=device)
and now we can calculate the energy with MadraX as usual
[8]:
energy = forceField_Obj(newCoords.to(device), newInfo_tensors)
print(energy.shape)
torch.Size([2, 4, 151, 3, 11])
you can see now that the dimension 3 of the output of madrax now is 3. Position 0 is always gonna be the wild type, while the others are gonna be the mutants.
energy[0:,:,1] represents the energy of the first mutant of the first protein (“10_A_GLY”,”10_B_GLY”) energy[0:,:,2] represents the energy of the second mutant of the first protein (“54_A_ASN”) energy[1:,:,1] represents the energy of the first mutant of the second protein (“12_A_ALA”,”77_B_ARG”)
energy[0:,:,0] represents the energy of first protein the wlid type energy[1:,:,0] represents the energy of second protein the wlid type
and we can use relax the structure as described in the “optimize a protein” tutorial
[9]:
energy, optimizedCoord = optimize(forceField_Obj, newCoords.to(device), newInfo_tensors, epochs=5, verbose=True, learning_rate=0.001, backbone_rotation=False)
optimizing epoch: 0 loss: -1514.8442 time 4.2348
optimizing epoch: 1 loss: -1591.6768 time 4.463
optimizing epoch: 2 loss: -1662.3555 time 4.4108
optimizing epoch: 3 loss: -1678.0632 time 4.5837
optimizing epoch: 4 loss: -1726.4849 time 4.6056
We can also convert the mutant’s coordinates to PDBs using the utils.writePDB function
[10]:
utils.writepdb(optimizedCoord.cpu().data, newatnames,output_folder="example_pdb_output/", pdb_names=pdbNames)
and all mutants will be written into pdb files