utils module

utils.atomName2Seq(atName)

Returns the amino-acid sequences of the proteins described by the given atom names.

Parameters:

atName (list) – shape: list of lists. A list of atom identifiers. Each identifier encodes the residue type, residue position, atom type and chain; for example, GLN_39_N_B_0_0 refers to an atom N in a glutamine at position 39 of chain B. The last two zeros are used by the mutation engine and should be ignored. This list can be generated with the madrax.utils.parsePDB function.

Returns:

the sequences of the input PDB structures

Return type:

sequences
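The identifier layout described above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the helper names `parse_atom_id` and `sketch_sequences`, the per-chain dictionary output, and the use of "X" for unknown residues are assumptions made for illustration, and the actual return format of utils.atomName2Seq may differ.

```python
# Standard three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def parse_atom_id(atom_id):
    """Split an identifier such as 'GLN_39_N_B_0_0' into its documented fields.

    The last two fields belong to the mutation engine and are discarded.
    """
    res_name, res_pos, atom_name, chain, *_mutation_fields = atom_id.split("_")
    return res_name, int(res_pos), atom_name, chain


def sketch_sequences(at_names):
    """Hypothetical helper: one {chain: sequence} dict per inner list of atom names."""
    sequences = []
    for protein in at_names:
        residues = {}  # (chain, position) -> one-letter code
        for atom_id in protein:
            res_name, res_pos, _atom_name, chain = parse_atom_id(atom_id)
            residues[(chain, res_pos)] = THREE_TO_ONE.get(res_name, "X")
        chains = {}
        for chain, res_pos in sorted(residues):
            chains.setdefault(chain, []).append(residues[(chain, res_pos)])
        sequences.append({c: "".join(letters) for c, letters in chains.items()})
    return sequences


example = [["GLN_39_N_B_0_0", "GLN_39_CA_B_0_0", "ALA_40_N_B_0_0", "GLY_1_N_A_0_0"]]
print(sketch_sequences(example))
```

Grouping by (chain, position) collapses the many atoms of one residue into a single letter, which is the core idea behind recovering a sequence from atom names.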

utils.parsePDB(PDBFile, keep_only_chains=None, bb_only=False)

Parses PDB files. It can parse a single file or all the PDB files in a folder. If a folder is given, the coordinates are padded to a common number of atoms.

Parameters:
  • PDBFile (str) – path of the PDB file or of the folder containing multiple PDB files

  • bb_only (bool) – if True, ignores all atoms except the backbone N, C and CA

  • keep_only_chains (str or None) – ignores all chains except the one given. If None, all chains are kept

  • keep_hetatm (bool) – if False, heteroatoms are ignored

Returns:

  • coords (torch.Tensor) – coordinates of the atoms in the PDB file(s). Shape: (batch, numberOfAtoms, 3)

  • atomNames (list) – A list of atom identifiers. Each identifier encodes the residue type, residue position, atom type and chain; for example, GLN_39_N_B_0_0 refers to an atom N in a glutamine at position 39 of chain B. The last two zeros are used by the mutation engine and should be ignored.

  • pdbNames (list) – an ordered list of the structure names
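The padding mentioned for folder input can be pictured with the following sketch. It uses plain Python lists rather than torch tensors to stay dependency-free, and both the helper name `pad_coords` and the `PAD_VALUE` sentinel are hypothetical: the padding value that the library actually uses is not stated here.

```python
PAD_VALUE = float("inf")  # hypothetical sentinel, not the library's documented value


def pad_coords(structures, pad_value=PAD_VALUE):
    """Pad per-structure coordinate lists to a common (batch, numberOfAtoms, 3) layout."""
    max_atoms = max(len(atoms) for atoms in structures)
    padded = []
    for atoms in structures:
        missing = max_atoms - len(atoms)
        # Copy the real coordinates, then append one padding row per missing atom.
        rows = [list(xyz) for xyz in atoms]
        rows += [[pad_value] * 3 for _ in range(missing)]
        padded.append(rows)
    return padded


# Two structures with different atom counts (2 atoms and 1 atom).
structures = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(2.0, 1.0, 0.5)],
]
batch = pad_coords(structures)
print(len(batch), len(batch[0]), len(batch[0][0]))
```

After padding, every structure has the same number of rows, so the batch can be stacked into a single rectangular array.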

utils.writepdb(coords, atnames, pdb_names=None, output_folder='outpdb/')

Writes PDB files from the coordinates and atom names.

Parameters:
  • coords (torch.Tensor) – shape: (Batch, nAtoms, 3). Coordinates of the proteins. They can be generated with the madrax.utils.parsePDB function.

  • atnames (list) – shape: list of lists. A list of atom identifiers. Each identifier encodes the residue type, residue position, atom type and chain; for example, GLN_39_N_B_0_0 refers to an atom N in a glutamine at position 39 of chain B. The last two zeros are used by the mutation engine and should be ignored. This list can be generated with the madrax.utils.parsePDB function.

  • pdb_names (list) – names of the PDBs. You can get them from the output of utils.parsePDB. If None is given, each protein is named with an integer that represents its position in the batch.

  • output_folder (str) – output folder in which PDBs are written
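To show how an atom identifier and a coordinate triple map onto a PDB record, here is a minimal sketch that formats fixed-column ATOM lines. It is not the library's writer: the helper names `format_atom_line` and `sketch_pdb_lines`, the constant occupancy of 1.00 and the B-factor of 0.00 are assumptions for illustration, and the sketch assumes padded coordinate rows sit after the real atoms so that pairing with the atom-name list skips them.

```python
def format_atom_line(serial, atom_id, xyz):
    """Format one ATOM record (66 columns) from an identifier like 'GLN_39_N_B_0_0'."""
    res_name, res_pos, atom_name, chain = atom_id.split("_")[:4]
    # Atom names shorter than four characters conventionally start in column 14.
    name_field = f" {atom_name:<3s}" if len(atom_name) < 4 else atom_name
    x, y, z = xyz
    return (
        f"ATOM  {serial:5d} {name_field} {res_name:>3s} {chain}{int(res_pos):4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
    )


def sketch_pdb_lines(coords, at_names):
    """Hypothetical helper: return the ATOM lines of one structure."""
    # zip stops at the shorter input, so trailing padded rows are ignored.
    return [
        format_atom_line(serial, atom_id, xyz)
        for serial, (atom_id, xyz) in enumerate(zip(at_names, coords), start=1)
    ]


line = format_atom_line(1, "GLN_39_N_B_0_0", (1.0, 2.5, -3.25))
print(line)
```

The fixed column widths matter because most PDB readers slice records by position rather than splitting on whitespace.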