utils module
- utils.atomName2Seq(atName)
Returns the amino-acid sequences of the proteins, given their atom names.
- Parameters:
atName (list) – shape: list of lists. A list of atom identifiers. Each identifier encodes the residue type, residue position, atom type and chain; for example, GLN_39_N_B_0_0 refers to atom N of a glutamine at position 39 of chain B. The last two fields (both 0 here) are used by the mutation engine and should be ignored. This list can be generated with the madrax.utils.parsePDB function.
- Returns:
the sequences of the PDB structures
- Return type:
sequences
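The identifier format described above can be illustrated with a minimal sketch. This is not the madrax implementation: split_atom_name, sequences_from_atom_names and the THREE_TO_ONE table are hypothetical helpers written for illustration. They assume the field order shown in the GLN_39_N_B_0_0 example and operate on the identifiers of a single structure (one inner list of atName).

```python
# Hypothetical helpers, not part of madrax: they only illustrate the
# identifier layout RESNAME_POSITION_ATOM_CHAIN_x_y described above.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def split_atom_name(identifier):
    """Split one identifier into (residue type, position, atom, chain)."""
    # The trailing two fields belong to the mutation engine and are ignored.
    res_name, res_pos, atom, chain, *_mutation_fields = identifier.split("_")
    return res_name, int(res_pos), atom, chain


def sequences_from_atom_names(atom_names):
    """Build one one-letter sequence per chain from a flat list of identifiers."""
    residues = {}  # chain -> {position: residue type}
    for identifier in atom_names:
        res_name, res_pos, _atom, chain = split_atom_name(identifier)
        residues.setdefault(chain, {})[res_pos] = res_name
    return {
        chain: "".join(THREE_TO_ONE.get(by_pos[pos], "X") for pos in sorted(by_pos))
        for chain, by_pos in residues.items()
    }


names = ["GLN_39_N_B_0_0", "GLN_39_CA_B_0_0", "ALA_40_N_B_0_0", "GLY_1_N_A_0_0"]
sequences = sequences_from_atom_names(names)
```

Unknown residue types are mapped to "X" in this sketch; how madrax itself handles them is not stated here.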
- utils.parsePDB(PDBFile, keep_only_chains=None, bb_only=False)
Parses PDB files. It can parse a single file or all the PDB files in a folder. If a folder is given, the coordinates are padded so that all structures share the same number of atoms.
- Parameters:
PDBFile (str) – path of the PDB file or of the folder containing multiple PDB files
bb_only (bool) – if True, ignores all atoms except the backbone N, C and CA
keep_only_chains (str or None) – ignores all chains except the one given. If None, all chains are kept
keep_hetatm (bool) – if False, heteroatoms are ignored
- Returns:
coords (torch.Tensor) – coordinates of the atoms in the PDB file(s). Shape: (batch, numberOfAtoms, 3)
atomNames (list) – A list of atom identifiers. Each identifier encodes the residue type, residue position, atom type and chain; for example, GLN_39_N_B_0_0 refers to atom N of a glutamine at position 39 of chain B. The last two fields (both 0 here) are used by the mutation engine and should be ignored.
pdbNames (list) – an ordered list of the structure names
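The padding mentioned above can be pictured with a small sketch in plain Python. It is an illustration only: pad_batch is a hypothetical helper, and the padding value used by madrax is not stated in this documentation, so the value passed below is a placeholder.

```python
def pad_batch(structures, pad_value):
    """Pad per-structure coordinate lists to a common number of atoms.

    structures: list of structures, each a list of (x, y, z) tuples.
    pad_value:  placeholder fill value; the value madrax uses is an assumption
                left to the caller here.
    """
    max_atoms = max(len(atoms) for atoms in structures)
    padded = []
    for atoms in structures:
        missing = max_atoms - len(atoms)
        rows = [list(xyz) for xyz in atoms]
        rows += [[pad_value] * 3 for _ in range(missing)]
        padded.append(rows)
    return padded  # nested list with shape (batch, max_atoms, 3)


batch = pad_batch(
    [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)], [(2.0, 2.0, 2.0)]],
    pad_value=-1.0,  # placeholder, not the madrax padding value
)
```

After padding, every structure has the same number of rows, which is what allows a single tensor of shape (batch, numberOfAtoms, 3).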
- utils.writepdb(coords, atnames, pdb_names=None, output_folder='outpdb/')
Writes PDB files from the coordinates and atom names.
- Parameters:
coords (torch.Tensor) – shape: (Batch, nAtoms, 3). Coordinates of the proteins. They can be generated with the madrax.utils.parsePDB function
atnames (list) – shape: list of lists. A list of atom identifiers. Each identifier encodes the residue type, residue position, atom type and chain; for example, GLN_39_N_B_0_0 refers to atom N of a glutamine at position 39 of chain B. The last two fields (both 0 here) are used by the mutation engine and should be ignored. This list can be generated with the madrax.utils.parsePDB function
pdb_names (list) – names of the PDBs. You can get them from the output of utils.parsePDB. If None is given, each protein is named with an integer that represents its position in the batch
output_folder (str) – output folder in which PDBs are written
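A typical round trip through the two functions might look like the sketch below. It assumes that parsePDB returns its three values in the order documented above (coords, atomNames, pdbNames); the roundtrip wrapper and the paths in the usage line are hypothetical.

```python
def roundtrip(pdb_path, output_folder="outpdb/"):
    """Parse a PDB file (or a folder of PDB files) and write the structures back out."""
    # Imported lazily so this sketch can be loaded without madrax installed.
    from madrax import utils

    # Assumption: the return order matches the documented order.
    coords, atnames, pdb_names = utils.parsePDB(pdb_path)
    utils.writepdb(coords, atnames, pdb_names=pdb_names, output_folder=output_folder)
    return pdb_names
```

For example, roundtrip("structures/") would parse every PDB file in a hypothetical structures/ folder and write them to outpdb/ under their original names.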