swamp.utils.swamplibrary module

class SwampLibrary(workdir, logger=None)

Bases: object

Class implementing methods to create a SWAMP fragment library, together with data structures to manage the data of interest in the library.

Parameters:
  • workdir (str) – the working directory for this instance. Only used if a library will be created.
  • logger (SwampLogger) – logging instance
Variables:
  • rmsd_matrix (pandas.DataFrame) – square dataframe with the rmsd distance across fragments in the library
  • qscore_matrix (pandas.DataFrame) – square dataframe with the similarity across fragments in the library
  • nalign_matrix (pandas.DataFrame) – square dataframe with the no. of aligned residues between fragments in the library
  • pdb_library (str) – location of the directory with the pdb files contained in the SWAMP library
  • pdbtm_svn (str) – location of the pdbtm svn repository
  • outdir (str) – an output directory for any operation of the SwampLibrary instance
Example:
>>> from swamp.utils.swamplibrary import SwampLibrary
>>> my_library = SwampLibrary('<workdir>')
>>> pdb_code_list = my_library.parse_nr_listfile("/path/to/nr_list")
>>> my_library.pdbtm_svn = "/path/to/pdbtm_svn"
>>> my_library.pdb_library = "/path/to/pdb_library"
>>> my_library.make_library(outdir="/path/to/outdir", pdb_codes=pdb_code_list)
>>> my_library.all_vs_all_gesamt(outdir="/path/to/outdir", inputdir="/path/to/library", nthreads=1)
>>> my_library.create_distance_mtx(gesamt_dir="/path/to/gesamt_dir")
all_vs_all_gesamt(inputdir, outdir, nthreads=1)

For each member of the library, obtain the distance to all the others. This step is required to obtain the distance matrices: qscore_matrix, rmsd_matrix and nalign_matrix

Parameters:
  • inputdir (str) – the input directory with the pdb files created by make_library()
  • outdir (str) – the output directory where the .hit files will be created
  • nthreads (int) – number of threads to be used in the gesamt archive scan (default 1)
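
As a rough illustration of what "all vs all" means here, the sketch below lists, for each fragment, the other fragments it would be compared against. The fragment names are hypothetical and no gesamt call is made; this only shows the shape of the search, not how SwampLibrary runs it.

```python
# Hypothetical fragment names; the real library uses pdb files in inputdir.
fragments = ["frag_a", "frag_b", "frag_c"]

# For each query fragment, collect every other fragment as a target.
comparisons = {
    query: [target for target in fragments if target != query]
    for query in fragments
}

for query, targets in comparisons.items():
    print(query, "->", targets)
```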
create_distance_mtx(gesamt_dir)

Create the square distance matrices for the library: qscore_matrix, rmsd_matrix and nalign_matrix

Requires the all_vs_all_gesamt() results. The distance matrices contain the optimal structural alignment between every pair of fragments present in the library.

Parameters:gesamt_dir (str) – directory containing the .hit files produced by the all vs all gesamt search
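
A minimal sketch of how pairwise scores can be arranged into a square, symmetric pandas.DataFrame. The records, fragment names and the diagonal value of 1.0 are assumptions made for illustration; the .hit file format is not parsed here and this is not the method's actual implementation.

```python
import pandas as pd

# Hypothetical pairwise results: (query, hit, qscore).
records = [
    ("frag_a", "frag_b", 0.8),
    ("frag_a", "frag_c", 0.3),
    ("frag_b", "frag_c", 0.5),
]

# Collect every fragment name that appears in the results.
labels = sorted({name for rec in records for name in rec[:2]})

# Start from a square frame; self-similarity is assumed to be 1.0.
qscore = pd.DataFrame(1.0, index=labels, columns=labels)

# Fill both triangles so the matrix is symmetric.
for query, hit, score in records:
    qscore.loc[query, hit] = score
    qscore.loc[hit, query] = score

print(qscore)
```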
make_library(pdb_codes)

Create the pdb files for each contacting TM helical pair detected using the information at pdb_library and pdbtm_svn. Files will be created at workdir

Parameters:pdb_codes (list) – a list with the pdb codes that will be included in the library
static parse_nr_listfile(fname)

Method to parse a file with pdb structures listed as PDB:CHAIN into a nested list

Parameters:fname (str) – file name with the list to be parsed
Returns:a nested list where each element contains the non-redundant pdb code and chain name (list)
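
The kind of parsing described above could look roughly like the sketch below. It assumes one PDB:CHAIN entry per line and uses a hypothetical helper name, parse_pdb_chain_lines; the real file layout and the method's behaviour on malformed lines may differ.

```python
def parse_pdb_chain_lines(lines):
    """Hypothetical helper: turn 'PDB:CHAIN' entries into [pdb, chain] pairs."""
    parsed = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        pdb_code, chain = entry.split(":", 1)
        parsed.append([pdb_code, chain])
    return parsed


# Example input, as it might be read from a file with readlines().
example = ["1abc:A\n", "2xyz:B\n", "\n"]
nested = parse_pdb_chain_lines(example)
print(nested)
```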
pdbfiles_list

A list of file names in pdb_library

remove_homologs(pdb_ids_to_remove)

Remove fragments originating from a set of pdb structures from qscore_matrix, rmsd_matrix and nalign_matrix

Parameters:pdb_ids_to_remove (tuple) – tuple with the pdb codes of the structures to be removed
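
One way to picture this filtering on a single matrix is sketched below. It assumes fragment labels begin with the pdb code followed by an underscore (for example 1abc_1A_2A), which is a naming convention invented for this example, and drop_by_pdb_prefix is a hypothetical helper rather than part of SwampLibrary.

```python
import pandas as pd


def drop_by_pdb_prefix(matrix, pdb_ids_to_remove):
    """Hypothetical helper: drop rows and columns whose label starts with a listed pdb code."""
    keep = [
        label for label in matrix.index
        if label.split("_")[0] not in pdb_ids_to_remove
    ]
    # Select the same labels on both axes so the result stays square.
    return matrix.loc[keep, keep]


# Assumed label convention: <pdb code>_<helix>_<helix>.
labels = ["1abc_1A_2A", "1abc_2A_3A", "2xyz_1B_2B"]
matrix = pd.DataFrame(0.0, index=labels, columns=labels)

trimmed = drop_by_pdb_prefix(matrix, ("1abc",))
print(trimmed)
```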
static rename_axis(df)

Rename the axes of a pandas.DataFrame so that the row names correspond with the column names

Parameters:df (pandas.DataFrame) – the dataframe to be renamed
Returns:renamed dataframe (pandas.DataFrame)
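
A minimal sketch of one reading of this operation: copying the column labels onto the rows of a square dataframe. The helper name align_row_labels and the square-shape check are assumptions for illustration, not the method's actual code.

```python
import pandas as pd


def align_row_labels(df):
    """Hypothetical helper: copy the column labels onto the rows of a square frame."""
    if df.shape[0] != df.shape[1]:
        raise ValueError("expected a square dataframe")
    renamed = df.copy()  # leave the caller's dataframe untouched
    renamed.index = renamed.columns
    return renamed


# Rows start with the default integer index 0, 1.
raw = pd.DataFrame([[0.0, 1.5], [1.5, 0.0]], columns=["frag_a", "frag_b"])
square = align_row_labels(raw)
print(square)
```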