Options

Usage:

usage: PopPUNK [-h]
            (--easy-run | --create-db | --fit-model | --refine-model | --assign-query | --use-model | --generate-viz)
            [--ref-db REF_DB] [--r-files R_FILES] [--q-files Q_FILES]
            [--distances DISTANCES]
            [--external-clustering EXTERNAL_CLUSTERING] --output OUTPUT
            [--plot-fit PLOT_FIT] [--full-db] [--update-db] [--overwrite]
            [--min-k MIN_K] [--max-k MAX_K] [--k-step K_STEP]
            [--sketch-size SKETCH_SIZE] [--max-a-dist MAX_A_DIST]
            [--ignore-length] [--K K] [--dbscan] [--D D]
            [--min-cluster-prop MIN_CLUSTER_PROP] [--pos-shift POS_SHIFT]
            [--neg-shift NEG_SHIFT] [--manual-start MANUAL_START]
            [--indiv-refine] [--no-local] [--model-dir MODEL_DIR]
            [--previous-clustering PREVIOUS_CLUSTERING] [--core-only]
            [--accessory-only] [--subset SUBSET] [--microreact]
            [--cytoscape] [--phandango] [--grapetree] [--rapidnj RAPIDNJ]
            [--perplexity PERPLEXITY] [--info-csv INFO_CSV] [--mash MASH]
            [--threads THREADS] [--no-stream] [--version]
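
The synopsis above requires exactly one mode flag plus --output; everything else is optional. As a minimal sketch of how a script might assemble such a command line, the helper below builds an argument list from keyword arguments. The function name build_command, the file names, and the assumption that the executable is on the PATH as PopPUNK (as written in the synopsis) are illustrative and not part of PopPUNK itself.

```python
import shlex

# The seven mutually exclusive modes listed in the usage synopsis.
MODES = {
    "--easy-run", "--create-db", "--fit-model", "--refine-model",
    "--assign-query", "--use-model", "--generate-viz",
}


def build_command(mode, output, **options):
    """Assemble a PopPUNK argument list (hypothetical helper, not part of PopPUNK)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if not output:
        raise ValueError("--output is required")
    # Assumption: the executable is invoked as 'PopPUNK', as in the synopsis.
    argv = ["PopPUNK", mode, "--output", output]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Boolean switch such as --full-db or --overwrite.
            argv.append(flag)
        elif value is not False and value is not None:
            # Option that takes a value, such as --threads 4.
            argv.extend([flag, str(value)])
    return argv


# 'strain_db' and 'references.txt' are placeholder names.
cmd = build_command("--create-db", "strain_db", r_files="references.txt", threads=4)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run unchanged; printing it first with shlex.join makes it easy to check the command before running it.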

Command line options

optional arguments:
-h, --help show this help message and exit
Mode of operation:
--easy-run Create clusters from assemblies with default settings
--create-db Create pairwise distances database between reference sequences
--fit-model Fit a mixture model to a reference database
--refine-model Refine the accuracy of a fitted model
--assign-query Assign the cluster of query sequences without re-running the whole mixture model
--generate-viz Generate files for a visualisation from an existing database
--use-model Apply a fitted model to a reference database to restore database files
Input files:
--ref-db REF_DB
 Location of built reference database
--r-files R_FILES
 File listing reference input assemblies
--q-files Q_FILES
 File listing query input assemblies
--distances DISTANCES
 Prefix of input pickle of pre-calculated distances
--external-clustering EXTERNAL_CLUSTERING
 File with cluster definitions or other labels generated with any other method.
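
Both --r-files and --q-files take a file that lists input assemblies. The sketch below writes such a list from a directory of assemblies. It assumes the list is plain text with one assembly path per line, which this page does not state, so check the format expected by your PopPUNK version. The function name write_file_list and the default *.fa pattern are illustrative.

```python
from pathlib import Path


def write_file_list(assembly_dir, list_path, pattern="*.fa"):
    """Write matching assembly paths to list_path, one per line.

    Assumption: the list file is plain text with one path per line.
    Returns the paths that were written, in sorted order.
    """
    paths = sorted(str(p) for p in Path(assembly_dir).glob(pattern))
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return paths


# Example (placeholder names):
# write_file_list("assemblies", "references.txt")
```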
Output options:
--output OUTPUT
 Prefix for output files (required)
--plot-fit PLOT_FIT
 Create this many plots of example fits relating k-mer length to core/accessory distances [default = 0]
--full-db Keep full reference database, not just representatives
--update-db Update reference database with query sequences
--overwrite Overwrite any existing database files
K-mer comparison options:
--min-k MIN_K Minimum k-mer length [default = 9]
--max-k MAX_K Maximum k-mer length [default = 29]
--k-step K_STEP
 K-mer step size [default = 4]
--sketch-size SKETCH_SIZE
 K-mer sketch size [default = 10000]
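
The three length options above together define a series of k-mer lengths. As a rough illustration, the sketch below lists the lengths implied by the defaults. It assumes the maximum length is included when the step lands on it, which this page does not state; kmer_lengths is an illustrative name, not a PopPUNK function.

```python
def kmer_lengths(min_k=9, max_k=29, k_step=4):
    """List k-mer lengths from min_k up to max_k in steps of k_step.

    Assumption: max_k is inclusive when the step lands on it.
    """
    if min_k > max_k:
        raise ValueError("min_k must not exceed max_k")
    if k_step < 1:
        raise ValueError("k_step must be at least 1")
    return list(range(min_k, max_k + 1, k_step))


print(kmer_lengths())  # [9, 13, 17, 21, 25, 29]
```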
Quality control options:
--max-a-dist MAX_A_DIST
 Maximum accessory distance to permit [default = 0.5]
--ignore-length
 Ignore outliers in terms of assembly length [default = False]
Model fit options:
--K K Maximum number of mixture components [default = 2]
--dbscan Use DBSCAN rather than mixture model
--D D Maximum number of clusters in DBSCAN fitting [default = 100]
--min-cluster-prop MIN_CLUSTER_PROP
 Minimum proportion of points in a cluster in DBSCAN fitting [default = 0.0001]
Refine model options:
--pos-shift POS_SHIFT
 Maximum amount to move the boundary away from origin [default = 0.2]
--neg-shift NEG_SHIFT
 Maximum amount to move the boundary towards the origin [default = 0.4]
--manual-start MANUAL_START
 A file containing information for a start point. See documentation for help.
--indiv-refine Also run refinement for core and accessory individually
--no-local Do not perform the local optimization step (speed up on very large datasets)
Database querying options:
--model-dir MODEL_DIR
 Directory containing model to use for assigning queries to clusters [default = reference database directory]
--previous-clustering PREVIOUS_CLUSTERING
 Directory containing previous cluster definitions and network [default = use that in the directory containing the model]
--core-only Use a core-distance only model for assigning queries [default = False]
--accessory-only
 Use an accessory-distance only model for assigning queries [default = False]
Further analysis options:
--subset SUBSET
 File with list of sequences to include in visualisation (with --generate-viz only)
--microreact Generate output files for Microreact visualisation
--cytoscape Generate network output files for Cytoscape
--phandango Generate phylogeny and TSV for Phandango visualisation
--grapetree Generate phylogeny and CSV for GrapeTree visualisation
--rapidnj RAPIDNJ
 Path to rapidNJ binary to build NJ tree for Microreact
--perplexity PERPLEXITY
 Perplexity used to calculate t-SNE projection (with --microreact) [default = 20.0]
--info-csv INFO_CSV
 Epidemiological information CSV formatted for Microreact (can be used with other outputs)
Other options:
--mash MASH Location of mash executable
--threads THREADS
 Number of threads to use [default = 1]
--no-stream Use temporary files for mash dist interfacing. Reduces memory use and increases disk use for large datasets
--version show program’s version number and exit