Options
Usage:
usage: PopPUNK [-h]
(--easy-run | --create-db | --fit-model | --refine-model | --assign-query | --use-model | --generate-viz)
[--ref-db REF_DB] [--r-files R_FILES] [--q-files Q_FILES]
[--distances DISTANCES]
[--external-clustering EXTERNAL_CLUSTERING] --output OUTPUT
[--plot-fit PLOT_FIT] [--full-db] [--update-db] [--overwrite]
[--min-k MIN_K] [--max-k MAX_K] [--k-step K_STEP]
[--sketch-size SKETCH_SIZE] [--max-a-dist MAX_A_DIST]
[--ignore-length] [--K K] [--dbscan] [--D D]
[--min-cluster-prop MIN_CLUSTER_PROP] [--pos-shift POS_SHIFT]
[--neg-shift NEG_SHIFT] [--manual-start MANUAL_START]
[--indiv-refine] [--no-local] [--model-dir MODEL_DIR]
[--previous-clustering PREVIOUS_CLUSTERING] [--core-only]
[--accessory-only] [--subset SUBSET] [--microreact]
[--cytoscape] [--phandango] [--grapetree] [--rapidnj RAPIDNJ]
[--perplexity PERPLEXITY] [--info-csv INFO_CSV] [--mash MASH]
[--threads THREADS] [--no-stream] [--version]
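In the synopsis above, the seven mode flags are grouped in parentheses and separated by `|`, which is argparse notation for "exactly one of these is required". `--output` is shown without square brackets, so it is also mandatory. The following is a minimal sketch of how a parser with that shape behaves. It is not PopPUNK's actual source: only a few options are reproduced, and the helper names `build_parser` and `parses` are hypothetical.

```python
import argparse

MODES = ("--easy-run", "--create-db", "--fit-model", "--refine-model",
         "--assign-query", "--use-model", "--generate-viz")

def build_parser():
    # Illustrative subset of the options listed in the usage synopsis;
    # this is an assumption-laden sketch, not PopPUNK's real parser.
    parser = argparse.ArgumentParser(prog="PopPUNK")
    mode = parser.add_mutually_exclusive_group(required=True)
    for flag in MODES:
        mode.add_argument(flag, action="store_true", default=False)
    parser.add_argument("--r-files")
    parser.add_argument("--output", required=True)
    parser.add_argument("--threads", type=int, default=1)
    return parser

def parses(argv):
    """Return True if argv is accepted, False if argparse rejects it."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:  # argparse exits with an error message on bad input
        return False

# One mode plus --output is accepted.
print(parses(["--create-db", "--r-files", "refs.txt", "--output", "db"]))
# Omitting the mode, giving two modes, or omitting --output is rejected.
print(parses(["--r-files", "refs.txt", "--output", "db"]))
print(parses(["--create-db", "--fit-model", "--output", "db"]))
print(parses(["--create-db", "--r-files", "refs.txt"]))
```

The file name `refs.txt` and the output prefix `db` are placeholders chosen for the example.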
Command line options
- optional arguments:
  -h, --help  show this help message and exit
- Mode of operation:
  --easy-run  Create clusters from assemblies with default settings
  --create-db  Create pairwise distances database between reference sequences
  --fit-model  Fit a mixture model to a reference database
  --refine-model  Refine the accuracy of a fitted model
  --assign-query  Assign the cluster of query sequences without re-running the whole mixture model
  --generate-viz  Generate files for a visualisation from an existing database
  --use-model  Apply a fitted model to a reference database to restore database files
- Input files:
  --ref-db REF_DB  Location of built reference database
  --r-files R_FILES  File listing reference input assemblies
  --q-files Q_FILES  File listing query input assemblies
  --distances DISTANCES  Prefix of input pickle of pre-calculated distances
  --external-clustering EXTERNAL_CLUSTERING  File with cluster definitions or other labels generated with any other method.
- Output options:
  --output OUTPUT  Prefix for output files (required)
  --plot-fit PLOT_FIT  Create this many plots of some fits relating k-mer to core/accessory distances [default = 0]
  --full-db  Keep full reference database, not just representatives
  --update-db  Update reference database with query sequences
  --overwrite  Overwrite any existing database files
- Kmer comparison options:
  --min-k MIN_K  Minimum kmer length [default = 9]
  --max-k MAX_K  Maximum kmer length [default = 29]
  --k-step K_STEP  K-mer step size [default = 4]
  --sketch-size SKETCH_SIZE  Kmer sketch size [default = 10000]
- Quality control options:
  --max-a-dist MAX_A_DIST  Maximum accessory distance to permit [default = 0.5]
  --ignore-length  Ignore outliers in terms of assembly length [default = False]
- Model fit options:
  --K K  Maximum number of mixture components [default = 2]
  --dbscan  Use DBSCAN rather than mixture model
  --D D  Maximum number of clusters in DBSCAN fitting [default = 100]
  --min-cluster-prop MIN_CLUSTER_PROP  Minimum proportion of points in a cluster in DBSCAN fitting [default = 0.0001]
- Refine model options:
  --pos-shift POS_SHIFT  Maximum amount to move the boundary away from origin [default = 0.2]
  --neg-shift NEG_SHIFT  Maximum amount to move the boundary towards the origin [default = 0.4]
  --manual-start MANUAL_START  A file containing information for a start point. See documentation for help.
  --indiv-refine  Also run refinement for core and accessory individually
  --no-local  Do not perform the local optimization step (speed up on very large datasets)
- Database querying options:
  --model-dir MODEL_DIR  Directory containing model to use for assigning queries to clusters [default = reference database directory]
  --previous-clustering PREVIOUS_CLUSTERING  Directory containing previous cluster definitions and network [default = use that in the directory containing the model]
  --core-only  Use a core-distance only model for assigning queries [default = False]
  --accessory-only  Use an accessory-distance only model for assigning queries [default = False]
- Further analysis options:
  --subset SUBSET  File with list of sequences to include in visualisation (with --generate-viz only)
  --microreact  Generate output files for microreact visualisation
  --cytoscape  Generate network output files for Cytoscape
  --phandango  Generate phylogeny and TSV for Phandango visualisation
  --grapetree  Generate phylogeny and CSV for grapetree visualisation
  --rapidnj RAPIDNJ  Path to rapidNJ binary to build NJ tree for Microreact
  --perplexity PERPLEXITY  Perplexity used to calculate t-SNE projection (with --microreact) [default=20.0]
  --info-csv INFO_CSV  Epidemiological information CSV formatted for microreact (can be used with other outputs)
- Other options:
  --mash MASH  Location of mash executable
  --threads THREADS  Number of threads to use [default = 1]
  --no-stream  Use temporary files for mash dist interfacing. Reduce memory use/increase disk use for large datasets
  --version  show program’s version number and exit
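To show how the options above combine into complete invocations, here is a small sketch that assembles argument lists for three modes: building a database, fitting a model, and assigning queries. It only builds and prints the commands and does not run them. Several things are assumptions rather than facts taken from this page: the executable name `poppunk`, all file and directory names, the layout of the `--distances` prefix, and the pairing of input options with each mode (inferred from the option descriptions). The helper `command` is hypothetical.

```python
import shlex

EXECUTABLE = "poppunk"  # assumption: name of the installed entry point

def command(mode, output, **options):
    """Build an argument list: one mode flag, --output, then other options.

    Keyword names map to flags by replacing '_' with '-', so r_files
    becomes --r-files. A value of True emits a bare switch; False or None
    omits the option; anything else is passed as the option's value.
    """
    argv = [EXECUTABLE, f"--{mode}", "--output", output]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is not False and value is not None:
            argv.extend([flag, str(value)])
    return argv

# Placeholder file and directory names, chosen only for illustration.
steps = [
    command("create-db", "example_db", r_files="references.txt", threads=4),
    command("fit-model", "example_db", ref_db="example_db",
            distances="example_db/example_db.dists", K=2, full_db=True),
    command("assign-query", "example_queries", ref_db="example_db",
            q_files="queries.txt", update_db=True),
]

for argv in steps:
    print(shlex.join(argv))  # shell-quoted form, requires Python 3.8+
```

Building lists rather than strings means each list can be passed directly to `subprocess.run` without shell quoting concerns, should you choose to execute the commands.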