Getting Started
This guide walks you through the complete setup — from installation to running mlst with your freshly updated database.
Step 1: Install mlstdb
Verify the installation:
Step 2: Register with the databases
Before downloading any schemes, you need to register your OAuth credentials with PubMLST and/or Pasteur. This is a one-time setup — your credentials are saved locally and reused for future updates.
Connect to PubMLST
Connect to Pasteur
Each connect command will:
- Ask for your Client ID (24 characters) and Client Secret (42 characters)
- Open an authorisation URL — visit it in your browser
- Ask you to paste the verification code from the website
- Save all tokens securely to ~/.config/mlstdb/
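To confirm that the connect step actually wrote something, you can list the contents of the credentials directory together with their permissions. The following is a minimal sketch, not part of mlstdb: the helper name `describe_config_dir` is hypothetical, and it makes no assumption about which file names mlstdb uses inside `~/.config/mlstdb/`.

```python
import stat
from pathlib import Path


def describe_config_dir(config_dir: Path) -> list[tuple[str, str]]:
    """Return (file name, permission string) for each file in config_dir.

    Hypothetical helper: it only lists what is there and does not
    read or interpret any token contents.
    """
    if not config_dir.is_dir():
        return []
    entries = []
    for path in sorted(config_dir.iterdir()):
        if path.is_file():
            # stat.filemode renders the mode like `ls -l`, e.g. '-rw-------'
            entries.append((path.name, stat.filemode(path.stat().st_mode)))
    return entries
```

Calling `describe_config_dir(Path.home() / ".config" / "mlstdb")` after connecting should return a non-empty list. An empty list suggests the connect step did not complete.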
Where do I get my Client ID and Client Secret?
See the Connect guide for step-by-step instructions on registering with PubMLST and Pasteur.
Info
If you've already connected before, mlstdb connect will test your existing credentials and skip re-registration if they're still valid.
Step 3: Download MLST schemes
This will:
- Read the built-in curated list of ~300 MLST schemes from both PubMLST and Pasteur
- Download allele sequences (.tfa files) and ST profiles (.txt files) for each scheme
- Save everything to a pubmlst/ directory
- Build a BLAST database in blast/
First run may take a while
Downloading around 300 schemes involves many API calls. You can speed things up with --threads 4, or download only specific schemes by providing a custom input file. See the Update guide for details.
If the download is interrupted, use --resume to pick up where you left off:
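For unattended runs, the retry-with-resume pattern can be scripted. The sketch below is illustrative only. It assumes the download subcommand is `mlstdb update` and that an interrupted run exits with a non-zero status; only the `--threads` and `--resume` flags come from this guide. The function names are hypothetical.

```python
import subprocess


def build_update_command(threads: int = 1, resume: bool = False) -> list[str]:
    """Build the argument list for an update run.

    Assumption: the subcommand is `mlstdb update`. The --threads and
    --resume flags are the ones described in this guide.
    """
    cmd = ["mlstdb", "update"]
    if threads > 1:
        cmd += ["--threads", str(threads)]
    if resume:
        cmd.append("--resume")
    return cmd


def update_with_resume(threads: int = 4, max_attempts: int = 3,
                       runner=subprocess.run) -> bool:
    """Run the update, retrying with --resume after a failed attempt.

    Assumption: a failed or interrupted run returns a non-zero exit code.
    `runner` is injectable so the logic can be exercised without mlstdb.
    """
    for attempt in range(max_attempts):
        # The first attempt is a plain update; later attempts add --resume.
        cmd = build_update_command(threads, resume=attempt > 0)
        if runner(cmd).returncode == 0:
            return True
    return False
```

`update_with_resume(threads=4)` returns `True` as soon as one attempt succeeds and `False` if all attempts fail.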
Step 4: Verify with mlst
Once the update is complete, test your new database:
Replace your_assembly.fasta with the path to any bacterial genome assembly.
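If you call mlst from a script, the key point is to direct it at the freshly built files rather than its bundled database. This sketch assumes mlst's `--blastdb` and `--datadir` options and the `blast/mlst.fa` and `pubmlst/` paths described in this guide. `build_mlst_command` is a hypothetical helper, and the exact invocation recommended by mlstdb may differ.

```python
from pathlib import Path


def build_mlst_command(assembly: str, db_root: str = ".") -> list[str]:
    """Build an mlst argument list that uses the updated database.

    Assumptions: mlst accepts --blastdb and --datadir, and db_root is
    the directory that contains the blast/ and pubmlst/ folders.
    """
    root = Path(db_root)
    return [
        "mlst",
        "--blastdb", str(root / "blast" / "mlst.fa"),
        "--datadir", str(root / "pubmlst"),
        assembly,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory where the update was run.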
What was created?
After a successful update, your directory should look like this:
pubmlst/
├── klebsiella/
│ ├── klebsiella.txt # ST profiles
│ ├── klebsiella_info.json # Scheme metadata
│ ├── database_version.txt # Database version number
│ ├── gapA.tfa # Allele sequences
│ ├── infB.tfa
│ └── ...
├── listeria/
│ └── ...
└── ...
blast/
├── mlst.fa # Combined allele sequences
├── mlst.fa.ndb # BLAST index files
├── mlst.fa.nhr
└── ...
Each scheme gets its own subdirectory containing:
- Profile file (<scheme>.txt): maps ST numbers to allele combinations
- Allele files (<locus>.tfa): FASTA sequences for each locus
- Metadata (<scheme>_info.json): source database, download date, locus count
- Version file (database_version.txt): database version number
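The layout described above can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names. It only verifies that each scheme directory has a profile file named after the directory and at least one `.tfa` file, and it does not validate file contents.

```python
from pathlib import Path


def check_scheme_dir(scheme_dir: Path) -> list[str]:
    """Return a list of problems found in one scheme directory."""
    problems = []
    # The profile file is expected to share the directory's name.
    profile = scheme_dir / f"{scheme_dir.name}.txt"
    if not profile.is_file():
        problems.append(f"missing profile file {profile.name}")
    if not any(scheme_dir.glob("*.tfa")):
        problems.append("no .tfa allele files")
    return problems


def check_database(pubmlst_dir: Path) -> dict[str, list[str]]:
    """Map scheme name to its problems, omitting schemes that look fine."""
    report = {}
    for scheme_dir in sorted(p for p in pubmlst_dir.iterdir() if p.is_dir()):
        problems = check_scheme_dir(scheme_dir)
        if problems:
            report[scheme_dir.name] = problems
    return report
```

An empty dictionary from `check_database(Path("pubmlst"))` means every scheme directory has the expected files.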
Keeping your database up to date
Schemes change over time as new STs and alleles are added. To update your database, simply run:
This will re-download all schemes and rebuild the BLAST database. Use --resume if you want to skip schemes that have already been downloaded.
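To see which schemes actually changed after a refresh, you can snapshot the per-scheme `database_version.txt` files before and after updating. The sketch below treats the file contents as an opaque string, which is an assumption about the file format. Both function names are hypothetical.

```python
from pathlib import Path


def read_versions(pubmlst_dir: Path) -> dict[str, str]:
    """Map each scheme name to the text of its database_version.txt."""
    versions = {}
    for version_file in sorted(pubmlst_dir.glob("*/database_version.txt")):
        # Assumption: the version is compared as plain text, not parsed.
        versions[version_file.parent.name] = version_file.read_text().strip()
    return versions


def changed_schemes(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return schemes that are new or whose version text differs."""
    return sorted(name for name in after if before.get(name) != after[name])
```

Call `read_versions(Path("pubmlst"))` once before the update and once after, then pass both results to `changed_schemes`.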
Next Steps
- Connect: registration details, including how to obtain OAuth credentials
- Update: all options, including custom inputs, parallel downloads, and resume
- Fetch: advanced usage, exploring all available schemes with custom filters
- Disclaimer: important safety notes