Python script to update BLAST databases

| Vimalkumar Velayudhan

BLAST databases from NCBI can be updated with the update_blastdb command included in the BLAST+ package. For example, the following command downloads the swissprot protein database into the current directory, or updates it if it is already present:

update_blastdb --decompress --passive swissprot

which should print:

Connected to NCBI
Downloading swissprot.tar.gz... [OK]

Listing of downloaded files:

swissprot.tar.gz  swissprot.tar.gz.md5
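For readers who want to drive this command from Python, here is a minimal sketch. It is not the code of the script described in this post. The function names build_update_command and update_database are hypothetical, and the executable name is assumed to be update_blastdb as written above (some installations may name it differently). Only the --decompress and --passive flags shown earlier are used.

```python
import shutil
import subprocess


def build_update_command(database, decompress=True, passive=True,
                         executable="update_blastdb"):
    """Return the argument list for updating one database."""
    command = [executable]
    if decompress:
        command.append("--decompress")
    if passive:
        command.append("--passive")
    command.append(database)
    return command


def update_database(database, target_dir):
    """Run the update inside target_dir and return the exit code."""
    command = build_update_command(database)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(command[0] + " not found on PATH")
    # The command works on the current directory, so run it with
    # cwd set to the directory that should hold the database.
    completed = subprocess.run(command, cwd=target_dir)
    return completed.returncode
```

The exit code is returned rather than checked because its meaning can differ between BLAST+ releases; consult the documentation of the installed version before treating a non-zero value as a failure.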

One issue with this approach is that any long-running BLAST jobs currently accessing the database will be aborted. To overcome this problem, I wrote a wrapper around the update_blastdb command. It keeps a symbolic link pointing to the latest version of the database and only updates the link if the database is not in use. If the database is in use, the script writes a message to the log once the download is complete, and the link can then be updated manually later.
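The link update described above could be done along these lines. This is a sketch under assumptions, not the wrapper's actual code: the function name switch_link and the ".tmp" suffix are hypothetical, and how the wrapper names each downloaded version is not stated here.

```python
import os


def switch_link(link_path, new_target):
    """Point link_path at new_target, replacing an existing symlink."""
    temp_link = link_path + ".tmp"
    # Remove a stale temporary link left behind by an interrupted run.
    if os.path.lexists(temp_link):
        os.remove(temp_link)
    os.symlink(new_target, temp_link)
    # Renaming the temporary link over the old one replaces it in a
    # single step on POSIX systems, so the path never goes missing.
    os.replace(temp_link, link_path)
```

Creating the new link under a temporary name and renaming it avoids a window in which the link does not exist, which a remove-then-create sequence would have.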


This script will only work on Linux/Unix-like systems due to the dependence on the lsof command to check if a directory is being accessed.
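A check of this kind can be sketched as follows. It assumes lsof is called with its +D option, which searches a directory tree for open files; whether the wrapper calls lsof in exactly this way is not stated here. The function name directory_in_use and the run parameter (which allows the lsof call to be replaced in tests) are hypothetical.

```python
import subprocess


def directory_in_use(path, run=subprocess.run):
    """Return True if lsof lists any open file under path."""
    # "lsof +D <dir>" scans the directory and everything below it.
    result = run(["lsof", "+D", path],
                 stdout=subprocess.PIPE,
                 stderr=subprocess.DEVNULL)
    # Any output on stdout means at least one process has a file open.
    return bool(result.stdout.strip())
```

If lsof is not installed, subprocess.run raises FileNotFoundError, so a caller may want to catch that and report it. Note that scanning a large directory tree with +D can be slow.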


The script is available in the vimalkvn/sysadminbio repository on GitHub.

Save this script under /home/user/programs/. This location is used only for the examples below; the script can be saved anywhere.


Assuming the script is saved under /home/user/programs/ and you want to download the swissprot database to /home/user/blast, the command would be:

python /home/user/programs/ \
-d swissprot -p /home/user/blast

A log file is written to /home/user/blast/log/blastdb_updater.log.

To use the database in your BLAST search, you can use:

blastp -db /home/user/blast/swissprot/swissprot \
-query sample.fasta

Other databases (supported by update_blastdb) can be downloaded in the same manner.

Automated update

An automated update can be set up using cron:

0 0 1 * * /home/user/programs/ -d swissprot -p /home/user/blast

The above cron job, which must be written on a single line in the crontab, will update the database at midnight on the 1st of every month.