COVID19
Download the most recent data from NCBI into the file data/sequences.fa.
Create a directory called gene-fastas. Run
python3 split-by-reference.py data/sequences.fa
to create a FASTA file for each reference gene identified in sequences.fa. Reference genes are identified by the YP_ prefix of the accession.
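As a rough illustration of the YP_ rule, the sketch below parses FASTA text and picks out the records whose accession begins with YP_. It is not the contents of split-by-reference.py. The function names read_fasta and reference_accessions are hypothetical, the accessions in the demo are toy values, and the code assumes the accession is the first whitespace-delimited token of each header line.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {accession: sequence}.

    Assumption: the accession is the first token after '>' on a header line.
    """
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            # Sequence lines may be wrapped, so collect them and join later.
            records[current].append(line)
    return {acc: "".join(parts) for acc, parts in records.items()}


def reference_accessions(records: Dict[str, str]) -> List[str]:
    """Return accessions that carry the YP_ prefix, in file order."""
    return [acc for acc in records if acc.startswith("YP_")]


# Toy input with made-up accessions, for illustration only.
demo = [
    ">YP_000000001.1 toy reference protein",
    "MFVFLV",
    "LLPLVS",
    ">ABC00001.1 toy non-reference protein",
    "MFVFLVLLPLVA",
]
print(reference_accessions(read_fasta(demo)))
```

How the real script assigns the remaining sequences to each reference gene is not described here, so that step is left out of the sketch.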
The Linux binary clustalo is in the repo, but it can be downloaded fresh using
wget http://www.clustal.org/omega/clustalo-1.2.4-Ubuntu-x86_64 -O clustalo
Create a directory called aligns, and create alignments with ./make-aligns.sh
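The alignment step could be sketched in Python as one clustalo call per gene FASTA. This is a guess at the shape of the step, not the contents of make-aligns.sh: the function name clustalo_commands, the *.fa input pattern, and the .aln.fa output naming are all assumptions. The -i, -o, and --force options are standard Clustal Omega flags.

```python
from pathlib import Path
from typing import List


def clustalo_commands(fasta_dir: str, out_dir: str,
                      binary: str = "./clustalo") -> List[List[str]]:
    """Build one clustalo command per FASTA file found in fasta_dir.

    Assumptions: inputs end in .fa, outputs are named <gene>.aln.fa.
    """
    commands = []
    for fasta in sorted(Path(fasta_dir).glob("*.fa")):
        out = Path(out_dir) / (fasta.stem + ".aln.fa")
        # --force lets clustalo overwrite an existing output file.
        commands.append([binary, "-i", str(fasta), "-o", str(out), "--force"])
    return commands


# To actually run the alignments (requires the clustalo binary):
#   import subprocess
#   for cmd in clustalo_commands("gene-fastas", "aligns"):
#       subprocess.run(cmd, check=True)
```

Building the commands as lists, rather than shell strings, avoids quoting problems if a file name contains spaces.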
Create a directory called idmats, and create identity matrices with ./make-idmats.sh
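For readers unfamiliar with identity matrices, the sketch below computes pairwise percent identity from sequences that are already aligned. It is only an illustration of the idea and does not reproduce make-idmats.sh. The function names are hypothetical, and skipping columns where both sequences have a gap is one of several common conventions, so the numbers may differ from those produced by other tools.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned: List[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise percent identities."""
    return [[percent_identity(a, b) for b in aligned] for a in aligned]


# Toy alignment, for illustration only.
for row in identity_matrix(["ACGT", "ACGA", "AC-T"]):
    print(row)
```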