Skam - Skolem Assisted Makefiles

Skam is a makefile replacement. The main difference between skam and the make utility is how they handle pattern matching. Normal makefiles use text pattern matching, whereas skam uses the more powerful concept of function matching. These function constructs are known as skolem functions.
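To make the contrast concrete, here is a minimal Python sketch. It is not skam's implementation and not skam syntax. A make-style pattern such as `%.masked` binds a single text stem, whereas a functional target such as `mask_then_blast(S,D)` binds several named arguments at once. The `Term` type and both helper functions are hypothetical names introduced for illustration. The matcher only handles flat arguments, so it is simpler than real Prolog unification and does not cover nested targets.

```python
from typing import NamedTuple, Optional


class Term(NamedTuple):
    """A functional target such as mask_then_blast(chr4, est)."""
    functor: str
    args: tuple


def match_text_pattern(pattern: str, filename: str) -> Optional[str]:
    """Make-style matching: the single '%' in the pattern binds one text stem."""
    prefix, suffix = pattern.split("%", 1)
    if (len(filename) > len(prefix) + len(suffix)
            and filename.startswith(prefix)
            and filename.endswith(suffix)):
        return filename[len(prefix):len(filename) - len(suffix)]
    return None


def match_term(head: Term, goal: Term) -> Optional[dict]:
    """Function-style matching: names starting with an uppercase letter in the
    head are variables (the Prolog convention); other arguments must be equal."""
    if head.functor != goal.functor or len(head.args) != len(goal.args):
        return None
    bindings = {}
    for pattern_arg, value in zip(head.args, goal.args):
        if pattern_arg[:1].isupper():
            # A variable: bind it, or check it agrees with an earlier binding.
            if bindings.setdefault(pattern_arg, value) != value:
                return None
        elif pattern_arg != value:
            return None
    return bindings


if __name__ == "__main__":
    print(match_text_pattern("%.masked", "chr4.masked"))
    print(match_term(Term("mask_then_blast", ("S", "D")),
                     Term("mask_then_blast", ("chr4", "est"))))
```

The text pattern yields only the stem `chr4`, while the term match yields a binding for both `S` and `D`, which a rule body can then use independently.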

Unlike most makefile-type systems, skam was not developed with compiling and code management in mind. It was developed primarily to support the kinds of automated computational analysis and data transformation pipelines typically required in many bioinformatics centres. However, like makefiles, skam is completely generic and should be adaptable to any task achievable with makefiles.

Skam is also built with asynchronous job submission in mind (for example, tasks involving multiple executions that must be run on a compute farm). Unlike typical makefiles, skam targets can be rows in a database as well as files on a filesystem.
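One way to picture targets that are either files or database rows is a common "does this target exist yet?" check. The Python sketch below is an illustration only. Skam's actual database interface is not described here, so the class names, the `blast_hit` table and the `seq_id` column are all hypothetical.

```python
import os
import sqlite3


class FileTarget:
    """A target that is satisfied when a file exists on the filesystem."""

    def __init__(self, path):
        self.path = path

    def exists(self):
        return os.path.exists(self.path)


class RowTarget:
    """A target that is satisfied when a matching row exists in a table."""

    def __init__(self, conn, table, key_column, key):
        self.conn = conn
        self.table = table            # trusted identifier, not user input
        self.key_column = key_column  # trusted identifier, not user input
        self.key = key

    def exists(self):
        query = f"SELECT 1 FROM {self.table} WHERE {self.key_column} = ? LIMIT 1"
        return self.conn.execute(query, (self.key,)).fetchone() is not None


def needs_build(target):
    """A target must be built if it does not exist yet, whatever its kind."""
    return not target.exists()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blast_hit (seq_id TEXT)")  # hypothetical table
    conn.execute("INSERT INTO blast_hit VALUES ('chr4')")
    print(needs_build(RowTarget(conn, "blast_hit", "seq_id", "chr4")))
    print(needs_build(FileTarget("chr4.masked")))
```

The sketch deliberately ignores timestamps and job submission; it only shows that the build decision can be made without caring whether the target is a file or a row.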

Skam is written in SWI-Prolog. The full power of the Prolog language is available in a skamfile specification. Prolog is particularly suited to specifying rules in a make-type system because it is a rule-based language with built-in facilities for querying over facts. On the other hand, no knowledge of Prolog is required to write skamfiles.

Example Pipeline

Bio Compute Pipeline

This is a simple example of a biological analysis compute pipeline. First, the input sequence is masked for repetitive low-complexity sequence using RepeatMasker. The masked sequence is then scanned for genes with genscan and checked for similarity to sequences from other organisms using blast. Blast requires that the MultiFasta database of sequences to be compared against is first indexed using formatdb. A different blast program is run depending on whether the MultiFasta contains proteins or nucleic acids. Finally, the raw results are parsed and combined using the BOP parser, which exports GAME-XML.
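The ordering constraints in this pipeline can be sketched as a small dependency graph. The Python example below uses the standard library `graphlib` module (Python 3.9 or later) and is an illustration only, not how skam schedules work. The step names are shorthand for the tools mentioned above, and the edges are my reading of the description: genscan and blast need the masked sequence, blast also needs the formatdb index, and bop needs both raw results.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
PIPELINE = {
    "RepeatMasker": set(),
    "formatdb": set(),
    "genscan": {"RepeatMasker"},
    "blast": {"RepeatMasker", "formatdb"},
    "bop": {"genscan", "blast"},
}


def execution_order(graph):
    """Return one valid serial order in which the steps could be run."""
    return list(TopologicalSorter(graph).static_order())


def ready_batches(graph):
    """Group steps into batches whose members do not depend on each other,
    so each batch could be submitted to a compute farm at the same time."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for batch in ready_batches(PIPELINE):
        print(batch)
```

For this graph the batches are RepeatMasker and formatdb first, then blast and genscan, then bop.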

Example Skam Makefile

  flat: S.masked
  run: RepeatMasker -lib $(RMLIB) S

  flat: S-results/S.D.blast.raw
  req:  mask(S) blastindex(D)
  srun: blastall -p blastx -i mask(S) -d D -filter 'SEG+XNU'     {is_pepdb(D)}
  srun: blastall -p blastn -i mask(S) -d D -filter 'DUST'        {not(is_pepdb(D))}
  comment: after indexing the db and repeatmasking the seq, will run\
           appropriate blast program

  flat: S-results/S.genscan.raw
  req:  mask(S)
  srun: genscan $(GENSCANMODEL) mask(S)
  comment: run genscan on masked sequence

  run:  formatdb -p 'T' D                                        {is_pepdb(D)}
  run:  formatdb -p 'F' D                                        {not(is_pepdb(D))}
  comment: blast requires an index is made before seq searching

  req:  X
  run:  bop X -o target
  comment: apollo bop - parse file formats to game xml

  req: bop(mask_then_genscan(S))
  comment: build mask_then_genscan on a seq S and process results

  req: setof(bop(mask_then_blast(S,D)),fastadb(D))
  comment: build mask_then_blast on a seq S for all fastadbs D, then\
           run bop to process the results to game xml

  req: all_prediction(S) all_similarity(S)
  comment: both predictions and sim search on input sequence

  req: setof(all(S),input_seq(S))
  comment: build all(S) for all input chromosomes

<data relation="fastadb">
fly_nr_aa    aa
worm_nr_aa   aa
est          na

<data relation="input_seq">

is_pepdb(D):- fastadb(D,aa).
</prolog>
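The guards in braces and the setof requirement in the skamfile above can be read as follows. This is a Python sketch for illustration, not skam itself. The `FASTADB` list copies the fastadb relation from the data section. The sequence name `chr4` is hypothetical, since the input_seq relation is not filled in above, and the use of `S.masked` as the file behind `mask(S)` is an assumption based on the first rule.

```python
# The fastadb relation from the data section: (database name, residue type).
FASTADB = [("fly_nr_aa", "aa"), ("worm_nr_aa", "aa"), ("est", "na")]


def is_pepdb(db):
    """Python counterpart of the Prolog rule is_pepdb(D) :- fastadb(D, aa)."""
    return (db, "aa") in FASTADB


def blast_command(seq, db):
    """Pick the blast command line whose guard holds for this database."""
    if is_pepdb(db):
        program, seq_filter = "blastx", "SEG+XNU"
    else:
        program, seq_filter = "blastn", "DUST"
    # Assumption: mask(S) resolves to the file S.masked.
    return f"blastall -p {program} -i {seq}.masked -d {db} -filter '{seq_filter}'"


def all_similarity(seq):
    """Expand setof(bop(mask_then_blast(S,D)),fastadb(D)) for one sequence.

    Like Prolog's setof, the result is sorted and free of duplicates."""
    databases = sorted({db for db, _ in FASTADB})
    return [f"bop(mask_then_blast({seq},{db}))" for db in databases]


if __name__ == "__main__":
    for requirement in all_similarity("chr4"):  # "chr4" is a made-up input
        print(requirement)
    print(blast_command("chr4", "fly_nr_aa"))
    print(blast_command("chr4", "est"))
```

With the three databases listed above, one requirement is produced per database, and only `est` selects the blastn command because it is the only nucleic acid database.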
Chris Mungall
Last modified: Thu Dec 15 13:07:46 PST 2005