Use Cases
This is a short note on the use cases we want to support in the MOLGENIS processing extension:
- Share my pipeline
- Add a new module
- List my data items
- Incorporate Galaxy or GenePattern modules in my pipeline
- How did I produce this result file?
- Autogenerate an R Sweave document that serves as executable documentation?
- Export R data annotation packages?
 
PBS best practices
Overview:
- We use Freemarker to define templates of jobs (see the template sketch after this list)
- We generate one <job>.sh per job
- We generate one submit.sh for the whole workflow
- The whole workflow behaves like 'make': it can recover from failure where it left off
- The workflow shares one working directory, with conventions to ease inter-step variable passing
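For illustration, a minimal Freemarker job template could look like the sketch below. The template variables jobname, workDir, and command are hypothetical placeholders, not the actual MOLGENIS template parameters; Freemarker fills them in when each <job>.sh is generated.

    #!/bin/bash
    #PBS -N ${jobname}
    #PBS -o ${workDir}/${jobname}.out
    #PBS -e ${workDir}/${jobname}.err
    # all steps share the same working directory (blackboard)
    cd ${workDir}
    ${command}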
 
Main ingredients:
- The workflow works on a data blackboard:
 - The whole workflow uses the same working directory (= blackboard architecture pattern)
 - We use standard file names to reduce inter-step parameter passing (= convention over configuration)
 - Naming convention: <unit of analysis>_<name of step>.<ext>
 - For example, in NGS with lane as the unit and alignment as the step: <flowcell_lane>_<pairedalign>.bam (see the example listing below)
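Under this convention the shared working directory doubles as the blackboard; a listing for one lane might look as follows (the step names besides pairedalign are hypothetical examples):

    $ ls
    lane3_demultiplex.fq
    lane3_pairedalign.bam
    lane3_dedup.bam
    lane3_snpcalls.vcf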
- Make-style submit.sh (see the sketch below):
 - Each line puts one command in the qsub queue
 - We solve dependency ordering using the -W depend=afterok:job1:job2 option
 - Proper return values ensure that dependent jobs are canceled when a step fails
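A minimal submit.sh along these lines might look as follows. The job script names are hypothetical; we rely on the fact that qsub prints the id of the submitted job, which we capture to declare dependencies:

    #!/bin/bash
    # submit each step; qsub prints the job id on stdout
    align=$(qsub lane3_pairedalign.sh)
    # dedup may only start after the alignment finished successfully
    dedup=$(qsub -W depend=afterok:$align lane3_dedup.sh)
    # snp calling in turn depends on dedup
    snps=$(qsub -W depend=afterok:$dedup lane3_snpcalls.sh)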
 
- Recoverable steps job<step>.sh (a full sketch follows this list):
 - We generate a .sh file for each job, including standard logging
 - Each script checks whether its output already exists (if so, the step can be skipped)
 - Each script checks whether it has actually produced its output (otherwise it returns an error)
 - N.B. check file existence using
   if ! test -e "$FILE"; then exit 1; fi
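Putting these conventions together, a generated job script might look like the sketch below. The input/output names and the aligner command are hypothetical illustrations of the naming convention, not actual MOLGENIS output:

    #!/bin/bash
    # lane3_pairedalign.sh -- one recoverable step of the workflow
    IN=lane3_demultiplex.fq
    OUT=lane3_pairedalign.bam

    # standard logging
    echo "$(date) pairedalign start" >> workflow.log

    # skip the step if its output already exists (make-like recovery)
    if test -e "$OUT"; then
      echo "$(date) $OUT exists, skipping" >> workflow.log
      exit 0
    fi

    # write to a temp file first so a crashed run never leaves a "finished" output
    my-aligner "$IN" > "$OUT".tmp && mv "$OUT".tmp "$OUT"

    # verify the output was produced; a non-zero exit makes afterok cancel dependent jobs
    if ! test -e "$OUT"; then
      echo "$(date) ERROR: $OUT not produced" >> workflow.log
      exit 1
    fi
    echo "$(date) pairedalign done" >> workflow.log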