Version 8 (modified by george, 12 years ago)

Compute Roadmap

This page describes plans to make Compute even better.

NB: the features below have yet to be put on a release schedule!

Finish creation of tests for all important configurations

  • local, pbs, grid
  • impute, align

Make it easier to insert an additional step or remove a step

  • requires separation between 'per protocol input/output parameters' and 'workflow links'
  • should not be extra work; could we do automatic mapping?
  • should be solved in the workflow.csv (instead of control flow, do data flow, i.e. create a list of output-input edges)
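
A minimal sketch of the data-flow idea, not Compute's actual implementation: if the workflow is stored as a list of output-input edges, the execution order can be derived from it, and inserting a step only means changing edges. The step names and the `step_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def step_order(edges):
    """Return an execution order for steps given (producer, consumer) edges."""
    graph = {}
    for producer, consumer in edges:
        graph.setdefault(producer, set())
        # TopologicalSorter expects: node -> set of nodes it depends on.
        graph.setdefault(consumer, set()).add(producer)
    return list(TopologicalSorter(graph).static_order())


# Hypothetical workflow: the output of the first step feeds the second.
edges = [("align", "sort"), ("sort", "impute")]
order = step_order(edges)

# Inserting a 'qc' step between 'align' and 'sort' only rewires one edge.
edges_with_qc = [("align", "qc"), ("qc", "sort"), ("sort", "impute")]
order_with_qc = step_order(edges_with_qc)
```

With this representation no step needs to know its position in the workflow; the order follows from the edges alone.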

Auto generate the list of parameters that you need

  • could be filled automatically from templates or, if we have it, from the data flow?

Better error reporting on input

  • doesn't list which variable is missing
  • templates are missing
  • syntax checking of CSV files
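
As an illustration of the kind of error reporting meant here, the sketch below checks a CSV worksheet and names every missing parameter and the line it occurs on. The function name `check_worksheet` and the column names are assumptions made for this example, not part of Compute.

```python
import csv
import io


def check_worksheet(csv_text, required):
    """Return a list of human-readable problems found in a CSV worksheet."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    # Report every required parameter that has no column at all.
    for name in required:
        if name not in header:
            problems.append(f"missing column: {name}")
    for row in reader:
        line = reader.line_num
        # DictReader stores surplus values under the key None.
        if None in row:
            problems.append(f"line {line}: more values than columns")
        for name in header:
            if row.get(name) in (None, ""):
                problems.append(f"line {line}: no value for {name}")
    return problems


# Hypothetical worksheet: 'genome' is never defined, 's2' has no fastq value.
worksheet = "sample,fastq\ns1,a.fq\ns2,\n"
problems = check_worksheet(worksheet, ["sample", "fastq", "genome"])
```

Collecting all problems before reporting lets the user fix the whole worksheet in one pass instead of one error per run.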

Monitoring of progress

  • how far along is the analysis
  • did steps finish successfully or with errors, reported while running (could be done with #end macro)
  • add a job at the end that creates a report of results (for commandline)
    • Also do this for the database version, to have this info in case of database problems.

Monitoring of success and resource usage

  • have a harmonized method to report 'success' or 'error', incl. message + runtime
  • also include resource-usage statistics such as max, min, etc.
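
One possible shape for such a harmonized report, sketched under the assumption that every step can be wrapped in a single helper. `StepReport`, `run_step` and `summarize` are hypothetical names, and only runtime is measured here.

```python
import time
from dataclasses import dataclass


@dataclass
class StepReport:
    step: str
    status: str       # "success" or "error"
    message: str      # empty on success, the error text otherwise
    runtime_s: float


def run_step(name, func):
    """Run func() and always return a report in the same format."""
    start = time.monotonic()
    try:
        func()
        status, message = "success", ""
    except Exception as exc:
        status, message = "error", str(exc)
    return StepReport(name, status, message, time.monotonic() - start)


def summarize(reports):
    """Aggregate a non-empty list of reports into min/max runtime and error count."""
    runtimes = [r.runtime_s for r in reports]
    return {
        "min_s": min(runtimes),
        "max_s": max(runtimes),
        "errors": sum(r.status == "error" for r in reports),
    }


ok = run_step("align", lambda: None)
bad = run_step("impute", lambda: 1 / 0)
summary = summarize([ok, bad])
```

The same report structure could be written to a file for the commandline version or to the database, so both report success in one way.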

Make it transparent/unimportant for the user which backend is actually used

  • system decides where it (can) run: cluster, pbs, grid, local
  • needs a flexible file manager that can 'stage' data for any backend
  • would like to be able to restart on another backend
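
A rough sketch of how backend selection and staging could sit behind one interface. The `Backend` class, its methods and `pick_backend` are assumptions for illustration only; a real pbs or grid backend would replace the local copy with its own transfer mechanism.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class Backend(ABC):
    """Hypothetical common interface for local, pbs, grid, ... backends."""

    name = "abstract"

    @abstractmethod
    def available(self):
        """Return True if jobs can currently run on this backend."""

    @abstractmethod
    def stage(self, source, workdir):
        """Make 'source' available in 'workdir' and return the staged path."""


class LocalBackend(Backend):
    name = "local"

    def available(self):
        return True

    def stage(self, source, workdir):
        workdir.mkdir(parents=True, exist_ok=True)
        # Staging locally is a plain file copy.
        return Path(shutil.copy2(source, workdir / source.name))


def pick_backend(backends):
    """Return the first available backend, in order of preference."""
    for backend in backends:
        if backend.available():
            return backend
    raise RuntimeError("no backend available")
```

Because the caller only sees `stage` and `available`, restarting on another backend comes down to picking a different object from the list.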

Transparent server to stage data

  • to easily move pipeline to other storage.

Restart from specific step

  • Remove files for the step from which to restart as well as for all steps that depend on that one.
  • Reduce the problem by using a *tmp file and only 'mv' it if the step is successful (as automatically determined by compute)
  • However, this doesn't work if we want to restart from a step that was successful as automatically determined by compute, but has failed as manually determined by a user.
  • Could use folders per step, so you could more easily delete all output from a specific step onwards
  • Can be solved by good practice -> Compute should assist users as much as possible to standardize good practice and make it as easy as possible.
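
The two ideas above (write to a tmp file and move it only on success, and keep one folder per step) can be sketched as follows. Both helper names are hypothetical, and `reset_from` assumes the steps run in a simple linear order; with a data-flow graph the dependent steps would be looked up there instead.

```python
import os
import shutil


def write_atomically(path, produce):
    """Write output to path + '.tmp' and move it into place only on success."""
    tmp_path = path + ".tmp"
    try:
        with open(tmp_path, "w") as handle:
            produce(handle)
    except Exception:
        # A failed step leaves neither a final file nor a tmp file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    os.replace(tmp_path, path)


def reset_from(run_dir, steps, restart_step):
    """Delete the folder of restart_step and of every later step."""
    for step in steps[steps.index(restart_step):]:
        shutil.rmtree(os.path.join(run_dir, step), ignore_errors=True)
```

`reset_from` also covers the case where a step was automatically marked successful but a user decides it failed: the user names the step and all output from there onwards is cleared.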

Store pilot job id in the task in the database

  • I need to know which pilot jobs have died, and which tasks were associated with it
  • Then I can re-release tasks so they can be done by another pilot
  • Alternative: jobs send heartbeats to the server every 5-10 minutes while the actual analysis runs in the background

Like to have a 'heartbeat' for jobs

  • so I can be sure a (pilot) job is still alive
  • could use a 'background' process that pings back to the database
  • could also be used for pbs jobs
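
A small sketch of the heartbeat idea, assuming a background thread inside the job and a server-side table of last-seen timestamps per pilot. The `send` callback stands in for whatever call would update the database; all names here are hypothetical.

```python
import threading
import time


def start_heartbeat(send, interval_s):
    """Call send(timestamp) every interval_s seconds in a background thread.

    Returns an Event; set it to stop the heartbeat.
    """
    stop = threading.Event()

    def beat():
        # wait() returns False on timeout, True once stop has been set.
        while not stop.wait(interval_s):
            send(time.time())

    threading.Thread(target=beat, daemon=True).start()
    return stop


def dead_pilots(last_seen, now, timeout_s):
    """Return ids of pilots whose last heartbeat is older than timeout_s."""
    return sorted(p for p, seen in last_seen.items() if now - seen > timeout_s)
```

If the pilot job id is stored with each task, the ids returned by `dead_pilots` are enough to find the tasks that need to be re-released. A job would call `start_heartbeat` once before the analysis starts and set the returned event when it finishes.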

Add putFile -force to manual

Enable easy merging of workflows, merging of parameters

  • Easily combine protocols from multiple workflows
  • want fewer parameter files
  • meanwhile allow multiple worksheets

Get rid of parameters.csv and instead create worksheet

  • so parameter names on first row
  • hasOne using naming scheme A_B, means B has one A
  • conclusion: use multiple headers.
  • allow -parameters and -tparameter

Cleanup backend specific protocols

  • e.g. 'touch' commands

Visualization framework for analysis runs

Re-architect the components

  • one interface, multiple implementations
  • unit tests
  • Can we do something with unit tests per protocol?

Make submit.sh use a 'lock' file so the jobs can only end when all is submitted

  • Problem is that when the first jobs fail quickly, many dependent jobs are not yet submitted and get orphaned
  • so dependent jobs can be submitted and never have this issue (alex feature)
  • add this to the #end macro so jobs can never finish until the complete workflow is submitted
  • After all jobs are submitted, the lock file will be removed
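
The lock-file mechanism could look roughly like this. It is written in Python rather than as part of submit.sh, and the function names, the polling interval and the timeout are assumptions for illustration.

```python
import os
import time


def submit_all(lock_path, submit_funcs):
    """Create the lock, submit every job, then remove the lock."""
    open(lock_path, "w").close()
    try:
        for submit in submit_funcs:
            submit()
    finally:
        # Remove the lock even if a submission fails, so jobs are not blocked forever.
        os.remove(lock_path)


def wait_for_unlock(lock_path, poll_s=1.0, timeout_s=None):
    """What a job would do at its end: block until the lock file is gone.

    Returns True once the lock is removed, False if timeout_s elapsed first.
    """
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while os.path.exists(lock_path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```

Removing the lock in the `finally` branch is a design choice: a failed submission then releases the waiting jobs instead of leaving them hanging, at the cost of possibly orphaning the jobs that were never submitted.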

Separate protocols from code

  • yes, separate github repo
  • should make it possible to combine multiple protocol folders, multiple parameter files
  • should indicate which compute version it works with

Users for runs, priorities?

  • needed if we want 'priority queue' for pilot jobs

Publish!

Approach

  • Clean start = yes
  • Separate development of v5 from v4
  • Bug fixes for v4 = yes
  • Database, commandline or both = both
  • Release schedule -> roadmap
  • Backwards compatibility = no