
Compute Roadmap

This page describes plans to make Compute even better.

NB: the features below still have to be put on a release schedule!

Finish creation of tests for all important configurations

  • local, pbs, grid
  • impute, align (see the test-matrix sketch below)
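
A minimal sketch of what such a test matrix could look like, assuming a hypothetical run_test.sh driver that takes a backend and a workflow:

    #!/bin/bash
    # Run every workflow on every backend and report failures.
    # run_test.sh and its flags are illustrative, not an existing script.
    status=0
    for backend in local pbs grid; do
      for workflow in impute align; do
        if ./run_test.sh --backend "$backend" --workflow "$workflow"; then
          echo "OK:   $workflow on $backend"
        else
          echo "FAIL: $workflow on $backend"
          status=1
        fi
      done
    done
    exit $status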

Make it easier to insert or remove a step

  • requires separation between 'per protocol input/output parameters' and 'workflow links'
  • should not be extra work; could do automatic mapping?
  • should be solved in workflow.csv: instead of control flow, describe data flow, i.e. a list of output-input edges (see the sketch below)
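
One possible shape for a data-flow workflow.csv; the column layout and names are invented for illustration, not the current format:

    # each row wires one step's input to another step's output,
    # so inserting or removing a step only means rewiring edges
    step,protocol,input=source
    align,protocols/Align.sh,reads=user.fastq
    sort,protocols/Sort.sh,bam=align.bam
    stats,protocols/Stats.sh,bam=sort.sortedBam

Inserting a filter step between align and sort would then only require adding one row and pointing sort's input at the filter's output.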

Auto generate the list of parameters that you need

  • could be filled automatically from the templates or, if we have it, from the data flow? (see the sketch below)
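
A rough sketch of harvesting the parameter list from the templates themselves, assuming protocol templates reference parameters as ${name}:

    # List the unique ${...} placeholders used by all protocol templates.
    grep -ohE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' protocols/*.sh \
      | tr -d '${}' | sort -u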

Better error reporting on input

  • currently it doesn't list which variable is missing
  • or which templates are missing
  • add syntax checking of CSV files (see the sketch below)
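
A minimal example of the kind of CSV syntax check meant here; it is naive (no quoted commas) but catches the common copy-paste errors:

    # Every row must have as many fields as the header row.
    awk -F',' 'NR==1 { n=NF; next }
               NF!=n { printf "line %d: %d fields, expected %d\n", NR, NF, n }' \
        parameters.csv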

Monitoring of progress

  • how far along is the analysis
  • did steps finish successfully or with errors, while running (could be done with the #end macro; see the sketch below)
  • add a job at the end that creates a report of the results (for the commandline)
    • also do this for the database version, to have this info in case of database problems.
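
A sketch of the idea: each job appends a status line when it finishes (e.g. from the #end macro), and the progress report is one grep away. The log path and step variable are illustrative:

    # appended by each job on completion:
    echo "$(date +%FT%T) step=$STEP status=finished" >> rundir/progress.log

    # progress report:
    total=$(ls rundir/*.sh | wc -l)
    finished=$(grep -c 'status=finished' rundir/progress.log)
    echo "$finished of $total jobs finished"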

Monitoring of success and resource usage

  • have a harmonized method to report 'success' or 'error', incl. message + runtime (see the wrapper sketch below)
  • also include resource statistics such as max, min, etc.
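
A sketch of such a harmonized wrapper around a step script; the file names are illustrative. Running the step under GNU /usr/bin/time -v would additionally capture peak memory:

    # Wrap the real step and emit one uniform report line.
    start=$(date +%s)
    if bash step.sh 2> step.err; then
      status=success; msg=
    else
      status=error; msg=$(tail -n 1 step.err)
    fi
    runtime=$(( $(date +%s) - start ))
    echo "status=$status runtime=${runtime}s message=$msg" >> job.report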

Make it transparent/unimportant for the user which backend is actually used

  • the system decides where it (can) run: cluster, pbs, grid, local (see the dispatch sketch below)
  • needs a flexible file manager that can 'stage' data for any backend
  • would like to be able to restart on another backend
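
A minimal sketch of backend dispatch; MC_BACKEND is an invented setting and the grid line assumes a gLite-style submitter:

    backend=${MC_BACKEND:-local}
    case "$backend" in
      local) bash job.sh ;;
      pbs)   qsub job.sh ;;
      grid)  glite-wms-job-submit -a job.jdl ;;   # needs a JDL wrapper for job.sh
      *)     echo "unknown backend: $backend" >&2; exit 1 ;;
    esac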

Transparent server to stage data

  • to easily move a pipeline to other storage.

Restart from specific step

  • remove the files produced so far
  • reduce the problem by using a *.tmp file and only 'mv' it when the step is successful
  • however, this alone doesn't work if we want to restart
  • could use a folder per step, so you can delete the folders from that step onwards (see the sketch below)
  • can be solved by good practice
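
The write-then-rename pattern plus per-step folders, sketched with invented paths; a restart from step 3 is then a single delete:

    # during a step: write to *.tmp, rename only on success
    mkdir -p rundir/step3
    some_tool > rundir/step3/out.tmp \
      && mv rundir/step3/out.tmp rundir/step3/out

    # restart from step 3 onwards:
    rm -rf rundir/step3 rundir/step4 rundir/step5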

Store pilot job id in the task in the database

  • I need to know which pilot jobs have died, and which tasks were associated with them
  • Then I can re-release the tasks so they can be done by another pilot

Like to have a 'heartbeat' for jobs

  • so I can be sure a (pilot) job is still alive
  • could use a 'background' process that pings back to the database (see the sketch below)
  • could also be used for pbs jobs
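
A sketch of the heartbeat idea using a file; pinging the database instead would follow the same pattern:

    # background loop that reports "still alive" once a minute
    (
      while true; do
        date +%s > "rundir/heartbeat.$$"
        sleep 60
      done
    ) &
    hb_pid=$!
    trap 'kill $hb_pid' EXIT

    bash job.sh   # the real work; the heartbeat dies with this shell

A monitor can then treat any heartbeat older than a few minutes as a dead (pilot) job and re-release its tasks.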

Add putFile -force to the manual

Enable easy merging of workflows, merging of parameters

  • easily combine protocols from multiple workflows
  • want fewer parameter files
  • meanwhile allow multiple worksheets

Get rid of parameters.csv and instead create a worksheet

  • i.e. parameter names on the first row
  • express 'hasOne' via the naming scheme A_B, meaning B has one A (see the example below)
  • conclusion: use multiple headers.
  • allow -parameters and -tparameter
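
An invented example of such a worksheet, with the parameter names on the first row; the column project_sample encodes 'each sample has one project' via the A_B scheme:

    sample,fastq,project_sample
    s1,s1.fq.gz,proj1
    s2,s2.fq.gz,proj1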

Clean up backend-specific protocols

  • e.g. 'touch' commands

Visualization framework for analysis runs

Re-architect the components

  • one interface, multiple implementations
  • unit tests
  • Can we do something with unit tests per protocol? (see the sketch below)
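
A sketch of what a per-protocol unit test could look like; the protocol name, parameters and output are invented:

    # run the protocol on tiny fixed inputs in a scratch dir,
    # then check that the expected output exists and is non-empty
    tmp=$(mktemp -d)
    input=test/data/small.fq output=$tmp/out.bam bash protocols/Align.sh
    test -s "$tmp/out.bam" && echo "Align.sh: OK" || echo "Align.sh: FAIL"
    rm -rf "$tmp"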

Make submit.sh use a 'lock' file so the first job only ends when everything is submitted

  • the problem is that when the first job fails quickly, many dependent jobs are not yet submitted and get orphaned
  • this way dependent jobs can always be submitted and never get orphaned (Alex's feature)
  • add this in the #end macro so jobs can never finish until the complete workflow is submitted (see the sketch below)
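
A sketch of the lock idea; the file names are illustrative:

    # at the very end of submit.sh, after all jobs are queued:
    touch rundir/all.submitted

    # at the end of every generated job (e.g. via the #end macro):
    while [ ! -f rundir/all.submitted ]; do sleep 5; done

This way even a job that fails in its first second keeps 'running' until the whole workflow is queued, so its dependents can still be submitted normally.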

Separate protocols from code

  • yes, a separate github repo
  • should enable combining multiple protocol folders and multiple parameter files
  • should indicate with which compute version it works

Users for runs, priorities?

  • needed if we want a 'priority queue' for pilot jobs

Publish!

Approach

  • clean start = yes
  • separate development of 5 from using 4 (bug fixes) = yes
  • database, commandline or both = both
  • release schedule -> roadmap
  • backwards compatibility = no