| 6 | |
| 7 | === Where to put reference data === |
| 8 | |
| 9 | Reference data sets that are available to all users (and hence are not group-specific) can be deployed ''as-is'' in: |
| 10 | {{{ |
| 11 | /apps/data/${provider}/${data_set}/$version/ |
| 12 | }}} |
| 13 | When reference data must be modified, for example because it must be indexed or reformatted for use with a specific version of software, you must put the derived version in a sub dir to indicate it is not the original. When it was modified for a specific version of an app, you could for example create additional sub dirs, named after the app and its version, like this: |
| 14 | {{{ |
| 15 | /apps/data/${provider}/${data_set}/$version/${app}/${version}/ |
| 16 | }}} |
| 17 | Always add a {{{/apps/data/${provider}/${data_set}/$version/README}}} with at least details on: |
| 18 | * What the source location of the data was. |
| 19 | * When it was downloaded. |
| 20 | * If a derived flavor was created: how the data was modified (link to eLabjournal and/or code in our GitHub repos) and for what purpose. |
| 21 | |
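The layout and README conventions above can be sketched as a short shell script. This is only an illustration under stated assumptions: a scratch directory stands in for `/apps/data`, the provider, data set, and version values are borrowed from the GRCh38 example, and the README field labels and placeholder texts are hypothetical, not a prescribed format.

```shell
#!/bin/bash
set -eu

# Assumption: a scratch directory stands in for /apps/data so this sketch is safe to run anywhere.
data_root="$(mktemp -d)"

# Illustrative values only.
provider='GRC'
data_set='GRCh'
version='38'
app='BWA'
app_version='0.7.12-goolf-1.7.20'

original_dir="${data_root}/${provider}/${data_set}/${version}"
derived_dir="${original_dir}/${app}/${app_version}"

# Create the dir for the original data and the sub dir for the derived flavor in one go.
mkdir -p "${derived_dir}"

# Write a README with the minimum details; replace the <...> placeholders by hand.
# Assumption: the data was downloaded today, hence the current date.
cat > "${original_dir}/README" <<EOF
Source location: <URL or path the data was downloaded from>
Downloaded on:   $(date '+%Y-%m-%d')
Derived flavors:
  ${app}/${app_version}/: <how the data was modified, link to eLabjournal and/or GitHub code, purpose>
EOF

echo "Created ${derived_dir}"
```

Keeping the derived flavor below the versioned data set dir means a single README at the data set level can describe both the original and everything derived from it.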
| 22 | {{{#!comment |
| 23 | TODO: add example for GRCh38 |
| 24 | |
| 25 | /apps/data/GRC/GRCh/38/ * data downloaded "as is"; at most unpacked |
| 26 | /apps/data/GRC/GRCh/38/BWA/0.7.12-goolf-1.7.20/ * a set of relative symlinks to the reference sequences: ../../unpacked reference FASTA seqs |
| 27 | * BWA indices for this reference |
| 28 | }}} |
| 29 | |
| 30 | === Syncing deployed reference data to nodes === #SyncRefData |
| 31 | |
| 32 | Before you can use reference data on cluster nodes, it needs to be synced to various places. |
| 33 | * Switch to the ''envsync'' user: |
| 34 | {{{ |
| 35 | $> sudo -u umcg-envsync bash |
| 36 | }}} |
| 37 | * Now sync the reference data by specifying the path to the data set relative to /apps/data/ (or specify the complete absolute path if you like to type). The sync will work recursively. |
| 38 | {{{ |
| 39 | $> hpc-environment-sync.bash -r ReferenceData/ |
| 40 | }}} |
| 41 | or |
| 42 | {{{ |
| 43 | $> hpc-environment-sync.bash -r /apps/data/ReferenceData/ |
| 44 | }}} |
| 45 | * For a full list of options use the command-line help: |
| 46 | {{{ |
| 47 | $> hpc-environment-sync.bash -h |
| 48 | }}} |
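Since the relative and the absolute form of the path refer to the same data set, the relation between the two can be sketched with a small helper. The function name `sync_command` is hypothetical and not part of `hpc-environment-sync.bash`; it only prints the command it would run, so it is safe to try on any machine.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: print the sync command for a data set,
# given either a path relative to /apps/data/ or the complete absolute path.
sync_command() {
    local path="${1}"
    # Strip a leading /apps/data/ so both forms end up as the same relative path.
    local relative="${path#/apps/data/}"
    echo "hpc-environment-sync.bash -r ${relative}"
}

sync_command 'ReferenceData/'
sync_command '/apps/data/ReferenceData/'
```

Both calls print the same command line, which matches the two equivalent invocations shown above.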