wiki:StoryConvertPhenoData

Version 6 (modified by Erik Roos, 13 years ago)


As an LL data manager, I want to load data from the publish layer into EAV (pheno model)


Scrum: ticket:1067

How to demo:

Acceptance criteria:

Instructions:

  • Install Oracle SQL Developer using jorislops@…'s account, or make your own
  • Adjust the date settings (Tools -> Preferences -> Database -> NLS) to YYYY-MM-DD and the time settings to YYYY-MM-DD HH.MI.SSXFF AM TZR
  • Make an SSH tunnel using the command: ssh -L 2000:orawise01.target.rug.nl:15300 username@…
  • Go to Add Connections, hit the + sign, Hostname = localhost, Port = 2000, Service Name = llpacc / llptest / llp, Username = molgenis, Password = <ask Joris, Robert or Erik>
  • Connect to one of the databases (prod, acc or test), go to Other Users -> LLPOPER -> Views, select all views except VW_BEP_OMSCHR and VW_DICT_HULP, and export them to CSV (one file per view)
  • Now you have the CSV files that form the input of the convertor
  • VW_DICT_DATA.csv contains all the protocols and measurements
  • Convertor code is in molgenis_apps/Apps/lifelinespheno, build script is build_lifelines.xml (in project root)
  • Uncomment ImportMapper.java and all four Java files in the 'listeners' package
  • Change the path to the CSV files (line 46 of ImportMapper.java in package lifelinespheno.org.molgenis.lifelines)
  • Run the convertor as a Java application
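The NLS date format set in SQL Developer determines how dates appear in the exported CSV files, so the convertor must parse them with a matching pattern. A minimal sketch (hypothetical helper class, not the actual ImportMapper code):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper: parses date values exported by SQL Developer
// with the NLS date format set to YYYY-MM-DD (see the settings above).
public class CsvDate {
    public static Date parse(String value) throws ParseException {
        // SimpleDateFormat is not thread-safe, so create one per call
        return new SimpleDateFormat("yyyy-MM-dd").parse(value);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parse("2011-05-31"));
    }
}
```

If the NLS settings and the parse pattern disagree, dates will fail to parse or load incorrectly, which is one of the open "Dates" issues below.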

To-do's / issues:

  • Dates
  • BEZOEK_ID (PK?) empty in LL_VW_BEZOEK
  • The data in the Test and Acceptance environments are not the same
  • Add logging (so we can see what is going on when it crashes in the production environment, if that ever occurs)
    • Add Thread Monitor
  • How to handle/load/implement descriptive tables like LAB_BEPALING; this table is actually a big list of Measurements with a lot of extra fields.
    • options:
      • Create a new type that extends Measurement and holds the additional fields
      • Merge the data into the label of Category
  • How to handle/load/implement the table that describes which foreign keys are used between the tables.
    • The matrix viewer should know this info as well to build correct queries
  • Refactor the lifelines packages (they are a bit messy): remove old, no longer used code and move classes into descriptively named packages
  • Remove JPA dependencies
    • Many-to-many relations in JPA do not work properly with labels, for example ov.setTarget_Name("x"). In JDBCMapper this is solved, but we do not know where, or how we could best do the same for JPA. Setting by label should also be covered by the generated tests
    • Remove/change the org.molgenis.JpaDatabase interface to prevent this
    • Trick to prevent a compilation problem in Hudson; should be changed:
    • this.em = ((JpaDatabase)db).getEntityManager().getEntityManagerFactory().createEntityManager();
    • JPA reverse relations cause Hudson to give a compile error. They should be added to MOLGENIS for non-JPA entities and implemented as @Deprecated or throw UnsupportedOperationException.
  • Update the CSV readers to be multi-threaded?
  • In the (production) environment it is not a bad idea to put the Java executable in the Oracle VM that is part of the database setup.
  • Last but not least, test whether the data is loaded correctly (test from Anco).
  • We should make sure that the data is always loaded in the right format (meaning it always ends up the right way in the database).
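For the "multi-threaded CSV readers" item above, one possible direction is to read each exported view file on its own worker thread. This is a sketch with hypothetical names, not the existing convertor code; it only counts lines per file, where the real readers would feed rows to the listeners:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: process each exported CSV (one per view) on its
// own worker thread, collecting per-file line counts as a stand-in for
// the real per-row import work.
public class ParallelCsvReader {
    public static ConcurrentMap<String, Integer> countLines(List<Path> csvFiles)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        for (Path p : csvFiles) {
            pool.execute(() -> {
                try {
                    counts.put(p.getFileName().toString(), Files.readAllLines(p).size());
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return counts;
    }
}
```

Whether this actually helps depends on where the bottleneck is: if the database writes dominate, parallel file reading gains little, which is worth measuring before refactoring.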