wiki:StoryConvertPhenoData

Version 7 (modified by Erik Roos, 13 years ago)

--

As a LL data manager, I want to load data from publish layer into EAV (pheno model)


Scrum: ticket:1067

How to demo:

Acceptance criteria:

Instructions:

  • Install Oracle SQL Developer using jorislops@…'s account, or create your own
  • Adjust date settings (tools -> preferences -> database -> NLS) to YYYY-MM-DD and time settings to YYYY-MM-DD HH.MI.SSXFF AM TZR
  • Make an SSH tunnel using the command: ssh -L 2000:orawise01.target.rug.nl:15300 username@…
  • Go to Add Connections, hit the + sign, Hostname = localhost, Port = 2000, Service Name = llpacc / llptest / llp, Username = molgenis, Password = <ask Joris, Robert or Erik>
  • Connect to one of the databases (prod, acc, test), go to Other Users, go to LLPOPER User, go to Views, select them all except VW_BEP_OMSCHR and VW_DICT_HULP, and export them to CSV (separate files)
  • Now you have the CSV files that form the input of the convertor
  • VW_DICT_DATA.csv contains all the protocols and measurements
  • Convertor code is in molgenis_apps/Apps/lifelinespheno, build script is build_lifelines.xml (in project root)
  • Uncomment the code in ImportMapper.java and in all four Java files in the 'listeners' package.
  • Change the path to the CSV files (line 46 of ImportMapper.java in package lifelinespheno.org.molgenis.lifelines)
  • Run the convertor as Java application
  • If you encounter errors, find out where in "our" code it went wrong. It is probably an issue with an entity that is not saved before being flushed. Go to the offending line and find out which entity is the problem, then scroll to the right in the Console to see which many-to-one relation of that entity is the cause. Go to the generated Java class of the entity and uncomment the "cascade" part in the @ManyToOne line just above the line where the relation is defined. Re-run.
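The cascade fix in the last step looks roughly like this in a generated entity class. This is a sketch only: the class and field names (Bezoek, Protocol) are hypothetical, not the actual generated code.

```java
import javax.persistence.*;

// Hypothetical generated entity; names are illustrative only.
@Entity
public class Bezoek
{
    @Id
    private Integer id;

    // Before the fix, the cascade attribute is commented out in the generated
    // class. Uncommenting it makes JPA persist the referenced entity together
    // with this one at flush time, avoiding the "entity not saved before
    // being flushed" error described above.
    @ManyToOne(cascade = {CascadeType.PERSIST, CascadeType.MERGE})
    private Protocol protocol;
}
```

Whether PERSIST alone or PERSIST plus MERGE is appropriate depends on how the convertor saves the parent entity; check the generated code for the exact commented-out cascade line rather than copying this fragment verbatim.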

To-do's / issues:

  • Dates
  • BEZOEK_ID (PK?) empty in LL_VW_BEZOEK
  • The data in the Test and Acceptance environments is not the same
  • Add logging (so we can see what is going on when it crashes in the production environment, if that ever happens)
    • Add Thread Monitor
  • How to handle/load/implement descriptive tables like LAB_BEPALING; this table is actually a big list of Measurements with a lot of extra fields.
    • options:
      • Create a new type that extends Measurement and holds the additional fields
      • Merge data into the label of Category
  • How to handle/load/implement the table that describes which foreign keys are used between the tables.
    • The matrix viewer should know this info as well to build correct queries
  • Refactor the lifelines packages (they are a bit messy): remove old code that is no longer used and move the remaining code into descriptive packages
  • Remove JPA dependencies
    • Many-to-many relations in JPA do not work properly with labels, for example ov.setTarget_Name("x"). In JDBCMapper this is solved, but we do not know where, or how we could best do this for JPA. Setting by label should also be covered in the generated tests
    • Remove or change the org.molgenis.JpaDatabase interface to prevent this
    • This is a trick to prevent a compilation problem in Hudson and should be changed:
    • this.em = ((JpaDatabase)db).getEntityManager().getEntityManagerFactory().createEntityManager();
    • JPA reverse relations cause Hudson to give a compile error. They should be added to MOLGENIS for non-JPA entities as well, and implemented as @Deprecated or by throwing UnsupportedOperationException
  • Update the CSV readers to be multi-threaded?
  • In the production environment it is not a bad idea to run the Java executable in the Oracle VM that is part of the database setup
  • Last but not least, test whether the data is loaded correctly (test from Anco)
  • We should make sure that the data is always loaded in the right format (i.e. that it always ends up the right way in the database)
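The multi-threaded CSV reader idea from the to-do list could be sketched as one reader task per exported view file, fanned out over a thread pool. This is a minimal, self-contained illustration, not the convertor's actual code: the file names and the row-counting task are placeholders for the real per-file import work.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

// Sketch: process several exported view CSVs in parallel, one task per file.
// In the real convertor each task would parse and import the file instead of
// just counting its data rows.
public class ParallelCsvReader
{
    // Count the data rows in one CSV file (excluding the header line)
    static int countDataRows(Path csv) throws IOException
    {
        List<String> lines = Files.readAllLines(csv);
        return Math.max(0, lines.size() - 1);
    }

    public static void main(String[] args) throws Exception
    {
        // Create two small example files in a temp dir (stand-ins for the
        // VW_* exports from SQL Developer)
        Path dir = Files.createTempDirectory("csvdemo");
        Path a = dir.resolve("VW_BEZOEK.csv");
        Path b = dir.resolve("VW_DICT_DATA.csv");
        Files.write(a, List.of("BEZOEK_ID,DATUM", "1,2011-01-01", "2,2011-01-02"));
        Files.write(b, List.of("NAAM,TYPE", "lengte,int"));

        // One task per file; pool size would be tuned in the real convertor
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Map<String, Future<Integer>> results = new LinkedHashMap<>();
        for (Path p : List.of(a, b))
        {
            results.put(p.getFileName().toString(), pool.submit(() -> countDataRows(p)));
        }
        for (Map.Entry<String, Future<Integer>> e : results.entrySet())
        {
            System.out.println(e.getKey() + ": " + e.getValue().get() + " rows");
        }
        pool.shutdown();
    }
}
```

Note that parallel reading only helps if the import side can also accept entities from multiple threads; if everything funnels into a single EntityManager, the database writes remain the bottleneck.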