= MolgenisFile =

Managing files is becoming increasingly important in some of our projects that deal with large, preformatted datasets. Also, many results are files of a non-relational nature, such as images or documents. I would like to present some thoughts and principles here on how to better deal with such files in a database (Molgenis) context. I do not pretend to have the best solutions, nor do I think it is clever to introduce features already present in Molgenis. Instead I would like to open a constructive discussion on how to use this work and/or its principles to improve Molgenis and the way we design software dealing with the challenges it addresses. I hope you find it at least informative or inspirational :) - Joeri

== Overview ==

Here we explain the differences between two ways to treat files, and how they can be harmonized.

=== Field is file ===

Molgenis has a field type 'file' which allows you to store and retrieve files. This is a solid mechanism that works just fine for most applications. See the Molgenis [http://www.molgenis.org/wiki/FieldElement guide]. For advanced users, however, there are some limitations. For instance:

The storage directory is hardcoded in a properties file. This is not ideal, because:
 * You often cannot redeploy elsewhere without editing this file (i.e. the application is not portable)
 * There is no way to check whether the path is correct, nor whether tomcat/java has rights to use it for read/write actions

> MS: not a valid argument. You can solve this by making the 'properties' live inside the database.
> Action: We therefore could (and should) make a 'MolgenisProperties' entity to store these system settings.

The file is a field and not an entity of its own. This means:
 * There is no straightforward way to attach a plugin for e.g. viewing the file
 * Adding decorators, extensions, etc. is not possible in a suitable way

> MS: this is a very valid argument. However, this should be solved as part of the data model, not as core method.
> Action plan: we keep and improve field type="file". Only then we add inside the GCC model a way to use File as entity. type="file_entity" (see note B)

=== Entity is file ===

To make the mechanism more open and flexible, the MolgenisFile entity was added to the [http://www.molgenis.org/svn/gcc/trunk/handwritten/datamodel/shared/core.xml core] datamodel. The model without descriptions:
{{{
}}}
This basic entity represents a file. It has two attributes: file name and extension. The extension is important, because it is used to look up the MIME type at runtime. For example, 'png' will be served out as 'image/png'. More about the attributes and subclassing later on.

== Merging ==

If the entity way of handling files is a good idea, it would be very feasible to combine the two and get the best of both worlds. The model and classes that deal with file handling could be put into the Molgenis source so they are always available and centrally updated. When using file as a field, it would under the hood simply be an XREF to the MolgenisFile table, so the user does not notice a difference at all. However, it would give developers more freedom, because the files can also be treated as entities. Developers can extend the MolgenisFile definition and handlers to tailor projects to their specific needs, while keeping the field + XREF construction for the end users. There does not have to be a conflict with the current implementation :)

> MS: I want to have a more backward compatible and flexible method. Proposal: make the 'complex' file build on top of the 'field' file:
{{{
}}}

== Technical ==

Here we delve into the cool stuff on how to exploit new possibilities.
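
To make the extension-to-MIME mapping described under 'Entity is file' a bit more concrete, here is a minimal sketch. It is not the Molgenis implementation: the class name `ExtensionMimeMapper`, the method `mimeTypeFor` and the small lookup table are all hypothetical, and a real application would more likely delegate to the servlet container or a full MIME registry.
{{{
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Molgenis): maps a file extension to a MIME type.
final class ExtensionMimeMapper {

    // Fallback for extensions we do not know about.
    static final String DEFAULT_TYPE = "application/octet-stream";

    // Tiny illustrative table; a real registry would be much larger.
    private static final Map<String, String> TYPES = new HashMap<String, String>();
    static {
        TYPES.put("png", "image/png");
        TYPES.put("jpg", "image/jpeg");
        TYPES.put("txt", "text/plain");
        TYPES.put("pdf", "application/pdf");
    }

    private ExtensionMimeMapper() {
    }

    // Accepts "png", "PNG" or ".png"; unknown or missing extensions get the fallback.
    static String mimeTypeFor(String extension) {
        if (extension == null) {
            return DEFAULT_TYPE;
        }
        String key = extension.trim().toLowerCase(Locale.ROOT);
        if (key.startsWith(".")) {
            key = key.substring(1);
        }
        String type = TYPES.get(key);
        return type != null ? type : DEFAULT_TYPE;
    }
}
}}}
With this sketch, `ExtensionMimeMapper.mimeTypeFor("png")` yields 'image/png', matching the example given for the MolgenisFile extension attribute.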

> MS: A completely different and much more direct approach to influence this would be:
{{{
}}}
And then a Java interface, for example:
{{{
public interface FileInputDecorator
{
    // decorator that influences how this thing is rendered
    public String render();

    // validate the file in as many ways as you want
    public MolgenisMessage validate();

    // hook to do things before insert, update, delete
    public void preUpdate(Entity e, Operation operation);

    // hook to do things after insert, update, delete
    public void postUpdate(Entity e, Operation operation);
}
}}}

=== Decorating ===

When file is an entity, we can use a decorator to influence its behaviour. The decorator is automatically applied to all the subclasses of the entity as well. Basically, the decorator takes care of the mapping of the entity (any MolgenisFile) to the file on the filesystem. It does things like:
 * Names are 'escaped' to filesafe versions (e.g. strange characters removed)
 * Names must be unique when escaped (handy for finding/downloading)
 * Files need to be renamed when the name is changed
 * Files need to be deleted properly when the record (entity) is removed
 * Extensions must be correct
 * And so on.

Informative errors are thrown when something goes wrong. The code can be found [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/decorators/MolgenisFileDecorator.java here].

=== Setting storage location ===

> MS: This is very useful but I want this to be generalized to a 'MolgenisProperties' screen that would validate the molgenis settings. It should also include database name, password, etc. Also I would like to have a standard 'installation' script that checks if the database is consistent with the code and optionally to automatically load system data (such as system settings). Like an install wizard that asks users for some parameters on first start.

Before you can start storing files, you need a validated storage location.
There is a plugin that helps you do this [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/plugins/system/settings/ here]. The idea is as follows:
 * In a running application (deployed anywhere) you browse to the plugin. Preferably only an administrator does this; we should hide this plugin from other users.
 * You type in the preferred storage path, and click 'Set path' to save it.
 * Now, you must run two tests, which both need to succeed before this path is marked as validated.
 * When the tests are successful, your path is marked VALIDATED and you can store MolgenisFiles.

[[Image(storagedirplugin.png)]]

If the tests fail, refer to the error message and fix what is wrong. Maybe the database is not accessible, tomcat/java lacks rights on this directory, the directory is not a valid path, etc. Some information about the location is also displayed: Does it already exist? Are there files in it? The path is stored in a special table which is located inside your selected database, but outside the range of tables accessible by your application. For testing purposes, the path can be set and given validated status manually. (see note A)

=== Java API ===

> MS: everything you describe here already holds for field type="file". So this is in my book duplicated work. The only thing we can differ on is where files should go. In MOLGENIS that is /path/entity/entity_label.ext. In older versions of MOLGENIS that could be customized.

The API has two layers: BasicFileHandler and MolgenisFileHandler, which extends BasicFileHandler.

[http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/generic/BasicFileHandler.java BasicFileHandler] gives you the most basic information. It answers questions such as: what is the common file storage directory for my application, as a Java 'File' object?
For example:
{{{
BasicFileHandler bfh = new BasicFileHandler(db);
File fileStorage = bfh.getFileStorage();
}}}

[http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/generic/MolgenisFileHandler.java MolgenisFileHandler] is a direct extension of BasicFileHandler and is constructed in the same way. It is mostly focused on 'MolgenisFile' objects: you can get information about files or manipulate them using methods such as getFile(), deleteFile(), findFile(), getStorageDirFor().
{{{
MolgenisFileHandler mfh = new MolgenisFileHandler(db);
File myRealFile = mfh.getFile(myMolgenisFile);
File storageForFileType = mfh.getStorageDirFor(myMolgenisFile);
}}}

Note that each 'type' of MolgenisFile has its own subdirectory, and your application name is used as part of the storage location. For example: You have set your path to "/data/xgap" and deploy the application as "ngspipeline". You created an entity 'Video extends MolgenisFile'. A video file "result.mpg" would be saved as "/data/xgap/ngspipeline/video/result.mpg". This makes manual tasks such as browsing or backing up files on your filesystem easier.

=== Services: uploading and downloading ===

> MS: I feel that this is already covered by the html extraFields) }}}

Which requires some low-level specifications instead of a 'MolgenisFile' object. Example usage:
{{{
//upload as a MolgenisFile, type 'BinaryDataMatrix'
HashMap extraFields = new HashMap();
extraFields.put("data_name", data.getName());
PerformUpload.doUpload(db, true, data.getName()+".bin", "BinaryDataMatrix", binFile, extraFields);
}}}

Cool thing nr.2: The upload services will ask for the additional fields of a subclass if you forget them! For example, if you have an 'Image extends MolgenisFile', and add a field to this subclass:
{{{
}}}
Then the upload API will require you to provide an 'investigation_name', or report back an error if you don't (e.g. "Missing needed field 'investigation_name' for MolgenisFile type 'Image'").

'''Downloading''' is as simple as can be.
All you need to do is provide the name of the MolgenisFile to the download [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/servlet/Download.java service], and it will return a download (outputstream) with the file content. The cool thing here is that the MIME type is automatically looked up from the file extension, so your browser will know what to do with this type of file.
{{{
response.setContentType(sc.getMimeType(mf.getExtension()));
}}}
Use the service by calling:
{{{
http://255.255.255.255:8080/xgap_1_4_distro/download.do?name=SomeFile
}}}
Just like the Upload servlet, it wraps a Java API (MolgenisFileHandler) that you can use elsewhere. (see sourcecode)

=== Practical example ===

> MS: this doesn't allow you to easily reuse viewers in more complicated entities and it greatly pollutes the model. I now have to subclass MolgenisFile for all types instead of just saying
{{{
}}}

Let's walk through a practical example of how to use all this, step by step. Say we want to store images in a Molgenis database. These images are coupled to an 'Investigation'. Start by adding the entity in the datamodel, extending MolgenisFile:
{{{
}}}
Now add a GUI component. We nest a small [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/plugins/molgenisfile/ plugin] in the form that will allow us to upload and view the images that belong to the records.
{{{
}}}
After generating, browse to the image section of the GUI. Create a new record as normal.

[[Image(newimagerecord.png)]]

The plugin appears, telling you there is no source file. Select a picture and press upload.

[[Image(imguploaded.png)]]

Done! If you take a look at your filesystem, you'll find the file at your storage path + app name + MolgenisFile type, meaning:

[[Image(imglocation.png)]]

The plugin that we use here is very simple, and only wraps the upload and download services. Here's the upload code:
{{{
File content = request.getFile("upload");
PerformUpload.doUpload((JDBCDatabase) db, this.model.getMolgenisFile(), content);
this.setMessages(new ScreenMessage("File uploaded", true));
}}}
And the viewer simply puts an IFRAME around a download:
{{{