= MolgenisFile =

Managing files is becoming increasingly important in some of our projects that deal with large, preformatted datasets. Also, many results are files of a non-relational nature, such as images or documents. I would like to present some thoughts and principles here on how to better deal with such files in a database (Molgenis) context. I do not pretend to have the best solutions, nor do I think it is clever to introduce features already present in Molgenis. Instead, I would like to open a constructive discussion on how to use this work and/or its principles to improve Molgenis and the way we design software dealing with the challenges it addresses. I hope you find it at least informative or inspirational :)

- Joeri

== Overview ==

Here we explain the differences between two ways to treat files, and how they can be harmonized.

=== Field is file ===

Molgenis has a field type 'file' which allows you to store and retrieve files. This is a solid mechanism that works just fine for most applications. See the Molgenis [http://www.molgenis.org/wiki/FieldElement guide]. For advanced users, however, there are some limitations. For instance:

The storage directory is hardcoded in the properties file. This is not ideal, because:
 * You often cannot redeploy elsewhere without editing this file (i.e. the application is not portable)
 * There is no way to check whether the path is correct, nor whether tomcat/java has rights to use it for read/write actions

> MS: not a valid argument. You can solve this by making the 'properties' live inside the database.
> Action: We therefore could (and should) make a 'MolgenisProperties' entity to store these system settings.

The file is a field and not an entity of its own. This means:
 * There is no straightforward way to attach a plugin for e.g. viewing the file
 * Adding decorators, extensions, etc. is not possible in a suitable way

> MS: this is a very valid argument. However, this should be solved as part of the data model, not as core method.
> Action plan: we keep and improve field type="file". Only then do we add, inside the GCC model, a way to use File as an entity: type="file_entity" (see note B)

=== Entity is file ===

To make the mechanism more open and flexible, the MolgenisFile entity was added to the [http://www.molgenis.org/svn/gcc/trunk/handwritten/datamodel/shared/core.xml core] datamodel. The model without descriptions:
{{{
}}}

This basic entity represents a file. It has two attributes: file name and extension. The extension is important, because it is used to map the MIME type at runtime. For example, 'png' will be served out as 'image/png'. More about the attributes and subclassing later on.

== Merging ==

If the entity way of handling files is a good idea, it would be quite feasible to combine the two and use the best of both worlds. The model and classes that deal with file handling could be put into the Molgenis source so they are always available and centrally updated. When using file as a field, it would behind the scenes simply be an XREF to the MolgenisFile table, so the user does not notice a difference at all. However, it would give developers more freedom, because the files can also be treated as entities. Developers can extend the MolgenisFile definition and handlers to tailor projects to their specific needs, while keeping the field + XREF construction for the end users. There does not have to be a conflict with the current implementation :)

> MS: I want to have a more backward compatible and flexible method. Proposal: make the 'complex' file build on top of the 'field' file:
{{{
}}}

== Technical ==

Here we delve into the cool stuff on how to exploit new possibilities.

> MS: A completely different and much more direct approach to influence this would be:
{{{
}}}

=== Decorating ===

When file is an entity, we can use a decorator to influence its behaviour. The decorator is automatically applied to all the subclasses of the entity as well.
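
As a rough illustration of the decorator idea, the sketch below wraps a mapper and checks a record before passing it on. It is not the actual MolgenisFileDecorator: FileRecord, Mapper, SimpleMapper and CheckingDecorator are hypothetical names made up for this example, and the escaping rule (keep only letters, digits and underscores) is an assumption.

{{{
import java.util.HashSet;
import java.util.Set;

public class FileDecoratorSketch {

    /** Hypothetical stand-in for a MolgenisFile record: a name and an extension. */
    static class FileRecord {
        final String name;
        final String extension;

        FileRecord(String name, String extension) {
            this.name = name;
            this.extension = extension;
        }
    }

    /** Hypothetical mapper interface; the generated Molgenis mappers are not shown here. */
    interface Mapper {
        void add(FileRecord record);
    }

    /** Plain mapper that only remembers the names it was given. */
    static class SimpleMapper implements Mapper {
        final Set<String> stored = new HashSet<>();

        @Override
        public void add(FileRecord record) {
            stored.add(record.name);
        }
    }

    /** Decorator: validates the record, then delegates to the wrapped mapper. */
    static class CheckingDecorator implements Mapper {
        private final Mapper inner;
        private final Set<String> escapedNames = new HashSet<>();

        CheckingDecorator(Mapper inner) {
            this.inner = inner;
        }

        @Override
        public void add(FileRecord record) {
            // Assumed rule: an extension consists of letters and digits only.
            if (!record.extension.matches("[A-Za-z0-9]+")) {
                throw new IllegalArgumentException(
                        "Invalid extension '" + record.extension + "'");
            }
            // Names must still be unique after escaping.
            String escaped = escape(record.name);
            if (!escapedNames.add(escaped)) {
                throw new IllegalArgumentException("Name '" + record.name
                        + "' is not unique when escaped ('" + escaped + "')");
            }
            inner.add(record);
        }
    }

    /** Assumed escaping rule: drop everything except letters, digits and underscores. */
    static String escape(String name) {
        return name.replaceAll("[^A-Za-z0-9_]", "");
    }

    public static void main(String[] args) {
        Mapper mapper = new CheckingDecorator(new SimpleMapper());
        mapper.add(new FileRecord("my result #1", "png"));
        try {
            // Escapes to the same file-safe name as the first record, so it is rejected.
            mapper.add(new FileRecord("my-result-1", "png"));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
}}}

Because the checks live in the wrapper and not in the mapper itself, any subclass record that passes through the same wrapper gets them for free, which is the property described above.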
Basically, the decorator takes care of the mapping of the entity (any MolgenisFile) to the file on the filesystem. It enforces things like:
 * Names are 'escaped' to file-safe versions (e.g. strange characters removed)
 * Names must be unique when escaped (handy for finding/downloading)
 * Files need to be renamed when the name is changed
 * Files need to be deleted properly when the record (entity) is removed
 * Extensions must be correct
 * And so on.

Informative errors are thrown when something isn't going right. The code can be found [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/decorators/MolgenisFileDecorator.java here].

=== Setting storage location ===

Before you can start storing files, you need a validated storage location. There is a plugin that helps you do this [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/plugins/system/settings/ here]. The idea is as follows:
 * In a running application (deployed anywhere) you browse to the plugin. Preferably only an administrator does this; we should hide this plugin from others.
 * You type in the preferred storage path, and click 'Set path' to save it.
 * Now, you must run two tests, which both need to succeed before this path is marked as validated.
 * When the tests are successful, your path is marked VALIDATED and you can store MolgenisFiles.

[[Image(storagedirplugin.png)]]

If the tests fail, refer to the error message and fix what is wrong. Maybe the database is not accessible, tomcat/java lacks rights on this directory, the directory is not a valid path, etc. Some information about the location is also displayed: Does it already exist? Are there files in it?

The path is stored in a special table which is located inside your selected database, but outside the range of tables accessible by your application. For testing purposes, the path can be set and given validated status manually. (see note A)

=== Java API ===

The API has two layers: BasicFileHandler and MolgenisFileHandler, which extends BasicFileHandler.
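
To give a feel for this two-layer split, here is a minimal sketch. It assumes the directory layout used in this page's example (storage path + application name + file type), and it assumes the type name is lowercased and the file is stored as name + "." + extension. BasicHandlerSketch and TypedHandlerSketch are hypothetical names; the real handlers are constructed from a database object instead of a path.

{{{
import java.io.File;
import java.util.Locale;

public class StorageLayoutSketch {

    /** Lower layer (hypothetical): only knows the common storage directory. */
    static class BasicHandlerSketch {
        private final File storageRoot;

        BasicHandlerSketch(File storageRoot) {
            this.storageRoot = storageRoot;
        }

        File getFileStorage() {
            return storageRoot;
        }
    }

    /** Upper layer (hypothetical): adds per-application, per-type locations. */
    static class TypedHandlerSketch extends BasicHandlerSketch {
        private final String appName;

        TypedHandlerSketch(File storageRoot, String appName) {
            super(storageRoot);
            this.appName = appName;
        }

        /** Assumed layout: <storage root>/<application name>/<type in lowercase>. */
        File getStorageDirFor(String type) {
            File appDir = new File(getFileStorage(), appName);
            return new File(appDir, type.toLowerCase(Locale.ROOT));
        }

        /** Assumed file name: <name>.<extension> inside the type directory. */
        File getFile(String type, String name, String extension) {
            return new File(getStorageDirFor(type), name + "." + extension);
        }
    }

    public static void main(String[] args) {
        TypedHandlerSketch handler =
                new TypedHandlerSketch(new File("/data/xgap"), "ngspipeline");
        // Prints the location of the 'Video' file from this page's example.
        System.out.println(handler.getFile("Video", "result", "mpg").getPath());
    }
}
}}}

Keeping the root lookup in the lower layer means code that only needs the storage directory does not have to know anything about file types.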
[http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/generic/BasicFileHandler.java BasicFileHandler] tells you the most basic information, such as the common file storage directory for your application as a Java 'File' object. For example:
{{{
// Common storage directory for the whole application
BasicFileHandler bfh = new BasicFileHandler(db);
File fileStorage = bfh.getFileStorage();
}}}

[http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/generic/MolgenisFileHandler.java MolgenisFileHandler] is a direct extension of BasicFileHandler and is constructed in the same way. It is mostly focused on 'MolgenisFile' objects: you can get information about files or manipulate them using functions such as getFile(), deleteFile(), findFile() and getStorageDirFor().
{{{
MolgenisFileHandler mfh = new MolgenisFileHandler(db);
// The file on disk that belongs to this MolgenisFile record
File myRealFile = mfh.getFile(myMolgenisFile);
// The subdirectory used for this type of MolgenisFile
File storageForFileType = mfh.getStorageDirFor(myMolgenisFile);
}}}

Note that each 'type' of MolgenisFile has its own subdirectory, and your application name is used as part of the storage location. For example: you have set your path to "/data/xgap" and deployed the application as "ngspipeline". You created an entity 'Video extends MolgenisFile'. A video file "result.mpg" would be saved as "/data/xgap/ngspipeline/video/result.mpg". This makes manual tasks such as browsing or backing up files on your filesystem easier.

=== Services: uploading and downloading ===

'''Uploading''' means creating a new MolgenisFile record and putting the file in the correct place. There is a simple upload [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/servlet/Upload.java servlet] to do this.
For a basic MolgenisFile, the servlet expects to receive:
 * name = The name under which the file should be stored
 * type = The type (subclass) of MolgenisFile, in this case: 'MolgenisFile'
 * file = A file stream with the content of the file you wish to store

The servlet can be called in many ways, for example with [http://gbic.target.rug.nl/forum/showthread.php?tid=107 RCurl] or regular command-line [http://gbic.target.rug.nl/forum/showthread.php?tid=86 cURL].

Cool thing nr. 1: The servlet is detached from the actual procedure that handles creating the database records and storing the file. This is another Java API. See code [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/generic/PerformUpload.java here]. This means you can store files from anywhere in Java source code by using the static doUpload() function. There are two flavours:
{{{
doUpload(Database db, MolgenisFile mf, File content)
}}}
which needs a database object, a MolgenisFile definition, and a File pointer to the content. Example usage:
{{{
File content = request.getFile("upload");
PerformUpload.doUpload((JDBCDatabase) db, this.model.getMolgenisFile(), content);
}}}

And the second:
{{{
doUpload(Database db, boolean useTx, String name, String type, File content, HashMap extraFields)
}}}
which requires some low-level specifications instead of a 'MolgenisFile' object. Example usage:
{{{
//upload as a MolgenisFile, type 'BinaryDataMatrix'
HashMap extraFields = new HashMap();
extraFields.put("data_name", data.getName());
PerformUpload.doUpload(db, true, data.getName()+".bin", "BinaryDataMatrix", binFile, extraFields);
}}}

Cool thing nr. 2: The upload services will ask for the additional fields of a subclass if you forget them! For example, if you have an 'Image extends MolgenisFile', and add a field to this subclass:
{{{
}}}
then the upload API will want you to provide an 'investigation_name', or report back an error if you don't. (e.g.
"Missing needed field 'investigation_name' for MolgenisFile type 'Image'") '''Downloading''' is as simple as can be. All you need to do is provide the name of the MolgenisFile to the download [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/filehandling/servlet/Download.java service], and it will return a download (outputstream) with the file content. The cool thing here is that MIME types are automatically mapped to the file extension, so your browser will know what to do with this type of file. {{{ response.setContentType(sc.getMimeType(mf.getExtension())); }}} Use the service by calling: {{{ http://255.255.255.255:8080/xgap_1_4_distro/download.do?name=SomeFile }}} Just like the Upload servlet, it wraps a Java API (MolgenisFileHandler) that you can use elsewhere. (see sourcecode) === Practical example === Let's walk through a practical example on how to use all this stuff, step-by-step. Say we want to store images in a Molgenis database. These images are coupled to an 'Investigation'. Start by adding the entity in the datamodel, extending MolgenisFile: {{{ }}} Now add a GUI component. We nest a small [http://www.molgenis.org/svn/gcc/trunk/handwritten/java/plugins/molgenisfile/ plugin] to the form that will allow us to upload and view the images that belong to the records. {{{
}}}

After generating, browse to the image section of the GUI. Create a new record as normal.

[[Image(newimagerecord.png)]]

The plugin appears, telling you there is no source file. Select a picture and press upload.

[[Image(imguploaded.png)]]

Done! If you take a look at your filesystem, you'll find the file at your storage path + app name + MolgenisFile type, meaning:

[[Image(imglocation.png)]]

The plugin that we use here is very simple, and only wraps the upload and download services. Here's the upload code:
{{{
File content = request.getFile("upload");
PerformUpload.doUpload((JDBCDatabase) db, this.model.getMolgenisFile(), content);
this.setMessages(new ScreenMessage("File uploaded", true));
}}}

And the viewer simply puts an IFRAME around a download:
{{{