Monday, August 22, 2011

Some tips to make XML and HTML+JS files easier to read and maintain

Here are a couple of tips/conventions to make editing XML easier. Anyone who has moved from Ant's over-engineered, heavily parametrized config files to Maven's beautiful convention-over-configuration files already knows how useful conventions are. For the rest of you, believe me: conventions are the best friend of anyone writing XML files.

Convention 1: Use upper case for the ids of global/shared constants, that is, the constants that are normally defined at the top of the file and then used "everywhere". This is a standard convention in most programming languages.

Convention 2 (this is the important one): XML tags that are referenced by other XML tags are usually identified by an id="arbitrary_name" attribute. This arbitrary name is later used to refer to the element. The second convention gives the following rule for choosing that name:
Prefix it with "type_", where "type_" usually equals the name of the tag that "owns" the id attribute, or a real type if the tag name is too generic or not descriptive enough.

Below are two (Ant) XML files, before and after applying the conventions:
BEFORE APPLYING CONVENTIONS          || AFTER APPLYING CONVENTIONS
<project name=....>                  || <project name=....>
...                                  || ...
<property name=""        || <property name="DIR_JAVA_SRC"
   value="src" />                    ||    value="src" />
<property name="lib.dir"             || <property name="DIR_LIB"
   value="../../../lib" />           ||    value="../../../lib" />
<property name="build.dir"           || <property name="DIR_BUILD"
   value="bin" />                    ||    value="bin" />
<path id="project.classpath">        || <path id="PATH_PROJECT">
  <fileset dir="${lib.dir}">         ||   <fileset dir="${DIR_LIB}">
    <include name="**/*.jar" />      ||     <include name="**/*.jar" />
  </fileset>                         ||   </fileset>
</path>                              || </path>
<patternset id="conf">               || <patternset id="PATTERNSET_CONF">
  <include name="**/*.xml" />        ||   <include name="**/*.xml" />
  <include name="**/*.properties" /> ||   <include name="**/*.properties" />
  <include name="**/*.conf" />       ||   <include name="**/*.conf" />
</patternset>                        || </patternset>
                                     ||
<patternset id="images">             || <patternset id="PATTERNSET_IMAGES">
  <include name="**/*.png" />        ||   <include name="**/*.png" />
  <include name="**/*.jpg" />        ||   <include name="**/*.jpg" />
  <include name="**/*.gif" />        ||   <include name="**/*.gif" />
</patternset>                        || </patternset>
...                                  || 
<target name="copyconf">             || <target name="copyconf">
  <mkdir dir="${build.dir}" />       ||   <mkdir dir="${DIR_BUILD}" />
  <copy todir="${build.dir}">        ||   <copy todir="${DIR_BUILD}">
    <fileset dir="${}">  ||     <fileset dir="${DIR_JAVA_SRC}">
      <patternset refid="conf" />    ||       <patternset refid="PATTERNSET_CONF" />
    </fileset>                       ||     </fileset>
  </copy>                            ||   </copy>
</target>                            || </target>
</project>                           || </project>

The first great advantage of using the second convention is that now we can use word/code completion in our favourite text editor.
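
To see why a shared prefix helps, here is a toy model of what keyword completion does, written in JavaScript. It is only a sketch: the complete function is a hypothetical name, not a real editor API, and real editors use more elaborate matching. The identifiers are the ones defined in the AFTER column above.

```javascript
// Toy model of editor keyword completion (hypothetical helper, not a real editor API).
// Returns every known identifier that starts with the text typed so far.
function complete(identifiers, typed) {
  return identifiers.filter((name) => name.startsWith(typed));
}

// Identifiers defined in the AFTER column of the example.
const identifiers = [
  "DIR_JAVA_SRC",
  "DIR_LIB",
  "DIR_BUILD",
  "PATH_PROJECT",
  "PATTERNSET_CONF",
  "PATTERNSET_IMAGES",
];

// Typing the type prefix narrows the candidates to that type only.
console.log(complete(identifiers, "DIR_"));
console.log(complete(identifiers, "PATTERNSET_"));
```

With the BEFORE names (lib.dir, build.dir, conf, images) there is no common leading text to type, so the same lookup cannot narrow the list by type.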

Note the following points about the previous example:

- Once the conventions are applied, if we want to edit a dir value we just write:
<tagName dir="${DIR_
and word completion in our editor (for example Ctrl+X Ctrl+N in Vim) will show a list of the available directory constants.
Notice that <property name="arbitrary_name" ...> is slightly asymmetric (property is actually used as a meta-tag that defines string constants for later substitution in our XML text file). It doesn't follow the form:
<tagName id="arbitrary_name" ...>
because it uses "name", not "id", as the attribute. Also, we can't use the tag name "property" as the prefix, since it says nothing about the type of the value (it is always a string). Instead we use something like "DIR_", as in the previous example, to indicate that the property value is a directory. Other useful prefixes could be "PATH_", "CLASSPATH_", "URL_", ..., where the "PREFIX_" describes the actual type of the value.

- For "patternset", which follows the standard '<tagName id="arbitrary_name" ' form, we write:
<patternset refid="PATTERNSET_ (Ctrl+X Ctrl+N for autocompletion)
and word/code completion will now offer the list of available candidates (here PATTERNSET_CONF and PATTERNSET_IMAGES).

- We could now continue to use code completion with any defined PATH_... element or any other type.

- The convention is also exceptionally useful when mixing HTML (a sort of pseudo-XML) and JavaScript. For example, instead of identifying tables like this:
<table id="results" .... >
<table id="buttons" .... >
we can use:
<table id="table_results" .... >
<table id="table_buttons" .... >
Again, code completion will be at our disposal when handling the HTML element (table, div, form, ...) through JavaScript. Now the editor can help us write our "risky and error-prone" JavaScript code, as the next screenshot proves:
[Screenshot: code completion in the editor]
Basically, the "TYPE_" prefix convention adds manual type safety to our non-type-safe XML/HTML+JS files.

"Magically", we are now halfway between the non-type-safe languages and the compile-checked, type-safe ones. We don't yet have a compile-time check to warn us about mistakes, but at least the editor now helps us with code completion (which is itself an indirect safety check).
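
Part of that missing check can be approximated at run time. The sketch below is an assumption, not part of the convention itself: a hypothetical getTyped helper that looks an element up by id and verifies that its tag name matches the id prefix. It takes the document as a parameter so that it can be tried outside a browser with a stub object (stubDocument is likewise invented for illustration).

```javascript
// Hypothetical helper: fetch an element by id and check that the id prefix
// ("table_" in "table_results") matches the element's actual tag name.
function getTyped(doc, id) {
  const el = doc.getElementById(id);
  if (el === null || el === undefined) {
    throw new Error(`No element with id "${id}"`);
  }
  const prefix = id.split("_")[0].toLowerCase();
  const tag = el.tagName.toLowerCase();
  if (tag !== prefix) {
    throw new Error(`id "${id}" suggests <${prefix}> but the element is <${tag}>`);
  }
  return el;
}

// Minimal stand-in for the browser's document, for illustration only.
const stubDocument = {
  elements: {
    table_results: { tagName: "TABLE" },
    table_buttons: { tagName: "DIV" }, // deliberately mislabeled
  },
  getElementById(id) {
    return Object.prototype.hasOwnProperty.call(this.elements, id)
      ? this.elements[id]
      : null;
  },
};

console.log(getTyped(stubDocument, "table_results").tagName); // prints TABLE
```

In a browser the same call would be getTyped(document, "table_results"). Calling it with "table_buttons" on the stub above throws, because that element is really a div, which is exactly the kind of mistake the prefix is meant to expose.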
