Leveraging Clojure

Project Enhancements
This article is part of a series where we build and enhance the "mobile site generation" project. Each installment in this article series looks at a different computer language. A quick recap of our project: the "mobile site generation" project is a command line utility. The command line utility reads an RSS XML document and then generates a custom website. The custom website is viewable on mobile devices. You can read more details in previous articles. In this installment, we:

  • Add new navigation links.
    • The logo header now contains a link to our table of contents (index.html) page.
  • Add a meta tag to the HTML source code. The meta tag contains a list of keywords. Search engines like Bing and Google use the keywords to categorize the website. We dynamically build the keywords from the RSS XML "category" tags.
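To make the keyword meta tag concrete, here is a minimal sketch of how such a tag could be built from an article's categories. The function name keywords-meta-tag is hypothetical, and the sketch assumes the categories arrive as a collection of strings; it is not the project's actual code.

```clojure
(require '[clojure.string :as str])

(defn keywords-meta-tag
  "Hypothetical helper: builds an HTML keywords meta tag from category strings.
   Duplicate categories are dropped and the rest are sorted so the output
   is stable. Note: it does not escape HTML special characters."
  [categories]
  (format "<meta name=\"keywords\" content=\"%s\">"
          (str/join "," (sort (distinct categories)))))

;; (keywords-meta-tag ["Ruby" "Polyglot" "Ruby"])
;; => "<meta name=\"keywords\" content=\"Polyglot,Ruby\">"
```

Sorting is a design choice for the sketch: it keeps the generated HTML identical between runs, which makes it easy to test.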

In this article we implement and enhance our project application with Clojure.

Clojure is a dialect of Lisp. Like Scala, Clojure's default implementation runs on the Java Virtual Machine (JVM). Also like Scala, there is an implementation that runs on Microsoft's .NET CLR.

Concise Coding
Clojure fits the dictionary description of the word concise. Clojure expressions are very brief, yet very comprehensive.

To compensate for Clojure's brevity, I added a fair number of comments to my source code. Writing Clojure code reminded me of writing "C" expressions. In "C" you can write a function that returns a pointer to a function which returns an array of pointers to structures. When you are writing such expressions, the ideas are fresh in your mind. But if you haven't coded in "C" for a while, you need an explicit explanation of those same concise "C" expressions. So I did the same for my Clojure code: I added explicit explanations to the Clojure source code. Along the same lines, I formatted the Clojure code in an outline style. That is probably not the typical format for a Clojure project. However, for folks following this series, I felt it would make it easier to compare corresponding functionality (expressed in Java, Scala and Ruby).

Functional Programming
Both Scala and Clojure are referred to as "Functional" computer languages, as opposed to "Object Oriented" (E.G. Java, Ruby) or "Procedural" (E.G. "C", Basic) languages. In an Object Oriented language like Java, you can compose an object tree of parent and child objects (E.G. inner classes). In a "Functional" language you express function trees (E.G. outer, parent functions that contain either named or anonymous child functions).

For example, the following code listing includes 2 anonymous functions. Those functions are defined inside the definition of a larger function (not shown here; see the function "main-process"). The whole expression first iterates through a list of articles (represented by "nodes"). Each article contains one or more categories. The expression then iterates through each article's list of categories. The inner expression returns true if any of the categories matches the criterion "is equal to the word Polyglot". If the inner expression returns true, the article is kept in the collection represented by the variable "poly-articles".

To summarize, the expression compiles a list of articles that have been categorized as "Polyglot".

(let [poly-articles
        (filter (fn [n]
                  (some #(= "Polyglot" %) (:categories n)))
                nodes)]
  ;; "main-process" goes on to use poly-articles (not shown here)
  poly-articles)

Data Symbols
Note! In most computer languages, a symbol representing data is referred to as a "variable" (E.G. Java Integer myNum = 1;). I'll use the term "variable" loosely, just to make the code explanation a little more familiar.

However, there is an important distinction. In Clojure, and in "Functional Programming" generally, the data passed in to a function (I.E. a parameter) is not mutable. The data parameter does not change. The data parameter does not "vary". Thus the term "variable" doesn't quite fit.

Again, I'll use the term "variable" here only because most programmers understand "variable" to mean "data symbol".
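A small, self-contained illustration of that point: the hypothetical add-category function below does not change the article it receives; it returns a new map. The function name and the sample article data are made up for this example.

```clojure
(defn add-category
  "Hypothetical example: returns a NEW article map with one more category.
   The article passed in is left untouched."
  [article category]
  (update-in article [:categories] conj category))

(def original-article {:title "Leveraging Clojure" :categories #{"Polyglot"}})
(def updated-article (add-category original-article "Clojure"))

;; original-article still has #{"Polyglot"}
;; updated-article has #{"Polyglot" "Clojure"}
```

Clojure's persistent collections share most of their internal structure between the old and new versions, so producing the "changed" map is cheaper than a full copy.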

Examining the Clojure Expression
The following details describe the Clojure code listing above.

  • "filter" is a core Clojure function. "filter" iterates through our list of articles and keeps only the articles for which the test function returns a truthy value. "nodes" is a variable which represents our article list.
  • We delimit the start of a function call with an opening parenthesis "(". We delimit the end of a function call with a closing parenthesis ")".
  • The first anonymous function starts here: "(filter (fn [n]".
    • "fn [n]" is the anonymous function declaration.
    • "[n]" is the parameter list (a list of one parameter in this case).
  • "some" is a core Clojure function. "some" returns the first truthy result of applying a test to each item in a collection, or nil if there is none. Here, that means it returns true if any of the categories it inspects is "Polyglot".
  • The second anonymous function is expressed in the shorthand "#(...)" form. #(= "Polyglot" %) means the same thing as (fn [category] (= "Polyglot" category)). Its single parameter is "%", and its body is (= "Polyglot" %).
  • The expression (= "Polyglot" %) is an equality comparison. "%" is a placeholder which is replaced with a category value each time "some" tests a category. Thus, (= "Polyglot" %) means, "is the current category value equal to the word Polyglot". So "some" (above) returns true if any category in the list is "Polyglot".
    • "=" is the equality function. We are comparing 2 values. We want to see if the 2 values are equal.
    • "Polyglot" is the first value being compared.
    • "%" is a placeholder for the second value being compared. The placeholder is replaced with each value retrieved from the next expression, (:categories n).
  • The expression (:categories n) retrieves the categories hash set from the article.
    • Each article is a map (a collection of key/value pairs), and a keyword such as :categories can be called as a function to look up its value in a map. The collection of categories itself is stored in a Clojure hash set. A hash set stores unique values, not key/value pairs. Thus, an article categorized as both "Polyglot" and "Ruby" has a categories set that looks like #{"Polyglot" "Ruby"}.
  • "nodes" is a variable defined in the parent function. "nodes" is a complex structure which represents a list of website articles. Each article contains a "title", "body", "url/link" and a collection of categories.
  • "let [poly-articles" is the start of a larger expression. "let" is a Clojure special form. A "let" block allows the programmer to declare a locally scoped variable, "poly-articles". I then bind "poly-articles" to the result of the remaining expression. In other words, "poly-articles" holds the collection of articles which contain the category "Polyglot". The following details the "let" block. Clojure:
    • Delimits the start of the let block with an open parenthesis followed by the word "let", "(let".
    • Delimits the end of the let block with a closing parenthesis ")".
    • Delimits the start of the "let" variable declarations with an open square bracket "[".
    • Delimits the end of the "let" scoped variables with the closing square bracket "]".
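To see the whole expression run, here is a self-contained sketch with made-up sample data. The names sample-nodes and polyglot-titles are hypothetical, and it assumes, as described above, that each article is a map whose categories are a hash set of strings.

```clojure
;; Hypothetical sample data: each article is a map and its categories
;; are stored as a hash set of strings.
(def sample-nodes
  [{:title "Leveraging Clojure" :categories #{"Polyglot" "Clojure"}}
   {:title "An Unrelated Post"  :categories #{"Linux"}}
   {:title "Leveraging Ruby"    :categories #{"Polyglot" "Ruby"}}])

(defn polyglot-titles
  "Returns the titles of the articles categorized as \"Polyglot\"."
  [nodes]
  (let [poly-articles
          ;; keep each article n for which some category equals "Polyglot"
          (filter (fn [n]
                    (some #(= "Polyglot" %) (:categories n)))
                  nodes)]
    (map :title poly-articles)))

;; (polyglot-titles sample-nodes)
;; => ("Leveraging Clojure" "Leveraging Ruby")
```

Because the categories here are a hash set, (contains? (:categories n) "Polyglot") would answer the same question with a direct lookup; "some" has the advantage of working on any collection, such as a list or vector.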

You can peruse the full code listing on my GitHub repository. The source listing is "core.clj", parent function "main-process".

So, to summarize, the point of the detailed, bullet-pointed explanation above is to demonstrate how much you can do in Clojure with only a tiny segment of code.

Don't be intimidated by stack traces
A short note about stack traces. Personally, I've gotten used to editors that provide syntax checking. For my introduction to Clojure, I did not have that luxury. Thus, as I was developing, I was faced with plenty of stack traces. I might have an extra closing parenthesis, or a closing parenthesis in the wrong place. When that happens and you try to run or test your code, the result is a stack trace. After a while, it became easier and easier to eyeball the stack traces and identify the syntax issue. I found that if the top of the stack trace complained about type casting (E.G. a ClassCastException), I had probably placed a closing parenthesis in the wrong place. If the top of the stack trace complained about EOF (end of file), I was probably missing a closing parenthesis. So, although the stack traces are intimidating at first, you get used to them.
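You can reproduce both kinds of errors on purpose. This is a small sketch with hypothetical helper names (reader-error-message, misplaced-paren): the first helper catches the reader's "EOF while reading" error caused by a missing closing parenthesis, and the second shows how a misplaced parenthesis can put a number where a function is expected, which fails with a ClassCastException when the code runs.

```clojure
(defn reader-error-message
  "Returns the message of the exception thrown while reading (parsing) the
   source string, or nil when the string reads cleanly."
  [source]
  (try
    (read-string source)
    nil
    (catch RuntimeException e
      (.getMessage e))))

;; A missing closing parenthesis: the reader runs off the end of the input.
;; (reader-error-message "(defn f [x] (+ x 1)") returns a message that
;; contains "EOF while reading".

(defn misplaced-paren
  "A misplaced parenthesis often puts a value where a function is expected.
   ((+ 1 2)) reads fine, but calling 3 as a function fails at run time."
  []
  (try
    ((+ 1 2))
    (catch ClassCastException e
      :class-cast-exception)))
```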

One of the techniques I used to avoid the stack traces altogether is my next topic, unit testing.

Test Driven Development with Clojure
Test driven development is a popular workflow. Clojure comes bundled with a test facility (the clojure.test library). Combined with the Leiningen tool (detailed below), test driven development with Clojure is a snap. To demonstrate Clojure tests, let's look at some code whose logic is very familiar.

For this project, I need to convert each article's "published date" from the long format stored in the RSS XML document into a short YYYY-MM-DD format. The Clojure code to perform the date conversion looks pretty similar to the Java, Scala and Ruby versions.

(defn convert-month
  "Converts a three letter month representation to a
   two digit month representation."
  [month]
  (cond
    (= "Jan" month) "01"
    (= "Feb" month) "02"
    (= "Mar" month) "03"
    (= "Apr" month) "04"
    (= "May" month) "05"
    (= "Jun" month) "06"
    (= "Jul" month) "07"
    (= "Aug" month) "08"
    (= "Sep" month) "09"
    (= "Oct" month) "10"
    (= "Nov" month) "11"
    (= "Dec" month) "12"
    ;; catch-all for any unrecognized input
    :else "ooops"))

(require 'clojure.string) ; provides clojure.string/split

(defn format-date
  "Converts a long date format to a short yyyy-mm-dd."
  [date]
  ;; "Tue, 04 Jun 2013 22:13:24 +0000" splits into
  ;; ["Tue," "04" "Jun" "2013" "22:13:24" "+0000"]
  (let [elems (clojure.string/split date #"\s")]
    (format "%s-%s-%s" (elems 3) (convert-month (elems 2)) (elems 1))))
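As an aside, the twelve-branch cond in convert-month could also be expressed as data. The sketch below is a hypothetical alternative (month-numbers and convert-month-lookup are names made up for this example), not the project's code; it uses a map lookup with the same "ooops" fallback.

```clojure
(def month-numbers
  "Three letter month abbreviations mapped to two digit month numbers."
  {"Jan" "01" "Feb" "02" "Mar" "03" "Apr" "04" "May" "05" "Jun" "06"
   "Jul" "07" "Aug" "08" "Sep" "09" "Oct" "10" "Nov" "11" "Dec" "12"})

(defn convert-month-lookup
  "Hypothetical alternative to convert-month: a map lookup that falls back
   to \"ooops\" for unknown input, just like the cond version."
  [month]
  (get month-numbers month "ooops"))
```

Keeping the month table as a plain map separates the data from the logic, and a Clojure map can itself be called as a function: (month-numbers "Jun") returns "06".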

Now here is the Clojure test:

;; deftest, testing and is come from clojure.test
(deftest date-format
  (testing "format-date"
    (is (= (mob-site.sitemap/format-date "Tue, 04 Jun 2013 22:13:24 +0000")
           "2013-06-04"))))

The test passes if the result of the function "format-date" is "2013-06-04". The "mob-site.sitemap" prefix names the namespace (and, by convention, the source file) that contains the definition of the "format-date" function. "=" is the equality function used for the comparison, and "(is" is a clojure.test assertion.
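When one function needs several input/output checks, clojure.test's "are" macro keeps the test compact. This is a self-contained sketch: format-date-sketch is a hypothetical compact copy of the date conversion (so the example runs on its own), not the project's mob-site.sitemap/format-date.

```clojure
(require '[clojure.test :refer [deftest testing are run-tests]]
         '[clojure.string :as str])

;; Month abbreviations mapped to "01" .. "12".
(def ^:private months
  (zipmap ["Jan" "Feb" "Mar" "Apr" "May" "Jun"
           "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"]
          (map #(format "%02d" %) (range 1 13))))

(defn format-date-sketch
  "Converts e.g. \"Tue, 04 Jun 2013 22:13:24 +0000\" to \"2013-06-04\"."
  [date]
  (let [[_ day month year] (str/split date #"\s")]
    (format "%s-%s-%s" year (get months month "ooops") day)))

(deftest format-date-cases
  (testing "several publication dates"
    ;; are expands into one (is ...) assertion per row of arguments
    (are [input expected] (= expected (format-date-sketch input))
      "Tue, 04 Jun 2013 22:13:24 +0000" "2013-06-04"
      "Wed, 01 Jan 2014 08:00:00 +0000" "2014-01-01"
      "Sat, 31 Dec 2011 23:59:59 +0000" "2011-12-31")))

;; From a REPL, (run-tests) runs every test in the current namespace.
```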

Clojure Tooling - Leiningen
I highly recommend "Leiningen". Leiningen will generate full project boilerplate. Leiningen does many things:

  • Compiles.
  • Runs your program.
  • Performs package management using Apache Maven repositories.
  • Supports plugins. I used the Codox plugin to generate the API documentation.
  • Launches the Clojure REPL, an interactive shell where you can submit Clojure expressions and view their results.
  • Runs Clojure tests. I have more details below.
  • And there is more...

To run my Clojure test suite, I simply open a terminal (E.G. xterm), change into the root (parent) directory of my Clojure project, and run the following command line:

$>lein test

Here is an example of a successful test suite run:

lein test mob-site.core-test

Ran 6 tests containing 6 assertions.
0 failures, 0 errors.

Clojure and Mutable Data
One of Clojure's goals is to make concurrent programming easy. To support concurrency without threads stepping on each other's data, Clojure's data structures are "immutable". Immutable means you declare and populate a data value once; you cannot change an immutable value afterwards. Scala provides 2 sets of data collections, mutable and immutable. Clojure takes a different route. Clojure provides a rich set of immutable data structures and collections. To support mutable data, your functions use some form of "reference" and, in the case of refs, make their changes inside a transaction. A "reference" is a structure which points to a data value. The values themselves never change, but inside a transaction you can make a ref point to a new value. Note! There are other mutable reference types in Clojure (E.G. agents and atoms). For purposes of this article, I've focused on Clojure "references" (refs). You explicitly express the start of a transaction with a "dosync" block:

(dosync
  ;; reads and changes (E.G. "alter") of refs go here
  )

So for example, if you need a mutable vector, you express it as such:

(def sb (ref []))

"def" means we are defining a variable called "sb". "sb" is a ref whose initial value is an empty vector ([]).

Here is a utility function that appends an element to a collection held by a ref (E.G. a vector):

(defn add-element
  "Utility which appends to a mutable collection.
   Part of my small set of mutable functions."
  [collection elem]
  ;; alter may only be called inside a transaction
  (dosync
    (alter collection conj elem)))

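Note the "dosync" block. "alter" means I am changing the value the ref points to, by applying a function to its current value. "conj" is that function: it returns a new collection with the value stored in "elem" appended (at the end, for a vector).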
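The sketch below pulls the ref ideas together in a form you can paste into a REPL. The names demo-ref, append-item and append-concurrently are hypothetical, chosen for this example. It shows that a dereferenced snapshot never changes, that alter refuses to run outside a transaction, and that appends from several threads are not lost, because Clojure retries a transaction that conflicts with another one.

```clojure
;; A ref whose initial value is an empty vector.
(def demo-ref (ref []))

;; Dereferencing (@) yields an ordinary, immutable vector.
(def snapshot @demo-ref)

(defn append-item
  "Appends x to the collection held by ref r, inside a transaction."
  [r x]
  (dosync (alter r conj x)))

(append-item demo-ref "first")
;; snapshot is still []; @demo-ref is now ["first"]

(defn alter-outside-transaction
  "Calls alter without dosync. Clojure refuses with an
   IllegalStateException (\"No transaction running\")."
  [r]
  (try
    (alter r conj "second")
    :altered
    (catch IllegalStateException e
      :no-transaction)))

(defn append-concurrently
  "Starts n-threads Java threads that each append per-thread distinct
   numbers to ref r, waits for them all, then returns the ref's value."
  [r n-threads per-thread]
  (let [threads (doall
                  (for [t (range n-threads)]
                    (doto (Thread. (fn []
                                     (doseq [i (range per-thread)]
                                       (append-item r (+ (* t per-thread) i)))))
                      (.start))))]
    (doseq [^Thread th threads]
      (.join th))
    @r))

;; (count (append-concurrently (ref []) 4 25)) => 100
```

Plain Java threads are used here instead of futures so that a standalone script exits promptly without needing shutdown-agents.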

Implementing the SAX event handler in Clojure
In this project, the overall "mobile site generation" process depends on data parsed from an RSS XML document. Once we have the parsed source data, we can perform all subsequent steps.

When I started my Clojure implementation, I had to make a "project" decision. I had not mastered Clojure yet; the project itself is, in part, a learning device. Thus I had to choose. Do I "mock up" the data structure which holds the result of "parse RSS XML document"? Or do I jump into the fire and start coding the most challenging portion of the whole project? I decided on the latter, "jump into the fire and start coding". After all, I've already implemented the SAX parse routine in three other languages (Java, Scala and Ruby). How hard could it be?

I started to research my options. I found the source code for the SAX event handler in Clojure's own core library (clojure.xml). I noticed the Clojure source used one of the special "mutable" forms ("binding") and the mutable Java StringBuilder.

The premise of the SAX parser is that you read in small chunks of the XML document and fire events as document fragments are recognized. You typically use a SAX parser to optimize performance, specifically to use the least amount of memory. Thus, I just couldn't picture how I could retain the SAX parse routine, limit memory consumption, and avoid a data structure that is appended to (mutable).

For example, the SAX "characters" event is signaled when there is a chunk of character data ready to be read. In a flat XML node organization, like RSS, you typically append to a buffer each time the characters event is signaled. You either capture or discard (empty) the buffer depending on whether a new XML element is recognized. Right there, you have 2 mutable requirements: a string buffer which is filled and emptied, and some type of collection (E.G. an array or a vector) which must be appended with the data you want to capture (E.G. an article's title).
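To illustrate those two mutable requirements, here is a minimal sketch of a SAX handler written in Clojure using proxy over Java's org.xml.sax.helpers.DefaultHandler. It is not the project's parser: collect-titles is a hypothetical name, it only captures title text, and it assumes a non-namespace-aware parser (the JDK default), so the element name arrives in the qName argument.

```clojure
(import '(javax.xml.parsers SAXParserFactory)
        '(org.xml.sax.helpers DefaultHandler)
        '(java.io ByteArrayInputStream))

(defn collect-titles
  "Hypothetical sketch: parses an XML string with SAX and returns a vector
   holding the text of every <title> element, in document order."
  [^String xml]
  (let [buffer  (StringBuilder.)   ; mutable requirement 1: a character buffer
        titles  (ref [])           ; mutable requirement 2: a growing collection
        handler (proxy [DefaultHandler] []
                  (startElement [uri local-name q-name attributes]
                    ;; a new element begins: empty the buffer
                    (.setLength buffer 0))
                  (characters [ch start length]
                    ;; character data may arrive in several chunks: append each one
                    (.append buffer ch start length))
                  (endElement [uri local-name q-name]
                    ;; capture the buffered text only for the elements we want
                    (when (= "title" q-name)
                      (dosync (alter titles conj (.toString buffer))))))]
    (.parse (.newSAXParser (SAXParserFactory/newInstance))
            (ByteArrayInputStream. (.getBytes xml "UTF-8"))
            handler)
    @titles))
```

The buffer is appended to rather than replaced because a parser may deliver one element's text in several "characters" calls (for example around entity references like &amp;).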

Hence, I created a small set of functions that support a mutable hash set, sequence and vector. To summarize, the thinking was to retain the SAX parsing routine, to keep the Clojure implementation as close as possible to the other language implementations (Java, Scala and Ruby).

So my Clojure implementation of the XML parse process may not be the best, but it works. And in real life, to move the project forward, that is a reasonable compromise. I noted that the source listing is a candidate for refactoring, and then moved on.

Implementing html-text templating with Clojure. Simple and Easy.
I don't want to leave the reader with the impression that every task was a challenge to implement in Clojure. Please refer to the source listing called "docFormatter.clj" on my GitHub repository. You can compare the Clojure listing with the corresponding Java, Ruby or Scala listings. All I am doing is creating an HTML document interspersing text, placeholders (E.G. %s) and Clojure expressions.

Rather than display the Clojure listing, I will summarize. Implementing the html-text templating routine was simple and quick. I literally was able to copy the same routine from one of the other language implementations. I made a few modifications, and voila, it worked.
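A tiny version of the templating idea, as a hypothetical sketch (article-page and its HTML are made up here, not taken from docFormatter.clj): "format" fills each %s placeholder in order. The header link to index.html mirrors the navigation change described at the start of the article.

```clojure
(defn article-page
  "Hypothetical sketch: fills the %s placeholders of a small HTML template.
   Values are inserted as-is (no HTML escaping)."
  [title keywords body]
  (format (str "<html><head><title>%s</title>"
               "<meta name=\"keywords\" content=\"%s\"></head>"
               "<body><a href=\"index.html\">Home</a>"
               "<h1>%s</h1>%s</body></html>")
          title keywords title body))

;; (article-page "Leveraging Clojure" "Polyglot,Clojure" "<p>...</p>")
```

Only the template string is interpreted by "format", so a "%" inside the body text is inserted literally.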

Generating Clojure API documentation.
As mentioned above, I installed the Leiningen plugin Codox to generate documentation. Clojure allows you to add metadata to your code listings. For example, let's look at the format-date function again.

(defn format-date
  "Converts a long date format to a short yyyy-mm-dd."
  [date]
  (let [elems (clojure.string/split date #"\s")]
    (format "%s-%s-%s" (elems 3) (convert-month (elems 2)) (elems 1))))

The docstring "Converts a long date format to a short yyyy-mm-dd." is stored as metadata (under the :doc key) on the function's var, and Codox picks it up to describe the function.
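Docstrings are ordinary metadata, so you can read them back yourself. A minimal sketch with a made-up function, greet: the :doc entry shown here is the same metadata that clojure.repl/doc prints, and defn also accepts an optional attribute map whose entries are merged into the metadata.

```clojure
(defn greet
  "Returns a short greeting for the given name."
  {:added "1.0"}          ; optional attribute map, merged into the metadata
  [who]
  (str "Hello, " who))

;; defn stores the docstring under :doc in the var's metadata.
;; (:doc (meta #'greet))   => "Returns a short greeting for the given name."
;; (:added (meta #'greet)) => "1.0"
```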

To include links to the source code, all I had to do was add a setting to the project definition file "project.clj" (which resides in the root directory of the project). Here is the project definition:

(defproject mob-site "Rel-1.0"
  :description "Clojure implementation of our mobile generation project"
  :url "http://public-actiion.org"
  :license {:name "GNU General Public License, version 2"
            :url "http://www.gnu.org/licenses/gpl-2.0.html"}
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [commons-io/commons-io "2.4"]]
  :plugins [[codox "0.6.4"]]
  :codox {:src-dir-uri "http://github.com/lorinpa/mob-site-clojure/blob/dev-1.0"
          :src-linenum-anchor-prefix "L"}
  :main mob-site.core)

":plugins [[codox "0.6.4"]]" instructs Leiningen to install the "Codox" plugin.

:codox {:src-dir-uri "http://github.com/lorinpa/mob-site-clojure/blob/dev-1.0"
        :src-linenum-anchor-prefix "L"}

configures Codox to link to the source code published in my GitHub repository. The "L" prefix matches the line anchors GitHub uses in its source views (E.G. #L42).

To generate the API document set, again open a terminal (E.G. xterm), change into the root directory of your project, and issue the following command:

lein doc

The default location for the generated API docs is "doc" (a subdirectory of the root of your project).

I recommend Clojure
Clojure code is the most concise of all the languages examined so far. Let me qualify that statement. In all of the computer languages, the programmer can construct concise high level expressions. However, with Clojure the brevity is delivered "out of the box".

Personally, I've found Clojure to be very interesting (intellectually).

About the Author:
Lorin M Klugman - I'm an experienced developer. My main interest is in new technology. Please use our contact box here if you are interested in hiring me. Please no recruiters :)