Automated Testing

Data Mining with Clojure and Datomic

In this article we take a look at a few items:

  • Applying a "Functional Style" to our code. Making better use of Clojure's programming features.
  • Introduction to Datomic, a database written in Clojure.
  • Use of Clojure Test fixtures.

For our project, we are going to analyze job skills. We use Stack Overflow Careers 2.0 as our data source. For example, I query Stack Overflow Careers 2.0 for jobs within a 10 mile radius of my zip code. I search for job postings that contain the keyword "java" or "clojure". I capture the search results as an RSS XML document. I save the XML document to disk.

We use Datomic to persist our data. We also use Datomic to query our data. We then create reports from the queried data.

Datomic is not a relational database. However, it's pretty simple to define a Datomic schema which includes logical relationships.

For our project we first establish a "snapshot". A "snapshot" simply tells us when we collected our data. A "snapshot" also includes a short description. Here is the "snapshot" schema definition supplied to Datomic:

{:db/id #db/id[:db.part/db]
 :db/ident :snapshot/time
 :db/valueType :db.type/instant
 :db/cardinality :db.cardinality/one
 :db/doc "time data was extracted. milliseconds"
 :db.install/_attribute :db.part/db}

{:db/id #db/id[:db.part/db]
 :db/ident :snapshot/description
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db/unique :db.unique/identity
 :db/doc "free form description of snapshot"
 :db.install/_attribute :db.part/db}

{:db/id #db/id[:db.part/db]
 :db/ident :snapshot/job-set
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/many
 :db/doc "List of Jobs obtained during snapshot"
 :db.install/_attribute :db.part/db}

Note the last attribute ":snapshot/job-set". That states a "snapshot" contains zero to many "jobs". The cardinality is "many", the datatype is "ref". We are storing references to "jobs". This definition is similar to a relational database "foreign key".
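
To make the "foreign key" comparison concrete, here is a minimal sketch of a Datalog query that follows :snapshot/job-set from a snapshot to its jobs. The query is written as plain Clojure data so it can be read without a database. The var name jobs-in-snapshot-query, the d alias, and the conn connection in the comment are assumptions for illustration and are not taken from the project.

```clojure
;; Hypothetical sketch: find the jobs referenced by the snapshot
;; that has a given description.
(def jobs-in-snapshot-query
  '[:find ?job
    :in $ ?description
    :where
    [?snapshot :snapshot/description ?description]
    [?snapshot :snapshot/job-set ?job]])

;; With Datomic on the classpath and a connection available,
;; the query would be run roughly as:
;;   (d/q jobs-in-snapshot-query (d/db conn) "some snapshot description")
;; where d is assumed to be an alias for datomic.api.
```

Each result tuple would hold one job entity id, which plays the role of the joined row in a relational query.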

We use the same concept to relate a "job" to "skills". Each "job" has an attribute which stores zero to many "skill" references. A skill being something like "java", "sql", "python", etc.

The complete schema is located on GitHub in schema.dtm.

The process for updating the database is simple. I first process the "skills". I query Datomic to determine whether a skill already exists in our database. If the skill does not exist in our database, we want Datomic to generate a unique identifier, AKA entity id. If the skill already exists, we use the existing entity id.

Here is the relevant code.

;; the snippets below assume the Datomic peer API is loaded as:
;; (require '[datomic.api :as d :refer [q db]])

;; conn parameter is the database connection
;; queries the database for a particular skill (e.g. "programming")
;; if found, returns the database entity id; otherwise returns nil
(defn get-skill-entity-id [conn skill]
  (let [results (q '[:find ?c :in $ ?t :where [?c :skill-set/skill ?t]]
                   (db conn) skill)]
    ;; q returns a set of tuples such as #{[123]}; ffirst unwraps the id
    (ffirst results)))

;; returns true if the skill cannot be found
;; in the database (e.g. do we have "programming" stored in the database?)
(defn skill-not-exists? [conn skill]
  (nil? (get-skill-entity-id conn skill)))

;; add-skill is defined in the next snippet
(declare add-skill)

(defn process-skill [conn skill]
  (when (skill-not-exists? conn skill)
    (add-skill conn skill)))

Now, let's add the skill.

(defn add-skill [conn skill]
  (let [temp-id (d/tempid :db.part/user)]
    @(d/transact conn
                 [[:db/add temp-id :skill-set/skill skill]])))

Now that the skills are processed, we can process jobs. Remember, when we process a job we need to associate zero to many skills with it. Because we now have entity ids for all the skills, we can use those entity ids as references for our jobs. Here is a portion of our code which stores a job.

;; dispatch on the class of the fifth argument, the skill list
(defmulti add-job
  (fn [_conn _entity-id _title _job-key skill-list] (class skill-list)))

;; job has skill list
(defmethod add-job clojure.lang.PersistentVector
  [conn entity-id title job-key skill-list]
  @(d/transact conn [{:db/id entity-id
                      :jobs/title title,
                      :jobs/job-key job-key,
                      :jobs/skill-set skill-list}]))

;; job does not have any skills 
(defmethod add-job nil
  [conn entity-id title job-key skill-list]
  @(d/transact conn [{:db/id entity-id
                      :jobs/title title,
                      :jobs/job-key job-key}]))

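We define two methods: 1) a job with skills, 2) a job without any skills. The multimethod dispatches on the class of its fifth argument, the skill list, so a vector selects the first method and nil selects the second.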
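
The dispatch rule is easier to see in a reduced form. The following is a minimal sketch with a single argument; describe-skills is a hypothetical function invented for illustration and does not appear in the project.

```clojure
;; Dispatch on the class of the argument, as add-job does
;; with its fifth argument.
(defmulti describe-skills (fn [skill-list] (class skill-list)))

;; Vector literals have the class clojure.lang.PersistentVector.
(defmethod describe-skills clojure.lang.PersistentVector
  [skill-list]
  (str (count skill-list) " skill(s)"))

;; (class nil) is nil, so this method handles a missing skill list.
(defmethod describe-skills nil
  [_]
  "no skills")

(describe-skills ["java" "sql"]) ; selects the PersistentVector method
(describe-skills nil)            ; selects the nil method
```

One consequence of dispatching on the concrete class: a list or a set has a different class, and with no :default method the call would throw an IllegalArgumentException. Callers of a function built this way need to pass either a vector or nil.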

We use the same logic that we used for skills to establish an entity id. We look up the job; if the job is already present in the database, we use the existing entity id. If the job does not exist, we let Datomic create a new entity id.
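
Since skills, jobs and snapshots all follow this "existing id or new id" rule, the decision can be sketched as one small function. This is a sketch under stated assumptions, not the project's code: existing-or-new-id, known-jobs and fake-temp-id are hypothetical names, and an in-memory map stands in for the Datomic lookup so the example runs on its own.

```clojure
;; lookup-id: function from a key to an existing entity id, or nil.
;; new-id:    zero-argument function that produces a fresh id.
(defn existing-or-new-id [lookup-id new-id k]
  (or (lookup-id k) (new-id)))

;; In-memory stand-ins, used only so the sketch runs without Datomic.
(def known-jobs {"job-123" 1001})   ; hypothetical job-key -> entity id

(defn fake-temp-id []               ; stands in for (d/tempid :db.part/user)
  :new-temp-id)

(existing-or-new-id known-jobs fake-temp-id "job-123") ; existing id
(existing-or-new-id known-jobs fake-temp-id "job-999") ; falls through to a new id
```

Against a real database, lookup-id would wrap a query like the one used for skills, and new-id would call d/tempid.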

Finally, we can add our snapshot to the database. The same logic applies. The snapshot contains zero to many job references. Now that we have an entity id for each job, we use the job entity ids as references in our snapshot.

Testing the Java REST-MVC Server Tier

Article Summary
This article is part of a series. We use the "book club" project to explore various programming languages and frameworks. The book club's business and data requirements are described in a prior article, "Leveraging Ruby on Rails and ClojureScript."

We've created a new version of the book club's server tier using a stack of Java components. See the article REST-MVC using Java for details of implementation. This article details automated testing for the Java/REST-MVC server tier.

Automated Unit Tests
In this article we will not enforce a strict differentiation between "Unit" and "Integration" tests.

This article refers to "automated testing". That means the programmer writes one or more separate programs to "test" functionality. We use the JUnit testing framework along with some extensions (Mockito, JsonPath and Hamcrest).

Organizing Tests Into A Plan
In this section of the article we discuss "what" we test.

We need to consider the business and functional requirements of our application and then plan our test suite. So, let's recap our application and organize our test plan. We have entities (and a relationship). Entities are Authors, Books, Categories and Reviews. A Book-Category is a relationship between a book and a category (e.g. Tom Sawyer is Fiction).

We also have application services.

  • A repository is responsible for reading and writing data, to and from the database (and the application data objects).
  • A controller accepts requests, dispatches each request to a handler, then routes the response. Our server has 2 types of controllers. We have controllers which route the response to a server-generated view (e.g. jsp/html/css). We also have controllers which generate a JSON response (the client renders the view).

Finally, our application has some specific implementation/optimization requirements. We require the database (as opposed to the application code) to enforce unique Authors, Books, Categories and Book-Categories. We also require the database to delete related records. For example, books require an author. If we delete an author, the database should automatically delete all of the books written by that author.

So here are the "operation/method" tests:

  • Select records
  • Insert a record
  • Modify a record
  • Delete a record
  • Attempt to insert a duplicate record.
  • Attempt to modify an existing record into a duplicate.
  • Delete an Author and verify all related books, reviews and book-categories are also deleted.
  • Delete a Book and verify all related reviews and book-categories are also deleted.

We perform each of the above tests against the following services:

  • Repository
  • Controller
  • Rest Controller

For each service, we test the following entities:

  • Author
  • Book
  • Category
  • Review

For each service, we test the following relationship:

  • Book-Category
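
The plan above is a cross product of operations, services and entities, which can be sketched as data. The block below is an illustrative sketch only, written in Clojure to match the other articles on this page; the keyword names and the split between general and cascade tests are assumptions, and the actual tests for this article are written with JUnit.

```clojure
(def services   [:repository :controller :rest-controller])
(def targets    [:author :book :category :review :book-category])
(def operations [:select :insert :modify :delete
                 :insert-duplicate :modify-into-duplicate])

;; One planned test per service, target and general operation.
(def test-plan
  (for [service services
        target  targets
        op      operations]
    {:service service :target target :operation op}))

;; Cascade-delete tests only apply to the parents named in the plan.
(def cascade-tests
  (for [service services
        parent  [:author :book]]
    {:service service :target parent :operation :delete-cascades}))

(count test-plan)     ; 3 services * 5 targets * 6 operations
(count cascade-tests) ; 3 services * 2 parents
```

Writing the plan out this way makes it easy to check that no combination has been skipped.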

Testing Clojurescript - A simple approach

Article Summary
This article is part of a series. We use the "book club" project to explore various programming languages and frameworks. The book club's business and data requirements are described in a prior article, "Leveraging Ruby on Rails and ClojureScript."

This article details automated testing of our ClojureScript client. Please see the corresponding article, ClojureScript - Single Page Application - A simple approach, for the business requirements and implementation details.

We employ a stack of Java libraries and tools. Source code and build files are located on GitHub, at the book-site-clojurescript repository.

In this article we'll detail automated testing of our ClojureScript client. I'm using cemerick/clojurescript.test, a port of clojure.test. I am using phantomjs as a test runner (container).

Managing the Project with Leiningen
We are using Leiningen to manage the ClojureScript project. Settings for Leiningen are placed in project.clj (located at the root directory of your project).

We need to add a contributed library to support automated tests for ClojureScript. We are using cemerick/clojurescript.test, a port of clojure.test to ClojureScript.

We make the following changes to the project.clj file.

We add the test library to the lists of dependencies and plugins:

:dependencies [[org.clojure/clojure "1.5.1"]
               [org.clojure/clojurescript "0.0-1859"]
               [com.cemerick/clojurescript.test "0.2.1"]]
:plugins [[lein-cljsbuild "0.3.3"]
          [com.cemerick/clojurescript.test "0.2.1"]
          [marginalia "0.7.1"]
          [lein-marginalia "0.7.1"]]

Adding a new test target for the compiler.
Next we instruct the ClojureScript compiler to build 2 separate targets. The first target includes our unit tests. The second target is what we'll use when deploying on the server (and what is ultimately delivered to the web browser client).

:cljsbuild
  {:builds [{:source-paths ["src/cljs" "test"]
             :compiler {:output-to "target/cljs/wstestable.js"
                        :optimizations :whitespace
                        :pretty-print true}}
            {:id "prod"
             :source-paths ["src/cljs"]
             :compiler {:output-to "target/cljs/books_cljs.js"
                        :optimizations :whitespace
                        :pretty-print true}}]
   :test-commands {"phantom-ws" ["target/cljs/wstestable.js"]}}

Compiling Multiple Targets
With the above configuration we can compile just the "prod" (production) target with the following command line:

$dev> lein cljsbuild once prod

Or, if we want our code compiled each time we save changes to disk, we substitute "once" with "auto":

$dev> lein cljsbuild auto prod

If we want to build both targets, we just remove "prod" from the command line. For example, to compile both targets just once:

$dev> lein cljsbuild once

Writing the tests.
In general these tests are much closer to the definition of "unit tests". The functions we are testing are very small. Take, for example, a test of addAuthor. Let's look at the addAuthor source code:

(defn addAuthor
  ([id first_name last_name]
    (swap! AuthorList conj (Author. id first_name last_name)))
  ([jsonObj]
    (swap! AuthorList conj (jsonToAuthor jsonObj))))

The above function can be called 2 different ways:

  1. With 3 parameters: an id, the author's first name and the author's last name
  2. With 1 parameter: a JSON object
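
A test for each arity might look like the following minimal sketch. It runs on JVM Clojure with clojure.test, which clojurescript.test ports, so it is a stand-in for the ClojureScript tests and not a copy of them. The Author record definition, the choice of a vector for AuthorList, and map-to-author (a hypothetical replacement for jsonToAuthor that takes a Clojure map, not a JSON object) are all assumptions made so the example is self-contained.

```clojure
(require '[clojure.test :refer [deftest is]])

;; Assumed shapes; the real definitions live in the project source.
(defrecord Author [id first_name last_name])
(def AuthorList (atom []))

;; Hypothetical stand-in for jsonToAuthor, taking a Clojure map.
(defn map-to-author [m]
  (Author. (:id m) (:first_name m) (:last_name m)))

(defn addAuthor
  ([id first_name last_name]
   (swap! AuthorList conj (Author. id first_name last_name)))
  ([author-map]
   (swap! AuthorList conj (map-to-author author-map))))

(deftest add-author-with-three-parameters
  (reset! AuthorList [])                 ; start from a known state
  (addAuthor 1 "Mark" "Twain")
  (is (= 1 (count @AuthorList)))
  (is (= "Twain" (:last_name (first @AuthorList)))))

(deftest add-author-from-map
  (reset! AuthorList [])
  (addAuthor {:id 2 :first_name "Jane" :last_name "Austen"})
  (is (= (Author. 2 "Jane" "Austen") (first @AuthorList))))
```

Because addAuthor mutates shared state, each test resets AuthorList first so the tests do not depend on the order in which they run.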