Test-Driven Data Warehouse Development

This comment in our bug tracking system reminded me to write a new post on the testing process that I recently applied at NewFrontiers. In the previous post I talked about the book “Agile Analytics: A Value-Driven Approach to Business Intelligence and Data Warehousing” by Ken Collier. In this post I will show what I’ve done to make our development process more agile.

JIRA comment

 

 

 

According to Ken the goal in Agile Analytics development is the frequent release of production-quality, working software for user feedback and acceptance. Therefore;

…testing must be integrated into the development process. Each development iteration must include plans for testing and quality activities. One of the great things about this is that bugs don’t get a chance to accumulate over the project lifecycle. Quality feedback is immediate and frequent, and bugs are handled as they arise.

and;

Essential to integrated testing is test automation. Manual testing is just not practical in a highly iterative and adaptive development environment.

For our NemisFs product I am responsible for the front-end / reporting and reconciliation of the data in the Data Warehouse. Data is stored in three seperate databases; Staging (STG), Operational Data Store (ODS) and the Data Warehouse (DWH). The STG is for temporarily staging the data, the ODS is the intermediate 3NF model for flexibility, and the DWH consists of several dimensional star schemas for reporting.  Reporting is thus primarily done on the DWH, so my task is to make sure that the data in the DWH reconciles with SAP ECC.

I used to do this manually, by querying the DWH and compare the results with SAP, a very labor intensive and error prone task. Ken mentions FitNesse, developed by Robert C. Martin, a framework that uses a browser-based wiki for specifying the test cases, describing the expected results, and executing the tests. Together with the extension DbFit, developed by Gojka Adzic, they form a miraculous duo!

I will not go into details about how to setup FitNesse and DbFit because there is a lot of information on both websites. Instead I will just show how I use FitNesse and DbFit in our project.

A test in FitNesse looks like this; a screenshot of an SAP transaction, the test case with the query on the DWH and the expected results.

Screenshot of a FitNesse Test case

Clicking on the Test button will start the test, green means OK!

Screenshot of an executed test with all green -> Yeah!

Tests can belong to Suites which allow you to test a bunch of tests at once.

Screenshot of an executed Suite

Red means work to do!

Screenshot of a FitNesse test with errors -> Booh!

Editing a test is relatively easy. I love the “Spreadsheet to FitNesse” button which converts tab delimited data in a FitNesse table format.

Screenshot of the edit screen of a FitNesse test

Oracle, MS SQL and MySql are supported by DbFit, in the examples above I used an Oracle Database. I would love to hear from someone how to set it up with other databases like IBM PureData, TeraData, SAP HANA.

To increase our agility I’m also going to make tests for the ODS because that will help us find bugs during development.

We’re running FitNesse on a Ubuntu server. I want to thank my colleague Frank Dost for helping me out on the Linux command line! We setup samba for the image directory, CRON for automatic backups, and a start-stop script for the FitNesse service.

Finally I would like to thank Ken Collier for writing the Agile Analytics book, a great source for inspiration!

Linkedin Twitter Facebook Stumbleupon Tumblr Email

Leave a Reply

Your email address will not be published.