Unit testing in Business Intelligence
In my previous post I wrote about the importance of unit testing in business intelligence projects, specifically the need for it when you are following an agile approach. We also looked at the basic layers that you have in the typical Microsoft BI project and how a seemingly easy change has an effect on almost all of the different layers in the solution.
In this post I want to dive into the mechanics of unit testing in BI and look at some high-level examples of unit tests.
Basic structure of a unit test
The basic structure of a unit test is very similar across many unit testing frameworks and usually consists of a Setup, Test and Tear Down phase.
Let's look at each one of these phases and their function.
The setup phase is used to perform any actions needed to create the environment required to perform the actual test. This could be anything, from creating instances of specific objects to adding records to database tables or setting up files that the test will need. The setup phase is there to ensure that the test has all the resources it requires to run.
The test phase normally involves executing or running the particular piece of code or application that you want to test. You will normally provide the objects that you created during the setup phase as inputs to the code or application that you are testing and then compare its results or outputs against an expectation. If the output matches your expectation the test will be successful.
This pattern of providing inputs, executing your code and then comparing the output against an expected result is the pattern followed for the majority of tests. There could be cases where you test that errors are raised under certain circumstances, in which case your expected result will not be a result set but the actual error that is raised.
Tear down is the phase in which we clean up any objects that might have been left behind after our test has executed. We can do this to ensure that the next unit test has a clean slate to start from or to perform certain functions at the end of each unit test.
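To make the three phases concrete, here is a minimal sketch using Python's built-in unittest framework with an in-memory SQLite database. The `count_products` function and the `product` table are hypothetical stand-ins for whatever code and data you want to test. They are not part of any BI tool.

```python
import sqlite3
import unittest


def count_products(conn):
    """Hypothetical code under test: counts the rows in a product table."""
    return conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]


class ProductCountTest(unittest.TestCase):
    def setUp(self):
        # Setup: create the environment the test needs (a table with one row).
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE product (code TEXT, name TEXT)")
        self.conn.execute("INSERT INTO product VALUES ('P1', 'Nike X Trainer')")

    def test_counts_inserted_rows(self):
        # Test: run the code and compare its output against the expectation.
        self.assertEqual(count_products(self.conn), 1)

    def test_missing_table_raises(self):
        # The expected result can also be an error rather than a result set.
        self.conn.execute("DROP TABLE product")
        with self.assertRaises(sqlite3.OperationalError):
            count_products(self.conn)

    def tearDown(self):
        # Tear down: release resources so the next test starts from a clean slate.
        self.conn.close()
```

Saved in a file, this can be run with `python -m unittest`. The framework calls `setUp` before and `tearDown` after every test method, so each test gets its own fresh database.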
Unit tests in a BI project
In a BI project we need to focus on the different applications that we use and devise tests that satisfy the Setup, Test and Tear Down requirements needed to test each of them. Let's look at some practical examples of how you would go about testing two of the main layers in the Microsoft BI stack, namely the ETL and OLAP layers.
In order to test SSIS packages you will need to create test data for each test that you would like to perform on a package. As an example, let's look at a simple package that builds a Product dimension.
The input of the package would be a set of database tables in the source system that contains the product and product category information. Below are some samples of the possible data in our source tables.
Our SSIS package takes this data and builds our Product dimension, which is a Slowly Changing Dimension Type 2. The attribute that will cause the type 2 change is the Category Code, thus if our Nike X Trainer suddenly moves to a category other than Men's Shoes we will perform a type 2 change on the dimension. All other attribute changes will result in a type 1 update to the table.
Two possible tests that we might want to run on this package are that new products are inserted correctly and that SCD Type 2 changes work correctly. Let's look at what a potential unit test for this could look like.
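The rule that decides between a type 1 and a type 2 change can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the logic of the actual package. The `category_code` key and the other column names are assumed, and I also assume that a row where both the category and another attribute change is treated as a type 2 change.

```python
def classify_change(existing, incoming):
    """Classify the change between two versions of a product row (dicts).

    Returns "type2" when the category code differs, "type1" when only
    other attributes differ, and "none" when the rows are identical.
    """
    if existing["category_code"] != incoming["category_code"]:
        return "type2"
    other_columns = [col for col in incoming if col != "category_code"]
    if any(existing[col] != incoming[col] for col in other_columns):
        return "type1"
    return "none"
```

Calling it with a row whose `category_code` has changed returns `"type2"`, while a row where only the name has changed returns `"type1"`.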
In order to do our test we need to ensure we have data available for our SSIS package to test. In our setup phase we will insert the required rows into the relevant database tables that our SSIS package will point to. (Note that a copy of the source system would be created in a test environment in order to perform the tests.)
The first table that we will populate will be the Product dimension table in the data warehouse. This is done to ensure that we have a dimension table against which we can test our new record and SCD type 2 change logic. We can insert the following data into it.
The second table that we will populate will be the Category table in the source system in our test environment. We can insert the following data into it.
The last table that we will insert data into for the purpose of this test will be the Product table in our source system in the test environment. The data could look as follows.
Once we have completed the setup phase and all our test data is inserted into the relevant tables we will execute the SSIS package and wait for it to complete. We will then compare the data in the Product dimension table against the expected data and if the results match we know that our SSIS package is functioning correctly based on the input data we provided to it.
Our expected data could look like this.
If the data in the Product dimension does not match our expectation then our test has failed and we need to fix the SSIS package until it passes our test.
After we have completed the test phase we would typically delete all the records from the tables we used in order to ensure that the next test has a clean database to start with.
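The whole flow can be sketched as a single test. The sketch below makes several loud assumptions: an in-memory SQLite database stands in for both the source system and the data warehouse, the `load_product_dimension` function is a simplified Python stand-in for the SSIS package (a real test would execute the package itself), and all table names, column names, category codes and rows are made up for illustration. The Category table is left out to keep it short.

```python
import sqlite3
import unittest


def load_product_dimension(conn):
    """Stand-in for the SSIS package: simplified, hypothetical SCD logic."""
    rows = conn.execute(
        "SELECT product_code, name, category_code "
        "FROM src_product ORDER BY product_code"
    ).fetchall()
    for code, name, category in rows:
        current = conn.execute(
            "SELECT product_key, category_code FROM dim_product "
            "WHERE product_code = ? AND is_current = 1",
            (code,),
        ).fetchone()
        if current is not None and current[1] == category:
            # Same category: type 1, overwrite the attributes in place.
            conn.execute(
                "UPDATE dim_product SET name = ? WHERE product_key = ?",
                (name, current[0]),
            )
            continue
        if current is not None:
            # Category changed: type 2, expire the current row first.
            conn.execute(
                "UPDATE dim_product SET is_current = 0 WHERE product_key = ?",
                (current[0],),
            )
        # New product, or the new version of a changed product.
        conn.execute(
            "INSERT INTO dim_product "
            "(product_code, name, category_code, is_current) "
            "VALUES (?, ?, ?, 1)",
            (code, name, category),
        )


class ProductDimensionTest(unittest.TestCase):
    def setUp(self):
        # Setup: create the tables and insert the made-up test rows.
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(
            """
            CREATE TABLE src_product (
                product_code TEXT, name TEXT, category_code TEXT
            );
            CREATE TABLE dim_product (
                product_key INTEGER PRIMARY KEY AUTOINCREMENT,
                product_code TEXT, name TEXT, category_code TEXT,
                is_current INTEGER
            );
            INSERT INTO dim_product
                (product_code, name, category_code, is_current)
            VALUES ('P1', 'Nike X Trainer', 'MS', 1);
            INSERT INTO src_product VALUES ('P1', 'Nike X Trainer', 'WS');
            INSERT INTO src_product VALUES ('P2', 'Hypothetical Product', 'MS');
            """
        )

    def tearDown(self):
        # Tear down: closing the in-memory database discards all test rows.
        self.conn.close()

    def test_new_product_and_type2_change(self):
        # Test: run the load and compare the dimension with the expectation.
        load_product_dimension(self.conn)
        actual = self.conn.execute(
            "SELECT product_code, name, category_code, is_current "
            "FROM dim_product ORDER BY product_key"
        ).fetchall()
        expected = [
            ("P1", "Nike X Trainer", "MS", 0),        # expired version
            ("P1", "Nike X Trainer", "WS", 1),        # new type 2 version
            ("P2", "Hypothetical Product", "MS", 1),  # newly inserted product
        ]
        self.assertEqual(actual, expected)
```

Against a real test environment the tear down step would delete the rows from the source and dimension tables instead of simply closing a connection.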
For SSAS we will follow exactly the same pattern as we did for SSIS. The main difference is that instead of inserting records into the tables of our source system we will insert records into our data warehouse.
The steps we would take to create a test for a particular measure in our cube would look something like this.
Insert records into the data warehouse for each dimension and fact table required to process the cube. (Remember you do not need to insert hundreds of records, only enough records to allow you to run the test that you have defined.)
Process the cube, ensuring that if the cube does not process successfully it will automatically result in failing the test.
Once the cube has been processed, run a prepared MDX query against the cube and compare its result against the expected result of your test. If the two results match then our measure has passed the test. If there is a mismatch then the measure has failed the test.
After the test phase is complete we will clean out the data warehouse tables that we populated at the start of the test to ensure any subsequent tests have a clean database to start with.
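The steps above can be sketched as a small test runner. Everything here is hypothetical: `insert_rows`, `process_cube`, `run_mdx` and `cleanup` are hooks that you would implement against your own data warehouse and cube, and the sketch only shows the order of the steps and how a processing failure turns into a failed test.

```python
def run_measure_test(insert_rows, process_cube, run_mdx, expected, cleanup):
    """Run the steps of a cube measure test and return (passed, message).

    All four callables are hypothetical hooks supplied by the caller.
    """
    try:
        # Insert just enough rows into the data warehouse.
        insert_rows()
        # Process the cube; a processing error fails the test.
        try:
            process_cube()
        except Exception as exc:
            return False, f"cube processing failed: {exc}"
        # Run the prepared query and compare against the expectation.
        actual = run_mdx()
        if actual == expected:
            return True, "measure matches expectation"
        return False, f"expected {expected!r}, got {actual!r}"
    finally:
        # Always clean out the tables, even if a step failed.
        cleanup()
```

Putting the clean up in a `finally` block is a design choice: it means a failed test does not leave records behind that could break the next test.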
More extensive tests and available tools
From the above I am sure you can see how, in addition to the unit tests that we would normally create for each individual package, we could create tests that test our ETL end to end. We could extend this further to be an end to end test from our source system all the way to our cube.
So far we have looked at the structure of a unit test and some examples of how tests against SSIS and SSAS could work. We have not yet investigated how this can be automated so that we could integrate these tests into our nightly build or, as developers, do actual test driven development on our BI projects.
In my previous post I mentioned some of the tools that we could use, such as SSISUnit and QueryUnit. Another option is to develop your own unit testing tool. Unfortunately we do not currently have a dedicated testing tool for BI projects in the Microsoft BI tool set. By that I mean a tool that allows for the easy, user friendly development and running of unit tests. (Just to be clear, writing loads of C# to create a unit test is not something I would classify as user friendly since it brings an added level of complexity to the process.)
In the next post we will explore some of the solutions and steps that we can take to get closer to the ideal of a testing tool specifically aimed at BI projects.