SSIS frameworks and unnecessary complexity
The notification email arrives in my inbox carrying the unwelcome message that the SSIS job failed. I instantly get a sinking feeling in my stomach, this is going to be an unpleasant morning. I open Management Studio and get to work looking at the SSIS Catalog execution reports to see if I can track down a meaningful error message. What should be a straight forward process of debugging the ETL turns instead into a SSIS debugging high wire act of querying obscure configuration tables, interrogating custom logging tables and setting up and executing parts of SSIS packages in Visual Studio.
SSIS frameworks, whilst very useful when implemented correctly, can be one of the greatest sources of unnecessary complexity when not designed with flexibility and modularity in mind. It seems that there is no limit to the appetite for complexity by developers and IT professionals. If you are considering creating a framework for your company, or are maintaining an existing one, then it will be well worth asking yourself the following questions on your road to framework nirvana.
Are you hindering efficient debugging in Visual Studio?
If the answer to this question is anything other than a confident “No" then you are on the road to frustration, agitation and lots of wasted time. We have come a long way since the dark days of DTS packages and the even darker days before that of BCP and TSQL scripts.
If you are unable to open your ETL solution in Visual Studio, set a few project and package parameters, run your solution and see the error displayed right there in your IDE then you have some soul searching to do. To better illustrate what I mean let's look at an example.
The master conductor
Since SQL Server 2012 we have had the ability to start SSIS packages via TSQL (without resorting to the dubious and horrible xp_cmdshell). Having this built in support for running SSIS packages is extremely useful but can lead to disastrous consequences when taken too far.
One example of this is when a framework uses a “master conductor” packages to start all the SSIS packages for a particular job via TSQL. Sometimes this package dynamically retrieves the list of packages to execute from a database table held in a “control” or “settings” database. Typically, these “control” or “settings” databases are a hangover from the SQL 2005 and 2008 era where configuration files were a pain and using such a database made life much easier. Today however this is not required anymore as the SSIS Catalog negates their use and makes life much simpler and explicit.
Now don’t confuse this “master conductor” package with the correct kind of “master” package in which you explicitly specify when and which packages to execute by using Execute Package tasks. I am specifically talking about a package that starts other SSIS packages via TSQL and which thus stops you from being able to debug your ETL in Visual Studio in a straight forward manner.
What ends up happening is a disconnect between your debug environment and the packages being executed. Your “master orchestrator” package will be running in Visual Studio but it will be starting SSIS packages that are deployed to the SSIS catalog and thus those packages won’t be opened in Visual Studio when they are executed.
Now I hear you saying “But you could still debug those individual SSIS packages on their own in Visual Studio!”. Yes, you are right, but if that master package runs complex logic to retrieve and set values in a “control / settings” database which in turn is passed to the packages it executes then you will spend a great deal of time setting up your environment to represent the state in which the error occurred. Simply put this is time wasted and needless frustration introduced not only into your development process but also your support process.
Are you duplicating functionality that already exist?
Think about this one for a moment, do you have any piece of functionality in your framework that currently exists out of the box in SSIS? If the answer is “Yes” then it is time to get to work to remove these components and use the out of the box functionality instead. Custom logging? Custom ways of configuring your SSIS packages from configuration databases? Custom error handling or restart ability functionality? Do any of these ring a bel?
A lot of this is again a hangover from the SQL 2005 / 2008 days but it is time to let go of it. We have SQL 2016 today and that is basically 3 versions since the SSIS catalog has been introduced. If you are still doing custom logging via some form of script tasks or custom components, then your efforts might be best spent developing a set of reports directly on the SSIS Catalog or alternatively extracting the data from the SSIS catalog and building your own little data mart from it.
While custom logging and custom configuration does not necessarily bring extra debugging frustration it is technical debt. For example, if you are using custom tasks to perform logging then you would need to update these tasks and recompile them before you could use them with a new version of SQL Server. Now this might not sound like that much of a problem but what happens if you use these components over a large set of projects and the time comes to upgrade to a new version of SQL Server? There will be no automatic SSIS package upgrade functionality for your custom components and it will add an extra set of headaches.
In the past I have been at a client where they had 50 instances of SQL Server running in the organisation and other another client where they had over 500 SSIS packages in their ETL solutions in the organisation. While these might sound like extreme examples large organisations do have a proverbial sprawl of ETL solutions and SQL Server instances. A small increase in technical debt per package would be amplified tremendously in such environments.
Is your framework modular?
Can you pick and choose the components from your framework that make sense for the ETL challenge you are trying to solve? Not all ETL challenges are the same and it is important that your framework does not force you to solve all challenges in exactly the same way.
Don’t get me wrong, I am not advocating a Wild West approach to ETL, not at all. What I am advocating is a modular framework that allows you to choose the different components of your framework that have been designed to solve your challenge in the simplest way. This means that your framework might have a specific approach to loading Flat Files and another approach to loading records from a source database and yet another for dealing with Excel files. Each of the individual SSIS packages should still conform to the standards set by your framework and template packages should be available to use as the starting point for development.
Monolithic approaches to implementing your ETL should be avoided at all cost. If the answer to the question “Why is it done this way?” is simply “Because that is the way the framework works.” Then alarm bells should start to ring.
Is it easy to get up to speed with?
If you have a new SSIS developer join your team can you get him up to speed with the framework in a day? If someone with good working knowledge of SSIS cannot be up and running with your framework in a day, then you have an over engineered framework. If it takes days or even weeks for someone to get comfortable with your framework, then your framework is simply too complicated.
Imagine how much more difficult it would be for someone junior to join your team and become productive. It can already be challenging for a junior developer new to SSIS to learn the ins and outs of SSIS itself. If you now increase the complexity with an overbearing framework you will not only take longer to get the developer productive, you might actually cause them to pick up bad habits in SSIS and thus stunt their growth.
Summary
The questions above are by no means exhaustive but they should provide guidance and will serve as an early warning system when you undertake the task of designing or maintaining your SSIS framework. They are not deep technical questions but instead serve to highlight some of the reasons why people build frameworks to begin with namely:
To make debugging easier
To make support easier
To reduce duplicated effort
To apply a standard across all solutions
To increate modularity
To increase development speed
To decrease the barrier to entry
If your framework is not helping you on your way to achieving these goals, then it is time to make the changes necessary to fix this.