That Tricky Thing Called Automation

Automation of processes is something every serious organization should look into. The alternative is to waste precious human resources on tasks that are repetitive and hardly challenging. That is a loss in both man-hours and employee morale. No one wants to keep doing the same thing over and over again, especially if the tasks do not require proper use of the human intellect.

Automation enables machine-driven initiation and management of these tasks, and can sometimes even help coordinate manual intervention. Therefore, for any organization that doesn’t want to get stuck in the everyday details, automation of processes is a must.

Where to start?

However, automation is not an easy task in itself. A manual process is usually rigid and cannot be automated as it stands. This comes from the lazy human tendency not to fix things unless they are broken. Manual processes produce manual protocols and tasks that do not translate directly to automation. At the same time, when challenged, the people who run them will produce perfect rationales as to why the manual process should be the way forward instead of any “clever” automation.

This is where process re-engineering comes in. Existing manual processes will have to go through a re-thinking process, analyzing the business rationale behind each step and protocol. It’s these business decisions that should be automated, not the processes themselves.

When analyzing the existing process, care should be taken not to go too deep. Deep analysis results in the analyst getting accustomed to the way things are done at the moment and being unable to think outside the box to re-engineer the process effectively. Going too far down the rabbit hole results in tunnel vision that hides potential improvements. Re-engineering should be done with an outsider’s perspective.

The other side of this coin is Change Management. If the automated process is drastically different from what was there before, what takes a hit is the adoption rate. People would rather do the mundane manual task again and again if the automated process is too complex to adopt or follow through. If the automation is similar in nature to what was there before, at least in terms of user interaction, that will go a long way in helping users ease into the process. Change should be gradual and small at each step.

New tools will come into place, replacing or complementing the existing ones. Automatability should be one of their innate qualities. For example (a trivial one), if a particular tool doesn’t accept options as command line arguments, that is probably an indication that it is not the most automatable tool. Simple user experience features go a long way when trying to use these tools in automated pipelines. The essence of implementing automation will ultimately be piecing together the inputs and outputs of different tools.

Should I?

There is an interesting contradiction to be seen in any automation effort. The ultimate aim of automation is to reduce complexity. However, the more stories each automated process tries to take on, the more complex the automation gets, sometimes in an unbelievably disproportionate manner.

The reason is almost always the overly ambitious nature of automation efforts that try to stand in for human intellect when they really should have opted for manual intervention.

For example, let’s assume a story where a set of Virtual Machine images has to be built and a deployment has to be done using those images. The image baking story can be easily automated with existing tools. The infrastructure creation part of the story is also easily automated. However, it’s really tempting to go and piece these two parts together into a single pipeline. At the simpler end of the deployment complexity spectrum this may be suitable, but things quickly spiral out of control as the deployment complexity increases.

It soon becomes evident that these two parts are different stories in their own right. It is best to avoid piecing them together into a single automated pipeline. The decisions involved in this “gluing together” part of the story are not easily automatable. This doesn’t mean automation is impossible. It just means the effort it demands and the complexity it introduces to the process are not worth the convenience it provides.

It will be far easier (and simpler) to produce the results from the first story in a readable and parsable manner so that an engineer can read them clearly and take the decisions themselves. The second process can be made to take the results of the decision making as input arguments. Of course this would add a manual step that we would rather have automated.
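As a rough sketch of this handoff, the first story could emit its results as JSON, and the second story could accept the engineer’s decision as plain arguments. Everything named here (the report layout, the image IDs, and the --image-id and --pattern options) is a hypothetical illustration, not part of any particular tool.

```python
import argparse
import json


def render_build_report(images):
    """Serialize image build results so both an engineer and a script can read them."""
    return json.dumps({"images": images}, indent=2, sort_keys=True)


def parse_deploy_args(argv):
    """The deployment story receives the engineer's decision as arguments."""
    parser = argparse.ArgumentParser(description="Hypothetical deployment step")
    parser.add_argument("--image-id", required=True, help="image chosen by the engineer")
    parser.add_argument("--pattern", required=True, help="deployment pattern chosen by the engineer")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Story 1: bake the images and publish a readable, parsable report.
    report = render_build_report([
        {"name": "web", "image_id": "img-web-001"},
        {"name": "db", "image_id": "img-db-001"},
    ])
    print(report)

    # Manual step: an engineer reads the report and decides what to deploy.
    # Story 2: the decision arrives as arguments (hard-coded here for the demo).
    args = parse_deploy_args(["--image-id", "img-web-001", "--pattern", "single-node"])
    print(f"deploying {args.image_id} using pattern {args.pattern}")
```

The two functions never call each other. The only link between the stories is the report a person reads and the arguments that person supplies.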

However, these steps are what I like to call “user space” decisions as opposed to “machine space” ones. An example of a machine space decision is triggering an alert based on soft and hard limits. The corresponding user space decision is determining the numbers for these limits after analyzing deployment behavior over time. The argument here is not that user space decisions cannot be automated, but rather that they are better left to manual processes and protocols for the sake of simplicity, engineering efficiency, and user experience.
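To make the split concrete, here is a minimal sketch of the machine space half. The function name, the alert levels, and the numbers are assumptions made for illustration; the limits themselves are passed in because choosing them is the user space decision.

```python
def classify(value, soft_limit, hard_limit):
    """Machine space: compare a metric against limits chosen by a human."""
    if soft_limit > hard_limit:
        raise ValueError("soft limit must not exceed hard limit")
    if value >= hard_limit:
        return "critical"
    if value >= soft_limit:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # User space: an engineer picked 70 and 90 after watching the deployment.
    for cpu_percent in (50, 75, 95):
        print(cpu_percent, classify(cpu_percent, soft_limit=70, hard_limit=90))
```

The comparison is trivial to automate and maintain. Deciding that 70 and 90 are the right numbers is where the human judgment stays.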

How do you identify these user space decisions that shouldn’t be automated? The process analysis done before the automation effort should provide enough information to determine the high level user space decisions that act as anchor points dividing the process into different automated pipelines. If those are not yet obvious, maybe not enough analysis of the process has been done yet.

Another indicator of a user space decision is the ability of that decision point to explode the story into multiple possible scenarios. For example, in the image and deployment story above, the decision point where the infrastructure gets created can potentially result in a huge list of deployment patterns.

Trying to cover all of these potential stories with automation will be a nightmare, and the resulting set of automation artifacts will be nearly impossible to maintain and debug.

Making use of modularity

When discussing automation, we can’t ignore the coding aspect of it. There are various tools in this space that allow codification of processes, deployments, and configuration, using tool specific DSLs. These DSLs try to be as user friendly as possible while being forced to follow constraints natural to their domains. Constructs like variables, method-like groupings, and ambiguous terms like class provide a feeling of familiarity with programming languages.

However, caution should be exercised when using these codification tools so that the code is not over-engineered. Refactoring any kind of code for re-usability and coding efficiency is almost the first instinct of any engineer. Some follow an ethos of Clean Code that others could almost view as fanatical. This is certainly helpful when programming with real programming languages.

That is not the case with IaC (Infrastructure as Code) tools. Their DSLs are not programming languages, but merely declarative statements that describe the desired state. Trying to enforce programming language practices such as feverish re-usability could make these IaC artifacts needlessly complex and virtually unreadable.

For example, re-usability can be achieved for CloudFormation templates using the AWS::CloudFormation::Stack resource and an additional layer of scripting on top of the YAML/JSON templates. However, the result would be extremely difficult to debug (especially if you are not the engineer who wrote the templates and scripts) and too rigid to update or change.

When considering the compromise between coding efficiency and readability in IaC, it’s always a good bet to choose readability with some level of duplicated code. Unlike compiled/interpreted programming languages, these DSLs only result in a set of API calls to one or more service providers (e.g. AWS). Re-usability cannot help in optimizing performance, not to mention the fact that performance is not a major concern in automated processes (unless of course your automation eats up memory). On the other hand, readability will greatly help in debugging, maintaining, and updating these artifacts. (As a complementary boost to readability, be generous with comments and whitespace. You may want to describe why each code block is written that way.)

DSLs themselves offer some modular language constructs. These can be used to achieve some level of re-usability. For example, Terraform provides modules that enable some level of reuse. It’s also tempting to add an additional layer of modularity with the help of a set of (for example) Bash scripts. However, that would mean the IaC code (Infrastructure-as-Code code. I know.) will lose the ability to be run without depending on those additional scripts. Furthermore, someone who has to maintain these artifacts will have to learn what the Bash scripts do in addition to learning the tool’s DSL. This will be a huge deterrent for anyone looking to adopt the automation. Despite my “clever” Bash scripting skills, no one wants to waste time understanding the thousand line long script that tries to solve world hunger.

Configuration

If file based configuration is required for the automation, it’s always better to have a single (maybe large) configuration file than several smaller ones. This makes it easy to plug the automation into higher level pipelines (manipulating a single file is always easier) and is version control friendly (all changes are made in a single file, which is a huge relief for reviewers).

When possible, configuration options should be accepted as arguments (e.g. liberal use of getopts in Bash). Prompted input should be avoided or kept to a minimum, as automated pipelines will not have a tty attached to them. Any third-party tool used in the automation process should also follow this principle.
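The same idea can be sketched in Python with argparse. The --region and --yes options and the three confirmation modes are hypothetical choices made for this example. The point is that the script never sits waiting on a prompt that no one can answer.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical deployment step")
    parser.add_argument("--region", required=True, help="target region")
    parser.add_argument("--yes", action="store_true", help="skip the confirmation prompt")
    return parser


def confirmation_mode(assume_yes, stdin_is_tty):
    """Decide how to confirm: never prompt when no one can answer."""
    if assume_yes:
        return "skip"
    if stdin_is_tty:
        return "prompt"
    return "fail"


def main(argv):
    args = build_parser().parse_args(argv)
    stdin_is_tty = sys.stdin is not None and sys.stdin.isatty()
    mode = confirmation_mode(args.yes, stdin_is_tty)
    if mode == "fail":
        # In a pipeline, fail fast with a clear message instead of hanging.
        print("no tty attached; pass --yes to run unattended", file=sys.stderr)
        return 2
    if mode == "prompt":
        answer = input(f"Deploy to {args.region}? [y/N] ")
        if answer.strip().lower() != "y":
            return 1
    print(f"deploying to {args.region}")
    return 0


if __name__ == "__main__":
    # Demo invocation with explicit arguments, as a pipeline would supply them.
    main(["--region", "us-east-1", "--yes"])
```

A person at a terminal still gets the safety prompt, while a pipeline passes --yes and gets a deterministic run or a fast, explicit failure.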

Documentation

Any automation will involve a number of business decisions that translate into the above mentioned set of codified scripts. Unlike the manual processes that were in place earlier, these decisions are not directly visible to new users. Why should the deployments be done on a single region first? Why should these files be copied there? Why should this file be omitted from version control? Why do update calls have to be done through the UI?

In a manual process, questions like the above can be discussed between engineers and understood (with considerable time spent on the learning curve). Automation removes this interaction. Most design decisions will be taken by the engineer who does the automation, and the rationale behind them will soon be forgotten because no explanation is recorded.

This is why documentation on the automated process should be unquestionably extensive.

The tricky part about digital communication is that the reader never reads the material in the same state of mind, knowledge, or vision (or voice) it was intended to be read with. This truth, combined with disorganized documentation, will result in a set of really confused users who do not understand what to do when.

Another issue in documentation is tunnel vision. An engineer working on an automation effort will develop facts that are “obvious” to them over the duration of the development. These can be file locations, encryption algorithms, prerequisite tools, or even something as simple as the Bash version. During documentation, these “obvious” facts get omitted because “duh, you should’ve known that”.

The best way to avoid these pitfalls is to start dogfooding the documentation right after the first iteration. Users should be allowed to depend solely on the documentation when implementing or working with the automation, and they should be closely observed to understand the steps they struggle with the most. Those steps indicate where the documentation is lacking and could also indicate the mindset of the end users.

Outro

One of the best principles to follow in Software Engineering is the Pareto principle (better known as the 80/20 rule). Roughly 80% of the outcome comes from 20% of the effort.

The same is true in automation. Almost always, 80% of the user stories will be covered by a simple 20% of the automation effort. On the flip side, the remaining 20% of the stories will demand four times the effort, which will simply turn out not to be worth it.

Developing automation should be done with coding principles that are different, sometimes drastically, from standard software development ones. The priority should be for readability that directly translates into adoption rate and maintainability. The more complex the automated process is, the more resistant it will be against change, and the more likely the automation will get replaced with a manual process.

The ultimate goal of automation is to make the engineers happy. It’s always a good option to keep an open channel with the possible end users to understand whether the automated process would suit their needs.

There will be resistance to change; however, it should be managed by controlling the complexity of the change. People prefer smaller, iterative changes over an apocalyptic level of process replacement, even if the change means easier work.

Written on July 20, 2018 by chamila de alwis.

Originally published on Medium