I work in a small engineering team in an office all by myself. The rest of my team are in another city. While we have video conferences, email, chat, phone calls to keep connected, generally we work quite independently of each other.
As an introvert, it’s fantastic!
Day to day I can get a lot done without any distractions. However, it can be limiting doing everything by yourself.
The situation
A few years ago during the pandemic I had a lot of time at home which I spent doing some study. Terraform was (still is) a hot commodity, so I wanted to learn a new skill. I started spinning up resources, tearing them down, building pipelines, breaking and fixing things. Fast forward a few months and a project at work came up that needed resources on demand (or so I thought).
I went to work building a solution, I did all the usual things; engage with stakeholders, gather and record requirements, put together a design, build a proof of concept, an initial build, iterate through testing and making improvements until it was finally built and in put into production.
The problem
Who’s going to support this?
At first the system did what it was supposed to and everyone was happy. I got praised for trying something new, my manager was showing it off to everyone, I even got a promotion.
However I missed one vital step, peer review. I didn’t consider how others in my team would have approached the design. How would they have built it? What technology would they would be comfortable working with?
When you get so caught up in the technical details, you tend to loose sight of of the big picture. In my case, I tend to over complicate things while making too many assumptions. I was so excited to use the new shinny thing I had been learning that I didn’t really consider if it was the right fit.
- Is Terraform a good infrastructure as code (IaC) tool? Sure.
- Is IaC a good solution for something that’s only about 5 or so VMs? Maybe.
- Is IaC a good fit in a small team with varied backgrounds? Maybe not.
I didn’t really consider the resource cost of the thing I was building.
- How much time does it take to fix up your code to accommodate multiple regions?
- What happens when we want to update the base image?
- What happens if I get hit by a bus and someone else needs to change something?
Fast forward a couple years and the system hadn’t scaled beyond the initial build, it had been running mostly without any maintenance, and due to the design choices, no one wanted to touch it to make any changes because it would take them a day or two to get their bearings.
Remediation
Just because you can automate something, doesn’t mean you should.
I had to dial it back a bit to get it into a state that would be easy for anyone in the team to pick up and work on it. I started by asking what do we actually care about?
- Do we need to scale things up/down, in/out? No not really.
- Do we need to build repeatable infrastructure? Nope.
- Do we need to track changes to our infrastructure? Sort of, but not really.
For such a small solution, Terraform was overkill. It had been built in such a way that removed any flexibility from it. If it takes me 10 minutes to point and click my way through setting up VPC peering versus a day of writing and testing code, there’s not really any point in automating it. I rebuilt the infrastructure manually, creating some launch templates and simple scripts for the routine tasks that we were doing with Terraform.
After re-writing the configuration management component to be in line with our standard Ansible roles we use, and then updating the documentation, the project was now in a state that was a lot easier to manage for the rest of the team.
Insight gained
- You and your team are a stakeholder, what do you want out of a project?
- Collaborate - one person didn’t get us to the moon. Working in collaboration with others will help remove biases that might hinder the success of a project and open new trains of thought and creativity.
- Keep it simple - an elegant solution is rarely the one that’s the most complicated.