DevOps is undoubtedly a hot topic for many professionals in the tech industry. As agile software development has matured, a large number of companies have moved to integrate DevOps into their ways of working. Transitioning to DevOps requires a change in culture and mindset, and the path to a successful implementation can be paved with many twists and turns. That’s why we at YLD have decided to uncover some of the challenges, as well as the opportunities, of integrating DevOps into a business. If you are trying to get a better handle on DevOps culture, this article will shed some light on the topic.
What is DevOps and what do we use it for?
To start with, DevOps is a conjunction of two terms: Software Development (Dev) and IT Operations (Ops). The tech industry commonly treats DevOps as a job role, but that is not entirely correct: DevOps is a culture built around developers.
Some ten or even fifteen years ago, large companies used to have departments where SysAdmins (System Administrators) were responsible for all the infrastructure and for deployment to production. Some regarded the SysAdmin as the star role in a small to medium enterprise, a crucial element of the internal IT landscape; and yet there were problems with this model. First, there was a long-lasting disconnect between the owner of the application and the developer. Second, if you build something, you don’t want someone else to run it — you want to run it yourself, because if something goes wrong you know how to fix it. As digitalisation came into view and we moved from data centres to a cloud model, SysAdmins started managing infrastructure in the cloud. It is in this setting that the problem of people treating ‘DevOps’ as a role, rather than as a culture around deploying code to production, started.
In that role-centred reading, a ‘DevOps engineer’ is a developer with the power to understand and operate infrastructure in order to deploy their own code to production. With time, some companies started to refer to DevOps as the conjunction of two teams — development and operations — working together to optimise both the productivity of developers and the reliability of operations. They strive to increase efficiency, improve the quality of the services they provide to customers and ensure regular team communication.
DevOps brought flexibility to the development process and allowed companies to create platforms where developers could deploy their code without anyone’s help. For large enterprises it also meant that a single deployment could take ten minutes rather than the whole month it may have taken before. It also empowered developers to focus on delivering software as well as building it, not on building alone. Thus, DevOps brought a cultural shift that bridged the gap between development and operations teams, traditionally regarded as siloed forces.
The challenges of implementing DevOps
For many developers the shift from the SysAdmin mentality to a DevOps culture wasn’t an easy one to get used to. A great number of developers questioned whether they could adapt their technical skills and knowledge to a workflow that had changed so abruptly. We believe that the answer to a slow and steady adoption of DevOps might lie in how you approach this new culture. One of the most important steps, therefore, is to regard DevOps as a set of practices and not as a single job position. This helps reduce the number of possible bugs and errors, strengthen cohesive teamwork and ensure constant testing and retesting to secure the whole continuous integration and delivery (CI/CD) process.
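As a rough illustration of that practice, here is a minimal sketch, in Python, of the kind of gate a CI/CD pipeline enforces: nothing moves forward unless the previous stage passes. The script names are placeholders, not part of any specific tool.

```python
import subprocess
import sys

# Minimal CI/CD gate: each stage must succeed before the next one runs.
# The script names (run_tests.sh, build_image.sh, deploy.sh) are hypothetical
# stand-ins for whatever your own pipeline actually calls.
STAGES = [
    ["./run_tests.sh"],          # unit and integration tests
    ["./build_image.sh"],        # build and tag the release artefact
    ["./deploy.sh", "staging"],  # deploy to a pre-production environment first
]

def run_pipeline() -> int:
    for stage in STAGES:
        print(f"Running stage: {' '.join(stage)}")
        result = subprocess.run(stage)
        if result.returncode != 0:
            print(f"Stage failed ({result.returncode}); stopping the pipeline.")
            return result.returncode
    print("All stages passed; the change is ready for production.")
    return 0

if __name__ == "__main__":
    sys.exit(run_pipeline())
```

A real pipeline would live in your CI system rather than a script, but the principle is the same: testing and retesting are built into every delivery, not bolted on at the end.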
In the early days DevOps was also quite difficult to implement, as there were a lot of new players on the market. At the same time as this culture of operations came into play, the physical data centre was disappearing and the cloud was emerging, followed swiftly by the arrival of big names such as Azure, AWS and Google. These cloud services started to provide infrastructure in a very simple way that had not existed before, which meant the tooling started to change. As many of those moving into senior management positions had previously been developers, they were able to bring their expertise to creating new tools that interact with these cloud providers. This meant there was a far greater demand for software tooling.
What has been happening in the last few years is that if a software tool is open source, someone will usually build a managed service on top of it. Take Kubernetes, an open source container orchestration platform that offers a very simple approach to running applications without having to build the underlying infrastructure yourself. It was so successful that big providers such as Azure, AWS and Google all agreed to run it as a managed service. The challenge that followed was understanding which tools to choose, and how they could be useful without becoming something that, whilst cool, would ultimately create more work.
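To make the “deploy without building infrastructure” point concrete, here is a minimal sketch using the official Kubernetes Python client. The image name, labels and namespace are purely illustrative, and most teams would express the same thing declaratively from a pipeline rather than in a script.

```python
from kubernetes import client, config

# Assumes a kubeconfig is available locally; names and image are placeholders.
config.load_kube_config()
apps = client.AppsV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="hello-api"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "hello-api"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "hello-api"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="hello-api",
                    image="registry.example.com/hello-api:1.0.0",
                    ports=[client.V1ContainerPort(container_port=8080)],
                )
            ]),
        ),
    ),
)

# The managed cluster (on AWS, Azure or Google) handles scheduling, restarts
# and scaling; the developer only describes the desired state.
apps.create_namespaced_deployment(namespace="default", body=deployment)
```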
There are, and always have been, a lot of trends in the tech industry. It is important to know that if you choose the wrong tool simply because it’s in line with a trend, you can get yourself into an unfavourable position; being forced into a migration is one possible outcome, and migrations in infrastructure are usually pretty expensive. That’s why it is crucial to choose the tool that suits your business, not the one everyone else happens to be running. Not every small problem requires a big investment, so bear in mind the old saying: “If it ain’t broke, don’t fix it.”
Monoliths vs. microservices
Earlier on, a monolith (also known as a monolithic application), built as a single and indivisible unit, was the “in thing”. Then the move was made towards microservices, or microservice architecture (as it’s also known) — an approach that arranges an application as a collection of services. In software engineering, microservice teams are often defined by the famous “two pizza rule”, introduced by Amazon’s CEO Jeff Bezos, which states that any internal team should be small enough to be fed by two pizzas. As light-hearted as this approach might seem, it highlights the importance of multi-functionality and communication within small groups, as well as the need to focus on two main goals: efficiency and scalability.
If you are building a service it is imperative that you are in constant communication with your team throughout the whole process; otherwise every release means synchronising contracts between services by hand. A more mature team is definitely better placed to find the most effective way to deal with this. As a result, some tools appeared on the market — service meshes such as Linkerd and Istio, for instance. A service mesh — a dedicated infrastructure layer built right into an app that handles and documents the interaction between its different parts — is actually a very ‘in’ topic right now. It allows communication to be optimised and helps avoid downtime as the application grows.
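To illustrate what a service mesh takes off the application’s plate, here is a small sketch of the retry and timeout logic a team would otherwise hand-roll in every service. The orders-service endpoint is hypothetical; tools like Linkerd and Istio move this kind of resilience and telemetry into the infrastructure layer, out of application code.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Hand-rolled resilience for one service-to-service call.
# A service mesh provides retries, timeouts and telemetry like this
# transparently, for every service, without touching application code.
session = requests.Session()
retries = Retry(
    total=3,                        # retry transient failures a few times
    backoff_factor=0.5,             # exponential backoff between attempts
    status_forcelist=[502, 503, 504],
)
session.mount("http://", HTTPAdapter(max_retries=retries))

def get_order(order_id: str) -> dict:
    # 'orders-service' is a hypothetical internal endpoint.
    response = session.get(
        f"http://orders-service/orders/{order_id}",
        timeout=2.0,                # fail fast instead of hanging the caller
    )
    response.raise_for_status()
    return response.json()
```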
With time it became evident that microservices require more effort than monoliths. For instance, microservices are time consuming: you need to deploy each service independently, look after orchestration tools and unify the format of your CI/CD pipelines if you want to save time. Some companies like to run very high numbers of microservices, which can create a stressful environment and generate unnecessary database, cache and other related errors. Therefore, we believe that over the next two or three years some companies will go back to monoliths, for two main reasons. Firstly, monoliths are easy to manage, as they use a single code base for all their services and functionality. Secondly, they create less overhead at product release time, because you don’t have to synchronise and test everything across services.
Security requirements
We live in a world where users give huge importance to the information that companies store and to the security measures around it. If companies are not managing data properly, there will be parts of the system holding data that should not be there; the same applies to security. The fact is, it is very tricky to run a distributed system with lots of microservices; the move from a monolithic architecture to a distributed system happens not only in the application space but also at the data store, and so managing your data becomes one of the hardest challenges. To move forward you need proper guidelines from your company. If guidelines alone are not enough, you will need to implement processes to make sure that no data error can reach production. With this in mind, we would recommend having something like a production checklist to try to prevent that from happening at all.
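As a sketch of what part of such a production checklist could automate, a pre-deployment check might refuse to ship a service whose declared data schema exposes fields it should not hold. The field names and rules here are purely illustrative; a real checklist would encode your own data-handling guidelines.

```python
# Illustrative pre-production data checklist. The forbidden fields and the
# schema format are assumptions, not a standard.
FORBIDDEN_FIELDS = {"password_plaintext", "full_card_number", "national_id"}

def check_schema(schema: dict) -> list[str]:
    """Return a list of violations found in a service's declared data schema."""
    violations = []
    for table, columns in schema.items():
        for column in columns:
            if column in FORBIDDEN_FIELDS:
                violations.append(f"{table}.{column}: field should not be stored")
    return violations

if __name__ == "__main__":
    # Hypothetical schema a service declares before it is allowed into production.
    schema = {"orders": ["id", "total", "full_card_number"]}
    problems = check_schema(schema)
    if problems:
        print("Checklist failed:")
        for problem in problems:
            print(f"  - {problem}")
        raise SystemExit(1)
    print("Data checklist passed.")
```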
Over the last fifteen years, big data companies have wanted to store everything. As a result, there are a lot of systems holding data they shouldn’t have. If you are a single developer or a small startup, that data can be easy to manage. However, if you have hundreds of microservices you will need some form of automated validation to check whether any of them are doing anything untoward with data or security. There are, however, technologies that can help with that.
DevOps culture was adopted by businesses at a very fast pace, and many companies didn’t manage to get up to speed and create the appropriate security tooling at the rate needed. With this in mind, DevSecOps was created to integrate security into the DevOps process. It’s well known that development and security teams don’t always get on, as speed can take precedence over quality, and so security measures can sometimes be left behind in the DevOps process. That’s why, to ensure a successful DevOps process and smooth, error-free deployments, it is vital to integrate security practices into every step of the workflow. Frequent checks can reduce or even eliminate errors in deployment. From inception to maintenance, all team members must strive for a collaborative working atmosphere and share responsibility within the process.
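One minimal way to make “security at every step” concrete is to add an automated audit as a blocking stage in the pipeline. The sketch below assumes a Python project and uses pip-audit, which scans dependencies for known vulnerabilities; treat the exact invocation as an assumption and check the tool’s documentation, and note that equivalents exist for most ecosystems.

```python
import subprocess
import sys

# DevSecOps gate: run a dependency vulnerability scan as part of the pipeline
# rather than as an afterthought. The flag usage is an assumption; consult
# pip-audit's documentation for your version.
def security_gate(requirements_file: str = "requirements.txt") -> int:
    result = subprocess.run(["pip-audit", "-r", requirements_file])
    if result.returncode != 0:
        print("Known vulnerabilities found; blocking the deployment.")
    return result.returncode

if __name__ == "__main__":
    sys.exit(security_gate())
```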
The ‘Shiny Objects’ syndrome is quite common in the tech industry — you look at the technology, build it and put it into production, and only afterwards do you discover you have problems. These problems are usually related to one of two things: firstly technical debt and secondly, as with any new system, the fact that the people running it are not yet savvy with it. Having to upskill your team while simultaneously fixing the technical debt and maintaining those systems becomes a harsh reality. Big enterprises, a bank or similar, have the time and resources to think and plan beforehand. If, however, you look at the wider tech industry, there are many budding startups and companies that implement various systems and only afterwards realise they have a problem. Rushing into deployment can be very risky for your team, so, as the old saying goes: “Measure twice, cut once”.
More important, however, is for your team to understand how the platform reacts to all kinds of inputs, whether it can be installed and run in its intended environments, and whether it can achieve the desired results. Your team will deliver the best results with the information available to it, measuring and evaluating in order to react, iterate and improve, so it can respond to users in an increasingly efficient manner.
Teamwork and the importance of setting a common goal
Sometimes we can easily get carried away by our personal tasks and goals; those developing platforms or code, whilst managing their infrastructure or services, should not lose sight of the final result. Setting a common goal is one of the most important tools to guarantee progress and successful results. One useful step is to have a member of the operations team working in development and vice versa. This interchange gives a better perspective on the work of both teams and can help to avoid future problems and misunderstandings. In line with this, executive support plays an important role in ensuring proper guidance, especially in the early stages. Throughout the years we have realised that there is no single “right” way to run DevOps. Failures are part of the journey, and your response to them is often more important than the fact that they occurred in the first place. On that ground, instilling a culture of constant improvement is a highly valuable practice among leading companies.
The importance of procedures
We all know the classic developer-and-designer conundrum — different backgrounds, mindsets and approaches to the work can stand in the way of delivering successful projects. For instance, a designer might want to add interactive elements to a website for better user interaction, while a developer might find them unnecessary and heavy to load. The same applies to the relationship between a cloud infrastructure engineer and a developer, especially at companies that run more legacy systems. Although this tendency is unlikely to change anytime soon, it is important to keep looking for ways to prevent it from causing friction. If you look at the world right now, you could have, say, one team in Beijing and another in New York. As we are very much aware, this physical separation, along with language and cultural barriers, can make it harder for people to develop meaningful working relationships. One way to progress through this is to have solid procedures in place — if something goes wrong you just need to follow the on-call procedure, for instance, to have the issue dealt with quickly and efficiently.
With this in mind, your team really needs to understand and respect the rules and procedures that need following. When a member of your team is on call, it is vital that they know these ways of working inside out and have a complete comprehension of their role and responsibilities. Always encourage them to ask if they are unsure of something — this is important to their own learning and to the development of the team as a whole. Unfortunately, many companies’ on-call procedures are missing a great deal, and with that comes massive scope for things to go wrong. Imagine that you have these procedures, but your team doesn’t have a call manager and a P1 (priority one) incident happens. During the call you have ten people trying to deal with it, because you have ten different systems, and if everyone starts to talk at the same time the call will not progress amid the noise and chaos. To manage this you need someone to take control of the call.
A vital takeaway? Procedures, procedures, procedures. When these are firmly in place, egos and personal views can be reserved for the stage before the system is built. It can be something as simple as a ‘decision document’ used to present the pros and cons of the options relating to technology, architecture or any other area. It’s really important to have the discussions and debates that spark from the development process. Working remotely actually helps here, because people need to continuously keep records and document everything they do in order to keep the rest of their team in the loop. In summary, our advice would be to really educate your team on procedures, and on why it is so vital to leave egos at the development stage and not carry them into production.
Tips and tricks on DevOps
Once you have decided you want to adopt the DevOps culture, the first piece of advice we would give is to read Google’s SRE Book and find the information that would be useful for you and your team. One major point the book presses on is complacency. People deploy to production, check that it’s running and then go to get some coffee, not thinking about it any more. This is exactly the opposite of what you should be doing. It’s when you deploy to production that it becomes critical to work harder and stay present. As you deploy a service to production you need to make sure that certain processes and guarantees are in place. This is where your production checklist really becomes vital: it keeps you mindful of the key parts that could potentially break in production and which you therefore need to validate before deployment.
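Staying present after a deployment can be partly automated. The sketch below, using a hypothetical /health endpoint and invented thresholds, polls the newly released service and fails loudly if it does not become healthy within a deadline, which is exactly the moment to intervene or roll back rather than go for coffee.

```python
import time
import requests

# Post-deployment smoke test: keep watching the service right after release.
# The URL and thresholds are placeholders for your own service and checklist.
HEALTH_URL = "https://example.com/health"
DEADLINE_SECONDS = 120

def wait_until_healthy() -> bool:
    deadline = time.monotonic() + DEADLINE_SECONDS
    while time.monotonic() < deadline:
        try:
            response = requests.get(HEALTH_URL, timeout=2.0)
            if response.status_code == 200:
                print("Service is healthy after deployment.")
                return True
        except requests.RequestException:
            pass  # not up yet, keep polling
        time.sleep(5)
    print("Service did not become healthy in time; consider rolling back.")
    return False

if __name__ == "__main__":
    raise SystemExit(0 if wait_until_healthy() else 1)
```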
There are processes, the SRE Book explains, that can help you follow the important steps to guarantee the success of your product. One thing it talks about is categorising the severity of failures. If something goes wrong with, say, an internal system, you don’t need to wake someone up to fix it — you can simply wait until the morning. If, however, it’s something on the main company website, it will need to be addressed immediately — you will need on-call procedures as well as metrics to monitor this. If metrics such as SLAs and SLOs fall below a threshold, you should have someone take a look right away. Live by the famous Amazon mantra: ‘You build it, you run it’. The work itself starts from the moment your application hits production, and from that point on it is your responsibility to make sure it works 100%.
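As a rough sketch of that idea (the SLO target, window and notification functions are invented for illustration), a monitoring job can compare the measured error rate against the SLO and decide whether to page someone immediately or just open a ticket for the morning:

```python
# Severity-based alerting sketch: page only when a user-facing SLO is at risk.
# The 99.9% target and the notification hooks below are illustrative only.
SLO_SUCCESS_TARGET = 0.999

def page_on_call(message: str) -> None:
    print(f"[PAGE] {message}")    # stand-in for a paging integration (PagerDuty etc.)

def open_ticket(message: str) -> None:
    print(f"[TICKET] {message}")  # stand-in for your ticketing system

def evaluate(total_requests: int, failed_requests: int, user_facing: bool) -> str:
    if total_requests == 0:
        return "no-traffic"
    success_rate = 1 - failed_requests / total_requests
    if success_rate >= SLO_SUCCESS_TARGET:
        return "ok"
    # SLO is being missed: severity depends on whether users are affected.
    if user_facing:
        page_on_call("P1: main site below SLO, wake someone up")
        return "p1"
    open_ticket("P3: internal system below SLO, handle in the morning")
    return "p3"

if __name__ == "__main__":
    # 25 failures in 10,000 requests breaches a 99.9% target, so this pages.
    print(evaluate(total_requests=10_000, failed_requests=25, user_facing=True))
```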
…
Speed, reliability, strengthened security and rapid delivery are some of the most important benefits of implementing a DevOps culture, compared with running traditional software development and infrastructure management processes. It has been an important tool for many companies seeking to increase the quality of their service and compete more effectively in the tech market. There are many challenges that DevOps adoption can bring to your business, which is why it is important to focus on three main things: people, technology and process. If you ensure best practices across these three components, DevOps will definitely bring many opportunities to take your business to the next level. Among other enterprises, our team of experienced developers at YLD has helped a global mass publishing house and Kingfisher to grow a collaborative DevOps culture, establishing, for instance, a more efficient pipeline and a smooth migration process for their businesses. Feel free to contact us if this set of practices is something your company is looking for or wants to discuss.
Last but not least, we at YLD would like to thank our software engineer Renato Castro for his insights on this topic, which proved to be a very valuable contribution to this article.