This article is the first in a series of articles intended to cover the ‘why’, ‘what’ and ‘how’ of Platform Engineering.

Platform engineering, like every other major effort to improve outcomes in software engineering, eventually comes down to one goal: sustainably creating quality software at speed and scale.

In the spirit of first principles, let’s briefly examine why we care about these ‘whys’.

Teams and organisations want to be able to move fast, or at least faster than their competitors. The benefit of speed is fairly intuitive through a business lens: “first mover advantage”, “time to market” and other common business parlance all capture the value of moving at speed. In the specific context of software engineering, speed arguably delivers its greatest benefit when it is paired with agile software delivery. Being ‘agile’ implies starting with an assumption, acknowledging that what we think is probably wrong, making small bets, validating them with real-world feedback, and iterating towards a desired state. These cycles of hypothesis, experimentation and adjustment lead to solutions that work. To arrive at a winning product, an engineering organisation needs to out-experiment its competitors. Teams and organisations that create, measure and iterate faster are more likely to create the winning product. The same applies to discovering which architecture, design and tools work best. Speed of experimentation determines an engineering organisation’s success.

A lack of quality implies higher costs. For example, failures in customer-facing environments usually carry a monetary cost: an e-commerce web platform that is unavailable for 10 minutes may lose thousands, if not millions, in revenue. Even more importantly, quality enables speed! It is tempting to think that achieving quality requires a slow, careful and deliberate approach, or that we can sacrifice quality to go fast and release more features. In software engineering, this is a false choice. There is no speed without quality, and organisations with high quality always work at high speed. The survey data behind the State of DevOps Report shows that elite organisations combine high ‘deployment frequency’ and low ‘lead time’ with low ‘change failure rate’ and low ‘mean time to recovery’.

For now at least, writing software is a creative process still within the domain of humans. While that remains the case, producing quality software at speed and scale means thinking about the people and teams who create it. Factors like autonomy, motivation and cognitive load need to be taken into account alongside competencies when creating an engineering team, and even more so when structuring an engineering organisation. A high-performing engineering organisation can be brought to a grinding halt by its own growth and success. As the complexity and scale of the organisation grow, the ability of its teams to function autonomously and at a sustainable cognitive load determines whether the organisation can keep producing quality software at speed.

Scale, unlike the other factors here, is not a desired outcome in itself; it sets the context in which we want to achieve the outcomes above. The methodologies that determine success for individuals working on side projects may be very different from the practices that enable large engineering organisations to succeed. Organisational constraints and context matter, and perhaps that’s why the only answer consultants can give clients that always applies is “It depends”.

Platform products, better known as PaaS (Platform as a Service) offerings such as Heroku, can be wonderful enablers at a certain scale, but large enough organisations need to vertically integrate their resources so that they can own cost, security, compliance and other governance concerns. Large enough organisations cannot afford to be held captive by product limitations and constraints while their competitors innovate.

Platform engineering practices usually require an initial investment, and the return on that investment tends to scale with the engineering organisation, up to the point where these practices become a must-have for producing quality software at speed and scale.

Hence large organisations need to adopt practices, patterns and processes that allow multiple teams, across multiple complex subsystems, across multiple business domains, to operate with manageable cognitive load and autonomy to produce high quality software at speed.
