K8S Notes

Bookmark this to keep an eye on my K8s notes updates!

Project maintained by kevinsulatra Hosted on GitHub Pages — Theme by mattgraham

Cloud Native Roles & Site Reliability Engineering

Cloud Architect:
- Responsible for the adoption of cloud technologies and for designing the application landscape and infrastructure, with a focus on security, scalability, and deployment mechanisms.
DevOps Engineer:
- Often described as a simple combination of developer and administrator, but that doesn’t do the role justice. DevOps engineers use tools and processes that balance software development and operations. Their work covers approaches to writing, building, and testing software across the entire deployment lifecycle.
DevSecOps Engineer:
- In an effort to make security an integral part of modern IT environments, the DevSecOps Engineer combines the roles of the previous two. This role is often used to build bridges between more traditional development and security teams.
Data Engineer:
- Data engineers face the challenge of collecting, storing, and analyzing the vast amounts of data that large systems collect or could collect. This can include provisioning and managing specialized infrastructure, as well as working with the data itself.
Full-Stack Developer:
- An all-rounder who is at home in frontend and backend development, as well as infrastructure essentials.
Site Reliability Engineer (SRE):
- A role with a stronger definition is the Site Reliability Engineer (SRE). SRE originated at Google around 2003 and has since become an important role in many organizations. The overarching goal of SRE is to create and maintain software that is reliable and scalable. To achieve this, SREs apply software engineering approaches to operational problems and automate operational tasks.
SRE Metrics:
- Service Level Objectives (SLO): “Specify a target level for the reliability of your service.” - A goal that is set, for example a service latency of less than 100 ms.
- Service Level Indicators (SLI): “A carefully defined quantitative measure of some aspect of the level of service that is provided” - The value that is actually measured, for example how long it actually takes to answer a request.
- Service Level Agreements (SLA): “An explicit or implicit contract with your users that includes consequences of meeting (or missing) the SLOs they contain. The consequences are most easily recognized when they are financial – a rebate or a penalty – but they can take other forms.” - Answers the question of what happens if the SLOs are not met.
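To make the SLI/SLO relationship concrete, here is a minimal sketch in Python. It assumes a hypothetical latency SLO of "99% of requests answered in under 100 ms"; the 99% target and the function names `latency_sli` and `meets_slo` are illustrative assumptions, not taken from any specific tool. The SLI is the measured fraction of fast requests, and the SLO check simply compares that measurement against the target.

```python
# Hypothetical SLO (assumption for illustration): 99% of requests under 100 ms.
SLO_THRESHOLD_MS = 100.0
SLO_TARGET = 0.99


def latency_sli(latencies_ms: list[float], threshold_ms: float = SLO_THRESHOLD_MS) -> float:
    """SLI: the fraction of measured requests answered faster than threshold_ms."""
    if not latencies_ms:
        raise ValueError("need at least one measured request")
    fast = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return fast / len(latencies_ms)


def meets_slo(latencies_ms: list[float], target: float = SLO_TARGET) -> bool:
    """SLO check: does the measured SLI reach the target level?"""
    return latency_sli(latencies_ms) >= target


if __name__ == "__main__":
    # 99 fast requests and one slow one.
    samples = [20.0] * 99 + [250.0]
    print(latency_sli(samples))  # 0.99
    print(meets_slo(samples))    # True
```

An SLA would sit on top of this check, defining what happens (for example, a rebate) when `meets_slo` turns out false over the agreed period.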
Error Budget:
- Around these metrics, SREs might define an error budget. An error budget defines how many errors (or how much time in an error state) your application can accumulate before actions are taken, such as stopping deployments to production.
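The error budget is the complement of the SLO over a time window. This is a minimal sketch assuming a hypothetical availability SLO of 99.9% over 30 days; the function names and the "stop deploying once the budget is spent" rule are illustrative assumptions that mirror the example policy above, not a standard API. With those numbers the budget works out to 0.1% of 30 days, i.e. 43.2 minutes.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed 'bad' time in the window for an availability SLO such as 0.999."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    # timedelta * float is rounded to the nearest microsecond.
    return window * (1.0 - slo)


def deployments_allowed(slo: float, window: timedelta, downtime: timedelta) -> bool:
    """Hypothetical policy: keep deploying to production only while budget remains."""
    return downtime < error_budget(slo, window)


if __name__ == "__main__":
    month = timedelta(days=30)
    print(error_budget(0.999, month))                                # 0:43:12
    print(deployments_allowed(0.999, month, timedelta(minutes=10)))  # True
    print(deployments_allowed(0.999, month, timedelta(hours=1)))     # False
```

Expressing the budget as time (or, equivalently, as a count of failed requests) makes the trade-off explicit: every incident spends part of the budget, and once it is gone, reliability work takes priority over new releases.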


Cloud Native Architecture