Smart Scaling: Four Crucial Points to Remember When Using Cloud-Native Technology to Expand Your Infrastructure



In this post, I'd like to share some high-level lessons to help engineering managers and backend teams scale their operations effectively while avoiding some of the most common mistakes and hasty decisions.

Jordan Pittier, lead backend engineer at Streamroot, previously wrote a piece, followed by a talk at HighLoad++ Moscow in November, about our journey from a VM-based to a container-based architecture and the challenges we faced migrating our infrastructure to Kubernetes running on Google Cloud.

Background

Let me begin by briefly introducing Streamroot and why we took the time to rework our Kubernetes Engine architecture to make it more robust and scalable.

Streamroot is a technology provider serving major content owners: media groups, television networks, and video platforms. Our peer-to-peer video delivery solution integrates with broadcasters' existing CDN infrastructure and offers better quality at a lower cost.

One of our (and our customers') biggest challenges last year was scaling to the record-breaking viewership of the FIFA World Cup. With a peak of 22 Tbps, more than twice the previous traffic record set by the Super Bowl, the 2018 World Cup proved to be the largest live event of all time (1).

Streamroot delivered World Cup coverage for TF1, the largest private broadcaster in France, as well as for national television networks in South America. To serve our customers at this scale, we had to rebuild our Kubernetes infrastructure and become able to scale much faster. We needed to:

Handle heavy traffic, processing a million requests per minute on our backend.
Scale to enormous spikes within just a few minutes at the start of each World Cup match.
Make the backend fully resilient: 100% failure-proof and robust enough to withstand any outage.
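For context, the kind of fast, automatic scaling described above is typically handled in Kubernetes by a Horizontal Pod Autoscaler. A minimal sketch follows; the deployment name and the thresholds are hypothetical, not Streamroot's actual configuration:

```shell
# Autoscale a hypothetical "backend-api" deployment between 10 and 200
# replicas, adding pods whenever average CPU utilization exceeds 60%.
kubectl autoscale deployment backend-api \
  --min=10 --max=200 --cpu-percent=60

# Watch the autoscaler react during a traffic spike.
kubectl get hpa backend-api --watch
```

Note that the HPA only adds pods; to absorb spikes of this size you also need the cluster autoscaler (or pre-warmed node pools) so that new pods have nodes to land on.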
If you're interested in learning more about our scaling journey over the past few months, you can watch Jordan Pittier and Nikolay Rodionov's talk at the HighLoad++ conference in Moscow, and find our presentation slides here.

1. Not everything new is good: be careful with cloud-native technologies.

Since Kubernetes joined the Cloud Native Computing Foundation (CNCF), it has experienced exponential growth, and interest in this complex, open-source cloud-native solution keeps rising. More than 8,000 attendees from all around the world gathered in Seattle for the CNCF's KubeCon + CloudNativeCon in December.

Kubernetes is only one piece of the cloud-native landscape. There are many other components, some of which are part of the CNCF landscape (https://landscape.cncf.io/) and some of which are not, such as Istio.

Cloud-native technology is still young, and new components (for storage, security, service discovery, package management, and other areas) are emerging every month.

Our recommendation is to adopt these new components carefully and to keep things simple (i.e. dumb). These technologies are cutting-edge but still immature and evolving very quickly. There is little point in chasing all the newest, shiniest technology, especially in production, unless it answers a genuine need. Even with a large team of skilled engineers, you still need to weigh the cost (in time and resources) of operating, maintaining, and debugging these novel technologies, which can still be unstable.

2. Manage your spending.

When starting a large project like migrating your service from a VM-based to a container-based architecture backed by Kubernetes, your main concern is probably a successful migration, not money. Even if cost is not an immediate or medium-term worry, you should think about it from the start. I strongly advise tracking your Kubernetes Engine scaling expenses as early as you can, for the following reasons:

1. To get an accurate picture of your software's efficiency and resource usage. Delivery is a backend team's top priority, and it can be hard for managers to convey the value of efficient software and resource management.

2. To identify where your architecture needs improvement. Cross-referencing monitoring data with cost progression helped us find improvements in our design. Simply by understanding how our resources were used and consumed, and by fitting our instances to our actual utilization, we cut our costs by 22%.
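In Kubernetes, this kind of right-sizing comes down to matching resource requests and limits to observed usage. A hedged sketch with hypothetical values (`backend-api` is a placeholder deployment name, not one of ours):

```shell
# Inspect what the pods actually consume
# (requires metrics-server to be installed in the cluster).
kubectl top pods -l app=backend-api

# Then fit requests/limits to real usage instead of guesses;
# over-generous requests reserve capacity you pay for but never use.
kubectl set resources deployment backend-api \
  --requests=cpu=250m,memory=256Mi \
  --limits=cpu=500m,memory=512Mi
```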

3. GCP offers committed use discounts: substantial reductions for long-term committed instances. For instance, if you commit to a resource for a full year, you get a flat 30% discount (for once, it's actually pleasant to see the bill at the end of the month!). For a three-year commitment, the discount can reach up to 57% (!). Of course, I advise waiting at least six months before committing, so that you can first determine your typical CPU and RAM usage.

Don't worry! You don't need to be an expert in billing or corporate finance to track your spending effectively. For instance, you can set up cost alerting per project by default, then use the CSV export to feed the data into your favorite spreadsheet and track monthly consumption. Alternatively, on GCP you can enable the BigQuery billing export to dump the full details of your resource usage every day. Then spend a few minutes building a simple dashboard in Excel or from a SQL query (and don't forget to ask your engineers to set resource labels properly, so you can tell the different line items apart).
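As an illustration, once the BigQuery billing export is enabled, a short query gives you cost per service per day. This is a sketch: the table name below is a placeholder (the real export table looks like `gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX` in the dataset you chose):

```shell
# Daily cost per GCP service over the last 30 days,
# straight from the BigQuery billing export.
bq query --use_legacy_sql=false '
SELECT
  DATE(usage_start_time)  AS usage_day,
  service.description     AS service,
  ROUND(SUM(cost), 2)     AS daily_cost
FROM `my-project.billing.gcp_billing_export_v1_XXXXXX`
WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY usage_day, service
ORDER BY usage_day, daily_cost DESC'
```

Grouping by your own resource labels instead of `service.description` is what makes the per-team or per-environment breakdown possible, which is why labeling discipline matters.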

3. Keep your production safe by isolating it.
Many blogs and articles advise running a single K8s cluster and using different namespaces for your different environments (such as development, staging, and production). Namespaces are a very powerful tool for organizing your Kubernetes resources and boosting your teams' productivity. But simply put, everyone makes stupid mistakes, and the stupider the mistake, the more likely it is to happen. So if you choose the namespaces option, you MUST invest a few weeks in building a first-class CI/CD workflow, unless you want to spend your most stressful days putting out fires in production because you pushed a staging build there.
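If you do go the namespaces route, one cheap safeguard is pinning each developer's kubectl context to the staging namespace, so a plain `kubectl apply` can never land in production by accident. A sketch (cluster and user names are placeholders):

```shell
# One cluster, separate namespaces per environment.
kubectl create namespace staging
kubectl create namespace production

# Pin the default context to staging; deploying to production
# then requires an explicit, deliberate context switch.
kubectl config set-context staging \
  --cluster=<your-cluster> --user=<your-user> --namespace=staging
kubectl config use-context staging
```

This is a guardrail, not real isolation: RBAC rules restricting who can write to the production namespace are the complement to it.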

On our end, we chose a different way to keep our environments separate: we built fully independent clusters for our staging and production environments. Because the two clusters are completely isolated, any potential for human error or for the propagation of security flaws is eliminated. The drawback of this approach is higher fixed costs, since more machines are needed to keep both clusters running. But for us, the security and peace of mind it provides are well worth it.

Additionally, you can reduce the cost overhead by using GCP's preemptible instances, which are about 80% cheaper than standard instances. There is a catch, of course: Google Cloud may terminate those instances at any time if it needs them for another customer. But since we only use them for our staging environment, losing a machine doesn't really affect us. In fact, we've learned to make the most of it: a completely unpredictable red team trying to bring the system down, provided free of charge by Google Cloud, is the perfect test of how our architecture reacts to a random failure of one of its components.
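On GKE, putting staging on preemptible capacity is a single flag on the node pool. A sketch with hypothetical cluster and pool names:

```shell
# Create a preemptible node pool for a staging cluster; these nodes
# cost roughly 80% less but can be reclaimed by Google Cloud at any
# time, so use them only where interruption is acceptable.
gcloud container node-pools create staging-pool \
  --cluster=staging-cluster \
  --preemptible \
  --num-nodes=3
```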

4. Streamline and automate your workflow from the start.
When starting a new project, the last thing on your mind is how you'll share your code with other developers, or how you'll push builds between staging and production when you need an urgent rollback. Not over-optimizing before you have something to show the public is both natural and very wise. On the other hand, it's a common mistake to put these questions off indefinitely, because you have to ship the next feature that will finally close the gap and miraculously win millions of users. My advice is to build a simple and efficient process as early as feasible.
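Even the most minimal process should make the urgent-rollback path a one-liner. With Kubernetes deployments this comes almost for free; a sketch (the deployment and image names are placeholders):

```shell
# Ship a new image, keeping the rollout observable.
kubectl set image deployment/backend-api \
  api=gcr.io/example-project/backend-api:1.1
kubectl rollout status deployment/backend-api

# Urgent rollback to the previous revision, in one command.
kubectl rollout undo deployment/backend-api
```

Wiring these commands into your CI/CD pipeline, rather than typing them by hand under pressure, is precisely the "simple and efficient process" worth building early.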
