Tech News

Everything you would like to know about Kubernetes

Source: GetInData, Google.

Kubernetes. What is it? Undoubtedly one of the hottest topics in the Big Data world over the last few months, and the subject of many discussions. That is why we've decided to sum up facts and thoughts about it and present a comprehensive overview of the platform. This post is dedicated to a non-technical audience interested in this technology.

Kubernetes — basic information

The name comes from Greek and means helmsman or pilot; it shares its root with the words governor and cybernetic. The platform's abbreviation is K8s: the 8 stands for the eight letters between the K and the s ("ubernete").
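The counting rule behind "K8s" is the same one used in abbreviations such as i18n for "internationalization". A tiny Python helper (the name `numeronym` is just for illustration) makes the arithmetic explicit:

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as: first letter + number of middle letters + last letter."""
    # Words of three letters or fewer have nothing worth abbreviating.
    if len(word) <= 3:
        return word
    middle = len(word) - 2  # letters between the first and the last one
    return f"{word[0]}{middle}{word[-1]}"


# "Kubernetes" has 10 letters: K + "ubernete" (8 letters) + s
print(numeronym("Kubernetes"))  # → K8s
```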

Source: GetInData, Google.

What exactly is Kubernetes? First, it's worth taking a look at its history. Kubernetes traces its roots to Borg, a system that Google engineers designed around the mid-2000s on top of container technology (containerization). Containers, which rely on features of the Linux kernel, follow the same idea as containers in the shipping business: an application is packaged together with its critical dependencies and isolated from other processes. It is worth mentioning that Google was one of the early contributors to Linux container technology, which became popular when the Docker project was launched in 2013. Borg predated Kubernetes, and the lessons learned from developing it, together with Google's more than ten years of experience with scaling and containerization, paid off in the new platform, which was introduced to the public and open-sourced in 2014.

After this somewhat lengthy (but necessary) intro, let's get to the point and explain what Kubernetes is. It is an open-source platform for container orchestration; in other words, it helps you run applications packaged in containers. Running an app on a few containers is not complicated, but once you start scaling, you need the support of a platform like Kubernetes. By making containerized applications dramatically easier to manage at scale, Kubernetes has become a key part of the container revolution. You can bundle together hosts running Linux containers, and the platform will help you manage the resulting cluster smoothly and efficiently, in the cloud as well. Kubernetes is an ideal platform for hosting cloud-native applications that require rapid scaling, such as real-time data streaming through Apache Kafka.
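As one concrete example of what this scaling support looks like, the Kubernetes documentation describes the rule used by its Horizontal Pod Autoscaler: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio stays within a tolerance of 1.0 (0.1 by default). The Python function below is a minimal sketch of that formula only, assuming a single averaged metric such as CPU utilization; the name `desired_replicas` is hypothetical, and the real autoscaler adds further safeguards such as stabilization windows.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Toy version of the Horizontal Pod Autoscaler scaling rule.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when current_metric / target_metric is within `tolerance` of 1.0.
    """
    ratio = current_metric / target_metric
    # Close enough to the target: keep the current number of replicas.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * current_metric / target_metric)


# 3 pods averaging 90% CPU against a 50% target: scale out to 6 pods.
print(desired_replicas(3, 90, 50))   # → 6
# 4 pods averaging 52% CPU against a 50% target: within tolerance, stay at 4.
print(desired_replicas(4, 52, 50))   # → 4
# 10 pods averaging 20% CPU against a 50% target: scale in to 4 pods.
print(desired_replicas(10, 20, 50))  # → 4
```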

Source: GetInData, Google.

Kubernetes — specs & features

Let's move on to Kubernetes specs. Kubernetes provides a container-based management environment: it orchestrates computing, networking, and storage infrastructure on behalf of user workloads. This adds up to a mix of PaaS (Platform as a Service) simplicity and IaaS (Infrastructure as a Service) flexibility; however, Kubernetes is not a traditional, all-inclusive PaaS system. The platform operates at the container level rather than at the hardware level and delivers generally applicable features familiar from PaaS offerings, such as deployment, scaling, and load balancing. Kubernetes is not monolithic: its default solutions are optional and pluggable, so they can be swapped out or customized. The platform leaves the door wide open for building developer platforms, but preserves user choice and flexibility.

Labels (key/value pairs that attach identifying metadata to Kubernetes objects) let users organize their resources however they please. Annotations (similar to labels, but meant for non-identifying metadata) let users decorate resources with custom information to facilitate workflows and give management tools an easy way to checkpoint state. What's more, the Kubernetes control plane is built on the same APIs that are available to developers and users. Thanks to that, users can write their own controllers, with their own APIs, that can be targeted by a general-purpose command-line tool.
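To make the difference between labels and annotations concrete, here is a minimal Python sketch using hypothetical in-memory objects rather than the real Kubernetes API. It mimics an equality-based label selector: an object matches when its labels contain every requested key/value pair, while annotations are carried along but never used for selection.

```python
from typing import Dict, List

# Hypothetical stand-ins for Kubernetes objects: each has a name,
# identifying labels and non-identifying annotations.
PODS: List[Dict] = [
    {"name": "web-1", "labels": {"app": "web", "env": "prod"},
     "annotations": {"owner-contact": "team-web@example.com"}},
    {"name": "web-2", "labels": {"app": "web", "env": "dev"},
     "annotations": {}},
    {"name": "kafka-1", "labels": {"app": "kafka", "env": "prod"},
     "annotations": {"last-backup": "2019-05-30"}},
]


def select(objects: List[Dict], selector: Dict[str, str]) -> List[str]:
    """Return names of objects whose labels contain every key=value pair in `selector`.

    Annotations are never consulted: they describe objects but do not identify them.
    """
    return [
        obj["name"]
        for obj in objects
        if all(obj["labels"].get(key) == value for key, value in selector.items())
    ]


print(select(PODS, {"env": "prod"}))               # → ['web-1', 'kafka-1']
print(select(PODS, {"app": "web", "env": "dev"}))  # → ['web-2']
```

On a real cluster, a similar query is written as `kubectl get pods -l env=prod`.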

Although Kubernetes gives its users a lot of freedom (for example, it does not limit the types of applications supported), it has some deliberate limitations that follow from the platform's design: it does not deploy source code or build applications, it does not dictate logging, monitoring or alerting solutions, and it does not provide application-level services such as middleware, data-processing frameworks (e.g. Spark) or databases (e.g. MySQL) as built-in services. Nor does Kubernetes provide comprehensive machine configuration, maintenance, or management systems.

Kubernetes vs. IT challenges

Cloud or on-premises: this dilemma is familiar to any fast-growing IT company. The migration process is complicated, as a company moving to the cloud needs to meet many requirements: infrastructure accommodation, security and risk management, and data privacy, to name a few. Kubernetes helps with the migration because it defines a standard API. What's more, the same tools (kubectl, helm) can manage infrastructure on distributions running both on-premises (OpenShift) and in the cloud (GKE). You can also start a cluster on your own PC (via minikube or minishift) to get some hands-on experience with the platform. Keep in mind, though, that since K8s is extensible, some distributions solve problems in their own way (e.g. K8s Ingress vs OpenShift Route).

How about storage? There are a few bottlenecks here. K8s pods are ephemeral and are not a natural fit for stateful applications (quick reminder: stateful apps are the ones that keep track of previously stored information and use it in current and future transactions). K8s partly addresses this by letting you attach volumes to pods to persist the application state, but only a limited set of storage types is supported, and many of them allow writes from a single node only. This makes the transition process challenging, because storage is not yet easy to scale.
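The single-writer limitation corresponds to volume access modes. Kubernetes persistent volumes declare modes such as ReadWriteOnce (read-write by a single node), ReadOnlyMany (read-only by many nodes) and ReadWriteMany (read-write by many nodes), and which modes are available depends on the storage backend. The Python sketch below is a toy model of that mounting rule, not Kubernetes code; `can_mount` and the node names are hypothetical.

```python
from enum import Enum
from typing import Set


class AccessMode(Enum):
    READ_WRITE_ONCE = "ReadWriteOnce"  # read-write by a single node
    READ_ONLY_MANY = "ReadOnlyMany"    # read-only by many nodes
    READ_WRITE_MANY = "ReadWriteMany"  # read-write by many nodes


def can_mount(mode: AccessMode, nodes_using_volume: Set[str],
              new_node: str, read_only: bool) -> bool:
    """Toy check: may a pod on `new_node` mount a volume with this access mode?"""
    if mode is AccessMode.READ_WRITE_MANY:
        return True
    if mode is AccessMode.READ_ONLY_MANY:
        return read_only
    # ReadWriteOnce: allowed only if no *other* node already uses the volume
    # (pods on the same node may share it).
    return nodes_using_volume <= {new_node}


print(can_mount(AccessMode.READ_WRITE_ONCE, {"node-a"}, "node-b", read_only=False))  # → False
print(can_mount(AccessMode.READ_WRITE_MANY, {"node-a"}, "node-b", read_only=False))  # → True
```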

From a Big Data perspective, one of the most appealing K8s features is isolation. The namespace concept, which works well with CI/CD (Continuous Integration and Continuous Deployment) practices, offers separate environments inside a single cluster, with access policies defined at the namespace level. This gives you the freedom to create different environments (development, testing, production) and use the same scripts to run queries against each of them. Creating these environments is easy, and they stay logically separated from one another. From a business standpoint this solution is advantageous: costs stay under control because all environments are maintained on one cluster. Equally important, using the same scripts makes testing smoother and more accurate. Isolation is also a great feature for a data scientist who wants to run an independent project with high computing needs.
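The idea of reusing one script across namespace-separated environments can be sketched in Python with a hypothetical in-memory cluster, where every resource is addressed by (namespace, name) and each namespace has its own list of allowed users. In a real cluster, names are likewise scoped to a namespace, and per-namespace permissions are typically granted with RBAC Roles and RoleBindings; everything in this snippet (`ToyCluster`, `deploy_spark_job`, the user names) is illustrative only.

```python
from typing import Dict, Set, Tuple


class ToyCluster:
    """Hypothetical in-memory cluster: resources are keyed by (namespace, name)."""

    def __init__(self, access: Dict[str, Set[str]]):
        # Maps each namespace to the users allowed to deploy into it.
        self.access = access
        self.resources: Dict[Tuple[str, str], dict] = {}

    def apply(self, user: str, namespace: str, name: str, spec: dict) -> None:
        # Access is checked per namespace, not per cluster.
        if user not in self.access.get(namespace, set()):
            raise PermissionError(f"{user} may not deploy to {namespace}")
        self.resources[(namespace, name)] = spec


def deploy_spark_job(cluster: ToyCluster, user: str, namespace: str) -> None:
    """The same 'script', reused unchanged for every environment."""
    cluster.apply(user, namespace, "spark-job", {"executors": 2})


cluster = ToyCluster({"dev": {"alice", "bob"}, "test": {"alice", "bob"}, "prod": {"alice"}})
for env in ("dev", "test", "prod"):
    deploy_spark_job(cluster, "alice", env)

# The same resource name coexists in three isolated environments.
print(sorted(cluster.resources))
# → [('dev', 'spark-job'), ('prod', 'spark-job'), ('test', 'spark-job')]
```

Running `deploy_spark_job(cluster, "bob", "prod")` would raise `PermissionError`, since bob is only allowed in dev and test.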

What else? We also find it helpful that Spark can already run natively on Kubernetes. This removes the need for YARN (the Hadoop resource manager commonly used to run Spark); however, at the time of writing, Spark on Kubernetes does not yet offer all the features available on YARN, such as dynamic allocation.
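For reference, Spark's documentation describes submitting jobs to Kubernetes by pointing spark-submit at the cluster's API server with a `k8s://` master URL. The Python sketch below only assembles such a command line (it does not run it); the API server address, the image name and the helper name `spark_submit_args` are hypothetical, and the jar path is a placeholder. Note the fixed `spark.executor.instances`: without dynamic allocation, the number of executors is set up front.

```python
import shlex
from typing import List


def spark_submit_args(api_server: str, image: str, app_jar: str,
                      executors: int = 2) -> List[str]:
    """Build a spark-submit command line that targets a Kubernetes cluster."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",  # Kubernetes API server instead of YARN
        "--deploy-mode", "cluster",          # the driver also runs in a pod
        "--name", "spark-pi",
        "--class", "org.apache.spark.examples.SparkPi",
        # Static executor count, since dynamic allocation is not available here.
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,
    ]


args = spark_submit_args(
    api_server="https://k8s-apiserver.example.com:6443",  # hypothetical address
    image="example/spark:latest",                          # hypothetical image
    app_jar="local:///path/to/examples.jar",               # placeholder path
)
print(shlex.join(args))
```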

All in all, Kubernetes is a big box of tools that deliver neat, customizable solutions, although they are not yet refined enough to fully handle some critical areas such as data storage or data migration. The system consists of a set of composable control processes that continuously drive the current state towards the desired state specified by the user, and these processes are actively developed by a huge K8s community. All this adds up to an already powerful system, backed by big corporations and with a great deal of potential. For now the platform is not perfect and has a lot to improve in data storage and migration, but we believe this is only temporary: K8s is open source, and its community keeps working on new functionality to deliver a more stable and powerful system.
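Those control processes follow a simple pattern: observe the current state, compare it with the desired state, and act on the difference. The toy loop below, in Python, illustrates that idea for a replica count only; the function `reconcile` and the pod names are hypothetical, and real controllers work through the Kubernetes API rather than on a local list.

```python
import itertools
from typing import Iterator, List


def reconcile(desired: int, running: List[str], new_names: Iterator[str]) -> List[str]:
    """One pass of a toy control loop: drive the observed state towards the desired one."""
    pods = list(running)
    while len(pods) < desired:  # too few pods: start replacements
        pods.append(next(new_names))
    del pods[desired:]          # too many pods: stop the surplus
    return pods


names = (f"pod-{i}" for i in itertools.count())
state = reconcile(3, [], names)     # the cluster starts empty
print(state)                        # → ['pod-0', 'pod-1', 'pod-2']

state.remove("pod-1")               # simulate a crashed pod
state = reconcile(3, state, names)  # the next pass replaces it
print(state)                        # → ['pod-0', 'pod-2', 'pod-3']

state = reconcile(1, state, names)  # the user lowers the desired state
print(state)                        # → ['pod-0']
```

Because each pass only acts on the difference, running it again on an already converged state changes nothing, which is what allows such loops to run continuously.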

31 May 2019
