How we helped our client to transfer legacy pipeline to modern one using GitLab's CI/CD - Part 2

Welcome to the second part of a blog series based on a project delivered for one of our clients. If you missed the first part, please read it first.

PART I

Problem description
General description of the solution
Problem 1: Limited job output size in GitLab
Problem 2: Limited duration of jobs running on shared runners

PART II

Problem 3: Building a container image in the job
Problem 4: The GitLab Registry token expires too quickly
Problem 5: In the paid GitLab.com plan we have a limit on the shared runners used time
Problem 6: User's names with national characters in GitLab

PART III

Problem 7: Passing on artifacts between CI/CD jobs
Problem 8: Starting docker build manually
Problem 9: We cannot rely on the error code returned by Puppet
Summary

Problem 3: Building a container image in the job

In several of the CI/CD jobs in our project, we create container images and upload them to GitLab's Registry. Building container images in GitLab's CI/CD requires infrastructure preparation, or we have to use a tool other than Docker. If we want to build images using Docker, then we must give the Docker client access to the Docker daemon socket, which is not recommended for security reasons. If we want to use shared GitLab.com runners, access to the daemon socket is not possible.

We can solve the problem in several ways:

Using an executor in a runner that does not work in a container and which has access to a Docker socket (e.g. shell executor). In this case, we can't use GKE or shared GitLab.com runners. We need to install, configure and maintain such a runner ourselves.
Using Kubernetes or Docker executor, but with Docker socket mounted inside containers. The solution requires an independent runner, prevents the use of shared runners and is bad from a security perspective.
Using the buildah tool to build the container image inside the container and the skopeo program to upload the image to the GitLab Registry. Both programs are part of the https://github.com/containers project and require neither a privileged daemon nor root access (unlike Docker).
Using kaniko. Kaniko is a tool dedicated to building container images in Kubernetes and in containers. Like buildah, it doesn't require special permissions. An example of its use is described in the official GitLab documentation.
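For comparison, the buildah option could look roughly like the job below. This is only a sketch and was not used in this project. The image name quay.io/buildah/stable, the job name, and the vfs storage driver with chroot isolation are assumptions (settings that commonly let buildah run in an unprivileged container), and buildah push is used here instead of skopeo to keep the example short.

```yaml
# Sketch only: a buildah-based alternative, not the solution chosen in this project.
build-with-buildah:
  stage: prepare
  image: quay.io/buildah/stable      # assumed image; any image that ships buildah would do
  variables:
    STORAGE_DRIVER: vfs              # avoids the need for overlay mounts inside the container
    BUILDAH_ISOLATION: chroot        # isolation mode that works without extra privileges
    BUILDAH_FORMAT: docker           # produce a Docker-format image
  script:
    # Log in with the per-job credentials that GitLab provides
    - buildah login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    # Build from the Dockerfile in the project directory
    - buildah bud -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_TAG" "$CI_PROJECT_DIR"
    # Upload the result to the GitLab Registry
    - buildah push "$CI_REGISTRY_IMAGE:$CI_COMMIT_TAG"
```

Whether this runs on a given runner depends on the runner's container settings, so treat it as a starting point rather than a verified configuration.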

Our team has already had experience with Kaniko in other projects, so we made use of it in this project.

The definition of a job using kaniko in the .gitlab-ci.yml looks like this:

build-base-image:
  stage: prepare
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG

As the code fragment above suggests, Kaniko builds a container image and immediately uploads it to the indicated image registry.
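One detail worth noting: the job tags the image with $CI_COMMIT_TAG, a variable GitLab sets only in pipelines created for a Git tag. The post does not show how the job is restricted to such pipelines, so the fragment below is an assumption about one possible way to do it with a rules clause.

```yaml
build-base-image:
  rules:
    # Run this job only in pipelines created for a Git tag,
    # so that $CI_COMMIT_TAG is never empty in --destination
    - if: $CI_COMMIT_TAG
  # stage, image and script stay exactly as in the job definition above
```

Without such a restriction, a pipeline for an ordinary branch commit would pass an empty tag to Kaniko.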

Problem 4: The GitLab Registry token expires too quickly

GitLab has an integrated container image registry (Registry). A one-time password is created for each job, which allows you to use Registry without having to manually create a dedicated account. GitLab passes the login and temporary password to the Runner, which then sets them in the environment variables of the job process.

This is the end result of a successful job output that creates a container image using Kaniko and uploads it to the Registry built into GitLab:

In our project, some jobs take a very long time, longer than the lifetime of a one-time Registry password. Limiting the duration of such a one-time password is necessary from the security point of view.

Uploading the image to the Registry fails and ends with the following error:

We solved this problem in a traditional way:

We created a dedicated technical account in GitLab.
We gave the new account project privileges.
We logged in to GitLab on this account and created the personal access token. This token doesn't have an expiration date.
In the project's CI/CD settings, we created two new variables containing the login and the password (token) of the technical account, respectively.
We modified the job so that Kaniko used a dedicated technical account to perform operations in the Registry.
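The modified job from the last step could look roughly like this. It is a sketch under stated assumptions: REGISTRY_TECH_USER and REGISTRY_TECH_TOKEN are hypothetical names for the two CI/CD variables (the post does not name them), and the token is assumed to have been created with a scope that permits pushing to the Registry.

```yaml
build-base-image:
  stage: prepare
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    # REGISTRY_TECH_USER and REGISTRY_TECH_TOKEN are hypothetical variable names,
    # defined in the project's CI/CD settings. They replace the per-job
    # CI_REGISTRY_USER and CI_REGISTRY_PASSWORD, which expire during long jobs.
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$REGISTRY_TECH_USER\",\"password\":\"$REGISTRY_TECH_TOKEN\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
```

Marking the token variable as masked in the CI/CD settings keeps it out of the job log.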

After these modifications, we no longer had problems uploading container images to the Registry, even if the CI/CD job took 3 hours.

Problem 5: In the paid GitLab.com plan we have a limit on the shared runners used time

If you use a paid GitLab.com (GitLab-as-a-service) plan, you get a certain number of CI pipeline minutes on shared Runners. Each paid plan has a different limit. Using shared Runners is very convenient because we don't have to maintain our own Runners, which saves time and money. The limit may be sufficient in some projects, but in ours we quickly reached the monthly limit.

You will see this type of message when you use up the available shared Runner time in a given month:

We solved this problem by setting up a dedicated Kubernetes cluster for Runners. We let GitLab decide which Runner to use (shared or our own). As a result, the load is distributed across both types of Runners, which reduces expenses and shortens pipeline execution time.

Instructions for using a Kubernetes cluster as the platform for Runners are described in the Problem 2 section in Part I.
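To illustrate how the choice of Runner works, the fragment below contrasts an untagged job with a tagged one. It is a sketch, not our actual configuration: the job names, the scripts and the own-k8s tag are hypothetical, and it assumes that our own Runner is configured to accept untagged jobs.

```yaml
# Untagged job: any available Runner may pick it up, shared or our own
# (assumption: our own Runner is set to run untagged jobs).
unit-tests:
  stage: test
  script:
    - ./run-tests.sh        # hypothetical script

# Tagged job: only Runners registered with the "own-k8s" tag take it.
# The tag name is hypothetical. This keeps a long job off the shared
# Runners and so saves the monthly minutes.
long-build:
  stage: build
  tags:
    - own-k8s
  script:
    - ./long-build.sh       # hypothetical script
```

Leaving jobs untagged matches the approach described above, and tags are only needed when a particular job should never run on shared Runners.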

Problem 6: User's names with national characters in GitLab

If the runner or container used in our CI/CD has a locale that does not support UTF-8, and the user who commits to the repository has non-ASCII characters in their first or last name, then the CI/CD job may end with the following error:

FAILURE: Build failed with an exception.

* What went wrong:
Could not set the value of environment variable 'GITLAB_USER_NAME': could not convert string to current locale

This problem is tracked as an issue in the GitLab project: https://gitlab.com/gitlab-org/gitlab-foss/issues/38698

A workaround is also provided there. You can change the value of the GITLAB_USER_NAME variable so that it doesn't contain non-ASCII characters. It can be assigned, for example, to the value of the variable containing the user's login (assuming that the login consists only of ASCII characters).

Add the following to the before_script section in .gitlab-ci.yml:

  # Workaround for the "Could not set the value of environment variable 'GITLAB_USER_NAME': could not convert string to current locale" problem.
  # https://gitlab.com/gitlab-org/gitlab-foss/issues/38698
  - export GITLAB_USER_NAME="$GITLAB_USER_LOGIN"
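The workaround above replaces the offending value. Since the root cause described here is a locale without UTF-8 support, another option may be to switch the job to a UTF-8 locale. This is an assumption that this post does not confirm, and it can only work if the job image actually provides the C.UTF-8 locale.

```yaml
variables:
  # Assumption: the job image ships the C.UTF-8 locale.
  # If it does not, these settings have no useful effect.
  LANG: C.UTF-8
  LC_ALL: C.UTF-8
```

If the error persists with these variables set, the GITLAB_USER_NAME workaround remains the safer choice.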

Next part of the post

Follow our profile on LinkedIn to stay up to date for the next part!

Last updated: 11 August 2020

Written by

Maciej Korzeń

DevOps
