Tutorial

Data Design Pattern: Medallion Architecture - is it really a new way of doing things?

In this blog post, you will learn what medallion architecture is, the characteristics of each layer of this pattern and how it differs from the classic data warehouse layers. 

What is medallion architecture? 

Medallion Architecture (also known as multi-hop architecture) is a data design pattern that has gained popularity over the last couple of years. Its main goal is to logically organize data in a lakehouse so that data quality improves gradually as the data moves from one layer to the next, while taking advantage of the benefits a lakehouse provides.

Raw layer/Landing zone (optional)

This layer is optional because it's not officially part of the medallion architecture, yet it is often necessary and frequently overlooked when sketching the architecture. It's simply a storage account, bucket, container or folder that temporarily stores raw files, in various formats, extracted from the source systems.

It is optional because some data extraction tools can push data directly into the bronze layer (e.g. streaming data).

Bronze layer

The bronze layer, also known as the raw layer, is where we store data in its original state (both batch and streaming). This means the data should be immutable (although if the data is fully reloaded every day, deletions might occur).

Its main purpose is to keep the whole history of the data in one place, making it possible to reprocess the data when needed without re-reading it from the source system.

No data transformations are allowed here; only adding metadata columns (such as the input file name, ingestion date, etc.) is permitted.
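
To make the bronze rule concrete, here is a minimal sketch in plain Python, with an in-memory list standing in for a lakehouse table (no Spark or Delta Lake involved). It assumes a CSV file picked up from the landing zone; the function `ingest_to_bronze`, the metadata column names `_input_file_name` and `_ingestion_date`, and the file path are hypothetical. Source values are kept exactly as delivered; only metadata columns are added.

```python
import csv
import io
from datetime import date


def ingest_to_bronze(raw_text, input_file_name, ingestion_date):
    """Parse a raw CSV payload and append metadata columns only (no transformations)."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        row = dict(record)  # source values stay exactly as delivered (strings, no casting)
        row["_input_file_name"] = input_file_name          # hypothetical metadata column
        row["_ingestion_date"] = ingestion_date.isoformat()  # hypothetical metadata column
        rows.append(row)
    return rows


# Append-only "table": existing rows are never updated or deleted.
bronze_table = []

raw = "customer_id,name\n1,Alice\n1,Alice\n2, Bob \n"
bronze_table.extend(
    ingest_to_bronze(raw, "landing/customers_2024-01-15.csv", date(2024, 1, 15))
)
```

Note that the duplicate row and the padded name survive untouched; cleaning them is deliberately left to the silver layer.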

This layer serves mostly as a technical layer. However, I've encountered situations where business users can benefit from the history of the data, especially when they can compare the state of the source tables at different points in time.
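
As an illustration of such a point-in-time comparison, the sketch below assumes a source that is fully reloaded every day, so bronze holds one complete snapshot per ingestion date. The table contents and the helpers `state_as_of` and `diff_states` are hypothetical; table formats with time travel (such as Delta Lake) can offer a similar comparison through versioned reads.

```python
from datetime import date

# Hypothetical bronze history: one full snapshot of a source table per ingestion date.
bronze_orders = [
    {"order_id": 1, "status": "NEW", "_ingestion_date": date(2024, 1, 1)},
    {"order_id": 2, "status": "NEW", "_ingestion_date": date(2024, 1, 1)},
    {"order_id": 1, "status": "SHIPPED", "_ingestion_date": date(2024, 1, 2)},
    {"order_id": 2, "status": "CANCELLED", "_ingestion_date": date(2024, 1, 2)},
    {"order_id": 3, "status": "NEW", "_ingestion_date": date(2024, 1, 2)},
]


def state_as_of(history, as_of):
    """Return the full snapshot loaded most recently on or before `as_of`."""
    loads = [r["_ingestion_date"] for r in history if r["_ingestion_date"] <= as_of]
    if not loads:
        return []
    latest = max(loads)
    return [r for r in history if r["_ingestion_date"] == latest]


def diff_states(old, new, key="order_id", field="status"):
    """Map each key to (before, after) where the field changed, appeared or disappeared."""
    old_by_key = {r[key]: r[field] for r in old}
    new_by_key = {r[key]: r[field] for r in new}
    result = {}
    for k in old_by_key.keys() | new_by_key.keys():
        before, after = old_by_key.get(k), new_by_key.get(k)
        if before != after:
            result[k] = (before, after)
    return result


jan1 = state_as_of(bronze_orders, date(2024, 1, 1))
jan2 = state_as_of(bronze_orders, date(2024, 1, 2))
changes = diff_states(jan1, jan2)
print(changes)
```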

Silver layer

Before data is loaded into the silver layer, it should be deduplicated, fixed and converted to the correct format. This is also where data quality rules can be applied.

In simple terms, the silver layer contains cleansed and conformed data that is ready to be consumed by operational, analytical and machine learning workloads.
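
A minimal sketch of these silver-layer steps (type conversion, fixing values, a data quality rule and deduplication) in plain Python. The input rows, the function `to_silver` and the rule that names must not be empty are hypothetical examples. Rows that fail the rules are set aside in a quarantine list rather than silently dropped, which is one common design choice, not the only one.

```python
from datetime import date

# Hypothetical bronze rows: strings as delivered, including a duplicate and a bad record.
bronze_customers = [
    {"customer_id": "1", "name": "Alice", "signup_date": "2024-01-03"},
    {"customer_id": "1", "name": "Alice", "signup_date": "2024-01-03"},
    {"customer_id": "2", "name": " Bob ", "signup_date": "2024-01-05"},
    {"customer_id": "", "name": "Ghost", "signup_date": "not-a-date"},
]


def to_silver(rows):
    """Cast, clean, validate and deduplicate bronze rows; return (valid, rejected)."""
    valid, rejected, seen = [], [], set()
    for row in rows:
        try:
            record = {
                "customer_id": int(row["customer_id"]),               # cast to the correct type
                "name": row["name"].strip(),                           # fix stray whitespace
                "signup_date": date.fromisoformat(row["signup_date"]),  # parse to a real date
            }
        except ValueError:
            rejected.append(row)  # wrong format: quarantine the raw row
            continue
        if not record["name"]:  # example data quality rule: name must not be empty
            rejected.append(row)
            continue
        if record["customer_id"] in seen:  # deduplicate on the business key (keeps first seen)
            continue
        seen.add(record["customer_id"])
        valid.append(record)
    return valid, rejected


silver_customers, quarantine = to_silver(bronze_customers)
```

Keeping the first occurrence is the simplest deduplication rule; a real pipeline might instead keep the most recently ingested version of each key.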

A few points to keep in mind when thinking about the silver layer:

  • Its purpose may vary depending on your needs. For some workloads it might serve as a current snapshot of a source system, while for others it might be a historical snapshot of the source system with slowly changing dimensions (SCD) implemented.
  • Enriching data with reference/master data is permitted; however, it adds some complexity due to the dependency on another source system.
  • Tables from the same system but from different instances can be unioned (e.g. multiple ERP instances within the same organization that serve different organizational units, as long as they share the same schema). However, keep in mind that this adds another layer of complexity by reducing the isolation between systems and adding another dependency.
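
As a rough sketch of what a historical silver table with slowly changing dimensions could look like, here is a minimal SCD Type 2 merge in plain Python, assuming each load delivers a full snapshot of the source table. The function `scd2_merge` and the columns `valid_from`, `valid_to` and `is_current` are hypothetical naming; in a lakehouse the same logic would usually be expressed as a MERGE into a table.

```python
from datetime import date


def scd2_merge(history, snapshot, key, load_date):
    """Apply a full source snapshot to an SCD Type 2 history table (modified in place).

    Each version carries valid_from / valid_to (valid_to is exclusive, None means
    the version is still open) and an is_current flag.
    """
    current = {row[key]: row for row in history if row["is_current"]}
    incoming = {row[key]: row for row in snapshot}

    for k, new_row in incoming.items():
        old = current.get(k)
        # Unchanged record: keep the current version open.
        if old is not None and all(old.get(col) == val for col, val in new_row.items()):
            continue
        # Changed record: close the previous version before opening a new one.
        if old is not None:
            old["valid_to"] = load_date
            old["is_current"] = False
        history.append({**new_row, "valid_from": load_date, "valid_to": None, "is_current": True})

    # Keys missing from the snapshot were deleted at the source: close their versions.
    for k, old in current.items():
        if k not in incoming:
            old["valid_to"] = load_date
            old["is_current"] = False
    return history


customers = []
scd2_merge(customers, [{"customer_id": 1, "city": "Warsaw"}, {"customer_id": 2, "city": "Gdansk"}],
           "customer_id", date(2024, 1, 1))
scd2_merge(customers, [{"customer_id": 1, "city": "Krakow"}, {"customer_id": 2, "city": "Gdansk"}],
           "customer_id", date(2024, 2, 1))
```

Filtering on `is_current` gives the "current snapshot" flavour of the silver table, while the full list keeps the history.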

Gold layer

In this layer, the data should be consumption-ready for specific business and analytics use cases. 

It's prepared for reporting and uses denormalized, read-optimized models with the highest level of quality, usability, governance and documentation.
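
As a rough illustration of a denormalized, read-optimized gold model, the sketch below joins two hypothetical silver tables and pre-aggregates them into one flat row per month and customer segment, so that reports need no joins at query time. The table contents, the `segment` attribute and the function `build_gold_monthly_sales` are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical silver tables (already cleansed and conformed).
silver_customers = [
    {"customer_id": 1, "name": "Alice", "segment": "B2B"},
    {"customer_id": 2, "name": "Bob", "segment": "B2C"},
]
silver_orders = [
    {"order_id": 10, "customer_id": 1, "order_date": date(2024, 1, 5), "amount": 120.0},
    {"order_id": 11, "customer_id": 1, "order_date": date(2024, 1, 20), "amount": 80.0},
    {"order_id": 12, "customer_id": 2, "order_date": date(2024, 1, 7), "amount": 50.0},
]


def build_gold_monthly_sales(customers, orders):
    """Denormalize orders with customer attributes and pre-aggregate per month and segment."""
    segment_by_customer = {c["customer_id"]: c["segment"] for c in customers}
    totals = defaultdict(lambda: {"order_count": 0, "revenue": 0.0})
    for order in orders:
        month = order["order_date"].strftime("%Y-%m")
        segment = segment_by_customer.get(order["customer_id"], "UNKNOWN")
        bucket = totals[(month, segment)]
        bucket["order_count"] += 1
        bucket["revenue"] += order["amount"]
    # One flat, wide row per reporting grain, sorted for stable output.
    return [
        {"month": month, "segment": segment, **metrics}
        for (month, segment), metrics in sorted(totals.items())
    ]


gold_monthly_sales = build_gold_monthly_sales(silver_customers, silver_orders)
```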

Summary

Now let's take a step back and look at what the classic data warehouse layers looked like. There, too, we had three (sometimes more) layers of data, such as Staging (STG), the Operational Data Store (ODS) and the Data Warehouse (DWH). Each had its own purpose, very similar to the layers of the Medallion Architecture. Does that ring a bell? To me, the Medallion Architecture is simply another iteration of a concept we've already seen in the data world, adapted to modern data warehousing solutions.

While reading all of these do's and don'ts, you might think: "This does not fit my scenario; I have to do it differently, with different naming or more layers." But can you actually do that? Definitely.

All data design patterns were created to propose some kind of standard way of doing things (without one, there would be chaos), but it's impossible to invent a perfect one. It's largely up to you to decide which data warehouse design pattern to follow, weighing the benefits and risks, while treating the Medallion Architecture as a starting point for a modern data warehouse. Most importantly, whichever pattern you choose should be treated as an organization-wide set of standards and good practices.

streaming
Data
Data Lakehouse
30 October 2024
