Data Journey with Michał Wróbel (RenoFi) - Doing more with less with a Modern Data Platform and ML at home

In this episode of the RadioData Podcast, Adam Kawa talks with Michał Wróbel about business use cases at RenoFi (a U.S.-based FinTech), the Modern Data Platform built on top of the Google Cloud Platform, and advanced ML/AI models. We also get an insight into the specifics of startup projects.

Host: Adam Kawa, CEO of GetInData | Part of Xebia

Since 2010, Adam has been working with Big Data at Spotify (where he proudly operated one of the largest and fastest-growing Hadoop clusters in Europe), Truecaller and as a Cloudera Training Partner. Nine years ago, he co-founded GetInData | Part of Xebia – a company that helps its customers to become data-driven and builds custom Big Data solutions. Adam is also the creator of many community initiatives like the RadioData podcast, Big Data meetups and the DATA Pill newsletter.

Guest: Michał Wróbel, Lead Software Engineer

Michał has worked as a data engineer since 2015. Michał has a lot of expertise in AWS, analytics engineering, AI/ML and end-to-end projects. At RenoFi, Michał was responsible for delivering several data products and for the management of RenoFi’s data platform. Now Michał works at Embedded Insurance as Lead Software Engineer.

RenoFi Use Case

RenoFi is a platform that helps people who are trying to renovate their properties to borrow more money with the lowest possible monthly repayment. What makes RenoFi different? Usually, when institutions like banks grant loans, they use the present value of the property, without taking into account the value after renovation. RenoFi tries to calculate the post-renovation value of the property, which significantly affects the terms of the loan.

Key quotes

dbt

“I think dbt triggered the biggest change in my data career, because it simplified so many things for so many people. It was revolutionary. I remember the times when we had custom SQL scripts and when we ran them on schedule on Airflow, you had to execute them in order. You either had one big file, thousands of lines - just sequential SQL. Now, with dbt you have all the dependencies, all the docs just there in one place.”
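
The dependency handling mentioned in the quote can be illustrated with a small sketch. It is not how dbt is implemented; it only shows the general idea of deriving a run order from declared dependencies instead of maintaining one long sequential script by hand. The model names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical models: each key maps to the set of models it depends on,
# similar in spirit to the dependencies a dbt project declares between models.
MODEL_DEPENDENCIES = {
    "stg_loans": set(),
    "stg_properties": set(),
    "int_loans_with_properties": {"stg_loans", "stg_properties"},
    "fct_renovation_loans": {"int_loans_with_properties"},
}


def run_order(dependencies):
    """Return model names so that every model comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for model in run_order(MODEL_DEPENDENCIES):
        print("run", model)
```

With the graph declared once, adding a new model only means stating what it depends on; the execution order follows from that.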

“Back in the day, you had to write your own custom scripts and keep the state. Hightouch does this for you: you just connect, say what the data source is and what your destination is, and Hightouch will handle what has changed on the source and how it changed on the destination. If it didn't change on the destination, then it wouldn't even hit and update it. It has nice notifications and is really easy to integrate with dbt and any data platform.”
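
To make the "keep the state" part concrete, here is a minimal sketch of the kind of change detection such custom scripts had to do. It is an assumption-laden illustration, not Hightouch's actual implementation: the `plan_sync` function, the row shapes and the sample data are all hypothetical.

```python
def plan_sync(source_rows, last_synced):
    """Compare current source rows with the state recorded at the last sync.

    Both arguments map a primary key to a dict of field values.
    Returns (to_upsert, to_delete); unchanged rows appear in neither,
    so the destination is not touched for them.
    """
    to_upsert = {
        key: row
        for key, row in source_rows.items()
        if last_synced.get(key) != row  # new key, or any field differs
    }
    to_delete = [key for key in last_synced if key not in source_rows]
    return to_upsert, to_delete


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    source = {
        1: {"email": "a@example.com", "status": "approved"},
        2: {"email": "b@example.com", "status": "pending"},
        4: {"email": "d@example.com", "status": "new"},
    }
    previous = {
        1: {"email": "a@example.com", "status": "approved"},
        2: {"email": "b@example.com", "status": "new"},
        3: {"email": "c@example.com", "status": "new"},
    }
    upserts, deletes = plan_sync(source, previous)
    print("upsert:", sorted(upserts))
    print("delete:", deletes)
```

The stored snapshot of the previous sync is the state that, according to the quote, no longer has to be maintained by hand.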

Data

"There is this common problem with training prediction skew, where you might train your model on the data that you have available in the warehouse.

But when you try to predict, the client of this model might send a feature in a different shape or format than the model expects.

It could be upper case, lower case, not normalised, etc. It could also be different because in dbt you have transformed it from the raw shape and the data scientist wasn’t aware of it.

So this is one thing that you need to remember. A common solution for this is a feature store, which adds massive complexity to the system, because you need to have someone to develop the solution and maintain it and you need to change your experience."
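
The skew described above can be shown with a small sketch. It is not a feature store and not something the conversation attributes to RenoFi; it only illustrates the idea of routing both the training path and the prediction path through the same transformation, so that a value arriving in a different case or with stray whitespace ends up in the shape the model was trained on. The `state` feature and all function names are hypothetical.

```python
def normalize_state(raw):
    """Canonical form for a hypothetical 'state' feature: trimmed, upper case."""
    return raw.strip().upper()


def build_training_row(warehouse_row):
    # Training path: values come from the warehouse.
    return {"state": normalize_state(warehouse_row["state"])}


def build_prediction_row(request_payload):
    # Prediction path: values come from the client calling the model.
    return {"state": normalize_state(request_payload["state"])}


if __name__ == "__main__":
    trained_on = build_training_row({"state": "PA"})
    served_with = build_prediction_row({"state": " pa "})
    # Both paths yield the same feature value despite the different raw input.
    print(trained_on == served_with)
```

Sharing one function keeps the two paths consistent only as long as both actually call it, which is the kind of guarantee a feature store is meant to provide at larger scale.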

Startups & business

"In startups, you don't usually start with data teams. You need to have a product and then develop it, try to improve it, find a market and then, when the startup is successful, maybe set up a data team. At RenoFi this was a little bit different, because our CTO has a data background and made really good decisions from day one. At RenoFi, the data team is pretty small, but we started quite early."

"In startups, things change quicker than in other companies, so change is the main thing in startups."

"Startups can do more with less, and by less, I mean fewer people. And yeah, sometimes you don't need to go for a brand new shiny solution like ML, you can just use heuristics which should be good enough and provide adequate business results to a company, because the cost of a proper ML solution is really high."

"There's a plethora of cloud services and external cloud services that we also use. So almost every startup, I would say, has internal databases and external cloud services. From the business perspective, the CEO and the management team would like to have all that data in one place. So obviously you have different names these days, whether it's a warehouse, a data lake or a data lakehouse. Whatever the name, it's a big database which contains all of your data."

"So having all the solutions in small teams means incurring huge costs. And don't even get us started on maintenance costs! So you can set it up and it's all fine. It's easier. But then you need to update it. You need to make sure it's worth it. You need to set up monitoring. I wouldn't say this is feasible for a small team. Furthermore, if you want to have it running at a good quality level, then you would have to employ someone who doesn't have a life outside of work."

"Thankfully, building real-time solutions or powerful ML / AI models has become simpler and cheaper, thanks to new technologies and new tools. So it's likely that in a few years it will be no problem to use them by default, because the additional costs and efforts to build them will be relatively small."

"A good recommendation is that you don't build from scratch if you don't have the expertise in-house, but hire someone for at least a few months to set it up and give you the best practices that are currently on the market. Maybe you would still have that person available when required. In-house, what you should focus on are the analytics engineers. And by analytics engineers, I mean people who work with dbt and know enough to be able to code, who can work effectively with the command line, standard programming practices and tests. Therefore, RenoFi is just great."

"So it's about having a very pragmatic approach and focusing first on the most critical functionalities, because they usually bring the most value. However, there are companies that must develop real-time online machine learning solutions from day one in order to just exist, because their core business model requires them to do so. So one example is Free Now, a multi-mobility company, or Uber or Bolt or a similar app. They need to calculate the price of a ride dynamically in real-time, based on actual supply and demand, and this changes all the time. They also need to predict the estimated time of driver arrival. The same goes for the estimated duration of your ride, so that you know when you will reach your destination and so on. If the apps do this well, they will obtain drivers, customers and will earn money. But if they do this badly, they will simply lose money. So in their case, real-time and machine learning is a must-have solution which needs to be invested in and improved on constantly, especially at scale."

"Usually, if you have vendors such as dbt Labs or the Google Cloud Platform, then if they are successful, they have very big leverage for their solutions, because they can have hundreds or even thousands of user companies. So it's cost-efficient and makes sense for them to invest in their solutions, thanks to this economy of scale. They can keep improving them by adding new functionalities, especially the ones that you won't be able to implement by yourself, as sometimes it would simply be too expensive to develop a custom-made or big feature only for yourself, because you won't have the same economy of scale and leverage as they have."

References:

dbt Coalesce 2022 playlist

These are just snippets from the entire conversation which you can listen to here.

Subscribe to the RadioData Podcast to stay up-to-date with the latest technology trends and to discover the most interesting data use cases!

Last updated: 28 February 2023

Written by

Sylwia Kołpuć

Senior Marketing Specialist
