Radio DaTa Podcast


Data Journey with Arunabh Singh (Willa) – Building robust ML & Analytics capability very early with FinTech, skills & competencies for data scientists with ML/AI predictions for the next decades.

In this episode of the RadioData Podcast, Adam Kawa talks with Arunabh Singh about Willa use cases (FinTech): the most important ML models implemented at Willa, the ML(Ops) stack, and more about Data and ML/AI at Willa. We will also focus on the trends and predictions for ML/AI for the next decades.

We encourage you to listen to the whole podcast or, if you prefer reading, skip to the key takeaways listed below.

___________

Host: Adam Kawa, CEO of GetInData | Part of Xebia

Since 2010, Adam has been working with Big Data at Spotify (where he proudly operated one of the largest and fastest-growing Hadoop clusters in Europe), Truecaller and as a Cloudera Training Partner. Nine years ago, he co-founded GetInData | Part of Xebia – a company that helps its customers to become data-driven and build custom Big Data solutions. Adam is also the creator of many community initiatives like the RadioData podcast, Big Data meetups and the DATA Pill newsletter.

Guest: Arunabh Singh, Head of Data

Arunabh Singh is the Head of Data at Eigensonne, and was previously the director of the Data Science team at Willa. His main fields of education are economics, political science and computer science. For the last 10 years he has worked for enterprises of different scales and kinds, mainly focusing on data science and information technology. He worked at Willa for almost 3 years, right from the beginning of the company's journey.

________________

Willa and a Willa Use Case

Willa is a mature FinTech startup based in Sweden that delivers its services in the US. It focuses on freelancers and, in particular, the influencer market. The main service that Willa is currently developing is an intermediary payment service between Willa’s customers and their clients.

Willa’s customers can register on Willa’s app at https://www.willa.com/ and then present their invoices to Willa. Once an invoice is accepted, the Willa app gives them immediate access to the requested funds, and Willa takes on the risk and responsibility of collecting the money from their clients.

_________________

Key takeaways:

1. What are the risks that Willa has to manage and how does it handle them?

Willa takes on two types of risk when it accepts its customers’ invoices:

Freelancer side risk
Credit side risk

The freelancer side risk (or fraud risk) covers questions such as:

Is this freelancer legitimate?
Is this invoice legitimate?

The credit side risk (or client risk) covers questions such as:

What is the financial situation of the customer's client?
Does the client intend to pay Willa?
What is the economic environment of the client? Can a recession affect its ability to pay?

Willa has developed various AI/ML models and algorithms to assess both the fraud risk and the credit risk. Based on the data that Willa processes, the algorithms decide whether to be more conservative or more liberal in accepting customers’ invoices. If the risk rates are too high, the model is recalibrated to be more conservative.
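The calibration idea described above can be illustrated with a minimal Python sketch. It is not Willa's actual logic: the function name, the 10% adjustment step, and the floor and ceiling values are all assumptions made for the example. The sketch tightens the acceptance threshold when the observed default rate is above a target, and relaxes it when the rate is below.

```python
def recalibrate_threshold(
    current_threshold: float,
    observed_default_rate: float,
    target_default_rate: float,
    step: float = 0.10,      # hypothetical: adjust by 10% per recalibration
    floor: float = 0.001,    # hypothetical lower bound for the threshold
    ceiling: float = 0.20,   # hypothetical upper bound for the threshold
) -> float:
    """Return an updated maximum acceptable default probability for new invoices."""
    if observed_default_rate > target_default_rate:
        # Too many defaults: become more conservative (accept fewer invoices).
        proposed = current_threshold * (1.0 - step)
    elif observed_default_rate < target_default_rate:
        # Fewer defaults than targeted: become more liberal.
        proposed = current_threshold * (1.0 + step)
    else:
        proposed = current_threshold
    # Keep the threshold inside the allowed band.
    return min(max(proposed, floor), ceiling)


tighter = recalibrate_threshold(0.05, observed_default_rate=0.04, target_default_rate=0.02)
looser = recalibrate_threshold(0.05, observed_default_rate=0.01, target_default_rate=0.02)
```

Here `tighter` ends up below the starting threshold of 0.05 and `looser` above it; a real system would likely recalibrate from model scores and portfolio data rather than a single rate.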

2. ML: Are all the cases handled by algorithms? What is asymmetric risk?

There are some cases that Machine Learning models do not handle well. At Willa, one such category is called asymmetric risk. To understand what an asymmetric risk is, it helps to look at an example:

Let’s say a Willa customer presents an invoice for 10 billion dollars addressed to Apple. On paper, everything might seem fine: the customer seems to be legitimate and the customer's client is a very solid company. But there is a 0.0001% probability that something might go wrong. Even though the ML model would recommend accepting the invoice, Willa should not, because a failure could result in Willa's financial ruin. Low-probability, high-impact events can be catastrophic. Cases of asymmetric risk are therefore handled independently, with custom common-sense gates in the algorithms.
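A common-sense gate of this kind could look roughly like the Python sketch below. The `Invoice` class, the probability threshold and the exposure cap are hypothetical values chosen for illustration; the source does not describe how Willa's gates are implemented. The point is that the gate looks only at the size of the potential loss, independently of how small the model's predicted probability is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    amount_usd: float            # face value of the invoice
    default_probability: float   # model-estimated probability of non-payment


def accept_invoice(
    invoice: Invoice,
    max_default_probability: float = 0.02,  # hypothetical model threshold
    max_exposure_usd: float = 50_000.0,     # hypothetical hard cap per invoice
) -> bool:
    """Return True only if both the model check and the exposure gate pass."""
    model_ok = invoice.default_probability <= max_default_probability
    # The gate ignores the probability entirely: a loss above the cap is
    # treated as unaffordable, so the invoice is declined regardless.
    exposure_ok = invoice.amount_usd <= max_exposure_usd
    return model_ok and exposure_ok


small = Invoice(amount_usd=2_000.0, default_probability=0.001)
# The example from the text: 10 billion dollars at a 0.0001% failure probability.
huge = Invoice(amount_usd=10_000_000_000.0, default_probability=0.000001)

small_accepted = accept_invoice(small)
huge_accepted = accept_invoice(huge)
```

With these assumed limits the small invoice is accepted, while the 10 billion dollar invoice is declined even though its predicted default probability is far below the model threshold.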

3. What types of data are processed at Willa and what are the data analytics and data science operations?

Willa collects several types of data, covering business reporting, operational metrics, user activity, activity tracking over time, lifetime value calculation, app interactions in the frontend, payment requests, money withdrawals, etc.

The main analytics and data science operations are focused on predicting the default rates and fraud rates on each particular invoice of each particular customer. Additionally, they involve more heuristic analytics like calculating limits on particular customers based on their default rates.
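As an illustration of such a heuristic, the Python sketch below derives a customer limit from a predicted default rate. The linear scaling rule, the base limit and the cutoff rate are assumptions made for this example only; the source does not say which formula Willa uses.

```python
def credit_limit(
    predicted_default_rate: float,
    base_limit_usd: float = 10_000.0,  # hypothetical limit for a zero-risk customer
    cutoff_rate: float = 0.10,         # hypothetical rate at which the limit reaches zero
) -> float:
    """Scale a customer's limit down linearly as the predicted default rate rises."""
    if not 0.0 <= predicted_default_rate <= 1.0:
        raise ValueError("predicted_default_rate must be between 0 and 1")
    if predicted_default_rate >= cutoff_rate:
        # Too risky under this heuristic: no advance is offered.
        return 0.0
    return base_limit_usd * (1.0 - predicted_default_rate / cutoff_rate)


low_risk_limit = credit_limit(0.01)
mid_risk_limit = credit_limit(0.05)
high_risk_limit = credit_limit(0.25)
```

Under these assumptions, a customer with a 5% predicted default rate gets roughly half of the base limit, and anyone at or above the cutoff rate gets no limit at all.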

4. Technology stack at Willa

Willa has been fully hosted on GCP since the beginning. It uses dbt and Airflow for upstream plumbing and orchestration, BigQuery for data warehousing and Data Studio for reporting. Most of the models are built in Python, using Kedro and Google Vertex AI.

5. How long does it take to create and deploy a new machine learning model in production?

Normally, it takes a few weeks to put an ML model into production, mainly because the product and the field Willa is dealing with are quite new and dynamic. New features are also constantly being added to the app, which creates an ever-growing layer of integration work. Willa prefers to be sure that its models are robust and sound rather than to iterate very quickly. It focuses more on data plumbing and data engineering and takes a slower approach to data modeling.

6. What are the free of charge technologies that the Willa team uses on a daily basis?

In essence, the BigQuery Console and UI are used together with Google Sheets. To create a new field in an existing model or a new variable, dbt is used. The actual production-ready models are coded mainly with Python, Kedro and Google Vertex AI.

7. What are the most sought-after skills and competences at Willa?

The three groups of skills that are most appreciated and valued at Willa are:

A general quantitative aptitude: you have to be comfortable with numbers and able to break a problem down into quantitative problems at best, and at least into analytical problems.
An ability to think counter-intuitively, and the curiosity to dig deeper rather than be satisfied with the first result.
An ability to structure an unstructured decision process and enhance it with data analysis and data science, by speaking well, writing well, communicating well and presenting well.

8. What are the most important trends and predictions regarding Data Science, AI and the industry overall in the upcoming decade?

The most important trends or predictions regarding Data Science that Arunabh mentioned are:

The idea that there will be mass unemployment caused by machines taking over human jobs seems unlikely, partly because we already have experience of working alongside automation and of using machines (even highly automated ones) to our advantage, and partly because not every aspect of human activity can be easily automated.
Self-service analytics and AI/BI are not being adopted as easily as was previously predicted. People can create good AI solutions, but they don’t fully rely on them and still seek out human confirmation.
More companies in slightly less technologically developed countries will start to adopt and use AI and ML models.
"Green Tech" is going to be the next big industry movement of the next 25 years.

We can already see examples of this wider adoption. For instance, Poland has tripled its cloud adoption over the last 8 years and is catching up with technologically advanced countries such as Sweden and Switzerland.

Furthermore, many companies have examples of cases where, even though AI and automation are used, human confirmation and domain knowledge are invaluable in solving a complicated problem.

9. What is going to happen at Willa in the near future?

Willa is going to focus mainly on doing the same thing, but better overall. The key fields of improvement for the near future are going to be:

Fine-tuning the data science model:
- better predictions of credit and invoice risks,
- including more user data in predictions,
- enhancing the features of the Willa product.
Doing longer-term predictive product analytics on the user side for the following questions, for example:
- What kind of users are likely to stay with Willa after 2 years?
- Who joins, who stays, who reactivates?
Revamping the data warehouse so that it can scale better for a larger number of users and a larger amount of data.

___________________

These are just snippets from the entire conversation which you can listen to here:

Subscribe to the Radio Data podcast to stay up-to-date with the latest technology trends and discover the most interesting data use cases!

Last updated: 27 July 2023

Written by

Piotr Tutak

Senior Software Engineer
