Radio DaTa Podcast

Data Journey with Arunabh Singh (Willa) – Building robust ML & Analytics capability very early with FinTech, skills & competencies for data scientists with ML/AI predictions for the next decades.

In this episode of the RadioData Podcast, Adam Kawa talks with Arunabh Singh about Willa’s use cases in FinTech: the most important ML models implemented at Willa, the ML(Ops) stack, and more about data and ML/AI at Willa. We also look at the trends and predictions for ML/AI for the next decades.

We encourage you to listen to the whole podcast or, if you prefer reading, skip to the key takeaways listed below.

___________

Host: Adam Kawa, CEO of GetInData | Part of Xebia

Since 2010, Adam has been working with Big Data at Spotify (where he proudly operated one of the largest and fastest-growing Hadoop clusters in Europe), Truecaller and as a Cloudera Training Partner. Nine years ago, he co-founded GetInData | Part of Xebia – a company that helps its customers to become data-driven and build custom Big Data solutions. Adam is also the creator of many community initiatives like the RadioData podcast, Big Data meetups and the DATA Pill newsletter.

Guest: Arunabh Singh, Head of Data

Arunabh Singh is the Head of Data at Eigensonne and was previously the director of the Data Science team at Willa. His educational background covers economics, political science and computer science. For the last 10 years he has worked for enterprises of different scales and kinds, focusing mainly on data science and information technology. He worked at Willa for almost 3 years, right from the beginning of the company's journey.

________________

Willa and a Willa Use Case

Willa is a mature FinTech startup based in Sweden that focuses on delivering its services in the US. Its target market is freelancers and, in particular, influencers. The main service that Willa is currently developing is an intermediary payment service between Willa’s customers and those customers’ clients.

Willa’s customers can register in Willa’s app at https://www.willa.com/ and then submit their invoices to Willa. Once an invoice is accepted, the Willa app gives the customer immediate access to the requested funds, and Willa takes on the risk and responsibility of collecting the money from the client.

_________________

Key takeaways:

1. What are the risks that Willa has to manage and how does it handle them? 

Willa takes on two types of risk when it accepts its customers’ invoices:

  1. Freelancer side risk
  2. Credit side risk

The freelancer side risk (or fraud risk) covers questions such as:

  • Is this freelancer legitimate?
  • Is this invoice legitimate?

The credit side risk (or client risk) covers questions such as:

  • What is the financial situation of the customer's client?
  • Does the client intend to pay Willa?
  • What is the economic environment of the client? Can a recession affect its ability to pay?

Willa has developed various AI/ML models and algorithms to assess the risk on both the fraud side and the credit side. Based on the data that Willa processes, the algorithms decide whether to be more conservative or more liberal in accepting customers’ invoices. If the risk rates are too high, the model is calibrated to be more conservative.
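
To make the idea of calibration more concrete, here is a minimal sketch in Python. It is not Willa’s actual algorithm: the function names, the target default rate, the step size and the bounds are all hypothetical and chosen only for illustration. An invoice is accepted when its predicted risk is below a threshold, and the threshold is tightened whenever the observed default rate rises above the target.

```python
def accept_invoice(predicted_risk: float, threshold: float) -> bool:
    """Accept an invoice when the predicted risk is below the current threshold."""
    return predicted_risk < threshold


def recalibrate_threshold(
    threshold: float,
    observed_default_rate: float,
    target_default_rate: float = 0.02,  # hypothetical target, not from the podcast
    step: float = 0.01,                 # hypothetical adjustment step
    floor: float = 0.01,                # never go below this threshold
    ceiling: float = 0.50,              # never go above this threshold
) -> float:
    """Tighten the threshold when defaults are above target, loosen it when below."""
    if observed_default_rate > target_default_rate:
        threshold -= step  # more conservative: fewer invoices pass
    elif observed_default_rate < target_default_rate:
        threshold += step  # more liberal: more invoices pass
    # Keep the threshold inside the allowed range.
    return min(max(threshold, floor), ceiling)


if __name__ == "__main__":
    threshold = 0.10
    print(accept_invoice(0.095, threshold))  # accepted under the initial threshold
    threshold = recalibrate_threshold(threshold, observed_default_rate=0.05)
    print(accept_invoice(0.095, threshold))  # rejected after tightening
```

In this sketch the same invoice is accepted before recalibration and rejected after it, which is the "more conservative" behaviour described above.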

2. ML: Are all the cases handled by algorithms? What is asymmetric risk?

There are some cases that machine learning models do not handle well. At Willa, one such class of cases is called asymmetric risk. To understand what an asymmetric risk is, it helps to look at an example:

Let’s say there is a Willa customer who presents an invoice for 10 billion dollars addressed to Apple. On paper, everything might seem fine: the customer seems to be legitimate and the customer’s client is a very solid company. But there is a 0.0001% probability that something might go wrong. Even though the ML model would recommend accepting the invoice, Willa should not, because a failure could result in the financial ruin of Willa. Low probability, high impact events can be catastrophic. Cases of asymmetric risk are therefore handled separately, with custom common sense gates in the algorithms.
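
A common sense gate of this kind could look roughly like the Python sketch below. This is only an illustration under stated assumptions, not Willa’s implementation: the `Invoice` class, the function names, the capital figure and the 5% exposure cap are all hypothetical. The point is that the gate looks at the size of the potential loss, independently of how unlikely the model thinks that loss is.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    amount: float               # invoice value in dollars
    default_probability: float  # model's estimated probability of non-payment


def model_recommends(invoice: Invoice, max_default_probability: float = 0.01) -> bool:
    """Stand-in for the ML model: recommend acceptance when the risk is low."""
    return invoice.default_probability <= max_default_probability


def passes_exposure_gate(
    invoice: Invoice,
    capital: float,
    max_exposure_fraction: float = 0.05,  # hypothetical cap: 5% of capital per invoice
) -> bool:
    """Common sense gate: never risk more than a fixed share of capital on one invoice."""
    return invoice.amount <= capital * max_exposure_fraction


def decide(invoice: Invoice, capital: float) -> bool:
    """Accept only when both the model and the exposure gate agree."""
    return model_recommends(invoice) and passes_exposure_gate(invoice, capital)


if __name__ == "__main__":
    capital = 50_000_000  # hypothetical figure, for illustration only
    huge = Invoice(amount=10_000_000_000, default_probability=0.000001)  # 0.0001%
    small = Invoice(amount=5_000, default_probability=0.001)
    print(model_recommends(huge), decide(huge, capital))    # model says yes, gate says no
    print(model_recommends(small), decide(small, capital))  # both say yes
```

Keeping the gate as a separate rule, rather than folding it into the model, means the cap still applies even when the model is very confident.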

3. What types of data are processed at Willa and what are the data analytics and data science operations?

Willa collects several types of data, such as business reporting, operational metrics, user activity, activity tracked over time, lifetime value calculations, app interactions in the frontend, payment requests and money withdrawals.

The main analytics and data science operations focus on predicting the default rate and fraud rate for each invoice of each customer. They also involve more heuristic analytics, such as calculating limits for particular customers based on their default rates.
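
As a rough illustration of such a heuristic, the Python sketch below scales a customer’s limit down linearly as their default rate grows. The formula, the function name and the 10% tolerance are assumptions made for this example only; the podcast does not describe how Willa actually computes limits.

```python
def customer_limit(
    base_limit: float,
    default_rate: float,
    max_tolerated_rate: float = 0.10,  # hypothetical: at 10% defaults the limit hits zero
) -> float:
    """Scale the base limit down linearly as the customer's default rate increases."""
    if not 0.0 <= default_rate <= 1.0:
        raise ValueError("default_rate must be between 0 and 1")
    # Share of the base limit that remains; never negative.
    scale = max(0.0, 1.0 - default_rate / max_tolerated_rate)
    return round(base_limit * scale, 2)


if __name__ == "__main__":
    for rate in (0.0, 0.05, 0.20):
        print(rate, customer_limit(10_000, rate))
```

A customer with no defaults keeps the full base limit, while a customer at or above the tolerated rate gets a limit of zero.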

4. Technology stack at Willa

Willa has been fully hosted on GCP since the beginning. It uses dbt and Airflow for upstream plumbing and orchestration, BigQuery for data warehousing and Data Studio for reporting. Most of the models are built in Python, using Kedro and Vertex AI.

5. How long does it take to create and deploy a new machine learning model in production?

Normally, it takes a few weeks to put an ML model into production, mainly because the product and the field Willa is dealing with are quite new and dynamic. New features are also constantly being added to the app, which creates an ever-growing layer of integration work. Willa prefers to be sure that its models are robust and sound rather than to iterate very quickly. It focuses more on data plumbing and data engineering and takes a slower approach to data modeling.

6. What are the free of charge technologies that the Willa team uses on a daily basis?

In essence, the BigQuery console and UI are used together with Google Sheets. dbt is used to create a new field in an existing model or a new variable. The production-ready models themselves are coded mainly with Python, Kedro and Google Vertex AI.

7. What are the most sought-after skills and competencies at Willa?

The three groups of skills that are most appreciated and valued at Willa are:

  • A general quantitative aptitude: you have to be comfortable with numbers and able to break a problem down into quantitative problems, or at least into analytical ones.
  • An ability to think counter-intuitively, with the curiosity to dig deeper and not just be satisfied with the first result.
  • An ability to bring structure to an unstructured decision and enhance it with data analysis and data science, by speaking, writing, communicating and presenting well.

8. What are the most important trends and predictions regarding Data Science, AI and the industry overall in the upcoming decade?

The most important trends or predictions regarding Data Science that Arunabh mentioned are:

  • The idea that there will be mass unemployment caused by machines taking over human jobs seems unlikely, partly because we already have experience of working alongside automation and using machines (even highly automated ones) to our advantage, and partly because not every aspect of human activity can be easily automated.
  • Self-service analytics and AI/BI are not being adopted as easily as was previously predicted. People can create good AI solutions, but they don’t fully rely on them and still seek human confirmation.
  • More companies in slightly less technologically developed countries will start to adopt and use AI and ML models.
  • A focus on “Green Tech” is going to be the next big industry movement of the next 25 years.

We can already see examples of this wider adoption: for instance, Poland has tripled its cloud adoption over the last 8 years and is catching up with technologically advanced countries such as Sweden and Switzerland.

Furthermore, many companies offer examples where, even though AI and automation are used, human confirmation and domain knowledge prove invaluable in solving a complicated problem.

9. What is going to happen at Willa in the near future?

Willa is going to focus mainly on doing the same thing, but better overall. The key areas for improvement in the near future are going to be:

  • Fine-tuning the data science model:
    • better predictions of credit and invoice risks,
    • including more user data in predictions,
    • enhancing the features of the Willa product.
  • Doing longer-term predictive product analytics on the user side for the following questions, for example:
    • What kind of users are likely to stay with Willa after 2 years? 
    • Who joins, who stays, who reactivates? 
  • Revamping the data warehouse so that it can scale better for a larger number of users and a larger amount of data.

___________________

These are just snippets from the entire conversation which you can listen to here: 

Subscribe to the Radio Data podcast to stay up-to-date with the latest technology trends and discover the most interesting data use cases! 

analytics
ML
AI
27 July 2023
