
Truecaller - armed with data analytics to control incoming calls

Building a modern analytics environment is a strategic, long-term, iterative process of continuous improvement rather than a one-off project.

The challenge

Truecaller created a mobile app that identifies who is calling even if the number is not stored as a contact. It blocks unwanted calls and SMS, and enables instant mobile payments and VoIP calls. These features are in high demand, particularly in emerging markets, as shown by 500M app installs.

Data has always been central to Truecaller’s business. The app’s spam identification feature relies on user reports of numbers they consider spam. Internal sources feed the caller identification service. Users see ads tailored to their characteristics. App analytics help identify opportunities to provide genuine and meaningful value to users.


The solution

GetInData has assisted Truecaller in its data analytics evolution ever since implementing the first big data platform in 2014 to respond to exploding data volumes. At that time, Kafka and dumps from relational databases fed an on-premise Cloudera data platform, with Airflow responsible for orchestration and scheduling, and Spark, Presto and Hive responsible for data processing.

As app usage expanded further, Truecaller faced constantly increasing storage needs. The company bought more hardware to get more disks even though it didn’t require more computing power. Maintaining its own data center was also challenging, and the company experienced occasional downtime.

In 2018 Truecaller decided it was once again time to rethink its approach. After carefully considering all the available options, the company chose Google Cloud Platform. It wanted to benefit from Cloud Storage and use DataProc for YARN compute clusters, thus leveraging bare metal instances, saving costs and enabling autoscaling. Cloud Storage reduced the need for capacity planning, diminished the maintenance burden, made storage access faster and turned out cheaper than on-prem HDFS.

The migration to GCP came at the cost of adjusting certain jobs to make them run in DataProc.


The next step in the cloud journey was to examine other cloud-native technologies. BigQuery turned out faster and cheaper than Hive on DataProc and offered such a better user experience that people dealing with data didn’t want to work with Hive anymore. BigQuery quickly became the preferred analytics tool, and Truecaller is even planning to use it for ETL processing. More complicated workloads and machine learning will run as Spark on Kubernetes.

Another advantage of GCP was the availability of cloud-native tools like Deployment Manager for infrastructure automation. It helps to deliver cloud resources faster and improves their management. Keeping resource definitions in templates as Python or Jinja code makes them suitable for CI/CD pipelines, resulting in process traceability and faster delivery, with infrastructure integration tests included.
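To illustrate what a resource definition kept as Python code can look like, here is a minimal sketch of a Deployment Manager-style template. The entry point name and the context object (with env and properties) follow the Deployment Manager template convention; the bucket naming scheme, the default values and the FakeContext helper are assumptions made up for this example, not Truecaller's actual configuration.

```python
# Sketch of a Deployment Manager-style Python template.
# Assumption: the bucket name pattern and defaults below are illustrative only.


def generate_config(context):
    """Return the resources for one deployment.

    Deployment Manager passes a context object whose `env` holds deployment
    metadata and whose `properties` holds the inputs from the configuration.
    """
    project = context.env["project"]
    deployment = context.env["deployment"]
    bucket_name = "{}-{}-raw-events".format(project, deployment)

    bucket = {
        "name": bucket_name,
        "type": "storage.v1.bucket",
        "properties": {
            "location": context.properties.get("location", "EU"),
            "storageClass": context.properties.get("storageClass", "STANDARD"),
        },
    }
    return {"resources": [bucket]}


class FakeContext:
    """Hypothetical stand-in for the real context, used only for local checks."""

    def __init__(self, env, properties):
        self.env = env
        self.properties = properties


if __name__ == "__main__":
    ctx = FakeContext(
        env={"project": "demo-project", "deployment": "analytics"},
        properties={"location": "EU"},
    )
    print(generate_config(ctx))
```

Because the template is a plain function, a CI job can call it with a fake context and assert on the generated resources before anything is deployed, which is one way the traceability and testing benefits mentioned above could be realised.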

Another angle to this story is the data presentation layer. Management and product owners used Tableau dashboards with analytics on users and how they approach app features. With the cloud-native strategy, Data Studio became a natural choice for this purpose. It integrated seamlessly with BigQuery, was much easier to use, serverless, and available free of charge.

The results

The cloud journey of Truecaller, supported by GetInData, required an iterative reassessment of the approach taking cloud-native and open-source technologies into account. It was full of dilemmas but eventually led to the closure of the on-premise data center and full migration from Tableau to Data Studio.

Throughout these years Truecaller managed to achieve:

● a monthly data platform cost of $6 per 10k users

● developer costs amounting to 30% of infrastructure cost

● managing current pipelines with only one data engineer per 42M monthly users.
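To put these ratios in perspective, the short sketch below scales them to a given user base. The three constants come from the list above; the 100M monthly users in the example are a hypothetical input, and treating the platform cost as the infrastructure cost that the 30% refers to is an assumption of this sketch.

```python
import math

# Ratios quoted in the results above.
COST_PER_10K_USERS_USD = 6
DEVELOPER_COST_SHARE = 0.30          # share of infrastructure cost
USERS_PER_DATA_ENGINEER = 42_000_000


def platform_estimate(monthly_users):
    """Scale the published ratios to a given number of monthly users.

    Assumption: the data platform cost is used as the infrastructure cost
    when applying the 30% developer share.
    """
    platform_usd = monthly_users / 10_000 * COST_PER_10K_USERS_USD
    return {
        "platform_usd": platform_usd,
        "developer_usd": platform_usd * DEVELOPER_COST_SHARE,
        # Round up: a fraction of an engineer still means one more person.
        "data_engineers": math.ceil(monthly_users / USERS_PER_DATA_ENGINEER),
    }


if __name__ == "__main__":
    # Hypothetical user base, not a figure from the case study.
    print(platform_estimate(100_000_000))
```

For the hypothetical 100M monthly users this works out to about $60,000 of monthly platform cost and three data engineers.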

To see the video presentation on Truecaller's cloud journey from Big Data Technology Warsaw Summit 2020, please go here.

F.Alsadi, J.Araujo, T.Żukowski 'How to make your Data Scientists like you and save a few bucks while migrating'

24 June 2020
