Fighting COVID-19 with Google Cloud - quarantine tracking system
Coronavirus is spreading through the world. At the time of writing this post (the 26th of March 2020), over 475k people have been infected and over 21k have died (Source: https://coronavirus.jhu.edu/map.html). As East Asian examples have shown, fighting the outbreak effectively means tracking down individual cases and thoroughly separating infected people from healthy ones. The mechanics behind it are well described in this popular article: Coronavirus - act today or people will die:
“Containment is making sure all the cases are identified, controlled, and isolated. It’s what Singapore, Hong Kong, Japan or Taiwan are doing so well: They very quickly limit people coming in, identify the sick, immediately isolate them, use heavy protective gear to protect their health workers, track all their contacts, quarantine them… This works extremely well when you’re prepared and you do it early on, and don’t need to grind your economy to a halt to make it happen.”
People around the world are taking initiatives to fight COVID-19. One of these is the Hack The Crisis series of hackathons. My friends and I decided to take part in the Polish edition. Our goal was to build a system for quarantine tracking.
The South Korean system for quarantined people works very effectively. Once users start quarantine, they are instructed to stay at their home location for at least two weeks, in isolation from others, and to report their health condition. By the time the system was introduced, 30k people nationwide were self-isolating, and the limited human resources available to track each quarantined person were a serious issue. The solution was a mobile application that tracks device location. Once the user leaves the “home area”, an alert is raised to the case officer. In the South Korean version, there is no additional verification (check). The Ministry is aware of the flaws of GPS tracking, but the exact algorithm for checking whether a user is outside of the “home area” is not disclosed.
Coronavirus is spreading fast. If things go wrong, we will count active cases in the millions. To operate smoothly, geofencing apps that monitor people under quarantine need to include AI-based features that decide whether a user remains in place and prevent fraudulent activity. The whole system should work in real time to deliver notifications and reports immediately. Finally, security, reliability and scalability cannot be ignored, as the solution may need to cover a large share of the population.
Notice: The system described in this article is an attempt to build a prototype and not a production-grade system.
Briefly, let’s enumerate the most important use-cases that our system should address:
The system should detect whether a user is within or outside the appointed quarantine area. Each violation should be double-checked to mitigate the risk of false alarms;
Government services should have an overview of the current situation through reports and alerts;
The user’s interaction with the system should be seamless and effortless.
Architecture overview
We are going to build a quarantine tracking system. The system should be scalable to support a growing number of users. It should perform fast and deliver up-to-date data and insights. Equally important is time-to-market: the sooner the solution can be introduced, the more effective the quarantine will be.
Considering the functional and non-functional requirements, a public cloud with an emphasis on serverless offerings seems to be a good starting point. Serverless favors writing loosely coupled microservices.
From a high-level perspective we can distinguish three main areas of the system:
Location data collection and evaluation;
Verification and alerting;
Data analysis and reporting.
Overview of the system from the point of view of services. Arrows indicate dataflows. Services are grouped by use-case.
Location data collection and evaluation
Users who start their quarantine agree to share their current location every 15 minutes. This event is published in the background and no further action from the user is needed. Our system can provide a public endpoint that handles incoming location events. Once an event is received, the function publishes a message to a Cloud Pub/Sub topic.
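A minimal sketch of such an ingestion endpoint as an HTTP-triggered Cloud Function might look as follows; the topic name, environment variable and payload fields are assumptions made for illustration:

```python
# main.py - HTTP-triggered Cloud Function (topic and field names assumed).
import json
import os

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Project id and topic name are assumptions, configured at deploy time.
topic_path = publisher.topic_path(os.environ["GCP_PROJECT"], "location-events")


def ingest_location(request):
    """Accepts a JSON location event and forwards it to Pub/Sub."""
    event = request.get_json(silent=True)
    if not event or not {"userId", "lat", "lng", "timestamp"} <= event.keys():
        return ("Bad request", 400)
    # Pub/Sub payloads are raw bytes, so keep the event as serialized JSON.
    publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
    return ("OK", 204)
```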
On the other side, we can subscribe to this topic in Cloud Dataflow. Cloud Dataflow is a managed batch and streaming data processing service that runs pipelines written in the Apache Beam framework. It reads events, joins them with the definition of the user's quarantine area and decides whether the user is inside or outside. When the user is outside, it publishes an alert event. Finally, we can stream all events into BigQuery for further data analytics.
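The core of such a pipeline could look roughly like the sketch below. The topic names, the table name and the hard-coded quarantine areas (which would really come from Firestore or another store) are all illustrative, and the geofence check is a simple haversine distance against an assumed radius:

```python
# pipeline.py - sketch of the Dataflow/Beam core (names are illustrative).
import json
import math

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

ALLOWED_RADIUS_M = 100  # assumed allowed distance from the home location


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two WGS84 points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (lat1, lng1, lat2, lng2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))


def evaluate(event, homes):
    """Joins an event with the user's quarantine area and flags violations."""
    home_lat, home_lng = homes[event["userId"]]
    distance = haversine_m(event["lat"], event["lng"], home_lat, home_lng)
    return {**event, "outside": distance > ALLOWED_RADIUS_M}


with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    # In a real pipeline the areas would come from a database; here they are
    # hard-coded to keep the sketch self-contained.
    homes = p | "QuarantineAreas" >> beam.Create([("user-1", (52.23, 21.01))])
    events = (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/location-events")
        | "Parse" >> beam.Map(json.loads)
        | "Evaluate" >> beam.Map(evaluate, homes=beam.pvalue.AsDict(homes)))
    (events
     | "OnlyViolations" >> beam.Filter(lambda e: e["outside"])
     | "Serialize" >> beam.Map(lambda e: json.dumps(e).encode("utf-8"))
     | "PublishAlerts" >> beam.io.WriteToPubSub(
         topic="projects/my-project/topics/quarantine-alerts"))
    events | "ToBigQuery" >> beam.io.WriteToBigQuery(
        "my-project:quarantine.location_events",
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)
```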
Possible improvements here are:
sending location events more frequently and computing sliding windows over multiple events; several consecutive events outside the quarantine area give stronger grounds to generate an alert;
using Wi-Fi or Bluetooth scan results in addition to GPS coordinates;
creating a time-partitioned table in BigQuery with clustering on a user id column to reduce query costs (see the sketch after this list).
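For the last point, a sketch of the table setup via the BigQuery client; the dataset and column names are assumptions matching the pipeline above:

```python
# create_table.py - partitioned, clustered events table (schema assumed).
from google.cloud import bigquery

client = bigquery.Client()
client.query("""
CREATE TABLE IF NOT EXISTS quarantine.location_events (
  userId    STRING,
  lat       FLOAT64,
  lng       FLOAT64,
  outside   BOOL,
  timestamp TIMESTAMP
)
PARTITION BY DATE(timestamp)  -- queries scan only the days they touch
CLUSTER BY userId             -- ...and only the relevant user blocks per day
""").result()
```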
Verification and alerting
Remember the event published when a user is outside the quarantine area? Now we can use this event to trigger a push notification to users and case officers! The number of such events will be significantly lower than the number of incoming location events.
A Cloud Function with a Cloud Pub/Sub trigger will do the job. This function obtains the user's fcmToken from Firestore and sends a push notification via Firebase Cloud Messaging.
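A minimal sketch of such a function using the Firebase Admin SDK; the "users" collection name and the notification text are assumptions, while the fcmToken field comes from the description above:

```python
# main.py - Cloud Function with a Cloud Pub/Sub trigger (names partly assumed).
import base64
import json

import firebase_admin
from firebase_admin import firestore, messaging

firebase_admin.initialize_app()
db = firestore.client()


def notify_violation(event, context):
    """Pushes a notification when a quarantine-violation event arrives."""
    alert = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    # The "users" collection name is an assumption; fcmToken is the field
    # mentioned in the text above.
    user = db.collection("users").document(alert["userId"]).get()
    messaging.send(messaging.Message(
        token=user.get("fcmToken"),
        notification=messaging.Notification(
            title="Quarantine alert",
            body="You appear to be outside your quarantine area. Please verify.",
        ),
    ))
```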
Process
The goal of quarantine is isolation: subjects cannot interact with others, period. A focus on precise verification will help keep people isolated.
Our idea is to give users a simple task that proves both their identity and their presence at the quarantine location.
We can base the verification process on:
Photos that show something that is unique and cannot be easily moved from the “quarantine location” (for example: a painting on the wall or a bookshelf);
Selfie;
The orientation of a device;
GPS location;
Reaction time.
At the beginning of their quarantine, users will be asked to perform a reference verification. They will need to memorize it, namely: what was in the picture and how the picture was taken.
Users will receive verification requests at random times, or when they are detected outside of their quarantine area. Here we can use a mix of Cloud Tasks, Cloud Scheduler and Cloud Functions.
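For example, Cloud Tasks lets us enqueue a verification request with a randomized schedule time. A sketch, with an assumed project, region, queue name and target endpoint:

```python
# schedule_verification.py - enqueues a verification at a random time
# (project, region, queue and endpoint names are assumptions).
import datetime
import json
import random

from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2

client = tasks_v2.CloudTasksClient()
queue = client.queue_path("my-project", "europe-west1", "verifications")


def schedule_random_verification(user_id: str, within_hours: int = 8):
    """Creates an HTTP task that fires at a random moment within the window."""
    delay = random.uniform(0, within_hours * 3600)
    schedule_time = timestamp_pb2.Timestamp()
    schedule_time.FromDatetime(
        datetime.datetime.utcnow() + datetime.timedelta(seconds=delay))
    client.create_task(parent=queue, task={
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": "https://europe-west1-my-project.cloudfunctions.net"
                   "/requestVerification",  # hypothetical endpoint
            "body": json.dumps({"userId": user_id}).encode(),
        },
        "schedule_time": schedule_time,
    })
```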
Once verification is triggered, the user will be asked to mimic reference verification as closely as possible. It will be much harder to cheat this method because the user would have to take fake photos, set fake GPS coordinates and even fake the orientation of the device in order to spoof the verification. Hopefully, the effort required to cheat the procedure will be enough to discourage people from breaking the quarantine rules.
Decision model
Now we need to assess whether the submitted verification matches the reference. We have five main input features: a photo of an object, a photo of the user's face, gyroscope measurements, GPS coordinates and timestamps.
In the preprocessing step, we need to compute features quantifying the differences between the reference and the actual verification. For the photos, we need a method that compares two images while discarding differences in lighting, shadows and simple distortions.
Accelerated-KAZE (AKAZE) is described as a “novel and fast multiscale feature detection and description approach that exploits the benefits of nonlinear scale spaces”. Scary name, but it’s already implemented in OpenCV. Essentially, it detects keypoints in each image and tries to find matching pairs between them.
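A sketch of such a comparison with OpenCV's AKAZE implementation; the ratio-test constant and the final threshold are assumptions that would need tuning on real photos:

```python
# image_match.py - compares two photos with AKAZE keypoints.
import cv2


def akaze_similarity(path_a: str, path_b: str) -> float:
    """Returns the fraction of keypoint matches that pass the ratio test."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    akaze = cv2.AKAZE_create()
    _, desc_a = akaze.detectAndCompute(img_a, None)
    _, desc_b = akaze.detectAndCompute(img_b, None)
    # Hamming distance suits AKAZE's binary descriptors; Lowe's ratio test
    # filters out ambiguous matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(desc_a, desc_b, k=2)
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]
    return len(good) / max(len(matches), 1)


# Example usage - the 0.3 threshold is a guess to be tuned on real photos:
# same_scene = akaze_similarity("reference.jpg", "verification.jpg") > 0.3
```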
For face photos, we can use well-known face matching algorithms based on face keypoints and pre-trained models. Alternatively, there are commercial APIs like Azure Cognitive Services Face Verification.
Having numerical features, we can use heuristics or train a binary classification ML model. For a prototype, we'll be fine with a simple heuristic that calculates a weighted average and returns a boolean decision based on a specified threshold, as sketched below.
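A sketch of such a heuristic; the feature names, weights and threshold are purely illustrative:

```python
# decision.py - weighted-average heuristic over verification features
# (feature names, weights and the threshold are illustrative).

# Each score is assumed to be normalized to [0, 1],
# where 1 means "identical to the reference verification".
WEIGHTS = {
    "object_photo": 0.35,
    "face_match": 0.35,
    "orientation": 0.10,
    "gps": 0.15,
    "reaction_time": 0.05,
}
THRESHOLD = 0.7


def verification_passed(scores: dict) -> bool:
    """True when the weighted average of feature scores clears the threshold."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) >= THRESHOLD
```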
GCP has many services to support machine learning solutions. For example, the model can be deployed as a Cloud Run service or, in the future, on AI Platform.
Finally, the result is saved back to the verification document in Firestore and returned to the user.
Data analysis and reporting
Location data is streamed to BigQuery. We can also capture the history of user data stored in Firestore documents using the out-of-the-box Firebase Extension: Firestore BigQuery Export.
Now, we have all our data in a single place (locations, user profile, verification results). This gives us a detailed view of the current state of the system and how it has changed over time.
Example analyses that we can perform on such a data model are:
Detailed stats on how well people follow quarantine rules;
Areas with quarantined users and how they change over time, which could uncover hidden trends in the geographic spread of the virus or potential outbreaks;
Cohort analysis.
BigQuery comes with GIS and machine learning features. Using BQML we can train classification, regression and clustering models with simple SQL statements.
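As an illustration, the snippet below trains a logistic regression model in BQML that estimates the likelihood of a violation from time-of-day features; the schema follows the events table assumed earlier:

```python
# bqml_train.py - trains a BQML model with plain SQL (schema names assumed).
from google.cloud import bigquery

client = bigquery.Client()
client.query("""
CREATE OR REPLACE MODEL quarantine.violation_model
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['outside']) AS
SELECT
  EXTRACT(HOUR FROM timestamp) AS hour_of_day,
  EXTRACT(DAYOFWEEK FROM timestamp) AS day_of_week,
  outside
FROM quarantine.location_events
""").result()
```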
BigQuery integrates seamlessly with BI tools like Data Studio, Tableau or Looker. Such dashboards can serve as simple yet powerful monitoring tools summarizing key metrics and providing useful insights.
Bells and whistles!
The core of the system is ready. We have fulfilled the proposed use-cases. The functionality of the system can be extended in multiple ways:
Tracking users' health status to predict the likelihood of infection;
Rewarding users with achievements or promo codes to increase their engagement;
Optimizing case officers' work by suggesting next actions;
Making the system GDPR compliant;
Anonymizing and aggregating data to share with the public.
Conclusions
In this post, we’ve addressed a real problem: a system that can significantly improve quarantine tracking and directly contribute to reducing the number of SARS-CoV-2 victims.
We utilized the Google Cloud Platform offering to collect, process and store data and to host services. Managed, serverless services like BigQuery, Cloud Functions or AI Platform can significantly reduce time-to-market while still providing enough scalability, reliability and security, which makes them a perfect fit for such a use-case.
We’ve seen countries like South Korea use such technology before. It has a significant impact on decreasing the growth rate and avoiding “super-spreaders”. All the technology needed to save lives is ready; what remains is to glue it together!
Final note
The described solution was designed and prototyped for https://www.hackcrisis.com/, a hackathon supported by the Polish government. The project was selected for the final TOP 30 out of over 180 submissions. The solution is the result of the equally hard work of a team of me and my friends from outside GetInData: Łukasz Niedziałek, Bartłomiej Margas, Wojciech Pyrak, Katarzyna Rode, Jakub Wyszomierski and Aleksandra.
Finally, I would like to say many thanks to the GetInData team for their support and the opportunity to share the solution.