Ryder Nguyen

Summary

Building on top of the knowledge gained about ARIMA & Rt from the past hackathon with other Hult students, I wanted to learn more about the advantages that Google Cloud Platform could offer. This project involves:

Fully designing the data pipeline from EXTRACT (from external sources and within the GCP platform) to TRANSFORM (using queries in BigQuery, pandas & APIs) to LOAD (linking/blending tables for visualization in Data Studio)
Utilizing the managed cluster workflow to spin up a cluster, process tasks and automatically spin down the cluster to save resources
Incorporate an ARIMA model to forecast the number of confirmed and deaths cases in 50 states

For better viewing experience of the dashboard, please view on a desktop device.

Data Pipeline Process

Lumiata (Hackathon)

Original Repository

A group of Hult students explore and built an ARIMA & Rt model using Python for a hackathon

Insights

We predict how many confirmed cases is going to appeer in each county in the next 7 days using the ARIMA model we forecast new cases in the coming week. The ARIMA model works by finding a linear regression based on the time series previous timelags (AR, auto-regression). To make the time series stationary we need to differentiate it (I, integrated). It also looks at the moving average (MA) when forecasting. We use a gridsearch function to find the appropriate orders for each county.

01 According to our calculated Rt

According to our calculated Rt, as of 04.30.20, the pandemic is dying out in San Francisco, with an Rt below 1.
02 Counties with the highest prediction

The five counties with the highest predicted new cases over the next week, from 04.30.20, are:
New York City, New York, 19077
Cook, Illinois, 11186
Los Angeles, California, 6642
Philadelphia, Pennsylvania, 3484
Middlesex, Massachusetts, 3435

Our Team

Tien "Ryder" Nguyen

Data Analyst LinkedIn

Luis E. Ariza

Data Analyst LinkedIn

Mats Borgen Lunde

Data Analyst LinkedIn

Punyisa Kraisang

Software Engineer LinkedIn

Stephanie Dominguez

Software Engineer LinkedIn

COVID-19

Summary

Updated Repository

Medium Story

Building on top of the knowledge gained about ARIMA & Rt from the past hackathon with other Hult students, I wanted to learn more about the advantages that Google Cloud Platform could offer. This project involves:

Data Pipeline Process

Lumiata (Hackathon)

Original Repository

Insights

Our Team

Tien "Ryder" Nguyen

Luis E. Ariza

Mats Borgen Lunde

Punyisa Kraisang

Stephanie Dominguez