Insights from Data with BigQuery: Challenge Lab

Ajil T U
6 min readOct 3, 2020

Detailed walk-through of the Insights from Data with BigQuery Skill Badge on Google Cloud Platform.

Photo by Campaign Creators on Unsplash

This medium article focuses on the detailed walk through of the steps I took to solve the challenge lab Insights from Data with BigQuery: Challenge Lab in Google Cloud Skill Badge on the Google Cloud Platform.

This lab is only recommended for students who have completed the quest BigQuery Basics for Data Analysts on Qwiklabs.

Activate Cloud Shell

In the Cloud Console, in the top right toolbar, click the Activate Cloud Shell button.

Click Continue.

It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID. For example:

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.

You can list the active account name with this command:

gcloud auth list

You can list the project ID with this command:

gcloud config list project

Open BigQuery Console

In the Google Cloud Console, select Navigation menu > BigQuery:

The Welcome to BigQuery in the Cloud Console message box opens. This message box provides a link to the quickstart guide and the release notes.

Click Done.

The BigQuery console opens.

Click on your project ID

Query 1: Total Confirmed Cases

Build a query that will answer “What was the total count of confirmed cases on Apr 15, 2020?” The query needs to return a single row containing the sum of confirmed cases across all countries. The name of the column should be total_cases_worldwide.

Copy and paste the following query into the BigQuery Query editor:

SELECT sum(cumulative_confirmed) as total_cases_worldwide FROM `bigquery-public-data.covid19_open_data.covid19_open_data` where date='2020-04-15'

Query 2: Worst Affected Areas

Build a query for answering “How many states in the US had more than 100 deaths on Apr 10, 2020?” The query needs to list the output in the field count_of_states. Hint: Don’t include NULL values.

click on compose new query and then copy and paste the following query into the BigQuery Query editor:

with deaths_by_states as ( SELECT subregion1_name as state, sum(cumulative_deceased) as death_count FROM `bigquery-public-data.covid19_open_data.covid19_open_data` where country_name="United States of America" and date='2020-04-10' and subregion1_name is NOT NULL group by subregion1_name ) select count(*) as count_of_states from deaths_by_states where death_count > 100

Query 3: Identifying Hotspots

Build a query that will answer “List all the states in the United States of America that had more than 1000 confirmed cases on Apr 10, 2020?” The query needs to return the State Name and the corresponding confirmed cases arranged in descending order. Name of the fields to return state and total_confirmed_cases.

SELECT * FROM ( 
SELECT subregion1_name as state, sum(cumulative_confirmed) as total_confirmed_cases
FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE country_code="US" AND date='2020-04-10' AND subregion1_name is NOT NULL
GROUP BY subregion1_name ORDER BY total_confirmed_cases DESC ) WHERE total_confirmed_cases > 1000

Query 4: Fatality Ratio

Build a query that will answer “What was the case-fatality ratio in Italy for the month of April 2020?” Case-fatality ratio here is defined as (total deaths / total confirmed cases) * 100. Write a query to return the ratio for the month of April 2020 and containing the following fields in the output: total_confirmed_cases, total_deaths, case_fatality_ratio.

1) SELECT sum(cumulative_confirmed) as total_confirmed_cases, sum(cumulative_deceased) as total_deaths, (sum(cumulative_deceased)/sum(cumulative_confirmed))*100 as case_fatality_ratio FROM `bigquery-public-data.covid19_open_data.covid19_open_data` where country_name="Italy" and date='2020-04-30'2) SELECT sum(cumulative_confirmed) as total_confirmed_cases, sum(cumulative_deceased) as total_deaths, (sum(cumulative_deceased)/sum(cumulative_confirmed))*100 as case_fatality_ratio FROM `bigquery-public-data.covid19_open_data.covid19_open_data` where country_name="Italy" AND date BETWEEN '2020-04-01'and '2020-04-30'

Query 5: Identifying specific day

SELECT dateFROM `bigquery-public-data.covid19_open_data.covid19_open_data` where country_name="Italy" and cumulative_deceased>10000 order by date asc limit 1

Query 6: Finding days with zero net new cases

WITH india_cases_by_date AS ( SELECT date, SUM( cumulative_confirmed ) AS cases FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE country_name ="India" AND date between '2020-02-21' and '2020-03-15' GROUP BY date ORDER BY date ASC ) , india_previous_day_comparison AS (SELECT date, cases, LAG(cases) OVER(ORDER BY date) AS previous_day, cases - LAG(cases) OVER(ORDER BY date) AS net_new_cases FROM india_cases_by_date ) select count(*) from india_previous_day_comparison where net_new_cases=0

Query 7: Doubling rate

WITH us_cases_by_date AS (SELECTdate,SUM(cumulative_confirmed) AS casesFROM`bigquery-public-data.covid19_open_data.covid19_open_data`WHEREcountry_name="United States of America"AND date between '2020-03-22' and '2020-04-20'GROUP BYdateORDER BYdate ASC), us_previous_day_comparison AS(SELECTdate,cases,LAG(cases) OVER(ORDER BY date) AS previous_day,cases - LAG(cases) OVER(ORDER BY date) AS net_new_cases,(cases - LAG(cases) OVER(ORDER BY date))*100/LAG(cases) OVER(ORDER BY date) AS percentage_increaseFROM us_cases_by_date)select Date, cases as Confirmed_Cases_On_Day, previous_day as Confirmed_Cases_Previous_Day, percentage_increase as Percentage_Increase_In_Casesfrom us_previous_day_comparisonwhere percentage_increase > 10

Query 8: Recovery rate

WITH cases_by_country AS ( SELECT country_name AS country, sum(cumulative_confirmed) AS cases, sum(cumulative_recovered) AS recovered_cases FROM bigquery-public-data.covid19_open_data.covid19_open_data WHERE date = '2020-05-10' GROUP BY country_name ) , recovered_rate AS(SELECT country, cases, recovered_cases, (recovered_cases * 100)/cases AS recovery_rate FROM cases_by_country ) SELECT country, cases AS confirmed_cases, recovered_cases, recovery_rate FROM recovered_rate WHERE cases > 50000 ORDER BY recovery_rate desc LIMIT 10

Query 9: CDGR — Cumulative Daily Growth Rate

WITH france_cases AS ( SELECT date, SUM(cumulative_confirmed) AS total_cases FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE country_name="France" AND date IN ('2020-01-24', '2020-05-10') GROUP BY date ORDER BY date) , summary as ( SELECT total_cases AS first_day_cases, LEAD(total_cases) OVER(ORDER BY date) AS last_day_cases, DATE_DIFF(LEAD(date) OVER(ORDER BY date),date, day) AS days_diff FROM france_cases LIMIT 1 ) select first_day_cases, last_day_cases, days_diff, POW((last_day_cases/first_day_cases),(1/days_diff))-1 as cdgr from summary

Create a Datastudio report

SELECT date, SUM(cumulative_confirmed) AS country_cases, SUM(cumulative_deceased) AS country_deaths FROM `bigquery-public-data.covid19_open_data.covid19_open_data` WHERE date BETWEEN '2020-03-15' AND '2020-04-30' AND country_name ="United States of America" GROUP BY date

Click on EXPLORE DATA > Explore with Data Studio.

Authorize Data Studio to access BigQuery.

You may fail to create a report for the first-time logon of Data Studio. Click + Blank Report and accept the Terms of Service. Go back to the BigQuery page and click Explore with Data Studio again.

In the new Data Studio report, select Add a chart > Time series Chart.

Wait for a minute.

Click Check my progress to verify the objective.

Add country_cases and country_deaths to the Metric field.

Click Save to commit the change.

--

--