Engineer Data in Google Cloud: Challenge Lab

Ajil T U
2 min readOct 8, 2020

Detailed walk-through of the Engineer Data in Google Cloud Skill Badge on Google Cloud Platform.

Photo by Luke Chesser on Unsplash

This medium article focuses on the detailed walk through of the steps I took to solve the challenge lab Engineer Data in Google Cloud: Challenge Lab in Google Cloud Skill Badge on the Google Cloud Platform.

This lab is only recommended for students who have completed the quest Data Engineering on Qwiklabs.

Open BigQuery Console

In the Google Cloud Console, select Navigation menu > BigQuery:

The Welcome to BigQuery in the Cloud Console message box opens. This message box provides a link to the quickstart guide and the release notes.

Click Done.

The BigQuery console opens.

Click on your project ID > Taxirides > historical_taxi_rides_raw

Task 1: Clean your training data

Copy and paste the following query into the BigQuery Query editor:

CREATE OR REPLACE TABLE
taxirides.taxi_training_data AS
SELECT
(tolls_amount + fare_amount) AS fare_amount,
pickup_datetime,
pickup_longitude AS pickuplon,
pickup_latitude AS pickuplat,
dropoff_longitude AS dropofflon,
dropoff_latitude AS dropofflat,
passenger_count AS passengers,
FROM
taxirides.historical_taxi_rides_raw
WHERE
RAND() < 0.001
AND trip_distance > 0
AND fare_amount >= 2.5
AND pickup_longitude > -78
AND pickup_longitude < -70
AND dropoff_longitude > -78
AND dropoff_longitude < -70
AND pickup_latitude > 37
AND pickup_latitude < 45
AND dropoff_latitude > 37
AND dropoff_latitude < 45
AND passenger_count > 0

Task 2: Create a BQML model called taxirides.fare_model

CREATE OR REPLACE MODEL taxirides.fare_modelTRANSFORM(  * EXCEPT(pickup_datetime)   , ST_Distance(ST_GeogPoint(pickuplon, pickuplat), ST_GeogPoint(dropofflon, dropofflat)) AS euclidean  , CAST(EXTRACT(DAYOFWEEK FROM pickup_datetime) AS STRING) AS dayofweek  , CAST(EXTRACT(HOUR FROM pickup_datetime) AS STRING) AS hourofday)OPTIONS(input_label_cols=['fare_amount'], model_type='linear_reg') AS SELECT * FROM taxirides.taxi_training_data

Task 3: Perform a batch prediction on new data

CREATE OR REPLACE TABLE taxirides.2015_fare_amount_predictions  ASSELECT * FROM ML.PREDICT(MODEL taxirides.fare_model,(  SELECT * FROM taxirides.report_prediction_data))​

--

--