ETL

In this project, we have built an ETL pipeline in Python that extracts weather data from an API with the requests library, transforms the obtained data with pandas and NumPy, and finally loads it into a DynamoDB table with the awswrangler library.


Before Starting

For this project, we will use the API of meteoblue.com, which issues us an API key to authenticate our calls. That key will be saved in a config.py file.
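A minimal config.py could look like the following sketch; the value shown is a placeholder for the key issued by meteoblue.com.

```python
# config.py
# Placeholder; replace with the key issued by meteoblue.com.
API_KEY = "your-meteoblue-api-key"
```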




Extract

Meteoblue.com offers many weather-related data packages; we chose the 7-day weather package and the air quality package.
In the code below, we request the data from the API.
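Since the original snippet is not reproduced here, the following is a minimal sketch of the extraction step. The endpoint URL, package names, and coordinates are assumptions; check meteoblue's API documentation for the exact packages you request.

```python
import requests

from config import API_KEY

# Illustrative coordinates; the package names in the URL are assumptions
# and should be checked against meteoblue's API documentation.
LAT, LON = 40.42, -3.70
URL = "https://my.meteoblue.com/packages/basic-day_airquality-day"

params = {"apikey": API_KEY, "lat": LAT, "lon": LON, "format": "json"}

response = requests.get(URL, params=params, timeout=30)
response.raise_for_status()  # fail fast on HTTP errors
data = response.json()       # parsed JSON with the daily weather data
```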


Transform

To transform the data we use pandas together with NumPy. Before converting the JSON into a dataframe, we select the values we are going to work with.
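As a sketch of that selection, assuming the JSON exposes a data_day object holding one list per field (the key and column names below are illustrative and depend on the packages requested):

```python
import pandas as pd

# "data_day" and the field names are assumptions about the JSON layout;
# adapt them to the fields the API actually returns.
daily = data["data_day"]

df = pd.DataFrame({
    "time": daily["time"],
    "temperature_max": daily["temperature_max"],
    "temperature_min": daily["temperature_min"],
    "felttemperature_max": daily["felttemperature_max"],
    "precipitation": daily["precipitation"],
    "dust_concentration": daily["dust_concentration"],
    "uvindex": daily["uvindex"],
    "airquality": daily["airquality"],
})
```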

This data will be loaded into a DynamoDB table, so we replace all the NaN values and add an id column; DynamoDB requires a partition (hash) key, so the id is stored as a hash.
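One way to do both, sketched here with NaN replaced by zero and an MD5 hash of each row's date used as the id (the sentinel value and hash function are assumptions):

```python
import hashlib

import numpy as np

# DynamoDB items cannot hold NaN, so replace missing values with a sentinel.
df = df.replace({np.nan: 0})

# The table's partition (hash) key, derived here by hashing each row's date.
df["id"] = df["time"].apply(
    lambda t: hashlib.md5(str(t).encode("utf-8")).hexdigest()
)
```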

We convert the temperature and felt-temperature values into integers to display the data more clearly, and we convert the precipitation and dust concentration values to the object type.
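A sketch of those conversions follows; casting the floats away also conveniently sidesteps DynamoDB's restriction on float values (it expects Decimal for non-integer numbers):

```python
# Integer temperatures read more clearly in the table.
int_cols = ["temperature_max", "temperature_min", "felttemperature_max"]
df[int_cols] = df[int_cols].astype(int)

# Strings (object dtype) avoid DynamoDB's restriction on float values.
obj_cols = ["precipitation", "dust_concentration"]
df[obj_cols] = df[obj_cols].astype(str)
```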

For the UV index and air quality columns, we classify the values into categories according to their percentage ranges.
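A sketch of that classification using pandas.cut; the cut points and labels below are assumptions, so adjust them to the scale the API actually returns:

```python
# Bucket the UV index into the usual forecast categories.
uv_bins = [0, 3, 6, 8, 11, np.inf]
uv_labels = ["Low", "Moderate", "High", "Very High", "Extreme"]
df["uvindex"] = pd.cut(
    df["uvindex"], bins=uv_bins, labels=uv_labels, right=False
).astype(str)

# Bucket the air quality values by percentage range (labels illustrative).
aq_bins = [0, 20, 40, 60, 80, np.inf]
aq_labels = ["Very Good", "Good", "Fair", "Poor", "Very Poor"]
df["airquality"] = pd.cut(
    df["airquality"], bins=aq_bins, labels=aq_labels, right=False
).astype(str)
```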

Load

AWS Data Wrangler is the AWS SDK for pandas (awswrangler); we will use it to load our dataframe into a DynamoDB table.
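A minimal sketch of the load step with awswrangler's DynamoDB writer; the table name is a placeholder, and the table must already exist with a partition key matching the id column:

```python
import awswrangler as wr

# Writes each dataframe row as an item; "weather_data" is a placeholder
# for an existing table whose partition key is "id".
wr.dynamodb.put_df(df=df, table_name="weather_data")
```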

After the process finishes, the result is the transformed records stored in the DynamoDB table.

This ETL has been built one particular way, but there are other, more powerful possibilities: for example, the extraction step can pull from several sources of information, the transformation step can combine them into more useful information, and the load step can write the result to another database or feed it into graphs.