Writing production-ready ETL pipelines in Python / Pandas

Are you interested in learning how to write professional ETL pipelines using best practices in Python and Data Engineering? Look no further than the course "Writing production-ready ETL pipelines in Python / Pandas." The course offers practical interactive lessons, theory lessons where needed, the Python code for each lesson in the course material, the whole project on GitHub, and a ready-to-use Docker image with the application code on Docker Hub. You can also download PowerPoint slides for each theory lesson, along with useful links for each topic and step where you can find more information and go further in depth.

Course Overview

The course "Writing production-ready ETL pipelines in Python / Pandas" is a comprehensive guide on how to write ETL pipelines using Python code and Pandas. The course will show you each step to write an ETL pipeline from scratch to production using the necessary tools such as Python 3.9, Jupyter Notebook, Git and Github, Visual Studio Code, Docker and Docker Hub, and the Python packages Pandas, boto3, pyyaml, awscli, jupyter, pylint, moto, coverage, and the memory-profiler.

The course covers the two approaches to coding most common in the Data Engineering field: functional and object-oriented programming. It also introduces and applies best practices for developing Python code, such as design principles, clean coding, virtual environments, project/folder setup, configuration, logging, exception handling, linting, dependency management, performance tuning with profiling, unit testing, integration testing, and dockerization.
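
To make the two styles concrete, here is a minimal, hypothetical sketch of the same transformation step written both ways. The function, class, and column names are illustrative only and are not taken from the course code:

import pandas as pd

# Functional style: a small, stateless function that can be composed into a pipeline
def add_change_percent(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["change_percent"] = (
        (df["closing_price"] - df["opening_price"]) / df["opening_price"] * 100
    )
    return df

# Object-oriented style: state such as configuration lives on the class
class Transformer:
    def __init__(self, precision: int = 2):
        self.precision = precision

    def add_change_percent(self, df: pd.DataFrame) -> pd.DataFrame:
        df = df.copy()
        change = (df["closing_price"] - df["opening_price"]) / df["opening_price"] * 100
        df["change_percent"] = change.round(self.precision)
        return df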

The goal of the course is to extract the Xetra dataset from an AWS S3 source bucket on a schedule, create a report using transformations, and load the transformed data to another AWS S3 target bucket, all via an ETL pipeline written so that it can be deployed easily to almost any production environment that can handle containerized applications. The production environment the pipeline targets consists of a GitHub code repository, a Docker Hub image repository, an execution platform such as Kubernetes, and an orchestration tool such as the container-native Kubernetes workflow engine Argo Workflows or Apache Airflow.
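
As an illustration of that goal, here is a minimal extract-transform-load sketch in Python with pandas and boto3. The bucket names, object keys, and report columns below are placeholders and do not reflect the course's actual code (the real files are partitioned by date, and the course's report logic is more elaborate):

import io

import boto3
import pandas as pd

SRC_BUCKET = "my-xetra-source-bucket"  # placeholder name
TGT_BUCKET = "my-xetra-target-bucket"  # placeholder name

def extract(s3, bucket: str, key: str) -> pd.DataFrame:
    # Extract: read one CSV object from the source bucket into a DataFrame
    obj = s3.Object(bucket, key).get()
    return pd.read_csv(io.BytesIO(obj["Body"].read()))

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Transform: build a small per-instrument report (illustrative aggregation)
    return df.groupby("ISIN", as_index=False).agg(
        opening_price=("StartPrice", "first"),
        closing_price=("EndPrice", "last"),
        traded_volume=("TradedVolume", "sum"),
    )

def load(s3, df: pd.DataFrame, bucket: str, key: str) -> None:
    # Load: write the report to the target bucket as CSV
    body = df.to_csv(index=False).encode("utf-8")
    s3.Object(bucket, key).put(Body=body)

def run() -> None:
    s3 = boto3.resource("s3")
    df = extract(s3, SRC_BUCKET, "2022-01-03/sample.csv")
    load(s3, transform(df), TGT_BUCKET, "xetra_report_2022-01-03.csv")

In the course, each of these steps grows into its own configurable, tested module; the sketch only shows the overall shape of the pipeline.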

Course Content

The course consists of practical interactive lessons, theory lessons where needed, PowerPoint slides, and useful links for each topic and step where you can find more information and go further in depth.

The practical interactive lessons are where you code and implement the pipeline yourself. In these lessons, you will write an ETL pipeline with Pandas in Python from scratch to production step by step, debug the code, and apply best practices for developing Python code.

The theory lessons introduce new data engineering topics such as configuration, logging, dependency management, and error handling. These topics are explained in PowerPoint slides that you can download and use as a reference in your future projects.
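
As a hedged sketch of how those topics typically fit together, the snippet below loads a YAML configuration file, configures logging from it, and wraps the job in basic exception handling. The file path and configuration keys are illustrative, not the course's actual layout:

import logging
import logging.config

import yaml

def main() -> None:
    # Configuration: read the application settings from a YAML file (path is illustrative)
    with open("configs/config.yml", encoding="utf-8") as f:
        config = yaml.safe_load(f)

    # Logging: assume the YAML contains a dictConfig-style "logging" section
    logging.config.dictConfig(config["logging"])
    logger = logging.getLogger(__name__)

    # Error handling: log the full traceback instead of failing silently
    try:
        logger.info("Starting ETL job")
        # ... run the pipeline here ...
    except Exception:
        logger.exception("ETL job failed")
        raise

if __name__ == "__main__":
    main()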

Course Benefits

This course offers many benefits: a deeper understanding of Data Engineering and ETL pipelines, familiarity with best practices for developing Python code, and proficiency with the tools needed to deploy an ETL pipeline to a production environment.

Another benefit is the dataset itself: the Xetra data you will extract, transform, and load is derived in near real time, minute by minute, from Deutsche Börse's trading system and stored in an AWS S3 bucket. Working with it gives you real-world experience and shows you how ETL pipelines behave in real-life scenarios.
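
If you want to peek at the data before starting the course, a few lines of boto3 are enough. The bucket name below is the one under which the dataset was published on the AWS Open Data registry; treat it as an assumption, since public datasets can move or be retired:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous access suffices for a public bucket (no AWS credentials required)
s3 = boto3.resource("s3", config=Config(signature_version=UNSIGNED))
bucket = s3.Bucket("deutsche-boerse-xetra-pds")  # assumed public bucket name

# Objects are partitioned by date prefix; list a handful of files for one day
for obj in bucket.objects.filter(Prefix="2022-01-03/").limit(5):
    print(obj.key)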

By taking this course, you also get access to the Python code for each lesson in the course material, the whole project on GitHub, and a ready-to-use Docker image with the application code on Docker Hub. Debugging the code and applying the best practices yourself will deepen your understanding of it further.

Course Ratings

The course has an impressive aggregate rating of 4.35 out of 5, based on ratings from 605 students who have taken it and benefited from its contents.

This high rating is a testament to the quality of the course and the expertise of the instructors who have developed it. The course has received positive reviews from students who praise its content, structure, and the practical experience it provides.

Course Review

If you're interested in learning how to write professional ETL pipelines using best practices in Python and Data Engineering, "Writing production-ready ETL pipelines in Python / Pandas" is the course for you. This comprehensive guide offers practical interactive lessons and theory lessons where needed, and gives you access to the Python code for each lesson, the whole project on GitHub, and a ready-to-use Docker image with the application code on Docker Hub. You will gain a deeper understanding of Data Engineering and ETL pipelines, learn best practices for developing Python code, and become proficient with the tools needed to deploy an ETL pipeline to a production environment. You will also work with a real dataset that gives you real-world experience of how ETL pipelines behave in real-life scenarios. With its aggregate rating of 4.35 out of 5, this course comes highly recommended by the students who have taken it.
