Start Data Engineering

A newsletter with tutorials, data design patterns, open-source tools, and techniques used by data-driven companies to help you become a better data engineer.
Date Title
2026-04-04 3 Data Storage Techniques Every Data Engineer Should Know
2026-03-28 4 Data Engineering Concepts To Land A High-Paying Data Engineering Job
2026-03-07 How to Implement Data Quality Checks in Python Without Third-Party Tools
2026-02-15 Free Airflow 3.0 Tutorial
2026-02-08 Use Given/When/Then Specs to Make AI Generate Production-Ready Pipelines, Not Spaghetti Code
2026-01-18 Python Notebooks in Production: How marimo Solves Jupyter’s Biggest Problems for Software Engineers
2026-01-17 Demonstrate Python Expertise by Building Libraries: From Architecture to Published Package
2026-01-10 How to Write Integration Tests for Python Data Pipelines
2026-01-03 How to Create Python Data Pipelines by Defining Architecture and Generating Code with LLMs
2025-08-13 How to Use Spark SQL Merge Into - Step-by-Step Tutorial
2025-08-12 Six Data Modeling Techniques For Building Production-Ready Tables Fast
2025-08-11 Free 10-Minute Polars Tutorial for Data Engineers
2025-08-10 Free Python Standard Library How-to Cheatsheet for Data Engineers
2025-08-09 How to Get Really Good at Advanced SQL for Data Engineering
2025-08-05 How to quickly set up a local Spark development environment?
2025-06-10 Using Joins and Group Bys the right way for data warehousing
2025-06-07 CTEs(Common Table Expression) or Temporary Tables for Spark SQL
2025-06-03 Advanced SQL is knowing how to model the data & get there effectively
2025-05-05 Data Engineering Interview Preparation Series #3: SQL
2025-04-14 How to Extract Data from APIs for Data Pipelines using Python
2025-04-05 How to create an SCD2 Table using MERGE INTO with Spark & Iceberg
2025-03-18 How to quickly deliver data to business users? #1. Adv Data types & Schema evolution
2025-03-01 How to Manage Upstream Schema Changes in Data Driven Fast Moving Company
2025-02-16 Visual Studio Code (VSCode) extensions for data engineers
2025-02-10 Should Data Pipelines in Python be Function based or Object-Oriented (OOP)?
2025-02-03 How to turn a 1000-line messy SQL into a modular, & easy-to-maintain data pipeline?
2025-01-28 How to ensure consistent metrics in your warehouse
2025-01-20 Data Engineering Interview Preparation Series #2: System Design
2024-12-18 How to reference a seed from a different dbt project?
2024-11-22 What do Snowflake, Databricks, Redshift, BigQuery actually do?
2024-10-17 25 SQL tips to level up your data engineering skills
2024-10-14 How to use nested data types effectively in SQL
2024-09-23 How to decide on a data project for your portfolio
2024-09-18 How to build a data project with step-by-step instructions
2024-09-05 What are the Key Parts of Data Engineering?
2024-08-13 Data Engineering Interview Preparation Series #1: Data Structures and Algorithms
2024-07-26 How to implement data quality checks with greatexpectations
2024-07-16 What are the types of data quality checks?
2024-07-01 SQL or Python for Data Transformations?
2024-06-24 Why use Apache Airflow (or any orchestrator)?
2024-06-14 Data Engineering Projects
2024-06-12 Data Engineering Project for Beginners - Batch edition
2024-06-11 Build Data Engineering Projects, with Free Template
2024-05-30 Python Essentials for Data Engineers
2024-05-29 dbt(Data Build Tool) Tutorial
2024-05-28 Building Cost Efficient Data Pipelines with Python & DuckDB
2024-05-21 Enable stakeholder data access with Text-to-SQL RAGs
2024-05-09 How to reduce your Snowflake cost
2024-04-22 How to test PySpark code with pytest
2024-04-22 Docker Fundamentals for Data Engineers
2024-02-22 Data Engineering Best Practices - #2. Metadata & Logging
2023-12-13 Uplevel your dbt workflow with these tools and techniques
2023-11-14 What is an Open Table Format? & Why to use one?
2023-10-25 6 Steps to Avoid Messy Data in Your Warehouse
2023-07-20 Data Engineering Best Practices - #1. Data flow & Code
2023-06-30 What is a self-serve data platform & how to build one
2023-06-13 How to become a valuable data engineer
2023-05-15 Data Engineering Project: Stream Edition
2023-02-15 Change Data Capture, with Debezium
2023-01-12 Data Pipeline Design Patterns - #2. Coding patterns in Python
2022-12-11 Data Pipeline Design Patterns - #1. Data flow patterns
2022-08-11 How to gather requirements for your data project
2022-06-24 5 Steps to land a high paying data engineering job
2022-05-18 Setting up a local development environment for python data projects using Docker
2022-04-12 What is the difference between a data lake and a data warehouse?
2022-03-18 End-to-end data engineering project - batch edition
2022-02-22 Automating data testing with CI pipelines, using Github Actions
2021-12-12 How to choose the right tools for your data pipeline
2021-11-11 Setting up end-to-end tests for cloud data pipelines
2021-10-22 How to improve at SQL as a data engineer
2021-10-12 6 Responsibilities of a Data Engineer
2021-10-12 6 Key Concepts, to Master Window Functions
2021-10-12 What's the difference between ETL & ELT?
2021-10-12 What are Common Table Expressions(CTEs) and when to use them?
2021-10-12 How to add tests to your data pipelines
2021-10-11 10 Skills to Ace Your Data Engineering Interviews
2021-10-05 What is a staging area?
2021-10-03 What is a Data Warehouse?
2021-09-16 How to Scale Your Data Pipelines
2021-08-29 Understand & Deliver on Your Data Engineering Task
2021-08-17 4 Key Patterns to Load Data Into A Data Warehouse
2021-07-21 How to Validate Datatypes in Python
2021-06-25 Designing a Data Project to Impress Hiring Managers
2021-05-13 How to make data pipelines idempotent
2021-04-26 Writing memory efficient data pipelines in Python
2021-04-08 How to gather requirements to re-engineer a legacy data pipeline
2021-03-27 How to trigger a spark job from AWS Lambda
2021-02-28 How to set up a dbt data-ops workflow, using dbt cloud and Snowflake
2021-02-13 Apache Superset Tutorial
2021-02-07 How to Join a fact and a type 2 dimension (SCD2) table
2021-01-30 How to update millions of records in MySQL?
2021-01-16 How to unit test sql transforms in dbt
2021-01-06 How to Backfill a SQL query using Apache Airflow
2021-01-01 How to do Change Data Capture (CDC), using Singer
2020-11-08 How to Pull Data from an API, Using AWS Lambda
2020-10-12 How to submit Spark jobs to EMR cluster from Airflow
2020-07-26 Ensuring Data Quality, With Great Expectations
2020-07-11 Designing a “low-effort” ELT system, using stitch and dbt
2020-06-19 3 Key techniques, to optimize your Apache Spark code
2020-06-11 What, why, when to use Apache Kafka, with an example
2020-06-02 A proven approach to land a Data Engineering job
2020-05-02 What Does It Mean for a Column to Be Indexed
2020-04-25 Advantages of Using dbt(Data Build Tool)
2020-04-18 Apache Airflow Review: the good, the bad
2020-04-11 Review: Building a Real Time Data Warehouse
2020-04-05 3 Key Points to Help You Partition Late Arriving Events
2020-03-29 Scheduling a SQL script, using Apache Airflow, with an example
2020-03-20 10 Key skills, to help you become a data engineer