Visual Studio Code (VSCode) extensions for data engineers

1. Introduction

Whether you are setting up Visual Studio Code for your colleagues or want to improve your own workflow, there are tons of extensions available. If you have wondered:

What are the best Visual Studio Code extensions for data engineers?

How do I share my Visual Studio Code environment with my colleagues?

How do Visual Studio Code user settings, workspace settings, devcontainers, and profiles work?

Then this post is for you!

Imagine being able to quickly set up Visual Studio Code on any laptop exactly how you want it. You won’t notice that you are coding on a different machine!

In this post, we will go over Visual Studio Code’s settings hierarchy, how to set up Visual Studio Code on any machine exactly to your liking with profiles, useful extensions for data engineering, and the caveats of unrestricted extensions.

By the end of this post, you will have set up Visual Studio Code exactly how you like it and be able to share that setup with other data engineers. Let’s get started.

TL;DR If you want a setup for data engineering, follow this short video in your project directory:

2. Python environment setup

Before we set up Visual Studio Code, we will install Python and use a virtual environment to keep things tidy.

# Install uv: https://docs.astral.sh/uv/getting-started/installation/#standalone-installer

# Install the Python version for your project
uv python install 3.13
# Create a project
uv init my-data-pipelines
cd my-data-pipelines
# Run a script
uv run main.py
# Running the script creates a virtual environment at .venv
# Install libraries
uv add polars

Libraries and their versions are stored in the pyproject.toml file.

[project]
name = "my-data-pipelines"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "polars>=1.22.0",
]

3. VSCode Primer

VSCode User Workspace Profile

Before we dig into setting up extensions, it’s helpful to understand the key components for setting up Visual Studio Code exactly to your preference:

  1. User & Workspace settings: Change how Visual Studio Code works using its settings. You can define project-specific settings (aka workspace) and settings for your entire machine (aka user). Note that your workspace settings override your user settings.
  2. Extensions: Tools (paid & free) available via the Visual Studio Code Marketplace. Extensions add functionality.
  3. Profiles: You can bundle a list of extensions and settings into a profile that can be shared and used by anyone with Visual Studio Code. Profiles let dev teams quickly get the same IDE experience. Here is the link to my Data Engineering Profile. Import this link as shown below.
  4. Snippets: Snippets are short prefixes that expand into boilerplate code. Open the list of available snippets with Ctrl + Shift + P -> Snippets: Configure Snippets. Let’s look at an example that generates a try/except/else/finally block with a teef snippet.
  5. Devcontainers: Devcontainers enable you to develop inside Docker containers with VSCode. Define your extensions, settings, profiles, etc. in the devcontainer.
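To make the snippet idea concrete, here is a rough sketch that builds a snippet definition for the teef prefix and renders it as JSON. The snippet body, its name, and the placeholder defaults are my own guesses at what such a snippet could look like, not the one shipped in the profile. As far as I know, saving the output to a file ending in .code-snippets inside your project's .vscode folder makes it available in that workspace.

```python
import json


def build_teef_snippet() -> dict:
    """Build a hypothetical snippet that expands `teef` into try/except/else/finally."""
    return {
        "Try Except Else Finally": {
            "scope": "python",  # only offer the snippet in Python files
            "prefix": "teef",   # what you type to trigger the snippet
            "body": [
                "try:",
                "\t${1:pass}",
                "except ${2:Exception} as ${3:e}:",
                "\t${4:raise}",
                "else:",
                "\t${5:pass}",
                "finally:",
                "\t${6:pass}",
            ],
            "description": "try/except/else/finally block",
        }
    }


def render_snippet_file() -> str:
    """Serialize the snippet definition as JSON text."""
    return json.dumps(build_teef_snippet(), indent=4)


if __name__ == "__main__":
    print(render_snippet_file())
```

The ${1:pass} style placeholders are tab stops: after the snippet expands, pressing Tab moves the cursor from one placeholder to the next.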

Devcontainers enable you to work directly on the files inside the Docker container with the Visual Studio Code you are used to. Here is a sample devcontainer config that I use to install the Jupyter and Python extensions and install requirements inside the container with pip install.
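As a rough illustration of what such a config can contain, the sketch below builds one as a Python dictionary and prints it as JSON. Treat the details as assumptions rather than my exact setup: the container image tag, the project name, and the requirements.txt file name are placeholders, and the extension IDs are the Marketplace IDs for the Python and Jupyter extensions as I remember them.

```python
import json


def build_devcontainer_config() -> dict:
    """Build a hypothetical devcontainer config for a Python data project."""
    return {
        "name": "my-data-pipelines",
        # Assumed base image; swap in whatever image your team uses.
        "image": "mcr.microsoft.com/devcontainers/python:3.13",
        "customizations": {
            "vscode": {
                # Extensions installed inside the container.
                "extensions": ["ms-python.python", "ms-toolsai.jupyter"],
            }
        },
        # Runs once after the container is created; file name is an assumption.
        "postCreateCommand": "pip install -r requirements.txt",
    }


if __name__ == "__main__":
    print(json.dumps(build_devcontainer_config(), indent=4))
```

The printed JSON would typically live at .devcontainer/devcontainer.json in the project directory.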

4. Extensions overview

While there are a lot of extensions available, let’s look at the typical high-value ones:

1. GitLens

Visualize Git changes in VSCode.

2. Python testing and debugging

Execute Python tests with the option to debug them.

3. Ruff

Automatically clean up your code and format it.

4. SQLTools

Connect to most databases and format SQL code.

5. Jupyter Notebook

Run Jupyter notebooks inside VSCode.

6. Data Wrangler

Interactively transform your data and generate pandas transformation code.

7. autoDocstring

Generate documentation for your class/function by typing """ under its definition.

8. Rainbow CSV

Sometimes, you want to inspect a CSV file without having to use cut or other such tools.

9. dbt Power User

Run dbt commands via the UI, and render lineage and docs inside Visual Studio Code.
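One lightweight way to share this list with colleagues is a workspace recommendations file. The sketch below generates one; the extension IDs are the Marketplace identifiers as I recall them, so verify each one in the Marketplace before relying on it, and note that the function name is made up for illustration.

```python
import json

# Display name -> Marketplace ID (IDs are assumptions; confirm them in the Marketplace).
RECOMMENDED_EXTENSIONS = {
    "GitLens": "eamodio.gitlens",
    "Python": "ms-python.python",
    "Ruff": "charliermarsh.ruff",
    "SQLTools": "mtxr.sqltools",
    "Jupyter": "ms-toolsai.jupyter",
    "Data Wrangler": "ms-toolsai.datawrangler",
    "autoDocstring": "njpwerner.autodocstring",
    "Rainbow CSV": "mechatroner.rainbow-csv",
    "dbt Power User": "innoverio.vscode-dbt-power-user",
}


def build_recommendations(extensions: dict[str, str]) -> dict:
    """Build the content of a workspace extension recommendations file."""
    return {"recommendations": sorted(extensions.values())}


if __name__ == "__main__":
    print(json.dumps(build_recommendations(RECOMMENDED_EXTENSIONS), indent=4))
```

Saving the output as .vscode/extensions.json in the project should make VSCode suggest these extensions to anyone who opens the folder.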

5. Privacy, Performance, and Cognitive Overload

I recommend understanding a tool in depth (read the docs/settings) to know how it works and using it for a few months before adding more. Indiscriminately adding extensions can lead to cognitive overload.

Security: Most extensions are not verified. Microsoft offloads the responsibility to the user with this prompt:

Security nightmare

Performance cost: Most extensions are TypeScript/JavaScript programs that run in the background.

Performance concerns

In the above screenshot, code . represents VSCode. Note all the sub-processes (each with a different process ID, or PID) that get created and their memory usage!
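If you would rather total this up than eyeball it, a small parser can help. The sketch below assumes output shaped like that of ps -eo pid,rss,args (PID, resident memory in KiB, then the command line). The sample text uses invented numbers purely for illustration, not measurements from my machine, and on some systems the executable shows up under a full path rather than plain code, so adjust the match accordingly.

```python
# Invented sample in the shape of `ps -eo pid,rss,args` output (not real measurements).
SAMPLE_PS_OUTPUT = """\
  PID    RSS COMMAND
 1201 250000 code .
 1202 180000 code --type=renderer
 1203 320000 code --type=utility
 1300  90000 python main.py
"""


def summarize_memory(ps_output: str, name: str = "code") -> tuple[int, float]:
    """Count processes whose executable is `name` and sum their memory in MiB."""
    count = 0
    total_kib = 0
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # pid, rss, rest of the command line
        if len(parts) < 3:
            continue
        _pid, rss_kib, command = parts
        if command.split()[0] == name:
            count += 1
            total_kib += int(rss_kib)
    return count, total_kib / 1024


if __name__ == "__main__":
    processes, mib = summarize_memory(SAMPLE_PS_OUTPUT)
    print(f"{processes} processes using {mib:.1f} MiB")
```

Running it before and after installing an extension gives a rough sense of what that extension costs you.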

6. Conclusion

VSCode is an excellent IDE, primarily due to its extensive list of extensions. If you are unhappy with your setup and feel it could be better, use this Data Engineering Profile.

If you use Neovim, check out my Neovim config here.

What other extensions do you recommend? Please let me know in the comment section below.
