Visual Studio Code (VSCode) extensions for data engineers
- 1. Introduction
- 2. Python environment setup
- 3. VSCode Primer
- 4. Extensions overview
- 5. Privacy, Performance, and Cognitive Overload
- 6. Conclusion
- 7. Recommended reading
1. Introduction
Whether you are setting up Visual Studio Code for your colleagues or want to improve your own workflow, there are tons of extensions available. If you have wondered:
What are the best Visual Studio Code extensions for data engineers?
How do I share my Visual Studio Code environment with my colleagues?
How do Visual Studio Code user settings, workspace settings, devcontainers, and profiles work?
Then this post is for you!
Imagine being able to quickly set up Visual Studio Code on any laptop exactly how you want it. You won’t notice that you are coding on a different machine!
In this post, we will go over Visual Studio Code’s settings hierarchy, how to set up Visual Studio Code on any machine exactly to your liking with profiles, useful extensions for data engineering, and the caveats of unrestricted extensions.
By the end of this post, you will have set up Visual Studio Code exactly how you like it and be able to share it with other data engineers. Let’s get started.
TL;DR If you want a setup for data engineering, follow this short video in your project directory:
2. Python environment setup
Before we set up Visual Studio Code, we will install Python and use a virtual environment to keep things tidy.
# Install uv: https://docs.astral.sh/uv/getting-started/installation/#standalone-installer
# Install the Python version you want to use for your project
uv python install 3.13
# Create a project
uv init my-data-pipelines
cd my-data-pipelines
# Run a script
uv run main.py
# Running the script creates a virtual environment at .venv
# Install libraries
uv add polars
Libraries and their versions are stored in the pyproject.toml file.
[project]
name = "my-data-pipelines"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
"polars>=1.22.0",
]
3. VSCode Primer
Before we dig into setting up extensions, it’s helpful to understand the key components for setting up Visual Studio Code exactly to your preference:
User & Workspace settings: Change how Visual Studio Code works using its settings. You can define project-specific settings (aka workspace) and settings for your entire machine (aka user). Note that your workspace settings override your user settings.
Extensions: Tools (paid & free) available via the Visual Studio Code marketplace. Extensions add functionality.
Profiles: You can bundle a list of extensions and settings into a profile that can be shared and used by anyone with Visual Studio Code. Profiles let dev teams quickly get the same IDE experience. Here is the link to my Data Engineering Profile, which you can import into Visual Studio Code.
Snippets: Snippets are short prefixes that expand into boilerplate code. Open the list of available snippets with Ctrl + Shift + P -> Snippets: Configure Snippets. Let’s look at an example that generates a try/except/else/finally block with a teef snippet.
Devcontainers: Devcontainers enable you to develop inside Docker containers with VSCode. Define your extensions, settings, profiles, etc. in the devcontainer config.
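To make the snippet idea concrete, here is a rough sketch of what a `teef` snippet definition could look like. It is a hypothetical reconstruction, not the author's actual snippet: the entry name, the placeholder defaults, and the helper `build_teef_snippet` are all assumptions. The script only prints JSON in the user-snippet format (`prefix`, `body`, `description`); you would paste the output into the Python snippets file opened by Snippets: Configure Snippets.

```python
import json


def build_teef_snippet() -> dict:
    """Build a user-snippet entry that expands `teef` into a full try block.

    ${1:...}, ${2:...} are tab stops with default text; "\t" is one indent level.
    """
    return {
        "try/except/else/finally": {  # display name, chosen for illustration
            "prefix": "teef",
            "body": [
                "try:",
                "\t${1:pass}",
                "except ${2:Exception} as ${3:e}:",
                "\t${4:raise}",
                "else:",
                "\t${5:pass}",
                "finally:",
                "\t${6:pass}",
            ],
            "description": "try/except/else/finally block",
        }
    }


# Print the JSON so it can be copied into the snippets file.
print(json.dumps(build_teef_snippet(), indent=2))
```

Generating the JSON from Python is only a convenience for this post; writing the same JSON by hand works just as well.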
Devcontainers let you work directly on the files inside the Docker container with the Visual Studio Code you are used to. Here is a sample devcontainer config that I use to install the Jupyter and Python extensions and install requirements inside the container with pip install.
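For readers who cannot open the linked config, the following is a minimal sketch of what such a devcontainer config might contain. It is not the author's file: the container image tag, the `requirements.txt` file name, the `editor.formatOnSave` setting, and the helper `build_devcontainer_config` are assumptions for illustration. The output would typically be saved as `.devcontainer/devcontainer.json`.

```python
import json


def build_devcontainer_config() -> dict:
    """Return a devcontainer config that installs two extensions and pip requirements."""
    return {
        "name": "my-data-pipelines",
        # Assumed base image; swap in whatever image your team uses.
        "image": "mcr.microsoft.com/devcontainers/python:3.13",
        "customizations": {
            "vscode": {
                # Extensions installed inside the container.
                "extensions": ["ms-python.python", "ms-toolsai.jupyter"],
                # Settings applied only inside the container (example setting).
                "settings": {"editor.formatOnSave": True},
            }
        },
        # Runs once after the container is created; requirements.txt is assumed to exist.
        "postCreateCommand": "pip install -r requirements.txt",
    }


print(json.dumps(build_devcontainer_config(), indent=2))
```

Keeping extensions and settings in this file means anyone who opens the project in the container gets the same tooling without touching their user settings.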
4. Extensions overview
While there are a lot of extensions available, let’s look at the typical high-value ones:
1. GitLens
Visualize git changes in VSCode. GitLens
2. Python testing and debugging
Execute Python tests with the option to debug them. Python test & debug
3. Ruff
Automatically clean up your code and format it. Ruff
4. SQLTools
Connect to most databases and format SQL code. SQLTools
5. Jupyter Notebook
Run Jupyter notebooks inside VSCode. Jupyter
6. Data Wrangler
Interactively transform your data and generate pandas transformation code. Data Wrangler
7. autoDocstring
Generate documentation for your class/function by typing """ under its definition. autoDocstring
8. Rainbow CSV
Sometimes, you want to inspect a CSV without having to use cut or other such tools. Rainbow CSV
9. dbt Power User
Run dbt commands via the UI and render lineage and docs inside Visual Studio Code. dbt Power User
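Besides profiles, one lightweight way to suggest these extensions to teammates is a workspace recommendations file (`.vscode/extensions.json`). The sketch below generates one for the list above. The extension IDs are my best understanding of the Marketplace identifiers and should be verified before use; the helper `build_extensions_json` is hypothetical.

```python
import json

# Display name -> assumed Marketplace identifier (verify each before sharing).
RECOMMENDED_EXTENSIONS = {
    "GitLens": "eamodio.gitlens",
    "Python (testing and debugging)": "ms-python.python",
    "Ruff": "charliermarsh.ruff",
    "SQLTools": "mtxr.sqltools",
    "Jupyter": "ms-toolsai.jupyter",
    "Data Wrangler": "ms-toolsai.datawrangler",
    "autoDocstring": "njpwerner.autodocstring",
    "Rainbow CSV": "mechatroner.rainbow-csv",
    "dbt Power User": "innoverio.vscode-dbt-power-user",
}


def build_extensions_json(extensions: dict[str, str]) -> str:
    """Return the text of a .vscode/extensions.json recommendations file."""
    return json.dumps({"recommendations": sorted(extensions.values())}, indent=2)


print(build_extensions_json(RECOMMENDED_EXTENSIONS))
```

When this file is committed to the repository, Visual Studio Code offers to install the recommended extensions for anyone who opens the workspace, which complements sharing a full profile.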
5. Privacy, Performance, and Cognitive Overload
I recommend understanding a tool in depth (read the docs/settings) and using it for a few months before adding more. Indiscriminately adding extensions can lead to cognitive overload.
❗ Security: most extensions are not verified. Microsoft offloads the responsibility to the user with this prompt:
❗ Performance cost: Every extension is a TypeScript/JavaScript app running in the background.
In the above screenshot, code . represents VSCode. Note all the sub-processes (with different process IDs, or PIDs) that get created and their memory usage!
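If you want a number instead of eyeballing a process list, the per-process memory can be summed with a few lines of Python. This is a rough sketch under stated assumptions: it parses text in the shape produced by `ps -eo pid,rss,comm` (RSS in KiB on Linux), it assumes the process name is `code` (it differs on other operating systems), and the sample rows below are made-up illustrative values, not measurements. The helper `summarize_memory` is hypothetical.

```python
# Made-up sample in the shape of `ps -eo pid,rss,comm` output (RSS in KiB).
SAMPLE_PS_OUTPUT = """\
  PID    RSS COMMAND
 1001 250000 code
 1002 180000 code
 1003  90000 code
 2001  12000 bash
"""


def summarize_memory(ps_output: str, command: str = "code") -> tuple[int, float]:
    """Return (process count, total resident memory in MiB) for one command name."""
    count = 0
    total_kib = 0
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)  # pid, rss, command name
        if len(fields) < 3:
            continue
        _pid, rss_kib, name = fields
        if name.strip() == command:
            count += 1
            total_kib += int(rss_kib)
    return count, total_kib / 1024


count, total_mib = summarize_memory(SAMPLE_PS_OUTPUT)
print(f"{count} processes, {total_mib:.1f} MiB")
```

To run it against your own machine, you could pass in the output of `subprocess.run(["ps", "-eo", "pid,rss,comm"], capture_output=True, text=True).stdout` and compare the total before and after disabling an extension.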
6. Conclusion
VSCode is an excellent IDE, primarily due to its extensive list of extensions. If you are unhappy with your setup and feel it could be better, use this Data Engineering Profile.
If you use Neovim, check out my Neovim config here.
What other extensions do you recommend? Please let me know in the comment section below.
7. Recommended reading