How to reference a seed from a different dbt project?
- 1. Introduction
- 2. Ways to reuse seed data across multiple dbt projects
- 3. dbt deps = download all dependency packages to your local dbt_packages folder
- 4. Conclusion
- 5. Further reading
1. Introduction
If your company has multiple dbt projects, you would have had to use code cross projects. Creating cross-project dependencies is not straightforward in a SQL templating system like dbt.
If you are wondering
How to use seed data defined in one dbt project in another
How dbt packages work under the hood
Caveats to be aware of when using assets cross-projects
This post is for you. In this post, we will go over how to use packaging in dbt to reuse assets and how packaging works under the hood. By the end of this post, you will know how to access seed data cross-projects.
2. Ways to reuse seed data across multiple dbt projects
2.1. Code setup
Use this repository to follow along: import_dbt .
2.1.1. Prerequisites
Clone and cd into this repo:
git clone https://github.com/josephmachado/import_dbt.git
cd import_dbt
2.1.2. Setup project environment
We will download the package project_1
into the dbt project project_2
.
cd project_2
# remove virtual env files and duck db files from prior runs
rm -rf myenv
rm -rf *.duckdb
# set up venv
python -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
source myenv/bin/activate
# Clean out old dbt files
dbt clean
2.2. Turn the source repo into a dbt package
You can create a package if you have a dbt project . With a defined package, other dbt projects can reuse your models, seed data, and macros (e.g., dbt utils ).
2.2.1. Define package version in dbt_project.yml
Before you create a package, you need to define its version. Most software packages follow semantic versioning to ensure backward portability and indicate when there may be breaking changes, and dbt recommends it as well.
Define the package name and its version in your dbt_project.yml file.
We can see in our project_1 we define the package details in its dbt_project.yml file
name: project_1
version: 0.0.1
config-version: 2
...
2.2.2. Store your package for other dbt projects to reference
After defining the package name and version, you need to store your dbt package in a location accessible to other dbt projects that may reference your package.
dbt packages support storing in
- dbt package hub: dbt’s package hub is public. You can reference a package with a unique name.
- a git server: You can directly reference a git link from the hosted git repo. E.g., you can use Github/Gitlab. You need to reference the package git URL.
- Company’s internal code hosting service: Most companies host packaged code (e.g., jar, etc.) on an internal server. You can also use this option.
- Local filesystem: This option is available if your projects are in the same repo (or in your local file system). You can reference the dbt package location using the package’s relative path.
We can see the reference to our project_1
dbt package in project_2’s packages.yml
.
packages:
- local: ../project_1
Since we are using a local filesystem to access our dbt package, we can see a symlink to it after running dbt deps
in project_2.
dbt deps
Note that you need to do a release when you host your package on the dbt package hub, git server, and your inter
Now run dbt in project_2
dbt seed
dbt run
To verify the use of project_1’s data in project_2’s pipeline, do the following:
duckdb dbt.duckdb
select * from stg_package_seed; -- This is a model in project_2 using seed from project_1
2.3. Use project dependencies (dbt enterprise only)
dbt Cloud Enterprise’s project dependency feature lets you directly reference models from other dbt projects.
dbt cloud does this by running a separate service that uses all your dbt projects’ manifest files to identify which project has a specific model.
2.4. Store seed data in your warehouse
A straightforward option would be to store the seed data in a static table in your data warehouse. Note that you will need to access this table when you require it.
Storing data in a warehouse may not always be the best option, mainly if the seed data is solely for testing.
3. dbt deps = download all dependency packages to your local dbt_packages folder
When you run dbt deps
, dbt will download the required packages to a local folder called dbt_packages
.
4. Conclusion
To recap, we saw:
- Using packages to use assets cross-pojects
- Using project dependencies (dbt cloud enterprise plan only)
- What running
dbt deps
actually does
If your use case is simple, consider using the package dependency model. If it is complex (rare), consider using the dbt project option from the dbt cloud enterprise.
While the project approach will reduce your work, note if not used sparingly, it will result in complexity and make it hard to migrate off the dbt cloud to internal management of dbt projects.
Please let me know in the comment section below if you have any questions or comments.
5. Further reading
If you found this article helpful, share it with a friend or colleague using one of the socials below!