How to reference a seed from a different dbt project?

1. Introduction

If your company has multiple dbt projects, you would have had to use code cross projects. Creating cross-project dependencies is not straightforward in a SQL templating system like dbt.

If you are wondering

How to use seed data defined in one dbt project in another

How dbt packages work under the hood

Caveats to be aware of when using assets cross-projects

This post is for you. In this post, we will go over how to use packaging in dbt to reuse assets and how packaging works under the hood. By the end of this post, you will know how to access seed data cross-projects.

2. Ways to reuse seed data across multiple dbt projects

2.1. Code setup

Use this repository to follow along: import_dbt .

2.1.1. Prerequisites

  1. Python

Clone and cd into this repo:

git clone https://github.com/josephmachado/import_dbt.git
cd import_dbt

2.1.2. Setup project environment

We will download the package project_1 into the dbt project project_2.

cd project_2

# remove virtual env files and duck db files from prior runs
rm -rf myenv
rm -rf *.duckdb

# set up venv
python -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
source myenv/bin/activate

# Clean out old dbt files
dbt clean

2.2. Turn the source repo into a dbt package

You can create a package if you have a dbt project . With a defined package, other dbt projects can reuse your models, seed data, and macros (e.g., dbt utils ).

dbt package

2.2.1. Define package version in dbt_project.yml

Before you create a package, you need to define its version. Most software packages follow semantic versioning to ensure backward portability and indicate when there may be breaking changes, and dbt recommends it as well.

Define the package name and its version in your dbt_project.yml file.

We can see in our project_1 we define the package details in its dbt_project.yml file

name: project_1
version: 0.0.1
config-version: 2
...

2.2.2. Store your package for other dbt projects to reference

After defining the package name and version, you need to store your dbt package in a location accessible to other dbt projects that may reference your package.

dbt packages support storing in

  1. dbt package hub: dbt’s package hub is public. You can reference a package with a unique name.
  2. a git server: You can directly reference a git link from the hosted git repo. E.g., you can use Github/Gitlab. You need to reference the package git URL.
  3. Company’s internal code hosting service: Most companies host packaged code (e.g., jar, etc.) on an internal server. You can also use this option.
  4. Local filesystem: This option is available if your projects are in the same repo (or in your local file system). You can reference the dbt package location using the package’s relative path.

We can see the reference to our project_1 dbt package in project_2’s packages.yml .

packages:
  - local: ../project_1

Since we are using a local filesystem to access our dbt package, we can see a symlink to it after running dbt deps in project_2.

dbt deps

dbt package symlink

Note that you need to do a release when you host your package on the dbt package hub, git server, and your inter

Now run dbt in project_2

dbt seed
dbt run

To verify the use of project_1’s data in project_2’s pipeline, do the following:

duckdb dbt.duckdb
select * from stg_package_seed; -- This is a model in project_2 using seed from project_1

2.3. Use project dependencies (dbt enterprise only)

dbt Cloud Enterprise’s project dependency feature lets you directly reference models from other dbt projects.

dbt cloud does this by running a separate service that uses all your dbt projects’ manifest files to identify which project has a specific model.

2.4. Store seed data in your warehouse

A straightforward option would be to store the seed data in a static table in your data warehouse. Note that you will need to access this table when you require it.

Storing data in a warehouse may not always be the best option, mainly if the seed data is solely for testing.

Store seed in warehouse

3. dbt deps = download all dependency packages to your local dbt_packages folder

When you run dbt deps, dbt will download the required packages to a local folder called dbt_packages.

dbt package symlink

4. Conclusion

To recap, we saw:

  1. Using packages to use assets cross-pojects
  2. Using project dependencies (dbt cloud enterprise plan only)
  3. What running dbt deps actually does

If your use case is simple, consider using the package dependency model. If it is complex (rare), consider using the dbt project option from the dbt cloud enterprise.

While the project approach will reduce your work, note if not used sparingly, it will result in complexity and make it hard to migrate off the dbt cloud to internal management of dbt projects.

Please let me know in the comment section below if you have any questions or comments.

5. Further reading

  1. dbt tutorial
  2. dbt + snowflake setup
  3. CI/CD

If you found this article helpful, share it with a friend or colleague using one of the socials below!