dbt Sources #
Sources are the raw data tables that dbt builds upon.
Definition #
Sources get defined in the dbt_project.yml
file.
source-paths: ["models"]
^ The path to source data can be changed.
The path structure might look like:
./models/<source directory>/schema.yml
Where <source directory>
might be something like Finance, Ops, Sales, Customers, etc. Anything really.
The schema.yml
in the source directory:
- Describes the
Example of a schema.yml
file:
version: 2 #why '2'?
sources:
- name: <source name, often matches database name> # essential; everything else is optional
description: '<text>'
database: <database name, in BigQuery, Snowflake, etc.>
schema: <schema name>
tables:
- name: <table name> # essential
description: '<text>'
columns: ...
Next to the schema.sql
is a model, like <data model>.sql
, such as ./models/<source directory>/<data model>.sql
References #
Sources can be referenced later on in data models.
select * from {{ source(<source_name>, <table_name>)}}
Where source_name
and table_name
are named in the schema.yml
discussed above. Be sure that these are truly unique.
The whole thing might look like:
with <cte_alias> (
select * from {{ source(<source_name>, <table_name>)}}
),
final as (
select * from <cte_alias>
)
select * from final