How I Study Open Source Community Growth with dbt
Most organizations spend at least some of their time contributing to an open source project. 100% of them, though, depend in some way on the output of open source communities.
💾 This article is for anyone who has ever questioned the sanity of a date not in ISO 8601 format
Have you ever been assigned to add new fields or concepts to an existing set of models and wondered:
Why are there multiple models named almost the same but slightly different?
Which model has the fields I need?
Which model is upstream or downstream from which?
I’ve used the dateadd SQL function thousands of times.
I’ve googled the syntax of the dateadd SQL function all of those times except one, when I decided to hit the "are you feeling lucky" button and go for it.
In switching between SQL dialects (BigQuery, Postgres and Snowflake are my primaries), I can literally never remember the argument order (or exact function name) of dateadd.
This article will go over how the DATEADD function works, the nuances of using it across the major cloud warehouses, and how to standardize the syntax differences with a dbt macro.
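To make the dialect differences concrete, here is a minimal Python sketch that renders "add N dateparts to an expression" for three warehouses. The function name `render_dateadd`, the dialect keys, and the `order_date` column are hypothetical, made up for illustration. The SQL shapes assume Snowflake's `dateadd`, BigQuery's `date_add` on a DATE input, and plain interval arithmetic on Postgres. In a real dbt project a macro would play this role, not Python.

```python
def render_dateadd(dialect: str, datepart: str, interval: int, from_expr: str) -> str:
    """Return a SQL snippet that adds `interval` dateparts to `from_expr`.

    Hypothetical helper for illustration only; not part of dbt or any warehouse client.
    """
    if dialect == "snowflake":
        # Snowflake: dateadd(<datepart>, <number>, <date_or_timestamp>)
        return f"dateadd({datepart}, {interval}, {from_expr})"
    if dialect == "bigquery":
        # BigQuery (DATE input assumed): date_add(<date>, interval <n> <datepart>)
        return f"date_add({from_expr}, interval {interval} {datepart})"
    if dialect == "postgres":
        # Postgres has no dateadd function, so we add a scaled interval instead
        return f"{from_expr} + ({interval} * interval '1 {datepart}')"
    raise ValueError(f"unsupported dialect: {dialect}")


# Same logical operation, three different argument orders
for dialect in ("snowflake", "bigquery", "postgres"):
    print(dialect, "->", render_dateadd(dialect, "day", 7, "order_date"))
```

The point of the sketch is the shape of the problem: the datepart, the interval, and the source expression are the same three inputs everywhere, but each warehouse wants them in a different position or wrapped in different syntax.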
It is a thankless but necessary task. In SQL, often we’ll need to UNION ALL two or more tables vertically, to combine their values.
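As a small illustration of stacking tables vertically, the sketch below uses Python's built-in `sqlite3` module with two made-up tables (`orders_2020` and `orders_2021` are hypothetical names, not from the original post). It assumes both tables share the same columns, and it contrasts `UNION ALL`, which keeps every row, with plain `UNION`, which drops duplicates.

```python
import sqlite3

# In-memory database with two hypothetical tables that share a schema
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table orders_2020 (order_id integer, amount real);
    create table orders_2021 (order_id integer, amount real);
    insert into orders_2020 values (1, 10.0), (2, 20.0);
    insert into orders_2021 values (2, 20.0), (3, 30.0);
    """
)

# UNION ALL stacks the rows of both tables and keeps duplicates
union_all_rows = conn.execute(
    """
    select order_id, amount from orders_2020
    union all
    select order_id, amount from orders_2021
    order by order_id
    """
).fetchall()

# Plain UNION stacks the rows and then removes exact duplicates
union_rows = conn.execute(
    """
    select order_id, amount from orders_2020
    union
    select order_id, amount from orders_2021
    order by order_id
    """
).fetchall()

print(len(union_all_rows), "rows with UNION ALL")
print(len(union_rows), "rows with UNION")
```

The row `(2, 20.0)` appears in both tables, so it shows up twice in the `UNION ALL` result and once in the `UNION` result.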
Hi there,
Before I get to the goods, I just wanted to quickly flag that Coalesce is less than 3 weeks away! 😱 If you had to choose just ONE of the 60+ sessions on tap, consider Tristan's keynote with A16z's Martin Casado.
It has two of my favorite elements:
Spice 🌶️
Not-actually-about-us 😅
Martin and Tristan will discuss something we've all probably considered with the latest wave of innovation (and funding) in our space:
Is the modern data stack just another wave in a long string of trendy technologies, or is it somehow more permanent?
Hear their take, and share your own by registering here.
Hello there,
While I have a lot of fun things to share this month, I can't start with anything other than this:
Yep, it's official:
💥dbt will support metric definitions💥
With this feature, you'll be able to centrally define rules for aggregating metrics (think, "active users" or "MRR") in version controlled, tested, documented dbt project code.
Hello there,
Do you remember? The 21st day of September? 🎶 'Course you do, it was two days ago. Well, that's a win in your bucket and the day's barely begun! So let's get a win for someone else -- like Jeremy Cohen, the dbt Core product manager.
I'm sure you know that half of the updates in this email are pushed automatically when we upgrade everyone to the latest version of dbt Cloud 🚀
But did you know the other half requires you (or your account admin) to actively switch to the latest version of dbt Core? 😱 If this isn't happening regularly (how-to video here), you may miss out on major improvements to performance, stability, and speed.
Give Jeremy a win and check out the blog he just posted on why this matters even more leading up to 💥dbt v1.0💥. While we're throwing W's, don't forget to also register for his talk at Coalesce now!
At dbt Labs, as more folks adopt dbt, we have started to see more and more use cases that push the boundaries of our established best practices. This is especially true for those adopting dbt in the enterprise space.
After two years of helping companies with anywhere from 20 to 10,000+ employees implement dbt and dbt Cloud, the below is my best attempt to answer the question: “Should I have one repository for my dbt project or many?” Alternative title: “To mono-repo or not to mono-repo, that is the question!”
Since this blog post was first published, many data platforms have added support for materialized views, which are a superior way to achieve the goals outlined here. We recommend them over the below approach.
Before I dive into how to create this, I have to say this. You probably don’t need this. I, along with my other Fishtown colleagues, have spent countless hours working with clients that ask for near-real-time streaming data. However, when we start digging into the project, we often realize that the use case is not there. There are a variety of reasons why near-real-time streaming is not a good fit. Two key ones are:
So when presented with a near-real-time modeling request, I (and you as well!) have to be cynical.
If you’ve been using dbt for over a year, your project is out-of-date. This is natural.
New functionalities have been released. Warehouses change. Best practices are updated. Over the last year, I and others on the Fishtown Analytics (now dbt Labs!) team have conducted seven audits for clients who have been using dbt for a minimum of 2 months.