Finally, A Better Date Dimension

star schema

Introduction:

The date dimension might be the most fundamental object in a dimensional data warehouse. It is used by nearly every fact table (often more than once through role-playing) in the warehouse. I have no intention of rehashing the conceptual basics of this object as folks like Ralph Kimball, Chris Adamson, and others have said pretty much everything that needs to be said on this topic. Even though nearly every fact table, query, and report uses this dimension, something has always been “not quite right.” As companies have grown and expanded operations around the world, proper handling of time zones and DST have often been recognized as an important requirement, but due to onerous limitations or outright capability gaps, the solution was far from ideal.

Historically, SQL Server has been no exception, but this all changed with the release of SQL Server 2016 CTP 3.1 and the new AT TIME ZONE clause. This capability, coupled with existing functionality, finally enabled us to build robust date dimensions, with fewer design trade-offs than ever before. Aaron Bertrand posted a great summary of the new capabilities in this post, and Matt Johnson, filled in the remaining details. Matt has probably forgotten more about date and time handling than most people ever knew. Originally, I came across his course on Pluralsight. Deeply impressed, I looked him up, only to discover that we work at the same company. I had a chance to chat with him about my requirements, and he not only confirmed this solution would meet my requirements, but also cleared up a concern I had about the AT TIME ZONE clause from reading Aaron’s blog post (see Matt’s comment in the post for details). The error in Aaron’s post has since been corrected. Matt confirmed that it maintains perfect visibility to Windows time zones for current, past and future dates, all with full DST awareness. Due to underlying dependencies, this solution will only work on SQL 2016 CTP 3.1+ (SQL Server and Azure SQL Database) and as of the time of writing, and will NOT work on Azure SQL Data Warehouse or APS (PDW).  I will share an alternative at the end of the post.

The Solution:

  • Use date values (Date datatype) as the primary/surrogate key in the date dimension, and FK in all fact tables
  • Store all business dates additionally in fact tables as a DateTimeOffset
  • Per best practices, align the fact table to a local perspective through a view layer

What’s with the Date and DateTimeOffset? In my experience, the majority of queries and reports simply want to display a date, or perform lag calculations across one or more milestones. They don’t always need the full power, awesome as it is, for every request.  In an accumulating snapshot fact table, there are many dates that fall into both of these categories. By storing the date foreign keys in the fact tables as actual dates, they can be used directly as dates, and all that implies, without joining to the dimension unless there is a need. To avoid optimizer issues, you may still want to join to the date dimension for filtering, as that is and continues to be a best practice in most cases. Remember to test, don’t assume. Hopefully this recommendation wont’ start one of the now familiar religious wars on whether you should use an integer, smart key, or a date.  I have used them all, and they all have trade-offs.  The main trade-off when using a date datatype is with the unknown member.  Just pick an arbitrarily high or low value, be consistent, and be done with it.  Yes, you will have to be a little careful when filtering.

Storing time stamps as a DateTimeOffset in the fact table has several key benefits:

  1. Provides the ability to store a precise point in time (NOT what a date dimension is for) with full timezone context. This eliminates the need to store these in a standard time to use for calculations AND a local perspective to preserve the original context. For maximum fidelity, you must know the time zone of the incoming data to correctly provide the offset during the ETL process.
  2. Provides a stable context for performing lag calculations (elapsed time across milestones). If you have ever stored datetimes from multiple data sources without properly managing the time zone and/or DST context, you know you can get some very unpredictable results ranging from over or under reporting intervals, to zero or even negative intervals in edge scenarios. Read this post for a humorous account of how date handling can go horribly wrong.
  3. Provides a full fidelity input for deriving the FK to the date dimension. Remember, the rows in the date dimension represent calendar days, not a point in time. To be sure you choose the right day for a precise point in time, you must derive the day from the DateTimeOffset, or at the edges, you could pick the wrong day. This should be done for a default context to store in the fact table during the ETL process. To support multiple local perspectives, you could either store multiple instances of the FKs, 1 set per perspective, or you could provide 1:N views that perform this conversation based on AT TIME ZONE. There may be a performance hit for deriving the FKs at runtime, but depending on how the view is used and your server, this may be acceptable. Always test, never assume.
  4. Serves as the ideal input for usage of the AT TIME ZONE clause.

AT TIME ZONE is even supported in temporal tables (the subject of an upcoming post) which are now available in SQL Server.  Finally, as Aaron mentioned in his post, AT TIME ZONE also supports parameterization for the target time zone, and can be used to convert standard Datetime data types to DateTimeOffset, again all with full DST fidelity.  The DST thing is a big deal.  Its not that you couldn’t build your own solution, but it was cumbersome and tricky to keep the metadata up to date.  If you are not on 2016, this is still required.  See Matt’s project on GitHub for a great “roll your own” solution for SQL Server when you absolutely MUST have full visibility to the IANA database.  Check out NODA Time to do the same thing for .NET.

Hopefully you can see how AT TIME ZONE could benefit your application.  Comments welcome.

Leave a Reply

Your email address will not be published. Required fields are marked *