Snowflake Snapshots

How and why to use Snapshots in Snowflake

I’m sure I am not the only one: Every now and then someone approaches me with a question like “I pulled some data from this report some weeks/months ago and when I open the same report now, the numbers changed - what happened?”

Sometimes it’s possible to reconstruct the applied filters and exactly reproduce the numbers, but if too much time passed… well, who remembers what they filtered in a report last week, let alone 6 months ago?

This situation can be quite annoying: it’s difficult to reproduce the outputs of a data model when the raw data has refreshed several times in between, and the data model itself might even have changed.

The solution: create a copy of the modeled data just to have it available for such situations.

Sure, you can build that into your transformation pipeline (e.g. using dbt), but if you’re lucky enough to host your models in Snowflake, there is now (announced in June 2025 and, as of this writing, in open preview) a new kid on the block:

Snapshots for backups and immutable storage

Snapshots represent discrete backups of Snowflake objects. You choose which objects to back up, how frequently to back them up, how long to keep the backups, and whether to add a retention lock so that they can’t be deleted prematurely.

Create snapshots

I happen to arrange my prepared data models in schemas. Hence, taking a snapshot of a whole schema, containing all materialized tables (and view definitions, if there are any), is my preferred way to back up those models. Of course, you could just as well create backups of individual schema objects (tables; a sketch follows below) or of whole catalogs (databases).

First, we’ll create a snapshot policy defining how often snapshots are taken. Depending on the frequency, the maximum retention period differs (during the preview period, anyway). For daily snapshots (I don’t refresh my models more often anyway), you can set a retention of up to one year:

create or replace snapshot policy consume.fundraising.ssp_data_model_daily
schedule = '1440 MINUTE'
expire_after_days = 366
comment = 'Daily backups of the fundraising data model'
;
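
To double-check the policy, the usual SHOW pattern should work (assuming the preview feature follows Snowflake’s standard SHOW syntax):

show snapshot policies in schema consume.fundraising
;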

In theory you could use a CRON expression here, too, but then determining the maximum retention period (which depends on the snapshot frequency) might be an issue. Or, alternatively, don’t set a schedule at all and just use a task to create a new snapshot right after the model is refreshed (cf. below).
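
A minimal sketch of that schedule-less variant (assuming the preview syntax accepts a policy with only a retention period set):

create or replace snapshot policy consume.fundraising.ssp_data_model_manual
expire_after_days = 366
comment = 'Snapshots added on demand right after the model refresh'
;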

Next, define the stuff to be included in the snapshots:

create or replace snapshot set consume.fundraising.sss_data_model
for schema consume.fundraising
with snapshot policy consume.fundraising.ssp_data_model_daily
;
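
For an individual table, the statement should look analogous (a sketch; the table name is hypothetical):

create or replace snapshot set consume.fundraising.sss_donations
for table consume.fundraising.donations
with snapshot policy consume.fundraising.ssp_data_model_daily
;
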
To tie the snapshots to the actual model refresh, I chain a task after the refresh task that explicitly adds a snapshot:

create or alter task consume.fundraising.ta_refresh_fundraising_snapshot
user_task_managed_initial_warehouse_size = 'XSMALL'
serverless_task_max_statement_size = 'XSMALL'
user_task_timeout_ms = 600000 -- 10 minutes
target_completion_interval = '10 MINUTES'
allow_overlapping_execution = false
comment = 'Take snapshot of fundraising data model'
after consume.fundraising.ta_refresh_fundraising_model
as
alter snapshot set consume.fundraising.sss_data_model add snapshot
;
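
Note that newly created (or replaced) tasks are suspended by default, so the snapshot task still needs to be resumed:

alter task consume.fundraising.ta_refresh_fundraising_snapshot resume
;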

And that’s really it. Now a snapshot of the model is stored every day (and kept for a year), and whenever someone wants to look back in time, you can simply restore one.

Restore snapshots

First, identify all snapshots currently available:

show snapshots in snapshot set consume.fundraising.sss_data_model
;

This will provide a table of snapshots and their IDs. Pick the ID of whichever snapshot you want to restore and create a new schema from it:

create schema consume.fundraising_<suffix_for_the_restored_snapshot> 
from snapshot set consume.fundraising.sss_data_model 
identifier '<snapshot_id>'
;

If you now point your report to this schema, it will display the data just as it was materialized back when the snapshot was created.
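
And if you don’t remember which snapshot covers the period in question, you can also query the information_schema.snapshots view directly instead of scanning the SHOW output (a sketch, reusing the columns from the cleanup task below; the cutoff date is just an example):

select id, created
from consume.information_schema.snapshots
where snapshot_set_catalog = 'CONSUME'
  and snapshot_set_schema = 'FUNDRAISING'
  and snapshot_set_name = 'SSS_DATA_MODEL'
  and created <= '2025-03-01'::timestamp_ltz -- hypothetical cutoff date
order by created desc
limit 1
;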

Clean up snapshots

Since you probably don’t need to keep all (daily) snapshots for the whole retention period (though storage costs are presumably not too significant anyway), you might want to clean them up from time to time. I do this with a cleanup task that removes all snapshots older than 90 days but keeps the first snapshot of each calendar month:

create or replace task consume.fundraising.ta_snapshot_cleanup
user_task_managed_initial_warehouse_size = 'XSMALL'
serverless_task_max_statement_size = 'XSMALL'
schedule = 'USING CRON 57 7 * * 6 Europe/Zurich' -- Saturday, 07:57
user_task_timeout_ms = 600000 -- 10 minutes
target_completion_interval = '10 MINUTES'
allow_overlapping_execution = false
suspend_task_after_num_failures = 10
task_auto_retry_attempts = 1
comment = 'Drop old snapshots of fundraising data model'
as
declare cur cursor for
  select ID
  from consume.information_schema.snapshots
  where snapshot_set_catalog = 'CONSUME'
    and snapshot_set_schema = 'FUNDRAISING'
    and snapshot_set_name = 'SSS_DATA_MODEL'
    and created < dateadd(day, -90, current_timestamp())
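  -- spare the first snapshot of each calendar month; everything else gets deleted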
  qualify row_number() over (partition by date_trunc(month, created) order by created asc) > 1;
begin    
  for rec in cur do
    execute immediate 'ALTER SNAPSHOT SET CONSUME.FUNDRAISING.SSS_DATA_MODEL DELETE SNAPSHOT IDENTIFIER ''' || rec.ID || '''';
  end for;
end
;
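
Before letting this run on a schedule, you can dry-run the selection (it’s the same query the cursor uses) to preview which snapshots would be dropped, and, as before, remember to resume the freshly created task:

-- preview which snapshots the cleanup task would delete
select ID, created
from consume.information_schema.snapshots
where snapshot_set_catalog = 'CONSUME'
  and snapshot_set_schema = 'FUNDRAISING'
  and snapshot_set_name = 'SSS_DATA_MODEL'
  and created < dateadd(day, -90, current_timestamp())
qualify row_number() over (partition by date_trunc(month, created) order by created asc) > 1
;

-- tasks are created suspended, so switch it on
alter task consume.fundraising.ta_snapshot_cleanup resume
;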

Notes

This method works really well as long as there are no dependencies outside the snapshot schema. Once again, efficiency depends on discipline: if you stick to a strict layer architecture during modeling, it pays off here.

This snapshot feature created less buzz than many other features Snowflake released this year. However fancy those other features may be, I believe it’s worth noting how such less flashy improvements can have a significant impact on convenience and efficiency in the daily life of a data engineer/analyst 🥹