Finding Snowflake tasks that silently stopped running

Snowflake tasks are great... until one stops running and nobody notices.

Finding Snowflake tasks that silently stopped running

Snowflake tasks are great - I use them for scheduling virtually everything in my lakehouse... until one stops running and nobody notices.

This investigation started with a straightforward question: what used to run but doesn't anymore? The answer turned out to be surprisingly hard to get, because the real difficulty isn't retrieving the task run history (from SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY). It's figuring out whether a task still exists at all.

This post walks through the process of tracking down those silently stopped tasks and ends with the SQL stored procedure that made it reliable.

Why ACCOUNT_USAGE is not enough

At first glance, SNOWFLAKE.ACCOUNT_USAGE looked like it should answer everything: TASK_HISTORY records when tasks ran. Until I spotted many tasks I dropped weeks ago. If I dropped a task, I kind of presumed it to not be running anymore πŸ˜… So I needed to find a way to only get the history for non-deleted tasks.

Unfortunately there is no ACCOUNT_USAGE.TASKS view with a DELETED flag or something like this to query the horizon catalog across all databases and schemas.

TASK_VERSIONS looked like it should reveal which tasks are still current. But closer inspection revealed another gap: TASK_VERSIONS.STATE turns out to be NULL for both existing and deleted tasks.

The only globally available reliable source of truth for whether a task currently still exists I found turned out to be the SHOW TASKS command.

Why this needs a stored procedure

SHOW TASKS comes with two constraints that made a direct query approach unworkable: It's (1) a SHOW command, not a normal SELECT. Its output is only accessible via RESULT_SCAN(<query_id>). That meant FROM (SHOW TASKS) inside a CTE doesn't work. The correct query_id has to be referenced explicitly when calling RESULT_SCAN. And (2) calling RESULT_SCAN(LAST_QUERY_ID()) directly inside a complex query is fragile - it easily breaks the moment the query structure changes.

A stored procedure allows the execution sequence to be controlled explicitly: SHOW TASKS runs first, the result is captured, and only then does the query logic execute against it. Wrapping the logic in a SQL stored procedure was the solution giving full control over execution order and returning a clean result set.

The p_get_inactive_tasks procedure

The goal was clear: Find tasks that ran 29-56 days ago, have not run in the last 28 days, and still exist in the account. 28 days cover a monthly job cycle (in case there are any monthly tasks) and the 29–56 day lookback window gives enough history to catch tasks that ran recently but have since gone quiet.

The interface is intentionally simple:

call meta.log.p_get_inactive_tasks();

Returned columns: database_name, schema_name, task_name, last_run (latest run between 29 and 56 days ago).

Here is the full implementation:

create or replace procedure meta.log.p_get_inactive_tasks()
returns table (
      database_name     varchar
    , schema_name       varchar
    , task_name         varchar
    , last_run          timestamp_ltz
)
language sql
as
$$
declare
    res resultset;
begin
    show tasks in account;
    
    create or replace temporary table meta.log.t_current_tasks as
        select
              "database_name"   as database_name
            , "schema_name"     as schema_name
            , "name"            as task_name
        from table(result_scan(last_query_id(-1)))
    ;
    
    res := (
        with tasks_ran_past_28_days as (
            select distinct
                  database_name
                , schema_name
                , name                  as task_name
            from snowflake.account_usage.task_history
            where   scheduled_time >= dateadd(day, -28, current_timestamp())
        )
        , tasks_ran_29_to_56_days_ago as (
            select distinct
                  database_name
                , schema_name
                , name                  as task_name
                , max(scheduled_time)::timestamp_ltz   as last_run
            from snowflake.account_usage.task_history
            where   scheduled_time >= dateadd(day, -56, current_timestamp())
                and scheduled_time <  dateadd(day, -28, current_timestamp())
                and database_name != 'SNOWFLAKE'
            group by all
        )
        select
              t.database_name
            , t.schema_name
            , t.task_name
            , t.last_run
        from tasks_ran_29_to_56_days_ago t
            inner join meta.log.t_current_tasks ct
                on  ct.database_name = t.database_name
                and ct.schema_name   = t.schema_name
                and ct.task_name     = t.task_name
            left join tasks_ran_past_28_days r
                on  r.database_name = t.database_name
                and r.schema_name   = t.schema_name
                and r.task_name     = t.task_name
        where   r.task_name is null
        order by
              t.last_run desc
    );
    
    return table(res);
end
$$;

How the logic works

The procedure does three things.

1. Capture the current tasks

First the procedure runs SHOW TASKS IN ACCOUNT and immediately captures its results into a temporary table via RESULT_SCAN(LAST_QUERY_ID(-1)). This gives a snapshot of all tasks that exist at the moment the procedure runs. Everything else is joined against this table.

2. Compute task activity windows

Inside the WITH clause, two CTEs define the activity windows:

tasks_ran_past_28_days captures any task that ran at least once in the last 28 days. tasks_ran_29_to_56_days_ago captures tasks that last ran between 29 and 56 days ago, excluding the SNOWFLAKE system database itself.

3. Filter to "inactive but existing"

In the final SELECT:

An inner join with t_current_tasks removes every task that no longer exists. A left join with tasks_ran_past_28_days combined with WHERE r.task_name IS NULL keeps only tasks with no recent runs.

The result is a list that is actually actionable: every row represents something that can either be fixed, suspended intentionally, or deleted.

Putting it into the ops loop

Of course I don't want to manually call this procedure every now and then. I would miss it once and tasks go to sleep silently again. To wire this into my operational routine, I created a second procedure to call the first and, if the first returns any stopped tasks, send a notification to Teams (yeah, yeah, I know... πŸ™„)

create or replace procedure meta.log.p_notify_inactive_tasks()
returns string
language javascript
as
$$
    var initialText = "Inactive tasks (no runs in past 28 days): ";
    var resultText = initialText;
    var taskCount = 0;
    var tasks = [];

    var query = `
        call meta.log.p_get_inactive_tasks()
    `;

    var statement = snowflake.createStatement({sqlText: query});
    var result = statement.execute();

    while (result.next()) {
        var task = result.getColumnValue(1) + "." + result.getColumnValue(2) + "." + result.getColumnValue(3) 
                   + " (last run: " + result.getColumnValue(4) + ")";
        tasks.push(task);
        taskCount++;
    }

    if (taskCount > 0) {
        resultText += tasks.join(" + ");
        var sendResult = snowflake.execute({
            sqlText: "call SYSTEM$SEND_SNOWFLAKE_NOTIFICATION(SNOWFLAKE.NOTIFICATION.TEXT_PLAIN(SNOWFLAKE.NOTIFICATION.SANITIZE_WEBHOOK_CONTENT(?)), SNOWFLAKE.NOTIFICATION.INTEGRATION('i_teams_webhook'))",
            binds: [resultText]
        });
        return "Notification sent: " + resultText;
    } else {
        return "No inactive tasks found.";
    }
$$;

This procedure uses JavaScript because string manipulation is just so much easier there than in SQL 😜 And it utilizes an outbound notification integration:

CREATE NOTIFICATION INTEGRATION (webhooks) | Snowflake Documentation

I don't include a guide on how to set this up for Teams here, because this guide already exists:

Sending Alerts to MS Teams from Snowflake
Jeff provides a detailed walkthrough of how to send alerts to Teams from Snowflake

And the final piece of the puzzle is a task to frequently trigger the second procedure:

create or replace task meta.log.ta_inactive_tasks
    user_task_managed_initial_warehouse_size = 'XSMALL'
    serverless_task_max_statement_size = 'XSMALL'
    schedule = 'USING CRON 0 9 * * MON Europe/Zurich'
as
    call meta.log.p_notify_inactive_tasks()
;
alter task meta.log.ta_inactive_tasks resume;

And that's it: Now tasks can't silently disappear anymore 😎