Use Perplexity in Snowflake

Send requests to (and receive responses from) the Perplexity API using a Snowflake Snowsight notebook

Snowflake Cortex is cool, isn’t it? Using LLMs inside SQL - sounds pretty useful!

Still, I always felt like something was missing... Something critical... The capability to search, like perplexity.ai does. Buuut… Perplexity has an API. And Snowflake can call external APIs… well, well, well!

Technically, it doesn't search on the fly, but the Sonar models utilize up-to-date crawl results.

So you have a task at hand on which you would like to use Perplexity’s search capability, say you want to check whether the company addresses in your database are up to date. Let’s do this with a Snowsight notebook, extending this to a UDF should not be too hard from there.

Sure, you could ask an address broker to verify the addresses for you - that’s what they are good at, right? But then you wouldn’t have this particular reason to build a Perplexity integration inside Snowflake, so bear with me 😜

1. get the data ready

Obviously we’ll need some data to work with, so let’s create a table containing some addresses. In my case, I want to use addresses of Salesforce Accounts synced to Snowflake:

create or replace table sandbox.public.addresses_to_verify as
select
    id
    , concat(
        '1. company name: ', name
        , '\n2. address: ', coalesce(billingstreet, ''), ', ', coalesce(billingpostalcode, ''), ' ', coalesce(billingcity, ''), ', ', coalesce(billingcountry, '')
        , case when website is not null then concat('\n3. website: ', website) else '' end
    ) as verify_this
from raw.salesforce.account
where ... 
    and isdeleted = false
limit 10
;

2. prepare the notebook

Sending requests to an external endpoint from inside Snowflake requires an external access integration - let’s create it:

You’ll need Account Admin privileges to do so!
create or replace network rule nr_perplexity
mode = 'EGRESS'
type = 'HOST_PORT'
value_list = ('api.perplexity.ai:443');

create or replace external access integration i_perplexity
allowed_network_rules = (nr_perplexity)
enabled = true;

grant usage on integration i_perplexity to role sysadmin;

Grant usage to whichever role you’re using to run the notebook. Then, in the notebook’s settings, add the integration under External access.

Since we’ll send HTTP requests to Perplexity, we’ll also need the requests package - add it via the notebook’s Packages dropdown.

3. the actually interesting part

Now the actual notebook fun begins 🤓

We’ll first fetch the data from our prepared table:

Note how I use upper-case column names here - Snowflake stores unquoted identifiers in upper case, so everything in Snowflake is effectively upper-case unless explicitly quoted otherwise.
from snowflake.snowpark.session import Session

# Get the active session
session = Session.builder.getOrCreate()

# Fetch data from the table
df = session.table("sandbox.public.addresses_to_verify").select("ID", "VERIFY_THIS")

Next, prepare the function to send a request to Perplexity:

import requests

def send_to_perplexity(text):
    api_key = "pplx-..."  # Replace with your actual API key
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "llama-3.1-sonar-small-128k-online",
        "messages": [
            {"role": "system", "content": """You are a marketing assistant verifying the address of a company. You will receive the following information:
1. company name
2. address
3. website - this is optional and sometimes missing

Your task is to verify if the address of the company is correct. Please return the following information for the company you receive:
"address_is_correct": boolean - if the address for this company is correct, return TRUE, else FALSE
"updated_address": string - the correct address of the company

Return the information as a JSON object with the 2 keys "address_is_correct" and "updated_address" and their corresponding values.

Only return the JSON without any comments, notes, reasoning or explanation."""},
            {"role": "user", "content": f"{text}"}
        ]
    }
    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers=headers,
        json=data,
        timeout=60,  # don't hang forever on a stuck request
    )
    response.raise_for_status()  # surface HTTP errors instead of failing later on a missing key
    return response.json()["choices"][0]["message"]["content"]

Here you’ll have to add your Perplexity API key and pick the model you want to use. Also (obviously) adjust the system prompt to whatever you want Perplexity to do for you.

And then, we’ll process each row of our table:

verified_addresses = (
    df.to_pandas()
    .apply(lambda row: (row["ID"], send_to_perplexity(row["VERIFY_THIS"])), axis=1)
    .tolist()
)

This will take a while, as rows are processed sequentially: I tested 4,000 rows and it took about 2 hours (and cost about 20 USD in Perplexity credits). Not the best performance, sure, but the result was worth the wait 😜 This is why the whole endeavor is (currently, and in my opinion) not suitable for frequent pipelines, but mostly for ad-hoc operations.
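If the sequential wait bothers you, the same loop can be parallelized with a thread pool. A minimal sketch - with a stub standing in for the real send_to_perplexity defined above, and a deliberately small worker count (an assumption; check Perplexity’s rate limits for your plan):

```python
from concurrent.futures import ThreadPoolExecutor

def send_to_perplexity(text):
    # Stub so the sketch runs standalone; in the notebook,
    # the real API-calling function from above is used instead.
    return f"processed: {text}"

# In the notebook this would be df.to_pandas() rows.
rows = [("1", "addr A"), ("2", "addr B"), ("3", "addr C")]

def verify(row):
    row_id, text = row
    return (row_id, send_to_perplexity(text))

# pool.map preserves input order, so results still line up with IDs
with ThreadPoolExecutor(max_workers=4) as pool:
    verified_addresses = list(pool.map(verify, rows))
```

With 4 workers the wall-clock time drops roughly fourfold, at the price of hitting the API harder - keep the worker count conservative.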

Finally, once all rows are processed, they are stored in another table in Snowflake:


# Create a new DataFrame with verified addresses
verified_df = session.create_dataframe(verified_addresses, schema=["ID", "VERIFIED_ADDRESS"])

# Write the results to the destination table
verified_df.write.mode("overwrite").save_as_table("sandbox.public.verified_addresses")

The only thing left to do at this point is to try_parse_json() the VERIFIED_ADDRESS column in this output table and continue processing the result in whichever way you want.

Maybe after extracting the substring between the first opening and the last closing curly brace, depending on how strictly the chosen model follows the prompt.
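That brace-extraction fallback could look like this - a small helper (name and behavior are my own sketch, not part of the notebook above) that tolerates models wrapping the JSON in chatty text:

```python
import json

def extract_json(raw):
    """Parse the substring between the first '{' and the last '}'.

    Returns the parsed dict, or None if no valid JSON is found.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None

reply = 'Sure! Here is the result:\n{"address_is_correct": false, "updated_address": "Main St 1"}'
result = extract_json(reply)
```

Returning None instead of raising keeps a single malformed reply from killing the whole batch - you can filter those rows out afterwards and re-run just them.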