Thursday, November 28, 2024

Snowflake's REDUCE Function to Glean Insights from SEC Filing Data

 Summary

  • What is the REDUCE Higher-order Function?
  • Learn the JSON structure of the SEC company filing from an example.
  • What fiscal end periods are represented in the JSON document?
  • Answer the question using the REDUCE higher-order function in Snowflake.

What is the REDUCE Higher-order Function?

Snowflake recently made the REDUCE higher-order function generally available. It adds another powerful, easy-to-use tool for processing arrays. REDUCE accumulates the values of an array into a single value. It takes three arguments: the input array, an initial accumulator value, and a lambda expression that defines how each array element is folded into the accumulator.

REDUCE( <array> , <init> , <lambda_expression> ) 
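
As a quick illustration of the three arguments, here is a minimal sketch on literal arrays, so it does not depend on any table. The lambda argument names acc and val and the column aliases are arbitrary choices for this example.

```sql
-- Sum the elements: the accumulator starts at 0 and each element is added to it.
SELECT REDUCE([1, 2, 3], 0, (acc, val) -> acc + val) AS SUM_OF_VALUES;  -- returns 6

-- Concatenate strings: the accumulator starts as the empty string
-- and each element is appended after a space.
SELECT REDUCE(['a', 'b', 'c'], '', (acc, val) -> acc || ' ' || val) AS STRING_VALUES;
```

The second query has the same shape as the one used later in this post: an empty-string accumulator that grows by one element per step.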

The JSON Structure of the SEC Filing

My goal is to understand the cash carried by Kimberly-Clark Corporation on its balance sheet. The company is known for products such as Huggies and Cottonelle. I want to list all the fiscal period end dates in the data. The data filed with the SEC can be inconsistent, especially in the fiscal periods it covers, so knowing which fiscal periods are present is invaluable. The SEC filing data may also contain repeated values. Investors want to compare current results with past results, so a Q2 report will usually include Q1 and Q2 data from the previous year as well. As a result, the same data point can appear in more than one filing.

Note: You can learn about my External Table structure here. On my LinkedIn profile, you can read a series of blog posts about my setup for querying SEC filings.

Here's the JSON structure we will use in the REDUCE function:

{
  "cik": 55785,
  "description": "Amount of currency on hand as well as demand deposits with banks or financial institutions. Includes other kinds of accounts that have the general characteristics of demand deposits.",
  "entityName": "KIMBERLY-CLARK CORPORATION",
  "label": "Cash and Cash Equivalents, at Carrying Value",
  "tag": "CashAndCashEquivalentsAtCarryingValue",
  "taxonomy": "us-gaap",
  "units": {
    "USD": [
      {
        "accn": "0001193125-10-038621",
        "end": "2006-12-31",
        "filed": "2010-02-24",
        "form": "10-K",
        "fp": "FY",
        "frame": "CY2006Q4I",
        "fy": 2009,
        "val": 361000000
      },
      {
        "accn": "0000055785-09-000026",
        "end": "2007-12-31",
        "filed": "2009-08-07",
        "form": "10-Q",
        "fp": "Q2",
        "fy": 2009,
        "val": 473000000
      }
    ]
  }
}

What fiscal end periods are represented in the JSON document?

I have created an external table called CONS_STAPLES_CASH_AND_CASH_EQUIVALENTS_ET. For the REDUCE function, the input is the path to the array:

Path to the Array Elements:

VALUE:"units":"USD" 

I want a single concatenated string of all the fiscal period end dates in the JSON document. Each end date is stored under the "end" key of an array element. The accumulator starts out as an empty string, so the init parameter is ''. Finally, in the lambda expression, the arg1 argument is the accumulator, and the arg2 argument is the array element currently being processed.

Lambda Expression and the Query:

(arg1, arg2) -> arg1 || ' ' || arg2:"end" || ', '

SELECT
    TICKER_SYMBOL,
    VALUE,
    REDUCE(VALUE:"units":"USD", '', (arg1, arg2) -> arg1 || ' ' || arg2:"end" || ', ') AS FISCAL_PERIOD_END_DATES
FROM
    CONS_STAPLES_CASH_AND_CASH_EQUIVALENTS_ET;

When I execute the query, the REDUCE function reads the value of the "end" key from each array element, appends it to the accumulator, and returns the final string (Exhibit 1). The screenshot shows that for Kimberly-Clark Corp (KMB), the JSON data begins with the fiscal period ending 2006-12-31, whereas for Target (TGT) it begins with 2016-01-30.
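
To try the same pattern without an external table, the sketch below feeds a trimmed copy of the two sample array elements through PARSE_JSON. The CTE name sample_filing and the lambda argument names acc and elem are hypothetical, and the explicit ::ARRAY and ::STRING casts are my own additions to make the types explicit; treat it as an illustration of the query above rather than the exact setup used in this post.

```sql
-- Hypothetical inline sample standing in for one row of the external table.
WITH sample_filing AS (
    SELECT PARSE_JSON('{"units": {"USD": [{"end": "2006-12-31", "val": 361000000}, {"end": "2007-12-31", "val": 473000000}]}}') AS VALUE
)
SELECT
    -- Start from '' and append each element's "end" value to the accumulator.
    REDUCE(
        VALUE:"units":"USD"::ARRAY,
        '',
        (acc, elem) -> acc || ' ' || elem:"end"::STRING || ', '
    ) AS FISCAL_PERIOD_END_DATES
FROM sample_filing;
```

Because the lambda adds a leading space and a trailing ', ' for every element, the resulting string starts with a space and ends with a comma and a space.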

Exhibit 1: The Fiscal Period End Date Returned by the REDUCE function.

Snowflake Snowsight

I can also tell that the SEC filing data contains duplicates that I must handle in my query. For example, 2007-12-31 appears multiple times in the file I downloaded from the SEC (Exhibit 2).
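
One possible way to collapse such repeats is sketched below. It swaps REDUCE for a different combination of functions: TRANSFORM to pull out each "end" value, ARRAY_DISTINCT to drop the repeats, and ARRAY_TO_STRING to join what is left. The CTE name sample_filing, the duplicated sample row, and the column alias are hypothetical, and I have not run this against the external table from this post.

```sql
-- Hypothetical inline sample with 2007-12-31 repeated.
WITH sample_filing AS (
    SELECT PARSE_JSON('{"units": {"USD": [{"end": "2006-12-31"}, {"end": "2007-12-31"}, {"end": "2007-12-31"}]}}') AS VALUE
)
SELECT
    ARRAY_TO_STRING(
        ARRAY_DISTINCT(
            -- Extract the "end" value from every array element.
            TRANSFORM(VALUE:"units":"USD"::ARRAY, elem -> elem:"end")
        ),
        ', '
    ) AS DISTINCT_FISCAL_PERIOD_END_DATES
FROM sample_filing;
```

ARRAY_DISTINCT does not promise to preserve the original element order, so the dates may need sorting if order matters.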

Exhibit 2: Fiscal Period End Dates Accumulated By the REDUCE Higher-order Function.

SEC.GOV

I can quickly see the data in my JSON files downloaded from the SEC. I did not have to use a LATERAL FLATTEN to get at the data. The REDUCE function boosts my efficiency when I am dealing with JSON data. Try out Snowflake's REDUCE and other Higher-order functions; they will make you more productive.
