v0.1.0 of dbt-uaparser released

dbt data engineering packages

User-Agent strings encode information about users that visit sites. This dbt package implements a User Defined Function that leverages UA parsing code that can be used within database queries.

Andrew Stewart
2022-07-01

User-Agent strings encode information about users that visit sites, including things like OS, device model, version numbers, etc.

Some examples:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) RockMelt/0.8.36.78 Chrome/7.0.517.44 Safari/534.7
Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; ja-jp) AppleWebKit/533.16 (KHTML, like Gecko) Version/4.1 Safari/533.16
Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; de-de) AppleWebKit/533.16 (KHTML, like Gecko) Version/4.1 Safari/533.16
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/533.4 (KHTML, like Gecko) Version/4.1 Safari/533.4
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; nb-no) AppleWebKit/533.16 (KHTML, like Gecko) Version/4.1 Safari/533.1

Parsing these strings from log data can often get complicated. When processing this kind of data within a data warehouse context, you likely find yourself trying to cobble together a nested mess of string parsing functions and conditional logic to account for any possible string you may encounter. This is a great use case for User Defined Functions (UDFs), which typically allow one to define custom routines that can be used like native instructions in-database. Depending on the database engine, these may just be essentially SQL macros, or they may allow you to leverage other languages like Javascript or Python.

Even with UDFs, the amount of logic needed to define robust UA parsing functions can be intimidating to manage. Fortunately there are already many comprehensive parsing libraries out there that account for all the different formats and structures one may run into when parsing UA strings. Using Snowflake as an example, we can define a UDF that interfaces with a UA parser:

Once the UDF has been defined, we can use it within our queries like this:

Using dbt we can define a macro that will create this UDF in our data warehouse. I authored a dbt packge that does exactly that: https://github.com/andrewcstewart/dbt-uaparser

This dbt package implements a User Defined Function that leverages the UA parsing code from https://faisalman.github.io/ua-parser-js/. You can find the documentation here.

Citation

For attribution, please cite this work as

Stewart (2022, July 1). Andrew Stewart: v0.1.0 of dbt-uaparser released. Retrieved from https://andrewcstewart.github.io/posts/2022-07-01-v010-of-dbt-uaparser-released/

BibTeX citation

@misc{stewart2022v0.1.0,
  author = {Stewart, Andrew},
  title = {Andrew Stewart: v0.1.0 of dbt-uaparser released},
  url = {https://andrewcstewart.github.io/posts/2022-07-01-v010-of-dbt-uaparser-released/},
  year = {2022}
}