User-Agent strings encode information about users that visit sites,
including things like OS, device model, version numbers, etc.
Some examples:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) RockMelt/0.8.36.78 Chrome/7.0.517.44 Safari/534.7
Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; ja-jp) AppleWebKit/533.16 (KHTML, like Gecko) Version/4.1 Safari/533.16
Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; de-de) AppleWebKit/533.16 (KHTML, like Gecko) Version/4.1 Safari/533.16
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/533.4 (KHTML, like Gecko) Version/4.1 Safari/533.4
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; nb-no) AppleWebKit/533.16 (KHTML, like Gecko) Version/4.1 Safari/533.1
Parsing these strings from log data can often get complicated. When
processing this kind of data within a data warehouse context, you likely
find yourself trying to cobble together a nested mess of string parsing
functions and conditional logic to account for any possible string you
may encounter. This is a great use case for User Defined
Functions (UDFs), which typically allow one to define custom
routines that can be used like native instructions in-database.
Depending on the database engine, these may just be essentially SQL
macros, or they may allow you to leverage other languages like
Javascript or Python.
Even with UDFs, the amount of logic needed to define robust UA
parsing functions can be intimidating to manage. Fortunately there are
already many comprehensive parsing libraries out there that account for
all the different formats and structures one may run into when parsing
UA strings. Using Snowflake
as an example, we can define a UDF that interfaces with a UA
parser:
Once the UDF has been defined, we can use it within our queries like
this:
Using dbt we can define a macro
that will create this UDF in our data warehouse. I authored a dbt packge
that does exactly that: https://github.com/andrewcstewart/dbt-uaparser
This dbt
package implements a User Defined Function that
leverages the UA parsing code from https://faisalman.github.io/ua-parser-js/. You can find
the documentation here.
Footnotes
Citation
For attribution, please cite this work as
Stewart (2022, July 1). Andrew Stewart: v0.1.0 of dbt-uaparser released. Retrieved from https://andrewcstewart.github.io/posts/2022-07-01-v010-of-dbt-uaparser-released/
BibTeX citation
@misc{stewart2022v0.1.0,
author = {Stewart, Andrew},
title = {Andrew Stewart: v0.1.0 of dbt-uaparser released},
url = {https://andrewcstewart.github.io/posts/2022-07-01-v010-of-dbt-uaparser-released/},
year = {2022}
}