# Running SQL queries on CSV data
I was recently working with some CSV files pulled from a Samsung Health dump. I wanted to run a few queries against them to familiarize myself with their format and contents. SQL seemed like a natural choice, and it turns out there are myriad ways to query CSV data directly with SQL.
A few notes:
- The first line of the file is extraneous information that Samsung Health includes for its own purposes. Hence the `tail -n +2` command.
- It turns out that there is one more header value than in the data rows themselves. Some of these tools are not particularly happy about this, and so the `sed` command removes the very last (empty) header value. Both clean-up steps are sketched below.
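To make those two notes concrete, here is a tiny, entirely made-up stand-in for the real export (the metadata line, column names, and values are invented, and the actual files have many more columns) together with the clean-up applied to it:

```sh
# Hypothetical miniature of the export: line 1 is metadata, line 2 is the
# header (note the trailing comma), and the rest are data rows with one
# fewer value than the header.
printf '%s\n' \
  'extraneous Samsung Health metadata' \
  'device_type,model,create_time,' \
  '1,SM-R800,2019-03-01 10:00:00' \
  > sample.csv

# Drop the metadata line, then strip the dangling empty header value.
tail -n +2 sample.csv | sed 's@,$@@'
# -> device_type,model,create_time
# -> 1,SM-R800,2019-03-01 10:00:00
```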
## q

```sh
tail -n +2 device_profiles.csv \
| q -d, -H -b -O -T 'select * from -' \
| less -S
```
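The `-` in the query refers to standard input. Once this plumbing is set up, only the SQL changes; for instance, a quick row count using the same flags (just a sketch of swapping the query out):

```sh
# Same pipeline, different query: count the data rows.
tail -n +2 device_profiles.csv \
| q -d, -H -b -O -T 'select count(*) from -'
```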
## trdsql

```sh
tail -n +2 device_profiles.csv \
| trdsql -ih -icsv -oat 'select * from -' \
| less -S
```
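trdsql is not limited to table output; if I remember its flags correctly, swapping `-oat` for `-ojson` emits the result as JSON, which is handy for further processing:

```sh
# Same query, JSON output instead of an ASCII table (assumes trdsql's
# -ojson output flag).
tail -n +2 device_profiles.csv \
| trdsql -ih -icsv -ojson 'select * from -' \
| less -S
```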
## sqlite3

```sh
tail -n +2 device_profiles.csv \
| sqlite3 -csv :memory: '.import /dev/stdin stdin' '.mode table' 'select * from stdin' \
| less -S
```
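The same `.import` trick works against a database file instead of `:memory:`, which is convenient if you want to run several queries without re-parsing the CSV each time. A sketch, where `health.db` and the table name are arbitrary choices of mine:

```sh
# One-time import into a database file.
tail -n +2 device_profiles.csv \
| sqlite3 -csv health.db '.import /dev/stdin device_profiles'

# Later queries hit the database file directly.
sqlite3 health.db '.mode table' 'select count(*) from device_profiles'
```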
## duckdb

```sh
tail -n +2 device_profiles.csv \
| sed 's@,$@@' \
| duckdb -table -cmd 'create table stdin as select * from read_csv_auto("/dev/stdin"); select * from stdin;' \
| less -S
```
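The intermediate table is not strictly necessary here; DuckDB can query the CSV reader directly, so a shorter variant of the same idea (still relying on `-cmd` to consume the piped input) would be:

```sh
tail -n +2 device_profiles.csv \
| sed 's@,$@@' \
| duckdb -table -cmd "select * from read_csv_auto('/dev/stdin')" \
| less -S
```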
## csvsql + csvlook
```sh
tail -n +2 device_profiles.csv \
| sed 's@,$@@' \
| csvsql --query 'select * from stdin' \
| csvlook \
| less -S
```
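In this pipeline csvsql does the SQL work and csvlook only handles the rendering; when no query is needed at all, csvlook on its own will happily pretty-print the cleaned-up CSV:

```sh
tail -n +2 device_profiles.csv \
| sed 's@,$@@' \
| csvlook \
| less -S
```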