sql – We Saw a Chicken …

The cryptically-named q (it also bills itself as being able to â€œRun SQL directly on CSV files | Text as Dataâ€) is very nifty indeed. It allows you to run SQL queries on delimited text files. It seems to support the full SQLite SQL dialect, too.

I used to frequently query the IESO‘s Hourly Wind Generator Output report (which now hides behind a JS link to obscure the source URL, http://www.ieso.ca//imoweb/pubs/marketReports/download/HourlyWindFarmGen_20160122.csv).Â Now that the file has nearly 10 years of hourly data and many (but not all) wind projects, it may have outlived its usefulness. But it does allow me to show off some mad SQLite skills â€¦

The first problem is that the file uses nasty date formats. Today would be 23-Jan-16 in the report’s Date field, which is filled with the ugh. You can fix that, though, with a fragment of SQL modified from here:

printf("%4d-%02d-%02d", substr(Date, 8,2)+2000, (instr("---JanFebMarAprMayJunJulAugSepOctNovDec", substr(Date, 4,3))-1)/3, substr(Date, 1, 2)) as isodate

The above data definition sets the isodate column to be in the familiar and useful YYYY-MM-DD ISO format.

A related example would be to query the whole CSV file for monthly mean generation from Kingsbridge and K2 Wind projects (they’re next to one another) for months after K2’s commissioning in March 2015. Here’s what I did in q:

q -T -O -H -d, 'select printf("%4d-%02d", substr(Date, 8,2)+2000, (instr("---JanFebMarAprMayJunJulAugSepOctNovDec", substr(Date, 4,3))-1)/3) as isomonth, avg(KINGSBRIDGE) as kavg, avg(K2WIND) as k2avg from Downloads/HourlyWindFarmGen_20160122.csv where isomonth>"2015-03" group by isomonth'

which gave the results:

isomonthÂ Â  Â kavgÂ Â  Â k2avg
2015-04Â Â  Â 12.7277777778Â Â  Â 37.4569444444
2015-05Â Â  Â 8.94623655914Â Â  Â 67.6747311828
2015-06Â Â  Â 6.05833333333Â Â  Â 66.6847222222
2015-07Â Â  Â 3.96370967742Â Â  Â 45.372311828
2015-08Â Â  Â 6.34811827957Â Â  Â 67.436827957
2015-09Â Â  Â 7.29027777778Â Â  Â 79.7194444444
2015-10Â Â  Â 14.5658602151Â Â  Â 128.037634409
2015-11Â Â  Â 15.9944444444Â Â  Â 130.729166667
2015-12Â Â  Â 17.6075268817Â Â  Â 152.422043011
2016-01Â Â  Â 19.6408730159Â Â  Â 163.013888889

Neat! (or at least, I think so.)

I have a GIS problem:

There are two layers in the map above:

the green areas represent land lots, with owners Arnold, Beal, Carr, …
the red points represent wind turbines, or really anything that needs to be identified inside a lot.

While I used to be quite decent with plain ol’ SQL, the spatial extensions (such as are in SpatiaLite) are nipping my heid. What I need to query:

How many turbines does each land owner get?
Are there any turbines that aren’t in a lot?

Here’s the data, and I’ll show you a solution once I work it out.

Update: dang, the SpatiaLite tutorial is useful. Here’s how to get the count for each owner:

select Properties.owner, count(*) from Points, Properties where within(Points.geometry, Properties.geometry) group by Properties.owner;

which gives:

Owner	count(*)
Arnold	1
Beal	2
Dockrill	2

Nifty, what!? Now to work out the other one, which I suspect will use a ‘not within()’.

Update #2: Nope, the SpatiaLite mailing list came through:

SELECT id,
       unitid
FROM   points
WHERE  id NOT IN (SELECT points.id
                  FROM   points,
                         properties
                  WHERE  Within(points.geometry,properties.geometry));

I have to admit that I still don’t really think of nested SELECTs – my SQL formative years were spent on a DB that didn’t support them.

Tag: sql

q absolves data sins and makes CSV queries easy

diary of a geonumpty