robfelty.com


Exploring querying parquet with Hive, Impala, and Spark

Posted in wordpress

Posted on November 20, 2015 by robfelty
At Automattic, we have a lot of data from WordPress.com, our flagship product. We have over 90 million users, and 100 million blogs. Our data team is constantly analyzing our data to discover how we can better serve our users. In 2015, one of our big focuses has been to improve the new user experience. As part of this we have been doing funnel analyses for our signup process. That is, for every person who starts our signup process, what […] (Read more)

(Un)verified

Posted in wordpress

Posted on August 6, 2015 by robfelty
According to the website my city uses for paying water bills, I am both a verified and an unverified user. Not sure how that is possible.

I am now an automattician!

Posted in wordpress

Posted on April 20, 2015 by robfelty
Today is my first official day at Automattic. I am excited!

UNIX tip – xargs with multiple commands

Posted in UNIX

Posted on April 1, 2015 by robfelty
Xargs is an extremely powerful complement to the awesome find command. One downside is that you usually need to have a single pipeline. By default you can’t string together a bunch of commands which are not piped. However, it is possible to call a shell with xargs. That way you can execute multiple commands in the shell, while from xargs’s point of view it is calling a single command – the shell interpreter. More details here: bash – xargs […] (Read more)
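As a rough sketch of the idea (not necessarily the exact form the full post uses), the snippet below has xargs invoke `sh -c` once per file, and the shell then runs two unpiped commands. The scratch directory, the sample files, and the `run_per_file` helper are all made up for illustration; the file name is passed as a positional argument (`$1`) instead of being pasted into the command string, which is an assumption about what is safest with odd file names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory with two small sample files.
workdir=$(mktemp -d)
printf 'one\ntwo\n' > "$workdir/a.txt"
printf 'three\n' > "$workdir/b.txt"

# run_per_file DIR: for every .txt file under DIR, xargs starts one
# "sh -c" process; inside it, two separate commands run in sequence.
# "_" fills $0 of the inner shell, so the file name arrives as $1.
run_per_file() {
  find "$1" -name '*.txt' -print0 | sort -z |
    xargs -0 -I{} sh -c '
      echo "file: $(basename "$1")"
      echo "lines: $(wc -l < "$1" | tr -d " ")"
    ' _ {}
}

run_per_file "$workdir"
rm -rf "$workdir"
```

Because xargs only ever sees `sh` as the command, anything the shell can do (several commands, conditionals, redirections) is available per file.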

Postgres tip of the day – show size of all databases

Posted in sql

Posted on March 26, 2015 by robfelty
Here is a handy little query to show the size of all the databases on a particular postgres server:

    SELECT pg_database.datname,
           pg_size_pretty(pg_database_size(pg_database.datname)) AS size
    FROM pg_database;

       datname    |  size
    --------------+---------
     template1    | 6369 kB
     template0    | 6361 kB
     postgres     | 6589 kB
     foo          | 55 MB
     bar          | 5129 MB
     foobar       | 85 GB
    (6 rows)