robfelty.com


Unicode block names in regular expressions

Posted in bash, java, perl, python, regex

11/03/2014 by robfelty
Frequently, I find myself wanting to do some simple language detection. For Chinese, Japanese, and Korean, this can easily be done by looking at the types of characters in a piece of text. The simplest and most robust way to do this is to use Unicode block names: it is easy to write a regular expression that tests whether a character falls in a given block. For all the different possible blocks, see here: Unicode block names for use […]
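Python's built-in `re` module does not understand block-name escapes such as `\p{InHiragana}` (Java's `java.util.regex.Pattern`, for one, does). A Unicode block is just a contiguous range of code points, though, so the same test can be written with explicit ranges. Below is a minimal sketch assuming a deliberately crude rule: any kana means Japanese, any Hangul means Korean, and Han characters on their own mean Chinese. The function name `guess_cjk_language` and the rule itself are hypothetical illustrations, not the method from the full post.

```python
import re

# Code point ranges of a few Unicode blocks. Python's re has no \p{InBlock}
# syntax, so each block is spelled out as a character-class range instead.
BLOCKS = {
    "Hiragana": "\u3040-\u309f",
    "Katakana": "\u30a0-\u30ff",
    "Hangul Syllables": "\uac00-\ud7af",
    "CJK Unified Ideographs": "\u4e00-\u9fff",
}

KANA = re.compile("[" + BLOCKS["Hiragana"] + BLOCKS["Katakana"] + "]")
HANGUL = re.compile("[" + BLOCKS["Hangul Syllables"] + "]")
HAN = re.compile("[" + BLOCKS["CJK Unified Ideographs"] + "]")


def guess_cjk_language(text):
    """Return 'ja', 'ko', 'zh', or None using a hypothetical block-based rule."""
    if KANA.search(text):
        return "ja"  # kana are only used for Japanese
    if HANGUL.search(text):
        return "ko"
    if HAN.search(text):
        return "zh"  # Han characters with no kana: assume Chinese
    return None


print(guess_cjk_language("日本語のテキスト"))
```

One known weakness of this rule: a short Japanese string written only in kanji has no kana, so it would be reported as Chinese.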

Pretty printing json

Posted in bash, python

01/03/2014 by robfelty
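Here is a really simple way to pretty-print unformatted JSON:

    $ echo '{"foo": "lorem", "bar": "ipsum"}' | python -mjson.tool
    {
        "bar": "ipsum",
        "foo": "lorem"
    }

The keys come out sorted here because json.tool always sorted them before Python 3.5; since 3.5 the input order is kept unless you pass --sort-keys.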
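Inside a Python script rather than a shell pipe, roughly the same output comes from `json.dumps` with `indent=4` and `sort_keys=True`, which mirrors the sorted output of the command above. A minimal sketch; `pretty_json` is a hypothetical helper name.

```python
import json


def pretty_json(text, sort_keys=True):
    """Re-indent a JSON string with 4 spaces, optionally sorting the keys."""
    return json.dumps(json.loads(text), indent=4, sort_keys=sort_keys)


print(pretty_json('{"foo": "lorem", "bar": "ipsum"}'))
```

Passing `sort_keys=False` keeps the keys in their input order, which matches what `python -m json.tool` prints on Python 3.5 and later without `--sort-keys`.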

Using awk to sum rows of numbers

Posted in bash, linux, UNIX

11/14/2013 by robfelty
I have a script which takes a tab-delimited file for regression tests and converts it to XML. I want to do a sanity check to make sure that the number of utterances in my XML files matches the number in the tab-delimited .txt file. I can do this in two lines in UNIX:

    robert_felty$ wc -l samples2.txt
    72148 samples2.txt
    robert_felty$ find . -name '*.xml' | xargs grep -c "

[…]
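The excerpt cuts off mid-command, so the exact grep pattern is unknown. When `grep -c` is handed several files it prints one `path:count` line per file, and the post's title suggests adding those counts up with awk; one common form of that idiom is `awk -F: '{ s += $NF } END { print s }'`. Here is a Python sketch of the same summation, assuming that `path:count` output format. The name `sum_grep_counts` and the sample lines are hypothetical.

```python
def sum_grep_counts(lines):
    """Add up the counts printed by `grep -c`.

    Assumes each line is either 'path:count' (grep was given several files)
    or just 'count' (grep was given a single file). Blank lines are skipped.
    """
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Take the text after the last ':' so colons in file names are harmless.
        count = line.rsplit(":", 1)[-1]
        total += int(count)
    return total


print(sum_grep_counts(["a.xml:3", "b.xml:4", "c.xml:0"]))
```

Passing `sys.stdin` as `lines` lets the function sit at the end of the same `find ... | xargs grep -c ...` pipeline, and the total can then be compared with the `wc -l` figure.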

Site redesign

Posted in (x)html, bash, css, php, sql, wordpress

12/01/2010 by robfelty
I started my website in 2003. At the time it was hosted by the University of Michigan, where I was a graduate student; the university gave every student some space for a personal website. It was really great, though it came with some limitations, such as no PHP or CGI. I managed to kludge some server-side includes and JavaScript together to get a fairly decent food website. I also had some other stuff on my site, like some academic […]