robfelty.com


Unicode block names in regular expressions

Posted in bash, java, perl, python, regex

11/03/2014 by robfelty
Frequently, I find myself wanting to do some simple language detection. For Chinese, Japanese, and Korean, this is easy to do by looking at which kinds of characters appear in a text. The simplest and most robust way to do this is to use Unicode block names. It is very simple to write a regular expression that tests whether a character belongs to a given block; in Java and Perl, for example, \p{InHiragana} matches any character in the Hiragana block. For all the different possible blocks, see here: Unicode block names for use […]
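Python's built-in re module has no syntax for Unicode block names, so this minimal sketch swaps in the explicit code point ranges of four blocks (Hiragana, Katakana, CJK Unified Ideographs, Hangul Syllables) as a stand-in. The helper name guess_cjk_language and the "ja"/"ko"/"zh" labels are hypothetical, and treating any Han text without kana as Chinese is a deliberately crude assumption.

```python
import re

# Code point ranges of a few Unicode blocks. Python's re module cannot use
# block names like \p{InHiragana}, so explicit ranges stand in for them.
BLOCKS = {
    "Hiragana": re.compile("[\u3040-\u309f]"),
    "Katakana": re.compile("[\u30a0-\u30ff]"),
    "CJK Unified Ideographs": re.compile("[\u4e00-\u9fff]"),
    "Hangul Syllables": re.compile("[\uac00-\ud7af]"),
}


def guess_cjk_language(text):
    """Rough guess of 'ja', 'ko', 'zh', or None (hypothetical helper)."""
    # Japanese text often contains Han characters too, so check kana first.
    if BLOCKS["Hiragana"].search(text) or BLOCKS["Katakana"].search(text):
        return "ja"
    if BLOCKS["Hangul Syllables"].search(text):
        return "ko"
    # Han characters with no kana: assume Chinese (a crude heuristic).
    if BLOCKS["CJK Unified Ideographs"].search(text):
        return "zh"
    return None


for sample in ["日本語です", "안녕하세요", "你好", "hello"]:
    print(sample, guess_cjk_language(sample))
```

The order of the checks is the main design choice: because Japanese mixes kana with Han characters, the kana test has to run before the Han test.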

Monkey patching in python

Posted in python

07/01/2014 by robfelty
I was just reading an article about Martijn Pieters, who is a Python expert, and he mentioned monkey patching. I did not know what monkey patching was, so I googled it and found a great answer on Stack Overflow. Basically, it takes advantage of Python's class access philosophy. Unlike Java, which has a strict access policy, Python lets you reassign any attribute or method of a class at runtime. So it is possible to write code like this: from SomeOtherProduct.SomeModule import SomeClass […]
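Here is a small, self-contained sketch of the idea. SomeClass is defined locally as a hypothetical stand-in for a class imported from another module, and speak and patched_speak are illustrative names. It shows that rebinding a method on the class affects every instance, including ones created before the patch, and that keeping a reference to the original lets you undo it.

```python
class SomeClass:
    """Hypothetical stand-in for a class imported from another module."""

    def speak(self):
        return "original"


def patched_speak(self):
    """Replacement method; it takes self like any other instance method."""
    return "patched"


before = SomeClass()               # instance created before the patch
original_speak = SomeClass.speak   # keep a reference so we can undo the patch

SomeClass.speak = patched_speak    # the monkey patch: rebind the method on the class
after = SomeClass()

# Method lookup goes through the class, so both instances see the new method.
patched_results = (before.speak(), after.speak())
print(patched_results)             # ('patched', 'patched')

SomeClass.speak = original_speak   # restore the original behavior
restored = before.speak()
print(restored)                    # original
```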

Java anchored regex

Posted in java, regex

04/03/2014 by robfelty
I just discovered this today when doing some regex in Java. When I first started doing regex in Java, I was surprised to learn that Java's matches() methods (String.matches() and Matcher.matches()) treat every regular expression as anchored. That is, if you test the string foobar against "foo", it will not match. This is different from grep, perl, and other tools. In other words, for matches(), the regexes "foo" and "^foo$" are equivalent. If you want to find foo within foobar you […]
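For comparison, Python's re module offers both behaviors under different names. This is an analogy in Python, not Java code: re.fullmatch behaves like Java's matches() (the whole string must match), while re.search behaves like Java's Matcher.find() (the pattern may match anywhere). The wrapper names java_style_matches and java_style_find are hypothetical.

```python
import re


def java_style_matches(pattern, text):
    """Whole-string match, analogous to Java's String.matches()."""
    return re.fullmatch(pattern, text) is not None


def java_style_find(pattern, text):
    """Match anywhere in the string, analogous to Java's Matcher.find()."""
    return re.search(pattern, text) is not None


print(java_style_matches("foo", "foobar"))    # False: implicitly anchored
print(java_style_matches("^foo$", "foobar"))  # False: same result as "foo"
print(java_style_find("foo", "foobar"))       # True: substring search
```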

Solr DataImportHandler preImportDeleteQuery gotcha

Posted in lucene, solr

03/31/2014 by robfelty
One handy feature of the DataImportHandler in Solr is that you can group documents by different entities. In the MKB we have a couple of different kinds of entities we import: songs, albums, tvshows, etc. Sometimes we make a change or improvement to the underlying data of one type of entity and want to test it out. Instead of reimporting all the data, we can reimport just that one entity. To do this correctly, we need to define a […]
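A single-entity reimport is triggered by sending the DataImportHandler a full-import command with an entity parameter. This minimal sketch only builds that request URL. It assumes the handler is registered at /dataimport, and the base URL, core name (mkb), and helper name dih_entity_import_url are all hypothetical. Since clean=true asks DIH to delete documents before importing, check your configuration before pointing this at a real index.

```python
from urllib.parse import urlencode


def dih_entity_import_url(base_url, entity, clean=True, commit=True):
    """Build a DataImportHandler full-import URL for one entity (hypothetical helper).

    Assumes the handler is mounted at /dataimport under the core URL.
    """
    params = {
        "command": "full-import",
        "entity": entity,               # reimport only this entity
        "clean": str(clean).lower(),    # delete documents before importing?
        "commit": str(commit).lower(),  # commit when the import finishes?
    }
    return base_url.rstrip("/") + "/dataimport?" + urlencode(params)


print(dih_entity_import_url("http://localhost:8983/solr/mkb", "songs"))
```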

Pretty printing json

Posted in bash, python

01/03/2014 by robfelty
Here is a really simple way to pretty-print some unformatted JSON:

    $ echo '{"foo": "lorem", "bar": "ipsum"}' | python -mjson.tool
    {
        "bar": "ipsum",
        "foo": "lorem"
    }
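The same thing can be done from inside a Python script. This sketch parses the string and re-serializes it with a 4-space indent and sorted keys, which reproduces the sorted output shown for json.tool. As far as I know, in Python 3.5 and later json.tool keeps the input key order unless you pass --sort-keys, so on a newer Python the command-line output may list "foo" first.

```python
import json

raw = '{"foo": "lorem", "bar": "ipsum"}'

# Parse the compact JSON, then re-serialize it with indentation and sorted keys.
pretty = json.dumps(json.loads(raw), indent=4, sort_keys=True)
print(pretty)
```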