I frequently find myself wanting to do some simple language detection. For Chinese, Japanese, and Korean, this can often be done by looking at which kinds of characters appear in the text. The simplest and most robust way I have found is to use Unicode block names: in a regex flavor that supports them, it is very simple to write a regular expression that tests whether a character belongs to a particular block. For all the different possible blocks, see here: Unicode block names for use […]
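
To make the idea concrete, here is a minimal sketch in Java, whose `java.util.regex.Pattern` accepts block names with an `In` prefix (for example `\p{InHiragana}`). The class name `CjkGuess`, the method `guess`, the returned language tags, and the order of the checks are all my own assumptions for illustration. Treating "Han characters and no kana or Hangul" as Chinese is only a heuristic, since a short Japanese phrase written entirely in kanji would be misclassified.

```java
import java.util.regex.Pattern;

// Hypothetical helper: guesses Chinese/Japanese/Korean from Unicode blocks.
final class CjkGuess {
    // Kana appears in Japanese text but not in Chinese or Korean.
    private static final Pattern KANA =
            Pattern.compile("[\\p{InHiragana}\\p{InKatakana}]");
    // Precomposed Hangul syllables indicate Korean.
    private static final Pattern HANGUL =
            Pattern.compile("\\p{InHangulSyllables}");
    // Han ideographs are shared by all three languages, so check them last.
    private static final Pattern HAN =
            Pattern.compile("\\p{InCJKUnifiedIdeographs}");

    // Returns "ja", "ko", "zh", or "unknown" (assumed labels, not a standard API).
    static String guess(String text) {
        if (KANA.matcher(text).find()) return "ja";
        if (HANGUL.matcher(text).find()) return "ko";
        if (HAN.matcher(text).find()) return "zh";
        return "unknown";
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source ASCII-only, whatever the file encoding.
        System.out.println(guess("\u3053\u308c\u306f")); // three hiragana characters
        System.out.println(guess("\ud55c\uad6d\uc5b4")); // three Hangul syllables
        System.out.println(guess("\u4e2d\u6587"));       // two Han ideographs
        System.out.println(guess("hello"));              // Latin only
    }
}
```

The kana check runs first because Japanese text usually mixes kana with Han characters. Checking Hangul before Han handles Korean text that includes some hanja in the same way.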