robfelty.com


UNIX tip of the day: duplicate and replace lines with awk

Posted in linguistics, UNIX

Quarter note = 01182019 robfelty
Treble clef 4/4 Time
Today I got some data I wanted to add to my machine learning training datasets for named entity recognition. My system is designed to be used with output from automatic speech recognition (ASR). It is often hard to be certain whether ASR output will contain hyphens (e.g. email vs. e-mail), so I frequently include both variants to be robust. I was able to add these variants automatically with a quick awk one-liner: awk '/-/ {print; gsub("-", " ")} […] (Read more)
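Since the one-liner is cut off in this excerpt, here is a minimal sketch of the general idea rather than the exact command from the full post. It assumes the goal is to keep every line and, for lines containing a hyphen, also emit a copy with hyphens replaced by spaces. The function name `add_hyphen_variants` and the sample input are made up for illustration.

```shell
# Hypothetical helper (not from the original post): print every input line;
# for lines containing a hyphen, print the original first, then a variant
# with each hyphen replaced by a space.
add_hyphen_variants() {
  # First rule: on hyphenated lines, print the line as-is, then rewrite it.
  # Second rule: print the (possibly rewritten) line for every input line.
  awk '/-/ { print; gsub(/-/, " ") } { print }' "$@"
}

# Example with made-up data:
printf '%s\n' 'send an e-mail' 'call home' | add_hyphen_variants
# send an e-mail
# send an e mail
# call home
```

Lines without a hyphen pass through once, so the output can be fed straight back into a training list without introducing duplicates of unhyphenated entries.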

Vim regex-fu for LaTeX

Posted in latex, linguistics

Quarter note = 09242009 robfelty
When writing a beamer presentation with LaTeX, I organize my presentation into sections and subsections. Frequently, the title of the first frame (slide) in a subsection has the same name as the subsection. Let’s say I start off with the following structure:

\section[corpora]{Accessing text corpora}
\subsection[gutenberg]{The Gutenberg Corpus}
\subsection[chat]{The web and chat Corpus}
\subsection[brown]{The Brown Corpus}
\subsection[reuters]{The Reuters Corpus}
\subsection[inaugural]{The Inaugural address Corpus}
\subsection[annotated]{Annotated corpora}
\subsection[foreign]{Corpora in other languages}
\subsection[DIY]{Loading your own corpora}

For each subsection, I want to put […] (Read more)

Why doesn’t Mac update standard UNIX utilities?

Posted in linguistics, linux, mac osx, perl

Quarter note = 09152008 robfelty
I am currently teaching a course on programming for linguists. We are using Python, but for the first few classes, I have been going over some standard UNIX utilities like cd, ls and such, plus using regular expressions with grep and sed. I actually don’t use sed that much. I tend to reach for Perl, since I know it better, and it can do pretty much everything sed can, plus much more. But sed is simpler […] (Read more)

Bash one-liners to the rescue

Posted in general, linguistics, linux

Quarter note = 07152008 robfelty
Lately I find myself using handy bash one-liners more and more. I think that this is where UNIX/Linux can really start to shine. There are so many little programs that each do one thing, and do it well. But the ability to combine them through pipes means you have extremely flexible and powerful tools at the ready. I have been working on a new project at work to come up with some lists for testing speech recognition. We […] (Read more)
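To make the point about pipes concrete, here is a generic sketch of the idea. It is not the pipeline from the full post (which is cut off here), just the classic word-frequency chain built from small standard utilities. The function name `word_counts` and the sample sentence are made up for illustration.

```shell
# Hypothetical helper (not from the original post): read text on stdin and
# print each distinct word with its count, most frequent first.
word_counts() {
  # 1. tr -cs: turn every run of non-letters into a single newline,
  #    which leaves one word per line.
  # 2. tr: fold upper case to lower case so "The" and "the" count together.
  # 3. sort + uniq -c: group identical words and count them.
  # 4. sort -rn: order by count, highest first.
  tr -cs '[:alpha:]' '\n' |
    tr '[:upper:]' '[:lower:]' |
    sort |
    uniq -c |
    sort -rn
}

# Example with made-up data:
printf 'The cat and the hat and THE bat\n' | word_counts
```

Each stage does one small job, and any of them can be swapped out (for example, replacing the final `sort -rn` with `head` or `grep`) without touching the rest.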

Working on making fancy graphs with R / fixed versus random babble

Posted in linguistics

Quarter note = 04032008 robfelty
I have been working on learning R for several months now, and continue to get better at it and enjoy it more all the time. I am currently working on a spoken word recognition project at work. The task we are using is quite simple. Participants listen to words that have been mixed with multi-talker babble (kind of like background conversation at a cocktail party), and type in what they hear. We are analyzing the errors they make to try […] (Read more)