Bash one-liners to the rescue

July 15th, 2008

Lately I find myself using handy bash one-liners more and more. I think this is where unix/linux really starts to shine. There are so many little programs that do just one thing, and do it well, and the ability to combine them through pipes means you have extremely flexible and powerful tools at the ready.

I have been working on a new project at work to come up with some lists for testing speech recognition. We decided to use the TIMIT database, which contains recordings of many different sentences from many different speakers all around America. I first wrote a perl script to generate some basic stats on the sentences, like how many words were in each sentence, and what the word frequency for those words is. Then I wrote a perl script to randomly select some of the sentences, and create several different lists of sentences. Finally, I wrote an R script which took the original .wav files, and mixed in signal-dependent noise in one channel, so that we can vary the signal to noise ratio during presentation of the stimuli by adjusting the balance on our sound system.

Along the way, I ran into a couple of problems with the original sound files. It turns out that 446 of the 6300 sound files were clipped and highly distorted. I noticed this while listening to a few of the files I had generated with R. I could have gone through all 6300 files manually and removed the distorted ones, but that would have taken a long time. Instead, I used the program sox, which is a low-level, powerful audio processing program. I first used the find command to find all .wav files in the directory I was interested in (including sub-directories), then I passed each file to sox, and told sox not to write any output file, but instead just give me some stats (-n stat). After some testing with a few clipped and non-clipped files, I realized that for clipped files, the output from sox ended with a line that said either “Try: blah blah” or “Can’t determine type”. I later discovered that some clipped files produced neither line, but these had a maximum amplitude of 1 or a minimum of -1. So I knew that any clipped file would produce at least one of these lines. I passed the results from sox to grep (notice I had to redirect STDERR to STDOUT with 2>&1, since sox prints its stats on STDERR), and if the output contained a line starting with “Try:” or “Can’t”, or an amplitude line reading 1.00 or -1.00, I renamed that file to $file.clipped.

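# Read one filename per line so that paths containing spaces survive intact
find . -name "*.wav" -print | while IFS= read -r file; do
  # sox prints its stats on STDERR, hence 2>&1; grep -q only sets the exit status
  if sox "$file" -n stat 2>&1 | grep -Eq "^(Try:|Can't|(Minimum|Maximum) amplitude:[[:space:]]+-?1\.00)"; then
    echo "$file CLIPPED"
    mv "$file" "$file.clipped"
  fi
done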
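The grep pattern is the part most likely to need tweaking, so it can help to pull it into a small function that can be tried on sample text without running sox at all. The following is only a sketch of that idea: the function names looks_clipped and mark_clipped are made up for illustration, and the -print0 / read -d '' pairing is a swapped-in variant that also copes with newlines in filenames.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads sox "stat" output on stdin and succeeds
# if any line matches the clipping indicators described above.
looks_clipped() {
  grep -Eq "^(Try:|Can't|(Minimum|Maximum) amplitude:[[:space:]]+-?1\.00)"
}

# Hypothetical wrapper: renames every clipped .wav under the given
# directory (default: current directory) to <name>.wav.clipped.
mark_clipped() {
  local dir=${1:-.}
  find "$dir" -name '*.wav' -print0 |
    while IFS= read -r -d '' file; do
      if sox "$file" -n stat 2>&1 | looks_clipped; then
        echo "$file CLIPPED"
        mv -- "$file" "$file.clipped"
      fi
    done
}
```

Nothing happens until mark_clipped is called with a directory, so the pattern can be tested first by piping a line of text into looks_clipped and checking its exit status.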

After doing this, I simply amended my perl script which randomly generated lists to make sure that the wav file actually existed. Clipped files now ended in .clipped, instead of .wav.

There was an additional problem I had previously discovered with these sound files. They seemed to have some non-standard headers in them, which meant that the R script I was using to add noise to them couldn’t read the files. However, passing the files through sox made the files readable by R. (Windows Media Player on a Windows box couldn’t read the files either.) I only wanted to process the files I was actually going to add noise to, so I used another handy little bash one-liner. This one cuts a column of the file which contains all the sentences I am going to use, and then for each filename, processes the file through sox, and outputs it to the destination directory of my choosing.

for file in $(cut -f 18 -d $'\t' timitLists2.txt); do
  sox "$file" ~/R/work/timit/clean/"$(basename "$file")"
done
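Before running a loop like this over a long list, it can be reassuring to print what it would do. Here is a minimal dry-run sketch along those lines. The function name plan_clean_copies and its arguments are hypothetical, and it only prints source/destination pairs rather than calling sox.

```shell
#!/usr/bin/env bash

# Hypothetical dry-run helper: for each path found in column COL of a
# tab-delimited LIST file, print "source<TAB>destination", where the
# destination is DEST plus the file's basename.
plan_clean_copies() {
  local list=$1 col=$2 dest=$3
  cut -f "$col" "$list" | while IFS= read -r src; do
    printf '%s\t%s\n' "$src" "$dest/$(basename "$src")"
  done
}
```

Calling it with the list file, the column number, and the destination directory prints one line per file; once the pairs look right, the printf can be swapped for the real sox call.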

Note that I have spread the code over more than one line, but these are pretty much one-liners. I think technically a one-liner doesn’t involve successive commands, which the first example does, but the first command is just an echo, to make sure I know what it is doing.


sort using TAB as field separator in bash

July 14th, 2008

I have run into this problem several times recently, and decided to finally write down the solution for myself rather than keep searching the internet for it.

This is the problem: if you want to sort a file that is tab-delimited (and some of the fields contain spaces), then you must explicitly tell sort to use TAB as the field separator, otherwise it will split fields on spaces as well as tabs. For commands such as cut and paste, TAB is already the default delimiter, so this just works:

cut -f 1 file

where -f specifies the field number (and -d can be used to specify a different field separator).
The sort command uses the -t flag instead. So one would think that this would work:

#INCORRECT
sort -k 2 -t '\t' file

where -k specifies the field number and -t specifies the field separator.
Unfortunately this does not work: inside ordinary single quotes, bash passes the two characters \ and t to sort unchanged, and sort rejects them because the separator must be a single character. The solution is to place a $ before the quotes, like so:

#CORRECT
sort -k 2 -t $'\t' file

The dollar sign tells bash to use ANSI-C quoting, so that \t is turned into a real TAB character before sort sees it.
From: http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_03.html

3.3.5. ANSI-C quoting

Words in the form "$'STRING'" are treated in a special way. The word expands to a string, with backslash-escaped characters replaced as specified by the ANSI-C standard. Backslash escape sequences can be found in the Bash documentation.
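To see the difference between the two quoting styles, here is a small sketch that compares the lengths of the two strings and then sorts a made-up two-line sample whose first field contains spaces. The variable names and the sample data are purely illustrative.

```shell
#!/usr/bin/env bash

plain='\t'    # ordinary single quotes: two characters, backslash and t
tab=$'\t'     # ANSI-C quoting: a single real TAB character

echo "${#plain} ${#tab}"    # prints: 2 1

# Sort on the second tab-separated field, numerically. The spaces
# inside the first field do not confuse sort, because -t names TAB
# as the only separator.
sorted=$(printf 'b c\t2\na d\t1\n' | sort -t "$tab" -k 2,2n)
printf '%s\n' "$sorted"
```

The sorted output lists the "a d" line first, since its second field is 1.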

So now I have the answer for myself the next time the problem arises. I hope someone else benefits as well.


Back in business

June 9th, 2008

The building that houses the server which runs this blog lost power on Wednesday, June 4th, and power was finally restored this morning. Fortunately, I am pretty religious about making backups on an external hard drive, so I had a day-old backup, and was able to upload most stuff to my dreamhost account and set up failover service with my DNS host, zoneedit, so that traffic to robfelty.com was redirected to robfelty.org. For those of you that noticed, I had the blog up, but I set it to cache heavily and I disabled comments. I did this partly because I didn’t want my databases to get out of sync, and also to reduce mysql usage, since dreamhost is shared hosting. Hopefully the main server will not be without power again for a while, and now I will be better prepared should it happen again.

Now that I am back online, I can get back to working on wordpress plugins. You’ll notice I made some design changes in the interim as well. I added a global navigation bar so that the whole website is more fully integrated now.

For pictures of the flooding which caused the outage, check out: my personal photo album


ubuntu 8.04 released today. Where are the torrents?

April 24th, 2008

There have been many news stories about the latest release of ubuntu, so it is no surprise that their site seems very unresponsive. I like kubuntu, so I started downloading a kubuntu torrent. If you have trouble with the main site, I am making the torrents available on mine as well.
For KDE3
kubuntu-8.04-desktop-amd64.iso.torrent
For KDE4
kubuntu-kde4-8.04-desktop-amd64.iso.torrent

Have fun trying them out!


Friendfeed plugin for wordpress

April 11th, 2008

Friendfeed released an API a couple of weeks ago. This made me excited, as I figured I could write a wordpress plugin to pull comments people leave on my friendfeed back into the blog. Of course, I did not find the time to do this right away, as I have already been spending way too much time working on my 2 other wordpress plugins (Collapsing Categories and Collapsing Archives). Today I discovered that someone had already beaten me to the punch.

The Friendfeed comment plugin is located on the wordpress.org site. It is not very feature rich yet, but I am sure it will improve. Thanks to Glenn Slaven for writing the plugin.

I am hoping that I might get to see a few comments on this post via friendfeed and the normal route.
