
Deep Thoughts by Robert Felty

thoughts on wordpress, latex, cooking et alia

Posts Tagged ‘bash’

Thursday, December 4th, 2008

LaTeX utility scripts

Processing a LaTeX file usually takes several steps. At a bare minimum, it requires two runs through latex (or pdflatex), which are necessary to get cross-references and the table of contents right. Since LaTeX processes the document a page at a time, it can't generate a table of contents on page 1 until it knows what sections, subsections, etc. are in the rest of the file. So on the first run it writes that information to auxiliary files such as the .aux file, and on the second run it reads it back in. If you are running bibtex or making an index, there are additional programs to run.

Typing all of this at the command line (or even hitting the compile button in a GUI like TeXShop) can be tedious. I know that many people use Makefiles for this task. However, as far as I know, Makefiles are specific to a particular project; that is, for every new LaTeX project, you have to create a new Makefile. Instead, I use bash scripts. This lets me specify a filename on the command line, and I also get some more flexibility from the power of bash. I chose bash for this task since it is mostly just stringing together commands I would otherwise type at the command line anyway. It is also installed on almost any Linux machine, and on Mac OS X. (Sorry, Windows users, unless you use Cygwin.)

I have several different scripts, depending on what I am doing.

Traditional LaTeX

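#!/bin/bash
# this script processes a latex file: latex -> dvips -> ps2pdf
DVIPS=dvips
PDF=ps2pdf
PROG=latex
if [ -z "$1" ]; then
  echo "usage: $(basename "$0") file[.tex]" >&2
  exit 1
fi
# strip the extension: everything from the first period on is dropped
SEED=$(echo "$1" | cut -f1 -d".")
$PROG --shell-escape -interaction=batchmode "$SEED" &&
# build the index if the document uses one
(if [ -e "${SEED}.idx" ]; then
  makeindex "$SEED"
fi) &&
# run bibtex (and two more passes) only if the .aux declares a bibliography
(if grep -qs bibdata "${SEED}.aux"; then
  bibtex "$SEED" &&
  $PROG --shell-escape -interaction=batchmode "$SEED" &&
  $PROG --shell-escape -interaction=batchmode "$SEED"
fi) &&
# final pass to settle cross-references, the index and the table of contents
$PROG --shell-escape -interaction=batchmode "$SEED" &&
# DVI -> PostScript (US letter paper, PDF-friendly fonts) -> PDF
$DVIPS "$SEED.dvi" -t letter -Ppdf -o "$SEED.ps" &&
$PDF "$SEED.ps" &&
echo "*****************************
  SUCCESSFULLY PROCESSED $SEED
*****************************" ||
echo "*****************************
  PARSING PROBLEM with $SEED. run $PROG manually to see errors
*****************************"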

For pdflatex

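#!/bin/bash
# this script processes a latex file with pdflatex
PROG=pdflatex
if [ -z "$1" ]; then
  echo "usage: $(basename "$0") file[.tex]" >&2
  exit 1
fi
# strip the extension: everything from the first period on is dropped
SEED=$(echo "$1" | cut -f1 -d".")
$PROG --shell-escape -interaction=batchmode "$SEED" &&
# build the index if the document uses one
(if [ -e "${SEED}.idx" ]; then
  makeindex "$SEED"
fi) &&
# run bibtex (and two more passes) only if the .aux declares a bibliography
(if grep -qs bibdata "${SEED}.aux"; then
  bibtex "$SEED" &&
  $PROG --shell-escape -interaction=batchmode "$SEED" &&
  $PROG --shell-escape -interaction=batchmode "$SEED"
fi) &&
# final pass to settle cross-references, the index and the table of contents
$PROG --shell-escape -interaction=batchmode "$SEED" &&
echo "*****************************
  SUCCESSFULLY PROCESSED $SEED
*****************************" ||
echo "*****************************
  PARSING PROBLEM with $SEED. run $PROG manually to see errors
*****************************"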

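Once a script is saved somewhere on your PATH and made executable (chmod +x), you can run it like so (here the pdflatex script is saved as pdftexit):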

pdftexit myfile.tex

The .tex is optional; you could just as well type .pdf, or no extension at all, since the extension gets stripped off. Note that, as currently written, the script breaks on a file like 2008.04.11.tex, because it splits the name at periods and keeps only the first part (2008). A way around this would be to use basename instead (basename 2008.04.11.tex .tex prints 2008.04.11), but basename only removes the one suffix you give it, so it would not handle names ending in anything other than .tex.
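If the period problem matters, bash's own parameter expansion can strip just a known trailing suffix, which handles both .tex and .pdf without touching periods elsewhere in the name. This is a minimal sketch, assuming only a trailing .tex or .pdf should ever be removed; the function name seed_name is made up for this example.

```bash
#!/bin/bash
# Sketch: derive the job name ("seed") without truncating at the first period.
# Assumption: only a trailing .tex or .pdf should be removed; any other
# name is returned unchanged. seed_name is a hypothetical helper.
seed_name() {
  local name="$1"
  name="${name%.tex}"   # drop a trailing .tex, if present
  name="${name%.pdf}"   # drop a trailing .pdf, if present
  printf '%s\n' "$name"
}

seed_name myfile.tex       # myfile
seed_name myfile.pdf       # myfile
seed_name myfile           # myfile
seed_name 2008.04.11.tex   # 2008.04.11
```

In the scripts this would replace the cut line with SEED=$(seed_name "$1"). Unlike ${1%.*}, it also leaves an extensionless name such as 2008.04.11 intact.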

Also note that I suppress most of LaTeX's output with the -interaction=batchmode option. By default LaTeX prints quite a bit of information, and printing to the screen can really slow things down. In batch mode the messages still go to the .log file, and the script checks the exit status of each step, so it will tell you if something goes wrong; you can then run the failing program once by hand to see the errors.
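Because batch mode still writes the full transcript to the .log file, the errors can often be read straight from there instead of rerunning LaTeX. A minimal sketch, assuming TeX's usual convention that error messages start with "!" at the beginning of a line; the function name show_latex_errors is hypothetical.

```bash
#!/bin/bash
# Sketch: print the error lines from a LaTeX log instead of rerunning LaTeX.
# Assumption: errors start with "!" at the beginning of a line (TeX's usual
# format); show_latex_errors is a hypothetical helper name.
show_latex_errors() {
  local log="${1%.tex}.log"   # myfile.tex or myfile -> myfile.log
  if [ ! -f "$log" ]; then
    echo "no log file: $log" >&2
    return 2
  fi
  # -n: show log line numbers; -A 2: two lines of context, which usually
  # include the "l.<number>" line pointing at the offending source line
  grep -n -A 2 '^!' "$log"
}

# usage: show_latex_errors myfile.tex
```

The function returns grep's status, so it exits with 1 when the log contains no error lines.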

I hope you find the scripts useful. For convenience, here is a zip file with both bash scripts.

Tuesday, July 15th, 2008

Bash one-liners to the rescue

Lately I find myself using handy bash one-liners more and more. I think this is where Unix/Linux really starts to shine. There are so many little programs that each do one thing, and do it well, and the ability to combine them through pipes means you have extremely flexible and powerful tools at the ready.
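As a small, made-up illustration of that kind of composition (not part of the project described below), here is a word-frequency count built entirely from standard tools. The tokenizing rule, lowercasing and splitting on anything that is not a letter or apostrophe, is an assumption chosen for the example, and word_freq is a hypothetical name.

```bash
#!/bin/bash
# Sketch: count how often each word occurs in text read from stdin.
# Assumption: words are runs of letters/apostrophes; case is ignored.
word_freq() {
  tr '[:upper:]' '[:lower:]' |   # normalize case
    tr -cs "[:alpha:]'" '\n' |   # one word per line
    grep -v '^$' |               # drop any empty lines
    sort |                       # group identical words together
    uniq -c |                    # count each group
    sort -rn                     # most frequent first
}

# usage: word_freq < sentences.txt | head
```

Each stage does one small job, and the pipe is the only glue between them.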

I have been working on a new project at work to come up with some lists of sentences for testing speech recognition. We decided to use the TIMIT database, which contains recordings of many different sentences read by many different speakers from across the United States. I first wrote a perl script to generate some basic stats on the sentences, such as how many words each sentence contains and the frequency of those words. Then I wrote a perl script to randomly select sentences and create several different lists. Finally, I wrote an R script that took the original .wav files and mixed signal-dependent noise into one channel, so that we can vary the signal-to-noise ratio during presentation of the stimuli by adjusting the balance on our sound system.

Along the way, I ran into a couple of problems with the original sound files. It turns out that 446 of the 6300 sound files were clipped and highly distorted. I noticed this on my own while listening to a few of the files I had generated with R. I could have gone through all 6300 files manually and removed the distorted ones, but that would have taken a long time. Instead, I used sox, a low-level but powerful audio processing program.

I first used the find command to find all .wav files in the directory I was interested in (including subdirectories), then passed each file to sox, telling it not to write any output audio but just to report some statistics (-n stat). After some testing with a few clipped and non-clipped files, I realized that for clipped files, the output from sox ended with a line that said either “Try: blah blah” or “Can’t determine type”. I later discovered that there might still be other clipped files, and these would have a maximum amplitude of 1 or a minimum amplitude of -1. Between these patterns, any clipped file should produce matching output. So I piped the output of sox to grep (note that I had to redirect STDERR to STDOUT with 2>&1, since sox prints its statistics on STDERR), and if the output contained a line starting with “Try:” or “Can’t”, or reported a maximum or minimum amplitude of 1 or -1, I renamed the file to $file.clipped.

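# -print0 with read -d '' keeps filenames containing spaces intact
find . -name "*.wav" -print0 | while IFS= read -r -d '' file; do
  # sox prints its stats (and errors) on STDERR, hence 2>&1
  if sox "$file" -n stat 2>&1 | grep -qE "^(Try:|Can't|(Minimum|Maximum) amplitude:[[:space:]]+-?1\.00)"; then
    echo "$file CLIPPED"
    mv "$file" "$file.clipped"
  fi
done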
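To check the pattern before renaming anything, the test in the loop above can be factored into a function that reads sox's stat output on standard input. This is a sketch that reuses the loop's grep patterns unchanged; the name is_clipped is made up for this example.

```bash
#!/bin/bash
# Sketch: the clipping test from the loop above, as a stdin filter.
# Succeeds (exit 0) if the sox stat output matches any clipping pattern.
# is_clipped is a hypothetical helper; the patterns are the loop's own.
is_clipped() {
  grep -qE "^(Try:|Can't|(Minimum|Maximum) amplitude:[[:space:]]+-?1\.00)"
}

# Dry run with sox: report only, rename nothing.
#   sox "$file" -n stat 2>&1 | is_clipped && echo "$file CLIPPED"
```

Keeping the detection separate from the mv makes it easy to try on a handful of known clipped and clean files first.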

After doing this, I simply amended the perl script that randomly generates the lists so that it checks that each .wav file actually exists. Since the clipped files now end in .clipped instead of .wav, they are skipped automatically.
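The perl change itself isn't shown here. As a rough shell equivalent, assuming the candidate files come as a plain list of paths, one per line, the same existence filter could look like this; keep_existing is a hypothetical name.

```bash
#!/bin/bash
# Sketch: pass through only the filenames (one per line on stdin) that
# still exist; renamed *.wav.clipped files drop out automatically.
keep_existing() {
  local f
  while IFS= read -r f; do
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0   # don't let a missing last file become the function's status
}

# usage: keep_existing < candidates.txt > usable.txt
```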

There was an additional problem I had discovered earlier with these sound files. They seemed to have non-standard headers, which meant that the R script I was using to add noise couldn't read them (Windows Media Player on a Windows box couldn't read them either). However, passing the files through sox made them readable by R. I only wanted to process the files I was actually going to add noise to, so I used another handy little bash one-liner. This one extracts one column (field 18) from the tab-delimited file that lists all the sentences I am going to use, and then, for each filename, runs the file through sox and writes the result to a destination directory of my choosing.

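# field 18 of the tab-delimited list holds the path to each sound file
cut -f 18 -d $'\t' timitLists2.txt | while IFS= read -r file; do
  # rewrite the file through sox into the clean/ directory, keeping its name
  sox "$file" ~/R/work/timit/clean/"$(basename "$file")"
done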

Note that I have spread the code over a few lines for readability, but these are essentially one-liners. Technically, I suppose a one-liner shouldn't involve successive commands, which the first example does, but the first command there is just an echo so that I can see what it is doing.

Monday, July 14th, 2008

sort using TAB as field separator in bash

I have run into this problem several times recently, and decided to finally write down the solution for myself rather than keep searching the internet for it.

This is the problem: if you want to sort a file that is tab-delimited (and some of the fields contain spaces), then you must explicitly tell sort to use TAB as the field separator; otherwise it splits fields at runs of blanks, spaces included. With cut this is easy, because TAB is already its default field separator:

cut -f 1 file

where -f specifies the field number (cut's -d flag sets a different single-character separator when you need one).
The sort command uses the -t flag for the separator. So one would think that this would work:

#INCORRECT
sort -k 2 -t '\t' file

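where -k specifies the sort key by field number and -t specifies the field separator. (Strictly, -k 2 means the key starts at field 2 and runs to the end of the line; use -k 2,2 to sort on field 2 alone.)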
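Unfortunately this does not work: inside single quotes, '\t' is passed to sort as two characters, a backslash and a t, and sort accepts only a single character as the separator (GNU sort rejects it as a multi-character tab). The solution is to put a $ in front of it, like so: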

#CORRECT
sort -k 2 -t $'\t' file
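To see the difference on real input, here is a small, made-up tab-delimited sample whose first field contains spaces, sorted numerically on its second field. The -k 2,2n key (field 2 only, compared as a number) is a choice for this example.

```bash
#!/bin/bash
# Sketch: made-up tab-delimited data whose first field contains spaces.
sample=$(mktemp)
printf 'New York\t3\nAnn Arbor\t1\nBoston\t2\n' > "$sample"

# Split on TAB only, and sort numerically on field 2 alone.
sort -t $'\t' -k 2,2n "$sample"
# Ann Arbor	1
# Boston	2
# New York	3

# Without -t, sort splits at blanks, so on the New York and Ann Arbor
# lines field 2 is the second word of the city, not the number:
#   sort -k 2,2n "$sample"

rm -f "$sample"
```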

The dollar sign tells bash to use ANSI-C quoting, which turns \t into an actual TAB character before sort ever sees it.
From: http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_03.html

3.3.5. ANSI-C quoting

Words in the form "$'STRING'" are treated in a special way. The word expands to a string, with backslash-escaped characters replaced as specified by the ANSI-C standard. Backslash escape sequences can be found in the Bash documentation.
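A quick way to see what ANSI-C quoting changes is to compare string lengths: '\t' stays two characters, while $'\t' becomes a single TAB. The variable names are arbitrary.

```bash
#!/bin/bash
# Sketch: what the two quoting styles actually hand to a command.
plain='\t'      # two characters: a backslash and a t
ansi=$'\t'      # one character: a real TAB

echo "${#plain}"   # 2
echo "${#ansi}"    # 1

# od -c lists the individual bytes of each string
printf '%s' "$plain" | od -c
printf '%s' "$ansi" | od -c
```

That one-character TAB is exactly what sort's -t flag expects.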

So now I have the answer for myself the next time the problem arises. I hope someone else benefits as well.