
Deep Thoughts by Robert Felty

thoughts on wordpress, latex, cooking et alia

Posts Tagged ‘latex’

Thursday, December 4th, 2008

LaTeX utility scripts

Processing a LaTeX file usually takes several steps. At a bare minimum, it usually requires two runs through latex (or pdflatex) to get cross-references and the table of contents right. Since LaTeX processes a page at a time, it can’t generate a table of contents on page 1 until it knows what sections, subsections etc. are in the rest of the file. So on the first run, LaTeX records that information in the .aux file, and on the second run it reads the information back from the .aux file. If you are running bibtex, or making an index, there are additional programs to run. Typing all of this on the command line (or even hitting the compile button several times in a GUI like TeXShop) can be tedious.

I know that many people use Makefiles for this task. However, as far as I know, Makefiles are specific to a particular project. That is, for every new LaTeX project, you have to create a new Makefile. Instead, I use bash scripts. This allows me to specify a filename on the command line, and I also get some more flexibility from the power of bash. I chose bash for this task, since it is mostly just stringing together commands I would type on the command line anyways. It is also installed on almost any Linux machine, and on Mac OS X. (Sorry, Windows users, unless you use Cygwin.)
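The "run it again until the cross-references settle" idea described above can also be checked from the log file rather than by always running a fixed number of times. Here is a minimal sketch, not part of my actual scripts. It assumes that LaTeX and common packages ask for another pass with a log warning containing the words "Rerun to get" (as in "Rerun to get cross-references right"). The function names needs_rerun and latex_until_stable are made up for illustration.

```shell
#!/bin/bash
# Sketch only: the function names here are hypothetical.

# needs_rerun LOGFILE
# Succeeds (exit status 0) if the log contains a "Rerun to get ..." warning.
# Assumption: that phrase is how LaTeX asks for another pass.
needs_rerun() {
  grep -q "Rerun to get" "$1"
}

# latex_until_stable SEED [PROG]
# Runs PROG (default: pdflatex) on SEED until the log stops asking
# for a rerun, giving up after 4 passes.
latex_until_stable() {
  seed=$1
  prog=${2:-pdflatex}
  for pass in 1 2 3 4; do
    # stop immediately if the run itself fails
    "$prog" -interaction=batchmode "$seed" || return 1
    if ! needs_rerun "$seed.log"; then
      return 0
    fi
  done
  echo "still asking for a rerun after 4 passes: $seed" >&2
  return 1
}
```

With these functions loaded, latex_until_stable myfile would run pdflatex on myfile as often as needed, and latex_until_stable myfile latex would do the same with plain latex. It does not run bibtex or makeindex, so it complements the fixed sequence of steps rather than replacing it.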

I have several different scripts, depending on what I am doing.

Traditional LaTeX

#!/bin/bash
# this script processes a latex file: latex -> dvips -> ps2pdf
DVIPS=dvips
PDF=ps2pdf
PROG=latex
# base name: everything before the first period
SEED=$(echo "$1" | cut -f1 -d".")
# first run writes the .aux file (and the .idx file, if there is an index)
$PROG --shell-escape -interaction=batchmode "$SEED" &&
(if [ -e "${SEED}.idx" ]; then
  makeindex "$SEED"
fi) &&
# only run bibtex if the .aux file asks for a bibliography
(if grep -q bibdata "${SEED}.aux"; then
  bibtex "$SEED" &&
  $PROG --shell-escape -interaction=batchmode "$SEED" &&
  $PROG --shell-escape -interaction=batchmode "$SEED"
fi) &&
# final run to settle cross-references and the table of contents
$PROG --shell-escape -interaction=batchmode "$SEED" &&
$DVIPS "$SEED.dvi" -t letter -Ppdf -o "$SEED.ps" &&
$PDF "$SEED.ps" &&
echo "*****************************
  SUCCESSFULLY PROCESSED $SEED
*****************************" ||
echo "*****************************
  PARSING PROBLEM with $SEED. run $PROG manually to see errors
*****************************"

For pdflatex

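#!/bin/bash
# this script processes a latex file with pdflatex
PROG=pdflatex
# base name: everything before the first period
SEED=$(echo "$1" | cut -f1 -d".")
# first run writes the .aux file (and the .idx file, if there is an index)
$PROG --shell-escape -interaction=batchmode "$SEED" &&
(if [ -e "${SEED}.idx" ]; then
  makeindex "$SEED"
fi) &&
# only run bibtex if the .aux file asks for a bibliography
(if grep -q bibdata "${SEED}.aux"; then
  bibtex "$SEED" &&
  $PROG --shell-escape -interaction=batchmode "$SEED" &&
  $PROG --shell-escape -interaction=batchmode "$SEED"
fi) &&
# final run to settle cross-references and the table of contents
$PROG --shell-escape -interaction=batchmode "$SEED" &&
echo "*****************************
  SUCCESSFULLY PROCESSED $SEED
*****************************" ||
echo "*****************************
  PARSING PROBLEM with $SEED. run $PROG manually to see errors
*****************************"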

You can run the scripts like so:

pdftexit myfile.tex

The .tex is optional. In fact you could use .pdf, or nothing. The extension gets stripped off. Note that as it is currently written, the script will break if you have a file like 2008.04.11.tex, since it splits the name at periods and only keeps the first part. A way around this would be to use basename instead, but basename only strips the one suffix you give it, so it would not work when the argument ends in anything other than .tex.
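One possible way around the dotted-filename problem is to strip only a known extension from the end of the name, using bash parameter expansion instead of cut. This is a sketch under assumptions, not what the scripts above do: the function name strip_tex_ext and the particular list of extensions are made up for illustration.

```shell
#!/bin/bash
# Sketch only: strip_tex_ext is a hypothetical helper.

# strip_tex_ext NAME
# Prints NAME without its final extension, but only if that extension
# is one of a few known ones. Anything else is printed unchanged, so
# periods inside the name itself are left alone.
strip_tex_ext() {
  case $1 in
    *.tex|*.pdf|*.dvi|*.ps|*.aux)
      # ${1%.*} removes the shortest trailing match of ".*"
      printf '%s\n' "${1%.*}"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}
```

In the scripts, the SEED line would then become SEED=$(strip_tex_ext "$1"), so that 2008.04.11.tex gives 2008.04.11 and myfile stays myfile. This only addresses how the name is stripped. Whether every tool further down the chain copes with periods in the base name is not something this sketch checks.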

Also note that I suppress most of latex’s output by using the -interaction=batchmode option. By default LaTeX prints out quite a bit of information, and printing to the screen can really slow things down. If something goes wrong, then you can always run it once manually. The script will detect if something goes wrong and tell you.
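Even in batchmode, the errors still end up in the .log file, so an alternative to rerunning by hand is to pull them out of the log. The following is a rough sketch, assuming the default log format, in which each error message starts with an exclamation mark at the beginning of a line (this changes if you run latex with -file-line-error). The function name show_latex_errors is made up for illustration.

```shell
#!/bin/bash
# Sketch only: show_latex_errors is a hypothetical helper.

# show_latex_errors LOGFILE
# Prints every line starting with "!" plus the two lines after it,
# which usually include the line number where TeX stopped.
# Exit status is non-zero if no such line is found.
show_latex_errors() {
  grep -A 2 '^!' "$1"
}
```

In the failure branch of the scripts, one could then call show_latex_errors "$SEED.log" after printing the PARSING PROBLEM message.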

I hope you find the scripts useful. For convenience, here is a zip file with both bash scripts.

Wednesday, March 19th, 2008

Finally a better LaTeX to html converter

About a year ago I wrote a post about my frustration with the lack of a good LaTeX to html converter. Recently I found one. It is called plasTeX, so named because it is written in Python. Finally, a converter which works well with most any LaTeX package or macro you write, and produces sane, relatively standards-compliant html. Best of all, it comes with a built-in conditional, \ifplastex, so you can specify that some things should be in the pdf version, while others should be in the html version. It also supports some other formats like DocBook, but I am not too interested in that, so I haven’t tested it at all.

Though I discovered plasTeX a few months ago, I only got around to playing with it some more this week, when I wasn’t feeling like working too much. I decided to convert my CV from html to latex, so that I can have both pretty pdf and html versions. From now on I will update my CV only in latex, which is much nicer to write than html anyways.

To get both a pdf and an html version, this is what I do:

pdflatex cv
plastex -c cv.cfg cv
tidy --show-body-only true -asxhtml -utf8 -wrap 78 -indent cv/index.html > cv/index2.html

It took me a while to figure out the syntax of the plasTeX config file. It is possible to specify these things on the command line, but I got tired of having too many things on the command line. The trick is that the plasTeX options are divided up into different sections. The sec-num-depth option, which tells plasTeX how many section levels deep to number, is in the [document] section, while the split-level option is in the [files] section. The split-level option tells plasTeX at which section level to split the output into separate html pages. For my CV, I only wanted one page. My plastex config file looks like:

[document]
sec-num-depth = 0
[files]
split-level = 0

Though plasTeX is pretty nice, I still have three complaints. The first is that it uses <h1> tags for \section, which I can understand, but there is a long tradition on the web of having only one <h1> tag per page, which usually holds the title of the page. plasTeX does have the ability to use different themes, so there might be a way to change this with a theme, but I haven’t figured that out yet. For the time being, I simply used \subsection commands instead.

My second complaint about plasTeX is that it doesn’t do any nice formatting of the html like indenting or line-wrapping. That is only a minor complaint though, since I can simply run it through htmltidy (as in the code above).

My third complaint is that plasTeX wraps the contents of every <td> or <li> in a <p>, which seems very strange to me, and creates some ugly formatting. So I used some CSS tricks to basically take away the standard effects of <p> tags inside these elements:

td > p {margin:0;padding:0;}
li > p {margin:0;padding:0;}

I am not completely finished yet, but you can check out the html and pdf results in: my new CV.