Deep Thoughts by Robert Felty

thoughts on wordpress, latex, cooking et alia

Archive for the 'latex' Category

Thursday, September 24th, 2009

Vim regex-fu for LaTeX

When writing a beamer presentation with LaTeX, I organize my presentation into sections and subsections. Frequently, the title of the first frame (slide) in a subsection has the same name as the subsection. Let’s say I start off with the following structure:

\section[corpora]{Accessing text corpora}
\subsection[gutenberg]{The Gutenberg Corpus}
\subsection[chat]{The web and chat Corpus}
\subsection[brown]{The Brown Corpus}
\subsection[reuters]{The Reuters Corpus}
\subsection[inaugural]{The Inaugural address Corpus}
\subsection[annotated]{Annotated corpora}
\subsection[foreign]{Corpora in other languages}
\subsection[DIY]{Loading your own corpora}

For each subsection, I want to put in one frame, with the name of the subsection being the name of the frame. Regular expressions to the rescue! In vim, all I have to do is use V to select the lines with the section and subsections, then hit :, which lets me operate on those lines only.

'<,'>

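is automatically inserted after the colon. It stands for “from the beginning of the highlighted section to the end of it”. Then I use s to perform my substitution: \(.*\) captures the text between the braces, \1 puts that text back wherever it appears in the replacement, and \r inserts a new line.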

:'<,'>s/{\(.*\)}/{\1}\r\\begin{frame}\r\\frametitle<presentation>{\1}\r\\end{frame}/

The result is:

\section[corpora]{Accessing text corpora}
\begin{frame}
\frametitle<presentation>{Accessing text corpora}
\end{frame}
\subsection[gutenberg]{The Gutenberg Corpus}
\begin{frame}
\frametitle<presentation>{The Gutenberg Corpus}
\end{frame}
\subsection[chat]{The web and chat Corpus}
\begin{frame}
\frametitle<presentation>{The web and chat Corpus}
\end{frame}
\subsection[brown]{The Brown Corpus}
\begin{frame}
\frametitle<presentation>{The Brown Corpus}
\end{frame}
\subsection[reuters]{The Reuters Corpus}
\begin{frame}
\frametitle<presentation>{The Reuters Corpus}
\end{frame}
\subsection[inaugural]{The Inaugural address Corpus}
\begin{frame}
\frametitle<presentation>{The Inaugural address Corpus}
\end{frame}
\subsection[annotated]{Annotated corpora}
\begin{frame}
\frametitle<presentation>{Annotated corpora}
\end{frame}
\subsection[foreign]{Corpora in other languages}
\begin{frame}
\frametitle<presentation>{Corpora in other languages}
\end{frame}
\subsection[DIY]{Loading your own corpora}
\begin{frame}
\frametitle<presentation>{Loading your own corpora}
\end{frame}
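
The same idea can be sketched outside of vim. The bash function below is only an illustration under a few assumptions: the name add_frames is made up, each \subsection is assumed to sit on its own line, and the title is taken to be everything between the first { and the last } on that line. Unlike the substitution above, it leaves \section lines alone.

```shell
#!/bin/bash
# add_frames (hypothetical helper): read LaTeX on stdin and print it back,
# adding an empty frame after every \subsection line.
add_frames() {
  local line title
  while IFS= read -r line; do
    printf '%s\n' "$line"
    case $line in
      '\subsection'*)
        title=${line#*\{}    # drop everything up to and including the first {
        title=${title%\}*}   # drop the last } and anything after it
        printf '%s\n' '\begin{frame}' \
          "\\frametitle<presentation>{$title}" \
          '\end{frame}'
        ;;
    esac
  done
}
```

It could be run as a filter, for example add_frames < outline.tex > slides.tex, where both file names are placeholders.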

Tuesday, August 18th, 2009

Blogging with LaTeX

The first question on a reader’s mind must be: why use LaTeX to blog?

Well, I have a pretty specific instance in mind, but I can imagine that others might be interested as well. This fall I am teaching a course on computational corpus linguistics at CU Boulder. I like to have some materials online for the students, such as the syllabus, course notes, etc. I thought about setting up a simple webpage, but then I decided instead to use WordPress, because it would give me extra functionality, such as auto-generated RSS feeds, and, with some plugins, would allow me to notify students via e-mail when I post lecture notes, slides, or homework tips. The one drawback I could see is that it would be hard to keep the syllabus page up to date, as I tend to change the calendar somewhat throughout the semester. Then I thought that maybe I could hack up a quick XML-RPC solution to the problem, and about an hour later, I had it done. All I needed was the WordPress::API Perl module from CPAN.

Now I have a handy little publish script which first compiles my syllabus as pdf, then compiles it as html using plasTeX. Then I run it through tidy for formatting and hack in a few more things. Then I update the syllabus page on the course blog via xml-rpc. Finally, I rsync the pdf to the server. And with the post-notification plugin for wordpress, students will get an e-mail when the syllabus is updated.

Here are the scripts:

publish

#!/bin/bash
# this script compiles a pdf version, an html version, and then copies it to my
# webserver

# compile as pdf
pdflatex syllabus && pdflatex syllabus

# compile as html
plastex  -c syllabus.cfg syllabus
# clean up code a bit
tidy --show-body-only true --ascii-chars true  -asxhtml --input-encoding utf8 --output-encoding ascii -wrap 0 -indent --tab-size 4 syllabus/index.html > syllabus/syllabus.tmp

# add in link to pdf version in the html version
cat pdflink syllabus/syllabus.tmp > syllabus/syllabus.html

# update the syllabus page via xml-rpc
./updateSyllabusPage.pl

# upload the newest version of the pdf
rsync -avzu syllabus.pdf robfelty.com:/var/www/html/robfelty/teaching/ling5200Fall2009/ling5200Fall2009-syllabus.pdf

updateSyllabusPage.pl

#!/usr/bin/perl -w
use WordPress::API;

# get new syllabus content
my $contentFile = 'syllabus/syllabus.html';
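open(CONTENT, '<', $contentFile) or die "Cannot open $contentFile: $!";

# slurp in content (undefining $/ makes <CONTENT> read the whole file at once)
my $content = do {local ( $/ ); <CONTENT> };
close(CONTENT);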

my $w = WordPress::API->new({
   proxy => 'http://robfelty.com/teaching/ling5200Fall2009/xmlrpc.php',
   username => 'myusername',
   password => 'mypassword',
});

my $page = $w->page(3);

$page->description($content);
$page->save;

Finally, if you want to see how I customize content for pdf and html separately, you can check out the LaTeX source file.

Sunday, March 29th, 2009

Converting LaTeX to Microsoft Word with plasTeX and Open Office

Sample of LaTeX document converted to Word

First of all, any LaTeX user might ask: why would I want to convert beautiful LaTeX into ugly Microsoft Word? The main reason is collaborators who want to use track changes. I recently sent a draft of a paper to some colleagues in two formats, .pdf and .doc. The pdf was formatted beautifully with LaTeX, but collaborators who are not comfortable editing a LaTeX file are left to comment on the pdf itself, which is difficult, though there are some options for it (I like Mac OS X’s Preview application).

So when I sent out this latest paper for comments, I decided to send two versions. Converting to Word via copy and paste is very laborious, and not worth the effort. Recently though, I have been using plasTeX to convert LaTeX into html. I know that programs like Open Office can import html, so I decided to try that route to get to .doc. First I used plasTeX to convert to html, and specified a few options to get the output I wanted:

plastex --theme minimal --sec-num-depth 0 --split-level 0 <filename>.tex

By default, this creates a subdirectory called <filename>, with an index.html file inside it.

Next I opened a new text document in Open Office, then selected Insert > File, and selected this index.html file. Presto! I had the document, complete with figures, tables, footnotes, and references. It wasn’t formatted as nicely as the pdf, but now my co-authors could insert their own comments and send it back to me electronically. One last step though. By default Open Office links to external figures instead of embedding them. To override this, select Edit > Links. Then highlight all the links, and click on the button to “break links”. Finally, save the document as a .doc file, and e-mail it to the collaborators as an attachment.

I have to incorporate their comments back into my original LaTeX file, but this is much less painful to me than having to write the whole thing in Word to begin with.

Thursday, December 4th, 2008

LaTeX utility scripts

Processing a LaTeX file usually takes several steps. At a bare minimum, it usually requires two runs through latex (or pdflatex) to get cross-references and the table of contents right. Since LaTeX processes a page at a time, it can’t generate a table of contents on page 1 until it knows what sections, subsections etc. are in the rest of the file. So on the first run LaTeX writes that information to the .aux file, and on the second run it reads it back in. If you are running bibtex, or making an index, there are additional programs to run.

Typing all of this on the command line (or even hitting the compile button repeatedly in a GUI like TeXShop) can be tedious. I know that many people use Makefiles for this task. However, as far as I know, Makefiles are specific to a particular project. That is, for every new LaTeX project, you have to create a new Makefile. Instead, I use bash scripts. This allows me to specify a filename on the command line, and I also get some more flexibility from the power of bash. I chose bash for this task since it is mostly just stringing together commands I would run on the command line anyway. It is also installed on almost any Linux machine, and on Mac OS X. (Sorry, Windows users, unless you use Cygwin.)
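
The "run it again until the references settle" logic can also be sketched as a loop rather than a fixed number of runs. This is a minimal sketch, not one of the scripts below: the function names are made up, it assumes you run it in the same directory as the .tex file, and it relies on the "Rerun to get cross-references right" warning that LaTeX writes to the .log file when labels have changed. That warning is only a heuristic and may not catch every case, such as a table of contents that changed while no label did.

```shell
#!/bin/bash
# needs_rerun LOGFILE (hypothetical helper): succeeds if the log
# contains LaTeX's request for another pass.
needs_rerun() {
  grep -q 'Rerun to get' "$1"
}

# compile_until_stable SEED (hypothetical helper): run pdflatex until the
# log stops asking for a rerun, giving up after 4 passes.
compile_until_stable() {
  local seed=$1 pass
  for pass in 1 2 3 4; do
    pdflatex -interaction=batchmode "$seed" || return 1
    needs_rerun "$seed.log" || return 0
  done
  return 1   # still asking for a rerun after 4 passes
}
```

The cap of 4 passes is an arbitrary choice to keep a document that never settles from looping forever.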

I have several different scripts, depending on what I am doing.

Traditional LaTeX

#!/bin/bash
# this script processes a latex file.
DVIPS=dvips
PDF=ps2pdf
PROG=latex
SEED=`echo $1 | cut -f1 -d"."`
$PROG --shell-escape -interaction=batchmode $SEED &&
(if [ -e ${SEED}.idx ]; then
  makeindex $SEED
fi) &&
(if grep -q bibdata "${SEED}.aux"; then
  bibtex $SEED &&
  $PROG --shell-escape -interaction=batchmode $SEED &&
  $PROG --shell-escape -interaction=batchmode $SEED
fi) &&
$PROG --shell-escape -interaction=batchmode $SEED &&
$DVIPS $SEED.dvi -t letter -Ppdf -o $SEED.ps &&
$PDF $SEED.ps &&
echo "*****************************
  SUCCESSFULLY PROCESSED $SEED
*****************************" ||
echo "*****************************
  PARSING PROBLEM with $SEED. run $PROG manually to see errors
*****************************"

For pdflatex

#!/bin/bash
PROG=pdflatex
SEED=`echo $1 | cut -f1 -d"."`
$PROG --shell-escape -interaction=batchmode $SEED &&
(if [ -e ${SEED}.idx ]; then
  makeindex $SEED
fi) &&
(if grep -q bibdata "${SEED}.aux"; then
  bibtex $SEED &&
  $PROG --shell-escape -interaction=batchmode $SEED &&
  $PROG --shell-escape -interaction=batchmode $SEED
fi) &&
$PROG --shell-escape -interaction=batchmode $SEED &&
echo "*****************************
  SUCCESSFULLY PROCESSED $SEED
*****************************" ||
echo "*****************************
  PARSING PROBLEM with $SEED. run $PROG manually to see errors
*****************************"

You can run the scripts like so:

pdftexit myfile.tex

The .tex is optional. In fact you could use .pdf, or nothing; the extension gets stripped off. Note that as currently written, a file like 2008.04.11.tex will break the script, since it splits the name at periods and keeps only the first part. A way around this would be to use basename instead, but basename only strips the one suffix you give it, so that would not work when the argument ends in anything other than .tex.
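
One possible way around both problems is bash's own parameter expansion, stripping an extension only when it is one we recognize. This is a sketch rather than what the scripts above do: the function name strip_ext is made up, and the choice to strip only .tex and .pdf is an assumption.

```shell
#!/bin/bash
# strip_ext NAME (hypothetical helper): print NAME without a trailing
# .tex or .pdf, leaving any other dots in the name alone.
strip_ext() {
  local name=$1
  case $name in
    *.tex|*.pdf) name=${name%.*} ;;   # remove only the last .suffix
  esac
  printf '%s\n' "$name"
}
```

In the scripts, the SEED line would then become SEED=$(strip_ext "$1"), so that 2008.04.11.tex, 2008.04.11.pdf and plain 2008.04.11 all give the same seed.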

Also note that I suppress most of latex’s output by using the -interaction=batchmode option. By default LaTeX prints out quite a bit of information, and printing to the screen can really slow things down. If something goes wrong, then you can always run it once manually. The script will detect if something goes wrong and tell you.

I hope you find the scripts useful. For convenience, here is a zip file with both bash scripts.

Thursday, November 6th, 2008

TeX Live 2008 — reasons to upgrade

New features in pgf/tikz

TeX Live 2008 was finally released about a month ago. I am a member of TUG, so I should be getting a DVD of it sometime soon, but today I finally decided I couldn’t wait, and I would just download it. The main impetus came after reading a discussion in comp.text.tex, in which someone was trying to reduce his compile time. He had a bunch of pgf/tikz graphics, and they can take a long time to compile. Pgf/Tikz version 2.0, which was released in February, now includes the ability to save pgf graphics as external files, and then automatically include them using a standard \includegraphics command. So you only have to compile your graphics once, which can reduce compile time a lot. I think most LaTeX users probably compile often, especially when writing equations, since it is easy to mess those up and have your document not compile. So I can definitely appreciate the desire to speed up compile time. My most unproductive days are the ones when I am running programs that take on the order of 30 seconds to 5 minutes to run, because I end up checking my e-mail or surfing the web while the program is running, and I usually end up spending more time doing that than the program took to run.

Anyways, I wanted to try out this new functionality in pgf/tikz, so I downloaded the latest version from CTAN and installed it. (There is a nice tutorial on externalization in the manual, which is now 560 pages long; search for “externalization”.) Then I tried to compile a beamer presentation, and it failed. I was sort of expecting this, since I know beamer relies heavily on pgf. So I decided to just upgrade my whole texlive. By default, texlive gets installed into /usr/local/texlive/year, so I actually now have 2007 and 2008. I will keep both for a while just to make sure I don’t have any problems. My non-texlive packages are in /usr/local/texlive/texmf-local, so those did not get modified at all.

The first thing I did after installing the new texlive was to test a beamer presentation, and there were no problems, as I had expected. Then I used texdoc to check the manuals for pgf and beamer to make sure that they were the newest versions, which they are. When I did so, the manuals got opened in evince. I prefer kpdf, and I had changed this in texdoc in my old version. I thought about just copying the old version over, but I decided to run a diff first, expecting to see just a few lines of output. I was quite surprised when lots and lots of changes started showing up, so then I did a line count on each. texdoc from 2007 was 206 lines long. texdoc from 2008 is 890 lines long. The old version was just a Bourne shell script. The new version uses texlua. And the new version is much, much improved!! With the old version, to read the beamer manual, or the pgf manual I had to type:

texdoc beameruserguide
texdoc pgfmanual

There were quite a few other packages that had similar problems. But now in the new version, it works as one would hope.

texdoc beamer

So, I stuck with the new version of texdoc, but I did modify it to give preference to kpdf over evince for viewing pdf documents. I just searched for evince, then changed the order of the two lines. Even though I don’t know lua at all, the code was very nicely formatted and easy to read.

Another nice thing about the new version of pgf is that it has a bunch more features, including easy ways to create drop shadows, and some new default shapes, like callouts. A few more of the new features are explained at this texample post.