Deep Thoughts by Robert Felty

thoughts on wordpress, latex, cooking et alia

Posts Tagged ‘plastex’

Tuesday, August 18th, 2009

Blogging with LaTeX

The first question on the reader’s mind must be: why use LaTeX to blog?

Well, I have a pretty specific use case in mind, but I can imagine that others might be interested as well. This fall I am teaching a course on computational corpus linguistics at CU Boulder. I like to have some materials online for the students, such as the syllabus, course notes, etc. I thought about setting up a simple webpage, but then I decided to use WordPress instead, because it would give me extra functionality, such as auto-generated RSS feeds, and, with some plugins, would let me notify students via e-mail when I post lecture notes, slides, or homework tips. The one drawback I could see is that it would be hard to keep the syllabus up to date, as I tend to change the calendar some over the course of the semester. Then I thought that maybe I could hack up a quick XML-RPC solution to the problem, and about an hour later, I had it done. All I needed was the WordPress::API Perl module from CPAN.

Now I have a handy little publish script which first compiles my syllabus as pdf, then compiles it as html using plasTeX. Then I run it through tidy for formatting and hack in a few more things. Then I update the syllabus page on the course blog via xml-rpc. Finally, I rsync the pdf to the server. And with the post-notification plugin for wordpress, students will get an e-mail when the syllabus is updated.

Here are the scripts:

publish

#!/bin/bash
# this script compiles a pdf version, an html version, and then copies it to my
# webserver

# compile as pdf
pdflatex syllabus && pdflatex syllabus

# compile as html
plastex  -c syllabus.cfg syllabus
# clean up code a bit
tidy --show-body-only true --ascii-chars true  -asxhtml --input-encoding utf8 --output-encoding ascii -wrap 0 -indent --tab-size 4 syllabus/index.html > syllabus/syllabus.tmp

# add in link to pdf version in the html version
cat pdflink syllabus/syllabus.tmp > syllabus/syllabus.html

# update the syllabus page via xml-rpc
./updateSyllabusPage.pl

# upload the newest version of the pdf
rsync -avzu syllabus.pdf robfelty.com:/var/www/html/robfelty/teaching/ling5200Fall2009/ling5200Fall2009-syllabus.pdf

updateSyllabusPage.pl

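#!/usr/bin/perl -w
use strict;
use WordPress::API;

# get new syllabus content
my $contentFile = 'syllabus/syllabus.html';
# stop here if the html file is missing, rather than posting an empty page
open(my $fh, '<', $contentFile) or die "Cannot open $contentFile: $!";

# slurp in content
my $content = do { local $/; <$fh> };
close($fh);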

my $w = WordPress::API->new({
   proxy => 'http://robfelty.com/teaching/ling5200Fall2009/xmlrpc.php',
   username => 'myusername',
   password => 'mypassword',
});

my $page = $w->page(3);

$page->description($content);
$page->save;

Finally, if you want to see how I customize content for pdf and html separately, you can check out the LaTeX source file.

Sunday, March 29th, 2009

Converting LaTeX to Microsoft Word with plasTeX and Open Office

Sample of LaTeX document converted to Word

First of all, any LaTeX user might ask: why would I want to convert beautiful LaTeX into ugly Microsoft Word? The main reason is collaborators who want to use track changes. I recently sent a draft of a paper to some colleagues in two formats, .pdf and .doc. The pdf was formatted beautifully with LaTeX, but collaborators who are not comfortable editing a LaTeX file are left with commenting on the pdf itself, which is difficult, though there are some options for it (I like Mac OS X’s Preview application).

So when I sent out this latest paper for comments, I decided to send two versions. Converting to Word via copy and paste is very laborious, and not worth the effort. Recently though, I have been using plasTeX to convert LaTeX into html. I know that programs like Open Office can import html, so I decided to try that route to convert into .doc. First I used plasTeX to convert to html, and specified a few options to get the output I wanted:

plastex --theme minimal --sec-num-depth 0 --split-level 0 <filename>.tex

By default, this creates a subdirectory called <filename>, with an index.html file inside it.

Next I opened a new text document in Open Office, then selected Insert > File, and selected this index.html file. Presto! I had the document, complete with figures, tables, footnotes, and references. It wasn’t formatted as nicely as the pdf, but now my authors could insert their own comments and send it back to me electronically. One last step though. By default Open Office links to external figures instead of embedding them. To override this, select Edit > Links. Then highlight all the links, and click on the button to “break links”. Finally, save the document as a .doc file, and e-mail it to the collaborators as an attachment.

I have to incorporate their comments back into my original LaTeX file, but this is much less painful to me than having to write the whole thing in Word to begin with.

Wednesday, March 19th, 2008

Finally a better LaTeX to html converter

About a year ago I wrote a post about my frustration with the lack of a good LaTeX to html converter. Recently I found one. It is called plasTeX, so named because it is written in Python. Finally, a converter which works well with most any LaTeX package or macro you write, and produces sane, relatively standards-compliant html. Best of all, it comes with a built-in conditional, \ifplastex, so you can specify that some things should be in the pdf version, while others should be in the html version. It also supports some other formats like DocBook, but I am not too interested in that, so I haven’t tested it at all.
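
To make the \ifplastex idea concrete, here is a minimal sketch. It writes a tiny demo document to a temporary directory and, if both tools happen to be installed, builds it both ways. The file name ifdemo.tex and its contents are invented for illustration. The \newif and \plastexfalse lines are there so that pdflatex knows the conditional and treats it as false; the sketch assumes plasTeX keeps its built-in \ifplastex true regardless, which is worth checking against the plasTeX documentation for your version.

```shell
#!/bin/bash
# Hypothetical demo: ifdemo.tex is a made-up file, not one of the post's files.
demo_dir=$(mktemp -d)

# Quoted 'EOF' keeps the backslashes in the LaTeX source literal.
cat > "$demo_dir/ifdemo.tex" <<'EOF'
\documentclass{article}
% pdflatex does not know \ifplastex, so declare it and default it to false.
% Assumption: plasTeX treats its own built-in \ifplastex as true.
\newif\ifplastex
\plastexfalse
\begin{document}
\ifplastex
This sentence is meant for the html version only.
\else
This sentence is meant for the pdf version only.
\fi
\end{document}
EOF

# Build both versions, but only when both tools are available.
if command -v pdflatex >/dev/null 2>&1 && command -v plastex >/dev/null 2>&1; then
    (cd "$demo_dir" && pdflatex -interaction=nonstopmode ifdemo && plastex ifdemo)
fi

echo "demo source written to $demo_dir/ifdemo.tex"
```

If the assumption holds, the pdf should contain only the second sentence and the generated html only the first.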

Though I discovered plasTeX a few months ago, I finally decided to play around with it some more this week, when I wasn’t feeling like working too much. So I decided to convert my CV from html to latex, so I can have both pretty pdf and html versions. So now I will update my CV only in latex, which is much nicer to write than html anyways.

To get both a pdf and an html version, this is what I do:

pdflatex cv
plastex -c cv.cfg cv
tidy --show-body-only true -asxhtml -utf8 -wrap 78 -indent cv/index.html > cv/index2.html

It took me a while to figure out the syntax of the plasTeX config file. It is possible to specify these things on the command line, but I got tired of having too many things on the command line. The trick is that the plasTeX options are divided up into different sections. The sec-num-depth option, which tells plasTeX how many section levels deep to number, is in the document section, while the split-level option is in the files section. The split-level option tells plasTeX how deep in the section hierarchy to split the output into separate html pages. For my CV, I only wanted one page. My plastex config file looks like:

[document]
sec-num-depth = 0
[files]
split-level = 0
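
For comparison, here is a small sketch of how the config file lines up with the command-line flags. It writes the same cv.cfg into a temporary directory (the temporary directory and the variable names are just for illustration) and prints the two invocations that should be equivalent, going by the flags used in the Word conversion post above. It does not run plasTeX itself.

```shell
#!/bin/bash
# Sketch only: cfg_dir, with_cfg and without_cfg are illustrative names.
cfg_dir=$(mktemp -d)

# The same config as above: one option per section.
cat > "$cfg_dir/cv.cfg" <<'EOF'
[document]
sec-num-depth = 0
[files]
split-level = 0
EOF

# Using the config file keeps the command short ...
with_cfg="plastex -c cv.cfg cv"
# ... while the flag form spells out the same two options.
without_cfg="plastex --sec-num-depth 0 --split-level 0 cv"

echo "$with_cfg"
echo "$without_cfg"
```

The flag names match the option names in the config file; only the section headers disappear on the command line.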

Though plasTeX is pretty nice, there are three complaints I still have. One is that it uses <h1> tags for \section, which I can understand, but there is a long tradition on the web of having only one <h1> tag per page, which usually holds the title of the page. PlasTeX does have the ability to use different themes, so there might be a way to change this with a theme, but I haven’t figured that out yet. For the time being, I simply used \subsection commands instead.

My second complaint about plasTeX is that it doesn’t do any nice formatting of the html like indenting or line-wrapping. That is only a minor complaint though, since I can simply run it through htmltidy (as in the code above).

My third complaint is that plasTeX puts everything inside a <td> or <li> inside a <p>, which seems very strange to me, and creates some ugly formatting, so I used some CSS tricks to basically take away the standard effects of <p> tags inside these:

td > p {margin:0;padding:0;}
li > p {margin:0;padding:0;}

I am not completely finished yet, but you can check out the html and pdf results in: my new CV.