
Deep Thoughts by Robert Felty

thoughts on wordpress, latex, cooking et alia

Posts Tagged ‘pdf’

Wednesday, March 19th, 2008

Finally a better LaTeX to html converter

About a year ago I wrote a post about my frustration with the lack of a good LaTeX to html converter. Recently I found one. It is called plasTeX, so named because it is written in Python. Finally, a converter that works well with almost any LaTeX package or macro you write, and that produces sane, relatively standards-compliant html. Best of all, it comes with a built-in conditional, \ifplastex, so you can specify that some things should go in the pdf version and others in the html version. It also supports some other output formats like DocBook, but I am not too interested in those, so I haven’t tested them at all.

Though I discovered plasTeX a few months ago, I only got around to playing with it more this week, when I wasn’t feeling like working too much. I decided to convert my CV from html to LaTeX, so that I can have both a pretty pdf version and an html version. From now on I will update my CV only in LaTeX, which is much nicer to write than html anyway.

To get both a pdf and an html version, this is what I do:

pdflatex cv
plastex -c cv.cfg cv
tidy --show-body-only true -asxhtml -utf8 -wrap 78 -indent cv/index.html > cv/index2.html

It took me a while to figure out the syntax of the plasTeX config file. It is possible to specify these options on the command line, but I got tired of having so many things on the command line. The trick is that the plasTeX options are divided into different sections. The sec-num-depth option, which tells plasTeX how many section levels deep to number, goes in the [document] section, while the split-level option goes in the [files] section. The split-level option tells plasTeX down to which sectioning level the document gets split into separate html pages. For my CV, I only wanted one page. My plasTeX config file looks like this:

[document]
sec-num-depth = 0
[files]
split-level = 0

Though plasTeX is pretty nice, I still have three complaints. The first is that it uses <h1> tags for \section, which I can understand, but there is a long tradition on the web of having only one <h1> tag per page, which usually holds the title of the page. PlasTeX does support themes, so there might be a way to change this with a theme, but I haven’t figured that out yet. For the time being, I simply used \subsection instead.

My second complaint about plasTeX is that it doesn’t do any nice formatting of the html, like indenting or line-wrapping. That is only a minor complaint, though, since I can simply run the output through HTML Tidy (the tidy command above).

My third complaint is that plasTeX wraps the contents of every <td> and <li> in a <p>, which seems very strange to me and creates some ugly formatting. So I used a little CSS to cancel out the default margin and padding of <p> tags inside these elements:

td > p {margin:0;padding:0;}
li > p {margin:0;padding:0;}

I am not completely finished yet, but you can check out the html and pdf results at my new CV.

Tuesday, March 11th, 2008

convert pdf to png with imagemagick

ImageMagick is a Swiss Army knife of command-line image conversion, but it can be a bit complicated to actually use. I have been making most of my figures with R lately and printing them to pdfs, which I can include very easily in documents with pdflatex. I like pdf because it is scalable, portable, and has a fairly small file size (smaller than .eps).

But today a colleague wanted to include a few of my figures in her own PowerPoint presentation, and PowerPoint only likes bitmaps. She was just going to take screenshots of the figures, but I quickly said, “no, I will just convert them to pngs”. She replied: “I don’t want you to go to a bunch of trouble.” “No trouble at all,” I replied. Then I quickly wrote a bash for loop to convert all the pdf figures into pngs. An hour later, when I went to zip them up and e-mail them to her, I realized that they looked like crap. After a bit of searching online, I found the flags I was looking for, and eventually used:

for file in *.pdf; do
  echo "$file"
  # ${file%.pdf} strips only the .pdf extension, even if the name contains other dots
  convert -density 600x600 -resize 800x560 -quality 90 "$file" "${file%.pdf}.png"
done

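And now the options explained:
-density 600x600 tells ImageMagick to rasterize the pdf at 600 dpi horizontally and vertically (the default is only 72 dpi). It has to come before the input file name, because it controls how the pdf is read. Rendering at a high density and then scaling down gives much smoother results than rendering small in the first place.
-quality 90 means, for png output, the highest zlib compression level (9) and no filtering (0)
-resize 800x560 scales the result to fit within 800 by 560 pixels, preserving the aspect ratio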
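To get a feel for what these numbers actually do, here is a small shell sketch that does the arithmetic without calling ImageMagick at all. The function names (rendered_size, fit_within, png_quality) are made up for illustration, and the 7 x 5 inch figure is a hypothetical example; ImageMagick's own rounding may differ from this by a pixel.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope arithmetic for the convert options above.
# Plain shell integer math; ImageMagick itself is not needed.

# Pixel size of a page rasterized at a given density:
# inches times dots per inch (whole inches only, for simplicity).
# usage: rendered_size WIDTH_IN HEIGHT_IN DPI
rendered_size() {
  echo "$(( $1 * $3 ))x$(( $2 * $3 ))"
}

# Size after "-resize MAXWxMAXH": scale to fit inside the box while
# keeping the aspect ratio, rounded to the nearest pixel.
# usage: fit_within W H MAXW MAXH
fit_within() {
  local w=$1 h=$2 maxw=$3 maxh=$4
  # Cross-multiply to find which side hits the box first (no floats needed).
  if [ $(( w * maxh )) -ge $(( h * maxw )) ]; then
    echo "${maxw}x$(( (h * maxw + w / 2) / w ))"   # width is the limit
  else
    echo "$(( (w * maxh + h / 2) / h ))x${maxh}"   # height is the limit
  fi
}

# For png output, -quality is split into two digits:
# tens digit = zlib compression level, ones digit = filter type.
# usage: png_quality QUALITY
png_quality() {
  echo "zlib level $(( $1 / 10 )), filter $(( $1 % 10 ))"
}

rendered_size 7 5 600          # 4200x3000
fit_within 4200 3000 800 560   # 784x560
png_quality 90                 # zlib level 9, filter 0
```

So a 7 x 5 inch figure rendered at 600 dpi starts out at 4200x3000 pixels, and -resize 800x560 shrinks it to 784x560, because the height is the side that hits the box first.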

Happy ImageMagicking!