Converting Microsoft Word to HTML

I have just written an automatic cron job that converts my Word resume into HTML.

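#!/bin/bash
# Nothing to do if the HTML copy is already newer than the Word original.
if [ /var/www/home/resume.html -nt /var/www/home/resume.doc ]
then
       exit 0
fi
touch /var/www/home/resume.doc
# Enter the Sphinx virtualenv and project; give up if a directory is missing.
cd /usr/local/share/www || exit 1
source ./bin/activate
cd ./mich431 || exit 1

# antiword renders the Word file as 120-column text; sed then reshapes it
# into reStructuredText: blank the "Last upd..." line, drop one leading
# space, turn ". " lines into "+ " list items, squeeze runs of blank lines,
# escape the first "@", add mich431.net/resume.doc link references, underline
# headings chosen by their first letters, and blank one-character lines.
antiword -w 120 /var/www/home/resume.doc | sed -e 's@^Last upd.*@@;s@^ @@;s@^\. @\+ @;/./,/^$/!d;s/@/\\@/;s/ mich431.net/ `mich431.net`_ `resume.doc`_/;s/\(^Su.*\|^Wo.*\|^[HJT][eio].*\)/\n\1\n-----/;s/\(^Mich.*\)/\n\1\n=====/;s/^\([BCFGNOP].*\|S[oe].*\|Ap.*\|W[ae].*\)/\n\1\n\`````/;s/^.$//' > rs.rst
cat endRst.txt >> rs.rst
# Build with Sphinx; stop if the build fails.
make html || exit 1
# Post-build cleanup: short ids for four sections, strip Sphinx's classes,
# and append a line to the "Last rendered" footer.
sed 's@ cl.*software-toolset@ id="st@;s@ cl.*networking-technology@ id="nt@;s@ cl.*operating-systems@ id="os@;s@ cl.*programming-mark-up-scripting-languages@ id="pl@' _build/html/rs.html > _build/html/res.html
sed 's@ class="simple"@@g;s@ class="section.*>@>@g;s@ class="reference external"@@g' _build/html/res.html > _build/html/rs.html
sed 's@\(.*Last rendered.*\)</p>@\1 This resume brought to you by the number 0x5f3759df and the letter \&mu\;.</p>@' _build/html/rs.html > /var/www/home/resume.html
exit 0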

Run hourly by cron, the script checks whether the Word version is newer than the HTML version. If it is, the script moves into the Sphinx environment, uses antiword to convert the Word document to text, and uses sed to turn that text into reStructuredText; Sphinx then builds the HTML, and a few more sed passes clean up the result. This means my Word and HTML resumes will almost always be in sync, and I won’t have to make an HTML copy by hand anymore. :)
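The sed line that turns antiword's text into reStructuredText underlines every heading with a fixed five-character rule (`-----`, `=====`, or five backticks). Docutils expects an underline at least as long as the heading text and warns when it is shorter, so any heading longer than five characters triggers a warning on each build. The sketch below is one way to make the underline match the heading; it assumes awk is available, reuses the first-letter patterns from the existing sed line, and `underline_headings` is a hypothetical name.

```shell
# Sketch: print each heading followed by an underline of exactly its length.
# Heading patterns are copied from the existing sed line; the function name is
# hypothetical. Reads text on stdin, writes reStructuredText to stdout.
underline_headings() {
    awk '
        # Blank line, the heading itself, then a rule of ch as long as the heading.
        function heading(ch,    rule) {
            rule = $0
            gsub(/./, ch, rule)
            print ""
            print
            print rule
        }
        /^(Su|Wo|[HJT][eio])/         { heading("-"); next }   # was -----
        /^Mich/                       { heading("="); next }   # was =====
        /^([BCFGNOP]|S[oe]|Ap|W[ae])/ { heading("`"); next }   # was `````
        { print }
    '
}

# Example with made-up input:
printf 'Summary of skills\nplain text\n' | underline_headings
```

In the script, the three underline substitutions would come out of the sed expression, and the sed output would be piped through `underline_headings` before being written to rs.rst. The order of the awk rules keeps the same precedence as the sed substitutions, so a line starting with "Su" is still treated as a `-----` heading rather than a backtick one.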

TODO: fix Sphinx's setup so the post-build sed lines are not needed.
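Until the Sphinx setup is fixed, the three post-build sed passes could at least be folded into a single sed call that reads the build output once and writes the final page directly. Every expression in those passes edits within one line (no hold space, no multi-line commands), so chaining them in one script should give the same output as three separate passes. A sketch, assuming GNU sed as the script already does; `postbuild` is a hypothetical name.

```shell
# Sketch: the script's three post-build sed passes chained in a single sed call.
# Each expression works within one line, so running them in sequence here
# matches running three passes. Reads Sphinx HTML on stdin, writes to stdout.
postbuild() {
    sed -e 's@ cl.*software-toolset@ id="st@;s@ cl.*networking-technology@ id="nt@;s@ cl.*operating-systems@ id="os@;s@ cl.*programming-mark-up-scripting-languages@ id="pl@' \
        -e 's@ class="simple"@@g;s@ class="section.*>@>@g;s@ class="reference external"@@g' \
        -e 's@\(.*Last rendered.*\)</p>@\1 This resume brought to you by the number 0x5f3759df and the letter \&mu\;.</p>@'
}

# In the script this would replace the three post-build lines:
#   postbuild < _build/html/rs.html > /var/www/home/resume.html
printf '<div class="section" id="software-toolset">\n' | postbuild   # prints <div id="st">
```

This also avoids the intermediate res.html file and no longer overwrites Sphinx's own rs.html with a stripped copy.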

Michael 20120315

Last rendered on 15 March 2012 at 11:50.