It started when I wanted to print the documentation for Keras so that I could peruse it on a train journey. Obviously this is a weird thing to want to do in at least two different ways, but sometimes you’ve just got to have ink on paper.
The online HTML documentation for Keras is built from extended markdown files, and can (in theory – hence this post) also be used to generate a PDF file. However, when I tried it there turned out to be a few “issues”, so I thought I’d document this in case anyone (including future-me) wants to go on the same journey. I’m not going to document the process of discovery I went through, just what seems to be the easiest route to getting a printable PDF. Note that these instructions are for Windows.
Ready? Let’s go.
Part 1: Install the prerequisites.
1. Install Python 3.6 if you don’t already have it.
2. Download and install Pandoc and mkdocs. I used pip3 to install mkdocs. Pandoc will install itself into <user>\AppData\Local\Pandoc and you’ll need to add this to your path.
3. Download and install mkdocs-pandoc.
4. Download and install MiKTeX. mkdocs-pandoc emits LaTeX and expects you to have a way to convert this to PDF using pdflatex.exe, which is included in MiKTeX (and probably other TeX distributions). Choosing the option to automatically install missing packages will save time later. Add C:\Program Files\MiKTeX 2.9\miktex\bin\x64 to your path.
5. Download the Keras repo from GitHub and unzip keras-master.zip. You’ll need this because it includes the documentation source files.
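Before moving on, it can be worth checking that everything from Part 1 is actually reachable from your path. Here is a minimal sketch of such a check. The executable names are my assumption based on the tools used in this post, and the helper name missing_tools is made up for illustration.

```python
import shutil

# Executable names are assumptions based on the tools used in this post.
REQUIRED_TOOLS = ["mkdocs", "mkdocs2pandoc", "pandoc", "pdflatex", "xelatex"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All tools found.")
```

If anything is reported as missing, re-check the path entries from the steps above and open a fresh command window so the changes take effect.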
Part 2: Generate the PDF
1. In a command window, go to keras-master/docs. You’ll see a readme.md file containing instructions for building the docs. We’ll be following this outline but adding some workarounds.
2. Run python autogen.py and wait for errors to occur:
Traceback (most recent call last):
  File "autogen.py", line 559, in <module>
    readme = read_file('../README.md')
  File "autogen.py", line 532, in read_file
    return f.read()
  File "C:\Program Files\Python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 7411: character maps to <undefined>
This is happening because autogen.py is trying to read files that don’t match the default file encoding (cp1252 on my machine, as the traceback shows). We’ll need to hack autogen.py, so open it in an editor and go to the first place that f.read() is called, which for me was line 532. Change the previous line from
with open(path) as f:
to
with open(path, encoding="latin1") as f:
Save autogen.py and run the command again. You should get a similar error on a different line (it was line 563 for me) so fix it the same way. Save autogen.py and run the command yet again. This time it should complete successfully and generate a bunch of markdown files in the sources directory.
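To see why the change works, here is a small self-contained sketch (not part of autogen.py) that reproduces the failure using the byte from the traceback. cp1252 has no character assigned to 0x81, whereas latin1 maps every possible byte to a character and so never raises. The catch is that latin1 will silently give you the wrong characters if the file is really in some other encoding such as UTF-8, which I haven’t checked. The read_text helper and the sample content are made up for illustration.

```python
import os
import tempfile


def read_text(path, encoding):
    """Read a whole file using an explicit encoding, as in the patched open() call."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Write a small file containing the byte that tripped up autogen.py.
with tempfile.NamedTemporaryFile(delete=False, suffix=".md") as tmp:
    tmp.write(b"Keras \x81 docs")
    sample_path = tmp.name

try:
    try:
        read_text(sample_path, "cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True  # 0x81 is undefined in cp1252

    latin1_text = read_text(sample_path, "latin1")  # latin1 accepts any byte
finally:
    os.remove(sample_path)

print("cp1252 failed:", cp1252_failed)
print("characters read with latin1:", len(latin1_text))
```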
3. Run the mkdocs serve command. This builds the docs and starts a local web server. Since this blocks the command window, you’ll need to start another command window and go to the docs directory.
4. In the new command window, run mkdocs build. This generates an HTML web site in the site directory. This may not be strictly necessary, but it’s a useful check that things are working.
5. At this point we leave the instructions in readme.md behind. Run the command mkdocs2pandoc > keras.pd. This creates a single large markdown file (keras.pd) that is optimised for generating PDFs.
6. The markdown document has a few problems at this point, which we can rectify by hand-editing it. I used Notepad++ for this.
- Change the top-level heading on the first line from #Home to #Introduction.
- The markdown document contains a number of HTML <span> tags with level five headings on the next line. This causes pandoc (see step 7 below) to output the tag’s alt-text but fail to render the heading correctly. The solution is to add a line between them by using search and replace to replace </span>\r\n##### with </span>\r\n\r\n#####.
- The markdown contains HTML <img> tags that pandoc won’t handle. Either remove these manually or (as I did) leave them in and pandoc will render the alt text. Annoying but I can live with it.
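If you’d rather not make the first two edits by hand, they are easy to script. Here is a rough sketch, assuming the first line is exactly #Home as it was for me. The function name fix_markdown and the sample text are my own, and the <img> tags are left alone.

```python
import re


def fix_markdown(text):
    """Apply the heading rename and the </span> blank-line fix to markdown text."""
    # Rename the top-level heading, but only if it is the very first line.
    text = re.sub(r"\A#[ \t]*Home[ \t]*(?=\r?\n|\Z)", "#Introduction", text)
    # Insert a blank line between a closing </span> and a level-five heading,
    # reusing whichever line ending the file already has.
    # (?!#) stops this from touching level-six headings.
    text = re.sub(r"</span>(\r?\n)#####(?!#)", r"</span>\1\1#####", text)
    return text


# Made-up sample in the same shape as the generated file.
sample = '#Home\r\n<span class="x">source</span>\r\n##### Arguments\r\n'
print(repr(fix_markdown(sample)))
```

To use it for real, read keras.pd into a string, pass it through fix_markdown and write the result back, using whatever encoding your file ended up in. Opening the file with newline="" keeps the \r\n line endings intact. Running the function twice is harmless because the second pattern no longer matches once the blank line is there.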
7. Run the command pandoc --toc -f markdown+grid_tables+table_captions -V geometry:margin=2.5cm --pdf-engine=xelatex -o keras.pdf keras.pd. You’ll probably see some warnings: just ignore them. If you chose not to allow MiKTeX to automatically install missing packages then you will be asked for permission to install various things as they are needed. The default LaTeX stylesheet renders pages with large margins, so I overrode this by setting them to 2.5 cm (about one inch) – tweak this according to preference. I also specified xelatex as the PDF engine because the default engine had fatal problems with some Unicode characters in the document. The table of contents defaults to three levels: if you want more detail then add --toc-depth=4 to the command.
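If you expect to regenerate the PDF more than once, the pandoc invocation can be wrapped in a small script so the options don’t have to be retyped. This is a sketch that assumes pandoc is on your path and uses only the options from the command above. The function names and default values are mine.

```python
import subprocess


def pandoc_command(source="keras.pd", output="keras.pdf", margin="2.5cm", toc_depth=None):
    """Build the pandoc argument list for the PDF conversion step."""
    cmd = [
        "pandoc",
        "--toc",
        "-f", "markdown+grid_tables+table_captions",
        "-V", f"geometry:margin={margin}",
        "--pdf-engine=xelatex",
        "-o", output,
    ]
    if toc_depth is not None:
        cmd.append(f"--toc-depth={toc_depth}")
    cmd.append(source)
    return cmd


def build_pdf(**options):
    """Run pandoc; raises CalledProcessError if pandoc exits with an error."""
    subprocess.run(pandoc_command(**options), check=True)


# Show the command without running it.
print(" ".join(pandoc_command(toc_depth=4)))
```

Calling build_pdf() runs the conversion with the defaults, and build_pdf(margin="2cm", toc_depth=4) is an example of tweaking the margins and the table of contents depth.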
At the end of all this, you’ll see you have a keras.pdf file. Load this into your favourite PDF viewer or print it out according to your preference.