Converting Markdown to PDF with Pandoc + TeXLive

Posted by Sk8Er_GuY on Sun, 12 Dec 2021 13:18:12 +0100

1, Foreword

As mentioned in the earlier post "Analysis of cloud document hosting scheme", we adopted Pandoc + TeXLive to convert Markdown to PDF. This post describes that setup in detail.

2, Pandoc

Pandoc is a free, open-source, general-purpose document conversion tool. It converts between a large number of markup formats, such as Markdown, HTML, LaTeX, PDF, and Microsoft Word. Its source code is hosted in a GitHub repository.

Note: a markup language is a text encoding that combines the text itself with additional information about it, such as the document structure and how the data should be processed. That additional information (including structure and presentation) is interleaved with the original text but set apart by markers.
A markup language is more than just a language. Like many languages, it needs a runtime environment to be useful. The components that provide that environment are called user agents.

2.1 installing Pandoc

  1. Go to the installation page on the official website and download the Pandoc build for your system.
  2. Here we use Windows as the example and download pandoc-2.14.2-windows-x86_64.zip.
  3. Unzip pandoc-2.14.2-windows-x86_64.zip, then add the directory containing pandoc.exe to the PATH environment variable.
  4. Run the following command in a terminal to check whether the installation succeeded:
pandoc -v

If the command prints detailed version information like the following, the installation succeeded:

pandoc.exe 2.14.2
Compiled with pandoc-types 1.22, texmath 0.12.3.1, skylighting 0.11,
citeproc 0.5, ipynb 0.1.0.1
User data directory: C:\Users\liuyuxin\AppData\Roaming\pandoc
Copyright (C) 2006-2021 John MacFarlane. Web:  https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

2.2 basic usage

  1. To view the command-line options:
pandoc -h
  2. To convert Word to Markdown:
pandoc --extract-media ./images README.docx -o README.md

Note: the --extract-media option specifies where pictures contained in the docx file are exported. Here it is the images folder in the current directory.

  3. To convert Markdown to LaTeX:
pandoc --toc -H head.tex README.md -o README.tex

For more, see the Pandoc user manual. The project also provides an online conversion tool and a collection of conversion examples.

2.3 conversion principle

Pandoc converts a Markdown document to PDF in two steps:

  • Step 1: convert the Markdown to a LaTeX source file.
  • Step 2: call pdflatex, xelatex, or another TeX engine installed on the system to render the LaTeX source into a PDF file.
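The two steps above can be reproduced by hand, which helps when debugging a failed conversion. The following is a minimal Python sketch, not how Pandoc itself is implemented: the function name two_step_commands is made up for illustration, and it assumes that pandoc and the chosen engine are on PATH. It only builds the two command lines; nothing is executed until you pass them to subprocess.run.

```python
from pathlib import Path


def two_step_commands(markdown_path, engine="xelatex"):
    """Return the two command lines that mimic Pandoc's Markdown -> PDF pipeline.

    Hypothetical helper for illustration only.
    """
    md = Path(markdown_path)
    tex = md.with_suffix(".tex")
    # Step 1: Markdown -> standalone LaTeX source (a full document with preamble).
    to_latex = ["pandoc", "--standalone", str(md), "-o", str(tex)]
    # Step 2: LaTeX source -> PDF, without stopping to prompt on errors.
    to_pdf = [engine, "-interaction=nonstopmode", str(tex)]
    return [to_latex, to_pdf]
```

To actually run the pipeline, loop over the result and call subprocess.run(cmd, check=True) for each command. The intermediate README.tex file is then available for inspection if the second step fails.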

3, TeXLive

3.1 TeX

First, what is TeX?

TeX is a typesetting system. It provides a powerful and flexible typesetting language with more than 900 instructions and support for macros, so users can extend it by defining new commands.

Note: TeX was created mainly to produce high-quality printed documents for science, technology, and mathematics quickly.

3.2 LaTeX

LaTeX is a typesetting system built on TeX. It is written in the TeX macro language and uses the TeX typesetting program to format its output. It is currently the most popular and widely used extension of TeX.

TeX has many distributions, such as TeXLive (cross-platform), MacTeX, and MiKTeX.

3.3 TeX distributions

To render LaTeX source files to PDF you need a TeX engine, which is installed as part of a TeX distribution. The engine typesets the content according to the LaTeX markup and produces a PDF file that can be read and printed.

Here we use TeXLive, which can be downloaded from the shared server. We chose it mainly for four reasons:

  • Free;
  • Cross-platform;
  • Includes the xelatex engine, which supports Unicode and therefore Chinese text;
  • Ships with TeXworks, a simple TeX editor with quick preview.

3.4 installation of TeXLive

  1. Download the image for the version you need from the official TeXLive website. If the download is slow, use a domestic mirror instead. Here we download texlive2020.iso.
  2. After installing with the default options, add [installation directory]\texlive20\bin\win32 to the PATH environment variable.
  3. Run the following commands in a terminal to check whether the installation succeeded:
tex -v
latex -v
xelatex -v
pdflatex -v

If each command prints detailed version information, the installation succeeded. The rest of this post mainly uses the xelatex engine. Running xelatex -v gives the following output:

XeTeX 3.14159265-2.6-0.999992 (TeX Live 2020/W32TeX)
kpathsea version 6.3.2
Copyright 2020 SIL International, Jonathan Kew and Khaled Hosny.
There is NO warranty.  Redistribution of this software is
covered by the terms of both the XeTeX copyright and
the Lesser GNU General Public License.
For more information about these matters, see the file
named COPYING and the XeTeX source.
Primary author of XeTeX: Jonathan Kew.
Compiled with ICU version 65.1; using 65.1
Compiled with zlib version 1.2.11; using 1.2.11
Compiled with FreeType2 version 2.10.1; using 2.10.1
Compiled with Graphite2 version 1.3.13; using 1.3.13
Compiled with HarfBuzz version 2.6.4; using 2.6.4
Compiled with libpng version 1.6.37; using 1.6.37
Compiled with poppler version 0.68.0
Compiled with fontconfig version 2.13.92; using 2.13.92
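The installation checks for Pandoc and the TeX engines can also be scripted, for example at the start of a build script. The sketch below is one possible way to do it with the Python standard library; the helper name tool_version is an assumption for illustration, and it relies on each tool accepting --version, which pandoc and the TeX Live engines do.

```python
import shutil
import subprocess


def tool_version(name):
    """Return the first line printed by `name --version`, or None if `name` is not on PATH.

    Hypothetical helper for illustration only.
    """
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate non-UTF-8 bytes, e.g. in user directory names
        check=False,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    for tool in ("pandoc", "tex", "latex", "xelatex", "pdflatex"):
        version = tool_version(tool)
        print(tool, "->", version if version is not None else "not found on PATH")
```

A tool reported as "not found on PATH" usually means the environment variable from the installation steps was not set, or the terminal was opened before it was changed.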

3.5 LaTeX basic syntax

LaTeX works much like a programming language: each time you need a special feature you import the relevant package, and the PDF file is produced by compiling the source with a rendering engine. A document is structured as follows:

%% Declare the document class
\documentclass{article}  %% Document type; common choices are report, article, book, letter, etc.

%% Import the packages you need
\usepackage{geometry}    %% Page setup package (a collection of macros)
\usepackage{graphicx}    %% Graphics package, supports inserting pictures
\usepackage{longtable}   %% Long table package, supports tables that span pages

%% Page configuration
\geometry{left=2.5cm,right=2.5cm,top=2.5cm,bottom=2.5cm} %% Use the commands in the imported macro package to set the margins

%% Define document information
\title{My first LaTeX document} %% Document title
\author{YuXin Liu}              %% Document author
\date{2021/12/08}               %% Document date

%% Indicates the beginning of the document content
\begin{document}    

\maketitle          %% Render (print) document information
\newpage            %% Start another page
\tableofcontents    %% Render the table of contents
\newpage            %% Start another page
\section{section}   %% Add a section
Hello world!
\subsection{subsection} %% Add a subsection
AWTK!!!!{\tiny Designer}  %% \tiny is a switch, so limit it with braces
\subsubsection{subsubsection} %% Add a subsubsection
{\Huge LaTeX\par}
    
\centerline{  %% Center the content of this line (here, the picture)
  \includegraphics[width=0.5\textwidth]{images/Pikachu.jpg}%% Render the picture
}

\normalsize 
This is an image.
This is an image. \\\\

%% Add table
\begin{longtable}{|c|c|r|r|r|r|r|r|r|l|}
    \caption{caption}       %% Table caption
    \label{table:label}  \\ %% Table label for cross-references
    \hline                  %% Add a horizontal line
    line1   &   line2   &   $t_1$   &   $t_{12}$    &   $t_2$       &   $r$(\%)&    $D$(GB)&    $D_{nc}$(GB)&$G_t$(\%)&Station\\    
    \hline
    % Data per row
    10      &   2       &   0:22:00 &   9:46:00 &   2:00:00 &   80.49   &   159.18  &   302.25  &   89.88   &   Cours Dillon    \\
    204     &   205     &   2:01:00 &   2:57:00 &   1:11:00 &   47.97   &   95.21   &   138.43  &   45.38   &   Ayguevives Collège  \\
    % More data
    \hline
\end{longtable}

%% Indicates the end of the document content
\end{document}

Render effect:

3.6 LaTeX resources

  1. TeXPage, an online LaTeX compiler: you can edit online, preview the result, and download some mature templates.
  2. Basic LaTeX syntax: Latex basic grammar - Zhihu (zhihu.com).
  3. Commonly used LaTeX packages (in Chinese): CTEX - online documentation - TeX/LaTeX common macro packages.
  4. Chinese LaTeX template library, forum, examples, and knowledge base: LaTeX studio (latexstudio.net).
  5. TeXLive mirror: Index of /pub/tex/historic/systems/texlive (utah.edu).

After installing TeXLive, you can open the official documentation (in English) from the terminal:

  • View the collection of user manuals: texdoc texdoc
  • Configuration guide: texdoc cfgguide
  • Font guide: texdoc fntguide
  • Documentation for any package in TeXLive: texdoc [package name]; for example, to read the documentation of the graphics package: texdoc graphics

4, Conversion skills

4.1 processing Chinese

pdflatex, the engine Pandoc uses by default, cannot typeset Chinese characters out of the box, so converting Markdown that contains Chinese to PDF fails with an error. Use xelatex instead, and use the CJKmainfont variable to specify a font that supports Chinese.

On Windows, with Pandoc 2.0 or later, the command for generating the PDF file is:

pandoc --pdf-engine=xelatex -V CJKmainfont="Microsoft YaHei" README.md -o README.pdf

CJKmainfont takes the name of a font that supports Chinese. To find such fonts you need the language code (zh for Chinese). The following command then lists all fonts on the system that support that language:

fc-list :lang=zh

Note: fc-list is usually preinstalled on Unix systems. On other platforms you need to find and install the package that provides it yourself. On Windows you can simply look in the C:\Windows\Fonts directory.

  • On Win7, open: Control Panel \ All Control Panel Items \ Fonts.
  • On Win10, view the font names under [Settings] => [Personalization] => [Fonts].
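The output of fc-list :lang=zh contains file paths and style information in addition to the font names, while CJKmainfont only needs the family name. The following Python sketch extracts the family names from that output. The function name font_families and the sample lines in the usage note are made up for illustration, and the sketch assumes the usual "path: family[,family...]:style=..." layout of fc-list output.

```python
def font_families(fc_list_output):
    """Extract the sorted, de-duplicated font family names from `fc-list` output.

    Hypothetical helper; assumes lines shaped like
    "/path/to/font.ttc: Family One,Family Two:style=Regular".
    """
    families = set()
    for line in fc_list_output.splitlines():
        if ": " not in line:
            continue  # skip blank or unexpected lines
        _, description = line.split(": ", 1)
        # Everything before the next colon is the comma-separated family list.
        names = description.split(":", 1)[0]
        for name in names.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return sorted(families)
```

For example, given the invented line "/usr/share/fonts/wqy-microhei.ttc: WenQuanYi Micro Hei,文泉驿微米黑:style=Regular", the function returns both the English and the Chinese family name, and either can be passed to -V CJKmainfont=...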

4.2 add document information using YAML header

Pandoc supports a header in YAML format. The header lets you specify the title, author, date, and other metadata of the document. An example header:

---
title: "My title"
author: "author"
date: 2021-12-12
---
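If the Markdown files are produced by another tool and carry no header, one can be prepended before calling Pandoc. This is a small Python sketch under that assumption. The function name with_yaml_header is hypothetical, and values are written as double-quoted strings (via json.dumps, whose string output is also valid YAML) so that colons or quotes in a title do not break the header.

```python
import json


def with_yaml_header(body, **fields):
    """Return `body` with a YAML metadata block built from `fields` prepended.

    Hypothetical helper for illustration only; every value is emitted as a
    double-quoted string.
    """
    lines = ["---"]
    for key, value in fields.items():
        lines.append(f"{key}: {json.dumps(str(value), ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body
```

Calling with_yaml_header(text, title="My title", author="author", date="2021-12-12") produces the header shown above, with the date quoted, followed by a blank line and the original text.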

4.3 code highlighting

Pandoc can syntax-highlight the code in code blocks. It ships with several highlight styles and supports many languages. To list the highlight styles Pandoc provides, use the following command:

pandoc --list-highlight-styles

To list all supported languages, use the following command:

pandoc --list-highlight-languages

To use syntax highlighting, each code block in the Markdown file must specify its language, and the --highlight-style option must be given on the command line, for example:

pandoc --pdf-engine=xelatex --highlight-style tango README.md -o README.pdf

4.4 hyperlink style

According to the Pandoc user guide, the colorlinks option colors links so that they stand out from ordinary text. For finer control, Pandoc also provides a separate option for the color of each link type:

colorlinks
    add color to link text; automatically enabled if any of linkcolor, filecolor, citecolor, urlcolor, or toccolor are set

linkcolor, filecolor, citecolor, urlcolor, toccolor
    color for internal links, external links, citation links, linked URLs, and
    links in table of contents, respectively: uses options allowed by xcolor,
    including the dvipsnames, svgnames, and x11names lists

For example, if we want to color the URL link and set urlcolor to NavyBlue, we can use the following command:

pandoc --pdf-engine=xelatex -V colorlinks -V urlcolor=NavyBlue README.md -o README.pdf

The colors of other links can be set as described above.

4.5 adding numbers to sections

By default, the generated PDF has no table of contents and its headings are not numbered. Heading levels differ only in font size. To number the sections, use the -N option:

pandoc --pdf-engine=xelatex -N -o README.pdf README.md

4.6 adding a table of contents to a document

To add a table of contents, use the --toc option:

pandoc --pdf-engine=xelatex --toc -o README.pdf README.md

4.7 modify PDF margins

The margins of a PDF generated with the default settings are too large. According to the official Pandoc FAQ, you can change them with the following option:

-V geometry:"top=2cm, bottom=1.5cm, left=2cm, right=2cm"

The complete command is:

pandoc --pdf-engine=xelatex -V geometry:"top=2cm, bottom=1.5cm, left=2cm, right=2cm" -o README.pdf README.md
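The options from sections 4.1 to 4.7 are usually combined in a single call. The Python sketch below assembles such a command as an argument list, which avoids shell quoting problems with values that contain spaces (font names, the geometry string). The function build_pdf_command and its parameter names are made up for illustration. The Pandoc flags themselves are the ones used in this post.

```python
def build_pdf_command(source, output, *, cjk_font=None, highlight_style=None,
                      url_color=None, number_sections=False, toc=False,
                      geometry=None):
    """Assemble a pandoc command line (as a list) from the options in this post.

    Hypothetical helper for illustration only.
    """
    cmd = ["pandoc", "--pdf-engine=xelatex"]
    if cjk_font:
        cmd += ["-V", f"CJKmainfont={cjk_font}"]      # 4.1 Chinese font
    if highlight_style:
        cmd += ["--highlight-style", highlight_style]  # 4.3 code highlighting
    if url_color:
        cmd += ["-V", "colorlinks", "-V", f"urlcolor={url_color}"]  # 4.4 links
    if number_sections:
        cmd.append("-N")                               # 4.5 section numbers
    if toc:
        cmd.append("--toc")                            # 4.6 table of contents
    if geometry:
        cmd += ["-V", f"geometry:{geometry}"]          # 4.7 margins
    cmd += [source, "-o", output]
    return cmd
```

The returned list can be passed directly to subprocess.run(cmd, check=True). Because no shell is involved, the geometry value needs no surrounding quotes.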

4.8 code, block quote, or list rendering failure

The cause is that Pandoc requires a blank line before a block quote, list, or table. In addition, the lines of a block quote do not wrap correctly in the PDF: the text of all lines runs together into one line. This can be fixed by adding trailing spaces to each line of the original block quote (two trailing spaces force a line break in Markdown).
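When many existing files lack those blank lines, the fix can be applied as a preprocessing step. The following Python sketch inserts a blank line before a block quote or list that directly follows other text. It is a rough line-based heuristic and not a Markdown parser: the names block_kind and add_blank_lines are hypothetical, only fenced code blocks opened with three backticks are skipped, and tables are not handled.

```python
import re

# A bullet ("-", "*", "+") or an ordered marker ("1." / "1)") followed by whitespace.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")


def block_kind(line):
    """Classify a line as the start of a 'quote' or 'list' block, else None."""
    if line.lstrip().startswith(">"):
        return "quote"
    if LIST_ITEM.match(line):
        return "list"
    return None


def add_blank_lines(markdown):
    """Insert a blank line before block quotes and lists that follow other text."""
    out = []
    in_fence = False
    current = None  # kind of block we are currently inside, if any
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        if not line.strip():
            current = None  # a blank line ends the current block
            out.append(line)
            continue
        kind = None if in_fence else block_kind(line)
        if kind and kind != current and out and out[-1].strip():
            out.append("")
        if kind:
            current = kind
        out.append(line)
    return "\n".join(out)
```

Lines inside an existing list or quote are left alone, so tight lists stay tight. Only the transition from ordinary text (or from one block type to another) gets the extra blank line.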

4.9 code plus background color

When converting Markdown to PDF, Pandoc uses \texttt in LaTeX for inline code, and the color variable shadecolor serves as the code background color. To make code easier to read, we can redefine shadecolor to give the code a background color. First, create a head.tex file containing the following commands:

%% Set the shade background color (code background color)
\usepackage{color,framed}
\definecolor{shadecolor}{RGB}{235,235,235}

When converting the Markdown file with Pandoc, add the -H option to include head.tex, for example:

pandoc --pdf-engine=xelatex -H head.tex README.md -o README.pdf
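If the background color needs to vary between documents, the header file can be generated instead of edited by hand. This is a minimal Python sketch under that assumption. The function name shade_header is hypothetical, and it only produces the text of the header, mirroring the three lines shown above.

```python
def shade_header(rgb=(235, 235, 235)):
    """Return LaTeX header text that defines `shadecolor` with the given RGB value.

    Hypothetical helper for illustration only.
    """
    if len(rgb) != 3 or any(not 0 <= c <= 255 for c in rgb):
        raise ValueError("rgb must be three integers between 0 and 255")
    r, g, b = rgb
    return (
        "%% Set the shade background color (code background color)\n"
        "\\usepackage{color,framed}\n"
        f"\\definecolor{{shadecolor}}{{RGB}}{{{r},{g},{b}}}\n"
    )
```

Writing the result to a file, for example with pathlib.Path("head.tex").write_text(shade_header((240, 240, 240)), encoding="utf-8"), gives a header that can be passed to Pandoc with -H exactly as in the command above.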

4.10 use head.tex to configure PDF parameters

Converting Markdown to PDF involves many options and settings. Writing all of them on the command line is tedious and makes small adjustments awkward. Instead, commonly used settings can be placed in head.tex, which is then included when converting the Markdown file.

For example, settings such as the page margins and the code background color can be put into head.tex:

%% set margins
\usepackage[top=2cm, bottom=2cm, left=1.5cm, right=1.5cm]{geometry}

%% For Chinese word wrap
\XeTeXlinebreaklocale "zh"

%% Set row spacing 1.5 times
\linespread{1.5}\selectfont

%% Set first line indent
\usepackage{indentfirst}
\setlength{\parindent}{2em}

%% Distance between paragraphs
\setlength{\parskip}{3pt} 	

%% Allow 0pt to 1pt of stretch between characters so that lines justify on both sides
\XeTeXlinebreakskip = 0pt plus 1pt

%% Set header and footer
\usepackage{fancyhdr}
\pagestyle{fancy}
\lhead{title} 
\chead{}
\rhead{version}
\lfoot{ZLG}
\cfoot{@2021 Guangzhou ZHIYUAN Electronics Co.,Ltd.}
\rfoot{\thepage}
\renewcommand{\headrulewidth}{0.4pt}  %% Header split line width
\renewcommand{\footrulewidth}{0.4pt}  %% Footer split line width

%% Set the table header background color
\usepackage{colortbl}
\definecolor{tableheadcolor}{RGB}{225,225,225}

%% Set the shade background color (code background color)
\usepackage{color,framed}
\definecolor{shadecolor}{RGB}{235,235,235}

%% Set the style of block quotes
%% I don't know how to set a separate quote background color, so reuse the shade background color for now
%% and distinguish quotes by their left and right margins
\usepackage{quoting}
\newenvironment{shadedquotation}
 {\begin{shaded*}
  \quoting[leftmargin=1em, rightmargin=1em, vskip=0pt, font=itshape]
 }
 {\endquoting
 \end{shaded*}
 }

% Make the quote environment use the shadedquotation environment
\def\quote{\shadedquotation}
\def\endquote{\endshadedquotation}

4.11 setting picture size

To set the size of a picture when converting Markdown to PDF, add {width=xx%} after the image reference in the Markdown document. Note that the percentage is relative to the maximum text width of a line in the PDF. With {width=100%}, the image spans the full text width (the page width minus the left and right margins) and keeps its aspect ratio.

![image](./images/image.png){width=70%}
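For documents with many pictures, a default width can be added in a preprocessing step rather than by hand. The Python sketch below appends a width attribute to every inline image that does not already have an attribute block. The function name add_default_width is hypothetical, and the regular expression is a simplification that assumes image paths without parentheses.

```python
import re

# An inline image ![alt](target) that is NOT already followed by "{...}" attributes.
IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)(?!\{)")


def add_default_width(markdown, width="70%"):
    """Append {width=...} to inline images that have no attribute block yet.

    Hypothetical helper for illustration only.
    """
    return IMAGE.sub(lambda m: m.group(0) + "{width=" + width + "}", markdown)
```

Images that already carry attributes, such as {width=50%}, are left unchanged, so individual pictures can still override the default.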
