"Python Practical Secrets 04": batch-adding text watermarks to PDF files

Posted by deregular on Tue, 25 Jan 2022 05:42:21 +0100

The complete sample code and files for this article have been uploaded to my GitHub repository https://github.com/CNFeffery/PythonPracticalSkills

This is the fourth issue of my series "Python Practical Secrets". The series draws on my own experience using Python in daily work, and each issue introduces a simple skill you can learn in three minutes.

In this issue, we will learn how to add text watermarks to PDF files in batches.

Sometimes we need to add a text watermark to one or more PDF files, typically one that is repeated at regular intervals across every page. With two practical PDF libraries, reportlab (for generating PDFs) and pikepdf (for editing existing PDFs), adding text watermarks in batches is easy.

After installing both with pip install reportlab pikepdf, we can implement the required functionality in the following steps:

  • Generate the text watermark PDF file

To add a watermark to the target PDF file, we first need a separate text watermark file in PDF format. I wrote a convenient, easy-to-use function based on reportlab that generates this watermark file. You can study its steps through the comments or simply call it directly:

from typing import Union, Tuple
from reportlab.lib import units
from reportlab.pdfgen import canvas
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

# Register the font. The font file msyh.ttc (Microsoft YaHei) was copied from the Windows font directory
pdfmetrics.registerFont(TTFont('msyh', r'./msyh.ttc'))

def create_watermark(content: str,
                     filename: str, 
                     width: Union[int, float], 
                     height: Union[int, float], 
                     font: str, 
                     fontsize: int,
                     angle: Union[int, float] = 45,
                     text_stroke_color_rgb: Tuple[float, float, float] = (0, 0, 0),
                     text_fill_color_rgb: Tuple[float, float, float] = (0, 0, 0),
                     text_fill_alpha: Union[int, float] = 1) -> None:
    '''
    Generate a PDF file containing a text watermark with the given content
    content: Watermark text content
    filename: Name of the exported watermark file (without the .pdf extension)
    width: Canvas width, unit: mm
    height: Canvas height, unit: mm
    font: Name of a registered font
    fontsize: Font size
    angle: Rotation angle in degrees (counterclockwise)
    text_stroke_color_rgb: Text outline RGB color, each component in the range 0-1
    text_fill_color_rgb: Text fill RGB color, each component in the range 0-1
    text_fill_alpha: Text fill opacity, from 0 (fully transparent) to 1 (opaque)
    '''

    # Create a PDF file with the given file name and page size; sizes are given in mm and converted to points
    c = canvas.Canvas(f"{filename}.pdf", pagesize = (width*units.mm, height*units.mm))
    
    # Make slight canvas translation to ensure the integrity of the text
    c.translate(0.1*width*units.mm, 0.1*height*units.mm)
    
    # Set rotation angle
    c.rotate(angle)
    
    # Set font and size
    c.setFont(font, fontsize)
    
    # Set text outline color (drawString fills glyphs by default, so this only shows with a stroking text render mode)
    c.setStrokeColorRGB(*text_stroke_color_rgb)
    
    # Set text fill color
    c.setFillColorRGB(*text_fill_color_rgb)
    
    # Sets the transparency of the text fill color
    c.setFillAlpha(text_fill_alpha)
    
    # Draw text content
    c.drawString(0, 0, content)
    
    # Save watermark pdf file
    c.save()
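Note that reportlab's setFillColorRGB and setStrokeColorRGB expect each color component as a float between 0 and 1, not the 0-255 values most color pickers show. If you prefer to think in 0-255 values, a small converter like the one below could help. It is a hypothetical helper, not part of reportlab or the original code:

```python
from typing import Tuple


def rgb255_to_unit(rgb: Tuple[int, int, int]) -> Tuple[float, float, float]:
    """Convert an (R, G, B) triple in the 0-255 range to the 0-1 range.

    Hypothetical helper: reportlab's color setters take components in 0-1.
    """
    r, g, b = rgb  # raises ValueError if there are not exactly three components
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError(f"RGB component {value} is outside the 0-255 range")
    # Scale each component so that 255 maps to 1.0
    return (r / 255, g / 255, b / 255)
```

The result can be passed straight to create_watermark, e.g. text_fill_color_rgb=rgb255_to_unit((220, 20, 60)) for a crimson watermark.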

Let's use this function to generate the watermark file:

# Generate a sample text watermark PDF file
create_watermark(content='Official account: Python Big Data Analysis, Author: Frey', 
                 filename='Watermark example', 
                 width=200,
                 height=200, 
                 font='msyh', 
                 fontsize=35,
                 text_fill_alpha=0.3)

The result looks good. In practice, adjust the parameters yourself until the watermark's size and appearance are satisfactory.
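When tuning width, height, fontsize and angle, the main risk is that the rotated text runs off the canvas. The sketch below mirrors the geometry of create_watermark (origin shifted by 10% of the page size, then rotated) and roughly checks whether the end of the text baseline stays on the page. It is a hypothetical helper that ignores glyph height; the text width in points would come from reportlab's pdfmetrics.stringWidth(content, 'msyh', 35) after the font is registered:

```python
import math

MM_TO_PT = 72 / 25.4  # 1 inch = 25.4 mm = 72 PDF points


def baseline_fits(text_width_pt: float,
                  page_width_mm: float,
                  page_height_mm: float,
                  angle_deg: float = 45,
                  offset_ratio: float = 0.1) -> bool:
    """Rough check that a rotated text baseline stays on the watermark page.

    Mirrors create_watermark: translate by offset_ratio of the page size,
    rotate by angle_deg, then draw text_width_pt points of text.
    Glyph height is ignored, so treat this only as a first check.
    """
    width_pt = page_width_mm * MM_TO_PT
    height_pt = page_height_mm * MM_TO_PT
    # Start of the baseline after the canvas translation
    x0 = offset_ratio * width_pt
    y0 = offset_ratio * height_pt
    # End of the baseline after rotating the coordinate system
    theta = math.radians(angle_deg)
    x1 = x0 + text_width_pt * math.cos(theta)
    y1 = y0 + text_width_pt * math.sin(theta)
    return 0 <= x1 <= width_pt and 0 <= y1 <= height_pt
```

If the check fails, shrinking fontsize or enlarging the canvas are the two obvious levers.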

  • Overlay the watermark file onto the target PDF files in batches

With the text watermark file ready, we can overlay it onto the target PDF file, which the page functions in pikepdf make easy. I wrote a simple function for this; you only need to pass in a few required parameters when calling it:

import os
from typing import Iterable, Optional
from pikepdf import Pdf, Rectangle

def add_watermark(target_pdf_path: str,
                  watermark_pdf_path: str,
                  nrow: int,
                  ncol: int,
                  skip_pages: Optional[Iterable[int]] = None) -> None:
    '''
    Add a tiled watermark to the target PDF file
    target_pdf_path: Path + file name of the target PDF file
    watermark_pdf_path: Path + file name of the watermark PDF file
    nrow: Number of watermark tile rows
    ncol: Number of watermark tile columns
    skip_pages: Indices of pages to leave without a watermark (starting from 0)
    '''
    
    # Use a set for lookups; None means "watermark every page"
    # (this also avoids a shared mutable default argument such as `= []`)
    skip = set(skip_pages or [])
    
    # Read in the PDF file that needs to be watermarked
    target_pdf = Pdf.open(target_pdf_path)
    
    # Read the watermark PDF file and extract the watermark page
    watermark_pdf = Pdf.open(watermark_pdf_path)
    watermark_page = watermark_pdf.pages[0]
    
    # Traverse all pages in the target PDF file (excluding the pages listed in skip_pages)
    for idx, target_page in enumerate(target_pdf.pages):
        
        if idx not in skip:
            # trimbox is [llx, lly, urx, ury]; indices 2 and 3 serve as page width and height
            # (this assumes the box starts at the origin)
            page_width = target_page.trimbox[2]
            page_height = target_page.trimbox[3]
            for x in range(ncol):
                for y in range(nrow):
                    # Overlay the watermark page onto cell (x, y) of an nrow x ncol grid
                    target_page.add_overlay(watermark_page, Rectangle(page_width * x / ncol,
                                                                      page_height * y / nrow,
                                                                      page_width * (x + 1) / ncol,
                                                                      page_height * (y + 1) / nrow))
                    
    # Save the result as a new PDF next to the original, e.g. a.pdf -> a_Watermark added.pdf
    target_pdf.save(os.path.splitext(target_pdf_path)[0] + '_Watermark added.pdf')
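To see exactly where each copy of the watermark lands, the grid arithmetic inside the nested loops can be pulled out into a pure function. This is a hypothetical helper for illustration; it returns plain tuples in the (lower-left x, lower-left y, upper-right x, upper-right y) order that pikepdf's Rectangle takes, in the same column-then-row order as the loops above:

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (llx, lly, urx, ury) in PDF points


def tile_rectangles(page_width: float, page_height: float,
                    nrow: int, ncol: int) -> List[Rect]:
    """Split a page into an nrow x ncol grid of equally sized cells."""
    if nrow < 1 or ncol < 1:
        raise ValueError("nrow and ncol must be positive")
    rects = []
    for x in range(ncol):          # columns, left to right
        for y in range(nrow):      # rows, bottom to top
            rects.append((page_width * x / ncol,
                          page_height * y / nrow,
                          page_width * (x + 1) / ncol,
                          page_height * (y + 1) / nrow))
    return rects
```

For an A4 page (595 x 842 points) with nrow=3 and ncol=2, this yields six cells, each 297.5 points wide and about 280.7 points tall.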

Next, we call this function on the example file [Wu Enda] machine learning training script-Chinese version.pdf, adding our example watermark to every page except the cover, tiled in 3 rows and 2 columns:

add_watermark(target_pdf_path='./[Wu Enda] machine learning training script-Chinese version.pdf',
              watermark_pdf_path='./Watermark example.pdf',
              nrow=3,
              ncol=2,
              skip_pages=[0])
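add_watermark handles one file per call, so batch processing comes down to calling it once per PDF. A minimal sketch for collecting the files in a folder follows. The helper name and its exclude parameter are hypothetical; it skips the watermark file itself and any previous "_Watermark added" outputs so that re-running the batch does not watermark its own results:

```python
from pathlib import Path
from typing import Iterable, List

# Suffix that add_watermark appends to the names of its output files
OUTPUT_SUFFIX = '_Watermark added'


def find_target_pdfs(folder: str, exclude: Iterable[str] = ()) -> List[Path]:
    """Return the PDF files in `folder` that still need a watermark.

    Hypothetical helper: skips file names listed in `exclude` (e.g. the
    watermark file) and files whose stem already ends with OUTPUT_SUFFIX.
    """
    excluded = set(exclude)
    return sorted(
        path for path in Path(folder).glob('*.pdf')
        if path.is_file()
        and path.name not in excluded
        and not path.stem.endswith(OUTPUT_SUFFIX)
    )
```

Each returned path can then be passed to add_watermark as str(path) in a plain for loop, e.g. with exclude=['Watermark example.pdf']. On case-sensitive file systems the '*.pdf' pattern will not match files ending in '.PDF'.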

Try it yourself and experiment with the parameters to get a feel for them~

That's all for this issue. See you next time~ 👋
