Python/PyPDF4: How do I specify the /PageLabels in the created PDF?
Question
I am using PyPDF4 to create an offline-readable version of the journal "Nature".
I use PyPDF4's PdfFileReader to read the individual article PDFs and PdfFileWriter to create a single, merged output.
The problem I am trying to solve is that the page numbers of some issues do not start at 1; for example, issue 7805 starts at page 563.
How do I specify the desired /PageLabels in the document catalog?
from PyPDF4 import PdfFileReader, PdfFileWriter

output = PdfFileWriter()
for pdf_file in pdf_files:
    input_pdf = PdfFileReader(open(pdf_file, 'rb'))
    page_indices = file_page_dictionary[pdf_file]
    for page_index in page_indices:
        page = input_pdf.getPage(page_index)
        # Specify actual page number here:
        # page.setPageNumber(actual_page_numbers[page_index])
        output.addPage(page)
with open(pdf_output_name, 'wb') as f:
    output.write(f)
Recommended answer
After exploring the PDF standard and a bit of hacking, I found that the following function adds a single /PageLabels entry that creates page labels starting from offset (i.e. the first page is labelled offset, the second page offset+1, and so on).
# To manipulate the PDF dictionary
import PyPDF4.pdf as PDF

# output_pdf is an instance of PdfFileWriter().
# offset is the desired page offset.
def add_pagelabels(output_pdf, offset):
    # Label style: decimal numbers (/S /D), starting at offset (/St).
    number_type = PDF.DictionaryObject()
    number_type.update({PDF.NameObject("/S"): PDF.NameObject("/D")})
    number_type.update({PDF.NameObject("/St"): PDF.NumberObject(offset)})

    # /Nums is a flat array of (page index, label dictionary) pairs.
    nums_array = PDF.ArrayObject()
    nums_array.append(PDF.NumberObject(0))  # physical page index
    nums_array.append(number_type)

    page_numbers = PDF.DictionaryObject()
    page_numbers.update({PDF.NameObject("/Nums"): nums_array})

    page_labels = PDF.DictionaryObject()
    page_labels.update({PDF.NameObject("/PageLabels"): page_numbers})

    # Attach /PageLabels to the document catalog.
    root_obj = output_pdf._root_object
    root_obj.update(page_labels)
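To make the effect of /St concrete, here is a minimal pure-Python sketch of the labels a viewer would be expected to show for a single decimal range that starts at physical index 0. The helper name labels_for_single_range is hypothetical and not part of PyPDF4; it only mirrors the arithmetic described above.

```python
def labels_for_single_range(page_count, offset):
    """Return the labels for one decimal range starting at physical index 0.

    The page at physical index i is labelled offset + i, which is what
    the /S /D, /St offset dictionary above asks a viewer to display.
    """
    return [str(offset + index) for index in range(page_count)]


# Nature issue 7805 starts at page 563 (the example from the question).
print(labels_for_single_range(3, 563))  # → ['563', '564', '565']
```

With the function from the answer, this would correspond to calling add_pagelabels(output, 563) before output.write(f).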
Additional page label entries can be created (e.g. with different offsets or different numbering styles).
Note that the first PDF page has an index of 0.
# Use PyPDF to manipulate pages
from PyPDF4 import PdfFileWriter, PdfFileReader
# To manipulate the PDF dictionary
import PyPDF4.pdf as PDF

def pdf_pagelabels_roman():
    # Lowercase Roman numerals (/S /r)
    number_type = PDF.DictionaryObject()
    number_type.update({PDF.NameObject("/S"): PDF.NameObject("/r")})
    return number_type

def pdf_pagelabels_decimal():
    # Decimal Arabic numerals (/S /D)
    number_type = PDF.DictionaryObject()
    number_type.update({PDF.NameObject("/S"): PDF.NameObject("/D")})
    return number_type

def pdf_pagelabels_decimal_with_offset(offset):
    # Decimal numerals whose first label is offset (/St)
    number_type = pdf_pagelabels_decimal()
    number_type.update({PDF.NameObject("/St"): PDF.NumberObject(offset)})
    return number_type

...

# output is the PdfFileWriter; first_offset and second_offset are
# assumed to be set in the elided code above.
nums_array = PDF.ArrayObject()
# Each entry consists of a page index followed by a page label dictionary.
nums_array.append(PDF.NumberObject(0))   # Index 0:
nums_array.append(pdf_pagelabels_roman())  # Roman numerals
nums_array.append(PDF.NumberObject(1))   # Indices 1 to 9:
nums_array.append(pdf_pagelabels_decimal_with_offset(first_offset))  # Decimal numbers, with offset
nums_array.append(PDF.NumberObject(10))  # Index 10 onward:
nums_array.append(pdf_pagelabels_decimal_with_offset(second_offset))

page_numbers = PDF.DictionaryObject()
page_numbers.update({PDF.NameObject("/Nums"): nums_array})

page_labels = PDF.DictionaryObject()
page_labels.update({PDF.NameObject("/PageLabels"): page_numbers})

root_obj = output._root_object
root_obj.update(page_labels)
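As a way to sanity-check a /Nums layout before writing it, the following pure-Python sketch resolves a physical page index to its label the way a viewer would be expected to: find the last range whose start index is not greater than the page index, then count up from that range's first number. The names to_roman, page_label and RANGES are hypothetical, and the values 563 and 601 merely stand in for first_offset and second_offset, which the code above leaves unspecified. Per the PDF specification, /St defaults to 1 when it is omitted, so the Roman range starts at "i".

```python
from bisect import bisect_right


def to_roman(number):
    """Convert a positive integer to an uppercase Roman numeral."""
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in pairs:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def page_label(page_index, ranges):
    """Resolve a 0-based physical page index to its label.

    ranges is a list of (start_index, style, first_number) tuples,
    sorted by start_index, with the first range starting at index 0.
    """
    starts = [start for start, _, _ in ranges]
    # Last range whose start index is <= page_index.
    start, style, first_number = ranges[bisect_right(starts, page_index) - 1]
    number = first_number + (page_index - start)
    if style == "/D":
        return str(number)
    if style == "/r":
        return to_roman(number).lower()
    if style == "/R":
        return to_roman(number)
    raise ValueError("unsupported numbering style: " + style)


# Mirrors the three ranges above; 563 and 601 are placeholder offsets.
RANGES = [(0, "/r", 1), (1, "/D", 563), (10, "/D", 601)]

for index in (0, 1, 9, 10):
    print(index, page_label(index, RANGES))
```

Running this shows that index 9 still belongs to the second range and index 10 is the first page of the third, which is an easy place to be off by one when mixing 0-based indices with printed page numbers.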