Extract a specific section from a LaTeX file with Python
Problem description
I have a set of LaTeX files. I would like to extract the "abstract" section for each one:
\begin{abstract}
.....
\end{abstract}
I have tried the suggestion here: How to Parse LaTex file
and tried:
A = re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data)
Here data contains the text from the LaTeX file, but A is just an empty list. Any help would be greatly appreciated!
Recommended answer
.* does not match newlines unless the re.S flag is given:
re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data, re.S)
Example
Consider this test file:
\documentclass{report}
\usepackage[margin=1in]{geometry}
\usepackage{longtable}
\begin{document}
Title maybe
\begin{abstract}
Good stuff
\end{abstract}
Other stuff
\end{document}
This retrieves the abstract:
>>> import re
>>> data = open('a.tex').read()
>>> re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data, re.S)
['\nGood stuff\n']
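Since the question mentions a set of files, the same pattern can be wrapped in a small helper. The following is only a sketch: the names ABSTRACT_RE, extract_abstract and abstracts_in_dir are made up for illustration, and it assumes the files are UTF-8 text with at most one abstract each.

```python
import re
from pathlib import Path

# Compiled once; re.S (DOTALL) lets '.' span the line breaks inside the abstract.
ABSTRACT_RE = re.compile(r'\\begin\{abstract\}(.*?)\\end\{abstract\}', re.S)

def extract_abstract(tex_source):
    """Return the first abstract body, whitespace-stripped, or None if absent."""
    match = ABSTRACT_RE.search(tex_source)
    return match.group(1).strip() if match else None

def abstracts_in_dir(directory):
    """Hypothetical helper: map each .tex file name in a directory to its abstract."""
    result = {}
    for path in sorted(Path(directory).glob('*.tex')):
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
        text = path.read_text(encoding='utf-8', errors='replace')
        result[path.name] = extract_abstract(text)
    return result

sample = r"""\documentclass{report}
\begin{document}
Title maybe
\begin{abstract}
Good stuff
\end{abstract}
Other stuff
\end{document}
"""
print(extract_abstract(sample))  # Good stuff
```

A regular expression like this does not skip commented-out lines or handle nested environments, so it is best treated as a quick extraction rather than a real LaTeX parser.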
文档
在 re
模块的网页中:
Documentation
From the re module's webpage:
re.S
re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
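To see the effect of the flag directly, the short comparison below runs the same pattern with and without it. The variable names are illustrative only; the last call uses the inline (?s) form, which should be equivalent to passing re.DOTALL.

```python
import re

pattern = r'\\begin{abstract}(.*?)\\end{abstract}'
multi_line = "\\begin{abstract}\nGood stuff\n\\end{abstract}"
single_line = "\\begin{abstract}One line\\end{abstract}"

# Without the flag, '.' stops at '\n', so a multi-line abstract never matches.
without_flag = re.findall(pattern, multi_line)
# With re.DOTALL (alias re.S), '.' also matches the newlines.
with_flag = re.findall(pattern, multi_line, re.DOTALL)
# The inline flag (?s) at the start of the pattern switches on the same behaviour.
inline_flag = re.findall('(?s)' + pattern, multi_line)
# An abstract that sits on a single line matches even without the flag.
one_line = re.findall(pattern, single_line)

print(without_flag)  # []
print(with_flag)     # ['\nGood stuff\n']
print(inline_flag)   # ['\nGood stuff\n']
print(one_line)      # ['One line']
```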