How to get python-docx working with complex scripts?

Question

I have a working docx generator which works fine for European languages, and I'm trying to add complex script support. I found another question with some recipes to try: python-docx add_style with CTL (Complex text layout) language

I managed to get it working so that complex-script text comes out in the correct typeface and size, but I can't get bidirectional (right-to-left) text working. The obvious "x.font.rtl = True" doesn't work, and neither does the incantation given in the other post ("lang.set(qn('w:bidi'),'fa-IR')"). I had to take out the line "rpr.get_or_add_sz()" from his recipe because it left me with an unreadable file; everything else works without it, and I don't think it's related to this problem.

Here is the style as it appears in the generated document's styles.xml file:

<w:style w:styleId="Hebrew" w:type="paragraph" w:customStyle="1">
    <w:name w:val="Hebrew"/>
    <w:basedOn w:val="Normal"/>
    <w:pPr>
        <w:jc w:val="right"/>
    </w:pPr>
    <w:rPr>
        <w:rFonts w:cs="Arial"/>
        <w:rtl/>
        <w:szCs w:val="24"/>
        <w:lang w:bidi="he-IL"/>
    </w:rPr>
</w:style>

Can anyone advise me on what to do to get paragraphs in right-to-left languages working?

Recommended answer

As per the comments above, and with much help from ROAR (thanks, ROAR!) I got everything working.

ROAR's recipe worked perfectly, except that calling rpr.get_or_add_sz() gave me an unreadable .docx file. Leaving it out made everything work and didn't appear to cause any problems. The crucial missing link was to add the following to <w:pPr> in the style:

<w:bidi w:val="1"/>
<w:jc w:val="both"/>

There is a my_style.get_or_add_pPr() method to get a reference to the <w:pPr> section of the style, and the code is then similar to the code for updating <w:rPr>:

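from docx.oxml import OxmlElement
from docx.oxml.ns import qn

# ppr is the <w:pPr> element returned by my_style.get_or_add_pPr()
# Reuse existing <w:bidi> and <w:jc> children if the style already has them
bidi = ppr.find(qn('w:bidi'))
if bidi is None:
    bidi = OxmlElement('w:bidi')
jc = ppr.find(qn('w:jc'))
if jc is None:
    jc = OxmlElement('w:jc')
bidi.set(qn('w:val'), '1')
jc.set(qn('w:val'), 'both')
# append() moves an element that is already a child, so this leaves
# <w:bidi> followed by <w:jc> at the end of <w:pPr>
ppr.append(bidi)
ppr.append(jc)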
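
To confirm that the change actually landed in the saved file, one option is to read word/styles.xml back out of the .docx and look for <w:bidi> inside the style's <w:pPr>. The following is only a rough sketch using the Python standard library rather than python-docx; the helper names w and style_is_bidi are made up for this illustration, and the style ID is whatever you used when creating the style (for example "Hebrew").

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, i.e. what the "w:" prefix stands for.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag):
    """Return the Clark-notation name for a w:-prefixed tag or attribute."""
    return "{%s}%s" % (W_NS, tag)


def style_is_bidi(docx_file, style_id):
    """Return True if the style's <w:pPr> has <w:bidi> switched on.

    docx_file may be a path or a binary file-like object.
    (Hypothetical helper, not part of python-docx.)
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    for style in root.iter(w("style")):
        if style.get(w("styleId")) != style_id:
            continue
        bidi = style.find(w("pPr") + "/" + w("bidi"))
        if bidi is None:
            return False
        # A missing w:val means "on"; "0", "false" and "off" mean "off".
        return bidi.get(w("val"), "1") not in ("0", "false", "off")
    # No style with that ID was found.
    return False
```

Calling style_is_bidi("output.docx", "Hebrew") after saving the document (the file name here is just a placeholder) gives a quick yes/no check without opening the file in Word.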

The final thing I needed was to deal with mixed-language text, which I did by breaking the text into multiple runs. The paras of Hebrew text I was dealing with were given the modified style with rtl=True, but I split out any ASCII sequences which started and ended with a letter:

[A-Za-z][\u0020-\u007e]*[A-Za-z]

into separate runs with rtl=False.
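
The splitting step could look roughly like the sketch below. It is not the code I actually used, just a minimal illustration of the idea: the function name split_runs and the (segment, rtl) tuple layout are invented for this example, and it assumes everything outside a matched ASCII sequence belongs in a right-to-left run.

```python
import re

# An ASCII sequence that starts and ends with a letter, as described above.
# Note that it needs at least two characters, so a lone letter is not split out.
LATIN_RUN = re.compile(r"[A-Za-z][\u0020-\u007e]*[A-Za-z]")


def split_runs(text):
    """Split text into (segment, rtl) pairs.

    Matched ASCII sequences get rtl=False; everything in between gets rtl=True.
    (Hypothetical helper for illustration only.)
    """
    pieces = []
    pos = 0
    for match in LATIN_RUN.finditer(text):
        if match.start() > pos:
            # Text before this ASCII sequence stays right-to-left.
            pieces.append((text[pos:match.start()], True))
        pieces.append((match.group(), False))
        pos = match.end()
    if pos < len(text):
        # Trailing text after the last ASCII sequence.
        pieces.append((text[pos:], True))
    return pieces
```

Each pair can then be turned into a run, for example with run = paragraph.add_run(segment) followed by run.font.rtl = rtl, where paragraph is a python-docx paragraph that uses the modified style.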
