Problems with symbols (apostrophe, parentheses) when writing RTL language with Apache POI XWPFDocument


Question

I've been trying to copy Hebrew data from Excel files into a document. While the letters themselves were copied correctly, it got a bit messy whenever some symbols were involved.

For example: instead of (text), I got )text(

This is my code so far:

XWPFParagraph newPara = document.insertNewParagraph(cursor);
newPara.setAlignment(ParagraphAlignment.RIGHT);
CTP ctp = newPara.getCTP();
CTPPr ctppr;
if ((ctppr = ctp.getPPr()) == null) ctppr = ctp.addNewPPr();
ctppr.addNewBidi().setVal(STOnOff.ON);
XWPFRun newParaRun = newPara.createRun();
newParaRun.setText(name);

I've tried some "bidirectional text direction support" (bidi) lines

(got it from here: How to change text direction (not paragraph alignment) in an Apache POI Word document? (XWPF))

But that's not it, nor does it have to do with alignment...

Recommended answer

With older word processing applications, there seem to be problems when LTR characters and RTL characters are mixed in one text run. Using special BiDi character types might then be the solution. See https://en.wikipedia.org/wiki/Bidirectional_text#Table_of_possible_BiDi_character_types.

See also: Bidirectional Word document using Apache POI.

Using this, the following works:

import java.io.FileOutputStream;

import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTPPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STOnOff;

public class CreateWordRTLParagraph {

 public static void main(String[] args) throws Exception {

  XWPFDocument doc= new XWPFDocument();

  XWPFParagraph paragraph = doc.createParagraph();
  XWPFRun run = paragraph.createRun();
  run.setText("Paragraph 1 LTR");

  paragraph = doc.createParagraph();

  CTP ctp = paragraph.getCTP();
  CTPPr ctppr;
  if ((ctppr = ctp.getPPr()) == null) ctppr = ctp.addNewPPr();
  ctppr.addNewBidi().setVal(STOnOff.ON);

  run = paragraph.createRun();
  String line = "(שָׁלוֹם)";
  run.setText("\u202E" + line + "\u202C");

  paragraph = doc.createParagraph();
  run = paragraph.createRun();
  run.setText("Paragraph 3 LTR");
    
  FileOutputStream out = new FileOutputStream("WordDocument.docx");
  doc.write(out);
  out.close();
  doc.close();    
 }
}

It puts U+202E RIGHT-TO-LEFT OVERRIDE (RLO) before the text line that mixes LTR characters (( and )) with RTL characters (שָׁלוֹם), and U+202C POP DIRECTIONAL FORMATTING (PDF) after that text line. That tells the word processing software exactly where RTL starts and ends. This gives me correct output in MS Word 365 and WordPad.
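If many cells have to be copied, the wrapping step can be pulled out into a small helper. The following is only a sketch: the class name BidiMarks and the method wrapRtlOverride are made up for illustration and are not part of Apache POI. It uses nothing but plain Java string handling.

```java
// Hypothetical helper (not part of Apache POI): wraps a string in
// U+202E RIGHT-TO-LEFT OVERRIDE ... U+202C POP DIRECTIONAL FORMATTING.
class BidiMarks {

    // U+202E RIGHT-TO-LEFT OVERRIDE (RLO)
    static final char RLO = '\u202E';
    // U+202C POP DIRECTIONAL FORMATTING (PDF)
    static final char PDF = '\u202C';

    // Returns the text surrounded by RLO and PDF; null or empty input is returned unchanged.
    static String wrapRtlOverride(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        return RLO + text + PDF;
    }

    public static void main(String[] args) {
        String wrapped = wrapRtlOverride("(text)");
        // Print the code points so the invisible control characters become visible.
        wrapped.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

With such a helper, the call in the example above would read run.setText(BidiMarks.wrapRtlOverride(line)); the behaviour in Word is the same as with the inline concatenation.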

As of apache poi 5.0.0, .setVal(STOnOff.ON) is no longer possible for Bidi, but .setVal(true) can be used:

  //ctppr.addNewBidi().setVal(STOnOff.ON); // up to apache poi 4.1.2
  ctppr.addNewBidi().setVal(true); // from apache poi 5.0.0 on
