How to read Chinese from a PDF in iOS correctly

Problem description

Here is what I have done, but the resulting text comes out garbled. Thanks in advance.

1. Use CGPDFStringCopyTextString to get the text from the PDF.

2. Encode the NSString to a char* using GB 18030:

NSStringEncoding enc = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);
const char *char_content = [self.currentData cStringUsingEncoding:enc];

Below is how I get the currentData:

// Callback for the TJ operator. Its operand is an array that mixes strings
// with numeric kerning adjustments; only the string elements carry text.
void arrayCallback(CGPDFScannerRef inScanner, void *userInfo)
{
    BIDViewController *pp = (__bridge BIDViewController *)userInfo;
    CGPDFArrayRef array;
    if (!CGPDFScannerPopArray(inScanner, &array))
        return;
    size_t count = CGPDFArrayGetCount(array);
    for (size_t n = 0; n < count; n++)
    {
        CGPDFStringRef string;
        // Numeric (kerning) elements fail this check and are skipped.
        if (!CGPDFArrayGetString(array, n, &string))
            continue;
        // CGPDFStringCopyTextString follows the Copy rule, so hand ownership
        // to ARC; a plain __bridge cast would leak the CFString.
        NSString *data = CFBridgingRelease(CGPDFStringCopyTextString(string));
        if (data)
            [pp.currentData appendString:data];
    }
}

- (IBAction)press:(id)sender {
    table = CGPDFOperatorTableCreate();
    CGPDFOperatorTableSetCallback(table, "TJ", arrayCallback);
    CGPDFOperatorTableSetCallback(table, "Tj", stringCallback);
    self.currentData = [NSMutableString string];
    CGPDFContentStreamRef contentStream = CGPDFContentStreamCreateWithPage(pagerf);
    CGPDFScannerRef scanner = CGPDFScannerCreate(contentStream, table, (__bridge void *)(self));
    CGPDFScannerScan(scanner);
    // Release the Core Graphics objects created above.
    CGPDFScannerRelease(scanner);
    CGPDFContentStreamRelease(contentStream);
    CGPDFOperatorTableRelease(table);
    table = NULL;
}

Recommended answer

According to the Mac Developer Library, CGPDFStringCopyTextString "returns a CFString object that represents a PDF string as a text string." The PDF string is given as a CGPDFString, which is a series of bytes (unsigned integer values in the range 0 to 255); thus this function already decodes the bytes according to some character encoding.

No encoding is passed to it explicitly, so it must assume one, most likely PDFDocEncoding or the UTF-16BE Unicode character encoding scheme. These are the two encodings that may be used to represent text strings in a PDF document outside the document's content streams; cf. section 7.9.2.2, "Text String Type", and Table D.1 in Annex D of the PDF specification.
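To make the distinction concrete, here is a minimal C sketch of how a text string *outside* content streams can be decoded: a leading byte order marker FE FF selects UTF-16BE, anything else is PDFDocEncoding. The function names are hypothetical, and the PDFDocEncoding branch is deliberately simplified to the printable ASCII range plus tab/LF/CR (where it coincides with ASCII); a real decoder needs the full table from Annex D.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the UTF-8 encoding of code point cp to out; return bytes written. */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Decode a PDF text string into NUL-terminated UTF-8 (out needs 2*len+1
 * bytes). Returns the UTF-8 length, or -1 on malformed input or on a byte
 * outside the simplified PDFDocEncoding subset handled here. */
static long decode_pdf_text_string(const unsigned char *s, size_t len,
                                   unsigned char *out)
{
    size_t o = 0;
    if (len >= 2 && s[0] == 0xFE && s[1] == 0xFF) {
        /* UTF-16BE, announced by the byte order marker FE FF. */
        if (len % 2 != 0) return -1;
        for (size_t i = 2; i < len; i += 2) {
            uint32_t u = ((uint32_t)s[i] << 8) | s[i + 1];
            if (u >= 0xD800 && u <= 0xDBFF) {
                /* High surrogate: must be followed by a low surrogate. */
                if (i + 3 >= len) return -1;
                uint32_t lo = ((uint32_t)s[i + 2] << 8) | s[i + 3];
                if (lo < 0xDC00 || lo > 0xDFFF) return -1;
                u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else if (u >= 0xDC00 && u <= 0xDFFF) {
                return -1; /* unpaired low surrogate */
            }
            o += put_utf8(u, out + o);
        }
    } else {
        /* PDFDocEncoding, simplified: only bytes that equal ASCII. */
        for (size_t i = 0; i < len; i++) {
            unsigned char b = s[i];
            if ((b >= 0x20 && b <= 0x7E) || b == 0x09 || b == 0x0A || b == 0x0D)
                out[o++] = b;
            else
                return -1;
        }
    }
    out[o] = '\0';
    return (long)o;
}
```

For example, the bytes FE FF 4E 2D 65 87 decode to "中文", while the same two characters stored as GB 18030 bytes would not match either encoding, which is one way garbled output arises.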

You have not told us where your CGPDFString comes from. I assume, though, that you received it from inside one of the document's content streams. Strings there, on the other hand, can be encoded with any imaginable encoding. The encoding actually used is determined by the font the string is displayed with, e.g. its /Encoding entry, its embedded font data, or a /ToUnicode CMap.
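As an illustration of why the font matters, here is a minimal C sketch of one common case: a font whose strings consist of 2-byte big-endian codes (for example with the Identity-H encoding often used for embedded CJK fonts), translated to Unicode through a lookup table of the kind one could build from the font's /ToUnicode CMap. The table type, function names, and the code values in the usage test are hypothetical; parsing the CMap itself (and finding the current font via the Tf operator and the page's /Resources) is not shown. Since CGPDFStringCopyTextString receives only the string and no font, it has no way to apply such a mapping.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry of a hypothetical code -> Unicode table, e.g. built from a
 * font's /ToUnicode CMap. Entries must be sorted by code for bsearch. */
typedef struct {
    uint16_t code;
    uint32_t unicode;
} CodeMapEntry;

static int compare_entry(const void *key, const void *elem)
{
    uint16_t k = *(const uint16_t *)key;
    uint16_t c = ((const CodeMapEntry *)elem)->code;
    return (k > c) - (k < c);
}

/* Decode a content-stream string made of 2-byte big-endian codes into
 * Unicode code points (out needs len/2 slots). Codes missing from the
 * table become U+FFFD. Returns the number of code points, or -1 if the
 * byte length is odd. */
static long decode_two_byte_codes(const unsigned char *s, size_t len,
                                  const CodeMapEntry *map, size_t map_len,
                                  uint32_t *out)
{
    if (len % 2 != 0) return -1;
    size_t n = 0;
    for (size_t i = 0; i < len; i += 2) {
        uint16_t code = (uint16_t)((s[i] << 8) | s[i + 1]);
        const CodeMapEntry *e =
            bsearch(&code, map, map_len, sizeof *map, compare_entry);
        out[n++] = e ? e->unicode : 0xFFFD;
    }
    return (long)n;
}
```

With a hypothetical table mapping code 0x0003 to U+4E2D and 0x0010 to U+6587, the bytes 00 03 00 10 decode to "中文", even though those raw bytes mean nothing useful under PDFDocEncoding or GB 18030.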

For more information, you may want to read the question "CGPDFScannerPopString returning strange result" and have a look at PDFKitten.
