After encoding as UTF-16, the string is broken when used in iTextSharp

Question

First I read some information from a text file, and then I add this information to the metadata of PDF files. In the "Producer" field, an error occurred with Turkish characters such as ğ and ş. I solved that problem by using UTF-16, like this:

write.Info.Put(new PdfName("Producer"), new PdfString("Ankara Üniversitesi Hukuk Fakültesi Dergisi (AÜHFD), C.59, S.2, y.2010, s.309-334.", "UTF-16"));

Then I loop over all the PDF files with foreach, read the metadata, and insert it into a SQLite database file. This is where the problem occurs: when I read the UTF-16 encoded string (the Producer data) from the PDF file and store it in the database, strange characters appear in its place.

I don't understand why this error occurs.

Here is all of my code. The following code reads the metadata from a text file and writes it into the metadata section of the PDF files:

    var articles = Directory.GetFiles(FILE_PATH, "*.pdf");
    foreach (var article in articles)
    {
        var file_name = Path.GetFileName(article);
        var read = new PdfReader(article);
        var size = read.GetPageSizeWithRotation(1);
        var doc = new Document(size);
        var write = PdfWriter.GetInstance(doc, new FileStream(TEMP_PATH + file_name, FileMode.Create, FileAccess.Write));
        // Article file names like, 1.pdf, 2.pdf, 3.pdf....
        // article_meta_data.txt file content like this: 
        //1@Article 1 Tag Number@Article 1 first - last page number@Article 1 Title@Article 1 Author@Article 1 Subject@Article 1 Keywords
        //2@Article 2 Tag Number@Article 2 first - last page number@Article 2 Title@Article 2 Author@Article 2 Subject@Article 2 Keywords
        //3@Article 3 Tag Number@Article 3 first - last page number@Article 3 Title@Article 3 Author@Article 3 Subject@Article 3 Keywords
        var pdf_file_name = Convert.ToInt32(Path.GetFileNameWithoutExtension(article)) - 1;
        var line = File.ReadAllLines(FILE_PATH + @"article_meta_data.txt");
        var info = line[pdf_file_name].Split('@');

        var producer = Kunye(info); // It returns like: Ankara Üniversitesi Hukuk Fakültesi Dergisi (AÜHFD), C.59, S.2, y.2010, s.309-334.
        var keywords = string.IsNullOrEmpty(info[6]) ? "" : info[6];
        doc.AddTitle(info[3]);
        doc.AddSubject(info[5]);
        doc.AddCreator("UzPDF");
        doc.AddAuthor(info[4]);
        write.Info.Put(new PdfName("Producer"), new PdfString(producer, "UTF-16"));
        doc.AddKeywords(keywords);
        doc.Open();
        var cb = write.DirectContent;
        for (var page_number = 1; page_number <= read.NumberOfPages; page_number++)
        {
            doc.NewPage();
            var page = write.GetImportedPage(read, page_number);
            cb.AddTemplate(page, 0, 0);
        }
        doc.Close();
        read.Close();
        File.Delete(article);
        File.Move(TEMP_PATH + file_name, FILE_PATH + file_name);
    }

The following code reads the data back from the files and inserts it into a SQLite database file. For database operations, I am using Devart dotConnect for SQLite.

    var files = Directory.GetFiles(FILE_PATH, "*.pdf");
    var connection = new Linq2SQLiteDataContext();
    TruncateTable(connection);
    var i = 1;
    foreach (var file in files)
    {
        var read = new PdfReader(file);
        var title = read.Info["Title"].Trim();
        var author = read.Info["Author"].Trim();
        var producer = read.Info["Producer"].Trim();
        var file_name = Path.GetFileName(file)?.Trim();
        var subject = read.Info["Subject"].Trim();
        var keywords = read.Info["Keywords"].Trim();
        var art = new article
        {
            id = i,
            title = (title.Length > 255) ? title.Substring(0, 255) : title,
            author = (author.Length > 100) ? author.Substring(0, 100) : author,
            producer = (producer.Length > 255) ? producer.Substring(0, 255) : producer,
            filename = file_name != null && (file_name.Length > 50) ? file_name.Substring(0, 50) : file_name,
            subject = (subject.Length > 50) ? subject.Substring(0, 50) : subject,
            keywords = (keywords.Length > 500) ? keywords.Substring(0, 500) : keywords,
            createdate = File.GetCreationTime(file),
            update = File.GetLastWriteTime(file)
        };
        connection.articles.InsertOnSubmit(art);
        i++;
    }
    connection.SubmitChanges();

Recommended answer

Instead of:

new PdfString(producer, "UTF-16")

Use:

new PdfString(producer, PdfString.TEXT_UNICODE)

UTF-16 is one specific way of storing Unicode values, but you don't need to worry about that: iText will take care of everything for you.
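The byte-level difference behind this advice can be sketched in plain .NET, without iTextSharp. The sketch rests on two assumptions. First, `PdfString.TEXT_UNICODE` is taken to produce the Unicode form of a PDF text string as the PDF specification describes it: the byte order mark FE FF followed by big-endian UTF-16. Second, it is assumed that a reader only treats a string as Unicode when that marker is present. `ToPdfUnicodeBytes` and `HasPdfUnicodeMarker` are hypothetical helpers written for this illustration; they are not iTextSharp APIs.

```csharp
using System;
using System.Linq;
using System.Text;

static class PdfTextStringDemo
{
    // Hypothetical helper: bytes of a PDF text string in its Unicode form,
    // i.e. the byte order mark FE FF followed by big-endian UTF-16 code units.
    public static byte[] ToPdfUnicodeBytes(string text)
    {
        var bom = new byte[] { 0xFE, 0xFF };
        return bom.Concat(Encoding.BigEndianUnicode.GetBytes(text)).ToArray();
    }

    // Hypothetical helper: true only when the bytes start with FE FF.
    public static bool HasPdfUnicodeMarker(byte[] bytes)
    {
        return bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF;
    }

    public static void Main()
    {
        const string sample = "ğş"; // U+011F, U+015F

        // In .NET the name "UTF-16" resolves to the little-endian encoding,
        // and GetBytes does not emit a byte order mark.
        var littleEndian = Encoding.GetEncoding("UTF-16").GetBytes(sample);
        var pdfUnicode = ToPdfUnicodeBytes(sample);

        Console.WriteLine(BitConverter.ToString(littleEndian));   // 1F-01-5F-01
        Console.WriteLine(BitConverter.ToString(pdfUnicode));     // FE-FF-01-1F-01-5F
        Console.WriteLine(HasPdfUnicodeMarker(littleEndian));     // False
        Console.WriteLine(HasPdfUnicodeMarker(pdfUnicode));       // True
    }
}
```

Under these assumptions, a string stored with the "UTF-16" encoding name would not be recognised as Unicode when it is read back, which would be consistent with the strange characters in the Producer field.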
