How can I read a Lync conversation file containing HTML?


Question

I'm having trouble reading a local file into a string in C#.

Here's what I've come up with so far:

 string file = @"C:\script_test\{5461EC8C-89E6-40D1-8525-774340083829}.html";
 using (StreamReader reader = new StreamReader(file))
 {
      string line;
      while ((line = reader.ReadLine()) != null)
      {
           textBox1.Text += line;
      }
 }

And it's the only solution that seems to work.

I've tried some other suggested methods for reading a file, such as:

string file = @"C:\script_test\{5461EC8C-89E6-40D1-8525-774340083829}.html";
string html = File.ReadAllText(file);
textBox1.Text += html;

Yet it does not work as expected.
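File.ReadAllText decodes the bytes for you: it honours a byte order mark if there is one and otherwise assumes UTF-8. If the file is really stored in some other encoding, the resulting string can be full of odd characters. Before trying yet another way of reading it, it may help to look at the raw bytes. The sketch below is hypothetical (BomSniffer and both method names are made up for this example), and the UTF-16 check is a heuristic, not a rule of any file format.

```csharp
using System;

// Hypothetical diagnostic helpers, not part of the original thread.
public static class BomSniffer
{
    // Returns the encoding name implied by a byte order mark, or null if none.
    public static string GuessEncodingFromBom(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return "UTF-8";
        // UTF-32LE (FF FE 00 00) must be tested before UTF-16LE (FF FE).
        if (data.Length >= 4 && data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
            return "UTF-32LE";
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return "UTF-16LE";
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return "UTF-16BE";
        return null;
    }

    // Heuristic (an assumption, not a format rule): mostly-ASCII text saved as
    // UTF-16LE without a BOM has a 0x00 byte at (nearly) every odd offset.
    public static bool LooksLikeUtf16LeWithoutBom(byte[] data, int sample = 64)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        int n = Math.Min(data.Length, sample) & ~1; // whole 2-byte units only
        int pairs = n / 2;
        if (pairs <= 0) return false;

        int zeros = 0;
        for (int i = 1; i < n; i += 2)
        {
            if (data[i] == 0x00) zeros++;
        }
        return zeros * 10 >= pairs * 9; // at least 90% of the odd bytes are 0x00
    }
}
```

Call both on File.ReadAllBytes(file). If there is no BOM but the UTF-16 check returns true, reading with Encoding.Unicode (UTF-16LE in .NET) instead of UTF-8 might be worth a try.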

Here are the first few lines of the file I'm trying to read:

As you can see, it has some funky characters; honestly, I don't know if they're the cause of this weird behavior.
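Because the funky characters don't show up in a useful way in a text box, one way to see what they actually are is to print the first bytes of the file in hex. A minimal sketch; HexPeek.FirstBytesAsHex is a hypothetical helper name:

```csharp
using System;
using System.Text;

// Hypothetical helper: formats the first `count` bytes as space-separated hex pairs.
public static class HexPeek
{
    public static string FirstBytesAsHex(byte[] data, int count)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        // Clamp the count to [0, data.Length].
        int n = Math.Min(Math.Max(count, 0), data.Length);
        var sb = new StringBuilder(n * 3);
        for (int i = 0; i < n; i++)
        {
            if (i > 0) sb.Append(' ');
            sb.Append(data[i].ToString("X2")); // two uppercase hex digits per byte
        }
        return sb.ToString();
    }
}
```

For example, Console.WriteLine(HexPeek.FirstBytesAsHex(File.ReadAllBytes(file), 64)). Runs of 00 bytes would point to NUL characters, or to a file that is really UTF-16.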

But in the first case, the code seems to skip those lines, printing only "Document generated by Office Communicator..."
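One plausible explanation, not confirmed in this thread: a WinForms TextBox wraps a native Win32 edit control, which treats a NUL character ('\0') as the end of the text. Everything after an embedded '\0' can therefore vanish from the display even though the string itself is complete. A quick way to test that hypothesis is to find the first unexpected control character in the string. ControlCharFinder is a hypothetical name used only for this sketch:

```csharp
using System;

// Hypothetical helper for checking the "embedded NUL" hypothesis.
public static class ControlCharFinder
{
    // Returns the index of the first control character other than
    // CR, LF and TAB, or -1 if the text contains none.
    public static int IndexOfFirstSuspiciousControl(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsControl(c) && c != '\r' && c != '\n' && c != '\t')
                return i;
        }
        return -1;
    }
}
```

If this returns a small index for File.ReadAllText(file), the text box is probably being cut off there rather than the file being read incorrectly.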

Answer

I don't know if it's the right way to answer this, but here's what I've managed to do so far:

        // Needs: using System; using System.IO; using System.Net;
        //        using System.Text; using HtmlAgilityPack; (NuGet package)
        string file = @"C:\script_test\{1C0365BC-54C6-4D31-A1C1-586C4575F9EA}.hist";

        // Read the whole file as UTF-8; "using" closes the reader afterwards.
        // (Encoding.GetEncoding("ISO-8859-1") was also tried.)
        string raw;
        using (StreamReader reader = new StreamReader(file, Encoding.UTF8))
        {
            raw = reader.ReadToEnd();
        }

        // Skip non-printable (control) characters. A StringBuilder avoids
        // allocating a new string for every appended character.
        StringBuilder outText = new StringBuilder(raw.Length);
        foreach (char c in raw)
        {
            if (!Char.IsControl(c))
            {
                outText.Append(c);
            }
        }

        string source = WebUtility.HtmlDecode(outText.ToString());
        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(source);

        // Rebuild a minimal page from the <style> and <body> nodes.
        // SelectNodes returns null when nothing matches, hence the checks.
        StringBuilder html = new StringBuilder("<html><style>");
        HtmlNodeCollection styles = htmlDoc.DocumentNode.SelectNodes("//style");
        if (styles != null)
        {
            foreach (HtmlNode node in styles)
            {
                html.Append(node.InnerHtml).Append(Environment.NewLine);
            }
        }
        html.Append("</style><body>");
        HtmlNodeCollection bodies = htmlDoc.DocumentNode.SelectNodes("//body");
        if (bodies != null)
        {
            foreach (HtmlNode node in bodies)
            {
                html.Append(node.InnerHtml).Append(Environment.NewLine);
            }
        }
        html.Append("</body></html>");

        string page = html.ToString();
        richTextBox1.Text += page + Environment.NewLine;

        webBrowser1.DocumentText = page;

The conversation displays correctly, both style and encoding.

So it's a start for me.

Thanks, everyone, for the support!

Edit:

Char.IsControl(char) skips the non-printable characters :)
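Note that Char.IsControl also returns true for CR, LF and TAB, so the filter above also joins the whole file into a single line. For the WebBrowser output that is usually harmless, but if you want readable line breaks in richTextBox1, a variant could keep those three characters. TextCleaner.StripControlChars is a hypothetical name used only for this sketch:

```csharp
using System;
using System.Text;

// Hypothetical variant of the answer's Char.IsControl filter.
public static class TextCleaner
{
    // Removes control characters; optionally keeps CR, LF and TAB
    // so the original line structure survives.
    public static string StripControlChars(string text, bool keepLineBreaksAndTabs)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            bool isWhitespaceControl = c == '\r' || c == '\n' || c == '\t';
            if (!char.IsControl(c) || (keepLineBreaksAndTabs && isWhitespaceControl))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Passing false reproduces the behavior of the answer's loop; passing true drops NULs and similar characters but leaves the line breaks alone.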
