How can I read a Lync conversation file containing HTML?
Question
I'm having trouble reading a local file into a string in C#.
Here's what I've come up with so far:
string file = @"C:\script_test\{5461EC8C-89E6-40D1-8525-774340083829}.html";
using (StreamReader reader = new StreamReader(file))
{
    string line = "";
    while ((line = reader.ReadLine()) != null)
    {
        textBox1.Text += line.ToString();
    }
}
It's the only approach that seems to work.
I've tried some other suggested methods for reading a file, such as:
string file = @"C:\script_test\{5461EC8C-89E6-40D1-8525-774340083829}.html";
string html = File.ReadAllText(file).ToString();
textBox1.Text += html;
Yet it does not work as expected.
Here are the first few lines of the file I'm trying to read:
As you can see, it has some funky characters; honestly, I don't know whether they are the cause of this weird behavior.
But in the first case, the code seems to skip those lines, printing only "Document generated by Office Communicator..."
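One way to test the suspicion about the "funky characters" is to count how many control characters the decoded text contains and where the first NUL sits. The sketch below is only illustrative: `CountControlChars` is a hypothetical helper, and the sample string stands in for the real file content, which is not shown here. It uses top-level statements (C# 9 or later). If the file does start with a binary header containing NUL characters, a Windows Forms TextBox would typically stop displaying at the first one, which would be consistent with the behavior described; that explanation is an assumption, not something confirmed in this thread.

```csharp
using System;
using System.Linq;

// Stand-in for File.ReadAllText(path); replace with the real file content.
string sample = "\u0001\0\0header\0<html><body>Hi</body></html>";

int controlCount = CountControlChars(sample);
int firstNul = sample.IndexOf('\0');

Console.WriteLine("Control characters: " + controlCount);
Console.WriteLine("Index of first NUL: " + firstNul);

// Counts characters for which char.IsControl is true (NUL, tab, CR, LF, ...).
static int CountControlChars(string text) => text.Count(char.IsControl);
```

A non-zero count on the real file would suggest the text needs filtering before it is assigned to a control.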
Recommended answer
I don't know if this is the right way to answer, but here's what I've managed to do so far:
// Needs: using System; using System.IO; using System.Net; using System.Text; using HtmlAgilityPack;
string file = @"C:\script_test\{1C0365BC-54C6-4D31-A1C1-586C4575F9EA}.hist";
//Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
string raw;
// Dispose the reader as soon as the whole file has been read
using (StreamReader reader = new StreamReader(file, utf8))
{
    raw = reader.ReadToEnd();
}
// An earlier attempt (disabled) skipped the first 250 chars instead of filtering.
StringBuilder outText = new StringBuilder(raw.Length);
foreach (char c in raw)
{
    // skips non-printable characters (this also drops CR, LF and tab)
    if (!Char.IsControl(c))
    {
        outText.Append(c);
    }
}
string source = WebUtility.HtmlDecode(outText.ToString());
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(source);
StringBuilder html = new StringBuilder("<html><style>");
// SelectNodes returns null when nothing matches, so guard before iterating
HtmlNodeCollection styleNodes = htmlDoc.DocumentNode.SelectNodes("//style");
if (styleNodes != null)
{
    foreach (HtmlNode node in styleNodes)
    {
        html.Append(node.InnerHtml).Append(Environment.NewLine);
    }
}
html.Append("</style><body>");
HtmlNodeCollection bodyNodes = htmlDoc.DocumentNode.SelectNodes("//body");
if (bodyNodes != null)
{
    foreach (HtmlNode node in bodyNodes)
    {
        html.Append(node.InnerHtml).Append(Environment.NewLine);
    }
}
html.Append("</body></html>");
string page = html.ToString();
richTextBox1.Text += page + Environment.NewLine;
webBrowser1.DocumentText = page;
The conversation displays correctly, both style and encoding.
So it's a start for me.
Thank you all for your support!
Edit
Char.IsControl(char)
skips non-printable characters :)
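Note that `Char.IsControl` also returns true for carriage return, line feed and tab, so the filter removes line breaks along with the binary debris. That is harmless for HTML rendered in a browser control, but it matters if the raw text is shown in a text box. A minimal sketch of a variant that can keep those three characters, assuming a hypothetical helper named `StripControlChars` and a made-up sample string (top-level statements, C# 9 or later):

```csharp
using System;
using System.Text;

// Made-up sample: two binary characters, then HTML with a line break and a tab.
string sample = "\0\u0001<p>line one</p>\r\n<p>line two</p>\t";

string stripped = StripControlChars(sample, keepWhitespace: false);
string kept = StripControlChars(sample, keepWhitespace: true);

Console.WriteLine(stripped);
Console.WriteLine(kept);

// Removes control characters; optionally keeps CR, LF and tab.
static string StripControlChars(string text, bool keepWhitespace)
{
    var sb = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        bool isLayout = c == '\r' || c == '\n' || c == '\t';
        if (!char.IsControl(c) || (keepWhitespace && isLayout))
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}
```

With `keepWhitespace: false` the helper behaves like the loop in the answer; with `true` the original line structure survives.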