Analysing an HTML string in C#


Question

I downloaded the HTML source with WebClient.DownloadString(url) and saved it to a .txt file. Until now I have analysed such code one line at a time using loops, but this time the HTML is a bit messy (irregular newlines and spaces). When I view the page source in Chrome, Chrome formats it for me and it's easy to read, but the .txt file isn't formatted, so I'm having a hard time analysing it.

Previously I read a line, split it, and then looked for what I needed. Now I can't keep track of anything, because the line breaks are irregular and I can't predict what a given line will contain.

Any ideas?
Thanks.
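One way to cope with the irregular line breaks is to stop relying on the server's line structure at all. A minimal sketch, assuming the whole page fits in memory as one string: collapse all whitespace, then insert a line break before every tag, so each line your loop sees starts with exactly one tag. The names `HtmlLineNormalizer` and `SplitIntoTagLines` are hypothetical, and this is plain regex text munging rather than HTML parsing: a `<` inside a script or an attribute value will also start a new line, and whitespace inside `<pre>` is lost.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlLineNormalizer
{
    // Re-flows raw HTML so that every tag starts on its own line, regardless of
    // how the original document was (or was not) broken into lines.
    public static string[] SplitIntoTagLines(string html)
    {
        // 1. Collapse every run of whitespace, including \r\n and \n, into one space.
        string collapsed = Regex.Replace(html, @"\s+", " ");

        // 2. Start a new line before each '<', dropping any space that preceded it.
        string reflowed = Regex.Replace(collapsed, @"\s*<", "\n<");

        // 3. Split on the newly inserted breaks, trim, and discard empty lines.
        var lines = new List<string>();
        foreach (string line in reflowed.Split('\n'))
        {
            string trimmed = line.Trim();
            if (trimmed.Length > 0) lines.Add(trimmed);
        }
        return lines.ToArray();
    }
}
```

Usage would be along the lines of `var lines = HtmlLineNormalizer.SplitIntoTagLines(File.ReadAllText(path));`, where `path` stands for the saved .txt file (or pass the string from `DownloadString` directly); the existing line-by-line loop can then run over `lines`.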



I'm going to rephrase the question.


The thing is, I want to extract information such as image links, image categories and subcategories from a wallpaper website. I need to find links inside specific tags that have specific classes. I've been using string matching, but is there a way to crawl from tag to tag, to child tags and parent tags, the way you use the DOM in JavaScript?
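What the DOM gives you in JavaScript is a tree built by an HTML parser; in C# that tree normally comes from a parsing library such as HtmlAgilityPack or AngleSharp. To show the idea without any NuGet dependency, here is a deliberately tiny sketch that builds a parent/child tree from the sequence of tags and then queries it. Everything in it (`Node`, `MiniHtmlTree`, `LinksInside`) is hypothetical, and it assumes reasonably well-formed markup: it ignores text, does not skip tags inside comments or `<script>` blocks, and does not apply a browser's error-recovery rules, so treat it as an illustration, not a replacement for a real parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// A tiny DOM-like element: tag name, attributes, parent and children (no text nodes).
public sealed class Node
{
    public string Name = "";
    public Dictionary<string, string> Attributes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    public Node Parent;
    public List<Node> Children = new List<Node>();

    public string Attr(string key) => Attributes.TryGetValue(key, out var v) ? v : "";

    public bool HasClass(string cls) =>
        Attr("class").Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                     .Contains(cls);

    // Depth-first walk over every element below this node.
    public IEnumerable<Node> Descendants()
    {
        foreach (var child in Children)
        {
            yield return child;
            foreach (var d in child.Descendants()) yield return d;
        }
    }
}

public static class MiniHtmlTree
{
    // Elements that never have a closing tag (assumption: this list is not exhaustive).
    static readonly HashSet<string> VoidTags = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr" };

    // Opening or closing tag; [^>]* spans newlines, so irregular line breaks do not matter.
    static readonly Regex TagRx = new Regex(@"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>");
    static readonly Regex AttrRx = new Regex(@"([\w-]+)\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))");

    public static Node Parse(string html)
    {
        var root = new Node { Name = "#root" };
        var current = root;
        foreach (Match m in TagRx.Matches(html))
        {
            string name = m.Groups[2].Value.ToLowerInvariant();
            if (m.Groups[1].Value == "/")
            {
                // Closing tag: climb back up to the matching open element; ignore strays.
                for (var n = current; n != root; n = n.Parent)
                    if (n.Name == name) { current = n.Parent; break; }
                continue;
            }
            var node = new Node { Name = name, Parent = current };
            foreach (Match a in AttrRx.Matches(m.Groups[3].Value))
            {
                string value = a.Groups[2].Success ? a.Groups[2].Value
                             : a.Groups[3].Success ? a.Groups[3].Value
                             : a.Groups[4].Value;
                node.Attributes[a.Groups[1].Value] = value;
            }
            current.Children.Add(node);
            bool selfClosing = m.Groups[3].Value.TrimEnd().EndsWith("/");
            if (!VoidTags.Contains(name) && !selfClosing) current = node; // descend
        }
        return root;
    }

    // Example query: the href of every <a> inside any <tagName> whose class list contains cls.
    public static List<string> LinksInside(Node root, string tagName, string cls) =>
        root.Descendants()
            .Where(n => n.Name == tagName.ToLowerInvariant() && n.HasClass(cls))
            .SelectMany(n => n.Descendants())
            .Where(n => n.Name == "a" && n.Attributes.ContainsKey("href"))
            .Select(n => n.Attr("href"))
            .Distinct()
            .ToList();
}
```

A call like `MiniHtmlTree.LinksInside(root, "div", "wallpaper")` would return the `href` of every link inside a `<div>` whose class list contains `wallpaper` (a made-up class name); from any node found along the way, `Parent` and `Children` let you walk up or down the tree much as you would with the DOM.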

Recommended Answer

Here is where the trouble lies: you formulated the difficulty of analyzing text files as a general problem, without considering any detail of the file contents. But this general problem cannot have a general solution, simply because the notion of a "text file" says nothing definite. A text file can be, well… anything. After all, HTML and XML files are text files too, yet you seemingly don't have problems with them.

—SA


You would be better off reading the page with Ajax using jQuery and then tracking what you want. I have never tried it, but I hope it will work faster and better.

