Need to extract text messages out of an HTML document


Problem description

Hello, I have a long HTML document. Here is just the part that interests me:

<iframe class="goog-te-menu-frame skiptranslate" src="javascript:void(0)" frameborder="0" style="display: none; visibility: visible;"></iframe>
<div class="chatbox3">
  <div class="chatbox2">
    <div class="chatbox">
      <div class="logwrapper" style="top: 89px; margin-right: 168px;">
        <div class="logbox">
          <div style="position: relative; min-height: 100%;">
            <div class="logitem"><p class="statuslog">You're now chatting with a random stranger. Say hi!</p></div>
            <div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>hii there</span></p></div>
            <div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>nice to meet you</span></p></div>
            <div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>this is a text</span></p></div>
            <div class="logitem"><p class="youmsg"><strong class="msgsource">You:</strong> <span>this text should not be taken</span></p></div>
            <div class="logitem"><p class="statuslog">Stranger has disconnected.</p></div>
            <div class="logitem"><div class="statuslog">

It renders as follows:

You're now chatting with a random stranger. Say hi!

Stranger: hii there

Stranger: nice to meet you

Stranger: this is a text

You: this text should not be taken

Stranger has disconnected.

I want to extract all messages sent by the Stranger into strings (Visual Basic) and ignore both my own messages and system messages such as "You're now chatting with a random stranger. Say hi!" and "Stranger has disconnected." I have no idea how to approach this and need help. Thank you.
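A minimal sketch of one way to do this directly on the markup, assuming the chat log always has the shape shown above: each stranger message is a `<p class="strangermsg">` whose text sits in a `<span>`, while your own messages use `youmsg` and system messages use `statuslog`. It uses a .NET regular expression rather than a real HTML parser, which is brittle and only suits fixed markup like this; the module and function names (`StrangerLog`, `ExtractStrangerMessages`) are hypothetical. The pattern also accepts unquoted attributes and uppercase tags, because Internet Explorer-based WebBrowser controls may serialize markup that way.

```vbnet
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

' Hypothetical module and function names, for illustration only.
Public Module StrangerLog
    ' Matches <p class="strangermsg"> and captures the text of the first <span>
    ' inside that paragraph. Quotes around the class value are optional and the
    ' match is case-insensitive, so <P class=strangermsg> is accepted too.
    ' (?:(?!</p>).)*? refuses to run past the end of the paragraph, so a message
    ' without a <span> cannot borrow the text of the next log item.
    Private ReadOnly StrangerPattern As New Regex(
        "<p\s+class=[""']?strangermsg[""']?\s*>(?:(?!</p>).)*?<span[^>]*>(.*?)</span>",
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Returns the stranger's messages in document order; "youmsg" and
    ' "statuslog" items never match the pattern and are skipped.
    Public Function ExtractStrangerMessages(html As String) As List(Of String)
        Dim messages As New List(Of String)()
        For Each m As Match In StrangerPattern.Matches(html)
            ' Decode entities such as &amp; so each string is plain text
            messages.Add(WebUtility.HtmlDecode(m.Groups(1).Value).Trim())
        Next
        Return messages
    End Function
End Module
```

You could pass it the string returned by a WebBrowser control's `Document.Body.InnerHtml` and loop over the resulting list. If the site renames its CSS classes, the pattern has to change with them.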

Solution

If anyone else is interested in doing this, I've managed to simplify the process: I copy the chat's HTML into a second WebBrowser control through its Document.Body.InnerHtml property, then read the rendered text with Document.Body.OuterText into a RichTextBox, so I can work with plain text instead of HTML code.

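' Copy the chat page's body markup out of the Omegle WebBrowser control
OmegleHTML.Text = Omegle.Document.Body.InnerHtml
' Load that markup into a second WebBrowser (it must already hold a document, e.g. about:blank)
WebBrowser1.Document.Body.InnerHtml = OmegleHTML.Text
' OuterText returns the rendered text with the tags removed
Log.Text = WebBrowser1.Document.Body.OuterText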

I've also used the following code to get rid of any irrelevant text before the chat log:

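' Remove everything before the greeting that starts the chat log
Dim SInd As Integer = 0
Dim Eind As Integer = Log.Text.IndexOf("You're now chatting with a random stranger. Say hi!", StringComparison.Ordinal)
' IndexOf returns -1 when the greeting is missing, and Remove(0, -1) would throw
If Eind > 0 Then Log.Text = Log.Text.Remove(SInd, Eind)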
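If you stay with the rendered text, a small follow-up step could keep only the stranger's lines. This sketch assumes that each message ends up on its own line as `Stranger: <message>`, as in the output shown in the question; a message that itself contains a line break would be cut at the first break. `RenderedLogFilter` and `GetStrangerLines` are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

' Hypothetical module and function names, for illustration only.
Public Module RenderedLogFilter
    ' Returns the stranger's messages from rendered chat text such as Log.Text,
    ' assuming each message sits on its own line as "Stranger: <message>".
    Public Function GetStrangerLines(logText As String) As List(Of String)
        Const Prefix As String = "Stranger:"
        Dim messages As New List(Of String)()
        ' Split on CRLF, LF or CR line breaks and drop empty lines
        Dim lines As String() = logText.Split(New String() {vbCrLf, vbLf, vbCr},
                                              StringSplitOptions.RemoveEmptyEntries)
        For Each line As String In lines
            Dim trimmed As String = line.Trim()
            ' "You:" lines, the greeting and "Stranger has disconnected." (no colon
            ' right after "Stranger") all fail this test and are skipped
            If trimmed.StartsWith(Prefix, StringComparison.Ordinal) Then
                messages.Add(trimmed.Substring(Prefix.Length).Trim())
            End If
        Next
        Return messages
    End Function
End Module
```

For the sample log in the question, calling `GetStrangerLines(Log.Text)` after the trimming step should return `hii there`, `nice to meet you` and `this is a text`, provided the rendered text really puts each message on its own line.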

This is the closest I've got. If you have a better answer, please post it.
