Need to extract text messages out of an HTML document
Problem description
Hello, I have a long HTML document, this is only the part that interests me:
<iframe class="goog-te-menu-frame skiptranslate" src="javascript:void(0)" frameborder="0" style="display: none; visibility: visible;"></iframe><div class="chatbox3"><div class="chatbox2"><div class="chatbox"><div class="logwrapper" style="top: 89px; margin-right: 168px;"><div class="logbox"><div style="position: relative; min-height: 100%;"><div class="logitem"><p class="statuslog">You're now chatting with a random stranger. Say hi!</p></div><div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>hii there</span></p></div><div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>nice to meet you</span></p></div><div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>this is a text</span></p></div><div class="logitem"><p class="youmsg"><strong class="msgsource">You:</strong> <span>this text should not be taken</span></p></div><div class="logitem"><p class="statuslog">Stranger has disconnected.</p></div><div class="logitem"><div class="statuslog">
It outputs as follows:
You're now chatting with a random stranger. Say hi!
Stranger: hii there
Stranger: nice to meet you
Stranger: this is a text
You: this text should not be taken
Stranger has disconnected.
I want to extract all messages sent by Stranger into strings (Visual Basic), and ignore messages sent by me as well as system messages such as "You're now chatting with a random stranger. Say hi!" and "Stranger has disconnected."
I have no idea how I should approach this and need help, thank you.
If anyone else is interested in such an operation, I've managed to simplify the process by loading the HTML code (Document.Body.InnerHtml) into a second WebBrowser control and then reading its Document.Body.OuterText property to get the text output in a RichTextBox, so I can easily deal with the text instead of dealing with the HTML code.
OmegleHTML.Text = Omegle.Document.Body.InnerHtml
WebBrowser1.Document.Body.InnerHtml = OmegleHTML.Text
Log.Text = WebBrowser1.Document.Body.OuterText
I've also used the following code to get rid of any irrelevant text before the chat log:
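' Remove everything that precedes the first status line of the chat log.
Dim SInd, Eind As Integer
SInd = 0
Eind = Log.Text.IndexOf("You're now chatting with a random stranger. Say hi!")
' IndexOf returns -1 when the marker is missing; Remove would then throw.
If Eind > 0 Then
    Log.Text = Log.Text.Remove(SInd, Eind)
End If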
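The trimming step only drops what comes before the log; the "You:" lines and the status messages are still in Log.Text. One way to finish the job is sketched here, assuming OuterText puts each log item on its own line in the form "Stranger: message". The module name, the function name ExtractStrangerMessages, and the sample string standing in for Log.Text are all hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module StrangerLogFilter

    ' Hypothetical helper: keeps only the lines that start with "Stranger:"
    ' and returns the message text without that prefix.
    Function ExtractStrangerMessages(logText As String) As List(Of String)
        Const Prefix As String = "Stranger:"
        Dim messages As New List(Of String)

        ' Split on any newline style and skip blank lines.
        Dim separators As String() = {vbCrLf, vbLf, vbCr}
        For Each rawLine As String In logText.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            Dim line As String = rawLine.Trim()
            ' "Stranger has disconnected." lacks the colon, so it is skipped.
            If line.StartsWith(Prefix, StringComparison.Ordinal) Then
                messages.Add(line.Substring(Prefix.Length).Trim())
            End If
        Next

        Return messages
    End Function

    Sub Main()
        ' Sample text standing in for Log.Text (assumed layout).
        Dim sample As String = String.Join(vbCrLf, New String() {
            "You're now chatting with a random stranger. Say hi!",
            "Stranger: hii there",
            "Stranger: nice to meet you",
            "Stranger: this is a text",
            "You: this text should not be taken",
            "Stranger has disconnected."})

        For Each message As String In ExtractStrangerMessages(sample)
            Console.WriteLine(message)
        Next
    End Sub

End Module
```

In the form itself the call would be ExtractStrangerMessages(Log.Text). Under the one-line-per-item assumption, a message that contains its own line breaks would lose everything after the first line.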
This is the closest I've got. If you have a better answer, please post it.
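One alternative that skips the text round-trip, sketched under the assumption that the chat page is loaded in the WinForms WebBrowser control named Omegle from the code above: walk the p elements of the document and keep those whose class is strangermsg. The module and function names are hypothetical, and the sketch relies on the markup shown in the question (message text inside the first span of each paragraph).

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

Module StrangerDomReader

    ' Hypothetical helper: collects the <span> text of every <p class="strangermsg">.
    Function GetStrangerMessages(doc As HtmlDocument) As List(Of String)
        Dim messages As New List(Of String)
        If doc Is Nothing Then Return messages

        For Each paragraph As HtmlElement In doc.GetElementsByTagName("p")
            ' The WebBrowser control exposes the class attribute as "className".
            If paragraph.GetAttribute("className") <> "strangermsg" Then Continue For

            ' In the markup shown, the message text sits in the first <span>.
            Dim spans As HtmlElementCollection = paragraph.GetElementsByTagName("span")
            If spans.Count > 0 AndAlso spans(0).InnerText IsNot Nothing Then
                messages.Add(spans(0).InnerText.Trim())
            End If
        Next

        Return messages
    End Function

End Module
```

A call such as GetStrangerMessages(Omegle.Document) would have to run after the page has loaded and the messages have arrived, since the log is filled in dynamically. Paragraphs with class youmsg or statuslog are never matched, so no string trimming is needed.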