How to make a simple searching program?

This article covers how to make a simple searching program; the answers below may be a useful reference for anyone facing the same problem.

Problem Description

Hi.

I need a program/algorithm that will search for a specified phrase and save the URLs of the sites containing it in a text file.



I know C# and a little bit of Java (still learning it).

Maybe you can tell me what I need to do.



What I have tried:



I tried some pre-made programs, but they didn't work. I also tried making something similar to searching within a document, but across the web, and that failed too. I have a few ideas but don't know how to write them as code.

Solution

The easiest way (and for a beginner pretty much the only one) is to use a WebBrowser control[^] and get it to ask Google, then scrape the results for the URLs and store them.

But even that is going to be fairly messy. Some of these may help: site scraping c#[^]
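As a rough illustration of that approach (not code from the thread), here is a minimal WinForms sketch that points a WebBrowser control at a Google query and appends any absolute URLs it finds in the rendered page to a text file. The query string, regex, and file name are placeholders chosen for illustration; Google's result markup changes often, so real scraping needs a proper HTML parser.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Windows.Forms;

// Hypothetical sketch: ask Google for a phrase through a WebBrowser control,
// then pull candidate result URLs out of the rendered page.
class GoogleScrapeForm : Form
{
    private readonly WebBrowser browser = new WebBrowser();

    public GoogleScrapeForm(string phrase)
    {
        browser.Dock = DockStyle.Fill;
        browser.ScriptErrorsSuppressed = true;
        browser.DocumentCompleted += OnDocumentCompleted;
        Controls.Add(browser);
        browser.Navigate("https://www.google.com/search?q=" + Uri.EscapeDataString(phrase));
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Very rough extraction of absolute URLs from the result page;
        // a real scraper would parse the HTML instead of using a regex.
        foreach (Match m in Regex.Matches(browser.DocumentText, @"https?://[^""'<>\s]+"))
            File.AppendAllText("results.txt", m.Value + Environment.NewLine);
    }

    [STAThread]
    static void Main() => Application.Run(new GoogleScrapeForm("simple search program"));
}
```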


You're asking how to write a search engine? That is a massive, complicated task spanning numerous complex technologies and theories, and it would amount to hundreds of thousands of lines of code. This is not a trivial task that only requires a few lines of code.



Learn how to write a web crawler, and learn how to use technologies like Lucene that you can use for the searching part.
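To give a flavour of the Lucene side, here is a small sketch assuming the Lucene.Net 4.8 NuGet packages (Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.QueryParser); the URL and page text are placeholders. It indexes one downloaded page and then runs a phrase query, printing the URLs of matching pages.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

class LuceneSketch
{
    const LuceneVersion Version = LuceneVersion.LUCENE_48;

    static void Main()
    {
        using var dir = FSDirectory.Open("search-index");
        var analyzer = new StandardAnalyzer(Version);

        // Index one downloaded page: store the URL, make the page text searchable.
        using (var writer = new IndexWriter(dir, new IndexWriterConfig(Version, analyzer)))
        {
            var doc = new Document
            {
                new StringField("url", "https://example.com/", Field.Store.YES),
                new TextField("content", "plain text extracted from the page", Field.Store.NO)
            };
            writer.AddDocument(doc);
        }

        // Search for a phrase and print the URLs of the pages that contain it.
        using var reader = DirectoryReader.Open(dir);
        var searcher = new IndexSearcher(reader);
        var query = new QueryParser(Version, "content", analyzer).Parse("\"plain text\"");
        foreach (var hit in searcher.Search(query, 10).ScoreDocs)
            System.Console.WriteLine(searcher.Doc(hit.Doc).Get("url"));
    }
}
```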


First of all, please see my comment on the question.



Now, here are the components you really need. First, you need to be able to download each page you want to search in. This can be done using the HttpWebRequest class:

HttpWebRequest Class (System.Net)[^]
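As a minimal sketch of that step (assuming a plain console project; the URL, phrase, and output file name are placeholders), this downloads one page with HttpWebRequest and appends the URL to a text file when the phrase occurs in the raw HTML:

```csharp
using System;
using System.IO;
using System.Net;

class PageDownloader
{
    // Downloads the HTML of a single page as a string.
    static string DownloadPage(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = "SimpleSearchBot/0.1"; // some servers reject requests with no user agent

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        string url = "https://example.com/";
        string phrase = "Example Domain";

        string html = DownloadPage(url);

        // Naive check against the raw HTML; markup can split a phrase,
        // so a real search should run on the parsed text (see the parsing step below).
        if (html.IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0)
            File.AppendAllText("hits.txt", url + Environment.NewLine);
    }
}
```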



For a starting point, you can look at the source code of the application I shared in full here: how to download a file from internet[^].



This application is very small (only one code file) and clear, so it isn't hard to see how it works. See also my other past answers:

FTP: Download Files[^],

how to download a file from the server in Asp.net 2.0[^],

get specific data from web page[^],

Performing some kind of Web Request and getting result[^],

How to get particular data from a url using c#[^].


Now, when you have the content from the Web, it is typically HTML data, which you need to parse, in order to perform your search and, importantly, to find the other URLs for your search. I would recommend HTML Agility Pack, an open-source product under the Microsoft Public License: Html Agility Pack - Home[^].
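Here is a small sketch of that parsing step, assuming the HtmlAgilityPack NuGet package; the inline HTML string stands in for a page downloaded as above. It searches the visible text for the phrase and collects the absolute outgoing links a crawler would follow next.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class HtmlSearcher
{
    static void Main()
    {
        // Placeholder HTML; in the real program this comes from HttpWebRequest.
        string html = "<html><body><p>hello world</p>" +
                      "<a href='https://example.com/next'>next</a></body></html>";
        string phrase = "hello world";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Search the visible text only, ignoring the tags themselves.
        string text = doc.DocumentNode.InnerText;
        bool found = text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0;
        Console.WriteLine(found ? "phrase found" : "phrase not found");

        // Collect absolute outgoing links so the crawler knows where to go next.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            var links = anchors
                .Select(a => a.GetAttributeValue("href", ""))
                .Where(h => h.StartsWith("http", StringComparison.OrdinalIgnoreCase))
                .Distinct();

            foreach (var link in links)
                Console.WriteLine(link);
        }
    }
}
```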



Anyway, review this list of parsers: Comparison of HTML parsers - Wikipedia, the free encyclopedia[^].



Generally, what you need is close to web scraping: Web scraping - Wikipedia, the free encyclopedia[^].



- SA

