How do I fill in a website form and retrieve the result in C#?

Problem description

I would like my program to be able to access a website that processes string input and returns some information about it. I want to input two sequences, submit them and read the result through the program. The website is the following:

http://scansite.mit.edu/motifscan_seq.phtml

If you enter, say, 5031601 as Protein Name and DRNAYVWTLKGRTWKPTLVILRI as Sequence, you will be redirected to the results page. This is the page I want to be able to read with my program. I have researched this a lot, but I can't seem to find a useful solution.

Can anyone please help me out?


EDIT:

I tried to create a web request with the following code (adapted from the link):

        WebRequest request = WebRequest.Create(
                                   "http://scansite.mit.edu/motifscan_seq");
        request.Method = "POST";
        string postData = @"motif_option=all&protein_id=5031601&
                           sequence=DRNAYVWTLKGRTWKPTLVILRI&
                           stringency=High&submit=Submit Request";
        byte[] byteArray = Encoding.UTF8.GetBytes(postData);
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = byteArray.Length;
        Stream dataStream = request.GetRequestStream();
        dataStream.Write(byteArray, 0, byteArray.Length);
        dataStream.Close();

        using (WebResponse response = request.GetResponse())
        using (Stream resSteam = response.GetResponseStream())
        using (StreamReader sr = new StreamReader(resSteam))
            File.WriteAllText("SearchResults.html", sr.ReadToEnd());
        System.Diagnostics.Process.Start("SearchResults.html");

When I open the SearchResults.html, it contains the original form site with the protein name entered. The sequence hasn't been entered (it is a textarea, not a textbox). And it hasn't been submitted. Is there anything I'm missing or doing wrong?


Update: I resolved the issue by sending the request to the URI stated in the action attribute of the form tag (http://scansite.mit.edu/cgi-bin/motifscan_seq).

Solution

Your question is a bit vague, but it sounds like you want to do screen scraping: you download the HTML of the page and parse it to grab the values that you want.

The site in question takes a POST request to the following URL:

http://scansite.mit.edu/cgi-bin/motifscan_seq

With the following parameters:

motif_option: all
protein_id:   5031601
sequence:     DRNAYVWTLKGRTWKPTLVILRI
stringency:   High
submit:       Submit Request

What you have to do is send a POST request to that URL and pass in the same key/value pairs, substituting your own values. Here's some documentation on how to do that in C# (look at the example halfway down the page):

http://msdn.microsoft.com/en-us/library/debx8sh9.aspx
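
As a rough sketch of such a request (this is not the code from the linked documentation): the version below uses HttpClient and FormUrlEncodedContent rather than the WebRequest class from the question. The names MotifScanClient, BuildForm and SubmitAsync are made up for illustration. The URL and field names are the ones listed above; whether the site still accepts them is an assumption.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, named for illustration only.
public static class MotifScanClient
{
    // URL from the form's action attribute, as quoted above.
    public const string ActionUrl = "http://scansite.mit.edu/cgi-bin/motifscan_seq";

    // Builds the form body. FormUrlEncodedContent URL-encodes each
    // key and value (for example, the space in "Submit Request").
    public static FormUrlEncodedContent BuildForm(string proteinId, string sequence)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("motif_option", "all"),
            new KeyValuePair<string, string>("protein_id", proteinId),
            new KeyValuePair<string, string>("sequence", sequence),
            new KeyValuePair<string, string>("stringency", "High"),
            new KeyValuePair<string, string>("submit", "Submit Request"),
        });
    }

    // Posts the form and returns the response body (the results HTML).
    public static async Task<string> SubmitAsync(
        HttpClient client, string proteinId, string sequence)
    {
        using (FormUrlEncodedContent form = BuildForm(proteinId, sequence))
        using (HttpResponseMessage response = await client.PostAsync(ActionUrl, form))
        {
            // Throws HttpRequestException on a non-success status code.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

You would call it with something like `await MotifScanClient.SubmitAsync(new HttpClient(), "5031601", "DRNAYVWTLKGRTWKPTLVILRI")` and write the returned string to a file if you want to inspect it in a browser. Letting the library build the body also avoids stray whitespace or line breaks ending up inside the field names.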

When you get the HTML back, you will need to parse it and find the relevant parts that you need. Unfortunately, there are no IDs or classes in the HTML and everything is made from tables, so this might be quite challenging. Here is another question that covers screen scraping in C#:

Screen Scraping HTML with C#
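
Since the page is built from tables with no IDs or classes, one crude starting point is to pull out the text of every table cell and then pick the rows you need by position or by header text. The sketch below does that with regular expressions from the standard library only. TableScraper and ExtractRows are hypothetical names, and it assumes simple, non-nested tables with closing tags; a dedicated HTML parser will be more robust on real-world markup.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class TableScraper
{
    // Returns the text of every <td>/<th> cell, grouped by <tr> row.
    // Assumes tables are not nested and that rows and cells are closed.
    public static List<List<string>> ExtractRows(string html)
    {
        const RegexOptions Options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
        var rows = new List<List<string>>();

        foreach (Match row in Regex.Matches(html, @"<tr\b[^>]*>(.*?)</tr>", Options))
        {
            var cells = new List<string>();
            foreach (Match cell in Regex.Matches(
                row.Groups[1].Value, @"<t[dh]\b[^>]*>(.*?)</t[dh]>", Options))
            {
                // Strip nested tags, decode entities such as &amp;, trim whitespace.
                string text = Regex.Replace(cell.Groups[1].Value, "<[^>]+>", string.Empty);
                cells.Add(WebUtility.HtmlDecode(text).Trim());
            }

            // Skip rows that contain no cells at all.
            if (cells.Count > 0)
            {
                rows.Add(cells);
            }
        }

        return rows;
    }
}
```

Each inner list is one table row, so once you know which row and column hold the value you care about, you can index into the result directly.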
