C#: getting the online URL of audio files

Question

How can I get the online URL of audio files in C#? For example, I want to crawl Google Music for the URLs of audio files in mp3 or wma format. How can I do this?
First, how do I search for songs from my program?
Second, once I have the search results page, how do I get the real address of a song?
Third, how do I correctly get the web page's original source code?
Thank you.

Recommended answer

For crawling, you need to fetch HTML files using the class System.Net.HttpWebRequest. Your compile-time type should be System.Net.WebRequest, though, because instances of the derived classes are created by the factory method Create, depending on the URI scheme. See the code sample here: http://msdn.microsoft.com/en-us/library/system.net.webrequest.aspx. You can fetch the resource using the HTTP request method "GET".
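As a minimal sketch of that step: the class and method names below (PageFetcher, BuildGet, Download) are made up for illustration, and decoding the response as UTF-8 is an assumption. A real page may declare a different encoding, which matters if you want the original source text exactly. In recent .NET versions WebRequest.Create is marked obsolete in favor of HttpClient, so treat this as one possible approach.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper; the names are illustrative only.
static class PageFetcher
{
    // Builds a GET request. The variable is typed as WebRequest because the
    // factory method Create chooses the derived class from the URI scheme
    // (an http:// or https:// URI yields an HttpWebRequest).
    public static WebRequest BuildGet(string url)
    {
        WebRequest request = WebRequest.Create(url);
        request.Method = "GET";
        return request;
    }

    // Sends the request and returns the response body as text.
    // Assumption: the page is encoded as UTF-8.
    public static string Download(string url)
    {
        WebRequest request = BuildGet(url);
        using (WebResponse response = request.GetResponse())
        using (Stream body = response.GetResponseStream())
        using (var reader = new StreamReader(body, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

It could be called as `string html = PageFetcher.Download("http://example.com/");`, where the URL is a placeholder.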

The problem is knowing the URL. In many cases, servers generate HTML pages on the fly, for example in response to a search request. Because such requests can be posted in many different ways, this is application-specific. In such cases, you will need to learn how to simulate the "POST" request separately for every Web application.
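A simulated form "POST" might look like the sketch below. Everything site-specific here is an assumption: the form field names, the target URL and the UTF-8 response encoding all have to be discovered for each Web application, for example by inspecting its search form. SearchPoster, EncodeForm and Post are hypothetical names.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper; field names and URLs depend on the target site.
static class SearchPoster
{
    // Percent-encodes name/value pairs and joins them as name=value&name=value.
    public static string EncodeForm(params string[] namesAndValues)
    {
        var sb = new StringBuilder();
        for (int i = 0; i + 1 < namesAndValues.Length; i += 2)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(namesAndValues[i]));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(namesAndValues[i + 1]));
        }
        return sb.ToString();
    }

    // Posts the encoded form body and returns the response page as text.
    // Assumption: the response is encoded as UTF-8.
    public static string Post(string url, string formBody)
    {
        byte[] payload = Encoding.UTF8.GetBytes(formBody);
        WebRequest request = WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = payload.Length;
        using (Stream output = request.GetRequestStream())
        {
            output.Write(payload, 0, payload.Length);
        }
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For instance, `SearchPoster.Post("http://example.com/search", SearchPoster.EncodeForm("q", "my song"))` would submit a single field named `q`; both the URL and the field name are placeholders.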

Now, you need to parse the HTML file and find your audio file references. If you are interested only in audio files and not in any specific structure, it can be as simple as using a regular expression. Your search criteria can include "http://" at the beginning and the file extensions you're interested in at the end. See the class System.Text.RegularExpressions.Regex, http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx. Your Regex patterns should be easy enough to design. If you face a problem here, ask a follow-up question.
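One possible pattern along those lines is sketched here. The extension list (mp3, wma), the decision to accept "https://" as well as "http://", and the name AudioLinkFinder are assumptions for illustration; adjust them to the pages you actually crawl.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern is one option, not the only one.
static class AudioLinkFinder
{
    // Matches an absolute http(s) URL ending in .mp3 or .wma. The lookahead
    // requires the extension to be followed by a quote, whitespace, angle
    // bracket, '?', '#', or the end of the input, so a host name such as
    // "a.mp3site.com" does not end the match early.
    static readonly Regex AudioUrl = new Regex(
        @"https?://[^\s""'<>]+?\.(?:mp3|wma)(?=[""'\s<>?#]|$)",
        RegexOptions.IgnoreCase);

    // Returns the distinct audio URLs found in the HTML, in document order.
    public static List<string> Find(string html)
    {
        var urls = new List<string>();
        foreach (Match m in AudioUrl.Matches(html))
        {
            if (!urls.Contains(m.Value)) urls.Add(m.Value);
        }
        return urls;
    }
}
```

Passing the downloaded page text to `AudioLinkFinder.Find` yields the candidate links. Relative links are not matched by this pattern because, as described above, it anchors on the scheme prefix.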

These techniques will solve your problem only when the HTML statically presents the URLs of the audio files: you can scrape only the URLs that are statically known in the HTML document.

There are more cryptic scenarios in which the user does not click an anchor link holding the static URL of the resource you need. Instead, some JavaScript is called; it forms an HTTP request to the server, sent using Ajax, to get the file downloaded. In some cases, the user must answer a question to confirm that the request is not made by a robot. Such scenarios are crackable in principle. At the same time, it is theoretically impossible to devise a universal algorithm that could analyze the JavaScript involved and mimic the required user actions in your crawler. In some individual cases you will be able to crack such a scenario by working it out yourself. The result cannot be guaranteed in all cases.

—SA
