Reading HTML contents of a URL in OCaml


Question

I would like to write an OCaml function which takes a URL and returns a string made up of the contents of the HTML file at that location. Any ideas?

Thanks a lot!

Best, Surikator.

Solution

I've done both of these things using ocurl and nethtml.

Use ocurl to read the contents of the URL (it has tons of options; this is the minimum):

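exception IO_ERROR of string

(* Fetch the body of [uri] and return it as a string. *)
let string_of_uri uri = 
    let connection = Curl.init () and write_buff = Buffer.create 1763 in
    let result =
        try
            (* Append each chunk and report how many bytes were consumed. *)
            Curl.set_writefunction connection
                    (fun x -> Buffer.add_string write_buff x; String.length x);
            Curl.set_url connection uri;
            Curl.perform connection;
            Some (Buffer.contents write_buff)
        with _ -> None in
    (* Release this handle; Curl.global_cleanup is meant for program shutdown. *)
    Curl.cleanup connection;
    match result with Some body -> body | None -> raise (IO_ERROR uri)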
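
The write callback passed to Curl.set_writefunction is called once per chunk of received data and should return the number of bytes it handled; libcurl treats any other count as a write error and aborts the transfer. The sketch below isolates that accumulate-and-return-length pattern using only the standard library. The names `make_collector` and `simulated_body`, and the chunk strings, are made up for illustration; no network access is involved.

```ocaml
(* Build a callback of the shape Curl.set_writefunction expects
   (string -> int), plus a function that returns what was collected. *)
let make_collector () =
  let buf = Buffer.create 4096 in
  let on_chunk chunk =
    Buffer.add_string buf chunk;
    (* Report every byte as consumed so the transfer continues. *)
    String.length chunk
  in
  (on_chunk, fun () -> Buffer.contents buf)

(* Simulated transfer: hypothetical chunks stand in for network data. *)
let simulated_body =
  let on_chunk, contents = make_collector () in
  List.iter
    (fun chunk -> ignore (on_chunk chunk))
    ["<html>"; "<body>hi</body>"; "</html>"];
  contents ()
```

With ocurl, `on_chunk` would be handed to Curl.set_writefunction and `contents ()` called after Curl.perform returns.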

and nethtml to parse the result (you might need to set up a DTD for Nethtml.parse):

let parse_html_string uri = 
    (* Wrap the downloaded string in an input channel that Nethtml can read. *)
    let ch = new Netchannels.input_string (string_of_uri uri) in
    let docs = Nethtml.parse ~return_pis:false ch in
    ch#close_in ();
    docs
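
Nethtml.parse returns a list of document trees, so a next step is usually to walk that tree. Here is a rough sketch of pulling out the visible text and the link targets. To keep it runnable without ocamlnet, it declares a local `document` type that mirrors Nethtml.document; with ocamlnet installed you would drop the local type and `open Nethtml` instead. The function names and the sample tree are made up for illustration, and the sketch assumes element names arrive lowercased.

```ocaml
(* Local mirror of Nethtml.document so the sketch compiles on its own. *)
type document =
  | Element of (string * (string * string) list * document list)
  | Data of string

(* Concatenate all text nodes, skipping <script> and <style> contents. *)
let rec text_of_docs docs =
  String.concat "" (List.map text_of_doc docs)
and text_of_doc = function
  | Data s -> s
  | Element (("script" | "style"), _, _) -> ""
  | Element (_, _, children) -> text_of_docs children

(* Collect the href attribute of every <a> element, in document order. *)
let rec links_of_docs docs =
  List.concat (List.map links_of_doc docs)
and links_of_doc = function
  | Data _ -> []
  | Element (name, attrs, children) ->
    let here =
      if name = "a" && List.mem_assoc "href" attrs
      then [List.assoc "href" attrs]
      else []
    in
    here @ links_of_docs children

(* Hypothetical tree, shaped like what a parser might return. *)
let sample =
  [ Element ("html", [], [
      Element ("body", [], [
        Data "Hello, ";
        Element ("a", [("href", "https://example.com/")], [Data "world"]);
        Element ("script", [], [Data "ignored()"]) ]) ]) ]
```

Applied to `sample`, `text_of_docs` gives the text of the body without the script contents, and `links_of_docs` gives the single link target.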

Cheers!
