Loop through list of clickable elements and write out the HTML to respective files


Problem description

I'm using jQuery to get a list of elements that contain certain keywords. I can get the list of elements, but I don't know how to loop through each element, click on its child element, and download the newly loaded page. Here's the CasperJS code I have so far:

var casper = require('casper').create({
    clientScripts: ["/var/www/html/project/public/js/jquery-3.3.1.min.js"]
});

var fs = require('fs');

casper.start('https://m.1xbet.co.ke/en/line/Football/', function () {
    var links = casper.evaluate(function () {
        $.expr[":"].contains = $.expr.createPseudo(function (arg) {
            return function (elem) {
                return $(elem).text().toUpperCase().indexOf(arg.toUpperCase()) >= 0;
            };
        });
        return $("#events-betting").find("li.events__item_head:contains(World cup)");
    });

    var date = new Date(), year = date.getFullYear(), month = date.getMonth() + 1, day = date.getDate();
    var folderName = year + '-' + month + '-' + day;

    // loop would go here to save each file
    var path = "destination/" + folderName + "/1xbet/worldcup-1";
    fs.write(path + ".html", this.getHTML(), "w");

});

casper.run();

I'd like to click on the individual items in the links object. They aren't anchor tags; they are clickable divs with inline JavaScript listening for a click.

The goal is to click on the div that has certain text I'm interested in. Once it is clicked, I can either scrape the HTML and save it in a file or get the current URL; either will be fine for my purposes. Since there could be multiple divs with the desired text, I'd like a way to loop through each one and perform the same operation.

This is an example of the page I'm interested in:

https://m.1xbet.co.ke/en/line/Football/

The parent element in this case is #events-betting; nested inside it is a list of li tags with clickable divs.

Recommended answer

"I can either choose to scrape the HTML and save it in a file or get the current url"

Of course, the solution is very specific to this exact site, but that is quite normal when doing web scraping.

// Same setup as in the question: inject jQuery into every loaded page.
var casper = require('casper').create({
  clientScripts: ["/var/www/html/project/public/js/jquery-3.3.1.min.js"]
});

casper.start('https://m.1xbet.co.ke/en/line/Football/', function () {

  // Runs in the page context; only the returned array of strings comes back.
  var links = casper.evaluate(function () {

    // Case-insensitive replacement for jQuery's :contains selector.
    $.expr[":"].contains = $.expr.createPseudo(function (arg) {
      return function (elem) {
        return $(elem).text().toUpperCase().indexOf(arg.toUpperCase()) >= 0;
      };
    });

    var links = [];
    // Better to scrape .events__title, as it carries the data-href attribute
    $("#events-betting").find(".events__title:contains(World cup)").each(function (i, item) {
      var lastPartOfurl = item.getAttribute("data-href");
      lastPartOfurl = lastPartOfurl.split("/");
      links.push("https://m.1xbet.co.ke/en/line/Football/" + item.getAttribute("data-champ") + "-" + lastPartOfurl[1] + '/');
    });

    return links;
  });

  console.log(links);
});

casper.run();

Result:

https://m.1xbet.co.ke/en/line/Football/1536237-FIFA-World-Cup-2018/,https://m.1xbet.co.ke/en/line/Football/1204917-FIFA-World-Cup-2018-Winner/,https://m.1xbet.co.ke/en/line/Football/1518431-FIFA-World-Cup-2018-Special-bets/,https://m.1xbet.co.ke/en/line/Football/1706515-FIFA-World-Cup-2018-Teams-Statistics-Group-Stage/
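
The answer stops at printing the URLs. To get back to the original goal of writing each page to its own file, one option is to queue a thenOpen step per collected URL and write the page HTML inside that step. The following is a minimal sketch, not the answerer's code: the helper names saveEachPage and folderNameFor are hypothetical, the worldcup-N.html naming is borrowed from the path in the question, and casper and fs are passed in as parameters so the helpers do not depend on globals.

```javascript
// Hypothetical helper: builds the "YYYY-M-D" folder name used in the question.
function folderNameFor(date) {
    return date.getFullYear() + '-' + (date.getMonth() + 1) + '-' + date.getDate();
}

// Hypothetical helper: queues one navigation step per link and writes each
// loaded page to baseDir/worldcup-<n>.html.
// `casper` is a CasperJS instance, `fs` is the PhantomJS fs module.
function saveEachPage(casper, fs, links, baseDir) {
    links.forEach(function (link, index) {
        // thenOpen only schedules the step; the callback runs after the
        // page at `link` has loaded, with `this` bound to the casper instance.
        casper.thenOpen(link, function () {
            var path = baseDir + '/worldcup-' + (index + 1) + '.html';
            fs.write(path, this.getHTML(), 'w');
        });
    });
}
```

Under these assumptions, it would be called at the end of the start callback, in place of console.log(links), with fs obtained from require('fs') as in the question: saveEachPage(this, fs, links, "destination/" + folderNameFor(new Date()) + "/1xbet"). If only the URLs are needed, the links array already holds them and no further navigation is required.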
