String comparison not working for text received from web scraping


Question

I am comparing text obtained from web scraping with hardcoded text in my code. The two texts look identical, with no difference in capitalization, yet the comparison still fails. I am sharing part of my code. The problem is between lines 47 and 56 of my file, where the string comparisons in the if/else blocks fail (they are marked with comments in the code below). The values reaching these blocks should satisfy the conditions. For some reason only the if condition at line 49 is satisfied, and the other if conditions are not. The same code converted to Java works fine and satisfies every if condition. Please have a look and help. Thanks.

I also tried a switch statement instead of the if/else chain, but it failed in the same way.

import 'package:http/http.dart';
import 'package:html/parser.dart';
import 'package:html/dom.dart';
import 'dart:convert';

class Worker {
  static final String OperatingCashFlowINRMil = 'Operating Cash Flow INR Mil';
  static final String CapSpendingINRMil = 'Cap Spending INR Mil';
  static final String FreeCashFlowINRMil = 'Free Cash Flow INR Mil';
  static final String DividendsINR = 'Dividends INR';
  static final String DividendPayoutRatio = 'Payout Ratio % *';
  static Map<String, String> _RequestHeaders = Map<String, String>();

  static void fetchData() async {
    String MSUrlToGetFinancialData =
        "https://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=jsonp1553353302056&t=0P0000AX98&region=ind&culture=en-US&version=SAL&cur=&order=desc&_=1553353302079";
    Client client = Client();

    Response response2 = await client.get(MSUrlToGetFinancialData,
        headers: getRequestHeaders());

    var FinDataResponse = response2.body;

    FinDataResponse = FinDataResponse.replaceAll("jsonp1553353302056(", "");
    FinDataResponse =
        FinDataResponse.substring(0, FinDataResponse.length - 1);

    JsonDecoder jsonDecoder = JsonDecoder();
    var FinDataJson = jsonDecoder.convert(FinDataResponse);
    String FinDataString = FinDataJson["componentData"];
    Element FinDataDoc = parse(FinDataString).body;
    Element DataTable = FinDataDoc.querySelector("table");
    List<Element> lstYears = DataTable.querySelector("thead")
        .querySelector("tr")
        .querySelectorAll("th");
    List<Element> lstRows =
        DataTable.querySelector("tbody").querySelectorAll("tr");

    Map<String, Element> mapItemNameToElement = Map<String, Element>();

    ///////////////////////////////////////////////////////////////////////////0
    for (Element e in lstRows) {
      String ItemHeading = e.children[0].text.trim().toString();
      print(ItemHeading); // The identical values which can satisfy the following conditions can be seen printed here.

      if (ItemHeading == DividendsINR) { // This condition does not get satisfied even when the ItemHeading value is identical.
        mapItemNameToElement.putIfAbsent(DividendsINR, () => e);
      } else if (ItemHeading == DividendPayoutRatio) { // This condition gets satisfied.
        mapItemNameToElement.putIfAbsent(DividendPayoutRatio, () => e);
      } else if (ItemHeading == OperatingCashFlowINRMil) { // This condition does not get satisfied even when the ItemHeading value is identical.
        mapItemNameToElement.putIfAbsent(OperatingCashFlowINRMil, () => e);
      } else if (ItemHeading == CapSpendingINRMil) { // This condition does not get satisfied even when the ItemHeading value is identical.
        mapItemNameToElement.putIfAbsent(CapSpendingINRMil, () => e);
      } else if (ItemHeading == FreeCashFlowINRMil) { // This condition does not get satisfied even when the ItemHeading value is identical.
        mapItemNameToElement.putIfAbsent(FreeCashFlowINRMil, () => e);
      }
    }
  }

  static Map<String, String> getRequestHeaders() {
    if (_RequestHeaders.length == 0) {
      _RequestHeaders.putIfAbsent("Accept-Encoding", () => "gzip, deflate, br");
      _RequestHeaders.putIfAbsent("referer", () => "https://www.morningstar.com/");
      _RequestHeaders.putIfAbsent("user-agent", () => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36");
      _RequestHeaders.putIfAbsent("authority", () => "www.morningstar.com");
    }
    return _RequestHeaders;
  }
}

My pubspec.yaml:

name: dev1_stock_meter
description: A new Flutter application.

version: 1.0.0+1

environment:
  sdk: ">=2.1.0 <3.0.0"

dependencies:
  flutter:
    sdk: flutter
  firebase_core: ^0.2.5+1
  firebase_auth: ^0.7.0
  cloud_firestore:
  fluttertoast: ^3.0.4
  autocomplete_textfield: ^1.6.4
  html: ^0.13.3+3
  http: ^0.12.0
  date_format: ^1.0.6
  intl:
  csv: ^4.0.3
  xml:
  cupertino_icons: ^0.1.2

dev_dependencies:
  flutter_test:
    sdk: flutter

flutter:
  uses-material-design: true

  assets:
    - images/logo.jpg

  fonts:
    - family: GoogleSans
      fonts:
        - asset: fonts/GoogleSans-Regular.ttf
          weight: 300
        - asset: fonts/GoogleSans-Bold.ttf
          weight: 400

Expected result: the if conditions should be satisfied, and the Element e should be put into mapItemNameToElement.

Recommended answer

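Your HTML string contains HTML entities, so the scraped heading is not the same string as your constant, even though the two look the same when printed:

Dividends INR does not equal Dividends&nbsp;<span>INR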
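
One way to see this kind of invisible difference is to print the code units of both strings instead of the strings themselves. The following is only a minimal sketch: codeUnitsHex is a hypothetical helper, and the scraped value is an assumed stand-in that contains a non-breaking space (U+00A0) where the constant has a regular space.

```dart
/// Hypothetical helper (not part of the original code): renders every
/// UTF-16 code unit of [s] as four hex digits, separated by spaces.
String codeUnitsHex(String s) =>
    s.codeUnits.map((u) => u.toRadixString(16).padLeft(4, '0')).join(' ');

void main() {
  const hardcoded = 'Dividends INR';
  // Assumed stand-in for the scraped heading: same visible text, but with
  // a non-breaking space (U+00A0) instead of a regular space (U+0020).
  final scraped = 'Dividends\u00A0INR';

  print(codeUnitsHex(hardcoded)); // the separator appears as 0020
  print(codeUnitsHex(scraped)); // the separator appears as 00a0
  print(hardcoded == scraped); // prints "false"
}
```

Running the same helper on ItemHeading inside the loop should show which character, if any, differs from the hardcoded constants.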

Use https://pub.dartlang.org/packages/html_unescape to decode the item heading (ItemHeading) before you do the comparison.
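
If the HTML parser has already decoded the entity, the heading would contain a non-breaking space character (U+00A0) rather than the literal text &nbsp;, and trim() only strips whitespace at the ends of a string. A plain-Dart sketch for that case follows. It is an alternative to the html_unescape package, not that package's API; normalizeHeading is a hypothetical helper, and the scraped value is an assumption about what the page returns.

```dart
/// Hypothetical helper (an assumption, not from the original answer):
/// replaces non-breaking spaces with regular spaces, collapses runs of
/// whitespace into a single space, and trims both ends.
String normalizeHeading(String raw) {
  return raw
      .replaceAll('\u00A0', ' ')
      .replaceAll(RegExp(r'\s+'), ' ')
      .trim();
}

void main() {
  const dividendsINR = 'Dividends INR';
  // Assumed scraped value containing a non-breaking space.
  final scraped = 'Dividends\u00A0INR';

  print(scraped == dividendsINR); // prints "false"
  print(normalizeHeading(scraped) == dividendsINR); // prints "true"
}
```

In the question's loop, this would mean comparing normalizeHeading(e.children[0].text) against the constants instead of the trimmed text alone.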
