Convert data to DataFrame in Python

Question

With the help of @JaSON, here is code that lets me read the data in a table from a local HTML file. The code uses Selenium.

from selenium import webdriver

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')
counter = len(driver.find_elements_by_id("Section3"))
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
print(counter)

for i in range(counter):
    print('\nRow #{} \n'.format(i + 1))
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    for cell in cells:
        value = cell.find_element_by_xpath(".//td").text
        print(value)

How can these rows be converted into a valid table that I can export to a CSV file? Here's the local HTML link: https://pastebin.com/raw/hEq8K75C

@Paul Brennan: After changing counter to counter - 1 to temporarily skip the error at row 18, I got 17 rows and the filename.txt file. Here's a snapshot of the output.

Solution

I have modified your code to produce a simple output. This is not very Pythonic, because it does not build the DataFrame in a vectorized way, but here is how it works. First, import pandas. Second, set up an empty DataFrame (we don't know the columns yet). Then, on the first pass, set up the columns (this will cause problems if the rows have different numbers of columns). Finally, write the values into the DataFrame.

import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')
counter = len(driver.find_elements_by_id("Section3"))
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
print(counter)

df = pd.DataFrame()  # empty frame; the columns are not known yet

for i in range(counter):
    print('\nRow #{} \n'.format(i + 1))
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    if i == 0:
        # size the columns from the first row, using positional labels 0, 1, 2, ...
        df = pd.DataFrame(columns=range(len(cells)))
    for j, cell in enumerate(cells):
        value = cell.find_element_by_xpath(".//td").text
        #print(value)
        if value:  # check the string is not empty
            df.at[i, j] = value  # put the value in row i, column j

df.to_csv('filename.txt') # output the dataframe to a file

A better approach would be to put the items of each row into a dictionary and then build the DataFrame from those dictionaries. I am writing this on my phone, though, so I cannot test that.
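
The dictionary idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not code from the thread: `scraped_rows`, `rows_to_frame`, and the `col_N` column names are hypothetical, and the placeholder lists stand in for the cell texts that the Selenium loop would collect. One possible benefit is that rows with fewer cells simply leave their missing columns empty.

```python
import io

import pandas as pd

# Hypothetical placeholder data standing in for the text scraped from each
# row; in the real script these lists would be filled by the Selenium loop.
scraped_rows = [
    ["A1", "B1", "C1"],
    ["A2", "B2"],        # a shorter row: the third cell is missing
    ["A3", "B3", "C3"],
]


def rows_to_frame(rows):
    """Build a DataFrame from rows of cell texts, using one dict per row."""
    records = []
    for cells in rows:
        # Key each value by its position and skip empty strings.
        record = {f"col_{j}": text for j, text in enumerate(cells) if text}
        records.append(record)
    # pandas aligns the dicts by key; keys missing from a row become NaN.
    return pd.DataFrame(records)


df = rows_to_frame(scraped_rows)

# Written to an in-memory buffer here; passing a file path to to_csv
# instead would write the same text to disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Building the frame once from a list of dictionaries avoids growing it cell by cell with `df.at`, and `index=False` keeps the row index out of the exported file.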
