Excel CSV output appearing wrong after calling Python file


Question

I'm currently struggling with the following .csv output: there are various random characters in the players' names and values where there shouldn't be any.

(I have included a picture of the output below.)

I'm wondering where I'm going wrong in the code, as I'm struggling to eliminate the random characters.

I'm trying to remove characters such as Â, Ã, ©, ‰ and so on.
Any suggestions?

Python code

#importing

import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {'User-Agent':
       'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

#calling websites
page = "https://www.transfermarkt.co.uk/transfers/transferrekorde/statistik/top/plus/0/galerie/0?saison_id=2000"
pageTree = requests.get(page, headers=headers)
pageSoup = BeautifulSoup(pageTree.content, 'html.parser')

#calling players names
Players = pageSoup.find_all("a", {"class": "spielprofil_tooltip"})
#Let's look at the first name in the Players list.
Players[0].text

#calling value of players
Values = pageSoup.find_all("td", {"class": "rechts hauptlink"})
#Let's look at the first value in the Values list.
Values[0].text

PlayersList = []
ValuesList = []

for i in range(0,25):
   PlayersList.append(Players[i].text)
   ValuesList.append(Values[i].text)

df = pd.DataFrame({"Players":PlayersList,"Values":ValuesList})

df.to_csv('2000.csv', index=False)

df.head()

My Excel output

Recommended answer

...
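# U+FEFF is the BOM character; encoded as UTF-8 it becomes the bytes EF BB BF
utf8_bom = '\ufeff'
# Open explicitly as UTF-8 so both the BOM and the data are written in that encoding
with open('2000.csv', 'w', encoding='utf-8', newline='') as csv_file:
    csv_file.write(utf8_bom)
    df.to_csv(csv_file, index=False)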

Explanation: The BOM is the byte order mark (q.v.). If Excel finds it at the beginning of the CSV file, it uses it to determine the encoding, which in your case is UTF-8 (the encoding pandas' to_csv uses by default). Without it, Excel typically falls back to a legacy code page and displays each multi-byte UTF-8 character as two or three wrong ones.
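To see where characters such as Â and Ã come from, here is a minimal sketch. It assumes Excel falls back to Windows-1252 (cp1252) when a CSV has no BOM, which is common on Western-locale Windows but not guaranteed. The sample strings and the helper name misread_as_cp1252 are made up for illustration.

```python
# Hypothetical sample values resembling the scraped names and fees.
samples = ["£25.00m", "Hernán Crespo", "É"]


def misread_as_cp1252(text):
    """Encode as UTF-8, then decode the same bytes as Windows-1252.

    This imitates the assumed Excel fallback for a BOM-less CSV file.
    """
    return text.encode("utf-8").decode("cp1252")


garbled = [misread_as_cp1252(s) for s in samples]
for original, wrong in zip(samples, garbled):
    print(f"{original!r} -> {wrong!r}")

# The UTF-8 signature (BOM) that tells Excel which decoding to use.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes)
```

Each non-ASCII character becomes two bytes in UTF-8, and each of those bytes is shown as its own character when read as cp1252, which is why the stray characters cluster around accents and currency signs.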

Edit

As Mark Tolonen pointed out, the compact version of the above is the following code:

df.to_csv('2000.csv', encoding='utf-8-sig', index=False)

The -sig in the name of the encoding stands for "signature", i.e., the BOM at the beginning, which Microsoft software uses to detect the encoding. See also the Encodings and Unicode section (https://docs.python.org/3.7/library/codecs.html#encodings-and-unicode) of the codecs manual (https://docs.python.org/3/library/codecs.html).
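As a quick check that the utf-8-sig codec really writes the signature, the sketch below uses only the standard library rather than pandas. The rows and the file name demo.csv are hypothetical stand-ins for the DataFrame and 2000.csv.

```python
import csv
import os
import tempfile

# Hypothetical rows standing in for the DataFrame contents.
rows = [["Players", "Values"], ["Hernán Crespo", "£1.00m"]]

path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# Writing with utf-8-sig prepends the BOM before the first row.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the raw bytes to confirm the file starts with EF BB BF.
with open(path, "rb") as f:
    first_bytes = f.read(3)

# Reading with utf-8-sig strips the BOM again, so the data round-trips.
with open(path, encoding="utf-8-sig", newline="") as f:
    read_back = list(csv.reader(f))

print(first_bytes)
print(read_back)
```

Reading the file back with plain utf-8 instead would leave the BOM character attached to the first header name, so using utf-8-sig on both sides keeps the column names clean.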
