pandas: write tab-separated dataframe with literal tabs with no quotes
Question
I have to reformat my data for a genetics program that requires each column to be split into two, e.g. 0 -> G G; 1 -> A G; 2 -> A A. The output file is supposed to be tab-delimited. I am trying to do it in pandas:
import csv
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,3, size = (10,5)),
                  columns=[ chr(c) for c in range(97, 97+5) ])

def fake_alleles(x):
    if x==0:
        return "A\tA"
    if x==1:
        return "A\tG"
    if x==2:
        return "G\tG"

plinkpast6 = df.applymap(fake_alleles)
plinkpast6.to_csv("test.ped", sep="\t", quoting=csv.QUOTE_NONE)
Which gives me the error "Error: need to escape, but no escapechar set". Are there other ways to do this with pandas?
Answer
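sep="\t" takes each element of a dataframe row and inserts a "\t" between them. The problem is that your elements already contain "\t", which makes the output ambiguous: with quoting=csv.QUOTE_NONE the writer is not allowed to quote those fields, so it wants you to escape the embedded "\t"s instead, and you haven't set an escapechar. I suspect you want each allele pair split into two separate columns, so that the final output has two columns per input column (6 columns for the 3-column slice used below).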
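The same behaviour can be reproduced with Python's standard csv module alone, which is where this error message comes from. The sketch below uses a made-up row that mirrors fake_alleles and a hypothetical helper write_row; it contrasts three cases: QUOTE_NONE without an escapechar raises, adding an escapechar writes the row but leaves a backslash in front of every embedded tab, and pre-splitting the pairs writes cleanly.

```python
import csv
import io

# Made-up row that mirrors fake_alleles: each field already contains a tab.
row = ["A\tA", "A\tG", "G\tG"]

def write_row(fields, **kwargs):
    """Hypothetical helper: write one tab-delimited QUOTE_NONE row, return the text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_NONE,
                        lineterminator="\n", **kwargs)
    writer.writerow(fields)
    return buf.getvalue()

# 1) Quoting is forbidden and no escapechar is set, so the embedded tabs
#    cannot be protected and the writer raises csv.Error.
try:
    write_row(row)
    error_message = None
except csv.Error as exc:
    error_message = str(exc)

# 2) With an escapechar the row is written, but every embedded tab is
#    prefixed with a backslash, which the downstream tool would also see.
escaped = write_row(row, escapechar="\\")

# 3) Splitting each pair first leaves no tab inside any field, so
#    QUOTE_NONE works without an escapechar.
split_fields = [allele for pair in row for allele in pair.split("\t")]
clean = write_row(split_fields)

print(error_message)
print(repr(escaped))  # 'A\\\tA\tA\\\tG\tG\\\tG\n'
print(repr(clean))    # 'A\tA\tA\tG\tG\tG\n'
```

Case 3 is the idea the answer applies with pandas: once no field contains the delimiter, QUOTE_NONE has nothing to escape.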
Try this:
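import csv
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,3, size = (10,20)))

def fake_alleles(x):
    if x==0:
        return "A\tA"
    if x==1:
        return "A\tG"
    if x==2:
        return "G\tG"

# Encode the first three columns as tab-joined allele pairs.
# (applymap is deprecated since pandas 2.1 in favour of DataFrame.map.)
plinkpast6 = df.iloc[:,:3].applymap(fake_alleles)
# Split every "X\tY" cell into two columns, so no field contains a tab.
plinkpast6 = plinkpast6.stack().str.split('\t', expand=True).unstack()
# unstack() puts all first alleles before all second alleles; swap the
# column levels and sort so each column's two alleles sit side by side.
plinkpast6 = plinkpast6.swaplevel(axis=1).sort_index(axis=1)
plinkpast6.to_csv("test.ped", sep="\t", quoting=csv.QUOTE_NONE)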
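If the stack/unstack round-trip feels indirect, another option is to build the two allele columns directly and never create tab-joined strings at all. This is a sketch, not part of the original answer: the ALLELES table, the split_genotypes helper and the _1/_2 column suffixes are hypothetical. It also assumes (as the variable name plinkpast6 hints) that the rows feed the genotype part of a PLINK .ped file, which has no header row, hence header=False and index=False.

```python
import csv

import pandas as pd

# Hypothetical lookup: genotype code -> (first allele, second allele),
# mirroring fake_alleles from the question.
ALLELES = {0: ("A", "A"), 1: ("A", "G"), 2: ("G", "G")}

def split_genotypes(df: pd.DataFrame) -> pd.DataFrame:
    """Replace every genotype column with two adjacent allele columns.

    Raises KeyError if a cell holds a code missing from ALLELES.
    """
    pieces = {}
    for col in df.columns:
        codes = df[col]
        pieces[f"{col}_1"] = codes.map(lambda code: ALLELES[code][0])
        pieces[f"{col}_2"] = codes.map(lambda code: ALLELES[code][1])
    return pd.DataFrame(pieces, index=df.index)

# Small fixed example instead of random data, so the output is predictable.
example = pd.DataFrame({"a": [0, 1, 2], "b": [2, 2, 0]})
out = split_genotypes(example)

# No field contains a tab any more, so QUOTE_NONE needs no escapechar.
# Pass a path such as "test.ped" as the first argument to write a file;
# without a path, to_csv returns the text instead.
text = out.to_csv(sep="\t", header=False, index=False, quoting=csv.QUOTE_NONE)
print(text)
```

Because the dict preserves insertion order, each input column's two alleles stay adjacent, which matches the "split each column into two" requirement from the question.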