Multiple inserts in numpy where paired elements don't have subtext


Problem description

This question follows up on a previous post answered by @ecortazar. This time, I'd like to insert a value between two adjacent elements of a pd.Series when neither of them contains a certain string, using Pandas / NumPy only. Note: all lines containing href in the text are different.

import pandas as pd
import numpy as np

table = pd.Series(

        ["<td class='test'>AA</td>",                  # 0 
        "<td class='test'>A</td>",                    # 1
        "<td class='test'><a class='test' href=...",  # 2
        "<td class='test'>B</td>",                    # 3
        "<td class='test'><a class='test' href=...",  # 4
        "<td class='test'>BB</td>",                   # 5
        "<td class='test'>C</td>",                    # 6
        "<td class='test'><a class='test' href=...",  # 7 
        "<td class='test'>F</td>",                    # 8
        "<td class='test'>G</td>",                    # 9 
        "<td class='test'><a class='test' href=...",  # 10 
        "<td class='test'>X</td>"])                   # 11


dups = ~table.str.contains('href') & table.shift(-1).str.contains('href') 
array = np.insert(table.values, dups[dups].index, "None")
pd.Series(array)


# OUTPUT:
# 0                      <td class='test'>AA</td>
# 1                                          None
# 2                       <td class='test'>A</td>
# 3     <td class='test'><a class='test' href=...
# 4                                          None   <-- incorrect
# 5                       <td class='test'>B</td>
# 6     <td class='test'><a class='test' href=...
# 7                      <td class='test'>BB</td>
# 8                                          None
# 9                       <td class='test'>C</td>
# 10    <td class='test'><a class='test' href=...
# 11                      <td class='test'>F</td>
# 12                                         None
# 13                      <td class='test'>G</td>
# 14    <td class='test'><a class='test' href=...
# 15                      <td class='test'>X</td>

Here is the output I'd actually like:

# OUTPUT:
# 0                      <td class='test'>AA</td>
# 1                                          None
# 2                       <td class='test'>A</td>
# 3     <td class='test'><a class='test' href=...
# 4                       <td class='test'>B</td>
# 5     <td class='test'><a class='test' href=...
# 6                      <td class='test'>BB</td>
# 7                                          None
# 8                       <td class='test'>C</td>
# 9     <td class='test'><a class='test' href=...
# 10                      <td class='test'>F</td>
# 11                                         None
# 12                      <td class='test'>G</td>
# 13    <td class='test'><a class='test' href=...
# 14                      <td class='test'>X</td>

Recommended answer

You can use the same procedure as before.

The only caveat is that you must apply the not (~) operator before the shift. The shift puts an np.nan in the first position of the Series, so the result is no longer a purely boolean Series, and applying ~ to it afterwards fails.
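To see the caveat in isolation, here is a small sketch on a made-up three-element boolean Series. The names flags, shifted, negated_then_shifted and safe are illustrative and not from the original post. The fill_value=False argument is an optional variation, assuming a pandas version whose Series.shift supports it, that keeps the shifted Series boolean.

```python
import pandas as pd

# Hypothetical boolean mask, standing in for table.str.contains('href')
flags = pd.Series([True, False, True])

# Shifting first introduces NaN at position 0, so the dtype is no longer bool
shifted = flags.shift(1)
print(shifted.dtype)

# Applying ~ after the shift is expected to fail because of the NaN
try:
    ~shifted
    invert_failed = False
except TypeError:
    invert_failed = True
print(invert_failed)

# Negating first works: ~ sees only booleans, and the shift comes afterwards
negated_then_shifted = (~flags).shift(1)

# Optional variation: fill the vacated slot with False to stay boolean
safe = (~flags).shift(1, fill_value=False)
print(safe.tolist())
```

With fill_value=False the first position is treated as "no qualifying predecessor", which is the behaviour needed when looking for pairs of adjacent elements.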

import pandas as pd
import numpy as np

table = pd.Series(
        ["<td class='test'>AA</td>",                  # 0 
        "<td class='test'>A</td>",                    # 1
        "<td class='test'><a class='test' href=...",  # 2
        "<td class='test'>B</td>",                    # 3
        "<td class='test'><a class='test' href=...",  # 4
        "<td class='test'>BB</td>",                   # 5
        "<td class='test'>C</td>",                    # 6
        "<td class='test'><a class='test' href=...",  # 7 
        "<td class='test'>F</td>",                    # 8
        "<td class='test'>G</td>",                    # 9 
        "<td class='test'><a class='test' href=...",  # 10 
        "<td class='test'>X</td>"])                   # 11


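# True where the element does NOT contain 'href' (negate before shifting)
not_contain = ~table.str.contains('href')
# True where both the element and the one before it lack 'href'
cond = not_contain & not_contain.shift(1)
# Insert "None" before each such position (indices refer to the original array)
array = np.insert(table.values, cond[cond].index, "None")
pd.Series(array)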
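The same idea can be wrapped in a small reusable helper. This is only a sketch under stated assumptions: the function name insert_between_plain_pairs, its parameters needle and filler, and the short demo Series are all hypothetical and not part of the original answer. It also uses shift(1, fill_value=False) and positional indices instead of index labels, which is a variation on the answer's code rather than the answer itself.

```python
import numpy as np
import pandas as pd


def insert_between_plain_pairs(series, needle="href", filler="None"):
    """Insert `filler` between adjacent elements that both lack `needle`."""
    # True where the element does not contain the substring (plain match, no regex)
    lacks = ~series.str.contains(needle, regex=False)
    # True where the previous element also lacks it; position 0 has no predecessor
    prev_lacks = lacks.shift(1, fill_value=False)
    # Positions (not index labels) of the second element of each qualifying pair
    positions = np.flatnonzero((lacks & prev_lacks).to_numpy())
    # np.insert places `filler` before each position of the original array
    return pd.Series(np.insert(series.to_numpy(dtype=object), positions, filler))


# Hypothetical shortened data, mimicking the structure of the question's table
demo = pd.Series(["AA", "A", "x href", "B", "y href", "BB", "C"])
result = insert_between_plain_pairs(demo)
print(result.tolist())
# ['AA', 'None', 'A', 'x href', 'B', 'y href', 'BB', 'None', 'C']
```

Using positions from np.flatnonzero rather than cond[cond].index means the helper should also behave sensibly when the Series does not have the default 0..n-1 index.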
