Pandas: VLOOKUP between two columns and find missing values
I have a DataFrame like this:
+-------+--------+
| A | B |
+-------+--------+
| David | Frank |
| Tim | David |
| Joe | Sam |
| Frank | Bob |
| Cathy | Tarun |
| | Rachel |
| | Tim |
+-------+--------+
Now I want to look up each column in the other one (like an Excel VLOOKUP) and mark the values that are missing from it:
+-------+--------+-------------------+-------------------+
| A | B | C | D |
+-------+--------+-------------------+-------------------+
| David | Frank | Available on both | Available on both |
| Tim | David | Available on both | Available on both |
| Joe | Sam | in A not in B | in B not in A |
| Frank | Bob | Available on both | in B not in A |
| Cathy | Tarun | in A not in B | in B not in A |
| | Rachel | | in B not in A |
| | Tim | | Available on both |
+-------+--------+-------------------+-------------------+
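Stated as a rule, columns C and D in the expected table come from a set-membership test against the other column, with empty cells staying empty. A minimal plain-Python sketch of that rule; the helper name `label` and the use of `None` for the empty cells in A are illustrative assumptions, not part of the question:

```python
# Column values from the question; None stands in for the empty cells in A.
A = ['David', 'Tim', 'Joe', 'Frank', 'Cathy', None, None]
B = ['Frank', 'David', 'Sam', 'Bob', 'Tarun', 'Rachel', 'Tim']


def label(values, other, missing_text):
    """Hypothetical helper: tag each value by whether it also occurs in `other`."""
    lookup = {v for v in other if v is not None}  # ignore empty cells in the lookup column
    return [
        None if v is None                  # empty cell -> empty result
        else 'Available on both' if v in lookup
        else missing_text
        for v in values
    ]


C = label(A, B, 'in A not in B')
D = label(B, A, 'in B not in A')
for row in zip(A, B, C, D):
    print(row)
```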
Solution
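You can use numpy.select with conditions built from isin (to check whether each value occurs in the other column) and notnull (to leave missing values out). numpy.select returns the choice of the first condition that is True, so a value found in the other column becomes 'Available on both', any other non-missing value gets the "not in" label, and NaN falls through to default=None: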
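import numpy as np
import pandas as pd

# Shorter column A is padded with NaN so both columns have the same length
df = pd.DataFrame({'A': ['David', 'Tim', 'Joe', 'Frank', 'Cathy', np.nan, np.nan],
                   'B': ['Frank', 'David', 'Sam', 'Bob', 'Tarun', 'Rachel', 'Tim']})
print (df)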
       A       B
0  David   Frank
1    Tim   David
2    Joe     Sam
3  Frank     Bob
4  Cathy   Tarun
5    NaN  Rachel
6    NaN     Tim
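# np.select uses the first condition that is True for each row:
# value of A found in B -> 'Available on both', other non-missing -> 'in A not in B', NaN -> None
df['C'] = np.select([df.A.isin(df.B), df.A.notnull()],
                    ['Available on both', 'in A not in B'], default=None)
# Same check in the opposite direction: values of B looked up in A
df['D'] = np.select([df.B.isin(df.A), df.B.notnull()],
                    ['Available on both', 'in B not in A'], default=None)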
print (df)
       A       B                  C                  D
0  David   Frank  Available on both  Available on both
1    Tim   David  Available on both  Available on both
2    Joe     Sam      in A not in B      in B not in A
3  Frank     Bob  Available on both      in B not in A
4  Cathy   Tarun      in A not in B      in B not in A
5    NaN  Rachel               None      in B not in A
6    NaN     Tim               None  Available on both
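One caveat: pandas `Series.isin` treats NaN as equal to NaN, so if both columns contained NaN, the first condition would label a blank row 'Available on both'. That does not happen with the frame above because B has no NaN. A minimal sketch that guards against it by dropping NaN from the lookup column before calling `isin`; the helper name `flag_presence` is hypothetical, and the last value of B is blanked out here only to show the difference:

```python
import numpy as np
import pandas as pd


def flag_presence(s, other, missing_text):
    """Hypothetical helper: label each value of `s` by membership in `other`.

    NaN is dropped from `other` first, so a NaN in `s` can never count as a
    match; NaN rows in `s` fail both conditions and get the default None.
    """
    return np.select(
        [s.isin(other.dropna()), s.notnull()],
        ['Available on both', missing_text],
        default=None,
    )


# Illustrative data: both columns now contain NaN
df = pd.DataFrame({'A': ['David', 'Tim', 'Joe', 'Frank', 'Cathy', np.nan, np.nan],
                   'B': ['Frank', 'David', 'Sam', 'Bob', 'Tarun', 'Rachel', np.nan]})
df['C'] = flag_presence(df['A'], df['B'], 'in A not in B')
df['D'] = flag_presence(df['B'], df['A'], 'in B not in A')
print(df)
```

Dropping NaN on the lookup side keeps the original condition order from the answer unchanged.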