Pandas: vlookup between two columns to find missing values

I have a DataFrame like this:

+-------+--------+
|   A   |   B    |
+-------+--------+
| David | Frank  |
| Tim   | David  |
| Joe   | Sam    |
| Frank | Bob    |
| Cathy | Tarun  |
|       | Rachel |
|       | Tim    |
+-------+--------+

Now I want to look up each column in the other (like a vlookup) and flag the values that are missing:

+-------+--------+-------------------+-------------------+
|   A   |   B    |         C         |         D         |
+-------+--------+-------------------+-------------------+
| David | Frank  | Available on both | Available on both |
| Tim   | David  | Available on both | Available on both |
| Joe   | Sam    | in A not in B     | in B not in A     |
| Frank | Bob    | Available on both | in B not in A     |
| Cathy | Tarun  | in A not in B     | in B not in A     |
|       | Rachel |                   | in B not in A     |
|       | Tim    |                   | Available on both |
+-------+--------+-------------------+-------------------+

1 Answer

Solution

You can use numpy.select with conditions built from isin, to test membership in the other column, and notnull, to filter out missing values:

print(df)
       A       B
0  David   Frank
1    Tim   David
2    Joe     Sam
3  Frank     Bob
4  Cathy   Tarun
5    NaN  Rachel
6    NaN     Tim

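import numpy as np

# np.select picks the first condition that holds for each row:
# present in the other column, else merely non-null; NaN rows fall through to the default.
df['C'] = np.select([df.A.isin(df.B), df.A.notnull()],
                    ['Available on both', 'in A not in B'], default=None)
df['D'] = np.select([df.B.isin(df.A), df.B.notnull()],
                    ['Available on both', 'in B not in A'], default=None)
print(df)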
       A       B                  C                  D
0  David   Frank  Available on both  Available on both
1    Tim   David  Available on both  Available on both
2    Joe     Sam      in A not in B      in B not in A
3  Frank     Bob  Available on both      in B not in A
4  Cathy   Tarun      in A not in B      in B not in A
5    NaN  Rachel               None      in B not in A
6    NaN     Tim               None  Available on both
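
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question and applies the same np.select logic through a small helper. The helper name label_membership and its parameters are hypothetical, introduced only for this illustration. The dropna() call is an extra precaution that is not part of the answer above: isin treats NaN as equal to NaN, so it only matters when both columns contain missing values.

```python
import numpy as np
import pandas as pd

# Columns of unequal length: wrapping each list in a Series lets pandas
# align on the index and pad the shorter column with NaN.
df = pd.DataFrame({
    "A": pd.Series(["David", "Tim", "Joe", "Frank", "Cathy"]),
    "B": pd.Series(["Frank", "David", "Sam", "Bob", "Tarun", "Rachel", "Tim"]),
})


def label_membership(own, other, missing_label):
    """Hypothetical helper: label each value of `own` by whether it occurs in `other`."""
    # dropna() keeps a NaN in `other` from matching a NaN in `own` (assumed precaution).
    in_other = own.isin(other.dropna())
    # np.select takes the first condition that is True for each row;
    # rows where `own` is NaN satisfy neither condition and get the default.
    return np.select(
        [in_other, own.notna()],
        ["Available on both", missing_label],
        default=None,
    )


df["C"] = label_membership(df["A"], df["B"], "in A not in B")
df["D"] = label_membership(df["B"], df["A"], "in B not in A")
print(df)
```

The order of the conditions matters: the membership test comes first, so the non-null test only labels values that were not found in the other column.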