Label groups from Tukey test results according to significance
I have a Tukey test table produced by pairwise_tukeyhsd from Python's statsmodels.stats.multicomp:
group1 group2 meandiff lower upper reject
0 101 102 0.2917 -0.0425 0.6259 False
1 101 103 0.1571 -0.1649 0.4792 False
2 101 104 -0.1333 -0.4675 0.2009 False
3 101 105 0.0833 -0.2509 0.4175 False
4 101 106 -0.0500 -0.3626 0.2626 False
5 102 103 -0.1345 -0.4566 0.1875 False
6 102 104 -0.4250 -0.7592 -0.0908 True
7 102 105 -0.2083 -0.5425 0.1259 False
8 102 106 -0.3417 -0.6543 -0.0290 True
9 103 104 -0.2905 -0.6125 0.0316 False
10 103 105 -0.0738 -0.3959 0.2482 False
11 103 106 -0.2071 -0.5067 0.0924 False
12 104 105 0.2167 -0.1175 0.5509 False
13 104 106 0.0833 -0.2293 0.3960 False
14 105 106 -0.1333 -0.4460 0.1793 False
I have this table as a pandas DataFrame, and I would like to label the groups (101-106) with letters that show their statistical relationships. For this particular example, the desired result would be as follows (I don't mind whether the result is a DataFrame, a list, or a dictionary):
group label
101 ab
102 a
103 ab
104 b
105 ab
106 b
As you can see, groups that share a letter are not significantly different (reject = False), while groups with no letter in common are significantly different (reject = True). For example, group 101 does not differ from any other group, because it has the letters ab and every other group has a, b, or ab. On the other hand, group 106 has only the letter b, which means it is similar to every group except group 102, which has only the letter a.
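The rule above (two groups share at least one letter exactly when their comparison has reject = False) can be checked mechanically. A minimal sketch in plain Python, assuming the table has been turned into (group1, group2, reject) tuples, for example with `df[["group1", "group2", "reject"]].itertuples(index=False, name=None)`; the function name `check_letters` is hypothetical:

```python
def check_letters(rows, labels):
    """Return the comparisons that a letter labelling contradicts.

    rows   -- iterable of (group1, group2, reject) tuples from the Tukey table
    labels -- dict mapping each group to its letters, e.g. {101: "ab", 102: "a"}
    """
    wrong = []
    for g1, g2, reject in rows:
        shares_letter = bool(set(labels[g1]) & set(labels[g2]))
        # A shared letter means "not different", so it must coincide with reject == False.
        if shares_letter == bool(reject):
            wrong.append((g1, g2, reject))
    return wrong


# (group1, group2, reject) rows of the table in the question
ROWS = [
    (101, 102, False), (101, 103, False), (101, 104, False),
    (101, 105, False), (101, 106, False), (102, 103, False),
    (102, 104, True), (102, 105, False), (102, 106, True),
    (103, 104, False), (103, 105, False), (103, 106, False),
    (104, 105, False), (104, 106, False), (105, 106, False),
]
DESIRED = {101: "ab", 102: "a", 103: "ab", 104: "b", 105: "ab", 106: "b"}

print(check_letters(ROWS, DESIRED))  # []
```

An empty list means the labelling agrees with every row of the table; giving every group the same letter, for instance, would be reported as wrong for the pairs (102, 104) and (102, 106).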
I couldn't find an automated Python solution for this. I saw that R has a function for it, multcompLetters (in the multcompView package).
Is there something similar in Python?
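One established way to compute such a "compact letter display" is the insert-and-absorb procedure described by Piepho (2004): start with one letter shared by all groups; for every significant pair, split each letter that both groups share into two copies, one without each group; then drop any letter whose groups are a subset of another letter's groups. Below is a minimal pure-Python sketch of that procedure, not a port of multcompLetters. The name `compact_letters` and the (group1, group2, reject) tuple input are assumptions, and the letter order simply follows the order of `groups`:

```python
import string


def compact_letters(groups, rows):
    """Assign letters to groups with the insert-and-absorb procedure.

    groups -- all group names, in the order the letters should follow
    rows   -- iterable of (group1, group2, reject) tuples
    Returns a dict mapping each group to its letters.
    """
    order = {g: k for k, g in enumerate(groups)}
    columns = [set(groups)]  # start with one letter shared by every group
    for g1, g2, reject in rows:
        if not reject:
            continue
        # Insert: split every column holding both groups so they no longer share it.
        split = []
        for col in columns:
            if g1 in col and g2 in col:
                split.extend([col - {g1}, col - {g2}])
            else:
                split.append(col)
        # Absorb: drop columns contained in another column (keep one copy of duplicates).
        columns = [
            col for k, col in enumerate(split)
            if not any(col < other or (col == other and m < k)
                       for m, other in enumerate(split))
        ]
    if len(columns) > len(string.ascii_lowercase):
        raise ValueError("more than 26 letters needed")
    # Order columns by the positions of their members so earlier groups get earlier letters.
    columns.sort(key=lambda col: sorted(order[g] for g in col))
    labels = {g: "" for g in groups}
    for letter, col in zip(string.ascii_lowercase, columns):
        for g in col:
            labels[g] += letter
    return labels


GROUPS = [101, 102, 103, 104, 105, 106]
ROWS = [
    (101, 102, False), (101, 103, False), (101, 104, False),
    (101, 105, False), (101, 106, False), (102, 103, False),
    (102, 104, True), (102, 105, False), (102, 106, True),
    (103, 104, False), (103, 105, False), (103, 106, False),
    (104, 105, False), (104, 106, False), (105, 106, False),
]
print(compact_letters(GROUPS, ROWS))
# {101: 'ab', 102: 'a', 103: 'ab', 104: 'b', 105: 'ab', 106: 'b'}
```

Piepho's paper also describes a "sweeping" step that removes redundant letters; it is left out here, so on larger tables this sketch may use more letters than strictly necessary.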
So, after spending a couple of days on it, and without any suggested answers or comments from other users, I think I figured it out. Let's say the table from my question is called df.
The following script is tailored to my needs, but I hope it can help others. I added comments to make it easier to follow.
# Python 2. Assumes df is the Tukey table from the question, the string module
# has been imported, and group_names lists every group (101-106).
df_True = df.loc[df.reject==True,:]
letters = list(string.ascii_lowercase)
n = 0
group1_list = df_True.group1.tolist() #get the groups from the df with only True (True df) to a list
group2_list = df_True.group2.tolist()
group3 = group1_list+group2_list #concat both lists
group4 = list(set(group3)) #get unique items from the list
group5 = [str(i) for i in group4 ] #convert unicode to a str
group5.sort() #sort the list
gen = ((i, 0) for i in group5) #create dict with 0 so the dict won't be empty when it starts
dictionary = dict(gen)
group6 = [(group5[i],group5[j]) for i in range(len(group5)) for j in range(i+1, len(group5))] #get all combination pairs
for pairs in group6: #check for each combination if it is present in df_True
    print n
    print dictionary
    a = df_True.loc[(df_True.group1==pairs[0])&(df_True.group2==pairs[1]),:] #check if the pair exists in the df
    if a.shape[0] == 0: #the df is empty, so the pair does not appear in df_True and the two groups are equal
        print 'equal'
        if dictionary[pairs[0]] != 0 and dictionary[pairs[1]] == 0: #if the 1st is populated but the 2nd is not populated
            print "1st is populated and 2nd is empty"
            dictionary[pairs[1]] = dictionary[pairs[0]]
        elif dictionary[pairs[0]] != 0 and dictionary[pairs[1]] != 0: #if both are populated, check matching labels
            print "both are populated"
            if len(list(set([c for c in dictionary[pairs[0]] if c in dictionary[pairs[1]]]))) >0: #check if they have a common label
                print "they have a shared character"
            else:
                print "equal but have different labels"
                #check if the 1st group label doesn't appear in any other labels; if it is unique then the 2nd group can have the first group label
                m = 0 #count the groups that share no char with the 1st group
                j = 0 #count the groups that share no char with the 2nd group
                for key, value in dictionary.iteritems():
                    if key != pairs[0] and len(list(set([c for c in dictionary[pairs[0]] if c in value])))==0:
                        m += 1
                for key, value in dictionary.iteritems():
                    if key != pairs[1] and len(list(set([c for c in dictionary[pairs[1]] if c in value])))==0:
                        j += 1
                if m == len(dictionary)-1 and j == len(dictionary)-1: #this value is unique because it has no shared char with another group
                    print "unique"
                    dictionary[pairs[1]] = dictionary[pairs[0]][0]
                else:
                    print "there is at least one group in the dict that shares a char with the 1st group"
                    dictionary[pairs[1]] = dictionary[pairs[1]] + dictionary[pairs[0]][0]
        else: # the 1st is empty (which means that the 2nd must also be empty)
            print "both are empty"
            dictionary[pairs[0]] = letters[n]
            dictionary[pairs[1]] = letters[n]
    else:
        print "not equal"
        if dictionary[pairs[0]] != 0: # if the first one is populated (has a value) then give a value only to the second
            print '1st is populated'
            # if the 2nd is not empty and they don't share a character then no change is needed as they already have different labels
            if dictionary[pairs[1]] != 0 and len(list(set([c for c in dictionary[pairs[0]] if c in dictionary[pairs[1]]]))) == 0:
                print "no change"
            elif dictionary[pairs[1]] == 0: #if the 2nd is not populated give it a new letter
                dictionary[pairs[1]] = letters[n+1]
            #if the 2nd is populated and equal to the 1st, then change the letter of the 2nd to a new one and assign its original letter to all the others that had the same original letter
            elif dictionary[pairs[1]] != 0 and len(list(set([c for c in dictionary[pairs[0]] if c in dictionary[pairs[1]]]))) > 0:
                #need to check that they don't share a character
                print "need to add a letter"
                original_value = dictionary[pairs[1]]
                dictionary[pairs[1]] = letters[n]
                for key, value in dictionary.iteritems():
                    if key != pairs[0] and len(list(set([c for c in original_value if c in value])))>0: #for any given value, check if it had a character from the group that will get a new letter; if so, they are equal and the new letter should also appear in the value of the "old" group
                        dictionary[key] = original_value + letters[n] #add the original letter of the group to all the other groups it was similar to
        else:
            print '1st is empty'
            dictionary[pairs[0]] = letters[n]
            dictionary[pairs[1]] = letters[n+1]
    print dictionary
# get the letters out of the dictionary
labels = list(dictionary.values())
labels1 = sorted(set(labels))
final_label = ''.join(labels1)
for GroupName in group_names:
    if GroupName in dictionary:
        print "already exists"
    else:
        dictionary[GroupName] = final_label
for key, value in dictionary.iteritems(): #keep only the unique chars per group, in alphabetical order
    dictionary[key] = ''.join(sorted(set(value)))
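In the script above, each pair is looked up with a DataFrame filter that only matches one orientation (group1 equal to the first name and group2 equal to the second); that works here because the pairs are generated in sorted order and the table always lists the smaller group first. An order-independent alternative is to collect the significant pairs once as unordered sets, which also avoids filtering the DataFrame for every pair. A minimal sketch, assuming (group1, group2, reject) tuples; `significant_pairs` and `differ` are hypothetical helper names:

```python
def significant_pairs(rows):
    """Collect the significantly different pairs as unordered frozensets."""
    return {frozenset((g1, g2)) for g1, g2, reject in rows if reject}


def differ(sig, a, b):
    """True if groups a and b were significantly different, in either order."""
    return frozenset((a, b)) in sig


rows = [(101, 102, False), (102, 104, True), (102, 106, True)]  # excerpt of the table
sig = significant_pairs(rows)
print(differ(sig, 104, 102))  # True
print(differ(sig, 101, 102))  # False
```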
Thanks for your contribution. I had to modify your code a little to fix some missing pieces and to adapt it to Python 3. The main changes were:
- the import line (was missing)
- changed dictionary.iteritems to dictionary.items (Python 3)
- converted every print "..." to print("...") (Python 3)
- the variable group_names was missing
- forced GroupName to be a str in the group_names loop
- sorted the final dictionary into dict2
Your input data is in a CSV file, now called input2.csv.
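For anyone who wants to run the script without the original file, here is a minimal sketch that recreates input2.csv from the table in the question using only the standard csv module. The layout (an unnamed index column followed by the six table columns) is an assumption chosen to match `pd.read_csv('input2.csv', index_col=0)`; `tukey_csv_text` and `write_input_csv` are hypothetical names:

```python
import csv
import io

# The Tukey table from the question: group1, group2, meandiff, lower, upper, reject
TUKEY_TABLE = [
    (101, 102, 0.2917, -0.0425, 0.6259, False),
    (101, 103, 0.1571, -0.1649, 0.4792, False),
    (101, 104, -0.1333, -0.4675, 0.2009, False),
    (101, 105, 0.0833, -0.2509, 0.4175, False),
    (101, 106, -0.0500, -0.3626, 0.2626, False),
    (102, 103, -0.1345, -0.4566, 0.1875, False),
    (102, 104, -0.4250, -0.7592, -0.0908, True),
    (102, 105, -0.2083, -0.5425, 0.1259, False),
    (102, 106, -0.3417, -0.6543, -0.0290, True),
    (103, 104, -0.2905, -0.6125, 0.0316, False),
    (103, 105, -0.0738, -0.3959, 0.2482, False),
    (103, 106, -0.2071, -0.5067, 0.0924, False),
    (104, 105, 0.2167, -0.1175, 0.5509, False),
    (104, 106, 0.0833, -0.2293, 0.3960, False),
    (105, 106, -0.1333, -0.4460, 0.1793, False),
]
COLUMNS = ["group1", "group2", "meandiff", "lower", "upper", "reject"]


def tukey_csv_text():
    """Render the table as CSV text with a leading, unnamed index column."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([""] + COLUMNS)  # blank header cell for the index column
    for idx, row in enumerate(TUKEY_TABLE):
        writer.writerow([idx, *row])
    return buf.getvalue()


def write_input_csv(path="input2.csv"):
    """Write the CSV text to path (newline="" keeps csv's own line endings)."""
    with open(path, "w", newline="") as fh:
        fh.write(tukey_csv_text())
```

Calling `write_input_csv()` once creates input2.csv in the working directory; pandas reads the True/False strings back as a boolean reject column.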
import string

import pandas as pd

df = pd.read_csv('input2.csv', index_col=0)
# read_csv parses 101, 102, ... as integers, while the pairs below are built from
# strings, so convert both group columns to str before comparing them
df[['group1', 'group2']] = df[['group1', 'group2']].astype(str)
group_names = sorted(set(df.group1) | set(df.group2))  # every group in the table
df_True = df.loc[df.reject==True,:]
letters = list(string.ascii_lowercase)
n = 0
group1_list = df_True.group1.tolist() #get the groups from the df with only True (True df) to a list
group2_list = df_True.group2.tolist()
group3 = group1_list+group2_list #concat both lists
group4 = list(set(group3)) #get unique items from the list
group5 = [str(i) for i in group4 ] #make sure every name is a str
group5.sort() #sort the list
gen = ((i, 0) for i in group5) #create dict with 0 so the dict won't be empty when it starts
dictionary = dict(gen)
group6 = [(group5[i],group5[j]) for i in range(len(group5)) for j in range(i+1, len(group5))] #get all combination pairs
for pairs in group6: #check for each combination if it is present in df_True
    a = df_True.loc[(df_True.group1==pairs[0])&(df_True.group2==pairs[1]),:] #check if the pair exists in the df
    if a.shape[0] == 0: #the df is empty, so the pair does not appear in df_True and the two groups are equal
        print('equal')
        if dictionary[pairs[0]] != 0 and dictionary[pairs[1]] == 0: #if the 1st is populated but the 2nd is not populated
            print("1st is populated and 2nd is empty")
            dictionary[pairs[1]] = dictionary[pairs[0]]
        elif dictionary[pairs[0]] != 0 and dictionary[pairs[1]] != 0: #if both are populated, check matching labels
            print("both are populated")
            if len(list(set([c for c in dictionary[pairs[0]] if c in dictionary[pairs[1]]]))) >0: #check if they have a common label
                print("they have a shared character")
            else:
                print("equal but have different labels")
                #check if the 1st group label doesn't appear in any other labels; if it is unique then the 2nd group can have the first group label
                m = 0 #count the groups that share no char with the 1st group
                j = 0 #count the groups that share no char with the 2nd group
                for key, value in dictionary.items():
                    if key != pairs[0] and len(list(set([c for c in dictionary[pairs[0]] if c in value])))==0:
                        m += 1
                for key, value in dictionary.items():
                    if key != pairs[1] and len(list(set([c for c in dictionary[pairs[1]] if c in value])))==0:
                        j += 1
                if m == len(dictionary)-1 and j == len(dictionary)-1: #this value is unique because it has no shared char with another group
                    print("unique")
                    dictionary[pairs[1]] = dictionary[pairs[0]][0]
                else:
                    print("there is at least one group in the dict that shares a char with the 1st group")
                    dictionary[pairs[1]] = dictionary[pairs[1]] + dictionary[pairs[0]][0]
        else: # the 1st is empty (which means that the 2nd must also be empty)
            print("both are empty")
            dictionary[pairs[0]] = letters[n]
            dictionary[pairs[1]] = letters[n]
    else:
        print("not equal")
        if dictionary[pairs[0]] != 0: # if the first one is populated (has a value) then give a value only to the second
            print('1st is populated')
            # if the 2nd is not empty and they don't share a character then no change is needed as they already have different labels
            if dictionary[pairs[1]] != 0 and len(list(set([c for c in dictionary[pairs[0]] if c in dictionary[pairs[1]]]))) == 0:
                print("no change")
            elif dictionary[pairs[1]] == 0: #if the 2nd is not populated give it a new letter
                dictionary[pairs[1]] = letters[n+1]
            #if the 2nd is populated and equal to the 1st, then change the letter of the 2nd to a new one and assign its original letter to all the others that had the same original letter
            elif dictionary[pairs[1]] != 0 and len(list(set([c for c in dictionary[pairs[0]] if c in dictionary[pairs[1]]]))) > 0:
                #need to check that they don't share a character
                print("need to add a letter")
                original_value = dictionary[pairs[1]]
                dictionary[pairs[1]] = letters[n]
                for key, value in dictionary.items():
                    if key != pairs[0] and len(list(set([c for c in original_value if c in value])))>0: #for any given value, check if it had a character from the group that will get a new letter; if so, they are equal and the new letter should also appear in the value of the "old" group
                        dictionary[key] = original_value + letters[n] #add the original letter of the group to all the other groups it was similar to
        else:
            print('1st is empty')
            dictionary[pairs[0]] = letters[n]
            dictionary[pairs[1]] = letters[n+1]
    print(dictionary)
# get the letters out of the dictionary
labels = list(dictionary.values())
labels1 = sorted(set(labels))
final_label = ''.join(labels1)
for GroupName in group_names:
    if str(GroupName) in dictionary:
        print("already exists")
    else:
        dictionary[str(GroupName)] = final_label
for key, value in dictionary.items(): #keep only the unique chars per group, in alphabetical order
    dictionary[key] = ''.join(sorted(set(value)))
dict2 = dict(sorted(dictionary.items())) # the final output
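`dict(sorted(dictionary.items()))` sorts the group names as strings, which matches numeric order here only because every name has three digits (for example, "1000" would sort before "999"). A minimal sketch that returns (group, label) rows in numeric order, assuming every group name is an integer written as a string; `label_rows` is a hypothetical name:

```python
def label_rows(labels):
    """Return (group, label) pairs sorted by the numeric value of the group name.

    Assumes every key is an integer written as a string, e.g. "101".
    """
    return sorted(labels.items(), key=lambda item: int(item[0]))


dict2 = {"101": "ab", "102": "a", "103": "ab", "104": "b", "105": "ab", "106": "b"}
print("group label")
for group, label in label_rows(dict2):
    print(group, label)
```

With pandas, `pd.DataFrame(label_rows(dict2), columns=["group", "label"])` turns the same rows into the table shown in the question.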