Chi-square & Fisher's Exact Test and Expected Frequencies

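I rarely do descriptive statistics these days, so I keep forgetting the details.

That is probably just an excuse, but so that I don't forget again, let me point out one issue that is often overlooked.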


A contingency table can be analyzed with either the chi-square test or Fisher's exact test, and the two differ as follows.


With large samples, a chi-squared test can be used in this situation. However, the significance value it provides is only an approximation, because the sampling distribution of the test statistic that is calculated is only approximately equal to the theoretical chi-squared distribution. The approximation is inadequate when sample sizes are small, or the data are very unequally distributed among the cells of the table, resulting in the cell counts predicted on the null hypothesis (the "expected values") being low. The usual rule of thumb for deciding whether the chi-squared approximation is good enough is that the chi-squared test is not suitable when the expected values in any of the cells of a contingency table are below 5, or below 10 when there is only one degree of freedom (this rule is now known to be overly conservative[4]). In fact, for small, sparse, or unbalanced data, the exact and asymptotic p-values can be quite different and may lead to opposite conclusions concerning the hypothesis of interest.[5][6] In contrast the Fisher test is, as its name states, exact as long as the experimental procedure keeps the row and column totals fixed, and it can therefore be used regardless of the sample characteristics. It becomes difficult to calculate with large samples or well-balanced tables, but fortunately these are exactly the conditions where the chi-squared test is appropriate. (Source: English Wikipedia)


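So, as a general rule, the expected frequency (expected value) should not fall below 5 in more than 20% of the cells. Or was the cutoff 25%...?

The criteria in common use differ, but the bottom line is the same: if even a single cell has an expected frequency below 5, the chi-square result can become inaccurate.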


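Here, expected value = n_i * n_j / n_total, that is, (row total) * (column total) / (grand total) for each cell. So let's not just read the raw observed counts and naively say "this one is below 5, that one isn't"...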
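To make the formula above concrete, here is a minimal sketch in plain Python. The function names `expected_counts` and `share_below` and the example table are my own, made up for illustration. The table is chosen so that one observed count (3) is below 5 while every expected count is 5 or more, which is exactly the case where eyeballing the raw data misleads.

```python
def expected_counts(observed):
    """Expected count per cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n_total = sum(row_totals)
    return [[r * c / n_total for c in col_totals] for r in row_totals]


def share_below(expected, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected for e in row]
    return sum(e < threshold for e in cells) / len(cells)


# Hypothetical 2x2 table: the observed count 3 is below 5 ...
observed = [[3, 17],
            [12, 18]]

expected = expected_counts(observed)
print(expected)               # [[6.0, 14.0], [9.0, 21.0]]  ... but no expected count is
print(share_below(expected))  # 0.0
```

The value returned by `share_below` is what the 20% (or 25%) rule of thumb would be compared against.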


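By the way, Fisher's exact test is, as the name says, not an approximation that depends on n, so it is exact regardless of the sample size (given fixed row and column totals). But it is hard to compute. The end.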
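For a 2x2 table the "hard to compute" part is still manageable, so here is a rough sketch of what the exact calculation looks like. It assumes fixed margins and uses one common two-sided convention (sum the probabilities of all tables that are no more likely than the observed one). The function name `fisher_exact_2x2` is my own, and in practice a statistics library would be the safer choice.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # kept as an exact fraction so the comparison below has no rounding issues.
        return Fraction(comb(row1, k) * comb(row2, col1 - k), denom)

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    p_value = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs)
    return float(p_value)


# Small hypothetical table with all margins equal to 4
print(fisher_exact_2x2([[3, 1], [1, 3]]))  # 34/70, about 0.486
```

The number of feasible tables grows with the margins, and for tables larger than 2x2 the enumeration gets much heavier, which is where the computational cost comes from.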
