Adventures in Machine Learning

Mastering Statistical Tests: Friedman and Nemenyi for Accurate Data Analysis

Friedman and Nemenyi Tests: Analyzing Repeated Measures Data

Imagine a researcher studying the effects of three different drugs on reaction times in patients. How can that researcher determine if there is a significant difference in reaction times between the three drugs?

This article will introduce two statistical tests that can be used to analyze this type of data: the Friedman test and the Nemenyi test.

1) Statistical Tests: Friedman and Nemenyi

Friedman Test

The Friedman test is a non-parametric alternative to the repeated measures ANOVA. It is primarily used when the assumption of normality is not met, making the parametric test unsuitable. The test compares three or more related groups (the same subjects measured under each condition) and determines whether at least one group differs significantly from the others.

Example of Friedman Test

Let’s say the researcher from our earlier example has collected reaction times from 10 patients for each drug. The null hypothesis would be that there is no significant difference in reaction times between the drugs. The alternative hypothesis would be that there is a significant difference in reaction times between the drugs. The Friedman test would be used to analyze this data and determine if the null hypothesis can be rejected.

Nemenyi Test

The Nemenyi test is a post-hoc test used after the Friedman test to determine which groups are significantly different from each other. It is only used when the Friedman test has found a significant difference between groups. The test compares the mean ranks of all possible pairs of groups and produces a matrix of p-values.

Example of Nemenyi Test

Continuing with our example, if the Friedman test determines a significant difference between the reaction times of the three drugs, the Nemenyi test would be used to determine which pairwise comparisons are statistically significant. For example, the test might determine that Drug A and Drug B have significantly different reaction times, while Drug B and Drug C do not have significantly different reaction times.

2) Data Creation

Creating data is an essential step in any research study. In our example, the researcher would need to measure reaction times for each patient and each drug. This would involve deciding on the measurement tool and recording the data accurately.

Example of Creating Data

In our example, the researcher might use arrays to record the response times for each patient and each drug. They would measure the reaction times of 10 patients for each drug and record the data in a spreadsheet. The spreadsheet would have columns for patient ID, drug, and response time.
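As a rough illustration of that layout, the sketch below builds the spreadsheet as a pandas DataFrame and then reshapes it so that each patient is a row and each drug is a column, which is the shape the tests expect. The column names (`patient_id`, `drug`, `response_time`) and the use of pandas are assumptions made for this example; the values are the first three patients from the reaction-time data used later in the article.

```python
import pandas as pd

# First three patients from the article's example data.
# The column names below are assumed for illustration.
drug_times = {
    "A": [20.2, 21.1, 19.8],
    "B": [23.5, 24.9, 22.4],
    "C": [27.8, 29.1, 28.4],
}

# Long format: one row per (patient, drug) measurement, like the spreadsheet.
long_df = pd.DataFrame(
    [
        {"patient_id": patient, "drug": drug, "response_time": time}
        for drug, times in drug_times.items()
        for patient, time in enumerate(times, start=1)
    ]
)

# Wide format: one row per patient, one column per drug.
wide_df = long_df.pivot(index="patient_id", columns="drug", values="response_time")
print(wide_df)
```

Keeping the long format as the record of what was measured and deriving the wide table from it makes it harder to misalign a patient's readings across drugs.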

Conclusion

So far, this article has introduced the Friedman and Nemenyi tests as statistical tools for analyzing repeated measures data when the assumptions of a repeated measures ANOVA are not met. We have also discussed the importance of recording data carefully in any research study. By understanding these concepts, researchers can make informed decisions about their data analysis and draw accurate conclusions.

3) Performing Friedman Test

The Friedman test is commonly used in research to determine whether there is a significant difference between three or more related groups. It is a non-parametric test: it works on the ranks of the values within each subject, so it does not assume that the data are normally distributed.

This section will discuss how to perform the Friedman test in Python using the scipy.stats module.

Example of performing Friedman Test

To perform the Friedman test, we first need to define the null and alternative hypotheses. The null hypothesis is that there is no significant difference between the groups, while the alternative hypothesis is that there is a significant difference between at least two groups.

Once we have defined these hypotheses, we can calculate the test statistic and p-value for our dataset. The test statistic is calculated using the formula:

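$$\chi^2_F = \frac{12N}{k(k+1)} \sum_{j=1}^{k} \left(\bar{R}_j - T\right)^2$$

where N is the number of subjects (patients, in our example), k is the number of groups, \bar{R}_j is the mean rank of the jth group (ranks are assigned within each subject), and T = (k+1)/2 is the average rank.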

The p-value is then calculated using the chi-squared distribution with k-1 degrees of freedom. To perform the Friedman test in Python, we can use the scipy.stats module.

The code below shows an example of how to perform the test on a dataset of reaction times for three drugs:

import numpy as np
from scipy.stats import friedmanchisquare
# Define the data
drug_a = [20.2, 21.1, 19.8, 22.3, 25.6, 23.4, 22.0, 21.8, 23.3, 24.2]
drug_b = [23.5, 24.9, 22.4, 26.1, 24.3, 25.5, 22.6, 25.1, 23.9, 22.7]
drug_c = [27.8, 29.1, 28.4, 27.9, 30.2, 29.5, 28.9, 28.1, 30.5, 29.8]
# Combine the data into a 2D array
data = np.array([drug_a, drug_b, drug_c])
# Conduct the Friedman test
stat, p_value = friedmanchisquare(*data)
# Print the results
print("Friedman Test")
print("-------------")
print("Statistic: ", stat)
print("p-value: ", p_value)

In this example, we first defined the data for each drug. We then combined the data into a 2D array and passed it to the `friedmanchisquare` function.

The function returns the test statistic and p-value, which we printed to the console. In this example, we would reject the null hypothesis since the p-value is less than 0.05.
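To see where these numbers come from, the statistic can also be computed by hand. The sketch below ranks the three drugs within each patient, averages the ranks per drug, and applies the mean-rank form of the Friedman statistic, 12N / (k(k+1)) times the sum of squared deviations of each mean rank from (k+1)/2, with N taken as the number of patients. It assumes there are no ties within a patient's readings, which holds for this dataset; `scipy.stats.friedmanchisquare` additionally applies a tie correction.

```python
import numpy as np
from scipy.stats import chi2, friedmanchisquare, rankdata

drug_a = [20.2, 21.1, 19.8, 22.3, 25.6, 23.4, 22.0, 21.8, 23.3, 24.2]
drug_b = [23.5, 24.9, 22.4, 26.1, 24.3, 25.5, 22.6, 25.1, 23.9, 22.7]
drug_c = [27.8, 29.1, 28.4, 27.9, 30.2, 29.5, 28.9, 28.1, 30.5, 29.8]

# Rows are patients, columns are drugs.
data = np.column_stack([drug_a, drug_b, drug_c])
n, k = data.shape

# Rank the drugs within each patient (1 = shortest reaction time).
ranks = rankdata(data, axis=1)
mean_ranks = ranks.mean(axis=0)

# Friedman statistic from the mean ranks (no tie correction).
stat = 12 * n / (k * (k + 1)) * np.sum((mean_ranks - (k + 1) / 2) ** 2)

# p-value from the chi-squared distribution with k - 1 degrees of freedom.
p_value = chi2.sf(stat, df=k - 1)

print("Mean ranks:", mean_ranks)
print("Statistic: ", stat)
print("p-value:   ", p_value)

# Cross-check against SciPy's implementation.
scipy_stat, scipy_p = friedmanchisquare(drug_a, drug_b, drug_c)
print("SciPy:     ", scipy_stat, scipy_p)
```

For this dataset the mean ranks are 1.2, 1.8 and 3.0, giving a statistic of 16.8 and a p-value of roughly 0.0002, in agreement with the SciPy result.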

4) Performing Nemenyi Test

Once we have determined that there is a significant difference between the groups using the Friedman test, we can perform a post-hoc test to determine which pairs of groups are significantly different from each other. The Nemenyi test is a commonly used post-hoc test that compares all possible pairs of groups and produces a matrix of p-values.

This section will discuss how to perform the Nemenyi test in Python using the scikit_posthocs module.

Example of performing Nemenyi Test

To perform the Nemenyi test, we first need to prepare our data. We can use a numpy array to store our data, with each row representing a group and each column representing an observation.

We will also need to transpose our data to ensure that each row represents an observation and each column represents a group. Once we have prepared our data, we can use the `sp.posthoc_nemenyi_friedman()` function from the scikit_posthocs module to perform the test.

The function returns a matrix of p-values for all pairwise comparisons between groups. We can then use these p-values to determine which pairs of groups are significantly different from each other.

The code below shows an example of how to perform the Nemenyi test in Python using the scikit_posthocs module:

import numpy as np
import scikit_posthocs as sp
from scipy.stats import friedmanchisquare
# Define the data
drug_a = [20.2, 21.1, 19.8, 22.3, 25.6, 23.4, 22.0, 21.8, 23.3, 24.2]
drug_b = [23.5, 24.9, 22.4, 26.1, 24.3, 25.5, 22.6, 25.1, 23.9, 22.7]
drug_c = [27.8, 29.1, 28.4, 27.9, 30.2, 29.5, 28.9, 28.1, 30.5, 29.8]
# Combine the data into a numpy array and transpose
data = np.array([drug_a, drug_b, drug_c])
data = data.transpose()
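# Perform the Friedman test (it expects one argument per drug, so pass the
# original lists rather than the rows of the transposed array)
stat, p_value = friedmanchisquare(drug_a, drug_b, drug_c)
# Perform the Nemenyi test (rows are patients, columns are drugs)
p_values = sp.posthoc_nemenyi_friedman(data)
# Find the pairwise comparisons that are significant
for i in range(len(p_values)):
    for j in range(i+1, len(p_values)):
        # p_values is a pandas DataFrame, so index it by position with .iloc
        if p_values.iloc[i, j] < 0.05:
            print("Group", i+1, "and Group", j+1, "are significantly different.")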

In this example, we first defined the data for each drug. We then combined the data into a numpy array and transposed it.

We used the `friedmanchisquare` function to perform the Friedman test and the `posthoc_nemenyi_friedman` function from the scikit_posthocs module to perform the Nemenyi test. The function returns a matrix of p-values, which we iterated through to find the pairwise comparisons that are significant.

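In this example, Drug C has a much higher mean rank (3.0) than Drug A (1.2) or Drug B (1.8), so we would find that Group 1 and Group 3 are significantly different, as well as Group 2 and Group 3. The difference between Group 1 and Group 2 is not significant at the 0.05 level.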
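As a cross-check that does not depend on scikit_posthocs, the Nemenyi comparison can be sketched by hand with the critical difference: two groups differ significantly when their mean ranks differ by more than q_alpha * sqrt(k(k+1) / (6N)). The constant 2.343 used below is the tabulated critical value for k = 3 groups at alpha = 0.05 (the studentized range value divided by sqrt(2)). It is hard-coded here as an assumption of this sketch and would need to be looked up again for a different number of groups or significance level.

```python
from itertools import combinations

import numpy as np
from scipy.stats import rankdata

groups = {
    "Drug A": [20.2, 21.1, 19.8, 22.3, 25.6, 23.4, 22.0, 21.8, 23.3, 24.2],
    "Drug B": [23.5, 24.9, 22.4, 26.1, 24.3, 25.5, 22.6, 25.1, 23.9, 22.7],
    "Drug C": [27.8, 29.1, 28.4, 27.9, 30.2, 29.5, 28.9, 28.1, 30.5, 29.8],
}

# Rows are patients, columns are drugs.
data = np.column_stack(list(groups.values()))
n, k = data.shape

# Mean rank of each drug, with ranks assigned within each patient.
mean_ranks = rankdata(data, axis=1).mean(axis=0)

# Tabulated critical value for k = 3 and alpha = 0.05 (assumed constant).
Q_ALPHA = 2.343
critical_difference = Q_ALPHA * np.sqrt(k * (k + 1) / (6 * n))

# Keep the pairs whose mean ranks differ by more than the critical difference.
names = list(groups)
significant_pairs = [
    (names[i], names[j])
    for i, j in combinations(range(k), 2)
    if abs(mean_ranks[i] - mean_ranks[j]) > critical_difference
]

print("Critical difference:", round(critical_difference, 3))
for first, second in significant_pairs:
    print(first, "and", second, "are significantly different.")
```

With ten patients the critical difference is about 1.05, so only the comparisons involving Drug C exceed it.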

This article discusses two statistical tests commonly used in research studies: the Friedman test and the Nemenyi test. The Friedman test is used to determine if there is a significant difference between three or more groups, while the Nemenyi test is used as a post-hoc test to determine which pairs of groups are significantly different. We have provided examples of how to perform these tests using Python and explained the importance of creating data in any research study.

Accurate data analysis is essential to draw meaningful conclusions from research studies. Mastering these statistical tests can enable researchers to make informed decisions and deliver reliable results.
