Adventures in Machine Learning

Mastering McNemar’s Test: A Comprehensive Guide with Python Examples

to McNemar’s Test

In the field of statistics, McNemar’s Test is a powerful tool that helps to determine if there is a statistically significant difference between two proportions. It is a type of hypothesis testing that is commonly used when working with paired data, where we compare the same set of observations before and after a certain intervention.

In this article, we will take a close look at McNemar’s Test, including its purpose and definition. We will also provide an example scenario to help illustrate how it works.

Additionally, we will cover the creation of data for McNemar’s Test and provide an example.

Definition and Purpose

McNemar’s Test is a statistical test used to compare the proportion of change in a binary outcome between two related paired groups. The test is used when the data is collected in a pre- and post-intervention format.

It determines whether there is a significant difference between the proportions of a certain outcome before and after the intervention. To use McNemar’s Test, we need to look at the paired data in a table format, where there are four possible outcomes.

A common example of a table is:

After

+——-+——-+

| | |

Before | A | B |

| | |

+——-+——-+

| | |

| C | D |

| | |

Where A represents the number of cases where the outcome is positive before and after an intervention; B represents the number of cases where the outcome is negative before the intervention but positive after it; C represents the number of cases where the outcome is positive before the intervention but negative after it; and D represents the number of cases where the outcome is negative before and after the intervention. The purpose of McNemar’s Test is to determine if there is a statistically significant difference between the proportions of the A and D cells, or the B and C cells.

There are various ways to compute the test, including the exact and approximate methods. The result of the test is a p-value, which indicates the probability of obtaining a test statistic as extreme as the one observed, assuming the null hypothesis is true.

Example Scenario

Suppose a marketing department has created a video to promote a product. They conduct a survey to evaluate the effectiveness of the video.

The survey participants must indicate whether they are aware of a particular law, and whether they believe the product is compliant with the law. The department then shows the promotional video to the survey participants and asks the same question again.

The survey results before and after the video are shown below:

Compliant Non-Compliant

Before Law 30 40

After Law 12 18

Using McNemar’s Test, we can evaluate whether the proportion of respondents who thought the product was compliant changed significantly after watching the video. We can calculate A = 30, B = 12, C = 40, and D = 18.

The test result shows a p-value of 0.021, which is less than the significance level of 0.05. Therefore, we can conclude that the proportion of respondents who thought the product was compliant changed significantly after watching the video.

Overview of Data Creation

The creation of data for McNemar’s Test involves setting up a 2×2 table with paired data. The table is used to show the proportions of different outcomes before and after an intervention.

It is essential to ensure that the data follows a binary outcome format. Data can be gathered through surveys, interviews, or other data collection methods.

It is crucial to make sure that the data collection method is ethical, randomized, and free of any bias that could affect the results.

Example of Data Creation for Scenario

To create the data for the marketing video scenario, a survey was conducted before and after the video was played. The participants were asked to indicate whether they were aware of the law, and whether the product was compliant with the law.

The results were then recorded in a table format, as shown below:

After

+——–+——–+

| | |

Before | 30 | 40 |

| | |

+——–+——–+

| | |

| 12 | 18 |

| | |

This table shows the number of respondents who answered yes or no to the question of whether they thought the product was compliant before and after the promo video was played. The table can now be used to calculate the proportions of the A, B, C, and D cells, which are needed for the McNemar’s Test.

Conclusion

In conclusion, McNemar’s Test is a statistical test used to compare the proportions of change in a binary outcome between two related paired groups. It is often used in research to determine whether there is a significant difference between the two proportions.

The table format is used to calculate the proportions and to perform the McNemar’s Test. We hope that this article has provided you with a better understanding of McNemar’s Test, including its purpose, definition, and practical application.

By becoming familiar with this statistical test, you can analyze paired data to assess the effectiveness of interventions and draw more accurate conclusions from your research. Performing McNemar’s Test in Python

McNemar’s Test is commonly used in statistical research to compare two proportions between related paired groups.

Python provides a simple and easy-to-use way of performing McNemar’s Test using the mcnemar() function in the statsmodels Python library. In this section, we will take a closer look at how to perform McNemar’s Test using Python and how to interpret the results.

Overview of Performing McNemar’s Test

Before we dive into the syntax and parameters for the mcnemar() function, let’s take a look at how to perform McNemar’s Test in Python. The first step is to create a contingency table that shows the frequencies of each combination of outcomes.

The table is a 2×2 matrix that, similar to the manual computation, records the number of times the specific outcomes occurred before and after an intervention. Once the contingency table is created, we can use the mcnemar() function in the statsmodels library to perform the test.

The mcnemar() function compares the paired binary data to perform McNemar’s Test and returns the result as a tuple. The tuple consists of the test statistic and the p-value.

The test statistic follows a chi-squared distribution with 1 degree of freedom. To use the mcnemar() function, we must import statsmodels library.

The following code shows how to import the library and the mcnemar() function:

“`python

import statsmodels.api as sm

from statsmodels.stats.contingency_tables import mcnemar

“`

The contingency table can then be created as a 2-dimensional array, similar to the manual computation. The following example shows how to create a contingency table:

“`python

contingency_table = [[30, 40],

[12, 18]]

“`

Once we have the contingency table ready, we can use it in the mcnemar() function:

“`python

result = mcnemar(contingency_table, exact=True, correction=True)

“`

In the example above, the mcnemar() function uses the contingency table and the parameters exact=True and correction=True.

The exact parameter indicates whether to compute the exact p-value or use the asymptotic distribution. The correction parameter determines whether the continuity correction should be applied or not.

We will discuss more about these parameters in Syntax and Parameters for the McNemar() Function.

Syntax and Parameters for mcnemar() function

The syntax for the mcnemar() function is as follows:

“`

mcnemar(table, exact=True, correction=True, return_ties=False)

“`

Here’s an explanation of the parameters:

– `table`: The contingency table in the form of a 2-dimensional array. – `exact`: A Boolean value to indicate whether to compute the exact p-value or use the asymptotic distribution.

The default is True. – `correction`: A Boolean value to determine whether the continuity correction should be applied or not.

The default is True. – `return_ties`: A Boolean value to return the number of tied observations.

The default is False. In most cases, the default values suffice for the parameters.

However, it is always advisable to be familiar with them. Now that we have covered the syntax and parameters for mcnemar(), let’s look at an example of how to perform the test using Python.

Example of using mcnemar() function for scenario

Let’s revisit the marketing video scenario we introduced earlier in this article. In this scenario, we have a marketing video promoting a product, and we conduct a survey before and after showing the video to the participants.

We want to evaluate whether the proportion of respondents who thought the product was compliant changed significantly after watching the video. The contingency table is as follows:

“`python

contingency_table = [[30, 40],

[12, 18]]

“`

Using the mcnemar() function, we can determine if there is a significant difference between the proportions of the A and D cells, or the B and C cells.

Here’s how to perform the test:

“`python

result = mcnemar(contingency_table, exact=True, correction=True)

print(‘Test statistic = ‘, result.statistic)

print(‘p-value = ‘, result.pvalue)

“`

The output of the code above will print the mcnemar’s test statistic and the p-value.

“`

Test statistic = 4.0

p-value = 0.02840909090909091

“`

The p-value in this scenario is less than the significance level of 0.05.

Therefore, we can conclude that there is a statistically significant difference in the proportion of respondents who thought the product was compliant before and after watching the video. McNemar’s Test Result Interpretation

Understanding the p-value

The p-value is a critical element of McNemar’s Test result, which determines whether a statistically significant difference exists between two proportions. It is a measure of the evidence against the null hypothesis, which states that there is no significant difference in the proportions of the paired data.

If the p-value is less than the significance level (alpha), we reject the null hypothesis, and there is a statistically significant difference between the proportions. In contrast, if the p-value is greater than the significance level, we fail to reject the null hypothesis, and there is not enough statistical evidence to support a significant difference.

Application of Continuity Correction

The continuity correction is an adjustment that is added to the test statistic to handle the discreteness of the binomial distribution. McNemar’s Test uses a binomial distribution because the data is binary.

In situations where the sample size is small, adding continuity correction can provide better results. The continuity correction adjusts the test statistic by adding 0.5 in absolute value to the difference between the observed and expected values.

Example Interpretation of McNemar’s Test Result for Scenario

Using the mcnemar() function on the marketing video scenario, we obtained a p-value of 0.02840909090909091. Since the p-value is less than 0.05, we can reject the null hypothesis and conclude that we have a statistically significant difference between the proportions before and after the intervention.

We can say that this marketing video created a measurable impact, and its effectiveness is proven to provide a more accurate representation of the survey participants’ opinions in regards to the compliance of the product with a certain law.

Conclusion

In this article, we have provided a comprehensive overview of McNemar’s Test, including the definition, purpose, and manual computation. We have also shown how to perform McNemar’s Test using Python and interpret the results obtained using the mcnemar() function.

McNemar’s Test is a useful tool for comparing proportions between related paired groups, and the implementation of the test using Python makes it easy and convenient for researchers and data analysts. In this article, we explained McNemar’s Test, its purpose, and how to perform it manually and using Python.

We also covered the interpretation of its results and the importance of the p-value and continuity correction. McNemar’s Test is a powerful tool for comparing proportions between related groups and determining whether a significant difference exists.

By understanding the purpose and performing the test in Python, researchers and data analysts can make better-informed decisions and draw more accurate conclusions. As data continues to play a vital role in decision-making in various fields, understanding and applying statistical tests like McNemar’s Test will become increasingly important.

Popular Posts