# Unlocking the Power of Python’s Statistics Module: Essential Functions for Data Analysis

## Python Statistics Module Functions

Statistics is a branch of mathematics that deals with data collection, analysis, and interpretation. In the world of data science, analyzing datasets is one of the primary tasks.

Python has a built-in module named `statistics` that provides functions to calculate various statistics for a given dataset. In this article, we will explore some of the essential functions of the statistics module and how to use them.

### 1. `mean()` function

The mean of a dataset is the average value of the numbers in the dataset.

It’s a fundamental concept in statistics because it’s used to estimate the expected value of a random variable. The mean can be calculated using the formula:

``Mean = Sum of the values / Number of values``

The `statistics.mean()` function in Python calculates the mean of a set of values.

For example, `statistics.mean([1, 3, 5, 7, 9])` will return 5.
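As a quick check of the formula, the sketch below computes the mean by hand and compares it with the module. It assumes Python 3.8 or later, where `statistics.fmean()` (a faster, float-only variant) is available.

```python
import statistics

data = [1, 3, 5, 7, 9]

# The formula above: sum of the values divided by the number of values.
manual_mean = sum(data) / len(data)

# statistics.mean() computes the same quantity using exact arithmetic internally.
module_mean = statistics.mean(data)

# statistics.fmean() converts the data to floats and always returns a float.
float_mean = statistics.fmean(data)

print(manual_mean, module_mean, float_mean)  # 5.0 5 5.0
```

Note that `statistics.mean()` returns an `int` here because the input is all integers and the mean is a whole number.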

### 2. `median()` function

The median is another important statistic in the world of data science. It is the middle value in a dataset when the data is ordered from smallest to largest.

When the dataset has an even number of values, the median is the average of the two middle values. In Python, the `statistics.median()` function calculates the median of a set of values.

For example, `statistics.median([1, 3, 5, 7, 9])` will return 5.
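The sketch below contrasts odd and even dataset sizes. With an even number of values, `median()` averages the two middle values, so the result can be a float that does not appear in the data.

```python
import statistics

odd_data = [1, 3, 5, 7, 9]  # five values: one middle value
even_data = [1, 3, 5, 7]    # four values: two middle values, 3 and 5

odd_median = statistics.median(odd_data)
even_median = statistics.median(even_data)

# median() sorts its input internally, so the original order does not matter.
unsorted_median = statistics.median([9, 1, 7, 3, 5])

print(odd_median)       # 5
print(even_median)      # 4.0, the average of 3 and 5
print(unsorted_median)  # 5
```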

### 3. `median_high()` function

The `median_high()` function returns the high median of a dataset. When the dataset has an even number of values, it returns the higher of the two middle values instead of averaging them; with an odd number of values, it returns the middle value, just like `median()`.

It is useful mainly for discrete data, where the result should be an actual data point. For example, `statistics.median_high([1, 3, 5, 7, 9, 11])` returns 7.

### 4. `median_low()` function

The `median_low()` function returns the low median of a dataset. When the dataset has an even number of values, it returns the lower of the two middle values; with an odd number of values, it returns the middle value.

Like `median_high()`, it is useful mainly for discrete data.

For example, `statistics.median_low([1, 3, 5, 7, 9, 11])` returns 5.
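Putting the three median functions side by side on the same even-length dataset makes the difference concrete:

```python
import statistics

data = [1, 3, 5, 7, 9, 11]  # even length: the two middle values are 5 and 7

regular = statistics.median(data)  # averages the two middle values
low = statistics.median_low(data)  # picks the lower middle value
high = statistics.median_high(data)  # picks the higher middle value

print(regular, low, high)  # 6.0 5 7
```

Only `median()` can return a value that is not in the dataset; the low and high variants always return one of the original data points.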

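### 5. `stdev()` function

Standard deviation is a measure of the amount of variation in a dataset. It is calculated as the square root of the variance.

The variance is based on the squared differences from the mean: the population variance divides their sum by the number of values *n*, while the sample variance divides it by *n* - 1. The `statistics.stdev()` function calculates the *sample* standard deviation, and `statistics.pstdev()` calculates the population standard deviation.

For example, `statistics.stdev([1, 3, 5, 7, 9])` returns about 3.16 (the square root of 40 / 4 = 10), while `statistics.pstdev([1, 3, 5, 7, 9])` returns about 2.83 (the square root of 40 / 5 = 8).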
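To show where those two numbers come from, this sketch computes both variances by hand and compares their square roots with `statistics.stdev()` and `statistics.pstdev()`.

```python
import math
import statistics

data = [1, 3, 5, 7, 9]
mean_value = statistics.mean(data)  # 5
squared_diffs = [(x - mean_value) ** 2 for x in data]  # [16, 4, 0, 4, 16], sum 40

# Sample variance divides by n - 1; population variance divides by n.
sample_var = sum(squared_diffs) / (len(data) - 1)  # 40 / 4 = 10.0
population_var = sum(squared_diffs) / len(data)    # 40 / 5 = 8.0

manual_sample_sd = math.sqrt(sample_var)
manual_population_sd = math.sqrt(population_var)

print(manual_sample_sd, statistics.stdev(data))       # both about 3.1623
print(manual_population_sd, statistics.pstdev(data))  # both about 2.8284
```

Use `stdev()` when the data is a sample drawn from a larger population, and `pstdev()` when the data is the entire population.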

### 6. `sum()` function

The statistics module has no public `sum()` function; summation is handled by Python's built-in `sum()`, which adds up all the values in a dataset. It corresponds to the notation `Σx` in math.

For example, `sum([1, 3, 5, 7, 9])` returns 25.
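For integer data, the built-in `sum()` is exact. With floating-point data, rounding errors can accumulate, and the standard library's `math.fsum()` gives a more accurate total, as this short sketch shows:

```python
import math

data = [1, 3, 5, 7, 9]
total = sum(data)
print(total)  # 25

# Adding 0.1 ten times with sum() accumulates floating-point rounding error.
floats = [0.1] * 10
naive_total = sum(floats)
precise_total = math.fsum(floats)  # tracks partial sums to avoid the loss

print(naive_total)    # 0.9999999999999999
print(precise_total)  # 1.0
```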

### 7. Counting frequencies

The statistics module has no public `counts()` function. To get the frequency of occurrence of each value in a dataset, use `collections.Counter` from the standard library. Its `most_common()` method returns a list of tuples where each tuple has two values: the first is the data point and the second is the frequency of that data point, ordered from most to least frequent.

For example, `collections.Counter([1, 2, 3, 1, 1, 4, 2]).most_common()` returns `[(1, 3), (2, 2), (3, 1), (4, 1)]`.
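A slightly fuller sketch of `collections.Counter` on the same data: besides listing every frequency, it can return only the most frequent value and look up individual counts.

```python
from collections import Counter

data = [1, 2, 3, 1, 1, 4, 2]
counts = Counter(data)

# Frequency of every value, most frequent first.
print(counts.most_common())   # [(1, 3), (2, 2), (3, 1), (4, 1)]

# Only the single most frequent value and its count.
print(counts.most_common(1))  # [(1, 3)]

# Individual lookups; a value that never occurs has a count of 0 instead of raising KeyError.
print(counts[2], counts[5])   # 2 0
```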

### 2) The `mean()` Function

In statistics, the mean (or average) is a measure of central tendency that represents the sum of a set of values divided by the number of values. It is an important concept because it is often used to provide an estimate of the expected value of a random variable.

The mean is only meaningful for numerical data; for categorical data, the mode is the appropriate measure of central tendency. The `statistics.mean()` function in Python calculates the mean of a set of values.

The function takes the set of values as an argument and returns their mean. The type of the result depends on the input: for example, a list of integers whose mean is a whole number gives an `int`, and float data gives a `float`. The following example calculates the mean using the `statistics.mean()` function:

```python
import statistics

data = [1, 2, 3, 4, 5]

mean = statistics.mean(data)

print(mean)
# Output: 3
```

The above code calculates the mean of the dataset `[1, 2, 3, 4, 5]`.

The output of the code is 3, which is the mean of the dataset. The mean is a measure of central tendency, but it can sometimes be misleading.

This happens when the data has extreme values or outliers. In such cases, the mean may not represent the typical value of the dataset.

Therefore, it is essential to consider other measures of central tendency, such as the median or mode, when analyzing datasets.
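A small sketch of the problem: adding one extreme value to a small dataset moves the mean a long way, while the median barely changes.

```python
import statistics

typical = [2, 3, 3, 4]
with_outlier = typical + [100]  # the same data plus one extreme value

mean_before = statistics.mean(typical)
median_before = statistics.median(typical)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)  # 3 3.0
print(mean_after, median_after)    # 22.4 3
```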

## Conclusion

In this article, we have explored some of the essential functions of the statistics module in Python. Understanding these functions is crucial in data analysis as they are used to calculate various statistics on datasets.

We have also seen how the `statistics.mean()` function can be used to calculate the mean of a set of values. The mean is an important measure of central tendency, but other measures should also be considered to get a better understanding of the dataset.

### 3) The `median()` Function

In statistics, the median is the middle value of a dataset when the data is arranged in ascending or descending order. It is an important metric because it describes the central tendency of the data and is much less sensitive to outliers or extreme values than the mean.

The median is used in various applications, including economics, marketing research, and healthcare. The `statistics.median()` function in Python is used to calculate the median of a dataset.

The function accepts a list of values as input and returns the median value of the dataset. For example, consider the dataset `[4, 2, 5, 7, 1, 3]`.

To calculate the median, we first sort the dataset in ascending order, which gives `[1, 2, 3, 4, 5, 7]`. Because the list has an even number of values, there is no single middle value; the median is the average of the two middle values, 3 and 4, which is 3.5.

The following code demonstrates how to calculate the median using the `statistics.median()` function:

```python
import statistics

data = [4, 2, 5, 7, 1, 3]

median = statistics.median(data)

print(median)
# Output: 3.5
```

The above code calculates the median of the given dataset. The output is 3.5, the average of the two middle values 3 and 4.

The median is an essential metric in data analysis. When a dataset is skewed or contains extreme values, the mean may not accurately represent its central tendency.

In such cases, the median is a better metric to use. For instance, in a medical research dataset containing patient ages ranging from 1 to 90, a few very old patients can pull the mean upward, so the median age may better represent a typical patient.

### 4) The `median_high()` Function

The `median_high()` function is a variation of the median function for datasets with an even number of values. Instead of averaging the two middle values, `median_high()` returns the higher of the two.

This is useful when dealing with discrete data or data that takes on integer values. For example, the dataset `[4, 7, 3, 1, 8, 5]` sorts to `[1, 3, 4, 5, 7, 8]`, so its two middle values are 4 and 5.

`median()` returns their average, 4.5, while `median_high()` returns the higher value, 5. Likewise, for the dataset `[4, 7, 3, 1, 8, 5, 10, 6]`, the two middle values are 5 and 6: `median()` returns 5.5, and `median_high()` returns 6.

The `statistics.median_high()` function in Python is used to calculate the `median_high` of a dataset. The function accepts a list of values as input and returns the `median_high` value of the dataset.

For example, consider the dataset `[4, 7, 3, 1, 8, 5, 10, 6]`. To calculate the `median_high`, we first need to sort the dataset in ascending order, which gives `[1, 3, 4, 5, 6, 7, 8, 10]`.

The two middle values of this ordered list are 5 and 6. The `median_high` of the given dataset is 6, since it is the higher of the two values.

The following code demonstrates how to calculate the `median_high` using the `statistics.median_high()` function:

```python
import statistics

data = [4, 7, 3, 1, 8, 5, 10, 6]

median_high = statistics.median_high(data)

print(median_high)
# Output: 6
```

Note that `median_high()` differs from `median()` only when the dataset has an even number of values; for an odd number of values, both return the single middle value. It is most appropriate for discrete data, where you want the median to be an actual data point. For continuous data, interpolating between the two middle values with `median()` is usually more meaningful.
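A short sketch comparing `median_high()` on the even-length dataset from this section with the same data minus its last value. With an odd number of values there is a single middle value, and `median_high()` simply returns it.

```python
import statistics

even_data = [4, 7, 3, 1, 8, 5, 10, 6]  # sorted middle values: 5 and 6
odd_data = [4, 7, 3, 1, 8, 5, 10]      # sorted: [1, 3, 4, 5, 7, 8, 10]

high_even = statistics.median_high(even_data)
high_odd = statistics.median_high(odd_data)
plain_odd = statistics.median(odd_data)

print(high_even)  # 6
print(high_odd)   # 5, the single middle value
print(plain_odd)  # 5, identical to median_high() for odd lengths
```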

## Conclusion

The `median()` and `median_high()` functions are essential tools in a statistician’s toolkit. They identify the central tendency of a dataset while being far less sensitive to outliers than the mean.

The Python statistics module has built-in functions, `statistics.median()` and `statistics.median_high()`, to calculate the median or high median of a given dataset. Understanding these functions can help you better analyze datasets and make informed decisions.

### 5) The `median_low()` Function

In statistical analysis, the median is a measure of central tendency that provides insights into a dataset’s distribution. The `median_low()` function is a variation of the median function for datasets with an even number of values.

Instead of averaging the two middle values, `median_low()` returns the lower of the two. This is also useful when dealing with discrete data or data that takes on integer values.

For example, the dataset `[4, 7, 3, 1, 8, 5]` sorts to `[1, 3, 4, 5, 7, 8]`, so its two middle values are 4 and 5; `median()` returns 4.5, while `median_low()` returns 4.

Likewise, for the dataset `[4, 7, 3, 1, 8, 5, 10, 6]`, the two middle values are 5 and 6: `median()` returns their average, 5.5, while `median_low()` returns the lower value, 5. The `statistics.median_low()` function in Python is used to calculate the low median of a dataset.

The function accepts a list of values as input and returns the `median_low` value of the dataset. For example, consider the dataset `[4, 7, 3, 1, 8, 5, 10, 6]`.

To calculate the `median_low`, we first need to sort the dataset in ascending order, which gives `[1, 3, 4, 5, 6, 7, 8, 10]`. The two middle values of this ordered list are 5 and 6.

The `median_low` of the given dataset is 5 since it is the lower of the two values. The following code demonstrates how to calculate the `median_low` using the `statistics.median_low()` function:

```python
import statistics

data = [4, 7, 3, 1, 8, 5, 10, 6]

median_low = statistics.median_low(data)

print(median_low)
# Output: 5
```

Note that `median_low()` differs from `median()` only when the dataset has an even number of values; for an odd number of values, both return the single middle value.

Like `median_high()`, it is best suited to discrete data, where the median should be an actual data point. For continuous data, the interpolated value from `median()` is usually more meaningful.
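To illustrate why the low and high medians suit discrete data, here is a sketch using hypothetical survey ratings on a 1 to 5 scale, where only whole numbers are valid answers.

```python
import statistics

# Hypothetical survey answers on a 1-5 scale; only whole numbers are valid.
ratings = [2, 4, 3, 5, 1, 4]  # sorted: [1, 2, 3, 4, 4, 5]

interpolated = statistics.median(ratings)  # averages 3 and 4
low = statistics.median_low(ratings)       # lower middle value
high = statistics.median_high(ratings)     # higher middle value

print(interpolated)  # 3.5, not a valid rating
print(low, high)     # 3 4, both valid ratings
```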

### 6) The `stdev()` Function

Standard deviation is a measure of the spread of data points in a dataset. The `stdev()` function is used to calculate the standard deviation of a dataset in Python.

The standard deviation reflects how much the data deviates from the mean: the higher the standard deviation, the more dispersed the data. It is an important metric in statistical analysis because it shows how much variability is present in the dataset.

The `statistics.stdev()` function in Python calculates the sample standard deviation of a dataset, dividing by *n* - 1 (use `statistics.pstdev()` for the population standard deviation). The function takes a list of values as input and returns the standard deviation of the dataset.

For example:

```python
import statistics

data = [1, 3, 5, 7, 9]

stdev = statistics.stdev(data)

print(stdev)
# Output: 3.1622776601683795
```

The above code calculates the sample standard deviation of the dataset `[1, 3, 5, 7, 9]`. The output is `3.1622776601683795`, the square root of the sample variance 40 / 4 = 10.

Standard deviation is an important measure because it quantifies how much variation is present in the data. In general, the larger the standard deviation, the farther the data points tend to lie from the mean, indicating significant variability in the dataset.

The standard deviation can also help identify outliers: values that lie more than two or three standard deviations from the mean are often flagged for closer inspection.
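A minimal sketch of this rule of thumb, using a hypothetical helper `flag_outliers()` and hypothetical readings. The threshold of 2.0 is an assumption chosen for this tiny dataset: an outlier inflates the standard deviation itself, so with only a handful of values a 3-standard-deviation cutoff may never be reached.

```python
import statistics


def flag_outliers(data, threshold=2.0):
    """Hypothetical helper: return values more than `threshold` sample
    standard deviations away from the mean (needs at least two values)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > threshold]


readings = [10, 12, 11, 13, 12, 11, 50]  # hypothetical sensor readings
flagged = flag_outliers(readings)
print(flagged)  # [50]
```

Flagged values are candidates for inspection, not automatic deletions; whether an extreme value is an error or a genuine observation depends on the data.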

## Conclusion

In summary, the `median_low()` function and `stdev()` function are important tools for statistical analysis in Python. They help identify the central tendency and variability in a dataset respectively.

The `median_low()` function calculates the median of a dataset and, when there is an even number of values, returns the lower of the two middle values, while the `stdev()` function calculates the sample standard deviation of a dataset. Understanding these functions is crucial for data analysis, and the Python statistics module provides efficient tools to help in the analysis of datasets.

### 7) The `_sum()` Function

The `_sum()` function is a private helper inside the Python statistics module; the leading underscore marks it as internal. It is not part of the documented API, and its behavior can change between Python versions. In current versions it does not return a plain number: it returns an internal tuple that functions such as `mean()` use for their exact calculations.

To calculate the sum of all values in a dataset, use Python's built-in `sum()` function instead. For example, consider the following list of numbers: `[1, 3, 5, 7, 9]`.

To calculate the sum of this dataset, we can use `sum()` as follows:

```python
data = [1, 3, 5, 7, 9]

# Use the built-in sum(); statistics._sum() is private and does not return a plain number.
summation = sum(data)

print(summation)
# Output: 25
```

The above code calculates the sum of the dataset `[1, 3, 5, 7, 9]` using the built-in `sum()` function. The output is 25.

Summation is fundamental in statistical analysis: it gives the total of all the data points in a dataset, and it underlies other statistics such as the mean, which is the sum divided by the number of values.

### 8) The `_counts()` Function

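The `_counts()` function is not part of the documented API of the Python statistics module. It was a private helper (note the leading underscore) in some older versions and is not available in current Python releases, so code should not rely on it.

To calculate the frequency of occurrence of each value in a dataset, use `collections.Counter` from the standard library. Its `most_common()` method returns a list of tuples where each tuple has two values: the first is the data point and the second is its frequency.

For example, `collections.Counter([1, 2, 3, 1, 1, 4, 2]).most_common()` returns `[(1, 3), (2, 2), (3, 1), (4, 1)]`.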
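If you only need the most frequent value rather than a full frequency table, the documented `statistics.mode()` and `statistics.multimode()` functions cover that case. This sketch assumes Python 3.8 or later, where `multimode()` was added.

```python
import statistics

data = [1, 2, 3, 1, 1, 4, 2]

most_common_value = statistics.mode(data)   # the single most common value
all_modes = statistics.multimode(data)      # every value sharing the top count

# With a tie, multimode() returns each tied value in the order first encountered.
tied_modes = statistics.multimode([1, 1, 2, 2, 3])

print(most_common_value)  # 1
print(all_modes)          # [1]
print(tied_modes)         # [1, 2]
```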