## Curve Fitting in Python: Understanding the Basics

Curve fitting is the process of finding a function or equation that best fits a given dataset, and it is an important tool for modeling and analyzing data.

Python, an open-source language well suited to data analysis and scientific computing, makes this straightforward. In this article, we will explore curve fitting in Python: its purpose, two common methods, and worked examples on sample datasets.

## Purpose of Curve Fitting

The primary purpose of curve fitting is to find a function or equation that best describes the relationship between two variables in a dataset. This relationship can be linear or nonlinear, and the function or equation that best fits the data can be used to make predictions or gain insight into the underlying process that generated the data.

## Two Methods for Curve Fitting

### 1. Least Squares Method

The least squares method is an optimization technique that minimizes the sum of the squared differences between the actual and predicted values.

### 2. Maximum Likelihood Estimation

Maximum likelihood estimation, on the other hand, involves finding the parameters of a probability distribution that maximize the likelihood of observing the given data.

## Sample Datasets and Code Snippets

To better understand curve fitting in Python, let’s take a look at some sample datasets and code snippets. We will be using Python’s SciPy library, which provides a `curve_fit()` function for curve fitting.

First, we need to import the necessary libraries:

```
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
```

Now we can create some sample data to work with:

```
x_data = np.linspace(-5, 5, num=50)
y_data = 2.5 * np.sin(2 * np.pi * 0.75 * x_data) + 1.5 * np.sin(2 * np.pi * 1.25 * x_data) + np.random.normal(size=50)
```

In this example, we have created a dataset of 50 points generated from a combination of two sine waves plus random noise. Using the `curve_fit()` function, we can fit a model to the data:

```
def model_f(x, a, b, c):
    return a * np.sin(b * x) + c

popt, pcov = curve_fit(model_f, x_data, y_data)
```
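When the model is oscillatory, `curve_fit()` can settle on the wrong frequency if the search starts far from the true parameters. As a self-contained sketch (the data, seed, and `p0` values below are illustrative assumptions, not taken from the example above), an initial guess can be passed via the `p0` argument to seed the search near plausible values:

```
import numpy as np
from scipy.optimize import curve_fit

def model_f(x, a, b, c):
    return a * np.sin(b * x) + c

# Illustrative data: a single sine wave with known parameters plus noise
rng = np.random.default_rng(0)
x_data = np.linspace(-5, 5, 50)
y_data = model_f(x_data, 2.5, 1.5, 0.3) + 0.1 * rng.normal(size=50)

# p0 follows the parameter order of model_f: (a, b, c)
popt, pcov = curve_fit(model_f, x_data, y_data, p0=[2.0, 1.4, 0.0])
print(popt)
```

A guess reasonably close to the true values usually lets the optimizer recover them; without `p0`, all parameters start at 1 by default.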

In this example, we have defined a function called `model_f()` that takes three parameters: `a`, `b`, and `c`. We then pass this function, along with the `x` and `y` data, to the `curve_fit()` function, which returns two variables: `popt` and `pcov`.

The `popt` variable contains the optimized values of the parameters `a`, `b`, and `c`, while `pcov` contains the estimated covariance matrix of those parameters.

## Least Squares Method

Now let’s take a closer look at the least squares method, an optimization technique that minimizes the sum of the squared differences between the actual and predicted values.

It works by finding the parameter values that minimize this sum of squared differences, given by:

```
sum_i (f(x_i, z) - y_i) ** 2
```

where `i` is the index of a data point, `f(x_i, z)` is the model’s predicted value, `y_i` is the actual value, and `z` is the set of parameters we want to optimize.

Once we have obtained the optimized values, we can also estimate the errors on the parameters from the covariance matrix:

```
errors = np.sqrt(np.diag(pcov))
```
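To make this concrete, here is a minimal self-contained sketch (the straight-line model, seed, and noise level are illustrative assumptions) that fits a model and converts `pcov` into one-sigma parameter uncertainties:

```
import numpy as np
from scipy.optimize import curve_fit

def line(x, m, b):
    return m * x + b

# Illustrative data: a line with slope 3 and intercept 1, plus noise
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=40)

popt, pcov = curve_fit(line, x, y)
errors = np.sqrt(np.diag(pcov))  # one-sigma uncertainty for each parameter
print(popt, errors)
```

Each entry of `errors` corresponds to the parameter in the same position of `popt`, so the fit can be reported as, for example, slope `popt[0] ± errors[0]`.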

## Example 1 Code Snippet

Let’s take a look at an example of curve fitting using the least squares method. We will start by creating some sample data:

```
x_data = np.linspace(-10, 10, 100)
y_data = 5 * x_data ** 3 + 2 * x_data ** 2 - 10 * x_data + 5 + np.random.normal(scale=100, size=100)
```

In this example, we have generated 100 points that follow a third-degree polynomial function with some random noise added.

We can define a function that represents this polynomial:

```
def model_f(x, a, b, c, d):
    return a * x ** 3 + b * x ** 2 + c * x + d
```

Now we can pass this function, along with the `x` and `y` data, to the `curve_fit()` function:

```
popt, pcov = curve_fit(model_f, x_data, y_data)
```

The `popt` variable will contain the optimized values of the parameters `a`, `b`, `c`, and `d`, while `pcov` will contain the estimated covariance matrix. We can now use these optimized values to plot the model function along with the original data:

```
x_model = np.linspace(-10, 10, 1000)
y_model = model_f(x_model, *popt)
plt.plot(x_data, y_data, 'o')
plt.plot(x_model, y_model, '-')
plt.show()
```

This will generate a plot that shows the original data in circles, and the fitted curve in a solid line.
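Beyond the visual check, we can compare `popt` against the coefficients used to generate the data. The sketch below is self-contained and uses a fixed random seed (an assumption made here so the results are reproducible); the true coefficients are those of the polynomial above:

```
import numpy as np
from scipy.optimize import curve_fit

def model_f(x, a, b, c, d):
    return a * x ** 3 + b * x ** 2 + c * x + d

# Regenerate the cubic data with a fixed seed for reproducibility
rng = np.random.default_rng(42)
x_data = np.linspace(-10, 10, 100)
true_params = [5, 2, -10, 5]
y_data = model_f(x_data, *true_params) + rng.normal(scale=100, size=100)

popt, pcov = curve_fit(model_f, x_data, y_data)
errors = np.sqrt(np.diag(pcov))

# Report each fitted coefficient next to the value that generated the data
for name, true_val, fit, err in zip("abcd", true_params, popt, errors):
    print(f"{name}: true={true_val}, fitted={fit:.2f} +/- {err:.2f}")
```

The higher-order coefficients are typically recovered more tightly than the constant term, because `x ** 3` dominates the signal over this range while the noise affects all terms equally.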

## Recap

So far, we have seen that curve fitting in Python is a powerful tool for modeling and analyzing datasets, and that the least squares method finds parameters by minimizing the sum of squared differences between actual and predicted values. Maximum likelihood estimation, covered next, instead finds the parameters of a probability distribution that maximize the likelihood of observing the given data.

## Maximum Likelihood Estimation for Better Fitting

Another method for curve fitting in Python is maximum likelihood estimation (MLE), which involves finding the parameters of a probability distribution that maximize the likelihood of observing the given data. This method is useful when the data are assumed to follow a specific distribution, such as the normal distribution.

MLE also involves minimizing a function, but instead of the sum of squared errors we minimize the negative log-likelihood, which for normally distributed errors is given by:

```
-sum_{i=1..n} log( (1 / (sigma * sqrt(2 * pi))) * exp( -(y_i - f_i) ** 2 / (2 * sigma ** 2) ) )
```

where `n` is the number of data points, `sigma` is the standard deviation of the noise, `f_i` is the model’s predicted value, and `y_i` is the actual value. Minimizing this negative log-likelihood is equivalent to maximizing the likelihood of observing the data.

When the errors are normally distributed with constant variance, minimizing the negative log-likelihood reduces to minimizing the sum of squared residuals, so we can find these values with the same `curve_fit()` function in SciPy.

## Obtaining Optimized Value with Better Fit

When using MLE, we get not only the optimized values of the parameters, but also the covariance matrix, which can be used to estimate the errors on the parameters. A smaller error means that the estimate is more precise.

Once we have obtained the optimized values, we can plot the model function along with the original data to visualize the fit.

## Example 2

Let’s look at an example of using MLE in Python for curve fitting:

```
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def fit_f(x, a, b, c, d):
    return a * np.exp(-b * x) + c * np.sin(d * x)

# Sample data: the true curve plus Gaussian noise
x_data = np.linspace(0, 4, 50)
y_data = fit_f(x_data, 2.5, 1.3, 0.5, 0.75) + 0.2 * np.random.normal(size=len(x_data))

# Fit the model and unpack the optimized parameters
popt, pcov = curve_fit(fit_f, x_data, y_data)
a_opt, b_opt, c_opt, d_opt = popt

# Evaluate the fitted model on a dense grid for plotting
x_model = np.linspace(0, 4, 1000)
y_model = fit_f(x_model, a_opt, b_opt, c_opt, d_opt)
plt.plot(x_data, y_data, 'o')
plt.plot(x_model, y_model, '-')
plt.show()
```

In this example, we have defined a function called `fit_f()` that takes four parameters: `a`, `b`, `c`, and `d`. We have also created some sample data using this function with added noise.

We then pass this function, along with the `x` and `y` data, to the `curve_fit()` function, which performs a least squares fit; for normally distributed errors this is equivalent to MLE. It returns the optimized parameter values along with the corresponding covariance matrix. We can then use these optimized values to generate the model function and plot it along with the original data.

As we can see from the plot, the model fits the data quite well.

## Conclusion

In conclusion, curve fitting in Python is a useful tool for modeling and analyzing data. This article discussed two methods: the least squares method, which minimizes the sum of the squared differences between actual and predicted values, and maximum likelihood estimation, which finds the parameters of a probability distribution that maximize the likelihood of observing the given data; for normally distributed errors the two coincide.

Using Python’s SciPy library, we can fit models with either approach and obtain not only the optimized parameter values but also the corresponding covariance matrix. This lets us visualize the fit and estimate the uncertainty of the parameter estimates, which is crucial in many analytical applications. The takeaway is that curve fitting in Python is a practical technique that enables informed decisions in engineering, science, and business.