# Pearson Correlation

The **Pearson correlation coefficient** (Pearson's *r*) measures the strength and direction of the **linear relationship** between two continuous variables.

---

# 📌 When to Use
Use Pearson correlation when:
- Both variables are **numeric**
- The relationship is roughly **linear**
- There are no extreme outliers
- The data are approximately **normally distributed** (this matters mainly for the p-value, less for the coefficient itself)

---

# 📐 Simple Formula

$$
r = \frac{\sum (x - \bar{x})(y - \bar{y})}
         {\sqrt{\sum (x - \bar{x})^2} \sqrt{\sum (y - \bar{y})^2}}
$$

The value of \( r \) always lies between \(-1\) and \(1\):
- \( r = 1 \): Perfect positive linear relationship  
- \( r = -1 \): Perfect negative linear relationship  
- \( r = 0 \): No linear relationship  
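
The formula translates almost term by term into code. The following is a rough sketch in plain Python; the function name `pearson_r` and the input checks are illustrative assumptions, not part of any library:

```python
import math

def pearson_r(x, y):
    """Pearson's r computed directly from the sum formula (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square roots of the summed squared deviations, multiplied
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / (math.sqrt(ss_x) * math.sqrt(ss_y))

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # points on an increasing line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # points on a decreasing line
```

The two calls should print values at (or within floating-point error of) 1 and -1, matching the endpoints listed above.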

---

# 🔬 Numerical Example

Suppose:

| X | Y |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 5 |

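Compute means:
- $ \bar{x} = 2 $
- $ \bar{y} = 10/3 \approx 3.33 $

Numerator (sum of products of deviations):

$$
(1 - 2)(2 - 3.33) +
(2 - 2)(3 - 3.33) +
(3 - 2)(5 - 3.33)
\approx 1.33 + 0 + 1.67 = 3.00
$$

Denominator:
$$
\sqrt{(1-2)^2 + (2-2)^2 + (3-2)^2} \, \sqrt{(2-3.33)^2 + (3-3.33)^2 + (5-3.33)^2}
= \sqrt{2} \times \sqrt{4.67} \approx 3.06
$$


Correlation:
$$
r = \frac{3.00}{3.06} \approx 0.98
$$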
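
The arithmetic for this three-row table is easy to re-check by machine. The sketch below uses only the standard library and prints each intermediate quantity; the variable names are chosen here for illustration:

```python
import math

x = [1, 2, 3]
y = [2, 3, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Numerator: sum of products of deviations from the means
numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Denominator: sqrt of summed squared deviations of x, times the same for y
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
denominator = math.sqrt(ss_x) * math.sqrt(ss_y)

r = numerator / denominator

print(f"numerator   = {numerator:.2f}")
print(f"denominator = {denominator:.2f}")
print(f"r           = {r:.2f}")
```

Working with unrounded means, as the code does, avoids the small rounding drift that creeps in when 3.33 is used by hand.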

---

# 🧪 Assumptions
- Variables are continuous  
- Approximately normal distribution  
- Linear relationship  
- No major outliers  

---

# 🧠 Interpretation
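- |r| < 0.3 → weak  
- 0.3 ≤ |r| ≤ 0.7 → moderate  
- |r| > 0.7 → strong  

These cut-offs are a common rule of thumb rather than fixed standards; what counts as "strong" varies by field.

The p-value tests the null hypothesis that the true correlation is zero. A small p-value (conventionally below 0.05) indicates that the observed correlation is statistically significant.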
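
These rules can be wrapped in a small helper. This is only a sketch: the function name `summarize` and the default significance level `alpha=0.05` are assumptions made here, and the cut-offs are the rule-of-thumb values from the list above:

```python
def summarize(r, p, alpha=0.05):
    """Describe a correlation using rule-of-thumb cut-offs (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < 0.3:
        strength = "weak"
    elif size <= 0.7:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative" if r < 0 else "zero"
    # alpha is the assumed significance threshold, not a universal constant
    verdict = "significant" if p < alpha else "not significant"
    return f"{strength} {direction} correlation, {verdict} at alpha={alpha}"

print(summarize(0.82, 0.003))
print(summarize(-0.25, 0.40))
```

Strength and significance are separate questions: a weak correlation can be significant in a large sample, and a strong one can fail to reach significance in a small sample.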

---

# 📎 Advanced Math

Equivalently, \( r \) is the covariance of X and Y divided by the product of their standard deviations:

$$
r = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y}
$$

---

# 🐍 Python Implementation

SciPy reference: [`scipy.stats.pearsonr`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html)

Used in EasyStatistics:
```python
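from scipy.stats import pearsonr

# x and y: equal-length sequences of numeric values
r, p = pearsonr(x, y)  # r: correlation coefficient, p: two-sided p-value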
```
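
For a complete run, the call can be tried on a small data set. The numbers below (hours studied against exam score) are invented purely for illustration, and the sketch assumes SciPy is installed:

```python
from scipy.stats import pearsonr

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 72, 75, 81]

# pearsonr returns the coefficient and the two-sided p-value
r, p = pearsonr(hours, score)

print(f"r = {r:.3f}, p = {p:.4f}")
```

With data this close to a straight line, *r* should come out near 1 with a small p-value; real data will rarely be this tidy.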

---

# 📎 Related Tests
[Spearman Correlation](statistics/correlation/spearman.md)

[Partial Pearson Correlation](statistics/correlation/partial-pearson.md)