# Documentation: Step-by-Step Explanation of the Linear Regression Script

## 1. Introduction

This documentation explains every part of the **Income vs Expense
Regression Analysis** script. The code explores the relationship between
individuals' expenses (X) and their income (Y) using Python's data
science libraries. It covers data loading, regression computation,
visualization, and interpretation.

------------------------------------------------------------------------

## 2. Step 1: Importing Required Libraries

``` python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
```

**Explanation:**

- `pandas`: used for handling and structuring tabular data as DataFrames.
- `numpy`: used for numerical operations and array manipulation.
- `matplotlib.pyplot`: used to create plots and visualize data.
- `sklearn.linear_model.LinearRegression`: provides a simple linear regression model.
- `sklearn.metrics.r2_score`: calculates the coefficient of determination (R²) for model evaluation.

------------------------------------------------------------------------

## 3. Step 2: Creating the Dataset

``` python
data = {
    "Person": list(range(1, 21)),
    "Income_Y": [...],
    "Expense_X": [...]
}
df = pd.DataFrame(data)
```

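**Explanation:**

- The dataset is defined as a dictionary in which each key becomes a column.
- `Person` assigns an ID (1 to 20) to each individual.
- `Income_Y` stores the income values (dependent variable).
- `Expense_X` stores the expense values (independent variable).
- The `[...]` placeholders stand for the column values, which are elided here. Each list must hold 20 values to match `Person`, because `pd.DataFrame` requires all columns to have the same length.
- `pd.DataFrame(data)` converts the dictionary into a DataFrame for structured analysis.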
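As a rough illustration of the dictionary-to-DataFrame step, the sketch below builds a three-row frame. The income and expense numbers are invented for the example and are not the script's data.

```python
import pandas as pd

# Three hypothetical rows, for illustration only (not the script's dataset)
demo = {
    "Person": list(range(1, 4)),
    "Income_Y": [50, 60, 70],
    "Expense_X": [30, 35, 45],
}
demo_df = pd.DataFrame(demo)

print(demo_df.shape)          # (3, 3): three rows, three columns
print(list(demo_df.columns))  # ['Person', 'Income_Y', 'Expense_X']
```

Each dictionary key becomes a column name, and each list becomes that column's values.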

------------------------------------------------------------------------

## 4. Step 3: Defining Variables

``` python
X = df["Expense_X"].values.reshape(-1, 1)
Y = df["Income_Y"].values
```

**Explanation:**

- `X` (independent variable): the expense values.
- `Y` (dependent variable): the income values.
- `.reshape(-1, 1)` converts the one-dimensional X array into the two-dimensional `(n_samples, n_features)` layout that scikit-learn models require. `Y` can stay one-dimensional.
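To see what `.reshape(-1, 1)` changes, the following minimal sketch applies it to a small array. The values are made up for illustration.

```python
import numpy as np

# Hypothetical expense values, for illustration only
expenses = np.array([10.0, 20.0, 30.0, 40.0])

# One-dimensional array: a flat sequence of 4 numbers
print(expenses.shape)  # (4,)

# Two-dimensional column: 4 samples with 1 feature each.
# -1 asks NumPy to infer the row count from the array length.
X_demo = expenses.reshape(-1, 1)
print(X_demo.shape)    # (4, 1)
```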

------------------------------------------------------------------------

## 5. Step 4: Fitting the Linear Regression Model

``` python
model = LinearRegression()
model.fit(X, Y)
a = model.intercept_
b = model.coef_[0]
```

**Explanation:**

- `LinearRegression()` creates the model object.
- `.fit(X, Y)` trains the model: it computes the best-fit line by minimizing the sum of squared residuals (ordinary least squares).
- `a` is the **intercept**, the predicted income when expense = 0.
- `b` is the **slope**, the change in predicted income for each one-unit increase in expense. `model.coef_` holds one coefficient per feature, so `[0]` selects the only one.
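For a single predictor, the least-squares slope and intercept have a closed form, which the sketch below computes with NumPy alone. The data are hypothetical and chosen so the answer is easy to check by hand. On the same inputs, `model.coef_[0]` and `model.intercept_` should agree with these values up to floating-point rounding.

```python
import numpy as np

# Hypothetical data, for illustration only: exactly y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Closed-form ordinary least squares for one predictor:
#   b = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   a = y_mean - b * x_mean
x_mean, y_mean = x.mean(), y.mean()
b_manual = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a_manual = y_mean - b_manual * x_mean

print(f"slope = {b_manual:.3f}, intercept = {a_manual:.3f}")
# slope = 2.000, intercept = 1.000
```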

------------------------------------------------------------------------

## 6. Step 5: Predictions and R² Calculation

``` python
Y_pred = model.predict(X)
r2 = r2_score(Y, Y_pred)
```

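**Explanation:**

- `model.predict(X)` generates predicted income values from the expense values.
- `r2_score(Y, Y_pred)` measures the proportion of the variation in income that the regression model explains.
- An R² value close to 1 means the fitted line accounts for most of the variation in income; a value near 0 means it explains little. Because R² is computed here on the same data used for fitting, it describes goodness of fit, not performance on new data.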
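The definition behind `r2_score` can be written out directly as R² = 1 − SS_res / SS_tot. The sketch below uses a small hypothetical set of observed and fitted values, not the script's data.

```python
import numpy as np

# Hypothetical observed values, for illustration only
y_obs = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
# Fitted values from the least-squares line 2.2 + 0.6x at x = 1..5
y_fit = np.array([2.8, 3.4, 4.0, 4.6, 5.2])

ss_res = np.sum((y_obs - y_fit) ** 2)         # unexplained variation
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)  # total variation around the mean
r2_manual = 1 - ss_res / ss_tot

print(f"R^2 = {r2_manual:.4f}")  # R^2 = 0.6000
```

Here the line explains 60% of the variation in the observed values.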

------------------------------------------------------------------------

## 7. Step 6: Displaying Regression Results

``` python
print(f"Regression equation: Y = {a:.3f} + {b:.3f}X")
print(f"Slope (b): {b:.6f}")
print(f"Intercept (a): {a:.6f}")
print(f"Coefficient of determination (R²): {r2:.6f}")
```

**Explanation:**

- Prints the regression equation with the coefficients rounded to three decimal places.
- Prints the slope, intercept, and R² to six decimal places for closer inspection.
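One formatting detail: with a negative slope, the equation template above would print `+ -0.750X`. A possible alternative, sketched here with hypothetical coefficient values, is the `+` format flag, which always emits the sign of the number.

```python
# Hypothetical coefficients, for illustration only
a_demo, b_demo = 12.5, -0.75

# The "+" flag prints an explicit sign for both positive and negative values
equation = f"Y = {a_demo:.3f} {b_demo:+.3f}X"
print(equation)  # Y = 12.500 -0.750X
```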

------------------------------------------------------------------------

## 8. Step 7: Interpretation

``` python
print(f"- Slope (b): For every 1-unit increase in Expense, Income increases by {b:.2f} units.")
print(f"- Intercept (a): When Expense = 0, predicted Income is {a:.2f}.")
print(f"- R² = {r2:.4f}: {r2*100:.2f}% of variation in Income is explained by Expense.")
```

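**Explanation:**

- The slope (b) quantifies the rate of change: the expected change in income per one-unit increase in expense. The printed wording "increases by" assumes a positive slope.
- The intercept (a) is the predicted income at zero expense. It is meaningful only if expense values near 0 fall within the range of the data; otherwise it is an extrapolation.
- R² multiplied by 100 expresses the model's explanatory power as a percentage.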
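The reading of the slope and intercept can be checked numerically. The sketch below uses hypothetical coefficients and a helper named `predict_income`, which is introduced here for illustration and is not part of the script.

```python
# Hypothetical coefficients, for illustration only
a_demo, b_demo = 2.2, 0.6

def predict_income(expense):
    """Evaluate the fitted line a + b * expense."""
    return a_demo + b_demo * expense

# At expense = 0 the prediction is the intercept
at_zero = predict_income(0)

# Predictions one expense unit apart differ by the slope
difference = predict_income(11) - predict_income(10)

print(f"at zero: {at_zero:.2f}, per-unit change: {difference:.2f}")
# at zero: 2.20, per-unit change: 0.60
```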

------------------------------------------------------------------------

## 9. Step 8: Correlation Coefficient

``` python
corr = np.corrcoef(df["Expense_X"], df["Income_Y"])[0, 1]
print(f"Correlation coefficient (r): {corr:.6f}")
```

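**Explanation:**

- `np.corrcoef` returns the 2×2 correlation matrix; `[0, 1]` selects the **Pearson correlation coefficient** between Expense and Income, a measure of their linear association.
- r ranges from -1 to +1. Values near +1 indicate a very strong positive relationship, values near -1 a very strong negative one, and values near 0 little linear association.
- In simple linear regression, r² equals R².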
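With one predictor, the squared Pearson coefficient equals the regression's R². The sketch below checks this on hypothetical data. It fits the line with `np.polyfit` instead of scikit-learn to keep the example short; both give the same least-squares line.

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Pearson correlation coefficient from the off-diagonal of the 2x2 matrix
r = np.corrcoef(x, y)[0, 1]

# Least-squares line; for degree 1, polyfit returns (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)
y_fit = intercept + slope * x
r2 = 1 - np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"r = {r:.4f}, r^2 = {r**2:.4f}, R^2 = {r2:.4f}")
# r = 0.7746, r^2 = 0.6000, R^2 = 0.6000
```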

------------------------------------------------------------------------

## 10. Step 9: Plotting the Scatter Diagram and Regression Line

``` python
plt.figure(figsize=(8, 6))
plt.scatter(df["Expense_X"], df["Income_Y"], color='blue', label="Data points")
X_line = np.linspace(df["Expense_X"].min(), df["Expense_X"].max(), 200).reshape(-1, 1)
Y_line = model.predict(X_line)
plt.plot(X_line, Y_line, color='red', linewidth=2, label="Regression line")
```

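**Explanation:**

- `plt.scatter()` draws the raw data as a scatter plot.
- `np.linspace()` generates 200 evenly spaced expense values between the smallest and largest observed expense, so the line is drawn only across the range of the data. The result is reshaped into a column so it can be passed to `model.predict()`.
- `plt.plot()` overlays the regression line on the scatter plot.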
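As a small illustration of how the line's x-values are produced, the sketch below calls `np.linspace` with hypothetical bounds and only five points so the output is easy to read.

```python
import numpy as np

# Hypothetical bounds, for illustration only; the script uses the
# minimum and maximum of the Expense_X column and 200 points
x_line = np.linspace(10.0, 50.0, 5)
print(x_line)  # [10. 20. 30. 40. 50.]

# Reshape into a column so it matches the (n_samples, n_features) layout
x_column = x_line.reshape(-1, 1)
print(x_column.shape)  # (5, 1)
```

Both endpoints are included, so the plotted line spans the full observed range.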

------------------------------------------------------------------------

## 11. Step 10: Enhancing the Plot

``` python
plt.xlabel("Expense (X)")
plt.ylabel("Income (Y)")
plt.title("Scatter Plot of Income (Y) vs Expense (X)")
plt.legend()
plt.grid(True)
eq_text = f"Y = {a:.2f} + {b:.4f}X\n$R^2$ = {r2:.4f}"
plt.annotate(eq_text, xy=(0.05, 0.95), xycoords="axes fraction", fontsize=10,
             verticalalignment="top", bbox=dict(boxstyle="round,pad=0.4", alpha=0.15))
plt.tight_layout()
plt.savefig("regression_plot.png", dpi=150)
plt.show()
```

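**Explanation:**

- Labels the axes and sets the title for clarity.
- Adds a legend to distinguish the data points from the regression line, and turns on the grid.
- Annotates the plot with the regression equation and R² value near the top-left corner (`xy=(0.05, 0.95)` in axes-fraction coordinates).
- `plt.tight_layout()` adjusts the padding so labels are not clipped.
- Saves the figure as `regression_plot.png` in the working directory at 150 dpi, then displays it with `plt.show()`. Saving before showing is the safer order, since the figure can be discarded once the display window closes.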
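When the script runs somewhere without a display (for example, on a server), one option is to render straight to a file. The sketch below assumes Matplotlib's non-interactive `Agg` backend and uses hypothetical points. It writes to a temporary directory, not the working directory, and the file name `regression_plot_demo.png` is chosen for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt

# Hypothetical points, for illustration only
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x_demo, y_demo, color="blue", label="Data points")
ax.set_xlabel("Expense (X)")
ax.set_ylabel("Income (Y)")
ax.legend()
fig.tight_layout()

# Save to a temporary directory so the example leaves no file behind in the project
out_path = os.path.join(tempfile.mkdtemp(), "regression_plot_demo.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)  # free the figure's memory once it is saved

print(os.path.exists(out_path))  # True
```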

------------------------------------------------------------------------


