Introduction to Simple Linear Regression

In the previous article, we introduced the motivation behind econometrics and the role it plays in the field of economics. We also briefly discussed the concept of an econometric model, which is essentially an equation that captures the relationship between variables. Today we will go further into the econometric discussion by examining the most fundamental model: Simple Linear Regression (SLR).

Suppose we have collected data for two variables, and we want to use a Simple Linear Regression model to estimate their relationship. The equation to link the two variables (let’s call them x and y) would be as follows (Wooldridge 21):

$y_i = \beta_0 + \beta_1 x_i + \mu_i$

where:

$y_i$ is the dependent variable, or regressand
$x_i$ is the independent variable, or regressor
$\beta_0$ is the intercept parameter
$\beta_1$ is the slope parameter
$\mu_i$ is the error term

Immediately, you may realize that this formula is nearly identical to the slope-intercept form of a line. Indeed, the idea behind the SLR model is to produce a line that best represents the collection of data points (i.e. the regression line). We can denote the estimated line (also called a line of fitted values) as follows:

$\hat{y_i} = \hat{\beta_0} + \hat{\beta_1}x_i$

Notice that while $y_i$, $\beta_0$ and $\beta_1$ are being estimated here, $x_i$ is not. This means we can plug in any value for $x$ and the model will return an estimate of $y$ given our estimated intercept and slope. Another value we are particularly interested in is the residual: the vertical distance between a data point and our fitted line, positive when the point lies above the line and negative when it lies below.

$\hat{\mu_i} = y_i - \hat{y_i}$

Source: Introductory Econometrics, A Modern Approach
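To make the fitted values and residuals concrete, here is a minimal Python sketch. The data points and the two coefficient estimates are made up purely for illustration (they do not come from any source), and the variable names are our own.

```python
# Hypothetical data and hypothetical estimates, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta0_hat = 0.0  # assumed intercept estimate
beta1_hat = 2.0  # assumed slope estimate

# Fitted values: y_hat_i = beta0_hat + beta1_hat * x_i
y_hat = [beta0_hat + beta1_hat * xi for xi in x]

# Residuals: mu_hat_i = y_i - y_hat_i (positive when the point is above the line)
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]

for xi, yi, yhi, ui in zip(x, y, y_hat, residuals):
    print(f"x={xi:.1f}  y={yi:.1f}  fitted={yhi:.1f}  residual={ui:+.1f}")
```

Each residual is simply the observed value minus the value the line predicts at the same $x_i$.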

We have yet to discuss how to obtain the estimates for the slope and intercept parameters. Though we have formulas for $\hat{y_i}$ and $\hat{\mu_i}$, we still require $\hat{\beta_0}$ and $\hat{\beta_1}$ to calculate them. We will not delve too deeply into the mathematical derivation, but it is important to understand the idea behind obtaining the estimates.

Often, you will see the term OLS, or ordinary least squares, used in conjunction with simple linear regression. This refers to the method of estimating $\hat{\beta_0}$ and $\hat{\beta_1}$. The idea behind this method revolves around the residual. Recall that the residual is the distance between a data point and the fitted value (on the line). The goal is to choose $\hat{\beta_0}$ and $\hat{\beta_1}$ so that the sum of squared residuals is as small as possible, that is (Wooldridge 27):

$\min \sum_{i=1}^{n} \hat{\mu_i}^2 = \min \sum_{i=1}^{n} (y_i - \hat{y_i})^2 = \min \sum_{i=1}^{n}(y_i - \hat{\beta_0} - \hat{\beta_1}x_i)^2$

To do so, we take the partial derivatives of this sum with respect to $\hat{\beta_0}$ and $\hat{\beta_1}$ and set them to zero. The solution to the resulting system of equations minimizes the sum of squared residuals. We call the following equations the first order conditions:

(1) $-2 \sum_{i=1}^{n}(y_i - \hat{\beta_0} - \hat{\beta_1}x_i) = 0$

(2) $-2 \sum_{i=1}^{n} x_i(y_i - \hat{\beta_0} - \hat{\beta_1}x_i) = 0$
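
The algebra that turns these two conditions into closed-form estimates can be sketched in two steps (an outline only, not a full derivation). Dividing (1) by $-2n$ and writing $\bar{x}$ and $\bar{y}$ for the sample means gives:

$\bar{y} - \hat{\beta_0} - \hat{\beta_1}\bar{x} = 0 \quad \Rightarrow \quad \hat{\beta_0} = \bar{y} - \hat{\beta_1}\bar{x}$

Substituting this expression for $\hat{\beta_0}$ into (2) and dropping the constant $-2$ leaves:

$\sum_{i=1}^{n} x_i \left[ (y_i - \bar{y}) - \hat{\beta_1}(x_i - \bar{x}) \right] = 0$

Because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$, we have $\sum_{i=1}^{n} x_i(y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ and $\sum_{i=1}^{n} x_i(x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2$. The slope can then be isolated, provided the $x_i$ are not all equal (so that $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$).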

With some algebra, one can see that (Stock and Watson 115):

$\hat{\beta_1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) (y_i - \bar{y}) }{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

$\hat{\beta_0} = \bar{y} - \hat{\beta_1}\bar{x}$
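
As a rough illustration, the two formulas can be computed directly in a few lines of Python. This is a minimal sketch using only the standard library; the function name `ols_fit` and the data are hypothetical, invented for this example. It also checks numerically that the resulting residuals satisfy the two first order conditions.

```python
from statistics import fmean


def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) from the closed-form OLS formulas."""
    x_bar, y_bar = fmean(x), fmean(y)
    # Numerator and denominator of the slope formula
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Made-up data for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0_hat, beta1_hat = ols_fit(x, y)
residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]

# First order conditions: both sums should be zero up to rounding error
foc1 = sum(residuals)
foc2 = sum(xi * ui for xi, ui in zip(x, residuals))

print(f"intercept = {beta0_hat:.4f}, slope = {beta1_hat:.4f}")
print(f"sum of residuals = {foc1:.2e}, sum of x * residuals = {foc2:.2e}")
```

With these made-up numbers the slope is about 1.99 and the intercept about 0.05. In practice one would normally rely on a statistics package rather than hand-rolled formulas, but writing them out shows that nothing more than sample means and sums is involved.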

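At this point, the theory may feel overwhelming and far removed from practice. Let us now consider an application of regression in finance using the capital asset pricing model (CAPM), where $r$ is the return on a stock, $r_f$ is the risk-free rate and $r_m$ is the return on the market portfolio: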

$r - r_f = \beta(r_m - r_f)$

$\beta$ captures the sensitivity of stock returns to changes in returns of the market portfolio (Brealey et al 386). For example, Apple’s current $\beta$ is reported at 1.32 (Google Finance) with respect to NASDAQ. If $\beta$ is greater than 1, the stock is more sensitive to market movements and therefore carries more market risk than the market portfolio. Conversely, a $\beta$ of less than 1 suggests that the stock is less risky. The market portfolio always has a beta of 1. As we can see, the capital asset pricing model looks similar to a simple linear regression model. If we include an error term, we can use OLS to estimate the parameter $\beta$ by regressing the equity risk premium $(r - r_f)$ on the market premium $(r_m - r_f)$ (Stock and Watson 118).
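
The regression described above might look like the following Python sketch. Everything numeric here is an assumption for illustration: the excess returns are simulated rather than real market data, and `TRUE_BETA`, the sample size and the volatilities are arbitrary choices. The sketch also estimates an intercept, which the CAPM equation itself does not contain; under the model it should be close to zero.

```python
import random
from statistics import fmean

random.seed(0)  # fixed seed so the simulated sample is reproducible

TRUE_BETA = 1.3  # assumed value used only to generate the fake data
n = 500          # assumed number of return observations

# Simulated excess returns: market premium (r_m - r_f) and stock premium (r - r_f)
market_excess = [random.gauss(0.005, 0.04) for _ in range(n)]
stock_excess = [TRUE_BETA * m + random.gauss(0.0, 0.02) for m in market_excess]

# OLS slope and intercept from regressing the stock premium on the market premium
m_bar, s_bar = fmean(market_excess), fmean(stock_excess)
beta_hat = sum(
    (m - m_bar) * (s - s_bar) for m, s in zip(market_excess, stock_excess)
) / sum((m - m_bar) ** 2 for m in market_excess)
alpha_hat = s_bar - beta_hat * m_bar

print(f"estimated beta  = {beta_hat:.3f} (data generated with {TRUE_BETA})")
print(f"estimated alpha = {alpha_hat:.4f}")
```

With real data, the two lists would be replaced by observed returns in excess of the risk-free rate over matching periods; the estimation step stays the same.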

The Simple Linear Regression model serves as a building block for many more complex models. In future articles, we will study the underlying assumptions on which the linear regression model depends. We will also explore strategies to mitigate the issues that may arise in our regression analysis when those assumptions fail. Furthermore, we will see that this model can be extended to more than one regressor.

Sources:

Brealey, Richard A., Stewart C. Myers, Alan J. Marcus, Devashis Mitra, and William Lim. Fundamentals of Corporate Finance. 6th ed. New York: McGraw-Hill, 2011. Print.
Stock, James H., and Mark W. Watson. Introduction to Econometrics. 3rd ed. Boston: Pearson/Addison Wesley, 2007. Print.
Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 6th ed. Boston: Cengage Learning, 2013. Print.

UW Economics Society
