All posts by Quinlan Lee

OLS, BLUE and the Gauss Markov Theorem

From left to right, Carl Friedrich Gauss and Andrey Markov, known for their contributions to statistical methods.

In today’s article, we will extend our knowledge of the Simple Linear Regression Model to the case where there is more than one explanatory variable. Under certain conditions, the Gauss Markov Theorem assures us that through the Ordinary Least Squares (OLS) method of estimating parameters, our regression coefficients are the Best Linear Unbiased Estimators, or BLUE (Wooldridge 101). However, if these underlying assumptions are violated, there are undesirable implications to the usage of OLS.

In practice, it is almost impossible to find two economic variables that share a perfect relationship captured by the Simple Linear Regression Model. For example, suppose we are interested in measuring wage for different people in Canada. While it is plausible to assume that education is a valid explanatory variable, most people would agree it is certainly not the only one. Indeed, one may include work experience (in years), age, gender or perhaps even location as regressors.

As such, suppose we have collected data for multiple variables, x_1, …, x_k, and y. Through a Multiple Linear Regression Model, we can estimate the relationship between y and the various regressors x_1, …, x_k (Wooldridge 71):

    \[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i \]

  • y_i is the ith observation of the dependent variable
  • x_{ki} is the ith observation of the kth regressor
  • \beta_k is the coefficient of the kth regressor
  • \varepsilon_i is the error term

As in the simple case, we can use the Ordinary Least Squares (OLS) method to derive the estimates for our coefficients in the Multiple Linear Regression Model. Recall, our goal is to minimize the sum of squared residuals, that is (Wooldridge 73):

    \[ \min \sum_{i=1}^{n} \hat{\varepsilon_i}^2 = \min \sum_{i=1}^{n} (y_i - \hat{\beta_0} - \hat{\beta_1}x_{1i} - \dots - \hat{\beta_k}x_{ki})^2 \]

If we take the partial derivatives of the above expression with respect to \beta_0, \beta_1, …, \beta_k and set them to zero, the result is a system of k+1 equations. The solution to this system produces the estimates for each coefficient.
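To make the mechanics concrete, here is a minimal Python sketch that solves the normal equations arising from these first-order conditions. The data and coefficients below are simulated assumptions, purely for illustration:

```python
# Sketch: estimating multiple-regression coefficients by solving the
# normal equations X'X b = X'y, the matrix form of setting the k+1
# partial derivatives of the sum of squared residuals to zero.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(12, 2, n)            # e.g. years of education (made up)
x2 = rng.normal(10, 5, n)            # e.g. years of experience (made up)
eps = rng.normal(0, 1, n)
y = 1.0 + 0.8 * x1 + 0.3 * x2 + eps  # assumed true coefficients: 1.0, 0.8, 0.3

X = np.column_stack([np.ones(n), x1, x2])  # column of ones for the intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # estimates close to the assumed coefficients
```

Statistical packages perform an equivalent (numerically more careful) computation, but the estimates come from exactly this system of equations.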

In general, the OLS method of estimation is preferred because it is easy to use and understand. However, simplicity comes with limitations. Ordinary Least Squares provides us with a linear estimator of the parameters in Multiple Linear Regression; in other words, we obtain a column vector of coefficient estimates that can be expressed as a linear function of the dependent variable y. Like all other linear estimators, the ultimate goal of OLS is to obtain the BLUE. Let us first agree on a formal definition of BLUE: “best” means the estimator has the lowest variance among linear unbiased estimators, while unbiasedness means that the expected value of the estimator equals the true value of the parameter (Wooldridge 102).

We now turn our attention to the Gauss Markov Theorem, which guarantees that the Ordinary Least Squares estimator is BLUE under certain conditions, colloquially referred to as the Gauss Markov Assumptions. It is important to note that the first four ensure the unbiasedness of the linear estimator, while the last one preserves the lowest variance (Wooldridge 105).

  1. Linearity in Parameters
  2. Random Sampling
  3. No Perfect Collinearity
  4. Exogeneity
  5. Homoscedasticity

The first two assumptions are self-explanatory: the parameters we are estimating must enter the model linearly, and our sample data must be collected through a randomized, probabilistic mechanism. The third condition, no perfect collinearity, ensures that the regressors are not perfectly correlated with one another. An example of this is including both outcomes of a binary variable in a model. Suppose we are interested in official language preferences: if we were to add both English and French as regressors, the model would exhibit perfect collinearity, because anyone who prefers English does not prefer French at the same time. Mathematically, if both were indicator variables, we could not separately identify their effects because exactly one of them always takes the value 1. Exogeneity means that the regressors cannot be correlated with the error term. Its opposite is endogeneity, examples of which include omitted variable bias, reverse causality, and measurement error. The fifth and final assumption is homoscedasticity: the variance of the error term must be constant no matter what the values of the regressors are.
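The dummy-variable trap described above can be checked numerically. In this hypothetical sketch, including both the English indicator and the French indicator alongside an intercept makes the design matrix rank deficient:

```python
# Sketch of perfect collinearity: the intercept column equals the sum
# of the two indicator columns, so the design matrix loses a rank.
import numpy as np

english = np.array([1, 0, 1, 1, 0, 0])   # made-up sample of 6 people
french = 1 - english                     # perfectly collinear with english

X = np.column_stack([np.ones(6), english, french])
# Three columns, but only rank 2: intercept = english + french,
# so X'X is singular and OLS cannot identify all three coefficients.
print(np.linalg.matrix_rank(X))  # 2
```

Dropping one of the two indicators (or the intercept) restores full rank, which is why software conventionally omits one category of a dummy variable.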

Admittedly, no one will ever walk up to you and ask, “What are the conditions for the Gauss Markov Theorem?” However, as the first article alluded to a few weeks ago, we need to use econometric models with discretion. To put the importance of these assumptions into perspective, consider this analogy. The criminal code is in place so that the citizens of our country can function well together without harming one another. A police officer will never come up to you and ask you to recite the criminal code, but when you start violating the laws, you will likely find yourself in trouble. It is important for us to identify when we are breaking the law, and to find methods to avoid doing so. The same can be said of OLS: by learning the five assumptions, we know the possible issues we may run into when performing linear regression.

In summary, let’s end the discussion of OLS with more insights on the Gauss Markov Theorem. If all five conditions hold simultaneously, we know that OLS is BLUE. In later articles, we will discuss specific ways to mitigate violations of these conditions. For example, when endogeneity is present (the fourth assumption is violated), our OLS estimator will be biased. We will talk about methods to address this issue, such as performing an Instrumental Variable Estimation to obtain consistent estimates.



  1. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western Cengage Learning, 2013. Print.

Introduction to Simple Linear Regression

In the previous article, we introduced the motivation behind econometrics and the role it plays in the field of economics. We also briefly discussed the concept of an econometric model, which was essentially an equation that captures the relationship between variables. Today we will leap further into the econometric discussion by examining the most fundamental model: Simple Linear Regression (SLR).

Suppose we have collected data for two variables, and we want to use a Simple Linear Regression model to estimate their relationship. The equation to link the two variables (let’s call them x and y) would be as follows (Wooldridge 21):

    \[ y_i = \beta_0 + \beta_1 x_i + \mu_i \]


  • y_i is the dependent variable, or regressand
  • x_i is the independent variable, or regressor
  • \beta_0 is the intercept parameter
  • \beta_1 is the slope parameter
  • \mu_i is the error term

Immediately, you may realize that this formula is nearly identical to the slope-intercept form of a line. Indeed, the idea behind the SLR model is to produce a line that best represents the collection of data points (i.e. the regression line). We can denote the estimated line (also called the line of fitted values) as follows:

    \[ \hat{y_i} = \hat{\beta_0} + \hat{\beta_1}x_i \]

Notice that while y_i, \beta_0 and \beta_1 are being estimated here, x_i is not. This means we can plug in any arbitrary value for x and the model will estimate a value for y given our slope and intercept parameters. Another value we are particularly interested in is the residual: the distance between a data point and our fitted line.

    \[ \hat{\mu_i} = y_i - \hat{y_i} \]

Source: Introductory Econometrics, A Modern Approach

We have yet to discuss how to obtain the estimates for the slope and intercept parameters. Though we have formulas for \hat{y_i} and \hat{\mu_i}, we still require \hat{\beta_0} and \hat{\beta_1} to calculate them. We will not delve too deeply into the mathematical derivation, but it is important to understand the idea behind obtaining the estimates.

Often, you will see the term OLS or ordinary least squares used in conjunction with simple linear regression. This refers to the method of estimating \hat{\beta_0} and \hat{\beta_1}. The idea behind this method revolves around the residual. Recall that the residual is the distance between a data point and the fitted value (on the line). The goal is to minimize the sum of squared residuals with respect to \beta_0 and \beta_1, that is (Wooldridge 27):

    \[ \min \sum_{i=1}^{n} \hat{\mu_i}^2 = \min \sum_{i=1}^{n} (y_i -\hat{y_i})^2 = \min \sum_{i=1}^{n}(y_i - \hat{\beta_0} -\hat{\beta_1}x_i)^2 \]

We will need to take the partial derivatives with respect to \beta_0 and \beta_1 and set them to zero. The solution to the resulting system of equations will minimize the sum of squared residuals. We call the following equations the first order conditions:

(1)   \begin{equation*}  -2 \sum_{i=1}^{n}(y_i - \hat{\beta_0} - \hat{\beta_1}x_i) = 0 \end{equation*}

(2)   \begin{equation*}  -2 \sum_{i=1}^{n} x_i(y_i - \hat{\beta_0} - \hat{\beta_1}x_i) = 0 \end{equation*}

With some algebra, one can see that (Stock and Watson 115):

    \[ \hat{\beta_1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) (y_i - \bar{y}) }{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]

    \[ \hat{\beta_0} = \bar{y} - \hat{\beta_1}\bar{x} \]
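As a quick illustration of these formulas, the following sketch computes \hat{\beta_1} and \hat{\beta_0} on a small made-up data set:

```python
# Sketch: closed-form OLS estimates for simple linear regression,
# computed directly from the formulas above. Data are fabricated.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Slope: covariance of x and y over the variance of x (in sum form).
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: forces the fitted line through the point of means.
beta0_hat = y.mean() - beta1_hat * x.mean()

residuals = y - (beta0_hat + beta1_hat * x)
print(beta1_hat, beta0_hat)   # 1.96, 0.14
print(residuals.sum())        # ~0: the first FOC forces residuals to sum to zero
```

Note how the second printed value confirms first order condition (1): the residuals from an OLS fit with an intercept always sum to (numerically) zero.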

At this point, you may be overwhelmed with the theory and lack of practicality of the SLR model. Let us now consider an application of regressions in Finance using the capital asset pricing model:

    \[ r - r_f = \beta(r_m - r_f) \]

\beta captures the sensitivity of stock returns to changes in the returns of the market portfolio (Brealey et al 386). For example, Apple’s current \beta is reported at 1.32 (Google Finance) with respect to the NASDAQ. If \beta is greater than 1, we know the stock is riskier than the market portfolio; conversely, a \beta less than 1 suggests that the stock is less risky. The market portfolio itself always has a \beta of 1. As we can see, the capital asset pricing model looks similar to a simple linear regression model. If we include an error term, we can use OLS to estimate the parameter \beta by regressing the equity risk premium (r - r_f) on the market premium (r_m - r_f) (Stock and Watson 118).
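As a hedged illustration (the returns below are simulated with an assumed \beta of 1.3, not actual Apple or NASDAQ data), estimating \beta by OLS might look like:

```python
# Sketch: estimating a CAPM beta by regressing a stock's excess returns
# on the market's excess returns. All numbers are simulated assumptions.
import numpy as np

rng = np.random.default_rng(1)
market_premium = rng.normal(0.005, 0.04, 250)   # r_m - r_f, 250 "daily" obs
true_beta = 1.3                                 # assumed sensitivity
equity_premium = true_beta * market_premium + rng.normal(0, 0.01, 250)

# Include an intercept (alpha), as is standard practice in market-model
# regressions, and fit by least squares.
X = np.column_stack([np.ones(250), market_premium])
alpha_hat, beta_hat = np.linalg.lstsq(X, equity_premium, rcond=None)[0]
print(alpha_hat, beta_hat)  # alpha near 0, beta near the assumed 1.3
```

A fitted \hat{\beta} above 1 would flag the simulated stock as riskier than the market, mirroring the interpretation above.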

The Simple Linear Regression Model serves as a building block for many more complex models. In future articles, we will study the underlying assumptions upon which the linear regression model depends. If those conditions fail, we will explore strategies to mitigate the potential issues that may arise during our regression analysis. Furthermore, we will see that this model can be extended to more than just one regressor.


  1. Brealey, Richard A., Stewart C. Myers, Alan J. Marcus, Devashis Mitra, and William Lim. Fundamentals of Corporate Finance. 6th ed. New York: McGraw-Hill, 2011. Print.
  2. Stock, James H., and Mark W. Watson. Introduction to Econometrics. 3rd ed. Boston: Pearson/Addison Wesley, 2007. Print.
  3. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 6th ed. Boston: Cengage Learning, 2013. Print.

Introduction to Econometrics

Today, we live in a world where information is at one’s disposal with the single click of a button. The result of this is a growing demand for methods in understanding, analyzing, and modelling data. In economics, we refer to the development and usage of statistical techniques as econometrics.

Econometrics begins with an economic question: a relationship we are interested in studying. This could be a theory we wish to test, or a policy whose effects we are trying to understand. Once we have posed a question, we can hypothesize a model that we believe captures the relationship (Wooldridge 2). For example, consider the economic question, “What affects a person’s wage rate?” Suppose that we believe education is a factor, and that the relationship is captured by the equation:

    \[ Wage = f(Education) \]

Perhaps to most of us, this follows quite naturally, despite not having done any form of analysis. However, an econometrician will often tell you this is not the case and a deeper study of this question is required.

From an econometric perspective, the above equation fails to answer two crucial questions:

  • What is the magnitude in which education affects wage?
  • Is education the only factor in determining wage?

Luckily, we can quickly transform our equation into an econometric regression model.

    \[ Wage_i = \beta_0 + \beta_1 \times Education_i + \epsilon_i \]

By collecting enough data (sets of information), we are able to conduct an empirical analysis to determine the estimates for the parameters \beta_0 and \beta_1. The result is a relationship explained through a numerical equation, backed by a set of observations. With statistical tests, we can also assess the strength of our model. If it is weak, we know that some important factors may be missing, such as age. We can continue this process until an optimal model is achieved.
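To illustrate, here is a small sketch that estimates \beta_0 and \beta_1 on fabricated wage and education data (the coefficients and noise level are assumptions, not empirical results):

```python
# Sketch: fitting the wage equation above on made-up data. The assumed
# "true" relationship is wage = 5.0 + 1.5 * education + noise.
import numpy as np

rng = np.random.default_rng(42)
education = rng.uniform(8, 20, 500)                   # years of schooling
wage = 5.0 + 1.5 * education + rng.normal(0, 3, 500)  # hourly wage

X = np.column_stack([np.ones(500), education])
beta0_hat, beta1_hat = np.linalg.lstsq(X, wage, rcond=None)[0]
# beta1_hat estimates the extra hourly wage per additional year of education
print(beta0_hat, beta1_hat)  # near the assumed 5.0 and 1.5
```

With real survey data the fit would be far noisier, which is exactly where the statistical tests mentioned above come in.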

While econometric methods do carry enormous predictive and analytical power, they cannot be used indiscriminately. In economics, we are highly interested in causality through ceteris paribus: keeping all other things equal (Wooldridge 12). However, it is often difficult to create controlled experiments that achieve this condition. Thus, when creating econometric models, it is imperative that we preserve ceteris paribus when selecting our sample of data. For example, assume the government made a policy change so that families with under $50,000 household income would receive a subsidy for their child’s education. We could measure the effects of this subsidy by observing the same group of families through the policy change. However, if different samples of families were drawn before and after the subsidy, we may inadvertently capture the variation across households in addition to the effects of the policy change.

Another area to consider is the interpretation of statistical correlation. Sometimes, econometric models report substantial correlation between two variables. In particular, time-series data (variables that change over time) often exhibit correlation if not corrected for when estimating. Indeed, statistical correlation does not imply causation. For example, consider the number of pregnancies and the number of doctors in a city. There may be data showing strong correlation between the two variables, but we know that this is likely a coincidence and not a causal effect.
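The time-series caveat can be demonstrated with a short simulation: two independent random walks often show substantial sample correlation, which largely disappears once the trends are differenced away:

```python
# Sketch: spurious correlation between two independent trending series.
# Neither series causes the other, yet their levels may correlate strongly.
import numpy as np

rng = np.random.default_rng(7)
walk_a = np.cumsum(rng.normal(size=1000))  # independent random walk
walk_b = np.cumsum(rng.normal(size=1000))  # another independent random walk

corr = np.corrcoef(walk_a, walk_b)[0, 1]
print(corr)  # often far from zero despite independence

# Differencing removes the trends; the correlation of the changes is tiny.
diff_corr = np.corrcoef(np.diff(walk_a), np.diff(walk_b))[0, 1]
print(diff_corr)
```

This is one reason time-series regressions are typically run on differenced or detrended data rather than raw levels.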

Econometrics uses a set of statistical tools in economic settings to derive conclusions and empirical results. With the prolific demand for studying data, it is important for us as economists to understand this field of study. In a society surrounded by data, econometrics is our key to better understand and observe the world.


  1. Wooldridge, Jeffrey M. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western Cengage Learning, 2013. Print.