House price prediction with gradient boosted trees under different loss functions

The paper is published as a New Journal Paper and can be downloaded here

Abstract of the Paper 
Many banks and credit institutions are required to assess the value of dwellings in their mortgage portfolio. This valuation often relies on an Automated Valuation Model (AVM). Moreover, these institutions often report the model's accuracy by two numbers: the fraction of predictions within ±20% and within ±10% of the true values. Until recently, AVMs tended to be hedonic regression models, but lately machine learning approaches like random forest and gradient boosted trees have been increasingly applied. Both the traditional approaches and the machine learning approaches rely on minimising mean squared prediction error, and not on maximising the number of predictions in the ±20% and ±10% range. We investigate whether introducing a loss function closer to the AVM's actual loss measure improves performance in machine learning approaches, specifically for a gradient boosted tree approach. This loss function yields an improvement from 89.4% to 90.0% of predictions within ±20% of the true value on a data set of N = 126719 transactions from the Norwegian housing market between 2013 and 2015, with the biggest improvements in performance coming from the lower price segments. We also find that a weighted average of models with different loss functions improves performance further, yielding 90.4% of the observations within ±20% of the true value.
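
As an illustration of the reporting measure described in the abstract, the sketch below computes the fraction of predictions within a given relative tolerance of the true price, and forms a weighted average of two sets of predictions. It is a minimal example under stated assumptions: the function names `fraction_within` and `blend`, the toy prices, and the equal weights are all hypothetical and are not taken from the paper. The paper's alternative loss function is not specified in the abstract and is not reproduced here.

```python
def fraction_within(y_true, y_pred, tolerance):
    """Share of predictions whose relative error is at most `tolerance`.

    Hypothetical helper: a prediction counts as a hit when
    |pred - true| <= tolerance * |true|.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        abs(pred - true) <= tolerance * abs(true)
        for true, pred in zip(y_true, y_pred)
    )
    return hits / len(y_true)


def blend(pred_a, pred_b, weight_a):
    """Weighted average of two prediction lists (weights sum to one)."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have equal length")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Toy data, invented for illustration only.
true_prices = [100.0, 200.0, 300.0, 400.0]
model_a = [95.0, 250.0, 310.0, 330.0]   # e.g. a model fitted with squared error
model_b = [110.0, 190.0, 280.0, 420.0]  # e.g. a model fitted with another loss

share_20 = fraction_within(true_prices, model_a, 0.20)  # 0.75
share_10 = fraction_within(true_prices, model_a, 0.10)  # 0.5

blended = blend(model_a, model_b, 0.5)
blended_share_20 = fraction_within(true_prices, blended, 0.20)  # 1.0
```

In this toy case the blended predictions land inside the ±20% band more often than model A alone, which mirrors the kind of effect the abstract reports for a weighted average of models, though the numbers here carry no empirical meaning.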

Published 10. November 2022 - 17:20 - Updated 10. November 2022 - 17:20