For What It’s Worth: Measuring Land Value in the Era of Big Data and Machine Learning

This paper develops a new method for valuing land, a key asset on a nation’s balance sheet. The method first uses an unsupervised machine learning technique, k-means clustering, to discretize unobserved heterogeneity, which we then combine with a supervised learning algorithm, gradient boosted trees (GBT), to obtain property-level price predictions and estimates of the land component. Our initial results from a large national dataset show that this approach routinely outperforms hedonic regression methods (as used by the U.K.’s Office for National Statistics, for example) in out-of-sample price predictions. To exploit the best of both methods, we further explore a composite approach using model stacking, finding that it outperforms all other methods in out-of-sample tests and in a benchmark test against nearby vacant land sales. In an application, we value residential, commercial, industrial, and agricultural land for the entire contiguous U.S. from 2006 to 2015. The results offer new insights into valuation and demonstrate how a unified method can build national and subnational estimates of land value from detailed, parcel-level data. We discuss further applications to economic policy and the property valuation literature more generally.
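To make the pipeline described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes scikit-learn and NumPy, and it runs on synthetic parcels generated inside the script; the variable names, the choice of 25 clusters, the clustering on coordinates, and all hyperparameters are illustrative assumptions rather than values from the paper. It shows the three pieces named in the abstract: k-means labels standing in for unobserved heterogeneity, a GBT model that uses those labels, and a stacked model that combines GBT with a linear hedonic regression. It does not attempt the decomposition into a land component.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, purely illustrative parcels (not data from the paper).
rng = np.random.default_rng(0)
n = 2000
lon = rng.uniform(0, 10, n)
lat = rng.uniform(0, 10, n)
log_lot = np.log(rng.lognormal(0.0, 0.5, n))
log_bldg = np.log(rng.lognormal(0.0, 0.5, n))
# A spatially varying premium plays the role of unobserved heterogeneity.
premium = np.sin(lon) + np.cos(lat)
log_price = 0.4 * log_lot + 0.5 * log_bldg + premium + rng.normal(0, 0.1, n)

X = np.column_stack([log_lot, log_bldg, lon, lat])
X_train, X_test, y_train, y_test = train_test_split(
    X, log_price, test_size=0.25, random_state=0
)

# Step 1 (unsupervised): cluster parcels on coordinates, training data only.
N_CLUSTERS = 25  # assumed value for illustration
km = KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0)
km.fit(X_train[:, 2:])


def add_cluster_dummies(features):
    """Append one-hot cluster membership to the feature matrix."""
    labels = km.predict(features[:, 2:])
    return np.hstack([features, np.eye(N_CLUSTERS)[labels]])


Xc_train = add_cluster_dummies(X_train)
Xc_test = add_cluster_dummies(X_test)

# Baseline: hedonic regression, linear in characteristics and coordinates.
hedonic = LinearRegression().fit(X_train, y_train)

# Step 2 (supervised): gradient boosted trees on features plus cluster dummies.
gbt = GradientBoostingRegressor(n_estimators=150, max_depth=3, random_state=0)
gbt.fit(Xc_train, y_train)

# Step 3: stack a linear model and GBT; the default final estimator (RidgeCV)
# learns how to weight their cross-validated predictions.
stacked = StackingRegressor(
    estimators=[
        ("hedonic", LinearRegression()),
        ("gbt", GradientBoostingRegressor(n_estimators=150, max_depth=3,
                                          random_state=0)),
    ],
    cv=5,
)
stacked.fit(Xc_train, y_train)


def rmse_of(model, features, target):
    """Out-of-sample root mean squared error in log-price units."""
    return float(np.sqrt(np.mean((model.predict(features) - target) ** 2)))


rmse = {
    "hedonic": rmse_of(hedonic, X_test, y_test),
    "gbt": rmse_of(gbt, Xc_test, y_test),
    "stacked": rmse_of(stacked, Xc_test, y_test),
}
for name, value in rmse.items():
    print(f"{name:8s} out-of-sample RMSE: {value:.3f}")
```

Fitting k-means on the training split only keeps the held-out parcels out of the clustering step, so the comparison stays out-of-sample. Any error figures this script prints describe the synthetic data alone and say nothing about the paper's results.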


Scott Wentland, Gary Cornwall, and Jeremy G. Moulton

JEL Code(s) E01 Published