Examining the utility of the random forest ensemble and remotely sensed image data to predict Pinus patula forest age in KwaZulu-Natal, South Africa.
Mapping forest age is important for effective forest inventory because age is indicative of a number of plant physiological processes. Field survey techniques have traditionally been used to collect forest inventory data, but these methods are costly and time-consuming. Remote sensing offers a time-effective and cost-effective alternative that can cover large areas. The aim of this research was to assess the capabilities of multispectral and hyperspectral remotely sensed image data, together with the random forest statistical method, for Pinus patula age prediction. The first section of this study used spatial and spectral data derived from multispectral QuickBird imagery to predict forest age. Five co-occurrence texture measures (variance, contrast, correlation, homogeneity, and dissimilarity) were calculated on QuickBird panchromatic imagery (0.6 m spatial resolution) using 12 moving window sizes. The spectral data were extracted from visible and near infrared (NIR) QuickBird imagery (2.4 m spatial resolution). Using the random forest ensemble, various methods of combining the spectral and texture variables were evaluated. The best model was achieved using backward variable selection, which aims to find the smallest number of input variables while maintaining the highest predictive accuracy. Only five of the original 64 variables were used in the final model (R² = 0.68). The second part of this study examined the utility of the random forest ensemble and AISA Eagle hyperspectral image data to predict P. patula age. Random forest was used to determine the optimal subset of hyperspectral bands for predicting P. patula age. Two sequential variable selection methods were tested: forward and backward variable selection. Although both methods resulted in the same root mean square error (3.097), the backward variable selection method was unable to reduce the large hyperspectral dataset substantially and selected 206 variables for the model. 
The forward variable selection method successfully reduced the large dataset to only nine optimal bands while maintaining the highest predictive accuracy from the hyperspectral dataset (R² = 0.6). Overall, we concluded that (i) remotely sensed data can produce accurate models for P. patula age prediction, (ii) random forest is an effective tool for combining spectral and spatial multispectral data, (iii) random forest is an effective tool for variable selection in a high-dimensional hyperspectral dataset, and (iv) although random forest has mainly been used as a classifier, it is also a very effective tool for prediction.
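The forward variable selection idea described above can be illustrated with a minimal sketch. It shows one common greedy formulation: start with no variables, repeatedly add the candidate that lowers the model error most, and stop when no candidate improves it. The abstract does not specify the study's exact procedure, so this may differ from it. The function name `forward_select`, the `min_gain` parameter, the band names, and the `toy_error` scorer are all hypothetical. In practice the scorer would be an error estimate, such as the root mean square error, from a random forest fitted on the candidate subset.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    error_of: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedy forward selection: grow a subset while the error keeps falling.

    error_of(subset) must return a model error (lower is better) for a list
    of variable names. Here it is a stand-in for a random forest error.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best_error = float("inf")
    while remaining:
        # Try adding each remaining variable to the current subset.
        trials = [(error_of(selected + [c]), c) for c in remaining]
        trial_error, best_candidate = min(trials, key=lambda t: t[0])
        # Stop when the best addition no longer improves the error enough.
        if best_error - trial_error <= min_gain:
            break
        selected.append(best_candidate)
        remaining.remove(best_candidate)
        best_error = trial_error
    return selected, best_error


# Hypothetical toy scorer (NOT data from the study): two bands carry signal,
# and every added band costs a small penalty, mimicking noise variables.
INFORMATIVE = {"b2": 4.0, "b5": 2.5}


def toy_error(subset: List[str]) -> float:
    signal = sum(INFORMATIVE.get(name, 0.0) for name in subset)
    return 10.0 - signal + 0.05 * len(subset)


bands = ["b1", "b2", "b3", "b4", "b5", "b6"]
chosen, error = forward_select(bands, toy_error)
print(chosen, error)
```

Backward selection can be sketched the same way by starting from the full set and removing, at each step, the variable whose removal increases the error least. With hundreds of hyperspectral bands, each step of either direction refits the model once per candidate, so the cost grows quickly with the number of bands.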