Explosive growth in the size of spatial databases has highlighted the need for spatial
data analysis and spatial data mining techniques to mine the interesting but implicit
spatial patterns within these large databases. Extracting useful and interesting patterns
from massive geo-spatial datasets is important for many application domains,
such as regional economics, ecology and environmental management, public safety,
transportation, public health, business, and travel and tourism [14, 57, 59], because
the data in all of these domains are tied to locations in space. Many classical data mining algorithms, such as linear regression,
assume that the learning samples are independently and identically distributed
(i.i.d.). Spatial data violate this assumption because of spatial autocorrelation
[2, 57]: observations at nearby locations tend to have similar values. In such cases,
classical linear regression yields a weak model with low prediction accuracy [59]
and residual errors that exhibit spatial dependence. Modeling spatial dependencies
explicitly improves overall classification and prediction accuracy.
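The effect described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the cited studies: the 20 x 20 grid, the rook-contiguity neighborhood, the parameter values, and the helper names `rook_weights` and `morans_i`. Synthetic values are generated so that each observation depends on the average of its grid neighbors, an ordinary least-squares line is fitted, and Moran's I (a standard measure of spatial autocorrelation) is computed on the residuals.

```python
import numpy as np


def rook_weights(rows, cols):
    """Row-normalized rook-contiguity weight matrix for a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            # Neighbors share an edge: up, down, left, right.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[i, rr * cols + cc] = 1.0
    # Each row sums to 1, so W @ v averages v over the neighbors.
    return W / W.sum(axis=1, keepdims=True)


def morans_i(values, W):
    """Moran's I of `values` under the weight matrix W."""
    z = values - values.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)


rng = np.random.default_rng(0)
rows = cols = 20                      # illustrative grid size
n = rows * cols
W = rook_weights(rows, cols)

rho, beta = 0.8, 2.0                  # illustrative parameter values
x = rng.normal(size=n)
eps = rng.normal(size=n)              # i.i.d. noise
# Spatially dependent response: y = rho * W y + beta * x + eps.
y = np.linalg.solve(np.eye(n) - rho * W, beta * x + eps)

# Classical linear regression that ignores the spatial dependence.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

moran_residuals = morans_i(residuals, W)
moran_noise = morans_i(eps, W)
print("Moran's I of OLS residuals:", moran_residuals)
print("Moran's I of i.i.d. noise: ", moran_noise)
```

In this setup the i.i.d. noise gives a Moran's I close to zero, while the residuals of the ordinary regression remain clearly positively autocorrelated, which is the symptom the paragraph describes.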
The spatial autoregression model (SAR) [18, 31, 57] is a generalization of the
linear regression model to account for spatial autocorrelation. It has been successfully
used to analyze spatial datasets in regional economics and ecology [14, 59].
The model yields better classification and prediction accuracy [14, 59] than classical
linear regression for many spatial datasets exhibiting strong spatial autocorrelation.
However, estimating the parameters of SAR is computationally expensive. For example,
a spatial dataset with 10,000 observation points can take an hour of computation on a single
IBM Regatta processor using a 1.3 GHz pSeries 690 Power4 architecture with 3.2
GB of memory [32, 33]. This cost has limited the use of SAR to small problems, despite its
promise to improve classification and prediction accuracy for larger spatial datasets.
For example, SAR was applied to accurately estimate crop parameters [61] using
airborne spectral imagery; however, the study was limited to 74 pixels. A second
study, reported in [41], was limited to 3,888 observation points.
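For reference, the form in which SAR is commonly written in the spatial statistics literature is sketched below. The notation is not defined in this section and should be read as an assumption: y is the n-by-1 vector of observations, X the matrix of explanatory variables, β the vector of regression coefficients, W the n-by-n neighborhood matrix, ρ the spatial autoregression parameter, and ε an i.i.d. error term. The second line additionally assumes Gaussian errors with variance σ².

```latex
\[
  y = \rho W y + X\beta + \epsilon
\]
\[
  \ell(\rho, \beta, \sigma^2)
    = \ln\lvert I - \rho W \rvert
      - \frac{n}{2}\ln\!\left(2\pi\sigma^2\right)
      - \frac{1}{2\sigma^2}
        \left(y - \rho W y - X\beta\right)^{\mathsf{T}}
        \left(y - \rho W y - X\beta\right)
\]
```

Setting ρ = 0 recovers classical linear regression, y = Xβ + ε. Under the Gaussian assumption, the log-likelihood contains the log-determinant of the n-by-n matrix I − ρW, and evaluating this term repeatedly for candidate values of ρ is a commonly cited reason why parameter estimation becomes expensive as the number of observation points grows.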