Wei Kang is a Research Scientist at the UCR Inland Center for Sustainable Development (ICSD). Her research interests are both methodological (spatial statistics and spatial econometrics) and empirical (housing, neighborhood change, inequality, growth, and convergence). Her current projects focus on affordable housing and housing policies. She is a core developer of PySAL, a widely used open-source Python library for spatial analysis.
PhD in Geography, 2018
Arizona State University
MSc in Cartology and GISystem, 2014
BSc in Geographic Information System, 2011
NSF projects she has participated in include:
Because spatial statistics are essential to geographical inquiry, accessible and flexible software offering the relevant functionality is highly desirable. The Python Spatial Analysis Library (PySAL) is an effort toward this end. It is an open-source Python library and ecosystem hosting a wide array of spatial statistical and visualization methods. Since its first public release in 2010, PySAL has been applied to a variety of research questions, used as teaching material in regular classes and in conference workshops serving a wide audience, and integrated into general GIS software such as ArcGIS and QGIS. This entry first gives an overview of the history of PySAL and its recent development. This is followed by a discussion of PySAL's new hierarchical structure and of two different modes of accessing PySAL's functionality to perform spatial statistical tasks, including exploratory spatial data analysis, spatial regression, and geovisualization. Next, the entry discusses how to find and use materials for learning and applying the spatial statistical functions in PySAL, and how to get involved with the PySAL community as a user and prospective developer. The entry ends with a brief discussion of the future development of PySAL.
There has been a recent surge in research on urban transformations in the United States via empirical analysis of neighborhood sequences. Alignment-based sequence analysis (SA) methods have seen many applications in urban neighborhood change research. However, it is unclear to what extent these methods are robust in terms of producing consistent and converging neighborhood sequence typologies. This article sheds light on this issue by applying four sequence analysis methods to the same data set, the 50 largest Metropolitan Statistical Areas (MSAs) of the United States from 1970 to 2010. It finds that these methods do not produce converging neighborhood sequence typologies and that their behavior varies across MSAs, which prevents meaningful comparisons between similar studies. MSAs with higher average household income in 1970 tend to be less sensitive to the choice of SA method. In other words, when investigating neighborhood change in these MSAs, different SA methods tend to produce more similar neighborhood sequence typologies. By contrast, MSAs whose neighborhoods experienced frequent changes during the period 1970–2010 are less likely to yield similar typologies under different SA methods. In addition, there is a large difference between the neighborhood sequence typologies obtained from the classic SA methods with varying costs and those obtained from the SA variant focusing on the second-order sequence property. After comparing the behavior of these methods, we highlight one method (OMecenter), which leverages the socioeconomic similarities of neighborhoods, and suggest that researchers consider it as a building block for designing a meaningful sequence analysis method for neighborhood change research.
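To make the role of the cost setting concrete, the following is a minimal sketch of an alignment-based (optimal matching) distance between two neighborhood sequences. It is not the article's implementation: the function names, the two toy tracts, the indel cost of 1, and the rank-based substitution cost (a hypothetical stand-in for a cost that reflects socioeconomic similarity) are all assumptions made for illustration.

```python
def om_distance(seq_a, seq_b, sub_cost, indel=1.0):
    """Optimal matching distance: cheapest edit script turning seq_a into seq_b."""
    n, m = len(seq_a), len(seq_b)
    # dist[i][j] holds the cheapest cost of aligning seq_a[:i] with seq_b[:j].
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel
    for j in range(1, m + 1):
        dist[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = seq_a[i - 1], seq_b[j - 1]
            substitute = dist[i - 1][j - 1] + (0.0 if a == b else sub_cost(a, b))
            delete = dist[i - 1][j] + indel
            insert = dist[i][j - 1] + indel
            dist[i][j] = min(substitute, delete, insert)
    return dist[n][m]


def constant_cost(a, b):
    # Assumed cost: every substitution between different types costs 2.
    return 2.0


def rank_cost(a, b):
    # Hypothetical cost: types are integers ordered by socioeconomic status,
    # so swapping adjacent types is cheaper than swapping distant ones.
    return float(abs(a - b))


# Two made-up tracts observed in five census years (types coded 1 to 4).
tract_x = [1, 1, 2, 3, 3]
tract_y = [1, 2, 2, 3, 4]

d_const = om_distance(tract_x, tract_y, constant_cost)
d_rank = om_distance(tract_x, tract_y, rank_cost)
print(d_const, d_rank)
```

The same pair of tracts is farther apart under the constant cost than under the similarity-based cost, which is one way that different cost settings can lead a clustering step to different typologies.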
Income mobility measures provide convenient and concise ways to reveal the dynamic nature of regional income distributions. Statistical inference about these measures is important, especially when comparing two regional income systems. Although the sampling distributions of the relevant estimators and test statistics have been derived asymptotically, their properties in small-sample settings and in the presence of contemporaneous spatial dependence within a regional income system are underexplored. We approach these issues via a series of Monte Carlo experiments, which require a novel data generating process capable of producing spatially dependent time series given a transition probability matrix and a specified level of spatial dependence. Results suggest that when the sample size is small, the mobility estimator is biased, while spatial dependence inflates its asymptotic variance, raising the Type I error rate for a one-sample test. For the two-sample test of the difference in mobility between two regional economic systems, the size becomes increasingly upward biased with stronger spatial dependence in either income system. Conclusions about differences in mobility between two regional systems therefore need to be drawn with caution, because the presence of spatial dependence can lead to false positives. In light of this, we suggest adjustments to the critical values of the relevant test statistics.
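As a small illustration of what a mobility measure summarizes, the sketch below computes the trace-based Shorrocks index, M = (k - tr(P)) / (k - 1), from a k-by-k transition probability matrix. The abstract does not state which measures were studied, so the choice of this index is an assumption, and both matrices are made-up examples.

```python
def shorrocks_mobility(P):
    """Trace-based mobility index: 0 when nobody changes class, 1 when rows are uniform."""
    k = len(P)
    trace = sum(P[i][i] for i in range(k))
    return (k - trace) / (k - 1)


# Made-up three-class income systems (each row sums to 1).
P_sticky = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
]
P_fluid = [[1 / 3] * 3 for _ in range(3)]

m_sticky = shorrocks_mobility(P_sticky)
m_fluid = shorrocks_mobility(P_fluid)
print(round(m_sticky, 3), round(m_fluid, 3))
```

In practice P is estimated from data, so the index inherits sampling error; that estimated quantity is what the one-sample and two-sample tests described above are about.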
Empirical applications of the Markov chain model and its spatial extensions suffer from issues induced by a sparse transition probability matrix, which usually results from adopting maximum likelihood estimators (MLEs). Two discrete kernel estimators with cross-validated parameters are proposed for reducing the sparsity of the estimated transition probability matrix. Monte Carlo experiments suggest that these estimators are not only effective in producing a much less sparse matrix, alleviating the issues related to sparsity, but also superior to MLEs in lowering the mean squared error for individual and total transition probabilities, yielding better recovery of the underlying dynamics.
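The sparsity problem is easy to reproduce. The sketch below computes the maximum likelihood estimate of a transition probability matrix (each observed transition count divided by its row total) from three short, made-up class sequences and counts the cells estimated as exactly zero. The function name and data are illustrative assumptions; the discrete kernel estimators themselves are not sketched here because the abstract does not give their form.

```python
def mle_transition_matrix(sequences, k):
    """MLE of a first-order Markov chain: row-normalized transition counts."""
    counts = [[0] * k for _ in range(k)]
    for seq in sequences:
        # Count each move from one period's class to the next period's class.
        for origin, destination in zip(seq, seq[1:]):
            counts[origin][destination] += 1
    P = []
    for row in counts:
        total = sum(row)
        # A row with no observations is left as zeros in this sketch.
        P.append([c / total if total else 0.0 for c in row])
    return P


# Made-up class sequences (classes coded 0, 1, 2) for three regions.
sequences = [
    [0, 0, 1, 1],
    [1, 1, 2, 2],
    [2, 2, 2, 1],
]

P_hat = mle_transition_matrix(sequences, k=3)
zero_cells = sum(p == 0.0 for row in P_hat for p in row)
print(P_hat)
print(zero_cells)
```

With so few observations, a third of the cells are estimated as impossible transitions simply because they were never observed, which is the behavior a smoothed estimator is meant to soften.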
Scale is a fundamental geographic concept, and a substantial literature exists discussing the various roles that scale plays in different geographical contexts. Relatively little work exists, though, that provides a means of measuring the geographic scale over which different processes operate. Here we demonstrate how geographically weighted regression (GWR) can be adapted to provide such measures. GWR explores the potential spatial nonstationarity of relationships and provides a measure of the spatial scale at which processes operate through the determination of an optimal bandwidth. Classical GWR, however, assumes that all of the processes being modeled operate at the same spatial scale. The work here relaxes this assumption by allowing different processes to operate at different spatial scales. This is achieved by deriving an optimal bandwidth vector in which each element indicates the spatial scale at which a particular process takes place. This new version of GWR is termed multiscale geographically weighted regression (MGWR). It is similar in intent to Bayesian nonseparable spatially varying coefficients (SVC) models, although it potentially provides a more flexible and scalable framework in which to examine multiscale processes. Model calibration and bandwidth vector selection in MGWR are conducted using a back-fitting algorithm. We compare the performance of GWR and MGWR by applying both frameworks to two simulated data sets with known properties and to an empirical data set on the Irish famine. Results indicate that MGWR not only is superior in replicating parameter surfaces with different levels of spatial heterogeneity but also provides valuable information on the scale at which different processes operate.
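The link between bandwidth and scale can be seen in a minimal sketch of the local fit at the core of classical GWR: a weighted least squares regression at a focal location, with weights that decay with distance. The Gaussian kernel, the single predictor, the fixed bandwidths, and the toy data are all assumptions made for illustration. Bandwidth selection and the MGWR back-fitting step, which would assign each covariate its own bandwidth, are not implemented here.

```python
import math


def gaussian_weights(coords, focal, bandwidth):
    """Kernel weights that shrink as observations get farther from the focal point."""
    return [math.exp(-0.5 * (math.dist(c, focal) / bandwidth) ** 2) for c in coords]


def local_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x with weights w."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data: the true slope is 1 in the west cluster and 3 in the east cluster.
coords = [(0, 0), (1, 0), (2, 0), (8, 0), (9, 0), (10, 0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]

# A small bandwidth recovers the local relationships.
_, slope_west = local_fit(x, y, gaussian_weights(coords, (1, 0), bandwidth=1.0))
_, slope_east = local_fit(x, y, gaussian_weights(coords, (9, 0), bandwidth=1.0))
# A very large bandwidth weights all points almost equally, approaching global OLS.
_, slope_global = local_fit(x, y, gaussian_weights(coords, (5, 0), bandwidth=1000.0))
print(round(slope_west, 3), round(slope_east, 3), round(slope_global, 3))
```

A small bandwidth picks up the local slopes, while a very large one averages them into a single global slope; this is why an estimated bandwidth can be read as an indicator of the scale at which a process varies.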