Fishing for neighborhood trajectory patterns - a comparison of sequence analysis methods


Recent years have seen a surge in research on urban transformations in the United States through empirical analysis of neighborhood trajectories. Driven by an interest in the social and economic restructuring of cities and associated consequences such as gentrification and displacement, this work seeks to uncover emergent patterns in the evolution of neighborhood socioeconomic characteristics over time. Using census tracts as a proxy for neighborhoods, the empirical analysis typically comprises two rounds of clustering: the first round classifies neighborhoods into discrete types based on selected socioeconomic attributes, yielding a temporal trajectory of types for each neighborhood; the second round takes the trajectory similarity matrix produced by sequence analysis as input to derive a typology of prototypical neighborhood trajectories. The optimal matching (OM) algorithm, which was developed originally for matching protein and DNA sequences in biology and used extensively for analyzing strings in computer science, has become the dominant sequence analysis technique in the neighborhood literature. It works by finding the minimum cost of transforming one sequence into another using a combination of operations including substitution, insertion, deletion and transposition. The costs of these operations can be parameterized in different ways and may be theory-driven or data-driven. Applications in the neighborhood literature often adopt the data-driven approach, based either on socioeconomic dissimilarities in contemporary experience or on empirical transition probabilities between neighborhood types. It is unclear, however, how sensitive the trajectory typology is to the choice of operation costs, or which trajectory patterns each choice most readily “fishes out”. It should also be noted that the current literature focuses solely on substitution costs while setting the costs of other operations so high that they are unlikely to be chosen in the OM process.
This means that current research considers only one sequence characteristic when determining the similarity of any two neighborhoods, namely the year in which a specific neighborhood type appears. We argue that considering other characteristics, including the order in which successive neighborhood types appear and the duration of a neighborhood type, could help reveal trajectory patterns that are critically important for understanding urban socioeconomic transformations. Incorporating other cost choices or sequence analysis methods that are effective in “fishing out” these sequence characteristics is therefore promising for neighborhood change research. In this article, we support these arguments through an empirical review of sequence analysis methods applicable to uncovering different aspects of neighborhood trajectory patterns. We do so by applying these methods to the same data: all 383 Metropolitan Statistical Areas (MSAs) of the United States in census years 1970, 1980, 1990, 2000 and 2010. We demonstrate that a method or cost choice can be effective in revealing one particular characteristic of neighborhood evolution while failing to provide useful information on others, and thus researchers should exercise caution both when adopting a method and when interpreting results.
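
As a rough illustration of how the choice of insertion/deletion (indel) cost changes what OM treats as similar, the following is a minimal sketch and not the implementation used in this work. It assumes three hypothetical neighborhood types labeled A, B and C, a constant substitution cost of 1, and two made-up five-period trajectories; it also omits the transposition operation for brevity.

```python
from typing import Dict, Sequence, Tuple


def om_distance(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Dict[Tuple[str, str], float],
    indel: float,
) -> float:
    """Minimum total cost of turning sequence a into sequence b using
    substitutions, insertions and deletions (dynamic programming)."""
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn the first i states of a
    # into the first j states of b.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel  # delete everything seen so far
    for j in range(1, m + 1):
        d[0][j] = j * indel  # insert everything needed so far
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            s = 0.0 if same else sub_cost[(a[i - 1], b[j - 1])]
            d[i][j] = min(
                d[i - 1][j - 1] + s,    # substitute (or match)
                d[i - 1][j] + indel,    # delete a state from a
                d[i][j - 1] + indel,    # insert a state from b
            )
    return d[n][m]


# Hypothetical types and a constant substitution cost (an assumption).
types = "ABC"
sub = {(p, q): 1.0 for p in types for q in types if p != q}

# Two made-up trajectories with the same order of types (A, B, C)
# but transitions that happen one period apart.
x = "AABBC"
y = "ABBCC"

timing_sensitive = om_distance(x, y, sub, indel=100.0)  # indels priced out
order_sensitive = om_distance(x, y, sub, indel=0.5)     # cheap indels
```

When indels are prohibitively expensive, the distance reduces to a position-by-position comparison, so the two trajectories differ in the two periods where their types disagree. With cheap indels, the same pair is aligned by shifting one sequence, and the shared ordering of types yields a smaller distance.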

Nov 10, 2018 10:00 AM
San Antonio, TX