What is spatial intelligence? For our purposes, it is data related to location. Spatial intelligence applies at any scale, from a room or a building to a city or a planet, and may range from the occupancy information of a single workstation to one’s global web of social-media connections. Human beings have remarkable spatial intuition, since we move through a range of spaces every day. But today’s abundance of data from devices and systems allows for even greater spatial understanding, and digital tools and scalable algorithms, including machine learning, let us run ever more complex analyses and visualizations on that information. To demonstrate, we applied these resources to an example question: Where might Amazon situate its second North American headquarters?
Let’s kick things off with an extensive list of caveats, the first being this: We are in no way associated with the selection of Amazon’s forthcoming headquarters. Second, the company’s decision will be driven by intangibles and by financial negotiations between developers, states, and municipalities, which is why Amazon has an open RFP that accepts a single response from each city. Finally, since this theoretical exercise relies solely on publicly available open data sets, we are limiting our analysis to the United States, as we do not even have access to international data points.
What Does Our Intuition Say?
Let us begin with intuition. We distributed an internal survey to over 100 of our practice area leaders and to members of our Cities + Sites group. Below are the results, with the answers based on each respondent’s personal metrics, experiences, and gut intuition.
Top U.S. Survey Results
Geo-Spatial Data Mapping
Our next step was to apply data insight to the problem. We started with a series of demographic variables tied to regions in the continental U.S.
These data visualizations confirmed many things we knew and affirmed some things we suspected. The U.S. population, and younger Americans in particular, is concentrated in major metropolitan areas, with smaller college towns as outliers. These urban areas feature dense transportation networks around major airport hubs. The Midwest’s open plains offer great opportunity for wind-power generation, while current (well-tracked) solar installations remain most effective in the Southwest and in the Northeast, which functions as one continuous super-MSA (metropolitan statistical area).
For our next step, we narrowed our nationwide dataset to the areas of focus: MSAs and regions within a 45-minute drive of an international airport. For each MSA, we included a broad series of demographic indicators such as age, ethnicity, education, employment, commuting time, commuting methods, and migration patterns.
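As a minimal sketch of that first filter, assuming each MSA already carries a precomputed drive time to its nearest international airport (the article does not say how drive times were obtained), the selection reduces to a threshold test. The `Msa` record, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Msa:
    """Hypothetical MSA record; the article does not describe its data schema."""
    name: str
    drive_minutes_to_intl_airport: float  # assumed to be precomputed elsewhere


MAX_DRIVE_MINUTES = 45.0  # threshold stated in the article


def within_airport_range(msas: list[Msa], max_minutes: float = MAX_DRIVE_MINUTES) -> list[Msa]:
    """Keep MSAs whose drive time to an international airport is at most max_minutes."""
    return [m for m in msas if m.drive_minutes_to_intl_airport <= max_minutes]


candidates = [
    Msa("Example A", 25.0),
    Msa("Example B", 60.0),
    Msa("Example C", 45.0),
]
print([m.name for m in within_airport_range(candidates)])  # ['Example A', 'Example C']
```

Treating "within 45 minutes" as inclusive (`<=`) is a judgment call; a strict cutoff would drop Example C.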
Interactive Data Analysis
For our first exercise, we combined multiple variables into single metrics so that we could visualize differences between areas. Next, to reinforce our core belief in maintaining diversity, we calculated the variance across several indicators of race, income, and education. We also computed the share of white-collar jobs in each MSA, since it could hint at poachable talent and existing business infrastructure. Finally, we spatially intersected our MSA regions with other, more granular datasets, aggregating information about road density; wind-power potential; post-secondary education levels; and average and total Medicare discharges and spending.
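The spatial-intersection step can be illustrated with a small, self-contained sketch. The article does not name the GIS tool it used, and a real workflow would rely on a spatial library and actual MSA boundary files; here a pure-Python ray-casting point-in-polygon test stands in, and the square "MSAs" and sample values are hypothetical. Each sample point (for example, a wind-power estimate for one grid cell) is assigned to the region containing it, and the values are then totaled and averaged per region, mirroring the "average and total" aggregation described above.

```python
from collections import defaultdict

# (x, y) coordinate pair, e.g. (longitude, latitude)
Point = tuple[float, float]


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Ray casting: a point is inside if a ray to its right crosses the boundary an odd number of times."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_by_region(regions: dict[str, list[Point]],
                        samples: list[tuple[Point, float]]) -> dict[str, dict]:
    """Total, mean, and count of sample values falling inside each region.

    Regions that contain no samples are omitted from the result.
    """
    hits: dict[str, list[float]] = defaultdict(list)
    for pt, value in samples:
        for name, polygon in regions.items():
            if point_in_polygon(pt, polygon):
                hits[name].append(value)
    return {
        name: {"total": sum(vals), "mean": sum(vals) / len(vals), "count": len(vals)}
        for name, vals in hits.items()
    }


# Hypothetical square "MSAs" and point samples (e.g. wind-power potential per grid cell).
regions = {
    "MSA 1": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "MSA 2": [(20.0, 0.0), (30.0, 0.0), (30.0, 10.0), (20.0, 10.0)],
}
samples = [((5.0, 5.0), 2.0), ((1.0, 9.0), 4.0), ((25.0, 5.0), 10.0), ((50.0, 50.0), 99.0)]
print(aggregate_by_region(regions, samples))
```

The brute-force loop checks every point against every polygon, which is fine for a sketch; spatial indexes are what make this practical at national scale.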
The resulting interactive map is embedded below. You can use it to explore subsets of the data by filtering and heat-mapping variables, singly or in combination. Here is a sample analysis using the map’s capabilities.
The advantage of our method is that we can easily intersect multiple variables, turning data sets on and off as needed. This lets us highlight and understand the underlying differences between MSAs and view the results in interactive maps. Amazon is using its impact and success in Seattle to shape the parameters of its RFP for a second headquarters, and we can visually compare its current home against other potential cities.
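Turning layers on and off and blending them into a single heat-map value can be sketched as follows. The article does not say how it normalized or weighted variables, so min-max scaling and an equal-weight average are assumptions here, and the layer names and values are hypothetical.

```python
def min_max_normalize(values: dict[str, float]) -> dict[str, float]:
    """Rescale one variable to [0, 1] across MSAs so variables in different units can be combined."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        return {msa: 0.0 for msa in values}  # a constant layer carries no contrast
    return {msa: (v - lo) / span for msa, v in values.items()}


def composite_score(layers: dict[str, dict[str, float]], enabled: set[str]) -> dict[str, float]:
    """Equal-weight average of the normalized layers that are switched on."""
    if not enabled:
        raise ValueError("enable at least one layer")
    active = [min_max_normalize(layers[name]) for name in sorted(enabled)]
    msas = active[0].keys()
    return {msa: sum(layer[msa] for layer in active) / len(active) for msa in msas}


# Hypothetical layers keyed by MSA; names and values are illustrative only.
layers = {
    "road_density": {"A": 2.0, "B": 4.0, "C": 6.0},
    "white_collar_pct": {"A": 50.0, "B": 30.0, "C": 40.0},
    "wind_potential": {"A": 0.1, "B": 0.9, "C": 0.4},
}
# "Toggle" two layers on and leave wind potential off.
print(composite_score(layers, {"road_density", "white_collar_pct"}))  # {'A': 0.5, 'B': 0.25, 'C': 0.75}
```

Normalizing first keeps a variable measured in large units (such as percentages) from swamping one measured in small units; it is also the "layer of abstraction" that blending introduces, since the combined score no longer has a physical unit.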
Below is a closer look at the Seattle MSA with all data layers visible. We can visually contrast it with other MSAs of interest, including Atlanta, Chicago, Dallas, Denver, and Philadelphia.
Interactive Map: What do you see?
Below is the interactive map that you can explore and use to draw your own insights. You can scroll through variables on the right and toggle widgets to show and hide data. Each variable’s histogram reveals the distribution of its data over a period of time. Adjust layer visibility and heat-map variables to style the map, and then filter by ranges to subset the data.
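The map tool itself is not named in the article, so this sketch only mimics the two widget behaviors described above: a histogram of one variable and range filters that subset the data. Equal-width binning, combining filters with AND logic, and all field names and values are assumptions.

```python
def histogram(values: list[float], bins: int) -> list[int]:
    """Count values in equal-width bins spanning [min, max]; the top edge goes in the last bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 when every value is identical
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def filter_by_ranges(rows: list[dict], ranges: dict[str, tuple[float, float]]) -> list[dict]:
    """Keep rows whose value for every filtered variable lies within its [low, high] range."""
    return [
        row for row in rows
        if all(low <= row[var] <= high for var, (low, high) in ranges.items())
    ]


# Hypothetical per-MSA rows; field names and values are illustrative only.
rows = [
    {"msa": "A", "median_age": 34.0, "commute_min": 28.0},
    {"msa": "B", "median_age": 38.0, "commute_min": 22.0},
    {"msa": "C", "median_age": 31.0, "commute_min": 35.0},
    {"msa": "D", "median_age": 36.0, "commute_min": 25.0},
]
print(histogram([r["median_age"] for r in rows], bins=2))  # [2, 2]
subset = filter_by_ranges(rows, {"median_age": (30.0, 36.0), "commute_min": (0.0, 30.0)})
print([r["msa"] for r in subset])  # ['A', 'D']
```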
Interactive Map Instructions
Click and drag to pan, and use the -/+ buttons to zoom in and out. Hold Shift and drag a box with the left mouse button to zoom to that region.
Applying Advanced Data Science
We hope we have demonstrated how we can harness and analyze public data to produce insights that we can then map in visual, interactive ways. We can draw from a wide range of census data, research conducted by government agencies, and even satellite imagery. By choosing and blending specific variables, we can also target complex goals such as diversity and sustainability.
Our process offers great possibilities. However, selecting and modifying each variable can be time-consuming. Enter machine learning, a branch of data science whose algorithms learn patterns directly from data rather than relying on hand-tuned rules. These algorithms perform at scale; they also let users add more variables and produce increasingly detailed results, all in a time-efficient way.
Question: What other cities have a similar composition to Seattle, where Amazon’s first headquarters is located?
Machine learning allows us to get more granular, incorporating 203 individual data points for each of the 8,136 zip codes that spatially intersected one of our 53 MSAs. We also included the 17 variables that we aggregated via spatial intersects or calculated as a variance of demographic indicators for race, income, and education. Having amassed 1,652,509 data points, we used a dimensionality-reduction method known as t-distributed Stochastic Neighbor Embedding (t-SNE) to project this high-dimensional data into two dimensions, where similar zip codes cluster together; we also opted for a distribution built on Principal Component Analysis (PCA), another means of directing an analysis to its most meaningful variables.
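The data-point total checks out if the 17 aggregated variables are read as MSA-level values counted once per MSA (our interpretation; the article does not show the calculation). The same reading gives the 220 variables per zip code used in the clustering:

```latex
\begin{aligned}
203 \times 8{,}136 &= 1{,}651{,}608 && \text{zip-code-level data points}\\
17 \times 53 &= 901 && \text{MSA-level variables}\\
1{,}651{,}608 + 901 &= 1{,}652{,}509 && \text{total data points}\\
203 + 17 &= 220 && \text{variables per zip code}
\end{aligned}
```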
Our aim was to group similar zip codes based on all 220 variables we provided. To do so, we selected the data we wanted the algorithm to operate on, but we did not specify the combinations or weightings it should use to reduce the data’s dimensions into a 2D field of points. This was an example of unsupervised machine learning, and here are the results.
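The article does not name its software, so the sketch below assumes Python with scikit-learn and NumPy. It standardizes a zip-code-by-variable matrix and embeds it in 2D with `sklearn.manifold.TSNE`, using `init="pca"` as one reading of the PCA-based distribution mentioned above (an interpretation, not something the source confirms). The feature matrix is synthetic stand-in data, and the `embed_zip_codes` helper is hypothetical. No labels or variable weights are passed to the algorithm, which is what makes the step unsupervised.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def embed_zip_codes(features: np.ndarray, perplexity: float = 30.0, seed: int = 0) -> np.ndarray:
    """Project an (n_zip_codes, n_variables) matrix to 2D coordinates with t-SNE."""
    # Zero mean, unit variance per variable, so no single variable dominates distances.
    scaled = StandardScaler().fit_transform(features)
    # init="pca" seeds the 2D layout with the first two principal components.
    # perplexity must be smaller than the number of zip codes.
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(scaled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in: 3 groups of 40 "zip codes", each with 220 variables.
    centers = rng.normal(0.0, 5.0, size=(3, 220))
    X = np.vstack([c + rng.normal(0.0, 1.0, size=(40, 220)) for c in centers])
    coords = embed_zip_codes(X)
    print(coords.shape)  # (120, 2)
```

Fixing `random_state` makes the layout reproducible; t-SNE results otherwise vary from run to run, which matters when comparing cluster positions across plots.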
Intuition and Data Science
Zip codes from the same MSA or region tended to cluster together. Many were jumbled together in the center of the graphic, with clusters radiating outward for individual MSAs or groups of MSAs. Some MSAs’ zip codes were also distributed and mingled across separate clusters or regions.
To highlight our findings, we color-coded the plot to show the Seattle MSA zip codes alongside those of the top five responses in our initial Amazon survey. Austin (orange) and Boston (cyan) were relatively well aligned with Seattle (red); Atlanta (purple) was separated into four distinct, tight groupings, two of which were close to the bulk of Seattle. Dallas (pink) started near the center but extended in a completely different direction; Nashville (green) was scattered across a region between the Atlanta groupings.
These findings suggested that the top survey responses of Austin and Boston were solid options. However, Atlanta’s zip codes appeared more segregated in the data than the two Seattle clusters, while Boston’s were more densely grouped than Seattle’s.
While the purpose of this analytical exercise was neither to support nor challenge our original survey, the survey’s second most popular response, Austin, seemed to cluster most closely with Seattle.
Best Fit(s) Using Data Science Method
For the final step, we looked at the MSAs most similar to Seattle within this clustering analysis. This showed that Sacramento (magenta) and San Diego (blue) were actually very similar in distribution to Seattle (still red). Denver (lavender) and Salt Lake City (salmon) also aligned closely, joining Austin (orange), the closest fit among the original survey responses. Philadelphia (sea green) was also a potential contender, aligning much as Boston did in the previous graphic.
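The article does not state how similarity was measured. One simple, hypothetical approach is to compute the centroid of each MSA’s zip codes in the 2D embedding and rank the other MSAs by distance to Seattle’s centroid, as sketched below with made-up coordinates (not the article’s results). Treat this as a rough proxy: t-SNE preserves local neighborhoods better than large-scale distances, and an MSA split into several separate groupings, as Atlanta was, can have a centroid far from all of its points.

```python
import math
from collections import defaultdict


def centroids(points: list[tuple[str, float, float]]) -> dict[str, tuple[float, float]]:
    """Mean 2D position of each MSA's zip codes; points are (msa_name, x, y)."""
    sums: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for msa, x, y in points:
        s = sums[msa]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {m: (sx / n, sy / n) for m, (sx, sy, n) in sums.items()}


def rank_by_similarity(points: list[tuple[str, float, float]],
                       reference: str = "Seattle") -> list[tuple[str, float]]:
    """Other MSAs ordered by centroid distance to the reference MSA, closest first."""
    c = centroids(points)
    ref = c[reference]
    others = [(m, math.dist(ref, xy)) for m, xy in c.items() if m != reference]
    return sorted(others, key=lambda pair: pair[1])


# Made-up embedding coordinates for illustration only.
points = [
    ("Seattle", 0.0, 0.0), ("Seattle", 2.0, 0.0),
    ("Metro A", 1.0, 3.0), ("Metro A", 1.0, 5.0),
    ("Metro B", 4.0, 0.0),
]
print(rank_by_similarity(points))  # [('Metro B', 3.0), ('Metro A', 4.0)]
```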
What did we learn?
We hope this article has increased your knowledge of spatial intelligence, showing how we can leverage new tools to analyze and present location data. To summarize this exercise, we began with a survey of our experienced designers and planners to see what their intuition said. We collected and organized publicly available data and then embarked on a mapping expedition to interpret and visually present the parameters of the Amazon RFP. Finally, we utilized machine learning to analyze 220 different variables from each potential city and its zip code sub-regions. Sacramento and San Diego shared the most similarities with the Seattle market, but Austin—our survey’s second most common response—was also well-supported by our analysis. The principles exhibited here can be applied to many potential projects and investigations.
Data science augments and expands our intuition, but it also requires raw information and a solid problem framework.
Intuition
When we tallied the results from the 100+ global professionals we surveyed as our first-step gut check, Atlanta, Austin, and Boston came out as the top contenders for Amazon’s second headquarters. The choices were based not on scientific measures but on intuition and on-the-ground experience.
Pros: Excellent at capturing intangibles quickly and freely. Useful to apply at every stage in a process.
Cons: Hard to apply at large scale and complexity. At a minimum, requires validation.

Mapping
These interactive visualizations aggregated common demographic indicators to compare relative diversity. Data for additional layers was mined from public education, healthcare, and energy records. This technique allowed for visual comparisons of MSAs against Amazon’s existing location in Seattle.
Pros: Maps and visual analysis allow users and stakeholders to “see” more complex relationships in a scale-free environment.
Cons: Limited ability to compare variables against one another. Blending variables creates a layer of abstraction.

Data Science
We used machine learning to analyze 220 variables across more than 8,000 zip codes, identifying cities that were similar to Seattle in composition and breakdown. Sacramento and San Diego were the best fits, but Austin and Denver were not far behind. Data science allows for complex, machine-driven analysis.
Pros: Allows efficient analysis of hundreds or thousands of variables.
Cons: Difficult to communicate to a lay audience. Requires many rounds of iteration and testing, and depends on clean data.