Geographic data comes in various granularities: zip code, state, country, and so on. Sometimes your data is more granular than what you want to show in your analysis. For example, you might have data at the zip code level but want to analyze it by state. Displayr can automatically convert smaller geographic regions into larger ones, saving you the time of mapping them manually.
This article describes how you can automatically combine smaller geographic categories like zip codes or postal codes into larger ones like regions or states.
Requirements
A dataset which contains geographic variables.
Currently, the regions that are supported are:
- United States
- Canada
- United Kingdom
- Europe
- Australia
- New Zealand
Options are also available if the user has data from two adjacent regions (e.g. if doing a multi-country study):
- United States and Canada
- Europe (including UK)
- Australia and New Zealand
The output can be one of several geographic designations, including states, provinces, regions, counties, and countries; which ones are available depends on how each region defines its geographic levels. Note that the geographic map visualization does not support all aggregations; see How to Access an Exhaustive List of Geographic Entities Available for Geographic Maps.
Method
Use Case 1 - Combining zip codes, postcodes, and other unambiguous geographies like state
When the selected input variable is unambiguous, you only need to run the option from the menu. Some geographic names, such as cities, are ambiguous (they can refer to more than one place); for these, see Use Case 2.
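Conceptually, combining an unambiguous geography is a lookup from each small unit to the larger unit that contains it. The following Python sketch illustrates the idea only; it is not Displayr's implementation. The lookup tables and function names are hypothetical, and treating a postcode area as belonging to a single county is a simplification made for the example.

```python
import re

# Hypothetical, heavily simplified lookup tables (illustration only).
# Real postcode areas can span several counties.
POSTCODE_AREA_TO_COUNTY = {
    "OX": "Oxfordshire",
    "CB": "Cambridgeshire",
    "NR": "Norfolk",
}
COUNTY_TO_REGION = {
    "Oxfordshire": "South East",
    "Cambridgeshire": "East of England",
    "Norfolk": "East of England",
}


def postcode_area(postcode):
    """Return the leading letters of a UK postcode, e.g. 'OX' for 'OX1 2JD'."""
    match = re.match(r"[A-Za-z]+", postcode.strip())
    return match.group(0).upper() if match else ""


def to_county(postcode):
    """Map a postcode to a county, or None if the area is not in the table."""
    return POSTCODE_AREA_TO_COUNTY.get(postcode_area(postcode))


def to_region(postcode):
    """Map a postcode to a region by going through its county."""
    return COUNTY_TO_REGION.get(to_county(postcode))


if __name__ == "__main__":
    for code in ["OX1 2JD", "cb2 1tn", "ZZ9 9ZZ"]:
        print(code, "->", to_county(code), "->", to_region(code))
```

Because every postcode resolves to at most one county in this sketch, no extra information is needed, which is what makes this input type unambiguous.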
The following steps combine UK postcodes into counties.
- In the Variables and Questions tab, select the variable you want to aggregate (in this case postcode)
- Select Automate > Browse Online Library > Create New Variables > By Geography > United Kingdom > To Counties.
The results are as follows:
- To view the results, on the Outputs window, select the new variable from the Blue drop-down menu.
- At this point, you might decide that the counties aren't very useful and that you would prefer regions instead. To do so:
- Switch to the Variables and Questions tab.
- Right-click the new variable you created and select Edit R Variable from the menu.
- From the Inputs > Output geographic type menu, select Region. There are other selections as well.
- Click Update R Variable.
The results are as follows:
Use Case 2 - Combining ambiguous place names
Some geographic names can refer to more than one place. For example, there are multiple places called “Brooklyn” in the United States. It is impossible for the software to know exactly which “Brooklyn” is which unless the user provides some additional, unambiguous information. For example, if combining place names from the US into counties, the user could supply an additional variable telling us what State each place is in. Then the places could be mapped to counties. The feature will detect if there is ambiguity in the data the user has selected, and it will prompt the user to select an additional variable to disambiguate the places.
For example, assume you want to combine United States cities into counties.
- In the Variables and Questions tab, select the variable you want to aggregate (in this case city)
- Select Automate > Browse Online Library > Create New Variables > By Geography > United States > To Counties. In this example, because certain cities could be in multiple locations, more information is required. We will use the variable State to help us select the correct location for each city where there is an ambiguity.
- Click OK.
- To view the results, on the Outputs window, select the new variable from the Blue drop-down menu.
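The role of the supplementary variable can be sketched in a few lines of Python. This is an illustration of the disambiguation idea under stated assumptions, not the code Displayr runs: the table `PLACE_TO_COUNTIES` and both function names are hypothetical, and the table holds only a handful of entries.

```python
# Hypothetical lookup: place name -> {state: county}. Illustration only.
PLACE_TO_COUNTIES = {
    "Brooklyn": {"New York": "Kings County", "Connecticut": "Windham County"},
    "Springfield": {"Illinois": "Sangamon County", "Massachusetts": "Hampden County"},
    "Honolulu": {"Hawaii": "Honolulu County"},
}


def needs_supplementary_variable(places):
    """True if any place name matches more than one location in the table."""
    return any(len(PLACE_TO_COUNTIES.get(place, {})) > 1 for place in places)


def place_to_county(place, state=None):
    """Resolve a place to a county, using the state to break ties."""
    candidates = PLACE_TO_COUNTIES.get(place, {})
    if state is not None and state in candidates:
        return candidates[state]
    if len(candidates) == 1:
        # Only one known location, so no extra information is needed.
        return next(iter(candidates.values()))
    return None  # unknown place, or ambiguous without a usable state


if __name__ == "__main__":
    cities = ["Brooklyn", "Honolulu"]
    states = ["New York", "Hawaii"]
    print(needs_supplementary_variable(cities))
    print([place_to_county(c, s) for c, s in zip(cities, states)])
```

Without the state, "Brooklyn" cannot be resolved in this sketch, which mirrors why the feature prompts for an additional variable when it detects ambiguity.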
Use Case 3 - World Region
The World section of the menu is not limited to any specific region or country, but it is limited in the type of data it can use.
The World section can map either:
- A pair of latitude/longitude variables
- A single variable containing IP addresses
into either the Country that corresponds to that data point or the State or Province.
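As a rough illustration of how an IP address can be resolved to a country, the Python sketch below checks each address against a table of network ranges. It assumes nothing about how Displayr performs this lookup. The table `HYPOTHETICAL_RANGES` uses address blocks reserved for documentation, and the country labels attached to them are invented placeholders.

```python
import ipaddress

# Hypothetical range table. These blocks are reserved for documentation and
# are not allocated to any country; the labels are placeholders.
HYPOTHETICAL_RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "Country A"),
    (ipaddress.ip_network("198.51.100.0/24"), "Country B"),
]


def ip_to_country(ip_text):
    """Return the label of the first range containing the address, else None."""
    try:
        address = ipaddress.ip_address(ip_text.strip())
    except ValueError:
        return None  # not a valid IP address
    for network, country in HYPOTHETICAL_RANGES:
        if address in network:
            return country
    return None


if __name__ == "__main__":
    for ip in ["192.0.2.17", "198.51.100.200", "203.0.113.5", "not an ip"]:
        print(ip, "->", ip_to_country(ip))
```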
- In the Variables and Questions tab, select latitude and longitude
- Select Automate > Browse Online Library > Create New Variables > By Geography > World > To States/Provinces.
The results are as follows:
Options
Variable: The input variable containing geographic data to be combined into categories.
Combine by: Use this control to toggle between the other methods for combining categories in the Automatically Combine Categories menu, such as By Value > Tidy Categories.
World region: The geographic region that the input data/variable comes from.
Input data type: The type of data/geographic unit, such as States, Postcodes, or Place (city, town, etc.), that the input variable contains.
Output geographic type: The desired geographic unit to combine the input data into. Must be a larger type than Input data type; e.g. it is possible to map U.S. counties to U.S. states, but not the other way around.
Check spelling: If this option is selected, approximate matching is performed using the Levenshtein distance, instead of requiring exact matches when looking up the input data values in the regional database.
Check neighboring region: Select this option if the input data comes from a neighboring region in addition to the one specified by World region. For example, with World region set to USA and this option selected, matches for the input data will also be looked for within Canada.
Supplementary variable: Only shown when Input data type is Place (city, town, etc.). Use this drop-down to supply an additional variable with geographic information (such as state or region) to disambiguate place names in the input data that could represent multiple distinct locations in the region.
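To make the Check spelling option more concrete, here is a minimal Python sketch of approximate matching with the Levenshtein distance (the number of single-character insertions, deletions, and substitutions needed to turn one string into another). It shows the general technique only. The `max_distance` threshold, the case-insensitive comparison, and the function names are assumptions for the example, not documented Displayr behavior.

```python
def levenshtein(a, b):
    """Edit distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def closest_match(value, known_names, max_distance=2):
    """Return the known name nearest to value, or None if none is close enough.

    max_distance is an assumed threshold chosen for this sketch.
    """
    best_name, best_distance = None, max_distance + 1
    for name in known_names:
        distance = levenshtein(value.lower(), name.lower())
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name


if __name__ == "__main__":
    states = ["California", "Colorado", "Texas"]
    for raw in ["Californa", "texas", "Zzzzzz"]:
        print(raw, "->", closest_match(raw, states))
```

With exact matching, a misspelled value such as "Californa" would go unmatched; with a small distance threshold it resolves to the nearest known name, at the cost of occasionally matching the wrong entry when two names are similar.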
Next
How to Create a Geographic Map