Communication between users and websites relies on IP addresses. However, when profiling web traffic, it is often more useful to know where users are physically located. Geocoding is the process of translating an IP address to a physical location.
Geocoding is an imprecise science. It works by looking up an IP address in a database. These databases link IP addresses to locations using a variety of information sources, such as traces of web traffic and records of who owns each IP address. However, there is no permanent mapping from an IP address to a location, so geocoding is a "best efforts" service that offers no guarantee of accuracy.
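To make the lookup step concrete, here is a minimal Python sketch that assumes a tiny hand-made table of address ranges. The ranges are the reserved documentation blocks (RFC 5737 for IPv4, RFC 3849 for IPv6) and the country labels are placeholders, not real MaxMind data. A real database holds vastly more ranges, but the principle is the same: find the range that contains the address and return its label, or report that nothing matched. The function name `geocode` is hypothetical.

```python
import ipaddress

# Hypothetical lookup table: documentation-only address blocks mapped to
# placeholder country labels. Real geocoding databases are far larger.
RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "Country A"),
    (ipaddress.ip_network("198.51.100.0/24"), "Country B"),
    (ipaddress.ip_network("203.0.113.0/24"), "Country C"),
    (ipaddress.ip_network("2001:db8::/32"), "Country D"),
]


def geocode(ip: str, ranges=RANGES) -> str:
    """Return the label of the first range containing ip, or 'Unknown'."""
    address = ipaddress.ip_address(ip)
    for network, country in ranges:
        # `in` is simply False when address and network are different IP versions.
        if address in network:
            return country
    # There is no permanent mapping, so a miss is a normal outcome.
    return "Unknown"


print(geocode("198.51.100.7"))
print(geocode("2001:db8::1"))
print(geocode("10.0.0.1"))
```

The ranges here do not overlap, so returning the first match is enough; a database with nested ranges would need to prefer the most specific (longest-prefix) match.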
The QScript used below creates a new country variable from a variable of IP addresses. To demonstrate geocoding, we'll use the IP addresses of the universities listed on this website. We chose universities because each has a known location, so we can check whether that location matches the geocoded result. The first task is to convert the universities' URLs to IP addresses using DNS (the Domain Name System), which resolves the host name in each URL to an address. The 25 IP addresses are listed below. If you are analyzing a website's traffic, you will already have IP addresses rather than URLs, so you can skip this step.
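The DNS step can be sketched with Python's standard library, assuming the URLs are stored as plain text. `socket.gethostbyname` returns a single IPv4 address for a host name. The helper names `host_from_url` and `urls_to_ips`, the stub resolver and its addresses are all hypothetical; a real run would query your DNS server instead of the stub.

```python
import socket
from urllib.parse import urlparse


def host_from_url(url: str):
    """Extract the host name from a URL such as 'https://www.example.edu/about'."""
    # urlparse only fills in the host when the URL has a scheme like 'https://'.
    if "://" not in url:
        url = "http://" + url
    return urlparse(url).hostname


def urls_to_ips(urls, resolve=socket.gethostbyname):
    """Map each URL to an IPv4 address via DNS, or to None if resolution fails."""
    result = {}
    for url in urls:
        host = host_from_url(url)
        try:
            result[url] = resolve(host) if host else None
        except (OSError, UnicodeError):
            # socket.gaierror is a subclass of OSError; some malformed
            # names raise UnicodeError instead.
            result[url] = None
    return result


# Offline demonstration with a stub resolver and hypothetical addresses.
FAKE_DNS = {"www.example.edu": "192.0.2.10"}


def fake_resolve(host):
    if host not in FAKE_DNS:
        raise socket.gaierror(f"unknown host: {host}")
    return FAKE_DNS[host]


print(urls_to_ips(["https://www.example.edu/about", "no-such-host.invalid"],
                  resolve=fake_resolve))
```

Passing the resolver in as a parameter keeps the function testable without network access; leaving the default in place performs real DNS lookups.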
Note that we are using IPv4 addresses, but the method works with IPv6 addresses as well.
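Before geocoding, it can help to check which kind of address each value holds. As a sketch, Python's standard `ipaddress` module parses both IPv4 and IPv6 strings and can also normalize the many textual forms an IPv6 address can take. The helper names `classify` and `normalize` are hypothetical.

```python
import ipaddress


def classify(value: str) -> str:
    """Return 'IPv4', 'IPv6', or 'invalid' for a raw text value."""
    try:
        # ip_address accepts both versions and returns the matching class.
        return f"IPv{ipaddress.ip_address(value.strip()).version}"
    except ValueError:
        return "invalid"


def normalize(value: str) -> str:
    """Return the canonical (compressed) text form of an IP address."""
    return str(ipaddress.ip_address(value.strip()))


for raw in ["192.0.2.44", "2001:0db8:0000:0000:0000:0000:0000:0001", "not-an-ip"]:
    print(raw, "->", classify(raw))

print(normalize("2001:0db8:0000:0000:0000:0000:0000:0001"))
```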
Requirements
- A data set that contains either URLs or IP addresses.
Method
- To perform geocoding, select the variable containing the IP addresses from the Variables and Questions tab.
- Go to Automate > Browse Online Library > Data > Geocode IPs
A new categorical variable containing the countries deduced from the IP addresses is added to the data set. Below is a table of counts of this new variable.
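Outside Q, the result can be pictured as one country label per case, tabulated into counts. A minimal Python sketch using made-up placeholder labels (not the actual results for the 25 universities), with "Unknown" standing in for addresses the database could not place:

```python
from collections import Counter

# Hypothetical geocoding results: one country label per case.
countries = ["Country A", "Country B", "Country A", "Unknown", "Country A", "Country B"]

# Categories of the new variable, in order of first appearance.
categories = list(dict.fromkeys(countries))

# Table of counts, most frequent category first.
counts = Counter(countries)
for category, n in counts.most_common():
    print(f"{category:<10} {n}")
```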
Notes
Geocoding in Q uses the flipAPI R package, which uses the rgeolocate package, which in turn uses a MaxMind database.