Geographic Lookup From IP Address

Grepping through Log Files

Managing a few small web sites and some web applications, I’m constantly finding myself grepping through log files looking at access activity and suspicious entries. I’m always interested in looking up the geographic location of where access requests are coming from. Is the traffic local or am I getting hit from Russian or Chinese IP addresses? There are a lot of online tools to get geographic information from an IP address, such as here and here, but it’s time-consuming when you want to look up lots of addresses. So I wanted a script to do more of a bulk lookup.

IP lookup APIs for scripting goodness

While looking around the internet for a good, free API for geographic lookup, I found ipstack, a free IP Geolocation API service. Signup is required in order to get an API key, but the free version should give you most of the functionality needed, with 10,000 requests per month. The API is pretty easy to use also, and a simple curl request can do the trick. The API returns the results by default in JSON format.

In this example I’m making a curl request to the ipstack API, requesting the geographic location for Cloudflare’s 1.1.1.1 DNS server. The response comes back in JSON so I’m piping the output to jq, which is a really handy JSON parser for the command line.

curl "http://api.ipstack.com/1.1.1.1?access_key=xxxxxxx" | jq .

The ipstack API has different options to only request some of the data fields and not others. I’ll leave it to the reader to look them up if interested.
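As a rough sketch of that field selection, here is how a request URL could be put together in Go. The fields parameter and the field names are the ones used in the program later in this post; buildLookupURL is a hypothetical helper name of my own, not part of any ipstack library, and the access key is a placeholder.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildLookupURL is a hypothetical helper that assembles an ipstack
// lookup URL for one address. If fields is non-empty, it is sent as a
// comma-separated "fields" query parameter.
func buildLookupURL(ip, accessKey string, fields []string) string {
	q := url.Values{}
	q.Set("access_key", accessKey)
	if len(fields) > 0 {
		q.Set("fields", strings.Join(fields, ","))
	}
	// Encode percent-escapes the values, so the commas are sent as %2C.
	return "http://api.ipstack.com/" + url.PathEscape(ip) + "?" + q.Encode()
}

func main() {
	// "xxxxxxx" stands in for a real access key.
	fmt.Println(buildLookupURL("1.1.1.1", "xxxxxxx",
		[]string{"ip", "country_name", "city"}))
}
```

Building the query with url.Values rather than string concatenation keeps the key and field list properly escaped.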

Parsing Log Files for feedstock

So this was a really good start, but I want to script multiple requests to the API and store the results in an SQL database for easy lookup. I’m a big fan of using databases for information storage and usually opt for using SQL even if it’s a bit overkill.

Apache is generally pretty good at keeping log files. The default Apache config keeps access logs in /var/log/apache2/access.log, while errors are kept in /var/log/apache2/error.log. This may be different on your system, so RTFM. These files are nothing if not dense but they are reliably formatted. Awk is your friend here. You can use awk to pull out the first field of each log line, which is the client IP address. From there, the rest is easy peasy.

sudo awk '{ print $1 } ' /var/log/apache2/access.log | sort | uniq

I’m piping the results of awk through sort and uniq, which sort the results and strip out duplicate entries. (uniq only removes adjacent duplicates, which is why the sort comes first.) If you have multiple log files you could always cat them together before sending them through awk.

Now the magic happens

Now we want to run through each IP address, make the request to ipstack and store the geographic results. It’s possible to do this purely with bash, but my bash game isn’t quite that strong. I chose to write a small Go program to do the job.

package main

import (
	"database/sql"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"

	_ "github.com/go-sql-driver/mysql"
)

// Result holds the fields requested from the ipstack API.
type Result struct {
	IP            string `json:"ip"`
	ContinentName string `json:"continent_name"`
	CountryName   string `json:"country_name"`
	RegionName    string `json:"region_name"`
	City          string `json:"city"`
}

func main() {
	var myResponse Result
	args := os.Args[1:]

	// Expect exactly one argument: the IP address to look up.
	if len(args) != 1 {
		panic("Need an argument")
	}
	target := args[0]
	fmt.Printf("Using target %s\n", target)
	url := "http://api.ipstack.com/" + target + "?access_key=xxxxxx&fields=ip,continent_name,country_name,region_name,city"
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Read the whole body (io.ReadAll needs Go 1.16+) and check the
	// read error before it gets overwritten by the unmarshal step.
	contents, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	err = json.Unmarshal(contents, &myResponse)
	if err != nil {
		panic(err)
	}

	// connect to database
	db, err := sql.Open("mysql", "ipLookup:@/IPTrack")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	stmt, err := db.Prepare("INSERT INTO Results (IP,Continent, Country,Region,City) VALUES (?,?,?,?,?)")
	if err != nil {
		panic(err)
	}
	defer stmt.Close()

	_, err = stmt.Exec(myResponse.IP, myResponse.ContinentName, myResponse.CountryName, myResponse.RegionName, myResponse.City)
	if err != nil {
		panic(err)
	}
}

Remember, my standard Code Caveat applies.

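You can feed the output of the awk command above to this program one address at a time (for example with xargs -n 1, since the program takes a single IP address as its argument), which will load the ipstack API lookup results into SQL. What you’re left with is a handy database full of the geographic locations of your visitors. Pretty neat.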

This makes it really clear that this particular server is getting a lot of attention from China. That is useful information for deciding whether or not to impose a geographic firewall rule. I’ll leave that up to the reader to decide.