In this article, I’m going to teach you how to parse archived raw access logs from your cPanel VPS (Virtual Private Server) or dedicated server. Reviewing requests from your archived raw access logs can help bring to light a problematic request or a particular user causing server issues that you might not have been able to catch otherwise.
Before trying to follow along with this guide, you should have already read my article about how to enable raw access log archiving for all cPanel accounts so that you actually have archived raw access logs to review.
The method we’ll be going over for parsing these raw access logs is very handy, as you can do it directly on the server instead of having to access the raw access logs in cPanel, which requires you to download the logs to your own computer first.
To follow along with this guide you’ll need root access to either your VPS or dedicated server so that you have full access to read all of the archived logs.
Review archived raw access logs
Using the steps below I’ll show you how to connect to your server and run a command to read through your various archived raw access logs.
- Log in to your server via SSH as the root user.
- Review all requests that happened during the month of January 2013 using the following command:
zgrep "Jan/2013" /home/*/logs/*-Jan-2013.gz | less
Because the archived logs are gzip-compressed, zgrep is used so they can be read without decompressing them to disk first. You’ll be able to use Page Up and Page Down to scroll up and down through all of the log data. You can also use a forward slash /, which puts the less command into search mode. So, for instance, after typing / if you follow it with 8/Jan, you’ll be dropped right at the section of the logs for January 8th. Once you are done reviewing the log this way, you can simply hit q to quit the less command. (If you only care about one account, see the narrower example after the log entries below.)
You should see entries like this; in this case, we can see these lines are from our example.com site belonging to the userna5 user:
/home/userna5/logs/example.com-Jan-2013.gz:123.123.123.123 - - [01/Jan/2013:00:09:10 -0500] "GET /category/Linux/ HTTP/1.1" 200 3063 "-" "Mozilla/5.0 (compatible; AhrefsBot/4.0; +https://ahrefs.com/robot/)"
/home/userna5/logs/example.com-Jan-2013.gz:123.123.123.123 - - [01/Jan/2013:02:57:05 -0500] "GET /2010/12/ HTTP/1.1" 200 5197 "-" "Mozilla/5.0 (compatible; AhrefsBot/4.0; +https://ahrefs.com/robot/)"
/home/userna5/logs/example.com-Jan-2013.gz:123.123.123.123 - - [01/Jan/2013:04:06:32 -0500] "POST /wp-cron.php HTTP/1.0" 200 - "-" "WordPress/3.4.1; https://atomlabs.net"
/home/userna5/logs/example.com-Jan-2013.gz:123.123.123.123 - - [01/Jan/2013:04:06:29 -0500] "GET /wp-login.php HTTP/1.1" 200 2147 "-" "Mozilla/5.0 (compatible; AhrefsBot/4.0; +https://ahrefs.com/robot/)"
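If you already know which account and domain you care about, you can narrow things down to a single archived log right away. The following is just a minimal sketch; userna5, example.com, and the 8/Jan date are the sample values from above, so substitute your own:
# view only January 8th traffic for the example.com site owned by userna5
zgrep "08/Jan/2013" /home/userna5/logs/example.com-Jan-2013.gz | less
Since only one file is being read, less shows just that domain’s traffic for the day in question, which keeps the output much easier to scan than searching every account at once.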
Parse IPs from archived raw access logs
Below I’ll show how to parse out all of the IP addresses from your raw access logs for the example.com domain.
- Run this command:
zgrep "Jan/2013" /home/userna5/logs/example.com-Jan-2013.gz | awk '{print $1}' | sort -n | uniq -c | sort -n
Since the IP address is the first field of each log entry, awk pulls out just that column, and the sorted, counted output ends with the busiest IPs at the bottom. This will spit back info like this:
76 123.123.123.129
80 123.123.123.124
599 123.123.123.125
6512 123.123.123.123
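If you’d rather get the same IP breakdown across every cPanel account on the server at once, a small variation of the pipeline works. This is only a sketch, and it assumes your archived logs all follow the /home/*/logs/*-Jan-2013.gz naming pattern used above:
# decompress every account's January 2013 archive and count requests per IP, busiest last
zcat /home/*/logs/*-Jan-2013.gz | grep "Jan/2013" | awk '{print $1}' | sort -n | uniq -c | sort -n | tail -20
The tail -20 on the end simply limits the output to the twenty busiest IP addresses so you aren’t scrolling past every one-off visitor.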
Parse User-Agents from archived raw access logs
Below I’ll show how to parse out all of the user agents from your raw access logs for the example.com domain.
- Run this command:
zgrep "Jan/2013" /home/userna5/logs/example.com-Jan-2013.gz | awk -F\" '{print $6}' | sort | uniq -c | sort -n
Here awk splits each line on the double-quote character, so the sixth quoted field is the user agent string. This will spit back info like this:
192 Mozilla/5.0 (compatible; YandexBot/3.0; +https://yandex.com/bots)
340 Mozilla/5.0 (compatible; Baiduspider/2.0; +https://www.baidu.com/search/spider.html)
1509 Mozilla/5.0 (compatible; SISTRIX Crawler; https://crawler.sistrix.net/)
5548 Mozilla/5.0 (compatible; AhrefsBot/4.0; +https://ahrefs.com/robot/)
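Once a particular user agent stands out, such as AhrefsBot in the sample output above, it can help to see how its requests were spread across the month. Here’s a minimal sketch of one way to do that, again assuming the standard combined log format and the same example log path:
# count AhrefsBot requests per day by trimming the timestamp field down to just the date
zgrep "AhrefsBot" /home/userna5/logs/example.com-Jan-2013.gz | awk '{print $4}' | cut -d: -f1 | tr -d '[' | sort | uniq -c
Each line of output is a day such as 01/Jan/2013 along with how many requests that bot made, which makes it easy to spot a crawl that suddenly ramped up.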
Parse requested URLs from archived raw access logs
Below I’ll show how to parse out all of the requested URLs from your raw access logs for the example.com domain.
- Run this command:
zgrep "Jan/2013" /home/userna5/logs/example.com-Jan-2013.gz | awk '{print $7}' | sort | uniq -c | sort -n
This will spit back info like this:
172 /wp-login.php
201 /robots.txt
380 /
2017 /opencart/undefined
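If one URL in the list looks suspicious, such as the /wp-login.php entries from the sample output, you can flip the question around and see which IP addresses are requesting it. This is just a sketch using the same example log:
# list the IPs hitting wp-login.php, with the heaviest requesters at the bottom
zgrep "wp-login.php" /home/userna5/logs/example.com-Jan-2013.gz | awk '{print $1}' | sort -n | uniq -c | sort -n
A single IP with hundreds of requests to the login page is typically a brute-force attempt rather than a real visitor, and it’s a good candidate for blocking.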
Parse referrers from archived raw access logs
Below I’ll show how to parse out all of the referrers from your raw access logs for the example.com domain.
- Run this command:
zgrep "Jan/2013" /home/userna5/logs/example.com-Jan-2013.gz | awk -F\" '{print $4}' | sort | uniq -c | sort -n
Again awk splits on the double-quote character, and the fourth quoted field is the referrer. This will spit back info like this:
219 https://example.com/prestashop/index.php
337 https://example.com/list/admin/
2009 https://example.com/
2522 https://example.com/opencart/
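The same style of pipeline works on any other column of the log as well. As one more sketch, assuming the standard combined log format where the status code is the ninth whitespace-separated field, you can tally responses by HTTP status code:
# count responses by HTTP status code (200, 404, 500, and so on)
zgrep "Jan/2013" /home/userna5/logs/example.com-Jan-2013.gz | awk '{print $9}' | sort | uniq -c | sort -n
A spike in 404 or 500 responses tied to one of the busy IPs or user agents found above is a good sign those are the requests worth investigating further.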
You should now fully understand how to parse the archived raw access logs on your server to get a better picture of the requests coming in and to spot any that might be causing server usage issues.
You might also be interested in reading my article about blocking unwanted users from your site using .htaccess for an in-depth explanation of how you can block any users that are sending an excessive amount of requests to your sites.