Monitoring Apache Logs
You need a way to proactively monitor for known (and unknown) conditions and to get notified when one of these conditions requires your attention. Typical approaches include alerts for urgent issues and dashboards to watch stats on a regular basis.
Alerts are immediate notifications that give you a chance to fix issues before too many customers are affected or file support tickets. Unfortunately, most Unix distributions don’t ship with convenient alerting tools, but you can try to build your own using cron jobs and a mail transfer agent such as Postfix. Log management services continuously check your logs to see whether an alert should fire and can notify you right away. Here are some useful alerts to consider for Apache. The search examples use Loggly syntax, and you can find good thresholds by looking at trends during your peak site usage periods.
| Alert | Search | Threshold |
|---|---|---|
| Your error rate is too high | apache.status:>400 | >100 in last 15 minutes |
| Your traffic is much higher than normal | logtype:apache | >1000 in last 15 minutes |
| Your site is much slower than desired | apache.requestTimeMillis:>1000 | >200 in last 15 minutes |
Tips on setting up Apache alerts:
- Check every 15 minutes over a 15-minute window. If you choose a shorter window, you may miss events or get misleading counts when an outage or a burst of data causes logs to be queued and delivered late.
- Set the email to go to a mailing list owned by your Ops team or use a service like PagerDuty to alert your on-call person or send an SMS.
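If you do want to build your own check with cron, the error-rate and traffic alerts above can be approximated with a short script. The following is only a minimal sketch: it assumes your access log uses the standard common or combined log format, it treats any status of 400 or above as an error, and the function names `count_recent` and `check_thresholds` are hypothetical. The default thresholds simply mirror the example values in the table.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the leading fields of a common/combined log format line:
# host ident user [timestamp] "request" status size
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+'
)

def count_recent(lines, now, window=timedelta(minutes=15)):
    """Return (total_requests, error_requests) logged within `window` of `now`."""
    total = errors = 0
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if not (now - window <= ts <= now):
            continue
        total += 1
        # Assumption: any 4xx or 5xx status counts as an error.
        if int(match.group("status")) >= 400:
            errors += 1
    return total, errors

def check_thresholds(total, errors, max_total=1000, max_errors=100):
    """Return a list of alert messages; an empty list means nothing to report."""
    messages = []
    if errors > max_errors:
        messages.append(f"Error rate is too high: {errors} errors")
    if total > max_total:
        messages.append(f"Traffic is much higher than normal: {total} requests")
    return messages

# Example with a timezone-aware "now", as required for the comparison above.
sample = ['203.0.113.7 - - [01/Jan/2024:11:55:00 +0000] "GET /x HTTP/1.1" 404 128']
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
total, errors = count_recent(sample, now)
for message in check_thresholds(total, errors):
    print(message)
```

To use something like this in practice, you would read the lines from your access log, pass `datetime.now(timezone.utc)` as `now`, and schedule the script every 15 minutes from cron. Cron typically mails any output to the job's owner, so printing only when a threshold is exceeded gives you a basic email alert.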
Dashboards are a great way to stay on top of what’s happening on your site, as told by your Apache logs. Unix command line tools don’t offer good graphical dashboards, but log management systems have great capabilities built in. Here are some dashboards to set up for Apache, with an example screenshot showing each below.
- Apache Status Over Time tells you if there is an unexpected increase in traffic or error rate.
- Apache Response Time tells you if response times are slow due to servers getting overloaded or new code deployments. In the trend view, select the Timeline chart, choose apache.requestTimeMillis as the numeric field, and set the operator to Average or Max.
- Apache Traffic By IP tells you if traffic issues are overwhelmingly caused by one or a small number of clients.
- Top Error URIs tells you which URIs have the most errors.
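Without a log management system, you can still get a rough, non-graphical version of some of these views from the command line. The sketch below tallies requests by status code, by client IP, and by URI for error responses. It assumes the common or combined log format and treats any status of 400 or above as an error; the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Captures client host, request URI, and status from a common/combined log line.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<uri>[^\s"]+)[^"]*" (?P<status>\d{3}) '
)

def summarize(lines, top=5):
    """Tally status codes, the busiest client IPs, and the URIs with most errors."""
    by_status = Counter()
    by_ip = Counter()
    error_uris = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        status = int(match.group("status"))
        by_status[status] += 1
        by_ip[match.group("host")] += 1
        # Assumption: any 4xx or 5xx status counts as an error.
        if status >= 400:
            error_uris[match.group("uri")] += 1
    return {
        "status": dict(by_status),
        "top_ips": by_ip.most_common(top),
        "top_error_uris": error_uris.most_common(top),
    }
```

Calling `summarize` on the lines of an access log returns plain dictionaries and lists that you can print or feed into a charting tool of your choice. It does not bucket results over time, so it only approximates the Traffic By IP and Top Error URIs views rather than the timeline charts.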