LoggingThe Ultimate Guide

your open-source resource for understanding, analyzing, and troubleshooting system logs

curated byloggly

1

Troubleshooting with Linux Logs

Troubleshooting is the main reason people create logs. Often you’ll want to diagnose why a problem happened with your Linux system or application. An error message or a sequence of events can give you clues to the root cause, indicate how to reproduce the issue, and point out ways to fix it. Here are a few use cases for things you might want to troubleshoot in your logs.

Cause of Login Failures

If you want to check if your system is secure, you can check your authentication logs for failed login attempts and unfamiliar successes. Authentication failures occur when someone passes incorrect or otherwise invalid login credentials, often to ssh for remote access or su for local access to another user’s permissions. These are logged by the pluggable authentication module, or pam for short. Look in your logs for strings like Failed password and user unknown. Successful authentication records include strings like Accepted password and session opened.

Failure Examples:

Success Examples:

You can use grep to find which users accounts have the most failed logins. These are the accounts that potential attackers are trying and failing to access. This example is for an Ubuntu system.

You’ll need to write a different command for each application and message because there is no standard format. Log management systems that automatically parse logs will effectively normalize them and help you extract key fields like username.

Log management systems can extract the usernames from your Linux logs using automated parsing. This lets you see an overview of the users and filter on them with a single click. In this example, we can see that the root user logged in over 2,700 times because we are filtering the logs to show login attempts only for the root user.

Root login attempts

Log management systems also let you view graphs over time to spot unusual trends. If someone had one or two failed logins within a few minutes, it might be that a real user forgot his or her password. However, if there are hundreds of failed logins or they are all different usernames, it’s more likely that someone is trying to attack the system. Here you can see that on March 12, someone tried to login as test and nagios several hundred times. This is clearly not a legitimate use of the system.

Timeline of failed login count

Cause of Reboots

Sometimes a server can stop due to a system crash or reboot. How do you know when it happened and who did it?

Shutdown Command

If someone ran the shutdown command manually, you can see it in the auth log file. Here you can see that someone remotely logged in from the IP 50.0.134.125 as the user ubuntu and then shut the system down.

Kernel Initializing

If you want to see when the server restarted regardless of reason (including crashes) you can search logs from the kernel initializing. You’d search for the facility kernel messages and Initializing cpu.

Detect Memory Problems

There are lots of reasons a server might crash, but one common cause is running out of memory.

When your system is low on memory, processes are killed, typically in the order of which ones will release the most resources. The error occurs when your system is using all of its memory and a new or existing process attempts to access additional memory. Look in your log files for strings like Out of Memory or for kernel warnings like to kill. These strings indicate that your system intentionally killed the process or application rather than allowing the process to crash.

Examples:

You can find these logs using a tool like grep. This example is for Ubuntu:

Keep in mind that grep itself uses memory, so you might cause an out of memory error just by running grep. This is another reason it’s a fabulous idea to centralize your logs!

Log Cron Job Errors

The cron daemon is a scheduler that runs processes at specified dates and times. If the process fails to run or fails to finish, then a cron error appears in your log files. You can find these files in /var/log/cron, /var/log/messages, and /var/log/syslog depending on your distribution. There are many reasons a cron job can fail. Usually the problems lie with the process rather than the cron daemon itself.

By default, cron jobs output through email using Postfix. Here is a log showing that an email was sent. Unfortunately, you cannot see the contents of the message here.

You should consider logging the cron standard output to help debug problems. Here is how you can redirect your cron standard output to syslog using the logger command. Replace the echo command with your own script and helloCron with whatever you want to set the appName to.

Which creates the log entries:

Each cron job will log differently based on the specific type of job and how it outputs data. Hopefully there are clues to the root cause of problems within the logs, or you can add additional logging as needed.

  • Bradley Marshall

    Thanks for this!

Written & Contributed by

Amy

Sadequl

This guide will help software developers and system administrators become experts at using logs to better run their systems. This is a vendor-neutral, community effort featuring examples from a variety of solutions

Meet Our Contributors Become a contributor