Logging: The Ultimate Guide

your open-source resource for understanding, analyzing, and troubleshooting system logs

curated by Loggly

Troubleshooting with Windows Logs

The most common reason people look at Windows logs is to troubleshoot a problem with their systems or applications. In this guide, we present common troubleshooting use cases and describe how to diagnose the root cause of the problem using events in your logs.

Looking for Failed Logon Attempts

Check Windows Security logs for failed logon attempts and unfamiliar access patterns. Authentication failures occur when someone or some application passes incorrect or otherwise invalid logon credentials.

The Security log includes security-related events, especially those related to authentication and access. These logs are your best place to search for unauthorized access attempts to your system.

The following events are of particular value in the Security log:

Successfully Logged On

These events include all successful logon attempts to a system. They include information such as:

  • Logon Type: the method used to log on, such as at the local keyboard or remotely over the network. This field is an integer; the most common values are 2 (interactive, at the local keyboard) and 3 (network).
  • Account Name: the account that logged on
  • Source Network Address: the IP address from which the logon request originated
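Because the Logon Type is stored as a bare integer, a small lookup table makes filtered results easier to read. The sketch below lists the commonly documented values; the function name `describe_logon_type` is ours for illustration, and values outside the table are reported as unrecognized rather than guessed.

```python
# Commonly documented Logon Type values in Windows Security logon events.
LOGON_TYPES = {
    2: "Interactive (local keyboard)",
    3: "Network",
    4: "Batch",
    5: "Service",
    7: "Unlock",
    8: "NetworkCleartext",
    9: "NewCredentials",
    10: "RemoteInteractive (Remote Desktop)",
    11: "CachedInteractive",
}


def describe_logon_type(value):
    """Return a readable name for a Logon Type given as an int or numeric string."""
    try:
        code = int(value)
    except (TypeError, ValueError):
        return f"Unrecognized ({value!r})"
    return LOGON_TYPES.get(code, f"Unrecognized ({code})")


print(describe_logon_type(2))
print(describe_logon_type("10"))
```

A lookup like this is mostly useful when exporting events to a report, where "3" on its own says little to a reader.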

To get a better picture, you can read a description of all the fields in this event.

Here’s an example of a successful logon event:

This event is generated when a logon session is created. It is generated on the computer that was accessed.

The subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe. 

The logon type field indicates the kind of logon that occurred. The most common types are 2 (interactive) and 3 (network). 

The New Logon fields indicate the account for whom the new logon was created, that is, the account that was logged on.

The network fields indicate where a remote logon request originated. Workstation name is not always available and can be left blank in some cases.

The impersonation level field indicates the extent to which a process in the logon session can impersonate.

The authentication information fields provide detailed information about this specific logon request.

– Logon GUID is a unique identifier that can be used to correlate this event with a KDC event.

– Transited services indicate which intermediate services have participated in this logon request.

– Package name indicates which sub-protocol was used among the NTLM protocols.

– Key length indicates the length of the generated session key. This will be 0 if no session key was requested.

Failed to Log On

These events show all failed attempts to log on to a system. A failure can indicate that someone is trying to break into the system. However, it can also mean that someone forgot a password, an account has expired, or an application was configured with the wrong password. These events include the following pieces of information:

  • Logon Type: the method used to log on, such as at the local keyboard or over the network. This field is an integer; the most common values are 2 (interactive, at the local keyboard) and 3 (network).
  • Account For Which Logon Failed: the account that failed to log on, including its security ID, name, and domain. (The Subject fields identify the account that requested the logon.)
  • Failure Information: the reason the logon attempt failed, such as a locked-out user or expired credentials. This field includes a text explanation plus status and sub-status codes.
  • Network Information: where the remote logon request originated, such as the workstation name and source network address.
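Once failed logon events are exported or parsed, a short script can separate an occasional mistyped password from a burst of failures coming from one address. The sketch below is a minimal illustration under stated assumptions: the dictionary keys (`event_id`, `account`, `source_ip`, `sub_status`) and the sample records are hypothetical, not a real export format. Event ID 4625 (failed logon), 4624 (successful logon), and the listed sub-status codes are the documented Windows values.

```python
from collections import Counter

FAILED_LOGON_EVENT_ID = 4625  # "An account failed to log on"

# A few well-known sub-status codes and what they mean.
SUB_STATUS_REASONS = {
    0xC0000064: "user name does not exist",
    0xC000006A: "wrong password",
    0xC0000234: "account locked out",
    0xC0000072: "account disabled",
    0xC0000193: "account expired",
    0xC0000071: "password expired",
}

# Hypothetical, hand-written records standing in for parsed Security events.
sample_events = [
    {"event_id": 4625, "account": "alice", "source_ip": "192.0.2.10", "sub_status": "0xC000006A"},
    {"event_id": 4625, "account": "alice", "source_ip": "192.0.2.10", "sub_status": "0xC000006A"},
    {"event_id": 4625, "account": "alice", "source_ip": "192.0.2.10", "sub_status": "0xC0000234"},
    {"event_id": 4625, "account": "svc_backup", "source_ip": "198.51.100.7", "sub_status": "0xC0000071"},
    {"event_id": 4624, "account": "bob", "source_ip": "192.0.2.10", "sub_status": None},
]


def failed_logons_by_source(events, threshold=3):
    """Return {source_ip: count} for sources with at least `threshold` failures."""
    failures = [e for e in events if e["event_id"] == FAILED_LOGON_EVENT_ID]
    counts = Counter(e["source_ip"] for e in failures)
    return {ip: n for ip, n in counts.items() if n >= threshold}


def failure_reason(event):
    """Translate an event's hexadecimal sub-status string into a short reason."""
    try:
        code = int(event.get("sub_status"), 16)
    except (TypeError, ValueError):
        return "unknown reason"
    return SUB_STATUS_REASONS.get(code, "unknown reason")


print(failed_logons_by_source(sample_events))
print(failure_reason(sample_events[3]))
```

The threshold of three is arbitrary; a sensible value depends on how many users share an address and how often they mistype passwords.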

To learn more, you can read a description of all the fields of this event.

Here’s an example of an unsuccessful logon attempt event from the Security log:

This event is generated when a logon request fails. It is generated on the computer where access was attempted.

The Subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.

The Logon Type field indicates the kind of logon that was requested. The most common types are 2 (interactive) and 3 (network).

The Process Information fields indicate which account and process on the system requested the logon.

The Network Information fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.

The authentication information fields provide detailed information about this specific logon request.

– Transited services indicate which intermediate services have participated in this logon request.

– Package name indicates which sub-protocol was used among the NTLM protocols.

– Key length indicates the length of the generated session key. This will be 0 if no session key was requested.

Well-written applications also log authentication failure events. Here’s an example of a failed logon attempt in SQL Server. It includes information about who attempted to log on and why the attempt failed.

Special Privileges Assigned

The Security log also captures events when an account is granted elevated privileges. In the image below, we are looking at one such entry, where a user has been granted local administrator privileges:

Event Viewer

The General tab’s message says a member (a user account) was added to the local Administrators group. The Subject part of the event detail says who granted this privilege; in this case it’s the sysadmin user account in the mytestdomain Active Directory domain. The user account that was granted the privilege is listed under the Member section. Finally, the group to which the user was added is shown in the Group section.

As you can imagine, you can write custom scripts to filter these events for security audit reporting. You can also create a custom view in Event Viewer to display them.
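As a rough idea of what such an audit script might look like, the sketch below filters parsed events for additions to the local Administrators group and lists who added whom. The record layout and the member name `jdoe` are hypothetical; the sysadmin account and mytestdomain domain come from the example above. Event ID 4732 is the documented ID for "A member was added to a security-enabled local group."

```python
MEMBER_ADDED_TO_LOCAL_GROUP = 4732

# Hypothetical parsed events; the key names are assumptions for illustration.
sample_events = [
    {"event_id": 4732, "time": "2024-03-02T14:05:00", "group": "Administrators",
     "subject": "mytestdomain\\sysadmin", "member": "mytestdomain\\jdoe"},
    {"event_id": 4732, "time": "2024-03-01T09:30:00", "group": "Backup Operators",
     "subject": "mytestdomain\\sysadmin", "member": "mytestdomain\\jdoe"},
    {"event_id": 4624, "time": "2024-03-02T14:00:00", "group": "",
     "subject": "mytestdomain\\sysadmin", "member": ""},
]


def admin_group_additions(events, group="Administrators"):
    """Return (time, granted_by, member) tuples for additions to `group`, oldest first."""
    rows = [
        (e["time"], e["subject"], e["member"])
        for e in events
        if e["event_id"] == MEMBER_ADDED_TO_LOCAL_GROUP
        and e["group"].lower() == group.lower()
    ]
    # ISO 8601 timestamps sort chronologically as plain strings.
    return sorted(rows)


for time, granted_by, member in admin_group_additions(sample_events):
    print(f"{time}: {granted_by} added {member} to Administrators")
```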

Special privilege events are also logged for every successful logon by an account with administrator privileges. They include information such as:

  • Subject: the account that logged on, including its ID, name, and domain
  • Privileges: a list of all the administrator privileges assigned to the user

To learn more, you can read a description of all the fields of this log event.

Here is another example of an event related to elevated permissions:

Why Did My Server or Application Crash?

If you are investigating why your server or application crashed, a great place to start looking is the Event log. The Application or System log can tell you when and why the crash happened. For example, it can give you a clue as to whether the cause was a system problem or an application problem.

Almost all critical errors generate more than one event log entry; that is, there is a “lead up” to the critical error message where a number of previous warnings or critical messages show what’s going on. When troubleshooting, it’s therefore necessary to look at messages immediately before the final critical error.

To find these events, you can filter your log data for a particular application name, then by critical or error events, and finally sort them by date. The following are three of the most common events you might see when troubleshooting a crash.
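The filter-then-sort approach, combined with looking at the lead-up to the final error, can be sketched as follows. This is a minimal illustration: the record layout, the application name `MyApp`, and the messages are hypothetical, and the ten-minute window is an arbitrary starting point rather than a recommendation.

```python
from datetime import datetime, timedelta

# Hypothetical parsed Application log entries.
sample_events = [
    {"time": datetime(2024, 1, 15, 9, 0), "source": "MyApp", "level": "Information", "message": "started"},
    {"time": datetime(2024, 1, 15, 9, 52), "source": "MyApp", "level": "Warning", "message": "resource low"},
    {"time": datetime(2024, 1, 15, 9, 58), "source": "OtherApp", "level": "Error", "message": "unrelated"},
    {"time": datetime(2024, 1, 15, 9, 59), "source": "MyApp", "level": "Error", "message": "request failed"},
    {"time": datetime(2024, 1, 15, 10, 0), "source": "MyApp", "level": "Critical", "message": "terminated"},
]


def lead_up(events, source, window=timedelta(minutes=10)):
    """Find the latest critical/error event for `source` and what preceded it.

    Returns (last_error, preceding), where `preceding` holds events from the
    same source logged within `window` before the last error, oldest first.
    """
    mine = [e for e in events if e["source"] == source]
    errors = [e for e in mine if e["level"] in ("Critical", "Error")]
    if not errors:
        return None, []
    last = max(errors, key=lambda e: e["time"])
    start = last["time"] - window
    preceding = [e for e in mine if start <= e["time"] < last["time"]]
    return last, sorted(preceding, key=lambda e: e["time"])


last, preceding = lead_up(sample_events, "MyApp")
print(last["message"], [e["message"] for e in preceding])
```

Restricting the lead-up to the same source keeps the output short; widening it to all sources can help when the root cause lies in the operating system or a dependency.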

Unexpected Reboot

An unexpected reboot error appears in the log when the system fails to shut down and restart gracefully. A likely cause of this error is that the operating system stopped responding and crashed, or the server lost power. Again, look at the events logged just before it to find a possible root cause.

Here’s an excerpt from one such event’s details:

Application Hang

An application hang error appears in the Event log when a program running on your server stops responding. In this case, your server’s hardware and the OS are functioning properly, but the application is either stuck in a loop or waiting for a resource that isn’t available at the time.

The text of the Application event below shows how a program stopped responding to Windows and Windows had to shut it down.

Application Fault

An application fault error appears in the Event log when a program running on your server encounters a critical error. This error is almost always caused by a bug in the application code or by memory running out. Here’s an example from an IIS server where the offending app is w3wp.exe.

Finding the Root Cause of a Failed Service

A Windows service is a special kind of application that runs in the background and has its own Windows session. Often people want to know why a particular service didn’t start or didn’t run successfully.

You can find service failures in the System log by filtering on the “Service Control Manager” source and then filtering for critical or error events. Here are two common examples of failed service events.

Service Failed to Start

This error is logged when a service fails to start normally. We can see the service (in this case the Group Policy Client service) didn’t start in a timely fashion. The event and its message mainly tell us when the problem happened, so we need to look at the messages that immediately precede it to find the root cause.

Service Timeout

A service timeout error appears when a service doesn’t start within the expected period of time (the default is 30 seconds). Normally, services are designed to start quickly and then run continuously to spread out processing load. A timeout could mean the service was waiting for a resource that wasn’t available at the time. Here’s an example event generated from the Windows Error Reporting Service.
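When many machines report timeouts, it can help to pull the service name and the timeout value out of the message text. The sketch below assumes the message follows the wording commonly seen for Service Control Manager timeout events; check that wording against your own logs before relying on the pattern. The sample message is written by hand for illustration.

```python
import re

# Assumed message wording for a Service Control Manager timeout event.
TIMEOUT_PATTERN = re.compile(
    r"A timeout was reached \((\d+) milliseconds\) "
    r"while waiting for the (.+) service to connect"
)


def parse_service_timeout(message):
    """Return the service name and timeout in seconds, or None if no match."""
    match = TIMEOUT_PATTERN.search(message)
    if match is None:
        return None
    return {
        "service": match.group(2),
        "timeout_seconds": int(match.group(1)) / 1000,
    }


# Hand-written sample message, not copied from a real log.
sample_message = (
    "A timeout was reached (30000 milliseconds) while waiting for the "
    "Windows Error Reporting Service service to connect."
)

print(parse_service_timeout(sample_message))
```

Matching on message text is fragile on localized systems, where the wording differs; filtering on the event ID and reading structured event data is more robust when that is available.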

Windows Update Failure

One common Windows system administration task is to check whether computers in the network are failing to get Windows updates. These updates often contain security patches, so it’s important they run successfully.

Windows Server Update Services (WSUS) is a Windows patch management tool that automatically downloads patches and security updates for Microsoft products from the Microsoft website and applies those patches to Windows computers in the network. In most production installations, administrators want some control over which patches are applied and when. This avoids unexpected behavior like automatic reboots or applications breaking after a patch cycle. In many organizations, a centralized WSUS server is used to download all patches, and administrators then schedule their distribution. The status of a Windows update run is therefore important to monitor.

In the example below, we can see a Windows application update (in this case Microsoft Office) has failed to install a service pack, and it has given an error code we can look up.

Scheduled Task Delayed or Failed

Another service people often watch is the Windows Task Scheduler. It’s similar to the Linux cron daemon because it lets us schedule and run programs, scripts, or commands on a recurring basis. Tasks can be scheduled for specific times or run in response to a trigger. For example, a Windows scheduled task could run a PowerShell backup script every night or copy files to an FTP server once a week.

The events generated by the Windows Task Scheduler can help you confirm whether your tasks are running according to the triggers and schedules you defined or whether they are failing to launch. The Task Scheduler window has its own event viewer which you can use, or you can view the log file directly at C:\Windows\Tasks\SchedLgU.txt. Here’s an example of an event from the log.

Example event:

Written & Contributed by

Amy

Sadequl

This guide will help software developers and system administrators become experts at using logs to better run their systems. This is a vendor-neutral, community effort featuring examples from a variety of solutions.
