Sending JSON-Formatted Logs from syslog-ng
Péter Czanik is community manager at BalaBit, the developers of syslog-ng. In his limited free time he is interested in non-x86 architectures and works on one of his PPC or ARM machines. Follow him on Twitter: @PCzanik.
Loggly released the second generation of their Logging as a Service (LaaS) about a year ago. There have been many incremental improvements since then, but the new feature I have used most from the beginning is JSON parsing. When you send logs in JSON format, Loggly not only stores them but also parses the fields and indexes the resulting name-value pairs. This makes searching and alerting a lot easier.
For example, think about looking for 404 errors on a web server. A full-text search for “404” returns many false positives, because the number could just as well be a UID, a PID, a file size, or any other value. Searching for “apache.status:404” in parsed logs always gives the desired results.
Loggly Dynamic Field Explorer offers an even easier approach: Browse through your web server logs, and Loggly will automatically display counts of 404 values. Dynamic Field Explorer provides summaries of all log types that are automatically parsed by Loggly. Note that syslog-ng’s PatternDB can be used to create parsers for log types not yet supported by Dynamic Field Explorer.
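To make the difference between full-text and field-based search concrete, here is a minimal Python sketch. It is not how Loggly works internally; the events and the field names (“apache.status”, “apache.bytes”, “apache.request”) are made up for illustration.

```python
from collections import Counter

# Hypothetical parsed web server events; field names are illustrative only.
events = [
    {"apache.status": 200, "apache.bytes": 404, "apache.request": "GET /index.html"},
    {"apache.status": 404, "apache.bytes": 209, "apache.request": "GET /missing.html"},
    {"apache.status": 200, "apache.bytes": 1532, "apache.request": "GET /user/404/profile"},
]


def full_text_search(events, needle):
    """Return events whose flattened text contains the needle anywhere."""
    return [e for e in events if needle in " ".join(str(v) for v in e.values())]


def field_search(events, field, value):
    """Return events where one specific field has exactly the given value."""
    return [e for e in events if e.get(field) == value]


def field_counts(events, field):
    """Count how often each value of a field occurs, like a field summary."""
    return Counter(e[field] for e in events if field in e)
```

In this sample, the full-text search for "404" matches all three events (a byte count and a URL also contain the digits), while the field search on “apache.status” matches only the real error.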
Basic message forwarding from syslog-ng on Linux is very well-documented on the Loggly website. Loggly also helps you configure your local syslog-ng to send your logs to Loggly servers in no time. You can use either a Python-based configurator script or a configuration sample. To retrieve the configuration sample, log in to Loggly, click the “Source Setup” link, and copy and paste the sample into your syslog-ng.conf file. There is also a dedicated syslog-ng page on the website, which describes how to send logs in a more secure, encrypted way.
Loggly automatically parses Apache and JSON logs. However, only a few log messages are originally generated in one of these standard formats. Most logs are free-form text (like SSH logs), or they are based on a fixed column structure (like Apache logs). Some of them use the new IETF syslog protocol (RFC 5424), which has support for name-value pairs (SDATA). Most of these logs can be parsed by syslog-ng and turned into JSON messages.
Sending JSON-formatted messages is not covered by the basic configuration; therefore it requires some text editing skills. Because the configuration syntax of syslog-ng is straightforward and well-documented, this is quite easy. There is also a mailing list where you can ask for help.
Sending Name-value Pairs from syslog-ng on Linux
On Linux, you can use PatternDB to parse log messages and generate name-value pairs from them. These can then be forwarded to Loggly using the JSON output. In the following example, I use the SSH pattern from the BalaBit pattern repository, which is available at https://github.com/balabit/syslog-ng-patterndb. It parses the username, authentication method, source host, and several other fields from the SSH syslog messages and even adds a bit of extra information.
The following is the configuration snippet. As usual with Loggly, use your own “Customer Token” when using it:
Enter the following query in the search field to find root logins:
If you want to receive an alert on root logins, first save this query. To do this, click the star on the left and select “Save this search as…”. To set different parameters, including which saved search query to run, click “Alerts” and select “Add new”.
The SSH pattern is only one of the patterns available in the BalaBit Git repository. If you cannot find a pattern for your favorite application, it is not too difficult to write your own. The documentation has a thorough description of PatternDB. Related information is also collected here.
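To show the idea behind such a pattern, the following Python sketch turns one form of SSH login message into name-value pairs and serializes them as JSON. It is only an approximation: the regular expression, the “ssh.” field names, and the added “ssh.event” value are my own assumptions, not the rule from the BalaBit repository, and PatternDB uses its own pattern syntax rather than regular expressions.

```python
import json
import re

# Hypothetical pattern for one OpenSSH message form; not the actual PatternDB rule.
SSH_ACCEPTED = re.compile(
    r"^Accepted (?P<auth_method>\S+) for (?P<username>\S+) "
    r"from (?P<client_ip>\S+) port (?P<client_port>\d+) (?P<protocol>\S+)$"
)


def parse_ssh_message(message):
    """Return name-value pairs for a matching message, or None if it does not match."""
    match = SSH_ACCEPTED.match(message)
    if match is None:
        return None
    pairs = {"ssh." + name: value for name, value in match.groupdict().items()}
    # A rule can also attach extra information that is not in the message itself.
    pairs["ssh.event"] = "login"
    return pairs


def to_json(pairs):
    """Serialize the name-value pairs as a JSON object with stable key order."""
    return json.dumps(pairs, sort_keys=True)
```

For example, parse_ssh_message("Accepted password for root from 192.168.1.10 port 51234 ssh2") yields a dictionary in which “ssh.username” is "root", which is the kind of field a root-login query can match on.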
Sending logs from the syslog-ng Windows Agent Through a Central Server
By default, the syslog-ng Windows Agent sends RFC5424 log messages. This has the advantage of sending the name-value pairs of the Windows event as SDATA (structured data) along with the original log message. From the syslog-ng point of view, instead of parsing the messages, you can create filters based on name-value pairs that are sent along with the message. From the Loggly point of view, once SDATA has reached a central syslog-ng server, it can easily be turned into JSON-formatted log messages and forwarded to Loggly for easy querying and alerting.
Configuring this on the agent side is easy: Add a new server in the configuration interface and point it to port 601 of your local central syslog-ng server.
On the syslog-ng side, the following configuration will collect IETF logs and forward them to Loggly in JSON format. Once again, you have to replace the “customer token” with your own. To make sure that you have entered it correctly, log in to the Loggly web interface and click “Source Setup.” It will be displayed under “Customer Tokens”.
In this configuration, “s_windows” adds an IETF syslog source, “LogglyFormat” is a slightly modified Loggly template, which replaces the message part with JSON data, “d_loggly” is the Loggly destination, and at the end the log statement glues all of these together. Read more about the syslog-ng Windows Agent here and here.
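The conversion from SDATA to JSON can be pictured with a short Python sketch. This is not syslog-ng code: it is a simplified reading of the RFC 5424 structured-data syntax, and the sample element with its “win@18372.4” ID and EVENT_* parameters is a made-up example. The “.SDATA.” prefix follows the naming syslog-ng uses for structured data fields.

```python
import json
import re

# One SD-ELEMENT: [id name="value" name="value" ...]
SD_ELEMENT = re.compile(
    r'\[(?P<id>[^\s\]="]+)(?P<params>(?: [^\s=\]"]+="(?:\\.|[^"\\\]])*")*)\]'
)
# One SD-PARAM inside an element: name="value"
SD_PARAM = re.compile(r'(?P<name>[^\s=\]"]+)="(?P<value>(?:\\.|[^"\\\]])*)"')
# RFC 5424 escapes only the characters ", \ and ] inside values.
UNESCAPE = re.compile(r'\\(["\\\]])')


def sdata_to_pairs(sdata):
    """Flatten structured data into name-value pairs keyed as .SDATA.<id>.<name>."""
    pairs = {}
    for element in SD_ELEMENT.finditer(sdata):
        sd_id = element.group("id")
        for param in SD_PARAM.finditer(element.group("params")):
            value = UNESCAPE.sub(r"\1", param.group("value"))
            pairs[".SDATA.{}.{}".format(sd_id, param.group("name"))] = value
    return pairs


def sdata_to_json(sdata):
    """Serialize the flattened pairs as a JSON object."""
    return json.dumps(sdata_to_pairs(sdata), sort_keys=True)
```

Calling sdata_to_pairs('[win@18372.4 EVENT_ID="4624" EVENT_USERNAME="alice"]') gives two pairs, one per parameter, which is the shape of data that ends up in the JSON message body.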
Parsing Logs with a Fixed-column Structure (like Apache)
Many applications save their logs using a fixed column structure. The best known one is the Apache access log, but many other web and FTP servers use a similar log structure. Loggly automatically parses Apache logs, but we’ll use it as a parsing example because it’s a well-known format. The csv-parser() in syslog-ng can easily turn these log files into name-value pairs. The CSV parser doesn’t just parse comma-separated values, but any log type with a fixed column structure. For example, as mentioned in the introduction, a 404 status message can be assigned to a name-value pair named “request_status”.
Sending these logs to Loggly requires that syslog-ng read the log file using a file source. As demonstrated in the following example, this requires the flag “no-parse” to be enabled; otherwise, syslog-ng would parse each line for generic syslog fields such as date, program, and so on. The next part is a CSV parser, where the fields for the Apache log messages are defined. It is followed by a modified Loggly template, which includes all JSON fields starting with “APACHE”. The Loggly destination in this example is modified to use TLS. To learn how to configure that, read the above-mentioned Loggly syslog-ng page. The log statement at the end glues all of these together.
Read more about the CSV parser here.
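The column-splitting idea can be sketched in a few lines of Python. This is an illustration of the technique, not the csv-parser() implementation: it splits on spaces while keeping text inside double quotes or square brackets together. The column names are hypothetical; only the “APACHE” prefix comes from the description above.

```python
import json

# Opening quote character mapped to its closing counterpart.
QUOTE_PAIRS = {'"': '"', "[": "]"}

# Hypothetical column names for the Apache combined log format.
COLUMNS = [
    "APACHE.CLIENT_IP", "APACHE.IDENT_NAME", "APACHE.USER_NAME",
    "APACHE.TIMESTAMP", "APACHE.REQUEST_URL", "APACHE.REQUEST_STATUS",
    "APACHE.CONTENT_LENGTH", "APACHE.REFERER", "APACHE.USER_AGENT",
]


def split_columns(line, delimiter=" "):
    """Split a line on the delimiter, treating quoted or bracketed text as one column."""
    fields = []
    current = []
    closing = None  # closing quote we are waiting for, if any
    for char in line:
        if closing is not None:
            if char == closing:
                closing = None
            else:
                current.append(char)
        elif char in QUOTE_PAIRS and not current:
            closing = QUOTE_PAIRS[char]
        elif char == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(char)
    fields.append("".join(current))
    return fields


def parse_apache_line(line):
    """Return name-value pairs for a line, or None if the column count is unexpected."""
    fields = split_columns(line)
    if len(fields) != len(COLUMNS):
        return None
    return dict(zip(COLUMNS, fields))


def to_json(pairs):
    """Serialize the name-value pairs as a JSON object."""
    return json.dumps(pairs, sort_keys=True)
```

With this sketch, a combined-format line containing a 404 response ends up with “APACHE.REQUEST_STATUS” set to "404", so the status can be searched as a field rather than as free text.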
Parsing JSON Messages Before Sending Them to Loggly
As I stated at the beginning, one of my favorite features in Loggly is JSON parsing, and there are already some applications that emit JSON-formatted log messages. Still, there are situations when it is necessary to parse and modify these messages with syslog-ng before sending them to Loggly.
The most common reason to parse JSON messages before sending them to Loggly is to limit the volume of logs leaving your central syslog-ng server. Bandwidth from your office might be quite expensive, and sending lots of logs might even slow down other types of communication. If you are interested only in client IPs and URLs, the majority of fields can be discarded before the messages are sent to Loggly, or they can be saved only locally. Similar actions can be taken if sensitive data is not allowed to leave your data center. One example is demonstrated above: Only those name-value pairs that begin with “APACHE” are forwarded to Loggly, even though syslog-ng generates several other name-value pairs about each message. For more examples and a complete reference, check the documentation here.
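The filtering step itself is simple, as this small Python sketch shows. It stands in for the selection that syslog-ng performs when it builds the JSON message; the function names and the sample field names are hypothetical.

```python
import json


def select_by_prefix(pairs, prefix):
    """Keep only the name-value pairs whose name starts with the given prefix."""
    return {name: value for name, value in pairs.items() if name.startswith(prefix)}


def select_by_name(pairs, names):
    """Keep only an explicit list of fields, for example client IP and URL."""
    return {name: pairs[name] for name in names if name in pairs}


def parse_filter_emit(json_message, prefix):
    """Parse an incoming JSON message, drop unwanted fields, and re-serialize it."""
    pairs = json.loads(json_message)
    return json.dumps(select_by_prefix(pairs, prefix), sort_keys=True)
```

A message with generic fields such as HOST and PROGRAM next to the “APACHE” fields leaves select_by_prefix(pairs, "APACHE.") with only the latter, so less data crosses the network.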
The other reason to parse JSON messages is to rewrite a part of the message, often for compliance reasons. Common examples are replacing the middle of credit card numbers with asterisk characters for security reasons and replacing user names or client IP addresses with hash values for anonymity reasons. An interesting blog covering this part of PCI-DSS is available here.
As the included configuration examples are not easy to read, they are now easily available as simple rewrite functions here. Note that running anonymization on a single JSON field instead of the whole message has some performance benefits. Creating hashes is documented here.
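As a rough Python illustration of both rewrites (this is not the syslog-ng rewrite functions mentioned above), the sketch below masks the middle of 16-digit card numbers and replaces selected fields with SHA-256 hashes. The card number pattern is deliberately simplistic, and the field names and the optional salt parameter are assumptions made for the example.

```python
import hashlib
import re

# Simplistic pattern: four groups of four digits, optionally separated by space or dash.
CARD_NUMBER = re.compile(r"\b(\d{4})[ -]?(\d{4})[ -]?(\d{4})[ -]?(\d{4})\b")


def mask_card_numbers(text):
    """Replace the middle eight digits of each card number with asterisks."""
    return CARD_NUMBER.sub(r"\1-****-****-\4", text)


def hash_value(value, salt=""):
    """Return a hex SHA-256 digest so equal inputs stay correlatable but unreadable."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def anonymize(pairs, hashed_fields=("username", "client_ip"), masked_fields=("message",)):
    """Return a copy of the pairs with selected fields hashed or masked."""
    result = dict(pairs)
    for name in hashed_fields:
        if name in result:
            result[name] = hash_value(result[name])
    for name in masked_fields:
        if name in result:
            result[name] = mask_card_numbers(result[name])
    return result
```

Because anonymize() touches only the named fields, it also mirrors the point about running anonymization on a single field instead of the whole message.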
Tips and Tricks
If your machines are behind a firewall, port 514 might be closed due to previous bad experiences with the r* commands. Although Loggly does not provide another unencrypted syslog port, there is a workaround: Use encrypted syslog, which can be sent to port 6514. Using encryption is well-documented on the Loggly syslog-ng page, and it is easy to apply to any of the above configurations. It also has the added bonus that nobody can read your logs on the way to Loggly.
The benefits of logging in JSON are huge, since they translate directly into faster fixes to operational problems and much less time spent finding critical log data. Loggly and syslog-ng make it quick and easy for you to start reaping these benefits.