Handling logstash input multiline codec

:heavy_exclamation_mark: This post is older than a year. Some information might not be accurate anymore. :heavy_exclamation_mark:

The multiline codec collapses multiline messages and merges them into a single event. By default it merges at most 500 lines (max_lines). If a message has more lines, the codec emits the first 500 lines as one event, the following lines form the next event (again up to 500 lines), and so forth. This post demonstrates how to deal with this situation. The event that hit the limit arrives in Elasticsearch with the tag multiline_codec_max_lines_reached:

    "tags" => [
        [0] "multiline",
        [1] "multiline_codec_max_lines_reached"
    ]

You can use a regular expression to handle these lines. As test data I use some Java application logs, e.g. from JBoss EAP:

    2016-09-14 16:47:12,845 INFO  [default-threads - 22] [497a1672-52ff-42b7-b53c-df202834c2f5] [APP] [] [] [TERMINAL] (TxManager) Final response message with escaped sensitive data: ...

Let's assume a regular log line always starts with an ISO 8601 timestamp. For demonstration purposes I lower the limit from 500 to 25 lines (max_lines):

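    input {
      file {
        path => "/var/log/test/server.log"
        start_position => "beginning"
        # forget the read position so the file is read from the start on every run (testing only)
        sincedb_path => "/dev/null"
        codec => multiline {
          # lines NOT starting with an ISO 8601 timestamp are appended to the previous line
          pattern => "^%{TIMESTAMP_ISO8601}"
          negate => true
          what => "previous"
          # lowered from the default of 500 for demonstration purposes
          max_lines => 25
        }
      }
    }

    output {
      stdout { codec => rubydebug }
    }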
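To see why the follow-up events no longer start with a timestamp, here is a simplified Python model of the codec settings above (negate => true, what => previous, max_lines). It is a sketch, not the plugin's code: the regex NEW_EVENT is an assumed stand-in for ^%{TIMESTAMP_ISO8601}, the tagging is modeled on the codec's behavior, max_bytes and auto_flush_interval are ignored, and the input lines are made up.

```python
import re

# Assumed stand-in for "^%{TIMESTAMP_ISO8601}": only checks a
# "yyyy-MM-dd HH:mm:ss" prefix.
NEW_EVENT = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")


def multiline(lines, max_lines=25):
    """Toy model of negate => true, what => previous with a max_lines limit."""
    events, buffer = [], []

    def flush():
        # emit the buffered lines as one event; tags modeled on the codec
        if not buffer:
            return
        tags = []
        if len(buffer) > 1:
            tags.append("multiline")
        if len(buffer) >= max_lines:
            tags.append("multiline_codec_max_lines_reached")
        events.append({"message": "\n".join(buffer), "tags": tags})
        buffer.clear()

    for line in lines:
        # a timestamp line starts a new event; a full buffer is flushed as well
        if NEW_EVENT.match(line) or len(buffer) >= max_lines:
            flush()
        buffer.append(line)
    flush()  # end of input: emit whatever is still buffered
    return events


# Made-up input: one log line, 30 continuation lines, then the next log line.
log = ["2016-09-14 16:47:12,845 INFO  [main] (TxManager) start"]
log += [f"    continuation line {i}" for i in range(1, 31)]
log.append("2016-09-14 16:47:13,001 INFO  [main] (TxManager) next")

events = multiline(log, max_lines=25)
for event in events:
    msg_lines = event["message"].splitlines()
    print(len(msg_lines), event["tags"], msg_lines[0].strip())
```

In this model the input yields three events: 25 lines tagged multiline_codec_max_lines_reached, then the remaining 6 continuation lines, which begin with "continuation line 25" instead of a timestamp, and finally the second log line on its own.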

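Only the first chunk of a split message starts with a timestamp; the continued events begin somewhere in the middle of the original message. So you can use a regular expression (line starts with an ISO timestamp) in a conditional to grok the regular events properly, and in the else branch do anything you like with the rest, e.g. drop them.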

    filter {
      # regular events start with a date (yyyy-MM-dd); continued events do not
      if [message] =~ /^([0-9]{4})-?(1[0-2]|0[1-9])-?(3[01]|0[1-9]|[12][0-9])/ {
        grok {
          match => { "message" => "%{TIMESTAMP_ISO8601:datetime}\s%{WORD:level}\s*\[%{DATA:thread}\]\s\[%{DATA:requestid}?\]\s+\[%{WORD:appid}?\]\s+\[%{DATA:sessionid}?\]\s+\[%{DATA:trxid}?\]\s+\[%{DATA:terminalid}?\]\s+\(%{DATA:class}\)\s+%{GREEDYDATA:logmessage}?" }
        }
      } else {
        # drop continued multilines
        drop { }
      }
    }

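If you don't drop the continued events, make sure you either don't grok them at all or grok them with a pattern that fits them; otherwise grok tags them with _grokparsefailure.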
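The routing in the filter block can be checked outside Logstash with a short Python sketch. The conditional's regex is used unchanged with re.match (start of the message only). The grok pattern is expanded by hand as an approximation, assuming WORD, DATA and GREEDYDATA stand for \b\w+\b, .*? and .*, and with TIMESTAMP_ISO8601 simplified to the format of the sample line. route() is a hypothetical helper, and the stack-trace line is made up.

```python
import re

# The filter's conditional regex, unchanged; applied with re.match below.
IS_LOG_LINE = re.compile(r"^([0-9]{4})-?(1[0-2]|0[1-9])-?(3[01]|0[1-9]|[12][0-9])")

# Hand-expanded approximation of the grok pattern (assumptions: WORD = \b\w+\b,
# DATA = .*?, GREEDYDATA = .*, simplified TIMESTAMP_ISO8601).
GROK = re.compile(
    r"(?P<datetime>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?)\s"
    r"(?P<level>\b\w+\b)\s*\[(?P<thread>.*?)\]\s\[(?P<requestid>.*?)\]\s+"
    r"\[(?P<appid>\b\w+\b)?\]\s+\[(?P<sessionid>.*?)\]\s+\[(?P<trxid>.*?)\]\s+"
    r"\[(?P<terminalid>.*?)\]\s+\((?P<class>.*?)\)\s+(?P<logmessage>.*)"
)


def route(message):
    """Hypothetical helper mirroring the if/else of the filter block."""
    if IS_LOG_LINE.match(message):
        match = GROK.search(message)
        # grok adds the tag _grokparsefailure when its pattern does not match
        return match.groupdict() if match else {"tags": ["_grokparsefailure"]}
    return None  # the else branch: drop { }


sample = ("2016-09-14 16:47:12,845 INFO  [default-threads - 22] "
          "[497a1672-52ff-42b7-b53c-df202834c2f5] [APP] [] [] [TERMINAL] "
          "(TxManager) Final response message with escaped sensitive data: ...")
fields = route(sample)
print(fields["level"], fields["class"], fields["terminalid"])  # INFO TxManager TERMINAL
print(route("\tat com.example.Foo.bar(Foo.java:42)"))  # None
```

The sample line takes the grok branch and yields the same field names as the grok pattern, while a continuation line such as a stack-trace frame falls through to the drop branch.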