Discussion:
[rsyslog] Best practice for an application to get structured data to rsyslog
Ezell, Matthew A.
2015-04-15 03:31:38 UTC
Permalink
Hello-

What is the current "best practice" for a portable application to get
structured data to rsyslog?

Most modern syslog daemons seem to support some type of JSON format, but
applications still tend to use the old syslog(3) function for logging. If
an application emits CEE JSON directly to syslog(3), and no special
configuration is made to enable JSON parsing, the "typical" output file
(/var/log/messages or distribution-specific equivalent) gets JSON printed
to the log. That may be undesirable in the common case.

Ideally, there would be a syslog()-like library call that an application
could make to provide a "normal" syslog message as well as structured
data. /var/log/messages would just get the "normal" syslog message, but
System Administrators who care about structured logging could log the
structured data to an alternate file, forward it to a central syslog
daemon, or log it to a document store (mongodb, ElasticSearch, etc). That
library would (again, ideally) be pervasive (available *by default* on
most systems, like syslog.h today) or dead-simple to ship with an
application (meaning a license that allows redistribution and a minimal
number of files to pull into the application repository).

I've read up on CEE and LumberJack, but both projects seem to be
dead/crufty at this point. There's libumberlog and liblogging, but it's
not clear that either of them fit the use case of being able to detect if
the host "wants" structured logging and responding appropriately.


I've also seen systemd's journal sd_journal_send(), which seems like a
nice interface, but systemd is strictly linux-only. On linux, it looks
like regular syslog would just get the message part (to log into
/var/log/messages), but journalctl and rsyslog's imjournal could get at
the structured data. That's really what I want, but without the annoyance
of systemd being new and linux-only. I'd prefer not to pepper an
application with #ifdef's to figure out if it should use the journald
functions or something else.


I'd like to see structured logging become the norm - is it possible to
make it easy for application developers to add structured logging
capabilities without introducing JSON to /var/log/messages for "simple"
use cases?

Thanks for any advice you can provide,
~Matt

---
Matt Ezell
HPC Systems Administrator
Oak Ridge National Laboratory


_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
singh.janmejay
2015-04-15 03:49:39 UTC
Permalink
Have you looked at mmjsonparse? It solves the problem of
de-serializing structured-messages handed-over to rsyslog in
JSON-serialized form.

For dual-mode (structured and unstructured), two common approaches exist:
- Passing structured messages as JSON and optionally handling them
differently on the rsyslog side
- Parsing semi/un-structured messages to generate structured form
--
Regards,
Janmejay
http://codehunk.wordpress.com
Ezell, Matthew A.
2015-04-15 04:28:03 UTC
Permalink
Sure, as a system administrator it's pretty clear how best to handle this.
If there's CEE JSON data coming over the wire, use mmjsonparse. If it's
unstructured traditional syslog(3) data, use mmnormalize to try to extract
relevant fields based on rules I set up. Write the traditional "message"
field to /var/log/messages and send the structured data to ElasticSearch.
But I'm a system administrator who cares about structured logging, so I
would have a custom rsyslog setup to handle this seamlessly.

The question is really from the application developer's point of view.
How do you log structured data in a way that doesn't change the format of
/var/log/messages for most users, but provides additional information for
those system administrators who choose to handle the structured data?

Imagine going to the developers of OpenSSH and requesting that they start
logging structured data. If they simply changed all their syslog(3) calls
to output CEE JSON instead of plain strings, it's going to break
just about every brute-force login detection system out there. That's
unacceptable. What is the *right* thing for them to do?

~Matt

---
Matt Ezell
HPC Systems Administrator
Oak Ridge National Laboratory
David Lang
2015-04-15 04:48:05 UTC
Permalink
Post by Ezell, Matthew A.
Sure, as a system administrator it's pretty clear how best to handle this.
If there's CEE JSON data coming over the wire, use mmjsonparse. If it's
unstructured traditional syslog(3) data, use mmnormalize to try to extract
relevant fields based on rules I set up. Write the traditional "message"
field to /var/log/messages and send the structured data to ElasticSearch.
But I'm a system administrator who cares about structured logging, so I
would have a custom rsyslog setup to handle this seamlessly.
The question is really from the application developer's point of view.
How do you log structured data in a way that doesn't change the format of
/var/log/messages for most users, but provides additional information for
those system administrators who choose to handle the structured data?
Imagine going to the developers of OpenSSH and requesting that they start
logging structured data. If they simply changed all their syslog(3) calls
to output CEE JSON instead of plain strings, it's going to break
just about every brute-force login detection system out there. That's
unacceptable. What is the *right* thing for them to do?
Do what OSSEC does and have a config option that switches to JSON output.

Since their software has to keep working everywhere it works today, they
can't change its output at all. Anything they do will break parsers.

But with a config switch (which a distro could turn on by default), they can
output a different format, and that format could be JSON with the old log text
in a msg field (although, again, which one is the source of truth if they differ?)
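
To make the config-switch idea concrete, here is a minimal sketch in Python. It is an illustration only, not how OSSEC or OpenSSH do it: the function name format_log, the log_format values and the field names are all hypothetical. Both output formats are rendered from the same field dictionary, which is one way to keep the text and the JSON from disagreeing.

```python
import json

CEE_COOKIE = "@cee:"  # cookie that rsyslog's mmjsonparse looks for by default


def format_log(template, fields, log_format="plain"):
    """Build the string an application would hand to syslog(3).

    template   -- the legacy human-readable text, with {name} placeholders
    fields     -- dict holding the structured data (hypothetical names)
    log_format -- "plain" keeps today's output, "json" emits CEE JSON
    """
    # The legacy text is always rendered from the same fields, so the
    # "msg" value and the structured values cannot drift apart.
    text = template.format(**fields)
    if log_format == "json":
        payload = dict(fields)
        payload["msg"] = text
        return CEE_COOKIE + json.dumps(payload, sort_keys=True)
    return text
```

With log_format left at "plain" the output is byte-for-byte what existing parsers expect; the result of either mode would then be passed to the usual logging call, for example syslog.syslog(syslog.LOG_INFO, line) from Python's standard syslog module.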

David Lang
Brian Knox
2015-04-15 10:51:17 UTC
Permalink
We keep our logs in JSON format and don't find it to be a drawback. We
have logs searchable in elasticsearch - and for working with logs on disk,
have a small program that logs can be piped through that strips out
everything but the json which makes it very easy to pipe logs to jq (a
command line json processor - see https://stedolan.github.io/jq/ ).
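
The small program mentioned above is not shown in this thread, so the following is only a guess at what such a filter could look like, written in Python. The names extract_json and filter_stream are hypothetical, and the sketch assumes each log line carries at most one JSON object that starts at the first "{" on the line.

```python
import json


def extract_json(line):
    """Return the JSON object embedded in a log line as text, or None."""
    start = line.find("{")
    if start == -1:
        return None  # no JSON on this line
    try:
        # raw_decode parses one JSON value and ignores trailing text
        obj, _end = json.JSONDecoder().raw_decode(line[start:])
    except json.JSONDecodeError:
        return None  # a brace was present, but it was not valid JSON
    return json.dumps(obj)


def filter_stream(lines):
    """Yield only the JSON documents found in an iterable of log lines."""
    for line in lines:
        doc = extract_json(line)
        if doc is not None:
            yield doc
```

Used as a command-line filter, one would loop over filter_stream(sys.stdin), print each result, and pipe that output to jq.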
David Lang
2015-04-15 19:16:17 UTC
Permalink
This is why I love open source software: comment on a need and someone will
point at a tool.

Thanks!
Brett Delle Grazie
2015-04-16 06:16:37 UTC
Permalink
Post by Brian Knox
We keep our logs in JSON format and don't find it to be a drawback. We
have logs searchable in elasticsearch - and for working with logs on disk,
have a small program that logs can be piped through that strips out
everything but the json which makes it very easy to pipe logs to jq (a
command line json processor - see https://stedolan.github.io/jq/ ).
We've done something similar - we get our applications to output RFC 5424
directly to rsyslog and then ship the logs to a central store and to
Elasticsearch.
The application's logging library must support RFC 5424 though. This has
the advantage that processing is slightly more distributed and the number
of transformations is slightly reduced.
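
For readers who have not seen RFC 5424 structured data, here is a rough Python sketch of what such a line looks like when built by hand. It is not the logging library referred to above. The function names are hypothetical, and the SD-ID "login@32473" uses the enterprise number reserved for documentation examples; a real application would use its own.

```python
from datetime import datetime, timezone


def sd_escape(value):
    """Escape the characters RFC 5424 requires escaping in a PARAM-VALUE."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def rfc5424_line(facility, severity, hostname, app, procid, msgid,
                 sd_id, params, msg, when):
    """Format one RFC 5424 message with a single structured-data element."""
    pri = facility * 8 + severity  # PRI value as defined by the RFC
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"
    sd_params = "".join(
        ' {}="{}"'.format(name, sd_escape(str(value)))
        for name, value in params.items()
    )
    return "<{}>1 {} {} {} {} {} [{}{}] {}".format(
        pri, stamp, hostname, app, procid, msgid, sd_id, sd_params, msg)
```

The free-text part stays at the end of the line, so a receiver that ignores the bracketed element still gets the plain message.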
--
Kind regards,

Brett Delle Grazie
David Lang
2015-04-15 04:43:17 UTC
Permalink
Post by Ezell, Matthew A.
Hello-
What is the current "best practice" for a portable application to get
structured data to rsyslog?
Most modern syslog daemons seem to support some type of JSON format, but
applications still tend to use the old syslog(3) function for logging. If
an application emits CEE JSON directly to syslog(3), and no special
configuration is made to enable JSON parsing, the "typical" output file
(/var/log/messages or distribution-specific equivalent) gets JSON printed
to the log. That may be undesirable in the common case.
The question is why it is undesirable, and how much effort you are willing to
put in to fix the issue.

Having the application log the data twice (once in JSON and a second time in
some other format) is just begging for the two to get out of sync and report
different data.

So you are left with two choices

1. start with JSON and have something create human readable data from it

2. start with something human readable and have something parse it to create the
JSON

Rsyslog can go either way, but it's probably slightly more efficient and less
complicated to output human readable text and parse it with liblognorm
(mmnormalize module). If you are creating the logging output, you can make it
nicely structured so the parser is easy to define.

What I do is I ask for the apps to output in JSON wherever possible, and I don't
worry about creating a human friendly message in a text file. I write the JSON
(or a subset of it) to the text file and if someone needs it prettier, they can
read the JSON and convert it. 99.99+% of the time the logs are going to be
processed by software, and it's easier to have everything in JSON than to have
the software doing the processing parse the data (shell scripts are the only
thing I don't have a good JSON option for yet, but I haven't gone looking.
Perl, Python, etc all do it easily enough)

What I do is take whatever message was output and run mmjsonparse
against it. If it's CEE JSON (insert grumble about the requirement for the cee
cookie ;-) I have all the variables, but no $!msg field. If I have a $!msg
field, then I parse it using mmnormalize to extract variables from it. If there
isn't a $!msg field, I set $!msg=$msg so that I have something I can spit out
when I'm doing a 'plain' logfile.

I also add metadata to the JSON (fromhost-ip, received time, hostname of relay,
and an environment tag so that later on I can trivially tell the difference
between dev and prod copies of the same software)
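
The flow described above lives in rsyslog configuration, which is not shown in this thread. As a language-neutral illustration, here is the same decision logic sketched in Python. The function name to_structured and the metadata keys are hypothetical, and the mmnormalize step for unstructured text is deliberately left out.

```python
import json

CEE_COOKIE = "@cee:"


def to_structured(raw, metadata):
    """Turn one received syslog message into a dict of fields.

    raw      -- the message text as received
    metadata -- extra fields added on the relay (hypothetical names)
    """
    event = {}
    text = raw.lstrip()
    if text.startswith(CEE_COOKIE):
        try:
            parsed = json.loads(text[len(CEE_COOKIE):])
        except json.JSONDecodeError:
            parsed = None  # cookie present but payload is not valid JSON
        if isinstance(parsed, dict):
            event = parsed
    # Always keep something printable for a 'plain' logfile.
    event.setdefault("msg", raw)
    # Relay-side metadata such as source address or environment tag.
    event.update(metadata)
    return event
```

In this sketch a message that is not CEE JSON simply ends up with only msg and the metadata; that is the point where a normalization rule base would extract further fields.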
Post by Ezell, Matthew A.
Ideally, there would be a syslog()-like library call that an application
could make to provide a "normal" syslog message as well as structured
data. /var/log/messages would just get the "normal" syslog message, but
System Administrators who care about structured logging could log the
structured data to an alternate file, forward it to a central syslog
daemon, or log it to a document store (mongodb, ElasticSearch, etc). That
library would (again, ideally) be pervasive (available *by default* on
most systems, like syslog.h today) or dead-simple to ship with an
application (meaning a license that allows redistribution and a minimal
number of files to pull into the application repository).
I've read up on CEE and LumberJack, but both projects seem to be
dead/crufty at this point. There's libumberlog and liblogging, but it's
not clear that either of them fit the use case of being able to detect if
the host "wants" structured logging and responding appropriately.
I've also seen systemd's journal sd_journal_send(), which seems like a
nice interface, but systemd is strictly linux-only. On linux, it looks
like regular syslog would just get the message part (to log into
/var/log/messages), but journalctl and rsyslog's imjournal could get at
the structured data. That's really what I want, but without the annoyance
of systemd being new and linux-only. I'd prefer not to pepper an
application with #ifdef's to figure out if it should use the journald
functions or something else.
I'd like to see structured logging become the norm - is it possible to
make it easy for application developers to add structured logging
capabilities without introducing JSON to /var/log/messages for "simple"
use cases?
Look at liblogging; it was started as part of LumberJack for exactly this
purpose.

The problem is how do you make sure that your two copies of the log contain the
same info.

In practice, it's proved easier to just do it in the application with a
logformat config option.

part of the problem is that there are just too many different structured log
formats in use

there is

name=value name=value
name=value<tab>name=value
name=value|name=value
name=value,name=value
name=value;name=value

several of the above with ':' or ': ' instead of '='

some that use name=value in some areas and name: value in others

then some that have fixed width fields (variable number of spaces), and your
typical human readable messages.

I take the approach of converting everything into JSON and manipulating that. It
seems like that is what many of the logging daemons do internally.
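
As a rough illustration of converting these variants into JSON, here is a small Python sketch. It is not what rsyslog or liblognorm do internally. The name kv_to_json and the regular expression are assumptions made for the example, and it only copes with simple values that contain no spaces or delimiter characters.

```python
import json
import re

# name, then '=' or ':' with optional spaces, then a value that runs up to
# the next whitespace, '|', ',' or ';'
PAIR = re.compile(r"(\w+)\s*[=:]\s*([^\s|,;]+)")


def kv_to_json(line):
    """Convert a line of simple name/value pairs to a JSON object string."""
    return json.dumps(dict(PAIR.findall(line)), sort_keys=True)
```

Values that themselves contain colons (timestamps, IPv6 addresses) or quoted strings would need a real rule base; the sketch only shows why one common target format makes the later processing simpler.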

Frankly, I am more concerned with getting an application to write _any_ sort of
structure in the logs and avoid the "this field may or may not be in the
message" type of thing so that the logs are easily parsable. What format they
use is far less important to me, they all need some work to parse.

I do encourage JSON when I have a choice. There are things you can do with a
two-dimensional structure that are just a mess to do with name-value pairs.
Think of cases where you may have multiple names/IPs/ports in a message. Which
is easier:

{"source":{"ip":"1.2.3.4","name":"foo","port":5678},"dest":{"ip":"5.6.7.8","name":"bar","port":5678}}
or
sourceip=1.2.3.4 destip=5.6.7.8 sourcename=foo destname=bar sourceport=5678 destport=5678

Now add a third thing (say NAT info) and a few more parameters (interface,
machineid, username) and name-value pairs get _really_ ugly.
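
To see how quickly the flat form grows, here is a tiny Python sketch that flattens a nested event into prefixed name=value pairs. The function name flatten and the sample field names are made up for the illustration.

```python
def flatten(event, prefix=""):
    """Flatten a nested dict into a list of prefixed name=value strings."""
    pairs = []
    for key, value in event.items():
        name = prefix + key
        if isinstance(value, dict):
            # nested group: its keys get the group name as a prefix
            pairs.extend(flatten(value, name))
        else:
            pairs.append("{}={}".format(name, value))
    return pairs
```

Each extra group (for example a "nat" object with the same three keys) adds three more prefixed names to the flat line, while the nested form just gains one more object.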



As for the idea of getting everyone to log in one format:

it's a nice dream, but there's no chance of getting everyone to use the same
library.

It would be good to get a replacement for the systemd sd_journal_send() call
that would send structured data to syslog, so that any apps that get modified to
do something special for systemd can have that work leveraged to work better
even without it.
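
No such replacement is named in this thread, so the following Python sketch is purely hypothetical. It shows one way a compatibility shim could accept "KEY=value" strings in the style of sd_journal_send() and turn them into a syslog priority plus a CEE JSON payload. The function name, the lower-casing of keys and the mapping of MESSAGE to msg are all assumptions.

```python
import json


def journal_style_to_cee(*assignments):
    """Map "KEY=value" strings to (priority, CEE JSON line).

    MESSAGE becomes the "msg" field, PRIORITY becomes the syslog severity
    (default 6, informational), every other key is kept as a lower-cased field.
    """
    fields = {}
    for item in assignments:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError("expected KEY=value, got %r" % item)
        fields[key] = value
    priority = int(fields.pop("PRIORITY", 6))
    payload = {key.lower(): value
               for key, value in fields.items() if key != "MESSAGE"}
    payload["msg"] = fields.get("MESSAGE", "")
    return priority, "@cee:" + json.dumps(payload, sort_keys=True)
```

The returned pair could then be passed to an ordinary syslog call, for example syslog.syslog(priority, line) in Python, so that an application already written against the journal-style interface would not need a second code path.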

David Lang
Dave Caplinger
2015-04-15 15:25:44 UTC
Permalink
Post by David Lang
Post by Ezell, Matthew A.
Hello-
What is the current "best practice" for a portable application to get
structured data to rsyslog?
... gets JSON printed to the log. That may be undesirable in the common case.
the question is why it is undesirable and how much effort you are willing to do
to fix the issue.
...
Post by David Lang
What I do is I ask for the apps to output in JSON wherever possible, and I don't
worry about creating a human friendly message in a text file. I write the JSON
(or a subset of it) to the text file and if someone needs it prettier, they can
read the JSON and convert it.
For example: as long as there aren’t character set conversion issues such as writing Windows-1252 encoded strings into what should be UTF-8 JSON encoding, then tools like 'jq' <http://stedolan.github.io/jq/> are very helpful for pulling fields out of JSON-format logs. It can be as simple as: pipe the file to "jq -r '.msg'" to get the raw logs back out.
Post by David Lang
what I do is to take whatever message was output and then run mmjsonparse
against it. If it's cee JSON (insert grumble about the requirement for the cee
cookie ;-) I have all the variables, but no $!msg field. If I have a $!msg
field, then I parse it using mmnormalize to extract variables from it. If there
isn't a $!msg field, I set $!msg=$msg so that I have something I can spit out
when I'm doing a 'plain' logfile.
I also add metadata to the JSON (fromhost-ip, received time, hostname of relay,
and an environment tag so that later on I can trivially tell the difference
between dev and prod copies of the same software)
We do something very similar to this, and I suspect so do other high-volume Rsyslog users such as Radu at Sematext.

I feel this should just be Rsyslog’s recommended operational practice. If you’re building a log monitoring infrastructure today, this is how you should be doing it. Free-form text syslog should be considered a legacy encoding that is of course still supported as an input format (and if you must, an output format). Maybe we should put such a recommended config prominently on the Rsyslog web site to help overcome any lingering impressions that syslog is a legacy logging format that has been replaced by journald.
--
Dave Caplinger, Director of Architecture | Solutionary — An NTT Group Security Company

Radu Gheorghe
2015-04-15 15:47:52 UTC
Permalink
On Wed, Apr 15, 2015 at 6:25 PM, Dave Caplinger <
[...]
Post by David Lang
what I do is to take whatever message was output and then run mmjsonparse
against it. If it's cee JSON (insert grumble about the requirement for the cee
cookie ;-) I have all the variables, but no $!msg field. If I have a $!msg
field, then I parse it using mmnormalize to extract variables from it. If there
isn't a $!msg field, I set $!msg=$msg so that I have something I can spit out
when I'm doing a 'plain' logfile.
I also add metadata to the JSON (fromhost-ip, received time, hostname of relay,
and an environment tag so that later on I can trivially tell the difference
between dev and prod copies of the same software)
We do something very similar to this, and I suspect so do other
high-volume Rsyslog users such as Radu at Sematext.
Yes, we actually check whether parsing worked:

if $parsesuccess == "OK" then
...

and use different templates for JSON and non-JSON messages. For JSON ones
we use the $!all-json variable to get us all parsed properties. You could
also use the jsonmesg property to get everything (parsed + syslog
variables) but some info will be duplicated that way.
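
The check can be modeled outside rsyslog to make the logic explicit. The Python sketch below only illustrates the behavior described in this thread (try to parse JSON after the `@cee:` cookie, otherwise keep the raw text in `msg`). It is not how mmjsonparse is implemented, and `parse_cee` is a made-up name.

```python
import json

COOKIE = "@cee:"


def parse_cee(raw):
    """Return (parsed_ok, fields) for one syslog message body."""
    text = raw.lstrip()
    if text.startswith(COOKIE):
        try:
            fields = json.loads(text[len(COOKIE):])
            if isinstance(fields, dict):
                return True, fields
        except ValueError:
            pass
    # Not cee JSON: keep the original text so a plain logfile still has something.
    return False, {"msg": raw}


for body in ['@cee: {"user": "matt", "action": "login"}', "login by matt"]:
    ok, fields = parse_cee(body)
    template = "json" if ok else "plain"
    print(template, fields)
```

The boolean plays the role of the parse-success test: it decides which output template a message would get.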

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
David Lang
2015-04-15 16:07:59 UTC
Permalink
What I do on my relay boxes

# syslog header followed by the whole $! tree as cee JSON
$template structured_forwarding,"<%pri%>%timereported% %hostname% %syslogtag% @cee:%$!%\n"
module(load="imudp" timerequery="4")
module(load="imtcp" maxsessions="1000")
module(load="mmjsonparse")
# provides the rsyslog.ciscoios parser used in the ruleset below
module(load="pmciscoios")
input(type="imudp" port="514" ruleset="relay")
input(type="imtcp" port="514" ruleset="relay")
ruleset(name="relay" parser=["rsyslog.ciscoios","rsyslog.rfc5424","rsyslog.rfc3164"]){
  action(type="mmjsonparse")
  # if the message we got was in JSON from the beginning, there won't be a $!msg variable
  if $!msg == "" then set $!msg = $msg;
  # metadata added by this relay
  set $!trusted!origserver = $fromhost-ip;
  set $!trusted!edge!time = $timegenerated;
  set $!trusted!edge!relay = $$myhostname;
  set $!trusted!edge!input = $inputname;
  set $!trusted!environment = "NonProd";
  action(type="omfwd" Target="10.1.5.5" Port="514" Protocol="tcp" queue.type="FixedArray" template="structured_forwarding" name="relay_remote")
}

I do the rest of the parsing on the central system (it's fast enough and it
avoids bloating the messages that are relayed)

David Lang
Ezell, Matthew A.
2015-04-16 20:16:54 UTC
Permalink
Post by David Lang
look at liblogging, it was started as part of lumberjack for exactly this
purpose.
liblogging does not currently support structured logging. I've opened an
issue in GitHub to track that:
https://github.com/rsyslog/liblogging/issues/22
Post by David Lang
As far as the idea of getting everyone to log in one format goes:
it's a nice dream, but there's no chance of getting everyone to use the same
library
It would be good to get a replacement for the systemd sd_journal_send() call
that would send structured data to syslog, so that any apps that get modified to
do something special for systemd can have that work leveraged to work better
even without it.
liblogging talks about a 'journalemu' mode, but the code isn't in the
repository. I agree it's more likely that people will use the journal api
than something new that gets proposed.

If we want something ubiquitous, it needs to be easy to use and to program against.
And ideally it would ship by default from the distros or be a trivial
library that can be pulled into a project. The journal API, despite its
Linux-only nature, seems to have the most popularity (after the venerable
syslog(3), of course).

~Matt

---
Matt Ezell
HPC Systems Administrator
Oak Ridge National Laboratory
