[rsyslog] Are we building an ERK stack?

+1

Our current scenario (dockerized!):

imfile_forwarder-->imrelp-->rsyslog-->redis-->logstash(grok+geoip)-->elastic

We are using Redis as a memory buffer and to split messages into multiple
channels/lists (using dynakey at the moment). We see Kafka on the horizon.

We are also using several Logstash containers to balance load, prevent a
single point of failure, etc.

What we're thinking about after the past few days' messages:

imfile_forwarder-->imrelp-->rsyslog-->elastic

Multiple rsyslog instances with simpler configs (instead of 5k lines
with thousands of rulesets, templates and so on), GeoIP lookups,
reliable queues...
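For the simplified imrelp-->rsyslog-->elastic path, the receiving side could look roughly like this sketch. It assumes the imrelp and omelasticsearch modules; the host name "elastic", the daily index naming, file paths and queue settings are illustrative placeholders, not tested values. With no template given, omelasticsearch falls back to its default JSON template, and errorFile keeps the requests Elasticsearch rejected, which is relevant to the "failed message queues" concern raised in this thread.

```
global(workDirectory="/var/lib/rsyslog")

module(load="imrelp")
module(load="omelasticsearch")

input(type="imrelp" port="20514" ruleset="toES")

# Hypothetical daily index name such as "logs-2016-11-23":
# the first 10 characters of the RFC 3339 reported timestamp
template(name="es_index" type="string" string="logs-%timereported:1:10:date-rfc3339%")

ruleset(name="toES") {
  # dynSearchIndex="on" makes searchIndex the name of a template, not a literal index
  # errorFile records requests that Elasticsearch rejected, with its reply
  # queue.filename makes the action queue disk-assisted (files go to workDirectory)
  action(type="omelasticsearch" server="elastic" serverport="9200"
         searchIndex="es_index" dynSearchIndex="on" bulkmode="on"
         errorFile="/var/log/rsyslog/es-errors.log"
         queue.type="LinkedList" queue.filename="es_q" queue.saveOnShutdown="on"
         action.resumeRetryCount="-1")
}
```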

I won't dare say it's time to review/refactor rsyslog, but
http://youtu.be/0O5h4enjrHw

Post by Bob Gregory
There've been a few discussions over the last few days that are all
* Is it better to use Rsyslog's omelasticsearch rather than pushing to
logstash?
* Should we have a minimal log shipper component as distinct from rsyslog's
processing capabilities?
* Ought we to have an imhiredis module?
Really what we're talking about is replacing Logstash (and the various
beats) with rsyslog. I'm perfectly happy with that, Logstash is a
resource-expensive and fickle beast that spoils my otherwise pristine log
pipeline, but I do think the community ought to think about whether this is
the direction they want to take.
For my part, I'm quite happy to help build an imhiredis (and imkafka?)
module but only if I can actually dogfood it, which means replacing
Logstash in our own environment.
For that, I'd like to see better support for GeoIP tagging, a Riemann
output plugin, some better guidance on "failed message queues", etc. etc.
etc.
Are we jointly interested in building the REK stack and, if so, can we
start to work out the feature set we're missing, and the documentation we'd
need for this to work? I'm a little concerned that if we tackle the use case
piecemeal, we'll end up with lots of disjointed parts that don't really
solve the problem: logstash is not an adequate logstash.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

Rainer Gerhards

2016-11-23 13:20:40 UTC

Post by m***@gmail.com
I wont dare to say it's time to review/refactor rsyslog, but
http://youtu.be/0O5h4enjrHw

Refactoring per se is not a problem, we just need to keep it in
manageable pieces. We have had a big refactoring almost every year :-)

Rainer

chenlin rao

2016-11-23 13:59:14 UTC

ERK +1. I have published my experiment at
http://www.slideshare.net/chenryn/elk-stack-at-weibocom

rsyslog-imsock
-> rsyslog-omfwd
-> rsyslog-imptcp
-> rsyslog-mmnormalize/rsyslog-mmgrok/rsyslog-mmdblookup/rsyslog-mmfields/rainerscripts...
   -> rsyslog-omkafka -> kafka -> hangout -> es cluster
   -> rsyslog-omprog -> python scripts -> zabbix

I had published my rsyslog-mmdblookup for GeoIP2 lookups, then David Lang told me
this can be done with the lookup_table function. I think there should be a good
article about this great function and GeoIP lookup practice.
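A minimal sketch of GeoIP enrichment with the mmdblookup module, assuming a rulebase (hypothetical path) that extracts the client address into $!clientip and a MaxMind GeoLite2 City database on disk. The parameter names (mmdbfile, key, fields) follow mmdblookup as it later shipped in rsyslog; check them against the version you run.

```
module(load="mmnormalize")
module(load="mmdblookup")

ruleset(name="geo") {
  # Hypothetical rulebase that puts the client address into $!clientip
  action(type="mmnormalize" rulebase="/etc/rsyslog.d/access.rb")
  # Look $!clientip up in the GeoIP2 database and keep only the listed fields
  # (the shipped module stores them under $!iplocation; verify for your version)
  action(type="mmdblookup" mmdbfile="/etc/rsyslog.d/GeoLite2-City.mmdb"
         key="!clientip" fields=["!continent!code", "!location"])
}
```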

m***@gmail.com

2016-11-23 14:39:57 UTC

http://www.slideshare.net/chenryn/elk-stack-at-weibocom

I NEED the english version :P

chenlin rao

2016-11-25 02:46:27 UTC

Re-uploaded an English version. The content is a little old, though.

David Lang

2016-11-25 06:58:26 UTC

Reading through the slides, a couple of comments.

I've found that queue type FixedArray is slightly (but measurably) faster than
LinkedList.
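Switching an action queue between the two types is a one-parameter change, so both are cheap to measure. A sketch; the target host, port and queue.size are placeholders. A FixedArray queue preallocates an array of queue.size message slots, trading some memory for lower per-message overhead.

```
module(load="omrelp")

# Same forwarding action, with the in-memory queue type switched to FixedArray
action(type="omrelp" target="server" port="20514"
       queue.type="FixedArray" queue.size="50000")
```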

I suspect that the problems you were running into on slide 52 were the json-c
threading problems that have now been solved with libfastjson.

I'd be very interested in seeing speed comparisons between the lookup table and your
mmdblookup.

At your log volumes, I expect that creating a string module (sm*, a C version of a
template definition) would make a noticeable performance difference. We saw >10%
when we changed the default templates to C definitions.

It's a very useful slide deck. How has the 5.x version of ES changed things
there?

David Lang

chenlin rao

2016-11-25 07:26:39 UTC

Yes, the slides are nearly 1.5 years old. Since then, we have:

- switched to omkafka + <https://github.com/childe/hangout> instead of using
omelasticsearch directly. I gave the reason in another mail a few days ago.
- rewritten most of the mmgrok rules as mmnormalize + RainerScript, except for the
PHP slowlog. There we want to replace the memory address on each line with "xxxxx",
which seemingly can't be done in rsyslog, so we use mmexternal for it.
- tried stream compression with imptcp (between shipper and rsyslog
server). It saved about 2/3 of the bandwidth, but messages were discarded at the
nightly peak, so we rolled it back.
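Writing to Kafka instead of Elasticsearch is an action swap on the rsyslog side. A sketch, assuming the omkafka module and a downstream consumer (hangout, in chenlin's setup) that expects JSON; broker addresses, topic name and queue settings are placeholders, and kafka_json is a hypothetical template that serializes the whole $! tree.

```
module(load="omkafka")

# Hypothetical output template: every $! variable as one JSON object
template(name="kafka_json" type="string" string="%$!all-json%")

action(type="omkafka" broker=["kafka1:9092", "kafka2:9092"] topic="logs"
       template="kafka_json"
       queue.type="LinkedList" queue.size="100000"
       action.resumeRetryCount="-1")
```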

No experiments with ES 5 yet. The author of hangout told me ES 5.0.0
has some serious problems
(https://github.com/elastic/elasticsearch/issues/21612,
https://github.com/elastic/elasticsearch/issues/21611), so we are waiting before
upgrading.

Rainer Gerhards

2016-11-25 07:39:52 UTC

Post by chenlin rao
- rewrite most of mmgrok into mmnormalize+rainerscript. Except PHP slowlog
only. We want to translate the memory address of each line into "xxxxx",
but seems can't be done in rsyslog, so a mmexternal here.

Focused question: how exactly do you detect memory addresses? I ask
because there is mmanon, which does something similar for IP
addresses, and I *think* it could be extended to other objects if only
we knew precisely what to look for and how to transform it.
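For reference, today's IP-address use of mmanon is a single action. The sketch below anonymizes the low-order 16 bits of IPv4 addresses found in the message; extending the same idea to other patterns, such as PHP memory addresses, is only the proposal under discussion and has no syntax shown here.

```
module(load="mmanon")

# Anonymize the low-order 16 bits of every IPv4 address found in the message
action(type="mmanon" ipv4.bits="16")
```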

Rainer

chenlin rao

2016-11-25 08:17:57 UTC

No, I don't detect them; I just capture them with a specific regexp, because I
only need to process the PHP slowlog, where the memory address appears at the beginning
of the line: `\[0x\w+\]`... The use case for slow-function stacks without memory addresses
can be found on slide 25 (pie charts for nested sub-terms aggregations).
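One way to keep such an external step narrow is to route only the PHP slowlog through mmexternal. A sketch; the tag "php-slowlog" and the script path are hypothetical, and the script itself (which would apply a regexp like `\[0x\w+\]` and reply in mmexternal's JSON format) is not shown.

```
module(load="mmexternal")

# Only the PHP slowlog needs masking, so only those messages leave rsyslog
if $programname == "php-slowlog" then {
  # Hypothetical script: reads each message on stdin, replaces the memory
  # address with "xxxxx", and writes the modified message back on stdout
  action(type="mmexternal" binary="/usr/local/bin/mask-php-addrs")
}
```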

m***@gmail.com

2016-11-25 08:46:45 UTC

Thanks!

Is your mmdblookup open-sourced?

chenlin rao

2016-11-26 04:50:50 UTC

https://github.com/rsyslog/rsyslog/pull/1099

David Lang

2016-11-23 14:00:48 UTC

Post by m***@gmail.com
Having multiple rsyslog instances with simpler configs (instead of 5k lines
with thousand of rulesets, templates and so), being able to geoip, reliable
queues...

There are probably ways to simplify the configs; 5K lines of config seems
excessive :-) How much of this is rulebase config vs. rsyslog config?

Rsyslog is designed to be fast and supports a lot of threading options for speed
(most defined implicitly by the creation of queues), so you should not need to
have lots of different instances.
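As an illustration of queues implying threads, the sketch below gives a ruleset its own queue with several worker threads, so one instance processes incoming RELP traffic in parallel. The port, output file and all queue numbers are placeholders to tune, not recommendations.

```
module(load="imrelp")

input(type="imrelp" port="20514" ruleset="incoming")

# A ruleset with its own queue gets its own pool of worker threads,
# so parallelism comes from queue settings rather than extra instances
ruleset(name="incoming" queue.type="LinkedList" queue.size="200000"
        queue.workerThreads="4" queue.dequeueBatchSize="1024") {
  action(type="omfile" file="/var/log/incoming.log")
}
```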

I've had single instances of rsyslog processing 100K messages/sec in real-world
use, and people have benchmarked rsyslog with simple configs at over 1M
messages/sec in a VM.

David Lang

m***@gmail.com

2016-11-23 14:51:15 UTC

Post by David Lang
there are probably ways to simplify the configs, 5K lines of configs
seems excessive :-) how much of this is rulebase config vs rsyslog config?

Each app generates app-access.log, app-tomcat.log and app-application.log
files. imfile allows me to add the filename as metadata, but nothing more.
As each application belongs to a workgroup, is part of an organizational
unit and runs on some (multiple) hosts, in the end I have
approximately this for each app:

template(name="json_appX" type="list") {
    property(name="hostname")
    constant(value=" ")
    property(name="syslogtag")
    constant(value=" {")
    constant(value="\"group\":\"group\",\"unit\":\"unit\",\"app\":\"appX\"")
    constant(value=",\"file\":\"")
    property(name="$!metadata!filename")
    constant(value="\",\"msg\":\"")
    property(name="msg" format="jsonr")
    constant(value="\"}")
}
ruleset(name="json_appX") {
    action(
        template="json_appX"
        type="omrelp"
        target="server"
        port="20514"
        action.resumeRetryCount="-1"
        action.reportSuspension="on"
        queue.maxdiskspace="5M"
        queue.type="LinkedList"
        queue.filename="appX.qi"
        queue.SaveOnShutdown="on"
    )
}
input(type="imfile" file="/logs/appX/access.log" tag="group/appX"
      addMetadata="on" ruleset="json_appX" PersistStateInterval="1")
input(type="imfile" file="/logs/appX/tomcat.log" tag="group/appX"
      addMetadata="on" ruleset="json_appX" PersistStateInterval="1")
input(type="imfile" file="/logs/appX/application.log" tag="group/appX"
      addMetadata="on" ruleset="json_appX" PersistStateInterval="1")

which, repeated for every app, becomes 5K lines of config.

Rainer Gerhards

2016-11-23 14:57:13 UTC

Would the capability to add metadata in the imfile input help? That would be
easy to add. If not, what would give you the metadata?

Rainer

Sent from phone, thus brief.

m***@gmail.com

2016-11-23 15:05:33 UTC

Of course it will help.

Let me take the risk: is there an rsyslog wiki where I could start
documenting what ERK should look like? :P

Markdown is mandatory.

Rainer Gerhards

2016-11-23 15:17:38 UTC

wiki.rsyslog.com

But I think it has not been updated for a while. You may need an account,
if so, let me know.

I am not sure if the wiki is the best place to do it. We have been thinking about
retiring it for a while; it was mainly a spam dump...

While I haven't tried it, a GitHub wiki might be better, especially from a
visibility point of view. I could enable it if there are no objections. I don't know,
though, how granular the GitHub permissions are.

Rainer

Sent from phone, thus brief.

David Lang

2016-11-23 15:37:07 UTC

Post by Rainer Gerhards
wiki.rsyslog.com
But I think it has not been updated for a while. You may need an account,
if so, let me know.
I am not sure if the wiki is the best place to do it. We think about
retiring it for a while, it was mainly a spam dump...
While I haven't tried it, a GitHub wiki might be better, especially from a
visibility pov. I could enable it if there are no objections. I don't know
though how granular the GitHub premissons are.

Anything that's on the old rsyslog wiki is rather out of date at this point. I
think turning on the GitHub wiki and trying it would be a good move.

David Lang

David Lang

2016-11-23 15:28:45 UTC

Post by David Lang
there are probably ways to simplify the configs, 5K lines of configs seems
excessive :-) how much of this is rulebase config vs rsyslog config?

You should be able to collapse all the different templates into one. Instead of
hard-coding the group/unit/app in each template, make that a
variable that you set.

The ugly way to do this would be a series of

if $programname == "group/appX" then set $.owner = "\"group\":\"group\",\"unit\":\"unit\",\"app\":\"appX\",";

statements.

A far more elegant way to do this would be to do a table lookup on the
programname and have it return the string.
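A minimal sketch of the lookup-table version, assuming each imfile input uses a plain-word tag such as "appX" (so $programname is exactly the tag) and a hypothetical table file /etc/rsyslog.d/owners.json, whose content is shown in the comments. lookup_table() and lookup() are rsyslog's lookup-table feature; the file, tag, ruleset and template names are illustrative. One template and one ruleset then serve every app, and only the table grows.

```
module(load="imfile")
module(load="omrelp")

# Hypothetical /etc/rsyslog.d/owners.json, keyed on the imfile tag:
# {
#   "version": 1,
#   "nomatch": "\"group\":\"unknown\",\"unit\":\"unknown\",\"app\":\"unknown\"",
#   "type": "string",
#   "table": [
#     {"index": "appX", "value": "\"group\":\"group\",\"unit\":\"unit\",\"app\":\"appX\""},
#     {"index": "appY", "value": "\"group\":\"group2\",\"unit\":\"unit\",\"app\":\"appY\""}
#   ]
# }
lookup_table(name="owners" file="/etc/rsyslog.d/owners.json" reloadOnHUP="on")

# One template for every app; the per-app JSON fragment comes from $.owner
template(name="json_app" type="list") {
  property(name="hostname")
  constant(value=" ")
  property(name="syslogtag")
  constant(value=" {")
  property(name="$.owner")
  constant(value=",\"file\":\"")
  property(name="$!metadata!filename")
  constant(value="\",\"msg\":\"")
  property(name="msg" format="jsonr")
  constant(value="\"}")
}

ruleset(name="json_all") {
  # One lookup per message; unknown tags get the "nomatch" value
  set $.owner = lookup("owners", $programname);
  action(type="omrelp" target="server" port="20514" template="json_app")
}

input(type="imfile" file="/logs/appX/access.log" tag="appX"
      addMetadata="on" ruleset="json_all")
input(type="imfile" file="/logs/appY/access.log" tag="appY"
      addMetadata="on" ruleset="json_all")
```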

You can also simplify the template a bit. Instead of crafting the JSON in the
template, create a variable that has what you want in/under it and output that
variable. But compared to collapsing all the templates together, that's a minor
change :-)
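A sketch of the variable-based template, assuming each imfile input uses a plain-word tag such as "appX"; the two lookup tables (app2group, app2unit) are hypothetical files in rsyslog's string lookup-table format. Instead of hand-assembling JSON, the ruleset fills the $! tree and the template emits it with $!all-json, which also carries the $!metadata!filename that imfile adds when addMetadata is on.

```
module(load="imfile")
module(load="omrelp")

# Hypothetical per-field tables keyed on the imfile tag
lookup_table(name="app2group" file="/etc/rsyslog.d/app2group.json")
lookup_table(name="app2unit" file="/etc/rsyslog.d/app2unit.json")

# No hand-built JSON: $!all-json serializes the whole $! tree,
# including $!metadata from imfile
template(name="json_tree" type="list") {
  property(name="hostname")
  constant(value=" ")
  property(name="syslogtag")
  constant(value=" ")
  property(name="$!all-json")
}

ruleset(name="json_all") {
  set $!group = lookup("app2group", $programname);
  set $!unit = lookup("app2unit", $programname);
  set $!app = $programname;
  set $!msg = $msg;
  action(type="omrelp" target="server" port="20514" template="json_tree")
}

input(type="imfile" file="/logs/appX/access.log" tag="appX"
      addMetadata="on" ruleset="json_all")
```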

I question the value of having a separate sending queue for each app. I think
it's better to send them in one combined firehose and split them on the
receiving side. That is less disruptive when you find you want to change the
groupings of things, and all those queues on the sender can eat up a lot of RAM.
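A sketch of the single-firehose sender: every imfile input feeds one ruleset with one disk-assisted omrelp queue, and any splitting by app or group moves to the receiver. The built-in RSYSLOG_ForwardFormat template stands in for whatever JSON template you settle on; paths, the queue file name and the disk limit are placeholders.

```
global(workDirectory="/var/lib/rsyslog")

module(load="imfile")
module(load="omrelp")

# Every app's files feed the same ruleset...
input(type="imfile" file="/logs/appX/access.log" tag="appX" addMetadata="on" ruleset="fwd")
input(type="imfile" file="/logs/appX/tomcat.log" tag="appX" addMetadata="on" ruleset="fwd")
input(type="imfile" file="/logs/appY/access.log" tag="appY" addMetadata="on" ruleset="fwd")

# ...which has one disk-assisted action queue instead of one queue per app
ruleset(name="fwd") {
  action(type="omrelp" target="server" port="20514"
         template="RSYSLOG_ForwardFormat"
         queue.type="LinkedList" queue.filename="fwd_q"
         queue.maxDiskSpace="500m" queue.saveOnShutdown="on"
         action.resumeRetryCount="-1" action.reportSuspension="on")
}
```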

David Lang

m***@gmail.com

2016-11-23 17:39:04 UTC

Post by David Lang
The ugly way to do this would be a series of
if $programname = "group/appX" then set $.owner =
"\"group\":\"group\",\"unit\":\"unit\",\"app\":\"appX\",";

Does having multiple templates affect performance? (What I really noticed is
that they affect loading time!)

Post by David Lang
a far more elegant way to do this would be to do a table lookup on the
programname and have it return the string.

I have read about how lookup tables can be used for GeoIP. Could you
provide a link to a doc with an example?

Post by David Lang
you can also simplify the template a bit. Instead of crafting the json
in the template, create a variable that has what you want in/under it
and output that variable. but compared to collapsing all the templates
together, that's a minor change :-)

So, one variable for each file and one template that uses it, right?

Post by David Lang
I question the value of having a separate sending queue for each app.
I think it's better to send them in one combined firehose and split
them on the receiving side. It makes it less disruptive when you find
you want to change the groupings of things and all those queues on the
sender can eat up a lot of ram.

Probably this is because I came from Redis.
Talking about Elasticsearch, an ingest node would probably be the best option,
with the index name carried as metadata.

I'll keep an eye on that too.

David Lang

2016-11-23 18:33:49 UTC

Post by David Lang
The ugly way to do this would be a series of
if $programname = "group/appX" then set $.owner =
"\"group\":\"group\",\"unit\":\"unit\",\"app\":\"appX\",";

Does having multiple templates affect performance? (What I really noticed is that
they affect loading time!)

not really. We haven't had anyone experiment with thousands of them, so a
measurable slowdown as rsyslog finds the right one to use is possible, but
unlikely.

The bigger overhead is in interpreting the template; that's where simplifying it
to just $! or $!foo would be a big win (or writing a string module).

Post by David Lang
a far more elegant way to do this would be to do a table lookup on the
programname and have it return the string.

I have read about how lookup tables can be used for GeoIP. Could you
provide a link to a doc with an example?

there isn't a good writeup, but if you read up on how to use the MaxMind database,
the Perl example has you create an array where the first element is the decimal
equivalent of the first IP address that matches the data.

This is exactly the structure that a sparse array lookup table is intended for.
I believe there is a function that will take an IPv4 address and return a
decimal number (if not, we need to add one). Use that function to create a
number, look up the number in the lookup table, and have it return the data.
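To make the sparse-array idea concrete, here is a small Python model of the lookup David describes: convert the address to its decimal form, then return the value of the entry with the largest start address that is less than or equal to it. This is only an illustration of the semantics; it does not use any rsyslog function, and the ranges and labels below are hypothetical, not real GeoIP data (in practice they would come from the MaxMind database).

```python
import bisect
import ipaddress


def ipv4_to_int(addr):
    """Decimal (integer) form of a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(addr))


class SparseIPTable:
    """Each entry is (first address of a range, value); a lookup returns the
    value of the largest start <= the key, i.e. sparse-array semantics."""

    def __init__(self, entries, nomatch=None):
        pairs = sorted((ipv4_to_int(start), value) for start, value in entries)
        self._starts = [start for start, _ in pairs]
        self._values = [value for _, value in pairs]
        self._nomatch = nomatch

    def lookup(self, addr):
        # bisect_right finds the first start > key; the entry before it covers key
        i = bisect.bisect_right(self._starts, ipv4_to_int(addr)) - 1
        return self._values[i] if i >= 0 else self._nomatch


# Hypothetical ranges and labels, not real GeoIP data.
TABLE = SparseIPTable(
    [("10.0.0.0", "site-a"), ("10.1.0.0", "site-b"), ("11.0.0.0", "unassigned")],
    nomatch="unknown",
)

if __name__ == "__main__":
    print(ipv4_to_int("10.0.0.1"))   # 167772161
    print(TABLE.lookup("10.0.5.7"))  # site-a
```

Because only range starts are stored, a table built from the GeoIP data stays as small as the number of ranges, not the number of addresses.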

Post by David Lang
you can also simplify the template a bit. Instead of crafting the json in
the template, create a variable that has what you want in/under it and
output that variable. but compared to collapsing all the templates
together, that's a minor change :-)

So, one variable for each file and one template that uses it, right?

$!foo!bar = "abc" maps to {"foo": { "bar": "abc" } } in json and if you put
%$!foo% in a template, what you will get is '{ "bar": "abc" }'

so where you had group, unit, app, msg, and a couple of other things, and then
glued them together with {, }, " and , into a JSON string, you could instead do

set $!foo!group = "A";
set $!foo!unit = "b";
set $!foo!msg = $!msg;
...

and then replace all that hard-to-read json construction in the template with
$!foo
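A rough Python model of the mapping described above, assuming that each "!" in a $! property name is one level of JSON nesting. The helper set_path is hypothetical and only mimics what set $!foo!bar = ...; does; rsyslog's exact output spacing may differ from json.dumps.

```python
import json


def set_path(root, path, value):
    """Model `set $!a!b = value;`: create nested objects along the ! path."""
    *parents, leaf = path.split("!")
    node = root
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


tree = {}  # stands in for the message's $! tree
set_path(tree, "foo!group", "A")
set_path(tree, "foo!unit", "b")
set_path(tree, "foo!msg", "hello")  # hypothetical value for $!msg

print(json.dumps(tree))         # roughly what %$!% would render
print(json.dumps(tree["foo"]))  # roughly what %$!foo% would render
```

The template then only has to output one property, and all the quoting and escaping is handled when the tree is serialized.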

Post by David Lang
I question the value of having a separate sending queue for each app. I
think it's better to send them in one combined firehose and split them on
the receiving side. It makes it less disruptive when you find you want to
change the groupings of things and all those queues on the sender can eat
up a lot of ram.

Probably this is because I came from Redis.

That's what I'm thinking. With logstash you are forced to use something external
for queueing and lots of separate instances (and separate parser sets) or things
just don't work well.

With rsyslog, processing is 100-1000x as fast, and a lot of the stuff is
built in, so you don't need to split things up as much, and the reduction in
communications overhead adds to your wins.

Talking about Elasticsearch, an ingest node would probably be the best option,
with the index name carried as metadata.

There are two approaches, and I haven't tried them under fire on an ES cluster to
know which is best.

dedicate a node to ingest the data

spread the traffic across many different nodes and have a local copy of rsyslog
receive the data and push it into the local ES instance.

I suspect that, properly managed, a dedicated ingest node will be a win.

David Lang

m***@gmail.com

2016-11-24 15:41:42 UTC

Post by David Lang
not really, but we haven't had anyone experiment with thousands of
them, so it's possible, but unlikely that there would be a measurable
slowdown as rsyslog finds the right one to use.
The bigger overhead is in interpreting the template, that's where
simplifying it to be $! or $!foo would be a big win (or writing a
string module)

Memory went above 5GB for our first dirty try (several rulesets, several
queues...). I'll change that soon.

Post by David Lang
there isn't a good writeup, but if you read on how to use the maxmind
database, the perl example has you create an array where the first
element is the decimal equivalent of the first IP address that matches
the data.
This is exactly the structure that a sparse array lookup table is
intended for. I believe there is a function that will take an IPv4
address and return a decimal number (if not, we need to add one). Use
that function to create a number, lookup the number in the lookup
table, and have it return the data.

The second paragraph is correct; however, I haven't used them in rsyslog
yet. I'll document them once I have.

Thanks a lot, David, for your kind help, experienced comments and wise
advice.
You deserve another prize ;)

David Lang

2016-11-24 16:50:27 UTC

Post by David Lang
not really, but we haven't had anyone experiment with thousands of them, so
it's possible, but unlikely that there would be a measurable slowdown as
rsyslog finds the right one to use.
The bigger overhead is in interpreting the template, that's where
simplifying it to be $! or $!foo would be a big win (or writing a string
module)

Memory went above 5GB for our first dirty try (several rulesets, several
queues...). I'll change that soon.

probably all the queues.

David Lang
kk

Rainer Gerhards

2016-11-23 13:19:25 UTC

I am really extremely interested in this proposal and would appreciate
it if we could go forward with it. Just let me explain my situation a
bit, which hopefully helps you understand how I act and what my
limits are. I don't like disappointing people, and so I think talking about
limits is essential to getting to an agreement. Sorry that the posting is
a bit lengthy!

I am with Adiscon, and Adiscon still sponsors most of the development
for rsyslog. Adiscon is a very small shop (fewer than 10 folks) and we
do not have a big budget. That's fine with all of us, as we do not aim at
getting rich but at having a satisfying and happy life, which is not
the same as being rich from our PoV ;) We still need to pay bills, and so
we a) sell closed-source Windows products and b) sell consulting and
support contracts.

Rsyslog revenue is small; it typically (barely) funds me and half a
support engineer. I put in quite a bit of my free time, as I am
personally interested in this project. Besides rsyslog, I also have
some other commitments; for example, I am currently working on
two academic research projects, one of which is targeted towards logging.

Development-wise, this boils down to me being the development
resource, and often not at 100%. If we receive sponsored or custom
work, I can add development resources inside Adiscon, so this
actually increases development capability.

More important is that Adiscon does not monetize rsyslog in any other
way: we do not sell appliances, we do not offer logging as a service
and we do not run a large network that we monitor with rsyslog. We
really do one thing (development and support for rsyslog) and we do
that thing well.

Among other things, this means we have no need for Kibana, redis, kafka,
... So we also do not use them. So we do not know them. And learning
*everything* just to develop rsyslog is out of reach given the
resources we have.

So much for the reality check. The good news is the rsyslog community. It
may not be the fastest-growing open source community on earth, but it
is very healthy and very knowledgeable. And we have seen good, quality
growth especially in the past two years. We have a lot of different
talents, and we have folks who actually use all these subsystems that
Adiscon didn't even know about before someone asked a question.

As a community, I think we can make the ERK stack a reality. I am very
open to changing things, and rsyslog has been refactored more than
once since its inception. Another round is not a problem.

If the community helps to shape what actually *needs* to be done
(leaving out the "nice to have" to go to a doable workload), and if
some folks inside the community help to implement it, I think we can
come very far, and can even do so quickly. What is now hopefully
obvious from my initial remarks is that I *alone* cannot do all of the
big hauling. But again, we had great contributions and we have great
contributors! So, yes we can ;-)

For example and to be honest, I frankly admit that I didn't know about
Riemann until 10 minutes ago. So developing any integration into it
will take a lot of time first learning and understanding how it works.
This is usually prohibitively expensive for me to do. If, however, we
have someone who already knows the ins and outs, we can either work
together on getting something done (with me doing the rsyslog bits),
or I can educate that person to know the bare minimum required to
integrate into rsyslog. Rsyslog integration is not very hard if you do
not insist on knowing every detail. And I can fine-tune it afterwards.
But it must be a team effort, for any one person, learning the "other
part" is probably too time consuming.

This is why I mean we need to act as a community.

If we can form such (virtual) teams, I would be extremely interested
in participating and moving rsyslog forward towards new goals. I think
I may even get Adiscon to put in some extra effort for a while. And I
personally would find such a community effort uber-cool ;-)

What do you think?

Sorry again for the long posting,
Rainer

David Lang

2016-11-23 13:55:22 UTC

if you look at the graphic on the main page of rsyslog.com you see that we have
a very large number of inputs and outputs. We already have omelasticsearch and
omhiredis; adding an imhiredis just adds symmetry and is not a large
deviation.

Rsyslog is a log processing engine that accepts logs from many sources and
delivers them to many destinations, the more sources and destinations we support
the better.

Post by Bob Gregory
For my part, I'm quite happy to help build an imhiredis (and imkafka?)
module but only if I can actually dogfood it, which means replacing
Logstash in our own environment.

good, we are aiming to make that not only possible, but a generally accepted
practice :-)

Post by Bob Gregory
For that, I'd like to see better support for GeoIP tagging, a Riemann
output plugin, some better guidance on "failed message queues", etc. etc.
etc.

for GeoIP tagging, take a look at the table lookup capability. It was designed
with the MaxMind GeoIP database in mind.

what do you mean by a Riemann output plugin?

Post by Bob Gregory
Are we jointly interested in building the REK stack and, if so, can we
start to work out the feature set we're missing, and the documentation we'd
need for this to work? I'm a little concerned that if we tackle the usecase
piece-meal, we'll end up with lots of disjointed parts that don't really
solve the problem: logstash is not an adequate logstash.

We are always interested in expanding rsyslog to fill in gaps in routing and
formatting logs, we try to avoid getting involved in analyzing and summarizing
logs (but do a bit of that), leaving that job for other tools.

Please do list the things you think are missing.

Documentation is always needed. Unfortunately, too many of us deep in the guts
of rsyslog are bad at writing docs.

David Lang

m***@gmail.com

2016-11-23 17:19:24 UTC

Hi all

In order to improve the first draft of the ERK project, I would like to get
some feedback from you.

What features are you missing, or what do you think could be improved, in rsyslog?

Please try to be as clear/self-explanatory/simple as you can, for
better understanding.

* logstash memory footprint is quite high compared to rsyslog's,
although both are "doing the same".
* rsyslog configuration can't be reloaded live
* dynamic variables (calculated on each message processing) aren't
supported in templates
* combining multiple variables into one to build a "date" field isn't
possible

Regards

PS: those with deep knowledge, please, start thinking how you'll solve
them...

David Lang

2016-11-23 18:16:14 UTC

In order to improve the first draft of the ERK project, I would like to get some
feedback from you.
What features are you missing, or what do you think could be improved, in rsyslog?
Please try to be as clear/self-explanatory/simple as you can, for better
understanding.
* logstash memory footprint is quite high compared to rsyslog's,
although both are "doing the same".

that's not something to fix in rsyslog :-)

* rsyslog configuration can't be reloaded live

true

* dynamic variables (calculated on each message processing) aren't
supported in templates

false. that's what templates do. You can use any variable in a template.

* combining multiple variables into one to build a "date" field isn't
possible

you can combine variables to form a string that looks like a date in the output,
but you can't take arbitrary date parts in a log message and parse them into a
real timestamp field that would let you output it in different formats.

David Lang

m***@gmail.com

2016-11-23 18:44:11 UTC

Post by David Lang
you can combine variables to form a string that looks like a date in
the output, but you can't take arbitrary date parts in a log message
and parse them into a real timestamp field that would let you output
it in different formats.

Back to my pipeline proposal, wouldn't this solve the issue?
pipeline {
input()
processor() //extract %year%,%month%,%day%
processor() //merge "%year%:%month%:%day%" as date type property/field
output()
}


David Lang

2016-11-23 18:51:33 UTC

Post by David Lang
you can combine variables to form a string that looks like a date in the
output, but you can't take arbitrary date parts in a log message and parse
them into a real timestamp field that would let you output it in different
formats.

you don't need to invent pipelines and change how rsyslog processes things, you
just need to add the merge function.

The problem is the fact that there are so many ways timestamp data can be
scattered in a log message. take a look at the output of date --help and look at
all the formatting options. I guarantee that some log somewhere will use every
one of them.

David Lang

Rainer Gerhards

2016-11-23 18:54:58 UTC

Merge looks like string concatenation, which I think we support (but I may be
wrong).

Sent from phone, thus brief.

Post by David Lang
you can combine variables to form a string that looks like a date in the

Post by David Lang
output, but you can't take arbitrary date parts in a log message and parse
them into a real timestamp field that would let you output it in different
formats.

you don't need to invent pipelines and change how rsyslog processes
things, you just need to add the merge function.
The problem is the fact that there are so many ways timestamp data can be
scattered in a log message. take a look at the output of date --help and
look at all the formatting options. I guarantee that some log somewhere
will use every one of them.
David Lang

David Lang

2016-11-23 19:08:44 UTC

string cat is the simple part, the problem is then being able to treat the
result as a real timestamp (including outputting it in different formats)

there was a small thread on this today.
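A small Python sketch of the distinction between the two steps: concatenating the parts is trivial, while the useful result is a typed timestamp that can then be written out in any format. It assumes the parts have already been extracted and that their format is known and fixed; as David points out, real logs scatter date parts in many different layouts, so each source would need its own format string. merge_date is a hypothetical helper, not an rsyslog function.

```python
from datetime import datetime


def merge_date(year, month, day, hour="00", minute="00", second="00"):
    """Concatenate scattered date parts, then parse them into a real timestamp."""
    joined = f"{year}-{month}-{day} {hour}:{minute}:{second}"  # easy part: string concat
    return datetime.strptime(joined, "%Y-%m-%d %H:%M:%S")      # hard part: a typed value


ts = merge_date("2016", "11", "23", "19", "08", "44")
print(ts.isoformat())                 # 2016-11-23T19:08:44
print(ts.strftime("%d/%m/%Y %H:%M"))  # 23/11/2016 19:08
```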

David Lang

Post by Rainer Gerhards
Merge Looks like String concat, which I think we support (but I may be
wrong).
Sent from phone, thus brief.

Post by David Lang
you can combine variables to form a string that looks like a date in the

Post by David Lang
output, but you can't take arbitrary date parts in a log message and parse
them into a real timestamp field that would let you output it in different
formats.

you don't need to invent pipelines and change how rsyslog processes
things, you just need to add the merge function.
The problem is the fact that there are so many ways timestamp data can be
scattered in a log message. take a look at the output of date --help and
look at all the formatting options. I guarantee that some log somewhere
will use every one of them.
David Lang

m***@gmail.com

2016-11-23 18:58:15 UTC

Post by David Lang
The problem is the fact that there are so many ways timestamp data can
be scattered in a log message. take a look at the output of date
--help and look at all the formatting options. I guarantee that some
log somewhere will use every one of them.

IIRC, you had found a solution to this...

m***@gmail.com

2016-11-23 19:38:46 UTC

Working, spamming mail list and writing on wiki at the same time. A
lovely afternoon...

Please, add your lines: https://github.com/rsyslog/rsyslog/wiki

Bob Gregory

2016-11-24 11:52:03 UTC

https://io.made.com/blog/rek-it/

I wrote this up earlier.

Post by m***@gmail.com
Working, spamming mail list and writing on wiki at the same time. A
lovely afternoon...
Please, add your lines: https://github.com/rsyslog/rsyslog/wiki

m***@gmail.com

2016-11-24 12:08:01 UTC

Doing the **same** here.

Currently I'm dealing with https://github.com/rsyslog/rsyslog/issues/625
in order to have "one configuration file for each application", and
copying them to rsyslog.d directory.

As we are concerned about high availability and load balancing, we plan
to deploy multiple instances.
We still need to decide whether RELP->ES is done by the same rsyslog process
or split into several stages.

Any discussion is much appreciated and highly valuable :)

Post by Bob Gregory
https://io.made.com/blog/rek-it/
I wrote this up earlier.

David Lang

2016-11-24 15:22:56 UTC

As we are concerned about high availability and load balancing, we plan to
deploy multiple instances.

just a note that while rsyslog doesn't implement load balancing itself, it has
features to support load balancing environments, so you pick the load balancer
you want on the receiving end and have rsyslog disconnect every X messages to
give the load balancer a chance to work.

I think this only works if you do IP based load balancing, rather than DNS based
load balancing (especially as so many systems now run a caching DNS locally)

Personally, I use corosync (clusterlabs.org) but you can also use haproxy, lvs,
or a commercial load balancer like f5
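To see why periodic disconnects matter, here is a toy Python simulation of one sender behind an assumed round-robin, connection-level load balancer: without rebinding, a persistent TCP connection pins all traffic on one backend. In rsyslog the knob for this is, as far as I know, omfwd's rebindInterval parameter (a message count); check the docs for your version. The backend names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def simulate_rebind(total_messages, rebind_every, backends):
    """One sender that reconnects every `rebind_every` messages, behind an
    assumed round-robin load balancer that hands each new connection to the
    next backend. Returns messages delivered per backend."""
    next_backend = cycle(backends)
    current = next(next_backend)  # initial connection
    counts = Counter()
    for n in range(total_messages):
        if n and n % rebind_every == 0:
            current = next(next_backend)  # disconnect + reconnect
        counts[current] += 1
    return counts


if __name__ == "__main__":
    print(simulate_rebind(10_000, 1_000, ["es1", "es2", "es3"]))
    print(simulate_rebind(10_000, 10**9, ["es1", "es2", "es3"]))  # never rebinds
```

With steady traffic the spread is as even as the connection count allows; bursty traffic, where one connection window carries far more than another, is where connection-level balancing gets lumpy.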

David Lang

m***@gmail.com

2016-11-24 15:42:37 UTC

or docker swarm mode :D

Post by m***@gmail.com
As we are concerned about high availability and load balancing, we
plan to deploy multiple instances.

just a note that while rsyslog doesn't implement load balancing
itself, it has features to support load balancing environments, so you
pick the load balancer you want on the receiving end and have rsyslog
disconnect every X messages to give the load balancer a chance to work.
I think this only works if you do IP based load balancing, rather than
DNS based load balancing (especially as so many systems now run a
caching DNS locally)
Personally, I use corosync (clusterlabs.org) but you can also use
haproxy, lvs, or a commercial load balancer like f5
David Lang

matthew.gaetano

2016-11-24 23:21:29 UTC

Using the Rebind Interval and TCP load balancing has its limits. The higher
the volume and velocity, the harder it becomes to balance overall. The
interval cannot be too high, as it risks overloading a single node in a
cluster. The interval cannot be too low, as it risks overhead from
opening and closing TCP connections.

Message based load balancing would present a more uniform spread amongst a
clustered destination. It would also mean not having to reset the TCP
connections as often, or at all. This is where message broker applications
like Redis or Kafka come into play.

-----
~Regards

Matthew Gaetano
--
View this message in context: http://rsyslog-users.1305293.n2.nabble.com/Are-we-building-an-ERK-stack-tp7591564p7591672.html
Sent from the rsyslog-users mailing list archive at Nabble.com.

David Lang

2016-11-24 23:38:36 UTC

Post by matthew.gaetano
Using the Rebind Interval and TCP load balancing has its limits. The higher
the volume and velocity, the harder it becomes to balance overall. The
interval cannot be too high, as it risks overloading a single node in a
cluster. The interval cannot be too low, as it risks overhead from
opening and closing TCP connections.

in theory yes, in practice, I'm not so sure. I've had no problems using the
rebind interval process at over 100K messages/sec load balanced across 20
machines.

Now, I did this by running rsyslog on the destination machines and then having
it deliver the messages to the local process that was the final destination.
Rsyslog was easily able to receive and buffer the bursts.

I aim to have the rebind interval for N destinations be ~1/N to 1/(2N) seconds
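Since the rebind interval is counted in messages, a time target like the one above has to be converted using the message rate: at R msg/s, 1/N seconds is R/N messages. A short worked sketch, assuming msgs_per_sec is the rate on the connection being rebound (a single sender); if a figure like 100K msg/s is the aggregate over several senders, divide it by the number of senders first. The function name is hypothetical.

```python
def rebind_interval_range(msgs_per_sec, destinations):
    """Translate a target of 1/(2N)..1/N seconds per connection into a
    message count, since the rebind interval is counted in messages."""
    low = msgs_per_sec / (2 * destinations)
    high = msgs_per_sec / destinations
    return round(low), round(high)


if __name__ == "__main__":
    # 100K msg/s over 20 destinations, the figures quoted above
    print(rebind_interval_range(100_000, 20))  # (2500, 5000)
```

The conversion only holds while the rate is steady; when the rate swings, a fixed count corresponds to a varying amount of time.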

David Lang

matthew.gaetano

2016-11-24 23:45:10 UTC

Which would be a lot easier to do if we didn't have to rely solely on message
count to delimit time or percentages.
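A sketch of what a time-aware alternative could look like on the sending side: rebind after a message count or an elapsed time, whichever comes first. This is a hypothetical policy for discussion, not an existing rsyslog option; the class and parameter names are made up, and the clock is injectable only so the behaviour can be tested.

```python
import time


class RebindPolicy:
    """Hypothetical sender-side policy (not an rsyslog option): rebind after
    max_messages messages OR max_seconds seconds, whichever comes first."""

    def __init__(self, max_messages, max_seconds, clock=time.monotonic):
        self.max_messages = max_messages
        self.max_seconds = max_seconds
        self._clock = clock  # injectable so the policy can be tested
        self._reset()

    def _reset(self):
        self._sent = 0
        self._opened_at = self._clock()

    def on_message(self):
        """Count one sent message; return True if the sender should rebind now."""
        self._sent += 1
        expired = self._clock() - self._opened_at >= self.max_seconds
        if self._sent >= self.max_messages or expired:
            self._reset()  # the caller reconnects; start a fresh window
            return True
        return False
```

The sender would call on_message() after each send and reconnect whenever it returns True, so low-rate periods still rotate connections on time while bursts are capped by the count.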

-----
~Regards

Matthew Gaetano

Rainer Gerhards

2016-11-25 06:21:08 UTC

Post by matthew.gaetano
Which would be a lot easier to do if we didn't have to rely solely on message
count to delimit time or percentages.

IMHO it would be an interesting experiment to create a queue mode
"redis" or "kafka". Given everything that's going on right now, there
almost certainly is no time to do the experiment, but it's something
I have thought about for a while (0mq might be an even better choice).

I just couldn't resist sharing that idea ;-)

Rainer

Rainer Gerhards

2016-11-24 12:27:25 UTC

Post by Bob Gregory
https://io.made.com/blog/rek-it/
I wrote this up earlier.

very good! Love to see the work coming in and participate in the effort!

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

Rainer Gerhards

2016-11-24 16:42:55 UTC

I added a project to rsyslog on github, where we can bind Issues to:

https://github.com/rsyslog/rsyslog/projects/1

I guess I have to make the link myself, so just let me know what you think
qualifies whenever you open something new.

Rainer

Post by Rainer Gerhards

Post by Bob Gregory
https://io.made.com/blog/rek-it/
I wrote this up earlier.

very good! Love to see the work coming in and participate in the effort!
Rainer

m***@gmail.com

2016-11-24 16:55:46 UTC

shouldn't that work for documentation? (as commented previously, I would
love to have 1 repo!)

Post by Rainer Gerhards
https://github.com/rsyslog/rsyslog/projects/1
I guess I have to make the link myself, so just let me know what you think
qualifies whenever you open something new.

Maybe you could match projects to milestones, but I think projects are
"wider".

BTW: with Bob's approval, I think TREK could be a good application name
(trekkies could become angry :P)


m***@gmail.com

2016-12-01 18:49:10 UTC

Hi Bob.

Today we finally found some time to take a look at our
rsyslog-normalizer-indexer, which uses omelasticsearch.

According to
http://www.rsyslog.com/doc/v8-stable/configuration/modules/omelasticsearch.html
the *errorfile* parameter stores failed indexing attempts.

How do you handle those errors?
We are thinking of:

* setting errorfile=file
* reading that file back with imfile, bound to a ruleset that uses omelasticsearch
* an elastic template like: {index="errors" msg="rawmsg" }, and keeping an
eye on that

What do you think?


David Lang

2016-12-01 22:08:25 UTC

Post by m***@gmail.com
Hi Bob.
Today we finally found some time to have an eye on our
rsyslog-normalizer-indexer which uses omelasticsearch
According to
http://www.rsyslog.com/doc/v8-stable/configuration/modules/omelasticsearch.html
indexing parameter *errorfile* helps to store failed indexing attempts.
How do you handle those errors?
We are thinking on
* setting errorfile=file
* imfile ruleset=omelasticsearch
* elastic template like: {index="errors" msg="rawmsg" }, and keep an
eye on that
What do you think?

I think you are going to end up with some grief: if the message could not
be inserted into ES for some reason, I think the odds are good that you will
find that rawmsg can't be inserted either.

I would keep the errorfile as a file and look at it periodically. I expect that
when you first start things up, you will run into a number of errors, but once
you work your way through them, the error rate will be low.

Set your monitoring system to monitor the size of the errorfile, and if it
starts growing significantly, generate an alert.
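One way to automate that check, as a hypothetical Ruby helper that a cron job or monitoring agent could run: it remembers the error file's size between runs and flags growth above a threshold. The state file, the threshold and the treat-a-shrinking-file-as-rotated rule are assumptions of this sketch, not anything omelasticsearch provides.

```ruby
# Hypothetical check: compare the omelasticsearch error file's current size
# with the size recorded on the previous run (kept in statefile) and report
# whether it grew by more than max_growth bytes.
def errorfile_grew_too_much?(errorfile, statefile, max_growth)
  current  = File.exist?(errorfile) ? File.size(errorfile) : 0
  previous = File.exist?(statefile) ? File.read(statefile).to_i : 0
  File.write(statefile, current.to_s)
  # If the file shrank it was probably rotated, so measure growth from zero.
  growth = current >= previous ? current - previous : current
  growth > max_growth
end

# Command-line use: ruby check.rb ERRORFILE STATEFILE MAX_GROWTH_BYTES
if __FILE__ == $PROGRAM_NAME && ARGV.size == 3
  alert = errorfile_grew_too_much?(ARGV[0], ARGV[1], Integer(ARGV[2]))
  puts(alert ? 'ALERT: error file is growing' : 'ok')
  exit(alert ? 1 : 0)
end
```

The non-zero exit status is there so the result can be picked up by whatever alerting the monitoring system already does for failed checks.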

David Lang

m***@gmail.com

2016-12-02 08:35:47 UTC

Post by David Lang
I think that you are going to end up with some grief, if the message
could not be inserted into ES for some reason, I think the odds are
good that you will find that rawmsg can't be inserted either.

After sending the email I thought the same...

Post by David Lang
I would keep the errorfile as a file and look at it periodically. I
expect that when you first start things up, you will run into a number
of errors, but once you work your way through them, the error rate will
be low.
Set your monitoring system to monitor the size of the errorfile, and
if it starts growing significantly, generate an alert.

Would love to have a more unattended/XXth century way, if anyone knows.

Bob Gregory

2016-12-02 08:40:57 UTC

You may well be able to insert the rejected log into a different index.
Most of our failed logs are down to a mismatch between the mapping config
and the fields in json logs.

An error index that treats the whole message as a single blob should work
fine.
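A minimal rsyslog config sketch of that error-index idea. Paths, server and index names are placeholders, and it assumes each failed record is written to the error file as a single line (if not, imfile would need multi-line settings). David's caveat earlier in the thread still applies, which is why the error-index action has no errorFile of its own: a record that fails twice is dropped rather than looped.

```
module(load="imfile")
module(load="omelasticsearch")

# Normal path: index the $! tree; failed items are written to the error file.
template(name="es-doc" type="subtree" subtree="$!")

action(type="omelasticsearch"
       server="localhost"
       searchIndex="logs"
       template="es-doc"
       bulkmode="on"
       errorFile="/var/log/rsyslog/es-errors.log")

# Error path: wrap each error-file line as one JSON-escaped string field.
template(name="es-error-blob" type="list" option.json="on") {
  constant(value="{\"message\":\"")
  property(name="rawmsg")
  constant(value="\"}")
}

ruleset(name="es-errors") {
  action(type="omelasticsearch"
         server="localhost"
         searchIndex="errors"
         template="es-error-blob")
}

input(type="imfile"
      File="/var/log/rsyslog/es-errors.log"
      Tag="es-errors"
      Ruleset="es-errors")
```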


David Lang

2016-12-02 08:43:41 UTC

Post by Bob Gregory
You may well be able to insert the rejected log into a different index.
Most of our failed logs are down to a mismatch between the mapping config
and the fields in json logs.
An error index that treats the whole message as a single blob should work
fine.

what bytes would need to be escaped?

what if it's invalid unicode junk, etc.

almost by definition we are talking about corrupt data.

David Lang

Bob Gregory

2016-12-02 09:16:47 UTC

I'm not sure that's true in the general case.

Of the errors I've had with our ELK stack, upwards of 95% have been caused
by type errors (json field should be an int but is an object); some small
handful have failed because a message was truncated somewhere along the
line; a smaller number have failed because somebody hand-crafted json and
forgot about a trailing comma or quote.
Overwhelmingly, the data aren't corrupted: they were invalid at source in a
way that would still allow them to be read as plain Unicode strings.

Obviously I accept that given enough data, I'll see more interesting
failure modes that need more thought, but reading from the errorfile and
pushing to a separate error index would work very well in our environment.

David Lang

2016-12-02 09:31:22 UTC

Post by Bob Gregory
I'm not sure that's true in the general case.
Of the errors I've had with our ELK stack, upwards of 95% have been caused
by type errors (json field should be an int but is an object); some small
handful have failed because a message was truncated somewhere along the
line; a smaller number have failed because somebody hand-crafted json and
forgot about a trailing comma or quote.
Overwhelmingly, the data aren't corrupted: they were invalid at source in a
way that would still allow them to be read as plain Unicode strings.
Obviously I accept that given enough data, I'll see more interesting
failure modes that need more thought, but reading from the errorfile and
pushing to a separate error index would work very well in our environment.

I get _really_ nervous about even low probability failure modes in my failure
paths. Murphy likes me too much :-)

doing it your way, you still have the failed log messages from your failure path
that you will need to monitor, so you have reduced the scope of the problem, but
still have the same basic problem.

David Lang

Rainer Gerhards

2016-12-02 10:01:07 UTC

FYI: the original intent of the error file was to provide errors in a
way that makes it easy to (semi?) automatically handle them via a
different procedure (which may re-inject them once the problem has been
solved).

Rainer

mostolog--- via rsyslog

2016-12-15 11:32:51 UTC

Hi

At this moment we are forwarding RELP messages to Elasticsearch using the
omelasticsearch plugin, but sadly the message appears as JSON instead of
each property being stored, e.g. the message is { "app": "app1"... instead of
the indexed document having an app property.

Should we specify a special parameter in rsyslog, a setting in Elasticsearch...?

Regards

mostolog--- via rsyslog

2016-12-15 12:58:43 UTC

Solved using json template (code blindness).

Is there any way to set fields and use them (like @timestamp) but not
index them in Elasticsearch? (hidden fields)


Brian Knox via rsyslog

2016-12-15 13:23:44 UTC

Looking through the code, I noticed that the error file routine in
omelasticsearch is not tied into the stats system - we use impstats to
monitor our rsyslog pipelines, and having a counter for write errors would
be super useful.

I've submitted a PR to add the counter:
https://github.com/rsyslog/rsyslog/pull/1331
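For context, a minimal impstats setup of the kind Brian describes, as an rsyslog config sketch. The interval, JSON format and output file are choices made for this example; the name of the new error counter is whatever the PR defines, so it is not shown here.

```
# Emit counters for every queue and action every 60 seconds, as JSON,
# to a dedicated file instead of the normal syslog stream.
module(load="impstats"
       interval="60"
       format="json"
       log.syslog="off"
       log.file="/var/log/rsyslog/stats.log")
```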

Cheers,
Brian


David Lang

2016-12-16 09:27:39 UTC

Post by mostolog--- via rsyslog
Solved using json template (code blindness).
Is there any way to set fields and use them (like @timestamp) but not
indexing them on elastic? (hidden fields)

This is exactly why we have $. variables as well as $! variables. They work
exactly the same, but by convention, $! variables are where you put things that
you are going to want to send elsewhere, and $. variables are where you put
things that you need to create for your internal logic, templates, etc but don't
want to send to the destination as part of your log content

if you get something that you don't want to send, you can unset $!foo; to remove
it from the $! set of data.
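A short rsyslog config sketch of that convention for an Elasticsearch output. The variable names, index naming scheme and server are placeholders. The subtree template renders only the $! tree, so $.index can drive the dynamic index name without being indexed itself; the same template is also one way to get each property stored as its own field, which is what the earlier json-template question was about.

```
module(load="omelasticsearch")

# Document body: everything under $!, nothing under $.
template(name="es-doc" type="subtree" subtree="$!")
# Index name built from a local ($.) variable.
template(name="es-index" type="string" string="%$.index%")

ruleset(name="to-es") {
  set $!app = $programname;
  set $!host = $hostname;
  set $!message = $msg;
  # Internal only: used for routing, never sent as document content.
  set $.index = "logs-" & tolower($programname);

  # Something we decided not to ship after all.
  set $!debug = "scratch";
  unset $!debug;

  action(type="omelasticsearch"
         server="localhost"
         template="es-doc"
         searchIndex="es-index"
         dynSearchIndex="on")
}
```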

David Lang

mostolog--- via rsyslog

2016-12-16 12:29:32 UTC

Post by David Lang
This is exactly why we have $. variables as well as $! variables. They
work exactly the same, but by convention, $! variables are where you
put things that you are going to want to send elsewhere, and $.
variables are where you put things that you need to create for your
internal logic, templates, etc but don't want to send to the
destination as part of your log content
if you get something that you don't want to send, you can unset $!foo;
to remove it from the $! set of data.

I didn't know that (if I ever read it, I forgot).
I'll document that in filters.rst
:P

Still, I'm having some issues with @timestamp. I'll let you know if we
find any problems.

David Lang

2016-11-23 18:18:32 UTC

Post by Bob Gregory
For that, I'd like to see better support for GeoIP tagging, a Riemann
output plugin, some better guidance on "failed message queues", etc. etc.
etc.

With a bit of digging, I can't find where Riemann defines what the over-the-wire
format is that you would need to deliver logs to it.

I see hints that it uses protobuf to serialize things, and has an
application-level ack mechanism similar to what we have in relp, but the levels
of indirection are stacked high, and the API documentation only points you at the
function definitions.

David Lang

Bob Gregory

2016-11-23 18:32:49 UTC

I can easily enough knock together an omriemann - it's protobuf over TCP or
UDP. TCP allows for message ack.

There are a couple of C clients that are useful as prior art, and I've
worked with a bunch of clients in python, haskell and golang.
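To make "protobuf over TCP" concrete, here is a hand-rolled Ruby sketch that encodes one Riemann event and frames it for TCP. Assumptions: the field numbers are as I read them in riemann.proto (Msg.events = 6; Event.time = 1, state = 2, service = 3, host = 4, description = 5, metric_d = 14), and TCP messages carry a 4-byte big-endian length prefix. Check both against the riemann.proto linked elsewhere in this thread; a real module would use a protobuf library or riemann-c-client rather than this.

```ruby
# Sketch: hand-encode a Riemann Msg holding one Event, framed for TCP.
# Field numbers are assumptions taken from riemann.proto.

# Protobuf base-128 varint (non-negative integers only).
def varint(n)
  raise ArgumentError, 'negative varint' if n.negative?
  bytes = []
  loop do
    low = n & 0x7f
    n >>= 7
    if n.zero?
      bytes << low
      return bytes.pack('C*')
    end
    bytes << (low | 0x80)
  end
end

# Field key = (field number << 3) | wire type.
def field_key(field, wire_type)
  varint((field << 3) | wire_type)
end

# Length-delimited field (wire type 2): strings and nested messages.
def len_field(field, data)
  bytes = data.b
  field_key(field, 2) + varint(bytes.bytesize) + bytes
end

def encode_event(host:, service:, metric:, time: nil, state: nil, description: nil)
  out = ''.b
  # time: int64 seconds, varint (wire type 0)
  out << field_key(1, 0) << varint(time) if time
  out << len_field(2, state) if state
  out << len_field(3, service)
  out << len_field(4, host)
  out << len_field(5, description) if description
  # metric_d: 64-bit little-endian double (wire type 1)
  out << field_key(14, 1) << [metric.to_f].pack('E')
  out
end

# Wrap the event in Msg.events (field 6) and prefix a 4-byte big-endian length.
def frame_for_tcp(event_bytes)
  msg = len_field(6, event_bytes)
  [msg.bytesize].pack('N') + msg
end
```

As I understand the protocol, sending is then a matter of writing the frame to a TCP socket (port 5555 by default) and reading back a length-prefixed Msg whose ok field acts as the acknowledgement.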


David Lang

2016-11-23 18:36:18 UTC

Post by Bob Gregory
I can easily enough knock together an omriemann - it's protobuf over TCP or
UDP. TCP allows for message ack.
There are a couple of C clients that are useful as prior art, and I've
worked with a bunch of clients in python, haskell and golang.

if there is a C client with an ASL2 compatible license, you could probably
cut-n-paste your way to making it work. look at the omrelp module to get one
that properly handles acks, batches, and encryption.

how many messages can riemann have outstanding (received but not acked) at one
time?

David Lang


m***@gmail.com

2016-11-23 18:39:08 UTC

As main promoter (ring the bell and run like hell), could you run some
tests comparing filebeat vs imfile performance and footprint?


Adam Williams

2016-11-25 19:40:11 UTC

That would be pretty great!

For a couple of years we have been sending messages to Riemann by having
omprog start up a Ruby script that basically looks like this:

```
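# omprog helper: reads one JSON event per line on stdin (rendered by the
# omriemann-json template) and forwards each event to Riemann.
require 'json'
require 'riemann/client'

# The original process_log_entry isn't shown; this version is an assumption:
# parse the line (symbol keys, as the Riemann client's hashes use) and yield it.
def process_log_entry(line)
  yield JSON.parse(line, symbolize_names: true)
rescue JSON::ParserError => e
  warn "skipping unparseable line: #{e.message}"
end

def process_log_entries(io, &block)
  until io.eof?
    process_log_entry(io.gets.chomp, &block)
  end
end

riemann = Riemann::Client.new(host: 'localhost', port: 5555, timeout: 5)

process_log_entries($stdin) do |event|
  riemann << event
end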
```

```
action(type="omprog"
binary="/usr/sbin/omriemann"
template="omriemann-json"
queue.type="linkedlist"
queue.size="50000"
queue.dequeuebatchsize="100"
queue.filename="riemannqueue"
queue.highwatermark="40000"
queue.lowwatermark="20000"
queue.maxdiskspace="5g"
queue.saveonshutdown="on")
```

My understanding is that omprog will create a few of these processes if
necessary to keep queues happy. I have certainly seen times when there are
a couple of omriemann.rb processes owned by rsyslog!

- Adam Williams

Dave Cottlehuber

2016-12-04 23:24:28 UTC

Post by Bob Gregory
For that, I'd like to see better support for GeoIP tagging, a Riemann
output plugin, some better guidance on "failed message queues", etc. etc.
etc.

Hi David, Bob,

https://github.com/algernon/riemann-c-client may be of interest to use
it directly -- it's been dropped into collectd as a library now as well,
and is ported to Debian & FreeBSD already, that I know of. The protobuf
wire format is
https://github.com/algernon/riemann-c-client/blob/master/lib/riemann/proto/riemann.proto
if that's helpful. The license is LGPL3, and I can't work out what rsyslog
mainly comes under.

What I've found useful with collectd and riemann was to be able to set
specific custom tags per instance (rsyslog server in our case) which
makes the sorting in riemann very easy prior to parsing any specific
message output. Mainly source & instance type:

A+
Dave

David Lang

2016-12-04 23:51:15 UTC

Post by Dave Cottlehuber
I can't work out what rsyslog
mainly comes under.

Rsyslog is moving to ASL 2.0 (we have a few files that are GPL and are going to
need to be replaced)

David Lang

David Lang

2016-12-05 00:05:54 UTC

Post by Dave Cottlehuber
https://github.com/algernon/riemann-c-client may be of interest to use
it directly -- it's been dropped into collectd as a library now as well,
and is ported to Debian & FreeBSD already, that I know of. The protobuf
wire format is
https://github.com/algernon/riemann-c-client/blob/master/lib/riemann/proto/riemann.proto
if that's helpful.

it is.

Post by Dave Cottlehuber
What I've found useful with collectd and riemann was to be able to set
specific custom tags per instance (rsyslog server in our case) which
makes the sorting in riemann very easy prior to parsing any specific

it looks like the protobuf allows a lot of options in terms of how to store the
data.

We can make educated guesses as to what makes sense from the riemann point of
view, but they will only be guesses

as far as tags go, tagging it as being from rsyslog is an obvious item, and if
we have tags from mmnormalize, they should go here. What else?

should service be the programname or the facility?

where would facility/severity be stored? is severity == metric?

what sort of stuff normally goes in the description field?

for the attributes, one obvious one is the message, but beyond that it's less
clear. Given that rsyslog internally tracks things as JSON, I think putting each
json object as an attribute makes sense, but attributes can't be nested.
Internally to rsyslog, we deal with nested objects by flattening them and
separating the tiers with a ! (i.e. {foo:{bar:baz}} == foo!bar:baz), is this
reasonable from a riemann point of view? should we use a different character
instead?
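A minimal Ruby sketch of that flattening, assuming the nested JSON has already been parsed into Hashes. Nested keys are joined with "!" (rsyslog's own convention), and leaf values become strings because Riemann attributes are flat name/value string pairs. Arrays are simply stringified, which is an assumption rather than a decided behaviour; flatten_json is a hypothetical name.

```ruby
# Flatten nested Hashes into one level, joining keys with sep ("!" by
# rsyslog convention) and turning leaf values into strings, because Riemann
# attributes are flat name/value string pairs. Arrays are just stringified.
def flatten_json(obj, sep = '!', prefix = nil, out = {})
  obj.each do |key, value|
    name = prefix ? "#{prefix}#{sep}#{key}" : key.to_s
    if value.is_a?(Hash)
      flatten_json(value, sep, name, out)
    else
      out[name] = value.to_s
    end
  end
  out
end
```

With this, {foo:{bar:baz}} becomes the single attribute foo!bar = baz; passing a different sep is all it would take if another character suits Riemann better.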

David Lang

Bob Gregory

2016-12-05 07:54:51 UTC

Hi David,

It's probably best if you _don't_ try to map syslog fields into
riemann fields because the two technologies are accomplishing different
things. Riemann is for processing metrics - numerical data about the state
of our systems, while syslog is about logs - narrative textual data about
our systems.

Service, tags, etc will need to be configured by the end-user; we shouldn't
be guessing what they might be based on our understanding of the log
message.

The reason I would need a Riemann output is that I have three use cases
where I forward data in logs to Riemann from logstash -

1) Logstash's heartbeat (so I can measure latency on my processing pipeline)
2) ERROR and CRITICAL logs so I can alert on them
3) Metrics encoded into json logs by applications.

Service is the "Thing under measurement". The closest analogue would be
programname, but one program might have many services. For example: "http
response time ms", "Bytes read", "Active users", "messages received". Each
of the keys in the key/value messages raised by impstats is a single
service.

Tags are used to aggregate and filter services, they're arbitrary bits of
data; eg. "Message type", "User account type", "ec2 instance type", "site
map area". Our biggest use case for them is in asynchronous processing
pipelines, where we use them to tag the messages we're processing so that
we can see overall throughput and latency, but drill down when we have to.

The metric is the actual measurement, it's a number.

The closest analogue to severity is the "state", which is an arbitrary
string. Usually people use the statuses "ok", "warning", "error" etc. but
it's entirely arbitrary. They're mostly used to trigger state changes in
Riemann.

Description is a narrative description of an event. We only use these in a
single use-case, which is that we forward all logs of ERROR level and
higher to riemann so that it can count them, and send us roll-up emails
every hour, or trigger pagerduty. In this use-case, we set the description
to the incoming log message.
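As a rough rsyslog config illustration of that use case with the omprog approach described in this thread: the binary path and template name are hypothetical, the JSON fields follow the description above (description is the log message, state comes from severity, metric is 1 per log line), and it assumes omprog passes the rendered template through unchanged, so the template supplies the trailing newline.

```
# One JSON object per line for an omprog helper that forwards to Riemann.
template(name="riemann-error-json" type="list" option.json="on") {
  constant(value="{\"host\":\"")
  property(name="hostname")
  constant(value="\",\"service\":\"")
  property(name="programname")
  constant(value=" errors\",\"state\":\"")
  property(name="syslogseverity-text")
  constant(value="\",\"description\":\"")
  property(name="msg")
  constant(value="\",\"metric\":1}\n")
}

# Severities 0-3 are emerg, alert, crit and err.
if $syslogseverity <= 3 then {
  action(type="omprog"
         binary="/usr/local/bin/riemann-forwarder"
         template="riemann-error-json")
}
```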

Lastly, the TTL is used to control how long a message should be held
in-memory by Riemann. It can be used to keep a snapshot of current state.
We use it for heartbeats - when an event's TTL expires, if we haven't
received another of the same event, we can raise an alert.

Hope that makes more sense - if you're interested in learning more about
Riemann, there's a great introductory video on the site. http://riemann.io/

The only fields that are required are the host, the service, and the metric.

-- Bob


David Lang

2016-12-05 09:45:15 UTC

Post by Bob Gregory
Hi David,
It's probably best if you _don't_ try to map syslog fields into
riemann fields because the two technologies are accomplishing different
things. Riemann is for processing metrics - numerical data about the state
of our systems, while syslog is about logs - narrative textual data about
our systems.
Service, tags, etc will need to be configured by the end-user; we shouldn't
be guessing what they might be based on our understanding of the log
message.

we need to try and come up with a reasonable default value for parameters.

Post by Bob Gregory
The reason I would need a Riemann output is that I have three use cases
where I forward data in logs to Riemann from logstash -
1) Logstash's heartbeat (so I can measure latency on my processing pipeline)
2) ERROR and CRITICAL logs so I can alert on them
3) Metrics encoded into json logs by applications.
Service is the "Thing under measurement". The closest analogue would be
programname, but one program might have many services. For example: "http
response time ms", "Bytes read", "Active users", "messages received". Each
of the keys in the key/value messages raised by impstats is a single
service.
Tags are used to aggregate and filter services, they're arbitrary bits of
data; eg. "Message type", "User account type", "ec2 instance type", "site
map area". Our biggest use case for them is in asynchronous processing
pipelines, where we use them to tag the messages we're processing so that
we can see overall throughput and latency, but drill down when we have to.
The metric is the actual measurement, it's a number.

there is only one set of metrics per event (sint64 metric_sint64, double
metric_d, float metric_f), which do you use (or do you use multiple of them?).
Is there an expectation that you only use one?

how do you signal metric types?

i.e.:

values that are counters (running total of messages processed)
values that are gauges (the number of messages in a queue)

what are attributes? they are name-value pairs of strings, and you can have an
arbitrary number per event.

Post by Bob Gregory
The closest analogue to severity is the "state", which is an arbitrary
string. Usually people use the statuses "ok", "warning", "error" etc. but
it's entirely arbitrary. They're mostly used to trigger state changes in
Riemann.
Description is a narrative description of an event. We only use these in a
single use-case, which is that we forward all logs of ERROR level and
higher to riemann so that it can count them, and send us roll-up emails
every hour, or trigger pagerduty. In this use-case, we set the description
to the incoming log message.
Lastly, the TTL is used to control how long a message should be held
in-memory by Riemann. It can be used to keep a snapshot of current state.
We use it for heartbeats - when an event's TTL expires, if we haven't
received another of the same event, we can raise an alert.
Hope that makes more sense - if you're interested in learning more about
Riemann, there's a great introductory video on the site. http://riemann.io/
The only fields that are required are the host, the service, and the metric.

so as I am understanding you, I would look at something along the lines of

host

default to $hostname, point at a variable

time is a 64 bit number

default to $timestamp, point at either a timestamp variable (to be converted
to unix time) or a string variable that should be convertible to a number

service is a string

default to programname, point at a variable

description is a string

default to $!msg if it exists, otherwise $msg, point at a variable

state is a string

default to severity, point at a variable

tags are an array of one or more strings

default to $!event.tags, point at a json object that contains objects and/or
arrays, include only the values of those objects/arrays

TTL is a number

default to 0 if not defined, point at json string variable that should be
convertible to a number

metric is a number

no default,
point at a variable:

if the variable is a single json object, convert the value from a string to a number

if the variable contains multiple objects, flatten them and append the
object name to the service and convert the value from a string to a number and
send each item separately

so if you have service = "a ", the json object {"foo": {"bar":"1", "baz":"2.5"}}
and pass $!foo as the metric, it will send two messages to riemann:

1) service 'a bar' metric 1
2) service 'a baz' metric 2.5

attributes are name-value pairs of items

no default
point at a json object that contains objects, flatten sub-objects and send as name-value pairs
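A rough Python sketch of the proposed metric fan-out, assuming the variable has already been parsed from JSON into a dict. The metric_events and flatten helpers are hypothetical names, and joining deeper nesting levels with "!" is an assumption borrowed from rsyslog's own property naming rather than part of the proposal itself.

```python
from typing import Any, Dict, List, Tuple


def flatten(obj: Dict[str, Any], sep: str = "!", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts, joining key paths with sep (assumed '!')."""
    out: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, sep, name))
        else:
            out[name] = value
    return out


def metric_events(service: str, metric: Any) -> List[Tuple[str, float]]:
    """Fan one metric variable out into (service, metric) pairs."""
    if not isinstance(metric, dict):
        # Single value: convert the string to a number, keep the service as-is.
        return [(service, float(metric))]
    # Multiple objects: append each flattened name to the service and
    # convert each value to a number; each pair becomes a separate event.
    return [(service + name, float(value)) for name, value in flatten(metric).items()]
```

With service "a " and $!foo = {"bar":"1", "baz":"2.5"}, this yields ("a bar", 1.0) and ("a baz", 2.5), matching the two messages listed above.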

This would not require any different formats for impstats, you could take the
json output and feed it directly to this module.

you could also do some tweaking of the data before it's sent (using dyn-stats
names as part of the service)

the statsd module would be very similar, it should have the option to send
normal stats or the datadog extended stats (basically adding tags to the normal
statsd output)
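For the statsd side, a plain statsd datagram has the form name:value|type, and the DataDog (DogStatsD) extension appends |# followed by comma-separated tags. A minimal Python formatter, where the function name and the example metric names are hypothetical:

```python
from typing import Optional, Sequence


def statsd_line(name: str, value: float, metric_type: str = "g",
                tags: Optional[Sequence[str]] = None) -> str:
    """Format one statsd datagram; append DogStatsD-style tags when given.

    metric_type uses the usual statsd codes, e.g. "c" (counter),
    "g" (gauge), "ms" (timer).
    """
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # DogStatsD extension: |#tag1:value1,tag2
        line += "|#" + ",".join(tags)
    return line
```

Without tags, the output is ordinary statsd, so one formatter could serve both modes.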

I used foreach and a bunch of custom formats to produce output similar to
datadog statsd.

David Lang

Post by Dave Cottlehuber
-- Bob

https://github.com/algernon/riemann-c-client/blob/master/lib/riemann/proto/riemann.proto

Post by Dave Cottlehuber
if that's helpful.

it is.

it looks like the protobuf allows a lot of options in terms of how to store the
data.
We can make educated guesses as to what makes sense from the riemann point of
view, but they will only be guesses.
as far as tags go, tagging it as being from rsyslog is an obvious item, and if
we have tags from mmnormalize, they should go here. What else?
should service be the programname or the facility?
where would facility/severity be stored? is severity == metric?
what sort of stuff normally goes in the description field?
for the attributes, one obvious one is the message, but beyond that it's less
clear. Given that rsyslog internally tracks things as JSON, I think putting each
json object as an attribute makes sense, but attributes can't be nested.
Internally to rsyslog, we deal with nested objects by flattening them and
separating the tiers with a ! (i.e. {foo:{bar:baz}} == foo!bar:baz). Is this
reasonable from a riemann point of view, or should we use a different character
instead?
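A Python sketch of that flattening, producing the flat string key/value pairs that Riemann's Attribute message can carry (key and value are both strings in riemann.proto). The separator is a parameter so "!" could be swapped for another character; the function name and the choice to JSON-encode non-string leaf values are assumptions.

```python
import json
from typing import Any, Dict


def flatten_attributes(obj: Dict[str, Any], sep: str = "!",
                       prefix: str = "") -> Dict[str, str]:
    """Flatten nested JSON into flat string attributes, joining tiers with sep."""
    attrs: Dict[str, str] = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the joined path as the new prefix.
            attrs.update(flatten_attributes(value, sep, name))
        else:
            # Attribute values are strings, so stringify non-string leaves.
            attrs[name] = value if isinstance(value, str) else json.dumps(value)
    return attrs
```

flatten_attributes({"foo": {"bar": "baz"}}) gives {"foo!bar": "baz"}, the same shape as rsyslog's internal naming; passing sep="." would give "foo.bar" instead.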
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
LIKE THAT.

Brian Knox

2016-11-26 13:59:55 UTC