Aaron Nichols
2009-10-20 15:32:07 UTC
Hello,
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.
First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.
Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.
Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.
This leaves me with a few questions:
1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.
2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of granularity
for configuring the same thing but that doesn't appear to be the case.
3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.
Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.
Let me know if I can add info that would help shed light on this.
Thanks,
Aaron
We have two rsyslog "writer" instances which are receiving syslog
messages and writing them to an NFS mount on a NAS. The NAS is connected via
10gb ethernet but has a number of other applications writing to it as well.
Typical write volume for our log writers is 2-3Mbps to the NAS. Yesterday I
attempted to resolve some complaints with logs being "bursty" - events
appear to be written in batches which sounded consistent with buffering on
the rsyslog daemon or not syncing the file with every message. These bursts
could come as infrequently as every 30 seconds however, which seems
excessive. I took a number of steps to remedy this but ultimately had to
reverse the changes because write volume (measured at the disks) to the NAS
went from 2Mbps up to 9Mbps and iops went from 30 up to 900 at the peak. I'm
sure there is an explanation for this - but there are some subtle
configuration changes which had an impact and I don't understand exactly
why.
First, the main change was to set "$ActionFileEnableSync on" in each logging
servers configuration. From what I can tell this globally enables the
ability to turn on file sync and it defaults to "off". This change, combined
with removing the "-" from all the actions in my configuration should have
resulted in all files being synced all the time - probably the worst case
scenario and I'm not all that suprised this caused problems.
Second, before rolling the change back 100% - I removed the
"$ActionFileEnableSync on" configuration from the servers but left the
action configurations without the "-" - assuming that the global
configuration option would completely disable file sync and using a "-"
would have no effect. This did not seem to be the case - it did dramatically
decrease the volume of traffic to the NAS but still the volume was about
double (5Mbps) the normal volume.
Third, I added "-" before all the action lines in the configuration and this
brought volume back down to the 2-3Mbps we are used to.
This leaves me with a few questions:
1) I would expect file sync to increase transactions with the NAS but not
increase the volume of data being written to the NAS as measured at the
disk. Can someone shed light on why this would so profoundly impact write
volume? Load on the rsyslog servers also went down substantially after
making this change - presumably because there were fewer queued transactions
- not sure.
2) Is there some difference between what "$ActionFileEnableSync on" does vs.
adding a "-" before an action other than one is global and one is
per-action? I thought these were just two different levels of granularity
for configuring the same thing but that doesn't appear to be the case.
3) Is it expected to have rsyslog take up to 30 seconds to flush messages to
a logfile? These are BUSY logs - being written to multiple times per second,
so it's not a delay on the client. I can observe these same clients logging
messages to another logging server (the old server) running syslog-ng and
messages are observable in near real-time.
Ultimate - #3 is the problem I am trying to solve, but I'm very curious why
these changes had the impact they did and why globally disabling file sync
using "$ActionFileEnableSync" would not entire decrease log volume until I
add "-" to all the actions.
Let me know if I can add info that would help shed light on this.
Thanks,
Aaron