Tim Smith
10 years ago
Hi,
I have a pair of Linux/RHEL servers (RHEL 6.x), A and B, that forward logs
to multiple destinations:
- one copy to Splunk syslog listener
- one copy to local flume process over TCP
- one copy to a remote RSyslog receiver, X and Y (RHEL 6.x)
Forwarding copies to Splunk and Flume works fine. However, forwarding to
the remote RSyslog receivers gets stuck in a strange way. The forwarding is
set up as:
RSyslog-Server-A -> RSyslog-Server-X
RSyslog-Server-B -> RSyslog-Server-Y
All four (A, B, X and Y) are running exactly the same version of RSyslog -
8.6.2-2, from the adiscon repo:
rsyslog-8.6.0-2.el6.x86_64
What happens is A/B stop sending logs to X/Y. Looking at the send/receive
TCP queues at both ends, the receive queue on X/Y is clear but the sendQ on
A/B gets stuck. As an example, this connection lingers forever (extracted
with netstat -an | grep EST):
tcp 0 103660 10.24.62.9:47081 10.2.1.2:514 ESTABLISHED
Observations:
==========
- The connection remains established with the same number of bytes in the
sendQ
- No data is transferred over the "stuck" connection, looking at tcpdump
- Re-starting the receive end, X/Y, does not help
- I don't see an action suspended error in the rsyslog logs
- Running the send side in debug mode doesn't help - I easily ended up with
100+ GB of debug logs without the issue manifesting itself. The A/B pair
handles lots of traffic and running rsyslogd in debug mode reduces its
throughput - perhaps the issue does not manifest at a lower event rate (EPS).
- Only re-starting the send side, A/B, resolves the issue.
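To make the "same number of bytes in the sendQ" observation easier to catch automatically, here is a minimal sketch that compares two samples of `netstat -an` output and reports ESTABLISHED connections whose non-zero Send-Q did not change. The function names `parse_sendq` and `stalled` are hypothetical helpers made up for this illustration, and the column layout (proto, Recv-Q, Send-Q, local, remote, state) is assumed to match the netstat line quoted above.

```python
from typing import Dict, List, Tuple

ConnKey = Tuple[str, str]  # (local address, remote address)


def parse_sendq(netstat_output: str) -> Dict[ConnKey, int]:
    """Map each ESTABLISHED TCP connection to its Send-Q byte count."""
    result: Dict[ConnKey, int] = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local remote state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        result[(fields[3], fields[4])] = int(fields[2])
    return result


def stalled(before: Dict[ConnKey, int], after: Dict[ConnKey, int]) -> List[ConnKey]:
    """Connections whose non-zero Send-Q is identical in both samples."""
    return sorted(k for k, v in after.items() if v > 0 and before.get(k) == v)
```

The two samples would be taken some seconds apart (for example by saving the output of `netstat -an` twice). An unchanged Send-Q on a busy connection can be a coincidence, so a hit is only a hint that the connection deserves a closer look.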
I tweaked the omfwd action to change TCP_Framing from the default to octet-counted.
Here is the send side omfwd config on A/B:
--------------------
action (name="it_tcp_X" type="omfwd" Target="X.abc.com" Port="514"
Protocol="tcp" TCP_Framing="octet-counted" queue.filename="it_tcp_X"
queue.maxdiskspace="10G" queue.Size="8640000"
queue.dequeuebatchsize="4096" queue.type="LinkedList"
queue.timeoutenqueue="0" queue.maxfilesize="1G" queue.saveonshutdown="on"
queue.workerThreads="4" RebindInterval="10000000" template="fwdformat" )
--------------------
The receive side, X/Y, config:
--------------------
module(load="imptcp" threads="16") # needs to be done just once
global (
workdirectory="/data/rsyslog/queues"
maxmessagesize="64K"
debug.logfile="/data/rsyslog/debug/debug.log"
net.enabledns="off"
)
$DebugLevel 0
main_queue (
queue.FileName="globalqueue"
queue.Type="LinkedList"
queue.MaxDiskSpace="250g"
queue.maxfilesize="5g"
queue.Size="864000000"
queue.dequeuebatchsize="1000"
queue.TimeoutEnqueue="0"
queue.workerThreads="4"
queue.SaveOnShutdown="on"
)
ruleset(name="aggregate") {
action (name="to_flume"
type="omfwd"
Target="localhost"
Port="5614"
Protocol="tcp"
queue.filename="to_flume"
queue.size="360000000"
queue.maxdiskspace="360G"
queue.highwatermark="216000000" # 60% of queue.size
queue.discardmark="288000000" # 80% of queue.size
queue.type="LinkedList"
queue.dequeuebatchsize="4096"
queue.timeoutenqueue="0"
queue.maxfilesize="4G"
queue.saveonshutdown="on"
queue.workerThreads="4"
RebindInterval="10000000"
template="rawfwd"
) stop
}
input(type="imptcp" port="514" ruleset="aggregate")
--------------------
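As a sanity check on the watermark comments in the to_flume action above, this small sketch recomputes 60% and 80% of queue.size. The variable names are only for illustration.

```python
# Values copied from the to_flume action on the receive side.
queue_size = 360_000_000

# Integer arithmetic avoids any floating-point rounding.
high_watermark = queue_size * 60 // 100
discard_mark = queue_size * 80 // 100

print(high_watermark, discard_mark)
```

Both results match the configured queue.highwatermark and queue.discardmark values, so the percentages in the comments are consistent with the numbers.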
Any pointers to troubleshoot and smoke out the bug will be highly
appreciated :)
Thanks
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.