Need Help Troubleshooting Network Issue (Client TCP Connections Stuck in FIN_WAIT_2)
(Note: I originally asked this question on the "Network Engineering" site, but a moderator there rejected it as "off topic" and told me to ask here instead.)
I am running a video surveillance server called ZoneMinder (version 1.26.5) on a Fedora 18 Linux box. ZoneMinder has a web-based user interface and uses a CGI executable called "zms" to transmit an MJPEG video stream to a web browser over TCP.
The problem is that sometimes the video stream connection does not terminate properly; if I am viewing a video stream and close the browser window, the underlying TCP connection remains open and the zms process on the server continues to send video frames across the network. This occurs even if I terminate ALL instances of the browser on the Windows machine (verified using Task Manager). My expectation is that Windows should immediately shut down the TCP connection once the browser process terminates, but for some unknown reason that doesn't always happen, and Windows continues to accept packets on the connection indefinitely.
When this problem occurs, the zms process on the server still sees the connection as open and will continue to stream video until either the Windows machine is powered down or the zms process is killed (manually, from the command shell). When reviewing surveillance events it's not uncommon to accumulate a dozen or more of these "zombie" zms processes; if I don't log on to the ZoneMinder server machine via SSH and kill these processes manually they will continue to run indefinitely, consuming disk and network I/O bandwidth and bogging down the rest of the system.
Once in the failed state, running netstat on the Windows machine shows the TCP connection is in the FIN_WAIT_2 state. A Wireshark capture shows that the Windows machine is still acknowledging segments on the connection even though there is no longer a running process receiving that data.
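For anyone wanting to spot these stuck connections without scanning the netstat listing by eye, here is a minimal sketch in Python. It assumes the column layout printed by `netstat -ano` on Windows; the sample text, addresses, and PIDs are made up for illustration and are not taken from the machines described here.

```python
# Hypothetical sample of Windows `netstat -ano` output (addresses and PIDs invented).
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    192.168.1.20:49213     192.168.1.5:80         ESTABLISHED     4120
  TCP    192.168.1.20:49377     192.168.1.5:80         FIN_WAIT_2      0
  TCP    192.168.1.20:49390     192.168.1.9:443        TIME_WAIT       0
"""

def connections_in_state(netstat_text, state, remote_ip=None):
    """Return (local, remote) address pairs for TCP rows in the given state."""
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # A TCP row has: protocol, local address, remote address, state, PID.
        if len(fields) < 4 or fields[0] != "TCP":
            continue
        local, remote, row_state = fields[1], fields[2], fields[3]
        if row_state != state:
            continue
        # Optionally keep only rows whose remote host matches (port stripped).
        if remote_ip is not None and remote.rsplit(":", 1)[0] != remote_ip:
            continue
        rows.append((local, remote))
    return rows

stuck = connections_in_state(SAMPLE, "FIN_WAIT_2", remote_ip="192.168.1.5")
print(stuck)
```

In practice the text would come from running netstat and capturing its output; `remote_ip` would be the address of the ZoneMinder server.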
I have 3 Windows machines: One desktop running Windows 7 Pro SP1, one desktop running Win 7 Home Premium SP1, and one laptop running Win 7 Home Premium SP1. Of these three, the two desktop machines exhibit the problem intermittently, whereas the laptop never exhibits the problem.
I normally use Firefox, but I also tried Chrome. Both work 100% of the time on the laptop and fail intermittently on the desktops. Firefox and Chrome on the other platforms I have tried, such as Linux and Android, never exhibit the problem.
One of the Windows machines that fails is connected to the same gigabit switch as the ZoneMinder server box; the Windows laptop that always works is connected to a WiFi AP and reaches the ZoneMinder server through a second GigE switch. The Android devices connect both from inside and from outside beyond the firewall with no issues.
To eliminate the possibility of a network driver issue, I swapped the Realtek network card in one of the desktop machines for an Intel NIC, but the failure still occurs.
I've now run out of ideas; how can I troubleshoot this further? I can provide Wireshark captures if that would be helpful (they are large - ~100MB - so I've left them off for now).
Thanks for your help!
tcp
asked Jun 16 '14 at 19:20
dvarapala
1 Answer
The TCP state FIN_WAIT_2 means that the client application has closed its side of the connection, the client has sent a FIN to the server, and the server has ACKed it. The server's TCP stack should then notify the server application so that it can begin shutting down, after which the server sends its own FIN to the client. Your client is waiting for that FIN from the server.
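The close sequence described here can be sketched as a tiny state machine. This is a simplified illustration of the active-close path from the TCP specification (RFC 793), not a full implementation: simultaneous close, timers, and resets are left out, and the event names are invented for the example.

```python
# Active-close path of the TCP state machine, reduced to the transitions
# relevant to this question. Keys are (current state, event) pairs.
TRANSITIONS = {
    ("ESTABLISHED", "app_close"): "FIN_WAIT_1",       # local side sends its FIN
    ("FIN_WAIT_1", "recv_ack_of_fin"): "FIN_WAIT_2",  # peer ACKs that FIN
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",          # peer sends its own FIN
}

def run(events, state="ESTABLISHED"):
    """Apply events in order and return the final state."""
    for event in events:
        # Events with no table entry (such as incoming data) leave the
        # state unchanged in this simplified model.
        state = TRANSITIONS.get((state, event), state)
    return state

# The peer ACKs the FIN but keeps sending data and never sends a FIN.
print(run(["app_close", "recv_ack_of_fin", "recv_data", "recv_data"]))
# The peer eventually sends its FIN.
print(run(["app_close", "recv_ack_of_fin", "recv_data", "recv_fin"]))
```

The first sequence stays in FIN_WAIT_2 no matter how much data arrives, which matches what netstat reports on the affected machines.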
The Windows machines exhibiting this behavior may be using TCP Chimney Offload, which hands some TCP housekeeping chores to the NIC, e.g. ACKing data and closing connections. Once the application closes, the NIC takes over the final closing of the connection. This could be why your machine continues to ACK data even though the browser is closed.
You could try mitigating the problem by disabling TCP Chimney on Windows. Instructions are here.
However, that doesn't address the root cause of why the server doesn't send a FIN. With traffic captures on both client and server, you can:
- Verify that the client does send a FIN
- Verify that the server receives the FIN
- Verify that the server sends a FIN
- Verify that the client receives the FIN
Likely, there's a gap in one of those steps. If all steps complete, then the problem is with the client and it might be TCP Chimney offloading.
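The four checks above can be sketched in Python as follows. The packet records are a hypothetical format (dictionaries with `src`, `dst`, and a set of `flags`) that you would have to produce yourself from the two captures; the addresses and sample packets are invented, arranged to mirror a server that ACKs the client's FIN but never sends one of its own.

```python
def has_fin(capture, src, dst):
    """True if the capture contains a FIN segment travelling from src to dst."""
    return any(
        p["src"] == src and p["dst"] == dst and "FIN" in p["flags"]
        for p in capture
    )

def first_fin_gap(client_capture, server_capture, client_ip, server_ip):
    """Return the first failed check in the close sequence, or None if all pass."""
    checks = [
        ("client sends FIN", has_fin(client_capture, client_ip, server_ip)),
        ("server receives FIN", has_fin(server_capture, client_ip, server_ip)),
        ("server sends FIN", has_fin(server_capture, server_ip, client_ip)),
        ("client receives FIN", has_fin(client_capture, server_ip, client_ip)),
    ]
    for label, ok in checks:
        if not ok:
            return label
    return None

CLIENT, SERVER = "192.168.1.20", "192.168.1.5"  # invented addresses
# Both captures see the client's FIN and the server's ACK, but no server FIN.
packets = [
    {"src": CLIENT, "dst": SERVER, "flags": {"FIN", "ACK"}},
    {"src": SERVER, "dst": CLIENT, "flags": {"ACK"}},
]
print(first_fin_gap(packets, packets, CLIENT, SERVER))
```

Checking the steps in order matters: the first failure tells you which machine, or which hop between them, to look at next.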
answered Jun 17 '14 at 23:20
karyhead
Interesting - I knew that the CRC calculation could be offloaded to the hardware, but not the rest of that stuff. In any case, I don't think this applies to my situation, as I am able to capture everything in Wireshark, not just the 3-way handshake. I added the registry keys anyway (they did not previously exist) but still no success. I think there are at least 2 problems here: 1. Why does the server not send a FIN in response to the client's FIN (which the server does ACK)? 2. Why does Windows not send a RST segment to the server when the client process terminates?
– dvarapala
Jun 20 '14 at 15:39
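On the first question in the comment above, one thing worth ruling out is TCP half-close. A FIN only closes the sender's direction of the connection, so the other side may keep transmitting and learns about the FIN only if it reads from the socket. The sketch below demonstrates this on the loopback interface with Python's standard `socket` module. It is an illustration of the protocol behavior only; nothing in this thread confirms how zms handles its socket, so treat any link to zms as an assumption.

```python
import socket

def half_close_demo():
    """Show that a peer can still send after receiving a FIN."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server, _ = listener.accept()
    server.settimeout(5)
    try:
        client.shutdown(socket.SHUT_WR)  # client sends FIN, can still receive
        sent = server.send(b"frame")     # server-side write still succeeds
        received = client.recv(16)       # client still gets the data
        eof = server.recv(16)            # empty bytes: the server sees the FIN
        return sent, received, eof
    finally:
        for s in (client, server, listener):
            s.close()

print(half_close_demo())
```

A sender that only ever writes would never reach the `recv` call that reveals the FIN, and would keep streaming until a write fails, typically after the peer answers with a RST. That is why the second question, the missing RST from Windows, matters as much as the first.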