(2018-08-06, 05:49 PM) tkurki Wrote: Could you try once more with
Code:
sudo lsof -i | grep 30330
when the problem is occurring?
What I am looking for is the other end of the single active connection, which is not included in the listing with all the CLOSE_WAIT connections that you sent:
Code:
kplex 3363 pi 11u IPv4 434938 0t0 TCP 10.10.10.1:30330->10.10.10.1:55516 (ESTABLISHED)
My guess is that lsof is not listing all network connections without sudo.
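As a cross-check that does not depend on lsof permissions: on Linux, /proc/net/tcp lists every IPv4 TCP socket in the network namespace regardless of which user owns it (IPv6 sockets are in /proc/net/tcp6), although it does not show process names. The following Python sketch assumes a Linux host. It filters that file for port 30330 and decodes the hex addresses and state codes. It is an illustration only, not part of kplex or OpenPlotter.

```python
import socket
import struct

# State codes as printed in the "st" column of /proc/net/tcp (Linux).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def _decode(addr_port):
    """Turn '0100007F:767A' into ('127.0.0.1', 30330)."""
    hex_ip, hex_port = addr_port.split(":")
    # The kernel prints the raw 32-bit address as a host-order integer,
    # so packing it back in native order recovers the original bytes.
    ip = socket.inet_ntoa(struct.pack("=I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def sockets_on_port(proc_text, port):
    """Return (local, remote, state) for every IPv4 socket using `port`."""
    rows = []
    for line in proc_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote = _decode(fields[1]), _decode(fields[2])
        if port in (local[1], remote[1]):
            rows.append((local, remote, TCP_STATES.get(fields[3], fields[3])))
    return rows


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        for local, remote, state in sockets_on_port(f.read(), 30330):
            print(f"{local[0]}:{local[1]} -> {remote[0]}:{remote[1]} {state}")
```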
What I am looking for is something like this:
Code:
lsof -i | grep 30330
kplex 8862 tjk 6u IPv6 0x1b4146e9b77ce467 0t0 TCP *:30330 (LISTEN)
kplex 8862 tjk 7u IPv6 0x1b4146e9af129447 0t0 TCP localhost:30330->localhost:56510 (ESTABLISHED)
nc 8863 tjk 5u IPv6 0x1b4146e9b69ff987 0t0 TCP localhost:56510->localhost:30330 (ESTABLISHED)
You can see the kplex server listening for new connections, and then one established connection from the nc application.
As far as I understand, kplex is not properly handling connections that the client has closed: when no data is being processed, it keeps the file descriptor open and the connection stays in CLOSE_WAIT state. From what you are saying, it does close connections properly when data is flowing.
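The behaviour can be reproduced without kplex or Signal K. This minimal Python sketch is Linux-only because it reads the TCP state through the TCP_INFO socket option. It accepts a connection, lets the client close it, and then checks the server-side state while the server has neither read from nor written to the socket. It assumes nothing about kplex internals and only shows how the kernel behaves.

```python
import select
import socket

TCP_CLOSE_WAIT = 8  # TCP_CLOSE_WAIT in the Linux kernel's TCP state enum


def tcp_state(sock):
    """Return the kernel's TCP state number for `sock` (Linux only)."""
    # tcpi_state is the first byte of struct tcp_info.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 8)[0]


def reproduce_close_wait():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # any free port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()

    client.close()  # the client goes away, like nc after ^C
    # The client's FIN makes `conn` readable; wait for it, but read nothing.
    select.select([conn], [], [], 2.0)

    # The application has neither read nor written, so it has not noticed
    # the close; the kernel keeps the connection in CLOSE_WAIT until close().
    state = tcp_state(conn)
    conn.close()
    server.close()
    return state


if __name__ == "__main__":
    state = reproduce_close_wait()
    print("CLOSE_WAIT" if state == TCP_CLOSE_WAIT else f"state {state}")
```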
What I don't understand is that in your listings signalk-server does not have any open connections: it is only listening for new HTTP connections on port 3000.
########
OK, I just turned off the Node-RED flow that sends NMEA to UDP. Maybe worth mentioning that there is a Signal K Node-RED app running as well.
Code:
pi@openplotter:~ $ sudo lsof -i | grep 30330
node-red 1392 pi 26u IPv4 203279 0t0 TCP 10.10.10.1:53648->10.10.10.1:30330 (ESTABLISHED)
kplex 7684 pi 5u IPv4 98545 0t0 TCP *:30330 (LISTEN)
kplex 7684 pi 9u IPv4 97032 0t0 TCP 10.10.10.1:30330->10.10.10.1:57268 (CLOSE_WAIT)
kplex 7684 pi 10u IPv4 113133 0t0 TCP 10.10.10.1:30330->10.10.10.1:53648 (ESTABLISHED)
kplex 7684 pi 11u IPv4 97045 0t0 TCP localhost:30330->localhost:53508 (CLOSE_WAIT)
kplex 7684 pi 12u IPv4 202443 0t0 TCP localhost:30330->localhost:46636 (CLOSE_WAIT)
kplex 7684 pi 13u IPv4 205935 0t0 TCP localhost:30330->localhost:46678 (CLOSE_WAIT)
kplex 7684 pi 14u IPv4 206015 0t0 TCP localhost:30330->localhost:46720 (CLOSE_WAIT)
kplex 7684 pi 15u IPv4 206033 0t0 TCP localhost:30330->localhost:46762 (CLOSE_WAIT)
kplex 7684 pi 16u IPv4 206082 0t0 TCP localhost:30330->localhost:46804 (CLOSE_WAIT)
kplex 7684 pi 17u IPv4 206123 0t0 TCP localhost:30330->localhost:46846 (ESTABLISHED)
node 8745 pi 12u IPv4 206143 0t0 TCP localhost:46846->localhost:30330 (ESTABLISHED)
HTH, thanks
Here's what I think is happening.
kplex has a not-quite-conventional approach to its data flow in that it doesn't use select(), poll() or similar but uses a separate thread for every input and every output.
Read interfaces sit in a blocking read. If the other end of a TCP socket closes the connection, the read returns end-of-file (or an error), so kplex knows the connection is closed and terminates the interface. If the interface is bi-directional, it also terminates its "pair" in the write thread.
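A rough Python illustration of that read-side logic follows. It sketches the idea only and is not kplex's C code; the function names are hypothetical. A thread blocks in recv(), and an empty result means the peer has closed its end, so the thread closes the socket itself and nothing is left in CLOSE_WAIT.

```python
import socket
import threading


def reader_thread(conn, events):
    """Blocking read loop for one input interface (hypothetical sketch)."""
    while True:
        try:
            data = conn.recv(4096)  # blocks until data, EOF or an error
        except OSError:  # e.g. connection reset by peer
            data = b""
        if not data:  # b"" means the peer closed its end
            events.append("peer closed")
            conn.close()  # release the descriptor: no CLOSE_WAIT left over
            return
        events.append(data)


def demo_read_side():
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()

    events = []
    t = threading.Thread(target=reader_thread, args=(conn, events))
    t.start()
    client.close()  # the client shuts down
    t.join(timeout=2.0)
    server.close()
    return events, t.is_alive()
```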
But this is an output-only interface. What seems to be happening here is that because no data is being sent to kplex, it's not writing to any of its open sockets. Because it doesn't write to anything, it doesn't get an error to tell it the socket it is writing to is closed and so doesn't know to terminate the thread and close the socket.
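The write side can be sketched the same way. This Python example assumes Linux loopback behaviour. The server only writes. The first write after the client has gone still succeeds because the kernel simply queues it. The client's kernel then answers with a reset, and only the next write fails. An output-only interface that never writes never reaches that point.

```python
import socket
import time


def write_after_peer_close():
    """Write twice to a connection whose client has already closed it."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()
    client.close()  # the client is gone; conn is now in CLOSE_WAIT
    time.sleep(0.1)

    outcomes = []
    for _ in range(2):
        try:
            conn.sendall(b"test\r\n")
            outcomes.append("sent")  # queued by the kernel, no error yet
        except OSError as exc:  # raised once the client's reset has arrived
            outcomes.append(type(exc).__name__)
        time.sleep(0.1)  # give the reset time to come back
    conn.close()
    server.close()
    return outcomes
```

On Linux the second entry is typically BrokenPipeError. The exact exception can vary, so the sketch just records its name.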
If I'm right, removing "direction=out" would make the socket bi-directional, so the read thread would terminate the connection when the client shuts down.
Undesirable behaviour for sure, but this isn't so much a bug as a condition which wasn't anticipated (no outputs to write and repeated reconnections).
Fantastic, so far anyway. I just manually edited the Signal K 30330 interface in kplex to "both" and it seems stable so far, though it has only been a few minutes. I'll leave it overnight and report back in the morning. Nice one mate!
Code:
tjk:kplex-1.3.4 tjk$ cat kplex.conf
[udp]
name=system
direction=in
address=127.0.0.1
port=10110
[tcp]
name=signalk
direction=out
mode=server
port=30330
tjk:kplex-1.3.4 tjk$ kplex -f kplex.conf
and then
Code:
tjk:kplex-1.3.4 tjk$ nc localhost 30330
^C
tjk:kplex-1.3.4 tjk$ lsof -i | grep 30330
kplex 11140 tjk 6u IPv6 0x1b4146e9b8e619a7 0t0 TCP *:30330 (LISTEN)
kplex 11140 tjk 7u IPv6 0x1b4146e9a25909c7 0t0 TCP localhost:30330->localhost:59735 (CLOSE_WAIT)
results in CLOSE_WAIT state.
When I remove the direction=out line the connection closes properly.
It turns out that the SK server does have reconnect logic for idle TCP connections, and there is no way to disable it.
I don't think that makes sense for TCP connections, which are reliable by virtue of TCP itself.
I'll get back to this here once I've done something about it.
with -
Code:
[tcp]
name=signalk
direction=both
mode=server
port=30330
I'm now getting -
Code:
pi@openplotter:~ $ sudo lsof -i | grep 30330
node 465 pi 51u IPv4 19129 0t0 TCP localhost:58468->localhost:30330 (ESTABLISHED)
kplex 1445 pi 5u IPv4 17971 0t0 TCP *:30330 (LISTEN)
kplex 1445 pi 9u IPv4 21685 0t0 TCP localhost:30330->localhost:58468 (ESTABLISHED)
kplex 1445 pi 10u IPv4 17191 0t0 TCP 10.10.10.1:30330->10.10.10.1:58608 (ESTABLISHED)
node-red 1461 pi 21u IPv4 17229 0t0 TCP 10.10.10.1:58608->10.10.10.1:30330 (ESTABLISHED)
(2018-08-06, 09:13 PM) tkurki Wrote: Sorry about that and thanks for the clarification. I thought I was linking the latest settings, as the link was to master on GitHub. Now I see that there are no 1.x tags on GitHub: have you moved the source someplace else?
A very small change would make the picture a bit clearer: moving the text TCP localhost 30330 next to the kplex box, like 20220 is next to PyPilot. This would make it clearer that kplex is the server, not SK.
The old, stable version is on the master branch; the current, active version is on the beta branch. The beta branch will become the master branch soon.
https://github.com/sailoog/openplotter/tree/beta
I have made the suggested change in the data routing image, thanks.
Thanks to tkurki and stripydog (the Signal K and kplex developers) for looking into this issue.
(2018-08-06, 11:11 PM) stripydog Wrote: Undesirable behaviour for sure, but this isn't so much a bug as a condition which wasn't anticipated (no outputs to write and repeated reconnections).
Short of a complete rewrite (yes, yes, kplex two-dot-oh) I can think of a couple of ways to mitigate the problem we saw here: make keepalives work where kplex is a server (rather than a client), or provide an option to output some kind of heartbeat (a proprietary sentence which receivers should ignore). All comments/suggestions gratefully received (but not guaranteed to be acted upon :-)
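For the keepalive option, enabling keepalives on the server side with the standard socket options looks roughly like this Python sketch. The idle, interval and probe values are arbitrary illustrations, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific. Keepalive probes target peers that stop responding. Whether they would clear these CLOSE_WAIT sockets on their own is an assumption to verify, since the resulting socket error would still have to be noticed by the thread that owns the socket.

```python
import socket


def enable_keepalive(conn, idle=60, interval=10, probes=3):
    """Enable TCP keepalive on an accepted (server-side) connection.

    idle: seconds of silence before the first probe; interval: seconds
    between probes; probes: unanswered probes before the kernel gives up.
    The values are illustrative, not kplex settings.
    """
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific per-socket tuning of the keepalive timers.
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def accept_with_keepalive(server):
    """Accept a client and enable keepalive on the new socket explicitly."""
    conn, addr = server.accept()
    enable_keepalive(conn)
    return conn, addr
```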
(2018-08-07, 09:34 PM) stripydog Wrote: Short of a complete rewrite (yes, yes, kplex two-dot-oh) I can think of a couple of ways to mitigate the problem we saw here: make keepalives work where kplex is a server (rather than a client), or provide an option to output some kind of heartbeat (a proprietary sentence which receivers should ignore). All comments/suggestions gratefully received (but not guaranteed to be acted upon :-)
So the easiest way would be a proprietary sentence that receivers should ignore. Is it a good idea to do that in OpenPlotter, or would you rather add one in kplex?
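For illustration, a heartbeat along those lines could look like the following Python sketch. In NMEA 0183 a proprietary sentence starts with "$P" followed by a manufacturer code, and the checksum is the XOR of the characters between "$" and "*". The code KPX, the HB field and send_heartbeats are hypothetical. They are not something kplex or OpenPlotter actually emit.

```python
import functools
import operator


def nmea_checksum(body):
    """XOR of every character between '$' and '*', as two uppercase hex digits."""
    return format(functools.reduce(operator.xor, body.encode("ascii"), 0), "02X")


def heartbeat(body="PKPX,HB"):
    """Build a proprietary sentence: 'P' + hypothetical maker code 'KPX'."""
    return f"${body}*{nmea_checksum(body)}\r\n"


def send_heartbeats(conns, stop, period=5.0):
    """Hypothetical writer loop: send a heartbeat to each client every `period`
    seconds until the threading.Event `stop` is set, dropping clients whose
    write fails."""
    while not stop.wait(period):
        for conn in list(conns):
            try:
                conn.sendall(heartbeat().encode("ascii"))
            except OSError:  # peer gone: the write error exposes CLOSE_WAIT
                conns.remove(conn)
                conn.close()
```

Because the first write to a closed peer can still succeed, a dead client would typically be dropped on the second heartbeat after it disconnects rather than on the first.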
(2018-08-07, 09:34 PM) stripydog Wrote: (2018-08-06, 11:11 PM) stripydog Wrote: Undesirable behaviour for sure, but this isn't so much a bug as a condition which wasn't anticipated (no outputs to write and repeated reconnections).
Short of a complete rewrite (yes, yes, kplex two-dot-oh) I can think of a couple of ways to mitigate the problem we saw here: make keepalives work where kplex is a server (rather than a client), or provide an option to output some kind of heartbeat (a proprietary sentence which receivers should ignore). All comments/suggestions gratefully received (but not guaranteed to be acted upon :-)
FYI, I am going to remove the disconnect-on-idle behavior from the SK server anyway. That should make this particular interaction go away.
Outputting bogus heartbeats seems like a kludge to me: the client doesn't need them, and this is internal to kplex. How about a watchdog thread that closes connections the client has already closed (those in CLOSE_WAIT), no matter what?
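A minimal Python sketch of such a watchdog follows, assuming Linux socket flags. It peeks at each connection without blocking or consuming data. It then closes any connection where the peek reports end-of-file (the peer sent FIN, i.e. CLOSE_WAIT) or an error. The names are hypothetical. The sketch does not address how this would coordinate with kplex's per-interface write threads, which still hold the descriptor.

```python
import socket
import threading


def peer_closed(conn):
    """True if the peer has closed (FIN received, i.e. CLOSE_WAIT) or reset."""
    try:
        # Peek without blocking, so pending data is left in the queue.
        data = conn.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return False  # nothing to read yet: the peer is still connected
    except OSError:
        return True  # e.g. ECONNRESET
    return data == b""  # an empty read is end-of-file from the peer


def sweep(conns):
    """Close connections whose peer has gone and return the survivors."""
    alive = []
    for conn in conns:
        if peer_closed(conn):
            conn.close()
        else:
            alive.append(conn)
    return alive


def watchdog(conns, lock, stop, period=10.0):
    """Re-check the shared connection list every `period` seconds."""
    while not stop.wait(period):
        with lock:
            conns[:] = sweep(conns)


def start_watchdog(conns, lock, period=10.0):
    """Run the watchdog in a daemon thread; set the returned Event to stop it."""
    stop = threading.Event()
    threading.Thread(target=watchdog, args=(conns, lock, stop, period),
                     daemon=True).start()
    return stop
```

One limitation: if a client sent data before closing, the peek returns that data instead of end-of-file. The socket is then not detected until the pending input has been read.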