(Reported on the ThinLinc-technical mailing list by Jens Langner - thanks!)
The load average numbers for Windows servers have a tendency to drop far into negative values, because the algorithm does not take the number of CPUs into account in its calculations. The problematic part of the algorithm is this:
> free_bogomips = EST_BOGOMIPS * (1 - loadinfo.loadavg)
loadinfo.loadavg is a value reported from the Windows side. What it actually means and what range it should have is a bit unclear, judging from comments 18 and 20 of bug 3864. It used to be a number in the range 0 to 1, but is now a value from 0 to the number of cores in the Windows system. The load balancer, however, still assumes that the number is 0 for no load and 1 for full load on all cores. As servers gain more and more cores, the load balancer will report a server as running at full load at a load of merely 1/(number of cores).
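For illustration, here is a minimal sketch of how the expression above behaves when loadavg exceeds 1. The value of EST_BOGOMIPS and the function wrapper are assumptions made for this example only; this is not the actual ThinLinc load balancer code.

```python
# Hypothetical per-server estimate; the real value used by ThinLinc is not
# given in this report.
EST_BOGOMIPS = 4000.0


def free_bogomips(loadavg, est_bogomips=EST_BOGOMIPS):
    """Apply the expression from the report to a given load value."""
    return est_bogomips * (1 - loadavg)


# A load value in the old 0..1 range gives a sensible result.
idle = free_bogomips(0.0)

# With the 0..<cores> range, a load of 1.0 on a multi-core machine means
# only one core is busy, yet the result claims no capacity is left.
one_core_busy = free_bogomips(1.0)

# Anything above 1.0 drives the result negative.
three_cores_busy = free_bogomips(3.0)

print(idle, one_core_busy, three_cores_busy)
```

With these assumed numbers, a server with three busy cores ends up with a negative capacity estimate regardless of how many cores it actually has.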
For reference, the loadavg we report from the VSM agent is adjusted on the agent side, not on the master. It's probably best to use the same principle here.
Fixed in r27829. With regards to comment #2: I decided against changing the nrpe_nt code back to reporting 0..1 because it would affect all other users of nrpe_nt.
(In reply to comment #3)
> Fixed in r27829.
>
> With regards to comment #2: I decided against changing the nrpe_nt code back to
> reporting 0..1 because it would affect all other users of nrpe_nt.
This is confusing; it's better to revert to the earlier behaviour. How it worked before:
r116 | hean01 | 2012-05-25 08:28:02 +0200 (fre, 25 maj 2012) | 4 lines
Fixed in r28035, r28036.
Looks good now.
As I was the initial reporter of this bug and have just installed 4.1.1 on our systems, I am curious about the actual status of the Windows load balancing algorithm in 4.1.1. As I don't have access to the ThinLinc sources, I can only guess from the comments above what was actually changed. To me it seems that nothing was changed and that the behavior of the tl-best-winserver and check_nrpe functionality is the same as in 4.1.0. Is this the case, and if so, why was the bug closed without a change? And if not, what was actually changed in the algorithm?
(In reply to comment #7)
> As I have been the initial reporter of this bug and I just installed 4.1.1 on
> our systems I am curious what might be the actual status of affairs regarding
> the windows load balancing algorithm in 4.1.1? As I don't have access to the
> sources of ThinLinc I can only try to guess from the comments above about what
> was actually changed and to me it seems nothing was actually changed and the
> behavior of the tl-best-winserver and check_nrpe functionality is actually the
> same like in 4.1.0?!?
>
> Is this actually the case and if so, why wasn't it changed and this bug closed?
> And if not, what was actually changed in the algorithm?
Hi Jens,
The initial fix for this bug was to scale the load value back into the range 0-1 from 0-<cpus>. I initially solved this by scaling the value I received from wts-tools on the "client" side (on the ThinLinc server). However, not everyone was happy with this solution, which led me to revert my own fix and then later revert the change in nrpe_nt that had changed the reported load value from 0-1 to 0-<cpus>.
Since all changes in the 4.1.1 release happened on the Windows side, you also need to upgrade wts-tools to 4.1.1 when you upgrade your ThinLinc server to 4.1.1. Perhaps this wasn't communicated clearly enough in the comments here or in the release notes.
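As a rough sketch of the scaling idea mentioned in comment #8 (mapping a load value in the range 0-<cpus> back into 0-1), something like the following could be used. The function name, the clamping, and the way the CPU count is passed in are assumptions for illustration; this is not the code from any of the revisions mentioned above.

```python
def normalize_load(raw_load, num_cpus):
    """Scale a load value in the range 0..num_cpus into the range 0..1.

    The result is clamped so that short overshoots or slightly negative
    readings cannot produce values outside 0..1.
    """
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    scaled = raw_load / float(num_cpus)
    return min(max(scaled, 0.0), 1.0)


# Three busy cores on an eight-core server.
partial = normalize_load(3.0, 8)

# All cores busy, and a reading slightly above the core count.
full = normalize_load(8.0, 8)
overshoot = normalize_load(10.0, 8)

print(partial, full, overshoot)
```

Whether the scaling is done on the Windows side or on the ThinLinc server, both ends have to agree on the range, which is why wts-tools and the server need to be upgraded together.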
(In reply to comment #8)
> The initial fix for this bug was to scale the load value back into the
> range of 0-1 from 0-<cpus>. I initially solved this by scaling the
> value I received from wts-tools on the "client" side (on the ThinLinc
> server). However everyone wasn't happy with this solution, which led
> me to reverting my own fix, and then later reverting the change in
> nrpe_nt which changed the load value reported from 0-1 to 0-<cpus>.
Thanks for that information. Now it's clear to me what exactly was changed, and that the load_avg value returned by check_nrpe will only be between 0 and 1. I have therefore changed my own 'tl-best-winserver' script to reflect that change in ThinLinc 4.1.1. For reference, and in case you are interested in reviewing my tl-best-winserver script or somehow integrating it with ThinLinc (it might be interesting for some users), please find the latest version here:
https://github.com/hzdr/thinstation/blob/master/ts/5.1/packages/hzdr/bin/scripts/tl-best-winserver
To explain why we have our own version of tl-best-winserver:
1. On our thin clients (Thinstation-based) we run our own GUI, which lets a user either connect directly to our Windows terminal servers via xfreerdp or, when connecting to a Linux server, use ThinLinc instead. We therefore needed a way to query our Windows servers for the same load balancing information that ThinLinc uses internally.
2. We needed a tl-best-winserver command-line program that allows overriding the username, which is currently not possible with the version that comes with ThinLinc.
> Since all changes in the 4.1.1 release happened on the Windows side of
> things, this means you also need to upgrade wts-tools to 4.1.1 when
> you upgrade your ThinLinc server to 4.1.1. Perhaps this wasn't
> communicated in a clear enough way from the comments here or the
> release notes.
Indeed, neither the release notes nor the comments here were particularly clear on that.