Bug 8561 - Load balancing and handling of faulty agents are not properly modularized
Summary: Load balancing and handling of faulty agents are not properly modularized
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: VSM Server
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.19.0
Assignee: Tobias
URL:
Keywords: prosaic, samuel_tester
Depends on:
Blocks:
 
Reported: 2025-04-01 08:46 CEST by Tobias
Modified: 2025-04-03 12:48 CEST

See Also:
Acceptance Criteria:
MUST:
* The load balancer must clearly separate load balancing and handling of faulty agents
* Load info objects should not concern themselves with the load balancer's ranking



Description Tobias cendio 2025-04-01 08:46:23 CEST
Load balancing and the handling of faulty agents are currently tightly coupled and should be more modular.

Agent load info should be stored in an object for the load balancer or others to make use of. The load info objects should not be involved in how they are ranked by the load balancer when suggested for sessions.

Furthermore, the load balancer should separate its naive load balancing system -- the one operating under ideal circumstances -- from its handling of faulty agents.
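
To illustrate the separation asked for above, here is a minimal sketch. It is not the ThinLinc code; every name in it (LoadInfo, naive_rating, apply_fault_handling, rank_agents) and the rating formula are assumptions made up for the example. The load info object only carries data, the naive rating assumes ideal agents, and faulty agent handling is a separate step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadInfo:
    """Plain data about one agent; it knows nothing about ranking."""
    hostname: str
    users: int
    load_avg: float
    free_mem_mb: int


def naive_rating(info):
    # Hypothetical formula: rate an agent as if it behaves ideally.
    return info.free_mem_mb / 1024 - info.load_avg - info.users


def apply_fault_handling(ratings, penalties):
    # Separate step: lower the rating of agents that have misbehaved.
    # Agents without an entry in `penalties` keep their naive rating.
    return {host: rating - penalties.get(host, 0.0)
            for host, rating in ratings.items()}


def rank_agents(infos, penalties):
    # The load balancer composes the two steps and sorts once, best first.
    ratings = {info.hostname: naive_rating(info) for info in infos}
    adjusted = apply_fault_handling(ratings, penalties)
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

With this layout, either step can be replaced without touching the other or the load info objects.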
Comment 8 Tobias cendio 2025-04-02 08:12:35 CEST
>MUST:
>* The load balancer must clearly separate load balancing and handling of faulty agents 
Check. Modularity has improved, which facilitates reworking the faulty agent handling (bug 8552).
>* Load info objects should not concern themselves with the load balancer's ranking
Check.
Comment 9 Samuel Mannehed cendio 2025-04-02 16:04:55 CEST
I reviewed the commits and they look good, aside from the fact that we unnecessarily sort the list of agents twice. Reopening, even though this is a minor thing.

The commits were nicely separated and fully covered by updated or new unit tests. Nice.

I also tested in a small cluster with 3 agents running CentOS 8 and Jenkins build 3976. Things behave as expected and vsmserver.log looks OK.
Comment 10 Tobias cendio 2025-04-03 10:23:30 CEST
(In reply to Samuel Mannehed from comment #9)

In the current state, we do indeed effectively sort by rating twice, which certainly appears inefficient. The idea was that there should be 2 distinct, independent steps:

1. first sort: naive sorting where ideal agent performance is assumed
2. handle faulty agents: account for whatever ways agents have been underperforming

Since penalty points are currently interpreted simply as negative rating, the result is a second sort by rating minus penalty points.
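
A small sketch of why the second sort makes the first one redundant. This is not the actual vsmserver code; the Agent class, its fields and the function names are hypothetical. Because Python's sort is stable, the first pass only survives as a tie-breaker, so a single sort with a compound key gives the same order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    rating: float   # hypothetical: higher means more spare capacity
    penalty: float  # hypothetical: points accrued for underperforming


def rank_two_passes(agents):
    # Step 1: naive sort that assumes ideal agent performance.
    naive = sorted(agents, key=lambda a: a.rating, reverse=True)
    # Step 2: sort again, treating penalty points as negative rating.
    return sorted(naive, key=lambda a: a.rating - a.penalty, reverse=True)


def rank_one_pass(agents):
    # One sort: the adjusted rating decides, the plain rating breaks ties
    # (the only thing the stable second sort kept from step 1).
    return sorted(agents,
                  key=lambda a: (a.rating - a.penalty, a.rating),
                  reverse=True)


agents = [
    Agent("agent-a", rating=10.0, penalty=0.0),
    Agent("agent-b", rating=12.0, penalty=5.0),
    Agent("agent-c", rating=8.0, penalty=0.0),
]
```

Here agent-b has the best naive rating but drops to last place once its penalty is subtracted, and both functions return the same order.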

If we stopped here, we should probably merge these two sorts, as in the previous implementation. However, since the first sort of the load balancing (step 1) is being changed quite a bit (bug 4429; commits are imminent) and the penalty system is being redesigned (bug 8552), we can live with this imperfect state for now and aim for modularized steps 1 and 2.

That being said, this bug will move forward when bug 4429 has progressed and the new load balancing scheme is in place.
Comment 13 Samuel Mannehed cendio 2025-04-03 12:48:35 CEST
I consider this to be done now that the issue from comment #9 has been fixed. We only sort once now. However, as described in comment #10, the penalty system might be redesigned in bug 8552.
