A commandline tool which can query information from ThinLinc environment using the xml-rpc interface that tladm webmin module uses. Providing functionality such as querying information like all sessions, session by specific server, server information and more. Also actions should be handled such as terminating a session, sending notify message to specific user or server. Examples of usage and output (no action argument = query): # List sessions on specific server tl-adm sessions @myserver SessId User State Connected -------------------------------------------------------------- #1 hean01@myserver Active 2011-01-06 08:32:22 #2 astrand@myserver Inactive 2011-01-03 08:02:43 #3 astrand@myserver Active 2010-12-20 16:20:10 # Terminate session 2 on servername tl-adm session #2@myserver terminate # Show user session information can show more entries # if user have more than one session on server tl-adm session hean01@myserver # Notify all users on specific server tl-adm notify @myserver "The system is restarted in 10 min, please save your work and log out" And here's follows a examples of script usage maintainence.sh: #!/bin/sh # take the current server offline, no new connections tl-adm server @$1 disable # notify current sessions online tl-adm notify @$1 "The system is going down for maintainance in 10 minutes, please save your work, log out and log again" # wait 10 minutes sleep 10m # Notify of shutdown tl-adm notify @$1 "Shutting down system, your session will be logged out" # Terminates all active sessions tl-adm server @$1 offline
Some interim commands: http://lists.cendio.se/pipermail/thinlinc-technical/2013-January/000230.html
A first prototype of such a tool is available on bug 5268. It probably needs some cleanup to promote code reuse, but otherwise it's rather close to the target. It however uses the local Unix socket to talk to vsmserver, which means it can only be used on the same machine as the master. (tlwebadm has the same limitation right now though)
> A first prototype of such a tool is available on bug 5268. It probably needs > some cleanup to promote code reuse, but otherwise it's rather close to the > target. It however uses the local Unix socket to talk to vsmserver, which means > it can only be used on the same machine as the master. > > (tlwebadm has the same limitation right now though) > Currently there are a rudiment remote api over XML-RPC designed and used for IPC between master and agent. The ACL is also rudiment and builds upon lowport and list of known addresses. However the current implemented ACL puts a restrain to tlwebadm in such way that it for example can't do authorized calls remotely to master and agent. This is the reason to why tlwebadm can't administrate the full cluster and only works locally for master and agent server. Doors open up for cluster administration, if we add another authorization mechanism that is not "bound to process running as root on server were rpc service is running".
After our start up meeting we've come up with a non-exhaustive list of things that need to be done for this bug: • Investigate and write user stories • Compile acceptance criteria from the user stories • Naming • Find a terminology for the command line interface that matches the rest of the product. • Come up with a name for the tool. Preferably by looking at conventions used by similar tools. • In the case we settle with something along the lines of `tlctl', do we use a hyphen or not? We have historically been inconsistent regarding this. • Generating man pages from ReStructured Text such that the documentation for the tool can be used in both the TAG and man pages. • Setting up a modular base architecture for the tool. During the initial meeting we discussed placing this tool in a subdirectory of tlmisc/. • Descide how we should handle future bugs related to this tool? Do we add a new bugzilla component or place bugs related to this tool under an already existing component? When we are at a stage where we can start working on the tool it self, these are some of the things that have to be done: • Implementing parsing of command line input. • Investigate suitable command line parsing libraries available in python. • Implementing communication with the master and agent. • Possibly finding a neat and typable unique session identifier to ease command line interaction. This can possibly be somewhat tricky since we want the identifier to be very much unique to a session. • Implementation of pretty printing and formatting. These are all relatively parallelizable.
We also found this document about writing sane command line tools which might be of use https://clig.dev/
As an initial effort, we plan on giving the tool session management capabilities such as: • Listing sessions • Terminating sessions • Abandoning sessions Since we have future plans on making this tool relatively expansive, with multiple different subcommands, a possible idea is to give it another subcommand apart from that for session management. This would give the benefit of forcing us to make the architecture modular while giving the users a hint that the tool can be used for things other than session management. A possible candidate for this would be a subcommand for showing load information. The load information is read only and likely easy to implement.
One possible way to gather good user stories solidly grounded in reality would be to make a post on the ThinLinc community discussing the matter.
We did some brainstorming and came up with the following user stories. Terminate session ═════════════════ Someone calls to support and has issues with their session. Then I want to be able to terminate that session, so that I won't have to guide an inexperienced user to do the same operation through the client. Without the tool, you have to terminate the session through TLWebadm. Find running session ════════════════════ Someone calls to support and asks for a session that they could reach at time X, but can't anymore. Then I want to be able to check if that session is still running, so that I know where to concentrate the troubleshooting effort. Without the tool, you have to check the session through TLWebadm. Killing zombie sessions ═══════════════════════ Some sessions on some agents have been abandoned - they aren't shown in the session database, and the user can't reach them. Then I want to find all the abandoned sessions, so that I can kill them. Without the tool, you have to go to each agent and list all running sessions, find which of those are not listed in TLWebadm, and kill those. Drain agent ═══════════ Some agents are to be decommissioned, and we want them empty before that. Then I want to set an agent in "drainage" mode, such that the agent is up but doesn't allow any new sessions, so that eventually becomes free of sessions such that it can be upgraded or decommissioned. Without the tool, you have to follow the steps below, which are non-trivial: https://www.cendio.com/resources/docs/tag-devel/html/config_cluster.html#stopping-new-session-creation-on-select-agents-in-a-cluster Verify empty agent ══════════════════ We are about to decommission an agent. then I want to check that it doesn't have any running sessions, so that the users are unaffected. Without the tool, you either have to go through the sessions in TLWebadm, or check that no sessions are running on each agent machine. Performance troubleshooting ══════════════════════════ Someone reported performance issues somewhere in the cluster. Then I want to identify which agent(s) are affected and which session on that agent is responsible, so that I can identify and act on problematic sessions. Without the tool, you have to check each individual agent for resource hogs. Abandoning sessions ═══════════════════ An agent exploded and is lost. Then I want to be able to abandon all sessions running on that specific agent, so that the user is not prompted to manually abandon it the next time they connect. Without the tool, you have to go through the list of sessions in TLWebadm, and abandon each affected session. Verify upgrade ══════════════ Some agents have been upgraded. I want to list the agents and their corresponding versions, so I can verify that the correct ones were upgraded. Without the tool, you have to go to each agent machine and check /opt/thinlinc/etc/thinlinc-release.
On top of the user stories in comment 24, there are also likely some stories related to obtaining statistics from a list of sessions. We've gotten customer requests in the past about this as it can help in determining a suitable number of licenses and in tuning parameters things like session lifetime limiting.
For the name of the tool, the current suggestion is tlctl, I've looked into our naming standard and the standard for other commands. Most of our script names use hyphens. This includes both scripts that can be used by an admin, like tl-config and tl-notify, and scripts only for internal use. On the other hand, 'ctl' is usually added as a suffix to the script name without any hyphen, e.g. journalctl, systemctl and hostnamectl. We also discussed some other options for the name instead of tlctl, but didn't come up with any that seemed better: * tl-cmd - Issue: Kinda sounds like something will be executed * tl-adm - Issue: May cause confusion since tlwebadm sometimes is just called tladm For now, tlctl is probably the best option. It doesn't follow the standard of our scripts, but sysadmins are probably more familiar with systemctl etc. than our scripts, so it should still feel like a quite evident name.
There are a few common approaches to the syntax of subcommands. One popular approach is format <command noun verb> or <command verb noun>, used by e.g. docker and IBM Cloud Kubernetes Service [1]. Both are quite straightforward, but it seems like the <noun verb> format is more common. One thing to consider though is if the noun should be in plural or singular, but singular seems to be the preferred one. > docker container create > ibmcloud ks cluster ls [1]: https://uxdesign.cc/user-experience-clis-and-breaking-the-world-baed8709244f Command 'loginctl' uses another strategy to get around the plural or singular issue, though it doesn't really feel as neat as the <noun verb> approach. > loginctl list-sessions > loginsctl session-status We decided to use the <noun verb> approach. In this first version, we will use "long names" (e.g. list instead of ls), and later shorten command names if needed.
We should also be sure that tlctl uses the same terminology as in WebAdmin. We have used 'terminate' when killing a session, and when listing the sessions we should keep the column headers the same as for the sessions under Status -> Sessions. Otherwise, I don't see anything that seems relevant at this stage, but when adding future functionality we should revisit the corresponding parts in WebAdmin.
(In reply to Linn from comment #26) > Most of our script names use hyphens. This includes both scripts that can be > used by an admin, like tl-config and tl-notify, and scripts only for > internal use. On the other hand, 'ctl' is usually added as a suffix to the > script name without any hyphen, e.g. journalctl, systemctl and hostnamectl. > This has become very popular recently with systemd as it includes no less than 16 commands ending in "ctl". However it is not the only one, as my system has 40 such commands, meaning there are 24 commands that are not systemd related. Extending this I can search in dnf¹ and find 144 packages with tools ending in "ctl". So I'd say it's a fairly common naming these days. ¹ dnf repoquery --file '/usr/*bin/*[a-z]ctl' | sed 's@\(.*\)-[0-9]:.*@\1@' | sort | uniq | wc -l Hyphens are fairly common as well, and I can see them being used to group commands. E.g. the commands to configure IPMI and PPPoE all use a common prefix with a hyphen to group their commands. One could probably argue that this is the "old" way of doing things, before commands started having the verb and noun approach. I can't find any obvious alternatives to the "ctl" suffix that is commonly used. "adm" has a few entries, mostly for tools dealing with hardware. In dnf I can only find 19 packages in total that have tools ending in "adm". So the biggest problem right now is probably that we will be inconsistent with other ThinLinc commands. Some should probably be hidden in libexec (bug 3070), some might be moved to this new command (e.g. "tl-config" might become "tlctl config"), but there are some that are unclear what to do with. Looking at our competitors both Citrix and NoMachine have multiple commands, using a "ctx" and "nx" prefix respectively. No hyphen though. So one long term plan could be to migrate commands from "tl-" to just "tl".
argparse seems to be the current argument parser module in Python's standard library, with getopt and optparse being softly deprecated. So that should probably be what we look at first. It has explicit support for the kind of sub-module thinking we want. So we just to check how it fits with how we want to do the module architecture: https://docs.python.org/3/library/argparse.html#sub-commands We should also check that it parses arguments the way we want, i.e. the way most other commands do.
If we want to have a dynamic loading of our submodules, this is apparently possible now by using pkgutil: https://docs.python.org/3/library/pkgutil.html#pkgutil.iter_modules An example here: https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/#using-namespace-packages
(In reply to Pierre Ossman from comment #30) > > We should also check that it parses arguments the way we want, i.e. the way > most other commands do. Urgh. argparse has a bunch of non-standard behaviour (e.g. allow_abbrev). It seems we have to review this extremely carefully and check that we can disable all the odd stuff. :/ There is this issue that doesn't look like it will ever be fixed: https://bugs.python.org/issue9334 So it does look like argparse might be out. :/
If we decide to go down the route of using argparse for argument parsing there is also argcomplete (https://kislyuk.github.io/argcomplete/) for automagical shell completion. I have no experience of using this tool myself, but having a single point of truth for grammar-related things is generally very much worthwhile as duplicate grammar in different forms tends to be a source of bugs.
(In reply to Pierre Ossman from comment #32) > > Urgh. argparse has a bunch of non-standard behaviour (e.g. allow_abbrev). It > seems we have to review this extremely carefully and check that we can > disable all the odd stuff. :/ Apparently that isn't as non-standard as I thought. glibc's getopt_long() has this behaviour as well, and it is not optional! Still, it is a very broken behaviour as it prevents adding more options in the future. So we really want to avoid it.
We're using this new command as an experiment for bug 7049 (provide man pages for ThinLinc commands). It seems possible to use reStructuredText and Sphinx to provide documentation to both the TAG and man pages at the same time. The tricky part seems to be how to install things. See bug 7049 for more.
We now have an initial command-line argument parser in place. One of the things it not yet properly handled is the `--' argument. This argument is used to disable option parsing for the rest of the line (e.g. for providing positional arguments starting with a hyphen). The primary decision point here is the behavior if the `--' argument is given before a subcommand: > tlctl -- session list I have taken a look at a couple of different command-line tools to see how they handle it. First out is podman which is very much broken in this regard: > $ podman -- top > Error: unrecognized command `podman top` > Did you mean this? > top Then there are git and kubectl which both refuse to parse past the `--': > $ git -- status > unknown option: -- > $ kubectl -- top > <`kubectl --help' output> And last there are docker and systemctl which both gladly accept the `--' before the subcommand(s): > $ docker -- ps > <`docker ps' output> > $ systemctl -- status vsmserver > <`systemctl status vsmserver' output>
(In reply to William Sjöblom from comment #56) After some back and forth, we've decided to go with the approach where `--' is only allowed after the last subcommand with the reasoning that the change to a more permissive approach further on is easier than the other way around.
The command line parser is now in a state that can be considered done, at least for now. The primary goal of the design was to provide a minimal command-line parsing module that adheres to the following principles: • A tree-like structure of nested parsers for providing subcommands This allows us to build a parser for each subcommand in their respective module, which can then be loaded dynamically. • Parsing error behavior should mimic already existing commands. Examples of command-line tools with similar goals include but are not limited to: git, systemctl (and similar commands part of systemd), podman, docker, and kubectl. • Options should be able to flow left-to-right on the command line, but not the other way around. Given a global option `--verbose' (thus belonging to the root parser), one should be able to write either `tlctl --verbose session list' or `tlctl session list --verbose'. This makes it simple adding common global flags to an already entered command by pressing `\uarr <space> --verbose' in the terminal. • Non-allowing towards abbreviated long options. This is to help not breaking shell scripts when adding more options in the future. • Disabling parsing of options by using `--' should only be allowed after the last subcommand. See comment 56 and comment 60.
Regarding error handling when doing the calls to VSMServer, we have identified a number of different errors: Cannot connect to VSMServer --------------------------- Happens both when the vsmserver service is not running and when the command is run on an agent machine. Permission denied ----------------- Happens when tlctl is not run with sudo. In webadm, all sessions calls are run as root. For tlctl, we should consider if all vsm calls have to be run with sudo, or if some calls should be allowed without it. Multiple vsm calls in a row --------------------------- Some functionality requires multiple vsm calls, e.g. when terminating a session we first want to get the session info and then terminate it. Here we need to consider what should happen if the info we rely on is updated in between the two calls. Bad input data -------------- We shouldn't cause any errors for non-existing input data.
I did some smoke testing on SLES12 and ran into the following stack trace: > Traceback (most recent call last): > File "/opt/thinlinc/sbin/tlctl", line 52, in <module> > i1iiiiIIIiIi ( ) > File "/opt/thinlinc/sbin/tlctl", line 40, in i1iiiiIIIiIi > ii1iIi1i11i = iI111iiIi11i ( ) > File "/opt/thinlinc/sbin/tlctl", line 23, in iI111iiIi11i > O0o000ooO = importlib . import_module ( 'thinlinc.tlctl.' + i11IiIIiii1II . name ) > AttributeError: 'tuple' object has no attribute 'name'
It was discussed if --help should redirect to man page, like git does. We'll skip that for now as so far our commands are simple enough that the --help output is sufficient. If complexity grows we can always change that later.
I've gone through the notes here and I believe we've done almost everything we've set out to do. Some last polish that remains: * We just have reference documentation (man pages). We should probably have some references to tlctl in the guide sections of the TAG as well. * "tlctl help" should redirect to --help to give familiarity with commands such as git or docker.
Everything should now be complete here.
Areas that need testing: • Options parser (William, Pierre) • man-pages (Pierre) • man • HTML TAG • PDF TAG • TAG chapter (Pierre) • Cohesion of the different subcommands • man pages • `--help' pages • exit codes • errors, warnings, and regular output • RPC (Linn) The name in parentheses denotes the primary persons involved in that area.
I've taken a look at "TAG chapter" and "Cohesion" in comment 117 using server build #2578: TAG chapter ═══════════ • Potentially clarify that the commands are to be run on the master machine. Or is that superfluous since most first-time users tend to use the same machine as master and agent? • A couple of grammatical errors Cohesion ════════ man pages ───────── Looks good and cohesive! `--help' pages ────────────── • All subcommand descriptions are in the form of actions (e.g. "List currently running sessions" or "Control a ThinLinc cluster") except for those in `tlctl session --help' and `tlctl load --help'. Note that changes here also need to be reflected in tlctl(8). • `tlctl load show --help': "info" → "information" for consistency exit codes ────────── • The `EX_*' exits in load.py are not consistent with how these are used in other places of tlctl nor how their usage is described in sysexits(3). errors, warnings, and output ──────────────────────────── • `print_error/warning' and are not used in `tlctl session terminate' and `tlctl load'. These instead roll their own prints without the `argv[0]' prefix. • `tlctl load list' prints a trailing newline while other commands do not.
I've also had a look at the RPC code in `__init__.py'. The code here is close to identical to that of tlwebadm except for the added authorization check and slight changes to exception handling which are not needed as we are not running as a daemon. The authorization check seemingly does its job and the rest of the code has been tested both as part of bug 425 and tlwebaccess. Good enough for me.
(In reply to William Sjöblom from comment #118) > TAG chapter > ═══════════ > > • Potentially clarify that the commands are to be run on the master > machine. Or is that superfluous since most first-time users tend to > use the same machine as master and agent? It's mentioned on the detailed page (the man page) for tlctl. From what I can find, we don't really mention this fact for other tools either, not even tlwebadm. So for now I'll punt the issue forward with everything else. > • A couple of grammatical errors > Fixed.
I have reviewed the fix mentioned in comment 125 and the CLI server configuration section now looks good. I have also corrected all the cohesion issues mentioned in comment 118 except for the `tlctl load' exit codes. These exit codes are a bigger design issue than just "the wrong exit code is being used" and will be handled under bug 7834 instead.
The questionable exit code mentioned in comment 118 has now been completely removed under bug 7834. So *all* complaints in that comment should now be fixed.
This looks good. I have verified the following using server build 2586: * Option parser: * Regular output is printed to stdout * --help is printed to stdout * Errors are printed to stderr * The "help" subcommand works well: * on the top level (tlctl help) * on the second level (tlctl session help) * it's not available, as intended, on the third level (tlctl load list help) * The "--help" flag works well: * on the top level (tlctl --help) * on the second level (tlctl session --help) * on the third level (tlctl load list --help) * Options are sorted in "--help" output * Subcommands are sorted in "--help" output * "--help" output looks good * "tlctl --version" works well (on all 3 levels) * Usage error appears in these scenarios: * no command ($ tlctl) * bad command ($ tlctl foo) * no subcommand ($ tlctl session) * bad subcommand ($ tlctl session foo) * nonexistent sub-subcommand ($ tlctl session list foo) * nonexistent option ($ tlctl session list --foo) * too many arguments ($ tlctl session list --agent foo bar) * Usage error is helpful * Arguments with spaces can be passed using quotation marks ($ tlctl session list --subcluster "Lab A") * Man pages: * "man tlctl" looks good and refers to "tlctl-session" and "tlctl-load" * "man tlctl-session" looks good * "man tlctl-load" looks good * Options are sorted in the man pages * Subcommand are sorted in the man pages * Long option or subcommand names makes the description appear on the next row * The content from the man pages is properly available in the TAG too