Bug 3707 - Command line administration tools corresponding to tlwebadm interface
Summary: Command line administration tools corresponding to tlwebadm interface
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: Other (show other bugs)
Version: 3.1.2
Hardware: PC Unknown
: P2 Normal
Target Milestone: 4.15.0
Assignee: William Sjöblom
URL:
Keywords: relnotes, samuel_tester
Depends on: 425 5268 7833 7834
Blocks:
  Show dependency treegraph
 
Reported: 2011-01-10 10:16 CET by Henrik Andersson
Modified: 2022-04-21 16:59 CEST (History)
3 users (show)

See Also:
Acceptance Criteria:


Attachments

Description Henrik Andersson cendio 2011-01-10 10:16:47 CET
A commandline tool which can query information from ThinLinc 
environment using the xml-rpc interface that tladm webmin module 
uses.

Providing functionality such as querying information like 
all sessions, session by specific server, server information
and more.
Also actions should be handled such as terminating a session,
sending notify message to specific user or server.

Examples of usage and output (no action argument = query):

# List sessions on specific server
tl-adm sessions @myserver

SessId  User                 State      Connected
--------------------------------------------------------------
#1       hean01@myserver    Active     2011-01-06 08:32:22
#2       astrand@myserver   Inactive   2011-01-03 08:02:43
#3       astrand@myserver   Active     2010-12-20 16:20:10

# Terminate session 2 on servername
tl-adm session #2@myserver terminate

# Show user session information can show more entries
# if user have more than one session on server
tl-adm session hean01@myserver

# Notify all users on specific server
tl-adm notify @myserver "The system is restarted in 10 min, please save your work and log out"

And here's follows a examples of script usage
maintainence.sh:

#!/bin/sh
# take the current server offline, no new connections
tl-adm server @$1 disable
# notify current sessions online
tl-adm notify @$1 "The system is going down for maintainance in 10 minutes, please save your work, log out and log again"
# wait 10 minutes
sleep 10m
# Notify of shutdown
tl-adm notify @$1 "Shutting down system, your session will be logged out" 
# Terminates all active sessions
tl-adm server @$1 offline
Comment 2 Peter Åstrand cendio 2013-03-13 08:40:20 CET
Some interim commands:

http://lists.cendio.se/pipermail/thinlinc-technical/2013-January/000230.html
Comment 3 Pierre Ossman cendio 2015-03-18 14:34:52 CET
A first prototype of such a tool is available on bug 5268. It probably needs some cleanup to promote code reuse, but otherwise it's rather close to the target. It however uses the local Unix socket to talk to vsmserver, which means it can only be used on the same machine as the master.

(tlwebadm has the same limitation right now though)
Comment 6 Henrik Andersson cendio 2018-10-05 13:25:15 CEST
> A first prototype of such a tool is available on bug 5268. It probably needs
> some cleanup to promote code reuse, but otherwise it's rather close to the
> target. It however uses the local Unix socket to talk to vsmserver, which means
> it can only be used on the same machine as the master.
> 
> (tlwebadm has the same limitation right now though)
>

Currently there are a rudiment remote api over XML-RPC designed and used for IPC between master and agent. The ACL is also rudiment and builds upon lowport and list of known addresses.

However the current implemented ACL puts a restrain to tlwebadm in such way that it for example can't do authorized calls remotely to master and agent. This is the reason to why tlwebadm can't administrate the full cluster and only works locally for master and agent server.

Doors open up for cluster administration, if we add another authorization mechanism that is not "bound to process running as root on server were rpc service is running".
Comment 20 William Sjöblom cendio 2022-02-01 14:13:04 CET
After our start up meeting we've come up with a non-exhaustive list of
things that need to be done for this bug:

• Investigate and write user stories
• Compile acceptance criteria from the user stories
• Naming
  • Find a terminology for the command line interface that matches the
    rest of the product.
  • Come up with a name for the tool. Preferably by looking at
    conventions used by similar tools.
    • In the case we settle with something along the lines of `tlctl',
      do we use a hyphen or not? We have historically been inconsistent
      regarding this.
• Generating man pages from ReStructured Text such that the
  documentation for the tool can be used in both the TAG and man pages.
• Setting up a modular base architecture for the tool. During the
  initial meeting we discussed placing this tool in a subdirectory of
  tlmisc/.
• Descide how we should handle future bugs related to this tool? Do we
  add a new bugzilla component or place bugs related to this tool under
  an already existing component?


When we are at a stage where we can start working on the tool it self,
these are some of the things that have to be done:
• Implementing parsing of command line input.
  • Investigate suitable command line parsing libraries available in
    python.
• Implementing communication with the master and agent.
  • Possibly finding a neat and typable unique session identifier to
    ease command line interaction. This can possibly be somewhat tricky
    since we want the identifier to be very much unique to a session.
• Implementation of pretty printing and formatting.


These are all relatively parallelizable.
Comment 21 William Sjöblom cendio 2022-02-01 14:14:34 CET
We also found this document about writing sane command line tools which might be of use https://clig.dev/
Comment 22 William Sjöblom cendio 2022-02-01 14:25:00 CET
As an initial effort, we plan on giving the tool session management
capabilities such as:
• Listing sessions
• Terminating sessions
• Abandoning sessions


Since we have future plans on making this tool relatively expansive,
with multiple different subcommands, a possible idea is to give it
another subcommand apart from that for session management. This would
give the benefit of forcing us to make the architecture modular while
giving the users a hint that the tool can be used for things other than
session management. A possible candidate for this would be a subcommand
for showing load information. The load information is read only and
likely easy to implement.
Comment 23 William Sjöblom cendio 2022-02-01 14:59:14 CET
One possible way to gather good user stories solidly grounded in reality would be to make a post on the ThinLinc community discussing the matter.
Comment 24 Linn cendio 2022-02-02 14:11:40 CET
We did some brainstorming and came up with the following user stories.

Terminate session
═════════════════
Someone calls to support and has issues with their session.
Then I want to be able to terminate that session, so that I won't have to guide an inexperienced user to do the same operation through the client. 

Without the tool, you have to terminate the session through TLWebadm.


Find running session
════════════════════
Someone calls to support and asks for a session that they could reach at time X, but can't anymore.
Then I want to be able to check if that session is still running, so that I know where to concentrate the troubleshooting effort.

Without the tool, you have to check the session through TLWebadm.


Killing zombie sessions
═══════════════════════
Some sessions on some agents have been abandoned - they aren't shown in the session database, and the user can't reach them.
Then I want to find all the abandoned sessions, so that I can kill them.

Without the tool, you have to go to each agent and list all running sessions, find which of those are not listed in TLWebadm, and kill those. 


Drain agent
═══════════
Some agents are to be decommissioned, and we want them empty before that. 
Then I want to set an agent in "drainage" mode, such that the agent is up but doesn't allow any new sessions, so that eventually becomes free of sessions such that it can be upgraded or decommissioned.

Without the tool, you have to follow the steps below, which are non-trivial:
https://www.cendio.com/resources/docs/tag-devel/html/config_cluster.html#stopping-new-session-creation-on-select-agents-in-a-cluster


Verify empty agent
══════════════════
We are about to decommission an agent.
then I want to check that it doesn't have any running sessions, so that the users are unaffected.

Without the tool, you either have to go through the sessions in TLWebadm, or check that no sessions are running on each agent machine.

Performance troubleshooting
══════════════════════════
Someone reported performance issues somewhere in the cluster.
Then I want to identify which agent(s) are affected and which session on that agent is responsible, so that I can identify and act on problematic sessions. 

Without the tool, you have to check each individual agent for resource hogs.


Abandoning sessions
═══════════════════
An agent exploded and is lost. 
Then I want to be able to abandon all sessions running on that specific agent,
so that the user is not prompted to manually abandon it the next time they connect.

Without the tool, you have to go through the list of sessions in TLWebadm, and abandon each affected session.


Verify upgrade
══════════════
Some agents have been upgraded.
I want to list the agents and their corresponding versions, so I can verify that the correct ones were upgraded.

Without the tool, you have to go to each agent machine and check /opt/thinlinc/etc/thinlinc-release.
Comment 25 William Sjöblom cendio 2022-02-02 14:39:51 CET
On top of the user stories in comment 24, there are also likely some stories related to obtaining statistics from a list of sessions. We've gotten customer requests in the past about this as it can help in determining a suitable number of licenses and in tuning parameters things like session lifetime limiting.
Comment 26 Linn cendio 2022-02-03 16:19:41 CET
For the name of the tool, the current suggestion is tlctl, I've looked into our naming standard and the standard for other commands.

Most of our script names use hyphens. This includes both scripts that can be used by an admin, like tl-config and tl-notify, and scripts only for internal use. On the other hand, 'ctl' is usually added as a suffix to the script name without any hyphen, e.g. journalctl, systemctl and hostnamectl.

We also discussed some other options for the name instead of tlctl, but didn't come up with any that seemed better:

 * tl-cmd
   - Issue: Kinda sounds like something will be executed
 * tl-adm
   - Issue: May cause confusion since tlwebadm sometimes is just called tladm
   
For now, tlctl is probably the best option. It doesn't follow the standard of our scripts, but sysadmins are probably more familiar with systemctl etc. than our scripts, so it should still feel like a quite evident name.
Comment 27 Linn cendio 2022-02-03 16:23:53 CET
There are a few common approaches to the syntax of subcommands.

One popular approach is format <command noun verb> or <command verb noun>, used by e.g. docker and IBM Cloud Kubernetes Service [1]. Both are quite straightforward, but it seems like the <noun verb> format is more common. One thing to consider though is if the noun should be in plural or singular, but singular seems to be the preferred one.

> docker container create
> ibmcloud ks cluster ls

[1]: https://uxdesign.cc/user-experience-clis-and-breaking-the-world-baed8709244f


Command 'loginctl' uses another strategy to get around the plural or singular issue, though it doesn't really feel as neat as the <noun verb> approach.

> loginctl list-sessions
> loginsctl session-status

We decided to use the <noun verb> approach. In this first version, we will use "long names" (e.g. list instead of ls), and later shorten command names if needed.
Comment 28 Linn cendio 2022-02-03 16:58:46 CET
We should also be sure that tlctl uses the same terminology as in WebAdmin. 

We have used 'terminate' when killing a session, and when listing the sessions we should keep the column headers the same as for the sessions under Status -> Sessions.

Otherwise, I don't see anything that seems relevant at this stage, but when adding future functionality we should revisit the corresponding parts in WebAdmin.
Comment 29 Pierre Ossman cendio 2022-02-04 13:01:56 CET
(In reply to Linn from comment #26)
> Most of our script names use hyphens. This includes both scripts that can be
> used by an admin, like tl-config and tl-notify, and scripts only for
> internal use. On the other hand, 'ctl' is usually added as a suffix to the
> script name without any hyphen, e.g. journalctl, systemctl and hostnamectl.
> 

This has become very popular recently with systemd as it includes no less than 16 commands ending in "ctl".

However it is not the only one, as my system has 40 such commands, meaning there are 24 commands that are not systemd related. Extending this I can search in dnf¹ and find 144 packages with tools ending in "ctl". So I'd say it's a fairly common naming these days.

¹ dnf repoquery --file '/usr/*bin/*[a-z]ctl' | sed 's@\(.*\)-[0-9]:.*@\1@' | sort | uniq | wc -l

Hyphens are fairly common as well, and I can see them being used to group commands. E.g. the commands to configure IPMI and PPPoE all use a common prefix with a hyphen to group their commands. One could probably argue that this is the "old" way of doing things, before commands started having the verb and noun approach.

I can't find any obvious alternatives to the "ctl" suffix that is commonly used. "adm" has a few entries, mostly for tools dealing with hardware. In dnf I can only find 19 packages in total that have tools ending in "adm".

So the biggest problem right now is probably that we will be inconsistent with other ThinLinc commands. Some should probably be hidden in libexec (bug 3070), some might be moved to this new command (e.g. "tl-config" might become "tlctl config"), but there are some that are unclear what to do with.

Looking at our competitors both Citrix and NoMachine have multiple commands, using a "ctx" and "nx" prefix respectively. No hyphen though. So one long term plan could be to migrate commands from "tl-" to just "tl".
Comment 30 Pierre Ossman cendio 2022-02-04 14:06:30 CET
argparse seems to be the current argument parser module in Python's standard library, with getopt and optparse being softly deprecated. So that should probably be what we look at first.

It has explicit support for the kind of sub-module thinking we want. So we just to check how it fits with how we want to do the module architecture:

https://docs.python.org/3/library/argparse.html#sub-commands

We should also check that it parses arguments the way we want, i.e. the way most other commands do.
Comment 31 Pierre Ossman cendio 2022-02-04 14:08:00 CET
If we want to have a dynamic loading of our submodules, this is apparently possible now by using pkgutil:

https://docs.python.org/3/library/pkgutil.html#pkgutil.iter_modules

An example here:

https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/#using-namespace-packages
Comment 32 Pierre Ossman cendio 2022-02-04 14:50:14 CET
(In reply to Pierre Ossman from comment #30)
> 
> We should also check that it parses arguments the way we want, i.e. the way
> most other commands do.

Urgh. argparse has a bunch of non-standard behaviour (e.g. allow_abbrev). It seems we have to review this extremely carefully and check that we can disable all the odd stuff. :/

There is this issue that doesn't look like it will ever be fixed:

https://bugs.python.org/issue9334

So it does look like argparse might be out. :/
Comment 33 William Sjöblom cendio 2022-02-07 08:37:49 CET
If we decide to go down the route of using argparse for argument parsing there is also argcomplete (https://kislyuk.github.io/argcomplete/) for automagical shell completion. I have no experience of using this tool myself, but having a single point of truth for grammar-related things is generally very much worthwhile as duplicate grammar in different forms tends to be a source of bugs.
Comment 34 Pierre Ossman cendio 2022-02-07 09:26:05 CET
(In reply to Pierre Ossman from comment #32)
> 
> Urgh. argparse has a bunch of non-standard behaviour (e.g. allow_abbrev). It
> seems we have to review this extremely carefully and check that we can
> disable all the odd stuff. :/

Apparently that isn't as non-standard as I thought. glibc's getopt_long() has this behaviour as well, and it is not optional!

Still, it is a very broken behaviour as it prevents adding more options in the future. So we really want to avoid it.
Comment 44 Pierre Ossman cendio 2022-02-10 14:00:52 CET
We're using this new command as an experiment for bug 7049 (provide man pages for ThinLinc commands). It seems possible to use reStructuredText and Sphinx to provide documentation to both the TAG and man pages at the same time. The tricky part seems to be how to install things. See bug 7049 for more.
Comment 56 William Sjöblom cendio 2022-02-14 12:45:38 CET
We now have an initial command-line argument parser in place.

One of the things it not yet properly handled is the `--' argument. This argument is used to disable option parsing for the rest of the line (e.g. for providing positional arguments starting with a hyphen).

The primary decision point here is the behavior if the `--' argument is given before a subcommand:
> tlctl -- session list

I have taken a look at a couple of different command-line tools to see how they handle it. First out is podman which is very much broken in this regard:
> $ podman -- top
> Error: unrecognized command `podman top`
> Did you mean this?
>	top
Then there are git and kubectl which both refuse to parse past the `--':
> $ git -- status
> unknown option: --
> $ kubectl -- top
> <`kubectl --help' output>
And last there are docker and systemctl which both gladly accept the `--' before the subcommand(s):
> $ docker -- ps
> <`docker ps' output>
> $ systemctl -- status vsmserver
> <`systemctl status vsmserver' output>
Comment 60 William Sjöblom cendio 2022-02-14 14:50:43 CET
(In reply to William Sjöblom from comment #56)

After some back and forth, we've decided to go with the approach where `--' is only allowed after the last subcommand with the reasoning that the change to a more permissive approach further on is easier than the other way around.
Comment 61 William Sjöblom cendio 2022-02-15 09:30:38 CET
The command line parser is now in a state that can be considered done,
at least for now. The primary goal of the design was to provide a
minimal command-line parsing module that adheres to the following
principles:

• A tree-like structure of nested parsers for providing subcommands

  This allows us to build a parser for each subcommand in their
  respective module, which can then be loaded dynamically.

• Parsing error behavior should mimic already existing commands.

  Examples of command-line tools with similar goals include but are not
  limited to: git, systemctl (and similar commands part of systemd),
  podman, docker, and kubectl.

• Options should be able to flow left-to-right on the command line, but
  not the other way around.

  Given a global option `--verbose' (thus belonging to the root
  parser), one should be able to write either `tlctl --verbose session
  list' or `tlctl session list --verbose'. This makes it simple adding
  common global flags to an already entered command by pressing `\uarr
  <space> --verbose' in the terminal.

• Non-allowing towards abbreviated long options.

  This is to help not breaking shell scripts when adding more options 
  in the future.

• Disabling parsing of options by using `--' should only be allowed
  after the last subcommand.

  See comment 56 and comment 60.
Comment 66 Linn cendio 2022-03-01 12:22:09 CET
Regarding error handling when doing the calls to VSMServer, we have identified a number of different errors:

Cannot connect to VSMServer
---------------------------
Happens both when the vsmserver service is not running and when the command is run on an agent machine.

Permission denied
-----------------
Happens when tlctl is not run with sudo. In webadm, all sessions calls are run as root. For tlctl, we should consider if all vsm calls have to be run with sudo, or if some calls should be allowed without it.

Multiple vsm calls in a row
---------------------------
Some functionality requires multiple vsm calls, e.g. when terminating a session we first want to get the session info and then terminate it. Here we need to consider what should happen if the info we rely on is updated in between the two calls.

Bad input data
--------------
We shouldn't cause any errors for non-existing input data.
Comment 94 William Sjöblom cendio 2022-03-23 11:10:03 CET
I did some smoke testing on SLES12 and ran into the following stack trace:
> Traceback (most recent call last):
>   File "/opt/thinlinc/sbin/tlctl", line 52, in <module>
>     i1iiiiIIIiIi ( )
>   File "/opt/thinlinc/sbin/tlctl", line 40, in i1iiiiIIIiIi
>     ii1iIi1i11i = iI111iiIi11i ( )
>   File "/opt/thinlinc/sbin/tlctl", line 23, in iI111iiIi11i
>     O0o000ooO = importlib . import_module ( 'thinlinc.tlctl.' + i11IiIIiii1II . name )
>  AttributeError: 'tuple' object has no attribute 'name'
Comment 103 Pierre Ossman cendio 2022-04-08 14:00:30 CEST
It was discussed if --help should redirect to man page, like git does. We'll skip that for now as so far our commands are simple enough that the --help output is sufficient. If complexity grows we can always change that later.
Comment 104 Pierre Ossman cendio 2022-04-11 09:42:02 CEST
I've gone through the notes here and I believe we've done almost everything we've set out to do. Some last polish that remains:

 * We just have reference documentation (man pages). We should probably have some
   references to tlctl in the guide sections of the TAG as well.

 * "tlctl help" should redirect to --help to give familiarity with commands such
   as git or docker.
Comment 116 Pierre Ossman cendio 2022-04-12 12:28:17 CEST
Everything should now be complete here.
Comment 117 William Sjöblom cendio 2022-04-12 15:32:00 CEST
Areas that need testing:
• Options parser (William, Pierre)
• man-pages (Pierre)
  • man
  • HTML TAG
  • PDF TAG
• TAG chapter (Pierre)
• Cohesion of the different subcommands
  • man pages
  • `--help' pages
  • exit codes
  • errors, warnings, and regular output
• RPC (Linn)

The name in parentheses denotes the primary persons involved in that 
area.
Comment 118 William Sjöblom cendio 2022-04-13 13:37:41 CEST
I've taken a look at "TAG chapter" and "Cohesion" in comment 117 using
server build #2578:

TAG chapter
═══════════

  • Potentially clarify that the commands are to be run on the master
    machine. Or is that superfluous since most first-time users tend to
    use the same machine as master and agent?
  • A couple of grammatical errors

Cohesion
════════

man pages
─────────

  Looks good and cohesive!

`--help' pages
──────────────

  • All subcommand descriptions are in the form of actions (e.g. "List
    currently running sessions" or "Control a ThinLinc cluster") except
    for those in `tlctl session --help' and `tlctl load --help'. Note
    that changes here also need to be reflected in tlctl(8).
  • `tlctl load show --help': "info" → "information" for consistency

exit codes
──────────

  • The `EX_*' exits in load.py are not consistent with how these are
    used in other places of tlctl nor how their usage is described in
    sysexits(3).

errors, warnings, and output
────────────────────────────

  • `print_error/warning' and are not used in `tlctl session terminate'
     and `tlctl load'. These instead roll their own prints without the
     `argv[0]' prefix.
  • `tlctl load list' prints a trailing newline while other commands do
    not.
Comment 119 William Sjöblom cendio 2022-04-13 14:34:17 CEST
I've also had a look at the RPC code in `__init__.py'. The code here is close to identical to that of tlwebadm except for the added authorization check and slight changes to exception handling which are not needed as we are not running as a daemon. The authorization check seemingly does its job and the rest of the code has been tested both as part of bug 425 and tlwebaccess. Good enough for me.
Comment 125 Pierre Ossman cendio 2022-04-14 08:29:40 CEST
(In reply to William Sjöblom from comment #118)
> TAG chapter
> ═══════════
> 
>   • Potentially clarify that the commands are to be run on the master
>     machine. Or is that superfluous since most first-time users tend to
>     use the same machine as master and agent?

It's mentioned on the detailed page (the man page) for tlctl. From what I can find, we don't really mention this fact for other tools either, not even tlwebadm. So for now I'll punt the issue forward with everything else.

>   • A couple of grammatical errors
> 

Fixed.
Comment 127 William Sjöblom cendio 2022-04-14 09:58:44 CEST
I have reviewed the fix mentioned in comment 125 and the CLI server configuration section now looks good.

I have also corrected all the cohesion issues mentioned in comment 118 except for the `tlctl load' exit codes. These exit codes are a bigger design issue than just "the wrong exit code is being used" and will be handled under bug 7834 instead.
Comment 128 William Sjöblom cendio 2022-04-14 12:20:45 CEST
The questionable exit code mentioned in comment 118 has now been completely removed under bug 7834. So *all* complaints in that comment should now be fixed.
Comment 129 Samuel Mannehed cendio 2022-04-21 16:59:24 CEST
This looks good. I have verified the following using server build 2586:

* Option parser:
  * Regular output is printed to stdout
  * --help is printed to stdout
  * Errors are printed to stderr
  * The "help" subcommand works well:
    * on the top level (tlctl help)
    * on the second level (tlctl session help)
    * it's not available, as intended, on the third level (tlctl load list help)
  * The "--help" flag works well:
    * on the top level (tlctl --help)
    * on the second level (tlctl session --help)
    * on the third level (tlctl load list --help)
  * Options are sorted in "--help" output
  * Subcommands are sorted in "--help" output
  * "--help" output looks good
  * "tlctl --version" works well (on all 3 levels)
  * Usage error appears in these scenarios:
    * no command ($ tlctl)
    * bad command ($ tlctl foo)
    * no subcommand ($ tlctl session)
    * bad subcommand ($ tlctl session foo)
    * nonexistent sub-subcommand ($ tlctl session list foo)
    * nonexistent option ($ tlctl session list --foo)
    * too many arguments ($ tlctl session list --agent foo bar)
  * Usage error is helpful
  * Arguments with spaces can be passed using quotation marks ($ tlctl session list --subcluster "Lab A")

* Man pages:
  * "man tlctl" looks good and refers to "tlctl-session" and "tlctl-load"
  * "man tlctl-session" looks good
  * "man tlctl-load" looks good
  * Options are sorted in the man pages
  * Subcommand are sorted in the man pages
  * Long option or subcommand names makes the description appear on the next row
  * The content from the man pages is properly available in the TAG too

Note You need to log in before you can comment on or make changes to this bug.