We can probably do a lot more when it comes to making sure a ThinLinc cluster is easy to administer. We need to sort out tangible steps and prioritise them so we have an actual plan for how to improve this. This bug will serve as an investigation bug to come up with such a plan.
The biggest disadvantage of the current design of a ThinLinc cluster is the configuration back-end. Configuration is made locally on each master and agent, and needs to be synced between the nodes in the cluster to propagate changes. There is a tool (tl-rsync-all) shipped with ThinLinc to ease the task of keeping configuration in sync, but it relies on the configuration key /vsmserver/terminalservers, which is only valid on the ThinLinc master server(s).

Syncing configuration files from one node to another is not seamless. Node-specific configuration is mixed with shared cluster configuration, which means we need to identify and separate which configuration keys are local to a node and which are global to the ThinLinc cluster. See bug #4952 for an example of a related issue.

ThinLinc Web Administration only works on the local configuration files of the node the service is running on, so the administrator needs to sync configuration changes over to the other nodes in the cluster. Configuring a ThinLinc cluster (> 1 agent) that uses a load balancer to distribute sessions over the agents implies that all agents are configured the same way and host the same profiles.

Summary of identified problems:
- The administrator needs to manually sync configuration in a ThinLinc cluster, with the problems identified above.
- One host is required to be the "main" repository for configuration of the ThinLinc cluster: the ThinLinc master.
- If using HA, configuration needs to be manually synced over to the fail-over master, since tl-rsync-all does not consider this setup.
- There are configuration keys that prevent synchronization of configuration to all nodes without manual intervention on each node. See bug #4952 as an example.
- tl-rsync-all requires that root is allowed to ssh into the nodes of the ThinLinc cluster.
- ThinLinc Web Administration is subject to the same restriction as the configuration files: it should only be used on the master, and syncing changes out to the cluster requires manual intervention, e.g. tl-rsync-all.
- There are also other configuration files that need to be synchronized, such as the TLDC-related parts, x[startup|logout].d and session[startup|reconnect].d, among others. We need to identify problems and restrictions in the environment when syncing these.
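To make the discussion concrete, here is a minimal Python sketch of what a push-style sync in the spirit of tl-rsync-all could look like. It is not the actual tool: the config directory path and the hard-coded agent list are assumptions for illustration (a real implementation would read the agent list from /vsmserver/terminalservers), and it inherits the same restriction that root must be able to ssh to every node.

#!/usr/bin/env python3
"""Minimal push-style config sync sketch (NOT tl-rsync-all itself).

Assumptions for illustration only:
  * the agent host list would normally come from /vsmserver/terminalservers
  * cluster configuration is assumed to live under /opt/thinlinc/etc/
  * root can ssh to every node (same restriction as noted above)
"""
import subprocess

CONF_DIR = "/opt/thinlinc/etc/"          # assumed config location
AGENTS = ["agent1.example.com",          # placeholder list; would normally be
          "agent2.example.com"]          # read from /vsmserver/terminalservers

def push_config(host):
    # rsync the shared configuration directory to one node; --delete keeps
    # the remote copy an exact mirror of the master's copy
    subprocess.run(
        ["rsync", "-a", "--delete", CONF_DIR, f"root@{host}:{CONF_DIR}"],
        check=True,
    )

if __name__ == "__main__":
    for agent in AGENTS:
        push_config(agent)

Note that this blindly mirrors everything, which is exactly why the separation between node-local and cluster-global configuration described above matters.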
A Citrix XenApp Farm stores its configuration in a centralized datastore [1] (an SQL server). It seems that all farm-related information is stored in this datastore, and it is a single point of failure. [1] http://support.citrix.com/proddocs/topic/xenapp65-planning/ps-planning-datastore-intro-v2.html
A good source of tools and their use for administrators of a XenApp farm. http://support.citrix.com/proddocs/topic/xenapp65-admin/ps-commands-wrapper-v2.html
(In reply to comment #3)
> A good source of tools and their use for administrators of a XenApp farm.
>
> http://support.citrix.com/proddocs/topic/xenapp65-admin/ps-commands-wrapper-v2.html

We have a bug related to an administration tool for ThinLinc: bug #3707.
I couldn't find information on how NoMachine stores its configuration, but it appears to be distributed (synced) to the other nodes in the cluster, as indicated by their Server Administrator's Guide - Advanced Features [1]. [1] https://www.nomachine.com/DT09K00058#13
(In reply to comment #5)
> I couldn't find information on how NoMachine stores its configuration, but it
> appears to be distributed (synced) to the other nodes in the cluster, as
> indicated by their Server Administrator's Guide - Advanced Features [1].
>
> [1] https://www.nomachine.com/DT09K00058#13

NoMachine uses configuration files just as we do, as indicated by https://www.nomachine.com/DT09K00059#2.1.
I looked into how people tend to sync the configuration of Apache web clusters. A lot of results point to homemade scripts using rsync; for enterprise use, however, configuration management tends to end up with Puppet or CFEngine. There were also approaches where a version control system (svn/git) was used to maintain the configuration files in a cluster. One of our customers actually uses git to version-control their configuration, and to deploy, a pull is performed on the nodes. There is a good source of information in the bootstrap paper [1] about configuration and why a "pull" methodology is a win over a "push". [1] http://www.infrastructures.org/papers/bootstrap/bootstrap.html
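As a counterpart to the push-style sketch above, here is a minimal pull-style sketch of the git approach described here. The repository URL and checkout path are placeholders, not existing ThinLinc tooling; the point is that every node fetches the same versioned configuration itself (e.g. from cron), which matches the "pull over push" argument in the bootstrap paper.

#!/usr/bin/env python3
"""Minimal pull-style sketch: each node fetches its own configuration.

Hypothetical names: CONFIG_REPO and CONF_DIR are placeholders, not part of
any ThinLinc tooling. Every node pulls from one versioned source (git)
instead of the master pushing to everyone.
"""
import subprocess
from pathlib import Path

CONFIG_REPO = "git@config.example.com:thinlinc-cluster-config.git"  # placeholder
CONF_DIR = Path("/opt/thinlinc/etc")                                # assumed location

def pull_config():
    if (CONF_DIR / ".git").exists():
        # already checked out: fast-forward to the latest committed config
        subprocess.run(["git", "-C", str(CONF_DIR), "pull", "--ff-only"], check=True)
    else:
        # first run on this node: clone the shared configuration repository
        subprocess.run(["git", "clone", CONFIG_REPO, str(CONF_DIR)], check=True)

if __name__ == "__main__":
    # would typically be run from cron or a systemd timer on every node
    pull_config()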
Consider storing configuration in a datastore instead of plain text config files...
- Live hooks for changes?
- What if the sessionstore and other persistent data / state lived in the same place?
- How would this conflict with the current "config file" approach?
- Restrictions / drawbacks compared to configuration files
- Central vs. distributed
- Would the benefits outweigh the complexity?
(In reply to comment #8)
> - Live hooks for changes?

Clarification: the master and agents listen for configuration changes and apply them live. Are there any benefits to this? How do we handle out-of-sync configuration, e.g. two out of three agents updated their configuration but one missed it for some reason? Should configuration be partitioned into live-updatable and non-live-updatable?
(In reply to comment #9)
> Clarification: the master and agents listen for configuration changes and
> apply them live. Are there any benefits to this? How do we handle out-of-sync
> configuration, e.g. two out of three agents updated their configuration but
> one missed it for some reason? Should configuration be partitioned into
> live-updatable and non-live-updatable?

This is in contrast to having configuration propagate through a push or pull triggered by a command.
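To illustrate what "applying configuration changes live" could mean in practice, here is a minimal sketch of a service that polls a config file and reloads it when it changes. The file path is an assumption for illustration, and a real implementation would also need to handle the out-of-sync case raised above, e.g. by carrying a version/serial number in the configuration itself.

#!/usr/bin/env python3
"""Minimal sketch of a service applying configuration changes live.

Illustrative only: polls the mtime of one config file and "reloads" it when
it changes, instead of waiting for a service restart.
"""
import os
import time

CONF_FILE = "/opt/thinlinc/etc/conf.d/profiles.hconf"  # assumed path

def load_config(path):
    # a real service would parse the hconf format here; the sketch just
    # reads the raw text to keep the example self-contained
    with open(path) as f:
        return f.read()

def watch_and_reload(path, interval=5):
    config = load_config(path)
    last_mtime = os.path.getmtime(path)
    while True:
        time.sleep(interval)
        mtime = os.path.getmtime(path)
        if mtime != last_mtime:
            # the file changed on disk: apply the new configuration without
            # restarting the service
            config = load_config(path)
            last_mtime = mtime
            print("configuration reloaded,", len(config), "characters")

if __name__ == "__main__":
    watch_and_reload(CONF_FILE)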
Could we use anything else as a back-end for configuration files, and what would the pros and cons be?

* git / svn
  Pros: Inherits the same pros as using normal configuration files. Configuration
        changes are versioned. git and svn are well known by administrators.
        Hooks could be used to react to changes.
  Cons: Central source for configuration, although it is not a single point of
        failure.

* Distributed datastore
  Pros: We could remove all "communication" between master and agents and use the
        datastore for pushing data around in the cluster. Services and admin tools
        would use the same standardized API no matter which data is needed from
        the cluster. Supports live hooks: a service can listen for data changes,
        think web administration and other viewers.
  Cons: Complexity. Binary format, which can however be solved with a load/dump
        into the current configuration file format for easy editing.

* Database
  Pros: Inherits the same pros as the distributed datastore above. Easy to manage
        by administrators since it is a well-known technique.
  Cons: A database server is required by ThinLinc. Single point of failure; this
        could be solved by the administrator but requires deeper knowledge of the
        database. We cannot provide a setup without a single point of failure
        out of the box.

* Configuration files
  Pros: Easy to handle, well known by any user.
  Cons: Not optimal in a cluster where syncing of configuration is needed; doing
        this correctly requires expertise from the administrator.
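Regarding the "binary format" con of the distributed datastore option, here is a small sketch of the load/dump idea: the datastore (faked with a plain dict here) is dumped to editable "key = value" text and loaded back. All names and keys are examples only, not a proposed format.

#!/usr/bin/env python3
"""Sketch of the load/dump idea for an opaque datastore back-end.

The datastore itself is faked with a dict; the point is only to show that a
binary/opaque backend can still be edited as familiar text by dumping and
loading it.
"""

def dump_config(store, path):
    # write the datastore contents as editable "key = value" lines,
    # one per configuration key, sorted for stable diffs
    with open(path, "w") as f:
        for key in sorted(store):
            f.write(f"{key} = {store[key]}\n")

def load_config(path):
    # read the edited text back into a dict that could then be written
    # into the real datastore
    store = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            store[key.strip()] = value.strip()
    return store

if __name__ == "__main__":
    # example keys only
    fake_store = {"/vsmserver/terminalservers": "agent1 agent2",
                  "/example/some_key": "some_value"}
    dump_config(fake_store, "/tmp/tl-config-dump.txt")
    print(load_config("/tmp/tl-config-dump.txt"))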
(In reply to comment #11)
> Could we use anything else as a back-end for configuration files, and what
> would the pros and cons be?
> [...]

Could a configuration management tool such as Puppet / CFEngine be used?
There is cluster configuration that goes live in production as soon as it is stored on disk, such as profiles.hconf and the TLDC parts, among others. Do we need to stage configuration in the cluster? E.g. no change goes live before the administrator performs an explicit operation to take it live. Considering that we currently have configuration that either goes live directly or at a service restart, we need to make all configuration behave the same way.
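One way staging could work, sketched under assumed paths that are not existing ThinLinc behaviour: edits land in a staging directory, and an explicit apply step copies them to a versioned directory and atomically flips a symlink that services read their configuration through.

#!/usr/bin/env python3
"""Sketch of staged configuration, under illustrative assumptions.

Changes are edited in a staging directory and only go live when the
administrator runs an explicit apply step, which atomically repoints a
symlink. The paths and the symlink layout are assumptions for this sketch.
"""
import os
import shutil
import time

STAGING_DIR = "/opt/thinlinc/etc-staging"   # where edits are prepared
LIVE_DIR = "/opt/thinlinc/etc-live"         # versioned live copies end up here
ACTIVE_LINK = "/opt/thinlinc/etc-active"    # services read config through this symlink

def apply_staged():
    """Copy the staged tree to a new versioned directory and flip the symlink."""
    version = time.strftime("%Y%m%d-%H%M%S")
    target = f"{LIVE_DIR}-{version}"
    shutil.copytree(STAGING_DIR, target)
    tmp_link = ACTIVE_LINK + ".new"
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)
    os.symlink(target, tmp_link)
    # rename() is atomic, so readers always see either the old or the new tree
    os.replace(tmp_link, ACTIVE_LINK)

if __name__ == "__main__":
    apply_staged()

A scheme like this would also make all configuration behave the same way: nothing goes live until the apply step, regardless of whether a service picks it up immediately or at restart.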
The services and configuration in a ThinLinc cluster are separated into two parts, master and agent, which implies that the servers in a ThinLinc cluster are not transparently interchangeable. An administrator can't treat each "node" in a ThinLinc cluster the same in regard to design, monitoring, configuration management, etc. This is somewhat unclear in the documentation and also an uncommon approach for a cluster. If we bundled the master and agent services into a single ThinLinc "node" and they shared the same datastores / configuration, the whole setup would be simplified, and administration of the cluster would generally become simpler since any node in the ThinLinc cluster could be considered a clone of the others.
How do we upgrade a ThinLinc cluster from version X to Y? Should we provide tools to simplify this task? Can we, or should we, do it at all?
Report finished for further investigation.