Bug 4916 - better/easier cluster administration
Summary: better/easier cluster administration
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: Other
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.2.0
Assignee: Henrik Andersson
URL:
Keywords: prosaic
Depends on:
Blocks: 5189
Reported: 2013-11-25 13:33 CET by Pierre Ossman
Modified: 2015-03-20 10:19 CET

See Also:
Acceptance Criteria:


Attachments

Description Pierre Ossman cendio 2013-11-25 13:33:23 CET
We can probably do a lot more when it comes to making sure a ThinLinc cluster is easy to administer. We need to sort out tangible steps and prioritise them so we have an actual plan for how to improve this.

This bug will function as an investigation bug to come up with a plan.
Comment 1 Henrik Andersson cendio 2014-02-24 11:47:02 CET
The biggest disadvantage of the current design of a ThinLinc
cluster is the configuration back-end. Configuration is stored
locally on each master and agent, so changes need to be synced
between the nodes in the cluster to propagate.

There is a tool (tl-rsync-all) shipped with ThinLinc to ease the
task of keeping configuration in sync. However, this tool relies
on the configuration key /vsmserver/terminalservers, which is
only valid on the ThinLinc master server(s).
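
For reference, a minimal sketch of what such a sync wrapper boils
down to, assuming a naive line-based parser for the hconf format,
root SSH access to each agent and a hardcoded path list (the real
tl-rsync-all is more involved):

import subprocess

HCONF = "/opt/thinlinc/etc/conf.d/vsmserver.hconf"
SYNC_PATHS = ["/opt/thinlinc/etc/"]

def terminalservers():
    # Extract the agent list from the /vsmserver/terminalservers key.
    with open(HCONF) as f:
        for line in f:
            line = line.strip()
            if line.startswith("terminalservers="):
                return line.split("=", 1)[1].split()
    return []

for host in terminalservers():
    for path in SYNC_PATHS:
        # -a preserves metadata, --delete mirrors removals on the agent
        subprocess.run(["rsync", "-a", "--delete",
                        path, "root@%s:%s" % (host, path)],
                       check=True)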

Syncing configuration files from one node to another is not
seamless. Node-specific configuration is mixed with shared
cluster configuration, which means we need to identify and
separate the configuration keys that are local to a node from
those that are global to the ThinLinc cluster. See bug #4952 for
an example of a related issue.
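
One conceivable separation, sketched below with hypothetical file
names, would be to keep node-local keys in dedicated files and
exclude those from the sync:

import subprocess

CONF_DIR = "/opt/thinlinc/etc/conf.d/"
# Hypothetical split: files assumed to hold node-local keys
# (e.g. a key such as agent_hostname) are excluded from the sync.
NODE_LOCAL = ["vsmagent.hconf"]

def sync_global(host):
    cmd = ["rsync", "-a", "--delete"]
    for name in NODE_LOCAL:
        cmd += ["--exclude", name]
    cmd += [CONF_DIR, "root@%s:%s" % (host, CONF_DIR)]
    subprocess.run(cmd, check=True)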

ThinLinc Web Administration only operates on the local
configuration files of the node the service is running on. This
means the administrator needs to sync configuration changes out
to the other nodes in the cluster.

Configuring a ThinLinc cluster (> 1 agent) which uses a load
balancer to distribute sessions over the agents implies that all
agents are configured the same way and host the same profiles.


Summary of identified problems:

- The administrator needs to manually sync configuration in a
  ThinLinc cluster, with all the problems identified above.

- One host is required to be the "main" repository for
  configuration of the ThinLinc cluster, the ThinLinc master.

- If using HA, configuration needs to be manually synced over to
  the fail-over master, since tl-rsync-all does not consider this
  setup.

- There are configuration keys that prevent synchronization of
  configuration to all nodes without manual intervention on each
  node. See bug #4952 as an example.

- tl-rsync-all requires that root is allowed to ssh into nodes in
  the ThinLinc cluster.

- ThinLinc Web Administration is subject to the same restriction
  as the configuration files. It should only be used on the
  master, and syncing changes out to the cluster requires manual
  intervention, e.g. tl-rsync-all.

- There are also other configuration files that need to be
  synchronized, such as TLDC-related parts, x[startup|logout].d
  and session[startup|reconnect].d, among others. We need to
  identify problems and restrictions on the environment when
  syncing these.
Comment 2 Henrik Andersson cendio 2014-02-24 13:22:14 CET
Citrix XenApp Farm stores its configuration in a centralized
datastore [1] (SQL server). It seems that all farm-related
information is stored in this datastore, and it is a single
point of failure.

[1] http://support.citrix.com/proddocs/topic/xenapp65-planning/ps-planning-datastore-intro-v2.html
Comment 3 Henrik Andersson cendio 2014-02-24 13:25:47 CET
A good source of tools and their use for administrators of a XenApp farm.

http://support.citrix.com/proddocs/topic/xenapp65-admin/ps-commands-wrapper-v2.html
Comment 4 Henrik Andersson cendio 2014-02-24 13:28:32 CET
(In reply to comment #3)
> A good source of tools and their use for administrators of a XenApp farm.
> 
> http://support.citrix.com/proddocs/topic/xenapp65-admin/ps-commands-wrapper-v2.html

We have a bug related to an administration tool for ThinLinc: bug #3707
Comment 5 Henrik Andersson cendio 2014-02-24 13:41:06 CET
I couldn't find information on how NoMachine stores its configuration, but it appears to be distributed (synced) to the other nodes in the cluster, as indicated by their Server Administrator's Guide - Advanced Features [1].

[1] https://www.nomachine.com/DT09K00058#13
Comment 6 Henrik Andersson cendio 2014-02-24 13:44:06 CET
(In reply to comment #5)
> I couldn't find information on how NoMachine stores its configuration, but it
> appears to be distributed (synced) to the other nodes in the cluster, as
> indicated by their Server Administrator's Guide - Advanced Features [1].
> 
> [1] https://www.nomachine.com/DT09K00058#13

NoMachine uses configuration files, as we do, as indicated by
https://www.nomachine.com/DT09K00059#2.1.
Comment 7 Henrik Andersson cendio 2014-02-24 14:33:15 CET
I looked into how people tend to sync configuration in Apache web clusters. Many results point to homemade scripts using rsync; for enterprise use, however, configuration tends to end up managed with Puppet or CFEngine.

There were also approaches where a version control system (svn/git) was used to maintain the configuration files in a cluster. One of our customers actually uses git to version-control their configuration; to deploy, a pull is performed on the nodes.

There is a good source of information in the bootstrap paper [1] about configuration and why a "pull" methodology is a win over "push".

[1] http://www.infrastructures.org/papers/bootstrap/bootstrap.html
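
For illustration, a minimal sketch of the pull model, assuming each node holds a git checkout of the shared configuration and runs something like this from cron (paths and service names are made up):

import subprocess

REPO_DIR = "/opt/thinlinc/etc"           # a git checkout on every node
SERVICES = ["vsmserver", "vsmagent"]     # restart whichever run locally

def pull():
    # Return True if the pull brought in new commits.
    before = subprocess.check_output(
        ["git", "-C", REPO_DIR, "rev-parse", "HEAD"])
    subprocess.run(["git", "-C", REPO_DIR, "pull", "--ff-only"],
                   check=True)
    after = subprocess.check_output(
        ["git", "-C", REPO_DIR, "rev-parse", "HEAD"])
    return before != after

if pull():
    for svc in SERVICES:
        # Ignore services that are not installed on this node.
        subprocess.run(["service", svc, "restart"])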
Comment 8 Henrik Andersson cendio 2014-02-25 07:54:18 CET
Consider storing configuration in a datastore instead of plain text config files...

- Live hooks for configuration changes?

- What if the sessionstore and other persistent data / state lived in the same place?

- How would this conflict with the current "config file" approach?

- Restrictions / drawbacks compared to configuration files?

- Central vs. distributed?

- Would the benefits win over the complexity?
Comment 9 Henrik Andersson cendio 2014-02-25 07:59:47 CET
(In reply to comment #8)
> - Live hooks for configuration changes?
> 

Clarification: the master and agents would listen for configuration changes and apply them live. Are there any benefits to this? How do we handle out-of-sync configuration, e.g. two of three agents updated their configuration but one missed it for some reason? Should configuration be partitioned into live-updatable and non-live-updatable keys?
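
For illustration, a rough sketch of a live-reload listener using plain mtime polling to stay dependency-free (an inotify-based watcher would react faster); the path and the reload hook are placeholders:

import os, time

WATCHED = "/opt/thinlinc/etc/conf.d/profiles.hconf"

def reload_config():
    # Placeholder: re-read the file and apply the new configuration.
    print("configuration changed, reloading")

last = os.stat(WATCHED).st_mtime
while True:
    time.sleep(5)                     # poll interval
    mtime = os.stat(WATCHED).st_mtime
    if mtime != last:
        last = mtime
        reload_config()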
Comment 10 Henrik Andersson cendio 2014-02-25 08:01:20 CET
(In reply to comment #9)
> (In reply to comment #8)
> > - Live hooks for configuration changes?
> > 
> 
> Clarification: the master and agents would listen for configuration changes
> and apply them live. Are there any benefits to this? How do we handle
> out-of-sync configuration, e.g. two of three agents updated their
> configuration but one missed it for some reason? Should configuration be
> partitioned into live-updatable and non-live-updatable keys?

This is in contrast to having configuration propagate via an explicit push or pull command.
Comment 11 Henrik Andersson cendio 2014-02-25 08:23:53 CET
Could we use anything else as a back-end for configuration files, and what would the pros and cons be?

* git / svn
 Pros: Inherits the same pros as normal configuration files.
       Configuration changes are versioned. git and svn are well
       known by administrators. Hooks could be used to react to
       changes.
 Cons: Central source for configuration, though it is not a
       single point of failure.

* distributed datastore
 Pros: We could remove all "communication" between master and
       agents and use the datastore for pushing data around the
       cluster. Services and admin tools use the same standardized
       API no matter which data is needed from the cluster.
       Supports live hooks: a service can listen for data changes,
       think web administration and other viewers.
 Cons: Complexity. Binary format, which can however be solved with
       a load/dump into the current configuration file format for
       easy editing (see the sketch after this list).

* database
 Pros: Inherits the same pros as the distributed datastore above.
       Easy for administrators to manage thanks to well-known
       technology.
 Cons: A database server is required by ThinLinc. Single point of
       failure; this could be solved by the administrator, but it
       needs deeper knowledge about the database. We can't provide
       a setup without a single point of failure out of the box.

* configuration files
 Pros: Easy to handle, well known by any user.
 Cons: Not optimal in a cluster where configuration syncing is
       needed; doing this correctly requires expertise from the
       administrator.
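
As a minimal sketch of the load/dump idea mentioned above, assuming a dbm key/value store and a flat "path=value" text format (a real implementation would use the actual hconf syntax):

import dbm

STORE = "/var/lib/thinlinc/config.db"   # hypothetical datastore path

def dump(path):
    # Export every key from the datastore as editable text.
    with dbm.open(STORE, "c") as db, open(path, "w") as f:
        for key in db.keys():
            f.write("%s=%s\n" % (key.decode(), db[key].decode()))

def load(path):
    # Import edited text back into the datastore.
    with dbm.open(STORE, "c") as db, open(path) as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition("=")
            db[key] = value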
Comment 12 Henrik Andersson cendio 2014-02-25 08:34:00 CET
(In reply to comment #11)
> Could we use anything else as a back-end for configuration files, and what
> would the pros and cons be?
> 
> * git / svn
>  Pros: Inherits the same pros as normal configuration files.
>        Configuration changes are versioned. git and svn are well
>        known by administrators. Hooks could be used to react to
>        changes.
>  Cons: Central source for configuration, though it is not a
>        single point of failure.
> 
> * distributed datastore
>  Pros: We could remove all "communication" between master and
>        agents and use the datastore for pushing data around the
>        cluster. Services and admin tools use the same standardized
>        API no matter which data is needed from the cluster.
>        Supports live hooks: a service can listen for data changes,
>        think web administration and other viewers.
>  Cons: Complexity. Binary format, which can however be solved with
>        a load/dump into the current configuration file format for
>        easy editing.
> 
> * database
>  Pros: Inherits the same pros as the distributed datastore above.
>        Easy for administrators to manage thanks to well-known
>        technology.
>  Cons: A database server is required by ThinLinc. Single point of
>        failure; this could be solved by the administrator, but it
>        needs deeper knowledge about the database. We can't provide
>        a setup without a single point of failure out of the box.
> 
> * configuration files
>  Pros: Easy to handle, well known by any user.
>  Cons: Not optimal in a cluster where configuration syncing is
>        needed; doing this correctly requires expertise from the
>        administrator.

Could a configuration management tool such as Puppet or CFEngine be used?
Comment 13 Henrik Andersson cendio 2014-02-27 08:01:05 CET
There is configuration that goes live in production as soon as it is stored on disk, such as profiles.hconf and TLDC among others. Do we need to stage configuration in the cluster, i.e. changes do not go live until the administrator performs an explicit operation to activate them? Considering that we have configuration that goes live either directly or at a service restart, we need to make all configuration behave the same way.
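
For illustration, a minimal sketch of staged activation with made-up paths: edits land in a staging directory and only go live when the administrator runs an explicit activate step that swaps a symlink atomically:

import os

STAGING  = "/opt/thinlinc/etc/staging"   # where edits are made
RELEASES = "/opt/thinlinc/etc/releases"  # snapshots of staged config
LIVE     = "/opt/thinlinc/etc/live"      # symlink the services read

def activate(version):
    # Snapshot the staging area and atomically point "live" at it.
    os.makedirs(RELEASES, exist_ok=True)
    target = os.path.join(RELEASES, version)
    os.rename(STAGING, target)        # publish the staged tree
    tmp = LIVE + ".new"
    os.symlink(target, tmp)
    os.rename(tmp, LIVE)              # atomic replace on POSIX
    os.makedirs(STAGING)              # fresh, empty staging area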
Comment 14 Henrik Andersson cendio 2014-02-27 08:29:56 CET
Services and configuration in a ThinLinc cluster are separated into two parts, master and agent, which means that the servers in a ThinLinc cluster are not transparently interchangeable. An administrator can't treat each "node" in a ThinLinc cluster as identical with regard to design, monitoring, configuration management, etc.

This is somewhat unclear in the documentation and also an uncommon approach for a cluster. If we bundled the master and agent services into a single ThinLinc "node" and they shared the same datastores / configuration, the whole setup would be simplified. Administration of the cluster would generally be simpler as well, since the administrator could treat any "node" in the ThinLinc cluster as a clone of the others.
Comment 15 Henrik Andersson cendio 2014-02-27 11:34:48 CET
How do we upgrade a ThinLinc cluster from version X to Y? Should we provide tools to simplify this task? Can we, or should we, do it at all?
Comment 17 Henrik Andersson cendio 2014-03-11 08:01:31 CET
Report finished for further investigation.
