We can probably do a lot more when it comes to making sure a ThinLinc cluster is easy to administer. We need to sort out tangible steps and prioritise them so we have an actual plan for how to improve this. This bug will serve as an investigation bug to come up with such a plan.
The biggest disadvantage of the current design of a ThinLinc cluster is the configuration back-end. Configuration is made locally on each master and agent, and needs to be synced between the nodes in the cluster to propagate changes. There is a tool (tl-rsync-all) shipped with ThinLinc to ease the task of keeping configuration in sync, but it relies on the configuration key /vsmserver/terminalservers, which is only valid on the ThinLinc master server(s).

Syncing configuration files from one node to another is not seamless. Node-specific configuration is mixed with shared cluster configuration, which means we need to identify and separate which configuration keys are local to a node and which are global to the ThinLinc cluster. See bug #4952 for an example of a related issue.

ThinLinc Web Administration only works on the local configuration files of the node the service is running on, so the administrator needs to sync configuration changes over to the other nodes in the cluster. Configuring a ThinLinc cluster (> 1 agent) that uses a load balancer to distribute sessions over the agents implies that all agents are configured the same way and host the same profiles.

Summary of identified problems:
- The administrator needs to manually sync configuration in a ThinLinc cluster, with the problems identified above.
- One host is required to be the "main" repository for configuration of the ThinLinc cluster: the ThinLinc master.
- If using HA, configuration needs to be manually synced over to the fail-over master, since tl-rsync-all does not consider this setup.
- There are configuration keys that prevent synchronization of configuration to all nodes without manual intervention on each node. See bug #4952 as an example.
- tl-rsync-all requires that root is allowed to ssh into the nodes of the ThinLinc cluster.
- ThinLinc Web Administration is subject to the same restriction as the configuration files: it should only be used on the master, and syncing changes out to the cluster requires manual intervention, e.g. tl-rsync-all.
- There are also other configuration files that need to be synchronized, such as the TLDC-related parts, x[startup|logout].d and session[startup|reconnect].d, among others. We need to identify problems and restrictions in the environment when syncing these.
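To make the discussion concrete, here is a minimal Python sketch of what a push-style sync in the spirit of tl-rsync-all could look like. It is not the actual tool: the config directory path and the hard-coded agent list are assumptions for illustration (a real implementation would read the agent list from /vsmserver/terminalservers), and it inherits the same restriction that root must be able to ssh to every node.

#!/usr/bin/env python3
"""Minimal push-style config sync sketch (NOT tl-rsync-all itself).

Assumptions for illustration only:
  * the agent host list would normally come from /vsmserver/terminalservers
  * cluster configuration is assumed to live under /opt/thinlinc/etc/
  * root can ssh to every node (same restriction as noted above)
"""
import subprocess

CONF_DIR = "/opt/thinlinc/etc/"          # assumed config location
AGENTS = ["agent1.example.com",          # placeholder list; would normally be
          "agent2.example.com"]          # read from /vsmserver/terminalservers

def push_config(host):
    # rsync the shared configuration directory to one node; --delete keeps
    # the remote copy an exact mirror of the master's copy
    subprocess.run(
        ["rsync", "-a", "--delete", CONF_DIR, f"root@{host}:{CONF_DIR}"],
        check=True,
    )

if __name__ == "__main__":
    for agent in AGENTS:
        push_config(agent)

Note that this blindly mirrors everything, which is exactly why the separation between node-local and cluster-global configuration described above matters.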
A Citrix XenApp Farm stores its configuration in a centralized datastore [1] (an SQL server). It seems that all farm-related information is stored in this datastore, and it is a single point of failure. [1] http://support.citrix.com/proddocs/topic/xenapp65-planning/ps-planning-datastore-intro-v2.html
A good source of tools and their use for administrators of a XenApp farm. http://support.citrix.com/proddocs/topic/xenapp65-admin/ps-commands-wrapper-v2.html
(In reply to comment #3)
> A good source of tools and their use for administrators of a XenApp farm.
>
> http://support.citrix.com/proddocs/topic/xenapp65-admin/ps-commands-wrapper-v2.html

We have a bug related to an administration tool for ThinLinc: bug #3707.
I couldn't find information on how NoMachine stores its configuration, but it appears to be distributed (synced) to the other nodes in the cluster, as indicated by their Server Administrator's Guide - Advanced Features [1]. [1] https://www.nomachine.com/DT09K00058#13
(In reply to comment #5)
> I couldn't find information on how NoMachine stores its configuration, but it
> appears to be distributed (synced) to the other nodes in the cluster, as
> indicated by their Server Administrator's Guide - Advanced Features [1].
>
> [1] https://www.nomachine.com/DT09K00058#13

NoMachine uses configuration files just as we do, as indicated by https://www.nomachine.com/DT09K00059#2.1.
I looked into how people tend to sync the configuration of Apache web clusters. A lot of results point to homemade scripts using rsync; for enterprise use, however, configuration management tends to end up with Puppet or CFEngine. There were also approaches where a version control system (svn/git) was used to maintain the configuration files in a cluster. One of our customers actually uses git to version-control their configuration, and to deploy, a pull is performed on the nodes. There is a good source of information in the bootstrap paper [1] about configuration and why a "pull" methodology is a win over a "push". [1] http://www.infrastructures.org/papers/bootstrap/bootstrap.html
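As a counterpart to the push-style sketch above, here is a minimal pull-style sketch of the git approach described here. The repository URL and checkout path are placeholders, not existing ThinLinc tooling; the point is that every node fetches the same versioned configuration itself (e.g. from cron), which matches the "pull over push" argument in the bootstrap paper.

#!/usr/bin/env python3
"""Minimal pull-style sketch: each node fetches its own configuration.

Hypothetical names: CONFIG_REPO and CONF_DIR are placeholders, not part of
any ThinLinc tooling. Every node pulls from one versioned source (git)
instead of the master pushing to everyone.
"""
import subprocess
from pathlib import Path

CONFIG_REPO = "git@config.example.com:thinlinc-cluster-config.git"  # placeholder
CONF_DIR = Path("/opt/thinlinc/etc")                                # assumed location

def pull_config():
    if (CONF_DIR / ".git").exists():
        # already checked out: fast-forward to the latest committed config
        subprocess.run(["git", "-C", str(CONF_DIR), "pull", "--ff-only"], check=True)
    else:
        # first run on this node: clone the shared configuration repository
        subprocess.run(["git", "clone", CONFIG_REPO, str(CONF_DIR)], check=True)

if __name__ == "__main__":
    # would typically be run from cron or a systemd timer on every node
    pull_config()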
Consider storing configuration in a datastore instead of plain text config files...
- Live hooks for changes?
- What if the sessionstore and other persistent data / state lived in the same place?
- How would this conflict with the current "config file" approach?
- Restrictions / drawbacks compared to configuration files
- Central vs. distributed
- Would the benefits outweigh the complexity?
(In reply to comment #8)
> - Live hooks for changes?

Clarification: the master and agents listen for configuration changes and apply them live. Are there any benefits to this? How do we handle out-of-sync configuration, e.g. two out of three agents updated their configuration but one missed it for some reason? Should configuration be partitioned into live-updatable and non-live-updatable?
(In reply to comment #9)
> Clarification: the master and agents listen for configuration changes and
> apply them live. Are there any benefits to this? How do we handle out-of-sync
> configuration, e.g. two out of three agents updated their configuration but
> one missed it for some reason? Should configuration be partitioned into
> live-updatable and non-live-updatable?

This is in contrast to having configuration propagate through a push or pull triggered by a command.
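To illustrate what "applying configuration changes live" could mean in practice, here is a minimal sketch of a service that polls a config file and reloads it when it changes. The file path is an assumption for illustration, and a real implementation would also need to handle the out-of-sync case raised above, e.g. by carrying a version/serial number in the configuration itself.

#!/usr/bin/env python3
"""Minimal sketch of a service applying configuration changes live.

Illustrative only: polls the mtime of one config file and "reloads" it when
it changes, instead of waiting for a service restart.
"""
import os
import time

CONF_FILE = "/opt/thinlinc/etc/conf.d/profiles.hconf"  # assumed path

def load_config(path):
    # a real service would parse the hconf format here; the sketch just
    # reads the raw text to keep the example self-contained
    with open(path) as f:
        return f.read()

def watch_and_reload(path, interval=5):
    config = load_config(path)
    last_mtime = os.path.getmtime(path)
    while True:
        time.sleep(interval)
        mtime = os.path.getmtime(path)
        if mtime != last_mtime:
            # the file changed on disk: apply the new configuration without
            # restarting the service
            config = load_config(path)
            last_mtime = mtime
            print("configuration reloaded,", len(config), "characters")

if __name__ == "__main__":
    watch_and_reload(CONF_FILE)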
Could we use anything else as a back-end for configuration files, and what would the pros and cons be?

* git / svn
  Pros: Inherits the same pros as using normal configuration files. Configuration
        changes are versioned. git and svn are well known by administrators.
        Hooks could be used to react to changes.
  Cons: Central source for configuration, although it is not a single point of
        failure.

* Distributed datastore
  Pros: We could remove all "communication" between master and agents and use the
        datastore for pushing data around in the cluster. Services and admin tools
        would use the same standardized API no matter which data is needed from
        the cluster. Supports live hooks: a service can listen for data changes,
        think web administration and other viewers.
  Cons: Complexity. Binary format, which can however be solved with a load/dump
        into the current configuration file format for easy editing.

* Database
  Pros: Inherits the same pros as the distributed datastore above. Easy to manage
        by administrators since it is a well-known technique.
  Cons: A database server is required by ThinLinc. Single point of failure; this
        could be solved by the administrator but requires deeper knowledge of the
        database. We cannot provide a setup without a single point of failure
        out of the box.

* Configuration files
  Pros: Easy to handle, well known by any user.
  Cons: Not optimal in a cluster where syncing of configuration is needed; doing
        this correctly requires expertise from the administrator.
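Regarding the "binary format" con of the distributed datastore option, here is a small sketch of the load/dump idea: the datastore (faked with a plain dict here) is dumped to editable "key = value" text and loaded back. All names and keys are examples only, not a proposed format.

#!/usr/bin/env python3
"""Sketch of the load/dump idea for an opaque datastore back-end.

The datastore itself is faked with a dict; the point is only to show that a
binary/opaque backend can still be edited as familiar text by dumping and
loading it.
"""

def dump_config(store, path):
    # write the datastore contents as editable "key = value" lines,
    # one per configuration key, sorted for stable diffs
    with open(path, "w") as f:
        for key in sorted(store):
            f.write(f"{key} = {store[key]}\n")

def load_config(path):
    # read the edited text back into a dict that could then be written
    # into the real datastore
    store = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            store[key.strip()] = value.strip()
    return store

if __name__ == "__main__":
    # example keys only
    fake_store = {"/vsmserver/terminalservers": "agent1 agent2",
                  "/example/some_key": "some_value"}
    dump_config(fake_store, "/tmp/tl-config-dump.txt")
    print(load_config("/tmp/tl-config-dump.txt"))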
(In reply to comment #11)
> Could we use anything else as a back-end for configuration files, and what
> would the pros and cons be?
> [...]

Could a configuration management tool such as Puppet / CFEngine be used?
There is cluster configuration that goes live in production as soon as it is stored on disk, such as profiles.hconf and the TLDC parts, among others. Do we need to stage configuration in the cluster? E.g. no change goes live before the administrator performs an explicit operation to take it live. Considering that we currently have configuration that either goes live directly or at a service restart, we need to make all configuration behave the same way.
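One way staging could work, sketched under assumed paths that are not existing ThinLinc behaviour: edits land in a staging directory, and an explicit apply step copies them to a versioned directory and atomically flips a symlink that services read their configuration through.

#!/usr/bin/env python3
"""Sketch of staged configuration, under illustrative assumptions.

Changes are edited in a staging directory and only go live when the
administrator runs an explicit apply step, which atomically repoints a
symlink. The paths and the symlink layout are assumptions for this sketch.
"""
import os
import shutil
import time

STAGING_DIR = "/opt/thinlinc/etc-staging"   # where edits are prepared
LIVE_DIR = "/opt/thinlinc/etc-live"         # versioned live copies end up here
ACTIVE_LINK = "/opt/thinlinc/etc-active"    # services read config through this symlink

def apply_staged():
    """Copy the staged tree to a new versioned directory and flip the symlink."""
    version = time.strftime("%Y%m%d-%H%M%S")
    target = f"{LIVE_DIR}-{version}"
    shutil.copytree(STAGING_DIR, target)
    tmp_link = ACTIVE_LINK + ".new"
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)
    os.symlink(target, tmp_link)
    # rename() is atomic, so readers always see either the old or the new tree
    os.replace(tmp_link, ACTIVE_LINK)

if __name__ == "__main__":
    apply_staged()

A scheme like this would also make all configuration behave the same way: nothing goes live until the apply step, regardless of whether a service picks it up immediately or at restart.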
The services and configuration in a ThinLinc cluster are separated into two parts, master and agent, which implies that the servers in a ThinLinc cluster are not transparently interchangeable. An administrator can't treat each "node" in a ThinLinc cluster the same in regard to design, monitoring, configuration management, etc. This is somewhat unclear in the documentation and also an uncommon approach for a cluster. If we bundled the master and agent services into a single ThinLinc "node" and they shared the same datastores / configuration, the whole setup would be simplified, and administration of the cluster would generally become simpler since any node in the ThinLinc cluster could be considered a clone of the others.
How do we upgrade a ThinLinc cluster from version X to Y? Should we provide tools to simplify this task? Can we, or should we, do it at all?
Report finished for further investigation.