NAME
ggd2 - GRIO bandwidth management daemon
SYNOPSIS
ggd2 [-d level] [-f] [-m bw] [-s bw] [-r percent] [-u milliseconds]
DESCRIPTION
ggd2 is a user-level process, started at boot, that manages the I/O
bandwidth of a collection of GRIO-managed storage volumes.
At startup, ggd2 forks a child worker process that communicates with the
kernel and is responsible for the bulk of the GRIO processing. The parent
process monitors the activity of this child and restarts it if necessary,
for example during a server relocation.
ggd2 is responsible for activating and deactivating the GRIO kernel I/O
scheduler, processing client requests to reserve and release bandwidth,
tracking bandwidth utilisation, managing unreserved bandwidth, and
enforcing the GRIO software licenses.
OPTIONS
ggd2 takes the following options:
-d level
This option enables debug mode and sets the debug level. ggd2 has five
debug levels, 0 through 4. Level 0 is the default, at which only
serious error conditions are logged. Level 0 messages are normally
logged to the syslog.
When debug mode is enabled ggd2 creates an additional log file
called /var/tmp/ggd2log<pid> and logs to both the syslog and this
file. level is an integer value specifying the maximum debug level
that should be printed to this log file.
-f
By default ggd2 is started as a daemon. The -f option indicates that
it should run in the foreground.
-m bw
If a volume has spare bandwidth not actively reserved by guaranteed
rate streams, ggd2 will temporarily allow this bandwidth to be
allocated to non-guaranteed user and system I/O.
The -m option takes an integer argument bw that specifies the
minimum amount of bandwidth in KB/sec that ggd2 will allocate for
non-GRIO I/O per GRIO managed volume. All nodes issuing non-GRIO I/O
will receive a fair share of this minimum bandwidth.
For example, passing '-m2048' causes ggd2 to allocate a minimum of
2048 KB/sec to each GRIO managed volume. This bandwidth becomes
permanently allocated to non-GRIO I/O and cannot be reserved for
guaranteed rate I/O. A suffix K or M can be added to explicitly
specify bandwidth in KB/sec or MB/sec. For example, passing '-m3M'
causes ggd2 to allocate a minimum of 3 MB/sec to each GRIO managed
volume.
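The bandwidth argument format described above can be sketched as a small parser. This is only an illustration, not ggd2's actual code: the function name parse_bandwidth is hypothetical, and the sketch assumes that 1 MB/sec equals 1024 KB/sec and that only uppercase suffixes are accepted, neither of which is stated on this page.

```python
def parse_bandwidth(arg: str) -> int:
    """Return a bandwidth in KB/sec for arguments such as '2048', '2048K' or '3M'.

    Hypothetical helper; it mirrors the -m and -s argument format only.
    """
    text = arg.strip()
    multiplier = 1  # no suffix: the value is already in KB/sec
    if text and text[-1] in ("K", "M"):
        # ASSUMPTION: 1 MB/sec == 1024 KB/sec. ggd2 itself may use 1000.
        if text[-1] == "M":
            multiplier = 1024
        text = text[:-1]
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"invalid bandwidth argument: {arg!r}")
    return int(text) * multiplier


print(parse_bandwidth("2048"))  # value given to -m2048, in KB/sec
print(parse_bandwidth("3M"))    # value given to -m3M, in KB/sec
```

Under these assumptions, '-m2048' and '-m2048K' describe the same minimum allocation.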
ggd2(3X)
-s bw
If a volume has spare bandwidth not actively reserved by guaranteed
rate streams, ggd2 will temporarily allow this bandwidth to be
allocated to non-guaranteed user and system I/O.
The -s option takes an integer argument bw that specifies the
minimum amount of bandwidth in KB/sec that ggd2 will allocate for
non-GRIO I/O per node. A node is allocated the larger of the values
specified by this option and the -m option.
For example, passing '-s2048' causes ggd2 to allocate a minimum of
2048 KB/sec to each node accessing a GRIO managed volume. This
bandwidth becomes permanently allocated to non-GRIO I/O and cannot
be reserved for guaranteed rate I/O. A suffix K or M can be added to
explicitly specify bandwidth in KB/sec or MB/sec. For example,
passing '-s3M' causes ggd2 to allocate a minimum of 3 MB/sec to each
node accessing a GRIO managed volume.
-r percent
By default ggd2 will allow any unreserved bandwidth to be allocated
for servicing non-GRIO I/O. This maximizes the total throughput of
the system. However, as ggd2 only makes adjustments to these
allocations periodically, a new reservation may block until ggd2 can
reclaim the requested bandwidth.
The -r option disables this behaviour for a proportion of each
volume's available qualified bandwidth. Reservation requests can be
serviced directly from this pool of 'cached' free bandwidth without
blocking.
percent is the percentage of each volume's qualified bandwidth that
ggd2 attempts to keep unallocated, expressed as an integer in the
range 0-100. Its value should be chosen based on the expected I/O
utilisation levels, the importance of minimizing the stream creation
latency, and the expected rate at which reservation requests will be
made.
Using this option causes a proportion of the unreserved I/O capacity
to remain unused and reduces the total throughput and efficiency of
the system for non-GRIO I/O. It should only be used if minimizing
reservation latency is a priority.
For example, given a volume with a qualified bandwidth of 200
MB/sec, passing -r20 will instruct ggd2 to try to keep up to 20% or
40 MB/sec of any remaining unreserved bandwidth cached and available
for servicing reservation requests directly. ggd2 adjusts this cache
of free bandwidth every time the Distributed Bandwidth Allocator
runs (discussed below), which defaults to once every two seconds.
Thus with these settings ggd2 will be able to grant an additional
40MB/sec every two seconds without blocking any reservation
requests.
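The arithmetic in the example above can be written out as a short sketch. The function name cached_bandwidth and the reserved parameter are hypothetical, and capping the cache at the remaining unreserved bandwidth is one reading of "up to" in the text, not a documented formula.

```python
def cached_bandwidth(qualified: float, percent: int, reserved: float = 0) -> float:
    """Estimate the free bandwidth ggd2 tries to keep cached for -r<percent>.

    All bandwidths use the same unit (for example MB/sec).
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be in the range 0-100")
    if not 0 <= reserved <= qualified:
        raise ValueError("reserved must be between 0 and the qualified bandwidth")
    target = qualified * percent / 100  # share of the qualified bandwidth
    unreserved = qualified - reserved   # what is left after GRIO reservations
    # ASSUMPTION: the cache can never exceed the unreserved bandwidth.
    return min(target, unreserved)


# A 200 MB/sec volume started with -r20 and no active reservations.
print(cached_bandwidth(200, 20))
```

With 180 MB/sec already reserved on the same volume, this sketch would report only the remaining 20 MB/sec as cacheable.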
-u milliseconds
The ggd2 daemon periodically adjusts the amount of bandwidth
allocated to the individual non-GRIO streams for its managed
filesystems. This functionality is referred to as the distributed
bandwidth allocator (DBA). The DBA is responsible for determining
how unreserved bandwidth is distributed between the nodes with
access to the filesystem. By default, the DBA runs every two
seconds, constantly allocating free bandwidth to nodes based on a
range of dynamically monitored demand and utilization metrics.
The -u option is used to change the DBA allocation interval from the
default value of two seconds (2000 milliseconds) to any value in the
range 250 to 100000 milliseconds. Note that the rate at which the DBA runs
affects the delay that an application or node that does not have a
GRIO reservation might experience when it starts doing I/O. The
longer the interval, the longer a node may have to wait (with its
I/O paused) before ggd2 will increase its allocation.
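As a rough illustration of the limits described for -u, the following sketch checks an interval against the documented range and converts it to seconds. The constant and function names are hypothetical, and how ggd2 itself reports an out-of-range value is not described here.

```python
DBA_MIN_MS = 250        # smallest interval accepted by -u
DBA_MAX_MS = 100000     # largest interval accepted by -u
DBA_DEFAULT_MS = 2000   # default: the DBA runs every two seconds


def dba_interval_seconds(milliseconds: int = DBA_DEFAULT_MS) -> float:
    """Validate a -u argument and return the DBA interval in seconds."""
    if not DBA_MIN_MS <= milliseconds <= DBA_MAX_MS:
        raise ValueError(
            f"interval must be between {DBA_MIN_MS} and {DBA_MAX_MS} milliseconds"
        )
    return milliseconds / 1000


print(dba_interval_seconds())     # default interval in seconds
print(dba_interval_seconds(250))  # shortest allowed interval in seconds
```

A shorter interval means a node without a reservation waits less for an increased allocation, at the cost of the DBA running more often.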
LOCAL VERSUS SHARED VOLUMES
Following XVM, a managed volume can be either local to a node or shared
amongst all nodes in the cluster. A local volume is said to be in the
local volume domain, shared volumes in the cluster volume domain.
With one exception, each distinct volume domain is managed by a unique
instance of the ggd2 daemon. Local volumes are always managed by the ggd2
running on the node to which they are attached. Shared volumes are
managed by a single instance of ggd2 running on one of the server-capable
nodes within the cluster. This node is referred to as the GRIO server. It
is either elected automatically or selected by the administrator.
If a node has local volumes and has also been selected as the GRIO
server, the ggd2 running on that node serves double duty and manages both
its local and the cluster volume domains.
A user process can request guaranteed rate I/O to a volume using the
grio_reserve(3X) and grio_reserve_fd(3X) library calls. Requests are
forwarded to the ggd2 actively managing the target volume domain.
Requests to volumes in the local domain are sent to the local instance of
ggd2. Requests to cluster volumes are forwarded to the GRIO server, which
may be running on a different node in the cluster.
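The forwarding rule above can be summarised in a few lines. This is a conceptual sketch only: the function name, the domain strings "local" and "cluster", and the node names are all hypothetical and do not correspond to any GRIO library interface.

```python
def managing_node(volume_domain: str, local_node: str, grio_server: str) -> str:
    """Return the node whose ggd2 instance handles a reservation request."""
    if volume_domain == "local":
        # Local volumes are managed by the ggd2 on the node they are attached to.
        return local_node
    if volume_domain == "cluster":
        # Shared volumes are managed by the ggd2 on the GRIO server.
        return grio_server
    raise ValueError(f"unknown volume domain: {volume_domain!r}")


# A request from node "n3" in a cluster whose GRIO server is "n1".
print(managing_node("local", "n3", "n1"))
print(managing_node("cluster", "n3", "n1"))
```

When the requesting node is itself the GRIO server, both cases resolve to the same ggd2 instance, which matches the dual-duty arrangement described above.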
ENCAPSULATION & DISTRIBUTED BANDWIDTH ALLOCATION
When ggd2 begins managing a volume, every node with access to that volume
is notified. Each node in turn creates a dedicated system stream for that
volume termed the nongrio stream. From that point on, all user and system
I/O that doesn't have an explicit GRIO reservation is encapsulated by
this stream and managed by the GRIO scheduler. For a locally-attached
volume there is a single nongrio stream. For a shared volume there is a
nongrio stream on each node with access to the volume.
To keep the total throughput of the volume high even when there are
active GRIO streams, ggd2 attempts to allocate the unreserved portion of
the qualified filesystem bandwidth for use by non-GRIO applications. This
bandwidth is effectively lent for short periods of time until ggd2
receives a new request for guaranteed rate bandwidth, at which point it
is reclaimed.
ggd2 periodically adjusts the amount of bandwidth allocated to individual
nongrio streams for its managed volumes. This functionality is referred
to as the Distributed Bandwidth Allocator or DBA. The DBA is responsible
for determining how unreserved bandwidth is distributed between the nodes
with access to the volume. The DBA runs by default every two seconds,
constantly allocating free bandwidth to nodes based on a range of
dynamically monitored demand and utilisation metrics.
Calls to reserve bandwidth may block until the next DBA cycle, so
applications must be prepared for delays when setting up
guaranteed rate streams. Refer to grio_reserve(3X) for more information.
To help manage this, the -r option can be used to cause ggd2 to put aside
a pool of otherwise unreserved bandwidth (i.e. not allocated for non-GRIO
use), from which new reservations can be processed directly.
GRIO VERSION 1 & 2
With the introduction of GRIO Version 2, IRIX kernels now support two
guaranteed rate I/O subsystems. Both the Version 1 (GRIOv1) and Version 2
(GRIOv2) subsystems can be installed together; however, undefined
behaviour may result if both are enabled simultaneously. As a result,
mixed operation is forcibly prevented with a kernel interlock. Only one
subsystem may be active at a time and once a subsystem has been activated
it remains active until the next reboot.
GRIO Version 2 is inactive by default. The following sequence of commands
can be used to activate it:
# chkconfig grio off
# chkconfig grio2 on
# /etc/init.d/grio2 start
Note that if GRIO Version 1 has been run at any point, the system must be
rebooted to allow the kernel to be reinitialized with the GRIOv2
scheduler. This ensures that a user does not inadvertently activate
both real-time schedulers.
API calls disabled in the running kernel (e.g. v2 calls to a v1-enabled
kernel) return ENOSYS. v2 calls return EAGAIN if the GRIOv2 subsystem has
not yet been initialized.
For both subsystems the switch to a particular mode is triggered by
startup of the userspace daemons. ggd2 enables the v2 subsystem and ggd
(the v1 bandwidth manager) enables the v1 subsystem. The daemons are in
turn controlled by chkconfig variables (grio2 and grio respectively)
along with updated init scripts. The init scripts also make checks to
ensure that only one userspace daemon is running at a time.
CONFIGURING GGD2
Local volumes are configured using the information provided in
/etc/griotab. This file lists the volumes that should be managed by GRIO
and the maximum qualified bandwidth they can deliver. This file is read
at startup and whenever ggd2 receives SIGHUP (e.g. killall -HUP ggd2).
Refer to griotab(4) for more information.
If the GRIO shared volume support has been installed, ggd2 will query the
cluster configuration database (CDB) for GRIO volume configuration
information. ggd2 dynamically tracks updates to the CDB. The standard
cluster administration tools such as cmgr(1M) are used to configure
shared GRIO volumes. The cmgr commands to mark a filesystem as
GRIO-managed and set its qualified bandwidth are shown below:
$ /usr/cluster/bin/cmgr
Welcome to SGI Cluster Manager Command-Line Interface
cmgr> modify cxfs_filesystem <fs_name> in cluster <cluster_name>
cmgr> set grio_managed to true
cmgr> set grio_qualified_bandwidth to <qualified_bandwidth>
cmgr> done
To show the current status of a shared filesystem:
cmgr> show cxfs_filesystem <fs_name> in cluster <cluster_name>
...
GRIO Managed Filesystem: true
GRIO Managed Bandwidth: <qualified_bandwidth>
...
Note that currently cmgr requires that a filesystem be disabled, and
therefore unmounted, before it can be modified. This means that a GRIO
volume must be taken offline while its parameters are adjusted.
LICENSING
The GRIO licensing regime controls a number of configuration parameters
including the total number of active streams and the total aggregate
qualified bandwidth of volumes under management. Separate license types
are provided for the local and cluster volume domains, and license
constraints are enforced for each volume domain separately. GRIO uses
FLEXlm licensing; see license.dat(5).
ggd2 checks the license at startup, whenever it detects a configuration
change, or when it is explicitly signalled with a SIGHUP.
License enforcement for streams is straightforward. The license for a
given volume domain specifies a maximum number of active streams. All
reservation requests above this limit are denied.
In the case of bandwidth, the license specifies the maximum aggregate
qualified bandwidth of all volumes within the volume domain.
For the local domain, this would be the sum of the qualified bandwidths
listed in /etc/griotab. ggd2 validates the configuration at startup and
whenever the configuration is changed. If the configuration of a volume
domain is altered and becomes unlicensed, ggd2 enters a passive mode in
which all further requests pertaining to that domain, with the exception
of release requests, are denied. A message is sent to the syslog and the
volume domain remains deactivated until the configuration returns to a
licensed state. Another message will then be logged indicating the
domain is again active.
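The two license checks described in this section can be modelled as follows. This is a simplified sketch, not the FLEXlm interface or ggd2's implementation: the DomainLicense type, its field names, and both functions are hypothetical, and bandwidths are assumed to be expressed in KB/sec.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DomainLicense:
    """Hypothetical license limits for one volume domain."""
    max_streams: int       # maximum number of active streams
    max_bandwidth_kb: int  # maximum aggregate qualified bandwidth, KB/sec


def domain_is_licensed(qualified_kb: Iterable[int], lic: DomainLicense) -> bool:
    """True if the summed qualified bandwidth of all volumes fits the license."""
    return sum(qualified_kb) <= lic.max_bandwidth_kb


def may_reserve(active_streams: int, qualified_kb: Iterable[int],
                lic: DomainLicense) -> bool:
    """True if a new reservation request would be considered at all."""
    if not domain_is_licensed(qualified_kb, lic):
        return False  # passive mode: only release requests are accepted
    return active_streams < lic.max_streams


lic = DomainLicense(max_streams=4, max_bandwidth_kb=400_000)
print(may_reserve(3, [200_000, 150_000], lic))  # within both limits
print(may_reserve(4, [200_000, 150_000], lic))  # stream limit reached
print(may_reserve(0, [300_000, 150_000], lic))  # domain is unlicensed
```

For the local domain, the list of qualified bandwidths would correspond to the values listed in /etc/griotab.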
FILES
/etc/griotab
/var/flexlm/license.dat
SEE ALSO
grio2(5), grio_avail(3X), grio_bind(3X), grio_get_stream(3X),
grio_modify(3X), grio_release(3X), grio_reserve(3X), grio_unbind(3X)