PerfDisk monitors the usage of every single disk. If just one disk exceeds the threshold, the check sends an alarm.
BadlyPerformingDisks is a new check that takes a different approach. It first analyses every disk and counts the disks whose usage exceeds a defined threshold. It only sends an alarm if the share of such disks exceeds a specified percentage.
This simplified example shows a scenario with 6 disks:
./check_netapp_pro PerfDisk ... -w 85 -c 95
NETAPP_PRO PERFDISK WARNING - 6 disks checked, 0 critical and 3 warning
1.10.20 (/aggr1_st6_sata/plex0/rg1): 91.7% (WARNING)
1.10.19 (/aggr1_st6_sata/plex0/rg1): 91.2% (WARNING)
1.10.21 (/aggr1_st6_sata/plex0/rg1): 91.2% (WARNING)
1.10.10 (/aggr1_st6_sata/plex0/rg0): 76.9%
1.10.5 (/aggr1_st6_sata/plex0/rg0): 76.7%
1.10.14 (/aggr1_st6_sata/plex0/rg0): 76.4%
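To make the per-disk behaviour of PerfDisk concrete, here is a minimal Python sketch that reproduces the summary line above from the six usage values. It is an illustration, not the plugin's code. The function names are hypothetical, and it assumes a disk counts as WARNING or CRITICAL when its usage is strictly greater than -w or -c. The source does not say whether the plugin uses > or >=.

```python
# Hypothetical re-implementation of PerfDisk's per-disk logic, for illustration only.
# Assumption: a threshold is exceeded when usage is strictly greater than it.

def classify(usage, warn, crit):
    """Return the state of a single disk."""
    if usage > crit:
        return "CRITICAL"
    if usage > warn:
        return "WARNING"
    return "OK"

def perf_disk(disks, warn, crit):
    """Classify every disk; one disk over a threshold raises the overall state."""
    states = [classify(u, warn, crit) for u in disks.values()]
    n_crit = states.count("CRITICAL")
    n_warn = states.count("WARNING")
    overall = "CRITICAL" if n_crit else "WARNING" if n_warn else "OK"
    return f"{overall} - {len(disks)} disks checked, {n_crit} critical and {n_warn} warning"

# Usage values taken from the example output above.
DISKS = {
    "1.10.20": 91.7, "1.10.19": 91.2, "1.10.21": 91.2,
    "1.10.10": 76.9, "1.10.5": 76.7, "1.10.14": 76.4,
}

print(perf_disk(DISKS, warn=85, crit=95))
# WARNING - 6 disks checked, 0 critical and 3 warning
```

The three disks above 85% are enough to turn the whole check into WARNING. BadlyPerformingDisks is designed to avoid exactly this.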
With the same data, we can configure BadlyPerformingDisks to return OK by defining a disk as “highly utilized” if its usage is higher than 90%:
./check_netapp_pro BadlyPerformingDisks ... -w 80 -c 95 --highly_utilized=90
NETAPP_PRO PERFDISK OK - 6 disks checked, 3 of them (50.0%) are highly utilized(usage > 90%).
In our case, 3 of the 6 disks fall into this category, i.e. 50%. This is well below the warning threshold of 80% set with -w, so the check returns OK.
To receive a CRITICAL result with the same data from the PerfDisk check above, we have to lower the critical threshold to 45% (and the warning threshold to 40%). Now only a share of highly utilized disks below 40% is regarded as OK:
./check_netapp_pro BadlyPerformingDisks ... -w 40 -c 45 --highly_utilized=90
NETAPP_PRO PERFDISK CRITICAL - 6 disks checked, 3 of them (50.0%) are highly utilized(usage > 90%).
Another option is to lower the usage threshold that defines a disk as “highly utilized”. At 70%, all six disks qualify. A share of 100% exceeds even the critical threshold of 95%:
./check_netapp_pro BadlyPerformingDisks ... -w 80 -c 95 --highly_utilized=70
NETAPP_PRO PERFDISK CRITICAL - 6 disks checked, 6 of them (100.0%) are highly utilized(usage > 70%).
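The three runs above follow one rule. Count the highly utilized disks, turn the count into a percentage, and compare that percentage with -w and -c. The Python sketch below replays all three runs on the example data. The function name is hypothetical, and two comparisons are assumptions. First, "highly utilized" is taken to mean usage strictly above --highly_utilized, as the output text "usage > 90%" suggests. Second, a state is assumed to trigger when the share is strictly greater than the threshold.

```python
# Hypothetical sketch of the BadlyPerformingDisks evaluation; not the plugin's code.
# Assumptions: usage > highly_utilized marks a disk, and share > warn/crit triggers a state.

USAGE = [91.7, 91.2, 91.2, 76.9, 76.7, 76.4]  # disk usage from the PerfDisk example

def badly_performing_disks(usage, warn, crit, highly_utilized):
    """Return (state, number of highly utilized disks, their share in percent)."""
    busy = sum(1 for u in usage if u > highly_utilized)
    share = 100.0 * busy / len(usage)
    if share > crit:
        state = "CRITICAL"
    elif share > warn:
        state = "WARNING"
    else:
        state = "OK"
    return state, busy, share

# The three invocations from the text: (-w, -c, --highly_utilized)
for w, c, h in [(80, 95, 90), (40, 45, 90), (80, 95, 70)]:
    state, busy, share = badly_performing_disks(USAGE, w, c, h)
    print(f"-w {w} -c {c} --highly_utilized={h}: {state} ({busy} of {len(USAGE)}, {share:.1f}%)")
# -w 80 -c 95 --highly_utilized=90: OK (3 of 6, 50.0%)
# -w 40 -c 45 --highly_utilized=90: CRITICAL (3 of 6, 50.0%)
# -w 80 -c 95 --highly_utilized=70: CRITICAL (6 of 6, 100.0%)
```

The two knobs are independent. --highly_utilized decides which disks count as busy. -w and -c decide how large a share of busy disks is tolerated.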
Evaluation per Aggregate
Just like PerfDisk, BadlyPerformingDisks can be restricted to the disks belonging to a specific aggregate by setting the option --raid_group=pattern. Since pattern is interpreted as a regular expression, we use --raid_group=^aggr1$ to examine all disks of aggregate aggr1. The anchors ^ and $ make sure that only this exact name matches and not, for example, aggr10.
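The Python sketch below shows why the anchors matter. It assumes that the plugin applies the pattern as an unanchored regular-expression search against the aggregate name. The source does not confirm what the pattern is matched against or which regex engine is used. Apart from aggr1 and aggr1_st6_sata, the aggregate names are hypothetical.

```python
import re

# Hypothetical aggregate names; only the effect of anchoring is illustrated here.
AGGREGATES = ["aggr1", "aggr10", "aggr1_st6_sata", "aggr0"]

def matching(pattern, names):
    """Return the names matched by an unanchored regex search (assumed semantics)."""
    return [n for n in names if re.search(pattern, n)]

print(matching("aggr1", AGGREGATES))    # ['aggr1', 'aggr10', 'aggr1_st6_sata']
print(matching("^aggr1$", AGGREGATES))  # ['aggr1']
```

Without the anchors, aggr1 would also select aggr10 and aggr1_st6_sata.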