This article has been archived. Please see Coder Docs for the updated version.
Any sufficiently large Coder deployment requires proper monitoring, both from a stability and uptime perspective and from a compute cost and performance perspective. Because Coder runs on Kubernetes, platform administrators can monitor developer workspaces as they would any other server workload.
Node Utilization Metrics
CPU and memory utilization of the underlying Kubernetes Nodes is the single most important metric. Excessive Node resource contention leads to throttling of developer Environments, while significant underutilization suggests unnecessary cloud spend.
As a platform admin, you have several tools at your disposal to balance the tradeoff between Environment performance and cloud cost.
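As one way to reason about that tradeoff, a node's utilization can be bucketed into "contended", "healthy", or "underutilized" bands. The sketch below illustrates this; the thresholds and sample figures are illustrative assumptions, not Coder defaults.

```python
# Illustrative sketch: classify node CPU utilization to balance
# performance against cost. Thresholds (80% / 30%) are assumptions.

def classify_node(cpu_used_cores, cpu_capacity_cores, high=0.80, low=0.30):
    """Return a rough verdict on a node's CPU utilization."""
    util = cpu_used_cores / cpu_capacity_cores
    if util >= high:
        return "contended"      # workspaces risk throttling
    if util <= low:
        return "underutilized"  # candidate for scale-down / cost savings
    return "healthy"

# Sample (used cores, allocatable cores) per node -- made-up numbers.
nodes = {
    "node-a": (7.2, 8.0),
    "node-b": (1.5, 8.0),
    "node-c": (4.0, 8.0),
}

for name, (used, cap) in nodes.items():
    print(name, classify_node(used, cap))
```

In practice these figures would come from your metrics pipeline (e.g. the output of `kubectl top nodes`), with thresholds tuned to your workloads.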
Development Workspace Metrics
Coder provides a set of Kubernetes resource labels that allow monitoring tools to map cluster resources to Coder product-level resource identifiers. The charts below track the "CPU/Memory Limit Utilization" of each Environment container, labeled appropriately with the "username" and "Environment name" identifiers.
Use this view to identify which users may require larger CPU allocations for greater "burst-ability" under peak load. Remember, though, that with a CPU/memory overprovisioning ratio greater than 1:1, users may be throttled below their CPU limit when the underlying Kubernetes Node is experiencing CPU contention. The Node utilization charts above can provide insight into this.
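The limit-utilization figure behind these charts is simply a container's CPU usage divided by its CPU limit, grouped by Coder's resource labels. A minimal sketch, assuming hypothetical label keys ("coder.username", "coder.environment") and made-up pod metrics:

```python
# Hypothetical sketch: per-workspace CPU limit utilization keyed by
# Coder's resource labels. Label names and figures are assumptions,
# not the exact labels Coder applies.

pods = [
    {"labels": {"coder.username": "alice", "coder.environment": "backend"},
     "cpu_usage_cores": 1.8, "cpu_limit_cores": 2.0},
    {"labels": {"coder.username": "bob", "coder.environment": "ml"},
     "cpu_usage_cores": 0.4, "cpu_limit_cores": 4.0},
]

def limit_utilization(pod):
    """Fraction of the container's CPU limit currently in use."""
    return pod["cpu_usage_cores"] / pod["cpu_limit_cores"]

for pod in pods:
    who = pod["labels"]["coder.username"]
    env = pod["labels"]["coder.environment"]
    print(f"{who}/{env}: {limit_utilization(pod):.0%} of CPU limit")
```

A workspace sitting near 100% of its limit for sustained periods is a candidate for a larger allocation; one consistently near 10% may be oversized.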
Control Plane Monitoring
Monitoring the Coder control plane can aid infrastructure admins in maintaining proper uptime. The following charts provide high-level insight into the state of the Coder API server.
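One simple uptime signal such charts can summarize is the fraction of periodic health probes against the API server that succeed. The sketch below derives that figure from sample probe results; the data and the "healthy == HTTP 200" convention are illustrative assumptions.

```python
# Illustrative sketch: compute an availability figure for the Coder API
# server from periodic health-check results. Sample data is made up.

def availability(status_codes):
    """Fraction of probes that returned HTTP 200."""
    if not status_codes:
        return 0.0
    ok = sum(1 for code in status_codes if code == 200)
    return ok / len(status_codes)

# One probe per minute over a ten-minute window.
probes = [200, 200, 200, 503, 200, 200, 200, 200, 200, 200]
print(f"availability: {availability(probes):.1%}")
```

In a real deployment the probe results would come from your monitoring system's blackbox checks rather than a hard-coded list, and the window would typically span days or weeks for an uptime SLO.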