Saint is a project I built last year to replace Grafana for all of my use cases. In this post I'd like to talk about what it is, how it works, why I built it, and what's next.
Historically I've mostly used ClickHouse for aggregating and storing observability data, which, combined with Grafana, makes a powerful tool for monitoring and alerting. Despite this, I've always hated tools like Datadog and Grafana; they stray so far from the traditional systems tooling I'm comfortable with.
Grafana itself comes with a plethora of issues. Just from memory, I've dealt with:
Saint works on a simple premise similar to "code as configuration." Dashboards in saint are defined within TypeScript files contained within a git repository. Panels that can display graphs, tables, logs or other data on the frontend are just implemented via async functions that return structured data. Dashboards are just a collection of functions to generate panels (and their associated row/column layout), variables to allow input from users and metadata for categorization and tagging.
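To make that model concrete, here is a minimal sketch of what such a data model could look like. Every name in it (`PanelData`, `PanelFn`, `Dashboard`, `resolveVars`, `runDashboard`) is hypothetical and chosen for illustration; saint's real SDK types may well differ.

```typescript
// Hypothetical shapes for illustration only; not saint's actual SDK.
type PanelData =
  | { kind: "value"; label: string; value: string }
  | { kind: "table"; columns: string[]; rows: string[][] };

// Values of the dashboard's variables, as supplied by the user.
type Vars = Record<string, string>;

// A panel is just an async function from variables to structured data.
type PanelFn = (vars: Vars) => Promise<PanelData>;

interface Variable {
  name: string;
  default: string;
}

interface Dashboard {
  id: string;
  title: string;
  tags: string[];
  vars: Variable[];
  layout: PanelFn[][]; // rows, each holding one or more columns
}

// Fill in the default for any variable the user did not supply.
function resolveVars(d: Dashboard, input: Vars): Vars {
  const vars: Vars = {};
  for (const v of d.vars) vars[v.name] = input[v.name] ?? v.default;
  return vars;
}

// Run every panel on the server, preserving the row/column layout.
async function runDashboard(d: Dashboard, input: Vars = {}): Promise<PanelData[][]> {
  const vars = resolveVars(d, input);
  return Promise.all(d.layout.map((row) => Promise.all(row.map((panel) => panel(vars)))));
}

const hostname: PanelFn = async (vars) => ({
  kind: "value",
  label: "Host",
  value: vars["host"] ?? "unknown",
});

const example: Dashboard = {
  id: "example",
  title: "Example",
  tags: ["infra"],
  vars: [{ name: "host", default: "eos" }],
  layout: [[hostname]],
};
```

Calling `runDashboard(example)` would resolve to a nested array of plain data that can be serialized and handed to a frontend for rendering.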
Let's take a look at an example dashboard to better understand how we can use code to define graphs and other visualizations. Keep in mind that much of this code is still well within the prototype phase and would benefit a lot from well-defined SDK/API abstractions. Those will come in the future, but for now we'll pretend those rough edges don't exist!
The following is an example of a dashboard which uses yamon generated GPU data stored in ClickHouse for monitoring GPU temperature and utilization.
The actual code behind this dashboard remains relatively simple. We're able to use abstractions defined elsewhere in our repository to reduce boilerplate and copy-paste problems:
import { dashboard } from "@saint/sdk/mod.ts";
import { host, interval } from "../common.ts";
import { Args, graph } from "./host.ts";
// Simple functions can even be passed to the frontend for use in rendering
const tempFormat = (v: number) => `${v}° C`;
const pctFormat = (v: number) => `${(v * 100).toFixed()}%`;
// Produce a graph of temperature data based on the arguments
const temp = (args: Args) =>
  graph(args, {
    clauses: ["name LIKE 'gpu.nvidia.%.temperature'"],
    title: "Temperature",
    format: tempFormat,
  });
// Produce a graph of utilization data based on the arguments
const util = (args: Args) =>
  graph(args, {
    clauses: ["name LIKE 'gpu.nvidia.%.utilization'"],
    title: "Utilization",
    format: pctFormat,
  });
// Dashboards are defined like so and the layout allows for dynamically sizing your rows and columns as needed.
export const gpu = dashboard("gpu", [[temp], [util]], {
  title: "GPU",
  vars: [interval, { ...host, default: "eos" }],
  tags: ["infra"],
  refreshInterval: "15 seconds",
});
Everything required to produce this chart is defined in code. To query ClickHouse, we just use the npm module:
import { createClient } from "npm:@clickhouse/client";
export const client = createClient({
  url: "http://my-clickhouse-server:8123",
});
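The `graph` helper from `host.ts` isn't shown here, but a helper like it has to turn its `clauses` and the dashboard variables into a query at some point. The following is only a sketch of how that might look: `MetricArgs`, `buildMetricQuery`, the `yamon.metrics` table and the `host` column are all assumed names, not taken from saint. It relies on ClickHouse's `{name:Type}` query parameters so that user-supplied values are never spliced into the SQL string.

```typescript
// Hypothetical arguments; the real Args type in host.ts is not shown in this post.
interface MetricArgs {
  host: string;
  intervalSeconds: number;
}

interface BuiltQuery {
  query: string;
  query_params: Record<string, string | number>;
}

// Build a parameterized query from trusted, code-defined clauses.
// `yamon.metrics` and the `host` column are assumptions for this sketch.
function buildMetricQuery(args: MetricArgs, clauses: string[]): BuiltQuery {
  const where = [
    "host = {host:String}",
    "when > now() - toIntervalSecond({seconds:UInt32})",
    ...clauses, // written by the dashboard author, not by end users
  ]
    .map((clause) => `(${clause})`)
    .join(" AND ");

  return {
    query: `SELECT when, name, value FROM yamon.metrics WHERE ${where} ORDER BY when`,
    // User-controlled values travel separately from the SQL text.
    query_params: { host: args.host, seconds: args.intervalSeconds },
  };
}

const built = buildMetricQuery(
  { host: "eos", intervalSeconds: 3600 },
  ["name LIKE 'gpu.nvidia.%.temperature'"],
);
```

The resulting object could then be spread into a call such as `client.query({ ...built, format: "JSON" })`, since the ClickHouse client accepts `query` and `query_params` together.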
To generate a chart we can use plot (via a very hacky mechanism that serializes the chart configuration for the frontend) and return a graph panel:
// Plot.make(...) helps reduce the data sent to the frontend due to the structure of the plot API interface
const plot = Plot.make([data], ([data]) => ({
  y: { grid: true, nice: true },
  marks: [
    Plot.lineY(data, {
      x: "when",
      y: "value",
      stroke: "name",
      tip: "xy",
    }),
    Plot.ruleX(data, Plot.pointerX({ x: "when", py: "value", stroke: "red" })),
  ],
}));
return Panel.withOpts({ title: group }).graph(plot);
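The post doesn't show how `Plot.make` works internally, so the following is only a guess at the general idea rather than saint's implementation. One way to ship a chart configuration that references the same data several times is to send the data once, alongside the source text of the function that builds the configuration, and call that function again on the frontend. The names `make`, `revive` and `SerializedPlot` are hypothetical.

```typescript
interface SerializedPlot {
  data: unknown[];
  source: string; // source text of the function that builds the config
}

// Server side: keep the data and the builder separate instead of calling the builder.
function make<D extends unknown[]>(data: D, build: (data: D) => unknown): SerializedPlot {
  return { data, source: build.toString() };
}

// Client side: rebuild the function from its source and apply it to the data.
// Evaluating source text is only acceptable when it comes from your own server,
// and the builder must not depend on variables the client does not also have.
function revive(plot: SerializedPlot): unknown {
  const build = new Function(`return (${plot.source});`)() as (data: unknown[]) => unknown;
  return build(plot.data);
}

const points = [
  { when: 1, value: 40 },
  { when: 2, value: 42 },
];

// The config uses `data` twice, but the wire format carries it only once.
const wire = JSON.stringify(
  make([points], ([data]) => ({
    marks: [
      { type: "lineY", data, x: "when", y: "value" },
      { type: "ruleX", data, x: "when" },
    ],
  })),
);

const config = revive(JSON.parse(wire) as SerializedPlot);
```

The same trick would explain how small formatters like `tempFormat` can reach the frontend: a pure function with no captured variables survives a round trip through its source text.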
And of course, all of this is happening with TypeScript and the Deno runtime, allowing us to use powerful abstractions and utilities when querying or manipulating our data (without having to figure out the SQL for it):
const lastBackupAge = async () => {
  const result = await query<{ when: number }>(`
    SELECT toUnixTimestamp(when) as when FROM yamon.logs
    WHERE service='systemd' AND data='Finished run backups.'
    ORDER BY when DESC
    LIMIT 1;
  `);
  const lastBackupDate = new Date(result.data[0].when * 1000);
  return Panel.withOpts({ grow: false }).value(
    humanizeDuration(new Date().getTime() - lastBackupDate.getTime()),
    { label: "Last Backup Age" }
  );
};
TypeScript gives us a massive leg up over a convoluted or bespoke UI like Grafana and Datadog offer:
On top of everything that TypeScript alone gives us, saint also executes all code on the server. Data is then sent to clients, which are responsible for actually rendering graphs and other panels. This approach brings numerous advantages as well:
Of course, pretty dashboards are only a part of the observability story. Saint is also designed to manage alerts, and similarly, we get many of the advantages listed above. The alerting code is much less refined, but the general concept is sound and works great in my relatively limited use:
const BACKUP_WARNING_AGE = 1000 * 60 * 60 * 24;
const BACKUP_ALERT_AGE = 1000 * 60 * 60 * 28;
export const backupAgeAlert = alert("last backup age", async function* () {
  const result = await query<{ when: number }>(`
    SELECT toUnixTimestamp(when) as when FROM yamon.logs
    WHERE service='systemd' AND data='Finished run backups.'
    ORDER BY when DESC
    LIMIT 1;
  `);
  const sinceLastBackup =
    new Date().getTime() - new Date(result.data[0].when * 1000).getTime();
  yield check(
    "jak",
    [
      [Status.OK, sinceLastBackup < BACKUP_WARNING_AGE],
      [Status.WARNING, sinceLastBackup < BACKUP_ALERT_AGE],
      [Status.ALERT, true],
    ],
    `Last backup was ${humanizeDuration(sinceLastBackup)} ago`
  );
});
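The rule list passed to `check` reads naturally as "the first entry whose condition holds wins", with the final `true` acting as a catch-all. Assuming that is indeed the intended semantics, the resolution step can be sketched as follows. `resolveStatus` and `backupStatus` are hypothetical names, and `Status` is redefined locally so the sketch stands alone.

```typescript
// Local stand-in for saint's Status values, defined here so the sketch is self-contained.
const Status = { OK: "OK", WARNING: "WARNING", ALERT: "ALERT" } as const;
type Status = (typeof Status)[keyof typeof Status];

type Rule = [Status, boolean];

// Return the status of the first rule whose condition is true.
function resolveStatus(rules: Rule[]): Status {
  for (const [status, condition] of rules) {
    if (condition) return status;
  }
  // Nothing matched: treat it as an alert rather than silently passing.
  return Status.ALERT;
}

const HOUR = 1000 * 60 * 60;
const BACKUP_WARNING_AGE = HOUR * 24;
const BACKUP_ALERT_AGE = HOUR * 28;

// The same thresholds as the backup alert, as a pure function of the backup's age.
function backupStatus(sinceLastBackup: number): Status {
  return resolveStatus([
    [Status.OK, sinceLastBackup < BACKUP_WARNING_AGE],
    [Status.WARNING, sinceLastBackup < BACKUP_ALERT_AGE],
    [Status.ALERT, true],
  ]);
}
```

Because the rules are ordered from least to most severe, each condition only needs to state its own upper bound: a backup that is 25 hours old fails the first rule and lands on `WARNING`.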
And to link saint up to some external service for tracking and managing notifications of alerts, we can still just use code:
useDiscordAlerts({
  webhookURL: Deno.env.get("DISCORD_WEBHOOK_URL")!,
});
It's trivial to create custom integrations that handle alert status changes and respond accordingly.
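As an illustration of what such an integration might involve, here is a small sketch that remembers the last status seen for each alert and only invokes a handler when it changes. The hook saint actually exposes for this isn't shown in this post, so `createChangeTracker`, `StatusChange` and `Handler` are hypothetical names.

```typescript
type Status = "OK" | "WARNING" | "ALERT";

interface StatusChange {
  alert: string;
  from: Status | undefined; // undefined the first time an alert reports
  to: Status;
  message: string;
}

type Handler = (change: StatusChange) => void;

// Returns a reporter that forwards only status transitions to the handler.
function createChangeTracker(handler: Handler) {
  const last = new Map<string, Status>();
  return (alert: string, status: Status, message: string): void => {
    const previous = last.get(alert);
    if (previous === status) return; // nothing changed, stay quiet
    last.set(alert, status);
    handler({ alert, from: previous, to: status, message });
  };
}

// Example handler: log transitions. A real one might post to a webhook instead.
const report = createChangeTracker((change) => {
  console.log(`${change.alert}: ${change.from ?? "none"} -> ${change.to} (${change.message})`);
});

report("last backup age", "OK", "Last backup was 2 hours ago");
report("last backup age", "OK", "Last backup was 3 hours ago"); // no transition, no output
report("last backup age", "WARNING", "Last backup was 25 hours ago");
```

Deduplicating on transitions keeps a notification channel quiet while an alert stays in the same state between checks.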
I've been running and using saint for quite a while, and I've also had some clients play around with it. So far the feedback has been positive, but the SDK and API interfaces still need a lot of polish. On top of this, much of the existing SDK, such as the graphing implementation (which uses plot), is very hacky and would benefit greatly from a first-party version.
If you do have a use case for saint or an interest in it, please feel free to reach out over the usual channels and I'll be happy to provide you access.