Project Overview of Saint

3/19/2025

Saint is a project I built last year that was designed to replace Grafana for all of my use cases. In this post I'd like to talk about what it is, how it works, why I built it, and what's next.

Why Not Grafana?

Historically, I've mostly used ClickHouse for aggregating and storing observability data, which, combined with Grafana, can be a powerful tool for monitoring and alerting. Despite this, I've always hated tools like Datadog or Grafana; they stray so far from the traditional systems tooling I'm comfortable with.

Grafana itself comes with a plethora of issues. Just from memory, I've dealt with:

  • Constant performance issues caused by all the data querying and processing happening in-browser.
  • Data source plugins would regularly be outdated or incompatible with some features.
  • Tons of boilerplate and manual copy-paste when initially creating dashboards.
  • Refactoring dashboards is basically impossible without separate tooling.
  • Buggy UI that often traded usability for fashion.
  • Data feels incredibly hard to work with and transform; if you don't query it in exactly the format you need, pain ensues.

How Does It Work?

Saint works on a simple premise similar to "code as configuration." Dashboards in saint are defined in TypeScript files within a git repository. Panels, which display graphs, tables, logs, or other data on the frontend, are implemented as async functions that return structured data. A dashboard is just a collection of panel-generating functions (and their associated row/column layout), variables that accept input from users, and metadata for categorization and tagging.
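To make this model concrete, here is a minimal sketch of the shapes described above. Every name in it (PanelData, PanelFn, DashboardDef, renderDashboard) is hypothetical rather than saint's actual SDK; it only assumes that a panel is an async function of the current variable values and that a layout is a list of rows, each holding a list of panels.

```typescript
// Hypothetical types sketching the model; not saint's real SDK.
export type PanelData =
  | { kind: "value"; label: string; value: string }
  | { kind: "table"; columns: string[]; rows: unknown[][] };

// A panel is just an async function of the user-selected variable values.
export type PanelFn = (vars: Record<string, string>) => Promise<PanelData>;

export interface DashboardDef {
  name: string;
  layout: PanelFn[][]; // each inner array is one row of panels
  tags: string[];
}

// Run every panel concurrently on the server while keeping the row/column
// layout, so the client receives nothing but plain data to render.
export async function renderDashboard(
  dash: DashboardDef,
  vars: Record<string, string>,
): Promise<PanelData[][]> {
  return Promise.all(
    dash.layout.map((row) => Promise.all(row.map((panel) => panel(vars)))),
  );
}
```

Because each panel is only a promise of data, resolving a whole dashboard is a nested Promise.all, and the result can be serialized as-is.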

Let's look at an example dashboard to see how we can use code to define graphs and other visualizations. Keep in mind that much of this code is still firmly in the prototype phase and would benefit a lot from well-defined SDK/API abstractions. Those will come in the future, but for now we'll pretend the rough edges don't exist!

GPU Dashboard

The following example dashboard uses yamon-generated GPU data stored in ClickHouse to monitor GPU temperature and utilization.

[Figure: the complete rendered GPU dashboard, showing two charts and various UI elements]

The actual code behind this dashboard remains relatively simple because we can use abstractions defined elsewhere in our repository to cut down on boilerplate and copy-paste:

import { dashboard } from "@saint/sdk/mod.ts";
import { host, interval } from "../common.ts";
import { Args, graph } from "./host.ts";

// Simple functions can even be passed to the frontend for use in rendering
const tempFormat = (v: number) => `${v}° C`;
const pctFormat = (v: number) => `${(v * 100).toFixed()}%`;

// Produce a graph of temperature data based on the arguments
const temp = (args: Args) =>
  graph(args, {
    clauses: ["name LIKE 'gpu.nvidia.%.temperature'"],
    title: "Temperature",
    format: tempFormat,
  });

// Produce a graph of utilization data based on the arguments
const util = (args: Args) =>
  graph(args, {
    clauses: ["name LIKE 'gpu.nvidia.%.utilization'"],
    title: "Utilization",
    format: pctFormat,
  });

// Dashboards are defined like so and the layout allows for dynamically sizing your rows and columns as needed.
export const gpu = dashboard("gpu", [[temp], [util]], {
  title: "GPU",
  vars: [interval, { ...host, default: "eos" }],
  tags: ["infra"],
  refreshInterval: "15 seconds",
});
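The graph helper imported from ./host.ts isn't shown. As an illustration only, a helper like it might combine its clauses with the dashboard's variables into a ClickHouse query. In this sketch the SeriesArgs fields, the yamon.metrics table and its columns, and buildSeriesQuery are all assumptions. User-controlled values travel as ClickHouse {name:Type} query parameters (which @clickhouse/client accepts through its query_params option) instead of being spliced into the SQL, while the clauses, which are authored in the repository, are treated as trusted.

```typescript
// Hypothetical arguments derived from the dashboard's variables.
export interface SeriesArgs {
  host: string; // value of the "host" variable, e.g. "eos"
  intervalSeconds: number; // width of the time window to plot
}

export interface SeriesQuery {
  query: string;
  query_params: Record<string, string | number>;
}

// Combine trusted, repository-authored clauses with user-supplied variables.
// User input only ever travels as a query parameter, never as SQL text.
export function buildSeriesQuery(args: SeriesArgs, clauses: string[]): SeriesQuery {
  const conditions = [
    "host = {host:String}",
    "when > now() - toIntervalSecond({seconds:UInt32})",
    ...clauses,
  ];
  return {
    query: [
      "SELECT when, name, value FROM yamon.metrics", // hypothetical table
      // Parenthesize each condition so an OR inside a clause cannot leak out.
      `WHERE ${conditions.map((c) => `(${c})`).join(" AND ")}`,
      "ORDER BY when",
    ].join("\n"),
    query_params: { host: args.host, seconds: args.intervalSeconds },
  };
}
```

The result could then be spread into a call such as client.query({ ...buildSeriesQuery(args, clauses), format: "JSON" }).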

Everything required to produce this chart is defined in code. To query ClickHouse, we just use the @clickhouse/client npm module:

import { createClient } from "npm:@clickhouse/client";

export const client = createClient({
  url: "http://my-clickhouse-server:8123",
});
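The later snippets call a query<T>(sql) helper whose result exposes a data array. Its implementation isn't shown; a minimal sketch, assuming it wraps the client's query() method with ClickHouse's JSON output format (which returns rows under data), might look like the following. To keep the sketch testable, the client is described by a small structural interface rather than the library's own types, and the names SqlClient, JsonResult, and makeQuery are hypothetical.

```typescript
// The subset of the ClickHouse client this helper relies on (a structural assumption).
export interface SqlClient {
  query(params: { query: string; format: "JSON" }): Promise<{ json(): Promise<unknown> }>;
}

// ClickHouse's JSON output format carries result rows under `data`.
export interface JsonResult<T> {
  data: T[];
  rows: number;
}

// Hypothetical helper: bind a client once, then run typed queries through it.
export function makeQuery(client: SqlClient) {
  return async <T>(sql: string): Promise<JsonResult<T>> => {
    const resultSet = await client.query({ query: sql, format: "JSON" });
    // The cast is unchecked: T must match the columns the SQL selects.
    return (await resultSet.json()) as JsonResult<T>;
  };
}
```

It would be wired up as const query = makeQuery(client); depending on the client version, a thin adapter may be needed to satisfy the interface.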

To generate a chart, we use Observable Plot (via a very hacky mechanism that serializes the chart configuration for the frontend) and return a graph panel:

// Plot.make(...) helps reduce the data sent to the frontend due to the structure of the plot API interface
const plot = Plot.make([data], ([data]) => ({
  y: { grid: true, nice: true },
  marks: [
    Plot.lineY(data, {
      x: "when",
      y: "value",
      stroke: "name",
      tip: "xy",
    }),
    Plot.ruleX(data, Plot.pointerX({ x: "when", py: "value", stroke: "red" })),
  ],
}));

return Panel.withOpts({ title: group }).graph(plot);
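The post doesn't detail the serialization mechanism. One common trick for the related problem of shipping small functions such as tempFormat to the browser is to send their source text and rebuild them on the client. The sketch below shows that trick, assuming the functions are fully self-contained; serializeFn, reviveFn, and the __fn field are hypothetical names, not saint's code.

```typescript
// Hypothetical wire format for a function: just its source code.
export interface SerializedFn {
  __fn: string;
}

// Server side: capture the function's source. Only works for functions that
// close over nothing (no outer variables, no imports).
export function serializeFn(fn: (...args: never[]) => unknown): SerializedFn {
  return { __fn: fn.toString() };
}

// Client side: rebuild the function by evaluating its source.
// This is effectively eval, so it must only run on trusted, server-authored code.
export function reviveFn<A extends unknown[], R>(s: SerializedFn): (...args: A) => R {
  return new Function(`return (${s.__fn});`)() as (...args: A) => R;
}
```

Because the client effectively evaluates this code, the approach is only safe when the functions come from the trusted repository rather than from user input.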

And of course, all of this runs as TypeScript on the Deno runtime, so we can use powerful abstractions and utilities when querying or manipulating our data (without having to figure out the SQL for it):

const lastBackupAge = async () => {
  const result = await query<{ when: number }>(`
    SELECT toUnixTimestamp(when) as when FROM yamon.logs
    WHERE service='systemd' AND data='Finished run backups.'
    ORDER BY when DESC
    LIMIT 1
  `);

  // The query returns no rows if no backup has been logged yet.
  const latest = result.data[0];
  const age = latest
    ? humanizeDuration(Date.now() - latest.when * 1000)
    : "no backup found";

  return Panel.withOpts({ grow: false }).value(age, { label: "Last Backup Age" });
};
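As a small example of that kind of manipulation, here is a hypothetical helper that reduces raw time-series rows (assumed to have the when, name, and value columns the graph above plots) to the latest sample per series, entirely in TypeScript. Sample and latestBySeries are illustrative names.

```typescript
// A row from a time-series query (hypothetical shape, matching the graph's columns).
export interface Sample {
  when: number; // unix seconds
  name: string; // series name, e.g. "gpu.nvidia.0.temperature"
  value: number;
}

// Keep only the most recent sample for each series.
export function latestBySeries(rows: Sample[]): Map<string, Sample> {
  const latest = new Map<string, Sample>();
  for (const row of rows) {
    const prev = latest.get(row.name);
    if (!prev || row.when > prev.when) latest.set(row.name, row);
  }
  return latest;
}
```

In ClickHouse the same result would take argMax(value, when) with GROUP BY name; the TypeScript version costs transferring every row but is plain code that is easy to read, test, and reuse.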

Why It's Better

TypeScript gives us a massive leg up over the convoluted, bespoke UIs that Grafana and Datadog offer:

  • TypeScript allows the entire SDK and API to be self-documenting and easy to work with.
  • Access to an immense number of libraries and tools that allow you to easily connect and query data from pretty much anywhere.
  • Dashboards and alerts live within the normal code development lifecycle, and the git repository saint uses can be managed by your hosted VCS/forge system like any other.
  • It's much easier to abstract common interfaces and provide users with business or application specific functions that make drafting new dashboards and alerts trivial.
  • Refactoring dashboards can be treated the same as refactoring code, and the tools we're already familiar with work great.
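As an illustration of sharing common interfaces, the GPU dashboard imports host and interval from ../common.ts and overrides the default with { ...host, default: "eos" }. The sketch below guesses at what such a shared definition could look like; the Var shape, the second host name, and resolveVar are all hypothetical.

```typescript
// Hypothetical shape of a dashboard variable; saint's real type isn't shown in the post.
export interface Var {
  name: string;
  label: string;
  options: string[];
  default?: string;
}

// Shared definition, written once in common.ts and imported by every dashboard.
export const host: Var = {
  name: "host",
  label: "Host",
  options: ["atlas", "eos"], // "atlas" is a made-up second host
};

// Accept a user-supplied value only if it is a known option; otherwise fall
// back to the variable's default, then to its first option.
export function resolveVar(v: Var, selected?: string): string {
  if (selected !== undefined && v.options.includes(selected)) return selected;
  const fallback = v.default ?? v.options[0];
  if (fallback === undefined) throw new Error(`variable ${v.name} has no options`);
  return fallback;
}
```

A dashboard can then specialize the shared variable exactly as the GPU example does, while the validation logic lives in one place.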

On top of all the magic that TypeScript provides, saint also executes all code on the server. Only the resulting data is sent to clients, which are responsible for actually rendering graphs and other panels. This approach brings numerous advantages as well:

  • Clients can no longer access datastores via a proxy or execute arbitrary queries or code, which massively shrinks saint's attack surface.
  • Resources like client connections can be tightly managed by the server to avoid overloading data stores.
  • Client performance is drastically improved, and the server can even cache data and serve it stale on first load while refreshing the dashboard in the background.
  • It's much easier to use saint as a non-graphical API consumer, for example with CLI tools or alternative rendering implementations.
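The caching point describes stale-while-revalidate behavior. A minimal in-memory sketch of that pattern, not saint's actual implementation, could look like this; SwrCache is a hypothetical name, and the injectable clock exists only to make it testable.

```typescript
export interface Entry<T> {
  value: T;
  fetchedAt: number;
  refreshing?: Promise<void>;
}

// Hypothetical stale-while-revalidate cache for panel results.
export class SwrCache<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly maxAgeMs: number;
  private readonly now: () => number;

  constructor(maxAgeMs: number, now: () => number = () => Date.now()) {
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  // Cached data is returned immediately; if it is older than maxAgeMs, a single
  // background refresh is started. Only a cold miss waits on `load`.
  async get(key: string, load: () => Promise<T>): Promise<T> {
    const entry = this.entries.get(key);
    if (!entry) {
      const value = await load();
      this.entries.set(key, { value, fetchedAt: this.now() });
      return value;
    }
    if (this.now() - entry.fetchedAt > this.maxAgeMs && !entry.refreshing) {
      entry.refreshing = load()
        .then((value) => {
          this.entries.set(key, { value, fetchedAt: this.now() });
        })
        .catch(() => {
          // Keep serving the stale value and let a later call retry.
          entry.refreshing = undefined;
        });
    }
    return entry.value;
  }
}
```

Serving the stale value first keeps page loads fast, and the refreshing promise ensures a burst of viewers triggers only one reload of the underlying queries.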

What About Alerting?

Of course, pretty dashboards are only a part of the observability story. Saint is also designed to manage alerts, and similarly, we get many of the advantages listed above. The alerting code is much less refined, but the general concept is sound and works great in my relatively limited use:

// Age thresholds in milliseconds: warn after 24 hours, alert after 28 hours.
const BACKUP_WARNING_AGE = 1000 * 60 * 60 * 24;
const BACKUP_ALERT_AGE = 1000 * 60 * 60 * 28;

export const backupAgeAlert = alert("last backup age", async function* () {
  const result = await query<{ when: number }>(`
    SELECT toUnixTimestamp(when) as when FROM yamon.logs
    WHERE service='systemd' AND data='Finished run backups.'
    ORDER BY when DESC
    LIMIT 1
  `);

  // If no backup has ever been logged, treat the age as infinite so the check escalates to ALERT.
  const latest = result.data[0];
  const sinceLastBackup = latest
    ? Date.now() - latest.when * 1000
    : Number.POSITIVE_INFINITY;

  yield check(
    "jak",
    [
      [Status.OK, sinceLastBackup < BACKUP_WARNING_AGE],
      [Status.WARNING, sinceLastBackup < BACKUP_ALERT_AGE],
      [Status.ALERT, true],
    ],
    latest
      ? `Last backup was ${humanizeDuration(sinceLastBackup)} ago`
      : "No backup has been logged"
  );
});
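The alert relies on check() picking a status from an ordered list of [status, condition] pairs. Its implementation isn't shown; here is a minimal sketch assuming first-match-wins semantics, which the trailing [Status.ALERT, true] catch-all suggests. The Status values and the CheckResult shape are hypothetical.

```typescript
// Hypothetical status values; saint's real definitions aren't shown in the post.
export const Status = { OK: "ok", WARNING: "warning", ALERT: "alert" } as const;
export type Status = (typeof Status)[keyof typeof Status];

export interface CheckResult {
  target: string; // what the check is about, e.g. a host name
  status: Status;
  message: string;
}

// Rules are evaluated in order and the first true condition decides the status,
// so a final [Status.ALERT, true] acts as the catch-all.
export function check(target: string, rules: [Status, boolean][], message: string): CheckResult {
  const match = rules.find(([, cond]) => cond);
  if (!match) throw new Error(`check "${target}": no rule matched`);
  return { target, status: match[0], message };
}
```

Ordering the rules from healthiest to worst means each condition only needs an upper bound, which keeps thresholds like the backup ages easy to read.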

To connect saint to an external service that tracks and manages alert notifications, we again just use code:

useDiscordAlerts({
  webhookURL: Deno.env.get("DISCORD_WEBHOOK_URL")!,
});

It's trivial to create custom integrations that handle alert status changes and respond accordingly.
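For example, a custom integration usually wants to fire only on transitions rather than on every evaluation. The sketch below assumes a hypothetical hook that receives each alert's name, status, and message after evaluation; onStatusChange and discordNotifier are illustrative names, not saint's API. The Discord notifier relies only on Discord's webhook endpoint accepting a JSON body with a content field.

```typescript
type Status = "ok" | "warning" | "alert";

export interface StatusChange {
  alertName: string;
  from: Status | undefined; // undefined the first time an alert is seen
  to: Status;
  message: string;
}

// Wrap a notifier so it fires only when an alert's status actually changes.
export function onStatusChange(notify: (change: StatusChange) => Promise<void>) {
  const last = new Map<string, Status>();
  return async (alertName: string, status: Status, message: string): Promise<void> => {
    const from = last.get(alertName);
    if (from === status) return;
    last.set(alertName, status);
    // Don't announce alerts that start out healthy.
    if (from === undefined && status === "ok") return;
    await notify({ alertName, from, to: status, message });
  };
}

// Example notifier that posts each transition to a Discord webhook.
export function discordNotifier(webhookURL: string) {
  return async (c: StatusChange): Promise<void> => {
    const res = await fetch(webhookURL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ content: `[${c.to.toUpperCase()}] ${c.alertName}: ${c.message}` }),
    });
    if (!res.ok) throw new Error(`Discord webhook returned ${res.status}`);
  };
}
```

Keeping the transition logic separate from the delivery function means the same wrapper can feed Discord, email, or a pager.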

What's Next

I've been running and using saint for quite a while, and I've also had some clients play around with it. So far the feedback has been positive, but the SDK and API interfaces need a lot of polish. On top of this, much of the existing SDK, like the graphing implementation (which uses Observable Plot), is very hacky and would benefit greatly from a first-party version.

If you have a use case for saint or are interested in it, please feel free to reach out over the usual channels and I'll be happy to give you access.