Add convenience metric #2105

Open
pgporada opened this issue May 14, 2020 · 4 comments

Comments

@pgporada
Contributor

pgporada commented May 14, 2020

We rely on Grafana, Prometheus, and Alertmanager for our monitoring stack. When metrics are ingested, they carry a non-human-friendly logid label, such as entries_added{logid="abcdef1234567890abc"}. The docs describe an optional display name that can be set per shard. It would be a big help if Trillian exported a metric containing the display_name, tree_state, and tree_type.

A metric such as the following would be perfect:

shard_information{logid="abcdef1234567890abc",display_name="2020",tree_state="active",tree_type="LOG"}

With this proposed metric, we could use Prometheus' group_left to make the rest of our metrics human-friendly.
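For readers less familiar with group_left: assuming the proposed shard_information metric existed (it is hypothetical at this point in the thread), a PromQL query such as entries_added * on(logid) group_left(display_name) shard_information would copy display_name onto every entries_added series. Because an info-style metric conventionally has the value 1, the multiplication leaves the original values unchanged. The Go sketch below imitates that many-to-one join in memory; the Series type and groupLeft function are illustrative only, not part of Prometheus or Trillian.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Series is a hypothetical, simplified time series: a label set plus one value.
type Series struct {
	Labels map[string]string
	Value  float64
}

// groupLeft imitates the PromQL pattern
//   left * on(onLabel) group_left(extra...) info
// Each left series is matched to the info series with the same onLabel value
// (assumed to be unique per value), and the requested extra labels are copied
// onto it. Left series with no match are dropped, as PromQL drops them too.
func groupLeft(left, info []Series, onLabel string, extra ...string) []Series {
	byKey := make(map[string]Series, len(info))
	for _, s := range info {
		byKey[s.Labels[onLabel]] = s
	}
	var out []Series
	for _, l := range left {
		m, ok := byKey[l.Labels[onLabel]]
		if !ok {
			continue
		}
		labels := make(map[string]string, len(l.Labels)+len(extra))
		for k, v := range l.Labels {
			labels[k] = v
		}
		for _, name := range extra {
			labels[name] = m.Labels[name]
		}
		out = append(out, Series{Labels: labels, Value: l.Value * m.Value})
	}
	return out
}

// String renders a series in a Prometheus-like notation with sorted labels.
func (s Series) String() string {
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = fmt.Sprintf("%s=%q", k, s.Labels[k])
	}
	return "{" + strings.Join(parts, ",") + "} " + fmt.Sprint(s.Value)
}

func main() {
	entriesAdded := []Series{{Labels: map[string]string{"logid": "abcdef1234567890abc"}, Value: 42}}
	shardInfo := []Series{{Labels: map[string]string{
		"logid": "abcdef1234567890abc", "display_name": "2020",
		"tree_state": "active", "tree_type": "LOG"}, Value: 1}}
	for _, s := range groupLeft(entriesAdded, shardInfo, "logid", "display_name") {
		fmt.Println(s) // prints {display_name="2020",logid="abcdef1234567890abc"} 42
	}
}
```

Only the labels named in group_left(...) are copied, so tree_state and tree_type stay off the result unless they are requested too.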

Here's an example of what we currently do in Grafana. Configuring this across multiple dashboards is time-consuming, requires manual intervention whenever a new shard is added, and is error-prone. Additionally, because the mapping lives in Grafana, the human-friendly name can't be included in a Prometheus alert.
[Screenshot: shard-mapping]

I've considered writing a database exporter to generate such a metric, but I think it would be better built into Trillian itself.

Perhaps this already exists and I've missed it. If not, thank you for considering it.

pgporada changed the title from "Add convenience metrics" to "Add convenience metric" on May 14, 2020
@pav-kv
Contributor

pav-kv commented Jun 18, 2020

Internally, we expose these log names as a separate metric of the logsigner binary. On our dashboards we join it with logserver and logsigner metrics. We could do something similar here. Does your setup allow joining metrics from different processes?

An alternative approach is to bake the logid->name mapping into your monitoring stack rather than Trillian metrics. I.e. you would have some (very short) map of {123->"2019", 456->"2020", ...}, and whenever you see a logid label you would automatically add a logname label from this mapping before it gets to the graphing/alerting phase. Is that possible in your stack?
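A minimal sketch of this second approach, assuming a small, hand-maintained logid to name table kept alongside the monitoring configuration: every label set that carries a known logid gets an extra logname label before it reaches graphing or alerting. The names logNames and addLogName are hypothetical. In a plain Prometheus setup the same effect could also be expressed as one metric_relabel_configs rule per log, matching on logid and setting logname.

```go
package main

import "fmt"

// logNames is a hypothetical, hand-maintained logid -> display name table,
// mirroring the {123->"2019", 456->"2020"} example above.
var logNames = map[string]string{
	"123": "2019",
	"456": "2020",
}

// addLogName returns a copy of labels with a logname label added when the
// logid label is present and known. Unknown logids are passed through
// unchanged so a missing mapping never hides a series.
func addLogName(labels map[string]string, names map[string]string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	if name, ok := names[labels["logid"]]; ok {
		out["logname"] = name
	}
	return out
}

func main() {
	in := map[string]string{"__name__": "entries_added", "logid": "456"}
	fmt.Println(addLogName(in, logNames)["logname"]) // prints 2020
}
```

The trade-off is the one pgporada describes: the table has to be updated by hand whenever a tree is added.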

@pav-kv
Contributor

pav-kv commented Jun 18, 2020

I realise that one of the reasons we don't have this metric externally (yet) is that it's computed differently from the others. All metrics in Trillian are set in-line when their value is known. The "log name / type" metric is instead collected in response to the monitoring system's "pull" queries. Does Prometheus have such "callback" kinds of metrics? We could look at supporting them at the interface level.

@pav-kv
Contributor

pav-kv commented Jun 18, 2020

By the looks of it, metrics like CounterFunc and GaugeFunc could help, but they seem to be restricted to constant labels only. Is there a workaround?

@pgporada
Contributor Author

pgporada commented Feb 10, 2021

As a workaround, I set up a MySQL data source and used the following Grafana variable query:

SELECT CAST(TreeId as CHAR) as __value, DisplayName as __text FROM Trees;

This gives a human-readable logid mapping, for example:

2023: abcsometrillianinternalidentifier123
