Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "alertmanager_provider_alerts" metric #1589

Closed
wants to merge 1 commit into from

Conversation

Pharb
Copy link

@Pharb Pharb commented Oct 17, 2018

Fixes #1439 by adding a separate alertmanager_provider_alerts metric.

@Pharb
Copy link
Author

Pharb commented Nov 4, 2018

@simonpasquier As you were responding to #1439 previously, if you have time, could you take a quick look at this and maybe give some feedback for my changes? Thank you 👍

@juliusv
Copy link
Member

juliusv commented Dec 7, 2018

If this still preserves the state="active" label for resolved alerts, I think that's a bit confusing. Maybe rename that to suppressed="true|false"?

Or alternatively only expose non-resolved (actually active) alerts, to match what people would see on /alerts.

@juliusv
Copy link
Member

juliusv commented Dec 7, 2018

Ah, state can also be unprocessed (and maybe more, didn't check), so just true/false for suppressed wouldn't work.

@Pharb
Copy link
Author

Pharb commented Dec 7, 2018

@juliusv Thanks for your feedback. I agree that the whole alert "status/states" are a bit confusing in Alertmanager.

The alertmanager_alerts{state="firing"} metric from this PR should match the number of currently "active" alerts from the UI as I would expect as a user of Alertmanager.

@Pharb
Copy link
Author

Pharb commented Dec 16, 2018

I rebased against the current master due to a minor merge conflict of imports in cmd/alertmanager/main.go.

@Pharb
Copy link
Author

Pharb commented Dec 16, 2018

The continuous-integration/travis-ci/pr check failed due to an unrelated issue that currently affects master (e.g. https://travis-ci.org/prometheus/alertmanager/builds/468643002).

@nielsole
Copy link

Hey, are you still working on this? :)

@Pharb
Copy link
Author

Pharb commented Feb 25, 2019

@nielsole Yes, I just noticed a few days ago that there are now merge conflicts due to the refactoring in #1741. I will try to get those fixed in the next few days.

Sadly I still cannot say if the PR will then be accepted, but at least I cannot run many Alertmanager instances in production without this fix included... (currently I have my fork deployed in multiple production clusters).

@Pharb Pharb changed the title Add "firing" and "resolved" status to "alertmanager_alerts" metric Add "alertmanager_provider_alerts" metric Mar 6, 2019
@Pharb
Copy link
Author

Pharb commented Mar 6, 2019

Due to the recent refactoring in #1741 I changed the PR to add a new alertmanager_provider_alerts metric, because that seems to be the simplest solution at the moment.
Maybe that also avoids the confusion between having firing and active values on the same metric.

@juliusv Would be great to get this accepted, as this works pretty well for my use case and solves #1439 by providing an alternative to the alertmanager_alerts metric. Of course I'm also open for feedback on the implementation or other suggestions on how to solve this problem.

@nielsole
Copy link

Any updates on this? This week I would have time to support in any way :)

@Pharb Pharb force-pushed the pharb-alerts-metric branch 2 times, most recently from 86b50d2 to 3b6d5fe Compare May 22, 2019 09:54
@Pharb
Copy link
Author

Pharb commented May 22, 2019

Is there anything preventing this from being merged? Would be nice to get these reliable metrics built-in, this works quite well for me in production. @juliusv

Copy link
Member

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Pharb sorry for the very late reply. One thing to remember is that resolved alerts can be removed at any time (eg when the garbage collection kicks in) meaning that alertmanager_provider_alerts{state="resolved"} will have occasional drops. I'd be inclined to expose only the current number of firing alerts but happy to hear about other opinions.

provider/mem/mem.go Outdated Show resolved Hide resolved
provider/mem/mem.go Outdated Show resolved Hide resolved
@Pharb Pharb force-pushed the pharb-alerts-metric branch 2 times, most recently from 6a2dccd to 091062b Compare June 17, 2019 16:27
@Pharb
Copy link
Author

Pharb commented Jun 17, 2019

@simonpasquier Thank you for taking a look at this.

As suggested by you, I removed alertmanager_provider_alerts{state="resolved"}, because I don't have a use case for that. If anybody would need that, it is very easy to add again. I also implemented and resolved your other comments.

@Pharb Pharb force-pushed the pharb-alerts-metric branch 2 times, most recently from 4ce97b3 to cb33b3c Compare October 1, 2019 13:02
@Pharb
Copy link
Author

Pharb commented Oct 1, 2019

@simonpasquier Thanks for your help so far.
Do you have any suggestion what I could do to get this merged?

see prometheus#1439

Signed-off-by: Patrick Harböck <patrick.harboeck@tngtech.com>
@Pharb
Copy link
Author

Pharb commented Jun 17, 2022

PR is now obsolete because an alternative implementation was merged yesterday: #2943

Issue is finally solved 🎉

@Pharb Pharb closed this Jun 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metric alertmanager_alerts reports "active" alerts when there are none
4 participants