Expires notify log sooner when possible #2982

roidelapluie · 2022-07-05T10:55:14Z

It seems useless to keep the notifications in the nflog for longer than
twice the repeat interval. This should help reduce memory usage of
clustered alertmanagers.

needs tests

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>

roidelapluie · 2022-07-05T10:56:07Z

Attempt to fix #2961

roidelapluie · 2022-09-15T13:51:42Z

@gotjosh We have run this and noticed a 30% memory decrease on Alertmanager instances with a low repeat interval. Any chance of getting this merged?

simonpasquier · 2022-09-16T14:41:36Z

notify/notify.go

@@ -785,7 +785,12 @@ func (n SetNotifiesStage) Exec(ctx context.Context, l log.Logger, alerts ...*typ
 		return ctx, nil, errors.New("resolved alerts missing")
 	}

-	return ctx, alerts, n.nflog.Log(n.recv, gkey, firing, resolved)
+	var expiry time.Duration
+	if n, ok := RepeatInterval(ctx); ok {


Shouldn't the repeat interval always be present in the context? At least *DedupStage.Exec() returns an error if it is missing. I suggest that we do the same here.


simonpasquier

LGTM. I have a question about how the data retention interacts with the repeat interval.

simonpasquier · 2022-10-14T12:53:47Z

nflog/nflog.go

@@ -415,6 +415,11 @@ func (l *Log) Log(r *pb.Receiver, gkey string, firingAlerts, resolvedAlerts []ui
 		}
 	}

+	expiresAt := now.Add(l.retention)
+	if expiry > 0 && l.retention > expiry {


Wouldn't it make sense to use the expiry whenever expiry > 0, and fall back to the default retention only otherwise?
If repeat_interval is 10d (because why not), it would make sense to keep the log for longer than the default 120h retention. We could also get rid of this check:

alertmanager/cmd/alertmanager/main.go

Lines 488 to 500 in d034f11

	if r.RouteOpts.RepeatInterval > *retention {
		level.Warn(configLogger).Log(
			"msg",
			"repeat_interval is greater than the data retention period. It can lead to notifications being repeated more often than expected.",
			"repeat_interval",
			r.RouteOpts.RepeatInterval,
			"retention",
			*retention,
			"route",
			r.Key(),
		)
	}
})

roidelapluie · 2022-10-19T07:26:29Z

This would be a change of behaviour; let's merge this while we think about the further change.

roidelapluie requested a review from gotjosh July 5, 2022 10:56

roidelapluie force-pushed the expiresoon branch from f2f5789 to 7d9c41b July 18, 2022 08:17

roidelapluie changed the title from "[RFC] Expires notify log sooner when possible" to "Expires notify log sooner when possible" Sep 15, 2022

simonpasquier reviewed Sep 16, 2022


Expires notify log sooner when possible

b044302

It seems useless to keep the notifications in the nflog for longer than twice the repeat interval. This should help reduce memory usage of clustered alertmanagers. Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>

roidelapluie force-pushed the expiresoon branch from 7d9c41b to b044302 October 14, 2022 08:03

roidelapluie requested a review from simonpasquier October 14, 2022 08:05

simonpasquier approved these changes Oct 14, 2022


roidelapluie merged commit 21ca295 into prometheus:main Oct 19, 2022
