Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async log4j2 log events are not counted accurately #2176

Closed
bol-com-pschmitz opened this issue Jul 3, 2020 · 25 comments · Fixed by #2183
Closed

Async log4j2 log events are not counted accurately #2176

bol-com-pschmitz opened this issue Jul 3, 2020 · 25 comments · Fixed by #2183
Labels
bug A general bug help wanted An issue that a contributor can help us with
Milestone

Comments

@bol-com-pschmitz
Copy link

Micrometer provides the Log4j2Metrics meter binder to add metrics for Log4j2 log events. Unfortunately the implementation is flawed. It is implemented as a filter, but filters can be invoked multiple times and must therefore not have any side effects.

In practice the filter is invoked multiple times by Log4j2 in many circumstances (e.g. in the presence of multiple and/or async loggers). The result is that the counter is increased multiple times for each single log event and the counter values become too high.

@bol-com-pschmitz
Copy link
Author

bol-com-pschmitz commented Jul 3, 2020

Note that the issue is not that the filter is added multiple times. The same filter is invoked multiple times. You can for example see this in AsyncLoggingConfig.log(LogEvent, LoggerConfigPredicate). This first calls super.log(...), which invokes the filter, and then it calls logToAsyncDelegate(...), which again invokes the filter, and again when it is invoked on the background thread to perform the actual logging asynchronously.

@schlangz
Copy link

schlangz commented Jul 4, 2020

So this only applies to async loggers?

@shakuzen
Copy link
Member

shakuzen commented Jul 6, 2020

@bol-com-pschmitz thank you for the report. Just so we're explicit about the scenario and that a fix properly addresses it, would you be able to write a test demonstrating it like other tests in our Log4j2MetricsTest?

@shakuzen shakuzen added the bug A general bug label Jul 6, 2020
@shakuzen shakuzen added this to the 1.1.x milestone Jul 6, 2020
@bol-com-pschmitz
Copy link
Author

bol-com-pschmitz commented Jul 6, 2020

So this only applies to async loggers?

I'm not sure, possibly. That's how it occurred in my case. It doesn't matter though; Log4j2Metrics should not be implemented as a filter. Filters must always be able to be invoked multiple times and must never have side effects (such as incrementing metrics) as a general design principle.

@bol-com-pschmitz
Copy link
Author

@bol-com-pschmitz would you be able to write a test demonstrating it like other tests in our Log4j2MetricsTest?

I'll take a look.

@dennysfredericci
Copy link
Contributor

Hi all, hello from Belgium! :)

I just create a test for this issue and indeed the filter have been called multiple times for same event.

@bol-com-pschmitz
Copy link
Author

Thanks @dennysfredericci ! I was about to start doing that.

@dennysfredericci
Copy link
Contributor

dennysfredericci commented Jul 12, 2020

There is a method isEndOfBatch on Log4jEventLog which can be used to see if this is the "last" call.

/** * Returns {@code true} if this event is the last one in a batch, {@code false} otherwise. Used by asynchronous * Loggers and Appenders to signal to buffered downstream components when to flush to disk, as a more efficient * alternative to the {@code immediateFlush=true} configuration. * * @return whether this event is the last one in a batch. */ // see also LOG4J2-164 boolean isEndOfBatch();

I changed the implementation to test this idea, but it breaks all other tests.

@dennysfredericci
Copy link
Contributor

Hi Guys,

When we have an async logger configured a call to log.info or any other level triggers 2 times the filter, i found a way to use the isEndOfBatch attribute from LogEvent to check if this is the last call and then metrics counter will be incremented only once instead of multiple times.

I also create a simple spring-boot project to test it locally. You can see it here: https://github.com/dennysfredericci/springboot-micrometer-issue-2183

@dennysfredericci
Copy link
Contributor

Hi @shakuzen

Just noticed now, I did the PR for master branch should I change it to 1.1.X branch?

@shakuzen shakuzen modified the milestones: 1.1.x, 1.1.16 Jul 23, 2020
shakuzen pushed a commit to dennysfredericci/micrometer that referenced this issue Jul 23, 2020
shakuzen pushed a commit to dennysfredericci/micrometer that referenced this issue Jul 23, 2020
@shakuzen
Copy link
Member

Resolved by #2183

@shakuzen
Copy link
Member

@bol-com-pschmitz would you confirm if the issue is fixed for you? The fix is available in versions 1.1.16, 1.3.11, and it will be in the upcoming 1.5.3 release (you can try the latest snapshot version 1.5.3-SNAPSHOT for now).

@bol-com-pschmitz
Copy link
Author

bol-com-pschmitz commented Jul 30, 2020

@bol-com-pschmitz would you confirm if the issue is fixed for you? The fix is available in versions 1.1.16, 1.3.11, and it will be in the upcoming 1.5.3 release (you can try the latest snapshot version 1.5.3-SNAPSHOT for now).

I'm testing it. At first glance it looks like now it might be undercounting. I'm not sure what constitutes a "batch", but if it can result in more than one log entry then this is not going to work.

@shakuzen
Copy link
Member

shakuzen commented Aug 5, 2020

@bol-com-pschmitz Thanks for testing. Let us know if there's another issue. If we can get a unit test to reproduce, that'll be best so we can make sure it gets fixed properly and stays fixed.

bol-com-pschmitz added a commit to bol-com-pschmitz/micrometer that referenced this issue Aug 5, 2020
@bol-com-pschmitz
Copy link
Author

Unfortunately this approach does not work. It counts batches of log events, not log events. This small change to the unit test added by @dennysfredericci demonstrates it:

        assertThat(registry.get("log4j2.events").tags("level", "info").counter().count()).isEqualTo(0);
        logger.info("Hello, world!");
        logger.info("Hello, world!");
        logger.info("Hello, world!");
        assertThat(registry.get("log4j2.events").tags("level", "info").counter().count()).isEqualTo(3);

@bol-com-pschmitz
Copy link
Author

This means the issue has not been fixed and should be reopened.

@dennysfredericci
Copy link
Contributor

Indeed I can see it as well 😞

@shakuzen shakuzen changed the title Log4j2 log events are counted multiple times Async log4j2 log events are not counted accurately Aug 6, 2020
@shakuzen shakuzen added the help wanted An issue that a contributor can help us with label Aug 6, 2020
@shakuzen
Copy link
Member

shakuzen commented Aug 6, 2020

I'm no log4j2 expert, so I'm hoping we can get some help on a solution for this from someone more knowledgeable than me on this area. I've marked the issue as help wanted.

@dennysfredericci
Copy link
Contributor

So, let's became a log4j2 expert 😄

I just started a thread on log4j2 discussion list to see if is possible to use a filter for that use case.

dennysfredericci pushed a commit to dennysfredericci/micrometer that referenced this issue Aug 11, 2020
@dennysfredericci
Copy link
Contributor

dennysfredericci commented Aug 11, 2020

Sorry @dennysfredericci . I appreciate your effort! But I still think that this should not be implemented as a filter at all. Filters can be invoked any number of times and should not have side effects such as incrementing counters.

Hi @bol-com-pschmitz this seems an unexpected behavior, I got the message below on log4j mail list

"... Filters that are attached to Loggers, Appender-Refs and Appenders should only be called once ..."

I just included a reconfiguration call and now the filter is called just once, but I am getting the exception below.

2020-08-11 22:24:28,784 Test worker ERROR Could not register mbeans java.lang.NullPointerException
	at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor.createRingBufferAdmin(AsyncLoggerConfigDisruptor.java:417)
	at org.apache.logging.log4j.core.async.AsyncLoggerConfig.createRingBufferAdmin(AsyncLoggerConfig.java:199)
	at org.apache.logging.log4j.core.jmx.Server.registerLoggerConfigs(Server.java:366)
	at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:186)
	at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:141)
	at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:629)
	at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:712)
	at org.apache.logging.log4j.core.config.Configurator.reconfigure(Configurator.java:239)
	at io.micrometer.core.instrument.binder.logging.Log4j2Metrics.bindTo(Log4j2Metrics.java:101)

In summary, this can be an issue in how we add the filters to log4j configuration.

@dennysfredericci
Copy link
Contributor

As is not possible to use log4j filters to count log events properly I requested a statistics feature for the log4j team.

With some luck, we can have these counters direct from log4j API.

@shakuzen
Copy link
Member

apache/logging-log4j2#1550 has been resolved and is marked for inclusion in log4j2 2.20.1. Once that is released and we've upgraded, we'll add a test along the lines of #2176 (comment) to ensure the issue no longer affects us. Hopefully then we can close this issue.

jonatan-ivanov added a commit that referenced this issue Oct 20, 2023
In case of async loggers, the MetricsFilter#filter method is/was called
multiple times. Because of this, a check was introduced in gh-2183
assuming that if the event has the isEndOfBatch flag set to true, that
is the last filter method call for that event.
Unfortunately, it turned out that this approach did not work, since it
did not just filter out the unwanted multiple calls on one event but it
also filtered out all the filter method calls on events that were not at
the end of the async batch. So Log4j2Metrics counted batches of events,
not the individual events.
Fortunately multiple filter invocations was fixed in Log4j2,
see apache/logging-log4j2#1550 and apache/logging-log4j2#1552.
Since now there will be only one filter method call,
the check introduced in gh-2183 can and should be removed (the call to
the filter method is before the isEndOfBatch flag is set so
the flag will always return false).

Closes gh-2176
See gh-2183
See gh-4253
jonatan-ivanov added a commit that referenced this issue Oct 20, 2023
In case of async loggers, the MetricsFilter#filter method is/was called
multiple times. Because of this, a check was introduced in gh-2183
assuming that if the event has the isEndOfBatch flag set to true, that
is the last filter method call for that event.
Unfortunately, it turned out that this approach did not work, since it
did not just filter out the unwanted multiple calls on one event but it
also filtered out all the filter method calls on events that were not at
the end of the async batch. So Log4j2Metrics counted batches of events,
not the individual events.
Fortunately multiple filter invocations was fixed in Log4j2,
see apache/logging-log4j2#1550 and apache/logging-log4j2#1552.
Since now there will be only one filter method call,
the check introduced in gh-2183 can and should be removed (the call to
the filter method is before the isEndOfBatch flag is set so
the flag will always return false).

Closes gh-2176
See gh-2183
See gh-4253
jonatan-ivanov added a commit that referenced this issue Oct 20, 2023
In case of async loggers, the MetricsFilter#filter method is/was called
multiple times. Because of this, a check was introduced in gh-2183
assuming that if the event has the isEndOfBatch flag set to true, that
is the last filter method call for that event.
Unfortunately, it turned out that this approach did not work, since it
did not just filter out the unwanted multiple calls on one event but it
also filtered out all the filter method calls on events that were not at
the end of the async batch. So Log4j2Metrics counted batches of events,
not the individual events.
Fortunately multiple filter invocations was fixed in Log4j2,
see apache/logging-log4j2#1550 and apache/logging-log4j2#1552.
Since now there will be only one filter method call,
the check introduced in gh-2183 can and should be removed (the call to
the filter method is before the isEndOfBatch flag is set so
the flag will always return false).

Closes gh-2176
See gh-2183
See gh-4253
@jonatan-ivanov
Copy link
Member

jonatan-ivanov commented Oct 20, 2023

It seems log4j2 2.20.1 was not released, the latest patch release on the 2.20.x line is 2.20.0 (release date: 2023-02-17) and the fix (apache/logging-log4j2#1550) was released in 2.21.0.

Our main branch was upgraded to this version and I eliminated the previous isEndOfBatch check in #4253 but I have a few questions:

  1. Do we need to handle any way if an earlier log4j2 version is on the classpath? I think we don't need to do anything since the behavior was broken one way or the other.
  2. Do we want to back-port this to earlier versions too, like 1.9.x? I think so since log4j2 is an optional dependency for us.
  3. Could you please check if my changes and what I wrote down in the commit message seem ok? (see 61b3074 and Bump org.apache.logging.log4j:log4j-core from 2.20.0 to 2.21.0 #4253)

@shakuzen
Copy link
Member

I think we're in a bit of a hard spot. I was hoping that the revised test would pass on new versions of log4j2 without changes to our main code. It seems that isn't the case. The changes made to main code reintroduce an issue (see #2183) of over-counting async logs for anyone on versions of log4j2 prior to 2.21.0. As you pointed out, the count isn't right either way, but it's somewhat better to under-count than to over-count, I think. And changing the behavior to over-counting in a patch release where the only remedy is to upgrade to a new minor version of a dependency doesn't feel great either. For those reasons, I am leaning toward applying the changes only in main for the upcoming 1.12.0 and noting the minimum version of log4j2 users of async loggers should be on. I could be convinced otherwise if someone has reasons they think we should do otherwise.

I think the commit message and changes look good to me.

jonatan-ivanov added a commit that referenced this issue Oct 25, 2023
In case of async loggers, the MetricsFilter#filter method is/was called
multiple times. Because of this, a check was introduced in gh-2183
assuming that if the event has the isEndOfBatch flag set to true, that
is the last filter method call for that event.
Unfortunately, it turned out that this approach did not work, since it
did not just filter out the unwanted multiple calls on one event but it
also filtered out all the filter method calls on events that were not at
the end of the async batch. So Log4j2Metrics counted batches of events,
not the individual events.
Fortunately multiple filter invocations was fixed in Log4j2,
see apache/logging-log4j2#1550 and apache/logging-log4j2#1552.
Since now there will be only one filter method call,
the check introduced in gh-2183 can and should be removed (the call to
the filter method is before the isEndOfBatch flag is set so
the flag will always return false).

Closes gh-2176
See gh-2183
See gh-4253
@jonatan-ivanov jonatan-ivanov modified the milestones: 1.9.x, 1.12.0 Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A general bug help wanted An issue that a contributor can help us with
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants