Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difference between logs which stores in files and logs in UI #39686

Open
1 of 2 tasks
Zoynels opened this issue May 17, 2024 · 5 comments · May be fixed by #39700
Open
1 of 2 tasks

Difference between logs which stores in files and logs in UI #39686

Zoynels opened this issue May 17, 2024 · 5 comments · May be fixed by #39700

Comments

@Zoynels
Copy link

Zoynels commented May 17, 2024

Apache Airflow version

2.9.1

If "Other Airflow 2 version" selected, which one?

No response

What happened?

We print to log some information and saw that logs has difference between logs which stores in files and logs in UI (also downloaded logs). Difference is in "empty lines" (we need them in some cases).

stored in file -- real log
downloaded from UI
Web UI:
image

What you think should happen instead?

I think that they should be equal in information that was loggen and saved.

How to reproduce

from airflow import DAG
from airflow.decorators import task
import pendulum

with DAG(
    dag_id="test_logs",
    catchup=False,
    start_date=pendulum.datetime(2024, 5, 17, tz="UTC"),
    schedule=None,
    tags=["logs"],
) as dag1:

    @task()
    def print_log1():
        print("1\n2\n\n3\r\n4\n\r5\n \n6")
        import logging        
        logger = logging.getLogger(__name__)
        logger.info("1\n2\n\n3\r\n4\n\r5\n \n6")
    print_log1()

Operating System

debian / docker

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

standard apache/airflow:2.9.1

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Zoynels Zoynels added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels May 17, 2024
Copy link

boring-cyborg bot commented May 17, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@Taragolis
Copy link
Contributor

This happen due to log deduplication, which might happen when logs streaming from remote logging

def _interleave_logs(*logs):
records = []
for log in logs:
records.extend(_parse_timestamps_in_log_file(log.splitlines()))
last = None
for _, _, v in sorted(
records, key=lambda x: (x[0], x[1]) if x[0] else (pendulum.datetime(2000, 1, 1), x[1])
):
if v != last: # dedupe
yield v
last = v

@Taragolis Taragolis added area:logging good first issue and removed area:core needs-triage label for new issues that we didn't triage yet labels May 17, 2024
@Zoynels
Copy link
Author

Zoynels commented May 17, 2024

As I understood, the main problem is in log.splitlines(), which split log-string by simple lines and not by log-messages. Then function analyzes line by line and deduplicates lines, but we need to analyze and deduplicate log-messages.
As Airflow can be configured with a custom log-format, then we need to store the pattern in config (custom patterns) to split the whole log into log-messages.

@Taragolis
Copy link
Contributor

If you have a suggestion how improve logging feel free to raise a PR which will work with any type of existed loggers without breaking changes.

@jscheffl
Copy link
Contributor

I had the same multiple times, for example using DockerOperator which logs all stdout of the container upon failure. Also the logs are messed up not only because of the split lines but also because the file log handler per default tries to sort messages. This not only causes a lot of overhead on the server, it also changes the order any makes a confusion.

Looking forward that somebody raises a PR allowing log sorting and merging to be turned off :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants