Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kola/rpm-ostree/replace-rt-kernel: do not hardcode kernel versions #1309

Merged
merged 1 commit into from
Jun 14, 2023

Conversation

jmarrero
Copy link
Member

@jmarrero jmarrero commented Jun 7, 2023

closes #1099

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 7, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 7, 2023
@jmarrero jmarrero marked this pull request as ready for review June 7, 2023 17:38
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2023
@cgwalters
Copy link
Member

Continuing this thread #1306 (comment) on this PR:

Agree eventually we should stop parsing HTML for the previous kernel test but doing that is going to require handling the "repoquery" stuff either in rpm-ostree or temporarily layering dnf or something else

As far as the --from thing for the kernel-rt override, I think today that that is actually a no-op in this case and we'll just use the latest from whatever repo - we're not repository-locking. It seems actually a bit more confusing to me to provide it because it won't do anything.

@jmarrero jmarrero force-pushed the rt-kernel-vers branch 3 times, most recently from bdb6084 to 88e659d Compare June 8, 2023 13:19
@jmarrero jmarrero requested a review from cgwalters June 8, 2023 13:27
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2023
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 1550016 and 2 for PR HEAD 25c0b98 in total

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2023
Copy link
Member

@travier travier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@travier
Copy link
Member

travier commented Jun 9, 2023

/retest

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 9, 2023
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 4874462 and 2 for PR HEAD f15d050 in total

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 4874462 and 2 for PR HEAD ad4995c in total

@LorbusChris
Copy link
Member

/test scos-9-build-test-qemu

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD c546e95 and 1 for PR HEAD ad4995c in total

@LorbusChris
Copy link
Member

I tested this manually on SCOS and can't get it to fail. If this fails again might try even more memory.

I'm also thinking this is a memory issue. In the SCOS community builds we give the pods running COSA 4gb ram and 1 full CPU.

@travier
Copy link
Member

travier commented Jun 12, 2023

Let's bump this to 4GB here for this test only, master only as we don't run SCOS on other branches: https://github.com/openshift/release/blob/master/ci-operator/config/openshift/os/openshift-os-master.yaml#L78

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 12, 2023
@jmarrero
Copy link
Member Author

lowering to 2GB and making this RHCOS only for now. I'll add this test for SCOS to my backlog.

## # We've seen some OOM when 1024M is used in similar tests:
## # https://github.com/coreos/fedora-coreos-tracker/issues/1506
## minMemory: 2048
## distros: rhcos
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have an scos switch here, rhcos is aliased to scos. You'll have to keep the denylist entry above but just for SCOS so c9s.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could drop the denylist entry.. Maybe we use is_scos() inside the test?

If the reason it's failing is because of #1309 (comment) then we could just do the rt test on SCOS, which is still a pretty good test.

@dustymabe
Copy link
Member

what failures are we seeing with SCOS? Could it be that there is nothing to override because we are now using the latest version of the kernel in c9s and it matches the already baked in kernel in the OSTree?

@jmarrero
Copy link
Member Author

jmarrero commented Jun 12, 2023

Jun 12 14:01:32 qemu0 kola-runext-replace-rt-kernel[2030]: Resolving dependencies...done
Jun 12 14:01:32 qemu0 kola-runext-replace-rt-kernel[2030]: error: Bus owner changed, aborting. This likely means the daemon crashed; check logs with `journalctl -xe`.

error: Bus owner changed, aborting. is the error. If it was something with the repos I can tweak the test easily. Sadly this error I have seen it in low memory scenarios but for some reason even with 4GB I am seeing the test fail like this on SCOS.

#1315

@jmarrero
Copy link
Member Author

/retest

runv rm -rf /etc/yum.repos.d/*
runv curl -sSLf "${repo_url}" -o /etc/yum.repos.d/cs.repo
runv curl -sSLf https://centos.org/keys/RPM-GPG-KEY-CentOS-Official -o /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official
runv sed -i 's|^gpgkey.*|gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official|' /etc/yum.repos.d/cs.repo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sed invocation also replaces other unrelated keys.

Analog to https://github.com/openshift/os/blob/master/extensions/Dockerfile#L13-L17:

Suggested change
runv sed -i 's|^gpgkey.*|gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Official|' /etc/yum.repos.d/cs.repo
runv sed -i 's|/usr/share/distribution-gpg-keys/centos|/etc/pki/rpm-gpg|' /etc/yum.repos.d/cs.repo

Edit: Edited for clarity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here. This is running inside the RHCOS machine (throw away test machine). What change are you suggesting and why is it important?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it looks like rpm-ostree may be failing when trying to verify the repos in c9s.repo. The keys for all of them have to be available. Blanket replacing all of them with one key won't work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. but this test as written passes on RHCOS (@jmarrero, correct me if I am wrong).

According to @jmarrero it fails on SCOS though and we aren't sure why yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we do have repo_gpgcheck=0 on the repos in c9s.repo. I might've pulled a false alarm here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct this passes on RHCOS both on CI & locally

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that those reference distribution-gpg-keys which we don't ship in the OS by default. I think @LorbusChris 's suggestion makes sense...

But I don't understand actually why we're removing the repo files by default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because this is a test for RHCOS only at the moment, I am just making sure there nothing on yum.repos.d before downloading the centos repo. When I get to the SCOS test, the plan is to just use the provided repos/keys and just turn on the rt repo if it's disabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see also #1316

@LorbusChris
Copy link
Member

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 13, 2023
@dustymabe
Copy link
Member

dustymabe commented Jun 13, 2023

error: Bus owner changed, aborting. is the error.

What do the journal logs say?

@LorbusChris
Copy link
Member

My previous hold was probably a false alarm, but I'm still unsure why this fails on SCOS
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 13, 2023
@jmarrero
Copy link
Member Author

I opened #1315 to investigate the SCOS failure, CI is not saving the journal for this test (unless I am looking in the wrong place, there is a link to the logs in the issue). On a SCOS VM running the test manually I could not get it to fail, so I need to build scos and run the test against it locally to see if it fails and debug further.

@jmarrero
Copy link
Member Author

/retest

@jmarrero
Copy link
Member Author

/test rhcos-92-build-test-qemu

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 14, 2023

@jmarrero: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 14, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, dustymabe, jmarrero, travier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,dustymabe,jmarrero,travier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 1f2c0eb into openshift:master Jun 14, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix rt kernel test to work across versions
7 participants