Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run tests in parallel on Windows and always cleanup test directories #5879

Merged
merged 9 commits into from
Oct 7, 2024

Conversation

chrisd8088
Copy link
Member

Since we revised our test suite to use the prove command in PR #3125, an undiagnosed bug in our lfstest-count-tests test helper program has prevented us from running our test scripts in parallel on Windows. Instead, several workarounds were introduced in that PR to avoid counting parallel test script executions and run all our test scripts sequentially on Windows, which slows down our CI jobs.

(The lfstest-count-tests program is used to track the number of active test scripts, so our lfstest-gitserver test server program can be started once, shared by all our test scripts, and then stopped after they are all complete. This works as expected on Linux and macOS systems.)

In this PR, we fix the bug in the lfstest-count-tests program, which is the result of the fact that we never call Close() on the test_count.lock file we open to ensure exclusive access to the test_count file. On Unix systems, this causes no problems, because when we call Remove() to delete the lock file, the operating system allows us to unlink the file while a file handle remains open. On Windows, however, this is not permitted, so the Remove() call does not succeed, leaving the lock file in place. Thus when we run the lfstest-count-tests program a second time, it is unable to acquire the lock and exits with an error.

With the bug fixed, we then remove the workarounds from PR #3125, which set the GIT_LFS_NO_TEST_COUNT and GIT_LFS_LOCK_ACQUIRE_DISABLED environment variables on Windows and use their presence to avoid running the lfstest-count-tests program and, when it is run, to avoid creating or removing the lock file.


In addition, we also make sure our test suite once again removes all the temporary directories it creates to hold the Git repositories created by our tests. Although we set the RM_GIT_LFS_TEST_DIR environment variable to yes when we use mktemp(1) to create a temporary directory, this is a remnant of PR #1107, and has not been honoured since the relevant cleanup code was removed from our shutdown() shell function in PR #3125. In this PR, we restore the cleanup steps from the original implementation.

Moreover, we ensure that we export the GIT_LFS_TEST_DIR and RM_GIT_LFS_TEST_DIR environment variables so that our test scripts can reuse the same temporary directory, instead of each script creating a unique directory, as happens now.

At present, running the make test build target on any platform results in a large number of stale temporary directories being left behind, which no longer occurs with the changes in this PR.


Finally, we remove some unused test variables, improve the lfstest-count-tests program's handling of file permissions and errors, revise our Linux RPM package builds to run more test scripts in parallel, and skip installing Strawberry Perl on Windows as our CI workflow now installs the Git for Windows SDK, which includes Perl and the prove command.

This PR will be most easily reviewed on a commit-by-commit basis, with whitespace differences ignored. Each commit has a detailed description of its changes.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
The SHUTDOWN_LFS environment variable was added as part of the
original integration test suite implementation in commit
b1f5025 of PR git-lfs#306, and was
used to indicate whether the Git server which runs as part of the
test suite should be stopped by the shutdown() shell function
at the end of a set of tests, or only when the last set of tests
completed.

When the integration test suite was refactored to use the "prove"
command and to align with Git's test suite, in PR git-lfs#3125, most
references to the SHUTDOWN_LFS environment variable were removed,
particularly in commits 1926e94
and a4cc106.

However, one reference was left in the t/testlib.sh script,
so we remove it now.
The LFS_CONFIG variable, which we set in our t/testenv.sh library,
is not referenced by any part of our current test suite.  The last
use of this variable was removed in 2015 in commit
1feee58 of PR git-lfs#335, when the
test suite was revised to create a private home directory for the
tests, so as to be able to set a standard global Git configuration
separate from that of the current user's configuration.  Hence we
can just remove the LFS_CONFIG variable now.
Since commit c591ff7 of PR git-lfs#3125 we
have installed Strawberry Perl before running our CI test suite on
Windows.  This was necessary when we used the AppVeyor service to run
our test suite on Windows, and subsequently also when we converted our
CI jobs to GitHub Actions in PR git-lfs#3808.  We require Perl because we use
the "prove" command, which runs all of our t/t-*.sh test scripts and
collects and summarizes the results.

However, since commit 8818630 of
PR git-lfs#5666 we have installed the Git for Windows SDK before running
our test suite, and it includes a full Perl distribution, which means
we no longer need to install Strawberry Perl separately.
Since commit 9b0adeb of PR git-lfs#3125,
when we converted our test suite to use the "prove" command, we have
run up to nine test jobs in parallel in our primary test builds.

However, as of commit 7fd5aa6 in
PR git-lfs#3144, we only use a maximum of four parallel test jobs when
building our Linux RPM packages.  This has the effect of slightly
slowing down our CI suite, because, ever since commit
212c051 in PR git-lfs#3932, we build a
range of Linux packages in our GitHub Actions CI workflow.

(Note that for historical reasons, we only run the full "make test"
Makefile target in our "t" test directory when building RPM packages;
in our Debian packages, we just run the Go language tests.  While this
discrepancy is worth addressing, for the moment we simply aim to
improve the runtime of our RPM package builds.)
In commit 1926e94 of PR git-lfs#3125 we
migrated our test suite to use the "prove" command, and introduced
the lfstest-count-tests utility to help control a common instance of
our lfstest-gitserver program, which can be shared and reused by all
the individual test scripts.  In that PR we also moved our test
suite into the "t" directory, and added a Makefile with a "test"
target that runs the "prove" command over all the individual test
scripts; this is still the framework we use today.

The lfstest-count-tests utility did not work as intended on the
Windows platform, however, as noted in the description of commit
9b73c80 in the same PR:

  On the AppVeyor CI Windows environment, there exist issues that prevent
  the lfstest-count-tests program from acquiring the test_count.lock file
  correctly.

Rather than debug this problem, several workarounds were added, such
as the GIT_LFS_NO_TEST_COUNT environment variable, which disables the
use of the lfstest-count-tests utility in our individual t/t-*.sh
test scripts, and the GIT_LFS_LOCK_ACQUIRE_DISABLED environment variable,
which is used to prevent the lfstest-count-tests utility from trying
to acquire the test_count.lock file at all on Windows.

The bug which causes the lfstest-count-tests program to be unable to
acquire the lock file the second time the program is run is the result
of the lock file not being removed when the program runs the first time.
This in turn is due to the fact that the file handle returned by
the call to the OpenFile() function of the "os" package is never
released with a call to Close(), so the subsequent call to Remove()
results in failure.

On Unix systems, this does not occur, because the file can be unlinked
while file handles to it remain open.  On Windows, however, all file
handles must be closed before the file can be deleted.

We fix the underlying bug here by simply calling Close() on the new
file handle immediately after creating the file, so long as no error
was returned by the call to OpenFile().

In addition, we make a number of other fixes and enhancements to the
lfstest-count-tests program.  We use the more conventional permissions
mode of 0777 for any directories we create, rather than 0666.
We include the O_RDWR flag with the others we pass to the OpenFile()
function, since we are technically supposed to provide at least one
of O_RDWR, O_RDONLY, and O_WRONLY.  And we output an error message
if the deferred call to our release() function is unable to delete
the lock file.
In commit 9b73c80 of PR git-lfs#3125 the
GIT_LFS_LOCK_ACQUIRE_DISABLED environment variable was introduced
for use with our test suite.  When this variable is set to the value
"1", it indicates to the lfstest-count-tests test utility that it
should skip creating a test_count.lock file before updating the
test_count file we use to track the number of active test scripts.

This was done because the lfstest-count-tests program had a bug which
prevented it from deleting the lock file on Windows, causing subsequent
invocations of the program to be unable to exclusively create it
again.  As this bug was not resolved at the time, the choice was
made to work around the problem by skipping the creation of a
lock file altogether on Windows.

This change was also made in conjunction with the addition of another
environment variable, GIT_LFS_NO_TEST_COUNT, in commit
c591ff7 of the same PR.  That variable
is also set only on Windows, and stops the setup() and shutdown()
shell functions defined in our t/testhelpers.sh script from calling
the lfstest-count-tests program.

As we have now resolved the underlying problem in the lfstest-count-tests
utility, in a prior commit in this PR, we can now remove both of
these environment variables and the test suite features they control.

We start by removing support for the GIT_LFS_LOCK_ACQUIRE_DISABLED
environment variable from the lfstest-count-tests utility, and
changing our script/cibuild script so it no longer sets the variable.

We can also remove the environment variable from the list of those
we unset before running certain tests.  We had to introduce this
behaviour in commit aca1aff of
PR git-lfs#3808, when we migrated our CI jobs to GitHub Actions.  However,
we will no longer need to unset these variables as we will now never
set them at all, and they will have no effect if they were set.
In commit c591ff7 of PR git-lfs#3125 the
GIT_LFS_NO_TEST_COUNT environment variable was introduced for use
with our test suite.  When this variable is set to a non-empty
value, it indicates to the setup() and shutdown() test helper
shell functions that they should skip running the lfstest-count-tests
utility program entirely.

This was done because the lfstest-count-tests program had a bug which
prevented it from deleting the lock file on Windows, causing subsequent
invocations of the program to be unable to exclusively create it
again.  As this bug was not resolved at the time, the choice was
made to work around the problem in part by not counting the number
of running test scripts on Windows.  (We also introduced the
GIT_LFS_LOCK_ACQUIRE_DISABLED environment variable, in commit
9b73c80 of the same PR, to further
address the problem.)

When running our test suite on Windows, we set the GIT_LFS_NO_TEST_COUNT
to "1", which causes the individual test scripts to skip invoking
lfstest-count-tests entirely.  Instead, the program is run just once
in the "test" t/Makefile target before executing the test scripts with
the "prove" command, and then once again afterwards, both times by
explicitly resetting the GIT_LFS_NO_TEST_COUNT variable to an empty
string value.  These two invocations cause the lfstest-count-tests
program to start the lfstest-gitserver program before all the test
scripts are executed and then stop it once they have all finished.

As we have now resolved the underlying problem in the lfstest-count-tests
utility, in a prior commit in this PR, we can remove this environment
variable and the test suite features it controls.

We change our script/cibuild script to no longer set the
GIT_LFS_NO_TEST_COUNT variable, and we revise the conditionals
in the setup() and shutdown() test helper shell functions so they
always call the lfstest-count-tests program, rather than only doing
so if the GIT_LFS_NO_TEST_COUNT variable is unset or empty.  We
also change the "test" target's recipe in our t/Makefile file to
not unset GIT_LFS_NO_TEST_COUNT when running lfstest-count-tests
before and after it runs the "prove" command.

Finally, as we have already removed the GIT_LFS_LOCK_ACQUIRE_DISABLED
environment variable and all support for it, in a previous commit in
this PR, we are now also able to eliminate the unset_vars() shell
functions in some of our t/t-*.sh test scripts.  These functions are used
to unset both the GIT_LFS_LOCK_ACQUIRE_DISABLED and GIT_LFS_NO_TEST_COUNT
variables before running the tests in those scripts.  We added this
behaviour in commit aca1aff of PR git-lfs#3808,
when we migrated our CI jobs to GitHub Actions.  However, we will no
longer need to unset these variables as we will now never set them at all,
and they will have no effect if they are set.
In commit e1c2c8a of PR git-lfs#1107 we
revised our test suite to use a separate test directory, either a
temporary one we create or one specified in advance in a GIT_LFS_TEST_DIR
environment variable.  If that variable is empty or unset, we populate
it with the output of a "mktemp -d" command with a directory name
prefix pattern of "git-lfs_TEMP".  We then create all of our test
repositories in subdirectories of the location given by GIT_LFS_TEST_DIR.

If we create a directory because GIT_LFS_TEST_DIR is empty or unset,
we also set the RM_GIT_LFS_TEST_DIR variable with a flag value of "yes".
As of commit e1c2c8a, we would then
check that variable in the shutdown() shell function, and if the variable
was set to "yes", we would then remove the directory specified by the
GIT_LFS_TEST_DIR variable if it matched the pattern we use for
temporary directories and if the KEEPTRASH variable was empty or unset.
This ensured that, in the normal case, we removed all of our
test artifacts after completing our test scripts.

In commit 1926e94 of PR git-lfs#3125 we
then migrated our test suite to use the "prove" command, and introduced
the lfstest-count-tests utility to help control a common instance of
our lfstest-gitserver program, which can be shared and reused by all
the individual test scripts.  In that PR we also moved our test
suite into the "t" directory, and added a Makefile with a "test"
target that runs the "prove" command over all the individual test
scripts; this is still the framework we use today.

In that commit, the "test" target's recipe in the t/Makefile file was
written so it first loads the t/testenv.sh script and then runs the
setup() shell function, which calls "lfstest-count-tests increment",
which starts the lfstest-gitserver program because it increments the
test count from zero to one.  Every test script run by the "prove"
command then calls setup() again, which further increments the test
count, and then as it exits it calls the shutdown() shell function,
which runs the "lfstest-count-tests decrement" command.  That never
decrements the test count below one, however, because of the initial
call to setup() made prior to running "prove".  Finally, the "test"
Makefile target recpie runs shutdown() after "prove" is complete, which
decrements the test count to zero, at which point lfstest-count-tests
sends a shutdown POST HTTP request to the running lfstest-gitserver
instance.

In this design, though, each individual test script loads the
t/testenv.sh script, and the script is directly loaded twice by
the "test" Makefile target's recipe.  In each case, the script runs
in a separate shell session, and so does not find a GIT_LFS_TEST_DIR
environment variable (unless one is set in the user's shell session
when running "make test").  Thus each invocation of t/testenv.sh
results in the creation of a new temporary test directory.

Commit 1926e94 also removed the
logic from the shutdown() shell function that handled the deletion
of the directory specified by the GIT_LFS_TEST_DIR variable.

The consequence is that when we run an individual test script
by itself, it leaves a single temporary test directory behind, and
when we run the "make test" Makefile target, every t/t-*.sh script
leaves a test directory behind, plus two additional ones created
when the recipe loads the t/testenv.sh script directly.

We can improve this situation by restoring the logic to clean up
the directory specified by the GIT_LFS_TEST_DIR variable (if the
RM_GIT_LFS_TEST_DIR variable contains "yes" and the other conditions
described above are met), and then adjusting our "test" target's
recipe so it runs all of its key steps in a single Bash shell
session.  We also make sure to export the GIT_LFS_TEST_DIR and
RM_GIT_LFS_TEST_DIR variables into the environment, so that
when t/testenv.sh is loaded by the recipe the first time, it
creates a temporary directory and exports those variables, which
ensures they are then available to all the subsequent test scripts
and the recipe's final call to shutdown().

By making these changes, we ensure that in the normal case, when
GIT_LFS_TEST_DIR is empty or unset and we run "make test", we create
only a single temporary test directory, use it in all the individual
test scripts, and then remove it completely afterwards.

In our t/t-env.sh and t/t-workflow.sh test scripts, we check the
output of the "git lfs env" command in a number of tests, comparing
it to the values we expect based on the test conditions, including
the values of all environment variables whose names begin with the
"GIT_" prefix.  As we now export the GIT_LFS_TEST_DIR variable into
the environment, it is among these environment variables, which
causes the tests to fail on Windows unless we make one additional
change.

In our CI jobs, we run our test suite on Windows using the Git Bash
environment provided by the Git for Windows project, which is in turn
based on the MSYS2 environment.  In this context, the mktemp(1) command
returns a path beginning with /tmp, so that is now what is stored in
our GIT_LFS_TEST_DIR environment variable.  However, MSYS2 translates
such Unix-style paths into actual Windows filesystem paths when it
passes command-line arguments and environment variables to any
programs it executes, including our Git LFS client.  Thus the
"git lfs env" command returns a Windows-style path for the
GIT_LFS_TEST_DIR variable, while our tests in our t/t-env.sh and
t/t-workflow.sh test scripts are expecting a Unix-style path beginning
with /tmp.

To resolve this, when running on Windows, we set the MSYS2_ENV_CONV_EXCL
environment variable at the top of these scripts to contain the
name of the GIT_LFS_TEST_DIR variable.  As described in the MSYS2
documentation, this ensures MSYS2 does not alter the value of the
GIT_LFS_TEST_DIR environment variable:

  https://www.msys2.org/docs/filesystem-paths/#environment-variables

Finally, we take the opportunity to ensure that the command
substitutions we use to set the GIT_LFS_TEST_DIR environment
variable are fully quoted, and to clean up some nearby excess
whitespace.
@chrisd8088 chrisd8088 marked this pull request as ready for review October 6, 2024 00:05
@chrisd8088 chrisd8088 requested a review from a team as a code owner October 6, 2024 00:05
Copy link
Member

@larsxschneider larsxschneider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find! Thank you 🙇

@chrisd8088 chrisd8088 merged commit f8cf14b into git-lfs:main Oct 7, 2024
10 checks passed
@chrisd8088 chrisd8088 deleted the tests-cleanup-dirs-fix-counts branch October 7, 2024 17:06
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Feb 6, 2025
As described in commit 8d456a7 of
PR git-lfs#5879, we currently depend on the Git for Windows SDK to install
a version of the "prove" command from Perl Test::Harness distribution,
because we use that command to run our suite of shell tests.

However, in our CI jobs on Windows, the SDK does not at present install
the "prove" command into a location listed in the PATH environment variable,
so these CI jobs having begun failing.

This change may be related to the recent migration of the SDK's
release artifacts to GitHub from Azure Blobs, as outlined in
PR git-for-windows/git-for-windows-automation#109, and is possibly
a consequence of PR git-for-windows/build-extra#588.

Regardless, the "prove" command is fortunately still included in the
"minimal" flavour of the SDK, since it is listed in the manifest for
that flavour:

  https://github.com/git-for-windows/git-sdk-64/blob/f845bb725e56e058a0fbf93aba18c70906a1068e/.sparse/minimal-sdk#L188

We simply need to ensure that the directory in which the command is
now located, namely /usr/bin/core_perl, is added to the set of paths
in the PATH variable for the Git Bash environment.

Note that we only need to add this extra path to PATH when we are
running our script/cibuild script, as that runs the default recipe
from our t/Makefile file, which in turn executes the "test" recipe,
which uses the "prove" command to run and summarize all our shell
test scripts.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Feb 6, 2025
As described in commit 8d456a7 of
PR git-lfs#5879, we currently depend on the Git for Windows SDK to install a
version of the "prove" command from the Perl Test::Harness distribution,
because we use that command to run our suite of shell tests.

However, in our CI jobs on Windows, the SDK does not at present install
the "prove" command into a location listed in the PATH environment variable,
so these CI jobs having begun failing.

This change may be related to the recent migration of the SDK's
release artifacts to GitHub from Azure Blobs, as outlined in
PR git-for-windows/git-for-windows-automation#109, and is possibly
a consequence of PR git-for-windows/build-extra#588.

Regardless, the "prove" command is fortunately still included in the
"minimal" flavour of the SDK, since it is listed in the manifest for
that flavour:

  https://github.com/git-for-windows/git-sdk-64/blob/f845bb725e56e058a0fbf93aba18c70906a1068e/.sparse/minimal-sdk#L188

We simply need to ensure that the directory in which the command is
now located, namely /usr/bin/core_perl, is added to the set of paths
in the PATH variable for the Git Bash environment.

Note that we only need to add this extra path to PATH when we are
running our script/cibuild script, as that runs the default recipe
from our t/Makefile file, which in turn executes the "test" recipe,
which uses the "prove" command to run and summarize all our shell
test scripts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants