Telemetry: Handle failures determining disk usage #2112

Merged
merged 1 commit into from
Feb 2, 2024

6 changes: 4 additions & 2 deletions lib/status-report.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion lib/status-report.js.map

36 changes: 22 additions & 14 deletions lib/util.js

2 changes: 1 addition & 1 deletion lib/util.js.map

Large diffs are not rendered by default.

13 changes: 8 additions & 5 deletions src/status-report.ts
@@ -96,7 +96,7 @@ export interface StatusReportBase {
   /** Action runner hardware architecture (context runner.arch). */
   runner_arch?: string;
   /** Available disk space on the runner, in bytes. */
-  runner_available_disk_space_bytes: number;
+  runner_available_disk_space_bytes?: number;
   /**
    * Version of the runner image, for workflows running on GitHub-hosted runners. Absent otherwise.
    */
@@ -106,7 +106,7 @@ export interface StatusReportBase {
   /** Action runner operating system release (x.y.z from os.release()). */
   runner_os_release?: string;
   /** Total disk space on the runner, in bytes. */
-  runner_total_disk_space_bytes: number;
+  runner_total_disk_space_bytes?: number;
   /** Time the first action started. Normally the init action. */
   started_at: string;
   /** State this action is currently in. */
@@ -192,7 +192,7 @@ export async function createStatusReportBase(
   actionName: ActionName,
   status: ActionStatus,
   actionStartedAt: Date,
-  diskInfo: DiskUsage,
+  diskInfo: DiskUsage | undefined,
   cause?: string,
   exception?: string,
 ): Promise<StatusReportBase> {
@@ -230,9 +230,7 @@ export async function createStatusReportBase(
     job_name: jobName,
     job_run_uuid: jobRunUUID,
     ref,
-    runner_available_disk_space_bytes: diskInfo.numAvailableBytes,
     runner_os: runnerOs,
-    runner_total_disk_space_bytes: diskInfo.numTotalBytes,
     started_at: workflowStartedAt,
     status,
     testing_environment: testingEnvironment,
@@ -241,6 +239,11 @@ export async function createStatusReportBase(
     workflow_run_id: workflowRunID,
   };
 
+  if (diskInfo) {
+    statusReport.runner_available_disk_space_bytes = diskInfo.numAvailableBytes;
+    statusReport.runner_total_disk_space_bytes = diskInfo.numTotalBytes;

Comment on lines +243 to +244

Contributor: Do we want to capture the reason for the df failure and send it as part of the status report?

Contributor: The message wouldn't make it into Kusto, but at least it would be available in Splunk.

Contributor (Author): We could probably add an internal error diagnostic for this. Let's address this as a potential follow-up, as you suggest.

+  }
+
   // Add optional parameters
   if (cause) {
     statusReport.cause = cause;
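
The change above moves the two disk fields out of the object literal and assigns them only when a measurement exists. A minimal sketch of that pattern follows. `MiniStatusReport` and `buildMiniStatusReport` are hypothetical, trimmed-down stand-ins for illustration, not the real `StatusReportBase` or `createStatusReportBase`.

```typescript
// Hypothetical, trimmed-down versions of the types touched by this diff.
interface DiskUsage {
  numAvailableBytes: number;
  numTotalBytes: number;
}

interface MiniStatusReport {
  status: string;
  runner_available_disk_space_bytes?: number;
  runner_total_disk_space_bytes?: number;
}

// Build a report, attaching the disk fields only when a measurement exists.
function buildMiniStatusReport(
  status: string,
  diskInfo: DiskUsage | undefined,
): MiniStatusReport {
  const report: MiniStatusReport = { status };
  if (diskInfo) {
    report.runner_available_disk_space_bytes = diskInfo.numAvailableBytes;
    report.runner_total_disk_space_bytes = diskInfo.numTotalBytes;
  }
  return report;
}

// With no measurement, the disk keys are absent from the serialized payload.
const withoutDisk = buildMiniStatusReport("success", undefined);
console.log(JSON.stringify(withoutDisk)); // {"status":"success"}
```

Leaving the keys unset, rather than sending a placeholder such as `0`, lets a consumer tell "not measured" apart from "no space left".
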
47 changes: 29 additions & 18 deletions src/util.ts
@@ -1011,26 +1011,37 @@ export interface DiskUsage {
   numTotalBytes: number;
 }
 
-export async function checkDiskUsage(logger?: Logger): Promise<DiskUsage> {
-  const diskUsage = await checkDiskSpace(
-    getRequiredEnvParam("GITHUB_WORKSPACE"),
-  );
-  const gbInBytes = 1024 * 1024 * 1024;
-  if (logger && diskUsage.free < 2 * gbInBytes) {
-    const message =
-      "The Actions runner is running low on disk space " +
-      `(${(diskUsage.free / gbInBytes).toPrecision(4)} GB available).`;
-    if (process.env[EnvVar.HAS_WARNED_ABOUT_DISK_SPACE] !== "true") {
-      logger.warning(message);
-    } else {
-      logger.debug(message);
+export async function checkDiskUsage(
+  logger?: Logger,
+): Promise<DiskUsage | undefined> {
+  try {
+    const diskUsage = await checkDiskSpace(
+      getRequiredEnvParam("GITHUB_WORKSPACE"),
+    );
+    const gbInBytes = 1024 * 1024 * 1024;
+    if (logger && diskUsage.free < 2 * gbInBytes) {
+      const message =
+        "The Actions runner is running low on disk space " +
+        `(${(diskUsage.free / gbInBytes).toPrecision(4)} GB available).`;
+      if (process.env[EnvVar.HAS_WARNED_ABOUT_DISK_SPACE] !== "true") {
+        logger.warning(message);
+      } else {
+        logger.debug(message);
+      }
+      core.exportVariable(EnvVar.HAS_WARNED_ABOUT_DISK_SPACE, "true");
     }
-    core.exportVariable(EnvVar.HAS_WARNED_ABOUT_DISK_SPACE, "true");
+    return {
+      numAvailableBytes: diskUsage.free,
+      numTotalBytes: diskUsage.size,
+    };
+  } catch (error) {
+    if (logger) {
+      logger.warning(
+        `Failed to check available disk space: ${getErrorMessage(error)}`,
+      );
+    }
+    return undefined;
   }
-  return {
-    numAvailableBytes: diskUsage.free,
-    numTotalBytes: diskUsage.size,
-  };
 }
 
 /**
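
The core of the `src/util.ts` change is that a failing disk check now produces a warning and `undefined` instead of an exception that would abort telemetry. The sketch below isolates that behavior under some assumptions: it is synchronous for brevity (the function in the diff is async), and `tryCheckDiskUsage`, `MiniLogger`, `describeError`, and the injected `probe` are hypothetical names for illustration. `describeError` is only a guess at what a helper like `getErrorMessage` might do, and the error text in the usage example is made up.

```typescript
// Hypothetical stand-ins for the types and helpers used in the diff.
interface DiskUsage {
  numAvailableBytes: number;
  numTotalBytes: number;
}

interface MiniLogger {
  warning(message: string): void;
}

// Assumption: a simplified guess at what a helper like getErrorMessage does.
function describeError(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// Run a disk probe; on failure, log a warning and return undefined.
// Synchronous for brevity; the function in the diff is async.
function tryCheckDiskUsage(
  probe: () => { free: number; size: number },
  logger?: MiniLogger,
): DiskUsage | undefined {
  try {
    const usage = probe();
    return { numAvailableBytes: usage.free, numTotalBytes: usage.size };
  } catch (error) {
    logger?.warning(
      `Failed to check available disk space: ${describeError(error)}`,
    );
    return undefined;
  }
}

// Usage: a probe that throws, with a made-up error message.
const warnings: string[] = [];
const collectingLogger: MiniLogger = {
  warning: (message) => {
    warnings.push(message);
  },
};
const failed = tryCheckDiskUsage(() => {
  throw new Error("df exited with code 1");
}, collectingLogger);
console.log(failed); // undefined
```

Injecting the probe is a choice made for this sketch so the failure path can be exercised without touching a real filesystem; the diff itself calls `checkDiskSpace` directly.
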