Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional functionality to the mock dgxa100 server #114

Merged
merged 7 commits into from
Apr 23, 2024

Conversation

klueska
Copy link
Contributor

@klueska klueska commented Apr 23, 2024

No description provided.

@klueska klueska requested a review from elezar April 23, 2024 11:05
Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question around implementing an explicit mock vs using the generated mocks.

},
},
}

func New() nvml.Interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Does using the generated mocks add value here?

It means that we don't have to implement an exhaustive interface and can rely on the stubbing by moq (or allow it to panic if an unimplemneted function is called).

Copy link
Contributor Author

@klueska klueska Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. The types in this package extend the generated mock types for exactly this purpose. I only overwrite the methods I care about, and the rest are left as stubs that panic.

Copy link
Contributor Author

@klueska klueska Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually how figured out which ones I needed to implement -- those that were missing were panicking via the moq stubs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to note is that this breaks information that we get from the mock such as the call history -- number of calls and the parameters.

The "Correct" way to handle this is to implement the funcs as required. See for example: https://github.com/NVIDIA/go-gpuallocator/blob/82d8924b3d40c7d31cc18d56e65b3bffe59d91fd/gpuallocator/device_test.go#L28-L52

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Updated throughout.

pkg/nvml/mock/dgxa100/dgxa100.go Outdated Show resolved Hide resolved
pkg/nvml/mock/dgxa100/dgxa100.go Show resolved Hide resolved
Signed-off-by: Kevin Klues <kklues@nvidia.com>
For now this only holds GI placement information with placeholders for the CI
placement informatin. It should need to be extended to hold CI placement
information in the future.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
@klueska klueska force-pushed the add-functionality-to-mock-dgxa100 branch from bc76c78 to 37bdc54 Compare April 23, 2024 11:38
Comment on lines +84 to +86
DriverVersion: "550.54.15",
NvmlVersion: "12.550.54.15",
CudaDriverVersion: 12040,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DriverVersion: "550.54.15",
NvmlVersion: "12.550.54.15",
CudaDriverVersion: 12040,
SystemGetDriverVersionFunc: func() (string, nvml.Return) {
return "550.54.15", nvml.SUCCESS
},
SystemGetNvmlVersionFunc: func() (string, nvml.Return) {
return "12.550.54.15", nvml.SUCCESS
},
SystemGetCudaDriverVersionFunc: func() (int, nvml.Return) {
return 12040, nvml.SUCCESS
},

Signed-off-by: Kevin Klues <kklues@nvidia.com>
elezar
elezar previously approved these changes Apr 23, 2024
Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Let's get these in and iterate if we need anything else.

One thought ... if we return mock instances from the constructors instead of nvml.Interface and nvml.Device a caller can add / override mocks as required without having to modify the "upstream" code.

Probably out of scope for this PR though.

@klueska
Copy link
Contributor Author

klueska commented Apr 23, 2024

No, I think that makes sense. It will still implement the interface (so we can assign it to the interface in the calling code), but it can also be extended if desired from the calling code (which the current implementation doesn't allow). Let me add that in this PR.

This way the callers can extend these types to futher override their functions
if desired, while still being able to assign them to the appropriate nvml
interfaces.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
@klueska
Copy link
Contributor Author

klueska commented Apr 23, 2024

OK, updated with that change. I also vendored it into mig-parted and tested it to make sure it didn't break anything by doing this.

elezar
elezar previously approved these changes Apr 23, 2024
Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
@klueska
Copy link
Contributor Author

klueska commented Apr 23, 2024

Added one more commit to introduce locks to protect from concurrent reads/writes of maps

Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@klueska klueska merged commit 1adb7bb into NVIDIA:main Apr 23, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants