Add additional functionality to the mock dgxa100 server #114

klueska · 2024-04-23T11:05:38Z

No description provided.

elezar

Just a question around implementing an explicit mock vs using the generated mocks.

elezar · 2024-04-23T11:17:20Z

pkg/nvml/mock/dgxa100/dgxa100.go

-		},
-	},
-}
-
 func New() nvml.Interface {


Question: Does using the generated mocks add value here?

It means that we don't have to implement an exhaustive interface and can rely on the stubbing by moq (or allow it to panic if an unimplemented function is called).

I'm not sure I follow. The types in this package extend the generated mock types for exactly this purpose. I only overwrite the methods I care about, and the rest are left as stubs that panic.

It's actually how I figured out which ones I needed to implement: those that were missing were panicking via the moq stubs.

One thing to note is that this loses information that we would otherwise get from the mock, such as the call history (the number of calls and their parameters).

The "Correct" way to handle this is to implement the funcs as required. See for example: https://github.com/NVIDIA/go-gpuallocator/blob/82d8924b3d40c7d31cc18d56e65b3bffe59d91fd/gpuallocator/device_test.go#L28-L52

Good point. Updated throughout.
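To illustrate the point about call history, here is a minimal sketch. It does not use the real go-nvml or moq-generated code: `Return`, `Interface`, `MockInterface`, `Overriding`, and `NewAssigned` are hypothetical stand-ins that only imitate the shape of a moq mock (one `Func` field per method, a wrapper that records each call, and a panic when the field is nil).

```go
package main

import (
	"fmt"
	"sync"
)

// Return stands in for nvml.Return in this sketch.
type Return int

const SUCCESS Return = 0

// Interface is a one-method stand-in for nvml.Interface.
type Interface interface {
	SystemGetDriverVersion() (string, Return)
}

// MockInterface imitates the shape of a moq-generated mock: a Func field
// for the method plus a record of how often the method was called.
type MockInterface struct {
	SystemGetDriverVersionFunc func() (string, Return)

	mu    sync.Mutex
	calls int
}

// SystemGetDriverVersion panics if no Func was assigned; otherwise it
// records the call and delegates to the Func field.
func (m *MockInterface) SystemGetDriverVersion() (string, Return) {
	if m.SystemGetDriverVersionFunc == nil {
		panic("SystemGetDriverVersionFunc is nil but SystemGetDriverVersion was called")
	}
	m.mu.Lock()
	m.calls++
	m.mu.Unlock()
	return m.SystemGetDriverVersionFunc()
}

// SystemGetDriverVersionCalls reports how many calls were recorded.
func (m *MockInterface) SystemGetDriverVersionCalls() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.calls
}

// Overriding embeds the mock and shadows the method, so the recording
// wrapper above never runs.
type Overriding struct {
	MockInterface
}

func (o *Overriding) SystemGetDriverVersion() (string, Return) {
	return "550.54.15", SUCCESS
}

// NewAssigned sets the Func field instead, so calls are still recorded.
func NewAssigned() *MockInterface {
	return &MockInterface{
		SystemGetDriverVersionFunc: func() (string, Return) {
			return "550.54.15", SUCCESS
		},
	}
}

func main() {
	o := &Overriding{}
	var a Interface = o
	a.SystemGetDriverVersion()
	fmt.Println("method override, recorded calls:", o.SystemGetDriverVersionCalls())

	m := NewAssigned()
	var b Interface = m
	b.SystemGetDriverVersion()
	fmt.Println("func field, recorded calls:", m.SystemGetDriverVersionCalls())
}
```

In this sketch both variants return the same version string, but only the variant that assigns the `Func` field leaves a recorded call behind: the shadowing method reports 0 recorded calls and the assigned function reports 1.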

pkg/nvml/mock/dgxa100/dgxa100.go

Signed-off-by: Kevin Klues <kklues@nvidia.com>

For now this only holds GI placement information, with placeholders for the CI placement information. It will need to be extended to hold CI placement information in the future. Signed-off-by: Kevin Klues <kklues@nvidia.com>

elezar · 2024-04-23T11:47:16Z

pkg/nvml/mock/dgxa100/dgxa100.go

+		DriverVersion:     "550.54.15",
+		NvmlVersion:       "12.550.54.15",
+		CudaDriverVersion: 12040,


Suggested change

-		DriverVersion:     "550.54.15",
-		NvmlVersion:       "12.550.54.15",
-		CudaDriverVersion: 12040,
+		SystemGetDriverVersionFunc: func() (string, nvml.Return) {
+			return "550.54.15", nvml.SUCCESS
+		},
+		SystemGetNvmlVersionFunc: func() (string, nvml.Return) {
+			return "12.550.54.15", nvml.SUCCESS
+		},
+		SystemGetCudaDriverVersionFunc: func() (int, nvml.Return) {
+			return 12040, nvml.SUCCESS
+		},

elezar

Looks good. Let's get these in and iterate if we need anything else.

One thought ... if we return mock instances from the constructors instead of nvml.Interface and nvml.Device a caller can add / override mocks as required without having to modify the "upstream" code.

Probably out of scope for this PR though.

klueska · 2024-04-23T12:49:44Z

No, I think that makes sense. It will still implement the interface (so we can assign it to the interface in the calling code), but it can also be extended if desired from the calling code (which the current implementation doesn't allow). Let me add that in this PR.

This way the callers can extend these types to further override their functions if desired, while still being able to assign them to the appropriate nvml interfaces. Signed-off-by: Kevin Klues <kklues@nvidia.com>

klueska · 2024-04-23T13:05:41Z

OK, updated with that change. I also vendored it into mig-parted and tested it to make sure it didn't break anything by doing this.
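A minimal sketch of the idea, under stated assumptions: `Return`, `Interface`, and `Server` below are hypothetical stand-ins rather than the real go-nvml types, and the default values are chosen only for illustration. The constructor returns the concrete `*Server`, so calling code can replace a `Func` field and still assign the value to the interface.

```go
package main

import "fmt"

// Return stands in for nvml.Return in this sketch.
type Return int

const SUCCESS Return = 0

// Interface is a two-method stand-in for nvml.Interface.
type Interface interface {
	SystemGetDriverVersion() (string, Return)
	DeviceGetCount() (int, Return)
}

// Server is a hypothetical mock whose behavior lives in Func fields.
type Server struct {
	SystemGetDriverVersionFunc func() (string, Return)
	DeviceGetCountFunc         func() (int, Return)
}

func (s *Server) SystemGetDriverVersion() (string, Return) {
	return s.SystemGetDriverVersionFunc()
}

func (s *Server) DeviceGetCount() (int, Return) {
	return s.DeviceGetCountFunc()
}

// Compile-time check that *Server still satisfies Interface.
var _ Interface = (*Server)(nil)

// New returns the concrete type rather than Interface, which is what
// lets callers reach the Func fields.
func New() *Server {
	return &Server{
		SystemGetDriverVersionFunc: func() (string, Return) {
			return "550.54.15", SUCCESS
		},
		DeviceGetCountFunc: func() (int, Return) {
			return 8, SUCCESS // illustrative default for this sketch
		},
	}
}

func main() {
	server := New()

	// Caller-side override; this would not compile if New returned Interface.
	server.DeviceGetCountFunc = func() (int, Return) {
		return 2, SUCCESS
	}

	// The concrete type is still assignable to the interface.
	var lib Interface = server
	version, _ := lib.SystemGetDriverVersion()
	count, _ := lib.DeviceGetCount()
	fmt.Println(version, count)
}
```

Had `New` returned `Interface`, the override in `main` would need a type assertion or a change to the constructor itself.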

elezar

LGTM.

klueska · 2024-04-23T15:35:25Z

Added one more commit to introduce locks that protect against concurrent reads/writes of the maps.

elezar

lgtm

klueska requested a review from elezar April 23, 2024 11:05

elezar reviewed Apr 23, 2024

klueska added 4 commits April 23, 2024 11:37

Update mock dgxa100 server with real values for GI and CI profiles

b72647c

Signed-off-by: Kevin Klues <kklues@nvidia.com>

Move MIGProfiles and MIGPlacements variables to their own file

b394877

Signed-off-by: Kevin Klues <kklues@nvidia.com>

Add additional functionality to the mock dgxa100 server

37bdc54

Signed-off-by: Kevin Klues <kklues@nvidia.com>

klueska force-pushed the add-functionality-to-mock-dgxa100 branch from bc76c78 to 37bdc54 Compare April 23, 2024 11:38

elezar reviewed Apr 23, 2024

Assign dgxa100 mock methods as function pointers instead of overwriting

93fa13d

Signed-off-by: Kevin Klues <kklues@nvidia.com>

elezar previously approved these changes Apr 23, 2024

Return concrete types from mock dgxa100 server instead of interfaces

5e1cdb1

This way the callers can extend these types to further override their functions if desired, while still being able to assign them to the appropriate nvml interfaces. Signed-off-by: Kevin Klues <kklues@nvidia.com>

klueska dismissed elezar’s stale review via 5e1cdb1 April 23, 2024 13:05

elezar previously approved these changes Apr 23, 2024

Add Mutexes to mock dgxa100 types to avoid concurrent maps reads/writes

6895ece

Signed-off-by: Kevin Klues <kklues@nvidia.com>

klueska dismissed elezar’s stale review via 6895ece April 23, 2024 15:34

elezar approved these changes Apr 23, 2024

klueska merged commit 1adb7bb into NVIDIA:main Apr 23, 2024
4 checks passed
