Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

receiver/hostmetrics/disk: add serial number to disk metrics #26297

Closed

Conversation

gdvalle
Copy link

@gdvalle gdvalle commented Aug 30, 2023

Description:

This adds a serial_number attribute to the hostmetricsreceiver's disk metrics. This data is already fetched as part of the IOCountersStat struct we use for other disk metrics, so nothing new is introduced.

Because gopsutil's serial number enumeration was broken when called from IOCounters(), this relies on a patch in upstream that's not yet in a release, so this also bumps gopsutil to a commit with that included.

The use case is associating disk metrics with cloud provider disks. For example, in Google Cloud, a serial number can look like 0Google_PersistentDisk_pvc-xxx, where pvc-xxx is the GCP disk resource name. It's also the "short" serial number, which might be ideal, but gathering that would be larger patch for gopsutil, so I think we can settle for this.

Link to tracking Issue:
N/A

Testing:
This has been running in our production environment for a couple weeks now.

Documentation:
None.

Copy link
Contributor

@codeboten codeboten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @gdvalle! Please add a changelog w/ the enhancement

@@ -19,7 +19,6 @@ require (
)

require (
cloud.google.com/go/compute/metadata v0.2.4-0.20230617002413-005d2dfb6b68 // indirect
github.com/cenkalti/backoff/v4 v4.2.1 // indirect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these go.mod changes appear unrelated

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these all came from running make gotidy. Can see the changes isolated in eb94c8e. Is there another way to get the gopsutil dep update synced without these changes?

@@ -23,7 +23,6 @@ require (
)

require (
cloud.google.com/go/compute/metadata v0.2.4-0.20230617002413-005d2dfb6b68 // indirect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@@ -22,7 +22,6 @@ require (
)

require (
cloud.google.com/go/compute/metadata v0.2.4-0.20230617002413-005d2dfb6b68 // indirect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And again

@@ -134,29 +134,29 @@ func (s *scraper) scrape(_ context.Context) (pmetric.Metrics, error) {

func (s *scraper) recordDiskIOMetric(now pcommon.Timestamp, logicalDiskCounterValues []*perfcounters.CounterValues) {
for _, logicalDiskCounter := range logicalDiskCounterValues {
s.mb.RecordSystemDiskIoDataPoint(now, logicalDiskCounter.Values[readBytesPerSec], logicalDiskCounter.InstanceName, metadata.AttributeDirectionRead)
s.mb.RecordSystemDiskIoDataPoint(now, logicalDiskCounter.Values[writeBytesPerSec], logicalDiskCounter.InstanceName, metadata.AttributeDirectionWrite)
s.mb.RecordSystemDiskIoDataPoint(now, logicalDiskCounter.Values[readBytesPerSec], logicalDiskCounter.InstanceName, metadata.AttributeDirectionRead, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does windows not have a similar attribute?

@dmitryax
Copy link
Member

dmitryax commented Aug 30, 2023

This is considered a breaking change even if it doesn't increase cardinality and should not be merged in the current form.

It can be added as an optional resource attribute or as an optional metric attribute after #16374 is done.

Also, there is a working group started to stabilize the system metrics https://github.com/orgs/open-telemetry/projects/55. So this change should be reflected in semantic conventions as well

@gdvalle
Copy link
Author

gdvalle commented Sep 1, 2023

This is considered a breaking change even if it doesn't increase cardinality and should not be merged in the current form.

For my own understanding, could someone explain what the contract is we're breaking here? I'm asking so I know the acceptance criteria for the next go-round.

I can see hostmetrics labeled as "beta", and from https://github.com/open-telemetry/opentelemetry-collector#beta

Same as Alpha, but the configuration options are deemed stable. While there might be breaking changes between releases, component owners should try to minimize them.

Is it then related to the spec described in semantic-conventions, and that gaining a required attribute would violate that? And attribute changes to other receivers, that do not have semantic conventions, would be handled differently?

@dmitryax
Copy link
Member

dmitryax commented Sep 1, 2023

For my own understanding, could someone explain what the contract is we're breaking here? I'm asking so I know the acceptance criteria for the next go-round.

@gdvalle please take a look at this doc https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/scraping-receivers.md. It covers the contract as well

Is it then related to the spec described in semantic-conventions, and that gaining a required attribute would violate that? And attribute changes to other receivers, that do not have semantic conventions, would be handled differently?

Other receivers are more relaxed, especially those that don't have semantic conventions yet. The system metrics are the most mature at this point, and given that the semantic conventions for them will be marked stable soon, this component has to follow all the definitions. All the attributes have to be defined in semantic conventions as well.

How we go about this PR should be discussed in the System Semantic Conventions working group first. The discussion should be started as an issue in https://github.com/open-telemetry/semantic-conventions/issues.

To me, this sounds like an optional resource attribute. However, the problem is that the device is a metric (not resource) attribute. Maybe we can consider changing that. If we keep the device as a metric attribute, this should be added as an optional metric attribute after #16374 is done.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 16, 2023
@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants