New component: AWS S3 Receiver #30750

Open · adcharre opened this issue Jan 24, 2024 · 9 comments

Labels
Accepted Component New component has been sponsored

Comments

adcharre (Contributor) commented Jan 24, 2024

The purpose and use-cases of the new component

The S3 receiver will allow the retrieval and processing of telemetry data previously stored in S3 by the AWS S3 Exporter.
This will make it possible to restore data cold-stored in S3 and investigate issues that fall outside the retention window of our observability service provider.

Example configuration for the component

receivers:
  awss3:
    s3downloader:
      s3_bucket: abucket
      s3_prefix: tenant_a
      s3_partition: minute
    starttime: "2024-01-13 15:00"
    endtime: "2024-01-21 15:00"
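
For illustration, wiring the receiver into a pipeline might look like the sketch below; the otlp exporter and its endpoint are assumptions for the example, not part of this proposal:

exporters:
  otlp:
    endpoint: "otlp-backend:4317"  # assumed backend, not part of the proposal

service:
  pipelines:
    traces:
      receivers: [awss3]
      exporters: [otlp]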

Telemetry data types supported

  • traces
  • metrics
  • logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

adcharre

Sponsor (optional)

@atoulme

Additional context

No response

@adcharre adcharre added needs triage New item requiring triage Sponsor Needed New component seeking sponsor labels Jan 24, 2024
atoulme (Contributor) commented Jan 24, 2024

I'm interested to learn more. Would this be something you'd be able to checkpoint on?

@adcharre
Copy link
Contributor Author

Would this be something you'd be able to checkpoint on?

@atoulme certainly, it's something I'm actively looking into at the moment, so it makes sense to get a second opinion on the best way to implement this and hopefully have it accepted. How best to organise?

atoulme (Contributor) commented Mar 6, 2024

For all components, we tend to work with folks through CONTRIBUTING.md. The question I asked you earlier is in earnest: one of the thorny issues with a component that reads from a remote source is having a checkpoint mechanism that lets you know where you stopped. We can use the storage extension for that purpose.

I am happy to sponsor this component if you'd like to work on it.
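
A minimal sketch of that wiring, assuming the receiver adopts the `storage` setting convention used by other stateful components such as the filelog receiver (the setting on the awss3 receiver is an assumption at this point):

    extensions:
      file_storage:
        directory: /var/lib/otelcol/storage

    receivers:
      awss3:
        storage: file_storage  # assumed setting: persist the read position across restarts

    service:
      extensions: [file_storage]

With this in place, the receiver could write its last processed position through the storage client and resume from it after a restart.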

atoulme added the Accepted Component (New component has been sponsored) label and removed the Sponsor Needed (New component seeking sponsor) and needs triage (New item requiring triage) labels on Mar 6, 2024
adcharre (Contributor, Author) commented Mar 6, 2024

Ahh, I understand now! Thank you for the clarification, and yes, that is an issue I have been thinking about: how best to signal that ingest is finished. I'll look into the storage extension and get a PR up with the skeleton of the receiver.

dmitryax pushed a commit that referenced this issue Mar 25, 2024
**Description:** Initial skeleton implementation of the AWS S3 receiver
described in issue #30750.
Full implementation will follow in future PRs.

**Link to tracking Issue:** #30750

**Testing:** -

**Documentation:** Initial README added.
rhysxevans commented

Hi, apologies, I may be hijacking this thread.

Has there been any thought around integrating the S3 receiver with SQS and S3 Event Notifications?

Our use case is that we cannot directly write to an OTel receiver in all cases, but we can write to an S3 bucket. We would then like the object event notification to notify SQS, where an OTel collector (or a set of them) would be listening and, on notification, fetch the uploaded file and output it to the OTLP backend store. We could then also retain the source data in S3 and leverage the current features of this receiver to replay data if required.

An example sender may look something like:

    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318
            cors:
              allowed_origins:
                - "http://*"
                - "https://*"

    exporters:
      awss3:
        s3uploader:
            region: us-west-2
            s3_bucket: "tempo-traces-bucket"
            s3_prefix: 'metric'
            s3_partition: 'minute'

    processors:
      batch:
        send_batch_size: 10000
        timeout: 30s
      resource:
        attributes:
        - key: service.instance.id
          from_attribute: k8s.pod.uid
          action: insert
      memory_limiter:
        check_interval: 5s
        limit_mib: 200

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, resource, batch]
          exporters: [awss3]

The receiver could possibly look something like:

    receivers:
      awss3:
        sqs:
          queue_url: "https://sqs.us-west-1.amazonaws.com/<account_id>/queue"

    exporters:
      otlp:
        endpoint: 'http://otlp-endpoint:4317'

    processors:
      batch:
        send_batch_size: 10000
        timeout: 30s
      memory_limiter:
        check_interval: 5s
        limit_mib: 200

    service:
      pipelines:
        traces:
          receivers: [awss3]
          processors: [memory_limiter, batch]
          exporters: [otlp]

Thoughts?

S3 Event Notifications: https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html
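
For reference, the bucket-to-queue wiring in that setup would be plain S3 Event Notifications; a CloudFormation-style sketch with placeholder resource names (a queue policy allowing S3 to publish to the queue would also be required):

    Resources:
      TelemetryQueue:            # placeholder name
        Type: AWS::SQS::Queue
      TelemetryBucket:           # placeholder name
        Type: AWS::S3::Bucket
        Properties:
          BucketName: tempo-traces-bucket
          NotificationConfiguration:
            QueueConfigurations:
              - Event: s3:ObjectCreated:*
                Queue: !GetAtt TelemetryQueue.Arn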

szechyjs commented Apr 9, 2024

It would be nice to be able to have this run continuously instead of specifying start/end times. This would help with shipping traces across clusters/accounts.

flowchart LR
  subgraph env1
  app1 --> env1-collector
  app2 --> env1-collector
  end
  env1-collector --> S3[(S3)]
  subgraph env2
  app3 --> env2-collector
  app4 --> env2-collector
  end
  env2-collector --> S3
  subgraph shared-env
  S3 --> shared-collector
  end

awesomeinsight commented

It would be nice to be able to have this run continuously instead of specifying start/end times. This would help with shipping traces across clusters/accounts.

Fully agree,

we also have scenarios where a receiver should constantly process new uploads (from the S3 Exporter) to an S3 bucket, i.e. without specifying starttime and endtime, but with a checkpoint of where it last stopped reading.
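
Purely as an illustration, such a continuous mode might simply leave the end of the time window open; the no-endtime behavior sketched here is hypothetical, not an existing option of the receiver:

    receivers:
      awss3:
        s3downloader:
          s3_bucket: abucket
          s3_prefix: tenant_a
          s3_partition: minute
        starttime: "2024-01-13 15:00"
        # no endtime: hypothetically, keep polling for new objects and
        # resume from the stored checkpoint after a restart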

adcharre (Contributor, Author) commented

@awesomeinsight / @rhysxevans - I see no reason why the receiver could not be expanded to include the scenario you suggest. At the moment I'm focusing on getting the initial implementation merged, which covers my main use case of restoring data between a set of dates.

worksForM3 commented

@awesomeinsight / @rhysxevans - I see no reason why the receiver could not be expanded to include the scenario you suggest. At the moment I'm focusing on getting the initial implementation merged, which covers my main use case of restoring data between a set of dates.

If the receiver were expanded at some point to constantly process new uploads made by the S3 Exporter, could it be used to buffer data independently of a file system? The idea would be to have an alternative to https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/exporterhelper#persistent-queue.

That would allow a resilient setup of exporters + receivers (with S3 in between as a buffer) that runs stateless, as it would not require any filesystem to buffer data to disk.

Do you think a setup like this would make sense?

andrzej-stencel pushed a commit that referenced this issue May 7, 2024
**Description:** This is the initial implementation of the AWS S3
receiver. The receiver can load traces from an S3 bucket from the
configured start time until the stop time. JSON and protobuf formats are
supported, along with gzip compression.

**Link to tracking Issue:** #30750

**Testing:** Unit tests added; real traces read from an S3 bucket.

**Documentation:** None added

---------

Co-authored-by: Antoine Toulme <antoine@toulme.name>
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this issue May 8, 2024
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this issue May 8, 2024
jlg-io pushed a commit to jlg-io/opentelemetry-collector-contrib that referenced this issue May 14, 2024
andrzej-stencel pushed a commit that referenced this issue May 23, 2024

**Description:** Add metrics using obsreport to the S3 receiver

**Link to tracking Issue:** #30750 

**Testing:** Full stack test; confirmed `otelcol_receiver_accepted_spans`
and `otelcol_receiver_refused_spans` were being exported for the `awss3`
receiver.

**Documentation:** None