Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new exporter type: DirectFLP #223

Merged
merged 2 commits into from
Nov 24, 2023

Conversation

jotak
Copy link
Member

@jotak jotak commented Nov 16, 2023

Description

DirectFLP starts the whole flowlogs-pipeline process directly from the
Agent, eliminating the need for a network connection between them.

This can be useful when both Agent and FLP run on the same host, saving
resources. This is mutually exclusive with other exporters such as with
Kafka.

Dependencies

netobserv/flowlogs-pipeline#538

(hold until this is merged => replace directive to remove)

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@jotak jotak added no-doc This PR doesn't require documentation change on the NetObserv operator no-qe This PR doesn't necessitate QE approval labels Nov 16, 2023
Copy link

codecov bot commented Nov 16, 2023

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (a5378d6) 31.83% compared to head (fa1002c) 33.38%.

Files Patch % Lines
pkg/agent/agent.go 0.00% 8 Missing ⚠️
pkg/exporter/direct_flp.go 73.91% 4 Missing and 2 partials ⚠️
pkg/exporter/convert_flp.go 95.08% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #223      +/-   ##
==========================================
+ Coverage   31.83%   33.38%   +1.55%     
==========================================
  Files          37       39       +2     
  Lines        3371     3463      +92     
==========================================
+ Hits         1073     1156      +83     
- Misses       2234     2241       +7     
- Partials       64       66       +2     
Flag Coverage Δ
unittests 33.38% <83.96%> (+1.55%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +143 to +145
// Flowlogs-pipeline configuration, used when export is "direct-flp". Cf https://github.com/netobserv/flowlogs-pipeline
// The "ingest" stage must be omitted from this configuration, since it is handled internally by the agent.
FLPConfig string `env:"FLP_CONFIG"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as FLP_CONFIG is set, EXPORT should be ignored.
The export capabilities from pod point of view are inside FLP config right ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends from where you sit.. In my mind, this new mode is really a new exporter, sitting from the Agent codebase PoV: the agent generates flows and then exports them somewhere, here to FLP; the fact that FLP is in the same process is a detail, it's still an exporter from this codebase PoV.

I guess you have more the "external user point of view" where, once FLP config is set, "export" would mean what happens "after FLP processing", which indeed doesn't really make sense since FLP does the export (to loki, prom, etc.)

idk, I think both approaches can be defended ... @msherif1234 , some preference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking of it as new exporter mode resonate well with me as well

Comment on lines +254 to +255
case "direct-flp":
return buildDirectFLPExporter(cfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move that outside of cfg.Export as:

if len(cfg.FLPConfig) {
  return buildDirectFLPExporter(cfg)
}
switch cfg.Export {
  ...
}

Copy link
Contributor

@jpinsonneau jpinsonneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach sounds good ! Just a small change on config to make it clear for the user
We should also add an example in deployments folder

// is screwed up for HTTPClientConfig in github.com/prometheus/common/config (used for Loki)
// This is ok as YAML is a superset of JSON.
// E.g. try unmarshaling `{"clientConfig":{"authorization":{}}}` as a api.WriteLoki
// See also https://github.com/prometheus/prometheus/issues/11816
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting thanks for referring to the issue in promo its almost 1yr so probably this work around will stay for sometime

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there's even a PR from @OlivierCazade that is older than that :-) prometheus/common#394

func (d *DirectFLP) ExportFlows(input <-chan []*flow.Record) {
for inputRecords := range input {
pbRecords := flowsToPBNoChunk(inputRecords)
d.Write(pbRecords)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step can be removed if we let the new FLP pipe process ebpf record directly not protobuf records

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done via bcccbb7

out["SrcPort"] = flow.Transport.GetSrcPort()
out["DstPort"] = flow.Transport.GetDstPort()
}
if ethType != ethernet.EtherTypeIPv4 && ethType != ethernet.EtherTypeIPv6 {
Copy link
Member Author

@jotak jotak Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msherif1234 I'm doing this early return here when it's not ipv4/ipv6, could you confirm this is correct?
Previously, in that case (e.g. with ARP), the code would still check for drops/DNS/RTT fields and would add for instance DnsErrno: 0, while now this isn't the case anymore, but I believe our drops/DNS/RTT metrics wouldn't make sense for non-IP packets, right?

If this doesn't sound good to you, I can rollback, but I'd need to adapt also the ConvertToFLP function because I'm enforcing in my tests that the same flow input produces the exact same output regardless if it's using ConvertToFLP or converting via protobuf.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only thing we could be missing here is l2 drops, also pls test with DNS TCP to be sure DNSError is only none zero for TCP handshake I can help you with this test if u like

someTime := time.Now()
var someDuration time.Duration = 10000000 // 10ms

tests := []struct {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msherif1234 FYI the tests here are the same that were previously in decode_protobuf_test except that they're now used to test both pb conversion and direct FLP conversion, making sure they both produce the same result

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test coverting from ebpf record to flp configmap but it seems u deleted from gRPC to configmap in case this mode isn't set ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cf my next comment below

assert.Equalf(t, *tt.expected, out, tt.name)
}
}
// NOTE: more tests in convert_flp_test.go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did intentionally deleted conversion from gRPC to generic map test cases here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, like I wrote this test was moved (and sightly changed) to convert_flp_test, but I admit the name is misleading, I'll rename that file to clarify it's testing BOTH grpc and direct-flp conversions

@@ -17,7 +17,8 @@ const (
)
const MacLen = 6

// IPv6Type value as defined in IEEE 802: https://www.iana.org/assignments/ieee-802-numbers/ieee-802-numbers.xhtml
// IPv4Type / IPv6Type value as defined in IEEE 802: https://www.iana.org/assignments/ieee-802-numbers/ieee-802-numbers.xhtml
const IPv4Type = 0x0800
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let use ethernet.EtherTypeIPv4 and ethernet.EtherTypeIPv6 instead of hardcoding them ourself

require.NoError(t, err)
assert.Equalf(t, *tt.expected, outDirect, tt.name)

// Test pb conversion
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see here @msherif1234 : I am testing both FLP conversion AND protobuf conversion, with the bonus thatit makes sure the two are producing the same result (they're both compared against the same expected map)

DirectFLP starts the whole flowlogs-pipeline process directly from the
Agent, eliminating the need for a network connection between them.

This can be useful when both Agent and FLP run on the same host, saving
resources. This is mutually exclusive with other exporters such as with
Kafka.

rollback e2e deps bump

Avoid pbuf conversion when using direct-flp

Clarify test file name, use existing ipv6 const

Restore L2 drops
@msherif1234
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot removed the lgtm label Nov 23, 2023
@jotak
Copy link
Member Author

jotak commented Nov 23, 2023

/unhold

@msherif1234
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Nov 23, 2023
@jotak
Copy link
Member Author

jotak commented Nov 24, 2023

thanks @msherif1234 !
/approve

Copy link

openshift-ci bot commented Nov 24, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 06f4492 into netobserv:main Nov 24, 2023
9 checks passed
@jotak jotak deleted the flp-in-process branch November 29, 2023 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm no-doc This PR doesn't require documentation change on the NetObserv operator no-qe This PR doesn't necessitate QE approval
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants