Use unique per-node "node_counter"s rather than a node hashmap in routing #2803

TheBlueMatt · 2023-12-21T06:00:28Z

During routing, we spend most of our time doing hashmap lookups. It turns out, we can drop two of them, the first requires a good bit of work - assigning each node in memory a random u32 "node counter", we can then drop the main per-node routefinding state map and replace it with a vec. Once we do that, we can also drop the first-hop hashmap lookup that we do on a per node basis as we walk the network graph, replacing it with a check in the same vec.

This is the first in a series of PRs that, in total, substantially more than double our routefinding performance with real data. This first step optimizes the route-finder itself, with later steps more focused on the scorer.

~~Based on #2802.~~

shaavan · 2023-12-22T12:51:31Z

CI's unhappy.
Looks like there's some error in the code

TheBlueMatt · 2024-01-08T21:35:15Z

Fixed.

TheBlueMatt · 2024-01-18T23:56:45Z

Rebased.

codecov-commenter · 2024-01-18T23:56:51Z

Codecov Report

Attention: Patch coverage is 93.73297% with 23 lines in your changes are missing coverage. Please review.

Project coverage is 89.24%. Comparing base (b747b39) to head (f53cc4c).
Report is 4 commits behind head on main.

Files	Patch %	Lines
lightning/src/routing/router.rs	94.33%	9 Missing and 5 partials ⚠️
lightning/src/routing/gossip.rs	91.00%	3 Missing and 6 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2803      +/-   ##
==========================================
+ Coverage   89.21%   89.24%   +0.02%     
==========================================
  Files         117      117              
  Lines       95541    95747     +206     
  Branches    95541    95747     +206     
==========================================
+ Hits        85240    85450     +210     
+ Misses       7814     7805       -9     
- Partials     2487     2492       +5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coderabbitai · 2024-01-18T23:56:57Z

Walkthrough

The project has undergone a significant update, focusing on efficiency and data integrity. The .github/workflows/build.yml file reflects updated paths and keys for network graph and scorer binaries, ensuring the latest versions are used. In the lightning source code, there's a refactoring for feature flag checks, structural optimizations for network graph storage, and scoring logic revisions to enhance performance. Additionally, new counters for node tracking in routing have been introduced, suggesting a move towards more detailed network analysis.

Changes

File Path	Change Summary
`.github/workflows/build.yml`	Updated paths and keys for net graph and scorer binaries; new SHA sum checks added.
`.../src/ln/features.rs`	Refactored `requires_unknown_bits` method for efficient flag comparison.
`.../src/routing/gossip.rs`	Added node counters and restructured fields for cache optimization and consistency checks.
`.../src/routing/scoring.rs`	Altered `decay_100k_channel_bounds` function to use graph scorer and current time update.
`.../src/util/test_utils.rs`	Introduced node counters to routing structs for enhanced route tracking.

🐇✨
To code we hop, with every commit,
A graph update, a refactor bit.
With scores and nodes, we weave the net,
Our binary tales, in silicon set. 🌐🔍

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>.
- Generate unit-tests for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit tests for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai generate interesting stats about this repository and render them as a table.
- @coderabbitai show all the console.log statements in this repository.
- @coderabbitai read src/utils.ts and generate unit tests.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
The JSON schema for the configuration file is available here.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

coderabbitai

Review Status

Actionable comments generated: 40

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 5592378 and bcbf56d.

Files selected for processing (6)

.github/workflows/build.yml (1 hunks)
lightning/src/ln/features.rs (1 hunks)
lightning/src/routing/gossip.rs (33 hunks)
lightning/src/routing/router.rs (38 hunks)
lightning/src/routing/scoring.rs (1 hunks)
lightning/src/util/test_utils.rs (2 hunks)

Files not summarized due to errors (1)

lightning/src/routing/router.rs: Error: Message exceeds token limit

Additional comments: 40

.github/workflows/build.yml (4)

86-87: Updated binary file paths and keys for the network graph to reflect new versioning. Ensure that the new binary files are correctly placed and accessible at the specified URLs.

91-93: The download and hash verification steps for the network graph binary have been updated. Verify that the hash matches the expected value to ensure integrity of the downloaded file.

98-98: The environment variable for the expected network graph snapshot SHA sum has been updated. Confirm that this new SHA sum is correct and corresponds to the new binary file.

99-115: New steps have been added for caching, fetching, and verifying the scorer binary file. Ensure that the scorer binary is correctly integrated into the CI process and that the SHA sum verification step is accurate.

lightning/src/ln/features.rs (1)

778-794: The refactoring of the requires_unknown_bits method to use chunk iteration for flag comparison is a significant improvement in terms of efficiency. By processing 64 bits at a time instead of 8, the method reduces the number of iterations needed for feature flag checks, which can be beneficial for performance, especially when dealing with a large number of feature flags.

lightning/src/routing/gossip.rs (34)

40-45: > Note: This review was outside the patches, and no patch overlapping with it was found. Original lines [1-6]

Imports and uses are modified, ensure that all the newly added imports (AtomicUsize, Ordering, etc.) are used in the code and that there are no unused imports which can lead to warnings or bloat.

67-67: The NodeId struct is introduced or modified. Ensure that the changes to this struct are consistent with the rest of the codebase, especially with respect to serialization and deserialization, as these are common areas where issues arise when modifying data structures.

168-169: New fields removed_node_counters and next_node_counter are added to the NetworkGraph struct. Verify that the logic for managing these counters is correctly implemented throughout the codebase, especially in the context of node removal and addition.

197-197: The max_node_counter field is added to the ReadOnlyNetworkGraph. Ensure that this field is properly maintained and represents the correct maximum value of node_counter across all nodes.

754-761: The ChannelUpdateInfo struct is annotated with repr(C, align(32)). Confirm that the alignment and representation directives are appropriate and that they do not cause any unforeseen issues on different architectures or with FFI boundaries.

847-858: The ChannelInfo struct is annotated with repr(align(128), C). Similar to the previous comment, verify that the alignment and representation directives are appropriate and do not cause issues on different architectures or with FFI boundaries.

898-907: The PartialEq implementation for ChannelInfo is modified. Ensure that all fields that should be compared are included and that this change does not introduce any regressions in areas where ChannelInfo equality checks are performed.

1024-1025: The node_one_counter and node_two_counter fields in ChannelInfo are initialized with u32::max_value(). Confirm that this is the intended default value and that it is handled correctly in all parts of the code where ChannelInfo is used.

1036-1037: The DirectedChannelInfo struct now includes source_counter and target_counter fields. Verify that these fields are correctly updated and used in routing decisions.

1046-1051: The new method for DirectedChannelInfo is modified to set source_counter and target_counter. Ensure that the logic for determining these values is correct and that it aligns with the intended use of these counters in routing.

1094-1099: The source_counter and target_counter methods are added to DirectedChannelInfo. Verify that these methods are used consistently and correctly throughout the routing logic.

1290-1295: The node_counter field is added to NodeInfo. Ensure that this field is correctly managed throughout the node's lifecycle and that it is consistent with the new vector-based lookup system.

1359-1359: The node_counter field in NodeInfo is initialized with u32::max_value(). Confirm that this is the intended default value and that it is handled correctly in all parts of the code where NodeInfo is used.

1369-1370: The write method for NetworkGraph now includes a call to test_node_counter_consistency. Verify that this method is correctly implemented and that it does not introduce performance regressions.

1405-1415: The deserialization logic for ChannelInfo and NodeInfo is modified to set node_counter. Ensure that the deserialization process is correct and that the node_counter values are consistent with the serialized data.

1437-1438: The NetworkGraph constructor is modified to initialize removed_node_counters and next_node_counter. Verify that these fields are initialized to the correct values and that the constructor's logic is consistent with the rest of the codebase.

1484-1485: The NetworkGraph constructor is modified to initialize next_node_counter to 0 and removed_node_counters to an empty vector. Confirm that these initial values are correct and that they are handled properly throughout the graph's lifecycle.

1493-1521: The test_node_counter_consistency method is added to NetworkGraph. Verify that this method is correctly implemented and that it is called in appropriate places to ensure the consistency of node_counter values.

1680-1681: The node_one_counter and node_two_counter fields in ChannelInfo are initialized with u32::max_value(). Confirm that this is the intended default value and that it is handled correctly in all parts of the code where ChannelInfo is used.

1696-1697: The logic for adding a channel between nodes is modified. Verify that the changes are correct and that they do not introduce any regressions in channel management.

1711-1713: The remove_channel_in_nodes method is called within a match arm. Verify that the logic for removing and updating channel information is correct and that it does not introduce any inconsistencies in the network graph.

1723-1727: The node_counter_id array is introduced to manage node_counter values for channels. Verify that this logic is correct and that it properly updates the node_counter values for both nodes associated with a channel.

1832-1833: The node_one_counter and node_two_counter fields in ChannelInfo are initialized with u32::max_value(). Confirm that this is the intended default value and that it is handled correctly in all parts of the code where ChannelInfo is used.

1862-1862: The remove_channel_in_nodes method is called. Verify that the logic for removing a channel from nodes is correct and that it does not introduce any inconsistencies in the network graph.

1881-1881: The remove_channel_in_nodes method is called within a loop. Verify that the logic for removing channels and managing node counters is correct and that it does not introduce any inconsistencies in the network graph.

1890-1890: The logic for removing nodes and updating removed_node_counters is modified. Verify that the changes are correct and that they do not introduce any regressions in node management.

1973-1973: The remove_channel_in_nodes method is called within a loop. Verify that the logic for removing channels and managing node counters is correct and that it does not introduce any inconsistencies in the network graph.

2152-2160: The remove_channel_in_nodes method is modified to update removed_node_counters. Verify that the logic for removing channels and managing node counters is correct and that it does not introduce any inconsistencies in the network graph.

2218-2222: The max_node_counter method is added to ReadOnlyNetworkGraph. Verify that this method returns the correct maximum value for node_counter and that it is used appropriately throughout the code.

3519-3520: The node_one_counter and node_two_counter fields in ChannelInfo are set to specific values in a test. Verify that these values are appropriate for the test scenario and that the test correctly reflects the intended behavior of the code.

3539-3540: The node_one_counter and node_two_counter fields in ChannelInfo are set to specific values in a test. Verify that these values are appropriate for the test scenario and that the test correctly reflects the intended behavior of the code.

3595-3595: The node_counter field in NodeInfo is set to a specific value in a test. Verify that this value is appropriate for the test scenario and that the test correctly reflects the intended behavior of the code.

3632-3632: The read_network_graph benchmark function is modified. Verify that the benchmark is correctly set up and that it accurately measures the performance of reading the network graph.

3642-3642: The write_network_graph benchmark function is modified. Verify that the benchmark is correctly set up and that it accurately measures the performance of writing the network graph.

lightning/src/routing/scoring.rs (1)

3687-3687: The logic within decay_100k_channel_bounds has been fundamentally changed. Ensure that the new implementation aligns with the intended behavior and that all dependent features are tested with the updated logic.

coderabbitai · 2024-01-19T00:03:33Z

lightning/src/routing/router.rs


 // While RouteGraphNode can be laid out with fewer bytes, performance appears to be improved
 // substantially when it is laid out at exactly 64 bytes.
-//
-// Thus, we use `#[repr(C)]` on the struct to force a suboptimal layout and check that it stays 64
-// bytes here.
-#[cfg(any(ldk_bench, not(any(test, fuzzing))))]
 const _GRAPH_NODE_SMALL: usize = 64 - core::mem::size_of::<RouteGraphNode>();
-#[cfg(any(ldk_bench, not(any(test, fuzzing))))]
 const _GRAPH_NODE_FIXED_SIZE: usize = core::mem::size_of::<RouteGraphNode>() - 64;

 /// A [`CandidateRouteHop::FirstHop`] entry.


Note: This review was outside the patches, and no patch overlapping with it was found. Original lines [1-1]

The comments about RouteGraphNode layout for performance should be followed by actual code that reflects this. If the layout is not enforced in the code, this comment is misleading.

It is enforced with static assertions.

@TheBlueMatt Thank you for clarifying. It's good to know that static assertions are used to ensure the layout for performance. My apologies for any confusion.

(\\_/) ( •_•) />🥕

lightning/src/routing/router.rs

coderabbitai · 2024-01-19T00:03:34Z

lightning/src/routing/router.rs

+	#[inline(always)]
+	fn src_node_counter(&self) -> u32 {
+		match self {
+			CandidateRouteHop::FirstHop(hop) => hop.payer_node_counter,
+			CandidateRouteHop::PublicHop(hop) => hop.info.source_counter(),
+			CandidateRouteHop::PrivateHop(hop) => hop.source_node_counter,
+			CandidateRouteHop::Blinded(hop) => hop.source_node_counter,
+			CandidateRouteHop::OneHopBlinded(hop) => hop.source_node_counter,
+		}
+	}
+
+	#[inline]
+	fn target_node_counter(&self) -> Option<u32> {
+		match self {
+			CandidateRouteHop::FirstHop(hop) => Some(hop.target_node_counter),
+			CandidateRouteHop::PublicHop(hop) => Some(hop.info.target_counter()),
+			CandidateRouteHop::PrivateHop(hop) => Some(hop.target_node_counter),
+			CandidateRouteHop::Blinded(_) => None,
+			CandidateRouteHop::OneHopBlinded(_) => None,
+		}
+	}
+


The methods src_node_counter and target_node_counter are marked with #[inline(always)] and #[inline] respectively. It's worth noting that forcing inlining is not always beneficial and can sometimes lead to increased binary size without performance gain. It's recommended to benchmark to ensure that these attributes are actually providing a benefit.

Consider removing the #[inline(always)] and #[inline] attributes and rely on the compiler's optimization unless benchmarks show a clear benefit.

This was done in response to careful benchmarking :)

@TheBlueMatt Thank you for the clarification! It's great to hear that these decisions were backed by careful benchmarking. Performance optimizations like these can indeed make a significant difference when they're informed by real-world data. Keep up the good work!

If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

coderabbitai · 2024-01-19T00:03:35Z

lightning/src/routing/router.rs

-		// Fetch 1.5x more routes than we need as after we do some scorer updates we may end up
-		// with some routes we picked being un-routable.
-		for _ in 0..route_count * 3 / 2 {
+		for _ in 0..route_count {


The loop in generate_test_routes for selecting random (source, destination) pairs is non-deterministic and could lead to flaky tests. Consider using a deterministic approach or documenting the non-determinism.

Review the non-deterministic loop in generate_test_routes and consider making it deterministic. If non-determinism is required, document why it is necessary for the test.

coderabbitai · 2024-01-19T00:03:35Z

lightning/src/routing/router.rs

+		let (network_graph, _) = bench_utils::read_graph_scorer(&logger).unwrap();
 		let scorer = FixedPenaltyScorer::with_penalty(0);
 		generate_routes(bench, &network_graph, scorer, &Default::default(),
 			Bolt11InvoiceFeatures::empty(), 0, "generate_routes_with_zero_penalty_scorer");
 	}

 	pub fn generate_mpp_routes_with_zero_penalty_scorer(bench: &mut Criterion) {
 		let logger = TestLogger::new();
-		let network_graph = bench_utils::read_network_graph(&logger).unwrap();
+		let (network_graph, _) = bench_utils::read_graph_scorer(&logger).unwrap();


The benchmarks generate_routes_with_zero_penalty_scorer and generate_mpp_routes_with_zero_penalty_scorer are using a fixed penalty scorer with a penalty of 0. Ensure that this is the intended behavior and document the rationale behind using a zero penalty.

Review the use of a fixed penalty scorer with a penalty of 0 in the benchmarks. If this is the intended behavior, add comments explaining why a zero penalty is used.

coderabbitai · 2024-01-19T00:03:35Z

lightning/src/routing/router.rs

+		let (network_graph, scorer) = bench_utils::read_graph_scorer(&logger).unwrap();
 		let params = ProbabilisticScoringFeeParameters::default();
-		let scorer = ProbabilisticScorer::new(ProbabilisticScoringDecayParameters::default(), &network_graph, &logger);
 		generate_routes(bench, &network_graph, scorer, &params, Bolt11InvoiceFeatures::empty(), 0,
 			"generate_routes_with_probabilistic_scorer");
 	}

 	pub fn generate_mpp_routes_with_probabilistic_scorer(bench: &mut Criterion) {
 		let logger = TestLogger::new();
-		let network_graph = bench_utils::read_network_graph(&logger).unwrap();
+		let (network_graph, scorer) = bench_utils::read_graph_scorer(&logger).unwrap();
 		let params = ProbabilisticScoringFeeParameters::default();
-		let scorer = ProbabilisticScorer::new(ProbabilisticScoringDecayParameters::default(), &network_graph, &logger);
 		generate_routes(bench, &network_graph, scorer, &params,
 			channelmanager::provided_bolt11_invoice_features(&UserConfig::default()), 0,
 			"generate_mpp_routes_with_probabilistic_scorer");
 	}

 	pub fn generate_large_mpp_routes_with_probabilistic_scorer(bench: &mut Criterion) {
 		let logger = TestLogger::new();
-		let network_graph = bench_utils::read_network_graph(&logger).unwrap();
+		let (network_graph, scorer) = bench_utils::read_graph_scorer(&logger).unwrap();


The benchmarks generate_routes_with_probabilistic_scorer and generate_large_mpp_routes_with_probabilistic_scorer are using default scoring parameters. Ensure that these parameters are appropriate for the benchmarks and document any assumptions.

Review the scoring parameters used in the benchmarks. If the default parameters are appropriate, document any assumptions made. Otherwise, adjust the parameters to better reflect the intended benchmark conditions.

coderabbitai · 2024-01-19T00:03:35Z

lightning/src/routing/router.rs

+		let (network_graph, scorer) = bench_utils::read_graph_scorer(&logger).unwrap();
 		let mut params = ProbabilisticScoringFeeParameters::default();
 		params.linear_success_probability = false;
-		let scorer = ProbabilisticScorer::new(
-			ProbabilisticScoringDecayParameters::default(), &network_graph, &logger);
 		generate_routes(bench, &network_graph, scorer, &params,
 			channelmanager::provided_bolt11_invoice_features(&UserConfig::default()), 0,
 			"generate_routes_with_nonlinear_probabilistic_scorer");
 	}

 	pub fn generate_mpp_routes_with_nonlinear_probabilistic_scorer(bench: &mut Criterion) {
 		let logger = TestLogger::new();
-		let network_graph = bench_utils::read_network_graph(&logger).unwrap();
+		let (network_graph, scorer) = bench_utils::read_graph_scorer(&logger).unwrap();
 		let mut params = ProbabilisticScoringFeeParameters::default();
 		params.linear_success_probability = false;
-		let scorer = ProbabilisticScorer::new(
-			ProbabilisticScoringDecayParameters::default(), &network_graph, &logger);
 		generate_routes(bench, &network_graph, scorer, &params,
 			channelmanager::provided_bolt11_invoice_features(&UserConfig::default()), 0,
 			"generate_mpp_routes_with_nonlinear_probabilistic_scorer");
 	}

 	pub fn generate_large_mpp_routes_with_nonlinear_probabilistic_scorer(bench: &mut Criterion) {
 		let logger = TestLogger::new();
-		let network_graph = bench_utils::read_network_graph(&logger).unwrap();
+		let (network_graph, scorer) = bench_utils::read_graph_scorer(&logger).unwrap();


The benchmarks generate_routes_with_nonlinear_probabilistic_scorer and generate_large_mpp_routes_with_nonlinear_probabilistic_scorer are explicitly setting linear_success_probability to false. Ensure that this setting is appropriate for the benchmarks and document the rationale behind it.

Review the setting of linear_success_probability in the benchmarks. If this setting is appropriate, add comments explaining why it is set to false. Otherwise, adjust the setting to match the intended benchmark conditions.

coderabbitai · 2024-01-19T00:03:35Z

lightning/src/routing/router.rs

+		do_route_bench(bench, graph, scorer, score_params, bench_name, route_endpoints);
+	}
+
+	#[inline(never)]
+	fn do_route_bench<S: ScoreLookUp + ScoreUpdate>(
+		bench: &mut Criterion, graph: &NetworkGraph<&TestLogger>, scorer: S,
+		score_params: &S::ScoreParams, bench_name: &'static str,
+		route_endpoints: Vec<(ChannelDetails, PaymentParameters, u64)>,
+	) {
+		let payer = bench_utils::payer_pubkey();
+		let keys_manager = KeysManager::new(&[0u8; 32], 42, 42);
+		let random_seed_bytes = keys_manager.get_secure_random_bytes();
+


The function do_route_bench is using a hardcoded seed for randomness, which may not be ideal for benchmarking. Consider using a dynamic seed or documenting why a static seed is used.

Review the use of a hardcoded seed in the function do_route_bench. If a dynamic seed is more appropriate, update the function to use one. Otherwise, add comments explaining the choice of a static seed.

coderabbitai

Review Status

Actionable comments generated: 28

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 51d9ee3 and 8d80f82.

Files selected for processing (6)

.github/workflows/build.yml (1 hunks)
lightning/src/ln/features.rs (1 hunks)
lightning/src/routing/gossip.rs (33 hunks)
lightning/src/routing/router.rs (38 hunks)
lightning/src/routing/scoring.rs (1 hunks)
lightning/src/util/test_utils.rs (2 hunks)

Files not summarized due to errors (1)

lightning/src/routing/router.rs: Error: Message exceeds token limit

Files skipped from review as they are similar to previous changes (4)

.github/workflows/build.yml
lightning/src/ln/features.rs
lightning/src/routing/gossip.rs
lightning/src/util/test_utils.rs

coderabbitai · 2024-01-25T20:21:37Z

lightning/src/routing/router.rs


 // While RouteGraphNode can be laid out with fewer bytes, performance appears to be improved
 // substantially when it is laid out at exactly 64 bytes.
-//
-// Thus, we use `#[repr(C)]` on the struct to force a suboptimal layout and check that it stays 64
-// bytes here.
-#[cfg(any(ldk_bench, not(any(test, fuzzing))))]
 const _GRAPH_NODE_SMALL: usize = 64 - core::mem::size_of::<RouteGraphNode>();
-#[cfg(any(ldk_bench, not(any(test, fuzzing))))]
 const _GRAPH_NODE_FIXED_SIZE: usize = core::mem::size_of::<RouteGraphNode>() - 64;

 /// A [`CandidateRouteHop::FirstHop`] entry.


📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [1-6]

The constants _GRAPH_NODE_SMALL and _GRAPH_NODE_FIXED_SIZE are defined but not used in the provided code snippet. If they are not used elsewhere, they should be removed to avoid confusion.

- const _GRAPH_NODE_SMALL: usize = 64 - core::mem::size_of::<RouteGraphNode>(); - const _GRAPH_NODE_FIXED_SIZE: usize = core::mem::size_of::<RouteGraphNode>() - 64;

coderabbitai · 2024-01-25T20:21:37Z

lightning/src/routing/router.rs

+	pub target_node_id: &'a NodeId,
+	/// A unique ID which describes the source node of the hop (further from the payment target).
+	/// It will not conflict with any [`super::gossip::NodeInfo::node_counter`]s, but may be equal
+	/// to one if the node is a public node.
+	pub(crate) source_node_counter: u32,
+	/// A unique ID which describes the destination node of the hop (towards the payment target).
+	/// It will not conflict with any [`super::gossip::NodeInfo::node_counter`]s, but may be equal
+	/// to one if the node is a public node.
+	pub(crate) target_node_counter: u32,


Ensure the unique IDs (source_node_counter and target_node_counter) are properly managed and documented, especially regarding their non-conflict with public node counters and their significance in routing logic.

Consider adding more detailed comments on how these counters are generated and used, ensuring clarity for future maintainers.

coderabbitai · 2024-01-25T20:21:37Z

lightning/src/routing/router.rs

+	/// A unique ID which describes the introduction point of the blinded path.
+	/// It will not conflict with any [`super::gossip::NodeInfo::node_counter`]s, but will generally
+	/// be equal to one from the public network graph (assuming the introduction point is a public
+	/// node).
+	source_node_counter: u32,


The comment on line 1198 mentions that source_node_counter will generally be equal to one from the public network graph. This implies an assumption about the introduction point's visibility. Ensure this assumption is valid and clearly documented.

Clarify under what conditions the introduction point would not be a public node and how this affects the routing process.

coderabbitai · 2024-01-25T20:21:37Z

lightning/src/routing/router.rs

+	#[inline(always)]
+	fn src_node_counter(&self) -> u32 {
+		match self {
+			CandidateRouteHop::FirstHop(hop) => hop.payer_node_counter,
+			CandidateRouteHop::PublicHop(hop) => hop.info.source_counter(),
+			CandidateRouteHop::PrivateHop(hop) => hop.source_node_counter,
+			CandidateRouteHop::Blinded(hop) => hop.source_node_counter,
+			CandidateRouteHop::OneHopBlinded(hop) => hop.source_node_counter,
+		}
+	}
+
+	#[inline]
+	fn target_node_counter(&self) -> Option<u32> {
+		match self {
+			CandidateRouteHop::FirstHop(hop) => Some(hop.target_node_counter),
+			CandidateRouteHop::PublicHop(hop) => Some(hop.info.target_counter()),
+			CandidateRouteHop::PrivateHop(hop) => Some(hop.target_node_counter),
+			CandidateRouteHop::Blinded(_) => None,
+			CandidateRouteHop::OneHopBlinded(_) => None,
+		}
+	}
+


The use of #[inline(always)] and #[inline] attributes on src_node_counter and target_node_counter methods should be justified with comments, especially if they were added based on performance benchmarks.

Add comments explaining the decision to use these attributes, including any benchmarks or performance tests that support this choice.

coderabbitai · 2024-01-25T20:21:38Z

lightning/src/routing/router.rs

+const _NODE_MAP_SIZE_TWO_CACHE_LINES: usize = 128 - core::mem::size_of::<Option<PathBuildingHop>>();
+const _NODE_MAP_SIZE_EXACTLY_TWO_CACHE_LINES: usize = core::mem::size_of::<Option<PathBuildingHop>>() - 128;


The constants _NODE_MAP_SIZE_TWO_CACHE_LINES and _NODE_MAP_SIZE_EXACTLY_TWO_CACHE_LINES are defined but not used in the provided code snippet. If they are not used elsewhere, they should be removed to avoid dead code.

- const _NODE_MAP_SIZE_TWO_CACHE_LINES: usize = 128 - core::mem::size_of::<Option<PathBuildingHop>>(); - const _NODE_MAP_SIZE_EXACTLY_TWO_CACHE_LINES: usize = core::mem::size_of::<Option<PathBuildingHop>>() - 128;

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

const _NODE_MAP_SIZE_TWO_CACHE_LINES: usize = 128 - core::mem::size_of::<Option<PathBuildingHop>>();

const _NODE_MAP_SIZE_EXACTLY_TWO_CACHE_LINES: usize = core::mem::size_of::<Option<PathBuildingHop>>() - 128;

coderabbitai · 2024-01-25T20:21:38Z

lightning/src/routing/router.rs

+						.unwrap_or_else(|| CandidateRouteHop::PrivateHop(PrivateHopCandidate {
+							hint: hop, target_node_id: target,
+							source_node_counter: *private_source_node_counter,
+							target_node_counter: *private_target_node_counter,
+						}));


The logic for creating CandidateRouteHop instances based on the type of hop is critical for the routing algorithm. Ensure that this logic is correct and that the correct types of hops are being created.

Review the logic for creating CandidateRouteHop instances to ensure correctness. Add comments to explain the conditions under which different types of hops are created.

coderabbitai · 2024-01-25T20:21:38Z

lightning/src/routing/router.rs

+					if let Some((first_channels, peer_node_counter)) = first_hop_targets.get_mut(&target) {
 						sort_first_hop_channels(first_channels, &used_liquidities,
 							recommended_value_msat, our_node_pubkey);
 						for details in first_channels {
 							let first_hop_candidate = CandidateRouteHop::FirstHop(FirstHopCandidate {
-								details, payer_node_id: &our_node_id,
+								details, payer_node_id: &our_node_id, payer_node_counter,
+								target_node_counter: *peer_node_counter,


The logic for searching for a direct channel between the last checked hop and first hop targets is important for the routing algorithm. Ensure that this logic is correct and optimized for performance.

Review and optimize the logic for searching for a direct channel between hops for correctness and performance. Add comments to explain how this search is performed in the routing algorithm.

coderabbitai · 2024-01-25T20:21:38Z

lightning/src/routing/router.rs

+						if let Some((first_channels, peer_node_counter)) = first_hop_targets.get_mut(&NodeId::from_pubkey(&hop.src_node_id)) {
 							sort_first_hop_channels(first_channels, &used_liquidities,
 								recommended_value_msat, our_node_pubkey);
 							for details in first_channels {
 								let first_hop_candidate = CandidateRouteHop::FirstHop(FirstHopCandidate {
-									details, payer_node_id: &our_node_id,
+									details, payer_node_id: &our_node_id, payer_node_counter,
+									target_node_counter: *peer_node_counter,


The logic for sorting first hop channels and adding entries for first hop candidates is complex and should be carefully reviewed to ensure correctness.

Review the logic for sorting first hop channels and adding entries for first hop candidates to ensure correctness. Add comments to explain how first hop channels are processed in the routing algorithm.

coderabbitai · 2024-01-25T20:21:38Z

lightning/src/routing/router.rs

+				let mut new_entry = dist[payer_node_counter as usize].take().unwrap();
 				let mut ordered_hops: Vec<(PathBuildingHop, NodeFeatures)> = vec!((new_entry.clone(), default_node_features.clone()));

 				'path_walk: loop {
 					let mut features_set = false;
-					let target = ordered_hops.last().unwrap().0.candidate.target().unwrap_or(maybe_dummy_payee_node_id);
-					if let Some(first_channels) = first_hop_targets.get(&target) {
+					let candidate = &ordered_hops.last().unwrap().0.candidate;
+					let target = candidate.target().unwrap_or(maybe_dummy_payee_node_id);
+					let target_node_counter = candidate.target_node_counter();
+					if let Some((first_channels, _)) = first_hop_targets.get(&target) {


The logic for constructing the final path when the target node is our own node is critical for the routing algorithm. Ensure that this logic is correct and that the path is being constructed properly.

Review the logic for constructing the final path when the target node is our own node to ensure correctness. Add comments to explain how the final path is constructed in the routing algorithm.

coderabbitai · 2024-01-25T20:21:38Z

lightning/src/routing/router.rs

+					if target_node_counter.is_none() {
 						break 'path_walk;
 					}
+					if target_node_counter == Some(payee_node_counter) { break 'path_walk; }

-					new_entry = match dist.remove(&target) {
+					new_entry = match dist[target_node_counter.unwrap() as usize].take() {


The logic for breaking out of the path walk loop based on target_node_counter is critical for the routing algorithm. Ensure that this logic is correct and that the loop is being exited at the appropriate times.

Review the logic for breaking out of the path walk loop to ensure correctness. Add comments to explain the conditions under which the loop is exited in the routing algorithm.

TheBlueMatt · 2024-03-05T18:16:44Z

Rebased.

coderabbitai

Review Status

Actionable comments generated: 7

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 7a35bf8 and c5e0599.

Files selected for processing (6)

.github/workflows/build.yml (1 hunks)
lightning/src/ln/features.rs (1 hunks)
lightning/src/routing/gossip.rs (33 hunks)
lightning/src/routing/router.rs (38 hunks)
lightning/src/routing/scoring.rs (1 hunks)
lightning/src/util/test_utils.rs (2 hunks)

Files not summarized due to errors (1)

lightning/src/routing/router.rs: Error: Message exceeds token limit

Files not reviewed due to errors (1)

(no review received)

Additional comments: 19

.github/workflows/build.yml (6)

86-87: The path and key for caching the routing graph snapshot have been updated to lightning/net_graph-2023-12-10.bin and ldk-net_graph-v0.0.118-2023-12-10.bin, respectively. Ensure that the new binary file is correctly generated and available at the specified URL to prevent CI failures.

91-93: The fetching process for the routing graph snapshot includes a SHA-256 checksum verification step. It's crucial to verify that the EXPECTED_ROUTING_GRAPH_SNAPSHOT_SHASUM matches the actual checksum of the file at https://bitcoin.ninja/ldk-net_graph-v0.0.118-2023-12-10.bin to ensure integrity and prevent potential security issues.

98-98: The EXPECTED_ROUTING_GRAPH_SNAPSHOT_SHASUM is updated. Double-check that this SHA-256 checksum accurately corresponds to the new routing graph snapshot file to ensure the integrity of the downloaded file.

103-104: The path and key for caching the scorer snapshot have been updated to lightning/scorer-2023-12-10.bin and ldk-scorer-v0.0.118-2023-12-10.bin, respectively. Confirm that the new binary file is correctly generated and accessible at the provided URL to avoid CI disruptions.

108-110: The fetching process for the scorer snapshot includes a SHA-256 checksum verification step. It's essential to ensure that the EXPECTED_SCORER_SNAPSHOT_SHASUM matches the actual checksum of the file at https://bitcoin.ninja/ldk-scorer-v0.0.118-2023-12-10.bin to maintain integrity and avert potential security risks.

115-115: The EXPECTED_SCORER_SNAPSHOT_SHASUM is updated. Verify that this SHA-256 checksum correctly matches the new scorer snapshot file to guarantee the integrity of the downloaded file.
lightning/src/routing/gossip.rs (8)
67-67: The NodeId struct is correctly annotated with #[derive(Clone, Copy, PartialEq, Eq)] to ensure it can be easily copied and compared.

168-169: The addition of removed_node_counters and next_node_counter fields to the NetworkGraph struct is consistent with the PR's objective to optimize routing performance by using unique counters for nodes.

197-197: The max_node_counter field in ReadOnlyNetworkGraph struct is a good addition for tracking the maximum node counter value, which is likely used for performance optimizations in routing.

753-761: The use of #[repr(C, align(32))] for ChannelUpdateInfo struct is a smart optimization to ensure that the struct's layout is predictable and aligned for cache performance. This is particularly important for structures that are frequently accessed and modified in performance-critical paths.

846-852: Similarly, the ChannelInfo struct's alignment with #[repr(align(128), C)] is a thoughtful optimization for cache performance. Ensuring that frequently accessed fields are likely to be on the same or adjacent cache lines can significantly impact performance in routing.

1024-1025: The initialization of node_one_counter and node_two_counter to u32::max_value() in ChannelInfo struct's Readable implementation seems to be a placeholder. It's crucial to ensure that these counters are correctly set elsewhere in the code to meaningful values, as using u32::max_value() directly could lead to incorrect behavior or performance issues.

Verification successful

The search results indicate that node_one_counter and node_two_counter are indeed updated from their initial u32::max_value() state in various parts of the code. There are assertions and explicit updates that suggest these counters are managed and set to meaningful values before being used in a way that could affect behavior or performance. This evidence supports the idea that there is a mechanism in place to ensure the counters are not left at their placeholder values throughout the code's execution.
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that node counters are correctly assigned and not left at their initial max value state.
rg "node_(one|two)_counter" --type rust
Length of output: 1526
* 1381-1382: The `test_node_counter_consistency` method in `NetworkGraph` is a good practice for ensuring the internal consistency of node counters. This kind of invariant checking is crucial for debugging and maintaining the integrity of the routing graph. * 1537-1537: The `read_only` method in `NetworkGraph` provides a safe way to access a read-only view of the network graph. This is useful for operations that need to read network graph data without the risk of modifying it.
lightning/src/routing/scoring.rs (1)

3687-3687: Consider evaluating the performance impact of the time_passed method within decay_100k_channel_bounds. Frequent updates to channel scores could lead to performance bottlenecks, especially with a large number of channels.

Explore optimizing the decay calculation to reduce computational overhead, possibly by batching updates or adjusting the decay frequency based on network size and activity.

lightning/src/routing/router.rs (4)

1126-1132: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [3-6]

The constants _GRAPH_NODE_SMALL and _GRAPH_NODE_FIXED_SIZE are used in static assertions to enforce a specific layout for RouteGraphNode, as clarified in previous discussions. Ensure that these assertions are present and effectively enforce the intended layout for performance optimization.

1181-1189: The addition of source_node_counter and target_node_counter fields in this context further supports the PR's goal of optimizing routing performance by using unique u32 counters. Good consistency across different structs.

1205-1209: The introduction of source_node_counter for blinded paths aligns with the PR's optimization strategy. Ensure that the assumptions regarding the introduction point's visibility as a public node are valid and clearly documented.

1354-1375: The use of #[inline(always)] and #[inline] attributes is based on careful benchmarking, as previously clarified. Consider adding comments to document the benchmarking results and rationale behind these decisions to aid future maintainers.

coderabbitai · 2024-03-05T18:20:13Z

lightning/src/util/test_utils.rs

+										payer_node_counter: u32::max_value(),
+										target_node_counter: u32::max_value(),


The use of u32::max_value() as a placeholder for payer_node_counter and target_node_counter in FirstHopCandidate struct is noted. While this approach can serve as a temporary measure, it's crucial to ensure these fields are properly initialized before use to avoid potential bugs or unexpected behavior. Consider implementing a more explicit initialization process or documentation to clarify the intended use and lifecycle of these counters.

coderabbitai · 2024-03-05T18:20:13Z

lightning/src/util/test_utils.rs

+								source_node_counter: u32::max_value(),
+								target_node_counter: u32::max_value(),


Similarly, the use of u32::max_value() as a placeholder for source_node_counter and target_node_counter in PrivateHopCandidate struct raises concerns about proper initialization. As with the previous comment, it's important to ensure these fields are correctly set during actual use to prevent issues. Providing clear documentation or an initialization mechanism would help maintain code clarity and reliability.

coderabbitai · 2024-03-05T18:20:13Z

lightning/src/routing/scoring.rs


 	pub fn decay_100k_channel_bounds(bench: &mut Criterion) {
 		let logger = TestLogger::new();
-		let network_graph = bench_utils::read_network_graph(&logger).unwrap();
-		let mut scorer = ProbabilisticScorer::new(Default::default(), &network_graph, &logger);
-		// Score a number of random channels
-		let mut seed: u64 = 0xdeadbeef;
-		for _ in 0..100_000 {
-			seed = seed.overflowing_mul(6364136223846793005).0.overflowing_add(1).0;
-			let (victim, victim_dst, amt) = {
-				let rong = network_graph.read_only();
-				let channels = rong.channels();
-				let chan = channels.unordered_iter()
-					.skip((seed as usize) % channels.len())
-					.next().unwrap();
-				seed = seed.overflowing_mul(6364136223846793005).0.overflowing_add(1).0;
-				let amt = seed % chan.1.capacity_sats.map(|c| c * 1000)
-					.or(chan.1.one_to_two.as_ref().map(|info| info.htlc_maximum_msat))
-					.or(chan.1.two_to_one.as_ref().map(|info| info.htlc_maximum_msat))
-					.unwrap_or(1_000_000_000).saturating_add(1);
-				(*chan.0, chan.1.node_two, amt)
-			};
-			let path = Path {
-				hops: vec![RouteHop {
-					pubkey: victim_dst.as_pubkey().unwrap(),
-					node_features: NodeFeatures::empty(),
-					short_channel_id: victim,
-					channel_features: ChannelFeatures::empty(),
-					fee_msat: amt,
-					cltv_expiry_delta: 42,
-					maybe_announced_channel: true,
-				}],
-				blinded_tail: None
-			};
-			seed = seed.overflowing_mul(6364136223846793005).0.overflowing_add(1).0;
-			if seed % 1 == 0 {
-				scorer.probe_failed(&path, victim, Duration::ZERO);
-			} else {
-				scorer.probe_successful(&path, Duration::ZERO);
-			}
-		}
+		let (network_graph, mut scorer) = bench_utils::read_graph_scorer(&logger).unwrap();
 		let mut cur_time = Duration::ZERO;
 			cur_time += Duration::from_millis(1);
 			scorer.time_passed(cur_time);


📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [14-14]

There is a TODO comment regarding missing tests.

Would you like me to help generate tests for this functionality?

coderabbitai · 2024-03-05T18:20:14Z

lightning/src/routing/scoring.rs

-				scorer.probe_successful(&path, Duration::ZERO);
-			}
-		}
+		let (network_graph, mut scorer) = bench_utils::read_graph_scorer(&logger).unwrap();


The decay_100k_channel_bounds function lacks documentation. It's crucial to have clear documentation, especially for complex logic, to help maintainers and contributors understand its purpose and how it should be used.

+ /// Decays the bounds of channel liquidity over time to reflect the passage of time and the potential change in channel state.

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

let (network_graph, mut scorer) = bench_utils::read_graph_scorer(&logger).unwrap();

let (network_graph, mut scorer) = bench_utils::read_graph_scorer(&logger).unwrap();

coderabbitai · 2024-03-05T18:20:14Z

lightning/src/routing/router.rs

+	/// A unique ID which describes the payer. It will not conlfict with any
+	/// [`super::gossip::NodeInfo::node_counter`]s, but may be equal to one if the payer is a
+	/// public node.
+	pub(crate) payer_node_counter: u32,
+	/// A unique ID which describes the first hop counterparty. It will not conflict with any
+	/// [`super::gossip::NodeInfo::node_counter`]s, but may be equal to one if the counterparty is
+	/// a public node.
+	pub(crate) target_node_counter: u32,


Please correct the typo in the comment on line 1148: "conlfict" should be "conflict".

- /// A unique ID which describes the payer. It will not conlfict with any + /// A unique ID which describes the payer. It will not conflict with any

The addition of payer_node_counter and target_node_counter fields is a good optimization for enhancing routing performance.

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

/// A unique ID which describes the payer. It will not conlfict with any

/// [`super::gossip::NodeInfo::node_counter`]s, but may be equal to one if the payer is a

/// public node.

pub(crate) payer_node_counter: u32,

/// A unique ID which describes the first hop counterparty. It will not conflict with any

/// [`super::gossip::NodeInfo::node_counter`]s, but may be equal to one if the counterparty is

/// a public node.

pub(crate) target_node_counter: u32,

/// A unique ID which describes the payer. It will not conflict with any

/// [`super::gossip::NodeInfo::node_counter`]s, but may be equal to one if the payer is a

/// public node.

pub(crate) payer_node_counter: u32,

/// A unique ID which describes the first hop counterparty. It will not conflict with any

/// [`super::gossip::NodeInfo::node_counter`]s, but may be equal to one if the counterparty is

/// a public node.

pub(crate) target_node_counter: u32,

coderabbitai · 2024-03-05T18:20:14Z

lightning/src/routing/router.rs

+#[repr(align(128))]
 struct PathBuildingHop<'a> {
 	candidate: CandidateRouteHop<'a>,
+	target_node_counter: Option<u32>,
 	/// If we've already processed a node as the best node, we shouldn't process it again. Normally
 	/// we'd just ignore it if we did as all channels would have a higher new fee, but because we
 	/// may decrease the amounts in use as we walk the graph, the actual calculated fee may
 	/// decrease as well. Thus, we have to explicitly track which nodes have been processed and
 	/// avoid processing them again.
 	was_processed: bool,
+	/// When processing a node as the next best-score candidate, we want to quickly check if it is
+	/// a direct counterparty of ours, using our local channel information immediately if we can.
+	///
+	/// In order to do so efficiently, we cache whether a node is a direct counterparty here at the
+	/// start of a route-finding pass. Unlike all other fields in this struct, this field is never
+	/// updated after being initialized - it is set at the start of a route-finding pass and only
+	/// read thereafter.
+	is_first_hop_target: bool,


The alignment of PathBuildingHop to 128 bytes is a performance optimization. It would be beneficial to add comments explaining the rationale behind this specific alignment to aid future maintainers and contributors.

coderabbitai · 2024-03-05T18:20:14Z

lightning/src/routing/router.rs

+		use crate::routing::scoring::ProbabilisticScoringFeeParameters;

 		let logger = ln_test_utils::TestLogger::new();
-		let graph = match super::bench_utils::read_network_graph(&logger) {
-			Ok(f) => f,
+		let (graph, mut scorer) = match super::bench_utils::read_graph_scorer(&logger) {
+			Ok(res) => res,


The use of a hardcoded seed in the test generate_routes is noted. Consider documenting the rationale behind the choice of a static seed for consistency and reproducibility purposes.

tnull

Firstly, let me say sorry for taking this long to have a look this, I believe I self-requested review a while back.

I did a first high-level pass and added an initial round of questions. I have to say that I'm close to a concept NACK on this one: the router logic is hard to reason about as it is and we keep discovering bugs here. It seems to me that this PR significantly increases the code complexity and introduces several new angles how things can go wrong. While this seems to work just fine for now, I fear that we'll see more breakage in the router code as a consequence in the future. If we really want to go ahead with this, it would be great if we could find a better abstraction for our newly created data structure that would offer a foolproof API, e.g., so we don't for get to insert/remove reused counters in the corresponding list.

I have yet to run the benchmarks myself to see how much speedup this PR would gain us, but from my first impression I'm not convinced it's worth the increased risks and maintenance costs. Also, it seems that a good chunk of the performance improvements might come from the last few commits alone, which are optimizations that could be applied independently from switching to node counters?

tnull · 2024-03-06T13:58:04Z

lightning/src/routing/router.rs


 	/// Tries to open a network graph file, or panics with a URL to fetch it.
-	pub(crate) fn get_route_file() -> Result<std::fs::File, &'static str> {


Re: "This means future changes to the scorer's data may
be harder to benchmark": Would an alternative be to keep both versions as separate benchmarks, so we could still benchmark updates impacting the Scorer's data model with synthetic data?

We could, but I think the random failures model we had before is borderline useless for benchmarking route performance changes. We're somewhat better off doing some kind of conversion from old to new scoring data and then bencharking from there.

tnull · 2024-03-06T14:11:51Z

lightning/src/routing/gossip.rs

+	///
+	/// These IDs allow the router to avoid a `HashMap` lookup by simply using this value as an
+	/// index in a `Vec`, skipping a big step in some of the hottest code when routing.
+	pub(crate) node_counter: u32,


It seems to me that storing the counter in here could be pretty dangerous if we or users were to clone the NodeInfo, do something with it and come back, at which point we could have dropped the entry and recreated another NodeInfo with the same counter.

Is this an issue? Do we want store the node_counter as part of a wrapper struct holding both the NodeInfo and the counter? Alternatively, we could make this an Option<u32> and make sure that clone() would reset it to None, asserting we'd have to re-insert/lookup it as a 'fresh' info?

Hmm, I'm not convinced its an issue. The network graph is publicly read-only - it has several internal consistency requirements (eg each channel has both side nodes in the nodes map) which imply users can't freely edit bits of it (though they're welcome to take parts of it and copy them locally to build their own graphs).

lightning/src/routing/gossip.rs

tnull · 2024-03-06T14:24:11Z

lightning/src/routing/gossip.rs

@@ -1409,15 +1429,42 @@ impl<L: Deref> NetworkGraph<L> where L::Target: Logger {
 			logger,
 			channels: RwLock::new(IndexedMap::new()),
 			nodes: RwLock::new(IndexedMap::new()),
+			next_node_counter: AtomicUsize::new(0),
+			removed_node_counters: Mutex::new(Vec::new()),


Rather than adding these just side-by-side, could we create a new data structure wrapping them and exposing adequate insert/remove API methods so that we'd never forget to, e.g., call removed_node_counters.push(..) whenever we remove a node?

Hmm, sadly that doesn't improve things much, we end up with a bunch of places where we just replace one removed_node_counters.push(..) with a node_counters.removed_node(...). We can't super easily use a method on the graph to mark a node removed, as we're often doing it with hash map entries already in place from a previous lookup.

Still, test_node_counter_consistency is pretty thorough, so if we have any obvious bugs fuzzing or tests should easily hit assertions in that.

tnull · 2024-03-06T14:26:38Z

lightning/src/routing/gossip.rs

@@ -865,6 +857,24 @@ pub struct ChannelInfo {
 	/// (which we can probably assume we are - no-std environments probably won't have a full
 	/// network graph in memory!).
 	announcement_received_time: u64,
+
+	/// The [`NodeInfo::node_counter`] of the node pointed to by [`Self::node_one`].
+	pub(crate) node_one_counter: u32,


Given that these counters will be reused, how can we be sure that they won't get outdated, especially for cloned ChannelInfos as mentioned above regarding NodeInfo?

They should only get reused when a node was fully removed. The network graph already has internal consistency requirements that any channel has both the source and sink nodes in the graph as well. This just relies on that consistency requirement by adding an additional pointer. In terms of ChannelInfos copied and used outside of a specific graph, indeed, they could point nowhere, but that's kinda by definition - the counters are specific to a NetworkGraph, they aren't global in any other sense, and each route finding operation only cares about a single graph and its contained infos.

lightning/src/routing/router.rs

tnull · 2024-03-06T14:32:31Z

lightning/src/routing/router.rs

+	/// public node.
+	pub(crate) payer_node_counter: u32,
+	/// A unique ID which describes the first hop counterparty. It will not conflict with any
+	/// [`super::gossip::NodeInfo::node_counter`]s, but may be equal to one if the counterparty is


nit: may be equal to one is ambiguous in this context (here and below).

Not sure how this is ambiguous? Its saying that this won't step on the toes of any data in our graph, but it may be equal to some data in our graph if the node is public.

lightning/src/routing/router.rs

TheBlueMatt · 2024-03-06T15:56:18Z

I did a first high-level pass and added an initial round of questions. I have to say that I'm close to a concept NACK on this one: the router logic is hard to reason about as it is and we keep discovering bugs here. It seems to me that this PR significantly increases the code complexity and introduces several new angles how things can go wrong. While this seems to work just fine for now, I fear that we'll see more breakage in the router code as a consequence in the future. If we really want to go ahead with this, it would be great if we could find a better abstraction for our newly created data structure that would offer a foolproof API, e.g., so we don't for get to insert/remove reused counters in the corresponding list.

Fair, let me encapsulate the node counter logic and remove it from get_route and then we can see how we feel about it.

TheBlueMatt · 2024-03-06T16:00:34Z

I have yet to run the benchmarks myself to see how much speedup this PR would gain us, but from my first impression I'm not convinced it's worth the increased risks and maintenance costs. Also, it seems that a good chunk of the performance improvements might come from the last few commits alone, which are optimizations that could be applied independently from switching to node counters?

Sadly not. The last few commits reduce the pressure we put on the branch predictor, and improve things a bit on the edges, but the vast majority of the gain here is dropping the hash table lookups. A very large portion of our total routing time is spent just doing hash table lookups directly (we have like 3 or 4 of them we index into in routing - the network graph, gossip data, dist, etc), so dropping one entirely is a huge win.

Until now, our routing benchmarks used a synthetic scorer, generated by scoring random paths to build up some history. This is pretty far removed from real-world routing conditions, as alternative paths generally have no scoring information and even the paths we do take have only one or two past scoring results. Instead, we fetch a static serialized scorer, generated using minutely probes. This means future changes to the scorer's data may be harder to benchmark, but makes for substantially more realistic benchmarks for changes which don't impact the serialized state.

This ensures the route benchmarks themselves will appear with a distinct callgraph, making router profiling somewhat easier.

These counters are simply a unique number describing each node. They have no specific meaning, but start at 0 and count up, with counters being reused after a node has been deleted.

In the next commit, we'll use the new `node_counter`s to remove a `HashMap` from the router, using a `Vec` to store all our per-node information. In order to make finding entries in that `Vec` cheap, here we store the source and destintaion `node_counter`s in `ChannelInfo`, givind us the counters for both ends of a channel without doing a second `HashMap` lookup.

In the next commit we'll stop using `NodeId`s to look up nodes when routing, instead using the new per-node counters. Here we take the first step, adding a local struct which tracks temporary counters for route hints/source/destination nodes. Because we must ensure we have a 1-to-1 mapping from node ids to `node_counter`s, even across first-hop and last-hop hints, we have to be careful to check the network graph first, then a new `private_node_id_to_node_counter` map to ensure we only ever end up with one counter per node id.

Now that we have unique, dense, 32-bit identifiers for all the nodes in our network graph, we can store the per-node information when routing in a simple `Vec` rather than a `HashMap`. This avoids the overhead of hashing and table scanning entirely, for a nice "simple" optimization win.

When processing the main loop during routefinding, for each node, we check whether it happens to be our peer in one of our channels. This ensures we never fail to find a route that takes a hop through a private channel of ours, to a private node, then through invoice-provided route hints to reach the ultimate payee. Because this is incredibly hot code, doing a full `HashMap` lookup to check if each node is a first-hop target ends up eating a good chunk of time during routing. Luckily, we can trivially avoid this cost. Because we're already looking up the per-node state in the `dist` map, we can store a bool in each first-hop target's state, avoiding the lookup unless we know its going to succeed. This requires storing a dummy entry in `dist`, which feels somewhat strange, but is ultimately fine as we should never be looking at per-node state unless we've already found a path to that node, updating the fields in doign so.

Now that `PathBuildingHop` is stored in a `Vec` (as `Option`s), rather than `HashMap` entries, they can grow to fill a full two cache lines without a memory access performance cost. In the next commit we'll take advantage of this somewhat, but here we update the assertions and drop the `repr(C)`, allowing rust to lay the memory out as it wishes.

While LLVM should inline and elide the redundant calls, because the router is rather large LLVM can decide against inlining in some cases where it would be an nice win. Thus, its worth DRY'ing the redundant calls explicitly.

Because we now have some slack space in `PathBuildingHop`, we can use it to cache some additional hot values. Here we use it to cache the source and target `node_counter`s for public channels, effectively prefetching the values from the channel state.

It turns out we spend several percent of our routefinding time just checking if nodes and channels require unknown features byte-by-byte. While the cost is almost certainly dominated by the memory read latency, avoiding doing the checks byte-by-byte should reduce the branch count slightly, which may reduce the overhead.

Because fetching fields from the `$candidate` often implies an indirect read, grouping them together may result in one or two fewer memory loads, so we do so here.

Because we scan per-channel information in the hot inner loop of our routefinding immediately after looking a channel up in a `HashMap`, we end up spending a nontrivial portion of our routefinding time waiting on memory to be read in. While there is only so much we can do about that, ensuring the channel information that we care about is sitting on one or adjacent cache lines avoids paying that penalty twice. Thus, here we manually lay out `ChannelInfo` and `ChannelUpdateInfo` and set them to 128b and 32b alignment, respectively. This wastes some space in memory in our network graph, but improves routing performance in return.

TheBlueMatt force-pushed the 2023-12-routing-dist-vec branch 2 times, most recently from 22f6f4b to 06bcbf6 Compare January 8, 2024 21:35

tnull self-requested a review January 10, 2024 08:40

TheBlueMatt force-pushed the 2023-12-routing-dist-vec branch from 06bcbf6 to bcbf56d Compare January 18, 2024 23:56

coderabbitai bot reviewed Jan 19, 2024

View reviewed changes

TheBlueMatt force-pushed the 2023-12-routing-dist-vec branch from bcbf56d to 8d80f82 Compare January 25, 2024 20:17

coderabbitai bot reviewed Jan 25, 2024

View reviewed changes

TheBlueMatt force-pushed the 2023-12-routing-dist-vec branch from 8d80f82 to c5e0599 Compare March 5, 2024 18:16

TheBlueMatt mentioned this pull request Mar 5, 2024

Move code out of add_entry! in get_route #2920

Open

coderabbitai bot reviewed Mar 5, 2024

View reviewed changes

tnull reviewed Mar 6, 2024

View reviewed changes

TheBlueMatt force-pushed the 2023-12-routing-dist-vec branch 2 times, most recently from 3b12e7a to 75d0c2e Compare March 19, 2024 22:26

TheBlueMatt added 8 commits March 20, 2024 00:51

f drop unused mut

6d0ab08

Push the route benchmark results into a separate uninlined function

e74c0d6

This ensures the route benchmarks themselves will appear with a distinct callgraph, making router profiling somewhat easier.

Track a counter for each node in our network graph

6d2d095

These counters are simply a unique number describing each node. They have no specific meaning, but start at 0 and count up, with counters being reused after a node has been deleted.

f sp

e9327c8

TheBlueMatt added 6 commits March 20, 2024 00:51

Consolidate candidate access in add_entry during routing

9df8d44

Because fetching fields from the `$candidate` often implies an indirect read, grouping them together may result in one or two fewer memory loads, so we do so here.

TheBlueMatt force-pushed the 2023-12-routing-dist-vec branch from 75d0c2e to f53cc4c Compare March 20, 2024 00:51

		const _NODE_MAP_SIZE_TWO_CACHE_LINES: usize = 128 - core::mem::size_of::<Option<PathBuildingHop>>();
		const _NODE_MAP_SIZE_EXACTLY_TWO_CACHE_LINES: usize = core::mem::size_of::<Option<PathBuildingHop>>() - 128;

		payer_node_counter: u32::max_value(),
		target_node_counter: u32::max_value(),

		source_node_counter: u32::max_value(),
		target_node_counter: u32::max_value(),

	let (network_graph, mut scorer) = bench_utils::read_graph_scorer(&logger).unwrap();
	let (network_graph, mut scorer) = bench_utils::read_graph_scorer(&logger).unwrap();


		/// Tries to open a network graph file, or panics with a URL to fetch it.
		pub(crate) fn get_route_file() -> Result<std::fs::File, &'static str> {

Use unique per-node "node_counter"s rather than a node hashmap in routing #2803

Are you sure you want to change the base?

Use unique per-node "node_counter"s rather than a node hashmap in routing #2803

Conversation

TheBlueMatt commented Dec 21, 2023 • edited

shaavan commented Dec 22, 2023

TheBlueMatt commented Jan 8, 2024

TheBlueMatt commented Jan 18, 2024

codecov-commenter commented Jan 18, 2024 • edited

Codecov Report

coderabbitai bot commented Jan 18, 2024 • edited

Walkthrough

Changes

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (.coderabbit.yaml)

CodeRabbit Discord Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jan 19, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 19, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 19, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 19, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 19, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 19, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 19, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

coderabbitai bot Jan 25, 2024

Choose a reason for hiding this comment

TheBlueMatt commented Mar 5, 2024

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Mar 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Mar 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Mar 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Mar 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Mar 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Mar 5, 2024

Choose a reason for hiding this comment

coderabbitai bot Mar 5, 2024

Choose a reason for hiding this comment

tnull left a comment • edited

Choose a reason for hiding this comment

TheBlueMatt commented Dec 21, 2023 •

edited

codecov-commenter commented Jan 18, 2024 •

edited

coderabbitai bot commented Jan 18, 2024 •

edited

CodeRabbit Configration File (`.coderabbit.yaml`)

coderabbitai bot Jan 19, 2024 •

edited

coderabbitai bot Jan 19, 2024 •

edited

tnull left a comment •

edited

TheBlueMatt Mar 19, 2024 •

edited