Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CAN FD support #16

Merged
merged 9 commits into from
Dec 5, 2022
Merged

Add CAN FD support #16

merged 9 commits into from
Dec 5, 2022

Conversation

j4r0u53k
Copy link

No description provided.

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated
}

pub struct CANFrame {
tag: CANFrameType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very familiar with CanFD, but it looks like the header of a frame contains different information compared to a classical CAN frame. E.g. no RTR flag, but adds EDL, ERS and BRS.

From an API perspective wouldn't it make more sense to have a completely separate CanFdFrame struct. Then you can also easily implement functions for EDL, ERS, and BRS and drop functions e.g. for setting RTR. You'd need a separate `read_fd_frame then but i think it would make using it easier.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the bus level there is EDL bit indicating a FD frame, but this bit is not present in SocketCAN data structures. In fact, the only way one can determine if a frame is classical or FD is from the size returned by read (or recvmsg) syscall (please read SocketCAN documentation: https://www.kernel.org/doc/Documentation/networking/can.txt - Remark about CAN FD (flexible data rate) support).

That SocketCAN API is also the reason why you can't have separate struct and function for reading CAN FD - you don't know whether you get classical or FD frame until you read it.

Cargo.toml Outdated
@@ -16,3 +16,4 @@ nix = "0.15"

[features]
vcan_tests = []
can_fd = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand why you put this here, I am not sure though whether it's not easier in long term to just leave out the feature and make it non optional. Might make sense if there are kernel versions that only support classical can though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this option does not complicate things. Leaving it optional means that users who do not use CAN FD at all will have those 16 B (+ tag) size structures instead of useless 72 B (+ tag). And as far I know most CAN controllers do not support CAN FD.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that it is overkill to have FD as a feature, let the user (or log file) tell whether FD should be enabled or not and risk wasting a few bytes if it isn't used. The differences between classical CAN and FD is so small anyway, so lets not complicate things more than necessary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that having the FD as a feature is such an overkill. Isn't this a case what the features are made for? Besides, with any further extensions of the protocol (CAN XL) I'm afraid there will be need to feature them out, as the differences from the original protocol will become more significant.

However, I don't want to block this PR from merging, so ok, let's sacrifice a few tens of bytes and CPU instructions to make the code a little bit simplier.

According to your proposals, I would remove the can_fd feature, make the data field in the CANFrameStruct 64 bytes long and let the FD-regarding methods to compile in unconditionally. If the system or controller does not support the FD mode, it will be handled at runtime by returned error.

Addresses review on socketcan-rs#16.
@simontegelid
Copy link

Great addition! dump.rs should probably be updated to handle "##"-log entries as well.

@picode98
Copy link

How much more work would be needed to merge this? This functionality would be quite useful to an organization I'm working with. I don't have a lot of Rust experience, but let me know if there's something I could do to help push this over the line.

Signed-off-by: Jaroslav Beran <jara.beran@gmail.com>
Signed-off-by: Jaroslav Beran <jara.beran@gmail.com>
@j4r0u53k
Copy link
Author

j4r0u53k commented Oct 24, 2021

I have further nothing to add to this PR. Is there anything else the reviewers want to address?

@marcelbuesing marcelbuesing mentioned this pull request Feb 15, 2022
@XVilka
Copy link

XVilka commented Nov 25, 2022

Reviving this to support CAN FD is also very important - every modern system uses it @fpagliughi.

@fpagliughi
Copy link
Collaborator

Yes, I am going to look through these existing PR's ans start testing and merging within the next week or so. This one is on top of the list.

@fpagliughi fpagliughi mentioned this pull request Nov 26, 2022
src/lib.rs Outdated
/// buffer for data
_data: [u8; 8],
/// buffer for data (classical or FD frame)
_data: [u8; CAN_FRAME_DATA_LEN_MAX]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an issue newly introduced with this PR, but does this accurately capture the behavior/memory layout induced by __attribute__((aligned(8))) in the original C header? If #[repr(C)] captures the default C behavior, the entire struct should have alignof = 4, which will probably be irrelevant for heap allocations (as the allocator just defaults to boundaries of 8 on most systems), but might be utilized by the compiler on the stack..

https://doc.rust-lang.org/reference/type-layout.html mentions that you could do #[repr(C, align(8))] on the struct, which, in combination with the padding, would probably always shift this field to the right offset?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand why the __attribute__((aligned(8))) in the original C header in the Linux source is at the data field and not at the entire struct can_frame (canfd_frame)?

What if the struct begins at an multiple of 4 address that's not multiple of 8? Would it just allocate 4 more bytes at the end of the data or would it make the data begins at the boundary of 8 (which would introduce a 4 bytes gap between _res1 (len8_dlc respectively) and data?

Copy link
Contributor

@jreppnow jreppnow Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can only speculate as to the reasons, both in regards to why this is used at all and why it is on the field specifically and not the entire struct. I do actually work with one of the people responsible for the CAN Linux drivers/the API, and I can ask them when we have the next meeting together (probably this week), if you are really curious.

Regarding your second question: There are two things to consider.

  1. A parent type's (a struct or union in C) alignment in C is the least common multipe of its children's alignments. That means instantantiating a struct such as this one on an address that is not a multiple of 8 is already an error (I honestly do not know if it is UB in C, I think it depends on how the code interacts with the buffer array, for example, it would certainly be UB if it is interpreted as a u64 at some point on a platform that requires u64 to be aligned at 8 byte boundaries.. UPDATE: it is UB even to just have a misaligned pointer: http://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7), as it inherits alignment 8 from the field with the attribute. malloc and other C methods will generally return memory that's capable of holding the biggest possible alignment on the platform (these days usually 8 or 16 on a desktop, depending).
  2. No padding is inserted here, as the previous members conventiently already take up 8 bytes of space. If that where not the case, a minimal amount of padding would be inserted as to arrive at an offset of n * 8 off of the start of the struct. In general, a type's layout is fixed at compile time does not change depending on where it is created.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess that the reason to align the data field instead of the whole struct is to be completely explicit about the intent; that you could expect the data to be aligned even if the rest of the struct changed in the future.

That said, clearly the existing Rust implementation is incorrect. That alignment must be made part of the Rust struct definition. I remember an old trick where you put a zero-length array into the struct of a type to align it, like:

pub struct CanFrame {
    _id: u32,
    _data_len: u8,
    _pad: u8,
    _res0: u8,
    _res1: u8,

    _alignment: [u64; 0], 
    _data: [u8; 8],
}

This appears to align the whole struct to 8-bytes, and not just the data field. But that's OK. It would work.

So then I wondered, "What would bindgen do?", and there it its:

#[repr(C)]
#[repr(align(8))]
#[derive(Debug, Copy, Clone)]
pub struct can_frame {
    pub can_id: canid_t,
    pub can_dlc: __u8,
    pub __pad: __u8,
    pub __res0: __u8,
    pub __res1: __u8,
    pub data: [__u8; 8usize],
}

It just aligns the whole struct, landing the data field at the right place. It's the same thing that the _alignment field does, but is more explicit, so preferable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreppnow Thanks for the link to the C standard. If you can ask your colleagues about the attribute placement it would be great.

I'd go for #[repr(align(8))] on the whole struct if there is no serious reason to do otherwise.

src/lib.rs Outdated
}

pub fn is_fd(&self) -> bool {
if let CANFrameType::Fd = self.tag { true } else { false }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a simple == without the if let might be simpler.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll change it as you propose.

src/lib.rs Outdated
let fd_frames_enable = fd_frames_enable as c_int;
let opt_ptr = &fd_frames_enable as *const c_int;
let rv = unsafe {
setsockopt(self.fd, SOL_CAN_RAW, CAN_RAW_FD_FRAMES, opt_ptr as *const c_void, size_of::<c_int>() as u32)
Copy link
Contributor

@jreppnow jreppnow Nov 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this operation fail when only "normal" CAN is available? Otherwise it might be worthwhile setting this in general on all created CANSockets, might remove some (mildly) sharp corners for API users. Scenario: I open a socket and expect to be able to send FD frames. However, for some reason, send_frame() keeps failing.

Since this PR already makes the inner type of the CANFrame struct be compatible with CAN-FD (i.e. library users pay the cost for the memory anyway), it might make sense to just set this option in general.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may fail if the controller does not support FD mode, or maybe for other reasons as well.
I don't think setting it automatically is a good idea. Rather let user have control on what's going on and let him set the mode explicitly than let some things happens under the hood.

Your proposal (2.) in your comment #16 (comment) seems to me like a good approach.

@jreppnow
Copy link
Contributor

Sorry for intruding, came here from the linked feature request and thought I'd have a brief look.

src/lib.rs Outdated
/// > uninitialized data in a variable even if that variable has an integer type,
/// > which otherwise can hold any fixed bit pattern [...]
///
s: unsafe { std::mem::MaybeUninit::zeroed().assume_init() }
Copy link
Contributor

@jreppnow jreppnow Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the correct way to use MaybeUninit in this case would be as following:

  1. let mut frame: MaybeUninit<CANFrameStruct> = MaybeUninit::uninit();
  2. Perform the syscall with a pointer to the uninit (memory layout is identical in this case, and FFI-related properties are maintained).
  3. If return value is okay, check if FD or not (store in tag) and then return:
Ok(CANFrame {
    tag, 
    s: unsafe { frame.assume_init() }  
})

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I wish that would work... But is there really guarantee that's not UB?

Copy link
Contributor

@jreppnow jreppnow Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your specific concerns? To me, there are 3 potential sources of UB, and I will briefly discuss why I do not think that they apply:

  1. Creating a value with an illegal byte layout is UB (example in the documentation https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html is a reference, which may never be entirely null on the byte level). This is not relevant for our type, as all contained types accept all possible byte layouts (they are primitives that are not bool or char, which have restrictions).
  2. Using MaybeUninit::assume_init() to avoid invariants of a type, for example established by a constructor, can lead to UB. This is not relevant, as this is a POD without any actual semantics and particularly, also without a Drop implementation, as it only a C wrapper.
  3. Creating a variable from memory that has not been initialized leads to its value not being "fixed" and interacting with such a variable is UB. Here, we rely on the guarantee of the sys call that the C code actually properly initializes all the fields (when the return value indicates success). Personally, I think that is a reasonable assumption, although I have to admit I am not a 100% sure about the padding bytes. However, one might argue that we are not actually reading from or writing to these values. It is fair to argue that looking only at the header, with the underlying implementation prone to change, we cannot actually guarantee this. In this case, we would have to go with an initialized version of the struct.

As long as the type is wrapped in a MaybeUninit, it is fine for the memory to not be initialized, i.e. violate the above requirements. So the only risk occurs on the assume_init(). Btw, in C code, it is perfectly fine to write to uninitialized memory (reading from it is also UB in C).

An important distinction to take not of is that I am explicitly not making the type of the variable be MaybeUninit<CANFrame>, but MaybeUninit<CANFrameStruct>. A type containing an Uninit type might have different layout than the same type with just the nested type in it. This, however, would not even compile here as CANFrame is not generic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is about the compiler assumptions. I don't doubt, that the C function called via FFI actually do the right thing with the pointed memory. But does the Rust compiler really get to know that the memory got written and doesn't do some kind of reordering resulting in returning invalid values regardless the actual memory content?

From the examples in the documentation (https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html) they always do some kind of write operation before claiming assume_init on the memory. But can a FFI call be counted as such write?

@jreppnow
Copy link
Contributor

I've thought about this for a while and as a library user, I think it's not very nice to be forced to use the bigger CanFdFrame object if I am only dealing with normal CAN frames. For a single frame, this might not be an issue, but if you for example store a couple of them, say in a vector, this completetely ruins cache efficiency on iteration for no reason, if you are not actually using FD.

I would propose having both C structs and changing the API of CANSocket in such a way that it either

  1. has methods that accept both frame formats (easy to implement, some traps for users)
  2. starts out as a normal CAN socket, which only produces and takes normal CAN frames, and has an upgrade method that consumes self, sets the corresponding sockopt and turns it into a CAN-FD socket which only takes and produces CAN-FD frames. That way, users may not accidentally specify a too small frame as in (1.) (but the implementation gets a bit more complicated and there might be some unavoidable code duplication in the library)

@j4r0u53k
Copy link
Author

j4r0u53k commented Nov 28, 2022

I've thought about this for a while and as a library user, I think it's not very nice to be forced to use the bigger CanFdFrame object if I am only dealing with normal CAN frames. For a single frame, this might not be an issue, but if you for example store a couple of them, say in a vector, this completetely ruins cache efficiency on iteration for no reason, if you are not actually using FD.

I would propose having both C structs and changing the API of CANSocket in such a way that it either

1. has methods that accept both frame formats (easy to implement, some traps for users)

2. starts out as a normal CAN socket, which only produces and takes normal CAN frames, and has an upgrade method that consumes `self`, sets the corresponding sockopt and turns it into a CAN-FD socket which only takes and produces CAN-FD frames. That way, users may not accidentally specify a too small frame as in (1.) (but the implementation gets a bit more complicated and there might be some unavoidable code duplication in the library)

That's a good point. I'd vote for 2. as that solution seems cleaner and more Rust-idiomatic to me (I'm not an Rust expert tho).
The upgraded CAN-FD socket would still need to support both type of frames (since you can send and receive both normal and FD frames over a FD socket). I can let the CAN-FD socket work with the current implementation of CANFrame with the tag.

I'd also like to point out that there might be other implementations of socketcan, not only in the Linux (AFAIK there actually is one in the NuttX RTOS). It would be also nice to allow users work with only normal CAN frames if they don't want to use FD HW at all.

@abalmos
Copy link

abalmos commented Nov 28, 2022

Just to throw it out there, I created a J1939 (common in large trucks, construction, and agriculture) rust layer over SocketCAN's J1939 implementation. It has many of the same issues as being discussed here with respect to CAN-FD. Larger payloads (via transport protocols) and perhaps other features like address claiming (managed by kernel) and alternative filtering options. The rust wrapper is "complete" in the sense that it works for my application. It also uses CMSG to pull out hardware timestamps, and other J1939 specific data like priority and destination addr/name. It could use some tests.

Ultimately, I felt like SocketCAN CAN sockets are sufficiently different enough from SocketCAN J1939 sockets that trying to force them into the same struct was hurting the API ergonomics. Instead, I created a J1939Socket struct which focuses on safely wrapping the J1939 Linux API--similar to this crate for CAN. In the end, Linux/SocketCAN is doing all the work, but the semantics between the two mode are different.

J1939-22 adds CAN-FD support to J1939, but that is not supported at this time (nor in the Kernel either, I believe).

Most of the changes to libc, etc. needed for my layer are now upstream, and I should probably actually publish it 😆. That said, there are some commonalities between CAN / J1939 (the SocketCAN part). Maybe they should live together and share those common things (like CMSG support and safe wrappers around opening the socket, binding, and flag compatibility)?

Sorry if this is off-topic, but seemed related to the question at hand: addressing how to deal with the various modes of SocketCAN.

@fpagliughi
Copy link
Collaborator

I'm also of the opinion that we should have separate structs for normal frames and FD frames. That's the way the C API and kernel seem to treat it, so there's a nice parallel there as well, but of course with C, since the structs overlap, you can cheat to keep things simple.

For @jreppnow 's option #2, would that mean we have a separate CANSocket and CANFDSocket?

That seems workable if a little unwieldy. But since an FD socket should be able to work with both normal and FD frames, it would probably want:

fn read_frame(&self) -> Result<CANFrame> { ... }
fn write_frame(&self, frame: &CANFrame) -> Result<()> { ... }

fn read_fd_frame(&self) -> Result<CANFDFrame> { ... }
fn write_fd_frame(&self, frame: &CANFDFrame) -> Result<()> { ... }

But if we do that, it makes me think we can just add those two functions, read_fd_frame() and write_fd_frame() to the regular CANSocket, which would fail if the socket is not in "FD" mode.

Then, of course, it would just need a function to enable "FD" mode, like:

fn enable_fd(&self) -> Result<()> { ... }

That would just set the CAN_RAW_FD_FRAMES socket option.

OK, so I went full circle on this. I'm thinking that a single CANSocket type that can enable/disable FD mode, and read/write separate CANFrame and CANFDFrame types is the better way to go.

@fpagliughi
Copy link
Collaborator

fpagliughi commented Nov 28, 2022

So I was thinking... If we do have two separate frame types, when you read from an FD-enabled socket, what do you do if a normal (non-FD) frame arrives? I suppose you could put it in a CANFDFrame and mark it as non-FD? Meaning is_fd() returns false.

Or should the read_fd_frame()and write_fd_frame() functions be able to handle both types?

If we want to get a little freaky, we could wrap the frame types in an enum with some implicit conversions:

#[derive(Debug)]
pub enum CanAnyFrame {
    Normal(CanFrame),
    Fd(CanFdFrame),
}

impl From<CanFrame> for CanAnyFrame {
    fn from(frame: CanFrame) -> Self {
        CanAnyFrame::Normal(frame)
    }
}

impl From<CanFdFrame> for CanAnyFrame {
    fn from(frame: CanFdFrame) -> Self {
        CanAnyFrame::Fd(frame)
    }
}

Then the socket could take and return either type within the enum:

impl CanSocket {
    pub fn read_fd_frame(&self) -> Result<CanAnyFrame> {
        // do the read
        if is_fd {
            Ok(CanAnyFrame::Fd(frame))
        }
        else {
            Ok(CanAnyFrame::Normal(frame))
        }
    }

    fn write_fd_frame<F>(&self, frame: F) -> Result<()>
    where
        F: Into<CanAnyFrame>
    {
        match frame.into() {
            CanAnyFrame::Normal(frame) => {
                 // Writing normal frame...
            },
           CanAnyFrame::Fd(frame) => {
               	// Writing FD frame...
           }
        }
        Ok(())
    }
}

I haven't totally convinced myself that this is the best way to go. The write works well, but the read gets a little verbose, having to match every time. But it does keep a clean separation of the frame types.

@jreppnow
Copy link
Contributor

jreppnow commented Nov 28, 2022

I got kinda mentally stuck on this and decided to create a draft version for idea 2., see https://github.com/jreppnow/socketcan-rs/tree/can-fd-alternative, very much wip, but it does compile and work (I commented out the dump part for now though..).

Things to pay attention to:

  • upgrade_to_fd and downgrade_to_2_0 functions
  • I used some default trait functions to avoid some duplication between CanFrame and FdFrame, which kinda turned out to be going in the direction of the HAL thing, interestingly enough
  • I adjusted the tests, so you can see the API in action a little bit
  • I kept the general settings like nonblock and timeout on the actual CANSocket struct, and pass out references to that to avoid duplicating that code. The write_frame (writing a 2.0 frame) function is identical between FD and 2.0, so I also put it in CANSocket and just call it from the respective implementations

The branch also contains my answer: The FD socket actually does pretty much what you describe, it can either return an fd frame, a 2.0 frame or an IoError.

Only difference is I used TryFrom when going from FdFrame to CanFrame, since this may fail.

@fpagliughi
Copy link
Collaborator

Hey @j4r0u53k, first of all, thanks so much for staying on this PR after it sat for so long! And apologies that we're all suddenly throwing different and competing ideas at you.

Looking at your draft branch, most of the ideas and implementation are there, but we need to get the wrapping and some naming issues together.

The more I think about it, the more I believe that there should only be one socket type, CanSocket, which can be switched into and out of an "fd mode."

So let's drop the Can2_0 and Fd socket types and consolidate the impl blocks into one. We don't need the upgrade/downgrade functions. Just make the set_fd(bool) function public. Maybe call it set_fd_mode(bool)?

The FdRead type is just what I was talking about, but could use a different name, because I see it being a writable type as well, like I described above.

In that case your Fd::read_frame() could become CanSocket::read_fd_frame(). Easy enough.

Don't worry too much about errors and error types. I think those will all be getting updated soon anyway.

Finally, it looks like you renamed the types from CanSocket to CANSocket and CanAddr to CANAddr, etc. I definintely don't want to do that. It's an unnecessary breaking change, would make it difficult to merge in some of the other outstanding PR's, and I actually think it's less readable. I was thinking it more likely that we should consider dropping the "Can" altogether since these things are namespaced in the module(s), but that is something to consider as a separate issue once all the other PR's are in.

@jreppnow
Copy link
Contributor

jreppnow commented Nov 29, 2022

Also sorry from me for throwing ideas around and just starting to implement stuff, @j4r0u53k!

@fpagliughi As I've stated, I prefer the solution with separate types, but in the end, both will work, and you as the maintainer get to decide. I can implement the desired changes (including the naming thing, but the wrong CamelCase really irked me...), or I can let @j4r0u53k pick the stuff from my branch that they want to use and finish the work here. Don't wanna "steal" the PR after all :) Please give me a hint as to your preference (mostly directed at @j4r0u53k ^^).

One thing that we need to consult on is the safety of the FFI handling in the read_frame function of the (current) FD struct. The problem is that I assume that in the case of a standard (CAN 2.0) frame, the kernel method is only going to write the first 16 bytes of the struct and leave the rest unwritten - which means that assume_init on this struct before calling into is UB. My current solution would be to use an even more evil method, std::mem::transmute_copy to transmute the MaybeUninit<CANFDFrame> either directly into a CANFrame or into a MaybeUninit<CANFrame> and then assume_init on it. I believe that either would be defined, but if there are differing opinions, we might need to find another solution.

@fpagliughi fpagliughi mentioned this pull request Nov 29, 2022
@fpagliughi
Copy link
Collaborator

Oh, my apologies, I got you two mixed up! The j's fooled me.

Sorry if I'm misunderstanding specifics... I agree we should have separate frame types, but think we should have one socket type. I think that's simpler to implement and maintain, and more closely models what the kernel is doing with an on/off socket option "mode" switch.

So kinda the socket from @j4r0u53k 's original PR here, with the frames from @jreppnow 's alternate, and then fix up the read and write functions like I suggested?

For the read_frame() function, I don't think we need to get too complicated. We obviously need to send a pointer to a CanFdFrame to the C read() function and then check how much was read. If it's 16 bytes, then convert it to a standard frame and return that.

If we started with a properly zeroed-out frame then any successful read should give us a valid Rust struct. So we really don't need to play with unsafe memory access here. Maybe use a default() to give an empty frame, like:

impl Default for FdFrame {
    fn default() -> Self {
        _id: 0,
        _data_len: 0,
        _flags: 0,
        _res0: 0,
        _res1: 0,
        _data: [0u8; 64],
    }
}

Create one of those, then send a pointer to it to the read() function, and if the read succeeds with even a 16-byte result, the struct should be valid (with the rest of the data array at a known value of zero).

@fpagliughi
Copy link
Collaborator

Actually, if it's any easier for all of us, we can do this one step at a time...

  • Land this PR as it is. (I can fix the type names back to CanSocket, etc)
  • Maybe @jreppnow can create a new PR on top of it to split the frame into the two separate types, by cutting and pasting from the 'alternate' work
  • Then we can look to getting Embedded HAL Traits #24 rebased and landed on top of all of this.
  • Then lots of cleanup

But either way, this is @j4r0u53k 's PR, so it's their decision to make as for the next step.

@j4r0u53k
Copy link
Author

j4r0u53k commented Nov 30, 2022

Thanks for the contributions to this PR. I need some time to think about the topics being discussed though.

There is a good reason introduced by @jreppnow to let users work just with the smaller 16B frames when they don't want to utilize FD at all. And I believe that this choice should be possible to make at run-time. But I don't like the idea of upgrading/downgrading the socket.

I recognize these two main options from what you have proposed so far:

  1. Either just have functions to create either CanSocket or CanFdSocket. With a help of traits and some default implementations, most of the code duplication could be possibly avoided.
  2. Keep only one CanSocket type and have read_frame, read_fd_frame functions and their write counterparts there. But what if someone on a FD-enabled socket call read_frame and there was a FD frame in the buffer? Should the function return an error? In fact there is no point in using the simple read_frame when the socket is in FD mode since a FD frame can arrive at any time. Such API makes me feel it's not very "correct" then.

I'm also thinking if we can avoid using the Rust enum for FD-compliant socket. What if there was just a flag distinguishing the frame type. You always work with the whole 72 B structure after all, at least at init. However, I realize that enum is more Rust idiomatic and nice in general.

@fpagliughi
Copy link
Collaborator

I don't have any statistics, but I'm guessing that most people would still be using standard CAN 2.0 frames. So I wouldn't want to introduce an overhead for them. FD should be a zero overhead feature for people not using it.

As for having one or two different socket types, I'm leaning toward having just the one CanSocket, even with the issues you mention. But I'm starting to get the feeling you folks believe having the two separate ones would be safer. I'm willing to be outvoted on this!

But I agree, the upgrade/downgrade isn't very idiomatic. You should be able to make a CanFdSocket on it's own the same way you can make a CanSocket. But it does seem natural that you could at least convert a regular one to an Fd:

impl From<CanSocket> for CanFdSocket { ... }

But definitely, the CanFdSocket should seamlessly read and write either frame types. As I showed above, a single write function could accept either type when wrapped in an enum. For the read, the enum is a little painful, but properly type-safe. Maybe having a choice of read functions, which are both "safe" but have different return types:

fn read_frame(&self) -> Result<CanAnyFrame> { ... }
fn read_fd_frame(&self) -> Result<CanFdFrame> { ... }

In the case of read_fd_frame() the returned frame is always an 'Fd' type, but can convert to a CanFrame using TryFrom if it was, in fact, a standard frame.

@jreppnow
Copy link
Contributor

jreppnow commented Nov 30, 2022

Below are some more thoughts from me. As an aside, I agree with the notion that we should probably find a way to start landing parts of this PR, especially since I presume it's either racing with or blocking the other pending ones?

First, I wanted to clearly reiterate what I see as the advantages of the 2-struct solution over having just one socket type (based on the current library feature set, there might be more in the future):

  1. We can safely (read: with reasonable semantics) offer the users a read method that only returns a CAN2.0 frame, since with the 2-struct approach we can be sure at compile-time that if we have a Can2_0 struct, the contained socket is not actually in FD mode. The problem is that if we provide it otherwise, the semantics are somewhat arbitrary - either it's always an error or it's only an error if the received FD frame is not convertable into a 2.0 frame.
  2. We can prevent the users at compile-time from trying to send or receive FD frames when they have not "upgraded" yet. This is nice in my opinion, but unlike with case 1., the semantics of trying something like this would be clear (just return an error), so it's not as crucial.

Having a type system powerful enough to model stuff like this has always been an appeal of Rust to me, but I can also understand the desire to reduce complexity (both in API as well as in implementation).

Regarding the feedback:

  • Criticism of upgrade_to_fd and downgrade_to_2_0 functions: I think it's important to provide an easy method to switch modes on an existing socket. My personal usecase is actually as follows: Create a 2.0 socket, see if I can upgrade it, if yes: Work with FD frames, otherwise keep the 2.0 socket and just work with normal frames (in a distinctly different way). There are protocols that mandate this kind of behavior as well. However, I hear the criticism with regards to these functions. An alternative would be implementing TryFrom for both structs from the other respectively. On thing that I would like to keep is to return the old struct in case of an error (together with the io::Error), so that one does not have to create a new socket in case the operation fails but one still wants to continue sending frames.
  • There should be a direct way to construct an Fd (a socket enabled for sending and receiving FD frames): Agreed. Just for reference, the reason I built it in the way I did in the draft is because I got kinda locked in mentally on mimicking the way the Linux kernel socket works as closely as possible. However, on second thought, there is not need to burden the users with this additional complexity.
  • One additional idea, albeit probably slightly controversial: To make the differences between the types even more seemless for the users, we could have them implement Deref and DerefMut into the contained socket type (CanSocket in my case), so they essentially provide all of its functions like set_nonblocking directly. However, I realize that these traits are (with good reason) considered to be only for smart pointers and similar types by some. At the moment, it's simple getters: socket() and socket_mut()

@jreppnow
Copy link
Contributor

jreppnow commented Nov 30, 2022

Oh, one more thing that I sort of snuck into my draft but is pretty important: The close(&self) in its current form should definitely go or at least, it should be changed to take close(self) with an empty body, relying on the Drop impl to actually close the socket. Reason: At the moment, it's completely broken: It allows the user to close the socket when they are still holding on to our type - the RawFd is not reset to -1 or something like that, and there are no checks anywhere for that anyway. Closing the socket should consume our wrapper and make it unusable. With this, I'm frankly perfectly fine with breaking backwards compatability, since this might very well be a very broken bit of code somewhere in someone's code.

If everyone is fine with that, I would just open a quick PR for this and the alignment issue of the C struct, so that the changes do not pollute this PR?

(Edited out the mild C++ rant)

Nevermind: close(&self) was actually private in the original version, so it's not that big a deal, bit unnecessary maybe, but not really any kind of danger - was not used outside of Drop impl anyway. Sorry about this confusion..

@fpagliughi
Copy link
Collaborator

fpagliughi commented Nov 30, 2022

Oh, one more thing...

@jreppnow , please try to stay on-topic in this FD pull request! Open another issue about the close() semantics and copy-paste your comment in there. (But I mostly agree, and a number of things in this crate are completely broken, which will be fixed).

OK.

I presume [this PR] either racing with or blocking the other pending ones?

Yes. We're waiting on this one, after which we'll look to rebase and merge #23. That one has a lof of code and file refactoring, so we really want to get this one in first. And my time is limited to work on CAN stuff (a few weeks), before I need to move onto other projects.

implementing TryFrom for both structs from the other respectively. On thing that I would like to keep is to return the old struct in case of an error (together with the io::Error),

Yes. Completely agree.

There should be a direct way to construct an Fd (a socket enabled for sending and receiving FD frames):

Just as a side note... as I mentioned before, please try to keep the naming conventions that already exist in the crate. So as the existing socket is called a CanSocket, the new one should be called CanFdSocket or similar.

To the point, I'm kinda OK with the multiple socket model, because it sort of mimics how other libraries (in various languages) split out network TCP sockets into "connector", "acceptor", and "streaming" socket types. But, on the other hand, I do personally just see it as a mode in a specific socket (The OS doesn't create a new file handle, etc. It just sets an option on the existing handle).

One additional idea, albeit probably slightly controversial:

This sounds like just the kind of thing that would be appropriate for a totally separate PR!

We should try to keep PR's smaller and targeted, especially if they would break the API or be up for discussion.

@jreppnow
Copy link
Contributor

Sorry, I really do get side-tracked too easily ^^

From my point of view, regardless of whether we go for single socket or multi socket, the relevant questions are cleared and it's just up to who is going to implement the final push and clean-up - @j4r0u53k obviously has first dips, but as I mentioned, I'm glad to further contribute if you want me to/don't have much time atm.

@j4r0u53k
Copy link
Author

Maybe having a choice of read functions, which are both "safe" but have different return types:

fn read_frame(&self) -> Result<CanAnyFrame> { ... }
fn read_fd_frame(&self) -> Result<CanFdFrame> { ... }

In the case of read_fd_frame() the returned frame is always an 'Fd' type, but can convert to a CanFrame using TryFrom if it was, in fact, a standard frame.

@fpagliughi One read function on CanFdSocket would be enough: fn read_frame(&self) -> Result<CanAnyFrame> { ... }. That one would give you the right frame (enum variant) every time (unless the syscall returns error). You can't convert CanFdFrame to CanFrame, because it is just not a CanFrame.

@j4r0u53k
Copy link
Author

j4r0u53k commented Dec 1, 2022

@jreppnow About upgrade / downgrade to from FD I was trying to think of an actual use case, but I couldn't come up with any that would require to do dynamic upgrade or even downgrade on demand. In fact you have rather a CAN network of known devices that you keep some static configuration for. A potential user of the library knows what hardware and protocols does he use in his network.

With socketcan in particular, you even have to know before opening the socket, whether this CAN controller supports FD mode and set up bitrate and other config on the link layer when creating the interface. So you can also easily pass this information, whether FD or not, to your app and create exact one of those types directly.

If you need to detect the controller's FD capabilities on fly, you can check the interface for MTU as the docs state (https://docs.kernel.org/networking/can.html#raw-socket-option-can-raw-fd-frames).

I'm looking at the setting of the CAN_RAW_FD_FRAMES socket option as just an runtime API adjustment on the kernel level (from now on use these different rules, structures and sizes), which translated to the Rust world seems like saying a similar: use these different structures and interface. This is something that IMHO belongs to the socket contructor.

@j4r0u53k
Copy link
Author

j4r0u53k commented Dec 1, 2022

Since you claim this PR is a blocker I would then do the following:

  • implement CanSocket working with CanFrame structures
  • implement CanFdSocket working with CanAnyFrame (an enum of CanFrame, CanFdFrame)
  • some cleanup and fixes of the struct alignment
  • try to find out the most optimal initialization (using std::mem::MaybeUninit) that would surely not be an UB

Hopefully by the end of this week (@fpagliughi is it ok for you?).

If the code passes the review and the PR is merged, address any following issues in any of the following PRs (adding upgrade and downgrade if it helps anybody :P)

@fpagliughi
Copy link
Collaborator

@j4r0u53k , that sounds great. Thanks for putting a rush on it. If you can't get everything completely cleaned up, just push what you have, because the whole of it will be revised a few times over the next couple of weeks!

As for the last item, initializing with MaybeUninit, look at my message from yesterday about using a (properly) zeroed-out Default implementation. So, create a valid, though empty, frame with length and all the data as zeros, then send a pointer to that to the read() function. That way, on successful return, the frame will have a fully-known value. The data in the first n bytes, padded with zeroes for the rest.

But I'm making an assumption that the CAN kernel drivers don't zero fill the frames; they only fill the len number of bytes and leave the rest as-is. I could be wrong. But we could always optimize later if that were the case.

@jreppnow
Copy link
Contributor

jreppnow commented Dec 1, 2022

@j4r0u53k Regarding upgrade/downgrade: This is definitely not a feature strictly required to make this PR work, so afaiac you could just leave it out in your work entirely, and there can be a future PR to discuss its merits and possible implementations (edit: which I now noticed you also suggested, so yeah, let's go with that).

Thank you again for your implementation efforts!

@fpagliughi
Copy link
Collaborator

Oh, and, yes, just add:

#[repr(align(8))]

To the CanFrame and CanFdFrame structs to fix the alignment issue.

@fpagliughi fpagliughi mentioned this pull request Dec 2, 2022
@fpagliughi
Copy link
Collaborator

Note the definition of the frames from the libc crate, which just verifies the alignment:

src/unix/linux_like/linux/align.rs
125-
126-            // linux/can.h
127-            #[repr(align(8))]
128-            #[allow(missing_debug_implementations)]
129:            pub struct can_frame {
130-                pub can_id: canid_t,
131-                pub can_dlc: u8,
132-                __pad: u8,
133-                __res0: u8,
134-                __res1: u8,
135-                pub data: [u8; CAN_MAX_DLEN],
136-            }
137-
138-            #[repr(align(8))]
139-            #[allow(missing_debug_implementations)]
140:            pub struct canfd_frame {
141-                pub can_id: canid_t,
142-                pub len: u8,
143-                pub flags: u8,
144-                __res0: u8,
145-                __res1: u8,
146-                pub data: [u8; CANFD_MAX_DLEN],
147-            }
148-        }

In order to be up to date with master branch.
@j4r0u53k
Copy link
Author

j4r0u53k commented Dec 4, 2022

Note the definition of the frames from the libc crate, which just verifies the alignment:

src/unix/linux_like/linux/align.rs
125-
126-            // linux/can.h
127-            #[repr(align(8))]
128-            #[allow(missing_debug_implementations)]
129:            pub struct can_frame {
130-                pub can_id: canid_t,
...

@fpagliughi Thanks for the link.

I'm actually thinking whether rather #[repr(C, align(8))] is the correct type layout.

The docs (https://doc.rust-lang.org/reference/type-layout.html#the-default-representation) says about the Default representation that

There are no guarantees of data layout made by this representation.

However, it also states that

Nominal types without a repr attribute have the default representation.

Does it mean that types with repr attribute have any other representation than Default? And if so, is there is any distinction between #[repr(C, align(8))] and #[repr(align(8))]?

@fpagliughi
Copy link
Collaborator

Oh, apologies for the confusion. My assumption is that in the libc crate, nearly everything uses #[repr(C)].

So you're right, it's both. I've seen it separate, like bindgen created:

#[repr(C)]
#[repr(align(8))]

but if you can combine them into one, #[repr(C, align(8))], then that's fine.

CanNormalSocket is provided for usage without FD mode. Its
interface handles only 16 B classical frames.

For utilization of FD mode an user has to create an instance of
CanFdSocket. FD mode is set within the socket construction.
CanFdSocket operates with CanAnyFrame compound enum type.
Socket `read' method always returns the right variant.

In case of write the user usually knows a priory which type
of frame will be sent, so no special handling is needed.
For easy conversion to CanAnyFrame, `From' trait is implemented
on CanNormalFrame and CanFdFrame, so user can just call `f.into()'
on these.

[socketcan-rs#16]
@j4r0u53k
Copy link
Author

j4r0u53k commented Dec 5, 2022

@fpagliughi It's ready for a review.

@fpagliughi
Copy link
Collaborator

@j4r0u53k There are a few things I have reservations about. I don't know if using a trait on the frames is a good idea, since I think they should be "pure" data. But a trait on the sockets is fine. And there are a few naming issues.

What I didn't realize (though I should have) is that this is diff'ed against the old master (v1.x), so apologies if I accused you of changing things that you hand't changed! I was visually comparing it to the current master.

But rather than try to get this perfect, what I think we should do is land this and #24 now, and then treat master as WIP moving toward the v2.0 release. There are a number of suggestions on low-level implementation (#30 & #28) which would drastically change things here anyway, so lets make some forward progress, and then see how things settle.

But since this may be a messy merge into master, I'll land the other PR first, then try to manually bring this in on top of it.

@fpagliughi fpagliughi merged commit 0cad0a5 into socketcan-rs:1.7.0 Dec 5, 2022
@j4r0u53k
Copy link
Author

j4r0u53k commented Dec 5, 2022

@fpagliughi Thanks for merging this PR.

Yes, I branched from 1.7.0, because it looked more alive than master back then.

I was tentative about using trait for the frames as well, but the language doesn't provide many other ways to reuse code. I would probably have to duplicate common code otherwise... Anyways it might be handy to use &impl CanFrame when you only need the ID and the data without passing the whole enum and doing the match.

I was thinking about some sort of config accessor helper struct that would get temporarily created and captured the struct or its fields by (mut) refs and provided the read/modify methods just for its lifetime... but I couldn't come up with any satisfying solution. Such accessor struct would be better to have shared for both type of frames, so some trait would be needed anyways. And this is similar to what a trait roughly does after all.

Or let all the CanFrame fields pub, and use functions operating on the individual fields, but I find this really ugly.

@fpagliughi
Copy link
Collaborator

Hey @j4r0u53k! Yeah, I completely missed the 1.7.0 branch when I started on this. Or maybe I just forgot about it. But no worries. It's a really messy merge of master (with the Embedded HAL support) and 1.7.0, with this FD pull request. But I hope to have it done this evening... at least a preliminary, messy, combination that people can have a look at.

Any never mind about the Frame trait issue. There was already a Frame trait from the Embedded HAL, and yours was actually only slightly different from theirs, so I'm just redefining yours in terms of theirs, and everything works out. Plus I don't think it actually has the runtime penalty that I was imagining.

fpagliughi added a commit that referenced this pull request Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants