Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SharedFileInputStream should comply with spec #695

Merged
merged 12 commits into from
Dec 22, 2023

Conversation

SeongEon-Jo
Copy link
Contributor

@SeongEon-Jo SeongEon-Jo commented Jul 22, 2023

resolves #694

Added reference to master SharedFileInputStream to slaves.

By doing this, I guess it can retain reference to master SharedFileInputStream in order not to be junked by garbage collection until the reference to the last derived stream is gone.

Feel free to comment if there are some points to add or fix.

@SeongEon-Jo SeongEon-Jo changed the title resloves #694 Retain reference to master SharedFileInputStream Jul 22, 2023
Copy link
Contributor

@jmehrens jmehrens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@jmehrens jmehrens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukasj The main issue with this change is that there are finalizers declared in SharedFileInputStream and SharedFile. Should the change really be to just remove the finalizer methods as a slightly incompatible change? Or do we need to convert that code to the Cleaner API?

Really I think the SharedFileInputStream finalizer is mostly pointless since there is a finalizer on SharedFile where actual native resource exists. The only corner case of having a finalizer/cleaner on SharedFileInputStream would be in the case there is a 3rd party subclass that overrides close would see an invocation on G.C.

Makes me think that we should consider either removing both finalizers and document the behavior change. Or remove both finalizers and add a Cleaner API to just SharedFile. Then this patch could be redone such that we don't need to hold a reference to the root SharedFileInputStream.

@jmehrens
Copy link
Contributor

jmehrens commented Aug 3, 2023

@SeongEon-Jo You'll have to include a test case with this PR. Modify the hardcoded file path to use File.createTempFile.

I'm thinking this test will fail even under the new patch.

@Test
void testGrandchild() throws Exception {
    // you can use any file to test
    File file = Paths.get("src/test/resources/test_file.jpeg").toFile();
   
   try( InputStream grandChild = new SharedFileInputStream(file).newStream(0, -1).newStream(0, -1)) {
   
        System.gc();

        Assertions.assertDoesNotThrow(() ->grandChild.read());
    }
}

@SeongEon-Jo
Copy link
Contributor Author

SeongEon-Jo commented Aug 3, 2023

@jmehrens I added a test case but slightly changed the code you gave because SharedFileInputStream.newStream() returns InputStream, not SharedInputStream.

So I just casted it to SharedFileInputStream in order to invoke newStream().

Test result was "passed".

Also, I ran the same test in junit5 like below.

@Test
    void testGrandchild() throws Exception {
        File sampleFile = File.createTempFile("test", "test");

        try (InputStream grandChild = ((SharedFileInputStream) new SharedFileInputStream(sampleFile).newStream(0, -1)).newStream(0, -1)) {
            System.gc();
            Assertions.assertDoesNotThrow(() -> grandChild.read());
        }
    }

Result was also "passed".

I only added a case of grand child. But if you want me to add another cases, feel free to tell me about that.

@jmehrens
Copy link
Contributor

jmehrens commented Aug 4, 2023

Result was also "passed".

I only added a case of grand child. But if you want me to add another cases, feel free to tell me about that.

Oh I see. The reason this passes is that the un-reference parent gets garbage collected but the call to close by the finializer is a no-op because of the reference counting. All good then.

You should add your original test case too as a method to the JUnit test. You should copy one of the license headers from the other test cases.

@SeongEon-Jo
Copy link
Contributor Author

SeongEon-Jo commented Aug 4, 2023

You should add your original test case too as a method to the JUnit test. You should copy one of the license headers from the other test cases.

Added the license header and child test case.

But I want to notify that in the version before this patch, the test results were different depending on whether I invoke Assertions.assertDoesNotThrow() of junit5 or just invoke child.read().

For example, in a test code I gave in #694,

@Test
void test() throws Exception {
    // you can use any file to test
    File file = Paths.get("src/test/resources/test_file.jpeg").toFile();
   
    InputStream slaveSharedFileIs = new SharedFileInputStream(file).newStream(0, -1);
   
    System.gc();

    Assertions.assertDoesNotThrow(() -> slaveSharedFileIs.read());
}

it failed, which means slaveShareFiles.read() throws an IOException and it is what I expected.

But when I tested with the code below,

@Test
void test() throws Exception {
    // you can use any file to test
    File file = Paths.get("src/test/resources/test_file.jpeg").toFile();
   
    InputStream slaveSharedFileIs = new SharedFileInputStream(file).newStream(0, -1);
   
    System.gc();

    slaveSharedFileIs.read();
}

it passed, which means slaveSharedFileIs.read() does not throw IOException

I guess this occurs because of the code scope but not sure of the exact reason.

Of course, after my patch, no matter whether I invoke Assertions.assertDoesNotThrow() or just invoke child.read(), all test succeeded, not throwing IOException.

The reason I am notifying this is because I am not that sure if the test code I've added properly covers all the problematic situations.

Please let me know your opinion and suggest test cases I can add to cover them.

@jmehrens
Copy link
Contributor

jmehrens commented Aug 4, 2023

But I want to notify that in the version before this patch, the test results were different depending on whether I invoke Assertions.assertDoesNotThrow() of junit5 or just invoke child.read().

Put a comment on this behavior so future maintainers don't try to refactor and break the test.

@SeongEon-Jo
Copy link
Contributor Author

Put a comment on this behavior so future maintainers don't try to refactor and break the test.

Added a comment to the test class.

Feel free to tell me if there is something to add or edit in the comment.

@jmehrens
Copy link
Contributor

jmehrens commented Aug 6, 2023

@SeongEon-Jo I've been looking at the source code and I think the original behavior of this class is by design. Per:

Note that when the SharedFileInputStream is closed,
all streams created with the newStream
method are also closed. This allows the creator of the
SharedFileInputStream object to control access to the
underlying file and ensure that it is closed when
needed, to avoid leaking file descriptors. Note also
that this behavior contradicts the requirements of
SharedInputStream and may change in a future release.

That is why there are 2 code paths for closing and force closing. So in theory this issue could be closed as will not fix because the behavior is actually documented. If we were going to change the behavior so this didn't happen then we could instead pinning the root stream we could just reference count all streams instead of treating one stream as the root marked by a boolean. Meaning we never call forceClose or track a boolean to identify the root stream.

Other than the test case provided, how did you come across this issue in the wild? Is it the case of some abstraction returning a sub stream?

@SeongEon-Jo
Copy link
Contributor Author

@jmehrens

As I understand, the document you mentioned is about that closing the root stream will close any derived streams created by newStream().

It is a sensible behavior and I totally agree with that, and that's not what I am talking about.

What I am pointing out in the issue (#694) and this PR is that G.C may close the root SharedFileInputStream unexpectedly when there is nothing to keep a reference to it.

The solution I try to give is also about making the root SharedFileInputStream not be junked by G.C by pinning it as the way derived streams retain its reference.

I guess a way of the reference counting you suggested seems to solve the problem only after you remove its finalize() method because if G.C invokes the root SharedFileInputStream's finalize(), it will make it close its RandomAccessFile whatever the reference count is.

Actually, I also initially thought of the reference counting as a solution, but just made derived streams keep a reference to the root stream because removing or modifying finalize() is quite burdensome 😂

@SeongEon-Jo
Copy link
Contributor Author

SeongEon-Jo commented Aug 7, 2023

Other than the test case provided, how did you come across this issue in the wild? Is it the case of some abstraction returning a sub stream?

I found it when I make a MimeMessage with SharedFileInputStream and extract some of the DataHandler in it and get DataSource and also InputStream in it.

Let me give you a sample code to help you understand.

Firstly, make a new MimeMessage with SharedFileInputStream and extract some of the InputStream of aDataSource of a DataHandler.

public List<InputStream> extract() {

        MimeMessage message = new MimeMessage(session, new SharedFileInputStream(getFile()));

        List<InputStream> results = new ArrayList<>();
        
        foreach (mimePart of message) { 
            if (...) {
                InputStream extractedInputStream = mimePart.getDataHandler().getDataSource().getInputStream();
                results.add(extractedInputStream);
            }
        }
        return results;
    }

Note that MimeMessage actually has a SharedFileInputStream created by newStream() and there's no reference to a root stream .

protected void parse(InputStream is) throws MessagingException {
if (!(is instanceof ByteArrayInputStream) &&
!(is instanceof BufferedInputStream) &&
!(is instanceof SharedInputStream))
is = new BufferedInputStream(is);
headers = createInternetHeaders(is);
if (is instanceof SharedInputStream) {
SharedInputStream sis = (SharedInputStream) is;
contentStream = sis.newStream(sis.getPosition(), -1);
} else {
try {
content = MimeUtility.getBytes(is);
} catch (IOException ioex) {
throw new MessagingException("IOException", ioex);
}
}
modified = false;
}

Secondly, invoke the extract() and do something with the returned InputStream, getting size for example.

public List<Integer> getSize() {
        List<InputStream> extracted = extract();

        return extracted.stream()
                .map(is -> is.available())
                .collect(Collectors.toList());
    }

I found is.available() threw IOException: Stream Closed.

It's because G.C invoked finalize() of a root stream somewhen between extract() and getSize(), closing the RandomAccessFile.

As a result, extracted InputStream which was also made of SharedFileInputStream that MimeMessage has, also became unavailable.

I also checked RandomAccessFile is closed while running getSize() by debugger.

스크린샷 2023-08-07 오전 10 17 10

@jmehrens
Copy link
Contributor

jmehrens commented Aug 8, 2023

I guess a way of the reference counting you suggested seems to solve the problem only after you remove its finalize() method because if G.C invokes the root SharedFileInputStream's finalize(), it will make it close its RandomAccessFile whatever the reference count is.

I think the problem is this class is designed it such a way that it is a candidate for java.lang.ref.Reference.reachabilityFence

I would think that we don't need to pin the root stream. It just needs to be kept alive long enough such that the counter gets incremented before the root is freed. E.G:

public synchronized InputStream newStream(long start, long end) {
    try {
            if (in == null)
                throw new RuntimeException("Stream closed");
            if (start < 0)
                throw new IllegalArgumentException("start < 0");
            if (end == -1)
                end = datalen;
            return new SharedFileInputStream(sf,
                    this.start + start, end - start, bufsize);
    } finally {
         Reference.reachabilityFence(this);
    }
}

However, I would have thought synchronized keyword would be enough to keep it alive. Also I noticed that the close methods are not synchronized. This probably due to to multiple locks being acquired but I wonder if that is getting us into trouble since SharedFileInputStream::close is not synchronized it is just barging ahead of newStream. Synchronizing close might fix this issue with the root being G.C. too quickly.

Given your test case it seems that we should move forward with fixing this issue. My preference would be to use the Cleaner API on the SharedFile and get rid of the finalize method on the SharedFileInputStream. However, I would like to see just the simple reference counting working before we move to Cleaner API.

@SeongEon-Jo
Copy link
Contributor Author

@jmehrens

I did same tests (#695 (comment)) based on the sample code you gave. (Note that tests were run with original SharedFileInputStream before my patch.)

Firstly, added Reference.reachabilityFence(this); at the finally scope of newStream().

Secondly, added synchronized at the close() without adding Reference.reachabilityFence(this); at the finally scope of newStream().

Third, added both.

Test all failed.

Actually, I don't understand what synchronized has to do with G.C. Is it just for supporting synchronizing in multi-thread?

If there is something I can refer to, please let me know. so that I can add another tests.

My preference would be to use the Cleaner API on the SharedFile and get rid of the finalize method on the SharedFileInputStream.

This seems to be one of the reasonable ways to deal with this problem because it leaves file's lifecycle to the SharedFile that actually references to it, rather than to the SharedFileInputStream.

I also found that just removing finalize() of SharedFileInputStream made it all test passed.

I think if you fix the problem in that way, a completely new work needs to be done, closing this PR.

It seems to be not that hard work. Do you want me to work on it?

@jmehrens
Copy link
Contributor

jmehrens commented Aug 8, 2023

Test all failed.

I think this because we also have to get rid of the concept of the root stream being able to close all streams and simply reference count. As in get rid of root/master boolean and forceClose methods. I could be wrong though.

Actually, I don't understand what synchronized has to do with G.C.

In this case synchronized modifier means that the current thread has locked 'this' for the whole method call. Meaning that 'this' is reachable during the whole method call. Without synchronized modifier this can be G.C during the method call if there are no more references to this where this is the root stream in:

InputStream slaveSharedFileIs = new SharedFileInputStream(file).newStream(0, -1);

This is explained at the bottom of:
java.lang.ref.Reference.reachabilityFence

Is it just for supporting synchronizing in multi-thread?

Usually it is to ensure that a close stream is seen as closed to all threads.

My preference would be to use the Cleaner API on the SharedFile and get rid of the finalize method on the SharedFileInputStream.

This seems to be one of the reasonable ways to deal with this problem because it leaves file's lifecycle to the SharedFile that actually references to it, rather than to the SharedFileInputStream.

I also found that just removing finalize() of SharedFileInputStream made it all test passed.

I think if you fix the problem in that way, a completely new work needs to be done, closing this PR.

It seems to be not that hard work. Do you want me to work on it?

Let me try to get consensus on the approach.

The more I think about it the reason the finalizer exists on the SharedFileInputStream is to make the reference counting accurate. Which enables the last stream to close the native resource. Which is important if the last/root stream is strongly referenced. For sure is important on Windows due to open file locking.

@SeongEon-Jo
Copy link
Contributor Author

SeongEon-Jo commented Aug 9, 2023

I think this because we also have to get rid of the concept of the root stream being able to close all streams and simply reference count. As in get rid of root/master boolean and forceClose methods. I could be wrong though.

I see. All test passed after getting rid of master boolean and forceClose() however I tested, adding synchronized or not, adding Reference.reachabilityFence(this) or not. But also need further examination whether it is actually freed from memory.

Please be aware of this result and refer to it for future works.

Let me try to get consensus on the approach.

All right. Please let me know if there's any progress and anything that I can help.

@jmehrens
Copy link
Contributor

@SeongEon-Jo Apologizes for not getting back to you sooner on this PR. After looking this over and what has been learned from your previous code I think the approach should be:

  1. SharedFileInputStream::close should use java.lang.ref.Reference.reachabilityFence since it is not synchronized. I don't want to mark this method as synchronized at this time because I think Bill intentionally didn't mark it as synchronized. That issue can be handled in another ticket.
  2. SharedFileInputStream::newStream should use java.lang.ref.Reference.reachabilityFence. This is not needed for correctness but shows the intent that we want to make sure the parent is kept alive.
  3. Root/master boolean field should be deleted along with references to it.
  4. forceClose() method should be removed along with references to it.
  5. API documentation should be updated to remove the paragraph about the root stream closing all streams.
  6. Both finalizers should be marked as synchronized.
  7. We should avoid using the Cleaner API at this time because it is going to cause more work for this PR.

I'm thinking with these changes this class should work as you expect it to and we won't need to pin the root stream.

Thanks for all your efforts on this PR and helping me gain an understanding of what is going on with this class.

@SeongEon-Jo
Copy link
Contributor Author

@jmehrens

All sounds good in your approach.

All test passed fortunately after applying your suggestions.

Thanks for all your efforts on this PR and helping me gain an understanding of what is going on with this class.

You're welcome. I'm happy to contribute. 😎

Feel free again to let me know if you have more to add or edit.

@SeongEon-Jo
Copy link
Contributor Author

@jmehrens

I am getting an error like below during build.

[INFO] 
[INFO] --- maven-compiler-plugin:3.11.0:compile (base-compile) @ jakarta.mail-api ---
[INFO] Changes detected - recompiling the module! :dependency
[INFO] Compiling 113 source files with javac [debug release 8] to target/classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
[ERROR] /home/jenkins/agent/workspace/mail_PR-695/api/src/main/java/jakarta/mail/util/SharedFileInputStream.java:[444,22] cannot find symbol
  symbol:   method reachabilityFence(jakarta.mail.util.SharedFileInputStream)
  location: class java.lang.ref.Reference
[ERROR] /home/jenkins/agent/workspace/mail_PR-695/api/src/main/java/jakarta/mail/util/SharedFileInputStream.java:[488,22] cannot find symbol
  symbol:   method reachabilityFence(jakarta.mail.util.SharedFileInputStream)
  location: class java.lang.ref.Reference
[INFO] 2 errors 
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE

I surely added java.lang.ref.Reference import to SharedFileInputStream like below.

import java.lang.ref.Reference;

What should I do to resolve this?

@jmehrens
Copy link
Contributor

@SeongEon-Jo Looks like we are targeting JDK8 still in the pom.xml. Instead of calling reachabilityFence

You can do the same with:

import java.util.Objects;

Objects.requireNonNull(this); //TODO: replace with Reference.reachabilityFence

@jmehrens jmehrens changed the title Retain reference to master SharedFileInputStream SharedFileInputStream should comply with spec Sep 16, 2023
@jmehrens
Copy link
Contributor

jmehrens commented Sep 16, 2023

@SeongEon-Jo Sorry for the delay on this. Codewise, I think this is all good. Just need to do a review of how it is used in the codebase before I merge it.

I think you need to update the https://github.com/jakartaee/mail-api/blob/master/doc/release/CHANGES.txt
with something like:

CHANGES IN THE 2.1.3 RELEASE
          ----------------------------
E 695  SharedFileInputStream should comply with spec

I think you also need to put a compatibility note in: https://github.com/jakartaee/mail-api/blob/master/www/docs/COMPAT.txt
for the 2.1.3 RELEASE. Something like:

- SharedFileInputStream should comply with spec
    The root SharedFileInputStream no longer closes all streams created with the newStream.  This behavior was not compliant with the contract specified in the SharedInputStream interface which specifies that all streams must be closed before the shared resource is closed.

I finally found in my notes the explanation on why requireNonNull works: https://mail.openjdk.org/pipermail/core-libs-dev/2018-February/051312.html

Thanks for the help! Nice work!

@SeongEon-Jo
Copy link
Contributor Author

@jmehrens

Updated exactly same way you suggested.

Please let me know if there's anything which violates docs conventions 😂

I finally found in my notes the explanation on why requireNonNull works: https://mail.openjdk.org/pipermail/core-libs-dev/2018-February/051312.html

Got this. Thanks for the kind explanation!

@jmehrens
Copy link
Contributor

@SeongEon-Jo I haven't forgotten about this! I'll do my best to get this reviewed and merged. Thank you for your patience.

@jmehrens jmehrens merged commit 6c9ac50 into jakartaee:master Dec 22, 2023
3 checks passed
@jmehrens
Copy link
Contributor

Committed as:
6c9ac50

@jmehrens jmehrens self-assigned this Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Master SharedFileInputStream closes unexpectedly
2 participants