add predicate to GFF3Codec to give a chance to filter out some unused attributes #1575
@lindenb Thank you, this seems like a good idea. I have a few nitpicky comments, but it looks good overall.
*/
public Gff3Codec setFilterOutAttribute(final Predicate<String> filterOutAttribute) {
/* check required keys are always kept */
for(final String key : new String[] {Gff3Constants.PARENT_ATTRIBUTE_KEY,Gff3BaseData.ID_ATTRIBUTE_KEY,Gff3BaseData.NAME_ATTRIBUTE_KEY}) {
Nitpicky, but can you add spaces between these?
@@ -200,7 +220,7 @@ private Gff3Feature decode(final LineIterator lineIterator, final DecodeDepth de
return attributes;
}

- static private Gff3BaseData parseLine(final String line, final int currentLine) {
+ private Gff3BaseData parseLine(final String line, final int currentLine) {
I'd probably keep this static and explicitly pass in the predicate
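The suggestion above can be sketched roughly as follows. This is a simplified, hypothetical stand-in (the class `StaticParseSketch` and its `parseAttributes` signature are assumptions for illustration, not the htsjdk code): the parsing helper stays static and receives the predicate as a parameter, while the instance method only forwards its own state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified stand-in for the codec; not the htsjdk implementation.
final class StaticParseSketch {
    private final Predicate<String> filterOutAttribute;

    StaticParseSketch(final Predicate<String> filterOutAttribute) {
        this.filterOutAttribute = filterOutAttribute;
    }

    // The instance method only forwards its state to the static helper.
    Map<String, List<String>> decodeAttributes(final String attributeColumn) {
        return parseAttributes(attributeColumn, this.filterOutAttribute);
    }

    // Static helper: everything it needs arrives as a parameter.
    static Map<String, List<String>> parseAttributes(final String attributeColumn,
                                                     final Predicate<String> filterOutAttribute) {
        final Map<String, List<String>> attributes = new LinkedHashMap<>();
        for (final String pair : attributeColumn.split(";")) {
            final int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed entries
            }
            final String key = pair.substring(0, eq).trim();
            final List<String> values =
                    new ArrayList<>(Arrays.asList(pair.substring(eq + 1).split(",")));
            attributes.put(key, values);
        }
        // Drop every key for which the predicate returns true.
        attributes.keySet().removeIf(filterOutAttribute);
        return attributes;
    }
}
```

Keeping the helper static makes its dependency on the predicate visible in the signature and lets it be tested without constructing a codec.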
@@ -217,6 +237,8 @@ static private Gff3BaseData parseLine(final String line, final int currentLine)
final int phase = splitLine.get(GENOMIC_PHASE_INDEX).equals(Gff3Constants.UNDEFINED_FIELD_VALUE) ? -1 : Integer.parseInt(splitLine.get(GENOMIC_PHASE_INDEX));
final Strand strand = Strand.decode(splitLine.get(GENOMIC_STRAND_INDEX));
final Map<String, List<String>> attributes = parseAttributes(splitLine.get(EXTRA_FIELDS_INDEX));
/* remove attibutes if they fail 'acceptExtraFieldKey' */
This comment no longer matches the variable name.
* @param filterOutAttribute the predicate
* @return this codec
*/
public Gff3Codec setFilterOutAttribute(final Predicate<String> filterOutAttribute) {
I'm a bit torn on whether I like this as a fluent setter or whether we should instead add it as an extra constructor parameter. That way we could make the predicate field final and reduce possible strange state changes, but it's not really that important either way.
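To make the trade-off concrete, here is a minimal sketch of the two designs side by side. Both classes (`FluentCodecSketch`, `ConstructorCodecSketch`) and the `isFilteredOut` helper are hypothetical names invented for illustration; neither is the actual Gff3Codec.

```java
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical codec-like classes used only to contrast the two designs.
final class FluentCodecSketch {
    // Mutable field: defaults to filtering nothing out, can be replaced at any time.
    private Predicate<String> filterOutAttribute = key -> false;

    FluentCodecSketch setFilterOutAttribute(final Predicate<String> filterOutAttribute) {
        this.filterOutAttribute = Objects.requireNonNull(filterOutAttribute);
        return this; // returning this enables chaining
    }

    boolean isFilteredOut(final String key) {
        return filterOutAttribute.test(key);
    }
}

final class ConstructorCodecSketch {
    // Final field: fixed at construction, so the filter cannot change mid-decode.
    private final Predicate<String> filterOutAttribute;

    ConstructorCodecSketch() {
        this(key -> false); // default: keep every attribute
    }

    ConstructorCodecSketch(final Predicate<String> filterOutAttribute) {
        this.filterOutAttribute = Objects.requireNonNull(filterOutAttribute);
    }

    boolean isFilteredOut(final String key) {
        return filterOutAttribute.test(key);
    }
}
```

With the constructor variant the predicate is part of the object's immutable state; with the fluent variant a caller could swap the predicate between two decode calls, which is the kind of state change the comment is wary of.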
@@ -217,6 +237,8 @@ static private Gff3BaseData parseLine(final String line, final int currentLine)
final int phase = splitLine.get(GENOMIC_PHASE_INDEX).equals(Gff3Constants.UNDEFINED_FIELD_VALUE) ? -1 : Integer.parseInt(splitLine.get(GENOMIC_PHASE_INDEX));
final Strand strand = Strand.decode(splitLine.get(GENOMIC_STRAND_INDEX));
final Map<String, List<String>> attributes = parseAttributes(splitLine.get(EXTRA_FIELDS_INDEX));
/* remove attibutes if they fail 'acceptExtraFieldKey' */
attributes.keySet().removeIf(KEY->this.filterOutAttribute.test(KEY));
I think this can be less verbose:
attributes.keySet().removeIf(this.filterOutAttribute);
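A quick sketch of why the shorter form is equivalent: `Collection.removeIf` takes a `Predicate<? super E>`, so a `Predicate<String>` can be passed directly instead of being wrapped in a lambda that only calls `test`. The class and method names below are hypothetical and exist only for this comparison.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper comparing the two spellings on a copy of the input map.
final class RemoveIfSketch {
    static Set<String> keepWithLambda(final Map<String, List<String>> attributes,
                                      final Predicate<String> filterOutAttribute) {
        final Map<String, List<String>> copy = new HashMap<>(attributes);
        // Verbose form: a lambda that only delegates to the predicate.
        copy.keySet().removeIf(key -> filterOutAttribute.test(key));
        return copy.keySet();
    }

    static Set<String> keepWithPredicate(final Map<String, List<String>> attributes,
                                         final Predicate<String> filterOutAttribute) {
        final Map<String, List<String>> copy = new HashMap<>(attributes);
        // Short form: the predicate is passed as-is.
        copy.keySet().removeIf(filterOutAttribute);
        return copy.keySet();
    }
}
```

Removing a key through `keySet()` also removes the corresponding entry from the backing map, which is why both forms drop the attribute values along with the key.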
@@ -9,8 +9,8 @@
import java.util.Map;

public class Gff3BaseData {
- private static final String ID_ATTRIBUTE_KEY = "ID";
- private static final String NAME_ATTRIBUTE_KEY = "Name";
+ static final String ID_ATTRIBUTE_KEY = "ID";
Should these move to the constants class?
Co-authored-by: Louis Bergelson <louisb@broadinstitute.org>
Codecov Report
@@ Coverage Diff @@
## master #1575 +/- ##
===============================================
- Coverage 69.841% 69.804% -0.037%
- Complexity 9633 9639 +6
===============================================
Files 702 702
Lines 37611 37618 +7
Branches 6108 6088 -20
===============================================
- Hits 26268 26259 -9
- Misses 8897 8907 +10
- Partials 2446 2452 +6
@lbergelson thank you for your review. I moved the
@lindenb Looks good. Thank you.
Description
When a GFF3Reader is used with DecodeDepth==DEEP, it may use a large amount of memory for attributes that will never be used ("version", "tag", etc.). This PR lets the GFF3Codec accept a Predicate<String> so that only a defined set of attributes is kept.
The private modifier on ID_ATTRIBUTE_KEY and NAME_ATTRIBUTE_KEY in Gff3BaseData was removed so that the codec can check that the predicate does not remove them.
A new method, setFilterOutAttribute, was added to GFF3Codec.
The static modifier on GFF3Codec.parseLine was removed.
I added a test, codecFilterOutFieldsTest.
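The required-key check described here might look roughly like the sketch below. Several things are assumptions: the class name `RequiredKeysSketch` and the `validate` method are hypothetical, "Parent" is used as the value behind Gff3Constants.PARENT_ATTRIBUTE_KEY because that is the GFF3 attribute name, and throwing IllegalArgumentException is only one plausible way to react; the diff excerpt does not show what the PR actually does when a required key is rejected.

```java
import java.util.function.Predicate;

// Hypothetical sketch of validating a filter-out predicate before storing it.
final class RequiredKeysSketch {
    // "ID" and "Name" appear in the diff; "Parent" is assumed from the GFF3 attribute name.
    private static final String[] REQUIRED_KEYS = {"Parent", "ID", "Name"};

    // Assumption: reject the predicate outright if it would drop a required key.
    static Predicate<String> validate(final Predicate<String> filterOutAttribute) {
        for (final String key : REQUIRED_KEYS) {
            if (filterOutAttribute.test(key)) {
                throw new IllegalArgumentException(
                        "Predicate must not filter out required attribute: " + key);
            }
        }
        return filterOutAttribute;
    }
}
```

A caller would then pass something like `key -> key.equals("version") || key.equals("tag")`, which passes the check because it leaves Parent, ID, and Name untouched.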