Added complete support for pathlib #1678
@microsoft-github-policy-service agree
… added ``Iterable[os.PathLike]`` support.
…rchgeo into pathlib-support
@adamjstewart There is an Here are the logs
@pioneerHitesh can you open a separate issue for the download problems? Maybe the URL changed.
@adamjstewart any additional changes?
At the moment, it seems like 99% of the time, we accept any Path as input, then immediately convert it to a str. I think this is still useful, but it's a lot of extra work to ensure valid types and cast everything. Is torchvision's download_url the only reason we need to cast to a str? I wonder if we could just update download_url to accept any Path as input. I'm willing to submit a PR to torchvision if you think that's worth it. We could even have a wrapper in TorchGeo that just converts the path to a str and calls the torchvision version. Let me know what you think of this idea.
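A minimal sketch of the wrapper idea, assuming a hypothetical helper name `download_url_any_path` and a downloader that takes `(url, root, ...)` positionally, as torchvision's `download_url` does. The downloader is passed in as a parameter so the sketch runs without torchvision installed; `fake_download` is a stand-in used only for illustration:

```python
import os
from pathlib import Path
from typing import Any, Callable, List, Tuple, Union


def download_url_any_path(
    download: Callable[..., Any],
    url: str,
    root: Union[str, "os.PathLike[str]"],
    *args: Any,
    **kwargs: Any,
) -> Any:
    """Convert ``root`` to a str, then delegate to ``download``.

    Hypothetical helper, not part of TorchGeo or torchvision.
    """
    # os.fspath returns a str unchanged and calls __fspath__ on path objects.
    return download(url, os.fspath(root), *args, **kwargs)


# Stand-in downloader that only records what it was called with.
calls: List[Tuple[str, str, str]] = []


def fake_download(url: str, root: str, filename: str = "") -> None:
    calls.append((url, root, filename))


download_url_any_path(
    fake_download, "https://example.com/data.zip", Path("data"), filename="data.zip"
)
```

With a wrapper like this, the datasets could keep whatever path object they were given, and the conversion to `str` would happen in one place only.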
Also, this PR is becoming massive. Is there any way we can split this up into multiple PRs to make it easier on both of us to update/review? Maybe 1 PR to remove `os.path.expanduser`, another PR to fix `root={self.paths}` to `paths={self.paths!r}`, another PR to add support to the datasets, and another PR to update the tests? Or would that be too much work for you?
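For reference, a small sketch of what the `!r` conversion changes in an f-string. The path value is made up, and `PurePosixPath` is used only so the result is the same on every platform:

```python
from pathlib import PurePosixPath

# Made-up value standing in for a dataset's paths attribute.
paths = PurePosixPath("data") / "eu_dem"

# Without !r the f-string uses str(paths): the type of the value is not visible.
without_repr = f"root={paths}"

# With !r the f-string uses repr(paths): the type is visible in the message.
with_repr = f"paths={paths!r}"

# For a plain str, !r adds quotes, which makes empty or padded values easy to spot.
str_with_repr = f"paths={'data/eu_dem'!r}"
```

Here `without_repr` is `root=data/eu_dem`, `with_repr` is `paths=PurePosixPath('data/eu_dem')`, and `str_with_repr` is `paths='data/eu_dem'`.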
assert isinstance(dataset.paths, str)
zipfile = os.path.join(dataset.paths, "eu_dem_v11_E30N10.zip")
shutil.unpack_archive(zipfile, dataset.paths, "zip")
assert check_instance_type(dataset.paths)
I think we can just assert that this is a pathlib.Path
@adamjstewart We may not be able to do that because we have to assert for `str`, `bytes`, `os.PathLike[str]`, and `os.PathLike[bytes]`.
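A sketch of how a test could assert for all four of these types at once. The helper name `is_single_path` is hypothetical. Note that `isinstance` rejects subscripted generics such as `os.PathLike[str]`, so the unsubscripted `os.PathLike` has to be used at runtime:

```python
import os
from pathlib import Path


def is_single_path(obj: object) -> bool:
    """Return True if obj is a str, bytes, or os.PathLike object.

    Hypothetical helper for illustration only.
    """
    # os.PathLike (unsubscripted) matches any object that defines __fspath__,
    # which covers both os.PathLike[str] and os.PathLike[bytes].
    return isinstance(obj, (str, bytes, os.PathLike))


str_ok = is_single_path("data")
bytes_ok = is_single_path(b"data")
path_ok = is_single_path(Path("data"))
list_ok = is_single_path([Path("data")])
```

The first three results are `True` and the last is `False`, because a list of paths is not itself a path.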
import numpy as np
import rasterio
import torch
from torch import Tensor
from torchvision.datasets.utils import check_integrity, download_url
from torchvision.utils import draw_segmentation_masks
from typing_extensions import TypeAlias
We don't have a dep on typing_extensions right now. Let's just remove the `TypeAlias` usage, we don't actually need it.
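A sketch of the alternative: a plain assignment already works as an implicit type alias, so no `TypeAlias` annotation (and no `typing_extensions` import) is required. The names `PathType` and `as_str` are hypothetical and are not the names used in this PR:

```python
import os
from pathlib import Path
from typing import Union

# Implicit type alias: type checkers treat this plain assignment as an alias.
# On Python 3.10+ the explicit form is available from the standard library
# as typing.TypeAlias, but it is optional.
PathType = Union[str, "os.PathLike[str]"]


def as_str(root: PathType) -> str:
    """Return the string form of a path-like argument."""
    return os.fspath(root)


from_path = as_str(Path("data"))
from_str = as_str("data")
```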
@@ -737,3 +740,25 @@ def percentile_normalization(
        (img - lower_percentile) / (upper_percentile - lower_percentile + 1e-5), 0, 1
    )
    return img_normalized


def check_instance_type(paths: Path | Iterable[Path]) -> bool:
I don't love this function:
- The name is way too vague
- The docstring doesn't tell me when it's True and when it's False
- Do we really need this complex of a check?
It seems like we're really just checking if `paths` is a `Path` (True) or `Iterable[Path]` (False). Can't we just do: `return not isinstance(paths, Iterable)`? In that case, we wouldn't even need a function. Basically, if we can figure out a reasonable way to remove this function, I would be happier.
@adamjstewart Unfortunately, `isinstance(paths, Iterable)` will return `True` for `bytes` and `str`. We will have to find another way to eliminate this function. One way would be to have `List[Path]` instead of `Iterable[Path]`. Then we may use `isinstance(paths, List)`.
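A short sketch of the pitfall, together with one possible way to keep `Iterable[Path]` in the signature: test for the single-path types first and treat everything else as an iterable. The helper name `to_path_list` is hypothetical, and this is only one option, not the approach settled on in this PR:

```python
import os
from collections.abc import Iterable
from pathlib import Path

# str and bytes are themselves iterable, so an Iterable check alone
# cannot tell a single str path apart from a collection of paths.
checks = {
    "str": isinstance("data", Iterable),
    "bytes": isinstance(b"data", Iterable),
    "Path": isinstance(Path("data"), Iterable),
    "list": isinstance([Path("data")], Iterable),
}


def to_path_list(paths):
    """Wrap a single path in a list; materialize any other iterable.

    Hypothetical helper for illustration only.
    """
    if isinstance(paths, (str, bytes, os.PathLike)):
        return [paths]
    return list(paths)


single = to_path_list("data")
several = to_path_list(iter([Path("a"), Path("b")]))
```

Here `checks` is `True` for `str`, `bytes`, and `list`, and `False` only for `Path`. Checking the single-path types first also accepts tuples, sets, and generators, which `isinstance(paths, List)` would reject.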
In addition to the torchvision's My suggestion would be that we add support for
We could fix the datasets in one PR and then update the tests in a second PR after the first one is merged.
Should I start implementing the changes?
Yeah, let's start with PRs for
Would creating a separate issue be better? This PR already has 75 comments.
We can either close this PR and start new PRs or mark this one as a draft and rebase once other PRs are merged. I have no preference.
I think we should convert this to draft.
Should I rename
For NonGeoDatasets we want to keep using
@adamjstewart please take a look at this PR.
Closing in favor of #1731
Added `pathlib` (path object) support for the entire dataset module.
Closes #1616