dcat:downloadUrl as rdf:Resource (instead of literal) #864

Open
coret opened this issue Feb 12, 2024 · 3 comments

Comments

@coret
Contributor

coret commented Feb 12, 2024

From @nfreire:

I have started to work on NDE's dataset descriptions via the SPARQL endpoint, and I noticed that the distributions have the dcat:downloadURL property with a literal value. DCAT defines the range of dcat:downloadURL as rdfs:Resource, so it would be preferable to have the values as resource references rather than literals. Would it be possible to make this change?

Reference: dcat:downloadUrl

@ddeboer
Member

ddeboer commented Feb 12, 2024

We’re probably talking about dcat:accessURL here? We’re not using dcat:downloadURL.

Currently, we keep the source’s datatype. Do we want to always convert this to IRI?

@coret
Contributor Author

coret commented Feb 12, 2024

First of all, the dataset descriptions concerned were not made by the Dataset Register but were imported via https://archief.nl/id/dataset/foto/2-10-62ntfoto-loda-edm-distributie, which had literals instead of IRIs and used dcat:downloadURL instead of dcat:accessURL. Both are fixed now.

https://joinup.ec.europa.eu/release/how-use-accessurl-and-downloadurl recommends:

The dcat:accessURL should be used as a direct access to a file or to a page containing further instructions. It is mandatory and guarantees the existence of descriptions for the distributions. While the dcat:downloadURL is a direct link to a file. It allows software programs to use the link to get access to the file.
If only direct download access can be provided, the URL of the data should be duplicated in both accessURL and downloadURL.

I don't think we should do this duplication to dcat:downloadURL ...

Currently, we keep the source’s datatype. Do we want to always convert this to IRI?

If you look at https://schema.org/DataDownload, sdo:contentUrl has range URL. So using literals is wrong.

Of all dcat:accessURL values in our triplestore, 16,309 are IRIs and 498 are literals. These literals seem to be produced by only three publishing systems: opendata.picturae.com, dc4eu.nl and goudatijdmachine.nl.
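
For reference, a breakdown like the one above can be obtained with an aggregate query along these lines. This is a sketch that assumes distributions are typed as dcat:Distribution, as in the query posted further down in this thread; the ?kind labels are illustrative only.

```sparql
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# Count dcat:accessURL values per RDF term type.
SELECT ?kind (COUNT(?accessURL) AS ?count) WHERE {
    ?distribution a dcat:Distribution ;
                  dcat:accessURL ?accessURL .
    # Label each value as IRI, literal or blank node (labels are illustrative).
    BIND(IF(isIRI(?accessURL), "IRI",
         IF(isLiteral(?accessURL), "literal", "blank node")) AS ?kind)
}
GROUP BY ?kind
ORDER BY DESC(?count)
```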

So I'd propose we change our SHACL so that it emits a warning when the sdo:contentUrl is not an IRI. At the same time, we communicate to Picturae/Vitec and DC4EU that in 3 months distributions with literals will become invalid (we then make the SHACL report an error when a literal is provided). Note: the goudatijdmachine.nl dataset descriptions will be corrected before the next crawl. We should also check whether there's an IRI check on dcat:accessURL.
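
The SHACL change itself is not shown here. As a rough SPARQL approximation of what such a rule would flag on the schema.org side, one could list downloads whose content URL is not an IRI. This is a sketch that assumes the source descriptions use the https://schema.org/ namespace and type their distributions as sdo:DataDownload; adjust the prefix if the data uses http://schema.org/.

```sparql
PREFIX sdo: <https://schema.org/>

# List schema.org downloads whose content URL is a literal or blank node
# rather than an IRI (the namespace IRI above is an assumption).
SELECT ?distribution ?contentUrl WHERE {
    ?distribution a sdo:DataDownload ;
                  sdo:contentUrl ?contentUrl .
    FILTER(!isIRI(?contentUrl))
}
```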

@coret
Contributor Author

coret commented Feb 12, 2024

Query to find distributions with a literal (instead of IRI):

PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?distribution ?accessURL WHERE {
    ?distribution a dcat:Distribution ;
                  dcat:accessURL ?accessURL .
    FILTER(isLiteral(?accessURL))
}
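
If we did decide to normalise existing data rather than wait for publishers, the same pattern could drive a one-off SPARQL Update. This is only a sketch, not something the Dataset Register currently does; restricting the conversion to values starting with http:// or https:// is my assumption, to avoid minting IRIs from arbitrary strings.

```sparql
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# Replace literal dcat:accessURL values with IRIs built from the same string.
DELETE { ?distribution dcat:accessURL ?accessURL }
INSERT { ?distribution dcat:accessURL ?iri }
WHERE {
    ?distribution a dcat:Distribution ;
                  dcat:accessURL ?accessURL .
    FILTER(isLiteral(?accessURL))
    # Assumption: only convert values that look like absolute HTTP(S) URLs.
    FILTER(REGEX(STR(?accessURL), "^https?://", "i"))
    BIND(IRI(STR(?accessURL)) AS ?iri)
}
```

Literals that do not match the filter stay untouched and keep showing up in the SELECT query above, so they can be reported back to the publisher.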

coret added a commit that referenced this issue Feb 14, 2024