Data change frequency #772

Open
ddeboer opened this issue Jul 10, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@ddeboer
Member

ddeboer commented Jul 10, 2023

Requested by @wouterbeek for the purpose of data caching: a way for publishers to indicate how often their data changes. We’re talking here about the distribution’s data, not the dataset description itself.

A proposal: to the list of dataset attributes, add a recommended property event.eventSchedule.repeatFrequency that holds the update frequency in ISO 8601 duration format.

{
  "@context": "https://schema.org/",
  "@type": "DataDownload",
  "encodingFormat": "application/sparql-results+xml",
  "contentUrl": "http://vocab.getty.edu/sparql",
  "dateModified": "2023-08-15",
  "event": {
    "@type": "PublicationEvent",
    "eventSchedule": {
      "@type": "Schedule",
      "repeatFrequency": "P1W"
    }
  }
}
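To illustrate how a caching consumer might use such a description, here is a minimal Python sketch. It assumes the JSON shape from the example above (with the property path event.eventSchedule.repeatFrequency as proposed); the function names parse_duration, next_expected_change and cache_is_fresh are hypothetical, and the duration parser is deliberately simplified: integer components only, with months and years approximated as 30 and 365 days.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Simplified ISO 8601 duration pattern (PnYnMnWnDTnHnMnS, integers only).
_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?"
    r"(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(value: str) -> timedelta:
    """Convert a simple ISO 8601 duration such as 'P1W' to a timedelta."""
    match = _DURATION.match(value)
    parts = {k: int(v) for k, v in match.groupdict().items() if v} if match else {}
    if not parts:
        raise ValueError(f"Unsupported duration: {value!r}")
    # Assumption: months and years are approximated, which is good enough
    # for a cache hint but not for calendar arithmetic.
    days = (
        parts.get("years", 0) * 365
        + parts.get("months", 0) * 30
        + parts.get("weeks", 0) * 7
        + parts.get("days", 0)
    )
    return timedelta(
        days=days,
        hours=parts.get("hours", 0),
        minutes=parts.get("minutes", 0),
        seconds=parts.get("seconds", 0),
    )


def next_expected_change(distribution: dict) -> Optional[datetime]:
    """Return dateModified + repeatFrequency, or None if either is missing."""
    try:
        modified = datetime.fromisoformat(distribution["dateModified"])
        frequency = distribution["event"]["eventSchedule"]["repeatFrequency"]
    except (KeyError, TypeError):
        return None
    return modified + parse_duration(frequency)


def cache_is_fresh(distribution: dict, now: datetime) -> bool:
    """Treat cached data as fresh until the next expected change.

    Without a schedule we cannot tell, so the cache is considered stale.
    Naive datetimes are assumed throughout (no time zone handling).
    """
    expected = next_expected_change(distribution)
    return expected is not None and now < expected


example = {
    "@type": "DataDownload",
    "dateModified": "2023-08-15",
    "event": {
        "@type": "PublicationEvent",
        "eventSchedule": {"@type": "Schedule", "repeatFrequency": "P1W"},
    },
}
```

With the example description, a consumer would keep its cached copy until one week after 2023-08-15 and refetch after that.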

The NDE Knowledge Graph, which regularly crawls datasets, could help by:

  1. if not supplied by the publisher, heuristically (but not strictly) detecting dateModified by comparing the current number of triples with the number found during the last crawl
  2. if not supplied by the publisher, storing the last n dateModified values and (again, heuristically) deriving a repeatFrequency from them.
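
The two heuristics above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not how the NDE Knowledge Graph crawler works: the function names, the min_observations threshold and the use of the median gap are all hypothetical choices. Note that equal triple counts can hide a change, which is why the detection is heuristic rather than strict.

```python
from datetime import date
from statistics import median
from typing import Optional, Sequence


def detect_date_modified(
    previous_triples: Optional[int],
    current_triples: int,
    previous_date_modified: Optional[date],
    crawl_date: date,
) -> Optional[date]:
    """Heuristic 1: a changed triple count suggests the data was modified.

    Returns the crawl date if the count differs from the last crawl,
    otherwise the previously known value (which may be None).
    """
    if previous_triples is not None and current_triples != previous_triples:
        return crawl_date
    return previous_date_modified


def derive_repeat_frequency(
    modified_dates: Sequence[date], min_observations: int = 3
) -> Optional[str]:
    """Heuristic 2: derive an ISO 8601 duration from the last n dateModified values.

    Uses the median gap between consecutive distinct dates, so a single
    irregular update does not skew the result. Returns None when there are
    too few observations to say anything (threshold is an assumption).
    """
    unique = sorted(set(modified_dates))
    if len(unique) < min_observations:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(unique, unique[1:])]
    typical = max(1, round(median(gaps)))
    # Express whole weeks as PnW, everything else as PnD.
    if typical % 7 == 0:
        return f"P{typical // 7}W"
    return f"P{typical}D"
```

For example, dateModified values of 2023-08-01, 2023-08-08, 2023-08-15 and 2023-08-22 would yield "P1W".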

/cc @coret @rcdeboer

@ddeboer ddeboer added the enhancement New feature or request label Jul 10, 2023
@coret
Contributor

coret commented Jul 10, 2023

Good idea!

Looking at schema:DataDownload, shouldn't this be publication (instead of event)?

"A way for publishers to indicate how often their data changes" feels like a promise (of a future/frequent event), not an event that occurred. But schema:eventSchedule covers this:

[...] There are circumstances where it is preferable to share a schedule for a series of repeating events rather than data on the individual events themselves. For example, a website or application might prefer to publish a schedule for a weekly gym class rather than provide data on every event. A schedule could be processed by applications to add forthcoming events to a calendar. [...]

Additionally, we have to determine how to store this schema.org-based piece of information in our triplestore/KG, which is DCAT-based. Strangely, I only see a frequency property for the Dataset class.
