Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Metadata to --internal database #2343

Merged
merged 8 commits into from
Jun 11, 2024

Conversation

asg017
Copy link
Contributor

@asg017 asg017 commented May 6, 2024

Work-in-progress, not ready for review. refs #2341

  • Removes .metadata() and ._metadata() fields from the Datasette class
  • Removes /-/metadata.json endpoint
  • Removes metadata "fallback"
  • Removes the get_metadata() hook
  • Adds the following methods to Datasette:
    • get_instance_metadata()
    • get_database_metadata()
    • get_resource_metadata()
    • get_column_metadata()
    • set_instance_metadata()
    • set_database_metadata()
    • set_resource_metadata()
    • set_column_metadata()
  • Adds the following tables to internal.db
    • datasette_metadata_instance_entries
    • datasette_metadata_database_entries
    • datasette_metadata_resource_entries
    • datasette_metadata_column_entries

TODO

  • Support metadata.json
  • Don't change facet logic, move facet_size to synchronous config
  • Docs!

馃摎 Documentation preview 馃摎: https://datasette--2343.org.readthedocs.build/en/2343/

@asg017 asg017 marked this pull request as ready for review May 20, 2024 16:37
@simonw simonw self-requested a review May 21, 2024 16:37
return dict(rows)

async def set_instance_metadata(self, key: str, value: str):
# TODO upsert only supported on SQLite 3.24.0 (2018-06-04)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This inspired me to survey people to find out what SQLite versions are out there in the wild:

https://twitter.com/simonw/status/1800198142002115029
https://fedi.simonwillison.net/@simon/112593180340389914

datasette/app.py Outdated
Comment on lines 446 to 460
async def initialize_internal_database(self):
await self.get_internal_database().execute_write_script(
"""
CREATE TABLE IF NOT EXISTS datasette_metadata_instance_entries(
key text,
value text,
unique(key)
);

CREATE TABLE IF NOT EXISTS datasette_metadata_database_entries(
database_name text,
key text,
value text,
unique(database_name, key)
);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic should live in datasette/utils/internal_db.py since that's where the other logic that creates tables and populates them lives.

@@ -646,57 +711,113 @@ def _metadata_recursive_update(self, orig, updated):
orig[key] = upd_value
return orig

def metadata(self, key=None, database=None, table=None, fallback=True):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so happy to be rid of the synchronous (but undocumented) .metadata() method.

@simonw
Copy link
Owner

simonw commented Jun 10, 2024

Ran Datasette like this (with the datasette-visible-internal-db plugin installed):

datasette -m metadata.json fixtures.db --internal _internal.db  -p 8003 --root

CleanShot 2024-06-10 at 09 22 47@2x

Here are the tables in _internal.db (output using SQLite dump):

CREATE TABLE datasette_metadata_column_entries(
                database_name text,
                resource_name text,
                column_name text,
                key text,
                value text,
                unique(database_name, resource_name, column_name, key)
              );
INSERT INTO "datasette_metadata_column_entries" VALUES('fixtures','roadside_attractions','name','description','The name of the attraction');
INSERT INTO "datasette_metadata_column_entries" VALUES('fixtures','roadside_attractions','address','description','The street address for the attraction');
CREATE TABLE datasette_metadata_database_entries(
                database_name text,
                key text,
                value text,
                unique(database_name, key)
              );
INSERT INTO "datasette_metadata_database_entries" VALUES('fixtures','description','Test tables description');
CREATE TABLE datasette_metadata_instance_entries(
                key text,
                value text,
                unique(key)
              );
INSERT INTO "datasette_metadata_instance_entries" VALUES('title','Datasette Fixtures');
INSERT INTO "datasette_metadata_instance_entries" VALUES('description_html','An example SQLite database demonstrating Datasette. <a href="/login-as-root">Sign in as root user</a>');
INSERT INTO "datasette_metadata_instance_entries" VALUES('license','Apache License 2.0');
INSERT INTO "datasette_metadata_instance_entries" VALUES('license_url','https://github.com/simonw/datasette/blob/main/LICENSE');
INSERT INTO "datasette_metadata_instance_entries" VALUES('source','tests/fixtures.py');
INSERT INTO "datasette_metadata_instance_entries" VALUES('source_url','https://github.com/simonw/datasette/blob/main/tests/fixtures.py');
INSERT INTO "datasette_metadata_instance_entries" VALUES('about','About Datasette');
INSERT INTO "datasette_metadata_instance_entries" VALUES('about_url','https://github.com/simonw/datasette');
CREATE TABLE datasette_metadata_resource_entries(
                database_name text,
                resource_name text,
                key text,
                value text,
                unique(database_name, resource_name, key)
              );
INSERT INTO "datasette_metadata_resource_entries" VALUES('fixtures','simple_primary_key','description_html','Simple <em>primary</em> key');
INSERT INTO "datasette_metadata_resource_entries" VALUES('fixtures','simple_primary_key','title','This <em>HTML</em> is escaped');

@simonw
Copy link
Owner

simonw commented Jun 10, 2024

SELECT 
    ct.database_name,
    ct.table_name,
    ct.rootpage,
    ct.sql,
    COALESCE(
        json_group_object(dmre.key, dmre.value) 
        FILTER (WHERE dmre.key IS NOT NULL AND dmre.value IS NOT NULL),
        '{}'
    ) AS metadata
FROM 
    catalog_tables ct
LEFT JOIN 
    datasette_metadata_resource_entries dmre
ON 
    ct.database_name = dmre.database_name 
    AND ct.table_name = dmre.resource_name
GROUP BY 
    ct.database_name,
    ct.table_name
ORDER BY 
    ct.database_name,
    ct.table_name;

(Wrote with help of GPT-4o)

Returns results like this:

CleanShot 2024-06-10 at 09 28 52@2x

But FILTER in aggregates was added in 3.30.0 on 2019-10-04: https://sqlite.org/changes.html#version_3_30_0

@asg017
Copy link
Contributor Author

asg017 commented Jun 10, 2024

Those pyodide errors look not too fun

async def apply_metadata_json(self):
# Apply any metadata entries from metadata.json to the internal tables
# step 1: top-level metadata
for key in self._metadata_local or {}:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should rename self._metadata_local to self._metadata_on_startup or self._metadata_from_file to help clarify that it's the metadata that was passed to Datasette's constructor on startup?

I don't want people to get confused and think that reading from self._metadata_local is a useful thing to do, when they should be calling the new async methods instead.

Probably overthinking that though, since those methods will be documented but self._metadata_local will not.

Comment on lines +474 to +476
# TODO(alex) is metadata.json was loaded in, and --internal is not memory, then log
# a warning to user that they should delete their metadata.json file

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with us not doing this - software telling me to delete a file is a bit weird.

@simonw simonw merged commit e1bfab3 into simonw:main Jun 11, 2024
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants