METS Server #966

Merged: 50 commits, merged Aug 22, 2023
Changes from 39 commits
Commits (50):
604dd9d wip (kba, Dec 6, 2022)
f73194d . (kba, Dec 7, 2022)
04eee34 getting there (kba, Dec 8, 2022)
3af495b slowly but determinedly (kba, Dec 8, 2022)
82e2a69 remove noise from makefile (kba, Dec 9, 2022)
dcad68a mets-server: bashlib should take same args (kba, Dec 13, 2022)
63ba8f0 ClientSideOcrdMets: fix signature of self.file_groups (MehmedGIT, Dec 13, 2022)
3d48af6 OcrdWorkspace.is_remote should be a bool (kba, Dec 14, 2022)
14db382 mets_server: only save_mets on PUT and DELETE (kba, Dec 14, 2022)
4f726ab resolver: shorten mets_server_{host,port} check (MehmedGIT, Dec 14, 2022)
9b4a751 Merge branch 'master' into mets-server (kba, Dec 14, 2022)
6fdf7dd --port must be int (kba, Dec 14, 2022)
bfa17c3 mets_server: replace Model constructor with static create calls (kba, Dec 14, 2022)
3685aeb Merge branch 'mets-server' of https://github.com/kba/ocrd-core into m… (kba, Dec 14, 2022)
69dad92 mets_server: different loggers for socket/host-port (kba, Dec 14, 2022)
3532600 mets_server: missed mimetype kwarg (joschrew, Dec 19, 2022)
390682c mets_server: use factory method not constructor (kba, Dec 19, 2022)
9f27a98 mets_server: file search/adding on /file not / (kba, Dec 19, 2022)
6baa0c1 Merge branch 'mets-server' of https://github.com/kba/ocrd-core into m… (kba, Dec 19, 2022)
892841c workspace: save content to file only if not remote (kba, Dec 19, 2022)
821b5e8 Merge remote-tracking branch 'origin/master' into mets-server (kba, Aug 17, 2023)
3a6662d finish implementation / test mets server (kba, Aug 17, 2023)
4f49d37 METS Server: equivalent functionality to files for agents (kba, Aug 17, 2023)
ae7c773 Update ocrd/ocrd/cli/workspace.py (kba, Aug 17, 2023)
93512db METS server: consistently use local_filename (kba, Aug 17, 2023)
5827d30 ocrd workspace CLI: reference METS server option (bertsky, Aug 17, 2023)
22ea8e0 Update ocrd/ocrd/cli/workspace.py (kba, Aug 17, 2023)
54b14b8 mets server: add stop (kba, Aug 17, 2023)
374f1f4 mets server: improve docs (kba, Aug 17, 2023)
ca10f46 mets server: remove XXX HACK comments, they are not; (kba, Aug 17, 2023)
c3eebd5 mets server: provide fallback for non-wrapped OcrdFile methods (kba, Aug 17, 2023)
3a037f8 mets server: __str__ handlers (kba, Aug 17, 2023)
55bb692 mets server: support unique_identifier (kba, Aug 17, 2023)
baa52ef mets server: clean up is_remote muddle (kba, Aug 17, 2023)
7bf1168 mets server: no content will pass through it (kba, Aug 17, 2023)
f477ae1 mets server will never pass content to workspace.add_file (kba, Aug 17, 2023)
46e34bc mets server: single option --mets-server-url/-U (kba, Aug 17, 2023)
1b55c8f :package: v2.53.0 (kba, Aug 21, 2023)
f2da896 workspace server start: pass workspace context (kba, Aug 21, 2023)
47eb196 METS server: support -U for processor options (kba, Aug 21, 2023)
668a8f4 move ClientSideOcrd{Agent,File} to ocrd_models (kba, Aug 21, 2023)
671ca32 typo: -{,-}mets-server-url (kba, Aug 21, 2023)
41393b3 pass mets_server_url from run_processor (kba, Aug 21, 2023)
98a5690 ClientSideOcrdFile et al need url too (kba, Aug 21, 2023)
f6efd94 mets server: test both UDS and TCP variant (kba, Aug 21, 2023)
780fbbc Merge branch 'master' into mets-server (kba, Aug 21, 2023)
6cf22a8 mets server: allow both local_filename and url to be None (kba, Aug 22, 2023)
927ea59 mets server: forbid local/remote workspace with different directories (kba, Aug 22, 2023)
b53938c pin requests < 2.30, OCR-D/core#1082 (kba, Aug 22, 2023)
5ce54a5 ci: localhost -> 127.0.0.1 (kba, Aug 22, 2023)
1 change: 1 addition & 0 deletions .pylintrc
@@ -6,6 +6,7 @@ ignored-modules=cv2,tesserocr,ocrd.model
ignore-patterns='.*generateds.*'
disable =
fixme,
E501,
trailing-whitespace,
logging-not-lazy,
inconsistent-return-statements,
1 change: 1 addition & 0 deletions ocrd/ocrd/__init__.py
@@ -21,3 +21,4 @@
from ocrd.workspace import Workspace
from ocrd.workspace_backup import WorkspaceBackupManager
from ocrd.resource_manager import OcrdResourceManager
from ocrd.mets_server import OcrdMetsServer
89 changes: 80 additions & 9 deletions ocrd/ocrd/cli/workspace.py
@@ -18,21 +18,24 @@
import click

from ocrd import Resolver, Workspace, WorkspaceValidator, WorkspaceBackupManager
from ocrd.mets_server import OcrdMetsServer
from ocrd_utils import getLogger, initLogging, pushd_popd, EXT_TO_MIME, safe_filename, parse_json_string_or_file
from ocrd.decorators import mets_find_options
from . import command_with_replaced_help


class WorkspaceCtx():

def __init__(self, directory, mets_url, mets_basename, automatic_backup):
def __init__(self, directory, mets_url, mets_basename, mets_server_url, automatic_backup):
self.log = getLogger('ocrd.cli.workspace')
self.resolver = Resolver()
if mets_basename:
self.log.warning(DeprecationWarning('--mets-basename is deprecated. Use --mets/--directory instead.'))
self.directory, self.mets_url, self.mets_basename = self.resolver.resolve_mets_arguments(directory, mets_url, mets_basename)
self.resolver = Resolver()
self.directory, self.mets_url, self.mets_basename, self.mets_server_url \
= self.resolver.resolve_mets_arguments(directory, mets_url, mets_basename, mets_server_url)
self.automatic_backup = automatic_backup


pass_workspace = click.make_pass_decorator(WorkspaceCtx)

# ----------------------------------------------------------------------
@@ -43,14 +46,26 @@ def __init__(self, directory, mets_url, mets_basename, automatic_backup):
@click.option('-d', '--directory', envvar='WORKSPACE_DIR', type=click.Path(file_okay=False), metavar='WORKSPACE_DIR', help='Changes the workspace folder location [default: METS_URL directory or .]')
@click.option('-M', '--mets-basename', default=None, help='METS file basename. Deprecated, use --mets/--directory')
@click.option('-m', '--mets', default=None, help='The path/URL of the METS file [default: WORKSPACE_DIR/mets.xml]', metavar="METS_URL")
@click.option('-U', '--mets-server-url', 'mets_server_url', help="URL of the METS server: http://HOST:PORT for TCP, otherwise the path of a Unix socket")
@click.option('--backup', default=False, help="Backup mets.xml whenever it is saved.", is_flag=True)
@click.pass_context
def workspace_cli(ctx, directory, mets, mets_basename, backup):
def workspace_cli(ctx, directory, mets, mets_basename, mets_server_url, backup):
"""
Working with workspace
Managing workspaces

A workspace comprises a METS file and a directory as point of reference.

Operates on the file system directly or via a METS server
(already running via some prior `server start` subcommand).
"""
initLogging()
ctx.obj = WorkspaceCtx(directory, mets_url=mets, mets_basename=mets_basename, automatic_backup=backup)
ctx.obj = WorkspaceCtx(
directory,
mets_url=mets,
mets_basename=mets_basename,
mets_server_url=mets_server_url,
automatic_backup=backup
)

# ----------------------------------------------------------------------
# ocrd workspace validate
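The help text in this hunk documents the defaults the resolver has to apply: `--mets` falls back to `WORKSPACE_DIR/mets.xml`, and `--directory` falls back to the METS file's directory or `.`. Below is a minimal sketch of a resolver honouring just those two defaults and returning the four values that `resolve_mets_arguments` now yields. `resolve_mets_arguments_sketch` is a hypothetical name; the sketch ignores remote METS URLs, the deprecated `--mets-basename` and any validation the real `Resolver` may perform.

```python
import os
from typing import Optional, Tuple


def resolve_mets_arguments_sketch(
    directory: Optional[str],
    mets_url: Optional[str],
    mets_server_url: Optional[str],
) -> Tuple[str, str, str, Optional[str]]:
    """Hypothetical simplified resolver returning
    (directory, mets_url, mets_basename, mets_server_url)."""
    if mets_url is None:
        # --mets not given: default to WORKSPACE_DIR/mets.xml
        directory = directory or '.'
        mets_url = os.path.join(directory, 'mets.xml')
    elif directory is None:
        # --directory not given: default to the METS file's directory, or '.'
        directory = os.path.dirname(mets_url) or '.'
    mets_basename = os.path.basename(mets_url)
    # This sketch passes the METS server URL through untouched: it is not a file path
    return directory, mets_url, mets_basename, mets_server_url


# Callers now unpack four values instead of three
directory, mets, basename, server = resolve_mets_arguments_sketch(None, 'ws/mets.xml', '/tmp/ws.sock')
print(directory, mets, basename, server)  # on POSIX: ws ws/mets.xml mets.xml /tmp/ws.sock
```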
@@ -175,7 +190,13 @@ def workspace_add_file(ctx, file_grp, file_id, mimetype, page_id, ignore, check_
Add a file or http(s) URL FNAME to METS in a workspace.
If FNAME is not an http(s) URL and is not a workspace-local existing file, try to copy to workspace.
"""
workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
workspace = Workspace(
ctx.resolver,
directory=ctx.directory,
mets_basename=ctx.mets_basename,
automatic_backup=ctx.automatic_backup,
mets_server_url=ctx.mets_server_url,
)

log = getLogger('ocrd.cli.workspace.add')
if not mimetype:
@@ -208,7 +229,16 @@ def workspace_add_file(ctx, file_grp, file_id, mimetype, page_id, ignore, check_

if not page_id:
log.warning("You did not provide '--page-id/-g', so the file you added is not linked to a specific page.")
workspace.add_file(file_grp, file_id=file_id, mimetype=mimetype, page_id=page_id, force=force, ignore=ignore, local_filename=local_filename, url=fname)
kwargs = {
'file_id': file_id,
'mimetype': mimetype,
'page_id': page_id,
'force': force,
'ignore': ignore,
'local_filename': local_filename,
'url': fname
}
workspace.add_file(file_grp, **kwargs)
workspace.save_mets()

# ----------------------------------------------------------------------
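This hunk gathers the `add_file` arguments in a `kwargs` dict and forwards them in one call. A dict like this is also easy to serialize, which is what a client for a METS server has to do. A minimal sketch, assuming a JSON request to a `/file` endpoint: the path follows the commit "mets_server: file search/adding on /file not /", while the POST method and JSON encoding are assumptions. `build_add_file_request` is hypothetical, not OCR-D's client code. It rejects `content`, in line with the commits stating that no file content passes through the METS server.

```python
import json
import urllib.request
from typing import Any


def build_add_file_request(server_url: str, file_grp: str, **kwargs: Any) -> urllib.request.Request:
    """Hypothetical client helper: turn add_file keyword arguments into
    an HTTP request for a METS server listening on server_url."""
    # Only METS metadata is sent; file content never passes through the server
    if 'content' in kwargs:
        raise ValueError('content must be written locally, not sent to the METS server')
    payload = {'file_grp': file_grp, **kwargs}
    return urllib.request.Request(
        server_url.rstrip('/') + '/file',  # endpoint path as named in the commit message
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',  # assumption: the HTTP method is not shown in this diff
    )


kwargs = {
    'file_id': 'OCR-D-IMG_0001',
    'mimetype': 'image/tiff',
    'page_id': 'PHYS_0001',
    'force': False,
    'ignore': False,
    'local_filename': 'OCR-D-IMG/OCR-D-IMG_0001.tif',
    'url': None,
}
req = build_add_file_request('http://127.0.0.1:8123', 'OCR-D-IMG', **kwargs)
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:8123/file
```

The request is only built here, not sent, so the sketch runs without a server.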
@@ -401,7 +431,12 @@ def workspace_find(ctx, file_grp, mimetype, page_id, file_id, output_field, down
output_field = [snake_to_camel.get(x, x) for x in output_field]
modified_mets = False
ret = list()
workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
workspace = Workspace(
ctx.resolver,
directory=ctx.directory,
mets_basename=ctx.mets_basename,
mets_server_url=ctx.mets_server_url,
)
for f in workspace.find_files(
file_id=file_id,
file_grp=file_grp,
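`find_files` takes optional filters such as `file_id` and `file_grp`. When the METS lives behind a server, those filters have to travel with the request. A minimal sketch, assuming they are sent as query parameters on `/file`; the parameter names and encoding are assumptions, and `build_find_files_url` is hypothetical. Unset filters are dropped so the server only sees real constraints.

```python
from typing import Optional
from urllib.parse import urlencode


def build_find_files_url(server_url: str, **filters: Optional[str]) -> str:
    """Hypothetical helper: encode find_files filters as a query string on /file.
    Filters left as None are omitted."""
    query = urlencode({key: value for key, value in filters.items() if value is not None})
    url = server_url.rstrip('/') + '/file'
    return f'{url}?{query}' if query else url


print(build_find_files_url('http://127.0.0.1:8123', file_grp='OCR-D-IMG', page_id='PHYS_0001', file_id=None))
# http://127.0.0.1:8123/file?file_grp=OCR-D-IMG&page_id=PHYS_0001
```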
@@ -676,3 +711,39 @@ def workspace_backup_undo(ctx):
"""
backup_manager = WorkspaceBackupManager(Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup))
backup_manager.undo()


# ----------------------------------------------------------------------
# ocrd workspace server
# ----------------------------------------------------------------------

@workspace_cli.group('server')
@pass_workspace
def workspace_serve_cli(ctx): # pylint: disable=unused-argument
"""Control a METS server for this workspace"""
assert ctx.mets_server_url, "For METS server commands, you must provide '-U/--mets-server-url'"

@workspace_serve_cli.command('stop')
@pass_workspace
def workspace_serve_stop(ctx): # pylint: disable=unused-argument
"""Stop the METS server"""
workspace = Workspace(
ctx.resolver,
directory=ctx.directory,
mets_basename=ctx.mets_basename,
mets_server_url=ctx.mets_server_url,
)
workspace.mets.stop()

@workspace_serve_cli.command('start')
@pass_workspace
def workspace_serve_start(ctx): # pylint: disable=unused-argument
"""
Start a METS server

(For TCP backend, pass a network interface to bind to as the '-U/--mets-server-url' parameter.)
"""
OcrdMetsServer(
workspace=Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename),
url=ctx.mets_server_url,
).startup()
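The `server start` and `server stop` subcommands bracket a server's lifetime: `start` runs an `OcrdMetsServer` bound to the `-U` URL, and `stop` calls `workspace.mets.stop()` through a workspace connected to that URL. The toy below imitates that lifecycle with the standard library only: one process keeps a file list in memory, a client reads it over HTTP on the loopback interface, and a stop request ends the serve loop. It is a sketch under assumptions, not OCR-D's implementation: `ToyMetsHandler`, the `/file` listing and the use of `DELETE /` to stop are all hypothetical, and it covers only TCP, whereas the commits mention testing both a UDS and a TCP variant.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ToyMetsHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a METS server: serves an in-memory file list."""

    def do_GET(self):
        if self.path == '/file':
            self._reply(200, self.server.files)
        else:
            self._reply(404, {'error': 'not found'})

    def do_DELETE(self):
        # Assumption for this toy: DELETE / asks the server to stop
        if self.path == '/':
            self._reply(200, {'stopping': True})
            # shutdown() waits for serve_forever() to return, so run it in its own thread
            threading.Thread(target=self.server.shutdown).start()
        else:
            self._reply(404, {'error': 'not found'})

    def _reply(self, status, obj):
        body = json.dumps(obj).encode('utf-8')
        self.send_response(status)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


def start_toy_server():
    # Port 0 lets the OS pick a free port on the loopback interface
    server = ThreadingHTTPServer(('127.0.0.1', 0), ToyMetsHandler)
    server.files = [{'file_grp': 'OCR-D-IMG', 'file_id': 'OCR-D-IMG_0001'}]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


# Bypass any proxy settings: the server is on the loopback interface
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

server, thread = start_toy_server()
base = 'http://127.0.0.1:%d' % server.server_address[1]
with opener.open(base + '/file') as resp:
    listing = json.load(resp)
print(listing)

# "stop": ask the running server to shut down, then release its socket
opener.open(urllib.request.Request(base + '/', method='DELETE')).close()
thread.join(timeout=5)
server.server_close()
print('stopped:', not thread.is_alive())
```

Binding to `127.0.0.1` rather than `localhost` sidesteps name-resolution differences, the same switch the final CI commit makes.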
4 changes: 3 additions & 1 deletion ocrd/ocrd/decorators/__init__.py
@@ -23,6 +23,7 @@
def ocrd_cli_wrap_processor(
processorClass,
mets=None,
mets_server_url=None,
working_dir=None,
dump_json=False,
dump_module_dir=False,
@@ -74,7 +75,8 @@ def ocrd_cli_wrap_processor(
# if not kwargs['output_file_grp']:
# raise ValueError('-O/--output-file-grp is required')
resolver = Resolver()
working_dir, mets, _ = resolver.resolve_mets_arguments(working_dir, mets, None)
working_dir, mets, _, mets_server_url = \
resolver.resolve_mets_arguments(working_dir, mets, None, mets_server_url)
workspace = resolver.workspace_from_url(mets, working_dir)
page_id = kwargs.get('page_id')
# XXX not possible while processors do not adhere to # https://github.com/OCR-D/core/issues/505
5 changes: 3 additions & 2 deletions ocrd/ocrd/decorators/ocrd_cli_options.py
@@ -24,8 +24,9 @@ def cli(mets_url):
"""
# XXX Note that the `--help` output is statically generated by generate_processor_help
params = [
option('-m', '--mets', default="mets.xml"),
option('-w', '--working-dir'),
option('-m', '--mets', help="METS to process", default="mets.xml"),
option('-w', '--working-dir', help="Working Directory"),
option('-U', '--mets-server-url', help="METS server URL. If it starts with http://, connect via TCP; otherwise treat it as a Unix socket path"),
# TODO OCR-D/core#274
# option('-I', '--input-file-grp', required=True),
# option('-O', '--output-file-grp', required=True),
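The `-U` help text states the dispatch rule: a URL starting with `http://` means TCP, anything else is a Unix socket path. A minimal sketch of that rule with `urllib.parse`; `classify_mets_server_url` is hypothetical, and requiring an explicit port for TCP is an assumption of this sketch, not something the diff specifies.

```python
from typing import Tuple, Union
from urllib.parse import urlparse


def classify_mets_server_url(url: str) -> Tuple[str, Union[Tuple[str, int], str]]:
    """Hypothetical helper following the -U help text:
    http:// means TCP (host, port); anything else is a Unix socket path."""
    if url.startswith('http://'):
        parsed = urlparse(url)
        # Assumption: a TCP URL must name both host and port
        if parsed.hostname is None or parsed.port is None:
            raise ValueError(f'TCP METS server URL needs host and port: {url!r}')
        return 'tcp', (parsed.hostname, parsed.port)
    return 'uds', url


print(classify_mets_server_url('http://127.0.0.1:8123'))  # ('tcp', ('127.0.0.1', 8123))
print(classify_mets_server_url('/tmp/ws.sock'))           # ('uds', '/tmp/ws.sock')
```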
1 change: 1 addition & 0 deletions ocrd/ocrd/lib.bash
@@ -139,6 +139,7 @@ ocrd__parse_argv () {
-I|--input-file-grp) ocrd__argv[input_file_grp]=$2 ; shift ;;
-w|--working-dir) ocrd__argv[working_dir]=$(realpath "$2") ; shift ;;
-m|--mets) ocrd__argv[mets_file]=$(realpath "$2") ; shift ;;
--mets-server-url) ocrd__argv[mets_server_url]="$2" ; shift ;;
--overwrite) ocrd__argv[overwrite]=true ;;
--profile) ocrd__argv[profile]=true ;;
--profile-file) ocrd__argv[profile_file]=$(realpath "$2") ; shift ;;
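In `ocrd__parse_argv`, the path options are canonicalised with `realpath`, while the METS server URL is stored verbatim; that distinction matters because `realpath` would mangle an `http://HOST:PORT` value. A minimal argparse mirror of those cases, for comparison with the Python processor options; the parser and its `prog` name are hypothetical. It also accepts `-U`, matching the Python CLI, whereas the bash case shown handles only the long form.

```python
import argparse
import os


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the ocrd__parse_argv cases shown above."""
    parser = argparse.ArgumentParser(prog='ocrd-sketch-processor')
    # Paths are canonicalised, like $(realpath "$2") in lib.bash
    parser.add_argument('-w', '--working-dir', type=os.path.realpath)
    parser.add_argument('-m', '--mets', dest='mets_file', type=os.path.realpath)
    # The METS server URL is kept as given: realpath would mangle http://HOST:PORT
    parser.add_argument('-U', '--mets-server-url')
    parser.add_argument('--overwrite', action='store_true')
    return parser


args = make_parser().parse_args(['-m', 'mets.xml', '--mets-server-url', 'http://127.0.0.1:8123'])
print(os.path.isabs(args.mets_file), args.mets_server_url)  # True http://127.0.0.1:8123
```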