Error using Helm connector with MPI example #82

Open
zhangzhenhuajack opened this issue Mar 8, 2023 · 8 comments · Fixed by #95
Labels
bug Something isn't working

Comments

@zhangzhenhuajack

zhangzhenhuajack commented Mar 8, 2023


~/streamflow/streamflow/examples/mpi on master !1 > pwd                                                                                                                                                            took 9s py cwl with zhenhua.zhang@mixbio-dev-2 at 16:01:42
/home/zhenhua.zhang/streamflow/streamflow/examples/mpi
~/streamflow/streamflow/examples/mpi on master !1 > cat streamflow.yml                                                                                                                                                     py cwl with zhenhua.zhang@mixbio-dev-2 at 16:16:50
version: v1.0
workflows:
  master:
    type: cwl
    config:
      file: cwl/main.cwl
      settings: cwl/config.yml
    bindings:
      - step: /compile
        target:
          deployment: helm-mpi
          service: openmpi
      - step: /execute
        target:
          deployment: helm-mpi
          locations: 2
          service: openmpi
deployments:
  dc-mpi:
    type: docker-compose
    config:
      files:
        - environment/docker-compose/docker-compose.yml
      compatibility: true
      projectName: openmpi
  helm-mpi:
    type: helm
    config:
      chart: environment/helm/openmpi
      kubeconfig: /home/zhenhua.zhang/.kube/config-streamflow
      releaseName: openmpi-rel
      transferBufferSize: 10240
      namespace: streamflow
~/streamflow/streamflow/examples/mpi on master !1 > streamflow run streamflow.yml                                                                                                                                 took 15s py cwl with zhenhua.zhang@mixbio-dev-2 at 16:01:32
Resolved 'cwl/main.cwl' to 'file:///data/home/zhenhua.zhang/streamflow/streamflow/examples/mpi/cwl/main.cwl'
2023-03-08 16:01:35.818 INFO     Processing workflow b18a19ab-29c4-4d38-bdb0-13a8fba584a9
2023-03-08 16:01:35.818 INFO     Building workflow execution plan
2023-03-08 16:01:35.833 INFO     COMPLETED Building of workflow execution plan
2023-03-08 16:01:35.833 INFO     Running workflow b18a19ab-29c4-4d38-bdb0-13a8fba584a9
2023-03-08 16:01:35.924 INFO     DEPLOYING helm-mpi
2023-03-08 16:01:39.950 INFO     COMPLETED Deployment of helm-mpi
2023-03-08 16:01:40.185 INFO     COPYING /data/home/zhenhua.zhang/streamflow/streamflow/examples/mpi/cwl/data/cs.cxx on local file-system to /tmp/streamflow/19f786af-721f-49fe-bf97-7aac459a9ac5/c9b7457b-77da-4032-a92b-ed904317388c/cs.cxx on location helm-mpi/openmpi/openmpi-rel-574588bf6b-76fnx:openmpi
2023-03-08 16:01:40.361 INFO     EXECUTING step /compile (job /compile/0) on location helm-mpi/openmpi/openmpi-rel-574588bf6b-76fnx:openmpi into directory /tmp/streamflow/6fc36c60-dc32-4711-b399-16a09047632d:
mpicxx \
	-O3 \
	-o \
	cs \
	/tmp/streamflow/19f786af-721f-49fe-bf97-7aac459a9ac5/c9b7457b-77da-4032-a92b-ed904317388c/cs.cxx
2023-03-08 16:01:41.345 INFO     COMPLETED Step /compile
2023-03-08 16:01:41.729 INFO     COPYING /tmp/streamflow/6fc36c60-dc32-4711-b399-16a09047632d/cs on location helm-mpi/openmpi/openmpi-rel-574588bf6b-76fnx:openmpi to /tmp/streamflow/d56ec932-b589-48f2-b328-b1db611b5c33/74a3f255-817b-4b1c-8c55-0ea1ea3cba3c/cs on location helm-mpi/openmpi/openmpi-rel-574588bf6b-krks5:openmpi
2023-03-08 16:01:41.905 ERROR    Error transferring file /tmp/streamflow/6fc36c60-dc32-4711-b399-16a09047632d/cs in location openmpi-rel-574588bf6b-76fnx:openmpi to /tmp/streamflow/d56ec932-b589-48f2-b328-b1db611b5c33/74a3f255-817b-4b1c-8c55-0ea1ea3cba3c/cs in location helm-mpi/openmpi/openmpi-rel-574588bf6b-krks5:openmpi
Traceback (most recent call last):
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/workflow/step.py", line 1458, in run
    token=await self.transfer(job, token),
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/step.py", line 551, in transfer
    return token.update(await self._transfer_value(job, token.value))
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/step.py", line 360, in _transfer_value
    return await self._update_file_token(job, token_value)
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/step.py", line 458, in _update_file_token
    raise WorkflowExecutionException(
streamflow.core.exception.WorkflowExecutionException: Error transferring file /tmp/streamflow/6fc36c60-dc32-4711-b399-16a09047632d/cs in location openmpi-rel-574588bf6b-76fnx:openmpi to /tmp/streamflow/d56ec932-b589-48f2-b328-b1db611b5c33/74a3f255-817b-4b1c-8c55-0ea1ea3cba3c/cs in location helm-mpi/openmpi/openmpi-rel-574588bf6b-krks5:openmpi
2023-03-08 16:01:41.907 INFO     SKIPPED Step /execute
2023-03-08 16:01:41.907 INFO     UNDEPLOYING helm-mpi
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/zhenhua.zhang/.kube/config-streamflow
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/zhenhua.zhang/.kube/config-streamflow
release "openmpi-rel" uninstalled
2023-03-08 16:01:42.198 INFO     COMPLETED Undeployment of helm-mpi
2023-03-08 16:01:42.237 ERROR    FAILED Workflow execution
Traceback (most recent call last):
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/main.py", line 258, in main
    asyncio.run(_async_run(args))
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/main.py", line 166, in _async_run
    await asyncio.gather(*workflow_tasks)
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/main.py", line 74, in main
    output_tokens = await executor.run()
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/workflow/executor.py", line 135, in run
    raise WorkflowExecutionException("FAILED Workflow execution")
streamflow.core.exception.WorkflowExecutionException: FAILED Workflow execution
@zhangzhenhuajack changed the title from "when run examples/mpi/streamflow.yaml get a error :ERROR Error transferring file /tmp/streamflow/6fc36c60-dc32-4711-b399-16a09047632d/cs in location openmpi-rel-574588bf6b-76fnx:openmpi to /tmp/streamflow/d56ec932-b589-48f2-b328-b1db611b5c33/74a3f255-817b-4b1c-8c55-0ea1ea3cba3c/cs in location helm-mpi/openmpi/openmpi-rel-574588bf6b-krks5:openmpi" to "when use helm to run examples/mpi/streamflow.yaml get a error :ERROR Error transferring file /tmp/streamflow/6fc36c60-dc32-4711-b399-16a09047632d/cs in location openmpi-rel-574588bf6b-76fnx:openmpi to /tmp/streamflow/d56ec932-b589-48f2-b328-b1db611b5c33/74a3f255-817b-4b1c-8c55-0ea1ea3cba3c/cs in location helm-mpi/openmpi/openmpi-rel-574588bf6b-krks5:openmpi" on Mar 8, 2023
GlassOfWhiskey added a commit that referenced this issue Mar 14, 2023
Prior to this commit, the MPI example was not working for these reasons:
 - When scheduling a job on multiple locations the STREAMFLOW_HOSTS variable
   was not correctly populated;
 - Helm connector had some issues with file transfers and tar streaming;
 - DockerCompose connector had issues with parsing available locations.
All these issues are solved by this fix, which also fixes #82.
@GlassOfWhiskey
Member

Dear @zhangzhenhuajack,
Thanks for opening the issue. This should be solved by PR #95.
Feel free to reopen the issue if you still face problems.

@GlassOfWhiskey changed the title from "when use helm to run examples/mpi/streamflow.yaml get a error :ERROR Error transferring file /tmp/streamflow/6fc36c60-dc32-4711-b399-16a09047632d/cs in location openmpi-rel-574588bf6b-76fnx:openmpi to /tmp/streamflow/d56ec932-b589-48f2-b328-b1db611b5c33/74a3f255-817b-4b1c-8c55-0ea1ea3cba3c/cs in location helm-mpi/openmpi/openmpi-rel-574588bf6b-krks5:openmpi" to "Error using Helm connector with MPI example" on Mar 14, 2023
@GlassOfWhiskey GlassOfWhiskey added the bug Something isn't working label Mar 14, 2023
@zhangzhenhuajack
Author

zhangzhenhuajack commented Mar 17, 2023

After updating with git pull, I still get the error. @GlassOfWhiskey

2023-03-17 16:04:24.132 ERROR    Error transferring file /home/zhenhua.zhang/tmp/79022d8f-319d-4c6c-856f-a5d0151f0fa4/cs in location openmpi-rel-574588bf6b-hrjjw:openmpi to /home/zhenhua.zhang/tmp/410468d9-63dd-4981-ae79-7dac1fd0e2ec/c549d3c3-e74a-41f0-b836-c17675dc79da/cs in location helm-mpi/openmpi/openmpi-rel-574588bf6b-v8mls:openmpi

streamflow run streamflow_jk.yml

~/streamflow/streamflow/examples/mpi on master ?1 > streamflow run streamflow_jk.yml                                                                                                                                py cwl with zhenhua.zhang@mixbio-dev-2 at 16:03:56
Resolved 'cwl/main.cwl' to 'file:///data/home/zhenhua.zhang/streamflow/streamflow/examples/mpi/cwl/main.cwl'
2023-03-17 16:04:00.295 INFO     Processing workflow 2e8842c3-15ba-4e59-8fee-947de39ca7fb
2023-03-17 16:04:00.295 INFO     Building workflow execution plan
2023-03-17 16:04:00.330 INFO     COMPLETED Building of workflow execution plan
2023-03-17 16:04:00.330 INFO     Running workflow 2e8842c3-15ba-4e59-8fee-947de39ca7fb
2023-03-17 16:04:00.426 INFO     DEPLOYING helm-mpi
2023-03-17 16:04:22.051 INFO     COMPLETED Deployment of helm-mpi
2023-03-17 16:04:22.298 INFO     COPYING /data/home/zhenhua.zhang/streamflow/streamflow/examples/mpi/cwl/data/cs.cxx on local file-system to /home/zhenhua.zhang/tmp/85d69ceb-5d2d-42bb-91bc-47eb492ffec5/6f864633-a789-4433-8b72-206901962211/cs.cxx on location helm-mpi/openmpi/openmpi-rel-574588bf6b-hrjjw:openmpi
2023-03-17 16:04:22.504 INFO     EXECUTING step /compile (job /compile/0) on location helm-mpi/openmpi/openmpi-rel-574588bf6b-hrjjw:openmpi into directory /home/zhenhua.zhang/tmp/79022d8f-319d-4c6c-856f-a5d0151f0fa4:
mpicxx \
	-O3 \
	-o \
	cs \
	/home/zhenhua.zhang/tmp/85d69ceb-5d2d-42bb-91bc-47eb492ffec5/6f864633-a789-4433-8b72-206901962211/cs.cxx
2023-03-17 16:04:23.656 INFO     COMPLETED Step /compile
2023-03-17 16:04:23.944 INFO     COPYING /home/zhenhua.zhang/tmp/79022d8f-319d-4c6c-856f-a5d0151f0fa4/cs on location helm-mpi/openmpi/openmpi-rel-574588bf6b-hrjjw:openmpi to /home/zhenhua.zhang/tmp/410468d9-63dd-4981-ae79-7dac1fd0e2ec/c549d3c3-e74a-41f0-b836-c17675dc79da/cs on location helm-mpi/openmpi/openmpi-rel-574588bf6b-v8mls:openmpi
2023-03-17 16:04:24.132 ERROR    Error transferring file /home/zhenhua.zhang/tmp/79022d8f-319d-4c6c-856f-a5d0151f0fa4/cs in location openmpi-rel-574588bf6b-hrjjw:openmpi to /home/zhenhua.zhang/tmp/410468d9-63dd-4981-ae79-7dac1fd0e2ec/c549d3c3-e74a-41f0-b836-c17675dc79da/cs in location helm-mpi/openmpi/openmpi-rel-574588bf6b-v8mls:openmpi
Traceback (most recent call last):
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/workflow/step.py", line 1458, in run
    token=await self.transfer(job, token),
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/step.py", line 551, in transfer
    return token.update(await self._transfer_value(job, token.value))
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/step.py", line 360, in _transfer_value
    return await self._update_file_token(job, token_value)
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/step.py", line 458, in _update_file_token
    raise WorkflowExecutionException(
streamflow.core.exception.WorkflowExecutionException: Error transferring file /home/zhenhua.zhang/tmp/79022d8f-319d-4c6c-856f-a5d0151f0fa4/cs in location openmpi-rel-574588bf6b-hrjjw:openmpi to /home/zhenhua.zhang/tmp/410468d9-63dd-4981-ae79-7dac1fd0e2ec/c549d3c3-e74a-41f0-b836-c17675dc79da/cs in location helm-mpi/openmpi/openmpi-rel-574588bf6b-v8mls:openmpi
2023-03-17 16:04:24.138 INFO     SKIPPED Step /execute
2023-03-17 16:04:24.139 INFO     UNDEPLOYING helm-mpi
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/zhenhua.zhang/.kube/config-streamflow
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/zhenhua.zhang/.kube/config-streamflow
release "openmpi-rel" uninstalled
2023-03-17 16:04:24.428 INFO     COMPLETED Undeployment of helm-mpi
2023-03-17 16:04:24.517 ERROR    FAILED Workflow execution
Traceback (most recent call last):
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/main.py", line 258, in main
    asyncio.run(_async_run(args))
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/main.py", line 166, in _async_run
    await asyncio.gather(*workflow_tasks)
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/main.py", line 74, in main
    output_tokens = await executor.run()
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/workflow/executor.py", line 135, in run
    raise WorkflowExecutionException("FAILED Workflow execution")
streamflow.core.exception.WorkflowExecutionException: FAILED Workflow execution
~/streamflow/streamflow/examples/mpi on master ?1 > cat streamflow_jk.yml                                                                                                                             PIPE took 23s py cwl with zhenhua.zhang@mixbio-dev-2 at 16:08:38
version: v1.0
workflows:
  master:
    type: cwl
    config:
      file: cwl/main.cwl
      settings: cwl/config.yml
    bindings:
      - step: /compile
        target:
          deployment: helm-mpi
          service: openmpi
      - step: /execute
        target:
          deployment: helm-mpi
          locations: 2
          service: openmpi
deployments:
  dc-mpi:
    type: docker-compose
    config:
      files:
        - environment/docker-compose/docker-compose.yml
      compatibility: true
      projectName: openmpi
  helm-mpi:
    type: helm
    config:
      chart: environment/helm/openmpi
      kubeconfig: /home/zhenhua.zhang/.kube/config-streamflow
      releaseName: openmpi-rel
      transferBufferSize: 10240
      namespace: streamflow
    workdir: /home/zhenhua.zhang/tmp

@GlassOfWhiskey
Member

GlassOfWhiskey commented Mar 17, 2023

Hi @zhangzhenhuajack,
I reopened the issue, but I am not able to reproduce the error anymore.
Could you please run streamflow in debug mode?

streamflow run --debug streamflow_jk.yml

In this way, I can get a bit more information to reproduce it.
Thank you.
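
A debug run prints many lines, so when sharing one it may help to pull out each ERROR record together with the few log records that came just before it. The following is a minimal sketch and not part of StreamFlow: the function name `extract_error_context` and its `context` parameter are hypothetical, and the regular expression assumes only the `YYYY-MM-DD HH:MM:SS.mmm LEVEL message` layout visible in the logs in this thread.

```python
import re
from collections import deque

# Assumed layout, taken from the logs in this thread, e.g.
# "2023-03-17 16:04:24.132 ERROR    Error transferring file ..."
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} (?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def extract_error_context(lines, context=3):
    """Return each ERROR record preceded by up to `context` earlier records.

    Lines that do not start with a timestamp (tracebacks, multi-line
    commands) are skipped, so only timestamped records are returned.
    """
    recent = deque(maxlen=context)  # sliding window of non-error records
    result = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_LINE.match(line)
        if match is None:
            continue  # continuation line, not a log record
        if match.group("level") == "ERROR":
            result.extend(recent)
            result.append(line)
            recent.clear()  # do not repeat context for the next error
        else:
            recent.append(line)
    return result
```

If the output of the debug run is saved to a file, the open file object can be passed directly as `lines`; a larger `context` shows more of the commands that ran before the failed transfer.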

@zhangzhenhuajack
Author

zhangzhenhuajack commented Mar 20, 2023

~/streamflow/streamflow/examples/mpi on master !1 ?1 > streamflow run --debug streamflow_jk.yml                                                                                                                     py cwl with zhenhua.zhang@mixbio-dev-2 at 09:37:47
Resolved 'cwl/main.cwl' to 'file:///data/home/zhenhua.zhang/streamflow/streamflow/examples/mpi/cwl/main.cwl'
2023-03-20 09:38:05.790 INFO     Processing workflow 6bd446a7-3b12-4280-9dbe-d222c1d581b8
2023-03-20 09:38:05.790 INFO     Building workflow execution plan
2023-03-20 09:38:05.790 DEBUG    Translating Workflow /
2023-03-20 09:38:05.791 DEBUG    Translating WorkflowStep /compile
2023-03-20 09:38:05.791 DEBUG    Translating CommandLineTool /compile
2023-03-20 09:38:05.791 DEBUG    Translating WorkflowStep /execute
2023-03-20 09:38:05.792 DEBUG    Translating CommandLineTool /execute
2023-03-20 09:38:05.805 INFO     COMPLETED Building of workflow execution plan
2023-03-20 09:38:05.805 INFO     Running workflow 6bd446a7-3b12-4280-9dbe-d222c1d581b8
2023-03-20 09:38:05.806 DEBUG    Step /num_processes-injector received inputs ['0']
2023-03-20 09:38:05.807 DEBUG    Step /source_file-injector received inputs ['0']
2023-03-20 09:38:05.807 DEBUG    Retrieving available locations for job /num_processes-injector/0 on __LOCAL__.
2023-03-20 09:38:05.807 DEBUG    Available locations for job /num_processes-injector/0 on __LOCAL__ are ['__LOCAL__'].
2023-03-20 09:38:05.808 DEBUG    Job /num_processes-injector/0 allocated locally
2023-03-20 09:38:05.808 DEBUG    Retrieving available locations for job /source_file-injector/0 on __LOCAL__.
2023-03-20 09:38:05.808 DEBUG    Available locations for job /source_file-injector/0 on __LOCAL__ are ['__LOCAL__'].
2023-03-20 09:38:05.808 DEBUG    Job /source_file-injector/0 allocated locally
2023-03-20 09:38:05.808 DEBUG    COMPLETED Step __deploy__/helm-mpi
2023-03-20 09:38:05.808 DEBUG    COMPLETED Step __deploy__/__LOCAL__
2023-03-20 09:38:05.809 DEBUG    Job /num_processes-injector/0 changed status to RUNNING
2023-03-20 09:38:05.809 DEBUG    Job /source_file-injector/0 changed status to RUNNING
2023-03-20 09:38:05.885 DEBUG    COMPLETED Step /num_processes-injector/__schedule__
2023-03-20 09:38:05.885 DEBUG    COMPLETED Step /source_file-injector/__schedule__
2023-03-20 09:38:05.889 DEBUG    Job /num_processes-injector/0 changed status to COMPLETED
2023-03-20 09:38:05.889 DEBUG    Step /num_processes-token-transformer received inputs ['0']
2023-03-20 09:38:05.889 DEBUG    Step /num_processes-injector received inputs ['0']
2023-03-20 09:38:05.890 DEBUG    COMPLETED Step /num_processes-injector
2023-03-20 09:38:05.890 DEBUG    Step /num_processes-token-transformer received inputs ['0']
2023-03-20 09:38:05.890 DEBUG    COMPLETED Step /num_processes-token-transformer
2023-03-20 09:38:05.890 DEBUG    Job /source_file-injector/0 changed status to COMPLETED
2023-03-20 09:38:05.890 DEBUG    Step /source_file-token-transformer received inputs ['0']
2023-03-20 09:38:05.891 DEBUG    Step /source_file-injector received inputs ['0']
2023-03-20 09:38:05.891 DEBUG    COMPLETED Step /source_file-injector
2023-03-20 09:38:05.892 DEBUG    Step /compile/source_file-token-transformer received inputs ['0']
2023-03-20 09:38:05.892 DEBUG    Step /source_file-token-transformer received inputs ['0']
2023-03-20 09:38:05.892 DEBUG    COMPLETED Step /source_file-token-transformer
2023-03-20 09:38:05.893 DEBUG    Step /compile/__transfer__/source_file received inputs ['0']
2023-03-20 09:38:05.893 DEBUG    Step /compile/__schedule__ received inputs ['0']
2023-03-20 09:38:05.893 DEBUG    Step /compile/source_file-token-transformer received inputs ['0']
2023-03-20 09:38:05.893 DEBUG    Retrieving available locations for job /compile/0 on helm-mpi/openmpi.
2023-03-20 09:38:05.894 INFO     DEPLOYING helm-mpi
2023-03-20 09:38:05.913 DEBUG    COMPLETED Step /compile/source_file-token-transformer
2023-03-20 09:38:05.940 DEBUG    EXECUTING helm --kubeconfig  "/home/zhenhua.zhang/.kube/config-streamflow" --namespace  "streamflow" --registry-config  "/home/zhenhua.zhang/.config/helm/registry.json" --repository-cache  "/home/zhenhua.zhang/.cache/helm/repository" --repository-config  "/home/zhenhua.zhang/.config/helm/repositories.yaml" install --timeout  "1000m" --wait  openmpi-rel environment/helm/openmpi
2023-03-20 09:39:18.332 INFO     COMPLETED Deployment of helm-mpi
2023-03-20 09:39:18.350 DEBUG    Available locations for job /compile/0 on helm-mpi/openmpi are ['openmpi-rel-574588bf6b-jh2kl:openmpi', 'openmpi-rel-574588bf6b-lh5g2:openmpi'].
2023-03-20 09:39:18.350 DEBUG    Job /compile/0 allocated on location helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:18.351 DEBUG    EXECUTING command mkdir -p /home/zhenhua.zhang/tmp/746ff75c-a2b8-4e69-8c1a-6ba429ea4bf3 /home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7 /home/zhenhua.zhang/tmp/eac62e7b-63d4-4e55-b970-1a5bb9b2d527 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:18.455 DEBUG    EXECUTING command mkdir -p /home/zhenhua.zhang/tmp/746ff75c-a2b8-4e69-8c1a-6ba429ea4bf3/04029cea-248f-4a47-8431-ecdc29a8db26 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:18.456 DEBUG    Step /compile/__schedule__ received inputs ['0']
2023-03-20 09:39:18.456 DEBUG    COMPLETED Step /compile/__schedule__
2023-03-20 09:39:18.499 INFO     COPYING /data/home/zhenhua.zhang/streamflow/streamflow/examples/mpi/cwl/data/cs.cxx on local file-system to /home/zhenhua.zhang/tmp/746ff75c-a2b8-4e69-8c1a-6ba429ea4bf3/04029cea-248f-4a47-8431-ecdc29a8db26/cs.cxx on location helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:18.527 DEBUG    EXECUTING command test -f "/home/zhenhua.zhang/tmp/746ff75c-a2b8-4e69-8c1a-6ba429ea4bf3/04029cea-248f-4a47-8431-ecdc29a8db26/cs.cxx" && sha1sum "/home/zhenhua.zhang/tmp/746ff75c-a2b8-4e69-8c1a-6ba429ea4bf3/04029cea-248f-4a47-8431-ecdc29a8db26/cs.cxx" | awk '{print $1}' 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:18.581 DEBUG    Step /compile received inputs ['0']
2023-03-20 09:39:18.581 DEBUG    Step /compile/__transfer__/source_file received inputs ['0']
2023-03-20 09:39:18.582 DEBUG    Job /compile/0 started
2023-03-20 09:39:18.582 DEBUG    Job /compile/0 changed status to RUNNING
2023-03-20 09:39:18.582 DEBUG    Job /compile/0 inputs: {
    "source_file": {
        "basename": "cs.cxx",
        "checksum": "sha1$958eebef8f8a9a7ab46e0009caffcd754d0f255a",
        "class": "File",
        "dirname": "/home/zhenhua.zhang/tmp/746ff75c-a2b8-4e69-8c1a-6ba429ea4bf3/04029cea-248f-4a47-8431-ecdc29a8db26",
        "location": "file:///home/zhenhua.zhang/tmp/746ff75c-a2b8-4e69-8c1a-6ba429ea4bf3/04029cea-248f-4a47-8431-ecdc29a8db26/cs.cxx",
        "nameext": ".cxx",
        "nameroot": "cs",
        "path": "/home/zhenhua.zhang/tmp/746ff75c-a2b8-4e69-8c1a-6ba429ea4bf3/04029cea-248f-4a47-8431-ecdc29a8db26/cs.cxx",
        "size": 3037
    }
}
2023-03-20 09:39:18.583 INFO     EXECUTING step /compile (job /compile/0) on location helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi into directory /home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7:
mpicxx \
	-O3 \
	-o \
	cs \
	/home/zhenhua.zhang/tmp/746ff75c-a2b8-4e69-8c1a-6ba429ea4bf3/04029cea-248f-4a47-8431-ecdc29a8db26/cs.cxx
2023-03-20 09:39:18.583 DEBUG    COMPLETED Step /compile/__transfer__/source_file
2023-03-20 09:39:18.583 DEBUG    Step /compile received inputs ['0']
2023-03-20 09:39:18.583 DEBUG    EXECUTING command cd /home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7 && export HOME="/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7" && export TMPDIR="/home/zhenhua.zhang/tmp/eac62e7b-63d4-4e55-b970-1a5bb9b2d527" && mpicxx -O3 -o cs /home/zhenhua.zhang/tmp/746ff75c-a2b8-4e69-8c1a-6ba429ea4bf3/04029cea-248f-4a47-8431-ecdc29a8db26/cs.cxx 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi for job /compile/0
2023-03-20 09:39:18.827 DEBUG    EXECUTING command test -e "/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cwl.output.json" 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:18.874 DEBUG    Job /compile/0 changed status to COMPLETED
2023-03-20 09:39:18.875 DEBUG    EXECUTING command printf "%s\0" /home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs | xargs -0 -I{} sh -c "if [ -e \"{}\" ]; then echo \"{}\"; fi" | sort 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:18.917 DEBUG    EXECUTING command test -e "/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs" && readlink -f "/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs" 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:18.988 DEBUG    EXECUTING command test -f "/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs" 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:19.036 DEBUG    EXECUTING command test -e "/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs" && readlink -f "/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs" 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:19.080 DEBUG    EXECUTING command find -L "/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs" -type f -exec ls -ln {} \+ | awk 'BEGIN {sum=0} {sum+=$5} END {print sum}';  2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:19.158 DEBUG    EXECUTING command test -f "/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs" && sha1sum "/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs" | awk '{print $1}' 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:19.199 DEBUG    EXECUTING command test -e "/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs" && readlink -f "/home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs" 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:19.241 DEBUG    COMPLETED Job /compile/0 terminated
2023-03-20 09:39:19.241 DEBUG    Step /execute/executable_file-token-transformer received inputs ['0', '0']
2023-03-20 09:39:19.242 DEBUG    Step /execute/num_processes-token-transformer received inputs ['0', '0']
2023-03-20 09:39:19.242 DEBUG    Step /execute/__transfer__/num_processes received inputs ['0']
2023-03-20 09:39:19.243 DEBUG    Step /execute/num_processes-token-transformer received inputs ['0', '0']
2023-03-20 09:39:19.243 INFO     COMPLETED Step /compile
2023-03-20 09:39:19.243 DEBUG    COMPLETED Step /execute/num_processes-token-transformer
2023-03-20 09:39:19.244 DEBUG    Step /execute/__transfer__/executable_file received inputs ['0']
2023-03-20 09:39:19.244 DEBUG    Step /execute/__schedule__ received inputs ['0', '0']
2023-03-20 09:39:19.244 DEBUG    Step /execute/executable_file-token-transformer received inputs ['0', '0']
2023-03-20 09:39:19.244 DEBUG    Retrieving available locations for job /execute/0 on helm-mpi/openmpi.
2023-03-20 09:39:19.244 DEBUG    Available locations for job /execute/0 on helm-mpi/openmpi are ['openmpi-rel-574588bf6b-jh2kl:openmpi', 'openmpi-rel-574588bf6b-lh5g2:openmpi'].
2023-03-20 09:39:19.245 DEBUG    helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpiJob /execute/0 allocated on locations , helm-mpi/openmpi/openmpi-rel-574588bf6b-lh5g2:openmpi
2023-03-20 09:39:19.245 DEBUG    EXECUTING command mkdir -p /home/zhenhua.zhang/tmp/b1b6c462-4447-43aa-b74d-1282ad4db5e0 /home/zhenhua.zhang/tmp/ed879a2c-3e25-48bd-8bff-4a179f7948b4 /home/zhenhua.zhang/tmp/b77d0947-658a-4379-a905-e429d8dd66e1 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:19.246 DEBUG    EXECUTING command mkdir -p /home/zhenhua.zhang/tmp/b1b6c462-4447-43aa-b74d-1282ad4db5e0 /home/zhenhua.zhang/tmp/ed879a2c-3e25-48bd-8bff-4a179f7948b4 /home/zhenhua.zhang/tmp/b77d0947-658a-4379-a905-e429d8dd66e1 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-lh5g2:openmpi
2023-03-20 09:39:19.246 DEBUG    COMPLETED Step /execute/executable_file-token-transformer
2023-03-20 09:39:19.399 DEBUG    EXECUTING command mkdir -p /home/zhenhua.zhang/tmp/b1b6c462-4447-43aa-b74d-1282ad4db5e0/3e0932b0-1bd3-4cb9-a689-0f3bd3eb981e 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:19.401 DEBUG    EXECUTING command mkdir -p /home/zhenhua.zhang/tmp/b1b6c462-4447-43aa-b74d-1282ad4db5e0/3e0932b0-1bd3-4cb9-a689-0f3bd3eb981e 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-lh5g2:openmpi
2023-03-20 09:39:19.401 DEBUG    Step /execute/__schedule__ received inputs ['0', '0']
2023-03-20 09:39:19.402 DEBUG    COMPLETED Step /execute/__schedule__
2023-03-20 09:39:19.403 DEBUG    Step /execute/__transfer__/num_processes received inputs ['0']
2023-03-20 09:39:19.403 DEBUG    COMPLETED Step /execute/__transfer__/num_processes
2023-03-20 09:39:19.480 DEBUG    EXECUTING command ln -snf /home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs /home/zhenhua.zhang/tmp/b1b6c462-4447-43aa-b74d-1282ad4db5e0/3e0932b0-1bd3-4cb9-a689-0f3bd3eb981e/cs 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi
2023-03-20 09:39:19.521 INFO     COPYING /home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs on location helm-mpi/openmpi/openmpi-rel-574588bf6b-jh2kl:openmpi to /home/zhenhua.zhang/tmp/b1b6c462-4447-43aa-b74d-1282ad4db5e0/3e0932b0-1bd3-4cb9-a689-0f3bd3eb981e/cs on location helm-mpi/openmpi/openmpi-rel-574588bf6b-lh5g2:openmpi
2023-03-20 09:39:19.567 DEBUG    EXECUTING command test -f "/home/zhenhua.zhang/tmp/b1b6c462-4447-43aa-b74d-1282ad4db5e0/3e0932b0-1bd3-4cb9-a689-0f3bd3eb981e/cs" && sha1sum "/home/zhenhua.zhang/tmp/b1b6c462-4447-43aa-b74d-1282ad4db5e0/3e0932b0-1bd3-4cb9-a689-0f3bd3eb981e/cs" | awk '{print $1}' 2>&1 on helm-mpi/openmpi/openmpi-rel-574588bf6b-lh5g2:openmpi
2023-03-20 09:39:19.649 ERROR    Error transferring file /home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs in location openmpi-rel-574588bf6b-jh2kl:openmpi to /home/zhenhua.zhang/tmp/b1b6c462-4447-43aa-b74d-1282ad4db5e0/3e0932b0-1bd3-4cb9-a689-0f3bd3eb981e/cs in location helm-mpi/openmpi/openmpi-rel-574588bf6b-lh5g2:openmpi
Traceback (most recent call last):
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/workflow/step.py", line 1458, in run
    token=await self.transfer(job, token),
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/step.py", line 551, in transfer
    return token.update(await self._transfer_value(job, token.value))
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/step.py", line 360, in _transfer_value
    return await self._update_file_token(job, token_value)
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/step.py", line 458, in _update_file_token
    raise WorkflowExecutionException(
streamflow.core.exception.WorkflowExecutionException: Error transferring file /home/zhenhua.zhang/tmp/ab23f1f8-8a07-4d87-a936-2c3f210770d7/cs in location openmpi-rel-574588bf6b-jh2kl:openmpi to /home/zhenhua.zhang/tmp/b1b6c462-4447-43aa-b74d-1282ad4db5e0/3e0932b0-1bd3-4cb9-a689-0f3bd3eb981e/cs in location helm-mpi/openmpi/openmpi-rel-574588bf6b-lh5g2:openmpi
2023-03-20 09:39:19.650 DEBUG    Step /execute received inputs ['0', '0']
2023-03-20 09:39:19.650 DEBUG    Step result-collector-transformer received inputs ['0']
2023-03-20 09:39:19.650 DEBUG    Step result-collector received inputs ['0']
2023-03-20 09:39:19.650 DEBUG    Step result-collector/__schedule__ received inputs ['0']
2023-03-20 09:39:19.651 DEBUG    FAILED Step /execute/__transfer__/executable_file
2023-03-20 09:39:19.651 INFO     SKIPPED Step /execute
2023-03-20 09:39:19.651 DEBUG    SKIPPED Step result-collector-transformer
2023-03-20 09:39:19.651 DEBUG    SKIPPED Step result-collector
2023-03-20 09:39:19.651 DEBUG    SKIPPED Step result-collector/__schedule__
2023-03-20 09:39:19.652 INFO     UNDEPLOYING helm-mpi
2023-03-20 09:39:19.652 DEBUG    EXECUTING helm --kubeconfig  "/home/zhenhua.zhang/.kube/config-streamflow" --namespace  "streamflow" --registry-config  "/home/zhenhua.zhang/.config/helm/registry.json" --repository-cache  "/home/zhenhua.zhang/.cache/helm/repository" --repository-config  "/home/zhenhua.zhang/.config/helm/repositories.yaml" uninstall --timeout  "1000m" openmpi-rel
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/zhenhua.zhang/.kube/config-streamflow
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/zhenhua.zhang/.kube/config-streamflow
release "openmpi-rel" uninstalled
2023-03-20 09:39:19.928 INFO     COMPLETED Undeployment of helm-mpi
2023-03-20 09:39:19.965 ERROR    FAILED Workflow execution
Traceback (most recent call last):
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/main.py", line 258, in main
    asyncio.run(_async_run(args))
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/main.py", line 166, in _async_run
    await asyncio.gather(*workflow_tasks)
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/cwl/main.py", line 74, in main
    output_tokens = await executor.run()
  File "/home/zhenhua.zhang/miniconda3/envs/cwl/lib/python3.9/site-packages/streamflow/workflow/executor.py", line 135, in run
    raise WorkflowExecutionException("FAILED Workflow execution")
streamflow.core.exception.WorkflowExecutionException: FAILED Workflow execution

I think the file copy is what fails. Helm does not support copying files into a k8s pod, so maybe StreamFlow needs to use the kubectl cp command to copy files from a location to a k8s pod, or from a k8s pod to a location.

Another question: does a PVC need to be configured to solve the file communication problems between locations on k8s?

@zhangzhenhuajack
Author

zhangzhenhuajack commented Mar 20, 2023

The pod is running, but it cannot interact with the location's files. @GlassOfWhiskey

@zhangzhenhuajack
Author

zhangzhenhuajack commented Mar 20, 2023

If StreamFlow could directly support kubectl commands, that might be more powerful than Helm for my tasks. My thinking is: Helm focuses on managing YAML, and kubectl focuses on managing k8s resources (like pods and PVCs).

@GlassOfWhiskey
Member

Hi @zhangzhenhuajack,
would you please try again with streamflow==0.2.0.dev4, the newly released version?
I found and solved a bug that was causing file transfer issues on Kubernetes sometimes, due to some kind of race condition.
I'd like to know if the fix I implemented also solves the problem for you.

@GlassOfWhiskey
Member

Oh BTW StreamFlow 0.2.0.dev4 also has a new Kubernetes connector, which doesn't require Helm charts but only a list of Kubernetes .yaml manifests.
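
For illustration only, a deployment using the manifest-based connector might look roughly like the sketch below. This is not taken from the thread: the `type: kubernetes` value and the `files` key are assumptions about the connector's schema that should be checked against the StreamFlow documentation for the installed version, the deployment name `k8s-mpi` is made up, and the manifest path `environment/k8s/openmpi.yaml` is a hypothetical file that would have to be written by hand. The `kubeconfig` path and `streamflow` namespace are copied from the configuration and logs above.

```yaml
deployments:
  k8s-mpi:                       # hypothetical deployment name
    type: kubernetes             # assumed connector type name
    config:
      files:                     # assumed key: list of plain Kubernetes manifests
        - environment/k8s/openmpi.yaml   # hypothetical manifest path
      kubeconfig: /home/zhenhua.zhang/.kube/config-streamflow
      namespace: streamflow      # namespace seen in the helm command above
```

The bindings for `/compile` and `/execute` would then point at `k8s-mpi` instead of `helm-mpi`.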
