
XLA support #1466

Closed · mfatih7 opened this issue Jan 28, 2023 · 13 comments · Fixed by #1471

Labels: enhancement (New feature or request)

Comments

@mfatih7 commented Jan 28, 2023

Hello

Until now, I have been using torchmetrics in my training scripts running on GPUs.
Now I want to use Google Tensor Processing Units (TPUs) in my work.
For the last few days, I have been observing that torchmetrics is not compatible with the XLA library.
The operations torchmetrics uses need to be lowered for TPU support.

best regards

mfatih7 added the enhancement (New feature or request) label on Jan 28, 2023
@github-actions
Hi! Thanks for your contribution, great first issue!

@justusschock (Member)

Hi @mfatih7,
Thanks for the issue. Would you be interested in working on this, or could you at least give us a hint about what isn't working on TPU?

@mfatih7 (Author) commented Jan 29, 2023

Hi @justusschock

I can give some information about the situation on the TPU.
I have a PyTorch project in which I use different kinds of models for classification.
In the program flow, I import your module with

from torchmetrics import ConfusionMatrix

and create an instance with

confmat = ConfusionMatrix(task="binary", num_classes=2).to(device)

When the device is a GPU, it works without any problem.
When the device is a TPU, confmat causes XLA compilations.

Actually, this is a normal situation.
The PyTorch XLA team welcomes lowering requests for torch functions.
The functions causing the compilations are _unique2 and bincount.
You can check my issue in the PyTorch XLA repository.

Maybe you need to implement your module without these functions.

I would be happy to give more information.
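
A minimal sketch of how the unlowered ops can be surfaced (assuming a TPU runtime with torch_xla installed; the exact counter names may differ between versions):

import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met
from torchmetrics import ConfusionMatrix

device = xm.xla_device()
confmat = ConfusionMatrix(task="binary", num_classes=2).to(device)

preds = torch.randint(0, 2, (64,), device=device)
target = torch.randint(0, 2, (64,), device=device)
confmat(preds, target)

# Ops without an XLA lowering show up as "aten::..." counters in the report,
# here aten::_unique2 and aten::bincount.
print(met.metrics_report())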

@mfatih7 (Author) commented Jan 29, 2023

To prevent XLA compilations temporarily, I am using the simple function below.

import torch


def get_confusion_matrix_for_xla(outputs, labels):
    # Build a 2x2 binary confusion matrix from plain comparisons and sums only,
    # so no unlowered ops (bincount/_unique2) are hit on XLA devices.
    confusion_matrix = torch.zeros(2, 2, dtype=torch.int64, device=outputs.device)

    confusion_matrix[0, 0] = torch.sum(torch.logical_and(labels < 0.5, outputs < 0.5))    # true negatives
    confusion_matrix[0, 1] = torch.sum(torch.logical_and(labels < 0.5, outputs >= 0.5))   # false positives
    confusion_matrix[1, 0] = torch.sum(torch.logical_and(labels >= 0.5, outputs < 0.5))   # false negatives
    confusion_matrix[1, 1] = torch.sum(torch.logical_and(labels >= 0.5, outputs >= 0.5))  # true positives

    return confusion_matrix
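
A hypothetical usage example (model and inputs are assumptions; outputs are treated as probabilities thresholded at 0.5):

outputs = model(inputs).sigmoid().flatten()  # hypothetical model producing logits
labels = labels.float().flatten()            # binary targets as 0.0 / 1.0
confusion_matrix = get_confusion_matrix_for_xla(outputs, labels)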

@SkafteNicki (Member)

Hi @mfatih7,
Is it possible to reproduce this behaviour in a Colab notebook?
While we cannot rewrite our whole framework to support XLA, we could probably implement fallback solutions for specific devices. For example, _bincount is a function that is not supported by the MPS accelerator yet, and therefore we have a fallback solution for that:
https://github.com/Lightning-AI/metrics/blob/5d4ffe01aa09b7108f7e0e4034748bdfd64bf5f9/src/torchmetrics/utilities/data.py#L206-L228

@mfatih7 (Author) commented Jan 30, 2023

Hi @SkafteNicki

I could not get the warnings with code written purely in the Colab notebook, but here are a .py file and a Colab notebook where you can easily see the warnings.

I can give more support if needed.
More debugging options are available.

Do not forget to select TPU in the Colab runtime settings.

@SkafteNicki (Member)

Hi @mfatih7,
Could you try re-running it with the changes from this branch:
https://github.com/Lightning-AI/metrics/tree/xla_test
and additionally change the initialization of the metric to the line below?

confmat = ConfusionMatrix(task="binary", num_classes=2, validate_args=False).to(device)


@mfatih7 (Author) commented Jan 31, 2023

OK, but how can I install this version in my Colab notebook?
I have been using !pip install torchmetrics at the top of my notebooks.

@justusschock (Member)

Change !pip install torchmetrics to !pip install git+https://github.com/Lightning-AI/metrics@xla_test to install from this branch.

@mfatih7 (Author) commented Jan 31, 2023

OK

I don't see any recompilations due to torchmetrics now.
Aside from the parameter change in the instantiation, did you also change some parts of the source code?

Will you merge this into the main torchmetrics branch?

Have you considered making your library available without installation on Colab?

@SkafteNicki (Member)

@mfatih7, let me explain what I did:

  1. By setting validate_args=False, you skip an internal check that the input is in the right format. The check uses torch.unique, which XLA does not support.
  2. Secondly, in the branch you used, I implemented some logic for bincount so that if XLA is detected we fall back to a simple for-loop. This works on XLA, but note that it can be significantly slower if you have a large number of classes (a simplified sketch of the idea follows below).
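
For illustration only, a minimal sketch of that kind of device-conditioned fallback; it is simplified and not the exact code in the branch:

import torch


def _bincount_with_xla_fallback(x: torch.Tensor, minlength: int) -> torch.Tensor:
    # Sketch of the idea: avoid torch.bincount on XLA devices, where it has no lowering.
    if x.device.type == "xla":
        # One comparison + sum per bin: works on XLA, but O(minlength * len(x)),
        # so it can be much slower for a large number of classes.
        return torch.stack([(x == i).sum() for i in range(minlength)])
    return torch.bincount(x, minlength=minlength)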

I think we can include the change, but we are not going to officially support XLA for now.

SkafteNicki mentioned this issue on Jan 31, 2023
@mfatih7 (Author) commented Jan 31, 2023

Thank you

We can close this issue if you want.

I hope to hear about any updates in the future.

@SkafteNicki (Member)

Hi @mfatih7,
You are welcome; it will be closed when PR #1471 is merged.
