Make DistributedSampler stateful #1315
Conversation
AI Store test can be safely ignored for now.
Looks pretty good, but I'd like to simplify the code a bit and move the tests around as well.
@@ -1947,6 +1960,116 @@ def test_sampler_reproducibility(self):
            ls[i].append(next(its[i]))
        self.assertEqual(ls[0], ls[1])

    def test_initialization_StatefulDistributedSampler(self):
Let's move all of these tests out to a new file called test_sampler.py. You can update https://github.com/pytorch/data/blob/main/.github/workflows/stateful_dataloader_ci.yml to call it in an additional step.
Created here: https://github.com/pytorch/data/blob/stateful_distributedsampler/test/stateful_dataloader/test_sampler.py

Added a new line here:

    - name: Run StatefulDataSampler tests with pytest - datasampler
from torchdata.stateful_dataloader.sampler import StatefulDistributedSampler

        dataset = self.dataset
        sampler = StatefulDistributedSampler(dataset, num_replicas=10, rank=0, shuffle=False, seed=42, drop_last=False)
For testing state_dict, let's have most of the tests set up by passing sampler + dataset to StatefulDataLoader so we can test that it works end-to-end.
You might need to use a dummy collate function to easily inspect elements; check the test_state_dict.py file for examples.
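The dummy collate the reviewer mentions can be as small as an identity function. This is a hypothetical sketch (the name identity_collate is illustrative, not from the PR) of the pattern: batches pass through unchanged, so yielded elements stay directly comparable to expected indices.

```python
# Hypothetical identity collate for tests: returns the batch list unchanged,
# so each yielded batch can be compared directly against expected values.
def identity_collate(batch):
    return batch


# Batches stay inspectable as plain lists instead of collated tensors.
batches = [identity_collate([0, 1, 2]), identity_collate([3, 4])]
print(batches)  # [[0, 1, 2], [3, 4]]
```

In a real test this would be passed as collate_fn so that assertions can compare raw dataset elements rather than tensor-collated batches.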
New tests here: data/test/stateful_dataloader/test_sampler.py, line 173 in cdc5d31:

    def test_dataloader_state_dict(self):
        self.next_yielded = None

    def __iter__(self):
Is it possible to fork the DistributedSampler.__iter__ code here and just update it, instead of having a separate Iterator class?
data/torchdata/stateful_dataloader/sampler.py, line 149 in cdc5d31:

    self.indices = list(super().__iter__())
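The resolution above, reusing the parent's __iter__ rather than forking its body, can be sketched in plain Python. The class names here are stand-ins (a real DistributedSampler needs torch.distributed, so a toy base sampler substitutes for it); the resumption mechanism via next_yielded follows the pattern in the diff.

```python
class BaseSampler:
    """Stand-in for DistributedSampler: yields a fixed index order."""

    def __init__(self, indices):
        self._indices = indices

    def __iter__(self):
        return iter(self._indices)


class ResumableSampler(BaseSampler):
    """Reuses the parent's __iter__ instead of forking its body."""

    def __init__(self, indices):
        super().__init__(indices)
        self.next_yielded = None  # set by load_state_dict on resume

    def __iter__(self):
        # Materialize the parent's index order once, then skip
        # however many indices were already yielded before the checkpoint.
        indices = list(super().__iter__())
        start = self.next_yielded or 0
        self.next_yielded = None
        yield from indices[start:]


s = ResumableSampler([4, 0, 3, 1, 2])
s.next_yielded = 2  # pretend two indices were consumed before a checkpoint
print(list(s))  # [3, 1, 2]
```

The advantage the reviewer is after: any future change to the parent's shuffling or sharding logic is picked up automatically, because only one copy of that logic exists.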
        if self.sampler.shuffle:
            # deterministically shuffle based on epoch and seed
            g = torch.Generator()
            g.manual_seed(self.sampler.seed + self.sampler.epoch)
            indices = torch.randperm(len(self.sampler.dataset), generator=g).tolist()  # type: ignore[arg-type]
        else:
            indices = list(range(len(self.sampler.dataset)))  # type: ignore[arg-type]

        if not self.sampler.drop_last:
            # add extra samples to make it evenly divisible
            padding_size = self.sampler.total_size - len(indices)
            if padding_size <= len(indices):
                indices += indices[:padding_size]
            else:
                indices += (indices * math.ceil(padding_size / len(indices)))[:padding_size]
        else:
            # remove tail of data to make it evenly divisible
            indices = indices[: self.sampler.total_size]
        assert len(indices) == self.sampler.total_size

        # subsample
        indices = indices[self.sampler.rank : self.sampler.total_size : self.sampler.num_replicas]
        assert len(indices) == self.sampler.num_samples

        self.parent_iterator = iter(indices)
        self.indices = list(self.parent_iterator)
        self.current_index = 0
Is there a way to call the original code instead of forking it here?
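The padding-and-subsample arithmetic being forked above can be checked with plain Python. This sketch (function name shard_indices is illustrative, not from the PR) reproduces the same steps for a toy index list: pad or truncate to an evenly divisible length, then take every num_replicas-th element starting at this rank's offset.

```python
import math


def shard_indices(indices, rank, num_replicas, drop_last=False):
    """Pad (or truncate) indices so the length divides evenly by
    num_replicas, then return this rank's strided slice."""
    if drop_last:
        # remove tail of data to make it evenly divisible
        total_size = (len(indices) // num_replicas) * num_replicas
        indices = indices[:total_size]
    else:
        # add extra samples (repeating from the front) to make it divisible
        total_size = math.ceil(len(indices) / num_replicas) * num_replicas
        padding_size = total_size - len(indices)
        if padding_size <= len(indices):
            indices = indices + indices[:padding_size]
        else:
            indices = indices + (indices * math.ceil(padding_size / len(indices)))[:padding_size]
    assert len(indices) == total_size
    # subsample: rank r of n replicas takes elements r, r+n, r+2n, ...
    return indices[rank:total_size:num_replicas]


print(shard_indices(list(range(10)), rank=0, num_replicas=4))  # [0, 4, 8]
print(shard_indices(list(range(10)), rank=1, num_replicas=4))  # [1, 5, 9]
```

With 10 indices and 4 replicas, the list is padded to 12 entries, so every rank sees exactly 3 samples; with drop_last=True it is truncated to 8 and every rank sees 2.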
    def state_dict(self) -> Dict[str, Any]:
        return self.sampler.state_dict()

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self.sampler.load_state_dict(state_dict)
I don't think we need this both here and in the main sampler class; can we consolidate to have this in just one place?
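The consolidation being asked for can be sketched in plain Python: the sampler is the single owner of serialized state, and nothing else re-implements state_dict/load_state_dict. The class and key names here are hypothetical stand-ins, not the PR's actual schema.

```python
from typing import Any, Dict


class ResumableSampler:
    """Single owner of resumable state: state_dict/load_state_dict
    live only here, so there is one serialization schema to maintain."""

    def __init__(self):
        self.yielded = 0  # how many indices this epoch has produced
        self.next_yielded = None  # where to resume after a restore

    def state_dict(self) -> Dict[str, Any]:
        return {"yielded": self.yielded}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self.next_yielded = state_dict["yielded"]


# Checkpoint on one sampler, restore into a fresh one.
s = ResumableSampler()
s.yielded = 5
restored = ResumableSampler()
restored.load_state_dict(s.state_dict())
print(restored.next_yielded)  # 5
```

Keeping serialization in one class means the iterator can stay a thin consumer of the sampler's state rather than a second place the schema has to be kept in sync.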
Couple of suggestions, but looks great! Very nice test suite.
When you're done making changes, please run the fbcode CI for media_dataloader
Co-authored-by: Andrew Ho <andrewkh@meta.com>
@ramanishsingh has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
LGTM!
This pull request was exported from Phabricator. Differential Revision: D61772177
Fixes #1269

Changes
- torchdata/stateful_dataloader/sampler.py: added new classes StatefulDistributedSampler and _StatefulDistributedSamplerIterator
- test/stateful_dataloader/test_dataloader.py: new tests for StatefulDistributedSampler