Add support for the Training Method for finetuning, and for Direct-Preference Optimization (DPO) #262


Merged: 14 commits into main on Mar 11, 2025

Conversation

@VProv (Contributor) commented on Mar 3, 2025

Describe your changes

This PR adds support for the Training Method for finetuning, and for Direct-Preference Optimization (DPO).

Verified: one commit was signed with the committer's verified signature (bhrutledge, Brian Rutledge); two commits were created on GitHub.com and signed with GitHub's verified signature (the key has since expired).
@VProv VProv requested a review from punkerpunker March 4, 2025 13:56
@VProv VProv requested review from azahed98 and mryab March 4, 2025 14:29
@mryab mryab removed the request for review from punkerpunker March 4, 2025 18:50
Comment on lines 154 to 156:

```python
filtered_messages.append(
    {column: message[column] for column in REQUIRED_COLUMNS_MESSAGE}
)
```

Member:
Hm, I'm not sure if filtering files when they are uploaded is the right solution: this will require users to reupload their data whenever we support a new field for messages (for example, function calling)

Contributor Author:
Agree, removed the filtering part from the function
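For context, the dropped filtering step can be sketched like this (`REQUIRED_COLUMNS_MESSAGE` is an assumed allow-list and `filter_messages` a hypothetical helper, not the PR's actual function):

```python
# Assumed allow-list; the real constant lives in the package.
REQUIRED_COLUMNS_MESSAGE = ["role", "content"]

def filter_messages(messages):
    """Keep only allow-listed columns of each message (the removed behavior)."""
    return [
        {column: message[column] for column in REQUIRED_COLUMNS_MESSAGE}
        for message in messages
    ]

# A message carrying a newer field (e.g. tool calls) silently loses it after
# filtering, which is why filtering at upload time forces re-uploads later.
messages = [{"role": "user", "content": "hi", "tool_calls": []}]
```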

```python
)

if not isinstance(example["preferred_output"], list):
    raise ValueError(
```

Member:
All of these should be InvalidFileFormatError

Contributor Author:
Fixed
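A minimal sketch of the pattern, assuming a stubbed `InvalidFileFormatError` (the real class lives in the package) and a hypothetical `check_preferred_output` helper:

```python
# Stub for illustration; the real InvalidFileFormatError is defined in the package.
class InvalidFileFormatError(ValueError):
    """Raised when a training file does not match the expected schema."""

def check_preferred_output(example, idx):
    # Data problems surface as a file-format error, so callers can
    # distinguish bad user data from programming errors.
    if not isinstance(example.get("preferred_output"), list):
        raise InvalidFileFormatError(
            f"Example {idx}: 'preferred_output' must be a list of messages"
        )
```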

@mryab mryab changed the title Add support for the Training Method for finetuning, and for Direct-Preference Optimization (DPO). Add support for the Training Method for finetuning, and for Direct-Preference Optimization (DPO) Mar 5, 2025
VProv and others added 3 commits March 5, 2025 17:53
Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
@VProv VProv requested a review from mryab March 5, 2025 18:49
VProv added 3 commits March 5, 2025 11:04
@mryab mryab requested review from artek0chumak and removed request for azahed98 March 10, 2025 11:21
```python
Training method type for SFT training
"""

method: str = "sft"
```

Contributor:

Suggested change:

```diff
-method: str = "sft"
+method: Literal["sft"] = "sft"
```

Contributor Author:
Added
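The `Literal` annotation pins the tag at type-check time; a sketch under assumed dataclass definitions (the real models may be pydantic and carry more fields):

```python
from dataclasses import dataclass
from typing import Literal

# Assumed shapes for illustration; the real classes may define extra fields.
@dataclass
class TrainingMethodSFT:
    # Literal["sft"] lets static checkers (and pydantic-style validators)
    # reject any other value, and makes `method` usable as a discriminator.
    method: Literal["sft"] = "sft"

@dataclass
class TrainingMethodDPO:
    method: Literal["dpo"] = "dpo"
```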


```python
has_weights = False
# Check for weights in messages
if _has_weights(messages):
```

Contributor:
Why did you make this a separate function? Why not inline it?

Contributor:

It can even be:

```python
has_weights = any("weight" in message for message in messages)
```

Contributor Author:
Fixed
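The reviewer's one-liner behaves like this on sample data (the optional per-message `"weight"` key is assumed from the surrounding discussion):

```python
def has_any_weights(messages):
    # True as soon as any message carries a per-message training weight.
    return any("weight" in message for message in messages)

weighted = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "ok", "weight": 0},
]
unweighted = [{"role": "user", "content": "hi"}]
```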

```python
    )
    previous_role = message["role"]

return messages, has_weights
```

Contributor:
Why do you need to return messages? The row doesn't seem to be modified.

Contributor Author:
fixed

```python
return messages, has_weights


def validate_preference_openai(example: Dict[str, Any], idx: int = 0) -> Dict[str, Any]:
```

Contributor:
Why do you need to return an example?

Contributor Author:
fixed

```diff
@@ -105,6 +109,12 @@ def createFinetuneRequest(
     lr_scheduler_args=FinetuneLinearLRSchedulerArgs(min_lr_ratio=min_lr_ratio),
 )

 training_method_cls: Union[TrainingMethodSFT, TrainingMethodDPO] = (
```

Member:
Nit: since you're using the | notation to specify union types above, I would use it here as well and remove the redundant import

Contributor Author:
Ok

Comment on lines 130 to 133:

```python
has_weights = False
# Check for weights in messages
if _has_weights(messages):
    has_weights = True
```

Member:
Isn't it just the following? :)

Suggested change:

```diff
-has_weights = False
-# Check for weights in messages
-if _has_weights(messages):
-    has_weights = True
+has_weights = _has_weights(messages)
```



```python
def validate_messages(
    messages: List[Dict[str, str | bool]], idx: int = 0
```

Member:
It's hard to imagine a case where we would want to use the default line number, maybe it's best to remove the default value?

Contributor Author:
Fixed

Comment on lines 222 to 224:

```python
example["input"]["messages"], _ = validate_messages(
    example["input"]["messages"], idx
)
```

Member:
We don't modify anything in messages; I would simply make validate_messages return nothing and raise an exception in case of an error.

Contributor Author:
Fixed
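A sketch of that raise-don't-return validator shape (the specific checks shown are illustrative, not the PR's full rule set):

```python
def validate_messages(messages, idx):
    # Returns None; a malformed row surfaces as an exception instead of a
    # return value the caller must remember to consume.
    previous_role = None
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError(f"Example {idx}: message missing 'role' or 'content'")
        if message["role"] == previous_role:
            raise ValueError(f"Example {idx}: consecutive messages share a role")
        previous_role = message["role"]
```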


```python
def test_check_jsonl_invalid_preference_openai_structural_issues(tmp_path: Path):
    # Test various structural issues in OpenAI preference format
    test_cases = [
```

Member:
Let's use pytest.mark.parametrize for iterating over multiple test cases

Contributor Author:
Done

```diff
@@ -80,45 +128,149 @@ def test_check_jsonl_valid_conversational_single_turn(tmp_path: Path):
 def test_check_jsonl_valid_conversational_multiple_turns(tmp_path: Path):
     # Create a valid JSONL file with conversational format and multiple user-assistant turn pairs
     file = tmp_path / "valid_conversational_multiple_turns.jsonl"
     content = [
```

Member:
I'd prefer to keep the current file for this test and write a new one for , because

  1. Unit tests should test orthogonal capabilities, otherwise this gets misleading when an error is introduced (improper parsing of preference data should not affect tests for regular conversation datasets)
  2. Right now, it actually looks like this test is now identical to test_check_jsonl_valid_preference_openai, which is unlikely to be what you want :)

Contributor Author:
Created a separate file

@VProv VProv requested review from mryab and artek0chumak March 11, 2025 14:15
Comment on lines 108 to 111:

```python
AVAILABLE_TRAINING_METHODS = {
    TrainingMethodSFT().method,
    TrainingMethodDPO().method,
}
```

Member:
Since this is a constant, can you move it to the top of the file (outside of the function and the class definition)?

Contributor Author:
Fixed

```python
lrScheduler = FinetuneLRScheduler(
    lr_scheduler_type="linear",
    lr_scheduler_args=FinetuneLinearLRSchedulerArgs(min_lr_ratio=min_lr_ratio),
)

training_method_cls: TrainingMethodSFT | TrainingMethodDPO = TrainingMethodSFT()
```

Member:
Nit: maybe annotate the type as training_method_cls: TrainingMethod? It's a bit clearer and more extensible

Contributor Author:
There were some issues with pre-commit checks when I tried to do this, as I remember

Member:
Weird, do you remember what was the error by any chance? Not blocking, but I'd love to know how to fix it in the future

```python
assert report["has_min_samples"]


# Define test cases for missing fields
```

Member:
The comment seems redundant

Contributor Author:
removed

```python
from together.constants import MIN_SAMPLES
from together.utils.files import check_file

# Test data for preference OpenAI format
```

Member:
This one's also not very informative given the name of the variable

```python
assert not report["is_check_passed"], f"Test should fail when {description}"


# Define test cases for structural issues
```

Member:
Here as well

Contributor Author:
fixed

VProv added 2 commits March 11, 2025 08:46
@VProv VProv requested a review from mryab March 11, 2025 15:58
```python
assert not report["is_check_passed"], f"Test should fail when {description}"


STRUCTURAL_ISSUE_TEST_CASES = [
```

Member:
Nit: the constant can be made private

```python
assert report["has_min_samples"]


MISSING_FIELDS_TEST_CASES = [
```

Member:
Nit: the constant can be made private

@mryab mryab merged commit a4fd112 into main Mar 11, 2025
10 of 11 checks passed
@mryab mryab deleted the Vprov/dpo_python branch March 11, 2025 18:02
3 participants