-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve fine-tuning checkpoint support #259
Conversation
src/together/cli/api/finetune.py
Outdated
click.echo(f"Job {fine_tune_id} contains the following checkpoints:") | ||
table = tabulate(display_list, headers="keys", tablefmt="grid") | ||
click.echo(table) | ||
click.echo("\nTo download a checkpoint, use cmd: together fine-tuning download") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
click.echo("\nTo download a checkpoint, use cmd: together fine-tuning download") | |
click.echo("\nTo download a checkpoint, use `together fine-tuning download`") |
src/together/cli/api/finetune.py
Outdated
display_list.append( | ||
{ | ||
"Type": checkpoint.type, | ||
"Timestamp": checkpoint.timestamp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to format the timestamps to strings only in this function. As you can see in your own example, truncating the seconds etc. can lead to a result where the order of checkpoints becomes incorrect: therefore, it's better to sort by the integer timestamp value. Furthermore, users of the Python CLI might want to apply other datetime formats to display the values in their code, so our current behavior is too destructive
src/together/resources/finetune.py
Outdated
get_event_step, | ||
) | ||
|
||
_FT_JOB_REGEX = r"^ft-[\dabcdef-]+:\d+$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should be the behavior if the user passes a FT job ID without a colon? I believe this expression won't match, perhaps it's best to rename it to clarify that it only handles job IDs with step numbers?
src/together/resources/finetune.py
Outdated
if event_type == FinetuneEventType.CHECKPOINT_SAVE: | ||
formatted_time = format_event_timestamp(event) | ||
step = get_event_step(event) | ||
checkpoint_name = f"{id}:{step}" if step else id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
step=0 will put the code in a surprise :)
checkpoint_name = f"{id}:{step}" if step else id | |
checkpoint_name = f"{id}:{step}" if step is not None else id |
src/together/resources/finetune.py
Outdated
) | ||
|
||
# Sort by timestamp (newest first) | ||
checkpoints.sort(key=lambda x: x.timestamp, reverse=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For correct sorting, timestamps should not be strings at this point
src/together/resources/finetune.py
Outdated
|
||
checkpoints.append( | ||
FinetuneCheckpoint( | ||
type="Intermediate", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about listing the step here as well, like "Intermediate (step 20)"
? It's not completely obvious how to interpret the table now, but if we add a bit of extra information, it will be clearer to the user what the names stand for
src/together/resources/finetune.py
Outdated
@@ -207,6 +286,9 @@ def create( | |||
For datasets with the "messages" field (conversational format) or "prompt" and "completion" fields | |||
(Instruction format), inputs will be masked. | |||
Defaults to "auto". | |||
from_checkpoint (str, optional): The checkpoint to be used in the fine-tuning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from_checkpoint (str, optional): The checkpoint to be used in the fine-tuning. | |
from_checkpoint (str, optional): The checkpoint identifier to continue training from a previous fine-tuning job. |
src/together/resources/finetune.py
Outdated
@@ -548,6 +654,9 @@ async def create( | |||
For datasets with the "messages" field (conversational format) or "prompt" and "completion" fields | |||
(Instruction format), inputs will be masked. | |||
Defaults to "auto". | |||
from_checkpoint (str, optional): The checkpoint to be used in the fine-tuning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from_checkpoint (str, optional): The checkpoint to be used in the fine-tuning. | |
from_checkpoint (str, optional): The checkpoint identifier to continue training from a previous fine-tuning job. |
6c41e61
to
05d85d3
Compare
Issue #ENG-18828 , #ENG-18829
Describe your changes
Improve fine-tuning checkpoint support with two features: