LLaVA model in transformers #25060
Comments
Hi @RajeshRadha, thank you for the feature request. As @ArthurZucker mentioned to me, the repo has reached 4K stars and 300 forks, so it seems this is quite popular. Will leave it to our core maintainers @amyeroberts and @sgugger to see if this qualifies the model to be in transformers.
Given the popularity and performance of the model, I think it'd be a good addition to transformers. @RajeshRadha, if you'd like to add the model, feel free to open a PR and tag @ArthurZucker and myself for review.
Any update on this model? #23849 is closed and inactive.
cc @rafaelpadilla and @amyeroberts if one of you has the bandwidth
I won't have time unfortunately before I'm off :( If @rafaelpadilla or anyone in the community would like to add this model, it would be a great addition!
PR will be merged in the coming week 😉
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
#27662 closes this
This is a great integration. As a further step, it would be great to have an API for multi-modal models. I think it's unlikely TGI (see here) or vLLM would integrate multi-modal support, as it's too different. There is a (closed) PR on the Llava project that allows for a simple single-call API; possibly building on that is a good way to go. A key feature I see as valuable is continuous batching, since this is what really allows devs to spin up a multi-modal endpoint for production. Questions
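For readers unfamiliar with the term: continuous batching means new requests join the decode batch as soon as any slot frees up, rather than waiting for the whole batch to finish. A minimal scheduling sketch (the request names, token counts, and `max_batch` value below are hypothetical, not from any serving framework):

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Simulate continuous batching: each step, every active request emits
    one token, and freed slots are refilled from the waiting queue
    immediately. `requests` is a list of (name, tokens_to_generate)."""
    waiting = deque(requests)
    active = {}      # name -> tokens still to generate
    timeline = []    # which requests decoded together at each step
    while waiting or active:
        # admit waiting requests into any free batch slots
        while waiting and len(active) < max_batch:
            name, n_tokens = waiting.popleft()
            active[name] = n_tokens
        timeline.append(sorted(active))
        # one decode step for every active request
        for name in list(active):
            active[name] -= 1
            if active[name] == 0:
                del active[name]  # slot frees up for the next step
    return timeline

steps = continuous_batching([("a", 3), ("b", 1), ("c", 2)])
print(steps)  # [['a', 'b'], ['a', 'c'], ['a', 'c']]
```

Note how request "c" slots in as soon as the short request "b" finishes, instead of waiting for "a" to complete; with static batching the same workload would take more decode steps.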
Thanks @RonanKMcGovern for your feedback! I think TGI could support multi-modal models, as they did it in the past with idefics if I am not mistaken. cc @OlivierDehaene
Thanks @younesbelkada, that makes sense intuitively. IDEFICS (Flamingo-style models) has a single tokenizer, whether the input is image or text (if I'm not mistaken), so that makes it easier plug-and-play for TGI. I see that as a pretty significant advantage. Without a good inference endpoint, Llava just isn't as useful, because devs can't use it well in production. I need to read more on why Llava 1.6 is stronger than IDEFICS. I guess IDEFICS has the drawback that it had to be entirely trained from scratch. Makes me wonder whether it would have been better to take an IDEFICS approach in making Llava.
Feature request
Support for the Llava model in transformers? https://github.com/haotian-liu/LLaVA Similar to InstructBlip, with a connector module between the image embeddings and the LLM.
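To illustrate the connector idea: a vision encoder yields one feature vector per image patch, a learned projection maps each into the LLM's embedding space, and the projected image tokens are placed alongside the text embeddings. A toy sketch with made-up dimensions (real models use much larger sizes, e.g. 1024-dim vision features projected to a 4096-dim LLM):

```python
# Hypothetical toy dimensions, for illustration only.
d_vision, d_llm = 4, 6

def project(patch_feats, W):
    """Apply the d_vision x d_llm projection matrix W to each patch vector."""
    return [[sum(f[i] * W[i][j] for i in range(d_vision))
             for j in range(d_llm)] for f in patch_feats]

W = [[0.01] * d_llm for _ in range(d_vision)]        # connector weights
image_feats = [[1.0] * d_vision for _ in range(3)]   # 3 patch features
text_embeds = [[0.0] * d_llm for _ in range(2)]      # 2 text-token embeddings

image_embeds = project(image_feats, W)
sequence = image_embeds + text_embeds                # image tokens, then text
print(len(sequence), len(sequence[0]))               # 5 6
```

The point is that after the projection, image and text tokens live in the same embedding space, so the LLM processes them as one sequence; this is the sense in which Llava and InstructBlip share the same overall recipe.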
Motivation
Llava performs really well on MLLM-related tasks, and having it in Hugging Face would make it easier for folks to try out InstructBlip vs Llava models, since they mostly use the same image-encoder embeddings (from EVA, ViT, or CLIP) and foundation models (T5, Vicuna, or Llama-2). Code maintenance and ease of integration should be straightforward.
Your contribution
I can definitely help with a PR or tag along with folks in hugging face to make it happen