Model Showcase: Explore Hugging Face Models
Salesforce/blip-image-captioning-large
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
xtuner/llava-phi-3-mini-gguf
llava-phi-3-mini is a LLaVA model fine-tuned from microsoft/Phi-3-mini-4k-instruct and CLIP-ViT-Large-patch14-336 with ShareGPT4V-PT and InternVL-SFT by XTuner.
nlpconnect/vit-gpt2-image-captioning
The Illustrated Image Captioning using transformers