Hugging Face: BLIP-2 zero-shot image-to-text in Hugging Face Transformers enables captioning, VQA, and multimodal prompting | SignalBreak