mindformers.pipeline.MultiModalToTextPipeline
- class mindformers.pipeline.MultiModalToTextPipeline(model: Union[PreTrainedModel, Model], processor: Optional[BaseXModalToTextProcessor] = None, **kwargs)
Pipeline for multi-modal to text generation.
- Parameters
model (Union[PreTrainedModel, Model]) – The model used to perform the task. The input should be a model instance inherited from PreTrainedModel.
processor (BaseXModalToTextProcessor, optional) – The processor of the model. It can be None if the model does not need a processor. Default: None.
- Returns
A MultiModalToTextPipeline instance.
- Raises
TypeError – If the types of the input model and processor are not correct.
ValueError – If the input model is not in the support list.
Examples
>>> import os
>>> import mindspore as ms
>>> from mindformers import build_context
>>> from mindformers import AutoModel, AutoTokenizer, pipeline, AutoProcessor, MindFormerConfig, AutoConfig
>>> os.environ['USE_ROPE_SELF_DEFINE'] = 'True'
>>> inputs = [[{"image": "/path/to/example.jpg"}, {"text": "Please describe this image."}]]
>>> # Note:
>>> #     "image": is an image path
>>> model_path = "/path/to/cogvlm2_mode_path"
>>> # Note:
>>> #     mode_path: a new folder (containing configs/cogvlm2/predict_cogvlm2_image_llama3_chat_19b.yaml)
>>> config_path = "/path/to/cogvlm2_mode_path/predict_cogvlm2_image_llama3_chat_19b.yaml"
>>> # Note:
>>> #     config_path: the predict_cogvlm2_image_llama3_chat_19b.yaml path in mode_path
>>> #     Please change the value of 'vocab_file' in predict_cogvlm2_image_llama3_chat_19b.yaml
>>> #     to the value of 'tokenizer.model'.
>>> config = MindFormerConfig(config_path)
>>> build_context(config)
>>> model_config = AutoConfig.from_pretrained(config_path)
>>> tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, use_fast=True)
>>> model = AutoModel.from_config(model_config)
>>> processor = AutoProcessor.from_pretrained(config_path, trust_remote_code=True, use_fast=True)
>>> param_dict = ms.load_checkpoint("/path/to/cogvlm2image.ckpt")
>>> _, not_load = ms.load_param_into_net(model, param_dict)
>>> text_generation_pipeline = pipeline(task="multi_modal_to_text_generation",
...                                     model=model, processor=processor)
>>> outputs = text_generation_pipeline(inputs, max_length=model_config.max_decode_length,
...                                    do_sample=False, top_k=model_config.top_k, top_p=model_config.top_p)
>>> for output in outputs:
...     print(output)
Question: Please describe this image. Answer:This image is an apple.
>>> # Note:
>>> #     The final result shall be subject to the actual input image.
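The example above passes the pipeline a nested list: the outer list holds queries, and each query is a list of single-key dicts, here an "image" entry followed by a "text" entry. The following is a minimal sketch of building that structure in plain Python. The helpers build_query and build_batch are hypothetical names introduced for illustration and are not part of mindformers; the sketch only mirrors the layout shown in the example and does not call the pipeline. Whether the pipeline accepts more than one query per call is not stated here, so the sketch builds a single query.

```python
from typing import Dict, List, Sequence, Tuple

Query = List[Dict[str, str]]


def build_query(image_path: str, prompt: str) -> Query:
    """Hypothetical helper: one query as an image entry followed by a text entry."""
    return [{"image": image_path}, {"text": prompt}]


def build_batch(pairs: Sequence[Tuple[str, str]]) -> List[Query]:
    """Hypothetical helper: wrap (image_path, prompt) pairs into the outer list."""
    return [build_query(image_path, prompt) for image_path, prompt in pairs]


# Same layout as the `inputs` value used in the example above.
inputs = build_batch([("/path/to/example.jpg", "Please describe this image.")])
print(inputs)
```

The resulting inputs value can be passed to the pipeline call in place of the hand-written literal.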