mindformers.pipeline.MultiModalToTextPipeline

class mindformers.pipeline.MultiModalToTextPipeline(model: Union[PreTrainedModel, Model], processor: Optional[BaseXModalToTextProcessor] = None, **kwargs)

Inference pipeline for multimodal-to-text generation.

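Parameters:
  • model (Union[PreTrainedModel, Model]) - The model that performs the task. It must be a model instance that inherits from the PreTrainedModel class.

  • processor (BaseXModalToTextProcessor, optional) - The image processor of the model. Default: None.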

Returns:

A MultiModalToTextPipeline instance.

Raises:
  • TypeError - If the input model or the processor is of the wrong type.

  • ValueError - If the input model is not in the list of supported models.

Examples:

>>> import mindspore as ms
>>> from mindformers import build_context
>>> from mindformers import AutoModel, AutoTokenizer, pipeline, AutoProcessor, MindFormerConfig, AutoConfig
>>> inputs = [[{"image": "/path/to/example.jpg"}, {"text": "Please describe this image."}]]
>>> # Note:
>>> #     "image": path to an image file
>>> model_path = "/path/to/cogvlm2_model_path"
>>> # Note:
>>> #     model_path: a new folder containing configs/cogvlm2/predict_cogvlm2_image_llama3_chat_19b.yaml
>>> config_path = "/path/to/cogvlm2_model_path/predict_cogvlm2_image_llama3_chat_19b.yaml"
>>> # Note:
>>> #     config_path: path to predict_cogvlm2_image_llama3_chat_19b.yaml inside model_path
>>> #     Set 'vocab_file' in predict_cogvlm2_image_llama3_chat_19b.yaml
>>> #     to the path of the 'tokenizer.model' file.
>>> config = MindFormerConfig(config_path)
>>> build_context(config)
>>> model_config = AutoConfig.from_pretrained(config_path)
>>> tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, use_fast=True)
>>> model = AutoModel.from_config(model_config)
>>> processor = AutoProcessor.from_pretrained(config_path, trust_remote_code=True, use_fast=True)
>>> param_dict = ms.load_checkpoint("/path/to/cogvlm2image.ckpt")
>>> _, not_load = ms.load_param_into_net(model, param_dict)
>>> text_generation_pipeline = pipeline(task="multi_modal_to_text_generation",
...                                     model=model, processor=processor)
>>> outputs = text_generation_pipeline(inputs, max_length=model_config.max_decode_length,
...                                    do_sample=False, top_k=model_config.top_k, top_p=model_config.top_p)
>>> for output in outputs:
...     print(output)
Question: Please describe this image. Answer:This image is an apple.
>>> # Note:
>>> #     The actual output depends on the input image.
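The `inputs` value in the example is a nested list: one inner list per query, holding an `{"image": ...}` dict followed by a `{"text": ...}` dict. The sketch below is a hypothetical helper, `build_multimodal_inputs`, that is not part of MindFormers. It builds that structure from `(image_path, question)` pairs and rejects empty values. It assumes the key order and layout shown in the example. Whether one pipeline call accepts several queries at once depends on the model and MindFormers version, and the example above only demonstrates a single query.

```python
from typing import Dict, Iterable, List, Tuple


def build_multimodal_inputs(pairs: Iterable[Tuple[str, str]]) -> List[List[Dict[str, str]]]:
    """Hypothetical helper (not a MindFormers API).

    Converts (image_path, question) pairs into the nested-list layout used as
    `inputs` in the MultiModalToTextPipeline example: one inner list per query,
    with an {"image": ...} dict followed by a {"text": ...} dict.
    """
    inputs: List[List[Dict[str, str]]] = []
    for image_path, question in pairs:
        # Reject empty or non-string values early instead of failing inside the pipeline.
        if not isinstance(image_path, str) or not image_path:
            raise ValueError("image_path must be a non-empty string")
        if not isinstance(question, str) or not question.strip():
            raise ValueError("question must be a non-empty string")
        # Image entry first, then the text prompt, mirroring the documented example.
        inputs.append([{"image": image_path}, {"text": question}])
    return inputs


if __name__ == "__main__":
    print(build_multimodal_inputs([("/path/to/example.jpg", "Please describe this image.")]))
```

For a single pair, the helper produces exactly the `inputs` literal from the example, which can then be passed as the first argument to `text_generation_pipeline(...)`.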