mindformers.models.LlamaForCausalLM

class mindformers.models.LlamaForCausalLM(config: LlamaConfig = None)

Provides the Llama training loss or logits through the network.

Parameters

config (LlamaConfig, optional) – The configuration of the Llama model. Default: None.

Inputs:
  • input_ids (Tensor) - the indices of input sequence tokens in the vocabulary with data type Int64/Int32, Tensor of shape \((batch, seq\_length)\).

  • labels (Tensor, optional) - the labels of the inputs with data type Int64/Int32, Tensor of shape \((batch, seq\_length)\). Default: None.

  • input_position (Tensor, optional) - the position ids of the inputs in incremental inference mode, an increasing sequence with data type Int64/Int32, Tensor of shape \((batch, seq\_length)\). Default: None.

  • position_ids (Tensor, optional) - the position ids of the inputs, an increasing sequence with data type Int64/Int32, Tensor of shape \((batch, seq\_length)\). Default: None.

  • attention_mask (Tensor, optional) - padding mask of the input sequences, where 0 indicates a padding position, with data type Int64/Int32, Tensor of shape \((batch, seq\_length)\). Default: None.

  • input_embeds (Tensor, optional) - the embeddings of the inputs with data type Float32/Float16, Tensor of shape \((batch, seq\_length, hidden\_size)\). Default: None.

  • init_reset (Tensor, optional) - a Bool tensor of shape \((1)\), used to clear the past key and past value parameters in incremental prediction. Only valid when use_past is True. Default: Tensor([True]).

  • batch_valid_length (Tensor, optional) - an Int32 tensor of shape [batch_size], recording the number of tokens already computed for each sequence. Used for incremental prediction when use_past is True. Default: None.

  • batch_index (Tensor, optional) - deprecated argument, which will be removed in a future version. Default: None.

  • zactivate_len (Tensor, optional) - deprecated argument, which will be removed in a future version. Default: None.

  • block_tables (Tensor, optional) - an Int64 tensor that stores the block mapping table for each sequence. Default: None.

  • slot_mapping (Tensor, optional) - an Int32 tensor that stores the physical slot indices of the token cache. Default: None.

  • prefix_keys_values (Tensor, optional) - deprecated argument, which will be removed in a future version. Default: None.

  • llm_boost_inputs (Tensor, optional) - deprecated argument, which will be removed in a future version. Default: None.

  • q_seq_lens (Tensor, optional) - in parallel decoding, the query may be flattened; the Paged Attention operator needs q_seq_lens to obtain the length information. Default: None.

Outputs:

Tensor. In training mode, the output Tensor contains the loss; in prediction mode, it contains the logits; in evaluation mode, it contains the logits, tokens, and input masks.
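
As a quick illustration of the mode-dependent outputs, the following sketch runs one forward pass in training mode and one in evaluation mode. It uses deliberately small, hypothetical configuration values (the seq_length, num_layers, hidden_size, num_heads, and vocab_size below are placeholders, not recommended settings) and assumes the common convention that training pipelines feed seq_length + 1 tokens so the model can shift tokens and labels internally; check your dataset pipeline for the exact convention.

>>> import numpy as np
>>> import mindspore as ms
>>> from mindformers.models.llama import LlamaConfig, LlamaForCausalLM
>>> ms.set_context(mode=0)  # 0 selects GRAPH_MODE
>>> # Placeholder dimensions, chosen only to keep the sketch small.
>>> config = LlamaConfig(batch_size=2, seq_length=16, num_layers=2,
...                      hidden_size=64, num_heads=4, vocab_size=1000)
>>> network = LlamaForCausalLM(config=config)
>>> # Training mode: the output contains the loss. Feeding seq_length + 1
>>> # tokens assumes the model shifts tokens/labels internally.
>>> network.set_train(True)
>>> input_ids = ms.Tensor(np.random.randint(0, 1000, (2, 17)), ms.int32)
>>> loss = network(input_ids)
>>> # Evaluation mode: the output contains the logits (and, depending on
>>> # the run mode, also the tokens and input masks).
>>> network.set_train(False)
>>> outputs = network(ms.Tensor(np.random.randint(0, 1000, (2, 16)), ms.int32))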

Examples

>>> from mindformers.models.llama import LlamaConfig, LlamaForCausalLM
>>> import mindspore as ms
>>> ms.set_context(mode=0)  # 0 selects GRAPH_MODE
>>> # Build a randomly initialized network from a config.
>>> config = LlamaConfig(batch_size=2)
>>> network = LlamaForCausalLM(config=config)
>>> type(network)
<class 'mindformers.models.llama.llama.LlamaForCausalLM'>
>>> # Or load a pretrained network by model name.
>>> from mindformers import LlamaForCausalLM
>>> network = LlamaForCausalLM.from_pretrained('llama2_7b')
>>> type(network)
<class 'mindformers.models.llama.llama.LlamaForCausalLM'>