mindformers.models.LlamaForCausalLM

class mindformers.models.LlamaForCausalLM(config: LlamaConfig = None)[source]

Provides Llama training loss or logits through the network.

Parameters

config (LlamaConfig, optional) – The config of the Llama model. Default: None.

Inputs:
  • input_ids (Tensor) - the indices of input sequence tokens in the vocabulary with data type Int64/Int32, Tensor of shape \((batch, seq\_length)\).

  • labels (Tensor, optional) - the labels of inputs with data type Int64/Int32, Tensor of shape \((batch, seq\_length)\) . Default: None.

  • input_position (Tensor, optional) - the position ids of the inputs in incremental inference mode, which is an increasing sequence with data type Int64/Int32, Tensor of shape \((batch, seq\_length)\). Default: None.

  • position_ids (Tensor, optional) - the position ids of the inputs, which is an increasing sequence with data type Int64/Int32, Tensor of shape \((batch, seq\_length)\). Default: None.

  • attention_mask (Tensor, optional) - the padding mask of the input sentences, where 0 indicates a padding position, with data type Int64/Int32, Tensor of shape \((batch, seq\_length)\). Default: None.

  • input_embeds (Tensor, optional) - the embedding of inputs with data type Float32/Float16, Tensor of shape \((batch, seq\_length, hidden\_size)\). Default: None.

  • init_reset (Tensor, optional) - a Bool Tensor of shape \((1)\), used to clear the past key and past value parameters in incremental prediction. Only valid when use_past is True. Default: Tensor([True]).

  • batch_valid_length (Tensor, optional) - an Int32 Tensor of shape \((batch\_size)\), recording, for each sequence, the length that has already been computed. Used for incremental prediction when use_past is True. Default: None.

  • batch_index (Tensor, optional) - Deprecated argument. Will be removed in the future. Default: None.

  • zactivate_len (Tensor, optional) - Deprecated argument. Will be removed in the future. Default: None.

  • block_tables (Tensor, optional) - Int64 type Tensor, storing the mapping tables for each sequence. Default: None.

  • slot_mapping (Tensor, optional) - Int32 type Tensor, storing the physical slot indices of the token cache. Default: None.

  • prefix_keys_values (Tensor, optional) - Deprecated argument. Will be removed in the future. Default: None.

  • llm_boost_inputs (Tensor, optional) - Deprecated argument. Will be removed in the future. Default: None.

  • q_seq_lens (Tensor, optional) - In parallel decoding, the query may be flattened. The Paged Attention operator needs q_seq_lens to obtain the length information. Default: None.

  • loss_mask (Tensor, optional) - Float32/Int32 type Tensor, used to determine whether the corresponding token position participates in the loss calculation: if the value is 1, the loss at that position is calculated; if it is 0, it is not. Default: None.

  • gather_index (Tensor, optional) - Int32 type Tensor, used to obtain the last latent vector of each sequence. Default: None.

  • seq_range (Tensor, optional) - Int32 type Tensor, used to obtain the mask and positional encoding of the valid tokens for each sequence. Default: None.

Outputs:

Tensor. In training mode, the output Tensor contains the loss; in prediction mode, it contains the logits; in evaluation mode, it contains the logits, tokens, and input masks.
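
A minimal sketch of a forward call exercising these inputs and outputs. The tiny configuration values and random token ids below are illustrative assumptions, not part of the official example, and the call assumes a backend on which the model's operators are supported:

>>> import numpy as np
>>> import mindspore as ms
>>> from mindformers.models.llama import LlamaConfig, LlamaForCausalLM
>>> ms.set_context(mode=0)
>>> # Deliberately tiny, illustrative config; real checkpoints use much larger values.
>>> config = LlamaConfig(batch_size=1, seq_length=16, num_layers=2,
...                      hidden_size=64, num_heads=4, vocab_size=32000)
>>> network = LlamaForCausalLM(config=config)
>>> # Random token ids of shape (batch, seq_length); use tokenizer output in practice.
>>> input_ids = ms.Tensor(np.random.randint(0, config.vocab_size, (1, 16)), ms.int32)
>>> network.set_train(True)
>>> loss = network(input_ids)        # training mode: loss
>>> network.set_train(False)
>>> outputs = network(input_ids)     # evaluation mode: logits, tokens, input mask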

Examples

>>> from mindformers.models.llama import LlamaConfig, LlamaForCausalLM
>>> import mindspore as ms
>>> ms.set_context(mode=0)
>>> config = LlamaConfig(batch_size=2)
>>> network = LlamaForCausalLM(config=config)
>>> type(network)
<class 'mindformers.models.llama.llama.LlamaForCausalLM'>
>>> from mindformers import LlamaForCausalLM
>>> network = LlamaForCausalLM.from_pretrained('llama2_7b')
>>> type(network)
<class 'mindformers.models.llama.llama.LlamaForCausalLM'>
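
As a follow-up, a typical text-generation flow pairs the model with its tokenizer. This is a hedged sketch rather than part of the official example; it assumes the standard mindformers from_pretrained and generate interfaces and a locally available 'llama2_7b' checkpoint:

>>> from mindformers import LlamaForCausalLM, LlamaTokenizer
>>> tokenizer = LlamaTokenizer.from_pretrained('llama2_7b')
>>> model = LlamaForCausalLM.from_pretrained('llama2_7b')
>>> model.set_train(False)
>>> # Tokenize a prompt and decode the generated ids back to text.
>>> input_ids = tokenizer("I love Beijing, because")["input_ids"]
>>> output_ids = model.generate(input_ids, max_length=32)
>>> print(tokenizer.decode(output_ids[0]))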