# MindSPONGE Molecular Simulation Practice

Translator: [LiangRio](https://gitee.com/liangrio)

[![View Source On Gitee](https://gitee.com/mindspore/docs/raw/r1.5/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/r1.5/docs/mindscience/docs/source_zh_cn/mindsponge/ala.md)

## Overview

Molecular simulation is a method of exploiting computer to simulate the structure and behavior of molecules by using the molecular model at the atomic-level, and then simulate the physical and chemical properties of the molecular system. It builds a set of models and algorithms based on the experiment and through the basic principles, so as to calculate the reasonable molecular structure and molecular behavior.

In recent years, molecular simulation technology has been developed rapidly and widely used in many fields. In the field of medical design, it can be used to study the mechanism of action of virus and drugs. In the field of biological science, it can be used to characterize the multi-level structure and properties of proteins. In the field of materials science, it can be used to study the structure and mechanical properties, material optimization design. In the field of chemistry, it can be used to study surface catalysis and mechanism. In the field of petrochemical industry, it can be used for structure characterization, synthesis design, adsorption and diffusion of molecular sieve catalyst, construction and characterization of polymer chain and structure of crystalline or amorphous bulk polymer, and prediction of important properties including blending behavior, mechanical properties, diffusion, cohesion and so on.

MindSPONGE in MindSpore is molecular simulation library jointly developed by the Gao Yiqin research group of PKU and Shenzhen Bay Laboratory and Huawei MindSpore team. MindSPONGE has the features like high-performance, modularization, etc. MindSPONGE can complete the traditional molecular simulation process efficiently based on MindSpore's automatic parallelism, graph-computing fusion and other features. MindSPONGE can combine AI methods such as neural networks with traditional molecular simulations by utilizing MindSpore's feature of automatic differentiation.

This tutorial mainly introduces how to use MindSPONGE, which is built in MindSpore, to perform high performance molecular simulation on the GPU.

> Here you can download the complete sample code: <https://gitee.com/mindspore/mindscience/tree/r0.1/MindSPONGE/examples/polypeptide>.

## Overall Execution

1. Prepare input files of molecular simulation, load data, and determine the molecular system of calculation;
2. Define and initialize the MindSPONGE module, make sure the calculation process;
3. Run training script, output thermodynamic information of the simulation, and check the result.

## Preparation

Before practicing, make sure you have MindSpore installed correctly. If not, you can turn to [MindSpore Installation](https://www.mindspore.cn/install/en).

## Example of Simulated Polypeptide Aqueous Solution System

MindSPONGE has advantages of high-performance and usability, and this tutorial uses MindSPONGE to simulate polypeptide aqueous solution system. The simulated system is an alanine tripeptide aqueous solution system.

### Preparing Input Files

The simulated system of this tutorial requires 3 input files:

- Property file (file suffix`.in`), declares the basic conditions for the simulation, parameter control to the whole simulation process.
- Topology file (file suffix`.param7`), describes the topological relations and parameters of the internal molecules in the system.
- Coordinate file (file suffix`.rst7`), describes the initial coordinates of each atom in the system.

Topology and Coordinate files can be modeling completed by tleap (download address <http://ambermd.org/GetAmber.php>, comply with the GPL), which is a built-in tool in AmberTools, through the modeling process.

The modeling process is as follows:

- Open tleap

```bash
tleap
```

- Load force field ff14SB that built-in in tleap

```bash
> source leaprc.protein.ff14SB
```

- Build model of alanine tripeptide

```bash
> ala = sequence {ALA ALA ALA}
```

- Use tleap to load its force field tip3p

```bash
> source leaprc.water.tip3p
```

- Use `slovatebox` in tleap to dissolve alanine tripeptide chain, complete the system construction. `10.0`, represents the water we add is over 10 Angstrom far away from the border of molecular we dissolve and the system.

```bash
> solvatebox ala TIP3PBOX 10.0
```

- Save constructed system as file suffix `parm7` and `rst7`

```bash
> saveamberparm ala ala.parm7 ala_350_cool_290.rst7
```

After constructing the Topology file(`WATER_ALA.parm7`) and Coordinate file(`WATER_ALA_350_COOL_290.RST7`) that needed through tleap, it is required to declare basic conditions of simulation by Property file, which executes parameter control to the whole simulation process. Take Property file `NVT_299_10ns.in` in this tutorial as an example, contents are as follows:

```text
NVT 290k
   mode = 1,                              # Simulation mode ; mode=1 for NVT ensemble
   dt= 0.001,                             # Time step in picoseconds (ps). The time length of each MD step
   step_limit = 1,                        # Total step limit, number of MD steps run
   thermostat=1,                          # Thermostat for temperature ; thermostat=0 for Langevin thermostat
   langevin_gamma=1.0,                    # Gamma_ln for Langevin thermostat represents coupling strength between thermostat and system
   target_temperature=290,                # Target temperature
   write_information_interval=1000,       # Output frequency
   amber_irest=0,                         # Input style ;  amber_irest=1 for using amber style input & rst7 file contains veclocity
   cut=10.0,                              # Nonbonded cutoff distance in Angstroms
```

- `mode`, Molecular Dynamics (MD) mode, `1` represents the simulation uses `NVT` ensemble.
- `dt`, represents the step size in the simulation.
- `step_limit`, represents total steps in the simulation.
- `thermostat`, represents the method of temperature control, `1` represents to use `Liujian-Langevin`.
- `langevin_gamma`, represents `Gamma_In` parameters in the thermostat.
- `target_temperature`, represents the target temperature.
- `amber_irest`, represents the input mode, `0` represents to use the amber mode to input, and files suffix `rst7` do not include the attribute `veclocity`.
- `cut`, represents the distance of non-bonding interaction.

### Loading Data

After completing the construction of input files, save files under the path `data` to local workplace, the directory structure is as follows:

```text
└─polypeptide
    ├─data
    │      NVT_290_10ns.in                 # specific MD simulation setting
    │      WATER_ALA.parm7                 # topology file include atom & residue & bond & nonbond information
    │      WATER_ALA_350_cool_290.rst7     # restart file record atom coordinate & velocity and box information

```

Read the parameters needed by the simulation system from three input files, and use them for calculation in MindSpore. The loading code is as follows:

```python
import argparse
from mindspore import context

parser = argparse.ArgumentParser(description='Sponge Controller')
parser.add_argument('--i', type=str, default=None, help='input file')
parser.add_argument('--amber_parm', type=str, default=None, help='paramter file in AMBER type')
parser.add_argument('--c', type=str, default=None, help='initial coordinates file')
parser.add_argument('--r', type=str, default="restrt", help='')
parser.add_argument('--x', type=str, default="mdcrd", help='')
parser.add_argument('--o', type=str, default="mdout", help="")
parser.add_argument('--box', type=str, default="mdbox", help='')
parser.add_argument('--device_id', type=int, default=0, help='')
args_opt = parser.parse_args()

context.set_context(mode=context.GRAPH_MODE, device_target="GPU", device_id=args_opt.device_id, save_graphs=False)
```

### Constructing Simulation Process

By using computational force module and computational energy module defined in MindSPONGE, the system reaches the equilibrium state we need through multiple iterations of molecular dynamics process evolves, and energy and other data obtained in each simulation step is recorded. For convenience, this tutorial set `1` as the number of iterations, the code for constructing the simulation process is as follows:

```python
from mindsponge.md.simulation import Simulation
from mindspore import Tensor

if __name__ == "__main__":
    simulation = Simulation(args_opt)
    save_path = args_opt.o
    for steps in range(simulation.md_info.step_limit):
        print_step = steps % simulation.ntwx
        if steps == simulation.md_info.step_limit - 1:
            print_step = 0
        temperature, total_potential_energy, sigma_of_bond_ene, sigma_of_angle_ene, sigma_of_dihedral_ene, \
        nb14_lj_energy_sum, nb14_cf_energy_sum, LJ_energy_sum, ee_ene, _ = simulation(Tensor(steps), Tensor(print_step))
        # compute energy and temperature

```

### Running Script

Execute the following command, start `main.py` training script for training:

```text
python main.py --i /path/NVT_290_10ns.in \
               --amber_parm /path/WATER_ALA.parm7 \
               --c /path/WATER_ALA_350_cool_290.rst7 \
               --o /path/ala_NVT_290_10ns.out

```

- `i` is property file of MD simulation, which control simulation process.
- `amber_parm` is topology file of MD simulation system.
- `c` is initial coordinate file we input.
- `o` is log file output after simulation, which records energy and other data obtained in each simulation step.
- `path` is the path to the file, this path is denoted as `data` in this tutorial.

During training, property file (file suffix`.in`), topology file (file suffix`.param7`) and coordinate file (file suffix`.rst7`) can be used under specified temperatures to perform simulation, compute force and energy, perform molecular dynamics process evolves.

### Running Result

After training, output file `ala_NVT_290_10ns.out` can be obtained, which records the change of system energy and can be viewed for thermodynamic information of the simulation system. When viewing `ala_NVT_290_10ns.out`, contents are as follows:

```text
_steps_ _TEMP_ _TOT_POT_ENE_ _BOND_ENE_ _ANGLE_ENE_ _DIHEDRAL_ENE_ _14LJ_ENE_ _14CF_ENE_ _LJ_ENE_ _CF_PME_ENE_
      1 0.788   -5836.521         48.745       0.891         14.904      9.041    194.479  763.169    -6867.750
   ...
```

Types of energy output in the simulation process are recorded, namely iterations(steps), temperature(TEMP), total energy(TOT_POT_E), bond length(BOND_ENE), bond angle(ANGLE_ENE), dihedral angle interactions(DIHEDRAL_ENE), and none-bonded interaction that includes electrostatic force and Leonard-Jones interaction.

## Performance

| Parameter                 |   GPU |
| -------------------------- |---------------------------------- |
| Resource                   | GPU (Tesla V100 SXM2); memory 16 GB
| Upload date              |
| MindSpore version          | 1.2
| Training parameter        | step=1
| Output                    | numpy file
| Speed                      | 15.0 ms/step
| Total time                 | 5.7 s
| Script                    | [Link](https://gitee.com/mindspore/mindscience/tree/r0.1/MindSPONGE/examples/polypeptide/scripts)