SPONGE Molecular Simulation Practice

Translator: LiangRio

Overview

Molecular simulation uses computers and atomic-level molecular models to simulate the structure and behavior of molecules, and from these to derive the physical and chemical properties of a molecular system. Starting from experiment and basic principles, it builds a set of models and algorithms with which reasonable molecular structures and behaviors can be calculated.

In recent years, molecular simulation technology has developed rapidly and is now widely used in many fields. In drug design, it can be used to study the mechanisms of action of viruses and drugs. In biological science, it can be used to characterize the multi-level structure and properties of proteins. In materials science, it can be used to study structure and mechanical properties and to optimize material design. In chemistry, it can be used to study surface catalysis and its mechanisms. In the petrochemical industry, it can be used for the structure characterization, synthesis design, adsorption and diffusion of molecular sieve catalysts; for constructing and characterizing polymer chains and the structure of crystalline or amorphous bulk polymers; and for predicting important properties such as blending behavior, mechanical properties, diffusion and cohesion.

SPONGE in MindSpore is a molecular simulation library jointly developed by the Gao Yiqin research group of PKU and Shenzhen Bay Laboratory and the Huawei MindSpore team. SPONGE is high-performance and modular. It completes the traditional molecular simulation process efficiently by building on MindSpore features such as automatic parallelism and graph-computing fusion, and it can combine AI methods such as neural networks with traditional molecular simulation by using MindSpore's automatic differentiation.

This tutorial shows how to use SPONGE, which is built into MindSpore, to run high-performance molecular simulations on a GPU.

Here you can download the complete sample code: https://gitee.com/mindspore/mindspore/tree/r1.3/model_zoo/research/hpc/sponge.

Overall Execution

Prepare the input files for the molecular simulation, load the data, and determine the molecular system to be calculated;
Define and initialize the SPONGE module, and determine the calculation process;
Run the script, output the thermodynamic information of the simulation, and check the result.

Preparation

Before starting, make sure MindSpore is installed correctly. If it is not, refer to MindSpore Installation.

Example of Simulated Polypeptide Aqueous Solution System

SPONGE offers high performance and ease of use. This tutorial uses SPONGE to simulate a polypeptide aqueous solution system, specifically an alanine tripeptide in water.

Preparing Input Files

The simulated system of this tutorial requires 3 input files:

Property file (file suffix .in), which declares the basic conditions of the simulation and controls the parameters of the whole simulation process.
Topology file (file suffix .parm7), which describes the topological relations and parameters of the molecules in the system.
Coordinate file (file suffix .rst7), which describes the initial coordinates of each atom in the system.

The topology and coordinate files can be built with tleap, a modeling tool included in AmberTools (download address http://ambermd.org/GetAmber.php, subject to the GPL).

The modeling process is as follows:

Open tleap

tleap

Load the ff14SB force field that is built into tleap

> source leaprc.protein.ff14SB

Build the alanine tripeptide model

> ala = sequence {ALA ALA ALA}

Load the tip3p water force field in tleap

> source leaprc.water.tip3p

Use solvatebox in tleap to solvate the alanine tripeptide chain and complete the system construction. The value 10.0 means that the border of the solvated system is at least 10 Angstrom away from the molecule being dissolved.

> solvatebox ala TIP3PBOX 10.0

Save the constructed system as files with the suffixes parm7 and rst7

> saveamberparm ala ala.parm7 ala_350_cool_290.rst7
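The interactive steps above can also be collected into a single tleap input script. The following is a minimal sketch that only assembles the commands already shown in this tutorial; the helper name `build_leap_script` and its parameters are hypothetical and are not part of SPONGE or AmberTools.

```python
def build_leap_script(parm_name="ala.parm7",
                      rst_name="ala_350_cool_290.rst7",
                      buffer_angstrom=10.0):
    """Return the tleap commands from the modeling steps as one script.

    Hypothetical helper for illustration; it only formats text.
    """
    lines = [
        "source leaprc.protein.ff14SB",                # protein force field
        "ala = sequence {ALA ALA ALA}",                # alanine tripeptide
        "source leaprc.water.tip3p",                   # water force field
        f"solvatebox ala TIP3PBOX {buffer_angstrom}",  # add the water box
        f"saveamberparm ala {parm_name} {rst_name}",   # write parm7 and rst7
        "quit",                                        # leave tleap
    ]
    return "\n".join(lines) + "\n"


print(build_leap_script())
```

Saving the printed text to a file and passing it to tleap with its `-f` option should reproduce the interactive session, assuming AmberTools is installed and tleap is on the PATH.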

After the topology file (WATER_ALA.parm7) and the coordinate file (WATER_ALA_350_cool_290.rst7) have been built with tleap, the basic conditions of the simulation must be declared in the property file, which controls the parameters of the whole simulation process. Taking the property file NVT_290_10ns.in used in this tutorial as an example, its contents are as follows:

NVT 290k
   mode = 1,                              # Simulation mode ; mode=1 for NVT ensemble
   dt= 0.001,                             # Time step in picoseconds (ps). The time length of each MD step
   step_limit = 1,                        # Total step limit, number of MD steps run
   thermostat=1,                          # Thermostat for temperature ; thermostat=1 for Liujian-Langevin thermostat
   langevin_gamma=1.0,                    # Gamma_ln for Langevin thermostat represents coupling strength between thermostat and system
   target_temperature=290,               # Target temperature
   write_information_interval=1000,       # Output frequency
   amber_irest=0,                         # Input style ;  amber_irest=1 for using amber style input & rst7 file contains velocity
   cut=10.0,                              # Nonbonded cutoff distance in Angstroms

mode is the molecular dynamics (MD) mode; 1 means the simulation uses the NVT ensemble.
dt is the time step of the simulation, in picoseconds.
step_limit is the total number of steps in the simulation.
thermostat is the temperature control method; 1 means the Liujian-Langevin thermostat is used.
langevin_gamma is the gamma_ln parameter of the thermostat.
target_temperature is the target temperature.
write_information_interval is the output frequency.
amber_irest is the input mode; 0 means Amber-style input is used and the rst7 file does not include velocities.
cut is the cutoff distance for non-bonded interactions, in Angstrom.
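To make the layout of the property file concrete, the sketch below reads `key = value,  # comment` lines into a Python dictionary. This is an illustration only, not SPONGE's own reader; the function name `parse_property_text` is hypothetical, and the sample text is an abbreviated copy of the file shown above.

```python
def parse_property_text(text):
    """Parse 'key = value,  # comment' lines into a dict (illustrative only)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop the trailing comment
        if "=" not in line:
            continue  # title line such as "NVT 290k", or a blank line
        key, value = line.split("=", 1)
        value = value.strip().rstrip(",").strip()
        try:
            parsed = int(value)
        except ValueError:
            try:
                parsed = float(value)
            except ValueError:
                parsed = value  # keep anything non-numeric as text
        settings[key.strip()] = parsed
    return settings


SAMPLE = """NVT 290k
   mode = 1,                # Simulation mode ; mode=1 for NVT ensemble
   dt= 0.001,
   step_limit = 1,
   thermostat=1,
   langevin_gamma=1.0,
   target_temperature=290,
   write_information_interval=1000,
   amber_irest=0,
   cut=10.0,
"""

settings = parse_property_text(SAMPLE)
# Total simulated time is the time step multiplied by the number of steps.
simulated_ps = settings["dt"] * settings["step_limit"]
print(settings)
print("simulated time (ps):", simulated_ps)
```

With dt = 0.001 ps and step_limit = 1, the run covers a single step of 0.001 ps; longer simulations raise step_limit accordingly.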

Loading Data

After the input files have been built, save them in the sponge_in directory of the local workspace. The directory structure is as follows:

└─sponge
    ├─sponge_in
    │      NVT_290_10ns.in                 # specific MD simulation setting
    │      WATER_ALA.parm7                 # topology file including atom & residue & bond & nonbond information
    │      WATER_ALA_350_cool_290.rst7     # restart file recording atom coordinate & velocity and box information

The parameters needed by the simulation system are read from the three input files and used for calculation in MindSpore. The loading code is as follows:

import argparse
from mindspore import context

# Command-line options; --i, --amber_parm and --c are the three input files.
parser = argparse.ArgumentParser(description='Sponge Controller')
parser.add_argument('--i', type=str, default=None, help='input file')
parser.add_argument('--amber_parm', type=str, default=None, help='parameter file in AMBER type')
parser.add_argument('--c', type=str, default=None, help='initial coordinates file')
parser.add_argument('--r', type=str, default="restrt", help='')
parser.add_argument('--x', type=str, default="mdcrd", help='')
parser.add_argument('--o', type=str, default="mdout", help='output log file')
parser.add_argument('--box', type=str, default="mdbox", help='')
parser.add_argument('--device_id', type=int, default=0, help='id of the GPU device to use')
args_opt = parser.parse_args()

context.set_context(mode=context.GRAPH_MODE, device_target="GPU", device_id=args_opt.device_id, save_graphs=False)

Constructing Simulation Process

Using the force and energy calculation modules defined in SPONGE, the system evolves through multiple iterations of molecular dynamics until it reaches the required equilibrium state, and the energy and other data obtained at each simulation step are recorded. For convenience, this tutorial sets the number of iterations to 1. The code for constructing the simulation process is as follows:

from src.simulation_initial import Simulation
from mindspore import Tensor

if __name__ == "__main__":
    simulation = Simulation(args_opt)
    save_path = args_opt.o
    for steps in range(simulation.md_info.step_limit):
        print_step = steps % simulation.ntwx
        if steps == simulation.md_info.step_limit - 1:
            print_step = 0
        # compute energy and temperature
        temperature, total_potential_energy, sigma_of_bond_ene, sigma_of_angle_ene, sigma_of_dihedral_ene, \
        nb14_lj_energy_sum, nb14_cf_energy_sum, LJ_energy_sum, ee_ene, _ = simulation(Tensor(steps), Tensor(print_step))
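The `print_step` logic in the loop can be traced without SPONGE. The sketch below repeats only the modulo arithmetic from the loop above; it assumes that a `print_step` of 0 marks a step whose results are reported and that `simulation.ntwx` plays the role of the output interval, neither of which is stated explicitly in this tutorial. The function name `steps_with_output` is hypothetical.

```python
def steps_with_output(step_limit, interval):
    """Return the steps at which print_step is 0 in the simulation loop."""
    printed = []
    for steps in range(step_limit):
        print_step = steps % interval
        if steps == step_limit - 1:
            print_step = 0  # the final step is always flagged
        if print_step == 0:
            printed.append(steps)
    return printed


print(steps_with_output(1, 1000))  # -> [0]
print(steps_with_output(6, 4))     # -> [0, 4, 5]
```

With the tutorial's settings (one step, interval 1000), the single step 0 is flagged, which matches the one data row in the output file.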

Running Script

Run the following command to start the main.py script:

python main.py --i /path/NVT_290_10ns.in \
               --amber_parm /path/WATER_ALA.parm7 \
               --c /path/WATER_ALA_350_cool_290.rst7 \
               --o /path/ala_NVT_290_10ns.out

i is the property file of the MD simulation, which controls the simulation process.
amber_parm is the topology file of the MD simulation system.
c is the initial coordinate file.
o is the log file written after the simulation, which records the energy and other data obtained at each simulation step.
path is the path to the files, which is sponge_in in this tutorial.

During the run, the property file (file suffix .in), topology file (file suffix .parm7) and coordinate file (file suffix .rst7) are used to simulate the system at the specified temperature: forces and energies are computed and the system evolves through molecular dynamics.

Running Result

After the run finishes, the output file ala_NVT_290_10ns.out is produced. It records how the system energy changes and can be used to inspect the thermodynamic information of the simulated system. Its contents are as follows:

_steps_ _TEMP_ _TOT_POT_ENE_ _BOND_ENE_ _ANGLE_ENE_ _DIHEDRAL_ENE_ _14LJ_ENE_ _14CF_ENE_ _LJ_ENE_ _CF_PME_ENE_
      0 0.000   -5713.804         0.037       0.900         14.909      9.072    194.477  765.398    -6698.648
   ...

The output records the quantities computed during the simulation: the iteration number (steps), temperature (TEMP), total potential energy (TOT_POT_ENE), bond energy (BOND_ENE), angle energy (ANGLE_ENE), dihedral energy (DIHEDRAL_ENE), and the non-bonded interactions, which include electrostatic and Lennard-Jones terms.
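For further analysis, the output file can be read back into Python. The sketch below assumes the file has exactly the layout of the excerpt above, with one header line of underscore-wrapped column names followed by whitespace-separated numeric rows. The function name `parse_mdout` is hypothetical and is not part of SPONGE.

```python
def parse_mdout(text):
    """Turn the whitespace-separated energy table into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # "_TOT_POT_ENE_" becomes "TOT_POT_ENE", and so on
            header = [name.strip("_") for name in fields]
            continue
        if len(fields) != len(header):
            continue  # skip placeholder or truncated lines such as "..."
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue  # skip rows that are not purely numeric
        rows.append(dict(zip(header, values)))
    return rows


SAMPLE = """_steps_ _TEMP_ _TOT_POT_ENE_ _BOND_ENE_ _ANGLE_ENE_ _DIHEDRAL_ENE_ _14LJ_ENE_ _14CF_ENE_ _LJ_ENE_ _CF_PME_ENE_
      0 0.000   -5713.804         0.037       0.900         14.909      9.072    194.477  765.398    -6698.648
   ...
"""

rows = parse_mdout(SAMPLE)
print(rows[0]["TEMP"], rows[0]["TOT_POT_ENE"])
```

Each row becomes a dictionary keyed by column name, so a quantity such as `TOT_POT_ENE` can be collected across steps and plotted against `steps`.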