SPONGE Molecular Simulation Practice
Translator: LiangRio
Linux
GPU
Model Development
Expert
Overview
Molecular simulation uses computers and atomic-level molecular models to simulate the structure and behavior of molecules, and from these the physical and chemical properties of a molecular system. It builds a set of models and algorithms on experimental results and basic principles in order to compute reasonable molecular structures and molecular behavior.
In recent years, molecular simulation technology has developed rapidly and is widely used in many fields. In drug design, it can be used to study the mechanisms of action of viruses and drugs. In biological science, it can be used to characterize the multi-level structure and properties of proteins. In materials science, it can be used to study structure and mechanical properties and to optimize material design. In chemistry, it can be used to study surface catalysis and its mechanisms. In the petrochemical industry, it can be used for the structure characterization, synthesis design, adsorption and diffusion of molecular sieve catalysts, for the construction and characterization of polymer chains and of crystalline or amorphous bulk polymer structures, and for the prediction of important properties such as blending behavior, mechanical properties, diffusion and cohesion.
SPONGE in MindSpore is a molecular simulation library jointly developed by the Gao Yiqin research group of PKU and Shenzhen Bay Laboratory and the Huawei MindSpore team. SPONGE is high-performance and modular. Based on MindSpore's automatic parallelism, graph kernel fusion and other features, SPONGE can complete the traditional molecular simulation process efficiently. By using MindSpore's automatic differentiation, SPONGE can also combine AI methods such as neural networks with traditional molecular simulation.
This tutorial mainly introduces how to use SPONGE, which is built in MindSpore, to perform high performance molecular simulation on the GPU.
Here you can download the complete sample code: https://gitee.com/mindspore/mindspore/tree/r1.3/model_zoo/research/hpc/sponge.
Overall Execution
Prepare the input files of the molecular simulation, load the data, and determine the molecular system to be computed;
Define and initialize the SPONGE module, and determine the calculation process;
Run the script, output the thermodynamic information of the simulation, and check the result.
Preparation
Before practicing, make sure MindSpore is installed correctly. If it is not, refer to MindSpore Installation.
Example of Simulated Polypeptide Aqueous Solution System
SPONGE is high-performance and easy to use. This tutorial uses SPONGE to simulate a polypeptide aqueous solution system, namely an aqueous solution of alanine tripeptide.
Preparing Input Files
The simulated system of this tutorial requires 3 input files:
Property file (file suffix .in), declares the basic conditions for the simulation and controls the parameters of the whole simulation process.
Topology file (file suffix .parm7), describes the topological relations and parameters of the molecules in the system.
Coordinate file (file suffix .rst7), describes the initial coordinates of each atom in the system.
The topology and coordinate files can be built with tleap (download address http://ambermd.org/GetAmber.php, subject to the GPL), a tool included in AmberTools.
The modeling process is as follows:
Open tleap
tleap
Load the ff14SB force field built into tleap
> source leaprc.protein.ff14SB
Build model of alanine tripeptide
> ala = sequence {ALA ALA ALA}
Load the TIP3P water force field in tleap
> source leaprc.water.tip3p
Use solvatebox in tleap to solvate the alanine tripeptide chain and complete the system construction. The value 10.0 is the buffer distance: the edge of the water box is at least 10 Angstrom away from any atom of the solvated molecule.
> solvatebox ala TIP3PBOX 10.0
Save the constructed system to files with the suffixes parm7 and rst7
> saveamberparm ala ala.parm7 ala_350_cool_290.rst7
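The interactive steps above can also be collected into a tleap input script and run in one go. The following is a minimal sketch, not part of the SPONGE sample code: the function name write_leap_script and the script file name are hypothetical, and the trailing quit command is added so that tleap exits after saving.

```python
from pathlib import Path

# The tleap commands from the steps above, in order.
# `quit` is added so the tleap session ends after the files are saved.
LEAP_COMMANDS = [
    "source leaprc.protein.ff14SB",
    "ala = sequence {ALA ALA ALA}",
    "source leaprc.water.tip3p",
    "solvatebox ala TIP3PBOX 10.0",
    "saveamberparm ala ala.parm7 ala_350_cool_290.rst7",
    "quit",
]


def write_leap_script(path):
    """Write the commands to a tleap input script and return its path."""
    script = Path(path)
    script.write_text("\n".join(LEAP_COMMANDS) + "\n")
    return script
```

After calling write_leap_script("build_ala.leap") (a hypothetical file name), the script can be passed to tleap with tleap -f build_ala.leap.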
After building the topology file (WATER_ALA.parm7) and the coordinate file (WATER_ALA_350_cool_290.rst7) with tleap, declare the basic conditions of the simulation in the property file, which controls the parameters of the whole simulation process. Taking the property file NVT_290_10ns.in of this tutorial as an example, the contents are as follows:
NVT 290k
mode = 1, # Simulation mode ; mode=1 for NVT ensemble
dt= 0.001, # Time step in picoseconds (ps). The time length of each MD step
step_limit = 1, # Total step limit, number of MD steps run
thermostat=1, # Thermostat for temperature ; thermostat=0 for Langevin thermostat
langevin_gamma=1.0, # Gamma_ln for Langevin thermostat represents coupling strength between thermostat and system
target_temperature=290, # Target temperature
write_information_interval=1000, # Output frequency
amber_irest=0, # Input style ; amber_irest=1 for using amber style input & rst7 file contains velocity
cut=10.0, # Nonbonded cutoff distance in Angstroms
mode, the Molecular Dynamics (MD) mode; 1 means the simulation uses the NVT ensemble.
dt, the time step of the simulation, in picoseconds.
step_limit, the total number of steps in the simulation.
thermostat, the temperature control method; 1 means the Liujian-Langevin thermostat is used.
langevin_gamma, the Gamma_ln parameter of the thermostat.
target_temperature, the target temperature.
amber_irest, the input mode; 0 means the input is read in the amber mode and the rst7 file does not contain velocity.
cut, the cutoff distance of non-bonded interactions, in Angstroms.
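To make the layout of the property file concrete, here is a minimal sketch of a reader for its "key = value, # comment" lines. It is not the parser used by SPONGE: the function name parse_property_file is hypothetical, and the sample text is a shortened copy of the file shown above. The sketch also works out the simulated time as dt multiplied by step_limit.

```python
def parse_property_file(text):
    """Parse 'key = value, # comment' lines into a dict of strings.

    Lines without '=' outside the comment (such as the title line) are skipped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().rstrip(",").strip()
    return settings


# Shortened copy of the property file shown above.
SAMPLE = """NVT 290k
mode = 1, # Simulation mode ; mode=1 for NVT ensemble
dt= 0.001, # Time step in picoseconds (ps)
step_limit = 1, # Total step limit
target_temperature=290, # Target temperature
cut=10.0, # Nonbonded cutoff distance in Angstroms
"""

settings = parse_property_file(SAMPLE)
dt_ps = float(settings["dt"])
steps = int(settings["step_limit"])

# Simulated time covered by the run, in picoseconds.
print(f"simulated time: {dt_ps * steps} ps")

# Number of steps needed to cover 10 ns (1 ns = 1000 ps) at this time step.
print(round(10 * 1000 / dt_ps))
```

With dt=0.001 and step_limit=1, the run covers a single step of 0.001 ps; a longer trajectory only requires a larger step_limit.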
Loading Data
After preparing the input files, save them under the path sponge_in in your local workspace. The directory structure is as follows:
└─sponge
├─sponge_in
│ NVT_290_10ns.in # specific MD simulation setting
│ WATER_ALA.parm7 # topology file include atom & residue & bond & nonbond information
│ WATER_ALA_350_cool_290.rst7 # restart file record atom coordinate & velocity and box information
Read the parameters needed by the simulation system from three input files, and use them for calculation in MindSpore. The loading code is as follows:
import argparse
from mindspore import context
parser = argparse.ArgumentParser(description='Sponge Controller')
parser.add_argument('--i', type=str, default=None, help='input file')
parser.add_argument('--amber_parm', type=str, default=None, help='parameter file in AMBER type')
parser.add_argument('--c', type=str, default=None, help='initial coordinates file')
parser.add_argument('--r', type=str, default="restrt", help='restart file')
parser.add_argument('--x', type=str, default="mdcrd", help='trajectory coordinates file')
parser.add_argument('--o', type=str, default="mdout", help='output log file')
parser.add_argument('--box', type=str, default="mdbox", help='box information file')
parser.add_argument('--device_id', type=int, default=0, help='GPU device id')
args_opt = parser.parse_args()
context.set_context(mode=context.GRAPH_MODE, device_target="GPU", device_id=args_opt.device_id, save_graphs=False)
Constructing Simulation Process
Using the force and energy computation modules defined in SPONGE, the system evolves through multiple iterations of the molecular dynamics process until it reaches the required equilibrium state, and the energy and other data obtained at each simulation step are recorded. For convenience, this tutorial sets the number of iterations to 1. The code for constructing the simulation process is as follows:
from src.simulation_initial import Simulation
from mindspore import Tensor
if __name__ == "__main__":
    simulation = Simulation(args_opt)
    save_path = args_opt.o
    for steps in range(simulation.md_info.step_limit):
        print_step = steps % simulation.ntwx
        if steps == simulation.md_info.step_limit - 1:
            print_step = 0
        temperature, total_potential_energy, sigma_of_bond_ene, sigma_of_angle_ene, sigma_of_dihedral_ene, \
        nb14_lj_energy_sum, nb14_cf_energy_sum, LJ_energy_sum, ee_ene, _ = simulation(Tensor(steps), Tensor(print_step))
        # compute energy and temperature
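The print_step bookkeeping in the loop above can be checked in isolation. The sketch below reproduces only that arithmetic in plain Python, without MindSpore. It assumes that print_step == 0 is what marks a step for output and that simulation.ntwx is the output interval; the source code shown here does not state either explicitly, and the function name print_steps is hypothetical.

```python
def print_steps(step_limit, interval):
    """Return the steps for which the loop passes print_step == 0."""
    selected = []
    for step in range(step_limit):
        print_step = step % interval
        if step == step_limit - 1:
            print_step = 0  # the last step is always flagged
        if print_step == 0:
            selected.append(step)
    return selected


print(print_steps(1, 1000))     # [0]
print(print_steps(2500, 1000))  # [0, 1000, 2000, 2499]
```

With step_limit set to 1, as in this tutorial, the single step 0 is both the first and the last step, so it is flagged once.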
Running Script
Execute the following command to start the main.py script and run the simulation:
python main.py --i /path/NVT_290_10ns.in \
--amber_parm /path/WATER_ALA.parm7 \
--c /path/WATER_ALA_350_cool_290.rst7 \
--o /path/ala_NVT_290_10ns.out
i is the property file of the MD simulation, which controls the simulation process.
amber_parm is the topology file of the MD simulation system.
c is the initial coordinate file.
o is the log file output after the simulation, which records the energy and other data obtained at each simulation step.
path is the path to the files, which is sponge_in in this tutorial.
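When the run is launched from another Python script, the same command can be assembled as an argument list. This is a minimal sketch using the flags and file names from the command above; the helper name build_command is hypothetical and not part of the SPONGE sample code.

```python
from pathlib import Path


def build_command(input_dir, output_file):
    """Return the main.py command line as a list of arguments."""
    input_dir = Path(input_dir)
    return [
        "python", "main.py",
        "--i", str(input_dir / "NVT_290_10ns.in"),
        "--amber_parm", str(input_dir / "WATER_ALA.parm7"),
        "--c", str(input_dir / "WATER_ALA_350_cool_290.rst7"),
        "--o", str(output_file),
    ]


cmd = build_command("sponge_in", "ala_NVT_290_10ns.out")
print(" ".join(cmd))
```

The list can then be passed to subprocess.run(cmd, check=True) from the directory that contains main.py.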
During the run, the property file (file suffix .in), topology file (file suffix .parm7) and coordinate file (file suffix .rst7) are used to simulate the system at the specified temperature: forces and energies are computed and the molecular dynamics process evolves.
Running Result
After the run, the output file ala_NVT_290_10ns.out is obtained. It records the change of the system energy and can be viewed for thermodynamic information about the simulated system. The contents of ala_NVT_290_10ns.out are as follows:
_steps_ _TEMP_ _TOT_POT_ENE_ _BOND_ENE_ _ANGLE_ENE_ _DIHEDRAL_ENE_ _14LJ_ENE_ _14CF_ENE_ _LJ_ENE_ _CF_PME_ENE_
0 0.000 -5713.804 0.037 0.900 14.909 9.072 194.477 765.398 -6698.648
...
The output records the quantities computed during the simulation, namely the iteration number (steps), temperature (TEMP), total potential energy (TOT_POT_ENE), bond energy (BOND_ENE), bond angle energy (ANGLE_ENE), dihedral angle energy (DIHEDRAL_ENE), and the non-bonded interactions (14LJ_ENE, 14CF_ENE, LJ_ENE, CF_PME_ENE), which include the electrostatic and Lennard-Jones interactions.
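For further analysis, the output file can be read into Python. The following is a minimal sketch that assumes the file has exactly the whitespace-separated layout of the sample above (one header line, then one row per recorded step); the function name parse_output is hypothetical.

```python
def parse_output(text):
    """Parse the whitespace-separated output table into a list of dicts."""
    # Skip blank lines and the "..." placeholder used in the sample.
    lines = [ln for ln in text.splitlines() if ln.strip() and ln.strip() != "..."]
    # Header names are wrapped in underscores, e.g. "_TEMP_" -> "TEMP".
    names = [name.strip("_") for name in lines[0].split()]
    rows = []
    for line in lines[1:]:
        row = {name: float(value) for name, value in zip(names, line.split())}
        row["steps"] = int(row["steps"])
        rows.append(row)
    return rows


# The header and the first row of the sample output shown above.
SAMPLE = """_steps_ _TEMP_ _TOT_POT_ENE_ _BOND_ENE_ _ANGLE_ENE_ _DIHEDRAL_ENE_ _14LJ_ENE_ _14CF_ENE_ _LJ_ENE_ _CF_PME_ENE_
0 0.000 -5713.804 0.037 0.900 14.909 9.072 194.477 765.398 -6698.648
...
"""

for row in parse_output(SAMPLE):
    print(row["steps"], row["TEMP"], row["TOT_POT_ENE"])
```

Each row becomes a dictionary keyed by the column name, so a quantity such as the total potential energy can be collected over all recorded steps and plotted.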