Choice of the basis set and the auxiliary basis set
As so-called post-SCF approaches, correlated wave function methods including Moller-Plesset (MP) perturbation theory, the random-phase approximation (RPA) and coupled-cluster (CC) theory require a mean-field theoretical method as a starting point, for which in most cases the Hartree-Fock (HF) method is used
In contrast to HF theory and most density functional approximations, such advanced methods do not only depend on the set of occupied single-particle states but do also require the inclusion of unoccupied, or virtual states as well. Due to the generally different shape and spacial extent of unoccupied states, this circumstance makes the development of special basis sets necessary. As a consequence, calculations involving these post-SCF methods are recommended to be performed with so called valence-correlation consistent (VCC) basis sets instead of the conventional light, intermediate and tight basis sets, which have been developed primarly for use in HF and DFT applications. The NAO-VCC-nZ (n = 2,3,4,5) basis sets, which have originally been developed for MP2 and RPA applications but have since also been employed in CC calculations, are constructed analogously to the polarized VCC Gaussian Type Orbital (GTO) basis sets developed by Dunning, commonly abbreviated as cc-pVnZ, which are widely used in quantum chemistry applications.
The major benefit of using VCC basis sets in combination with MP2, RPA and CC methods comes from the generally observed painfully slow convergence of these methods with respect to the basis set size. Dunning-type basis sets are designed in such a way, that the basis set incompleteness error (BSIE) of the correlation energy decays cubically with respect to n, such that the slow convergence of the BSIE can be circumvented by performing a calculation with different values of n and extrapolating the correlation energy to the so-called complete basis set (CBS) limit. The most widely used technique of that kind is the two-point extrapolation, for which the (MP2, RPA, CC, etc.) correlation energy is computed using the n and n+1 basis set and subsequently extrapolated using the following equation
In this tutorial we will explore the reliability and accuracy of this kind of extrapolation for the paracetamol molecule.
In addition to a special treatment of the atomic basis set, the consideration of an auxiliary basis set is indispensible. All correlated wave function methods require the computation of Coulomb integrals (also often reffered to as 4-center-2-electron integrals or electron repulsion integrals). Without the use of a resolution-of-the-identity (RI) technique, which allows to decompose the tensor of Coulomb integrals into less memory intensive lower-rank tensors, these correlated wave function methods would become computationally intractible very quickly. The probably most common flavor of the RI technique, RI-V, is deemed to be the most accurate one to represent the Coulomb integrals and is available in FHI-aims for molecular calculations. For bigger system sizes, however, the unfavorable scaling of the RI-V approach with respect to system size becomes impractical. A localized version of RI-V, RI-LVL, which is implemented in FHI-aims, offers a more affordable, albeit less accurate approximation. As has been shown in previous studies, the lack of accuracy in the RI-LVL scheme can be significantly improved by manually adding basis functions to the auxiliary basis set.
In this tutorial, we will make use of the memory-efficient RI-LVL approximation to determine the correlation energy of the paracetamol molecule on the MP2 level of theory using FHI-aims in combination with CC4S. The CC4S code is a pure post-SCF code, which relies on another electronic structure package (here: FHI-aims) to perform the initial HF calculation. At the end of the HF calculation, the single-particle energies, the single-particle wave functions and the RI-related quantities need to be post-processes before being fed to CC4S. This job is done by the CC-aims interface. Upon completion of the HF calculation, CC-aims computes a low-rank representation of the Coulomb integrals tensor, the so called Coulomb vertex, which is required by CC4S. In the simplest kind of calculation, CC-aims writes the Coulomb vertex, the single-particle energies and a some additional information to files, that CC4S can process. We will go through the steps one has to perform to obtain the electronic correlation energy on the MP2 level. Note, that the procedure presented here does equally apply for other ground state methods that are implemented in CC4S, which includes, RPA+SOSEX, CCSD and CCSD(T).
The FHI-aims input
To perform any kind of FHI-aims calculation, a minimum of two input files is required:
a control.in
file, which contains the specifications of the calculation, and the
structure of the molecule or material given by the geometry.in
file.
The control.in
file
The control.in
file consists of two parts, the calculation settings and the basis set.
In terms of calculation settings, we want to perform a HF calculation (xc hf
). To
reduce the computational cost of the MP2 calculation, one common approach is to
only include the outermost shell of valence electrons for the post-SCF calculation and
freeze the lower-lying ones via frozen_core_postSCF 1
.
As noted before, while the RI-V approach is generally the most accurate one, it also
becomes computationally impractical for bigger systems, so that we will utilize the
RI-LVL method, which in the control.in
file can be specified via RI_method LVL
.
Currently, however, the RI-LVL scheme for post-SCF methods is only available under
periodic boundary conditions. While slightly inconvenient for molecular applications,
this is not a problem if the molecule is put in a huge simulation cell. Since
FHI-aims employs localized basis functions, the addition of vacuum around the molecules
comes at virtually zero computational overhead, so that we can easily add
100 angstrom of vacuum or more without any additional cost.
In the control.in
files this is reflected by specifying k_grid 1 1 1
and
symmetry_reduced_k_grid .false.
. To invoke CC-aims after the HF calculation is completed,
one has to add output cc4s
to the control.in
file and one has to ensure
that the ScaLAPACK version of FHI-aims is employed via KS_method parallel
. After additionally
disabling relativistic corrections (relativistic none
) and ensuring charge neutrality of the
paracetamol molecule (charge 0
), the
minimal control.in
file to generate the necessary files for a CC4S calculation looks like this
=== "control.in"
xc hf
frozen_core_postSCF 1
RI_method LVL
k_grid 1 1 1
symmetry_reduced_k_grid .false.
output cc4s
KS_method parallel
relativistic none
charge 0
The NAO-VCC-nZ basis sets that we are going to use grow very quickly in size with increasing
n. It can happen, particularly for n=4,5, that the basis set becomes slightly
ill-conditioned. By specifying override_illconditioning .true.
, one can avoid that FHI-aims
aborts the calculation. However, make sure that the ill-conditioning is indeed a result
of the basis set size and not of any other pathological computational parameter choice.
Finally, as we intend to use the computed HF quantities in a subsequent MP2 calculation, we want
to ensure that sufficiently tight convergence criteria are met, for which we add a few more
parameter specifications. As a result the complete set of parameters in the control.in
file are
=== "control.in"
xc hf
frozen_core_postSCF 1
RI_method LVL
k_grid 1 1 1
symmetry_reduced_k_grid .false.
output cc4s
KS_method parallel
relativistic none
charge 0
override_illconditioning .true.
sc_accuracy_rho 1E-6
sc_accuracy_eev 1E-4
sc_accuracy_etot 1E-6
The second half of the control.in
file contains the basis set information. As noted before, we
are going to use the NAO-VCC-nZ basis sets which can be found in the species_defaults
subdirectory of the FHI-aims code. In the case of paracetamol, which consists of C, H, O and N,
the NAO-VCC-2Z basis set can be added to the control.in
file via
cat control.in ~/FHIaims_fixed_elsi/FHIaims/species_defaults/NAO-VCC-nZ/NAO-VCC-2Z/06_C_default ~/FHIaims_fixed_elsi/FHIaims/species_defaults/NAO-VCC-nZ/NAO-VCC-2Z/01_H_default ~/FHIaims_fixed_elsi/FHIaims/species_defaults/NAO-VCC-nZ/NAO-VCC-2Z/08_O_default ~/FHIaims_fixed_elsi/FHIaims/species_defaults/NAO-VCC-nZ/NAO-VCC-2Z/07_N_default > control_CHON.in
mv control_CHON.in control.in
Finally, as noted in the beginning, utilization of the RI-LVL scheme in combination with correlated wave function methods requires manual addition of auxiliary basis functions. We have found that the following auxiliary functions generally suffice for main-group elements, to reach auxiliary basis set completeness:
for_aux hydro 4 f 1.0
for_aux hydro 5 g 1.0
for_aux hydro 6 h 1.0
To add them to your calculation, just copy-paste these three functions at the end of the basis set specification of each chemical element.
The geometry.in
file
The geometry.in
file we are going to use looks like this
=== "geometry.in"
lattice_vector 100.00000000 0.00000000 0.00000000
lattice_vector 0.00000000 100.00000000 0.00000000
lattice_vector 0.00000000 0.00000000 100.00000000
atom_frac -0.04903565 0.01241169 -0.00008674 C
atom_frac -0.04348102 -0.00033115 0.00011141 C
atom_frac -0.04074974 0.02367077 -0.00032200 C
atom_frac -0.02691010 0.02187815 -0.00035479 C
atom_frac -0.02135859 0.00910356 -0.00015637 C
atom_frac -0.02955862 -0.00230022 0.00008209 C
atom_frac -0.05989439 0.01355379 -0.00005679 H
atom_frac -0.02044152 0.03063831 -0.00053906 H
atom_frac -0.01054603 0.00826149 -0.00019262 H
atom_frac -0.05010819 -0.00899975 0.00029391 H
atom_frac -0.04555253 0.03652521 -0.00052445 O
atom_frac -0.05524927 0.03625030 -0.00047415 H
atom_frac -0.02514637 -0.01567650 0.00029638 N
atom_frac -0.01292920 -0.02221467 0.00035969 C
atom_frac -0.03266090 -0.02253960 0.00044532 H
atom_frac -0.00009404 -0.01425119 0.00017140 C
atom_frac 0.00062493 -0.00800063 -0.00883349 H
atom_frac 0.00819244 -0.02137940 0.00026640 H
atom_frac 0.00069120 -0.00768504 0.00894304 H
atom_frac -0.01289340 -0.03450510 0.00057082 O
In addition to the atomic coordinates of the paracetamol molecule, our geometry.in
file also
contains the lattice vectors to enclose the molecule in a 100Ax100Ax100A box, which needs to be
done to use the RI-LVL scheme in combination with a post-SCF method.
Running your first MP2 calculation
Note, that due to the increased scaling of the MP2 method with respect to system, most calculation performed in this tutorial require access to a computing cluster, particularly for those calculations involving bigger basis sets.
To maintain some order, you should create a directory called paracetamol_mp2
, which should
contain a subdirectory 2Z
, where our first MP2 calculation will be performed.
The calculation will consist of two parts: We will start with a FHI-aims calculation
to generate the necessary input files for CC4S, which will be done in the aims
subsubdirectory. Subsequently the CC4S calculation will be performed in a separate
CC4S
subsubdirectory.
Start by moving the control.in
and geometry.in
file assembled in the previous
two sections into the aims
directory.
mkdir -p paracetamol_mp2/2Z/aims
cp control.in paracetamol_mp2/2Z/aims
cp geometry.in paracetamol_mp2/2Z/aims
cd paracetamol_mp2/2Z/aims
To run the calculation execute the following command
mpiexec -np 4 /path/to/fhi-aims/executable/aims.xxx.scalapack.mpi.x
This calculation will perform a HF SCF cycle and subsequently invoke CC-aims to generate the necessary input files for CC4S. After the calculation finishes, you should find the following additional files in your calculation directory:
EigenEnergies.yaml
EigenEnergies.elements
CoulombVertex.yaml
CoulombVertex.elements
AuxiliaryField.yaml
State.yaml
Now, everything is set to perform an MP2 calculation (or any other method implemented
in CC4S for that matter). For that leave the aims directory
and create a CC4S
directory
cd ..
mkdir CC4S
cd CC4S
Copy or create links to the yaml
and elements
files created in the aims
directory.
ln -s ../aims/*yaml .
ln -s ../aims/*elements .
The last file you need to run the CC4S calculation, is one more yaml
file that specifies which
method of CC4S shall be used for the calculation. For an MP2 calculation the file looks like this
=== "mp2.yaml"
- name: Read
in:
fileName: "EigenEnergies.yaml"
out:
destination: EigenEnergies
- name: Read
in:
fileName: "CoulombVertex.yaml"
out:
destination: CoulombVertex
- name: DefineHolesAndParticles
in:
eigenEnergies: EigenEnergies
out:
slicedEigenEnergies: EigenEnergies
- name: SliceOperator
in:
slicedEigenEnergies: EigenEnergies
operator: CoulombVertex
out:
slicedOperator: CoulombVertex
- name: VertexCoulombIntegrals
in:
slicedCoulombVertex: CoulombVertex
out:
coulombIntegrals: CoulombIntegrals
- name: SecondOrderPerturbationTheory
in:
slicedEigenEnergies: EigenEnergies
coulombIntegrals: CoulombIntegrals
out:
energy: Mp2Energy
To perform the MP2 calculation, execute the following command
mpiexec -np N /path/to/CC4S/Cc4s -i mp2.yaml
where N specifies the number of cores to be used in parallel. The calculation evaluated below was performed on 2 nodes on the raven MPCDF cluster, which equates to N=144.
After the CC4S calculation is completed, the MP2 correlation energy will be printed to the standard output (albeit with a limited precision)
step: 6, SecondOrderPerturbationTheory
Contracting second order energy...
correlation energy: -1.73466
and - as a double precision real number to cc4s.out.yaml
=== "cc4s.out.yaml"
name: SecondOrderPerturbationTheory
out:
energy:
correlation: -1.7346634404468899
in Hartree energy units.
To obtain the total MP2 energy, the total HF energy needs to be added to the MP2 correlation energy. The total HF energy can be found in the output file of the previous FHI-aims calculation either at the end of the final SCF step
| Total energy : -512.54583727 Ha -13947.08185176 eV
in both Hartree and eV units
or at the end of the calculation (only in eV units)
------------------------------------------------------------------------------
Final output of selected total energy values:
The following output summarizes some interesting total energy values
at the end of a run (AFTER all relaxation, molecular dynamics, etc.).
| Total energy of the DFT / Hartree-Fock s.c.f. calculation : -13947.081851759 eV
| Final zero-broadening corrected energy (caution - metals only) : -13947.081851759 eV
| For reference only, the value of 1 Hartree used in FHI-aims is : 27.211384500 eV
Hence, we find that for the paracetamol molecule using the NAO-VCC-2Z basis set, the MP2 correlation energy is -1.73466 Ha, while the total MP2 energy is -512.54584 + (-1.73466) = -514.2805 Ha.
Reducing memory overhead - the optimized auxiliary field vertex
If you look at the files that have been generated by the CC-aims interface, the biggest one by
far is the CoulombVertex.elements
file, which in the case of the previous calculation
takes up almost 9 GB. Indeed, for calculations involving bigger basis sets the
memory demand of the Coulomb vertex grows very quickly.
To reduce this memory overhead, CC-aims can truncate the Coulomb vertex by essentially performing a principal component analysis. The resulting quantity is called the optimized auxiliary field
Coulomb vertex. To invoke this truncation one of two key-value pairs need to be added to the
control.in
file. One can either specify the smallest singular value <thr>
, below which the
Coulomb vertex is discarded via
cc4s_screen_thresh <thr>
or one can explicitely state how many of the biggest singular values are to be kept via
cc4s_screen_num <num>
,where <thr>
is a double precision real number and <num>
is an integer.
If one now adds cc4s_screen_thresh 1.0d-3
to the previous control.in
file
and repeats the MP2 calculation, one ends up with an MP2 correlation energy of
name: SecondOrderPerturbationTheory
out:
energy:
correlation: -1.7346596876159019
which is almost identical to the previous value obtained with the full Coulomb vertex. The size of the Coulomb vertex, however, is reduced to 1.6 GB.
In a practical calculation, one can determine the maximal value for <thr>
using
a small basis set like the NAO-VCC-2Z basis and determine the biggest value of <thr>
for which the quantity of interest (here: MP2 correlation energy) changes sufficiently little.
In this tutorial, we keep this threshold at a conservative value of 1d-3. Note, however,
that while in the present case we only compute the MP2 correlation energy of a single
system, oftentimes one is interested in the difference of two energies (e.g reaction energies,
adsorption energies, phase transition energies, etc.). Due to error compensation,
it is expected that in such cases, the <thr>
value can be chosen even bigger without
a significant loss in accuracy. Testing the convergence of the correlation energy (difference)
as a function of <thr>
can be done quickly and can substantially decrease the
memory and computational overhead of the calculation.
Basis set convergence and extrapolation
As has been noted in the beginning of this tutorial, correlated wave function methods including
MP2, RPA and CC are known to converge very slowly with basis set size. To demonstrate that,
we will repeat the previous MP2 calculation with the NAO-VCC-3Z, NAO-VCC-4Z
and NAO-VCC-5Z basis set. For each of these calculations you should create a new
subdirectory in paracetamol_mp2
mkdir -p paracetamol_mp2/3Z/
mkdir -p paracetamol_mp2/4Z/
mkdir -p paracetamol_mp2/5Z/
and repeat the remaining calculation procedure for the three bigger basis set by including
the correct basis set file from the species_defaults
directory in the control.in
file
(and remember to include the manual addition of the three for_aux
entries for each element).
Be aware, that the FHI-aims calculations for these basis sets will be more demanding and will
require more computational resources. A summary of the required resources and the resulting
MP2 correlation energies in given in the table below
Basis set | MP2 corr. energy (mHa/atom) | Coulomb vertex size (GB) | #Cores | Time for FHI-aims (min) |
---|---|---|---|---|
NAO-VCC-2Z | -86.733 | 1.6 | 144 | 3.2 |
NAO-VCC-3Z | -99.368 | 14 | 288 | 5.1 |
NAO-VCC-4Z | -104.386 | 83 | 288 | 20.0 |
NAO-VCC-5Z | -106.783 | 407 | 576 | 46.25 |
These results show very clearly that even going from the NAO-VCC-4Z to the NAO-VCC-5Z basis set still changes the MP2 correlation energy by almost 2.4 mHa/atom (=65 meV/atom). Thanks to the correlation-consistency of the NAO-VCC-nZ basis sets, we can make use of the two-point extrapolation technique introduced in the beginning of this tutorial. Since we have computed the results for n=2,3,4,5, we can compute the CBS extrpolation for CBS[23], CBS[34] and CBS[45], the latter of which we will use as our reference.
Extrapolation | MP2 corr. energy (mHa/atom) |
---|---|
CBS[23] | -104.688 |
CBS[34] | -108.047 |
CBS[45] | -109.298 |
These results show that the extrapolated values of the MP2 correlation energy converge faster, so that the remaining difference between the CBS[34] and the CBS[45] estimate is 1.25 mHa/atom (= 34 meV/atom). It is very important to emphasize again, that in practical applications where one is generally interested in the difference between two energies, the basis set incompleteness error will be even smaller due to error compensation.
One also has to realize, that in many practical applications (even more so for CC calculations), the NAO-VCC-5Z will be unfeasible due to the computational overhead. In these cases, the CBS[34] estimate constitutes the best compromise between accuracy and affordability.