
Problem with running FoF Halo Finder in parallel #161

Open
Anirban-096 opened this issue Dec 12, 2022 · 4 comments

@Anirban-096

I have Gadget-2 snapshots (each snapshot is split across 16 files, snapshot_XXX.0 to snapshot_XXX.15) containing 1024^3 particles inside a periodic box of side length 160 cMpc/h.

I want to run the FoF halo finder on these snapshots with a criterion of a minimum of 10 dark-matter particles per halo, so I changed the “nMembers” variable from 8 to 10 (on line 34 of EnzoFOF.c and of fof_main.c) while installing from source.

I ran the yt FoF halo finder on my snapshots, first in serial mode (i.e. without yt.enable_parallelism()) and then in parallel mode (by adding the line yt.enable_parallelism()), with a different number of processors each time (nprocs = 8, 9, 11, 12).

The Python code used for the serial/parallel runs is given below.

SERIAL yt-FOF Code :

import yt
import numpy as np
from yt.extensions.astro_analysis.halo_analysis import HaloCatalog

snum = 41
snapfile = 'test_snaps_serial/snapshot_' + ('000' + str(snum))[-3:] + ".0"
print("-----------------------(start)----------------------------")
print("SNAP = ", snapfile)
data_ds = yt.load(snapfile)
hc = HaloCatalog(data_ds=data_ds, finder_method="fof",
                 finder_kwargs={"ptype": "Halo", "link": -0.0001953125},
                 output_dir="halocats_serial")
hc.create()

PARALLEL yt-FOF Code :

import yt
import numpy as np
from mpi4py import MPI
from yt.extensions.astro_analysis.halo_analysis import HaloCatalog

yt.enable_parallelism()

snum = 41
snapfile = 'test_snaps_parallel/snapshot_' + ('000' + str(snum))[-3:] + ".0"
data_ds = yt.load(snapfile)
hc = HaloCatalog(data_ds=data_ds, finder_method="fof",
                 finder_kwargs={"ptype": "Halo", "padding": 0.02, "link": -0.0001953125},
                 output_dir="halocats_parallel")
hc.create()
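
(For reference, a script with yt.enable_parallelism() is launched under MPI rather than plain python, e.g. something like mpirun -np 8 python fof_parallel.py; the script name here is just a placeholder for whatever the code above is saved as, and the -np value is the nprocs quoted in the table below.)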

The number of halos identified by the serial run differs from the total number identified by the parallel runs.
In fact, among the various parallel runs, the number of identified halos depends on the number of processors used for the parallelisation.

Mode      Nprocs    Total no. of particles filled    Total no. of halos identified
-----------------------------------------------------------------------------------
Serial    -         1073741824                       1533386
Parallel  8         910287624                        1315510
Parallel  9         388695489                        550452
Parallel  11        338752779                        483369
Parallel  12        579769833                        841258

The number of halos identified in the yt-FoF serial-mode halo catalogs and the halo statistics (halo mass function) match very accurately with those returned by other FoF halo finders (such as P-FoF) run on the same snapshot file. So the serial runs are fine; the problem lies in the parallel-mode runs.

From the terminal outputs (see attached files here), I noticed that the total number of particles being “filled in” (summed over all processors used) is not the same as the original particle count (= 1024^3 = 1073741824), and this may be causing the discrepancy in the halo counts.

I am unable to work out how to fix this issue, or where exactly I am going wrong in the parallelism setup (if that is the case). Any suggestions or help in this regard would be highly appreciated.

Thanks and Regards,
Anirban

@brittonsmith
Member

Hi @Anirban-096,

This is interesting; I've not seen behavior like this before. My first thought is that you might want to be careful using 9, 11, or 12 processors, because this results in odd partitioning of the domain. For example, 11, being prime, will result in 11 divisions along a single axis. It's better to go with factors of 2, or even better, powers of 2, to get simpler geometries for the domain decomposition. This doesn't explain the behavior you're seeing, just something to keep in mind.

As you point out, the total number of particles filled is telling. The place I would check first is in the yt source. In yt/utilities/parallel_tools/parallel_analysis_interface.py, have a look at the function called partition_index_3d. This is called within yt_astro_analysis/halo_analysis/halo_finding/halo_objects.py to create the individual subregions over which each processor will do halo finding. The function returns left and right edges for the individual subregions. I imagine if you stick some print statements in there, it should be obvious whether the subregions fully cover the domain. Can you have a look at this and let us know what you find?
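
Something along these lines would make the check easy to eyeball (just a sketch; LE and RE here stand in for whatever the arrays of subregion left/right edges are called at that point in the source):

import numpy as np

def check_coverage(LE, RE, domain_left, domain_right):
    # LE, RE: (nprocs, 3) arrays of subregion left/right edges in code units.
    LE, RE = np.asarray(LE), np.asarray(RE)
    print("sum of subregion volumes:", np.sum(np.prod(RE - LE, axis=1)))
    print("domain volume           :", np.prod(np.asarray(domain_right) - np.asarray(domain_left)))
    print("min left edges :", LE.min(axis=0), "(should match the domain left edge)")
    print("max right edges:", RE.max(axis=0), "(should match the domain right edge)")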

@Anirban-096
Author

Anirban-096 commented Dec 15, 2022

Hi @brittonsmith,
Thanks for your reply.
I take your point about choosing the number of processors wisely, so I have just worked with nprocs = 8.
I inserted print statements after line 954 and line 985 in halo_objects.py and ran FoF in parallel. The terminal outputs (arranged processor-wise for easy visual inspection) are attached here.
The subregions do cover the entire domain, albeit with marginal overlap between them due to the volume padding (as inferred from the output printed after line 985).

Another thing I realized is that changing the value of the "padding" variable also changes the number of particles filled into each processor. Earlier I was using the yt default of 0.02, which may have been too small for halo-finding purposes inside the 160 cMpc/h box [= 160 * 1e3 in code units (comoving kpc/h)].
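
For scale (assuming the padding is interpreted in the same code units as the box length, i.e. comoving kpc/h), the default value is a vanishingly small fraction of the box:

box_length = 160.0 * 1.0e3       # 160 cMpc/h expressed in code units (ckpc/h)
default_padding = 0.02           # the yt default I had been passing to the finder
print(default_padding / box_length)   # ~1.2e-7 of the box side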

I have now made new runs with large values of "padding" (~10 * 1e3), parallelized over 16 processors. I will share that output once those runs have completed.
With these large padding values, the total number of particles filled in (summed over all 16 procs) now exceeds 1024^3.
However, I assume the FoF halo finder will not over-count halos, i.e. it won't double-count halos identified in the overlapping space (which also grows with the padding) common to two "padded" subregions.

@Anirban-096
Author

Hi @brittonsmith,

As I mentioned above, I inserted print statements after lines 954 and 985 in halo_objects.py and made new runs with larger values passed to the padding variable in FoFHaloFinder.

I found that the value passed to the padding variable affects the number of particles being "filled in" and also the number of halos identified/saved (see table below).

Hence, I wonder whether one should even expect the total number of particles filled in (summed over all procs used) to match 1024**3, since the particles filled into a given processor are those belonging to that processor's "padded" subregion (right?). Simply summing these values over processors therefore over-counts particles lying in the overlapping padded subregions, which probably explains why changing the padding value also changes the number of particles filled in. Unfortunately, it also changes the number of halos identified, which is worrying.

Another thing I failed to mention in my first comment is that for the parallel runs I was overriding the internal calculation of the linking length and passing it directly in the function call.
For example:
hc = HaloCatalog(data_ds=data_ds,finder_method="fof",finder_kwargs={"ptype": "Halo", "padding": 0.02, "link":-0.0001953125},output_dir="halocats_parallel")

The reason for doing this was that the internally calculated linking length depended on the number of procs used. For nprocs=16 it was calculated as 2.503e-04, while for nprocs=8 it was 2.064e-04, and so on. Only in the serial case was the internally calculated linking length correct (i.e. 1.953e-04 = 0.2 * 1/1024).
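
(The serial value is just 0.2 of the mean interparticle separation in units of the box length; a trivial check of the numbers quoted above:)

n_per_side = 1024
link = 0.2 * (1.0 / n_per_side)
print(link)   # 0.0001953125, i.e. the |link| value passed via finder_kwargs above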

In the new runs with increased values of padding, I reverted to letting the code calculate the linking length on its own.
hc = HaloCatalog(data_ds=data_ds,finder_method="fof",finder_kwargs={"ptype": "Halo", "padding":10000},output_dir="halocats_parallel_pad10k")

Additionally, I inserted a print statement after line 965 in halo_objects.py to display the total number of particles over all processors (hereafter "npart"), which is the number used in the linking-length calculation.

It turns out that this number "npart" does not depend on the padding value (unlike the "filled in" count above) and probably better reflects the fact that not all 1024^3 particles are being sent out during the parallelization step. However, "npart" does depend on the number of processors used.

Nprocs   Padding value   No. of particles "filled in"   npart         No. of halos saved
                         (summed over all nprocs)       [line 965]    (** see footnote)
------------------------------------------------------------------------------------------
8        4,000           1057555906 (< 1024**3)         910285593     1427388
8        10,000          1549317386 (> 1024**3)         910285593     1495690
16       4,000           650563869  (< 1024**3)         510367811     1045662
16       10,000          1130112219 (> 1024**3)         510367811     1232102
------------------------------------------------------------------------------------------
** Footnote: this count is less than the sum over processors of the
"Number of groups:" reported by each processor after the FoF step.

The detailed terminal outputs for these runs (showing the left and right edges of each subregion, npart, halo counts, etc.) can be found in this folder

@brittonsmith, I would very much appreciate your suggestions/clarifications on my last two comments above. Thanks!

@brittonsmith
Member

I can see several possibilities for potential issues, but I am not certain of any of them. The problem may be in the I/O, or possibly in how the Python wrapper to the halo finder converts positions into the correct units for the internal C code, or perhaps even an issue with periodicity. I doubt the issue is with the core halo-finding code, as that has not been altered in many years and has never shown issues like this before.

We can start with the I/O by confirming that reading data from a series of subvolumes, similar to what the halo finder does, results in reading all the particles. Can you put together a script that makes 8 region objects (8 sub-cubes, each half the size of the box, together spanning the full domain), reads the particles in each, counts them, and sums the counts? With no padding, the sum should be exactly the total number of particles in the box. If not, then we have a problem. Then you might try padding the regions in the same way the halo finder does and confirm that each region still reads the same number of particles as it does within the halo finder.
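
Something like the following is what I have in mind (a rough sketch only; the snapshot path and the "Halo" particle type are taken from your scripts above, and the octant construction is an illustration rather than the exact decomposition the halo finder uses):

import itertools
import yt

ds = yt.load("test_snaps_parallel/snapshot_041.0")

# Domain edges and centre as plain arrays in code units.
LE = ds.domain_left_edge.to_value("code_length")
RE = ds.domain_right_edge.to_value("code_length")
C = ds.domain_center.to_value("code_length")

total = 0
# Loop over the 8 octants of the domain; each sub-cube is half the box on a side.
for flags in itertools.product((0, 1), repeat=3):
    left = [C[d] if f else LE[d] for d, f in enumerate(flags)]
    right = [RE[d] if f else C[d] for d, f in enumerate(flags)]
    center = [(l + r) / 2.0 for l, r in zip(left, right)]
    reg = ds.region(center, left, right)
    n = reg["Halo", "particle_position_x"].size
    print(flags, n)
    total += n

print("sum over octants :", total)
print("expected         :", 1024 ** 3)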

Please, let me know what you find. Also, please note that I am currently quite unwell and may take some time to respond. I will also be traveling and on leave from the middle of next week.
