Pickling and unpickling messes up Index #87

Open
bobtrekdrop opened this issue Mar 14, 2017 · 5 comments


bobtrekdrop commented Mar 14, 2017

Pickling and unpickling renders the Index unusable. test_pickle.py does not test the right properties; it should check Index.bounds as well. Code to reproduce:

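import pickle  # cPickle on Python 2 behaves the same way here

import numpy as np
from rtree import index

# Grid of 0.125-degree cells covering lon [-2, 9) and lat [50, 54)
lonR = [-2., 9.]
latR = [50., 54.]
lonD = 0.125
latD = 0.125
lon = np.arange(lonR[0], lonR[1], lonD)
lat = np.arange(latR[0], latR[1], latD)

# Insert one bounding box per grid cell
idx = index.Index()
i = 0
for y in lat:
    for x in lon:
        idx.insert(i, (x, y, x + lonD, y + latD))
        i += 1

# Round-trip through pickle; the two bounds are expected to match
unpickled = pickle.loads(pickle.dumps(idx))
print(idx.get_bounds())
print(unpickled.get_bounds())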

ghost commented Dec 1, 2017

Would really love to see this fixed/implemented. I also notice that specifying an index name doesn't seem to cause the index to be stored at a particular file path? It would be useful to be able to find the serialised data somehow.

I've also tried pickling with dill and it gives the same issue.

Wish I could help but I couldn't understand the serialisation code at all.


cgmike commented Feb 6, 2020

I am having exactly the same problem here on Ubuntu 18.04 with Python 3.6.
It would be extremely helpful if this gets fixed.


SergeBouchut commented Nov 28, 2021

Is this issue planned to be solved? It would be super useful to keep control over the "right time" and the conditions under which the index has to be serialized.

@adamjstewart
Collaborator

We just ran into this as well. TorchGeo uses rtree to store a database of raster file bounding boxes. However, if you try to use a parallel data loader, the multiprocessing library will pickle and unpickle the index, and all entries in the index will be removed.

I tried digging into this but couldn't get any further than anyone else. I guess it would be useful to understand why we need to delete the state handle instead of just pickling it. It seems that if I remove the del state["handle"] line in Index.__getstate__ it causes a segfault, but I don't know why that is.
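
To make the data-loss side of this concrete, here is a toy model of an object whose data lives behind a handle. It is not rtree's actual code: `HandleBackedIndex`, `_NATIVE_STORES` and the `dimension` attribute are hypothetical stand-ins, and the only detail taken from the thread is that `__getstate__` deletes `state["handle"]`. It does not explain the segfault; it only shows why a fresh handle on unpickling yields an empty index.

```python
import pickle

# Hypothetical stand-in for memory owned by a native library: entries live
# in this registry, and Python objects only hold an integer handle into it.
_NATIVE_STORES = {}


class HandleBackedIndex:
    """Toy index whose data lives behind a handle, not in the instance."""

    def __init__(self):
        self.dimension = 2  # plain attribute that survives pickling
        self._open_new_store()

    def _open_new_store(self):
        self.handle = len(_NATIVE_STORES)
        _NATIVE_STORES[self.handle] = {}

    def insert(self, item_id, box):
        _NATIVE_STORES[self.handle][item_id] = box

    def __len__(self):
        return len(_NATIVE_STORES[self.handle])

    def __getstate__(self):
        # Drop the handle: the entries behind it are not part of the state.
        state = self.__dict__.copy()
        del state["handle"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then open a fresh, empty store.
        self.__dict__.update(state)
        self._open_new_store()


original = HandleBackedIndex()
original.insert(0, (0.0, 0.0, 1.0, 1.0))
restored = pickle.loads(pickle.dumps(original))
print(len(original), len(restored))  # 1 0
```

Only the plain attributes make the round trip. Anything reachable solely through the handle is left behind, which matches the symptom of an unpickled index with no entries.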

@adamjstewart
Collaborator

Update: I found a hack to make this work: #197. I don't like it, but it's better than nothing.
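
For anyone who needs something in the meantime, one general pattern is to pickle the entries themselves and replay the inserts on load. The sketch below is a toy model of that pattern and is not necessarily what #197 does: `RebuildingIndex` and `_NATIVE_STORES` are hypothetical names, with a dict standing in for the native index.

```python
import pickle

# Hypothetical stand-in for native memory: data lives here, keyed by handle.
_NATIVE_STORES = {}


class RebuildingIndex:
    """Toy index that pickles its entries and replays them on load."""

    def __init__(self):
        self.handle = len(_NATIVE_STORES)
        _NATIVE_STORES[self.handle] = {}

    def insert(self, item_id, box):
        _NATIVE_STORES[self.handle][item_id] = box

    def get_bounds(self):
        # Overall bounding box as [minx, miny, maxx, maxy].
        boxes = list(_NATIVE_STORES[self.handle].values())
        return [
            min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes),
        ]

    def __getstate__(self):
        # Copy the entries out of the native store into plain Python data.
        return {"entries": list(_NATIVE_STORES[self.handle].items())}

    def __setstate__(self, state):
        # Open a fresh store in this process, then re-insert every entry.
        self.__init__()
        for item_id, box in state["entries"]:
            self.insert(item_id, box)


original = RebuildingIndex()
original.insert(0, (-2.0, 50.0, -1.875, 50.125))
original.insert(1, (8.875, 53.875, 9.0, 54.0))
restored = pickle.loads(pickle.dumps(original))
print(original.get_bounds() == restored.get_bounds())  # True
```

The trade-off is that every pickle carries all entries and every unpickle pays for a full rebuild, which could be noticeable for large indexes or many worker processes.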
