Inconsistent unpickle errors when returning objects from intersection #103

Open
tomplex opened this issue Apr 12, 2018 · 2 comments

tomplex commented Apr 12, 2018

Hello,

I'm using rtree in an application I'm writing and running into some odd issues with the objects that I'm passing in to the index when creating it. Unfortunately I can't consistently replicate the issue (even with the same dataset), but I will provide an example of what I'm seeing in the hopes that someone has seen it before / can help me debug.

I'm bulk-loading two indexes, each from a different generator of data, and passing each row's ID (an alphanumeric primary key) as the object stored in the index. The ID is always a valid ASCII string with no special characters, just letters and numbers. The example below is contrived and not necessarily representative of the data I'm working with, which I unfortunately cannot share, but it is a small-scale version of what I'm doing.

from rtree import index
from shapely import wkt

class Data:
    def __init__(self, id, _wkt):
        self.id = id
        self.geom = wkt.loads(_wkt)

my_data = [Data('B0001', 'Polygon((0 0, 0 1, 1 1, 1 0, 0 0))'), Data('B0002', 'Polygon(...)')]

def loader():
    for i, obj in enumerate(my_data):
        yield i, obj.geom.bounds, obj.id

idx = index.Index(loader())

When I query the index for intersections, I pass the objects='raw' argument so that I get back the IDs of the matching records:

intersections = idx.intersection((0.5, 0.5, 0.5, 0.5), objects='raw')

When I do this, I intermittently get one of several similar but not identical errors:

Traceback (most recent call last):
  File "scratch.py", line 46, in <module>
    intersecting_parcels = bldg.intersection(parcels)
  File "/apn_updater/latitude/latitude.py", line 227, in intersection
    return other.intersection(self)
  File "/apn_updater/latitude/latitude.py", line 179, in intersection
    return [self[i] for i in list(self._rtree.intersection(other.bounds, objects='raw'))]
  File "/usr/local/lib/python3.6/site-packages/rtree/index.py", line 501, in _get_objects
    yield self.loads(data)
  File "/usr/local/lib/python3.6/site-packages/rtree/index.py", line 281, in loads
    return pickle.loads(string)
_pickle.UnpicklingError: invalid load key, 'x'.
Traceback (most recent call last):
  File "scratch.py", line 59, in <module>
    intersecting_buildings = [b for b in parcel.intersection(buildings) if parcel.intersects(b) and b.apn == parcel.apn]
  File "/apn_updater/latitude/latitude.py", line 221, in intersection
    return self.geom.bounds
  File "/apn_updater/latitude/latitude.py", line 173, in intersection
    """
  File "/usr/local/lib/python3.6/site-packages/rtree/index.py", line 501, in _get_objects
    yield self.loads(data)
  File "/usr/local/lib/python3.6/site-packages/rtree/index.py", line 281, in loads
    return pickle.loads(string)
_pickle.UnpicklingError: invalid load key, '8'.
Traceback (most recent call last):
  File "scratch.py", line 45, in <module>
    intersecting_parcels = bldg.intersection(parcels)
  File "/apn_updater/latitude/latitude.py", line 221, in intersection
    return other.intersection(self)
  File "/apn_updater/latitude/latitude.py", line 173, in intersection
    return [self[i] for i in list(self._rtree.intersection(other.bounds, objects='raw'))]
  File "/usr/local/lib/python3.6/site-packages/rtree/index.py", line 501, in _get_objects
    yield self.loads(data)
  File "/usr/local/lib/python3.6/site-packages/rtree/index.py", line 281, in loads
    return pickle.loads(string)
ValueError: unsupported pickle protocol: 180

I am not overriding the loads or dumps methods of the Rtree (though I have tried overriding them to work around this issue). Not being able to reproduce the error consistently, even when running the same application on the same data, has been frustrating. Could this be a result of my using multiple Index objects at once?
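For reference, both exception types in the tracebacks can be reproduced with the standard library alone, which may help narrow down what is reaching pickle.loads. The sketch below assumes the index stores the ID via pickle.dumps (the counterpart of the pickle.loads call shown in the traceback) and then corrupts the leading bytes by hand. It does not involve rtree at all, so it only illustrates what the errors mean, not why the bytes would be damaged.

```python
import pickle

# Assumption: the stored payload for an ID is what pickle.dumps produces.
good = pickle.dumps('B0001')
assert pickle.loads(good) == 'B0001'


def try_load(data):
    """Return the unpickled object, or the exception raised while loading."""
    try:
        return pickle.loads(data)
    except Exception as exc:
        return exc


# First byte replaced by b'x', which is not a pickle opcode:
# pickle reports an invalid load key.
bad_key = try_load(b'x' + good[1:])

# First byte is the PROTO opcode (0x80), but the protocol number that
# follows is garbage (0xb4 == 180): pickle reports an unsupported protocol.
bad_proto = try_load(b'\x80\xb4' + good[2:])

print(type(bad_key).__name__, bad_key)
print(type(bad_proto).__name__, bad_proto)
```

In other words, errors like these are what pickle raises when the first bytes it is handed are not the start of a pickle stream, which is consistent with the payload being overwritten or read from the wrong place rather than with a problem in the IDs themselves.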

Thanks in advance for any help.

@sgillies
Member

@tomplex I'm unsure what's up with pickling here. For what it's worth, I haven't used this in my own applications. Instead I've maintained a mapping of strings (like 'B0001') to ints and have used those ints as Rtree indexes (instead of the one generated by enumerate). I recommend this as a workaround.
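A minimal sketch of that workaround, assuming the records are available as (string key, bounds) pairs. The names records, int_id, loader and keys_for are hypothetical, and the rtree calls are left in comments so the snippet runs without rtree installed:

```python
# Hypothetical records: (string key, bounds as (minx, miny, maxx, maxy)).
records = [
    ('B0001', (0.0, 0.0, 1.0, 1.0)),
    ('B0002', (2.0, 2.0, 3.0, 3.0)),
]

id_for_key = {}  # string key -> integer id used in the index
key_for_id = {}  # integer id -> string key, for translating query results


def int_id(key):
    """Return a stable integer id for a string key, assigning one if new."""
    if key not in id_for_key:
        new_id = len(id_for_key)
        id_for_key[key] = new_id
        key_for_id[new_id] = key
    return id_for_key[key]


def loader(rows):
    """Yield (id, bounds, obj) tuples for bulk loading, with obj=None."""
    for key, bounds in rows:
        yield int_id(key), bounds, None  # no object is stored in the index


def keys_for(hit_ids):
    """Translate integer ids returned by a query back into string keys."""
    return [key_for_id[i] for i in hit_ids]


items = list(loader(records))

# With rtree installed, the same generator would feed the index, and an
# intersection without the objects argument returns the integer ids:
#   idx = index.Index(loader(records))
#   keys_for(idx.intersection((0.5, 0.5, 0.5, 0.5)))
print(keys_for([0]))
```

Because nothing is stored as an object in the index, no pickling or unpickling happens at query time, so this path sidesteps the errors above rather than explaining them.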

Can you provide a little more information?

  • Python version
  • OS
  • is this a multi-threaded app?

I haven't seen any reports of problems with multiple indexes, but I can't rule it out.

Author

tomplex commented Apr 16, 2018

Hey @sgillies, thanks for the response. I started out using rtree the way you mention, but was just messing around with some other usage options as I'm trying to write a library around rtree / shapely for my own use.

To answer your specific questions:

  • Python 3.6.5
  • Debian 8 (jessie), specifically the Docker python3.6 image; Docker is running on macOS High Sierra

The app uses threading at startup to fetch and load data into separate rtrees simultaneously, but multiple threads never load into, or read from, the same index. Here's a small example of what I mean:

from rtree import index
from threading import Thread

class Dataset:
    def load(self, data):
        self._rtree = index.Index(data)
        
    def load_async(self, data):
        self._thread = Thread(target=self.load, args=(data,))
        self._thread.start()


d1 = Dataset()
d2 = Dataset()

d1.load_async(data1)
d2.load_async(data2)

# wait 'till it's done loading

results = d2.intersection('Polygon(...)')

I saw that the libspatialindex library is not thread-safe for inserting or reading, but as far as I understood the comments on a few GitHub issues, this was mostly a problem with multiple threads acting on a single rtree instance, and the people reporting it were seeing segfaults and other catastrophic errors, not pickle errors. That said, I understand if your answer now is "don't use rtree in any multithreaded context". =)
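One conservative variation on the example above, offered as a sketch rather than a confirmed fix: join each loader thread before querying, and serialize the index builds behind a single lock so that only one thread is inside the index-building call at a time. The build_index argument and the wait method are hypothetical; build_index stands in for index.Index so the snippet runs without rtree, and whether the lock has any effect on the pickle errors is untested here.

```python
import threading

# Assumption: one process-wide lock so index builds never overlap.
_build_lock = threading.Lock()


class Dataset:
    def __init__(self, build_index):
        # build_index is a hypothetical stand-in for index.Index.
        self._build_index = build_index
        self._rtree = None
        self._thread = None

    def load(self, data):
        # Only one thread at a time may build an index.
        with _build_lock:
            self._rtree = self._build_index(data)

    def load_async(self, data):
        self._thread = threading.Thread(target=self.load, args=(data,))
        self._thread.start()

    def wait(self):
        """Block until the background load, if any, has finished."""
        if self._thread is not None:
            self._thread.join()
            self._thread = None


# Usage with list() in place of a real index constructor.
d1 = Dataset(build_index=list)
d2 = Dataset(build_index=list)

d1.load_async(iter([1, 2, 3]))
d2.load_async(iter([4, 5]))

# Explicitly wait for both loads before touching either index.
d1.wait()
d2.wait()
print(d1._rtree, d2._rtree)
```

The lock gives up the parallelism of the two builds, so this is mainly useful as a diagnostic: if the errors disappear with the builds serialized, concurrent index construction becomes the likely suspect.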
