Oxidize `QuantumCircuit._data` and intern `CircuitInstruction` args #10827

kevinhartman · 2023-09-12T23:32:17Z

Summary

Introduces a new list-like Rust data typeCircuitData, used to more efficiently store a QuantumCircuit's instruction listing in memory. This is used along with another new Rust type InternContext, which serves as a cache for argument lists that use the same sequence of numeric bit identifiers.

The basic idea is that when a CircuitInstruction instance is appended to QuantumCircuit, the underlying CircuitData stores just a reference to its operation on the Python heap and two u32 IDs (handles into the InternContext) which correspond to its qubits and clbits argument lists. When CircuitData elements are accessed (e.g. via QuantumCircuit._data), they get converted back to CircuitInstruction instances.

Details and comments

These changes leave the QuantumCircuitData wrapper entirely in place, with a few internal modifications to side-step implementing things like __rmul__ for the Rust-side CircuitData.

The InternContext deals in terms of bit indices (not Python references), so it's naturally able to share argument lists across circuits, and it doesn't distinguish between qubit and clbit arg lists. Its only limitation is really the number of unique argument sequences it can store, which right now is 2^32 since we're using a slotted approach with u32 IDs.

At present, new QuantumCircuit instance manage their own InternContext, ~~and copies of a circuit share the context of the original~~ (EDIT: for this PR, no sharing is done since concurrent implications need to be addressed.) In theory, we could use another mechanism to share the same InternContext across (all | some) new circuits (e.g. using a Python singleton), but that's probably only worth exploring if we discover that memory pressure across circuits is a big bottleneck for users down the line.

Intern context implementation

When an argument list is interned (via InternContext::intern(arg_list: Vec<BitType>)-> IndexType), the provided argument list is looked up in a mapping of arg list to slot ID, a u32. If the mapping already exists, the returned slot ID identifies a slot which already contains the argument list corresponding to arg list. ~~The number of uses (also a field of the slot) is incremented.~~ If the mapping does not exist, the next available slot is claimed, the arg list is moved into it, ~~and the new slot's use count is set to 1~~ (EDIT: use count tracking has been cut from this PR).

An interned argument list is retrieved via its slot ID (InternContext::lookup(slot: IndexType) -> &Vec<BitType>).

When a client of InternContext knows that it no longer needs a particular use of a slot, it can call InternContext::drop_use(slot: IndexType) to decrement the use count, unallocating the slot if it is no longer in use.

~~The (likely) only client of InternContext is CircuitData, which has been written with care to drop intern uses as CircuitInstructions are removed from it, and on its own destruction.~~ (EDIT: Use count tracking cut from this PR)

Memory profiling

I profiled circuit construction with memray of a circuit with about 5 million parameter-less gates and found that peak memory use was reduced by about 400 MiB (from 1.7 GiB to 1.3 GiB). With #10314 applied, the same circuit comes down to just 192 MiB peak memory use (most of our memory pressure comes from operation instances, with argument lists being the second worst offender).

Todo

Memory profiling for common operations on large circuits. I've only profiled circuit construction so far.
Performance profiling.
Remove instances of us copying the contents of QuantumCircuit._data into Python lists in various places.
Run ASV.
Rust-side unit testing for InternContext.
~~Handle case of intern context full.~~ => likely not necessary with 2^32 slots.

jakelishman

I'm really pretty impressed by quite how few changes were needed, even within QuantumCircuit itself! This in principle looks super exciting to me, and I'm looking forwards to having this in properly (though of course I know we've got a way to go yet).

I didn't review in full detail, more just a fairly quick scan through. A common comment that I didn't want to put everywhere was that I think we pretty much should never be taking Vec or returning it at the Python/Rust API boundary; it always necessitates a copy, when we could just act on an incoming PyList/PyTuple/PySequence natively; the cache locality of the objects is the same as Vec, except we don't need to copy onto the Rust heap for an object we mostly just throw away / move somewhere else.

crates/accelerate/src/quantum_circuit/circuit_data.rs

jakelishman · 2023-09-13T12:54:11Z

crates/accelerate/src/quantum_circuit/circuit_data.rs

+                    |fn_idx_to_bit: &PyObject, args: &Vec<BitType>| -> PyResult<Vec<PyObject>> {
+                        args.iter()
+                            .map(|i| fn_idx_to_bit.call1(py, (*i,)))
+                            .collect()
+                    };


This is the kind of thing where it would be much more convenient to have fn_idx_to_bit as a known PyList (or PyTuple) - we'd be able to directly index into the memory, rather than needing to indirect via a Python-space call.

We'd also probably be better off here collecting into a PyTuple allocated on the Python heap directly rather than a Rust-heap Vec that will need copying over.

We'd also probably be better off here collecting into a PyTuple allocated on the Python heap directly rather than a Rust-heap Vec that will need copying over.

That sounds like a good idea to me, but I can't actually figure out a way to do that without first collecting the elements into a Vec. Collecting into a PyResult<&PyTuple> or PyResult<Py<PyTuple>> doesn't seem to work. The difficulty is that we're collecting a sequence of Results into a single Result containing a sequence.

If we were constructing a PyList, I could pre-create an empty list and add these items to it in a for-loop. But since PyTuple is immutable, I don't think that's an option here.

It would certainly be possible with the raw C API, so this may just be a case where if it does turn out to be performance-important, we might want to drop to using the C-API FFI slightly more directly. I'm fairly sure it can be done in a perfectly safe way (albeit using unsafe blocks to call the FFI functions), so we might even want to implement the collection trait back up to PyO3 if so.

Yeah, it seems like an obvious gap in PyO3 to not have the collection traits for these. Would certainly be a nice contribution upstream 😄.

qiskit/circuit/controlflow/if_else.py

qiskit/circuit/quantumcircuit.py

qiskit/circuit/quantumcircuitdata.py

mtreinish

Overall this looks great to me! Thanks for all the hard work diving into this. I just had a few minor questions and comments inline, but overall this looks very close to being ready from my perspective.

crates/accelerate/src/quantum_circuit/circuit_data.rs

mtreinish · 2023-11-14T13:58:51Z

crates/accelerate/src/quantum_circuit/circuit_data.rs

+            qubit_indices_native: HashMap::new(),
+            clbit_indices_native: HashMap::new(),


Could you save on the reallocation cost here by moving checking qubits and clbits before creating the object and if they're lists then pulling the len and calling with_capacity()

We could, but it'd probably be important to change qubits and clbits args to be of type PySequence or PyList instead in that case so we guarantee there's an exact size, rather than having special case logic.

The reasoning behind accepting Iterable was to provide some hint to the user that the lists they provide are not used verbatim, but their contents instead copied into the CircuitData and maintained therein.

crates/accelerate/src/quantum_circuit/circuit_data.rs

crates/accelerate/src/quantum_circuit/circuit_instruction.rs

crates/accelerate/src/quantum_circuit/intern_context.rs

releasenotes/notes/bit-interning-35da0aaa76aa7fc5.yaml

InternedInstruction is renamed to PackedInstruction to make it clearer that the code deals in terms of some form of a packed instruction, and that interning of the instruction's qubits and clbits is just an implementation detail of that.

jakelishman

Thank you very much for this, Kevin! I don't have any major comments, I think most of my bits and bobs are just small questions about PyO3 rather than anything else.

crates/accelerate/src/quantum_circuit/py_ext.rs

crates/accelerate/src/quantum_circuit/circuit_data.rs

jakelishman · 2023-11-16T18:43:59Z

crates/accelerate/src/quantum_circuit/circuit_data.rs

+/// As a convenience, :attr:`.CircuitData.qubits` and
+/// :attr:`.CircuitData.clbits` are exposed to Python as ``list`` instances,
+/// which are updated inplace as new bits are added via
+/// :meth:`~.CircuitData.add_qubit` and :meth:`~.CircuitData.add_clbit`.
+/// **DO NOT MODIFY THESE LISTS DIRECTLY**,
+/// or they will become out of sync with the internals of this class.
+/// In this way, a :class:`.CircuitData` owns its ``qubits`` and ``clbits``.


How much of a convenience / performance necessity is it, over us only offering copies of the list? I'm fine if it's necessary, but if it's not, it feels safer not to offer the footgun, and just make what are currently the qubits and clbits attributes methods instead.

Turns out that it wasn't necessary for these lists to be updated in place.

I've changed the logic to instead replace the lists each time new bits are added in f2646b4. This way, client code won't be able to depend on the lists being updated.

Technically, this may be a breaking change since previously QuantumCircuit.{qubits, clbits} would be updated in place as new bits and registers were added to this circuit. Is this worth documenting in the release notes?

I left CircuitData.{qubits, clbits} as attributes rather than methods since the underlying values are cached as bits are added. I'm mildly concerned that changing them to methods might convey to clients that the lists are constructed on the fly and thus expensive to compute, which is not true. I could see an argument where they make more sense as methods to convey that the reference cannot be relied upon, but I've documented this in the docstrings.

crates/accelerate/src/quantum_circuit/circuit_data.rs

crates/accelerate/src/quantum_circuit/circuit_instruction.rs

crates/accelerate/src/quantum_circuit/intern_context.rs

qiskit/circuit/quantumcircuit.py

qiskit/circuit/quantumcircuitdata.py

jakelishman

Thanks so much for the huge amount of work that went into this. This looks ready to merge to me. I know you've got a follow-up in the pipeline, and I think the best thing we can do right now is get this merged and start actually using the new code to get it exercised as widely as we can.

crates/accelerate/src/quantum_circuit/circuit_data.rs

crates/accelerate/src/quantum_circuit/py_ext.rs

jakelishman · 2023-11-17T23:11:44Z

I'm not pressing "merge" yet because I'll give Matt another chance to look first, since there have been significant changes since his review.

mtreinish

This LGTM, thanks for all the hard work implementing this!

mtreinish · 2023-11-20T17:06:10Z

crates/accelerate/src/quantum_circuit/circuit_data.rs

+/// A private enumeration type used to extract arguments to pymethods
+/// that may be either an index or a slice.
+#[derive(FromPyObject)]
+pub enum SliceOrInt<'a> {


Is it private or public? :)

…iskit#10827) * Initial commit. * Fix bugs with slicing impl. * Fix sort, remove dead code. * Use custom cfg flag for debug. * Run fmt. * Add todo. * Revert utils change, not needed anymore. * Use CircuitData in compose. * Revert test stub. * Remove once_cell. Not needed anymore. * Run format. * Fix lint issues. * Use PyTuple in ElementType. * Use native list and dict types for lookup tables. * Implement __traverse__ and __clear__. * Take iterable for extend. Preallocate. * Fix insertion indexing behavior. * Fix typo. * Avoid possible hash collisions in InternContext. * Use () instead of None for default iterable. * Resolve copy share intern context TODO. * Use () instead of empty list in stubed lists. * Use u32 for IndexType. * Resolve TODO in if_else.py. * Fix Rust lint issues. * Add unit testing for InternContext. * Remove print logging. * Fix bug introduced during print removal. * Add helper methods for getting the context in CircuitData. * Fix issue with BlueprintCircuit. * Workaround for CircuitInstruction operation mutability. Fix lint. * Revert "Workaround for CircuitInstruction operation mutability." This reverts commit 21f3e51. * Add _add_ref to InstructionSet. Allows in-place update of CircuitInstruction.operation within a CircuitData. * Exclude CircuitData::intern_context from GC clear. * Fix lint. * Avoid copy into list. * Override __deepcopy__. * Override __copy__ to avoid pulling CircuitData into a list. * Implement copy for CircuitData. * Port CircuitInstruction to Rust. * Use Rust CircuitInstruction. * Optimize circuit_to_instruction.py * Use freelist for CircuitInstruction class. * Remove use count, fix extend, InternContext internal. Previously CircuitData::extend would construct CircuitInstruction instances in memory for the entire iterable. Now, a GILPool is created for each iteration of the loop to ensure each instance is dropped before the next one is created. At most 3 CircuitInstruction instances are now created during the construction and transpilation of a QuantumVolume circuit in my testing. Use count tracking is now removed from InternContext. InternContext is now no longer exposed through the Python API. * Revert to using old extraction for now until we move bits inside CircuitData. * Fix issue with deletion. * Performance optimization overhaul. - Switched to native types for qubits, clbits, qubit_indices, clbit_indices. - Tweaks to ensure that only 1-3 CircuitInstruction instances are ever alive at the same time (e.g. in CircuitData.extend). - Use py_ext helpers to avoid unnecessary deallocations in PyO3. - Move pickling for CircuitData from QC to CircuitData itself. - Add CircuitData.reserve to preallocate space during copy scenarios. * Fix lint. * Attempt to move docstring for CircuitInstruction. * Add missing py_ext module file. * Use full struct for InternedInstruction. * Remove __copy__ from QuantumCircuit. * Improve bit key wrapper name. * Remove opt TODO comment. Will be done in new PR. * Clean up GC cycle breaking. * Add separate method convert_py_index_clamped. * Implement __eq__ instead of __richcmp__. * Avoid 'use'ing SliceOrInt enum. * Port slice conversion to pure Rust. * Clean up InternContext. * Change import order in quantumcircuitdata.py. * Use .zip method on iter(). * Rename get_or_cache to intern_instruction. * Improve error handling. * Add documentation comments for Rust types. * Move reserve method. * Add tests for bit key error. * Localize BlueprintCircuit workaround. * Slice refactoring, fixes, tests. * Fix setitem slice regression, clean up slice testing. * Use Python docstring form for pymethods. * Don't use underscore in BitAsKey type name. * Add release note. * Add upgrade note. * Add error messages for exceeded qubits and clbits. * Use BitType instead of u32. * Improve code comments for extend. * Improve readability with PackedInstruction. InternedInstruction is renamed to PackedInstruction to make it clearer that the code deals in terms of some form of a packed instruction, and that interning of the instruction's qubits and clbits is just an implementation detail of that. * Fix reserve issue. * Use usize for pointer type. * Use copied instead of cloned on Option. * Use .is() instead of IDs. * Convert tuples to list in legacy format. * Remove redundant parens. * Add crate comment for py_ext. * Make CircuitData::qubits and CircuitData::clbits ephemeral.

kevinhartman added 11 commits September 8, 2023 16:39

Initial commit.

e9cf029

Fix bugs with slicing impl.

bee235c

Fix sort, remove dead code.

6b72c7a

Use custom cfg flag for debug.

e8c9a60

Run fmt.

85df501

Add todo.

a35521d

Revert utils change, not needed anymore.

38068ae

Use CircuitData in compose.

9ebc36a

Revert test stub.

d78d5e6

Remove once_cell. Not needed anymore.

aa33e00

Run format.

3837517

jakelishman reviewed Sep 13, 2023

View reviewed changes

kevinhartman added 18 commits September 13, 2023 14:48

Fix lint issues.

697049b

Use PyTuple in ElementType.

bc33882

Use native list and dict types for lookup tables.

cf10769

Implement __traverse__ and __clear__.

65fec56

Take iterable for extend. Preallocate.

dcc31f8

Fix insertion indexing behavior.

9185dc2

Fix typo.

e9bc36b

Avoid possible hash collisions in InternContext.

e1faea3

Use () instead of None for default iterable.

45061e9

Resolve copy share intern context TODO.

6b3a0a0

Use () instead of empty list in stubed lists.

71a23b2

Use u32 for IndexType.

eb35ffb

Resolve TODO in if_else.py.

77210a2

Fix Rust lint issues.

cd2bdc1

Add unit testing for InternContext.

9eb1a57

Remove print logging.

8447c02

Fix bug introduced during print removal.

bafef1d

Add helper methods for getting the context in CircuitData.

1d63a40

kevinhartman added 2 commits November 9, 2023 17:11

Don't use underscore in BitAsKey type name.

f0e6554

Add release note.

0cb4a1d

mtreinish reviewed Nov 14, 2023

View reviewed changes

kevinhartman added 8 commits November 14, 2023 16:58

Add upgrade note.

0f00b87

Add error messages for exceeded qubits and clbits.

cd3b920

Use BitType instead of u32.

9f4f12d

Improve code comments for extend.

5965c6c

Improve readability with PackedInstruction.

d46475a

InternedInstruction is renamed to PackedInstruction to make it clearer that the code deals in terms of some form of a packed instruction, and that interning of the instruction's qubits and clbits is just an implementation detail of that.

Fix reserve issue.

65aa628

Use usize for pointer type.

dab5373

Use copied instead of cloned on Option.

b10d8b4

kevinhartman requested a review from mtreinish November 15, 2023 16:03

jakelishman reviewed Nov 16, 2023

View reviewed changes

kevinhartman added 5 commits November 16, 2023 16:43

Use .is() instead of IDs.

7731670

Convert tuples to list in legacy format.

dad3430

Remove redundant parens.

a39fe80

Add crate comment for py_ext.

0fda537

Make CircuitData::qubits and CircuitData::clbits ephemeral.

f2646b4

jakelishman approved these changes Nov 17, 2023

View reviewed changes

crates/accelerate/src/quantum_circuit/circuit_data.rs Show resolved Hide resolved

crates/accelerate/src/quantum_circuit/py_ext.rs Show resolved Hide resolved

mtreinish approved these changes Nov 20, 2023

View reviewed changes

mtreinish added this pull request to the merge queue Nov 20, 2023

kevinhartman mentioned this pull request Nov 20, 2023

Variable-declaration and storage representation in Qiskit Qiskit/RFCs#50

Merged

Merged via the queue into Qiskit:main with commit f3857f1 Nov 20, 2023
14 checks passed

jakelishman mentioned this pull request Nov 30, 2023

Track runtime variables in control-flow builders #10977

Merged

jakelishman mentioned this pull request Dec 8, 2023

Add Rust-based SparsePauliOp.to_matrix and Miri tests #11388

Merged

jakelishman mentioned this pull request Jan 8, 2024

Fix quadratic construction time of QuantumCircuit #11517

Merged

mtreinish mentioned this pull request Feb 5, 2024

Leverage Rust InternContext for DAGCircuit internals #11721

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Oxidize `QuantumCircuit._data` and intern `CircuitInstruction` args #10827

Oxidize `QuantumCircuit._data` and intern `CircuitInstruction` args #10827

kevinhartman commented Sep 12, 2023 •

edited

jakelishman left a comment

jakelishman Sep 13, 2023

kevinhartman Sep 14, 2023

jakelishman Sep 14, 2023 •

edited

kevinhartman Sep 14, 2023

mtreinish left a comment

mtreinish Nov 14, 2023

kevinhartman Nov 14, 2023

jakelishman left a comment

jakelishman Nov 16, 2023

kevinhartman Nov 17, 2023

jakelishman left a comment

jakelishman commented Nov 17, 2023

mtreinish left a comment

mtreinish Nov 20, 2023

		qubit_indices_native: HashMap::new(),
		clbit_indices_native: HashMap::new(),

Oxidize QuantumCircuit._data and intern CircuitInstruction args #10827

Oxidize QuantumCircuit._data and intern CircuitInstruction args #10827

Conversation

kevinhartman commented Sep 12, 2023 • edited

Summary

Details and comments

Intern context implementation

Memory profiling

Todo

jakelishman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakelishman Sep 14, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mtreinish left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakelishman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakelishman left a comment

Choose a reason for hiding this comment

jakelishman commented Nov 17, 2023

mtreinish left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Oxidize `QuantumCircuit._data` and intern `CircuitInstruction` args #10827

Oxidize `QuantumCircuit._data` and intern `CircuitInstruction` args #10827

kevinhartman commented Sep 12, 2023 •

edited

jakelishman Sep 14, 2023 •

edited