Skip to content

Commit

Permalink
GH-40068: [C++] Possible data race when reading metadata of a parquet…
Browse files Browse the repository at this point in the history
… file (#40111)

### Rationale for this change

The `ParquetFileFragment` will cache the parquet metadata when loading it.  The `metadata()` method accesses this metadata (a shared_ptr) but does not grab the lock used to set that shared_ptr.  It's possible then that we are reading a shared_ptr at the same time some other thread is setting the shared_ptr which is technically (I think) undefined behavior.

### What changes are included in this PR?

Guard access to the metadata by grabbing the mutex first

### Are these changes tested?

Existing tests should regress this change

### Are there any user-facing changes?

No
* Closes: #40068

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
  • Loading branch information
westonpace authored and raulcd committed Mar 13, 2024
1 parent 5cd4f07 commit 0969975
Show file tree
Hide file tree
Showing 2 changed files with 6 additions and 1 deletion.
5 changes: 5 additions & 0 deletions cpp/src/arrow/dataset/file_parquet.cc
Original file line number Diff line number Diff line change
Expand Up @@ -779,6 +779,11 @@ ParquetFileFragment::ParquetFileFragment(FileSource source,
parquet_format_(checked_cast<ParquetFileFormat&>(*format_)),
row_groups_(std::move(row_groups)) {}

std::shared_ptr<parquet::FileMetaData> ParquetFileFragment::metadata() {
auto lock = physical_schema_mutex_.Lock();
return metadata_;
}

Status ParquetFileFragment::EnsureCompleteMetadata(parquet::arrow::FileReader* reader) {
auto lock = physical_schema_mutex_.Lock();
if (metadata_ != nullptr) {
Expand Down
2 changes: 1 addition & 1 deletion cpp/src/arrow/dataset/file_parquet.h
Original file line number Diff line number Diff line change
Expand Up @@ -165,7 +165,7 @@ class ARROW_DS_EXPORT ParquetFileFragment : public FileFragment {
}

/// \brief Return the FileMetaData associated with this fragment.
const std::shared_ptr<parquet::FileMetaData>& metadata() const { return metadata_; }
std::shared_ptr<parquet::FileMetaData> metadata();

/// \brief Ensure this fragment's FileMetaData is in memory.
Status EnsureCompleteMetadata(parquet::arrow::FileReader* reader = NULLPTR);
Expand Down

0 comments on commit 0969975

Please sign in to comment.