[core] introducing row-range-keyed DeletionVectors for DataEvolution tables by steFaiz · Pull Request #8344 · apache/paimon

steFaiz · 2026-06-24T09:13:12Z

Purpose

The first part of #8322

This is a very big PR. I tried to split it into several sub-PRs but found that very hard.

This PR focuses only on the core module and leaves the Spark/Flink modules untouched. It mainly includes:

RowRange DeletionFiles are fully decoupled from data files; they simply split one big deletion vector into many small files.
Introduce a DeletionFileKey to replace the current file name as the unique identifier of each DeletionFile.
Introduce a DataEvolutionApplyDvReader to filter out deleted row ids. This class is modeled on IndexedSplitRecordReader: it first enriches readType with _ROW_ID, then filters rows by RowRange DeletionVectors.
Slightly change SnapshotReaderImpl so that it assigns overlapping row-range DeletionVectors to each DataSplit.

Beyond that, most changes replace String fields with DeletionFileKey.
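For readers who want a concrete picture of the String-to-key change, here is a minimal sketch of a key that is either a data file name or an inclusive row-id range. It is not the class from this PR: `DeletionFileKeySketch`, its factory methods, the `ROW_RANGE` constant, and the sample file names are all hypothetical; only the name `DeletionFileKey` and the `FILE_NAME` type come from the PR itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch: a deletion-file key that is a file name or a row-id range. */
public final class DeletionFileKeySketch {

    public enum Type { FILE_NAME, ROW_RANGE }

    private final Type type;
    private final String fileName; // set only for FILE_NAME keys
    private final long from;       // inclusive; meaningful only for ROW_RANGE keys
    private final long to;         // inclusive; meaningful only for ROW_RANGE keys

    private DeletionFileKeySketch(Type type, String fileName, long from, long to) {
        this.type = type;
        this.fileName = fileName;
        this.from = from;
        this.to = to;
    }

    public static DeletionFileKeySketch ofFileName(String fileName) {
        return new DeletionFileKeySketch(
                Type.FILE_NAME, Objects.requireNonNull(fileName), -1, -1);
    }

    public static DeletionFileKeySketch ofRowRange(long from, long to) {
        if (from > to) {
            throw new IllegalArgumentException("from must be <= to");
        }
        return new DeletionFileKeySketch(Type.ROW_RANGE, null, from, to);
    }

    public Type type() {
        return type;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof DeletionFileKeySketch)) {
            return false;
        }
        DeletionFileKeySketch other = (DeletionFileKeySketch) o;
        return type == other.type
                && from == other.from
                && to == other.to
                && Objects.equals(fileName, other.fileName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, fileName, from, to);
    }

    public static void main(String[] args) {
        // Both key kinds can index the same map, where a Map<String, ...> was used before.
        Map<DeletionFileKeySketch, String> dvFiles = new HashMap<>();
        dvFiles.put(ofFileName("data-0.parquet"), "index-a");
        dvFiles.put(ofRowRange(0, 100), "index-b");
        // An equal key built later finds the same entry.
        System.out.println(dvFiles.get(ofRowRange(0, 100)));
    }
}
```

Value-based `equals`/`hashCode` is what lets such a key stand in for the file name as a map key.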

Tests

Unit tests only.

JingsongLi · 2026-06-25T03:55:41Z

+            return false;
+        }
+
+        return currentDv.isDeleted(rowId);


This reader only evaluates the first row-range DV whose end has not passed the current row id. If two row-range keyed DVs overlap, for example [0, 10] and [5, 15], rows 5-10 are never checked against the second DV, so deletions recorded only there will be returned. I don't see the writer/manifest combiner rejecting overlapping RowIdRangeKey values; they only deduplicate exact keys. Please either enforce non-overlapping row-range DVs when writing/combining metadata, or make this reader evaluate all DVs that may contain the row.

Thanks for the reminder! I've added the check in GlobalCombiner, along with a test.
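For illustration, a non-overlap check of the kind discussed here could look like the sketch below. It is not the GlobalCombiner code from this PR; the class name, method name, and the use of `long[]` pairs for inclusive `[from, to]` ranges are assumptions. After sorting by range start, comparing adjacent pairs is sufficient, because if any two ranges overlap then some adjacent pair overlaps too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a non-overlap check for inclusive row-id ranges. */
public final class RowRangeOverlapCheck {

    /** Throws if any two inclusive [from, to] ranges share a row id. */
    public static void checkNoOverlap(List<long[]> ranges) {
        List<long[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((long[] r) -> r[0]));
        for (int i = 1; i < sorted.size(); i++) {
            long[] prev = sorted.get(i - 1);
            long[] cur = sorted.get(i);
            // Ranges are inclusive, so equal bounds also count as an overlap.
            if (cur[0] <= prev[1]) {
                throw new IllegalStateException(
                        "Overlapping row-range DVs: "
                                + Arrays.toString(prev)
                                + " and "
                                + Arrays.toString(cur));
            }
        }
    }

    public static void main(String[] args) {
        // Adjacent but disjoint ranges pass.
        checkNoOverlap(Arrays.asList(new long[] {0, 10}, new long[] {11, 20}));
        try {
            // The example from the review comment: rows 5-10 are covered twice.
            checkNoOverlap(Arrays.asList(new long[] {0, 10}, new long[] {5, 15}));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this invariant enforced on the write side, a reader that consults only one DV per row id cannot miss a deletion.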

JingsongLi · 2026-06-25T03:55:41Z

                            ? null
-                            : IndexFileMetaSerializer.dvMetasToRowArrayData(dvMetas.values()),
+                            : IndexFileMetaSerializer.metasToRowArrayData(
+                                    dvMetas, DeletionFileKey.Type.FILE_NAME),


For row-range keyed DVs this renders the compatibility marker row instead of the real row-id ranges, because metasToRowArrayData(..., FILE_NAME) intentionally fast-fails old readers. That makes table_indexes.dv_ranges misleading for data-evolution tables. Can we expose the row-range DV schema here as well, or render the ranges in a separate column, instead of showing the legacy marker?

This is very helpful! I've unified FileNameKey and RowRangeKey:

Reusing the same field

For FileNames, display the file name

For RowRanges, display the formatted range, e.g. [0, 100]

steFaiz · 2026-06-26T09:11:53Z

I thought more about the design, and I think aligning row-range DVs with normal data file ranges is much simpler and easier to manage than the fully decoupled mode, just like the filename-based implementation.
The main logic:

At first DV generation, generate DVs aligned with the existing normal data file ranges.
During compaction, if some data files trigger row-level compaction and are merged into a larger data file range, merge the related DVs too.

The reason is that updating DVs becomes complicated if we allow them to be fully decoupled. Consider this scenario:
Current DV ranges:

// Current DV:
[0, 100] , [101, 200], [230, 300]

Now apply a new DV covering the range [50, 250].
This involves:

Binary search among all DVs to find every overlapping one. Here we find [0, 100], [101, 200], and [230, 300].
Check each overlapping DV:
a. [0, 100] only partially overlaps [50, 250], so we may have to split the new DV into [50, 100] and [101, 250].
b. [101, 200] and [230, 300] must be merged with the split-off new DV [101, 250], but how? Merge everything into a single big range [101, 300], or split it into pieces of some chosen size?

Without a sophisticated merge/split mechanism, some DVs may end up too big or too small.
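The lookup step in the scenario above can be sketched as follows, assuming the existing DV ranges are sorted, non-overlapping, and inclusive. The class and method names are hypothetical and this is not code from the PR. The sketch only finds the overlapping ranges and their intersections with the new range; it deliberately takes no position on the merge/split policy, which is the open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: find existing DV ranges that overlap a new range. */
public final class OverlappingDvLookup {

    /** Index of the first range whose inclusive end is >= rowId, or ranges.length. */
    public static int firstEndingAtOrAfter(long[][] ranges, long rowId) {
        int lo = 0;
        int hi = ranges.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ranges[mid][1] < rowId) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Existing ranges (sorted, non-overlapping, inclusive) that intersect [from, to]. */
    public static List<long[]> overlapping(long[][] ranges, long from, long to) {
        List<long[]> result = new ArrayList<>();
        int start = firstEndingAtOrAfter(ranges, from);
        for (int i = start; i < ranges.length && ranges[i][0] <= to; i++) {
            result.add(ranges[i]);
        }
        return result;
    }

    /** The part of each overlapping range that lies inside [from, to]. */
    public static List<long[]> intersections(long[][] ranges, long from, long to) {
        List<long[]> result = new ArrayList<>();
        for (long[] r : overlapping(ranges, from, to)) {
            result.add(new long[] {Math.max(r[0], from), Math.min(r[1], to)});
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] current = {{0, 100}, {101, 200}, {230, 300}};
        for (long[] piece : intersections(current, 50, 250)) {
            System.out.println(Arrays.toString(piece));
        }
        // Prints [50, 100], [101, 200], [230, 250], one per line.
    }
}
```

Note that rows 201 to 229 of the new range fall into the gap between [101, 200] and [230, 300], so no existing DV covers them. Deciding which range should own those rows is part of what makes the fully decoupled update hard.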

I think we could:

On the read path: support reading arbitrary decoupled DV ranges (already implemented in this PR).
On the write/compaction path: force DV ranges to align with the existing normal data file ranges.
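A rough sketch of what "aligned" could mean on the write/compaction path: deleted row ids are grouped by the data file range that contains them, and after compaction the same grouping is redone against the merged file ranges. All names are hypothetical, and a sorted set of row ids stands in for a real deletion vector bitmap; this is an illustration of the proposal, not code from the PR.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: build DVs aligned with data file row-id ranges. */
public final class AlignedDvBuilder {

    /**
     * Groups deleted row ids by the data file range that contains them.
     *
     * @param fileRanges sorted, non-overlapping inclusive [from, to] ranges
     * @return deleted row ids per range, keyed by the range's first row id
     */
    public static Map<Long, SortedSet<Long>> groupByFileRange(
            long[][] fileRanges, long[] deletedRowIds) {
        TreeMap<Long, Long> endByStart = new TreeMap<>();
        for (long[] r : fileRanges) {
            endByStart.put(r[0], r[1]);
        }
        Map<Long, SortedSet<Long>> result = new TreeMap<>();
        for (long rowId : deletedRowIds) {
            // The only candidate is the range with the greatest start <= rowId.
            Map.Entry<Long, Long> e = endByStart.floorEntry(rowId);
            if (e == null || rowId > e.getValue()) {
                throw new IllegalArgumentException(
                        "Row id " + rowId + " belongs to no data file range");
            }
            result.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).add(rowId);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] deleted = {5, 150, 160, 250};

        // First DV generation: one DV per existing data file range.
        long[][] before = {{0, 100}, {101, 200}, {230, 300}};
        System.out.println(groupByFileRange(before, deleted));
        // {0=[5], 101=[150, 160], 230=[250]}

        // After compaction merged [0, 100] and [101, 200] into [0, 200],
        // the two related DVs collapse into one.
        long[][] after = {{0, 200}, {230, 300}};
        System.out.println(groupByFileRange(after, deleted));
        // {0=[5, 150, 160], 230=[250]}
    }
}
```

Because DV boundaries always follow file boundaries in this scheme, the partial-overlap case from the decoupled scenario never arises on the write side.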

@JingsongLi Could you please review this proposal and share your thoughts? Any feedback would be greatly appreciated!

JingsongLi · 2026-06-28T15:12:19Z

@steFaiz If we move in this direction, could we consider another simple design:

Attach the DV to the oldest data file, still keyed by the file name. This introduces no new mechanism and keeps the old DV mechanism.

steFaiz · 2026-06-29T02:40:30Z

@JingsongLi Thanks for your reply! This design, "reference the oldest file", is also in my original design docs.
I chose the current range-based approach because:

The range-based approach is more natural and easier to search.
The oldest-filename approach can reuse the existing DV key, but it introduces an implicit anchor-file requirement. Today DataEvolutionFileStoreScan can prune files by writeCols: for SELECT c, it only needs files containing column c. If DVs are attached to the oldest file, the scan still needs to keep or look up that oldest file solely to find DVs. That makes projection planning depend on an unrelated physical file. For example:
a. Data files can be filtered by writeCols when the full schema is [a, b, c, d] and some files contain only [c].
b. For a query like SELECT c FROM t, the current implementation only needs the files with writeCols = [c].
c. With oldest-file DVs, we have to keep the oldest file in the plan, or we cannot find the related DVs.
This logic seems odd: "if deletion vectors are enabled, we need to plan all the column files as well as the oldest file".
In theory, for compaction that does not materialize DVs (materializing DVs requires rebuilding global indices), the range-based approach does not need to rewrite DVs. (Rewriting DVs is low-cost, though, so I think this point is minor.)
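To make the anchor-file concern concrete, here is a toy planning sketch. It is not DataEvolutionFileStoreScan: `FileMeta`, its `sequence` field, both plan methods, and the sample file names and column sets are hypothetical, and all files are assumed to cover the same row-id range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: projection planning with and without an anchor file. */
public final class ProjectionPlanningSketch {

    /** Toy file metadata; every file is assumed to cover the same row-id range. */
    public static final class FileMeta {
        final String name;
        final long sequence; // lower means older
        final Set<String> writeCols;

        public FileMeta(String name, long sequence, String... writeCols) {
            this.name = name;
            this.sequence = sequence;
            this.writeCols = new HashSet<>(Arrays.asList(writeCols));
        }
    }

    /** Range-keyed DVs: keep only files that wrote at least one projected column. */
    public static List<String> planRangeBased(List<FileMeta> files, Set<String> projection) {
        List<String> plan = new ArrayList<>();
        for (FileMeta f : files) {
            if (!Collections.disjoint(f.writeCols, projection)) {
                plan.add(f.name);
            }
        }
        return plan;
    }

    /** Oldest-file DVs: same pruning, but the oldest file must stay to locate its DV. */
    public static List<String> planOldestAnchored(List<FileMeta> files, Set<String> projection) {
        List<String> plan = planRangeBased(files, projection);
        FileMeta oldest = null;
        for (FileMeta f : files) {
            if (oldest == null || f.sequence < oldest.sequence) {
                oldest = f;
            }
        }
        if (oldest != null && !plan.contains(oldest.name)) {
            plan.add(0, oldest.name);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<FileMeta> files = Arrays.asList(
                new FileMeta("f-old", 1, "a", "b"),
                new FileMeta("f-c", 2, "c"),
                new FileMeta("f-d", 3, "d"));
        Set<String> selectC = Collections.singleton("c");
        System.out.println(planRangeBased(files, selectC));     // [f-c]
        System.out.println(planOldestAnchored(files, selectC)); // [f-old, f-c]
    }
}
```

In the second plan, `f-old` contributes no column to `SELECT c`; it is kept only so that its DV can be found, which is the dependency on an unrelated physical file described above.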

I agree that the oldest filename based approach will reduce code changes.

Which one does the community prefer: range-based or oldest-filename-based? I'll modify this PR accordingly!

JingsongLi · 2026-06-29T04:27:01Z

@steFaiz Thank you very much for your reply, it is very valuable. Indeed, both options will bring different levels of complexity, and we need to choose between them. I am also very conflicted because the complexity they bring is almost of the same magnitude.

I prefer the oldest-filename-based approach because it preserves the original DV mechanism and eliminates the need to worry about maintaining the row id ranges. We can consider retaining the original file information in DV mode until the DV is generated.

[core] introducing row-range-keyed DeletionVectors for DataEvolution tables

94a4727

steFaiz marked this pull request as draft June 24, 2026 09:15

steFaiz added 4 commits June 24, 2026 20:14

fix tests

c04f1a2

fix tests

8f28553

fix tests

e381df9

add test

9cebbd7

steFaiz marked this pull request as ready for review June 25, 2026 03:07

JingsongLi reviewed Jun 25, 2026

fix comments: add some hard check

b98f581

steFaiz mentioned this pull request Jun 29, 2026

[core] introducing DeletionVector mechanism for DataEvolution tables #8380

Merged

JingsongLi closed this Jun 30, 2026

steFaiz wants to merge 6 commits into apache:master from steFaiz:dv_for_de_core_support
