Skip to content

[refactor](fe) Refactor SetPreAggStatus traversal to use per-scan context map instead of shared stack#65095

Draft
starocean999 wants to merge 1 commit into
apache:masterfrom
starocean999:master_0414
Draft

[refactor](fe) Refactor SetPreAggStatus traversal to use per-scan context map instead of shared stack#65095
starocean999 wants to merge 1 commit into
apache:masterfrom
starocean999:master_0414

Conversation

@starocean999

Copy link
Copy Markdown
Contributor

Related PR: #48502

Problem Summary:

The SetPreAggStatus rule uses a bottom-up traversal to determine whether
pre-aggregation can be enabled for OlapScan nodes. The original implementation
used a Stack<PreAggInfoContext> as the traversal context, which had a
fundamental flaw: multiple scan nodes under sibling subtrees (e.g., both sides
of a Join) incorrectly shared the same stack-level context.

This caused two problems:

  1. A scan could inherit filter/join/project information from unrelated branches
    of the plan tree.
  2. To work around this, visitLogicalAggregate used retainAll() to filter out
    scan nodes that were not actually children of the current aggregate — an
    inelegant patch that partially masked the underlying issue.

Root cause: The stack-based context could not guarantee a one-to-one
correspondence between a scan node and its nearest aggregate ancestor.

Fix: Replace the Stack<PreAggInfoContext> with a Map<RelationId, PreAggInfoContext>, where each scan node gets its own dedicated context entry.
Each visitor method uses a snapshot-diff pattern:

  1. Snapshot the current map keys before visiting children
  2. Visit children (they add scan contexts to the map)
  3. Diff to identify which scans were added by this subtree
  4. Apply node-specific info only to those scans

At LogicalAggregate, child scan contexts are consumed (removed from the map)
and their results are stored — no more retainAll() filtering needed.

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@starocean999

Copy link
Copy Markdown
Contributor Author

/review

@starocean999 starocean999 changed the title [refactor](fe) Refactor SetPreAggStatus traversal to use per-scan context map instead of shared stack… [refactor](fe) Refactor SetPreAggStatus traversal to use per-scan context map instead of shared stack Jul 1, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review completed. I did not find a substantiated blocking issue in this PR.

Critical checkpoint conclusions:

  • Goal: the PR replaces the shared stack in SetPreAggStatus with per-RelationId contexts. The changed traversal matches that goal and avoids sharing scan context between sibling branches.
  • Scope: the change is focused to the pre-aggregation status rewriter plus two regression explain expectations.
  • Optimizer correctness: the snapshot/diff pattern was checked against DefaultPlanRewriter.visitChildren, LogicalFilter, LogicalJoin, LogicalProject, LogicalAggregate, and LogicalRepeat. The handled nodes preserve context until the nearest aggregate consumes it; generic nodes remove only contexts introduced below that node, preserving pre-existing sibling/ancestor contexts.
  • Parallel paths: rule placement remains in the existing late rewrite stage, and the second registration path still uses the same custom rule. I did not find a missed sibling join/aggregate path in the changed code.
  • Configuration, concurrency, lifecycle, persistence, compatibility, FE/BE protocol, and storage format: not applicable; this PR does not add or alter those surfaces.
  • Tests: the regression suite updates the two right-join sibling aggregate cases from preagg_t3 OFF to ON, which targets the reported shared-context behavior. No .out artifact is expected for these explain { contains ... } assertions.
  • Validation: exact PR-range git diff --check passed. I did not run FE build or regression tests because this runner lacks .worktree_initialized, thirdparty/installed, and executable thirdparty/installed/bin/protoc.

User focus: no additional user-provided review focus was present.

Subagent conclusions: optimizer-rewrite reported NO_NEW_VALUABLE_FINDINGS. tests-session-config proposed TSC-1 for trailing whitespace, but I dismissed it because it came from git show --check HEAD in a grafted checkout; the exact base-to-head git diff --check passes and the PR patch changes only two assertion strings. Final convergence round 1 ended with both subagents reporting NO_NEW_VALUABLE_FINDINGS against the same ledger and empty proposed inline comment set.

@starocean999

Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (39/39) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 30121 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f60111f0536c7fcbce709339ba8d900fcb5152ef, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17699	4143	4111	4111
q2	2028	333	205	205
q3	10302	1452	871	871
q4	4681	482	340	340
q5	7496	856	580	580
q6	182	173	144	144
q7	809	858	626	626
q8	9357	1644	1592	1592
q9	5750	4485	4458	4458
q10	6783	1822	1565	1565
q11	526	343	314	314
q12	734	563	441	441
q13	18110	3363	2810	2810
q14	267	259	244	244
q15	q16	800	784	715	715
q17	980	1042	1022	1022
q18	7230	5896	5675	5675
q19	1328	1296	1088	1088
q20	780	677	595	595
q21	5972	2747	2427	2427
q22	445	364	298	298
Total cold run time: 102259 ms
Total hot run time: 30121 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4408	4290	4344	4290
q2	302	324	219	219
q3	4681	4997	4490	4490
q4	2097	2191	1387	1387
q5	4482	4325	4336	4325
q6	231	177	126	126
q7	2113	1999	1627	1627
q8	2628	2278	2213	2213
q9	8288	7869	7865	7865
q10	4839	4802	4359	4359
q11	597	418	389	389
q12	779	766	559	559
q13	3265	3645	2872	2872
q14	307	321	286	286
q15	q16	719	741	672	672
q17	1374	1342	1476	1342
q18	7773	7277	7331	7277
q19	1144	1120	1128	1120
q20	2219	2220	1932	1932
q21	5378	4695	4508	4508
q22	538	467	417	417
Total cold run time: 58162 ms
Total hot run time: 52275 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 174162 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f60111f0536c7fcbce709339ba8d900fcb5152ef, data reload: false

query5	4332	637	519	519
query6	495	212	202	202
query7	4850	619	341	341
query8	340	195	172	172
query9	8794	4070	4045	4045
query10	471	359	296	296
query11	5905	2350	2146	2146
query12	159	106	105	105
query13	1282	635	461	461
query14	6323	5311	4953	4953
query14_1	4309	4275	4288	4275
query15	225	212	182	182
query16	1034	447	443	443
query17	950	747	623	623
query18	2454	478	364	364
query19	217	200	156	156
query20	112	108	111	108
query21	241	164	137	137
query22	13595	13604	13307	13307
query23	17489	16600	16220	16220
query23_1	16277	16359	16366	16359
query24	7753	1801	1314	1314
query24_1	1324	1323	1337	1323
query25	602	477	408	408
query26	1351	371	210	210
query27	2615	606	376	376
query28	4434	2037	2046	2037
query29	1097	638	524	524
query30	338	266	228	228
query31	1129	1096	996	996
query32	106	65	64	64
query33	545	330	266	266
query34	1197	1150	649	649
query35	779	794	671	671
query36	1431	1360	1224	1224
query37	165	117	138	117
query38	1877	1726	1674	1674
query39	931	946	902	902
query39_1	872	886	891	886
query40	250	170	141	141
query41	67	62	63	62
query42	95	94	93	93
query43	326	327	281	281
query44	1427	773	779	773
query45	221	192	180	180
query46	1116	1220	748	748
query47	2346	2300	2272	2272
query48	397	439	290	290
query49	589	422	325	325
query50	1076	411	351	351
query51	4505	4396	4380	4380
query52	88	87	74	74
query53	271	279	209	209
query54	299	229	210	210
query55	75	71	70	70
query56	294	310	286	286
query57	1434	1392	1324	1324
query58	279	252	274	252
query59	1568	1651	1437	1437
query60	322	271	260	260
query61	161	153	146	146
query62	699	648	613	613
query63	247	206	203	203
query64	2530	800	624	624
query65	4862	4800	4755	4755
query66	1817	552	397	397
query67	29634	29590	29408	29408
query68	3451	1558	944	944
query69	409	315	266	266
query70	1083	964	973	964
query71	345	351	322	322
query72	2967	2716	2312	2312
query73	846	790	458	458
query74	5109	4989	4750	4750
query75	2604	2607	2266	2266
query76	2329	1188	786	786
query77	354	388	279	279
query78	12447	12692	11807	11807
query79	1412	1185	766	766
query80	1113	544	468	468
query81	516	332	285	285
query82	572	160	126	126
query83	373	328	294	294
query84	322	163	132	132
query85	1058	605	521	521
query86	415	294	296	294
query87	1827	1820	1745	1745
query88	3778	2824	2814	2814
query89	464	402	363	363
query90	1759	197	197	197
query91	208	192	163	163
query92	68	62	59	59
query93	1589	1665	943	943
query94	630	355	340	340
query95	787	609	506	506
query96	1077	814	349	349
query97	2682	2722	2581	2581
query98	219	211	202	202
query99	1179	1154	1027	1027
Total cold run time: 259761 ms
Total hot run time: 174162 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
ClickBench: Total hot run time: 25.47 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f60111f0536c7fcbce709339ba8d900fcb5152ef, data reload: false

query1	0.00	0.00	0.00
query2	0.09	0.05	0.05
query3	0.26	0.14	0.14
query4	1.61	0.14	0.13
query5	0.27	0.23	0.22
query6	1.29	1.10	1.11
query7	0.03	0.00	0.01
query8	0.06	0.04	0.04
query9	0.38	0.32	0.32
query10	0.55	0.58	0.56
query11	0.21	0.14	0.14
query12	0.20	0.15	0.15
query13	0.47	0.48	0.47
query14	1.01	1.02	1.00
query15	0.61	0.59	0.59
query16	0.32	0.32	0.33
query17	1.16	1.09	1.18
query18	0.24	0.22	0.22
query19	2.11	2.00	1.97
query20	0.02	0.01	0.01
query21	15.43	0.22	0.15
query22	4.76	0.06	0.05
query23	16.10	0.30	0.13
query24	2.97	0.43	0.34
query25	0.10	0.04	0.06
query26	0.77	0.20	0.16
query27	0.05	0.03	0.03
query28	3.50	0.94	0.55
query29	12.49	4.31	3.54
query30	0.27	0.16	0.15
query31	2.77	0.58	0.31
query32	3.23	0.59	0.49
query33	3.15	3.26	3.22
query34	15.66	4.19	3.52
query35	3.51	3.50	3.56
query36	0.55	0.44	0.42
query37	0.09	0.07	0.06
query38	0.05	0.03	0.04
query39	0.04	0.02	0.02
query40	0.18	0.17	0.15
query41	0.08	0.03	0.03
query42	0.03	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 96.72 s
Total hot run time: 25.47 s

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 57.35% (39/68) 🎉
Increment coverage report
Complete coverage report

@starocean999

Copy link
Copy Markdown
Contributor Author

/review

@starocean999

Copy link
Copy Markdown
Contributor Author

run buildall

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants