Skip to content

[fix](UDF) Fix create function rollback after Nereids failure#65100

Open
morrySnow wants to merge 4 commits into
apache:masterfrom
morrySnow:codex/fix-function-rollback-nereids
Open

[fix](UDF) Fix create function rollback after Nereids failure#65100
morrySnow wants to merge 4 commits into
apache:masterfrom
morrySnow:codex/fix-function-rollback-nereids

Conversation

@morrySnow

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Problem Summary: CREATE FUNCTION added the new function to the catalog and wrote the edit log before registering the function in Nereids. If Nereids registration failed, rollback removed the whole overload list for that function name, so a failed overload creation could delete existing overloads. This change registers the function in Nereids before journaling it and rolls back only the just-created overload when registration fails.

Release note

None

Check List (For Author)

  • Test:
    • Unit Test: ./run-fe-ut.sh --run org.apache.doris.catalog.CreateFunctionTest
    • Regression test: ./run-regression-test.sh --run -d fault_injection_p0 -s test_create_function_rollback (local config selected the suite but skipped docker body because excludeDockerTest=true)
  • Behavior changed: Yes. Failed CREATE FUNCTION Nereids registration no longer journals the failed function or removes existing overloads.
  • Does this need documentation: No

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morrySnow

Copy link
Copy Markdown
Contributor Author

/review

@morrySnow

Copy link
Copy Markdown
Contributor Author

run buildall

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes for one regression-test issue. The FE rollback change itself looks focused: database and global CREATE FUNCTION now roll back only the failed overload and avoid journaling when Nereids registration fails, while replay/image paths still rebuild Nereids from journaled Function objects. However, the new docker regression currently asserts an error substring that the DROP path does not emit, so it should fail when executed.

Critical checkpoints:

  • Goal/test proof: the production rollback path addresses the intended overload-preservation bug, and FE unit coverage was added; the new regression needs the expected error fixed.
  • Scope: the code change is small and focused.
  • Concurrency/lifecycle: database/global function maps remain protected by existing synchronized methods; no new threads or special lifecycle risks were found.
  • Config/compatibility/persistence: the debug point is gated by enable_debug_points; no new persisted schema or mixed-version protocol change; add-function replay remains equivalent through existing journaled Function objects.
  • Parallel paths: database and global function paths were both updated; the UDTF outer-function path was checked and is not directly changed by this PR.
  • Tests: I could not run FE tests or regression tests in this runner because thirdparty/installed and thirdparty/installed/bin/protoc are absent.
  • User focus: no additional review focus was provided.

Subagent conclusions:

  • optimizer-rewrite: no candidates.
  • tests-session-config: TSC-1 was accepted as MAIN-1 and is submitted inline.
  • Convergence C1: both live subagents replied NO_NEW_VALUABLE_FINDINGS for the final ledger/comment set.

Issue Number: None

Related PR: None

Problem Summary: CREATE FUNCTION added the new function to the catalog and wrote the edit log before registering the function in Nereids. If Nereids registration failed, rollback removed the whole overload list for that function name, so a failed overload creation could delete existing overloads. This change registers the function in Nereids before journaling it and rolls back only the just-created overload when registration fails.

None

- Test:
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.catalog.CreateFunctionTest
    - Regression test: ./run-regression-test.sh --run -d fault_injection_p0 -s test_create_function_rollback (local config selected the suite but skipped docker body because excludeDockerTest=true)
- Behavior changed: Yes. Failed CREATE FUNCTION Nereids registration no longer journals the failed function or removes existing overloads.
- Does this need documentation: No
@morrySnow morrySnow force-pushed the codex/fix-function-rollback-nereids branch from c3c6454 to e84cbcf Compare July 1, 2026 10:05
@morrySnow

Copy link
Copy Markdown
Contributor Author

run buildall

@morrySnow

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking rollback gap that remains in a parallel create-function path.

Critical checkpoint conclusions:

  • Goal and tests: the PR fixes the single-function rollback shape for local/global UDFs, but it does not cover or fix local table functions, which create a base UDTF and an _outer companion as one SQL statement.
  • Scope and focus: the production change is small, but the rollback behavior is incomplete for a caller that already uses the modified Database.addFunction() API twice.
  • Concurrency and lifecycle: the changed catalog methods are synchronized and I did not find a new thread-safety issue; the risk is command-level atomicity and durable state after a failed statement.
  • Config and compatibility: no user-facing config or mixed-version protocol change was introduced. The debug point is gated by enable_debug_points.
  • Parallel paths: scalar/local/global paths are addressed, but the local CREATE TABLES FUNCTION base-plus-outer path can still leave partial state.
  • Testing: added tests cover local scalar, global scalar, and alias rollback, but not the UDTF companion failure. I did not run FE/regression tests in this checkout because thirdparty/installed and thirdparty/installed/bin/protoc are missing.
  • Persistence: single-function journaling now happens after Nereids translation, but a local UDTF base function can still be edit-logged before the companion succeeds.
  • Existing review context: the regression-test expected-error issue at test_create_function_rollback.groovy:49 is already covered by discussion r3504714499, so I did not duplicate it.

User focus: no additional user-provided focus points were supplied.

Subagent conclusions: optimizer-rewrite-001 became the inline comment below. tests-session-config-dup-001 was merged as a duplicate of that candidate. tests-session-config-dup-002 was dismissed as a duplicate of the existing GitHub thread on the regression assertion. Final convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same ledger and proposed comment set.

Comment thread fe/fe-core/src/main/java/org/apache/doris/catalog/Database.java Outdated
@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29635 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e84cbcfa34d2c660069643b88ee55764d5381d4a, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17814	4077	4017	4017
q2	2025	322	217	217
q3	10258	1454	850	850
q4	4689	470	341	341
q5	7495	867	572	572
q6	184	166	136	136
q7	805	844	633	633
q8	9378	1612	1597	1597
q9	5641	4468	4404	4404
q10	6795	1841	1565	1565
q11	505	353	316	316
q12	712	565	427	427
q13	18113	3393	2780	2780
q14	259	261	240	240
q15	q16	799	793	715	715
q17	969	1012	977	977
q18	7154	5852	5579	5579
q19	1347	1328	1024	1024
q20	751	660	547	547
q21	5943	2693	2402	2402
q22	442	358	296	296
Total cold run time: 102078 ms
Total hot run time: 29635 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4371	4284	4288	4284
q2	297	314	210	210
q3	4562	5011	4430	4430
q4	2077	2147	1369	1369
q5	4426	4295	4351	4295
q6	229	178	128	128
q7	1712	2202	1677	1677
q8	2562	2200	2167	2167
q9	8055	8235	7895	7895
q10	4770	4791	4294	4294
q11	608	421	365	365
q12	777	793	562	562
q13	3311	3619	2884	2884
q14	285	298	292	292
q15	q16	740	762	640	640
q17	1356	1332	1313	1313
q18	8009	7317	7350	7317
q19	1171	1138	1086	1086
q20	2206	2206	1950	1950
q21	5295	4575	4408	4408
q22	519	461	406	406
Total cold run time: 57338 ms
Total hot run time: 51972 ms

@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 77.78% (7/9) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 174238 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e84cbcfa34d2c660069643b88ee55764d5381d4a, data reload: false

query5	4300	650	497	497
query6	482	227	203	203
query7	4862	586	337	337
query8	348	195	164	164
query9	8747	4037	4045	4037
query10	477	370	318	318
query11	5904	2372	2159	2159
query12	166	111	104	104
query13	1284	604	437	437
query14	6241	5310	4978	4978
query14_1	4325	4263	4271	4263
query15	212	202	183	183
query16	1036	476	488	476
query17	946	744	597	597
query18	2458	472	359	359
query19	217	198	153	153
query20	114	108	108	108
query21	235	161	136	136
query22	13646	13617	13290	13290
query23	17478	16555	16228	16228
query23_1	16347	16346	16264	16264
query24	7537	1763	1303	1303
query24_1	1308	1300	1295	1295
query25	573	467	396	396
query26	1360	353	208	208
query27	2614	576	383	383
query28	4506	2080	2040	2040
query29	1104	650	490	490
query30	347	263	231	231
query31	1113	1105	992	992
query32	141	65	62	62
query33	542	342	269	269
query34	1166	1162	672	672
query35	788	786	677	677
query36	1424	1396	1232	1232
query37	160	109	92	92
query38	1908	1704	1635	1635
query39	936	924	904	904
query39_1	879	882	878	878
query40	243	156	145	145
query41	65	65	61	61
query42	92	90	92	90
query43	320	319	282	282
query44	1440	790	770	770
query45	202	190	177	177
query46	1073	1252	723	723
query47	2364	2309	2266	2266
query48	422	404	289	289
query49	593	432	307	307
query50	1123	430	325	325
query51	4434	4411	4408	4408
query52	85	83	73	73
query53	257	282	199	199
query54	274	228	209	209
query55	75	70	66	66
query56	293	272	276	272
query57	1438	1403	1333	1333
query58	289	244	243	243
query59	1581	1628	1476	1476
query60	350	272	248	248
query61	158	152	152	152
query62	698	648	592	592
query63	245	207	210	207
query64	2539	765	607	607
query65	4865	4796	4770	4770
query66	1809	516	427	427
query67	29732	29507	29368	29368
query68	3208	1583	946	946
query69	408	303	270	270
query70	1068	982	957	957
query71	330	303	325	303
query72	2884	2711	2372	2372
query73	835	803	444	444
query74	5091	4947	4763	4763
query75	2627	2618	2237	2237
query76	2353	1181	775	775
query77	363	380	290	290
query78	12601	12573	11764	11764
query79	1453	1242	733	733
query80	1231	547	455	455
query81	548	320	275	275
query82	1254	159	123	123
query83	361	323	286	286
query84	325	155	127	127
query85	960	593	508	508
query86	428	292	272	272
query87	1838	1815	1764	1764
query88	3716	2823	2795	2795
query89	481	406	357	357
query90	1884	196	194	194
query91	197	184	162	162
query92	65	61	53	53
query93	1694	1563	1007	1007
query94	773	363	334	334
query95	758	595	494	494
query96	1099	776	352	352
query97	2712	2681	2572	2572
query98	226	204	199	199
query99	1175	1143	1033	1033
Total cold run time: 260629 ms
Total hot run time: 174238 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
ClickBench: Total hot run time: 25.17 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e84cbcfa34d2c660069643b88ee55764d5381d4a, data reload: false

query1	0.01	0.01	0.01
query2	0.10	0.05	0.05
query3	0.26	0.14	0.14
query4	1.60	0.14	0.14
query5	0.26	0.22	0.22
query6	1.28	1.08	1.09
query7	0.04	0.01	0.01
query8	0.06	0.04	0.04
query9	0.38	0.31	0.31
query10	0.57	0.60	0.54
query11	0.20	0.14	0.15
query12	0.18	0.14	0.14
query13	0.46	0.48	0.46
query14	1.02	1.01	1.01
query15	0.62	0.59	0.60
query16	0.30	0.33	0.32
query17	1.08	1.15	1.05
query18	0.22	0.21	0.21
query19	2.02	1.91	1.99
query20	0.02	0.01	0.01
query21	15.45	0.20	0.14
query22	4.88	0.05	0.06
query23	16.16	0.30	0.12
query24	2.96	0.45	0.32
query25	0.14	0.05	0.05
query26	0.74	0.23	0.15
query27	0.04	0.04	0.04
query28	3.53	0.93	0.52
query29	12.51	4.38	3.45
query30	0.27	0.15	0.16
query31	2.77	0.62	0.31
query32	3.22	0.59	0.50
query33	3.14	3.24	3.19
query34	15.63	4.23	3.51
query35	3.54	3.53	3.50
query36	0.56	0.45	0.43
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.04	0.03	0.03
query40	0.17	0.15	0.15
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 96.74 s
Total hot run time: 25.17 s

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 4.26% (2/47) 🎉
Increment coverage report
Complete coverage report

@morrySnow morrySnow changed the title [fix](alias function) Fix create function rollback after Nereids failure [fix](UDF) Fix create function rollback after Nereids failure Jul 2, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#65100

Problem Summary: Local CREATE TABLES FUNCTION registered the base UDTF and its _outer companion through two independent Database.addFunction calls. If the companion registration failed because of an existing _outer overload or Nereids translation error, the base function had already been added and logged. This change registers the base and companion as one Database-level batch, rolls back catalog and Nereids state for functions added by the batch on failure, and writes add-function edit logs only after the whole batch succeeds.

### Release note

Fix CREATE TABLES FUNCTION rollback when companion function registration fails.

### Check List (For Author)

- Test: Unit Test attempted: ./run-fe-ut.sh --run org.apache.doris.catalog.CreateFunctionTest#testCreateTableFunctionRollbackWhenOuterFunctionFails (failed before running the test because fe-core compile could not find generated DorisParser contexts CreateStreamContext, ShowCreateStreamContext, DefaultValueContext, DropStreamContext, ShowStreamsContext, and edu.umd.cs.findbugs.annotations.SuppressFBWarnings). Manual test: git diff --cached --check.
- Behavior changed: Yes. Failed local CREATE TABLES FUNCTION no longer leaves a visible or durable base UDTF when the _outer companion fails.
- Does this need documentation: No
@morrySnow

Copy link
Copy Markdown
Contributor Author

run buildall

@morrySnow

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review conclusion: request changes.

I found three remaining issues in the current head. Two are production correctness issues in local CREATE TABLES FUNCTION atomicity/durability, and one is a Java unit-test assertion that will fail before checking the intended rollback behavior. I did not re-raise the existing Groovy regression assertion mismatch or the earlier returned-error atomicity thread except where the current head has distinct paths.

Critical checkpoints:

  • Goal/test coverage: the PR fixes ordinary rollback paths and adds FE tests, but the table-function pair is still not atomic under IF NOT EXISTS and not crash-durable as a pair.
  • Scope/focus: the code change is focused on function registration rollback.
  • Concurrency/locking: modified catalog mutations remain synchronized on Database/GlobalFunctionMgr; no new lock-order issue found.
  • Lifecycle/persistence: table-function journaling is still problematic because the base and _outer functions are persisted independently.
  • Configuration/session: no new config item; the debug-point test path is scoped to the docker regression test. No additional session propagation issue found.
  • Compatibility: existing replay remains per-function, which is the source of the crash-replay concern below.
  • Parallel paths: global single-function rollback path was checked separately; no table-function global path exists in this PR.
  • Tests: added FE tests cover non-IF-NOT-EXISTS rollback and companion conflict, but one new assertion uses the wrong UserException accessor and the tests do not cover the IF NOT EXISTS half-registration case or crash/replay atomicity.
  • Existing review context: regression expected-message issue is already covered by discussion_r3504714499; original table-function returned-error atomicity issue is already covered by discussion_r3505102407 and fixed for that path.

Subagent conclusions: optimizer-rewrite proposed OPT-1, accepted as MAIN-002. tests-session-config first reported no candidates, then during convergence proposed TEST-1, accepted as MAIN-003. MAIN-001 was found and verified by the main review. Convergence round 2 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same three-comment ledger set.

Comment thread fe/fe-core/src/main/java/org/apache/doris/catalog/Database.java
Comment thread fe/fe-core/src/main/java/org/apache/doris/catalog/Database.java Outdated
Comment thread fe/fe-core/src/test/java/org/apache/doris/catalog/CreateFunctionTest.java Outdated
### What problem does this PR solve?

Issue Number: None

Related PR: apache#65100

Problem Summary: CREATE TABLES FUNCTION still had two gaps after batching catalog registration. With IF NOT EXISTS, an existing unrelated <name>_outer overload could make the companion add return false while the base UDTF was still registered and logged. On the successful path, the base and _outer companion were still written as separate OP_ADD_FUNCTION edit-log records, so an FE crash between the two durable writes could replay only the base function. This change treats local table-function creation as a pair: IF NOT EXISTS skips only when both matching UDTF halves already exist, partial companion collisions fail before registering either half, and successful table-function creation writes one OP_ADD_FUNCTIONS journal payload that replays both functions together.

### Release note

Fix CREATE TABLES FUNCTION IF NOT EXISTS and edit-log atomicity for local table functions.

### Check List (For Author)

- Test: Unit Test attempted: ./run-fe-ut.sh --run org.apache.doris.catalog.CreateFunctionTest#testCreateTableFunctionLogsPairAtomically (failed before running the test in fe-core compile with Too many open files in system while reading jetty-alpn-java-client-12.0.34.jar). Manual test: git diff --cached --check. Regression test not run per instruction for this directory.
- Behavior changed: Yes. CREATE TABLES FUNCTION IF NOT EXISTS no longer creates a base UDTF when the generated _outer signature collides, and successful table-function creation now journals the pair atomically.
- Does this need documentation: No
@morrySnow

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review result: requesting changes.

Critical checkpoint conclusions:

  • Goal: the PR addresses the previously raised rollback and table-function pair journaling paths, but the new rollback cleanup still has a correctness gap for variadic overloads.
  • Scope: the change is focused on FE function registration, Nereids registration, edit-log persistence, and tests.
  • Concurrency/lifecycle/config: no new config or broad concurrency path was added; existing synchronized Database and GlobalFunctionMgr mutation paths are preserved.
  • Persistence/compatibility: OP_ADD_FUNCTIONS write/read/replay handling was added, but direct serialization/read/replay coverage is missing for the new durable payload.
  • Tests: new FE unit tests and a docker regression were added. I did not run FE tests in this checkout because .worktree_initialized, thirdparty/installed, and thirdparty/installed/bin/protoc are missing.
  • User focus: no additional user-provided review focus was supplied.

Subagent conclusions:

  • optimizer-rewrite proposed OPT-1; I verified and accepted it as an inline comment.
  • tests-session-config proposed TSC-1; I verified and accepted it as an inline comment.
  • Existing table-function atomicity and IF NOT EXISTS pair issues were treated as already-known review context and were not duplicated.
  • Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same current ledger and proposed comment set.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resubmitting the accepted inline comments with explicit RIGHT-side line anchors after the position-based anchors landed one line low.

Comment thread fe/fe-core/src/main/java/org/apache/doris/catalog/Database.java
db.addTableFunction(tableFunction, false);

Mockito.verify(spyEditLog, Mockito.never()).logAddFunction(Mockito.any(Function.class));
Mockito.verify(spyEditLog).logAddFunctions(Mockito.argThat(functions ->

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only verifies that the live code called the mocked logAddFunctions() method. The PR adds a new durable format (CreateFunctionInfo under OP_ADD_FUNCTIONS) plus JournalEntity.readFields() and EditLog.loadJournal() replay handling, but none of the new tests write/read that journal entity or replay it into a fresh catalog. Given this is the fix for crash/restart atomicity, please add a persistence test that serializes an OP_ADD_FUNCTIONS entry for the base and _outer functions, reads it back through JournalEntity, replays it, and asserts both functions are restored.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 8b7b341. Added CreateFunctionTest.testCreateTableFunctionJournalReplayRestoresPair. It builds the base and _outer functions, writes a JournalEntity with OP_ADD_FUNCTIONS/CreateFunctionInfo, reads it back through JournalEntity.readFields(), verifies the replay payload contains both functions, calls EditLog.loadJournal(), and asserts both functions are restored in the catalog.

@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 91.43% (32/35) 🎉
Increment coverage report
Complete coverage report

morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Jul 2, 2026
### What problem does this PR solve?

Issue Number: None

Related PR: apache#65100

Problem Summary: CREATE TABLES FUNCTION rollback removed Nereids UDF builders by function name and argument types only. Catalog identity also includes the variadic flag, so a failed fixed table-function batch could delete the unrelated variadic UDF builder while leaving its catalog entry visible. This change passes the variadic flag through rollback cleanup and makes FunctionRegistry.dropUdf match the full UDF identity. It also adds journal serialization/replay coverage for OP_ADD_FUNCTIONS so the durable table-function pair format is exercised end to end.

### Release note

Fix CREATE TABLES FUNCTION rollback to preserve unrelated variadic UDF overloads.

### Check List (For Author)

- Test: Manual test: git diff --check and git diff --cached --check. Unit Test attempted: ./run-fe-ut.sh --run org.apache.doris.catalog.CreateFunctionTest (failed before running the test in fe-core compile due to existing generated parser/annotation errors: missing DorisParser stream/default contexts and SuppressFBWarnings). Regression test not run per instruction for this directory.
- Behavior changed: Yes. Failed CREATE TABLES FUNCTION rollback now removes only the exact fixed/variadic Nereids UDF identity that was added by the failed batch.
- Does this need documentation: No
Issue Number: None

Related PR: apache#65100

Problem Summary: CREATE TABLES FUNCTION rollback removed Nereids UDF builders by function name and argument types only. Catalog identity also includes the variadic flag, so a failed fixed table-function batch could delete the unrelated variadic UDF builder while leaving its catalog entry visible. This change passes the variadic flag through rollback cleanup and makes FunctionRegistry.dropUdf match the full UDF identity. It also adds journal serialization/replay coverage for OP_ADD_FUNCTIONS so the durable table-function pair format is exercised end to end.

Fix CREATE TABLES FUNCTION rollback to preserve unrelated variadic UDF overloads.

- Test: Manual test: git diff --check and git diff --cached --check. Unit Test attempted: ./run-fe-ut.sh --run org.apache.doris.catalog.CreateFunctionTest (failed before running the test in fe-core compile due to existing generated parser/annotation errors: missing DorisParser stream/default contexts and SuppressFBWarnings). Regression test not run per instruction for this directory.
- Behavior changed: Yes. Failed CREATE TABLES FUNCTION rollback now removes only the exact fixed/variadic Nereids UDF identity that was added by the failed batch.
- Does this need documentation: No
@morrySnow morrySnow force-pushed the codex/fix-function-rollback-nereids branch from 8b7b341 to 89d87e2 Compare July 2, 2026 07:38
@morrySnow

Copy link
Copy Markdown
Contributor Author

/review

@morrySnow

Copy link
Copy Markdown
Contributor Author

run buildall

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review for PR #65100 at head 89d87e2.

I found no new inline comments to submit in this pass. One remaining regression-test concern is still present at regression-test/suites/fault_injection_p0/test_create_function_rollback.groovy:49, but it is already covered by the existing inline thread #65100 (comment), so I did not duplicate it.

Critical checkpoint conclusions:

  • Goal and tests: the PR addresses create-function rollback ordering, table-function pair batching, Nereids drop identity, and OP_ADD_FUNCTIONS replay coverage. The added unit tests cover rollback, table-function batch logging, vararg identity, and journal read/replay. The regression test still has the already-threaded error-string concern.
  • Scope: the functional changes are focused on FE function registration, Nereids UDF registry state, persistence replay, and related tests.
  • Concurrency and locks: function add/drop paths remain synchronized on Database or GlobalFunctionMgr; no new async or cross-lock path was introduced.
  • Lifecycle/static state: no new static lifecycle issue found. Nereids registry add/drop state is updated through existing registry synchronization.
  • Configuration/session: no production config change. The regression uses the debug-point config for fault injection. Alias UDF session variables continue to be captured and restored during translation.
  • Compatibility/persistence: the new OP_ADD_FUNCTIONS payload has write, JournalEntity read, EditLog replay, and unit coverage. Existing OP_ADD_FUNCTION and global-function replay paths are unchanged.
  • Parallel paths: local scalar/aggregate/table functions, global functions, Java/Python/Alias UDF builders, drop, replay, and Nereids registry cleanup were checked. No missed parallel path requiring a new comment was found.
  • Test execution: static review only in this runner. thirdparty/installed/bin/protoc is missing, so FE tests were not run here. Added-line whitespace in the GitHub PR patch was checked and no trailing whitespace was found.

Subagent conclusions:

  • optimizer-rewrite: no new valuable findings. It checked UDF add/drop/replay paths, Nereids registry lookup/drop behavior, Alias/Java/Python UDF builders, command routing, and OP_ADD_FUNCTIONS replay wiring.
  • tests-session-config: no new valuable findings. It checked FE tests, regression test behavior, session/config propagation, journal persistence/replay, and basic style.
  • Main merge: MAIN-DUP-1 was classified as duplicate_of_existing_thread; MAIN-DISM-2 was dismissed as preexisting/overlapping existing IF NOT EXISTS review context.
  • Convergence: round 2 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.

User focus: no additional user-provided review focus was supplied.

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29488 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 89d87e26ba22f1a1dd4267482c9e72cba9bb9b94, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17883	4125	4082	4082
q2	2017	313	192	192
q3	10310	1454	806	806
q4	4685	470	343	343
q5	7529	862	570	570
q6	183	176	138	138
q7	815	830	631	631
q8	9384	1567	1518	1518
q9	5615	4422	4382	4382
q10	6758	1807	1527	1527
q11	506	349	317	317
q12	740	561	444	444
q13	18100	3414	2794	2794
q14	277	258	234	234
q15	q16	783	786	704	704
q17	1038	935	926	926
q18	6965	5872	5488	5488
q19	1168	1311	1104	1104
q20	771	636	558	558
q21	5630	2701	2432	2432
q22	464	358	298	298
Total cold run time: 101621 ms
Total hot run time: 29488 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4481	4459	4320	4320
q2	285	313	211	211
q3	4617	4976	4381	4381
q4	2081	2164	1354	1354
q5	4424	4330	4355	4330
q6	232	189	129	129
q7	1747	2087	1825	1825
q8	2553	2154	2189	2154
q9	8144	8222	7747	7747
q10	4726	4700	4310	4310
q11	587	410	386	386
q12	760	768	532	532
q13	3278	3645	2933	2933
q14	297	305	290	290
q15	q16	710	728	632	632
q17	1346	1352	1328	1328
q18	8050	7300	7383	7300
q19	1204	1117	1112	1112
q20	2229	2216	1913	1913
q21	5271	4529	4442	4442
q22	516	464	402	402
Total cold run time: 57538 ms
Total hot run time: 52031 ms

@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 89.25% (83/93) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 173027 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 89d87e26ba22f1a1dd4267482c9e72cba9bb9b94, data reload: false

query5	4344	629	482	482
query6	465	225	219	219
query7	4851	587	348	348
query8	336	197	169	169
query9	8765	4078	4019	4019
query10	509	370	301	301
query11	5928	2357	2146	2146
query12	167	108	102	102
query13	1270	605	456	456
query14	6282	5324	4969	4969
query14_1	4308	4318	4292	4292
query15	220	199	181	181
query16	1001	460	445	445
query17	1011	743	609	609
query18	2463	482	371	371
query19	221	197	158	158
query20	116	111	109	109
query21	242	164	137	137
query22	13705	13679	13386	13386
query23	17319	16536	16153	16153
query23_1	16250	16249	16400	16249
query24	7556	1801	1287	1287
query24_1	1332	1273	1281	1273
query25	581	471	392	392
query26	1335	351	217	217
query27	2595	589	387	387
query28	4488	2041	2017	2017
query29	1079	649	506	506
query30	338	263	231	231
query31	1124	1108	1038	1038
query32	100	62	65	62
query33	550	317	256	256
query34	1185	1159	638	638
query35	772	787	664	664
query36	1391	1416	1237	1237
query37	153	110	91	91
query38	1870	1702	1635	1635
query39	931	916	911	911
query39_1	873	867	886	867
query40	267	160	138	138
query41	66	63	63	63
query42	97	97	93	93
query43	329	331	277	277
query44	1471	778	802	778
query45	196	191	178	178
query46	1028	1185	747	747
query47	2400	2321	2241	2241
query48	393	423	309	309
query49	577	418	317	317
query50	1034	428	337	337
query51	4420	4370	4303	4303
query52	86	87	75	75
query53	257	284	201	201
query54	268	238	208	208
query55	75	71	71	71
query56	314	302	281	281
query57	1462	1414	1311	1311
query58	300	272	260	260
query59	1643	1667	1474	1474
query60	303	276	242	242
query61	160	150	153	150
query62	685	655	590	590
query63	244	206	205	205
query64	2553	770	628	628
query65	4866	4789	4796	4789
query66	1807	523	388	388
query67	29574	29752	28799	28799
query68	3207	1506	928	928
query69	409	312	270	270
query70	1091	979	958	958
query71	343	327	312	312
query72	2935	2689	2396	2396
query73	864	770	445	445
query74	5142	4966	4796	4796
query75	2629	2596	2229	2229
query76	2345	1182	792	792
query77	358	386	287	287
query78	12377	12504	11784	11784
query79	1425	1173	778	778
query80	1281	562	470	470
query81	524	333	273	273
query82	605	160	123	123
query83	406	324	299	299
query84	326	161	131	131
query85	981	613	528	528
query86	426	307	290	290
query87	1825	1825	1756	1756
query88	3750	2787	2772	2772
query89	453	409	352	352
query90	1929	199	197	197
query91	207	194	166	166
query92	64	60	58	58
query93	1654	1697	946	946
query94	738	362	304	304
query95	786	579	482	482
query96	1109	781	375	375
query97	2672	2683	2546	2546
query98	214	209	198	198
query99	1190	1149	1038	1038
Total cold run time: 259650 ms
Total hot run time: 173027 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
ClickBench: Total hot run time: 25.42 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 89d87e26ba22f1a1dd4267482c9e72cba9bb9b94, data reload: false

query1	0.01	0.00	0.00
query2	0.10	0.05	0.05
query3	0.26	0.14	0.13
query4	1.61	0.15	0.14
query5	0.24	0.22	0.22
query6	1.24	1.12	1.09
query7	0.05	0.01	0.00
query8	0.06	0.03	0.03
query9	0.37	0.31	0.32
query10	0.55	0.57	0.56
query11	0.20	0.14	0.14
query12	0.18	0.15	0.14
query13	0.48	0.48	0.47
query14	1.01	1.01	1.02
query15	0.62	0.59	0.58
query16	0.31	0.32	0.33
query17	1.09	1.13	1.05
query18	0.22	0.20	0.21
query19	2.08	1.97	1.99
query20	0.02	0.01	0.02
query21	15.43	0.21	0.14
query22	4.84	0.06	0.06
query23	16.19	0.30	0.13
query24	2.98	0.41	0.32
query25	0.12	0.05	0.05
query26	0.75	0.21	0.15
query27	0.03	0.04	0.04
query28	3.52	0.93	0.56
query29	12.53	4.25	3.46
query30	0.28	0.16	0.15
query31	2.77	0.62	0.32
query32	3.23	0.60	0.48
query33	3.18	3.22	3.25
query34	15.49	4.25	3.54
query35	3.50	3.56	3.55
query36	0.56	0.44	0.44
query37	0.09	0.07	0.07
query38	0.06	0.05	0.04
query39	0.04	0.03	0.03
query40	0.19	0.16	0.15
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 96.66 s
Total hot run time: 25.42 s

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 41.25% (66/160) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants