Skip to content

ext/intl: Pre-size arrays created from ICU enumerations#22483

Merged
LamentXU123 merged 2 commits into
php:masterfrom
kn1g78:intl-array-presize
Jun 27, 2026
Merged

ext/intl: Pre-size arrays created from ICU enumerations#22483
LamentXU123 merged 2 commits into
php:masterfrom
kn1g78:intl-array-presize

Conversation

@kn1g78

@kn1g78 kn1g78 commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Use uenum_count() to pre-size arrays built from ICU enumerations.

This avoids unnecessary HashTable growth in transliterator_list_ids(),
resourcebundle_locales() and locale_get_keywords().

locale_get_keywords() also uses the keyword length reported by ICU when
inserting the associative entry, with a fallback to strlen() for unknown
lengths.

If counting fails, the previous behavior is preserved by falling back to an
unsized array initialization.

@kn1g78 kn1g78 marked this pull request as ready for review June 27, 2026 08:13
Comment thread ext/intl/locale/locale_methods.cpp Outdated

/* Traverse it, filling the return array. */
array_init( return_value );
array_init_size( return_value, count > 0 ? count : 0 );

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think you need this ternary expression count is 0 otherwise.

@devnexen

Copy link
Copy Markdown
Member

lgtm minus the nit ; here since these are enumerations, uenum_count should not be slow but it s not a huge win neither.

@LamentXU123 LamentXU123 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a. The min size of hashtable in Zend is 8. So array_init won't grow if the size is smaller than 8. That makes your optimization in locale_get_keyword not that much useful (it could even be slower)
b. please run a benchmark if you think this optimize codes. It is not so obvious because uenum_count can be slow either (as David mentioned). See: https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/common/unicode/uenum.h
c. I don't love the additional complexity. (But it is good if this do speed-up things :) )

@kn1g78 kn1g78 force-pushed the intl-array-presize branch from f861cc2 to 4f580c7 Compare June 27, 2026 09:11
@kn1g78 kn1g78 requested a review from LamentXU123 June 27, 2026 09:15

@LamentXU123 LamentXU123 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍The remaining one should be obvious optimization.
Seems like the Resourceboundle one is deleted? Feel free to add it back if you have your bench results.

@kn1g78 kn1g78 force-pushed the intl-array-presize branch from 4f580c7 to 644aacd Compare June 27, 2026 09:57
@kn1g78

kn1g78 commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

Use uenum_count() to pre-size arrays built from ICU enumerations in
transliterator_list_ids() and resourcebundle_locales().

falls back to an unsized array and keeps the existing iteration behavior.

locale_get_keywords() is left unchanged because those arrays are usually
small.

Local ICU-level benchmark numbers, using PHP 8.5.6RC3 + ICU 77 through FFI:

Counts: transliterator=758, resource(NULL)=881, keyword=3

utrans_openIDs + uenum_count             1.033 us/op
utrans_openIDs + full iterate         2017.807 us/op

ures_available(NULL) + uenum_count       2.916 us/op
ures_available(NULL) + full iterate    879.196 us/op

uloc_openKeywords + uenum_count          5.498 us/op
uloc_openKeywords + full iterate         8.059 us/op

@LamentXU123 LamentXU123 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@LamentXU123 LamentXU123 merged commit 8247706 into php:master Jun 27, 2026
17 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants