Why isn’t String’s - Java interview questions

Whether the String class's length() method in Java is "accurate" depends on what you mean by accurate. length() reports exactly what it is specified to report: the number of 16-bit char values, that is, UTF-16 code units, in the String[3][4][12]. That count does not always match a human's intuitive idea of a string's length, meaning the number of visible characters (graphemes), because some Unicode characters need more than one char value.

A Unicode character (code point) is represented in Java either by a single code unit (one char) or by a pair of code units known as a surrogate pair. This is because Java's char is a 16-bit UTF-16 code unit, and the Unicode standard defines more characters than 16 bits can address. Characters outside the Basic Multilingual Plane (BMP), that is, code points above U+FFFF, are therefore encoded as two 16-bit code units[3][7][12].
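A minimal sketch of this encoding rule, using only standard java.lang.Character methods. The class name SurrogatePairDemo, the helper names, and the two sample code points (U+00E9 and U+1F600) are chosen here for illustration and are not from the original answer.

```java
public class SurrogatePairDemo {

    /** Returns how many UTF-16 code units (Java chars) one code point needs. */
    public static int unitsFor(int codePoint) {
        return Character.charCount(codePoint);
    }

    /** Formats the UTF-16 code units of a code point as hex, e.g. "D83D DE00". */
    public static String utf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int eAcute = 0x00E9;    // U+00E9, inside the BMP (illustrative choice)
        int grinning = 0x1F600; // U+1F600, a supplementary code point (illustrative choice)

        System.out.println(unitsFor(eAcute));   // 1
        System.out.println(unitsFor(grinning)); // 2
        System.out.println(utf16Hex(eAcute));   // 00E9
        System.out.println(utf16Hex(grinning)); // D83D DE00

        // The two chars of a supplementary code point form a surrogate pair.
        char[] pair = Character.toChars(grinning);
        System.out.println(Character.isHighSurrogate(pair[0]));                  // true
        System.out.println(Character.isLowSurrogate(pair[1]));                   // true
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == grinning); // true
    }
}
```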

When a String contains such supplementary characters, length() counts each of them as two, because each one occupies two char values. Combining characters, such as diacritical marks that attach to the preceding character, are separate code points and also add to the count returned by length(), even though they do not increase the number of characters a reader perceives[7][8][11].
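Both effects can be observed with a short sketch that compares length() against codePointCount(). The sample strings (a musical G clef, U+1D11E, and the letter e followed by a combining acute accent, U+0301) and the class and helper names are assumptions picked for illustration.

```java
import java.text.Normalizer;

public class LengthVsCodePoints {

    /** Number of UTF-16 code units, which is what length() reports. */
    public static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the whole string. */
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // One supplementary character: a single code point stored as a surrogate pair.
        String clef = new StringBuilder().appendCodePoint(0x1D11E).toString();
        System.out.println(codeUnits(clef));  // 2
        System.out.println(codePoints(clef)); // 1

        // 'e' followed by a combining acute accent: one perceived character,
        // but two code points and two code units.
        String decomposed = "e\u0301";
        System.out.println(codeUnits(decomposed));  // 2
        System.out.println(codePoints(decomposed)); // 2

        // NFC normalization composes this pair into the single code point U+00E9.
        String composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(codeUnits(composed));  // 1
        System.out.println(codePoints(composed)); // 1
    }
}
```

Note that codePointCount() fixes the surrogate-pair discrepancy but not the combining-character one, and normalization only helps when a precomposed form exists.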

For example, the string "🤦🏼‍♂️" (facepalm emoji with a skin tone modifier and gender sign) is a single grapheme to a human reader, but it is composed of several Unicode code points and even more UTF-16 code units. length() returns the number of code units, not the number of graphemes or code points, which can be confusing or seem inaccurate to anyone not familiar with Unicode's intricacies[8][18].
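The sketch below rebuilds this emoji from explicit code points so the counts can be checked. It assumes the emoji is encoded as the five-code-point sequence U+1F926, U+1F3FC, U+200D, U+2642, U+FE0F; the class and method names are illustrative. The grapheme count uses java.text.BreakIterator, whose handling of emoji sequences varies between JDK versions, so no expected value is given for it.

```java
import java.text.BreakIterator;

public class FacepalmLength {

    /** Builds the emoji from its assumed code point sequence. */
    public static String facepalm() {
        return new StringBuilder()
                .appendCodePoint(0x1F926) // person facepalming (2 code units)
                .appendCodePoint(0x1F3FC) // skin tone modifier (2 code units)
                .appendCodePoint(0x200D)  // zero width joiner (1 code unit)
                .appendCodePoint(0x2642)  // male sign (1 code unit)
                .appendCodePoint(0xFE0F)  // variation selector-16 (1 code unit)
                .toString();
    }

    /** Counts character boundaries as seen by the default BreakIterator. */
    public static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = facepalm();
        System.out.println(s.length());                      // 7 code units
        System.out.println(s.codePointCount(0, s.length())); // 5 code points

        // List the individual code points.
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));

        // Result depends on the JDK's Unicode segmentation support.
        System.out.println(graphemes(s));
    }
}
```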
...
