public final class Normalizer
extends Object
java.lang.Object | |
↳ | java.text.Normalizer |
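This class provides the method normalize, which transforms Unicode text into an equivalent composed or decomposed form, allowing for easier sorting and searching of text. The normalize method supports the standard normalization forms described in Unicode Standard Annex #15: Unicode Normalization Forms.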
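Characters with accents or other adornments can be encoded in several different ways in Unicode. For example, take the character A-acute. In Unicode, this can be encoded as a single character (the "composed" form):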
U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
or as two separate characters (the "decomposed" form):
U+0041 LATIN CAPITAL LETTER A
U+0301 COMBINING ACUTE ACCENT
To a user of your program, however, both of these sequences should be treated as the same "user-level" character "A with acute accent". When you are searching or comparing text, you must ensure that these two sequences are treated as equivalent. In addition, you must handle characters with more than one accent. Sometimes the order of a character's combining accents is significant, while in other cases accent sequences in different orders are really equivalent.
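The equivalence of the two A-acute encodings can be checked directly. The following is a minimal sketch; the class name `ComposedVsDecomposed` and the helper `canonicallyEqual` are illustrative names, not part of the API. It normalizes both strings to NFC before comparing them.

```java
import java.text.Normalizer;

final class ComposedVsDecomposed {
    /** Compares two strings after bringing both into NFC. */
    static boolean canonicallyEqual(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String composed = "\u00C1";    // LATIN CAPITAL LETTER A WITH ACUTE
        String decomposed = "A\u0301"; // A followed by COMBINING ACUTE ACCENT

        // A plain comparison sees two different char sequences.
        System.out.println(composed.equals(decomposed));            // false
        // After normalization the two spellings compare equal.
        System.out.println(canonicallyEqual(composed, decomposed)); // true
    }
}
```

Normalizing both sides to NFD instead would work equally well for comparison; what matters is that both operands use the same form.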
Similarly, the string "ffi" can be encoded as three separate letters:
U+0066 LATIN SMALL LETTER F
U+0066 LATIN SMALL LETTER F
U+0069 LATIN SMALL LETTER I
or as the single character
U+FB03 LATIN SMALL LIGATURE FFI
The ffi ligature is not a distinct semantic character, and strictly speaking it shouldn't be in Unicode at all, but it was included for compatibility with existing character sets that already provided it. The Unicode standard identifies such characters by giving them "compatibility" decompositions into the corresponding semantic characters. When sorting and searching, you will often want to use these mappings.
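As a rough illustration of the compatibility mapping, the sketch below folds the ffi ligature into its three letters with NFKC. The class name `LigatureFolding` and both helper names are made up for this example. It also shows that the canonical form NFC leaves the ligature untouched, because the ligature has only a compatibility decomposition.

```java
import java.text.Normalizer;

final class LigatureFolding {
    /** Applies compatibility decomposition followed by canonical composition (NFKC). */
    static String foldCompatibility(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    /** Applies canonical normalization only (NFC). */
    static String canonicalOnly(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String word = "o\uFB03ce"; // "office" spelled with LATIN SMALL LIGATURE FFI

        // NFC keeps the ligature, so the length stays at 4 chars.
        System.out.println(canonicalOnly(word).length());     // 4
        // NFKC replaces the ligature with the letters f, f, i.
        System.out.println(foldCompatibility(word));           // office
    }
}
```

Compatibility folding discards the distinction between the ligature and its letters, so it suits search keys and sort keys better than text that will be displayed or stored as-is.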
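The normalize method helps solve these problems by transforming text into the canonical composed and decomposed forms shown in the first example above. In addition, you can have it perform compatibility decompositions so that compatibility characters are treated the same as their equivalents. Finally, the normalize method rearranges accents into the proper canonical order, so that you do not have to handle accent reordering yourself.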
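The accent reordering can be observed with two combining marks that attach to different positions of the base letter. This is a small sketch with illustrative names (`CanonicalOrdering`, `nfd`). It relies on COMBINING DOT BELOW (U+0323) having a lower canonical combining class than COMBINING ACUTE ACCENT (U+0301), whereas U+0301 and COMBINING DIAERESIS (U+0308) share the same class and are therefore never swapped.

```java
import java.text.Normalizer;

final class CanonicalOrdering {
    /** Returns the canonical decomposition (NFD) of the input. */
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // Dot below and acute do not interact, so both orders are equivalent
        // and normalize to the same sequence: a, U+0323, U+0301.
        String acuteFirst = "a\u0301\u0323";
        String dotFirst = "a\u0323\u0301";
        System.out.println(nfd(acuteFirst).equals(nfd(dotFirst))); // true

        // Acute and diaeresis both sit above the letter; their order is
        // significant, so normalization keeps the two sequences distinct.
        String acuteThenDiaeresis = "a\u0301\u0308";
        String diaeresisThenAcute = "a\u0308\u0301";
        System.out.println(nfd(acuteThenDiaeresis).equals(nfd(diaeresisThenAcute))); // false
    }
}
```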
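The W3C generally recommends exchanging text in NFC. Note also that most legacy character encodings use only precomposed forms and often do not encode any combining marks by themselves. To convert to such a character encoding, the Unicode text needs to be normalized to NFC first. For more usage examples, see the Unicode Standard Annex.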
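The effect on legacy encodings can be sketched with ISO-8859-1, chosen here only as a familiar example of an encoding without combining marks. The class `LegacyEncoding` and its helpers are hypothetical names for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

final class LegacyEncoding {
    /** Reports whether every char of the string can be encoded in ISO-8859-1. */
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    /** Normalizes to NFC, then encodes as ISO-8859-1. */
    static byte[] toLatin1(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String decomposed = "A\u0301"; // A followed by COMBINING ACUTE ACCENT

        // The combining mark has no ISO-8859-1 representation.
        System.out.println(fitsLatin1(decomposed));       // false
        // After NFC the text is the single precomposed character, one byte.
        System.out.println(toLatin1(decomposed).length);  // 1
    }
}
```

Note that `String.getBytes` silently substitutes a replacement byte for unmappable characters, so a check such as `fitsLatin1` on the normalized text is still worthwhile when the input may contain characters outside the target encoding.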
Nested classes |
|
---|---|
enum |
Normalizer.Form This enum provides constants for the four Unicode normalization forms described in Unicode Standard Annex #15: Unicode Normalization Forms, and two methods to access them. |
Public methods |
|
---|---|
static boolean |
isNormalized(CharSequence src, Normalizer.Form form) Determines whether the given sequence of char values is normalized. |
static String |
normalize(CharSequence src, Normalizer.Form form) Normalizes a sequence of char values. |
Inherited methods |
|
---|---|
From class java.lang.Object
|
boolean isNormalized (CharSequence src, Normalizer.Form form)
Determines whether the given sequence of char values is normalized.
Parameters | |
---|---|
src |
CharSequence : The sequence of char values to be checked. |
form |
Normalizer.Form : The normalization form; one of NFC , NFD , NFKC , NFKD |
Returns | |
---|---|
boolean |
true if the sequence of char values is normalized; false otherwise. |
Throws | |
---|---|
NullPointerException |
If src or form is null. |
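One possible use of isNormalized is to skip the normalization call when the text is already in the target form. The sketch below assumes NFC as the target; the class `NormalizeIfNeeded` and the method `toNfc` are illustrative names only.

```java
import java.text.Normalizer;

final class NormalizeIfNeeded {
    /** Returns the text in NFC, normalizing only when it is not already normalized. */
    static String toNfc(CharSequence text) {
        if (Normalizer.isNormalized(text, Normalizer.Form.NFC)) {
            return text.toString();
        }
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Plain ASCII is already in NFC and is returned unchanged.
        System.out.println(toNfc("abc"));                         // abc
        // The decomposed A-acute is composed into one character.
        System.out.println(toNfc("A\u0301").equals("\u00C1"));    // true
    }
}
```

Whether the check saves time depends on the implementation and the input, so treat it as an optional pattern rather than a required step.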
String normalize (CharSequence src, Normalizer.Form form)
Normalizes a sequence of char values. The sequence is normalized according to the specified normalization form.
Parameters | |
---|---|
src |
CharSequence : The sequence of char values to normalize. |
form |
Normalizer.Form : The normalization form; one of NFC , NFD , NFKC , NFKD |
Returns | |
---|---|
String |
The normalized String |
Throws | |
---|---|
NullPointerException |
If src or form is null. |
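To compare the four forms side by side, the following sketch normalizes one sample string with each form and prints the resulting code points. The class `FourForms` and the helpers `describe` and `codePoints` are hypothetical names introduced for this example; the sample combines the precomposed A-acute with the ffi ligature.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

final class FourForms {
    /** Formats a string as space-separated code points, such as "U+0041 U+0301". */
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    /** Normalizes the input with the given form and returns its code points. */
    static String describe(String s, Normalizer.Form form) {
        return codePoints(Normalizer.normalize(s, form));
    }

    public static void main(String[] args) {
        String sample = "\u00C1\uFB03"; // A WITH ACUTE, then LIGATURE FFI

        // NFC:  U+00C1 U+FB03                 (unchanged)
        // NFD:  U+0041 U+0301 U+FB03          (accent split off)
        // NFKC: U+00C1 U+0066 U+0066 U+0069   (ligature expanded)
        // NFKD: U+0041 U+0301 U+0066 U+0066 U+0069
        for (Normalizer.Form form : Normalizer.Form.values()) {
            System.out.println(form + ": " + describe(sample, form));
        }
    }
}
```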