About Defining Valid Java Identifiers

Identifiers in Java can be written using Unicode characters. The Java Language Specification (JLS) defines an identifier as an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. This sequence cannot have the same spelling as a Java keyword, a boolean literal (true or false), or the null literal.

Thus, a Java identifier is composed of two parts:

[Java letter] + [Sequence of Java digits or Java letters]

What are, then, Java digits and Java letters?

According to the JLS, a Java letter is any character for which the method Character.isJavaIdentifierStart(int) returns true. And a "Java-letter-or-digit" is any character for which the method Character.isJavaIdentifierPart(int) returns true.

Using these methods we can easily determine which characters of the whole Unicode character set are valid Java letters or Java digits.

For instance, the following listing goes over the whole Basic Multilingual Plane looking for characters of the Basic Latin Unicode block that are valid Java letters.


// Scan the BMP for Basic Latin characters that can start an identifier.
for (char c = 0; c < Character.MAX_VALUE; c++) {
    if (Character.isJavaIdentifierStart(c)) {
        if (Character.UnicodeBlock.of(c).equals(Character.UnicodeBlock.BASIC_LATIN)) {
            System.out.print(c + " ");
        }
    }
}

This would yield the following output:

$ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z

As you can see, if we write identifiers using only Basic Latin characters, the only valid Java letters are [A-Z], [a-z], the underscore (_) and the dollar sign ($).

Therefore, the following are invalid identifiers, because each one starts with a character that is not a Java letter:

48chevy .name -ouput

Whereas these are valid identifiers:

$money _name chevy
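
The rule can also be applied to a whole string rather than to single characters. The following is a minimal sketch, not a definitive implementation: the class name IdentifierCheck and the method isValidIdentifier are hypothetical names chosen for illustration, and the check covers only the letter and digit rule, so keywords and literals are not rejected.

```java
// Hypothetical helper; the names IdentifierCheck and isValidIdentifier
// are assumptions made for this sketch, not part of the JDK.
class IdentifierCheck {

    // Returns true if s starts with a Java letter and continues only with
    // Java letters or digits. Keywords and literals are NOT rejected here.
    static boolean isValidIdentifier(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Advance by code point so supplementary characters are handled too.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"48chevy", ".name", "-ouput", "$money", "_name", "chevy"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidIdentifier(s));
        }
    }
}
```

Run against the six examples above, the first three are reported as invalid and the last three as valid.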

The following listing goes over the Basic Multilingual Plane looking for characters in the Basic Latin Unicode block that are valid as a "Java-letter-or-digit":


// Scan the BMP for Basic Latin characters that can appear after the first character.
for (char c = 0; c < Character.MAX_VALUE; c++) {
    if (Character.isJavaIdentifierPart(c)) {
        if (Character.UnicodeBlock.of(c).equals(Character.UnicodeBlock.BASIC_LATIN)) {
            System.out.print(c + " ");
        }
    }
}

This code yields the following output:

$ 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z

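As you can see, in this case digits are a valid part of an identifier, provided that they are not the first character. Note that the loop also prints some non-printing control characters (for example U+0000 and U+007F), because Character.isJavaIdentifierPart(int) also returns true for any character accepted by Character.isIdentifierIgnorable(int); they are simply invisible in the output above.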

These are valid identifiers:

$chevy248 _n646 no$money
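
The difference between the two methods can be seen side by side for a digit. The sketch below also filters the Basic Latin scan with Character.isIdentifierIgnorable(int), so that only the visible characters are collected. The class name IdentifierPartDemo and the method visibleBasicLatinParts are hypothetical names used only for this illustration.

```java
// Hypothetical demo class; the names are assumptions made for this sketch.
class IdentifierPartDemo {

    // Collects the Basic Latin characters accepted by isJavaIdentifierPart,
    // skipping the "ignorable" control characters that have no visible glyph.
    static String visibleBasicLatinParts() {
        StringBuilder sb = new StringBuilder();
        for (int cp = 0; cp <= 0x7F; cp++) { // Basic Latin is U+0000..U+007F
            if (Character.isJavaIdentifierPart(cp)
                    && !Character.isIdentifierIgnorable(cp)) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(visibleBasicLatinParts());
        // A digit cannot start an identifier, but it can follow a Java letter.
        System.out.println(Character.isJavaIdentifierStart('7'));
        System.out.println(Character.isJavaIdentifierPart('7'));
    }
}
```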

Since identifiers can be written in Unicode, you can take these checks further if you want to write identifiers in other languages, using different scripts.

For instance, the following listing shows how to determine which characters of the Latin Extended-A Unicode block are valid as a "Java-letter-or-digit":


// Scan the BMP for Latin Extended-A characters that are valid identifier parts.
for (char c = 0; c < Character.MAX_VALUE; c++) {
    if (Character.isJavaIdentifierPart(c)) {
        if (Character.UnicodeBlock.of(c).equals(Character.UnicodeBlock.LATIN_EXTENDED_A)) {
            System.out.print(c + " ");
        }
    }
}

This would yield the following output:

Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į İ ı Ĳ ĳ Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ ŀ Ł ł Ń ń Ņ ņ Ň ň ŉ Ŋ ŋ Ō ō Ŏ ŏ Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ
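
The listings so far differ only in the Character method and in the Unicode block, so they could be folded into a single helper. The following is a sketch under a few assumptions: the class name BlockScan and the method matching are hypothetical, and the loop runs over int code points up to Character.MAX_CODE_POINT instead of char values, so it also reaches the supplementary planes that a char loop cannot cover.

```java
import java.util.function.IntPredicate;

// Hypothetical helper; BlockScan and matching are names invented for this sketch.
class BlockScan {

    // Returns, separated by single spaces, every code point of the given
    // block that satisfies the given rule.
    static String matching(Character.UnicodeBlock block, IntPredicate rule) {
        StringBuilder sb = new StringBuilder();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            // UnicodeBlock.of returns null outside any defined block,
            // so == is used instead of equals.
            if (rule.test(cp) && Character.UnicodeBlock.of(cp) == block) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Java letters in Basic Latin.
        System.out.println(matching(Character.UnicodeBlock.BASIC_LATIN,
                cp -> Character.isJavaIdentifierStart(cp)));
        // Java letters or digits in Latin Extended-A.
        System.out.println(matching(Character.UnicodeBlock.LATIN_EXTENDED_A,
                cp -> Character.isJavaIdentifierPart(cp)));
    }
}
```

Passing the rule as an IntPredicate keeps the scanning loop in one place; only the block constant and the lambda change from one query to the next.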

The following listing shows which Greek characters are valid as Java letters:


// Scan the BMP for characters of the Greek block that can start an identifier.
for (char c = 0; c < Character.MAX_VALUE; c++) {
    if (Character.isJavaIdentifierStart(c)) {
        if (Character.UnicodeBlock.of(c).equals(Character.UnicodeBlock.GREEK)) {
            System.out.print(c + " ");
        }
    }
}

This yields output like the following (the exact list depends on the Unicode version supported by your JDK):

Ά Έ Ή Ί Ό Ύ Ώ ΐ Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω Ϊ Ϋ ά έ ή ί ΰ α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ ς σ τ υ φ χ ψ ω ϊ ϋ ό ύ ώ

The following listing shows how to determine which currency symbols are valid Java letters:


// Scan the BMP for currency symbols that can start an identifier.
for (char c = 0; c < Character.MAX_VALUE; c++) {
    if (Character.isJavaIdentifierStart(c)) {
        if (Character.UnicodeBlock.of(c).equals(Character.UnicodeBlock.CURRENCY_SYMBOLS)) {
            System.out.print(c + " ");
        }
    }
}

This yields something like this:

₣ ₤ ₧ ₩ ₪ ₫ €

Meaning that the following are valid identifiers:

₤pound ₧pesetas €euro

This implies that we could write Java identifiers in practically any language. For instance:

String αρετη = "";
int español = 0;
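
The character checks alone would also accept reserved words such as class or true, which the JLS excludes from identifiers. One possible way to cover that part of the rule is the standard javax.lang.model.SourceVersion class. This is only a sketch: the class name NameCheck and the method isUsableIdentifier are hypothetical, the java.compiler module is assumed to be available in your runtime, and the result reflects the latest source version known to the running JDK.

```java
import javax.lang.model.SourceVersion;

// Hypothetical helper; NameCheck and isUsableIdentifier are invented names.
class NameCheck {

    // True when s has the shape of an identifier and is not a keyword,
    // a boolean literal or the null literal.
    static boolean isUsableIdentifier(String s) {
        return SourceVersion.isIdentifier(s) && !SourceVersion.isKeyword(s);
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u03b1\u03c1\u03b5\u03c4\u03b7", // the Greek example, as Unicode escapes
            "espa\u00f1ol",                   // the Spanish example, as a Unicode escape
            "chevy", "class", "true", "null", "48chevy"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isUsableIdentifier(s));
        }
    }
}
```

SourceVersion.isIdentifier applies the same letter and digit rule described in this article, while SourceVersion.isKeyword reports keywords together with the true, false and null literals, so combining them approximates the full definition.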

Well, I hope you find this useful. I will try to delve into other interesting topics in future articles, covering many of the subjects related to the SCJP.