According to Unicode® Standard Annex #31 (Unicode Identifier and Pattern Syntax), the ID_Start and ID_Continue character categories are derived from the Unicode General_Category (see the UnicodeData File Format and Unicode Character Database documents for the full details).
See Table 2, Properties for Lexical Classes for Identifiers (summary):
ID_Start characters are derived from the Unicode General_Category of uppercase letters, lowercase letters, titlecase letters, modifier letters, other letters, letter numbers, plus Other_ID_Start, minus Pattern_Syntax and Pattern_White_Space code points.
- In set notation:
[[:L:][:Nl:][:Other_ID_Start:]--[:Pattern_Syntax:]--[:Pattern_White_Space:]]
ID_Continue characters include ID_Start characters, plus characters having the Unicode General_Category of nonspacing marks, spacing combining marks, decimal number, connector punctuation, plus Other_ID_Continue, minus Pattern_Syntax and Pattern_White_Space code points.
- In set notation:
[[:ID_Start:][:Mn:][:Mc:][:Nd:][:Pc:][:Other_ID_Continue:]--[:Pattern_Syntax:]--[:Pattern_White_Space:]]
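As a rough illustration of the two set-notation expressions above, here is a minimal Python sketch that checks only the General_Category part, using the standard `unicodedata` module. It is an approximation: `unicodedata` does not expose Other_ID_Start, Other_ID_Continue, Pattern_Syntax or Pattern_White_Space, so those terms are left out, and the answers reflect whatever Unicode version your Python build ships with. The function names are made up for this example.

```python
import unicodedata

# General_Category values behind [:L:] and [:Nl:] in the ID_Start expression.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Extra categories that ID_Continue adds on top of ID_Start.
ID_CONTINUE_EXTRA_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start_approx(ch: str) -> bool:
    """Approximate ID_Start: General_Category test only.

    Assumption: Other_ID_Start is not added, and Pattern_Syntax /
    Pattern_White_Space are not subtracted.
    """
    return unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_continue_approx(ch: str) -> bool:
    """Approximate ID_Continue: ID_Start plus Mn, Mc, Nd, Pc."""
    return (
        is_id_start_approx(ch)
        or unicodedata.category(ch) in ID_CONTINUE_EXTRA_CATEGORIES
    )
```

For example, `is_id_start_approx("_")` is false because LOW LINE has category Pc (connector punctuation), while `is_id_continue_approx("_")` is true.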
These definitions refer to Other_ID_Start, Other_ID_Continue, Pattern_Syntax and Pattern_White_Space; for example:
The exact list of characters covered by the Other_ID_Start and Other_ID_Continue properties depends on the version of Unicode. For more information, see Unicode Standard Annex #44, “Unicode Character Database” [UAX44].
Parse UnicodeData.txt and apply the regex(es) built from the set notation above. Use the version of UnicodeData.txt that matches your target Unicode version; you can find it by browsing the Index of /Public.
http://unicode.org/Public/5.0.0/ucd/UnicodeData.txt
                          ↑ ↑ ↑
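The parsing step could be sketched in Python as follows. UnicodeData.txt is a semicolon-separated file in which the first field is the code point in hex, the second is the character name and the third is the General_Category; large blocks are given as a pair of `<..., First>` and `<..., Last>` lines. The helper names and the sample lines below are illustrative only (read the real file for actual data), and the sketch again covers only the General_Category part: the Other_ID_Start, Other_ID_Continue, Pattern_Syntax and Pattern_White_Space properties are listed in PropList.txt, not in UnicodeData.txt.

```python
from typing import Iterable, Iterator, Set, Tuple

ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

# A few sample lines in the UnicodeData.txt layout (for illustration only).
SAMPLE_LINES = [
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "005F;LOW LINE;Pc;0;ON;;;;;N;SPACING UNDERSCORE;;;;",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "9FBB;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
]


def iter_code_points(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (code point, General_Category), expanding First/Last ranges."""
    range_start = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)
        name, category = fields[1], fields[2]
        if name.endswith(", First>"):
            # Remember where the range begins; the Last line closes it.
            range_start = code_point
            continue
        if name.endswith(", Last>") and range_start is not None:
            for cp in range(range_start, code_point + 1):
                yield cp, category
            range_start = None
            continue
        yield code_point, category


def code_points_in_categories(
    lines: Iterable[str], categories: Set[str]
) -> Set[int]:
    """Collect every code point whose General_Category is in `categories`."""
    return {cp for cp, cat in iter_code_points(lines) if cat in categories}


id_start_approx = code_points_in_categories(SAMPLE_LINES, ID_START_CATEGORIES)
```

To run this against a real file, pass an open file object instead of `SAMPLE_LINES`, for example `open("UnicodeData.txt", encoding="utf-8")` for a locally downloaded copy.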