Which regular expression features are standard, and which are flavor-specific?
What should I do, and what should I avoid, if I want to use the same regex across different environments, languages, and platforms?
There is no standard, but if maximum portability is your goal, you should stick with the features supported by JavaScript regexes. All the other major flavors support everything JavaScript does, with only minor syntactic variations here and there. For example, some only support the POSIX character-class notation ([:alpha:]), while others use the Unicode syntax (\p{Alpha}).
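As a rough illustration of how such syntax differences surface in practice, the sketch below probes Python's built-in re module, which happens to accept neither notation. The helper name compiles is made up for this example, and the results say nothing about other flavors.

```python
import re
import warnings


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# The Unicode property syntax is rejected outright by the standard re module.
unicode_property_ok = compiles(r"\p{Alpha}")

# A plain ASCII range is the lowest common denominator and compiles everywhere.
explicit_range_ok = compiles(r"[A-Za-z]")

# The POSIX notation is worse than an error here: it compiles, but as an
# ordinary set of the characters "[", ":", "a", "l", "p", "h" followed by a
# literal "]", so it silently fails to match an arbitrary letter.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # Python may warn about a possible nested set
    posix_matches_letter = re.fullmatch(r"[[:alpha:]]", "x") is not None
```

Spelling out the range by hand is less expressive than either notation, but it is the form least likely to change meaning when the pattern moves to another engine.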
Probably the most troublesome variations are those affecting the dot (.) and the anchors (^ and $). For example, JavaScript had no DOTALL (or "single-line") mode until ES2018 added the s flag, so to match any character including a newline in older engines you have to use a workaround like [\s\S]. Meanwhile, Ruby has a DOTALL mode but calls it multiline mode; what every other flavor calls "multiline" (^ and $ as line anchors) is how Ruby always works.
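The difference between the default dot, the [\s\S] workaround, and a flavor-specific flag can be seen in a few lines. This is a minimal sketch using Python's re module; the sample text and variable names are invented for the example, and flag names differ in other flavors.

```python
import re

text = "first line\nsecond line"

# In default mode the dot stops at the newline.
default_dot = re.search(r"first.*", text).group()

# The [\s\S] workaround matches any character, newline included,
# without depending on a flag that other flavors may name differently.
portable_any = re.search(r"first[\s\S]*", text).group()

# Python's DOTALL flag gives the same result as the workaround.
dotall = re.search(r"first.*", text, re.DOTALL).group()

# In Python, ^ is a line anchor only under re.MULTILINE; without the flag
# it matches only at the very start of the string.
with_multiline = re.findall(r"^\w+", text, re.MULTILINE)
without_multiline = re.findall(r"^\w+", text)
```

A pattern that uses [\s\S] and avoids relying on ^ and $ as line anchors behaves the same whether or not these modes are available.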
Keep in mind, too, that the dot does not match line-separator characters (in default mode). Traditionally that meant just the linefeed (\n), but more and more flavors follow (or at least approximate) the Unicode guidelines on line separators. For example, in Java the dot does not match any of [\r\n\u0085\u2028\u2029], while ^ and $ treat \r\n as a single separator and will not match between the two characters.
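To make the contrast concrete, here is a small sketch of which of those characters the dot rejects in Python's re module, which follows the traditional linefeed-only rule. The dictionary labels and the name portable_non_separator are illustrative choices, not anything from a library.

```python
import re

# The line-separator characters mentioned above, with informal labels.
candidates = {
    "\n": "LF",
    "\r": "CR",
    "\u0085": "NEL",
    "\u2028": "LS",
    "\u2029": "PS",
}

# Collect the characters that the default-mode dot refuses to match.
dot_rejects = [
    label for ch, label in candidates.items() if re.fullmatch(r".", ch) is None
]

# An explicit negated class excludes all five separators, so it does not
# depend on how a particular flavor defines the dot.
portable_non_separator = re.compile(r"[^\r\n\u0085\u2028\u2029]")
all_rejected = all(
    portable_non_separator.fullmatch(ch) is None for ch in candidates
)
```

In Python only the linefeed ends up in dot_rejects, so a pattern that relies on the dot stopping at \r or \u2028 would behave differently there than in Java. Writing the negated class explicitly removes that dependency.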
Note that I am only talking about the Perl-derived flavors, such as Python, Ruby, PHP, JavaScript, etc. It would not make sense to include GNU- or POSIX-based flavors such as grep, awk, and MySQL; they tend to have fewer features, but that is not why you choose them anyway.
I am also not including the XML Schema flavor; it is much more limited than JavaScript. For example, it does not support the anchors (^, $, \A, \z, etc.) because matches are always anchored at both ends.
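The "anchored at both ends" behavior can be approximated in a Perl-derived flavor for comparison. The sketch below uses Python's re.fullmatch as an analogy only, not as an XML Schema implementation; the sample pattern and strings are invented.

```python
import re

pattern = r"\d{4}"

# An unanchored search finds the digits anywhere in the string.
search_hit = re.search(pattern, "id 2024 ok") is not None

# fullmatch behaves as if the pattern were anchored at both ends, which is
# roughly how an XML Schema pattern is always applied.
full_hit = re.fullmatch(pattern, "id 2024 ok") is not None
full_exact = re.fullmatch(pattern, "2024") is not None

# The same effect with explicit anchors; note that Python spells the
# absolute end-of-string anchor \Z, another small flavor difference.
anchored_exact = re.search(r"\A(?:" + pattern + r")\Z", "2024") is not None
```

Wrapping the pattern in a non-capturing group before adding the anchors keeps any alternation inside it from escaping them.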