So I have %u041E%u043B%u0435%u0433%20%u042F%u043a
Is it possible to turn it into UTF-8 or (better for me) HTML entities?
That is the JavaScript escape() format. It is similar to URL-encoding but is not compatible with it. Its use is usually a mistake.
The best thing to do is to change the script that generates it to use proper URL-encoding (encodeURIComponent()) instead. After that you can decode it with urldecode or with any other normal URL-decoding function on the server side.
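As a rough sketch of the server side, assuming the client has been changed to send encodeURIComponent() output (UTF-8 bytes written as %XX escapes), the standard decoder is all that is needed. The sample string is the text from the question, re-encoded by hand for illustration.

```php
<?php
// Hypothetical input: what encodeURIComponent('Олег Як') would produce
$encoded = '%D0%9E%D0%BB%D0%B5%D0%B3%20%D0%AF%D0%BA';

// rawurldecode() turns each %XX escape back into one byte and leaves '+' alone
$decoded = rawurldecode($encoded);

echo $decoded, "\n";
```

PHP already applies this kind of decoding to $_GET and $_POST values, so an explicit call is only needed for strings that arrive some other way.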
If you have to exchange data in this non-standard format, then you have to write a custom decoder for it. Here is a quick hack that takes advantage of the HTML character-reference decoder:
    function jsunescape($s) {
        // %uXXXX: a code unit written as four hex digits
        $s = preg_replace('/%u([0-9A-Fa-f]{4})/', '&#x$1;', $s);
        // %XX: a code unit below 256 written as two hex digits
        $s = preg_replace('/%([0-9A-Fa-f]{2})/', '&#x$1;', $s);
        return html_entity_decode($s, ENT_COMPAT, 'UTF-8');
    }
This gives back a raw UTF-8 byte string. If you really want it in HTML character references like &#1056;&#1091;... then leave off the html_entity_decode call, but in general you do not want that. It is best to keep strings in raw format until they need to be escaped for final output, and better still not to replace non-ASCII characters with character references at all unless you really have to.
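To illustrate keeping strings raw until output, here is a minimal sketch. The variable names and sample value are made up for the example, and it assumes the page is served as UTF-8.

```php
<?php
// Raw UTF-8 string, as it would be stored or passed around internally
$name = 'Олег <Як>';

// Escape only at the point of writing into HTML; non-ASCII characters
// pass through unchanged because the page itself is UTF-8
$html = '<p>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</p>';

echo $html, "\n";
```

Only the characters that are special in HTML get replaced here; the Cyrillic text stays as plain UTF-8 bytes.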
What about if something like this is sent to me: '%CE%EB%E5%E3+%DF%EA%F3%F8%EA%E8%ED'
That is URL form-encoding, which is not directly compatible with the escape() format. While the 2-digit byte escapes from URL-encoding are distinct from the crazy escape()-format 4-digit code-unit escapes, the character + is ambiguous. It might mean either a plus (if the string came from escape()) or a space (if it came from a browser form submission). There is no way to tell which it is. This is another reason not to use escape().
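The ambiguity can be seen with PHP's two built-in decoders, which differ only in how they treat the plus sign. A small sketch with a made-up input string:

```php
<?php
// Could be escape('1+1=2') or a form submission of "1 1=2"
$input = '1+1%3D2';

// Form decoding: '+' is read as a space
$asForm = urldecode($input);

// escape()-style reading: '+' is a literal plus
$asEscape = rawurldecode($input);

var_dump($asForm, $asEscape);
```

Both readings are valid for the same bytes on the wire, so the receiver has to know in advance which side produced the string.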
Apart from that, if the charset of this string were UTF-8, then yes, the above function would work fine, converting both URL-encoded bytes and nuts escape()-format Unicode characters into UTF-8 bytes.
However, this actually appears to be code page 1251 (Windows Russian). Do you really want to handle all your strings in cp1251? If so, you have to change the function a little to encode the four-digit escapes into a different charset. It gets messy:
    function url_or_maybe_jsescape_decode($s, $charset, $isform) {
        if ($isform)
            $s = str_replace('+', ' ', $s);
        // Four-digit escapes become character references, which
        // html_entity_decode() then encodes into $charset
        $s = preg_replace('/%u([0-9A-Fa-f]{4})/', '&#x$1;', $s);
        $s = html_entity_decode($s, ENT_COMPAT, $charset);
        // Two-digit escapes are taken as raw bytes already in $charset
        return rawurldecode($s);
    }

    echo url_or_maybe_jsescape_decode('%CE%EB%E5%E3+%DF%EA%F3%F8%EA%E8%ED', 'cp1251', TRUE);
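If the application works in UTF-8 internally but still receives cp1251 form data, one option is to decode with the standard function and transcode once at the boundary. This is only a sketch: it assumes the iconv extension is available and that the input really is Windows-1251 form data with no %uXXXX escapes in it, and to_utf8_from_form is a made-up helper name.

```php
<?php
// Hypothetical helper: decode form-encoded bytes, then transcode to UTF-8
function to_utf8_from_form(string $s, string $charset): string {
    // '+' becomes a space and each %XX becomes one byte in $charset
    $bytes = urldecode($s);
    $utf8 = iconv($charset, 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException("Input is not valid $charset");
    }
    return $utf8;
}

echo to_utf8_from_form('%CE%EB%E5%E3+%DF%EA%F3%F8%EA%E8%ED', 'CP1251'), "\n";
```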
I strongly recommend:
- fixing the Flash file so that it uses proper encodeURIComponent() and not escape(), so that you can use a standard URL-decoder instead of this ugly hack.
- using UTF-8 throughout your application, so that you can support languages other than just Russian and do not need to worry about the input encoding of a submitted form.
(All encodings that are not UTF-8 suck, and this is a fact proven by science!)