android - How to remove accent characters from an InputStream


I am trying to parse an RSS 2.0 feed on Android using a pull parser.

  XmlPullParser parser = Xml.newPullParser();
  parser.setInput(url.open(), null);

The prolog of the feed XML says that the encoding is "UTF-8". When I open the remote stream and pass it to my pull parser, I get an "invalid token, document not well formed" exception.

When I save the XML file and open it in the browser, the browser reports the presence of the Unicode 0x12 character (grave accent?) in the file and fails to render the XML.

What is the best way to handle such cases, assuming that I have no control over the XML being returned to me?

Thank you.

Where did you find that 0x12 is the grave accent? The character range 0x00-0x7F in UTF-8 is encoded exactly as in ASCII, and ASCII code point 0x12 is a control character, DC2 or CTRL+R.

It sounds like an encoding problem of some sort. The easiest way to troubleshoot this is to look at the saved file in a hex editor. Some things to check:

  1. A byte order mark (BOM) at the beginning of the file can confuse some XML parsers.
  2. Even though the XML declaration says the encoding is UTF-8, the file may not actually be in that encoding, in which case it will be decoded incorrectly.
  3. Not all Unicode characters are legal in XML, which is why Firefox refuses to render it. Specifically, the XML spec says that 0x9, 0xA and 0xD are the only valid characters below 0x20, so 0x12 will definitely cause compliant parsers to complain.
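If a hex editor is not at hand, the first and third checks can be approximated in code. The following is only a sketch: the class and method names are made up for illustration, and it assumes a byte-oriented encoding such as UTF-8 or ISO-8859-1 (in UTF-16, bytes below 0x20 occur legitimately, so the scan would give false positives).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class FeedByteCheck {
    /** True if the data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** Offsets of bytes below 0x20 other than tab (0x9), LF (0xA) and CR (0xD). */
    static List<Integer> illegalControlOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned
            if (b < 0x20 && b != 0x9 && b != 0xA && b != 0xD) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Made-up sample: "<a>it" + 0x12 + "s</a>"
        byte[] sample = {'<', 'a', '>', 'i', 't', 0x12, 's', '<', '/', 'a', '>'};
        System.out.println("BOM present: " + hasUtf8Bom(sample));
        System.out.println("Illegal control bytes at: " + illegalControlOffsets(sample));
    }
}
```

Looking at the bytes around each reported offset usually shows quickly whether a stray control character replaced something like an apostrophe or an accented letter.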

If you can upload the file to pastebin or similar, I can help find the cause and recommend a resolution.

Edit: OK, you can't upload it. That's understandable.

The XML you are receiving is corrupted in some way, and the ideal course of action is to contact the party responsible for producing it, to see if the problem can be corrected.

One thing to check before doing that though - are you sure you are receiving the data uncorrupted? Some types of communication (SMS) allow only 7-bit characters. That would turn 0x92 (not an ASCII character, but the curly apostrophe in Windows-1252 - the "grave accent"?) into 0x12. It seems like quite a coincidence, especially if these characters appear in the file where you would expect an accent or apostrophe.
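The arithmetic behind that suspicion is just the loss of the high bit. As a small illustration (the class and method names are hypothetical):

```java
// Hypothetical demo class, for illustration only.
class SevenBitDemo {
    /** Clears the high bit of a byte value, as a 7-bit channel would. */
    static int stripHighBit(int b) {
        return b & 0x7F;
    }

    public static void main(String[] args) {
        // 0x92 = 1001 0010; dropping the top bit leaves 0001 0010 = 0x12
        System.out.printf("0x%02X%n", stripHighBit(0x92)); // prints 0x12
    }
}
```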

Otherwise, the best you can do is:

  1. Although it is not strictly necessary, be defensive and pass "UTF-8" explicitly as the second parameter to setInput.

  2. Similarly, force the parser to use another character encoding by passing a different encoding as the second parameter. Encodings to try in addition to "UTF-8" are "ISO-8859-1" and "UTF-16". There is a full list of supported encodings for Java - you could try all of these. (I couldn't find a definitive list of supported encodings for Android.)

  3. As a last resort, you can strip out the invalid characters, e.g. remove all characters below 0x20 that are not whitespace (0x9, 0xA and 0xD are all whitespace). If removing them is difficult, you can replace them instead. For example:

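      import java.io.FilterInputStream;
      import java.io.IOException;
      import java.io.InputStream;

      class ReplacingInputStream extends FilterInputStream {
          ReplacingInputStream(InputStream in) { super(in); }

          // Map control characters that are illegal in XML to a space.
          private static int fix(int b) {
              return (b != -1 && b < 0x20 && !(b == 0x9 || b == 0xA || b == 0xD)) ? 0x20 : b;
          }

          @Override
          public int read() throws IOException {
              return fix(super.read());
          }

          // Bulk reads do not go through read(), so filter them as well.
          @Override
          public int read(byte[] buf, int off, int len) throws IOException {
              int n = super.read(buf, off, len);
              for (int i = off; i < off + n; i++) buf[i] = (byte) fix(buf[i] & 0xFF);
              return n;
          }
      }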

    You wrap this around your existing input stream, and it filters out the invalid characters. Note that you could easily do more damage to the XML, or end up with nonsense XML, but equally it may allow you to get the data you need, or to see more easily where the problems lie.
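Going back to the second suggestion (trying other encodings), one way to narrow down the candidates before involving the parser is to check which encodings can decode the saved bytes without errors. This is a rough sketch using the standard `java.nio.charset` API; the class name, method name and sample bytes are made up for illustration. A clean decode does not prove the encoding is the right one (ISO-8859-1, for instance, accepts any byte sequence), so the decoded text still needs a visual check.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of any library.
class EncodingProbe {
    /** Returns the decoded text, or null if the charset is unsupported or the bytes are invalid in it. */
    static String tryDecode(byte[] data, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        try {
            return Charset.forName(charsetName).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Made-up sample: "café" encoded as ISO-8859-1, which is not valid UTF-8
        byte[] sample = {'c', 'a', 'f', (byte) 0xE9};
        for (String name : new String[] {"UTF-8", "ISO-8859-1", "UTF-16"}) {
            String text = tryDecode(sample, name);
            System.out.println(name + ": " + (text == null ? "does not decode cleanly" : text));
        }
    }
}
```

An encoding name that survives this check can then be passed as the second argument to setInput.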

