It fails when I run it from Eclipse, or when I run my script in IPython:
    'ascii' codec can't decode byte 0xe2 in position 32: ordinal not in range(128)
I don't know why, but when I simply execute the feedparser.parse(url) statement by itself with the same URL, no error is thrown. This is stumping me big time.
The code is as simple as:
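    import logging
    import feedparser

    try:
        d = feedparser.parse(url)
    except Exception as e:
        logging.error('Error while retrieving feed.')
        logging.error(e)
        # formatExceptionInfo and formatExceptionInfo1 are defined elsewhere in my script (not shown).
        logging.error(formatExceptionInfo(None))
        logging.error(formatExceptionInfo1())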
Here is a stack trace:
        d = feedparser.parse(url)
      File "C:\Python26\lib\site-packages\feedparser.py", line 2623, in parse
        feedparser.feed(data)
      File "C:\Python26\lib\site-packages\feedparser.py", line 1441, in feed
        sgmllib.SGMLParser.feed(self, data)
      File "C:\Python26\lib\sgmllib.py", line 104, in feed
        self.goahead(0)
      File "C:\Python26\lib\sgmllib.py", line 143, in goahead
        k = self.parse_endtag(i)
      File "C:\Python26\lib\sgmllib.py", line 320, in parse_endtag
        self.finish_endtag(tag)
      File "C:\Python26\lib\sgmllib.py", line 360, in finish_endtag
        self.unknown_endtag(tag)
      File "C:\Python26\lib\site-packages\feedparser.py", line 476, in unknown_endtag
        method()
      File "C:\Python26\lib\site-packages\feedparser.py", line 1318, in _end_content
        value = self.popContent('content')
      File "C:\Python26\lib\site-packages\feedparser.py", line 700, in popContent
        value = self.pop(tag)
      File "C:\Python26\lib\site-packages\feedparser.py", line ?, in pop
        output = _resolveRelativeURIs(output, self.baseuri, self.encoding)
      File "C:\Python26\lib\site-packages\feedparser.py", line 1594, in _resolveRelativeURIs
        p.feed(htmlSource)
      File "C:\Python26\lib\site-packages\feedparser.py", line 1441, in feed
        sgmllib.SGMLParser.feed(self, data)
      File "C:\Python26\lib\sgmllib.py", line 104, in feed
        self.goahead(0)
      File "C:\Python26\lib\sgmllib.py", line 138, in goahead
        k = self.parse_starttag(i)
      File "C:\Python26\lib\sgmllib.py", line 296, in parse_starttag
        self.finish_starttag(tag, attrs)
      File "C:\Python26\lib\sgmllib.py", line 338, in finish_starttag
        self.unknown_starttag(tag, attrs)
      File "C:\Python26\lib\site-packages\feedparser.py", line 1588, in unknown_starttag
        attrs = [(key, ((tag, key) in self.relative_uris) and self.resolveURI(value) or value) for key, value in attrs]
      File "C:\Python26\lib\site-packages\feedparser.py", line 1584, in resolveURI
        return _urljoin(self.baseuri, uri)
      File "C:\Python26\lib\site-packages\feedparser.py", line 286, in _urljoin
        return urlparse.urljoin(base, uri)
      File "C:\Python26\lib\urlparse.py", line 215, in urljoin
        params, query, fragment))
      File "C:\Python26\lib\urlparse.py", line 184, in urlunparse
        return urlunsplit((scheme, netloc, url, query, fragment))
      File "C:\Python26\lib\urlparse.py", line 192, in urlunsplit
        url = scheme + ':' + url
      File "C:\Python26\lib\encodings\cp1252.py", line 15, in decode
        return codecs.charmap_decode(input,errors,decoding_table)
Update (partially solved):
I can reproduce it when the URL passed to feedparser.parse() is a unicode string rather than an ASCII str, and the feed contains some high (non-ASCII) characters. I'm not sure why this is.
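One plausible reading of the update, given the `url = scheme + ':' + url` frame near the bottom of the trace: in Python 2, joining a unicode string to a byte str that contains a non-ASCII byte makes Python decode the byte str implicitly with the default encoding, normally ascii. If a unicode URL makes feedparser's base URI unicode, relative-URI resolution would hit exactly that join. The sketch below is written for Python 3 (an assumption; the question uses Python 2.6), which raises TypeError for str + bytes instead of decoding implicitly, so it performs the same decode explicitly to show the matching error message. The URL bytes and the helper name `ascii_decode_error` are made up for illustration.

```python
# Python 3 sketch (the question itself uses Python 2.6). Python 2 decoded a
# byte string implicitly, with the default codec (normally ascii), when it was
# joined to a unicode string; Python 3 raises TypeError for str + bytes, so the
# same decode is performed explicitly here.

def ascii_decode_error(data):
    """Return the error message from decoding data as ASCII, or None if it decodes."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Hypothetical URL bytes: 0xe2 is "â" in Latin-1, the byte named in the question.
print(ascii_decode_error(b"http://example.com/caf\xe2"))
# prints: 'ascii' codec can't decode byte 0xe2 in position 22: ordinal not in range(128)
```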
It seems that the problem you have is some text at that URL in an encoding (such as Latin-1, where 0xe2 would be the lowercase a with a circumflex, â, written &acirc; in HTML) that is served without a proper Content-Type header. feedparser then guesses the encoding (see the feedparser docs for more details; this thread explains the problem); if it can't tell, it tries ascii, and fails. Unfortunately, there is no "magic bullet" for this common issue (caused by bozos breaking the XML rules). You can try catching this exception and reading the URL's contents separately in the handler (use urllib2), then trying to decode them with various possible encodings. When you finally get a usable Unicode object that way, feed that to feedparser.parse (whose first argument can be a URL, a file stream, or a string containing the data).
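The fallback approach from the answer might look roughly like this. It is a minimal sketch under several assumptions: it targets Python 3, where `urllib.request` replaces urllib2; the candidate encoding list (utf-8, then cp1252, then latin-1) and the helper names `decode_with_fallbacks` and `fetch_feed_text` are hypothetical choices, not part of feedparser. It tries the server's declared charset first, if one was sent, then the candidates in order.

```python
# Python 3 sketch of the "fetch separately and try several encodings" fallback.
# urllib.request stands in for Python 2's urllib2. The candidate list and the
# function names are hypothetical, not feedparser API.
import urllib.request

# Order matters: latin-1 maps every byte to a character, so it never fails and
# must come last. cp1252 before latin-1 is an assumed preference.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_with_fallbacks(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes raw cleanly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or an unknown codec name: try the next candidate.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def fetch_feed_text(url, timeout=30):
    """Download the feed bytes ourselves and decode them with fallbacks."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Charset from the Content-Type header, or None if the server sent none.
        declared = response.headers.get_content_charset()
    encodings = ((declared,) if declared else ()) + CANDIDATE_ENCODINGS
    return decode_with_fallbacks(raw, encodings)
```

With `text, encoding = fetch_feed_text(url)`, the decoded text can then be handed to `feedparser.parse(text)`, since feedparser.parse also accepts a string of feed data rather than only a URL.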