python - grabbing a substring while scraping with Python 2.6


Hey, can anyone help with the following?

I am trying to scrape a site that contains the following:

    </strong> 9780375853401</li>, <li><strong>Pub Date:</strong> 05/11/2010</li>]
    [<li><strong>UPC:</strong> 490355000372</li>, <li><strong>Catalog number:</strong> 15024/25</li>, <li><strong>Label:</strong> Camera</li>]

Here is a piece of the code I have been using to grab the data with mechanize and BeautifulSoup. I'm stuck here because it will not let me use the search() function on a list.

    br_results = mechanize.urlopen(br_results)
    html = br_results.read()
    soup = BeautifulSoup(html)
    local_links = soup.findAll("a", {"class": "down-arrow csa"})
    upc_code = soup.findAll("ul", {"class": "bc-meta3"})
    for upc in upc_code:
        upc_text = upc.contents.contents
        print upc_text

I imagine upc_code is the list you are showing us, and local_links has nothing to do with your question, correct? Given that you never use it further in your code...?

So I'm not sure what upc_text would be in the body of your loop, where upc is a ul tag: upc.contents is a list of (presumably) li tags, and I don't see how upc.contents.contents could work. What are you seeing as a result of that code? I would expect an exception!

Anyway, the way I would write the loop is something like this:

    for upc in upc_code:
        listitems = upc.findAll('li')
        for anitem in listitems:
            print anitem.contents[1]

That's because you want the second child of each list item (the first is the strong tag, the second is the string you want).

If it is not the second child of each list item that you want, please clarify. For example, you could locate the strong tag and get its next sibling, if that suits you better: just make the nested loop's body

    print anitem.find('strong').nextSibling
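For anyone who wants to try the "text right after the strong tag" idea without BeautifulSoup installed, here is a rough sketch that uses only the Python 3 standard library (html.parser) instead. It is not the BeautifulSoup approach from this answer, just the same idea in a different tool. The sample markup is modelled on the lists in the question, and the names SAMPLE, StrongValueParser and extract_pairs are made up for illustration.

```python
from html.parser import HTMLParser

# Sample markup modelled on the lists in the question (an assumption;
# the real page may be structured differently).
SAMPLE = (
    '<ul class="bc-meta3">'
    '<li><strong>UPC:</strong> 490355000372</li>'
    '<li><strong>Catalog number:</strong> 15024/25</li>'
    '</ul>'
)


class StrongValueParser(HTMLParser):
    """Collect (label, value) pairs from <li><strong>label</strong> value</li>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.pairs = []          # finished (label, value) tuples
        self._in_li = False      # currently inside an <li>
        self._in_strong = False  # currently inside a <strong>
        self._label = []         # text chunks seen inside <strong>
        self._value = []         # text chunks seen in <li> outside <strong>

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self._in_li = True
            self._label, self._value = [], []
        elif tag == 'strong' and self._in_li:
            self._in_strong = True

    def handle_endtag(self, tag):
        if tag == 'strong':
            self._in_strong = False
        elif tag == 'li' and self._in_li:
            self._in_li = False
            self.pairs.append((''.join(self._label).strip(),
                               ''.join(self._value).strip()))

    def handle_data(self, data):
        if not self._in_li:
            return
        if self._in_strong:
            self._label.append(data)
        else:
            self._value.append(data)


def extract_pairs(html):
    """Return a list of (label, value) tuples, one per <li> in the markup."""
    parser = StrongValueParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


print(extract_pairs(SAMPLE))
```

The value collected for each item is the text that follows the strong tag inside the same li, which is the same string that anitem.contents[1] or nextSibling would give you in the loops above, with surrounding whitespace stripped.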
