Chad scraper
dramaticcat@sh.itjust.works to Lemmy Shitpost@lemmy.world · 1 year ago
bill_1992@lemmy.world · 1 year ago
Everyone loves the idea of scraping, no one likes maintaining scrapers that break once a week because the CSS or HTML changed.

Anonymousllama@lemmy.world · 1 year ago
This one. One of the best motivators. Sense of satisfaction when you get it working and you feel unstoppable (until the next subtle change happens anyway)

camr_on@lemmy.world · 1 year ago
I loved scraping until my IP was blocked for botting lol. I know there’s ways around it, it’s just work though

Pennomi@lemmy.world · 1 year ago
I successfully scraped millions of Amazon product listings simply by routing through Tor and cycling the exit node every 10 seconds.

camr_on@lemmy.world · 1 year ago
That’s a good idea right there, I like that

ferret@sh.itjust.works · 1 year ago
lmao, yeah, get all the exit nodes banned from Amazon.

camr_on@lemmy.world · 1 year ago
I’m coding baby’s first bot over here lol, I could probably do better

synae[he/him]@lemmy.sdf.org · 1 year ago
Token ring for me baybeee

dangblingus@lemmy.world · 1 year ago
Or in the case of Wikipedia, every table on successive pages for sequential data is formatted differently.
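
On the differently formatted tables point, here is a minimal sketch of one way to cope, assuming the tables are already scraped into lists of rows with the header row first. The alias map, the column names, and the sample rows are all made up for illustration and are not taken from any real page.

```python
# Hypothetical header aliases: every name here is illustrative, not from a real page.
HEADER_ALIASES = {
    "year": "year",
    "yr": "year",
    "season": "year",
    "winner": "winner",
    "champion": "winner",
    "champions": "winner",
}


def normalize_table(rows):
    """Turn a scraped table (first row = headers) into dicts with canonical keys."""
    if not rows:
        return []
    header, *body = rows
    # Map each column index to a canonical name; unknown columns are skipped.
    columns = {}
    for index, name in enumerate(header):
        key = HEADER_ALIASES.get(name.strip().lower())
        if key is not None:
            columns[index] = key
    records = []
    for row in body:
        # Short rows simply lose the missing cells instead of raising IndexError.
        record = {
            key: row[index].strip()
            for index, key in columns.items()
            if index < len(row)
        }
        records.append(record)
    return records


# Two made-up pages with the same data laid out differently.
page_one = [["Year", "Winner"], ["1990", "Alpha"]]
page_two = [["Champion", "Notes", "Season"], ["Beta", "n/a", "1991"]]
combined = normalize_table(page_one) + normalize_table(page_two)
```

Both pages come out with the same `year` and `winner` keys, so the rest of the scraper only needs a new alias entry when a page invents another header name.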
I feel this
This guy scrapes
You guys use IPs?