Show HN: Apify SDK – A scalable web crawling and scraping library for JavaScript (github.com)
78 points by jancurn 26 days ago | 8 comments

Hey guys, today we’re showing HN a new open-source library that we have been working on for almost a year. It incorporates lessons learned from scraping thousands of websites over the last 4 years. We figured there was no such universal library for JavaScript, while Python, for example, has one (https://scrapy.org/). That wasn’t fair, because JavaScript is THE language of the web :)

Anyway, we hope you’ll give it a shot, and we’re really looking forward to hearing what you think about it. All feedback welcome!

I wish I could upvote this more. This solves a huge problem for me, and I will definitely be taking a peek at it over the weekend.

Thank you so much for making and sharing this!

Thanks, this looks solid, with really extensive documentation. I will give it a try for my next crawling/bot project :)

Awesome, looking forward to hearing what you think :)

This comes just in time, as I needed to replace an old scraper!

Does it have to run on an instance, or can we also use a serverless environment?

The SDK runs anywhere you have Node running. And if you can run headless Chrome with Puppeteer there too, then you can use it in the SDK as well. Getting Chrome to run might require installing several libraries and adjusting configuration settings. If I’m not mistaken, Google Cloud Functions support Puppeteer by default, while AWS Lambda does not. With any Docker-based serverless platform such as Zeit Now or Apify Cloud, you just need to use the right Docker image.
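
To make the "can you run Puppeteer there?" question a bit more concrete, here is a minimal sketch of how one might probe an environment before deciding how to crawl. It only checks whether the `puppeteer` package can be resolved by Node; it does not prove that headless Chrome will actually launch. The helpers `isModuleAvailable` and `pickCrawlMode` and the mode names are made up for illustration and are not part of the Apify SDK.

```javascript
// Hypothetical helper (not part of the Apify SDK): reports whether a
// package can be resolved from the current Node.js environment.
function isModuleAvailable(name) {
    try {
        require.resolve(name);
        return true;
    } catch (err) {
        // The package is not installed or cannot be resolved here.
        return false;
    }
}

// Hypothetical helper: pick a crawling mode based on what is installed.
// The mode names are illustrative labels, not SDK options.
function pickCrawlMode(hasPuppeteer = isModuleAvailable('puppeteer')) {
    return hasPuppeteer ? 'headless-chrome' : 'plain-http';
}

console.log(`Crawl mode: ${pickCrawlMode()}`);
```

On a platform where the package resolves but Chrome's system libraries are missing, the launch itself can still fail, so a real deployment would probably also need to try starting the browser once.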

I'm a huge fan of Apify and look forward to exploring this new SDK. Thanks y'all.

Thank you so much!
