How to Build a Simple Web Scraper with Node.js and Cheerio

The web contains a vast amount of valuable information, from product prices and market trends to articles, reviews, and research data. While this information is often publicly available, manually collecting it can be time-consuming, especially when data is required from multiple pages or websites.

This is where web scraping comes in. Web scraping is the process of automatically extracting data from websites, allowing developers, businesses, and researchers to gather information efficiently and at scale. Instead of copying data one item at a time, a web scraper can visit web pages, extract specific information based on predefined rules, and store the results in a structured form such as a JSON or CSV file, a spreadsheet, or a database.

By automating data collection, developers can save time and focus on analyzing information rather than gathering it manually.

In this guide, you'll build a simple web scraper using Node.js, Axios, and Cheerio. You'll fetch a web page, parse its HTML, and extract specific data using CSS selectors that will feel familiar if you've worked with the DOM in frontend development.

Web scraping is commonly used for:

Price monitoring — tracking product prices across e-commerce sites to stay competitive or find the best deal
Market research — collecting data from multiple sources to identify trends and patterns
Lead generation — extracting contact information or business listings from directories
Content aggregation — pulling articles, reviews, or listings from multiple sites into one place
Data analysis — gathering large datasets for research, reporting, or machine learning

Key Takeaways (TL;DR)

Cheerio is a lightweight, fast option for scraping static HTML pages; no browser required.
Pair it with Axios to fetch pages, then use CSS selectors to extract exactly the data you need.
Many modern websites load content with JavaScript after the initial HTML response, and Cheerio alone cannot see that content.
One ScrapingBee API call with render_js: true replaces a full headless browser setup and returns fully rendered HTML ready to parse.
All the code in this tutorial is tested and working; you can clone it from GitHub and run it in minutes.

Prerequisites

To follow along with this guide, you'll need:

Node.js (v18 or higher)
A package manager — npm (comes with Node.js) or yarn
A free ScrapingBee account for the final section

First, create a fresh project folder and move into it:

mkdir node-scraper && cd node-scraper

Next, initialize a new Node.js project:

npm init -y

Now install the packages we need:

npm install axios cheerio

Here is what each package does:

axios — handles HTTP requests to fetch the HTML content of a webpage
cheerio — parses the fetched HTML and lets you extract data using CSS selectors

Create the main file where you'll write your scraper:

touch scraper.js

Your project structure should look like this:

node-scraper/
├── node_modules/
├── package-lock.json
├── package.json
└── scraper.js

Structuring and Saving the Scraped Data

Logging data to the console is useful for testing, but in a real project, you'll want to save it somewhere. In this section, you'll write the scraped books to a JSON file so they can be used in other applications or processed further.

Node.js has a built-in fs module for working with the file system; no extra installation is needed.

Step 1: Import the fs module

At the top of scraper.js, import the fs module alongside axios and cheerio:

const axios = require('axios');
const cheerio = require('cheerio');
const fs = require('fs');

Step 2: Write the scraped data to a file

After the .each() loop, add the following lines to save the results and print a confirmation:

fs.writeFileSync('books.json', JSON.stringify(books, null, 2));

console.log(`✅ Scraped ${books.length} books. Data saved to books.json`);

Your complete scraper.js file should now look like this:

const axios = require('axios');
const cheerio = require('cheerio');
const fs = require('fs');

const url = 'http://books.toscrape.com';

async function scrapeBooks() {
  try {
    const response = await axios.get(url);
    const $ = cheerio.load(response.data);

    const books = [];

    $('article.product_pod').each((index, element) => {
      const title = $(element).find('h3 a').attr('title');
      const price = $(element).find('.price_color').text().trim();
      const rating = $(element).find('p.star-rating').attr('class').replace('star-rating ', '');
      const availability = $(element).find('.availability').text().trim();

      books.push({ title, price, rating, availability });
    });

    fs.writeFileSync('books.json', JSON.stringify(books, null, 2));
    console.log(`✅ Scraped ${books.length} books. Data saved to books.json`);

  } catch (error) {
    console.error('Error fetching page:', error.message);
  }
}

scrapeBooks();
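One fragile spot in the code above is the rating line: if a product ever lacked a p.star-rating element, .attr('class') would return undefined, and calling .replace() on it would throw a TypeError that the catch block reports as a failure of the whole page. On books.toscrape.com every listing has a rating, so this is a precaution rather than a fix. A minimal sketch of a defensive alternative, using a hypothetical helper named parseRating:

```javascript
// classAttr is whatever $(element).find('p.star-rating').attr('class') returned.
// It may be undefined if the element is missing, so check before using it.
function parseRating(classAttr) {
  if (typeof classAttr !== 'string') return null;
  // Split "star-rating Three" into its classes and keep the one that isn't "star-rating".
  const rating = classAttr.split(/\s+/).find((cls) => cls && cls !== 'star-rating');
  return rating ?? null;
}

// Usage inside the .each() callback:
// const rating = parseRating($(element).find('p.star-rating').attr('class'));
```

Returning null keeps the book in the results with a missing rating instead of aborting the entire scrape.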

Step 3: Run the scraper

node scraper.js

You should see this in your terminal:

✅ Scraped 20 books. Data saved to books.json

A new books.json file will appear in your project folder with content like this:
[
  {
    "title": "A Light in the Attic",
    "price": "£51.77",
    "rating": "Three",
    "availability": "In stock"
  },
  {
    "title": "Tipping the Velvet",
    "price": "£53.74",
    "rating": "One",
    "availability": "In stock"
  }
]

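fs.writeFileSync() writes the scraped data to books.json. Because the path is relative, the file is created in the directory you run the command from, which is your project folder if you follow the steps above. JSON.stringify(books, null, 2) converts the books array into a formatted JSON string: null means no replacer function is applied, and 2 indents each nesting level by two spaces, making the output human-readable. The confirmation message tells you exactly how many books were scraped, so you can quickly verify that the scraper ran successfully.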
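Every field in books.json is still a raw string, such as "£51.77" or "Three". If you plan to sort, filter, or sum the data later, it can help to convert those strings into typed values before saving. A minimal sketch, assuming the field names and formats shown above; normalizeBook and RATING_WORDS are hypothetical names:

```javascript
// Maps the rating words used in the star-rating class to numbers.
const RATING_WORDS = { One: 1, Two: 2, Three: 3, Four: 4, Five: 5 };

function normalizeBook(book) {
  // "£51.77" -> "51.77" -> 51.77 (strip everything except digits and the decimal point)
  const price = Number.parseFloat(String(book.price ?? '').replace(/[^0-9.]/g, ''));
  return {
    title: book.title,
    price: Number.isNaN(price) ? null : price,
    rating: RATING_WORDS[book.rating] ?? null,
    // Assumes the listing text reads "In stock" when the book is available.
    inStock: /in stock/i.test(book.availability ?? ''),
  };
}

const raw = { title: 'A Light in the Attic', price: '£51.77', rating: 'Three', availability: 'In stock' };
console.log(normalizeBook(raw));
```

In scraper.js you could then save books.map(normalizeBook) instead of books. Dropping the currency symbol assumes every price is in pounds, as it is on this practice site.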

Your project folder should now look like this:

node-scraper/
├── node_modules/
├── books.json
├── package-lock.json
├── package.json
└── scraper.js
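JSON is convenient for other programs, but if you would rather open the results in a spreadsheet, the same array can be written as CSV using only the built-in fs module. A minimal sketch; toCsv and csvField are hypothetical helpers, and the quoting follows the common RFC 4180 convention of wrapping each field in double quotes and doubling any embedded quotes:

```javascript
const fs = require('fs');

// Wrap a value in double quotes and escape embedded quotes by doubling them.
function csvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return `"${text.replace(/"/g, '""')}"`;
}

// Convert an array of flat objects to CSV, using the first object's keys as the header row.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const lines = [headers.map(csvField).join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => csvField(row[h])).join(','));
  }
  return lines.join('\r\n') + '\r\n';
}

// Sample data in the same shape as books.json; in scraper.js you would pass your scraped array.
const books = [
  { title: 'A Light in the Attic', price: '£51.77', rating: 'Three', availability: 'In stock' },
  { title: 'Tipping the Velvet', price: '£53.74', rating: 'One', availability: 'In stock' },
];

fs.writeFileSync('books.csv', toCsv(books));
```

Quoting every field, rather than only the ones containing commas, keeps the helper short and is still valid CSV.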

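You now have a fully working static web scraper. But it has a limitation that affects a large portion of the modern web: Cheerio only parses the HTML the server sends back. It never runs JavaScript, so any content a page adds in the browser after loading is invisible to it.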

Step 4: Handling JS-rendered pages with ScrapingBee

ScrapingBee is a web scraping API that handles browser rendering for you. Instead of sending a request directly to the target site, you send it to ScrapingBee's API along with your target URL.

ScrapingBee spins up a real browser in the background, waits for the JavaScript to execute, and returns the fully rendered HTML ready for Cheerio to parse.

To get your API key, you need to create a ScrapingBee account.

Head to scrapingbee.com and click on Sign up. No credit card is required.

Once you're signed in, go to your dashboard. Your API key will be displayed at the top of the page.

Copy the API key and keep it somewhere handy; you'll need it in the next step.

Step 5: Create a new file for the ScrapingBee scraper

Keep your original scraper.js intact. Create a new file for this section:

touch scraper-bee.js

Step 6: Build the ScrapingBee request

Add the following to scraper-bee.js:

const axios = require('axios');
const cheerio = require('cheerio');
const fs = require('fs');

const API_KEY = 'YOUR_API_KEY';
const TARGET_URL = 'https://quotes.toscrape.com/js/';

async function scrapeWithBee() {
    try {
        const response = await axios.get('https://app.scrapingbee.com/api/v1/', {
            params: {
                api_key: API_KEY,
                url: TARGET_URL,
                render_js: true,
            }
        });

        const $ = cheerio.load(response.data);
        const quotes = [];

        $('.quote').each((index, element) => {
            const text = $(element).find('.text').text().trim();
            const author = $(element).find('.author').text().trim();

            quotes.push({ text, author });
        });

        fs.writeFileSync('quotes.json', JSON.stringify(quotes, null, 2));
        console.log(`✅ Scraped ${quotes.length} quotes. Data saved to quotes.json`);

    } catch (error) {
        console.error('Error:', error.message);
    }
}

scrapeWithBee();

Replace YOUR_API_KEY with the key from your ScrapingBee dashboard.
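Hardcoding the key works for a quick test, but it is easy to commit it to version control by accident. A common alternative is to read it from an environment variable. A minimal sketch, assuming a variable named SCRAPINGBEE_API_KEY (the name is our own choice, not something ScrapingBee requires) and a hypothetical helper named getApiKey:

```javascript
// Reads the API key from the environment instead of the source file.
// The env parameter defaults to process.env but can be swapped out for testing.
function getApiKey(env = process.env) {
  const key = env.SCRAPINGBEE_API_KEY;
  if (!key) {
    throw new Error('Missing SCRAPINGBEE_API_KEY. Set it before running the scraper.');
  }
  return key.trim();
}

// Usage in scraper-bee.js, replacing the hardcoded constant:
// const API_KEY = getApiKey();
```

On macOS or Linux you could then run the scraper with SCRAPINGBEE_API_KEY=your_key node scraper-bee.js.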

Step 7: Run the scraper

node scraper-bee.js

You should see:

✅ Scraped 10 quotes. Data saved to quotes.json

Open quotes.json and you'll find:

[
  {
    "text": "“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”",
    "author": "Albert Einstein"
  },
  {
    "text": "“It is our choices, Harry, that show what we truly are, far more than our abilities.”",
    "author": "J.K. Rowling"
  }
]

The structure of this scraper is almost identical to the Cheerio-only version, and that is intentional. The only meaningful change is where the request goes.

Instead of sending axios.get() directly to the target site, you send it to ScrapingBee's API endpoint at https://app.scrapingbee.com/api/v1/. Along with the request, you pass three parameters: your api_key to authenticate the request, the url of the page you want to scrape, and render_js: true, which tells ScrapingBee to spin up a real browser, execute the JavaScript on the page, and return the fully rendered HTML.
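Because the target page's URL is itself passed as a query parameter, it has to be percent-encoded so its own ":" and "/" characters are not confused with the outer request. A sketch that builds the request as a plain URL you can inspect, using only Node's built-in URL class; buildScrapingBeeUrl is a hypothetical helper, not part of any library:

```javascript
// Builds a ScrapingBee request URL from the same three parameters used in scraper-bee.js.
function buildScrapingBeeUrl(apiKey, targetUrl, renderJs = true) {
  const endpoint = new URL('https://app.scrapingbee.com/api/v1/');
  // searchParams.set() percent-encodes each value using form-urlencoded rules.
  endpoint.searchParams.set('api_key', apiKey);
  endpoint.searchParams.set('url', targetUrl);
  endpoint.searchParams.set('render_js', String(renderJs));
  return endpoint.toString();
}

const requestUrl = buildScrapingBeeUrl('YOUR_API_KEY', 'https://quotes.toscrape.com/js/');
console.log(requestUrl);
// prints "https://app.scrapingbee.com/api/v1/?api_key=YOUR_API_KEY&url=https%3A%2F%2Fquotes.toscrape.com%2Fjs%2F&render_js=true"
```

Passing a params object to axios.get(), as scraper-bee.js does, produces an equivalent query string. axios's own serializer may leave a few characters such as ":" unencoded, but a standard query-string decoder reads either form as the same value.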

From that point, everything works exactly as before: Cheerio loads the rendered HTML, your selectors extract the data, and the results are saved to a JSON file. The heavy lifting (browser management, JavaScript execution, and bot-detection handling) happens entirely on ScrapingBee's end.
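Both scrapers log only error.message when a request fails, which can hide useful detail such as an HTTP status code. axios attaches error.response when the server replied with a non-2xx status and error.request when no reply arrived. A minimal sketch of a more descriptive log message; describeAxiosError is a hypothetical helper, and the exact error body ScrapingBee returns is not assumed here:

```javascript
// Turns an axios error into a one-line description for console.error().
function describeAxiosError(error) {
  if (error.response) {
    // The server answered, but with an error status (for example, a bad API key).
    const data = error.response.data;
    const detail = data === undefined ? '' : typeof data === 'string' ? data : JSON.stringify(data);
    return `HTTP ${error.response.status}${detail ? `: ${detail}` : ''}`;
  }
  if (error.request) {
    // The request was sent but nothing came back (network error, timeout).
    return `No response received: ${error.message}`;
  }
  // Something failed before the request was sent.
  return `Request setup failed: ${error.message}`;
}

// Usage in either catch block:
// console.error('Error:', describeAxiosError(error));
```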

Conclusion

You now have a working web scraper built with Node.js and Cheerio that fetches a static webpage, extracts structured data, and saves it to a JSON file. You also saw firsthand where Cheerio hits its limits and how ScrapingBee fills that gap by handling JavaScript-rendered pages without any additional browser setup on your end.

ScrapingBee handles the heavy lifting (browser rendering, proxy rotation, and bot detection) so you can focus on the data that matters.

Frequently Asked Questions

Can I use Cheerio to scrape any website?

Cheerio works well for static websites that serve fully rendered HTML. If a website loads its content with JavaScript after the initial page load, Cheerio will not be able to see that content. In those cases, you need a tool like ScrapingBee that handles browser rendering for you.

Is web scraping legal?

Web scraping is generally legal when applied to publicly available data, but it depends on the website's terms of service and the laws in your jurisdiction. Always check a website's robots.txt file and terms of service before scraping it, and never scrape data behind a login without permission.

What is the difference between Axios and Cheerio?

Axios and Cheerio serve different roles in this tutorial. Axios sends an HTTP request and returns the raw HTML of a webpage. Cheerio then parses that HTML and lets you extract specific data using CSS selectors. You need both: Axios to get the page and Cheerio to read it.

What happens when my ScrapingBee free credits run out?

ScrapingBee offers paid plans that scale with your usage. You can check their pricing page for current plan details and credit costs for different request types.
