Simple API Scraper with node-fetch

For roughly the past 8 years, I’ve programmed primarily in PHP. In that time, a lot has changed in web development. Currently, many jobs and tools require some working knowledge of JavaScript, whether it is vanilla JS, Node, npm, TypeScript, React, Vue, Svelte, Express, Jest, or any of the tens of thousands of other projects out there. While there is no shortage of excellent reading material online about all of these technologies, you can only read so much before you need to actually build something with them. In my recent experiments with various tools and packages, I came across node-fetch, which simplifies making HTTP requests in Node.js applications. Because HTTP requests are a core technology of the internet, it’s good to know how to incorporate them into one’s toolkit. It can also be a fun exercise to simply retrieve data from a website via the command line.

And because this was just for fun, I didn’t think it was necessary to create a whole new repository on GitHub, so I’ve included the code below. It’s really simple, and would be even simpler in vanilla JS, but I like to complicate things, so I made an attempt in TypeScript.

package.json

{
  "devDependencies": {
    "@types/node": "^17.0.16",
    "eslint": "^8.8.0",
    "prettier": "^2.5.1",
    "typescript": "^4.5.5"
  },
  "dependencies": {
    "node-fetch": "^3.2.0"
  },
  "scripts": {
    "scrape": "tsc && node dist/scrape.js",
    "build": "tsc"
  },
  "type": "module"
}

I was running a few other experiments in the same folder, and one of them ran into issues with ts-node; otherwise, using that package would simplify this setup. For instance, instead of running tsc && node dist/scrape.js in the “scrape” script, we could just run ts-node scrape.ts.

tsconfig.json

{
  "compilerOptions": {
    "lib": ["es2021"],
    "target": "ES2021",
    "module": "ESNext",
    "strict": true,
    "outDir": "dist",
    "sourceMap": true,
    "moduleResolution": "Node",
    "esModuleInterop": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules", "**/*.spec.ts"]
}

In an effort to make other experimental scripts work with TypeScript, this configuration became needlessly complicated. 😅

scrape.ts

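import fetch from 'node-fetch';

// Shape of the fields we use from each WordPress REST API post object
interface WpPost {
    id: number;
    title: { rendered: string };
    link: string;
}

const url = 'https://closingtags.com/wp-json/wp/v2/posts?per_page=100';

async function scrape(url: string): Promise<void> {
    console.log(`Scraping... ${url}`);

    const res = await fetch(url);
    if (!res.ok) {
        throw new Error(`Request failed: ${res.status} ${res.statusText}`);
    }

    // Assert the type of the parsed body, not of the promise returned by res.json()
    const posts = (await res.json()) as WpPost[];
    posts.forEach((post) => console.table([post.id, post.title.rendered, post.link]));
}

scrape(url).catch((err) => console.error(err));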

The scrape.ts script itself is quite short and simple. First, it imports the node-fetch package as “fetch”, which we’ll use to make the requests. It then defines the URL endpoint to scrape. To avoid clogging up the log files of someone else’s site, I’ve pointed it at the WordPress REST API of this very site, which returns up to 100 posts (set by the per_page=100 parameter) in JSON format. Next, the script sets up the scrape function, which takes our URL and passes it to fetch (imported earlier from node-fetch). We parse the response body as JSON (with some type conversion so TypeScript will leave us alone about the expected types 😬), then output each returned post’s ID, title, and URL in its own table in the console. Simple!
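The type conversion in the script tells TypeScript to trust the JSON without checking it. If you’d rather have the compiler and the runtime agree, a minimal sketch is a type guard that validates each item. It assumes only the three fields the script prints; WpPost, isWpPost, and toPosts are hypothetical names, not part of node-fetch or the WordPress API.

```typescript
// Hypothetical shape of the post fields the scraper prints (assumption, not the full WP schema).
interface WpPost {
  id: number;
  title: { rendered: string };
  link: string;
}

// Runtime check that narrows an unknown value to WpPost.
function isWpPost(value: unknown): value is WpPost {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const title = v.title as Record<string, unknown> | null | undefined;
  return (
    typeof v.id === "number" &&
    typeof v.link === "string" &&
    typeof title === "object" &&
    title !== null &&
    typeof title.rendered === "string"
  );
}

// Validate the parsed JSON body instead of casting it; items that don't match are dropped.
function toPosts(json: unknown): WpPost[] {
  if (!Array.isArray(json)) {
    throw new Error("Expected an array of posts");
  }
  return json.filter(isWpPost);
}
```

Inside scrape, the parsed body could then go through toPosts(await res.json()) in place of the cast. Dropping malformed items is one choice; throwing on the first bad item would be stricter.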

There are lots of ways this could be expanded, such as saving the retrieved data to a file or database, using cheerio to parse a page’s HTML and search the DOM for specific chunks of content, or asking the user for the URL on startup. My intention for this script wasn’t to build some elaborate project, but rather to practice fundamentals I’ve been learning over the past few months. This groundwork will serve better and more interesting projects in the future.
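Two of those ideas, taking the URL as input and saving the results to a file, could start out as small helpers like the ones below. This is a sketch under assumptions: it reads the URL from a command-line argument rather than an interactive prompt, and pickUrl, savePosts, and the output path are hypothetical names, not part of the original script.

```typescript
import { writeFile } from "node:fs/promises";

// Default endpoint from the original script, used when no URL is passed.
const DEFAULT_URL = "https://closingtags.com/wp-json/wp/v2/posts?per_page=100";

// Hypothetical helper: take the URL from the first command-line argument, e.g.
//   node dist/scrape.js https://example.com/wp-json/wp/v2/posts
// process.argv[0] is the node binary and process.argv[1] is the script path.
function pickUrl(argv: string[]): string {
  const candidate = argv[2];
  if (candidate === undefined) return DEFAULT_URL;
  // new URL() throws a TypeError if the argument is not a valid absolute URL.
  return new URL(candidate).toString();
}

// Hypothetical helper: write whatever was scraped to a pretty-printed JSON file.
async function savePosts(path: string, posts: unknown[]): Promise<void> {
  await writeFile(path, JSON.stringify(posts, null, 2) + "\n", "utf8");
}
```

In the script, that might look like calling scrape(pickUrl(process.argv)) and, after parsing the response, await savePosts("posts.json", posts). Validating the argument with new URL() means a typo fails fast instead of producing a confusing fetch error.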

By Dylan Hildenbrand
