Site Scraping With DOMParser
Mix.install([
{:kino, "~> 0.9"},
{:jason, "~> 1.4"},
{:deno_ex, "~> 0.2"}
])
Introduction
Using the built-in Deno fetch API and the DOMParser from the deno_dom library, you can fetch HTML
documents and parse them. In this example, we fetch all of the blog posts
from my blog's home page, marshal that data into a
JSON blob, write the JSON blob to STDOUT, and then decode it on the Elixir side using Jason.
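Once decoded, the scraped data is plain Elixir terms: a list with one string-keyed map per post. The snippet below only illustrates that shape; the values are placeholders, since the real values depend on whatever is on the live page:

# Illustrative shape only; these are placeholder values, not real scraped data
[
  %{
    "title" => "Some post title",
    "summary" => "A short summary of the post",
    "link" => "/blog/some-post/"
  }
]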
Code
script = """
import { DOMParser } from 'https://deno.land/x/deno_dom/deno-dom-wasm.ts';

const url = 'https://akoutmos.com';

try {
  // Fetch the home page and parse the raw HTML into a queryable document
  const res = await fetch(url);
  const html = await res.text();
  const document: any = new DOMParser().parseFromString(html, 'text/html');

  // Pull the title, summary, and link out of each post preview card
  const result = Array.from(document.querySelectorAll('.post-preview a')).map((elem) => {
    return {
      title: elem.querySelector('h2').innerText,
      summary: elem.querySelector('div').innerText,
      link: elem.getAttribute('href')
    };
  });

  // Write the scraped posts to STDOUT as a JSON blob for the Elixir side to decode
  console.log(JSON.stringify(result));
} catch (error) {
  console.log(error);
}
"""
# Run the Deno script with network access enabled, then decode its JSON output
{:ok, result} = DenoEx.run({:stdin, script}, [], allow_net: true)

result
|> Jason.decode!()
|> IO.inspect()

:ok
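Since Kino is already installed as part of the setup, you could optionally render the scraped posts as a table in Livebook instead of just inspecting them. This is a minimal sketch that reuses the result binding from above and simply decodes the JSON again:

# Optional: render the decoded posts as a table in Livebook
result
|> Jason.decode!()
|> Kino.DataTable.new()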