Benchmarks of Rust HTML libs
2024-09-15 18:27:30 +04:00

Benchmark Rust libraries working with HTML:

  • scraper (built with html5ever)
  • tl
  • (I haven't found anything else that can build a DOM, select an element, and serialize it)

Output is unformatted CSV (probably because I'm lazy); nushell's from csv command can be used to print it as a neat table.
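For readers who want to see how min/avg/max columns like these can be produced, here is a minimal, std-only sketch. It is not the code in src: the names Stats, summarize, time_runs and csv_row are hypothetical, the workload is a stand-in for a real parser call, and reporting milliseconds is an assumption, since the unit is not stated here.

```rust
use std::time::{Duration, Instant};

/// Min / average / max of a series of timings.
/// Milliseconds are an assumption made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    min: f64,
    avg: f64,
    max: f64,
}

/// Summarize a non-empty slice of durations.
fn summarize(samples: &[Duration]) -> Stats {
    assert!(!samples.is_empty(), "need at least one sample");
    let ms: Vec<f64> = samples.iter().map(|d| d.as_secs_f64() * 1000.0).collect();
    let min = ms.iter().copied().fold(f64::INFINITY, f64::min);
    let max = ms.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let avg = ms.iter().sum::<f64>() / ms.len() as f64;
    Stats { min, avg, max }
}

/// Run `f` `iterations` times and record how long each call takes.
fn time_runs<T>(iterations: usize, mut f: impl FnMut() -> T) -> Vec<Duration> {
    (0..iterations)
        .map(|_| {
            let start = Instant::now();
            let out = f();
            let elapsed = start.elapsed();
            // Keep the optimizer from discarding the measured work.
            std::hint::black_box(out);
            elapsed
        })
        .collect()
}

/// Format one CSV row: engine, page, then min/avg/max for each phase.
fn csv_row(engine: &str, page: &str, phases: &[Stats]) -> String {
    let mut row = format!("{engine},{page}");
    for s in phases {
        row.push_str(&format!(",{:.2},{:.2},{:.2}", s.min, s.avg, s.max));
    }
    row
}

fn main() {
    // Stand-in workload; a real run would call the parser under test here.
    let input = "<p>hello</p>".repeat(10_000);
    let parse = summarize(&time_runs(20, || input.matches('<').count()));
    println!("engine,page,parse min,parse avg,parse max");
    println!("{}", csv_row("dummy", "synthetic", &[parse]));
}
```

Timing each phase (parse, select, serialize) separately with time_runs and passing the three Stats values to csv_row would give one row per engine and page.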

Test results on my PC:

~/code/html-rs-bench> cargo run -r | save -f result; open result | from csv
    Finished `release` profile [optimized] target(s) in 0.03s
     Running `target/release/html-rs-bench`
╭───┬───────────┬─────────┬───────────┬───────────┬───────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────╮
│ # │  engine   │  page   │ parse min │ parse avg │ parse max │ select min │ select avg │ select max │ serial min │ serial avg │ serial max │
├───┼───────────┼─────────┼───────────┼───────────┼───────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤
│ 0 │ html5ever │ tochb   │     77.42 │     78.08 │     81.70 │       0.00 │       0.00 │       0.02 │      67.16 │      67.75 │      69.38 │
│ 1 │ html5ever │ android │    123.75 │    124.81 │    126.88 │       0.00 │       0.00 │       0.02 │      66.44 │      66.74 │      66.99 │
│ 2 │ html5ever │ 10mb    │    135.00 │    135.34 │    136.73 │       0.00 │       0.00 │       0.02 │     248.11 │     248.69 │     249.58 │
│ 3 │ tl        │ tochb   │     19.60 │     19.72 │     20.00 │       0.00 │       0.00 │       0.02 │    1024.16 │    1025.30 │    1027.68 │
│ 4 │ tl        │ android │     11.68 │     11.78 │     11.96 │       0.00 │       0.00 │       0.02 │      82.70 │      84.07 │      92.40 │
│ 5 │ tl        │ 10mb    │      2.09 │      2.14 │      2.24 │       0.00 │       0.00 │       0.00 │      14.04 │      14.19 │      14.56 │
╰───┴───────────┴─────────┴───────────┴───────────┴───────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────╯

Keep in mind that although tl is really fast (except when serializing the many tags in the tochb sample), it is not a browser-grade parser, and it does not fully support CSS selector syntax (e.g. a[data-abc] is OK, but a[data-abc="123"] is not).
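To put numbers on "really fast", the averages from the table above can be turned into ratios. The sketch below only does arithmetic on the parse avg and serial avg columns; the speedup helper and the constant names are made up for illustration.

```rust
/// Average timings copied from the results table: (page, html5ever, tl).
const PARSE_AVG: [(&str, f64, f64); 3] = [
    ("tochb", 78.08, 19.72),
    ("android", 124.81, 11.78),
    ("10mb", 135.34, 2.14),
];
const SERIAL_AVG: [(&str, f64, f64); 3] = [
    ("tochb", 67.75, 1025.30),
    ("android", 66.74, 84.07),
    ("10mb", 248.69, 14.19),
];

/// How many times faster tl is than html5ever.
/// Values below 1.0 mean tl is the slower one.
fn speedup(html5ever: f64, tl: f64) -> f64 {
    html5ever / tl
}

fn main() {
    for (phase, rows) in [("parse", PARSE_AVG), ("serialize", SERIAL_AVG)] {
        for (page, h, t) in rows {
            println!("{phase:9} {page:8} tl speedup: {:.2}x", speedup(h, t));
        }
    }
}
```

On these figures tl parses roughly 4x to 63x faster depending on the page, while serializing the tochb sample is about 15x slower than with html5ever.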

I do not own anything in the sample HTML pages. Here are the sources:

The size of developer.android.com is surprising: it's ~3.8 MiB even for the API reference homepage, which is bigger than the whole Tocharian-B dictionary.