html-rs-bench/README.md
2024-09-15 18:27:30 +04:00

Benchmark of Rust libraries that work with HTML:
- [`scraper`](https://crates.io/crates/scraper) (built with [`html5ever`](https://crates.io/crates/html5ever))
- [`tl`](https://crates.io/crates/tl)
- *(I haven't found anything else that can build a DOM, select an element, and serialize it)*

The output is unformatted CSV (probably because I'm lazy);
nushell's `from csv` can be used to print it as a neat table.
Test results on my PC:
```
~/code/html-rs-bench> cargo run -r | save -f result; open result | from csv
Finished `release` profile [optimized] target(s) in 0.03s
Running `target/release/html-rs-bench`
╭───┬───────────┬─────────┬───────────┬───────────┬───────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────╮
│ # │ engine    │ page    │ parse min │ parse avg │ parse max │ select min │ select avg │ select max │ serial min │ serial avg │ serial max │
├───┼───────────┼─────────┼───────────┼───────────┼───────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤
│ 0 │ html5ever │ tochb   │     77.42 │     78.08 │     81.70 │       0.00 │       0.00 │       0.02 │      67.16 │      67.75 │      69.38 │
│ 1 │ html5ever │ android │    123.75 │    124.81 │    126.88 │       0.00 │       0.00 │       0.02 │      66.44 │      66.74 │      66.99 │
│ 2 │ html5ever │ 10mb    │    135.00 │    135.34 │    136.73 │       0.00 │       0.00 │       0.02 │     248.11 │     248.69 │     249.58 │
│ 3 │ tl        │ tochb   │     19.60 │     19.72 │     20.00 │       0.00 │       0.00 │       0.02 │    1024.16 │    1025.30 │    1027.68 │
│ 4 │ tl        │ android │     11.68 │     11.78 │     11.96 │       0.00 │       0.00 │       0.02 │      82.70 │      84.07 │      92.40 │
│ 5 │ tl        │ 10mb    │      2.09 │      2.14 │      2.24 │       0.00 │       0.00 │       0.00 │      14.04 │      14.19 │      14.56 │
╰───┴───────────┴─────────┴───────────┴───────────┴───────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────╯
```
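
For readers wondering where min/avg/max columns like these could come from, here is a minimal sketch that uses only the standard library. It is not the benchmark's actual code: the helper names (`time_phase`, `summarize`, `csv_row`), the millisecond unit, the run count, and the stand-in workload are all assumptions made for illustration.

```rust
use std::time::Instant;

/// Minimum, average and maximum duration of one phase.
/// Milliseconds are an assumption; the README does not state the unit.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    min: f64,
    avg: f64,
    max: f64,
}

/// Reduce a non-empty slice of samples to min/avg/max.
fn summarize(samples: &[f64]) -> Stats {
    assert!(!samples.is_empty(), "need at least one sample");
    let min = samples.iter().copied().fold(f64::INFINITY, f64::min);
    let max = samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let avg = samples.iter().sum::<f64>() / samples.len() as f64;
    Stats { min, avg, max }
}

/// Run `phase` `runs` times and record each wall-clock duration.
fn time_phase<T>(runs: usize, mut phase: impl FnMut() -> T) -> Stats {
    let samples: Vec<f64> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            // black_box keeps the optimizer from discarding the result.
            std::hint::black_box(phase());
            start.elapsed().as_secs_f64() * 1000.0
        })
        .collect();
    summarize(&samples)
}

/// Format one CSV row: engine, page, then min/avg/max for each phase.
fn csv_row(engine: &str, page: &str, phases: &[Stats]) -> String {
    let mut row = format!("{engine},{page}");
    for s in phases {
        row.push_str(&format!(",{:.2},{:.2},{:.2}", s.min, s.avg, s.max));
    }
    row
}

fn main() {
    // Hypothetical stand-in workload; a real run would parse, select and
    // serialize a sample page with the library under test.
    let input = "<p>hello</p>".repeat(10_000);
    let parse = time_phase(20, || input.matches('<').count());
    println!("engine,page,parse min,parse avg,parse max");
    println!("{}", csv_row("dummy", "synthetic", &[parse]));
}
```

Printing plain comma-separated rows like this keeps the program free of formatting code, and a tool such as nushell's `from csv` can turn the rows into a table afterwards.
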
Keep in mind that, although `tl` is really fast (except when serializing the many tags in the `tochb` sample),
it's not a browser-grade parser, and it doesn't fully support the CSS selector syntax
(e.g. `a[data-abc]` is OK, but `a[data-abc="123"]` is not).

I do not own anything in the sample HTML pages. Here are the sources:
- `tochb` - [Dictionary of Tocharian B](https://www.win.tue.nl/~aeb/natlang/ie/tochB.html)
- `android` - [BluetoothDevice Android API reference](https://developer.android.com/reference/android/bluetooth/BluetoothDevice)
- `10mb` - [Random 10MB page with Shakespeare's poem](https://github.com/adriancbjie/my-kitty-cat/raw/master/web/staticpages/10MB.html)

The size of developer.android.com pages is surprising...
Even the API reference homepage is ~3.8 MiB.
That's bigger than the whole Tocharian B dictionary.