Benchmark Rust libraries working with HTML:
- `scraper` (built with `html5ever`)
- `tl`
- (haven't found anything else that can build DOM, select an element and serialize)
Output is unformatted CSV (probably because I'm lazy);
nushell's `from csv` can be used to print it as a neat table.
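For readers wondering how min/avg/max columns like the ones below can be produced, here is a minimal std-only sketch. It is not the actual code of this benchmark: the names `Stats`, `summarize`, `time_runs` and `csv_row` are hypothetical, the millisecond unit is an assumption, and the "parse" step is a dummy string scan rather than a real HTML parser.

```rust
use std::time::Instant;

/// Min, average and max of a set of timings (milliseconds here, by assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    min: f64,
    avg: f64,
    max: f64,
}

/// Summarize raw samples. Panics on an empty slice.
fn summarize(samples: &[f64]) -> Stats {
    assert!(!samples.is_empty(), "need at least one sample");
    let min = samples.iter().copied().fold(f64::INFINITY, f64::min);
    let max = samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let avg = samples.iter().sum::<f64>() / samples.len() as f64;
    Stats { min, avg, max }
}

/// Run `work` `runs` times, timing each run separately.
fn time_runs<T>(runs: usize, mut work: impl FnMut() -> T) -> Stats {
    let samples: Vec<f64> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            let out = work();
            let elapsed_ms = start.elapsed().as_secs_f64() * 1000.0;
            // Keep the optimizer from discarding the measured work.
            std::hint::black_box(out);
            elapsed_ms
        })
        .collect();
    summarize(&samples)
}

/// Format one unformatted CSV row: engine, page, then min/avg/max per phase.
fn csv_row(engine: &str, page: &str, phases: &[Stats]) -> String {
    let mut row = format!("{engine},{page}");
    for s in phases {
        row.push_str(&format!(",{:.2},{:.2},{:.2}", s.min, s.avg, s.max));
    }
    row
}

fn main() {
    println!("engine,page,parse min,parse avg,parse max");
    // Dummy workload standing in for a real parse step.
    let input = "<p>hello</p>".repeat(10_000);
    let parse = time_runs(10, || input.matches("<p>").count());
    println!("{}", csv_row("dummy", "synthetic", &[parse]));
}
```

Timing each run on its own (instead of timing the whole loop and dividing) is what makes the min and max columns possible; the header row is what lets `from csv` name the columns.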
Test results on my PC:
~/code/html-rs-bench> cargo run -r | save -f result; open result | from csv
Finished `release` profile [optimized] target(s) in 0.03s
Running `target/release/html-rs-bench`
╭───┬───────────┬─────────┬───────────┬───────────┬───────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────╮
│ # │ engine │ page │ parse min │ parse avg │ parse max │ select min │ select avg │ select max │ serial min │ serial avg │ serial max │
├───┼───────────┼─────────┼───────────┼───────────┼───────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤
│ 0 │ html5ever │ tochb │ 77.42 │ 78.08 │ 81.70 │ 0.00 │ 0.00 │ 0.02 │ 67.16 │ 67.75 │ 69.38 │
│ 1 │ html5ever │ android │ 123.75 │ 124.81 │ 126.88 │ 0.00 │ 0.00 │ 0.02 │ 66.44 │ 66.74 │ 66.99 │
│ 2 │ html5ever │ 10mb │ 135.00 │ 135.34 │ 136.73 │ 0.00 │ 0.00 │ 0.02 │ 248.11 │ 248.69 │ 249.58 │
│ 3 │ tl │ tochb │ 19.60 │ 19.72 │ 20.00 │ 0.00 │ 0.00 │ 0.02 │ 1024.16 │ 1025.30 │ 1027.68 │
│ 4 │ tl │ android │ 11.68 │ 11.78 │ 11.96 │ 0.00 │ 0.00 │ 0.02 │ 82.70 │ 84.07 │ 92.40 │
│ 5 │ tl │ 10mb │ 2.09 │ 2.14 │ 2.24 │ 0.00 │ 0.00 │ 0.00 │ 14.04 │ 14.19 │ 14.56 │
╰───┴───────────┴─────────┴───────────┴───────────┴───────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────╯
Keep in mind that, despite `tl` being really fast (except when serializing the many tags in the `tochb` sample),
it's not a browser-grade parser, and it doesn't fully support the CSS selector syntax
(e.g. `a[data-abc]` is OK, but not `a[data-abc="123"]`).
I do not own anything in the sample HTML pages. Here are the sources:
- `tochb` - Dictionary of Tocharian B
- `android` - BluetoothDevice Android API reference
- `10mb` - Random 10MB page with Shakespeare's poem
The size of developer.android.com pages is surprising... It's ~3.8 MiB even for the API reference homepage, which is bigger than the whole Tocharian B dictionary.