mirror of
https://github.com/TxtDot/documentation.git
synced 2024-11-22 12:56:22 +03:00
Written 2 paragraphs, fixed typos
I'll continue tomorrow
parent e4a607d2d5
commit e6d89733c4
1 changed files with 27 additions and 4 deletions
@@ -1,11 +1,34 @@
 # Getting Started
 
-## What is this?
+## What is this
 
 *txtdot* is a proxy that requests the page by the given URL,
-extracts only useful data (text, links, pictures, tables, etc.)
+extracts only useful data including text, links, pictures and tables,
 and returns it as an HTML page with a minimalistic design
-optimized for reading text.
+optimized for text reading.
 
-*txtdot* increases the loading speed and reducing client's bandwidth usage
+*txtdot* increases the loading speed and reduces client's bandwidth usage
 since no unnecessary code and no scripts are transferred.
+Also, you won't see any advertisement (unless it's a static picture that is hard to detect as ads).
+There are no trackers either.
+
+## How it works
+
+This project exists thanks to Mozilla's great
+[Readability.js](https://github.com/mozilla/readability) library.
+The initial idea was to process HTML with it on the server
+so the client does not need to download and execute heavy JS,
+and doesn't need to use an adblock.
+
+Readability performs its work very well in most cases.
+But not always. For example, check any StackOverflow page or Google search results.
+
+So [artegoser](https://github.com/artegoser) wrote the basis of the code
+keeping in mind that we'll extend txtdot with other *engines*.
+For now, engines are functions taking a URL as a parameter,
+returning an object that contains extracted HTML and plain text, page title and language.
+The object is rendered with an EJS template (or, in `/api/parse`, just sent as JSON).
+
+If a `?engine=` parameter wasn't passed, txtdot checks
+if the requested domain is assigned to a specific engine,
+for example, "stackoverflow.com" -> "stackoverflow".
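The engine description and the engine-selection rule in the added "How it works" text can be sketched roughly as follows. This is only an illustration, not txtdot's actual code: the names `EngineOutput`, `Engine`, `domainEngines`, `DEFAULT_ENGINE` and `pickEngineName` are hypothetical, the field names are guesses based on the fields the text lists, engines are assumed to be asynchronous, and the fallback to a default engine is an assumption the text does not state.

```typescript
// Hypothetical result shape: the text only says the object contains
// extracted HTML, plain text, a page title and a language.
interface EngineOutput {
  content: string;     // extracted HTML
  textContent: string; // plain text
  title: string;       // page title
  lang: string;        // page language
}

// An engine takes a URL and returns the extracted data.
// Returning a Promise is an assumption (fetching a page is usually async).
type Engine = (url: string) => Promise<EngineOutput>;

// Hypothetical domain-to-engine table. Only the "stackoverflow.com"
// entry comes from the text.
const domainEngines: Record<string, string> = {
  "stackoverflow.com": "stackoverflow",
};

// Assumed fallback name, used when neither the parameter nor the
// domain table selects an engine.
const DEFAULT_ENGINE = "readability";

// Pick an engine name: an explicit ?engine= value wins, otherwise the
// requested domain is looked up, otherwise the assumed default is used.
function pickEngineName(requestedUrl: string, engineParam?: string): string {
  if (engineParam) {
    return engineParam;
  }
  const host = new URL(requestedUrl).hostname;
  return domainEngines[host] ?? DEFAULT_ENGINE;
}
```

For instance, `pickEngineName("https://stackoverflow.com/questions/1")` looks up the hostname in the table, while passing a second argument bypasses the lookup entirely. An exact hostname match is the simplest choice here; subdomains such as `www.` would need extra handling.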