mirror of
https://github.com/TxtDot/documentation.git
synced 2024-11-22 12:56:22 +03:00
Written 2 paragraphs, fixed typos
I'll continue tomorrow
parent e4a607d2d5
commit e6d89733c4
1 changed file with 27 additions and 4 deletions
|
@@ -1,11 +1,34 @@
|
||||||
# Getting Started
|
# Getting Started
|
||||||
|
|
||||||
## What is this?
|
## What is this
|
||||||
|
|
||||||
*txtdot* is a proxy that requests the page at the given URL,
|
*txtdot* is a proxy that requests the page at the given URL,
|
||||||
extracts only useful data (text, links, pictures, tables, etc.)
|
extracts only useful data including text, links, pictures and tables,
|
||||||
and returns it as an HTML page with a minimalistic design
|
and returns it as an HTML page with a minimalistic design
|
||||||
optimized for reading text.
|
optimized for text reading.
|
||||||
|
|
||||||
*txtdot* increases the loading speed and reducing client's bandwidth usage
|
*txtdot* increases the loading speed and reduces the client's bandwidth usage
|
||||||
since no unnecessary code and no scripts are transferred.
|
since no unnecessary code and no scripts are transferred.
|
||||||
|
Also, you won't see any advertisements (unless an ad is a static picture that is hard to detect).
|
||||||
|
There are no trackers either.
|
||||||
|
|
||||||
|
## How it works
|
||||||
|
|
||||||
|
This project exists thanks to Mozilla's great
|
||||||
|
[Readability.js](https://github.com/mozilla/readability) library.
|
||||||
|
The initial idea was to process HTML with it on the server
|
||||||
|
so the client does not need to download and execute heavy JS
|
||||||
|
or use an ad blocker.
|
||||||
|
|
||||||
|
Readability does its job very well in most cases,
|
||||||
|
but not always. For example, check any Stack Overflow page or Google search results.
|
||||||
|
|
||||||
|
So [artegoser](https://github.com/artegoser) wrote the basis of the code
|
||||||
|
keeping in mind that we'll extend txtdot with other *engines*.
|
||||||
|
For now, engines are functions that take a URL as a parameter
|
||||||
|
and return an object containing the extracted HTML, plain text, page title and language.
|
||||||
|
The object is rendered with an EJS template (or, in `/api/parse`, just sent as JSON).
|
||||||
|
|
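The engine contract described above could be sketched roughly as follows. This is only an illustration: the names `EngineOutput`, `Engine`, `exampleEngine` and `toJson`, as well as the field names, are assumptions and are not taken from the txtdot source. A real engine would fetch and parse the page (and would likely be asynchronous), while this stub just returns fixed data.

```typescript
// Hypothetical shape of what an engine returns; field names are assumptions.
interface EngineOutput {
  content: string; // extracted HTML
  textContent: string; // plain text
  title: string; // page title
  lang: string; // page language
}

// An engine is a function that takes a URL and returns the extracted data.
// Simplified to a synchronous function for this sketch.
type Engine = (url: string) => EngineOutput;

// Stub engine that performs no network request, for illustration only.
const exampleEngine: Engine = (url) => ({
  content: `<p>Extracted from ${url}</p>`,
  textContent: `Extracted from ${url}`,
  title: "Example page",
  lang: "en",
});

// What a JSON endpoint such as /api/parse might send: the object, serialized.
function toJson(output: EngineOutput): string {
  return JSON.stringify(output);
}
```

The same object could instead be handed to a template for the HTML view, so an engine never has to know how its result is displayed.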
||||||
|
If no `?engine=` parameter is passed, txtdot checks
|
||||||
|
whether the requested domain is assigned to a specific engine,
|
||||||
|
for example, "stackoverflow.com" -> "stackoverflow".
|
||||||
|
|
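The engine selection described in the last added paragraph might look something like this minimal sketch. The function name `selectEngine`, the `domainEngines` table and the `"readability"` fallback are assumptions made for illustration. Only the "stackoverflow.com" -> "stackoverflow" mapping comes from the text.

```typescript
// Hypothetical domain-to-engine table; only this one entry is from the docs.
const domainEngines: Record<string, string> = {
  "stackoverflow.com": "stackoverflow",
};

// Assumed fallback engine name; the documentation does not state the default.
const DEFAULT_ENGINE = "readability";

function selectEngine(pageUrl: string, engineParam?: string): string {
  // An explicitly passed ?engine= value takes priority.
  if (engineParam) {
    return engineParam;
  }
  // Otherwise look up the requested domain in the table.
  const host = new URL(pageUrl).hostname;
  return domainEngines[host] ?? DEFAULT_ENGINE;
}
```

For example, `selectEngine("https://stackoverflow.com/questions/1")` picks `"stackoverflow"` from the table, while passing a second argument overrides the lookup entirely.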