From e27619b7ab79374bece0cfaedbe5ccca48c37b57 Mon Sep 17 00:00:00 2001 From: DarkCat09 <50486086+DarkCat09@users.noreply.github.com> Date: Fri, 1 Sep 2023 12:49:12 +0000 Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20TxtDot/d?= =?UTF-8?q?ocumentation@a978a32255282e076163aa5bfd1477ef68604943=20?= =?UTF-8?q?=F0=9F=9A=80?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- 404.html | 57 ++++ docker/index.html | 445 ++++++++++++++++++++++++++++++ env/index.html | 528 +++++++++++++++++++++++++++++++++++ index.html | 57 ++++ reverse/index.html | 581 +++++++++++++++++++++++++++++++++++++++ search/search_index.json | 2 +- selfhost/index.html | 190 ++++++++----- sitemap.xml.gz | Bin 127 -> 127 bytes 8 files changed, 1793 insertions(+), 67 deletions(-) create mode 100644 docker/index.html create mode 100644 env/index.html create mode 100644 reverse/index.html diff --git a/404.html b/404.html index 1f5a9e4..99eab87 100644 --- a/404.html +++ b/404.html @@ -273,6 +273,63 @@ + + + + + +
  • + + + + + Docker + + + + +
  • + + + + + + + + +
  • + + + + + Configuring + + + + +
  • + + + + + + + + +
  • + + + + + Reverse Proxy + + + + +
  • + + + diff --git a/docker/index.html b/docker/index.html new file mode 100644 index 0000000..f5358e1 --- /dev/null +++ b/docker/index.html @@ -0,0 +1,445 @@ + + + + + + + + + + + + + + + + + + + + + + + Docker - txtdot + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Docker

    +

    If you prefer hosting without Docker, see Self-Hosting instead.

    +

    Docker Engine and Docker Compose are required.

    +

    Note that prebuilt images are not published on Docker Hub. If you can't or don't want to build them on your server and don't want to set up a CI/CD system, let us know and we'll consider setting up a GitHub Actions workflow.

    +
    git clone https://github.com/txtdot/txtdot.git
    +cd txtdot
    +docker compose build
    +docker compose up -d
    +
    + + + + + + + + + \ No newline at end of file diff --git a/env/index.html b/env/index.html new file mode 100644 index 0000000..7d85b5a --- /dev/null +++ b/env/index.html @@ -0,0 +1,528 @@ + + + + + + + + + + + + + + + + + + + + + + + Configuring - txtdot + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Configuring

    +

    txtdot can be configured either with environment variables or with the .env file in the working directory, which has higher priority. For a sample config, see .env.example.

    +

    HOST

    +

    Default: 0.0.0.0

    +

    Host on which the HTTP server should listen for connections. Set it to 127.0.0.1 if your txtdot instance is behind a reverse proxy, 0.0.0.0 otherwise.

    +

    PORT

    +

    Default: 8080

    +

    Port on which the HTTP server should listen for connections.

    +

    REVERSE_PROXY

    +

    Default: false

    +

    Set it to true only if your txtdot instance runs behind a reverse proxy. Needed for processing X-Forwarded headers.
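
    As a sketch, a .env file for an instance that sits behind a reverse proxy on the same machine might look like the following. The values are illustrative assumptions derived from the descriptions of HOST, PORT and REVERSE_PROXY; .env.example in the repository remains the authoritative reference for the available variables and their format.

```
# Hypothetical .env for an instance behind a local reverse proxy
# Listen on the loopback interface only, since the proxy runs on the same host
HOST=127.0.0.1
# Must match the port used in the proxy's upstream address
PORT=8080
# Enable processing of the X-Forwarded headers set by the proxy
REVERSE_PROXY=true
```

    Without a reverse proxy, the defaults (HOST=0.0.0.0, PORT=8080, REVERSE_PROXY=false) should already be suitable.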

    + + + + + + + + + \ No newline at end of file diff --git a/index.html b/index.html index 1d85b96..eecb388 100644 --- a/index.html +++ b/index.html @@ -341,6 +341,63 @@ + + + + + +
  • + + + + + Docker + + + + +
  • + + + + + + + + +
  • + + + + + Configuring + + + + +
  • + + + + + + + + +
  • + + + + + Reverse Proxy + + + + +
  • + + + diff --git a/reverse/index.html b/reverse/index.html new file mode 100644 index 0000000..df25e9c --- /dev/null +++ b/reverse/index.html @@ -0,0 +1,581 @@ + + + + + + + + + + + + + + + + + + + + + Reverse Proxy - txtdot + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Reverse Proxy

    +

    Nginx

    +

    Basically, you just need to set the domain, the TLS certificates, and the Host and X-Forwarded headers (so that txtdot knows the hostname), and pass all requests to txtdot.

    +
    server {
    +    listen 443 ssl http2;
    +    listen [::]:443 ssl http2;
    +
    +    # Replace the domain
    +    server_name txt.dc09.ru;
    +
    +    ssl_certificate ...pem;
    +    ssl_certificate_key ...key;
    +    # More options here:
    +    # https://ssl-config.mozilla.org/#server=nginx&config=modern
    +
    +    location / {
    +        # Replace 8080 port if needed
    +        proxy_pass http://127.0.0.1:8080;
    +
    +        proxy_set_header Host $host;
    +        proxy_set_header X-Real-IP $remote_addr;
    +        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    +        proxy_set_header X-Forwarded-Proto $scheme;
    +    }
    +}
    +
    +

    On the official instance, TLS is configured in the main nginx config, +so we omit these options below.

    +

    Nginx serves static files faster than Node.js, so let's have it serve them directly:

    +
    server {
    +    ...
    +
    +    location /static/ {
    +        alias /home/txtdot/src/dist/static/;
    +    }
    +}
    +
    +

    What about rate limiting? We don't want attackers to overload our proxy.

    +

    The config below limits each client IP address to 2 requests per second, allows up to 4 excess requests to be put into the queue, and sets the size of the shared memory zone to 10 megabytes. See the Nginx blog post for a detailed explanation.

    +
    limit_req_zone $binary_remote_addr zone=txtdotapi:10m rate=2r/s;
    +
    +server {
    +    ...
    +    location / {
    +        limit_req zone=txtdotapi burst=4;
    +        ...
    +    }
    +    ...
    +}
    +
    +
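
    The rate-limiting block above relies on nginx defaults: queued requests are delayed so that they are passed on at the configured rate (one every 500 ms at 2r/s), and rejected requests receive a 503 response. As a sketch, and as an assumption about what you might prefer rather than what the official instance uses, the standard nodelay parameter and the limit_req_status directive change both behaviors:

```nginx
limit_req_zone $binary_remote_addr zone=txtdotapi:10m rate=2r/s;

server {
    ...
    location / {
        # Pass up to 4 excess requests on immediately instead of delaying them
        limit_req zone=txtdotapi burst=4 nodelay;
        # Reject over-limit requests with 429 Too Many Requests instead of 503
        limit_req_status 429;
        ...
    }
    ...
}
```

    With nodelay, a short burst (for example, a page and its first few resources) is served without artificial delay, while the long-term rate per client is still capped at 2 requests per second.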

    Let's put it all together. Here's our sample config:

    +
    limit_req_zone $binary_remote_addr zone=txtdotapi:10m rate=2r/s;
    +
    +server {
    +    listen 443 ssl http2;
    +    listen [::]:443 ssl http2;
    +
    +    server_name txt.dc09.ru;
    +
    +    location / {
    +        limit_req zone=txtdotapi burst=4;
    +        proxy_pass http://127.0.0.1:8080;
    +
    +        proxy_set_header Host $host;
    +        proxy_set_header X-Real-IP $remote_addr;
    +        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    +        proxy_set_header X-Forwarded-Proto $scheme;
    +    }
    +
    +    location /static/ {
    +        alias /home/txtdot/src/dist/static/;
    +    }
    +}
    +
    +

    Apache

    +

    Coming soon. If you are familiar with Apache httpd and want to help, write a config here (a short explanation like the one above would also be great) and open a pull request.

    + + + + + + + + + \ No newline at end of file diff --git a/search/search_index.json b/search/search_index.json index 0c4fede..4538824 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Getting Started","text":""},{"location":"#what-is-this","title":"What is this","text":"

    txtdot is a proxy that requests the page by the given URL, extracts only useful data including text, links, pictures and tables, and returns it as an HTML page with a minimalistic design optimized for text reading.

    txtdot increases the loading speed and reduces client's bandwidth usage since no unnecessary code and no scripts are transferred. Also, you won't see any advertisement (unless it's a static picture that is hard to detect as ads). There are no trackers too.

    "},{"location":"#how-to-use-it","title":"How to use it","text":"

    txtdot is an open source software, so everyone can host it on his own server. The official instance is txt.dc09.ru, the list of all instances is here.

    On the main page, there's a handy form where you can specify a URL, choose an engine and a format for parsed data. On the /get page, \"Home\" button returns you to /, \"Original page\" opens the entered URL in the same window without txtdot proxy.

    The latest docs for API endpoints can be found here. For handy JSON API, use /api/parse returning an engine result object (see below). For pure HTML response, use /api/raw-html. Note that both API and browser endpoints on txt.dc09.ru are ratelimited to 2 requests per second.

    "},{"location":"#how-it-works","title":"How it works","text":"

    This project exists thanks to great Mozilla's Readability.js library. The initial idea was to process HTML with it on the server so the client does not need to download and execute heavy JS, doesn't need to use an adblock.

    Readability performs its work very well in most cases. But not always. For example, check any StackOverflow page or Google search results.

    So artegoser wrote the basis of the code keeping in mind that we'll extend txtdot with other engines. For now, engines are functions taking a URL as a parameter, returning an object that contains extracted HTML and plain text, page title and language. The object is rendered with ejs template (or, in /api/parse, just sent as JSON).

    If an ?engine= parameter wasn't passed, but txtdot found that a specific engine is assigned to the requested domain, for example, \"stackoverflow.com\": stackoverflow, it uses that engine to process the URL. Otherwise, the page is parsed with the engine assigned to * (it's Readability).

    "},{"location":"selfhost/","title":"Self-Hosting","text":""},{"location":"selfhost/#without-docker","title":"Without Docker","text":"

    Install Node and NPM:

    # Debian, Ubuntu\nsudo apt install nodejs npm\n# CentOS\nsudo yum install nodejs\n# Arch\nsudo pacman -S nodejs npm\n# Alpine\ndoas apk add nodejs npm\n

    Create a user for txtdot, log in:

    # Not Alpine (coreutils)\nsudo useradd -r -m -s /sbin/nologin -U txtdot\nsudo -u txtdot -i\n\n# Alpine (busybox)\ndoas addgroup -S txtdot\ndoas adduser -h /home/txtdot -s /sbin/nologin -G txtdot -S -D txtdot\ndoas -u txtdot bash\n

    Clone the repo:

    git clone https://github.com/txtdot/txtdot.git src\n

    Install packages, compile TS:

    cd src\nnpm install\nnpm run build\n

    Manually start the server to check if it works (Ctrl+C to exit):

    npm run start\n

    Log out from txtdot account: exit

    "},{"location":"selfhost/#add-txtdot-to-autostart","title":"Add txtdot to autostart","text":"

    Either using systemd unit file:

    wget https://github.com/TxtDot/txtdot/blob/main/txtdot.service\nsudo chown root:root txtdot.service\nsudo chmod 755 txtdot.service\nsudo mv txtdot.service /etc/systemd/system/\nsudo systemctl daemon-reload\nsudo systemctl enable txtdot\nsudo systemctl start txtdot\n

    Or using OpenRC script:

    wget -O txtdot https://github.com/TxtDot/txtdot/blob/main/txtdot.init\ndoas chown root:root txtdot\ndoas chmod 755 txtdot\ndoas mv txtdot /etc/init.d/\ndoas rc-update add txtdot\ndoas rc-service txtdot start\n

    Or using crontab:

    sudo crontab -u txtdot -e\n# The command will open an editor\n# Add this line to the end of the file:\n@reboot sleep 10 && cd /home/txtdot/src && npm run start\n# Save the file and exit\n
    "},{"location":"selfhost/#with-docker","title":"With Docker","text":"

    Docker Engine and Docker Compose are required.

    Note that built images are not provided via Docker Hub. If you can't or don't want to build them on your server and don't want to setup a CI/CD system, let us know, we'll consider setting up a GitHub Actions workflow.

    git clone https://github.com/txtdot/txtdot.git\ncd txtdot\ndocker compose build\ndocker compose up -d\n
    "}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Getting Started","text":""},{"location":"#what-is-this","title":"What is this","text":"

    txtdot is a proxy that requests the page by the given URL, extracts only useful data including text, links, pictures and tables, and returns it as an HTML page with a minimalistic design optimized for text reading.

    txtdot increases loading speed and reduces the client's bandwidth usage, since no unnecessary code or scripts are transferred. You also won't see any advertisements (unless an ad is a static picture that is hard to detect). There are no trackers either.

    "},{"location":"#how-to-use-it","title":"How to use it","text":"

    txtdot is open source software, so anyone can host it on their own server. The official instance is txt.dc09.ru; the list of all instances is here.

    On the main page, there's a handy form where you can specify a URL, choose an engine and a format for parsed data. On the /get page, \"Home\" button returns you to /, \"Original page\" opens the entered URL in the same window without txtdot proxy.

    The latest docs for API endpoints can be found here. For handy JSON API, use /api/parse returning an engine result object (see below). For pure HTML response, use /api/raw-html. Note that both API and browser endpoints on txt.dc09.ru are ratelimited to 2 requests per second.

    "},{"location":"#how-it-works","title":"How it works","text":"

    This project exists thanks to Mozilla's great Readability.js library. The initial idea was to process HTML with it on the server, so that the client does not need to download and execute heavy JS or use an ad blocker.

    Readability performs its work very well in most cases. But not always. For example, check any StackOverflow page or Google search results.

    So artegoser wrote the basis of the code keeping in mind that we'll extend txtdot with other engines. For now, engines are functions taking a URL as a parameter, returning an object that contains extracted HTML and plain text, page title and language. The object is rendered with ejs template (or, in /api/parse, just sent as JSON).

    If an ?engine= parameter wasn't passed, but txtdot found that a specific engine is assigned to the requested domain, for example, \"stackoverflow.com\": stackoverflow, it uses that engine to process the URL. Otherwise, the page is parsed with the engine assigned to * (it's Readability).

    "},{"location":"docker/","title":"Docker","text":"

    If you prefer hosting without Docker, see Self-Hosting instead.

    Docker Engine and Docker Compose are required.

    Note that prebuilt images are not published on Docker Hub. If you can't or don't want to build them on your server and don't want to set up a CI/CD system, let us know and we'll consider setting up a GitHub Actions workflow.

    git clone https://github.com/txtdot/txtdot.git\ncd txtdot\ndocker compose build\ndocker compose up -d\n
    "},{"location":"env/","title":"Configuring","text":"

    txtdot can be configured either with environment variables or with the .env file in the working directory, which has higher priority. For a sample config, see .env.example.

    "},{"location":"env/#host","title":"HOST","text":"

    Default: 0.0.0.0

    Host on which the HTTP server should listen for connections. Set it to 127.0.0.1 if your txtdot instance is behind a reverse proxy, 0.0.0.0 otherwise.

    "},{"location":"env/#port","title":"PORT","text":"

    Default: 8080

    Port on which the HTTP server should listen for connections.

    "},{"location":"env/#reverse_proxy","title":"REVERSE_PROXY","text":"

    Default: false

    Set it to true only if your txtdot instance runs behind a reverse proxy. Needed for processing X-Forwarded headers.

    "},{"location":"reverse/","title":"Reverse Proxy","text":""},{"location":"reverse/#nginx","title":"Nginx","text":"

    Basically, you just need to set the domain, the TLS certificates, and the Host and X-Forwarded headers (so that txtdot knows the hostname), and pass all requests to txtdot.

    server {\n    listen 443 ssl http2;\n    listen [::]:443 ssl http2;\n\n    # Replace the domain\n    server_name txt.dc09.ru;\n\n    ssl_certificate ...pem;\n    ssl_certificate_key ...key;\n    # More options here:\n    # https://ssl-config.mozilla.org/#server=nginx&config=modern\n\n    location / {\n        # Replace 8080 port if needed\n        proxy_pass http://127.0.0.1:8080;\n\n        proxy_set_header Host $host;\n        proxy_set_header X-Real-IP $remote_addr;\n        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\n        proxy_set_header X-Forwarded-Proto $scheme;\n    }\n}\n

    On the official instance, TLS is configured in the main nginx config, so we omit these options below.

    Nginx serves static files faster than Node.js, so let's have it serve them directly:

    server {\n    ...\n\n    location /static/ {\n        alias /home/txtdot/src/dist/static/;\n    }\n}\n

    What about rate limiting? We don't want attackers to overload our proxy.

    The config below limits each client IP address to 2 requests per second, allows up to 4 excess requests to be put into the queue, and sets the size of the shared memory zone to 10 megabytes. See the Nginx blog post for a detailed explanation.

    limit_req_zone $binary_remote_addr zone=txtdotapi:10m rate=2r/s;\n\nserver {\n    ...\n    location / {\n        limit_req zone=txtdotapi burst=4;\n        ...\n    }\n    ...\n}\n

    Let's put it all together. Here's our sample config:

    limit_req_zone $binary_remote_addr zone=txtdotapi:10m rate=2r/s;\n\nserver {\n    listen 443 ssl http2;\n    listen [::]:443 ssl http2;\n\n    server_name txt.dc09.ru;\n\n    location / {\n        limit_req zone=txtdotapi burst=4;\n        proxy_pass http://127.0.0.1:8080;\n\n        proxy_set_header Host $host;\n        proxy_set_header X-Real-IP $remote_addr;\n        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\n        proxy_set_header X-Forwarded-Proto $scheme;\n    }\n\n    location /static/ {\n        alias /home/txtdot/src/dist/static/;\n    }\n}\n
    "},{"location":"reverse/#apache","title":"Apache","text":"

    Coming soon. If you are familiar with Apache httpd and want to help, write a config here (a short explanation like the one above would also be great) and open a pull request.

    "},{"location":"selfhost/","title":"Self-Hosting","text":"

    If you prefer hosting with Docker, see Docker instead.

    "},{"location":"selfhost/#install-nodejs-and-npm","title":"Install nodejs and npm","text":"

    For Debian and Ubuntu: the packages in the distribution repositories are quite old, so consider installing Node.js and npm from NodeSource. The minimum required version is Node.js 18.

    Other distros:

    # CentOS\nsudo yum install nodejs\n# Arch\nsudo pacman -S nodejs npm\n# Alpine\ndoas apk add nodejs npm\n
    "},{"location":"selfhost/#create-a-user-for-txtdot","title":"Create a user for txtdot","text":"

    Almost all distros except Alpine:

    sudo useradd -r -m -s /sbin/nologin -U txtdot\nsudo -u txtdot bash\n

    Alpine Linux with busybox and doas:

    doas addgroup -S txtdot\ndoas adduser -h /home/txtdot -s /sbin/nologin -G txtdot -S -D txtdot\ndoas -u txtdot bash\n
    "},{"location":"selfhost/#build-config-and-launch","title":"Build, config and launch","text":"

    Clone the git repository, cd into it:

    git clone https://github.com/txtdot/txtdot.git src\ncd src\n

    Copy and modify the sample config file (see the Configuring section):

    cp .env.example .env\nnano .env\n

    Install packages, compile TS:

    npm install\nnpm run build\n

    Manually start the server to check if it works (Ctrl+C to exit):

    npm run start\n

    Log out from the txtdot account:

    exit\n
    "},{"location":"selfhost/#add-txtdot-to-autostart","title":"Add txtdot to autostart","text":"

    Either using systemd unit file:

    wget https://raw.githubusercontent.com/TxtDot/txtdot/main/config/txtdot.service\nsudo chown root:root txtdot.service\nsudo chmod 644 txtdot.service\nsudo mv txtdot.service /etc/systemd/system/\nsudo systemctl daemon-reload\nsudo systemctl enable txtdot\nsudo systemctl start txtdot\n

    Or using OpenRC script:

    wget -O txtdot https://raw.githubusercontent.com/TxtDot/txtdot/main/config/txtdot.init\ndoas chown root:root txtdot\ndoas chmod 755 txtdot\ndoas mv txtdot /etc/init.d/\ndoas rc-update add txtdot\ndoas rc-service txtdot start\n

    Or using crontab:

    sudo crontab -u txtdot -e\n# The command will open an editor\n# Add this line to the end of the file:\n@reboot sleep 10 && cd /home/txtdot/src && npm run start\n# Save the file and exit\n
    "}]} \ No newline at end of file diff --git a/selfhost/index.html b/selfhost/index.html index b8fe167..44e9869 100644 --- a/selfhost/index.html +++ b/selfhost/index.html @@ -12,6 +12,8 @@ + + @@ -313,28 +315,29 @@ @@ -374,28 +434,29 @@