diff --git a/docker/index.html b/docker/index.html index f5358e1..510e269 100644 --- a/docker/index.html +++ b/docker/index.html @@ -387,17 +387,16 @@

Docker

If you prefer hosting without Docker, see Self-Hosting instead.

-

Docker Engine and Docker Compose are required.

-

Note that built images are not provided via Docker Hub. -If you can't or don't want to build them on your server -and don't want to setup a CI/CD system, -let us know, -we'll consider setting up a GitHub Actions workflow.

-
git clone https://github.com/txtdot/txtdot.git
-cd txtdot
-docker compose build
+

Download docker-compose.yml and txtdot configs, +edit them and then start the container:

+
wget https://raw.githubusercontent.com/TxtDot/txtdot/main/docker-compose.yml
+wget -O .env https://raw.githubusercontent.com/TxtDot/txtdot/main/.env.example
+nano .env
 docker compose up -d
 
+

Alternatively, you can configure txtdot with +the environment section of the docker-compose config +(don't forget to remove .env and volumes).

diff --git a/env/index.html b/env/index.html index 7d85b5a..3668571 100644 --- a/env/index.html +++ b/env/index.html @@ -371,6 +371,20 @@ REVERSE_PROXY + + +
  • + + PROXY_RES + + +
  • + +
  • + + SWAGGER + +
  • @@ -445,6 +459,20 @@ REVERSE_PROXY + + +
  • + + PROXY_RES + + +
  • + +
  • + + SWAGGER + +
  • @@ -481,6 +509,13 @@ Set it to 127.0.0.1 if your txtdot instance is behind reverse proxy

    Default: false

    Set it to true only if your txtdot instance runs behind a reverse proxy. Needed for processing X-Forwarded headers.

    +

    PROXY_RES

    +

    Default: true

    +

    Whether to allow proxying images, video, audio +and everything else through your txtdot instance.

    +

    SWAGGER

    +

    Default: false

    +

    Whether to add the /doc route for Swagger API docs.
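
    As a rough sketch of how the documented options fit together, the commands below write a sample config that contains only the defaults listed in this section (HOST, PORT, REVERSE_PROXY, PROXY_RES, SWAGGER). The file name txtdot.env.sketch is a hypothetical choice made here to avoid overwriting a real .env, and the plain KEY=VALUE layout is an assumption based on the usual .env format, so compare it with .env.example before relying on it.

```shell
# Write a sample config using the defaults documented above.
# The file name is hypothetical; copy it to .env only after review.
cat > txtdot.env.sketch <<'EOF'
HOST=0.0.0.0
PORT=8080
REVERSE_PROXY=false
PROXY_RES=true
SWAGGER=false
EOF

# Show the result for review.
cat txtdot.env.sketch
```

    Behind a reverse proxy, the descriptions above suggest changing HOST to 127.0.0.1 and REVERSE_PROXY to true.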

    diff --git a/search/search_index.json b/search/search_index.json index 4538824..f948270 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Getting Started","text":""},{"location":"#what-is-this","title":"What is this","text":"

    txtdot is a proxy that requests the page by the given URL, extracts only useful data including text, links, pictures and tables, and returns it as an HTML page with a minimalistic design optimized for text reading.

    txtdot increases loading speed and reduces the client's bandwidth usage, since no unnecessary code or scripts are transferred. You also won't see any advertisements (unless an ad is a static picture that is hard to detect). There are no trackers either.

    "},{"location":"#how-to-use-it","title":"How to use it","text":"

    txtdot is open source software, so anyone can host it on their own server. The official instance is txt.dc09.ru; the list of all instances is here.

    On the main page, there's a handy form where you can specify a URL, choose an engine and a format for parsed data. On the /get page, \"Home\" button returns you to /, \"Original page\" opens the entered URL in the same window without txtdot proxy.

    The latest docs for API endpoints can be found here. For handy JSON API, use /api/parse returning an engine result object (see below). For pure HTML response, use /api/raw-html. Note that both API and browser endpoints on txt.dc09.ru are ratelimited to 2 requests per second.

    "},{"location":"#how-it-works","title":"How it works","text":"

    This project exists thanks to Mozilla's great Readability.js library. The initial idea was to process HTML with it on the server, so that the client needs neither to download and execute heavy JS nor to use an ad blocker.

    Readability performs its work very well in most cases. But not always. For example, check any StackOverflow page or Google search results.

    So artegoser wrote the basis of the code keeping in mind that we'll extend txtdot with other engines. For now, engines are functions taking a URL as a parameter, returning an object that contains extracted HTML and plain text, page title and language. The object is rendered with ejs template (or, in /api/parse, just sent as JSON).

    If an ?engine= parameter wasn't passed, but txtdot found that a specific engine is assigned to the requested domain, for example, \"stackoverflow.com\": stackoverflow, it uses that engine to process the URL. Otherwise, the page is parsed with the engine assigned to * (it's Readability).

    "},{"location":"docker/","title":"Docker","text":"

    If you prefer hosting without Docker, see Self-Hosting instead.

    Docker Engine and Docker Compose are required.

    Note that built images are not provided via Docker Hub. If you can't or don't want to build them on your server and don't want to setup a CI/CD system, let us know, we'll consider setting up a GitHub Actions workflow.

    git clone https://github.com/txtdot/txtdot.git\ncd txtdot\ndocker compose build\ndocker compose up -d\n
    "},{"location":"env/","title":"Configuring","text":"

    txtdot can be configured either with environment variables or with the .env file in the working directory, which takes priority. For a sample config, see .env.example.

    "},{"location":"env/#host","title":"HOST","text":"

    Default: 0.0.0.0

    Host on which the HTTP server listens for connections. Set it to 127.0.0.1 if your txtdot instance is behind a reverse proxy, and to 0.0.0.0 otherwise.

    "},{"location":"env/#port","title":"PORT","text":"

    Default: 8080

    Port on which the HTTP server listens for connections.

    "},{"location":"env/#reverse_proxy","title":"REVERSE_PROXY","text":"

    Default: false

    Set it to true only if your txtdot instance runs behind a reverse proxy. Needed for processing X-Forwarded headers.

    "},{"location":"reverse/","title":"Reverse Proxy","text":""},{"location":"reverse/#nginx","title":"Nginx","text":"

    Basically, you just need to set the domain, TLS certificates, Host and X-Forwarded headers (so txtdot could know the hostname) and pass all requests to txtdot.

    server {\n    listen 443 ssl http2;\n    listen [::]:443 ssl http2;\n\n    # Replace the domain\n    server_name txt.dc09.ru;\n\n    ssl_certificate ...pem;\n    ssl_certificate_key ...key;\n    # More options here:\n    # https://ssl-config.mozilla.org/#server=nginx&config=modern\n\n    location / {\n        # Replace 8080 port if needed\n        proxy_pass http://127.0.0.1:8080;\n\n        proxy_set_header Host $host;\n        proxy_set_header X-Real-IP $remote_addr;\n        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\n        proxy_set_header X-Forwarded-Proto $scheme;\n    }\n}\n

    On the official instance, TLS is configured in the main nginx config, so we omit these options below.

    Nginx serves static files faster than NodeJS, let's configure it:

    server {\n    ...\n\n    location /static/ {\n        alias /home/txtdot/src/dist/static/;\n    }\n}\n

    What about rate-limiting? We don't want the hackers to overload our proxy.

    The config below limits the rate to 2 requests per second, allows up to 4 requests to be queued, and sets the maximum zone size to 10 megabytes. See the Nginx blog post for a detailed explanation.

    limit_req_zone $binary_remote_addr zone=txtdotapi:10m rate=2r/s;\n\nserver {\n    ...\n    location / {\n        limit_req zone=txtdotapi burst=4;\n        ...\n    }\n    ...\n}\n

    Let's put it all together. Here's our sample config:

    limit_req_zone $binary_remote_addr zone=txtdotapi:10m rate=2r/s;\n\nserver {\n    listen 443 ssl http2;\n    listen [::]:443 ssl http2;\n\n    server_name txt.dc09.ru;\n\n    location / {\n        limit_req zone=txtdotapi burst=4;\n        proxy_pass http://127.0.0.1:8080;\n\n        proxy_set_header Host $host;\n        proxy_set_header X-Real-IP $remote_addr;\n        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\n        proxy_set_header X-Forwarded-Proto $scheme;\n    }\n\n    location /static/ {\n        alias /home/txtdot/src/dist/static/;\n    }\n}\n
    "},{"location":"reverse/#apache","title":"Apache","text":"

    Coming soon. If you are familiar with Apache httpd and want to help, write a config here (a small explanation as above also would be great) and open a pull request.

    "},{"location":"selfhost/","title":"Self-Hosting","text":"

    If you prefer hosting with Docker, see Docker instead.

    "},{"location":"selfhost/#install-nodejs-and-npm","title":"Install nodejs and npm","text":"

    For Debian and Ubuntu: the packages in the repository are outdated, so consider installing them with NodeSource. The minimum required version is Node.js 18.

    Other distros:

    # CentOS\nsudo yum install nodejs\n# Arch\nsudo pacman -S nodejs npm\n# Alpine\ndoas apk add nodejs npm\n
    "},{"location":"selfhost/#create-a-user-for-txtdot","title":"Create a user for txtdot","text":"

    Almost all distros except Alpine:

    sudo useradd -r -m -s /sbin/nologin -U txtdot\nsudo -u txtdot bash\n

    Alpine Linux with busybox and doas:

    doas addgroup -S txtdot\ndoas adduser -h /home/txtdot -s /sbin/nologin -G txtdot -S -D txtdot\ndoas -u txtdot bash\n
    "},{"location":"selfhost/#build-config-and-launch","title":"Build, config and launch","text":"

    Clone the git repository, cd into it:

    git clone https://github.com/txtdot/txtdot.git src\ncd src\n

    Copy and modify the sample config file (see the Configuring section):

    cp .env.example .env\nnano .env\n

    Install packages, compile TS:

    npm install\nnpm run build\n

    Manually start the server to check if it works (Ctrl+C to exit):

    npm run start\n

    Log out from the txtdot account:

    exit\n
    "},{"location":"selfhost/#add-txtdot-to-autostart","title":"Add txtdot to autostart","text":"

    Either using systemd unit file:

    wget https://raw.githubusercontent.com/TxtDot/txtdot/main/config/txtdot.service\nsudo chown root:root txtdot.service\nsudo chmod 644 txtdot.service\nsudo mv txtdot.service /etc/systemd/system/\nsudo systemctl daemon-reload\nsudo systemctl enable txtdot\nsudo systemctl start txtdot\n

    Or using OpenRC script:

    wget -O txtdot https://raw.githubusercontent.com/TxtDot/txtdot/main/config/txtdot.init\ndoas chown root:root txtdot\ndoas chmod 755 txtdot\ndoas mv txtdot /etc/init.d/\ndoas rc-update add txtdot\ndoas rc-service txtdot start\n

    Or using crontab:

    sudo crontab -u txtdot -e\n# The command will open an editor\n# Add this line to the end of the file:\n@reboot sleep 10 && cd /home/txtdot/src && npm run start\n# Save the file and exit\n
    "}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Getting Started","text":""},{"location":"#what-is-this","title":"What is this","text":"

    txtdot is a proxy that requests the page by the given URL, extracts only useful data including text, links, pictures and tables, and returns it as an HTML page with a minimalistic design optimized for text reading.

    txtdot increases loading speed and reduces the client's bandwidth usage, since no unnecessary code or scripts are transferred. You also won't see any advertisements (unless an ad is a static picture that is hard to detect). There are no trackers either.

    "},{"location":"#how-to-use-it","title":"How to use it","text":"

    txtdot is open source software, so anyone can host it on their own server. The official instance is txt.dc09.ru; the list of all instances is here.

    On the main page, there's a handy form where you can specify a URL, choose an engine and a format for parsed data. On the /get page, \"Home\" button returns you to /, \"Original page\" opens the entered URL in the same window without txtdot proxy.

    The latest docs for API endpoints can be found here. For handy JSON API, use /api/parse returning an engine result object (see below). For pure HTML response, use /api/raw-html. Note that both API and browser endpoints on txt.dc09.ru are ratelimited to 2 requests per second.

    "},{"location":"#how-it-works","title":"How it works","text":"

    This project exists thanks to Mozilla's great Readability.js library. The initial idea was to process HTML with it on the server, so that the client needs neither to download and execute heavy JS nor to use an ad blocker.

    Readability performs its work very well in most cases. But not always. For example, check any StackOverflow page or Google search results.

    So artegoser wrote the basis of the code keeping in mind that we'll extend txtdot with other engines. For now, engines are functions taking a URL as a parameter, returning an object that contains extracted HTML and plain text, page title and language. The object is rendered with ejs template (or, in /api/parse, just sent as JSON).

    If an ?engine= parameter wasn't passed, but txtdot found that a specific engine is assigned to the requested domain, for example, \"stackoverflow.com\": stackoverflow, it uses that engine to process the URL. Otherwise, the page is parsed with the engine assigned to * (it's Readability).

    "},{"location":"docker/","title":"Docker","text":"

    If you prefer hosting without Docker, see Self-Hosting instead.

    Download docker-compose.yml and txtdot configs, edit them and then start the container:

    wget https://raw.githubusercontent.com/TxtDot/txtdot/main/docker-compose.yml\nwget -O .env https://raw.githubusercontent.com/TxtDot/txtdot/main/.env.example\nnano .env\ndocker compose up -d\n

    Alternatively, you can configure txtdot with the environment section of the docker-compose config (don't forget to remove .env and volumes).

    "},{"location":"env/","title":"Configuring","text":"

    txtdot can be configured either with environment variables or with the .env file in the working directory, which takes priority. For a sample config, see .env.example.

    "},{"location":"env/#host","title":"HOST","text":"

    Default: 0.0.0.0

    Host on which the HTTP server listens for connections. Set it to 127.0.0.1 if your txtdot instance is behind a reverse proxy, and to 0.0.0.0 otherwise.

    "},{"location":"env/#port","title":"PORT","text":"

    Default: 8080

    Port on which the HTTP server listens for connections.

    "},{"location":"env/#reverse_proxy","title":"REVERSE_PROXY","text":"

    Default: false

    Set it to true only if your txtdot instance runs behind a reverse proxy. Needed for processing X-Forwarded headers.

    "},{"location":"env/#proxy_res","title":"PROXY_RES","text":"

    Default: true

    Whether to allow proxying images, video, audio and everything else through your txtdot instance.

    "},{"location":"env/#swagger","title":"SWAGGER","text":"

    Default: false

    Whether to add the /doc route for Swagger API docs.

    "},{"location":"reverse/","title":"Reverse Proxy","text":""},{"location":"reverse/#nginx","title":"Nginx","text":"

    Basically, you just need to set the domain, TLS certificates, Host and X-Forwarded headers (so txtdot could know the hostname) and pass all requests to txtdot.

    server {\n    listen 443 ssl http2;\n    listen [::]:443 ssl http2;\n\n    # Replace the domain\n    server_name txt.dc09.ru;\n\n    ssl_certificate ...pem;\n    ssl_certificate_key ...key;\n    # More options here:\n    # https://ssl-config.mozilla.org/#server=nginx&config=modern\n\n    location / {\n        # Replace 8080 port if needed\n        proxy_pass http://127.0.0.1:8080;\n\n        proxy_set_header Host $host;\n        proxy_set_header X-Real-IP $remote_addr;\n        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\n        proxy_set_header X-Forwarded-Proto $scheme;\n    }\n}\n

    On the official instance, TLS is configured in the main nginx config, so we omit these options below.

    Nginx serves static files faster than NodeJS, let's configure it:

    server {\n    ...\n\n    location /static/ {\n        alias /home/txtdot/src/dist/static/;\n    }\n}\n

    What about rate-limiting? We don't want the hackers to overload our proxy.

    The config below limits the rate to 2 requests per second, allows up to 4 requests to be queued, and sets the maximum zone size to 10 megabytes. See the Nginx blog post for a detailed explanation.

    limit_req_zone $binary_remote_addr zone=txtdotapi:10m rate=2r/s;\n\nserver {\n    ...\n    location / {\n        limit_req zone=txtdotapi burst=4;\n        ...\n    }\n    ...\n}\n

    Let's put it all together. Here's our sample config:

    limit_req_zone $binary_remote_addr zone=txtdotapi:10m rate=2r/s;\n\nserver {\n    listen 443 ssl http2;\n    listen [::]:443 ssl http2;\n\n    server_name txt.dc09.ru;\n\n    location / {\n        limit_req zone=txtdotapi burst=4;\n        proxy_pass http://127.0.0.1:8080;\n\n        proxy_set_header Host $host;\n        proxy_set_header X-Real-IP $remote_addr;\n        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\n        proxy_set_header X-Forwarded-Proto $scheme;\n    }\n\n    location /static/ {\n        alias /home/txtdot/src/dist/static/;\n    }\n}\n
    "},{"location":"reverse/#apache","title":"Apache","text":"

    Coming soon. If you are familiar with Apache httpd and want to help, write a config here (a small explanation as above also would be great) and open a pull request.

    "},{"location":"selfhost/","title":"Self-Hosting","text":"

    If you prefer hosting with Docker, see Docker instead.

    "},{"location":"selfhost/#install-nodejs-and-npm","title":"Install nodejs and npm","text":"

    For Debian and Ubuntu: the packages in the repository are outdated, so consider installing them with NodeSource. The minimum required version is Node.js 18.

    Other distros:

    # CentOS\nsudo yum install nodejs\n# Arch\nsudo pacman -S nodejs npm\n# Alpine\ndoas apk add nodejs npm\n
    "},{"location":"selfhost/#create-a-user-for-txtdot","title":"Create a user for txtdot","text":"

    Almost all distros except Alpine:

    sudo useradd -r -m -s /sbin/nologin -U txtdot\nsudo -u txtdot bash\n

    Alpine Linux with busybox and doas:

    doas addgroup -S txtdot\ndoas adduser -h /home/txtdot -s /sbin/nologin -G txtdot -S -D txtdot\ndoas -u txtdot bash\n
    "},{"location":"selfhost/#build-config-and-launch","title":"Build, config and launch","text":"

    Clone the git repository, cd into it:

    git clone https://github.com/txtdot/txtdot.git src\ncd src\n

    Copy and modify the sample config file (see the Configuring section):

    cp .env.example .env\nnano .env\n

    Install packages, compile TS:

    npm install\nnpm run build\n

    Manually start the server to check if it works (Ctrl+C to exit):

    npm run start\n

    Log out from the txtdot account:

    exit\n
    "},{"location":"selfhost/#add-txtdot-to-autostart","title":"Add txtdot to autostart","text":"

    Either using systemd unit file:

    wget https://raw.githubusercontent.com/TxtDot/txtdot/main/config/txtdot.service\nsudo chown root:root txtdot.service\nsudo chmod 644 txtdot.service\nsudo mv txtdot.service /etc/systemd/system/\nsudo systemctl daemon-reload\nsudo systemctl enable txtdot\nsudo systemctl start txtdot\n

    Or using OpenRC script:

    wget -O txtdot https://raw.githubusercontent.com/TxtDot/txtdot/main/config/txtdot.init\ndoas chown root:root txtdot\ndoas chmod 755 txtdot\ndoas mv txtdot /etc/init.d/\ndoas rc-update add txtdot\ndoas rc-service txtdot start\n

    Or using crontab:

    sudo crontab -u txtdot -e\n# The command will open an editor\n# Add this line to the end of the file:\n@reboot sleep 10 && cd /home/txtdot/src && npm run start\n# Save the file and exit\n
    "}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index c80ac85..ca1809a 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ