# Intro

Mitm-Archive consists of two parts:
- [Addon for mitmproxy](https://git.dc09.ru/mitm-archive/addon) intercepting and saving (archiving) all HTTP responses, written in Python
- [Server](https://git.dc09.ru/mitm-archive/server) giving exactly the same responses as in an archive for corresponding method+domain+port+path+query, written in Go

"Archive" is an SQLite3 database and a directory storing headers and body for each archived response. See the [Format](#format) section for details.

# User guide: addon

## Installing mitmproxy

First, check whether your Linux distro provides a package and whether its version is 10.x (NOT 9.x or older).

If it doesn't, or you are using Windows, you can download the official pre-built binaries from the mitmproxy website.

In case you are on a Linux distro without glibc or you don't trust official binaries (that's wise), use `pipx install mitmproxy`.
Mitmproxy also contains native code, so the following packages are required: `base-devel` (includes `gcc`), `openssl-devel`, `libbsd-devel`, `python3-devel`.
Note: these are package names in the Void Linux repository; they may not match yours.

The native library inside `pylsqpack` depends on BSD's `sys/queue.h`, which `libbsd-devel` provides but installs as `bsd/sys/queue.h`. The simplest solution is a symlink:
```bash
$ sudo ln -s /usr/include/bsd/sys/queue.h /usr/include/sys/queue.h
```

Now you can run `pipx install mitmproxy`.

## Configuring HTTPS proxy

Start `mitmproxy` or `mitmweb`. The first is a CLI, the second provides a web UI.

I'll assume that you are using Firefox (or a fork). Firefox supports importing certificates browser-wide, and configuring an HTTP proxy in it is simpler than in Chromium.

I recommend creating a separate browser profile, because next we'll import a TLS cert, and for security reasons you must remember to remove it after creating an archive. On Firefox, it's `about:profiles` in address bar > Create a New Profile.
**This is just advice:** if manually switching the proxy off and removing the mitmproxy cert is OK (you're sure you won't forget), then use your main profile, but close any active tabs that may produce extra requests you don't want archived (e.g. messenger web clients like Element or Telegram Web).

Now, point your browser to the proxy on `127.0.0.1:8080`. On Firefox, it's Settings > Network Settings (at the bottom) > Settings… > Manual proxy configuration > HTTP: `127.0.0.1`, Port: `8080` > Checkbox "Also use this proxy for HTTPS".

Go to `http://mitm.it`, ignore warnings about an unencrypted connection (mitm.it is served by your local mitmproxy), and click "Get mitmproxy-ca-cert.pem" below "Firefox".

Import it: Settings > Privacy & Security > Certificates > View Certificates… > "Authorities" tab > Import… > Choose the downloaded cert > Checkbox "Trust this CA to identify web sites" > OK.

## Archiving web sites

To get the addon, either clone the git repo:
```bash
$ git clone https://git.dc09.ru/mitm-archive/addon
$ cd addon
```
… or just download the script:
```bash
$ mkdir addon && cd addon
$ curl https://git.dc09.ru/mitm-archive/addon/raw/branch/main/addon.py >addon.py
```

Stop mitmproxy if it's still running (q and then y for mitmproxy; Ctrl+C for mitmweb), then re-launch it with the mitm-archive addon: `mitmproxy -s addon.py` (or `mitmweb -s addon.py`).

**Each HTTP response** that comes to mitmproxy is archived: metadata goes to the `./archive.db` SQLite database, headers and body go to `./storage/{id}/headers` and `./storage/{id}/body` respectively.

To adjust these paths, set the environment variables:
```bash
$ export SQLITE_DB_PATH=archive.db
$ export STORAGE=storage
$ mitmproxy -s addon.py
```

# User guide: server

// TODO

# What's not implemented

- Filter host instead of archiving everything (literally 2 lines of code, could be added soon after I figure out the best way to configure this)
- Addon is configured with env vars, Server uses command-line options; should be unified?

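
The host filter from the first item could look roughly like the sketch below. It is only an illustration, not the addon's code: the `ARCHIVE_HOSTS` variable and both function names are hypothetical, and the comma-separated format is an assumption chosen to match the env-var style the addon already uses.

```python
import os


def parse_allowed_hosts(raw: str) -> frozenset:
    """Split a comma-separated host list, ignoring blanks and letter case."""
    return frozenset(h.strip().lower() for h in raw.split(",") if h.strip())


def should_archive(host: str, allowed: frozenset) -> bool:
    """An empty allow-list keeps the current behaviour: archive everything."""
    return not allowed or host.lower() in allowed


# ARCHIVE_HOSTS is a hypothetical variable name, not one the addon reads today.
ALLOWED_HOSTS = parse_allowed_hosts(os.environ.get("ARCHIVE_HOSTS", ""))

# In a mitmproxy addon, the `response(flow)` hook could then return early
# when should_archive(flow.request.pretty_host, ALLOWED_HOSTS) is False.
```

With `ARCHIVE_HOSTS=dc09.ru,example.org` exported before starting mitmproxy, only responses from those two hosts would pass the check; leaving the variable unset changes nothing.
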

Probably useful, but would overcomplicate the storage format and server logic:
- Alphabetically sort query arguments both in addon and server (for now, if an archive contains `/api?key=val&abc=def`, the equivalent request `/api?abc=def&key=val` gives 404, because the URLs are not exactly the same)

Harder to implement, and would definitely overcomplicate the project while neither I nor anyone else needs this:
- Config option to omit some query args (if there is no `/api?key=val&abc=def` and it's allowed to omit abc, then search for `/api?key=val`)
- Store request/response cookies in an archive
- Config option to disable saving cookies specified by key (e.g. in case they contain credentials)
- Config option to omit some cookies
- Invent a custom format or find an existing one (kind of HashMap) for storing query args and cookies that will make the operations listed above handier

For these usage scenarios, especially with cookies, it's simpler and overall better to self-host the web site server you are trying to archive, or to re-implement it in your favourite programming language and self-host that.

# Format

SQLite3 database contains `data` table with the following columns:
- `id` - integer primary key for each archived response
- `method` - string, specifies the request method, default `"GET"`
- `url` - string, URL formatted as `$scheme://$host:$port$path$query` (e.g. `https://dc09.ru:443/path?key=val`), required
- `code` - integer, HTTP response status code, default `200`

INSERT query is executed with `RETURNING id` clause.

In file system storage, the addon creates a directory (if it does not exist) named after the numeric ID returned by SQLite, writes the raw binary body data without any modifications to the `{id}/body` file, and writes headers in HTTP/1 format (`name: value\r\n`) to the `{id}/headers` file.

The FS storage structure can be represented graphically this way:
```
storage/
|- 1/
|  |- headers
|  |- body
|
|- 2/
|  |- headers
|  |- body
|
|- {id}/
   ...
   ...
...
```
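
To make the layout concrete, here is a minimal Python sketch of reading one archived response back, based only on the description above. It is not the Go server's code: the function name `load_response` is hypothetical, and taking the first matching row when several share the same method and URL is an assumption.

```python
import sqlite3
from pathlib import Path


def load_response(db_path, storage_dir, method, url):
    """Return (status_code, headers, body) for an archived response, or None."""
    conn = sqlite3.connect(str(db_path))
    try:
        # Exact match on method and full URL, as described in the Format section.
        row = conn.execute(
            "SELECT id, code FROM data WHERE method = ? AND url = ?",
            (method, url),
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return None

    resp_id, code = row
    entry = Path(storage_dir) / str(resp_id)

    # Headers are stored one per line as `name: value\r\n`.
    headers = []
    for line in (entry / "headers").read_bytes().split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        headers.append((name.decode("latin-1"), value.strip().decode("latin-1")))

    # The body is stored as raw bytes without modification.
    body = (entry / "body").read_bytes()
    return code, headers, body
```

A call such as `load_response("archive.db", "storage", "GET", "https://dc09.ru:443/path?key=val")` would return the stored status code, the header pairs, and the body bytes, or `None` if that exact URL was never archived.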