# Intro

Mitm-Archive consists of two parts:

- An [addon for mitmproxy](https://git.dc09.ru/mitm-archive/addon), written in Python, that intercepts and saves (archives) all HTTP responses
- A [server](https://git.dc09.ru/mitm-archive/server), written in Go, that serves exactly the same responses as stored in an archive, matched by method + domain + port + path + query

An "archive" is an SQLite3 database plus a directory that stores the headers and body of each archived response. See the [Format](#format) section for details.

# User guide: addon

## Installing mitmproxy

First, check whether your Linux distro provides a package and whether its version is 10.x (NOT 9.x or older).
If it doesn't, or if you are using Windows, download the official pre-built binaries from <https://mitmproxy.org/>.

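If you want to check the version from a script, here is a small bash sketch. The function name `check_mitmproxy_version` is hypothetical, and it assumes the first line of `mitmproxy --version` looks like `Mitmproxy: 10.1.5`. It also accepts the version text as an optional argument, so it can be tried without mitmproxy installed:

```bash
# Hypothetical helper: fail if mitmproxy is 9.x or older.
# Assumes `mitmproxy --version` prints a first line like "Mitmproxy: 10.1.5".
# If $1 is given, it is parsed instead of running mitmproxy.
check_mitmproxy_version() {
  local out=${1-$(mitmproxy --version 2>/dev/null)}
  local re='Mitmproxy: ([0-9]+)\.'
  if [[ ! $out =~ $re ]]; then
    echo "could not detect mitmproxy version" >&2
    return 2
  fi
  local major=${BASH_REMATCH[1]}
  echo "detected mitmproxy major version: $major"
  # 9.x and older are not supported.
  (( major >= 10 ))
}
```

Run `check_mitmproxy_version` without arguments to query the installed mitmproxy: exit status 1 means 9.x or older, 2 means the version string was not recognised.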
If you are on a Linux distro without glibc, or you don't trust the official binaries (which is wise), use `pipx install mitmproxy`.
Mitmproxy also contains native code, so the following packages are required to build it: `base-devel` (includes `gcc`), `openssl-devel`, `libbsd-devel`, `python3-devel`.
Note: these are package names in the Void Linux repository; they may differ in your distro.

The native library inside `pylsqpack` depends on BSD's `sys/queue.h`, which is provided by `libbsd-devel` but installed as `bsd/sys/queue.h`.
The simplest solution is to symlink it:

```bash
$ sudo ln -s /usr/include/bsd/sys/queue.h /usr/include/sys/queue.h
```

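If you script this step, a slightly more defensive bash variant only creates the link when it is missing and the libbsd header is actually present. This is a sketch: the function name `link_sys_queue_h` and its optional include-directory argument (default `/usr/include`) are hypothetical, the argument existing only so the function can be tried on a scratch directory:

```bash
# Hypothetical helper: idempotent version of the symlink step above.
# $1 (optional) is the include directory; defaults to /usr/include.
link_sys_queue_h() {
  local inc=${1:-/usr/include}
  local src=$inc/bsd/sys/queue.h dst=$inc/sys/queue.h
  if [[ -e $dst ]]; then
    echo "$dst already exists, nothing to do"
    return 0
  fi
  if [[ ! -f $src ]]; then
    echo "$src not found; is libbsd-devel installed?" >&2
    return 1
  fi
  mkdir -p "$inc/sys"
  ln -s "$src" "$dst" && echo "linked $dst -> $src"
}
```

For the real `/usr/include`, put the function and a call to it in a script and run that script as root (e.g. `sudo bash script.sh`), since `sudo` cannot call a shell function directly.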
Now you can run `pipx install mitmproxy`.

## Configuring HTTPS proxy

Start `mitmproxy` or `mitmweb`: the former is a console (CLI) interface, the latter provides a web UI.

I'll assume that you are using Firefox (or one of its forks).
Firefox supports importing certificates browser-wide, and configuring an HTTP proxy is simpler than in Chromium.

I recommend creating a separate browser profile: next we'll import a TLS certificate, and for security reasons you must remember to remove it once you have finished creating the archive.
In Firefox, open `about:profiles` in the address bar > Create a New Profile.
**This is just a recommendation;** if switching the proxy off and removing the mitmproxy certificate manually is fine for you (and you're sure you won't forget), use your main profile,
but close any active tabs that may produce extra requests you don't want archived (e.g. web messenger clients like Element or Telegram Web).

Now point your browser to the proxy at `127.0.0.1:8080`.
In Firefox: Settings > Network Settings (at the bottom) > Settings… > Manual proxy configuration > HTTP Proxy: `127.0.0.1`, Port: `8080` > check "Also use this proxy for HTTPS".

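Alternatively, the same settings can be written to the profile's `user.js`, which Firefox reads at startup. A bash sketch, assuming the standard `network.proxy.*` preference names and a profile directory taken from the "Root Directory" entry in `about:profiles`; the function name `write_firefox_proxy_prefs` is hypothetical:

```bash
# Hypothetical helper: append manual proxy prefs to a Firefox profile's user.js.
# $1 = profile directory, $2/$3 = proxy host/port (default 127.0.0.1:8080).
write_firefox_proxy_prefs() {
  local profile=$1 host=${2:-127.0.0.1} port=${3:-8080}
  [[ -d $profile ]] || { echo "no such profile dir: $profile" >&2; return 1; }
  cat >> "$profile/user.js" <<EOF
// mitm-archive: route HTTP and HTTPS through mitmproxy
user_pref("network.proxy.type", 1);
user_pref("network.proxy.http", "$host");
user_pref("network.proxy.http_port", $port);
user_pref("network.proxy.ssl", "$host");
user_pref("network.proxy.ssl_port", $port);
user_pref("network.proxy.share_proxy_settings", true);
EOF
}
```

Run it while that profile is closed, then start Firefox. Because `user.js` is re-applied on every start, switching the proxy off later means deleting these lines too, not just changing the setting in the UI.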
Go to `http://mitm.it` and ignore warnings about the unencrypted connection (mitm.it is served by your local mitmproxy),
then click "Get mitmproxy-ca-cert.pem" below "Firefox".
Import it: Settings > Privacy & Security > Certificates > View Certificates… > "Authorities" tab > Import… >
choose the downloaded certificate > check "Trust this CA to identify web sites" > OK.

## Archiving web sites

To get the addon, either clone the git repo:

```bash
$ git clone https://git.dc09.ru/mitm-archive/addon
$ cd addon
```

… or just download the script (`-f` makes curl fail instead of saving an HTTP error page, `-L` follows redirects):

```bash
$ mkdir addon && cd addon
$ curl -fL -o addon.py https://git.dc09.ru/mitm-archive/addon/raw/branch/main/addon.py
```

Stop mitmproxy if it's still running (<kbd>q</kbd> and then <kbd>y</kbd> for mitmproxy; <kbd>Ctrl+C</kbd> for mitmweb),
then relaunch it with the mitm-archive addon: `mitmproxy -s addon.py` (or `mitmweb -s addon.py`).

**Each HTTP response** that passes through mitmproxy is archived:
metadata goes to the `./archive.db` SQLite database,
headers and body go to `./storage/{id}/headers` and `./storage/{id}/body` respectively.

To adjust these paths, set the environment variables:

```bash
$ export SQLITE_DB_PATH=archive.db
$ export STORAGE=storage
$ mitmproxy -s addon.py
```

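Since every response gets its own numbered directory under the storage path, a quick way to find what was archived last is to pick the highest numeric directory name. A minimal bash sketch, assuming `storage` when `STORAGE` is unset (matching the default above); the function name `latest_entry` is hypothetical:

```bash
# Hypothetical helper: print the highest archived ID in $STORAGE (default: storage).
# Names are compared as numbers, so 10 counts as newer than 9 (plain `ls` order would not).
latest_entry() {
  local dir=${STORAGE:-storage} max= d id
  for d in "$dir"/*/; do
    id=${d%/}
    id=${id##*/}
    # Skip non-numeric names (and the unexpanded glob if the directory is empty).
    [[ $id =~ ^[0-9]+$ ]] || continue
    if [[ -z $max ]] || (( 10#$id > 10#$max )); then
      max=$id
    fi
  done
  [[ -n $max ]] || return 1
  printf '%s\n' "$max"
}
```

For example, `cat "${STORAGE:-storage}/$(latest_entry)/headers"` shows the headers of the most recently archived response.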
# User guide: server

// TODO

# What's not implemented

- Filtering by host instead of archiving everything (literally 2 lines of code; could be added soon, once I figure out the best way to configure it)
- The addon is configured with env vars, while the server uses command-line options; should this be unified?

Probably useful, but would overcomplicate the storage format and server logic:

- Alphabetically sort query arguments in both the addon and the server
  (for now, if an archive contains `/api?key=val&abc=def`,
  the same request as `/api?abc=def&key=val` gives 404,
  because the URLs are not exactly the same)

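Neither the addon nor the server does this today. To illustrate what such normalization would mean, here is a hypothetical bash helper (`sort_query` is not part of either project) that sorts query arguments byte-wise, so both orderings from the example map to the same string. It ignores fragments and does not decode percent-escapes:

```bash
# Hypothetical helper, not part of mitm-archive: sort query arguments so that
# "/api?key=val&abc=def" and "/api?abc=def&key=val" produce the same string.
sort_query() {
  local url=$1
  # No query string: return the URL unchanged.
  if [[ $url != *\?* ]]; then
    printf '%s\n' "$url"
    return
  fi
  local base=${url%%\?*} query=${url#*\?} sorted
  # One argument per line, byte-wise sort (LC_ALL=C), then re-join with "&".
  sorted=$(printf '%s\n' "$query" | tr '&' '\n' | LC_ALL=C sort | paste -sd '&' -)
  printf '%s?%s\n' "$base" "$sorted"
}
```

Sorting whole `key=value` pairs keeps the result deterministic when a key repeats, but it also reorders repeated keys, which some APIs treat as significant.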
Harder to implement and would definitely overcomplicate the project, while neither I nor anyone else needs these:

- Config option to omit some query args (if there is no `/api?key=val&abc=def` and omitting `abc` is allowed, search for `/api?key=val`)
- Store request/response cookies in an archive
- Config option to disable saving cookies specified by key (e.g. in case they contain credentials)
- Config option to omit some cookies
- Invent a custom format or find an existing one (a kind of HashMap) for storing query args and cookies that would make the operations listed above more convenient

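The first item above, falling back to a URL without an optional argument, could look roughly like this hypothetical bash helper (`drop_query_arg` is not part of mitm-archive). It removes every occurrence of one argument name:

```bash
# Hypothetical helper, not part of mitm-archive: drop query argument NAME,
# e.g. drop_query_arg "/api?key=val&abc=def" abc  ->  /api?key=val
drop_query_arg() {
  local url=$1 name=$2
  if [[ $url != *\?* ]]; then
    printf '%s\n' "$url"
    return
  fi
  local base=${url%%\?*} query=${url#*\?} pair
  local -a pairs kept=()
  IFS='&' read -ra pairs <<< "$query"
  for pair in "${pairs[@]}"; do
    # Compare only the part before "=", so "abc=def" and a bare "abc" both match.
    [[ ${pair%%=*} == "$name" ]] || kept+=("$pair")
  done
  if (( ${#kept[@]} == 0 )); then
    printf '%s\n' "$base"
  else
    # "${kept[*]}" joins elements with the first character of IFS.
    local IFS='&'
    printf '%s?%s\n' "$base" "${kept[*]}"
  fi
}
```

A server would retry the lookup with the shortened URL only when the exact one is missing, so an exact match always wins.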
For these usage scenarios, especially those involving cookies, it's simpler and overall better
to self-host the server of the website you are trying to archive,
or to re-implement it in your favourite programming language and self-host that.

# Format

The SQLite3 database contains a `data` table with the following columns:

- `id` - integer primary key of each archived response
- `method` - string, the request method, default `"GET"`
- `url` - string, the URL formatted as `$scheme://$host:$port$path$query` (e.g. `https://dc09.ru:443/path?key=val`), required
- `code` - integer, the HTTP response status code, default `200`

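When querying the `data` table by hand, the `url` value has to match exactly, including the port even when it is the scheme's default. A minimal bash sketch of that normalization; the helper name `archive_url` is hypothetical, and it assumes `http`/`https` URLs with a plain host (no IPv6 literal, no `user@` part) and that an empty path is stored as `/`:

```bash
# Hypothetical helper: rewrite a URL into the `url` column form,
# $scheme://$host:$port$path$query, filling in the default port.
# The fragment (#...) is dropped, since browsers never send it to the server.
archive_url() {
  local re='^(https?)://([^/:?#]+)(:([0-9]+))?([/?][^#]*)?(#.*)?$'
  if [[ ! $1 =~ $re ]]; then
    echo "unsupported URL: $1" >&2
    return 1
  fi
  local scheme=${BASH_REMATCH[1]} host=${BASH_REMATCH[2]}
  local port=${BASH_REMATCH[4]} rest=${BASH_REMATCH[5]}
  if [[ -z $port ]]; then
    if [[ $scheme == https ]]; then port=443; else port=80; fi
  fi
  # Assumption: "https://a.b" and "https://a.b?x=1" are stored with path "/".
  [[ $rest == /* ]] || rest=/$rest
  printf '%s://%s:%s%s\n' "$scheme" "$host" "$port" "$rest"
}
```

For example, `sqlite3 archive.db "SELECT id, code FROM data WHERE url = '$(archive_url https://dc09.ru/path)'"` looks up an entry; URLs containing single quotes would need escaping first.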
The INSERT query is executed with a `RETURNING id` clause (supported since SQLite 3.35.0), so the ID of the new row comes back from the same statement.

In the file system storage, the addon creates a directory (if it doesn't exist yet) named after the numeric ID returned by SQLite,
writes the raw binary body data, without any modifications, to the `{id}/body` file,
and writes the headers in HTTP/1 format (`name: value\r\n`) to the `{id}/headers` file.

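Because headers are stored in HTTP/1 form and the body byte-for-byte, an archived entry can be turned back into a raw HTTP/1.1 response by prepending a status line and putting an empty line between headers and body. A bash sketch, assuming the status code is looked up in the `code` column and passed in by hand, and that every header line in the file ends with `\r\n`; the function name `raw_response` is hypothetical:

```bash
# Hypothetical helper: print archived entry $1 as a raw HTTP/1.1 response,
# using $2 as the status code (from the `code` column).
raw_response() {
  local id=$1 code=$2 dir=${STORAGE:-storage}/$1
  if [[ ! -f $dir/headers || ! -f $dir/body ]]; then
    echo "entry $id not found in ${STORAGE:-storage}" >&2
    return 1
  fi
  # Status line without a reason phrase; RFC 9112 still requires the space.
  printf 'HTTP/1.1 %s \r\n' "$code"
  cat "$dir/headers"
  # Empty line separating the header section from the body.
  printf '\r\n'
  cat "$dir/body"
}
```

For example, `raw_response 1 200 | less` shows the first archived response as it would appear on the wire.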
The FS storage structure can be represented graphically this way:

```
storage/
|- 1/
|  |- headers
|  |- body
|
|- 2/
|  |- headers
|  |- body
|
|- {id}/
... ... ...
```