Capture all browser HTTP[s] calls to load a web page

Sun, Jan 26, 2020 Read as Markdown

How does one find out what network calls, browser requests to load web pages?

The simple method - download the HTML page, parse the page, find out all the network calls using web parsers like beautifulsoup.

The shortcoming in the method, what about the network calls made by your browser before requesting the web page? For example, firefox makes a call to ocsp.digicert.com to obtain revocation status on digital certificates. The protocol is Online Certificate Status Protocol.

The network tab in the browser dev tool doesn’t display the network call to ocsp.digicert.com.

One of the ways to find out all the requests originating from the machine is by using a proxy.

Proxy

MITMProxy is a python interactive man-in-the-middle proxy for HTTP and HTTPS calls. Installation instructions here.

After installing the proxy, you can run the command mitmweb or mitmdump. Both will run the proxy in the localhost in the port 8080, with mitmweb, you get intuitive web UI to browse the request/response cycle and order of the web requests.

As the name indicate, “man in the middle,” the proxy gives the ability to modify the request and response. To do so, a simple python script with a custom function can do the trick. The script accepts two python functions, request and response. After every request, request and after every response, response method will be called. There are other supported concepts like addons.

To extract the details of the request and response. Here is a small script, https://gitlab.com/snippets/1933443.

After receiving a successful response, the MITM invokes the function response; the function collects the details and dumps the details to the JSON file. uuid4 will ensure a unique file name for every function call.

Even though MITM provides an option to dump the response, it’s in binary format and suited for data analysis.Next is to simulate the browser request.

Selenium

Selenium is one of the tools used by web-developers and testers for an end to end web testing. Selenium helps developers to test the code on the browser and assert the HTML elements on the same page.

To get selenium Python web driver working, one needs to install Java first, followed by geckodriver, and finally Python selenium driver(pip install selenium). Don’t forget to place geckodriver in $PATH.

Code: https://gitlab.com/snippets/1933454

Run the MITM in a terminal, then selenium another terminal, the output directory must be filled with JSON files.

Here is how the directory looks

The sample JSON files output