Merge branch 'main' into astro-site

Luke Harding 2024-05-01 01:34:43 -04:00
commit e700d91e9e
2 changed files with 39 additions and 1 deletions


@@ -1,7 +1,45 @@
# Map the Internet (MTI)
A project that scrapes the internet to create a visualization of it.
## How does it work?
![A diagram depicting how the workers communicate with the server](static/diagram.png?)
The process of mapping the internet is fairly simple:
1. The server assigns a worker a domain/list of paths to index.
2. The worker goes through each page of the domain, respecting robots.txt.
3. The worker makes a list of all `<a>` tags that link to an external domain.
4. The list is sent to the server, which compares each external domain to its database.
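
Steps 2 through 4 can be sketched roughly as follows. This is only an illustration using the Python standard library, not the project's actual worker or server code: the names `LinkCollector`, `external_domains`, `allowed`, and `new_domains` are hypothetical, and the sketch assumes that "external" means any hostname different from the page's own (so subdomains count as external).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def allowed(robots_txt, user_agent, url):
    """Step 2: check a URL against robots.txt text that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def external_domains(page_url, html):
    """Step 3: return the hostnames linked from the page that differ from its own."""
    own = urlsplit(page_url).hostname
    collector = LinkCollector()
    collector.feed(html)
    found = set()
    for href in collector.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        host = urlsplit(urljoin(page_url, href)).hostname
        if host and host != own:
            found.add(host)
    return found


def new_domains(reported, known):
    """Step 4 (server side): keep only domains not already in the database."""
    return set(reported) - set(known)
```

A real worker would also need to fetch pages, handle redirects, and rate-limit itself; those parts are left out because the README does not describe them.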
The process operates in 3 stages:
1. Initial Stage/Exponential Growth - The number of new domains is much greater than the number of existing domains.
2. Saturation Stage - The number of new domains is lower than the number of existing domains.
3. Rescanning Stage - Existing domains are periodically rescanned looking for new links.
## MTI Worker Protocol (MTIWP) Specification
MTIWP is a TCP-based binary application-layer protocol used for communication between the workers and the coordination server.
The packet is made up of the following fields:
- Version (2 Bytes)
- Worker ID (6 Bytes)
- Timestamp (8 Bytes)
- Method (1 Byte)
- Payload Length (2 Bytes)
- Payload (0-65535 Bytes)
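
The field list above adds up to a 19-byte fixed header followed by the payload. A minimal sketch of packing and unpacking such a packet is shown below. The spec does not state byte order or field encodings, so this sketch assumes the fields appear in the listed order, big-endian, with no padding, that Version, Timestamp, and Payload Length are unsigned integers, and that the Worker ID is an opaque 6-byte value. `HEADER`, `pack_packet`, and `unpack_packet` are hypothetical names.

```python
import struct

# Assumed layout (not confirmed by the spec): big-endian, no padding.
# H = version (2), 6s = worker ID (6), Q = timestamp (8),
# B = method (1), H = payload length (2)
HEADER = struct.Struct(">H6sQBH")


def pack_packet(version, worker_id, timestamp, method, payload=b""):
    """Serialize the header fields and payload into one bytes object."""
    if len(worker_id) != 6:
        raise ValueError("worker ID must be exactly 6 bytes")
    if len(payload) > 0xFFFF:
        raise ValueError("payload exceeds 65535 bytes")
    return HEADER.pack(version, worker_id, timestamp, method, len(payload)) + payload


def unpack_packet(data):
    """Parse one packet from bytes, checking for truncation."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    version, worker_id, timestamp, method, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return version, worker_id, timestamp, method, payload
```

Because the payload length is carried in the header, a receiver reading from a TCP stream can read the fixed 19 bytes first and then read exactly that many payload bytes.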
Valid Methods:
- ACK (0x00)
- Ping (0x01)
- Pong (0x02)
- Hello (0x03)
- Index (0x04)
- Cancel (0x05)
- Summary (0x06)
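
One way to represent the method codes in code is an integer enum, so that an unknown byte is rejected at parse time. This is a sketch only: the `Method` class and `parse_method` helper are hypothetical names, and only the numeric values come from the list above.

```python
from enum import IntEnum


class Method(IntEnum):
    """Method codes as listed in the MTIWP specification."""
    ACK = 0x00
    PING = 0x01
    PONG = 0x02
    HELLO = 0x03
    INDEX = 0x04
    CANCEL = 0x05
    SUMMARY = 0x06


def parse_method(byte_value):
    """Convert a raw method byte to a Method, rejecting unknown codes."""
    try:
        return Method(byte_value)
    except ValueError:
        raise ValueError(f"unknown MTIWP method: 0x{byte_value:02x}") from None
```

For example, `parse_method(0x04)` returns `Method.INDEX`, while `parse_method(0x07)` raises a `ValueError`.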
Further Detail Goes Here
## Contact and License
Project Contact: Luke Harding <luke@lukeh990.io>
Licensed Under a GPL v2 License

BIN
static/diagram.png Normal file

Binary file not shown.
