# Map the Internet (MTI)

A project designed to scrape the internet and build a visualization of how its domains link to one another.

## How does it work?

![A diagram depicting how the workers communicate with the server](static/diagram.png?)

The process of mapping the internet is fairly simple:

1. The server assigns a worker a domain/list of paths to index.
2. The worker goes through each page of the domain, respecting robots.txt.
3. The worker makes a list of all `<a>` tags which link to an external domain.
4. The list is sent to the server, which compares each external domain to its database.

The process operates in 3 stages:

1. Initial Stage/Exponential Growth - The number of newly discovered domains is much greater than the number of existing domains.
2. Saturation Stage - The number of newly discovered domains is lower than the number of existing domains.
3. Rescanning Stage - Existing domains are periodically rescanned for new links.

## MTI Worker Protocol (MTIWP) Specification

MTIWP is a TCP-based binary application-layer protocol used for communication between the workers and the coordination server.

Each packet is made up of the following fields:

- Version (2 Bytes)
- Worker ID (6 Bytes)
- Timestamp (8 Bytes)
- Method (1 Byte)
- Payload Length (2 Bytes)
- Payload (0-65535 Bytes)

Valid Methods:

- ACK (0x00)
- Ping (0x01)
- Pong (0x02)
- Hello (0x03)
- Index (0x04)
- Cancel (0x05)
- Summary (0x06)

Further Detail Goes Here
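
The packet layout listed above could be encoded and decoded roughly as follows. This is a minimal sketch, not the project's implementation: the specification does not state byte order, the timestamp unit, or the Worker ID format, so the code assumes big-endian (network) byte order, an unsigned integer timestamp, and an opaque 6-byte Worker ID. The `Packet` class, the `Method` enum, and the `HEADER` constant are hypothetical names introduced only for illustration.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum


class Method(IntEnum):
    """Method codes as listed in the specification."""
    ACK = 0x00
    PING = 0x01
    PONG = 0x02
    HELLO = 0x03
    INDEX = 0x04
    CANCEL = 0x05
    SUMMARY = 0x06


# Assumption: big-endian, fields packed in the listed order with no padding.
# H = version (2), 6s = worker ID (6), Q = timestamp (8),
# B = method (1), H = payload length (2) -> 19 header bytes.
HEADER = struct.Struct(">H6sQBH")
MAX_PAYLOAD = 0xFFFF


@dataclass
class Packet:
    version: int
    worker_id: bytes      # assumed to be 6 opaque bytes
    timestamp: int        # assumed unsigned integer; unit is not specified
    method: Method
    payload: bytes = b""

    def encode(self) -> bytes:
        """Serialize the header followed by the payload."""
        if len(self.worker_id) != 6:
            raise ValueError("worker_id must be exactly 6 bytes")
        if len(self.payload) > MAX_PAYLOAD:
            raise ValueError("payload exceeds 65535 bytes")
        header = HEADER.pack(
            self.version,
            self.worker_id,
            self.timestamp,
            int(self.method),
            len(self.payload),
        )
        return header + self.payload

    @classmethod
    def decode(cls, data: bytes) -> "Packet":
        """Parse one packet from the start of a byte buffer."""
        if len(data) < HEADER.size:
            raise ValueError("truncated header")
        version, worker_id, timestamp, method, length = HEADER.unpack_from(data)
        payload = data[HEADER.size:HEADER.size + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        # Method(...) raises ValueError for codes outside 0x00-0x06.
        return cls(version, worker_id, timestamp, Method(method), payload)
```

Under these assumptions the fixed header is 19 bytes, so a receiver reading from a TCP stream can read 19 bytes first, take the payload length from the last two, and then read exactly that many more bytes before calling `Packet.decode`.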