Compare commits
2 Commits
00861dc717
...
a4ede3e887
Author | SHA1 | Date | |
---|---|---|---|
a4ede3e887 | |||
eb79169903 |
38
README.md
@@ -1,3 +1,39 @@
# map-the-internet
# Map the Internet (MTI)

A project that scrapes the internet to create a visualization of it.

## How does it work?
![A diagram depicting how the workers communicate with the server](static/diagram.png?)

The process of mapping the internet is fairly simple:
1. The server assigns a worker a domain/list of paths to index.
2. The worker goes through each page, respecting robots.txt.
3. The worker makes a list of all `<a>` tags that link to an external domain.
4. The list is sent to the server, which compares each external domain to its database.

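Steps 2 and 3 could be sketched on the worker side roughly as follows. This is only an illustration using the Python standard library, not the project's actual worker: the names `ExternalLinkParser`, `allowed_paths`, `external_domains`, and the `mti-worker` user agent are hypothetical, and "external" is assumed to mean any hostname different from the page's own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.robotparser import RobotFileParser


class ExternalLinkParser(HTMLParser):
    """Collects hostnames of <a href> targets that differ from the page's host."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.own_host = urlsplit(page_url).hostname
        self.external_domains = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL before comparing hosts.
        host = urlsplit(urljoin(self.page_url, href)).hostname
        # Assumption: any different hostname counts as an external domain.
        if host and host != self.own_host:
            self.external_domains.add(host)


def allowed_paths(robots_txt, paths, user_agent="mti-worker"):
    """Keeps only the assigned paths that the given robots.txt text permits."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [path for path in paths if rules.can_fetch(user_agent, path)]


def external_domains(page_url, html):
    """Returns the set of external hostnames linked from one page."""
    parser = ExternalLinkParser(page_url)
    parser.feed(html)
    return parser.external_domains
```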
The process operates in 3 stages:
1. Initial Stage/Exponential Growth - The number of new domains is much greater than the number of existing domains.
2. Saturation Stage - The number of new domains is lower than the number of existing domains.
3. Rescanning Stage - Existing domains are periodically rescanned to look for new links.

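The server-side comparison in step 4 and the first two stages might look something like the sketch below. Everything here is an assumption made for illustration: the in-memory set standing in for the database, the function names, the `growth_factor` cutoff for "much greater", and the `"transition"` label for rounds that match neither definition. The rescanning stage is schedule-driven rather than ratio-driven, so it is not modeled.

```python
def record_report(known_domains, reported_domains):
    """Adds reported domains to the known set and returns those that were new."""
    new_domains = set(reported_domains) - known_domains
    known_domains |= new_domains  # in-place update of the caller's set
    return new_domains


def growth_stage(new_count, existing_count, growth_factor=2):
    """Labels a round of results; growth_factor is an assumed threshold."""
    if new_count > growth_factor * existing_count:
        return "initial"
    if new_count < existing_count:
        return "saturation"
    # Hypothetical label for rounds between the two documented stages.
    return "transition"
```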
## MTI Worker Protocol (MTIWP) Specification
MTIWP is a TCP-based binary application-layer protocol used for communication between the workers and the coordination server.

Each packet is made up of the following fields:
- Version (2 Bytes)
- Worker ID (6 Bytes)
- Timestamp (8 Bytes)
- Method (1 Byte)
- Payload Length (2 Bytes)
- Payload (0-65535 Bytes)

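As a rough illustration of the field list above, a packet could be encoded and decoded with Python's `struct` module. The specification does not state byte order, field order on the wire, or the timestamp's unit, so this sketch assumes big-endian (network) order, fields in the listed order with no padding, and an opaque unsigned 64-bit timestamp. `pack_packet` and `unpack_packet` are hypothetical helper names.

```python
import struct

# Assumed layout: big-endian, fields in listed order, no padding.
# H = version (2), 6s = worker ID (6), Q = timestamp (8),
# B = method (1), H = payload length (2)  -> 19-byte header.
HEADER = struct.Struct(">H6sQBH")


def pack_packet(version, worker_id, timestamp, method, payload=b""):
    """Builds one packet: fixed header followed by the payload bytes."""
    if len(worker_id) != 6:
        raise ValueError("worker_id must be exactly 6 bytes")
    if len(payload) > 0xFFFF:
        raise ValueError("payload exceeds 65535 bytes")
    header = HEADER.pack(version, worker_id, timestamp, method, len(payload))
    return header + payload


def unpack_packet(data):
    """Splits raw bytes into (version, worker_id, timestamp, method, payload)."""
    version, worker_id, timestamp, method, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return version, worker_id, timestamp, method, payload
```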
Valid Methods:
- ACK (0x00)
- Ping (0x01)
- Pong (0x02)
- Hello (0x03)
- Index (0x04)
- Cancel (0x05)
- Summary (0x06)

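One way to represent the method codes above is an `IntEnum`, which keeps the numeric values from the list and makes unknown bytes easy to reject. The `Method` enum and `parse_method` helper are hypothetical names, and returning `None` for unknown values is an assumption; the specification does not yet say how invalid methods should be handled.

```python
from enum import IntEnum


class Method(IntEnum):
    """Method codes as listed in the specification."""
    ACK = 0x00
    PING = 0x01
    PONG = 0x02
    HELLO = 0x03
    INDEX = 0x04
    CANCEL = 0x05
    SUMMARY = 0x06


def parse_method(byte_value):
    """Returns the matching Method, or None for an unrecognized code."""
    try:
        return Method(byte_value)
    except ValueError:
        return None
```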
Further Detail Goes Here
BIN
static/diagram.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 36 KiB |