# Binance P2P Data Collector This tool continuously collects public peer-to-peer (P2P) market advertisements from Binance P2P for Venezuela (VES/USDT), normalizing, validating, and saving them as atomic date-partitioned Parquet files for subsequent exploratory data analysis and arbitrage modeling. ## Project Structure ``` p2p-collector/ ├── collect_p2p.py # Entry point: argument parsing, validation/daemon modes ├── config.yaml # Application configuration (endpoints, delays, validation limits) ├── binance_client.py # HTTP client, pagination logic, retry, and 429 backoff ├── normalizer.py # Converts raw nested API responses into a flat 23-column schema ├── validator.py # Row-level filtering and snapshot-level integrity checks ├── storage.py # Atomic Parquet writes, schema references, and checkpoints ├── scheduler.py # Loop executor, initial start offsets, signal handling ├── alert.py # Write alert marker files on 5 consecutive failures & logger setup ├── utils.py # Time and sleep/jitter helpers ├── requirements.txt # Package dependencies (httpx, pandas, pyarrow, pyyaml) ├── Makefile # Automation targets (setup, test, run, clean) └── tests/ # Suite of unit tests for all components ``` ## Prerequisites - **Python 3.8+** (Developed and tested with Python 3.14) - **Make** (utility for running Makefile targets) ## Installation & Setup Set up the Python virtual environment and install all dependencies: ```bash make setup ``` ## Running the Collector ### Mode 1: Continuous Daemon Mode Runs indefinitely, fetching snapshots according to the configured interval (default: 5 minutes) with a ±10% sleep jitter to prevent pattern recognition. Handles graceful shutdown on SIGINT/SIGTERM. ```bash make run ``` ### Mode 2: One-shot Mode (Test/Debug) Runs exactly one cycle (one BUY snapshot and one SELL snapshot), writes the results to disk, and exits immediately: ```bash make run-once ``` ### Mode 3: Validate-Only Mode Validates existing Parquet files without making any network calls. It prints statistics (row count, min/max prices, payment methods) and checks for critical schema issues: ```bash make validate PATH_TO_VALIDATE=data/raw/buy_ads/year=2026/month=06/day=05/ ``` ## Running Tests Run the test suite to verify the client, normalizer, storage, and validation behaviors: ```bash make test ``` ## Output Directory Structure The data is saved under `./data/` folder inside the project root: ``` data/ ├── raw/ │ ├── buy_ads/ │ │ └── year=YYYY/month=MM/day=DD/ │ │ ├── _schema.parquet # Empty schema reference │ │ └── snapshot_YYYYMMDD_HHMMSS.parquet # Atomic snapshot data │ └── sell_ads/ │ └── year=YYYY/month=MM/day=DD/ │ ├── _schema.parquet │ └── snapshot_YYYYMMDD_HHMMSS.parquet ├── logs/ │ └── collector.log # Rotating logs ├── alerts/ │ └── YYYYMMDD_HHMMSS_5_failures.alert # Alert marker JSON file (only on failures) └── checkpoint.json # Restart resilience marker ``` ## Checkpoint Format A checkpoint file is updated on every successful snapshot, ensuring that restarting the daemon will not query the API until the expected interval has passed: ```json { "last_completed_snapshot": "2026-06-05T13:30:00Z", "last_buy_ad_count": 47, "last_sell_ad_count": 53, "consecutive_failures": 0, "total_snapshots": 284, "first_snapshot": "2026-06-01T00:00:00Z", "version": "1.0" } ```