What this is

A Python-based log parser and weather enricher. It demonstrates how to combine multiple Python libraries in a single, cohesive workflow: CLI argument parsing, logging, CSV processing with pandas, live API integration, defensive error handling, and automated exporting/archiving.

Purpose: skills practice and demonstration. This is a conceptual exercise to learn, integrate, and ship a clean, original script.

Below is a demonstration of the script in action.

Tech stack

  • Python (3.x)
  • pandas – CSV parsing & data enrichment
  • requests – Weather API calls
  • python-dotenv – load WEATHER_API_KEY from .env
  • argparse – CLI interface
  • logging – structured logs: main.log, errors.log, export.log
  • zipfile – package outputs
  • Standard libs: os, sys, json, time, datetime, pathlib

What the script does

  1. Accepts CLI input: python3 main.py logs.csv "Tokyo" --units C
    • csv: path to input log file
    • city: city for live weather lookup
    • --units: C or F (default F)
  2. Reads logs & filters only CRITICAL rows with pandas (sample input after this list).
  3. Pulls live weather (current temp + condition) from a public API using an API key loaded from .env.
  4. Enriches each CRITICAL entry with the weather fields.
  5. Exports the result as timestamped CSV and JSON, then zips both into a single archive.
    Example outputs:
    • critical_with_weather_YYYYMMDD_HHMM.csv
    • critical_with_weather_YYYYMMDD_HHMM.json
    • critical_with_weather_YYYYMMDD_HHMM.zip
  6. Logs the lifecycle (start/end, API attempts, errors, export/zip steps) to dedicated log files.
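
For reference, a minimal logs.csv could look like the following (the level column is what the script filters on; timestamp and message are hypothetical columns):

timestamp,level,message
2024-05-01 09:13:22,INFO,Service started
2024-05-01 09:14:03,CRITICAL,Disk usage at 98%
2024-05-01 09:15:47,WARNING,Slow response from upstream
2024-05-01 09:16:10,CRITICAL,Database connection lost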

Development log (Parts 1–3)

Part 1 — Argument parsing & logging foundation

Goal: establish a solid base so the script accepts user input and reports progress/errors consistently. Here’s what I did:

  • Implemented positional CLI args (csv, city) and optional --units.
  • Validated input file exists; fail fast with a clear message.
  • Built a split logging system (see the sketch after this list):
    • main.log – info/progress
    • errors.log – warnings/errors
    • export.log – export/packaging events (set up here for later phases)
  • Logged script start/end for clear lifecycle markers.
  • Note: I implemented this on my own first, then used AI for polish and troubleshooting (e.g., avoiding multiple basicConfig() pitfalls and separating concerns with dedicated loggers).
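
A minimal sketch of that split-logger setup (the helper function and log formats are my assumptions, not the script's exact code):

import logging

def make_logger(name, filename, level):
    # One dedicated logger per concern avoids the classic pitfall of
    # calling logging.basicConfig() more than once.
    log = logging.getLogger(name)
    log.setLevel(level)
    handler = logging.FileHandler(filename)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    log.addHandler(handler)
    return log

logger = make_logger("main", "main.log", logging.INFO)
errors_logger = make_logger("errors", "errors.log", logging.WARNING)
export_logger = make_logger("export", "export.log", logging.INFO)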

Part 2 — Resilience + enrichment

Focus: Make the script resilient and enrich the filtered data with live weather info.

  • Constructed the Weather API URL using the CLI city and WEATHER_API_KEY from .env.
  • Added retry logic with a short delay for transient request failures.
  • Robust exception handling (sketch after this list) for:
    • missing/invalid CSV
    • empty CSV or bad format
    • request failures/timeouts
    • invalid JSON or missing keys
  • Filtered CRITICAL rows via pandas; handled “no CRITICAL rows” gracefully.
  • Enriched the filtered DF with weather_location, weather_temp, weather_unit, weather_condition.
    • Used .copy() before enrichment to avoid SettingWithCopyWarning.
  • Exported to CSV and JSON (initially with static filenames).
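
A sketch of what the CSV-side handling could look like (continuing from the argument parsing and loggers above; exact messages and exit codes are assumptions):

import sys
import pandas as pd

try:
    df = pd.read_csv(args.csv)
except FileNotFoundError:
    errors_logger.error(f"CSV not found: {args.csv}")
    sys.exit(1)
except pd.errors.EmptyDataError:
    errors_logger.error("CSV is empty.")
    sys.exit(1)
except pd.errors.ParserError:
    errors_logger.error("CSV is malformed.")
    sys.exit(1)

if df.empty:
    # A valid but row-less file parses fine; treat it as "nothing to do".
    logger.info("No rows to process.")
    sys.exit(0)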

Part 3 — Export polish & packaging

Focus: Professionalize the export flow and user experience.

  • Timestamped filenames (YYYYMMDD_HHMM) for CSV/JSON to avoid overwrites and aid traceability (sketch after this list).
  • Zipped exports into a single archive for clean delivery.
  • Used arcname=os.path.basename(...) in zipf.write() so the ZIP contains just file names, not host paths.
  • Prompted the user (yes/no/quit) whether to delete the original CSV/JSON after zipping.
  • Expanded export lifecycle logging (file creation, zip success, cleanup).
  • Ensured clean terminal output and avoided pandas warnings by working on a copied DF.
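
Pulling those pieces together, a minimal sketch of the timestamped export and zip step (critical_rows and export_logger come from earlier in the script; the real code may be structured differently):

import os
import zipfile
from datetime import datetime

stamp = datetime.now().strftime("%Y%m%d_%H%M")
csv_path = f"critical_with_weather_{stamp}.csv"
json_path = f"critical_with_weather_{stamp}.json"
zip_path = f"critical_with_weather_{stamp}.zip"

critical_rows.to_csv(csv_path, index=False)
critical_rows.to_json(json_path, orient="records", indent=2)

with zipfile.ZipFile(zip_path, "w") as zipf:
    for path in (csv_path, json_path):
        # arcname strips host directories so the ZIP holds bare file names.
        zipf.write(path, arcname=os.path.basename(path))

export_logger.info(f"Created {zip_path}")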

Troubleshooting & decisions

1) pandas filter/enrich bug

  • Original attempt (buggy): critical_rows = df[df.get('level') == 'CRITICAL']
    • df.get('level') silently returns None if the column is missing, hiding what should be a clear KeyError; and modifying a filtered view later can trigger SettingWithCopyWarning.
  • Final fix: critical_rows = df[df['level'] == 'CRITICAL'].copy()
    • Use column selection via df['level'] and create a copy before mutation.

2) ZIP contents & “zip slip” learning

  • Original zipping:
    • zipf.write() was called without an arcname, which stores full/relative paths as-is (see the before/after sketch at the end of this section).
  • Reviewed and improved:
    • I reviewed this choice and learned about zip-slip (a path traversal vulnerability during extraction).
    • My script only creates archives; it doesn’t extract them, so it’s not directly vulnerable.
    • Still, using arcname=os.path.basename(...) is a no-downside best practice that keeps archives clean and predictable.
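
A before/after sketch of the change (the path is illustrative):

# Before: stores whatever path string was passed, e.g. "output/critical.csv"
zipf.write("output/critical.csv")

# After: the archive entry is just "critical.csv"
zipf.write("output/critical.csv", arcname=os.path.basename("output/critical.csv"))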

Code walkthroughs (with line-by-line explanations)

Walkthrough A — CLI parsing & early validation

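The snippet under discussion, reconstructed as a sketch from the notes below (the loggers are assumed to be configured earlier in the script, and the exact message wording is mine):

import argparse
import os
import sys

parser = argparse.ArgumentParser(description="Parse logs and enrich CRITICAL rows with live weather.")
parser.add_argument("csv", help="path to input log file")
parser.add_argument("city", help="city for live weather lookup")
parser.add_argument("--units", choices=["C", "F"], default="F")
args = parser.parse_args()

if not os.path.isfile(args.csv):
    errors_logger.error(f"Input CSV not found: {args.csv}")
    print(f"Error: file not found: {args.csv}")
    sys.exit(1)

logger.info(f"CSV found: {args.csv}")
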
Line by line

  • ArgumentParser(...): creates a CLI interface with a helpful description.
  • add_argument("csv"): required positional argument for the input CSV path.
  • add_argument("city"): required positional argument for the target city.
  • add_argument("--units", choices=["C","F"], default="F"): optional flag constrained to C/F.
  • args = parser.parse_args(): parses actual CLI values into args.
  • if not os.path.isfile(args.csv): fail fast if the file doesn’t exist.
  • errors_logger.error(...): record the problem in errors.log.
  • print(...) + sys.exit(1): inform user and exit non-zero.
  • logger.info(...): confirm the CSV is found in main.log.

Why this matters

  • Clear contracts for how to run the script.
  • Early, explicit validation and structured logging improve UX and debuggability.

Walkthrough B — Weather API request with retry + parsing

url = f"https://api.weatherapi.com/v1/current.json?q={args.city}&key={API_KEY}"
for attempt in range(3):
    try:
        response = requests.get(url, timeout=5)
        if response.status_code == 200:
            break
        else:
            logging.warning(f"Attempt {attempt+1}: API returned {response.status_code}")
    except requests.exceptions.RequestException as e:
        logging.warning(f"Attempt {attempt+1}: {e}")
    time.sleep(2)
else:
    print("All attempts failed.")
    errors_logger.error("All API attempts failed.")
    sys.exit(1)

data = response.json()
location = f"{data['location']['name']}, {data['location']['country']}"
temp, unit_symbol = (
    (data["current"]["temp_f"], "°F") if args.units == "F" else (data["current"]["temp_c"], "°C")
)
condition = data["current"]["condition"]["text"]

Line by line

  • Build url using user-provided city and secret API_KEY.
  • for attempt in range(3): bounded retry loop.
  • requests.get(..., timeout=5): prevent hanging; fail fast.
  • if response.status_code == 200: break: success path exits the loop.
  • logging.warning(...): log non-200s and network exceptions.
  • time.sleep(2): small backoff between attempts.
  • else on the for: runs only if the loop never breaks; we bail with log + exit.
  • response.json(): parse JSON payload (exception-handled in the full script).
  • Extract location, choose temp by unit, and get condition.
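
One refinement worth noting for the url line: requests can build the query string itself, which URL-encodes city names containing spaces (a variant, not the script's current code):

response = requests.get(
    "https://api.weatherapi.com/v1/current.json",
    params={"q": args.city, "key": API_KEY},  # requests URL-encodes these values
    timeout=5,
)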

Why this matters

  • Solid network hygiene (timeouts, bounded retries).
  • Clean separation of transport errors vs data parsing.
  • Reliable behavior under flaky networks.
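
Since the line-by-line notes mention that response.json() is exception-handled in the full script, here is a minimal sketch of what that guard could look like (assumed, not the script's exact code):

try:
    data = response.json()
    location = f"{data['location']['name']}, {data['location']['country']}"
    condition = data["current"]["condition"]["text"]
except (ValueError, KeyError) as e:
    # ValueError covers invalid JSON; KeyError covers missing fields.
    errors_logger.error(f"Unexpected API payload: {e}")
    sys.exit(1)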

Walkthrough C — Filter & enrich safely with pandas

df = pd.read_csv(args.csv)
critical_rows = df[df['level'] == 'CRITICAL'].copy()

critical_rows.loc[:, "weather_location"] = location
critical_rows.loc[:, "weather_temp"] = temp
critical_rows.loc[:, "weather_unit"] = unit_symbol
critical_rows.loc[:, "weather_condition"] = condition

Line by line

  • pd.read_csv(...): load the input logs into a DataFrame.
  • df['level'] == 'CRITICAL': boolean mask for the target rows.
  • .copy(): create a new, writable DataFrame to avoid view/assignment issues.
  • .loc[:, "weather_location"] = ...: column-wise assignment (explicit, clear).
  • Repeat for weather_temp, weather_unit, weather_condition.

Why this matters

  • Demonstrates correct pandas filtering and mutation without SettingWithCopyWarning.
  • Keeps the original df untouched while producing a clean enriched subset.

Reflection

Compared to my Nmap Dashboard project, I used AI less here:

  • I wrote most of the code myself, then used AI for polish, debugging, and second opinions.
  • When AI suggested alternatives, I researched and decided deliberately (e.g., .copy() to avoid chained assignment issues, arcname for ZIP hygiene).
  • The process helped me go a layer deeper into Python and pandas; I can now read and explain everything I wrote and most AI suggestions I kept.

How to run

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# create .env with your API key
echo 'WEATHER_API_KEY=your_api_key_here' > .env

# run the script
python3 main.py logs.csv "Tokyo" --units C

Outputs

  • critical_with_weather_YYYYMMDD_HHMM.csv
  • critical_with_weather_YYYYMMDD_HHMM.json
  • critical_with_weather_YYYYMMDD_HHMM.zip

TRY IT OUT AT https://github.com/yairemartinez/Log-Parser-Weather-Enricher

Closing Summary

Working through this capstone gave me confidence in handling end-to-end Python scripting projects from parsing input to enriching data, handling errors, exporting results, and securing the workflow. While the script itself is conceptual, the process taught me transferable skills: how to think about data pipelines, build resilient code, and apply security-minded decision making.

THANKS FOR READING. 🙂