P

Data Extraction Architect

PST.AG

cebu, cebu, Philippines Full-time June 22, 2026

Found Description

Specification-Driven Extraction Engineering:

Design and maintain declarative extraction specifications—using Pydantic models, JSON schemas, or domain-specific languages—that describe exactly which fields to capture, their types, and validation rules.

Implement pipelines that translate these specifications into executable extraction plans, leveraging both classical (Scrapy, Playwright) and AI-augmented (LLM-based semantic parsing) backends.

Build reusable specification libraries for recurring data types (product prices, tariff codes, regulatory texts) to accelerate onboarding of new sources.



Autonomous & Self-Healing Systems:

Deploy self-healing spiders that automatically detect website layout changes and repair themselves using Model Context Protocol (MCP) servers (e.g., Scrapy MCP Server, Playwright MCP).

Integrate semantic extraction (Scrapy-LLM, custom LLM pipelines) to eliminate selector brittleness—spiders rely on field description...

Ready to Apply?

Submit your application for Data Extraction Architect at PST.AG

Apply Now