ESP Optimization — Data Requirements
ESP wells are typically among the highest-value, highest-risk artificial lift assets in a portfolio. Predictive failure detection requires a minimum of 10–15 documented failure events and two years of hourly sensor data before model training. The data requirements below reflect this constraint at each phase.
This phase establishes whether AI-driven ESP optimization is technically viable and commercially justified. Data requirements are high-level: enough to confirm instrument coverage, characterize production risk, and frame the optimization objective before committing to detailed assessment work.
- Field-level production rates (oil, gas, water) — historical trend
- Number of active ESP wells and their operating status
- Reservoir drive mechanism and depletion stage
- High-level production decline profile
- Known production deferrals and ESP downtime history
- ESP inventory — pump make/model, motor rating, number of stages
- VFD presence and frequency operating range per well
- Downhole gauge presence and last known calibration
- Average historical ESP run life across the asset
- SCADA system type and historian availability
- Surface facility capacity limits (separator, export, water handling)
- Current optimization workflow and ESP monitoring practice
- Existing ESP failure tracking and maintenance logs
- IT/OT integration landscape and data access restrictions
This phase is a full audit of the data landscape: every source, its quality level, and its gaps are documented. It determines whether sufficient historical data, particularly failure events and sensor continuity, exists to train reliable production optimization and predictive failure models.
- Per-well production rates — oil, gas, water (daily and hourly)
- Wellhead and flowing tubing head pressure (FTHP) history
- Casing head pressure (CHP) history
- Flowing bottomhole pressure — measured or calculated from intake gauge
- Static bottomhole pressure and reservoir pressure surveys
- Production allocation methodology and well test frequency
- Reservoir depletion history and pressure-production trends
- Oil API gravity and viscosity by well / zone
- GOR and WOR histories
- Bubble point pressure — critical for ESP intake pressure management
- Full PVT analysis — Bo, Rs, µo, µg
- H2S and CO2 content (impacts motor insulation and material selection)
- Produced water chemistry and salinity
- Pump model, stage count, and manufacturer performance curves
- Motor rating (hp, voltage, current), cable size and length
- VFD make/model and full frequency operating range
- Downhole gauge depth, make, last calibration date
- Tubing size, packer depth, and completion schematic
- Wellbore trajectory (MD/TVD) for deviated/horizontal wells
- Setting depth and pump intake pressure design point
- Historian system type (OSIsoft PI, Aspen IP21, Wonderware, etc.)
- Tag inventory — motor current, frequency, intake/discharge pressure, motor temp, vibration
- Data completeness — % tag uptime over last 3 years per well
- Known sensor drift events and calibration records
- Timestamp integrity and timezone consistency
- OT/IT network architecture and data access pathway
- Well test history — frequency, method, last test date per well
- Productivity index (PI) per well
- Skin factor from pressure transient analysis (PTA)
- Inflow performance relationship (IPR) curves
- Reservoir permeability from core or well test
- ESP failure log — date, failure mode, run life, operating conditions at failure
- Failure mode breakdown: mechanical seal, motor burnout, gas lock, scale, sand ingestion
- Pre-failure SCADA signatures available for each event
- Workover and ESP changeout history per well
- Sand, scale, and corrosion event records
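The data completeness item in the list above (% tag uptime per well) can be estimated by counting how many expected hourly slots contain at least one reading. The following is a minimal sketch under stated assumptions: hourly resolution, timestamps already in one consistent timezone, and a hypothetical helper name `tag_uptime_percent` that does not correspond to any historian API.

```python
from datetime import datetime, timedelta


def tag_uptime_percent(timestamps, start, end):
    """Percent of hourly slots in [start, end) holding at least one reading.

    timestamps: iterable of datetime readings for one tag on one well.
    Assumption: all datetimes share one timezone.
    """
    one_hour = timedelta(hours=1)
    total_hours = int((end - start) / one_hour)
    if total_hours <= 0:
        raise ValueError("end must be at least one hour after start")
    seen = set()
    for ts in timestamps:
        if ts < start:
            continue
        # Slot index counted from the start of the audit window.
        slot = int((ts - start) / one_hour)
        if slot < total_hours:
            seen.add(slot)  # duplicate readings in one hour count once
    return 100.0 * len(seen) / total_hours


# Illustrative data only: a 4-hour window with no reading in the third hour.
window_start = datetime(2024, 1, 1)
window_end = window_start + timedelta(hours=4)
readings = [
    datetime(2024, 1, 1, 0, 5),
    datetime(2024, 1, 1, 0, 35),
    datetime(2024, 1, 1, 1, 10),
    datetime(2024, 1, 1, 3, 59),
]
uptime = tag_uptime_percent(readings, window_start, window_end)
```

Running the same calculation per tag and per well over the three-year audit window gives a simple table of where sensor continuity is likely to limit model training.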
Audit findings drive the model architecture choice, data pipeline design, and technology stack. For ESP optimization, the key architectural decisions are: supervised failure prediction vs. unsupervised anomaly detection (driven by failure event count), and VFD setpoint optimization approach based on available pump curve and downhole gauge coverage.
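The first of these decisions, supervised failure prediction versus unsupervised anomaly detection, can be expressed as a simple rule on the documented failure event count. This is a sketch only: the function name is hypothetical, and the default threshold of 10 is an assumption taken from the low end of the 10–15 event minimum stated in the introduction.

```python
def choose_failure_model_approach(failure_event_count, min_events_for_supervised=10):
    """Pick a modelling approach from the number of documented failures.

    The default threshold is an assumption (low end of the 10-15 event
    minimum quoted in the introduction), not a validated cut-off.
    """
    if failure_event_count < 0:
        raise ValueError("failure_event_count cannot be negative")
    if failure_event_count >= min_events_for_supervised:
        return "supervised_failure_prediction"
    return "unsupervised_anomaly_detection"
```

In practice the threshold would likely be reviewed per failure mode, since an asset can have enough events overall but too few of any single mode.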
- Historian API access and query performance benchmarking
- Real-time streaming capability — OPC-UA, REST, or MQTT from SCADA
- Data lake or cloud storage target (Azure, AWS, on-prem)
- ETL requirements and data transformation specifications
- Latency requirements for VFD setpoint recommendation loops
- Cybersecurity and OT segmentation constraints
- Feature set — motor current, frequency, intake pressure, temperature, vibration, production rate
- Label definition — failure event (binary/time-to-failure) and production rate targets
- Train / validation / test split strategy (temporal, not random)
- Handling of missing data and known sensor outage periods
- Time-windowing strategy for the failure prediction horizon (7, 14, or 30 days)
- Cross-well vs. per-well model approach based on equipment homogeneity
- Full pump performance curves from manufacturer (H-Q, efficiency, power)
- Motor current vs. frequency operating envelope for VFD optimization
- Cable resistance and voltage drop calculations at operating temperature
- Nodal analysis model inputs for physics-informed hybrid models
- Bubble point pressure for intake pressure management constraints
- VFD operating envelope — min/max frequency, ramp rate limits
- Motor overload protection thresholds
- Surface facility capacity constraints per stream
- Pump intake pressure floor: intake pressure must stay above bubble point pressure
- Regulatory production caps or injection limits
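The operating constraints at the end of this list (VFD frequency envelope, ramp limits, and the intake pressure floor) can be applied as a final check on any recommended setpoint. The sketch below is illustrative: `VfdLimits`, `constrain_setpoint`, the per-recommendation step limit used as a stand-in for ramp rate limits, and the Hz and psi units are all assumptions, and the predicted intake pressure is taken as given from some upstream model that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VfdLimits:
    """Hypothetical per-well constraint set."""
    min_hz: float
    max_hz: float
    max_step_hz: float       # largest change allowed per recommendation
    bubble_point_psi: float  # intake pressure floor


def constrain_setpoint(current_hz, proposed_hz, predicted_intake_psi, limits):
    """Return a frequency that respects the limits, or hold current_hz.

    predicted_intake_psi is the intake pressure expected at proposed_hz.
    """
    # Hold the current setpoint if the move is predicted to bring intake
    # pressure to or below bubble point.
    if predicted_intake_psi <= limits.bubble_point_psi:
        return current_hz
    # Limit the size of the step.
    step = proposed_hz - current_hz
    step = max(-limits.max_step_hz, min(limits.max_step_hz, step))
    # Clamp to the min/max frequency envelope.
    return max(limits.min_hz, min(limits.max_hz, current_hz + step))


# Illustrative values only.
limits = VfdLimits(min_hz=40.0, max_hz=60.0, max_step_hz=2.0, bubble_point_psi=1200.0)
stepped = constrain_setpoint(50.0, 55.0, 1500.0, limits)
held = constrain_setpoint(50.0, 55.0, 1100.0, limits)
capped = constrain_setpoint(59.5, 65.0, 1500.0, limits)
```

Keeping the constraint check separate from the optimizer means the limits can be updated after an equipment change without retraining the model.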
All data is cleaned, structured, and used to build and validate the AI model. This is the most data-intensive phase — every sensor feature must be engineered, every failure event labeled, and model performance validated against held-out field data before any deployment decisions are made.
- Cleaned time-series — outliers removed, dropout periods flagged
- ESP failure events labeled with full operating conditions at time of failure
- Production uplift labels from historical VFD frequency changes
- VFD setpoint change records correlated with production and motor response
- Synchronized multi-source dataset at consistent timestamp resolution
- Normalization and scaling applied per feature
- Held-out well test data for production rate prediction validation
- Known failure events withheld from training for failure model testing
- Historical optimization decisions for recommendation engine backtesting
- Physics-based pump curve outputs for hybrid model calibration
- Operator log entries correlated with anomaly events
- Latest PVT data — any fluid sample updates since the Assess phase
- Updated reservoir pressure from most recent surveys
- Revised IPR curves following any stimulation or workover
- Updated water cut trends for each well
- Baseline KPIs: current production efficiency, deferral rate, ESP run life
- Model performance thresholds agreed with operations (e.g. accuracy, precision, recall)
- Operator trust metrics — acceptable human override frequency
- Integration test data for SCADA / DCS write-back validation
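Two of the steps above, labeling failure events against a prediction horizon and withholding later data for testing, can be sketched as follows. The function names are hypothetical, the 14-day default is just one of the horizons under consideration, and the split is by time rather than random, in line with the temporal split strategy chosen earlier.

```python
from datetime import datetime, timedelta


def label_within_horizon(sample_times, failure_times, horizon_days=14):
    """Label 1 if a failure occurs within horizon_days after the sample, else 0."""
    horizon = timedelta(days=horizon_days)
    return [
        int(any(timedelta(0) <= f - t <= horizon for f in failure_times))
        for t in sample_times
    ]


def temporal_split(sample_times, train_end, validation_end):
    """Split sample indices by time.

    train: t < train_end
    validation: train_end <= t < validation_end
    test: t >= validation_end
    """
    train, validation, test = [], [], []
    for i, t in enumerate(sample_times):
        if t < train_end:
            train.append(i)
        elif t < validation_end:
            validation.append(i)
        else:
            test.append(i)
    return train, validation, test


# Illustrative data only: samples every 10 days, one failure on day 24.
t0 = datetime(2023, 1, 1)
samples = [t0 + timedelta(days=d) for d in (0, 10, 20, 30, 40)]
failures = [t0 + timedelta(days=24)]
labels = label_within_horizon(samples, failures, horizon_days=14)
splits = temporal_split(samples, t0 + timedelta(days=15), t0 + timedelta(days=35))
```

One caveat with this approach: samples close to a split boundary carry labels that depend on failures occurring after the boundary, so a gap of at least one horizon between the training and validation periods may be needed to avoid leakage.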
The model deploys into the live production environment. Data requirements shift from historical to real-time: live sensor feeds drive inference, VFD setpoint recommendations are generated continuously, and the outcomes of actioned recommendations feed back into the continuous learning loop to improve model performance over time.
- Streaming motor current, frequency, intake pressure, motor temp, vibration — <5 min latency
- Live production rates (or well-test-corrected allocation)
- Wellhead pressure and FTHP per well
- VFD setpoint commands — actioned vs. recommended tracking
- Surface facility real-time constraints (separator level, export pressure)
- Operator override log — when and why VFD recommendations were rejected
- Production response data post-setpoint change (actioned recommendations)
- New failure events labeled as they occur
- Monthly well test updates to refresh production allocation
- Model drift monitoring — prediction error trending over time
- ESP changeout records — new pump model, stages, motor rating, run date
- VFD range updates following equipment changes
- Workover outcomes — new completion parameters and post-workover IPR
- Downhole gauge replacements and new calibration baseline
- Production uplift attributed to AI recommendations (vs. baseline)
- Production deferral avoided through predictive failure alerts
- ESP run life improvement vs. pre-deployment baseline
- Operator adoption rate and recommendation acceptance trending
- Model retraining triggers and schedule
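The drift monitoring and retraining trigger items above can be combined into a simple check: compare the recent mean absolute prediction error with the error agreed at validation. This is a minimal sketch; the function name, the 30-sample window, and the 1.25 tolerance factor are illustrative assumptions rather than recommended values.

```python
from statistics import mean


def drift_detected(abs_errors, baseline_mae, window=30, tolerance=1.25):
    """Flag drift when the recent mean absolute error exceeds baseline * tolerance.

    abs_errors: chronological absolute prediction errors (e.g. one per day).
    window and tolerance are illustrative defaults, not recommended values.
    """
    if baseline_mae <= 0:
        raise ValueError("baseline_mae must be positive")
    if len(abs_errors) < window:
        return False  # not enough recent data to judge
    recent_mae = mean(abs_errors[-window:])
    return recent_mae > baseline_mae * tolerance
```

A positive result would typically prompt a review before retraining, because a rise in error can also come from a sensor fault or an unrecorded equipment change rather than from the model itself.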