Rod Pump Optimization — Data Requirements
Rod pumping is the most widely deployed artificial lift method in U.S. onshore operations and among the most data-rich: dynagraph cards (surface and downhole) carry a diagnostic signature for operating conditions and failure modes ranging from pump-off to rod parting to gas interference. The data requirements below reflect both the value and the complexity of dynagraph-based AI optimization.
This phase establishes whether AI rod pump optimization is technically viable for the asset. The key questions at this stage are the scope of instrumentation coverage (particularly dynagraph card collection), the frequency and severity of rod and pump failures, and the degree to which stroke length and pumping speed settings are actively managed.
- Field-level and per-well oil, gas, water production rates — historical trend
- Number of rod pump wells in scope and their operating status
- Reservoir drive mechanism and depletion stage
- Production decline profile and estimated deferral from pump-off events
- Known high-GOR or high-viscosity wells — flags for gas interference or fluid pound risk
- Beam pump unit inventory — API unit size, gear rating, counterbalance type
- Downhole pump type (tubing pump vs. insert pump), size, and plunger clearance class
- Rod string design — grade, size taper, rod count per well
- VFD or multi-speed drive presence and operating range
- Current stroke length and pumping speed (SPM) settings per well
- Dynagraph card collection capability — surface load cell and position sensor status
- Current pump-off control (POC) system type and settings
- Current optimization workflow — manual, POC-automated, or unmanaged
- Surface facility capacity limits
- IT/OT integration landscape and historian availability
This phase is a full technical audit of the rod pump system and supporting data. It goes deeper than a standard production data review: it audits the dynagraph card library, validates rod string designs, assesses pump fill efficiency history, and determines whether the historian and card data have the quality and volume to support AI models for pump-off control, fillage optimization, and failure prediction.
- Per-well oil, gas, and water production rates — daily and hourly where available
- Wellhead pressure and flowing tubing head pressure (FTHP) history
- Casing head pressure (CHP) history — gas cap and annular gas management
- Flowing bottomhole pressure (measured, or derived from pump intake pressure)
- Static bottomhole pressure and reservoir pressure surveys
- Well test history — frequency, method, last test date per well
- Pump submergence and fluid level history (fluid level shots where available)
- Oil API gravity and dead oil viscosity by well — viscosity critical for pump selection and rod loading
- Solution GOR, producing GOR history, and bubble point pressure
- Full PVT analysis — Bo, Rs, µo (viscosity vs. temperature curve for waxy/heavy oil wells)
- Water cut history and produced water specific gravity
- Free gas at pump intake — calculated from producing GOR and pump intake pressure
- Paraffin and scale deposition history — impacts pump and rod operating envelope
- Rod string design — grade, diameter taper (from surface to pump), rod count per section
- Rod string API design report — polished rod load, torque analysis, safety factors
- Downhole pump — type, bore size, plunger clearance, barrel length, valve type
- Pump setting depth (MD and TVD) and seating nipple specification
- Tubing size, tubing anchor depth, and packer presence
- Surface unit — API unit designation, stroke length, peak torque rating
- Counterbalance type (crank/rotary, beam, or air balanced) and counterbalance effect (CBE) measurements
- VFD or multi-speed drive — frequency range and current speed settings
- Surface dynagraph card library — date, well, load (lbf) and position (in) data pairs
- Card collection frequency — continuous, periodic, or event-triggered
- Downhole dynagraph cards (wave equation derived) where available
- Card labeling history — pump-off, full pump, fluid pound, gas interference, rod part, valve leak labels
- Card resolution — data points per stroke cycle
- Load cell and position sensor calibration records and known drift events
- POC system type and pump-off detection method (card shape vs. peak torque limit)
- Historian system type and tag inventory
- Motor current and power consumption history per well
- Strokes per minute (SPM) history and VFD frequency history
- Polished rod load and peak gear box torque trends
- Data completeness — % tag uptime over last 3 years per well
- Known sensor dropout events and calibration records
- Rod failure log — date, failure mode, depth of failure, operating conditions, dynagraph card at failure
- Pump repair/replacement log — failure mode found at surface: worn plunger, scored barrel, valve failure
- Tubing leak history — associated with rod wear at couplings
- Scale, paraffin, and corrosion event records per well
- Workover history — rod string redesign, pump resizing, tubing changes
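As an illustration of the card library audit described above, the following is a minimal sketch of how a single surface card might be checked for resolution and summarized. The `SurfaceCard` structure, the `audit_card` helper, and the `min_points` threshold of 100 points per stroke are assumptions made for this example, not requirements taken from any POC vendor format. Card area is computed with the shoelace formula and has units of in·lbf, i.e. work per stroke.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SurfaceCard:
    """Hypothetical record for one surface dynagraph card."""
    well_id: str
    timestamp: str                      # ISO 8601 string as exported
    points: List[Tuple[float, float]]   # (position_in, load_lbf) in stroke order


def card_area(points: List[Tuple[float, float]]) -> float:
    """Enclosed card area via the shoelace formula, in in*lbf."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]    # wrap around to close the loop
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def audit_card(card: SurfaceCard, min_points: int = 100) -> Dict[str, object]:
    """Summarize one card; min_points is an assumed resolution threshold."""
    positions = [p[0] for p in card.points]
    loads = [p[1] for p in card.points]
    return {
        "n_points": len(card.points),
        "resolution_ok": len(card.points) >= min_points,
        "stroke_length_in": max(positions) - min(positions),
        "peak_load_lbf": max(loads),
        "min_load_lbf": min(loads),
        "area_in_lbf": card_area(card.points),
    }


# Idealized rectangular card: 100 in stroke, 2,000 to 8,000 lbf
demo = SurfaceCard(
    well_id="WELL-001",
    timestamp="2024-01-01T00:00:00",
    points=[(0.0, 2000.0), (100.0, 2000.0), (100.0, 8000.0), (0.0, 8000.0)],
)
summary = audit_card(demo)
```

Running the same summary across the full card library gives a quick view of which wells have cards dense enough to support classification and which need sensor or collection fixes first.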
Audit findings determine the optimization architecture. Rod pump AI optimization spans three distinct problem types: pump fillage optimization (stroke/speed tuning), pump condition classification (dynagraph card pattern recognition), and predictive rod failure detection. The architecture must address each based on the data available. Model selection also depends on whether VFDs are installed, since VFDs allow speed setpoints to be actioned in real time.
- POC / SCADA system API access and card data export format
- Real-time dynagraph card streaming capability — card capture frequency and format (CSV, binary)
- Historian tag access for SPM, torque, current, and production allocation data
- Data lake target for card image and time-series storage
- Latency requirements for pump-off control loop — POC setpoint write-back pathway
- Cybersecurity and OT segmentation constraints
- Feature set — card shape parameters (peak load, minimum load, area, shape index), SPM, fluid level, water cut
- Label definition — pump condition class (full pump, pump-off, gas interference, fluid pound, valve leak, rod part warning)
- Classification vs. regression split — condition classification and fillage prediction as separate models
- Card preprocessing strategy — normalization to standard load/position envelope per well
- Time-series features from SCADA for rod fatigue accumulation model
- Cross-well transfer learning potential — pump and rod design homogeneity assessment
- Wave equation model inputs — rod string data, fluid properties, pump geometry (for Gibbs or API RP 11L-based simulation)
- Predicted downhole dynagraph cards from wave equation for physics-informed hybrid model
- Rod fatigue accumulation model — rod loading history and API modified Goodman diagram
- Pump fillage efficiency curves — theoretical vs. actual at varying SPM and fluid level
- Motor torque loading envelope — gear box torque limit and counterbalance optimization inputs
- Minimum pump submergence — fluid level floor above pump intake
- VFD operating range — min/max SPM limits and ramp rate constraints
- Peak gear box torque limit — must not be exceeded at any stroke point
- Surface facility capacity constraints per stream
- Rod fatigue life budget — operating constraint to protect rod run life
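The operating constraints above can be sketched as a guard applied to every speed recommendation before it is written back. The example below is a minimal sketch under stated assumptions: the function names, argument names, and the "never speed up while a limit is violated" policy are hypothetical choices for illustration. The rod loading check uses the commonly quoted form of the API modified Goodman relation, S_A = (T/4 + 0.5625 × S_min) × SF, where T is the minimum tensile strength of the rod grade and SF is the service factor; the numeric inputs in the usage lines are illustrative only.

```python
def goodman_allowable_stress(min_tensile_psi, min_stress_psi, service_factor=1.0):
    """Allowable peak rod stress: SA = (T/4 + 0.5625 * Smin) * SF."""
    return (min_tensile_psi / 4.0 + 0.5625 * min_stress_psi) * service_factor


def goodman_loading_pct(max_stress_psi, min_stress_psi, min_tensile_psi,
                        service_factor=1.0):
    """Actual stress range as a percentage of the allowable stress range."""
    allowable = goodman_allowable_stress(min_tensile_psi, min_stress_psi,
                                         service_factor)
    return 100.0 * (max_stress_psi - min_stress_psi) / (allowable - min_stress_psi)


def constrain_spm(recommended, current, spm_min, spm_max, max_step,
                  submergence_ft, min_submergence_ft,
                  peak_torque, torque_limit,
                  rod_loading_pct, rod_loading_limit_pct=100.0):
    """Clamp a recommended SPM to the operating envelope (assumed policy)."""
    violated = (submergence_ft < min_submergence_ft
                or peak_torque > torque_limit
                or rod_loading_pct > rod_loading_limit_pct)
    # Assumed policy: do not increase speed while any protective limit is violated
    target = min(recommended, current) if violated else recommended
    # Ramp rate constraint: limit the change applied in one update
    step = max(-max_step, min(max_step, target - current))
    # VFD operating range
    return max(spm_min, min(spm_max, current + step))


# Illustrative inputs only (115,000 psi tensile, 0.9 service factor)
loading = goodman_loading_pct(max_stress_psi=25000.0, min_stress_psi=10000.0,
                              min_tensile_psi=115000.0, service_factor=0.9)
ok_case = constrain_spm(8.0, 6.0, 4.0, 10.0, 0.5,
                        submergence_ft=300.0, min_submergence_ft=100.0,
                        peak_torque=500.0, torque_limit=640.0,
                        rod_loading_pct=loading)
low_fluid_case = constrain_spm(8.0, 6.0, 4.0, 10.0, 0.5,
                               submergence_ft=50.0, min_submergence_ft=100.0,
                               peak_torque=500.0, torque_limit=640.0,
                               rod_loading_pct=loading)
```

Keeping the constraint layer separate from the model means the limits can be audited and changed by operations without retraining anything.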
In this phase, data is cleaned, cards are labeled, and models are built and validated. Rod pump model validation must go beyond production accuracy metrics: the card classification model must be tested against known failure events, and the pump fillage model must be validated against observed pump-off events and fluid level shot data. Physics consistency is a required validation check.
- Cleaned dynagraph card dataset — normalized load/position, outliers removed, incomplete cards flagged
- Condition-labeled card library — expert-reviewed labels with consensus protocol applied
- Failure-correlated card sequences — cards in the 24–72 hours preceding rod failure events
- SPM, torque, and motor current time-series aligned to card timestamps
- Pump fillage efficiency labels computed from card area and wave equation model
- Fluid level shot data integrated for pump submergence correlation
- Held-out card sequences from known failure events — withheld from training
- Well test data for production rate prediction validation
- Wave equation simulation outputs for hybrid model calibration
- Historical SPM change events with documented production response
- Operator log entries correlated with pump condition events
- Latest PVT data — especially viscosity updates for heavy or waxy oil wells
- Updated reservoir pressure and fluid level surveys
- Revised IPR curves post any stimulation or workover
- Updated water cut trends — affects pump fillage efficiency significantly
- Baseline KPIs: pump-off frequency, average dynagraph fill percentage, rod failure rate, average pump run life
- Card classification accuracy threshold agreed with operations (>90% on full pump / pump-off classes)
- Rod failure recall threshold: the share of failure events preceded by at least one warning is the primary metric, with warning lead time tracked alongside
- POC setpoint integration test — verified write-back to pump controller
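One way to score the held-out failure events is event-level recall: the fraction of rod failures that had at least one model warning in a look-back window before the failure. The sketch below assumes a 72-hour window, borrowed from the failure-correlated card sequences listed above; the `failure_recall` function and its tuple-based inputs are hypothetical and would need to match the actual failure log and warning store.

```python
from datetime import datetime, timedelta


def failure_recall(failures, warnings,
                   min_lead=timedelta(0), max_lead=timedelta(hours=72)):
    """Event-level recall for rod failure warnings.

    failures: list of (well_id, failure_time)
    warnings: list of (well_id, warning_time)
    Returns (recall, lead_times), where lead_times holds the earliest
    qualifying warning lead for each failure that was caught.
    """
    caught = 0
    lead_times = []
    for well, t_fail in failures:
        leads = [
            t_fail - t_warn
            for w, t_warn in warnings
            if w == well and min_lead <= t_fail - t_warn <= max_lead
        ]
        if leads:
            caught += 1
            lead_times.append(max(leads))   # earliest warning inside the window
    recall = caught / len(failures) if failures else 0.0
    return recall, lead_times


# Illustrative data: one failure warned 36 h ahead, one warned far too early
failures = [
    ("W1", datetime(2024, 3, 10, 0, 0)),
    ("W2", datetime(2024, 3, 12, 0, 0)),
]
warnings = [
    ("W1", datetime(2024, 3, 8, 12, 0)),
    ("W2", datetime(2024, 3, 1, 0, 0)),
]
recall, lead_times = failure_recall(failures, warnings)
```

Reporting lead times next to recall shows whether warnings arrive early enough for a crew to act, which a single recall number hides.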
The model deploys into the live production environment. Dynagraph cards are classified in real time as they arrive from the POC system, stroke speed recommendations are generated continuously, and rod fatigue accumulation is tracked per well. New failure events are labeled and fed back into the continuous learning loop. Data governance is critical — rod string redesigns or pump changes that go unlogged will silently degrade model accuracy.
- Streaming dynagraph cards — real-time load and position data from load cell and position sensor
- Live SPM and VFD frequency data per well
- Motor current and gear box torque trending
- Live production rates — test-corrected allocation
- Fluid level shots as available — manual or acoustic fluid level gun readings
- Surface facility real-time constraints
- Operator override log — when and why pump-off control or SPM recommendations were overridden
- New failure events — rod failures, pump failures — labeled with pre-failure card sequences
- Production response data post-SPM change (actioned recommendations)
- Monthly well test updates to refresh production allocation
- Model drift monitoring — card classification accuracy and fillage prediction error trending
- Rod string redesign records — new grade, size taper, and API design report — must be updated within 48 hours
- Pump replacement records — new bore size, plunger class, valve type
- Surface unit changes — stroke length adjustments, counterbalance modifications
- VFD range updates following unit changes
- Workover outcomes — tubing changes, packer depth, completion updates
- Production uplift from pump fillage optimization vs. pre-deployment baseline
- Rod failure events predicted and avoided vs. total events — predictive value tracking
- Pump run life improvement vs. baseline
- Reduction in unnecessary workover interventions triggered by false pump-off signals
- Operator adoption rate and recommendation acceptance trending
- Model retraining triggers — driven by changes in fluid properties, pump design, or failure mode distribution
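The drift monitoring and retraining trigger items above could be implemented in many ways. The following is a minimal sketch of one option: a rolling accuracy check over the most recent expert-confirmed card labels. The `DriftMonitor` class, the window size, and the 0.90 floor (taken from the classification accuracy threshold agreed with operations) are assumptions for illustration, not a prescribed design.

```python
from collections import deque


class DriftMonitor:
    """Rolling accuracy over the most recent confirmed card classifications."""

    def __init__(self, window=500, min_accuracy=0.90):
        self.results = deque(maxlen=window)   # True where prediction matched
        self.min_accuracy = min_accuracy

    def record(self, predicted_label, confirmed_label):
        """Store whether the model label matched the expert-confirmed label."""
        self.results.append(predicted_label == confirmed_label)

    def accuracy(self):
        """Rolling accuracy, or None if nothing has been recorded yet."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_retraining(self):
        """Flag only once the window is full, to avoid noisy early triggers."""
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.accuracy() < self.min_accuracy


# Illustrative run: 8 of the last 10 confirmed cards classified correctly
monitor = DriftMonitor(window=10, min_accuracy=0.90)
for _ in range(8):
    monitor.record("pump-off", "pump-off")
for _ in range(2):
    monitor.record("full pump", "gas interference")
flag = monitor.needs_retraining()
```

A per-well or per-class monitor would follow the same pattern, and the same structure can track fillage prediction error by storing absolute errors instead of match flags.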