Skip to Tutorial Content

Introduction

Welcome to the tutorial on Repairable Systems Analysis! In this tutorial, we will explore models and methods used to analyze systems that are repaired and returned to service after each failure. These methods are widely used in fleet management, industrial maintenance, and infrastructure planning.

Unlike life data analysis, which focuses on the time to first failure of a non-repairable component, repairable systems analysis models the entire history of recurring failures on a single system or fleet. The goal is to characterize the failure process, detect trends, and predict future failure behavior.

Learning Objectives

By the end of this tutorial, you will be able to:

  • Define the distinction between repairable and non-repairable systems.
  • Explain the Homogeneous Poisson Process (HPP) and Non-Homogeneous Poisson Process (NHPP) as failure process models.
  • Describe the Power Law Process and its parameters.
  • Fit and interpret the Crow-AMSAA model using nhpp() from ReliaGrowR.
  • Apply the Piecewise NHPP model to detect changes in failure behavior.
  • Compute fleet-level exposure using exposure() from ReliaGrowR.
  • Estimate the Mean Cumulative Function (MCF) using mcf() from ReliaGrowR.
  • Interpret model parameters to characterize deterioration, stability, or improvement.
  • Use R to predict future failure counts and recurrence rates.

Repairable vs Non-Repairable Systems

Before applying any model, it is essential to classify the system correctly.

A non-repairable system is discarded or replaced with a new, statistically identical unit after each failure. The unit of analysis is the time to first failure, and Weibull analysis applies. Examples include ball bearings, fuses, and light bulbs.

A repairable system is restored to operating condition after each failure and continues accumulating operating time. The unit of analysis is the sequence of inter-failure times or cumulative failure counts over the system’s life. Examples include diesel generators, industrial pumps, and aircraft engines.

Two idealized repair assumptions are common:

  • Perfect repair (“as-good-as-new”): The system is restored to the same condition as a new unit. This is modeled by a renewal process.
  • Minimal repair (“as-bad-as-old”): The system is restored to the same condition it was in just before the failure — the failure is fixed, but nothing else changes. This is modeled by a Non-Homogeneous Poisson Process (NHPP).

Most field maintenance data is best described by the minimal repair assumption, making NHPP the standard framework for repairable systems analysis.

The recurrence rate (intensity function) \(\rho(t)\) describes the instantaneous rate of failure occurrence at time \(t\):

\[\rho(t) = \frac{dE[N(t)]}{dt}\]

where \(E[N(t)]\) is the expected cumulative number of failures by time \(t\).

Failure Processes: HPP and NHPP

Homogeneous Poisson Process (HPP)

The Homogeneous Poisson Process assumes a constant failure rate \(\lambda\) throughout the system’s life:

\[E[N(t)] = \lambda t \qquad \rho(t) = \lambda\]

The mean time between failures is simply \(\text{MTBF} = 1/\lambda\). The HPP is appropriate when failures occur at random, with no trend toward improvement or deterioration.

Non-Homogeneous Poisson Process (NHPP)

The Non-Homogeneous Poisson Process allows the intensity function to change with time:

\[E[N(t)] = \Lambda(t) \qquad \rho(t) = \frac{d\Lambda(t)}{dt}\]

When \(\rho(t)\) increases with time the system is deteriorating; when it decreases the system is improving. The Power Law Process is the most widely used NHPP for repairable systems.

Use the controls below to explore how the shape of the NHPP intensity function changes with the beta parameter, compared to a constant HPP rate.

When \(\beta < 1\) the NHPP intensity decreases over time (improving system). When \(\beta = 1\) the NHPP reduces to the HPP (constant rate). When \(\beta > 1\) the intensity increases (aging or deteriorating system).

The Power Law Process

The Power Law Process is an NHPP whose mean value function (expected cumulative failures) follows a power law:

\[E[N(t)] = \lambda t^{\beta}\]

The corresponding intensity function is:

\[\rho(t) = \lambda \beta t^{\beta - 1}\]

Parameter interpretation:

  • \(\lambda > 0\) — scale parameter; governs the overall magnitude of the failure rate.
  • \(\beta\) — shape parameter:
    • \(\beta < 1\): failure intensity is decreasing (system improving).
    • \(\beta = 1\): constant intensity, equivalent to HPP.
    • \(\beta > 1\): failure intensity is increasing (system deteriorating or aging).

The instantaneous MTBF at time \(t\) is the reciprocal of the intensity:

\[\text{MTBF}(t) = \frac{1}{\rho(t)} = \frac{1}{\lambda \beta t^{\beta - 1}}\]

Given \(n\) failures observed at times \(t_1 < t_2 < \cdots < t_n\) over a total operating time \(T\), the maximum likelihood estimates are:

\[\hat{\beta} = \frac{n}{\displaystyle\sum_{i=1}^{n} \ln(T/t_i)} \qquad \hat{\lambda} = \frac{n}{T^{\hat{\beta}}}\]

Quiz: Failure Processes and the Power Law

NHPP Analysis

The nhpp() function fits the Power Law Process (Crow-AMSAA model) to cumulative failure data by maximum likelihood, returning estimates of \(\lambda\) and \(\beta\). The fitted mean value function is plotted against cumulative time:

\[E[N(t)] = \hat{\lambda}\, t^{\hat{\beta}}\]

A curve that bends downward (concave) on a cumulative failures vs. time plot indicates \(\hat{\beta} < 1\) — the failure rate is falling. A curve that bends upward (convex) indicates \(\hat{\beta} > 1\) — the failure rate is rising.

The nhpp() function accepts:

  • time — a numeric vector of cumulative event times.
  • event — an optional numeric vector of event counts at each time (if NULL, each time is treated as a single event).

The following data represent failure records from a fleet of industrial pumps recorded at 10 inspection intervals.

times    <- c(500, 1200, 2100, 3300, 4800, 6500, 8400, 10500, 12800, 15000)
failures <- c(3, 4, 4, 5, 4, 3, 3, 3, 2, 2)

fit <- nhpp(time = times, event = failures)
plot(fit,
     main = "Crow-AMSAA Model: Industrial Pump Fleet",
     xlab = "Cumulative Operating Hours",
     ylab = "Cumulative Failures")

The fitted curve shape directly reveals the failure trend: a concave curve (\(\hat{\beta} < 1\)) means each successive failure takes longer to arrive — the system is improving.

Quiz: NHPP Analysis

Exercise: Fit a Crow-AMSAA Model

A compressor fleet recorded cumulative operating hours and failure counts below. Use nhpp() with method = "LS" to fit a Power Law model and plot the result.

times    <- c(1000, 2500, 4000, 5500, 8000)
failures <- c(2, 3, 4, 5, 5)
# Pass time and event to nhpp(), then plot the result
# fit <- nhpp(time = times, event = failures, method = "LS")
# plot(fit)
times    <- c(1000, 2500, 4000, 5500, 8000)
failures <- c(2, 3, 4, 5, 5)
fit <- nhpp(time = times, event = failures, method = "LS")
plot(fit, main = "Crow-AMSAA: Compressor Fleet",
     xlab = "Cumulative Hours", ylab = "Cumulative Failures")

Piecewise NHPP

Sometimes a system’s failure behavior changes at a specific point in its operating life, for example after a major overhaul, a design modification, or a change in operating environment. The Piecewise NHPP model fits a separate Power Law to each segment of the time axis, separated by one or more breakpoints \(\tau\):

\[E[N(t)] = \begin{cases} \lambda_1 t^{\beta_1} & 0 < t \leq \tau \\ E[N(\tau)] + \lambda_2 (t - \tau)^{\beta_2} & t > \tau \end{cases}\]

Each segment has its own \(\lambda_i\) and \(\beta_i\), allowing the model to capture, for example, a deteriorating early period followed by improvement after an overhaul.

Pass a known breakpoint time as the breaks argument to nhpp(). The function will fit a separate Power Law to each segment.

# Extended pump data: fleet underwent a major overhaul at hour 8000
times2    <- c(500, 1200, 2100, 3300, 4800, 6500, 8400, 9200, 10100, 11300,
               12800, 14500, 16400, 18500, 20800)
failures2 <- c(3, 4, 4, 5, 6, 6, 5, 3, 3, 2, 2, 2, 2, 1, 1)

fit_pw <- nhpp(time = times2, event = failures2, breaks = 8000, method = "LS")
plot(fit_pw,
     main = "Piecewise NHPP: Fleet Overhaul at Hour 8000",
     xlab = "Cumulative Operating Hours",
     ylab = "Cumulative Failures")

The first segment (before the estimated breakpoint) has a steeper slope (\(\hat{\beta}_1 > 1\), increasing failure rate), while the second segment (post-overhaul) shows a flatter or downward-bending curve, confirming the overhaul was effective in reducing the failure rate.

Note that the breaks argument serves as an initial estimate for the breakpoint location. The segmented algorithm optimizes the breakpoint position from the data, so the fitted breakpoint in the plot may differ slightly from the value passed to breaks. In the example above, the fitted breakpoint is estimated near hour 9,820, close to the known overhaul time of 8,000.

Quiz: Piecewise NHPP

Exposure Analysis

When analyzing a fleet of systems rather than a single system, individual units enter and leave observation at different times. Exposure is the total operating time accumulated across all systems that are currently at risk. It answers the question: “How much system-time has been observed at each point in calendar time?”

The exposure() function computes:

  • Cumulative exposure — total system-time at risk up to each event time.
  • Number of systems at risk — how many units were under observation at each event time.
  • Event rate — cumulative events divided by cumulative exposure, providing a fleet-wide failure rate estimate.

The function accepts per-event records with three columns:

  • id — system identifier.
  • time — cumulative event or end-of-observation time for that system.
  • event — 1 for a failure event, 0 for censoring (end of observation without a failure).

The dataset below records individual failure events for a fleet of five industrial pumps, each observed for 3,000 hours.

pump_data <- data.frame(
  id    = c(1, 1, 1,  2, 2, 2,  3, 3,  4, 4, 4, 4,  5, 5),
  time  = c(310, 850, 1620,  420, 1050, 2100,  580, 1890,  240, 710, 1380, 2400,  530, 1740),
  event = c(1, 1, 1,  1, 1, 1,  1, 1,  1, 1, 1, 1,  1, 1)
)
exp_result <- exposure(id = pump_data$id, time = pump_data$time,
                       event = pump_data$event)
plot(exp_result)

The exposure plot shows how the total observed system-time accumulates across the fleet. The event rate curve reveals whether the fleet-wide failure rate is increasing, stable, or decreasing over time.

Quiz: Exposure Analysis

Mean Cumulative Function

The Mean Cumulative Function (MCF) is a non-parametric estimate of the expected cumulative number of failures per system as a function of time. It is the repairable-systems analogue of the Kaplan-Meier estimator in survival analysis: it makes no distributional assumptions about the failure process and can accommodate systems with different observation periods.

The MCF at time \(t\) is estimated using the Nelson-Aalen estimator:

\[\hat{M}(t) = \sum_{t_j \leq t} \frac{d_j}{n_j}\]

where \(d_j\) is the number of failures at time \(t_j\) and \(n_j\) is the number of systems still under observation at \(t_j\). The MCF provides a visual summary of the recurrence trend without requiring a parametric model.

The mcf() function accepts the same per-event data format as exposure(). Passing end_time ensures that systems remaining under observation beyond their last failure are correctly kept in the risk set.

end_times <- c("1" = 3000, "2" = 3000, "3" = 3000, "4" = 3000, "5" = 3000)

mcf_result <- mcf(id = pump_data$id, time = pump_data$time,
                  event = pump_data$event,
                  end_time = end_times)
plot(mcf_result,
     main = "Mean Cumulative Function: Industrial Pump Fleet",
     xlab = "Operating Hours",
     ylab = "Expected Cumulative Failures per System")

The slope of the MCF curve reveals the recurrence trend: a steepening slope indicates an increasing failure rate; a flattening slope indicates improvement. Confidence bounds (shown by default) reflect uncertainty in the estimate.

Quiz: Mean Cumulative Function

Exercise: Mean Cumulative Function

Three systems were observed with the following failure times (hours). Each system was observed until hour 2000. Compute and plot the MCF using mcf().

id    <- c(rep("1", 3), rep("2", 4), rep("3", 2))
time  <- c(400, 900, 1500,   300, 700, 1200, 1800,   600, 1400)
event <- rep(1, 9)
end_times <- c("1" = 2000, "2" = 2000, "3" = 2000)
# Use mcf() with the id, time, event, and end_time arguments, then plot
# fit <- mcf(id = id, time = time, event = event, end_time = end_times)
# plot(fit)
id    <- c(rep("1", 3), rep("2", 4), rep("3", 2))
time  <- c(400, 900, 1500,   300, 700, 1200, 1800,   600, 1400)
event <- rep(1, 9)
end_times <- c("1" = 2000, "2" = 2000, "3" = 2000)
fit <- mcf(id = id, time = time, event = event, end_time = end_times)
plot(fit, main = "Mean Cumulative Function", xlab = "Time (hours)", ylab = "MCF")

Interpreting Results and Repair Effectiveness

Reading the Beta Parameter

The estimated \(\hat{\beta}\) from any Power Law fit gives immediate operational insight:

\(\hat{\beta}\) Interpretation Typical cause
\(< 1\) Failure rate decreasing Effective maintenance, infant mortality resolved, improving conditions
\(= 1\) Constant failure rate Random external shocks dominate, stable operating regime
\(> 1\) Failure rate increasing Wear-out, accumulating damage, aging components, deteriorating environment

Cumulative vs Instantaneous MTBF

Two distinct MTBF metrics arise in repairable systems analysis:

  • Cumulative MTBF: total operating time divided by total number of failures — a historical average.
  • Instantaneous MTBF: the expected time to the next failure if the system continues from its current state.

\[\text{Instantaneous MTBF}(t) = \frac{1}{\hat{\lambda}\,\hat{\beta}\,t^{\hat{\beta}-1}}\]

For \(\hat{\beta} > 1\) the instantaneous MTBF falls below the cumulative average at late times. The system’s near-term reliability is worse than its history suggests.

Predicting Future Failures

The expected number of additional failures from current time \(T\) to a future time \(T + \Delta t\) is:

\[E[\Delta N] = \hat{\lambda}\,(T + \Delta t)^{\hat{\beta}} - \hat{\lambda}\,T^{\hat{\beta}}\]

Use the controls below to explore how the forecast changes with model parameters and horizon.

Quiz: Interpreting Results

Exercise: Predict Future Failures

A system has been operating for 5,000 cumulative hours. A Power Law fit yielded λ = 0.0005 and β = 1.2. Calculate the expected number of additional failures in the next 1,000 hours (from hour 5,000 to hour 6,000).

lambda    <- 0.0005
beta      <- 1.2
T_current <- 5000
T_future  <- 6000
# E[ΔN] = lambda * T_future^beta - lambda * T_current^beta
lambda    <- 0.0005
beta      <- 1.2
T_current <- 5000
T_future  <- 6000
expected_failures <- lambda * T_future^beta - lambda * T_current^beta
cat("Expected additional failures:", round(expected_failures, 2), "\n")

Summary

Congratulations on completing the Repairable Systems Analysis tutorial! You have learned how to:

  • Distinguish repairable from non-repairable systems and select the appropriate analytical framework.
  • Apply the HPP and NHPP as models for recurrent failure data.
  • Interpret the Power Law Process parameters \(\lambda\) and \(\beta\) to characterize system failure trends.
  • Use nhpp() to fit and plot the Crow-AMSAA model.
  • Use nhpp() with breaks to model systems with structural changes in failure behavior.
  • Use exposure() to compute fleet-level operating time at risk and event rates.
  • Use mcf() to non-parametrically estimate the expected cumulative failures per system.
  • Forecast future failure counts using the fitted Power Law model.
Quiz: Repairable Systems — Review

To Get Help

?nhpp
?exposure
?mcf
help(package = "ReliaGrowR")

Additional Resources

References

  • Aden-Buie G, Schloerke B, Allaire J, Rossell Hayes A (2023). learnr: Interactive Tutorials for R. https://rstudio.github.io/learnr/

  • Ascher H, Feingold H (1984). Repairable Systems Reliability. Marcel Dekker, New York.

  • Crow, Larry H. (1975). Reliability Analysis for Complex, Repairable Systems. Army Material Systems Analysis Activity, 40. https://apps.dtic.mil/sti/citations/ADA020296

  • Govan P (2024). ReliaGrowR: Reliability Growth Analysis. R package. https://cran.r-project.org/package=ReliaGrowR

  • Guo H, Mettas A, Sarakakis G, Niu P (2010). Piecewise NHPP models with maximum likelihood estimation for repairable systems. 2010 Proceedings — Annual Reliability and Maintainability Symposium (RAMS), 1–7. https://doi.org/10.1109/RAMS.2010.5448029

  • Meeker W. Q., Escobar L. A. (1998). Statistical Methods for Reliability Data. Wiley-Interscience, New York.

Repairable Systems Analysis