# Reliability, Confidence, and Sample Size, Part II

In Part I the problem was to find the sample size, n, given failure count, c = 0, confidence level = 1 – P(c = 0), and minimum reliability = (1 – p’). The table giving sample size, n, with failures c = 0, for certain common combinations of confidence level and minimum reliability is reproduced below.

While I would like none of the samples to fail testing, failures do happen. Does that mean testing should stop at the first failure? Are the test results useless? In this part I will flip the script and talk about what value I can extract from test results if I encounter one or more failures in the test sample.

I start with the binomial formula as before
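
Written in the notation of this post, with x as the observed failure count, the standard binomial probability mass function is:

$$
P(x = c) = \binom{n}{c} \, (p')^{c} \, (1 - p')^{\,n - c}
$$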

It gives us the likelihood, P(x = c), of finding exactly c failures in n samples for a particular population failure rate p’. (Note that 1 – P(x ≤ c) is our confidence level, and 1 – p’ = q’ is our desired reliability.)

However, knowing the likelihood of finding exactly c failures in n samples isn’t enough. Different samples of size n from the same population will give different failure counts. If I am okay with c failures in n samples, then I must be okay with fewer than c failures, too! Therefore, I need to know the cumulative likelihood of finding c or fewer failures in n samples, or P(x ≤ c). That likelihood is calculated as the sum of the individual probabilities. For example, if c = 2 samples fail, I calculate P(x ≤ 2) = P(x = 2) + P(x = 1) + P(x = 0).
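
The summation can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names `binom_pmf` and `binom_cdf` are my own, and the values n = 45 and p’ = 0.05 are used only as an example.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k failures in n samples at failure rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(c, n, p):
    """Probability of c or fewer failures: P(x = 0) + ... + P(x = c)."""
    return sum(binom_pmf(k, n, p) for k in range(c + 1))


# Illustrative values (assumed for this sketch): n = 45, c = 2, p' = 0.05
n, c, p = 45, 2, 0.05
p_le_c = binom_cdf(c, n, p)
print(f"P(x <= {c}) = {p_le_c:.4f}")
print(f"confidence = 1 - P(x <= {c}) = {1 - p_le_c:.4f}")
```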

For a particular failure rate p’, I can state with confidence 1 – P(x ≤ c) that the failure rate is no greater than p’ or, equivalently, that the reliability is no less than q’ = (1 – p’).

It is useful to build a plot of P(x ≤ c) versus p’ to understand the relationship between the two for a given sample size n and failure count c. This plot is referred to as the operating characteristic (OC) curve for a particular n and c combination.

For example, given n = 45, and c = 2, my calculations would look like:
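
Written out for n = 45 and c = 2, the cumulative sum of binomial terms takes the following form, where 1, 45, and 990 are the binomial coefficients for 0, 1, and 2 failures in 45 samples:

$$
P(x \le 2) = (1 - p')^{45} + 45 \, p' \, (1 - p')^{44} + 990 \, (p')^{2} \, (1 - p')^{43}
$$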

The table below shows a few values that were calculated:
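
Values of this kind can be recomputed with a short script. The sketch below is only one way to do it; the grid of p’ values (0.01 to 0.20 in steps of 0.01) and the function names are assumptions of mine.

```python
from math import comb


def oc_value(p, n=45, c=2):
    """P(x <= c) for failure rate p, sample size n, and failure count c."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def oc_table(p_values, n=45, c=2):
    """Rows of (p', P(x <= c), confidence = 1 - P(x <= c))."""
    rows = []
    for p in p_values:
        prob = oc_value(p, n, c)
        rows.append((p, prob, 1 - prob))
    return rows


# Assumed grid of failure rates; any grid of interest works the same way.
grid = [i / 100 for i in range(1, 21)]
table = oc_table(grid)

print("   p'   P(x<=2)  confidence")
for p, prob, conf in table:
    print(f"{p:5.2f}  {prob:8.4f}  {conf:10.4f}")
```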

A plot of P(x ≤ 2) versus p’ looks like:

From the plot I can see that the more confidence I require, the higher the failure rate, and so the lower the reliability, I can claim (e.g., 90% confidence with 0.887 reliability, or 95% confidence with 0.868 reliability). Viewed differently, the more reliability I require, the less confidence I have in my estimate (e.g., 0.95 reliability with 40% confidence).
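
Reading values off a plot can be cross-checked numerically. Because P(x ≤ c) falls steadily as p’ rises, a simple bisection can find the p’ that matches a required confidence. This is a sketch under that assumption; the function names and the tolerance are my own choices, and the results should land close to the reliabilities quoted above rather than match them to the last digit.

```python
from math import comb


def cum_prob(p, n, c):
    """P(x <= c) for failure rate p in n samples."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def max_failure_rate(confidence, n, c, tol=1e-9):
    """Find p' such that 1 - P(x <= c) equals the required confidence.

    Uses bisection on [0, 1]; P(x <= c) decreases as p' increases (c < n).
    """
    target = 1 - confidence
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cum_prob(mid, n, c) > target:
            lo = mid  # failure rate too low: cumulative probability still too high
        else:
            hi = mid
    return (lo + hi) / 2


for conf in (0.90, 0.95):
    p = max_failure_rate(conf, n=45, c=2)
    print(f"confidence {conf:.0%}: p' = {p:.4f}, reliability = {1 - p:.4f}")
```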

Which combination of confidence and reliability to use depends on the user’s needs. There is no prescription for choosing one over another.

I may have chosen a sample size of n = 45, expecting c = 0 failures, in order to have 90% confidence at 0.95 reliability in my results. But just because I got c = 2 failures doesn’t mean the effort was for naught. I could plot the OC curve for that combination of n and c to understand how my confidence and reliability have been affected. Maybe there is a combination that is acceptable. Of course, I would need to explain why the new confidence and reliability levels are acceptable if I started with something else.

Operating characteristic curves can be constructed in MS Excel or LibreOffice Calc with the help of the BINOM.DIST(c, n, p’, 1) function, where the final argument of 1 requests the cumulative probability P(x ≤ c).

Once I have values for p’ and P(x ≤ 2), I can create an X-Y graph with X = p’ and Y = P(x ≤ 2).

**Links**

[1] Burr, Irving W. Elementary Statistical Quality Control. New York, NY: Marcel Dekker, Inc. 1979. Print. ISBN 0-8247-6686-5

# Reliability, Confidence, and Sample Size, Part I

Say I have designed a widget that is supposed to survive N cycles of stress S applied at frequency f.

I can demonstrate that the widgets will conform to the performance requirement by manufacturing a set of them and testing them. Such testing, though, runs headlong into the question of sample size. How many widgets should I test?

For starters, however many widgets I choose to test, I would want all of them to survive, i.e., the number of failures, c, in my sample of size n should be zero. (The reason for this has more to do with the psychology of perception than statistics.)

If I get zero failures (c = 0) in 30 samples (n = 30), does that mean I have perfect quality relative to my requirement? No, because the sample failure rate, p = 0/30 or 0%, is a point estimate for the population failure rate, p’. If I took a different sample of 30 widgets from the same population, I may get one, two, or more failures.

The sample failure rate, p, is the probability of failure for a single widget as calculated from test data. It is a statistic. It estimates the population parameter, p’, which is the theoretical probability of failure for a single widget. The probability of failure for a single widget tells us how likely it is to fail the specified test.

If we know the likelihood of a widget failing the test, p’, then we also know the likelihood of it surviving the test, q’ = (1 – p’). The value, q’, is also known as the reliability of the widget. It is the probability that a widget will perform its intended function under stated conditions for the specified interval.

The likelihood of finding c failures in n samples from a stable process with p’ failure rate is given by the binomial formula.

But here I am interested in just the case where I find zero failures in n samples. What is the likelihood of me finding zero failures in n samples for a production process with p’ failure rate?
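
Setting c = 0 in the binomial formula leaves a single term, because the binomial coefficient and the power of p’ both equal 1:

$$
P(0) = \binom{n}{0} \, (p')^{0} \, (1 - p')^{n} = (1 - p')^{n}
$$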

If I know the likelihood of finding zero failures in n samples from a production process with p’ failure rate, then I know the likelihood of finding 1 or more failures in n samples from the production process, too. It is P(c ≥ 1) = 1 – P(0). This is the confidence with which I can say that the failure rate of the production process is no worse than p’.
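
As a quick illustration of this relationship, the sketch below computes the confidence for the earlier example of zero failures in 30 samples. Pairing n = 30 with a failure rate of p’ = 0.05 is my own choice for the example, and the function name is hypothetical.

```python
def confidence_zero_failures(n, p):
    """Confidence that the failure rate is no worse than p, given 0 failures in n."""
    p_zero = (1 - p) ** n  # P(0): likelihood of zero failures in n samples
    return 1 - p_zero      # P(c >= 1) = 1 - P(0)


# Assumed example: 0 failures in n = 30 samples, evaluated at p' = 0.05
conf = confidence_zero_failures(30, 0.05)
print(f"confidence = {conf:.3f}")
```

Under these assumed values the confidence comes out below 90%, which is why 30 samples with zero failures is not enough to claim 90% confidence in 0.95 reliability.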

Usually a lower limit is specified for the reliability of the widget. For example, I might want the widget to survive the test at least 95% of the time or q’ = 0.95. This is the same as saying I want the failure rate to be no more than p’ = 0.05.

I would also want to have high confidence in this minimum reliability (or maximum failure rate). For example, I might require 90% confidence that the minimum reliability of the widget is q’ = 0.95.

A 90% confidence that the reliability is at least 95% is the same as saying that at least 9 out of 10 times I will find one or more failures, c, in my sample, n, if the reliability were less than or equal to 95%. This is also the same as saying that at most 1 out of 10 times I will find zero failures, c, in my sample, n, if the reliability were less than or equal to 95%. At the boundary this, in turn, is the same as saying P(0) = 10% or 0.1 for p’ = 0.05.

With P(0) and p’ defined, I can calculate the sample size, n, that will satisfy these requirements.
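
Solving the zero-failure case of the binomial formula for n gives the sample size directly:

$$
P(0) = (1 - p')^{n} \quad \Longrightarrow \quad n = \frac{\ln P(0)}{\ln (1 - p')}
$$

For the example above, with P(0) = 0.1 and p’ = 0.05, this gives n = ln(0.1) / ln(0.95) ≈ 44.9, which rounds up to 45 samples.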

The formula can be used to calculate the sample size for specific values of minimum reliability and confidence level. However, there are standard minimum reliability and confidence level values used in industry. The table below provides the sample sizes with no failures for some standard values of minimum reliability and confidence level.
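
Sample sizes of this kind can be recomputed with a few lines of Python. This is a sketch, not the source of the original table: the particular confidence and reliability levels listed (90%, 95%, 99% and 0.90, 0.95, 0.99) are my assumption of what counts as standard, and the function name is my own.

```python
from math import ceil, log


def zero_failure_sample_size(confidence, reliability):
    """Smallest n with zero failures that supports the stated confidence.

    Uses n = ln(1 - confidence) / ln(reliability), rounded up to a whole sample.
    """
    return ceil(log(1 - confidence) / log(reliability))


# Assumed levels for illustration only.
confidence_levels = (0.90, 0.95, 0.99)
reliability_levels = (0.90, 0.95, 0.99)

print("confidence  " + "  ".join(f"q'={q:.2f}" for q in reliability_levels))
for conf in confidence_levels:
    sizes = [zero_failure_sample_size(conf, q) for q in reliability_levels]
    print(f"{conf:10.0%}  " + "  ".join(f"{n:7d}" for n in sizes))
```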

What should the reliability of the widget be? That depends on how critical its function is.

What confidence level should you choose? That again depends on how sure you need to be about the reliability of the widget.

*Note: A basic assumption of this method is that the failure rate, p’, is constant for all the widgets being tested. This is only possible if the production process producing these widgets is in control. If this cannot be demonstrated, then this method will not help you establish the minimum reliability for your widget with any degree of confidence.*

**Links**

[1] Burr, Irving W. Elementary Statistical Quality Control. New York, NY: Marcel Dekker, Inc. 1979. Print. ISBN 0-8247-6686-5

# Some Observations and Thoughts on Design Controls

In my role as a quality engineer supporting product design and development at various medical device manufacturers, I got practical experience with each company’s design and development process. As a matter of regulation^{[1]}, each medical device manufacturer has procedures that control the design of their products. Unfortunately, in my experience those procedures are not particularly useful.

I’ve observed that the Quality function at these companies develops and deploys all the procedures that the Quality System regulations require^{[2]}. However, professionals in the Quality function typically don’t have the subject matter expertise in a particular function such as product design and development or manufacturing to develop usable procedures for that function.

Here I share an example product design and development procedure typical of those I have seen deployed:

This type of process, laid out in the order of the text of the regulation, would suggest that product design and development is a sequence of steps executed in series.

At first glance it seems logical and sensible. First you catalog the user needs. Next you convert those user needs into design inputs (i.e. engineering requirements.) You then transform the design inputs through the design process into design outputs (i.e. drawings or prototypes.) Those design outputs are then verified (i.e. inspected and tested) against the design inputs. After that the design is validated by the user in the actual or simulated use environment. And finally, the design is transferred to manufacturing for mass production.

It wrongly suggests, albeit implicitly, that these steps also represent phases of design and development where a review is conducted after each block, and that a single traceability matrix, with columns corresponding to each block, is enough to capture the activity of the total design effort.

I have tried to figure out how this would work for a design involving multiple components that are assembled together, but I cannot find a way. Structuring the product design and development process this way is fatally flawed because it doesn’t model the real nature of products, which are often components and systems embedded within systems. Trying to map the total design effort into this format is like trying to fit a square peg in a round hole, an impossible and ultimately frustrating exercise.

Just because language is linear, in that ideas are expressed one after the other as the regulation does, doesn’t mean that the process being described is linear, too. In fact, the design and development process is most certainly not linear. It is deeply iterative with iterations built within iterations!

The FDA’s “Design Control Guidance for Medical Device Manufacturers”^{[3]} provides an explanation of the iterative nature of the design and development process. The guidance includes a simplified process flow chart, but it does not adequately communicate the complexity that makes up the actual design and development process. The guidance even explicitly says so.

In practice, feedback paths would be required between each phase of the process and previous phases, representing the iterative nature of product development. However, this detail has been omitted from the figure…

The language of the guidance in the above paragraph unfortunately implies that each block of the waterfall design process is a phase. It clarifies this further on where it says:

When the design input has been reviewed and the design input requirements are determined to be acceptable, an iterative process of translating those requirements into a device design begins. The first step is conversion of the requirements into system or high-level specifications. Thus, these specifications are a design output. Upon verification that the high-level specifications conform to the design input requirements, they become the design input for the next step in the design process, and so on.

This basic technique is used throughout the design process. Each design input is converted into a new design output; each output is verified as conforming to its input; and it then becomes the design input for another step in the design process. In this manner, the design input requirements are translated into a device design conforming to those requirements.

While the regulation does not prescribe a method for designing and developing a product, the guidance does point in a particular direction. The best representation I could find that captures the direction in the guidance is this graphic adapted from “The House of Quality” by John Hauser and Don Clausing^{[4]}:

The first “house” shows the “*conversion of the requirements* [Customer attributes] *into system or high-level specifications* [Engineering characteristics]”. The body of the house allows for the verification that “*high-level specifications conform to the design input requirements*”. The engineering characteristics then “*become the design input for the next step in the design process, and so on.*”

It’s obvious from the linked houses and the guidance that verification is not a one-time activity or a single type of activity. It is performed at each step of the design and development process wherever inputs are converted to outputs. Implicit in this point is that the type of verification is unique to the particular step or phase of the design and development process.

Each house may be thought of as a phase of the design and development process. The houses offer natural breaks. The design process of the next phase, converting inputs into outputs, depends on the successful completion of the previous phase, so it is nearly impossible to move too far down the process as gaps will be immediately apparent!

Each house can be considered its own traceability matrix where every design output is tied to one or more design inputs. And because the houses are all linked to one another it is possible to trace an attribute of the manufactured product all the way back to the customer need it helps address.
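
The linked-matrix idea can be sketched as a simple data structure. Everything in the example below is hypothetical: the item names, the three-house chain, and the `trace_back` helper are invented for illustration and are not taken from the guidance or from “The House of Quality”.

```python
# Each house maps a design output to the design inputs it was derived from.
# Every item name below is hypothetical, invented only for illustration.
house_1 = {  # customer attributes -> engineering characteristics
    "opening torque": ["easy to open"],
    "seal leak rate": ["does not leak"],
}
house_2 = {  # engineering characteristics -> parts characteristics
    "thread pitch": ["opening torque"],
    "gasket hardness": ["opening torque", "seal leak rate"],
}
house_3 = {  # parts characteristics -> key process operations
    "molding temperature": ["gasket hardness"],
}
houses = [house_1, house_2, house_3]


def trace_back(item, houses):
    """Trace an output of the last house back to the inputs of the first house."""
    current = {item}
    for house in reversed(houses):
        # Replace each output with the inputs it is tied to in this house.
        # A missing key raises KeyError, which flags a gap in traceability.
        current = {inp for out in current for inp in house[out]}
    return current


needs = trace_back("molding temperature", houses)
print(sorted(needs))
```

Keeping one mapping per house, rather than one wide matrix for the whole project, is what lets each phase be reviewed on its own while the chain as a whole still traces back to the customer need.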

While they may not have a firm conceptual understanding of the design and development process, and thus cannot explain it, I believe most engineers have an instinctual feel for it in practice. But a poorly designed design and development process creates unnecessary and insoluble problems for project teams. The teams I’ve been on have responded to such hurdles by running two parallel processes: one is the practical design and development effort, and the other is the documentation effort, a hidden factory. I don’t think it’s possible to calculate the cost of such waste.

**Links**

[1] 21 CFR Part 820.30 (a) https://www.ecfr.gov/cgi-bin/text-idx?SID=a018454b01dab73d0d1cef9f95be36a9&mc=true&node=pt21.8.820&rgn=div5#se21.8.820_130 Retrieved 2017-07-05

[2] QSR Required Procedures https://shrikale.wordpress.com/2017/05/18/qsr-required-procedures/ Retrieved 2017-07-05

[3] Design Control Guidance For Medical Device Manufacturers https://www.fda.gov/RegulatoryInformation/Guidances/ucm070627.htm Retrieved 2017-07-05

[4] The House of Quality. Harvard Business Review, pages 63-77, Vol 66 No 3, May 1988.

[5] Product Design and Development, 5th Edition. McGraw Hill, 2016.