Coffee shops provide a great opportunity to observe the flow of product from customer order and collection of cash to the delivery of the order and order pickup by the customer.
The coffee shop I sit at has the entrance for its order queue close to the entrance to the store. So, when customers walk into the store they immediately find themselves in queue to place their order. The customers place their order at one of two cash registers that are set side by side along the path of the flow. They then walk further to the end of the line where their order is delivered. In a relative measure, the exit of the queue is much farther from the store entrance than the entrance of the queue.
Recently I made a couple of observations:
1. Some customers order specialty coffee products (e.g. espressos, lattes, etc.), while others order brewed coffee. Specialty coffee products take time to make, while brewed coffee is ready to serve on demand.
Customers who order specialty coffee products move to the end of the line and wait there for their order. Almost all of them wait right at the exit of the queue. During a surge, a cluster of people forms there, essentially blocking the exit.
Customers who order brewed coffee have their coffee delivered to them right at the cash registers. Their order is not delivered at the end of the line. So these customers, almost exclusively, exit the queue through its entrance instead of taking their order, following the line, and exiting at its end. I suspect two contributing factors: the store exit is closer to the entrance of the queue, and the exit of the queue is blocked by the cluster of people waiting to pick up their orders.
2. Many customers, after picking up their order at the end of the line, still do not exit from there. They instead move back through the queue, exit through the queue entrance, and then continue on through the store exit. I suspect that is because there is no direct way to exit the store from the exit of the queue; the customer has to navigate through the seating area.
As I made my observations on how people were behaving, I found myself getting irritated. Why couldn’t these customers, who had a brain and the ability to sense their environment, follow the line from the queue entrance to the queue exit and then out of the store? It’s not hard! Stop creating back-flows! How inconsiderate! So selfish! So oblivious! Ugh! I’m sure my disgust was plainly apparent on my face. I recall my many sanctimonious conversations with friends and colleagues on the thoughtless behavior of people.
Then I experienced an epiphany. My mind, without my conscious awareness, flipped its perspective and answered the question, “What is it about the design of the space that led people to use it in the way they were?” It shifted from blaming the human to accepting human behavior as an uncontrollable factor and addressing the inadequacy of the design of the space that enabled humans to behave in an undesirable way. That released my mind from being stuck and frustrated to feeling creative. With that one realization, my mind started working on redesigning the space.
Still, I wanted to continue observing the activity to understand it a little more deeply. But what happened next caught me by surprise. Even though I had had the epiphany that the design of the space was the problem, and that people were responding to the design, I still found myself getting irritated with them, ascribing their behavior to their conscious decisions. That triggered my second epiphany: unless I consciously focused on the first epiphany, my mind would naturally shift to blaming people for their behavior instead of the design of the space that enables it.
Postscript: Our brain evolved to notice activity in its environment that signaled potential danger: movement, sound, smell, etc. So it is biased to see this foreground, so much so that most times it doesn’t even see the background: the relatively unchanging environment. People and their behavior are always in the foreground. The context for their behavior, the design of the space, is in the background. When we are faced with behavior problems, our mind instinctively focuses on the human rather than the environment. It takes conscious awareness to not do that.
Virtually every component is made to be assembled with its counterpart(s) into sub-assemblies and final assemblies.
If individual pieces of a given component could be made identical to one another, then they would either all conform or all fail to conform to the component’s design requirements. If they conform, then we could pick a piece at random for use in a sub- or final-assembly. It would fit and function without fail, as intended.
But material varies, machine operation varies, the work method varies, workers vary, measurement varies, as does the environment. Such variation, especially in combination, makes it impossible to produce anything identical. Variation is a fundamental principle of nature. All things vary.
Variation affects the fit, the form and the function of a component. And, it is propagated along the assembly line such that the final product is at times a mixed bag of conforming and nonconforming widgets.
Material

Consider 316 Stainless Steel. It is used to make medical devices. Manufacturers buy it from metal producers in sheet stock or bar stock.
If we measured the dimensional attributes of the received stock, e.g. its diameter or length, for several different purchase orders, we would see that they were not identical between orders. They vary. If we measured these attributes for pieces received in a single order, we would see that they were not identical between pieces of that order either. If we measured these attributes for a single piece at different points in its cross-section, we would see that they, too, were not identical. If we then zoomed in to investigate the crystalline structure of the stainless steel, we would see that the crystals were not identical in shape or size.
The elemental composition, in percent by weight, of 316 Stainless Steel is: 0.08% Carbon, 2.00% Manganese, 0.75% Silicon, 16.00-18.00% Chromium, 10.00-14.00% Nickel, 2.00-3.00% Molybdenum, 0.045% Phosphorus, 0.030% Sulfur, 0.10% Nitrogen, with the balance being Iron. We see that the amounts of Chromium, Nickel, and Molybdenum are specified as ranges, and Iron as the balance, i.e. they are expected to vary by design!
These are some of the ways a specific raw material used in the production of medical devices varies. Keep in mind that a medical device isn’t a single component but an assembly of several components likely made of different materials that will vary in just such ways as well. All this variation affects the processing (i.e. machining, cleaning, passivation, etc.) of the material during manufacturing, as well as the device performance in use.
Machine

One piece of equipment used in the production of medical device components is the CNC (Computer Numerical Control) machine. Its condition, as with all production equipment, varies with use.
Take the quality of the lubricating fluid: its properties (e.g. viscosity) change with temperature, affecting its effectiveness. The sharpness of the cutting tool deteriorates with use. A component made with a brand new cutting tool will not be identical to one made with a used cutting tool whose cutting edges have dulled. The cutting is also affected by both the feed-rate and the rotation-rate of the cutting tool, neither of which remains perfectly constant at a particular setting.
What’s more, no two machines perform in identical ways even when they are the same make and model made by the same manufacturer. In use, they will almost never be in the same state as each other, with one being used more or less than the other, and consumables like cutting tools in different states of wear. Such variability will contribute to the variability between the individual pieces of the component.
Method

Unless there is standardized work, we would all do the work in the best way we know how. Each worker will have a best way slightly different from another. Variation in best ways will find its way into the pieces made using them.
These days a production tool like a CNC machine offers customized operation. The user can specify the settings for a large number of operating parameters. Users write “code” or develop a “recipe” that specifies the settings for various operating parameters in order to make a particular component. If several such pieces of code or recipes exist, one different from another, and they are used to make a particular component, they will produce pieces of that component that vary from one to another.
When and how an adjustment is made to control parameters of a tool will affect the degree of variation between one piece and another. Consider the method where a worker makes adjustment(s) after each piece is made to account for its deviation from the target versus one where a worker makes an adjustment only when a process shift is detected. Dr. Deming and Dr. Wheeler have shown that tampering with a stable process, as the first worker does, will serve to increase the variation in the process.
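To see why, here is a minimal simulation sketch (assuming Python; the numbers are illustrative): a stable process is run two ways, once left alone and once with the setting adjusted after every piece to compensate for that piece's deviation from target. The tampered stream ends up with roughly 1.4 times the standard deviation of the untouched one, consistent with the funnel-experiment result.

```python
import random

random.seed(1)

TARGET = 10.0   # aimed-at dimension
SIGMA = 0.1     # common-cause standard deviation of the stable process
N = 100_000     # pieces produced

# Worker A: leaves the stable process alone.
untouched = [random.gauss(TARGET, SIGMA) for _ in range(N)]

# Worker B: after every piece, moves the setting to compensate for that
# piece's deviation from target (tampering with a stable process).
adjusted = []
setting = TARGET
for _ in range(N):
    piece = setting + random.gauss(0.0, SIGMA)
    adjusted.append(piece)
    setting -= piece - TARGET   # "correct" for the last deviation

def stdev(values):
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5

print(f"left alone: {stdev(untouched):.4f}")   # ~0.100
print(f"tampered  : {stdev(adjusted):.4f}")    # ~0.141, about 1.4x larger
```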
All such variation in method will introduce variability into the manufactured pieces.
Man

There are a great many ways in which humans vary physically from one another. Some workers are men, others are women. Some are short, others are tall. Some are young, others are older. Some have short fat fingers, others have long thin fingers. Some have great eyesight, others need vision correction. Some have great hearing, others need hearing aids. Some are right handed, others are left handed. Some are strong, others not so much. Some have great hand-eye coordination, others do not. We all come from diverse ethnic backgrounds.
Not all workers have identical knowledge. Some have multiple degrees, others are high school graduates. Some have long experience doing a job, others are fresh out of school. Some have strong knowledge in a particular subject, others do not. Some have deep experience in a single task, others have shallow experience. Some have broad experience, others have focused experience.
Last, but not least, we all bring varying mindsets to work. Some may be intrinsically motivated, others need to be motivated externally. Some may be optimists, others may be pessimists. Some want to get better every day, others are happy where they are. Some like change, others resist it. Some are data-driven, others use their instinct.
All this variation affects the way a job gets done. The variation is propagated into the work and ultimately manifests itself in the variation of the manufactured component.
Measurement

We consider a measured value as fact, immutable. But that’s not true. Measuring the same attribute repeatedly does not produce identical results between measurements.
Just like production tools, measurement tools wear with use. This affects the measurements made with them over the course of their use.
And also just like production tools, the method (e.g. how a part is oriented, where on the part the measurement is made, etc.) used to make a measurement affects the measured value. There is no true value of any measured attribute. Different measurement methods produce different measurements of the same attribute.
So even if by chance two pieces were made identical we wouldn’t be able to tell because of the variability inherent in the measurement process.
Environment

Certain environmental factors affect all operations regardless of industry. One of them is time. It is not identical from one period to the next. Months in a year are not identical in duration. Seasons in a year are different from one another. Daytime and nighttime are not identical to one another. Weekdays and weekends are not identical to one another.
Even in a climate controlled facility the temperature cycles periodically around a target. It varies between locations as well. Lighting changes over the course of the day. Certain parts of the workplace may be darker than others. Noise, too, changes over the course of the day: quiet early in the morning or into the night, and noisier later into the day. There is variation in the type of noise, as well. Vibration by definition is variation. It can come from a heavy truck rolling down the street or the motor spinning the cutting tool in a production machine. Air movement or circulation isn’t the same under a vent as compared to a spot away from a vent, or when the system is on versus when it is off.
The 5M+E (Material, Machine, Method, Man, Measurement, and Environment) is just one way to categorize sources of variation. The examples in each category are just a few of the many sources of variation that can affect the quality of individual pieces of a component. While we cannot eliminate variation, it is possible to systematically reduce it and achieve greater and greater uniformity in the output of a process. The objective of a business is to match the Voice of the Process (VOP) to the Voice of the Customer (VOC). The modern-day, world-class definition of quality is run-to-target with minimal variation!
“Our approach has been to investigate one by one the causes of various “unnecessaries” in manufacturing operations…”
— Taiichi Ohno describing the development of the Toyota Production System
 Kume, Hitoshi. Statistical Methods for Quality Improvement. Tokyo, Japan: The Association for Overseas Technical Scholarship. 2000. Print. ISBN 4-906224-34-2
 Monden, Yasuhiro. Toyota Production System. Norcross, GA: Industrial Engineering and Management Press. 1983. Print. ISBN 0-89806-034-6
 Wheeler, Donald J. and David S. Chambers. Understanding Statistical Process Control. Knoxville, TN: SPC Press, Inc. 1986. Print. ISBN 0-945320-01-9
In Part I the problem was to find the sample size, n, given failure count, c = 0, confidence level = 1 – P(c = 0), and minimum reliability = (1 – p’). The table giving sample size, n, with failures c = 0, for certain common combinations of confidence level and minimum reliability is reproduced below.
While I would like none of the samples to fail testing, failures do happen. Does that mean testing should stop at the first failure? Are the test results useless? In this part I will flip the script: I will talk about what value I can extract from the test results if I encounter one or more failures in the test sample.
I start with the binomial formula as before
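Written out in the notation used here, the formula is:

P(x = c) = [n! / (c! (n – c)!)] × (p')^c × (1 – p')^(n – c)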
It gives us the likelihood, P(x = c), of finding exactly c failures in n samples for a particular population failure rate p’. (Note that 1 – P(x ≤ c) is our confidence level, and 1 – p’ = q’ is our desired reliability.)
However, knowing the likelihood of finding just c failures in n samples isn’t enough. Different samples of size n from the same population will give different counts of failures c. If I am okay with c failures in n samples, then I must be okay with less than c failures, too! Therefore, I need to know the cumulative likelihood of finding c or less failures in n samples, or P(x ≤ c). That likelihood is calculated as the sum of the individual probabilities. For example, if c = 2 samples fail, I calculate P(x ≤ 2) = P(x = 2) + P(x = 1) + P(x = 0).
For a particular failure rate p’, I can make the statement that my confidence is 1 – P(x ≤ c) that the failure rate is no greater than p’ or alternatively my reliability is no less than q’ = (1 – p’).
It is useful to build a plot of P(x ≤ c) versus p’ to understand the relationship between the two for a given sample size n and failure count c. This plot is referred to as the operating characteristic (OC) curve for a particular n and c combination.
For example, given n = 45, and c = 2, my calculations would look like:
The table below shows a few values that were calculated:
A plot of P(x ≤ 2) versus p' looks like:
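As a sketch, the calculated values and the OC curve can be reproduced with a few lines of Python (assuming numpy, scipy, and matplotlib are available):

```python
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt

n, c = 45, 2

# Spot-check a few candidate failure rates p'.
for p_prime in (0.05, 0.113, 0.132):
    cum = binom.cdf(c, n, p_prime)      # P(x <= 2)
    print(f"p' = {p_prime:.3f}  P(x<=2) = {cum:.3f}  "
          f"confidence = {1 - cum:.0%}  reliability = {1 - p_prime:.3f}")
# Roughly: p' = 0.113 gives ~90% confidence (0.887 reliability),
#          p' = 0.132 gives ~95% confidence (0.868 reliability),
#          p' = 0.050 gives ~40% confidence (0.950 reliability).

# The OC curve: P(x <= 2) plotted against p'.
p_grid = np.linspace(0.001, 0.30, 300)
plt.plot(p_grid, binom.cdf(c, n, p_grid))
plt.xlabel("population failure rate p'")
plt.ylabel("P(x <= 2)")
plt.title("OC curve for n = 45, c = 2")
plt.show()
```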
From the plot I can see that the more confidence I require, the higher the failure rate estimate (or the lower the reliability estimate) will be (e.g. 90% confidence with 0.887 reliability, or 95% confidence with 0.868 reliability). Viewed differently, the more reliability I require, the less confidence I have in my estimate (e.g. 0.95 reliability with 40% confidence).
Which combination of confidence and reliability to use depends on the user’s needs. There is no prescription for choosing one over another.
I may have chosen a sample size of n = 45 for testing, expecting c = 0 failures and 90% confidence at 0.95 reliability in my results. But just because I got c = 2 failures doesn’t mean the effort is for naught. I can plot the OC curve for that combination of n and c to understand how my confidence and reliability have been affected. Maybe there is a combination that is acceptable. Of course, I would need to explain why the new confidence and reliability levels are acceptable if I started with something else.
Once I have values for p' and P(x ≤ 2), I can create an X-Y graph with X = p' and Y = P(x ≤ 2).
 Burr, Irving W. Elementary Statistical Quality Control. New York, NY: Marcel Dekker, Inc. 1979. Print. ISBN 0-8247-6686-5
I can demonstrate that the widgets will conform to the performance requirement by manufacturing a set of them and testing them. Such testing, though, runs headlong into the question of sample size. How many widgets should I test?
For starters, however many widgets I choose to test, I would want all of them to survive i.e. the number of failures, c, in my sample, n, should be zero. (The reason for this has more to do with the psychology of perception than statistics.)
If I get zero failures (c = 0) in 30 samples (n = 30), does that mean I have perfect quality relative to my requirement? No, because the sample failure rate, p = 0/30 or 0%, is a point estimate for the population failure rate, p’. If I took a different sample of 30 widgets from the same population, I may get one, two, or more failures.
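As a quick illustration (a simulation sketch, assuming Python), drawing many samples of 30 from a population whose true failure rate is p' = 0.05 shows how much the observed failure count bounces around:

```python
import random
from collections import Counter

random.seed(7)
P_PRIME = 0.05    # true population failure rate
N = 30            # sample size
TRIALS = 10_000   # number of samples drawn

counts = Counter(
    sum(random.random() < P_PRIME for _ in range(N)) for _ in range(TRIALS)
)
for failures in sorted(counts):
    print(f"{failures} failure(s): {counts[failures] / TRIALS:.1%} of samples")
# About 21% of samples contain zero failures even though p' = 0.05,
# so c = 0 in one sample of 30 says little by itself.
```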
The sample failure rate, p, is the probability of failure for a single widget as calculated from test data. It is a statistic. It estimates the population parameter, p’, which is the theoretical probability of failure for a single widget. The probability of failure for a single widget tells us how likely it is to fail the specified test.
If we know the likelihood of a widget failing the test, p’, then we also know the likelihood of it surviving the test, q’ = (1 – p’). The value, q’, is also known as the reliability of the widget. It is the probability that a widget will perform its intended function under stated conditions for the specified interval.
The likelihood of finding c failures in n samples from a stable process with p’ failure rate is given by the binomial formula.
But here I am interested in just the case where I find zero failures in n samples. What is the likelihood of me finding zero failures in n samples for a production process with p’ failure rate?
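Setting c = 0 in the binomial formula makes both the combinatorial term and the (p')^c term equal to 1, leaving:

P(0) = (1 – p')^n = (q')^n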
If I know the likelihood of finding zero failures in n samples from a production process with p' failure rate, then I also know the likelihood of finding 1 or more failures in n samples from that process. It is P(c ≥ 1) = 1 – P(0). This is the confidence with which I can say that the failure rate of the production process is no worse than p'.
Usually a lower limit is specified for the reliability of the widget. For example, I might want the widget to survive the test at least 95% of the time or q’ = 0.95. This is the same as saying I want the failure rate to be no more than p’ = 0.05.
I would also want to have high confidence in this minimum reliability (or maximum failure rate). For example, I might require 90% confidence that the minimum reliability of the widget is q’ = 0.95.
A 90% confidence that the reliability is at least 95% is the same as saying that if the reliability were actually 95% or less, I would find one or more failures, c, in my sample, n, at least 9 out of 10 times. This is also the same as saying that I would find zero failures in my sample at most 1 out of 10 times. This, in turn, is the same as saying P(0) = 10% or 0.1 for p' = 0.05.
With P(0) and p’ defined, I can calculate the sample size, n, that will satisfy these requirements.
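Since P(0) = (1 – p')^n, taking logarithms and solving for n gives:

n = ln(P(0)) / ln(1 – p')

With P(0) = 0.1 and p' = 0.05, n = ln(0.1) / ln(0.95) ≈ 44.9, which rounds up to a sample size of 45.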
The formula can be used to calculate the sample size for specific values of minimum reliability and confidence level. However, there are standard minimum reliability and confidence level values used in industry. The table below provides the sample sizes with no failures for some standard values of minimum reliability and confidence level.
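As a sketch (assuming Python), those sample sizes can be regenerated from n = ln(1 – confidence level) / ln(minimum reliability), rounded up to the next whole number:

```python
import math

confidence_levels = (0.90, 0.95, 0.99)
min_reliabilities = (0.90, 0.95, 0.99)

print("confidence  min. reliability  n (c = 0)")
for conf in confidence_levels:
    for rel in min_reliabilities:
        n = math.ceil(math.log(1 - conf) / math.log(rel))
        print(f"{conf:>10.0%}  {rel:>16.2f}  {n:>9d}")
# e.g. 90% confidence with 0.95 minimum reliability gives n = 45, as derived above.
```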
What should the reliability of the widget be? That depends on how critical its function is.
What confidence level should you choose? That again depends on how sure you need to be about the reliability of the widget.
Note: A basic assumption of this method is that the failure rate, p’, is constant for all the widgets being tested. This is only possible if the production process producing these widgets is in control. If this cannot be demonstrated, then this method will not help you establish the minimum reliability for your widget with any degree of confidence.
 Burr, Irving W. Elementary Statistical Quality Control. New York, NY: Marcel Dekker, Inc. 1979. Print. ISBN 0-8247-6686-5
In my role as a quality engineer supporting product design and development at various medical device manufacturers I got practical experience with each company’s design and development process. As a matter of regulation, each medical device manufacturer has procedures that control the design of its products. Unfortunately, these procedures are not particularly useful.
I’ve observed that the Quality function at these companies develops and deploys all the procedures that the Quality System regulations require. However, professionals in the Quality function typically don’t have the subject matter expertise in a particular function such as product design and development or manufacturing to develop usable procedures for that function.
Here I share an example product design and development procedure, typical of those I have seen deployed: user needs → design inputs → design process → design outputs → design verification → design validation → design transfer.
This type of process, laid out in the order of the text of the regulation, would suggest that product design and development is a sequence of steps executed in series.
At first glance it seems logical and sensible. First you catalog the user needs. Next you convert those user needs into design inputs (i.e. engineering requirements.) You then transform the design inputs through the design process into design outputs (i.e. drawings or prototypes.) Those design outputs are then verified (i.e. inspected and tested) against the design inputs. After that the design is validated by the user in the actual or simulated use environment. And finally, the design is transferred to manufacturing for mass production.
It wrongly suggests, albeit implicitly, that these steps also represent phases of design and development where a review is conducted after each block, and that a single traceability matrix, with columns corresponding to each block, is enough to capture the activity of the total design effort.
I have tried to figure out how this would work for a design involving multiple components that are assembled together, but I cannot find a way. This design of the product design and development process is fatally flawed because it doesn’t model the real nature of products, which are often components and systems embedded within larger systems. Trying to map the total design effort into this format is like trying to fit a square peg into a round hole, an impossible and ultimately frustrating exercise.
Just because language is linear, in that ideas are expressed one after the other as the regulation does, doesn’t mean that the process being described is linear, too. In fact, the design and development process is most certainly not linear. It is deeply iterative with iterations built within iterations!
The FDA’s “Design Control Guidance for Medical Device Manufacturers” provides an explanation of the iterative nature of the design and development process. The guidance includes a simplified process flow chart, but it does not adequately communicate the complexity that makes up the actual design and development process. The guidance even explicitly says so.
In practice, feedback paths would be required between each phase of the process and previous phases, representing the iterative nature of product development. However, this detail has been omitted from the figure…
The language of the guidance in the above paragraph unfortunately implies that each block of the waterfall design process is a phase. It clarifies this further on where it says:
When the design input has been reviewed and the design input requirements are determined to be acceptable, an iterative process of translating those requirements into a device design begins. The first step is conversion of the requirements into system or high-level specifications. Thus, these specifications are a design output. Upon verification that the high-level specifications conform to the design input requirements, they become the design input for the next step in the design process, and so on.
This basic technique is used throughout the design process. Each design input is converted into a new design output; each output is verified as conforming to its input; and it then becomes the design input for another step in the design process. In this manner, the design input requirements are translated into a device design conforming to those requirements.
While the regulation does not prescribe a method for designing and developing a product, the guidance does point in a particular direction. The best representation I could find that captures the direction in the guidance is this graphic adapted from “The House of Quality” by John Hauser and Don Clausing:
The first “house” shows the “conversion of the requirements [Customer attributes] into system or high-level specifications [Engineering characteristics]”. The body of the house allows for the verification that “high-level specifications conform to the design input requirements”. The engineering characteristics then “become the design input for the next step in the design process, and so on.”
It’s obvious from the linked houses and the guidance that verification is not a one-time or single type of activity. It is performed at each step of the design and development process, wherever inputs are converted to outputs. Implicit in this point is that the type of verification is unique to the particular step or phase of the design and development process.
Each house may be thought of as a phase of the design and development process. The houses offer natural breaks. The design process of the next phase, converting inputs into outputs, depends on the successful completion of the previous phase, so it is nearly impossible to move too far down the process as gaps will be immediately apparent!
Each house can be considered its own traceability matrix where every design output is tied to one or more design inputs. And because the houses are all linked to one another it is possible to trace an attribute of the manufactured product all the way back to the customer need it helps address.
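As a toy illustration (hypothetical names, not from the guidance or the regulation), the linked houses can be modeled as a chain of mappings from each design output back to the design input(s) it satisfies; walking the chain backwards traces a manufactured attribute to the customer need it helps address:

```python
# Each "house" maps a design output to the design input(s) it satisfies.
# The outputs of one house are the inputs of the next. All names are made up.
houses = [
    {"easy to grip": ["one-handed use"]},                                   # customer attribute <- user need
    {"handle diameter 12-14 mm": ["easy to grip"]},                         # engineering characteristic
    {"molded grip per drawing GRP-001": ["handle diameter 12-14 mm"]},      # part design
    {"injection molding, cavity 3": ["molded grip per drawing GRP-001"]},   # process characteristic
]

def trace_back(attribute):
    """Walk the linked houses from a downstream attribute to its origin."""
    chain = [attribute]
    for house in reversed(houses):
        inputs = house.get(chain[-1])
        if inputs:
            chain.append(inputs[0])  # follow the first linked input
    return chain

print(" <- ".join(trace_back("injection molding, cavity 3")))
# injection molding, cavity 3 <- molded grip per drawing GRP-001
#   <- handle diameter 12-14 mm <- easy to grip <- one-handed use
```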
While they may not have a firm conceptual understanding of the design and development process, and thus cannot explain it, I believe most engineers have an instinctual feel for it in practice. But a poorly designed design and development process creates unnecessary and insoluble problems for project teams. The teams I’ve been on have responded to such hurdles by running two parallel processes: one that is the practical design and development effort, and the other is the documentation effort—a hidden factory. I don’t think it’s possible to calculate the cost of such waste.
 21 CFR Part 820.30 (a) https://www.ecfr.gov/cgi-bin/text-idx?SID=a018454b01dab73d0d1cef9f95be36a9&mc=true&node=pt21.8.820&rgn=div5#se21.8.820_130 Retrieved 2017-07-05
 QSR Required Procedures https://shrikale.wordpress.com/2017/05/18/qsr-required-procedures/ Retrieved 2017-07-05
 Design Control Guidance For Medical Device Manufacturers https://www.fda.gov/RegulatoryInformation/Guidances/ucm070627.htm Retrieved 2017-07-05
 Hauser, John and Don Clausing. The House of Quality. Harvard Business Review, Vol 66, No 3, May 1988. Pages 63-77. Print.
 Ulrich, Karl T. and Steven D. Eppinger. Product Design and Development, 5th Edition. McGraw Hill, 2016. Print.
In walking, just walk. In sitting, just sit. Above all, don’t wobble.
The companies I’ve worked for have been neurotic. They dither. When decisions are made they have an irrational and anxious quality about them.
My experience of work can be described as a shuddering paralysis. In an effort to take everything into account, teams I’ve been on enter into an infinite regression of analysis that often takes us off course, delaying action. (I have been guilty of contributing to this.) However, the essence of a business is to act, to do.
When we do act, we don’t just act, but worry about whether that action is the best possible; we complain about all the flaws we find in the method; we even wonder whether the goal is the right goal. So our attention is split, bouncing between acting and thinking. Instead of moving gracefully toward our goal, we wobble. I wobble.
Perhaps Yúnmén wouldn’t mind if I rephrased his quote as “In planning, just plan. In doing, just do. Above all, don’t wobble.”
In the course of an average workday we make hundreds of decisions. Some of those decisions require engaging our conscious awareness. In my previous post I described how the quality of those decisions deteriorates as that awareness, or willpower, fatigues with use.
However, there are decisions where human error occurs with certainty even if our attention is totally focused on the task. Consider the Muller-Lyer illusion below:
The two vertical lines are of the same length. Even after knowing this, we all continue to perceive the line on the left to be longer than the line on the right. The “fact” that the two lines are of different lengths is simply obvious to us. Because of its obviousness we don’t stop to check our judgment before acting on it. Such actions, based on erroneous perception, are likely to produce faulty outcomes.
This error in our human perception/cognition system is hard-wired into our brains. No amount of retraining or conscious effort will correct it. So corrective actions that identify retraining as the way to prevent recurrence of this type of error won’t be effective. It will only serve to demoralize the worker. What, then, is an effective corrective action for such errors?
We can develop and use tools and methods that circumvent the brain’s perception/cognition system, for example by using an overlay (red lines in the figure below), or by actually measuring each line and comparing those values to one another. This does add a step to the evaluation process; it is an after-the-fact fix to a faulty design. Ideally, though, we would want our designs to take human limitations into account and avoid creating such illusions in the first place.
 Muller-Lyer illusion https://en.wikipedia.org/wiki/Muller-Lyer_illusion Retrieved 2017-06-22