Queuing theory is a branch of mathematics that studies and models the act of waiting in lines. The seminal paper on queuing theory  was published in 1909 by Danish mathematician A.K. Erlang; in it, he studied
the problem of determining how many telephone circuits were necessary to provide phone service that would prevent customers from waiting too long for an available circuit. In developing a solution to this problem, he began to realize that the problem of minimizing waiting time was applicable to many fields, and began developing the theory further. Erlang’s switchboard problem laid the path for modern queuing theory .
Queueing theory boils down to answering simple questions:
How likely is it that objects/units/persons will queue up and wait in line?
How long will the line be?
How long will the wait be?
How busy will the system be?
How much capacity is needed to meet an expected level of demand?
Knowing how to think about these kinds of questions will help analysts and stakeholder anticipate bottlenecks. As a result, they will build systems and teams to be more efficient and more scalable, to have higher performance and lower costs, and to ultimately provide better service to their customers and end users.
Queueing theory also allows for the quantitative treatment of bottlenecks and effect on performance. For instance, a question such as “how long will the wait be, on average?” will have an answer, but so will other questions concerning the variability of wait times, the distribution of wait times, and the likelihood that a customer will receive extremely poor service, and so on .
Let us consider a simple example. Suppose a grocery store has a single checkout line and a single cashier. If, on average, one shopper arrives at the line to pay for their groceries every 5 minutes and if scanning, bagging, and paying takes 4.5 minutes, on average, would we expect customers to have to wait in line?
When the problem is presented this way, our intuition says that there should be no waiting in line, and that the cashier should be idle, on average, 30 seconds every 5 minutes, only being busy 90% of the time. No one ever has to wait before being served!
If you have ever been in a grocery store, however, you know that this is not what happens in reality; many shoppers will wait in line, and they will have to wait a long time before being processed.
Fundamentally, queueing happens for three reasons:
irregular arrivals – shoppers do not arrive at the checkout line on a regular schedule; they are sometimes spaced far apart and sometimes close together, so they overlap (an overlap automatically causes queueing and waiting);
irregular job sizes – shoppers do not all get processed in 4.5 minutes; someone shopping for a large family will require much more time than someone shopping only for themselves, for instance (when this happens, overlap is again a problem because new shoppers will arrive and be ready to check out while the existing ones are still in progress), and
waste – lost time can never be regained; shoppers overlap because the second shopper arrived too soon, before the first shopper had the time to finish being served, but looking at it the other way, perhaps it’s not the second shopper’s fault; perhaps the first shopper should have arrived earlier, but they wasted time reading a gossip magazine while the cashier was idle! They missed their chance for quick service and, as a result, made the second shopper have to wait.
Irregular arrival times and job sizes are guaranteed to cause queueing. The only time there is no queueing is when the job sizes are uniform, the arrivals are timed evenly, and there is little enough work for the cashier to keep up with the arrival. Even when the cashier is barely busy, irregular arrivals or arrivals in bursts will cause some queueing.
In general, queueing gets worse when the following is true of the system:
high utilisation – the busier the cashier is, the longer it takes to recover from wasted time;
high variability – the more variability in arrivals or job sizes, the more waste and the more overlap (queueing) occurs, and
insufficient number of servers – fewer cashiers means less capacity to absorb arrival spikes, leading to more wasted time and higher utilisation.
In order to describe queues, we must first know and understand some useful probability distributions, as well as input and output processes.