19.2 Terminology

Queueing theory studies systems and processes in terms of three key concepts:

  • customers are the units of work that the system serves – a customer can be a real person, or it can be whatever the system is supposed to process and complete: a web request, a database query, a part to be milled by a machine, etc.;

  • servers are the objects that do the processing work – a server might be the cashier at the grocery store, a web server, a database server, a milling machine, etc., and

  • queues are where the units of work wait if the server is busy and can not start the work as they arrive – a queue may be a physical line, reside in memory, etc.

Components of a generic queueing system, by [David Hare](https://community.alteryx.com/t5/Engine-Works/Tackling-Queued-Jobs-With-Queueing-Theory-Part-1/ba-p/475036).

Figure 19.1: Components of a generic queueing system, by David Hare.

Useful Distributions

Three distributions play a central role in queueing theory: Poisson, exponential, and Erlang distributions.

Poisson Distribution

The Poisson distribution counts the number of discrete events occurring in a fixed time period; it is closely connected to the exponential distribution, which (among other applications) measures the time between arrivals of the events. The Poisson distribution is a discrete distribution; the random variable can only take non-negative integer values. The exponential distribution can take any (nonnegative) real value.

Consider the problem of determining the probability of \(n\) arrivals being observed during a time interval of length \(t\), where the following assumptions are made:

  • the probability that an arrival is observed during a small time interval (say of length \(\nu\)) is proportional to the length of interval; let the proportionality constant be \(\lambda\), so that the probability is \(\lambda\nu\);

  • the probability of two or more arrivals in a small interval is zero, and

  • the number of arrivals in any time interval is independent of the number in non-overlapping time interval (for example, the number of arrivals occurring between times 5 and 25 does not provide information about the number of arrivals occurring between times 30 and 50).

Let \(P(n;t)\) be the probability of observing \(n\) arrivals in a time interval of length \(t\). Then, for some \(\lambda>0\), \[P_{\lambda}(n; t) = \frac{(\lambda t)^n}{n!} e^{-\lambda t}, \ n=0,1,2,\cdots\] is the probability mass function of the Poisson distribution for the discrete random variable \(n\) – the number of arrivals – for a given length of time interval \(t\) (see Figure 19.2).

Example: on average, 50 customers arrive in a coffee shop every hour. What is the probability that exactly 20 customers will arrive in a 30-minute period, if the arrivals follow a Poisson distribution?

Solution: given \(\lambda=50\) customers per hour, \(t=30 \text{ min}=0.5 \text{ hr}\) and \(n=20\), we have \[P_{50}(20;0.5)=\frac{(50 \cdot 0.5)^{20}}{20!}e^{-50\cdot 0.5}\approx 5.2\%.\] We can evaluate the probability directly in R via

[1] 0.05191747

In a queueing system, such arrivals are referred to as Poisson arrivals. The time between successive arrivals is called the inter-arrival time.

Exponential Distribution

If the number of arrivals in a given time interval follows a Poisson distribution with parameter \(\lambda t\), the inter-arrival times follow an exponential distribution with probability density function \[f_{\lambda}(t) = \lambda e^{-\lambda t}, \ \textrm{for }t>0,\] and the probability \(P(W\leq t)\) that a customer’s waiting time \(W\) is smaller than the length of the time interval \(t\) is \[P(W\leq t) = 1 - e^{-\lambda t}\] (see Figure~\(\ref{fig:qsfig2}\))). We would write that \(W\sim \text{Exp}(\lambda)\).

Poisson (with $\lambda t=2.3$) and exponential distributions (with parameter $\lambda$). The shaded area (right) represents the probability that a customer will wait up to the length of the time interval $t$.Poisson (with $\lambda t=2.3$) and exponential distributions (with parameter $\lambda$). The shaded area (right) represents the probability that a customer will wait up to the length of the time interval $t$.

Figure 19.2: Poisson (with \(\lambda t=2.3\)) and exponential distributions (with parameter \(\lambda\)). The shaded area (right) represents the probability that a customer will wait up to the length of the time interval \(t\).

Example: a manager of a fast food restaurant observes that an average of 9 customers are served by a waiter in a one-hour time period. Assuming that the service time follows an exponential distribution, what is the probability that a customer will be served within 15 minutes?

Solution: let \(w\) be the average waiting time. Given \(\mu=9\) customers per hours, \(t=15 \text{ min}=0.25 \text{ hr}\), we have \[P(w\leq 15 \text{ min})=1-e^{-9 \times 0.25} \approx 89.5\%.\]

We can evaluate the probability directly in R via

[1] 0.8946008

In general, if the arrival rate is stationary, if bulk arrivals (two or more simultaneous arrivals) cannot occur, and if past arrivals do not affect future arrivals, then inter-arrival times follow an exponential distribution with parameter \(\lambda\), and the number of arrivals in any interval of length \(t\) is Poisson with parameter \(\lambda t\).

One of the most attractive features of the exponential distribution relating to inter-arrival times is that it is memoryless – if a random variable \(X\) follows an exponential distribution, then for all non-negative values of \(t\) and \(h\), \[P(X \geq t + h|X \geq t) = P(X \geq h).\] No other density function satisfies this property [377].

The memoryless property of the exponential distribution is important because it implies that the probability distribution of the time until the next arrival is independent of the time since the last arrival. This is clearly not always the case – imagine if that was so when waiting for public transportation!

For instance, if we know that at least \(t\) time units have elapsed since the last arrival, then the distribution of the time \(h\) until the next arrival is independent of \(t\). If \(h=4\), say, then we must have \[P(X>9|X>5)= P(X>7|X>3) = P(X>4).\]

Example: The time \(W\) a customer spends waiting in a bank queue is exponentially distributed with mean \(\lambda=10 \text{ min}\), say. If they’ve already waited 10 minutes, what is the probability that they will have had to wait more than 15 minutes in total, when all is said and done?

Solution: thanks to the memory-less property of the exponential distribution, we have \[P(W>15\mid W>10)=P(W>15-10=5)=\exp(-5/\lambda)=\exp(-1/2)\approx 60.6\%. \]

We can evaluate the probability directly in R via

[1] 0.6065307
Erlang Distribution

The exponential distribution is not always an appropriate model of inter-arrival times, however (perhaps the process should not be memoryless, say). A common alternative is to use the Erlang distribution \(\mathcal{E}(R,k)\), a continuous random variable with rate and shape parameters \(R>0\) and \(k\in \mathbb{Z}^+\), respectively, whose probability density function is \[f_{R,k}(t) = \frac{R (Rt)^{k-1} e^{-Rt}}{(k-1)!},\ t\geq 0.\]

If \(k=1\), the Erlang distribution reduces to an exponential distribution with parameter \(R\). It can further be shown that if \(X\sim \mathcal{E}(R,k)\), where \(R=k\lambda\), then \(X\sim X_{1}+X_{2}+\cdots+X_{k},\) where each \(X_{i}\sim \text{Exp}(R)\) is an independent random variable.

When we model the inter-arrival process as an Erlang distribution \(\mathcal{E}(k\lambda,k)\), we are really saying that it is equivalent to customers going through \(k\) phases (each of which is memoryless) before being served. For this reason, the shape parameter is often referred to as the number of phases of the Erlang distribution [378].

Probability distribution functions for various Erlang random variable [Wikipedia].

Figure 19.3: Probability distribution functions for various Erlang random variable Wikipedia.

19.2.1 Input/Arrival Process

The input process is usually called the arrival process. Arrivals are called customers. In the models under consideration, we assume that arrivals cannot be simultaneous (this might be unrealistic when modeling arrivals at a restaurant, say). If simultaneous arrivals are possible (in theory and/or in practice), we say that bulk arrivals are allowed.

Usually, we assume that the arrival process is unaffected by the number of customers in the system. In the context of a bank, this would imply that whether there are 500 or 5 people at the bank, the process governing arrivals remains unchanged. There are two common situations in which the arrival process may depend on the number of customers present. The first occurs when arrivals are drawn from a small population – the so-called finite source models – if all members of the population are already in the system, there cannot be another arrival!

Another such situation arises when the rate at which customers arrive at the facility decreases when the facility becomes too crowded. For example, when customers see that a restaurant’s parking lot is full, they might very well decide to go to another restaurant or forego eating out altogether. If a customer arrives but fails to enter the system, we say that the customer has balked.

19.2.2 Output/Service Process

To describe the output process (often called the service process) of a queuing system, we usually specify a probability distribution – the service time distribution – which governs the customers’ service time.

In most cases, we assume that the service time distribution is independent of the number of customers present in the system. This implies, for example, that the server does not work faster when more customers are present. We can distinguish two types of servers: in parallel and in series.

Servers are in parallel if they all provide the same type of service and a customer only needs to pass through one of them to complete their service. For example, the tellers in a bank are usually arranged in parallel; typically, customers only need to be serviced by one teller, and any teller can perform the desired service.

Servers are in series if a customer must pass through several servers before their service is complete. An assembly line is an example of such a queuing system. Input and output processes occur in a variety of situations:

  • situation: purchasing Blue Jays tickets at the Rogers Centre
    input: baseball fans arrive at the ticket office
    output: tellers serve the baseball fans

  • situation: pizza parlour
    input: requests for pizza delivery are received
    output: pizza parlour prepares and bakes pizzas, and sends them to be delivered

  • situation: government service centre
    input: citizen/residents enter the service centre
    output: receptionist assigns them to a specific queue based on their needs:

    • input: citizen/residents enter a specific queue based on their needs
      output: public servant addresses their needs
  • situation: hospital blood bank
    input: pints of blood arrive
    output: patients use up pints of blood

  • situation: garage
    input: cars break down and are sent to the garage for repairs
    output: cars are repaired by mechanics and sent back on the streets

The relevant computations are fairly easy to execute, as the following examples demonstrate.

Example: On average, 4.6 customers enter a coffee shop each hour. If the arrivals follow a Poisson process, what is the probability that at most two customers will enter in a 30 minute period?

Solution: since \(30 \text{ min}=0.5 \text{ hr}\), we have \[\begin{aligned} P_{\lambda=4.6}&(n\leq 2;t=0.5)=P_{4.6}(0,0.5)+P_{4.6}(1,0.5)+P_{4.6}(2,0.5) \\ &=e^{-4.6\cdot 0.5}\left[\frac{(4.6\cdot 0.5)^0}{0!}+\frac{(4.6\cdot 0.5)^1}{1!}+\frac{(4.6\cdot 0.5)^2}{2!}\right] \\ &\approx 0.5960; \end{aligned}\] the corresponding Poisson distribution is shown in Figure 19.2.

We can evaluate the probability directly in R via

[1] 0.5960388

Example: in a fast food restaurant, a cashier serves on average 9 customers in a one-hour time period. If the service time follows an exponential distribution, what percentage of customers will be served in 10 minutes or less? After 30 minutes?

Solution: since \(1 \text{ hr}=60 \text{ mins}\), we have \(\mu=9 \text{ customers}/60 \text{ minutes}\), and so \[\begin{aligned} P(W\leq 10/60)&=1-e^{-9 \cdot 10/60} \approx 0.7769\\ P(W>30/60)&=e^{-9\cdot 30/60}\approx 0.0111.\end{aligned}\]

19.2.3 Queue Discipline

To describe a queuing system completely, we must also describe the queue discipline and the manner in which customers join lines. The queue discipline describes the method used to determine the order in which customers are served:

  • the most common queue discipline is the first come, first served (FCFS) discipline, in which customers are served in the order of their arrival, as one would expect to see in an Ottawa coffee shop;

  • under the last come, first served (LCFS) discipline, the most recent arrivals are the first to enter service; for example, if we consider exiting from an elevator to be the service, then a crowded elevator illustrates such a discipline;

  • sometimes the order in which customers arrive has no effect on the order in which they are served; this would be the case if the next customer to enter service is randomly chosen from those customers waiting for service, a situation referred to as service in random order (SIRO) discipline; when callers to an inter-city bus company are put on hold, the luck of the draw often determines which caller will next be serviced by an operator;

  • finally, priority discipline classifies each arrival into one of several categories, each of which is assigned a priority level (a triage process); within each priority level, customers enter the queue on a FCFS basis; such a discipline is often used in emergency rooms to determine the order in which customers receive treatment, and in copying and computer time-sharing facilities, where priority is usually given to jobs with shorter processing times.

19.2.4 Method Used by Arrivals to Join Queue

Another important factor for the behaviour of the queuing system is the method used by customers to determine which line to join. For example, in some banks, customers must join a single line, but in other banks, customers may choose the line they want to join.

When there are several lines, customers often join the shortest line. Unfortunately, in many situations (such as at the supermarket), it is difficult to define the shortest line. If there are several lines at a queuing facility, it is important to know whether or not customers are allowed to switch, or jockey, between lines. In most queuing systems with multiple lines, jockeying is permitted, but jockeying at a custom inspection booth would not be recommended (if it is even allowed), for instance.


S. M. Ross, Introduction to Probability Models, 11th ed. San Diego, CA, USA: Academic Press, 2014.
C. Newell, Applications of Queueing Theory. Springer Netherlands, 2013.