Optimize Restaurant Staff Levels


QUESTION:

A restaurant would like to optimize its staff levels. It wants to ensure it’s not paying for unneeded staff while still having enough staff to serve customers. It also wants to look after its staff by ensuring they’re not overworked. How can we achieve this?

DATA:

Data is from the client’s POS system.

MODEL:

An M/M/c/K queueing model: Markovian (Poisson) arrivals, Markovian (exponential) service times, c parallel servers, and a hard capacity of K customers in the system.
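The key output we’ll use is the server utilization rate. Ignoring customers who are turned away at capacity, utilization is roughly lambda / (c * mu); the queueing package computes the exact capacity-aware figure (its RO output) for us below. A back-of-the-envelope sketch, with a helper name of my own:

  #rough utilization for c parallel servers, ignoring the capacity cap K
  #(customers blocked at capacity pull the true figure lower)
rho_approx <- function(lambda, mu, c) lambda / (c * mu)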

CRITERIA:

  • Staff not underworked (utilization rate of at least 80%)
  • Staff not overworked (utilization rate of at most 90%)
  • Minimum staff level

library(readr)
library(tidyverse)
library(ggplot2)
library(ggthemes)
library(lubridate)
library(queueing)

We can first load the data downloaded directly from the client’s POS system.

  #read in data
original_data <- read.csv('receipt_data.csv', sep = ',', na = c("", "NA"))

  #clean data
data <- data.frame(original_data$Date, original_data$Time) %>% drop_na()

  #recast our data columns as Dates, Times, & Timestamps
colnames(data) <- c('Date', 'Time')
data$Date <- as.Date(data$Date, format = '%m/%d/%y')
data$Timestamp <- as.POSIXct(paste(data$Date, data$Time), format = '%Y-%m-%d %I:%M %p')
data$Month <- month(data$Timestamp)
str(data)
## 'data.frame':    7510 obs. of  4 variables:
##  $ Date     : Date, format: "2019-09-26" "2019-09-26" ...
##  $ Time     : Factor w/ 665 levels "1:00 pm","1:01 pm",..: 557 430 558 527 592 524 557 523 496 473 ...
##  $ Timestamp: POSIXct, format: "2019-09-26 20:07:00" "2019-09-26 18:00:00" ...
##  $ Month    : num  9 9 9 9 9 9 9 9 9 9 ...
head(data)
##          Date    Time           Timestamp Month
## 1  2019-09-26 8:07 pm 2019-09-26 20:07:00     9
## 4  2019-09-26 6:00 pm 2019-09-26 18:00:00     9
## 5  2019-09-26 8:08 pm 2019-09-26 20:08:00     9
## 11 2019-09-26 7:37 pm 2019-09-26 19:37:00     9
## 20 2019-09-26 8:42 pm 2019-09-26 20:42:00     9
## 21 2019-09-26 7:34 pm 2019-09-26 19:34:00     9

The data has been read in and cleaned, and we’ve added some columns to help with our analysis. We can see we have 7,510 receipts to analyze.
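Before moving on, it’s also worth confirming that every timestamp parsed cleanly, since a time that doesn’t match the '%I:%M %p' format silently becomes NA. A minimal check:

  #any receipts whose timestamp failed to parse?
sum(is.na(data$Timestamp))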

Now let’s plot our distribution of receipts to see if anything is abnormal.

ggplot(data = data) + geom_bar(mapping = aes(x = Month)) + theme_economist_white() + ggtitle('Receipts per Month')

It appears that the first and last months of our data may be incomplete, since their frequencies are much lower than the other months’. Let’s check.

summary(data$Date)
##         Min.      1st Qu.       Median         Mean      3rd Qu. 
## "2019-03-28" "2019-05-13" "2019-06-28" "2019-06-26" "2019-08-07" 
##         Max. 
## "2019-09-26"

We were correct: our data starts near the end of March and ends slightly before the last day of September. We’ll have to remove both partial months to prevent skews in our analysis.

  #drop the partial months (March and September)
data <- filter(data, Month > 3, Month < 9)

summary(data$Date)
##         Min.      1st Qu.       Median         Mean      3rd Qu. 
## "2019-04-01" "2019-05-10" "2019-06-18" "2019-06-16" "2019-07-23" 
##         Max. 
## "2019-08-31"
ggplot(data = data) + geom_bar(mapping = aes(x = Month)) + theme_economist_white() + ggtitle('Receipts per Month')

Looks much better. We now have roughly 1,100 to 1,500 receipts each month.
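We can verify those monthly counts directly rather than reading them off the chart:

  #receipts per month after trimming the partial months
data %>% count(Month)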

Now let’s determine if our customers begin eating lunch and dinner at the same time each month. It may be that these change throughout the year. If there are significant changes, we may need to model each period separately.

data$Hour <- hour(data$Timestamp)

  #group data by month and hour
hour_groupedby_month <- data %>% group_by(Month, Hour) %>% summarize(count_of_receipts = n())

ggplot(data = data) +
  geom_bar(mapping = aes(x = Hour)) +
  facet_wrap(~Month, nrow = 5, ncol = 1) +
  theme_economist_white() +
  ggtitle('Receipts per Month per Hour')

Perfect. Above, we can see that lunch always runs over the same hours, 11am – 1pm (which makes sense, since these are normal lunch times). Dinner likewise tends to start at the same time each month (4pm) and generally ends around 8pm, though in July it runs a little later, until 9pm; a quick check of this follows below. We can likely get away with a single model, but we may want to go back and run a separate model for July if the client sees that difference as significant.
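To put a number on the July observation, here’s a quick check (a sketch the analysis doesn’t depend on) of how many receipts fall in the 9pm hour each month:

  #how many receipts print during the 9 pm hour in each month?
data %>% filter(Hour == 21) %>% count(Month)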

We now know our times:

  1. Lunch: 11:00 – 13:00
  2. Dinner: 16:00 – 20:00

Let’s create a designator so we can easily pick out receipts in these two periods. We can ignore all others.

  #keep only receipts in the lunch (11am–2pm) and dinner (4pm–8pm) windows
data <- data %>% filter((Hour >= 11 & Hour <= 14) | (Hour >= 16 & Hour <= 20))
hist(data$Hour)
data$Meal <- ifelse(data$Hour >= 11 & data$Hour <= 14, 'lunch', 'dinner')

To use a queueing model, we’re interested in how often customers arrive and how long they typically spend in the system (the restaurant). We don’t have direct access to arrival times, but we can use the timestamp of each receipt as a proxy. Given that the time to eat is most likely normally distributed around a stable mean, we should find much the same distribution in the arrival times as in the departure (receipt) times.
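To make the proxy argument concrete, here is a minimal simulation sketch (all values hypothetical): generate Poisson arrivals roughly 8 minutes apart, add a normally distributed meal length, and compare the average gaps between arrivals and between departures.

  #hypothetical check: departure gaps mirror arrival gaps
set.seed(42)
arrivals <- cumsum(rexp(5000, rate = 1/8))          #one arrival every ~8 min
departures <- sort(arrivals + rnorm(5000, 59, 10))  #~59-minute meals
mean(diff(arrivals))    #~8 minutes
mean(diff(departures))  #also ~8 minutes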

We can calculate the inter-arrival times by lagging our timestamps. Of course, if two receipts straddle days or meal periods, the lagged gap will be much greater than an hour; we’ll omit these.

  #create lags: each receipt's gap to the next (later) receipt
data <- data %>% arrange(desc(Timestamp))
data$lag <- lag(data$Timestamp, n = 1)
data$intra_arrival <- as.numeric(data$lag - data$Timestamp, units = "mins")

  #drop gaps that span closing time or the lunch/dinner break
data <- data %>% filter(intra_arrival < 60)
head(data)
##         Date    Time           Timestamp Month Hour   Meal                 lag intra_arrival
## 1 2019-08-31 8:21 pm 2019-08-31 20:21:00     8   20 dinner 2019-08-31 20:48:00            27
## 2 2019-08-31 8:09 pm 2019-08-31 20:09:00     8   20 dinner 2019-08-31 20:21:00            12
## 3 2019-08-31 8:07 pm 2019-08-31 20:07:00     8   20 dinner 2019-08-31 20:09:00             2
## 4 2019-08-31 8:01 pm 2019-08-31 20:01:00     8   20 dinner 2019-08-31 20:07:00             6
## 5 2019-08-31 7:53 pm 2019-08-31 19:53:00     8   19 dinner 2019-08-31 20:01:00             8
## 6 2019-08-31 7:50 pm 2019-08-31 19:50:00     8   19 dinner 2019-08-31 19:53:00             3

Now we can begin modeling. We’ll model the Lunch and Dinner crowds separately, so we can obtain better estimates of the optimal staff levels.

  #separate data since we'll model each separately
lunch_data <- data %>% filter(Meal == 'lunch')
dinner_data <- data %>% filter(Meal == 'dinner')
hist(lunch_data$intra_arrival, breaks = 60, col = 'pink')
hist(dinner_data$intra_arrival, breaks = 60, col = 'lightblue')

Both distributions look very similar. Let’s see what the average inter-arrival time is for each.

  #average inter-arrival times
avg_lunch_arrival <- mean(lunch_data$intra_arrival)
avg_dinner_arrival <- mean(dinner_data$intra_arrival)
'Average Inter-Arrival Times'
## [1] "Average Inter-Arrival Times"
paste('Lunch: ', round(avg_lunch_arrival, 2))
## [1] "Lunch:  8.28"
paste('Dinner: ', round(avg_dinner_arrival, 2))
## [1] "Dinner:  8"

So, on average, we have a customer arriving every 8.3 minutes for lunch and every 8.0 minutes for dinner.
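An M/M/c/K model assumes Poisson arrivals, i.e. exponentially distributed inter-arrival gaps. As a quick sanity check (a sketch, reusing the lunch data and average from above), we can overlay an exponential density on the observed histogram; a rough match supports the assumption:

  #overlay an exponential fit on the observed lunch inter-arrival times
hist(lunch_data$intra_arrival, breaks = 60, freq = FALSE, col = 'pink',
     main = 'Lunch Inter-Arrival Times vs. Exponential Fit',
     xlab = 'Minutes between arrivals')
curve(dexp(x, rate = 1/avg_lunch_arrival), from = 0, to = 60,
      add = TRUE, col = 'red', lwd = 2)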

For our model, we’ll also need to know the average length of a meal. A Cornell study finds that the average restaurant meal in North America lasts 59.0 minutes.

Finally, we need to know our capacity. For this problem, it’s the number of tables we have: 10. We will then pick the minimum number of servers that keeps our utilization rate between 80% and 90%.

Thus, for our lunch model we’ll have the following parameters (the unit conversion is sketched after the list):

  • lambda = 7.23 customers/hr
  • mu = 1.02 customers/hr
  • k=10
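Those rates are simply the averages above converted to customers per hour; a minimal sketch of that conversion, reusing the avg_lunch_arrival value computed earlier:

  #convert minutes-per-customer into customers-per-hour
lambda_lunch <- 60 / avg_lunch_arrival  #~7.2 customers/hr
mu <- 60 / 59                           #~1.02 customers/hr for a 59-minute meal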

Time to finally model. We’ll start with lunch.

lambda <- 7.23   #arrival rate (customers/hr)
mu <- 1.02       #service rate (customers/hr; one 59-minute meal)
k <- 10          #capacity (number of tables)
c <- seq(1, 10)  #candidate server counts

l_results <- list(0)

for (c_i in c) {
  inputs <- NewInput.MMCK(lambda = lambda, mu = mu, c = c_i, k = k)
  model <- QueueingModel.i_MMCK(inputs)
  l_results[c_i] <- model$RO  #RO is the server utilization rate
}

l_final_data <- do.call(rbind, Map(data.frame,
                                   'Number of Servers' = c,
                                   'Utilization' = l_results))
l_final_data
##    Number.of.Servers Utilization
## 1                  1   1.0000000
## 2                  2   0.9999948
## 3                  3   0.9996666
## 4                  4   0.9954633
## 5                  5   0.9757111
## 6                  6   0.9296846
## 7                  7   0.8625867
## 8                  8   0.7881844
## 9                  9   0.7162198
## 10                10   0.6504919

ggplot(data = l_final_data) +
  geom_col(mapping = aes(x = Number.of.Servers, y = Utilization)) +
  scale_x_continuous(breaks = seq(0, 10, by = 1)) +
  geom_hline(yintercept = 0.8, color = 'red') +
  geom_hline(yintercept = 0.9, color = 'red') +
  theme_economist_white() +
  ggtitle('Server Utilization (Lunch)')

As we can see above, 7 servers is the level that keeps utilization inside our 80–90% band (86.3%): with 6 servers staff would be overworked (93.0%), and with 8 they’d be underworked (78.8%).
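Rather than eyeballing the chart, we can also pull the level out programmatically. A small helper (the function name is my own), assuming the l_final_data frame built above:

  #smallest server count whose utilization falls in the 80-90% band
optimal_servers <- function(results, lower = 0.8, upper = 0.9) {
  in_band <- results$Utilization >= lower & results$Utilization <= upper
  min(results$Number.of.Servers[in_band])
}
optimal_servers(l_final_data)  #7

As we’ll see, the same call on the dinner results below also returns 7.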

Thus, for our dinner model we’ll have the following parameters:

  • lambda = 7.50 customers/hr
  • mu = 1.02 customers/hr
  • k = 10

lambda <- 7.50   #arrival rate (customers/hr)
mu <- 1.02       #service rate (customers/hr)
k <- 10          #capacity (number of tables)
c <- seq(1, 10)  #candidate server counts

d_results <- list(0)

for (c_i in c) {
  inputs <- NewInput.MMCK(lambda = lambda, mu = mu, c = c_i, k = k)
  model <- QueueingModel.i_MMCK(inputs)
  d_results[c_i] <- model$RO  #RO is the server utilization rate
}

d_final_data <- do.call(rbind, Map(data.frame,
                                   'Number of Servers' = c,
                                   'Utilization' = d_results))
d_final_data
##    Number.of.Servers Utilization
## 1                  1   1.0000000
## 2                  2   0.9999962
## 3                  3   0.9997492
## 4                  4   0.9964339
## 5                  5   0.9799694
## 6                  6   0.9392720
## 7                  7   0.8767646
## 8                  8   0.8047190
## 9                  9   0.7331979
## 10                10   0.6667163

ggplot(data = d_final_data) +
  geom_col(mapping = aes(x = Number.of.Servers, y = Utilization)) +
  scale_x_continuous(breaks = seq(0, 10, by = 1)) +
  geom_hline(yintercept = 0.8, color = 'red') +
  geom_hline(yintercept = 0.9, color = 'red') +
  theme_economist_white() +
  ggtitle('Server Utilization (Dinner)')

Similar to lunch, our optimal server level for dinner is 7 (87.7% utilization). Eight servers would also land in the band (80.5%), but since we’d like as few staff as possible, 7 wins.

We can now safely say that our restaurant should staff 7 servers for both the lunch and dinner shifts. This ensures each employee stays busy (utilization of at least 80%) without being overworked (utilization below 90%), and it keeps payroll expenses to a minimum.
