Chicago Restaurant Week (hereafter, CRW) is an annual event during which diners explore prix-fixe menus at restaurants throughout Chicago. This year's CRW ran from January 24 to February 9, 2020.
Although information on all participating restaurants can be found on the official website, an in-depth data analysis of the restaurants that participated in this event is interesting in many ways. On a macro level, it helps us understand Chicago's food culture, shedding light on the restaurants' geographical distribution and its potential social implications. On a micro level, it enables customized recommendations for a specific user who wants to find a restaurant.
This project consists of four parts:
# Load modules
import pandas as pd
import numpy as np
import re
import os
import json
import matplotlib.pyplot as plt
import seaborn as sns
The data were crawled from the website using Python (code). Since the website was available only during CRW, the crawling code no longer works (but maybe it will work again in 2021!).
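For reference, the general approach is sketched below, assuming the index page served static HTML. The URL and CSS selectors here are illustrative placeholders, not the ones used in the actual (linked) crawling code.
import requests
from bs4 import BeautifulSoup

# Hypothetical URL and selectors -- the real ones only worked while the
# CRW site was online.
INDEX_URL = "https://www.choosechicago.com/chicago-restaurant-week/"

resp = requests.get(INDEX_URL, timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

rows = []
for card in soup.select("div.restaurant-card"):  # selector is an assumption
    rows.append({
        "name": card.select_one("h3").get_text(strip=True),
        "url": card.select_one("a")["href"],
    })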
The data come from three sources: 1) the index page of CRW (code), 2) the detail page of each restaurant (code), and 3) the Yelp API (code). Below is a detailed introduction to each data source and its role.
1) The index page of CRW. From the index page of the official website we can get basic information about each restaurant, such as its name, address, cuisine style, and the URL of its detail page. In total, 444 restaurants participated in CRW.
Below are some examples:
df_Rindex = pd.read_csv("Rindex.txt", sep = "|")
print(df_Rindex.shape)
df_Rindex.head(n=5)
2) The detail page of each restaurant. Each restaurant has a detail page, linked from the index page, which provides more detailed information. Two columns (neighborhood 'neighbor' and description 'des') are incorporated into the index data frame, and the rest are stored separately (under the "./Details/" directory) because they vary widely across restaurants.
Below are some examples:
df_Rdes = pd.read_csv("Rindex_w_des.txt", sep = "|")
print(df_Rdes.shape)
df_Rdes.head(n=5)
All of these restaurants provided neighborhood information, while 16 (4%) did not provide a description:
print(df_Rdes.loc[df_Rdes['neighbor'].isnull()].shape)
print(df_Rdes.loc[df_Rdes['des'].isnull()].shape)
As for the detailed information, different restaurants provided very different sets of attributes, ranging from 0 to 55 features. This piece of data is stored but not analyzed further due to its complexity.
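The 0-to-55 range can be checked by tallying the rows of each detail file. A quick sketch, assuming every file under "./Details/" uses the same pipe-separated format with one row per feature:
# Count how many detail features are stored for each restaurant
n_features = {}
for fname in os.listdir("./Details/"):
    path = os.path.join("./Details/", fname)
    try:
        n_features[fname] = pd.read_csv(path, sep="|").shape[0]
    except pd.errors.EmptyDataError:  # restaurants with no features at all
        n_features[fname] = 0
print(min(n_features.values()), max(n_features.values()))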
ex1 = pd.read_csv("./Details/18.txt", sep = "|")
print(ex1.shape)
ex1.sample(n=5)
ex2 = pd.read_csv("./Details/307.txt", sep = "|")
print(ex2.shape)
ex2
3) The Yelp API. The goal is to get more information about the restaurants, especially their everyday, non-CRW profile (e.g. price level, rating, and review count). If a restaurant is not found on Yelp, its fields are filled with a fixed set of placeholder values.
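The actual queries are in the linked code; below is a minimal sketch of one lookup, assuming the Yelp Fusion business-match endpoint was used (the API key and the city/state defaults are placeholders).
import requests

API_KEY = "YOUR_YELP_API_KEY"  # placeholder
MATCH_URL = "https://api.yelp.com/v3/businesses/matches"

def yelp_best_match(name, address):
    """Return Yelp's best business match for a name/address pair (sketch)."""
    params = {"name": name, "address1": address,
              "city": "Chicago", "state": "IL", "country": "US"}
    headers = {"Authorization": f"Bearer {API_KEY}"}
    resp = requests.get(MATCH_URL, params=params, headers=headers, timeout=30)
    resp.raise_for_status()
    businesses = resp.json().get("businesses", [])
    return businesses[0] if businesses else None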
with open("yelp_best_match_json.txt") as rf1:
dic = json.load(rf1)
df_yelp = pd.DataFrame.from_dict(dic, orient = 'index')
print(df_yelp.shape)
df_yelp.head(n=5)
Combining these three sources of data, we get the final dataset.
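Conceptually, this is a left join of the CRW index/detail frame with the Yelp frame on a shared restaurant key. A sketch, assuming 'ind' (the index column used throughout the analysis below) is that key; the saved result is loaded right after.
# Sketch: JSON object keys load as strings, hence the cast before joining
df_yelp.index = df_yelp.index.astype(int)
df_final = df_Rdes.merge(df_yelp, left_on="ind", right_index=True, how="left")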
with open("final_data.txt") as rf1:
dic = json.load(rf1)
df = pd.DataFrame.from_dict(dic, orient = 'index')
print(df.shape)
df.sample(n=5)
A "ratio_name" value and a "ratio_address" value are calculated to show to what extent the restaurant's information from the official website matches that from Yelp. These values range from 0 to 100, and higher values indicate better match. We consider a match as failed only when both values are below 50.
Two restaurants failed to find a match on Yelp; we exclude them from the analyses that rely on Yelp fields (e.g. those involving price and rating).
df.loc[(df['ratio_address'] < 50) & (df['ratio_name'] < 50)]
# The distribution of ratio_name and ratio_address
grid = sns.JointGrid(x=df['ratio_name'], y=df['ratio_address'])
grid.plot_joint(sns.scatterplot, color="g")
grid.plot_marginals(sns.rugplot, height=1, color="g")
df[["latitude","longitude"]] = df["coordinates"].apply(pd.Series)
df_in_Yelp = df.loc[(df['ratio_address'] >= 50) | (df['ratio_name'] >= 50)]
# Visualize restaurant locations, colored by neighborhood
g = sns.scatterplot(x="longitude",
                    y="latitude",
                    hue='neighbor',
                    marker='o',
                    data=df_in_Yelp)
g.legend_.remove()
from sklearn.covariance import EllipticEnvelope

# Fit a robust Gaussian envelope to the coordinates; points outside it
# (about 10%, per the contamination setting) are flagged as outliers
X = df_in_Yelp[["longitude", "latitude"]].values
clf = EllipticEnvelope(contamination=0.1)
clf.fit(X)
y_pred = clf.predict(X)  # 1 = inlier, -1 = outlier

# Scatter: outliers in blue, inliers in orange
colors = np.array(['#377eb8', '#ff7f00'])
plt.scatter(X[:, 0], X[:, 1], s=10, color=colors[(y_pred + 1) // 2])

# Contour: the decision boundary of the fitted envelope
xx, yy = np.meshgrid(np.linspace(-88.4, -87.5, 150),
                     np.linspace(41.7, 42.3, 150))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='black')
# Zoom in on the inliers, i.e. the downtown cluster
df_downtown = df_in_Yelp.loc[y_pred == 1]
g = sns.scatterplot(x="longitude",
                    y="latitude",
                    hue='neighbor',
                    marker='o',
                    data=df_downtown)
g.legend_.remove()
# Which neighborhoods are excluded when we zoom in on downtown?
dfm1 = (df_downtown.groupby(['neighbor'])['ind'].count()
        .reset_index()
        .sort_values(by='ind', ascending=False)
        .rename(columns={'ind': 'count_dt'}))
dfm2 = (df_in_Yelp.groupby(['neighbor'])['ind'].count()
        .reset_index()
        .sort_values(by='ind', ascending=False)
        .rename(columns={'ind': 'count_yelp'}))
dfm_diff = pd.merge(dfm2, dfm1, how='left').fillna(0)
dfm_diff['diff'] = dfm_diff['count_yelp'] - dfm_diff['count_dt']
dfm_diff.loc[dfm_diff['diff'] > 0]
dfm = (df_downtown.groupby(['neighbor'])['ind'].count().reset_index()
       .sort_values(by='ind', ascending=False)
       .rename(columns={'ind': 'count'}))
dfm
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(dfm['neighbor'], dfm['count'], 'o-', color='#6A5ACD')
plt.xlabel("Neighborhood", fontsize=14, fontweight="bold")
plt.ylabel("Count", fontsize=14, fontweight="bold")
plt.xticks(fontsize=12, rotation=90)
plt.yticks(fontsize=12)
plt.title("# Restaurants by neighborhood", fontsize=14, fontweight="bold")
plt.grid(True)
# Annotate each point with its count
for i in range(dfm.shape[0]):
    plt.annotate(str(dfm.loc[i]['count']),
                 (dfm.loc[i]['neighbor'], dfm.loc[i]['count'] * 1.02),
                 fontsize=10)
Three neighborhoods have many restaurants participating in CRW: River North (93), West Loop (46), and the Loop (45).
print(df_in_Yelp.groupby(['price'])['name'].count().reset_index().sort_values(by='name',ascending=False))
I care about restaurants that are usually expensive. There are (at least) 18 such restaurants. Let's check and see what these restaurants are.
(df_in_Yelp.loc[df_in_Yelp['price'] == "$$$$",
                ['name', 'cuisines', 'neighbor', 'rating', 'review_count', 'meal_option']]
 .sort_values(by=['rating', 'review_count'], ascending=False))
This gives us a clear idea of which restaurants to go to if we care only about price.
Since a restaurant may have more than one cuisine style, we spread the 'cuisines' column into one row per (restaurant, cuisine) pair.
df_cuisine = pd.concat(
    [pd.Series(row['ind'],
               row['cuisines'].lstrip('[').rstrip(']').replace(" ", "").split(","))
     for _, row in df_in_Yelp.iterrows()]).reset_index()
df_cuisine.columns = ["cuisines", "ind"]
df_cuisine = df_cuisine.drop_duplicates()
print(df_cuisine.shape)
df_cuisine.groupby(['cuisines'])['ind'].count().reset_index().sort_values(by='ind',ascending=False)
American cuisines dominate the restaurants!
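As an aside, the same spreading can be written more concisely with pandas' explode (available since pandas 0.25). This sketch assumes the 'cuisines' strings always look like the bracketed lists above:
# Alternative to the pd.concat approach: split, then explode one cuisine per row
df_cuisine_alt = (
    df_in_Yelp.assign(cuisines=df_in_Yelp["cuisines"]
                      .str.strip("[]")
                      .str.replace(" ", "")
                      .str.split(","))
    .explode("cuisines")[["cuisines", "ind"]]
    .drop_duplicates())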
Most restaurants have a rating around 4, with a few restaurants having ratings as low as 2.0.
print(df_in_Yelp.groupby(['rating'])['ind'].count().reset_index().sort_values(by='rating',ascending=False))
df_in_Yelp.groupby(['rating'])['ind'].count().reset_index().plot.bar(x='rating', y='ind', rot=0)
!jupyter nbconvert --execute --to html CRWnotebook.ipynb