
"How many coffee shops in Boston?"
Hmmm... well, suppose there are 1 million people in Boston. Then... NOPE. Stop it. This series is not about strategies for solving brain teasers. It is an analysis of two major coffeehouse chains in the Boston area, Starbucks vs. Dunkin' Donuts, "corporate + licensed" vs. pure franchise. I will walk you through their store distribution and their location-choosing strategies (if any are consistent). The series is roughly organized into the following parts.
- Part 0: Get the data prepared!
- Part 1: A glance at locations
- Part 2: Who are Dunkin's and Starbucks' favorite neighbors? - Empirical Bayes Analysis
- Part 3: What can we expect around a coffeehouse? - Network Visualization
- Summary
Get the data prepared!
In this post, I will show you how I collected the data sets for this series. You can also find my code here. Enjoy!
Three kinds of data sets are needed for this series. You can also find the prepared data in the folder and skip this post :)
- Zip code and name of each region in Boston and its neighborhood
- Latitude and longitude of all Starbucks' and Dunkin's stores
- Latitude and longitude of all merchants near each store
Take a breath. Here we go!
import requests
from bs4 import BeautifulSoup as Soup
import pandas as pd
import numpy as np
import json
import pickle
import settings
Zip code
Let's start with the easiest part! Click here to download the spreadsheet from Mass.gov.
I renamed the following regions to make them more specific:
| Zip Code | Original Name | New Name |
|---|---|---|
| 02446 | Brookline | Brookline North |
| 02467 | Brookline | Brookline Chestnut Hill |
| 02210 | Boston | Boston Seaport |
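If you would rather apply these renames in code than edit the file by hand, a minimal sketch could look like this (it assumes the zip code file has already been loaded into a DataFrame with 'zip_code' and 'region' columns, as in the next code block):
# Sketch: apply the region renames from the table above to the zip_code DataFrame
renames = {'02446': 'Brookline North',
           '02467': 'Brookline Chestnut Hill',
           '02210': 'Boston Seaport'}
mask = zip_code['zip_code'].isin(renames)
zip_code.loc[mask, 'region'] = zip_code.loc[mask, 'zip_code'].map(renames)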
Location of Starbucks and Dunkin'
Use the zip codes we got above to scrape the location data from the chains' official store locators (Starbucks and Dunkin').
zip_code = pd.read_table('./data/zip_boston_neighborhood.txt', header=0, names = ['zip_code','region'], dtype = 'object')
#To get longitude and latitude from web content
def store_dict(source, id, coordinates, zip_code, output):
    out = output
    for store in source:
        if store[id] in output.keys():
            continue
        else:
            out[store[id]] = {'zip_code': zip_code, 'geo': store[coordinates]}
    return out
Starbucks
sb_store = {}
for z in zip_code.zip_code.values:
    url = "https://www.starbucks.com/store-locator?place={}".format(z)
    page = requests.get(url)
    soup = Soup(page.content, 'html.parser')
    store = soup.find_all('div', id='bootstrapData')
    store = store[0]
    store = store.get_text()
    sb_dict = json.loads(store)
    sb_dict = sb_dict['storeLocator']['locationState']['locations']
    sb_store = store_dict(sb_dict, 'id', 'coordinates', z, sb_store)
Dunkin'
dd_store = {}
dd_dict = []
for z in zip_code.zip_code.values:
    url = "https://www.mapquestapi.com/search/v2/radius?callback=json111206657990027752725_1504756855257&key=Gmjtd%7Clu6t2luan5%252C72%253Do5-larsq&origin={}&units=m&maxMatches=30&radius=25&hostedData=mqap.33454_DunkinDonuts&ambiguities=ignore&_=1504756855258".format(z)
    page = requests.get(url)
    soup = Soup(page.content, 'html.parser')
    store = soup.get_text()
    # The response is wrapped in a JSONP callback, so keep only the JSON between the parentheses
    store = store[store.find("(")+1:store.rfind(")")]
    dd_dict_temp = json.loads(store)
    dd_dict_temp = dd_dict_temp['searchResults']
    for dd in dd_dict_temp:
        dd_dict.append(dd['fields'])
    dd_store = store_dict(dd_dict, 'recordid', 'mqap_geography', z, dd_store)
Integration
sb_nearby = pd.DataFrame([{'id': key,
                           'lat': sb_store[key]['geo']['latitude'],
                           'lon': sb_store[key]['geo']['longitude'],
                           'zip_code': sb_store[key]['zip_code']} for key in sb_store.keys()])
dd_nearby = pd.DataFrame([{'id': key,
                           'lat': dd_store[key]['geo']['latLng']['lat'],
                           'lon': dd_store[key]['geo']['latLng']['lng'],
                           'zip_code': dd_store[key]['zip_code']} for key in dd_store.keys()])
sb_nearby['type'] = 'starbucks'
dd_nearby['type'] = 'dunkin'
sb_nearby['id'] = sb_nearby['id'].astype('str')
dd_nearby['id'] = dd_nearby['id'].astype('str')
nearby = pd.concat([sb_nearby, dd_nearby],ignore_index=True)
nearby = pd.merge(nearby, zip_code, how = 'inner', on = 'zip_code')
Merchants Nearby
Finally, this is the last data set we need. Before collecting it, you need to get a key for the Google Places API. The Google Places API Radar Search Service can return up to 200 nearby merchants of a given type per search, so we can count each type of merchant around a given store. A list of valid types can be found here. Additional parameters, like minprice and maxprice, are useful if you are interested in diving deeper.
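Before wrapping everything into a helper function, here is roughly what a single radar search request looks like and which fields we pull out of the response (a sketch; the coordinates and key are placeholders):
# Sketch of a single radar search request (placeholder coordinates; use your own API key)
import requests

lat, lon = 42.3601, -71.0589  # hypothetical point in downtown Boston
url = ('https://maps.googleapis.com/maps/api/place/radarsearch/json'
       '?location={},{}&radius=1000&type=bakery&key={}'.format(lat, lon, 'YOUR_API_KEY'))
response = requests.get(url).json()

# Each result carries a place_id and a geometry.location with lat/lng
for merchant in response['results']:
    print(merchant['place_id'], merchant['geometry']['location'])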
#Search nearby merchants using Google Places API
def search_nearby(datasets, type_list, API_KEY, optional_para=None):
    # The keys of optional_para become the suffixes of the output columns;
    # the values (dicts of URL parameters) are appended to the request URL
    optional = {'': ""}
    if optional_para is not None:
        for key in optional_para.keys():
            optional[key] = "".join("&{}={}".format(k, v) for k, v in optional_para[key].items())
    near = {}
    for suffix in optional.keys():
        for type in type_list:
            print("Working on {}".format(type))
            new_column = []
            for i in range(len(datasets.values)):
                row = datasets.values[i]
                url = 'https://maps.googleapis.com/maps/api/place/radarsearch/json?location={},{}&radius=1000&type={}{}&key={}'.format(row[1], row[2], type, optional[suffix], API_KEY)
                page = requests.get(url)
                soup = Soup(page.content, 'html.parser')
                place = soup.get_text()
                place = json.loads(place)
                place = place['results']
                for merchant in place:
                    # Keep every merchant's location, keyed by store id and then place_id
                    near.setdefault(row[0], {})[merchant['place_id']] = merchant['geometry']['location']
                new_column.append(len(place))
                print("{}% is completed.".format(round(i / len(datasets.values) * 100, 2)))
            variable_name = "{}_{}".format(type, suffix)
            datasets[variable_name] = pd.Series(np.array(new_column), index=datasets.index)
    return (datasets, near)
Price level is available as a search filter in the Google Places API for restaurants and clothing stores, so I decided to see whether Starbucks or Dunkin' has a particular preference.
#Define cheap and expensive as price level from 0-2 and 3-4 respectively
optional_keywords_price_level = {'cheap': {'minprice':0, 'maxprice':2}, 'expensive': {'minprice':3, 'maxprice':4}}
#Search nearby cheap and expensive restaurants and clothing stores
type_list_with_price = ['restaurant','clothing_store']
nearby, merchants = search_nearby(nearby, type_list_with_price, settings.API_KEY, optional_keywords_price_level)
Search other types of merchants
#Search other nearby merchants
type_list = ['atm', 'bakery', 'bank', 'beauty_salon', 'book_store', 'cafe', 'car_repair', 'movie_theater', 'convenience_store', 'dentist', 'florist', 'gas_station', 'gym', 'home_goods_store',\
             'hospital', 'laundry', 'liquor_store', 'museum', 'park', 'pharmacy', 'police', 'real_estate_agency', 'school', 'shopping_mall', 'stadium', 'transit_station', 'university',\
             'accounting', 'art_gallery', 'bicycle_store', 'car_dealer', 'car_rental', 'church', 'city_hall', 'department_store', 'electronics_store', 'embassy', 'funeral_home',\
             'fire_station', 'hindu_temple', 'veterinary_care', 'synagogue', 'post_office', 'physiotherapist', 'parking', 'mosque', 'local_government_office', 'library']
nearby, merchants_1 = search_nearby(nearby, type_list, settings.API_KEY)
merchants.update(merchants_1)
Done! Don't forget to save it!
nearby.to_pickle('nearby.pkl')
with open('merchants.pkl', 'wb') as f:
    pickle.dump(merchants, f, pickle.HIGHEST_PROTOCOL)
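When we need the data again in the later parts, loading is just the reverse (a quick sketch):
# Reload the saved data sets in a later notebook
import pickle
import pandas as pd

nearby = pd.read_pickle('nearby.pkl')
with open('merchants.pkl', 'rb') as f:
    merchants = pickle.load(f)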
Why didn't I use the Google API to find all Starbucks' and Dunkin's stores directly?
The Google Places API does not support searching by zip code; you have to provide a latitude/longitude pair as the center of each region, which makes it more cumbersome to sweep through every region in Boston. Luckily, both chains' official store locators accept zip codes, which makes life much easier!
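If you did want to stick with the Google API anyway, one workaround would be to geocode each zip code first with the Geocoding API and feed the resulting coordinates into the place search; a rough sketch (not the route taken in this post):
# Hypothetical workaround: turn a zip code into lat/lon via the Google Geocoding API
import requests

def zip_to_latlon(zip_code, api_key):
    url = ('https://maps.googleapis.com/maps/api/geocode/json'
           '?address={}&key={}'.format(zip_code, api_key))
    result = requests.get(url).json()['results'][0]
    location = result['geometry']['location']
    return location['lat'], location['lng']

# Example: lat, lon = zip_to_latlon('02210', settings.API_KEY)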
What's next?
In the next part, you will get a first look at where those stores are located in Boston. Heatmaps will show the density of the coffeehouse distribution. From the pattern of store locations, we may be able to infer some of their business and marketing strategies, and even suggest ideal spots for new stores based on each chain's location preferences.