U.S. Open - 2nd Rd: Which PLAYER will card a LOWER COMBINED SCORE on Holes 5-8?
Brandt Snedeker (USA) or Tie
Henrik Stenson (SWE)
Inputs To Solve
Pebble Beach Golf Links Course Stats
##### User Estimates #####
hole_5 = {'2':29,
'3':82,
'4':36,
'5':9}
hole_6 = {'3':8,
'4':67,
'5':63,
'6':12,
'7':5,
'8':1}
hole_7 = {'2':38,
'3':108,
'4':9,
'5':1}
hole_8 = {'3':7,
'4':96,
'5':47,
'6':11}
print("The observed distribution of scores on hole 5 is: %s" % hole_5)
print("The observed distribution of scores on hole 6 is: %s" % hole_6)
print("The observed distribution of scores on hole 7 is: %s" % hole_7)
print("The observed distribution of scores on hole 8 is: %s" % hole_8)
The observed distribution of scores on hole 5 is: {'2': 29, '3': 82, '4': 36, '5': 9}
The observed distribution of scores on hole 6 is: {'3': 8, '4': 67, '5': 63, '6': 12, '7': 5, '8': 1}
The observed distribution of scores on hole 7 is: {'2': 38, '3': 108, '4': 9, '5': 1}
The observed distribution of scores on hole 8 is: {'3': 7, '4': 96, '5': 47, '6': 11}
Method to Solve
- Enumerate all the possible combinations of observed combined scores for one golfer on the 4 holes and their respective probabilities.
- Enumerate every pairing of total scores for the two golfers, with the probability of each pairing.
- Sum the probabilities of all the pairings where Henrik Stenson’s score is lower than Brandt Snedeker’s.
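The three steps above can be sketched in plain Python before reaching for pandas. This is a minimal illustration, not the notebook's own code: it assumes, as the post does implicitly, that both golfers score independently and follow the same observed field distribution on each hole. The names `hole_counts`, `to_probabilities`, `total_dist` and `p_stenson` are made up for this sketch, and the score keys are written as integers rather than strings.

```python
from collections import defaultdict
from itertools import product

# Observed score counts per hole, copied from the inputs above.
hole_counts = {
    5: {2: 29, 3: 82, 4: 36, 5: 9},
    6: {3: 8, 4: 67, 5: 63, 6: 12, 7: 5, 8: 1},
    7: {2: 38, 3: 108, 4: 9, 5: 1},
    8: {3: 7, 4: 96, 5: 47, 6: 11},
}


def to_probabilities(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {score: n / total for score, n in counts.items()}


holes = [to_probabilities(c) for c in hole_counts.values()]

# Step 1: distribution of one golfer's combined score over the four holes.
total_dist = defaultdict(float)
for combo in product(*(h.items() for h in holes)):
    strokes = sum(score for score, _ in combo)
    prob = 1.0
    for _, p in combo:
        prob *= p
    total_dist[strokes] += prob

# Steps 2 and 3: pair up two independent golfers and add up the cases
# where Stenson's total is strictly lower than Snedeker's.
p_stenson = sum(
    p_a * p_b
    for (snedeker, p_a), (stenson, p_b) in product(total_dist.items(), repeat=2)
    if stenson < snedeker
)
print(round(p_stenson, 3))
```

Because ties are settled in Snedeker's favour, the comparison is strict (`<`), and `1 - p_stenson` is the probability of the "Snedeker or Tie" outcome.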
import numpy as np
import pandas as pd
# Convert each hole's raw counts into probabilities that sum to 1.
for hole in (hole_5, hole_6, hole_7, hole_8):
    t = sum(hole.values())
    for i in hole:
        hole[i] = hole[i] / t
y = np.array([(w,x,y,z) for w in hole_5.keys() for x in hole_6.keys() for y in hole_7.keys() for z in hole_8.keys()])
scores = pd.DataFrame(y)
scores.columns = ['h5','h6','h7','h8']
# convert_objects() was removed in pandas 1.0; pd.to_numeric does the same job here.
scores = scores.apply(pd.to_numeric)
scores['total_strokes'] = scores.sum(axis=1)
x = np.array([(w,x,y,z) for w in hole_5.values() for x in hole_6.values() for y in hole_7.values() for z in hole_8.values()])
probability = pd.DataFrame(x)
probability.columns = ['h5','h6','h7','h8']
probability['p'] = probability.product(axis=1)
# Collapse the per-combination probabilities into one probability per total score.
probs = {}
for score in set(scores['total_strokes']):
    probs[score] = probability['p'][scores['total_strokes']==score].sum()
probs
{10: 0.00010096479095707293, 11: 0.002802679328599061, 12: 0.026674936381972655, 13: 0.11266946712422014, 14: 0.2391680446253382, 15: 0.2758595972674954, 16: 0.19563598833038628, 17: 0.09729863459844809, 18: 0.03617613517198673, 19: 0.010652261540639467, 20: 0.0024721845534811833, 21: 0.00043419506533733446, 22: 5.114172933185025e-05, 23: 3.607521455476177e-06, 24: 1.619703510621957e-07}
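As a cross-check on the table above, the distribution of a sum of independent scores is the convolution of the per-hole distributions. The sketch below uses `np.convolve` for that; the helper `as_array` and the choice to index each vector by stroke count are assumptions made for this illustration, not part of the original notebook.

```python
import numpy as np


def as_array(counts, max_score=8):
    """Return a probability vector where index k holds P(score == k)."""
    arr = np.zeros(max_score + 1)
    for score, n in counts.items():
        arr[score] = n
    return arr / arr.sum()


hole_5 = as_array({2: 29, 3: 82, 4: 36, 5: 9})
hole_6 = as_array({3: 8, 4: 67, 5: 63, 6: 12, 7: 5, 8: 1})
hole_7 = as_array({2: 38, 3: 108, 4: 9, 5: 1})
hole_8 = as_array({3: 7, 4: 96, 5: 47, 6: 11})

# Convolving index-by-score vectors adds the scores, so after three
# convolutions total[k] is P(combined score on holes 5-8 == k).
total = hole_5
for hole in (hole_6, hole_7, hole_8):
    total = np.convolve(total, hole)

for strokes in range(10, 25):
    print(strokes, total[strokes])
```

The printed values for 10 through 24 strokes should agree with the `probs` dictionary to within floating-point error.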
y = np.array([(y,z) for y in probs.keys() for z in probs.keys()])
scores = pd.DataFrame(y)
scores.columns = ['Snedeker','Stenson']
# convert_objects() was removed in pandas 1.0; pd.to_numeric does the same job here.
scores = scores.apply(pd.to_numeric)
scores['stenson_wins'] = scores['Stenson'] < scores['Snedeker']
print(scores.tail())
     Snedeker  Stenson  stenson_wins
220        24       20          True
221        24       21          True
222        24       22          True
223        24       23          True
224        24       24         False
x = np.array([(y,z) for y in probs.values() for z in probs.values()])
probability = pd.DataFrame(x)
probability.columns = ['Snedeker','Stenson']
probability['p'] = probability.product(axis=1)
print(probability.tail())
         Snedeker       Stenson             p
220  1.619704e-07  2.472185e-03  4.004206e-10
221  1.619704e-07  4.341951e-04  7.032673e-11
222  1.619704e-07  5.114173e-05  8.283444e-12
223  1.619704e-07  3.607521e-06  5.843115e-13
224  1.619704e-07  1.619704e-07  2.623439e-14
p = probability['p'][scores['stenson_wins']].sum()
Solution
print("The probability that Henrik Stenson's COMBINED SCORE on Holes 5-8 is lower is ~%s" % round(p,3))
print("The probability that Brandt Snedeker's COMBINED SCORE on Holes 5-8 is lower or a Tie is ~%s" % round((1-p),3))
The probability that Henrik Stenson's COMBINED SCORE on Holes 5-8 is lower is ~0.402
The probability that Brandt Snedeker's COMBINED SCORE on Holes 5-8 is lower or a Tie is ~0.598
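A quick sanity check on these numbers: both golfers are modelled with the same score distribution, so each is equally likely to be strictly lower, and the only asymmetry comes from the tie going to Snedeker. That gives P(Stenson lower) = (1 - P(tie)) / 2. The sketch below applies this to the `probs` values printed earlier; the names `p_tie` and `p_lower` are illustrative.

```python
# Combined-score distribution for one golfer, copied from the `probs` output above.
probs = {
    10: 0.00010096479095707293, 11: 0.002802679328599061,
    12: 0.026674936381972655, 13: 0.11266946712422014,
    14: 0.2391680446253382, 15: 0.2758595972674954,
    16: 0.19563598833038628, 17: 0.09729863459844809,
    18: 0.03617613517198673, 19: 0.010652261540639467,
    20: 0.0024721845534811833, 21: 0.00043419506533733446,
    22: 5.114172933185025e-05, 23: 3.607521455476177e-06,
    24: 1.619703510621957e-07,
}

# Two independent golfers tie when they land on the same total.
p_tie = sum(p * p for p in probs.values())

# By symmetry, the remaining probability splits evenly between the golfers.
p_lower = (1 - p_tie) / 2

print("P(tie) ~", round(p_tie, 3))
print("P(Stenson lower) ~", round(p_lower, 3))
print("P(Snedeker lower or tie) ~", round(1 - p_lower, 3))
```

The tie probability comes out at roughly 0.196, which accounts for the whole gap between the two sides of the bet.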