Post

Olympic medals

Olympic medal counts are kind of boring - they're highly correlated with team size. Medal rates tell a more interesting story.



0. Setup

1
2
3
4
5
6
7
8
9
10
# Load the autoreload extension
%load_ext autoreload

# Enable autoreloading for all modules
%autoreload 2

# Import local modules
from util_core import *
import olympic_medals.util_olympic_medals as U

1
2
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


1. Data load

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# Base loads
all_data = U.load_base_datasets()

medal_df = all_data["medal_df"]
event_df = all_data["event_df"]
team_df = all_data["team_df"]
athlete_df = all_data["athlete_df"]

# Combined entry data
entry_df = U.build_combined_event_entries(team_df, athlete_df)
all_data["entry_df"] = entry_df

# Store
save_object(all_data, "in/olympic_medals_all_data.p")

1
2
3
4
5
6
Notice: Dropped 358/1698 (21.1%) non-current team records.
null_df: 0
Notice: Dropped 3/11113 (0.0%) non-current team records.
Exploded to 14978 records, one per athlete per event entered.
Notice: Dropped 435/14978 (2.9%) invalid event records.
Notice: Dropped 6494/14543 (44.7%) team event records.

1.1 Medals

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# Side quest - how many medals were awarded in each event?
count_df = medal_df["event_key"].value_counts().to_frame().reset_index()

# Print summary stats
total_e = len(count_df)
gt3 = len(count_df[count_df["count"]>3])
eq3 = len(count_df[count_df["count"]==3])
lt3 = len(count_df[count_df["count"]<3])
print(f"Isolated a total of {total_e} events awarding {len(medal_df)} cummulative medals...")
print(f"{gt3} ({gt3/total_e*100:.1f}%) events awarded 4+ medals")
print(f"{eq3} ({eq3/total_e*100:.1f}%) events awarded 3 medals")
print(f"{lt3} ({lt3/total_e*100:.1f}%) events awarded <3 medals")

# Look at 57 events with 4+ medals
gt3_df = count_df[count_df["count"]>3].copy()
gt3_df["discipline"] = gt3_df["event_key"].apply(lambda x: x.split(": ")[0])
print(gt3_df["discipline"].value_counts())

# View sample
medal_df.head()

1
2
3
4
5
6
7
8
9
10
11
12
13
Isolated a total of 329 events awarding 1044 cummulative medals...
57 (17.3%) events awarded 4+ medals
272 (82.7%) events awarded 3 medals
0 (0.0%) events awarded <3 medals
discipline
Wrestling              18
Judo                   15
Boxing                 13
Taekwondo               8
Athletics               1
Canoe Sprint            1
Artistic Gymnastics     1
Name: count, dtype: int64
medal_typemedal_codemedal_datenamegenderdisciplineeventevent_typeurl_eventcodecountry_codecountrycountry_longevent_key
0Gold Medal1.02024-07-27Remco EVENEPOELMCycling RoadMen's Individual Time TrialATH/en/paris-2024/results/cycling-road/men-s-indi...1903136BELBelgiumBelgiumCycling Road: Men's Individual Time Trial
1Silver Medal2.02024-07-27Filippo GANNAMCycling RoadMen's Individual Time TrialATH/en/paris-2024/results/cycling-road/men-s-indi...1923520ITAItalyItalyCycling Road: Men's Individual Time Trial
2Bronze Medal3.02024-07-27Wout van AERTMCycling RoadMen's Individual Time TrialATH/en/paris-2024/results/cycling-road/men-s-indi...1903147BELBelgiumBelgiumCycling Road: Men's Individual Time Trial
3Gold Medal1.02024-07-27Grace BROWNWCycling RoadWomen's Individual Time TrialATH/en/paris-2024/results/cycling-road/women-s-in...1940173AUSAustraliaAustraliaCycling Road: Women's Individual Time Trial
4Silver Medal2.02024-07-27Anna HENDERSONWCycling RoadWomen's Individual Time TrialATH/en/paris-2024/results/cycling-road/women-s-in...1912525GBRGreat BritainGreat BritainCycling Road: Women's Individual Time Trial

1.2 Events

1
2
3
4
5
6
# Print summary stats and view sample
total_e2 = len(event_df)
assert total_e2 == total_e, f"Warning total_e: {total_e}, total_e2: {total_e2}"
print(f"Isolated a total of {total_e2} events...")
event_df.head()

1
Isolated a total of 329 events...
eventtagsportsport_codesport_urlevent_key
0Men's IndividualarcheryArcheryARChttps://olympics.com/en/paris-2024/sports/archeryArchery: Men's Individual
1Women's IndividualarcheryArcheryARChttps://olympics.com/en/paris-2024/sports/archeryArchery: Women's Individual
2Men's TeamarcheryArcheryARChttps://olympics.com/en/paris-2024/sports/archeryArchery: Men's Team
3Women's TeamarcheryArcheryARChttps://olympics.com/en/paris-2024/sports/archeryArchery: Women's Team
4Mixed TeamarcheryArcheryARChttps://olympics.com/en/paris-2024/sports/archeryArchery: Mixed Team

1.3 Teams

1
2
3
# View sample
team_df.head()

codecurrentteamteam_gendercountry_codecountrycountry_longdisciplinedisciplines_codeeventsathletescoachesathletes_codesnum_athletescoaches_codesnum_coachesevent_key
0ARCMTEAM3---CHN01TruePeople's Republic of ChinaMCHNChinaPeople's Republic of ChinaArcheryARCMen's Team['KAO Wenchao', 'LI Zhongyuan', 'WANG Yan']NaN['1913366', '1913367', '1913369']3.0NaNNaNArchery: Men's Team
1ARCMTEAM3---COL01TrueColombiaMCOLColombiaColombiaArcheryARCMen's Team['ARCILA Santiago', 'ENRIQUEZ Jorge', 'HERNAND...NaN['1935642', '1543412', '1935644']3.0NaNNaNArchery: Men's Team
2ARCMTEAM3---FRA01TrueFranceMFRAFranceFranceArcheryARCMen's Team['ADDIS Baptiste', 'CHIRAULT Thomas', 'VALLADO...NaN['1541270', '1541272', '1541275']3.0NaNNaNArchery: Men's Team
3ARCMTEAM3---GBR01TrueGreat BritainMGBRGreat BritainGreat BritainArcheryARCMen's Team['HALL Conor', 'HALL Tom', 'WISE Alex']NaN['1560988', '1560989', '1561003']3.0NaNNaNArchery: Men's Team
4ARCMTEAM3---IND01TrueIndiaMINDIndiaIndiaArcheryARCMen's Team['BOMMADEVARA Dhiraj', 'JADHAV Pravin Ramesh',...NaN['1546108', '1546112', '1546110']3.0NaNNaNArchery: Men's Team

1.4 Athletes

1
2
3
# View sample
athlete_df.head()

codecurrentnamename_shortname_tvgenderfunctioncountry_codecountrycountry_long...langcoachreasonheroinfluencephilosophysporting_relativesritualother_sportsevent_key
01532872TrueALEKSANYAN ArturALEKSANYAN AArtur ALEKSANYANMaleAthleteARMArmeniaArmenia...Armenian, English, RussianGevorg Aleksanyan (ARM), fatherHe followed his father and his uncle into the ...Footballer Zinedine Zidane (FRA), World Cup wi...His father, Gevorg Aleksanyan"Wrestling is my life." (mediamax.am. 18 May 2...NaNNaNNaNWrestling: Men's Greco-Roman 97kg
11532873TrueAMOYAN MalkhasAMOYAN MMalkhas AMOYANMaleAthleteARMArmeniaArmenia...ArmenianNaNNaNNaNNaN"To become a good athlete, you first have to b...Uncle, Roman Amoyan (wrestling), 2008 Olympic ...NaNNaNWrestling: Men's Greco-Roman 77kg
21532874TrueGALSTYAN SlavikGALSTYAN SSlavik GALSTYANMaleAthleteARMArmeniaArmenia...ArmenianPersonal: Martin Alekhanyan (ARM).<br>National...NaNNaNNaNNaNNaNNaNNaNWrestling: Men's Greco-Roman 67kg
31532944TrueHARUTYUNYAN ArsenHARUTYUNYAN AArsen HARUTYUNYANMaleAthleteARMArmeniaArmenia...ArmenianNational: Habetnak KurghinyanWhile doing karate he noticed wrestlers traini...Wrestler Armen Nazaryan (ARM, BUL), two-time O...NaN“Nothing is impossible, set goals in front of ...NaNNaNNaNWrestling: Men's Freestyle 57kg
41532945TrueTEVANYAN VazgenTEVANYAN VVazgen TEVANYANMaleAthleteARMArmeniaArmenia...Armenian, RussianNational: Habetnak Kurghinyan (ARM)“My family did not like wrestling very much. A...NaNNaNNaNNaNNaNNaNWrestling: Men's Freestyle 65kg

5 rows × 37 columns

1.5 Combined entries

1
2
3
# View sample
entry_df.head()

event_keygendercountry_codecountry
0Archery: Men's TeamMCHNChina
1Archery: Men's TeamMCOLColombia
2Archery: Men's TeamMFRAFrance
3Archery: Men's TeamMGBRGreat Britain
4Archery: Men's TeamMINDIndia


2. Medal rates

1
2
3
4
5
6
7
8
9
# Load
all_data = load_object("in/olympic_medals_all_data.p")

medal_df = all_data["medal_df"]
event_df = all_data["event_df"]
team_df = all_data["team_df"]
athlete_df = all_data["athlete_df"]
entry_df = all_data["entry_df"]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
# Now...we can count entries by country and medals by country

# Entries
country_entry_df = entry_df["country"].value_counts().to_frame().reset_index().rename(columns={"count":"entries"})

# Medals
country_medal_df = medal_df["country"].value_counts().to_frame().reset_index().rename(columns={"count":"total_medals"})
country_medal_detail_df = medal_df[["country", "medal_type"]].value_counts().to_frame().reset_index().rename(columns={"count":"medals"})
country_medal_detail_df = pd.pivot_table(country_medal_detail_df, index="country", columns="medal_type", values="medals")
country_medal_detail_df.columns = ["_".join(x.lower().split(" "))+"s" for x in country_medal_detail_df.columns]
detail_cols = ["gold_medals", "silver_medals", "bronze_medals"]
country_medal_df = pd.merge(country_medal_df, country_medal_detail_df[detail_cols].reset_index(), how="left", on="country")
country_medal_df[detail_cols] = country_medal_df[detail_cols].fillna(0).astype(int)

# Validate medal details
country_medal_df["checksum"] = country_medal_df[detail_cols].sum(axis=1)
country_medal_df["checksum_match"] = country_medal_df["checksum"] == country_medal_df["total_medals"]
assert country_medal_df["checksum_match"].sum() == len(country_medal_df)
country_medal_df.drop(columns=["checksum", "checksum_match"], inplace=True)

# Join
country_df = pd.merge(country_entry_df, country_medal_df, how="left", on="country")
country_df[["total_medals"]+detail_cols] = country_df[["total_medals"]+detail_cols].fillna(0).astype(int)

# Weighted medal count
def calc_weighted_medals(x, weights=[2, 1.5, 1], rate=False):
    gold_sum = x["gold_medals"] * weights[0]
    silver_sum = x["silver_medals"] * weights[1]
    bronze_sum = x["bronze_medals"] * weights[2]
    result = gold_sum + silver_sum + bronze_sum
    if rate: result /= x["entries"]
    return result
country_df["total_medals_weighted"] = country_df.apply(lambda x: calc_weighted_medals(x), axis=1)

# Unweighted medal rate
country_df["medal_rate_unweighted"] = country_df["total_medals"] / country_df["entries"]

# Weighted medal rate
country_df["medal_rate_weighted"] = country_df.apply(lambda x: calc_weighted_medals(x, rate=True), axis=1)

# Add ranks and sort
country_df["medal_rate_unweighted_rank"] = country_df["medal_rate_unweighted"].rank(method="min", ascending=False)
country_df["medal_rate_weighted_rank"] = country_df["medal_rate_weighted"].rank(method="min", ascending=False)
country_df = country_df.sort_values(["medal_rate_weighted_rank", "medal_rate_unweighted_rank"]).reset_index(drop=True)

# Export and view sample
country_df.to_csv(f"out/country_medal_rankings.csv", index=False)
country_df

countryentriestotal_medalsgold_medalssilver_medalsbronze_medalstotal_medals_weightedmedal_rate_unweightedmedal_rate_weightedmedal_rate_unweighted_rankmedal_rate_weighted_rank
0Saint Lucia521103.50.4000000.7000001.01.0
1Dominica411002.00.2500000.5000009.02.0
2DPR Korea1560247.00.4000000.4666671.03.0
3IR Iran411236318.00.2926830.4390245.04.0
4Bahrain1542116.50.2666670.4333336.05.0
....................................
201Solomon Islands200000.00.0000000.00000093.093.0
202Liechtenstein100000.00.0000000.00000093.093.0
203Somalia100000.00.0000000.00000093.093.0
204Belize100000.00.0000000.00000093.093.0
205Nauru100000.00.0000000.00000093.093.0

206 rows × 11 columns



3. Plots

3.1 Medal Wins vs Chances

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Setup
config = {
    # Data
    "name_col": "country",
    "size_col": "entries",
    "color_col": "entries",
    "x_axis_col": "entries",
    "y_axis_col": "total_medals",
    # Labels
    "title": "Medal Wins vs Chances",
    "x_axis_title": "Medal Chances",
    "y_axis_title": "Medal Wins",
    # Formatting
    "min_point_size": 5,
    "max_point_size": 30,
    "x_axis_dtick": 20,
    "y_axis_dtick": 10,
    "width": 800,
    "height": 800
}

# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/medal_wins_vs_chances.html")

3.2 Weighted Medal Wins vs Chances

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Setup
config = {
    # Data
    "name_col": "country",
    "size_col": "entries",
    "color_col": "entries",
    "x_axis_col": "entries",
    "y_axis_col": "total_medals_weighted",
    # Labels
    "title": "Weighted Medal Wins vs Chances",
    "x_axis_title": "Medal Chances",
    "y_axis_title": "Weighted Medal Wins (Gold = 2, Silver = 1.5, Bronze = 1)",
    # Formatting
    "min_point_size": 5,
    "max_point_size": 30,
    "x_axis_dtick": 20,
    "y_axis_dtick": 10,
    "width": 800,
    "height": 800
}

# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/weighted_medal_wins_vs_chances.html")

3.3 Medal Rate vs Chances

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Setup
config = {
    # Data
    "name_col": "country",
    "size_col": "entries",
    "color_col": "entries",
    "x_axis_col": "entries",
    "y_axis_col": "medal_rate_unweighted",
    # Labels
    "title": "Medal Rate vs Chances",
    "x_axis_title": "Medal Chances",
    "y_axis_title": "Medal Rate",
    # Formatting
    "min_point_size": 5,
    "max_point_size": 30,
    "x_axis_dtick": 20,
    "y_axis_dtick": 0.1,
    "width": 800,
    "height": 800
}

# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/medal_rate_vs_chances.html")

3.4 Weighted Medal Rate vs Chances

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Setup
config = {
    # Data
    "name_col": "country",
    "size_col": "entries",
    "color_col": "entries",
    "x_axis_col": "entries",
    "y_axis_col": "medal_rate_weighted",
    # Labels
    "title": "Weighted Medal Rate vs Chances",
    "x_axis_title": "Medal Chances",
    "y_axis_title": "Weighted Medal Rate (Gold = 2, Silver = 1.5, Bronze = 1)",
    # Formatting
    "min_point_size": 5,
    "max_point_size": 30,
    "x_axis_dtick": 20,
    "y_axis_dtick": 0.1,
    "width": 800,
    "height": 800
}

# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/weighted_medal_rate_vs_chances.html")

1


3. Other analysis

1
2
country_df[["entries", "total_medals"]].corr()

entriestotal_medals
entries1.000000.89764
total_medals0.897641.00000
1
2
country_df[["entries", "medal_rate_unweighted", "medal_rate_weighted"]].corr()

entriesmedal_rate_unweightedmedal_rate_weighted
entries1.0000000.3723750.392037
medal_rate_unweighted0.3723751.0000000.976465
medal_rate_weighted0.3920370.9764651.000000
1
This post is licensed under CC BY 4.0 by the author.