Olympic medals
Olympic medal counts are kind of boring - they're highly correlated with team size. Medal rates tell a more interesting story.
0. Setup
1
2
3
4
5
6
7
8
9
10
# Load the autoreload extension
%load_ext autoreload
# Enable autoreloading for all modules
%autoreload 2
# Import local modules
from util_core import *
import olympic_medals.util_olympic_medals as U
1
2
The autoreload extension is already loaded. To reload it, use:
%reload_ext autoreload
1. Data load
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# Base loads
all_data = U.load_base_datasets()
medal_df = all_data["medal_df"]
event_df = all_data["event_df"]
team_df = all_data["team_df"]
athlete_df = all_data["athlete_df"]
# Combined entry data
entry_df = U.build_combined_event_entries(team_df, athlete_df)
all_data["entry_df"] = entry_df
# Store
save_object(all_data, "in/olympic_medals_all_data.p")
1
2
3
4
5
6
Notice: Dropped 358/1698 (21.1%) non-current team records.
null_df: 0
Notice: Dropped 3/11113 (0.0%) non-current team records.
Exploded to 14978 records, one per athlete per event entered.
Notice: Dropped 435/14978 (2.9%) invalid event records.
Notice: Dropped 6494/14543 (44.7%) team event records.
1.1 Medals
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# Side quest - how many medals were awarded in each event?
count_df = medal_df["event_key"].value_counts().to_frame().reset_index()
# Print summary stats
total_e = len(count_df)
gt3 = len(count_df[count_df["count"]>3])
eq3 = len(count_df[count_df["count"]==3])
lt3 = len(count_df[count_df["count"]<3])
print(f"Isolated a total of {total_e} events awarding {len(medal_df)} cummulative medals...")
print(f"{gt3} ({gt3/total_e*100:.1f}%) events awarded 4+ medals")
print(f"{eq3} ({eq3/total_e*100:.1f}%) events awarded 3 medals")
print(f"{lt3} ({lt3/total_e*100:.1f}%) events awarded <3 medals")
# Look at 57 events with 4+ medals
gt3_df = count_df[count_df["count"]>3].copy()
gt3_df["discipline"] = gt3_df["event_key"].apply(lambda x: x.split(": ")[0])
print(gt3_df["discipline"].value_counts())
# View sample
medal_df.head()
1
2
3
4
5
6
7
8
9
10
11
12
13
Isolated a total of 329 events awarding 1044 cummulative medals...
57 (17.3%) events awarded 4+ medals
272 (82.7%) events awarded 3 medals
0 (0.0%) events awarded <3 medals
discipline
Wrestling 18
Judo 15
Boxing 13
Taekwondo 8
Athletics 1
Canoe Sprint 1
Artistic Gymnastics 1
Name: count, dtype: int64
medal_type | medal_code | medal_date | name | gender | discipline | event | event_type | url_event | code | country_code | country | country_long | event_key | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Gold Medal | 1.0 | 2024-07-27 | Remco EVENEPOEL | M | Cycling Road | Men's Individual Time Trial | ATH | /en/paris-2024/results/cycling-road/men-s-indi... | 1903136 | BEL | Belgium | Belgium | Cycling Road: Men's Individual Time Trial |
1 | Silver Medal | 2.0 | 2024-07-27 | Filippo GANNA | M | Cycling Road | Men's Individual Time Trial | ATH | /en/paris-2024/results/cycling-road/men-s-indi... | 1923520 | ITA | Italy | Italy | Cycling Road: Men's Individual Time Trial |
2 | Bronze Medal | 3.0 | 2024-07-27 | Wout van AERT | M | Cycling Road | Men's Individual Time Trial | ATH | /en/paris-2024/results/cycling-road/men-s-indi... | 1903147 | BEL | Belgium | Belgium | Cycling Road: Men's Individual Time Trial |
3 | Gold Medal | 1.0 | 2024-07-27 | Grace BROWN | W | Cycling Road | Women's Individual Time Trial | ATH | /en/paris-2024/results/cycling-road/women-s-in... | 1940173 | AUS | Australia | Australia | Cycling Road: Women's Individual Time Trial |
4 | Silver Medal | 2.0 | 2024-07-27 | Anna HENDERSON | W | Cycling Road | Women's Individual Time Trial | ATH | /en/paris-2024/results/cycling-road/women-s-in... | 1912525 | GBR | Great Britain | Great Britain | Cycling Road: Women's Individual Time Trial |
1.2 Events
1
2
3
4
5
6
# Print summary stats and view sample
total_e2 = len(event_df)
assert total_e2 == total_e, f"Warning total_e: {total_e}, total_e2: {total_e2}"
print(f"Isolated a total of {total_e2} events...")
event_df.head()
1
Isolated a total of 329 events...
event | tag | sport | sport_code | sport_url | event_key | |
---|---|---|---|---|---|---|
0 | Men's Individual | archery | Archery | ARC | https://olympics.com/en/paris-2024/sports/archery | Archery: Men's Individual |
1 | Women's Individual | archery | Archery | ARC | https://olympics.com/en/paris-2024/sports/archery | Archery: Women's Individual |
2 | Men's Team | archery | Archery | ARC | https://olympics.com/en/paris-2024/sports/archery | Archery: Men's Team |
3 | Women's Team | archery | Archery | ARC | https://olympics.com/en/paris-2024/sports/archery | Archery: Women's Team |
4 | Mixed Team | archery | Archery | ARC | https://olympics.com/en/paris-2024/sports/archery | Archery: Mixed Team |
1.3 Teams
1
2
3
# View sample
team_df.head()
code | current | team | team_gender | country_code | country | country_long | discipline | disciplines_code | events | athletes | coaches | athletes_codes | num_athletes | coaches_codes | num_coaches | event_key | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ARCMTEAM3---CHN01 | True | People's Republic of China | M | CHN | China | People's Republic of China | Archery | ARC | Men's Team | ['KAO Wenchao', 'LI Zhongyuan', 'WANG Yan'] | NaN | ['1913366', '1913367', '1913369'] | 3.0 | NaN | NaN | Archery: Men's Team |
1 | ARCMTEAM3---COL01 | True | Colombia | M | COL | Colombia | Colombia | Archery | ARC | Men's Team | ['ARCILA Santiago', 'ENRIQUEZ Jorge', 'HERNAND... | NaN | ['1935642', '1543412', '1935644'] | 3.0 | NaN | NaN | Archery: Men's Team |
2 | ARCMTEAM3---FRA01 | True | France | M | FRA | France | France | Archery | ARC | Men's Team | ['ADDIS Baptiste', 'CHIRAULT Thomas', 'VALLADO... | NaN | ['1541270', '1541272', '1541275'] | 3.0 | NaN | NaN | Archery: Men's Team |
3 | ARCMTEAM3---GBR01 | True | Great Britain | M | GBR | Great Britain | Great Britain | Archery | ARC | Men's Team | ['HALL Conor', 'HALL Tom', 'WISE Alex'] | NaN | ['1560988', '1560989', '1561003'] | 3.0 | NaN | NaN | Archery: Men's Team |
4 | ARCMTEAM3---IND01 | True | India | M | IND | India | India | Archery | ARC | Men's Team | ['BOMMADEVARA Dhiraj', 'JADHAV Pravin Ramesh',... | NaN | ['1546108', '1546112', '1546110'] | 3.0 | NaN | NaN | Archery: Men's Team |
1.4 Athletes
1
2
3
# View sample
athlete_df.head()
code | current | name | name_short | name_tv | gender | function | country_code | country | country_long | ... | lang | coach | reason | hero | influence | philosophy | sporting_relatives | ritual | other_sports | event_key | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1532872 | True | ALEKSANYAN Artur | ALEKSANYAN A | Artur ALEKSANYAN | Male | Athlete | ARM | Armenia | Armenia | ... | Armenian, English, Russian | Gevorg Aleksanyan (ARM), father | He followed his father and his uncle into the ... | Footballer Zinedine Zidane (FRA), World Cup wi... | His father, Gevorg Aleksanyan | "Wrestling is my life." (mediamax.am. 18 May 2... | NaN | NaN | NaN | Wrestling: Men's Greco-Roman 97kg |
1 | 1532873 | True | AMOYAN Malkhas | AMOYAN M | Malkhas AMOYAN | Male | Athlete | ARM | Armenia | Armenia | ... | Armenian | NaN | NaN | NaN | NaN | "To become a good athlete, you first have to b... | Uncle, Roman Amoyan (wrestling), 2008 Olympic ... | NaN | NaN | Wrestling: Men's Greco-Roman 77kg |
2 | 1532874 | True | GALSTYAN Slavik | GALSTYAN S | Slavik GALSTYAN | Male | Athlete | ARM | Armenia | Armenia | ... | Armenian | Personal: Martin Alekhanyan (ARM).<br>National... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Wrestling: Men's Greco-Roman 67kg |
3 | 1532944 | True | HARUTYUNYAN Arsen | HARUTYUNYAN A | Arsen HARUTYUNYAN | Male | Athlete | ARM | Armenia | Armenia | ... | Armenian | National: Habetnak Kurghinyan | While doing karate he noticed wrestlers traini... | Wrestler Armen Nazaryan (ARM, BUL), two-time O... | NaN | “Nothing is impossible, set goals in front of ... | NaN | NaN | NaN | Wrestling: Men's Freestyle 57kg |
4 | 1532945 | True | TEVANYAN Vazgen | TEVANYAN V | Vazgen TEVANYAN | Male | Athlete | ARM | Armenia | Armenia | ... | Armenian, Russian | National: Habetnak Kurghinyan (ARM) | “My family did not like wrestling very much. A... | NaN | NaN | NaN | NaN | NaN | NaN | Wrestling: Men's Freestyle 65kg |
5 rows × 37 columns
1.5 Combined entries
1
2
3
# View sample
entry_df.head()
event_key | gender | country_code | country | |
---|---|---|---|---|
0 | Archery: Men's Team | M | CHN | China |
1 | Archery: Men's Team | M | COL | Colombia |
2 | Archery: Men's Team | M | FRA | France |
3 | Archery: Men's Team | M | GBR | Great Britain |
4 | Archery: Men's Team | M | IND | India |
2. Medal rates
1
2
3
4
5
6
7
8
9
# Load
all_data = load_object("in/olympic_medals_all_data.p")
medal_df = all_data["medal_df"]
event_df = all_data["event_df"]
team_df = all_data["team_df"]
athlete_df = all_data["athlete_df"]
entry_df = all_data["entry_df"]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
# Now...we can count entries by country and medals by country
# Entries
country_entry_df = entry_df["country"].value_counts().to_frame().reset_index().rename(columns={"count":"entries"})
# Medals
country_medal_df = medal_df["country"].value_counts().to_frame().reset_index().rename(columns={"count":"total_medals"})
country_medal_detail_df = medal_df[["country", "medal_type"]].value_counts().to_frame().reset_index().rename(columns={"count":"medals"})
country_medal_detail_df = pd.pivot_table(country_medal_detail_df, index="country", columns="medal_type", values="medals")
country_medal_detail_df.columns = ["_".join(x.lower().split(" "))+"s" for x in country_medal_detail_df.columns]
detail_cols = ["gold_medals", "silver_medals", "bronze_medals"]
country_medal_df = pd.merge(country_medal_df, country_medal_detail_df[detail_cols].reset_index(), how="left", on="country")
country_medal_df[detail_cols] = country_medal_df[detail_cols].fillna(0).astype(int)
# Validate medal details
country_medal_df["checksum"] = country_medal_df[detail_cols].sum(axis=1)
country_medal_df["checksum_match"] = country_medal_df["checksum"] == country_medal_df["total_medals"]
assert country_medal_df["checksum_match"].sum() == len(country_medal_df)
country_medal_df.drop(columns=["checksum", "checksum_match"], inplace=True)
# Join
country_df = pd.merge(country_entry_df, country_medal_df, how="left", on="country")
country_df[["total_medals"]+detail_cols] = country_df[["total_medals"]+detail_cols].fillna(0).astype(int)
# Weighted medal count
def calc_weighted_medals(x, weights=[2, 1.5, 1], rate=False):
gold_sum = x["gold_medals"] * weights[0]
silver_sum = x["silver_medals"] * weights[1]
bronze_sum = x["bronze_medals"] * weights[2]
result = gold_sum + silver_sum + bronze_sum
if rate: result /= x["entries"]
return result
country_df["total_medals_weighted"] = country_df.apply(lambda x: calc_weighted_medals(x), axis=1)
# Unweighted medal rate
country_df["medal_rate_unweighted"] = country_df["total_medals"] / country_df["entries"]
# Weighted medal rate
country_df["medal_rate_weighted"] = country_df.apply(lambda x: calc_weighted_medals(x, rate=True), axis=1)
# Add ranks and sort
country_df["medal_rate_unweighted_rank"] = country_df["medal_rate_unweighted"].rank(method="min", ascending=False)
country_df["medal_rate_weighted_rank"] = country_df["medal_rate_weighted"].rank(method="min", ascending=False)
country_df = country_df.sort_values(["medal_rate_weighted_rank", "medal_rate_unweighted_rank"]).reset_index(drop=True)
# Export and view sample
country_df.to_csv(f"out/country_medal_rankings.csv", index=False)
country_df
country | entries | total_medals | gold_medals | silver_medals | bronze_medals | total_medals_weighted | medal_rate_unweighted | medal_rate_weighted | medal_rate_unweighted_rank | medal_rate_weighted_rank | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Saint Lucia | 5 | 2 | 1 | 1 | 0 | 3.5 | 0.400000 | 0.700000 | 1.0 | 1.0 |
1 | Dominica | 4 | 1 | 1 | 0 | 0 | 2.0 | 0.250000 | 0.500000 | 9.0 | 2.0 |
2 | DPR Korea | 15 | 6 | 0 | 2 | 4 | 7.0 | 0.400000 | 0.466667 | 1.0 | 3.0 |
3 | IR Iran | 41 | 12 | 3 | 6 | 3 | 18.0 | 0.292683 | 0.439024 | 5.0 | 4.0 |
4 | Bahrain | 15 | 4 | 2 | 1 | 1 | 6.5 | 0.266667 | 0.433333 | 6.0 | 5.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
201 | Solomon Islands | 2 | 0 | 0 | 0 | 0 | 0.0 | 0.000000 | 0.000000 | 93.0 | 93.0 |
202 | Liechtenstein | 1 | 0 | 0 | 0 | 0 | 0.0 | 0.000000 | 0.000000 | 93.0 | 93.0 |
203 | Somalia | 1 | 0 | 0 | 0 | 0 | 0.0 | 0.000000 | 0.000000 | 93.0 | 93.0 |
204 | Belize | 1 | 0 | 0 | 0 | 0 | 0.0 | 0.000000 | 0.000000 | 93.0 | 93.0 |
205 | Nauru | 1 | 0 | 0 | 0 | 0 | 0.0 | 0.000000 | 0.000000 | 93.0 | 93.0 |
206 rows × 11 columns
3. Plots
3.1 Medal Wins vs Chances
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Setup
config = {
# Data
"name_col": "country",
"size_col": "entries",
"color_col": "entries",
"x_axis_col": "entries",
"y_axis_col": "total_medals",
# Labels
"title": "Medal Wins vs Chances",
"x_axis_title": "Medal Chances",
"y_axis_title": "Medal Wins",
# Formatting
"min_point_size": 5,
"max_point_size": 30,
"x_axis_dtick": 20,
"y_axis_dtick": 10,
"width": 800,
"height": 800
}
# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/medal_wins_vs_chances.html")
3.2 Weighted Medal Wins vs Chances
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Setup
config = {
# Data
"name_col": "country",
"size_col": "entries",
"color_col": "entries",
"x_axis_col": "entries",
"y_axis_col": "total_medals_weighted",
# Labels
"title": "Weighted Medal Wins vs Chances",
"x_axis_title": "Medal Chances",
"y_axis_title": "Weighted Medal Wins (Gold = 2, Silver = 1.5, Bronze = 1)",
# Formatting
"min_point_size": 5,
"max_point_size": 30,
"x_axis_dtick": 20,
"y_axis_dtick": 10,
"width": 800,
"height": 800
}
# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/weighted_medal_wins_vs_chances.html")
3.3 Medal Rate vs Chances
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Setup
config = {
# Data
"name_col": "country",
"size_col": "entries",
"color_col": "entries",
"x_axis_col": "entries",
"y_axis_col": "medal_rate_unweighted",
# Labels
"title": "Medal Rate vs Chances",
"x_axis_title": "Medal Chances",
"y_axis_title": "Medal Rate",
# Formatting
"min_point_size": 5,
"max_point_size": 30,
"x_axis_dtick": 20,
"y_axis_dtick": 0.1,
"width": 800,
"height": 800
}
# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/medal_rate_vs_chances.html")
3.4 Weighted Medal Rate vs Chances
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Setup
config = {
# Data
"name_col": "country",
"size_col": "entries",
"color_col": "entries",
"x_axis_col": "entries",
"y_axis_col": "medal_rate_weighted",
# Labels
"title": "Weighted Medal Rate vs Chances",
"x_axis_title": "Medal Chances",
"y_axis_title": "Weighted Medal Rate (Gold = 2, Silver = 1.5, Bronze = 1)",
# Formatting
"min_point_size": 5,
"max_point_size": 30,
"x_axis_dtick": 20,
"y_axis_dtick": 0.1,
"width": 800,
"height": 800
}
# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/weighted_medal_rate_vs_chances.html")
1
3. Other analysis
1
2
country_df[["entries", "total_medals"]].corr()
entries | total_medals | |
---|---|---|
entries | 1.00000 | 0.89764 |
total_medals | 0.89764 | 1.00000 |
1
2
country_df[["entries", "medal_rate_unweighted", "medal_rate_weighted"]].corr()
entries | medal_rate_unweighted | medal_rate_weighted | |
---|---|---|---|
entries | 1.000000 | 0.372375 | 0.392037 |
medal_rate_unweighted | 0.372375 | 1.000000 | 0.976465 |
medal_rate_weighted | 0.392037 | 0.976465 | 1.000000 |
1
This post is licensed under CC BY 4.0 by the author.