Olympic medals

Olympic medal counts are kind of boring - they're highly correlated with team size. Medal rates tell a more interesting story.

Posted Oct 15, 2024

By DT Data

10 min read

0. Setup

  
# Load the autoreload extension
%load_ext autoreload

# Enable autoreloading for all modules
%autoreload 2

# Import local modules
from util_core import *
import olympic_medals.util_olympic_medals as U

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

1. Data load

  
# Base loads
all_data = U.load_base_datasets()

medal_df = all_data["medal_df"]
event_df = all_data["event_df"]
team_df = all_data["team_df"]
athlete_df = all_data["athlete_df"]

# Combined entry data
entry_df = U.build_combined_event_entries(team_df, athlete_df)
all_data["entry_df"] = entry_df

# Store
save_object(all_data, "in/olympic_medals_all_data.p")

Notice: Dropped 358/1698 (21.1%) non-current team records.
null_df: 0
Notice: Dropped 3/11113 (0.0%) non-current team records.
Exploded to 14978 records, one per athlete per event entered.
Notice: Dropped 435/14978 (2.9%) invalid event records.
Notice: Dropped 6494/14543 (44.7%) team event records.

1.1 Medals

  
# Side quest - how many medals were awarded in each event?
count_df = medal_df["event_key"].value_counts().to_frame().reset_index()

# Print summary stats
total_e = len(count_df)
gt3 = len(count_df[count_df["count"]>3])
eq3 = len(count_df[count_df["count"]==3])
lt3 = len(count_df[count_df["count"]<3])
print(f"Isolated a total of {total_e} events awarding {len(medal_df)} cummulative medals...")
print(f"{gt3} ({gt3/total_e*100:.1f}%) events awarded 4+ medals")
print(f"{eq3} ({eq3/total_e*100:.1f}%) events awarded 3 medals")
print(f"{lt3} ({lt3/total_e*100:.1f}%) events awarded <3 medals")

# Look at 57 events with 4+ medals
gt3_df = count_df[count_df["count"]>3].copy()
gt3_df["discipline"] = gt3_df["event_key"].apply(lambda x: x.split(": ")[0])
print(gt3_df["discipline"].value_counts())

# View sample
medal_df.head()

Isolated a total of 329 events awarding 1044 cummulative medals...
57 (17.3%) events awarded 4+ medals
272 (82.7%) events awarded 3 medals
0 (0.0%) events awarded <3 medals
discipline
Wrestling              18
Judo                   15
Boxing                 13
Taekwondo               8
Athletics               1
Canoe Sprint            1
Artistic Gymnastics     1
Name: count, dtype: int64

	medal_type	medal_code	medal_date	name	gender	discipline	event	event_type	url_event	code	country_code	country	country_long	event_key
0	Gold Medal	1.0	2024-07-27	Remco EVENEPOEL	M	Cycling Road	Men's Individual Time Trial	ATH	/en/paris-2024/results/cycling-road/men-s-indi...	1903136	BEL	Belgium	Belgium	Cycling Road: Men's Individual Time Trial
1	Silver Medal	2.0	2024-07-27	Filippo GANNA	M	Cycling Road	Men's Individual Time Trial	ATH	/en/paris-2024/results/cycling-road/men-s-indi...	1923520	ITA	Italy	Italy	Cycling Road: Men's Individual Time Trial
2	Bronze Medal	3.0	2024-07-27	Wout van AERT	M	Cycling Road	Men's Individual Time Trial	ATH	/en/paris-2024/results/cycling-road/men-s-indi...	1903147	BEL	Belgium	Belgium	Cycling Road: Men's Individual Time Trial
3	Gold Medal	1.0	2024-07-27	Grace BROWN	W	Cycling Road	Women's Individual Time Trial	ATH	/en/paris-2024/results/cycling-road/women-s-in...	1940173	AUS	Australia	Australia	Cycling Road: Women's Individual Time Trial
4	Silver Medal	2.0	2024-07-27	Anna HENDERSON	W	Cycling Road	Women's Individual Time Trial	ATH	/en/paris-2024/results/cycling-road/women-s-in...	1912525	GBR	Great Britain	Great Britain	Cycling Road: Women's Individual Time Trial

1.2 Events

  
# Print summary stats and view sample
total_e2 = len(event_df)
assert total_e2 == total_e, f"Warning total_e: {total_e}, total_e2: {total_e2}"
print(f"Isolated a total of {total_e2} events...")
event_df.head()

Isolated a total of 329 events...

	event	tag	sport	sport_code	sport_url	event_key
0	Men's Individual	archery	Archery	ARC	https://olympics.com/en/paris-2024/sports/archery	Archery: Men's Individual
1	Women's Individual	archery	Archery	ARC	https://olympics.com/en/paris-2024/sports/archery	Archery: Women's Individual
2	Men's Team	archery	Archery	ARC	https://olympics.com/en/paris-2024/sports/archery	Archery: Men's Team
3	Women's Team	archery	Archery	ARC	https://olympics.com/en/paris-2024/sports/archery	Archery: Women's Team
4	Mixed Team	archery	Archery	ARC	https://olympics.com/en/paris-2024/sports/archery	Archery: Mixed Team

1.3 Teams

  
# View sample
team_df.head()

	code	current	team	team_gender	country_code	country	country_long	discipline	disciplines_code	events	athletes	coaches	athletes_codes	num_athletes	coaches_codes	num_coaches	event_key
0	ARCMTEAM3---CHN01	True	People's Republic of China	M	CHN	China	People's Republic of China	Archery	ARC	Men's Team	['KAO Wenchao', 'LI Zhongyuan', 'WANG Yan']	NaN	['1913366', '1913367', '1913369']	3.0	NaN	NaN	Archery: Men's Team
1	ARCMTEAM3---COL01	True	Colombia	M	COL	Colombia	Colombia	Archery	ARC	Men's Team	['ARCILA Santiago', 'ENRIQUEZ Jorge', 'HERNAND...	NaN	['1935642', '1543412', '1935644']	3.0	NaN	NaN	Archery: Men's Team
2	ARCMTEAM3---FRA01	True	France	M	FRA	France	France	Archery	ARC	Men's Team	['ADDIS Baptiste', 'CHIRAULT Thomas', 'VALLADO...	NaN	['1541270', '1541272', '1541275']	3.0	NaN	NaN	Archery: Men's Team
3	ARCMTEAM3---GBR01	True	Great Britain	M	GBR	Great Britain	Great Britain	Archery	ARC	Men's Team	['HALL Conor', 'HALL Tom', 'WISE Alex']	NaN	['1560988', '1560989', '1561003']	3.0	NaN	NaN	Archery: Men's Team
4	ARCMTEAM3---IND01	True	India	M	IND	India	India	Archery	ARC	Men's Team	['BOMMADEVARA Dhiraj', 'JADHAV Pravin Ramesh',...	NaN	['1546108', '1546112', '1546110']	3.0	NaN	NaN	Archery: Men's Team

1.4 Athletes

  
# View sample
athlete_df.head()

	code	current	name	name_short	name_tv	gender	function	country_code	country	country_long	...	lang	coach	reason	hero	influence	philosophy	sporting_relatives	ritual	other_sports	event_key
0	1532872	True	ALEKSANYAN Artur	ALEKSANYAN A	Artur ALEKSANYAN	Male	Athlete	ARM	Armenia	Armenia	...	Armenian, English, Russian	Gevorg Aleksanyan (ARM), father	He followed his father and his uncle into the ...	Footballer Zinedine Zidane (FRA), World Cup wi...	His father, Gevorg Aleksanyan	"Wrestling is my life." (mediamax.am. 18 May 2...	NaN	NaN	NaN	Wrestling: Men's Greco-Roman 97kg
1	1532873	True	AMOYAN Malkhas	AMOYAN M	Malkhas AMOYAN	Male	Athlete	ARM	Armenia	Armenia	...	Armenian	NaN	NaN	NaN	NaN	"To become a good athlete, you first have to b...	Uncle, Roman Amoyan (wrestling), 2008 Olympic ...	NaN	NaN	Wrestling: Men's Greco-Roman 77kg
2	1532874	True	GALSTYAN Slavik	GALSTYAN S	Slavik GALSTYAN	Male	Athlete	ARM	Armenia	Armenia	...	Armenian	Personal: Martin Alekhanyan (ARM).<br>National...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Wrestling: Men's Greco-Roman 67kg
3	1532944	True	HARUTYUNYAN Arsen	HARUTYUNYAN A	Arsen HARUTYUNYAN	Male	Athlete	ARM	Armenia	Armenia	...	Armenian	National: Habetnak Kurghinyan	While doing karate he noticed wrestlers traini...	Wrestler Armen Nazaryan (ARM, BUL), two-time O...	NaN	“Nothing is impossible, set goals in front of ...	NaN	NaN	NaN	Wrestling: Men's Freestyle 57kg
4	1532945	True	TEVANYAN Vazgen	TEVANYAN V	Vazgen TEVANYAN	Male	Athlete	ARM	Armenia	Armenia	...	Armenian, Russian	National: Habetnak Kurghinyan (ARM)	“My family did not like wrestling very much. A...	NaN	NaN	NaN	NaN	NaN	NaN	Wrestling: Men's Freestyle 65kg

5 rows × 37 columns

1.5 Combined entries

  
# View sample
entry_df.head()

	event_key	gender	country_code	country
0	Archery: Men's Team	M	CHN	China
1	Archery: Men's Team	M	COL	Colombia
2	Archery: Men's Team	M	FRA	France
3	Archery: Men's Team	M	GBR	Great Britain
4	Archery: Men's Team	M	IND	India

2. Medal rates

  
# Load
all_data = load_object("in/olympic_medals_all_data.p")

medal_df = all_data["medal_df"]
event_df = all_data["event_df"]
team_df = all_data["team_df"]
athlete_df = all_data["athlete_df"]
entry_df = all_data["entry_df"]

  
# Now...we can count entries by country and medals by country

# Entries
country_entry_df = entry_df["country"].value_counts().to_frame().reset_index().rename(columns={"count":"entries"})

# Medals
country_medal_df = medal_df["country"].value_counts().to_frame().reset_index().rename(columns={"count":"total_medals"})
country_medal_detail_df = medal_df[["country", "medal_type"]].value_counts().to_frame().reset_index().rename(columns={"count":"medals"})
country_medal_detail_df = pd.pivot_table(country_medal_detail_df, index="country", columns="medal_type", values="medals")
country_medal_detail_df.columns = ["_".join(x.lower().split(" "))+"s" for x in country_medal_detail_df.columns]
detail_cols = ["gold_medals", "silver_medals", "bronze_medals"]
country_medal_df = pd.merge(country_medal_df, country_medal_detail_df[detail_cols].reset_index(), how="left", on="country")
country_medal_df[detail_cols] = country_medal_df[detail_cols].fillna(0).astype(int)

# Validate medal details
country_medal_df["checksum"] = country_medal_df[detail_cols].sum(axis=1)
country_medal_df["checksum_match"] = country_medal_df["checksum"] == country_medal_df["total_medals"]
assert country_medal_df["checksum_match"].sum() == len(country_medal_df)
country_medal_df.drop(columns=["checksum", "checksum_match"], inplace=True)

# Join
country_df = pd.merge(country_entry_df, country_medal_df, how="left", on="country")
country_df[["total_medals"]+detail_cols] = country_df[["total_medals"]+detail_cols].fillna(0).astype(int)

# Weighted medal count
def calc_weighted_medals(x, weights=[2, 1.5, 1], rate=False):
    gold_sum = x["gold_medals"] * weights[0]
    silver_sum = x["silver_medals"] * weights[1]
    bronze_sum = x["bronze_medals"] * weights[2]
    result = gold_sum + silver_sum + bronze_sum
    if rate: result /= x["entries"]
    return result
country_df["total_medals_weighted"] = country_df.apply(lambda x: calc_weighted_medals(x), axis=1)

# Unweighted medal rate
country_df["medal_rate_unweighted"] = country_df["total_medals"] / country_df["entries"]

# Weighted medal rate
country_df["medal_rate_weighted"] = country_df.apply(lambda x: calc_weighted_medals(x, rate=True), axis=1)

# Add ranks and sort
country_df["medal_rate_unweighted_rank"] = country_df["medal_rate_unweighted"].rank(method="min", ascending=False)
country_df["medal_rate_weighted_rank"] = country_df["medal_rate_weighted"].rank(method="min", ascending=False)
country_df = country_df.sort_values(["medal_rate_weighted_rank", "medal_rate_unweighted_rank"]).reset_index(drop=True)

# Export and view sample
country_df.to_csv(f"out/country_medal_rankings.csv", index=False)
country_df

	country	entries	total_medals	gold_medals	silver_medals	bronze_medals	total_medals_weighted	medal_rate_unweighted	medal_rate_weighted	medal_rate_unweighted_rank	medal_rate_weighted_rank
0	Saint Lucia	5	2	1	1	0	3.5	0.400000	0.700000	1.0	1.0
1	Dominica	4	1	1	0	0	2.0	0.250000	0.500000	9.0	2.0
2	DPR Korea	15	6	0	2	4	7.0	0.400000	0.466667	1.0	3.0
3	IR Iran	41	12	3	6	3	18.0	0.292683	0.439024	5.0	4.0
4	Bahrain	15	4	2	1	1	6.5	0.266667	0.433333	6.0	5.0
...	...	...	...	...	...	...	...	...	...	...	...
201	Solomon Islands	2	0	0	0	0	0.0	0.000000	0.000000	93.0	93.0
202	Liechtenstein	1	0	0	0	0	0.0	0.000000	0.000000	93.0	93.0
203	Somalia	1	0	0	0	0	0.0	0.000000	0.000000	93.0	93.0
204	Belize	1	0	0	0	0	0.0	0.000000	0.000000	93.0	93.0
205	Nauru	1	0	0	0	0	0.0	0.000000	0.000000	93.0	93.0

206 rows × 11 columns

3. Plots

3.1 Medal Wins vs Chances

  
# Setup
config = {
    # Data
    "name_col": "country",
    "size_col": "entries",
    "color_col": "entries",
    "x_axis_col": "entries",
    "y_axis_col": "total_medals",
    # Labels
    "title": "Medal Wins vs Chances",
    "x_axis_title": "Medal Chances",
    "y_axis_title": "Medal Wins",
    # Formatting
    "min_point_size": 5,
    "max_point_size": 30,
    "x_axis_dtick": 20,
    "y_axis_dtick": 10,
    "width": 800,
    "height": 800
}

# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/medal_wins_vs_chances.html")

3.2 Weighted Medal Wins vs Chances

  
# Setup
config = {
    # Data
    "name_col": "country",
    "size_col": "entries",
    "color_col": "entries",
    "x_axis_col": "entries",
    "y_axis_col": "total_medals_weighted",
    # Labels
    "title": "Weighted Medal Wins vs Chances",
    "x_axis_title": "Medal Chances",
    "y_axis_title": "Weighted Medal Wins (Gold = 2, Silver = 1.5, Bronze = 1)",
    # Formatting
    "min_point_size": 5,
    "max_point_size": 30,
    "x_axis_dtick": 20,
    "y_axis_dtick": 10,
    "width": 800,
    "height": 800
}

# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/weighted_medal_wins_vs_chances.html")

3.3 Medal Rate vs Chances

  
# Setup
config = {
    # Data
    "name_col": "country",
    "size_col": "entries",
    "color_col": "entries",
    "x_axis_col": "entries",
    "y_axis_col": "medal_rate_unweighted",
    # Labels
    "title": "Medal Rate vs Chances",
    "x_axis_title": "Medal Chances",
    "y_axis_title": "Medal Rate",
    # Formatting
    "min_point_size": 5,
    "max_point_size": 30,
    "x_axis_dtick": 20,
    "y_axis_dtick": 0.1,
    "width": 800,
    "height": 800
}

# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/medal_rate_vs_chances.html")

3.4 Weighted Medal Rate vs Chances

  
# Setup
config = {
    # Data
    "name_col": "country",
    "size_col": "entries",
    "color_col": "entries",
    "x_axis_col": "entries",
    "y_axis_col": "medal_rate_weighted",
    # Labels
    "title": "Weighted Medal Rate vs Chances",
    "x_axis_title": "Medal Chances",
    "y_axis_title": "Weighted Medal Rate (Gold = 2, Silver = 1.5, Bronze = 1)",
    # Formatting
    "min_point_size": 5,
    "max_point_size": 30,
    "x_axis_dtick": 20,
    "y_axis_dtick": 0.1,
    "width": 800,
    "height": 800
}

# Plot and save html version
fig = plotly_scatter_v1(country_df, config)
fig.write_html("md/olympic_medals/weighted_medal_rate_vs_chances.html")

3. Other analysis

  
country_df[["entries", "total_medals"]].corr()

	entries	total_medals
entries	1.00000	0.89764
total_medals	0.89764	1.00000

  
country_df[["entries", "medal_rate_unweighted", "medal_rate_weighted"]].corr()

	entries	medal_rate_unweighted	medal_rate_weighted
entries	1.000000	0.372375	0.392037
medal_rate_unweighted	0.372375	1.000000	0.976465
medal_rate_weighted	0.392037	0.976465	1.000000

Sports, Olympics

This post is licensed under CC BY 4.0 by the author.