[{"body":"","link":"https://miguswong.github.io/","section":"","tags":null,"title":""},{"body":"","link":"https://miguswong.github.io/tags/data-analysis/","section":"tags","tags":null,"title":"Data Analysis"},{"body":"","link":"https://miguswong.github.io/tags/data-visualization/","section":"tags","tags":null,"title":"Data Visualization"},{"body":"","link":"https://miguswong.github.io/tags/index/","section":"tags","tags":null,"title":"Index"},{"body":" Project Background The goal of this project is to investigate Ironman 140.6 race/performance data in order to provide recommendations for future marketing budget strategies.\nThe IRONMAN Group is a global organization that operates a wide range of endurance sports events. They are best known for their IRONMAN Triathlon Series, which includes full-distance and half-distance triathlons. The full IRONMAN race consists of a 2.4-mile swim, a 112-mile bike ride, and a 26.2-mile marathon run. This group is owned by Advance, a private, family-owned business, and Orkila Capital, a growth equity firm. They have grown significantly since their inception in 1978 and now host events in over 55 countries.\nThe goal of this data analysis is to identify potential emerging demographic trends based on participant data.\nUseful Resources Python Analysis Notebook: here Kaggle Dataset: here Webscraping Process: here\nDataset Structure The process of data collection can be further explored in this repository\nExecutive Summary Ironman participation is still recovering toward pre-pandemic levels in European and North American locations. However, Asia has already reached pre-pandemic participant levels and was the only continent to see consistent growth in participants since 2021. Competing in Ironman competitions is at an all-time high for younger age groups. 
Both the 18-24 and 25-29 age groups had more participants in 2024 than in any previous year, including 2014, when Ironman was at the height of its popularity. Insights Deep-Dive Shifts in the Global Market Ironman races reached peak participant saturation around 2014 and then saw steady participant numbers (which could indicate that demand to participate in these races exceeded the spots available). However, with the COVID-19 pandemic in 2020, a majority of races were shut down and deferred. Participants and races did not rebound until 2022, which saw a sharp increase in the number of races, but participant numbers remained lower than pre-pandemic levels. The most likely explanation is that the effects of COVID-19 were still being felt: races that were supposed to happen in 2020 took place in 2021 or 2022, and a significant portion of customers opted for a ticket refund rather than a deferred entry into their Ironman race.\n2023 was the first year in which operations had largely returned to \u0026quot;normalcy\u0026quot; and the deferred races had taken place. Total participants have not yet recovered to previous levels, but there are indications that triathlon racing is returning to its original popularity. The total number of Ironman race offerings has decreased, with Europe seeing the largest drop, from 18 races in 2023 to just 14 in 2024. Despite this, there are no signs that Ironman is struggling to sell entries for the races it offers, considering that the total number of participants in 2024 was a nearly 5% increase over the previous year. This is reflected in the sharp increase in participants per race for Europe in the graph below.\nWhile Europe and North America continue to account for a large portion of Ironman races in the series, their participation is still nowhere near pre-pandemic levels. This is not the same story for Asia, which could be seen as a quickly growing market. 
Despite having little to no change in the total number of offered races (from 7 to 8 races per year for the past decade), Asia has not only recovered but surpassed the total participant levels it had even prior to Ironman's 2014 peak. This is likely a result of a quickly growing middle class with more disposable income to spend on these types of athletic leisure events.\nYear | Continent | Total Races | Participant Count | % Change in Participants\n2022 | Asia Pacific | 7 | 5191 |\n2023 | Asia Pacific | 8 | 6739 | 29.82%\n2024 | Asia Pacific | 8 | 7467 | 10.80%\n2022 | Europe | 24 | 34383 |\n2023 | Europe | 18 | 26456 | -23.05%\n2024 | Europe | 14 | 27946 | 5.63%\n2022 | North America | 15 | 23885 |\n2023 | North America | 11 | 17324 | -27.47%\n2024 | North America | 10 | 17609 | 1.65%\nChanges in Age Demographic Customers Competing in Ironman competitions is at an all-time high for younger age groups. Both the 18-24 and 25-29 age groups had more participants in 2024 than in any previous year, including 2014, when Ironman was at the height of its popularity in terms of participant numbers. Although the field has generally been (and still is) dominated by athletes 30 and older, there appears to be a shift toward younger generations investing their time and resources in endurance events such as Ironman.\nYear | Division | Count | Change\n2024 | M18-24 | 2107 | 54.4%\n2024 | M80-84 | 15 | 36.4%\n2024 | M25-29 | 4559 | 28.3%\n2024 | M70-74 | 212 | 18.4%\n2024 | M30-34 | 6159 | 11.7%\n2024 | M60-64 | 2037 | 9.8%\n2024 | M65-69 | 644 | 8.1%\n2024 | M55-59 | 3912 | 2.7%\n2024 | M35-39 | 6264 | 1.3%\n2024 | M50-54 | 6410 | -2.6%\n2024 | M40-44 | 7084 | -3.3%\n2024 | M45-49 | 6451 | -6.0%\n2024 | M75-79 | 43 | -8.5%\nRecommendations Make Ironman races more accessible to younger generations: Younger generations show a growing interest in participating in Ironman races, particularly the 18-24 age group. While several older age groups are seeing a decline, this demographic is expanding. 
However, the high cost of participation, with registration fees around $1,000, can be a significant barrier for these young athletes. This financial hurdle may deter many from participating despite their interest. Host more events in Asia: Asian markets show strong potential for continued increases in participation and should be a larger focus going forward. Hosting more events in Asia would not only bring exposure to the race itself but could also be an attractive offering for athletes looking for a \u0026quot;vacation race\u0026quot; destination, especially those from Europe and North America. Clarifying Questions, Assumptions, and Caveats Official results from the Ironman website were not used in this analysis. Instead, a proxy website was scraped to obtain all the triathlon results (here). ","link":"https://miguswong.github.io/post/ironmandataanalysis/","section":"post","tags":["Personal Projects","Python","Data Analysis","Data Visualization","Sports"],"title":"Ironman Triathlon Data - Market Insights"},{"body":"","link":"https://miguswong.github.io/tags/personal-projects/","section":"tags","tags":null,"title":"Personal Projects"},{"body":"","link":"https://miguswong.github.io/post/","section":"post","tags":["index"],"title":"Posts"},{"body":"","link":"https://miguswong.github.io/tags/python/","section":"tags","tags":null,"title":"Python"},{"body":"","link":"https://miguswong.github.io/tags/sports/","section":"tags","tags":null,"title":"Sports"},{"body":"","link":"https://miguswong.github.io/tags/","section":"tags","tags":null,"title":"Tags"},{"body":"","link":"https://miguswong.github.io/tags/webscraping/","section":"tags","tags":null,"title":"Webscraping"},{"body":"Introduction Ironman race results data was scraped from a third-party website for the purpose of exploratory data analysis (EDA). 
All the data and code used for extracting results data can be found at the following links:\nGitHub Repository here\nKaggle Dataset here\nThe files contained in the GitHub repository (mainly the Jupyter notebook and Python scripts) were used to scrape 140.6 Ironman race results ranging from 2002 to 2024 (as of 12-05-2024). Note that the data was not scraped from the official Ironman website but from a proxy website not owned by Ironman.\nThe notebook and scripts were designed to generate 3 CSVs that follow the format of a standard relational database, can be joined together using various IDs, and can be readily uploaded to a SQL database.\nHow to run Web scraping begins in IronMan Scraping (Scrape Only).ipynb. All cells should be run in order from top to bottom, which should result in the series and race data being generated.\nThe case for parallel processing Individual race results data is dynamically loaded onto the webpage, meaning that BeautifulSoup4, which was used to scrape other information from the website, cannot be used here. Instead, Selenium, which allows for automated browser testing, was used here to actually load the data. The main caveat is that dynamically loading result data and launching a physical browser on the computer is painfully slow. Testing showed that, on average, Selenium was able to process ~12 rows of data per second; with over 1,000,000 rows of data, this process would have taken around 25 hours to complete. Furthermore, using the notebook to scrape results data meant that if the script failed at any point, the cell would have to be restarted, which is especially problematic with these long processing times.\nInstead, a Python script using the subprocess module was used to launch multiple browsers and scrape races simultaneously. There are 2 .py files: master.py and worker.py.\nmaster.py functions as the coordinator of web scraping and launches the instances of worker.py. 
Depending on the number of subprocesses requested, the script will partition the work equally (based on the number of races, not the number of rows to scrape) and kick off instances of worker.py. If some races have already been scraped and their CSV files have already been generated, master.py will skip these races and remove them from consideration for scraping.\nworker.py is the script that actually handles browser launching, web scraping, and file generation. The script is passed index information from master.py indicating which races it is responsible for and iterates through that list. There are instances in which Selenium may unexpectedly fail to scrape web pages and the worker instance may immediately exit. In these scenarios, master.py was rerun; because it skips races that already have a CSV file, only the remaining races are scraped.\nBelow is an example of how the script can be used, including how to specify the number of worker instances you want (in this case, 8):\npython master.py --num_workers 8\nNote that increasing the number of workers does not necessarily increase web scraping performance linearly. There are diminishing returns for launching more browsers simultaneously: at around 10 workers, individual worker performance dropped to ~5 rows/second. Worker scripts will generate CSV files and place them in the following file path: \u0026quot;./IronManData/raceResultsData/\u0026quot;. 
The Python notebook contains code to combine all these CSVs and place them in the same file path as the races and series data.\nimport os\n\nimport pandas as pd\n\n# After each individual CSV has been created, they need to be combined into a single \u0026#34;master\u0026#34; CSV\n\n# Directory containing the CSV files\ndirectory = \u0026#39;./IronManData/raceResultsData\u0026#39;\n\n# Initialize an empty list to store individual DataFrames\ndata_frames = []\n\n# Iterate through all CSV files in the directory\nfor filename in os.listdir(directory):\n    if filename.endswith(\u0026#39;.csv\u0026#39;):\n        file_path = os.path.join(directory, filename)\n        # Read the CSV file\n        df = pd.read_csv(file_path)\n        # Append the DataFrame to the list\n        data_frames.append(df)\n\n# Concatenate all DataFrames in the list\ncombined_df = pd.concat(data_frames, ignore_index=True)\n\n# Write the combined DataFrame to a new CSV file\ncombined_df.to_csv(\u0026#39;./IronManData/sql/results.csv\u0026#39;, index=False)\n\nprint(\u0026#34;All CSV files combined successfully!\u0026#34;)\nResulting Dataset 3 CSVs should be generated at this point and should contain all the relevant information about Ironman results from 2002-2024.\nSeries.csv\nColumn | Description\nid | Unique identifier for the series (Primary Key)\nlocation | Location of the series\ncontinent | Continent where the series is held\nlink | URL link to the series details\nRaces.csv\nColumn | Description\nyear | Year of the race\nlink | URL link to the race details\ntotalkonaSlots | Total Kona slots available\nmaleKonaSlots | Kona slots available for males\nfemaleKonaSlots | Kona slots available for females\nmale1st | Time of the first male finisher\nfemale1st | Time of the first female finisher\nfinishers | Total number of finishers\ndnf | Number of Did Not Finish (DNF)\ndq | Number of Disqualifications (DQ)\nid | Unique identifier for the race (Primary Key)\nseriesID | Identifier for the series (Foreign Key)\nResults.csv\nColumn | Description\nbib | Bib number of the participant\nname | Name of the participant\nathleteLink | URL link to the athlete's profile\ncountry | Country of the participant\ngender | Gender of the participant\ndivision | Division category of the participant\ndivLink | URL link to the division details\ndivisionRank | Rank of the participant in their division\noverallTime | Total time taken by the participant\noverallRank | Overall rank of the participant\nswimTime | Swim time of the participant\nswimRank | Swim rank of the participant\nbikeTime | Bike time of the participant\nbikeRank | Bike rank of the participant\nrunTime | Run time of the participant\nrunRank | Run rank of the participant\nfinishStatus | Finish status of the participant\ndnf | Did Not Finish status\nraceID | Identifier for the race (Foreign Key)\nathleteID | Identifier for the athlete ","link":"https://miguswong.github.io/post/webscraping_iroman_results/","section":"post","tags":["Personal Projects","Python","Webscraping","Sports"],"title":"Webscraping Ironman Triathlon Results"},{"body":"Introduction Hey Everyone,\nMy name is Migus Wong. I am creating this website with Hugo for my Week 2 assignment for MSDS 431 - Data Engineering with Go! However, this is something that I have been wanting to do for a while, so I look forward to adding more content to the webpage and using this site to track my personal projects and endeavors.\nUltimately, this website was created with the use of Hugo. 
While the Hugo community is huge and there is plenty of documentation out there for using GitHub Pages and Hugo, I prefer video-style tutorials; this was the one that I followed along with.\nIf you are interested in creating a website that looks like this, I would highly suggest checking out the Hugo Clarity Theme, which also includes an example site to get you started building out your website.\n","link":"https://miguswong.github.io/post/my-first-post/","section":"post","tags":["Personal"],"title":"My First Post: Hello World!"},{"body":"","link":"https://miguswong.github.io/tags/personal/","section":"tags","tags":null,"title":"Personal"},{"body":" Hello, and welcome to my website! I'm Migus Wong, currently a full-time grad student at Northwestern University pursuing a Master of Science in Data Science.\nProfessional Summary Results-driven Data Science graduate student with a strong background in healthcare technology (pre and post implementation) and process engineering looking to progress in a machine learning and model-building role while enhancing leadership skills. Proven expertise in customer success, project management, and the implementation of AI-based solutions in clinical settings.\nEducation Northwestern University School of Professional Studies Evanston, IL Master of Science in Data Science December 2025 (Expected) GPA: 3.940 Specialization in Data Engineering Relevant Coursework: Database Systems, Applied Statistics with R, Data Engineering with Go, Practical Machine Learning Colorado School of Mines Golden, CO Bachelor of Science in Chemical Engineering December 2021 GPA: 3.715 NCAA Division II Swimmer - D2 ADA Academic Achievement Award Recipient Capstone Project: Led a group of 4 other classmates through the development and presentation of a batch process design intended to convert hemp waste into carbon nanotubes. 
Programming and Technical Skills Language or Software Competency Applications R Intermediate Data Analysis \u0026amp; Visualization Python Intermediate Data Engineering (Pandas), Web Scraping (BeautifulSoup, Selenium) SQL Intermediate PostgreSQL MS Office Expert Excel, Word, PowerPoint Tableau Data Visualization Go Beginner Concurrent data processing Experience Technical Solutions Engineer Verona, WI Epic Systems June 2022 - September 2024 Coordinated with 50+ Hospital IT Analysts across two large Hospital Organizations on projects related to system maintenance, optimization, and regulatory reporting. Organized twice-a-year clinician-focused webinars for over 30 medical specialties. Grew average webinar attendance by 100% over the course of 18 months. Customer Success Lead for the company’s Genetics application module. Regularly met with R\u0026D leads and customers to better understand current functionality gaps between Epic and its competitors. Successfully led the implementation and iterative improvement of OpenAI’s GPT-4o model to aid clinical support staff in responding to patient medical advice requests, leading to over a 30% usage of unedited generated messages. Identified, developed, and implemented a solution to correct over 3,000,000 erroneous lab results in time for the customer to submit to regulatory bodies. Field Engineer Golden, CO Torus Americas January 2022 - April 2022 Collaborated with other engineers in the precise installation and maintenance of automated metrology gauges used for quality testing in the can manufacturing industry. Traveled across the country 75% of the time working directly with line technicians as well as operational leadership. 
Additional Information Core Competencies: Project Management, Strong Leadership and team member skills, Decision Analytics; Statistical Knowledge, Customer Service, Escalation management/ Prioritization Interests: Travel Experiences; Fitness (Running, Triathlons, Powerlifting), Lifelong learning, PADI Open Water Diver ","link":"https://miguswong.github.io/about/","section":"","tags":null,"title":"About Me"},{"body":"","link":"https://miguswong.github.io/archives/","section":"","tags":null,"title":""},{"body":"","link":"https://miguswong.github.io/categories/","section":"categories","tags":null,"title":"Categories"},{"body":"","link":"https://miguswong.github.io/series/","section":"series","tags":null,"title":"Series"}]