Deploying an MLflow Model with Streamlit: predicting footballer positions

GK, DF, MF, or FW?
deployment
mlflow
classification
machine learning
Author

Li Yong Pan

Published

April 18, 2024

In the past I have done quite a few classification problems, but most of them were code-alongs or used trivial datasets that nearly every data scientist has worked with. With the expanded responsibilities of a data scientist, which are already broadly defined across the industry, I wanted to write about deploying a model with an interface. It is not uncommon to deploy models by serving them as a REST API, but I would only do that for models that could be used at large scale. We will be using MLflow to log our model, both PySpark (on Databricks) and scikit-learn to train a model, Streamlit for the interface of the app, and Docker to deploy it on my Raspberry Pi. Because we use an interface and do not serve this as an API, some additional or different steps are needed compared to only serving it using FastAPI or Flask on a remote server.

Today we will train and deploy a model that predicts, based on footballing stats, whether a player is a goalkeeper, defender, midfielder or forward. I have always had the idea to incorporate football into these types of projects and found this project to be the perfect use case. Our aim in this post is to create a model decent enough to deploy, but it does not necessarily have to be groundbreaking across all metrics. Let’s do it!

Step 1: Finding a dataset

Kaggle is the perfect platform to find these niche datasets. What is Kaggle?

Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.

It also has various user-contributed datasets. The one we will be using today is 2022-2023 Football Player Stats.

Note

There is also an API to download datasets from Kaggle, but it is a little overkill for a small project like this. It would make more sense if this dataset was continuously updated, but it will not be as it only concerns 2022-2023 season data.

Step 2: Exploring context of the data

This dataset contains 2022-2023 football player stats per 90 minutes.

Only players of Premier League, Ligue 1, Bundesliga, Serie A and La Liga are listed.

These competitions are usually referred to as the Top 5 European leagues, based on the UEFA rankings. The stats are expressed per 90 minutes because raw totals would be unfair to compare: a player who attempted 50 passes in 90 minutes was less involved per minute than a player who attempted 50 passes in only 45 minutes. Let’s read the data:

import pandas as pd
df = pd.read_csv("2022_2023_Football_Player_Stats.csv", sep=";", encoding="latin-1")
df.head()
Rk Player Nation Pos Squad Comp Age Born MP Starts ... Off Crs TklW PKwon PKcon OG Recov AerWon AerLost AerWon%
0 1 Brenden Aaronson USA MFFW Leeds United Premier League 22 2000 20 19 ... 0.17 2.54 0.51 0.0 0.0 0.00 4.86 0.34 1.19 22.2
1 2 Yunis Abdelhamid MAR DF Reims Ligue 1 35 1987 22 22 ... 0.05 0.18 1.59 0.0 0.0 0.00 6.64 2.18 1.23 64.0
2 3 Himad Abdelli FRA MFFW Angers Ligue 1 23 1999 14 8 ... 0.00 1.05 1.40 0.0 0.0 0.00 8.14 0.93 1.05 47.1
3 4 Salis Abdul Samed GHA MF Lens Ligue 1 22 2000 20 20 ... 0.00 0.35 0.80 0.0 0.0 0.05 6.60 0.50 0.50 50.0
4 5 Laurent Abergel FRA MF Lorient Ligue 1 30 1993 15 15 ... 0.00 0.23 2.02 0.0 0.0 0.00 6.51 0.31 0.39 44.4

5 rows × 124 columns

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("football").getOrCreate()

df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("dbfs:/FileStore/tables/2022_2023_Football_Player_Stats.csv", sep=";")
df.head()
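
As a quick aside on the per-90 normalisation described above, here is a minimal sketch of how a raw count could be converted into a per-90 figure. The helper name per_90 and the pass counts are made up for illustration; the dataset already ships with the normalised values, so this step is not needed in our pipeline.

```python
def per_90(total: float, minutes: float) -> float:
    """Convert a raw total into a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90 / minutes


# Two hypothetical players who both attempted 50 passes
full_match = per_90(total=50, minutes=90)  # played 90 minutes
half_match = per_90(total=50, minutes=45)  # played 45 minutes

print(full_match, half_match)
```

The second player ends up with twice the rate of the first, which is exactly the difference that raw totals would hide.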

The dataset contains 124 columns, and we will not use all of them for the model.

  • Rk : Rank
  • Player : Player’s name
  • Nation : Player’s nation
  • Pos : Position
  • Squad : Squad’s name
  • Comp : League that squad occupies
  • Age : Player’s age
  • Born : Year of birth
  • MP : Matches played
  • Starts : Matches started
  • Min : Minutes played
  • 90s : Minutes played divided by 90
  • Goals : Goals scored or allowed
  • Shots : Shots total (Does not include penalty kicks)
  • SoT : Shots on target (Does not include penalty kicks)
  • SoT% : Shots on target percentage (Does not include penalty kicks)
  • G/Sh : Goals per shot
  • G/SoT : Goals per shot on target (Does not include penalty kicks)
  • ShoDist : Average distance, in yards, from goal of all shots taken (Does not include penalty kicks)
  • ShoFK : Shots from free kicks
  • ShoPK : Penalty kicks made
  • PKatt : Penalty kicks attempted
  • PasTotCmp : Passes completed
  • PasTotAtt : Passes attempted
  • PasTotCmp% : Pass completion percentage
  • PasTotDist : Total distance, in yards, that completed passes have traveled in any direction
  • PasTotPrgDist : Total distance, in yards, that completed passes have traveled towards the opponent’s goal
  • PasShoCmp : Passes completed (Passes between 5 and 15 yards)
  • PasShoAtt : Passes attempted (Passes between 5 and 15 yards)
  • PasShoCmp% : Pass completion percentage (Passes between 5 and 15 yards)
  • PasMedCmp : Passes completed (Passes between 15 and 30 yards)
  • PasMedAtt : Passes attempted (Passes between 15 and 30 yards)
  • PasMedCmp% : Pass completion percentage (Passes between 15 and 30 yards)
  • PasLonCmp : Passes completed (Passes longer than 30 yards)
  • PasLonAtt : Passes attempted (Passes longer than 30 yards)
  • PasLonCmp% : Pass completion percentage (Passes longer than 30 yards)
  • Assists : Assists
  • PasAss : Passes that directly lead to a shot (assisted shots)
  • Pas3rd : Completed passes that enter the 1/3 of the pitch closest to the goal
  • PPA : Completed passes into the 18-yard box
  • CrsPA : Completed crosses into the 18-yard box
  • PasProg : Completed passes that move the ball towards the opponent’s goal at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area
  • PasAtt : Passes attempted
  • PasLive : Live-ball passes
  • PasDead : Dead-ball passes
  • PasFK : Passes attempted from free kicks
  • TB : Completed pass sent between back defenders into open space
  • Sw : Passes that travel more than 40 yards of the width of the pitch
  • PasCrs : Crosses
  • TI : Throw-Ins taken
  • CK : Corner kicks
  • CkIn : Inswinging corner kicks
  • CkOut : Outswinging corner kicks
  • CkStr : Straight corner kicks
  • PasCmp : Passes completed
  • PasOff : Offsides
  • PasBlocks : Blocked by the opponent who was standing in the path
  • SCA : Shot-creating actions
  • ScaPassLive : Completed live-ball passes that lead to a shot attempt
  • ScaPassDead : Completed dead-ball passes that lead to a shot attempt
  • ScaDrib : Successful dribbles that lead to a shot attempt
  • ScaSh : Shots that lead to another shot attempt
  • ScaFld : Fouls drawn that lead to a shot attempt
  • ScaDef : Defensive actions that lead to a shot attempt
  • GCA : Goal-creating actions
  • GcaPassLive : Completed live-ball passes that lead to a goal
  • GcaPassDead : Completed dead-ball passes that lead to a goal
  • GcaDrib : Successful dribbles that lead to a goal
  • GcaSh : Shots that lead to another goal-scoring shot
  • GcaFld : Fouls drawn that lead to a goal
  • GcaDef : Defensive actions that lead to a goal
  • Tkl : Number of players tackled
  • TklWon : Tackles in which the tackler’s team won possession of the ball
  • TklDef3rd : Tackles in defensive 1/3
  • TklMid3rd : Tackles in middle 1/3
  • TklAtt3rd : Tackles in attacking 1/3
  • TklDri : Number of dribblers tackled
  • TklDriAtt : Number of times dribbled past plus number of tackles
  • TklDri% : Percentage of dribblers tackled
  • TklDriPast : Number of times dribbled past by an opposing player
  • Blocks : Number of times blocking the ball by standing in its path
  • BlkSh : Number of times blocking a shot by standing in its path
  • BlkPass : Number of times blocking a pass by standing in its path
  • Int : Interceptions
  • Tkl+Int : Number of players tackled plus number of interceptions
  • Clr : Clearances
  • Err : Mistakes leading to an opponent’s shot
  • Touches : Number of times a player touched the ball. Note: Receiving a pass, then dribbling, then sending a pass counts as one touch
  • TouDefPen : Touches in defensive penalty area
  • TouDef3rd : Touches in defensive 1/3
  • TouMid3rd : Touches in middle 1/3
  • TouAtt3rd : Touches in attacking 1/3
  • TouAttPen : Touches in attacking penalty area
  • TouLive : Live-ball touches. Does not include corner kicks, free kicks, throw-ins, kick-offs, goal kicks or penalty kicks.
  • ToAtt : Number of attempts to take on defenders while dribbling
  • ToSuc : Number of defenders taken on successfully, by dribbling past them
  • ToSuc% : Percentage of take-ons Completed Successfully
  • ToTkl : Number of times tackled by a defender during a take-on attempt
  • ToTkl% : Percentage of time tackled by a defender during a take-on attempt
  • Carries : Number of times the player controlled the ball with their feet
  • CarTotDist : Total distance, in yards, a player moved the ball while controlling it with their feet, in any direction
  • CarPrgDist : Total distance, in yards, a player moved the ball while controlling it with their feet towards the opponent’s goal
  • CarProg : Carries that move the ball towards the opponent’s goal at least 5 yards, or any carry into the penalty area
  • Car3rd : Carries that enter the 1/3 of the pitch closest to the goal
  • CPA : Carries into the 18-yard box
  • CarMis : Number of times a player failed when attempting to gain control of a ball
  • CarDis : Number of times a player loses control of the ball after being tackled by an opposing player
  • Rec : Number of times a player successfully received a pass
  • RecProg : Completed passes that move the ball towards the opponent’s goal at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area
  • CrdY : Yellow cards
  • CrdR : Red cards
  • 2CrdY : Second yellow card
  • Fls : Fouls committed
  • Fld : Fouls drawn
  • Off : Offsides
  • Crs : Crosses
  • TklW : Tackles in which the tackler’s team won possession of the ball
  • PKwon : Penalty kicks won
  • PKcon : Penalty kicks conceded
  • OG : Own goals
  • Recov : Number of loose balls recovered
  • AerWon : Aerials won
  • AerLost : Aerials lost
  • AerWon% : Percentage of aerials won

The variable that we want to predict is in the Pos column. However, a quick glance shows that some players have more than one position defined.

df["Pos"].unique().tolist()
['MFFW', 'DF', 'MF', 'FWMF', 'FW', 'DFFW', 'MFDF', 'GK', 'DFMF', 'FWDF']
df.select("Pos").distinct().show()

Some players play as midfielder (MF) but also occasionally as a defender (DF), denoted by MFDF. And what about players that have not played many matches? On to some feature engineering!

Step 3: Feature engineering

First thing we will do is keep only the players that have played at least 5 matches.

df = df[df["MP"] >= 5]
df.shape
(2066, 124)
df = df.filter(df["MP"] >= 5)
df.count()

We still have 2066 observations after filtering, which is fine. Second thing to do is get rid of multiple positions for one player.

df["Pos"].apply(len).unique().tolist()
[4, 2]
from pyspark.sql.functions import col, length

# Distinct lengths of the position string
df.select(length(col("Pos")).alias("PosLen")).distinct().collect()

The output tells us that Pos has either length 2 or 4. For the sake of simplicity we will use the first two letters and define that as the position of the player.

df.loc[:, "PosMod"] = df["Pos"].str[:2]
df["PosMod"].unique().tolist()
['MF', 'DF', 'FW', 'GK']
from pyspark.sql.functions import col, substring

df = df.withColumn("PosMod", substring(col("Pos"), 1, 2))
df.select("PosMod").distinct().show()

4 positions, perfect! The final thing to do with the features is encode the positions into numerical values.

from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
df["Class"] = label_encoder.fit_transform(df["PosMod"])
label_encoder.classes_.tolist()
['DF', 'FW', 'GK', 'MF']
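from pyspark.ml.feature import StringIndexer

# StringIndexer orders labels by frequency by default; alphabetAsc
# reproduces the alphabetical order that LabelEncoder uses
indexer = StringIndexer(inputCol="PosMod", outputCol="Class", stringOrderType="alphabetAsc")
df = indexer.fit(df).transform(df)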

The encoding is as follows

Position (Class) Encoding
DF = Defender 0
FW = Forward 1
GK = Goalkeeper 2
MF = Midfielder 3

It’s extremely frustrating and it grinds my gears, but I don’t think LabelEncoder accepts a custom mapping. I would have preferred the order GK - DF - MF - FW. Makes sense, right?
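
One possible workaround, sketched here under the assumption that we skip LabelEncoder altogether, is a plain dictionary that encodes the positions in pitch order. The names POSITION_ORDER, encode_positions and decode_positions are hypothetical and not part of the code used in the rest of this post.

```python
# Hypothetical custom order: goalkeeper first, forward last
POSITION_ORDER = {"GK": 0, "DF": 1, "MF": 2, "FW": 3}


def encode_positions(positions):
    """Map two-letter position codes to integers in pitch order."""
    return [POSITION_ORDER[p] for p in positions]


def decode_positions(codes):
    """Inverse mapping, from integer class back to position code."""
    inverse = {code: pos for pos, code in POSITION_ORDER.items()}
    return [inverse[c] for c in codes]


sample = ["MF", "DF", "FW", "GK"]
encoded = encode_positions(sample)
print(encoded)
```

With pandas, df["PosMod"].map(POSITION_ORDER) should give the same result on the whole column. The rest of this post keeps the alphabetical LabelEncoder encoding, so adopting this mapping would also mean changing the positions dictionary in the app.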

What remains is to define the columns to use and split the data for training and testing. With some trial and error I found a selection that leads to a decent model. Remember, the interface should be user-friendly, which means that I do not want too many input variables.

cols = ["Shots", "PasMedAtt", "Pas3rd", "Clr"]
X = df[cols]
y = df["Class"]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)
cols = ["Shots", "PasMedAtt", "Pas3rd", "Clr", "Class"]
train, test = df.select(cols).randomSplit([0.7, 0.3], seed=10)
Column Description
Shots Shots total (Does not include penalty kicks) per 90 minutes
PasMedAtt Passes attempted (Passes between 15 and 30 yards) per 90 minutes
Pas3rd Completed passes that enter the 1/3 of the pitch closest to the goal per 90 minutes
Clr Clearances per 90 minutes

In another post I would like to revisit this selection with a grid search. For the current post the selection suffices.

Step 4: Training the model

Before training the model, we should scale the columns as players usually have many more passes attempted than for example shots or clearances.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
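from pyspark.ml.feature import StandardScaler, VectorAssembler

# Assemble only the four features; cols also contains the label "Class",
# which must not end up in the feature vector
feature_cols = ["Shots", "PasMedAtt", "Pas3rd", "Clr"]
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features_to_scale")
train = assembler.transform(train)
test = assembler.transform(test)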

scaler = StandardScaler(inputCol="features_to_scale", outputCol="scaled_features")
scaler_model = scaler.fit(train)
train_scaled = scaler_model.transform(train)
test_scaled = scaler_model.transform(test)
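
To make the scaling step concrete, here is a small sketch with invented numbers of what standardisation does to a single column: subtract the mean and divide by the standard deviation. scikit-learn's StandardScaler uses the population standard deviation for this. As far as I know, PySpark's StandardScaler only divides by the sample standard deviation by default and does not centre the data unless withMean=True is set, so the two tabs are not guaranteed to produce identical features. The helper name standardise is hypothetical.

```python
from statistics import fmean, pstdev


def standardise(values):
    """Centre a column on its mean and scale it to unit (population) std."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


# Invented per-90 values: passes are on a much larger scale than shots
passes = [10.0, 20.0, 30.0, 40.0]
shots = [0.5, 1.0, 1.5, 2.0]

print(standardise(passes))
print(standardise(shots))
```

Both columns end up on the same scale even though the raw passes are twenty times larger than the raw shots, so neither feature dominates purely because of its units.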

Now comes the part where we integrate MLflow. Again, although MLflow allows us to evaluate iterations (runs) of models, for now we will use it to log a .pkl file which we can load into our Streamlit app. We will be using a Random Forest classifier.

MLflow autologging in Databricks

For PySpark in Databricks, we do not need to explicitly call the MLflow functions. Databricks has MLflow autologging enabled by default on ML clusters, which we are using.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
import mlflow.sklearn

with mlflow.start_run():
    
    mlflow.sklearn.log_model(scaler, "rf_scaler")
    
    # Train Random Forest classifier
    rf = RandomForestClassifier(random_state=10)
    rf.fit(X_train_scaled, y_train)

    # Predict on test set
    y_pred = rf.predict(X_test_scaled)

    # Evaluate the model
    f1 = f1_score(y_test, y_pred, average="weighted")
    accuracy = accuracy_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred, average="weighted")

    # Log metrics to MLflow
    mlflow.log_metric("f1_score", f1)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("recall", recall)

    # Log final model to MLflow
    mlflow.sklearn.log_model(rf, "rf_model")

print(f"F1 Score: {f1}")
print(f"Accuracy: {accuracy}")
print(f"Recall: {recall}")
F1 Score: 0.8171709415334705
Accuracy: 0.817741935483871
Recall: 0.817741935483871
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.classification import RandomForestClassifier

rf = RandomForestClassifier(featuresCol="scaled_features", labelCol="Class")
rf_model = rf.fit(train_scaled)
rf_results = rf_model.transform(test_scaled)


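# The default metricName is "f1" (weighted F1)
f1 = MulticlassClassificationEvaluator(labelCol="Class")
accuracy = MulticlassClassificationEvaluator(
    labelCol="Class", metricName="accuracy")
# weightedRecall matches recall_score(average="weighted") in scikit-learn;
# recallByLabel would only report the recall of a single label
recall = MulticlassClassificationEvaluator(
    labelCol="Class", metricName="weightedRecall")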

f1.evaluate(rf_results)
accuracy.evaluate(rf_results)
recall.evaluate(rf_results)
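
A note on the scikit-learn output above: accuracy and weighted recall are identical, which is expected rather than a coincidence. Weighted recall averages the per-class recall using each class's share of the test set as the weight, and that collapses to the overall fraction of correct predictions. The sketch below checks this in plain Python; the labels are invented and unrelated to the actual test set.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall, weighted by each class's share of y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    return sum((support[c] / total) * (hits[c] / support[c]) for c in support)


# Invented labels using the encoding from this post (0=DF, 1=FW, 2=GK, 3=MF)
y_true = [0, 0, 0, 1, 1, 2, 3, 3, 3, 3]
y_pred = [0, 0, 3, 1, 3, 2, 3, 3, 0, 3]

print(accuracy(y_true, y_pred), weighted_recall(y_true, y_pred))
```

The weighted F1 score does not collapse in the same way, which is why it differs slightly from the other two numbers.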

These metrics seem fine! Not something to write home about if you ask me, but good enough. They could definitely be improved upon, but again: that is not the scope of this post. The model and the corresponding scaler are saved under a local directory /mlruns/0/<run_id>/artifacts/. We extract the corresponding .pkl files and put them in the top-level directory, where the other .py files reside as well.

Continuing with scikit-learn results

As I am using the Databricks Community version, I found it very hard (or perhaps it is impossible) to extract the files to a local machine. Therefore we continue with scikit-learn results and code output.

Step 5: Creating a Streamlit app

In other projects I have fiddled around with Streamlit and I really enjoy its simplicity. It allows for quick deployment and has a decent default interface. Rather than building the app.py piece by piece, I will provide the file and explain parts of the code that are not self-explanatory.

app.py (1/2)
import pickle
import pandas as pd
import streamlit as st

# Load MLflow model and scaler
with open("rf_model.pkl", "rb") as f:
    model = pickle.load(f)

with open("rf_scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

# Load data
df = pd.read_csv("2022_2023_Football_Player_Stats.csv", sep=";", encoding="latin1")

# Positions
positions = {0: "Defender", 1: "Forward", 2: "Goalkeeper", 3: "Midfielder"}


# Helper function to preprocess input data
def preprocess_input(data):
    # Scale data
    data = scaler.transform(data)
    
    return data


# Function to predict using the model
def predict(data):
    # Preprocess input data
    X = preprocess_input(data)
    
    # Make predictions
    predictions = model.predict(X)
    
    return predictions


# Function to compare input with existing players
def compare(data, n):
    cols = ["Shots", "PasMedAtt", "Pas3rd", "Clr"]
    mse = ((df[cols] - data.iloc[0]) ** 2).mean(axis=1)
    
    idxs = mse.nsmallest(n).index
    return df.loc[idxs][["Player", "Age", "Pos", "Squad", "Comp"] + cols]


# Function to format a line of player information
def format_player(row):
    return f"*{row['Player']}*, a {row['Age']} year old {row['Pos']} playing for {row['Squad']} in {row['Comp']}."


# Function that provides text for players similar to input
def text_similar_players(data):
    data = data.reset_index()

    text = 'The players closest to your input are:\n'
    for index, row in data.iterrows():
        player_info = format_player(row)
        text += f"{index + 1}. {player_info}\n"
    
    return text

# ...
1. compare: Compare user input with all players in the dataset. Use the MSE as the metric to minimise and find the n closest/most similar players with respect to input statistics.
2. text_similar_players: Provide a list of n players that are most similar to user input.
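
To illustrate the nearest-player idea, here is a stripped-down sketch of the same MSE ranking in plain Python. The three players and their numbers are invented for illustration. Note that the comparison in app.py runs on the raw per-90 values, so a column with a larger range (such as medium passes attempted) weighs more heavily in the MSE than shots; scaling before comparing would be a possible refinement, not something the app currently does.

```python
FEATURES = ["Shots", "PasMedAtt", "Pas3rd", "Clr"]

# Invented players with per-90 stats, for illustration only
players = {
    "Striker A": {"Shots": 3.5, "PasMedAtt": 8.0, "Pas3rd": 1.0, "Clr": 0.3},
    "Midfielder B": {"Shots": 1.2, "PasMedAtt": 25.0, "Pas3rd": 5.0, "Clr": 1.0},
    "Defender C": {"Shots": 0.4, "PasMedAtt": 22.0, "Pas3rd": 3.0, "Clr": 4.5},
}


def mse(a, b):
    """Mean squared error between two stat dictionaries."""
    return sum((a[f] - b[f]) ** 2 for f in FEATURES) / len(FEATURES)


def closest_players(user_input, n):
    """Return the names of the n players with the smallest MSE to the input."""
    ranked = sorted(players, key=lambda name: mse(players[name], user_input))
    return ranked[:n]


user_input = {"Shots": 1.0, "PasMedAtt": 20.0, "Pas3rd": 3.0, "Clr": 3.0}
print(closest_players(user_input, n=2))
```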

Honestly not very much to explain.

app.py (2/2)
# Streamlit UI
def main():
    st.set_page_config(page_title="Classifying Footballers", layout="wide")
    st.markdown("""
        <style>
            .reportview-container {
                margin-top: -2em;
            }
            #MainMenu {visibility: hidden;}
            .stDeployButton {display:none;}
            footer {visibility: hidden;}
            #stDecoration {display:none;}
        </style>
    """, unsafe_allow_html=True)
    
    st.title("Predicting Football player positions based on 2022/23 stats")
    
    st.sidebar.success(f"Visit [blog.panliyong.nl](https://blog.panliyong.nl/posts/006_football) for the post!")
    
    # Create input form
    st.sidebar.header("Input Features")
    shots = st.sidebar.number_input("Shots", min_value=0.0, max_value=10.0, step=0.1, value=1.0)
    pas_med_att = st.sidebar.number_input("Passes Medium Att", min_value=0.0, max_value=70.0, step=1.0, value=20.0)
    pas_3rd = st.sidebar.number_input("3rd Passes", min_value=0.0, max_value=20.0, step=1.0, value=3.0)
    clr = st.sidebar.number_input("Clearances", min_value=0.0, max_value=15.0, step=0.1, value=3.0)
    
    # Create column layout
    left_column, right_column = st.columns(2)
    
    with left_column:
        # Players compare
        st.markdown("### Find similar players")
        n_players = st.slider("How many similar players do you want to find?", min_value=1, max_value=20, step=1,
                              value=5)
    
    # Create prediction button
    if st.sidebar.button("Predict"):
        # Create a dataframe from the input
        input_data = pd.DataFrame({
            "Shots": [shots],
            "PasMedAtt": [pas_med_att],
            "Pas3rd": [pas_3rd],
            "Clr": [clr]
        })
        
        # Predict
        prediction = predict(input_data)
        
        # Display prediction
        position = positions.get(prediction[0])
        position_styled = f"<span style='color:purple'>{position}</span>"
        
        st.sidebar.info(f"Predicted Position: {position}")
        
        # Check which player is closest to input
        players = compare(input_data, n=n_players)
        
        text = text_similar_players(players)
        
        with left_column:
            st.info(text)
        
        with right_column:
            st.markdown(f"# **Predicted a** {position_styled}",
                        unsafe_allow_html=True)
            st.markdown("## Data of similar players:")
            st.dataframe(players.reset_index(drop=True))


# Run the app
if __name__ == "__main__":
    main()
1. The style block in st.markdown: Remove developer settings for Streamlit.
2. The sidebar inputs and the slider: Define user input interactivity.

We can try and see if the Streamlit app runs with the following command

Terminal
streamlit run app.py

This is what the app looks like:

Now it’s time to prepare it for deployment!

Step 6: Make a Dockerfile

One thing to keep in mind when writing this Dockerfile is that the requirements.txt and the other environment files stored by MLflow contain many irrelevant dependencies. The only things we need are pandas, streamlit and scikit-learn. That makes building the image quicker as there are fewer dependencies to install.

However, another important factor is that I have been developing on a Windows (64-bit) machine all this time. The Raspberry Pi on which the app is going to run is a Linux (arm64) machine. As these platforms are not the same, we cannot simply run a plain docker build and expect the image to work on the Linux device.
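
As a quick check of the mismatch described above, Python's standard platform module reports the operating system and CPU architecture of the machine it runs on. This is only a sketch for inspecting your own setup, and describe_platform is a hypothetical helper name. The exact strings vary by system (for example AMD64 or x86_64 on a typical desktop, and aarch64 on a 64-bit Raspberry Pi OS), so treat those as typical values rather than guarantees.

```python
import platform


def describe_platform() -> str:
    """Return the OS name and CPU architecture as 'system/machine'."""
    return f"{platform.system()}/{platform.machine()}"


print(describe_platform())
```

If the output differs between the build machine and the Pi, the image has to be built for the target platform explicitly.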

Let’s first consider our Dockerfile:

FROM python:3.10-slim

WORKDIR /app

COPY app.py .
COPY 2022_2023_Football_Player_Stats.csv .
COPY rf_model.pkl .
COPY rf_scaler.pkl .

RUN pip install streamlit pandas scikit-learn==1.4.0

EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.address=0.0.0.0"]

The default port of Streamlit apps is 8501 and we did not change that, so we have to expose this port. Furthermore, app.py, the .csv file and the two .pkl files are all the files we need. Then we install the libraries needed to run the Streamlit app.

Now comes the part where we build the image and push it to Docker Hub under panliyong/football-positions. Up for grabs on the Raspberry Pi!

Terminal
docker buildx build --platform linux/arm64 -t panliyong/football-positions -f Dockerfile --push .

Step 7: Deploying on Raspberry Pi

The image is now on Docker Hub, so we can pull it on the Pi. Let’s name the container football-positions and map port 8501 of the Pi to the exposed port. I have also had issues with containers running out of memory, so in order to keep this container running we can update the restart policy to always restart the container.

Terminal
docker pull panliyong/football-positions
docker run -d --name football-positions -p 8501:8501 panliyong/football-positions
docker update --restart=always football-positions

My domain panliyong.nl and subdomains are managed through Cloudflare and are hosted on the Apache web server on the same Pi. We can serve this on football-positions.panliyong.nl for the app. If we go to the DNS settings and add a CNAME record with name football-positions and target panliyong.nl, we enable this new subdomain.

We need to add a virtual host football-positions.conf and enable it.

football-positions.conf
<VirtualHost *:80>
    ServerName football-positions.panliyong.nl
    ServerAlias www.football-positions.panliyong.nl

    ProxyPass / http://localhost:8501/
    ProxyPassReverse / http://localhost:8501/

    # This was the fix mentioned below!
    <Location "/_stcore/stream">
        ProxyPass               ws://localhost:8501/_stcore/stream
        ProxyPassReverse        ws://localhost:8501/_stcore/stream
    </Location>


    # ...
</VirtualHost>

<IfModule mod_ssl.c>
    <VirtualHost *:443>
        ServerAlias www.football-positions.panliyong.nl
        ServerName football-positions.panliyong.nl

        SSLProxyEngine on
        ProxyPass / http://localhost:8501/
        ProxyPassReverse / http://localhost:8501/

        # ...

        # Redirect HTTP to HTTPS
        RewriteEngine On
        RewriteCond %{HTTPS} off
        RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
    </VirtualHost>
</IfModule>

Enable it with these commands:

sudo a2ensite football-positions.conf
sudo systemctl reload apache2

If everything is correctly set up the app should be accessible through football-positions.panliyong.nl.

Inaccessible due to Streamlit updates

It appears that the app is not loading on the Apache web server due to websocket errors and some API changes. I’ll be looking for a fix…

(2024-04-22): The app is hosted on football-positions.streamlit.app for the time being!

(2024-04-29): I have found the fix! The app is hosted on football-positions.panliyong.nl!

TL;DR

It’s not as difficult as it sounds to prepare a model for deployment. Due to unforeseen changes in the streamlit==1.33 package, the app did not load because of websocket errors. We will be looking back on this soon!

(2024-04-22): The app is hosted on football-positions.streamlit.app for the time being!

(2024-04-29): I have found the fix! The app is hosted on football-positions.panliyong.nl!

That’s all for today, thanks for reading!


Footnotes

  1. Results of scikit-learn and PySpark can differ, but I am talking about the scikit-learn metrics.

  2. This could also be done in the initial run command, but I forgot and was too lazy to delete it…