How much do you need to earn to reach your desired net income?

Neat
visualisation
machine learning
Author

Li Yong Pan

Published

February 8, 2024

Recently my group of friends were talking about taking on new jobs and of course salary was one of the topics that was talked about. One thing someone said that I remember clearly is

It will always be gross to net salary, never the other way around.

And I just thought… why not? Obviously it is very practical to know how much you need to earn gross, if you are interested in earning a set amount a month. It shouldn’t be too difficult to calculate net salaries based on gross values, and with some interpolation it’s quite easy to visualise and compute for arbitrary values. Specifically we will try to visualise an answer to the question:

How much do I need to earn gross, in order to receive a desired net salary per month?

We will do so in the following steps:

  1. Defining classes for Salary and Tax.

  2. Computing a dataset of net salaries based on gross salaries for steps of €100, from €1k to €100k.

  3. Plotting net ~ gross and the tax rate.

  4. Training a model to ‘predict’ gross salaries based on net salaries.

Important

The implementation in this post only considers salaries for persons that have not yet reached the legal age for the basic state pension (AOW). Furthermore this post assumes that a person works under payroll for one employer (one source of income) and does not take other benefits into account.

Step 1: Defining classes

Salary

A person’s salary is usually given by the gross value earned each month. In the Netherlands an additional minimum of 8% needs to be added to that, commonly referred to as holiday allowance in Dutch. On top of that, another common bonus - but certainly not mandatory - is an additional gross monthly salary accumulated over an entire year. In Dutch it is referred to as a thirteenth month or a bonus rate. The percentage of the bonus rate can vary, while a thirteenth month usually is a fixed gross bonus. In particular a thirteenth month would be an additional gross monthly salary, whereas a a bonus rate could be 10% (or more/less) of the entire annual salary.

The yearly gross salary is calculated as follows:

(1)gross yearly salary=12gross monthly salary(1+pholiday+pbonus rate)+bonus

for

  • pholiday= holiday allowance in percentage

  • pbonus= bonus allowance in percentage

  • bonus= bonus allowance fixed

A fairly straight-forward calculation! Keep in mind that can be extended upon if there are other bonuses.

Code
from typing import Callable, Literal, Union
import numpy as np

Floats = Union[float, np.ndarray]


class Salary():
    def __init__(self, gross: Floats, holiday_allowance: float = 8.0,
                 bonus_rate: float = 0, bonus: float = None) -> None:
        self.gross: Floats = gross
        self.holiday_allowance: float = holiday_allowance
        self.bonus_rate: float = bonus_rate
        self.bonus: float = bonus if bonus is not None else self.gross

    def __str__(self) -> str:
        formatted_gross = f"€{self.gross:,}"
        formatted_bonus = f"€{self.bonus:,}"

        return f"Simulating a person with:\n"\
               f"{'Gross salary per month': <25} {formatted_gross: >10}\n"\
               f"{'Holiday allowance': <25} {self.holiday_allowance/100: >10.2%}\n"\
               f"{'Bonus rate': <25} {self.bonus_rate/100: >10.2%}\n"\
               f"{'Bonus fixed': <25} {formatted_bonus: >10}\n"

    def calculate_gross_salary_yearly(self) -> Floats:
        rate_bonus: float = (100 + self.holiday_allowance + self.bonus_rate)/100
        return 12 * self.gross * rate_bonus


salary = Salary(3000)
print(salary)
Simulating a person with:
Gross salary per month        €3,000
Holiday allowance              8.00%
Bonus rate                     0.00%
Bonus fixed                   €3,000

In this post it is actually not necessary to implement a separate class for the salary. Ultimately we only need a one-to-one mapping of gross and net salaries on a yearly basis. Sometimes it is actually a hindrance to compute this given the gross monthly salary and the allowance/bonus percentages. However, I have another project in mind for which this class will come in handy!

Tax

Implementing the Tax class is actually easier than what I had imagined it to be. Let’s break it down in a few steps to understand what is necessary to perform this calculation. First and foremost we have to determine the gross yearly salary in which every allowance or additional gross bonus is included. Then we have to compute the values for these three components:

  1. Gross taxes, in Dutch referred to as tax box 1.
    The more you earn, the more taxes you have to pay.

  2. Labour credit
    This credit is earned for simply working and also depends on the gross salary on a yearly basis. In particular for 2024, you will not get any credit if you earn more than €124,935. The more you earn, the less credit you receive.

  3. General credit
    Just like labour credit, general credit is applied all the time but you won’t receive any if you earn more than €75,518 gross in 2024. Again, you get less credit the more you earn.

(2)net taxes=gross taxes(labour credit+general credit)

From the relation between net salary and taxes are clear. The credit is subtracted from the gross taxes, of which the result is the net taxes that you need to pay. Combining and gives .

(3)net yearly salary=gross yearly salarynet taxes

Code
CreditFormulas = dict[tuple[float | int,
                            float | int], Callable[[float], float]]
Floats = Union[float, np.ndarray]

credit_labour_2024: CreditFormulas = {
    (0, 11491): lambda x: 0.08425 * x,
    (11491, 24821): lambda x: 968 + 0.31433 * (x - 11490),
    (24821, 39958): lambda x: 5158 + 0.02471 * (x - 24820),
    (39958, 124935): lambda x: 5532 - 0.06510 * (x - 39958),
    (124935, float('inf')): lambda x: 0.0
}
credit_general_2024: CreditFormulas = {
    (0, 24813): lambda x: 3362,
    (24813, 75518): lambda x: 3362 - 0.06630 * (x - 24812),
    (75518, float('inf')): lambda x: 0.0
}

tax_brackets_2024 = {
    (0, 75518): lambda x: 0.3697*x,
    (75518, float('inf')): lambda x: 0.495*x
}


class Tax():
    def __init__(self,
                 tax_brackets: CreditFormulas = tax_brackets_2024,
                 credit_labour_formulas: CreditFormulas = credit_labour_2024,
                 credit_general_formulas: CreditFormulas = credit_general_2024
                 ) -> None:

        self.tax_brackets: list[float] = tax_brackets
        self.credit_labour_formulas: CreditFormulas = credit_labour_formulas
        self.credit_general_formulas: CreditFormulas = credit_general_formulas

    def calculate_gross_taxes(self, total_gross: Floats) -> Floats:
        sum_gross_taxes: Floats = np.zeros_like(total_gross)

        for (lower, upper), tax_formula in self.tax_brackets.items():
            taxable_income = np.minimum(total_gross, upper) - lower
            taxable_income = np.maximum(taxable_income, 0)
            sum_gross_taxes += tax_formula(taxable_income)

        return sum_gross_taxes

    def calculate_credit(self, total_gross, type: Literal['labour', 'general']) -> Floats:
        ranges_options: dict[str, CreditFormulas] = {
            'labour': self.credit_labour_formulas,
            'general': self.credit_general_formulas
        }
        salary_ranges = ranges_options[type]

        result = np.zeros_like(total_gross, dtype=float)

        for (l, u), function in salary_ranges.items():
            mask = np.logical_and(l <= total_gross, total_gross < u)
            result += np.where(mask, function(total_gross), 0.0)

        return result

    def calculate_net_taxes(self, total_gross: Floats) -> Floats:
        taxes_gross: Floats = self.calculate_gross_taxes(total_gross)
        labour_credit: Floats = self.calculate_credit(
            total_gross, type='labour')
        general_credit: Floats = self.calculate_credit(
            total_gross, type='general')

        return np.maximum(
            taxes_gross - (labour_credit + general_credit), 0)

    def calculate_net_salary(self, total_gross: Floats) -> Floats:
        # taxes_gross: Floats = self.calculate_gross_taxes(total_gross)
        # labour_credit: Floats = self.calculate_credit(total_gross, type='labour')
        # general_credit: Floats = self.calculate_credit(
        #     total_gross, type='general')

        income_tax: Floats = self.calculate_net_taxes(total_gross)

        return total_gross - income_tax


tax = Tax()
gross_salary = salary.calculate_gross_salary_yearly()
f"For a person earning {salary.gross} a month, with {salary.holiday_allowance}% holiday allowance and {salary.bonus_rate:.2f}% bonus, the annual net salary is {tax.calculate_net_salary(gross_salary):.2f}"
'For a person earning 3000 a month, with 8.0% holiday allowance and 0.00% bonus, the annual net salary is 32440.78'

Step 2: Computing a dataset of net salaries

For an an array of gross salaries x=[1.000,1.100,...,999.900,1.000.000], we can compute the net salaries y by using the calculate_net_salary() method in the Tax class. For some other plots we would like to show some other descriptive figures, such as the amount of tax paid as percentage of a gross salary.

Code
import pandas as pd
salary_start, salary_stop, step_size = 1, 1000, 0.1
steps = int((salary_stop - salary_start) / step_size) + 1
salaries = 1000 * np.linspace(start=salary_start, stop=salary_stop, num=steps)
gross_salaries = salaries
gross_taxes = tax.calculate_gross_taxes(gross_salaries)
labour_credit = tax.calculate_credit(gross_salaries, type='labour')
general_credit = tax.calculate_credit(gross_salaries, type='general')
net_taxes = tax.calculate_net_taxes(gross_salaries)
net_salaries = tax.calculate_net_salary(gross_salaries)

df = pd.DataFrame({
    "gross_salary_yearly": gross_salaries,
    "gross_taxes": gross_taxes,
    'labour_credit': labour_credit,
    'general_credit': general_credit,
    'net_taxes': net_taxes,
    "tax_rate": net_taxes/gross_salaries,
    "net_salary_yearly": net_salaries,
    "net_salary_monthly": net_salaries/12,
    "salary_rate": net_salaries/gross_salaries
})
df
gross_salary_yearly gross_taxes labour_credit general_credit net_taxes tax_rate net_salary_yearly net_salary_monthly salary_rate
0 1000.0 369.7000 84.250 3362.0 0.0000 0.000000 1000.0000 83.333333 1.000000
1 1100.0 406.6700 92.675 3362.0 0.0000 0.000000 1100.0000 91.666667 1.000000
2 1200.0 443.6400 101.100 3362.0 0.0000 0.000000 1200.0000 100.000000 1.000000
3 1300.0 480.6100 109.525 3362.0 0.0000 0.000000 1300.0000 108.333333 1.000000
4 1400.0 517.5800 117.950 3362.0 0.0000 0.000000 1400.0000 116.666667 1.000000
... ... ... ... ... ... ... ... ... ...
9986 999600.0 485339.5946 0.000 0.0 485339.5946 0.485534 514260.4054 42855.033783 0.514466
9987 999700.0 485389.0946 0.000 0.0 485389.0946 0.485535 514310.9054 42859.242117 0.514465
9988 999800.0 485438.5946 0.000 0.0 485438.5946 0.485536 514361.4054 42863.450450 0.514464
9989 999900.0 485488.0946 0.000 0.0 485488.0946 0.485537 514411.9054 42867.658783 0.514463
9990 1000000.0 485537.5946 0.000 0.0 485537.5946 0.485538 514462.4054 42871.867117 0.514462

9991 rows × 9 columns

Step 3: Plotting net salaries against gross salaries

In each tab we have plot interactive graphs. Try hovering on them! Double clicking on each plot toggles between the total range and a fixed range.

A first look at the net salaries against gross salaries. Plotted alongside is the graph of y=x, showcasing the deviation from the ideal scenario where a person would earn net as much as they earn gross. However, we still have not answered the main question yet. This graph helps with showing how much you earn more net annually if your gross salary increases by €100. If your gross salary is €40k, then you earn €33,1k net. If that is raised by €100, you will earn €33,15k net.

Code
import plotly.express as px
import plotly.io as pio

pio.templates.default = "plotly_dark"

fig = px.scatter(df,
                 x="gross_salary_yearly",
                 y="net_salary_yearly", title="Net-Gross Salaries (Yearly)")

fig.update_layout(
    xaxis_title="Gross Salary",
    yaxis_title="Net Salary"
)

fig.add_shape(
    type='line',
    x0=df['gross_salary_yearly'].min(),
    y0=df['gross_salary_yearly'].min(),
    x1=df['gross_salary_yearly'].max(),
    y1=df['gross_salary_yearly'].max(),
    line=dict(dash='dash', color='red')
)

fig.update_xaxes(range=[0, 100000])
fig.update_yaxes(range=[0, 100000])
fig.update_traces(hovertemplate='Gross Salary: €%{x}<br>Net Salary: €%{y}')

fig.show()

How much of my gross salary goes to taxes? Somewhat painful, but interesting to see. A higher gross salary results in paying more taxes - obviously, as a different result would be quite strange. The graph shows that a person with €50k annual salary sees 23.8% of that going to taxes. Someone with €100k annual gross salary waives off 38,4% for taxes. Theoretically, not more than 49,5% of your salary can go to taxes as it is hardcapped by the tax brackets - y=0.495 is an asymptote in 2024.

Code
fig = px.scatter(df,
                 x="gross_salary_yearly",
                 y="tax_rate",
                 title="Marginal Tax Rate")

fig.update_layout(
    xaxis_title="Gross Salary (Yearly)",
    yaxis_title="Tax Rate"
)

fig.update_xaxes(range=[0, 100000])
fig.update_yaxes(tickformat=".0%")
fig.update_traces(hovertemplate='Gross Salary: €%{x}<br>Tax Rate: %{y:.2%}')


fig.show()

Are the values in the tax_rate column indeed non-decreasing?

(df["tax_rate"].diff().iloc[1:] >= 0).all()
True

Step 4: Training the model

Let’s train a model in which the gross salary is the dependent variable, and the net salary is the independent variable. We want to train a model which can completely capture the relationship, meaning it has a mean squared error of 0. We will use a random forest model to try and achieve this. First we define x (net) and y (gross) and import the relevant classes and function.

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

x = df['net_salary_yearly'].to_numpy() / 12
y = df['gross_salary_yearly'].to_numpy()

Now we go though some tuning of hyperparameters to achieve the best result.

bootstrap

regressor = RandomForestRegressor(
    n_estimators=100, random_state=0, bootstrap=False)
regressor_lesser = RandomForestRegressor(
    n_estimators=100, random_state=0, bootstrap=True)

regressor.fit(x.reshape(-1, 1), y)
regressor_lesser.fit(x.reshape(-1, 1), y)

y_pred = regressor.predict(x.reshape(-1, 1))
y_pred_lesser = regressor_lesser.predict(x.reshape(-1, 1))
print(mean_squared_error(y, y_pred))
print(mean_squared_error(y, y_pred_lesser))
9.02521855413928e-25
413.2848563710619

We start with the default amount of trees specified in RandomForestRegressor and notice that using the default value of the bootstrap argument (True) gives us a higher MSE. It makes sense, because by using a bootstrap sampling method we introduce randomness in the training process. However, we do not want this as we aim to get an MSE value of 0. Our data is clean and contains no outliers and as a result bootstrapping would only make our predictions - which in fact are interpolations - worse. Therefore in our final model this argument will be set to False. Notice that the MSE of the first regression model is incredibly close to 0. Can we get this to 0?

n_estimators

regressor = RandomForestRegressor(
    n_estimators=1, random_state=0, bootstrap=False)
regressor_lesser = RandomForestRegressor(
    n_estimators=100, random_state=0, bootstrap=False)

regressor.fit(x.reshape(-1, 1), y)
regressor_lesser.fit(x.reshape(-1, 1), y)

y_pred = regressor.predict(x.reshape(-1, 1))
y_pred_lesser = regressor_lesser.predict(x.reshape(-1, 1))
print(mean_squared_error(y, y_pred))
print(mean_squared_error(y, y_pred_lesser))
0.0
9.02521855413928e-25

As we are not using bootstrap resampling, does it even make sense to use more than 1 tree, let alone 100? In theory it would still be possible that each tree captures different aspects of the data. Yet coming back to the earlier statement that our dataset is clean and contains little to no hidden patterns, it should not be really beneficial to use multiple trees. The MSE of the regressor with only one tree is 0. That should mean that we can get the same results when using only a single decision tree.

from sklearn.tree import DecisionTreeRegressor

regressor = RandomForestRegressor(
    n_estimators=1, random_state=0, bootstrap=False)
regressor_tree = DecisionTreeRegressor()

regressor.fit(x.reshape(-1, 1), y)
regressor_tree.fit(x.reshape(-1, 1), y)

y_pred = regressor.predict(x.reshape(-1, 1))
y_pred_tree = regressor_tree.predict(x.reshape(-1, 1))
print(np.all(y_pred == y_pred_tree))
True

So we did not need a random forest after all! Of course, we could still go for a random forest with only one tree. Now on to the answer of the main question

TL;DR

How much do I need to earn gross, in order to receive a desired net salary per month?

x = np.arange(start=100, stop=10100, step=100)
y = regressor_tree.predict(x.reshape(-1, 1))

fig = px.scatter(x=x, y=y, title="Net Salary Monthly - Gross Salary Yearly")
fig.update_layout(
    xaxis_title="Net Salary (Monthly)",
    yaxis_title="Gross Salary (Yearly)"
)

tickvals = list(range(0, 10001, 1000))
ticktext = [f"{val//1000}k" if val != 0 else '0' for val in tickvals]
fig.update_xaxes(tickvals=tickvals, ticktext=ticktext)

fig.update_traces(
    hovertemplate='Net Salary (Month): €%{x}<br>Gross Salary (Annual): €%{y}')

fig.show()
01k2k3k4k5k6k7k8k9k10k050k100k150k200k
Net Salary Monthly - Gross Salary YearlyNet Salary (Monthly)Gross Salary (Yearly)

That’s all for today, thanks for reading!

Back to top

Footnotes

  1. There are some exceptions, as mentioned on https://business.gov.nl/regulation/holiday-allowance/↩︎