Do not invest in a model tuned to accuracy: probability weighted accuracy and other metrics are more trustworthy.
Accuracy is a misleading metric in trading. For a single trading strategy, accuracy is the number of days the strategy made money divided by the number of days it held a position. This ignores at least one key factor in trading applications: the size of the profits and losses.
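To see how this can mislead, here is a minimal sketch with made-up numbers. The daily P&L values are hypothetical and chosen only to illustrate the point: a strategy can win on most days and still lose money overall.

```python
# Hypothetical daily P&L (in dollars) for the days the strategy held a position.
daily_pnl = [120, 80, 150, 60, 90, 110, 70, -900, 100, 50]

# Accuracy as defined above: winning days over days with a position.
winning_days = sum(1 for pnl in daily_pnl if pnl > 0)
accuracy = winning_days / len(daily_pnl)

# What accuracy ignores: the size of each win and loss.
total_pnl = sum(daily_pnl)

print(f"accuracy:  {accuracy:.0%}")   # 90%
print(f"total P&L: {total_pnl}")      # -70
```

Nine winning days out of ten looks excellent, yet the single large loss wipes out all of the gains.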
In his recent books, Marcos Lopez de Prado teaches key principles when using metrics to evaluate a trading strategy. The points below are summaries of principles pulled from his work and my own thoughts on the matter.
(The code snippets are also adapted from his work; I adjusted them slightly so they made sense to me. Hopefully those minor adjustments help you too.)
For machine learning training purposes, we simply need a metric that mirrors how we want to evaluate a strategy’s performance.
Sizing your bets incorrectly will get you into more trouble than incorrectly guessing the direction of price moves. Yes, if you make a bet and are wrong, then you will lose money, but the goal is to lose small amounts when you are wrong and to make large amounts when you are right. Savvy?
Log Loss
Compared to accuracy, log loss is a better metric. It “rewards” correct predictions made with high confidence and “punishes” incorrect predictions made with high confidence. In other words, a good log loss score comes from long and short bets that are right when the predicted probability is high and wrong only when the predicted probability is low. My complaint with log loss is that it is difficult to interpret.
For a sense of why, check out this Stack Exchange question and the answer given by Fed Zee, who compares log loss scores to accuracy and shows some of the subtleties involved.
A lower log loss score is better, so it is convenient to negate it; then higher values are better, as with other metrics (e.g., accuracy, recall, F1). scikit-learn’s implementation is sufficient.
from sklearn.metrics import log_loss
...
probabilities = clf.predict_proba(X_test)
# sample_weight must be passed by keyword in current scikit-learn versions
neg_log_loss = -log_loss(
    y_test, probabilities, sample_weight=w_test, labels=clf.classes_)
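To make the "reward and punish" behavior concrete, here is a small sketch in plain Python. The helper `binary_log_loss` and the two sets of probabilities are hypothetical, written only for this illustration (it is not scikit-learn's implementation). Both forecasters make the same calls and have the same accuracy; only their confidence differs.

```python
import math

def binary_log_loss(y_true, p_up):
    """Average negative log-likelihood for binary labels (1 = up, 0 = down)."""
    losses = []
    for y, p in zip(y_true, p_up):
        # Probability the model assigned to the outcome that actually happened.
        p_correct = p if y == 1 else 1.0 - p
        losses.append(-math.log(p_correct))
    return sum(losses) / len(losses)

y_true = [1, 0, 1, 1]

# Predicted probability of "up". Both forecasters get 3 of 4 right at a
# 0.5 threshold, and both miss the last observation.
cautious = [0.60, 0.40, 0.60, 0.40]
overconfident = [0.99, 0.01, 0.99, 0.01]

loss_cautious = binary_log_loss(y_true, cautious)            # about 0.61
loss_overconfident = binary_log_loss(y_true, overconfident)  # about 1.16

print(loss_cautious, loss_overconfident)
```

The overconfident forecaster scores worse even though its accuracy is identical, because its single miss was made with 99% confidence.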
Weighted Accuracy
To retain the interpretability of accuracy and to extend its functionality, you can use weighted accuracy. Weighted accuracy computes accuracy but gives each prediction a higher or lower weight based on values you supply.
One way to make this metric valuable is to pass in the magnitude of returns (absolute returns) as weights. Correct predictions that would have made more money receive higher weights. Incorrect predictions that would have lost more money also receive higher weights. Correct and incorrect predictions that would have produced small profits or losses receive lower weights. This emulates how we want to evaluate strategies.
import numpy as np


def weighted_accuracy(yn, wght, normalize=False):
    """
    Weighted accuracy (normalize=True), or weighted sum
    (normalize=False)

    :param yn: indicator function yn ∈ {0, 1} where yn = 1 when
        prediction was correct, yn = 0 otherwise
    :param wght: sample weights
    """
    if normalize:
        # Weighted share of correct predictions
        return np.average(yn, weights=wght)
    elif wght is not None:
        # Total weight of the correct predictions
        return np.dot(yn, wght)
    else:
        # No weights: plain count of correct predictions
        return np.sum(yn)
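As a rough illustration of weighting by returns, the sketch below computes plain and return-weighted accuracy by hand in plain Python. The predictions and returns are made up, and using absolute returns as the weights is an assumption of this sketch. The weighted figure is the same quantity the normalized case computes: the weight of the correct predictions divided by the total weight.

```python
# Hypothetical predictions (+1 = long, -1 = short) and realized returns.
predicted_side = [1, 1, -1, 1, -1]
realized_return = [0.002, 0.001, -0.003, -0.040, -0.001]

# yn = 1 where the predicted side matched the sign of the realized return.
yn = [1 if side * ret > 0 else 0
      for side, ret in zip(predicted_side, realized_return)]

# Assumption: weight each observation by the magnitude of its return.
weights = [abs(ret) for ret in realized_return]

plain_accuracy = sum(yn) / len(yn)
weighted_acc = sum(y * w for y, w in zip(yn, weights)) / sum(weights)

print(f"plain accuracy:    {plain_accuracy:.2f}")   # 0.80
print(f"weighted accuracy: {weighted_acc:.2f}")     # 0.15
```

Four of five calls were right, but the one wrong call was on the only large move, so the weighted score is far lower than the plain one.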
Probability Weighted Accuracy
You can probably guess, by its name, what this metric is measuring. Yes! You are correct. Probability weighted accuracy, introduced by Marcos Lopez de Prado in Machine Learning for Asset Managers, uses predicted probabilities to weight accuracy. In that respect it is similar to log loss, but it is much more interpretable.
Probability weighted accuracy punishes bad predictions made with high confidence more severely than accuracy, but less severely than log-loss. — Marcos Lopez de Prado
import numpy as np


def probability_weighted_accuracy(yn, pn, K):
    """
    PWA punishes bad predictions made with high confidence more
    severely than accuracy, but less severely than log-loss

    :param yn: indicator function yn ∈ {0, 1} where yn = 1 when
        prediction was correct, yn = 0 otherwise
    :param pn: array of shape (N, K) where pn[n, k] is the prob
        associated with prediction n of label k
    :param K: num of classes
    """
    # Confidence of each prediction in excess of a uniform guess (1 / K)
    excess = pn.max(axis=1) - 1.0 / K
    # [:-1] leaves the final observation out of both sums
    return np.sum((yn * excess)[:-1]) / np.sum(excess[:-1])
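To see how this differs from plain accuracy, here is a small plain-Python sketch of the ratio sum(yn * (pn - 1/K)) / sum(pn - 1/K), where pn is the highest predicted probability for observation n. The helper `pwa` and the probabilities are hypothetical, and unlike the function above this sketch sums over every observation rather than excluding the last one.

```python
def pwa(correct, max_prob, n_classes):
    """Probability weighted accuracy over all observations."""
    # Confidence in excess of a uniform guess across n_classes labels.
    edge = [p - 1.0 / n_classes for p in max_prob]
    return sum(y * e for y, e in zip(correct, edge)) / sum(edge)

# Both models are right on the first three predictions and wrong on the last.
correct = [1, 1, 1, 0]
accuracy = sum(correct) / len(correct)

# Highest predicted probability per observation (binary problem, K = 2).
confident_miss = [0.55, 0.60, 0.55, 0.95]  # most confident when wrong
hesitant_miss = [0.90, 0.95, 0.90, 0.55]   # least confident when wrong

pwa_confident = pwa(correct, confident_miss, n_classes=2)  # about 0.31
pwa_hesitant = pwa(correct, hesitant_miss, n_classes=2)    # about 0.96

print(accuracy, pwa_confident, pwa_hesitant)
```

Accuracy is 0.75 for both models, while PWA separates the model that was most confident on its miss from the one that was unsure. Like accuracy, the score is a weighted share of correct predictions, which is what keeps it easy to read.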