Misleading Metrics Like Accuracy Are Not Worth Your Time | by Dylan Cunningham | The Capital

Do not invest in a model tuned to accuracy: probability weighted accuracy, and others, are more trustworthy.

Accuracy as a metric, in trading, is misleading. For a single trading strategy, accuracy is the number of days the strategy makes money divided by the number of days the strategy took a position. This misses at least one key point in trading applications: the size of profits and losses.
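
To make the point concrete, here is a small sketch with made-up numbers. The two P&L series and the `accuracy` helper are hypothetical illustrations, not taken from Lopez de Prado's work. One strategy wins on most days and still loses money; the other wins less than half the time and ends up ahead.

```python
# Hypothetical daily P&L (in dollars) over ten days on which a position was held.
strategy_a = [10, 10, 10, 10, 10, 10, 10, 10, 10, -150]   # many small wins, one large loss
strategy_b = [-20, -20, -20, -20, -20, -20, 100, 100, 100, 100]  # small losses, large wins


def accuracy(pnl):
    """Share of days with a position on which the strategy made money."""
    return sum(1 for x in pnl if x > 0) / len(pnl)


acc_a, total_a = accuracy(strategy_a), sum(strategy_a)
acc_b, total_b = accuracy(strategy_b), sum(strategy_b)

print(f"A: accuracy={acc_a:.0%}, total P&L={total_a}")
print(f"B: accuracy={acc_b:.0%}, total P&L={total_b}")
```

Ranking these two by accuracy alone would pick the strategy that loses money.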

In his recent books, Marcos Lopez de Prado teaches key principles for using metrics to evaluate a trading strategy. The points below summarize principles from his work, along with my own thoughts on the matter.

(The code chunks are pulled from his work too; I adjusted them only slightly so that they made sense to me. Hopefully my minor adjustments help you too.)

For machine learning training purposes, we simply need a metric that mirrors how we want to evaluate a strategy’s performance.

Sizing your bets incorrectly will get you into more trouble than incorrectly guessing the direction of price moves. Yes, if you make a bet and are wrong, then you will lose money, but the goal is to lose small amounts when you are wrong and to make large amounts when you are right. Savvy?

Log Loss

Compared to accuracy, a better metric is log loss. Log loss “rewards” correct predictions made with high confidence and “punishes” incorrect predictions made with high confidence. In other words, a good log loss score comes from a model whose correct long and short bets carry high predicted probabilities and whose incorrect bets carry low ones. My complaint with log loss is that it is difficult to interpret.

For help with interpretation, check out this Stack Exchange question and the answer given by Fed Zee. Simply put, Fed Zee compares log loss scores to accuracy and shows some of the complexities of log loss.

A lower log loss score is better, so a convenient way to use log loss is to negate it; then higher values are better, just like other metrics (e.g., accuracy, recall, F1). scikit-learn’s implementation is sufficient.

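from sklearn.metrics import log_loss
...
# Predicted class probabilities, one column per class in clf.classes_
probabilities = clf.predict_proba(X_test)
# Negate so that higher is better; w_test holds the sample weights
neg_log_loss = -log_loss(
    y_test, probabilities, sample_weight=w_test, labels=clf.classes_)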
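
To see how log loss treats confidence, the sketch below computes binary log loss by hand in plain Python. The helper `binary_log_loss` and the four probability vectors are hypothetical and only for illustration; in practice scikit-learn's `log_loss` does this work. Each vector gives the predicted probability of the "up" class for the same four outcomes.

```python
import math


def binary_log_loss(y_true, p_up):
    """Mean negative log-likelihood for binary labels (1 = up, 0 = down)."""
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p in zip(y_true, p_up)
    ]
    return sum(losses) / len(losses)


y_true = [1, 1, 0, 0]

confident_right = [0.9, 0.9, 0.1, 0.1]  # correct side, high confidence
hesitant_right = [0.6, 0.6, 0.4, 0.4]   # correct side, low confidence
hesitant_wrong = [0.4, 0.4, 0.6, 0.6]   # wrong side, low confidence
confident_wrong = [0.1, 0.1, 0.9, 0.9]  # wrong side, high confidence

for name, p in [
    ("confident and right", confident_right),
    ("hesitant and right", hesitant_right),
    ("hesitant and wrong", hesitant_wrong),
    ("confident and wrong", confident_wrong),
]:
    # Negated so that higher is better, as described above.
    print(f"{name}: neg log loss = {-binary_log_loss(y_true, p):.3f}")
```

Accuracy would score the first two cases identically (100%) and the last two identically (0%), while log loss separates all four. The penalty also grows without bound as a wrong prediction approaches full confidence, which is part of why the raw number is hard to interpret.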

Weighted Accuracy

To retain the interpretability of accuracy and to extend its functionality, you can use weighted accuracy. Weighted accuracy computes accuracy but gives each prediction a higher or lower weight based on the weights you supply.

One way to make this metric valuable is to pass in the magnitude of returns as weights. Correct predictions that would have made more money receive higher weights. Incorrect predictions that would have lost more money also receive higher weights. Correct and incorrect predictions that would have produced small profits or losses receive lower weights. This emulates how we want to evaluate strategies.

import numpy as np


def weighted_accuracy(yn, wght, normalize=False):
    """
    Weighted accuracy (normalize=True), or weighted sum
    (normalize=False)

    :param yn: indicator function yn ∈ {0, 1} where yn = 1 when
        prediction was correct, yn = 0 otherwise
    :param wght: sample weights
    """
    if normalize:
        # Weighted share of correct predictions
        return np.average(yn, weights=wght)
    elif wght is not None:
        # Total weight captured by correct predictions
        return np.dot(yn, wght)
    else:
        # Unweighted count of correct predictions
        return yn.sum()
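
As a rough illustration of returns as weights, the sketch below uses five made-up bets. The arrays `predicted_side` and `realised_return` are hypothetical, and taking the absolute return as the weight is my assumption about how to turn returns into non-negative weights.

```python
import numpy as np

# Hypothetical bets: +1 = long, -1 = short, with the return realised on each.
predicted_side = np.array([1, -1, 1, -1, 1])
realised_return = np.array([0.01, -0.01, 0.01, -0.01, -0.08])

# A prediction is correct when the realised return has the predicted sign.
correct = (np.sign(realised_return) == predicted_side).astype(float)

# Assumption: weight each bet by the size of its return, win or lose.
weights = np.abs(realised_return)

plain_accuracy = correct.mean()
return_weighted_accuracy = np.average(correct, weights=weights)

print(f"plain accuracy:           {plain_accuracy:.2f}")
print(f"return-weighted accuracy: {return_weighted_accuracy:.2f}")
```

Four of the five calls are right, but the one wrong call carries most of the money at stake, so the weighted score drops well below the plain one.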

Probability Weighted Accuracy

You can probably guess, by its name, what this metric is measuring. Yes! You are correct. Probability weighted accuracy, introduced by Marcos Lopez de Prado in Machine Learning for Asset Managers, uses probabilities to weight accuracy, similarly to log loss. However, it is much more interpretable.

Probability weighted accuracy punishes bad predictions made with high confidence more severely than accuracy, but less severely than log-loss. — Marcos Lopez de Prado

import numpy as np


def probability_weighted_accuracy(yn, pn, K):
    """
    PWA punishes bad predictions made with high confidence more
    severely than accuracy, but less severely than log-loss

    :param yn: indicator function yn ∈ {0, 1} where yn = 1 when
        prediction was correct, yn = 0 otherwise
    :param pn: array of shape (N, K) where pn[n, k] is the prob associated
        with prediction n of label k; the max over k is taken per row
    :param K: num of classes
    """
    # Confidence of each prediction in excess of the 1/K chance level
    excess = pn.max(axis=1) - 1. / K
    return np.sum(yn * excess) / np.sum(excess)
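
A small worked example may help with interpretation. The sketch below is a self-contained variant that sums over every observation and infers the number of classes from the shape of the probability matrix; the name `pwa` and the sample data are hypothetical. It scores the same four predictions twice: once when the confident calls are the correct ones, and once when the confident calls are the wrong ones.

```python
import numpy as np


def pwa(correct, proba):
    """Probability weighted accuracy for an (N, K) matrix of class probabilities."""
    n_classes = proba.shape[1]
    # Confidence in excess of the 1/K chance level; zero for a coin-flip prediction.
    # Note: the ratio is undefined if every prediction sits exactly at 1/K.
    excess = proba.max(axis=1) - 1.0 / n_classes
    return np.sum(correct * excess) / np.sum(excess)


# Hypothetical two-class probabilities: two confident calls, two hesitant ones.
proba = np.array([
    [0.90, 0.10],
    [0.80, 0.20],
    [0.45, 0.55],
    [0.40, 0.60],
])

confident_calls_right = np.array([1, 1, 0, 0])
confident_calls_wrong = np.array([0, 0, 1, 1])

print("accuracy in both cases:", confident_calls_right.mean())
print("PWA, confident calls right:", round(pwa(confident_calls_right, proba), 3))
print("PWA, confident calls wrong:", round(pwa(confident_calls_wrong, proba), 3))
```

Accuracy is 50% in both cases, while PWA moves above or below that depending on whether the model's confidence was placed on the right calls. Unlike log loss, the result stays between 0 and 1, so it reads like an accuracy.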
