Advisories for Pypi/Scikit-Learn package

2024

scikit-learn sensitive data leakage vulnerability

A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer, specifically in versions up to and including 1.4.1.post1, which was fixed in version 1.5.0. The vulnerability arises from the unexpected storage of all tokens present in the training data within the stop_words_ attribute, rather than only storing the subset of tokens required for the TF-IDF technique to function. This behavior leads to the potential leakage of sensitive information, as the …

2020

Denial of Service

The svm_predict_values in svm.cpp in Libsvm, as used in scikit-learn and other products, allows attackers to cause a denial of service (segmentation fault) via a crafted model SVM (introduced via pickle, json, or any other model permanence standard) with a large value in the _n_support array. Note, the scikit-learn vendor's position is that the behavior can only occur if the library's API is violated by an application that changes a …