Decision Tree Best Split
Description
Given an array of numeric feature values (feature) and a parallel array of class labels (labels), return the split threshold that minimizes the size-weighted Gini impurity of the two groups it creates. Candidate thresholds are the distinct feature values; a threshold t sends rows with feature ≤ t to the left and the rest to the right. If several thresholds tie, return the smallest.
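The scoring rule can be sketched by evaluating the definition directly. This is a minimal illustration rather than a reference solution. It assumes the standard Gini impurity 1 − Σ p_k² for a group and that an empty group contributes zero. The helper names gini and best_split are hypothetical. Exact fractions are used so that ties are detected exactly rather than through floating-point comparison.

```python
from collections import Counter
from fractions import Fraction


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a group; an empty group scores 0 (assumption)."""
    n = len(labels)
    if n == 0:
        return Fraction(0)
    counts = Counter(labels)
    return 1 - sum(Fraction(c, n) ** 2 for c in counts.values())


def best_split(feature, labels):
    """Return the smallest threshold minimizing the size-weighted Gini impurity."""
    n = len(feature)
    best_t, best_score = None, None
    for t in sorted(set(feature)):  # ascending order, so ties keep the smallest t
        left = [y for x, y in zip(feature, labels) if x <= t]
        right = [y for x, y in zip(feature, labels) if x > t]
        score = (Fraction(len(left), n) * gini(left)
                 + Fraction(len(right), n) * gini(right))
        if best_score is None or score < best_score:  # strict < keeps the earliest tie
            best_t, best_score = t, score
    return best_t
```

This version rescans every row for every candidate threshold, so its cost grows with the number of rows times the number of distinct feature values.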
Examples
Each candidate threshold is scored by the impurity of the two groups it creates, weighted by their sizes, and the cleanest split is chosen.

Input: [1,2,3,4], [0,0,1,1]
Output: 2
Explanation: Threshold 2 sends labels [0,0] to the left and [1,1] to the right. Both groups are pure, so the weighted impurity is 0.

Input: [1,2,3,4], [0,1,0,1]
Output: 1
Explanation: Thresholds 1 and 3 tie at a weighted impurity of 1/3, while thresholds 2 and 4 score 1/2. The smaller of the tied thresholds, 1, is returned.

Input: [5,10,15], [0,0,1]
Output: 10
Explanation: Threshold 10 sends labels [0,0] to the left and [1] to the right. Both groups are pure, so the weighted impurity is 0.
Constraints
- 1 ≤ length ≤ 10⁴
- feature.length == labels.length
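
With up to 10⁴ rows and as many distinct thresholds, rescanning the data for every candidate can take on the order of 10⁸ label checks in the worst case. One possible alternative, sketched here, sorts the rows once and sweeps the thresholds in ascending order while maintaining running class counts. It assumes the standard Gini impurity 1 − Σ p_k² and treats an empty right side as contributing zero. The name best_split_sweep is hypothetical. The sketch relies on the identity that the weighted impurity equals 1 − (S_left/n_left + S_right/n_right)/n, where S is the sum of squared class counts on a side, so maximizing the bracketed "purity" term minimizes the impurity.

```python
from collections import Counter
from fractions import Fraction


def best_split_sweep(feature, labels):
    """Sort once, then sweep thresholds in ascending order with running class counts."""
    n = len(feature)
    rows = sorted(zip(feature, labels), key=lambda row: row[0])
    left, right = Counter(), Counter(labels)
    sq_left = 0                                    # sum of squared class counts, left side
    sq_right = sum(c * c for c in right.values())  # same quantity for the right side
    best_t, best_purity = None, None
    i = 0
    while i < n:
        t = rows[i][0]
        # Move every row whose feature equals t from the right side to the left.
        while i < n and rows[i][0] == t:
            y = rows[i][1]
            sq_left += 2 * left[y] + 1    # (c + 1)^2 - c^2
            left[y] += 1
            sq_right -= 2 * right[y] - 1  # c^2 - (c - 1)^2
            right[y] -= 1
            i += 1
        n_left, n_right = i, n - i
        # Weighted Gini = 1 - purity / n, so a larger purity means a cleaner split.
        purity = Fraction(sq_left, n_left)
        if n_right:
            purity += Fraction(sq_right, n_right)
        if best_purity is None or purity > best_purity:  # strict > keeps the smallest t
            best_t, best_purity = t, purity
    return best_t
```

After the initial sort, each row moves from right to left exactly once, so the sweep itself is linear in the number of rows. Exact fractions keep tie detection exact, at some cost in speed compared with floating-point arithmetic.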