Decision Tree Best Split

HardMachine LearningMathArrayTree

Description

Given an array of numeric feature values and a parallel array of class labels, return the split threshold that minimizes the size-weighted Gini impurity. Candidate thresholds are the distinct feature values; a threshold t sends rows with feature ≤ t to the left and the rest to the right. If several thresholds tie, return the smallest.

Examples

Input:[1,2,3,4], [0,0,1,1]
Output:2
Explanation:

Each candidate threshold is scored by the impurity of the two groups it creates, weighted by their sizes, and the cleanest split is chosen.

Input:[1,2,3,4], [0,1,0,1]
Output:1
Explanation:

Each candidate threshold is scored by the impurity of the two groups it creates, weighted by their sizes, and the cleanest split is chosen.

Input:[5,10,15], [0,0,1]
Output:10
Explanation:

Each candidate threshold is scored by the impurity of the two groups it creates, weighted by their sizes, and the cleanest split is chosen.

Constraints

  • 1 ≤ length ≤ 10⁴
  • feature.length == labels.length

Ready to solve this problem?

Practice solo and sharpen your skills for technical interviews.