Information Gain
Description
Given an array of numeric feature values, a parallel array of integer class labels, and a split threshold, return the information gain of splitting the data into a left group (feature ≤ threshold) and a right group (feature > threshold). Information gain is the entropy of all labels minus the size-weighted average entropy of the two groups, where entropy uses log base 2. Round the result to 4 decimal places.
Examples
[1,2,3,4], [0,0,1,1], 21Splitting at threshold 2 separates the labels into a left group at or below it and a right group above it. The split sends each class cleanly to its own side, fully separating them.
[1,2,3,4], [0,1,0,1], 20Splitting at threshold 2 separates the labels into a left group at or below it and a right group above it. Both sides keep the same class mix as the whole, so the split adds no information.
[1,2,3,4,5], [0,0,0,1,1], 30.971Splitting at threshold 3 separates the labels into a left group at or below it and a right group above it. The gain is the parent entropy minus the size-weighted entropy of those two groups.
Constraints
- •
1 ≤ length ≤ 10⁴ - •
feature.length == labels.length