Information Gain

HardMachine LearningStatisticsArray

Description

Given an array of numeric feature values, a parallel array of integer class labels, and a split threshold, return the information gain of splitting the data into a left group (feature ≤ threshold) and a right group (feature > threshold). Information gain is the entropy of all labels minus the size-weighted average entropy of the two groups, where entropy uses log base 2. Round the result to 4 decimal places.

Examples

Input:[1,2,3,4], [0,0,1,1], 2
Output:1
Explanation:

Splitting at threshold 2 separates the labels into a left group at or below it and a right group above it. The split sends each class cleanly to its own side, fully separating them.

Input:[1,2,3,4], [0,1,0,1], 2
Output:0
Explanation:

Splitting at threshold 2 separates the labels into a left group at or below it and a right group above it. Both sides keep the same class mix as the whole, so the split adds no information.

Input:[1,2,3,4,5], [0,0,0,1,1], 3
Output:0.971
Explanation:

Splitting at threshold 3 separates the labels into a left group at or below it and a right group above it. The gain is the parent entropy minus the size-weighted entropy of those two groups.

Constraints

  • 1 ≤ length ≤ 10⁴
  • feature.length == labels.length

Ready to solve this problem?

Practice solo and sharpen your skills for technical interviews.