Split group of courses into two with constraints

74 Views Asked by At

The school I work for is trying to split the district of students into an A group and a B group while taking into consideration students from the same family and balancing class sections to be as close to 50/50 as possible.

I have a spreadsheet of students that is similar to the following:

enter image description here

Could anyone point me into a direction that would allow me optimize this list into two groups and maintaining the family to be in the same group? Meaning - All of Family ID 286 need to be in the same group A or B? All while balancing CourseID to be as close to a 50/50 split as possible? The end goal is to have each course ID split as equally as possible and maintain families within the same group.

Unfortunately I don't even know what kind of math this would be - so not sure what tag to assign it to? Thanks!

2

There are 2 best solutions below

2
On BEST ANSWER

Number the students from $1$ to $n$. For each student $k$ we'll have a variable $x_k$ that takes the value $0$ or $1$, where $0$ indicates that the student is assigned to group A, and $1$ that he is assigned to group B. For each class, we need an equation, or an inequality, indicating that about half the class is in group B.

For example, suppose one class is made up of students $1,2,4,8,9,11,14$. There are $7$ students in the class, so ideally we want $$3\leq x_1+x_2+x_4+x_8+x_9+x_{11}+x_{14}\leq4$$ When students are in the same family, their variables must take the same value. If students $4,8,17$ are in the same family, then we have the equations $$x_4=x_8\\x_4=x_{17}$$

This then is a problem in $0$-$1$ integer linear programming, in which only the constraints must be satisfied, and unfortunately, this is known to be NP-complete.

In the form I have stated the problem, it isn't necessarily true that a solution exists, so we should perhaps loosen the requirements. For each class, define the "assignment error" to be the difference between the number of group B students in the class and half the class size. For the example class above, the error would be $$\left\lvert x_1+x_2+x_4+x_8+x_9+x_{11}+x_{14}-3.5\right\rvert$$ Then we would seek to minimize the sum of the assignment errors over all classes, subject to the family constraints.

Still we have $0$-$1$ variables, so I believe, though I'm not certain, that it's still NP-complete. I am not at all expert in this area, as I imagine has been apparent from the discussion. I've edited the tags in hopes of attracting the attention of an expert.

How many students and classes do you have? You should add as much detail as possible to your question, so that if an expert does look at it, he'll be able to give you good advice.

You'll almost surely need software to do the problem, unless it's very small, and I'm sure there would be a learning curve. Here's a list of such software to get you started.

Also, if you don't have any luck here, you might try https://scicomp.stackexchange.com/ which seems to have related questions.

Sorry I can't be of more help. Good luck.

5
On

As @saulspatz described, you can solve the problem exactly via integer linear programming. An alternative heuristic approach to this partition problem is to apply a greedy algorithm that is guaranteed to be no more than 17% above optimal and is typically closer than 17%. First merge all students from the same family, yielding one positive integer count per family. Then sort the families in decreasing order of this count. Finally, loop through this list, adding each family to the group that has the smaller number of people.