advanced 'Knapsack problem'

Hello and a nice day,
all I am asking for is a small pointer:

I thought about writing a program that helps dividing a bulk of students in several classes.

[tr][td]Each student has some attributs:

  • Gender
  • Second language
  • 3 ‘Friend wishes’ (They would like to be in class with those 3 other students)
  • Performance category
  • one or two more…

Then we have a fixed set of classes.
For example:
4 classes, each is supposed to hold approximately the same student count
Each class is supposed to have as much boys as girls
2 classes offer both languages, the other two each just one,
The performance category is supposed to be mixed too.
And so on…[/td][/tr]

In the moment I just throw the students in the classes randomly and then try to optimize with an arbitrary count of steps.
In each step I pick a student and try to see if he can be placed better; or may improves the breakdown when he swaps with an other student.

But there must be a better solution out there.
I have no idea about graphs but I guess there is already an algorithm i can use?
I found some things like ‘Graph Partitioning’ but nothing I could apply. (Maybe I just overlooked the solution…)

Can you tell me what I should google for? Any reduction for ‘Graph Partitioning’ or an other keyword,
that would be a big help!

-ClaasJG

Can I ask you why you chose this problem? It seems rather specific- is it homework?

Your approach sounds like a greedy algorithm- it might work okay, but it will get stuck in a “local optimum” if, for example, the best solution involves moving more than one student. A cheap way to get around that is by moving X students instead of 1, but that quickly gets into taking-too-long land.

A more interesting approach might be to use other algorithms, such as evolutionary computation.

But it’s hard to help with general “how do I do this” type questions without seeing anything you’ve tried.

What exactly is the cost function?

You might also take a look at the assignment problem, including the ‘see also’ links.

This also sounds like something you could potentially solve with a theorem prover like Z3.

[…]Can I ask you why you chose this problem? It seems rather specific- is it homework?[…]
No my father is teacher (‘Mittelstuffenleiter’ I don’t know the English equivalent , sry) a each year he needs about 4 days intensive work to create the classes.
He ask me if I can help him, that’s why I ‘chose’ that problem :wink:

[…]What exactly is the cost function?[…]
That exactly is my problem. I have so many attributes and each is weight differently.
Some are ‘absolute’: A student with the second language ‘a’ is not allowed in class ‘xy’
Some change with each decision: Lets say student ‘x’ has 3 other students he wants to be with in the same class.
Every class is supposed to be approx. the same size, aso.

But from the links you provided me I got an idea what to do (while I have no idea if it is worth implementing it but I will try)
I will create a directed graph holding students and their ‘wishes’. I will place them geometrically.
Students will be placed close with their friends and further away from other students on a plane.

Then I will alter the ‘y’ dimension for each student. Language 1 becomes a ‘layer’ and language 2. (This should be possible for more then 2 languages in multiply dimensions).
Now each student will attract their friends in y dimension. This way we will have some students in both layers (lang. 1 and lang. 2) and some between (mixed class).

I can do that for every attribute adding an other dimension, and then I should be able to simply ‘stamp’ the classes out with a N dimensional ‘cookie form’.
I may have to do some checks. (So no lang. 1 student gets dragged into a pure lang. 2 class). But it would allow me to filter for a gender average or class size aso)

This is maybe not a mathematical solution, but while I am inspired by the links you gave me, I have my trouble with math :frowning:

I will write in this thread again, but I still have to dig into the topic. I knew a matrix just for transforming things, never knew about a determinant and I have some other knowledge deficits.

-ClaasJG

I don’t think you can do this in a reasonable time if the number of students is large (i.e., with a small order "big O).

Have you considered a non-deterministic method like an evolutionary algorithm (there are more complex approached that simulate physical process but I think an evolutionary approach is the easier to understand if you aren’t a physics geek).

You could spit them randomly and then assign points (based on priority) for things that match criteria (say, even balance of races and sexes, number that share a language, number of friend matches, etc.). Then, randomly mutate the sets by swapping some members, rate them, and keep the best fit (there are more complex approaches that try to simulate biology exactly, but that is probably good enough). If there is a “must” criterion (say, all the same language) you could pre-sort the whole set and split them, and do the rest on each created subset.

Good luck!

Yeah, this is an SMT problem http://en.wikipedia.org/wiki/Satisfiability_modulo_theories. What you are doing is called constraint programming.

If you really want to roll your own, I’d do it this way:

  • first make a solver that finds a solution to the hard (boolean) constraints
  • generate a single valid solution
  • traverse the space of valid solutions, testing the overall value of the soft constraints as you go
  • after some time or after exhausting the search space, return the best result found so far

From your description of the problem it shouldn’t be too hard to define a function that takes a valid solution and generates similar, valid solutions for the traversal, so this kind of approach should be ok.

If you can make the first step non-deterministic then keep repeating with greedy searches from new valid start points – this is much less prone to getting stuck in local optima than evolutionary algorithms and will be quicker too.

I kept thinking of this and I need to explain why I suggested a non-deterministic method is that the part about matching friend is a little vague. If these friends are discrete groups of for, for which any member has the other three for friends, then an analytical approach seems reasonable and not all that hard (I would go by splitting and recombining even portion of things to be balanced). If its like real life, where John’s best friends may be Bob, Joe, and Jane but Jane’s are John, Mary, and Sue (forming complex networks that reach through the whole sample) I can’t think of a non-brute force (factorial time) purely analytic solution. However, that doesn’t mean there isn’t one – I don’t know a things about formal SMT theory (as in Juno’s post right above this; I might read up on that), which I’ve actually never heard of before, so that may offer another way I’d never think of.

I finished a rudimentary version so far.
It could be marked as finished (at least the implementation of the ‘idea’ now I have to refactor to avoid things like List<List<List>>).
It now sorts the students taking care about class size, a gender and a performance average, wishes aso.
I just have to make it more userfriendly and tweak the Heuristic a bit more.
Here is a small ‘teaser’.

Supposed I will reach a state I am happy about the code I can post it.

-ClaasJG
Thanks for the help. I heard about ‘NP’ and ‘P’ and I hate it :wink: