College Application as Portfolio Optimization

As I approach graduation, I have narrowed the scope of my thesis research to a particular subproblem arising in equilibrium model of the college admissions market which turns out to be quite interesting on its own: the optimal college application strategy. For each school, you are given a utility value, your probability of admission, and the cost of submitting an application. Given a fixed budget to spend on application fees, to which schools should you apply?

We can think of the college application strategy as a portfolio optimization problem. Given a selection of random variables, you are to purchase the subset that maximizes a payoff function subject to a budget constraint. However, unlike the traditional Markowitz model, the portfolio is not continuously allocated: You cannot apply to half of one school and half of another. Moreover, students care not about the aggregate performance of the portfolio (for example, the expected sum of stock prices minus their variance times a risk coefficient) but about the performance of its best asset: If you get into your favorite school, it doesn’t matter what happens at your second and third choice.

These two modifications significantly increase the difficulty of the problem. In the special case where all colleges have identical application fees, which is equivalent to the Korean admissions system in which you are only allowed to apply to h = 3 schools, the intuitive algorithm that picks the top h schools after discounting their utility values by the probability of rejection can be off by a factor of h. There is an exact, polynomial-time algorithm, but it is somewhat subtle. In the general setup with arbitrary application fees, we can show that the problem is NP-complete. Nonetheless, we can produce an exact solution in pseudopolynomial time using dynamic programming, and there also exists a fully polynomial-time approximation scheme.

A pre-alpha version of my work is now on Github. My immediate next steps are to improve the documentation of the Julia implementation and produce benchmarks to confirm the theoretical performance guarantees on time complexity, memory usage, and approximation error. Going forward, there are further generalizations of the problem and alternative framings that may warrant further exploration.