College Scorecard API Tutorial

Many researchers studying American higher education want to work with the College Scorecard dataset, which was created under the Obama administration and contains detailed information about college quality, affordability, and student outcomes.

It is possible to download the entire dataset, but the file is large and difficult to navigate. For most researchers, a more efficient strategy is to query the database for just the data you are interested in.

In the video below, I explain how to use Python to submit an API query to the server that hosts the College Scorecard data, and show how to convert the JSON output into a Pandas dataframe or Excel spreadsheet. I also demonstrate a few data analysis and visualization tasks.
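For readers who prefer text to video, here is a minimal sketch of the same workflow. It is not the exact code from the video. The endpoint, the parameter names (api_key, fields, per_page, page), and the example field names are my reading of the public College Scorecard API documentation and should be checked against it. The helper names build_query_url, fetch_results, and to_dataframe are hypothetical, and you need to supply your own API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the current College Scorecard API docs.
BASE_URL = "https://api.data.gov/ed/collegescorecard/v1/schools"


def build_query_url(api_key, fields, filters=None, per_page=100, page=0):
    """Assemble a query URL requesting only the listed fields."""
    params = dict(filters or {})
    params.update({
        "api_key": api_key,
        "fields": ",".join(fields),  # comma-separated list of columns
        "per_page": per_page,
        "page": page,
    })
    return BASE_URL + "?" + urlencode(params)


def fetch_results(url):
    """Submit the query and return the list of records in the JSON body.

    Assumes the response is a JSON object with a "results" key.
    """
    with urlopen(url) as response:
        payload = json.load(response)
    return payload["results"]


def to_dataframe(results):
    """Convert the list of records to a Pandas dataframe."""
    import pandas as pd  # imported here so the URL helpers work without pandas
    return pd.DataFrame(results)
```

A typical session would call build_query_url("YOUR_KEY", ["school.name", "latest.student.size"], {"school.state": "CA"}), pass the URL to fetch_results, and then call to_dataframe on the result. From there, df.to_excel("scorecard.xlsx") writes a spreadsheet, provided an Excel writer such as openpyxl is installed.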

The video assumes a very basic understanding of Python.

A New Notion of Equilibrium

My previous posts about admissions markets have considered a notion of equilibrium called market clearing or Walrasian equilibrium. This means that each school picks a target number of students to recruit, and adjusts its admissions standards until the number of enrollees equals the target. Deferred acceptance mechanisms, which are used in public school districts like Boston and New York City to assign students to middle and high schools, are essentially algorithms for computing a Walrasian equilibrium. I expand this argument in “Characterizing Nonatomic Admissions Markets,” a preprint I posted on arXiv last week, and develop a parametric market that enables easy computation of market-clearing score cutoffs from school preferability parameters and vice versa.

However, in college admissions markets, schools care about not just the size of the entering class, but also how qualified the students are. Characterizing a college’s preference for one entering class over another is a difficult problem. For one thing, while it is somewhat reasonable to assume that schools can make a partial ordinal ranking of their applicants (student A is better than student B), the ranking is typically not cardinal (student A is twice as good as student B). For another, even if we have a cardinal ranking, it is unclear how utility aggregates across students. If student A is twice as good as student B, then is recruiting student A equivalent to recruiting two students like B?

Nonetheless, colleges must have preferences over sets of students, because there are many liberal-arts colleges that desperately need tuition dollars, but still reject some applicants. This behavior can only be explained if adding the underqualified students to the entering class would compromise its overall quality. In turn, this suggests that, rather than Walrasian equilibrium, a more realistic notion of equilibrium for decentralized admissions markets is the Nash equilibrium. At a Nash equilibrium, each school has tuned its admissions standards such that changing them unilaterally could only yield lower utility, according to some utility function that rates the quality of the entering class. Finding the Nash equilibrium can be difficult, because each school controls only its own actions, but its utility depends on the collective actions of all the schools in the market.

Recently, I have been experimenting with modeling each school’s utility as a linear combination of the log of the size of the entering class and the log of the percentile score of its least-qualified student. We can find some interesting computational results by applying this utility function to the parametric market developed in the arXiv paper.
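Written out, one way to express this utility is the following. The symbols are my own shorthand for this post rather than notation taken from the paper: D_c is the size of school c's entering class, p_c is the percentile score of its least-qualified student (its cutoff), and alpha_c and beta_c are nonnegative weights. The vector p collects all schools' cutoffs, since each school's class size depends on everyone's standards.

```latex
u_c(p) = \alpha_c \log D_c(p) + \beta_c \log p_c,
\qquad \alpha_c, \beta_c \ge 0 .
```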

For example, when the number of colleges in the market is large, their utility functions approach concavity, and we can expect the market to reach a Nash equilibrium in schools’ admissions standards under mild assumptions on its dynamics. However, when there are just a few schools, the utility functions are neither concave nor necessarily unimodal. Instead, they take the cloudlike shape shown in gold in the figure below. In this animation, each school follows the gradient of its utility function to update its cutoff at each iteration, and the market converges to a local equilibrium. School 2, in the top right, could achieve higher overall utility by reducing its cutoff to around 0.17, but it never gets that far.
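To make the gradient dynamics concrete, here is a rough sketch of one way to simulate them. It is not the code behind the animation, and the market model is a simplifying assumption of mine rather than the parametric market from the paper: student scores are uniform on [0, 1], a student is admitted wherever her score meets the cutoff, and she picks among her admitting schools with probability proportional to a preferability parameter gamma. The function names, the weight of 0.5, the step size, and the finite-difference gradient are all illustrative choices.

```python
import math


def demand(cutoffs, gammas):
    """Entering-class size for each school in the assumed single-score market."""
    n = len(cutoffs)
    order = sorted(range(n), key=lambda c: cutoffs[c])
    breaks = [cutoffs[c] for c in order] + [1.0]
    sizes = [0.0] * n
    total_gamma = 0.0
    for k, c in enumerate(order):
        total_gamma += gammas[c]
        width = breaks[k + 1] - breaks[k]  # mass of students in this score band
        for j in order[:k + 1]:  # schools that admit students in this band
            sizes[j] += width * gammas[j] / total_gamma
    return sizes


def utility(c, cutoffs, gammas, weight=0.5):
    """Weighted sum of log class size and log cutoff for school c."""
    size = demand(cutoffs, gammas)[c]
    return weight * math.log(size) + (1 - weight) * math.log(cutoffs[c])


def gradient_step(cutoffs, gammas, step=0.01, h=1e-5, weight=0.5):
    """Each school simultaneously nudges its cutoff up its own utility gradient."""
    updated = []
    for c in range(len(cutoffs)):
        up, down = list(cutoffs), list(cutoffs)
        up[c] += h
        down[c] -= h
        # Central finite difference in school c's own cutoff only.
        grad = (utility(c, up, gammas, weight)
                - utility(c, down, gammas, weight)) / (2 * h)
        # Clamp so that both logarithms stay defined.
        updated.append(min(max(cutoffs[c] + step * grad, 0.01), 0.99))
    return updated


def simulate(cutoffs, gammas, iterations=2000):
    """Iterate the gradient dynamics from a starting vector of cutoffs."""
    for _ in range(iterations):
        cutoffs = gradient_step(cutoffs, gammas)
    return cutoffs
```

Calling simulate([0.3, 0.7], [1.0, 2.0]) runs the dynamics for two schools. Because the update is purely local, where it ends up can depend on the starting cutoffs, which is the behavior described above.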

Under a different update rule, such as having each school jump to its global “best response” at each iteration, the market may cycle or behave chaotically instead of converging.