PrefLib
PrefLib is a reference library of preference data maintained by Nicholas Mattei and Simon Rey. The original version of this website was developed by Nicholas Mattei and Toby Walsh.
We aim to provide a comprehensive resource for the many research communities that deal with preferences: computational social choice, recommender systems, data mining, machine learning, and combinatorial optimization, to name just a few.
The strength of PrefLib is that it provides carefully curated data in a unified format. We encourage users to read the detailed explanations that we provide regarding the format and the modeling choices. Once everything is clear, feel free to explore the datasets we are hosting, or to search for specific files that may interest you.
PrefLib-Tools
Providing data is only the first step; the next is actually using it. To help you with that, we provide a Python library specifically designed to work with PrefLib instances: PrefLib-Tools. The library is distributed on PyPI.
Have a look at the preflibtools repository, where you can download the code, and check out the documentation!
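To give a feel for what working with the data involves, here is a minimal standard-library sketch. It is not part of PrefLib-Tools; the helper names are hypothetical. It assumes the line layout used in PrefLib ordinal files, where metadata lines start with "#" and each preference line reads "multiplicity: alt,alt,{tied,tied}". The sample data is made up for illustration. For real work, use PrefLib-Tools and the format documentation.

```python
from collections import Counter


def parse_order_line(line):
    """Parse 'count: a,b,{c,d}' into (count, ranking).

    The ranking is a tuple of tuples; each inner tuple is one
    indifference class (a single alternative when there is no tie).
    """
    count_part, order_part = line.split(":", 1)
    count = int(count_part.strip())
    ranking = []
    group = None  # collects alternatives while inside a {...} tie group
    for token in order_part.strip().split(","):
        token = token.strip()
        if token.startswith("{"):
            group = []
            token = token[1:]
        closes = token.endswith("}")
        if closes:
            token = token[:-1]
        if group is not None:
            group.append(int(token))
            if closes:
                ranking.append(tuple(group))
                group = None
        else:
            ranking.append((int(token),))
    return count, tuple(ranking)


def first_place_counts(lines):
    """Count how many voters rank each alternative first.

    Alternatives tied for first place each receive the full multiplicity
    (a simplifying assumption of this sketch).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and metadata
        multiplicity, ranking = parse_order_line(line)
        for alternative in ranking[0]:
            counts[alternative] += multiplicity
    return counts


# Made-up sample in the assumed layout: 6 voters over 3 alternatives.
SAMPLE = """# NUMBER ALTERNATIVES: 3
3: 1,2,3
2: 2,{1,3}
1: 3,2,1
"""

if __name__ == "__main__":
    print(first_place_counts(SAMPLE.splitlines()))
```

The parser keeps ties as explicit groups rather than flattening them, so the same function can read both strict and weak orders.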
Data Usage and Citation Policy
Constructing and maintaining this website and its database requires a lot of work. We ask that you provide a reference to our website when publishing research based on data gathered here. Here are some references you can use.
- Nicholas Mattei and Toby Walsh. PrefLib: A Library of Preference Data. Proceedings of the Third International Conference on Algorithmic Decision Theory (ADT 2013).
- Nicholas Mattei and Toby Walsh. A Preflib.org Retrospective: Lessons Learned and New Directions. Trends in Computational Social Choice.
In addition, many datasets have specific citation requirements. Make sure to always include them whenever you use a file taken from such a dataset (especially if you downloaded aggregated data files).
Contributing to PrefLib
We rely on the support of the community in order to increase the usefulness and coverage of this site. If you want to donate a new dataset, report an issue with an existing dataset, or suggest changes to the website, several GitHub repositories are at your disposal:
- PrefLib-Data hosts the (raw) data and the related scripts;
- PrefLib-Jekyll hosts the Jekyll project for the website;
- PrefLib-Tools hosts the code for the PrefLib-Tools.
If you need anything, have a look at those repositories, open new issues, comment, like, subscribe, and spread the word!
In Brief
We currently host:
- 68 datasets
- 12562 data files
- More than 2.83 GB of data
Other Links
Here are some links that you might find relevant as well.
- DEMOCRATIX: A Declarative Approach to Winner Determination
- Pnyx: An Easy to Use Aggregation Tool
- Whale4: Which Alternative is Elected?
- VoteLib: A Library of Voting Behavior
- Pabulib: A Library of Participatory Budgeting Instances
- CRISNER: A Qualitative Preference Reasoner
- Spliddit: Quick and Easy Solutions to Fair Division Problems
- RoboVote: AI Driven Decisions
To find more data, check these websites.
- UC Irvine Machine Learning Repository
- University of Minnesota GroupLens Data Sets
- CSPLib: A Problem Library for Constraints
- Microsoft Learning to Rank Datasets
- SATLib: The Satisfiability Library
- Preference-Learning.org
- Toshihiro Kamishima's Sushi Preference Dataset
- MAX-SAT Evaluations and Datasets
- Stanford Network Analysis Project
The Community
We want to thank all who participated in the development of PrefLib. This list may not be exhaustive, but we hope it is. Contact us if you feel unjustly treated.
- Haris Aziz and Omer Lev: who provided great help at the launch of the website, discussing the design of the first version of the website, and troubleshooting it.
- Niclas Boehmer: who contributed a large amount of data: the datasets summarized on this page (together with Nathan Schaar), and the Polkadot Network, the Kusama Network, the Eurovision Song Contest, the Marble League, the United Kingdom General Elections, the Comparative Study of Electoral Systems, and the Poland Local Elections datasets.
- Dominik Peters: who provided useful feedback, created the amazing new logo, committed some code, and contributed the Breakfast dataset.
- Jeffrey O'Neill: maintainer of the website OpenSTV.org (now OpaVote), who contributed many datasets: Irish, Burlington, Glasgow, Aspen, Berkeley, Minneapolis, Oakland, Pierce, San Francisco, San Leandro, Takoma Park.
- Robert Bredereck: who contributed the F1 and Skiing, Web Search, and Web Search Clean datasets.
- Jean-Francois Laslier: who contributed the French Presidential Election and Proto French Election datasets, and the rating counterpart of the latter, together with Karine Van der Straeten and Michel Balinski.
- Andrew Mao: who contributed the Dots and Puzzle datasets.
- Nicolaus Tideman and Florenz Plassmann: who contributed the ERS dataset.
- Piotr Faliszewski: who contributed the AGH dataset.
- Carleton Coffrin: who contributed the T-Shirt dataset.
- Lihi Dery: who contributed the Social Recommendation dataset.
- Toshihiro Kamishima: who contributed the Sushi dataset.
- Michel Regenwetter and Anna Popova: who contributed the APA dataset.
- Jeremy A. Hansen: who contributed the Vermont District Races dataset.
- Alejandro Rosete Suarez and Milton Garcia Borroto: who contributed the Cujae dataset.
- Ulle Endriss: who contributed the Poster Competition dataset and part of the AAMAS dataset.
- Ioannis Caragiannis: who contributed the Cities Survey dataset.
- John P. Dickerson: who contributed the Kidney dataset.
- Rafael Bordini, Edith Elkind, John Thangarajah, and David Shield: who contributed part of the AAMAS dataset.
- David Manlove: who contributed the Project Bidding dataset.
- Hongning Wang: who contributed the Trip Advisor dataset.
- Martin Lackner: who contributed the Parliamentary Elections dataset.
- Andrzej Kaczmarczyk: who contributed the Camp Songs dataset.
- Michał Ramsza and Honorata Sosnowska: who contributed the Alternative Order Experiment dataset.
- Dušan Knop and Šimon Schierreich: who contributed the CTU Tutorial Time Selection dataset.
These Papers are Using PrefLib
Below is a list of papers that have made use of or directly referenced data stored here at PrefLib. The papers have been automatically added from Google Scholar. If there is a problem with a paper, or if you want to add a paper, please contact us.
Even more references, papers, and tutorials can be found in the proceedings of the EXPLORE Workshops.
Some Tools to Work with Preferences
Empirical experiments with real data are becoming a more fundamental part of work in computational social choice. In addition to a lightweight set of tools for working with data from PrefLib, we also host documentation for several of these projects. Please contact Nick if you have code that you would like to share with the community.
Iterative Voting Simulator
This is a voting simulator built for the paper A Local-Dominance Theory of Voting Equilibria. We are releasing its source code to be expanded and enhanced by the community. However, it is quite versatile in its current construction, and can be used for various simulations "as is".
Kidney Dataset Generator
Kidney failure is a life-threatening health issue that affects hundreds of thousands of people worldwide. In the US alone, the waitlist for a kidney transplant has over 100,000 patients. This list is growing: demand far outstrips supply. This codebase includes: structural elements of kidney exchange like "pools", "hospitals", and "pairs", a couple of kidney exchange graph generators, a couple of kidney exchange solvers (max weight, failure-aware, fairness-aware, individually rational), and a dynamic kidney exchange simulator.
CRISNER: A Qualitative Preference Reasoner for CP-nets, TCP-nets, CP-Theories
CRISNER stands for Conditional and Relative Importance Statement Network PrEference Reasoner. It can reason about ceteris paribus preference languages such as CP-nets, TCP-nets and CP-theories. Given a preference specification (a set of preference statements) in one of these languages, CRISNER succinctly encodes its induced preference graph (IPG) into a Kripke structure model in the language of the NuSMV model checker. This Kripke model is reachability-equivalent to the induced preference graph. CRISNER generates the model only once, and then translates each query posed against this preference specification into a formula in computation-tree temporal logic (CTL) such that the formula is verified in the Kripke model if and only if the query holds true according to the ceteris paribus semantics of the preference language. The model checker either affirms the query or returns a counterexample. To answer queries about the equivalence or subsumption of two sets of preferences, CRISNER constructs a combined IPG and uses temporal queries in CTL to identify whether every dominance that holds in one also holds in the other, and vice versa.
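The reachability reading of dominance described above can be illustrated on a toy example. The sketch below is not CRISNER's implementation, which relies on NuSMV and CTL model checking. It uses a plain breadth-first search over the induced preference graph of a hypothetical two-variable CP-net, with edges pointing from a worse outcome to a better one (one improving flip at a time). The CP-net, variable names, and function names are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy CP-net over two binary variables A and B.
# Each variable maps to (parents, table); the table maps a tuple of parent
# values to that variable's values ordered from most to least preferred.
#   A: 1 is preferred to 0 unconditionally.
#   B: if A = 1 then 1 is preferred to 0; if A = 0 then 0 is preferred to 1.
VARIABLES = ("A", "B")
CP_NET = {
    "A": ((), {(): (1, 0)}),
    "B": (("A",), {(1,): (1, 0), (0,): (0, 1)}),
}


def improving_flips(outcome):
    """Yield outcomes obtained by changing one variable to a better value."""
    assignment = dict(zip(VARIABLES, outcome))
    for index, variable in enumerate(VARIABLES):
        parents, table = CP_NET[variable]
        order = table[tuple(assignment[p] for p in parents)]
        rank = order.index(assignment[variable])
        for better_value in order[:rank]:
            yield outcome[:index] + (better_value,) + outcome[index + 1:]


def dominates(better, worse):
    """Return True if `better` is reachable from `worse` by improving flips."""
    if better == worse:
        return False  # dominance is treated as strict in this sketch
    seen = {worse}
    queue = deque([worse])
    while queue:
        current = queue.popleft()
        for successor in improving_flips(current):
            if successor == better:
                return True
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


if __name__ == "__main__":
    # Outcomes are (A, B) pairs.
    print(dominates((1, 1), (0, 0)))
    print(dominates((0, 0), (1, 1)))
```

Explicit search like this enumerates outcomes one by one, and the number of outcomes grows exponentially with the number of variables, which is one reason a succinct encoding for a model checker is attractive.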