Below you can find all the datasets. You can order them differently and filter them by tag easily.

Combinatorial Election Experiment Matching Mturk Politics Ratings Sport STV

Sort by: Newest first · Oldest first · Dataset name (A-Z) · Series Number

  • Poland Local Elections

    00068

    Election Politics

    This dataset collects voting data from recent Polish local elections. In 2014, a first-past-the-post system was used in all cities with up to 100 000 inhabitants. For this, all cities with up to 20 000/50 000/100 000 inhabitants were divided into 15/21/23 constituencies. The dataset consists of elections from 1317 cities (excluding ones with low vote length). In a file, each constituency is considered to be a voter, ranking the alternatives as in the election results of that constituency.

    This dataset was donated by Niclas Boehmer.

    Consists of 1315 data files.

  • Comparative Study of Electoral Systems

    00067

    Election Politics

    This dataset presents data collected as part of the Comparative Study of Electoral Systems. This study consists of post-election studies from (federal) elections from different countries. In some of these post-election studies, participants were asked to rank all important political parties or leaders in their country that they know on a scale from 0 to 10 according to how much they agree with the views of the party. For each of the 174 post-election studies where this question was asked, a data file was created with the parties as candidates. Each voter in the data file then corresponds to a participant in the survey and ranks the parties according to the participant's answer. Check the website of the CSES for more details.
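
    The conversion from survey answers to a vote can be sketched as follows. This is a minimal illustration, not the script used to build the files: the function name, the party names and the handling of unrated parties (they are simply left out of the vote) are assumptions.

```python
def ratings_to_ranking(ratings):
    """Turn 0-10 party ratings into a ranking with ties, best first.

    `ratings` maps a party name to its score, or to None when the
    participant gave no rating; unrated parties are left out (assumption).
    """
    groups = {}
    for party, score in ratings.items():
        if score is not None:
            groups.setdefault(score, []).append(party)
    # Higher scores come first; parties with equal scores share a tie group.
    return [sorted(groups[score]) for score in sorted(groups, reverse=True)]


# Hypothetical answers of one survey participant.
answers = {"Party A": 7, "Party B": 7, "Party C": 2, "Party D": None}
ranking = ratings_to_ranking(answers)
print(ranking)
```

    Because several parties can receive the same score, the resulting votes are in general rankings with ties rather than strict orders.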

    This dataset was donated by Niclas Boehmer.

    Consists of 305 data files.

  • United Kingdom General Elections

    00066

    Election Politics

    This dataset collects voting data from recent UK general elections. For each general election, the UK territory is divided into constituencies. In a file, each constituency is considered to be a voter, ranking the alternatives as in the election results of that constituency.

    This dataset was donated by Niclas Boehmer.

    Consists of 13 data files.

  • Marble League (FKA Marble Olympics)

    00065

    Sport

    The Marble League (formerly known as the MarbleLympics) is an annual tournament where marbles from different teams compete against each other in a number of different sports events (see here for more details).

    For each instance of the league, several events are organised that all lead to an intermediate ranking of the competitors. In the files, each event corresponds to a voter ranking the alternatives as they were ranked in the event.

    This dataset was donated by Niclas Boehmer.

    Consists of 4 data files.

  • Eurovision Song Contest

    00064

    Election

    This dataset collects the votes from the Eurovision Song Contest. Every candidate is a country (or rather its representative singer) and every voter is also a country. The original format consisted of a final only; semi-finals were added from 2004 onwards.

    This dataset was donated by Niclas Boehmer.

    Consists of 73 data files.

  • CTU AG1 Tutorial Time Selection

    00063

    Election

    This dataset contains the results of surveying students of the Czech Technical University in Prague about their preferred tutorial time. Each student selected, from the set of predefined alternatives, those that fit into their schedule.

    The data on this page has been donated by Dušan Knop and Šimon Schierreich.

    Consists of 1 data file.

  • Alternative Order Experiment

    00062

    Experiment

    This dataset contains the results of a simple experiment regarding voting over landscape images with varying displaying order. There are 19 agents, each voting in two rounds. Eight images (alternatives) are denoted by A through H. In the first round, the images were displayed in the sequence A, B, C, D, E, F, G, H, while in the second round, the sequence was D, C, B, A, H, G, F, E.

    To allow voters to be identified across the first and second rounds, in addition to our standard file formats, we provide a CSV file containing the preferences submitted by each voter in both rounds.

    These data were donated by Honorata Sosnowska from the SGH Warsaw School of Economics. The work concerned with the data was supported by the SGH Warsaw School of Economics grant KAE/S21 and the National Center for Science grant UMO-2018/31/B/HS4/01005 Opus 16.

    Consists of 3 data files.

  • Kusama Network

    00061

    Election

    Certain blockchain protocols conduct approval-based committee elections on a day-to-day basis. Specifically, these elections occur in blockchains using the Nominated Proof-of-Stake (NPoS) protocol. In this system, a subset of stakeholders, called validators, are elected to run the consensus protocol, which is crucial for the integrity of the blockchain. The problem of selecting the validators can be modeled as a committee election.

    This dataset presents the voting data of the Kusama network, a blockchain system that implements the Nominated Proof-of-Stake (NPoS) protocol. The dataset contains 96 elections from the Polkadot blockchain. These elections contain roughly 2000 candidates and 10 000 voters each.

    Note that in practice voters are assigned weights (on widely different scales). These weights cannot be represented in the PrefLib format. Each ".cat" file containing the approval ballots is therefore accompanied by a ".dat" file that describes the weights.

    This dataset has been converted into the PrefLib format based on the sources provided by Niclas Böhmer (available here).

    Consists of 1520 data files.

  • Polkadot Network

    00060

    Election

    Certain blockchain protocols conduct approval-based committee elections on a day-to-day basis. Specifically, these elections occur in blockchains using the Nominated Proof-of-Stake (NPoS) protocol. In this system, a subset of stakeholders, called validators, are elected to run the consensus protocol, which is crucial for the integrity of the blockchain. The problem of selecting the validators can be modeled as a committee election.

    This dataset presents the voting data of the Polkadot network, a blockchain system that implements the Nominated Proof-of-Stake (NPoS) protocol. The dataset contains 96 elections from the Polkadot blockchain. These elections contain between 18 202 and 48 025 voters and between 920 and 1080 candidates.

    Note that in practice voters are assigned weights (on widely different scales). These weights cannot be represented in the PrefLib format. Each ".cat" file containing the approval ballots is therefore accompanied by a ".dat" file that describes the weights.

    This dataset has been converted into the PrefLib format based on the sources provided by Niclas Böhmer (available here).

    Consists of 496 data files.

  • Camp Songs

    00059

    Election

    The dataset consists of two pre-camp surveys, conducted for youth summer camps in 2022 and 2023 in Poland. Several weeks before each camp, campers were asked to fill out a survey, which included (among others) two questions related to CCM-genre music pieces. Responses to each question form an approval election. So, for each year, we obtained two elections.

    In the first question, survey participants (the voters) were presented with approximately 80 song titles (the candidates). The participants were asked to select at least 15 of their favorite ones to sing during camp activities. The lower bound on the number of selections was not enforced. In the second question, which involved far fewer songs (around ten), the participants were asked to select songs they would like to learn. This time, no number of choices was suggested, so some participants selected none.

    The questions remained the same in both years; however, the presented songs were different. Specifically, in 2022, the first and second questions involved 78 and 8 songs, respectively. A year later, the respective numbers were 82 and 10. The survey had 39 participants in 2022 and 56 in 2023.

    The data on this page has been donated by Andrzej Kaczmarczyk.

    Consists of 4 data files.

  • NSW Legislative Assembly Election Data

    00058

    Election Politics STV

    The New South Wales (NSW) Legislative Assembly is the lower of two houses of the Parliament of New South Wales, an Australian state. The Assembly comprises 93 seats, each representing one of 93 Districts.

    In these elections, voters submitted Optional Preferential Votes; these ballots required at least one candidate to be specified. The outcome of each election was determined by the Instant-Runoff Voting (IRV) social choice function.
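
    For readers unfamiliar with IRV, the counting rule on optional preferential ballots can be sketched as follows. This is a simplified illustration under stated assumptions, not the official counting procedure: ballots are plain Python lists, exhausted ballots are dropped from the count, and ties are broken alphabetically, which is a hypothetical rule chosen only to keep the example deterministic.

```python
def instant_runoff(ballots):
    """Return the IRV winner for a list of (possibly partial) rankings.

    Each ballot lists candidates from most to least preferred and may
    omit candidates. Ties are broken alphabetically (assumption).
    """
    remaining = sorted({c for ballot in ballots for c in ballot})
    while len(remaining) > 1:
        tallies = {c: 0 for c in remaining}
        for ballot in ballots:
            # A ballot counts for its highest-ranked remaining candidate.
            top = next((c for c in ballot if c in tallies), None)
            if top is not None:  # exhausted ballots are skipped
                tallies[top] += 1
        active = sum(tallies.values())
        leader = max(remaining, key=lambda c: tallies[c])
        if 2 * tallies[leader] > active:
            return leader  # strict majority of the non-exhausted ballots
        loser = min(remaining, key=lambda c: tallies[c])
        remaining.remove(loser)
    return remaining[0] if remaining else None


# Hypothetical contest with three candidates and nine ballots.
ballots = [["A", "B"]] * 3 + [["B", "C"]] * 2 + [["C"]] * 4
winner = instant_runoff(ballots)
print(winner)
```

    In the example no candidate has a majority in the first round, so B is eliminated and B's two ballots transfer to C, who then wins.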

    The data sets posted below correspond to each of the NSW districts in the 2015, 2019 and 2023 NSW Legislative Assembly elections. The elements numbered 1-93 correspond to the 2015 election, those numbered 94-186 correspond to the 2019 election, and those numbered 187-279 correspond to the 2023 election.
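
    The numbering scheme above can be turned into a small lookup. The sketch below only encodes the ranges stated in this description; the function and constant names are hypothetical.

```python
ELECTION_YEARS = (2015, 2019, 2023)
DISTRICTS_PER_ELECTION = 93


def election_year(element_number):
    """Map an element number (1-279) to the election year it belongs to."""
    last = DISTRICTS_PER_ELECTION * len(ELECTION_YEARS)
    if not 1 <= element_number <= last:
        raise ValueError(f"element number must be between 1 and {last}")
    # Elements 1-93 -> 2015, 94-186 -> 2019, 187-279 -> 2023.
    return ELECTION_YEARS[(element_number - 1) // DISTRICTS_PER_ELECTION]


print(election_year(94))
```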

    These datafiles comprise all formal votes cast in each contest, with informal votes omitted.

    Consists of 279 data files.

  • Parliamentary Elections

    00057

    Election Politics

    This dataset gathers parliamentary elections. The Austrian elections were provided by Martin Lackner.

    Consists of 9 data files.

  • Seasons Power Ranking

    00056

    Election

    This dataset contains elections generated from weekly power rankings. Specifically, the underlying power ranking data (kaggle.com/masseyratings/rankings) contains weekly power rankings of college basketball teams (between 2001 and 2021), college baseball teams (between 2010 and 2021), and college American football teams (between 1997 and 2021) from different media outlets and ranking systems.

    For each of the three sports (basketball, baseball, American football), for each season and each ranking system, we created an election where each vote corresponds to the power ranking of the teams in one week of the season according to the ranking system.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the sport, followed by the relevant year and ranking system.

    The combined power rankings and weekly power rankings datasets were generated from the same data.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 4015 data files.

  • Combined Power Ranking

    00055

    Election

    This dataset contains elections generated from weekly power rankings. Specifically, the underlying power ranking data (kaggle.com/masseyratings/rankings) contains weekly power rankings of college basketball teams (between 2001 and 2021), college baseball teams (between 2010 and 2021), and college American football teams (between 1997 and 2021) from different media outlets and ranking systems.

    For each of the three sports (basketball, baseball, American football), for each season, we created an election where each vote corresponds to the power ranking of the teams in one of the weeks of the season according to one of the ranking systems.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the relevant sport, followed by the relevant year.

    The season power rankings and weekly power rankings datasets were generated from the same data.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 53 data files.

  • Weeks Power Ranking

    00054

    Election

    This dataset contains elections generated from weekly power rankings. Specifically, the underlying power ranking data (kaggle.com/masseyratings/rankings) contains weekly power rankings of college basketball teams (between 2001 and 2021), college baseball teams (between 2010 and 2021), and college American football teams (between 1997 and 2021) from different media outlets and ranking systems.

    For each of the three sports (basketball, baseball, American football), for each week in one of the seasons, we created an election where each vote corresponds to the power ranking of the teams in this week according to one of the ranking systems.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the sport, followed by the day on which the power rankings were published (there is one day from each covered week).

    The combined power rankings and season power rankings datasets were generated from the same data.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 956 data files.

  • Formula 1 Races

    00053

    Election

    This dataset contains elections generated from the Formula 1 World Championship. The underlying Formula 1 data (kaggle.com/rohanrao/formula-1-world-championship-1950-2020) contains the finishing times of all drivers in all laps of races taking place between 1950 and 2020. For each of these races, we created an election where each vote corresponds to a lap in the race and ranks the drivers by the time they spent in this lap.
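
    The construction of one election from the lap times of a race can be sketched as follows. The data layout (a dictionary from lap number to per-driver times), the driver labels and the alphabetical tie-break are assumptions made for illustration; the actual files were produced by the authors' own pipeline.

```python
def laps_to_votes(lap_times):
    """Build one vote per lap, ranking drivers from fastest to slowest.

    `lap_times` maps a lap number to a dict {driver: lap time in seconds}.
    Drivers without a time for a lap are absent from that vote.
    """
    votes = []
    for lap in sorted(lap_times):
        times = lap_times[lap]
        # Equal times are ordered alphabetically (assumption).
        votes.append(sorted(times, key=lambda driver: (times[driver], driver)))
    return votes


# Hypothetical race: driver_c has no recorded time for lap 2.
race = {
    1: {"driver_a": 92.1, "driver_b": 91.8, "driver_c": 93.0},
    2: {"driver_a": 90.5, "driver_b": 90.9},
}
votes = laps_to_votes(race)
print(votes)
```

    The second vote in the example ranks only two of the three drivers, which shows how a raw election built this way can be incomplete.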

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the year in which the race took place followed by the name of the race.

    The Formula 1 seasons dataset contains elections generated from the same data.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 454 data files.

  • Formula 1 Seasons

    00052

    Election

    This dataset contains elections generated from the Formula 1 World Championship. The underlying Formula 1 data (kaggle.com/rohanrao/formula-1-world-championship-1950-2020) contains the finishing times of all drivers in all laps of races taking place between 1950 and 2020. For each year, we created an election where each vote corresponds to a race in this year and ranks the drivers by their total finishing time in this race.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.

    The Formula 1 races dataset contains elections generated from the same data.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 71 data files.

  • Countries Ranking

    00051

    Election

    This dataset contains elections generated from indicator-based rankings of countries. For each year between 2005 and 2016, the underlying country ranking data (based on the popular world happiness report; kaggle.com/alcidesoxa/world-happiness-report-2005-2018) contains different quantitative indicators for the happiness of citizens from over 100 countries. For each year, we created an election where the countries are the candidates and each vote ranks them according to one indicator.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.

    Other indicator-based rankings have been used to create the university rankings and city rankings datasets.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 12 data files.

  • Movehub City Ranking

    00050

    Election

    This dataset contains an election generated from indicator-based rankings of cities. The underlying city ranking data (kaggle.com/blitzr/movehub-city-rankings) contains twelve quantitative indicators for the life quality in 216 different cities determined by movehub.com. We created a single election where each city is a candidate and each vote corresponds to the ranking of the cities with respect to one of the indicators.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.

    Other indicator-based rankings have been used to create the country rankings and university rankings datasets.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 1 data file.

  • Multilaps Competitions

    00049

    Election

    This dataset contains elections generated from multi-lap sports competitions.

    The underlying mylaps data contains the completion time of athletes in each lap of a multi-lap competition (specifically, speed skating and cycling competitions) crawled from results.sporthive.com. For each race, we created an election in which the athletes are the candidates and each vote corresponds to one lap and ranks the athletes by their completion time.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 635 data files.

  • Spotify Countries Chart

    00048

    Election

    This dataset contains elections generated from charts on Spotify. For each day between the 1st of January 2017 and the 9th of January 2018, the Spotify data (kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking) contains a daily ranking of the 200 most listened-to songs in 53 different countries. In our elections, the candidates are songs. For each month and each country, we created an election where each vote corresponds to the ranking of the songs on one day of the month in the country.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of a patch starts with an abbreviation of the relevant country followed by the relevant year and the month. Note that the names of candidates are the IDs of the respective songs on Spotify (e.g., candidate 10nqz67NQWWa7XPq7ycihi corresponds to "Welcome to New York" by Taylor Swift, open.spotify.com/track/10nqz67NQWWa7XPq7ycihi).

    The Spotify daily charts dataset contains elections generated from the same data.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 645 data files.

  • Spotify Daily Chart

    00047

    Election

    This dataset contains elections generated from charts on Spotify. For each day between the 1st of January 2017 and the 9th of January 2018, the Spotify data (kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking) contains a daily ranking of the 200 most listened-to songs in 53 different countries. In our elections, the candidates are songs. For each of these days, we created an election where each vote corresponds to the ranking of the songs on this day in one of the 53 countries.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. Note that the names of candidates are the IDs of the respective songs on Spotify (e.g., candidate 10nqz67NQWWa7XPq7ycihi corresponds to "Welcome to New York" by Taylor Swift).

    The Spotify country charts dataset contains elections generated from the same data.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 362 data files.

  • Global University Ranking

    00046

    Election

    This dataset contains elections generated from indicator-based rankings of universities. For each year between 2012 and 2015, the university ranking data (kaggle.com/mylesoneill/world-university-rankings) contains rankings of universities according to different criteria provided by three systems. For each year, we created an election where the universities are the candidates and each vote ranks them according to one criterion used by one of the three systems.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.

    Other indicator-based rankings have been used to create the country rankings and city rankings datasets.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 4 data files.

  • Tennis Ranking

    00045

    Election

    This dataset contains elections generated from tennis world rankings. The underlying tennis data (kaggle.com/mimoopoo/atp-tennis-rankings-1990-to-2019) contains weekly rankings of the top 100 male tennis players published by the ATP between January 1990 and September 2019. For each year, we created an election where each player is a candidate and each vote corresponds to the ranking of the players in one week.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.

    Other sports world rankings have been used to create the table tennis world rankings and boxing world rankings datasets.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 29 data files.

  • Table Tennis Ranking

    00044

    Election

    This dataset contains elections generated from table tennis world rankings. The underlying table tennis data (kaggle.com/romanzdk/ittf-table-tennis-player-rankings-and-information) contains the monthly ITTF ranking of the top 500-1500 male and female table tennis players between 2001 and 2020. For each year, we created an election where each player is a candidate and each vote corresponds to the ranking of the players in one month.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the relevant gender followed by the year.

    Other sports world rankings have been used to create the boxing world rankings and tennis world rankings datasets.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 38 data files.

  • Cycling Races

    00043

    Election

    This dataset contains elections generated from road bicycle racing competitions. It consists of two parts.

    Tour de France. For each edition of the Tour de France between 1903 and 2021, the underlying Tour de France data (procyclingstats.com) contains the completion times of all riders for each stage. For each edition, we created an election in which the riders are the candidates and each vote corresponds to a stage and ranks the riders by their completion time.

    Giro d'Italia. For each edition of the Giro d'Italia between 1910 and 2020, the underlying Giro d'Italia data (procyclingstats.com) contains the completion times of all riders for each stage of the edition. For each edition, we created an election in which the riders are the candidates and each vote corresponds to a stage and ranks the riders by their completion time.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts either with gdi (Giro d'Italia) or tdf (Tour de France), followed by the respective year.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 196 data files.

  • Boxing

    00042

    Election

    This dataset contains elections generated from boxing world rankings. The underlying boxing data (kaggle.com/martj42/ufc-rankings) contains the Ultimate Fighting Championship rankings of the top 16 fighters in twelve different weight classes in different weeks between February 2013 and August 2021. For each year and weight class, we created an election where each fighter is a candidate and each vote corresponds to the ranking of the fighters in one week.

    Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the relevant weight class followed by the year.

    Other sports world rankings have been used to create the table tennis world rankings and tennis world rankings datasets.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 99 data files.

  • Boardgames Geek Ranking

    00041

    Election

    This dataset contains an election generated from board game charts.

    The underlying board games data (kaggle.com/mseinstein/bgg_top2000) contains a weekly ranking of the 2000 most popular board games on boardgamegeek.com between October 2018 and December 2021. We created a single election where each game is a candidate and each vote corresponds to the ranking of the games in one week.

    The patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.

    This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.

    Consists of 1 data file.

  • Breakfast Items

    00035

    Election

    This dataset was collected by Green and Rao (1972), and is a standard example in the literature on multidimensional unfolding (which is about embedding preferences in Euclidean space). They obtained strict preference rankings over a collection of 15 sweet breakfast items (such as toast, muffins, donuts) from "a group of 42 respondents, 21 Wharton MBA students and their wives. The questionnaire was self-administered separately by husband and wife. All subjects independently filled out the same questionnaire and received compensation for their efforts."

    Green and Rao asked for these preferences in 6 different situations: "Overall preferences", "When I'm having a breakfast consisting of juice, bacon and eggs, and beverage", "When I'm having a breakfast consisting of juice, cold cereal, and beverage", "When I'm having a breakfast consisting of juice, pancakes, sausage, and beverage", "Breakfast, with beverage only", "At snack time, with beverage only". The rankings for each of these situations are provided in separate data patches. A .csv file presents the rankings of the same respondent across the different data files; rankings in odd positions (1st, 3rd, ...) come from the MBA students, and the ranking on the following line comes from that student's wife.

    This dataset was digitized by Dominik Peters from the listings provided in the appendix of the 1972 book.

    Consists of 7 data files.

  • Computer Science Conference Bidding Data

    00039

    Matching

    This dataset contains the bidding data from three computer science conferences. It contains the bids of all reviewers (aside from a small number of opt-outs) over a subset of papers at the conference.

    The bidding language for these conferences is yes/maybe/conflict. In order to make these more useful for PrefLib users, we have converted them to incomplete partial orders of the form {yes} > {maybe} > {no response}. The papers for which a reviewer had a conflict have been removed from their preference list. All reviewers had different preference orderings, hence each file contains as many entries as reviewers.
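
    The conversion described above can be sketched as follows. This is an illustration only: the function name, the input layout (a dictionary of explicit bids plus the full list of papers) and the paper identifiers are hypothetical.

```python
def bids_to_order(bids, papers):
    """Convert one reviewer's bids into the order {yes} > {maybe} > {no response}.

    `bids` maps a paper id to "yes", "maybe" or "conflict"; papers missing
    from `bids` count as "no response". Conflicted papers are removed.
    Empty groups are omitted from the result.
    """
    yes = sorted(p for p in papers if bids.get(p) == "yes")
    maybe = sorted(p for p in papers if bids.get(p) == "maybe")
    no_response = sorted(p for p in papers if p not in bids)
    return [group for group in (yes, maybe, no_response) if group]


# Hypothetical reviewer bidding over five papers.
papers = [1, 2, 3, 4, 5]
bids = {1: "maybe", 3: "yes", 4: "conflict"}
order = bids_to_order(bids, papers)
print(order)
```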

    Consists of 3 data files.

  • Project Bidding Data

    00038

    Matching

    This dataset contains bids of students over a set of projects for student/project allocations at the School of Computing Science, University of Glasgow. Each project is supervised by an individual, each of whom has a maximum supervision capacity. The set contains 8 years' worth of data, with between 31 and 51 students and between 56 and 155 projects. This data was collected and kindly donated by David Manlove.

    In addition to the strict and incomplete preference profiles of the students, we have extended the profiles with all unranked items tied at the end. We have also posted .dat files containing the supervisor identifiers and capacities. The format for the .dat files is Supervisor ID, Capacity, Projects; where Projects is a space-separated list of the projects supervised by the Supervisor. Each project has a capacity of 1 while each supervisor has a variable capacity. In academic sessions 2007-08 and 2008-09 there were no supervisor capacities in force, thus the projects and supervisors are in 1-1 correspondence.
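
    A line of such a .dat file could be read along the following lines. This is a hedged sketch based only on the format description above: it assumes the three fields are separated by commas, keeps identifiers as strings, and uses a made-up example line; the actual files may differ in detail (for example, header lines).

```python
def parse_supervisor_line(line):
    """Parse 'Supervisor ID, Capacity, Projects' into (id, capacity, projects).

    Projects is a space-separated list of project identifiers.
    """
    # Split on the first two commas only (assumed field separator).
    supervisor_id, capacity, projects = (
        field.strip() for field in line.split(",", 2)
    )
    return supervisor_id, int(capacity), projects.split()


# Hypothetical line: supervisor 12 can take 3 students and offers 3 projects.
record = parse_supervisor_line("12, 3, 4 17 29")
print(record)
```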

    Consists of 8 data files.

  • AAMAS Bidding Data

    00037

    Matching

    This dataset contains the bids of reviewers over papers from the 2015, 2016 and 2021 Autonomous Agents and Multiagent Systems Conference.

    For the years 2015 and 2016, inclusion in these data sets were explicitly opt-in; 2015 contains 9,817 bids of 201 reviewers over 613 papers; this represents about 40% of the actual 22,360 bids of 281 reviewers over 670 papers. The 2016 data contains 161 out of 393 reviewers with bids over 442 out of 550 papers. For the year 2021, 526 submissions, 71 SPC members, and 596 regular PC members passed the checks (not opting-out, etc...).

    The bidding language for these conferences is yes/maybe/no/conflict. To make the data more useful for PrefLib users, we have converted the bids to categorical data of the form {yes} > {maybe} > {no response} > {no} > {conflict}. Note that not all years have the same categories. We are deeply grateful to the IFAAMAS board and Rafael Bordini, Edith Elkind, John Thangarajah, and David Shield for approving, coordinating, and providing this dataset. The 2021 data has been generously provided by Ulle Endriss.
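
    The conversion described above can be sketched as follows. This is an illustration only, not the script used to build the files: the function name, the dictionary input, and the rule that a paper without a bid falls into "no response" are assumptions.

```python
# Category order as stated in the description above.
CATEGORY_ORDER = ["yes", "maybe", "no response", "no", "conflict"]


def bids_to_categories(bids, papers):
    """Group papers into ordered categories for one reviewer.

    bids: dict mapping paper id -> bid label (hypothetical input shape).
    papers: iterable of all paper ids; papers without a bid are assumed
    to belong to the "no response" category.
    """
    buckets = {category: [] for category in CATEGORY_ORDER}
    for paper in papers:
        buckets[bids.get(paper, "no response")].append(paper)
    # Keep only non-empty categories, since not every year uses all of them.
    return [(c, sorted(buckets[c])) for c in CATEGORY_ORDER if buckets[c]]


example = bids_to_categories(
    {1: "yes", 2: "conflict", 4: "maybe", 5: "yes"}, [1, 2, 3, 4, 5]
)
```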

    Consists of 3 data files.

  • Cities Survey

    00034

    Election

    This dataset contains noisy input from two surveys, one about cost of living and one about population, of 392 individuals over 36 alternatives for cost of living and 48 alternatives for population. Each individual provided a ranking of six given cities in terms of cost of living and a ranking of six countries in terms of population.

    The data were collected among participants of the 3rd PatrasIQ research and technology exhibition, in Patras, Greece in April 2016. We received input from 392 volunteers; each of them was given a random bundle of six cities (from a pool of 36) and a random bundle of six countries (from a pool of 48), and was asked to give a strict ranking of the given cities and countries in terms of his/her estimation about their cost of living indices and population (in decreasing order), respectively.

    In the cost of living treatment, each city appears in at least 57 and at most 70 bundles/votes. The alternative ids define a ground truth, i.e., a strict ranking of all 36 cities according to cost of living index data retrieved from numbeo.com in April 2016. In the population treatment, each country appears in at least 47 and at most 52 bundles/votes. The alternative ids define a ground truth, i.e., a strict ranking of all 48 countries according to population data retrieved from wikipedia.org in April 2016.

    The data on this page has been donated by Iannis Caragiannis.

    Consists of 2 data files.

  • San Sebastian Poster Competition

    00033

    Election

    Approval Ballots from the San Sebastian Poster Competition held during The Summer School on Computational Social Choice organized by COST Action IC1205 at the Miramar Palace in San Sebastian in July 2016. This set has two elections of approval ballots with 17 alternatives and about 60 voters each. The data on this page was donated by Ulle Endriss.

    Two elections were held, using approval voting. In the first election the alternatives were posters A1-A17; in the second election the alternatives were posters B1-B17. There were 67 eligible voters (56 summer school participants, including the 34 poster presenters, as well as 7 lecturers and 4 organizers). Of these, 65 voters participated in the first election and 60 voters participated in the second election (1 voter did not vote in either election). The elections were conducted using the Whale3 system of Sylvain Bouveret. Most of the posters are available at the summer school website.

    The original data file (00033-00000001.dat) includes one column per poster. Each of the two sets of posters is ordered by the number of approvals received. Each row corresponds to a voter. The voters are ordered by the number of approvals they have given across both elections, except that the 7 voters who only participated in one of the two elections are listed last. The other files are converted into standard PrefLib format where all approved alternatives are considered a tied equivalence class.

    Consists of 3 data files.

  • Education Surveys in Informatics (Cujae)

    00032

    Election

    This dataset contains the results of surveying students and professors in the Faculty of Informatics, Instituto Superior Politecnico Jose Antonio Echeverria (Cujae, Havana, Cuba) about their preferences on courses and the most important aspects affecting their performance as students and professionals. Answers include ties and missing elements. These surveys, conducted in 2015, include criteria about different numbers of aspects (6 to 32 candidates) and 13 courses.

    This dataset was donated by Alejandro Rosete Suarez and Milton Garcia Borroto and may be augmented with new surveys in the future.

    Consists of 10 data files.

  • Vermont District Races

    00031

    Election Politics

    This dataset contains votes for 15 different races for various public offices held in Vermont in 2014. This data was collected and donated by Jeremy A. Hansen. There are 3 to 6 candidates and 532 to 1960 voters in these data files. Not all races were competitive so not every race is reported for every district.

    Consists of 15 data files.

  • UK Labor Party Leadership Vote

    00030

    Election Politics

    The 2010 UK Labor Party Leadership Vote is posted at www.rangevoting.org. This set contains the votes cast by all 266 MPs over the 5 leadership candidates. The votes are incomplete strict orders which we have posted along with extensions placing all unranked candidates tied at the end and pairwise graphs.
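
    The extension mentioned above (unranked candidates tied at the end) can be sketched like this. The function name and the list-of-lists representation of tied classes are illustrative assumptions, not the PrefLib file format.

```python
def extend_with_tied_tail(ranked, candidates):
    """Turn an incomplete strict order into a complete order with ties.

    ranked: candidates in the order the voter listed them.
    candidates: the full candidate set.
    Returns a list of tied classes, best first; every ranked candidate is
    alone in its class and all unranked candidates share the last class.
    """
    ranked_set = set(ranked)
    tail = sorted(c for c in candidates if c not in ranked_set)
    classes = [[c] for c in ranked]
    if tail:
        classes.append(tail)
    return classes


# A voter who ranked only candidates 3 and 1 out of five.
example = extend_with_tied_tail([3, 1], [1, 2, 3, 4, 5])
```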

    Consists of 1 data file.

  • Proto French Election Ratings

    00029

    Combinatorial Politics

    This analog dataset to the 2002 French Presidential Election Dataset was collected by Jean-Francois Laslier, Karine Van der Straeten and Michel Balinski. It consists of 398 approval ballots and subjective ratings on a 20-point scale collected over potential candidates for the 2002 French Presidential election cast by students at Institut d’Etudes Politiques de Paris.

    This dataset preserves both the approval ballots and the subjective ratings of the candidates by each of the voters. The approvals are coded as either 1.0 for approved or 0.0 for not approved. The subjective ratings are on a 20-point scale, where a score of -1.0 means that no input was provided (as opposed to a rating of 0.0, the lowest possible).
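
    Because -1.0 marks a missing rating while 0.0 is a genuine lowest rating, the two must be kept apart when aggregating. A minimal sketch, with hypothetical helper names and made-up sample values:

```python
from typing import Optional, Sequence

MISSING = -1.0  # sentinel for "no input provided", per the description above


def decode_rating(raw: float) -> Optional[float]:
    """Map the sentinel to None; keep 0.0 as a real (lowest) rating."""
    return None if raw == MISSING else raw


def mean_rating(raw_ratings: Sequence[float]) -> Optional[float]:
    """Average only the ratings that were actually given."""
    observed = [r for r in raw_ratings if r != MISSING]
    return sum(observed) / len(observed) if observed else None


# Made-up ratings for one candidate: three given, one missing.
example_mean = mean_rating([0.0, 10.0, -1.0, 14.0])
```

    Averaging the raw column without this filter would pull the mean down for candidates that many voters skipped.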

    Consists of 1 data file.

  • APA Election Data

    00028

    Election

    This dataset contains the results of the elections of the American Psychological Association between 1998 and 2009. The voters are allowed to rank any number of the 5 candidates without ties. Each of these elections has 5 candidates and between 13,318 and 20,239 voters.

    These data were donated by Michel Regenwetter and Anna Popova from the University of Illinois at Urbana-Champaign. The work that analyzed this data was supported by National Science Foundation grants SES # 08-20009, ICES # 1216016 (PI: M. Regenwetter), the University Library at the University of Illinois at Urbana-Champaign (PI: A. Popova), and the Basic Research Program at HSE (S. Popov). We thank the American Psychological Association for permitting access to its election ballot data.

    Consists of 12 data files.

  • Proto French Election

    00027

    Election Politics

    This analog dataset to the 2002 French Presidential Election Dataset was collected by Jean-Francois Laslier, Karine Van der Straeten and Michel Balinski. It consists of 398 approval ballots collected over potential candidates for the 2002 French Presidential election cast by students at Institut d'Etudes Politiques de Paris.

    This dataset is interesting as its companion dataset Proto French Election Ratings has both the subjective evaluations of the candidates, along with the approvals. This dataset only preserves the approval ballots cast by the students. As the candidate set is the potential presidential candidates (and thus, not the exact set used in ED-00026), this is presented as a separate dataset.

    Consists of 1 data file.

  • 2002 French Presidential Election

    00026

    Election Politics

    The 2002 French Presidential Election Dataset was collected by Jean-Francois Laslier and Karine Van der Straeten. It consists of 2,597 approval ballots collected in parallel to the actual election in 6 different districts in France.

    The approval votes were collected at a set of polling stations in France during the first round of voting in the 2002 French National Election. Voters in these districts were informed prior to the election that they would have the ability to cast an approval ballot along with their normal ballot for the election. Overall, over 75% of those who turned up to vote participated in the experiment. Each of the files represents one district voting on the same election. There are between 367 and 476 voters (2,597 in all) and 16 candidates. Additional details on the method used to collect the data and the results of the analysis can be found in the required citation for the use of this dataset.

    Consists of 6 data files.

  • Mechanical Turk Puzzle

    00025

    Election

    The Mechanical Turk Puzzle datasets come from Andrew Mao and were collected using Mechanical Turk. These data sets each contain elections with 793-797 voters over 4 candidates.

    Each of the candidates corresponds to an instance of the sliding puzzle game presented to a user on Mechanical Turk, who is asked to rank the items from those in a position closest to solution (first) to those requiring the most moves to complete (last). Thus, for all of these data sets there is a ground truth ranking which corresponds to the candidate names in sorted order. In the Puzzle task, each task contains elements requiring d, d+3, d+6, and d+9 moves to complete, where d ∈ {5, 7, 9, 11}. This allows for more noise to be introduced to various iterations of the task. For each d, 40 sets of puzzles were placed on Mechanical Turk and were ranked by 20 users. As per the data owner's request these 160 individual trials have been aggregated into a single file for each d. The individual trial runs are available upon request.

    Consists of 4 data files.

  • Mechanical Turk Dots

    00024

    Election

    The Mechanical Turk Dots datasets come from Andrew Mao and were collected using Mechanical Turk. These data sets each contain elections with 794-800 voters over 4 candidates.

    Each of the candidates corresponds to a set of random dots presented to a user on Mechanical Turk, who is asked to rank the items from those containing the fewest dots (first) to those containing the most dots (last). Thus, for all of these data sets there is a ground truth ranking which corresponds to the candidate names in sorted order. In the Dots task, each task contains elements with 200, 200+i, 200+2i, and 200+3i dots, where i ∈ {3, 5, 7, 9}. This allows for more noise to be introduced to various iterations of the task. For each i, 40 sets of puzzles were placed on Mechanical Turk and were ranked by 20 users. As per the data owner's request these 160 individual trials have been aggregated into a single file for each i. The individual trial runs are available upon request.
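
    Since the ground truth is the sorted candidate order, one way to quantify noise in a vote is to count how many candidate pairs it orders differently from the ground truth. The sketch below computes the dot counts for a given i and a Kendall tau distance; the choice of this distance and both function names are illustrative assumptions, not part of the dataset.

```python
from itertools import combinations


def dot_counts(i):
    """Dot counts of the four candidates for step size i."""
    return [200 + k * i for k in range(4)]


def kendall_tau_distance(vote, truth):
    """Number of candidate pairs that `vote` orders differently from `truth`.

    Both arguments are complete strict rankings (best first) over the
    same candidates.
    """
    position = {candidate: idx for idx, candidate in enumerate(vote)}
    return sum(
        1 for a, b in combinations(truth, 2) if position[a] > position[b]
    )


ground_truth = [1, 2, 3, 4]  # candidate names in sorted order
swapped_once = kendall_tau_distance([2, 1, 3, 4], ground_truth)
```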

    Consists of 4 data files.

  • Takoma Park Election Data

    00023

    Election Politics STV

    The Takoma Park Data contains the results from the 2007 Takoma Park, MD special election for city council. The set contains one election with 4 candidates and about 400 voters.

    Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.
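
    A minimal sketch of the blank handling described above, assuming a ballot is a list of slots in rank order where a blank slot is None or an empty string. The representation and the function name are hypothetical; the raw ballot files may encode blanks differently.

```python
def drop_blanks(ballot):
    """Return the ranked candidates in order, skipping blank slots."""
    return [
        choice
        for choice in ballot
        if choice is not None and str(choice).strip() != ""
    ]


# A ballot with a blank second choice and a blank fourth choice.
example = drop_blanks(["Smith", "", "Jones", None])
```

    Under this treatment, a voter who skipped a rank is recorded as if the later choices had moved up.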

    The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

    Consists of 1 data file.

  • San Leandro Election Data

    00022

    Election Politics STV

    The San Leandro data contains the results from several elections, including mayor and city council elections, held in San Leandro, CA between 2010 and 2012. The set contains 3 distinct elections with between 4 and 7 candidates and about 25,000 voters each.

    Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.

    The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

    Consists of 3 data files.

  • San Francisco Election Data

    00021

    Election Politics STV

    The San Francisco data contains the results from several elections, including board of supervisors, district attorney, and mayoral elections, held in San Francisco, CA between 2008 and 2012. The set contains 14 distinct elections with between 4 and 25 candidates and between 18,000 and 195,000 voters.

    Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.

    The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

    Consists of 14 data files.

  • Pierce Election Data

    00020

    Election Politics STV

    The 2008 Pierce Data contains the results from several elections, including county executive, held in Pierce, WA in 2008. The set contains 4 distinct elections with between 4 and 7 candidates and between 40,000 and 300,000 voters.

    Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.

    The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

    Consists of 4 data files.

  • Oakland Election Data

    00019

    Election Politics STV

    The 2010 Oakland Data contains the results from the city council and mayoral elections held in Oakland, CA in 2010. The set contains 7 distinct elections with between 4 and 11 candidates and between 900 and 145,000 voters.

    Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.

    The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

    Consists of 7 data files.

  • Minneapolis Election Data

    00018

    Election Politics STV

    The 2009 Minneapolis Data contains the results from the election for the Parks and Rec Commissioner and Tax Assessor in Minneapolis, MN. The set contains about 30,000 votes over 7-400 candidates. The full data sets contain ballots along with write-in candidates (Mickey Mouse and Yoda are well represented). The No Write In files contain the same votes with any write-ins removed and the votes modified accordingly.

    Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.

    The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

    Consists of 4 data files.

  • Berkeley Election Data

    00017

    Election Politics STV

    The 2010 Berkeley Data contains the results from a city council election (District 7) in Berkeley, CA. The set contains about 4,000 votes over 4 candidates.

    Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.

    The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

    Consists of 1 data file.

  • Aspen Election Data

    00016

    Election Politics STV

    The 2009 Aspen Data contains the results from the mayoral and city council elections held in Aspen, CO in 2009. The data contains two different elections with about 2,500 votes each over 5 and 11 candidates.

    Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.

    The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

    Consists of 2 data files.

  • Clean Web Search

    00015

    Election

    This dataset contains the results of comparing web searches across Bing, Google, Yahoo, and Ask. This data is provided by Robert Bredereck at TU Berlin. Robert provides tools to compute Kemeny rankings on this data at his website at TU Berlin.

    These data files differ from the other set of web data in that these files are forced to be complete. This means that the results are restricted to only those candidates (sites) that appear in all three datasets. The data files marked big contain around 200 (max 242) candidates each while the data files marked small contain between 10 and 50 candidates. The search queries are shown in the names of the individual data files below. For the WebImpact files the number of search results for a particular term was used to create a complete ranking over the search terms. These files measure the web impact of various world cities and countries. We have extended this data into tournament graphs and weighted majority graphs.

    Consists of 79 data files.

  • Sushi Data

    00014

    Election

    This dataset contains the results of a series of surveys conducted by Toshihiro Kamishima asking 5000 individuals for their preferences about various kinds of sushi. There are three different datasets that were elicited in different ways:

    • Element Series 00000001 contains 10 complete strict rank orders of 10 different kinds of sushi.
    • Element Series 00000002 contains individuals' strict rank orderings of 100 different kinds of sushi (candidates).
    • Element Series 00000003 contains individuals' scores for sushi items on a scale of 0-4, with repeats allowed.

    This dataset contains 14 files in total including soc, soi, toi, and toc files.

    Note that the dataset was previously converted incorrectly; this was fixed as of Jan 2016, so please re-download.

    Due to licence issues we require that you go through Toshihiro Kamishima's website to obtain the data files and observe the following licence terms:

    • We involve Toshihiro Kamishima, his colleagues, and their employers. You involve the user of this data and his/her colleagues, and their employers. We are NOT liable for any damages or losses, arising out of or related to your use or inability to use this data set. You can use this data set for any research purpose. You must not redistribute without our permission. We would like you to acknowledge the use of these program codes or data sets in publications by citing one of our related publications, if you could.

    Consists of 3 data files.

  • T Shirt

    00012

    Election

    This dataset contains complete rank orderings of T-Shirt designs voted on by members of the Optimization Research Group at NICTA. There are 11 designs (candidates) and 30 votes about these designs. Voters were required to submit complete strict orders.

    This data has been kindly donated by Carleton Coffrin.

    Consists of 1 data file.

  • Web Search

    00011

    Election

    This dataset contains the results of comparing web searches across Bing, Google, Yahoo, and Ask. This data is provided by Robert Bredereck at TU Berlin. Robert provides tools to compute Kemeny rankings on this data at his website at TU Berlin.

    The data files marked big contain around 2000 candidates each while the data files marked small contain between 100 and 200 results. The search queries are shown in the names of the individual data files below. For the WebImpact files the number of search results for a particular term was used to create a complete ranking over the search terms. These files measure the web impact of various world cities and countries. The results are not complete and not every candidate (website) is ranked by all the voters (search engines). We have extended this data into tournament graphs and weighted majority graphs, and created a toc dataset where all unranked candidates are tied at the end of the rankings.
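
    To illustrate what a weighted majority graph captures, the sketch below builds one from a handful of incomplete rankings. It counts a pairwise comparison only when a voter ranked both candidates; how the posted graph files treat unranked candidates is not stated here, so that choice, like the function names, is an assumption.

```python
from collections import Counter
from itertools import combinations


def pairwise_counts(rankings):
    """Count, for each ordered pair (a, b), the voters ranking a above b."""
    wins = Counter()
    for ranking in rankings:
        # combinations preserves list order, so a is ranked above b.
        for a, b in combinations(ranking, 2):
            wins[(a, b)] += 1
    return wins


def weighted_majority_edges(rankings):
    """Edges (a, b) -> margin for every pair where a beats b on net."""
    wins = pairwise_counts(rankings)
    edges = {}
    for (a, b), count in wins.items():
        margin = count - wins.get((b, a), 0)
        if margin > 0:
            edges[(a, b)] = margin
    return edges


# Three made-up "search engines" ranking a few sites, one incompletely.
example = weighted_majority_edges(
    [["x", "y", "z"], ["y", "x"], ["x", "z", "y"]]
)
```

    Dropping the margins and keeping only the edge directions gives the corresponding tournament-style graph; tied pairs produce no edge in this sketch.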

    Consists of 77 data files.

  • Skiing Competitions

    00010

    Election Sport

    This dataset contains the Cross Country Skiing and Ski Jumping results from the 2006-2009 World Championships. This data is provided by Robert Bredereck at TU Berlin. Robert provides tools to compute Kemeny rankings on this data at his website at TU Berlin.

    The results from each competition in the season provide a rank ordering over the candidates (competitors). We have created a toc datafile where all unranked candidates are tied at the end of the rankings.

    Note that this dataset used to contain the Formula 1 data. A larger set of the F1 data is now available in the 00052 and 00053 datasets.

    Consists of 2 data files.

  • Trip Advisor Data

    00040

    Combinatorial

    This dataset contains 675,069 reviews of 1,851 hotels across the world scraped from Trip Advisor. The data was scraped and donated by Hongning Wang.

    One file contains the numerical aspect ratings provided by the users, along with other information about the hotel. The other files contain the text of the users' reviews (split into 3 files). These reviews have been slightly modified: all excess spaces and tabs have been removed and all commas have been changed to semi-colons.

    Both files are encoded in the dat format but are actually CSV files. The first line of each file explains the fields within the file. Some of the usernames are encoded in Unicode so please be careful when parsing the files!
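
    A minimal sketch for reading such a file with Python's standard csv module, using the first line as the header. The UTF-8 encoding is an assumption (the description only says some usernames are Unicode), and the column names in the example are made up.

```python
import csv


def parse_dat_lines(lines):
    """Parse CSV content whose first line names the fields."""
    return list(csv.DictReader(lines))


def read_dat_file(path, encoding="utf-8"):
    """Read a .dat file that is really a CSV; the encoding is assumed."""
    with open(path, newline="", encoding=encoding) as handle:
        return parse_dat_lines(handle)


# Made-up sample with a non-ASCII username and a semi-colon inside the
# review text (commas in reviews were replaced by semi-colons).
rows = parse_dat_lines(
    ["user,rating,text", "Zoë,4,Nice; clean room"]
)
```

    Because commas inside review text were replaced, a plain comma split would also work, but csv handles line endings and quoting more safely.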

    Consists of 4 data files.

  • Kidney Data

    00036

    Matching

    This dataset contains 310 instances of synthetic kidney donor pools. The data was generated using a state of the art donor pool generation method (described in Saidman et al., Increasing the opportunity of live kidney donation by matching for two-and three-way exchanges. Transplantation 81(5), 2006) and was donated by John Dickerson. John has recently posted his generation as well as his exchange solving code online; it is available here.

    The dataset consists of 10 randomly generated instances of kidney exchanges with 16, 32, 64, 128, 256, 512, 1024, and 2048 patients and, as a percentage of the pool, altruists at 0%, 5%, 10%, and 15% for a total of 310 data files. The main components use the wmd data format. Each edge has a source and multiple destinations to represent the patients that can receive a kidney from the source. All edges have weight 1 unless they connect from a patient to an altruist (who does not need a kidney), in which case they have weight 0.

    There is a dat file associated with each kidney exchange datafile. This file contains some extra fields that may be of interest to researchers. Specifically, the file contains the following fields:

    • Pair: index number of the pair in the corresponding wmd file.
    • Patient: the blood type of the person needing the kidney.
    • Donor: the blood type of the person donating the kidney.
    • Wife-P?: 1 if the person needing the kidney is the wife of the donor.
    • %Pra: the panel reactive antibody level of the patient, discretized into three levels.
    • Out-Deg: the number of nodes in the wmd file that can receive a kidney from this donor.
    • Altruist: 1 if the corresponding pair is an altruist.

    Consists of 310 data files.

  • Social Recommendation

    00013

    Combinatorial

    This dataset contains the Facebook Social Graph and full ratings of 16 restaurants and 23 pubs by 93 users.

    You can find anonymous versions of the social network and the items ratings. It includes three files:

    • links.csv - This is an edge list that contains the Facebook social friendship ties of all the participants. These links are undirected.
    • pubs.csv - The file contains the list of participants and their ratings for 23 Pubs.
    • rest.csv - The file contains the list of participants and their ratings for 16 Restaurants.

    Each line in the rating files (pubs.csv and rest.csv) represents a participant with the structure: userid,X1,...,Xn. The userid in these files corresponds with the ids in the links.csv file.
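
    The sketch below shows one way to read these files. The rating line follows the userid,X1,...,Xn structure given above; the layout of links.csv as one "id,id" pair per line is an assumption, as are the function names and sample ids. Ratings are kept as strings because their coding is not described here.

```python
from collections import defaultdict


def parse_rating_line(line):
    """Split 'userid,X1,...,Xn' into the user id and a list of ratings."""
    user_id, *ratings = line.strip().split(",")
    return user_id, ratings


def build_friendships(edge_lines):
    """Build an undirected adjacency map from assumed 'u,v' edge lines."""
    friends = defaultdict(set)
    for line in edge_lines:
        u, v = line.strip().split(",")[:2]
        # Links are undirected, so record the tie in both directions.
        friends[u].add(v)
        friends[v].add(u)
    return friends


# Made-up sample data.
user, ratings = parse_rating_line("u17,4,2,5\n")
network = build_friendships(["u1,u2", "u2,u3"])
```

    Header lines, if present in the real files, would need to be skipped before calling these helpers.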

    The data on this page has been donated by Lihi Dery.

    Consists of 3 data files.

  • AGH Course Selection

    00009

    Election

    This dataset contains the results of surveying students at AGH University of Science and Technology about their course preferences. Each student provided a rank ordering over all the courses with no missing elements. There are 9 courses to choose from in 2003 and 7 in 2004.

    The data on this page has been donated by Piotr Faliszewski.

    Consists of 2 data files.

  • Glasgow City Council

    00008

    Election Politics STV

    This data set contains the results of the 2007 Glasgow City Council elections, separated by ward. There are 21 wards, each with different candidates and voters. These files report the results of all the ward-level elections, which were originally held under STV. In this data set there is a maximum of 13 candidates and a minimum of 8 candidates. The maximum number of voters is 12,744 and the minimum is 5,199.

    The data presented here was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

    Consists of 21 data files.

  • Electoral Reform Society (ERS) Data

    00007

    Election

    This dataset contains the results of 86 separate elections held by non-profit organizations, trade unions, and professional organizations. They were originally donated by Nicolaus Tideman who secured NSF funding to have the ballots tabulated. The ballots are from elections held under various voting rules requiring incomplete strict orders. The tabulated results were initially collected by the Electoral Reform Society in the UK in order to support the adoption of STV and other ranked voting methods.

    The files contain vote records with a maximum of 29 candidates and as few as 3; the number of voters ranges from 9 to 3419. The toc files have all unranked candidates tied, at the end of the order. Additionally, some of these are complete sets of ballots from the given elections and some are random samples from the set of all ballots.

    Consists of 87 data files.

  • Skate Data

    00006

    Election Sport

    This dataset contains figure skating rankings from various competitions during the 1998 season including the World Juniors, World Championships, and the Olympics. These data sets generally have 10-25 candidates (skaters) and 8-10 judges (voters).

    The candidates (skaters) are ordered such that the first candidate skated first, and on down the list. We have maintained this order as presented in the original versions of this dataset.

    Consists of 48 data files.

  • Burlington Election Data

    00005

    Election Politics STV

    The 2009 Burlington, Vermont Mayoral Election Data is posted online at www.rangevoting.org. It contains a number of interesting features when evaluated with the IRV method. Namely, the majority candidate in the first round does not emerge as the winner of the election.

    The 2006 Burlington, Vermont Mayoral data presented here was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
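
    To make the IRV remark above concrete, here is a minimal instant-runoff sketch on made-up ballots in which the first-round leader does not win. It is not a re-count of the Burlington data, and the alphabetical tie-breaking and the handling of exhausted ballots are assumptions rather than the official rules.

```python
from collections import Counter


def irv_winner(ballots):
    """Instant-runoff winner for ballots given as ranked candidate lists."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        tallies = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            # Each ballot counts for its highest-ranked remaining candidate;
            # ballots with no remaining candidate are exhausted.
            for candidate in ballot:
                if candidate in remaining:
                    tallies[candidate] += 1
                    break
        total = sum(tallies.values())
        leader, votes = max(tallies.items(), key=lambda kv: (kv[1], kv[0]))
        if votes * 2 > total or len(remaining) == 1:
            return leader
        # Eliminate the candidate with the fewest votes (ties by name).
        loser = min(tallies.items(), key=lambda kv: (kv[1], kv[0]))[0]
        remaining.remove(loser)


# Made-up profile: A leads the first round 4-3-2, but C's voters prefer B.
ballots = [["A"]] * 4 + [["B", "C"]] * 3 + [["C", "B"]] * 2
winner = irv_winner(ballots)
```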

    Consists of 4 data files.

  • Netflix Prize Data

    00004

    Election

    The Netflix Prize was a competition devised by Netflix to improve the accuracy of its recommendation system. To facilitate this Netflix released real ratings about movies from the users of the system. Any set of movies can be transformed into an election via a process outlined by Mattei, Forshee, and Goldsmith (reference below).

    The data sets posted below correspond to 100 random elections each with 3 and 4 candidates, drawn from Data Set 1 in the paper "An Empirical Study of Voting Rules and Manipulation with Large Datasets." The elements numbered 1 - 100 are all 3-candidate elections and the elements 101 - 200 are all 4-candidate elections.

    Consists of 200 data files.

  • Mariner Path Selection

    00003

    Election

    The Mariner Trajectory Selection Data Set is the votes cast by the various science teams responsible for selecting the trajectory for the 1977 interplanetary satellite. There were a total of 10 science teams voting over 32 possible paths. All these votes are complete but indifference was allowed between some of the objects.

    Consists of 1 data file.

  • Debian Project Data

    00002

    Election

    The Debian Project Leader Elections are held yearly with most of the ballots available online.

    We have captured several years of data below including the vote for the Debian logo. In some years there were only a few candidates, and we have omitted these years. The included data sets have between 4 and 9 candidates depending on instance and about 400 individual votes per instance.

    Consists of 8 data files.

  • Irish Election Data

    00001

    Election Politics STV

    The Dublin North, West, and Meath data sets contain a complete record of votes for two separate elections held in Dublin, Ireland in 2002. The votes were posted online but have since been removed.

    The ballots are not complete rankings: many are partial votes over the candidate set. The North data set contains 43,942 votes over 12 candidates, the West data set contains 29,988 votes over 9 candidates, and the Meath set contains 64,081 votes over 14 candidates.

    The Meath data presented here was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

    Consists of 3 data files.