Tayyar dataset
A position dataset for MENA political actors: 98 parties and 274 politicians across 20 countries, scored on 16 axes. Every fact carries an external source citation; the full dataset is exported as CSV and JSON, and the code that produces it will be open-sourced under MIT. This page is the canonical citable surface: the sections below summarize what the dataset contains and link to the methodology deep dives.
What's included
- Field-level source citations. Every fact about every entity links back to where it came from, so a party's founding year, its current leader, and its legal status can each cite a different source. The coverage page tracks citation coverage in aggregate.
- 16 calibrated axes. Economic, social, state-religion, democracy, west-alignment, regional-stance, Palestinian question, civil liberties, regime stance, pan-Arab, federalism, modernization, gender, iran-posture, press-freedom, sectarianism. Each one comes with a scoring rubric and concrete MENA examples anchored along the scale.
- Richer status than yes/no. Parties carry a government role (lead, coalition major / minor, confidence-and-supply, opposition major / minor, extra-parliamentary, banned) and a legal status (legal, restricted, outlawed, dissolved, merged away). Opposition and independent flags sit alongside.
- Declared vs. behavioral on key cases. Parties like Hezbollah and Hamas read as more committed to democracy in what they declare than in what they do; for 20 parties, this rhetoric-vs-record gap is itself the finding. The compass on the home page has a lens toggle for switching between the two views.
- 291 primary-source documents and 101 verified quotes. The document corpus carries verbatim manifestos, charters, parliamentary speeches, and UN addresses with country / party / politician attribution. The quote corpus drives Who-said-it and the "On the record" sections on every party / politician page.
- A semi-live event feed. Pulse tracks recent political shifts with a confidence rating (confirmed, reported, rumored, speculative), so in-flux developments such as party formations and merger talks can sit alongside confirmed events without being conflated. Also available as an RSS feed.
- Free downloads. Every table is available as CSV or JSON at /data. No API key, no rate limit.
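As a rough illustration of how the structured fields above could fit together, here is a minimal Python sketch. It assumes a per-field `Sourced` wrapper, a `positions` map keyed by axis and then by lens, and an axis where a higher score means a stronger commitment. `Sourced`, `Party`, `rhetoric_record_gap`, `flag_gaps`, the enum string encodings, and the threshold are all hypothetical, not the dataset's actual schema or export format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Generic, TypeVar

T = TypeVar("T")


# Enum members mirror the categories listed above; the string encodings are assumptions.
class GovernmentRole(Enum):
    LEAD = "lead"
    COALITION_MAJOR = "coalition_major"
    COALITION_MINOR = "coalition_minor"
    CONFIDENCE_AND_SUPPLY = "confidence_and_supply"
    OPPOSITION_MAJOR = "opposition_major"
    OPPOSITION_MINOR = "opposition_minor"
    EXTRA_PARLIAMENTARY = "extra_parliamentary"
    BANNED = "banned"


class LegalStatus(Enum):
    LEGAL = "legal"
    RESTRICTED = "restricted"
    OUTLAWED = "outlawed"
    DISSOLVED = "dissolved"
    MERGED_AWAY = "merged_away"


@dataclass(frozen=True)
class Sourced(Generic[T]):
    """One field value plus the citation it was taken from (hypothetical wrapper)."""
    value: T
    source: str  # e.g. a URL or bibliographic reference


@dataclass
class Party:
    name: str
    country: str
    founding_year: Sourced[int]
    current_leader: Sourced[str]
    government_role: Sourced[GovernmentRole]
    legal_status: Sourced[LegalStatus]
    # axis -> lens ("declared", "behavioral", ...) -> score; the score scale is an assumption
    positions: dict[str, dict[str, float]] = field(default_factory=dict)


def rhetoric_record_gap(party: Party, axis: str) -> float | None:
    """Declared minus behavioral score on one axis, or None if either lens is missing."""
    lenses = party.positions.get(axis, {})
    if "declared" not in lenses or "behavioral" not in lenses:
        return None
    return lenses["declared"] - lenses["behavioral"]


def flag_gaps(parties: list[Party], axis: str, threshold: float) -> list[str]:
    """Names of parties whose declared score exceeds their behavioral score by >= threshold."""
    flagged = []
    for party in parties:
        gap = rhetoric_record_gap(party, axis)
        if gap is not None and gap >= threshold:
            flagged.append(party.name)
    return flagged
```

Wrapping each field, rather than attaching one source per record, is what lets founding year and legal status cite different sources. The gap helper returns `None` when a party has only one lens, which is the normal case for parties outside the hand-coded marquee set.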
Cite as
Cite the paper. The dataset is in active development; citing a tagged snapshot pins a reference that won't shift as the data evolves. A public repository and a permanent DOI are in preparation.
Gara, T. (2026). The Model as One Rater Among Several: Measuring Political Positions in Data-Sparse Regions with a Language-Model Panel [Preprint; arXiv ID forthcoming]. https://tarekgara.com/tayyar/paper
@unpublished{gara_tayyar_2026,
  author = {Gara, Tarek},
  title = {The Model as One Rater Among Several: Measuring Political Positions in Data-Sparse Regions with a Language-Model Panel},
  year = {2026},
  note = {Preprint; arXiv ID forthcoming},
  url = {https://tarekgara.com/tayyar/paper}
}
Where to read more
- Methodology — how the dataset got built and where it falls short
- Findings — the structural patterns the data shows
- Coverage — verification status, country breakdowns, special-status leaderboards
- Axes catalog — all 16 axes with correlations and per-axis stats
Downloads
Every table is exportable as CSV or JSON. No authentication required.
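A minimal sketch for loading one exported table in Python, assuming you have already downloaded a file from /data. The function name `load_table` is hypothetical, and the assumption that the JSON export is a top-level array of row objects is not documented on this page; check the actual files before relying on it.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path


def load_table(path: str | Path) -> list[dict]:
    """Load one exported table as a list of row dicts, from a .csv or .json file.

    Assumes (not documented here) that the JSON export is a top-level array
    of objects, one object per row.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # newline="" lets the csv module handle quoted fields containing line breaks
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        if not isinstance(data, list):
            raise ValueError(f"expected a JSON array of rows in {path}")
        return data
    raise ValueError(f"unsupported file type: {path.suffix!r}")
```

CSV values come back as strings while JSON keeps numbers as numbers, so axis scores read from a CSV export need a `float(...)` before any arithmetic.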
What's coming next
Changes are tracked as versioned snapshots; the repository will be open-sourced under MIT on publication. The roadmap, in the order it'll land:
- Document-grounded scoring. Positions derived from reading party platforms, speeches, and voting records, with the specific passages cited. The hand-coded scores are kept as the baseline for comparison; document-grounded scores replace them in the dataset as they are produced.
- Inter-rater agreement. Cohen's κ between hand-coded and document-grounded scores, reported on the methodology page. Where they agree, the rubric is doing its job; where they don't, that's a finding worth writing up.
- Lens system at scale. Declared / behavioral / perceived rows generated for every party, not just the hand-coded marquee cases. The compass lens toggle then carries information across the whole dataset.
- Second-pass verification. Each fact and each position score reviewed against primary sources by someone other than the author.
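For the inter-rater item, here is a minimal sketch of unweighted Cohen's κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's marginal label frequencies. Cohen's κ is defined on categorical labels, so this sketch assumes continuous axis scores are first binned into ordinal bands; the cut points and the helpers `to_band` and `agreement_on_axis` are hypothetical. For ordinal bands, a weighted κ, which gives partial credit to near misses, may be the better fit.

```python
from __future__ import annotations

from bisect import bisect_right
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def to_band(score: float, edges: Sequence[float]) -> int:
    """Map a continuous score to an ordinal band index given sorted cut points (hypothetical)."""
    return bisect_right(edges, score)


def agreement_on_axis(
    hand_coded: dict[str, float],
    doc_grounded: dict[str, float],
    edges: Sequence[float],
) -> float:
    """Kappa on one axis over the entities scored by both methods (hypothetical helper)."""
    shared = sorted(hand_coded.keys() & doc_grounded.keys())
    return cohens_kappa(
        [to_band(hand_coded[e], edges) for e in shared],
        [to_band(doc_grounded[e], edges) for e in shared],
    )
```

On an assumed -1 to 1 scale, `agreement_on_axis(hand, doc, edges=[-0.33, 0.33])` would compare the two score sets in three bands; entities scored by only one method are skipped rather than counted as disagreements.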