Limitations
The methodology page covers how the dataset got built. This page is the other half: where it's rough, where it leans on author judgment, where reasonable people would code things differently, and what isn't there at all. A reader should know all of this before treating any specific number as load-bearing.
Author bias
Every position score in the current release was assigned by one person. That person reads widely on MENA politics in English, French, and Arabic, but has political views, blind spots, and reading preferences like anyone else. Where the author's reading aligns with the academic consensus, scores are likely close to where another careful coder would put them. Where it doesn't — and the author tries to flag these in entity descriptions — scores reflect a particular reading.
The next round of scoring will be document-grounded rather than author-grounded, and inter-rater agreement will be reported axis by axis. Until then: treat scores as starting estimates that a careful reader can improve on.
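As a rough illustration of what axis-by-axis reporting could look like, here is a minimal Python sketch. The text above doesn't say which agreement statistic will be used, so this substitutes the simplest stand-in, the mean absolute difference between two coders on each axis. The function name, the `(party, axis)` key layout, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def disagreement_by_axis(coder_a, coder_b):
    """Mean absolute score difference per axis between two coders.

    Each argument maps a (party, axis) pair to a numeric score.
    Only pairs scored by both coders are compared.
    This is a stand-in statistic, not the dataset's chosen measure.
    """
    diffs = defaultdict(list)
    for party, axis in coder_a.keys() & coder_b.keys():
        diffs[axis].append(abs(coder_a[(party, axis)] - coder_b[(party, axis)]))
    return {axis: mean(values) for axis, values in diffs.items()}


# Made-up scores for two hypothetical coders.
coder_a = {("P1", "democracy"): 2, ("P2", "democracy"): 4, ("P1", "economic"): 1}
coder_b = {("P1", "democracy"): 3, ("P2", "democracy"): 4, ("P1", "economic"): 1}
per_axis = disagreement_by_axis(coder_a, coder_b)
```

A lower number means the two coders landed closer together on that axis. A chance-corrected statistic would be a stricter choice if the scores are treated as categories.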
English-source weight
Citations skew toward English-language reference material: mostly Wikipedia, plus English-language analysis from places like Carnegie, Brookings, and the BBC. Arabic and Hebrew primary sources are reachable through the entity descriptions, but the citation field itself usually points at the English summary. For an entity like the National Pact in Lebanon or the Iraqi Constitution, the English Wikipedia article isn't the primary source; it's a convenient pointer to the primary source. The next round of citation work will diversify the sources the citation field points to.
The choice of axes is editorial
Twenty axes is a lot, but it isn't everything. There are reasonable axes the dataset doesn't have: stance on normalization with Israel as its own dimension (currently split across the regional-stance and Palestinian-question axes); attitude toward the GCC bloc; economic development model (rentier vs. productive economy); ethnic-particularism vs. civic-citizenship within multi-ethnic states; environmental and climate policy. Each of these is a real cleavage in MENA politics. They didn't make the cut for reasons of curation labor, not because they don't matter.
Inside each axis, the rubric is also a choice. The democracy axis collapses electoral democracy, judicial independence, freedom of dissent, and peaceful transfer of power into one number. Reasonable coders disagree about whether they should be split.
Categorical schemes lose nuance
The family-tag system carries about a dozen values — four Islamist sub-families (Sunni-electoral, Salafi, Shia, militant), two leftist, two nationalist, Jewish- and Christian-religious, and secular-liberal. It is finer than a single "Islamist" bucket — the Egyptian Brotherhood, the Tunisian Ennahda, Hamas, and Hezbollah land in different sub-families — but it is still a simplification: any one tag flattens real variation, and a few actors fit two boxes at once. Coalition role has eight tiers, which is better than yes/no but still doesn't capture Hezbollah's coalition-partner-with-paramilitary-autonomy or the IDF's relationship with the Israeli executive. Where the taxonomy forces a bad fit, the entity description tries to flag it.
Lens system covers 20 parties of 98
The compass has a lens toggle that switches between composite, declared, and behavioral views. For the 20 parties where rhetoric and record meaningfully diverge on at least one axis, the toggle does something — and the gap is hand-coded with confidence noted per row. For the other 78 parties, declared and behavioral fall back to the composite score. The system is wired; it just isn't fully scored across the dataset yet.
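The fallback described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the dataset's actual code: the `AxisScore` class, its field names, and the sample values are hypothetical. The only behavior taken from the text is that declared and behavioral views return the composite score when no hand-coded value exists.

```python
from dataclasses import dataclass
from typing import Optional

LENSES = ("composite", "declared", "behavioral")


@dataclass
class AxisScore:
    """One party's score on one axis (hypothetical layout)."""
    composite: float
    declared: Optional[float] = None    # set only where a gap was hand-coded
    behavioral: Optional[float] = None  # set only where a gap was hand-coded


def score_for_lens(score: AxisScore, lens: str) -> float:
    """Return the score under the chosen lens, falling back to composite."""
    if lens not in LENSES:
        raise ValueError(f"unknown lens: {lens!r}")
    value = getattr(score, lens)
    return score.composite if value is None else value


def has_lens_gap(score: AxisScore) -> bool:
    """True if toggling the lens can change what the reader sees."""
    return score.declared is not None or score.behavioral is not None


# Made-up values: one unscored row, one row with a hand-coded gap.
unscored = AxisScore(composite=3.0)
scored = AxisScore(composite=3.0, declared=5.0, behavioral=1.0)
```

With a layout like this, `has_lens_gap` is the kind of check an interface could use to tell readers whether the toggle does anything for a given party.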
No time series on scores
Each party has one score per axis, reflecting roughly its position over the most recent decade weighted by salience. There's no way today to ask "where was the Israeli Labor Party on the economic axis in 1990 vs. 2015?" or "did Ennahda's stance on state-religion shift after 2011?" Party drift over time is a real question; the dataset can't answer it yet. Events on the Pulse feed give some of the timing context, but they're separate from the score layer.
Country coverage is uneven
Tier-1 countries get full party-and-position coverage. Tier-2 countries (Saudi Arabia, Kuwait, Qatar, UAE, Oman) get politicians and country metadata only — because they don't host formal political parties, not because of a curation gap. That's the honest data: a country page that shows "no parties" reflects the actual political space.
Among Tier-1 countries, party rosters are not parliamentary censuses. They're a chosen anchor set — typically the parties that dominate academic and journalistic coverage of each country. Yemen and Libya are thinner than Israel and Lebanon because their political landscapes are thinner (and more militia-organized than party-organized).
Confidence ratings are approximate
Each score carries a confidence in the 0–1 range. These were assigned by the author and are themselves a rough self-assessment. A confidence of 0.6 should be read as "I'm reasonably sure but a careful coder could push this by ±2." A confidence of 0.4 should be read as "this is my best reading; another reader could land somewhere different." The ratings order well — higher means more confident, lower means less — but the specific values don't have a probabilistic interpretation.
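In practice, reading the ratings as an ordering means comparing or sorting them, not doing arithmetic on them as if they were probabilities. A minimal Python sketch of that usage follows; the function name, the `(label, confidence)` row shape, and the sample rows are hypothetical.

```python
def rank_by_confidence(rows):
    """Sort (label, confidence) rows from most to least confident.

    Confidence values are used only for ordering. They are not
    multiplied, averaged, or otherwise treated as probabilities.
    """
    for label, confidence in rows:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range for {label!r}: {confidence}")
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up rows for illustration.
rows = [("score A", 0.4), ("score B", 0.6), ("score C", 0.5)]
ranked = rank_by_confidence(rows)
```

A reader auditing the dataset could use a ranking like this to decide which scores to check first, starting from the bottom.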
Events have a confidence tier; bills don't
The Pulse feed flags events as confirmed, reported, rumored, or speculative. The bills surface assumes everything is confirmed; there's no equivalent tier for "rumored constitutional amendment" or "expected anti-corruption decree." For now, anything in the bills catalog has actually happened. If a planned but not-yet-enacted bill becomes worth tracking, the schema will need a confidence column added.
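One way the added column might look, sketched in Python. The four tier names come from the Pulse feed as described above; everything else (the `Bill` class, its fields, the default) is a hypothetical illustration and not the current schema.

```python
from dataclasses import dataclass
from enum import Enum


class ConfidenceTier(Enum):
    """The four tiers the Pulse feed uses for events."""
    CONFIRMED = "confirmed"
    REPORTED = "reported"
    RUMORED = "rumored"
    SPECULATIVE = "speculative"


@dataclass
class Bill:
    """Hypothetical bill row with a confidence column added."""
    title: str
    # Defaulting to CONFIRMED preserves today's assumption that
    # everything in the bills catalog has actually happened.
    tier: ConfidenceTier = ConfidenceTier.CONFIRMED


existing = Bill(title="An enacted law")
planned = Bill(title="A rumored amendment", tier=ConfidenceTier.RUMORED)
```

Defaulting the new column to confirmed would let existing rows stay valid without edits, which is the main reason to shape it this way.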
Hebrew and Arabic name coverage
Most entities have English, Arabic, and where applicable Hebrew names. A handful of historical figures and a few smaller parties have partial coverage. Where a Hebrew or Arabic name is missing, it usually means the author wasn't confident about the exact rendering — better to leave it blank than to introduce a transliteration the entity itself wouldn't use.
What this list is and isn't
Listing limitations isn't a hedge. It's the same impulse as citing sources: readers should know what they're looking at. If you find a limitation that should be on this list and isn't, tell me and it goes here. Limitations are part of the dataset, not a footnote.